WebHarvy 6.1 – Internal Proxies, Database/File Update, New Capture window options

The following are the main changes in this version.

Option to leave a blank row when data is unavailable for a keyword/category/URL

A new option on WebHarvy’s Keyword/Category settings page leaves a blank row, filled in with the corresponding keyword/category/URL, when no data is available for that item. This option is available only when the ‘Tag with Category/URL/Keyword’ option is enabled.

When mining data using a list of keywords, categories or URLs, enabling this option helps identify the items for which WebHarvy failed to fetch data.
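The effect of this option on the exported rows can be sketched in a few lines of Python. This is only an illustration and not WebHarvy's code: the function name `tag_rows`, the sample keywords and the placement of the tag in the first column are all assumptions made for the example.

```python
def tag_rows(items, mined):
    """Return rows of [tag, field, ...]; an item with no data gets one blank tagged row."""
    # Width of a data row, used to pad the blank rows (0 if nothing was mined at all).
    width = max((len(row) for rows in mined.values() for row in rows), default=0)
    out = []
    for item in items:
        rows = mined.get(item, [])
        if rows:
            out.extend([item, *row] for row in rows)
        else:
            # No data for this keyword/category/URL: keep a placeholder row.
            out.append([item] + [""] * width)
    return out


# Hypothetical sample data: nothing was fetched for "tablet".
keywords = ["laptop", "tablet", "phone"]
mined = {
    "laptop": [["Model A", "$900"]],
    "phone": [["Model C", "$500"], ["Model D", "$650"]],
}
rows = tag_rows(keywords, mined)

# Items that failed can be found by looking for rows with empty data fields.
failed = [row[0] for row in rows if not any(row[1:])]
```

Filtering the exported file for rows whose data columns are empty then gives the list of items to retry.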

Proxies are used internally by WebHarvy, not system-wide

In earlier versions, proxies set in WebHarvy Settings were applied system-wide during mining. This caused side effects for other applications, especially when the proxies required login with a user name and password or when a list of proxies was cycled. Starting from this version, WebHarvy uses proxies internally so that other applications are not affected during mining. You can still apply proxies directly in Windows settings (system-wide), and WebHarvy will use them automatically.

The configuration browser now also uses the proxies set in WebHarvy Settings. In earlier versions, proxies were used only during mining.
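The difference between a system-wide proxy and one used internally can be illustrated with Python's standard library. This is a rough sketch of the general idea, not a description of how WebHarvy implements it: the `ProxyRotator` class and the proxy addresses are made up for the example. Each opener carries its own proxy setting, so nothing outside the program is reconfigured.

```python
import itertools
import urllib.request


class ProxyRotator:
    """Cycle through a list of proxies, building a separate opener for each one."""

    def __init__(self, proxy_urls):
        self._cycle = itertools.cycle(proxy_urls)

    def next_opener(self):
        proxy = next(self._cycle)
        # The proxy (including any user name and password in the URL) applies
        # only to requests made through this opener object.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        return proxy, urllib.request.build_opener(handler)


# Placeholder proxy addresses, for illustration only.
rotator = ProxyRotator([
    "http://user:secret@127.0.0.1:8080",
    "http://127.0.0.1:8081",
])
first_proxy, first_opener = rotator.next_opener()
second_proxy, second_opener = rotator.next_opener()
third_proxy, _ = rotator.next_opener()  # wraps around to the first proxy
```

A page would then be fetched with `first_opener.open(url)`, while other applications on the machine keep their own network settings.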

Database, Excel File Export : Update option (Upsert)

While saving/exporting mined data to a database or Excel file which already contains data (from a previous mining session), WebHarvy now allows you to update the existing rows which have the same first column value as rows in the newly mined data, without creating duplicate rows.

For file export this option is currently available only for Excel files.
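The update behaviour described above is commonly called an upsert: update the row if its key already exists, otherwise insert it. The sketch below shows the idea with Python's built-in `sqlite3` module, using the first column as the key. The table name `products`, its columns and the sample rows are hypothetical, and the SQL that WebHarvy issues for any particular database is not documented here.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Update rows whose first column already exists; insert the rest."""
    for name, price in rows:
        cur = conn.execute(
            "UPDATE products SET price = ? WHERE name = ?", (price, name)
        )
        if cur.rowcount == 0:
            # No existing row has this first-column value, so add a new one.
            conn.execute(
                "INSERT INTO products (name, price) VALUES (?, ?)", (name, price)
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price TEXT)")

# Rows left over from a previous mining session.
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("Model A", "$900"), ("Model B", "$300")],
)

# Newly mined data: one changed row and one new row.
upsert_rows(conn, [("Model A", "$850"), ("Model C", "$500")])

result = conn.execute("SELECT name, price FROM products ORDER BY rowid").fetchall()
```

After the call, ‘Model A’ holds the new price, ‘Model B’ is untouched and ‘Model C’ is appended, so no duplicate rows are created.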

New Capture window options : Page reload and Go back

Two new Capture window options for page interaction have been added: Reload and Go back. The Reload option is helpful in cases where a page does not load correctly the first time a link is followed. The ‘Go back’ option navigates the browser back to the previously loaded page.

Keywords can be added even after starting configuration

Just like URLs, keywords can also be added after starting configuration. This method is useful in cases where the normal method of Keyword Scraping cannot be applied. The only condition for adding keywords this way is that the first keyword entered must be present in the Start URL or Post Data of the configuration.
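One plausible reason for this condition is that the first keyword marks the place in the Start URL where the other keywords are substituted. The sketch below works under that assumption, which is not confirmed by these notes; the function `urls_for_keywords`, the URL and the keywords are all hypothetical.

```python
from urllib.parse import quote_plus


def urls_for_keywords(start_url, keywords):
    """Build one URL per keyword by replacing the first keyword in the Start URL."""
    first = quote_plus(keywords[0])
    if first not in start_url:
        # Mirrors the stated condition: the first keyword must appear in the URL.
        raise ValueError("first keyword must be present in the Start URL")
    return [start_url.replace(first, quote_plus(keyword)) for keyword in keywords]


urls = urls_for_keywords(
    "https://example.com/search?q=red+shoes",
    ["red shoes", "blue hats"],
)
```

If the first keyword does not appear in the URL, there is nothing to substitute, which is why the sketch raises an error in that case.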

Other minor changes

  1. During configuration, in pages reached by following links from the starting page, links (URLs) selected by applying Regular Expressions on HTML can be followed using the ‘Follow this link’ option. Earlier, only the Click option was available for this scenario.
  2. HTML-encoded URLs selected from HTML are now handled automatically. Example: URLs including ‘&amp;’. This works for following links as well as for image URLs.
  3. ‘Enable JavaScript’, ‘Share Location’ and ‘Enable plugins’ options removed from Browser settings.
  4. Fixed a bug in scraping a list of URLs when one of the URLs fails to load.
  5. While scraping a list of URLs, URLs which do not start with the HTTP scheme part (http:// or https://) are now handled.
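Items 2 and 5 both amount to cleaning up a URL before it is loaded. A minimal Python sketch of that kind of clean-up follows. The function name `normalize_url` and the choice of `http` as the default scheme are assumptions made for illustration; these notes do not say which scheme WebHarvy applies.

```python
import html


def normalize_url(raw, default_scheme="http"):
    """Decode HTML entities in a URL and add a scheme if it has none."""
    # A URL copied from HTML source may contain entities such as &amp;.
    url = html.unescape(raw.strip())
    if not url.lower().startswith(("http://", "https://")):
        # Assumed default; the real behaviour may differ.
        url = f"{default_scheme}://{url}"
    return url


cleaned = normalize_url("example.com/list?page=1&amp;sort=price")
unchanged = normalize_url("https://example.com/")
```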

Download the latest version

The latest version of WebHarvy is available here. If you are new to WebHarvy, we recommend viewing our ‘Getting started’ guide.

 
