WebHarvy 5.5 (Custom User Agent String, Handles frames, better form submission/navigation)

The following are the main changes (new features and improvements) in WebHarvy 5.5:

1. Custom User Agent String

In WebHarvy Settings > Browser tab, you can enable a custom user agent string.

The ‘Enable custom user agent string’ option lets you specify a user agent string for WebHarvy’s configuration and mining browsers to use. You can use it to make WebHarvy’s browser appear as a specific browser (for example, Microsoft Edge, Mozilla Firefox, Google Chrome or Apple Safari) to the websites you are extracting data from.
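To illustrate what this setting controls: a browser sends its user agent string to websites in the `User-Agent` HTTP request header. The Python sketch below (standard library only, and unrelated to WebHarvy's own code) builds a request that carries a custom user agent string. The Chrome-like string is only an example value; real browsers' strings change with each release.

```python
import urllib.request

# Example Chrome-like user agent string (illustrative value only;
# real browser user agent strings change with each release).
CHROME_LIKE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a request object that will send the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("https://example.com/", CHROME_LIKE_UA)
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_header("User-agent"))
    # Passing req to urllib.request.urlopen() would send this header to the site.
```

The website only sees the header value, which is why changing the string is enough to make the browser identify itself differently.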

2. Better form submission, initial navigation

Suppose your configuration needs to enter values into a search form and then click the ‘Search’ button to run the search and display the results, which contain the data you need to extract.

Earlier, you had to disable pattern detection before filling in the form fields. Then, after clicking the search button and once the data to be extracted was displayed, you had to enable pattern detection again before selecting the required data.

Now, with this version, you no longer need to change the pattern detection state manually; WebHarvy handles it automatically.

3. Open frames and select data

Earlier, if the data you needed to select for extraction occurred within a frame inside the page, you had to find the frame URL, load it separately within WebHarvy and then start configuration.

This version adds a new Capture window option for opening frames. When you click any item that occurs within a frame, the Capture window that appears includes an ‘Open Frame’ option. Clicking it makes WebHarvy load the frame contents within the browser view, so that you can proceed with data selection.
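For cases where you still want the frame URL itself (the manual approach used in earlier versions), frame addresses can be read from the `src` attribute of the page's `<frame>` and `<iframe>` tags. The Python sketch below uses the standard library's `html.parser` to collect them and resolve relative `src` values against the page URL. The `find_frame_urls` helper, the sample HTML and the URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collects the absolute src URL of every <frame> and <iframe> tag."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.frame_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own URL.
                self.frame_urls.append(urljoin(self.base_url, src))


def find_frame_urls(html: str, base_url: str) -> list:
    """Return frame URLs in document order (hypothetical helper)."""
    finder = FrameFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.frame_urls


if __name__ == "__main__":
    sample_html = (
        '<html><body>'
        '<iframe src="/listings/page1.html"></iframe>'
        '<frame src="https://other.example.com/data.html">'
        '</body></html>'
    )
    for url in find_frame_urls(sample_html, "https://example.com/products/"):
        print(url)
```

Each printed URL could then be loaded on its own, which is the step the new ‘Open Frame’ option now performs for you.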

4. Browser Search

Press CTRL + F in the configuration browser (while not in configuration mode) to open the search window, which lets you perform a text search on the currently loaded page.

5. Capture full page HTML

Sometimes you need to capture the full page HTML so that you can extract data within it by applying regular expressions. Earlier, you had to click anywhere on the page, select the Capture More Content option repeatedly until the whole page content was selected, and then select the Capture HTML option to get the full page HTML. With this version, you can double-click the ‘Capture HTML’ toolbar button to capture the full page HTML directly.
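As an illustration of why the full page HTML can be useful: some values appear only in the page source (for example inside a `<script>` block) and not in the visible page. The Python sketch below applies regular expressions, each with a single capture group, to such HTML. The sample markup, field names and patterns are hypothetical, and inside WebHarvy the expression is applied through its own interface rather than through Python code.

```python
import re

# Hypothetical full page HTML: the SKU and price exist only in a script block.
PAGE_HTML = """<html><head>
<script>var productData = {"sku": "AB-1234", "price": "19.99"};</script>
</head><body><h1>Sample product</h1></body></html>"""

# Each pattern has one capture group holding the value to extract.
SKU_PATTERN = re.compile(r'"sku":\s*"([^"]+)"')
PRICE_PATTERN = re.compile(r'"price":\s*"([0-9.]+)"')


def extract_first(pattern: re.Pattern, html: str):
    """Return the first captured group, or None if the pattern does not match."""
    match = pattern.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_first(SKU_PATTERN, PAGE_HTML))
    print(extract_first(PRICE_PATTERN, PAGE_HTML))
```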

6. Reset settings to default

You no longer need to remember what the default settings were. Just click the ‘Reset settings to default’ link in the Settings window.

7. Lower repetition intervals for scheduled tasks

The mining task scheduler now lets you repeat mining tasks at intervals of 5, 10, 15 or 30 minutes.
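Shorter intervals mean more frequent visits to the target website, so it can help to work out the schedule in advance. The Python sketch below lists the first few start times for a given interval and counts the runs per day. It assumes every run starts exactly on schedule and ignores how long each mining task takes; the actual scheduler may behave differently, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

SHORT_INTERVALS_MINUTES = (5, 10, 15, 30)  # intervals mentioned for WebHarvy 5.5


def run_times(first_run: datetime, interval_minutes: int, count: int) -> list:
    """Return the first `count` scheduled start times, assuming no drift."""
    step = timedelta(minutes=interval_minutes)
    return [first_run + i * step for i in range(count)]


def runs_per_day(interval_minutes: int) -> int:
    """Number of runs that start within a 24-hour period."""
    return (24 * 60) // interval_minutes


if __name__ == "__main__":
    for t in run_times(datetime(2024, 1, 1, 9, 0), 15, 4):
        print(t.strftime("%H:%M"))
    for minutes in SHORT_INTERVALS_MINUTES:
        print(f"every {minutes} min -> {runs_per_day(minutes)} runs/day")
```

For example, a 5-minute interval starts 288 runs a day, while a 30-minute interval starts 48.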

Minor Changes

  1. ‘Enable Web Security’ option in Browser Settings is ON by default
  2. The browser now handles ‘Need Client Certificate’ requests from web servers
  3. Internal browser updated to the latest possible version of Chromium
  4. HTTP/2 support enabled
  5. Bug fixes and overall improvements
    1. Fixed issue where some selected data items were not extracted correctly during mining
    2. Preview generation is stopped when configuration is stopped
    3. Deleting data fields is not allowed while preview generation is in progress
    4. Fixed an issue where pattern detection was enabled for a while soon after opening a popup
    5. Fixed an issue with editing the start page URL in a configuration with multiple URLs
    6. Single-term searches are supported from the configuration browser address bar
    7. Fixed: pagination controls were enabled in the Miner window for single-page configurations after mining was stopped
    8. Fixed a bug in Keyword Scraping caused by case-sensitive keyword replacement in the start URL / Post-data
    9. Fixed: the initial page (quick start guide) sometimes took a very long time to load when starting WebHarvy
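The Keyword Scraping fix listed above concerns case sensitivity when the keyword is replaced in the start URL or Post-data. As a hedged illustration of the general technique, not WebHarvy's actual code, the Python sketch below replaces a URL-encoded keyword case-insensitively, so a keyword entered as ‘Running Shoes’ still matches `running+shoes` in the URL. The `replace_keyword` function and the example URLs are hypothetical.

```python
import re
from urllib.parse import quote_plus


def replace_keyword(text: str, old_keyword: str, new_keyword: str) -> str:
    """Replace every URL-encoded occurrence of old_keyword, ignoring case.

    Both keywords are encoded with quote_plus (spaces become '+'), as they
    would typically appear in a query string or form-encoded Post-data.
    """
    pattern = re.compile(re.escape(quote_plus(old_keyword)), re.IGNORECASE)
    replacement = quote_plus(new_keyword)
    # A function replacement stops re from interpreting backslashes in it.
    return pattern.sub(lambda _match: replacement, text)


if __name__ == "__main__":
    url = "https://example.com/search?q=running+shoes&page=1"
    post_data = "q=running+shoes&sort=asc"
    # A case-sensitive replace misses the lowercase keyword and changes nothing.
    print(url.replace(quote_plus("Running Shoes"), quote_plus("red hats")))
    print(replace_keyword(url, "Running Shoes", "red hats"))
    print(replace_keyword(post_data, "Running Shoes", "red hats"))
```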

As always, you can download and install the latest version from https://www.webharvy.com/download.html.
