Troubleshooting Guide - DIY

  1. Page not loading correctly in WebHarvy's browser
  2. Mining process is slow
  3. No data during mining
  4. Mining stops before completion
  5. Next page not loading / Data extracted only from the first page
  6. Unable to select data during configuration / Data within frame
  7. While scraping table, only the first column data is selected
  8. Mining does not stop. Last page is repeatedly extracted.
  9. WebHarvy crashes/terminates unexpectedly
  10. Unlocking the software fails with message "Activation failed due to unknown reason"
  11. Cannot see miner window / Miner window is blank

Page not loading correctly in WebHarvy's browser

Make sure that JavaScript is enabled in Browser Settings. Check WebHarvy > Home menu > Settings > Browser tab > Enable JavaScript (ticked).

Mining process is slow

Please try the following :-

1. Decrease the 'Page Load Timeout' (to 10 seconds or below depending on your internet speed) and 'AJAX Load Wait Time' (1 or 2 seconds) values in Miner Settings.

2. Turn off 'Image Loading' in Browser Settings and pages will load faster during mining. WebHarvy > Home menu > Settings > Browser tab > Disable loading images (ticked)

No data during mining

During configuration, preview of captured data was generated and displayed correctly, but during mining (Start Mine) no data was extracted.

Please try the following to solve this problem :-

1. The first thing you can try is to save the current configuration, restart WebHarvy, open the saved configuration file and start mining again.

2. If the website from which you are trying to scrape data requires login (with a user name and password) to view the data which you need to extract, then please make sure that you follow the steps given at Scraping websites which require login.

3. Make sure that settings in Browser Settings Window have the same values as when the configuration was created. Check especially JavaScript enable/disable.

4. Delete browsing history and try mining (Start Mine) again. WebHarvy > Home menu > Settings > Browser tab > Delete Cache / Browsing History > Click (see Browser Settings Window). This will delete browsing history including cookies/cached files. Then try mining again.

5. Increase the 'AJAX Load Wait Time' in Miner Settings. Websites which employ AJAX to load and display elements may require some additional time after page load to display all data. The default value of this setting is 5 seconds. Increase it to 10, 15 or 20 seconds and see if WebHarvy is able to get results during mining.

6. Soon after starting configuration, click anywhere on the page and select More Options > Scroll Page from the resulting Capture window. Then continue with the configuration.

7. Edit the configuration (Home menu > Edit) and see if the configuration start page as well as the preview data is loaded and displayed correctly. If the problem is related to loading the start page then it can be identified here. You can fix problems related to starting page URL and Post data by selecting Configuration menu > Edit > StartURL / PostData.

8. You may also Try Scraping via Proxy Servers.

Mining stops before completion

Mining stops before the requested/total number of pages are scraped

This usually happens when WebHarvy is unable to load the next page of data by clicking the next page link selected during configuration. Please try the following.

1. Increase 'Page Load Timeout' and 'AJAX Load Wait Time' values in Miner Settings so that the page gets enough time to load all data before it is scraped. Increasing these values will slow down mining but will minimize page load time outs.

2. When mining aborts before completion, you can click the Start button again (without closing the Miner window) and WebHarvy will try to resume mining from where it stopped.

3. You can also directly change the starting URL of the configuration so that mining starts at a different page (where it stopped) than it was originally configured for.

4. Also, websites can potentially block you if you access their pages via software for a long time or extract large amounts of data. The solution here is to scrape via proxy servers or a VPN so that you can remain anonymous and avoid getting blocked by websites. Try using proxy servers with WebHarvy.

5. In case you are trying to scrape a relatively large number of records, please refer to 'How to scrape large amounts of data ?'

Next page not loading / Data extracted only from first page

During mining data is extracted only from the first page. Mining stops after first page extraction, or more pages are loaded but no more data is extracted.

Try the following :-

1. Sometimes the first page of listings has a slightly different layout than the rest of the pages (Ex: Many Amazon product listings). In such cases the first and rest of pages should be scraped separately. Load the second page and then start configuration. See if data from subsequent pages (3,4 etc.) can be extracted during mining stage.

2. Make sure that the first item which you click after starting configuration does not belong to an advertised or sponsored listing. For example, with Yellow Pages Listings the first few ones may be sponsored/advertised listings and will only be present in the first page. Selecting data from these will prevent WebHarvy from getting data from subsequent pages.

3. Try setting the next page link via both the methods explained (see the images) at Selecting pagination links. The next link can be set either by clicking the next link/arrow or by clicking the direct link to load page number 2. Try both these methods.

4. You may also try increasing the 'AJAX Load Wait Time' value in Miner Settings.

5. In case the direct links (URLs) to each page of listings has the page number embedded in it you can try the URL based pagination method.

6. With some websites multi-page scraping will work only if you disable scripting (JavaScript) in Browser Settings. To disable JavaScript : WebHarvy > Home menu > Settings > Browser tab > Enable JavaScript > uncheck. Do not forget to turn JavaScript back ON (enable) after finishing mining.

7. Soon after starting configuration, click anywhere on the page and select More Options > Scroll Page. Then continue with the configuration.
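The idea behind URL based pagination (step 5 above) can be sketched in a few lines of Python. This is only an illustration of the concept, not how WebHarvy implements it. The function name page_urls and the example.com URL pattern are assumptions made for this example; substitute the pattern used by the website you are scraping.

```python
def page_urls(template: str, first: int, last: int) -> list[str]:
    """Build one URL per listing page by substituting the page number."""
    return [template.format(page=n) for n in range(first, last + 1)]


# Hypothetical URL pattern: the page number is embedded as a query parameter.
urls = page_urls("https://example.com/listings?page={page}", 2, 4)
for url in urls:
    print(url)
```

If the URLs generated this way load the expected pages in a normal browser, the website is a good candidate for the URL based pagination method.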

Data within Frame / Unable to select data during configuration

After starting configuration, the Capture window is not displayed when an item (text/image) is clicked.

This usually happens when the data to be selected is inside a frame. To select data you will have to find the frame URL and load it directly in WebHarvy. If you have Chrome browser installed then the frame URL can be found as follows.

Load the page in Chrome browser. Right-clicking on the data item which you need to extract should show you the option "View frame source". Clicking on it opens the source code in a new tab, with its URL shown in the address bar. Remove the "view-source:" prefix from the address bar string to get the frame URL.

Load the frame URL directly in WebHarvy and start configuration. You should be able to select data.
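Removing the prefix by hand is usually enough, but the step can also be expressed as a small Python helper. This is a minimal sketch; the function name frame_url_from_view_source and the example.com address are hypothetical and used only for illustration.

```python
def frame_url_from_view_source(address: str) -> str:
    """Return the frame URL from a Chrome 'view-source:' address bar string."""
    prefix = "view-source:"
    address = address.strip()
    # Strip the prefix only when it is present; otherwise return the URL as-is.
    if address.startswith(prefix):
        return address[len(prefix):]
    return address


# Hypothetical address copied from the Chrome address bar.
frame_url = frame_url_from_view_source("view-source:https://example.com/frame.html")
print(frame_url)
```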

Scraping data from Table / Grid layout

While scraping items displayed in table/grid layout, only the first column items are selected/extracted.

For example, product listings are often displayed in a grid layout (table - row/column format). While configuring WebHarvy to extract data from such pages when the first product's title (or any other detail) is selected, only products from the first column are automatically identified. Products from the remaining columns are missed.

To solve this please follow the steps below :-

1. Open WebHarvy Settings : Home menu > Settings
2. Click on the 'Advanced Miner Options' button
3. In the resulting window adjust the value of the first list option - 'Minimum number of items required in a list'. Select a value which is equal to or less than the number of columns in the table/grid. For example if products are displayed in 4 columns, this value should be set to '3'.
4. Apply Changes.
5. Now start configuration and select the first product detail. Details of all remaining products (from all columns) should be automatically selected.

Please make sure that you reset the change done in Step 3 above before mining other websites since this is a global miner setting.

Mining does not stop. Last page data is repeatedly extracted.

This usually happens when WebHarvy is unable to detect the end of pagination. This can be avoided by enabling the Automatically remove duplicate records while mining option in Miner Settings. When this option is enabled, mining is automatically stopped when a page full of duplicate entries is encountered.

This can also be prevented by configuring the next page link by clicking on the direct link to load page number 2 (if present), instead of clicking on the 'next' link. This is as shown in the second image displayed at How to select pagination links ?
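The stop condition described above (stop when a page contains only duplicate entries) can be illustrated with the following Python sketch. It is a simplified model written for this guide and is not WebHarvy's actual implementation; the function name mine_pages and the sample data are assumptions.

```python
def mine_pages(pages):
    """Collect unique records and stop at the first page that adds nothing new."""
    seen = set()
    results = []
    for records in pages:
        added = 0
        for record in records:
            if record not in seen:
                seen.add(record)
                results.append(record)
                added += 1
        # A non-empty page made up entirely of duplicates signals the end of pagination.
        if records and added == 0:
            break
    return results


# Simulated mining run in which the last page keeps being returned.
pages = [["a", "b"], ["c", "d"], ["c", "d"], ["c", "d"]]
results = mine_pages(pages)
print(results)
```

In this model the repeated last page is read once more, found to contain nothing new, and mining ends there instead of looping indefinitely.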

WebHarvy crashes / terminates unexpectedly

Please try the following to solve this issue :-

1. Make sure that 'Enable Plugins' and 'Enable Web Security' options in Browser Settings are turned off.

2. Make sure that you have installed the latest version of WebHarvy available for download.

3. Install and run WebHarvy from a user account with administrative privileges, or right-click the application icon and select 'Run as administrator'. In case you face the issue again, you may try running WebHarvy in compatibility mode. Right-click the WebHarvy desktop icon, select Properties, click on the Compatibility tab, tick the 'Run this program in compatibility mode for' checkbox and select a previous version of Windows from the list box below it. You can also tick the 'Run this program as administrator' box.

When trying to unlock the trial version of WebHarvy using the license key file the error message "Activation failed due to unknown reason" is displayed

WebHarvy registration involves online activation. So internet connection is required for successful registration/unlock of trial version of WebHarvy using the license key file. Each WebHarvy license (except site license) has a fixed number of activations. Activations beyond this limit will not be allowed.

In case activation fails, please check if your firewall/antivirus is blocking WebHarvy. WebHarvy's configuration browser and miner will work without firewall/antivirus exceptions (since they are browsers), but activation will fail if the connection request to our license server is blocked. So in case online activation fails please try again after adding rules/exceptions in your firewall/antivirus to allow outgoing connections from WebHarvy.

When 'Start-Mine' is clicked, Miner window is not displayed. Taskbar preview shows blank window.

This problem usually happens when WebHarvy is run with multiple monitors attached to your computer/laptop and you later disconnect the extra monitor. Please try any of the solutions given below :

Solution 1:

Hover the mouse over the WebHarvy application icon (in the taskbar) so that you can see a small preview of the Miner window. Click on the Miner window preview so that it becomes the active window.

1. Now hit ALT + Space Bar
2. Press the ‘M’ key
3. Now you can use the up/down/left/right arrow keys on your keyboard to move this window around until it is back on the current desktop. Once you start using the arrow keys, you can also use the mouse to move the window around.

Solution 2:

You may also try connecting the external / additional monitor and see the Miner window.