WebHarvy Web Scraper : Troubleshoot common problems

The following guide will help you solve the most commonly faced problems while scraping data.

1. Mining process is slow
2. No data during mining
3. Data missing in some rows during mining
4. Mining stops before completion
5. Next page not loading / Data extracted only from the first page
6. Unable to select data during configuration / Data within frame
7. While scraping table, only the first column data is selected
8. Mining does not stop. Last page is repeatedly extracted.
9. Mining terminates, configuration browser goes blank, application crashes
10. Unlocking the software fails with message "Activation failed due to unknown reason"
11. Cannot see miner window / Miner window is blank
12. 'Follow this link' option is disabled
13. Web page not loading / 'Checking if site connection is secure' message
14. Only the first item on the page is selected
15. Scraping data from multiple pages fails for scroll-to-load / load-more type pagination.
16. Page not fully loaded after following link
17. All items / images on page are not scraped
18. Cannot see preview pane
19. Difficult to select data during configuration due to slow text highlighting
20. Incorrect links/buttons clicked during mining
21. Proxy settings are ignored by WebHarvy
22. 'Capture Target URL' option is disabled
23. Incorrect data mined
24. Error Code:-375 while trying to load web page
25. First item on page does not have all required data
26. Configuration not working after a few days

1. Mining process is slow

Try the following to increase mining speed

1. Decrease the 'Page Load Timeout' (to 10 seconds or below depending on your internet speed) and 'Script Load Wait Time' (1 or 2 seconds) values in Miner Settings.
2. Disable loading images in Browser Settings and pages will load faster during mining. WebHarvy > Settings > Browser tab > Disable loading images (ticked)
3. Increase the value of Maximum number of parallel mining threads setting in Advanced Miner Options.

The above settings should be adjusted so that they do not cause missing data during mining. If the miner does not get enough time to load pages, or if you send too many requests at once to the website, pages may fail to load.

Note: Starting from version 7.3, all miner options, including page load timeout and script load wait time, are stored in the configuration file. Therefore, to apply new setting values while running a saved configuration, you must first open the configuration file, then open the Settings window, and click the Apply button (even if no changes are made to the local setting values).

2. No data during mining

During configuration, preview of captured data was generated and displayed correctly, but during mining (Start Mine) no data was extracted.

Please try the following to solve this problem:

1. Increase the 'Script Load Wait Time' in Miner Settings. The default value of this setting is 5 seconds. Increase it to 10, 15 or 20 seconds and see if WebHarvy is able to get results during mining.
2. Soon after starting configuration, click anywhere on the page and select More Options > Page > Scroll Down from the resulting Capture window. Then continue with the configuration.
3. If the website you are scraping requires login (with a user name and password) to view the data you need to extract, make sure that you follow the steps given at Scraping websites which require login.
4. Delete cache / browsing history and try mining (Start Mine) again. WebHarvy > Settings > Browser tab > Delete Cache / Browsing History (see Browser Settings Window). This will delete browsing history including cookies/cached files. Then try mining again.
5. Edit the configuration (Home menu > Edit) and see if the configuration start page as well as the preview data is loaded and displayed correctly. If the problem is related to loading the start page, then it can be identified here. You can fix problems related to starting page URL and Post data by selecting Configuration menu > Edit > StartURL / PostData.
6. Sometimes web servers may block your IP for continuously scraping their pages. In such cases you should try Scraping via Proxy Servers to avoid detection as well as to continue mining data.
7. You may also try changing the user agent string associated with the mining browser. The user agent string can be set in WebHarvy Browser Settings: enable the custom user agent string option and provide the user agent string of a browser such as Firefox or Edge.
8. Open WebHarvy Settings and click on Advanced Miner Options button. Change the value of Data selection accuracy setting to Low or Medium and apply the change. Try again by creating a new configuration.
9. For scraping data from search result pages, if the results page does not have its own URL, the search needs to be performed as part of the configuration. Start Configuration after loading the search form. Use the page interaction options in Capture window (Input Text, Select Dropdown, Click etc.) to perform search and load the results page, from where you can select the required data.
10. Try by creating a new configuration. Changes in the webpage's layout, design, or internal structure may cause old configuration files to stop working.

3. Empty cells in data table during mining

During mining, some cells in the data table remain blank. This may happen due to the following reasons:

1. Data is actually not present in the page

2. Page load failed / Page was not completely loaded

3. Data is present, but WebHarvy was not able to locate it based on the configuration

Try the following steps to make sure that the page is completely loaded before extraction starts:

1. Increase the 'Script Load Wait Time' in Miner Settings. The default value of this setting is 5 seconds. Increase it to 10, 15 or 20 seconds and see if WebHarvy is able to get results during mining.
2. Soon after starting configuration, click anywhere on the page and select More Options > Page > Scroll Down from the resulting Capture window. Then continue with the configuration. Similarly, after following links from the start page, once the new page is loaded, before selecting any data, click anywhere on the page and select More Options > Page > Scroll Down. This helps to completely load all page elements before extraction is attempted.
3. Decrease the value of Maximum number of parallel mining threads setting in Advanced Miner Options. Set the value of this option to 1 and try mining again.

Sometimes, the location of the text which you need to extract varies slightly from page to page. In such cases, during configuration, if you directly click on the required text and select it, during mining data will not be extracted for some pages where it occurs at a slightly different location.

Try the following to overcome this problem:

1. If the required text always appears after a heading text, then use the Capture following text method to select it during configuration. This method works independent of the location of text.
2. Instead of clicking directly on the required text and selecting it, capture a larger area of text, and select the required portion from it by highlighting or by applying regular expressions.
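
As a rough illustration of the second approach, the Python sketch below pulls a price out of a larger block of captured text with a regular expression, so the match does not depend on where the value sits in the block. The sample text, the "Price:" label and the function name are assumptions made for this example. WebHarvy applies the expression for you in the Capture window; Python is used here only to show how such a pattern behaves.

```python
import re

# Hypothetical pattern: a "Price:" label followed by an optional currency
# symbol and a number. Adjust the label to match the heading on your page.
PRICE_PATTERN = re.compile(r"Price:\s*[$€£]?\s*([\d.,]*\d)")


def extract_price(captured_text):
    """Return the number that follows 'Price:' in the text, or None."""
    match = PRICE_PATTERN.search(captured_text)
    return match.group(1) if match else None


# Two made-up text blocks where the price appears at different positions.
block_a = "Acme Kettle\nPrice: $24.99\nIn stock"
block_b = "Acme Toaster\nFree delivery\nRated 4 stars\nPrice: $1,249.00"

print(extract_price(block_a))
print(extract_price(block_b))
```

Because the expression anchors on the label rather than on a position, both blocks yield the price even though it sits on a different line in each.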

4. Mining stops before completion

Mining stops before the requested/total number of pages are scraped.

This usually happens when WebHarvy is unable to load the next page of data by clicking the next page link selected during configuration. Please try the following.

1. Increase 'Page Load Timeout' and 'Script Load Wait Time' values in Miner Settings so that the page gets enough time to load all data before it is scraped. Increasing these values will slow down mining but will minimize page load time outs.
2. If the website displays separate links to load pages 1, 2, 3 etc., click on the direct link to load page number 2 and set it as the next page link.
3. Try URL based pagination or pagination via JavaScript.
4. When mining aborts before completion, you can click the Start button again (without closing the Miner window) and WebHarvy will try to resume mining from where it stopped.
5. You can also directly change the starting URL of the configuration so that mining starts at a different page (where it stopped) than it was originally configured for.
6. Websites may also block you if you access their pages via software for a long time or extract large amounts of data. The solution here is to scrape via proxy servers or a VPN so that you can remain anonymous and avoid getting blocked by websites. Try using proxy servers with WebHarvy.
7. Try after enabling the Disable cookies while mining option and/or Use separate browser engines for mining links option in Browser Settings.
8. In case you are trying to scrape a relatively large number of records, please refer to 'How to scrape large amounts of data ?'

5. Next page not loading / Data extracted only from first page

During mining data is extracted only from the first page. Mining stops after first page extraction, or more pages are loaded but no more data is extracted.

Try the following:

1. Increase the 'Script Load Wait Time' value in Miner Settings. Default value of this setting is 5 seconds. Increase it to 10, 15 or 20 seconds and try mining again.
2. Soon after starting configuration, click anywhere on the page and select More Options > Page > Scroll Down. Then continue with the configuration. This will make sure that the whole page (including pagination links) is loaded before data extraction begins.
You may also try clicking on the title of the first item on the page and selecting More Options > Scroll List soon after starting configuration.
3. Make sure that the first item which you click after starting configuration does not belong to an advertised or sponsored listing. Always select data from a normal (non ad, non sponsored) listing.
4. Sometimes the first page of listings has a slightly different layout than the rest of the pages. In such cases the first and rest of pages should be scraped separately. Load the second page and then start configuration. See if data from subsequent pages (3,4 etc.) can be extracted during mining stage.
5. Try setting the next page link via both the methods explained (see the images) at Selecting pagination links. The next link can be set either by clicking the next link/arrow or by clicking the direct link to load page number 2. Try both these methods.
6. In case the direct links (URLs) to each page of listings has the page number embedded in it you can try the URL based pagination method.
7. Configure pagination for pages which load more data on the same page when scrolled down (infinite scroll) or when a button/link is clicked (load more).
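
To make the idea behind URL based pagination (item 6) concrete, here is a minimal Python sketch that builds the address of each listings page from a template containing the page number. The example.com URL and the "{page}" placeholder are hypothetical; inspect the address bar on pages 2 and 3 of your target site to see where the page number actually appears.

```python
def page_urls(template, first_page, last_page):
    """Build one URL per page by substituting the page number into template."""
    return [template.format(page=n) for n in range(first_page, last_page + 1)]


# Hypothetical listing URL; "{page}" marks where the page number goes.
template = "https://example.com/listings?page={page}"

for url in page_urls(template, 1, 3):
    print(url)
```

If consecutive pages of the site differ only in such a number, the URL based method is likely to be a good fit.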

6. Data within Frame / Unable to select data during configuration

After starting configuration, Capture window is not displayed when an item (text/image) is clicked.

Update: WebHarvy 5.5 and later versions have a new capture window option to open frames and extract data. Please see the Open Frame option for more details.

This usually happens when the data to be selected is inside a frame. To select data, you will have to find the frame URL and load it directly in WebHarvy. If you have Chrome browser installed then the frame URL can be found as follows.

Load the page in Chrome browser. Right-clicking on the data item which you need to extract should show you the option "View frame source". Clicking on it opens the frame's source code in a new tab, with its URL shown in the address bar. Remove the "view-source:" prefix from the address bar string to get the frame URL.

Load the frame URL directly in WebHarvy and start configuration. You should be able to select data.
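
The prefix removal described above is a simple string operation. The following Python sketch shows it on a made-up address; the function name and the example.com URL are assumptions used only for illustration.

```python
VIEW_SOURCE_PREFIX = "view-source:"


def frame_url_from_address_bar(address):
    """Strip Chrome's 'view-source:' prefix, if present, and return the URL."""
    address = address.strip()
    if address.startswith(VIEW_SOURCE_PREFIX):
        return address[len(VIEW_SOURCE_PREFIX):]
    return address


print(frame_url_from_address_bar("view-source:https://example.com/frame.html"))
```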

7. Scraping data from Table / Grid layout. While scraping items displayed in table/grid layout, only the first column items are selected/extracted.

For example, product listings are often displayed in a grid layout (table - row/column format). While configuring WebHarvy to extract data from such pages when the first product's title (or any other detail) is selected, only products from the first column are automatically identified. Products from the remaining columns are missed.

To solve this please follow the steps below:

1. Open WebHarvy Settings : Home menu > Settings
2. Click on the 'Advanced Miner Options' button
3. In the resulting window adjust the value of the first list option - 'Minimum number of items required in a list'. Select a value which is equal to or less than the number of columns in the table/grid. For example, if products are displayed in 4 columns, this value should be set to '3'.
4. Apply Changes.
5. Now start configuration and select the first product's detail. Details of all remaining products (from all columns) should be automatically selected.

Please make sure that you reset the change done in Step 3 above before mining other websites since this is a global miner setting.

8. Mining does not stop. Last page data is repeatedly extracted.

This usually happens when WebHarvy is unable to detect the end of pagination. This can be avoided by enabling the Automatically remove duplicate records while mining option in Miner Settings. When this option is enabled, mining is automatically stopped when a page full of duplicate entries is encountered.

This can also be prevented by configuring the next page link by clicking on the direct link to load page number 2 (if present), instead of clicking on the 'next' link. This is as shown in the second image displayed at How to select pagination links ?
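
The duplicate-based stop condition described above can be pictured with a short Python sketch. This is only an illustration of the idea (stop once a page contributes no new records), not WebHarvy's actual implementation; the function name and the sample records are made up.

```python
from itertools import chain, repeat


def collect_until_repeated(pages):
    """Collect unique records; stop at the first page that adds nothing new."""
    seen = set()
    results = []
    for page_records in pages:
        added = 0
        for record in page_records:
            if record not in seen:
                seen.add(record)
                results.append(record)
                added += 1
        if added == 0:
            break  # every record on this page was already collected
    return results


first_pages = [[("Kettle", "24.99"), ("Toaster", "39.00")], [("Blender", "59.50")]]
# Simulate a site that keeps serving its last page forever.
endless_pages = chain(first_pages, repeat(first_pages[-1]))

print(collect_until_repeated(endless_pages))
```

Without the duplicate check, the loop over `endless_pages` would never finish, which mirrors the symptom of the last page being extracted repeatedly.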

9. Mining aborts followed by configuration browser / application crash

Mining terminates prematurely followed by the configuration browser going blank. If you try to load any page in the browser after closing the miner window, it will result in application crash.

Please try the following to solve this issue:

1. In Miner Settings > Advanced Miner Options, set value 1 for 'Maximum number of parallel mining threads'. Crashes can occur if you are low on memory (RAM) and try to use multiple mining threads.
2. In Browser Settings, enable the option 'Use separate browser engine for mining links'.
3. Make sure that you have installed the latest version of WebHarvy available for download.
4. Install and run WebHarvy from a user account with administrative privileges, or right-click the application icon and select 'Run as administrator'. In case you face the issue again, you may try running WebHarvy in compatibility mode. Right-click the WebHarvy desktop icon, select Properties, click on the Compatibility tab, tick the 'Run this program in compatibility mode for' checkbox and select a previous version of Windows from the list box below it. You can also tick the 'Run this program as administrator' box.

10. When trying to unlock the trial version of WebHarvy using the license key file the error message "Activation failed due to unknown reason" is displayed

WebHarvy registration involves online activation. So, internet connection is required for successful registration/unlock of trial version of WebHarvy using the license key file. Each WebHarvy license (except site license) has a fixed number of activations. Activations beyond this limit will not be allowed.

Please try the following in case you are unable to perform activation:

1. Please check if your firewall/antivirus is blocking WebHarvy. WebHarvy's configuration browser and miner will work without firewall/antivirus exceptions (since they are browsers), but activation will fail if the connection request to our license server is blocked. So, in case online activation fails please try again after adding rules/exceptions in your firewall/antivirus to allow outgoing connections from WebHarvy.

2. Sometimes your internet provider may be blocking the connection to our license server. In such cases, connecting to the internet using another network and then performing activation will work. For example, you may connect to another Wi-Fi network or your mobile hotspot and then perform activation.

3. You may also try after changing the DNS server addresses which your computer uses. Try to unlock after changing the DNS server addresses to 8.8.8.8 / 8.8.4.4 (Google Public DNS), or to 1.1.1.1 (Cloudflare). You may refer to this article to know how to change DNS settings in Windows.

11. When 'Start-Mine' is clicked, Miner window is not displayed. Taskbar preview shows blank window.

This problem usually happens when WebHarvy is run with multiple monitors attached to your computer/laptop and later you have disconnected the extra monitor. Please try any of the solutions given below:

Solution 1:

1. Hover mouse over WebHarvy application icon (in taskbar) so that you can see small preview of the Miner window. Click on the Miner window so that it becomes the active window.
2. Now hit ALT + Space Bar
3. Press the 'M' key
4. Now you can use the up/down/left/right arrow keys on your keyboard to move the window around until it is back on the current desktop. Once you start using the arrow keys, you can also use the mouse to move the window around.

Solution 2:

Try reconnecting the external / additional monitor and see if the Miner window is visible.

12. 'Follow this link' option is disabled when a link is clicked during configuration

During configuration, if you click on a link and find the Follow this link option disabled, then try applying the Capture More Content option once or twice (or more times as required) without closing the Capture window. The follow option might get enabled.

If the follow option is still disabled, you can try using the Open Popup option.

1. Click on the title/link
2. Select More Options > Open Popup
3. Wait for the details page to load
4. Select required data from the details page
5. Click anywhere on the page and select More Options > Page > Go back

Another option is to select the link/URL from the HTML source of the page using regular expression.

13. Web page not loading. Browser displays 'Checking if site connection is secure' message.

Make the following change in WebHarvy Settings to resolve this problem.

Go to WebHarvy Settings > Browser and select the Enable Custom User Agent String option. Select FireFox from the dropdown menu and apply changes. Restart WebHarvy.

14. Only the first item on the page is selected

Only the first (clicked) item on the page is selected. Subsequent items are not scraped.

Please try the steps given below to solve this problem:

1. Open WebHarvy Settings and click on Advanced Miner Options button. Change the value of Data selection accuracy setting to Low or Medium and apply the change. Try again by creating a new configuration.
2. If you used the Click option to follow links from the starting page, then only the first item's link will be followed. Instead, use the Open Popup option. Once all required data has been selected from the newly opened page, click on the link/button to go back to the listings page and select the Click option. If such a link/button does not exist, click anywhere on the page and select More Options > Page > Go Back to navigate back to the listings page.
3. Make sure that the first item which you click after starting configuration does not belong to an advertised or sponsored listing. Always select data from a normal (non ad, non sponsored) listing.

15. For scroll-to-load / load-more type pagination, data is scraped only from the first page

For pages which load more data on the same page when scrolled down (infinite scroll) or when a load-more button/link is clicked, if only the first page data is scraped during mining, please follow the steps below.

1. Open WebHarvy Settings and click on Advanced Miner Options button. Change the value of Data selection accuracy setting to Low, Minimum number of items required in a list to 2, Number of levels higher in HTML DOM.. to 1 and Apply changes.
2. Load the page containing the items which you need to scrape within WebHarvy. Manually scroll down the page and/or click the load-more link/button, so that multiple pages (more than 3) of listings are loaded.
3. Scroll back up to the first item on the page and Start Configuration.
4. Follow the normal method of configuration and see if the problem is solved.

When initial navigation is required (form submission, button/link clicks) to load the listings page after starting configuration, please follow the steps given below in the exact same order to make sure that pagination is configured correctly.

1. Start configuration
2. Perform initial navigation (input text, select dropdown, click etc.) required to load the data listings page
3. Configure scroll to load or load more data pagination.
4. Start selecting data and continue with the rest of the configuration

16. Page not fully loaded, after following link

During configuration, while following link from the starting page to select more details, the resulting page is not loaded correctly. The whole page is blank or some portions are not loaded.

To solve this problem, soon after following link from the starting page, if the page is not loaded correctly, click anywhere on the page and select More Options > Page > Reload from the Capture window. If sections of the page are loaded only when the user scrolls down, click anywhere on the page and select More Options > Page > Scroll Down from the Capture window.

17. All items or images on the page are not scraped

During mining, not all items displayed on the page are scraped. Only the first few items or images are correctly scraped.

This can happen if the page loads complete data only when the user scrolls down. Either of the following 2 methods will solve this issue.

1. After starting configuration, before selecting any data, click on the title of the first item on page and select More Options > Scroll List
2. After starting configuration, before selecting any data, click anywhere on the page and select More Options > Page > Scroll Down

Also, make sure that the first item which you click after starting configuration does not belong to an advertised or sponsored listing. Always select data from a normal (non ad, non sponsored) listing.

18. Preview pane is not visible

'Captured Data Preview' pane in the main window is not visible

Follow the steps given below to solve this problem.

1. Exit WebHarvy
2. Delete the following file from your computer
- C:\Users\{your user name}\AppData\Roaming\SysNucleus\WebHarvy\{WebHarvy version number}\layout.xml
3. Open WebHarvy
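
If you prefer to script step 2, the Python sketch below builds the path from the pieces named above and deletes the file when it exists. The function names are hypothetical, and the version folder name is left as a parameter because it depends on your installation. Run it only after exiting WebHarvy.

```python
from pathlib import Path


def layout_file(appdata_dir, version):
    """Return the layout.xml path for a given WebHarvy version folder."""
    return Path(appdata_dir) / "SysNucleus" / "WebHarvy" / version / "layout.xml"


def delete_layout_file(appdata_dir, version):
    """Delete layout.xml if it exists; return True when a file was removed."""
    path = layout_file(appdata_dir, version)
    if path.is_file():
        path.unlink()
        return True
    return False
```

On Windows, the APPDATA environment variable normally points to the AppData\Roaming folder of the current user, so a call would pass `os.environ["APPDATA"]` as the first argument and the name of your WebHarvy version folder as the second.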

19. Data selection difficulty due to slow text highlighting

Text highlighting during configuration is very slow, making data selection difficult.

To solve this problem, go to WebHarvy Settings > Browser Settings and select the 'Disable element highlighting' option. Apply changes.

20. Incorrect links/buttons clicked during mining

Clicks are not working correctly during mining. Adjacent links/buttons are clicked instead of the correct one.

To ensure that clicks are accurately directed to the correct button or link, make sure to disable pattern detection before selecting the 'Click' option during configuration. If you need to select a list of items after performing the click, you can re-enable patterns after the click is made.

21. Proxy details provided are not used

Unable to load web page even after enabling proxy in settings.

Open WebHarvy Settings > Browser and click on Delete Cache / Browsing History button. Restart WebHarvy and see if the problem is solved.

22. 'Capture Target URL' option is disabled

When clicking on a link to scrape its URL, the 'Capture Target URL' option in Capture window is disabled (greyed out).

Follow the steps given below to solve this problem.

1. Click on the link/button whose URL you need to scrape
2. Apply Capture More Content option repeatedly till the Follow this link option is enabled
3. Select More Options > Capture HTML option
4. Select More Options > Apply Regular Expression option
5. From the RegEx dropdown, select 'Get link/URL from HTML - href="([^"]*)' and Apply
6. Click the main 'Capture HTML' button to select the URL for extraction.

Note: Even if a relative URL (without https:// starting part) is displayed in the preview, full/absolute URL will be scraped during the mining process.
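
To show what the expression in step 5 matches, the Python sketch below applies the same pattern to a made-up HTML fragment and then resolves the result against the page address, similar in spirit to the note about relative URLs. The HTML, the URLs and the function name are assumptions for this example, and how WebHarvy resolves relative links internally is not documented here.

```python
import re
from urllib.parse import urljoin

# The expression quoted in step 5: capture everything between href=" and the next ".
HREF_PATTERN = re.compile(r'href="([^"]*)')


def link_from_html(html, page_url):
    """Return the first href in html as an absolute URL, or None if absent."""
    match = HREF_PATTERN.search(html)
    if match is None:
        return None
    return urljoin(page_url, match.group(1))


html = '<a class="title" href="/products/kettle?id=42">Acme Kettle</a>'
print(link_from_html(html, "https://example.com/search?q=kettle"))
```

The pattern only recognises links written with double quotes around the href value; markup that uses single quotes would need a different expression.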

23. Incorrect data mined

In the miner window's data table, some cells contain incorrect data. This usually happens when the position/location of the data item changes from page to page.

Try the solutions given below to avoid this problem.

1. During configuration, instead of directly clicking on the item's text, if it appears after a heading text, click on the heading text and use the Capture Following Text option.
2. Before starting configuration, open WebHarvy Settings > Advanced Miner Options and select option Strict for Data Selection Accuracy setting.
3. Select the data from the text or HTML content of the page using Regular Expressions.

24. Loading web page fails with Error Code:-375

This happens when you have enabled proxies within WebHarvy Settings, but WebHarvy is unable to connect to the proxy server.

Try the following to solve this problem.

1. Make sure that the proxy server address, port, type (HTTP, HTTPS etc.), username and password provided are correct. Contact your proxy service provider or refer to their documentation to verify the same.
2. Make sure that your computer's current IP is whitelisted with your proxy service provider. Many proxy providers require that you authenticate your IP before making connection.
3. You may also disable proxies, apply changes and restart WebHarvy to solve this problem.
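
Before changing settings, it can help to confirm that the proxy address and port are reachable from your computer at all. The Python sketch below is a minimal check under that assumption: it only tests whether a TCP connection can be opened and says nothing about credentials or IP whitelisting. The address and port shown are placeholders.

```python
import socket


def proxy_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Placeholder address and port; substitute the values from your proxy provider.
print(proxy_reachable("127.0.0.1", 8080, timeout=2.0))
```

A result of False points to a wrong address or port, a firewall, or a proxy server that is down. A result of True means the remaining checks are the credentials and the whitelisted IP.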

26. Configuration not working after a few days

Configuration fails to scrape data after a few days. Creating a new configuration will solve the issue.

This usually happens when the target website internals change slightly, periodically. Please try the following solution.

Open WebHarvy Settings and click on Advanced Miner Options button. Change the value of Data selection accuracy setting to Low and apply the change. Create a new configuration.
