One of the main problems you may face when extracting data from websites with automated techniques is the web server blocking your computer's IP address, which prevents you from loading its pages. Many websites have mechanisms to detect automated scraping software and block the IP addresses of the computers running it. In addition, you may not want to reveal your identity (network details) to remote web servers while scraping data.
The best way to avoid being blocked and to protect your privacy is to use proxy servers or a VPN while scraping data. Both help you remain anonymous and reduce the chance of getting blocked, and both can be easily set up with WebHarvy.
How to avoid getting blocked?
1. You may use the 'Inject pauses during mining' feature to avoid making continuous page requests to web servers over a long period. Although this reduces the chances of being detected and blocked by web servers, it is not always effective, and your identity is still not hidden from the web server.
2. Select the 'Disable cookies while mining' option in Browser Settings. Websites can get details regarding your previous visits using cookies stored locally by the browser. WebHarvy will periodically delete browser cookies during mining when this option is enabled.
3. The 'Scrape via Proxy Server' feature allows you to access and scrape websites through proxy servers, thereby maintaining anonymity while scraping data.
You may also use a VPN instead of proxies to anonymously scrape websites.
To configure the 'Scrape via Proxy Server' feature, click the 'Settings' option from the Edit menu and select the 'Proxy Settings' tab. You may provide a single proxy address or a list of proxy addresses.
Either a single proxy server or a list of proxy servers can be used for web scraping. If you select the 'Rotate proxies' option, WebHarvy automatically rotates through the list, switching to the next proxy server periodically. Otherwise, only the first proxy in the list is used.
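To make the difference between the two modes concrete, the following is a minimal Python sketch of round-robin proxy rotation. It is only an illustration of the general idea, not WebHarvy's internal implementation. The function name `make_proxy_picker`, the `requests_per_proxy` parameter and the proxy addresses (taken from an address range reserved for documentation) are all assumptions made for this example. How often WebHarvy actually switches proxies is not specified here.

```python
from itertools import cycle


def make_proxy_picker(proxies, rotate=True, requests_per_proxy=1):
    """Return a function that gives the proxy to use for the next request.

    Hypothetical helper for illustration only. `requests_per_proxy` is an
    assumed stand-in for "periodically": the proxy changes after that many
    requests.
    """
    if not proxies:
        raise ValueError("proxy list is empty")

    if not rotate:
        # Without rotation, only the first proxy in the list is ever used.
        return lambda: proxies[0]

    pool = cycle(proxies)  # endless round-robin over the list
    state = {"current": next(pool), "used": 0}

    def pick():
        # Move to the next proxy once the current one has served its share.
        if state["used"] >= requests_per_proxy:
            state["current"] = next(pool)
            state["used"] = 0
        state["used"] += 1
        return state["current"]

    return pick


if __name__ == "__main__":
    # Placeholder addresses from the 203.0.113.0/24 documentation range.
    proxy_list = ["203.0.113.10:8080", "203.0.113.11:8080", "203.0.113.12:8080"]
    pick = make_proxy_picker(proxy_list, rotate=True, requests_per_proxy=2)
    for page in range(1, 8):
        print(f"page {page} -> {pick()}")
```

With rotation enabled, requests are spread across all proxies in the list, so no single IP address sends a long run of consecutive requests to the target website. With rotation disabled, every request goes through the first entry, which behaves the same as configuring a single proxy.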
How to obtain proxy server addresses?
Both free and paid proxy servers are available on the internet. You can find them with a Google search.
Free proxies are often slow and unreliable, and may cause the mining process to terminate early. For this reason, we do not recommend using free proxies with WebHarvy.
You can choose any proxy or VPN service to perform web scraping anonymously. We highly recommend that you make use of the free trial offered by most services before purchasing one, to verify that the service (proxy or VPN) works well with the websites from which you intend to extract data.
You can follow the link below to see some of the proxy services which we have tested and which we recommend using along with WebHarvy for anonymous web scraping.
Please contact our support in case you need assistance or have any questions.