Scrape via Proxy Server
You can set up WebHarvy to scrape websites via proxy servers. Scraping through a proxy server hides your IP address from the target website, which helps you maintain a level of anonymity while extracting data. To edit the proxy settings, click the Settings option in the Edit menu (Edit menu > Settings) and select the 'Proxy Settings' tab.
To add a proxy, enter the proxy server details in the 'Add proxy' box and click the '+' button. Proxy details include the proxy server address, port, and authentication details (if required). You may add multiple proxy servers to the 'Proxy List'.
- You can use either a single proxy server or a list of proxy servers for web scraping. If you select the 'Rotate proxies' option, WebHarvy will automatically rotate through the list and use each proxy server periodically. Otherwise, only the first proxy in the list is used.
- Paid proxy servers are recommended, since free/open proxy servers are often very slow and unreliable and can cause the mining process to terminate early.
- To import a list of proxy addresses from a file (CSV or text), click the Import button. Each line of the file describes one proxy server in the following format :-
proxy-address:port username password
As shown above, a blank space must separate the proxy address, username, and password. The username and password fields are optional; if they are absent, the line contains only the proxy address and port. Instead of placing each proxy server on its own line, you may also separate proxy server entries with commas (,) or semicolons (;).
Example Proxy List File :-
http://126.96.36.199:8080 w343sa pwd123
- In the above example, the first proxy has login credentials (username and password), while the last two are open.
- Read Article: Anonymously Scrape Data
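The proxy list file format described above can be checked before importing with a short script. The following is a minimal Python sketch, not part of WebHarvy: the names ProxyEntry and parse_proxy_list are hypothetical, the sample addresses are reserved documentation addresses, and it assumes an entry carries either both a username and a password or neither. The round-robin loop at the end only illustrates the general idea of rotating through a list; how WebHarvy schedules rotation internally is not described here.

```python
import re
from itertools import cycle
from typing import List, NamedTuple, Optional


class ProxyEntry(NamedTuple):
    """One proxy list entry (hypothetical helper type, not a WebHarvy API)."""
    address: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy_list(text: str) -> List[ProxyEntry]:
    """Parse entries of the form 'proxy-address:port username password'.

    Entries may be separated by newlines, commas, or semicolons.
    Assumption: username and password are given together or not at all.
    """
    entries = []
    for raw in re.split(r"[\n,;]+", text):
        fields = raw.split()  # fields within an entry are space-separated
        if not fields:
            continue  # skip blank lines and empty segments
        if len(fields) not in (1, 3):
            raise ValueError(f"expected 1 or 3 fields, got: {raw.strip()!r}")
        # Split on the last colon so a scheme such as 'http://' stays in the address.
        address, sep, port = fields[0].rpartition(":")
        if not sep or not port.isdigit():
            raise ValueError(f"missing or invalid port in: {fields[0]!r}")
        username, password = (fields[1], fields[2]) if len(fields) == 3 else (None, None)
        entries.append(ProxyEntry(address, int(port), username, password))
    return entries


# Sample data using reserved documentation addresses (192.0.2.0/24).
sample = "http://192.0.2.10:8080 user1 secret1\n192.0.2.11:3128; 192.0.2.12:8000"
proxies = parse_proxy_list(sample)

# Simple round-robin rotation over the parsed list (illustrative only).
rotation = cycle(proxies)
for _ in range(4):
    proxy = next(rotation)
    print(f"{proxy.address}:{proxy.port}", "(with login)" if proxy.username else "(open)")
```

Running the script prints one line per proxy in turn and wraps back to the first entry after the last, which makes it easy to spot a malformed entry before importing the file.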