- The Evaluation version of WebHarvy supports only 2 keywords when scraping data with this feature.
The Keyword Scraping feature of WebHarvy lets you submit a list of keywords to a web page and scrape the data displayed after each keyword is submitted. Let us follow an example.
Suppose we need to scrape the search results for the following keywords at a yellow pages website: Accountant, Lawyer, Plumber and Doctor. First, navigate to the search/home page of the website. Place the cursor in the search box and click the Input a list of keywords button from the Actions menu as shown below.
In the resulting window, enter the keywords. You may type the keywords one per line or copy and paste a keyword list in CSV format. You can also import keywords directly from a CSV file (or a file with one keyword per line) by clicking the import button at the top right of the window.
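If you prepare keyword lists outside WebHarvy, it can help to normalize them first. The following is a minimal Python sketch, not part of WebHarvy, that accepts either of the two formats described above (one keyword per line or comma-separated) and returns a clean list. The function name `parse_keywords` is hypothetical, and the way WebHarvy itself parses pasted text may differ.

```python
import csv
import io


def parse_keywords(text: str) -> list[str]:
    """Split text into keywords, accepting one-per-line or comma-separated input.

    Illustrative helper only; this is an assumption about a reasonable
    normalization, not WebHarvy's own parsing logic.
    """
    keywords = []
    # csv.reader treats each line as a row and each comma-separated value as a cell.
    for row in csv.reader(io.StringIO(text)):
        for cell in row:
            cell = cell.strip()
            if cell:  # skip blank lines and empty cells
                keywords.append(cell)
    return keywords


# Both input styles produce the same list of four keywords.
print(parse_keywords("Accountant\nLawyer\nPlumber\nDoctor"))
print(parse_keywords("Accountant, Lawyer, Plumber, Doctor"))
```

Writing the result back out with one keyword per line gives a file in the one-keyword-per-line format mentioned above.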
Click the OK button and WebHarvy will automatically fill the selected input (search) box with the first keyword in the list. Do not change this value. If WebHarvy does not automatically fill the input (search) box with the first keyword, enter the first keyword manually, exactly as it appears in the list (it is case sensitive).
You may fill multiple input fields with keywords by following the above method. For example, separate keyword lists can be provided for the search term and the location.
Once all keyword lists are configured, you may manually fill additional search / form parameters and click the 'Search'/Form submit button.
Once the page displaying the search results is loaded, click the Configuration - Start button in the Home menu and start selecting the data to be scraped. Click the Configuration - Stop button in the Home menu when you have finished selecting data. Click the Start-Mine button to start mining data. While mining, the configuration will be repeated for all specified keywords.
Adding keywords after starting configuration
If for some reason you are unable to configure Keywords as explained above, you can add them after starting configuration. For this, after starting configuration, click on the 'Keywords' button in the 'Configuration' tab.
In the resulting Keywords window, you can type in the keywords one per line or comma-separated. However, the first keyword you enter must be present 'as it is' in either the start URL of the configuration or the Post Data.
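The requirement on the first keyword can be checked before you open the Keywords window. The sketch below is a hedged illustration in Python, not a WebHarvy API: the function name `first_keyword_is_present`, the example URL, and the example Post Data are all hypothetical. It assumes 'as it is' means an exact, case-sensitive match, as stated for the input box earlier on this page.

```python
def first_keyword_is_present(keyword: str, start_url: str, post_data: str = "") -> bool:
    """Return True if the keyword appears verbatim in the start URL or the Post Data.

    Assumption: 'as it is' means an exact, case-sensitive substring match.
    """
    return keyword in start_url or keyword in post_data


# Hypothetical start URL and Post Data, for illustration only.
start_url = "https://example.com/search?term=Accountant"
post_data = "term=Accountant&location=Boston"

print(first_keyword_is_present("Accountant", start_url))      # exact match in the URL
print(first_keyword_is_present("accountant", start_url))      # different case, no match
print(first_keyword_is_present("Boston", start_url, post_data))  # found in Post Data only
```

If the check fails, compare the spelling and letter case of the first keyword against the start URL or Post Data of your configuration.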