WebHarvy can be used to web scrape data from job listing websites like Dice, Indeed, Monster etc. With an easy to use, point and click user interface, you can select most of the data which you need to scrape by just clicking over them during configuration.
The following video shows how WebHarvy can be used to scrape Dice.com job listings. Details like Job title, description, application link, company, contact details of job poster, qualifications required etc. can be scraped from Dice job listings.
Steps to Scrape Dice Job Listings Data
- Download and install WebHarvy in your computer
- Open WebHarvy and load Dice.com. Navigate to the job listings page from which you need to scrape data.
- Start Configuration
- Click over any text which you need to scrape. WebHarvy displays a Capture window with various options. Select the Capture Text option to select the text of the item to scrape. Details like job title, short description, job details page URL etc. can be selected in this method
- Scroll down to the bottom of the page and configure pagination. Click on the link to load the next page of listings and set it as the next page link.
- Scroll back up and click on the title/link of the first job listing. Select Follow this link option from the Capture window. Wait for the details page to load.
- Click and select more details from the job details page.
- The Capture following text option should be used whenever the required data is guaranteed to occur after a heading text (ex: Job Description)
- Stop Configuration
- Save Configuration (optional)
- Start Mine
- Once mining finishes, the scraped data can be saved to a file or to a database.