Steps to scrape email addresses
1. Open WebHarvy and load the page from which you want to scrape email addresses.
2. Start Configuration. (You may already be in configuration mode, having navigated to the page that contains the email addresses by following links from the starting page.)
3. Click anywhere on the page to bring up the Capture window.
4. Double-click the Capture HTML toolbar icon in the Capture window. This selects the entire HTML content of the page and displays it in the preview area of the Capture window.
5. Select More Options > Apply Regular Expression (or click the Apply RegEx toolbar icon).
6. In the window that opens, expand the drop-down list and select the regular expression for capturing email addresses.
7. Select the ‘Match Multiple Times’ option if you want to scrape all email addresses on the page, not just one.
8. Click the Apply button.
9. If the page contains email addresses, they will be selected and displayed in the preview area of the Capture window.
10. Click the main Capture HTML button to scrape the selected email addresses.
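The regular-expression step above can be approximated outside WebHarvy with a few lines of Python. This is a minimal sketch, not WebHarvy's implementation: the pattern `EMAIL_RE` and the helper `extract_emails` are hypothetical, and the `match_multiple` flag only mimics the ‘Match Multiple Times’ option by returning every match instead of just the first.

```python
import re

# Hypothetical email pattern for illustration only; it is an assumption,
# not necessarily the expression WebHarvy offers in its drop-down list.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str, match_multiple: bool = True) -> list[str]:
    """Apply the email pattern to raw page HTML (hypothetical helper).

    match_multiple=True returns every match, similar to 'Match Multiple Times';
    match_multiple=False returns only the first match, if there is one.
    """
    if match_multiple:
        return EMAIL_RE.findall(html)
    first = EMAIL_RE.search(html)
    return [first.group(0)] if first else []


page_html = """
<div class="contact">
  <a href="mailto:sales@example.com">sales@example.com</a>
  <p>Support: support@example.org</p>
</div>
"""

print(extract_emails(page_html))
# ['sales@example.com', 'sales@example.com', 'support@example.org']
print(extract_emails(page_html, match_multiple=False))
# ['sales@example.com']
```

Because the pattern runs over the full page HTML rather than the visible text, an address that appears both in a `mailto:` link and in the link text is matched twice. If duplicates are unwanted, `list(dict.fromkeys(...))` removes them while keeping the original order.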
To scrape phone numbers, use any of the following RegEx strings in Step 6 above.