Scrape HTML

WebHarvy allows you to scrape the HTML of page content in addition to plain text. In the Capture window, click the ‘More Options’ button and select the ‘Capture HTML’ option to scrape the HTML of the selected content.
To capture only a portion of the displayed HTML, you may select and highlight the required portion before clicking the Capture button.
Regular Expressions are usually applied to the captured HTML source to extract the data of interest, such as an image URL or a hidden field like a phone number.
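To illustrate the idea outside of WebHarvy, here is a minimal Python sketch of applying Regular Expressions to a captured HTML fragment. The HTML snippet, the field name "phone" and the patterns are all made up for this example, and the pattern syntax WebHarvy accepts may differ from Python's, so treat this as an illustration of the technique rather than something to paste into WebHarvy as-is.

```python
import re

# Hypothetical HTML fragment, similar to what 'Capture HTML' might return.
html = '''<div class="listing">
  <img class="photo" src="https://example.com/images/item-42.jpg" alt="Item">
  <input type="hidden" name="phone" value="555-0100">
</div>'''

# Capture the value of the src attribute of the first <img> tag.
image_match = re.search(r'<img[^>]*\bsrc="([^"]+)"', html)

# Capture the value attribute of the hidden input named "phone"
# (assumes the name attribute comes before value, as in the fragment above).
phone_match = re.search(r'name="phone"[^>]*\bvalue="([^"]+)"', html)

# The part inside the parentheses (group 1) is the data of interest.
image_url = image_match.group(1) if image_match else None
phone = phone_match.group(1) if phone_match else None

print(image_url)
print(phone)
```

In both patterns the parentheses mark the portion to keep, while the rest of the pattern only locates it within the surrounding markup.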
The following video shows how the ‘Capture HTML’ option is used along with Regular Expressions to correctly extract the product price.
http://www.youtube.com/watch?v=tAMvnQT2-kg
Try out the free evaluation copy of WebHarvy from https://www.webharvy.com/download.html.
