Follow Links and Scrape Data

WebHarvy can automatically follow links in web pages and capture data from the resulting pages.

  1. 'Follow this link' option
  2. 'Click' option
  3. Follow URLs present in HTML

  • 'Follow this link' option

    To gather more detailed data by following a link within the page, click on the link. In the resulting Capture window, click the 'Follow this link' button.

    Tip: If the 'Follow this link' option is disabled when you click on a link, try applying the 'Capture More Content' option once or twice.

    When the 'Follow this link' button is clicked, WebHarvy navigates to the page that the link points to. Once the new page has loaded, you can select more data items to scrape by clicking on them.

  • 'Click' option

    If the 'Follow this link' option is disabled, you can use the 'Click' option listed under 'More Options' to click a link, load the linked page and then extract data. This is useful if you need to navigate links within product details pages, or select tabs within a details page, before extracting data.

    Please make sure that you use the 'Click' option only when the 'Follow this link' option is disabled.

  • Follow URLs present in HTML

    WebHarvy can be configured to follow links (URLs, absolute as well as relative) present in the HTML code of the selected content. This option can be used when the 'Follow this link' or 'Click' options are not enabled, or do not result in loading the required page. It is particularly useful when trying to capture data from popups.

    During configuration, click on the link, image, button or other element in which the URL is embedded. In the resulting Capture window, click 'More Options' and select the 'Capture HTML' option to display the HTML code in the preview area. You may sometimes need to apply the 'Capture More Content' option before selecting the 'Capture HTML' option, to make sure that the HTML displayed in the preview area contains the URL to be opened.

    WebHarvy - Capture HTML

    Once the HTML containing the URL is displayed, click 'More Options' and select the 'Apply Regular Expression' option to capture the URL.

    WebHarvy - Apply Regular Expression

    Using a suitable RegEx string, capture the URL from the HTML code displayed in the preview.
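
    A RegEx string can be tried out against a copy of the HTML before it is entered in WebHarvy. The following is a minimal Python sketch of that idea, not part of WebHarvy itself. The HTML snippet and the pattern are hypothetical examples, and the sketch assumes that the text matched by the parenthesized group is what gets captured. WebHarvy's RegEx engine may differ from Python's in some details, so treat the result as a first check only.

```python
import re

# Hypothetical HTML, similar to what the preview area might show
# after the 'Capture HTML' option is applied.
html = (
    '<a class="details" href="/products/item?id=42" '
    'onclick="openPopup()">View details</a>'
)

# Candidate RegEx string. The parentheses mark the part to keep:
# everything between the quotes that follow href=
pattern = r'href="([^"]*)"'

match = re.search(pattern, html)
# group(1) is the text matched inside the parentheses, or None if no match.
captured_url = match.group(1) if match else None

print(captured_url)  # /products/item?id=42
```

    If the printed value is empty or contains extra markup, adjust the pattern and run the check again.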

    WebHarvy - Apply Regular Expression

    Once the URL (relative or absolute) is captured and displayed in the preview area, the 'Follow this link' option will be enabled. Click the 'Follow this link' option to load the URL. If the 'Follow this link' option is disabled, click 'More Options' and select the 'Click' option.
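
    The difference between a relative and an absolute URL can be illustrated with standard URL resolution rules. The Python sketch below uses the standard library function urljoin. The page address and captured values are hypothetical, and the sketch does not describe WebHarvy's internal behaviour; it only shows which full address each kind of captured URL would normally refer to.

```python
from urllib.parse import urljoin

# Hypothetical address of the page on which the URL was captured.
PAGE_URL = "https://www.example.com/catalog/list.html"


def resolve_url(page_url: str, captured: str) -> str:
    """Return the full address that a captured URL refers to."""
    return urljoin(page_url, captured)


# Root-relative URL: resolved against the site root.
print(resolve_url(PAGE_URL, "/products/item?id=42"))
# https://www.example.com/products/item?id=42

# Path-relative URL: resolved against the folder of the current page.
print(resolve_url(PAGE_URL, "item.html?id=42"))
# https://www.example.com/catalog/item.html?id=42

# Absolute URL: returned unchanged.
print(resolve_url(PAGE_URL, "https://shop.example.org/item/42"))
# https://shop.example.org/item/42
```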

    WebHarvy - Follow link in HTML