Scraping data from paginasamarillas.es | extracción paginas amarillas

In this article we will see how WebHarvy can be used to extract data from the Spanish Yellow Pages website, paginasamarillas.es. Paginas Amarillas Data Extraction: WebHarvy can extract data like business name, address, website, email and phone numbers from paginasamarillas.es listings. The following video shows how this can be done. Most of the details except email … Read more

WebHarvy 5.5.1.170 (Minor Update)

WebHarvy 5.5.1.170 brings an important bug fix and a few other improvements. Bug fix: Sometimes, during configuration, while selecting data from the starting page (where there are multiple listings), the preview gets updated with only a single item, giving the impression that pattern detection failed. This issue is present in the last 2 versions of WebHarvy, … Read more

WebHarvy 5.5 (Custom User Agent String, Handles frames, better form submission/navigation)

The following are the main changes (features/improvements) in WebHarvy 5.5. 1. Custom User Agent String: If you go to WebHarvy Settings > Browser tab, you can enable a custom user agent string as shown below. The ‘Enable custom user agent string’ option allows you to specify a user agent string which the WebHarvy configuration and mining browsers will use. This option … Read more

WebHarvy 5.4 (Auto delete cookies, Load more data using JS)

What is new in WebHarvy version 5.4? Automatically delete cookies while mining: Websites can get details regarding your previous visits using cookies stored locally by the browser. A new Browser Settings option has been added to prevent this. WebHarvy will periodically delete browser cookies during mining when this option is enabled. New pagination method … Read more

A minor update to fix crashes reported with latest Windows updates

You may be aware that the latest updates (1809 and its re-release) released by Microsoft for Windows 10 caused issues for many users. A few of our customers reported application crashes while trying to start WebHarvy with these updates installed. We have solved this issue in the latest update (5.3.0.161) of WebHarvy, which you may download … Read more

How to web scrape after translating a page to another language?

You can use Google Translate to translate a web page to another language and then use WebHarvy to scrape the translated content. For this, you will first need to translate the page using Google Translate’s web interface, find the translated page frame URL and then load it within WebHarvy. The video below demonstrates the process. … Read more

WebHarvy 5.3 (Parallel mining, Chrome developer tools)

‘How to increase mining speed?’ was one of the questions most commonly asked by our users. With previous versions, the main limitation was that when links had to be followed from the starting page to get each listing's details, the miner took more time to scrape a page full of listings. This is because … Read more