WebHarvy can extract data like business name, address, website, email and phone numbers from paginasamarillas.es listings. The following video shows how this can be done. Most of the details, except the email address (which is not directly displayed by the website), can be selected directly by clicking on them during configuration. The email address can be selected from the HTML source of the business details page by applying regular expressions. The Regular Expression string used to extract the email address is copied below.
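The general idea can be sketched in Python. This is illustrative only: the pattern below is a generic email matcher, not the actual expression from the video, and the sample HTML is a made-up snippet standing in for a business details page.

```python
import re

# Generic email pattern (an assumption, not the expression used in the video).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_emails(html: str) -> list[str]:
    """Return all email-like strings found in the HTML source."""
    return EMAIL_RE.findall(html)

# Hypothetical snippet resembling a business details page:
sample = '<a href="mailto:info@example.es">Contacto</a>'
print(extract_emails(sample))  # ['info@example.es']
```

WebHarvy's Apply Regular Expression option does the equivalent of this on the captured HTML, without any code being written.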
To know more we highly recommend that you download and try the free evaluation version of WebHarvy. To get started, please follow the link below.
A special technique is employed to extract data correctly and consistently from yellowpages.com.au listings. This is mainly because the layout of the listing boxes varies from one listing to another – some have a header with a logo/image, while others do not.
The regular expression strings used in the video to extract email, phone, website and address are given below.
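To illustrate the kind of expressions involved (these are assumptions for the sketch, not the actual strings from the video): listing pages of this kind often embed the phone number in a `tel:` link and the website in an outbound `href`, so anchored patterns like the following are a common starting point.

```python
import re

# Hypothetical patterns, not the exact expressions used in the video.
PHONE_RE = re.compile(r'href="tel:([\d+ ()-]+)"')
WEBSITE_RE = re.compile(r'href="(https?://[^"]+)"')

# Made-up snippet resembling a listing box:
sample = '<a href="tel:(02) 9999 1234">Call</a> <a href="https://example.com.au">Website</a>'
print(PHONE_RE.findall(sample))    # ['(02) 9999 1234']
print(WEBSITE_RE.findall(sample))  # ['https://example.com.au']
```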
WebHarvy can be used to easily scrape data from real estate websites like Zillow, Realtor, Trulia, RedFin etc. In this article we will see how real estate data including agent/owner contact details (phone numbers) can be extracted using WebHarvy.
Scraping Real Estate Data from Zillow
The following video shows the steps involved. You can see that data like property address, price, zestimate, beds/baths, area, property facts and features (like type, year built, parking etc.), pricing history, tax history, neighborhood details etc. can be easily selected for extraction using a point and click interface. WebHarvy will automatically scrape the data which you select from multiple properties listed across multiple pages in Zillow.
Scraping agent phone numbers from Zillow
The following video shows how agent phone numbers can be scraped from Zillow property listings. The ‘contact agent’ button needs to be clicked on each property details page to get the agent contact details.
We recommend that you download and try the free evaluation version of WebHarvy available on our website and avail of our free technical assistance for your first data scraping project. To get started, please follow the link below.
The following video shows how match statistics (possession, goal attempts, shots on goal, blocked shots, corners, offsides etc.) of all matches in a league can be extracted from the FlashScore website using WebHarvy.
In addition to FlashScore, WebHarvy can also be used to extract sports betting odds from many other betting sites like BetExplorer, OddsPortal etc.
Yellow Pages business listings often display the location (Map Direction) of the business. The location details are displayed on a map interface, but the latitude and longitude values (GPS coordinates) are not displayed on the page. However, this information is present inside the HTML code behind the map interface.
Extracting latitude, longitude values
The Capture HTML feature, along with the Apply RegEx feature of WebHarvy, can be used to extract the map coordinates from the HTML code of the page. The following video shows how this can be done. The Regular Expression strings used in the video are copied below.
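As a sketch of the technique: map widgets typically carry the coordinates in an attribute or script variable. Assuming markup like `data-latitude="40.4168" data-longitude="-3.7038"` (a hypothetical layout, not necessarily what Yellow Pages uses), two expressions similar in spirit to the ones in the video could look like this.

```python
import re

# Hypothetical patterns for a made-up map markup; the real page's
# attribute names may differ.
LAT_RE = re.compile(r'data-latitude="(-?\d+\.\d+)"')
LNG_RE = re.compile(r'data-longitude="(-?\d+\.\d+)"')

html = '<div class="map" data-latitude="40.4168" data-longitude="-3.7038"></div>'
lat = LAT_RE.search(html).group(1)
lng = LNG_RE.search(html).group(1)
print(lat, lng)  # 40.4168 -3.7038
```

In WebHarvy, the Capture HTML step supplies the `html` string and the Apply RegEx step plays the role of `search`.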
We recommend that you download and try the free evaluation version of WebHarvy available on our website. To get started, please follow the link below.
WebHarvy is a visual web scraping software which can be easily configured to extract data from any website including Yellow Pages. There are various flavors of Yellow Pages websites. In this article we are focusing on www.yellowpages.com (US) website.
YellowPages.com data extraction
The YP website is the go-to place for contact details related to any business, and for the same reason it is one of the greatest sources of business/professional contact details. The following video shows how easy it is to use WebHarvy to extract details like phone number, website address, street address etc. from Yellow Pages listings.
Keyword based scraping
The video also shows how you can automatically submit multiple search keywords on the Yellow Pages website and scrape the resulting data. This feature is called Keyword based Scraping and is explained in the following link.
Multiple lists of keywords can be provided (ex: one list for search and another for location) and WebHarvy will automatically submit all combinations of input keyword lists and scrape the resulting data.
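The "all combinations" behaviour described above can be sketched with Python's `itertools.product`, using hypothetical keyword lists (one for the search box, one for the location field).

```python
from itertools import product

# Hypothetical input lists; WebHarvy would submit each pair to the
# site's search form, then scrape the results for that pair.
searches = ["plumber", "electrician"]
locations = ["New York, NY", "Austin, TX"]

pairs = list(product(searches, locations))
for search, location in pairs:
    print(f"{search} in {location}")
# 2 searches × 2 locations = 4 submissions in total
```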
We recommend that you download and try using the free evaluation version of WebHarvy to know more. Please follow the link below to get started.
The Keyword Scraping feature of WebHarvy lets you submit a list of keywords (search terms, ASIN, ISBN etc.) at Amazon and extract the resulting data displayed. WebHarvy supports submitting multiple lists of keywords to multiple search fields (ex: search query + location) on a website and scraping the results for all combinations of submitted keywords. To know more, please follow the link given below.
The following video shows how this feature can be used to extract data from Amazon for a list of ISBN numbers. Details like book title, author, reviews, publisher, cover image etc. can be extracted. The same technique can be used to extract product data corresponding to a list of ASINs.
To any WebHarvy configuration (built to extract data from a page / website), you can add additional URLs as explained here. This can be done while creating the configuration, or while editing it later.
The following video shows how a list of Amazon product page URLs can be scraped using WebHarvy.
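A minimal sketch of the idea: the same extraction rules are applied to every URL in the list. Everything here is a placeholder – the URLs are invented, `fetch()` stands in for WebHarvy's built-in page loading, and the title pattern is an assumption about the page markup.

```python
import re

def fetch(url: str) -> str:
    # Stand-in for an HTTP request; returns canned HTML for this sketch.
    return f'<html><span id="productTitle">Item for {url}</span></html>'

# Hypothetical pattern for a product title in the page HTML.
TITLE_RE = re.compile(r'id="productTitle">([^<]+)<')

urls = [
    "https://www.amazon.com/dp/B000000001",  # placeholder product URLs
    "https://www.amazon.com/dp/B000000002",
]

# The same rule (TITLE_RE) is applied to every URL in the list.
for url in urls:
    title = TITLE_RE.search(fetch(url)).group(1)
    print(title)
```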
The following video shows the steps which you need to follow to configure WebHarvy for Amazon Image Extraction. The video shows how the default displayed image, multiple 500×500 higher-resolution images and the highest-resolution 1000×1000 images can be extracted.
The Regular Expression strings used in the video are:
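Without reproducing the exact expressions from the video, the underlying idea can be sketched as follows. Amazon image URLs commonly encode the size in a token such as `._SL500_.`, so rewriting that token is a typical way to request a higher-resolution variant; the pattern and URL below are illustrative assumptions.

```python
import re

# Hypothetical size token pattern (e.g. ._SL500_. or ._SX300_.).
SIZE_RE = re.compile(r"\._S[XL]\d+_\.")

def upscale(url: str, size: int = 1000) -> str:
    """Replace the size token in an image URL with the requested resolution."""
    return SIZE_RE.sub(f"._SL{size}_.", url)

# Made-up image URL following the common Amazon naming scheme:
url = "https://m.media-amazon.com/images/I/41abcDEF._SL500_.jpg"
print(upscale(url))  # https://m.media-amazon.com/images/I/41abcDEF._SL1000_.jpg
```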
WebHarvy is a visual (point and click) data extraction software which can be easily configured to extract data from any website. This article explains how WebHarvy can be configured to extract property details from redfin.com which is a real estate website.
Apart from scraping data from redfin, WebHarvy can also be used to extract data from property listing sites like Zillow, Trulia, Realtor etc.
The following video shows how WebHarvy can be used to easily extract property details from redfin.com. Details like property price, address, area, built date, features, property history etc. are selected via the intuitive point and click interface. WebHarvy can follow each property link to extract additional data, as well as automatically load and extract this data from multiple pages of listings.
As you can see, most of the details are selected by directly clicking on them. There is no complex configuration process or code/script to write. To know more and to familiarize yourself with how WebHarvy can be used to extract data from websites, please follow the link below.