Scraping data from yellow pages using WebHarvy
In this article we will see how WebHarvy can be easily configured to scrape data from Yellow Pages websites. Being a generic web scraping software, WebHarvy can be configured to extract data from any website as per your requirement.
- 1. Basic Extraction
- 2. YP websites which load more listings in the same page
- 3. Scraping yellowpages.com.au
- 4. Extracting location - Geo Coordinates (latitude/longitude)
- 5. Extract data for multiple search keywords and locations
- 6. Scraping hidden details like phone numbers & emails
Various country-specific flavors of Yellow Pages websites are supported, and support is not limited to any fixed list of YP websites.
In general, the following details can be extracted from yellow pages listings:
- Business Name
- Street Address
- State, Postal Code
- Phone Number
- Map Coordinates
- Business Description
Mined data can be exported (saved) to a spreadsheet or to a database. There is no limit to the number of contact details that can be extracted. Listings that span multiple pages can be easily extracted.
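To illustrate what a spreadsheet export of such listings looks like, here is a minimal Python sketch. It is not WebHarvy's own export code. The `listings_to_csv` helper, the `FIELDS` list and the sample records are all hypothetical and only mirror some of the details listed above.

```python
import csv
import io

# Hypothetical column set, mirroring some of the details listed above.
FIELDS = ["Business Name", "Street Address", "State, Postal Code", "Phone Number"]

def listings_to_csv(listings):
    """Serialize a list of listing dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in listings:
        # Fields missing from a listing are written as empty cells.
        writer.writerow({name: row.get(name, "") for name in FIELDS})
    return buffer.getvalue()

# Made-up sample records, used only for illustration.
sample = [
    {"Business Name": "Example Plumbing", "Street Address": "1 Main St",
     "State, Postal Code": "CA, 90001", "Phone Number": "555-0100"},
    {"Business Name": "Sample Bakery", "Phone Number": "555-0101"},
]
csv_text = listings_to_csv(sample)
print(csv_text)
```

The resulting text can be saved to a `.csv` file and opened in any spreadsheet application. Values that contain a comma are quoted automatically by the `csv` module.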
Basic Extraction
Please watch the following video which shows how WebHarvy can be easily configured to extract basic details from a yellowpages.com listing.
YP websites which load more listings in the same page
The following video demonstration shows how data can be scraped from multiple pages of yellow pages websites that have a 'Load more content' type link/button to load more listings in the same page (instead of a 'next page' link).
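The general idea behind 'Load more content' pagination can be sketched in Python as a loop that keeps requesting the next batch until the control is no longer offered. This is only an illustration, not how WebHarvy is implemented. `FAKE_BATCHES`, `fetch_batch` and `collect_all_listings` are hypothetical names, and the in-memory data stands in for a real website.

```python
# Hypothetical in-memory stand-in for a website: each entry holds one batch of
# listings and whether a 'Load more' control is still present afterwards.
FAKE_BATCHES = [
    (["Listing A", "Listing B"], True),
    (["Listing C", "Listing D"], True),
    (["Listing E"], False),
]

def fetch_batch(index):
    """Hypothetical loader: returns (listings, has_more) for the given batch."""
    return FAKE_BATCHES[index]

def collect_all_listings(max_batches=50):
    """Accumulate listings until no more batches are offered or the cap is hit."""
    listings = []
    index = 0
    has_more = True
    # The cap guards against a control that never disappears.
    while has_more and index < max_batches:
        batch, has_more = fetch_batch(index)
        listings.extend(batch)
        index += 1
    return listings

all_listings = collect_all_listings()
print(all_listings)
```

Unlike 'next page' navigation, every batch lands in the same page, so the loop accumulates results instead of replacing them.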
Scraping yellowpages.com.au listings
The following video demonstration shows how WebHarvy can be configured to scrape data from the yellowpages.com.au website. Unlike other YP websites, yp.com.au requires a special configuration technique. If you do not follow the method shown in the following video, your configuration may not correctly extract all listings, especially from page 2 onward. This is because the layout of the data displayed for each listing (in a box) varies on yp.com.au.
Extracting location - Geo Coordinates (latitude/longitude)
WebHarvy can extract geo coordinates (latitude/longitude) from yellow pages listings. Most yellow pages websites display a map in the listing details page indicating the location of the business. Even though the geo coordinates are not displayed, they are present in the HTML source of the page. By capturing the HTML and applying Regular Expressions to select only the required portion of it, WebHarvy can extract this information, as shown in the following video demonstration.
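As a rough illustration of the regular expression idea, the Python sketch below pulls a latitude/longitude pair out of a page's HTML source. It makes assumptions: the sample HTML is made up, the `"latitude"`/`"longitude"` key names are only one of many ways a site might embed coordinates, and `COORD_PATTERN` and `extract_coordinates` are hypothetical names. The actual expression must be adapted to the page being scraped.

```python
import re

# Made-up HTML fragment; real websites embed coordinates in different ways.
sample_html = """
<div id="map"></div>
<script>var listing = {"latitude": 34.052235, "longitude": -118.243683};</script>
"""

# Assumes the page source holds "latitude": <number>, "longitude": <number>.
COORD_PATTERN = re.compile(
    r'"latitude"\s*:\s*(-?\d+(?:\.\d+)?)\s*,\s*"longitude"\s*:\s*(-?\d+(?:\.\d+)?)'
)

def extract_coordinates(html):
    """Return (latitude, longitude) as floats, or None if no match is found."""
    match = COORD_PATTERN.search(html)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(extract_coordinates(sample_html))
```

The two capture groups select only the numeric portions of the matched HTML, which is the same "capture HTML, then narrow it down" approach described above.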
Keyword Scraping - Extract data for multiple search keywords and locations (ZIP codes)
WebHarvy can be configured to extract required business listing details for multiple search keywords and locations. The Keyword Scraping feature of WebHarvy is used for this purpose. During mining, WebHarvy will load search results for each combination of search keyword and location, and scrape listing details from multiple pages of search results. Please watch the video below.
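The "each combination of search keyword and location" behavior can be pictured with a short Python sketch. It is an illustration only. The base URL and the `q`/`loc` query parameter names are hypothetical, and `build_search_urls` is not part of WebHarvy.

```python
from itertools import product
from urllib.parse import urlencode

# Hypothetical search endpoint; a real YP website has its own URL format.
BASE_URL = "https://www.example.com/search"

def build_search_urls(keywords, locations):
    """Build one search URL for every (keyword, location) combination."""
    urls = []
    for keyword, location in product(keywords, locations):
        # 'q' and 'loc' are assumed parameter names, chosen for illustration.
        query = urlencode({"q": keyword, "loc": location})
        urls.append(f"{BASE_URL}?{query}")
    return urls

urls = build_search_urls(["plumber", "dentist"], ["90001", "10001"])
for url in urls:
    print(url)
```

Two keywords and two ZIP codes yield four searches, and each search can then be scraped across its own result pages.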
Scraping hidden details like phone numbers & emails
Some websites do not directly display the phone numbers or email addresses of businesses on the page. Instead, a Contact/Email button is provided. Sometimes a 'Click' is required to reveal a partially displayed phone number or email. WebHarvy covers all of these cases.
The following video shows how hidden phone numbers can be extracted.
The following video shows how hidden email addresses can be extracted.
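When a Contact/Email button is a plain link, the hidden value is often already present in the link's `href` attribute. The Python sketch below shows that one case under stated assumptions: the sample HTML is made up, and `ContactLinkParser` and `extract_contacts` are hypothetical names. It does not cover values that only appear after a click triggers a script, which need a real browser session.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class ContactLinkParser(HTMLParser):
    """Collect addresses and numbers from mailto: and tel: links."""

    def __init__(self):
        super().__init__()
        self.emails = []
        self.phones = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().startswith("mailto:"):
            # Drop any '?subject=...' suffix and decode percent-escapes.
            address = unquote(href[len("mailto:"):].split("?", 1)[0])
            self.emails.append(address)
        elif href.lower().startswith("tel:"):
            self.phones.append(href[len("tel:"):])

def extract_contacts(html):
    """Return (emails, phones) found in the link targets of the given HTML."""
    parser = ContactLinkParser()
    parser.feed(html)
    return parser.emails, parser.phones

# Made-up listing markup, used only for illustration.
sample_html = (
    '<a class="email-btn" href="mailto:info@example.com?subject=Hello">Email Business</a>'
    '<a class="call-btn" href="tel:+15550100">Call</a>'
)
print(extract_contacts(sample_html))
```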
The best thing about using WebHarvy for scraping yellow pages is that configuring the scraper is incredibly easy. You can start extracting data within minutes of installing the software. And in case you need any help, you are assured of a response from us (firstname.lastname@example.org) within 24 hours.
We recommend that you try the evaluation version available for download.