PagesJaunes.fr is one of the largest online business directories in France, making it a rich source of local business leads. In this article you will learn how to scrape business contact details from PagesJaunes.fr using WebHarvy.
WebHarvy is a visual web scraper that can scrape data from any website, including online directories like Yellow Pages, White Pages and PagesJaunes. It is installed locally on your computer and lets you select the data you need to extract with simple mouse clicks.
Video Demonstration
The following video explains in detail how WebHarvy can be used to scrape business contact details from PagesJaunes.fr listings. The following details can be extracted:
- Business name
- Address
- URL
- Website
- Phone Number
- SIRET number
- etc.
As shown in the above video, there is no need to write any code or scripts to scrape data from PagesJaunes.fr using WebHarvy. The data can be scraped via the easy-to-use point-and-click user interface of WebHarvy.
Step-by-step guide for scraping PagesJaunes.fr
The following are the steps which you need to follow to scrape business contact details and other information from PagesJaunes.fr.
Download and install WebHarvy on your computer
WebHarvy is a desktop application that needs to be installed locally on your computer. We offer a free 15-day evaluation version of WebHarvy for you to try before purchase. The free version can be downloaded and installed from https://www.webharvy.com/download.html
Start Configuration
WebHarvy has a browser-like user interface. Load the website from which you wish to scrape data and navigate to the page displaying the data you need. In this case, load pagesjaunes.fr in WebHarvy’s browser and search for the type of business you are interested in (for example, doctors or electricians). You can also specify the location for which you want to view results.
Once the search results page is displayed, start configuration by clicking the Start button in the Configuration pane of the Home menu.
Select Data from the search results page
Once you have started configuration, you can click and select the data you need to extract from the starting page. Details like business name, rating, reviews and short business description can be selected this way. To select an item (text or image) for extraction, just click on it. WebHarvy will display a Capture window with various options. Select the Capture Text option to scrape the text of the selected item.
Know More: Selecting data and interacting with the page
Configure Pagination
Since the search results span multiple pages, we need to teach WebHarvy how to load those pages one after the other and scrape the results. To do this, locate the pagination links at the end of the page, click on the direct link that loads page number 2, and set it as the next page link.
Know More: Scraping data from multiple pages / Handling pagination
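No code is needed to configure pagination in WebHarvy, but for readers curious about the general idea, the following is a minimal conceptual sketch in Python of what "follow the next page link until there is none" means. It is not WebHarvy's implementation; the in-memory PAGES data, the load_page helper and the max_pages limit are all hypothetical and exist only for illustration.

```python
from typing import List, Optional

# Hypothetical in-memory "site": each page lists its results and names the next page.
PAGES = {
    "page1": {"results": ["Business A", "Business B"], "next": "page2"},
    "page2": {"results": ["Business C"], "next": "page3"},
    "page3": {"results": ["Business D"], "next": None},
}


def load_page(key: str) -> dict:
    """Stand-in for loading a results page (assumption: a real tool would fetch and render it)."""
    return PAGES[key]


def scrape_all_pages(start: str, max_pages: int = 100) -> List[str]:
    """Collect results page by page, following the 'next' link until it runs out."""
    results: List[str] = []
    key: Optional[str] = start
    visited = 0
    # max_pages is a safety limit so a looping 'next' link cannot run forever.
    while key is not None and visited < max_pages:
        page = load_page(key)
        results.extend(page["results"])
        key = page["next"]
        visited += 1
    return results


all_results = scrape_all_pages("page1")
print(all_results)
```

The loop ends either when a page has no next link or when the page limit is reached, which mirrors why a single next page link is enough to describe the whole result set.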
Following links
To scrape business contact details like website and phone number, we need to follow the title link of each result on the starting page to load its business details page. To do this, click on the title link of the first result and select the Follow this link option from the Capture window. Wait for the details page to load. Once it has loaded, you can click and select more details. Contact details such as address, phone and website, and business details such as the SIRET number, can be extracted from the details page.
Know More: Following links and scraping data
Stop Configuration and Start Mine
Once you have selected all the required data, click the Stop button in the Configuration pane of the Home menu to stop configuration. Click the Start Mine button to start mining data using the configuration. Once mining finishes, the mined data can be saved to a file or database. Various file formats and databases are supported.
Know More: Exporting data to various file formats and databases
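Once the data has been exported, an optional post-processing step is to sanity-check the scraped SIRET numbers. The sketch below is one possible approach in Python and is not part of WebHarvy. It assumes a CSV export with columns named name and siret (hypothetical names; use whatever column names you chose during configuration) and relies on the fact that a SIRET number has 14 digits and is generally validated with the Luhn checksum. The sample rows are made up for illustration.

```python
import csv
import io


def is_valid_siret(value: str) -> bool:
    """Return True if value has exactly 14 digits and passes the Luhn checksum."""
    digits = value.replace(" ", "")
    if len(digits) != 14 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for position, char in enumerate(reversed(digits)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Made-up rows standing in for an exported file; the column names are assumptions.
SAMPLE_CSV = """name,siret
Example Plumber,123 456 789 01237
Example Bakery,12345678901234
Example Cafe,
"""


def flag_rows(csv_text: str) -> list:
    """Pair each business name with whether its SIRET value looks valid."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["name"], is_valid_siret(row["siret"] or "")) for row in reader]


flags = flag_rows(SAMPLE_CSV)
for name, ok in flags:
    print(name, "valid SIRET" if ok else "check SIRET")
```

To run this against a real export, replace io.StringIO(csv_text) with an opened file. A failed check only suggests that a value was captured incompletely or from the wrong element, so flagged rows are worth reviewing by hand rather than discarding automatically.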
Download and Try
In case you are interested, please download and try using the free evaluation version of WebHarvy. To get started, please follow this link.