How to scrape data from product listings on Amazon's website?
In this article we will see how WebHarvy can be easily configured to scrape data from Amazon's website. Being a generic web scraping software, WebHarvy can be configured to extract data from any website as per your requirement.
- 1. Amazon Basic Extraction
- 2. Scraping Amazon Best Sellers List
- 3. Scraping Reviews and Rating
- 4. Extracting high-res product images
- 5. Scraping data from a list of product page URLs
- 6. Scraping product data corresponding to a list of ASINs
Amazon Basic Extraction
Amazon is one of the most popular websites from which people in eCommerce businesses need to extract data. For this reason, we have many demonstration videos on Amazon data extraction on our YouTube channel.
WebHarvy lets you extract all required product data from listing pages at Amazon. This includes:
- Product name
- Images (download multiple product images)
- Bullet Points (description)
- Product Description
- Item/Model number
- Best Seller Rank under various categories
Watch the following video which shows how most of the above data can be extracted from Amazon product listings using WebHarvy.
Since Amazon uses different page layouts for different products, and the amount of detail shown varies from product to product, the method shown in the above video uses the following techniques to obtain accurate data for every product in the listing.
1. Capture Following Text
This technique captures data based on its heading text, irrespective of where the text appears on the page. It is used to select product data such as ASIN, Price, and Shipping Weight.
2. Regular Expressions
Regular expressions help to extract only the required portion of a piece of text (for example, the rating value from an entire text block). This allows you to select text which cannot be selected directly using the point-and-click method without also selecting unwanted text.
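The two techniques above can be illustrated outside WebHarvy with a minimal Python sketch. It is not WebHarvy's implementation; the sample text, the function names `capture_following_text` and `extract_rating`, and the rating phrase format are all assumptions made for illustration.

```python
import re

# Hypothetical sample of text from a product page; not real Amazon markup.
PAGE_TEXT = """Product details
ASIN: B000000000
Shipping Weight: 1.2 pounds
Customer Reviews: 4.5 out of 5 stars"""


def capture_following_text(text, heading):
    """Return the text that follows `heading` on its line, or None if absent.

    The lookup depends only on the heading text, not on where the line
    sits in the page.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(heading):
            return stripped[len(heading):].lstrip(" :").strip()
    return None


def extract_rating(text):
    """Pull only the numeric rating out of a phrase like '4.5 out of 5 stars'."""
    match = re.search(r"(\d+(?:\.\d+)?) out of 5", text or "")
    return float(match.group(1)) if match else None


asin = capture_following_text(PAGE_TEXT, "ASIN")
reviews = capture_following_text(PAGE_TEXT, "Customer Reviews")
rating = extract_rating(reviews)
```

Here the heading lookup finds the ASIN and the review text wherever they appear, and the regular expression keeps only the number from the review text, discarding the surrounding words.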
Extracted data can be easily exported to a local file (CSV, TSV, XML, and JSON formats are supported) or to a database (MS SQL, MySQL). There is no limit on the amount of data which can be extracted and exported. Listings which span multiple pages can also be extracted easily.
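For readers who post-process exported data themselves, the following sketch shows what CSV, TSV, and JSON output of the same records might look like using only the Python standard library. The records and field names are hypothetical, and this is not how WebHarvy performs its export.

```python
import csv
import io
import json

# Hypothetical rows standing in for scraped product data.
rows = [
    {"name": "Sample product A", "asin": "B000000001", "price": "19.99"},
    {"name": "Sample product B", "asin": "B000000002", "price": "5.49"},
]


def to_delimited(records, delimiter=","):
    """Serialize records as delimited text (CSV by default, TSV with a tab)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records as an indented JSON array."""
    return json.dumps(records, indent=2)


csv_text = to_delimited(rows)
tsv_text = to_delimited(rows, delimiter="\t")
json_text = to_json(rows)
```

Writing to an in-memory buffer keeps the sketch free of file paths; replacing `io.StringIO()` with an open file handle would write the same content to disk.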
Scraping Amazon Best Sellers List
Another popular requirement is to scrape data from Amazon’s Best Sellers lists (100 products). The Best Sellers list uses a different method of pagination: the page links are labelled 1-20, 21-40, etc., and there is no ‘next page’ link. This is handled using the URL-based pagination feature, as shown in the following video.
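The idea behind URL-based pagination can be sketched in a few lines of Python: instead of following a 'next page' link, the full set of page URLs is generated up front. The URL template below is a placeholder, since the article does not give the real Best Sellers URL format, and the function names are illustrative only.

```python
# Placeholder URL template (an assumption, not Amazon's actual URL format).
URL_TEMPLATE = "https://example.com/best-sellers?page={page}"


def page_ranges(total_items=100, page_size=20):
    """Yield (first, last) item numbers for each page: (1, 20), (21, 40), ..."""
    for start in range(1, total_items + 1, page_size):
        yield start, min(start + page_size - 1, total_items)


def page_urls(total_items=100, page_size=20):
    """Build one URL per page without relying on a 'next page' link."""
    page_count = len(list(page_ranges(total_items, page_size)))
    return [URL_TEMPLATE.format(page=n) for n in range(1, page_count + 1)]


labels = [f"{first}-{last}" for first, last in page_ranges()]
urls = page_urls()
```

With 100 products and 20 per page, this produces five page labels and five URLs, which matches the 1-20, 21-40 labelling described above.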
Scraping Reviews and Rating
Apart from manufacturer-provided product data, Amazon also stores a wealth of user-provided data in the form of product reviews and ratings. Using WebHarvy, you can easily extract each product review as well as the reviewer’s details, as shown in the following video.
Extracting high-res product images
Multiple product images can be easily extracted from product details pages using WebHarvy. See the following video which shows how.
Scraping data from a list of product page URLs
When you already have a list of Amazon product page URLs and need to extract data from all of them, the Add URLs to configuration feature of WebHarvy can be used to scrape all URLs using a single configuration. The following video shows how.
Refer: How to scrape a list of URLs using a single configuration
Scraping product data corresponding to a list of ASINs
Another common requirement is to extract product details corresponding to a list of ASINs. The Keyword Scraping feature of WebHarvy allows you to automatically perform a search for each ASIN in a list at Amazon.com and extract the search result data for each one. Please watch the following video, which shows the details involved.
Know more about the Keyword Scraping feature of WebHarvy
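As a rough illustration of the keyword-driven approach, the sketch below turns a list of ASINs into one search URL per ASIN. It makes two assumptions that are not stated in the article: that an ASIN is a 10-character alphanumeric code, and that a search can be expressed as a `k` query parameter on an `/s` path at Amazon.com. It does not describe how WebHarvy submits keywords internally.

```python
import re
from urllib.parse import urlencode

# Assumption: an ASIN is 10 uppercase alphanumeric characters.
ASIN_PATTERN = re.compile(r"^[A-Z0-9]{10}$")

# Assumed search URL shape; treat it as illustrative only.
SEARCH_BASE = "https://www.amazon.com/s"


def search_urls(asins):
    """Return one search URL per well-formed ASIN, skipping malformed entries."""
    urls = []
    for raw in asins:
        asin = raw.strip().upper()
        if not ASIN_PATTERN.match(asin):
            continue  # skip blank lines and malformed values
        urls.append(f"{SEARCH_BASE}?{urlencode({'k': asin})}")
    return urls


# Hypothetical ASIN values used only to exercise the function.
example = search_urls(["B000000001", " b000000002 ", "not-an-asin", ""])
```

Normalizing and validating each entry first means stray whitespace or a malformed line in the input list does not produce a useless search.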
The best thing about using WebHarvy for scraping products from Amazon is that configuring the scraper is incredibly easy. You can start extracting data from Amazon within minutes of installing the software. If you need any assistance, you are assured of a reply from us (email@example.com) within 24 hours.
We highly recommend that you try the free evaluation version available for download.