
How to scrape data from product listings on Amazon's website?


In this article, we will see how WebHarvy can be easily configured to scrape data from Amazon's website. As generic web scraping software, WebHarvy can be configured to extract data from any website to suit your requirements.

  1. Amazon Basic Extraction
  2. Scraping Amazon Best Sellers List
  3. Scraping Reviews and Rating
  4. Extracting high-res product images
  5. Scraping data from a list of product page URLs
  6. Scraping product data corresponding to a list of ASINs

How to easily scrape data from websites using WebHarvy?

Amazon Basic Extraction


Amazon is one of the most popular websites from which people in eCommerce businesses need to extract data. For this reason, our YouTube channel has many demonstration videos related to Amazon data extraction.

WebHarvy lets you extract all required product data from listing pages at Amazon, including details such as the ASIN, price, shipping weight and rating.

Watch the following video which shows how most of the above data can be extracted from Amazon product listings using WebHarvy.


Because Amazon uses different page layouts for different products, and the amount of detail shown varies from product to product, the method in the video above uses the following techniques to obtain accurate data for every product in the listing.

  1. Capture Following Text
    Captures the text that follows a given heading, regardless of where that heading appears on the page. This is used to select product data such as the ASIN, price and shipping weight.

  2. Regular Expressions
    Regular expressions extract only the required portion of a text (for example, the rating value from a larger block of text). This lets you select text that cannot be selected directly with point and click without also picking up unwanted text.
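To make these two techniques concrete, here is a minimal Python sketch of the same ideas applied to plain page text. It is not how WebHarvy works internally (in WebHarvy both are configured through the point-and-click interface); the helper names `capture_following_text` and `extract_rating`, the sample lines and the "out of 5 stars" rating wording are all assumptions made for illustration.

```python
import re

def capture_following_text(lines, heading):
    """Return the text that follows `heading`, wherever the heading appears.

    Hypothetical helper: checks the rest of the heading's own line first
    (e.g. "ASIN : B0EXAMPLE1"), then falls back to the next non-empty line.
    """
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(heading):
            continue
        # Drop the heading and any separator such as ':' after it
        rest = stripped[len(heading):].lstrip(" :").strip()
        if rest:
            return rest
        for following in lines[i + 1:]:
            if following.strip():
                return following.strip()
    return None

# Assumed rating wording; adjust the pattern to the text actually on the page
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?) out of 5 stars")

def extract_rating(text):
    """Pull only the numeric rating out of a longer block of text."""
    match = RATING_PATTERN.search(text)
    return float(match.group(1)) if match else None

# Hypothetical page text: the headings can appear in any order or position
page = [
    "Product information",
    "Item Weight : 1.2 pounds",
    "ASIN : B0EXAMPLE1",
    "Price",
    "$19.99",
    "4.6 out of 5 stars 1,234 ratings",
]

print(capture_following_text(page, "ASIN"))         # B0EXAMPLE1
print(capture_following_text(page, "Item Weight"))  # 1.2 pounds
print(capture_following_text(page, "Price"))        # $19.99
print(extract_rating(page[-1]))                     # 4.6
```

Because the lookup scans every line for the heading, the result does not depend on where the heading sits on the page, which is the property that makes this approach robust to Amazon's varying layouts.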

Extracted data can be exported to a local file (CSV, TSV, XML and JSON formats are supported) or to a database (MS SQL, MySQL). There is no limit to the amount of data that can be extracted and exported, and listings that span multiple pages can be extracted easily.
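WebHarvy performs the export itself, so no code is required. Purely to illustrate what the CSV, TSV and JSON formats look like for the same rows, here is a small Python sketch using only the standard library; the field names and sample values are hypothetical.

```python
import csv
import io
import json

# Hypothetical scraped rows; real column names come from your configuration
rows = [
    {"asin": "B0EXAMPLE1", "price": "$19.99", "rating": "4.6"},
    {"asin": "B0EXAMPLE2", "price": "$5.49", "rating": "4.1"},
]

def to_csv(records, fieldnames, delimiter=","):
    """Serialize rows as delimited text with a header line (use "\t" for TSV)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter=delimiter)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def to_json(records):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(records, indent=2)

print(to_csv(rows, ["asin", "price", "rating"]))
print(to_csv(rows, ["asin", "price", "rating"], delimiter="\t"))
print(to_json(rows))
```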

Scraping Amazon Best Sellers List


Another popular requirement is to scrape data from Amazon’s Best Sellers lists (100 products each). The Best Sellers list uses a different method of pagination: the page links are labelled 1-20, 21-40 and so on, and there is no ‘next page’ link. This is handled using WebHarvy's URL based pagination feature, as shown in the following video.
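URL based pagination works because each page of the list can be reached by changing a value in the URL instead of clicking a link. The Python sketch below generates such page URLs. The `pg` query parameter, the 20-items-per-page figure (inferred from the 1-20, 21-40 labels) and the category URL are assumptions, so check the URL your browser shows for page 2 before relying on them.

```python
import math
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def page_urls(base_url, total_items=100, per_page=20, param="pg"):
    """Build one URL per page by setting a page-number query parameter.

    Assumption: the site selects the page through `param` (here "pg").
    Any query parameters already present in `base_url` are preserved.
    """
    parts = urlsplit(base_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for page in range(1, math.ceil(total_items / per_page) + 1):
        query[param] = str(page)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls

# Hypothetical Best Sellers category URL
for url in page_urls("https://www.amazon.com/gp/bestsellers/electronics"):
    print(url)
```

With 100 products at 20 per page, this yields five URLs ending in `pg=1` through `pg=5`.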



Scraping Reviews and Rating


Apart from manufacturer-provided product data, Amazon also holds a wealth of user-provided data in the form of product reviews and ratings. Using WebHarvy, you can easily extract each product review along with the reviewer’s details, as shown in the following video.



Extracting high-res product images


Multiple product images can be easily extracted from product details pages using WebHarvy. See the following video which shows how.



The demo above uses the technique explained here to capture multiple product images automatically.

Scraping data from a list of product page URLs


If you already have a list of Amazon product page URLs and need to extract data from all of them, you can use WebHarvy's Add URLs to configuration feature to scrape every URL with a single configuration. The following video shows how.



Refer: How to scrape a list of URLs using a single configuration
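Before loading a URL list, it can help to tidy it so that each product is scraped only once. The Python sketch below reduces each product URL to a short `/dp/<ASIN>` form, drops duplicates and non-product URLs, and then applies one extraction function to every remaining URL, mirroring the idea of one configuration for many URLs. The URL patterns, the canonical form and the `extract` callback are assumptions for illustration, not part of WebHarvy.

```python
import re

# Assumed URL shapes: .../dp/<ASIN>... or .../gp/product/<ASIN>...
ASIN_IN_URL = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:[/?]|$)")

def canonical_product_urls(urls, domain="www.amazon.com"):
    """Return one short product URL per ASIN, keeping the input order."""
    seen = set()
    result = []
    for url in urls:
        match = ASIN_IN_URL.search(url.strip())
        if not match:
            continue  # skip entries that do not look like product pages
        asin = match.group(1)
        if asin not in seen:
            seen.add(asin)
            result.append(f"https://{domain}/dp/{asin}")
    return result

def scrape_all(urls, extract):
    """Apply the same extraction function (one 'configuration') to every URL."""
    return [extract(url) for url in canonical_product_urls(urls)]

# Hypothetical input list with a duplicate product and a search page
url_list = [
    "https://www.amazon.com/Example-Widget/dp/B0EXAMPLE1/ref=sr_1_1?keywords=widget",
    "https://www.amazon.com/dp/B0EXAMPLE1",
    "https://www.amazon.com/gp/product/B0EXAMPLE2?th=1",
    "https://www.amazon.com/s?k=widget",
]

print(canonical_product_urls(url_list))
```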

Scraping product data corresponding to a list of ASINs


Another common requirement is to extract product details for a list of ASINs. WebHarvy's Keyword Scraping feature lets you automatically search Amazon.com for each ASIN in a list and extract the search result data for each one. Watch the following video, which shows the details involved.



Know more about the Keyword Scraping feature of WebHarvy
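Keyword Scraping takes a list of keywords (here, ASINs) and runs one search per keyword. The Python sketch below shows an equivalent preparation step: it normalizes the ASINs, sets aside entries that are not 10-character alphanumeric codes, and builds one search URL per ASIN. The `https://www.amazon.com/s?k=` search URL format and the function name `search_urls_for_asins` are assumptions, not WebHarvy's API.

```python
import re
from urllib.parse import urlencode

# ASINs are 10-character alphanumeric codes
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")

def search_urls_for_asins(asins, base="https://www.amazon.com/s"):
    """Map each valid ASIN to a search URL; collect entries that fail validation.

    Duplicate ASINs collapse into a single dictionary entry.
    """
    urls, invalid = {}, []
    for raw in asins:
        asin = raw.strip().upper()
        if ASIN_PATTERN.fullmatch(asin):
            urls[asin] = f"{base}?{urlencode({'k': asin})}"
        else:
            invalid.append(raw)
    return urls, invalid

# Hypothetical ASIN list, including a malformed entry
urls, invalid = search_urls_for_asins([" b0example1 ", "B0EXAMPLE2", "not-an-asin"])
print(urls)
print(invalid)
```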

Summary


The best thing about using WebHarvy to scrape products from Amazon is that configuring the scraper is incredibly easy. You can start extracting data from Amazon within minutes of installing the software, and if you need any assistance, you can expect a reply from us (support@sysnucleus.com) within 24 hours.

Watch more WebHarvy videos related to data extraction from Amazon

We highly recommend that you try the free evaluation version available for download.