Web scraping is the process of automatically extracting data from websites using software tools called web scrapers. In this article, we use WebHarvy to scrape images and other data from Instagram.
WebHarvy is visual web scraping software that you can use to scrape data from any website. It is very easy to use: you select the data to be scraped from a web page simply by clicking on it.
How to scrape Instagram images?
The following demonstration video shows how WebHarvy can be used to scrape images from Instagram. In addition to downloading images, you can also scrape details such as profile name, image location and number of likes.
The RegEx string and JavaScript code used in the above video can be found here.
Steps to scrape images from Instagram
- 1. Download and install WebHarvy on your computer
- 2. Open WebHarvy and load the Instagram page from which you need to scrape images
- 3. Use Dev Tools to remove the overlay that appears over images on mouse hover (showing the number of likes and comments): remove all mouse event handlers except Mouse Up and Mouse Down.
- 4. Start Configuration
- 5. Go to the Configuration menu tab and select the Disable pattern detection option
- 6. Click on the first image tile and select the Click option from the resulting Capture window
- 7. Once the image details page opens, you can click and select textual details such as profile name, image description and number of likes by using the Capture Text option in the Capture window.
- 8. To download the image, click on it.
- 9. Select the Capture More Content option twice
- 10. Select Capture HTML, followed by the Apply Regular Expression option
- 11. Paste and apply the following RegEx string: src[^\=]*="([^\s"]*)
- 12. Click the Capture Image button, which becomes enabled once the above RegEx is applied
- 13. To scrape images from multiple pages, configure pagination using the JavaScript method, with the JavaScript code given here
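To see what the RegEx string from step 11 captures, here is a minimal sketch in JavaScript (runnable with Node.js) that applies the same pattern to a made-up image tag. The sample HTML and the function name extractFirstSrc are hypothetical. The sketch only illustrates the pattern and does not show how WebHarvy applies regular expressions internally. The pattern matches "src", then any characters other than "=", then "=\"". It captures everything up to the next whitespace character or double quote.

```javascript
// Same pattern as step 11, written as a JavaScript regex literal.
// Group 1 captures the attribute value up to whitespace or a double quote.
const SRC_PATTERN = /src[^\=]*="([^\s"]*)/;

// Hypothetical helper: return the first captured URL, or null if none is found.
function extractFirstSrc(html) {
  const match = SRC_PATTERN.exec(html);
  return match ? match[1] : null;
}

// Made-up HTML resembling an image tag on a post page.
const sample =
  '<img alt="Photo" class="post-image" src="https://example.com/media/photo1.jpg" ' +
  'srcset="https://example.com/media/photo1_640.jpg 640w">';

console.log(extractFirstSrc(sample));
// prints https://example.com/media/photo1.jpg
```

The characters between "src" and "=" are not restricted, so the pattern also matches attributes such as srcset. On a tag where src appears first, as in this sample, the src value is the one returned.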
Scraping multiple images (per post) from Instagram
Some Instagram posts may contain multiple images. WebHarvy can be configured to scrape all images displayed within each Instagram post. The following demonstration video explains how multiple images can be automatically scraped using WebHarvy. This video also shows how details like image location, image URL and content/description can be scraped.
The regular expression strings used in the video along with the JavaScript code used for pagination can be found in the video description.
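The RegEx strings used in that video are listed in its description and are not reproduced here. As a rough illustration only, the step-11 pattern can be run with JavaScript's global flag to collect every image URL in a block of HTML instead of just the first one. The function name extractAllSrcs and the sample markup are hypothetical, and real post markup may differ.

```javascript
// Step 11's pattern with the global flag, so every match is returned.
const SRC_PATTERN_ALL = /src[^\=]*="([^\s"]*)/g;

// Hypothetical helper: collect every captured URL, dropping exact duplicates
// while keeping the order in which they first appear.
function extractAllSrcs(html) {
  const urls = [];
  for (const match of html.matchAll(SRC_PATTERN_ALL)) {
    if (!urls.includes(match[1])) urls.push(match[1]);
  }
  return urls;
}

// Made-up markup for a post containing two images.
const carousel =
  "<ul>" +
  '<li><img src="https://example.com/media/a.jpg"></li>' +
  '<li><img src="https://example.com/media/b.jpg"></li>' +
  "</ul>";

console.log(extractAllSrcs(carousel));
```

String.prototype.matchAll requires a regex with the g flag and works on an internal copy of it. Reusing SRC_PATTERN_ALL across calls is therefore safe.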
How to scrape Instagram image URLs and number of likes?
The following video shows how image URLs and the number of likes can be extracted from Instagram. The JavaScript code and RegEx strings used in the video can be found here.
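A like count is usually captured as text such as "1,234 likes" and may need to be turned into a number afterwards. The sketch below shows one way to do that in JavaScript. The pattern, the function name parseLikeCount and the assumed "N likes" wording are all hypothetical. They are not taken from the video, and the wording Instagram actually displays may differ.

```javascript
// Hypothetical pattern: a number (optionally with thousands commas)
// followed by "like" or "likes".
const LIKES_PATTERN = /(\d[\d,]*)\s+likes?\b/i;

// Hypothetical helper: convert text such as "1,234 likes" into 1234,
// or return null when no count is present.
function parseLikeCount(text) {
  const match = LIKES_PATTERN.exec(text);
  if (!match) return null;
  return Number(match[1].replace(/,/g, "")); // strip thousands separators
}

console.log(parseLikeCount("1,234 likes")); // prints 1234
console.log(parseLikeCount("1 like")); // prints 1
console.log(parseLikeCount("Be the first to like this")); // prints null
```

Abbreviated counts such as "12.5K" are not handled by this pattern. They would need an extra rule if the page shows them.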
Scrape follower details of any Instagram profile
The following demonstration video explains in detail how you can scrape the names and handles of all followers of any Instagram profile.
The JavaScript code used in the above video for pagination is copied below:
var scrollEl = document.getElementsByClassName('pbNvD fPMEg')[0].children[1].children[0].children[0];
scrollEl.children[scrollEl.children.length-1].scrollIntoView();
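This snippet depends on Instagram's generated class names ('pbNvD fPMEg') and on a fixed nesting of child elements. Both may change when Instagram updates its markup. If any lookup then fails, the code throws an error instead of scrolling.

The sketch below is a more defensive variant. It assumes the scrollable follower list has already been located. The function name scrollLastChildIntoView is hypothetical and is not part of WebHarvy.

```javascript
// Hypothetical helper: scroll the last entry of a scrollable list into view so
// the page loads the next batch of followers. Returns false instead of throwing
// when the list is missing or empty.
function scrollLastChildIntoView(container) {
  if (!container) return false; // the element lookup matched nothing
  const last = container.lastElementChild; // same element as children[children.length - 1]
  if (!last) return false; // the list has no entries yet
  last.scrollIntoView();
  return true;
}

// In the browser, the container could be located as in the original snippet,
// using optional chaining so a changed layout yields undefined instead of an error:
// const list = document.getElementsByClassName('pbNvD fPMEg')[0]
//   ?.children[1]?.children[0]?.children[0];
// scrollLastChildIntoView(list);
```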
Know More
If you are interested in using WebHarvy to scrape data from Instagram and other websites, we highly recommend that you download and try the free evaluation version of WebHarvy available on our website. To get started, please follow the link given below.
Need Support or Have Questions?
If you have any questions, please contact our support team (support@webharvy.com) with the details: the URL of the web page and a description of the data to be scraped. We are happy to help you get started with your first data extraction project using WebHarvy.