{"id":1145,"date":"2021-08-27T11:11:26","date_gmt":"2021-08-27T11:11:26","guid":{"rendered":"https:\/\/www.webharvy.com\/blog\/?p=1145"},"modified":"2021-08-27T11:11:27","modified_gmt":"2021-08-27T11:11:27","slug":"scraping-chrome-web-store","status":"publish","type":"post","link":"https:\/\/www.webharvy.com\/blog\/scraping-chrome-web-store\/","title":{"rendered":"Scraping Chrome Web Store"},"content":{"rendered":"\n<p><a href=\"https:\/\/www.webharvy.com\/\">WebHarvy<\/a> can be used to scrape <a href=\"https:\/\/chrome.google.com\/webstore\/category\/extensions\">Chrome Web Store<\/a> data. The Chrome Web Store lists Chrome extensions under various categories. In this article we will see how WebHarvy can be used to scrape details of the extensions listed under a specific category of the Chrome Web Store. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Using WebHarvy to Scrape Chrome Web Store<\/h2>\n\n\n\n<p>The <a href=\"https:\/\/chrome.google.com\/webstore\/category\/extensions\">Chrome Web Store<\/a> uses <a href=\"https:\/\/www.webharvy.com\/tour3.html#ScrollToLoad\">infinite scroll for pagination<\/a>: more extensions are loaded into the same page as we scroll down. Each newly loaded batch of extensions is placed under a different HTML element. For this reason, we need to <a href=\"https:\/\/www.webharvy.com\/tour1.html#RunScript\">run JavaScript code<\/a> to bring all extensions under a single HTML element, so that all of them are selected during mining. <\/p>\n\n\n\n<p>The following video shows the steps involved in detail. The code snippets used are available in the <a href=\"https:\/\/youtu.be\/tsyU7BhfLqs\">video description<\/a>. 
<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe loading=\"lazy\" title=\"Scraping Chrome Web Store using WebHarvy\" width=\"1200\" height=\"675\" src=\"https:\/\/www.youtube.com\/embed\/tsyU7BhfLqs?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<p>As shown in the video above, scraping extension details from the Chrome Web Store is performed in two stages:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>In stage 1, we get the URLs of all extension details pages.<\/li><li>In stage 2, we scrape data from all these URLs using a single configuration.<\/li><\/ol>\n\n\n\n<p>The <a href=\"https:\/\/gist.github.com\/sysnucleus\/3b1b0c93941e806499424129ee220d1f\">JavaScript code<\/a> used to collate all rows of data under a single element is given below. <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Collect every grid element; newly loaded extensions land in separate grids.\r\nvar groups = document.querySelectorAll('&#91;role=\"grid\"]');\r\n\/\/ The first grid becomes the single parent element.\r\nvar parent = groups&#91;0];\r\n\/\/ Move the children of every later grid into the first grid.\r\n\/\/ Iterating backwards keeps the index valid as the live children list shrinks.\r\nfor (var i = groups.length - 1; i >= 1; i--) {\r\n\tvar group = groups&#91;i];\r\n\tfor (var j = group.children.length - 1; j >= 0; j--) {\r\n\t\tparent.appendChild(group.children&#91;j]);\r\n\t}\r\n}<\/code><\/pre>\n\n\n\n<p>The <a href=\"https:\/\/www.webharvy.com\/tour1.html#ScrapeByRegEx\">regular expression string<\/a> used to get the URL of each extension details page is given below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>href=\"(&#91;^\"]*)<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Try WebHarvy<\/h2>\n\n\n\n<p>We highly recommend that you download and try the free evaluation version of WebHarvy, available on our website. 
To get started, please follow the link given below.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.webharvy.com\/articles\/getting-started.html\">Getting started with WebHarvy<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>WebHarvy can be used to scrape Chrome Web Store data. Chrome web store displays Chrome extensions, listed under various categories. In this article we will see how WebHarvy can be used to scrape details of extensions listed under a specific category from Chrome Web Store. Using WebHarvy to Scrape Chrome Web Store Chrome web store &#8230; <a title=\"Scraping Chrome Web Store\" class=\"read-more\" href=\"https:\/\/www.webharvy.com\/blog\/scraping-chrome-web-store\/\" aria-label=\"Read more about Scraping Chrome Web Store\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,5,8],"tags":[159],"class_list":["post-1145","post","type-post","status-publish","format-standard","hentry","category-use-case","category-howto","category-webharvy","tag-chrome-web-store"],"_links":{"self":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/posts\/1145","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/comments?post=1145"}],"version-history":[{"count":4,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/posts\/1145\/revisions"}],"predecessor-version":[{"id":1150,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/posts\/1145\/revisions\/1150"}],"wp:attachment":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/media?parent=1145"}],"wp:term
":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/categories?post=1145"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/tags?post=1145"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}