{"id":675,"date":"2018-10-24T03:38:02","date_gmt":"2018-10-24T03:38:02","guid":{"rendered":"http:\/\/webharvy.com\/whblog\/?p=675"},"modified":"2018-10-24T03:38:02","modified_gmt":"2018-10-24T03:38:02","slug":"webharvy-5-3-parallel-mining-chrome-developer-tools","status":"publish","type":"post","link":"https:\/\/www.webharvy.com\/blog\/webharvy-5-3-parallel-mining-chrome-developer-tools\/","title":{"rendered":"WebHarvy 5.3 (Parallel mining, Chrome developer tools)"},"content":{"rendered":"<p>&#8216;<a href=\"http:\/\/www.webharvy.com\/articles\/troubleshoot.html#MiningSlow\" target=\"_blank\" rel=\"noopener noreferrer\">How to increase mining speed ?<\/a>&#8216; was one of the most commonly asked questions by our users. With previous versions, the main limitation was that when links had to be followed from the starting page to get each listing details, the miner took more time to scrape a page full of listings. This is because WebHarvy used to sequentially load links one after the other to scrape data.<\/p>\n<h3>Parallel Mining<\/h3>\n<p>Instead of processing links to be followed and extracted one after the other, the latest update of WebHarvy processes them in bulk, in parallel, using multiple mining threads. You can set the maximum number of parallel mining threads which WebHarvy uses in <a href=\"https:\/\/www.webharvy.com\/tour81.html#AdvancedMinerOptions\" target=\"_blank\" rel=\"noopener noreferrer\">Advanced Miner Options<\/a> window as shown below.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-676 alignnone\" src=\"http:\/\/webharvy.com\/whblog\/wp-content\/uploads\/2018\/10\/advancedoptions-300x233.png\" alt=\"\" width=\"300\" height=\"233\" \/><\/p>\n<p>Providing a higher value for &#8216;Maximum number of parallel mining threads&#8217; option in the above window will increase mining speed. But, to run more threads in parallel, WebHarvy will require more memory, processing power and\u00a0 internet-bandwidth. 
So we recommend that you increase this setting only as far as your system&#8217;s CPU, installed physical memory (RAM) and internet speed allow.<\/p>\n<h3>Chrome developer tools<\/h3>\n<p>This feature is for power users who are familiar with web page internals like HTML, DOM structure and JavaScript. We use this tool extensively while supporting our customers with less straightforward scraping scenarios and complex websites.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-677 alignnone\" src=\"http:\/\/webharvy.com\/whblog\/wp-content\/uploads\/2018\/10\/dev-tools-300x125.png\" alt=\"\" width=\"300\" height=\"125\" \/><\/p>\n<p><a href=\"https:\/\/developers.google.com\/web\/tools\/chrome-devtools\/\" target=\"_blank\" rel=\"noopener noreferrer\">Chrome Developer Tools<\/a> allow you to easily inspect the internal structure of a web page, see how the page is organised, view the HTML and the data hidden in the HTML source and <a href=\"https:\/\/www.webharvy.com\/tour1.html#ScrapeByRegEx\" target=\"_blank\" rel=\"noopener noreferrer\">devise methods<\/a> to extract them. You can also find the JavaScript code that runs when buttons\/links are clicked and call it directly using <a href=\"http:\/\/www.webharvy.com\/tour1.html#RunScript\" target=\"_blank\" rel=\"noopener noreferrer\">these<\/a> <a href=\"http:\/\/www.webharvy.com\/tour3.html#JS\" target=\"_blank\" rel=\"noopener noreferrer\">features<\/a>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-680\" src=\"http:\/\/webharvy.com\/whblog\/wp-content\/uploads\/2018\/10\/inspect-devtools-1024x795.png\" alt=\"\" width=\"640\" height=\"497\" \/><\/p>\n<h3>More accurate automatic sub-text selection<\/h3>\n<p>To <a href=\"http:\/\/www.webharvy.com\/tour1.html#ScrapeSubText\" target=\"_blank\" rel=\"noopener noreferrer\">scrape only a portion of the text<\/a> displayed in the Capture window, you can highlight the required portion with the mouse. 
We have improved the accuracy of this method, especially when the selected text lies between delimiter characters such as currency symbols, punctuation\/special characters, new lines or spaces.<\/p>\n<h3>Improvements and bug fixes<\/h3>\n<ol>\n<li>Improved <a href=\"http:\/\/www.webharvy.com\/tour1.html#SelectDropdown\" target=\"_blank\" rel=\"noopener noreferrer\">select dropdown option<\/a>. This option now reflects the selection (selected item change) on the page. Earlier, the user had to run separate JavaScript code to make the page reflect a dropdown list selection.<\/li>\n<li>Miner now <em>scrolls<\/em> the page before clicking on\u00a0<a href=\"http:\/\/www.webharvy.com\/tour3.html#LoadMore\" target=\"_blank\" rel=\"noopener noreferrer\">Load More links<\/a>. This is done to make sure that the &#8216;load more&#8217; link is visible and loaded before the miner tries to click it.<\/li>\n<li>When <a href=\"https:\/\/support.microsoft.com\/en-in\/help\/4027860\/windows-10-view-display-settings\" target=\"_blank\" rel=\"noopener noreferrer\">text scaling in Windows<\/a> is not set to 100% (which is the recommended setting on most systems), it was not possible to click and correctly select the required data items during configuration. This issue is fixed in this version. Configuration-time data selection now works irrespective of text scaling.<\/li>\n<li>Fixed an issue related to downloading images behind SSL.<\/li>\n<li>Fixed the miner window not being visible on multi-monitor systems after the monitor configuration changes.<\/li>\n<li>Earlier, the Capture window would become unresponsive for a second or two after applying Regular Expression on HTML. 
This unresponsive state has been removed.<\/li>\n<li>Added the browser zoom level and the number of parallel mining threads to the status bar of the configuration browser.<\/li>\n<li>Fixed an issue with loading and displaying the upgrade purchase page when the user&#8217;s license has expired.<\/li>\n<li>Disabled &#8216;Mine all pages\/Number of pages to mine&#8217; controls while mining is in progress.<\/li>\n<li>Updated internal browser to a more recent version of Chromium.<\/li>\n<\/ol>\n<h3>Update to the latest version<\/h3>\n<p>As always, you can download and install the latest version from\u00a0<a href=\"http:\/\/www.webharvy.com\/download.html\">http:\/\/www.webharvy.com\/download.html<\/a>.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8216;How to increase mining speed ?&#8216; was one of the most commonly asked questions by our users. With previous versions, the main limitation was that when links had to be followed from the starting page to get each listing details, the miner took more time to scrape a page full of listings. 
This is because &#8230; <a title=\"WebHarvy 5.3 (Parallel mining, Chrome developer tools)\" class=\"read-more\" href=\"https:\/\/www.webharvy.com\/blog\/webharvy-5-3-parallel-mining-chrome-developer-tools\/\" aria-label=\"Read more about WebHarvy 5.3 (Parallel mining, Chrome developer tools)\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,8],"tags":[26,33,35,73,74,85,145,147,150],"class_list":["post-675","post","type-post","status-publish","format-standard","hentry","category-release-update","category-webharvy","tag-chrome-developer-tools","tag-data-extraction","tag-data-scraping","tag-mining","tag-new-release","tag-parallel-mining","tag-web-miner","tag-web-scraper","tag-webharvy"],"_links":{"self":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/posts\/675","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/comments?post=675"}],"version-history":[{"count":0,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/posts\/675\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/media?parent=675"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/categories?post=675"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.webharvy.com\/blog\/wp-json\/wp\/v2\/tags?post=675"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}