WebHarvy 5.5 (Custom User Agent String, Frame Handling, Better Form Submission/Navigation)

The following are the main changes (features/improvements) in WebHarvy 5.5:

1. Custom User Agent String

In WebHarvy Settings > Browser tab, you can enable a custom user agent string as shown below.

The ‘Enable custom user agent string’ option allows you to specify a user agent string which the WebHarvy configuration and mining browsers will use. This option can be used to make WebHarvy’s browser appear like another specific browser (e.g. Microsoft Edge, Mozilla Firefox, Google Chrome or Apple Safari) to the websites from which you are trying to extract data.
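For readers wondering what this setting changes on the wire: the user agent string reaches a website as the User-Agent HTTP header of each request. The minimal Python sketch below illustrates that header only; it is not how WebHarvy implements the option. The Chrome-style string and the example.com URL are assumptions chosen for illustration, and the request is built but never sent.

```python
from urllib.request import Request

# Illustrative Chrome-style user agent string (an assumption, not a value WebHarvy ships).
CUSTOM_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def build_request(url: str, user_agent: str = CUSTOM_USER_AGENT) -> Request:
    """Build (but do not send) a GET request whose User-Agent header is `user_agent`."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://www.example.com/")
# urllib stores header names via str.capitalize(), hence the "User-agent" lookup key.
print(req.get_header("User-agent"))
```

Whatever string you enter in the setting is what a site sees in this header, which is why a browser-like value makes WebHarvy's browser look like that browser.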

2. Better form submission, initial navigation

Suppose your configuration needs to enter values into a search form (like the one shown below) and then click the ‘Search’ button to perform the search and display the results. The results contain the data which you need to extract.

Earlier, you had to disable pattern detection before filling in the form fields. Then, after clicking the search button and once the data to be extracted was displayed, you had to enable pattern detection again before selecting the required data.

With the latest version, you no longer need to adjust the pattern detection state manually; WebHarvy handles this automatically.

3. Open frames and select data

Earlier, if the data which you needed to select for extraction occurred within a frame inside the page, you had to find the frame URL, load it independently within WebHarvy and then start configuration.

This version adds a new Capture window option to open frames. Whenever you click on an item which occurs within a frame, the resulting Capture window will have an ‘Open Frame’ option. Clicking it makes WebHarvy load the frame contents within the browser view, so that you can proceed with data selection.

4. Browser Search

You can press CTRL + F in the configuration browser (while not in configuration mode) to bring up the search window, which lets you perform a text search on the currently loaded page.

5. Capture full page HTML

Sometimes you will need to capture the full page HTML so that you can extract some data within it by applying regular expressions. Earlier, you needed to click anywhere on the page and select the Capture More Content option multiple times until the whole page content was selected, and then select the Capture HTML option to get the full page HTML. With the latest version, you can double-click the ‘Capture HTML’ toolbar button to capture the full page HTML directly.
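Once the full page HTML has been captured, a regular expression with a capture group picks out the value you want. The Python sketch below shows that step on a made-up HTML snippet. The page content, field names and patterns are hypothetical, and the sketch assumes the value of interest is in the first capture group; check the WebHarvy documentation for exactly how it treats groups.

```python
import re

# Hypothetical stand-in for the HTML captured with 'Capture HTML'.
page_html = """
<html><head>
<script>var product = {"sku": "AB-1234", "price": "19.99"};</script>
</head><body><h1>Sample product</h1></body></html>
"""


def extract_first(pattern: str, html: str):
    """Return the first capture group of `pattern` in `html`, or None if nothing matches."""
    match = re.search(pattern, html)
    return match.group(1) if match else None


sku = extract_first(r'"sku":\s*"([^"]+)"', page_html)
price = extract_first(r'"price":\s*"([^"]+)"', page_html)
print(sku, price)  # AB-1234 19.99
```

This is useful for values that never appear as visible text (for example, data embedded in a script tag), which is exactly the case where clicking on the page cannot select them.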

6. Reset settings to default

You no longer need to remember what the default settings were. Just click the ‘Reset settings to default’ link in the Settings window.

7. Lower repetition intervals for scheduled tasks

The mining task scheduler now allows you to repeat mining tasks at 5, 10, 15 or 30 minute intervals.

Minor Changes

  1. ‘Enable Web Security’ option in Browser Settings is ON by default
  2. Browser handles ‘Need Client Certificate’ requests from web servers
  3. Updated internal browser to the latest possible version of Chromium
  4. HTTP2 support enabled
  5. Bug fixes and overall improvements
    1. Fixed issue where some selected data items were not extracted correctly during mining
    2. Preview generation is stopped when configuration is stopped
    3. Deleting data fields not allowed while preview generation is in progress
    4. Fixed issue with ‘pattern detection enabled for a while’ soon after opening a popup
    5. Issue with editing start page URL in a configuration with multiple URLs fixed
    6. Single-term search supported from configuration browser address bar
    7. Fixed issue where pagination controls were enabled in the Miner window for single-page configurations after mining was stopped
    8. Fixed bug in Keyword Scraping caused by case sensitivity of keyword replacement in start URL / Post-data
    9. Fixed issue where the initial page (quick start guide) sometimes took a very long time to load while starting WebHarvy

As always, you can download and install the latest version from https://www.webharvy.com/download.html.

How to scrape property details from Zillow real estate listings?

Scraping Zillow Real Estate Listings

The following video shows how WebHarvy can be easily configured to extract property details from Zillow’s real estate listings. Details like address, price, Zestimate, beds/baths/area, images, price history, agent/owner details etc. can be extracted.

Most of the details are selected during configuration by directly clicking over them and selecting Capture Text, or by using the Capture Following Text feature whenever applicable. Regular expressions are required to correctly extract Zestimate and images – details of which are provided in the video.

Watch more demonstration videos related to scraping data from Zillow

Try WebHarvy

If you are new to WebHarvy, we highly recommend that you download and try our free trial version. WebHarvy is very easy to configure and run to scrape data from most websites.

Get Started with Web Scraping using WebHarvy

Questions?

If you have any questions, please feel free to contact our tech support.

Extracting opening odds from oddsportal website for any bookmaker

Opening odds

On the oddsportal website, opening odds values are displayed in a tooltip/popup as you hover the mouse over the odds values, as shown below. Because the popup is only displayed while the mouse hovers over the value, directly clicking and selecting the opening odds value from the popup does not work.

How to extract opening odds values for any bookmaker from oddsportal?

The trick is to run JavaScript code which simulates a ‘mouse hover’ over the required item during mining. Then, when we directly select the value from the popup during configuration, it will be correctly extracted during mining. The following video explains the steps involved. It also explains how you can modify the JavaScript code to extract opening odds values for any bookmaker, and for home, draw or away values.

The JavaScript code used to simulate mouse hover and the RegEx string used to extract the odds value can be found in the video description.

If you have any questions or need assistance in configuring WebHarvy for oddsportal extraction, please do not hesitate to contact our support at https://www.webharvy.com/support.html.

If you are new to WebHarvy: WebHarvy is a visual web scraper which makes it very easy to build scrapers for any website. To know more, please visit https://www.webharvy.com/articles/getting-started.html

How to get data for Machine Learning projects?

The need for data

Machine learning algorithms require large quantities of high quality data to learn. Data is required to train, test and validate machine learning models before they can be used for prediction. The success of a machine learning project depends heavily on the quality and quantity of data used for training and testing the model.

Public Data-sets for Machine Learning

For learning ML and playing around with various ML algorithms and libraries, there are many public data-sets available. But for training and testing models which solve problems unique to your project, the required data may not be readily available in the public domain.

Web Scraping for collecting training/testing data

In such cases, the required data might already be present online in a structured format. Web scraping can then be used to extract it to a spreadsheet or database.

For example, if your model learns from thousands of reviews/ratings provided by customers for various products in an eCommerce website or for various hotels/restaurants in sites like TripAdvisor, then this data can be easily fetched using web scraping. Or, if your model learns from real estate data of thousands of properties from various locations, then that too can be extracted by employing web scraping.
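If you choose to write your own script instead, the core step is turning repeated HTML blocks (for example, one block per customer review) into spreadsheet rows. The sketch below uses only Python's standard library (html.parser and csv). The markup and the class names review, rating and text are hypothetical; a real site would need its own selectors as well as pagination.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup: one <div class="review"> per review.
SAMPLE_HTML = """
<div class="review"><span class="rating">5</span><p class="text">Great hotel</p></div>
<div class="review"><span class="rating">3</span><p class="text">Noisy at night</p></div>
"""


class ReviewParser(HTMLParser):
    """Collects one [rating, text] row per review block."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished [rating, text] rows
        self._current = {}   # fields of the review being parsed
        self._field = None   # field that incoming text belongs to, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "review" in classes:
            self._current = {}
        elif "rating" in classes:
            self._field = "rating"
        elif "text" in classes:
            self._field = "text"

    def handle_data(self, data):
        if self._field is not None:
            self._current[self._field] = self._current.get(self._field, "") + data

    def handle_endtag(self, tag):
        self._field = None
        # Assumes each review is a flat <div> with no nested <div> elements.
        if tag == "div" and self._current:
            self.rows.append([self._current.get("rating", "").strip(),
                              self._current.get("text", "").strip()])
            self._current = {}


def reviews_to_csv(html: str) -> str:
    """Parse review blocks from `html` and return them as CSV text with a header row."""
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["rating", "text"])
    writer.writerows(parser.rows)
    return out.getvalue()


print(reviews_to_csv(SAMPLE_HTML))
```

The resulting CSV text can be saved to a file and opened in any spreadsheet application or loaded into a training pipeline.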

Using WebHarvy for easy web scraping

You can either write your own script/code to fetch data from multiple pages of various websites or, more easily, use a visual web scraping tool like WebHarvy to get the data you need with the least effort. In case you are interested, please follow the link below to know more.

Getting started with WebHarvy for Web Scraping

Have any questions?

Feel free to contact us if you have any questions or need any assistance in fetching data using WebHarvy.

How to automatically extract high resolution product images from Amazon using WebHarvy

WebHarvy can be used to extract product data (product details, images, specifications, rank, reviews, rating etc.) from Amazon.

Learn more about image extracting using WebHarvy

Scraping high resolution product images from Amazon

The following video demonstrates 2 methods. The first method shows how multiple medium resolution images can be automatically extracted from the thumbnail images displayed beside the main product image. The second method shows how the original high resolution images can be extracted, but repeating it for multiple images involves some manual steps.

If you are new to WebHarvy, we highly recommend that you follow the link below to familiarize yourself with the general steps to follow while using WebHarvy to extract data from any website.

https://webharvy.com/articles/getting-started.html

If you have any questions, feel free to contact our technical support.

How to easily scrape sports betting odds using WebHarvy?

WebHarvy is a visual web scraper with a point-click-select interface for easily extracting data from any website.

Betting Odds for Sports Analytics

Getting sports betting odds values from multiple bookmakers and odds comparison websites like oddsportal is crucial for sports analytics and betting. Once you get the necessary odds values in table format, processing/visualizing them for your requirements becomes quite easy.

WebHarvy can help you extract the required odds values like opening/closing odds for home/draw/away – Asian Handicap (AH), Over Under (O/U) etc. from various odds comparison websites like oddsportal, betexplorer, flashscore etc. We have demonstration videos on our YouTube channel explaining the steps to follow for extracting data from these websites as per various requirements.

Scraping odds from OddsPortal

Extracting bet365 home, draw, away odds of matches from various leagues from oddsportal website

Scraping Over/ Under values for Pinnacle from oddsportal.com

All oddsportal demonstration videos

Extracting odds from FlashScore

Opening and Closing odds extraction from Flashscore.com

Over/Under odds extraction from Flashscore.com using WebHarvy

Bet365

Extracting match stats and betting odds values from bet365.com using WebHarvy

BetExplorer

How to extract bet365 odds from BetExplorer.com website?

How to scrape opening odds of a specific bookmaker from BetExplorer website?

Know More

In case you are interested, we highly recommend that you download and try the free evaluation version of WebHarvy available on our website.

To get started, please click here

Questions?

If you have any questions, please feel free to contact our tech support team at the following link.

Contact us

WebHarvy Settings Explained

WebHarvy Settings include various options which you can set for Miner, Browser, Proxies, Category/Keyword and Images. We have created the following video which explains the various settings, how each of them can affect mining performance/consistency, and what additional functionality they provide.

Please contact our support if you have any questions.

WebHarvy 5.4 (Auto delete cookies, Load more data using JS)

What is new in WebHarvy Version 5.4?

Automatically delete cookies while mining

Websites can get details regarding your previous visits using cookies stored locally by the browser. A new Browser Settings option has been added to prevent this. WebHarvy will periodically delete browser cookies during mining when this option is enabled.

New pagination method: Load more data in the same page using JavaScript

In Pagination using JavaScript, you can now specify the pagination type. Select the ‘Load Next Page’ option if the code loads a new page of data. Select the ‘Load More Data’ option if the code loads more data in the same page, without loading a new page. This option helps you handle pages where more data is loaded in the same page when a button/link is clicked, or when you scroll down the page.

Transfer license to another machine

It is now possible to transfer your WebHarvy license from one PC/laptop to another. For details, please refer to this link.

Edit mining tasks in Windows Task Scheduler interface

For finer control over how a mining task should be triggered and how often it should be repeated, you can now edit WebHarvy mining tasks directly in the Windows Task Scheduler interface.

Minor changes

  1. License upgrades can now be purchased directly from the application, from Help menu > About or while trying to unlock the software
  2. Prompts to save unsaved configurations
  3. Displays previously set JavaScript code for pagination, if user selects JavaScript pagination option again

Download

You may download and install the latest version from the following link:

http://www.webharvy.com/download.html

New Video Tutorial Series

We have also updated our Video Tutorial Series.

How to extract owner phone number and address from Zillow (Sale By Owner) listings ?

The following video shows how WebHarvy can be configured to extract owner phone numbers and addresses from Zillow’s ‘Sale By Owner’ listings.

The regular expression strings used in the video to follow listing links and to correctly extract phone numbers are:

href="([^"]*)

(\d{3}-)?(\d{3}-\d{4})|(\(\d{3}\) ?\d{3}-\d{4})
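To see what these two patterns match, the Python sketch below applies them to made-up snippets (not real Zillow markup). The link pattern is written with plain ASCII double quotes, which is what HTML attribute values use. The URL and phone numbers are hypothetical.

```python
import re

# The two patterns from this post, as Python raw strings.
LINK_PATTERN = r'href="([^"]*)'
PHONE_PATTERN = r'(\d{3}-)?(\d{3}-\d{4})|(\(\d{3}\) ?\d{3}-\d{4})'

# Hypothetical snippets for illustration only.
listing_html = '<a href="https://www.example.com/homedetails/123/">View listing</a>'
owner_text = "Call (555) 123-4567 or 555-987-6543 after 6 pm."

# The link pattern captures everything after href=" up to the closing quote.
link = re.search(LINK_PATTERN, listing_html).group(1)
# group(0) is the whole matched phone number, whichever alternative matched.
phones = [m.group(0) for m in re.finditer(PHONE_PATTERN, owner_text)]
print(link)    # https://www.example.com/homedetails/123/
print(phones)  # ['(555) 123-4567', '555-987-6543']
```

The phone pattern accepts numbers written as 555-123-4567, 123-4567 or (555) 123-4567, which covers the common US formats seen in owner listings.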

How to get property data?

Millions of records of property details are publicly available on real estate websites like Zillow, Realtor, Trulia etc., or on other online real estate websites specific to your country/region. If quick access to this data is vital to the success of your business, you can use our software, WebHarvy, to easily extract data from these websites.

Property details like address, location, images etc., building details like number of beds, baths, area etc. and contact details like owner/agent phone number, email etc. can be easily extracted using WebHarvy.

The following video shows how WebHarvy can be configured to extract property listing details from Zillow.

Property details from Realtor website can be extracted as shown in the following video.

You may also watch our entire playlist of videos related to real estate data extraction.

If you are interested in knowing more about data extraction using WebHarvy, we highly recommend that you refer to the following link.

Getting started with Web Data Extraction using WebHarvy

If you have any questions, please contact our technical support at the following link:

http://webharvy.com/support.html