Scraping Zillow to extract property details | Real Estate Data Extraction

WebHarvy can be used to easily extract property details from real estate websites like Zillow, Trulia, Realtor etc. In this article, we discuss how WebHarvy can be used to extract property details from Zillow.com listings. WebHarvy is very easy to configure and use to extract data from most websites. The point and click interface of … Read more

WebHarvy 5.2 | UI revamp + Oracle db support

Changes in 5.2 are mainly related to user interface and experience. The most visible change is the introduction of the ribbon menu system for providing easy access to most software features. In addition to the main interface, other windows like Scheduler / Export etc. have also been updated. The export functionality (to file or database) has … Read more

WebHarvy based on Google Chrome Released (version 5.0.1.148)

This release comes with few bells and whistles, since we have not added features or changed the appearance of the software. Still, this is a major upgrade. The change is all internal. WebHarvy has been using Microsoft’s Internet Explorer (IE) as its internal browser since inception. Microsoft stopped supporting IE a few years back when … Read more

WebHarvy 4.0.3.128 (Minor Update)

From this release onward, WebHarvy targets (depends on) .NET 4.5, which comes pre-installed on the latest Windows editions. This results in a smoother installation process, doing away with the .NET 3.5 download and install that was previously required. Targeting .NET 4.5 also helps WebHarvy improve performance and resource usage, and to solve issues related to crashes while … Read more

WebHarvy version 3.4 released!

We’ve just released a new WebHarvy update. The following are the changes in this version. Major: Support for pagination where a link/button has to be clicked to load the next set of pages. URL-based pagination: automatically increment a numeral in the start page URL to load subsequent pages. One-click multiple image extraction … Read more
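
To illustrate the general idea behind URL-based pagination, here is a minimal Python sketch. It is not WebHarvy's implementation: the function name `paginated_urls`, the choice to increment the last numeral in the URL, and the example.com address are all assumptions made for illustration.

```python
import re


def paginated_urls(start_url, pages, step=1):
    """Yield `pages` URLs, starting with start_url itself.

    Hypothetical helper: it increments the LAST run of digits found in
    the URL, which is an assumption about which numeral is the page number.
    """
    matches = list(re.finditer(r"\d+", start_url))
    if not matches:
        raise ValueError("start URL contains no numeral to increment")
    last = matches[-1]
    base = int(last.group())
    width = len(last.group())  # keep zero padding such as "08"
    for i in range(pages):
        number = str(base + i * step).zfill(width)
        yield start_url[:last.start()] + number + start_url[last.end():]


# Placeholder start page URL, not a real listing site
urls = list(paginated_urls("https://example.com/listings?page=1", 3))
for url in urls:
    print(url)
```

A site that pages by offset rather than by page number (for example 0, 20, 40) could be handled in this sketch by passing a different `step`.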

Use 'Capture Following Text' option to scrape data from details pages

While extracting data from details pages (pages reached by following a link from the start page), it is recommended that the ‘Capture Following Text’ option be used whenever possible to correctly and consistently scrape data. This is because the layout and the amount of data displayed in details pages may not be consistent. For example, … Read more

WebHarvy 3.1 (Minor Update)

The 3.1 update of WebHarvy, which was released yesterday (July 24), has the following changes. Added an option to tag captured data rows with the corresponding Keyword/Category (applicable only for Keyword/Category based Scraping). See the new Miner Settings Window (Edit menu – Settings). Option to separately set Page Load Timeout and AJAX Load Wait Time in Miner … Read more

WebHarvy Version 3.0 Released!

We are happy to announce the release of WebHarvy 3.0. We have added a lot of new features in this major update. The list of features and changes for this update is the longest of any product update we have released to date. Here we go. Added the following options in the Capture Window (grouped under … Read more

How to scrape text following a heading using WebHarvy?

In the latest update of WebHarvy, the Visual Web Scraping Software, the newly introduced ‘capture following text’ option allows you to capture the text, block or paragraph following a heading within a webpage. On many websites the data to be scraped may not be located at the same position within all pages, but is guaranteed to be found … Read more