The User-Agent string of a web browser helps servers (websites) identify the browser (Chrome, Edge, Firefox, IE, etc.), its version, and the operating system (Windows, Mac, Android, iOS, etc.) on which it is running. This mainly helps websites serve different pages for various platforms and browser types.
Websites can use the same detail to block non-standard web browsers and bots. To prevent this, a web scraper can be configured to mimic a standard browser’s user agent.
WebHarvy, our generic visual web scraper, allows you to set any user agent string for its mining browser, so that websites treat the scraper as a normal browser and do not block access. To configure this, open WebHarvy Settings and go to the Browser settings tab. Here, enable the custom user agent string option and paste the user agent string of a standard browser such as Chrome or Edge.
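WebHarvy applies the user agent through its settings UI, but the underlying idea is the same in any scraper: send a browser-like `User-Agent` header with each request. The following sketch shows this with Python's standard library; the Chrome user agent string shown is just an example, and in practice you would copy a current one from a real browser.

```python
import urllib.request

# Example Chrome user agent string (an assumption for illustration);
# copy an up-to-date one from an actual browser before real use.
CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
             "AppleWebKit/537.36 (KHTML, like Gecko) "
             "Chrome/120.0.0.0 Safari/537.36")

# Attach the header so the server sees a normal browser instead of
# Python's default user agent.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": CHROME_UA},
)
print(req.get_header("User-agent"))
```

Here only the request object is built; calling `urllib.request.urlopen(req)` would actually fetch the page with the spoofed user agent.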
How to automatically download images from Instagram searches?
In addition to downloading images, WebHarvy can also scrape textual data from Instagram, such as post content, a profile’s followers, etc.
If you are interested, we highly recommend downloading and trying the free evaluation version of WebHarvy available on our website. To get started, please follow the link below.
WebHarvy can be used to scrape data from social media websites like Twitter, LinkedIn, Facebook, etc. In the following video you can see how easy it is to scrape tweets from Twitter searches using WebHarvy. A similar technique can be used to scrape tweets from a Twitter profile page.
In case you are interested, we recommend downloading and trying the free evaluation version of WebHarvy available on our website. To get started, please follow the link given below.
The following are the main changes in this version.
Option to leave a blank row when data is unavailable for a keyword/category/URL
In WebHarvy’s Keyword/Category settings page, a new option has been added to leave a blank row (containing only the corresponding keyword/category/URL) when data is unavailable for that item. This option is available only when the ‘Tag with Category/URL/Keyword’ option is enabled.
For mining data using a list of keywords, categories or URLs, enabling this option helps in identifying the items for which WebHarvy failed to fetch data, as shown below.
Proxies are used internally by WebHarvy, not system-wide
In earlier versions, proxies set in WebHarvy Settings were applied system-wide during mining. This caused side effects for other applications, especially when proxies required login with a user name and password, or when a list of proxies was cycled. Starting from this version, WebHarvy uses proxies internally, so other applications are not affected during mining. You can still apply proxies directly in Windows settings (system-wide) and WebHarvy will use them automatically.
Also, the configuration browser now uses the proxies set in WebHarvy Settings. In earlier versions, proxies were used only during mining.
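The distinction between system-wide and application-internal proxies can be illustrated with a short Python sketch: instead of changing the operating system's proxy settings, the application builds its own proxy-aware opener, leaving every other program untouched. The proxy address and credentials below are placeholders, not real values.

```python
import urllib.request

# Hypothetical proxy with login credentials (placeholder values only).
proxy = {
    "http": "http://user:password@203.0.113.10:8080",
    "https": "http://user:password@203.0.113.10:8080",
}

# A dedicated opener routes only this application's requests through
# the proxy; no system-wide setting is modified, so other programs
# on the machine are unaffected.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy))
# opener.open("https://example.com/")  # requests would now go via the proxy
```

Cycling a list of proxies then simply means rebuilding the opener with the next proxy entry, again without touching system settings.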
While saving/exporting mined data to a database or Excel file which already contains data (from a previous mining session), WebHarvy now allows you to update those rows whose first-column value matches the newly mined data, without creating duplicate rows.
For file export this option is currently available only for Excel files.
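The update behaviour described above can be sketched in plain Python: rows are keyed on their first column, so a newly mined row with an existing key replaces the old row instead of creating a duplicate. (WebHarvy's actual Excel/database logic is internal; the function and sample data below are only an illustration.)

```python
def merge_rows(existing, new):
    """Replace rows sharing a first-column value; append the rest."""
    merged = {row[0]: row for row in existing}
    for row in new:
        merged[row[0]] = row  # same key -> overwrite, no duplicate row
    return list(merged.values())

old_rows = [["SKU-1", "$10"], ["SKU-2", "$20"]]
new_rows = [["SKU-2", "$25"], ["SKU-3", "$30"]]
print(merge_rows(old_rows, new_rows))
# [['SKU-1', '$10'], ['SKU-2', '$25'], ['SKU-3', '$30']]
```

Note that SKU-2 is updated in place while SKU-3 is appended, which is exactly the no-duplicates outcome the feature aims for.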
New Capture window options: Page reload and Go back
Two new Capture window options for page interaction have been added: Reload and Go back. The Reload option is helpful in cases where a page does not load correctly the first time a link is followed. The ‘Go back’ option navigates the browser back to the previously loaded page.
Keywords can be added even after starting configuration
During configuration, in pages reached by following links from the starting page, links (URLs) selected by applying Regular Expressions on HTML can be followed using the ‘Follow this link’ option. Earlier, only the Click option was available for this scenario.
Encoded URLs selected from HTML are now handled automatically. Example: URLs containing encoded characters such as ‘&amp;’. This works for following links as well as for image URLs.
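To illustrate what "handling an encoded URL" means, the sketch below uses Python's standard `html` module to decode an ampersand that appears as `&amp;` in a page's HTML source (the sample URL is made up):

```python
import html

# A URL as it may appear in raw HTML source, with '&' encoded as '&amp;'.
raw = "https://example.com/search?q=shoes&amp;page=2"

# html.unescape() converts HTML entities back to plain characters,
# yielding a URL that can actually be fetched.
print(html.unescape(raw))
# https://example.com/search?q=shoes&page=2
```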
Fixed a bug related to scraping a list of URLs when one of the URLs fails to load.
While scraping a list of URLs, URLs which do not start with an HTTP scheme (http:// or https://) are now handled.
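One straightforward way to handle scheme-less URLs, shown here only as an illustrative guess at the behaviour, is to prefix a default scheme when none is present:

```python
def ensure_scheme(url, default="https://"):
    """Prefix URLs that lack an http/https scheme; leave others unchanged."""
    if url.startswith(("http://", "https://")):
        return url
    return default + url

print(ensure_scheme("example.com/page"))    # https://example.com/page
print(ensure_scheme("http://example.com"))  # http://example.com (unchanged)
```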
Download the latest version
The latest version of WebHarvy is available here. If you are new to WebHarvy, we recommend viewing our ‘Getting started’ guide.
Web Scraping is the automated process of extracting data from websites using software or an online service. This technique can be used to easily extract property owner or real estate agent contact details from websites like Zillow, Trulia, Realtor etc.
WebHarvy is a point and click, visual web scraper which can be used to extract data from websites.
Getting agent phone numbers
Most real estate websites allow you to search and view details of agents catering to a specific region. The following video shows how WebHarvy can be used to extract agent contact details like name, address, phone number, etc. from Zillow.
Getting owner/agent contact details from property listings
Owner or agent contact details can also be extracted from property listings as shown in the following videos.
Scraping agent phone numbers from property listings
Scraping owner phone numbers from property listings
Scraping leads from Realtor
The following video shows how agent contact details can be extracted from the Realtor website.
We have an entire playlist of videos related to real estate data extraction which you may watch at this link. WebHarvy can be used to extract data automatically from any website.
We recommend that you download and try using the free evaluation version of WebHarvy to know more. To get started, please follow the link below.
WebHarvy is a generic visual web scraping software which can be easily configured to extract data from any website. In this article we will see how WebHarvy can be configured to extract data from Bing maps.
Details like business name, address, phone number, website address, rating, etc. can be easily extracted from Bing maps listings using WebHarvy. As with most map interfaces, the details open in a popup over the map. The following video shows how WebHarvy can be configured to extract the required details.
As shown in the above video, the Open Popup feature of WebHarvy is used to open each listing’s details and scrape the data displayed. The Capture following text feature is used to correctly select details like address, website, phone number, etc. This method of data selection is recommended whenever the data is guaranteed to appear after a heading text.
Sometimes, the Bing maps interface displays a ‘Website’ button which you can click to visit the listed business’s website. In such cases, the website address itself is not displayed in the listing popup.
1. To extract website address in such scenarios, during configuration, highlight and click the entire popup area as shown in the following image.
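After the popup's content is captured, a regular expression can pull the website address out of the underlying HTML. The fragment and pattern below are hypothetical, meant only to show the general idea of extracting an `href` value from a ‘Website’ button:

```python
import re

# Hypothetical popup HTML fragment containing a 'Website' button;
# the real Bing maps markup may differ.
popup_html = '<a class="website" href="https://contoso-cafe.example">Website</a>'

# Capture the URL inside the href attribute.
match = re.search(r'href="(https?://[^"]+)"', popup_html)
if match:
    print(match.group(1))  # https://contoso-cafe.example
```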
WebHarvy can be used to scrape data from the TripAdvisor website. In this article we will see how WebHarvy can be configured to scrape reviews and ratings from multiple listings on TripAdvisor.
By default, TripAdvisor does not display the complete review text in its listings pages. You will have to click a ‘Read more’ link at the end of each partially displayed review to view the complete review. This can be automated using WebHarvy as shown in the following video.
Regular expression strings are used to correctly select the date of review, and also the rating numerical value. The rating value is selected from the HTML source of the rating stars displayed by the website. The RegEx strings used are copied below.
wrote a review (.*)
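Applied to the text surrounding a review, the pattern above captures whatever follows the phrase "wrote a review" — typically the review date. The sample text below is an invented example of such a snippet, not actual TripAdvisor output:

```python
import re

# Hypothetical captured text next to a review; real TripAdvisor
# markup and wording may differ.
text = "JohnD wrote a review Jul 2021"

# The RegEx string from the article: capture everything after the phrase.
match = re.search(r"wrote a review (.*)", text)
print(match.group(1))  # Jul 2021
```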
We have several videos on our YouTube channel related to TripAdvisor data extraction. You may watch them at the following link.
In this article we will see how WebHarvy can be used to extract product data from eBay listings. Details like product name, price, product URL, item specifications (condition, weight, UPC/MPN, etc.), seller description, etc. can be extracted. WebHarvy can also extract product images (thumbnails as well as high-resolution images) from eBay product listings. The following video shows the steps involved.