We are happy to announce the release of WebHarvy 3.0. This major update adds many new features, and its list of features and changes is the longest of any product update we have released to date. Here we go.
- Added the following options in the Capture Window (grouped under ‘More Options’):
  - Capture following text: Improved by using a brute-force search over all elements in the page
  - Capture HTML: Option to scrape the HTML of the selected element
  - Capture Text as File: Option to scrape text and save it as a local file (useful when scraping articles and blog posts)
  - Click: Ability to scrape hidden (partially displayed) fields in webpages which require a click from the user to be displayed in full, for example phone numbers or email addresses which are displayed completely only when you click them
  - Apply Regular Expression: Option to apply Regular Expressions (RegEx) to captured text. RegEx can be applied even after applying the ‘Capture following text’, ‘Capture HTML’ and ‘Capture More Content’ options.
  - Capture More Content: Option to capture more text than the selected text by capturing the parent element’s text. For example, this captures the entire article if you apply the option after selecting the first paragraph.
- Option to individually select categories/links (one by one) for Category Scraping (Mine menu – Scrape a list of similar links)
- Export captured data as JSON
- Ability to mine data from tables (row-column / grid layout)
- Ability to mine pages which have fewer than 10 data items
- Option to test proxies before using them (Edit menu – Settings – Proxy Settings)
- Non-responsive proxies are skipped during mining. Mining no longer stops because of a bad or non-responsive proxy in the list.
- Option to manually add URLs to an existing configuration (Edit menu – Add URLs to configuration)
- Option to remove duplicates while mining (Edit menu – Settings – Miner)
- Added ‘Hourly’ frequency option in Scheduler (Mine menu – Scheduler)
- Added an option to export data directly to a database for scheduled mining tasks and command-line runs
- Added ‘Clear’ option in Edit menu which will clear both the browser and data preview pane
- Language encoding defaults to ‘utf-8’ for file exports (XML, CSV, etc.)
- CSV/Database export: handles delimiters (commas, quotes, etc.) in captured data
- Keyword/Category scraping allowed for 2 entries in evaluation version
- Rendering issues with the built-in browser fixed – defaults to IE 9 rendering
- New Installer built with InstallShield
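
As a rough illustration of what the ‘Apply Regular Expression’ option in the list above does conceptually, here is a small Python sketch that applies a RegEx to a piece of captured text. This is not WebHarvy code: the function name `apply_regex`, the sample text and the pattern are all hypothetical, and WebHarvy's own option is set in the Capture Window and may use a different RegEx flavor.

```python
import re

def apply_regex(captured_text, pattern):
    """Return the first match of `pattern` in `captured_text`, or None.

    Hypothetical helper for illustration only. If the pattern contains a
    capture group, the first group is returned; otherwise the whole match.
    """
    match = re.search(pattern, captured_text)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)

# Hypothetical captured text: keep only the numeric part of the price.
captured = "Price: $1,299.00 (incl. tax)"
print(apply_regex(captured, r"\$([\d,]+\.\d{2})"))  # prints 1,299.00
```

Returning the first capture group when one is present lets a single pattern both locate the text and trim away the surrounding characters.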
Download the latest installation of WebHarvy Web Scraper from https://www.webharvy.com/download.html.