WebHarvy Version History

Complete release notes and changelog from the first version to the latest release.
Detailed version updates are available at webharvy.com/blog/category/release-update/

Build 245 - 7.9.0.245

November 27, 2025

  • Browser updated to Chromium v139
  • Updated internal libraries and user agent strings

Build 244 - 7.8.0.244

October 8, 2025

  • Element highlighting and text selection accuracy during configuration improved by accounting for DPI scaling
  • Internally used libraries updated
  • Minor changes in quick start index page
  • Excel export now groups data rows within a table.
  • Fixed a bug when importing data from another Excel file.

Build 242 - 7.7.0.242

August 19, 2025

  • Browser component upgraded: Updated to latest Chromium for better compatibility with modern websites
  • Enhanced silent installation: The installer's silent installation mode has been improved for easier enterprise deployment
  • Improved link mining reliability: Automatic retry mechanism ensures more successful data extraction when pages fail to load initially
  • Excel export improvements: Fixed issue where data exceeding Excel's maximum cell length limit caused export failures

Build 239, 240 - 7.7.0.239/240

May 20, 2025

  • Updated Browser with latest Chromium: Enhanced compatibility and performance with modern web technologies
  • Improved 'Follow this link' functionality: Better navigation and link processing for more accurate data extraction

Build 238 - 7.7.0.238

May 1, 2025

  • Updated browser: Improved handling of Cloudflare protection.
  • Link Extraction Fix: Resolved an issue where the “Follow this link” option was disabled on many websites, by automatically applying Capture More Content and Link_RegEx and adjusting the MinLevelsUp miner setting.
  • Excel Export: Switched from SpreadSheetLight to ClosedXML library to fix Excel export issues.
  • UI Framework: Updated Dock Panel Suite package and fixed missing panel issue.
  • Scheduler: Updated Task Scheduler library.
  • Database: Updated MySQL and SQL Server libraries.
  • Proxy Handling: Now supports proxy import in ip:port:username:password format.
  • Pagination Fix: Pagination data is now correctly overwritten while editing configuration files. (Earlier, multiple pagination data items could exist in a single configuration.)
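As an illustration of the ip:port:username:password proxy format, a line in that format could be split into its fields as in the Python sketch below. This is not WebHarvy's implementation; the `Proxy` class and `parse_proxy_line` function are hypothetical, and accepting plain ip:port lines without credentials is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proxy:
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy_line(line: str) -> Proxy:
    """Parse 'ip:port' or 'ip:port:username:password' (hypothetical helper)."""
    # Split into at most 4 parts so a password containing ':' stays intact.
    parts = line.strip().split(":", 3)
    if len(parts) not in (2, 4):
        raise ValueError(f"unsupported proxy format: {line!r}")
    host, port = parts[0], int(parts[1])
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    if len(parts) == 4:
        return Proxy(host, port, parts[2], parts[3])
    return Proxy(host, port)


print(parse_proxy_line("203.0.113.5:8080:alice:s3cret"))
```

Limiting the split to four fields is a design choice for the sketch: it keeps any colons inside the password instead of rejecting the line.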

Build 233 - 7.6.0.233

January 2, 2025

  • Updated browser component: Enhanced handling of Cloudflare-protected websites for better access to secured content
  • Improvement of the Input Text feature: Better compatibility with modern JavaScript frameworks like React and Vue for dynamic search functionality

Build 230 - 7.5.0.230

November 15, 2024

  • In-app payment links updated
  • License key validation improved
  • Browser updated.

Build 228 - 7.4.0.228

November 7, 2024

  • Input text dynamically updates page content: Enhanced Input Text feature now triggers page updates automatically, perfect for dynamic search functionality
  • Improved page scroll functionality: Smoother page scrolling for better navigation through long pages and infinite scroll scenarios
  • Enhanced image download capabilities: Multiple images can now be captured and downloaded when using regular expressions to extract image URLs
  • Better configuration file formatting: XML configuration files are now auto-formatted for easier reading and debugging
  • Smart auto-scrolling for specific sites: Automatic scrolling enabled for sites like Zillow to load all available content

Build 222 - 7.3.0.222

June 18, 2024

  • Support for adding keywords via 'Input-Text' option: Enables keyword-based scraping even when search terms aren't visible in URLs, perfect for complex search scenarios
  • Miner options can now be saved in configuration: Each scraping configuration can maintain its own specific mining settings for consistent results
  • Updated browser component: Enhanced compatibility and performance improvements

Build 217 - 7.2.0.217

January 10, 2024

  • Updated quick start guide: Enhanced user onboarding experience with clearer instructions
  • Improved multi-level category scraping: Tag column now shows meaningful category paths instead of URLs for better data organization
  • Faster pagination processing: Significant performance improvements when handling multi-page data extraction
  • Browser updated to Chromium V117: Latest browser engine for better website compatibility
  • Enhanced update notifications: Improved links and user experience for version updates
  • Updated core libraries: Enhanced database connectivity (PostgreSQL, MySQL, Oracle) and task scheduling capabilities

Build 215 - 7.1.0.215

August 24, 2023

  • Enhanced Pagination Handling: Improved support for infinite scroll and load-more techniques commonly used on modern websites
  • Dynamic frame URL handling: Open Frame option now automatically manages frame URLs for better iframe content extraction
  • Smart URL conversion: Automatically converts relative URLs to absolute URLs during mining for more reliable link extraction
  • Faster normal pagination: Performance improvements for traditional page-by-page navigation
  • Configurable element highlighting: New browser settings allow disabling element highlighting during setup for cleaner interface
  • Updated browser engine: Enhanced compatibility and performance
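Converting relative URLs to absolute URLs generally means resolving each link against the URL of the page it was found on (reference resolution as defined in RFC 3986). A minimal Python sketch using the standard library's `urllib.parse.urljoin` is shown below. The `to_absolute` helper is hypothetical and this is not WebHarvy's code.

```python
from urllib.parse import urljoin


def to_absolute(page_url: str, href: str) -> str:
    """Resolve a link found on page_url into an absolute URL (hypothetical helper)."""
    # Absolute URLs are returned unchanged; relative ones are resolved against page_url.
    return urljoin(page_url, href.strip())


base = "https://example.com/shop/list?page=2"
for link in ["item/42", "/search?q=tv", "../about", "//cdn.example.com/a.png"]:
    print(to_absolute(base, link))
```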

Bug fixes:

  • Element tag names can sometimes include special characters that are reserved in CSS selector strings. These need to be escaped, which previous versions did not do.
  • Fixed issue related to getting image URLs using Regular Expressions during mining.
  • Selecting dropdown options: the selected item is now correctly reflected on the page by firing the 'change' event.
  • Fixed bug in normal pagination where the 'next page element text' also appears in another (non-pagination) element on the page.
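For reference, the CSSOM `CSS.escape()` algorithm ("serialize an identifier") defines how reserved characters in a selector identifier are escaped. The Python sketch below follows that algorithm. `css_escape` is a hypothetical name, and treating tag names like any other CSS identifier is an assumption, not a description of WebHarvy's internals.

```python
def css_escape(ident: str) -> str:
    """Escape a string for use as a CSS identifier (sketch of CSSOM 'serialize an identifier')."""
    out = []
    for i, ch in enumerate(ident):
        cp = ord(ch)
        is_digit = "0" <= ch <= "9"
        if cp == 0:
            out.append("\ufffd")              # NUL becomes U+FFFD
        elif 0x01 <= cp <= 0x1F or cp == 0x7F:
            out.append(f"\\{cp:x} ")          # control characters: hex escape
        elif i == 0 and is_digit:
            out.append(f"\\{cp:x} ")          # an identifier cannot start with a digit
        elif i == 1 and is_digit and ident[0] == "-":
            out.append(f"\\{cp:x} ")          # nor with '-' followed by a digit
        elif i == 0 and ch == "-" and len(ident) == 1:
            out.append("\\-")                 # a lone '-' is not a valid identifier
        elif cp >= 0x80 or ch in "-_" or (ch.isascii() and ch.isalnum()):
            out.append(ch)                    # safe characters pass through
        else:
            out.append("\\" + ch)             # reserved ASCII such as ':', '.', '#'
    return "".join(out)


# Hypothetical tag name containing a reserved character
selector = f"{css_escape('my:widget')} > span"
print(selector)
```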

Build 207 - 7.0.1.207

May 2, 2023

  • Added "Scroll List" feature: New capture window option to automatically scroll through lists of items for comprehensive data extraction
  • Reduced script usage: Enhanced support for infinite scroll and "load more" pagination without requiring custom JavaScript code
  • Smart element targeting: Clicks on links, buttons and tabs are now text-based, ensuring reliable interaction even when element positions change

Bug fixes:

  • 'Scroll to load next page' option cleared (if checked) when configuration is stopped.

Build 204 - 6.7.0.204

March 17, 2023

  • Added PostgreSQL support: New database export option for PostgreSQL alongside existing MySQL and Oracle support
  • Updated browser component: Enhanced compatibility and performance improvements

Build 203 - 6.7.0.203

March 1, 2023

  • Major browser update that overcomes blocking by websites that employ Cloudflare, PerimeterX, etc.
  • Major installer change: moved from an .EXE (InstallShield) installer to an MSI (WiX) installer
  • Added support for PostgreSQL database export
  • Text data captured as files can be named based on value of another column (like images)

Bug Fixes:

  • Fixed bug related to clearing all proxies in WebHarvy settings.
  • Minor bug fix related to setting normal pagination link via JavaScript
  • 'AJAX Load Wait Time' renamed to 'Script Load Wait Time'

Build 199 - 6.6.0.199

November 2022

  • Bug fix in RunScript (JavaScript) functionality that prevents the application from hanging during mining

Build 198 - 6.6.0.198

October 4, 2022

Improvements:

  • Edit details of saved proxy servers: New feature allows editing proxy server details like username, password, address, and port for better proxy management
  • Updated browser to latest Chromium version: Enhanced compatibility and security improvements
  • JavaScript pagination support: Option to set pagination links via JavaScript when traditional pagination elements are not available on the page
  • Screenshot images can be named based on the value of the selected data column (Image Settings)
  • Duplicate removal now ignores differences in the category/keyword tag column.
  • Added dropdown option to select User Agent strings of popular browsers like Chrome, Firefox, Edge etc. in Browser Settings.
  • Support for running JavaScript code on all pages without selecting a placeholder primary column.
  • Regular Expression window now contains prebuilt RegEx strings to select prices in various currencies.

Bug Fixes:

  • Bug fix in Category Scraping feature which prevents infinite looping on certain types of category pages.

Build 194 - 6.5.0.194

June 2, 2022

  • Bug fix for MySQL server (8.0.29) export: Updated MySQL library to ensure compatibility with the latest MySQL server versions
  • Updated internal libraries: Enhanced SpreadSheetLight library for improved Excel export functionality and reliability

Build 193 - 6.5.0.193

April 13, 2022

  • Internal browser updated to Chromium V96: Latest browser engine for improved website compatibility and performance
  • Added "Auto Enable 'Follow this Link' option": Automatically detects when elements contain URLs and enables link following for streamlined navigation
  • Enhanced Regular Expression support: Multiple item matching with commonly used RegEx patterns available via dropdown for easier configuration
  • Improved image organization: Option to save images in separate folders based on column names for better file management
  • Better URL handling: Support for relative URLs in 'Add URLs' window with improved export file format synchronization
  • Enhanced user experience: Warning when closing miner without saving data, clearer terminology for script loading settings

Build 192 - 6.4.0.192

December 2021

  • Updated browser component to the latest available version
  • Fixed bug in capturing multiple image URLs from starting page (Image_URL_RegExMulti)

Build 191 - 6.4.0.191

December 21, 2021

  • Improved performance of "Capture Following Text" option: Faster data mining with enhanced text extraction capabilities
  • Faster data mining: General performance improvements for quicker scraping operations
  • Database export feature updates: Enhanced functionality for exporting data to various database formats
  • Updated browser to latest possible version of Chromium
  • Updated the MySQL dependency library to the latest NuGet package version, fixing an SQL export issue

Build 190

August 1, 2021

  • Updated MySQL library to latest version (8.0.x) to fix authentication error (caching_sha2_password).

Build 189 - 6.3.0.189

September 9, 2021

Major Changes:

  • Added support for custom data fields: New fields include current page URL, page screenshot, mining date/time, and user-provided text for enhanced data collection
  • Smart field name suggestions: Automatically suggests meaningful field names for 'Capture following text' option to speed up configuration
  • Enhanced text capture performance: 'Capture following text' feature optimized for faster configuration and better user experience
  • Multiple image URL scraping: Now supports automatic extraction of multiple image URLs, expanding beyond just image downloads
  • Configuration-specific settings: Advanced Miner Options and registration info saved per configuration for easier project management
  • Updated browser engine: Latest browser version fixes loading issues with specific websites like bet365.com

Minor Changes:

  • Duplicate images are named -001, -002 etc. instead of -1, -2 etc.
  • For smart help (article/video search), the 'www' part of the website address is removed from the search query string.

Build 187 - 6.2.0.187

June 12, 2021

  • New capture window option to add custom fields - date/time, current page URL, constant string
  • Saves registration details and WebHarvy version in the configuration file.

Build 186 - 6.2.0.186

May 11, 2021

  • Updated browser to fix loading issue for websites like bet365.com (earlier versions did not load the page at all)

Build 185 - 6.2.0.185

March 29, 2021

  • Updated location of license server database to AWS RDS

Build 184 - 6.2.0.184

March 17, 2021

  • Added support for different proxy types (HTTP, HTTPS, SOCKS4, SOCKS5)
  • Added Browser Setting Option to use separate browser engine for mining links. Helps miner recover from crashes during mining.
  • Added Browser Setting Option to disable opening popups
  • Updated browser to latest Chromium version V86

Build 179 - 6.0.1.179

August 19, 2020

  • Updated browser
  • Bug fix: The 'Back' button of the configuration browser took the browser to the quick start guide (2 pages back) instead of loading the page from which the link was followed.

Build 178 - 6.0.1.178

August 15, 2020

Major:

  • Option to add blank row with tag for category/keyword/URL scraping when no data is fetched
  • Proxies are set only within WebHarvy, not system-wide, and are now applied in the configuration browser too.
  • Added 2 new Capture window options: More Options > Page > Reload and Go-Back
  • Added Update (Upsert) / Overwrite / Append options for database and Excel file export
  • Keywords can be added even after starting configuration

Minor:

  • 'Follow this link' option is enabled for Link_RegEx in details pages. Earlier only Click option was available.
  • Automatically handles HTML-encoded URLs selected from HTML (for example, URLs containing an encoded '&') for Link_RegEx and Image_RegEx/Multi.
  • Security increased for database connection string and EO license string
  • 'Activation Exceeded' message includes a link to 'How to transfer license?'
  • Installing a new version will reset the evaluation period. This will allow users (with expired trials) to try new versions.
  • 'Enable JavaScript', 'Share Location' and 'Enable plugins' options removed from Browser settings

Bug Fixes:

  • While scraping a list of URLs, URLs without HTTP scheme part (http:// or https://) are handled. Earlier such URLs failed to fetch data.
  • Previously, while scraping a list of URLs, if one URL failed to load, data from previously loaded URL was sometimes mined for its row. Fixed.

Build 174 - 6.0.1.174

February 19, 2020

  • Updated browser to Chromium v77
  • Fixed issue in multi URL scraping when one of the intermediate URLs does not have next page link.

Build 173 - 6.0.1.173

January 8, 2020

  • Switched from XPath to CSS selector for data selection
  • New setting 'Data Selection Accuracy' added in Advanced Miner Options
  • Fixed issue where data from followed links was sometimes extracted only if Link_RegEx was used

Build 172 - 5.5.2.172

December 20, 2019

  • Added support for .webp images
  • Added protocol security and a user agent string when checking image URLs for the image type

Build 171 - 5.5.2.171

November 27, 2019

  • Fixed issue related to the miner window freezing/hanging during mining, which happened especially when a larger number of parallel mining threads was used.

Build 170 - 5.5.1.170

November 22, 2019

  • Bug fix for configuration preview issue: Resolved issue where preview sometimes updated with only a single item during configuration
  • Enhanced .NET framework: Migrated to .NET 4.7 for better performance and security
  • Updated browser engine: Latest Chromium version for improved website compatibility
  • Better file naming: Increased image filename limit to 150 characters for more descriptive names
  • Improved focus handling: Fixed false click-through issues when switching between applications

Build 168 - 5.5.0.168

November 1, 2019

Improvements/Additions:

  • Added custom user agent string: Configure WebHarvy's browser to mimic different browsers and devices for better website compatibility
  • Improved frame handling: Enhanced support for iframes and frame-based websites with automatic frame opening options
  • Better form submission/navigation: Streamlined navigation without manual pattern management for clicks and form interactions
  • Enhanced browser features: Added search functionality (CTRL + F) and full page HTML capture for comprehensive data extraction
  • Faster scheduling options: Support for high-frequency mining tasks with 5, 10, 15, and 30-minute intervals
  • Enhanced security and compatibility: HTTP/2 support, client certificate handling, and improved web security settings
  • Updated Chromium engine: Latest browser technology for better website support

Bug Fixes:

  • Fixed issue with getting XPath correctly during configuration by disabling caching on failure
  • Preview generation stopped when configuration is stopped
  • Deleting fields not allowed while preview generation is in progress (preview form context menu disabled)
  • Pattern disabled after opening popups.
  • For configurations with additional URLs added, editing Start URL did not update first URL in the URL list. Fixed.
  • Single term/word search supported from browser address bar
  • Fixed bug in Keyword Scraping due to case sensitivity of keyword replacement in start URL / postdata.
  • Fixed bug: sometimes, while the application was starting up, the init page took a very long time to load.

Changes due to browser updates:

  • All WebView operations called on thread runner context
  • Element highlighting on mouse hover during configuration is handled in a separate thread
  • Handles both mouse-up and mouse-down events to prevent false click-throughs during configuration
  • All keyboard shortcuts handled in separate threads to prevent UI hang
  • Browsers for mining created when Miner window is displayed, and not during Start Mine.

Build 164 - 5.4.0.164

February 19, 2019

  • Auto-delete cookies feature: Automatically clear cookies during mining sessions to avoid being blocked by websites
  • New pagination method: "Load more data using JS" - Enhanced JavaScript-based pagination for sites with infinite scroll or load-more buttons
  • Improved license management: Options to deactivate and transfer licenses between machines, plus direct access to license upgrades
  • Enhanced task scheduling: Direct editing of scheduled mining tasks in Windows Task Scheduler for greater control
  • Better configuration management: Automatic prompts to save unsaved configurations and preservation of JavaScript pagination code

Build 161 - 5.3.0.161

November 21, 2018

  • Updated browser
  • Introduced Worker Process to solve crashes reported with latest Windows updates.
  • Fixed anchoring of controls in 'Run Java Script' window while resizing the window.

Build 160 - 5.3.0.160

October 24, 2018

  • Parallel mining capability: Configurable parallel mining threads (1-10 threads, default 4) to increase mining speed significantly
  • Chrome developer tools integration: Built-in developer tools for advanced debugging and website analysis directly within WebHarvy
  • Enhanced configuration experience: Improved sub-text selection with better highlighting and more accurate dropdown/listbox selection reflection
  • Smart pagination handling: Automatic page scrolling to ensure 'Load More' buttons are properly loaded and accessible
  • Better display compatibility: Fixed data selection issues with non-100% Windows text scaling settings
  • Fixed issue related to downloading images, especially when they are behind SSL (https://)
  • Fixed issue related to non-visibility of miner window in multi monitor systems when monitor configuration changes.
  • Fixed issue related to deleting browser cache in WebHarvy > Browser Settings. Earlier cache was not entirely deleted.
  • Fixed unresponsiveness of Capture window (for a short period of time), after applying RegEx on HTML.
  • Removed all Mobirise links from InitPage.
  • Added 'Firewall may be blocking' message in Activation failed window.
  • Removed browser emulation key addition (for previous IE version) from installer.
  • Added Zoom level and number of parallel mining threads to status bar info.
  • ALT + X assigned to Clear instead of CTRL + X which is associated with cut.
  • Fixed major issue with loading and displaying upgrade purchase page in cases where user's license has expired.
  • Disabled 'Mine all pages/Number of pages to mine' controls while mining is in progress.
  • Updated to the latest version of EO.WebBrowser.

Build 155 - 5.2.0.155

March 26, 2018

  • UI revamp with ribbon menu: Complete user interface overhaul featuring modern ribbon menu design and redesigned forms for better user experience
  • Oracle database support: New export option for Oracle databases alongside existing MySQL support
  • Updated export functionality: Enhanced threaded export system with cancellation support for both file and database exports
  • Excel as default export: Streamlined workflow with Excel as the default export format
  • Enhanced browser engine: Updated to EO.WebBrowser 18.0.98 fixing URL update issues
  • Multi line input form for JavaScript (Run Script)
  • Snippet of miner options displayed in status bar
  • Smart help for websites displayed in status bar
  • New 'Update Available' form
  • Added settings link in Miner form
  • Option to edit configuration (or run configuration) right after opening a saved configuration file.
  • Location sharing option in Browser Settings.
  • Preview columns are automatically resized.
  • Minor bug fixes.

Build 153

Private Release

  • Updated browser

Build 152 - 5.1.0.152

January 2018

  • Updated browser
  • Fixed bug where user could continue to unlock and use the software even after activation limit has been reached

Build 150 - 5.1.0.150

January 8, 2018

  • Direct Excel export: New native Excel export capability for seamless data output to spreadsheet format
  • Handles page numbers in JavaScript: Enhanced pagination support for JavaScript-driven page numbering systems
  • Updated Chromium engine: Enhanced browser stability with optimized settings to minimize crashes
  • Better zoom compatibility: Fixed data selection issues when browser zoom level is not set to 100%
  • Improved text formatting: Enhanced handling of line breaks and spaces in capture window for cleaner data extraction
  • Enhanced form handling: Better PostData management for form submissions and subsequent navigation
  • Bug fix in Keyword Scraping > Editing, wherein changing the first keyword was not possible
  • Self deactivates in case license is set as cancelled in WH License Server
  • Minimizes memory usage in the mining thread by limiting the number of browser instances created to 3, which are global and reused for the lifetime of the application
  • Address bar can be used for Google search

Build 148 - 5.0.1.148

September 13, 2017

  • Switched from Internet Explorer to Google Chrome browser engine: Revolutionary upgrade replacing IE with Chrome for dramatically better website compatibility and performance

Benefits:

  • Faster mining performance: Significantly improved data extraction speed with the new browser engine
  • Enhanced stability: More reliable and thoroughly tested browser integration
  • Better modern website support: Chrome engine handles JavaScript-heavy and modern web technologies much better than IE

Build 141 - 4.1.5.141

May 2, 2017

Includes all items from builds 130 to 140, polished and tested

  • First release with a browser abstraction layer, which will later be used to port the WebBrowser control to other browser controls (like Chromium)
  • Support for loading next page via JavaScript. User can directly provide script to load next page.
  • Support for next page links implemented via JavaScript
  • Increased size of miner's browser window (1920x1080)
  • 'Load More Content' pagination now works even when the configuration starts with navigational items (click, scroll, JavaScript, etc.). Earlier, 'Load More Content' was performed only at the very beginning, so it never ran after the initial navigation that reaches the page containing the 'load more' link.

Bug Fixes:

  • WebMiner.NavigateToNextPage: once the next page was loaded, the check for whether it loaded correctly blindly used the first primary item (fieldList[0]). This failed when the configuration started with a page interaction/navigation data item such as click or input-text, which may not be present on subsequent pages. The first real data item is now checked instead.
  • Selecting data from popups: popups can now be closed using the 'Click' option. 'Click' and other navigational items (like JS) are handled within popups.
  • SQL data export: added UTF-8 encoding to the MySQL connection string, changed the CREATE TABLE data type from TEXT to NVARCHAR for SQL Server, and fixed Chinese language export.
  • Other minor bug fixes

Build 139, 140 - 4.0.4.139/140

April 18, 2017

  • Fixed SQL Server export encoding issue for Chinese characters. CREATE TABLE data type changed from TEXT to NVARCHAR

Build 138 - 4.0.4.138

April 5, 2017

  • Fixed bug related to scraping data from popups

Build 137 - 4.0.4.137

April 5, 2017

  • Fixed MySQL export encoding issue for Chinese characters. Modified connection string for MySQL.

Build 136 - 4.0.4.136

March 21, 2017

  • First release with WebHarvy-X codebase (browser abstraction layer WH*)
  • Also includes polished and tested changes from builds 130 to 135

Build 135 - 4.0.3.135

March 10, 2017

  • Fixed bug in pagination via Link_LoadMoreContent (bug in HtmlParser.GetNextPageElement)

Build 134 - 4.0.3.134

March 9, 2017

  • Fixed bug in pagination via JavaScript

Build 133 - 4.0.3.133

March 6, 2017

  • Implemented loading next pages via JavaScript. User can directly provide script to load next page
  • Increased size of miner browser (1920x1080)

Build 132 - 4.0.3.132

February 9, 2017

  • Fixed bug in selecting data from popups. Popups can now be closed using the 'Click' option. Earlier, the click action was not handled within ParsePopup, so popups could not be closed when required. Now clicks are handled. Also, when a popup is closed, the parent page may be reloaded or its DOM changed, so stored elements can become invalid. For this reason, the next element is found again from the start.

Build 131 - 4.0.3.131

February 7, 2017

  • Added support for loading more content in same page (load more data link), even when the page is reached by clicking links (navigation in configuration) from the start page.

Build 130 - 4.0.3.130

January 24, 2017

  • Bug fix related to pagination - loading subsequent pages.

Build 129 - 4.0.3.129

December 26, 2016

  • Redistributable in installer correctly set to .NET 4.5 (earlier it was .NET 4.0)

Build 128 - 4.0.3.128

December 5, 2016

  • Depends on .NET 4.5
  • Open Popup: during configuration, waits for AJAX load
  • BugFix: Added case for Link_RegEx in BrowserContainer:EditConfiguration
  • BugFix: Link_Back is now added automatically for Link_RegEx navigations; this was not done earlier (MinerParams.cs#Optimize())
  • Handles pagination when the next page link is JavaScript code rather than an actual link/URL (best effort, not a complete fix)
  • Handles pagination when the next link contains a page number which changes from page to page
  • Minor bug fix in executing JavaScript on page (allows page to load, allownavigate)

Build 127

Private Release

  • Handles page numbers in 'Load More Content' links
  • Minor bug fix in executing JavaScript on page (allows page to load, allownavigate)

Build 126

Private Release

  • Targets .NET 4.5

Build 125 - 4.0.2.125

June 2016

MAJOR:

  • True multi-level category scraping: scrapes the full main > sub > sub ... > final listing tree.
  • Support for multiple keyword lists for multiple input fields
  • Support for selecting drop-down/combo list options
  • Support for inputting text to input fields
  • Option to click and open a popup
  • Option to run JavaScript on page
  • Option to scroll page (main or details) to load contents (not pagination)

MINOR:

  • Internet connection detection where Google is blocked (uses Baidu)
  • Improvements in automatically scraping multiple images
  • Capture Image option automatically enabled via HTML/RegEx capture in case it is disabled by default
  • Option to name images by source file name or by value of a specified cell in data table
  • Allows applying Capture More Content after applying Capture HTML
  • Minor bug fixes
  • Quick access to items under More Options in Capture window via toolbar

Build 119 - 3.4.0.119

June 10, 2015

MAJOR:

  • Added pagination support: Revolutionary feature enabling navigation through multiple pages by clicking links or buttons
  • URL-based pagination: Automatic incrementation of numbers in URLs for sequential page loading
  • "One-click multiple image extraction": Breakthrough feature for capturing multiple images from detail pages in a single operation
  • Human emulation mode support for automatic pause injection
  • Online license activation introduced to prevent casual piracy
  • Assembly obfuscated for security
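One simple way to implement URL-based pagination is to compare the URLs of the first two pages, locate the single number that changes, and keep adding the same step. The Python sketch below is a hypothetical illustration of that idea (the `page_urls` helper is not part of WebHarvy); the same logic also covers offset-style URLs such as start=0, start=25.

```python
import re
from typing import List


def page_urls(first: str, second: str, pages: int) -> List[str]:
    """Build URLs for pages 1..pages from the URLs of pages 1 and 2 (hypothetical helper)."""
    # Splitting on a capturing group keeps the digit runs, always at odd indices.
    a = re.split(r"(\d+)", first)
    b = re.split(r"(\d+)", second)
    if len(a) != len(b):
        raise ValueError("URLs have a different structure")
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1 or diffs[0] % 2 == 0:
        raise ValueError("URLs must differ in exactly one number")
    i = diffs[0]
    start, step = int(a[i]), int(b[i]) - int(a[i])
    if step <= 0:
        raise ValueError("page number must increase")
    urls = []
    for n in range(pages):
        parts = list(a)
        parts[i] = str(start + n * step)  # replace only the changing number
        urls.append("".join(parts))
    return urls


for url in page_urls("https://example.com/list?page=1&size=20",
                     "https://example.com/list?page=2&size=20", 4):
    print(url)
```

Note that this sketch does not preserve leading zeros (for example page=01).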

MINOR:

  • 'Click' option (Capture window > More Options > Click) can be used to navigate to the start page
  • Bug Fix: Data alignment issue in miner window data table when some fields of records do not have a value (blank columns)
  • Bug Fix: Keyword based scraping when encoding is required
  • Scheduler option to overwrite or append the export file in case the file already exists
  • 'Follow this link' option enabled even in away (details) pages. Uses 'Click' option underneath.
  • Bug Fix: Images going blank in some cases while mouse hovers over them during configuration
  • Bug Fix: New lines and tabs escaped in JSON export
  • HtmlParser updated to parse elements from <HTML> tag, so META tags can be extracted from the full HTML source of the page
  • Handles commas in keywords
  • Starts with a random proxy address from the proxy list while rotating proxies
  • In-built browser emulates IE 11 by default.

Build 106 - 3.3.0.106

June 2014

  • Depends on .NET 3.5 framework
  • Fixed encoding problems in URLs related to Category Scraping
  • Added option to disable patterns in start page
  • Context menu (copy/cut/paste) for 'Add URLs' window's URL list
  • Option to follow links obtained by applying RegEx on HTML - absolute & relative URLs handled
  • Separate options to scrape image URL and to download image file
  • Image_RegEx made to work for both relative and absolute URLs, works even when URL does not contain file extension
  • Fixed issue with image file extension - extension added automatically if not present in default file name
  • Added multiline option in RegEx
  • Pagination via page numbers - code refined
  • Faster mining 'restart' from where it stopped previously - remembers last mined URL

Build 100 - 3.2.0.100

Minor Release

  • Fixed bug in Registration (license activation) code for language encoding issues
  • Fixed bug in Miner window for Auto Scroll feature

Build 99 - 3.2.0.99

February 2014

MAJOR:

  • Ability to load next page by scrolling to the end of the page (Edit menu - Edit Options - Scroll down to load next page)
  • Ability to edit (add and remove) URLs from configuration (Edit menu - Edit Options - Add/Remove URLs from configuration)
  • Ability to add/remove keywords associated with the configuration
  • Ability to download images whose URL is obtained after applying RegEx on HTML of selected content
  • Ability to select category links one-by-one by disabling automatically parsing category links
  • Refined 'Capture following text' option
  • Handles cases where 'next page link' is a 'load/show more results/data' link/button.
  • RegEx now allows multiple groups to be captured
  • Handles different table layout (ASIN and ASIN:) of product details displayed by Amazon
  • Advanced Miner Options - Set MinChildCount and MinLevelUp miner options directly from Miner settings
  • Automatically check for updates - provides options to download update, upgrade, purchase new version
  • Authentication support for proxies for https (SSL) websites

MINOR:

  • Web browser context menu disabled (IE)
  • Popup-window cases, handles relative URLs
  • 'Set as Next Page Link' / 'Set as 'load more data' link' options always enabled in start page
  • More detailed error messages displayed while exporting to database
  • Handles scraping Amazon wishlists where the XPath varies slightly due to a new (dummy) element introduced in between
  • Minor bug fix in HtmlParser.GetNextElement - parsing down to the childless child of the element found for clicking
  • Minor bug fix in HtmlParser.PerformClicks - waits 200 ms between clicks
  • Images automatically downloaded to conf file folder while run from command line or scheduler
  • File export: removed unnecessary comma (,) at the end of each record (line)
  • Fixed issue where the XML file prolog was not written when a configuration was run via command line or scheduler
  • Assembly marked as NOT CLS compliant
  • License file copied to a common location accessible to all versions (updates), no need to unlock after each update
  • Fixed proxy rotation bug
  • Fixed bugs in auto duplicate removal; mining stops if a page full of duplicates is found
  • URL displayed in the Category column (an additional column for displaying categories), if enabled, while adding URLs to a configuration
  • Fixed bug in keyword scraping: keyword encoding (handles spaces in keywords)
  • Minor improvements in following links: WebMiner.VisitURLandGetData

Build 82

Private Release

  • URL encoding for startURL removed (reverts the Build 79 change)

Build 81

Private Release

  • amazon.co.jp: fixed links not being followed from the second page; VisitURLandGetData now verifies targetURL using Uri.TryCreate
  • RegEx matching modified to concatenate all sub groups matched

Build 80

Private Release

  • Capture Window - all items (follow link, capture image, set as next page link) enabled by default

Build 79

Private Release

  • URL encoding for startURL (BrowserContainer) so that non-English (Japanese, Chinese, etc.) strings in the URL are handled correctly. Fixed Amazon Japan scraping problem.

Build 78

Private Release

  • The RegEx fix in the previous build was buggy; fixed (pending further review and testing).

Build 77

Private Release

  • XML header problem while exporting XML data from command line/scheduler fixed
  • Image download folder selection prompt is not displayed for Command Line/Scheduler; images are downloaded to the configuration file directory instead (without asking).
  • Keyword based scraping - keywords encoded and replaced only if necessary. Keywords can appear in non-encoded format in the URL.
  • ExportData: trailing comma or tab removed from each record (line) for CSV/TSV export. Each line now ends with CR.
  • RegEx (HtmlParser.RefineElementText) corrected to match correctly

Build 76

Private Release Candidate

  • Latest private release candidate, with the changes from the previous version reverted.

Build 75

Private Release

  • Build 74 did not completely solve the issue. This release contains issue-specific workaround code that will not be carried forward to the next release.

Build 74

Private Release

  • Removed scraping of secondary items in excess of the available primary items (these would not be filled into the data table but would delay scraping) by introducing the overflowSecondaryData flag in WebMiner.

Build 73

Private Release

  • BrowserContainer.webBrowserCtrl_NewWindow: builds an absolute URL if the given one is relative, so the page opens in the same window
  • 'Set as Next Page link' on 'Load More Content' buttons will work

Build 72

Private Release

  • Fixed issue when exporting data from command line (scheduler): append is now false for the first export. Before this fix, XML files skipped the header (encoding) part.
  • When file downloads are present in the configuration, the scheduler automatically sets the download folder to the configuration file's folder, rather than prompting the user and stalling execution that should proceed without user intervention.

Build 71

Private Release

  • HtmlParser.GetNextElement tweaked for 5 or fewer elements (removed the line enforcing a minimum child count of 10)
  • Fixed bug in HtmlParser.PerformClicks - GetNextElement called only for patterns

Build 70

Private Release

  • Miner Settings - Category/Keyword tab has a new option to not automatically parse and select categories on the first click

Build 69 - v3.1

July 25, 2013

  • Tagging for Keyword/Category Scraping.
  • Option to set Page Load Timeout and AJAX Wait Time separately in Miner Settings
  • All HtmlElement.TagName checked in uppercase, removed .toLower() calls (language dependent issues)
  • Fixed bug related to Category Scraping
  • Modified GetTextNearHeadingBruteForce to correctly find the following text
  • Fixed exception in Capture More Content
  • Option to Edit Start URL / PostData / Headers
  • Major Fix in WebMiner.GetData()

Build 60 - Version 3.0

June 2013

  • Added the following new capture options in the Capture Window (grouped under 'More Options'):
  • 1. Capture following text: Improved by using brute force search for all elements in the page
  • 2. Capture HTML: Option to scrape HTML of selected element
  • 3. Capture Text as File: Option to scrape text and save it as a local file (useful while scraping articles and blog posts)
  • 4. Click: Ability to scrape hidden (partially displayed) fields in webpages that require a click from the user to be displayed in full. For example, phone numbers or email addresses that are displayed completely only when clicked.
  • 5. Apply Regular Expression: Option to apply Regular Expressions (RegEx) on captured text. RegEx can be applied even after applying 'Capture following text', 'Capture HTML' & 'Capture More Content' options.
  • 6. Capture More Content: Option to capture more text than the selected text, captures parent element's text
  • Option to individually select categories/links (one by one) for Category Scraping
  • Export scraped data as JSON
  • Ability to scrape data from tables (row-column / grid layout)
  • Ability to scrape pages that have few (fewer than 10) data items
  • Option to test proxies before using them (Edit menu - Settings - Proxy Settings)
  • Non responsive proxies are skipped during mining. Mining would not stop because of a bad/non-responsive proxy in the list.
  • Option to manually add URLs to an existing configuration (Edit menu - Add URLs to configuration)
  • Option to remove duplicates while mining (Settings - Miner)
  • Added Hourly frequency option in Scheduler
  • Added option to export data directly to database for scheduled mining tasks & command line
  • Added Clear option in Edit menu which will clear both the browser and data preview pane
  • Fixed bug related to license key validation (which resulted in 'Invalid license file' error for valid license files)
  • Added a new licensing option which would relax trial limitations during the evaluation period (try without limitations)
  • Installation migrated to InstallShield Limited Edition (.NET 3.5 as base)
  • Auto save no longer overwrites files when multiple instances of WebHarvy are running (the configuration name is appended to the auto save file name to avoid conflicts)
  • Upgrade option displayed while trying to register with old license key files
  • Amazon.com: configuration can start from the first product. Start from second row limitation removed.
  • URL validation in address bar made less strict to accommodate more common URL schemes
  • Language encoding defaulted to 'utf-8' for file exports (XML, CSV etc)
  • CSV/Database export: handles delimiters (comma, quotes) in captured data
  • Keyword/Category scraping allowed for 2 entries in evaluation version.
  • Rendering issues with browser fixed - defaults to IE 9 rendering

Builds 2-55 (2010-2013)

December 2010 - May 2013

Private and incremental releases leading up to Version 3.0, including:

  • Ability to capture URLs and images in addition to text
  • Export to TSV (Tab Separated Values) format
  • Capture a set of similar links (categories)
  • Export scraped data to database (SQL)
  • Proxy manager for web scraping with IP rotation support
  • Ability to scrape websites by submitting multiple keywords
  • Option to edit saved configuration files
  • Capture text near heading feature
  • Built-in scheduler for running mining tasks
  • Command line options
  • Support for MySQL database export
  • Option to capture substring of selected text
  • Support for scraping data from local HTML files
  • Option to resume mining from where it stopped
  • Option to auto-save captured data on regular intervals
  • Option to automatically inject pauses while mining

Build 1 - Initial Release

2010

  • Initial Release
