WebHarvy Version History - Complete Release Notes

Build 253 - 8.0.1.253

June 15, 2026

AI Scraping: Added support for connecting to both local LLMs (via Ollama, LM Studio etc.) and cloud-based LLMs (including OpenAI and Anthropic) for intelligent data extraction.
Intelligent Troubleshooting: Added built-in guidance during configuration to help identify and resolve data selection issues.
Troubleshooting Wizard: Introduced a troubleshooting wizard in the Miner window to assist with common issues such as no data being extracted during mining, pagination problems, and other frequently encountered scenarios.
Database Export Enhancements: Added the ability to specify a custom server port when exporting scraped data to databases that use non-standard port numbers.
Improved MySQL Unicode Support: Fixed an issue affecting Unicode character support when exporting data to MySQL databases.
Clearer Settings Management: Configuration settings and global WebHarvy settings can now be edited independently, eliminating ambiguity between the two.
Full-Page Screenshot Fix: Fixed an issue where full-page screenshots were not being captured correctly.
Simplified Page Reload During Configuration: If a page needs to be reloaded during configuration, clicking the browser's Reload button will now automatically invoke the equivalent of More Options → Page → Reload within the Capture window.

Build 248 - 7.11.0.248

February 20, 2026

Updated browser
Keyword Scraping: Multiple keyword lists now support row-wise submission in addition to submitting all combinations.
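
The difference between the two submission modes can be sketched in a few lines of Python. This is only an illustration, not WebHarvy's implementation; the function names and sample keyword lists are hypothetical.

```python
from itertools import product

def all_combinations(*keyword_lists):
    """Pair every keyword in each list with every keyword in the other lists."""
    return list(product(*keyword_lists))

def row_wise(*keyword_lists):
    """Pair keywords by position: first with first, second with second, and so on."""
    return list(zip(*keyword_lists))

cities = ["Boston", "Denver"]
trades = ["plumber", "electrician"]

print(all_combinations(cities, trades))  # 4 submissions
print(row_wise(cities, trades))          # 2 submissions
```

Note that zip() stops at the shortest list; how WebHarvy treats lists of unequal length is not stated in these notes.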

Build 246 - 7.9.0.246

January 9, 2026

Additional DPI scaling fixes for element targeting precision during configuration

Build 245 - 7.9.0.245

November 27, 2025

Browser updated to Chromium v139
Updated internal libraries and user agent strings

Build 244 - 7.8.0.244

October 8, 2025

Improved element highlighting and text selection accuracy during configuration by accounting for DPI scaling
Internally used libraries updated
Minor changes in quick start index page
Excel export updated by grouping data rows within a table.
Fixed bug while importing data from another Excel file.

Build 242 - 7.7.0.242

August 19, 2025

Browser component upgraded: Updated to latest Chromium for better compatibility with modern websites
Enhanced silent installation: The installer now offers improved silent-install support for easier enterprise deployment
Improved link mining reliability: Automatic retry mechanism ensures more successful data extraction when pages fail to load initially
Excel export improvements: Fixed issue where data exceeding Excel's maximum cell length limit caused export failures
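
Excel limits a single cell to 32,767 characters. One possible way to guard against oversized values before export is sketched below. Truncation is an assumption made for illustration; these notes do not say how the fix handles such data, and the function name is hypothetical.

```python
EXCEL_MAX_CELL_CHARS = 32767  # Excel's per-cell character limit

def fit_to_cell(value: str, limit: int = EXCEL_MAX_CELL_CHARS) -> str:
    """Return the value unchanged if it fits in one cell, else truncate it."""
    return value if len(value) <= limit else value[:limit]

long_text = "x" * 40000
print(len(fit_to_cell(long_text)))  # 32767
```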

Build 239, 240 - 7.7.0.239/240

May 20, 2025

Updated Browser with latest Chromium: Enhanced compatibility and performance with modern web technologies
Improved 'Follow this link' functionality: Better navigation and link processing for more accurate data extraction

Build 238 - 7.7.0.238

May 1, 2025

Updated browser: Improved handling of Cloudflare protection.
Link Extraction Fix: Resolved the “Follow this link” disabled issue for many websites by automatically applying Capture More Content, Link_RegEx, and adjusting the MinLevelsUp miner setting.
Excel Export: Switched from SpreadSheetLight to ClosedXML library to fix Excel export issues.
UI Framework: Updated Dock Panel Suite package and fixed missing panel issue.
Scheduler: Updated Task Scheduler library.
Database: Updated MySQL and SQL Server libraries.
Proxy Handling: Now supports proxy import in ip:port:username:password format.
Pagination Fix: Pagination data is now correctly overwritten while editing configuration files. (Earlier, multiple pagination data items could exist in a single configuration.)
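
As an illustration of the ip:port:username:password proxy format mentioned above, a line in that format could be parsed as follows. This is a minimal sketch, not WebHarvy's parser; the Proxy type and the sample values are hypothetical.

```python
from typing import NamedTuple

class Proxy(NamedTuple):
    host: str
    port: int
    username: str
    password: str

def parse_proxy(line: str) -> Proxy:
    """Parse one 'ip:port:username:password' line."""
    # maxsplit=3 keeps any ':' inside the password intact.
    host, port, username, password = line.strip().split(":", 3)
    return Proxy(host, int(port), username, password)

print(parse_proxy("203.0.113.10:8080:alice:s3cret"))
```

The sketch assumes IPv4 addresses or host names; an IPv6 address contains colons and would need different handling.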

Build 233 - 7.6.0.233

January 2, 2025

Updated browser component: Enhanced handling of Cloudflare-protected websites for better access to secured content
Improvement of the Input Text feature: Better compatibility with modern JavaScript frameworks like React and Vue for dynamic search functionality

Build 230 - 7.5.0.230

November 15, 2024

In-app payment links updated
License key validation improved
Browser updated.

Build 228 - 7.4.0.228

November 7, 2024

Input text dynamically updates page content: Enhanced Input Text feature now triggers page updates automatically, perfect for dynamic search functionality
Improved page scroll functionality: Smoother page scrolling for better navigation through long pages and infinite scroll scenarios
Enhanced image download capabilities: Multiple images can now be captured and downloaded when using regular expressions to extract image URLs
Better configuration file formatting: XML configuration files are now auto-formatted for easier reading and debugging
Smart auto-scrolling for specific sites: Automatic scrolling enabled for sites like Zillow to load all available content

Build 222 - 7.3.0.222

June 18, 2024

Support for adding keywords via 'Input-Text' option: Enables keyword-based scraping even when search terms aren't visible in URLs, perfect for complex search scenarios
Miner options can now be saved in configuration: Each scraping configuration can maintain its own specific mining settings for consistent results
Updated browser component: Enhanced compatibility and performance improvements

Build 217 - 7.2.0.217

January 10, 2024

Updated quick start guide: Enhanced user onboarding experience with clearer instructions
Improved multi-level category scraping: Tag column now shows meaningful category paths instead of URLs for better data organization
Faster pagination processing: Significant performance improvements when handling multi-page data extraction
Browser updated to Chromium V117: Latest browser engine for better website compatibility
Enhanced update notifications: Improved links and user experience for version updates
Updated core libraries: Enhanced database connectivity (PostgreSQL, MySQL, Oracle) and task scheduling capabilities

Build 215 - 7.1.0.215

August 24, 2023

Enhanced Pagination Handling: Improved support for infinite scroll and load-more techniques commonly used on modern websites
Dynamic frame URL handling: Open Frame option now automatically manages frame URLs for better iframe content extraction
Smart URL conversion: Automatically converts relative URLs to absolute URLs during mining for more reliable link extraction
Faster normal pagination: Performance improvements for traditional page-by-page navigation
Configurable element highlighting: New browser settings allow disabling element highlighting during setup for cleaner interface
Updated browser engine: Enhanced compatibility and performance
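
The relative-to-absolute URL conversion mentioned above follows standard URL resolution rules. The sketch below uses Python's standard library to show the idea; it is not WebHarvy's code, and the example URLs are made up.

```python
from urllib.parse import urljoin

def to_absolute(page_url: str, href: str) -> str:
    """Resolve a link found on page_url into an absolute URL."""
    return urljoin(page_url, href)

page = "https://example.com/products/list?page=2"
print(to_absolute(page, "/item/42"))                       # https://example.com/item/42
print(to_absolute(page, "detail.html"))                    # https://example.com/products/detail.html
print(to_absolute(page, "https://cdn.example.com/a.png"))  # already absolute, unchanged
```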

Bug fixes:

Element tag names can sometimes include special characters that are reserved in selector strings. These characters are now escaped; previous versions did not escape them.
Fixed issue related to getting image URLs using Regular Expressions during mining.
Selecting dropdown options - item selection correctly reflected on page by firing the 'change' event.
Fixed bug in normal pagination where 'next page element text' occurs for another element (non-pagination) on page.
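
To illustrate the tag name escaping fix above, the sketch below backslash-escapes characters that would otherwise be read as selector syntax. It is a simplified illustration with a hypothetical function name, not the rule set WebHarvy uses, and it ignores edge cases such as names that start with a digit.

```python
import re

def escape_selector_name(name: str) -> str:
    """Backslash-escape every character other than letters, digits, '-' and '_'."""
    return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)

print(escape_selector_name("fb:like"))  # fb\:like
```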

Build 207 - 7.0.1.207

May 2, 2023

Added "Scroll List" feature: New capture window option to automatically scroll through lists of items for comprehensive data extraction
Reduced script usage: Enhanced support for infinite scroll and "load more" pagination without requiring custom JavaScript code
Smart element targeting: Clicks on links, buttons and tabs are now text-based, ensuring reliable interaction even when element positions change

Bug fixes:

'Scroll to load next page' option cleared (if checked) when configuration is stopped.

Build 204 - 6.7.0.204

March 17, 2023

Added PostgreSQL support: New database export option for PostgreSQL alongside existing MySQL and Oracle support
Updated browser component: Enhanced compatibility and performance improvements

Build 203 - 6.7.0.203

March 1, 2023

Major browser update which overcomes blocking by websites which employ Cloudflare/PerimeterX etc.
Major change in installer, moved to MSI (Wix) installer from .EXE (InstallShield)
Added support for PostgreSQL database export
Text data captured as files can be named based on value of another column (like images)

Bug Fixes:

Fixed bug related to clearing all proxies in WebHarvy settings.
Minor bug fix related to setting normal pagination link via JavaScript
'AJAX Load Wait Time' renamed to 'Script Load Wait Time'

Build 199 - 6.6.0.199

November, 2022

Bug fix in RunScript (JavaScript) functionality which prevents application hang during mining

Build 198 - 6.6.0.198

October 4, 2022

Improvements:

Edit details of saved proxy servers: New feature allows editing proxy server details like username, password, address, and port for better proxy management
Updated browser to latest Chromium version: Enhanced compatibility and security improvements
JavaScript pagination support: Option to set pagination links via JavaScript when traditional pagination elements are not available on the page
Screenshot images can be named based on the value of the selected data column (Image Settings)
Duplicate removal feature ignores differences in category/keyword tagging column.
Added dropdown option to select User Agent strings of popular browsers like Chrome, Firefox, Edge etc. in Browser Settings.
Support for running JavaScript code on all pages without selecting a placeholder primary column.
Regular Expression window now contains prebuilt RegEx strings to select prices in various currencies.
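
The duplicate removal behavior described above (ignoring the category/keyword tag column) can be pictured as comparing rows on every column except the tag. The sketch below is an assumption about the general idea, not WebHarvy's implementation; the column name "Tag" and the row layout are hypothetical.

```python
def remove_duplicates(rows, tag_column="Tag"):
    """Keep the first occurrence of each row, comparing all columns except the tag."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted((k, v) for k, v in row.items() if k != tag_column))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [
    {"Name": "Desk", "Price": "120", "Tag": "office"},
    {"Name": "Desk", "Price": "120", "Tag": "furniture"},  # same data, different tag
    {"Name": "Lamp", "Price": "35", "Tag": "office"},
]
print(len(remove_duplicates(rows)))  # 2
```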

Bug Fixes:

Bug fix in Category Scraping feature which prevents infinite looping on certain types of category pages.

Build 194 - 6.5.0.194

June 2, 2022

Bug fix for MySQL server (8.0.29) export: Updated MySQL library to ensure compatibility with the latest MySQL server versions
Updated internal libraries: Enhanced SpreadSheetLight library for improved Excel export functionality and reliability

Build 193 - 6.5.0.193

April 13, 2022

Internal browser updated to Chromium V96: Latest browser engine for improved website compatibility and performance
Added "Auto Enable 'Follow this Link' option": Automatically detects when elements contain URLs and enables link following for streamlined navigation
Enhanced Regular Expression support: Multiple item matching with commonly used RegEx patterns available via dropdown for easier configuration
Improved image organization: Option to save images in separate folders based on column names for better file management
Better URL handling: Support for relative URLs in 'Add URLs' window with improved export file format synchronization
Enhanced user experience: Warning when closing miner without saving data, clearer terminology for script loading settings

Build 192 - 6.4.0.192

December, 2021

Updated browser component to the latest available version
Fixed bug in capturing multiple image URLs from starting page (Image_URL_RegExMulti)

Build 191 - 6.4.0.191

December 21, 2021

Improved performance of "Capture Following Text" option: Faster data mining with enhanced text extraction capabilities
Faster data mining: General performance improvements for quicker scraping operations
Database export feature updates: Enhanced functionality for exporting data to various database formats
Updated browser to latest possible version of Chromium
Updated the MySQL dependency library to the latest NuGet package version, resolving an SQL export issue

Build 190

August 1, 2021

Updated MySQL library to latest version (8.0.x) to fix authentication error (caching_sha2_password).

Build 189 - 6.3.0.189

September 9, 2021

Major Changes:

Added support for custom data fields: New fields include current page URL, page screenshot, mining date/time, and user-provided text for enhanced data collection
Smart field name suggestions: Automatically suggests meaningful field names for 'Capture following text' option to speed up configuration
Enhanced text capture performance: 'Capture following text' feature optimized for faster configuration and better user experience
Multiple image URL scraping: Now supports automatic extraction of multiple image URLs, expanding beyond just image downloads
Configuration-specific settings: Advanced Miner Options and registration info saved per configuration for easier project management
Updated browser engine: Latest browser version fixes loading issues with specific websites like bet365.com

Minor Changes:

Duplicate images are named -001, -002 etc. instead of -1, -2 etc.
For smart help (articles/videos search) 'www' part of website removed from search query string.
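
One practical effect of the zero-padded suffixes (-001, -002) is that file names sort in numeric order in a plain alphabetical listing. A small sketch with a hypothetical helper name and file extension:

```python
def numbered_name(stem: str, index: int, ext: str = ".jpg") -> str:
    """Build a file name with a three-digit, zero-padded suffix."""
    return f"{stem}-{index:03d}{ext}"

names = [numbered_name("chair", i) for i in (10, 2, 1)]
print(sorted(names))  # ['chair-001.jpg', 'chair-002.jpg', 'chair-010.jpg']
```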

Build 187 - 6.2.0.187

June 12, 2021

New capture window option to add custom fields - date/time, current page URL, constant string
Saves registration details and WebHarvy version in the configuration file.

Build 186 - 6.2.0.186

May 11, 2021

Updated browser to fix loading issue for websites like bet365.com (earlier versions did not load the page at all)

Build 185 - 6.2.0.185

March 29, 2021

Updated location of license server database to AWS RDS

Build 184 - 6.2.0.184

March 17, 2021

Added support for different proxy types (HTTP, HTTPS, SOCKS4, SOCKS5)
Added Browser Setting Option to use separate browser engine for mining links. Helps miner recover from crashes during mining.
Added Browser Setting Option to disable opening popups
Updated browser to latest Chromium version V86

Build 179 - 6.0.1.179

August 19, 2020

Updated browser
Bug fix: The 'Back' button of the configuration browser took the browser to the quick start guide (2 pages back) instead of loading the page from which the link was followed.

Build 178 - 6.0.1.178

August 15, 2020

Major:

Option to add blank row with tag for category/keyword/URL scraping when no data is fetched
Proxies are set only within WebHarvy and not system wide. Proxies are set in configuration browser too.
Added 2 new Capture window options. More Options > Page > Reload & Go-Back
Added Update(Upsert) / Overwrite / Append options for database and Excel file export
Keywords can be added even after starting configuration
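
The three export modes named above (Update/Upsert, Overwrite, Append) differ in how new rows are combined with existing ones. The sketch below models them on in-memory rows. It is illustrative only: the key column "url" is an assumption, and these notes do not say how WebHarvy identifies matching rows.

```python
def export_rows(existing, new_rows, mode, key="url"):
    """Combine existing and newly mined rows according to the export mode."""
    if mode == "overwrite":
        return list(new_rows)                  # discard what was there
    if mode == "append":
        return list(existing) + list(new_rows) # keep everything, duplicates included
    if mode == "update":
        merged = {row[key]: row for row in existing}
        for row in new_rows:
            merged[row[key]] = row             # replace a match, else insert
        return list(merged.values())
    raise ValueError(f"unknown mode: {mode}")

old = [{"url": "a", "price": 1}, {"url": "b", "price": 2}]
new = [{"url": "b", "price": 3}, {"url": "c", "price": 4}]
print(export_rows(old, new, "update"))
```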

Minor:

'Follow this link' option is enabled for Link_RegEx in details pages. Earlier only Click option was available.
Automatically handles encoded URLs selected from HTML (for example, URLs including '&'). Applies to Link_RegEx and Image_RegEx/Multi.
Security increased for database connection string and EO license string
'Activation Exceeded' message includes link to 'How to transfer license ?'
Installing a new version will reset the evaluation period. This will allow users (with expired trials) to try new versions.
'Enable JavaScript', 'Share Location' and 'Enable plugins' options removed from Browser settings

Bug Fixes:

While scraping a list of URLs, URLs without HTTP scheme part (http:// or https://) are handled. Earlier such URLs failed to fetch data.
Previously, while scraping a list of URLs, if one URL failed to load, data from previously loaded URL was sometimes mined for its row. Fixed.
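
Handling URLs that lack a scheme usually means adding one before loading. A minimal sketch follows; the choice of "http" as the default is an assumption, since these notes do not say which scheme WebHarvy applies.

```python
def ensure_scheme(url: str, default: str = "http") -> str:
    """Prefix the URL with a scheme if it does not already start with http(s)://."""
    url = url.strip()
    if url.lower().startswith(("http://", "https://")):
        return url
    return f"{default}://{url}"

print(ensure_scheme("example.com/products"))   # http://example.com/products
print(ensure_scheme("https://example.com/"))   # unchanged
```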

Build 174 - 6.0.1.174

February 19, 2020

Updated browser to Chromium v77
Fixed issue in multi URL scraping when one of the intermediate URLs does not have next page link.

Build 173 - 6.0.1.173

January 8, 2020

Switched from XPath to CSS selector for data selection
New setting 'Data Selection Accuracy' added in Advanced Miner Options
Fixed issue where data from followed links are sometimes extracted only if Link_RegEx is used

Build 172 - 5.5.2.172

December 20, 2019

Added support for .webp images
Added protocol security and user agent while checking image URL for image type

Build 171 - 5.5.2.171

November 27, 2019

Fixed issue related to miner window freezing/hanging during mining. This happened especially when a large number of parallel mining threads were used.

Build 170 - 5.5.1.170

November 22, 2019

Bug fix for configuration preview issue: Resolved issue where preview sometimes updated with only a single item during configuration
Enhanced .NET framework: Migrated to .NET 4.7 for better performance and security
Updated browser engine: Latest Chromium version for improved website compatibility
Better file naming: Increased image filename limit to 150 characters for more descriptive names
Improved focus handling: Fixed false click-through issues when switching between applications

Build 168 - 5.5.0.168

November 1, 2019

Improvements/Additions:

Added custom user agent string: Configure WebHarvy's browser to mimic different browsers and devices for better website compatibility
Improved frame handling: Enhanced support for iframes and frame-based websites with automatic frame opening options
Better form submission/navigation: Streamlined navigation without manual pattern management for clicks and form interactions
Enhanced browser features: Added search functionality (CTRL + F) and full page HTML capture for comprehensive data extraction
Faster scheduling options: Support for high-frequency mining tasks with 5, 10, 15, and 30-minute intervals
Enhanced security and compatibility: HTTP2 support, client certificate handling, and improved web security settings
Updated Chromium engine: Latest browser technology for better website support

Bug Fixes:

Fixed issue with getting XPath correctly during configuration by disabling caching on failure
Preview generation stopped when configuration is stopped
Deleting fields not allowed while preview generation is in progress (preview form context menu disabled)
Pattern disabled after opening popups.
For configurations with additional URLs added, editing Start URL did not update first URL in the URL list. Fixed.
Single term/word search supported from browser address bar
Fixed bug in Keyword Scraping caused by case-sensitive keyword replacement in start URL / postdata.
Fixed bug where the init page sometimes took very long to load at application startup.

Changes due to browser updates:

All WebView operations called on thread runner context
Element highlighting on mouse hover during configuration is handled in a separate thread
Handles both mouse-up and mouse-down events to prevent false click throughs during configuration
All keyboard shortcuts handled in separate threads to prevent UI hang
Browsers for mining created when Miner window is displayed, and not during Start Mine.

Build 164 - 5.4.0.164

February 19, 2019

Auto-delete cookies feature: Automatically clear cookies during mining sessions to avoid being blocked by websites
New pagination method: "Load more data using JS" - Enhanced JavaScript-based pagination for sites with infinite scroll or load-more buttons
Improved license management: Options to deactivate and transfer licenses between machines, plus direct access to license upgrades
Enhanced task scheduling: Direct editing of scheduled mining tasks in Windows Task Scheduler for greater control
Better configuration management: Automatic prompts to save unsaved configurations and preservation of JavaScript pagination code

Build 161 - 5.3.0.161

November 21, 2018

Updated browser
Introduced Worker Process to solve crashes reported with latest Windows updates.
Fixed anchoring of controls in 'Run Java Script' window while resizing the window.

Build 160 - 5.3.0.160

October 24, 2018

Parallel mining capability: Configurable parallel mining threads (1-10 threads, default 4) to significantly increase mining speed
Chrome developer tools integration: Built-in developer tools for advanced debugging and website analysis directly within WebHarvy
Enhanced configuration experience: Improved sub-text selection with better highlighting and more accurate dropdown/listbox selection reflection
Smart pagination handling: Automatic page scrolling to ensure 'Load More' buttons are properly loaded and accessible
Better display compatibility: Fixed data selection issues with non-100% Windows text scaling settings
Fixed issue related to downloading images, especially when they are behind SSL (https://)
Fixed issue related to non-visibility of miner window in multi monitor systems when monitor configuration changes.
Fixed issue related to deleting browser cache in WebHarvy > Browser Settings. Earlier cache was not entirely deleted.
Fixed unresponsiveness of Capture window (for a short period of time), after applying RegEx on HTML.
Removed all mobirise links from InitPage.
Added 'Firewall may be blocking' message in Activation failed window.
Removed browser emulation key addition (for previous IE version) from installer.
Added Zoom level and number of parallel mining threads to status bar info.
ALT + X assigned to Clear instead of CTRL + X which is associated with cut.
Fixed major issue with loading and displaying upgrade purchase page in cases where user's license has expired.
Disabled 'Mine all pages/Number of pages to mine' controls while mining is in progress.
Updated to the latest version of EO.WebBrowser.
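
The parallel mining setting introduced in this build (1-10 threads, default 4) corresponds to a bounded worker pool. The sketch below shows the general pattern in Python; it is not how WebHarvy is implemented, and fetch stands in for whatever work is done per URL.

```python
from concurrent.futures import ThreadPoolExecutor

def mine_in_parallel(urls, fetch, threads=4):
    """Run fetch(url) for each URL on a pool of worker threads."""
    threads = max(1, min(10, threads))  # clamp to the 1-10 range
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))

print(mine_in_parallel(["a", "b", "c"], str.upper))  # ['A', 'B', 'C']
```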

Build 155 - 5.2.0.155

March 26, 2018

UI revamp with ribbon menu: Complete user interface overhaul featuring modern ribbon menu design and redesigned forms for better user experience
Oracle database support: New export option for Oracle databases alongside existing MySQL support
Updated export functionality: Enhanced threaded export system with cancellation support for both file and database exports
Excel as default export: Streamlined workflow with Excel as the default export format
Enhanced browser engine: Updated to EO.WebBrowser 18.0.98 fixing URL update issues
Multi line input form for JavaScript (Run Script)
Snippet of miner options displayed in status bar
Smart help for websites displayed in status bar
New 'Update Available' form
Added settings link in Miner form
Option to edit configuration (or run configuration) right after opening a saved configuration file.
Location sharing option in Browser Settings.
Preview columns are automatically resized.
Minor bug fixes.

Build 153

Private Release

Updated browser

Build 152 - 5.1.0.152

January 2018

Updated browser
Fixed bug where user could continue to unlock and use the software even after activation limit has been reached

Build 150 - 5.1.0.150

January 8, 2018

Direct Excel export: New native Excel export capability for seamless data output to spreadsheet format
Handles page numbers in JavaScript: Enhanced pagination support for JavaScript-driven page numbering systems
Updated Chromium engine: Enhanced browser stability with optimized settings to minimize crashes
Better zoom compatibility: Fixed data selection issues when browser zoom level is not set to 100%
Improved text formatting: Enhanced handling of line breaks and spaces in capture window for cleaner data extraction
Enhanced form handling: Better PostData management for form submissions and subsequent navigation
Bug fix in Keyword Scraping > Editing, wherein changing the first keyword was not possible
Self deactivates in case license is set as cancelled in WH License Server
Minimizes memory usage in mining thread by limiting the number of browser instances created (to 3, which are global and used forever)
Address bar can be used for Google search

Build 148 - 5.0.1.148

September 13, 2017

Switched from Internet Explorer to Google Chrome browser engine: Revolutionary upgrade replacing IE with Chrome for dramatically better website compatibility and performance

Benefits:

Faster mining performance: Significantly improved data extraction speed with the new browser engine
Enhanced stability: More reliable and thoroughly tested browser integration
Better modern website support: Chrome engine handles JavaScript-heavy and modern web technologies much better than IE

Build 141 - 4.1.5.141

May 2, 2017

Includes all items from builds 130 to 140, polished and tested

First release with browser abstraction layer which will be used later to port webbrowsercontrol to other browser controls (like Chromium)
Support for loading next page via JavaScript. User can directly provide script to load next page.
Support for next page links implemented via JavaScript
Increased size of miner's browser window (1920x1080)
'Load More Content' pagination now works even when the configuration requires navigational items (click/scroll/JS etc.) at the start. Earlier, 'Load More Content' ran only at the beginning, so it had no chance to run after the initial navigation reached the page containing the 'load more' link.

Bug Fixes:

WebMiner.NavigateToNextPage: After the next page loads, the check that it loaded correctly always used the first item in the field list (fieldList[0]). This failed when the configuration starts with a page interaction/navigation data item such as click or input-text, which may not be present on subsequent pages. The first real data item is now checked instead.
Selecting data from popups. The popups can now be closed by using 'Click' option. 'Click' (along with other navigational items like JS) are handled within popups.
SQL data export: Added UTF-8 encoding to the connection string for MySQL. CREATE TABLE data type changed from text to nvarchar for SQL Server. Fixed Chinese language export.
Other minor bug fixes

Build 139, 140 - 4.0.4.139/140

April 18, 2017

Fixed SQL Server export encoding issue with Chinese characters. CREATE TABLE data type changed from TEXT to nvarchar

Build 138 - 4.0.4.138

April 5, 2017

Fixed bug related to scraping data from popups

Build 137 - 4.0.4.137

April 5, 2017

Fixed MySQL export encoding issue with Chinese characters. Modified connection string for MySQL.

Build 136 - 4.0.4.136

March 21, 2017

First release with WebHarvy-X codebase (browser abstraction layer WH*)
Also polished and tested changes in 130 to 135

Build 135 - 4.0.3.135

March 10, 2017

Fixed bug in pagination via Link_LoadMoreContent. Bug in HtmlParser.GetNextPageElement

Build 134 - 4.0.3.134

March 9, 2017

Fixed bug in pagination via JavaScript

Build 133 - 4.0.3.133

March 6, 2017

Implemented loading next pages via JavaScript. User can directly provide script to load next page
Increased size of miner browser (1920x1080)

Build 132 - 4.0.3.132

February 9, 2017

Fixed bug in selecting data from popups. Popups can now be closed using the 'Click' option. Earlier, click actions were not handled within ParsePopup, so popups could not be closed even when required. Clicks are now handled. Also, when a popup is closed, the parent page may be reloaded or its DOM may change, so stored elements can become invalid. For this reason, the next element is located again from the start.

Build 131 - 4.0.3.131

February 7, 2017

Added support for loading more content in same page (load more data link), even when the page is reached by clicking links (navigation in configuration) from the start page.

Build 130 - 4.0.3.130

January 24, 2017

Bug fix related to pagination - loading subsequent pages.

Build 129 - 4.0.3.129

December 26, 2016

Redistributable in installer correctly set to .NET 4.5 (earlier it was .NET 4.0)

Build 128 - 4.0.3.128

December 5, 2016

Depends on .NET 4.5
Open Popup: During configuration, wait for AJAX load
BugFix: Added case for Link_RegEx in BrowserContainer:EditConfiguration
BugFix: Adding Link_Back automatically for Link_RegEx navigations. Was not done earlier. (MinerParams.cs#Optimize())
Handles pagination when the next page link is JavaScript code rather than an actual link/URL (best effort, not a complete fix)
Handles pagination when the next link contains a page number which changes from page to page
Minor bug fix in executing JavaScript on page (allows page to load, allownavigate)

Build 127

Private Release

Handles page numbers in 'Load More Content' links
Minor bug fix in executing JavaScript on page (allows page to load, allownavigate)

Build 126

Private Release

Targets .NET 4.5

Build 125 - 4.0.2.125

June 2016

MAJOR:

True multi-level category scraping. Scrapes main-sub-sub..final listing tree.
Support for multiple keyword lists for multiple input fields
Support for selecting drop-down/combo list options
Support for inputting text to input fields
Option to click and open a popup
Option to run JavaScript on page
Option to scroll page (main or details) to load contents (not pagination)

MINOR:

Internet connection detection where Google is blocked (uses Baidu)
Improvements in automatically scraping multiple images
Capture Image option automatically enabled via HTML/RegEx capture in case it is disabled by default
Option to name images by source file name or by value of a specified cell in data table
Allows applying Capture More Content after applying Capture HTML
Minor bug fixes
Quick access to items under More Options in Capture window via toolbar

Build 119 - 3.4.0.119

June 10, 2015

MAJOR:

Added pagination support: Revolutionary feature enabling navigation through multiple pages by clicking links or buttons
URL-based pagination: Automatic incrementation of numbers in URLs for sequential page loading
"One-click multiple image extraction": Breakthrough feature for capturing multiple images from detail pages in a single operation
Human emulation mode support for automatic pause injection
Online license activation introduced to prevent casual piracy
Assembly obfuscated for security
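
URL-based pagination, as described above, amounts to incrementing a number inside the URL. A minimal sketch follows. It assumes the page number is the last run of digits in the URL, which these notes do not specify; the function name is hypothetical.

```python
import re

def next_page_url(url: str) -> str:
    """Increment the last run of digits in the URL by one."""
    matches = list(re.finditer(r"\d+", url))
    if not matches:
        raise ValueError("URL contains no number to increment")
    last = matches[-1]
    return url[:last.start()] + str(int(last.group()) + 1) + url[last.end():]

print(next_page_url("https://example.com/list?page=2"))  # https://example.com/list?page=3
```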

MINOR:

'Click' option (Capture window > More Options > Click) can be used to navigate to the start page
Bug Fix : Data alignment issue in miner window data table when some fields of records do not have a value (blank columns)
Bug Fix : Keyword based scraping when encoding is required
Scheduler option to overwrite or append the export file in case the file already exists
'Follow this link' option enabled even in away (details) pages. Uses 'Click' option underneath.
Bug Fix : Images going blank in some cases while mouse hovers over them during configuration
Bug Fix : New lines and tabs escaped in JSON export
HtmlParser updated to parse elements from <HTML> tag, so META tags can be extracted from the full HTML source of the page
Handles commas in keywords
Starts with a random proxy address from the proxy list while rotating proxies
In-built browser emulates IE 11 by default.
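
The JSON export fix in the list above concerns control characters inside string values: JSON requires newlines and tabs to be written as escape sequences. The sketch below shows the expected result using Python's json module; it is an illustration, not WebHarvy's exporter, and the sample record is made up.

```python
import json

record = {"title": "Line one\nLine two", "notes": "col1\tcol2"}

# json.dumps writes the newline and tab as the two-character sequences \n and \t.
encoded = json.dumps(record)
print(encoded)

# Decoding restores the original characters.
assert json.loads(encoded) == record
```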

Build 106 - 3.3.0.106

June 2014

Depends on .NET 3.5 framework
Fixed encoding problems in URLs related to Category Scraping
Added option to disable patterns in start page
Context menu (copy/cut/paste) for 'Add URLs' window's URL list
Option to follow links obtained by applying RegEx on HTML - absolute & relative URLs handled
Separate options to scrape image URL and to download image file
Image_RegEx made to work for both relative and absolute URLs, works even when URL does not contain file extension
Fixed issue with image file extension - extension added automatically if not present in default file name
Added multiline option in RegEx
Pagination via page numbers - code refined
Faster mining 'restart' from where it stopped previously - remembers last mined URL

Build 100 - 3.2.0.100

Minor Release

Fixed bug in Registration (license activation) code for language encoding issues
Fixed bug in Miner window for Auto Scroll feature

Build 99 - 3.2.0.99

February 2014

MAJOR:

Ability to load next page by scrolling to the end of the page (Edit menu - Edit Options - Scroll down to load next page)
Ability to edit (add and remove) URLs from configuration (Edit menu - Edit Options - Add/Remove URLs from configuration)
Ability to add/remove keywords associated with the configuration
Ability to download images whose URL is obtained after applying RegEx on HTML of selected content
Ability to select category links one-by-one by disabling automatically parsing category links
Refined 'Capture following text' option
Handles cases where 'next page link' is a 'load/show more results/data' link/button.
RegEx now allows multiple groups to be captured
Handles different table layout (ASIN and ASIN:) of product details displayed by Amazon
Advanced Miner Options - Set MinChildCount and MinLevelUp miner options directly from Miner settings
Automatically check for updates - provides options to download update, upgrade, purchase new version
Authentication support for proxies for https (SSL) websites

MINOR:

Web browser context menu disabled (IE)
Popup-window cases, handles relative URLs
'Set as Next Page Link' / 'Set as 'load more data' link' options always enabled in start page
More detailed error messages displayed while exporting to database
Handles scraping Amazon wishlists, where the XPath varies slightly because a new (dummy) element is introduced in between
Minor bug fix in HtmlParser.GetNextElement - parsing down to the childless child of the element found for clicking
Minor bug fix in HtmlParser.PerformClicks - waits 200 ms between clicks
Images automatically downloaded to the configuration file folder when run from the command line or scheduler
File export: unnecessary comma (,) at the end of each record (line) removed
Fixed issue where the XML file prolog was not written when a configuration was run via the command line or scheduler
Assembly marked as NOT CLS compliant
License file copied to a common location accessible to all versions (updates), no need to unlock after each update
Fixed proxy rotation bug
Fixed bugs in Auto Duplicate removal, mining stops if page full of duplicates found
URL displayed in Category Column if enabled (additional column for displaying categories) while adding URLs to Configuration
Fixed bug in keyword scraping : keyword encoding (handles spaces in keywords)
Minor improvements in following links : WebMiner.VisitURLandGetData
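
The keyword encoding fix above (handling spaces in keywords) can be sketched as follows. This is an illustration only: the {keyword} placeholder, the function name build_search_url and the example.com URL are hypothetical, and WebHarvy may encode spaces differently (for example as %20 instead of +).

```python
from urllib.parse import quote_plus


def build_search_url(template, keyword):
    """Substitute an encoded keyword into a URL template.

    The "{keyword}" placeholder is a convention invented for this sketch.
    """
    # quote_plus percent-encodes reserved characters and turns spaces into '+',
    # so multi-word keywords stay inside a single query parameter.
    return template.replace("{keyword}", quote_plus(keyword))


url = build_search_url("https://example.com/search?q={keyword}", "red running shoes")
print(url)
```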

Build 82

Private Release

URL encoding for startURL removed (Build 79 Change)

Build 81

Private Release

amazon.co.jp: fixed links not being followed from the second page. Changes made in VisitURLandGetData - targetURL is now verified using Uri.TryCreate
RegEx matching modified to concatenate all sub groups matched
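
The idea of concatenating all matched sub groups can be shown with a short Python sketch. It is not the WebHarvy implementation; the function name concat_groups and the sample price string are assumptions chosen for illustration.

```python
import re


def concat_groups(text, pattern):
    """Return all captured sub groups of the first match, joined together."""
    match = re.search(pattern, text)
    if match is None:
        return ""
    # groups() returns every capture group; optional groups that did not
    # participate in the match are None and are skipped here.
    return "".join(g for g in match.groups() if g is not None)


# Two groups capture the digits on either side of the thousands separator.
result = concat_groups("Price: USD 1,299.00", r"(\d+),(\d+)\.\d+")
print(result)
```

Joining the groups lets one expression drop unwanted characters (here the thousands separator) from the captured text.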

Build 80

Private Release

Capture Window - all items (follow link, capture image, set as next page link) enabled by default

Build 79

Private Release

URL encoding for startURL (BrowserContainer) so that non-Latin language strings (Japanese, Chinese etc.) in the URL are handled correctly. Amazon Japan scraping problem fixed.

Build 78

Private Release

RegEx fix in previous build was buggy. Fixed it (to be reviewed and tested more).

Build 77

Private Release

XML header problem while exporting XML data from command line/scheduler fixed
Image download folder selection prompt will not be displayed for Command Line/Scheduler, will be downloaded to configuration file directory instead (without asking).
Keyword based scraping - keywords encoded and replaced only if necessary. Keywords can appear in non-encoded format in the URL.
ExportData : ending comma or tab removed for each record (line) for CSV/TSV export. Each line now ends on CR.
RegEx (HtmlParser.RefineElementText) corrected to correctly make the match

Build 76

Private Release Candidate

The latest Private Release Candidate with previous version changes reverted.

Build 75

Private Release

Build 74 did not completely solve the issue. This release contains issue-specific workaround code which will not be carried forward to the next release.

Build 74

Private Release

Secondary items in excess of the available primary items are no longer scraped. These could not be filled into the data table and only delayed scraping. Implemented by introducing the overflowSecondaryData flag in WebMiner.

Build 73

Private Release

BrowserContainer.webBrowserCtrl_NewWindow : absolute URL built if given one is relative - to open in same window
'Set as Next Page link' on 'Load More Content' buttons will work

Build 72

Private Release

Fixed issue while exporting data from the command line (scheduler): append is now set to false for the first export. Before this fix, XML files skipped the header (encoding) part.
When file downloads are present in the configuration, the scheduler automatically sets the download folder to that of the configuration file, rather than asking the user and stalling the execution which should proceed without user intervention.

Build 71

Private Release

HtmlParser.GetNextElement tweaked for 5 elements or less (removed 10 min child count line)
Fixed bug in HtmlParser.PerformClicks - GetNextElement called only for patterns

Build 70

Private Release

Miner Settings - Category/Keyword tab has a new option to not automatically parse and select categories at the first click

Build 69 - v3.1

July 25, 2013

Tagging for Keyword/Category Scraping.
Option to set Page Load Timeout and AJAX Wait Time separately in Miner Settings
All HtmlElement.TagName checked in uppercase, removed .toLower() calls (language dependent issues)
Fixed bug related to Category Scraping
Modified GetTextNearHeadingBruteForce to correctly find the following text
Capture More Content Exception Fix
Option to Edit Start URL / PostData / Headers
Major Fix in WebMiner.GetData()

Build 60 - Version 3.0

June 2013

Added the following new capture options in the Capture Window (grouped under 'More Options')
1. Capture following text: Improved by using brute force search for all elements in the page
2. Capture HTML: Option to scrape HTML of selected element
3. Capture Text as File: Option to scrape text and save it as a local file (useful while scraping articles and blog posts)
4. Click: Ability to scrape hidden (partially displayed) fields in webpages which requires a click from the user to be displayed in full. For example phone numbers or email addresses which are displayed completely only if you click them.
5. Apply Regular Expression: Option to apply Regular Expressions (RegEx) on captured text. RegEx can be applied even after applying 'Capture following text', 'Capture HTML' & 'Capture More Content' options.
6. Capture More Content: Option to capture more text than the selected text, captures parent element's text
Option to individually select categories/links (one by one) for Category Scraping
Export scraped data as JSON
Ability to scrape data from tables (row-column / grid layout)
Ability to scrape pages which have fewer than 10 data items
Option to test proxies before using them (Edit menu - Settings - Proxy Settings)
Non-responsive proxies are skipped during mining. Mining no longer stops because of a bad or non-responsive proxy in the list.
Option to manually add URLs to an existing configuration (Edit menu - Add URLs to configuration)
Option to remove duplicates while mining (Settings - Miner)
Added Hourly frequency option in Scheduler
Added option to export data directly to database for scheduled mining tasks & command line
Added Clear option in Edit menu which will clear both the browser and data preview pane
Fixed bug related to license key validation (which resulted in 'Invalid license file' error for valid license files)
Added a new licensing option which would relax trial limitations during the evaluation period (try without limitations)
Installation migrated to InstallShield Limited Edition (.NET 3.5 as base)
Handles auto save without overwriting issues when multiple instances of WebHarvy are running (configuration name is appended to auto save file name to avoid conflicts)
Upgrade option displayed while trying to register with old license key files
Amazon.com : configuration can start from the first product. Start from second row limitation removed.
URL validation in address bar made less strict to accommodate more common URL schemes
Language encoding defaulted to 'utf-8' for file exports (XML, CSV etc)
CSV/Database export : handles delimiters (comma, quotes) in captured data
Keyword/Category scraping allowed for 2 entries in evaluation version.
Rendering issues with browser fixed - defaults to IE 9 rendering
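
The CSV export item above (handling commas and quotes in captured data) follows the usual CSV quoting convention, sketched below in Python. This is an illustration using Python's standard csv module, not WebHarvy's export code; the function name rows_to_csv and the sample rows are assumptions.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows to CSV text, quoting only fields that need it."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL wraps a field in quotes only when it contains the
    # delimiter, a quote character or a line break; embedded quotes are doubled.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Name", "Description"], ["Widget, large", 'The "deluxe" model']]
csv_text = rows_to_csv(rows)
print(csv_text)

# Reading the text back yields the original fields unchanged.
round_trip = list(csv.reader(io.StringIO(csv_text)))
```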

Builds 2-55 (2010-2013)

December 2010 - May 2013

Private and incremental releases leading up to Version 3.0, including:

Ability to capture URLs and images in addition to text
Export to TSV (Tab Separated Values) format
Capture a set of similar links (categories)
Export scraped data to database (SQL)
Proxy manager for web scraping with IP rotation support
Ability to scrape websites by submitting multiple keywords
Option to edit saved configuration files
Capture text near heading feature
Built-in scheduler for running mining tasks
Command line options
Support for MySQL database export
Option to capture substring of selected text
Support for scraping data from local HTML files
Option to resume mining from where it stopped
Option to auto-save captured data at regular intervals
Option to automatically inject pauses while mining

Build 1 - Initial Release

2010

Initial Release