Scraping TripAdvisor Reviews
The following video shows how WebHarvy can be configured to scrape reviews and ratings from the TripAdvisor website. Details such as reviewer name, review title, review text and rating can be selected for extraction. The video also shows how the 'Read More' links at the end of longer reviews can be clicked to expand them, so that the entire review text is selected during mining.
The Regular Expression string used in the video is copied below.
wrote a review (.*)
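To see what this expression captures outside WebHarvy, here is a minimal Python sketch. It assumes the selected text looks like a reviewer byline such as "JaneD wrote a review Mar 2021"; that sample string and the function name extract_after_byline are made up for illustration and are not taken from the video.

```python
import re

# Pattern copied from the text above. The capture group (.*) holds
# everything that follows "wrote a review " up to the end of the line.
BYLINE_PATTERN = re.compile(r"wrote a review (.*)")


def extract_after_byline(text):
    """Return the text captured after 'wrote a review ', or None if absent."""
    match = BYLINE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1).strip()


if __name__ == "__main__":
    # Hypothetical byline text; the real page content may differ.
    sample = "JaneD wrote a review Mar 2021"
    print(extract_after_byline(sample))
```

Because `.` does not match a newline by default, the captured text stops at the end of the line on which the phrase appears.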
Scraping TripAdvisor Reviews by following links
In this method, instead of selecting review details from the starting (review listings) page, we follow each review link to load the review details page, from which details such as review title, content, reviewer name and location are selected. Note that instead of clicking the review link directly on the starting page and selecting the Follow this link option, we select the entire review block and extract the review details page URL from the block's HTML by applying a Regular Expression. The RegEx string used can be found in the video description.
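The idea of pulling the details page URL out of a review block's HTML can be sketched in Python as follows. This is only an illustration: the pattern below is hypothetical and is not the RegEx from the video description, and the "/ShowUserReviews" path prefix, the sample HTML and the name review_url_from_block are all assumptions.

```python
import re
from urllib.parse import urljoin

# Assumed site root used to turn a relative link into an absolute URL.
BASE_URL = "https://www.tripadvisor.com"

# Hypothetical pattern: the first href in the block whose path starts with
# "/ShowUserReviews". The actual expression used in the video may differ.
REVIEW_LINK_PATTERN = re.compile(r'href="(/ShowUserReviews[^"]*)"')


def review_url_from_block(block_html, base_url=BASE_URL):
    """Return the absolute review details URL found in block_html, or None."""
    match = REVIEW_LINK_PATTERN.search(block_html)
    if match is None:
        return None
    # urljoin resolves the captured relative path against the site root.
    return urljoin(base_url, match.group(1))


if __name__ == "__main__":
    # Made-up HTML standing in for one selected review block.
    block = (
        '<div class="review">'
        '<a href="/ShowUserReviews-g1-d2-r3-Hotel.html">Great stay</a>'
        "</div>"
    )
    print(review_url_from_block(block))
```

Matching against the whole block rather than a single link element means the URL can still be found when the clickable area of the link is small or nested inside other elements.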