Thursday, 16 May 2013

.NET & C# Website Scraper

This is a website scraper implemented in C# and .NET that uses regular expressions (regex) to strip out markup and script code, keeping only the text content of a website.

This website scraper was developed by Kevin Rio, a Miami website developer, to collect information and content from affiliate sites, such as e-commerce applications, that do not provide an API or an externally accessible database. SEO professionals can also use it to evaluate competitor websites.

This website scraper is completely free and can be used in any commercial or private application.

What This Website Scraper Does

This web scraper takes a website address as input from the user and retrieves the HTML code from that address. It then strips the HTML and JavaScript code using regular expressions, leaving only the website’s text for the user to explore. The text is provided both in an easy-to-copy text area and in a variable that is easy to manipulate and extend in your own scripts.
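The fetch-and-filter steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the download: the class name ScraperSketch, the method names ExtractText and ScrapeAsync, and the specific regex patterns are all assumptions made for this example, and the original project may fetch pages differently (for instance with WebClient rather than HttpClient). Regular expressions are also not a full HTML parser, so unusual markup may slip through.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical sketch of a regex-based text extractor (names are illustrative).
public static class ScraperSketch
{
    // Matches <script> and <style> elements together with their contents.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches HTML comments, which may span several lines.
    private static readonly Regex Comment = new Regex(
        @"<!--.*?-->", RegexOptions.Singleline);

    // Matches any remaining tag, such as <p> or </div>.
    private static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Matches runs of whitespace so they can be collapsed to one space.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Removes scripts, styles, comments, and tags, then decodes entities
    // (for example &amp;) and normalizes whitespace.
    public static string ExtractText(string html)
    {
        string text = ScriptOrStyle.Replace(html, " ");
        text = Comment.Replace(text, " ");
        text = Tag.Replace(text, " ");
        text = WebUtility.HtmlDecode(text);
        return Whitespace.Replace(text, " ").Trim();
    }

    // Downloads the page at the given address and returns only its text.
    public static async Task<string> ScrapeAsync(string url)
    {
        using (var client = new HttpClient())
        {
            string html = await client.GetStringAsync(url);
            return ExtractText(html);
        }
    }
}
```

With this sketch, ScraperSketch.ExtractText("<p>Hello &amp; <b>world</b></p>") returns "Hello & world", and the result of ScrapeAsync can be assigned to a string variable for further processing or bound to a text area.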

The download includes all of the files, including the Visual Studio solution file.

Source: http://www.krio.me/dot-net-c-sharp-web-scraper/
