I read a book about web scraping with Python, which has a lot of great libraries for it. Lately I’ve also seen a couple of articles showing how web scraping can be done with Node.js, so I thought I’d look around and see how it could be done in C#.
“Webscraping with C#” – CodeProject
A multi-part tutorial about web scraping with C#.
The first part of the series.
“Web Scraping In C#” – C# Corner
By Ali Imran. This one is very brief and shows how to get started quickly with HtmlAgilityPack.
“Parsing HTML Documents with the Html Agility Pack” – 4 Guys From Rolla
Another tutorial about HtmlAgilityPack. It’s a bit longer than the previous one, but still a quick read.
The post is here.
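To give a sense of what working with HtmlAgilityPack looks like, here is a minimal sketch (not taken from either tutorial) that parses a small HTML string and lists its links using an XPath query. It assumes the HtmlAgilityPack NuGet package is installed; the sample markup and URLs are made up for illustration.

```csharp
// Assumes the HtmlAgilityPack NuGet package is referenced.
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup standing in for a downloaded page.
        const string html = @"<html><body>
            <a href=""https://example.com/a"">First</a>
            <a href=""https://example.com/b"">Second</a>
        </body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes takes an XPath expression and returns null when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return;
        }

        foreach (var link in links)
        {
            // GetAttributeValue returns the fallback ("") if the attribute is missing.
            string href = link.GetAttributeValue("href", "");
            Console.WriteLine($"{link.InnerText.Trim()} -> {href}");
        }
    }
}
```

To scrape a live page instead of a string, HtmlAgilityPack’s `HtmlWeb` class can load a document from a URL (`new HtmlWeb().Load(url)`), after which the same `SelectNodes` query applies.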
“Web Scraping in C#” – MSDN
This short post shows how you can scrape data using AngleSharp, which looks like a very interesting library.
The post is here on MSDN.
NScrape – GitHub project
A framework for web scraping in C#. It looks interesting, and it even lets you submit forms from code.
NScrape’s GitHub page.
AngleSharp – GitHub project
AngleSharp is a .NET library that gives you the ability to parse angle bracket based hyper-texts like HTML, SVG, and MathML. XML without validation is also supported by the library. An important aspect of AngleSharp is that CSS can also be parsed. The parser is built upon the official W3C specification. This produces a perfectly portable HTML5 DOM representation of the given source code. Also current features such as querySelectorAll work for tree traversal.
I mentioned AngleSharp earlier, and I think it’s a very interesting project worth checking out.
AngleSharp is on GitHub here.
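To illustrate the querySelectorAll support mentioned in the description, here is a minimal sketch that parses an HTML string and selects elements with a CSS selector. It assumes a recent AngleSharp NuGet package (0.10 or later), where `HtmlParser` lives in the `AngleSharp.Html.Parser` namespace and exposes `ParseDocument`; the sample markup is made up for illustration.

```csharp
// Assumes the AngleSharp NuGet package, version 0.10 or later.
using System;
using AngleSharp.Html.Parser;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup standing in for a downloaded page.
        const string html = @"<ul id=""langs"">
            <li class=""lang"">C#</li>
            <li class=""lang"">Python</li>
            <li class=""lang"">JavaScript</li>
        </ul>";

        var parser = new HtmlParser();
        var document = parser.ParseDocument(html);

        // QuerySelectorAll takes a CSS selector, mirroring the DOM method of the same name.
        foreach (var item in document.QuerySelectorAll("#langs li.lang"))
        {
            Console.WriteLine(item.TextContent);
        }
    }
}
```

For a live page, AngleSharp can also fetch the document itself through a browsing context (`BrowsingContext.New(Configuration.Default.WithDefaultLoader())` followed by `await context.OpenAsync(url)`), and the same `QuerySelectorAll` call works on the result.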
IronWebScraper – Web Scraping library
IronWebScraper makes it easy to find and read content from websites in C#.
It makes web scraping in C# easy, providing a tool for scraping the content of websites.
IronWebScraper looks quite professional and seems worth a look if you want a ready-made scraping library for C#.
Check it out here.