LINQ to HTML

LINQ to HTML is a free, full-featured .NET HTML parser modelled on Microsoft's LINQ to XML. Its key benefit is that it is as simple to use as LINQ to XML while also handling malformed HTML. If you like working with the XDocument class, you will love this library.

Install using NuGet

PM> Install-Package Bitlush.LinqToHtml

Example - Parsing Malformed HTML

LINQ to HTML handles malformed HTML (so-called "tag soup"), so your HTML documents do not have to conform to any minimum standard.

HDocument document = HDocument.Parse("<HTML><img src=something.gif></HTML>");

Console.WriteLine(document.ToString());

The output is a cleaned-up version of the input. For example, tag names are lowercased, unclosed elements are closed, and attribute values are quoted.

<html>
<img src="something.gif" />
</html>
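
For contrast, here is a minimal sketch of what happens when the same tag soup is handed to LINQ to XML. It uses only the standard System.Xml.Linq types, not LINQ to HTML, and the Program class is a placeholder for illustration. Because the src value is unquoted, the input is not well-formed XML, so XDocument.Parse is expected to throw an XmlException and the catch branch runs.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        try
        {
            // Same input as the HDocument example: the unquoted attribute
            // value makes this invalid for a strict XML parser.
            XDocument.Parse("<HTML><img src=something.gif></HTML>");
            Console.WriteLine("parsed");
        }
        catch (XmlException)
        {
            Console.WriteLine("XDocument rejected the input");
        }
    }
}
```

This is the gap HDocument.Parse is meant to fill: the same API shape, without the well-formedness requirement.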

Example - Extracting all Images from a Document

LINQ to HTML mirrors the LINQ to XML API, making it easy to query and manipulate HTML documents. The loop below reuses the document parsed in the previous example.

foreach (HElement element in document.Descendants("img"))
{
   Console.WriteLine("src = " + element.Attribute("src"));
}
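
To show the parallel, here is a sketch of the equivalent loop written against LINQ to XML itself. It uses only System.Xml.Linq, so the input has to be well-formed, and the Program class is a placeholder. Note that in LINQ to XML, concatenating an XAttribute directly yields the full name="value" pair, so the sketch reads the Value property to get just the attribute text. Whether LINQ to HTML's attribute type behaves the same way is not stated here, so treat that as something to verify.

```csharp
using System;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Well-formed input, because XDocument (unlike HDocument) rejects tag soup.
        XDocument document = XDocument.Parse("<html><img src=\"something.gif\" /></html>");

        foreach (XElement element in document.Descendants("img"))
        {
            // XAttribute.Value returns only the attribute's text.
            Console.WriteLine("src = " + element.Attribute("src").Value);
        }
    }
}
```

With this input the loop prints a single line, src = something.gif.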