• This repository has been archived on 17/Jul/2023
  • Language
    C#
  • License
    MIT License

Repository Details

Lightweight HTML processor

LtGt

Development of this project is entirely funded by the community. Consider donating to support!

Note: As an alternative, consider using AngleSharp, which is a more performant and feature-complete HTML processing library.

LtGt is a minimalistic library for working with HTML. It can parse any HTML5-compliant code into an object model which you can use to traverse nodes or locate specific elements. The library establishes itself as a foundation that you can build upon, and comes with a lot of extension methods that can help navigate the DOM easily.

Download

  • NuGet: dotnet add package LtGt

Features

  • Parse any HTML5-compliant code
  • Traverse the DOM using LINQ or Seq
  • Use basic element selectors like GetElementById(), GetElementsByTagName(), etc.
  • Use CSS selectors via QueryElements()
  • Convert any HTML node to its equivalent Linq2Xml representation
  • Render any HTML entity to code
  • Targets .NET Framework 4.5+ and .NET Standard 1.6+

Screenshots

[Screenshots: DOM, CSS selectors]

Usage

LtGt is written in F#, but it provides two separate idiomatic APIs: one for C# and one for F#.

Parse a document

C#

using LtGt;

const string html = @"<!doctype html>
<html>
  <head>
    <title>Document</title>
  </head>
  <body>
    <div>Content</div>
  </body>
</html>";

// This throws an exception on parse errors
var document = Html.ParseDocument(html);

// -or-

// This returns a wrapped result instead
var documentResult = Html.TryParseDocument(html);
if (documentResult.IsOk)
{
    // Handle result
    var document = documentResult.ResultValue;
}
else
{
    // Handle error
    var error = documentResult.ErrorValue;
}

F#

open LtGt

let html = "<!doctype html>
<html>
  <head>
    <title>Document</title>
  </head>
  <body>
    <div>Content</div>
  </body>
</html>"

// This throws an exception on parse errors
let document = Html.parseDocument html

// -or-

// This returns a wrapped result instead
match Html.tryParseDocument html with
| Result.Ok document -> () // handle result
| Result.Error error -> () // handle error

Parse a fragment

C#

const string html = "<div id=\"some-element\"><a href=\"https://example.com\">Link</a></div>";

// Parse an element node
var element = Html.ParseElement(html);

// Parse any node
var node = Html.ParseNode(html);

F#

let html = "<div id=\"some-element\"><a href=\"https://example.com\">Link</a></div>"

// Parse an element node
let element = Html.parseElement html

// Parse any node
let node = Html.parseNode html

Find specific element

C#

using System.Linq; // needed for FirstOrDefault()

var element1 = document.GetElementById("menu-bar");
var element2 = document.GetElementsByTagName("div").FirstOrDefault();
var element3 = document.GetElementsByClassName("floating-button floating-button--enabled").FirstOrDefault();

var element1Data = element1.GetAttributeValue("data");
var element2Id = element2.GetId();
var element3Text = element3.GetInnerText();

F#

let element1 = document |> Html.tryElementById "menu-bar"
let element2 = document |> Html.elementsByTagName "div" |> Seq.tryHead
let element3 = document |> Html.elementsByClassName "floating-button floating-button--enabled" |> Seq.tryHead

let element1Data = element1 |> Option.bind (Html.tryAttributeValue "data")
let element2Id = element2 |> Option.bind Html.tryId
let element3Text = element3 |> Option.map Html.innerText

You can leverage the full power of CSS selectors as well.

C#

var element = document.QueryElements("div#main > span.container:empty").FirstOrDefault();

F#

let element = document |> CssSelector.queryElements "div#main > span.container:empty" |> Seq.tryHead

Check equality

You can compare two HTML entities by value, including their descendants.

C#

var element1 = new HtmlElement("span",
    new HtmlAttribute("id", "foo"),
    new HtmlText("bar"));

var element2 = new HtmlElement("span",
    new HtmlAttribute("id", "foo"),
    new HtmlText("bar"));

var element3 = new HtmlElement("span",
    new HtmlAttribute("id", "foo"),
    new HtmlText("oof"));

var firstTwoEqual = HtmlEntityEqualityComparer.Instance.Equals(element1, element2); // true
var lastTwoEqual = HtmlEntityEqualityComparer.Instance.Equals(element2, element3); // false

F#

let element1 = HtmlElement("span",
    HtmlAttribute("id", "foo"),
    HtmlText("bar"))

let element2 = HtmlElement("span",
    HtmlAttribute("id", "foo"),
    HtmlText("bar"))

let element3 = HtmlElement("span",
    HtmlAttribute("id", "foo"),
    HtmlText("oof"))

let firstTwoEqual = Html.equal element1 element2 // true
let lastTwoEqual = Html.equal element2 element3 // false

Convert to Linq2Xml

You can convert LtGt's objects to System.Xml.Linq objects (XNode, XElement, etc). This can be useful if you need to convert HTML to XML or if you want to use XPath to select nodes.

C#

using System.Xml.Linq;  // XDocument
using System.Xml.XPath; // XPathSelectElements()

var htmlDocument = Html.ParseDocument(html);
var xmlDocument = (XDocument) htmlDocument.ToXObject();
var elements = xmlDocument.XPathSelectElements("//input[@type=\"submit\"]");

F#

open System.Xml.Linq  // XDocument
open System.Xml.XPath // XPathSelectElements()

let htmlDocument = Html.parseDocument html
let xmlDocument = htmlDocument |> Html.toXObject :?> XDocument
let elements = xmlDocument.XPathSelectElements("//input[@type=\"submit\"]")
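
To see the XPath step in isolation, here is a minimal sketch that runs the same query against a hand-written XDocument, using only System.Xml.Linq and System.Xml.XPath. The sample markup and variable names are assumptions for illustration. In real code the XDocument would come from ToXObject() rather than XDocument.Parse().

C#

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Stand-in for the XDocument produced by the conversion (hypothetical markup)
var xmlDocument = XDocument.Parse(
    "<html><body><form>" +
    "<input type=\"text\" name=\"q\" />" +
    "<input type=\"submit\" value=\"Search\" />" +
    "</form></body></html>");

// Select every <input> element whose type attribute equals "submit"
var submitButtons = xmlDocument
    .XPathSelectElements("//input[@type=\"submit\"]")
    .ToList();

// Print the value attribute of each match
foreach (var button in submitButtons)
    Console.WriteLine(button.Attribute("value")?.Value); // prints "Search"
```

Once the document is an XDocument, the rest of the System.Xml.Linq API (Descendants(), Attribute(), and so on) is available as well.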

Render nodes

You can turn any entity into its equivalent HTML code.

C#

var element = new HtmlElement("div",
    new HtmlAttribute("id", "main"),
    new HtmlText("Hello world"));

var html = element.ToHtml(); // <div id="main">Hello world</div>

F#

let element = HtmlElement("div",
    HtmlAttribute("id", "main"),
    HtmlText("Hello world"))

let html = element |> Html.toHtml // <div id="main">Hello world</div>

Benchmarks

This is how LtGt compares to popular HTML libraries when parsing a document (in this case, a YouTube video watch page). The results are not in favor of LtGt, so if performance is important for your task, you should probably consider using a different parser. That said, these results are still pretty impressive for a parser built with parser combinators rather than written by hand.

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.14393.3384 (1607/AnniversaryUpdate/Redstone1)
Intel Core i5-4460 CPU 3.20GHz (Haswell), 1 CPU, 4 logical and 4 physical cores
Frequency=3125000 Hz, Resolution=320.0000 ns, Timer=TSC
.NET Core SDK=3.1.100
[Host]     : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT DEBUG
DefaultJob : .NET Core 3.1.0 (CoreCLR 4.700.19.56402, CoreFX 4.700.19.56404), X64 RyuJIT
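
|          Method |     Mean |    Error |   StdDev | Ratio | Rank |
|---------------- |---------:|---------:|---------:|------:|-----:|
|      AngleSharp | 11.94 ms | 0.104 ms | 0.097 ms |  0.29 |    1 |
| HtmlAgilityPack | 20.51 ms | 0.140 ms | 0.124 ms |  0.49 |    2 |
|            LtGt | 41.59 ms | 0.450 ms | 0.399 ms |  1.00 |    3 |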
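
The Ratio column can be roughly reproduced from the Mean column. The sketch below assumes that the ratio is simply each mean divided by the LtGt baseline mean. BenchmarkDotNet derives it from the full measurement distributions, so in general the values may differ slightly. The helper name FormatRatio is made up for this example.

```csharp
using System;
using System.Globalization;

// Mean parse times in milliseconds, copied from the benchmark results above
var means = new (string Method, double MeanMs)[]
{
    ("AngleSharp", 11.94),
    ("HtmlAgilityPack", 20.51),
    ("LtGt", 41.59),
};

// LtGt is the baseline, so its ratio is 1.00 by definition
const double baselineMs = 41.59;

// Hypothetical helper: ratio to the baseline, formatted to two decimals
string FormatRatio(double meanMs) =>
    (meanMs / baselineMs).ToString("F2", CultureInfo.InvariantCulture);

foreach (var (method, meanMs) in means)
    Console.WriteLine($"{method}: {FormatRatio(meanMs)}");
```

Read the other way around, LtGt took about 3.5 times as long as AngleSharp and about twice as long as HtmlAgilityPack on this document.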
