  • Stars: 1,948
  • Rank: 22,808 (Top 0.5%)
  • Language: Rust
  • License: Other
  • Created about 10 years ago
  • Updated 2 months ago

Repository Details

High-performance browser-grade HTML5 parser

html5ever

API Documentation

html5ever is an HTML parser developed as part of the Servo project.

It can parse and serialize HTML according to the WHATWG specs (aka "HTML5"). Its actual behavior currently differs from the specs in some respects, most of which are documented in the bug tracker. html5ever passes all tokenizer tests from html5lib-tests, and most tree builder tests outside of the unimplemented features. The goal is to pass all html5lib tests, while also providing all hooks needed by a production web browser, e.g. document.write.

Note that HTML syntax is very similar to XML, but the two are not identical. For correct parsing of XHTML, use an XML parser (that said, many XHTML documents in the wild are serialized in an HTML-compatible form).

html5ever is written in Rust, so it avoids the notorious memory-safety problems that come with using C. Rust also gives the library the performance you would expect from an HTML parser written in C. Like a C parser, html5ever needs no garbage collector or other heavy runtime.

Getting started in Rust

Add html5ever as a dependency in your Cargo.toml file:

[dependencies]
html5ever = "0.27"

You should also take a look at examples/html2html.rs, examples/print-rcdom.rs, and the API documentation.

Getting started in other languages

Bindings for Python and other languages are much desired.

Working on html5ever

To fetch the test suite, run:

git submodule update --init

Run cargo doc in the repository root to build local documentation under target/doc/.

Details

html5ever uses callbacks to manipulate the DOM, so it does not provide any DOM tree representation of its own.
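
To make the callback idea concrete, here is a minimal sketch of the pattern. It is not html5ever's API: the Sink trait, the Arena type, and the drive function are hypothetical names invented for this illustration, and drive stands in for a parser by emitting the callbacks for a fixed `<p>hello</p>` fragment. The point is only that the caller chooses the node handle type, so the parser never needs a DOM type of its own.

```rust
// Hypothetical illustration: none of these names come from html5ever.

/// Callbacks a parser would invoke as it recognizes structure.
/// The caller picks `Handle`, so the parser needs no DOM type of its own.
trait Sink {
    type Handle: Copy;
    fn document(&self) -> Self::Handle;
    fn create_element(&mut self, name: &str) -> Self::Handle;
    fn append_child(&mut self, parent: Self::Handle, child: Self::Handle);
    fn append_text(&mut self, parent: Self::Handle, text: &str);
}

/// One possible caller-side DOM: a flat arena of nodes indexed by usize.
#[derive(Debug, PartialEq)]
enum Node {
    Document,
    Element(String),
    Text(String),
}

struct Arena {
    nodes: Vec<Node>,
    children: Vec<Vec<usize>>,
}

impl Arena {
    /// Start with a single document node at index 0.
    fn new() -> Self {
        Arena { nodes: vec![Node::Document], children: vec![Vec::new()] }
    }

    /// Store a node and return its index.
    fn push(&mut self, node: Node) -> usize {
        self.nodes.push(node);
        self.children.push(Vec::new());
        self.nodes.len() - 1
    }
}

impl Sink for Arena {
    type Handle = usize;

    fn document(&self) -> usize {
        0
    }

    fn create_element(&mut self, name: &str) -> usize {
        self.push(Node::Element(name.to_string()))
    }

    fn append_child(&mut self, parent: usize, child: usize) {
        self.children[parent].push(child);
    }

    fn append_text(&mut self, parent: usize, text: &str) {
        let id = self.push(Node::Text(text.to_string()));
        self.children[parent].push(id);
    }
}

/// Stand-in for a parser: emits the callbacks for `<p>hello</p>`
/// and returns the handle of the `p` element.
fn drive<S: Sink>(sink: &mut S) -> S::Handle {
    let doc = sink.document();
    let p = sink.create_element("p");
    sink.append_child(doc, p);
    sink.append_text(p, "hello");
    p
}
```

Calling drive(&mut Arena::new()) fills the arena with a document node, a p element, and a text node. A different Sink implementation could build reference-counted nodes instead, or just count elements, without any change to drive.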

html5ever exclusively uses UTF-8 to represent strings. Support for other document encodings (and UCS-2 document.write) is planned, and will work by converting input.
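
Until then, a caller with non-UTF-8 input has to convert it before handing it to the parser. The sketch below shows what such a conversion can look like using only the Rust standard library. The function names latin1_to_utf8 and utf16_to_utf8 are hypothetical and not part of html5ever, and real documents would likely call for a dedicated encoding library that covers the full set of web encodings.

```rust
// Hypothetical helpers, standard library only; not part of html5ever.

/// Decode strict ISO-8859-1 (Latin-1) bytes into a UTF-8 `String`.
/// Each byte maps to the Unicode code point with the same value.
/// Note: this is not windows-1252, which differs in the 0x80..=0x9F range.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Decode UTF-16 code units into a UTF-8 `String`.
/// Unpaired surrogates are replaced with U+FFFD rather than causing an error.
fn utf16_to_utf8(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

For example, latin1_to_utf8(&[0x63, 0x61, 0x66, 0xE9]) yields "café", and the resulting String can then be passed on as ordinary UTF-8 input.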

The code is cross-referenced with the WHATWG syntax spec, and eventually we will have a way to present code and spec side-by-side.

html5ever builds against the official stable releases of Rust, though some optimizations are only supported on nightly releases.
