• Stars: 2,089
• Rank: 22,116 (Top 0.5 %)
• Language: Rust
• License: Other
• Created over 10 years ago
• Updated 4 months ago

Repository Details

High-performance browser-grade HTML5 parser

html5ever

API Documentation

html5ever is an HTML parser developed as part of the Servo project.

It can parse and serialize HTML according to the WHATWG specs (aka "HTML5"). Its actual behavior currently differs from the specs in some respects, and most of these differences are documented in the bug tracker. html5ever passes all tokenizer tests from html5lib-tests, and most tree builder tests apart from those that cover unimplemented features. The goal is to pass all html5lib tests, while also providing all hooks needed by a production web browser, e.g. document.write.

Note that the HTML syntax is very similar to XML. For correct parsing of XHTML, use an XML parser (that said, many XHTML documents in the wild are serialized in an HTML-compatible form).

html5ever is written in Rust, so it avoids the notorious memory-safety problems that come with using C, while still offering the performance you would expect from an HTML parser written in C. Like a C parser, it needs no garbage collector or other heavy runtime.

Getting started in Rust

Add html5ever as a dependency in your Cargo.toml file:

[dependencies]
html5ever = "0.27"

You should also take a look at examples/html2html.rs, examples/print-rcdom.rs, and the API documentation.

Getting started in other languages

Bindings for Python and other languages are much desired.

Working on html5ever

To fetch the test suite, you need to run

git submodule update --init

Run cargo doc in the repository root to build local documentation under target/doc/.

Details

html5ever uses callbacks to manipulate the DOM, therefore it does not provide any DOM tree representation.
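
The callback design can be illustrated with a small standalone sketch. It does not use html5ever's real API: the `Sink` trait, `ArenaDom`, `Node`, and `drive` are hypothetical names invented for this example, and `drive` merely stands in for a parser by issuing the calls that the input `<p>hello</p>` might produce. The point is only that the parser side talks to a set of callbacks, while the caller decides what a node actually is.

```rust
// Hypothetical, simplified illustration of a callback-driven tree builder.
// None of these names are taken from html5ever.

/// Callbacks a parser would invoke; the caller chooses the node representation.
trait Sink {
    /// Caller-chosen node identifier.
    type Handle: Copy;

    fn document(&self) -> Self::Handle;
    fn create_element(&mut self, name: &str) -> Self::Handle;
    fn append(&mut self, parent: Self::Handle, child: Self::Handle);
    fn append_text(&mut self, parent: Self::Handle, text: &str);
}

/// One possible node type: a name, accumulated text, and child indices.
#[derive(Debug)]
struct Node {
    name: String,
    text: String,
    children: Vec<usize>,
}

/// One possible DOM: a flat arena of nodes addressed by index.
#[derive(Debug)]
struct ArenaDom {
    nodes: Vec<Node>,
}

impl ArenaDom {
    fn new() -> Self {
        // Index 0 is reserved for the document root.
        let root = Node {
            name: "#document".to_string(),
            text: String::new(),
            children: Vec::new(),
        };
        ArenaDom { nodes: vec![root] }
    }
}

impl Sink for ArenaDom {
    type Handle = usize;

    fn document(&self) -> usize {
        0
    }

    fn create_element(&mut self, name: &str) -> usize {
        self.nodes.push(Node {
            name: name.to_string(),
            text: String::new(),
            children: Vec::new(),
        });
        self.nodes.len() - 1
    }

    fn append(&mut self, parent: usize, child: usize) {
        self.nodes[parent].children.push(child);
    }

    fn append_text(&mut self, parent: usize, text: &str) {
        self.nodes[parent].text.push_str(text);
    }
}

/// Stand-in for a parser: issues the callbacks for the input `<p>hello</p>`.
fn drive<S: Sink>(sink: &mut S) -> S::Handle {
    let root = sink.document();
    let p = sink.create_element("p");
    sink.append_text(p, "hello");
    sink.append(root, p);
    p
}

fn main() {
    let mut dom = ArenaDom::new();
    let p = drive(&mut dom);
    println!("{} -> {:?}", dom.nodes[p].name, dom.nodes[p].text);
}
```

Because `drive` is generic over `Sink`, the same parsing code could feed a reference-counted tree, an arena like the one above, or a browser's own DOM without changes.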

html5ever exclusively uses UTF-8 to represent strings. In the future it will support other document encodings (and UCS-2 document.write) by converting input.
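
Until such conversion is built in, a caller holding input in another encoding would presumably convert it to UTF-8 before handing it to the parser. The following is a minimal sketch of that step using only the Rust standard library; the helper names `utf16_to_utf8` and `latin1_to_utf8` are made up for this example and are not part of html5ever. Real documents often need a full encoding library and encoding detection, which this sketch does not attempt.

```rust
/// Decode UTF-16 code units into a UTF-8 `String`.
/// Unpaired surrogates (which UCS-2 data may contain) become U+FFFD.
fn utf16_to_utf8(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Decode ISO-8859-1 bytes into a UTF-8 `String`.
/// Each byte maps to the Unicode code point with the same numeric value.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // Simulate input that arrived as UTF-16 code units.
    let units: Vec<u16> = "<p>héllo</p>".encode_utf16().collect();
    println!("{}", utf16_to_utf8(&units));

    // Simulate input that arrived as ISO-8859-1 bytes (0xE9 is "é").
    println!("{}", latin1_to_utf8(b"<p>caf\xE9</p>"));
}
```

Note that `latin1_to_utf8` handles strict ISO-8859-1 only; pages labelled as Latin-1 on the web are commonly treated as windows-1252, which differs in the 0x80 to 0x9F range.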

The code is cross-referenced with the WHATWG syntax spec, and eventually we will have a way to present code and spec side-by-side.

html5ever builds against the official stable releases of Rust, though some optimizations are only supported on nightly releases.
