parse-xml

A fast, safe, compliant XML parser for Node.js and browsers.

Installation

npm install @rgrove/parse-xml

Or, if you like living dangerously, you can load the minified bundle in a browser via Unpkg and use the parseXml global.

Features

  • Returns a convenient object tree representing an XML document.

  • Works great in Node.js and browsers.

  • Provides helpful, detailed error messages with context when a document is not well-formed.

  • Mostly conforms to XML 1.0 (Fifth Edition) as a non-validating parser (see below for details).

  • Passes all relevant tests in the XML Conformance Test Suite.

  • Written in TypeScript and compiled to ES2020 JavaScript for Node.js and ES2017 JavaScript for browsers. The browser build is also optimized for minification.

  • Extremely fast and surprisingly small.

  • Zero dependencies.

Not Features

While this parser is capable of parsing document type declarations (<!DOCTYPE ... >) and including them in the node tree, it doesn't actually do anything with them. External document type definitions won't be loaded, and the parser won't validate the document against a DTD or resolve custom entity references defined in a DTD.

In addition, the only supported character encoding is UTF-8 because it's not feasible (or useful) to support other character encodings in JavaScript.
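The examples in this README pass a string to parseXml, so input that arrives as raw bytes (from a file or a network response, say) has to be decoded first. A minimal sketch of that step, assuming the bytes are meant to be UTF-8 and using the standard TextDecoder API; `decodeUtf8` is a hypothetical helper and not part of parse-xml:

```javascript
// Sketch only: decode raw bytes as UTF-8 before handing the resulting string
// to a parser. `decodeUtf8` is a hypothetical helper, not part of parse-xml.
function decodeUtf8(bytes) {
  // `fatal: true` makes invalid UTF-8 throw a TypeError instead of being
  // silently replaced with U+FFFD.
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

// Stand-in for bytes read from disk or the network.
const bytes = new TextEncoder().encode('<kittens fuzzy="yes">Très fuzzy.</kittens>');
const xml = decodeUtf8(bytes);

console.log(xml);
```

Decoding strictly is a design choice: it surfaces a wrongly encoded document as an error up front rather than letting replacement characters slip into the parsed tree.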

Examples

Basic Usage

ESM

import { parseXml } from '@rgrove/parse-xml';
parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');

CommonJS

const { parseXml } = require('@rgrove/parse-xml');
parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');

The result is an XmlDocument instance containing the parsed document, with a structure that looks like this (some properties and methods are excluded for clarity; see the API docs for details):

{
  type: 'document',
  children: [
    {
      type: 'element',
      name: 'kittens',
      attributes: {
        fuzzy: 'yes'
      },
      children: [
        {
          type: 'text',
          text: 'I like fuzzy kittens.'
        }
      ],
      parent: { ... },
      isRootNode: true
    }
  ]
}
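
A tree of this shape is easy to walk with a short recursive function. The sketch below works on a plain object literal copied from the structure above rather than on a real XmlDocument, and `collectText` is a hypothetical helper, not part of parse-xml:

```javascript
// Hypothetical helper (not part of parse-xml): concatenates the text of all
// descendant text nodes, depth first.
function collectText(node) {
  if (node.type === 'text') {
    return node.text;
  }
  // Any other node without a `children` array contributes nothing.
  return (node.children ?? []).map(collectText).join('');
}

// Plain-object stand-in for the parsed document shown above.
const doc = {
  type: 'document',
  children: [
    {
      type: 'element',
      name: 'kittens',
      attributes: { fuzzy: 'yes' },
      children: [{ type: 'text', text: 'I like fuzzy kittens.' }],
    },
  ],
};

console.log(collectText(doc)); // "I like fuzzy kittens."
```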

All parse-xml objects have toJSON() methods that return JSON-serializable objects, so you can easily convert an XML document to JSON:

let json = JSON.stringify(parseXml(xml));

Friendly Errors

When something goes wrong, parse-xml throws an error that tells you exactly what happened and shows you where the problem is so you can fix it.

parseXml('<foo><bar>baz</foo>');

Output

Error: Missing end tag for element bar (line 1, column 14)
  <foo><bar>baz</foo>
               ^

In addition to a helpful message, error objects have the following properties:

  • column Number

    Column where the error occurred (1-based).

  • excerpt String

    Excerpt from the input string that contains the problem.

  • line Number

    Line where the error occurred (1-based).

  • pos Number

    Character position where the error occurred relative to the beginning of the input (0-based).
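
These properties make it straightforward to build your own log line or UI message. A minimal sketch, using a hand-built stand-in error so that it runs without the library; `summarizeParseError` is a hypothetical helper, and the property values are illustrative ones chosen to match the output above:

```javascript
// Hypothetical helper (not part of parse-xml): builds a one-line summary from
// the documented error properties.
function summarizeParseError(err) {
  return `XML error at ${err.line}:${err.column} (offset ${err.pos}): ${err.excerpt}`;
}

// Stand-in error carrying the documented properties. The values are
// illustrative; a real error would come from a failed parseXml() call.
const fakeError = Object.assign(new Error('Missing end tag for element bar'), {
  column: 14,
  excerpt: '<foo><bar>baz</foo>',
  line: 1,
  pos: 13,
});

console.log(summarizeParseError(fakeError));
// XML error at 1:14 (offset 13): <foo><bar>baz</foo>
```

In real code, the same helper would presumably be called from the catch block of a try/catch wrapped around parseXml().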

Why another XML parser?

There are many XML parsers for Node, and some of them are good. However, most of them suffer from one or more of the following shortcomings:

  • Native dependencies.

  • Loose, non-standard parsing behavior that can lead to unexpected or even unsafe results when given input the author didn't anticipate.

  • Kitchen sink APIs that tightly couple a parser with DOM manipulation functions, a stringifier, or other tooling that isn't directly related to parsing and consuming XML.

  • Stream-based parsing. This is great in the rare case that you need to parse truly enormous documents, but can be a pain to work with when all you want is a node tree.

  • Poor error handling.

  • Too big or too Node-specific to work well in browsers.

parse-xml's goal is to be a small, fast, safe, compliant, non-streaming, non-validating, browser-friendly parser, because I think this is an under-served niche.

I think parse-xml demonstrates that it's not necessary to jettison the spec entirely or to write complex code in order to implement a small, fast XML parser.

Also, it was fun.

Benchmark

Here's how parse-xml's performance stacks up against a few comparable libraries: fast-xml-parser, libxmljs2, and xmldoc.

While libxmljs2 is faster at parsing medium and large documents, its performance comes at the expense of a large C dependency, no browser support, and a history of security vulnerabilities in the underlying libxml2 library.

In these results, "ops/s" refers to operations per second. Higher is faster.

Node.js v18.14.0 / Darwin arm64
Apple M1 Max

Running "Small document (291 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 4.1.0:
    191 553 ops/s, ±0.10%   | fastest

  fast-xml-parser 4.1.1:
    142 565 ops/s, ±0.11%   | 25.57% slower

  libxmljs2 0.31.0 (native):
    74 646 ops/s, ±0.30%    | 61.03% slower

  xmldoc 1.2.0 (sax-js):
    66 823 ops/s, ±0.09%    | slowest, 65.12% slower

Finished 4 cases!
  Fastest: @rgrove/parse-xml 4.1.0
  Slowest: xmldoc 1.2.0 (sax-js)

Running "Medium document (72081 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 4.1.0:
    1 065 ops/s, ±0.11%   | 49.81% slower

  fast-xml-parser 4.1.1:
    637 ops/s, ±0.12%     | 69.98% slower

  libxmljs2 0.31.0 (native):
    2 122 ops/s, ±2.48%   | fastest

  xmldoc 1.2.0 (sax-js):
    444 ops/s, ±0.36%     | slowest, 79.08% slower

Finished 4 cases!
  Fastest: libxmljs2 0.31.0 (native)
  Slowest: xmldoc 1.2.0 (sax-js)

Running "Large document (1162464 bytes)" suite...
Progress: 100%

  @rgrove/parse-xml 4.1.0:
    93 ops/s, ±0.10%    | 53.27% slower

  fast-xml-parser 4.1.1:
    48 ops/s, ±0.60%    | 75.88% slower

  libxmljs2 0.31.0 (native):
    199 ops/s, ±1.47%   | fastest

  xmldoc 1.2.0 (sax-js):
    38 ops/s, ±0.09%    | slowest, 80.9% slower

Finished 4 cases!
  Fastest: libxmljs2 0.31.0 (native)
  Slowest: xmldoc 1.2.0 (sax-js)

See the parse-xml-benchmark repo for instructions on how to run this benchmark yourself.
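
The "% slower" figures in the output appear to be relative to the fastest case in each suite. A short sketch under that assumption, which reproduces the reported percentages from the rounded ops/s values; `percentSlower` is a hypothetical helper, not part of the benchmark tooling:

```javascript
// Hypothetical helper: how much slower a case is than the fastest case in the
// same suite, as a percentage of the fastest case's throughput.
function percentSlower(opsPerSecond, fastestOpsPerSecond) {
  return (1 - opsPerSecond / fastestOpsPerSecond) * 100;
}

// Small document: fast-xml-parser (142 565 ops/s) vs. parse-xml (191 553 ops/s).
console.log(percentSlower(142565, 191553).toFixed(2)); // "25.57"

// Medium document: parse-xml (1 065 ops/s) vs. libxmljs2 (2 122 ops/s).
console.log(percentSlower(1065, 2122).toFixed(2)); // "49.81"
```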

License

ISC License
