  • Stars: 1,046
  • Rank: 44,062 (Top 0.9%)
  • Language: JavaScript
  • License: Other
  • Created almost 15 years ago
  • Updated over 1 year ago

Repository Details

A sax style parser for JS

sax js

A sax-style parser for XML and HTML.

Designed with node in mind, but should work fine in the browser or other CommonJS implementations.

What This Is

  • A very simple tool to parse through an XML string.
  • A stepping stone to a streaming HTML parser.
  • A handy way to deal with RSS and other mostly-ok-but-kinda-broken XML docs.

What This Is (probably) Not

  • An HTML Parser - That's a fine goal, but this isn't it. It's just XML.
  • A DOM Builder - You can use it to build an object model out of XML, but it doesn't do that out of the box.
  • XSLT - No DOM = no querying.
  • 100% Compliant with (some other SAX implementation) - Most SAX implementations are in Java and do a lot more than this does.
  • An XML Validator - It does a little validation when in strict mode, but not much.
  • A Schema-Aware XSD Thing - Schemas are an exercise in fetishistic masochism.
  • A DTD-aware Thing - Fetching DTDs is a much bigger job.
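The point above about building an object model can be sketched as follows. This is a minimal sketch, not something the library ships: `buildTree` and the shape of the resulting nodes (`name`, `attributes`, `children`) are assumptions made for illustration. It relies only on the opentag, text, and closetag handlers described in this document.

```javascript
// Hypothetical helper: attaches handlers to a sax-style parser and
// returns a function that yields the tree collected so far.
function buildTree (parser) {
  // Synthetic document node that holds the top-level element(s).
  var root = { name: null, attributes: {}, children: [] }
  var stack = [root]

  parser.onopentag = function (node) {
    var element = { name: node.name, attributes: node.attributes, children: [] }
    stack[stack.length - 1].children.push(element)
    stack.push(element)
  }
  parser.ontext = function (text) {
    // Text becomes a plain string child of the innermost open element.
    stack[stack.length - 1].children.push(text)
  }
  parser.onclosetag = function () {
    // Never pop the synthetic root.
    if (stack.length > 1) stack.pop()
  }

  return function getTree () { return root }
}

// Wiring it to the real parser would look roughly like this:
//   var parser = require("sax").parser(true)
//   var getTree = buildTree(parser)
//   parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close()
//   console.log(JSON.stringify(getTree(), null, 2))
```

Keeping a stack of open elements is the usual way to turn a flat event sequence back into a tree; comments, CDATA, and namespaces are left out here to keep the sketch short.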

Regarding <!DOCTYPEs and <!ENTITYs

The parser will handle the basic XML entities in text nodes and attribute values: &amp; &lt; &gt; &apos; &quot;. It's possible to define additional entities in XML by putting them in the DTD. This parser doesn't do anything with that. If you want to listen to the ondoctype event, and then fetch the doctypes, and read the entities and add them to parser.ENTITIES, then be my guest.

Unknown entities will fail in strict mode, and in loose mode, will pass through unmolested.
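For anyone taking up that invitation, the following is a rough sketch of the "read the entities and add them to parser.ENTITIES" step. The helper names `extractEntities` and `registerDoctypeEntities` are hypothetical, and the regular expression is an assumption that only covers simple internal declarations of the form `<!ENTITY name "value">`. Parameter entities and external (SYSTEM/PUBLIC) entities are deliberately skipped, and nothing is fetched.

```javascript
// Hypothetical helper: pulls simple internal entity declarations such as
// <!ENTITY name "value"> out of a doctype string.
function extractEntities (doctype) {
  var found = {}
  var pattern = /<!ENTITY\s+([^\s%"'>]+)\s+(?:"([^"]*)"|'([^']*)')\s*>/g
  var match
  while ((match = pattern.exec(doctype)) !== null) {
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    found[match[1]] = match[2] !== undefined ? match[2] : match[3]
  }
  return found
}

// Hypothetical helper: copies whatever extractEntities finds onto
// parser.ENTITIES when the doctype event fires.
function registerDoctypeEntities (parser) {
  parser.ondoctype = function (doctype) {
    var entities = extractEntities(doctype)
    Object.keys(entities).forEach(function (name) {
      parser.ENTITIES[name] = entities[name]
    })
  }
}
```

Entity values that themselves contain references are stored as-is; expanding them would need another pass.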

Usage

var sax = require("./lib/sax"),
  strict = true, // set to false for html-mode
  parser = sax.parser(strict);

parser.onerror = function (e) {
  // an error happened.
};
parser.ontext = function (t) {
  // got some text.  t is the string of text.
};
parser.onopentag = function (node) {
  // opened a tag.  node has "name" and "attributes"
};
parser.onattribute = function (attr) {
  // an attribute.  attr has "name" and "value"
};
parser.onend = function () {
  // parser stream is done, and ready to have more stuff written to it.
};

parser.write('<xml>Hello, <who name="world">world</who>!</xml>').close();

// stream usage
// takes the same options as the parser
var fs = require("fs")
var options = {} // settings bag, see Arguments below
var saxStream = require("sax").createStream(strict, options)
saxStream.on("error", function (e) {
  // unhandled errors will throw, since this is a proper node
  // event emitter.
  console.error("error!", e)
  // clear the error
  this._parser.error = null
  this._parser.resume()
})
saxStream.on("opentag", function (node) {
  // same object as above
})
// pipe is supported, and it's readable/writable
// same chunks coming in also go out.
fs.createReadStream("file.xml")
  .pipe(saxStream)
  .pipe(fs.createWriteStream("file-copy.xml"))

Arguments

Pass the following arguments to the parser function. All are optional.

strict - Boolean. Whether or not to be a jerk. Default: false.

opt - Object bag of settings regarding string formatting. All default to false.

Settings supported:

  • trim - Boolean. Whether or not to trim text and comment nodes.
  • normalize - Boolean. If true, then turn any whitespace into a single space.
  • lowercase - Boolean. If true, then lowercase tag names and attribute names in loose mode, rather than uppercasing them.
  • xmlns - Boolean. If true, then namespaces are supported.
  • position - Boolean. If false, then don't track line/col/position. Unlike the other settings, tracking stays on unless this is explicitly set to false.
  • strictEntities - Boolean. If true, only parse predefined XML entities (&amp;, &apos;, &gt;, &lt;, and &quot;).
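To make the trim and normalize settings concrete, here is a plain-JavaScript approximation of their effect on a single text node. It is an illustration under the assumption that the two settings behave as described in the list above; `applyTextOptions` is a hypothetical name and this is not presented as the library's own code.

```javascript
// Illustration only: approximates what the trim and normalize settings
// do to the string of a text node.
function applyTextOptions (opt, text) {
  if (opt.trim) text = text.trim()                    // drop leading/trailing whitespace
  if (opt.normalize) text = text.replace(/\s+/g, " ") // collapse whitespace runs
  return text
}

// Example: both settings together turn "  a \n b  " into "a b".
var cleaned = applyTextOptions({ trim: true, normalize: true }, "  a \n b  ")
```

With the real parser, the same object would simply be passed as the second argument, for example `sax.parser(false, { trim: true, normalize: true })`.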

Methods

write - Write bytes onto the stream. You don't have to do this all at once. You can keep writing as much as you want.

close - Close the stream. Once closed, no more data may be written until it is done processing the buffer, which is signaled by the end event.

resume - To gracefully handle errors, assign a listener to the error event. Then, when the error is taken care of, you can call resume to continue parsing. Otherwise, the parser will not continue while in an error state.
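The three methods above can be combined into one small routine. The sketch below assumes a parser object with the `write`, `close`, and `resume` methods and the `error` member described in this document; the helper name `parseChunks` is hypothetical.

```javascript
// Hypothetical helper: feeds chunks to a sax-style parser one at a time,
// records any errors, and keeps parsing after each one.
function parseChunks (parser, chunks) {
  var errors = []
  parser.onerror = function (err) {
    errors.push(err)
    parser.error = null // clear the error state...
    parser.resume()     // ...so that parsing can continue
  }
  chunks.forEach(function (chunk) {
    parser.write(chunk) // input may arrive in as many pieces as needed
  })
  parser.close()
  return errors
}

// With the real module this might be called as:
//   var errors = parseChunks(require("sax").parser(true), ["<a><b>", "text</b>", "</a>"])
```

Collecting errors instead of stopping at the first one suits the "mostly-ok-but-kinda-broken" documents mentioned earlier; a stricter caller could rethrow inside the handler instead.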

Members

At all times, the parser object will have the following members:

line, column, position - Indications of the position in the XML document where the parser currently is looking.

startTagPosition - Indicates the position where the current tag starts.

closed - Boolean indicating whether or not the parser can be written to. If it's true, then wait for the ready event to write again.

strict - Boolean indicating whether or not the parser is a jerk.

opt - Any options passed into the constructor.

tag - The current tag being dealt with.

And a bunch of other stuff that you probably shouldn't touch.

Events

All events emit with a single argument. To listen to an event, assign a function to on<eventname>. Functions get executed in the this-context of the parser object. The list of supported events is also in the exported EVENTS array.

When using the stream interface, assign handlers using the EventEmitter on function in the normal fashion.
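Because handlers are plain on<eventname> properties, a list of event names is enough to attach a handler to each of them. The following sketch shows one way to trace events while debugging; `traceEvents` and its `log` callback are hypothetical names, and the event list is passed in by the caller rather than assumed.

```javascript
// Hypothetical helper: assigns a logging handler for every event name given.
// With the real module, eventNames could be require("sax").EVENTS.
function traceEvents (parser, eventNames, log) {
  eventNames.forEach(function (name) {
    parser["on" + name] = function (arg) {
      log(name, arg)
    }
  })
}

// Possible usage with the real parser:
//   var sax = require("sax")
//   var parser = sax.parser(false)
//   traceEvents(parser, ["opentag", "text", "closetag"], console.log)
//   parser.write("<p>hi</p>").close()
```

If "error" is among the traced names, the handler only logs it; the error state still has to be cleared before parsing can continue.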

error - Indication that something bad happened. The error will be hanging out on parser.error, and must be deleted before parsing can continue. By listening to this event, you can keep an eye on that kind of stuff. Note: this happens much more in strict mode. Argument: instance of Error.

text - Text node. Argument: string of text.

doctype - The <!DOCTYPE declaration. Argument: doctype string.

processinginstruction - Stuff like <?xml foo="blerg" ?>. Argument: object with name and body members. Attributes are not parsed, as processing instructions have implementation dependent semantics.

sgmldeclaration - Random SGML declarations. Stuff like <!ENTITY p> would trigger this kind of event. This is a weird thing to support, so it might go away at some point. SAX isn't intended to be used to parse SGML, after all.

opentagstart - Emitted immediately when the tag name is available, but before any attributes are encountered. Argument: object with a name field and an empty attributes set. Note that this is the same object that will later be emitted in the opentag event.

opentag - An opening tag. Argument: object with name and attributes. In non-strict mode, tag names are uppercased, unless the lowercase option is set. If the xmlns option is set, then it will contain namespace binding information on the ns member, and will have a local, prefix, and uri member.

closetag - A closing tag. In loose mode, tags are auto-closed if their parent closes. In strict mode, well-formedness is enforced. Note that self-closing tags will have closetag emitted immediately after opentag. Argument: tag name.

attribute - An attribute node. Argument: object with name and value. In non-strict mode, attribute names are uppercased, unless the lowercase option is set. If the xmlns option is set, it will also contain namespace information.

comment - A comment node. Argument: the string of the comment.

opencdata - The opening tag of a <![CDATA[ block.

cdata - The text of a <![CDATA[ block. Since <![CDATA[ blocks can get quite large, this event may fire multiple times for a single block, if it is broken up into multiple write()s. Argument: the string of random character data.

closecdata - The closing tag (]]>) of a <![CDATA[ block.

opennamespace - If the xmlns option is set, then this event will signal the start of a new namespace binding.

closenamespace - If the xmlns option is set, then this event will signal the end of a namespace binding.

end - Indication that the closed stream has ended.

ready - Indication that the stream has reset, and is ready to be written to.

noscript - Strictly speaking an option rather than an event. In non-strict mode, <script> tags trigger a "script" event, and their contents are not checked for special XML characters. If you pass noscript: true in the options, then this behavior is suppressed.
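Since the cdata event may fire several times for one block, code that wants the whole block has to stitch the pieces together between opencdata and closecdata. A minimal sketch, with `collectCdata` and its `onBlock` callback as hypothetical names:

```javascript
// Hypothetical helper: buffers cdata chunks and hands each complete
// <![CDATA[ block to onBlock as a single string.
function collectCdata (parser, onBlock) {
  var parts = null // null means "not inside a CDATA block"

  parser.onopencdata = function () {
    parts = []
  }
  parser.oncdata = function (chunk) {
    if (parts) parts.push(chunk)
  }
  parser.onclosecdata = function () {
    if (parts) onBlock(parts.join(""))
    parts = null
  }
}

// Possible usage with the real parser:
//   var parser = require("sax").parser(true)
//   collectCdata(parser, function (text) { console.log(text) })
//   parser.write("<a><![CDATA[one ").write("two]]></a>").close()
```

Pushing chunks onto an array and joining once at the end avoids repeated string concatenation for very large blocks.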

Reporting Problems

It's best to write a failing test if you find an issue. I will always accept pull requests with failing tests if they demonstrate intended behavior, but it is very hard to figure out what issue you're describing without a test. Writing a test is also the best way for you yourself to figure out if you really understand the issue you think you have with sax-js.
