  • Stars: 3,289
  • Rank 13,647 (Top 0.3 %)
  • Language: JavaScript
  • License: MIT License
  • Created over 11 years ago
  • Updated almost 3 years ago

Repository Details

A robust HTML entity encoder/decoder written in JavaScript.

he

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.
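
To illustrate what handling astral symbols involves, here is a minimal sketch. It is not he's actual implementation, and the helper names `toHexReference`, `naiveEncode`, and `codePointEncode` are hypothetical. It contrasts encoding by UTF-16 code unit with encoding by code point:

```javascript
// Hypothetical helper, not part of he: turn one symbol into a hexadecimal
// character reference based on its full code point.
function toHexReference(symbol) {
  const codePoint = symbol.codePointAt(0);
  return '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

// Naive approach: walks UTF-16 code units, so an astral symbol is split
// into two lone surrogates, which are not valid character references.
function naiveEncode(text) {
  let output = '';
  for (let index = 0; index < text.length; index++) {
    output += '&#x' + text.charCodeAt(index).toString(16).toUpperCase() + ';';
  }
  return output;
}

// Code point-aware approach: Array.from iterates a string by code point.
function codePointEncode(text) {
  return Array.from(text, (symbol) => toHexReference(symbol)).join('');
}

const tetragram = '\uD834\uDF06'; // U+1D306
console.log(naiveEncode(tetragram));     // '&#xD834;&#xDF06;'
console.log(codePointEncode(tetragram)); // '&#x1D306;'
```

The second result is the form that appears in the encode examples in this document.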

Installation

Via npm:

npm install he

Via Bower:

bower install he

Via Component:

component install mathiasbynens/he

In a browser:

<script src="he.js"></script>

In Node.js, io.js, Narwhal, and RingoJS:

var he = require('he');

In Rhino:

load('he.js');

Using an AMD loader like RequireJS:

require(
  {
    'paths': {
      'he': 'path/to/he'
    }
  },
  ['he'],
  function(he) {
    console.log(he);
  }
);

API

he.version

A string representing the semantic version number.

he.encode(text, options)

This function takes a string of text and encodes (by default) any symbols that aren’t printable ASCII symbols, as well as &, <, >, ", ', and `, replacing them with character references.

he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

As long as the input string contains allowed code points only, the return value of this function is always valid HTML. Any (invalid) code points that cannot be represented using a character reference in the input are not encoded:

he.encode('foo \0 bar');
// → 'foo \0 bar'

However, enabling the strict option causes invalid code points to throw an exception. With strict enabled, he.encode either throws (if the input contains invalid code points) or returns a string of valid HTML.

The options object is optional. It recognizes the following properties:

useNamedReferences

The default value for the useNamedReferences option is false. This means that encode() will not use any named character references (e.g. &copy;) in the output — hexadecimal escapes (e.g. &#xA9;) will be used instead. Set it to true to enable the use of named references.

Note that if compatibility with older browsers is a concern, this option should remain disabled.

// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disallow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly allow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true
});
// → 'foo &copy; bar &ne; baz &#x1D306; qux'

decimal

The default value for the decimal option is false. If the option is enabled, encode will generally use decimal escapes (e.g. &#169;) rather than hexadecimal escapes (e.g. &#xA9;). Apart from this replacement, the basic behavior remains the same when combined with other options. For example, if both the useNamedReferences and decimal options are enabled, named references (e.g. &copy;) are used in preference to decimal escapes. Symbols without a named reference are encoded using decimal escapes.

// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly enable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': true
});
// → 'foo &#169; bar &#8800; baz &#119558; qux'

// Passing an `options` object to `encode`, to explicitly allow named references and decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true,
  'decimal': true
});
// → 'foo &copy; bar &ne; baz &#119558; qux'
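
The way these two options interact can be sketched for a single symbol. This is an illustration only, not he's source: `toReference` and the two-entry `namedReferences` table are hypothetical, whereas he itself knows all standardized named references.

```javascript
// Tiny illustrative lookup table (assumption: just two entries for the demo).
const namedReferences = new Map([
  ['\u00A9', 'copy'], // U+00A9 COPYRIGHT SIGN
  ['\u2260', 'ne']    // U+2260 NOT EQUAL TO
]);

// Hypothetical helper: pick a named reference when allowed and available,
// otherwise fall back to a decimal or hexadecimal escape.
function toReference(symbol, options = {}) {
  if (options.useNamedReferences && namedReferences.has(symbol)) {
    return '&' + namedReferences.get(symbol) + ';';
  }
  const codePoint = symbol.codePointAt(0);
  return options.decimal
    ? '&#' + codePoint + ';'
    : '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

console.log(toReference('\u00A9'));                    // '&#xA9;'
console.log(toReference('\u00A9', { decimal: true })); // '&#169;'
console.log(toReference('\uD834\uDF06', {
  useNamedReferences: true,
  decimal: true
})); // '&#119558;' (no named reference in the table, so decimal is used)
```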

encodeEverything

The default value for the encodeEverything option is false. This means that encode() will not use any character references for printable ASCII symbols that don’t need escaping. Set it to true to encode every symbol in the input string. When set to true, this option takes precedence over allowUnsafeSymbols (i.e. setting the latter to true in such a case has no effect).

// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

strict

The default value for the strict option is false. This means that encode() will encode any HTML text content you feed it, even if it contains any symbols that cause parse errors. To throw an error when such invalid HTML is encountered, set the strict option to true. This option makes it possible to use he as part of HTML parsers and HTML validators.

// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.encode('\x01');
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
he.encode('\x01', {
  'strict': false
});
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable strict mode:
he.encode('\x01', {
  'strict': true
});
// → Parse error

allowUnsafeSymbols

The default value for the allowUnsafeSymbols option is false. This means that characters that are unsafe for use in HTML content (&, <, >, ", ', and `) will be encoded. When set to true, only non-ASCII characters will be encoded. If the encodeEverything option is set to true, this option will be ignored.

he.encode('foo © and & ampersand', {
  'allowUnsafeSymbols': true
});
// → 'foo &#xA9; and & ampersand'

Overriding default encode options globally

The global default settings for the encode function can be overridden by modifying the he.encode.options object. This saves you from passing in an options object for every call to encode if you want to use a non-default setting.

// Read the global default setting:
he.encode.options.useNamedReferences;
// → `false` by default

// Override the global default setting:
he.encode.options.useNamedReferences = true;

// Using the global default setting, which is now `true`:
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
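
Because he.encode.options is shared, changing it affects every caller in the same process. If that is a concern, one possible alternative is a small wrapper that applies preferred options per call instead. The sketch below assumes a hypothetical helper named `createEncoder`, which is not part of he; the he object is passed in as an argument so the helper does not depend on how he was loaded.

```javascript
// Hypothetical helper, not part of he: returns a function that calls
// he.encode with a fixed set of preferred options, without touching
// the shared he.encode.options object.
function createEncoder(he, preferredOptions) {
  return function encodeWithPreferences(text, options) {
    // Per-call options override the preferred ones.
    return he.encode(text, Object.assign({}, preferredOptions, options));
  };
}

// Usage, assuming `he` was loaded as shown under Installation:
// var encodeNamed = createEncoder(he, { 'useNamedReferences': true });
// encodeNamed('foo © bar');
```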

he.decode(html, options)

This function takes a string of HTML and decodes any named and numerical character references in it using the algorithm described in section 12.2.4.69 of the HTML spec.

he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'

The options object is optional. It recognizes the following properties:

isAttributeValue

The default value for the isAttributeValue option is false. This means that decode() will decode the string as if it were used in a text context in an HTML document. HTML has different rules for parsing character references in attribute values — set this option to true to treat the input string as if it were used as an attribute value.

// Using the global default setting (defaults to `false`, i.e. HTML text context):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML text context:
he.decode('foo&ampbar', {
  'isAttributeValue': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
he.decode('foo&ampbar', {
  'isAttributeValue': true
});
// → 'foo&ampbar'
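
The difference comes from a rule in the HTML Standard: inside an attribute value, a named reference that lacks its trailing semicolon is left as-is when the next character is `=` or an ASCII alphanumeric. A simplified sketch of just that check follows. The function name `staysLiteralInAttribute` is hypothetical, and this is not how he structures its own code.

```javascript
// Hypothetical helper: given the character that follows a named reference
// without a trailing semicolon (such as `&amp` in 'foo&ampbar'), report
// whether the reference is kept as literal text rather than decoded.
function staysLiteralInAttribute(nextCharacter, isAttributeValue) {
  return isAttributeValue && /^[=A-Za-z0-9]$/.test(nextCharacter);
}

// In 'foo&ampbar', the character after `&amp` is 'b'.
console.log(staysLiteralInAttribute('b', false)); // false: decoded in text
console.log(staysLiteralInAttribute('b', true));  // true: kept in attributes
```

This rule exists so that attribute values such as URLs with query strings (for example `?a=1&copy=2`) are not altered by accident.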

strict

The default value for the strict option is false. This means that decode() will decode any HTML text content you feed it, even if it contains any entities that cause parse errors. To throw an error when such invalid HTML is encountered, set the strict option to true. This option makes it possible to use he as part of HTML parsers and HTML validators.

// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
he.decode('foo&ampbar', {
  'strict': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable strict mode:
he.decode('foo&ampbar', {
  'strict': true
});
// → Parse error
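
As a rough illustration of the validator use case, strict mode can be wrapped in a try/catch. The helper name `hasOnlyValidReferences` is hypothetical and not part of he; the he object is passed in as an argument, and the sketch relies only on the documented behavior that strict mode throws on a parse error.

```javascript
// Hypothetical helper, not part of he: returns true when strict decoding
// succeeds, false when he.decode throws because of a parse error.
function hasOnlyValidReferences(he, html) {
  try {
    he.decode(html, { 'strict': true });
    return true;
  } catch (error) {
    return false;
  }
}

// Usage, assuming `he` was loaded as shown under Installation:
// hasOnlyValidReferences(he, 'foo&ampbar');  // false, per the example above
// hasOnlyValidReferences(he, 'foo&amp;bar');
```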

Overriding default decode options globally

The global default settings for the decode function can be overridden by modifying the he.decode.options object. This saves you from passing in an options object for every call to decode if you want to use a non-default setting.

// Read the global default setting:
he.decode.options.isAttributeValue;
// → `false` by default

// Override the global default setting:
he.decode.options.isAttributeValue = true;

// Using the global default setting, which is now `true`:
he.decode('foo&ampbar');
// → 'foo&ampbar'

he.escape(text)

This function takes a string of text and escapes it for use in text contexts in XML or HTML documents. Only the following characters are escaped: &, <, >, ", ', and `.

he.escape('<img src=\'x\' onerror="prompt(1)">');
// → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
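
Conceptually, this amounts to a fixed lookup over those six characters. The sketch below is a simplified stand-in named `escapeSketch` (a hypothetical name), written to be consistent with the output shown above. It is not he's source, and the references chosen for & and ` are assumptions, since the example above does not show them.

```javascript
// Assumed mapping: the entries for <, >, ", and ' match the example output
// above; the entries for & and ` are assumptions.
const escapeMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  '\'': '&#x27;',
  '`': '&#x60;'
};

// Hypothetical stand-in for illustration: replace each of the six
// characters with its character reference in a single pass.
function escapeSketch(text) {
  return text.replace(/[&<>"'`]/g, (character) => escapeMap[character]);
}

console.log(escapeSketch('<img src=\'x\' onerror="prompt(1)">'));
// '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
```

Doing the replacement in one pass means the `&` characters introduced by the replacements are never escaped a second time.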

he.unescape(html, options)

he.unescape is an alias for he.decode. It takes a string of HTML and decodes any named and numerical character references in it.

Using the he binary

To use the he binary in your shell, simply install he globally using npm:

npm install -g he

After that you will be able to encode/decode HTML entities from the command line:

$ he --encode 'föo ♥ bår 𝌆 baz'
f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

$ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
f&ouml;o &hearts; b&aring;r &#x1D306; baz

$ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
föo ♥ bår 𝌆 baz

Read a local text file, encode it for use in an HTML text context, and save the result to a new file:

$ he --encode < foo.txt > foo-escaped.html

Or do the same with an online text file:

$ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html

Or, the opposite — read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:

$ he --decode < foo-escaped.html > foo.txt

Or do the same with an online HTML snippet:

$ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt

See he --help for the full list of options.

Support

he has been tested in at least:

  • Chrome 27–50
  • Firefox 3–45
  • Safari 4–9
  • Opera 10–12, 15–37
  • IE 6–11
  • Edge
  • Narwhal 0.3.2
  • Node.js v0.10, v0.12, v4, v5
  • PhantomJS 1.9.0
  • Rhino 1.7RC4
  • RingoJS 0.8–0.11

Unit tests & code coverage

After cloning this repository, run npm install to install the dependencies needed for he development and testing. You may want to install Istanbul globally using npm install istanbul -g.

Once that’s done, you can run the unit tests in Node using npm test or node tests/tests.js. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use grunt test.

To generate the code coverage report, use grunt cover.

Acknowledgements

Thanks to Simon Pieters (@zcorpan) for the many suggestions.

Author

twitter/mathias
Mathias Bynens

License

he is available under the MIT license.

More Repositories

1

dotfiles

🔧 .files, including ~/.macos — sensible hacker defaults for macOS
Shell
29,301
star
2

jquery-placeholder

A jQuery plugin that enables HTML5 placeholder behavior for browsers that aren’t trying hard enough yet
JavaScript
3,983
star
3

evil.sh

🙊 Subtle and not-so-subtle shell tweaks that will slowly drive people insane.
Shell
2,159
star
4

small

Smallest possible syntactically valid files of different types
HTML
1,900
star
5

emoji-regex

A regular expression to match all Emoji-only symbols as per the Unicode Standard.
JavaScript
1,641
star
6

punycode.js

A robust Punycode converter that fully complies to RFC 3492 and RFC 5891.
JavaScript
1,479
star
7

mothereff.in

Web developer tools
JavaScript
1,024
star
8

esrever

A Unicode-aware string reverser written in JavaScript.
JavaScript
878
star
9

jsesc

Given some data, jsesc returns the shortest possible stringified & ASCII-safe representation of that data.
JavaScript
683
star
10

utf8.js

A robust JavaScript implementation of a UTF-8 encoder/decoder, as defined by the Encoding Standard.
JavaScript
539
star
11

base64

A robust base64 encoder/decoder that is fully compatible with `atob()` and `btoa()`, written in JavaScript.
JavaScript
491
star
12

CSS.escape

A robust polyfill for the CSS.escape utility method as defined in CSSOM.
JavaScript
486
star
13

jsperf.com

jsPerf.com source code
JavaScript
473
star
14

regenerate

Generate JavaScript-compatible regular expressions based on a given set of Unicode symbols or code points.
JavaScript
353
star
15

php-url-shortener

Simple PHP URL shortener, as used on mths.be
PHP
334
star
16

regexpu

A source code transpiler that enables the use of ES2015 Unicode regular expressions in ES5.
JavaScript
226
star
17

tpyo

A small script that enables you to make typos in JavaScript property names. Powered by ES2015 proxies + Levenshtein string distance.
JavaScript
205
star
18

luamin

A Lua minifier written in JavaScript
JavaScript
188
star
19

cssesc

A JavaScript library for escaping CSS strings and identifiers while generating the shortest possible ASCII-only output.
HTML
150
star
20

String.prototype.startsWith

A robust & optimized ES3-compatible polyfill for the `String.prototype.startsWith` method in ECMAScript 6.
JavaScript
143
star
21

grunt-template

This Grunt plugin interpolates template files with any data you provide and saves the result to another file.
JavaScript
136
star
22

document.scrollingElement

A polyfill for document.scrollingElement as defined in the CSSOM specification.
JavaScript
131
star
23

jquery-visibility

Page Visibility shim for jQuery
JavaScript
129
star
24

jquery-details

World’s first <details>/<summary> polyfill™
HTML
121
star
25

rel-noopener

Quick demonstration of why `<a rel=noopener>` is needed.
HTML
114
star
26

quoted-printable

A robust & character encoding–agnostic JavaScript implementation of the `Quoted-Printable` content transfer encoding as defined by RFC 2045.
JavaScript
88
star
27

grunt-zopfli

A Grunt plugin for compressing files using Zopfli.
JavaScript
87
star
28

emoji-test-regex-pattern

A regular expression pattern for Java/JavaScript to match all emoji in the emoji-test.txt file provided by UTS#51.
JavaScript
79
star
29

jquery-smooth-scrolling

Smooth anchor scrolling plugin for jQuery.
JavaScript
73
star
30

String.prototype.includes

A robust & optimized ES3-compatible polyfill for the `String.prototype.contains` method in ECMAScript 6.
JavaScript
69
star
31

Array.from

A robust & optimized ES3-compatible polyfill for the `Array.from` method in ECMAScript 6.
JavaScript
66
star
32

regexpu-core

regexpu’s core functionality, i.e. `rewritePattern(pattern, flag, options)`, which enables rewriting regular expressions that make use of the ES6 `u` flag into equivalent ES5-compatible regular expression patterns.
JavaScript
63
star
33

jquery-slideshow

The simplest jQuery slideshow plugin. Evar.
JavaScript
61
star
34

String.fromCodePoint

A robust & optimized `String.fromCodePoint` polyfill, based on the ECMAScript 6 specification.
JavaScript
61
star
35

hashtag-regex

A regular expression to match hashtag identifiers as per the Unicode Standard.
JavaScript
60
star
36

custom.keylayout

Custom QWERTY/AZERTY .keylayout files for use with Apple keyboards
59
star
37

unicode-data

Python scripts that generate JavaScript-compatible Unicode data
JavaScript
59
star
38

grunt-yui-compressor

A Grunt plugin for compressing JavaScript and CSS files using YUI Compressor.
JavaScript
59
star
39

String.prototype.at

A robust & optimized ES3-compatible polyfill for the `String.prototype.at` proposal for ECMAScript 6/7.
JavaScript
55
star
40

String.prototype.codePointAt

A robust & optimized `String.prototype.codePointAt` polyfill, based on the ECMAScript 6 specification.
JavaScript
55
star
41

covid-19-vaccinations-germany

Historical data on COVID-19 vaccination doses administered in Germany, per state.
HTML
54
star
42

windows-1252

A robust JavaScript implementation of the windows-1252 character encoding as defined by the Encoding Standard.
JavaScript
44
star
43

flag-emoji-replacements

'🇩🇰🇲🇬'.replace('🇰🇲', '🇪🇨'); // → '🇩🇪🇨🇬'
JavaScript
38
star
44

unicode-tr51

Emoji data extracted from Unicode Technical Report #51.
JavaScript
38
star
45

String.prototype.endsWith

A robust & optimized ES3-compatible polyfill for the `String.prototype.endsWith` method in ECMAScript 6.
JavaScript
35
star
46

wtf-8

A well-tested WTF-8 encoder/decoder written in JavaScript.
JavaScript
34
star
47

caniunicode

Unicode version support across JavaScript features & engines
JavaScript
32
star
48

math-tex

A web component for mathematical typesetting using TeX notation.
HTML
27
star
49

jquery-noselect

A jQuery plugin which disables text selection on any element. Useful for UI elements; evil for pretty much everything else.
JavaScript
27
star
50

String.prototype.repeat

A robust & optimized ES3-compatible polyfill for the `String.prototype.repeat` method in ECMAScript 6.
JavaScript
27
star
51

windows-1251

A robust JavaScript implementation of the windows-1251 character encoding as defined by the Encoding Standard.
JavaScript
26
star
52

kali-linux-docker

Kali Linux Docker
Shell
26
star
53

bacon-cipher

A robust JavaScript implementation of Bacon’s cipher, a.k.a. the Baconian cipher.
JavaScript
24
star
54

jquery-custom-data-attributes

An easy setter/getter for HTML5 data-* attributes
JavaScript
21
star
55

rgi-emoji-regex-pattern

A JavaScript-compatible regular expression pattern to match all RGI emoji symbols and sequences as per the Unicode Standard and UTS#51.
JavaScript
21
star
56

q-encoding

A robust & character encoding–agnostic JavaScript implementation of the `Q` encoding as defined by RFC 2047.
JavaScript
20
star
57

babel-plugin-transform-unicode-property-regex

Compile Unicode property escapes in Unicode regular expressions to ES5 or ES6 that works in today’s environments.
JavaScript
19
star
58

regenerate-unicode-properties

A collection of Regenerate sets for various Unicode properties.
JavaScript
17
star
59

rot

Perform simple rotational letter substitution (such as ROT-13) in JavaScript.
JavaScript
17
star
60

regex-trie-cli

Create regular expression patterns based on a list of strings to be matched.
JavaScript
17
star
61

strip-combining-marks

Easily remove Unicode combining marks from strings.
JavaScript
16
star
62

Array.of

A robust & optimized ES3-compatible polyfill for the `Array.of` method in ECMAScript 6.
JavaScript
15
star
63

jquery-oninput

My `oninput` polyfill as a jQuery plugin
JavaScript
14
star
64

homebrew-ecmascript

Homebrew formulae for ECMAScript engines
Ruby
13
star
65

tibia.com-extension

User script that enhances the character info pages on Tibia.com.
HTML
13
star
66

is-ascii-safe

is-ascii-safe determines whether a given string is ASCII-safe, i.e. if it consists of ASCII characters (U+0000 to U+007F) only.
JavaScript
13
star
67

es-regexp-unicode-character-class-escapes

Proposal to improve the character class escape tokens `\d`, `\D`, `\w`, `\W`, and the word boundary assertions `\b` and `\B` in ES6 Unicode regular expressions (with the `u` flag).
12
star
68

unicode-canonical-property-names-ecmascript

The set of canonical Unicode property names supported in ECMAScript RegExp property escapes.
JavaScript
11
star
69

node-unshorten

URL unshortener for Node.js
JavaScript
11
star
70

RegExp.prototype.match

A robust & optimized ES3-compatible polyfill for the `RegExp.prototype.match` method in ECMAScript 6.
JavaScript
10
star
71

is-potential-custom-element-name

Check whether a given string matches the `PotentialCustomElementName` production as defined in the HTML Standard.
JavaScript
10
star
72

atom-blackboard

TextMate’s Blackboard theme, ported to Atom.
CSS
10
star
73

unicode-emoji-modifier-base

The set of Unicode symbols that can serve as a base for emoji modifiers, i.e. those with the `Emoji_Modifier_Base` property set to `Yes`.
JavaScript
9
star
74

strip-variation-selectors

Remove Unicode variation selectors from strings.
JavaScript
9
star
75

nginx-zopfli-test

This repository contains some files that make it easy to test whether Nginx is correctly serving Zopfli-pre-compressed files.
JavaScript
9
star
76

unicode-property-escapes-tests

Tests for RegExp Unicode property escapes
JavaScript
8
star
77

unicode-match-property-value-ecmascript

Match a Unicode property or property alias to its canonical property name per the algorithm used for RegExp Unicode property escapes in ECMAScript.
JavaScript
8
star
78

grunt-esmangle

A Grunt plugin for mangling or minifying JavaScript files using Esmangle.
JavaScript
8
star
79

unicode-property-value-aliases

Unicode property value alias mappings in JavaScript format.
JavaScript
7
star
80

css-dbg-stories

HTML
7
star
81

unicode-property-aliases-ecmascript

Unicode property alias mappings in JavaScript format for property names that are supported in ECMAScript RegExp property escapes.
JavaScript
7
star
82

unicode-property-aliases

Unicode property alias mappings in JavaScript format.
JavaScript
7
star
83

unicode-match-property-ecmascript

Match a given Unicode property or property alias to its canonical property name per the algorithm used for RegExp Unicode property escapes in ECMAScript.
JavaScript
7
star
84

iso-8859-2

A robust JavaScript implementation of the iso-8859-2 character encoding as defined by the Encoding Standard.
JavaScript
6
star
85

string-prototype-replace-regexp-benchmark

Generated JavaScript benchmarks for String.prototype.{replace,replaceAll} with global regular expressions based on emoji-test-regex-pattern.
JavaScript
6
star
86

idn-allowed-code-points-regex

A regular expression that matches any of the code points that Verisign allows by default in IDN.
JavaScript
6
star
87

pogotransfercalc

Easily calculate how many Pokémon you should transfer before kicking off an evolution spree in Pokémon GO.
Python
6
star
88

macintosh

A robust JavaScript implementation of the macintosh character encoding as defined by the Encoding Standard.
JavaScript
6
star
89

windows-874

A robust JavaScript implementation of the windows-874 character encoding as defined by the Encoding Standard.
JavaScript
5
star
90

swapcase

A letter case swapper with full Unicode support, i.e. based on the official Unicode case folding mappings.
JavaScript
5
star
91

RegExp.prototype.search

A robust & optimized ES3-compatible polyfill for the `RegExp.prototype.search` method in ECMAScript 6.
JavaScript
5
star
92

netlify-test

HTML
4
star
93

pogocpm2level

Easily calculate the level of a given Pokémon in Pokémon GO based on its total CP multiplier value.
Python
4
star
94

tibia-bosses

JavaScript
4
star
95

covid-19-vaccinations-munich

Archive of historical coronavirus data for Munich, Germany
HTML
4
star
96

windows-1250

A robust JavaScript implementation of the windows-1250 character encoding as defined by the Encoding Standard.
JavaScript
4
star
97

stack-exchange-logos

Stack Exchange logos in SVG format.
HTML
4
star
98

gulp-regexpu

Gulp plugin to transpile ES6 Unicode regular expressions to ES5 with regexpu.
JavaScript
4
star
99

is-ascii-safe-cli

is-ascii-safe-cli checks whether a given file (or list of files) is ASCII-safe, i.e. consisting of ASCII characters (U+0000 to U+007F) only.
JavaScript
4
star
100

windows-1257

A robust JavaScript implementation of the windows-1257 character encoding as defined by the Encoding Standard.
JavaScript
4
star