Universal Syntax Tree used by @unifiedjs

unist

Universal Syntax Tree.


unist is a specification for syntax trees. It has a big ecosystem of utilities in JavaScript for working with these trees. It’s implemented by several other specifications.

This document may not be released. See releases for released documents. The latest released version is 3.0.0.

Intro

This document defines a general-purpose format for syntax trees. Development of unist started in July 2015. This specification is written in a Web IDL-like grammar.

Syntax tree

Syntax trees are representations of source code or even natural language. These trees are abstractions that make it possible to analyze, transform, and generate code.

Syntax trees come in two flavors:

  • concrete syntax trees: structures that represent every detail (such as white-space in white-space-insensitive languages)
  • abstract syntax trees: structures that only represent details relating to the syntactic structure of code (such as ignoring whether a double or single quote was used in languages that support both, such as JavaScript).

This specification can express both abstract and concrete syntax trees.

Where this specification fits

unist is not intended to be self-sufficient. Instead, it is expected that other specifications implement unist and extend it to express language-specific nodes. For example, see projects such as hast (for HTML), nlcst (for natural language), mdast (for Markdown), and xast (for XML).

unist relates to JSON in that compliant syntax trees can be expressed completely in JSON. However, unist is not limited to JSON and can be expressed in other data formats, such as XML.

unist relates to JavaScript in that it has a rich ecosystem of utilities for working with compliant syntax trees in JavaScript. The five most used utilities combined are downloaded thirty million times each month. However, unist is not limited to JavaScript and can be used in other programming languages.

unist relates to the unified, remark, rehype, and retext projects in that unist syntax trees are used throughout their ecosystems.

unist relates to the vfile project in that it accepts unist nodes for its message store, and that vfile can be a source file of a syntax tree.

Types

If you are using TypeScript, you can use the unist types by installing them with npm:

npm install @types/unist

Nodes

Syntactic units in unist syntax trees are called nodes, and implement the Node interface.

Node

interface Node {
  type: string
  data: Data?
  position: Position?
}

The type field is a non-empty string representing the variant of a node. This field can be used to determine the type a node implements.

The data field represents information from the ecosystem. The value of the data field implements the Data interface.

The position field represents the location of a node in a source document. The value of the position field implements the Position interface. The position field must not be present if a node is generated.

Specifications implementing unist are encouraged to define more fields. Ecosystems can define fields on Data.

Any value in unist must be expressible in JSON values: string, number, object, array, true, false, or null. This means that a syntax tree should survive conversion to JSON and back unchanged. For example, in JavaScript, passing a tree through JSON.parse(JSON.stringify(tree)) should result in the same tree.
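One quick way to check this requirement in JavaScript or TypeScript is to compare a tree with its own JSON round trip. The sketch below uses a hand-written `UnistNode` type (prefixed so it does not clash with the DOM's global `Node`) and a hypothetical `isJsonSafe` helper. Neither comes from a unist package.

```typescript
// Simplified local node type: a required `type` plus arbitrary other fields.
interface UnistNode {
  type: string
  [key: string]: unknown
}

// Deep structural equality for JSON-like values (objects, arrays, primitives).
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true
  if (typeof a !== "object" || typeof b !== "object" || a === null || b === null) return false
  if (Array.isArray(a) !== Array.isArray(b)) return false
  const keysA = Object.keys(a)
  const keysB = Object.keys(b)
  if (keysA.length !== keysB.length) return false
  return keysA.every((key) =>
    deepEqual((a as Record<string, unknown>)[key], (b as Record<string, unknown>)[key])
  )
}

// A tree is JSON-safe if it equals its own JSON round trip.
function isJsonSafe(tree: UnistNode): boolean {
  return deepEqual(tree, JSON.parse(JSON.stringify(tree)))
}

const ok: UnistNode = { type: "text", value: "alpha" }
// `undefined` is dropped by JSON.stringify and the Date becomes a string,
// so this tree does not survive the round trip.
const bad: UnistNode = { type: "text", value: undefined, data: { when: new Date(0) } }

console.log(isJsonSafe(ok))  // true
console.log(isJsonSafe(bad)) // false
```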

Position

interface Position {
  start: Point
  end: Point
}

Position represents the location of a node in a source file.

The start field of Position represents the place of the first character of the parsed source region. The end field of Position represents the place of the first character after the parsed source region, whether it exists or not. The values of the start and end fields implement the Point interface.

If the syntactic unit represented by a node is not present in the source file at the time of parsing, the node is said to be generated and it must not have positional information.

For example, if the following value was represented as unist:

alpha
bravo

…the first word (alpha) would start at line 1, column 1, offset 0, and end at line 1, column 6, offset 5. The line feed would start at line 1, column 6, offset 5, and end at line 2, column 1, offset 6. The last word (bravo) would start at line 2, column 1, offset 6, and end at line 2, column 6, offset 11.
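The points in this example can be reproduced by walking the text once and counting. This is a minimal sketch that assumes `\n` is the only line ending. `pointFromOffset` is a hypothetical helper, not part of unist or vfile.

```typescript
interface UnistPoint {
  line: number   // 1-indexed
  column: number // 1-indexed
  offset: number // 0-indexed
}

// Convert a 0-indexed offset into a point by scanning everything before it.
function pointFromOffset(doc: string, offset: number): UnistPoint {
  let line = 1
  let column = 1
  for (let i = 0; i < offset; i++) {
    if (doc[i] === "\n") {
      line++
      column = 1
    } else {
      column++
    }
  }
  return { line, column, offset }
}

const doc = "alpha\nbravo"
// Each end point is the place of the first character *after* the region.
const alpha = { start: pointFromOffset(doc, 0), end: pointFromOffset(doc, 5) }
const lineFeed = { start: pointFromOffset(doc, 5), end: pointFromOffset(doc, 6) }
const bravo = { start: pointFromOffset(doc, 6), end: pointFromOffset(doc, 11) }

console.log(alpha.end)    // { line: 1, column: 6, offset: 5 }
console.log(lineFeed.end) // { line: 2, column: 1, offset: 6 }
console.log(bravo.end)    // { line: 2, column: 6, offset: 11 }
```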

Point

interface Point {
  line: number >= 1
  column: number >= 1
  offset: number >= 0?
}

Point represents one place in a source file.

The line field (1-indexed integer) represents a line in a source file. The column field (1-indexed integer) represents a column in a source file. The offset field (0-indexed integer) represents a character in a source file.

The term character means a (UTF-16) code unit which is defined in the Web IDL specification.
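JavaScript strings are also sequences of UTF-16 code units, so string indices line up with unist offsets. The sketch below uses plain TypeScript with no unist packages. It shows that a character outside the Basic Multilingual Plane advances the offset by two.

```typescript
const value = "a😀b"

// "😀" (U+1F600) is stored as a surrogate pair: two UTF-16 code units.
const emojiStart = value.indexOf("😀")
const emojiEnd = emojiStart + "😀".length

console.log(value.length)             // 4 code units
console.log(Array.from(value).length) // 3 code points
console.log(emojiStart, emojiEnd)     // 1 3
console.log(value.indexOf("b"))       // 3: the offset of "b"
```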

Data

interface Data { }

Data represents information associated by the ecosystem with the node.

This space is guaranteed to never be specified by unist or specifications implementing unist.

Parent

interface Parent <: Node {
  children: [Node]
}

Nodes containing other nodes (said to be children) extend the abstract interface Parent (Node).

The children field is a list representing the children of a node.

Literal

interface Literal <: Node {
  value: any
}

Nodes containing a value extend the abstract interface Literal (Node).

The value field can contain any value.
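The interfaces above can be mirrored in TypeScript roughly as follows. This is a hand-written sketch, not the contents of @types/unist. The `Unist` prefix, the `isParent` and `isLiteral` guards, and the `paragraph` and `text` node types are all assumptions for illustration.

```typescript
interface UnistPoint { line: number; column: number; offset?: number }
interface UnistPosition { start: UnistPoint; end: UnistPoint }
interface UnistData { [key: string]: unknown }

interface UnistNode {
  type: string
  data?: UnistData
  position?: UnistPosition
}

// Parent and Literal extend Node, as in the Web IDL above.
interface UnistParent extends UnistNode {
  children: UnistNode[]
}

interface UnistLiteral extends UnistNode {
  value: unknown
}

// Type guards: treat a node as a Parent if it has a children array,
// and as a Literal if it has a value field.
function isParent(node: UnistNode): node is UnistParent {
  return Array.isArray((node as Partial<UnistParent>).children)
}

function isLiteral(node: UnistNode): node is UnistLiteral {
  return "value" in node
}

// Hypothetical node types, as a specification such as mdast might define them.
const text = (value: string): UnistLiteral => ({ type: "text", value })
const tree: UnistParent = { type: "paragraph", children: [text("alpha"), text("bravo")] }

console.log(isParent(tree), isLiteral(tree)) // true false
console.log(tree.children.filter(isLiteral).map((child) => child.value)) // [ 'alpha', 'bravo' ]
```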

Glossary

Tree

A tree is a node and all of its descendants (if any).

Child

Node X is a child of node Y, if Y’s children include X.

Parent

Node X is the parent of node Y, if Y is a child of X.

Index

The index of a child is its number of preceding siblings, or 0 if it has none.

Sibling

Node X is a sibling of node Y, if X and Y have the same parent (if any).

The previous sibling of a child is its sibling at its index minus 1.

The next sibling of a child is its sibling at its index plus 1.
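Index and siblings can be computed directly from a parent's children list. unist nodes do not store a reference to their parent, so the hypothetical helpers below take the parent as an explicit argument.

```typescript
interface UnistNode { type: string; value?: string }
interface UnistParent extends UnistNode { children: UnistNode[] }

// The index of a child is its number of preceding siblings.
function index(parentNode: UnistParent, child: UnistNode): number {
  const found = parentNode.children.indexOf(child)
  if (found === -1) throw new Error("Expected `child` to be a child of `parentNode`")
  return found
}

// Sibling at index minus 1 (undefined for the head).
function previousSibling(parentNode: UnistParent, child: UnistNode): UnistNode | undefined {
  return parentNode.children[index(parentNode, child) - 1]
}

// Sibling at index plus 1 (undefined for the tail).
function nextSibling(parentNode: UnistParent, child: UnistNode): UnistNode | undefined {
  return parentNode.children[index(parentNode, child) + 1]
}

const a: UnistNode = { type: "text", value: "a" }
const b: UnistNode = { type: "text", value: "b" }
const c: UnistNode = { type: "text", value: "c" }
const root: UnistParent = { type: "root", children: [a, b, c] }

console.log(index(root, b))                 // 1
console.log(previousSibling(root, a))       // undefined
console.log(nextSibling(root, b) === c)     // true
```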

Root

The root of a node is itself, if without parent, or the root of its parent.

The root of a tree is any node in that tree without parent.

Descendant

Node X is a descendant of node Y, if X is a child of Y, or if X is a child of node Z that is a descendant of Y.

An inclusive descendant is a node or one of its descendants.

Ancestor

Node X is an ancestor of node Y, if Y is a descendant of X.

An inclusive ancestor is a node or one of its ancestors.
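Nodes do not point to their parents, so descendant and ancestor checks are answered by searching downward from the candidate ancestor. The helpers below (`isDescendant`, `isAncestor`, `isInclusiveDescendant`, `ancestorsOf`) are hypothetical names for this sketch. Node types double as labels.

```typescript
interface UnistNode { type: string; children?: UnistNode[] }

// X is a descendant of Y if X is a child of Y, or a child of a descendant of Y.
function isDescendant(x: UnistNode, y: UnistNode): boolean {
  for (const child of y.children ?? []) {
    if (child === x || isDescendant(x, child)) return true
  }
  return false
}

// X is an ancestor of Y if Y is a descendant of X.
function isAncestor(x: UnistNode, y: UnistNode): boolean {
  return isDescendant(y, x)
}

// Inclusive variant: a node or one of its descendants.
function isInclusiveDescendant(x: UnistNode, y: UnistNode): boolean {
  return x === y || isDescendant(x, y)
}

// Ancestors of `node` within `tree`, outermost first; undefined if absent.
function ancestorsOf(tree: UnistNode, node: UnistNode): UnistNode[] | undefined {
  if (tree === node) return []
  for (const child of tree.children ?? []) {
    const path = ancestorsOf(child, node)
    if (path) return [tree, ...path]
  }
  return undefined
}

const c: UnistNode = { type: "c" }
const b: UnistNode = { type: "b", children: [c] }
const a: UnistNode = { type: "a", children: [b] }

console.log(isDescendant(c, a), isAncestor(a, c), isDescendant(a, c)) // true true false
console.log(ancestorsOf(a, c)?.map((node) => node.type))             // [ 'a', 'b' ]
```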

Head

The head of a node is its first child (if any).

Tail

The tail of a node is its last child (if any).

Leaf

A leaf is a node with no children.

Branch

A branch is a node with one or more children.

Generated

A node is generated if it does not have positional information.

Type

The type of a node is the value of its type field.

Positional information

The positional information of a node is the value of its position field.
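Several of the glossary terms above reduce to one-line checks. This sketch uses a simplified local node type. The helper names are illustrative and not taken from any unist utility.

```typescript
interface UnistPoint { line: number; column: number; offset?: number }
interface UnistNode {
  type: string
  position?: { start: UnistPoint; end: UnistPoint }
  children?: UnistNode[]
}

// Head: first child (if any). Tail: last child (if any).
function head(node: UnistNode): UnistNode | undefined {
  return node.children?.[0]
}

function tail(node: UnistNode): UnistNode | undefined {
  const children = node.children ?? []
  return children[children.length - 1]
}

// Leaf: no children. Branch: one or more children.
function isLeaf(node: UnistNode): boolean {
  return (node.children?.length ?? 0) === 0
}

function isBranch(node: UnistNode): boolean {
  return !isLeaf(node)
}

// Generated: no positional information.
function isGenerated(node: UnistNode): boolean {
  return node.position === undefined
}

const parsed: UnistNode = {
  type: "text",
  position: { start: { line: 1, column: 1, offset: 0 }, end: { line: 1, column: 6, offset: 5 } }
}
const generated: UnistNode = { type: "text" }
const paragraph: UnistNode = { type: "paragraph", children: [parsed, generated] }

console.log(head(paragraph) === parsed, tail(paragraph) === generated) // true true
console.log(isLeaf(parsed), isBranch(paragraph))                       // true true
console.log(isGenerated(parsed), isGenerated(generated))               // false true
```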

File

A file is a source document that represents the original file that was parsed to produce the syntax tree. Positional information represents the place of a node in this file. Files are provided by the host environment and not defined by unist.

For example, see projects such as vfile.

Preorder

Preorder (NLR) is a depth-first tree traversal that performs the following steps for each node N:

  1. N: visit N itself
  2. L: traverse head (then its next sibling, recursively moving forward until reaching tail)
  3. R: traverse tail

Postorder

Postorder (LRN) is a depth-first tree traversal that performs the following steps for each node N:

  1. L: traverse head (then its next sibling, recursively moving forward until reaching tail)
  2. R: traverse tail
  3. N: visit N itself

Enter

Enter is a step right before other steps performed on a given node N when traversing a tree.

For example, when performing preorder traversal, enter is the first step taken, right before visiting N itself.

Exit

Exit is a step right after other steps performed on a given node N when traversing a tree.

For example, when performing preorder traversal, exit is the last step taken, right after traversing the tail of N.
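Enter and exit map naturally onto a recursive walker. Visiting nodes on enter gives preorder (NLR), and visiting them on exit gives postorder (LRN). The `walk` function and its `enter`/`exit` hooks are a hypothetical sketch, not the API of unist-util-visit or any other package. The example runs on the tree (A through G) from the tree traversal section.

```typescript
interface UnistNode { type: string; children?: UnistNode[] }

interface Hooks {
  enter?: (node: UnistNode) => void
  exit?: (node: UnistNode) => void
}

// Depth-first, left-to-right walk with enter and exit hooks.
function walk(node: UnistNode, hooks: Hooks): void {
  hooks.enter?.(node)                       // enter: before the other steps on N
  for (const child of node.children ?? []) {
    walk(child, hooks)                      // L ... R: head through tail
  }
  hooks.exit?.(node)                        // exit: after the other steps on N
}

// Node types are used as labels.
const leaf = (type: string): UnistNode => ({ type })
const tree: UnistNode = {
  type: "A",
  children: [
    { type: "B", children: [leaf("C"), leaf("D"), leaf("E")] },
    { type: "F", children: [leaf("G")] }
  ]
}

const pre: string[] = []
const post: string[] = []
walk(tree, {
  enter: (node) => { pre.push(node.type) },
  exit: (node) => { post.push(node.type) }
})

console.log(pre.join(" "))  // A B C D E F G
console.log(post.join(" ")) // C D E B G F A
```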

Tree traversal

Tree traversal is a common task when working with a tree, for example to search it. Tree traversal is typically either breadth-first or depth-first.

In the following examples, we’ll work with this tree:

graph TD
    A-->B-->C
        B-->D
        B-->E
    A-->F-->G

Breadth-first traversal

Breadth-first traversal is visiting a node and all its siblings to broaden the search at that level, before traversing children.

For the syntax tree defined in the diagram, a breadth-first traversal first searches the root of the tree (A), then its children (B and F), then their children (C, D, E, and G).
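A breadth-first traversal is usually implemented with a first-in, first-out queue. Each node's children are queued behind its siblings, so a level is finished before the next one begins. The sketch below runs on the diagram's tree, with node types used as labels. `breadthFirst` is a hypothetical helper.

```typescript
interface UnistNode { type: string; children?: UnistNode[] }

function breadthFirst(tree: UnistNode): string[] {
  const order: string[] = []
  const queue: UnistNode[] = [tree]
  while (queue.length > 0) {
    const node = queue.shift()!              // take the oldest queued node
    order.push(node.type)
    queue.push(...(node.children ?? []))     // children wait behind the current level
  }
  return order
}

const leaf = (type: string): UnistNode => ({ type })
const tree: UnistNode = {
  type: "A",
  children: [
    { type: "B", children: [leaf("C"), leaf("D"), leaf("E")] },
    { type: "F", children: [leaf("G")] }
  ]
}

console.log(breadthFirst(tree).join(" ")) // A B F C D E G
```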

Depth-first traversal

Alternatively, and more commonly, depth-first traversal is used. The search is first deepened, by traversing children, before traversing siblings.

For the syntax tree defined in the diagram, a depth-first traversal first searches the root of the tree (A), then one of its children (B or F), then their children (C, D, and E, or G).

For a given node N with children, a depth-first traversal performs three steps, simplified to only binary trees (every node has head and tail, but no other children):

  • N: visit N itself
  • L: traverse head
  • R: traverse tail

These steps can be done in any order, but for non-binary trees, L and R occur together. If L is done before R, the traversal is called left-to-right traversal; otherwise, it is called right-to-left traversal. In non-binary trees, the other children between head and tail are processed in that same order. For left-to-right traversal, the head is traversed first (L), then its next sibling, and so on, until finally the tail (R) is traversed.

Because L and R occur together for non-binary trees, we can produce four types of orders: NLR, NRL, LRN, RLN.

NLR and LRN (the two left-to-right traversal options) are most commonly used and respectively named preorder and postorder.

For the syntax tree defined in the diagram, preorder and postorder traversal both enter the root of the tree (A) first, then its head (B), then the children of B from left to right (C, D, and then E). After all descendants of B are traversed, its next sibling (F) is traversed, and finally its only child (G). Preorder visits each node when entering it (A, B, C, D, E, F, G), whereas postorder visits each node after its children (C, D, E, B, G, F, A).
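The four orders come from two choices. N is visited either before or after the children, and the children run either from head to tail (L before R) or from tail to head. This sketch prints each order for the diagram's tree. It uses a hypothetical `depthFirst` helper, and node types serve as labels.

```typescript
interface UnistNode { type: string; children?: UnistNode[] }
type Order = "NLR" | "NRL" | "LRN" | "RLN"

function depthFirst(node: UnistNode, order: Order, out: string[] = []): string[] {
  const nodeFirst = order.startsWith("N")                       // NLR, NRL
  const leftToRight = order.replace("N", "").startsWith("L")    // NLR, LRN
  const children = node.children ?? []
  const sequence = leftToRight ? children : [...children].reverse()

  if (nodeFirst) out.push(node.type)
  for (const child of sequence) depthFirst(child, order, out)
  if (!nodeFirst) out.push(node.type)
  return out
}

const leaf = (type: string): UnistNode => ({ type })
const tree: UnistNode = {
  type: "A",
  children: [
    { type: "B", children: [leaf("C"), leaf("D"), leaf("E")] },
    { type: "F", children: [leaf("G")] }
  ]
}

for (const order of ["NLR", "NRL", "LRN", "RLN"] as const) {
  console.log(order, depthFirst(tree, order).join(" "))
}
// NLR A B C D E F G   (preorder)
// NRL A F G B E D C
// LRN C D E B G F A   (postorder)
// RLN G F E D C B A
```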

Utilities

Utilities are functions that work with nodes.

There are several projects that deal with nodes from specifications implementing unist, such as unist-util-visit and unist-util-is.

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

A curated list of awesome syntax-tree, unist, hast, xast, mdast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Special thanks to @eush77 for their work, ideas, and incredibly valuable feedback! Thanks to @anandthakker, @anko, @arobase-che, @azu, @BarryThePenguin, @ben-eb, @blahah, @blakeembrey, @brainkim, @ChristianMurphy, @davidtheclark, @denysdovhan, @derhuerst, @dozoisch, @fazouane-marouane, @gibson042, @hrajchert, @ikatyang, @inklesspen, @izumin5210, @jasonLaster, @JDvorak, @jlevy, @justjake, @kmck, @kt3k, @KyleAMathews, @luca3m, @mattdesl, @muraken720, @mrzmmr, @nwtn, @rhysd, @Rokt33r, @Sarah-Seo, @sethvincent, @shawnbot, @simov, @staltz, @TitanSnow, @tmcw, and @vhf, for contributing to unist and related projects!

License

CC-BY-4.0 © Titus Wormer
