nlcst

Natural Language Concrete Syntax Tree format.


nlcst is a specification for representing natural language in a syntax tree. It implements the unist spec.

This document may not be released. See releases for released documents. The latest released version is 1.0.2.

Introduction

This document defines a format for representing natural language as a concrete syntax tree. Development of nlcst started in May 2014, in the now deprecated textom project for retext, before unist existed. This specification is written in a Web IDL-like grammar.

Where this specification fits

nlcst extends unist, a format for syntax trees, to benefit from its ecosystem of utilities.

nlcst relates to JavaScript in that it has an ecosystem of utilities for working with compliant syntax trees in JavaScript. However, nlcst is not limited to JavaScript and can be used in other programming languages.

nlcst relates to the unified and retext projects in that nlcst syntax trees are used throughout their ecosystems.

Types

If you are using TypeScript, you can use the nlcst types by installing them with npm:

npm install @types/nlcst

Nodes (abstract)

Literal

interface Literal <: UnistLiteral {
  value: string
}

Literal (UnistLiteral) represents a node in nlcst containing a value.

Its value field is a string.
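
As a rough illustration of the definition above, the sketch below narrows an arbitrary node to a literal by checking for a string value field. The names NlcstNode, NlcstLiteral, and isLiteral are local stand-ins invented for this example, not part of nlcst or @types/nlcst.

```typescript
// Local stand-ins for the unist Node and nlcst Literal interfaces (assumed shapes).
interface NlcstNode {
  type: string
}

interface NlcstLiteral extends NlcstNode {
  value: string
}

// Hypothetical helper: a node counts as a literal when it carries a string value.
function isLiteral(node: NlcstNode): node is NlcstLiteral {
  return 'value' in node && typeof (node as NlcstLiteral).value === 'string'
}

const textNode: NlcstLiteral = {type: 'TextNode', value: 'Hello'}
const wordNode: NlcstNode = {type: 'WordNode'}

console.log(isLiteral(textNode), isLiteral(wordNode)) // true false
```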

Parent

interface Parent <: UnistParent {
  children: [Paragraph | Punctuation | Sentence | Source | Symbol | Text | WhiteSpace | Word]
}

Parent (UnistParent) represents a node in nlcst containing other nodes (said to be children).

Its content is limited to only other nlcst content.
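
Because parents only hold other nlcst nodes and literals hold the text, a tree can be turned back into a string by walking it depth-first. The following is a minimal sketch of that idea, using a simplified local NlcstNode type and a hypothetical serialize helper. The nlcst-to-string utility exists for this purpose; this sketch does not claim to mirror its implementation, and the split of "AT&T" into parts is an assumed example.

```typescript
// Simplified local node shape (an assumption for this example).
interface NlcstNode {
  type: string
  value?: string
  children?: NlcstNode[]
}

// Hypothetical helper: concatenate the values of all literals in document order.
function serialize(node: NlcstNode): string {
  if (typeof node.value === 'string') return node.value
  return (node.children ?? []).map((child) => serialize(child)).join('')
}

const wordNode: NlcstNode = {
  type: 'WordNode',
  children: [
    {type: 'TextNode', value: 'AT'},
    {type: 'SymbolNode', value: '&'},
    {type: 'TextNode', value: 'T'}
  ]
}

console.log(serialize(wordNode)) // AT&T
```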

Nodes

Paragraph

interface Paragraph <: Parent {
  type: 'ParagraphNode'
  children: [Sentence | Source | WhiteSpace]
}

Paragraph (Parent) represents a unit of discourse dealing with a particular point or idea.

Paragraph can be used in a root node. It can contain sentence, whitespace, and source nodes.
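
For illustration, a paragraph holding two sentences separated by white space might look like the tree below. The node shape is a simplified local type, and the sample text "Hi. Bye." is made up for this sketch.

```typescript
// Simplified local node shape (an assumption for this example).
interface NlcstNode {
  type: string
  value?: string
  children?: NlcstNode[]
}

// "Hi. Bye." as one paragraph: sentence, white space, sentence.
const paragraphNode: NlcstNode = {
  type: 'ParagraphNode',
  children: [
    {
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Hi'}]},
        {type: 'PunctuationNode', value: '.'}
      ]
    },
    {type: 'WhiteSpaceNode', value: ' '},
    {
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Bye'}]},
        {type: 'PunctuationNode', value: '.'}
      ]
    }
  ]
}

// Count only the direct sentence children, ignoring the white space between them.
const sentenceCount = (paragraphNode.children ?? []).filter(
  (child) => child.type === 'SentenceNode'
).length

console.log(sentenceCount) // 2
```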

Punctuation

interface Punctuation <: Literal {
  type: 'PunctuationNode'
}

Punctuation (Literal) represents typographical devices which aid understanding and correct reading of other grammatical units.

Punctuation can be used in sentence or word nodes.

Root

interface Root <: Parent {
  type: 'RootNode'
}

Root (Parent) represents a document.

Root can be used as the root of a tree, never as a child. Its content model is not limited: it can contain any nlcst content, with the restriction that all content must be of the same category.
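
The "never as a child" rule is easy to check mechanically. The sketch below uses a hypothetical containsNestedRoot helper over a simplified local node type; it does not attempt to check the same-category restriction.

```typescript
// Simplified local node shape (an assumption for this example).
interface NlcstNode {
  type: string
  value?: string
  children?: NlcstNode[]
}

// Hypothetical helper: true when a RootNode appears anywhere below the given node.
function containsNestedRoot(node: NlcstNode): boolean {
  return (node.children ?? []).some(
    (child) => child.type === 'RootNode' || containsNestedRoot(child)
  )
}

const validTree: NlcstNode = {
  type: 'RootNode',
  children: [{type: 'ParagraphNode', children: []}]
}

const invalidTree: NlcstNode = {
  type: 'RootNode',
  children: [{type: 'RootNode', children: []}]
}

console.log(containsNestedRoot(validTree), containsNestedRoot(invalidTree)) // false true
```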

Sentence

interface Sentence <: Parent {
  type: 'SentenceNode'
  children: [Punctuation | Source | Symbol | WhiteSpace | Word]
}

Sentence (Parent) represents a grouping of grammatically linked words that, in principle, expresses a complete thought, although it may make little sense when taken out of context.

Sentence can be used in a paragraph node. It can contain word, symbol, punctuation, whitespace, and source nodes.
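
As a sketch of how the sentence content model is typically consumed, the example below pulls the words out of a sentence while skipping punctuation and white space. The wordsOf helper and the sample sentence are assumptions made for illustration.

```typescript
// Simplified local node shape (an assumption for this example).
interface NlcstNode {
  type: string
  value?: string
  children?: NlcstNode[]
}

// "Hello, world!" as a sentence of words, punctuation, and white space.
const sentenceNode: NlcstNode = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Hello'}]},
    {type: 'PunctuationNode', value: ','},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'world'}]},
    {type: 'PunctuationNode', value: '!'}
  ]
}

// Hypothetical helper: return the text of each direct word child of a sentence.
function wordsOf(sentence: NlcstNode): string[] {
  return (sentence.children ?? [])
    .filter((child) => child.type === 'WordNode')
    .map((word) => (word.children ?? []).map((part) => part.value ?? '').join(''))
}

console.log(wordsOf(sentenceNode).join(' | ')) // Hello | world
```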

Source

interface Source <: Literal {
  type: 'SourceNode'
}

Source (Literal) represents an external (ungrammatical) value embedded into a grammatical unit: a hyperlink, code, and such.

Source can be used in root, paragraph, sentence, or word nodes.
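
To make the idea of an embedded, ungrammatical value concrete, the sketch below keeps a piece of inline code as a single source node inside a sentence and then collects all source values from the tree. The sourcesOf helper and the sample sentence are hypothetical.

```typescript
// Simplified local node shape (an assumption for this example).
interface NlcstNode {
  type: string
  value?: string
  children?: NlcstNode[]
}

// "Run `npm test` now." with the inline code kept verbatim as one Source node.
const sentenceNode: NlcstNode = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Run'}]},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'SourceNode', value: '`npm test`'},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'now'}]},
    {type: 'PunctuationNode', value: '.'}
  ]
}

// Hypothetical helper: collect the raw values of all Source nodes in a tree.
function sourcesOf(node: NlcstNode): string[] {
  if (node.type === 'SourceNode') return [node.value ?? '']
  const found: string[] = []
  for (const child of node.children ?? []) {
    found.push(...sourcesOf(child))
  }
  return found
}

console.log(sourcesOf(sentenceNode).join(', ')) // `npm test`
```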

Symbol

interface Symbol <: Literal {
  type: 'SymbolNode'
}

Symbol (Literal) represents typographical devices different from characters which represent sounds (like letters and numerals), white space, or punctuation.

Symbol can be used in sentence or word nodes.

Text

interface Text <: Literal {
  type: 'TextNode'
}

Text (Literal) represents actual content in nlcst documents: one or more characters.

Text can be used in word nodes.

WhiteSpace

interface WhiteSpace <: Literal {
  type: 'WhiteSpaceNode'
}

WhiteSpace (Literal) represents typographical devices devoid of content, separating other units.

WhiteSpace can be used in root, paragraph, or sentence nodes.

Word

interface Word <: Parent {
  type: 'WordNode'
  children: [Punctuation | Source | Symbol | Text]
}

Word (Parent) represents the smallest element that may be uttered in isolation with semantic or pragmatic content.

Word can be used in a sentence node. It can contain text, symbol, punctuation, and source nodes.
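
The content models of paragraph, sentence, and word can be collected into one lookup table and checked recursively. The sketch below transcribes the children lists from the interface definitions above. The findViolations helper, the simplified node type, and the sample words are assumptions for illustration, including the split of "don't" into text, punctuation, and text.

```typescript
// Simplified local node shape (an assumption for this example).
interface NlcstNode {
  type: string
  value?: string
  children?: NlcstNode[]
}

// Allowed child types per parent, copied from the interface definitions.
const contentModel: Record<string, string[]> = {
  ParagraphNode: ['SentenceNode', 'SourceNode', 'WhiteSpaceNode'],
  SentenceNode: ['PunctuationNode', 'SourceNode', 'SymbolNode', 'WhiteSpaceNode', 'WordNode'],
  WordNode: ['PunctuationNode', 'SourceNode', 'SymbolNode', 'TextNode']
}

// Hypothetical checker: list the path of every child its parent may not contain.
// Parents without an entry in the table (such as RootNode) are not restricted here.
function findViolations(node: NlcstNode, path: string = node.type): string[] {
  const allowed = contentModel[node.type]
  const problems: string[] = []
  for (const child of node.children ?? []) {
    const childPath = `${path} > ${child.type}`
    if (allowed && allowed.indexOf(child.type) === -1) problems.push(childPath)
    problems.push(...findViolations(child, childPath))
  }
  return problems
}

// "don't" as text, punctuation, text: all permitted inside a word.
const contraction: NlcstNode = {
  type: 'WordNode',
  children: [
    {type: 'TextNode', value: 'don'},
    {type: 'PunctuationNode', value: "'"},
    {type: 'TextNode', value: 't'}
  ]
}

// White space is not part of the word content model.
const malformed: NlcstNode = {
  type: 'WordNode',
  children: [
    {type: 'TextNode', value: 'a'},
    {type: 'WhiteSpaceNode', value: ' '}
  ]
}

console.log(findViolations(contraction).length) // 0
console.log(findViolations(malformed).join(', ')) // WordNode > WhiteSpaceNode
```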

Glossary

See the unist glossary.

List of utilities

See the unist list of utilities for more utilities.

Related

  • mdast: Markdown Abstract Syntax Tree format
  • hast: Hypertext Abstract Syntax Tree format
  • xast: Extensible Abstract Syntax Tree

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help. Ideas for new utilities and tools can be posted in syntax-tree/ideas.

A curated list of awesome syntax-tree, unist, mdast, hast, xast, and nlcst resources can be found in awesome syntax-tree.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

Acknowledgments

The initial release of this project was authored by @wooorm.

Thanks to @nwtn, @tmcw, @muraken720, and @dozoisch for contributing to nlcst and related projects!

License

CC-BY-4.0 © Titus Wormer
