parse-english
Natural language parser, for the English language, that produces nlcst.
Contents
- What is this?
- When should I use this?
- Install
- Use
- API
- Algorithm
- Types
- Compatibility
- Security
- Related
- Contribute
- License
What is this?
This package exposes a parser that takes English natural language and produces a syntax tree.
When should I use this?
If you want to handle English natural language as syntax trees manually, use this.
Alternatively, you can use the retext plugin retext-english, which wraps this project to also parse natural language at a higher-level (easier) abstraction.
For Dutch or most Latin-script languages, you can instead use parse-dutch or parse-latin.
Install
This package is ESM only. In Node.js (version 16+), install with npm:
npm install parse-english
In Deno with esm.sh:
import {ParseEnglish} from 'https://esm.sh/parse-english@6'
In browsers with esm.sh:
<script type="module">
import {ParseEnglish} from 'https://esm.sh/parse-english@6?bundle'
</script>
Use
import {ParseEnglish} from 'parse-english'
import {inspect} from 'unist-util-inspect'
const tree = new ParseEnglish().parse(
'Mr. Henry Brown: A hapless but friendly City of London worker.'
)
console.log(inspect(tree))
Yields:
RootNode[1] (1:1-1:63, 0-62)
└─0 ParagraphNode[1] (1:1-1:63, 0-62)
    └─0 SentenceNode[23] (1:1-1:63, 0-62)
        ├─0  WordNode[2] (1:1-1:4, 0-3)
        │    ├─0 TextNode "Mr" (1:1-1:3, 0-2)
        │    └─1 PunctuationNode "." (1:3-1:4, 2-3)
        ├─1  WhiteSpaceNode " " (1:4-1:5, 3-4)
        ├─2  WordNode[1] (1:5-1:10, 4-9)
        │    └─0 TextNode "Henry" (1:5-1:10, 4-9)
        ├─3  WhiteSpaceNode " " (1:10-1:11, 9-10)
        ├─4  WordNode[1] (1:11-1:16, 10-15)
        │    └─0 TextNode "Brown" (1:11-1:16, 10-15)
        ├─5  PunctuationNode ":" (1:16-1:17, 15-16)
        ├─6  WhiteSpaceNode " " (1:17-1:18, 16-17)
        ├─7  WordNode[1] (1:18-1:19, 17-18)
        │    └─0 TextNode "A" (1:18-1:19, 17-18)
        ├─8  WhiteSpaceNode " " (1:19-1:20, 18-19)
        ├─9  WordNode[1] (1:20-1:27, 19-26)
        │    └─0 TextNode "hapless" (1:20-1:27, 19-26)
        ├─10 WhiteSpaceNode " " (1:27-1:28, 26-27)
        ├─11 WordNode[1] (1:28-1:31, 27-30)
        │    └─0 TextNode "but" (1:28-1:31, 27-30)
        ├─12 WhiteSpaceNode " " (1:31-1:32, 30-31)
        ├─13 WordNode[1] (1:32-1:40, 31-39)
        │    └─0 TextNode "friendly" (1:32-1:40, 31-39)
        ├─14 WhiteSpaceNode " " (1:40-1:41, 39-40)
        ├─15 WordNode[1] (1:41-1:45, 40-44)
        │    └─0 TextNode "City" (1:41-1:45, 40-44)
        ├─16 WhiteSpaceNode " " (1:45-1:46, 44-45)
        ├─17 WordNode[1] (1:46-1:48, 45-47)
        │    └─0 TextNode "of" (1:46-1:48, 45-47)
        ├─18 WhiteSpaceNode " " (1:48-1:49, 47-48)
        ├─19 WordNode[1] (1:49-1:55, 48-54)
        │    └─0 TextNode "London" (1:49-1:55, 48-54)
        ├─20 WhiteSpaceNode " " (1:55-1:56, 54-55)
        ├─21 WordNode[1] (1:56-1:62, 55-61)
        │    └─0 TextNode "worker" (1:56-1:62, 55-61)
        └─22 PunctuationNode "." (1:62-1:63, 61-62)
API
This package exports the identifier ParseEnglish.
There is no default export.
ParseEnglish()
Create a new parser.
ParseEnglish extends ParseLatin.
See parse-latin for API docs.
Algorithm
All of parse-latin is included, along with the following support for the English natural language:
- unit abbreviations (tsp., tbsp., oz., ft., and more)
- time references (sec., min., tues., thu., feb., and more)
- business abbreviations (Inc. and Ltd.)
- social titles (Mr., Mmes., Sr., and more)
- rank and academic titles (Dr., Rep., Gen., Prof., Pres., and more)
- geographical abbreviations (Ave., Blvd., Ft., Hwy., and more)
- American state abbreviations (Ala., Minn., La., Tex., and more)
- Canadian province abbreviations (Alta., Qué., Yuk., and more)
- English county abbreviations (Beds., Leics., Shrops., and more)
- common elision (omission of letters) (’n’, ’o, ’em, ’twas, ’80s, and more)
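To see why abbreviation support matters, consider a toy sentence splitter. The sketch below is not how parse-english or parse-latin is implemented. It is a deliberately naive illustration of the problem: a full stop after a known abbreviation should not end a sentence. The abbreviation set is a small sample from the list above, and splitSentences is a hypothetical helper written for this example.

```javascript
// Assumption: a tiny sample of the abbreviations listed above, lowercased.
const abbreviations = new Set(['mr', 'dr', 'inc', 'ltd', 'tsp', 'oz', 'ave'])

// Hypothetical helper: split on a full stop followed by whitespace,
// unless the word right before the full stop is a known abbreviation.
function splitSentences(text) {
  const sentences = []
  const pattern = /\.\s+/g
  let start = 0
  let match

  while ((match = pattern.exec(text))) {
    // Last whitespace-separated word before this full stop.
    const before = text.slice(start, match.index)
    const lastWord = before.split(/\s+/).pop().toLowerCase()

    // Known abbreviation: keep scanning, the sentence continues.
    if (abbreviations.has(lastWord)) continue

    sentences.push(text.slice(start, match.index + 1))
    start = match.index + match[0].length
  }

  // Whatever remains after the last split is the final sentence.
  if (start < text.length) sentences.push(text.slice(start))
  return sentences
}

const result = splitSentences(
  'Mr. Brown lives on Park Ave. in London. He works at Acme Inc. and likes tea.'
)
console.log(result)
// [
//   'Mr. Brown lives on Park Ave. in London.',
//   'He works at Acme Inc. and likes tea.'
// ]
```

Without the abbreviation check, the same input would be cut after "Mr.", "Ave.", and "Inc." as well. The real parser produces a full syntax tree rather than an array of strings, so treat this only as motivation for the list above.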
Types
This package is fully typed with TypeScript. It exports no additional types.
Compatibility
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 16.0+. It also works in Deno and modern browsers.
Security
This package is safe.
Related
- parse-latin: Latin-script natural language parser
- parse-dutch: Dutch natural language parser
Contribute
Yes please! See How to Contribute to Open Source.
License
MIT © Titus Wormer