parse-english
Natural language parser for English that produces nlcst.
Contents
- What is this?
- When should I use this?
- Install
- Use
- API
- Algorithm
- Types
- Compatibility
- Security
- Related
- Contribute
- License
What is this?
This package exposes a parser that takes English natural language and produces a syntax tree.
When should I use this?
If you want to handle English natural language as syntax trees manually, use this.
Alternatively, you can use the retext plugin retext-english, which wraps this project to also parse natural language at a higher-level (easier) abstraction.
For Dutch or most other Latin-script languages, you can instead use parse-dutch or parse-latin, respectively.
Install
This package is ESM only. In Node.js (version 16+), install with npm:
npm install parse-english
In Deno with esm.sh:
import {ParseEnglish} from 'https://esm.sh/parse-english@6'
In browsers with esm.sh:
<script type="module">
import {ParseEnglish} from 'https://esm.sh/parse-english@6?bundle'
</script>
Use
import {ParseEnglish} from 'parse-english'
import {inspect} from 'unist-util-inspect'
const tree = new ParseEnglish().parse(
'Mr. Henry Brown: A hapless but friendly City of London worker.'
)
console.log(inspect(tree))
Yields:
RootNode[1] (1:1-1:63, 0-62)
└─0 ParagraphNode[1] (1:1-1:63, 0-62)
    └─0 SentenceNode[23] (1:1-1:63, 0-62)
        ├─0  WordNode[2] (1:1-1:4, 0-3)
        │    ├─0 TextNode "Mr" (1:1-1:3, 0-2)
        │    └─1 PunctuationNode "." (1:3-1:4, 2-3)
        ├─1  WhiteSpaceNode " " (1:4-1:5, 3-4)
        ├─2  WordNode[1] (1:5-1:10, 4-9)
        │    └─0 TextNode "Henry" (1:5-1:10, 4-9)
        ├─3  WhiteSpaceNode " " (1:10-1:11, 9-10)
        ├─4  WordNode[1] (1:11-1:16, 10-15)
        │    └─0 TextNode "Brown" (1:11-1:16, 10-15)
        ├─5  PunctuationNode ":" (1:16-1:17, 15-16)
        ├─6  WhiteSpaceNode " " (1:17-1:18, 16-17)
        ├─7  WordNode[1] (1:18-1:19, 17-18)
        │    └─0 TextNode "A" (1:18-1:19, 17-18)
        ├─8  WhiteSpaceNode " " (1:19-1:20, 18-19)
        ├─9  WordNode[1] (1:20-1:27, 19-26)
        │    └─0 TextNode "hapless" (1:20-1:27, 19-26)
        ├─10 WhiteSpaceNode " " (1:27-1:28, 26-27)
        ├─11 WordNode[1] (1:28-1:31, 27-30)
        │    └─0 TextNode "but" (1:28-1:31, 27-30)
        ├─12 WhiteSpaceNode " " (1:31-1:32, 30-31)
        ├─13 WordNode[1] (1:32-1:40, 31-39)
        │    └─0 TextNode "friendly" (1:32-1:40, 31-39)
        ├─14 WhiteSpaceNode " " (1:40-1:41, 39-40)
        ├─15 WordNode[1] (1:41-1:45, 40-44)
        │    └─0 TextNode "City" (1:41-1:45, 40-44)
        ├─16 WhiteSpaceNode " " (1:45-1:46, 44-45)
        ├─17 WordNode[1] (1:46-1:48, 45-47)
        │    └─0 TextNode "of" (1:46-1:48, 45-47)
        ├─18 WhiteSpaceNode " " (1:48-1:49, 47-48)
        ├─19 WordNode[1] (1:49-1:55, 48-54)
        │    └─0 TextNode "London" (1:49-1:55, 48-54)
        ├─20 WhiteSpaceNode " " (1:55-1:56, 54-55)
        ├─21 WordNode[1] (1:56-1:62, 55-61)
        │    └─0 TextNode "worker" (1:56-1:62, 55-61)
        └─22 PunctuationNode "." (1:62-1:63, 61-62)
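Because every parent node in the tree has children and every leaf has a value, the output can be walked with plain recursion. The following is a minimal sketch, not part of the parse-english API: the helper names textOf and wordsOf are hypothetical, and a hand-built fragment mirroring the start of the tree above stands in for real parser output so the snippet runs without any packages installed.

```javascript
// Minimal sketch (hypothetical helpers, not part of parse-english):
// collect the text of every WordNode in an nlcst tree.

/**
 * Concatenate the literal values of all leaf nodes below `node`.
 * @param {{value?: string, children?: Array<object>}} node
 * @returns {string}
 */
function textOf(node) {
  if (typeof node.value === 'string') return node.value
  return (node.children || []).map(textOf).join('')
}

/**
 * Walk the tree depth-first and return the text of each WordNode.
 * @param {{type: string, children?: Array<object>}} node
 * @returns {Array<string>}
 */
function wordsOf(node) {
  if (node.type === 'WordNode') return [textOf(node)]
  return (node.children || []).flatMap(wordsOf)
}

// Hand-built fragment mirroring the start of the tree above ("Mr. Henry").
const fragment = {
  type: 'SentenceNode',
  children: [
    {
      type: 'WordNode',
      children: [
        {type: 'TextNode', value: 'Mr'},
        {type: 'PunctuationNode', value: '.'}
      ]
    },
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'Henry'}]}
  ]
}

console.log(wordsOf(fragment)) // → [ 'Mr.', 'Henry' ]
```

With parse-english installed, the same wordsOf function should work on the tree from the example above; since the parser keeps the full stop inside the first WordNode, the first word it returns would be "Mr.".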
API
This package exports the identifier ParseEnglish. There is no default export.
ParseEnglish()
Create a new parser.
ParseEnglish extends ParseLatin. See parse-latin for API docs.
Algorithm
All of parse-latin is included, plus the following support for the English natural language:
- unit abbreviations (tsp., tbsp., oz., ft., and more)
- time references (sec., min., tues., thu., feb., and more)
- business abbreviations (Inc. and Ltd.)
- social titles (Mr., Mmes., Sr., and more)
- rank and academic titles (Dr., Rep., Gen., Prof., Pres., and more)
- geographical abbreviations (Ave., Blvd., Ft., Hwy., and more)
- American state abbreviations (Ala., Minn., La., Tex., and more)
- Canadian province abbreviations (Alta., Qué., Yuk., and more)
- English county abbreviations (Beds., Leics., Shrops., and more)
- common elision (omission of letters) (’n’, ’o, ’em, ’twas, ’80s, and more)
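One reason such abbreviation lists matter is sentence splitting: in the example under Use, the full stop after Mr stays inside the word instead of ending a sentence. The toy splitter below only illustrates that idea; it is not how parse-english or parse-latin are implemented. ABBREVIATIONS is a hypothetical five-entry stand-in for the lists named above, and the splitter ignores harder cases such as an abbreviation that really does end a sentence.

```javascript
// Toy illustration only: NOT the parse-english implementation.
// Shows why known abbreviations matter when deciding whether "." ends a sentence.

// Hypothetical, deliberately tiny subset of the abbreviations listed above,
// stored in lowercase without the trailing full stop.
const ABBREVIATIONS = new Set(['mr', 'dr', 'inc', 'tsp', 'ave'])

/**
 * Split text into sentences at ".", "!", or "?" followed by whitespace or
 * the end of the text, unless the word before a "." is a known abbreviation.
 * @param {string} text
 * @returns {Array<string>}
 */
function splitSentences(text) {
  const sentences = []
  let start = 0
  const terminal = /[.!?](?=\s|$)/g
  let match

  while ((match = terminal.exec(text)) !== null) {
    const end = match.index + 1
    // The letters immediately before the mark, e.g. "Mr" in "Mr.".
    const before = /(\p{L}+)$/u.exec(text.slice(start, match.index))
    const isAbbreviation =
      match[0] === '.' &&
      before !== null &&
      ABBREVIATIONS.has(before[1].toLowerCase())

    if (!isAbbreviation) {
      sentences.push(text.slice(start, end).trim())
      start = end
    }
  }

  // Keep any trailing text that has no terminal mark.
  const rest = text.slice(start).trim()
  if (rest) sentences.push(rest)
  return sentences
}

console.log(
  splitSentences('Mr. Brown lives on Elm Ave. near Dr. Lee. He is friendly.')
)
// → [ 'Mr. Brown lives on Elm Ave. near Dr. Lee.', 'He is friendly.' ]
```

Without the abbreviation check, the same input would be cut into five pieces, one after each full stop.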
Types
This package is fully typed with TypeScript. It exports no additional types.
Compatibility
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 16.0+. It also works in Deno and modern browsers.
Security
This package is safe.
Related
- parse-latin: Latin-script natural language parser
- parse-dutch: Dutch natural language parser
Contribute
Yes please! See How to Contribute to Open Source.
License
MIT © Titus Wormer