• Stars: 224
• Rank: 176,765 (top 4%)
• Language: JavaScript
• License: MIT License
• Created almost 8 years ago
• Updated almost 2 years ago

Repository Details

Expose spaCy NLP text parsing to Node.js (and other languages) via socketIO

spacy-nlp

Installation

# install spaCy and the socketIO client in python3
python3 -m pip install -U socketIO-client-nexus
python3 -m pip install -U spacy==2.1.3
python3 -m spacy download en_core_web_md

# install this npm package
npm i --save spacy-nlp

Usage

const spacyNLP = require("spacy-nlp");
// The default port is 6466.
// Start the socketIO server together with the Python client that exposes spaCy
// (or reuse an existing socketIO server listening on IOPORT).
const serverPromise = spacyNLP.server({ port: process.env.IOPORT });
// Loading spaCy may take up to 15 seconds.

Note that python3 is preferred. If you use python2, set the environment variable USE_PY2=true on every run.

You should see logs like:

[Sun Oct 09 2016 16:53:33 GMT-0400 (EDT)] INFO Starting poly-socketio server on port: 6466, expecting 1 IO clients
[Sun Oct 09 2016 16:53:33 GMT-0400 (EDT)] INFO Starting socketIO client for python3 at 6466
[Sun Oct 09 2016 16:53:44 GMT-0400 (EDT)] DEBUG cgkb-py mXjDqupv852zUeMPAAAA joined, 0 remains
[Sun Oct 09 2016 16:53:44 GMT-0400 (EDT)] INFO All 1 IO clients have joined

Because it uses poly-socketio, there is only one IO server and one global.client (internal to this module) per process, no matter how many times poly-socketio is called. This avoids conflicts when several projects use it together.

For example, AIVA uses poly-socketio to start a server for its internal cross-language communication and also uses spacy-nlp; spacy-nlp automatically reuses AIVA's IO server and global.client.
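The "one server per process" behaviour described above is essentially a lazily created singleton kept on the Node.js global object. The sketch below illustrates only that idea; getSharedServer and the __sharedIoServer key are hypothetical names, not poly-socketio's API, and the "server" is a plain object standing in for a real socketIO server.

```javascript
// Hypothetical sketch: cache one shared server object per process on globalThis.
// SHARED_KEY and getSharedServer are illustrative names, not poly-socketio's API.
const SHARED_KEY = "__sharedIoServer";

function getSharedServer(createServer) {
  // Only the first caller actually creates the server; later callers reuse it.
  if (globalThis[SHARED_KEY] === undefined) {
    globalThis[SHARED_KEY] = createServer();
  }
  return globalThis[SHARED_KEY];
}

// Usage: two "projects" asking for a server receive the same instance.
let created = 0;
const makeServer = () => ({ id: ++created });

const serverA = getSharedServer(makeServer);
const serverB = getSharedServer(makeServer);
console.log(serverA === serverB, created); // true 1
```

Storing the instance on the global object, rather than in a module-level variable, means it is shared even when two copies of a package are installed in different node_modules folders.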

Methods

Syntax Parsing

Once the server is ready, you can use the Node.js client nlp to parse text:

const spacyNLP = require("spacy-nlp");
const nlp = spacyNLP.nlp;

// You can pass multiple sentences concatenated in one string.
nlp.parse("Bob Brought the pizza to Alice.").then(output => {
  console.log(output);
  console.log(JSON.stringify(output[0].parse_tree, null, 2));
});

// Or store the output in a variable (await must be used inside an async function)
const result = await nlp.parse("Bob Brought the pizza to Alice.");

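The output is the syntax parse tree with POS tags. In parse_tree, NE is the named-entity label from NER (empty when the word is not part of an entity), and arc is the label of the dependency arc incident on that word: arcs point from the head word to its modifier, and the root word has the arc ROOT. See the explanation in TensorFlow's SyntaxNet documentation.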

[ { text: 'Bob Brought the pizza to Alice.',
    len: 7,
    tokens: [ 'Bob', 'Brought', 'the', 'pizza', 'to', 'Alice', '.' ],
    noun_phrases: [ 'Bob', 'the pizza', 'Alice' ],
    parse_tree:
     [ { word: 'Brought',
         lemma: 'bring',
         NE: '',
         POS_fine: 'VBD',
         POS_coarse: 'VERB',
         arc: 'ROOT',
         modifiers:
          [ { word: 'Bob',
              lemma: 'Bob',
              NE: 'PERSON',
              POS_fine: 'NNP',
              POS_coarse: 'PROPN',
              arc: 'nsubj',
              modifiers: [] },
            { word: 'pizza',
              lemma: 'pizza',
              NE: '',
              POS_fine: 'NN',
              POS_coarse: 'NOUN',
              arc: 'dobj',
              modifiers:
               [ { word: 'the',
                   lemma: 'the',
                   NE: '',
                   POS_fine: 'DT',
                   POS_coarse: 'DET',
                   arc: 'det',
                   modifiers: [] } ] },
            { word: 'to',
              lemma: 'to',
              NE: '',
              POS_fine: 'IN',
              POS_coarse: 'ADP',
              arc: 'prep',
              modifiers:
               [ { word: 'Alice',
                   lemma: 'Alice',
                   NE: 'PERSON',
                   POS_fine: 'NNP',
                   POS_coarse: 'PROPN',
                   arc: 'pobj',
                   modifiers: [] } ] },
            { word: '.',
              lemma: '.',
              NE: '',
              POS_fine: '.',
              POS_coarse: 'PUNCT',
              arc: 'punct',
              modifiers: [] } ] } ],
    parse_list:
     [ { word: 'Bob',
         lemma: 'Bob',
         NE: 'PERSON',
         POS_fine: 'NNP',
         POS_coarse: 'PROPN' },
       { word: 'Brought',
         lemma: 'bring',
         NE: '',
         POS_fine: 'VBD',
         POS_coarse: 'VERB' },
       { word: 'the',
         lemma: 'the',
         NE: '',
         POS_fine: 'DT',
         POS_coarse: 'DET' },
       { word: 'pizza',
         lemma: 'pizza',
         NE: '',
         POS_fine: 'NN',
         POS_coarse: 'NOUN' },
       { word: 'to',
         lemma: 'to',
         NE: '',
         POS_fine: 'IN',
         POS_coarse: 'ADP' },
       { word: 'Alice',
         lemma: 'Alice',
         NE: 'PERSON',
         POS_fine: 'NNP',
         POS_coarse: 'PROPN' },
       { word: '.',
         lemma: '.',
         NE: '',
         POS_fine: '.',
         POS_coarse: 'PUNCT' } ] } ]
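To make the arc direction concrete, the sketch below walks a trimmed copy of output[0].parse_tree from the example above and lists each dependency as label(head, modifier). It only relies on the word, arc and modifiers fields shown in that output; collectArcs is a hypothetical helper, not part of spacy-nlp.

```javascript
// Trimmed copy of output[0].parse_tree above (only word, arc and modifiers kept).
const parseTree = [
  {
    word: "Brought",
    arc: "ROOT",
    modifiers: [
      { word: "Bob", arc: "nsubj", modifiers: [] },
      { word: "pizza", arc: "dobj", modifiers: [{ word: "the", arc: "det", modifiers: [] }] },
      { word: "to", arc: "prep", modifiers: [{ word: "Alice", arc: "pobj", modifiers: [] }] },
      { word: ".", arc: "punct", modifiers: [] },
    ],
  },
];

// Depth-first walk: each node's arc labels the edge from its head word to the node itself.
// The root has no head, so it is reported as ROOT(word).
function collectArcs(nodes, head = null, out = []) {
  for (const node of nodes) {
    out.push(head === null ? `${node.arc}(${node.word})` : `${node.arc}(${head}, ${node.word})`);
    collectArcs(node.modifiers, node.word, out);
  }
  return out;
}

const arcs = collectArcs(parseTree);
console.log(arcs);
// returns ["ROOT(Brought)", "nsubj(Brought, Bob)", "dobj(Brought, pizza)", "det(pizza, the)",
//          "prep(Brought, to)", "pobj(to, Alice)", "punct(Brought, .)"]
```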

Noun Parsing

// Available options: "count" (returns the total count) and "words" (returns the parsed strings). You can specify one or both.
const options = ["count"];

// You can pass multiple sentences concatenated in one string.
nlp
  .parse_nouns(
    "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
    options
  )
  .then(output => {
    console.log(output);
  });

// Or store the output in a variable (await must be used inside an async function)
const result = await nlp.parse_nouns(
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
  options
);

// 19

Verb Parsing

// Available options: "count" (returns the total count) and "words" (returns the parsed strings). You can specify one or both.
const options = ["count"];

// You can pass multiple sentences concatenated in one string.
nlp
  .parse_verbs(
    "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
    options
  )
  .then(output => {
    console.log(output);
  });

// Or store the output in a variable (await must be used inside an async function)
const result = await nlp.parse_verbs(
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
  options
);

// 7

Adjective Parsing

// Available options: "count" (returns the total count) and "words" (returns the parsed strings). You can specify one or both.
const options = ["count"];

// You can pass multiple sentences concatenated in one string.
nlp
  .parse_adj(
    "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
    options
  )
  .then(output => {
    console.log(output);
  });

// Or store the output in a variable (await must be used inside an async function)
const result = await nlp.parse_adj(
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
  options
);

// 8

Named Entity Parsing

// Available options: "count" (returns the total count) and "words" (returns the parsed strings). You can specify one or both.
const options = ["count"];

// You can pass multiple sentences concatenated in one string.
nlp
  .parse_named_entities(
    "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
    options
  )
  .then(output => {
    console.log(output);
  });

// Or store the output in a variable (await must be used inside an async function)
const result = await nlp.parse_named_entities(
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
  options
);

// 8

Date Parsing

// Available options: "count" (returns the total count) and "words" (returns the parsed strings). You can specify one or both.
const options = ["words"];

// You can pass multiple sentences concatenated in one string.
nlp
  .parse_date(
    "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
    options
  )
  .then(output => {
    console.log(output);
  });

// Or store the output in a variable (await must be used inside an async function)
const result = await nlp.parse_date(
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
  options
);

// ['22 June 1941', 'from 1939 to 1945']

Time Parsing

// Available options: "count" (returns the total count) and "words" (returns the parsed strings). You can specify one or both.
const options = ["count"];

// You can pass multiple sentences concatenated in one string.
nlp
  .parse_time(
    "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
    options
  )
  .then(output => {
    console.log(output);
  });

// Or store the output in a variable (await must be used inside an async function)
const result = await nlp.parse_time(
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.",
  options
);

// 0

Helpers

The following helper functions are synchronous and do not return promises.

Splitting Large Text

If you need to process very large text, split it first: spaCy's max_length limit defaults to 1,000,000 characters.

const text =
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";

const textArray = nlp.split_text(text);

// ["On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of", "war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often", "abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945."]
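One simple way to split on word boundaries is to pack whole words greedily into chunks no longer than a chosen limit. The sketch below shows that approach under stated assumptions: splitText is a hypothetical helper, words are assumed to be separated by single spaces, and spacy-nlp's own split_text may pick chunk sizes differently (its example output above is not reproduced here).

```javascript
// Hypothetical sketch: greedily pack whole words into chunks of at most maxLength characters.
// The default mirrors spaCy's 1,000,000-character max_length; a single word longer than
// maxLength still becomes its own (oversized) chunk rather than being cut mid-word.
function splitText(text, maxLength = 1000000) {
  const chunks = [];
  let current = "";
  for (const word of text.split(" ")) {
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length > maxLength && current !== "") {
      // Adding this word would overflow: close the current chunk and start a new one.
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const text =
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";

// A small limit is used here only so the short sample text produces several chunks.
const chunks = splitText(text, 120);
console.log(chunks.length, chunks.map((c) => c.length));
```

Because every split happens at a space, joining the chunks with a single space gives back the original text.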

Duplicate Removal

When you request an array of words, the result may include duplicate strings. To remove them, use nlp.remove_duplicates.

const text =
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";

// (await must be used inside an async function)
const nounArray = await nlp.parse_nouns(text);
// ["war", "war", "war", "war", "war", "world", "world", "world", "axis", "axis", "ww2", "wwii", "land", "wehrmacht", "union", "powers", "attrition"]

const result = nlp.remove_duplicates(nounArray);

// ["war", "world", "axis", "ww2", "wwii", "land", "wehrmacht", "union", "powers", "attrition"]
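Order-preserving de-duplication like this can be done in plain JavaScript with a Set, which keeps the first occurrence of each value and iterates in insertion order. This is a minimal sketch; removeDuplicates is a hypothetical helper, not spacy-nlp's implementation, but it returns the same array as the example above for that input.

```javascript
// A Set drops repeated values and iterates in insertion order, so the first occurrence wins.
// removeDuplicates is a hypothetical helper, not spacy-nlp's implementation.
function removeDuplicates(words) {
  return [...new Set(words)];
}

const nouns = ["war", "war", "war", "war", "war", "world", "world", "world", "axis", "axis",
  "ww2", "wwii", "land", "wehrmacht", "union", "powers", "attrition"];

const unique = removeDuplicates(nouns);
console.log(unique);
// returns ["war", "world", "axis", "ww2", "wwii", "land", "wehrmacht", "union", "powers", "attrition"]
```

Note that a Set compares strings exactly, so "War" and "war" would both be kept; lowercase the words first if case should be ignored.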

Top n Words in a String

const text =
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";

// Arguments: the text and a cutoff n (return the top n words). Returns an array of { word, count } objects.
const result = nlp.top_words(text, 5);

// [
//   { word: "the", count: 6 },
//   { word: "of", count: 3 },
//   { word: "war", count: 3 },
//   { word: "Axis", count: 2 },
//   { word: "a", count: 2 }
// ]
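A top-n word count can be sketched with a Map of counts followed by a descending sort. This is an illustration under assumptions, not spacy-nlp's code: topWords is a hypothetical name, a "word" is assumed to be a run of ASCII letters or digits, and counting is case-sensitive. Because Array.prototype.sort is stable (ES2019 and later), ties keep their first-occurrence order, which for the sample text yields the same five entries as the example above.

```javascript
// Hypothetical sketch of a case-sensitive top-n word counter.
// Assumption: words are runs of ASCII letters/digits; spacy-nlp's tokenization may differ.
function topWords(text, n) {
  const counts = new Map(); // Map keeps keys in first-occurrence order
  for (const word of text.match(/[A-Za-z0-9]+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  // Stable sort by descending count, so equal counts stay in first-occurrence order.
  return [...counts]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, n);
}

const text =
  "On 22 June 1941, the European Axis powers launched an invasion of the Soviet Union, opening the largest land theatre of war in history, which trapped the Axis, most crucially the German Wehrmacht, into a war of attrition. World War II (often abbreviated to WWII or WW2), also known as the Second World War, was a global war that lasted from 1939 to 1945.";

const top = topWords(text, 5);
console.log(top);
// returns [{ word: "the", count: 6 }, { word: "of", count: 3 }, { word: "war", count: 3 },
//          { word: "Axis", count: 2 }, { word: "a", count: 2 }]
```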
