  • Stars: 1,196
  • Rank: 39,117 (Top 0.8%)
  • Language: JavaScript
  • License: MIT License
  • Created almost 6 years ago
  • Updated 6 months ago

winkNLP

Developer friendly Natural Language Processing

WinkNLP is a JavaScript library for Natural Language Processing (NLP). Designed specifically to make development of NLP applications easier and faster, winkNLP is optimized for the right balance of performance and accuracy.

It is built from the ground up with no external dependencies and has a lean code base of ~10KB minified & gzipped. A test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP the ideal tool for building production grade systems with confidence.

WinkNLP has full TypeScript support and runs on Node.js, web browsers, and Deno.

Build amazing apps quickly

  • Wikipedia article timeline
  • Context aware word cloud
  • Key sentences detection

Head to live examples to explore further.

Blazing fast

WinkNLP can easily process large amounts of raw text at speeds over 650,000 tokens/second on an M1 MacBook Pro in both browser and Node.js environments. It even runs smoothly on a low-end smartphone's browser.

| Environment | Benchmarking Command |
| --- | --- |
| Node.js | node benchmark/run |
| Browser | How to measure winkNLP's speed on browsers? |
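
To make the tokens/second figure concrete, here is a minimal sketch of how such a throughput number can be measured in Node.js. The helper `tokensPerSecond` and the whitespace tokenizer `naiveTokenize` are hypothetical names introduced for illustration; they are not part of winkNLP and not the project's own benchmark script.

```javascript
// Hypothetical helper: measures how many tokens per second a tokenizer achieves.
// `tokenize` is any function that takes a string and returns an array of tokens.
function tokensPerSecond(tokenize, text, runs = 5) {
  let tokens = 0;
  const start = process.hrtime.bigint();
  for (let i = 0; i < runs; i += 1) {
    tokens += tokenize(text).length;
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  // Guard against a zero reading on very small inputs.
  return tokens / (Math.max(elapsedNs, 1) / 1e9);
}

// Stand-in tokenizer (an assumption for illustration, NOT winkNLP's tokenizer).
const naiveTokenize = (s) => s.split(/\s+/).filter(Boolean);

const sample = 'The quick brown fox jumps over the lazy dog. '.repeat(2000);
const speed = tokensPerSecond(naiveTokenize, sample);
console.log(`${Math.round(speed).toLocaleString()} tokens/second`);
```

To time winkNLP itself, one could pass a function such as `(t) => nlp.readDoc(t).tokens().out()` in place of the stand-in tokenizer. The number will vary with hardware, input text, and warm-up.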

Features

WinkNLP has a comprehensive natural language processing (NLP) pipeline covering tokenization, sentence boundary detection (sbd), negation handling, sentiment analysis, part-of-speech (pos) tagging, named entity recognition (ner), and custom entity recognition (cer). It offers a rich feature set:

🐎 Fast, lossless & multilingual tokenizer: For example, the multilingual text string "¡Hola! नमस्कार! Hi! Bonjour chéri" is tokenized as ["¡", "Hola", "!", "नमस्कार", "!", "Hi", "!", "Bonjour", "chéri"]. The tokenizer processes text at a speed close to 4 million tokens/second in an M1 MacBook Pro's browser.
Developer friendly and intuitive API: With winkNLP, process any text using a simple, declarative syntax; most live examples have 30-40 lines of code.
🖼 Best-in-class text visualization: Programmatically mark tokens, sentences, entities, etc. using HTML mark or any other tag of your choice.
♻️ Extensive text processing features: Remove and/or retain tokens with specific attributes such as part-of-speech, named entity type, token type, stop word, shape and many more; compute Flesch reading ease score; generate n-grams; normalize, lemmatise or stem. Check out how, with the right kind of text preprocessing, even a Naive Bayes classifier achieves impressive (≥90%) accuracy in sentiment analysis and chatbot intent classification tasks.
🔠 Pre-trained language models: Compact sizes starting from ~1MB (minified & gzipped) reduce model loading time drastically, down to ~1 second on a 4G network.
💼 Host of utilities & tools: BM25 vectorizer; several similarity methods (Cosine, Tversky, Sørensen-Dice, Otsuka-Ochiai); helpers to get bag of words, frequency table, lemma/stem, stop word removal and many more.

WinkJS also has packages like Naive Bayes classifier, multi-class averaged perceptron and popular token and string distance methods, which complement winkNLP.
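
As an illustration of one of the similarity methods listed above, here is a minimal plain-JavaScript sketch of cosine similarity over bags of words. The functions `bagOfWords` and `cosine` are hypothetical helpers written for this example; they show what the measure computes and are not winkNLP's own utilities, whose exact signatures are documented in the API Reference.

```javascript
// Build a bag of words (token -> count) from an array of tokens.
function bagOfWords(tokens) {
  const bow = new Map();
  for (const t of tokens) bow.set(t, (bow.get(t) || 0) + 1);
  return bow;
}

// Cosine similarity between two bags of words: dot product over the
// product of the vector lengths. Returns 0 if either bag is empty.
function cosine(a, b) {
  let dot = 0;
  for (const [term, count] of a) dot += count * (b.get(term) || 0);
  const norm = (m) => Math.sqrt([...m.values()].reduce((s, v) => s + v * v, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

const a = bagOfWords(['rain', 'go', 'away']);
const b = bagOfWords(['rain', 'rain', 'go', 'away']);
console.log(cosine(a, b).toFixed(3));
// -> 0.943
```

A `Map` is used instead of a plain object so that tokens such as "constructor" do not collide with inherited object properties.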

Documentation

  • Concepts — everything you need to know to get started.
  • API Reference — explains usage of APIs with examples.
  • Change log — version history along with the details of breaking changes, if any.
  • Examples — live examples with code to give you a head start.

Installation

Use npm install:

npm install wink-nlp --save

To use winkNLP after installing it, you also need to install a language model that matches the Node.js version in use. The table below outlines the version-specific installation command:

| Node.js Version | Installation |
| --- | --- |
| 16 or 18 | npm install wink-eng-lite-web-model --save |
| 14 or 12 | node -e "require('wink-nlp/models/install')" |

The wink-eng-lite-web-model is designed to work with Node.js version 16 or 18. It can also work on browsers as described in the next section. This is the recommended model.

The second command installs the wink-eng-lite-model, which works with Node.js version 14 or 12.
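
The version-to-model mapping described above can be expressed as a small lookup. This is only a sketch: `MODEL_BY_NODE_MAJOR` and `modelFor` are hypothetical names, and Node.js versions not listed in the installation instructions deliberately return `undefined` rather than a guess.

```javascript
// Mapping taken from the installation instructions above; other versions are not listed.
const MODEL_BY_NODE_MAJOR = {
  12: 'wink-eng-lite-model',
  14: 'wink-eng-lite-model',
  16: 'wink-eng-lite-web-model',
  18: 'wink-eng-lite-web-model'
};

// Hypothetical helper: returns the model package name for a Node.js version
// string such as '18.17.0' or 'v14.21.3', or undefined if it is not listed.
function modelFor(nodeVersion) {
  const major = Number.parseInt(String(nodeVersion).replace(/^v/, '').split('.')[0], 10);
  return MODEL_BY_NODE_MAJOR[major];
}

console.log(modelFor(process.versions.node));
```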

How to configure a TypeScript project

Enable esModuleInterop and allowSyntheticDefaultImports in the tsconfig.json file:

"compilerOptions": {
    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,
    ...
}

How to install for Web Browser

If you’re using winkNLP in the browser, use the wink-eng-lite-web-model. Learn about its installation and usage in our guide to using winkNLP in the browser. Explore winkNLP recipes on Observable for live browser-based examples.

How to run on Deno

Follow the example on replit.

Get started

Here is the "Hello World!" of winkNLP:

// Load wink-nlp package.
const winkNLP = require( 'wink-nlp' );
// Load english language model.
const model = require( 'wink-eng-lite-web-model' );
// Instantiate winkNLP.
const nlp = winkNLP( model );
// Obtain "its" helper to extract item properties.
const its = nlp.its;
// Obtain "as" reducer helper to reduce a collection.
const as = nlp.as;
 
// NLP Code.
const text = 'Hello   World🌎! How are you?';
const doc = nlp.readDoc( text );
 
console.log( doc.out() );
// -> Hello   World🌎! How are you?
 
console.log( doc.sentences().out() );
// -> [ 'Hello   World🌎!', 'How are you?' ]
 
console.log( doc.entities().out( its.detail ) );
// -> [ { value: '🌎', type: 'EMOJI' } ]
 
console.log( doc.tokens().out() );
// -> [ 'Hello', 'World', '🌎', '!', 'How', 'are', 'you', '?' ]
 
console.log( doc.tokens().out( its.type, as.freqTable ) );
// -> [ [ 'word', 5 ], [ 'punctuation', 2 ], [ 'emoji', 1 ] ]

Experiment with winkNLP on RunKit.

Speed & Accuracy

WinkNLP processes raw text at ~650,000 tokens per second with its wink-eng-lite-web-model, when benchmarked using "Ch 13 of Ulysses by James Joyce" on an M1 MacBook Pro with 16GB RAM. The processing included the entire NLP pipeline: tokenization, sentence boundary detection, negation handling, sentiment analysis, part-of-speech tagging, and named entity extraction. This speed is well ahead of the prevailing speed benchmarks.

The benchmark was conducted on Node.js versions 16 and 18.

It POS-tags a subset of the WSJ corpus with an accuracy of ~95%; this includes tokenization of raw text prior to POS tagging. The present state of the art is at ~97% accuracy, but at lower speeds, and is generally computed using a gold standard pre-tokenized corpus.

Its general purpose sentiment analysis delivers an F-score of ~84.5%, when validated using the Amazon Product Review Sentiment Labelled Sentences Data Set from the UCI Machine Learning Repository. The current benchmark accuracy for specifically trained models can range around 95%.
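
For readers unfamiliar with the metric, the sketch below shows how an F-score is commonly computed from prediction counts. It assumes the balanced F1 variant (the text above does not say which variant was used), the helper `f1Score` is a hypothetical name, and the counts are made up for illustration; they are not winkNLP's evaluation data.

```javascript
// Hypothetical helper: precision, recall and F1 from confusion counts.
// tp = true positives, fp = false positives, fn = false negatives.
function f1Score({ tp, fp, fn }) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 = precision + recall === 0
    ? 0
    : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Made-up counts for illustration only; not winkNLP's actual results.
const result = f1Score({ tp: 90, fp: 10, fn: 30 });
console.log(result.f1.toFixed(3));
// -> 0.818
```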

Memory Requirement

WinkNLP delivers this performance with minimal load on RAM. For example, it processes the entire History of India Volume I with a total peak memory requirement of under 80MB. The book has around 350 pages, which translates to over 125,000 tokens.
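
A rough way to check memory use for a workload of your own in Node.js is sketched below. The helper `measureMemory` is a hypothetical name, the workload is a stand-in, and this is not necessarily how the figure above was obtained. It reads the resident set size after the run and the process-wide peak reported by the operating system.

```javascript
// Hypothetical helper: runs `work` and reports process memory in MB.
function measureMemory(work) {
  const value = work();
  const toMB = (bytes) => bytes / (1024 * 1024);
  return {
    value,
    // Resident set size right after the work completes (bytes -> MB).
    rssMB: toMB(process.memoryUsage().rss),
    // Peak resident set size of the whole process so far (kilobytes -> MB).
    peakRssMB: process.resourceUsage().maxRSS / 1024
  };
}

// Stand-in workload (an assumption); replace with e.g. () => nlp.readDoc(text).
const report = measureMemory(() => 'lorem ipsum '.repeat(100000).split(' ').length);
console.log(`${report.rssMB.toFixed(1)} MB resident, ${report.peakRssMB.toFixed(1)} MB peak`);
```

The peak value covers the whole process lifetime, including module loading, so it is best read from a script that does nothing except the workload being measured.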

Need Help?

Usage query 👩🏽‍💻

Please ask at Stack Overflow or discuss at Wink JS GitHub Discussions or chat with us at Wink JS Gitter Lobby.

Bug report 🐛

If you spot a bug that has not yet been reported, raise a new issue or consider fixing it and sending a PR.

New feature 🌟

Looking for a new feature? Request it via the new features & ideas discussion forum or consider becoming a contributor.

About winkJS

WinkJS is a family of open source packages for Natural Language Processing, Machine Learning, and Statistical Analysis in Node.js. The code is thoroughly documented for easy human comprehension and has a test coverage of ~100%, providing the reliability needed to build production grade solutions.

Copyright & License

Wink NLP is copyright 2017-23 GRAYPE Systems Private Limited.

It is licensed under the terms of the MIT License.

More Repositories

1

wink-nlp-utils

NLP Functions for amplifying negations, managing elisions, creating ngrams, stems, phonetic codes to tokens and more.
JavaScript
117
star
2

wink-pos-tagger

English Part-of-speech (POS) tagger
JavaScript
65
star
3

wink-lemmatizer

English lemmatizer
JavaScript
63
star
4

wink-sentiment

Accurate and fast sentiment scoring of phrases with #hashtags, emoticons :) & emojis 🎉
JavaScript
61
star
5

wink-tokenizer

Multilingual tokenizer that automatically tags each token with its type
JavaScript
59
star
6

wink-bm25-text-search

Fast Full Text Search based on BM25
JavaScript
56
star
7

wink-statistics

Fast & numerically stable statistical analysis
JavaScript
46
star
8

wink-ner

Language agnostic named entity recognizer
JavaScript
39
star
9

wink-naive-bayes-text-classifier

Naive Bayes Text Classifier
JavaScript
38
star
10

wink-distance

Distance/Similarity functions for Bag of Words, Strings, Vectors and more.
JavaScript
22
star
11

wink-porter2-stemmer

Javascript Implementation of Porter Stemmer Algorithm V2 by Dr Martin F Porter
JavaScript
20
star
12

wink-regression-tree

Decision Tree to predict the value of a continuous target variable
JavaScript
15
star
13

wink-lexicon

English lexicon useful in NLP/NLU
JavaScript
15
star
14

wink-perceptron

Multi-class classifier
JavaScript
13
star
15

wink-jaro-distance

An Implementation of Jaro Distance Algorithm by Matthew A. Jaro
JavaScript
13
star
16

wink-eng-lite-model

English lite language model for wink-nlp.
11
star
17

showcase-wiz

🧙🏽‍♂️ Visualize wink-nlp's features
CSS
11
star
18

wink-eng-lite-web-model

English lite language model for Web Browsers
JavaScript
11
star
19

showcase-bm25-text-search

🕵️‍♀️ Showcase of the wink-bm25-text-search package
JavaScript
8
star
20

showcase-timeline

📆 Timeline view of Wikipedia articles
HTML
6
star
21

showcase-writing-assistant

English language writing assistant
JavaScript
6
star
22

wink-helpers

Functions for cross validation, shuffle, cartesian product and more
JavaScript
6
star
23

wink-composer

Compose LLM applications easily ♫
JavaScript
6
star
24

wink-embeddings-sg-100d

100-dimensional English word embeddings for wink-nlp
5
star
25

winkjs.github.io

New winkjs.org website, made using middleman
HTML
4
star
26

showcase-serverless

👾 Demo of a serverless wink-pos-tagger
JavaScript
3
star
27

sentimental

This repo has moved to
HTML
3
star
28

wink-llm-composer

Compose LLM applications easily
2
star