• Stars: 137
  • Rank: 264,439 (top 6%)
  • Language: JavaScript
  • License: MIT License
  • Created about 9 years ago
  • Updated over 4 years ago

Repository Details

bag-of-words calculator in javascript

mimir: Bag-Of-Words and TF-IDF

mimir

Mimir knows a lot about words

mimir is a JavaScript micro-module that builds a vocabulary of words from a set of texts and produces a vector representation of any text against that vocabulary. It also performs basic TF-IDF analysis.

In natural language processing (NLP) and information retrieval (IR), a bag-of-words model represents a piece of text as a vector which, in JavaScript, is simply an array of integers with one position per vocabulary word. Such a vector is the indispensable starting point for most machine learning and classification tasks.

mimir ignores grammar and word order, and discards all non-alphanumeric characters.
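The two steps described above (build a vocabulary, then map a text onto it) can be sketched in plain JavaScript. This is a minimal sketch, not mimir's source: `tokenize`, `buildVocabulary` and `toVector` are hypothetical names, and the sketch assumes lowercasing, deleting non-alphanumeric characters, ordering the vocabulary by first appearance and counting occurrences. Under those assumptions it reproduces the vectors printed in the BOW example.

```javascript
// Hypothetical tokenizer (assumption, not mimir's code): lowercase, delete
// every character that is not an ASCII letter, digit or whitespace, then
// split on whitespace.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .split(/\s+/)
    .filter(Boolean); // drop empty strings left by leading/trailing whitespace
}

// Vocabulary: every distinct token, in order of first appearance.
function buildVocabulary(texts) {
  const vocab = [];
  const seen = new Set();
  for (const text of texts) {
    for (const word of tokenize(text)) {
      if (!seen.has(word)) {
        seen.add(word);
        vocab.push(word);
      }
    }
  }
  return vocab;
}

// Bag-of-words vector: one slot per vocabulary word holding how often that
// word occurs in `text`; words outside the vocabulary are ignored.
function toVector(text, vocab) {
  const index = new Map(vocab.map((word, i) => [word, i]));
  const vector = new Array(vocab.length).fill(0);
  for (const word of tokenize(text)) {
    if (index.has(word)) {
      vector[index.get(word)] += 1;
    }
  }
  return vector;
}

const texts = [
  "I like\n, : ; chocolate",
  "Chocolate; is great",
  "I like  --boar ragu'",
  "I don't like artichokes",
];
const vocab = buildVocabulary(texts);
console.log(vocab.length); // 9
console.log(JSON.stringify(toVector("boar like chocolate", vocab)));
// [0,1,1,0,0,1,0,0,0]
console.log(JSON.stringify(toVector("Ragu is great and I like it", vocab)));
// [1,1,0,1,1,0,1,0,0]
```

Deleting the apostrophe in "don't" (rather than splitting on it) is what keeps this vocabulary at nine words. Non-ASCII letters are dropped entirely, which a tokenizer for other languages would need to handle differently.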

Once your text is a vector, you can feed it to classifiers such as artificial neural networks (ANNs) or support vector machines (SVMs), both to train them and to classify new text.
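To show how such vectors are consumed, the sketch below swaps in a much simpler technique than an ANN or SVM: a 1-nearest-neighbour classifier based on cosine similarity. It is not part of mimir; `cosine`, `classify` and the toy labelled vectors are hypothetical, and the only assumption about mimir is that its vectors are plain arrays of numbers of equal length.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector is all zeros.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical 1-nearest-neighbour classifier: returns the label of the
// training example whose vector is most similar to `vector`.
function classify(vector, examples) {
  let bestLabel = null;
  let bestScore = -Infinity;
  for (const { vector: candidate, label } of examples) {
    const score = cosine(vector, candidate);
    if (score > bestScore) {
      bestScore = score;
      bestLabel = label;
    }
  }
  return bestLabel;
}

// Toy labelled vectors over a hypothetical vocabulary [chocolate, cake, boar, ragu].
const examples = [
  { vector: [1, 1, 0, 0], label: 'dessert' },
  { vector: [0, 0, 1, 1], label: 'main course' },
];
console.log(classify([1, 0, 0, 0], examples)); // dessert
console.log(classify([0, 0, 0, 1], examples)); // main course
```

Cosine similarity ignores vector length, so a long and a short text about the same words still score as similar; a trained ANN or SVM would instead learn which vocabulary positions matter.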

Usage

BOW

var mimir = require('./index'),
  bow = mimir.bow,
  dict = mimir.dict;

var texts = ["I like\n, : ; chocolate",
  "Chocolate; is great",
  "I like  --boar ragu'",
  "I don't like artichokes"
],
  voc = dict(texts);
console.log(bow("boar like chocolate", voc), bow("Ragu is great and I like it", voc));
// prints [ 0, 1, 1, 0, 0, 1, 0, 0, 0 ] [ 1, 1, 0, 1, 1, 0, 1, 0, 0 ]

TF-IDF

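Term frequency-inverse document frequency (TF-IDF) scores how important a word is to one document within a collection of documents: the score grows with the number of times the word appears in that document and shrinks the more documents contain it.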
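TF-IDF has several variants. As a minimal sketch (not mimir's source), assume tf(t, d) = occurrences of t in d divided by the number of tokens in d, and idf(t) = ln(N / (1 + df(t))), where N is the number of documents and df(t) is how many of them contain t. With the hypothetical `tokenize`, `tf`, `idf` and `tfidf` helpers below, this reproduces the scores printed for document 1 in the example, e.g. 5/35 * ln(3/2) ≈ 0.0579 for "war"; scores for the other documents may differ if mimir tokenizes differently.

```javascript
// Hypothetical tokenizer (assumption): lowercase, delete characters that
// are not ASCII letters, digits or whitespace, then split on whitespace.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .split(/\s+/)
    .filter(Boolean);
}

// Term frequency: share of the document's tokens that equal `term`.
function tf(term, doc) {
  const tokens = tokenize(doc);
  if (tokens.length === 0) return 0;
  const count = tokens.filter((t) => t === term).length;
  return count / tokens.length;
}

// Smoothed inverse document frequency: ln(N / (1 + df)).
function idf(term, docs) {
  const df = docs.filter((d) => tokenize(d).includes(term)).length;
  return Math.log(docs.length / (1 + df));
}

function tfidf(term, doc, docs) {
  return tf(term, doc) * idf(term, docs);
}

const docs = [
  "World War II, also known as the Second World War (after the recent Great War), was a global war that lasted from 1939 to 1945. World War II is the deadliest conflict in human history",
  "Germanic paganism refers to the theology and religious practices of the Germanic peoples from the Iron Age until their Christianization during the Medieval period.",
  "The Cleveland Bay is a breed of horse that originated in England during the 17th century, named for its consistent bay colouring and the Cleveland district of Yorkshire. It is a strong, well-muscled horse breed, the oldest established breed in England, and the only non-draught horse developed in Great Britain. The ancestors of the breed were developed during the Middle Ages for use as pack horses",
];
console.log(tfidf('war', docs[0], docs).toFixed(4)); // 0.0579
console.log(tfidf('world', docs[0], docs).toFixed(4)); // 0.0348
console.log(idf('the', docs).toFixed(4)); // -0.2877
```

The 1 + df in the denominator avoids division by zero for unseen terms, at the cost of giving words that occur in every document, such as "the", a negative weight.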

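var mimir = require('./index'),
  tfidf = mimir.tfidf;

// Hypothetical helper: the example calls tokenize() without defining it.
// This approximation lowercases the text, deletes non-alphanumeric
// characters and splits on whitespace; it only picks the candidate words,
// the scores themselves come from mimir.tfidf.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .split(/\s+/)
    .filter(Boolean);
}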

var textlist = [
  "World War II, also known as the Second World War (after the recent Great War), was a global war that lasted from 1939 to 1945. World War II is the deadliest conflict in human history",
  "Germanic paganism refers to the theology and religious practices of the Germanic peoples from the Iron Age until their Christianization during the Medieval period.",
  "The Cleveland Bay is a breed of horse that originated in England during the 17th century, named for its consistent bay colouring and the Cleveland district of Yorkshire. It is a strong, well-muscled horse breed, the oldest established breed in England, and the only non-draught horse developed in Great Britain. The ancestors of the breed were developed during the Middle Ages for use as pack horses"
];

textlist.forEach(function (t, index) {
  console.log('Most important words in document', index + 1);
  var scores = {};
  tokenize(t).forEach(function (word) {
    scores[word] = tfidf(word, t, textlist);
  });
  scores = Object.keys(scores).map(function (word) {
    return {
      word: word,
      score: scores[word]
    }
  });
  scores.sort(function (a, b) {
    return a.score < b.score ? 1 : -1;
  });
  console.log(scores.splice(0, 3));
});
/*
prints:
Most important words in document 1
[ { word: 'war', score: 0.05792358687259491 },
  { word: 'world', score: 0.034754152123556946 },
  { word: 'ii', score: 0.023169434749037963 } ]
Most important words in document 2
[ { word: 'germanic', score: 0.032437208648653154 },
  { word: 'christianization', score: 0.016218604324326577 },
  { word: 'theology', score: 0.016218604324326577 } ]
Most important words in document 3
[ { word: 'breed', score: 0.023850888712244965 },
  { word: 'horse', score: 0.017888166534183726 },
  { word: 'developed', score: 0.011925444356122483 } ]

*/

More Repositories

1. LokiJS: javascript embeddable / in-memory database (JavaScript, 6,726 stars)
2. pycv: Website of the book "Learn OpenCV 3 with Python" (Python, 543 stars)
3. PowerArray: Boosted Performance Array (JavaScript, 248 stars)
4. fundb: Functional programming based database engine (JavaScript, 38 stars)
5. classify-text: proof of concept of text classification with mimir and brain (JavaScript, 36 stars)
6. redisvue: real-time monitor and keys analytics for redis (Vue, 32 stars)
7. jotun: convert javascript object to numerical representation (VSM) (JavaScript, 20 stars)
8. dog-project (Jupyter Notebook, 11 stars)
9. aind2-rnn: udacity recurrent neural network project (Jupyter Notebook, 10 stars)
10. armour: prevent side-effects function wrapper (JavaScript, 10 stars)
11. lokijs-server: lokijs tcp server (JavaScript, 9 stars)
12. udacity-ec2-gpu: terraform project to launch a gpu ec2 instance for deep learning network training (HCL, 9 stars)
13. opencv-mediapipe-hand-gesture-recognition: generate training data, train and classify a deep learning model for hand gesture recognition (Jupyter Notebook, 7 stars)
14. lokijs-client: tcp client for lokijs (JavaScript, 7 stars)
15. csv-loader: csv-loader for js (JavaScript, 6 stars)
16. baldr: jekyll-style statically generated website with angular and requirejs (CSS, 6 stars)
17. jumble: ultra-efficient javascript serialization library (JavaScript, 5 stars)
18. paladin: Javascript composition and delegation library (JavaScript, 4 stars)
19. cv_ml: computer vision and machine learning with opencv, sklearn, keras (Python, 4 stars)
20. snaptun: lokijs http server wrapper (JavaScript, 4 stars)
21. aind-sudoku (Python, 3 stars)
22. lokijs-utils: utilities for lokijs (JavaScript, 3 stars)
23. aind-cv-capstone: capstone project for aind cv concentration (Jupyter Notebook, 3 stars)
24. gordian: node testing framework (JavaScript, 3 stars)
25. autoform: nodejs server side utility to generate form elements with JSON form descriptors (JavaScript, 3 stars)
26. AIND-Isolation (Python, 3 stars)
27. skrymir: LokiJS v2.0 experimental repository (JavaScript, 3 stars)
28. sleipnir: LokiJS tcp server (3 stars)
29. streamlog: Immutable events as a Stream, Log and Database: self-materializing and replicating views for distributed systems (JavaScript, 3 stars)
30. dis (Go, 2 stars)
31. snaptun-cli: command line client for snaptun (JavaScript, 2 stars)
32. miniss: A slim javascript/JSON to css processor. Ideal for building based on a common CSS. (JavaScript, 2 stars)
33. lokijs.org: website code (HTML, 2 stars)
34. num-js: numpy-like lib (JavaScript, 2 stars)
35. forward: forwarding redis keys into the aether (Go, 2 stars)
36. ann: artificial neural networks implementation using SGD (JavaScript, 2 stars)
37. nibiru: run lambda locally (JavaScript, 2 stars)
38. classifly: example of JS Object classification using brain (ANN), mimir (BOW) and jotun (objects to vector) (1 star)
39. techfort.com: tech-fort.com website (JavaScript, 1 star)
40. bloki: express angular blogging engine with lokijs database (JavaScript, 1 star)
41. aind-cv-mimicme (JavaScript, 1 star)
42. go-demo: this repo is a hello world http server that only serves as a sample go app for building go apps with jenkins. If you are learning jenkins you can use this too (Go, 1 star)
43. Sigyn: android nosql embeddable datastore (Java, 1 star)
44. sendyoulater (Go, 1 star)
45. AndroidMongolab: a tiny library for mongolab connection with android (Java, 1 star)
46. hbow: bag of words in haskell (study) (Haskell, 1 star)
47. norns: replicable redis-like key value store for node (JavaScript, 1 star)
48. doomsword: doomsword web site code (CSS, 1 star)
49. heavy-metal-js: Performance findings, Code robustness: Speed and Power. Heavy Metal. (1 star)
50. streamlogdb: event stream as log and database (C++, 1 star)
51. wyrdtales (Go, 1 star)