  • Stars: 125
  • Rank 286,335 (Top 6 %)
  • Language: TypeScript
  • License: MIT License
  • Created almost 2 years ago
  • Updated over 1 year ago

Repository Details

Powerful machine learning library for Node.js – uses Python's scikit-learn under the hood.

scikit-learn-ts

Intro

This project enables Node.js devs to use Python's powerful scikit-learn machine learning library – without having to know any Python. 🤯

See the full docs for more info.

Note: This project is new and experimental. It works great for local development, but I wouldn't recommend using it for production just yet. You can follow the progress on Twitter @transitive_bs.

Features

  • All TS classes are auto-generated from the official Python scikit-learn docs!
  • All 257 classes are supported along with proper TS types and docs
    • KMeans
    • TSNE
    • PCA
    • LinearRegression
    • LogisticRegression
    • DecisionTreeClassifier
    • RandomForestClassifier
    • XGBClassifier
    • DBSCAN
    • StandardScaler
    • MinMaxScaler
    • ... all of them 💯
  • Generally much faster and more robust than JS-based alternatives
    • (benchmarks & comparisons coming soon)

Prerequisites

This project is meant for Node.js users, so don't worry if you're not familiar with Python. This is the only step where you'll need to touch Python, and it should be pretty straightforward.

Make sure you have Node.js and Python 3 installed and in your PATH.

  • node >= 14
  • python >= 3.7

In Python land, install numpy and scikit-learn either globally via pip or via your favorite virtualenv manager. The shell running your Node.js program will need access to these Python modules, so if you're using a virtualenv, make sure it's activated.

If you're not sure what this means, it's okay. First install Python, which will also install pip, Python's package manager. Then run:

pip install numpy scikit-learn

Congratulations! You've safely navigated Python land, and from here on out, we'll be using Node.js / JS / TS. The sklearn NPM package will use your Python installation under the hood.

Install

npm install sklearn

Usage

See the full docs for more info.

import * as sklearn from 'sklearn'

const data = [
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
]

const py = await sklearn.createPythonBridge()

const model = new sklearn.TSNE({
  n_components: 2,
  perplexity: 2
})
await model.init(py)

const x = await model.fit_transform({ X: data })
console.log(x)

await model.dispose()
await py.disconnect()

Since the TS classes are auto-generated from the Python docs, the code will look almost identical to the Python version, so use their excellent API docs as a reference.

All class names, method names, attribute (accessor) names and types are the same as the official Python version.

The main differences are:

  • You need to call createPythonBridge() before using any sklearn classes
    • This spawns a Python child process and validates all of the Python dependencies
    • You can pass a custom python path via createPythonBridge({ python: '/path/to/your/python3' })
  • You need to pass this bridge to a class's async init method before using it
    • This creates an underlying Python variable representing your class instance
  • Instead of using numpy or pandas, we're just using plain JavaScript arrays
    • Anywhere the Python version would input or output a numpy array, we instead just use number[], number[][], etc.
    • We take care of converting to and from numpy arrays automatically where necessary
  • Whenever you're done using an instance, call dispose() to free the underlying Python resources
  • Whenever you're done using your Python bridge, call disconnect() on the bridge to cleanly exit the Python child process
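
The init / dispose / disconnect lifecycle above is easy to get wrong when an error is thrown midway. Here is a minimal sketch of one way to wrap it. The `BridgeLike` and `ModelLike` interfaces and the `withModel` helper are hypothetical names introduced for illustration; they only mirror the method names described above and are not part of the sklearn package.

```typescript
// Structural types for this sketch only. They mirror the method names the
// README describes (init, dispose, disconnect) and are NOT the package's own types.
interface BridgeLike {
  disconnect(): Promise<void>
}

interface ModelLike<B> {
  init(py: B): Promise<void>
  dispose(): Promise<void>
}

// Hypothetical helper: initializes the model, runs `work`, and always cleans up.
// The model is disposed before the bridge is disconnected, even if `work` throws.
// Assumption: the bridge is single-use, so this helper disconnects it at the end.
async function withModel<B extends BridgeLike, M extends ModelLike<B>, R>(
  py: B,
  model: M,
  work: (model: M) => Promise<R>
): Promise<R> {
  try {
    await model.init(py)
    try {
      return await work(model)
    } finally {
      await model.dispose()
    }
  } finally {
    await py.disconnect()
  }
}
```

Assuming the generated classes satisfy these shapes, a call might look like `withModel(py, new sklearn.KMeans({ n_clusters: 2 }), (m) => m.fit_predict({ X: data }))`. If you want to share one bridge across several models, drop the outer try/finally and call disconnect() yourself.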

Restrictions

  • We don't currently support positional arguments; only keyword-based arguments:
// this works (keyword args)
const x = await model.fit_transform({ X: data })

// this doesn't work yet (positional args)
const y = await model.fit_transform(data)
  • We don't currently generate TS code for scikit-learn's built-in datasets
  • We don't currently generate TS code for scikit-learn's top-level function exports (only classes right now)
  • There are basic unit tests for a handful of the auto-generated TS classes, and they work well, but there are probably edge cases and bugs in other auto-generated classes
    • Please create an issue on GitHub if you run into any weird behavior and include as much detail as possible, including code snippets

Examples

Here are some paired examples. Each one shows the official Python scikit-learn code first, followed by the equivalent code using the TS sklearn package.

StandardScaler

StandardScaler Python docs

Python:

import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
])

s = StandardScaler()

x = s.fit_transform(data)

TypeScript:

import * as sklearn from 'sklearn'

const data = [
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
]

const py = await sklearn.createPythonBridge()

const s = new sklearn.StandardScaler()
await s.init(py)

const x = await s.fit_transform({ X: data })

KMeans

KMeans Python docs

Python:

import numpy as np
from sklearn.cluster import KMeans

data = np.array([
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
])

model = KMeans(
  n_clusters=2,
  random_state=42,
  n_init='auto'
)

x = model.fit_predict(data)

TypeScript:

import * as sklearn from 'sklearn'

const data = [
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
]

const py = await sklearn.createPythonBridge()

const model = new sklearn.KMeans({
  n_clusters: 2,
  random_state: 42,
  n_init: 'auto'
})
await model.init(py)

const x = await model.fit_predict({ X: data })

TSNE

TSNE Python docs

Python:

import numpy as np
from sklearn.manifold import TSNE

data = np.array([
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
])

model = TSNE(
  n_components=2,
  perplexity=2,
  learning_rate='auto',
  init='random'
)

x = model.fit_transform(data)

TypeScript:

import * as sklearn from 'sklearn'

const data = [
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
]

const py = await sklearn.createPythonBridge()

const model = new sklearn.TSNE({
  n_components: 2,
  perplexity: 2,
  learning_rate: 'auto',
  init: 'random'
})
await model.init(py)

const x = await model.fit_transform({ X: data })

See the full docs for more examples.

Why?

The Python ML ecosystem is generally a lot more mature than the Node.js ML ecosystem. Most ML research happens in Python, and many common ML tasks that Python devs take for granted are much more difficult to accomplish in Node.js.

For example, I was recently working on a data viz project using full-stack TypeScript, and I needed to use k-means and t-SNE on some text embeddings. I tested 6 different t-SNE JS packages and several k-means packages. None of the t-SNE packages worked for medium-sized inputs, they were 1000x slower in many cases, and I kept running into NaN city with the JS-based versions.

Case in point: it's incredibly difficult to compete with the robustness, speed, and maturity of proven Python ML libraries like scikit-learn in JS/TS land.

So instead of trying to build a Rust-based version from scratch or using ad hoc NPM packages like above, I decided to create an experiment to see how practical it would be to just use scikit-learn from Node.js.

And that's how scikit-learn-ts was born.

How it works

This project uses a fork of python-bridge to spawn a Python interpreter as a subprocess and communicates back and forth via standard Unix pipes. The IPC pipes don't interfere with stdout/stderr/stdin, so your Node.js code and the underlying Python code can print things normally.

The TS library is auto-generated from the Python scikit-learn API docs. By using the official Python docs as a source of truth, we can guarantee a certain level of compatibility and upgradeability.

For each scikit-learn HTML page that belongs to an exported Python class or function, we first parse its metadata, params, methods, attributes, etc. using cheerio, then we convert the Python types into equivalent TypeScript types. We then generate a corresponding TypeScript file which wraps an instance of that Python declaration via a PythonBridge.
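
To give a feel for the "convert the Python types into equivalent TypeScript types" step, here is a deliberately simplified sketch. The function name `pyDocTypeToTs` and its rules are assumptions made for illustration; the project's real generator is not shown here and almost certainly handles many more cases.

```typescript
// Hypothetical, simplified mapping from a scikit-learn docs type string
// (e.g. "int, default=8") to a TypeScript type string. Illustration only.
function pyDocTypeToTs(docType: string): string {
  // Drop a trailing ", default=..." clause, which describes a value, not a type.
  const t = docType.replace(/,\s*default=.*$/, '').trim()

  // Enumerations such as {'auto', 'random'} become string literal unions.
  const enumMatch = t.match(/^\{(.+)\}$/)
  if (enumMatch) {
    return enumMatch[1]
      .split(',')
      .map((s) => s.trim())
      .join(' | ')
  }

  // Array types: count the dimensions listed in "of shape (a, b)".
  const shapeMatch = t.match(/^(?:ndarray|array-like) of shape \(([^)]*)\)$/)
  if (shapeMatch) {
    const dims = shapeMatch[1].split(',').filter((d) => d.trim() !== '').length
    return 'number' + '[]'.repeat(dims)
  }

  const primitives: Record<string, string> = {
    int: 'number',
    float: 'number',
    bool: 'boolean',
    str: 'string'
  }
  // Fall back to `any` for anything this sketch does not recognize.
  return primitives[t] ?? 'any'
}
```

For instance, `pyDocTypeToTs('array-like of shape (n_samples, n_features)')` returns `'number[][]'`, which matches the plain nested arrays used throughout the examples above.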

For each TypeScript wrapper class or function, we take special care to handle serializing values back and forth between Node.js and Python as JSON, including converting between primitive arrays and numpy arrays where necessary. All numpy array conversions should be handled automatically for you since we only support serializing primitive JSON types over the PythonBridge. There may be some edge cases where the automatic numpy inference fails, but we have a regression test suite for parsing these cases, so as long as the official Python docs are correct for a given type, then our implicit numpy conversion logic should "just work".
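
Because only primitive JSON types cross the bridge, the arrays you pass need to be rectangular to map cleanly onto a numpy array. The sketch below is not part of the package; `NestedNumbers`, `inferShape`, and `exampleMatrix` are hypothetical names. It shows one way you might check the shape of your input on the Node.js side before sending it.

```typescript
// A nested array of numbers of arbitrary depth, e.g. number[] or number[][].
type NestedNumbers = number | NestedNumbers[]

// Returns the shape of a rectangular nested array, e.g. [4, 3] for a 4x3 matrix.
// Throws if entries at the same depth have different shapes ("ragged" input).
function inferShape(value: NestedNumbers): number[] {
  if (!Array.isArray(value)) return []
  if (value.length === 0) return [0]
  const first = inferShape(value[0])
  for (const item of value) {
    const shape = inferShape(item)
    if (shape.length !== first.length || shape.some((d, i) => d !== first[i])) {
      throw new Error('ragged array: rows have inconsistent shapes')
    }
  }
  return [value.length, ...first]
}

// The same 4x3 matrix used in the examples above.
const exampleMatrix = [
  [0, 0, 0],
  [0, 1, 1],
  [1, 0, 1],
  [1, 1, 1]
]

console.log(inferShape(exampleMatrix)) // [ 4, 3 ]

// Plain nested arrays survive a JSON round trip unchanged.
console.log(JSON.stringify(exampleMatrix) === JSON.stringify(JSON.parse(JSON.stringify(exampleMatrix)))) // true
```

A check like this is optional, but it can turn a confusing Python-side error into an early, readable one in your own code.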

Credit

This project is not affiliated with the official Python scikit-learn project. Hopefully it will be one day. 😄

All of the difficult machine learning work happens under the hood via the official Python scikit-learn project, with full credit given to their absolutely amazing team. This project is just a small open source experiment to try and leverage the existing scikit-learn ecosystem for the Node.js community.

See the full docs for more info.

License

The official Python scikit-learn project is licensed under the BSD 3-Clause.

This project is licensed under MIT © Travis Fischer.

If you found this project helpful, please consider following me on Twitter.
