
NP_ML

A tool library of classical machine learning algorithms with only numpy.

Introduction

Classical machine learning algorithms implemented with pure numpy.

This repo helps you understand the ML algorithms instead of blindly using APIs.


Algorithm List

Classify

  • Perceptron

For the perceptron, the example uses the UCI Iris dataset. Since the basic perceptron is a binary classifier, the example only uses the data for versicolor and virginica. Also, since these two classes are not linearly separable, the result may vary considerably between runs.

Figure: versicolor and virginica. Hard to distinguish... Right?

Perceptron result on the Iris dataset.
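The perceptron update rule can be sketched in a few lines of numpy. This is an illustrative sketch, not the repo's actual code, and a toy linearly separable dataset stands in for the Iris data:

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=100):
    """Train a basic perceptron; labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Update only on misclassified points: y * (w.x + b) <= 0
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Toy linearly separable data standing in for two Iris classes
X = np.array([[2.0, 3.0], [3.0, 3.0], [1.0, 1.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
pred = np.sign(X @ w + b)
```

On data that is not linearly separable, like the two Iris classes above, the loop never stops updating, which is why the result depends on the iteration order and epoch count.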

  • K Nearest Neighbor (KNN)

For KNN, the example also uses the UCI Iris dataset.

KNN result on the Iris dataset.
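The core of KNN fits in a few lines; here is a minimal numpy sketch, with toy 2-D points standing in for the Iris features:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy 2-D data: two small clusters with labels 0 and 1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
pred = knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3)
```

Note that KNN has no training phase at all: the "model" is just the stored training set.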

  • Naive Bayes

For naive Bayes, the example uses the UCI SMS Spam Collection dataset to do spam filtering.

For this example only, nltk is used for tokenization. The result is listed below:

preprocessing data...
100%|#####################################################################| 5572/5572 [00:00<00:00, 8656.12it/s]
finish preprocessing data.

100%|#####################################################################| 1115/1115 [00:00<00:00, 55528.96it/s]
accuracy:  0.9757847533632287

We got 97.6% accuracy! That's nice!

We also try two examples, a typical ham and a typical spam. The results are shown below.

example ham:
Po de :-):):-):-):-). No need job aha.
predict result:
ham

example spam:
u r a winner U ave been specially selected 2 receive £1000 cash or a 4* holiday (flights inc) speak to a 
live operator 2 claim 0871277810710p/min (18 )
predict result:
spam
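
The core of such a spam filter is a multinomial naive Bayes classifier with Laplace smoothing. Below is a minimal numpy sketch, not the repo's implementation, with toy token lists standing in for the nltk-tokenized SMS corpus:

```python
import numpy as np

def train_nb(docs, labels):
    """Multinomial naive Bayes with Laplace smoothing over token lists."""
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    classes = sorted(set(labels))
    counts = np.ones((len(classes), len(vocab)))  # +1 Laplace smoothing
    priors = np.zeros(len(classes))
    for d, y in zip(docs, labels):
        c = classes.index(y)
        priors[c] += 1
        for w in d:
            counts[c, idx[w]] += 1
    log_prior = np.log(priors / priors.sum())
    log_like = np.log(counts / counts.sum(axis=1, keepdims=True))
    return idx, classes, log_prior, log_like

def predict_nb(model, doc):
    idx, classes, log_prior, log_like = model
    scores = log_prior.copy()       # sum log-probs instead of multiplying
    for w in doc:
        if w in idx:                # ignore words unseen in training
            scores += log_like[:, idx[w]]
    return classes[int(np.argmax(scores))]

docs = [["win", "cash", "now"], ["free", "cash", "prize"],
        ["see", "you", "tonight"], ["lunch", "tomorrow", "ok"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
pred = predict_nb(model, ["free", "cash"])
```

Working in log space avoids numerical underflow when multiplying many small word probabilities, and the Laplace `+1` keeps a single unseen word from zeroing out a whole class.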

  • Decision Tree

For the decision tree, the example uses the UCI tic-tac-toe dataset. The input is the status of the 9 squares and the output is whether x wins.

Figure: tic-tac-toe.

Here, we use ID3 and CART to generate a one-layer tree.

For the ID3, we have:

root
├── 4 == b : True
├── 4 == o : False
└── 4 == x : True
accuracy = 0.385

And for CART, we have:

root
├── 4 == o : False
└── 4 != o : True
accuracy = 0.718

In both of them, feature 4 is the status of the center square. We can see that the center square matters! Also, in ID3 the tree has to give a result for 'b' (blank), which causes its low accuracy.
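The two split criteria behind these trees are short to write down. Here is an illustrative numpy sketch of entropy (ID3's criterion) and Gini impurity (CART's), with a toy feature standing in for the center square:

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label array (used by ID3)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(y):
    """Gini impurity of a label array (used by CART)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def info_gain(x, y):
    """ID3's criterion: entropy reduction from splitting on feature x."""
    total = entropy(y)
    for v in np.unique(x):
        mask = x == v
        total -= mask.mean() * entropy(y[mask])
    return total

# Toy feature standing in for the center square: 'x', 'o', or 'b'(lank)
center = np.array(["x", "x", "o", "o", "b", "x"])
win    = np.array([True, True, False, False, True, True])
gain = info_gain(center, win)
```

ID3 splits into one branch per feature value (hence the forced 'b' branch above), while CART only makes binary "== v / != v" splits, which is why its one-layer tree can sidestep the blank-center case.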

  • Random Forest
  • SVM
  • AdaBoost
  • HMM

Cluster

  • Kmeans

For k-means, we use the make_blobs() function from sklearn to produce a toy dataset.

Kmeans result on the blob dataset.
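The algorithm behind k-means is Lloyd's iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal numpy sketch (with two hand-made blobs standing in for the make_blobs output):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm for k-means clustering."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its points
        # (keep a centroid in place if it has no assigned points)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels

# Two well-separated toy blobs (stand-in for sklearn's make_blobs)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(5, 0.2, (20, 2))])
centroids, labels = kmeans(X, k=2)
```

Because the result depends on the random initial centroids, real implementations usually restart several times and keep the run with the lowest within-cluster distance.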

  • Affinity Propagation

You can think of affinity propagation as a clustering algorithm that determines the number of clusters automatically.

Affinity propagation result on the blob dataset.

Manifold Learning

In manifold learning, we use the same S-curve data for all algorithms to show the differences between them.

The S-curve data.

  • PCA

The most popular way to reduce dimensionality.

PCA visualization.
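PCA reduces to an eigendecomposition of the covariance matrix. An illustrative numpy sketch (not the repo's code), with toy 3-D data that mostly varies along one direction:

```python
import numpy as np

def pca(X, n_components=2):
    """Project X onto its top principal components via eigendecomposition."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # eigh sorts eigenvalues ascending
    order = np.argsort(vals)[::-1]          # largest-variance directions first
    components = vecs[:, order[:n_components]]
    return Xc @ components

# Toy 3-D data: columns (t, 2t, tiny noise) lie almost on one line
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t, 0.01 * rng.normal(size=(100, 1))])
Z = pca(X, n_components=1)
```

Since the first two columns are perfectly correlated, a single component captures nearly all of the variance here, which is exactly the situation PCA exploits.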

  • LLE

A manifold learning method using only local information.

LLE visualization.

NLP

  • LDA

Time Series Analysis

  • AR

Usage

  • Installation

If you want to run the visual examples, please install the package:

  $ git clone https://github.com/zhuzilin/NP_ML
  $ cd NP_ML
  $ python setup.py install

  • Examples in section "Algorithm List"

Run the scripts in NP_ML/example/. For example:

  $ cd example/
  $ python affinity_propagation.py

(Mac/Linux users may face some issues with the data directory paths. Please change them in the corresponding script.)

  • Examples for Statistical Learning Method (《统计学习方法》)

Run the scripts in NP_ML/example/StatisticalLearningMethod/. For example:

  $ cd example/StatisticalLearningMethod
  $ python adaboost.py

Reference

The classical ML algorithms were validated against the simple examples in Statistical Learning Method (《统计学习方法》).

The time series models were validated against examples from Bus 41202.

Something Else

Currently, this repo only implements algorithms that do not need gradient descent. Those will be arranged in another repo, implemented using a framework like pytorch. Coming soon :)
