• Stars
    1,490
  • Rank 31,531 (Top 0.7 %)
  • Language
    Jupyter Notebook
  • License
    Apache License 2.0
  • Created over 6 years ago
  • Updated over 2 years ago

Repository Details

2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

If you want to collaborate on fast algorithms, message me! Join our Discord server on making AI faster, or just to chat about AI: https://discord.gg/k8AtkZqNwr

What's going to be in Hyperlearn 2022!

Hyperlearn is under construction! A brand new stable package will be uploaded sometime in 2022! Stay tuned!

Moonshot Website

Documentation

50 Page Modern Big Data Algorithms PDF

In 2018-2020, I was at NVIDIA helping make GPU ML algos faster! I incorporated Hyperlearn's methods to make TSNE 2000x faster, and sped up other algorithms too. Since then, I have 50+ fast algos, but didn't have time to update Hyperlearn since Moonshot was priority one! I'll be updating Hyperlearn in late 2022!


Hyperlearn's algorithms, methods and repo have been featured or mentioned in 5 research papers!

+ Microsoft, UW, UC Berkeley, Greece, NVIDIA

Hyperlearn's methods and algorithms have been incorporated into more than 6 organizations and repositories!

+ NASA, Facebook's PyTorch, SciPy, CuPy, NVIDIA, UNSW

During Hyperlearn's development, bugs and issues were reported to GCC!


Packages Used

HyperLearn is written completely in PyTorch, NoGil Numba, Numpy, Pandas, Scipy & LAPACK, C++, C, Python, Cython and Assembly, and mirrors (mostly) Scikit Learn. HyperLearn also has statistical inference measures embedded, and can be called using the same syntax as Scikit Learn.

Some key current achievements of HyperLearn:

  • 70% less time to fit Least Squares / Linear Regression than sklearn + 50% less memory usage
  • 50% less time to fit Non Negative Matrix Factorization than sklearn due to new parallelized algo
  • 40% faster full Euclidean / Cosine distance algorithms
  • 50% less time for LSMR iterative least squares
  • New Reconstruction SVD - use SVD to impute missing data! Has .fit AND .transform. Approx 30% better than mean imputation
  • 50% faster Sparse Matrix operations - parallelized
  • RandomizedSVD is now 20 - 30% faster
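
The Reconstruction SVD item above describes using an SVD to impute missing data. The following is a minimal NumPy sketch of that general idea, not Hyperlearn's implementation. The function name `svd_impute`, the `rank` and `n_iter` parameters, and the iterative refill strategy are all assumptions made for illustration.

```python
import numpy as np

def svd_impute(X, rank=2, n_iter=50):
    """Fill NaN entries of X with a rank-`rank` SVD reconstruction (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column-mean imputation, the baseline mentioned above.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only overwrite the missing cells; observed values stay untouched.
        filled[missing] = low_rank[missing]
    return filled

rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 8))   # exactly rank 2
X = truth.copy()
mask = rng.random(X.shape) < 0.1                               # hide about 10% of the cells
X[mask] = np.nan

svd_err = np.abs(svd_impute(X, rank=2)[mask] - truth[mask]).mean()
mean_fill = np.where(mask, np.nanmean(X, axis=0), X)
mean_err = np.abs(mean_fill[mask] - truth[mask]).mean()
print(f"SVD imputation error: {svd_err:.4f}, mean imputation error: {mean_err:.4f}")
```

On synthetic low-rank data like this, the SVD fill should beat the column-mean fill comfortably. Real data is rarely exactly low rank, so the gain there depends on the chosen rank and the share of missing values.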

Around mid 2022, Hyperlearn will evolve to GreenAI and aims to incorporate:

  • New Paratrooper optimizer - fastest SGD variant combining Lookahead, Learning Rate Range Finder, and more!
  • 30% faster Matrix Multiplication on CPUs
  • Software Support for brain floating point (bfloat16) on nearly all hardware
  • Easy compilation on old and new CPU hardware (x86, ARM)
  • 100x faster regular expressions
  • 50% faster and 50% less memory usage for assembly kernel accelerated methods
  • Fast and parallelized New York Times scraper
  • Fast and parallelized NYSE Announcements scraper
  • Fast and parallelized FRED scraper
  • Fast and parallelized Yahoo Finance scraper

I also published a mini 50 page book titled "Modern Big Data Algorithms".

Modern Big Data Algorithms PDF

Comparison of Speed / Memory

Algorithm                              n        p    Time(s)              RAM(mb)              Notes
                                                     Sklearn  Hyperlearn  Sklearn  Hyperlearn
QDA (Quadratic Discriminant Analysis)  1000000  100  54.2     22.25       2,700    1,200       Now parallelized
LinearRegression                       1000000  100  5.81     0.381       700      10          Guaranteed stable & fast

Time(s) is Fit + Predict. RAM(mb) = max( RAM(Fit), RAM(Predict) )
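
The two definitions above can be turned into a small measurement harness. This is a sketch using only the Python standard library, and it is an assumption about how such numbers could be collected, not a record of how the table was produced. `measure` and `MeanModel` are hypothetical names, and `tracemalloc` only sees allocations made through Python's allocator, so native buffers may be undercounted.

```python
import time
import tracemalloc

def measure(step):
    """Run `step()` and return (result, seconds, peak_mb) for that call."""
    tracemalloc.start()
    start = time.perf_counter()
    result = step()
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak_bytes / 1e6

class MeanModel:
    """Toy stand-in for an estimator with fit / predict (hypothetical)."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)

X = [[float(i)] for i in range(10_000)]
y = [2.0 * row[0] for row in X]

model, fit_s, fit_mb = measure(lambda: MeanModel().fit(X, y))
preds, pred_s, pred_mb = measure(lambda: model.predict(X))

time_s = fit_s + pred_s          # Time(s) = Fit + Predict
ram_mb = max(fit_mb, pred_mb)    # RAM(mb) = max(RAM(Fit), RAM(Predict))
print(f"Time(s) = {time_s:.4f}, RAM(mb) = {ram_mb:.3f}")
```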

I've also added some preliminary results for N = 5000, P = 6000

[Figure: preliminary results for N = 5000, P = 6000]


Help is really needed! Message me!


Key Methodologies and Aims

1. Embarrassingly Parallel For Loops

2. 50%+ Faster, 50%+ Leaner

3. Why is Statsmodels sometimes unbearably slow?

4. Deep Learning Drop In Modules with PyTorch

5. 20%+ Less Code, Cleaner Clearer Code

6. Accessing Old and Exciting New Algorithms


1. Embarrassingly Parallel For Loops

  • Including Memory Sharing, Memory Management
  • CUDA Parallelism through PyTorch & Numba
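
As a rough CPU-only illustration of an embarrassingly parallel loop with shared memory, the sketch below splits a matrix into row chunks and processes them in a thread pool. It stands in for the PyTorch and Numba approaches named above and does not reproduce them. The function names and `n_workers` are assumptions, and any speedup depends on NumPy releasing the GIL for the operation in question.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def row_norms_chunk(chunk):
    # Each chunk is independent of the others, which is what makes the loop
    # "embarrassingly parallel": tasks never need to communicate.
    return np.einsum("ij,ij->i", chunk, chunk)

def parallel_row_norms(X, n_workers=4):
    # array_split slices along the rows, so the chunks are views that share
    # memory with X rather than copies.
    chunks = np.array_split(X, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(row_norms_chunk, chunks))   # map preserves chunk order
    return np.concatenate(parts)

X = np.random.default_rng(0).normal(size=(100_000, 20))
norms = parallel_row_norms(X)
print(norms.shape)
```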

2. 50%+ Faster, 50%+ Leaner

3. Why is Statsmodels sometimes unbearably slow?

  • Confidence, Prediction Intervals, Hypothesis Tests & Goodness of Fit tests for linear models are optimized.
  • Using Einstein Notation & Hadamard Products where possible.
  • Computing only what is necessary to compute (Diagonal of matrix and not entire matrix).
  • Fixing the flaws of Statsmodels on notation, speed, memory issues and storage of variables.
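
One standard case of computing only the diagonal is the leverage of each observation in a linear model, h_i = x_i^T (X^T X)^{-1} x_i, which feeds into confidence and prediction intervals. The sketch below shows the idea with Einstein notation in NumPy. It is an illustration chosen for this section, not code from Hyperlearn, and the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2_000, 5
X = rng.normal(size=(n, p))

XtX_inv = np.linalg.inv(X.T @ X)          # small p x p matrix

# Naive: build the full n x n hat matrix, then discard everything off the diagonal.
H_full = X @ XtX_inv @ X.T                # O(n^2) memory
leverage_naive = np.diag(H_full)

# Leaner: compute only the diagonal. This equals the row sums of the Hadamard
# product (X @ XtX_inv) * X, so the n x n matrix is never materialized.
leverage_fast = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

print(np.allclose(leverage_naive, leverage_fast))
```

For n = 1,000,000 rows the full hat matrix would not fit in memory at all, while the diagonal is just a vector of length n.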

4. Deep Learning Drop In Modules with PyTorch

  • Using PyTorch to create Scikit-Learn like drop in replacements.

5. 20%+ Less Code, Cleaner Clearer Code

  • Using Decorators & Functions where possible.
  • Intuitive Middle Level Function names like (isTensor, isIterable).
  • Handles Parallelism easily through hyperlearn.multiprocessing
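
To make the decorator and helper-function points concrete, here is a small sketch in plain Python. Only the name `isIterable` comes from the list above; its body, the `as_list_input` decorator and the `mean` function are hypothetical and are not taken from Hyperlearn's source.

```python
from collections.abc import Iterable
from functools import wraps

def isIterable(x):
    """Hypothetical version of the check: iterable, but not a string or bytes."""
    return isinstance(x, Iterable) and not isinstance(x, (str, bytes))

def as_list_input(func):
    """Hypothetical decorator: wrap scalar inputs in a list before calling `func`."""
    @wraps(func)
    def wrapper(x, *args, **kwargs):
        if not isIterable(x):
            x = [x]
        return func(list(x), *args, **kwargs)
    return wrapper

@as_list_input
def mean(values):
    return sum(values) / len(values)

print(mean(3.0), mean([1, 2, 3]), mean(range(5)))   # prints: 3.0 2.0 2.0
```

Putting the input handling in one decorator keeps each numerical function short, which is one way to get the reduction in code that this section aims for.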

6. Accessing Old and Exciting New Algorithms

  • Matrix Completion algorithms - Non Negative Least Squares, NNMF
  • Batch Similarity Latent Dirichlet Allocation (BS-LDA)
  • Correlation Regression
  • Feasible Generalized Least Squares FGLS
  • Outlier Tolerant Regression
  • Multidimensional Spline Regression
  • Generalized MICE (any model drop in replacement)
  • Using Uber's Pyro for Bayesian Deep Learning

Goals & Development Schedule

Hyperlearn will be revamped in the following months to become Moonshot GreenAI with over an extra 150 optimized algorithms! Stay tuned!! Also you made it this far! If you want to join Moonshot, complete the secretive quiz!

Join Moonshot!


Extra License Terms

  1. The Apache 2.0 license is adopted.

More Repositories

1

sciblox

sciblox - Easier Data Science and Machine Learning
HTML
48
star
2

Microsoft_Hack_2017

Created a Machine Learning system to predict if someone is at risk of suicide, and proposed some mechanisms to counter this and help potential victims.
Jupyter Notebook
4
star
3

AI_Lectures_2017

Data Science Society of UNSW Lecture Series
HTML
3
star
4

Gov_Hack_2017

We used NSW Air Quality data, health data, transport OPAL congestion data and also some novel disease spreading algorithms to try to model and find correlations between Air Quality, Congestion and Influenza rates.
HTML
3
star
5

Markovian_SIR_Deaths_Model

I noticed that traditional methods to predict a disease outbreak relied on sentiment analysis of Twitter posts and Google Search terms. Unfortunately, these methods were inadequate, as Twitter and Google are not popular in all countries. So, I created a system to model and predict outbreaks without the need for social media. The system was able to update the probabilities of a virus spreading from A to B in real time, and I plan to release it to the public next year. I also used Machine Learning and Deep Learning to predict larger long-term virus trends with Google Trends, and this acted as a validator for the MSIRD model.
Jupyter Notebook
3
star
6

Reversing_Markov_Chains

Sometimes we want to "reverse" a Markov Chain process. Taking the inverse of the transition matrix allows this to work, but the inverse result is not a transition matrix. If I wanted to model a population going to work, and then going back home, negative and greater than 1 probabilities in the inverse matrix will cause issues. I propose a method to compute the "inverse" of a transition matrix, and the result is still a transition matrix.
Python
3
star
7

Health_Hack_2017

We developed a data mining solution to scrape data from multiple sources to get the latest and greatest information about health grants. We also designed a cool Jupyter Notebook solution for our 2nd challenge: getting data constantly from a source and showing it in Plotly.
Jupyter Notebook
3
star
8

AI_Lab_2018

2
star
9

danielhanchen.github.io

A collection of machine learning notes
HTML
2
star
10

MBS_Datathon_2017

We created a new Health Rating system that aggregated data from QANTAS, ABS, Victoria Health and supermarket data to reveal correlations between them. We got Special Commendations for our great work!
2
star
11

UNSW_2025_Degree_Recommendation

Jupyter Notebook
2
star
12

aeros

Project Aeros aims to optimise data science processes in Python, and provides the user with easy to use functions and APIs
1
star
13

UNSW_MARK_AI

Jupyter Notebook
1
star
14

danielhanchen

1
star
15

game2019

HTML
1
star
16

cria

Finetuning LLaMA with cleaned datasets
Python
1
star