• Stars: 1,607
• Rank: 29,111 (top 0.6%)
• Language: Haskell
• License: Other
• Created over 12 years ago
• Updated over 8 years ago


Repository Details

Homomorphic machine learning

HLearn

HLearn is a high performance machine learning library written in Haskell. For example, it currently has the fastest nearest neighbor implementation for arbitrary metric spaces (see this blog post).
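"Arbitrary metric spaces" means the search needs only a distance function obeying the metric axioms, not coordinates. The sketch below illustrates that interface idea under stated assumptions. The `Metric` class, the `Point2` and `Str` types, and `nearest` are hypothetical names. It is a brute-force linear scan, not HLearn's (much faster) nearest-neighbor algorithm.

```haskell
-- Hypothetical sketch: brute-force nearest neighbor in any metric space.
-- This is NOT HLearn's algorithm or API.
import Data.List (minimumBy)
import Data.Ord (comparing)

-- | Hypothetical class: a distance satisfying the metric axioms.
class Metric a where
  distance :: a -> a -> Double

-- | Points in the plane with Euclidean distance.
newtype Point2 = Point2 (Double, Double)
  deriving (Show, Eq)

instance Metric Point2 where
  distance (Point2 (x1, y1)) (Point2 (x2, y2)) =
    let dx = x1 - x2
        dy = y1 - y2
     in sqrt (dx * dx + dy * dy)

-- | Strings with a Hamming-style distance: mismatched positions
-- plus the difference in length.
newtype Str = Str String
  deriving (Show, Eq)

instance Metric Str where
  distance (Str a) (Str b) =
    fromIntegral
      ( length (filter id (zipWith (/=) a b))
          + abs (length a - length b)
      )

-- | Linear scan: one distance evaluation per stored point.
nearest :: Metric a => [a] -> a -> Maybe a
nearest [] _ = Nothing
nearest xs q = Just (minimumBy (comparing (distance q)) xs)

main :: IO ()
main = do
  print (nearest [Point2 (0, 0), Point2 (5, 5)] (Point2 (1, 1)))
  print (nearest [Str "cat", Str "dog"] (Str "cot"))
```

The same `nearest` works for both types because it depends only on the `Metric` constraint. A fast library replaces the linear scan with an index structure while keeping that constraint.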

HLearn is also a research project. The research goal is to discover the "best possible" interface for machine learning. This involves two competing demands: the library should be as fast as low-level libraries written in C/C++/Fortran/Assembly, but it should be as flexible as libraries written in high-level languages like Python/R/Matlab. Julia is making amazing progress in this direction, but HLearn is more ambitious. In particular, HLearn's goal is to be faster than the low-level languages and more flexible than the high-level languages.

To achieve this goal, HLearn uses a very different interface than standard learning libraries. The H in HLearn stands for three separate concepts that are fundamental to HLearn's design:

  1. The H stands for Haskell. Machine learning is about estimating functions from data, so it makes sense that a functional programming language would be well suited for machine learning. But functional programming languages are not widely used in machine learning because they have traditionally lacked strong support for the fast numerical computations that learning algorithms require. HLearn uses the SubHask library to get this fast numeric support in Haskell. The two libraries are being developed in tandem.
  2. The H stands for Homomorphisms. Homomorphisms are a fundamental concept in abstract algebra, and HLearn exploits the algebraic structures inherent in learning systems. The following table gives a brief overview of what these structures give us:

    Structure        What we get
    Monoid           parallel batch training
    Monoid           online training
    Monoid           fast cross-validation
    Abelian group    "untraining" of data points
    Abelian group    more fast cross-validation
    R-Module         weighted data points
    Vector space     fractionally weighted data points
    Functor          fast simple preprocessing of data
    Monad            fast complex preprocessing of data
  3. The H stands for the History monad. One of the most difficult tasks in developing a new learning algorithm is debugging the optimization procedure. There has been essentially no previous work on making this debugging process easier, and the History monad tries to solve this problem. It lets you thread debugging information throughout the optimization code without modifying the original code. Furthermore, this technique has no runtime overhead.
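To make the Monoid, Abelian group, and module rows of the table concrete, here is a minimal sketch assuming a toy model: a one-dimensional Gaussian stored as sufficient statistics (total weight, sum of x, sum of x²). Everything here is hypothetical (`Gaussian`, `trainOne`, `train`, `addPoint`, `invert`, `untrain`, `weighted`, `crossValModels`) and uses only base's `Semigroup`/`Monoid`. It is not HLearn's API, which is built on SubHask's own algebraic classes.

```haskell
-- Minimal sketch, NOT HLearn's API: a 1-D Gaussian stored as sufficient
-- statistics. Training is a monoid homomorphism from lists of points.

-- | Total weight, weighted sum of x, weighted sum of x^2.
data Gaussian = Gaussian !Double !Double !Double
  deriving (Show, Eq)

instance Semigroup Gaussian where
  Gaussian n1 s1 q1 <> Gaussian n2 s2 q2 =
    Gaussian (n1 + n2) (s1 + s2) (q1 + q2)

instance Monoid Gaussian where
  mempty = Gaussian 0 0 0

-- | Model trained on a single data point.
trainOne :: Double -> Gaussian
trainOne x = Gaussian 1 x (x * x)

-- | Batch training. Because train (xs ++ ys) == train xs <> train ys,
-- chunks can be trained in parallel and merged afterwards.
train :: [Double] -> Gaussian
train = foldMap trainOne

-- | Online training: merge one new point into an existing model.
addPoint :: Gaussian -> Double -> Gaussian
addPoint g x = g <> trainOne x

-- | Group inverse (hypothetical name); merging it removes data.
invert :: Gaussian -> Gaussian
invert (Gaussian n s q) = Gaussian (negate n) (negate s) (negate q)

-- | "Untraining": remove previously seen points from a model.
untrain :: Gaussian -> [Double] -> Gaussian
untrain g xs = g <> invert (train xs)

-- | Scalar action (hypothetical name): scale a model by weight w.
-- Integer weights need only a module; fractional ones a vector space.
weighted :: Double -> Gaussian -> Gaussian
weighted w (Gaussian n s q) = Gaussian (w * n) (w * s) (w * q)

mean, variance :: Gaussian -> Double
mean (Gaussian n s _) = s / n
variance g@(Gaussian n _ q) = q / n - mean g ^ (2 :: Int)

-- | Leave-one-fold-out models for cross-validation: train each fold once,
-- then subtract it from the total instead of retraining k models.
crossValModels :: [[Double]] -> [Gaussian]
crossValModels folds = [total <> invert f | f <- foldModels]
  where
    foldModels = map train folds
    total = mconcat foldModels

main :: IO ()
main = do
  let xs = [1, 2, 3, 4]
      ys = [5, 6]
  print (train (xs ++ ys) == train xs <> train ys)  -- parallel batch
  print (foldl addPoint mempty xs == train xs)      -- online == batch
  print (untrain (train (xs ++ ys)) ys == train xs) -- untraining
  print (mean (train xs), variance (train xs))
  print (map mean (crossValModels [[1, 2], [3, 4], [5, 6]]))
```

The `==` checks in `main` rely on the small integer-valued data being summed exactly. With arbitrary `Double`s, the merged and directly trained models agree only up to floating-point rounding.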
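The idea of threading debugging information through optimization code without editing that code can be sketched with an ordinary type class. This is a hypothetical illustration, not HLearn's History monad. The `Report` class, its two instances, and `gradientDescent` are invented for this example. The optimizer is written once, and the caller chooses the monad. `Identity` discards every report. The pair monad `([String], a)` from base collects a trace.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Hypothetical sketch of the idea behind the History monad; this is
-- NOT HLearn's implementation or API.
import Data.Functor.Identity (Identity (..))

-- | Hypothetical class: emit a named debugging value.
class Monad m => Report m where
  report :: String -> Double -> m ()

-- | Silent instance: reporting does nothing.
instance Report Identity where
  report _ _ = pure ()

-- | Tracing instance: uses base's writer-like Monad instance for
-- pairs, ((,) w) with Monoid w, to accumulate messages.
instance Report ((,) [String]) where
  report name v = ([name ++ "=" ++ show v], ())

-- | Gradient descent on f(x) = (x - 3)^2, whose gradient is 2(x - 3).
-- The optimization code itself never decides whether to log.
gradientDescent :: Report m => Double -> Int -> Double -> m Double
gradientDescent step = go
  where
    go 0 x = pure x
    go k x = do
      let x' = x - step * 2 * (x - 3)
      report "x" x'
      go (k - 1) x'

main :: IO ()
main = do
  -- Silent run: only the final answer.
  print (runIdentity (gradientDescent 0.25 3 0))
  -- Traced run: the same code, now also returning each iterate.
  let (trace, x) = gradientDescent 0.25 3 0 :: ([String], Double)
  mapM_ putStrLn trace
  print x
```

In the `Identity` instance `report` is a no-op, which GHC can often inline away. This only gestures at the zero-overhead goal described above and makes no claim about how HLearn achieves it.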

The downside of HLearn's ambition is that it currently does not implement many of the popular machine learning techniques.

More Documentation

Due to the rapid pace of development, HLearn's documentation is sparse. That said, the examples folder is a good place to start. The haddock documentation embedded within the code is decent, but unfortunately Hackage cannot build the haddocks because it uses an older version of GHC.

HLearn has several academic papers:

There are also a number of blog posts on my personal website. Unfortunately, they are mostly out of date with the latest version of HLearn. They might help you understand some of the main concepts in HLearn, but the code they use won't work at all.

Contributing

I'd love to have you contribute, and I'd be happy to help you get started! Just create an issue to let me know you're interested and we can work something out.

More Repositories

  1. ucr-cs100 (C++, 483 stars): open source software construction course
  2. subhask (Haskell, 413 stars): Type safe interface for working in subcategories of Hask
  3. HerbiePlugin (Haskell, 191 stars): GHC plugin that improves Haskell code's numerical stability
  4. ifcxt (Haskell, 110 stars): constraint level if statements
  5. cmc-csci046 (TeX, 52 stars): CMC's Data Structures and Algorithms Course Materials
  6. typeparams (Haskell, 41 stars): Lens-like interface for type level parameters; allows unboxed unboxed vectors and supercompilation
  7. cmc-csci143 (TeX, 40 stars): big data course materials
  8. cmc-csci040 (HTML, 37 stars): Computing for the Web
  9. hmm (Haskell, 34 stars): hidden markov models in haskell
  10. cmc-csci145-math166 (TeX, 32 stars): Data Mining
  11. datasets (Scilab, 26 stars): A collection of publicly available datasets
  12. simd (Haskell, 23 stars): simple interface to ghc's simd vector support
  13. gitlearn (Shell, 22 stars): a course management system (similar to ilearn) based on git
  14. parsed (TeX, 21 stars): a haskellified version of the classic sed unix tool
  15. homoiconic (Haskell, 18 stars): Constructs FAlgebras from typeclasses, making Haskell functions homoiconic
  16. deep-tda (TeX, 17 stars)
  17. typespeed (C, 16 stars): fork of the popular typespeed program designed for the UCR cs100 curriculum
  18. cmc-csci181-deeplearning (Python, 15 stars): deep learning course materials
  19. vector-heterogenous (Haskell, 12 stars): Arbitrary size tuples in Haskell
  20. american-shit (11 stars): primary and secondary sources documenting America's misdeeds
  21. geolocation (Python, 9 stars): Image/text geolocation with tensorflow and the MvMF
  22. modulus-magnus-linguae (Python, 8 stars)
  23. ConstraintKinds (Haskell, 7 stars): Implements common Haskell type classes using the constraint kinds pattern to allow constraints.
  24. vector-functorlazy (Haskell, 7 stars): vectors supporting lazy fmap application; asymptotically faster in some cases
  25. metahtml (Python, 5 stars)
  26. chajda (Python, 5 stars)
  27. wiktionary_bli (TeX, 4 stars)
  28. cmc-csci181-languages (HTML, 3 stars)
  29. twitter_coronavirus (Python, 2 stars)
  30. rapache (Shell, 2 stars): a simple webserver built with bash commands
  31. pagila-hw3 (Shell, 1 star)
  32. lab-timeit2 (Python, 1 star)
  33. lambda-server (Shell, 1 star)
  34. pagila-hw2 (Shell, 1 star)
  35. binary_search (Python, 1 star)
  36. korean (1 star)
  37. wardial (Python, 1 star)
  38. lab-goodreads (1 star)
  39. multiskipgram (Python, 1 star)
  40. mediabiasfactcheck (Python, 1 star)
  41. reddit (Python, 1 star)
  42. fake-news (Python, 1 star)
  43. novichenko (Python, 1 star)
  44. twitter_postgres2 (Python, 1 star)
  45. word_ladder (Python, 1 star)
  46. 2023spring-FinTechPracticum (1 star)
  47. pgrollup (PLpgSQL, 1 star): easy creation of rollup tables in postgresql (compute count(*) queries in constant time)
  48. NovichenkoBot (Python, 1 star)
  49. 2021fintech (Jupyter Notebook, 1 star)
  50. sorting (Python, 1 star)
  51. 2020summer (TeX, 1 star)
  52. cv (TeX, 1 star): my curriculum vitae
  53. search_engine (Python, 1 star)
  54. Classification (Haskell, 1 star)
  55. histogram (Haskell, 1 star): A Haskell package for easily creating histograms
  56. dominion (Haskell, 1 star)
  57. haskell-lecture (1 star): Outline of material for an intro to haskell course
  58. pagila-midterm (PLpgSQL, 1 star)
  59. cmc-advising (1 star)
  60. containers (Python, 1 star)
  61. glowing-wookie (Haskell, 1 star)
  62. twitter_postgres (Python, 1 star)
  63. 2021summer (1 star)
  64. hperfstat (1 star): haskell bindings to linux's cpu hw counters (i.e. results of `perf stat`)