• This repository has been archived on 24/Jul/2020

MILK: Machine Learning Toolkit

Machine Learning in Python

Milk is a machine learning toolkit in Python.

Its focus is on supervised classification with several classifiers available: SVMs (based on libsvm), k-NN, random forests, decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems.

For unsupervised learning, milk supports k-means clustering and affinity propagation.
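As a rough illustration of what k-means clustering does, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not milk's implementation (which is written for speed and low memory use); the function name `kmeans_sketch` and the choice of seeding the centroids with the first k points are assumptions made only for this example.

```python
def kmeans_sketch(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / float(len(members)) for col in zip(*members)]
    return assignments, centroids


points = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.9], [0.1, 0.3], [4.8, 5.2]]
assignments, centroids = kmeans_sketch(points, 2)
print(assignments)
print(centroids)
```

The two well-separated groups in `points` end up in different clusters, and each centroid settles on the mean of its group.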

Milk is flexible about its inputs. It is optimised for numpy arrays, but can often handle anything (for example, for SVMs, you can use any datatype and any kernel and it does the right thing).

There is a strong emphasis on speed and low memory usage. Therefore, most of the performance-sensitive code is written in C++ and exposed through Python-based interfaces for convenience.

To learn more, check the docs at http://packages.python.org/milk/ or the code demos included with the source at milk/demos/.

Examples

Here is how to test how well you can classify some (features, labels) data, measured by cross-validation:

import numpy as np
import milk
features = np.random.rand(100,10) # 2d array of features: 100 examples of 10 features each
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
confusion_matrix, names = milk.nfoldcrossvalidation(features, labels)
print('Accuracy: {0}'.format(confusion_matrix.trace()/float(confusion_matrix.sum())))
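
The accuracy in the last line of the example above is the fraction of examples that fall on the diagonal of the confusion matrix. The sketch below redoes that calculation in plain Python, without milk. The helper names `confusion_matrix_from` and `accuracy` are hypothetical, and the orientation (rows as true labels, columns as predictions) is an assumption for illustration rather than a statement about milk's output.

```python
def confusion_matrix_from(true_labels, predicted_labels, names):
    """Count (true, predicted) label pairs; rows are true labels (assumed orientation)."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Trace divided by total count, the same ratio as in the example above."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / float(total)


true_labels = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 0]
cm = confusion_matrix_from(true_labels, predicted, names=[0, 1])
print(cm)
print(accuracy(cm))
```

Because the accuracy depends only on the diagonal and the total, it is the same whichever way the matrix is oriented.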

If you want to use a classifier, you instantiate a learner object and call its train() method:

import numpy as np
import milk
features = np.random.rand(100,10)
labels = np.zeros(100)
features[50:] += .5
labels[50:] = 1
learner = milk.defaultclassifier()
model = learner.train(features, labels)

# Now you can use the model on new examples:
example = np.random.rand(10)
print(model.apply(example))
example2 = np.random.rand(10)
example2 += .5
print(model.apply(example2))

There are several classification methods in the package, but they all use the same interface: train() returns a model object, which has an apply() method to execute on new instances.
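
To make the shape of that interface concrete, here is a toy learner written in plain Python that follows the same train()/apply() pattern. `NearestMeanLearner` and `NearestMeanModel` are hypothetical names invented for this sketch and are not part of milk; the nearest-mean rule is used only because it is short.

```python
class NearestMeanModel(object):
    """Hypothetical model: stores one mean vector per label."""

    def __init__(self, means):
        self.means = means  # dict mapping label -> list of per-feature means

    def apply(self, example):
        """Return the label whose mean is closest (squared Euclidean distance)."""
        def sqdist(mean):
            return sum((x - m) ** 2 for x, m in zip(example, mean))
        return min(self.means, key=lambda label: sqdist(self.means[label]))


class NearestMeanLearner(object):
    """Hypothetical learner: train() returns a model object."""

    def train(self, features, labels):
        sums, counts = {}, {}
        for row, label in zip(features, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        means = {label: [s / counts[label] for s in acc]
                 for label, acc in sums.items()}
        return NearestMeanModel(means)


features = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]
labels = [0, 0, 1, 1]
model = NearestMeanLearner().train(features, labels)
print(model.apply([0.0, 0.3]))
print(model.apply([1.1, 0.9]))
```

Keeping the trained state in a separate model object means the learner itself stays stateless and can be reused on other data.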

Details

License: MIT

Author: Luis Pedro Coelho (with code from LibSVM and scikits.learn)

API Documentation: http://packages.python.org/milk/

Mailing List: http://groups.google.com/group/milk-users

Features

  • SVMs. Using the libsvm solver with a pythonesque wrapper around it.
  • LASSO
  • K-means using as little memory as possible. It can cluster millions of instances efficiently.
  • Random forests
  • Self organising maps
  • Stepwise Discriminant Analysis for feature selection.
  • Non-negative matrix factorisation
  • Affinity propagation

Recent History

The ChangeLog file contains a more complete history.

New in 0.6.1 (11 May 2015)

  • Fixed source distribution

New in 0.6 (27 Apr 2015)

  • Update for Python 3

New in 0.5.3 (19 Jun 2013)

  • Fix MDS for non-array inputs
  • Fix MDS bug
  • Add return_* arguments to kmeans
  • Extend zscore() to work on non-ndarrays
  • Add frac_precluster_learner
  • Work with older C++ compilers

New in 0.5.2 (7 Mar 2013)

  • Fix distribution of Eigen with source

New in 0.5.1 (11 Jan 2013)

  • Add subspace projection kNN
  • Export pdist in milk namespace
  • Add Eigen to source distribution
  • Add measures.curves.roc
  • Add mds_dists function
  • Add verbose argument to milk.tests.run

New in 0.5 (05 Nov 2012)

  • Add coordinate-descent based LASSO
  • Add unsupervised.center function
  • Make zscore work with NaNs (by ignoring them)
  • Propagate apply_many calls through transformers
  • Much faster SVM classification, which means a much faster defaultlearner() [measured 2.5x speedup on yeast dataset!]

For older versions, see the ChangeLog file.
