• Stars
    star
    135
  • Rank 269,297 (Top 6 %)
  • Language
    C
  • Created about 10 years ago
  • Updated over 7 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

KMeans benchmark

This benchmark is born to compare the performance of Pharo 3 in executing a simple machine learning algorithm with a reference implementation in Python and Scala. Since then, it got a little out of hand, and a few other implementations are available.

This benchmark is not going to be updated anymore after 30/3/2017. I don't have the time to reinstall all languages on the original machine. Still, you can play with it and test everything on your computer. Thus, I will not accept any more PRs to this repository.

Rules

The implementations should all follow the same algorithm, and be optimized for idiomatic code and not for speed. The example is intended to compare time of execution for a typical machine learning algorithm, ideally during an interactive session, instead of highly optimized production code. As such, it is important that the code is straightforward and that there is no separate phase to prepare the caches.

The points are in points.json, and are to be grouped into 10 clusters, using 15 iterations of kmeans. The initial centroids are initialized to the first 10 points, and we take an average over 100 runs.

Results

Time for running on my laptop are available under results. A few surprises:

  • Writing a working Rust implementation was surprisingly difficult; writing one that would perform decently even more so. I had to rely frequently on help from people online.
  • PyPy is able to outperform Scala
  • Factor is pretty impressive, given that it is a fairly small project with a dedicated VM. With an implementation in 8 (!) lines, we get the a fairly performing dynamic language
  • Nim was also quite impressive: my first implementation was as easy as Python, and it was just behind Rust; when an unnecessary copy was removed, it turned out to be the fastest.

Contribute

If you want to contribute an implementation in a different language, please file a PR. Try to follow the same logic that is used in the examples in other languages - for instance, using a group by operation where available. As you may notice, the algorithm is not optimized, and intentionally so: while K-means in particular has various possible optimizations, other similar algorithms may fail to have the particular shape that makes these optimizations viable.

For the curious folks, I have tried a more optimized (single-threaded) implementation in Nim, that avoids the square root in the distance and accumulates the sum of points near a centroid, rather than putting them into a data structure. For comparison, this version runs in 67 ms. Computers are actually quite fast, these days!

How to run

C

sudo apt-get install libglib2.0-0
sudo apt-get install libjansson-dev # or equivalent for your OS
./compile.sh
./kmeans

C++

We use BiiCode for building. Assuming you have it installed, from the cpp directory do

bii init -L
bii configure -DCMAKE_BUILD_TYPE=RELEASE
bii build
bin/user_cpp_benchmark

Chapel

Before compiling chapel please do:

export CHPL_LLVM=llvm

to enable LLVM support (this is used for the json import in C). Then, make sure that chpl is on your $PATH (for instance with source source util/setchplenv.sh). Finally:

make
./kmeans

Clojure: lein with-profile uberjar run

Common Lisp: sbcl --script kmeans.lisp

Crystal:

crystal build kmeans.cr --release
./kmeans

CUDA

sudo apt-get install libjansson-dev # or equivalent for your OS (e.g. on Mac you can: brew install jansson)
cmake .
make

then

./kmeans.out [ input_file.json number_of_points number_of_centroids ]

D:

dmd -O -inline -release -noboundscheck main.d
./main

Elixir:

elixirc kmeans.ex
elixir main.exs

Erlang:

erl
1> c(main).
2> c(kmeans).
3> main:run().

F#:

make
make run

Factor:

USE: kmeans.benchmark
100 "../points.json" kmeans-benchmark

Go

go build main.go
./main

Haskell:

cabal install --only-dependencies
cabal build
dist/build/kmeans/kmeans

Java:

mvn compile
mvn exec:java

Java 8 (Streams and Lambdas):

mvn compile
mvn exec:java

Julia:

julia -e 'Pkg.add("JSON")'
julia kmeans.jl

Kotlin:

mvn compile exec:java

Lua: download this JSON library and put it in the same folder as the main file. Then run

lua kmeans.lua
luajit kmeans.lua

Nim:

nim c -d:release benchmark
./benchmark

Node:

npm install
node kmeans.js

OCaml:

opam install core yojson
corebuild -pkg yojson main.native
./main.native

OpenCL: need to have Nim and Nimble installed, as well as a NVIDIA GPU. Check the library paths in kmeans.nimble, then run nimble kmeans.

OpenMP

make

./kmeans.out [ inputfile.json number_of_points number_of_centroids number_of_threads ]

or:

./kmeans.out [number_of_threads]

Parasail: assume pslc.csh is on $PATH. Then

pslc.csh -O3 point.psl kmeans.psl benchmark.psl -o benchmark
./benchmark

Perl:

perl kmeans.pl

Pharo3: install NeoJSON and file-in Kmeans.st, then open a workspace and write something like

| path points kmeans |

path := '../points.json'.

kmeans := KMeans new
  iterations: 15;
  clusters: 10;
  yourself.

StandardFileStream readOnlyFileNamed: path
  do: [ :stream |
    points := (NeoJSONReader on: stream) next collect: [ :each |
      (each first) @ (each second)
    ].
  ].

kmeans benchmark: points repeating: 100

Python:

python kmeans.py
pypy kmeans.py

Pony:

make
make run

Ruby:

ruby kmeans.rb
rbx kmeans.rb

Rust:

cargo run --release

Scala: sbt run

Scala-Js:

First, generate the compiled javascript with sbt fullOptJS. Then, cd target/scala-2.11, open node and

> require('./kmeans-opt')
> require('./kmeans-launcher')

At first, it seems that nothing is going on, but after a while you should see the results printed.

Scala-Native: sbt run

Stanza:

Install Stanza following instructions here and put it into your PATH.

./compile.sh
./kmeans

Swift:

swiftc -Ounchecked kmeans.swift main.swift
./main

X10: First

mkdir cbin
mkdir javabin

Make sure that the bin folder of X10 is on your path. Then, for the Java target, decomment lines with Java json import inside Main.x10 and

make java
make runJava

For the native target, decomment lines with C json import inside Main.x10 and

make c
make runC

More Repositories

1

paths-js

Generate SVG paths for geometric shapes 📊
JavaScript
1,715
star
2

patty

A pattern matching library for Nim
Nim
276
star
3

neo

A matrix library
HTML
244
star
4

rosencrantz

A web DSL for Nim
Nim
198
star
5

linear-algebra

Linear algebra for Nim
Nim
140
star
6

react.nim

React.js bindings for Nim
Nim
92
star
7

memo

Memoization for Nim
Nim
78
star
8

cello

A string library
Nim
76
star
9

ganzo

A GAN framework
Python
66
star
10

factor-tutorial

From function composition to distributed programming
65
star
11

csvtools

Manage CSV files in Nim
Nim
45
star
12

alea

Define and compose random variables
Nim
44
star
13

paths-js-demo

Demo application for paths.js
CSS
38
star
14

emmy

Nim
37
star
15

interfaced

Nim
32
star
16

on-rust-and-nim

A first experience with Rust
30
star
17

paths-scala-js

Scala.js binding for Paths.js
Scala
29
star
18

charade

A server for multilanguage, composable NLP API in Python
Python
28
star
19

lethe

Oblivious RAM for Scala
Scala
19
star
20

paths-js-react-demo

Demo application for Paths.js
JavaScript
18
star
21

commutative-algebra

An introduction to the basic ideas of commutative algebra
16
star
22

rxnim

Nim
14
star
23

spills

Disk-based sequences
Nim
12
star
24

neurotic

Nim
11
star
25

factor-work

Miscellaneous Factor utilities
Factor
5
star
26

ractive-stack

Skeleton of a Ractive - Coffeescript - Require.js app
CoffeeScript
5
star
27

paths-scala-js-demo

Demo application for Paths.scala.js
Scala
5
star
28

react-stack

Skeleton of a React - Coffeescript - Require.js app
CoffeeScript
4
star
29

factor-package-manager

Factor
3
star
30

math-notes

Short mathematical notes
3
star
31

factor-packages

Factor
3
star
32

levenshtein

Levenshtein distance benchmark
Nim
3
star
33

paths-talk-examples

Examples from the talk on Paths.js
JavaScript
2
star
34

vagga-examples

Ruby
2
star
35

paths-talk-slides

Slides for the talk on Paths.js
JavaScript
2
star
36

wgan

Wasserstein GAN with Gradient Penalty
Python
1
star
37

backupper

A simple tool for incremental copies
Nim
1
star