• This repository has been archived on 07/May/2020
  • Language: Common Lisp
  • License: MIT License

The winning solution to the Higgs Boson Machine Learning Challenge.

This is the winning solution to the Higgs Boson Machine Learning Challenge on Kaggle by Gábor Melis.

Instructions for the batteries included distribution

For the contest, one has to submit a zip file with all the libraries and be able to reproduce the submitted results exactly. The instructions in this section relate to that.

The winning submission was a bag of 70 dropout neural networks and took one day to train on a GTX Titan GPU in double precision mode. Prediction took about an hour.

My tests indicate that 8-16 neural networks perform very close to 70, so this was probably overkill.
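
To make the idea of a bag concrete, here is a minimal Python sketch of combining several networks' outputs by averaging. It assumes each model yields one signal probability per event and that the bag's prediction is the plain mean. The function name `bag_predictions` and the numbers are hypothetical, and the combination rule actually used in the source may differ.

```python
def bag_predictions(per_model_probs):
    """Average per-event probabilities over the models in a bag.

    per_model_probs: list of equally long lists, one list per model
    (a hypothetical layout, for illustration only).
    """
    if not per_model_probs:
        raise ValueError("need at least one model")
    n_events = len(per_model_probs[0])
    if any(len(probs) != n_events for probs in per_model_probs):
        raise ValueError("all models must score the same events")
    n_models = len(per_model_probs)
    # Mean over models, computed separately for every event.
    return [sum(probs[i] for probs in per_model_probs) / n_models
            for i in range(n_events)]


# Two made-up models scoring three events.
bagged = bag_predictions([[0.25, 0.75, 0.5],
                          [0.75, 0.25, 0.5]])
print(bagged)
```

Averaging more models mostly reduces the variance of the prediction, which is consistent with returns diminishing well before 70 networks.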

Prerequisites

Summary of what's needed:

  • NVIDIA CUDA Toolkit (tested with 5.5.22, build 331.67)
  • a CUDA card, preferably a Titan (in double precision mode) or better
  • to rebuild the executable, SBCL (a Common Lisp compiler)
  • 24GiB RAM
  • about 4GiB of disk space

Let's see why these things are needed. The CUDA card and the NVIDIA CUDA Toolkit are prerequisites for two reasons: training of the dropout networks was performed on a CUDA card and relied on the cuRAND random number generator, and the results of deterministic operations on the GPU and the CPU can and do differ within the constraints of the IEEE 754 floating-point standard.
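
One reason such differences arise is that floating-point addition is not associative, so a parallel reduction can legitimately give a different result from a sequential loop. The following Python sketch illustrates this on the CPU only. The two helper functions are hypothetical and merely stand in for the different summation orders; they are not taken from the source.

```python
def sum_left_to_right(xs):
    """Sequential accumulation, as in a simple CPU loop."""
    total = 0.0
    for x in xs:
        total += x
    return total


def sum_pairwise(xs):
    """Tree-shaped reduction, similar in spirit to a parallel sum."""
    if not xs:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return sum_pairwise(xs[:mid]) + sum_pairwise(xs[mid:])


values = [0.1, 0.2, 0.3]
sequential = sum_left_to_right(values)   # (0.1 + 0.2) + 0.3
tree = sum_pairwise(values)              # 0.1 + (0.2 + 0.3)
print(sequential == tree)                # the two orders disagree
print(abs(sequential - tree))            # by roughly one rounding error
```

Both results are valid under IEEE 754, which is why exact reproduction requires the same hardware and library stack rather than just the same algorithm.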

A Titan or better is recommended because it is many times faster in double precision mode than a GTX 480 or similar alternatives. I haven't tested with single-precision floats, but they are likely to work just as well.

SBCL is only necessary to rebuild the executable. SBCL 1.1.14 was used in the submission, but any recent version should work.

I'm not sure about the actual minimum memory required. It may work with 8GiB, but it was tested with 24GiB, and that's the heap size with which the executable was built.

Reproducing the results

The zip file contains an executable built on a recent Debian Linux x86-64 Testing installation. This executable is the file 'rumcajsz' in the top-level directory.

If this executable works, there is no need to build anything; training and prediction can be done right away:

    ./higgsml-train <path/to/training.csv> <save-dir-for-trained-model>
    ./higgsml-run <path/to/test.csv> <save-dir-for-trained-model> \
        <path/to/submission.csv>

Note that higgsml-train logs its doings to stdout and also to

     <save-dir-for-trained-model>/rumcajsz.log

If higgsml-train fails with a library-related or ELF-format-related error, then the executable may not be compatible with slc5 and needs to be rebuilt with:

    ./higgsml-build

After rebuilding the executable, the above higgsml-{train,run} commands should work.
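
If the two steps need to be driven from a script, the train and run commands can be chained as in the Python sketch below. The script names and argument order come from the commands above. The helper names and all file paths are hypothetical placeholders, and the example only prints the commands instead of executing them.

```python
import subprocess


def pipeline_commands(training_csv, test_csv, model_dir, submission_csv):
    """Build the argument lists for training and then prediction."""
    train = ["./higgsml-train", str(training_csv), str(model_dir)]
    run = ["./higgsml-run", str(test_csv), str(model_dir),
           str(submission_csv)]
    return [train, run]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Placeholder paths; substitute the real locations before running.
commands = pipeline_commands("data/training.csv", "data/test.csv",
                             "model", "submission/submission.csv")
for argv in commands:
    print(" ".join(argv))
# run_pipeline(commands)  # uncomment to actually train and predict
```

Using `check=True` means a failed training step raises an exception, so prediction is never attempted against a missing or incomplete model directory.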

On the public and private leaderboards, the submission achieved AMS scores of 3.78457 and 3.80581, respectively. Cross-validation on the training set indicated 3.82-3.85, depending on how much smoothing was applied to the AMS-vs-cutoff curve. For the raw curve, see doc/cv-ams-vs-cutoff.png.
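
For reference, the challenge metric is the approximate median significance (AMS), computed from the weighted sums of selected signal (s) and background (b) events with a regularization term of 10. The Python sketch below computes AMS along a ranking of events and smooths the resulting curve. The function names, the data layout, and the moving-average smoothing are assumptions made for illustration; the source does not state which smoothing was applied.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance as defined by the challenge."""
    inner = (s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(2.0 * max(0.0, inner))


def ams_vs_cutoff(scores, is_signal, weights):
    """AMS when the top-ranked fraction of events is selected as signal.

    Returns (fraction_selected, ams) pairs, one per possible cutoff.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=True)
    s = b = 0.0
    curve = []
    for rank, i in enumerate(order, start=1):
        if is_signal[i]:
            s += weights[i]
        else:
            b += weights[i]
        curve.append((rank / len(order), ams(s, b)))
    return curve


def moving_average(values, window=3):
    """Centered moving average, truncated at both ends (hypothetical)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Made-up toy data: three events with unit weights.
curve = ams_vs_cutoff([0.9, 0.8, 0.1], [True, True, False],
                      [1.0, 1.0, 1.0])
best_fraction, best_ams = max(curve, key=lambda point: point[1])
print(best_fraction, best_ams)
```

Picking the cutoff from a smoothed curve rather than the raw one makes the choice less sensitive to noise, which is presumably why the cross-validated estimate depends on the amount of smoothing.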

Instructions for the source distribution

  1. Install either SBCL or AllegroCL. SBCL is free and is available on all Linux distributions.
  2. Set some options:

     $ ./configure --help
     Usage: ./configure
               [--lisp [ acl | sbcl]]
               [--sbcl-bin <sbcl-binary>]
               [--sbcl-options <sbcl-options>]
               [--acl-bin <acl-binary>]
               [--acl-options <acl-options>]
               [--data-dir <directory>]
               [--model-dir <directory>]
               [--submission-dir <directory>]
    
     --lisp
       defaults to 'sbcl'.
     --sbcl-bin
       Path to the sbcl executable. Defaults to 'sbcl'.
     --sbcl-options
       Defaults to '--dynamic-space-size 24000 --noinform
                    --lose-on-corruption --end-runtime-options
                    --non-interactive --no-userinit --no-sysinit
                    --disable-debugger'.
     --acl-bin
       Path to the AllegroCL executable. Defaults to 'alisp'.
     --acl-options
       Defaults to ''.
     --data-dir
       Where the uncompressed csv files reside, defaults to 'data/'.
     --model-dir
       Where the model files are saved, defaults to 'model/'.
     --submission-dir
       Where submission files are saved, defaults to 'submission/'.
    

The directories can be absolute paths or paths relative to the configuration script.

Installing dependencies to a project local quicklisp dir

The following command will set up a new quicklisp directory right below the top-level directory and fetch all dependencies with the exact versions with which this program was tested:

    make quicklisp

Installing dependencies to a global quicklisp dir

Well, you are on your own, but you may want to use the following command to fetch the dependencies not in quicklisp:

    (cd ~/quicklisp/local-projects &&
         <path-to-here>/build/install-local-projects.sh)

If you are having trouble, note that the rest of the libraries were from the 2015-01-13 quicklisp distribution. See build/install-dependencies.lisp for how to get historical distributions.

Running and Developing

Place training.csv and test.csv into the data directory (by default ./data/).

Train the model and create the leaderboard prediction files by evaluating the commented-out form at the bottom of src/bpn.lisp.

Also see the description of the command line scripts higgsml-{build,train,run} above.
