• Stars: 100
• Rank: 338,711 (Top 7%)
• Language: Python
• License: MIT License
• Created over 8 years ago
• Updated over 8 years ago

Repository Details

Creating a better validation set when test examples differ from training examples

Adversarial validation

The santander dir holds the scripts for the Santander competition:

distinguish_train_test.py - try to distinguish train/test set examples
validate.py - get validation AUC scores for logistic regression and random forest
predict.py - output test predictions from logistic regression and random forest
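The idea behind distinguish_train_test.py can be sketched as follows. This is a minimal, hypothetical version, not the repository's code. It assumes numeric feature rows given as lists of floats, and it swaps in a hand-written logistic regression so that it runs with the standard library alone. Training rows get label 0, test rows get label 1, and the out-of-fold AUC shows how separable the two sets are: near 0.5 means test examples look like training examples, while near 1.0 means they differ and a random validation split is likely to be misleading.

```python
import math
import random

# Hypothetical helper names; not taken from the repository.


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg(X, y, lr=0.1, epochs=200):
    """Batch gradient-descent logistic regression (a stand-in for a library classifier)."""
    d = len(X[0])
    w, b, n = [0.0] * d, 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_proba(model, X):
    w, b = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ordered correctly, ties count half."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def adversarial_auc(train_X, test_X, folds=5, seed=0):
    """Out-of-fold AUC of a classifier trained to tell train rows (0) from test rows (1)."""
    X = train_X + test_X
    y = [0] * len(train_X) + [1] * len(test_X)
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(X)
    for k in range(folds):
        held = idx[k::folds]
        held_set = set(held)
        fit = [i for i in idx if i not in held_set]
        model = fit_logreg([X[i] for i in fit], [y[i] for i in fit])
        for i, p in zip(held, predict_proba(model, [X[i] for i in held])):
            scores[i] = p
    return auc(y, scores)


# Synthetic demo data: one test set drawn like the training set, one with a shifted first feature.
rng = random.Random(1)
train = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(150)]
test_same = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(150)]
test_shifted = [[rng.gauss(2, 1), rng.gauss(0, 1)] for _ in range(150)]
auc_same = adversarial_auc(train, test_same)        # expected to be close to 0.5
auc_shifted = adversarial_auc(train, test_shifted)  # expected to be well above 0.5
print(round(auc_same, 3), round(auc_shifted, 3))
```

With a library such as scikit-learn the same check would typically use a stronger classifier and its built-in cross-validation; the hand-rolled version only keeps the example dependency-free.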

Similarly, the 'numerai' dir contains the Numerai scripts:

distinguish_train_test.py - try to distinguish train/test set examples
sort_train.py - sort training examples by their similarity to test examples
validate_sorted.py - get validation scores on the most test-like training examples
predict.py - output test predictions
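The sorting and test-like validation split done by sort_train.py and validate_sorted.py can be sketched in the same spirit. This is a hypothetical illustration, not the repository's code: it assumes each training row already has a probability of belonging to the test set (for example, out-of-fold predictions from a train-vs-test classifier), and the function name `split_by_test_likeness` and the 20% default are made up for the example.

```python
def split_by_test_likeness(p_test, val_fraction=0.2):
    """Return (fit_idx, val_idx) for a test-like validation split.

    p_test[i] is an assumed, precomputed probability that training row i
    looks like a test row. Rows are sorted by it, highest first, and the
    most test-like fraction is held out for validation.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    order = sorted(range(len(p_test)), key=lambda i: p_test[i], reverse=True)
    n_val = max(1, round(len(p_test) * val_fraction))
    return order[n_val:], order[:n_val]


# Toy example: rows 1 and 3 look most like test data, so they become the validation set.
p_test = [0.1, 0.9, 0.3, 0.8, 0.2]
fit_idx, val_idx = split_by_test_likeness(p_test, val_fraction=0.4)
print(val_idx)  # [1, 3]
print(fit_idx)  # [2, 4, 0]
```

A model is then fit on the rows in `fit_idx` and scored on the rows in `val_idx`, so the validation score comes from the training examples most similar to the test set rather than from a random sample.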

More Repositories

1. goodbooks-10k - Ten thousand books, six million ratings (Jupyter Notebook, 788 stars)
2. hyperband - Tuning hyperparams fast with Hyperband (Python, 587 stars)
3. phraug - A set of simple Python scripts for pre-processing large files (Python, 271 stars)
4. phraug2 - A new version of phraug, which is a set of simple Python scripts for pre-processing large files (Python, 206 stars)
5. numer.ai - Validation and prediction code for numer.ai (Python, 150 stars)
6. kaggle-blackbox - Deep learning made easy (MATLAB, 115 stars)
7. classifying-text - Classifying text with bag-of-words (Python, 114 stars)
8. evaluating-recommenders - Compute and plot NDCG for a recommender system (Python, 95 stars)
9. time-series-classification - Classifying time series using feature extraction (Python, 86 stars)
10. classifier-calibration - Reliability diagrams, Platt's scaling, isotonic regression (Python, 71 stars)
11. kaggle-advertised-salaries - Predicting job salaries from ads: a Kaggle competition (Python, 55 stars)
12. the-secret-of-the-big-guys - k-means + a linear model = good results (Python, 55 stars)
13. pointer-networks-experiments - Sorting numbers with pointer networks (Python, 55 stars)
14. kaggle-cats-and-dogs - Classifying images with OverFeat (Python, 46 stars)
15. kaggle-stackoverflow - Predicting closed questions on Stack Overflow (Python, 46 stars)
16. gaussrank - Preparing continuous features for neural networks with GaussRank (Python, 45 stars)
17. kaggle-happiness - Predicting happiness from demographics and poll answers (Python, 45 stars)
18. kaggle-cifar - Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet (Python, 44 stars)
19. sofia-ml-mod - sofia-kmeans with sparse RBF cluster mapping (C++, 42 stars)
20. pylearn2-practice - Pylearn2 in practice (Python, 41 stars)
21. kaggle-burn-cpu - Code for the "Burn CPU, burn" competition at Kaggle. Uses Extreme Learning Machines and hyperopt. (Python, 33 stars)
22. kaggle-amazon - Amazon access control challenge (Python, 25 stars)
23. pybrain-practice - A regression example for PyBrain (Python, 25 stars)
24. wine-quality - Predicting wine quality (R, 25 stars)
25. dimensionality-reduction-for-sparse-binary-data - Convert a lot of zeros and ones to fewer real numbers (Python, 23 stars)
26. cubert - How to make those 3D data visualizations (JavaScript, 22 stars)
27. kaggle-gender - A Kaggle competition: discriminate gender based on handwriting (Python, 21 stars)
28. msda-denoising - Using a very fast denoising autoencoder (MATLAB, 17 stars)
29. kaggle-solar - Code for Solar Energy Prediction Contest at Kaggle (Python, 17 stars)
30. nonlinear-vowpal-wabbit - How to use automatic polynomial features and neural network mode in VW (Python, 17 stars)
31. metric-learning-for-regression - Applying metric learning to kin8nm (MATLAB, 16 stars)
32. kaggle-avito - Code for the Avito competition (Python, 16 stars)
33. kaggle-rossmann - Predicting sales with Pandas (Python, 15 stars)
34. spearmint - Tuning hyperparams automatically with spearmint (R, 15 stars)
35. kaggle-accelerometer - Code for Accelerometer Biometric Competition at Kaggle (Python, 15 stars)
36. large-scale-linear-learners - VW, Liblinear and StreamSVM compared on webspam (Python, 14 stars)
37. r-libsvm-format-read-write - R code for reading and writing files in libsvm format (R, 14 stars)
38. stardose - A recommender system for GitHub repositories (Python, 13 stars)
39. running-external-programs-from-python (Python, 11 stars)
40. feature-selection - Selecting features for classification with MRMR (R, 11 stars)
41. kaggle-merck - Merck challenge at Kaggle (Python, 10 stars)
42. kaggle-stumbleupon - Bag of words + sparsenn (Python, 10 stars)
43. project-rhubarb - Predicting mortality in England using air quality data (Python, 9 stars)
44. kaggle-bestbuy_big - Code for the Best Buy competition at Kaggle (Python, 8 stars)
45. kaggle-digits - Some code for the Digits competition at Kaggle, incl. pylearn2's maxout (MATLAB, 8 stars)
46. misc - misc (Jupyter Notebook, 7 stars)
47. kaggle-poker-hands - Code for the Poker Rule Induction competition (Python, 7 stars)
48. kaggle-bestbuy_small (Python, 6 stars)
49. AlpacaGPT - How to train your own ChatGPT, Alpaca style (Python, 3 stars)
50. kaggle-jobs - Some auxiliary code for Kaggle job recommendation challenge (Python, 2 stars)