zygmuntz/adversarial-validation

Language: Python
License: MIT License
Created over 8 years ago
Updated over 8 years ago

Creating a better validation set when test examples differ from training examples

Adversarial validation

The 'santander' dir holds the scripts for the Santander competition:

distinguish_train_test.py - try to distinguish train/test set examples
validate.py - get validation AUC scores for logistic regression and random forest
predict.py - output test predictions from logistic regression and random forest
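
The idea behind the first script, telling train and test examples apart, can be sketched as follows. This is a dependency-free illustration and not the repository's code: the hand-rolled logistic regression, the synthetic Gaussian data, and every function name (`fit_logistic`, `auc`, `adversarial_auc`, `gaussian_rows`) are assumptions made for the example. Rows are labeled by origin (0 for train, 1 for test), a classifier is fit on half of the combined data, and AUC is measured on the other half. An AUC near 0.5 suggests the two sets look alike; an AUC well above 0.5 suggests they differ.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    # Batch gradient descent on the log loss; returns (weights, bias).
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(labels, scores):
    # Probability that a random positive outscores a random negative
    # (ties count as half a win).
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def adversarial_auc(train_rows, test_rows, seed=0):
    # Label origin: 0 = train, 1 = test. Fit on one half, score the other.
    rows = [(r, 0) for r in train_rows] + [(r, 1) for r in test_rows]
    random.Random(seed).shuffle(rows)
    half = len(rows) // 2
    fit_part, held_out = rows[:half], rows[half:]
    w, b = fit_logistic([r for r, _ in fit_part], [l for _, l in fit_part])
    scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, r)))
              for r, _ in held_out]
    return auc([l for _, l in held_out], scores)


def gaussian_rows(n, mean, rng):
    # Synthetic two-feature rows; purely illustrative data.
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]


rng = random.Random(42)
train = gaussian_rows(300, 0.0, rng)
test_same = gaussian_rows(300, 0.0, rng)      # same distribution as train
test_shifted = gaussian_rows(300, 1.5, rng)   # mean-shifted "test" set

print("same distribution AUC:", round(adversarial_auc(train, test_same), 3))
print("shifted test AUC:", round(adversarial_auc(train, test_shifted), 3))
```

In practice a library classifier and cross-validated predictions would likely replace the hand-written pieces; the single half/half split here only keeps the sketch short.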

Similarly, the 'numerai' dir contains the Numerai scripts:

distinguish_train_test.py - try to distinguish train/test set examples
sort_train.py - sort training examples by their similarity to test examples
validate_sorted.py - get validation scores using the most test-like examples
predict.py - output test predictions
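
The sort-then-validate step described in the list above might look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's implementation: the function name `split_by_test_likeness` is hypothetical, and the `test_likeness` scores are assumed to be each training row's predicted probability of being a test example, produced by some train-vs-test classifier (ideally as out-of-fold predictions, so that no row is scored by a model that saw it).

```python
def split_by_test_likeness(train_rows, test_likeness, n_validation):
    """Hold out the n_validation most test-like training rows for validation."""
    if len(train_rows) != len(test_likeness):
        raise ValueError("need one score per training row")
    if not 0 < n_validation < len(train_rows):
        raise ValueError("n_validation must leave rows on both sides of the split")
    # Ascending sort: least test-like first, most test-like last.
    order = sorted(range(len(train_rows)), key=lambda i: test_likeness[i])
    fit_rows = [train_rows[i] for i in order[:-n_validation]]
    validation_rows = [train_rows[i] for i in order[-n_validation:]]
    return fit_rows, validation_rows


# Toy data: row identifiers and made-up P(test | row) scores.
rows = ["a", "b", "c", "d", "e"]
scores = [0.10, 0.80, 0.35, 0.95, 0.50]

fit_rows, validation_rows = split_by_test_likeness(rows, scores, n_validation=2)
print(fit_rows)
print(validation_rows)
```

A model is then trained on `fit_rows` and scored on `validation_rows`, on the expectation that performance on the most test-like rows tracks test performance more closely than a random split would.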

goodbooks-10k

Ten thousand books, six million ratings

Jupyter Notebook

hyperband

Tuning hyperparams fast with Hyperband

phraug

A set of simple Python scripts for pre-processing large files

phraug2

A new version of phraug, which is a set of simple Python scripts for pre-processing large files

numer.ai

Validation and prediction code for numer.ai

kaggle-blackbox

Deep learning made easy

classifying-text

Classifying text with bag-of-words

evaluating-recommenders

Compute and plot NDCG for a recommender system

time-series-classification

Classifying time series using feature extraction

classifier-calibration

Reliability diagrams, Platt's scaling, isotonic regression

kaggle-advertised-salaries

Predicting job salaries from ads - a Kaggle competition

the-secret-of-the-big-guys

k-means + a linear model = good results

pointer-networks-experiments

Sorting numbers with pointer networks

kaggle-cats-and-dogs

Classifying images with OverFeat

kaggle-stackoverflow

Predicting closed questions on Stack Overflow

gaussrank

Preparing continuous features for neural networks with GaussRank

kaggle-happiness

Predicting happiness from demographics and poll answers

kaggle-cifar

Code for the CIFAR-10 competition at Kaggle, uses cuda-convnet

sofia-ml-mod

sofia-kmeans with sparse RBF cluster mapping

pylearn2-practice

Pylearn2 in practice

kaggle-burn-cpu

Code for the "Burn CPU, burn" competition at Kaggle. Uses Extreme Learning Machines and hyperopt.

kaggle-amazon

Amazon access control challenge

pybrain-practice

A regression example for PyBrain

wine-quality

Predicting wine quality

dimensionality-reduction-for-sparse-binary-data

convert a lot of zeros and ones to fewer real numbers

cubert

How to make those 3D data visualizations

kaggle-gender

A Kaggle competition: discriminate gender based on handwriting

msda-denoising

Using a very fast denoising autoencoder

kaggle-solar

Code for Solar Energy Prediction Contest at Kaggle

nonlinear-vowpal-wabbit

How to use automatic polynomial features and neural network mode in VW

metric-learning-for-regression

Applying metric learning to kin8nm

kaggle-avito

Code for the Avito competition

kaggle-rossmann

Predicting sales with Pandas

spearmint

tuning hyperparams automatically with spearmint

kaggle-accelerometer

Code for Accelerometer Biometric Competition at Kaggle

large-scale-linear-learners

VW, Liblinear and StreamSVM compared on webspam

r-libsvm-format-read-write

R code for reading and writing files in libsvm format

stardose

A recommender system for GitHub repositories

running-external-programs-from-python

feature-selection

Selecting features for classification with MRMR

kaggle-merck

Merck challenge at Kaggle

kaggle-stumbleupon

bag of words + sparsenn

project-rhubarb

predicting mortality in England using air quality data

kaggle-bestbuy_big

Code for the Best Buy competition at Kaggle

kaggle-digits

Some code for the Digits competition at Kaggle, incl. pylearn2's maxout

misc

Jupyter Notebook

kaggle-poker-hands

Code for the Poker Rule Induction competition

kaggle-bestbuy_small

AlpacaGPT

How to train your own ChatGPT, Alpaca style

kaggle-jobs

Some auxiliary code for Kaggle job recommendation challenge