  • Stars: 324
  • Rank 129,708 (Top 3 %)
  • Language: Jupyter Notebook
  • License: MIT License
  • Created over 7 years ago
  • Updated over 7 years ago

Repository Details

Pytorch starter kit for Kaggle competitions

Summary

Pytorch Kaggle starter is a framework for managing experiments in Kaggle competitions. It reduces time to first submission by providing a suite of helper functions for model training, data loading, adjusting learning rates, making predictions, ensembling models, and formatting submissions.

Inside are example Jupyter notebooks that walk through how to get strong scores on popular competitions.

These notebooks outline basic, single-model submissions. Scores can be improved significantly by ensembling models and using test-time augmentation.
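As a rough illustration of the ensembling idea, the sketch below averages per-sample probabilities from several models. The function name `average_predictions` and the sample numbers are hypothetical and are not taken from this project's helpers; it only assumes each model produces one probability per test sample, in the same order.

```python
def average_predictions(prediction_sets):
    """Average several aligned lists of per-sample probabilities.

    Hypothetical helper for illustration; not part of this repository's API.
    """
    if not prediction_sets:
        raise ValueError("need at least one set of predictions")
    n_samples = len(prediction_sets[0])
    if any(len(preds) != n_samples for preds in prediction_sets):
        raise ValueError("all prediction sets must cover the same samples")
    # zip(*...) groups the i-th prediction from every set together.
    return [sum(sample) / len(sample) for sample in zip(*prediction_sets)]


# Made-up probabilities from two models for three test images.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.8]
ensemble = average_predictions([model_a, model_b])
```

Test-time augmentation can reuse the same averaging step: each prediction set would then come from one model run on a differently augmented copy of the test images.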

Features

  1. Experiments - Launch experiments from Python dictionaries inside Jupyter notebooks or Python scripts. Attach Visualizers (Visdom, Kibana), Metrics (Accuracy, F2, Loss), or external datastores (S3, Elasticsearch).
  2. Monitoring - Track experiments from your phone or web browser in real time with Visdom, a lightweight visualization framework from Facebook.
  3. Notifications - Receive email notifications when experiments complete or fail.
  4. Sharing - Upload experiments, predictions, and ensembles to S3 for other users to download.
  5. Analysis - Compare experiments across users with Kibana. Design custom dashboards for specific competitions.
  6. Helpers - Reduce time to submission with helper code for common tasks: custom datasets, metrics, storing predictions, ensembling models, making submissions, and more.
  7. Torchsample - Includes the latest release of ncullen93's torchsample project for additional trainer helpers and data augmentations.
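
For readers unfamiliar with the F2 metric listed under Metrics above, here is a minimal sketch for binary labels. It uses the standard F-beta formula with beta = 2, which weights recall more heavily than precision. The function name `fbeta_score` is chosen for illustration and is not this project's own metric implementation.

```python
def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta score for binary labels (1 = positive, 0 = negative).

    Illustrative only; assumes y_true and y_pred are equal-length 0/1 lists.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
score = fbeta_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

With beta = 2, predicting every sample as positive is penalized less than it would be under F1, because missed positives cost more than false alarms.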

Requirements

  1. Anaconda with Python3
  2. PyTorch
  3. Other requirements: pip install -r requirements.txt
  4. OpenCV: conda install -c menpo opencv
  5. Server with a GPU and CUDA installed

Datasets

To get started, move all training and test images into the project_root/datasets/inputs directory, under the trn_jpg and tst_jpg subdirectories respectively. Running the first cell of each notebook creates the directory structure outlined in the config.py file.

There is no need to create separate directories for classes or validation sets. This is handled by the data_fold.py module and the FileDataset, which expects a list of filepaths and targets. After trying out a lot of approaches, I found this to be the easiest and most extensible. You'll sometimes need to generate a metadata.csv file separately if Kaggle didn't provide one. This sort of competition-specific code can live in the competitions/ directory.
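
To make the file-list approach concrete, the sketch below builds cross-validation folds directly from a list of filepaths and targets, without moving any files on disk. It is an assumption-laden illustration: `make_folds`, the fold layout, and the image file names are hypothetical and do not reproduce the actual data_fold.py or FileDataset code.

```python
import random


def make_folds(filepaths, targets, n_folds=5, seed=0):
    """Split (filepath, target) pairs into train/validation folds.

    Hypothetical helper; the files stay where they are and only
    the lists of paths change from fold to fold.
    """
    if len(filepaths) != len(targets):
        raise ValueError("filepaths and targets must have the same length")
    indices = list(range(len(filepaths)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = []
    for k in range(n_folds):
        # Every n_folds-th shuffled index, starting at k, goes to validation.
        val_idx = set(indices[k::n_folds])
        trn = [(filepaths[i], targets[i]) for i in indices if i not in val_idx]
        val = [(filepaths[i], targets[i]) for i in indices if i in val_idx]
        folds.append({"trn": trn, "val": val})
    return folds


# Made-up file names under the trn_jpg directory, with alternating labels.
paths = [f"datasets/inputs/trn_jpg/img_{i}.jpg" for i in range(10)]
labels = [i % 2 for i in range(10)]
folds = make_folds(paths, labels, n_folds=5)
```

Each fold's `trn` and `val` lists could then be handed to a dataset class that loads images from the given paths.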

Visdom

Visualize experiment progress on your phone with Facebook's new Visdom framework.

Kibana

Spin up an Elasticsearch cluster locally or on AWS to start visualizing and tracking experiments. Create custom dashboards with Kibana's easy-to-use drag-and-drop chart creation tools.

Filter and sort experiments, zoom to a specific time period, or aggregate metrics across experiments and see updates in real time.

Emails

Receive emails when experiments complete or fail, using the AWS SES service.

Kaggle CLI

Quickly download competition data and submit predictions with the Kaggle CLI tool.

kg download -c dogs-vs-cats-redux-kernels-edition -v -u USERNAME -p PASSWORD
kg submit -m 'my sub' -c dogs-vs-cats-redux-kernels-edition -v -u USERNAME -p PASSWORD my_exp_tst.csv
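
A file such as my_exp_tst.csv has to exist before it can be submitted. The sketch below shows one way to write such a file with Python's standard csv module. The `write_submission` helper and the `id,label` header are assumptions for illustration; check the competition's sample submission for the exact columns, and note that this is not the project's own submission code.

```python
import csv
import os
import tempfile


def write_submission(path, ids, predictions, header=("id", "label")):
    """Write one row per test sample: its id and predicted probability.

    Hypothetical helper; the default header is an assumed column layout.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for sample_id, pred in zip(ids, predictions):
            writer.writerow([sample_id, f"{pred:.6f}"])


# Made-up ids and probabilities, written to a temporary location.
out_path = os.path.join(tempfile.gettempdir(), "my_exp_tst.csv")
write_submission(out_path, [1, 2, 3], [0.98, 0.03, 0.5])
```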

Best practices

  • Use systemd to keep the Visdom and Jupyter servers running at all times

Unit Tests

Run tests with:

python -m pytest tests/

Other run commands:

python -m pytest tests/ (all tests)
python -m pytest -k filenamekeyword (tests matching keyword)
python -m pytest tests/utils/test_sample.py (single test file)
python -m pytest tests/utils/test_sample.py::test_answer_correct (single test method)
python -m pytest --resultlog=testlog.log tests/ (log output to file; the --resultlog option was removed in pytest 6.0)
python -m pytest -s tests/ (print output to console)

TODO

  • Add TTA (test time augmentation) example
  • Add Pseudolabeling example
  • Add Knowledge Distillation example
  • Add Multi-input/Multi-target examples
  • Add stacking helper functions
