  • Stars: 105
  • Rank 326,417 (Top 7 %)
  • Language: Jupyter Notebook
  • License: Other
  • Created about 4 years ago
  • Updated 4 months ago

Repository Details

1st place solution for the Kaggle PANDA Challenge

Kaggle-PANDA-1st-place-solution

This is the 1st place solution to the PANDA competition. A detailed writeup of the solution is also available.

The code and models were created by Team PND (@yukkyo and @kentaroy47).

Our models and code are open-sourced under CC-BY-NC 4.0. See LICENSE for details.

Some steps can be skipped because their outputs are already included in the input directory.

Slides describing our solution:

https://docs.google.com/presentation/d/1Ies4vnyVtW5U3XNDr_fom43ZJDIodu1SV6DSK8di6fs/

1. Environment

You can run the code with or without Docker.

1.1 Without Docker (untested)

  • Ubuntu 18.04
  • Python 3.7.2
  • CUDA 10.2
  • NVIDIA/apex == 1.0 installed
# main dependency
$ pip install -r docker/requirements.txt
# arutema code dependency
$ pip install git+https://github.com/ildoonet/pytorch-gradual-warmup-lr.git
$ pip install efficientnet_pytorch

1.2 With Docker (recommended)

# build
$ sh docker/build.sh

# run
$ sh docker/run.sh

# exec
$ sh docker/exec.sh

2. Preparation

2.1 Get Data

Download only train_images and train_masks.

$ cd input
$ kaggle download ...
$ unzip ...

(skip) 2.2 Grouping image IDs by image hash threshold
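
For reference, the grouping idea can be sketched as follows. This is a minimal illustration and not the repository's script: it assumes each image already has a 64-bit perceptual hash stored as an integer, and the function name, the similarity definition, and the 0.90 threshold are all hypothetical. Images whose hashes agree on at least `threshold` of their bits are linked, and linked images are merged with union-find so that near-duplicates can be kept in the same fold.

```python
from itertools import combinations


def group_by_hash_similarity(hashes, threshold=0.90, n_bits=64):
    """Group image ids whose hashes agree on at least `threshold` of the bits.

    `hashes` maps image_id -> integer hash (hypothetical input format).
    Returns a dict image_id -> group index.
    """
    ids = sorted(hashes)
    parent = {i: i for i in ids}

    def find(x):
        # Follow parents to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        hamming = bin(hashes[a] ^ hashes[b]).count("1")
        similarity = 1.0 - hamming / n_bits
        if similarity >= threshold:
            parent[find(a)] = find(b)

    # Relabel roots as consecutive group indices.
    roots = {}
    return {i: roots.setdefault(find(i), len(roots)) for i in ids}


example = {
    "img_a": 0xFFFF0000FFFF0000,
    "img_b": 0xFFFF0000FFFF0001,  # 1 bit away from img_a
    "img_c": 0x0000FFFF0000FFFF,  # every bit differs from img_a
}
groups = group_by_hash_similarity(example)
```

The pairwise loop is quadratic in the number of images, which is fine for a sketch but would need batching or vectorization on a full dataset.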

(skip) 2.3 Split kfold

$ cd src
$ python data_process/s00_make_k_fold.py
  • The split is deterministic because the seed is fixed
  • output:
    • input/train-5kfold.csv
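
Why a fixed seed makes the split reproducible can be illustrated with a small sketch. It is not the code in s00_make_k_fold.py: the function name, the seed value, the toy labels, and the stratified round-robin strategy are assumptions made for illustration. Folds are numbered 1 to 5 to match the `--kfold` values used later.

```python
import random
from collections import defaultdict


def assign_folds(labels, n_splits=5, seed=42):
    """Return {image_id: fold} with folds numbered 1..n_splits.

    `labels` maps image_id -> class label. Ids are shuffled within each
    label using a seeded RNG, then dealt round-robin to the folds, so the
    result is stratified and identical on every run.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for image_id in sorted(labels):  # sort so input order does not matter
        by_label[labels[image_id]].append(image_id)

    folds = {}
    for label in sorted(by_label):
        ids = by_label[label]
        rng.shuffle(ids)
        for position, image_id in enumerate(ids):
            folds[image_id] = position % n_splits + 1
    return folds


labels = {f"img_{i:03d}": i % 6 for i in range(60)}  # toy labels 0..5
first = assign_folds(labels)
second = assign_folds(labels)  # same seed, so the same assignment
```

Using a private `random.Random(seed)` instance rather than the module-level functions keeps the split independent of any other random calls in the program.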

2.4 Make tile pngs for training

$ cd src
$ python data_process/s07_simple_tile.py --mode 0
$ python data_process/s07_simple_tile.py --mode 2
$ python data_process/a00_save_tiles.py
$ cd ../input
$ cd numtile-64-tilesize-192-res-1-mode-0
$ unzip train.zip -d train
$ cd ..
$ cd numtile-64-tilesize-192-res-1-mode-2
$ unzip train.zip -d train
$ cd ..
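
The directory names above suggest 64 tiles of 192 x 192 pixels per slide. A common way to build such tiles is to pad the image, cut it into a grid, and keep the tiles with the most tissue. The sketch below shows that idea in pure Python on a tiny grayscale image. It is an assumption about the general technique, not a copy of s07_simple_tile.py, and it does not model the `--mode` flag; `select_tiles` and its parameters are hypothetical names.

```python
def select_tiles(image, tile_size, n_tiles, background=255):
    """Cut a grayscale image (list of rows) into tiles and keep the darkest ones.

    Returns a list of (row_index, col_index, tile) sorted from most to least
    tissue, where tissue is approximated by a low pixel sum.
    """
    height, width = len(image), len(image[0])
    pad_h = (-height) % tile_size
    pad_w = (-width) % tile_size
    # Pad on the bottom and right with background so tiles divide evenly.
    padded = [row + [background] * pad_w for row in image]
    padded += [[background] * (width + pad_w) for _ in range(pad_h)]

    tiles = []
    for top in range(0, height + pad_h, tile_size):
        for left in range(0, width + pad_w, tile_size):
            tile = [r[left:left + tile_size] for r in padded[top:top + tile_size]]
            tiles.append((top // tile_size, left // tile_size, tile))

    # Darker tiles (smaller sums) are assumed to contain more tissue.
    tiles.sort(key=lambda t: sum(map(sum, t[2])))
    return tiles[:n_tiles]


# Toy 5x5 image: white background with a dark 2x2 patch at the top-left.
image = [[255] * 5 for _ in range(5)]
for r in range(2):
    for c in range(2):
        image[r][c] = 0
picked = select_tiles(image, tile_size=2, n_tiles=2)
```

A real pipeline would work on NumPy arrays read from the slide files and would usually pad with blank tiles when a slide yields fewer than `n_tiles`; both are left out here to keep the sketch short.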

3. Train base model for removing noise (expects 1x Titan RTX)

Each fold needs about 18 hours.

$ cd src
$ python train.py --config configs/final_1.yaml --kfold 1
$ python train.py --config configs/final_1.yaml --kfold 2
$ python train.py --config configs/final_1.yaml --kfold 3
$ python train.py --config configs/final_1.yaml --kfold 4
$ python train.py --config configs/final_1.yaml --kfold 5
  • output:
    • output/model/final_1
      • Weights and training logs for each fold
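
If you prefer to launch the folds from one script rather than typing each command, a small wrapper along these lines could work. It is only a convenience sketch: `fold_commands` and `run_all` are hypothetical helpers, and it assumes train.py accepts exactly the `--config` and `--kfold` flags shown above.

```python
import subprocess
import sys


def fold_commands(config, folds=(1, 2, 3, 4, 5)):
    """Build one train.py command per fold, mirroring the commands above."""
    return [
        [sys.executable, "train.py", "--config", config, "--kfold", str(fold)]
        for fold in folds
    ]


def run_all(commands, cwd="src"):
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=cwd, check=True)


commands = fold_commands("configs/final_1.yaml")
# run_all(commands)  # uncomment to start training from the repository root
```

Because `check=True` raises on a non-zero exit code, a failed fold stops the sequence instead of silently continuing to the next one.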

4. Predict on local validation data for removing noise

Each fold needs about 1 hour.

$ cd src
$ python kernel.py --kfold 1
$ python kernel.py --kfold 2
$ python kernel.py --kfold 3
$ python kernel.py --kfold 4
$ python kernel.py --kfold 5
  • output (predictions on the hold-out training data):
    • output/model/final_1/local_preds~~~.csv

5. Remove noise

$ cd src
$ python data_process/s12_remove_noise_by_local_preds.py
  • output:
    • output/model/final_1
      • local_preds_final_1_efficientnet-b1.csv
        • Concatenated prediction results of the hold-out data
        • This is used to clean labels
      • local_preds_final_1_efficientnet-b1_removed_noise_thresh_16.csv
        • Used to train Model 1
        • Base label cleaning results
      • local_preds_final_1_efficientnet-b1_removed_noise_thresh_rad_13_08_ka_15_10.csv
        • Used to train Model 2
        • Labels cleaned to remove 20% of the Radboud labels
  • FYI: we used this CSV for our final submission in the competition (the seed was not fixed at the time):
    • input/train-5kfold_remove_noisy_by_0622_rad_13_08_ka_15_10.csv
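
The general idea of cleaning labels with hold-out predictions can be sketched as below. This is an illustration under assumptions, not the logic of s12_remove_noise_by_local_preds.py: it assumes a sample is treated as noisy when its hold-out prediction is far from its label, with a separate limit per data provider. The row keys, the function name, and the threshold values are hypothetical.

```python
def remove_noisy_labels(rows, thresholds, default=1.6):
    """Keep rows whose hold-out prediction is close enough to the label.

    Each row is a dict with hypothetical keys "image_id", "provider",
    "label" and "pred". `thresholds` maps provider -> maximum allowed gap.
    """
    kept = []
    for row in rows:
        limit = thresholds.get(row["provider"], default)
        if abs(row["pred"] - row["label"]) <= limit:
            kept.append(row)
    return kept


rows = [
    {"image_id": "a", "provider": "radboud", "label": 0, "pred": 0.2},
    {"image_id": "b", "provider": "radboud", "label": 4, "pred": 2.6},
    {"image_id": "c", "provider": "karolinska", "label": 4, "pred": 2.6},
    {"image_id": "d", "provider": "karolinska", "label": 5, "pred": 2.0},
]
clean = remove_noisy_labels(rows, {"radboud": 1.3, "karolinska": 1.5})
kept_ids = [row["image_id"] for row in clean]
```

Rows b and c have the same prediction gap, but only b is dropped because the assumed Radboud limit is stricter. The predictions must come from a model that did not see the sample during training, which is why the hold-out folds are used.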

6. Re-train 5-fold models with noise removed

  • You can replace output/train-5kfold_remove_noisy.csv with input/train-5kfold_remove_noisy_by_0622_rad_13_08_ka_15_10.csv in the config

  • Only folds 1, 4, and 5 are used for final inference

  • Each fold needs about 15 hours.

Training model 2 (fam_taro model):

$ cd src
# only the folds with the best LB scores are trained
$ python train.py --config configs/final_2.yaml --kfold 1
$ python train.py --config configs/final_2.yaml --kfold 4
$ python train.py --config configs/final_2.yaml --kfold 5

Training model 1 (arutema model):

Run train_famdata-kfolds.ipynb in Jupyter Notebook, or run the script:

# go to home
$ python train_famdata-kfolds.py

The .py script has not been tested, so prefer the .ipynb.

The final models are saved to models.

Each fold takes 4 hours.

Trained models

Models reproducing the 1st place score are saved in ./final_models

7. Submit on Kaggle Notebook

### Model 2
# Line [7]
class Config:
    def __init__(self, on_kernel=True, kfold=1, debug=False):
        ...
        ...
        ...

        # You can change the weight name, but this README does not require it
        self.weight_name = "final_2_efficientnet-b1_kfold_{}_latest.pt"
        self.weight_name = self.weight_name.format(kfold)

        ...
        ...
        ...

    def get_weight_path(self):
        if self.on_kernel:
            # You should change this path to your Kaggle Dataset path
            return os.path.join("../input/030-weight", self.weight_name)
        else:
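            # Note: split("_")[0] yields "final" for the name above; adjust this
            # if your weights are stored under a directory such as output/model/final_2
            dir_name = self.weight_name.split("_")[0]
            return os.path.join("../output/model", dir_name, self.weight_name)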
       
### Model 1
# Line [13]
def load_models(model_files):
    models = []
    for model_f in model_files:
        ## You should change this path to your Kaggle Dataset path
        model_f = os.path.join("../input/latesubspanda", model_f)
        ...

model_files = [
    'efficientnet-b0famlabelsmodelsub_avgpool_tile36_imsize256_mixup_final_epoch20_fold0.pth',
]

model_files2 = [
    'efficientnet-b0famlabelsmodelsub_avgpool_tile36_imsize256_mixup_final_epoch20_fold0.pth',
    "efficientnet-b0famlabelsmodelsub_avgpool_tile36_imsize256_mixup_final_epoch20_fold1.pth",
    "efficientnet-b0famlabelsmodelsub_avgpool_tile36_imsize256_mixup_final_epoch20_fold2.pth",
    "efficientnet-b0famlabelsmodelsub_avgpool_tile36_imsize256_mixup_final_epoch20_fold3.pth",
    "efficientnet-b0famlabelsmodelsub_avgpool_tile36_imsize256_mixup_final_epoch20_fold4.pth"
]
        
