

Efficient-PyTorch

My best practice of training large dataset using PyTorch.

Speed overview

By following these tips, we can achieve ~730 images/second with PyTorch when training ResNet-50 on ImageNet. Compared with the benchmarks reported for TensorFlow and MXNet, this performance is still competitive.

Epoch: [0][430/5005]    Time 0.409 (0.405)      Data 626.6 (728.0)      Loss 6.8381 (6.9754)    Error@1 100.000 (99.850) Error@5 99.609 (99.259)
Epoch: [0][440/5005]    Time 0.364 (0.404)      Data 704.2 (727.9)      Loss 6.8506 (6.9725)    Error@1 100.000 (99.851) Error@5 99.609 (99.258)
Epoch: [0][450/5005]    Time 0.350 (0.403)      Data 730.7 (727.3)      Loss 6.8846 (6.9700)    Error@1 100.000 (99.847) Error@5 99.609 (99.258)
Epoch: [0][460/5005]    Time 0.357 (0.402)      Data 716.8 (727.4)      Loss 6.9129 (6.9680)    Error@1 100.000 (99.849) Error@5 99.609 (99.256)
Epoch: [0][470/5005]    Time 0.346 (0.401)      Data 740.8 (727.4)      Loss 6.8574 (6.9657)    Error@1 100.000 (99.850) Error@5 98.828 (99.249)
Epoch: [0][480/5005]    Time 0.425 (0.400)      Data 601.8 (727.3)      Loss 6.8467 (6.9632)    Error@1 100.000 (99.849) Error@5 99.609 (99.239)
Epoch: [0][490/5005]    Time 0.358 (0.399)      Data 715.2 (727.2)      Loss 6.8319 (6.9607)    Error@1 100.000 (99.848) Error@5 99.609 (99.232)
Epoch: [0][500/5005]    Time 0.347 (0.399)      Data 737.4 (726.9)      Loss 6.8426 (6.9583)    Error@1 99.609 (99.843)  Error@5 98.047 (99.220)
Epoch: [0][510/5005]    Time 0.346 (0.398)      Data 740.5 (726.7)      Loss 6.8245 (6.9561)    Error@1 100.000 (99.839) Error@5 99.609 (99.211)
Epoch: [0][520/5005]    Time 0.350 (0.452)      Data 730.7 (724.0)      Loss 6.8270 (6.9538)    Error@1 99.609 (99.834)  Error@5 97.656 (99.193)
Epoch: [0][530/5005]    Time 0.340 (0.450)      Data 752.9 (724.4)      Loss 6.8149 (6.9516)    Error@1 100.000 (99.832) Error@5 98.047 (99.183)

Key Points of Efficiency

Most frameworks now adopt cuDNN as their backend, so without special optimization the inference time is similar across frameworks. To reduce training time, we therefore focus on other aspects:

Data Loader

The default combination of datasets.ImageFolder + data.DataLoader is not enough for large-scale classification. In my experience, even after upgrading to a Samsung 960 Pro (3.5 GB/s read, 2.0 GB/s write), the whole training pipeline still bottlenecks on disk I/O.
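To make the bottleneck concrete, here is a minimal stand-in for that default pipeline (the FolderLikeDataset class is hypothetical; a real datasets.ImageFolder would open one small JPEG file from disk per __getitem__ call, which is exactly the per-sample random I/O that hurts at ImageNet scale):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class FolderLikeDataset(Dataset):
    """Hypothetical stand-in for datasets.ImageFolder."""
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        # A real ImageFolder would do Image.open(self.paths[i]) here:
        # one small random read from disk per sample.
        return torch.randn(3, 224, 224), i % 1000

loader = DataLoader(FolderLikeDataset(64), batch_size=16, num_workers=0)
images, labels = next(iter(loader))  # each batch triggers 16 tiny reads
```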

The root cause is the slow reading of many discontiguous small files. To optimize, we dump the small JPEG images into one large binary file. TensorFlow has its own TFRecord format and MXNet uses RecordIO. Besides these two, there are other options such as HDF5, pth, n5, and LMDB. Here I choose LMDB because:

  1. TFRecord is a proprietary protocol that is hard to hack into, and RecordIO's documentation is confusing and does not provide a clean Python API.
  2. HDF5, pth, and n5, though they offer a straightforward JSON-like API, require loading the whole file into memory, which is not practical when you work with a large dataset like ImageNet.
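LMDB aside, the underlying idea is simply to replace millions of small random reads with seeks into one large file. A dependency-free sketch of that idea (the pack/read helpers and file layout here are made up for illustration; they are not LMDB's actual format):

```python
import os
import tempfile

def pack(records, path):
    """Concatenate {key: bytes} blobs into one binary file.

    Returns an in-memory index of key -> (offset, length), playing the
    role that LMDB's B-tree or TFRecord's framing plays for real.
    """
    index = {}
    with open(path, "wb") as f:
        for key, blob in records.items():
            index[key] = (f.tell(), len(blob))
            f.write(blob)
    return index

def read(path, index, key):
    """Fetch one record with a single seek + contiguous read."""
    offset, length = index[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Usage: pack three fake "JPEGs" and read one back.
path = os.path.join(tempfile.mkdtemp(), "train.bin")
fake_images = {f"img_{i}": bytes([i]) * 1024 for i in range(3)}
index = pack(fake_images, path)
assert read(path, index, "img_1") == bytes([1]) * 1024
```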

Data Parallel

The default data parallelism in PyTorch, powered by nn.DataParallel, is inefficient! First, because of Python's GIL, multi-threading cannot fully utilize all cores (torch/nn/parallel/parallel_apply.py#47). Second, DataParallel's collective scheme gathers all results on cuda:0, which leads to an imbalanced workload and sometimes OOM, especially when you are running segmentation models.

nn.DistributedDataParallel provides a more elegant solution: instead of launching calls from different threads, it starts multiple processes (no GIL contention) and assigns a balanced workload to all GPUs.
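As a rough sketch of the call pattern (a single CPU process with the gloo backend, just to keep it self-contained; real training launches one process per GPU, e.g. with torchrun, and passes device_ids):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# One-process "distributed" setup on CPU, for illustration only.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(8, 2))  # gradients are all-reduced across processes
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(4, 8), torch.randint(0, 2, (4,))
loss = torch.nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()  # the all-reduce happens here, not via a gather on cuda:0
opt.step()

dist.destroy_process_group()
```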

(Ongoing) Detailed scripts and experiment numbers.

More Repositories

1

pytorch-OpCounter

Count the MACs / FLOPs of your PyTorch model.
Python
4,849
star
2

pytorch-memonger

Sublinear memory optimization for deep learning. https://arxiv.org/abs/1604.06174
Python
587
star
3

SparseNet

[ECCV 2018] Sparsely Aggregated Convolutional Networks https://arxiv.org/abs/1801.05895
Python
125
star
4

arXiv-stats

Python
50
star
5

hf-torrent

Python
37
star
6

mxbox

Simple, efficient and flexible vision toolbox for the MXNet framework.
Python
31
star
7

Bayesian-Compression-for-Deep-Learning

Reimplementation of the paper https://arxiv.org/abs/1705.08665
Python
28
star
8

PyTorch-Template

A template for PyTorch projects.
Python
22
star
9

Colorize-Your-World

Let there be color!
Jupyter Notebook
19
star
10

Machine-Learning-for-Image-Colorization

(Torch + Tensorflow) A deep magic brings color to your monochrome image!
MATLAB
12
star
11

GroupNorm.pytorch

PyTorch implementation of Group Normalization https://arxiv.org/abs/1803.08494
Python
11
star
12

Colorizing-Color-Images

[HVEI 2018] Colorizing Color Images
Jupyter Notebook
11
star
13

Project-Page-Render

HTML
10
star
14

Echoo

Let your program echo to you.
Python
8
star
15

arch-viz

Python
5
star
16

hf-torrent-store

5
star
17

Deep-Learning-Live

From linear regression to multi-layer perceptron, an introductive tutorial for deep learning beginners.
4
star
18

MNasNet-TensorFlow

Implementation of MnasNet: Platform-Aware Neural Architecture Search for Mobile
Python
4
star
19

tvm-notes

Python
4
star
20

PyTorch-via-PyTorch

C++
4
star
21

FlashATM

3
star
22

HW-for-COMP

HTML
3
star
23

edge-cloud-train

Python
3
star
24

ffmpeg-cuda-docker

A Docker container to launch GPU-accelerated FFmpeg
Dockerfile
3
star
25

pi-tools

A repo with some useful tools for Raspberry Pi farm setup. https://hub.docker.com/repository/docker/lyken/pi-tools
Dockerfile
2
star
26

EIE-pytorch

PyTorch implementation for EIE https://arxiv.org/abs/1602.01528
Jupyter Notebook
2
star
27

Colorize.PyTorch

Python
2
star
28

torch-mps-benchmark

Python
1
star
29

sample-video

1
star
30

micro23

1
star
31

GPU-Speed-Benchmark

Python
1
star
32

tvm-issue-07-12

Python
1
star
33

Docker-Horovod

Dockerfile
1
star
34

Deep-Learning-Framework-Popularity

Python
1
star
35

pythonLearn

Python
1
star
36

BeihangData

Python
1
star
37

tiny-whisper

Python
1
star
38

Neurips19-Statistics

1
star
39

gluon-multiple-gpu

Python
1
star
40

tvm-hack

Python
1
star
41

lith

Ligeng's extensions for PyTorch
Python
1
star
42

ubuntun-research

Common scripts I've used for setting up my Ubuntu server
Shell
1
star
43

wandb-example

Python
1
star