Video Compression through Image Interpolation
Chao-Yuan Wu,
Nayan Singhal,
Philipp Krähenbühl.
In ECCV, 2018.
[Project Page]
[Paper]
Overview
PyTorch implementation of a deep video compression codec.
Currently supported:
- Training interpolation models with different offsets.
- Evaluation on single model (PSNR/MS-SSIM).
- Some ablation study options.
Coming soon:
- Entropy coding.
- Evaluation on combined model.
Dependencies
We conducted experiments in the following environment:
- Linux
- Python 3.6.3
- PyTorch 0.3.0.post4
- TITAN X GPUs with CuDNN.
Similar environments (e.g. OSX, Python 2) might work with small modifications, but they have not been tested.
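A quick sanity check of the environment (a minimal sketch that only prints versions and GPU visibility) is:

```python
# Compare your setup against the environment listed above.
import sys
import torch

print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```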
Getting started
We provide a demo training script that trains on 7 clips for 100 iterations and evaluates on a held-out clip.
To run the demo, please download the demo data and run train.sh 2 (the argument, which can be 0, 1, or 2, specifies the level of hierarchy).
This will take about 3 minutes.
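If you prefer to launch the demo from Python rather than the shell, a minimal wrapper (a sketch; it assumes the demo data is already in place and that you run it from the repository root) could look like this:

```python
# Launch the provided demo script; equivalent to running "bash train.sh 2".
# The argument selects the level of hierarchy (0, 1, or 2).
import subprocess

subprocess.run(["bash", "train.sh", "2"], check=True)
```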
Expected output:
Creating loader for data/train...
448 images loaded.
distance=1/2
Loader for 448 images (28 batches) created.
Encoder fuse level: 1
Decoder fuse level: 1
...
[TRAIN] Iter[1]; LR: 0.00025; Loss: 0.260715; Backprop: 0.2985 sec; Batch: 2.8358 sec
[TRAIN] Iter[2]; LR: 0.00025; Loss: 0.237539; Backprop: 0.2371 sec; Batch: 1.5466 sec
[TRAIN] Iter[3]; LR: 0.00025; Loss: 0.241159; Backprop: 0.3445 sec; Batch: 1.4208 sec
[TRAIN] Iter[4]; LR: 0.00025; Loss: 0.193481; Backprop: 0.2328 sec; Batch: 1.3091 sec
[TRAIN] Iter[5]; LR: 0.00025; Loss: 0.181479; Backprop: 0.2336 sec; Batch: 1.2742 sec
...
[TRAIN] Iter[99]; LR: 0.00025; Loss: 0.090678; Backprop: 0.2444 sec; Batch: 1.3960 sec
[TRAIN] Iter[100]; LR: 0.00025; Loss: 0.082984; Backprop: 0.2431 sec; Batch: 1.3988 sec
Loss at each step:
0.1758 0.0982 0.0620 0.0574 0.0579 0.0597 0.0653 0.0742 0.0846 0.0949
...
Start evaluation...
...
Creating loader for data/eval...
8 images loaded.
distance=1/2
Loader for 8 images (8 batches) created.
...
Evaluation @iter 100 done in 24 secs
TVL Loss : 0.12207 0.06993 0.06193 0.06525 0.06742 0.07027 0.07543 0.08148 0.08650 0.09003
TVL MS-SSIM: 0.61841 0.81475 0.85905 0.87109 0.87745 0.88022 0.87903 0.87486 0.86880 0.86132
TVL PSNR : 28.02937 28.63096 28.87302 28.87184 28.77673 28.64452 28.44989 28.28644 28.19869 28.15354
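For reference, the PSNR numbers above are standard peak signal-to-noise ratios of the reconstructed frames. A minimal sketch of the computation for 8-bit frames (an illustration only, not the repository's evaluation code) is:

```python
# Illustrative PSNR computation for 8-bit frames; the repository's evaluation
# additionally reports MS-SSIM, which is not shown here.
import numpy as np

def psnr(original, reconstructed, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```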
Output images with different numbers of progressive compression iterations are stored in a directory called output.
The reconstruction starts blurry after 1 iteration and becomes noticeably sharper after 10 iterations.
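A quick way to browse these reconstructions (a hypothetical snippet; the exact filename pattern is not documented here) is:

```python
# Print the demo reconstructions written to ./output, one image per
# progressive compression iteration.
import os

for name in sorted(os.listdir("output")):
    print(name)
```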
Since we've only trained the model for 3 minutes, the results don't look great yet, but we can see that it roughly starts to reconstruct the frames.
The final results from a model trained on the full training set look considerably better; please see our Project Page for examples.
To train or evaluate on additional datasets, please see DATA.md for details and instructions.
Model weights and logs
Model weights are available here.
The associated logs are available here.
Kinetics video IDs
The list of Kinetics videos we used for train/val/test is available here.
Citation
If you find this model useful for your research, please use the following BibTeX entry.
@inproceedings{wu2018vcii,
  title={Video Compression through Image Interpolation},
  author={Wu, Chao-Yuan and Singhal, Nayan and Kr{\"a}henb{\"u}hl, Philipp},
  booktitle={ECCV},
  year={2018}
}
Acknowledgment
This implementation largely borrows from pytorch-image-comp-rnn by Biao Zhang (1zb). The U-net implementation borrows from Pytorch-UNet by Milesi Alexandre. Thank you, Biao and Milesi Alexandre!