Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement (AAAI 2020)
Note: We implement STDF based on MMEditing at PowerQE.
0. Background
PyTorch implementation of Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement (AAAI 2020).
- A simple and effective video quality enhancement network.
- Adopts feature alignment via multi-frame deformable convolution, instead of explicit motion estimation and motion compensation (a minimal sketch follows below).
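Below is a minimal sketch of deformable-convolution feature alignment. It uses `torchvision.ops.DeformConv2d` as a stand-in for this repository's own DCNv2 build; all shapes and layer names are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch: fuse neighboring frames with a deformable conv
# instead of explicit motion estimation + compensation.
# NOTE: torchvision's DeformConv2d stands in for this repo's DCNv2 build;
# shapes and layer names are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

radius = 1                          # temporal radius R -> 2R+1 input frames
frames = 2 * radius + 1
x = torch.randn(1, frames, 64, 64)  # neighboring frames stacked on channels

# A small head predicts (dy, dx) sampling offsets for each 3x3 kernel tap...
offset_head = nn.Conv2d(frames, 2 * 3 * 3, kernel_size=3, padding=1)
# ...and a deformable conv fuses the frames at those deformed positions.
dcn = DeformConv2d(frames, 32, kernel_size=3, padding=1)

offsets = offset_head(x)
aligned = dcn(x, offsets)           # fused, spatio-temporally aligned features
print(aligned.shape)                # torch.Size([1, 32, 64, 64])
```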
Notice: The dataset and training method are different from those in the original paper.
(Figure copyright: Jianing Deng)
Feel free to contact: [email protected].
1. Prerequisites
1.1. Environment
conda create -n stdf python=3.7 -y && conda activate stdf
git clone --depth=1 https://github.com/ryanxingql/stdf-pytorch && cd stdf-pytorch/
# given CUDA 10.1
python -m pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install tqdm lmdb pyyaml opencv-python scikit-image
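A quick optional check, in Python, that the pinned builds are in place:

```python
# Optional environment check: confirm the pinned versions and CUDA access.
import torch
import torchvision

print(torch.__version__)          # expect 1.6.0+cu101
print(torchvision.__version__)    # expect 0.7.0+cu101
print(torch.cuda.is_available())  # expect True on a CUDA 10.1 machine
```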
1.2. DCNv2
Build DCNv2
cd ops/dcn/
bash build.sh
Check if DCNv2 works (optional)
python simple_check.py
The DCNv2 source files here differ from the open-source version due to an incompatibility. [issue]
1.3. MFQEv2 dataset
Download and compress videos
Please check here.
Edit YML
We now edit option_R3_mfqev2_4G.yml.

Suppose the folder MFQEv2_dataset/ is placed at /raid/xql/datasets/MFQEv2_dataset/; then you should assign /raid/xql/datasets/MFQEv2_dataset/ to dataset -> train -> root in the YAML.

- R3: one of the network structures provided in the paper;
- mfqev2: the MFQEv2 dataset will be adopted;
- 4G: 4 GPUs will be used for the training below.

Similarly, you can also edit option_R3_mfqev2_1G.yml and option_R3_mfqev2_2G.yml if needed.
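If you want to double-check the edit, here is a small sketch; it assumes the option file nests keys as dataset -> train -> root, as described above, so adjust if the actual layout differs:

```python
# Sanity-check the dataset root in the option YAML.
# ASSUMPTION: keys nest as dataset -> train -> root, per the text above.
import yaml

with open('option_R3_mfqev2_4G.yml') as f:
    opt = yaml.safe_load(f)

print(opt['dataset']['train']['root'])
# expect: /raid/xql/datasets/MFQEv2_dataset/
```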
Generate LMDB
We now generate LMDB to speed up IO during training.
python create_lmdb_mfqev2.py --opt_path option_R3_mfqev2_4G.yml
Now you will get all needed data:
MFQEv2_dataset/
├── train_108/
│   ├── raw/
│   └── HM16.5_LDP/
│       └── QP37/
├── test_18/
│   ├── raw/
│   └── HM16.5_LDP/
│       └── QP37/
├── mfqev2_train_gt.lmdb/
└── mfqev2_train_lq.lmdb/
Finally, the MFQEv2 dataset root will be sym-linked to the folder ./data/ automatically, so that we and the programmes can access the MFQEv2 dataset at ./data/ directly.
2. Train
See script.sh.
3. Test
Pretrained models can be found here: [Releases] and [Baidu Netdisk (stdf)].
3.1. Test MFQEv2 dataset after training
See script.sh.
3.2. Test MFQEv2 dataset without training
If you did not run create_lmdb for training, you should first sym-link the MFQEv2 dataset to ./data/:
mkdir data/
ln -s /your/path/to/MFQEv2_dataset/ data/MFQEv2
Download the pre-trained model, and see script.sh.
3.3. Test your own video
First download the pre-trained model, and then run:
CUDA_VISIBLE_DEVICES=0 python test_one_video.py
See test_one_video.py for more details.
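If your video is a raw YUV420p file like the test sequences below, here is a rough reading sketch; the path, resolution, and frame count are assumptions for illustration:

```python
# Read the luma (Y) planes of a raw YUV420p video into a uint8 array.
# The path, resolution, and frame count here are illustrative assumptions.
import numpy as np

def read_y_frames(path, width, height, num_frames):
    frame_bytes = width * height * 3 // 2  # Y plane + subsampled U and V
    frames = np.empty((num_frames, height, width), dtype=np.uint8)
    with open(path, 'rb') as f:
        for i in range(num_frames):
            buf = np.frombuffer(f.read(frame_bytes), dtype=np.uint8)
            frames[i] = buf[:width * height].reshape(height, width)
    return frames

y = read_y_frames('BQMall_832x480_600.yuv', 832, 480, 600)
print(y.shape)  # (600, 480, 832)
```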
4. Result
loading model exp/MFQEv2_R3_enlarge300x/ckp_290000.pt...
> model exp/MFQEv2_R3_enlarge300x/ckp_290000.pt loaded.
<<<<<<<<<< Results >>>>>>>>>>
BQMall_832x480_600.yuv: [31.297] dB -> [32.221] dB
BQSquare_416x240_600.yuv: [28.270] dB -> [29.078] dB
BQTerrace_1920x1080_600.yuv: [31.247] dB -> [31.852] dB
BasketballDrill_832x480_500.yuv: [31.591] dB -> [32.359] dB
BasketballDrive_1920x1080_500.yuv: [33.227] dB -> [33.963] dB
BasketballPass_416x240_500.yuv: [30.482] dB -> [31.446] dB
BlowingBubbles_416x240_500.yuv: [27.794] dB -> [28.465] dB
Cactus_1920x1080_500.yuv: [32.207] dB -> [32.918] dB
FourPeople_1280x720_600.yuv: [34.589] dB -> [35.533] dB
Johnny_1280x720_600.yuv: [36.375] dB -> [37.161] dB
Kimono_1920x1080_240.yuv: [34.411] dB -> [35.272] dB
KristenAndSara_1280x720_600.yuv: [35.887] dB -> [36.895] dB
ParkScene_1920x1080_240.yuv: [31.583] dB -> [32.140] dB
PartyScene_832x480_500.yuv: [27.802] dB -> [28.402] dB
PeopleOnStreet_2560x1600_150.yuv: [31.388] dB -> [32.557] dB
RaceHorses_416x240_300.yuv: [29.320] dB -> [30.055] dB
RaceHorses_832x480_300.yuv: [30.094] dB -> [30.557] dB
Traffic_2560x1600_150.yuv: [33.176] dB -> [33.866] dB
> ori: [31.708] dB
> ave: [32.486] dB
> delta: [0.778] dB
TOTAL TIME: [0.2] h
5. Q&A
5.1. Vimeo-90K dataset
You should download the Vimeo-90K dataset, convert its PNG sequences into 7-frame YCbCr YUV444P videos, and then compress these videos with HM16.5 under QP37 in the All Intra configuration.
We also provide a one-click programme at [Releases] and [Baidu Netdisk (stdf)].
Vimeo-90K/
├── vimeo_septuplet/
│   └── ...
├── vimeo_septuplet_ycbcr/
│   └── ...
└── vimeo_septuplet_ycbcr_intra/
    └── ...
The LMDB preparation, option YAML, training and test codes have already been provided in this repository.
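As a rough illustration of the PNG-to-YUV conversion step, here is a sketch for one septuplet. The directory layout and OpenCV's full-range BT.601 YCrCb conversion are assumptions; the one-click programme above is the supported route.

```python
# Convert one 7-frame PNG septuplet to a planar YCbCr YUV444P file.
# ASSUMPTIONS: the vimeo_septuplet directory layout below, and OpenCV's
# full-range BT.601 YCrCb conversion; the one-click programme implements
# the exact pipeline.
import glob
import cv2

pngs = sorted(glob.glob('vimeo_septuplet/sequences/00001/0001/*.png'))
with open('00001_0001.yuv', 'wb') as f:
    for p in pngs:                       # 7 frames per septuplet
        bgr = cv2.imread(p)
        y, cr, cb = cv2.split(cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb))
        # planar YUV444P order: full Y plane, then U (Cb), then V (Cr)
        f.write(y.tobytes())
        f.write(cb.tobytes())
        f.write(cr.tobytes())
```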
5.2. Why the epoch index starts from 0, while the iter index (also model index) starts from 1
A small bug. I may fix it sometime.
5.3. How do we enlarge the dataset
Following BasicSR, we set sampling index = target index % dataset len.

For example, if we have a dataset whose volume is 4 and whose enlargement ratio is 2, then we will sample images at indexes 0, 1, 2, 3, 0, 1, 2, 3 (see the sketch below). Note that at each sampling we randomly crop the image; therefore, patches cropped from the same image at different times can differ.

Besides, the data loader is shuffled at the start of each epoch. Enlarging the dataset reduces the number of epoch starts, and thus the total shuffling overhead.
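A tiny sketch of this modulo sampling:

```python
# Enlargement-by-modulo sampling: each target index maps back into the
# real dataset, and every visit applies a fresh random crop.
dataset_len, enlarge_ratio = 4, 2

for target_index in range(dataset_len * enlarge_ratio):
    sampling_index = target_index % dataset_len
    print(target_index, '->', sampling_index)
# 0->0, 1->1, 2->2, 3->3, 4->0, 5->1, 6->2, 7->3
```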
5.4. Why do we set the number of iterations, not epochs
Since the dataset can be enlarged by various ratios, the number of epochs is not meaningful. Meanwhile, the number of iterations indicates the number of sampled batches, which is more meaningful to us.
6. License
We adopt Apache License v2.0. For other licenses, please refer to BasicSR and DCNv2.
If you find this repository helpful, you may cite:
@inproceedings{2020deng_stdf,
title={Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement},
author={Deng, Jianing and Wang, Li and Pu, Shiliang and Zhuo, Cheng},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={34},
number={07},
pages={10696--10703},
year={2020}
}
@software{2020xing_stdf,
author = {Xing, Qunliang and Deng, Jianing},
month = {9},
title = {{PyTorch implementation of STDF}},
url = {https://github.com/ryanxingql/stdf-pytorch},
version = {1.0.0},
year = {2020}
}
Special thanks to Jianing Deng (邓家宁, the author of STDF) for the network structure and training details.