  • Language: Python
  • License: MIT License


[ECCV'22] FTVSR: Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

FTVSR (ECCV 2022)


This is the official PyTorch implementation of the paper Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution.


Introduction

Compressed video super-resolution (VSR) aims to restore high-resolution frames from their compressed low-resolution counterparts. Most recent VSR approaches enhance an input frame by "borrowing" relevant textures from neighboring video frames. Although some progress has been made, it remains very challenging to effectively extract and transfer high-quality textures from compressed videos, where most frames are usually highly degraded. We propose a novel Frequency-Transformer for compressed Video Super-Resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain. FTVSR significantly outperforms previous methods and achieves new state-of-the-art (SOTA) results.

Contribution

  • We propose transferring video frames into the frequency domain and design a novel frequency attention mechanism.
  • We study different self-attention schemes across the space, time, and frequency dimensions.
  • We propose a novel Frequency-Transformer for compressed Video Super-Resolution (FTVSR) that conducts self-attention over a joint space-time-frequency domain.
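This page does not spell out which transform moves frames into the frequency domain. As a minimal sketch, assuming a JPEG-style blockwise 2-D DCT on 8x8 patches (both the transform and the block size are assumptions here, not taken from this README), the NumPy code below turns a grayscale frame into a grid of patches with 64 frequency channels, one per frequency band. A representation of this kind is what attention over the frequency dimension could operate on; the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] *= 1 / np.sqrt(2)  # DC row scaling for orthonormality
    return mat * np.sqrt(2 / n)


def blockwise_dct(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Split an (H, W) frame into block x block patches and DCT each one.

    Assumption: an 8x8 blockwise DCT stands in for the paper's transform.
    Returns an (H // block, W // block, block * block) array in which
    channel c holds one frequency band for every patch.
    """
    h, w = frame.shape
    if h % block or w % block:
        raise ValueError("frame size must be divisible by the block size")
    c = dct_matrix(block)
    # (H/b, b, W/b, b) -> (H/b, W/b, b, b): one b x b patch per grid cell
    patches = frame.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    coeffs = c @ patches @ c.T  # 2-D DCT of every patch via broadcasting
    return coeffs.reshape(h // block, w // block, block * block)
```

For a constant patch all energy lands in channel 0 (the DC band), while textured, degraded regions spread energy into higher bands. Because the basis is orthonormal, total energy is preserved, so no information is lost by the transform itself.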

Overview

Visual

Some visual results on videos with different compression rates (No compression, CRF 15, 25, 35).

Requirements and dependencies

  • Python 3.7 (Anaconda recommended)
  • pytorch == 1.9.0
  • torchvision == 0.10.0
  • opencv-python == 4.5.3
  • mmcv-full == 1.3.9
  • scipy == 1.7.3
  • scikit-image == 0.19.0
  • lmdb == 1.2.1
  • yapf == 0.31.0
  • tensorboard == 2.6.0
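Because the dependencies are pinned to exact versions, a quick check of the active environment can save debugging time later. The sketch below uses hypothetical helpers that are not part of the FTVSR repo. It assumes the pip distribution names shown (PyTorch installs as `torch`), ignores local build suffixes such as `+cu111`, and accepts extra trailing version segments (e.g. opencv-python `4.5.3.56` for a `4.5.3` pin). Python 3.8+ ships `importlib.metadata`; on Python 3.7 it needs the `importlib_metadata` backport.

```python
from typing import Dict, Iterable, List, Optional

# Versions pinned in the list above, keyed by pip distribution name.
PINS: Dict[str, str] = {
    "torch": "1.9.0",  # PyTorch installs as "torch"
    "torchvision": "0.10.0",
    "opencv-python": "4.5.3",
    "mmcv-full": "1.3.9",
    "scipy": "1.7.3",
    "scikit-image": "0.19.0",
    "lmdb": "1.2.1",
    "yapf": "0.31.0",
    "tensorboard": "2.6.0",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed versions; None marks a missing distribution."""
    try:
        from importlib import metadata  # Python 3.8+
    except ImportError:  # Python 3.7: pip install importlib_metadata
        import importlib_metadata as metadata
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_matches(have: str, wanted: str) -> bool:
    """True if `have` starts with the pinned release segments.

    The local suffix is dropped ("1.9.0+cu111" -> "1.9.0") and extra
    trailing segments are allowed ("4.5.3.56" satisfies "4.5.3").
    """
    release = have.split("+", 1)[0].split(".")
    pinned = wanted.split(".")
    return release[: len(pinned)] == pinned


def mismatches(pins: Dict[str, str], installed: Dict[str, Optional[str]]) -> List[str]:
    """Human-readable list of pins the environment does not satisfy."""
    problems = []
    for name, wanted in pins.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif not version_matches(have, wanted):
            problems.append(f"{name}: {have} installed, want {wanted}")
    return problems
```

Usage: `print("\n".join(mismatches(PINS, installed_versions(PINS))) or "all pins satisfied")`.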

Model

Pre-trained models can be downloaded from baidu cloud (i42r) or Google drive.

  • FTVSR_REDS.pth: trained on REDS dataset with 50% uncompressed videos and 50% compressed videos (CRF 15, 25, 35).
  • FTVSR_Vimeo90K.pth: trained on Vimeo-90K dataset with 50% uncompressed videos and 50% compressed videos (CRF 15, 25, 35).
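One way to reproduce this 50/50 training mix when preparing data is to decide, per clip, whether to use the uncompressed version or a compressed one. The sketch below assumes per-clip random sampling and a uniform choice among CRF 15, 25, and 35. The README does not say how the compressed half is split across CRF values, and `sample_compression` is a hypothetical helper, not part of the FTVSR code.

```python
import random
from typing import Optional, Sequence

CRF_LEVELS = (15, 25, 35)  # compression levels listed in the README


def sample_compression(rng: random.Random,
                       p_uncompressed: float = 0.5,
                       crf_levels: Sequence[int] = CRF_LEVELS) -> Optional[int]:
    """Pick the compression setting for one training clip.

    Returns None for an uncompressed clip, otherwise a CRF value.
    Assumption: compressed clips draw their CRF uniformly from crf_levels.
    """
    if rng.random() < p_uncompressed:
        return None
    return rng.choice(crf_levels)
```

Passing a seeded `random.Random` keeps the split reproducible across runs, which matters if the compressed variants are generated offline once and reused.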

Dataset

  1. Training set

    • REDS dataset. We regroup the training and validation sets into one folder. The original training set has 240 clips, numbered 000 to 239; the 30 clips of the original validation set are renamed 240 to 269.
      • Organize REDS as follows:
        ├────REDS
            ├────train
                ├────train_sharp
                    ├────000
                    ├────...
                    ├────269
                ├────train_sharp_bicubic
                    ├────X4
                        ├────000
                        ├────...
                        ├────269
      
    • Vimeo-90K dataset. Download the original data and run the script 'degradation/BD_degradation.m' in MATLAB to generate the low-resolution images. The sep_trainlist.txt file in the downloaded zip lists the training samples.
      • Organize Vimeo-90K as follows:
        ├────vimeo_septuplet
            ├────sequences
                ├────00001
                ├────...
                ├────00096
            ├────sequences_BD
                ├────00001
                ├────...
                ├────00096
            ├────sep_trainlist.txt
            ├────sep_testlist.txt
      
    • Generate the compressed videos with ffmpeg using the command "ffmpeg -i LR.mp4 -vcodec libx264 -crf CRFvalue LR_compressed.mp4". We train FTVSR on 50% uncompressed videos and 50% compressed videos with CRF 15, 25, and 35.
  2. Testing set

    • REDS4 and Vid4 datasets. REDS4 consists of clips 000, 011, 015, and 020 from the original REDS training set. Download the compressed testing videos from baidu cloud or Google drive.
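The two mechanical dataset steps, merging the REDS validation clips into the training folder and building the ffmpeg compression command, can be sketched in Python as follows. The helpers are hypothetical, not scripts shipped with FTVSR. The source folders passed to `merge_reds_val` are assumptions about where the original REDS validation download lives; only the 240 to 269 renaming and the ffmpeg arguments come from this README.

```python
import shutil
from pathlib import Path
from typing import Dict, List

VAL_OFFSET = 240  # original validation clip 000 becomes 240, ..., 029 becomes 269


def reds_val_mapping(num_val_clips: int = 30, offset: int = VAL_OFFSET) -> Dict[str, str]:
    """Map original validation clip names (000-029) to merged names (240-269)."""
    return {f"{i:03d}": f"{i + offset:03d}" for i in range(num_val_clips)}


def merge_reds_val(val_dir: Path, train_dir: Path) -> None:
    """Move each validation clip folder into train_dir under its new name.

    Assumption: val_dir holds folders 000..029 (e.g. the original val_sharp
    or val_sharp_bicubic/X4 download); existing targets are never overwritten.
    """
    for old, new in reds_val_mapping().items():
        dst = train_dir / new
        if dst.exists():
            raise FileExistsError(f"refusing to overwrite {dst}")
        shutil.move(str(val_dir / old), str(dst))


def ffmpeg_compress_cmd(src: str, dst: str, crf: int) -> List[str]:
    """Argument list for the README's x264 compression command."""
    return ["ffmpeg", "-i", src, "-vcodec", "libx264", "-crf", str(crf), dst]
```

Call `merge_reds_val` once for the sharp frames and once for the bicubic X4 frames, then run `subprocess.run(ffmpeg_compress_cmd("LR.mp4", "LR_compressed.mp4", 25), check=True)` for each CRF value you need. Passing an argument list rather than a shell string avoids quoting problems with unusual file names.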

Test

  1. Clone this GitHub repo
git clone https://github.com/researchmm/FTVSR.git
cd FTVSR
  2. Download the pre-trained weights (baidu cloud | Google drive) into ./checkpoint
  3. Prepare the testing dataset and set "dataset_root" in configs/FTVSR_reds4.py and configs/FTVSR_vimeo90k.py
  4. Run the test
# REDS model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/FTVSR_reds4.py checkpoint/FTVSR_REDS.pth 8 [--save-path 'save_path']
# Vimeo model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/FTVSR_vimeo90k.py checkpoint/FTVSR_Vimeo90K.pth 8 [--save-path 'save_path']
  5. The results are saved to save_path.

Train

  1. Clone this GitHub repo
git clone https://github.com/researchmm/FTVSR.git
cd FTVSR
  2. Prepare the training dataset and set "dataset_root" in configs/FTVSR_reds4.py and configs/FTVSR_vimeo90k.py
  3. Run training
# REDS
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/FTVSR_reds4.py 8
# Vimeo
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/FTVSR_vimeo90k.py 8
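When launching on fewer GPUs, keep the trailing number in sync with CUDA_VISIBLE_DEVICES. Assuming this repo keeps the argument order of the MMEditing launch scripts it builds on, dist_test.sh takes CONFIG, CHECKPOINT, and GPUS positionally (extra flags such as --save-path follow), and dist_train.sh takes CONFIG and GPUS. The hypothetical helpers below only assemble the test and training commands shown above for a chosen GPU count; they are not part of the FTVSR code.

```python
import os
from typing import Dict, List, Optional, Tuple


def visible_devices(gpus: int) -> str:
    """Comma-separated device ids 0..gpus-1 for CUDA_VISIBLE_DEVICES."""
    if gpus < 1:
        raise ValueError("need at least one GPU")
    return ",".join(str(i) for i in range(gpus))


def dist_test_command(config: str, checkpoint: str, gpus: int = 8,
                      save_path: Optional[str] = None) -> Tuple[Dict[str, str], List[str]]:
    """Environment and argv for tools/dist_test.sh (assumed: CONFIG CHECKPOINT GPUS)."""
    argv = ["./tools/dist_test.sh", config, checkpoint, str(gpus)]
    if save_path is not None:
        argv += ["--save-path", save_path]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_devices(gpus))
    return env, argv


def dist_train_command(config: str, gpus: int = 8) -> Tuple[Dict[str, str], List[str]]:
    """Environment and argv for tools/dist_train.sh (assumed: CONFIG GPUS)."""
    argv = ["./tools/dist_train.sh", config, str(gpus)]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_devices(gpus))
    return env, argv
```

Usage: `env, argv = dist_test_command("configs/FTVSR_reds4.py", "checkpoint/FTVSR_REDS.pth", gpus=2)` followed by `subprocess.run(argv, env=env, check=True)` from the repository root. Deriving both values from one `gpus` argument is the point of the sketch: it rules out listing eight devices while starting two processes, or the reverse.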

Related projects

We also sincerely recommend some other excellent works related to ours. ✨

Citation

If you find the code and pre-trained models useful for your research, please consider citing our paper. 😊

@InProceedings{qiu2022learning,
author = {Qiu, Zhongwei and Yang, Huan and Fu, Jianlong and Fu, Dongmei},
title = {Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution},
booktitle = {ECCV},
year = {2022},
}

Acknowledgment

This code is built on mmediting. We thank the authors of BasicVSR for sharing their code.

More Repositories

  • TTSR: [CVPR'20] TTSR: Learning Texture Transformer Network for Image Super-Resolution (Python)
  • SiamDW: [CVPR'19 Oral] Deeper and Wider Siamese Networks for Real-Time Visual Tracking (Python)
  • Stark: [ICCV'21] Learning Spatio-Temporal Transformer for Visual Tracking (Python)
  • TracKit: [ECCV'20] Ocean: Object-aware Anchor-Free Tracking (Python)
  • STTN: [ECCV'2020] STTN: Learning Joint Spatial-Temporal Transformations for Video Inpainting (Jupyter Notebook)
  • AOT-GAN-for-Inpainting: [TVCG'2023] AOT-GAN for High-Resolution Image Inpainting (codebase for image inpainting) (Python)
  • LightTrack: [CVPR21] LightTrack: Finding Lightweight Neural Network for Object Tracking via One-Shot Architecture Search (Python)
  • MM-Diffusion: [CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation (Python)
  • PEN-Net-for-Inpainting: [CVPR'2019] PEN-Net: Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting (Python)
  • img2poem: [MM'18] Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training (Python)
  • tasn: Trilinear Attention Sampling Network for Fine-grained Image Recognition (Python)
  • soho: [CVPR'21 Oral] Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning (Python)
  • TTVSR: [CVPR'22 Oral] TTVSR: Learning Trajectory-Aware Transformer for Video Super-Resolution (Python)
  • DBTNet: Code for our NeurIPS'19 paper "Learning Deep Bilinear Transformation for Fine-grained Image Representation" (Python)
  • generate-it: A collection of models for image<->text generation in ACM MM 2021 (Python)
  • CKDN: [ICCV'21] CKDN: Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment (Python)
  • SariGAN: [NeurIPS'20] Learning Semantic-aware Normalization for Generative Adversarial Networks (Python)
  • VOT2019: The Winner and Runner-up Trackers for VOT-2019 Challenges (Python)
  • WSOD2: [ICCV'19] WSOD^2: Learning Bottom-up and Top-down Objectness Distillation for Weakly-supervised Object Detection (Python)
  • VQD-SR: [ICCV'23] VQD-SR: Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution (Python)
  • CyDAS: Cyclic Differentiable Architecture Search (Python)
  • NEAS (Python)
  • 2D-TAN: AAAI2020 - Learning 2D Temporal Localization Networks for Moment Localization with Natural Language (Python)
  • STTR: [ACCV'22] Fine-Grained Image Style Transfer with Visual Transformers (Python)
  • AAST-pytorch: [MM'20] Aesthetic-Aware Image Style Transfer (Python)
  • davinci-videofactory (JavaScript)
  • AI_Illustrator: [MM'22 Oral] AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation (Python)
  • language-guided-animation: [TMM 2023] Language-Guided Face Animation by Recurrent StyleGAN-based Generator (Python)