  • Stars: 197
  • Rank: 196,523 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created over 2 years ago
  • Updated about 2 years ago

Repository Details

[CVPR'22 Oral] TTVSR: Learning Trajectory-Aware Transformer for Video Super-Resolution

TTVSR (CVPR2022, Oral)

This is the official PyTorch implementation of the paper Learning Trajectory-Aware Transformer for Video Super-Resolution.

Introduction

We propose an approach named TTVSR for video super-resolution that leverages long-range frame dependencies. TTVSR introduces Transformer architectures into video super-resolution and formulates video frames into pre-aligned trajectories of visual tokens, so that attention is calculated along these trajectories.

Contribution

  • We propose a novel trajectory-aware Transformer, which is one of the first works to introduce Transformer into video super-resolution tasks.
  • TTVSR reduces computational costs and enables long-range modeling in videos.
  • TTVSR can outperform existing SOTA methods in four widely-used VSR benchmarks.

Requirements and dependencies

  • Python 3.7 (Anaconda recommended)
  • pytorch == 1.9.0
  • torchvision == 0.10.0
  • opencv-python == 4.5.3
  • mmcv-full == 1.3.9
  • scipy == 1.7.3
  • scikit-image == 0.19.0
  • lmdb == 1.2.1
  • yapf == 0.31.0
  • tensorboard == 2.6.0
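
A quick way to confirm that an environment matches the pinned versions above is sketched below. This is only an illustration and not part of the repository: the helper name `check_requirements` is hypothetical, and the `pytorch` entry is looked up under its PyPI distribution name `torch`. A version counts as matching when it equals the pin or extends it with a build suffix (for example `1.9.0+cu111` or `4.5.3.56`).

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7 needs the importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version

# Pinned versions copied from the list above ("pytorch" is installed as "torch").
PINNED = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "opencv-python": "4.5.3",
    "mmcv-full": "1.3.9",
    "scipy": "1.7.3",
    "scikit-image": "0.19.0",
    "lmdb": "1.2.1",
    "yapf": "0.31.0",
    "tensorboard": "2.6.0",
}


def check_requirements(pinned=PINNED, lookup=version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed
        matches = found is not None and (
            found == expected
            or found.startswith(expected + ".")  # e.g. 4.5.3.56
            or found.startswith(expected + "+")  # e.g. 1.9.0+cu111
        )
        if not matches:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for pkg, (expected, found) in check_requirements().items():
        print(f"{pkg}: expected {expected}, found {found}")
```

An empty result means every listed package is installed at the pinned version.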

Model and results

Pre-trained models can be downloaded from OneDrive, Google Drive, and Baidu Cloud (nbgc).

  • TTVSR_REDS.pth: trained on REDS dataset with BI degradation.
  • TTVSR_Vimeo90K.pth: trained on Vimeo-90K dataset with BD degradation.

The output results on REDS4, Vid4, and UDM10 can be downloaded from OneDrive, Google Drive, and Baidu Cloud (nbgc).

Dataset

  1. Training set

    • REDS dataset. We regroup the training and validation sets into one folder. The original training set has 240 clips, numbered 000 to 239. The clips of the original validation set are renamed to 240 through 269.
      • Make the REDS structure be:
        ├────REDS
            ├────train
                ├────train_sharp
                    ├────000
                    ├────...
                    ├────269
                ├────train_sharp_bicubic
                    ├────X4
                        ├────000
                        ├────...
                        ├────269
      
    • Vimeo-90K dataset. Download the original training + test set and use the script 'degradation/BD_degradation.m' (run in MATLAB) to generate the low-resolution images. The sep_trainlist.txt file in the downloaded zip file lists the training samples.
      • Make the Vimeo-90K structure be:
        ├────vimeo_septuplet
            ├────sequences
                ├────00001
                ├────...
                ├────00096
            ├────sequences_BD
                ├────00001
                ├────...
                ├────00096
            ├────sep_trainlist.txt
            ├────sep_testlist.txt
      
  2. Testing set

    • REDS4 dataset. Clips 000, 011, 015, and 020 from the original REDS training set.
    • Vimeo-90K dataset. The sep_testlist.txt file in the downloaded zip file lists the testing samples.
    • Vid4 and UDM10 datasets. Use the script 'degradation/BD_degradation.m' (run in MATLAB) to generate the low-resolution images.
      • Make the Vid4 and UDM10 structure be:
        ├────VID4
            ├────BD
                ├────calendar
                ├────...
            ├────HR
                ├────calendar
                ├────...
        ├────UDM10
            ├────BD
                ├────archpeople
                ├────...
            ├────HR
                ├────archpeople
                ├────...
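
The REDS regrouping described under the training set (validation clips renamed to 240 through 269 and placed next to the training clips) could be scripted roughly as follows. This is a sketch under assumptions, not a script shipped with the repository: it assumes the downloaded validation clips are numbered from 000, so that adding an offset of 240 yields the names above, and the function name and example paths are hypothetical.

```python
import shutil
from pathlib import Path


def regroup_reds_val(val_dir, train_dir, offset=240, dry_run=True):
    """Copy each numbered validation clip into train_dir as clip number + offset.

    Returns the list of (source, target) pairs. With dry_run=True nothing is copied.
    """
    val_dir, train_dir = Path(val_dir), Path(train_dir)
    plan = []
    for clip in sorted(val_dir.iterdir()):
        if not (clip.is_dir() and clip.name.isdigit()):
            continue  # skip stray files and non-numeric folders
        target = train_dir / f"{int(clip.name) + offset:03d}"
        if target.exists():
            raise FileExistsError(f"{target} already exists")
        plan.append((clip, target))
    if not dry_run:
        for src, dst in plan:
            shutil.copytree(src, dst)
    return plan


# Hypothetical usage; apply it to both the sharp and the bicubic X4 folders:
# regroup_reds_val("REDS/val/val_sharp", "REDS/train/train_sharp", dry_run=False)
# regroup_reds_val("REDS/val/val_sharp_bicubic/X4",
#                  "REDS/train/train_sharp_bicubic/X4", dry_run=False)
```

Running with the default `dry_run=True` first returns the planned renames without touching the disk, which makes it easy to check the mapping before copying.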
      

Test

  1. Clone this GitHub repo
     git clone https://github.com/researchmm/TTVSR.git
     cd TTVSR
  2. Download the pre-trained weights (OneDrive | Google Drive | Baidu Cloud (nbgc)) into ./checkpoint
  3. Prepare the testing dataset and modify "dataset_root" in configs/TTVSR_reds4.py and configs/TTVSR_vimeo90k.py
  4. Run the test
     # REDS model
     CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/TTVSR_reds4.py checkpoint/TTVSR_REDS.pth 8 [--save-path 'save_path']
     # Vimeo model
     CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_test.sh configs/TTVSR_vimeo90k.py checkpoint/TTVSR_Vimeo90K.pth 8 [--save-path 'save_path']
  5. The results are saved in save_path.
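
The test commands above are written for 8 GPUs. The sketch below shows one way to assemble the same command line for a different set of devices. It assumes that the last positional argument of the script is the GPU count and that it should equal the number of ids in CUDA_VISIBLE_DEVICES, as in the commands above; the helper names `build_dist_command` and `as_shell_line` are hypothetical and not part of the repository.

```python
import shlex


def build_dist_command(script, config, gpu_ids, checkpoint=None, save_path=None):
    """Return (env, argv) for a launcher such as ./tools/dist_test.sh."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # Expose only the requested devices to the launched processes.
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids)}
    argv = [script, config]
    if checkpoint is not None:  # testing passes a checkpoint, training does not
        argv.append(checkpoint)
    argv.append(str(len(gpu_ids)))  # assumed: GPU count matches the visible devices
    if save_path is not None:
        argv += ["--save-path", save_path]
    return env, argv


def as_shell_line(env, argv):
    """Format env and argv as a single shell command line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return prefix + " " + " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    env, argv = build_dist_command(
        "./tools/dist_test.sh",
        "configs/TTVSR_reds4.py",
        gpu_ids=[0, 1],
        checkpoint="checkpoint/TTVSR_REDS.pth",
        save_path="results/reds4",  # hypothetical output folder
    )
    print(as_shell_line(env, argv))
```

To launch directly from Python, the pair can be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`.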

Train

  1. Clone this GitHub repo
     git clone https://github.com/researchmm/TTVSR.git
     cd TTVSR
  2. Prepare the training dataset and modify "dataset_root" in configs/TTVSR_reds4.py and configs/TTVSR_vimeo90k.py
  3. Run training
     # REDS
     CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/TTVSR_reds4.py 8
     # Vimeo
     CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./tools/dist_train.sh configs/TTVSR_vimeo90k.py 8
  4. The training results are saved in ./ttvsr_reds4 and ./ttvsr_vimeo90k (this can be changed by modifying "work_dir" in configs/TTVSR_reds4.py and configs/TTVSR_vimeo90k.py)
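
For orientation, the two settings mentioned in the test and training steps might look like the excerpt below inside a config file. This is an illustrative fragment only: the names "dataset_root" and "work_dir" come from the steps above, while their placement as top-level assignments and the dataset path are assumptions, and the rest of the config is omitted.

```python
# Illustrative excerpt in the style of configs/TTVSR_reds4.py (not the actual file).
# Assumption: both settings are plain top-level assignments in the config.
dataset_root = '/path/to/datasets'  # placeholder: folder holding the prepared data
work_dir = './ttvsr_reds4'          # default output folder named in the training steps
```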

Related projects

We also sincerely recommend some other excellent works related to us. ✨

Citation

If you find the code and pre-trained models useful for your research, please consider citing our paper. 😊

@InProceedings{liu2022learning,
author = {Liu, Chengxu and Yang, Huan and Fu, Jianlong and Qian, Xueming},
title = {Learning Trajectory-Aware Transformer for Video Super-Resolution},
booktitle = {CVPR},
year = {2022},
month = {June}
}

Acknowledgment

This code is built on mmediting. We thank the authors of BasicVSR for sharing their code.

Contact

If you meet any problems, please describe them in issues or contact:
