STTN for Video Inpainting

[teaser figure]

Paper | Project | Slides | BibTeX

Learning Joint Spatial-Temporal Transformations for Video Inpainting

Yanhong Zeng, Jianlong Fu, and Hongyang Chao.
In ECCV 2020.

Citation

If any part of our paper or repository is helpful to your work, please cite:

@inproceedings{yan2020sttn,
  author = {Zeng, Yanhong and Fu, Jianlong and Chao, Hongyang},
  title = {Learning Joint Spatial-Temporal Transformations for Video Inpainting},
  booktitle = {The Proceedings of the European Conference on Computer Vision (ECCV)},
  year = {2020}
}

Introduction

High-quality video inpainting that completes missing regions in video frames is a promising yet challenging task.

In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting. Specifically, we simultaneously fill missing regions in all input frames with the proposed multi-scale patch-based attention modules, and optimize STTN with a spatial-temporal adversarial loss.
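
To make the attention idea concrete, here is a minimal single-head PyTorch sketch of patch-based attention over all frames; the shapes, names, and single-scale form are our simplifications, not the exact module used in this repository.

# Minimal sketch of patch-based spatial-temporal attention (illustrative only).
import torch
import torch.nn.functional as F

def spatial_temporal_patch_attention(feats, patch_size=8):
    """feats: [T, C, H, W] frame features; H and W must be divisible by patch_size.
    Every patch attends to every patch in every frame."""
    T, C, H, W = feats.shape
    p = patch_size
    # Unfold each frame into non-overlapping patch tokens: [T, C*p*p, N].
    patches = F.unfold(feats, kernel_size=p, stride=p)
    tokens = patches.permute(0, 2, 1).reshape(-1, C * p * p)   # [T*N, C*p*p]
    # Scaled dot-product attention across all patches of all frames.
    attn = torch.softmax(tokens @ tokens.t() / (C * p * p) ** 0.5, dim=-1)
    out = (attn @ tokens).reshape(T, -1, C * p * p).permute(0, 2, 1)
    # Fold the attended patches back into frames: [T, C, H, W].
    return F.fold(out, output_size=(H, W), kernel_size=p, stride=p)

STTN applies such attention at multiple patch sizes and aggregates the results, which is what makes the modules multi-scale.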

To demonstrate the superiority of the proposed model, we conduct both quantitative and qualitative evaluations using standard stationary masks and more realistic moving-object masks.

[STTN overview figure]

Installation

Clone this repo.

git clone git@github.com:researchmm/STTN.git
cd STTN/

We build our project on Python and PyTorch. For the full set of required Python packages, we suggest creating a Conda environment from the provided YAML, e.g.

conda env create -f environment.yml 
conda activate sttn
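
A quick way to sanity-check the environment after activation (our suggestion, not part of the repository):

# Verify that PyTorch imports and a CUDA GPU is visible.
import torch
print(torch.__version__)
print(torch.cuda.is_available())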

Completing Videos Using Pretrained Model

Result videos can be generated using pretrained models. For your reference, we provide a model pretrained on YouTube-VOS (Google Drive Folder).

  1. Download the pretrained model from the Google Drive Folder and save it in checkpoints/.

  2. Complete videos using the pretrained model. For example,

python test.py --video examples/schoolgirls_orig.mp4 --mask examples/schoolgirls --ckpt checkpoints/sttn.pth

The output videos are saved in examples/.
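
To complete your own videos, point the same flags at your data; the paths below are placeholders, and, as in the bundled example, --mask should be a directory of per-frame masks:

python test.py --video path/to/your_video.mp4 --mask path/to/your_masks --ckpt checkpoints/sttn.pth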

Dataset Preparation

We provide the dataset split files in datasets/.

Preparing the YouTube-VOS (2018) dataset. The dataset can be downloaded from here. In particular, we follow the standard train/validation/test split (3,471/474/508). The dataset should be arranged in the following directory structure:

datasets
    |- youtube-vos
        |- JPEGImages
           |- <video_id>.zip
           |- <video_id>.zip
        |- test.json 
        |- train.json 
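
This layout stores each video's frames as a single <video_id>.zip archive. If your download unpacks into one folder of JPEG frames per video, a small helper like the following can produce that layout; zip_frames is a hypothetical helper we introduce here, not a script shipped with this repository:

# Hypothetical helper: archive each video's frame folder as <video_id>.zip.
import os, zipfile

def zip_frames(frames_root):
    """frames_root contains one sub-folder of frames per video."""
    for video_id in os.listdir(frames_root):
        folder = os.path.join(frames_root, video_id)
        if not os.path.isdir(folder):
            continue
        with zipfile.ZipFile(folder + ".zip", "w") as zf:
            for name in sorted(os.listdir(folder)):
                zf.write(os.path.join(folder, name), arcname=name)

zip_frames("datasets/youtube-vos/JPEGImages")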

Preparing the DAVIS (2018) dataset. The dataset can be downloaded from here. In particular, there are 90 videos with densely annotated object masks and 60 videos without annotations. The dataset should be arranged in the following directory structure:

datasets
    |- davis
        |- JPEGImages
          |- cows.zip
          |- goat.zip
        |- Annotations
          |- cows.zip
          |- goat.zip
        |- test.json 
        |- train.json 
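
The same hypothetical zip_frames helper from the YouTube-VOS section can be applied to both DAVIS folders:

zip_frames("datasets/davis/JPEGImages")
zip_frames("datasets/davis/Annotations")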

Training New Models

Once the dataset is ready, new models can be trained with the following command:

python train.py --config configs/youtube-vos.json --model sttn 
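
As noted in the Introduction, training optimizes STTN with a spatial-temporal adversarial loss. Purely as an illustration of that kind of objective (not the repository's exact loss code), a hinge-style adversarial loss can be written as:

# Illustrative hinge-style GAN losses; d_real/d_fake are discriminator scores.
import torch

def d_hinge_loss(d_real, d_fake):
    # Discriminator: push scores on real videos above +1, on completed videos below -1.
    return (torch.relu(1.0 - d_real) + torch.relu(1.0 + d_fake)).mean()

def g_hinge_loss(d_fake):
    # Generator: raise the discriminator's score on completed videos.
    return -d_fake.mean()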

Testing

Testing follows the same procedure as completing videos with the pretrained model. For example,

python test.py --video examples/schoolgirls_orig.mp4 --mask examples/schoolgirls --ckpt checkpoints/sttn.pth

The output videos are saved in examples/.

Visualization

We provide an example of visualizing attention maps in visualization.ipynb.
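
For a quick look outside the notebook, a single attention map can be rendered with matplotlib; this is a minimal sketch that assumes attn is a 2-D torch tensor of shape [H, W] (the variable and function names are ours):

# Render one attention map as a heatmap (assumes a 2-D torch tensor).
import matplotlib.pyplot as plt

def show_attention(attn, title="attention map"):
    plt.imshow(attn.detach().cpu().numpy(), cmap="viridis")
    plt.title(title)
    plt.colorbar()
    plt.show()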

Training Monitoring

Training losses can be monitored via TensorBoard by running:

tensorboard --logdir release_mode

Contact

If you have any questions or suggestions about this paper, feel free to contact me ([email protected]).
