  • Stars
    278
  • Rank 144,420 (Top 3 %)
  • Language
    Python
  • License
    Other
  • Created about 5 years ago
  • Updated over 1 year ago

Repository Details

RVOS: End-to-End Recurrent Network for Video Object Segmentation (CVPR 2019)

See our project website here.

To develop this code, we started from RSIS (Recurrent Semantic Instance Segmentation), which can be found here, and adapted it to the video object segmentation task.

One shot visual results

RVOS One shot

Zero shot visual results

RVOS Zero shot

License

This code cannot be used for commercial purposes. Please contact the authors if interested in licensing this software.

Installation

  • Clone the repo:
git clone https://github.com/imatge-upc/rvos.git
  • Install requirements:
pip install -r requirements.txt
  • Install PyTorch 1.0 (choose the whl file according to your setup, e.g. your CUDA version):
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl
pip3 install torchvision

Data

YouTube-VOS

Download the YouTube-VOS dataset from their website. You will need to register on CodaLab to download the dataset. Create a folder named databases in the parent folder of the root directory of this project and place the dataset there in a folder named YouTubeVOS. The root directory (the rvos folder) and the databases folder should be in the same directory.

RVOS training on YouTube-VOS uses a split of the train set into two subsets: train-train and train-val. The model is trained on the train-train subset and validated on the train-val subset to decide whether the model should be saved. To train with this split, the code requires two JSON files in the databases/YouTubeVOS/train/ folder, named train-train-meta.json and train-val-meta.json, with the same format as the meta.json included with the dataset download. You can also download the partition used in our experiments at the following links:
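If you prefer to build your own partition instead of downloading ours, the two files could be produced along these lines. This is only a minimal sketch: it assumes meta.json has a top-level "videos" dictionary keyed by video id, and it uses a random split, which is not necessarily how the partition used in our experiments was made. The function names, the 10% validation fraction and the seed are illustrative choices.

```python
import json
import random
from pathlib import Path


def split_meta(meta, val_fraction=0.1, seed=0):
    """Split a YouTube-VOS style meta dict into train-train and train-val parts.

    Assumption: the layout is {"videos": {video_id: {...}}}. Any other
    top-level keys are not copied.
    """
    video_ids = sorted(meta["videos"])
    random.Random(seed).shuffle(video_ids)
    # Keep at least one video for validation.
    n_val = max(1, int(len(video_ids) * val_fraction))
    val_ids = set(video_ids[:n_val])
    train_meta = {"videos": {v: meta["videos"][v] for v in video_ids if v not in val_ids}}
    val_meta = {"videos": {v: meta["videos"][v] for v in video_ids if v in val_ids}}
    return train_meta, val_meta


def write_split(train_dir, val_fraction=0.1, seed=0):
    """Read meta.json in train_dir and write the two split files next to it."""
    train_dir = Path(train_dir)
    with open(train_dir / "meta.json") as f:
        meta = json.load(f)
    train_meta, val_meta = split_meta(meta, val_fraction, seed)
    for name, part in (("train-train-meta.json", train_meta),
                       ("train-val-meta.json", val_meta)):
        with open(train_dir / name, "w") as f:
            json.dump(part, f)
```

For example, write_split("../databases/YouTubeVOS/train") would write both files into the folder the training code reads from.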

DAVIS 2017

Download the DAVIS 2017 dataset from their website at 480p resolution. Create a folder named databases in the parent folder of the root directory of this project and place the dataset there in a folder named DAVIS2017. The root directory (the rvos folder) and the databases folder should be in the same directory.
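The expected folder layout can be checked before training with a small helper such as the one below. It is a sketch, not part of the repository: the function names are hypothetical, and the dictionary keys simply mirror the dataset names used elsewhere in this README.

```python
from pathlib import Path


def expected_database_dirs(project_root):
    """Return the dataset folders described above, given the rvos root folder."""
    # The databases folder sits next to the project root, not inside it.
    databases = Path(project_root).resolve().parent / "databases"
    return {
        "youtube": databases / "YouTubeVOS",
        "davis2017": databases / "DAVIS2017",
    }


def missing_databases(project_root):
    """List the dataset keys whose folder does not exist on disk."""
    return [name for name, path in expected_database_dirs(project_root).items()
            if not path.is_dir()]
```

Calling missing_databases(".") from the rvos folder returns an empty list when both datasets are in place. If you only use one dataset, the other key will be reported as missing, which is harmless.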

LMDB data indexing

To speed up data loading considerably, we recommend generating an LMDB index of the dataset by running:

python dataset_lmdb_generator.py -dataset=youtube

or

python dataset_lmdb_generator.py -dataset=davis2017

depending on the dataset you are using.

Training

  • Train the model for one-shot video object segmentation with python train_previous_mask.py -model_name model_name. Checkpoints and logs will be saved under ../models/model_name.
  • Train the model for zero-shot video object segmentation with python train.py -model_name model_name. Checkpoints and logs will be saved under ../models/model_name.
  • Other arguments can be passed as well. For convenience, scripts to train with typical parameters are provided under scripts/.
  • Plot loss curves at any time with python plot_curves.py -model_name model_name.

Evaluation

We provide bash scripts to evaluate models on the YouTube-VOS and DAVIS 2017 datasets. You can find them under the scripts folder. eval_one_shot_youtube.sh and eval_zero_shot_youtube.sh generate the results for the YouTube-VOS dataset on one-shot and zero-shot video object segmentation, respectively. Likewise, eval_one_shot_davis.sh and eval_zero_shot_davis.sh generate the results for the DAVIS 2017 dataset on one-shot and zero-shot video object segmentation, respectively.

Furthermore, in the src folder, prepare_results_submission.py and prepare_results_submission_davis can be used to convert the results to the format required by the official evaluation servers of YouTube-VOS and DAVIS, respectively.

Demo

You can run demo.py to generate the segmentation masks of a video. Just run:

python demo.py -model_name one-shot-model-davis --overlay_masks

and it will generate the resulting masks.

To run the demo for your own videos:

  1. Extract the frames to a folder (make sure their names are in order, e.g. 00000.jpg, 00001.jpg, ...).
  2. Have the initial mask corresponding to the first frame (e.g. 00000.png).
  3. Run python demo.py -model_name one-shot-model-davis -frames_path path-to-your-frames -mask_path path-to-initial-mask --overlay_masks
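If your extracted frames are not already named in order (step 1), a renaming plan like the following may help. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the five-digit padding follows the 00000.jpg example above, and the code only computes the new names without touching any files.

```python
import re
from pathlib import Path


def _natural_key(name):
    """Sort key that compares digit runs numerically, so frame2 sorts before frame10."""
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", name)]


def plan_frame_names(filenames, digits=5):
    """Map each original frame file name to a zero-padded sequential name."""
    ordered = sorted(filenames, key=_natural_key)
    return {old: f"{i:0{digits}d}{Path(old).suffix.lower()}"
            for i, old in enumerate(ordered)}
```

The returned mapping can then be applied with os.rename on each pair once you have checked that the order is what you expect.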

For zero-shot (i.e. without an initial mask), run python demo.py -model_name zero-shot-model-davis -frames_path path-to-your-frames --zero_shot --overlay_masks

You can also use the -results_path argument to save the results to a folder of your choice.
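When calling the demo from another Python script, the command line can be assembled programmatically. The sketch below uses only the flags mentioned in this section. The helper name is hypothetical, and rejecting a mask in zero-shot mode is a choice made for this example, not a documented behaviour of demo.py.

```python
import sys


def build_demo_command(model_name, frames_path=None, mask_path=None,
                       zero_shot=False, overlay_masks=True, results_path=None):
    """Assemble the argument list for demo.py from the flags described above."""
    # Assumption of this sketch: an initial mask makes no sense in zero-shot mode.
    if zero_shot and mask_path is not None:
        raise ValueError("zero-shot mode does not take an initial mask")
    cmd = [sys.executable, "demo.py", "-model_name", model_name]
    if frames_path is not None:
        cmd += ["-frames_path", str(frames_path)]
    if mask_path is not None:
        cmd += ["-mask_path", str(mask_path)]
    if results_path is not None:
        cmd += ["-results_path", str(results_path)]
    if zero_shot:
        cmd.append("--zero_shot")
    if overlay_masks:
        cmd.append("--overlay_masks")
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the folder that contains demo.py.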

Pretrained models

Download weights for models trained with:

The same files are also available in this folder in Google Drive.

Extract the archive and place the resulting folder under the models directory. You can then run the evaluation scripts with the downloaded model by setting args.model_name to the name of the folder.

Contact

For questions and suggestions use the issues section or send an e-mail to [email protected]
