  • Stars: 1,750
  • Rank: 26,606 (Top 0.6%)
  • Language: Python
  • License: MIT License
  • Created over 2 years ago
  • Updated 8 months ago

Repository Details

[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

XMem

Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model

Ho Kei Cheng, Alexander Schwing

University of Illinois Urbana-Champaign

[arXiv] [PDF] [Project Page] [Open In Colab]

Demo

Handling long-term occlusion:

[Video: cans_crf20.mp4]

Very-long video; masked layer insertion:

[Video: breakdance_soft_crf20.mp4]

Source: https://www.youtube.com/watch?v=q5Xr0F4a0iU

Out-of-domain case:

[Video: Fujiwara_Chika.mp4]

Source: Kaguya-sama: Love Is War (かぐや様は告らせたい ～天才たちの恋愛頭脳戦～), Ep. 3; A-1 Pictures

[Failure Cases]

Features

  • Handles very long videos with limited GPU memory usage.
  • Fast: expect ~20 FPS even on long videos (hardware dependent).
  • Comes with a GUI (modified from MiVOS).

Table of Contents

  1. Introduction
  2. Results
  3. Interactive GUI demo
  4. Training/inference
  5. Citation

Introduction

[Figure: framework]

We frame Video Object Segmentation (VOS), first and foremost, as a memory problem. Prior works mostly use a single type of feature memory. This can take the form of network weights (i.e., online learning), last-frame segmentation (e.g., MaskTrack), spatial hidden representations (e.g., Conv-RNN-based methods), spatial-attentional features (e.g., STM, STCN, AOT), or long-term compact features (e.g., AFB-URR).
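Spatial-attentional memories such as those in STM and STCN store key-value features from past frames and read them by matching the current frame's query features against the stored keys. The sketch below shows that read for a single query position using plain scaled dot-product attention. It is a simplified illustration, not the exact readout of any cited method (STCN, for instance, uses an L2-based similarity instead of a dot product), and `softmax` and `memory_readout` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_readout(query: Vector, mem_keys: List[Vector], mem_values: List[Vector]) -> Vector:
    """Read one query position from a key-value memory bank.

    Each (key, value) pair stands for one spatial position of one past frame.
    The result is the attention-weighted average of the stored values.
    """
    scale = 1.0 / math.sqrt(len(query))  # usual scaled dot-product normalization
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in mem_keys]
    weights = softmax(scores)
    dim = len(mem_values[0])
    return [sum(w * v[d] for w, v in zip(weights, mem_values)) for d in range(dim)]


# Two stored positions; the query matches the first key far better than the second.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [0.0]]
print(memory_readout([5.0, 0.0], keys, values))  # roughly [9.72]: most weight on the first value
```

Every stored position enters the same softmax, so the cost of each read grows with the number of frames kept in memory.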

Methods with a short memory span are not robust to changes, while those with a large memory bank are subject to a catastrophic increase in computation and GPU memory usage. Attempts at long-term attentional VOS like AFB-URR compress features eagerly as soon as they are generated, leading to a loss of feature resolution.
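To see why an every-frame memory bank does not scale, the back-of-the-envelope sketch below counts the storage needed for keys and values and the size of one query frame's affinity matrix. All sizes are hypothetical assumptions chosen for illustration (854×480 input, feature stride 16, 64-dim keys, 512-dim values, 2-byte half-precision storage), not figures taken from the paper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids floating-point rounding)."""
    return -(-a // b)


def feature_positions(height: int, width: int, stride: int = 16) -> int:
    """Spatial positions in a feature map downsampled by `stride` (assumed value)."""
    return ceil_div(height, stride) * ceil_div(width, stride)


def memory_bank_bytes(num_frames: int, height: int, width: int, stride: int = 16,
                      key_dim: int = 64, value_dim: int = 512,
                      bytes_per_elem: int = 2) -> int:
    """Bytes needed to keep keys and values for every position of every frame.

    key_dim, value_dim, and bytes_per_elem (2 = half precision) are illustrative
    assumptions, not settings taken from the paper.
    """
    per_position = (key_dim + value_dim) * bytes_per_elem
    return num_frames * feature_positions(height, width, stride) * per_position


def affinity_entries(num_frames: int, height: int, width: int, stride: int = 16) -> int:
    """Entries in the query-to-memory affinity matrix for a single query frame."""
    p = feature_positions(height, width, stride)
    return p * (num_frames * p)


for frames in (100, 1_000, 10_000):
    gb = memory_bank_bytes(frames, 480, 854) / 1e9
    entries = affinity_entries(frames, 480, 854)
    print(f"{frames:>6} frames: bank {gb:6.2f} GB, affinity {entries:.2e} entries")
```

Under these assumptions one frame adds 1,866,240 bytes (about 1.9 MB), so 10,000 frames need about 18.7 GB for keys and values alone, before counting the affinity matrix. Both quantities grow linearly with the number of stored frames.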

Our method is inspired by the Atkinson-Shiffrin human memory model, which has a sensory memory, a working memory, and a long-term memory. These memory stores have different temporal scales and complement each other in our memory reading mechanism. The resulting model performs well on both short-term and long-term video datasets and handles videos with more than 10,000 frames with ease.
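The toy class below sketches, under heavy simplifying assumptions, how three stores with different time scales can keep memory bounded on arbitrarily long videos. It is not XMem's implementation: the sensory store here is an exponential moving average standing in for a recurrent update, and consolidation simply keeps the most-used entries (by accumulated attention weight) rather than following the consolidation procedure described in the paper. `Entry`, `attend`, `ThreeStoreMemory`, and all capacities and rates are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vector = List[float]


@dataclass
class Entry:
    key: Vector
    value: Vector
    usage: float = 0.0  # accumulated attention weight this entry has received


def attend(query: Vector, entries: List[Entry]) -> Vector:
    """Scaled dot-product readout that also credits each entry with its weight."""
    if not entries:
        raise ValueError("memory is empty")
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, e.key)) for e in entries]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    for w, e in zip(weights, entries):
        e.usage += w  # frequently attended entries accumulate higher usage
    dim = len(entries[0].value)
    return [sum(w * e.value[d] for w, e in zip(weights, entries)) for d in range(dim)]


@dataclass
class ThreeStoreMemory:
    max_working: int = 4        # hypothetical working-memory capacity (entries)
    min_working: int = 2        # most recent entries that are never consolidated
    max_long_term: int = 3      # hypothetical long-term capacity (entries)
    sensory_rate: float = 0.5   # hypothetical blend rate for the sensory state
    sensory: Vector = field(default_factory=list)
    working: List[Entry] = field(default_factory=list)
    long_term: List[Entry] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 1 <= self.min_working < self.max_working:
            raise ValueError("need 1 <= min_working < max_working")

    def update_sensory(self, feature: Vector) -> None:
        """Every frame: a short-lived state (EMA stand-in for a recurrent update)."""
        if not self.sensory:
            self.sensory = list(feature)
        else:
            a = self.sensory_rate
            self.sensory = [(1 - a) * h + a * f for h, f in zip(self.sensory, feature)]

    def add_to_working(self, key: Vector, value: Vector) -> None:
        """Store a new full-resolution entry; consolidate when over capacity."""
        self.working.append(Entry(list(key), list(value)))
        if len(self.working) > self.max_working:
            self._consolidate()

    def _consolidate(self) -> None:
        # Older entries leave working memory; only the most-used ones are kept long term.
        candidates = self.working[: -self.min_working]
        self.working = self.working[-self.min_working :]
        pool = self.long_term + candidates
        pool.sort(key=lambda e: e.usage, reverse=True)  # stable for equal usage
        self.long_term = pool[: self.max_long_term]

    def read(self, query: Vector) -> Tuple[Vector, Vector]:
        """Attend over long-term and working memory; also return the sensory state."""
        return attend(query, self.long_term + self.working), list(self.sensory)
```

However many frames are processed, the store holds at most `max_working + max_long_term` entries. In a per-frame loop, `update_sensory` would be called on every frame, while `add_to_working` would typically be called only on selected frames.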

Training/inference

First, install the required Python packages and datasets by following GETTING_STARTED.md.

For training, see TRAINING.md.

For inference, see INFERENCE.md.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{cheng2022xmem,
  title={{XMem}: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model},
  author={Cheng, Ho Kei and Schwing, Alexander G.},
  booktitle={ECCV},
  year={2022}
}

Related projects that this paper builds upon:

@inproceedings{cheng2021stcn,
  title={Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={NeurIPS},
  year={2021}
}

@inproceedings{cheng2021mivos,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

We use f-BRS in the interactive demo: https://github.com/saic-vul/fbrs_interactive_segmentation

And if you want to cite the datasets:


@article{shi2015hierarchicalECSSD,
  title={Hierarchical image saliency detection on extended CSSD},
  author={Shi, Jianping and Yan, Qiong and Xu, Li and Jia, Jiaya},
  journal={TPAMI},
  year={2015},
}

@inproceedings{wang2017DUTS,
  title={Learning to Detect Salient Objects with Image-level Supervision},
  author={Wang, Lijun and Lu, Huchuan and Wang, Yifan and Feng, Mengyang and Wang, Dong and Yin, Baocai and Ruan, Xiang},
  booktitle={CVPR},
  year={2017}
}

@inproceedings{FSS1000,
  title = {FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation},
  author = {Li, Xiang and Wei, Tianhan and Chen, Yau Pun and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2020}
}

@inproceedings{zeng2019towardsHRSOD,
  title = {Towards High-Resolution Salient Object Detection},
  author = {Zeng, Yi and Zhang, Pingping and Zhang, Jianming and Lin, Zhe and Lu, Huchuan},
  booktitle = {ICCV},
  year = {2019}
}

@inproceedings{cheng2020cascadepsp,
  title={{CascadePSP}: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement},
  author={Cheng, Ho Kei and Chung, Jihoon and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2020}
}

@inproceedings{xu2018youtubeVOS,
  title={{YouTube-VOS}: A large-scale video object segmentation benchmark},
  author={Xu, Ning and Yang, Linjie and Fan, Yuchen and Yue, Dingcheng and Liang, Yuchen and Yang, Jianchao and Huang, Thomas},
  booktitle = {ECCV},
  year={2018}
}

@inproceedings{perazzi2016benchmark,
  title={A benchmark dataset and evaluation methodology for video object segmentation},
  author={Perazzi, Federico and Pont-Tuset, Jordi and McWilliams, Brian and Van Gool, Luc and Gross, Markus and Sorkine-Hornung, Alexander},
  booktitle={CVPR},
  year={2016}
}

@inproceedings{denninger2019blenderproc,
  title={BlenderProc},
  author={Denninger, Maximilian and Sundermeyer, Martin and Winkelbauer, Dominik and Zidan, Youssef and Olefir, Dmitry and Elbadrawy, Mohamad and Lodhi, Ahsan and Katam, Harinandan},
  booktitle={arXiv:1911.01911},
  year={2019}
}

@inproceedings{shapenet2015,
  title       = {{ShapeNet: An Information-Rich 3D Model Repository}},
  author      = {Chang, Angel Xuan and Funkhouser, Thomas and Guibas, Leonidas and Hanrahan, Pat and Huang, Qixing and Li, Zimo and Savarese, Silvio and Savva, Manolis and Song, Shuran and Su, Hao and Xiao, Jianxiong and Yi, Li and Yu, Fisher},
  booktitle   = {arXiv:1512.03012},
  year        = {2015}
}

Contact: [email protected]

More Repositories

  1. Tracking-Anything-with-DEVA (Python, 1,248 stars): [ICCV 2023] Tracking Anything with Decoupled Video Segmentation
  2. CascadePSP (Python, 826 stars): [CVPR 2020] CascadePSP: Toward Class-Agnostic and Very High-Resolution Segmentation via Global and Local Refinement
  3. Cutie (Python, 705 stars): [CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
  4. STCN (Python, 542 stars): [NeurIPS 2021] Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation
  5. MiVOS (Python, 465 stars): [CVPR 2021] Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion. Semi-supervised VOS as well!
  6. Mask-Propagation (Python, 128 stars): [CVPR 2021] MiVOS - Mask Propagation module. Reproduced STM (and better) with training code 🌟. Semi-supervised video object segmentation evaluation.
  7. Scribble-to-Mask (Python, 87 stars): [CVPR 2021] MiVOS - Scribble to Mask module
  8. vos-benchmark (Python, 27 stars): Fast and general video object segmentation evaluation.
  9. PyTorch-ARCNN (Python, 14 stars): A test script for ARCNN powered by PyTorch.
  10. davis2016-evaluation (Python, 8 stars)
  11. nitrous-ema (Python, 4 stars): Fast and simple post-hoc EMA (Karras et al., 2023) for PyTorch with minimal `.item()` calls. ~78% lower overhead than ema_pytorch.
  12. Course-Data-Analyser (HTML, 3 stars): A project for COMP2021 that analyses course data at HKUST.
  13. CharTrans-GAN (TeX, 3 stars): Uses a GAN to perform style transfer of Chinese characters.
  14. BlenderVOSRenderer (Python, 2 stars)
  15. Single-View-Metrology-Step-By-Step (Jupyter Notebook, 1 star): An implementation of Single View Metrology (Criminisi99) with step-by-step guidance in a Jupyter Notebook.
  16. Android-Matrix-Calculator (Java, 1 star): A simple Android matrix calculator that supports the input of unknowns.
  17. VisualChat-Painter-example (Java, 1 star)
  18. htyc-eitc-student (Java, 1 star): This is a repo for storing and sharing the resources provided to EITC students in HTYC.
  19. kinetics_to_frames (Python, 1 star): Converts Kinetics datasets (or other video datasets) to frames. Supports resizing and temporal sampling for space efficiency.
  20. Markov-Next-Word (Go, 1 star): A next-word prediction program using an n-gram Markov chain, written in Go.
  21. shared-memory-tensor-dataset (Python, 1 star): This repository provides an example of reading from a single shared memory tensor from multiple processes (e.g., with DDP).