  • Language
    Python
  • License
    MIT License

[CVPR-2022 (oral)]-Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation

Paper, Slides, Poster, Video

Xiangtai Li, Wenwei Zhang, Jiangmiao Pang, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy.

We introduce Video K-Net, a simple, strong, and unified framework for fully end-to-end dense video segmentation.

The method is built upon K-Net, a method of unifying image segmentation via a group of learnable kernels.
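
As a rough illustration of the kernel idea (not the authors' implementation), the sketch below shows how each kernel in a group can produce one mask by taking a dot product with the per-pixel features, which is equivalent to a 1x1 convolution. The shapes, the toy values, and the helper name `predict_masks` are made up for this example; in K-Net the kernels are learned rather than hand-written.

```python
from typing import List

Feature = List[List[List[float]]]  # [C][H][W] feature map (toy values)


def predict_masks(kernels: List[List[float]], features: Feature) -> List[List[List[float]]]:
    """Return one H x W logit map per kernel (a 1x1 convolution written as dot products)."""
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    masks = []
    for kernel in kernels:
        mask = [
            [sum(kernel[c] * features[c][y][x] for c in range(channels)) for x in range(width)]
            for y in range(height)
        ]
        masks.append(mask)
    return masks


# Toy input: 2 channels, 2 x 2 pixels.
features = [
    [[1.0, 0.0], [0.0, 1.0]],  # channel 0
    [[0.0, 1.0], [1.0, 0.0]],  # channel 1
]
kernels = [[1.0, -1.0], [-1.0, 1.0]]  # two hypothetical kernels

masks = predict_masks(kernels, features)
# Assign every pixel to the kernel with the highest logit.
assignment = [
    [max(range(len(kernels)), key=lambda k: masks[k][y][x]) for x in range(2)]
    for y in range(2)
]
print(assignment)
```

Each kernel here stands for one segment (a thing instance or a stuff class), so the per-pixel argmax over kernels yields a segmentation map.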

This project contains the training and testing code of Video K-Net for three tasks: VPS (Video Panoptic Segmentation), VSS (Video Semantic Segmentation), and VIS (Video Instance Segmentation).

To the best of our knowledge, our Video K-Net is the first open-sourced method that supports three different video segmentation tasks (VIS, VPS, VSS) for Video Scene Understanding.

News! Video K-Net is acknowledged as a strong baseline for the CVPR-2023 workshop "The 2nd Pixel-level Video Understanding in the Wild".

News! Video K-Net also supports the VIP-Seg dataset (CVPR-2022), where it achieves a new state-of-the-art result.

Environment and Dataset Preparation

Our codebase is based on MMDetection and MMSegmentation. Parts of the code are borrowed from MMTracking and UniTrack.

  • MIM >= 0.1.1
  • MMCV-full >= v1.3.8
  • MMDetection == v2.18.0
  • timm
  • scipy
  • panopticapi
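
A quick way to see whether an environment matches the list above is to query the installed package metadata. The sketch below is only an illustration: the pip distribution names (`openmim`, `mmcv-full`, `mmdet`, and so on) are assumptions about how the packages were installed, and the version parser is deliberately simple.

```python
from importlib import metadata

# Requirements from the list above. The pip distribution names are assumptions.
REQUIREMENTS = {
    "openmim": (">=", "0.1.1"),
    "mmcv-full": (">=", "1.3.8"),
    "mmdet": ("==", "2.18.0"),
    "timm": None,
    "scipy": None,
    "panopticapi": None,
}


def parse_version(text):
    """Turn '1.3.8' or 'v2.18.0' into a tuple of ints, ignoring non-numeric suffixes."""
    parts = []
    for piece in text.lstrip("v").split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed, constraint):
    """Check an installed version string against an (operator, version) pair or None."""
    if constraint is None:
        return True
    op, wanted = constraint
    have, want = parse_version(installed), parse_version(wanted)
    return have >= want if op == ">=" else have == want


def check_environment():
    """Return {package: status string} for every requirement."""
    report = {}
    for name, constraint in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if satisfies(installed, constraint) else f"found {installed}"
    return report


if __name__ == "__main__":
    for package, status in check_environment().items():
        print(f"{package}: {status}")
```

Packages without a version constraint are only checked for presence.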

See DATASET.md for dataset preparation.

The knet folder contains Video K-Net for VPS.

The knet_vis folder contains Video K-Net for VIS.

Pretrained CKPTs and Trained Models

We provide the pretrained models for VPS and VIS.

Baidu Yun Link: here Code:i034

One Drive Link: here

The pretrained models are provided for training Video K-Net.

The trained models are also provided for testing and experimentation.

[VPS] KITTI-STEP

  1. First, pretrain K-Net on the Cityscapes-STEP dataset. As shown in the appendix of the original STEP paper and in our own experiments, this step is very important for segmentation performance. You can also use our trained model for verification.

Cityscapes-STEP follows the STEP format: 17 stuff classes and 2 thing classes.

# train cityscapes step panoptic segmentation models
sh ./tools/slurm_train.sh $PARTITION knet_step configs/det/knet_cityscapes_step/knet_s3_r50_fpn.py $WORK_DIR --no-validate

  2. Then train Video K-Net on KITTI-STEP. We provide the Cityscapes-pretrained models for Video K-Net.

For slurm users:

# train Video K-Net on KITTI-step using R-50
GPUS=8 sh ./tools/slurm_train.sh $PARTITION video_knet_step configs/det/video_knet_kitti_step/video_knet_s3_r50_rpn_1x_kitti_step_sigmoid_stride2_mask_embed_link_ffn_joint_train.py $WORK_DIR --no-validate --load-from /path_to_knet_step_city_r50
# train Video K-Net on KITTI-step using Swin-base
GPUS=16 GPUS_PER_NODE=8 sh ./tools/slurm_train.sh $PARTITION video_knet_step configs/det/video_knet_kitti_step/video_knet_s3_swinb_rpn_1x_kitti_step_sigmoid_stride2_mask_embed_link_ffn_joint_train.py $WORK_DIR --no-validate --load-from /path_to_knet_step_city_r50

Our models are trained with two V100 machines.

For a local machine:

# train Video K-Net on KITTI-step with 8 GPUs
sh ./tools/dist_train.sh video_knet_step configs/det/video_knet_kitti_step/video_knet_s3_r50_rpn_1x_kitti_step_sigmoid_stride2_mask_embed_link_ffn_joint_train.py 8 $WORK_DIR --no-validate

  3. Testing and demo.

We provide both VPQ and STQ metrics to evaluate VPS models.

# test locally 
sh ./tools/dist_step_test.sh configs/det/knet_cityscapes_step/knet_s3_r50_fpn.py $MODEL_DIR

We also dump the colored images for debugging.

# eval STEP STQ
python tools/eval_dstq_step.py result_path gt_path
# eval STEP VPQ
python tools/eval_dvpq_step.py result_path gt_path
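
For reference, STQ as defined in the STEP paper is the geometric mean of an association quality (AQ) term and a segmentation quality (SQ, class-level IoU) term. The snippet below only shows that final combination step, assuming AQ and SQ have already been computed. The function name `stq` and the sample values are illustrative and are not taken from this codebase.

```python
import math


def stq(aq: float, sq: float) -> float:
    """Combine association quality and segmentation quality into STQ (geometric mean)."""
    if not (0.0 <= aq <= 1.0 and 0.0 <= sq <= 1.0):
        raise ValueError("AQ and SQ are expected to lie in [0, 1]")
    return math.sqrt(aq * sq)


# Illustrative values only, not results from the paper.
print(round(stq(0.64, 0.81), 4))
```

Because of the geometric mean, a model cannot reach a high STQ by doing well on only one of tracking or segmentation.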

Toy Video K-Net

As shown in the paper, we also provide a toy Video K-Net in knet/video/knet_quansi_dense_embed_fc_toy_exp.py. It uses the K-Net pre-trained on image-level KITTI-STEP, without tracking.

[VIS] YouTube-VIS-2019

  1. First, download the pre-trained image K-Net instance segmentation models. All models are pretrained on COCO, which is common practice. You can also pretrain them yourself; we provide the config for pretraining.

For slurm users:

# train K-Net instance segmentation models on COCO using R-50
GPUS=8 sh ./tools/slurm_train.sh $PARTITION knet_instance configs/det/coco/knet_s3_r50_fpn_ms-3x_coco.py $WORK_DIR 

  2. Then train Video K-Net in a clip-wise manner.

# train Video K-Net VIS models using R-50
GPUS=8 sh ./tools/slurm_train.sh $PARTITION video_knet_vis configs/video_knet_vis/video_knet_vis/knet_track_r50_1x_youtubevis.py $WORK_DIR --load-from /path_to_knet_instance_coco

  3. To evaluate Video K-Net on VIS, dump the prediction results for submission to the CodaLab server.

# test Video K-Net VIS models using R-50
GPUS=8 sh tools_vis/dist_test_whole_video.sh $PARTITION video_knet_vis configs/video_knet_vis/video_knet_vis/knet_track_r50_1x_youtubevis.py $WORK_DIR --format-only

The result json is dumped into the root of this codebase.
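
Evaluation servers for YouTube-VIS usually expect the predictions as a zip archive. The sketch below packages the dumped file, assuming it is named `results.json`; the file name, the archive name, and the helper `pack_submission` are assumptions, not taken from this codebase.

```python
import zipfile
from pathlib import Path


def pack_submission(json_path: str, zip_path: str = "submission.zip") -> Path:
    """Store the result JSON at the top level of a compressed zip archive."""
    source = Path(json_path)
    if not source.is_file():
        raise FileNotFoundError(f"no result file at {source}")
    target = Path(zip_path)
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # arcname drops parent directories so the JSON sits at the archive root.
        archive.write(source, arcname=source.name)
    return target


if __name__ == "__main__":
    # Hypothetical file name; adjust to whatever the test script actually wrote.
    if Path("results.json").is_file():
        print(pack_submission("results.json"))
```

Check the server's submission instructions for the exact file name it requires inside the archive.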

[VPS] VIP-Seg

  1. First, download the pre-trained image K-Net panoptic segmentation models. All models are pretrained on COCO, which is a common step following VIP-Seg. You can also pretrain them yourself; we provide the config for pretraining.

# train K-Net on COCO Panoptic Segmentation
GPUS=8 sh ./tools/slurm_train.sh $PARTITION knet_coco configs/det/coco/knet_s3_r50_fpn_ms-3x_coco-panoptic.py $WORK_DIR 

  2. Train Video K-Net on the VIP-Seg dataset.

# train Video K-Net on VIP-Seg
GPUS=8 sh ./tools/slurm_train.sh $PARTITION video_knet_vis configs/det/video_knet_vipseg/video_knet_s3_r50_rpn_vipseg_mask_embed_link_ffn_joint_train.py $WORK_DIR --load-from /path/knet_coco_pretrained_r50

  3. Test Video K-Net on the VIP-Seg val set.

# test locally on VIP-Seg
sh ./tools/dist_step_test.sh configs/det/video_knet_vipseg/video_knet_s3_r50_rpn_vipseg_mask_embed_link_ffn_joint_train.py $MODEL_DIR 

We also dump the colored images for debugging.

# eval VIP-Seg STQ
python tools/eval_dstq_vipseg.py result_path gt_path
# eval VIP-Seg VPQ
python tools/eval_dvpq_vipseg.py result_path gt_path
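
As background for the VPQ number, VPQ extends image-level panoptic quality (PQ) by matching segments as tubes across a window of k frames and averaging the result over several window sizes. The sketch below shows only that arithmetic for a single class, assuming the tube matching has already been done. The function names, the window sizes, and the sample numbers are illustrative and are not taken from the evaluation scripts.

```python
from statistics import mean
from typing import Dict, Sequence


def panoptic_quality(tp_ious: Sequence[float], num_fp: int, num_fn: int) -> float:
    """PQ = sum of IoUs over matched segments / (|TP| + 0.5 * |FP| + 0.5 * |FN|)."""
    denominator = len(tp_ious) + 0.5 * num_fp + 0.5 * num_fn
    return sum(tp_ious) / denominator if denominator > 0 else 0.0


def video_panoptic_quality(pq_per_window: Dict[int, float]) -> float:
    """Average the PQ values obtained for each temporal window size k."""
    return mean(list(pq_per_window.values()))


# Hypothetical matching outcomes for two window sizes.
pq_per_window = {
    1: panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1),
    2: panoptic_quality([0.7], num_fp=1, num_fn=2),
}
print(round(video_panoptic_quality(pq_per_window), 4))
```

A full evaluation also averages PQ over classes before averaging over window sizes.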

Visualization Results

Results on KITTI-STEP DataSet

Results on VIP-Seg DataSet

Results on YouTube-VIS DataSet

Short-term segmentation and tracking results on the Cityscapes-VPS dataset. Images (left), Video K-Net (middle), ground truth (right).

Long-term segmentation and tracking results on the STEP dataset.

Related Project and Acknowledgement

NeurIPS-2021, K-Net: Unified Segmentation: Our Image baseline (https://github.com/ZwwWayne/K-Net)

ECCV-2022, PolyphonicFormer: A Unified Framework For Panoptic Segmentation + Depth Estimation (winner of ICCV-2021 BMTT workshop) (https://github.com/HarborYuan/PolyphonicFormer)

Citing Video K-Net 🙏

If you use our codebase in your research or for the CVPR-2023 pixel-level video workshop, please use the following BibTeX entries.

@inproceedings{li2022videoknet,
  title={Video k-net: A simple, strong, and unified baseline for video segmentation},
  author={Li, Xiangtai and Zhang, Wenwei and Pang, Jiangmiao and Chen, Kai and Cheng, Guangliang and Tong, Yunhai and Loy, Chen Change},
  booktitle={CVPR},
  year={2022}
}

@article{zhang2021k,
  title={K-net: Towards unified image segmentation},
  author={Zhang, Wenwei and Pang, Jiangmiao and Chen, Kai and Loy, Chen Change},
  journal={NeurIPS},
  year={2021}
}
