[NeurIPS 2023] Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

Unsupervised Video Domain Adaptation for Action Recognition:
A Disentanglement Perspective

Pengfei Wei¹  Lingdong Kong¹,²  Xinghua Qu¹  Xiang Yin¹  Zhiqiang Xu³  Jing Jiang⁴  Zejun Ma¹
¹ByteDance AI Lab   ²National University of Singapore   ³MBZUAI   ⁴University of Technology Sydney

About

TranSVAE is a disentanglement framework for unsupervised video domain adaptation. It aims to disentangle domain information from the data during adaptation. We model the generation of cross-domain videos from two sets of latent factors: one encoding static, domain-related information and the other encoding temporal and semantic information. Several objectives constrain these latent factors to achieve domain disentanglement and transfer.
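
One way to make the two sets of latent factors concrete is the factorization commonly used in sequential VAEs. This is an illustrative assumption rather than a formula quoted from this README: $\mathbf{x}_{1:T}$ denotes a $T$-frame video, $\mathbf{z}_d$ the static domain latent, and $\mathbf{z}_1, \dots, \mathbf{z}_T$ the per-frame temporal latents.

$$
p(\mathbf{x}_{1:T}, \mathbf{z}_{1:T}, \mathbf{z}_d)
  = p(\mathbf{z}_d) \prod_{t=1}^{T} p(\mathbf{z}_t \mid \mathbf{z}_{<t}) \, p(\mathbf{x}_t \mid \mathbf{z}_d, \mathbf{z}_t)
$$

Under this reading, every frame shares the same $\mathbf{z}_d$, so exchanging $\mathbf{z}_d$ between two videos changes the domain appearance while leaving the frame-by-frame dynamics untouched.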



Col1: Original sequences ("Human" $\mathcal{D}=\mathbf{P}_1$ and "Alien" $\mathcal{D}=\mathbf{P}_2$); Col2: Sequence reconstructions; Col3: Reconstructed sequences using $z_1^{\mathcal{D}},...,z_T^{\mathcal{D}}$; Col4: Domain transferred sequences with exchanged $z_d^{\mathcal{D}}$.


Visit our project page to explore more details. 🐾

Updates

  • [2022.08] - TranSVAE ranks 1st on the UDA leaderboards of UCF-HMDB, Jester, and Epic-Kitchens, according to Papers with Code.
  • [2022.08] - Try a Gradio demo for domain disentanglement in TranSVAE at Hugging Face Spaces! 🤗
  • [2022.08] - Our paper is available on arXiv (arXiv:2208.07365).

Highlight

[Figures: Conceptual Comparison, Graphical Model, Framework Overview]

Installation

Please refer to INSTALL.md for the installation details.

Data Preparation

Please refer to DATA_PREPARE.md for details on preparing the UCF101, HMDB51, Jester, Epic-Kitchens, and Sprites datasets.

Getting Started

Please refer to GET_STARTED.md to learn more about using this codebase.

Main Result

UCF101 - HMDB51

| Method | Backbone | U101 → H51 | H51 → U101 | Average |
|---|---|---|---|---|
| DANN (JMLR'16) | ResNet-101 | 75.28 | 76.36 | 75.82 |
| JAN (ICML'17) | ResNet-101 | 74.72 | 76.69 | 75.71 |
| AdaBN (PR'18) | ResNet-101 | 72.22 | 77.41 | 74.82 |
| MCD (CVPR'18) | ResNet-101 | 73.89 | 79.34 | 76.62 |
| TA3N (ICCV'19) | ResNet-101 | 78.33 | 81.79 | 80.06 |
| ABG (MM'20) | ResNet-101 | 79.17 | 85.11 | 82.14 |
| TCoN (AAAI'20) | ResNet-101 | 87.22 | 89.14 | 88.18 |
| MA2L-TD (WACV'22) | ResNet-101 | 85.00 | 86.59 | 85.80 |
| Source-only | I3D | 80.27 | 88.79 | 84.53 |
| DANN (JMLR'16) | I3D | 80.83 | 88.09 | 84.46 |
| ADDA (CVPR'17) | I3D | 79.17 | 88.44 | 83.81 |
| TA3N (ICCV'19) | I3D | 81.38 | 90.54 | 85.96 |
| SAVA (ECCV'20) | I3D | 82.22 | 91.24 | 86.73 |
| CoMix (NeurIPS'21) | I3D | 86.66 | 93.87 | 90.22 |
| CO2A (WACV'22) | I3D | 87.78 | 95.79 | 91.79 |
| TranSVAE (Ours) | I3D | 87.78 | 98.95 | 93.37 |
| Oracle | I3D | 95.00 | 96.85 | 95.93 |
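
The Average column appears to be the mean of the two transfer directions, rounded to two decimals. The following is a minimal sketch that checks this for three I3D rows; the half-up rounding mode is an assumption, and the `results` dictionary and `average` helper are hypothetical names introduced only for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# (U101 -> H51, H51 -> U101) accuracies copied from the I3D rows of the table.
results = {
    "CO2A": ("87.78", "95.79"),
    "TranSVAE": ("87.78", "98.95"),
    "Oracle": ("95.00", "96.85"),
}


def average(pair):
    """Mean of the two directions, rounded half-up to two decimals (assumed)."""
    first, second = (Decimal(value) for value in pair)
    return ((first + second) / 2).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


averages = {name: average(pair) for name, pair in results.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
```

Using `Decimal` avoids binary floating-point surprises when a mean falls exactly on a half, such as 93.365.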

Jester

| Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
|---|---|---|---|---|---|---|---|
| J_S → J_T | 51.5 | 55.4 | 52.3 | 55.5 | 64.7 | 66.1 | 95.6 |

Epic-Kitchens

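| Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
|---|---|---|---|---|---|---|---|
| D1 → D2 | 32.8 | 37.7 | 35.4 | 34.2 | 42.9 | 50.5 | 64.0 |
| D1 → D3 | 34.1 | 36.6 | 34.9 | 37.4 | 40.9 | 50.3 | 63.7 |
| D2 → D1 | 35.4 | 38.3 | 36.3 | 40.9 | 38.6 | 50.3 | 57.0 |
| D2 → D3 | 39.1 | 41.9 | 40.8 | 42.8 | 45.2 | 58.6 | 63.7 |
| D3 → D1 | 34.6 | 38.8 | 36.1 | 39.9 | 42.3 | 48.0 | 57.0 |
| D3 → D2 | 35.8 | 42.1 | 41.4 | 44.2 | 49.2 | 58.0 | 64.0 |
| Average | 35.3 | 39.2 | 37.4 | 39.9 | 43.2 | 52.6 | 61.5 |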
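
To read the Epic-Kitchens numbers task by task, a small script can compute the gain of TranSVAE over CoMix for each source-target pair. This is only a sketch over the values in the table; the dictionary names and the `->` task labels are hypothetical conveniences.

```python
# Accuracy (%) per task, copied from the CoMix and TranSVAE columns of the table.
comix = {
    "D1->D2": 42.9, "D1->D3": 40.9, "D2->D1": 38.6,
    "D2->D3": 45.2, "D3->D1": 42.3, "D3->D2": 49.2,
}
transvae = {
    "D1->D2": 50.5, "D1->D3": 50.3, "D2->D1": 50.3,
    "D2->D3": 58.6, "D3->D1": 48.0, "D3->D2": 58.0,
}

# Per-task gain in percentage points, rounded to one decimal like the table.
gains = {task: round(transvae[task] - comix[task], 1) for task in comix}

largest = max(gains, key=gains.get)    # task with the biggest improvement
smallest = min(gains, key=gains.get)   # task with the smallest improvement

for task, gain in gains.items():
    print(f"{task}: +{gain:.1f}")
print(f"largest gain: {largest}, smallest gain: {smallest}")
```

On these numbers the gain is positive on all six tasks, with the largest on D2 → D3 and the smallest on D3 → D1.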

Ablation Study

[Figure: UCF101 → HMDB51]

[Figure: HMDB51 → UCF101]

Domain Transfer Example

| Source (Original) | Target (Original) | Source (Original) | Target (Original) |
|---|---|---|---|
| src_original | tar_original | src_original | tar_original |
| Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{T}}$) |
| src_recon | tar_recon | src_recon | tar_recon |
| Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$) |
| recon_srcZf | recon_tarZf | recon_srcZf | recon_tarZf |
| Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$) |
| recon_srcZt | recon_tarZt | recon_srcZt | recon_tarZt |
| Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$) | Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$) |
| recon_srcZf_tarZt | recon_tarZf_srcZt | recon_srcZf_tarZt | recon_tarZf_srcZt |
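
The recombinations listed above can be mimicked with a toy example. The sketch below is not the TranSVAE decoder: `decode` is a hypothetical stand-in that simply adds the static latent to each per-frame latent, and all latent values are made up. It only illustrates which latents feed each reconstruction, reusing the image names from the table as dictionary keys (assuming the `Zf` suffix refers to the static latent $\mathbf{z}_d$, as the row pairing suggests).

```python
from typing import List

Vector = List[float]


def decode(z_d: Vector, z_seq: List[Vector]) -> List[Vector]:
    """Toy decoder (assumption): frame t is the elementwise sum z_d + z_t."""
    return [[d + t for d, t in zip(z_d, z_t)] for z_t in z_seq]


def zero_static(z_d: Vector) -> Vector:
    """Zero vector with the same size as the static latent."""
    return [0.0] * len(z_d)


def zero_dynamic(z_seq: List[Vector]) -> List[Vector]:
    """Zero vectors with the same shape as the per-frame latents."""
    return [[0.0] * len(z_t) for z_t in z_seq]


# Hypothetical latents for a 3-frame source clip and a 3-frame target clip.
src_zd, src_zt = [1.0, 0.0], [[0.0, 0.1], [0.0, 0.2], [0.0, 0.3]]
tar_zd, tar_zt = [0.0, 5.0], [[0.5, 0.0], [0.6, 0.0], [0.7, 0.0]]

outputs = {
    "src_recon": decode(src_zd, src_zt),                      # z_d^S + z_t^S
    "recon_srcZf": decode(src_zd, zero_dynamic(src_zt)),      # z_d^S + 0
    "recon_srcZt": decode(zero_static(src_zd), src_zt),       # 0 + z_t^S
    "recon_srcZf_tarZt": decode(src_zd, tar_zt),              # z_d^S + z_t^T
    "recon_tarZf_srcZt": decode(tar_zd, src_zt),              # z_d^T + z_t^S
}

for name, frames in outputs.items():
    print(name, frames)
```

With the static latent alone, every frame is identical; with the temporal latents alone, only the frame-to-frame variation remains; swapping the static latent keeps one clip's motion under the other clip's domain.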

TODO List

  • Initial release. 🚀
  • Add license. See here for more details.
  • Add demo at Hugging Face Spaces.
  • Add installation details.
  • Add data preparation details.
  • Add evaluation details.
  • Add training details.

License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Acknowledgement

We acknowledge the use of the following public resources during the course of this work: UCF101, HMDB51, Jester, Epic-Kitchens, Sprites, I3D, and TRN.

Citation

If you find this work helpful, please kindly consider citing our paper:

@ARTICLE{wei2022transvae,
  title={Unsupervised Video Domain Adaptation: A Disentanglement Perspective},
  author={Wei, Pengfei and Kong, Lingdong and Qu, Xinghua and Yin, Xiang and Xu, Zhiqiang and Jiang, Jing and Ma, Zejun},
  journal={arXiv preprint arXiv:2208.07365}, 
  year={2022},
}
