Unsupervised Video Domain Adaptation for Action Recognition:
A Disentanglement Perspective

Pengfei Wei¹ Lingdong Kong¹,² Xinghua Qu¹ Xiang Yin¹ Zhiqiang Xu³ Jing Jiang⁴ Zejun Ma¹
¹ByteDance AI Lab ²National University of Singapore ³MBZUAI ⁴University of Technology Sydney

About

TranSVAE is a disentanglement framework for unsupervised video domain adaptation. It separates the domain information from the data during the adaptation process. We model the generation of cross-domain videos with two sets of latent factors: one encodes the static, domain-related information, and the other encodes the temporal and semantic-related information. Objectives are enforced on these latent factors to achieve domain disentanglement and transfer.
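
As a rough illustration of the two sets of latent factors, one way to write the generation of a $T$-frame video from domain $\mathcal{D}$ is the factorization below. It assumes a standard sequential-VAE form; the frame symbol $\mathbf{x}_t^{\mathcal{D}}$ and the exact conditioning are assumptions made here for illustration, and the formulation in the paper may differ.

$$
p\left(\mathbf{x}_{1:T}^{\mathcal{D}}, \mathbf{z}_{1:T}^{\mathcal{D}}, \mathbf{z}_d^{\mathcal{D}}\right) = p\left(\mathbf{z}_d^{\mathcal{D}}\right) \prod_{t=1}^{T} p\left(\mathbf{z}_t^{\mathcal{D}} \mid \mathbf{z}_{<t}^{\mathcal{D}}\right) p\left(\mathbf{x}_t^{\mathcal{D}} \mid \mathbf{z}_d^{\mathcal{D}}, \mathbf{z}_t^{\mathcal{D}}\right)
$$

Under this reading, $\mathbf{z}_d^{\mathcal{D}}$ is sampled once per video and shared by every frame (the static, domain-related factor), while $\mathbf{z}_1^{\mathcal{D}},...,\mathbf{z}_T^{\mathcal{D}}$ change from frame to frame (the temporal and semantic-related factors).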

Col1: Original sequences ("Human" $\mathcal{D}=\mathbf{P}_1$ and "Alien" $\mathcal{D}=\mathbf{P}_2$); Col2: Sequence reconstructions; Col3: Reconstructed sequences using $z_1^{\mathcal{D}},...,z_T^{\mathcal{D}}$; Col4: Domain transferred sequences with exchanged $z_d^{\mathcal{D}}$.

Visit our project page to explore more details. 🐾

Updates

[2022.08] - TranSVAE ranks 1st on the UDA leaderboards for UCF-HMDB, Jester, and Epic-Kitchens, according to Papers with Code.
[2022.08] - Try a Gradio demo for domain disentanglement in TranSVAE at Hugging Face Spaces! 🤗
[2022.08] - Our paper is available on arXiv; click here to check it out!

Highlight

Conceptual Comparison

Graphical Model

Framework Overview

Installation

Please refer to INSTALL.md for the installation details.

Data Preparation

Please refer to DATA_PREPARE.md for the details to prepare the ¹UCF₁₀₁, ²HMDB₅₁, ³Jester, ⁴Epic-Kitchens, and ⁵Sprites datasets.

Getting Started

Please refer to GET_STARTED.md to learn more about how to use this codebase.

Main Result

UCF₁₀₁ - HMDB₅₁

Method	Backbone	U₁₀₁ → H₅₁	H₅₁ → U₁₀₁	Average
DANN (JMLR'16)	ResNet-101	75.28	76.36	75.82
JAN (ICML'17)	ResNet-101	74.72	76.69	75.71
AdaBN (PR'18)	ResNet-101	72.22	77.41	74.82
MCD (CVPR'18)	ResNet-101	73.89	79.34	76.62
TA³N (ICCV'19)	ResNet-101	78.33	81.79	80.06
ABG (MM'20)	ResNet-101	79.17	85.11	82.14
TCoN (AAAI'20)	ResNet-101	87.22	89.14	88.18
MA²L-TD (WACV'22)	ResNet-101	85.00	86.59	85.80
Source-only	I3D	80.27	88.79	84.53
DANN (JMLR'16)	I3D	80.83	88.09	84.46
ADDA (CVPR'17)	I3D	79.17	88.44	83.81
TA³N (ICCV'19)	I3D	81.38	90.54	85.96
SAVA (ECCV'20)	I3D	82.22	91.24	86.73
CoMix (NeurIPS'21)	I3D	86.66	93.87	90.22
CO²A (WACV'22)	I3D	87.78	95.79	91.79
TranSVAE (Ours)	I3D	87.78	98.95	93.37
Oracle	I3D	95.00	96.85	95.93

Jester

Task	Source-only	DANN	ADDA	TA³N	CoMix	TranSVAE (Ours)	Oracle
J_S → J_T	51.5	55.4	52.3	55.5	64.7	66.1	95.6

Epic-Kitchens

Task	Source-only	DANN	ADDA	TA³N	CoMix	TranSVAE (Ours)	Oracle
D₁ → D₂	32.8	37.7	35.4	34.2	42.9	50.5	64.0
D₁ → D₃	34.1	36.6	34.9	37.4	40.9	50.3	63.7
D₂ → D₁	35.4	38.3	36.3	40.9	38.6	50.3	57.0
D₂ → D₃	39.1	41.9	40.8	42.8	45.2	58.6	63.7
D₃ → D₁	34.6	38.8	36.1	39.9	42.3	48.0	57.0
D₃ → D₂	35.8	42.1	41.4	44.2	49.2	58.0	64.0
Average	35.3	39.2	37.4	39.9	43.2	52.6	61.5

Ablation Study

UCF₁₀₁ → HMDB₅₁

HMDB₅₁ → UCF₁₀₁

Domain Transfer Example

Source (Original)	Target (Original)	Source (Original)	Target (Original)


Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{S}}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{T}}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{S}}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{T}}$)


Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{0}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{0}$)


Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$)	Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$)	Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{S}}$)	Reconstruct ($\mathbf{0} + \mathbf{z}_t^{\mathcal{T}}$)


Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{S}} + \mathbf{z}_t^{\mathcal{T}}$)	Reconstruct ($\mathbf{z}_d^{\mathcal{T}} + \mathbf{z}_t^{\mathcal{S}}$)
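
The four reconstruction variants in the table above (full, static only, dynamic only, and swapped) can be sketched with a toy example. This is a minimal illustration, not the TranSVAE implementation: the linear `decode` function, the weight matrices `W_d` and `W_t`, the latent sizes, and the random stand-ins for encoder outputs are all hypothetical, and the real model uses learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: frames per clip, static/dynamic latent dims, frame feature dim.
T, D_STATIC, D_DYNAMIC, FRAME_DIM = 8, 4, 6, 16

# Hypothetical toy decoder weights (a stand-in for a learned decoder network).
W_d = rng.normal(size=(D_STATIC, FRAME_DIM))
W_t = rng.normal(size=(D_DYNAMIC, FRAME_DIM))


def decode(z_d, z_t):
    """Toy linear decoder.

    z_d: static code shared by all frames, shape (D_STATIC,)
    z_t: per-frame dynamic codes, shape (T, D_DYNAMIC)
    Returns frames of shape (T, FRAME_DIM).
    """
    return z_t @ W_t + z_d @ W_d  # z_d's contribution is broadcast over the T frames


def transfer_grid(z_d_src, z_t_src, z_d_tgt, z_t_tgt):
    """Build the four reconstruction rows of the table, as (source, target) pairs."""
    zero_d = np.zeros_like(z_d_src)
    zero_t = np.zeros_like(z_t_src)
    return {
        "full": (decode(z_d_src, z_t_src), decode(z_d_tgt, z_t_tgt)),
        "static_only": (decode(z_d_src, zero_t), decode(z_d_tgt, zero_t)),
        "dynamic_only": (decode(zero_d, z_t_src), decode(zero_d, z_t_tgt)),
        "swapped": (decode(z_d_src, z_t_tgt), decode(z_d_tgt, z_t_src)),
    }


# Random stand-ins for what an encoder would produce for a source and a target clip.
z_d_src, z_d_tgt = rng.normal(size=D_STATIC), rng.normal(size=D_STATIC)
z_t_src = rng.normal(size=(T, D_DYNAMIC))
z_t_tgt = rng.normal(size=(T, D_DYNAMIC))

grid = transfer_grid(z_d_src, z_t_src, z_d_tgt, z_t_tgt)
for name, (src, tgt) in grid.items():
    print(name, src.shape, tgt.shape)
```

In this toy setting the static-only reconstruction is identical for every frame, which mirrors the idea that $\mathbf{z}_d$ carries appearance rather than motion, and the swapped row pairs one domain's static code with the other domain's dynamics.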

TODO List

License

This work is under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Acknowledgement

We acknowledge the use of the following public resources during the course of this work: ¹UCF₁₀₁, ²HMDB₅₁, ³Jester, ⁴Epic-Kitchens, ⁵Sprites, ⁶I3D, and ⁷TRN.

Citation

If you find this work helpful, please kindly consider citing our paper:

@ARTICLE{wei2022transvae,
  title={Unsupervised Video Domain Adaptation: A Disentanglement Perspective},
  author={Wei, Pengfei and Kong, Lingdong and Qu, Xinghua and Yin, Xiang and Xu, Zhiqiang and Jiang, Jing and Ma, Zejun},
  journal={arXiv preprint arXiv:2208.07365}, 
  year={2022},
}
