Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective
Pengfei Wei1
Lingdong Kong1,2
Xinghua Qu1
Xiang Yin1
Zhiqiang Xu3
Jing Jiang4
Zejun Ma1
1ByteDance AI Lab
2National University of Singapore
3MBZUAI
4University of Technology Sydney
About
TranSVAE is a disentanglement framework designed for unsupervised video domain adaptation. It aims to disentangle domain information from the data during the adaptation process. We model the generation of cross-domain videos with two sets of latent factors: one encoding static, domain-related information, and the other encoding temporal, semantic-related information. Objectives are imposed on these latent factors to achieve domain disentanglement and transfer.
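The paragraph above describes a two-factor latent model: a static factor capturing domain-related information and a dynamic, per-frame factor capturing temporal and semantic information. Below is a minimal, illustrative sketch of such a split in PyTorch; it assumes per-frame features from a pretrained backbone (e.g., I3D), and all module and variable names (TwoFactorEncoder, z_static, z_dynamic, etc.) are hypothetical, not the repository's actual code.

```python
# Minimal sketch (not the official implementation) of a two-branch latent
# factorization: one static (domain-related) latent per clip and one dynamic
# (temporal/semantic-related) latent per frame. All names are hypothetical.
import torch
import torch.nn as nn


class TwoFactorEncoder(nn.Module):
    def __init__(self, feat_dim=512, static_dim=64, dynamic_dim=64):
        super().__init__()
        self.frame_encoder = nn.Linear(feat_dim, 256)            # per-frame features
        self.static_head = nn.Linear(256, 2 * static_dim)        # mean and log-variance
        self.dynamic_rnn = nn.LSTM(256, 256, batch_first=True)   # temporal modeling
        self.dynamic_head = nn.Linear(256, 2 * dynamic_dim)

    @staticmethod
    def reparameterize(mu, logvar):
        # VAE-style sampling: z = mu + sigma * eps
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, frames):                                   # frames: (B, T, feat_dim)
        h = torch.relu(self.frame_encoder(frames))
        # Static factor: one latent per video, pooled over time.
        mu_s, logvar_s = self.static_head(h.mean(dim=1)).chunk(2, dim=-1)
        z_static = self.reparameterize(mu_s, logvar_s)
        # Dynamic factor: one latent per frame, conditioned on temporal context.
        h_t, _ = self.dynamic_rnn(h)
        mu_d, logvar_d = self.dynamic_head(h_t).chunk(2, dim=-1)
        z_dynamic = self.reparameterize(mu_d, logvar_d)
        return z_static, z_dynamic


# Example: 4 clips of 8 frames, each frame described by a 512-d feature.
enc = TwoFactorEncoder()
z_s, z_d = enc(torch.randn(4, 8, 512))
print(z_s.shape, z_d.shape)   # torch.Size([4, 64]) torch.Size([4, 8, 64])
```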
Visit our project page to explore more details.
Updates
- [2022.08] - TranSVAE achieves 1st place on the UDA leaderboards of UCF-HMDB, Jester, and Epic-Kitchens, according to Papers with Code.
- [2022.08] - Try our Gradio demo 🤗 for domain disentanglement in TranSVAE at Hugging Face Spaces!
- [2022.08] - Our paper is available on arXiv, click here to check it out!
Outline
- Highlights
- Installation
- Data Preparation
- Getting Started
- Main Results
- TODO List
- License
- Acknowledgement
- Citation
Highlights
Conceptual Comparison (figure)
Graphical Model (figure)
Framework Overview (figure)
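As a rough illustration of how objectives could constrain the two latent factors (see the About section above), the sketch below combines a reconstruction term, a source classification term on the dynamic factor, and a domain-confusion term that pushes the dynamic factor toward domain invariance. This is an assumption-laden simplification, not the exact objectives of TranSVAE; KL regularization on the latent factors is omitted for brevity, and decoder, classifier, and domain_disc are hypothetical modules.

```python
# Illustrative only: one plausible combination of objectives on the two latent
# factors, not the paper's actual losses. Assumes the TwoFactorEncoder sketch
# above plus hypothetical decoder / classifier / domain discriminator modules.
import torch
import torch.nn.functional as F


def sketch_objective(encoder, decoder, classifier, domain_disc,
                     frames_src, labels_src, frames_tgt):
    # Encode source and target clips into static/dynamic factors.
    zs_src, zd_src = encoder(frames_src)
    zs_tgt, zd_tgt = encoder(frames_tgt)

    # 1) Reconstruction: both factors together should explain the video.
    recon = F.mse_loss(decoder(zs_src, zd_src), frames_src) \
          + F.mse_loss(decoder(zs_tgt, zd_tgt), frames_tgt)

    # 2) Source classification: the dynamic (semantic) factor predicts actions.
    cls = F.cross_entropy(classifier(zd_src.mean(dim=1)), labels_src)

    # 3) Domain confusion: the dynamic factor should not reveal the domain
    #    (0 = source, 1 = target). In practice this term is made adversarial
    #    w.r.t. the encoder, e.g., via a gradient reversal layer or alternating
    #    updates; that machinery is omitted here.
    d_src = domain_disc(zd_src.mean(dim=1))
    d_tgt = domain_disc(zd_tgt.mean(dim=1))
    dom = F.binary_cross_entropy_with_logits(d_src, torch.zeros_like(d_src)) \
        + F.binary_cross_entropy_with_logits(d_tgt, torch.ones_like(d_tgt))

    return recon + cls + dom
```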
Installation
Please refer to INSTALL.md for the installation details.
Data Preparation
Please refer to DATA_PREPARE.md for details on preparing the UCF101, HMDB51, Jester, Epic-Kitchens, and Sprites datasets.
Getting Started
Please refer to GET_STARTED.md to learn more about how to use this codebase.
Main Results
UCF101 - HMDB51
Method | Backbone | UCF101 → HMDB51 | HMDB51 → UCF101 | Average |
---|---|---|---|---|
DANN (JMLR'16) | ResNet-101 | 75.28 | 76.36 | 75.82 |
JAN (ICML'17) | ResNet-101 | 74.72 | 76.69 | 75.71 |
AdaBN (PR'18) | ResNet-101 | 72.22 | 77.41 | 74.82 |
MCD (CVPR'18) | ResNet-101 | 73.89 | 79.34 | 76.62 |
TA3N (ICCV'19) | ResNet-101 | 78.33 | 81.79 | 80.06 |
ABG (MM'20) | ResNet-101 | 79.17 | 85.11 | 82.14 |
TCoN (AAAI'20) | ResNet-101 | 87.22 | 89.14 | 88.18 |
MA2L-TD (WACV'22) | ResNet-101 | 85.00 | 86.59 | 85.80 |
Source-only | I3D | 80.27 | 88.79 | 84.53 |
DANN (JMLR'16) | I3D | 80.83 | 88.09 | 84.46 |
ADDA (CVPR'17) | I3D | 79.17 | 88.44 | 83.81 |
TA3N (ICCV'19) | I3D | 81.38 | 90.54 | 85.96 |
SAVA (ECCV'20) | I3D | 82.22 | 91.24 | 86.73 |
CoMix (NeurIPS'21) | I3D | 86.66 | 93.87 | 90.22 |
CO2A (WACV'22) | I3D | 87.78 | 95.79 | 91.79 |
TranSVAE (Ours) | I3D | 87.78 | 98.95 | 93.37 |
Oracle | I3D | 95.00 | 96.85 | 95.93 |
Jester
Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
---|---|---|---|---|---|---|---|
JS → JT | 51.5 | 55.4 | 52.3 | 55.5 | 64.7 | 66.1 | 95.6 |
Epic-Kitchens
Task | Source-only | DANN | ADDA | TA3N | CoMix | TranSVAE (Ours) | Oracle |
---|---|---|---|---|---|---|---|
D1 → D2 | 32.8 | 37.7 | 35.4 | 34.2 | 42.9 | 50.5 | 64.0 |
D1 → D3 | 34.1 | 36.6 | 34.9 | 37.4 | 40.9 | 50.3 | 63.7 |
D2 → D1 | 35.4 | 38.3 | 36.3 | 40.9 | 38.6 | 50.3 | 57.0 |
D2 → D3 | 39.1 | 41.9 | 40.8 | 42.8 | 45.2 | 58.6 | 63.7 |
D3 → D1 | 34.6 | 38.8 | 36.1 | 39.9 | 42.3 | 48.0 | 57.0 |
D3 → D2 | 35.8 | 42.1 | 41.4 | 44.2 | 49.2 | 58.0 | 64.0 |
Average | 35.3 | 39.2 | 37.4 | 39.9 | 43.2 | 52.6 | 61.5 |
Ablation Study
Domain Transfer Example
TODO List
- Initial release. 🚀
- Add license. See here for more details.
- Add demo at Hugging Face Spaces.
- Add installation details.
- Add data preparation details.
- Add evaluation details.
- Add training details.
License
This work is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Acknowledgement
We acknowledge the use of the following public resources during the course of this work: UCF101, HMDB51, Jester, Epic-Kitchens, Sprites, I3D, and TRN.
Citation
If you find this work helpful, please kindly consider citing our paper:
@ARTICLE{wei2022transvae,
title={Unsupervised Video Domain Adaptation: A Disentanglement Perspective},
author={Wei, Pengfei and Kong, Lingdong and Qu, Xinghua and Yin, Xiang and Xu, Zhiqiang and Jiang, Jing and Ma, Zejun},
journal={arXiv preprint arXiv:2208.07365},
year={2022},
}