HOI-Learning-List
Some recent (2015-present) Human-Object Interaction (HOI) learning studies. If you find any errors or problems, please don't hesitate to comment.
A list of Transformer-based vision works: https://github.com/DirtyHarryLYL/Transformer-in-Vision.
Image Dataset/Benchmark
- SynHOI (arXiv 2023.5), synthetic HOI data [Paper]
- HICO-DET-SG, V-COCO-SG (new splits of HICO-DET and V-COCO) [Paper], [Code]
- HOI-COCO (CVPR2021) [Website]
- PaStaNet-HOI (TPAMI2021) [Benchmark]
- HAKE (CVPR2020) [YouTube] [bilibili] [Website] [Paper] [HAKE-Action-Torch] [HAKE-Action-TF]
- PIC [Website]
More...
Video HOI Datasets
- AVA [Website], HOIs (human-object, human-human) and pose (body motion) actions
- Action Genome [Website], action verbs and spatial relationships
3D HOI Datasets
Survey
- Human object interaction detection: Design and survey (Image and Vision Computing 2022), [Paper]
Method
HOI Image Generation
- Exploiting Relationship for Complex-scene Image Generation (arXiv 2021.04) [Paper]
- Specifying Object Attributes and Relations in Interactive Scene Generation (arXiv 2019.11) [Paper]
HOI Recognition: image-based, recognizing all the HOIs in one image.
- DEFR (arXiv 2021.12) [Paper]
- Interaction Compass (ICCV 2021) [Paper]
- DEFR-CLIP (arXiv 2021.07) [Paper]
- PaStaNet: Toward Human Activity Knowledge Engine (CVPR2020) [Code] [Data] [Paper] [YouTube] [bilibili]
- Pairwise (ECCV2018) [Paper]
- Attentional Pooling for Action Recognition (NIPS2017) [Code] [Paper]
- Learning Models for Actions and Person-Object Interactions with Transfer to Question Answering (ECCV2016) [Code] [Paper]
- Contextual Action Recognition with R*CNN (ICCV2015) [Code] [Paper]
- SGAP-Net (AAAI2020) [Paper]
More...
Unseen or zero-shot learning (image-level recognition).
- Compositional Learning for Human Object Interaction (ECCV2018) [Paper]
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]
More...
HOI Detection: instance-based, detecting the human-object pairs and classifying their interactions.
- PSN (arXiv 2023) [Paper]
- HOKEM (arXiv 2023) [Paper]
- OpenCat (CVPR 2023) [Paper]
- DiffHOI (arXiv 2023.5) [Paper]
- UniVRD (arXiv 2023) [Paper]
- SKGHOI (arXiv 2023) [Paper]
- PR-Net (arXiv 2023) [Paper]
- PQNet (MMAsia 2022) [Paper]
- MHOI (TCSVT 2022) [Paper]
- K-BAN (arXiv 2022) [Paper]
- SGCN4HOI (IEEE SMC 2022) [Paper]
- ODM (ECCV 2022) [Paper]
- SDT (arXiv 2022) [Paper]
- STIP (CVPR 2022) [Paper]
- DT (CVPR 2022) [Paper]
- CATN (CVPR 2022) [Paper]
- SSRT (CVPR 2022) [Paper]
- MSTR (CVPR 2022) [Paper]
- Iwin (ECCV 2022) [Paper]
- RGBM (arXiv 2022.2) [Paper]
- PhraseHOI (AAAI 2022) [Paper]
- DEFR (arXiv 2021.12) [Paper]
- HRNet (TIP 2021) [Paper]
- SG2HOI (ICCV 2021) [Paper]
- HOI-MO-Net (IVC 2021) [Paper]
- IPGN (TIP 2021.7) [Paper]
- Human Object Interaction Detection using Two-Direction Spatial Enhancement and Exclusive Object Prior (arXiv) [Paper]
- PST (ICCV2021) [Paper]
- RR-Net (arXiv 2021.5) [Paper]
- End-to-End Human Object Interaction Detection with HOI Transformer (CVPR2021) [Paper] [Code]
- DIRV (AAAI2021) [Paper]
- DecAug (AAAI2021) [Paper]
- OSGNet (IEEE Access) [Paper]
- PFNet (CVM) [Paper]
- UniDet (ECCV2020) [Paper]
- FCMNet (ECCV2020) [Paper]
- Contextual Heterogeneous Graph Network for Human-Object Interaction Detection (ECCV2020) [Paper]
- ConsNet (ACMMM2020) [Paper] [Code]; HICO-DET Python API: a general Python toolkit for the HICO-DET dataset, including APIs for data loading & processing, human-object pair IoU & NMS calculation, and standard evaluation [Code] [Documentation]
- Action-Guided Attention Mining and Relation Reasoning Network for Human-Object Interaction Detection (IJCAI2020) [Paper]
- PaStaNet (CVPR2020) [Code] [Data] [Paper] [YouTube] [bilibili]
- Cascaded Human-Object Interaction Recognition (CVPR2020) [Code] [Paper]
- Diagnosing Rarity in Human-Object Interaction Detection (CVPRW2020) [Paper]
- MLCNet (ICMR2020) [Paper]
- SIGN (ICME2020) [Paper]
- In-GraphNet (IJCAI-PRICAI 2020) [Paper]
- RPNN (ICCV2019) [Paper]
- Deep Contextual Attention for Human-Object Interaction Detection (ICCV2019) [Paper]
- Turbo (AAAI2019) [Paper]
- InteractNet (CVPR2018) [Paper]
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [Paper]
- VS-GATs (Mar. 2020) [Paper]
- Classifying All Interacting Pairs in a Single Shot (Jan. 2020) [Paper]
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May 2020) [Paper]
- SABRA (Dec. 2020) [Paper]
More...
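In the standard evaluation behind most of the tables below, a predicted human-object pair counts as a true positive only if both its human box and object box overlap a ground-truth pair of the same HOI category with IoU >= 0.5. A minimal sketch of that match criterion (not any paper's official evaluation code; the dict layout is illustrative):

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def pair_matches(pred, gt, thresh=0.5):
    """pred/gt: dicts with 'human' and 'object' boxes and an 'hoi' class id.
    Both boxes must clear the IoU threshold and the HOI class must agree."""
    return (pred["hoi"] == gt["hoi"]
            and iou(pred["human"], gt["human"]) >= thresh
            and iou(pred["object"], gt["object"]) >= thresh)
```

Per-class AP is then computed over these matches, and mAP averages across HOI categories.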
Unseen, zero/low-shot, or weakly-supervised learning (instance-level detection).
- Unal et al. (arXiv 2023) [Paper]
- Align-Former (BMVC 2021) [Paper]
- Discovering Human Interactions with Large-Vocabulary Objects via Query and Multi-Scale Detection (ICCV2021) [Paper] [Code]
- DGIG-Net (TOC2021) [Paper]
- Detecting Human-Object Interaction with Mixed Supervision (WACV 2021) [Paper]
- Zero-Shot Human-Object Interaction Recognition via Affordance Graphs (Sep. 2020) [Paper]
- Novel Human-Object Interaction Detection via Adversarial Domain Generalization (May 2020) [Paper]
- Functional (AAAI2020) [Paper]
- Scaling Human-Object Interaction Recognition through Zero-Shot Learning (WACV2018) [Paper]
More...
Video HOI methods
- SPDTP (arXiv, Jun 2022) [Paper]
- V-HOI (arXiv, Jun 2022) [Paper]
- Detecting Human-Object Relationships in Videos (ICCV2021) [Paper]
- VidHOI (May 2021) [Paper]
- Generating Videos of Zero-Shot Compositions of Actions and Objects (Jul 2020), HOI GAN [Paper]
- Grounded Human-Object Interaction Hotspots from Video (ICCV2019) [Code] [Paper]
More...
3D HOI Reconstruction/Generation
Results
PaStaNet-HOI:
Proposed in TIN (TPAMI version, Transferable Interactiveness Network). It is built on HAKE data and includes 110K+ images and 520 HOIs (excluding the 80 "no_interaction" HOIs of HICO-DET to avoid incomplete labeling). Its long-tailed data distribution is more severe, making it more difficult.
Detector: COCO pre-trained
Method | mAP |
---|---|
iCAN | 11.00 |
iCAN+NIS | 13.13 |
TIN | 15.38 |
HICO-DET:
1) Detector: COCO pre-trained
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
Shen et al. | WACV2018 | 6.46 | 4.24 | 7.12 | - | - | - |
HO-RCNN | WACV2018 | 7.81 | 5.37 | 8.54 | 10.41 | 8.94 | 10.85 |
InteractNet | CVPR2018 | 9.94 | 7.16 | 10.77 | - | - | - |
Turbo | AAAI2019 | 11.40 | 7.30 | 12.60 | - | - | - |
GPNN | ECCV2018 | 13.11 | 9.34 | 14.23 | - | - | - |
Xu et al. | ICCV2019 | 14.70 | 13.26 | 15.13 | - | - | - |
iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
Wang et al. | ICCV2019 | 16.24 | 11.16 | 17.75 | 17.73 | 12.78 | 19.21 |
Lin et al. | IJCAI2020 | 16.63 | 11.30 | 18.22 | 19.22 | 14.56 | 20.61 |
Functional (suppl) | AAAI2020 | 16.96 | 11.73 | 18.52 | - | - | - |
Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
No-Frills | ICCV2019 | 17.18 | 12.17 | 18.68 | - | - | - |
RPNN | ICCV2019 | 17.35 | 12.78 | 18.71 | - | - | - |
PMFNet | ICCV2019 | 17.46 | 15.65 | 18.00 | 20.34 | 17.47 | 21.20 |
SIGN | ICME2020 | 17.51 | 15.31 | 18.53 | 20.49 | 17.53 | 21.51 |
Interactiveness-optimized | CVPR2019 | 17.54 | 13.80 | 18.65 | 19.75 | 15.70 | 20.96 |
Liu et al. | arXiv | 17.55 | 20.61 | - | - | - | - |
Wang et al. | ECCV2020 | 17.57 | 16.85 | 17.78 | 21.00 | 20.74 | 21.08 |
In-GraphNet | IJCAI-PRICAI 2020 | 17.72 | 12.93 | 19.31 | - | - | - |
HOID | CVPR2020 | 17.85 | 12.85 | 19.34 | - | - | - |
MLCNet | ICMR2020 | 17.95 | 16.62 | 18.35 | 22.28 | 20.73 | 22.74 |
SAG | arXiv | 18.26 | 13.40 | 19.71 | - | - | - |
Sarullo et al. | arXiv | 18.74 | - | - | - | - | - |
DRG | ECCV2020 | 19.26 | 17.74 | 19.71 | 23.40 | 21.75 | 23.89 |
Analogy | ICCV2019 | 19.40 | 14.60 | 20.90 | - | - | - |
VCL | ECCV2020 | 19.43 | 16.55 | 20.29 | 22.00 | 19.09 | 22.87 |
VS-GATs | arXiv | 19.66 | 15.79 | 20.81 | - | - | - |
VSGNet | CVPR2020 | 19.80 | 16.05 | 20.91 | - | - | - |
PFNet | CVM | 20.05 | 16.66 | 21.07 | 24.01 | 21.09 | 24.89 |
ATL(w/ COCO) | CVPR2021 | 20.08 | 15.57 | 21.43 | - | - | - |
FCMNet | ECCV2020 | 20.41 | 17.34 | 21.56 | 22.04 | 18.97 | 23.12 |
ACP | ECCV2020 | 20.59 | 15.92 | 21.98 | - | - | - |
PD-Net | ECCV2020 | 20.81 | 15.90 | 22.28 | 24.78 | 18.88 | 26.54 |
SG2HOI | ICCV2021 | 20.93 | 18.24 | 21.78 | 24.83 | 20.52 | 25.32 |
TIN-PAMI | TPAMI2021 | 20.93 | 18.95 | 21.32 | 23.02 | 20.96 | 23.42 |
ATL | CVPR2021 | 21.07 | 16.79 | 22.35 | - | - | - |
PMN | arXiv | 21.21 | 17.60 | 22.29 | - | - | - |
IPGN | TIP2021 | 21.26 | 18.47 | 22.07 | - | - | - |
DJ-RN | CVPR2020 | 21.34 | 18.53 | 22.18 | 23.69 | 20.64 | 24.60 |
OSGNet | IEEE Access | 21.40 | 18.12 | 22.38 | - | - | - |
K-BAN | arXiv2022 | 21.48 | 16.85 | 22.86 | 24.29 | 19.09 | 25.85 |
SCG+ODM | ECCV2022 | 21.50 | 17.59 | 22.67 | - | - | - |
DIRV | AAAI2021 | 21.78 | 16.38 | 23.39 | 25.52 | 20.84 | 26.92 |
SCG | ICCV2021 | 21.85 | 18.11 | 22.97 | - | - | - |
HRNet | TIP2021 | 21.93 | 16.30 | 23.62 | 25.22 | 18.75 | 27.15 |
ConsNet | ACMMM2020 | 22.15 | 17.55 | 23.52 | 26.57 | 20.8 | 28.3 |
SKGHOI | arXiv2023 | 22.61 | 15.87 | 24.62 | - | - | - |
IDN | NeurIPS2020 | 23.36 | 22.47 | 23.63 | 26.43 | 25.01 | 26.85 |
QAHOI-Res50 | arXiv2021 | 24.35 | 16.18 | 26.80 | - | - | - |
DOQ | CVPR2022 | 25.97 | 26.09 | 25.93 | - | - | - |
STIP | CVPR2022 | 28.81 | 27.55 | 29.18 | 32.28 | 31.07 | 32.64 |
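The Full / Rare / Non-Rare columns in these HICO-DET tables are means of per-class AP over class subsets: on HICO-DET, a "Rare" HOI category is one with fewer than 10 training instances. A minimal sketch of that aggregation (the function name is illustrative; per-class APs are assumed to be computed already):

```python
def split_map(ap_per_class, train_counts, rare_below=10):
    """ap_per_class, train_counts: dicts keyed by HOI class id.
    Returns mAP over all classes (Full) and over the Rare / Non-Rare subsets."""
    rare = [c for c in ap_per_class if train_counts[c] < rare_below]
    nonrare = [c for c in ap_per_class if train_counts[c] >= rare_below]
    mean = lambda cs: sum(ap_per_class[c] for c in cs) / len(cs) if cs else 0.0
    return {"Full": mean(list(ap_per_class)),
            "Rare": mean(rare),
            "Non-Rare": mean(nonrare)}
```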
2) Detector: pre-trained on COCO and fine-tuned on the HICO-DET train set (with GT human-object pair boxes), or a one-stage detector (point-based, transformer-based)
The fine-tuned detector learns to detect only the interactive humans and objects (i.e., with interactiveness), thus suppressing many wrong pairings (non-interactive human-object pairs) and boosting performance.
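This suppression idea can be sketched as follows (a minimal TIN-style illustration; `interactiveness_fn` and the threshold are hypothetical placeholders, not any paper's actual interface):

```python
from itertools import product

def filter_pairs(humans, objects, interactiveness_fn, thresh=0.1):
    """humans/objects: lists of detections; interactiveness_fn scores a
    (human, object) pair in [0, 1]. Keeps only pairs judged interactive,
    so the downstream HOI classifier sees fewer wrong pairings."""
    kept = []
    for h, o in product(humans, objects):
        s = interactiveness_fn(h, o)
        if s >= thresh:                  # suppress non-interactive pairings
            kept.append((h, o, s))
    return kept
```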
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
UniDet | ECCV2020 | 17.58 | 11.72 | 19.33 | 19.76 | 14.68 | 21.27 |
IP-Net | CVPR2020 | 19.56 | 12.79 | 21.58 | 22.05 | 15.77 | 23.92 |
RR-Net | arXiv | 20.72 | 13.21 | 22.97 | - | - | - |
PPDM (paper) | CVPR2020 | 21.10 | 14.46 | 23.09 | - | - | - |
PPDM (github-hourglass104) | CVPR2020 | 21.73/21.94 | 13.78/13.97 | 24.10/24.32 | 24.58/24.81 | 16.65/17.09 | 26.84/27.12 |
Functional | AAAI2020 | 21.96 | 16.43 | 23.62 | - | - | - |
SABRA-Res50 | arXiv | 23.48 | 16.39 | 25.59 | 28.79 | 22.75 | 30.54 |
VCL | ECCV2020 | 23.63 | 17.21 | 25.55 | 25.98 | 19.12 | 28.03 |
ATL | CVPR2021 | 23.67 | 17.64 | 25.47 | 26.01 | 19.60 | 27.93 |
PST | ICCV2021 | 23.93 | 14.98 | 26.60 | 26.42 | 17.61 | 29.05 |
SABRA-Res50FPN | arXiv | 24.12 | 15.91 | 26.57 | 29.65 | 22.92 | 31.65 |
ATL(w/ COCO) | CVPR2021 | 24.50 | 18.53 | 26.28 | 27.23 | 21.27 | 29.00 |
IDN | NeurIPS2020 | 24.58 | 20.33 | 25.86 | 27.89 | 23.64 | 29.16 |
FCL | CVPR2021 | 24.68 | 20.03 | 26.07 | 26.80 | 21.61 | 28.35 |
HOTR | CVPR2021 | 25.10 | 17.34 | 27.42 | - | - | - |
FCL+VCL | CVPR2021 | 25.27 | 20.57 | 26.67 | 27.71 | 22.34 | 28.93 |
OC-Immunity | AAAI2022 | 25.44 | 23.03 | 26.16 | 27.24 | 24.32 | 28.11 |
ConsNet-F | ACMMM2020 | 25.94 | 19.35 | 27.91 | 30.34 | 23.4 | 32.41 |
SABRA-Res152 | arXiv | 26.09 | 16.29 | 29.02 | 31.08 | 23.44 | 33.37 |
QAHOI-Res50 | arXiv2021 | 26.18 | 18.06 | 28.61 | - | - | - |
Zou et al. | CVPR2021 | 26.61 | 19.15 | 28.84 | 29.13 | 20.98 | 31.57 |
SKGHOI | arXiv2023 | 26.95 | 21.28 | 28.56 | - | - | - |
RGBM | arXiv2022 | 27.39 | 21.34 | 29.20 | 30.87 | 24.20 | 32.87 |
GTNet | arXiv | 28.03 | 22.73 | 29.61 | 29.98 | 24.13 | 31.73 |
K-BAN | arXiv2022 | 28.83 | 20.29 | 31.31 | 31.05 | 21.41 | 33.93 |
AS-Net | CVPR2021 | 28.87 | 24.25 | 30.25 | 31.74 | 27.07 | 33.14 |
QPIC-Res50 | CVPR2021 | 29.07 | 21.85 | 31.23 | 31.68 | 24.14 | 33.93 |
GGNet | CVPR2021 | 29.17 | 22.13 | 30.84 | 33.50 | 26.67 | 34.89 |
QPIC-CPC | CVPR2022 | 29.63 | 23.14 | 31.57 | - | - | - |
QPIC-Res101 | CVPR2021 | 29.90 | 23.92 | 31.69 | 32.38 | 26.06 | 34.27 |
SCG | ICCV2021 | 29.26 | 24.61 | 30.65 | 32.87 | 27.89 | 34.35 |
MHOI | TCSVT2022 | 29.67 | 24.37 | 31.25 | 31.87 | 27.28 | 33.24 |
PhraseHOI | AAAI2022 | 30.03 | 23.48 | 31.99 | 33.74 | 27.35 | 35.64 |
MSTR | CVPR2022 | 31.17 | 25.31 | 32.92 | 34.02 | 28.83 | 35.57 |
SSRT | CVPR2022 | 31.34 | 24.31 | 33.32 | - | - | - |
OCN | AAAI2022 | 31.43 | 25.80 | 33.11 | - | - | - |
SCG+ODM | ECCV2022 | 31.65 | 24.95 | 33.65 | - | - | - |
DT | CVPR2022 | 31.75 | 27.45 | 33.03 | 34.50 | 30.13 | 35.81 |
ParSe (COCO) | NeurIPS2022 | 31.79 | 26.36 | 33.41 | - | - | - |
CATN (w/ Bert) | CVPR2022 | 31.86 | 25.15 | 33.84 | 34.44 | 27.69 | 36.45 |
SQA | ICASSP2023 | 31.99 | 29.88 | 32.62 | 35.12 | 32.74 | 35.84 |
CDN | NeurIPS2021 | 32.07 | 27.19 | 33.53 | 34.79 | 29.48 | 36.38 |
STIP | CVPR2022 | 32.22 | 28.15 | 33.43 | 35.29 | 31.43 | 36.45 |
DEFR | arXiv2021 | 32.35 | 33.45 | 32.02 | - | - | - |
PQNet-L | mmasia2022 | 32.45 | 27.80 | 33.84 | 35.28 | 30.72 | 36.64 |
CDN-s+HQM | ECCV2022 | 32.47 | 28.15 | 33.76 | - | - | - |
UPT | CVPR2022 | 32.62 | 28.62 | 33.81 | 36.08 | 31.41 | 37.47 |
OpenCat | CVPR2023 | 32.68 | 28.42 | 33.75 | - | - | - |
Iwin | ECCV2022 | 32.79 | 27.84 | 35.40 | 35.84 | 28.74 | 36.09 |
RLIP-ParSe (VG+COCO) | NeurIPS2022 | 32.84 | 26.85 | 34.63 | - | - | - |
PR-Net | arXiv2023 | 32.86 | 28.03 | 34.30 | - | - | - |
MUREN | CVPR2023 | 32.87 | 28.67 | 34.12 | 35.52 | 30.88 | 36.91 |
SDT | arXiv2022 | 32.97 | 28.49 | 34.31 | 36.32 | 31.90 | 37.64 |
DOQ | CVPR2022 | 33.28 | 29.19 | 34.50 | - | - | - |
IF | CVPR2022 | 33.51 | 30.30 | 34.46 | 36.28 | 33.16 | 37.21 |
PSN | arXiv2023 | 34.02 | 29.44 | 35.39 | - | - | - |
HOICLIP | CVPR2023 | 34.69 | 31.12 | 35.74 | 37.61 | 34.47 | 38.54 |
GEN-VLKT (w/ CLIP) | CVPR2022 | 34.95 | 31.18 | 36.08 | 38.22 | 34.36 | 39.37 |
SOV-STG (res101) | arXiv2023 | 35.01 | 30.63 | 36.32 | 37.60 | 32.77 | 39.05 |
PartMap | ECCV2022 | 35.15 | 33.71 | 35.58 | 37.56 | 35.87 | 38.06 |
QAHOI-Swin-Large-ImageNet-22K | arXiv2021 | 35.78 | 29.80 | 37.56 | 37.59 | 31.66 | 39.36 |
GEN-VLKT-L + CQL | CVPR2023 | 36.03 | 33.16 | 36.89 | 38.82 | 35.51 | 39.81 |
FGAHOI | arXiv2023 | 37.18 | 30.71 | 39.11 | 38.93 | 31.93 | 41.02 |
ViPLO | CVPR2023 | 37.22 | 35.45 | 37.75 | 40.61 | 38.82 | 41.15 |
UniVRD w/ extra data+VLM | arXiv2023 | 38.61 | 33.39 | 40.16 | - | - | - |
DiffHOI w/ syn data | arXiv2023 | 41.50 | 39.96 | 41.96 | 43.62 | 41.41 | 44.28 |
SOV-STG (swin-l) | arXiv2023 | 43.35 | 42.25 | 43.69 | 45.53 | 43.62 | 46.11 |
3) Ground-truth human-object pair boxes (evaluating HOI recognition only)
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) |
---|---|---|---|---|
iCAN | BMVC2018 | 33.38 | 21.43 | 36.95 |
Interactiveness | CVPR2019 | 34.26 | 22.90 | 37.65 |
Analogy | ICCV2019 | 34.35 | 27.57 | 36.38 |
ATL | CVPR2021 | 43.32 | 33.84 | 46.15 |
IDN | NeurIPS2020 | 43.98 | 40.27 | 45.09 |
ATL(w/ COCO) | CVPR2021 | 44.27 | 35.52 | 46.89 |
FCL | CVPR2021 | 45.25 | 36.27 | 47.94 |
GTNet | arXiv | 46.45 | 35.10 | 49.84 |
SCG | ICCV2021 | 51.53 | 41.01 | 54.67 |
K-BAN | arXiv2022 | 52.99 | 34.91 | 58.40 |
ConsNet | ACMMM2020 | 53.04 | 38.79 | 57.3 |
ViPLO | CVPR2023 | 62.09 | 59.26 | 62.93 |
4) Interactiveness detection (interactive or not + pair box detection):
Method | Pub | HICO-DET | V-COCO |
---|---|---|---|
TIN++ | TPAMI2022 | 14.35 | 29.36 |
PPDM | CVPR2020 | 27.34 | - |
QPIC | CVPR2021 | 32.96 | 38.33 |
CDN | NeurIPS2021 | 33.55 | 40.13 |
PartMap | ECCV2022 | 38.74 | 43.61 |
5) Enhanced with HAKE:
Method | Pub | Full(def) | Rare(def) | Non-Rare(def) | Full(ko) | Rare(ko) | Non-Rare(ko) |
---|---|---|---|---|---|---|---|
iCAN | BMVC2018 | 14.84 | 10.45 | 16.15 | 16.26 | 11.33 | 17.73 |
iCAN + HAKE-HICO-DET | CVPR2020 | 19.61 (+4.77) | 17.29 | 20.30 | 22.10 | 20.46 | 22.59 |
Interactiveness | CVPR2019 | 17.03 | 13.42 | 18.11 | 19.17 | 15.51 | 20.26 |
Interactiveness + HAKE-HICO-DET | CVPR2020 | 22.12 (+5.09) | 20.19 | 22.69 | 24.06 | 22.19 | 24.62 |
Interactiveness + HAKE-Large | CVPR2020 | 22.66 (+5.63) | 21.17 | 23.09 | 24.53 | 23.00 | 24.99 |
6) Zero-Shot HOI detection:
Unseen action-object combination scenario (UC)
Method | Pub | Detector | Unseen(def) | Seen(def) | Full(def) |
---|---|---|---|---|---|
Shen et al. | WACV2018 | COCO | 5.62 | - | 6.26 |
Functional | AAAI2020 | HICO-DET | 11.31 ± 1.03 | 12.74 ± 0.34 | 12.45 ± 0.16 |
ConsNet | ACMMM2020 | COCO | 16.99 ± 1.67 | 20.51 ± 0.62 | 19.81 ± 0.32 |
EoID | AAAI2023 | - | 23.01±1.54 | 30.39±0.40 | 28.91±0.27 |
VCL (NF-UC) | ECCV2020 | HICO-DET | 16.22 | 18.52 | 18.06 |
ATL(w/ COCO) (NF-UC) | CVPR2021 | HICO-DET | 18.25 | 18.78 | 18.67 |
FCL (NF-UC) | CVPR2021 | HICO-DET | 18.66 | 19.55 | 19.37 |
RLIP-ParSe (RF-UC) | NeurIPS2022 | COCO, VG | 20.27 | 27.67 | 26.19 |
SCL | arxiv | HICO-DET | 21.73 | 25.00 | 24.34 |
OpenCat(NF-UC) | CVPR2023 | HICO-DET | 23.25 | 28.04 | 27.08 |
GEN-VLKT* (NF-UC) | CVPR2022 | HICO-DET | 25.05 | 23.38 | 23.71 |
EoID (NF-UC) | AAAI2023 | HICO-DET | 26.77 | 26.66 | 26.69 |
HOICLIP (NF-UC) | CVPR2023 | HICO-DET | 26.39 | 28.10 | 27.75 |
DiffHOI w/ syn data (NF-UC) | arXiv2023 | HICO-DET + syn data | 29.45 | 31.68 | 31.24 |
VCL (RF-UC) | ECCV2020 | HICO-DET | 10.06 | 24.28 | 21.43 |
ATL(w/ COCO) (RF-UC) | CVPR2021 | HICO-DET | 9.18 | 24.67 | 21.57 |
FCL (RF-UC) | CVPR2021 | HICO-DET | 13.16 | 24.23 | 22.01 |
SCL (RF-UC) | arxiv | HICO-DET | 19.07 | 30.39 | 28.08 |
RLIP-ParSe (RF-UC) | NeurIPS2022 | COCO, VG | 19.19 | 33.35 | 30.52 |
GEN-VLKT* (RF-UC) | CVPR2022 | HICO-DET | 21.36 | 32.91 | 30.56 |
OpenCat(RF-UC) | CVPR2023 | HICO-DET | 21.46 | 33.86 | 31.38 |
HOICLIP (RF-UC) | CVPR2023 | HICO-DET | 25.53 | 34.85 | 32.99 |
DiffHOI w/ syn data (RF-UC) | arXiv2023 | HICO-DET + syn data | 28.76 | 38.01 | 36.16 |
- * indicates large vision-language model pre-training, e.g., CLIP.
- For details of each setting, please refer to the corresponding publications. This comparison is not an official benchmark and may miss some works.
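For intuition, a UC split holds out some verb-object combinations from training while keeping every verb and every object covered by the seen combinations. An illustrative sketch (names and the greedy selection are hypothetical, not from any specific benchmark release):

```python
import random

def make_uc_split(combos, n_unseen, seed=0):
    """combos: list of (verb, object) pairs. Greedily hold out up to
    n_unseen combos such that every verb and object still appears in
    at least one seen combination."""
    rng = random.Random(seed)
    pool = list(combos)
    rng.shuffle(pool)
    unseen = []
    for c in pool:
        if len(unseen) == n_unseen:
            break
        seen = [x for x in combos if x not in unseen and x != c]
        verbs = {v for v, _ in seen}
        objs = {o for _, o in seen}
        if c[0] in verbs and c[1] in objs:   # coverage preserved
            unseen.append(c)
    seen = [c for c in combos if c not in unseen]
    return seen, unseen
```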
Zero-shot* HOI detection without fine-tuning (NF)
Method | Pub | Backbone | Dataset | Detector | Full | Rare | Non-Rare |
---|---|---|---|---|---|---|---|
RLIP-ParSeD | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 13.92 | 11.20 | 14.73 |
RLIP-ParSe | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 15.40 | 15.08 | 15.50 |
- * indicates a formulation that assesses the generalization of a pre-trained model to unseen distributions, proposed in RLIP; "zero-shot" here follows the terminology of CLIP.
Unseen object scenario (UO)
Method | Pub | Detector | Full(def) | Seen(def) | Unseen(def) |
---|---|---|---|---|---|
Functional | AAAI2020 | HICO-DET | 13.84 | 14.36 | 11.22 |
FCL | CVPR2021 | HICO-DET | 19.87 | 20.74 | 15.54 |
ConsNet | ACMMM2020 | COCO | 20.71 | 20.99 | 19.27 |
Unseen action scenario (UA)
Method | Pub | Detector | Full(def) | Seen(def) | Unseen(def) |
---|---|---|---|---|---|
ConsNet | ACMMM2020 | COCO | 19.04 | 20.02 | 14.12 |
EoID | AAAI2023 | - | 29.22 | 30.46 | 23.04 |
Unseen verb scenario (UV), results from EoID
Method | Pub | Detector | Unseen(def) | Seen(def) | Full(def) |
---|---|---|---|---|---|
GEN-VLKT | CVPR2022 | - | 20.96 | 30.23 | 28.74 |
EoID | AAAI2023 | - | 22.71 | 30.73 | 29.61 |
Another setting
Method | Pub | Unseen | Seen | Full |
---|---|---|---|---|
Shen et al. | WACV2018 | 5.62 | - | 6.26 |
Functional | AAAI2020 | 10.93 | 12.60 | 12.26 |
VCL | ECCV2020 | 10.06 | 24.28 | 21.43 |
ATL | CVPR2021 | 9.18 | 24.67 | 21.57 |
FCL | CVPR2021 | 13.16 | 24.23 | 22.01 |
THID (w/ CLIP) | CVPR2022 | 15.53 | 24.32 | 22.96 |
EoID | AAAI2023 | 22.04 | 31.39 | 29.52 |
GEN-VLKT | CVPR2022 | 21.36 | 32.91 | 30.56 |
7) Few-Shot HOI detection:
1% of HICO-DET data used in fine-tuning
Method | Pub | Backbone | Dataset | Detector | Data | Full | Rare | Non-Rare |
---|---|---|---|---|---|---|---|---|
RLIP-ParSeD | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 1% | 18.30 | 16.22 | 18.92 |
RLIP-ParSe | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 1% | 18.46 | 17.47 | 18.76 |
10% of HICO-DET data used in fine-tuning
Method | Pub | Backbone | Dataset | Detector | Data | Full | Rare | Non-Rare |
---|---|---|---|---|---|---|---|---|
RLIP-ParSeD | NeurIPS2022 | ResNet-50 | COCO + VG | DDETR | 10% | 22.09 | 15.89 | 23.94 |
RLIP-ParSe | NeurIPS2022 | ResNet-50 | COCO + VG | DETR | 10% | 22.59 | 20.16 | 23.32 |
8) Weakly-supervised HOI detection:
Method | Pub | Backbone | Dataset | Detector | Full | Rare | Non-Rare |
---|---|---|---|---|---|---|---|
Explanation-HOI | ECCV2020 | ResNeXt101 | COCO | FRCNN | 10.63 | 8.71 | 11.20 |
MX-HOI | WACV2021 | ResNet-101 | COCO | FRCNN | 16.14 | 12.06 | 17.50 |
PPR-FCN (from Weakly-HOI-CLIP) | ICCV2017 | ResNet-50, CLIP | COCO | FRCNN | 17.55 | 15.69 | 18.41 |
Align-Former | BMVC2021 | ResNet-101 | - | - | 20.85 | 18.23 | 21.64 |
Weakly-HOI-CLIP | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | 25.70 | 24.52 | 26.05 |
OpenCat | CVPR 2023 | DETR | - | - | 25.82 | 24.35 | 26.19 |
Ambiguous-HOI
Detector: COCO pre-trained
Method | mAP |
---|---|
iCAN | 8.14 |
Interactiveness | 8.22 |
Analogy (reproduced) | 9.72 |
DJ-RN | 10.37 |
OC-Immunity | 10.45 |
SWiG-HOI
Method | Pub | Non-Rare | Unseen | Seen | Full |
---|---|---|---|---|---|
JSR | ECCV2020 | 10.01 | 6.10 | 2.34 | 6.08 |
CHOID | ICCV2021 | 10.93 | 6.63 | 2.64 | 6.64 |
QPIC | CVPR2021 | 16.95 | 10.84 | 6.21 | 11.12 |
THID (w/ CLIP) | CVPR2022 | 17.67 | 12.82 | 10.04 | 13.26 |
V-COCO: Scenario 1
1) Detector: COCO pre-trained or one-stage detector
Method | Pub | AP(role) |
---|---|---|
Gupta et al. | arXiv | 31.8 |
InteractNet | CVPR2018 | 40.0 |
Turbo | AAAI2019 | 42.0 |
GPNN | ECCV2018 | 44.0 |
UniVRD w/ extra data+VLM | arXiv2023 | 45.19 |
iCAN | BMVC2018 | 45.3 |
Xu et al. | CVPR2019 | 45.9 |
Wang et al. | ICCV2019 | 47.3 |
UniDet | ECCV2020 | 47.5 |
Interactiveness | CVPR2019 | 47.8 |
Lin et al. | IJCAI2020 | 48.1 |
VCL | ECCV2020 | 48.3 |
Zhou et al. | CVPR2020 | 48.9 |
In-GraphNet | IJCAI-PRICAI 2020 | 48.9 |
Interactiveness-optimized | CVPR2019 | 49.0 |
TIN-PAMI | TPAMI2021 | 49.1 |
IP-Net | CVPR2020 | 51.0 |
DRG | ECCV2020 | 51.0 |
RGBM | arXiv2022 | 51.7 |
VSGNet | CVPR2020 | 51.8 |
PMN | arXiv | 51.8 |
PMFNet | ICCV2019 | 52.0 |
Liu et al. | arXiv | 52.28 |
FCL | CVPR2021 | 52.35 |
PD-Net | ECCV2020 | 52.6 |
Wang et al. | ECCV2020 | 52.7 |
PFNet | CVM | 52.8 |
Zou et al. | CVPR2021 | 52.9 |
SIGN | ICME2020 | 53.1 |
ACP | ECCV2020 | 52.98 (53.23) |
FCMNet | ECCV2020 | 53.1 |
HRNet | TIP2021 | 53.1 |
SGCN4HOI | IEEESMC2022 | 53.1 |
ConsNet | ACMMM2020 | 53.2 |
IDN | NeurIPS2020 | 53.3 |
SG2HOI | ICCV2021 | 53.3 |
OSGNet | IEEE Access | 53.43 |
SABRA-Res50 | arXiv | 53.57 |
K-BAN | arXiv2022 | 53.70 |
IPGN | TIP2021 | 53.79 |
AS-Net | CVPR2021 | 53.9 |
RR-Net | arXiv | 54.2 |
SCG | ICCV2021 | 54.2 |
HOKEM | arXiv2023 | 54.6 |
SABRA-Res50FPN | arXiv | 54.69 |
GGNet | CVPR2021 | 54.7 |
MLCNet | ICMR2020 | 55.2 |
HOTR | CVPR2021 | 55.2 |
DIRV | AAAI2021 | 56.1 |
SABRA-Res152 | arXiv | 56.62 |
PhraseHOI | AAAI2022 | 57.4 |
GTNet | arXiv | 58.29 |
QPIC-Res101 | CVPR2021 | 58.3 |
QPIC-Res50 | CVPR2021 | 58.8 |
CATN (w/ fastText) | CVPR2022 | 60.1 |
FGAHOI | arXiv2023 | 60.5 |
Iwin | ECCV2022 | 60.85 |
UPT-ResNet-101-DC5 | CVPR2022 | 61.3 |
SDT | arXiv2022 | 61.8 |
OpenCat | CVPR2023 | 61.9 |
MSTR | CVPR2022 | 62.0 |
ViPLO | CVPR2023 | 62.2 |
PR-Net | arXiv2023 | 62.9 |
IF | CVPR2022 | 63.0 |
PartMap | ECCV2022 | 63.0 |
QPIC-CPC | CVPR2022 | 63.1 |
DOQ | CVPR2022 | 63.5 |
HOICLIP | CVPR2023 | 63.5 |
GEN-VLKT (w/ CLIP) | CVPR2022 | 63.58 |
QPIC+HQM | ECCV2022 | 63.6 |
SOV-STG | arXiv2023 | 63.9 |
CDN | NeurIPS2021 | 63.91 |
RLIP-ParSe (COCO+VG) | NeurIPS2022 | 64.2 |
MHOI | TCSVT2022 | 64.5 |
ParSe (COCO) | NeurIPS2022 | 64.8 |
SSRT | CVPR2022 | 65.0 |
OCN | AAAI2022 | 65.3 |
SQA | ICASSP2023 | 65.4 |
DiffHOI | arXiv2023 | 65.7 |
PSN | arXiv2023 | 65.9 |
STIP | CVPR2022 | 66.0 |
DT | CVPR2022 | 66.2 |
GEN-VLKT-L + CQL | CVPR2023 | 66.8 |
MUREN | CVPR2023 | 68.8 |
2) Enhanced with HAKE:
Method | Pub | AP(role) |
---|---|---|
iCAN | BMVC2018 | 45.3 |
iCAN + HAKE-Large (transfer learning) | CVPR2020 | 49.2 (+3.9) |
Interactiveness | CVPR2019 | 47.8 |
Interactiveness + HAKE-Large (transfer learning) | CVPR2020 | 51.0 (+3.2) |
3) Weakly-supervised HOI detection:
Method | Pub | Backbone | Dataset | Detector | AP(role)-S1 | AP(role)-S2 |
---|---|---|---|---|---|---|
Weakly-HOI-CLIP | ICLR2023 | ResNet-101, CLIP | COCO | FRCNN | 44.74 | 49.97 |
HOI-COCO (based on V-COCO):
Method | Pub | Full | Seen | Unseen |
---|---|---|---|---|
VCL | ECCV2020 | 23.53 | 8.29 | 35.36 |
ATL(w/ COCO) | CVPR2021 | 23.40 | 8.01 | 35.34 |
HICO
1) Default
Method | mAP |
---|---|
R*CNN | 28.5 |
Girdhar et al. | 34.6 |
Mallya et al. | 36.1 |
Pairwise | 39.9 |
RelViT | 40.12 |
DEFR-base | 44.1 |
OpenTAP | 51.7 |
DEFR-CLIP | 60.5 |
DEFR/16 CLIP | 65.6 |
2) Enhanced with HAKE:
Method | mAP |
---|---|
Mallya et al. | 36.1 |
Mallya et al.+HAKE-HICO | 45.0 (+8.9) |
Pairwise | 39.9 |
Pairwise+HAKE-HICO | 45.9 (+6.0) |
Pairwise+HAKE-Large | 46.3 (+6.4) |