Awesome Egocentric Vision

A curated list of egocentric vision resources.

Egocentric (first-person) vision is a sub-field of computer vision that analyses image/video data obtained using a wearable camera simulating a person's visual field.

Papers

Clustered into various problem statements.
Clustered according to the conferences.
- CVPR
- ECCV
- ICCV
- WACV
- BMVC
Datasets

Papers

Clustered in various problem statements.

Action/Activity Recognition

MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition - Xinyu Gong, Sreyas Mohan, Naina Dhingra, Jean-Charles Bazin, YILEI LI, Zhangyang Wang, Rakesh Ranjan. In CVPR 2023.
Therbligs In Action: Video Understanding through Motion Primitives - Eadom Dessalene, Michael Maynord, Cornelia Fermu ̈ller, Yiannis Aloimonos. In CVPR 2023. [project page]
[Learning Video Representations from Large Language Models](https://arxiv.org/pdf/2212.04501.pdf; https://facebookresearch.github.io/LaViLa) - Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar. In CVPR 2023. [project page] [code] [demo]
Learning State-Aware Visual Representations from Audible Interactions - Himangi Mittal, Pedro Morgado, Unnat Jain, Abhinav Gupta. In NeurIPS 2022. [Code] [Video]
Egocentric Activity Recognition and Localization on a 3D Map - Miao Liu, Lingni Ma, Kiran Somasundaram, Yin Li, Kristen Grauman, James M. Rehg, Chao Li. In ECCV 2022.
SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition - Victor Escorcia, Ricardo Guerrero, Xiatian Zhu, Brais Martinez. In ECCV 2022.
E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition - Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo. In CVPR 2022.
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition - Mirco Planamente, Chiara Plizzari, Emanuele Alberti, and Barbara Caputo. In WACV 2022.
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition - Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, and Dima Damen. In BMVC 2021. [project page] [code]
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips - Lijin Yang, Yifei Huang, Yusuke Sugano, and Yoichi Sato. In BMVC 2021. [project page]
Interactive Prototype Learning for Egocentric Action Recognition - Xiaohan Wang, Linchao Zhu, Heng Wang, and Yi Yang. In ICCV 2021.
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition - Jonathan Munro and Dima Damen. In CVPR 2020. [project page] [code]
Integrating Human Gaze Into Attention for Egocentric Activity Recognition - Kyle Min, Jason J. Corso. In WACV 2021. [code]
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition - Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, and Dima Damen. In ICCV 2019. [code] [project page]
LSTA: Long Short-Term Attention for Egocentric Action Recognition - Swathikiran Sudhakaran, Sergio Escalera, and Oswald Lanz. In CVPR 2019. [code]
Egocentric Activity Recognition on a Budget - Rafael Possas, Sheila Pinto Caceres, and Fabio Ramos. In CVPR 2018. [demo]
From Lifestyle VLOGs to Everyday Interaction - David F. Fouhey, Weicheng Kuo, Alexei A. Efros, and Jitendra Malik. In CVPR 2018. [project page]
Actor and Observer: Joint Modeling of First and Third-Person Videos - Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, and Karteek Alahari. In CVPR 2018. [code]
In the eye of beholder: Joint learning of gaze and actions in first person video - Yin Li, Miao Liu, and James M. Rehg. In ECCV 2018.
Privacy-Preserving Human Activity Recognition from Extreme Low Resolution - Michael S. Ryoo, Brandon Rothrock, Charles Fleming, and Hyun Jong Yang. In AAAI 2017.
Jointly Recognizing Object Fluents and Tasks in Egocentric Videos - Yang Liu, Ping Wei, and Song-Chun Zhu. In ICCV 2017.
Trajectory Aligned Features For First Person Action Recognition - Suriya Singh, Chetan Arora, and C.V. Jawahar. In Pattern Recognition 2017.
First Person Action Recognition Using Deep Learned Descriptors - Suriya Singh, Chetan Arora, and C.V. Jawahar. In CVPR 2016. [project page] [code]
Understanding Hand-Object Manipulation with Grasp Types and Object Attributes - Minjie Cai, Kris M. Kitani, and Yoichi Sato. In Robotics: Science and Systems 2016.
Delving into egocentric actions - Yin Li, Zhefan Ye, and James M. Rehg. In CVPR 2015.
Pooled Motion Features for First-Person Videos - Michael S. Ryoo, Brandon Rothrock, and Larry H. Matthies. In CVPR 2015.
Generating Notifications for Missing Actions: Don't forget to turn the lights off! - Bilge Soran, Ali Farhadi, and Linda Shapiro. In ICCV 2015.
First-Person Activity Recognition: What Are They Doing to Me? - M. S. Ryoo and Larry Matthies. In CVPR 2013.
Detecting activities of daily living in first-person camera views - Hamed Pirsiavash and Deva Ramanan. In CVPR 2012.
Learning to recognize daily actions using gaze - Alireza Fathi, Yin Li, and James M. Rehg. In ECCV 2012.
Learning to recognize objects in egocentric activities - Alireza Fathi, Xiaofeng Ren, and James M. Rehg. In CVPR 2011.
Fast unsupervised ego-action learning for first-person sports videos - Kris M. Kitani, Takahiro Okabe, Yoichi Sato, and Akihiro Sugimoto. In CVPR 2011. [project page]
Temporal segmentation and activity classification from first-person sensing - Ekaterina H. Spriggs, Fernando De La Torre, and Martial Hebert. In CVPR Workshops 2009.
Wearable hand activity recognition for event summarization - W.W. Mayol and D.W. Murray. In IEEE International Symposium on Wearable Computers, 2005.

Object/Hand Recognition

Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos - Yilin Wen, Hao Pan, Lei Yang, Jia Pan, Taku Komura, Wenping Wang. In CVPR 2023. [Code] [Video]
Generative Adversarial Network for Future Hand Segmentation from Egocentric Video - Wenqi Jia, Miao Liu, James M. Rehg. In ECCV 2022.
Whose Hand Is This? Person Identification From Egocentric Hand Gestures - Satoshi Tsutsui, Yanwei Fu, and David J. Crandall. In WACV 2021.
Generalizing Hand Segmentation in Egocentric Videos with Uncertainty-Guided Model Adaptation - Minjie Cai, Feng Lu, and Yoichi Sato. In CVPR 2020. [code]
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions - Bugra Tekin, Federica Bogo, and Marc Pollefeys. In CVPR 2019. [video]
Analysis of Hand Segmentation in the Wild - Aisha Urooj Khan and Ali Borji. In CVPR 2018.
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations - Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek, and Tae-Kyun Kim. In CVPR 2018. [project page] [code]
Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules - Congqi Cao, Yifan Zhang, Yi Wu, Hanqing Lu, and Jian Cheng. In ICCV 2017.
Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions - Sven Bambach, Stefan Lee, David J. Crandall, and Chen Yu. In ICCV 2015.
Detecting Snap Points in Egocentric Video with a Web Photo Prior - Bo Xiong and Kristen Grauman. In ECCV 2014. [project page] [code]
Pixel-level hand detection in ego-centric videos - Cheng Li and Kris M. Kitani. In CVPR 2013. [video] [code]
Context-based vision system for place and object recognition - Antonio Torralba, Kevin P. Murphy, William T. Freeman, Mark A. Rubin. In ICCV 2003. [project page]

Action/Gaze Anticipation

A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-shot Representation Forecasting - Tianshan Liu and Kin-Man Lam. In CVPR 2022.
Learning to Anticipate Egocentric Actions by Imagination - Yu Wu, Linchao Zhu, Xiaohan Wang, Yi Yang, and Fei Wu. In TIP 2021.
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, and James M. Rehg. In ECCV 2020. [project page]
How Can I See My Future? FvTraj: Using First-person View for Pedestrian Trajectory Prediction - Huikun Bi, Ruisi Zhang, Tianlu Mao, Zhigang Deng, and Zhaoqi Wang. In ECCV 2020. [presentation video] [summary video]
Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior - Osama Makansi, Ozgun Cicek, Kevin Buchicchio, and Thomas Brox. In CVPR 2020. [demo] [code] [project page]
EGO-TOPO: Environment Affordances from Egocentric Video - Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, and Kristen Grauman. In CVPR 2020. [project page] [demo]
What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention - Antonino Furnari and Giovanni Maria Farinella. In ICCV 2019 [code] [demo]
Digging Deeper into Egocentric Gaze Prediction - Hamed R. Tavakoli, Esa Rahtu, Juho Kannala, and Ali Borji. In WACV 2019.
Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition - Yifei Huang, Minjie Cai, Zhenqiang Li, and Yoichi Sato. In ECCV 2018 [code]
First-Person Activity Forecasting with Online Inverse Reinforcement Learning - Nicholas Rhinehart and Kris M. Kitani. In ICCV 2017. [project page] [video]
Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks - Mengmi Zhang, Keng Teck Ma, Joo Hwee Lim, Qi Zhao, and Jiashi Feng. In CVPR 2017. [code]
Going deeper into first-person activity recognition - Minghuang Ma, Haoqi Fan, and Kris M. Kitani. In CVPR 2016.
Learning to predict gaze in egocentric video - Yin Li, Alireza Fathi, and James M. Rehg. In ICCV 2013.

Localization

Hand-Priming in Object Localization for Assistive Egocentric Vision - Kyungjun Lee, Abhinav Shrivastava, and Hernisa Kacorri. In WACV 2020.
Egocentric Shopping Cart Localization - Emiliano Spera, Antonino Furnari, Sebastiano Battiato, and Giovanni Maria Farinella. In ICPR 2018.
Recognizing personal locations from egocentric videos - Antonino Furnari, Giovanni Maria Farinella, and Sebastiano Battiato. In IEEE Transactions on Human-Machine Systems 2017.
Personal-Location-Based Temporal Segmentation of Egocentric Video for Lifelogging Applications - Antonino Furnari, Sebastiano Battiato, and Giovanni Maria Farinella. In Journal of Visual Communication and Image Representation 2017. [demo] [project page]
Egocentric Future Localization - Hyun Soo Park, Jyh-Jing Hwang, Yedong Niu, and Jianbo Shi. In CVPR 2016. [demo]
Real-time localization and mapping with wearable active vision - A.J. Davison, W.W. Mayol, and D.W. Murray. In The Second IEEE and ACM International Symposium 2003.

Clustering

Sr-clustering: Semantic regularized clustering for egocentric photo streams segmentation - Mariella Dimiccoli, Marc Bolanosa, Estefania Talavera Maedeh Aghaei, Stavri G. Nikolov, and Petia Radeva. In Computer Vision and Image Understanding 2017.
Summarization and Classification of Wearable Camera Streams by Learning the Distributions over Deep Features of Out-of-Sample Image Sequences - Alessandro Perina, Sadegh Mohammadi, Nebojsa Jojic, and Vittorio Murino. In ICCV 2017.

Video Summarization

Query-focused video summarization: Dataset, evaluation, and a memory network based approach - Aidean Sharghi, Jacob S. Laurel and Boqing Gong. In CVPR 2017. [project page]
Toward storytelling from visual lifelogging: An overview - Marc Bolanos, Mariella Dimiccoli, and Petia Radeva. In IEEE Transactions on Human-Machine Systems 2017.
Story-Driven Summarization for Egocentric Video - Zheng Lu and Kristen Grauman. In CVPR 2013 [project page]
Discovering Important People and Objects for Egocentric Video Summarization - Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman. In CVPR 2012. [project page]

Social Interactions

EgoCom: A Multi-person Multi-modal Egocentric Communications Dataset - Curtis G. Northcutt, Shengxin Zha, Steven Lovegrove, and Richard Newcombe. In PAMI 2020.
Deep Dual Relation Modeling for Egocentric Interaction Recognition - Haoxin Li, Yijun Cai, and Wei-Shi Zheng. In CVPR 2019.
Recognizing Micro-Actions and Reactions from Paired Egocentric Videos - Ryo Yonetani, Kris M. Kitani, and Yoichi Sato. In CVPR 2016.

Pose Estimation

AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation - Takehiko Ohkawa, Kun He, Fadime Sener, Tomas Hodan, LUAN TRAN, Cem Keskin. In CVPR 2023.
Scene-aware Egocentric 3D Human Pose Estimation - Jian Wang, Diogo Luvizon, Weipeng Xu, Lingjie Liu, Kripasindhu Sarkar, Christian Theobalt. In CVPR 2023.
Ego-Body Pose Estimation via Ego-Head Pose Estimation - Jiaman Li · Karen Liu · Jiajun Wu. In CVPR 2023.
EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices - Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, Siyu Tang. In ECCV 2022. [project page] [dataset] [code]
UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture - Hiroyasu Akada, Jian Wang, Soshi Shimada, Masaki Takahashi, Christian Theobalt, Vladislav Golyanik. In ECCV 2022. [project page] [code] [dataset] [demo]
Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision - Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt. In CVPR 2022. [project page] [dataset] [demo]
Estimating Egocentric 3D Human Pose in Global Space - Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Christian Theobalt. In ICCV 2021. [project page] [dataset] [demo]
Automatic Calibration of the Fisheye Camera for Egocentric 3D Human Pose Estimation From a Single Image - Yahui Zhang, Shaodi You, and Theo Gevers. In WACV 2021.
You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions - Evonne Ng, Donglai Xiang, Hanbyul Joo, and Kristen Grauman. In CVPR 2020. [demo] [project page] [dataset] [code]
Ego-Pose Estimation and Forecasting as Real-Time PD Control - Ye Yuan and Kris Kitani. In ICCV 2019. [code] [project page] [demo]
xR-EgoPose: Egocentric 3D Human Pose From an HMD Camera - Denis Tome, Patrick Peluse, Lourdes Agapito, and Hernan Badino. In ICCV 2019. [demo] [dataset]
Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video - Hao Jiang and Kristen Grauman. In CVPR 2017.
First-Person Pose Recognition using Egocentric Workspaces - Gregory Rogez, James S. Supancic, and Deva Ramanan. In CVPR 2015.

Human Object Interaction

ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation - Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J. Black, Otmar Hilliges. In CVPR 2023. [code]
Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications - Lingzhi Zhang, Shenghao Zhou, Simon Stent, Jianbo Shi. In ECCV 2022. [project page] [code] [dataset]
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction - Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi. In CVPR 2022. [project page] [video]
Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction - Takuma Yagi, Md Tasnimul Hasan, and Yoichi Sato. In BMVC 2021. [project page] [code]
The MECCANO Dataset: Understanding Human-Object Interactions from Egocentric Videos in an Industrial-like Domain - Francesco Ragusa, Antonino Furnari, Salvatore Livatino, and Giovanni Maria Farinella. In WACV 2021. [project page]
Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video - Miao Liu, Siyu Tang, Yin Li, and James M. Rehg. In ECCV 2020. [project page]
You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video - Dima Damen, Tessid Leelasawassuk, Osian Haines, Andrew Calway,and Walterio Mayol-Cuevas. In BMVC 2014 [project page]
Automated capture and delivery of assistive task guidance with an eyewear computer: the GlaciAR system - Teesid Leelasawassuk, Dima Damen, and Walterio Mayol-Cuevas. In Augmented Human International Conference, ACM 2017.

Temporal Boundary Detection

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video - Davide Moltisanti, Michael Wray, Walterio Mayol-Cuevas, and Dima Damen. In ICCV 2017.
Temporal segmentation of egocentric videos -Yair Poleg, Chetan Arora, and Shmuel Peleg. In CVPR 2014.

Privacy in Egocentric Videos

Is Sharing of Egocentric Video Giving Away Your Biometric Signature? - Daksh Thapar, Chetan Arora, and Aditya Nigam. In ECCV 2020. [project page]
Mitigating Bystander Privacy Concerns in Egocentric Activity Recognition with Deep Learning and Intentional Image Degradation - Mariella Dimiccoli, Juan Marin, and Edison Thomaz. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2018.
Privacy-Preserving Human Activity Recognition from Extreme Low Resolution - Michael S. Ryoo, Brandon Rothrock, Charles Fleming, and Hyun Jong Yang. In AAAI 2017.
Ego-Surfing First Person Videos - Ryo Yonetani, Kris M. Kitani, and Yoichi Sato. In CVPR 2015.

Multiple Egocentric Tasks

Egocentric Video-Language Pretraining - Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rongcheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu and Mike Zheng Shou. In NeurIPS 2022. [project page] [code]
Ego4D: Around the World in 3,000 Hours of Egocentric Video - Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C.V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, and Jitendra Malik. In CVPR 2022. [Github] [project page] [video]

Task Understanding

My View is the Best View: Procedure Learning from Egocentric Videos - Siddhant Bansal, Chetan Arora, C.V. Jawahar. In ECCV 2022. [project page] [dataset] [code]
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant - Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou. In ECCV 2022. [project page] [code]

Miscellaneous (New Tasks)

Tracking Multiple Deformable Objects in Egocentric Videos - Mingzhen Huang, Xiaoxing Li, Jun Hu, Honghong Peng, Siwei Lyu. In CVPR 2023.
Egocentric Audio-Visual Object Localization - Chao Huang · Yapeng Tian · Anurag Kumar · Chenliang Xu. In CVPR 2023. [project page]
Balanced Spherical Grid for Egocentric View Synthesis - Changwoon Choi · Sang Min Kim · Young Min Kim. In CVPR 2023. [code]
Ego-Body Pose Estimation via Ego-Head Pose Estimation - Jiaman Li · Karen Liu · Jiajun Wu. In CVPR 2023.
Egocentric Video Task Translation - Zihui Xue · Yale Song · Kristen Grauman · Lorenzo Torresani. In CVPR 2023.
Egocentric Auditory Attention Localization in Conversations - Fiona Ryan · Hao Jiang · Abhinav Shukla · James Rehg · Vamsi Krishna Ithapu. In CVPR 2023. [project page]
Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization - Mengmeng Xu · Yanghao Li · Cheng-Yang Fu · Bernard Ghanem · Tao Xiang · Juan-Manuel Perez-Rua. In CVPR 2023. [project page]
Egocentric Auditory Attention Localization in Conversations - Fiona Ryan · Hao Jiang · Abhinav Shukla · James Rehg · Vamsi Krishna Ithapu. In CVPR 2023. [project page]
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations - Sagnik Majumder · Hao Jiang · Pierre Moulon · Ethan Henderson · Paul Calamia · Kristen Grauman · Vamsi Krishna Ithapu. In CVPR 2023.
EgoTaskQA: Understanding Human Tasks in Egocentric Videos - Baoxiong Jia, Ting Lei, Song-Chun Zhu, Siyuan Huang. In NeurIPS 2022. [projet page] [code]
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality - Amin Jourabloo, Fernando De la Torre, Jason Saragih, Shih-En Wei, Stephen Lombardi, Te-Li Wang, Danielle Belko, Autumn Trimble, Hernan Badino. In CVPR 2022.
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos - Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang. In CVPR 2022. [project page] [video] [slides]
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization - Hao Jiang, Calvin Murdock, Vamsi Krishna Ithapu. In CVPR 2022.
Egocentric Scene Understanding via Multimodal Spatial Rectifier - Tien Do, Khiem Vuong, Hyun Soo Park. In CVPR 2022.
Egocentric Prediction of Action Target in 3D - Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng. In CVPR 2022.
Slow-Fast Auditory Streams for Audio Recognition - Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, and Dima Damen. ICASSP 2021. [project page] [code]
Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos - Yanghao Li, Tushar Nagarajan, Bo Xiong, Kristen Grauman. In CVPR 2021. [code]
EGO-SLAM: A Robust Monocular SLAM for Egocentric Videos - Suvam Patra, Kartikeya Gupta, Faran Ahmad, Chetan Arora, and Subhashis Banerjee. In WACV 2019. [code]
Egocentric Basketball Motion Planning from a Single First-Person Image - Gedas Bertasius, Aaron Chan, and Jianbo Shi. In CVPR 2018. [demo]
Jointly Learning Energy Expenditures and Activities using Egocentric Multimodal Signals - Katsuyuki Nakamura, Serena Yeung, Alexandre Alahi, and Li Fei-Fei. In CVPR 2017.
Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data - Jing Wang, Yu Cheng, and Rogerio Schmidt Feris. In CVPR 2016. [demo]
Compact CNN for Indexing Egocentric Videos - Yair Poleg, Ariel Ephrat, Shmuel Peleg, and Chetan Arora. In WACV 2016.
Detecting engagement in egocentric video - Yu-Chuan Su and Kristen Grauman. In ECCV 2016.
EgoSampling: Fast-Forward and Stereo for Egocentric Videos - Yair Poleg, Tavi Halperin, Chetan Arora, and Shmuel Peleg. In CVPR 2015.

Clustered according to the conferences.

CVPR

Therbligs In Action: Video Understanding through Motion Primitives - Eadom Dessalene, Michael Maynord, Cornelia Fermu ̈ller, Yiannis Aloimonos. In CVPR 2023. [project page]
Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos - Yilin Wen, Hao Pan, Lei Yang, Jia Pan, Taku Komura, Wenping Wang. In CVPR 2023. [Code] [Video]
MMG-Ego4D: Multimodal Generalization in Egocentric Action Recognition - Xinyu Gong, Sreyas Mohan, Naina Dhingra, Jean-Charles Bazin, YILEI LI, Zhangyang Wang, Rakesh Ranjan. In CVPR 2023.
AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation - Takehiko Ohkawa, Kun He, Fadime Sener, Tomas Hodan, LUAN TRAN, Cem Keskin. In CVPR 2023.
Scene-aware Egocentric 3D Human Pose Estimation - Jian Wang, Diogo Luvizon, Weipeng Xu, Lingjie Liu, Kripasindhu Sarkar, Christian Theobalt. In CVPR 2023.
Tracking Multiple Deformable Objects in Egocentric Videos - Mingzhen Huang, Xiaoxing Li, Jun Hu, Honghong Peng, Siwei Lyu. In CVPR 2023.
Egocentric Audio-Visual Object Localization - Chao Huang · Yapeng Tian · Anurag Kumar · Chenliang Xu. In CVPR 2023. [project page]
Balanced Spherical Grid for Egocentric View Synthesis - Changwoon Choi · Sang Min Kim · Young Min Kim. In CVPR 2023. [code]
Egocentric Video Task Translation - Zihui Xue · Yale Song · Kristen Grauman · Lorenzo Torresani. In CVPR 2023.
Egocentric Auditory Attention Localization in Conversations - Fiona Ryan · Hao Jiang · Abhinav Shukla · James Rehg · Vamsi Krishna Ithapu. In CVPR 2023. [project page]
Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations - Sagnik Majumder · Hao Jiang · Pierre Moulon · Ethan Henderson · Paul Calamia · Kristen Grauman · Vamsi Krishna Ithapu. In CVPR 2023.
ARCTIC: A Dataset for Dexterous Bimanual Hand-Object Manipulation - Zicong Fan, Omid Taheri, Dimitrios Tzionas, Muhammed Kocabas, Manuel Kaufmann, Michael J. Black, Otmar Hilliges. In CVPR 2023. [code]
[Learning Video Representations from Large Language Models](https://arxiv.org/pdf/2212.04501.pdf; https://facebookresearch.github.io/LaViLa) - Yue Zhao, Ishan Misra, Philipp Krähenbühl, Rohit Girdhar. In CVPR 2023. [project page] [code] [demo]
Ego4D: Around the World in 3,000 Hours of Egocentric Video - Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C.V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, and Jitendra Malik. In CVPR 2022. [Github] [project page] [video]
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction - Yunze Liu, Yun Liu, Che Jiang, Kangbo Lyu, Weikang Wan, Hao Shen, Boqiang Liang, Zhoujie Fu, He Wang, Li Yi. In CVPR 2022. [project page] [video]
E2(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition - Chiara Plizzari, Mirco Planamente, Gabriele Goletto, Marco Cannici, Emanuele Gusso, Matteo Matteucci, Barbara Caputo. In CVPR 2022.
Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision - Jian Wang, Lingjie Liu, Weipeng Xu, Kripasindhu Sarkar, Diogo Luvizon, Christian Theobalt. In CVPR 2022. [project page] [dataset] [demo]
Robust Egocentric Photo-realistic Facial Expression Transfer for Virtual Reality - Amin Jourabloo, Fernando De la Torre, Jason Saragih, Shih-En Wei, Stephen Lombardi, Te-Li Wang, Danielle Belko, Autumn Trimble, Hernan Badino. In CVPR 2022.
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos - Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, Xiaolong Wang. In CVPR 2022. [project page] [video] [slides]
A Hybrid Egocentric Activity Anticipation Framework via Memory-Augmented Recurrent and One-shot Representation Forecasting - Tianshan Liu and Kin-Man Lam. In CVPR 2022.
Egocentric Deep Multi-Channel Audio-Visual Active Speaker Localization - Hao Jiang, Calvin Murdock, Vamsi Krishna Ithapu. In CVPR 2022.
Egocentric Scene Understanding via Multimodal Spatial Rectifier - Tien Do, Khiem Vuong, Hyun Soo Park. In CVPR 2022.
Egocentric Prediction of Action Target in 3D - Yiming Li, Ziang Cao, Andrew Liang, Benjamin Liang, Luoyao Chen, Hang Zhao, Chen Feng. In CVPR 2022.
Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos - Yanghao Li, Tushar Nagarajan, Bo Xiong, Kristen Grauman. In CVPR 2021. [code]
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition - Jonathan Munro and Dima Damen. In CVPR 2020. [project page] [code]
Generalizing Hand Segmentation in Egocentric Videos with Uncertainty-Guided Model Adaptation - Minjie Cai, Feng Lu, and Yoichi Sato. In CVPR 2020. [code]
Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior - Osama Makansi, Ozgun Cicek, Kevin Buchicchio, and Thomas Brox. In CVPR 2020. [demo] [code] [project page]
EGO-TOPO: Environment Affordances from Egocentric Video - Tushar Nagarajan, Yanghao Li, Christoph Feichtenhofer, and Kristen Grauman. In CVPR 2020. [project page] [demo]
You2Me: Inferring Body Pose in Egocentric Video via First and Second Person Interactions - Evonne Ng, Donglai Xiang, Hanbyul Joo, and Kristen Grauman. In CVPR 2020. [demo] [project page] [dataset] [code]
LSTA: Long Short-Term Attention for Egocentric Action Recognition - Swathikiran Sudhakaran, Sergio Escalera, and Oswald Lanz. In CVPR 2019. [code]
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions - Bugra Tekin, Federica Bogo, and Marc Pollefeys. In CVPR 2019. [video]
Deep Dual Relation Modeling for Egocentric Interaction Recognition - Haoxin Li, Yijun Cai, and Wei-Shi Zheng. In CVPR 2019.
Egocentric Activity Recognition on a Budget - Rafael Possas, Sheila Pinto Caceres, and Fabio Ramos. In CVPR 2018. [demo]
From Lifestyle VLOGs to Everyday Interaction - David F. Fouhey, Weicheng Kuo, Alexei A. Efros, and Jitendra Malik. In CVPR 2018. [project page]
Actor and Observer: Joint Modeling of First and Third-Person Videos - Gunnar A. Sigurdsson, Abhinav Gupta, Cordelia Schmid, Ali Farhadi, and Karteek Alahari. In CVPR 2018. [code]
Analysis of Hand Segmentation in the Wild - Aisha Urooj Khan and Ali Borji. In CVPR 2018.
First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations - Guillermo Garcia-Hernando, Shanxin Yuan, Seungryul Baek, and Tae-Kyun Kim. In CVPR 2018. [project page] [code]
Egocentric Basketball Motion Planning from a Single First-Person Image - Gedas Bertasius, Aaron Chan, and Jianbo Shi. In CVPR 2018. [demo]
Query-focused video summarization: Dataset, evaluation, and a memory network based approach - Aidean Sharghi, Jacob S. Laurel and Boqing Gong. In CVPR 2017. [project page]
Deep future gaze: Gaze anticipation on egocentric videos using adversarial networks - Mengmi Zhang, Keng Teck Ma, Joo Hwee Lim, Qi Zhao, and Jiashi Feng. In CVPR 2017. [code]
Jointly Learning Energy Expenditures and Activities using Egocentric Multimodal Signals - Katsuyuki Nakamura, Serena Yeung, Alexandre Alahi, and Li Fei-Fei. In CVPR 2017.
Seeing Invisible Poses: Estimating 3D Body Pose from Egocentric Video - Hao Jiang and Kristen Grauman. In CVPR 2017.
First Person Action Recognition Using Deep Learned Descriptors - Suriya Singh, Chetan Arora, and C.V. Jawahar. In CVPR 2016. [project page] [code]
Going deeper into first-person activity recognition - Minghuang Ma, Haoqi Fan, and Kris M. Kitani. In CVPR 2016.
Egocentric Future Localization - Hyun Soo Park, Jyh-Jing Hwang, Yedong Niu, and Jianbo Shi. In CVPR 2016. [demo]
Recognizing Micro-Actions and Reactions from Paired Egocentric Videos - Ryo Yonetani, Kris M. Kitani, and Yoichi Sato. In CVPR 2016.
Walk and Learn: Facial Attribute Representation Learning from Egocentric Video and Contextual Data - Jing Wang, Yu Cheng, and Rogerio Schmidt Feris. In CVPR 2016. [demo]
Delving into egocentric actions - Yin Li, Zhefan Ye, and James M. Rehg. In CVPR 2015.
Pooled Motion Features for First-Person Videos - Michael S. Ryoo, Brandon Rothrock, and Larry H. Matthies. In CVPR 2015.
EgoSampling: Fast-Forward and Stereo for Egocentric Videos - Yair Poleg, Tavi Halperin, Chetan Arora, and Shmuel Peleg. In CVPR 2015.
Ego-Surfing First Person Videos - Ryo Yonetani, Kris M. Kitani, and Yoichi Sato. In CVPR 2015.
First-Person Pose Recognition using Egocentric Workspaces - Gregory Rogez, James S. Supancic, and Deva Ramanan. In CVPR 2015.
Temporal segmentation of egocentric videos -Yair Poleg, Chetan Arora, and Shmuel Peleg. In CVPR 2014.
First-Person Activity Recognition: What Are They Doing to Me? - M. S. Ryoo and Larry Matthies. In CVPR 2013.
Pixel-level hand detection in ego-centric videos - Cheng Li and Kris M. Kitani. In CVPR 2013. [video] [code]
Story-Driven Summarization for Egocentric Video - Zheng Lu and Kristen Grauman. In CVPR 2013 [project page]
Detecting activities of daily living in first-person camera views - Hamed Pirsiavash and Deva Ramanan. In CVPR 2012.
Discovering Important People and Objects for Egocentric Video Summarization - Yong Jae Lee, Joydeep Ghosh, and Kristen Grauman. In CVPR 2012. [project page]
Learning to recognize objects in egocentric activities - Alireza Fathi, Xiaofeng Ren, and James M. Rehg. In CVPR 2011.
Fast unsupervised ego-action learning for first-person sports videos - Kris M. Kitani, Takahiro Okabe, Yoichi Sato, and Akihiro Sugimoto. In CVPR 2011. [project page]

Datasets

EgoProceL - 62 hours of egocentric videos recorded by 130 subjects performing 16 tasks for procedure learning.
EgoBody - Large-scale dataset capturing ground-truth 3D human motions during social interactions in 3D scenes.
UnrealEgo - Large-scale naturalistic dataset for egocentric 3D human pose estimation.
Hand-object Segments - Hand-object interactions in 11,235 frames from 1,000 videos covering daily activities in diverse scenarios.
Ego4D - 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries.
HOI4D - HOI4D consists of 2.4M RGB-D egocentric video frames over 4000 sequences collected by 9 participants interacting with 800 different object instances from 16 categories over 610 different indoor rooms.
EgoCom - A natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives.
TREK-100 - Object tracking in first person vision.
MECCANO - 20 subject assembling a toy motorbike.
EPIC-Kitchens 2020 - Subjects performing unscripted actions in their native environments.
EPIC-Tent - 29 participants assembling a tent while wearing two head-mounted cameras. [paper]
EGO-CH - 70 subjects visiting two cultural sites in Sicily, Italy.
EPIC-Kitchens 2018 - 32 subjects performing unscripted actions in their native environments.
Charade-Ego - Paired first-third person videos.
EGTEA Gaze+ - 32 subjects, 86 cooking sessions, 28 hours.
ADL - 20 subjects performing daily activities in their native environments.
CMU kitchen - Multimodal, 18 subjects cooking 5 different recipes: brownies, eggs, pizza, salad, sandwich.
EgoSeg - Long term actions (walking, running, driving, etc.)
First-Person Social Interactions - 8 subjects at disneyworld.
UEC Dataset - Two choreographed datasets with different egoactions (walk, jump, climb, etc.) + 6 YouTube sports videos.
JPL - Interaction with a robot.
FPPA - Five subjects performing 5 daily actions.
UT Egocentric - 3-5 hours long videos capturing a person's day.
VINST/ Visual Diaries - 31 videos capturing the visual experience of a subject walking from metro station to work.
Bristol Egocentric Object Interaction (BEOID) - 8 subjects, six locations. Interaction with objects and environment.
Object Search Dataset - 57 sequences of 55 subjects on search and retrieval tasks.
UNICT-VEDI - Different subjects visiting a museum.
UNICT-VEDI-POI - Different subjects visiting a museum.
Simulated Egocentric Navigations - Simulated navigations of a virtual agent within a large building.
EgoCart - Egocentric images collected by a shopping cart in a retail store.
Unsupervised Segmentation of Daily Living Activities - Egocentric videos of daily activities.
Visual Market Basket Analysis - Egocentric images collected by a shopping cart in a retail store.
Location Based Segmentation of Egocentric Videos - Egocentric videos of daily activities.
Recognition of Personal Locations from Egocentric Videos - Egocentric videos clips of daily.
EgoGesture - 2k videos from 50 subjects performing 83 gestures.
EgoHands - 48 videos of interactions between two people.
DoMSEV - 80 hours/different activities.
DR(eye)VE - 74 videos of people driving.
THU-READ - 8 subjects performing 40 actions with a head-mounted RGBD camera.
EgoDexter - 4 sequences with 4 actors (2 female), and varying interactions with various objects and and cluttered background. [paper]
First-Person Hand Action (FPHA) - 3D hand-object interaction. Includes 1175 videos belonging to 45 different activity categories performed by 6 actors. [paper]
UTokyo Paired Ego-Video (PEV) - 1,226 pairs of first-person clips extracted from the ones recorded synchronously during dyadic conversations.
UTokyo Ego-Surf - Contains 8 diverse groups of first-person videos recorded synchronously during face-to-face conversations.
TEgO: Teachable Egocentric Objects Dataset - Contains egocentric images of 19 distinct objects taken by two people for training a teachable object recognizer.
Multimodal Focused Interaction Dataset - Contains 377 minutes of continuous multimodal recording captured during 19 sessions, with 17 conversational partners in 18 different indoor/outdoor locations.

Contribute

This is a work in progress. Contributions welcome! Read the contribution guidelines first.

Sid2697/awesome-egocentric-vision

Sid2697

Reviews

Repository Details