6D Object Pose Estimation: Papers and Codes
This repository summarizes papers and codes for 6D object pose estimation of rigid objects, i.e., computing the 6D transformation from the object coordinate frame to the camera coordinate frame. Let _Xo_ denote the object's points in the object frame and _Xc_ the same points in the camera frame; the 6D object pose _T_ satisfies _Xc = T * Xo_, where _T = [R, t]_ consists of the 3D rotation _R_ and the 3D translation _t_.
Most current methods address instance-level 6D object pose estimation, where an exact 3D model of the observed object is available. Category-level 6D object pose estimation has also emerged, where the observed object is not identical to any existing 3D model but belongs to the same geometric category. Based on the input, methods can further be categorized into RGB-D image-based methods and point cloud-based methods.
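The NumPy snippet below is a minimal sketch of this pose convention; the rotation, translation, and object model points are made-up placeholder values used only for illustration and are not tied to any paper in this list.

```python
# Sketch of the convention Xc = T * Xo with T = [R | t]
# (R: 3x3 rotation, t: 3-vector translation). Placeholder values only.
import numpy as np

def apply_pose(R, t, X_o):
    """Transform Nx3 object-frame points X_o into the camera frame."""
    return X_o @ R.T + t  # equivalent to (R @ X_o.T).T + t

# Toy pose: 90-degree rotation about the z-axis, 0.5 m translation along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 0.5])

X_o = np.array([[0.1, 0.0, 0.0],   # three points on the object model (meters)
                [0.0, 0.1, 0.0],
                [0.0, 0.0, 0.1]])
X_c = apply_pose(R, t, X_o)        # points expressed in the camera frame

# The same pose written as a 4x4 homogeneous matrix:
T = np.eye(4)
T[:3, :3], T[:3, 3] = R, t
```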
Table of Contents
1. Instance-level 6D Object Pose Estimation
    1.1 RGB-D Image-based Methods
        1.1.1 Correspondence-based Methods
            a. Match 2D feature points
            b. Regress 2D projections
        1.1.2 Template-based Methods
        1.1.3 Voting-based Methods
2. Category-level Methods
    2.1 Category-level 6D pose estimation
    2.2 3D shape reconstruction from images
    2.3 3D shape rendering
1. Instance-level 6D Object Pose Estimation
1.1 RGB-D Image-based Methods
Survey papers:
[arXiv] A Review on Object Pose Recovery: from 3D Bounding Box Detectors to Full 6D Pose Estimators, [paper]
2016:
[ECCVW] A Summary of the 4th International Workshop on Recovering 6D Object Pose, [paper]
1.1.1 Correspondence-based Methods
a. Match 2D feature points
2020:
[arXiv] Delta Descriptors: Change-Based Place Representation for Robust Visual Localization, [paper]
[arXiv] Unconstrained Matching of 2D and 3D Descriptors for 6-DOF Pose Estimation, [paper]
[arXiv] S2DNet: Learning Accurate Correspondences for Sparse-to-Dense Feature Matching, [paper]
[arXiv] SK-Net: Deep Learning on Point Cloud via End-to-end Discovery of Spatial Keypoints, [paper]
[arXiv] LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts, [paper]
[arXiv] Table-Top Scene Analysis Using Knowledge-Supervised MCMC, [paper]
[arXiv] AprilTags 3D: Dynamic Fiducial Markers for Robust Pose Estimation in Highly Reflective Environments and Indirect Communication in Swarm Robotics, [paper]
[AAAI] LCD: Learned Cross-Domain Descriptors for 2D-3D Matching, [paper] [project]
2019:
[ICCV] GLAMpoints: Greedily Learned Accurate Match points, [paper]
2016:
[ECCV] LIFT: Learned Invariant Feature Transform, [paper]
2012:
[3DIMPVT] 3D Object Detection and Localization using Multimodal Point Pair Features, [paper]
b. Regress 2D projections
2020:
[arXiv] PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D Pose Estimation, [paper]
[arXiv] EPOS: Estimating 6D Pose of Objects with Symmetries, [paper]
[arXiv] Tackling Two Challenges of 6D Object Pose Estimation: Lack of Real Annotated RGB Images and Scalability to Number of Objects, [paper]
[arXiv] Squeezed Deep 6DoF Object Detection using Knowledge Distillation, [paper]
[arXiv] Learning 2D–3D Correspondences To Solve The Blind Perspective-n-Point Problem, [paper]
[arXiv] PnP-Net: A hybrid Perspective-n-Point Network, [paper]
[arXiv] Object 6D Pose Estimation with Non-local Attention, [paper]
[arXiv] 6DoF Object Pose Estimation via Differentiable Proxy Voting Loss, [paper]
2019:
[arXiv] DPOD: 6D Pose Object Detector and Refiner, [paper]
[CVPR] Segmentation-driven 6D Object Pose Estimation, [paper] [code]
[arXiv] Single-Stage 6D Object Pose Estimation, [paper]
[arXiv] W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression, [paper]
[arXiv] KeyPose: Multi-view 3D Labeling and Keypoint Estimation for Transparent Objects, [paper]
2018:
[CVPR] Real-time seamless single shot 6d object pose prediction, [paper] [code]
[arXiv] Estimating 6D Pose From Localizing Designated Surface Keypoints, [paper]
2017:
[ICCV] BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth, [paper]
[ICCV] SSD-6D: Making rgb-based 3d detection and 6d pose estimation great again, [paper] [code]
[ICRA] 6-DoF Object Pose from Semantic Keypoints, [paper]
1.1.2 Template-based Methods
These methods can be regarded as regression-based methods (a minimal sketch of a direct pose-regression head is given below).
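As a loose illustration of such regression, the following PyTorch sketch maps an image feature vector to a unit quaternion and a translation vector. The feature dimension, the dummy backbone output, and the quaternion parameterization are assumptions for illustration only; this is not the architecture of any specific paper listed here.

```python
# Generic direct pose-regression head (illustrative sketch, assumed dimensions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseRegressionHead(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        self.fc_rot = nn.Linear(feat_dim, 4)    # quaternion (qw, qx, qy, qz)
        self.fc_trans = nn.Linear(feat_dim, 3)  # translation (tx, ty, tz)

    def forward(self, feat):
        quat = F.normalize(self.fc_rot(feat), dim=-1)  # keep a valid unit quaternion
        trans = self.fc_trans(feat)
        return quat, trans

# Usage with dummy features standing in for some backbone's 512-d output:
head = PoseRegressionHead(feat_dim=512)
quat, trans = head(torch.randn(8, 512))   # batch of 8 feature vectors
print(quat.shape, trans.shape)            # torch.Size([8, 4]) torch.Size([8, 3])
```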
2020:
[IROS] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains, [paper] [github]
[arXiv] A survey on deep supervised hashing methods for image retrieval, [paper]
[arXiv] Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images, [paper]
[arXiv] How to track your dragon: A Multi-Attentional Framework for real-time RGB-D 6-DOF Object Pose Tracking, [paper]
[arXiv] Self6D: Self-Supervised Monocular 6D Object Pose Estimation, [paper]
[arXiv] A Novel Pose Proposal Network and Refinement Pipeline for Better Object Pose Estimation, [paper]
[arXiv] G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features, [paper] [code]
[arXiv] Neural Mesh Refiner for 6-DoF Pose Estimation, [paper]
[arXiv] MobilePose: Real-Time Pose Estimation for Unseen Objects with Weak Shape Supervision, [paper]
[arXiv] Robust 6D Object Pose Estimation by Learning RGB-D Features, [paper]
[arXiv] HybridPose: 6D Object Pose Estimation under Hybrid Representations, [paper] [code]
[ICRA] Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands, [paper] [github]
2019:
[arXiv] P2GNet: Pose-Guided Point Cloud Generating Networks for 6-DoF Object Pose Estimation, [paper]
[arXiv] ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation, [paper]
[arXiv] PointPoseNet: Accurate Object Detection and 6 DoF Pose Estimation in Point Clouds, [paper]
[RSS] PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking, [paper]
[arXiv] Multi-View Matching Network for 6D Pose Estimation, [paper]
[arXiv] Fast 3D Pose Refinement with RGB Images, [paper]
[arXiv] MaskedFusion: Mask-based 6D Object Pose Detection, [paper]
[CoRL] Scene-level Pose Estimation for Multiple Instances of Densely Packed Objects, [paper]
[IROS] Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images, [paper]
[IROSW] Motion-Nets: 6D Tracking of Unknown Objects in Unseen Environments using RGB, [paper]
[ICCV] DPOD: 6D Pose Object Detector and Refiner, [paper]
[ICCV] CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation, [paper] [code]
[ICCV] Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation, [paper] [code]
[ICCV] Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data, [paper]
[arXiv] Active 6D Multi-Object Pose Estimation in Cluttered Scenarios with Deep Reinforcement Learning, [paper]
[arXiv] Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction, [paper]
[arXiv] Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images, [paper]
[ICHR] Refining 6D Object Pose Predictions using Abstract Render-and-Compare, [paper]
[arXiv] Deep-6dpose: recovering 6d object pose from a single rgb image, [paper]
[arXiv] Real-time Background-aware 3D Textureless Object Pose Estimation, [paper]
[IROS] SilhoNet: An RGB Method for 6D Object Pose Estimation, [paper]
2018:
[ECCV] AAE: Implicit 3D Orientation Learning for 6D Object Detection From RGB Images, [paper] [code]
[ECCV] DeepIM: Deep Iterative Matching for 6D Pose Estimation, [paper] [code]
[RSS] PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes, [paper] [code]
[IROS] Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks, [paper]
2012:
[ACCV] Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes, [paper]
1.1.3 Voting-based Methods
2019:
[CVPR] PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation, [paper] [code]
2017:
[TPAMI] Robust 3D Object Tracking from Monocular Images Using Stable Parts, [paper]
[Access] Fast Object Pose Estimation Using Adaptive Threshold for Bin-picking, [paper]
2014:
[ECCV] Learning 6d object pose estimation using 3d object coordinates, [paper]
[ECCV] Latent-class hough forests for 3d object detection and pose estimation, [paper]
Datasets
LineMOD: Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes, ACCV, 2012 [paper] [database]
YCB Datasets: The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research, IEEE International Conference on Advanced Robotics (ICAR), 2015 [paper]
T-LESS Datasets: T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects, IEEE Winter Conference on Applications of Computer Vision (WACV), 2017 [paper]
HomebrewedDB: RGB-D Dataset for 6D Pose Estimation of 3D Objects, ICCVW, 2019 [paper]
YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation, arXiv, 2020, [paper] [database]
2. Category-level Methods
2.1 Category-level 6D pose estimation
2021:
[IROS] BundleTrack: 6D Pose Tracking for Novel Objects without Instance or Category-Level 3D Models, [paper] [github]
2020:
[arXiv] CPS: Class-level 6D Pose and Shape Estimation From Monocular Images, [paper]
[arXiv] Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation, [paper]
2019:
[arXiv] Category-Level Articulated Object Pose Estimation, [paper]
[arXiv] LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation, [paper]
[arXiv] 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints, [paper] [code]
[arXiv] Self-Supervised 3D Keypoint Learning for Ego-motion Estimation, [paper]
[CVPR] Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation, [paper] [code]
[arXiv] Instance- and Category-level 6D Object Pose Estimation, [paper]
[arXiv] kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation, [paper]
2.2 3D shape reconstruction from images
2020:
[arXiv] Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion, [paper]
[arXiv] Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images, [paper]
[arXiv] 3D Shape Reconstruction from Free-Hand Sketches, [paper]
[arXiv] Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction, [paper]
[arXiv] Convolutional Generation of Textured 3D Meshes, [paper]
[arXiv] 3D Reconstruction of Novel Object Shapes from Single Images, [paper]
[arXiv] Novel Object Viewpoint Estimation through Reconstruction Alignment, [paper]
[arXiv] UCLID-Net: Single View Reconstruction in Object Space, [paper]
[arXiv] SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis, [paper]
[arXiv] FroDO: From Detections to 3D Objects, [paper]
[arXiv] CoReNet: Coherent 3D scene reconstruction from a single RGB image, [paper]
[arXiv] Reconstruct, Rasterize and Backprop: Dense shape and pose estimation from a single image, [paper]
[arXiv] Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes, [paper]
[arXiv] Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors, [paper]
[arXiv] Neural Object Descriptors for Multi-View Shape Reconstruction, [paper]
[arXiv] Leveraging 2D Data to Learn Textured 3D Mesh Generation, [paper]
[arXiv] Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images, [paper]
[arXiv] Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations, [paper]
[arXiv] Atlas: End-to-End 3D Scene Reconstruction from Posed Images, [paper]
[arXiv] Instant recovery of shape from spectrum via latent space connections, [paper]
[arXiv] Self-supervised Single-view 3D Reconstruction via Semantic Consistency, [paper]
[arXiv] Meta3D: Single-View 3D Object Reconstruction from Shape Priors in Memory, [paper]
[arXiv] STD-Net: Structure-preserving and Topology-adaptive Deformation Network for 3D Reconstruction from a Single Image, [paper]
[arXiv] Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data, [paper]
[arXiv] Deep NRSfM++: Towards 3D Reconstruction in the Wild, [paper]
[arXiv] Learning to Correct 3D Reconstructions from Multiple Views, [paper]
2019:
[arXiv] Boundary Cues for 3D Object Shape Recovery, [paper]
[arXiv] Learning to Generate Dense Point Clouds with Textures on Multiple Categories, [paper]
[arXiv] Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction, [paper]
[arXiv] Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision, [paper]
[arXiv] SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization, [paper]
[arXiv] 3D-GMNet: Learning to Estimate 3D Shape from A Single Image As A Gaussian Mixture, [paper]
[arXiv] Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction, [paper]
2.3 3D shape rendering
2020:
[arXiv] Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition, [paper]
[arXiv] SPSG: Self-Supervised Photometric Scene Generation from RGB-D Scans, [paper]
[arXiv] Differentiable Rendering: A Survey, [paper]
[arXiv] Equivariant Neural Rendering, [paper]
2019:
[arXiv] SynSin: End-to-end View Synthesis from a Single Image, [paper] [project]
[arXiv] Neural Point Cloud Rendering via Multi-Plane Projection, [paper]
[arXiv] Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool, [paper]