Vision-based Robotic Grasping: Papers and Codes
The essential information needed to grasp a target object is the 6D gripper pose in the camera coordinate frame, which comprises the 3D gripper position and the 3D gripper orientation used to execute the grasp. Vision-based robotic grasping methods estimate this pose differently depending on the grasp type, which can be categorized as 2D planar grasp or 6DoF grasp.
2D planar grasp means that the target object lies on a planar workspace and the grasp is constrained to one direction. The essential information is thereby reduced from 6D to 3D: the 2D in-plane position and a 1D rotation angle. Existing methods either evaluate grasp contact points or evaluate oriented grasp rectangles.
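As a minimal sketch of the oriented-rectangle representation, the following Python snippet stores a planar grasp as (x, y, theta) and expands it into the four corners of its rectangle. The names `PlanarGrasp`, `grasp_rectangle`, `opening`, and `jaw_size` are illustrative assumptions and are not taken from any paper or code base listed here.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarGrasp:
    """A 2D planar grasp: in-plane position plus one rotation angle."""
    x: float      # grasp center, x coordinate in the plane
    y: float      # grasp center, y coordinate in the plane
    theta: float  # in-plane gripper rotation, in radians


def grasp_rectangle(grasp, opening, jaw_size):
    """Return the four corners of the oriented rectangle for a planar grasp.

    `opening` is the rectangle side along the gripper closing direction and
    `jaw_size` is the side along the gripper plates (hypothetical names).
    """
    c, s = math.cos(grasp.theta), math.sin(grasp.theta)
    half_w, half_h = opening / 2.0, jaw_size / 2.0
    corners = []
    for dx, dy in [(-half_w, -half_h), (half_w, -half_h),
                   (half_w, half_h), (-half_w, half_h)]:
        # Rotate the axis-aligned offset by theta, then shift to the center.
        corners.append((grasp.x + c * dx - s * dy,
                        grasp.y + s * dx + c * dy))
    return corners
```

With `theta = 0` the rectangle is axis-aligned, and any other angle rotates the same rectangle about its center, so three numbers are enough to describe the grasp.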
6DoF grasp means that the gripper can grasp the object from various angles in 3D space, so the essential 6D gripper pose cannot be simplified. Depending on whether the grasp is computed on the single-view point cloud or on the complete shape, methods are categorized into those based on the partial point cloud and those based on the complete shape. Methods based on the partial point cloud either estimate candidate grasps or transfer grasps from an existing grasp database. Methods based on the complete shape rely either on 6D object pose estimation or on shape completion. Most current 6DoF grasp methods target known objects, for which the grasps can be precomputed manually or by simulation, so the problem reduces to 6D object pose estimation.
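To illustrate why grasping a known object reduces to 6D object pose estimation, the sketch below chains an estimated object pose with a grasp precomputed in the object frame. It assumes NumPy and 4x4 homogeneous transforms. The numeric values are made up, and `make_pose`, `rot_z`, and the `T_*` variable names are hypothetical.

```python
import numpy as np


def make_pose(rotation, translation):
    """Pack a 3x3 rotation matrix and a 3-vector into a 4x4 homogeneous pose."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(angle):
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Assumed output of a 6D object pose estimator: object pose in the camera frame.
T_cam_obj = make_pose(rot_z(np.pi / 2), [0.1, 0.0, 0.5])

# Assumed grasp precomputed offline, expressed in the object frame.
T_obj_grasp = make_pose(np.eye(3), [0.0, 0.05, 0.0])

# Chaining both transforms gives the 6D gripper pose in the camera frame.
T_cam_grasp = T_cam_obj @ T_obj_grasp

grasp_position = T_cam_grasp[:3, 3]      # 3D gripper position
grasp_orientation = T_cam_grasp[:3, :3]  # 3D gripper orientation (rotation matrix)
```

Only `T_cam_obj` changes at run time. The grasp in the object frame is fixed once it has been annotated or simulated, which is why the pose estimator carries the whole perception burden in this setting.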
In addition, most robotic grasping approaches first require the target object’s location in the input data. This involves three different stages: object localization without classification, object detection, and object instance segmentation. Object localization without classification only outputs the potential regions of the target objects without knowing their categories. Object detection provides bounding boxes of the target objects together with their categories. Object instance segmentation further provides pixel-level or point-level regions of the target objects together with their categories.
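The difference between the three stages is easiest to see in what each one outputs. The following Python sketch models those outputs as plain data classes. All class and field names (`Region`, `Detection`, `InstanceMask`, `box`, `pixels`) are assumptions for illustration and do not correspond to the API of any detector listed below.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Region:
    """Object localization without classification: a region, no category."""
    box: Box


@dataclass
class Detection:
    """Object detection: a bounding box together with its category."""
    box: Box
    category: str


@dataclass
class InstanceMask:
    """Instance segmentation: the pixels of one object plus its category."""
    pixels: List[Tuple[int, int]]  # (x, y) coordinates belonging to the object
    category: str

    def bounding_box(self) -> Box:
        """Recover the tightest box around the mask pixels."""
        xs = [x for x, _ in self.pixels]
        ys = [y for _, y in self.pixels]
        return (min(xs), min(ys), max(xs), max(ys))

    def to_detection(self) -> Detection:
        """Drop the per-pixel detail to obtain a detection."""
        return Detection(self.bounding_box(), self.category)
```

An instance mask can always be reduced to a detection, and a detection to a category-free region, but not the other way round, so each stage is strictly more informative than the previous one.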
I summarize all of the above kinds of methods in this repository, hoping to present the big picture to those working on vision-based robotic grasping. The table of contents is listed below.
Thanks to Hatim Wen for updating all links to the PDF files and writing a convenient program to download the papers.
How to use?
- Run `python download.py` to start your download process.
NOTE:
Before running the script, it is best to read the code and adapt it to your needs.
Specifically, set the value of `name` in `download.py` (line 18) to the md file you split, e.g. '6DoF Grasp.md'.
- Vision-based Robotic Grasping: Papers and Codes
- 0. Review Papers
- 1. Object Localization
- 2. Object Pose Estimation
- 3. 2D Planar Grasp
- 4. 6DoF Grasp
- 5. Task-oriented Methods
- 6. Dexterous Grippers
- 7. Data Generation
- 8. Multi-source
- 9. Motion Planning
- 10. Imitation Learning
- 11. Reinforcement Learning
- 12. Experts
0. Review Papers
[Foundations and Trends in Robotics] 2020-Semantics for Robotic Mapping, Perception and Interaction: A Survey, [paper]
[AIRE] 2020-Vision-based Robotic Grasp Detection From Object Localization, Object Pose Estimation To Grasp Estimation: A Review, [paper]
[arXiv] 2020-Affordances in Robotic Tasks - A Survey, [paper]
[arXiv] 2019-A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms, [paper]
[arXiv] 2018-The Limits and Potentials of Deep Learning for Robotics, [paper]
[MTI] 2018-Review of Deep Learning Methods in Robotic Grasp Detection, [paper]
[ToR] 2016-Data-Driven Grasp Synthesis - A Survey, [paper]
[RAS] 2012-An overview of 3D object grasp synthesis algorithms - A Survey, [paper]
1. Object Localization
1.1 Object Localization without Classification
1.1.1 2D-based Methods
a. Fitting 2D Shape Primitives
[BMVC] A buyer’s guide to conic fitting, [paper] [code]
[IJGIG] Algorithms for the reduction of the number of points required to represent a digitized line or its caricature, [paper] [code]
b. Saliency Detection
Survey papers:
[arXiv] 2020-RGB-D Salient Object Detection: A Survey, [paper] [project]
[arXiv] 2019-Salient object detection in the deep learning era: An in-depth survey, [paper]
[CVM] 2014-Salient object detection: A survey, [paper]
2020:
[ECCV] Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection, [paper]
[ECCV] Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection, [paper]
[ECCV] Cross-Modal Weighting Network for RGB-D Salient Object Detection, [paper]
[arXiv] Bilateral Attention Network for RGB-D Salient Object Detection, [paper]
[arXiv] Salient Object Detection Combining a Self-attention Module and a Feature Pyramid Network, [paper]
[arXiv] JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection, [paper]
[arXiv] UC-Net: Uncertainty Inspired RGB-D Saliency Detection via Conditional Variational Autoencoders, [paper]
[arXiv] Cross-layer Feature Pyramid Network for Salient Object Detection, [paper]
[arXiv] Depth Potentiality-Aware Gated Attention Network for RGB-D Salient Object Detection, [paper]
[arXiv] Weakly-Supervised Salient Object Detection via Scribble Annotations, [paper]
[arXiv] Highly Efficient Salient Object Detection with 100K Parameters, [paper]
[arXiv] Global Context-Aware Progressive Aggregation Network for Salient Object Detection, [paper]
[arXiv] Adaptive Graph Convolutional Network with Attention Graph Clustering for Co-saliency Detection, [paper]
2019:
[ICCV] Employing deep part-object relationships for salient object detection, [paper]
[ICME] Multi-scale capsule attention-based salient object detection with multi-crossed layer connections, [paper]
2018:
[CVPR] Picanet: Learning pixel-wise contextual attention for saliency detection, [paper]
[SPM] Advanced deep-learning techniques for salient and category-specific object detection: a survey, [paper]
2017:
[CVPR] Deeply supervised salient object detection with short connections, [paper]
[TOC] Video saliency detection using object proposals, [paper]
2016:
[CVPR] Unconstrained salient object detection via proposal subset optimization, [paper]
[CVPR] Deep hierarchical saliency network for salient object detection, [paper]
[TPAMI] Salient object detection via structured matrix decomposition, [paper]
[TIP] Correspondence driven saliency transfer, [paper]
2015:
[CVPR] Saliency detection by multi-context deep learning, [paper]
[TPAMI] Hierarchical image saliency detection on extended CSSD, [paper]
2014:
[CVPR] Saliency optimization from robust background detection, [paper]
[TPAMI] Global contrast based salient region detection, [paper]
2013:
[CVPR] Salient object detection: A discriminative regional feature integration approach, [paper]
[CVPR] Saliency detection via graph-based manifold ranking, [paper]
2012:
[ECCV] Geodesic saliency using background priors, [paper]
1.1.2 3D-based Methods
a. Fitting 3D Shape Primitives
Survey papers:
[CGF] 2019-A survey of simple geometric primitives detection methods for captured 3d data, [paper]
2021:
[CVPR] Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images, [paper]
2020:
[ECCV] CAD-Deform: Deformable Fitting of CAD Models to 3D Scans, [paper] [code]
[arXiv] Polylidar3D - Fast Polygon Extraction from 3D Data, [paper]
[ICRA] PrimiTect: Fast Continuous Hough Voting for Primitive Detection, [paper] [code]
[arXiv] ParSeNet: A Parametric Surface Fitting Network for 3D Point Clouds, [paper]
2015:
[CVPR] Separating objects and clutter in indoor scenes, [paper]
2013:
[CVPR] A linear approach to matching cuboids in rgbd images, [paper]
2012:
[GCR] Robustly segmenting cylindrical and box-like objects in cluttered scenes using depth cameras, [paper]
2009:
[IROS] Close-range scene segmentation and reconstruction of 3d point cloud maps for mobile manipulation in domestic environments, [paper]
2005:
[ISPRS] Efficient hough transform for automatic detection of cylinders in point clouds, [paper]
b. Saliency Detection
2020:
[ECCV] A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection, [paper]
[ECCV] RGB-D Salient Object Detection with Cross-Modality Modulation and Selection, [paper]
2019:
[PR] Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection, [paper]
[ICCV] Depth-Induced Multi-Scale Recurrent Attention Network for Saliency Detection, [paper]
[ICCV] Pointcloud saliency maps, [paper]
[arXiv] CNN-based RGB-D Salient Object Detection: Learn, Select and Fuse, [paper]
2018:
[CVPR] Progressively complementarity-aware fusion network for RGB-D salient object detection, [paper]
2017:
[TIP] RGBD salient object detection via deep fusion, [paper]
2015:
[CVPRW] Exploiting global priors for RGB-D saliency detection, [paper]
2014:
[ECCV] Rgbd salient object detection: a benchmark and algorithms, [paper]
2013:
[JSIP] Segmenting salient objects in 3d point clouds of indoor scenes using geodesic distances, [paper]
1.2 Object Detection
For detailed paper lists, refer to hoya012 or amusi.
1.2.1 2D Object Detection
Survey papers:
2021:
[TPAMI] Weakly Supervised Object Localization and Detection: A Survey, [paper]
2020:
[arXiv] Iterative Bounding Box Annotation for Object Detection, [paper]
[arXiv] Deep Domain Adaptive Object Detection: a Survey, [paper]
[IJCV] Deep Learning for Generic Object Detection: A Survey, [paper]
2019:
[arXiv] Object Detection in 20 Years: A Survey, [paper]
[arXiv] Object Detection with Deep Learning: A Review, [paper]
[arXiv] A Review of Object Detection Models based on Convolutional Neural Network, [paper]
[arXiv] A Review of methods for Textureless Object Recognition, [paper]
a. Two-stage methods
2020:
[ECCV] MimicDet: Bridging the Gap Between One-Stage and Two-Stage Object Detection, [paper]
[ECCV] Corner Proposal Network for Anchor-free, Two-stage Object Detection, [paper]
[arXiv] Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation, [paper]
[arXiv] Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection, [paper]
[arXiv] Scalable Active Learning for Object Detection, [paper]
[arXiv] Any-Shot Object Detection, [paper]
[arXiv] Frustratingly Simple Few-Shot Object Detection, [paper]
[arXiv] Rethinking the Route Towards Weakly Supervised Object Localization, [paper]
[arXiv] Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN, [paper]
[arXiv] Unsupervised Image-generation Enhanced Adaptation for Object Detection in Thermal images, [paper]
[arXiv] PCSGAN: Perceptual Cyclic-Synthesized Generative Adversarial Networks for Thermal and NIR to Visible Image Transformation, [paper]
[arXiv] SpotNet: Self-Attention Multi-Task Network for Object Detection, [paper]
[arXiv] Real-Time Object Detection and Recognition on Low-Compute Humanoid Robots using Deep Learning, [paper]
[arXiv] FedVision: An Online Visual Object Detection Platform Powered by Federated Learning, [paper]
2019:
[arXiv] Combining Deep Learning and Verification for Precise Object Instance Detection, [paper]
[arXiv] cmSalGAN: RGB-D Salient Object Detection with Cross-View Generative Adversarial Networks, [paper]
[arXiv] OpenLORIS-Object: A Dataset and Benchmark towards Lifelong Object Recognition, [paper] [project]
[IROS] Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition, [paper]
[IROS] Recurrent Convolutional Fusion for RGB-D Object Recognition, [paper] [code]
[ICCVW] An Annotation Saved is an Annotation Earned: Using Fully Synthetic Training for Object Detection, [paper]
2017:
[CVPR] FPN: Feature pyramid networks for object detection, [paper]
[arXiv] Light-Head R-CNN: In Defense of Two-Stage Object Detector, [paper] [code]
2016:
[NeurIPS] R-FCN: Object Detection via Region-based Fully Convolutional Networks, [paper] [code]
[TPAMI] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, [paper] [code]
[ECCV] Visual relationship detection with language priors, [paper]
2015:
[ICCV] Fast R-CNN, [paper] [code]
2014:
[ECCV] SPPNet: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, [paper] [code]
[CVPR] R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation, [paper] [code]
[CVPR] Scalable object detection using deep neural networks, [paper]
[arXiv] Scalable, high-quality object detection, [paper]
[ICLR] OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, [paper] [code]
b. Single-stage methods
2020:
[arXiv] OneNet: Towards End-to-End One-Stage Object Detection, [paper]
[arXiv] Scaled-YOLOv4: Scaling Cross Stage Partial Network, [paper]
[TPAMI] AP-Loss for Accurate One-Stage Object Detection, [paper]
[arXiv] Point-Set Anchors for Object Detection, Instance Segmentation and Pose Estimation, [paper]
[arXiv] AutoAssign: Differentiable Label Assignment for Dense Object Detection, [paper]
[arXiv] Localization Uncertainty Estimation for Anchor-Free Object Detection, [paper]
[arXiv] DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution, [paper] [code]
[arXiv] YOLOv4: Optimal Speed and Accuracy of Object Detection, [paper]
[arXiv] SaccadeNet: A Fast and Accurate Object Detector, [paper]
[arXiv] CentripetalNet: Pursuing High-quality Keypoint Pairs for Object Detection, [paper]
[arXiv] Real Time Detection of Small Objects, [paper]
[arXiv] OS2D: One-Stage One-Shot Object Detection by Matching Anchor Features, [paper]
2019:
[arXiv] CenterNet: Objects as Points, [paper]
[arXiv] CenterNet: Keypoint Triplets for Object Detection, [paper]
[ECCV] CornerNet: Detecting Objects as Paired Keypoints, [paper]
[arXiv] FCOS: Fully Convolutional One-Stage Object Detection, [paper]
[arXiv] Bottom-up Object Detection by Grouping Extreme and Center Points, [paper]
2018:
[arXiv] YOLOv3: An Incremental Improvement, [paper] [code]
2017:
[CVPR] YOLO9000: Better, Faster, Stronger, [paper] [code]
[ICCV] RetinaNet: Focal loss for dense object detection, [paper]
2016:
[CVPR] YOLO: You only look once: Unified, real-time object detection, [paper] [code]
[ECCV] SSD: Single Shot MultiBox Detector, [paper] [code]
1.2.2 3D Object Detection
These methods can be divided into three kinds: RGB-based methods, point cloud-based methods, and fusion methods that consume both images and point clouds.
a. RGB-based methods
2021:
[TPAMI] MonoGRNet: A General Framework for Monocular 3D Object Detection, [paper]
[arXiv] Exploring 2D Data Augmentation for 3D Monocular Object Detection, [paper]
[arXiv] FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection, [paper]
[arXiv] Geometry-aware data augmentation for monocular 3D object detection, [paper]
[arXiv] OCM3D: Object-Centric Monocular 3D Object Detection, [paper]
[arXiv] Gated3D: Monocular 3D Object Detection From Temporal Illumination Cues, [paper]
[arXiv] PLUME: Efficient 3D Object Detection from Stereo Images, [paper]
[arXiv] Ellipse Regression with Predicted Uncertainties for Accurate Multi-View 3D Object Estimation, [paper]
2020:
[arXiv] Demystifying Pseudo-LiDAR for Monocular 3D Object Detection, [paper]
[arXiv] 3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations, [paper]
[ECCV] Monocular Differentiable Rendering for Self-Supervised 3D Object Detection, [paper]
[ECCV] Reinforced Axial Refinement Network for Monocular 3D Object Detection, [paper]
[arXiv] Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training, [paper]
[arXiv] 1-Point RANSAC-Based Method for Ground Object Pose Estimation, [paper]
[IROS] Object-Aware Centroid Voting for Monocular 3D Object Detection, [paper]
[ECCV] Kinematic 3D Object Detection in Monocular Video, [paper]
[arXiv] MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time, [paper]
[arXiv] Expandable YOLO: 3D Object Detection from RGB-D Images, [paper]
[arXiv] Instant 3D Object Tracking with Applications in Augmented Reality, [paper]
[arXiv] Single-Shot 3D Detection of Vehicles from Monocular RGB Images via Geometry Constrained Keypoints in Real-Time, [paper]
[arXiv] CubifAE-3D: Monocular Camera Space Cubification on Autonomous Vehicles for Auto-Encoder based 3D Object Detection, [paper]
[arXiv] Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding, [paper]
[ICITS] Exploring the Capabilities and Limits of 3D Monocular Object Detection - A Study on Simulation and Real World Data, [paper]
[arXiv] Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation, [paper]
[arXiv] End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection, [paper]
[arXiv] Confidence Guided Stereo 3D Object Detection with Split Depth Estimation, [paper]
[arXiv] Monocular 3D Object Detection in Cylindrical Images from Fisheye Cameras, [paper]
[arXiv] ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection, [paper]
[arXiv] MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships, [paper]
[arXiv] Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image, [paper]
[arXiv] SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation, [paper]
[arXiv] siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection, [paper]
[AAAI] Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation, [paper]
[arXiv] SDOD: Real-time Segmenting and Detecting 3D Objects by Depth, [paper]
[arXiv] DSGN: Deep Stereo Geometry Network for 3D Object Detection, [paper]
[arXiv] RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving, [paper]
2019:
[NeurIPS] PerspectiveNet: 3D Object Detection from a Single RGB Image via Perspective Points, [paper]
[arXiv] Single-Stage Monocular 3D Object Detection with Virtual Cameras, [paper]
[arXiv] Environment reconstruction on depth images using Generative Adversarial Networks, [paper] [code]
[arXiv] Learning Depth-Guided Convolutions for Monocular 3D Object Detection, [paper]
[arXiv] RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving, [paper]
[IROS] Look Further to Recognize Better: Learning Shared Topics and Category-Specific Dictionaries for Open-Ended 3D Object Recognition, [paper]
[arXiv] Task-Aware Monocular Depth Estimation for 3D Object Detection, [paper]
[CVPR] Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, [paper] [code]
[AAAI] MonoGRNet: A Geometric Reasoning Network for 3D Object Localization, [paper] [code]
[ICCV] Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving, [paper]
[ICCV] M3D-RPN: Monocular 3D Region Proposal Network for Object Detection, [paper]
[ICCVW] Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud, [paper]
[arXiv] Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss, [paper]
[arXiv] Monocular 3D Object Detection via Geometric Reasoning on Keypoints, [paper]
b. Point cloud-based methods
Survey papers:
[arXiv] Deep Learning for 3D Point Cloud Understanding: A Survey, [paper]
[TPAMI] 2020-Deep Learning for 3D Point Clouds: A Survey, [paper]
2021:
[CVPR] 3D Spatial Recognition without Spatially Labeled 3D, [paper]
[CVPR] SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud, [paper]
[arXiv] Boundary-Aware 3D Object Detection from Point Clouds, [paper]
[CVPR] Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds, [paper]
[CVPR] HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection, [paper] [code]
[arXiv] Group-Free 3D Object Detection via Transformers, [paper] [code]
[arXiv] SparsePoint: Fully End-to-End Sparse 3D Object Detector, [paper]
[arXiv] Offboard 3D Object Detection from Point Cloud Sequences, [paper]
[CVPR] ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection, [paper] [code]
[CAD] labelCloud: A Lightweight Domain-Independent Labeling Tool for 3D Object Detection in Point Clouds, [paper] [code]
[arXiv] DPointNet: A Density-Oriented PointNet for 3D Object Detection in Point Clouds, [paper]
[arXiv] PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection, [paper]
[arXiv] Auto4D: Learning to Label 4D Objects from Sequential Point Clouds, [paper]
[AAAI] Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection, [paper]
[AAAI] PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection, [paper]
[AAAI] CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud, [paper]
2020:
[arXiv] 3D Object Detection with Pointformer, [paper]
[arXiv] It's All Around You: Range-Guided Cylindrical Network for 3D Object Detection, [paper]
[arXiv] 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection, [paper]
[3DV] PanoNet3D: Combining Semantic and Geometric Understanding for LiDAR Point Cloud Detection, [paper]
[3DV] SF-UDA3D: Source-Free Unsupervised Domain Adaptation for LiDAR-Based 3D Object Detection, [paper]
[ECCVW] Multi-Frame to Single-Frame: Knowledge Distillation for 3D Object Detection, [paper]
[ICRA] 3D Object Detection and Tracking Based on Streaming Data, [paper]
[arXiv] A Density-Aware PointRCNN for 3D Objection Detection in Point Clouds, [paper]
[arXiv] Dynamic Edge Weights in Graph Neural Networks for 3D Object Detection, [paper]
[arXiv] RangeRCNN: Towards Fast and Accurate 3D Object Detection with Range Image Representation, [paper]
[WACV] Cross-Modality 3D Object Detection, [paper]
[ECCVW] AB3DMOT: A Baseline for 3D Multi-Object Tracking and New Evaluation Metrics, [paper] [project]
[IROS] MLOD: Awareness of Extrinsic Perturbation in Multi-LiDAR 3D Object Detection for Autonomous Driving, [paper]
[IROS] Uncertainty-aware Self-supervised 3D Data Association, [paper]
[ECCVW] Deformable PV-RCNN: Improving 3D Object Detection with Learned Deformations, [paper]
[arXiv] An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds, [paper]
[arXiv] Part-Aware Data Augmentation for 3D Object Detection in Point Cloud, [paper]
[MM] Weakly Supervised 3D Object Detection from Point Clouds, [paper]
[ECCV] Weakly Supervised 3D Object Detection from Lidar Point Cloud, [paper] [code]
[ECCV] Pillar-based Object Detection for Autonomous Driving, [paper]
[arXiv] InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling, [paper]
[arXiv] CenterNet3D: An Anchor free Object Detector for Autonomous Driving, [paper]
[arXiv] Local Grid Rendering Networks for 3D Object Detection in Point Clouds, [paper]
[arXiv] 1st Place Solution for Waymo Open Dataset Challenge - 3D Detection and Domain Adaptation, [paper]
[arXiv] Optimisation of the PointPillars network for 3D object detection in point clouds, [paper]
[arXiv] AFDet: Anchor Free One Stage 3D Object Detection, [paper]
[arXiv] Generative Sparse Detection Networks for 3D Single-shot Object Detection, [paper]
[arXiv] Center-based 3D Object Detection and Tracking, [paper]
[arXiv] H3DNet: 3D Object Detection Using Hybrid Geometric Primitives, [paper]
[arXiv] Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection, [paper]
[arXiv] SVGA-Net: Sparse Voxel-Graph Attention Network for 3D Object Detection from Point Clouds, [paper]
[arXiv] Learning to Detect 3D Objects from Point Clouds in Real Time, [paper]
[arXiv] P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds, [paper]
[arXiv] Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection, [paper]
[arXiv] Drosophila-Inspired 3D Moving Object Detection Based on Point Clouds, [paper]
[arXiv] Streaming Object Detection for 3-D Point Clouds, [paper]
[arXiv] SS3D: Single Shot 3D Object Detector, [paper]
[arXiv] MLCVNet: Multi-Level Context VoteNet for 3D Object Detection, [paper]
[arXiv] 3D IoU-Net: IoU Guided 3D Object Detector for Point Clouds, [paper]
[arXiv] Finding Your (3D) Center: 3D Object Detection Using a Learned Loss, [paper]
[arXiv] LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention, [paper]
[arXiv] Quantifying Data Augmentation for LiDAR based 3D Object Detection, [paper]
[arXiv] DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes, [paper]
[arXiv] Improving 3D Object Detection through Progressive Population Based Augmentation, [paper]
[arXiv] Boundary-Aware Dense Feature Indicator for Single-Stage 3D Object Detection from Point Clouds, [paper]
[arXiv] Physically Realizable Adversarial Examples for LiDAR Object Detection, [paper]
[arXiv] Real-time 3D object proposal generation and classification under limited processing resources, [paper]
[arXiv] 3D Object Detection From LiDAR Data Using Distance Dependent Feature Extraction, [paper]
[arXiv] HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection, [paper]
[arXiv] Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud, [paper]
[arXiv] PointTrackNet: An End-to-End Network for 3-D Object Detection and Tracking from Point Clouds, [paper]
[arXiv] 3DSSD: Point-based 3D Single Stage Object Detector, [paper]
[arXiv] SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud, [paper]
[arXiv] Investigating the Importance of Shape Features, Color Constancy, Color Spaces and Similarity Measures in Open-Ended 3D Object Recognition, [paper]
[arXiv] Probabilistic 3D Multi-Object Tracking for Autonomous Driving, [paper]
[AAAI] TANet: Robust 3D Object Detection from Point Clouds with Triple Attention, [paper]
2019:
[arXiv] Class-balanced grouping and sampling for point cloud 3d object detection, [paper] [code]
[arXiv] SESS: Self-Ensembling Semi-Supervised 3D Object Detection, [paper]
[arXiv] Deep SCNN-based Real-time Object Detection for Self-driving Vehicles Using LiDAR Temporal Data, [paper]
[arXiv] Pillar in Pillar: Multi-Scale and Dynamic Feature Extraction for 3D Object Detection in Point Clouds, [paper]
[arXiv] What You See is What You Get: Exploiting Visibility for 3D Object Detection, [paper]
[NeurIPSW] Patch Refinement -- Localized 3D Object Detection, [paper]
[CoRL] End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds, [paper]
[ICCV] Deep Hough Voting for 3D Object Detection in Point Clouds, [paper] [code]
[arXiv] Part-A2 Net: 3D Part-Aware and Aggregation Neural Network for Object Detection from Point Cloud, [paper]
[ICCV] STD: Sparse-to-Dense 3D Object Detector for Point Cloud, [paper]
[CVPR] PointPillars: Fast Encoders for Object Detection from Point Clouds, [paper]
[arXiv] StarNet: Targeted Computation for Object Detection in Point Clouds, [paper]
2018:
[CVPR] PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud, [paper] [code]
[CVPR] PIXOR: Real-time 3D Object Detection from Point Clouds, [paper] [code]
[CVPR] VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection, [paper] [code]
[ECCVW] Complex-YOLO: Real-time 3D Object Detection on Point Clouds, [paper] [code]
[ECCVW] YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud, [paper]
2015:
[IROS] VoxNet: A 3D Convolutional Neural Network for real-time object recognition, [paper] [code] [project]
c. Fusion methods
These methods use both RGB images and depth images or point clouds. They include early fusion methods, late fusion methods, and dense fusion methods.
2021:
[arXiv] VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction, [paper]
[arXiv] Self-Attention Based Context-Aware 3D Object Detection, [paper]
2020:
[arXiv] Multi-View Adaptive Fusion Network for 3D Object Detection, [paper]
[arXiv] CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection, [paper]
[ECCV] EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection, [paper]
[arXiv] Stereo RGB and Deeper LIDAR Based Network for 3D Object Detection, [paper]
[arXiv] PnPNet: End-to-End Perception and Prediction with Tracking in the Loop, [paper]
[arXiv] 3D Object Detection Method Based on YOLO and K-Means for Image and Point Clouds, [paper]
[arXiv] 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection, [paper]
[arXiv] ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes, [paper]
[arXiv] JRMOT: A Real-Time 3D Multi-Object Tracker and a New Large-Scale Dataset, [paper]
[AAAI] PI-RCNN: An Efficient Multi-sensor 3D Object Detector with Point-based Attentive Cont-conv Fusion Module, [paper]
2019:
[arXiv] PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection, [paper]
[arXiv] Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots, [paper]
[arXiv] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language, [paper]
[arXiv] Relation Graph Network for 3D Object Detection in Point Clouds, [paper]
[arXiv] PointPainting: Sequential Fusion for 3D Object Detection, [paper]
[ICCV] Transferable Semi-Supervised 3D Object Detection From RGB-D Data, [paper]
[arXiv] Adaptive and Azimuth-Aware Fusion Network of Multimodal Local Features for 3D Object Detection, [paper]
[arXiv] Frustum VoxNet for 3D object detection from RGB-D or Depth images, [paper]
[IROS] Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection, [paper]
[CVPR] Multi-Task Multi-Sensor Fusion for 3D Object Detection, [paper]
2018:
[CVPR] Frustum PointNets for 3D Object Detection from RGB-D Data, [paper] [code]
[ECCV] Deep Continuous Fusion for Multi-Sensor 3D Object Detection, [paper]
[IROS] Joint 3D Proposal Generation and Object Detection from View Aggregation, [paper] [code]
[CVPR] PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation, [paper]
[ICRA] A General Pipeline for 3D Detection of Vehicles, [paper]
2017:
[CVPR] Multi-View 3D Object Detection Network for Autonomous Driving, [paper] [code]
[CVPR] Amodal Detection of 3D Objects: Inferring 3D Bounding Boxes From 2D Ones in RGB-Depth Images, [paper] [code]
[ICCV] 2D-Driven 3D Object Detection in RGB-D Images, [paper]
2016:
[CVPR] Deep sliding shapes for amodal 3d object detection in rgb-d images, [paper]
2014:
[ECCV] Learning Rich Features from RGB-D Images for Object Detection and Segmentation, [paper]
1.3 Object Instance Segmentation
1.3.1 2D Instance Segmentation
a. Survey papers
2020:
[arXiv] A Survey on Instance Segmentation: State of the art, [paper]
[arXiv] Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey, [paper]
[arXiv] Image Segmentation Using Deep Learning: A Survey, [paper]
b. Two-stage methods
2021:
[CVPR] Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation, [paper]
2020:
[arXiv] Visual Identification of Articulated Object Parts, [paper]
[arXiv] DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation, [paper]
[MM] Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation, [paper]
[arXiv] Mask Point R-CNN, [paper]
[ECCV] Commonality-Parsing Network across Shape and Appearance for Partially Supervised Instance Segmentation, [paper]
[arXiv] Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation, [paper]
[ECCV] LevelSet R-CNN: A Deep Variational Method for Instance Segmentation, [paper]
[ECCV] Boundary-preserving Mask R-CNN, [paper] [code]
[arXiv] A novel Region of Interest Extraction Layer for Instance Segmentation, [paper]
[arXiv] 1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation, [paper]
[arXiv] Fully Convolutional Networks for Automatically Generating Image Masks to Train Mask R-CNN, [paper]
[arXiv] FGN: Fully Guided Network for Few-Shot Instance Segmentation, [paper]
[arXiv] PointRend: Image Segmentation as Rendering, [paper]
2019:
[CVPR] HTC: Hybrid task cascade for instance segmentation, [paper]
2018:
[CVPR] PANet: Path aggregation network for instance segmentation, [paper]
[CVPR] Masklab: Instance segmentation by refining object detection with semantic and direction features, [paper]
2017:
[ICCV] Mask r-cnn, [paper] [code]
2016:
[CVPR] Instance-aware semantic segmentation via multi-task network cascades, [paper]
2014:
[ECCV] Simultaneous detection and segmentation, [paper]
c. One-stage methods
2021:
[arXiv] INSTA-YOLO: Real-Time Instance Segmentation, [paper]
2020:
[arXiv] YolactEdge: Real-time Instance Segmentation on the Edge, [paper]
[ECCV] SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation, [paper] [code]
[arXiv] Poly-YOLO: Higher Speed, More Precise Detection and Instance Segmentation for YOLOv3, [paper]
[CVPR] CenterMask: single shot instance segmentation with point representation, [paper]
[arXiv] BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation, [paper]
[arXiv] SOLOv2: Dynamic, Faster and Stronger, [paper] [code]
[arXiv] Mask Encoding for Single Shot Instance Segmentation, [paper]
[arXiv] Deep Affinity Net: Instance Segmentation via Affinity, [paper]
[arXiv] PointINS: Point-based Instance Segmentation, [paper]
[arXiv] Conditional Convolutions for Instance Segmentation, [paper]
[arXiv] Real-time Semantic Background Subtraction, [paper]
[arXiv] FourierNet: Compact mask representation for instance segmentation using differentiable shape decoders, [paper]
2019:
[arXiv] CenterMask: Real-Time Anchor-Free Instance Segmentation, [paper] [code]
[arXiv] SAIS: Single-stage Anchor-free Instance Segmentation, [paper]
[arXiv] YOLACT++: Better Real-time Instance Segmentation, [paper] [code]
[ICCV] YOLACT: Real-time Instance Segmentation, [paper] [code]
[ICCV] TensorMask: A Foundation for Dense Object Segmentation, [paper] [code]
[CASE] Deep Workpiece Region Segmentation for Bin Picking, [paper]
2018:
[CVPR] PANet: Path Aggregation Network for Instance Segmentation, [paper] [code]
[CVPR] MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features, [paper]
2017:
[CVPR] Fully Convolutional Instance-aware Semantic Segmentation, [paper]
2016:
[ECCV] SharpMask: Learning to Refine Object Segments, [paper] [code]
[BMVC] MultiPathNet: A MultiPath Network for Object Detection, [paper] [code]
[CVPR] MNC: Instance-aware Semantic Segmentation via Multi-task Network Cascades, [paper]
2015:
[NeurIPS] DeepMask: Learning to Segment Object Candidates, [paper] [code]
[CVPR] Hypercolumns for Object Segmentation and Fine-grained Localization, [paper]
2014:
[ECCV] SDS: Simultaneous Detection and Segmentation, [paper]
Applications in Robotics:
2021:
[arXiv] Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots, [paper]
2020:
[arXiv] Self-Supervised Object-in-Gripper Segmentation from Robotic Motions, [paper]
[arXiv] Segmenting unseen industrial components in a heavy clutter using rgb-d fusion and synthetic data, [paper]
[arXiv] Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects, [paper]
[arXiv] Joint Learning of Instance and Semantic Segmentation for Robotic Pick-and-Place with Heavy Occlusions in Clutter, [paper]
d. Panoptic segmentation
2020:
[arXiv] BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation, [paper]
[arXiv] Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation, [paper]
[arXiv] Towards Bounding-Box Free Panoptic Segmentation, [paper]
2019:
[CVPR] An End-to-End Network for Panoptic Segmentation, [paper]
[CVPR] Panoptic Segmentation, [paper]
[CVPR] Panoptic Feature Pyramid Networks, [paper]
[CVPR] UPSNet: A Unified Panoptic Segmentation Network, [paper]
[IV] Single Network Panoptic Segmentation for Street Scene Understanding, [paper] [code]
[ITSC] Multi-task Network for Panoptic Segmentation in Automated Driving, [paper]
1.3.2 3D Instance Segmentation
a. Two-stage methods
2021:
[arXiv] Deep Learning based 3D Segmentation: A Survey, [paper]
[arXiv] EfficientLPS: Efficient LiDAR Panoptic Segmentation, [paper]
2020:
[arXiv] FPCC-Net: Fast Point Cloud Clustering for Instance Segmentation, [paper]
[arXiv] Exploring Data-Efficient 3D Scene Understanding with Contrastive Scene Contexts, [paper]
[arXiv] DyCo3D: Robust Instance Segmentation of 3D Point Clouds through Dynamic Convolution, [paper]
[arXiv] Self-Supervised Learning of Part Mobility from Point Cloud Sequence, [paper]
[arXiv] Learning Gaussian Instance Segmentation in Point Clouds, [paper]
[arXiv] Spatial Semantic Embedding Network: Fast 3D Instance Segmentation with Deep Metric Learning, [paper]
[arXiv] PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation, [paper]
[arXiv] 3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation, [paper]
[arXiv] OccuSeg: Occupancy-aware 3D Instance Segmentation, [paper]
[arXiv] Learning to Segment 3D Point Clouds in 2D Image Space, [paper]
[arXiv] Bi-Directional Attention for Joint Instance and Semantic Segmentation in Point Clouds, [paper]
[arXiv] 3DCFS: Fast and Robust Joint 3D Semantic-Instance Segmentation via Coupled Feature Selection, [paper]
[RAL] From Planes to Corners: Multi-Purpose Primitive Detection in Unorganized 3D Point Clouds, [paper]
[arXiv] Learning and Memorizing Representative Prototypes for 3D Point Cloud Semantic and Instance Segmentation, [paper]
[WACV] FuseSeg: LiDAR Point Cloud Segmentation Fusing Multi-Modal Data, [paper]
2019:
[arXiv] Point2Node: Correlation Learning of Dynamic-Node for Point Cloud Feature Modeling, [paper]
[arXiv] LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices, [paper]
[arXiv] Learning to Optimally Segment Point Clouds, [paper]
[arXiv] Point Cloud Instance Segmentation using Probabilistic Embeddings, [paper]
[NeurIPS] Exploiting Local and Global Structure for Point Cloud Semantic Segmentation with Contextual Point Representations, [paper]
[arXiv] Addressing the Sim2Real Gap in Robotic 3D Object Classification, [paper]
[IROS] LDLS: 3-D Object Segmentation Through Label Diffusion From 2-D Images, [paper]
[CoRL] The Best of Both Modes: Separately Leveraging RGB and Depth for Unseen Object Instance Segmentation, [paper] [code]
[arXiv] GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud, [paper]
b. One-stage Methods
2020:
[arXiv] SegGroup: Seg-Level Supervision for 3D Instance and Semantic Segmentation, [paper]
[ECCV] Self-Prediction for Joint Instance and Semantic Segmentation of Point Clouds, [paper]
[IET] SASO: Joint 3D Semantic-Instance Segmentation via Multi-scale Semantic Association and Salient Point Clustering Optimization, [paper]
[AAAI] JSNet: Joint Instance and Semantic Segmentation of 3D Point Clouds, [paper] [code]
[ICRA] LiDARSeg: Instance segmentation of lidar point clouds, [paper]
2019:
[NeurIPS] 3D-BoNet: Learning Object Bounding Boxes for 3D Instance Segmentation on Point Clouds, [paper] [code]
[arXiv] MASC: multi-scale affinity with sparse convolution for 3d instance segmentation, [paper]
[CVPR] ASIS: Associatively segmenting instances and semantics in point clouds, [paper]
[CVPR] SGPN: Similarity Group Proposal Network for 3D Point Cloud Instance Segmentation, [paper]
[CVPR] JSIS3D: joint semantic-instance segmentation of 3d point clouds with multi-task pointwise networks and multi-value conditional random fields, [paper]
c. 3D deep learning networks
2021:
[CVPR] PointGuard: Provably Robust 3D Point Cloud Classification, [paper]
[arXiv] Attention Models for Point Clouds in Deep Learning: A Survey, [paper]
[arXiv] Regularization Strategy for Point Cloud via Rigidly Mixed Sample, [paper]
[arXiv] The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions, [paper]
[arXiv] Self-Supervised Pretraining of 3D Features on any Point-Cloud, [paper]
2020:
[arXiv] P4Contrast: Contrastive Learning with Pairs of Point-Pixel Pairs for RGB-D Scene Understanding, [paper]
[arXiv] Hausdorff Point Convolution with Geometric Priors, [paper]
[arXiv] PCT: Point Cloud Transformer, [paper]
[arXiv] Point Transformer, [paper]
[arXiv] One Point is All You Need: Directional Attention Point for Feature Learning, [paper]
[arXiv] Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis, [paper]
[arXiv] MARNet: Multi-Abstraction Refinement Network for 3D Point Cloud Analysis, [paper]
[arXiv] Point Transformer, [paper]
[NeurIPS] Rotation-Invariant Local-to-Global Representation Learning for 3D Point Cloud, [paper]
[3DV] RANP: Resource Aware Neuron Pruning at Initialization for 3D CNNs, [paper]
[arXiv] Spatial Transformer Point Convolution, [paper]
[BMVC] Neighbourhood-Insensitive Point Cloud Normal Estimation Network, [paper] [code]
[arXiv] LC-NAS: Latency Constrained Neural Architecture Search for Point Cloud Networks, [paper]
[arXiv] Global Context Aware Convolutions for 3D Point Cloud Understanding, [paper]
[arXiv] Self-Supervised Learning of Point Clouds via Orientation Estimation, [paper]
[arXiv] Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution, [paper]
[arXiv] Unsupervised 3D Learning for Shape Analysis via Multiresolution Instance Discrimination, [paper]
[arXiv] Rethinking PointNet Embedding for Faster and Compact Model, [paper]
[arXiv] PointMask: Towards Interpretable and Bias-Resilient Point Cloud Processing, [paper]
[arXiv] A Closer Look at Local Aggregation Operators in Point Cloud Analysis, [paper]
[arXiv] PAI-Conv: Permutable Anisotropic Convolutional Networks for Learning on Point Clouds, [paper]
[arXiv] Shape-Oriented Convolution Neural Network for Point Cloud Analysis, [paper]
[arXiv] LightConvPoint: convolution for points, [paper]
[arXiv] Review: deep learning on 3D point clouds, [paper]
[arXiv] Improving Semantic Analysis on Point Clouds via Auxiliary Supervision of Local Geometric Priors, [paper]
2019:
[arXiv] Quaternion Equivariant Capsule Networks for 3D Point Clouds, [paper]
[arXiv] Geometry Sharing Network for 3D Point Cloud Classification and Segmentation, [paper]
[arXiv] Geometric Capsule Autoencoders for 3D Point Clouds, [paper]
[arXiv] Utility Analysis of Network Architectures for 3D Point Cloud Processing, [paper]
[arXiv] Kaolin: A PyTorch Library for Accelerating 3D Deep Learning Research, [paper] [code]
[ICCV] DensePoint: Learning Densely Contextual Representation for Efficient Point Cloud Processing, [paper] [code]
[TOG] Dynamic Graph CNN for Learning on Point Clouds, [paper] [code]
[ICCV] DeepGCNs: Can GCNs Go as Deep as CNNs?, [paper] [code]
[ICCV] KPConv: Flexible and Deformable Convolution for Point Clouds, [paper] [code]
[MM] SRINet: Learning Strictly Rotation-Invariant Representations for Point Cloud Classification and Segmentation, [paper]
[CVPR] PointConv: Deep Convolutional Networks on 3D Point Clouds, [paper] [code]
[CVPR] PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing, [paper] [code]
[CVPR] Modeling Local Geometric Structure of 3D Point Clouds using Geo-CNN, [paper] [code]
[CVPR] A-CNN: Annularly Convolutional Neural Networks on Point Clouds, [paper] [code]
[arXiv] SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing, [paper]
[arXiv] PyramNet: Point Cloud Pyramid Attention Network and Graph Embedding Module for Classification and Segmentation, [paper]
[ICCV] Interpolated Convolutional Networks for 3D Point Cloud Understanding, [paper]
[arXiv] A survey on Deep Learning Advances on Different 3D Data Representations, [paper]
2018:
[TOG] MCCNN: Monte Carlo Convolution for Learning on Non-Uniformly Sampled Point Clouds, [paper] [code]
[NeurIPS] PointCNN: Convolution On X-Transformed Points, [paper] [code]
[CVPR] Mining Point Cloud Local Structures by Kernel Correlation and Graph Pooling, [paper] [code]
[CVPR] SO-Net: Self-Organizing Network for Point Cloud Analysis, [paper] [code]
[CVPR] SPLATNet: Sparse Lattice Networks for Point Cloud Processing, [paper] [code]
[CVPR] Local Spectral Graph Convolution for Point Set Feature Learning, [paper] [code]
[arXiv] Point Convolutional Neural Networks by Extension Operators, [paper]
2017:
[ICCV] Escape from Cells: Deep Kd-Networks for the Recognition of 3D Point Cloud Models, [paper] [code]
[CVPR] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation, [paper] [code]
[NeurIPS] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, [paper] [code]
[CVPR] SyncSpecCNN: Synchronized Spectral CNN for 3D Shape Segmentation, [paper]
2. Object Pose Estimation
This part mainly discusses 6D object pose estimation methods, which can be categorized into RGB-D image-based methods and point cloud-based methods. RGB-D image-based methods mainly utilize the 2D RGB image and the 2.5D depth image. Point cloud-based methods rely on registration.
2.1 RGB-D Image-based Methods
Survey papers:
2020:
[EGW] SHREC 2020 track: 6D Object Pose Estimation, [paper]
[ECCVW] BOP Challenge 2020 on 6D Object Localization, [paper]
[arXiv] A Survey on Deep Learning for Localization and Mapping: Towards the Age of Spatial Machine Intelligence, [paper]
[arXiv] Recent Advances in 3D Object and Hand Pose Estimation, [paper]
[arXiv] A Review on Object Pose Recovery: from 3D Bounding Box Detectors to Full 6D Pose Estimators, [paper]
2016:
[ECCVW] A Summary of the 4th International Workshop on Recovering 6D Object Pose, [paper]
2.1.1 Correspondence-based Methods
a. Match 2D feature points
2021:
[arXiv] P2-Net: Joint Description and Detection of Local Features for Pixel and Point Matching, [paper]
2020:
[arXiv] A Method to Generate High Precision Mesh Model and RGB-D Dataset for 6D Pose Estimation Task, [paper]
[MM] LodoNet: A Deep Neural Network with 2D Keypoint Matching for 3D LiDAR Odometry Estimation, [paper]
[ECCV] Solving the Blind Perspective-n-Point Problem End-To-End With Robust Differentiable Geometric Optimization, [paper]
[arXiv] Delta Descriptors: Change-Based Place Representation for Robust Visual Localization, [paper]
[arXiv] Unconstrained Matching of 2D and 3D Descriptors for 6-DOF Pose Estimation, [paper]
[arXiv] S2DNet: Learning Accurate Correspondences for Sparse-to-Dense Feature Matching, [paper]
[arXiv] SK-Net: Deep Learning on Point Cloud via End-to-end Discovery of Spatial Keypoints, [paper]
[arXiv] LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts, [paper]
[arXiv] Table-Top Scene Analysis Using Knowledge-Supervised MCMC, [paper]
[arXiv] AprilTags 3D: Dynamic Fiducial Markers for Robust Pose Estimation in Highly Reflective Environments and Indirect Communication in Swarm Robotics, [paper]
[AAAI] LCD: Learned Cross-Domain Descriptors for 2D-3D Matching, [paper] [project]
2019:
[ICCV] GLAMpoints: Greedily Learned Accurate Match points, [paper]
2018:
[TPAMI] Re-weighting and 1-Point RANSAC-Based PnP Solution to Handle Outliers, [paper] [code]
2016:
[ECCV] LIFT: Learned Invariant Feature Transform, [paper]
2012:
[3DIMPVT] 3D Object Detection and Localization using Multimodal Point Pair Features, [paper]
b. Regress 2D projections
2021:
[CVPR] DSC-PoseNet: Learning 6DoF Object Pose Estimation via Dual-scale Consistency, [paper]
2020:
[arXiv] Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild with Pose Annotations, [paper] [project]
[arXiv] PyraPose: Feature Pyramids for Fast and Accurate Object Pose Estimation under Domain Shift, [paper]
[arXiv] REDE: End-to-end Object 6D Pose Robust Estimation Using Differentiable Outliers Elimination, [paper]
[arXiv] 3D Object Detection and Pose Estimation of Unseen Objects in Color Images with Local Surface Embeddings, [paper]
[arXiv] Robust RGB-based 6-DoF Pose Estimation without Real Pose Annotations, [paper]
[arXiv] PrimA6D: Rotational Primitive Reconstruction for Enhanced and Robust 6D Pose Estimation, [paper]
[arXiv] EPOS: Estimating 6D Pose of Objects with Symmetries, [paper]
[arXiv] Tackling Two Challenges of 6D Object Pose Estimation: Lack of Real Annotated RGB Images and Scalability to Number of Objects, [paper]
[arXiv] Squeezed Deep 6DoF Object Detection using Knowledge Distillation, [paper]
[arXiv] Learning 2D–3D Correspondences To Solve The Blind Perspective-n-Point Problem, [paper]
[arXiv] PnP-Net: A hybrid Perspective-n-Point Network, [paper]
[arXiv] Object 6D Pose Estimation with Non-local Attention, [paper]
[arXiv] 6DoF Object Pose Estimation via Differentiable Proxy Voting Loss, [paper]
2019:
[arXiv] DPOD: 6D Pose Object Detector and Refiner, [paper]
[CVPR] Segmentation-driven 6D Object Pose Estimation, [paper] [code]
[arXiv] Single-Stage 6D Object Pose Estimation, [paper]
[arXiv] W-PoseNet: Dense Correspondence Regularized Pixel Pair Pose Regression, [paper]
[arXiv] KeyPose: Multi-view 3D Labeling and Keypoint Estimation for Transparent Objects, [paper]
2018:
[CVPR] Real-time seamless single shot 6d object pose prediction, [paper] [code]
[arXiv] Estimating 6D Pose From Localizing Designated Surface Keypoints, [paper]
2017:
[ICCV] BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3d poses of challenging objects without using depth, [paper]
[ICCV] SSD-6D: Making rgb-based 3d detection and 6d pose estimation great again, [paper] [code]
[ICRA] 6-DoF Object Pose from Semantic Keypoints, [paper]
2.1.2 Template-based Methods
These methods can be regarded as regression-based methods.
2021:
[ICRA] Investigations on Output Parameterizations of Neural Networks for Single Shot 6D Object Pose Estimation, [paper]
[arXiv] RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation, [paper]
[ICRA] CloudAAE: Learning 6D Object Pose Regression with On-line Data Synthesis on Point Clouds, [paper]
[arXiv] GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation, [paper]
[arXiv] StablePose: Learning 6D Object Poses from Geometrically Stable Patches, [paper]
[arXiv] Spatial Attention Improves Iterative 6D Object Pose Estimation, [paper]
2020:
[CVPR] PFRL: Pose-Free Reinforcement Learning for 6D Pose Estimation, [paper]
[arXiv] iNeRF: Inverting Neural Radiance Fields for Pose Estimation, [paper]
[NeurIPSW] End-to-End Differentiable 6DoF Object Pose Estimation with Local and Global Constraints, [paper]
[arXiv] Bridging the Performance Gap Between Pose Estimation Networks Trained on Real And Synthetic Data Using Domain Randomization, [paper]
[arXiv] EfficientPose: An efficient, accurate and scalable end-to-end 6D multi object pose estimation approach, [paper]
[arXiv] Pose Estimation of Specular and Symmetrical Objects, [paper]
[arXiv] I Like to Move It: 6D Pose Estimation as an Action Decision Process, [paper]
[IROS] Indirect Object-to-Robot Pose Estimation from an External Monocular RGB Camera, [paper]
[ECCV] CosyPose: Consistent multi-view multi-object 6D pose estimation, [paper]
[arXiv] PAM: Point-wise Attention Module for 6D Object Pose Estimation, [paper]
[IROS] PERCH 2.0: Fast and Accurate GPU-based Perception via Search for Object Pose Estimation, [paper] [code]
[IROS] Robust Ego and Object 6-DoF Motion Estimation and Tracking, [paper]
[IROS] se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains, [paper]
[arXiv] Learning Orientation Distributions for Object Pose Estimation, [paper]
[arXiv] A survey on deep supervised hashing methods for image retrieval, [paper]
[arXiv] Neural Object Learning for 6D Pose Estimation Using a Few Cluttered Images, [paper]
[arXiv] How to track your dragon: A Multi-Attentional Framework for real-time RGB-D 6-DOF Object Pose Tracking, [paper]
[arXiv] Self6D: Self-Supervised Monocular 6D Object Pose Estimation, [paper]
[arXiv] A Novel Pose Proposal Network and Refinement Pipeline for Better Object Pose Estimation, [paper]
[arXiv] G2L-Net: Global to Local Network for Real-time 6D Pose Estimation with Embedding Vector Features, [paper] [code]
[arXiv] Neural Mesh Refiner for 6-DoF Pose Estimation, [paper]
[arXiv] MobilePose: Real-Time Pose Estimation for Unseen Objects with Weak Shape Supervision, [paper]
[arXiv] Robust 6D Object Pose Estimation by Learning RGB-D Features, [paper]
[arXiv] HybridPose: 6D Object Pose Estimation under Hybrid Representations, [paper] [code]
2019:
[arXiv] P2GNet: Pose-Guided Point Cloud Generating Networks for 6-DoF Object Pose Estimation, [paper]
[arXiv] ConvPoseCNN: Dense Convolutional 6D Object Pose Estimation, [paper]
[arXiv] PointPoseNet: Accurate Object Detection and 6 DoF Pose Estimation in Point Clouds, [paper]
[RSS] PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking, [paper]
[arXiv] Multi-View Matching Network for 6D Pose Estimation, [paper]
[arXiv] Fast 3D Pose Refinement with RGB Images, [paper]
[arXiv] MaskedFusion: Mask-based 6D Object Pose Detection, [paper]
[CoRL] Scene-level Pose Estimation for Multiple Instances of Densely Packed Objects, [paper]
[IROS] Learning to Estimate Pose and Shape of Hand-Held Objects from RGB Images, [paper]
[IROSW] Motion-Nets: 6D Tracking of Unknown Objects in Unseen Environments using RGB, [paper]
[ICCV] DPOD: 6D Pose Object Detector and Refiner, [paper]
[ICCV] CDPN: Coordinates-Based Disentangled Pose Network for Real-Time RGB-Based 6-DoF Object Pose Estimation, [paper] [code]
[ICCV] Pix2Pose: Pixel-Wise Coordinate Regression of Objects for 6D Pose Estimation, [paper] [code]
[ICCV] Explaining the Ambiguity of Object Detection and 6D Pose From Visual Data, [paper]
[arXiv] Active 6D Multi-Object Pose Estimation in Cluttered Scenarios with Deep Reinforcement Learning, [paper]
[arXiv] Accurate 6D Object Pose Estimation by Pose Conditioned Mesh Reconstruction, [paper]
[arXiv] Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images, [paper]
[ICHR] Refining 6D Object Pose Predictions using Abstract Render-and-Compare, [paper]
[arXiv] Deep-6dpose: recovering 6d object pose from a single rgb image, [paper]
[arXiv] Real-time Background-aware 3D Textureless Object Pose Estimation, [paper]
[IROS] SilhoNet: An RGB Method for 6D Object Pose Estimation, [paper]
2018:
[ECCV] AAE: Implicit 3D Orientation Learning for 6D Object Detection From RGB Images, [paper] [code]
[ECCV] DeepIM: Deep Iterative Matching for 6D Pose Estimation, [paper] [code]
[RSS] Posecnn: A convolutional neural network for 6d object pose estimation in cluttered scenes, [paper] [code]
[IROS] Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks, [paper]
2012:
[ACCV] Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes, [paper]
2.1.3 Voting-based Methods
2021:
[arXiv] Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting, [paper]
2020:
[arXiv] A Hybrid Approach for 6DoF Pose Estimation, [paper]
2019:
[CVPR] PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation, [paper] [code]
2017:
[TPAMI] Robust 3D Object Tracking from Monocular Images Using Stable Parts, [paper]
[Access] Fast Object Pose Estimation Using Adaptive Threshold for Bin-picking, [paper]
2014:
[ECCV] Learning 6d object pose estimation using 3d object coordinate, [paper]
[ECCV] Latent-class hough forests for 3d object detection and pose estimation, [paper]
Datasets:
LineMOD: Model based training, detection and pose estimation of texture-less 3d objects in heavily cluttered scenes, ACCV, 2012 [paper] [database]
YCB Datasets: The YCB Object and Model Set: Towards Common Benchmarks for Manipulation Research, IEEE International Conference on Advanced Robotics (ICAR), 2015 [paper]
T-LESS Datasets: T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects, IEEE Winter Conference on Applications of Computer Vision (WACV), 2017 [paper]
HomebrewedDB: RGB-D Dataset for 6D Pose Estimation of 3D Objects, ICCVW, 2019 [paper]
YCB-M: A Multi-Camera RGB-D Dataset for Object Recognition and 6DoF Pose Estimation, arXiv, 2020 [paper] [database]
2.2 Point Cloud-based Methods
The partial-view point cloud is aligned to the complete shape in order to obtain the 6D pose. Generally, coarse registration is conducted first to provide an initial alignment, and dense registration methods such as ICP (Iterative Closest Point) are then applied to obtain the final 6D pose.
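The coarse-to-fine procedure described above can be sketched with a minimal point-to-point ICP in NumPy. This is an illustrative sketch only, not the implementation of any paper listed here: the function names `best_rigid_transform` and `icp`, the brute-force nearest-neighbour search, and the default parameters are all assumptions chosen for brevity. The coarse alignment is assumed to come from elsewhere (for example, from matched local descriptors) and is passed in as `R_init` and `t_init`.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src[i] + t ~= dst[i] (Kabsch / SVD)."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_mean).T @ (dst - dst_mean)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_mean - R @ src_mean
    return R, t


def icp(partial, model, R_init=None, t_init=None, iters=50, tol=1e-8):
    """Refine an initial alignment of `partial` (N x 3) onto `model` (M x 3).

    Hypothetical helper: returns R, t with partial @ R.T + t ~= model points.
    """
    R = np.eye(3) if R_init is None else R_init.copy()
    t = np.zeros(3) if t_init is None else t_init.copy()
    prev_err = np.inf
    for _ in range(iters):
        moved = partial @ R.T + t
        # Brute-force nearest neighbour in the model for every moved point.
        d2 = ((moved[:, None, :] - model[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)
        err = d2[np.arange(len(moved)), idx].mean()
        # Re-estimate the full transform from the current correspondences.
        R, t = best_rigid_transform(partial, model[idx])
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t
```

Under this convention the returned transform maps the observed (camera-frame) points onto the model, so the object pose in the camera frame is its inverse. For large clouds, a k-d tree would be a more practical choice than the quadratic nearest-neighbour search used here.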
2.2.1 Correspondence-based Methods
2021:
[arXiv] Pairwise Point Cloud Registration Using Graph Matching and Rotation-invariant Features, [paper]
[ICRA] 3D3L: Deep Learned 3D Keypoint Detection and Description for LiDARs, [paper]
[arXiv] PRIN/SPRIN: On Extracting Point-wise Rotation Invariant Features, [paper]
2020:
[arXiv] Geometric robust descriptor for 3D point cloud, [paper]
[arXiv] SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration, [paper]
[arXiv] UKPGAN: Unsupervised KeyPoint GANeration, [paper]
[ICIP] Distinctive 3D local deep descriptors, [paper]
[arXiv] 3D Correspondence Grouping with Compatibility Features, [paper]
[ECCV] DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization, [paper]
[arXiv] Radial intersection count image: a clutter resistant 3D shape descriptor, [paper]
[PRL] Fuzzy Logic and Histogram of Normal Orientation-based 3D Keypoint Detection for Point Clouds, [paper]
[arXiv] Latent Fingerprint Registration via Matching Densely Sampled Points, [paper]
[arXiv] RPM-Net: Robust Point Matching using Learned Features, [paper]
[arXiv] End-to-End Learning Local Multi-view Descriptors for 3D Point Clouds, [paper]
[arXiv] D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features, [paper]
[arXiv] Self-supervised Point Set Local Descriptors for Point Cloud Registration, [paper]
[arXiv] StickyPillars: Robust feature matching on point clouds using Graph Neural Networks, [paper]
2019:
[arXiv] 3DRegNet: A Deep Neural Network for 3D Point Registration, [paper] [code]
[CVPR] The Perfect Match: 3D Point Cloud Matching with Smoothed Densities, [paper]
[arXiv] LCD: Learned Cross-Domain Descriptors for 2D-3D Matching, [paper]
2018:
[ECCV] 3DFeat-Net: Weakly Supervised Local 3D Features for Point Cloud Registration, [paper] [code]
2017:
[CVPR] 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions, [paper] [code]
2016:
[arXiv] Lessons from the Amazon Picking Challenge, [paper]
[arXiv] Team Delft's Robot Winner of the Amazon Picking Challenge 2016, [paper]
[IJCV] A comprehensive performance evaluation of 3D local feature descriptors, [paper]
2014:
[CVIU] SHOT: Unique signatures of histograms for surface and texture description, [paper]
2011:
[ICCVW] CAD-model recognition and 6DOF pose estimation using 3D cues, [paper]
2009:
[ICRA] Fast Point Feature Histograms (FPFH) for 3D registration, [paper]
2.2.2 Template-based Methods
Survey papers:
[2021-arXiv] A comprehensive survey on point cloud registration, [paper]
[2020-arXiv] When Deep Learning Meets Data Alignment: A Review on Deep Registration Networks (DRNs), [paper]
[2020-arXiv] Least Squares Optimization: from Theory to Practice, [paper]
2021:
[arXiv] Deep Weighted Consensus (DWC) Dense correspondence confidence maps for 3D shape registration, [paper]
[arXiv] ICOS: Efficient and Highly Robust Point Cloud Registration with Correspondences, [paper]
[arXiv] An Improved Discriminative Optimization for 3D Rigid Point Cloud Registration, [paper]
[arXiv] RANSIC: Fast and Highly Robust Estimation for Rotation Search and Point Cloud Registration using Invariant Compatibility, [paper]
[CVPR] RPSRNet: End-to-End Trainable Rigid Point Set Registration Network using Barnes-Hut 2D-Tree Representation, [paper]
[arXiv] LSG-CPD: Coherent Point Drift with Local Surface Geometry for Point Cloud Registration, [paper]
[arXiv] 3D Point Cloud Registration with Multi-Scale Architecture and Self-supervised Fine-tuning, [paper] [code]
[CVPR] ReAgent: Point Cloud Registration using Imitation and Reinforcement Learning, [paper] [code]
[arXiv] 3DMNDT: 3D multi-view registration method based on the normal distributions transform, [paper]
[arXiv] Generating Annotated Training Data for 6D Object Pose Estimation in Operational Environments with Minimal User Interaction, [paper]
[arXiv] R-PointHop: A Green, Accurate and Unsupervised Point Cloud Registration Method, [paper] [code]
[CVPR] Robust Point Cloud Registration Framework Based on Deep Graph Matching, [paper] [code]
[CVPR] PointDSC: Robust Point Cloud Registration using Deep Spatial Consistency, [paper] [code]
[arXiv] IRON: Invariant-based Highly Robust Point Cloud Registration, [paper]
[arXiv] Dynamical Pose Estimation, [paper]
[arXiv] OMNet: Learning Overlapping Mask for Partial-to-Partial Point Cloud Registration, [paper]
[arXiv] UnsupervisedR&R: Unsupervised Point Cloud Registration via Differentiable Rendering, [paper]
[arXiv] A Parameterised Quantum Circuit Approach to Point Set Matching, [paper]
[arXiv] Hybrid Trilinear and Bilinear Programming for Aligning Partially Overlapping Point Sets, [paper]
[arXiv] Provably Approximated ICP, [paper]
2020:
[IROS] End-to-End 3D Point Cloud Learning for Registration Task Using Virtual Correspondences, [paper]
[arXiv] PREDATOR: Registration of 3D Point Clouds with Low Overlap, [paper]
[arXiv] Recurrent Multi-view Alignment Network for Unsupervised Surface Registration, [paper]
[arXiv] 3D Registration for Self-Occluded Objects in Context, [paper]
[arXiv] Multi-Features Guidance Network for partial-to-partial point cloud registration, [paper]
[arXiv] Point Cloud Registration Based on Consistency Evaluation of Rigid Transformation in Parameter Space, [paper]
[arXiv] On Efficient and Robust Metrics for RANSAC Hypotheses and 3D Rigid Registration, [paper]
[IROSW] Improving the Iterative Closest Point Algorithm using Lie Algebra, [paper]
[arXiv] Graphite: GRAPH-Induced feaTure Extraction for Point Cloud Registration, [paper]
[3DV] Registration Loss Learning for Deep Probabilistic Point Set Registration, [paper]
[3DV] MaskNet: A Fully-Convolutional Network to Estimate Inlier Points, [paper]
[arXiv] 3D Meta-Registration: Learning to Learn Registration of 3D Point Clouds, [paper]
[arXiv] A Termination Criterion for Probabilistic Point Clouds Registration, [paper]
[ACCV] Mapping of Sparse 3D Data using Alternating Projection, [paper]
[ACCV] Best Buddies Registration for Point Clouds, [paper]
[arXiv] Deep-3DAligner: Unsupervised 3D Point Set Registration Network With Optimizable Latent Vector, [paper]
[arXiv] Fast Gravitational Approach for Rigid Point Set Registration with Ordinary Differential Equations, [paper]
[arXiv] Unsupervised Partial Point Set Registration via Joint Shape Completion and Registration, [paper]
[VCIP] Unsupervised Point Cloud Registration via Salient Points Analysis (SPA), [paper]
[arXiv] Deterministic PointNetLK for Generalized Registration, [paper]
[ECCV] DeepGMR: Learning Latent Gaussian Mixture Models for Registration, [paper]
[ITSC] DeepCLR: Correspondence-Less Architecture for Deep End-to-End Point Cloud Registration, [paper]
[arXiv] Fast and Robust Iterative Closest Point, [paper]
[arXiv] The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization, [paper]
[arXiv] Aligning Partially Overlapping Point Sets: an Inner Approximation Algorithm, [paper]
[arXiv] An Analysis of SVD for Deep Rotation Estimation, [paper]
[arXiv] Applying Lie Groups Approaches for Rigid Registration of Point Clouds, [paper]
[arXiv] Unsupervised Learning of 3D Point Set Registration, [paper]
[arXiv] Minimum Potential Energy of Point Cloud for Robust Global Registration, [paper]
[arXiv] Learning 3D-3D Correspondences for One-shot Partial-to-partial Registration, [paper]
[arXiv] A Dynamical Perspective on Point Cloud Registration, [paper]
[arXiv] Feature-metric Registration: A Fast Semi-supervised Approach for Robust Point Cloud Registration without Correspondences, [paper]
[CVPR] Deep Global Registration, [paper]
[arXiv] DPDist: Comparing Point Clouds Using Deep Point Cloud Distance, [paper]
[arXiv] Single Shot 6D Object Pose Estimation, [paper]
[arXiv] A Benchmark for Point Clouds Registration Algorithms, [paper] [code]
[arXiv] PointGMM: a Neural GMM Network for Point Clouds, [paper]
[arXiv] SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans, [paper]
[arXiv] TEASER: Fast and Certifiable Point Cloud Registration, [paper] [code]
[arXiv] Plane Pair Matching for Efficient 3D View Registration, [paper]
[arXiv] Learning multiview 3D point cloud registration, [paper]
[ICRA] Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands, [paper] [code]
[arXiv] Non-iterative One-step Solution for Point Set Registration Problem on Pose Estimation without Correspondence, [paper]
[arXiv] 6D Object Pose Regression via Supervised Learning on Point Clouds, [paper]
2019:
[IROS] Continuous close-range 3D object pose estimation, [paper]
[arXiv] One Framework to Register Them All: PointNet Encoding for Point Cloud Alignment, [paper]
[arXiv] DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration, [paper]
[NeurIPS] PRNet: Self-Supervised Learning for Partial-to-Partial Registration, [paper]
[CVPR] PointNetLK: Robust & Efficient Point Cloud Registration using PointNet, [paper] [code]
[ICCV] End-to-End CAD Model Retrieval and 9DoF Alignment in 3D Scans, [paper]
[arXiv] Iterative Matching Point, [paper]
[arXiv] Deep Closest Point: Learning Representations for Point Cloud Registration, [paper] [code]
[arXiv] PCRNet: Point Cloud Registration Network using PointNet Encoding, [paper] [code]
2016:
[TPAMI] Go-ICP: A Globally Optimal Solution to 3D ICP Point-Set Registration, [paper] [code]
2014:
[SGP] Super 4PCS: Fast Global Pointcloud Registration via Smart Indexing, [paper] [code]
2.2.3 Voting-based Methods
2021:
[CVPR] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation, [paper]
2020:
[arXiv] 3D Point-to-Keypoint Voting Network for 6D Pose Estimation, [paper]
[arXiv] 3DPVNet: Patch-level 3D Hough Voting Network for 6D Pose Estimation, [paper]
[arXiv] MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion, [paper] [code]
[arXiv] YOLOff: You Only Learn Offsets for robust 6DoF object pose estimation, [paper]
[arXiv] LRF-Net: Learning Local Reference Frames for 3D Local Shape Description and Matching, [paper]
2019:
[arXiv] PVN3D: A Deep Point-wise 3D Keypoints Voting Network for 6DoF Pose Estimation, [paper] [code]
[CVPR] DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion, [paper] [code]
2.3 Category-level Methods
2.3.1 Category-level 6D pose estimation
2021:
[arXiv] Towards Real-World Category-level Articulation Pose Estimation, [paper]
[arXiv] CAPTRA: CAtegory-level Pose Tracking for Rigid and Articulated Objects from Point Clouds, [paper]
[CVPR] FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism, [paper]
[arXiv] DualPoseNet: Category-level 6D Object Pose and Size Estimation using Dual Pose Network with Refined Learning of Pose Consistency, [paper]
2020:
[IROS] Fully Convolutional Geometric Features for Category-level Object Alignment, [paper]
[arXiv] Category Level Object Pose Estimation via Neural Analysis-by-Synthesis, [paper]
[ECCV] Geometric Correspondence Fields: Learned Differentiable Rendering for 3D Pose Refinement in the Wild, [paper]
[ECCV] Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation, [paper]
[arXiv] CPS: Class-level 6D Pose and Shape Estimation From Monocular Images, [paper]
[arXiv] Learning Canonical Shape Space for Category-Level 6D Object Pose and Size Estimation, [paper]
2019:
[arXiv] Category-Level Articulated Object Pose Estimation, [paper]
[arXiv] LatentFusion: End-to-End Differentiable Reconstruction and Rendering for Unseen Object Pose Estimation, [paper]
[arXiv] 6-PACK: Category-level 6D Pose Tracker with Anchor-Based Keypoints, [paper] [code]
[arXiv] Self-Supervised 3D Keypoint Learning for Ego-motion Estimation, [paper]
[CVPR] Normalized Object Coordinate Space for Category-Level 6D Object Pose and Size Estimation, [paper] [code]
[arXiv] Instance- and Category-level 6D Object Pose Estimation, [paper]
[arXiv] kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation, [paper]
2.3.2 3D shape reconstruction from images
2021:
[arXiv] Optimal Pose and Shape Estimation for Category-level 3D Object Perception, [paper]
[arXiv] FiG-NeRF: Figure-Ground Neural Radiance Fields for 3D Object Category Modelling, [paper]
[CVPR] Shape and Material Capture at Home, [paper]
[CVPR] Monte Carlo Scene Search for 3D Scene Understanding, [paper]
[arXiv] Holistic 3D Scene Understanding from a Single Image with Implicit Representation, [paper]
[arXiv] Adjoint Rigid Transform Network: Self-supervised Alignment of 3D Shapes, [paper]
[arXiv] Joint Learning of 3D Shape Retrieval and Deformation, [paper]
2020:
[arXiv] From Points to Multi-Object 3D Reconstruction, [paper]
[arXiv] Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos, [paper]
[arXiv] Holistic 3D Human and Scene Mesh Estimation from Single View Images, [paper]
[ECCV] Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images, [paper]
[arXiv] SkeletonNet: A Topology-Preserving Solution for Learning Mesh Reconstruction of Object Surfaces from RGB Images, [paper]
[arXiv] OpenRooms: An End-to-End Open Framework for Photorealistic Indoor Scene Datasets, [paper]
[ECCV] Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve, [paper]
[CVPR] OASIS: A Large-Scale Dataset for Single Image 3D in the Wild, [paper]
[ECCV] Ladybird: Quasi-Monte Carlo Sampling for Deep Implicit Field Based 3D Reconstruction with Symmetry, [paper]
[ECCV] Associative3D: Volumetric Reconstruction from Sparse Views, [paper]
[ECCV] Shape and Viewpoint without Keypoints, [paper]
[arXiv] 3D Shape Reconstruction from Vision and Touch, [paper]
[arXiv] Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion, [paper]
[arXiv] Pix2Vox++: Multi-scale Context-aware 3D Object Reconstruction from Single and Multiple Images, [paper]
[arXiv] 3D Shape Reconstruction from Free-Hand Sketches, [paper]
[arXiv] Learning to Detect 3D Reflection Symmetry for Single-View Reconstruction, [paper]
[arXiv] Convolutional Generation of Textured 3D Meshes, [paper]
[arXiv] 3D Reconstruction of Novel Object Shapes from Single Images, [paper]
[arXiv] Novel Object Viewpoint Estimation through Reconstruction Alignment, [paper]
[arXiv] UCLID-Net: Single View Reconstruction in Object Space, [paper]
[arXiv] SurfaceNet+: An End-to-end 3D Neural Network for Very Sparse Multi-view Stereopsis, [paper]
[arXiv] FroDO: From Detections to 3D Objects, [paper]
[arXiv] CoReNet: Coherent 3D scene reconstruction from a single RGB image, [paper]
[arXiv] Reconstruct, Rasterize and Backprop: Dense shape and pose estimation from a single image, [paper]
[arXiv] Through the Looking Glass: Neural 3D Reconstruction of Transparent Shapes, [paper]
[arXiv] Few-Shot Single-View 3-D Object Reconstruction with Compositional Priors, [paper]
[arXiv] Neural Object Descriptors for Multi-View Shape Reconstruction, [paper]
[arXiv] Leveraging 2D Data to Learn Textured 3D Mesh Generation, [paper]
[arXiv] Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images, [paper]
[arXiv] Self-Supervised 2D Image to 3D Shape Translation with Disentangled Representations, [paper]
[arXiv] Atlas: End-to-End 3D Scene Reconstruction from Posed Images, [paper]
[arXiv] Instant recovery of shape from spectrum via latent space connections, [paper]
[arXiv] Self-supervised Single-view 3D Reconstruction via Semantic Consistency, [paper]
[arXiv] Meta3D: Single-View 3D Object Reconstruction from Shape Priors in Memory, [paper]
[arXiv] STD-Net: Structure-preserving and Topology-adaptive Deformation Network for 3D Reconstruction from a Single Image, [paper]
[arXiv] Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data, [paper]
[arXiv] Deep NRSfM++: Towards 3D Reconstruction in the Wild, [paper]
[arXiv] Learning to Correct 3D Reconstructions from Multiple Views, [paper]
2019:
[arXiv] Boundary Cues for 3D Object Shape Recovery, [paper]
[arXiv] Learning to Generate Dense Point Clouds with Textures on Multiple Categories, [paper]
[arXiv] Front2Back: Single View 3D Shape Reconstruction via Front to Back Prediction, [paper]
[arXiv] Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision, [paper]
[arXiv] SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization, [paper]
[arXiv] 3D-GMNet: Learning to Estimate 3D Shape from A Single Image As A Gaussian Mixture, [paper]
[arXiv] Deep-Learning Assisted High-Resolution Binocular Stereo Depth Reconstruction, [paper]
2.3.3 3D shape rendering
2020:
[NeurIPS] Unsupervised Continuous Object Representation Networks for Novel View Synthesis, [paper]
[ECCV] AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation, [paper]
[ICML] DRWR: A Differentiable Renderer without Rendering for Unsupervised 3D Structure Learning from Silhouette Images, [paper]
[arXiv] Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image Decomposition, [paper]
[arXiv] SPSG: Self-Supervised Photometric Scene Generation from RGB-D Scans, [paper]
[arXiv] Differentiable Rendering: A Survey, [paper]
[arXiv] Equivariant Neural Rendering, [paper]
2019:
[arXiv] SynSin: End-to-end View Synthesis from a Single Image, [paper] [project]
[arXiv] Neural Point Cloud Rendering via Multi-Plane Projection, [paper]
[arXiv] Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool, [paper]
3. 2D Planar Grasp
3.1 Estimating Grasp Contact Points
2021:
[arXiv] Lightweight Convolutional Neural Network with Gaussian-based Grasping Representation for Robotic Grasping Detection, [paper]
2020:
[arXiv] S3K: Self-Supervised Semantic Keypoints for Robotic Manipulation via Multi-View Consistency, [paper]
[arXiv] Dexterous Robotic Grasping with Object-Centric Visual Affordances, [paper]
[IROS] Cloth Region Segmentation for Robust Grasp Selection, [paper]
2019:
[arXiv] Multi-modal Transfer Learning for Grasping Transparent and Specular Objects, [paper]
[IROS] GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier, [paper]
[ICRA] Mechanical Search: Multi-Step Retrieval of a Target Object Occluded by Clutter, [paper]
[ICRA] MetaGrasp: Data Efficient Grasping by Affordance Interpreter Network, [paper]
[IROS] GlassLoc: Plenoptic Grasp Pose Detection in Transparent Clutter, [paper]
[ICRA] Multi-View Picking: Next-best-view Reaching for Improved Grasping in Clutter, [paper] [code]
2018:
[RSS] Closing the Loop for Robotic Grasping: A Real-time, Generative Grasp Synthesis Approach, [paper]
[BMVC] EnsembleNet: Improving Grasp Detection using an Ensemble of Convolutional Neural Networks, [paper]
[ICRA] Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching, [paper] [code]
2017:
[RSS] Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics, [paper] [code]
2014:
[ICRA] Fast graspability evaluation on single depth maps for bin picking with general grippers, [paper]
Dataset:
Dex-Net, a synthetic dataset of 6.7 million point clouds, grasps, and robust analytic grasp metrics generated from thousands of 3D models.
3.2 Estimating Oriented Rectangles
2020:
[arXiv] Effective Deployment of CNNs for 3DoF Pose Estimation and Grasping in Industrial Settings, [paper]
[arXiv] Robotic grasp detection using a novel two-stage approach, [paper]
[IROS] Grasping Detection Network with Uncertainty Estimation for Confidence-Driven Semi-Supervised Domain Adaptation, [paper]
[arXiv] Orientation Attentive Robot Grasp Synthesis, [paper]
[arXiv] Stereo Vision Based Single-Shot 6D Object Pose Estimation for Bin-Picking by a Robot Manipulator, [paper]
[arXiv] SGDN: Segmentation-Based Grasp Detection Network For Unsymmetrical Three-Finger Gripper, [paper]
[arXiv] Event-based Robotic Grasping Detection with Neuromorphic Vision Sensor and Event-Stream Dataset, [paper]
[arXiv] Online Self-Supervised Learning for Object Picking: Detecting Optimum Grasping Position using a Metric Learning Approach, [paper]
[arXiv] A Multi-task Learning Framework for Grasping-Position Detection and Few-Shot Classification, [paper]
[arXiv] Rigid-Soft Interactive Learning for Robust Grasping, [paper]
[arXiv] Optimizing Correlated Graspability Score and Grasp Regression for Better Grasp Prediction, [paper]
[arXiv] Semi-supervised Grasp Detection by Representation Learning in a Vector Quantized Latent Space, [paper]
2019:
[arXiv] Antipodal Robotic Grasping using Generative Residual Convolutional Neural Network, [paper]
[IROS] Domain Independent Unsupervised Learning to grasp the Novel Objects, [paper]
[Sensors] Vision for Robust Robot Manipulation, [paper]
[arXiv] Form2Fit: Learning Shape Priors for Generalizable Assembly from Disassembly, [paper] [code]
[IROS] GRIP: Generative Robust Inference and Perception for Semantic Robot Manipulation in Adversarial Environments, [paper]
[arXiv] Efficient Fully Convolution Neural Network for Generating Pixel Wise Robotic Grasps With High Resolution Images, [paper]
[arXiv] A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection, [paper]
[IROS] ROI-based Robotic Grasp Detection for Object Overlapping Scenes, [paper]
[RO-MAN] Real-time Grasp Pose Estimation for Novel Objects in Densely Cluttered Environment, [paper]
2018:
[IROS] Fully convolutional grasp detection network with oriented anchor box, [paper]
[arXiv] Real-world Multi-object, Multi-grasp Detection, [paper]
[arXiv] Classification based grasp detection using spatial transformer network, [paper]
[arXiv] A Multi-task Convolutional Neural Network for Autonomous Robotic Grasping in Object Stacking Scenes, [paper]
2017:
[IROS] Robotic Grasp Detection using Deep Convolutional Neural Networks, [paper]
[ICMITE] Robust Robot Grasp Detection in Multimodal Fusion, [paper]
[ICRA] A hybrid deep architecture for robotic grasp detection, [paper]
2016:
[ICRA] Supersizing self-supervision: Learning to grasp from 50k tries and 700 robot hours, [paper]
[ICRA] Object discovery and grasp detection with a shared convolutional neural network, [paper]
2015:
[ICRA] Real-time grasp detection using convolutional neural networks, [paper] [code]
[IJRR] Deep Learning for Detecting Robotic Grasps, [paper]
2011:
[ICRA] Efficient grasping from rgbd images: Learning using a new rectangle representation, [paper]
Datasets:
Cornell dataset, which consists of 1035 images of 280 different objects.
Jacquard Dataset, “Jacquard: A Large Scale Dataset for Robotic Grasp Detection” in IEEE International Conference on Intelligent Robots and Systems, 2018, [paper]
4. 6DoF Grasp
Grasp Representation: The grasp is represented as a 6DoF pose in the 3D domain, so the gripper can grasp the object from various angles. The input to this task is a 3D point cloud from RGB-D sensors, and the task contains two stages. In the first stage, the target object is extracted from the scene. In the second stage, if a 3D model of the object is available, the 6D pose of the object can be computed. If no 3D model is available, the 6DoF grasp pose is computed by other methods.
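As a minimal sketch of the model-based case, assume a grasp has been precomputed in the object frame and the 6D object pose has been estimated in the camera frame. Composing the two homogeneous transforms then gives the 6DoF gripper pose in the camera frame. The helper names (`make_pose`, `rot_z`) and all numeric values below are hypothetical and only illustrate the frame composition; they are not taken from any listed paper.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def rot_z(angle_rad):
    """Rotation matrix for a rotation of angle_rad about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Assumed estimated 6D object pose in the camera frame (hypothetical values, metres).
T_cam_obj = make_pose(rot_z(np.pi / 2), [0.10, 0.00, 0.50])

# Assumed grasp precomputed on the 3D model, expressed in the object frame.
T_obj_grasp = make_pose(np.eye(3), [0.00, 0.05, 0.00])

# Chain the transforms: the grasp pose expressed in the camera frame.
T_cam_grasp = T_cam_obj @ T_obj_grasp

grasp_position = T_cam_grasp[:3, 3]    # 3D gripper position
grasp_rotation = T_cam_grasp[:3, :3]   # 3D gripper orientation
print(grasp_position)
```

The 3D position and the 3x3 rotation together make up the six degrees of freedom. A real system would additionally map this pose from the camera frame to the robot base frame using a hand-eye calibration, which is outside the scope of this sketch.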
4.1 Methods based on Single-view Point Cloud
In this situation, no 3D model exists, and the 6-DoF grasps are estimated from the available partial data. This can be done by estimating grasps directly from the partial-view point cloud, or indirectly after shape completion.
4.1.1 Methods of Estimating Candidate Grasps
2021:
[ICRA] Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes, [paper] [code]
[ICRA] RGB Matters: Learning 7-DoF Grasp Poses on Monocular RGBD Images, [paper]
2020:
[arXiv] Reactive Human-to-Robot Handovers of Arbitrary Objects, [paper]
[arXiv] ACRONYM: A Large-Scale Grasp Dataset Based on Simulation, [paper]
[CoRL] Same Object, Different Grasps: Data and Semantic Knowledge for Task-Oriented Grasping, [paper]
[CoRL] A Coarse-To-Fine (C2F) Representation for End-To-End 6-DoF Grasp Detection, [paper]
[arXiv] Goal-Auxiliary Actor-Critic for 6D Robotic Grasping with Point Clouds, [paper]
[NeurIPS] Grasp Proposal Networks: An End-to-End Solution for Visual Learning of Robotic Grasps, [paper]
[arXiv] 6-DoF Grasp Planning using Fast 3D Reconstruction and Grasp Quality CNN, [paper]
[arXiv] Transferable Active Grasping and Real Embodied Dataset, [paper] [code]
[arXiv] Go Fetch: Mobile Manipulation in Unstructured Environments, [paper]
[arXiv] Real-time Fruit Recognition and Grasp Estimation for Autonomous Apple harvesting, [paper]
[arXiv] PointNet++ Grasping: Learning An End-to-end Spatial Grasp Generation Algorithm from Sparse Point Clouds, [paper] [code]
[arXiv] EGAD! an Evolved Grasping Analysis Dataset for diversity and reproducibility in robotic manipulation, [paper]
[arXiv] REGNet: REgion-based Grasp Network for Single-shot Grasp Detection in Point Clouds, [paper]
[RAL] GRASPA 1.0: GRASPA is a Robot Arm graSping Performance benchmArk, [paper] [code]
[arXiv] GraspNet: A Large-Scale Clustered and Densely Annotated Dataset for Object Grasping, [paper]
2019:
[ISRR] A Billion Ways to Grasp: An Evaluation of Grasp Sampling Schemes on a Dense, Physics-based Grasp Data Set, [paper] [project]
[arXiv] 6-DOF Grasping for Target-driven Object Manipulation in Clutter, [paper]
[IROS] Grasping Unknown Objects Based on Gripper Workspace Spheres, [paper]
[arXiv] Learning to Generate 6-DoF Grasp Poses with Reachability Awareness, [paper]
[CoRL] S4G: Amodal Single-view Single-Shot SE(3) Grasp Detection in Cluttered Scenes, [paper]
[ICCV] 6-DoF GraspNet: Variational Grasp Generation for Object Manipulation, [paper] [code]
[ICRA] PointNetGPD: Detecting Grasp Configurations from Point Sets, [paper] [code]
[IJARS] Fast geometry-based computation of grasping points on three-dimensional point clouds, [paper]
2017:
[IJRR] Grasp Pose Detection in Point Clouds, [paper] [code]
[ICINCO] Using geometry to detect grasping points on 3D unknown point cloud, [paper]
2015:
[arXiv] Using geometry to detect grasps in 3d point clouds, [paper]
2010:
[RAS] Learning grasping points with shape context, [paper]
4.1.2 Methods of Transferring Grasps
a. Grasp transfer
2021:
[arXiv] Supervised Training of Dense Object Nets using Optimal Descriptors for Industrial Robotic Applications, [paper]
2020:
[arXiv] DGCM-Net: Dense Geometrical Correspondence Matching Network for Incremental Experience-based Robotic Grasping, [paper]
2019:
[arXiv] Using Synthetic Data and Deep Networks to Recognize Primitive Shapes for Object Grasping, [paper]
[ICRA] Transferring Grasp Configurations using Active Learning and Local Replanning, [paper]
2018:
[arXiv] Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation, [paper]
2017:
[AIP] Fast grasping of unknown objects using principal component analysis, [paper]
2016:
[Humanoids] Part-based grasp planning for familiar objects, [paper]
2015:
[RAS] Category-based task specific grasping, [paper]
2003:
[ICRA] Automatic grasp planning using shape primitives, [paper]
b. Non-rigid registration
2020:
[arXiv] Category-Level 3D Non-Rigid Registration from Single-View RGB Images, [paper]
[arXiv] Neural Non-Rigid Tracking, [paper]
[arXiv] Quasi-Newton Solver for Robust Non-Rigid Registration, [paper]
[arXiv] MINA: Convex Mixed-Integer Programming for Non-Rigid Shape Alignment, [paper]
2019:
[arXiv] Non-Rigid Point Set Registration Networks, [paper] [code]
2018:
[RAL] Transferring Category-based Functional Grasping Skills by Latent Space Non-Rigid Registration, [paper]
[RAS] Learning Postural Synergies for Categorical Grasping through Shape Space Registration, [paper]
[RAS] Autonomous Dual-Arm Manipulation of Familiar Objects, [paper]
c. Shape correspondence
2020:
[arXiv] CorrNet3D: Unsupervised End-to-end Learning of Dense Correspondence for 3D Point Clouds, [paper]
[arXiv] 3D Meta Point Signature: Learning to Learn 3D Point Signature for 3D Dense Shape Correspondence, [paper]
[NeurIPS] Learning Implicit Functions for Topology-Varying Dense 3D Shape Correspondence, [paper] [code]
[NeurIPS] Weakly Supervised Deep Functional Map for Shape Matching, [paper]
[arXiv] A Dual Iterative Refinement Method for Non-rigid Shape Matching, [paper]
[ECCV] Mapping in a cycle: Sinkhorn regularized unsupervised learning for point cloud shapes, [paper]
[arXiv] RPM-Net: Recurrent Prediction of Motion and Parts from Point Cloud, [paper]
[arXiv] Meta Deformation Network: Meta Functionals for Shape Correspondence, [paper]
[JMSE] Geometric Deep Learning for Shape Correspondence in Mass Customization by Three-Dimensional Printing, [paper]
[arXiv] Semantic Correspondence via 2D-3D-2D Cycle, [paper]
[arXiv] Self-supervised Feature Learning by Cross-modality and Cross-view Correspondences, [paper]
[arXiv] Deep Geometric Functional Maps: Robust Feature Learning for Shape Correspondence, [paper] [code]
[arXiv] Efficient and Robust Shape Correspondence via Sparsity-Enforced Quadratic Assignment, [paper]
[CVM] Learning local shape descriptors for computing non-rigid dense correspondence, [paper]
[JCDE] Embedded spectral descriptors: learning the point-wise correspondence metric via Siamese neural networks, [paper]
[arXiv] SAPIEN: A SimulAted Part-based Interactive ENvironment, [paper]
[TVCG] Voting for Distortion Points in Geometric Processing, [paper]
[arXiv] SketchDesc: Learning Local Sketch Descriptors for Multi-view Correspondence, [paper]
2019:
[arXiv] Fine-grained Object Semantic Understanding from Correspondences, [paper]
[IROS] Multi-step Pick-and-Place Tasks Using Object-centric Dense Correspondences, [paper] [code]
[arXiv] Unsupervised cycle-consistent deformation for shape matching, [paper]
[arXiv] ZoomOut: Spectral Upsampling for Efficient Shape Correspondence, [paper]
[C&G] Partial correspondence of 3D shapes using properties of the nearest-neighbor field, [paper]
4.2 Methods based on Complete Shape
4.2.1 Methods of Estimating 6D Object Pose
2020:
[IROS] Transferring Experience from Simulation to the Real World for Precise Pick-And-Place Tasks in Highly Cluttered Scenes, [paper]
[arXiv] Object-Driven Active Mapping for More Accurate Object Pose Estimation and Robotic Grasping, [paper]
[arXiv] Fast and Robust Bin-picking System for Densely Piled Industrial Objects, [paper]
2017:
[IROS] SegICP: Integrated Deep Semantic Segmentation and Pose Estimation, [paper]
[ICRA] Multi-view Self-supervised Deep Learning for 6D Pose Estimation in the Amazon Picking Challenge, [paper] [code]
2010:
[SIMPAR] OpenGRASP: A Toolkit for Robot Grasping Simulation, [paper]
2009:
[ICAR] An automatic grasp planning system for service robots, [paper]
2004:
[RAM] Graspit! A versatile simulator for robotic grasping, [paper] [code]
4.2.2 Methods of Shape Completion
a. Shape Completion-based Grasp
2020:
[arXiv] Pick-Place With Uncertain Object Instance Segmentation and Shape Completion, [paper]
[arXiv] Amodal 3D Reconstruction for Robotic Manipulation via Stability and Connectivity, [paper]
[ICRA] Learning Continuous 3D Reconstructions for Geometrically Aware Grasping, [paper] [code]
[arXiv] Robotic Grasping through Combined Image-Based Grasp Proposal and 3D Reconstruction, [paper]
2019:
[arXiv] ClearGrasp: 3D Shape Estimation of Transparent Objects for Manipulation, [paper]
[arXiv] kPAM-SC: Generalizable Manipulation Planning using KeyPoint Affordance and Shape Completion, [paper] [code]
[arXiv] Data-Efficient Learning for Sim-to-Real Robotic Grasping using Deep Point Cloud Prediction Networks, [paper]
[IROS] Robust Grasp Planning Over Uncertain Shape Completions, [paper]
[arXiv] Multi-Modal Geometric Learning for Grasping and Manipulation, [paper]
2018:
[ICRA] Learning 6-DOF Grasping Interaction via Deep Geometry-aware 3D Representations, [paper]
[IROS] 3D Shape Perception from Monocular Vision, Touch, and Shape Priors, [paper]
2017:
[IROS] Shape Completion Enabled Robotic Grasping, [paper]
b. Shape Completion or Generation
2021:
[arXiv] ASFM-Net: Asymmetrical Siamese Feature Matching Network for Point Completion, [paper]
[CVPR] Variational Relational Point Completion Network, [paper]
[CVPR] View-Guided Point Cloud Completion, [paper]
[arXiv] 3D Semantic Scene Completion: a Survey, [paper]
[CVPR] Cycle4Completion: Unpaired Point Cloud Completion using Cycle Transformation with Missing Region Coding, [paper]
[CVPR] Style-based Point Generator with Adversarial Rendering for Point Cloud Completion, [paper]
[CVPR] Diffusion Probabilistic Models for 3D Point Cloud Generation, [paper]
[arXiv] DeepMetaHandles: Learning Deformation Meta-Handles of 3D Meshes with Biharmonic Coordinates, [paper]
[arXiv] Generation for adaption: a Gan-based approach for 3D Domain Adaption in Point Cloud, [paper]
[arXiv] HyperPocket: Generative Point Cloud Completion, [paper]
2020:
[arXiv] Seeing Behind Objects for 3D Multi-Object Tracking in RGB-D Sequences, [paper]
[arXiv] PMP-Net: Point Cloud Completion by Learning Multi-step Point Moving Paths, [paper]
[arXiv] Towards Part-Based Understanding of RGB-D Scans, [paper]
[arXiv] Learning geometry-image representation for 3D point cloud generation, [paper]
[arXiv] Diverse Plausible Shape Completions from Ambiguous Depth Images, [paper]
[arXiv] A Self-supervised Cascaded Refinement Network for Point Cloud Completion, [paper]
[arXiv] Refinement of Predicted Missing Parts Enhance Point Cloud Completion, [paper]
[3DV] A Progressive Conditional Generative Adversarial Network for Generating Dense and Colored 3D Point Clouds, [paper]
[NeurIPS] Skeleton-bridged Point Completion: From Global Inference to Local Adjustment, [paper]
[arXiv] Pre-Training by Completing Point Clouds, [paper]
[ECCVW] Implicit Feature Networks for Texture Completion from Partial 3D Data, [paper]
[arXiv] LMSCNet: Lightweight Multiscale 3D Semantic Completion, [paper]
[arXiv] Self-Sampling for Neural Point Cloud Consolidation, [paper]
[ECCV] PointMixup: Augmentation for Point Clouds, [paper]
[ECCV] Learning Gradient Fields for Shape Generation, [paper]
[ECCV] SoftPoolNet: Shape Descriptor for Point Cloud Completion and Classification, [paper]
[ECCV] Weakly-supervised 3D Shape Completion in the Wild, [paper]
[arXiv] VPC-Net: Completion of 3D Vehicles from MLS Point Clouds, [paper]
[arXiv] LPMNet: Latent Part Modification and Generation for 3D Point Clouds, [paper]
[arXiv] DSM-Net: Disentangled Structured Mesh Net for Controllable Generation of Fine Geometry, [paper]
[arXiv] KAPLAN: A 3D Point Descriptor for Shape Completion, [paper]
[arXiv] Point Cloud Completion by Learning Shape Priors, [paper]
[TOG] SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images, [paper]
[arXiv] MRGAN: Multi-Rooted 3D Shape Generation with Unsupervised Part Disentanglement, [paper]
[arXiv] Neural Mesh Flow: 3D Manifold Mesh Generation via Diffeomorphic Flows, [paper] [project]
[ECCV] Discrete Point Flow Networks for Efficient Point Cloud Generation, [paper]
[arXiv] Progressive Point Cloud Deconvolution Generation Network, [paper]
[arXiv] Point Set Voting for Partial Point Cloud Analysis, [paper]
[arXiv] 3D Topology Transformation with Generative Adversarial Networks, [paper]
[arXiv] Detail Preserved Point Cloud Completion via Separated Feature Aggregation, [paper]
[arXiv] Deep Octree-based CNNs with Output-Guided Skip Connections for 3D Shape and Scene Completion, [paper]
[arXiv] GRNet: Gridding Residual Network for Dense Point Cloud Completion, [paper]
[RAL] GFPNet: A Deep Network for Learning Shape Completion in Generic Fitted Primitives, [paper]
[arXiv] Point Cloud Completion by Skip-attention Network with Hierarchical Folding, [paper]
[arXiv] PointTriNet: Learned Triangulation of 3D Point Sets, [paper]
[arXiv] DeepSDF x Sim(3): Extending DeepSDF for automatic 3D shape retrieval and similarity transform estimation, [paper]
[arXiv] Anisotropic Convolutional Networks for 3D Semantic Scene Completion, [paper]
[arXiv] Cascaded Refinement Network for Point Cloud Completion, [paper]
[arXiv] Generative PointNet: Energy-Based Learning on Unordered Point Sets for 3D Generation, Reconstruction and Classification, [paper]
[arXiv] Intrinsic Point Cloud Interpolation via Dual Latent Space Navigation, [paper]
[arXiv] Modeling 3D Shapes by Reinforcement Learning, [paper]
[arXiv] PF-Net: Point Fractal Network for 3D Point Cloud Completion, [paper]
[arXiv] Hypernetwork approach to generating point clouds, [paper]
[arXiv] Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion, [paper]
[arXiv] PolyGen: An Autoregressive Generative Model of 3D Meshes, [paper]
[arXiv] BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images, [paper]
[arXiv] Implicit Geometric Regularization for Learning Shapes, [paper]
[arXiv] The Whole Is Greater Than the Sum of Its Nonrigid Parts, [paper]
[arXiv] PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions, [paper]
[arXiv] Multimodal Shape Completion via Conditional Generative Adversarial Networks, [paper]
[arXiv] Symmetry Detection of Occluded Point Cloud Using Deep Learning, [paper]
2019:
[arXiv] Inferring Occluded Geometry Improves Performance when Retrieving an Object from Dense Clutter, [paper]
2018:
[3DORW] Completion of Cultural Heritage Objects with Rotational Symmetry, [paper]
c. Depth Completion and Estimation
2021:
[arXiv] Single Image Depth Estimation: An Overview, [paper]
[CVPR] Depth Completion using Plane-Residual Representation, [paper]
[arXiv] LEAD: LiDAR Extender for Autonomous Driving, [paper]
2020:
[arXiv] Deep Learning based Monocular Depth Prediction: Datasets, Methods and Applications, [paper]
[IROS] Depth Completion via Inductive Fusion of Planar LIDAR and Monocular Camera, [paper]
[BMVC] DESC: Domain Adaptation for Depth Estimation via Semantic Consistency, [paper] [code]
[arXiv] Adaptive Context-Aware Multi-Modal Network for Depth Completion, [paper]
[arXiv] Depth Completion with RGB Prior, [paper]
[IROS] Balanced Depth Completion between Dense Depth Inference and Sparse Range Measurements via KISS-GP, [paper]
[arXiv] Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets, [paper]
[ECCV] Feature-metric Loss for Self-supervised Learning of Depth and Egomotion, [paper]
[ECCV] Non-Local Spatial Propagation Network for Depth Completion, [paper] [code]
[IROS] UnRectDepthNet: Self-Supervised Monocular Depth Estimation using a Generic Framework for Handling Common Camera Distortion Models, [paper]
[IROS] 360° Depth Estimation from Multiple Fisheye Images with Origami Crown Representation of Icosahedron, [paper]
[ECCV] Self-Supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance, [paper]
[ECCV] P2Net: Patch-match and Plane-regularization for Unsupervised Indoor Depth Estimation, [paper]
[arXiv] P2D: a self-supervised method for depth estimation from polarimetry, [paper]
[arXiv] MiniNet: An extremely lightweight convolutional neural network for real-time unsupervised monocular depth estimation, [paper]
[RAL] Discontinuous and Smooth Depth Completion with Binary Anisotropic Diffusion Tensor, [paper]
[arXiv] Increased-Range Unsupervised Monocular Depth Estimation, [paper]
[arXiv] Targeted Adversarial Perturbations for Monocular Depth Prediction, [paper]
[arXiv] AcED: Accurate and Edge-consistent Monocular Depth Estimation, [paper]
[arXiv] Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues, [paper]
[arXiv] Depth by Poking: Learning to Estimate Depth from Self-Supervised Grasping, [paper]
[arXiv] Uncertainty-Aware CNNs for Depth Completion: Uncertainty from Beginning to End, [paper]
[arXiv] A Survey on Deep Learning Techniques for Stereo-based Depth Estimation, [paper]
[arXiv] Real-time single image depth perception in the wild with handheld devices, [paper]
[arXiv] SharinGAN: Combining Synthetic and Real Data for Unsupervised Geometry Estimation, [paper]
[arXiv] PLG-IN: Pluggable Geometric Consistency Loss with Wasserstein Distance in Monocular Depth Estimation, [paper]
[CVPR] Bi3D: Stereo Depth Estimation via Binary Classifications, [paper]
[CVPR] Focus on defocus: bridging the synthetic to real domain gap for depth estimation, [paper]
[arXiv] Decoder Modulation for Indoor Depth Completion, [paper]
[CVPR] On the uncertainty of self-supervised monocular depth estimation, [paper] [code]
[arXiv] Consistent Video Depth Estimation, [paper]
[arXiv] Self-Supervised Attention Learning for Depth and Ego-motion Estimation, [paper]
[arXiv] Pseudo RGB-D for Self-Improving Monocular SLAM and Depth Prediction, [paper]
[arXiv] DepthNet Nano: A Highly Compact Self-Normalizing Neural Network for Monocular Depth Estimation, [paper]
[arXiv] RealMonoDepth: Self-Supervised Monocular Depth Estimation for General Scenes, [paper]
[arXiv] Monocular Depth Estimation with Self-supervised Instance Adaptation, [paper]
[arXiv] Guiding Monocular Depth Estimation Using Depth-Attention Volume, [paper]
[arXiv] 3D Photography using Context-aware Layered Depth Inpainting, [paper]
[arXiv] Occlusion-Aware Depth Estimation with Adaptive Normal Constraints, [paper]
[arXiv] The Edge of Depth: Explicit Constraints between Segmentation and Depth, [paper]
[arXiv] Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume, [paper]
[arXiv] DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning, [paper]
[arXiv] Adversarial Attacks on Monocular Depth Estimation, [paper]
[arXiv] Monocular Depth Prediction Through Continuous 3D Loss, [paper]
[arXiv] 3dDepthNet: Point Cloud Guided Depth Completion Network for Sparse Depth and Single Color Image, [paper]
[arXiv] Depth Estimation by Learning Triangulation and Densification of Sparse Points for Multi-view Stereo, [paper]
[arXiv] Monocular Depth Estimation Based On Deep Learning: An Overview, [paper]
[arXiv] Scene Completeness-Aware Lidar Depth Completion for Driving Scenario, [paper]
[arXiv] Fast Depth Estimation for View Synthesis, [paper]
[arXiv] Active Depth Estimation: Stability Analysis and its Applications, [paper]
[arXiv] Uncertainty depth estimation with gated images for 3D reconstruction, [paper]
[arXiv] Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry, [paper]
[arXiv] A-TVSNet: Aggregated Two-View Stereo Network for Multi-View Stereo Depth Estimation, [paper]
[arXiv] Predicting Sharp and Accurate Occlusion Boundaries in Monocular Depth Estimation Using Displacement Fields, [paper]
[ICLR] Semantically-Guided Representation Learning for Self-Supervised Monocular Depth, [paper]
[arXiv] 3D Gated Recurrent Fusion for Semantic Scene Completion, [paper]
[arXiv] Applying Depth-Sensing to Automated Surgical Manipulation with a da Vinci Robot, [paper]
[arXiv] Fast Generation of High Fidelity RGB-D Images by Deep-Learning with Adaptive Convolution, [paper]
[arXiv] DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data, [paper]
[arXiv] Depth Map Estimation of Dynamic Scenes Using Prior Depth Information, [paper]
[arXiv] FIS-Nets: Full-image Supervised Networks for Monocular Depth Estimation, [paper]
[ICRA] Depth Based Semantic Scene Completion with Position Importance Aware Loss, [paper]
[arXiv] ResDepth: Learned Residual Stereo Reconstruction, [paper]
[arXiv] Single Image Depth Estimation Trained via Depth from Defocus Cues, [paper]
[arXiv] RoutedFusion: Learning Real-time Depth Map Fusion, [paper]
[arXiv] Don't Forget The Past: Recurrent Depth Estimation from Monocular Video, [paper]
[AAAI] Morphing and Sampling Network for Dense Point Cloud Completion, [paper] [code]
[AAAI] CSPN++: Learning Context and Resource Aware Convolutional Spatial Propagation Networks for Depth Completion, [paper]
2019:
[arXiv] Normal Assisted Stereo Depth Estimation, [paper]
[arXiv] Geometry-aware Generation of Adversarial and Cooperative Point Clouds, [paper]
[arXiv] DeepSFM: Structure From Motion Via Deep Bundle Adjustment, [paper]
[CVIU] On the Benefit of Adversarial Training for Monocular Depth Estimation, [paper]
[ICCV] Learning Joint 2D-3D Representations for Depth Completion, [paper]
[ICCV] Deep Optics for Monocular Depth Estimation and 3D Object Detection, [paper]
[arXiv] Deep Classification Network for Monocular Depth Estimation, [paper]
[ICCV] Depth Completion from Sparse LiDAR Data with Depth-Normal Constraints, [paper]
[arXiv] Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era, [paper]
[arXiv] Real-time Vision-based Depth Reconstruction with NVidia Jetson, [paper]
[IROS] Self-supervised 3D Shape and Viewpoint Estimation from Single Images for Robotics, [paper]
[arXiv] Mesh R-CNN, [paper]
[arXiv] Monocular depth estimation: a survey, [paper]
2018:
[3DV] PCN: Point Completion Network, [paper] [code]
[NeurIPS] Learning to Reconstruct Shapes from Unseen Classes, [paper] [code]
[ECCV] Learning Shape Priors for Single-View 3D Completion and Reconstruction, [paper] [code]
[CVPR] Deep Depth Completion of a Single RGB-D Image, [paper] [code]
d. Point Cloud Denoising and Sampling
2020:
[arXiv] SPU-Net: Self-Supervised Point Cloud Upsampling by Coarse-to-Fine Reconstruction with Self-Projection Optimization, [paper]
[arXiv] Deep Magnification-Arbitrary Upsampling over 3D Point Clouds, [paper]
[arXiv] CAD-PU: A Curvature-Adaptive Deep Learning Solution for Point Set Upsampling, [paper]
[MM] Differentiable Manifold Reconstruction for Point Cloud Denoising, [paper]
[arXiv] A Quick Review on Recent Trends in 3D Point Cloud Data Compression Techniques and the Challenges of Direct Processing in 3D Compressed Domain, [paper]
[arXiv] Learning Graph-Convolutional Representations for Point Cloud Denoising, [paper]
[arXiv] MOPS-Net: A Matrix Optimization-driven Network for Task-Oriented 3D Point Cloud Downsampling, [paper]
[arXiv] Deep Feature-preserving Normal Estimation for Point Cloud Filtering, [paper]
[arXiv] Self-Supervised Learning for Domain Adaptation on Point-Clouds, [paper]
[arXiv] Non-Local Part-Aware Point Cloud Denoising, [paper]
[arXiv] PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling, [paper]
2019:
[arXiv] CNN-based Lidar Point Cloud De-Noising in Adverse Weather, [paper]
[arXiv] PU-GCN: Point Cloud Upsampling using Graph Convolutional Networks, [paper] [code]
[ICCV] PU-GAN: a Point Cloud Upsampling Adversarial Network, [paper] [code]
[CVPR] Patch-based Progressive 3D Point Set Upsampling, [paper] [code]
[arXiv] SampleNet: Differentiable Point Cloud Sampling, [paper] [code]
2018:
[CVPR] PU-Net: Point Cloud Upsampling Network, [paper] [code]
5. Task-oriented Methods
5.1 Task-oriented Manipulation
2020:
[IROS] Learning and Sequencing of Object-Centric Manipulation Skills for Industrial Tasks, [paper]
[RSSW] Self-Supervised Goal-Conditioned Pick and Place, [paper]
[arXiv] Self-Adapting Recurrent Models for Object Pushing from Learning in Simulation, [paper]
[arXiv] Complex Robotic Manipulation via Graph-Based Hindsight Goal Generation, [paper]
[TOR] Learning Transferable Push Manipulation Skills in Novel Contexts, [paper]
[RAL] Task-driven Perception and Manipulation for Constrained Placement of Unknown Objects, [paper]
[arXiv] Vision-based control of a knuckle boom crane with online cable length estimation, [paper]
[arXiv] A Point Cloud-Based Method for Automatic Groove Detection and Trajectory Generation of Robotic Arc Welding Tasks, [paper]
[arXiv] Neuromorphic Event-Based Slip Detection and Suppression in Robotic Grasping and Manipulation, [paper]
[arXiv] Combinatorial 3D Shape Generation via Sequential Assembly, [paper]
[arXiv] Learning visual policies for building 3D shape categories, [paper]
[arXiv] Where to relocate?: Object rearrangement inside cluttered and confined environments for robotic manipulation, [paper]
[arXiv] Correspondence Networks with Adaptive Neighbourhood Consensus, [paper]
[arXiv] Development of a Robotic System for Automated Decaking of 3D-Printed Parts, [paper]
[arXiv] Team O2AS at the World Robot Summit 2018: An Approach to Robotic Kitting and Assembly Tasks using General Purpose Grippers and Tools, [paper]
[arXiv] Towards Mobile Multi-Task Manipulation in a Confined and Integrated Environment with Irregular Objects, [paper]
[arXiv] Autonomous Industrial Assembly using Force, Torque, and RGB-D sensing, [paper]
[RAL] A Deep Learning Approach to Grasping the Invisible, [paper] [code]
2019:
[arXiv] KETO: Learning Keypoint Representations for Tool Manipulation, [paper]
[arXiv] Learning Task-Oriented Grasping from Human Activity Datasets, [paper]
5.2 Grasp Affordance
2021:
[arXiv] CaTGrasp: Learning Category-Level Task-Relevant Grasping in Clutter from Simulation, [paper] [code]
[CVPR] 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding, [paper]
2020:
[arXiv] Learning to Grasp 3D Objects using Deep Residual U-Nets, [paper]
2019:
[IROS] Detecting Robotic Affordances on Novel Objects with Regional Attention and Attributes, [paper]
[IROS] Learning Grasp Affordance Reasoning through Semantic Relations, [paper]
[arXiv] Automatic pre-grasps generation for unknown 3D objects, [paper]
[IECON] A novel object slicing based grasp planner for 3D object grasping using underactuated robot gripper, [paper]
2018:
[ICRA] AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection, [paper]
[arXiv] Workspace Aware Online Grasp Planning, [paper]
5.3 3D Part Segmentation
2021:
[arXiv] Learning Fine-Grained Segmentation of 3D Shapes without Part Labels, [paper]
2020:
[arXiv] Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image, [paper]
[arXiv] Learning 3D Part Assembly from a Single Image, [paper]
[ICLR] Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories, [paper]
2019:
[arXiv] Skeleton Extraction from 3D Point Clouds by Decomposing the Object into Parts, [paper]
[arXiv] Neural Shape Parsers for Constructive Solid Geometry, [paper]
[arXiv] PQ-NET: A Generative Part Seq2Seq Network for 3D Shapes, [paper]
[CVPR] PartNet: A Recursive Part Decomposition Network for Fine-grained and Hierarchical Shape Segmentation, [paper] [code]
[C&G] Autoencoder-based part clustering for part-in-whole retrieval of CAD models, [paper]
2016:
[SiggraphAsia] A Scalable Active Framework for Region Annotation in 3D Shape Collections, [paper]
6. Dexterous Grippers
2021:
[CVPR] ContactOpt: Optimizing Contact to Improve Grasps, [paper]
2020:
[arXiv] Multi-FinGAN: Generative Coarse-To-Fine Sampling of Multi-Finger Grasps, [paper]
[CoRL] Fit2Form: 3D Generative Model for Robot Gripper Form Design, [paper]
[ECCV] GRAB: A Dataset of Whole-Body Human Grasping of Objects, [paper]
[ECCV] DRG: Dual Relation Graph for Human-Object Interaction Detection, [paper] [project] [code]
[ECCV] InterHand2.6M: A Dataset and Baseline for 3D Interacting Hand Pose Estimation from a Single RGB Image, [paper]
[arXiv] TriFinger: An Open-Source Robot for Learning Dexterity, [paper]
[arXiv] Grasping Field: Learning Implicit Representations for Human Grasps, [paper]
[ECCV] ContactPose: A Dataset of Grasps with Object Contact and Hand Pose, [paper] [project]
[ICRA] Generalized Grasping for Mechanical Grippers for Unknown Objects with Partial Point Cloud Representations, [paper]
[arXiv] Multi-Fingered Active Grasp Learning, [paper]
[arXiv] Learning Compliance Adaptation in Contact-Rich Manipulation, [paper]
[arXiv] Leveraging Photometric Consistency over Time for Sparsely Supervised Hand-Object Reconstruction, [paper]
[arXiv] HandVoxNet: Deep Voxel-Based Network for 3D Hand Shape and Pose Estimation from a Single Depth Map, [paper]
[arXiv] Functionally Divided Manipulation Synergy for Controlling Multi-fingered Hands, [paper]
[arXiv] The State of Service Robots: Current Bottlenecks in Object Perception and Manipulation, [paper]
[arXiv] Selecting and Designing Grippers for an Assembly Task in a Structured Approach, [paper]
[arXiv] A Mobile Robot Hand-Arm Teleoperation System by Vision and IMU, [paper]
[arXiv] Robust High-Transparency Haptic Exploration for Dexterous Telemanipulation, [paper]
[arXiv] Tactile Dexterity: Manipulation Primitives with Tactile Feedback, [paper]
[arXiv] Deep Differentiable Grasp Planner for High-DOF Grippers, [paper]
[arXiv] Multi-Fingered Grasp Planning via Inference in Deep Neural Networks, [paper]
[RAL] Benchmarking In-Hand Manipulation, [paper]
2019:
[arXiv] GraphPoseGAN: 3D Hand Pose Estimation from a Monocular RGB Image via Adversarial Learning on Graphs, [paper]
[arXiv] HMTNet: 3D Hand Pose Estimation from Single Depth Image Based on Hand Morphological Topology, [paper]
[arXiv] UniGrasp: Learning a Unified Model to Grasp with N-Fingered Robotic Hands, [paper]
[ScienceRobotics] On the choice of grasp type and location when handing over an object, [paper]
[arXiv] Solving Rubik's Cube with a Robot Hand, [paper]
[IJARS] Fast geometry-based computation of grasping points on three-dimensional point clouds, [paper] [code]
[arXiv] Learning better generative models for dexterous, single-view grasping of novel objects, [paper]
[arXiv] DexPilot: Vision Based Teleoperation of Dexterous Robotic Hand-Arm System, [paper]
[IROS] Optimization Model for Planning Precision Grasps with Multi-Fingered Hands, [paper]
[IROS] Generating Grasp Poses for a High-DOF Gripper Using Neural Networks, [paper]
[arXiv] Deep Dynamics Models for Learning Dexterous Manipulation, [paper]
[CVPR] Learning joint reconstruction of hands and manipulated objects, [paper]
[CVPR] H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions, [paper]
[IROS] Efficient Grasp Planning and Execution with Multi-Fingered Hands by Surface Fitting, [paper]
[arXiv] Efficient Bimanual Manipulation Using Learned Task Schemas, [paper]
[ICRA] High-Fidelity Grasping in Virtual Reality using a Glove-based System, [paper] [code]
7. Data Generation
7.1 Simulation to Reality
2020:
[arXiv] iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes, [paper]
[RSS] Perspectives on Sim2Real Transfer for Robotics: A Summary of the RSS 2020 Workshop, [paper]
[ECCV] AutoSimulate: (Quickly) Learning Synthetic Data Generation, [paper]
[ECCV] Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation, [paper]
[arXiv] The Importance and the Limitations of Sim2Real for Robotic Manipulation in Precision Agriculture, [paper]
[arXiv] BenchBot: Evaluating Robotics Research in Photorealistic 3D Simulation and on Real Robots, [paper]
[arXiv] How to Close Sim-Real Gap? Transfer with Segmentation!, [paper]
[arXiv] A Study on the Challenges of Using Robotics Simulators for Testing, [paper]
[arXiv] Joint Supervised and Self-Supervised Learning for 3D Real-World Challenges, [paper]
[arXiv] RoboTHOR: An Open Simulation-to-Real Embodied AI Platform, [paper]
[arXiv] On the Effectiveness of Virtual Reality-based Training for Robotic Setup, [paper]
[arXiv] LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation, [paper]
[arXiv] Multi-source Domain Adaptation in the Deep Learning Era: A Systematic Survey, [paper]
[arXiv] Learning Machines from Simulation to Real World, [paper]
[arXiv] Sim2Real2Sim: Bridging the Gap Between Simulation and Real-World in Flexible Object Manipulation, [paper]
2019:
[IROS] Learning to Augment Synthetic Images for Sim2Real Policy Transfer, [paper]
[arXiv] Accept Synthetic Objects as Real: End-to-End Training of Attentive Deep Visuomotor Policies for Manipulation in Clutter, [paper]
[RSSW] Generative grasp synthesis from demonstration using parametric mixtures, [paper]
2018:
[RSS] Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision, [paper]
[CoRL] Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects, [paper] [code]
[arXiv] Multi-Task Domain Adaptation for Deep Learning of Instance Grasping from Simulation, [paper]
2017:
[arXiv] Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping, [paper]
7.2 Self-supervised Methods
2019:
[arXiv] Self-supervised 6D Object Pose Estimation for Robot Manipulation, [paper]
2018:
[RSS] Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision, [paper]
8. Multi-source
2020:
[arXiv] Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images, [paper]
[arXiv] Multimodal Material Classification for Robots using Spectroscopy and High Resolution Texture Imaging, [paper]
[arXiv] Understanding Contexts Inside Robot and Human Manipulation Tasks through a Vision-Language Model and Ontology System in a Video Stream, [paper]
[TOR] A Transfer Learning Approach to Cross-modal Object Recognition: from Visual Observation to Robotic Haptic Exploration, [paper]
[arXiv] Accurate Vision-based Manipulation through Contact Reasoning, [paper]
2019:
[arXiv] RoboSherlock: Cognition-enabled Robot Perception for Everyday Manipulation Tasks, [paper]
[ICRA] Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, [paper]
[CVPR] ContactDB: Analyzing and Predicting Grasp Contact via Thermal Imaging, [paper] [code]
2018:
[arXiv] Learning to Grasp without Seeing, [paper]
9. Motion Planning
9.1 Visual servoing
2020:
[arXiv] Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation, [paper]
[arXiv] Robust Keypoint Detection and Pose Estimation of Robot Manipulators with Self-Occlusions via Sim-to-Real Transfer, [paper]
[IROS] KOVIS: Keypoint-based Visual Servoing with Zero-Shot Sim-to-Real Transfer for Robotics Manipulation, [paper]
[arXiv] Detailed 2D-3D Joint Representation for Human-Object Interaction, [paper]
[arXiv] Neuromorphic Eye-in-Hand Visual Servoing, [paper]
[arXiv] Predicting Target Feature Configuration of Non-stationary Objects for Grasping with Image-Based Visual Servoing, [paper]
[AAAI] That and There: Judging the Intent of Pointing Actions with Robotic Arms, [paper]
2019:
[arXiv] Camera-to-Robot Pose Estimation from a Single Image, [paper]
[ICRA] Learning Driven Coarse-to-Fine Articulated Robot Tracking, [paper]
[CVPR] CRAVES: Controlling Robotic Arm with a Vision-based Economic System, [paper] [code]
2018:
[arXiv] Point-to-Pose Voting based Hand Pose Estimation using Residual Permutation Equivariant Layer, [paper]
2016:
[ICRA] Robot Arm Pose Estimation by Pixel-wise Regression of Joint Angles, [paper]
2014:
[ICRA] Robot Arm Pose Estimation through Pixel-Wise Part Classification, [paper]
9.2 Path Planning
2021:
[arXiv] Dynamic Movement Primitives in Robotics: A Tutorial Survey, [paper]
2020:
[arXiv] Human-Guided Planner for Non-Prehensile Manipulation, [paper]
[arXiv] Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation, [paper]
[arXiv] GOMP: Grasp-Optimized Motion Planning for Bin Picking, [paper]
[arXiv] Describing Physics For Physical Reasoning: Force-based Sequential Manipulation Planning, [paper]
[arXiv] Reaching, Grasping and Re-grasping: Learning Fine Coordinated Motor Skills, [paper]
2019:
[arXiv] Manipulation Trajectory Optimization with Online Grasp Synthesis and Selection, [paper]
[arXiv] Parareal with a Learned Coarse Model for Robotic Manipulation, [paper]
10. Imitation Learning
2020:
[arXiv] Learning-from-Observation Framework: One-Shot Robot Teaching for Grasp-Manipulation-Release Household Operations, [paper]
[arXiv] Self-supervised Learning for Precise Pick-and-place without Object Model, [paper]
[arXiv] HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation, [paper]
[arXiv] SQUIRL: Robust and Efficient Learning from Video Demonstration of Long-Horizon Robotic Manipulation Tasks, [paper]
[arXiv] A Geometric Perspective on Visual Imitation Learning, [paper]
[arXiv] Vision-based Robot Manipulation Learning via Human Demonstrations, [paper]
[arXiv] Gaussian-Process-based Robot Learning from Demonstration, [paper]
2019:
[arXiv] Grasping in the Wild: Learning 6DoF Closed-Loop Grasping from Low-Cost Demonstrations, [paper] [project]
[arXiv] Motion Reasoning for Goal-Based Imitation Learning, [paper]
[IROS] Robot Learning of Shifting Objects for Grasping in Cluttered Environments, [paper] [code]
[arXiv] Learning Deep Parameterized Skills from Demonstration for Re-targetable Visuomotor Control, [paper]
[arXiv] Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video, [paper]
[IROS] Learning Actions from Human Demonstration Video for Robotic Manipulation, [paper]
[RSSW] Generative grasp synthesis from demonstration using parametric mixtures, [paper]
2018:
[arXiv] Deep Imitation Learning for Complex Manipulation Tasks from Virtual Reality Teleoperation, [paper]
11. Reinforcement Learning
2020:
[arXiv] A Framework for Efficient Robotic Manipulation, [paper]
[IROS] Physics-Based Dexterous Manipulations with Estimated Hand Poses and Residual Reinforcement Learning, [paper]
[arXiv] Follow the Object: Curriculum Learning for Manipulation Tasks with Imagined Goals, [paper]
[arXiv] Towards Generalization and Data Efficient Learning of Deep Robotic Grasping, [paper]
[ICLR] The Ingredients of Real World Robotic Reinforcement Learning, [paper]
[arXiv] Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation, [paper]
[arXiv] Spatial Action Maps for Mobile Manipulation, [paper]
[arXiv] Learning Precise 3D Manipulation from Multiple Uncalibrated Cameras, [paper]
[arXiv] The Surprising Effectiveness of Linear Models for Visual Foresight in Object Pile Manipulation, [paper]
[arXiv] Learning Pregrasp Manipulation of Objects from Ungraspable Poses, [paper]
[arXiv] Deep Reinforcement Learning for Autonomous Driving: A Survey, [paper]
[arXiv] Lyceum: An efficient and scalable ecosystem for robot learning, [paper]
[arXiv] Planning an Efficient and Robust Base Sequence for a Mobile Manipulator Performing Multiple Pick-and-place Tasks, [paper]
[arXiv] Reward Engineering for Object Pick and Place Training, [paper]
2019:
[arXiv] Towards Practical Multi-Object Manipulation using Relational Reinforcement Learning, [paper] [project] [code]
[ROBIO] Efficient Robotic Task Generalization Using Deep Model Fusion Reinforcement Learning, [paper]
[arXiv] Contextual Reinforcement Learning of Visuo-tactile Multi-fingered Grasping Policies, [paper]
[IROS] Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity, [paper]
[arXiv] IRIS: Implicit Reinforcement without Interaction at Scale for Learning Control from Offline Robot Manipulation Data, [paper]
[arXiv] Dynamic Cloth Manipulation with Deep Reinforcement Learning, [paper]
[CoRL] Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning, [paper] [project]
[CoRL] Asynchronous Methods for Model-Based Reinforcement Learning, [paper]
[CoRL] Entity Abstraction in Visual Model-Based Reinforcement Learning, [paper]
[CoRL] Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation, [paper] [project]
[arXiv] Contextual Imagined Goals for Self-Supervised Robotic Learning, [paper]
[arXiv] Learning to Manipulate Deformable Objects without Demonstrations, [paper] [project]
[arXiv] A Deep Learning Approach to Grasping the Invisible, [paper]
[arXiv] Knowledge Induced Deep Q-Network for a Slide-to-Wall Object Grasping, [paper]
[arXiv] Quantile QT-Opt for Risk-Aware Vision-Based Robotic Grasping, [paper]
[arXiv] Adaptive Curriculum Generation from Demonstrations for Sim-to-Real Visuomotor Control, [paper]
[arXiv] Reinforcement Learning for Robotic Manipulation using Simulated Locomotion Demonstrations, [paper]
[arXiv] Self-Supervised Sim-to-Real Adaptation for Visual Robotic Manipulation, [paper]
[arXiv] Object Perception and Grasping in Open-Ended Domains, [paper]
[CoRL] ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, [paper] [code]
[RSS] End-to-End Robotic Reinforcement Learning without Reward Engineering, [paper]
[arXiv] Learning to combine primitive skills: A step towards versatile robotic manipulation, [paper]
[CoRL] A Survey on Reproducibility by Evaluating Deep Reinforcement Learning Algorithms on Real-World Robots, [paper] [code]
[ICCAS] Deep Reinforcement Learning Based Robot Arm Manipulation with Efficient Training Data through Simulation, [paper]
[CVPR] CRAVES: Controlling Robotic Arm with a Vision-based Economic System, [paper] [code]
[Report] A Unified Framework for Manipulating Objects via Reinforcement Learning, [paper]
2018:
[IROS] Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning, [paper] [code]
[CoRL] QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, [paper]
[arXiv] Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods, [paper]
[arXiv] Pick and Place Without Geometric Object Models, [paper]
2017:
[arXiv] Deep Reinforcement Learning for Robotic Manipulation - The state of the art, [paper]
2016:
[IJRR] Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection, [paper]
2013:
[IJRR] Reinforcement learning in robotics: A survey, [paper]
12. Experts
Abhinav Gupta(CMU & FAIR): Robotics, machine learning
Andreas ten Pas(Northeastern University): Robotic Grasping, Deep Learning, Simulation-based Planning
Andy Zeng(Princeton University & Google Brain Robotics): 3D Deep Learning, Robotic Grasping
Animesh Garg(University of Toronto): Robotics, Reinforcement Learning
Bugra Tekin(Microsoft MR): Pose Estimation
Cewu Lu(SJTU): Machine Vision
Charles Ruizhongtai Qi(Waymo(Google)): 3D Deep Learning
Danfei Xu(Stanford University): Robotics, Computer Vision
Dieter Fox(Nvidia & University of Washington): Robotics, Artificial intelligence, State Estimation
Fei-Fei Li(Stanford University): Computer Vision
Guofeng Zhang(ZJU): 3D Vision, SLAM
Hao Su(UC San Diego): 3D Deep Learning
Jeannette Bohg(Stanford University): Perception for robotic manipulation and grasping
Jianping Shi(SenseTime): Computer Vision
Juxi Leitner(Australian Centre of Excellence for Robotic Vision (ACRV)): Robotic grasping
Lerrel Pinto(UC Berkeley): Robotics
Lorenzo Jamone(Queen Mary University of London (QMUL)): Cognitive Robotics
Lorenzo Natale(Italian Institute of Technology): Humanoid robotic sensing and perception
Kaiming He(Facebook AI Research (FAIR)): Deep Learning
Kai Xu(NUDT): Graphics, Geometry
Ken Goldberg(UC Berkeley): Robotics
Marc Pollefeys(Microsoft & ETH): Computer Vision
Markus Vincze(Technical University Wien (TUW)): Robotic Vision
Matthias Nießner(TUM): 3D reconstruction, Semantic 3D Scene Understanding
Oliver Brock(TU Berlin): Robotic manipulation
Pascal Fua(EPFL): Computer Vision
Peter K. Allen(Columbia University): Robotic Grasping, 3-D vision, Modeling, Medical robotics
Peter Corke(Queensland University of Technology): Robotic vision
Pieter Abbeel(UC Berkeley): Artificial Intelligence, Advanced Robotics
Raquel Urtasun(Uber ATG & University of Toronto): AI for self-driving cars, Computer Vision, Robotics
Robert Platt(Northeastern University): Robotic manipulation
Ruigang Yang(Baidu): Computer Vision, Robotics
Sergey Levine(UC Berkeley): Reinforcement Learning
Shuran Song(Columbia University): 3D Deep Learning, Robotics
Silvio Savarese(Stanford University): Computer Vision
Song-Chun Zhu(UCLA): Computer Vision
Tamim Asfour(Karlsruhe Institute of Technology (KIT)): Humanoid Robotics
Thomas Funkhouser(Princeton University): Geometry, Graphics, Shape
Valerio Ortenzi(University of Birmingham): Robotic vision
Vincent Lepetit(University of Bordeaux): Machine Learning, 3D Vision
Xiaogang Wang(Chinese University of Hong Kong): Deep Learning, Computer Vision
Xiaozhi Chen(DJI): Deep learning
Yan Xinchen(Uber ATG): Deep Representation Learning, Generative Modeling
Yasutaka Furukawa(SFU): 3D Reconstruction
Yu Xiang(Nvidia): Robotics, Computer Vision
Yue Wang(MIT): 3D Deep Learning