Awesome-Self-Supervised-Papers
A collection of papers on self-supervised learning and representation learning.
Last update: 2021-09-26.
- Added papers that combine self-supervised learning with distillation (SEED, CompRess, DisCo, DoGo, SimDis, ...).
- Added a dense prediction paper (SoCo).
Contributions and comments are welcome.
Computer Vision (CV)
Pretraining / Feature / Representation
Contrastive Learning
Dense Contrastive Learning
Conference / Journal | Paper | AP(bbox) @COCO | AP(mask) @COCO |
---|---|---|---|
NeurIPS 2020 | Unsupervised Learning of Dense Visual Representations | 39.2 | 35.6 |
arXiv:2011.09157 | Dense Contrastive Learning for Self-Supervised Visual Pre-Training | 40.3 | 36.4 |
arXiv:2011.10043 | Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning | 41.4 | 37.4 |
arXiv:2102.08318 | Instance Localization for Self-supervised Detection Pretraining | 42.0 | 37.6 |
arXiv:2103.06122 | Spatially Consistent Representation Learning | 41.3 | 37.7 |
arXiv:2103.10957 | Efficient Visual Pretraining with Contrastive Detection | 42.7 (DetCon_B) | 38.2 (DetCon_B) |
arXiv:2106.02637 | Aligning Pretraining for Detection via Object-Level Contrastive Learning | 43.2 | 38.4 |
Image Transformation
Conference / Journal | Paper | ImageNet Acc (Top-1) |
---|---|---|
ECCV 2016 | Colorful Image Colorization (Colorization) | 39.6% |
ECCV 2016 | Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles | 45.7% |
CVPR 2018 | Unsupervised Feature Learning via Non-Parametric Instance Discrimination (NPID, NPID++) | NPID: 54.0%, NPID++: 59.0% |
CVPR 2018 | Boosting Self-Supervised Learning via Knowledge Transfer (Jigsaw++) | - |
CVPR 2020 | Self-Supervised Learning of Pretext-Invariant Representations (PIRL) | 63.6% |
CVPR 2020 | Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics | - |
arXiv:2003.04298 | Multi-modal Self-Supervision from Generalized Data Transformations | - |
Self-supervised learning with Knowledge Distillation
Others (in Pretraining / Feature / Representation)
Identification / Verification / Classification / Recognition
Conference / Journal | Paper | Datasets | Performance |
---|---|---|---|
CVPR 2020 | Real-world Person Re-Identification via Degradation Invariance Learning | MLR-CUHK03 | Acc: 85.7 (R@1) |
CVPR 2020 | Spatially Attentive Output Layer for Image Classification | ImageNet | Acc: 81.01 (Top-1) |
CVPR 2020 | Look-into-Object: Self-supervised Structure Modeling for Object Recognition | ImageNet | Top-1 err: 22.87 |
Segmentation / Depth Estimation
Conference / Journal | Paper | Datasets | Performance |
---|---|---|---|
CVPR 2020 | Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation | VOC 2012 | mIoU: 64.5 |
CVPR 2020 | Towards Better Generalization: Joint Depth-Pose Learning without PoseNet | KITTI 2015 | F1: 18.05% |
IROS 2020 | Monocular Depth Estimation with Self-supervised Instance Adaptation | KITTI 2015 | Abs Rel: 0.074 |
CVPR 2020 | Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera | - | - |
CVPR 2020 | Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision | GTA5 → Cityscapes | mIoU: 46.3 |
CVPR 2020 | D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry | - | - |
CVPR 2020 | Self-Supervised Human Depth Estimation from Monocular Videos | - | - |
arXiv:2009.07714 | Calibrating Self-supervised Monocular Depth Estimation | KITTI | Abs Rel: 0.113 |
Detection / Localization
Conference / Journal | Paper | Datasets | Performance |
---|---|---|---|
CVPR 2020 | Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection | VOC 2012 | AP50: 67.0 |
Generation
Conference / Journal | Paper | Task |
---|---|---|
CVPR 2020 | StyleRig: Rigging StyleGAN for 3D Control over Portrait Images | Portrait Images |
ICLR 2020 | From Inference to Generation: End-to-End Fully Self-Supervised Generation of Human Face from Speech | Generate human face from speech |
ACM MM 2020 | Neutral Face Game Character Auto-Creation via PokerFace-GAN | - |
ICLR 2021 (under review) | Self-Supervised Variational Auto-Encoders | FID: 34.71 (CIFAR-10) |
Video
Conference / Journal | Paper | Task | Performance | Datasets |
---|---|---|---|---|
TPAMI | A Review on Deep Learning Techniques for Video Prediction | Video prediction review | - | - |
CVPR 2020 | Distilled Semantics for Comprehensive Scene Understanding from Videos | Scene Understanding | Sq Rel: 0.748 | KITTI 2015 |
CVPR 2020 | Self-Supervised Learning of Video-Induced Visual Invariances | Representation Learning | - | - |
ECCV 2020 | Video Representation Learning by Recognizing Temporal Transformations | Representation Learning | 26.1% (Video Retrieval Top-1) | UCF101 |
arXiv:2008.02531 | Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | Representation Learning | 42.4% (Video Retrieval Top-1) | UCF101 |
NeurIPS 2020 | Space-Time Correspondence as a Contrastive Random Walk | Contrastive Learning | 64.8 (Region Similarity) | DAVIS 2017 |
Others
Natural Language Processing (NLP)
Conference / Journal | Paper | Datasets | Performance |
---|---|---|---|
arXiv:2004.03808 | Improving BERT with Self-Supervised Attention | GLUE | Avg: 79.3 (BERT-SSA-H) |
arXiv:2004.07159 | PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation | MS MARCO | 0.498 (ROUGE-L) |
ACL 2020 | TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition | - | - |
arXiv:1909.11942 | ALBERT: A Lite BERT For Self-Supervised Learning of Language Representations | GLUE | Avg: 89.4 |
AAAI 2020 | Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models | - | - |
ACL 2020 | Contrastive Self-Supervised Learning for Commonsense Reasoning | PDP-60 | 90.0% |
Speech
Conference / Journal | Paper | Datasets | Performance |
---|---|---|---|
arXiv:1910.05453v3 | vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations | nov92 | WER: 2.34 |
arXiv:1911.03912v2 | Effectiveness of Self-Supervised Pre-Training for Speech Recognition | LibriSpeech | WER: 4.0 |
ICASSP 2020 | Generative Pre-Training for Speech with Autoregressive Predictive Coding | - | - |
Interspeech 2020 | Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition | IEMOCAP | Emotion Acc: 75.458% |
Graph
Conference / Journal | Paper | Datasets | Performance |
---|---|---|---|
arXiv:2009.05923 | Contrastive Self-supervised Learning for Graph Classification | PROTEINS | A3-specific: 85.80 |
arXiv:2102.13085 | Towards Robust Graph Contrastive Learning | Cora, Citeseer, Pubmed | Acc: 82.4 (Cora, GCA-DE) |
Reinforcement Learning
Conference / Journal | Paper | Performance |
---|---|---|
arXiv:2009.05923 | Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning | BiC-catch: 821±17 (Random Initialization / DrQ+PSEs) |