Awesome-Self-Supervised-Papers

Collecting papers on self-supervised learning and representation learning.

Last update: 2021. 09. 26.

  • Added papers that handle self-supervised learning with distillation (SEED, CompRess, DisCo, DoGo, SimDis, ...).
  • Added a dense prediction paper (SoCo).

Any contributions or comments are welcome.

Computer Vision (CV)

Pretraining / Feature / Representation

Contrastive Learning

| Conference / Journal | Paper | ImageNet Acc (Top-1) |
| --- | --- | --- |
| CVPR 2006 | Dimensionality Reduction by Learning an Invariant Mapping | - |
| arXiv:1807.03748 | Representation Learning with Contrastive Predictive Coding (CPC) | - |
| arXiv:1911.05722 | Momentum Contrast for Unsupervised Visual Representation Learning (MoCo) | 60.6 % |
| arXiv:1905.09272 | Data-Efficient Image Recognition with Contrastive Predictive Coding (CPC v2) | 63.8 % |
| arXiv:1906.05849 | Contrastive Multiview Coding (CMC) | 66.2 % |
| arXiv:2002.05709 | A Simple Framework for Contrastive Learning of Visual Representations (SimCLR) | 69.3 % |
| arXiv:2003.12338 | Improved Baselines with Momentum Contrastive Learning (MoCo v2) | 71.1 % |
| arXiv:2003.05438 | Rethinking Image Mixture for Unsupervised Visual Representation Learning | 65.9 % |
| arXiv:2004.05554 | Feature Lenses: Plug-and-play Neural Modules for Transformation-Invariant Visual Representations | - |
| arXiv:2006.10029 | Big Self-Supervised Models are Strong Semi-Supervised Learners (SimCLR v2) | 77.5 % (10 % labels) |
| arXiv:2006.07733 | Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL) | 74.3 % |
| arXiv:2006.09882 | Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (SwAV) | 75.3 % |
| arXiv:2008.05659 | What Should Not Be Contrastive in Contrastive Learning | 80.2 % (ImageNet-100) |
| arXiv:2007.00224 | Debiased Contrastive Learning | 74.6 % (ImageNet-100) |
| arXiv:2009.00104 | A Framework for Contrastive Self-Supervised Learning and Designing a New Approach | - |
| ICLR 2021 (under review) | Self-Supervised Representation Learning via Adaptive Hard-Positive Mining | 72.3 % (ResNet-50 (4x): 77.3 %) |
| IEEE Access | Contrastive Representation Learning: A Framework and Review | review paper |
| arXiv:2010.01929 | EqCo: Equivalent Rules for Self-Supervised Contrastive Learning | 68.5 % (proposed) / 66.6 % (SimCLR), 200 epochs |
| arXiv:2010.01028 | Hard Negative Mixing for Contrastive Learning | 68.0 %, 200 epochs |
| arXiv:2011.10566 | Exploring Simple Siamese Representation Learning (SimSiam) | 68.1 %, 100 epochs, batch 256 |
| arXiv:2010.06682 | Are All Negatives Created Equal in Contrastive Instance Discrimination? | - |
| arXiv:2101.05224 | Big Self-Supervised Models Advance Medical Image Classification | AUC: 0.7729 (SimCLR, ImageNet → CheXpert, ResNet-152 (2x)) |
| arXiv:2012.08850 | Contrastive Learning Inverts the Data Generating Process | theoretical foundation for contrastive learning |
| arXiv:2103.01988 | Self-supervised Pretraining of Visual Features in the Wild | 83.8 % (693M params), 84.2 % (1.3B params), fine-tuned |
| arXiv:2103.03230 | Barlow Twins: Self-Supervised Learning via Redundancy Reduction | 73.2 % |
| arXiv:2104.02057 | An Empirical Study of Training Self-Supervised Vision Transformers (MoCo v3) | 81.0 % |
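Most of the instance-discrimination entries above (SimCLR, MoCo, and their descendants) train on a variant of the InfoNCE objective: embeddings of two augmented views of the same image are pulled together, and every other image in the batch serves as a negative. Below is a minimal NT-Xent sketch in PyTorch, assuming `(N, D)` embeddings from any encoder; the function name and temperature value are illustrative, not taken from any paper's released code.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent / InfoNCE loss over a batch of paired views (SimCLR-style).

    z1, z2: (N, D) embeddings of two augmentations of the same N images.
    Row i's positive is its other view; all remaining rows are negatives.
    """
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
    sim = z @ z.t() / temperature                       # (2N, 2N) scaled cosine sims
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    # Positive for row i sits at row i+N (first half) or row i-N (second half).
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Toy usage: random vectors standing in for the two views' encoder outputs.
loss = nt_xent(torch.randn(8, 128), torch.randn(8, 128))
```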

Dense Contrastive Learning

| Conference / Journal | Paper | AP (bbox) @ COCO | AP (mask) @ COCO |
| --- | --- | --- | --- |
| NeurIPS 2020 | Unsupervised Learning of Dense Visual Representations | 39.2 | 35.6 |
| arXiv:2011.09157 | Dense Contrastive Learning for Self-Supervised Visual Pre-Training | 40.3 | 36.4 |
| arXiv:2011.10043 | Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning | 41.4 | 37.4 |
| arXiv:2102.08318 | Instance Localization for Self-supervised Detection Pretraining | 42.0 | 37.6 |
| arXiv:2103.06122 | Spatially Consistent Representation Learning | 41.3 | 37.7 |
| arXiv:2103.10957 | Efficient Visual Pretraining with Contrastive Detection | 42.7 (DetCon_B) | 38.2 (DetCon_B) |
| arXiv:2106.02637 | Aligning Pretraining for Detection via Object-Level Contrastive Learning (SoCo) | 43.2 | 38.4 |

Image Transformation

| Conference / Journal | Paper | ImageNet Acc (Top-1) |
| --- | --- | --- |
| ECCV 2016 | Colorful Image Colorization (Colorization) | 39.6 % |
| ECCV 2016 | Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles | 45.7 % |
| CVPR 2018 | Unsupervised Feature Learning via Non-Parametric Instance Discrimination (NPID, NPID++) | NPID: 54.0 %, NPID++: 59.0 % |
| CVPR 2018 | Boosting Self-Supervised Learning via Knowledge Transfer (Jigsaw++) | - |
| CVPR 2020 | Self-Supervised Learning of Pretext-Invariant Representations (PIRL) | 63.6 % |
| CVPR 2020 | Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics | - |
| arXiv:2003.04298 | Multi-modal Self-Supervision from Generalized Data Transformations | - |
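A common thread in this group is turning an image transformation into free labels: the jigsaw papers above, for example, shuffle image tiles under a fixed permutation vocabulary and train the network to predict which permutation was applied. Here is a minimal data-preparation sketch along those lines, using an illustrative 2x2 grid (the papers use a 3x3 grid with a Hamming-distance-selected subset of the 9! permutations); the helper name is invented for this example.

```python
import itertools
import random
import torch

# Permutation vocabulary: each class index is one tile ordering. A 2x2 grid
# (24 permutations) keeps the sketch small; the jigsaw papers use 3x3 tiles.
PERMS = list(itertools.permutations(range(4)))

def jigsaw_example(img):
    """Cut a (C, H, W) image into 2x2 tiles, shuffle them, return the label."""
    c, h, w = img.shape
    th, tw = h // 2, w // 2
    tiles = [img[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(2) for j in range(2)]
    label = random.randrange(len(PERMS))                      # applied permutation
    shuffled = torch.stack([tiles[k] for k in PERMS[label]])  # (4, C, th, tw)
    return shuffled, label

tiles, label = jigsaw_example(torch.randn(3, 224, 224))
# A network embeds each tile, aggregates the embeddings, and classifies `label`.
```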

Self-supervised learning with Knowledge Distillation

| Conference / Journal | Paper | Method |
| --- | --- | --- |
| NeurIPS 2020 | CompRess: Self-Supervised Learning by Compressing Representations | Similarity distribution + memory bank |
| ICLR 2021 | SEED: Self-Supervised Distillation for Visual Representation | Similarity distribution + memory bank |
| arXiv:2104.09124 | DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning | Contrastive learning w/ teacher model |
| arXiv:2104.09866 | Distill on the Go: Online Knowledge Distillation in Self-Supervised Learning (DoGo) | Contrastive learning w/ teacher model |
| arXiv:2104.14294 | Emerging Properties in Self-Supervised Vision Transformers (DINO) | Self-distillation w/ teacher model |
| ICLR 2022 | iBOT: Image BERT Pre-Training with Online Tokenizer | Self-distillation w/ teacher model + masked image modeling |
| arXiv:2106.11304 | Simple Distillation Baselines for Improving Small Self-supervised Models (SimDis) | Contrastive learning w/ teacher model + multi-view loss |
| arXiv:2107.01691 | Bag of Instances Aggregation Boosts Self-supervised Learning | Bag aggregation |
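The two "Similarity distribution + memory bank" rows (CompRess, SEED) share one idea: rather than contrasting the student against negatives directly, the student is trained to reproduce the teacher's similarity distribution over a bank of stored teacher embeddings. A hedged sketch of that objective follows, assuming L2-normalized embeddings; the tensor names and temperatures are placeholders rather than either paper's actual hyperparameters.

```python
import torch
import torch.nn.functional as F

def similarity_distillation(student, teacher, bank, t_s=0.1, t_t=0.05):
    """KL-match the student's similarity distribution over a memory bank to
    the teacher's (a simplified CompRess/SEED-style objective).

    student, teacher: (N, D) L2-normalized embeddings of the same batch.
    bank: (K, D) L2-normalized teacher embeddings from past batches.
    """
    p_teacher = F.softmax(teacher @ bank.t() / t_t, dim=1)         # target distribution
    log_p_student = F.log_softmax(student @ bank.t() / t_s, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean")

# Toy usage with random unit vectors standing in for encoder outputs.
s = F.normalize(torch.randn(8, 128), dim=1)
t = F.normalize(torch.randn(8, 128), dim=1)
bank = F.normalize(torch.randn(4096, 128), dim=1)
loss = similarity_distillation(s, t, bank)
```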

Others (in Pretraining / Feature / Representation)

| Conference / Journal | Paper | Method |
| --- | --- | --- |
| ICLR 2018 | Unsupervised Representation Learning by Predicting Image Rotations | Surrogate classes, pre-training |
| ICML 2018 | Mutual Information Neural Estimation | Mutual information |
| NeurIPS 2019 | Wasserstein Dependency Measure for Representation Learning | Mutual information |
| ICLR 2019 | Learning Deep Representations by Mutual Information Estimation and Maximization | Mutual information |
| arXiv:1903.12355 | Local Aggregation for Unsupervised Learning of Visual Embeddings | Local aggregation |
| arXiv:1906.00910 | Learning Representations by Maximizing Mutual Information Across Views | Mutual information |
| arXiv:1907.02544 | Large Scale Adversarial Representation Learning (BigBiGAN) | Adversarial training |
| ICLR 2020 | On Mutual Information Maximization for Representation Learning | Mutual information |
| CVPR 2020 | How Useful is Self-Supervised Pretraining for Visual Tasks? | - |
| CVPR 2020 | Adversarial Robustness: From Self-Supervised Pre-Training to Fine-Tuning | Adversarial training |
| ICLR 2020 | Self-Labeling via Simultaneous Clustering and Representation Learning | Self-labeling, clustering |
| arXiv:1912.11370 | Big Transfer (BiT): General Visual Representation Learning | Pre-training |
| arXiv:2009.07724 | Evaluating Self-Supervised Pretraining Without Using Labels | Pre-training |
| arXiv:2010.00578 | Understanding Self-Supervised Learning with Dual Deep Networks | Dual deep networks |
| ICLR 2021 (under review) | Representation Learning via Invariant Causal Mechanisms | Causal mechanisms |
| arXiv:2006.06882 | Rethinking Pre-training and Self-training | Rethinking |
| arXiv:2102.12903 | Self-Tuning for Data-Efficient Deep Learning | Data-efficient deep learning |
| arXiv:2102.10106 | Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction | Find similar samples |
| ECCV 2020 | Mitigating Embedding and Class Assignment Mismatch in Unsupervised Image Classification | Feature embedding & refining |
| arXiv:2102.11150 | Improving Unsupervised Image Clustering With Robust Learning | Pseudo-labels, clustering |
| CVPR 2021 | How Well Do Self-Supervised Models Transfer? | Benchmarking |
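The first row above (rotation prediction, often called RotNet) is the simplest surrogate-class setup in this table: each image is rotated by 0, 90, 180, or 270 degrees, and a classifier must predict the rotation. A minimal batch-construction sketch; the four-way rotation comes from the paper, while the helper name is invented for illustration.

```python
import torch

def rotation_batch(images):
    """Expand a (N, C, H, W) batch into 4N rotated copies plus rotation labels.

    Labels 0..3 correspond to 0/90/180/270-degree rotations, i.e. the four
    surrogate classes of rotation prediction; torch.rot90 rotates in-plane.
    """
    rotated = torch.cat([torch.rot90(images, k, dims=(2, 3)) for k in range(4)])
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return rotated, labels

x, y = rotation_batch(torch.randn(8, 3, 32, 32))
# x: (32, 3, 32, 32); a standard classification head is trained on y.
```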

Identification / Verification / Classification / Recognition

| Conference / Journal | Paper | Datasets | Performance |
| --- | --- | --- | --- |
| CVPR 2020 | Real-world Person Re-Identification via Degradation Invariance Learning | MLR-CUHK03 | Acc: 85.7 (R@1) |
| CVPR 2020 | Spatially Attentive Output Layer for Image Classification | ImageNet | Acc: 81.01 (Top-1) |
| CVPR 2020 | Look-into-Object: Self-supervised Structure Modeling for Object Recognition | ImageNet | Top-1 err: 22.87 |

Segmentation / Depth Estimation

| Conference / Journal | Paper | Datasets | Performance |
| --- | --- | --- | --- |
| CVPR 2020 | Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation | VOC 2012 | mIoU: 64.5 |
| CVPR 2020 | Towards Better Generalization: Joint Depth-Pose Learning without PoseNet | KITTI 2015 | F1: 18.05 % |
| IROS 2020 | Monocular Depth Estimation with Self-supervised Instance Adaptation | KITTI 2015 | Abs Rel: 0.074 |
| CVPR 2020 | Novel View Synthesis of Dynamic Scenes with Globally Coherent Depths from a Monocular Camera | - | - |
| CVPR 2020 | Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-Supervision | GTA5 → Cityscapes | mIoU: 46.3 |
| CVPR 2020 | D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry | - | - |
| CVPR 2020 | Self-Supervised Human Depth Estimation from Monocular Videos | - | - |
| arXiv:2009.07714 | Calibrating Self-supervised Monocular Depth Estimation | KITTI | Abs Rel: 0.113 |

Detection / Localization

| Conference / Journal | Paper | Datasets | Performance |
| --- | --- | --- | --- |
| CVPR 2020 | Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection | VOC 2012 | AP(50): 67.0 |

Generation

| Conference / Journal | Paper | Task |
| --- | --- | --- |
| CVPR 2020 | StyleRig: Rigging StyleGAN for 3D Control over Portrait Images | Portrait images |
| ICLR 2020 | From Inference to Generation: End-to-End Fully Self-Supervised Generation of Human Face from Speech | Generate human face from speech |
| ACM MM 2020 | Neutral Face Game Character Auto-Creation via PokerFace-GAN | Game character auto-creation |
| ICLR 2021 (under review) | Self-Supervised Variational Auto-Encoders | FID: 34.71 (CIFAR-10) |

Video

| Conference / Journal | Paper | Task | Performance | Datasets |
| --- | --- | --- | --- | --- |
| TPAMI | A Review on Deep Learning Techniques for Video Prediction | Video prediction review | - | - |
| CVPR 2020 | Distilled Semantics for Comprehensive Scene Understanding from Videos | Scene understanding | Sq Rel: 0.748 | KITTI 2015 |
| CVPR 2020 | Self-Supervised Learning of Video-Induced Visual Invariances | Representation learning | - | - |
| ECCV 2020 | Video Representation Learning by Recognizing Temporal Transformations | Representation learning | 26.1 % (video retrieval Top-1) | UCF101 |
| arXiv:2008.02531 | Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework | Representation learning | 42.4 % (video retrieval Top-1) | UCF101 |
| NeurIPS 2020 | Space-Time Correspondence as a Contrastive Random Walk | Contrastive learning | 64.8 (region similarity) | DAVIS 2017 |

Others

| Conference / Journal | Paper | Task | Performance |
| --- | --- | --- | --- |
| CVPR 2020 | Flow2Stereo: Effective Self-Supervised Learning of Optical Flow and Stereo Matching | Optical flow | F1: 7.63 % (KITTI 2012) |
| CVPR 2020 | Self-Supervised Viewpoint Learning From Image Collections | Viewpoint learning | MAE: 4.0 (BIWI) |
| CVPR 2020 | Self-Supervised Scene De-occlusion | Occlusion removal | mAP: 29.3 % (KINS) |
| CVPR 2020 | Distilled Semantics for Comprehensive Scene Understanding from Videos | Scene understanding | - |
| CVPR 2020 | Learning by Analogy: Reliable Supervision from Transformations for Unsupervised Optical Flow Estimation | Optical flow | F1: 11.79 % (KITTI 2015) |
| CVPR 2020 | D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features | 3D local features | - |
| CVPR 2020 | SpeedNet: Learning the Speediness in Videos | Predicting video "speediness" | - |
| CVPR 2020 | Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation | Action segmentation | F1@10: 83.0 (GTEA) |
| CVPR 2020 | MVP: Unified Motion and Visual Self-Supervised Learning for Large-Scale Robotic Navigation | Robotic navigation | - |
| arXiv:2003.06734 | Active Perception and Representation for Robotic Manipulation | Robot manipulation | - |
| arXiv:2005.01655 | Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions | Visual referring expressions | - |
| arXiv:2004.11362 | Supervised Contrastive Learning | Supervised contrastive learning | ImageNet Acc: 80.8 (Top-1) |
| arXiv:2007.14449 | Learning from Scale-Invariant Examples for Domain Adaptation in Semantic Segmentation | Domain adaptation | GTA5 → Cityscapes: 47.5 (mIoU) |
| arXiv:2007.12360 | On the Effectiveness of Image Rotation for Open Set Domain Adaptation | Domain adaptation | - |
| arXiv:2003.12283 | LIMP: Learning Latent Shape Representations with Metric Preservation Priors | Generative models | - |
| arXiv:2004.04312 | Learning to Scale Multilingual Representations for Vision-Language Tasks | Vision-language | MSCOCO: 81.5 |
| arXiv:2003.08934 | NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis | View synthesis | - |
| arXiv:2001.01536 | Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification | Knowledge distillation, long-tailed classification | - |
| arXiv:2006.07114 | Knowledge Distillation Meets Self-Supervision | Knowledge distillation | ResNet-50 → MobileNetV2, Acc: 72.57 (Top-1) |
| AAAI 2020 | Fast and Robust Face-to-Parameter Translation for Game Character Auto-Creation | Game character auto-creation | - |
| arXiv:2009.07719 | Domain-invariant Similarity Activation Map Metric Learning for Retrieval-based Long-term Visual Localization | Similarity activation map | - |
| arXiv:2008.10312 | Self-Supervised Learning for Large-Scale Unsupervised Image Clustering | Image clustering | ImageNet Acc: 38.60 (cluster assignment) |
| ICLR 2021 (under review) | SSD: A Unified Framework for Self-Supervised Outlier Detection | Outlier detection | CIFAR-10/CIFAR-100: 94.1 % (in/out) |

Natural Language Processing (NLP)

| Conference / Journal | Paper | Datasets | Performance |
| --- | --- | --- | --- |
| arXiv:2004.03808 | Improving BERT with Self-Supervised Attention | GLUE | Avg: 79.3 (BERT-SSA-H) |
| arXiv:2004.07159 | PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation | MS MARCO | 0.498 (Rouge-L) |
| ACL 2020 | TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition | - | - |
| arXiv:1909.11942 | ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations | GLUE | Avg: 89.4 |
| AAAI 2020 | Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models | - | - |
| ACL 2020 | Contrastive Self-Supervised Learning for Commonsense Reasoning | PDP-60 | 90.0 % |

Speech

| Conference / Journal | Paper | Datasets | Performance |
| --- | --- | --- | --- |
| arXiv:1910.05453v3 | vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations | nov92 | WER: 2.34 |
| arXiv:1911.03912v2 | Effectiveness of Self-Supervised Pre-training for Speech Recognition | Librispeech | WER: 4.0 |
| ICASSP 2020 | Generative Pre-Training for Speech with Autoregressive Predictive Coding | - | - |
| Interspeech 2020 | Jointly Fine-Tuning "BERT-like" Self Supervised Models to Improve Multimodal Speech Emotion Recognition | IEMOCAP | Emotion Acc: 75.458 % |

Graph

| Conference / Journal | Paper | Datasets | Performance |
| --- | --- | --- | --- |
| arXiv:2009.05923 | Contrastive Self-supervised Learning for Graph Classification | PROTEINS | A3-specific: 85.80 |
| arXiv:2102.13085 | Towards Robust Graph Contrastive Learning | Cora, Citeseer, Pubmed | Acc: 82.4 (Cora, GCA-DE) |

Reinforcement Learning

| Conference / Journal | Paper | Performance |
| --- | --- | --- |
| arXiv:2009.05923 | Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning | BiC-catch: 821 ± 17 (Random Initialization / DrQ+PSEs) |