• Stars: 252
  • Rank 155,467 (Top 4 %)
  • License: MIT License
  • Created almost 3 years ago
  • Updated 9 months ago

Repository Details

My collection of machine learning papers

ML Papers

Reviews

  1. 191210 Thoughts on recent papers
  2. 200323 Thoughts on recent papers
  3. 200326 Thoughts on recent papers
  4. 200403 Thoughts on recent papers
  5. 200411 Thoughts on recent papers
  6. 200708 Thoughts on recent papers
  7. 200717 Thoughts on recent papers
  8. 200726 Thoughts on recent papers
  9. 200802 Thoughts on recent papers
  10. 201118 Thoughts on recent papers
  11. 201120 Thoughts on recent papers
  12. 201125 Thoughts on recent papers
  13. 201126 Thoughts on recent papers 1
  14. 201126 Thoughts on recent papers 2
  15. 201204 Thoughts on recent papers
  16. 210121 Thoughts on recent papers
  17. 210121 Thoughts on recent papers
  18. 210305 Thoughts on recent papers
  19. 210319 Thoughts on recent papers
  20. 210323 Thoughts on recent papers
  21. 210326 Thoughts on recent papers
  22. 210403 Thoughts on recent papers
  23. 210412 Thoughts on recent papers
  24. 210424 Thoughts on recent papers
  25. 210429 Thoughts on recent papers
  26. 210430 Thoughts on recent papers 1
  27. 210430 Thoughts on recent papers
  28. 210505 Thoughts on recent papers
  29. 210508 Thoughts on recent papers
  30. 230222 Review of datasets needed for LLMs

Table of contents

  1. 3d generative model
  2. activation
  3. active learning
  4. adaptation
  5. adapter
  6. adversarial training
  7. antialiasing
  8. asr
  9. attention
  10. audio generation
  11. audio source separation
  12. augmentation
  13. autoregressive model
  14. backbone
  15. bayesian
  16. bert
  17. bias
  18. calibration
  19. causality
  20. channel attention
  21. chat
  22. classification
  23. computation
  24. continual learning
  25. contrastive learning
  26. convolution
  27. dataset
  28. ddpm
  29. decoding
  30. deep prior
  31. detr
  32. dewarping
  33. dialog
  34. differentiable operator
  35. differentiable tree
  36. discrete vae
  37. disentangle
  38. distillation
  39. distributed training
  40. domain adaptation
  41. dropout
  42. efficiency
  43. efficient attention
  44. efficient training
  45. embedding
  46. end2end
  47. energy based model
  48. ensemble
  49. federated learning
  50. few shot
  51. finetuning
  52. flow
  53. fpn
  54. gan
  55. gan inversion
  56. generalization
  57. generative model
  58. graph
  59. hallucination
  60. hypernetwork
  61. hyperparameter
  62. identifiability
  63. image editing
  64. image generation
  65. img2img
  66. implicit model
  67. implicit representation
  68. in context learning
  69. instance segmentation
  70. instruct
  71. interpolation
  72. knowledge base
  73. language generation
  74. language model
  75. layout
  76. lightweight
  77. line
  78. llm
  79. lm
  80. local attention
  81. loss
  82. loss surface
  83. matting
  84. memory
  85. meta learning
  86. metric
  87. metric learning
  88. mixture of experts
  89. mixup
  90. mlm
  91. mlops
  92. multilingual
  93. multimodal
  94. multimodal generation
  95. multitask
  96. nas
  97. nerf
  98. neural computer
  99. neural ode
  100. neural rendering
  101. nlp
  102. nmt
  103. non autoregressive
  104. norm free
  105. normalization
  106. object detection
  107. ocr
  108. open set recognition
  109. optimization
  110. optimizer
  111. oriented object detection
  112. out of distribution
  113. panoptic segmentation
  114. perceptual loss
  115. point cloud
  116. pooling
  117. pose
  118. positional encoding
  119. practice
  120. pretraining
  121. probabilistic model
  122. prompt
  123. pruning
  124. qa
  125. quantization
  126. reasoning
  127. regularization
  128. reinforcement learning
  129. rendering
  130. representation
  131. resampling
  132. restoration
  133. retrieval
  134. review
  135. robustness
  136. saliency
  137. salient object detection
  138. scale
  139. score
  140. self supervised
  141. self supervised discovery
  142. semantic factor
  143. semantic segmentation
  144. semi supervised learning
  145. sgld
  146. singing voice synthesis
  147. single image
  148. speech
  149. state space model
  150. structure learning
  151. style transfer
  152. stylegan
  153. super resolution
  154. table
  155. text generation
  156. text2img
  157. tokenizer
  158. topic model
  159. topology
  160. tracking
  161. training
  162. transducer
  163. transfer
  164. transformer
  165. tropical geometry
  166. tts
  167. uncertainty
  168. unsupervised img2img
  169. unsupervised nmt
  170. vae
  171. video
  172. video transformer
  173. vision
  174. vision language
  175. vision transformer
  176. visual grounding
  177. vit
  178. vocoder
  179. vqa
  180. weak supervision
  181. yolo
  182. uncategorized

3d generative model

  1. 211220 3D-aware Image Synthesis via Learning Structural and Textural Representations
  2. 220615 GRAM-HD
  3. 220621 EpiGRAF
  4. 221125 3DDesigner #text2img
  5. 221126 AvatarGen
  6. 230209 In-N-Out #gan_inversion
  7. 230216 3D-aware Conditional Image Synthesis

activation

  1. 201019 Smooth activations and reproducibility in deep networks #stability

active learning

  1. 200630 Similarity Search for Efficient Active Learning and Search of Rare
  2. 210729 Batch Active Learning at Scale

adaptation

  1. 200129 Side-Tuning
  2. 200130 Once for All #deploy

adapter

  1. 210608 Compacter
  2. 220524 AdaMix #moe

adversarial training

  1. 200130 Adversarial Examples Improve Image Recognition
  2. 200625 Smooth Adversarial Training

antialiasing

  1. 201120 An Effective Anti-Aliasing Approach for Residual Networks
  2. 201128 Truly shift-invariant convolutional neural networks

asr

  1. 200220 Imputer #non-autoregressive #ctc
  2. 200506 RNN-T Models Fail to Generalize to Out-of-Domain Audio #transducer #out_of_distribution #domain #regularization
  3. 200510 Listen Attentively, and Spell Once #non-autoregressive
  4. 200516 Large scale weakly and semi-supervised learning for low-resource video ASR #weak_supervision #semi_supervised_learning
  5. 200516 Reducing Spelling Inconsistencies in Code-Switching ASR using #ctc
  6. 200516 Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition #non-autoregressive
  7. 200518 Attention-based Transducer for Online Speech Recognition #transducer
  8. 200518 Iterative Pseudo-Labeling for Speech Recognition
  9. 200519 Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition #ctc
  10. 200519 Improved Noisy Student Training for Automatic Speech Recognition #semi_supervised_learning
  11. 200729 Developing RNN-T Models Surpassing High-Performance Hybrid Models with #rnn_t
  12. 201021 FastEmit #transducer #decoding
  13. 201027 CASS-NAT #non-autoregressive
  14. 201125 Streaming end-to-end multi-talker speech recognition #transducer
  15. 210524 Unsupervised Speech Recognition #unsupervised_training
  16. 210608 SpeechBrain
  17. 211012 Word Order Does Not Matter For Speech Recognition #weak_supervision
  18. 211030 Pseudo-Labeling for Massively Multilingual Speech Recognition #semi_supervised_learning #multilingual
  19. 211210 Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition #moe
  20. 220829 A Language Agnostic Multilingual Streaming On-Device ASR System #multilingual
  21. 220922 Whisper

attention

  1. 200122 Object Contextual Representations #semantic_segmentation
  2. 200129 Empirical Attention
  3. 200130 Axial Attention #generative_model
  4. 200130 Criss-Cross Attention #semantic_segmentation
  5. 200212 Capsules with Inverted Dot-Product Attention Routing #capsule
  6. 200219 Tree-structured Attention with Hierarchical Accumulation #parse
  7. 200226 Sparse Sinkhorn Attention #sparse_attention
  8. 200317 Axial-DeepLab #panoptic_segmentation
  9. 200404 Neural Architecture Search for Lightweight Non-Local Networks
  10. 200421 Attention is Not Only a Weight #bert
  11. 200423 Self-Attention Attribution #bert
  12. 200428 Exploring Self-attention for Image Recognition
  13. 200510 CTC-synchronous Training for Monotonic Attention Model #asr #ctc
  14. 200516 Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory #asr #memory
  15. 200519 Normalized Attention Without Probability Cage
  16. 200519 Staying True to Your Word
  17. 200626 Object-Centric Learning with Slot Attention
  18. 201119 On the Dynamics of Training Attention Models #training
  19. 210223 Linear Transformers Are Secretly Fast Weight Memory Systems #linear_attention #efficient_attention
  20. 210225 LazyFormer #bert
  21. 210517 Pay Attention to MLPs #mlp
  22. 210524 Self-Attention Networks Can Process Bounded Hierarchical Languages #nlp
  23. 210826 Train Short, Test Long #positional_encoding

audio generation

  1. 220220 It's Raw! Audio Generation with State-Space Models
  2. 230126 MusicLM
  3. 230208 Noise2Music

audio source separation

  1. 211019 The Cocktail Fork Problem

augmentation

  1. 200122 FixMatch #semi_supervised_learning #manifold #mixup
  2. 200220 Affinity and Diversity
  3. 200621 AdvAug #mixup #nlp #adversarial_training
  4. 200710 Meta-Learning Requires Meta-Augmentation #metalearning
  5. 201117 Sequence-Level Mixed Sample Data Augmentation #nlp
  6. 201213 Simple Copy-Paste is a Strong Data Augmentation Method for Instance #instance_segmentation
  7. 201214 Improving Panoptic Segmentation at All Scales #panoptic_segmentation
  8. 210318 AlignMix #mixup
  9. 210318 TrivialAugment
  10. 210429 Ensembling with Deep Generative Views #ensemble #gan_inversion
  11. 220830 Augraphy

autoregressive model

  1. 200129 Semi Autoregressive Training
  2. 201027 Scaling Laws for Autoregressive Generative Modeling #scale
  3. 211216 Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling
  4. 220622 Scaling Autoregressive Models for Content-Rich Text-to-Image Generation #image_generation
  5. 230202 Accelerating Large Language Model Decoding with Speculative Sampling #decoding

backbone

  1. 190724 MixNet #convolution
  2. 200123 Antialiasing #invariance
  3. 200128 Attentive Normalization
  4. 200128 IBN-Net
  5. 200128 Selective Kernel
  6. 200128 SpineNet
  7. 200128 Squeeze-Excitation
  8. 200128 Switchable Normalization
  9. 200128 Switchable Whitening
  10. 200129 Assembled Techniques #regularization
  11. 200129 DenseNet
  12. 200129 Dual Path Networks
  13. 200129 HarDNet
  14. 200129 PyramidNet
  15. 200129 SelecSLS
  16. 200129 ShuffleNet V2 #efficiency
  17. 200129 VoVNet
  18. 200130 FishNet
  19. 200130 HRNet
  20. 200130 MixConv #convolution
  21. 200330 Designing Network Design Spaces #hypernetwork
  22. 200330 TResNet #antialiasing
  23. 200419 ResNeSt
  24. 200630 Deep Isometric Learning for Visual Recognition #normalization #resnet #cnn #norm_free
  25. 200712 PSConv #cnn #multiscale
  26. 201015 HS-ResNet #multiscale
  27. 201221 FcaNet #channel_attention
  28. 210226 Transformer in Transformer #vision_transformer
  29. 210304 Barlow Twins #self_supervised #contrastive_learning
  30. 210310 Involution #convolution #attention
  31. 210312 Revisiting ResNets #resnet
  32. 210317 Learning to Resize Images for Computer Vision Tasks #resizing
  33. 210331 EfficientNetV2
  34. 210408 SI-Score #robustness #vision_transformer
  35. 210505 RepMLP #mlp
  36. 210506 Do You Even Need Attention #mlp
  37. 210510 ResMLP #mlp
  38. 210617 Layer Folding #efficiency #pruning
  39. 210628 Early Convolutions Help Transformers See Better #cnn #vit
  40. 210718 AS-MLP #mlp
  41. 210726 Contextual Transformer Networks for Visual Recognition
  42. 211014 Non-deep Networks
  43. 211018 HRFormer #vit
  44. 211227 Augmenting Convolutional networks with attention-based aggregation #vit #cnn
  45. 220110 A ConvNet for the 2020s #cnn #vit
  46. 220313 Scaling Up Your Kernels to 31x31
  47. 220318 Three things everyone should know about Vision Transformers #vit
  48. 220728 HorNet #cnn

bayesian

  1. 200207 Bayes Posterior
  2. 200210 Liberty or Depth #mean_field
  3. 200514 Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors #ensemble #variational_inference

bert

  1. 200305 What the [MASK]
  2. 200405 FastBERT #distillation #lightweight
  3. 200408 DynaBERT #distillation #pruning
  4. 200412 XtremeDistil #distillation #lightweight
  5. 200427 DeeBERT #lightweight
  6. 200518 Audio ALBERT #audio #representation
  7. 200601 Amnesic Probing
  8. 200608 On the Stability of Fine-tuning BERT #finetuning
  9. 200610 Revisiting Few-sample BERT Fine-tuning #finetuning
  10. 210906 An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models #few_shot #knowledge_base #prompt
  11. 210907 Beyond Preserved Accuracy #lightweight #distillation

bias

  1. 200519 Identifying Statistical Bias in Dataset Replication
  2. 201202 Learning from others' mistakes #product_of_experts
  3. 220919 The Biased Artist #image_generation

calibration

  1. 200221 Calibrating Deep Neural Networks using Focal Loss #loss
  2. 200223 Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks #bayesian
  3. 200620 Regression Prior Networks
  4. 210730 Soft Calibration Objectives for Neural Networks

causality

  1. 200518 An Analysis of the Adaptation Speed of Causal Models

channel attention

  1. 200129 GCNet

chat

  1. 200630 PLATO-2 #text_gen #chatbot

classification

  1. 220107 Generalized Category Discovery #open_set_recognition

computation

  1. 200213 Training Large Neural Networks with Constant Memory using a New Execution Algorithm
  2. 201204 Nimble

continual learning

  1. 201124 Energy-Based Models for Continual Learning #energy_based_model
  2. 211103 One Pass ImageNet #online_learning

contrastive learning

  1. 200213 A Simple Framework for Contrastive Learning of Visual Representations #augmentation
  2. 200309 Improved Baselines with Momentum Contrastive Learning
  3. 200423 Supervised Contrastive Learning #metric_learning
  4. 200511 Prototypical Contrastive Learning of Unsupervised Representations
  5. 200520 What Makes for Good Views for Contrastive Learning
  6. 200613 Bootstrap your own latent
  7. 200630 Debiased Contrastive Learning
  8. 200730 Contrastive Learning for Unpaired Image-to-Image Translation #img2img
  9. 200803 LoCo
  10. 201020 BYOL works even without batch statistics
  11. 201109 Towards Domain-Agnostic Contrastive Learning #mixup #multimodal
  12. 201116 AdCo #adversarial_training
  13. 201117 Dense Contrastive Learning for Self-Supervised Visual Pre-Training
  14. 201119 Heterogeneous Contrastive Learning
  15. 201119 Propagate Yourself
  16. 201121 Run Away From your Teacher
  17. 201123 Boosting Contrastive Self-Supervised Learning with False Negative
  18. 201126 Beyond Single Instance Multi-view Unsupervised Representation Learning #self_supervised #mixup
  19. 201126 How Well Do Self-Supervised Models Transfer #self_supervised #transfer
  20. 201127 Self-EMD
  21. 201201 Towards Good Practices in Self-supervised Representation Learning #self_supervised
  22. 201204 Seed the Views #mixup
  23. 201212 Contrastive Learning for Label-Efficient Semantic Segmentation #semantic_segmentation
  24. 201221 Online Bag-of-Visual-Words Generation for Unsupervised Representation #self_supervised #discrete_vae
  25. 201226 Spatial Contrastive Learning for Few-Shot Classification #few_shot #attention
  26. 210325 Rethinking Self-Supervised Learning #training
  27. 210405 An Empirical Study of Training Self-Supervised Vision Transformers #vision_transformer
  28. 210426 Multimodal Contrastive Training for Visual Representation Learning #multimodal
  29. 210429 A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning #video
  30. 210429 Emerging Properties in Self-Supervised Vision Transformers #saliency #vision_transformer #representation
  31. 210429 With a Little Help from My Friends #knn
  32. 210510 Self-Supervised Learning with Swin Transformers #vision_transformer
  33. 210511 VICReg
  34. 210517 Divide and Contrast #self_supervised #dataset #distillation
  35. 210601 Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task
  36. 211018 Understanding Dimensional Collapse in Contrastive Self-supervised Learning
  37. 220701 e-CLIP #vision-language #retrieval
  38. 220727 Contrastive Masked Autoencoders are Stronger Vision Learners #self_supervised #mlm
  39. 220804 Fine-Grained Semantically Aligned Vision-Language Pre-Training #vision-language
  40. 221017 Non-Contrastive Learning Meets Language-Image Pre-Training #clip

convolution

  1. 200316 SlimConv
  2. 210429 Decoupled Dynamic Filter Networks
  3. 230221 Hyena Hierarchy #state_space_model

dataset

  1. 200509 Building a Manga Dataset
  2. 201130 Image Quality Assessment for Perceptual Image Restoration #score
  3. 201201 Weakly-Supervised Arbitrary-Shaped Text Detection with #ocr #weak_supervision
  4. 210601 Comparing Test Sets with Item Response Theory
  5. 210907 Datasets
  6. 210927 PASS
  7. 211103 LAION-400M
  8. 220704 How Much More Data Do I Need
  9. 230220 Poisoning Web-Scale Training Datasets is Practical

ddpm

  1. 200619 Denoising Diffusion Probabilistic Models
  2. 201126 Score-Based Generative Modeling through Stochastic Differential #generative_model
  3. 201214 Learning Energy-Based Models by Diffusion Recovery Likelihood #energy_based_model
  4. 210302 Fixing Data Augmentation to Improve Adversarial Robustness #augmentation #generative_model
  5. 210305 Fixing Data Augmentation to Improve Adversarial Robustness 2 #robustness #augmentation #generative_model
  6. 210506 DiffSinger #singing_voice_synthesis
  7. 210511 Diffusion Models Beat GANs on Image Synthesis
  8. 210528 Gotta Go Fast When Generating Data with Score-Based Models
  9. 210531 On Fast Sampling of Diffusion Probabilistic Models
  10. 210607 Learning to Efficiently Sample from Diffusion Probabilistic Models
  11. 210610 Cascaded Diffusion Models for High Fidelity Image Generation
  12. 210610 Score-based Generative Modeling in Latent Space
  13. 210612 D2C
  14. 210701 Variational Diffusion Models
  15. 210802 SDEdit
  16. 210819 ImageBART #vq #autoregressive_model
  17. 211129 Blended Diffusion for Text-driven Editing of Natural Images #clip #image_editing
  18. 211130 Diffusion Autoencoders
  19. 211220 GLIDE #multimodal
  20. 211220 High-Resolution Image Synthesis with Latent Diffusion Models #vae #vq
  21. 220201 Progressive Distillation for Fast Sampling of Diffusion Models #distillation
  22. 220316 Dual Diffusion Implicit Bridges for Image-to-Image Translation
  23. 220524 Imagen #conditional_generative_model
  24. 220601 Elucidating the Design Space of Diffusion-Based Generative Models
  25. 220803 Pyramidal Denoising Diffusion Probabilistic Models
  26. 220808 Analog Bits
  27. 220912 Blurring Diffusion Models
  28. 220912 Soft Diffusion
  29. 220929 DreamFusion #3d_generative_model
  30. 221017 Imagic #image_editing
  31. 221018 Differentially Private Diffusion Models
  32. 221102 eDiffi #text2img
  33. 221115 Versatile Diffusion #vae
  34. 221117 Null-text Inversion for Editing Real Images using Guided Diffusion Models #image_editing
  35. 221118 Magic3D #3d_generative_model #text2img #nerf
  36. 221120 Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models #text2img
  37. 221124 Fast Sampling of Diffusion Models via Operator Learning
  38. 230126 On the Importance of Noise Scheduling for Diffusion Models
  39. 230126 simple diffusion
  40. 230131 Attend-and-Excite #text2img
  41. 230205 Design Booster #image_editing
  42. 230206 Zero-shot Image-to-Image Translation #image_editing
  43. 230207 Long Horizon Temperature Scaling #calibration #lm
  44. 230208 Q-Diffusion #quantization
  45. 230212 I$^2$SB #sde #image_restoration
  46. 230215 PRedItOR #image_editing
  47. 230216 MultiDiffusion #image_editing
  48. 230220 Composer #image_editing
  49. 230221 Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels #semi_supervised_learning #self_supervised
  50. 230221 On Calibrating Diffusion Probabilistic Models

decoding

  1. 200516 Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning
  2. 200601 Cascaded Text Generation with Markov Transformers #text_generation
  3. 210608 FastSeq

deep prior

  1. 200408 Deep Manifold Prior

detr

  1. 210813 Conditional DETR for Fast Training Convergence
  2. 220726 Group DETR #efficient_training

dewarping

  1. 211025 DocTr
  2. 211028 DocScanner

dialog

  1. 200129 Meena #NLP
  2. 210715 Beyond Goldfish Memory
  3. 220120 LaMDA

differentiable operator

  1. 200220 Fast Differentiable Sorting and Ranking

differentiable tree

  1. 200218 The Tree Ensemble Layer

discrete vae

  1. 200518 Robust Training of Vector Quantized Bottleneck Models

disentangle

  1. 200130 ID-GAN #GAN
  2. 200130 MixNMatch #conditional_generative_model
  3. 200515 Face Identity Disentanglement via Latent Space Mapping

distillation

  1. 200129 Learning by Cheating
  2. 200209 Understanding and Improving Knowledge Distillation
  3. 200210 Subclass Distillation
  4. 200219 Knapsack Pruning with Inner Distillation #pruning #lightweight
  5. 200221 Residual Knowledge Distillation
  6. 200309 Knowledge distillation via adaptive instance normalization #normalization
  7. 200521 Why distillation helps #calibration
  8. 200629 An EM Approach to Non-autoregressive Conditional Sequence Generation #non-autoregressive
  9. 200701 Go Wide, Then Narrow #lightweight
  10. 200702 Interactive Knowledge Distillation
  11. 210726 Text is Text, No Matter What #multitask

distributed training

  1. 210510 GSPMD
  2. 230121 SuperScaler

domain adaptation

  1. 200526 Keep it Simple

dropout

  1. 200701 On Dropout, Overfitting, and Interaction Effects in Deep Neural Networks

efficiency

  1. 230130 Alternating Updates for Efficient Transformers

efficient attention

  1. 200410 Longformer
  2. 200412 ProFormer
  3. 200605 Masked Language Modeling for Proteins via Linearly Scalable Long-Context
  4. 200608 Linformer
  5. 210324 Finetuning Pretrained Transformers into RNNs
  6. 210505 Beyond Self-attention
  7. 210510 Poolingformer
  8. 210603 Luna
  9. 210623 Stable, Fast and Accurate
  10. 210705 Long-Short Transformer #local_attention
  11. 210712 Combiner #sparse_attention #local_attention
  12. 210725 H-Transformer-1D
  13. 211210 Self-attention Does Not Need $O(n^2)$ Memory
  14. 220527 FlashAttention
  15. 220726 DETRs with Hybrid Matching #detr
  16. 220911 On The Computational Complexity of Self-Attention
  17. 220921 Mega

efficient training

  1. 230216 Decoupled Model Schedule for Deep Learning Training #distributed_training

embedding

  1. 200424 All Word Embeddings from One Embedding
  2. 200717 A Unifying Perspective on Neighbor Embeddings along the
  3. 210907 Rare Words Degenerate All Words

end2end

  1. 200605 End-to-End Adversarial Text-to-Speech #tts
  2. 200608 FastSpeech 2 #tts
  3. 201106 Wave-Tacotron #tts
  4. 210716 Autonomy 2.0
  5. 211215 SPTS

energy based model

  1. 200504 How to Train Your Energy-Based Model for Regression

ensemble

  1. 200217 BatchEnsemble

federated learning

  1. 210415 See through Gradients

few shot

  1. 200228 AdarGCN #graph
  2. 210608 Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks #adapter #multitask
  3. 210910 LibFewShot
  4. 220715 Plex #uncertainty #generalization

finetuning

  1. 200214 AutoLR #pruning
  2. 200426 Masking as an Efficient Alternative to Finetuning for Pretrained
  3. 200709 Sample-based Regularization #transfer

flow

  1. 200220 Regularized Autoencoders via Relaxed Injective Probability Flow
  2. 200227 Woodbury Transformations for Deep Generative Flows

fpn

  1. 200122 CARAFE #resampling
  2. 200129 Mixture FPN
  3. 200506 Scale-Equalizing Pyramid Convolution for Object Detection
  4. 201201 Dynamic Feature Pyramid Networks for Object Detection
  5. 201202 Dual Refinement Feature Pyramid Networks for Object Detection
  6. 201202 Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate
  7. 201225 Implicit Feature Pyramid Network for Object Detection #equilibrium_model #implicit_model

gan

  1. 170629 Do GANs actually learn the distribution
  2. 191022 MelGAN #tts
  3. 200129 Adversarial Lipschitz Regularization
  4. 200129 GAN generalization metric
  5. 200129 OneGAN
  6. 200130 AttentionGAN #attention #img2img
  7. 200130 Evaluation metrics of GAN #metric #evaluation #generative_model
  8. 200130 Local GAN #attention
  9. 200130 Noise Robust GAN #robustness
  10. 200130 Small-GAN
  11. 200130 Smoothness and Stability in GANs
  12. 200206 Unbalanced GANs #vae
  13. 200210 Unsupervised Discovery of Interpretable Directions in the GAN Latent #semantic_factor
  14. 200211 Improved Consistency Regularization for GANs #augmentation #consistency_regularization
  15. 200211 Smoothness and Stability in GANs #regularization
  16. 200212 Image-to-Image Translation with Text Guidance #multimodal #multimodal_generation #img2img
  17. 200212 Real or Not Real, that is the Question
  18. 200214 Top-k Training of GANs #regularization
  19. 200220 The Benefits of Pairwise Discriminators for Adversarial Training #regularization
  20. 200223 GANHopper #img2img
  21. 200224 When Relation Networks meet GANs #regularization
  22. 200225 Freeze the Discriminator #finetuning #transfer
  23. 200226 On Leveraging Pretrained GANs for Generation with Limited Data #finetuning #transfer
  24. 200227 Topology Distance #topology #score
  25. 200228 A U-Net Based Discriminator for Generative Adversarial Networks
  26. 200304 Creating High Resolution Images with a Latent Adversarial Generator #generative_model #super_resolution
  27. 200308 Perceptual Image Super-Resolution with Progressive Adversarial Network #super_resolution
  28. 200312 Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling #energy_based_model #sampling
  29. 200317 Blur, Noise, and Compression Robust Generative Adversarial Networks #noise
  30. 200318 OpenGAN #metric_learning
  31. 200325 Improved Techniques for Training Single-Image GANs #single_image
  32. 200326 Image Generation Via Minimizing Fréchet Distance in Discriminator Feature Space
  33. 200402 Controllable Orthogonalization in Training DNNs #regularization
  34. 200404 Feature Quantization Improves GAN Training #discrete_vae
  35. 200405 Discriminator Contrastive Divergence
  36. 200407 Inclusive GAN
  37. 200408 Attentive Normalization for Conditional Image Generation #attention
  38. 200504 Transforming and Projecting Images into Class-conditional Generative #generative_model
  39. 200518 Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization #audio_generation
  40. 200519 CIAGAN
  41. 200519 Regularization Methods for Generative Adversarial Networks #review #regularization
  42. 200604 Image Augmentations for GAN Training #augmentation
  43. 200611 Training Generative Adversarial Networks with Limited Data #augmentation
  44. 200618 Differentiable Augmentation for Data-Efficient GAN Training #augmentation
  45. 200618 Diverse Image Generation via Self-Conditioned GANs #generative_model
  46. 200630 PriorGAN
  47. 200708 InfoMax-GAN #regularization
  48. 200713 Closed-Form Factorization of Latent Semantics in GANs #semantic_factor
  49. 200729 Instance Selection for GANs
  50. 200729 VocGAN #vocoder
  51. 200730 Rewriting a Deep Generative Model
  52. 200804 Open-Edit #image_editing
  53. 200807 Improving the Speed and Quality of GAN by Adversarial Training #robustness
  54. 201028 Training Generative Adversarial Networks by Solving Ordinary #neural_ode
  55. 201109 Learning Semantic-aware Normalization for Generative Adversarial Networks #normalization
  56. 201109 Towards a Better Global Loss Landscape of GANs #training
  57. 201118 Style Intervention #semantic_factor
  58. 201124 Adversarial Generation of Continuous Images #implicit_representation
  59. 201125 How to train your conditional GAN #img2img #generative_model
  60. 201125 Omni-GAN #generative_model
  61. 201127 Image Generators with Conditionally-Independent Pixel Synthesis #implicit_representation
  62. 201201 Refining Deep Generative Models via Discriminator Gradient Flow #sampling
  63. 201201 pi-GAN #implicit_representation
  64. 201203 Self-labeled Conditional GANs #unsupervised_training
  65. 201204 A Note on Data Biases in Generative Models #bias #generative_model
  66. 201208 You Only Need Adversarial Supervision for Semantic Image Synthesis #img2img
  67. 210227 Ultra-Data-Efficient GAN Training #augmentation #few_shot
  68. 210317 Training GANs with Stronger Augmentations via Contrastive Discriminator #contrastive_learning #augmentation
  69. 210318 Drop the GAN #single_image #generative_model #patch
  70. 210330 Dual Contrastive Loss and Attention for GANs #contrastive_learning
  71. 210401 Partition-Guided GANs
  72. 210407 Regularizing Generative Adversarial Networks under Limited Data #regularization
  73. 210408 InfinityGAN
  74. 210413 DatasetGAN #few_shot
  75. 210413 Few-shot Image Generation via Cross-domain Correspondence #img2img #generative_model #few_shot
  76. 210414 Aligning Latent and Image Spaces to Connect the Unconnectable
  77. 210415 GANcraft #nerf
  78. 210422 On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation #antialiasing
  79. 210426 EigenGAN #semantic_factor
  80. 210608 Data-Efficient Instance Generation from Instance Discrimination #contrastive_learning
  81. 210614 Improved Transformer for High-Resolution GANs #transformer #efficient_training
  82. 210623 Alias-Free Generative Adversarial Networks #antialiasing
  83. 210910 Instance-Conditioned GAN
  84. 210927 WarpedGANSpace
  85. 211017 AE-StyleGAN #gan_inversion
  86. 211101 Projected GANs Converge Faster
  87. 211215 Efficient Geometry-aware 3D Generative Adversarial Networks #nerf
  88. 211216 GRAM #3d_generative_model #nerf
  89. 220201 StyleGAN-XL
  90. 220219 Truncated Diffusion Probabilistic Models #generative_model #ddpm
  91. 220224 Self-Distilled StyleGAN
  92. 220311 The Role of ImageNet Classes in Fréchet Inception Distance
  93. 220314 InsetGAN for Full-Body Image Generation #pose
  94. 220414 Any-resolution Training for High-resolution Image Synthesis
  95. 230123 StyleGAN-T #text2img

gan inversion

  1. 200330 Exploiting Deep Generative Prior for Versatile Image Restoration and #perceptual_loss
  2. 200331 In-Domain GAN Inversion for Real Image Editing
  3. 200703 Collaborative Learning for Faster StyleGAN Embedding
  4. 200803 Encoding in Style #stylegan
  5. 220223 Near Perfect GAN Inversion

generalization

  1. 200130 Fantastic Generalization Measures
  2. 200225 Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

generative model

  1. 190325 Implicit Generation and Generalization in Energy-Based Models #energy_based_model
  2. 200129 Controlling Generative Model
  3. 200129 Deep Automodulator
  4. 200129 Frechet Joint Distance
  5. 200129 Spot CNN generated image
  6. 200130 BIVA
  7. 200130 Glow #flow
  8. 200130 IGEBM #energy_based_model
  9. 200130 Neural Spline Flows #flow
  10. 200130 VQ-VAE-2 #autoregressive_model
  11. 200217 Augmented Normalizing Flows #flow
  12. 200313 Semantic Pyramid for Image Generation #perceptual_loss #image_editing
  13. 200616 Improved Techniques for Training Score-Based Generative Models #ncsn
  14. 201117 DeepNAG
  15. 201202 Improved Contrastive Divergence Training of Energy Based Models #energy_based_model
  16. 201204 Few-shot Image Generation with Elastic Weight Consolidation #few_shot #continual_learning
  17. 201209 Positional Encoding as Spatial Inductive Bias in GANs #positional_encoding
  18. 201224 Soft-IntroVAE #vae
  19. 210223 Zero-Shot Text-to-Image Generation #discrete_vae #autoregressive_model #multimodal
  20. 210318 Few-shot Semantic Image Synthesis Using StyleGAN Prior #stylegan #few_shot
  21. 210824 SimVLM #vision-language
  22. 211015 MaGNET #sampling
  23. 220208 MaskGIT #autoregressive_model #non-autoregressive #vq

graph

  1. 200129 Multi-Graph Transformer

hallucination

  1. 210413 The Curious Case of Hallucinations in Neural Machine Translation #mt

hypernetwork

  1. 200722 WeightNet #channel_attention

hyperparameter

  1. 200425 Learning to Guide Random Search
  2. 200521 HyperSTAR

identifiability

  1. 200701 On Linear Identifiability of Learned Representations

image editing

  1. 200515 Semantic Photo Manipulation with a Generative Image Prior
  2. 201123 HistoGAN
  3. 201127 Navigating the GAN Parameter Space for Semantic Image Editing #semantic_factor
  4. 210318 Using latent space regression to analyze and leverage compositionality
  5. 220531 IDE-3D #3d_generative_model
  6. 220802 An Image is Worth One Word
  7. 220802 Prompt-to-Prompt Image Editing with Cross Attention Control
  8. 230202 Dreamix #video
  9. 230213 3D-aware Blending with Generative NeRFs #3d_generative_model

image generation

  1. 200426 Disentangled Image Generation Through Structured Noise Injection

img2img

  1. 200130 FUNIT
  2. 200305 SketchyCOCO
  3. 200315 GMM-UNIT #multimodal_generation
  4. 200319 High-Resolution Daytime Translation Without Domain Labels
  5. 200330 Semi-supervised Learning for Few-shot Image-to-Image Translation #semi_supervised_learning #few_shot
  6. 200406 Rethinking Spatially-Adaptive Normalization #lightweight
  7. 200409 TuiGAN #few_shot #single_image
  8. 200419 TriGAN #domain_adaptation
  9. 200702 Deep Single Image Manipulation #single_image #image_editing
  10. 200709 Improving Style-Content Disentanglement in Image-to-Image Translation #disentangle
  11. 200714 COCO-FUNIT
  12. 200715 Transformation Consistency Regularization -- A Semi-Supervised Paradigm #augmentation #semi_supervised_learning
  13. 200723 TSIT
  14. 200724 The Surprising Effectiveness of Linear Unsupervised Image-to-Image
  15. 201203 CoCosNet v2 #patch #pose
  16. 201205 Spatially-Adaptive Pixelwise Networks for Fast Image Translation #implicit_representation

implicit model

  1. 200615 Multiscale Deep Equilibrium Models

implicit representation

  1. 210408 Modulated Periodic Activations for Generalizable Local Functional #positional_encoding #periodic_activation
  2. 210506 ACORN #positional_encoding
  3. 211026 NeRV
  4. 211122 Neural Fields in Visual Computing and Beyond
  5. 220117 Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
  6. 220522 ReLU Fields
  7. 230202 Factor Fields

in context learning

  1. 220520 Prototypical Calibration for Few-shot Learning of Language Models
  2. 220522 Instruction Induction

instance segmentation

  1. 200129 BlendMask
  2. 200129 COCO 2018 Instance Segmentation #challenge
  3. 200129 Deep Snake
  4. 200130 PointRend
  5. 200311 Conditional Convolutions for Instance Segmentation
  6. 200313 PointINS #dynamic_conv
  7. 200722 Deep Variational Instance Segmentation
  8. 200730 LevelSet R-CNN
  9. 201119 DCT-Mask
  10. 201119 Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 #panoptic_segmentation #dynamic_conv
  11. 201126 The Devil is in the Boundary
  12. 201129 End-to-End Video Instance Segmentation with Transformers #end2end #detr #video
  13. 201203 BoxInst #dataset #weak_supervision
  14. 210503 ISTR #end2end
  15. 210505 QueryInst #end2end
  16. 210604 SOLQ
  17. 210713 Per-Pixel Classification is Not All You Need for Semantic Segmentation #panoptic_segmentation #semantic_segmentation #detr
  18. 221110 OneFormer #semantic_segmentation #panoptic_segmentation #detr

instruct

  1. 230131 The Flan Collection

interpolation

  1. 200804 Autoencoder Image Interpolation by Shaping the Latent Space
  2. 211018 Learning in High Dimension Always Amounts to Extrapolation #extrapolation

knowledge base

  1. 200214 Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base

language generation

  1. 200712 Do You Have the Right Scissors
  2. 200729 Mirostat

language model

  1. 200128 Scaling Laws for LM
  2. 200205 K-Adapter #multitask #adapter
  3. 200206 Consistency of a Recurrent Language Model With Respect to Incomplete #decoding #hallucination #language_generation
  4. 200222 Training Question Answering Models From Synthetic Data #qa #bert
  5. 200225 MiniLM #distillation #lightweight
  6. 200406 Sparse Text Generation #language_generation #sampling
  7. 200427 Recall and Learn #finetuning #continual_learning
  8. 200505 Stolen Probability
  9. 200516 MicroNet for Efficient Language Modeling #lightweight
  10. 200518 Contextual Embeddings
  11. 201015 Fine-Tuning Pre-trained Language Model with Weak Supervision #transfer #weak_supervision
  12. 201023 Rethinking embedding coupling in pre-trained language models #regularization
  13. 201201 How Can We Know When Language Models Know #qa #calibration
  14. 201228 Universal Sentence Representation Learning with Conditional Masked #sentence_embedding #mlm
  15. 210216 Non-Autoregressive Text Generation with Pre-trained Language Models #non-autoregressive #text_generation
  16. 210318 GPT Understands, Too #finetuning #prompt
  17. 210407 Revisiting Simple Neural Probabilistic Language Models
  18. 210420 Carbon Emissions and Large Neural Network Training #nlp
  19. 210922 Recursively Summarizing Books with Human Feedback #summarization

layout

  1. 210601 Incorporating Visual Layout Structures for Scientific Text Classification
  2. 210902 Skim-Attention
  3. 220418 LayoutLMv3
  4. 220517 MATrIX -- Modality-Aware Transformer for Information eXtraction
  5. 220912 PreSTU
  6. 220918 ERNIE-mmLayout

lightweight

  1. 200624 Neural Architecture Design for GPU-Efficient Networks
  2. 201124 MicroNet
  3. 210507 Pareto-Optimal Quantized ResNet Is Mostly 4-bit #quantization
  4. 220409 Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs

line

  1. 210601 Towards Real-time and Light-weight Line Segment Detection

llm

  1. 220521 Scaling Laws and Interpretability of Learning from Repeated Data
  2. 220522 Memorization Without Overfitting
  3. 220524 Large Language Models are Zero-Shot Reasoners #prompt
  4. 220711 Exploring Length Generalization in Large Language Models
  5. 220711 Language Models (Mostly) Know What They Know
  6. 220926 Can Large Language Models Truly Understand Prompts
  7. 220929 Compositional Semantic Parsing with Large Language Models #semantic_parsing
  8. 221017 Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them #prompt #reasoning
  9. 221020 Transcending Scaling Laws with 0.1% Extra Compute #mlm
  10. 221103 Inverse scaling can become U-shaped #prompt
  11. 221109 BLOOM
  12. 221109 Efficiently Scaling Transformer Inference #efficiency
  13. 221118 PAL #prompt
  14. 221118 SmoothQuant #quantization
  15. 230124 A Watermark for Large Language Models
  16. 230126 DetectGPT
  17. 230131 Faithful Chain-of-Thought Reasoning #prompt
  18. 230131 Grounding Language Models to Images for Multimodal Generation #multimodal_generation #vision-language
  19. 230131 Large Language Models Can Be Easily Distracted by Irrelevant Context #in_context_learning
  20. 230209 Toolformer
  21. 230211 Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models #retrieval
  22. 230215 Learning Performance-Improving Code Edits #in_context_learning
  23. 230215 The Capacity for Moral Self-Correction in Large Language Models #instruct #ethics
  24. 230216 Pretraining Language Models with Human Preferences #instruct #alignment
  25. 230221 ChatGPT #instruct

lm

  1. 210524 StructuralLM #layout
  2. 210524 True Few-Shot Learning with Language Models #few_shot
  3. 210528 ByT5
  4. 210617 LoRA #adapter #finetuning
  5. 210623 Charformer #tokenizer
  6. 210714 Deduplicating Training Data Makes Language Models Better #corpus
  7. 210714 HTLM
  8. 210811 DEMix Layers #mixture_of_experts
  9. 210813 Curriculum Learning #curriculum
  10. 210816 On the Opportunities and Risks of Foundation Models
  11. 210902 Do Prompt-Based Models Really Understand the Meaning of their Prompts #prompt
  12. 210903 Finetuned Language Models Are Zero-Shot Learners #zero-shot
  13. 210908 A Recipe For Arbitrary Text Style Transfer with Large Language Models #prompt
  14. 211011 Unsupervised Neural Machine Translation with Generative Language Models Only #unsupervised_nmt
  15. 211015 Multitask Prompted Training Enables Zero-Shot Task Generalization #zero-shot
  16. 211016 Invariant Language Modeling #irm
  17. 211016 MarkupLM #layout
  18. 211016 Sharpness-Aware Minimization Improves Language Model Generalization #regularization
  19. 211020 Shaking the foundations #causality
  20. 211027 Training Verifiers to Solve Math Word Problems
  21. 211213 GLaM #moe
  22. 211220 Efficient Large Scale Language Modeling with Mixtures of Experts #mixture_of_experts
  23. 220210 Red Teaming Language Models with Language Models #safety
  24. 220213 A Contrastive Framework for Neural Text Generation #decoding
  25. 220215 General-purpose, long-context autoregressive modeling with Perceiver AR #efficient_attention #autoregressive_model
  26. 220314 Efficient Language Modeling with Sparse all-MLP #mlp
  27. 220329 Training Compute-Optimal Large Language Models
  28. 220413 METRO
  29. 220414 GPT-NeoX-20B
  30. 220502 OPT
  31. 220524 On the Role of Bidirectionality in Language Model Pre-Training #bert
  32. 220728 Efficient Training of Language Models to Fill in the Middle #mlm
  33. 220805 Branch-Train-Merge #product_of_experts #ensemble
  34. 220805 Few-shot Learning with Retrieval Augmented Language Model #retrieval #few_shot
  35. 221110 The CRINGE Loss #safety
  36. 230131 In-Context Retrieval-Augmented Language Models #retrieval

local attention

  1. 210323 Scaling Local Self-Attention for Parameter Efficient Visual Backbones

loss

  1. 200712 It Is Likely That Your Loss Should be a Likelihood

loss surface

  1. 210225 Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

matting

  1. 200401 Background Matting
  2. 201123 Is a Green Screen Really Necessary for Real-Time Portrait Matting

memory

  1. 200206 Product Kanerva Machines

meta learning

  1. 200221 Learning to Continually Learn #continual_learning
  2. 200312 Online Fast Adaptation and Knowledge Accumulation
  3. 200401 Editable Neural Networks
  4. 200706 Meta-Learning Symmetries by Reparameterization #group_equivariance

metric

  1. 211025 The Efficiency Misnomer

metric learning

  1. 200319 A unifying mutual information view of metric learning

mixture of experts

  1. 220202 Unified Scaling Laws for Routed Language Models
  2. 230220 TA-MoE

mixup

  1. 201220 ResizeMix
  2. 211228 LINDA #interpolation

mlm

  1. 200424 Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order #language_generation
  2. 210502 Larger-Scale Transformers for Multilingual Masked Language Modeling #multilingual #scale
  3. 220216 Should You Mask 15% in Masked Language Modeling
  4. 220715 Position Prediction as an Effective Pretraining Strategy #unsupervised_training
  5. 220909 Improved Masked Image Generation with Token-Critic #non-autoregressive
  6. 220929 Bidirectional Language Models Are Also Few-shot Learners #in_context_learning
  7. 221006 XDoc #layoutlm
  8. 221114 EVA #clip
  9. 230204 Representation Deficiency in Masked Language Modeling

mlops

  1. 230203 PyGlove

multilingual

  1. 220512 Lifting the Curse of Multilinguality by Pre-training Modular Transformers #adapter #mixture_of_experts

multimodal

  1. 200401 Pixel-BERT
  2. 200513 INFOTABS
  3. 200514 Behind the Scene
  4. 201130 Multimodal Pretraining Unmasked
  5. 210928 VideoCLIP #video_transformer #retrieval
  6. 220512 A Generalist Agent #reinforcement_learning
  7. 220527 GIT
  8. 230110 Scaling Laws for Generative Mixed-Modal Language Models
  9. 230123 Zorro #video #audio
  10. 230201 mPLUG-2

multimodal generation

  1. 211122 L-Verse
  2. 211124 NÜWA

multitask

  1. 200508 Transforming task representations to perform novel tasks #continual_learning
  2. 200625 MTAdam
  3. 210825 Multi-Task Self-Training for Learning General Representations
  4. 220520 UViM
  5. 230207 Exploring the Benefits of Training Expert Language Models over Instruction Tuning #instruct

nas

  1. 200324 BigNAS
  2. 200326 Are Labels Necessary for Neural Architecture Search #unsupervised_training
  3. 200406 Network Adjustment
  4. 200412 FBNetV2
  5. 200428 Angle-based Search Space Shrinking for Neural Architecture Search
  6. 200506 Local Search is State of the Art for Neural Architecture Search
  7. 200507 Noisy Differentiable Architecture Search
  8. 200602 FBNetV3 #hyperparameter #training #swa
  9. 200720 NSGANetV2
  10. 220831 Efficient Sparsely Activated Transformers #moe

nerf

  1. 201014 NeRF++
  2. 201125 Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
  3. 201127 D-NeRF
  4. 201203 Learned Initializations for Optimizing Coordinate-Based Neural #implicit_representation
  5. 201203 pixelNeRF
  6. 201215 Object-Centric Neural Scene Rendering
  7. 210225 IBRNet
  8. 210318 FastNeRF
  9. 210318 GNeRF
  10. 210318 MVSNeRF
  11. 210318 NeMI
  12. 210324 Mip-NeRF
  13. 210325 KiloNeRF
  14. 210325 PlenOctrees for Real-time Rendering of Neural Radiance Fields
  15. 210706 Depth-supervised NeRF
  16. 210809 NeuralMVS
  17. 211019 CIPS-3D #stylegan
  18. 211129 Deblur-NeRF
  19. 211129 HDR-NeRF
  20. 211129 Urban Radiance Fields
  21. 211210 CityNeRF
  22. 221010 NerfAcc
  23. 230204 AV-NeRF
  24. 230208 Nerfstudio

neural computer

  1. 200720 Distributed Associative Memory Network with Memory Refreshing Loss
  2. 211130 Show Your Work

neural ode

  1. 200207 How to train your neural ODE
  2. 200520 Neural Controlled Differential Equations
  3. 200708 Learning Differential Equations that are Easy to Solve

neural rendering

  1. 200226 Learning to Shadow Hand-drawn Sketches
  2. 200427 Neural Hair Rendering
  3. 200506 CONFIG
  4. 201116 Stylized Neural Painting
  5. 201119 Creative Sketch Generation
  6. 201130 Animating Pictures with Eulerian Motion Fields #single_image
  7. 210319 Paint by Word
  8. 210512 Enhancing Photorealism Enhancement
  9. 211013 ADOP
  10. 220728 Neural Strands

nlp

  1. 200518 (Re)construing Meaning in NLP
  2. 200715 Towards Debiasing Sentence Representations #bias
  3. 220826 What Do NLP Researchers Believe

nmt

  1. 200207 A Multilingual View of Unsupervised Machine Translation #multilingual
  2. 200427 Lexically Constrained Neural Machine Translation with Levenshtein Transformer
  3. 200710 Learn to Use Future Information in Simultaneous Translation #simultaneous_translation
  4. 201224 Why Neural Machine Translation Prefers Empty Outputs #hallucination
  5. 211015 Breaking Down Multilingual Machine Translation #multilingual
  6. 230120 Is ChatGPT A Good Translator #chatgpt
  7. 230219 Scaling Laws for Multilingual Neural Machine Translation #multilingual #scaling

non autoregressive

  1. 200403 Aligned Cross Entropy for Non-Autoregressive Machine Translation
  2. 200415 Non-Autoregressive Machine Translation with Latent Alignments #nmt #ctc
  3. 200422 A Study of Non-autoregressive Model for Sequence Generation
  4. 201022 Parallel Tacotron #vae
  5. 201025 Improved Mask-CTC for Non-Autoregressive End-to-End ASR #ctc
  6. 201125 FBWave #vocoder #lightweight
  7. 201207 EfficientTTS #tts
  8. 211213 Step-unrolled Denoising Autoencoders for Text Generation
  9. 220520 Lossless Acceleration for Seq2seq Generation with Aggressive Decoding #efficiency

norm free

  1. 200310 ReZero is All You Need #initialization

normalization

  1. 200122 Group Norm, Weight Standardization
  2. 200122 Moving Average Batch Normalization
  3. 200122 StyleGAN 2 #GAN
  4. 200130 Rethinking Normalization
  5. 200130 Weight Standardization #weight
  6. 200224 Batch Normalization Biases Residual Blocks Towards the Identity Function #optimization #norm_free #initialization
  7. 200306 TaskNorm #meta_learning
  8. 200406 Evolving Normalization-Activation Layers #nas #activation
  9. 200427 A Batch Normalized Inference Network Keeps the KL Vanishing Away
  10. 201128 Batch Normalization with Enhanced Linear Transformation
  11. 211026 Revisiting Batch Normalization

object detection

  1. 191118 Anchor-Free
  2. 191118 CenterMask #instance_segmentation #backbone #1stage
  3. 191121 EfficientDet
  4. 200103 BlendMask #instance_segmentation #1stage
  5. 200122 SABL
  6. 200129 AP Loss #loss
  7. 200129 Backbone Reallocation for Detection #backbone #nas
  8. 200129 Dense RepPoints
  9. 200129 DetNAS #nas #backbone
  10. 200129 IOU-aware single stage detector #1stage
  11. 200130 ATSS #anchor #retinanet #fcos
  12. 200130 AutoAugment #augmentation #search
  13. 200130 EfficientDet #fpn
  14. 200130 Keypoint Triplet #keypoint
  15. 200130 Learning from Noisy Anchors
  16. 200130 Multiple Anchor Learning #anchor
  17. 200130 Objects as Points #keypoint
  18. 200130 Soft Anchor-Point #anchor
  19. 200211 Object Detection as a Positive-Unlabeled Problem #positive_unlabled #dataset
  20. 200212 Solving Missing-Annotation Object Detection with Background #dataset #noise
  21. 200218 Universal-RCNN #multi_dataset #graph
  22. 200316 Frustratingly Simple Few-Shot Object Detection #few_shot
  23. 200317 Revisiting the Sibling Head in Object Detector
  24. 200319 Revisiting the Sibling Head in Object Detector #review
  25. 200320 CentripetalNet #keypoint
  26. 200413 Dynamic R-CNN
  27. 200423 YOLOv4
  28. 200511 Scope Head for Accurate Localization in Object Detection
  29. 200526 End-to-End Object Detection with Transformers #end2end #matching
  30. 200603 DetectoRS
  31. 200611 Rethinking Pre-training and Self-training #semi_supervised_learning #transfer
  32. 200706 LabelEnc #distillation
  33. 200707 AutoAssign #anchor_free
  34. 200714 AQD #quantization
  35. 200715 Probabilistic Anchor Assignment with IoU Prediction for Object Detection #anchor #1stage
  36. 200716 RepPoints V2 #1stage #anchor_free
  37. 200723 PP-YOLO #tuning
  38. 200723 The Devil is in Classification #longtail
  39. 200727 Corner Proposal Network for Anchor-free, Two-stage Object Detection #anchor_free #2stage
  40. 201116 Scaled-YOLOv4
  41. 201118 End-to-End Object Detection with Adaptive Clustering Transformer #detr #end2end #efficiency
  42. 201121 Rethinking Transformer-based Set Prediction for Object Detection #detr #end2end #efficiency
  43. 201124 Sparse R-CNN
  44. 201128 Class-agnostic Object Detection
  45. 201207 End-to-End Object Detection with Fully Convolutional Network #end2end
  46. 201223 SWA Object Detection #swa
  47. 201227 Towards A Category-extended Object Detector without Relabeling or #continual_learning
  48. 210225 Simple multi-dataset detection #multi_dataset
  49. 210316 You Only Look One-level Feature
  50. 210325 USB #dataset
  51. 210417 TransVG #visual_grounding
  52. 210420 PP-YOLOv2 #yolo
  53. 210426 MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding #detr #visual_grounding
  54. 210601 You Only Look at One Sequence #vit
  55. 210615 Dynamic Head #attention
  56. 210718 YOLOX #yolo
  57. 210728 SimROD #domain_adaptation #self_supervised
  58. 210922 Pix2seq #detr #autoregressive_model
  59. 210929 Localizing Objects with Self-Supervised Transformers and no Labels #self_supervised #self_supervised_discovery #salient_object_detection
  60. 211101 PP-PicoDet #lightweight
  61. 211122 Benchmarking Detection Transfer Learning with Vision Transformers #unsupervised_training #vit
  62. 211123 Dynamic DETR
  63. 211129 Sparse DETR #detr
  64. 220107 Detecting Twenty-thousand Classes using Image-level Supervision #weak_supervision
  65. 220330 Exploring Plain Vision Transformer Backbones for Object Detection #vit #instance_segmentation
  66. 220615 A Unified Sequence Interface for Vision Tasks #multitask #instance_segmentation #keypoint

ocr

  1. 191231 LayoutLM
  2. 200217 Text Perceptron
  3. 210415 Rethinking Text Line Recognition Models
  4. 220107 Data-Efficient Information Extraction from Form-Like Documents #information_extraction
  5. 220328 Towards End-to-End Unified Scene Text Detection and Layout Analysis
  6. 220416 Pushing the Performance Limit of Scene Text Recognizer without Human Annotation

open set recognition

  1. 211012 Open-Set Recognition

optimization

  1. 200221 The Break-Even Point on Optimization Trajectories of Deep Neural Networks #loss #training
  2. 200224 The Early Phase of Neural Network Training
  3. 200227 Using a thousand optimization tasks to learn hyperparameter search strategies #optimizer #hyperparameter
  4. 200228 A Self-Tuning Actor-Critic Algorithm #reinforcement_learning #hyperparameter #meta_learning
  5. 200316 Weak and Strong Gradient Directions
  6. 200403 Gradient Centralization #training
  7. 200508 An Investigation of Why Overparameterization Exacerbates Spurious #training
  8. 200519 One Size Fits All

optimizer

  1. 200130 LAMB #large_batch
  2. 211006 8-bit Optimizers via Block-wise Quantization
  3. 221117 VeLO
  4. 230118 Learning-Rate-Free Learning by D-Adaptation
  5. 230213 Symbolic Discovery of Optimization Algorithms #search

oriented object detection

  1. 200129 Modulated Loss
  2. 200129 Oriented Objects as Middle Lines

out of distribution

  1. 200509 Generalizing Outside the Training Set
  2. 200519 Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting

panoptic segmentation

  1. 200129 Bridging the train/infer gap in Panoptic Segmentation
  2. 200130 Panoptic-DeepLab
  3. 200218 Towards Bounding-Box Free Panoptic Segmentation #box_free
  4. 200404 Pixel Consensus Voting for Panoptic Segmentation
  5. 200421 Panoptic-based Image Synthesis #neural_rendering
  6. 201123 Scaling Wide Residual Networks for Panoptic Segmentation #scale
  7. 201201 Fully Convolutional Networks for Panoptic Segmentation #dynamic_conv
  8. 201201 MaX-DeepLab #detr #end2end
  9. 201202 Single-shot Path Integrated Panoptic Segmentation #dynamic_conv
  10. 210910 Panoptic Narrative Grounding #visual_grounding
  11. 211202 Masked-attention Mask Transformer for Universal Image Segmentation #detr

perceptual loss

  1. 200206 Image Fine-grained Inpainting #inpainting
  2. 200515 Enhancing Perceptual Loss with Adversarial Feature Matching for Super-Resolution
  3. 200626 A Loss Function for Generative Neural Networks Based on Watson's
  4. 201223 Focal Frequency Loss for Image Reconstruction and Synthesis #loss

point cloud

  1. 220325 Point2Seq

pooling

  1. 200325 What Deep CNNs Benefit from Global Covariance Pooling
  2. 200330 Strip Pooling

pose

  1. 200729 Unselfie #inpainting
  2. 210913 Pose with Style

positional encoding

  1. 200628 Rethinking Positional Encoding in Language Pre-training
  2. 210706 Rethinking Positional Encoding

practice

  1. 210630 Using AntiPatterns to avoid MLOps Mistakes

pretraining

  1. 190620 XLNet #language_model
  2. 190729 RoBERTa #language_model
  3. 200128 mBART #machine_translation #nlp
  4. 200129 ImageBERT #multimodal
  5. 200129 LM Pretraining #nlp
  6. 200129 oLMpics #language_model #nlp
  7. 200130 RoBERTa #language_model #nlp #transformer
  8. 200130 T5 #nlp #transformer #seq2seq
  9. 200130 ViLBERT #multimodal
  10. 200210 Pre-training Tasks for Embedding-based Large-scale Retrieval #retrieval
  11. 200217 Incorporating BERT into Neural Machine Translation #language_model #bert #nmt
  12. 200219 CodeBERT #bert
  13. 200228 UniLMv2 #language_model
  14. 200317 Calibration of Pre-trained Transformers #calibration
  15. 200405 Unsupervised Domain Clusters in Pretrained Language Models #domain
  16. 200412 Pre-training Text Representations as Meta Learning #meta_learning #finetuning
  17. 200413 Pretrained Transformers Improve Out-of-Distribution Robustness #out_of_distribution
  18. 200419 Are we pretraining it right #multimodal
  19. 200420 Adversarial Training for Large Neural Language Models #adversarial_training #language_model #finetuning
  20. 200420 MPNet #language_model
  21. 200423 Don't Stop Pretraining #domain
  22. 200427 LightPAFF #distillation #finetuning
  23. 200520 Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models #contrastive_learning #sentence_embedding
  24. 200610 MC-BERT
  25. 200615 To Pretrain or Not to Pretrain #nlp #finetuning
  26. 200626 Pre-training via Paraphrasing #retrieval
  27. 200703 Language-agnostic BERT Sentence Embedding #embedding #multilingual
  28. 200713 An Empirical Study on Robustness to Spurious Correlations using #nlp #multitask
  29. 200715 InfoXLM #nlp #cross_lingual
  30. 200804 Taking Notes on the Fly Helps BERT Pre-training #nlp
  31. 201020 Pushing the Limits of Semi-Supervised Learning for Automatic Speech #semi_supervised_learning #asr
  32. 201021 Self-training and Pre-training are Complementary for Speech Recognition #self_supervised #asr
  33. 201022 mT5 #language_model #multilingual
  34. 201109 When Do You Need Billions of Words of Pretraining Data #language_model
  35. 201117 UP-DETR #detr #end2end #object_detection
  36. 201127 Progressively Stacking 2.0 #efficiency
  37. 201201 Pre-Trained Image Processing Transformer #contrastive_learning #vision_transformer #restoration
  38. 201201 StructFormer #parse #attention #mlm
  39. 201227 Syntax-Enhanced Pre-trained Model #language_model #syntax
  40. 210225 SparseBERT #attention #sparse_attention #bert
  41. 210318 All NLP Tasks Are Generation Tasks #language_model
  42. 210324 Can Vision Transformers Learn without Natural Images #vision_transformer
  43. 210402 Robust wav2vec 2.0 #asr
  44. 210407 Pushing the Limits of Non-Autoregressive Speech Recognition #non-autoregressive #asr #ctc
  45. 210413 Masked Language Modeling and the Distributional Hypothesis #language_model #mlm
  46. 210417 mT6 #language_model
  47. 210418 Data-Efficient Language-Supervised Zero-Shot Learning with #multimodal
  48. 210422 ImageNet-21K Pretraining for the Masses #backbone
  49. 210510 Are Pre-trained Convolutions Better than Pre-trained Transformers #nlp #convolution #transformer
  50. 210606 On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation #finetuning #adapter
  51. 210606 Rethinking Training from Scratch for Object Detection #object_detection
  52. 210608 DETReg #detr
  53. 210614 SAS
  54. 210615 BEiT #vit #bert
  55. 210907 How much pretraining data do language models need to learn syntax #bert
  56. 210910 ReasonBERT #bert #reasoning #qa
  57. 210913 STraTA #finetuning #semi_supervised_learning #few_shot
  58. 210914 Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition #asr
  59. 210914 Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding #finetuning #semi_supervised_learning #few_shot
  60. 210927 BigSSL #asr #semi_supervised_learning #unsupervised_training
  61. 211005 Exploring the Limits of Large Scale Pre-training #classificiation #scaling
  62. 211018 Unsupervised Finetuning #unsupervised_training #finetuning
  63. 211026 WavLM #speech
  64. 211103 VLMo #mixture_of_experts #vision-language
  65. 211111 Masked Autoencoders Are Scalable Vision Learners #vit
  66. 211122 ExT5 #multitask
  67. 211122 Florence #vision-language #transfer
  68. 211201 Revisiting the Transferability of Supervised Pretraining #transfer
  69. 211216 Masked Feature Prediction for Self-Supervised Visual Pre-Training #self_supervised
  70. 211220 Are Large-scale Datasets Necessary for Self-Supervised Pre-training #self_supervised #transfer
  71. 220429 Vision-Language Pre-Training for Boosting Scene Text Detectors
  72. 220914 PaLI #vision-language

probabilistic model

  1. 200413 Einsum Networks
  2. 200419 Roundtrip

prompt

  1. 220118 ZeroPrompt #zero-shot
  2. 220916 Text and Patterns
  3. 230207 Hard Prompts Made Easy #text2img

pruning

  1. 200130 Rethinking Pruning
  2. 200218 Picking Winning Tickets Before Training by Preserving Gradient Flow #lottery_ticket
  3. 200224 HRank #rank
  4. 200305 Comparing Rewinding and Fine-tuning in Neural Network Pruning
  5. 200424 Convolution-Weight-Distribution Assumption
  6. 200514 Bayesian Bits #quantization #variational_inference
  7. 200515 Movement Pruning
  8. 200518 Joint Multi-Dimension Pruning
  9. 200706 Lossless CNN Channel Pruning via Decoupling Remembering and Forgetting
  10. 200710 To Filter Prune, or to Layer Prune, That Is The Question

qa

  1. 200222 Unsupervised Question Decomposition for Question Answering

quantization

  1. 220815 LLM.int8()
  2. 230216 Shared Microexponents

reasoning

  1. 200129 Neural Arithmetic Units
  2. 200409 Injecting Numerical Reasoning Skills into Language Models

regularization

  1. 200130 DropAttention #dropout
  2. 200219 Revisiting Training Strategies and Generalization Performance in Deep #metric_learning
  3. 200225 On Feature Normalization and Data Augmentation #normalization #mixup
  4. 200228 The Implicit and Explicit Regularization Effects of Dropout #dropout
  5. 200331 Regularizing Class-wise Predictions via Self-knowledge Distillation #distillation #consistency_regularization
  6. 200409 Orthogonal Over-Parameterized Training
  7. 200424 Dropout as an Implicit Gating Mechanism For Continual Learning
  8. 200427 Scheduled DropHead
  9. 200513 Implicit Regularization in Deep Learning May Not Be Explainable by Norms #training #optimization
  10. 200707 RIFLE #finetuning
  11. 200707 Remix #imbalanced
  12. 200721 Improving compute efficacy frontiers with SliceOut #efficient_training
  13. 201122 Stable Weight Decay Regularization
  14. 220527 Sharpness-Aware Training for Free

reinforcement learning

  1. 191120 Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
  2. 200130 Mastering Atari, Go, Chess, Shogi
  3. 200626 Critic Regularized Regression
  4. 210929 Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization
  5. 211030 Mastering Atari Games with Limited Data
  6. 230210 The Wisdom of Hindsight Makes Language Models Better Instruction Followers #instruct

rendering

  1. 200130 Textured Neural Avatars

representation

  1. 200220 Neural Bayes #bayesian #clustering
  2. 200412 Gradients as Features for Deep Representation Learning
  3. 201223 Noisy Labels Can Induce Good Representations #noise

resampling

  1. 200512 Invertible Image Rescaling

restoration

  1. 200402 Learning to See Through Obstructions
  2. 200404 Deblurring by Realistic Blurring
  3. 200406 Self-Supervised Scene De-occlusion
  4. 201123 Cross-Camera Convolutional Color Constancy
  5. 201123 Dissecting Image Crops

retrieval

  1. 210715 Internet-Augmented Dialogue Generation #dialog
  2. 220124 Text and Code Embeddings by Contrastive Pre-Training

review

  1. 200130 Filter Response Normalization
  2. 200227 A Primer in BERTology #bert
  3. 200306 What is the State of Neural Network Pruning #pruning
  4. 200311 Improved Baselines with Momentum Contrastive Learning #contrastive_learning
  5. 200318 A Metric Learning Reality Check #metric_learning
  6. 200324 A Systematic Evaluation
  7. 200325 Rethinking Few-Shot Image Classification #meta_learning
  8. 200408 State of the Art on Neural Rendering #neural_rendering
  9. 200409 EvoNorm
  10. 200428 Showing Your Work Doesn't Always Work
  11. 200619 Augmentation for GANs
  12. 200627 Denoising Diffusion Probabilistic Models Implementation
  13. 200717 Semantic factor of GANs
  14. 200725 Neighbor Embedding
  15. 200821 Virtual Try On
  16. 201016 Representation Learning via Invariant Causal Mechanisms
  17. 201021 BYOL works even without batch statistics
  18. 201108 Long Range Arena #attention #efficient_attention
  19. 201112 Learning Semantic-aware Normalization for Generative Adversarial Networks
  20. 201112 When Do You Need Billions of Words of Pretraining Data
  21. 210324 A Broad Study on the Transferability of Visual Representations with Contrastive Learning #contrastive_learning
  22. 210325 Contrasting Contrastive Self-Supervised Representation Learning Models #contrastive_learning
  23. 210512 When Does Contrastive Visual Representation Learning Work #contrastive_learning #self_supervised #transfer

robustness

  1. 200211 Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial #adversarial_training
  2. 200304 A Closer Look at Accuracy vs. Robustness #adversarial_training
  3. 200810 Informative Dropout for Robust Representation Learning
  4. 220607 Can CNNs Be More Robust Than Transformers

saliency

  1. 200406 There and Back Again

salient object detection

  1. 200518 U$^2$-Net

scale

  1. 200712 Learning to Learn Parameterized Classification Networks for Scalable #hypernetwork
  2. 201130 Towards Better Accuracy-efficiency Trade-offs

score

  1. 200319 GIQA
  2. 200426 Evaluation Metrics for Conditional Image Generation

self supervised

  1. 200213 Automatically Discovering and Learning New Visual Categories with Ranking Statistics #weak_supervision
  2. 200218 MAST #tracking
  3. 200224 Self-Adaptive Training #noise #dataset
  4. 200408 Improving BERT with Self-Supervised Attention #bert #distillation
  5. 200722 CrossTransformers #few_shot
  6. 201015 Representation Learning via Invariant Causal Mechanisms #causality
  7. 201117 Neural Semi-supervised Learning for Text Classification Under #nlp
  8. 201125 Can Temporal Information Help with Contrastive Self-Supervised Learning #video #augmentation
  9. 201224 Self-supervised Pre-training with Hard Examples Improves Visual #mixup
  10. 210726 Continental-Scale Building Detection from High Resolution Satellite Imagery
  11. 210827 Injecting Text in Self-Supervised Speech Pretraining #asr
  12. 210927 Compressive Visual Representations
  13. 211027 Neural Analysis and Synthesis #audio_synthesis
  14. 220124 data2vec
  15. 220216 Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
  16. 220520 Uniform Masking
  17. 220526 Green Hierarchical Vision Transformer for Masked Image Modeling
  18. 220526 MixMIM
  19. 220526 Revealing the Dark Secrets of Masked Image Modeling #representation
  20. 220715 Is a Caption Worth a Thousand Images #clip
  21. 220803 Masked Vision and Language Modeling for Multi-modal Representation Learning #mlm

self supervised discovery

  1. 200403 Self-Supervised Viewpoint Learning From Image Collections #viewpoint
  2. 201127 Unsupervised part representation by Flow Capsules
  3. 210429 MarioNette

semantic factor

  1. 200307 StyleGAN2 Distillation for Feed-forward Image Manipulation #stylegan
  2. 200308 PULSE #stylegan
  3. 200406 GANSpace
  4. 201222 Time-Travel Rephotography #restoration #stylegan

semantic segmentation

  1. 200323 Learning Dynamic Routing for Semantic Segmentation
  2. 200516 Single-Stage Semantic Segmentation from Image Labels
  3. 200826 EfficientFCN
  4. 210512 Segmenter
  5. 220918 SegNeXt

semi supervised learning

  1. 200218 DivideMix #mixup #noise #dataset
  2. 200306 Semi-Supervised StyleGAN for Disentanglement Learning #stylegan #mixup
  3. 200323 Meta Pseudo Labels #meta_learning
  4. 200627 Laplacian Regularized Few-Shot Learning #few_shot
  5. 200724 Deep Co-Training with Task Decomposition for Semi-Supervised Domain #domain_adaptation
  6. 201116 On the Marginal Benefit of Active Learning #active_learning #unsupervised_training
  7. 201118 FROST
  8. 220811 Semi-supervised Vision Transformers at Scale
  9. 220829 Open-Set Semi-Supervised Object Detection #open_set_recognition
  10. 220918 The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning

sgld

  1. 200706 Kernel Stein Generative Modeling #svgd

singing voice synthesis

  1. 211008 KaraSinger

single image

  1. 200405 Structural-analogy from a Single Image Pair

speech

  1. 200129 Speech Recognition
  2. 200129 WaveFlow #conditional_generative_model

state space model

  1. 211031 Efficiently Modeling Long Sequences with Structured State Spaces
  2. 221017 What Makes Convolutional Models Great on Long Sequence Modeling
  3. 230213 Simple Hardware-Efficient Long Convolutions for Sequence Modeling

structure learning

  1. 200518 Large-scale empirical validation of Bayesian Network structure learning

style transfer

  1. 200318 A Content Transformation Block For Image Style Transfer
  2. 200324 Deformable Style Transfer
  3. 200710 Geometric Style Transfer

stylegan

  1. 210318 Labels4Free #unsupervised_segmentation

super resolution

  1. 200129 ESRGAN+
  2. 200323 Deep Unfolding Network for Image Super-Resolution

table

  1. 210906 Parsing Table Structures in the Wild
  2. 220809 TSRFormer

text generation

  1. 200130 Unlikelihood Training
  2. 200605 CoCon

text2img

  1. 221125 SpaText

tokenizer

  1. 211006 How BPE Affects Memorization in Transformers

topic model

  1. 200426 Neural Topic Modeling with Bidirectional Adversarial Training

topology

  1. 200413 Topology of deep neural networks #theory

tracking

  1. 200402 Tracking Objects as Points #keypoint
  2. 200402 Tracking by Instance Detection #meta_learning
  3. 200403 FairMOT
  4. 200506 PeTra
  5. 201215 Detecting Invisible People
  6. 211013 ByteTrack

training

  1. 200702 Beyond Signal Propagation

transducer

  1. 200519 A New Training Pipeline for an Improved Neural Transducer

transfer

  1. 200130 BiT ResNet #resnet
  2. 200512 Neural Architecture Transfer #nas
  3. 200711 Adversarially-Trained Deep Nets Transfer Better #adversarial_training
  4. 200716 Do Adversarially Robust ImageNet Models Transfer Better #robust
  5. 200721 Adversarial Training Reduces Information and Improves Transferability #adversarial_training
  6. 201122 Ranking Neural Checkpoints
  7. 211012 Rethinking supervised pre-training for better downstream transferring #classificiation #metric_learning

transformer

  1. 200129 Are Transformers universal approximators
  2. 200129 Product Key Memory #attention
  3. 200129 Reformer #attention
  4. 200130 Sparse Transformer #generative_model
  5. 200130 Structured Pruning for LM #pruning
  6. 200207 Transformer Transducer #asr #transducer
  7. 200211 On Layer Normalization in the Transformer Architecture #normalization
  8. 200212 GLU Variants Improve Transformer #activation
  9. 200214 Transformer on a Diet #efficient_attention
  10. 200214 Transformers as Soft Reasoners over Language #language
  11. 200215 Fine-Tuning Pretrained Language Models #bert #finetuning
  12. 200221 Addressing Some Limitations of Transformers with Feedback Memory #recurrent
  13. 200305 Talking-Heads Attention #attention
  14. 200424 Lite Transformer with Long-Short Range Attention #lightweight
  15. 200515 Finding Experts in Transformer Models
  16. 200515 JDI-T #tts
  17. 200516 Conformer #asr
  18. 200518 Weak-Attention Suppression For Transformer Based Speech Recognition #asr
  19. 200605 Funnel-Transformer #efficient_attention
  20. 200707 Do Transformers Need Deep Long-Range Memory #lm #attention
  21. 200709 Fast Transformers with Clustered Attention #attention
  22. 200715 AdapterHub #nlp #finetuning
  23. 200727 Big Bird #attention
  24. 200802 DeLighT #nlp
  25. 201217 Taming Transformers for High-Resolution Image Synthesis #discrete_vae #generative_model #autoregressive_model
  26. 201221 RealFormer #attention
  27. 201227 SG-Net #syntax #attention
  28. 210223 Do Transformer Modifications Transfer Across Implementations and
  29. 210225 Evolving Attention with Residual Convolutions #attention
  30. 210318 HiT #video #retrieval
  31. 210318 Looking Beyond Two Frames #tracking
  32. 210318 TFPose #pose
  33. 210318 TransCenter #tracking
  34. 210318 Transformer Tracking #tracking
  35. 210407 Seeing Out of tHe bOx #multimodal #vision-language
  36. 210409 Efficient Large-Scale Language Model Training on GPU Clusters #distributed_training
  37. 210409 Not All Attention Is All You Need
  38. 210410 UniDrop #regularization
  39. 210417 Demystifying the Better Performance of Position Encoding Variants for #positional_encoding
  40. 210420 RoFormer #positional_encoding
  41. 210423 M3DeTR #3d
  42. 210509 FNet #efficient_attention #fourier
  43. 210613 Thinking Like Transformers
  44. 210617 Multi-head or Single-head
  45. 210730 Perceiver IO
  46. 210809 Making Transformers Solve Compositional Tasks
  47. 210812 Mobile-Former #backbone
  48. 210830 A Battle of Network Structures #cnn #mlp #backbone
  49. 210830 Shatter #bert
  50. 210908 Panoptic SegFormer #panoptic_segmentation #detr
  51. 210909 Bag of Tricks for Optimizing Transformer Efficiency #nmt #lightweight
  52. 210917 Primer #lm #nas
  53. 210922 Scale Efficiently
  54. 211018 NormFormer
  55. 211026 Hierarchical Transformers Are More Efficient Language Models #lm #efficient_attention
  56. 211122 MetaFormer is Actually What You Need for Vision #vit
  57. 211124 Sparse is Enough in Scaling Transformers #sparsity #efficiency
  58. 220221 Transformer Quality in Linear Time #efficient_attention #linear_attention #local_attention
  59. 220301 DeepNet #normalization
  60. 220330 Transformer Language Models without Positional Encodings Still Learn Positional Information #lm #positional_encoding
  61. 220924 In-context Learning and Induction Heads #in_context_learning
  62. 221004 MOAT #backbone
  63. 230209 In-Context Learning with Many Demonstration Examples #efficient_attention

tropical geometry

  1. 200220 On the Decision Boundaries of Neural Networks

tts

  1. 200512 Flowtron #flow
  2. 210617 WaveGrad 2

uncertainty

  1. 210727 A Tale Of Two Long Tails

unsupervised img2img

  1. 200310 Unpaired Image-to-Image Translation using Adversarial Consistency Loss
  2. 200611 Rethinking the Truly Unsupervised Image-to-Image Translation
  3. 201201 Unpaired Image-to-Image Translation via Latent Energy Transport

unsupervised nmt

  1. 200422 When and Why is Unsupervised Neural Machine Translation Useless

vae

  1. 200420 Bringing Old Photos Back to Life #restoration
  2. 200707 NVAE
  3. 201119 Dual Contradistinctive Generative Autoencoder
  4. 201120 Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them

video

  1. 210325 An Image is Worth 16x16 Words, What is a Video Worth

video transformer

  1. 210423 VidTr

vision

  1. 200305 Optimizing JPEG Quantization for Classification Networks
  2. 201127 Field of Junctions

vision language

  1. 201212 MiniVLM
  2. 201222 Seeing past words
  3. 210407 Multimodal Fusion Refiner Networks
  4. 210727 Is Object Detection Necessary for Human-Object Interaction Recognition #human-object-interaction
  5. 211103 An Empirical Study of Training End-to-End Vision-and-Language Transformers #multimodal
  6. 220221 Vision-Language Pre-Training with Triple Contrastive Learning
  7. 220504 CoCa
  8. 220612 GLIPv2
  9. 220615 Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
  10. 220617 Bridge-Tower
  11. 220617 Unified-IO #multitask
  12. 220810 Patching open-vocabulary models by interpolating weights #clip #multitask #domain
  13. 220822 Image as a Foreign Language #mlm
  14. 230202 Multimodal Chain-of-Thought Reasoning in Language Models #multimodal
  15. 230209 Re-ViLM

vision transformer

  1. 201127 General Multi-label Image Classification with Transformers
  2. 201223 A Survey on Visual Transformer
  3. 201223 Training data-efficient image transformers & distillation through #distillation
  4. 210223 Pyramid Vision Transformer
  5. 210318 CrossViT
  6. 210318 CvT
  7. 210318 Multi-Scale Vision Longformer
  8. 210319 ConViT
  9. 210319 Scalable Visual Transformers with Hierarchical Pooling
  10. 210324 Vision Transformers for Dense Prediction #fpn
  11. 210325 Swin Transformer #local_attention
  12. 210331 Going deeper with Image Transformers
  13. 210402 LeViT
  14. 210421 Token Labeling
  15. 210422 Multiscale Vision Transformers
  16. 210422 So-ViT
  17. 210426 Improve Vision Transformers Training by Suppressing Over-smoothing
  18. 210426 Visformer
  19. 210427 ConTNet
  20. 210428 Twins #local_attention #positional_encoding
  21. 210509 Conformer
  22. 210515 Are Convolutional Neural Networks or Transformers more like human vision #cnn #inductive_bias
  23. 210517 Rethinking the Design Principles of Robust Vision Transformer #robustness

visual grounding

  1. 210401 Towards General Purpose Vision Systems
  2. 210510 Visual Grounding with Transformers

vit

  1. 210521 Intriguing Properties of Vision Transformers #robustness
  2. 210526 Aggregating Nested Transformers #local_attention
  3. 210529 Less is More
  4. 210603 DynamicViT #sparse_attention
  5. 210603 When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations #regularization
  6. 210604 RegionViT #local_attention
  7. 210607 Refiner #attention
  8. 210607 Shuffle Transformer
  9. 210608 Scaling Vision Transformers #scale
  10. 210609 CoAtNet
  11. 210614 Delving Deep into the Generalization of Vision Transformers under Distribution Shifts #robustness
  12. 210615 Revisiting the Calibration of Modern Neural Networks #mlp #calibration
  13. 210617 XCiT #efficient_attention
  14. 210624 Exploring Corruption Robustness #robustness #mlp
  15. 210624 VOLO #efficient_attention
  16. 210624 Video Swin Transformer #local_attention #video #video_transformer
  17. 210701 CSWin Transformer #efficient_attention #local_attention
  18. 210701 Focal Self-attention for Local-Global Interactions in Vision Transformers #local_attention
  19. 210705 What Makes for Hierarchical Vision Transformer #attention #mlp #local_attention
  20. 210713 Visual Parser #local_attention
  21. 210731 CrossFormer
  22. 210811 ConvNets vs. Transformers #robustness #transfer
  23. 210819 Do Vision Transformers See Like Convolutional Neural Networks #resnet
  24. 210908 Scaled ReLU Matters for Training Vision Transformers #cnn
  25. 211118 Swin Transformer V2
  26. 211202 Improved Multiscale Vision Transformers for Classification and Detection
  27. 211210 Deep ViT Features as Dense Visual Descriptors #self_supervised #semantic_segmentation
  28. 211217 A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation #multiscale
  29. 220214 How Do Vision Transformers Work #cnn
  30. 220414 DeiT III
  31. 220722 An Impartial Take to the CNN vs Transformer Robustness Contest #robustness #cnn
  32. 220812 BEiT v2 #self_supervised #mlm
  33. 221110 Demystify Transformers & Convolutions in Modern Image Deep Networks #cnn
  34. 230202 Dual PatchNorm #normalization

vocoder

  1. 200512 FeatherWave
  2. 201118 Universal MelGAN

vqa

  1. 220914 MUST-VQA

weak supervision

  1. 201126 SelfText Beyond Polygon #ocr

yolo

  1. 230113 YOLOv6 v3.0

uncategorized

  1. 09
  2. 200211 fastai
  3. 210224 Zero-Shot Text-to-Image Generation
  4. 210603 The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
  5. 210606 Referring Transformer
  6. 210607 ViTAE
  7. 210614 Non Gaussian Denoising Diffusion Models
  8. 210909 PIMNet
  9. 211026 Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
  10. 211028 Colossal-AI
  11. 211215 Value Retrieval with Arbitrary Queries for Form-like Documents
  12. 221125 Solving math word problems with process- and outcome-based feedback
  13. 221204 Languages You Know Influence Those You Learn
  14. 221215 Constitutional AI
  15. 220114 DeepSpeed-MoE
  16. 220203 AlphaCode, Formal Math
  17. 220204 InstructGPT
  18. 220323 Pathways
  19. 220329 Few Could Be Better Than All
  20. 220405 Text Spotting Transformers
  21. 220416 Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
  22. 220510 UL2
  23. 220610 A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction
  24. 220614 RDU
  25. 220630 DeepSpeed Inference
  26. 220712 Inner Monologue
  27. 220720 NUWA-Infinity
  28. 220722 Multiface
  29. 220725 CelebV-HQ
  30. 220725 Neural Generation Meets Real People
  31. 220725 Towards Complex Document Understanding By Discrete Reasoning
  32. 220819 FP8 Quantization
  33. 220823 CLOWER
  34. 220912 FP8 Formats for Deep Learning
  35. 220923 Diffusion
  36. 220928 The Change You Want to See
  37. 221219 MatCha
  38. 230206 SmoothQuant
  39. 230207 Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages
  40. 230207 FP8
  41. 230208 Google Configuration System
  42. 230209 Efficient Attention via Control Variates
  43. 230211 Thoughts on Generative AI
  44. 230213 Lossy Compression
  45. 230214 Adding Instructions during Pretraining
  46. 230214 Score-based Diffusion Models in Function Space
  47. 230220 DSP
  48. 230221 Anthropic
  49. 230222 FlexGen
  50. 230223 Colossal AI ChatGPT
  51. 230224 World Models

More Repositories

1

stylegan2-pytorch

Implementation of Analyzing and Improving the Image Quality of StyleGAN (StyleGAN 2) in PyTorch
Python
2,651
star
2

vq-vae-2-pytorch

Implementation of Generating Diverse High-Fidelity Images with VQ-VAE-2 in PyTorch
Python
1,503
star
3

style-based-gan-pytorch

Implementation of A Style-Based Generator Architecture for Generative Adversarial Networks in PyTorch
Python
1,079
star
4

alias-free-gan-pytorch

Unofficial implementation of Alias-Free Generative Adversarial Networks. (https://arxiv.org/abs/2106.12423) in PyTorch
Python
507
star
5

glow-pytorch

PyTorch implementation of Glow
Python
492
star
6

denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Models in PyTorch
Python
334
star
7

swapping-autoencoder-pytorch

Unofficial implementation of Swapping Autoencoder for Deep Image Manipulation (https://arxiv.org/abs/2007.00653) in PyTorch
Python
257
star
8

mac-network-pytorch

Memory, Attention and Composition (MAC) Network for CLEVR implemented in PyTorch
Python
85
star
9

vision-transformers-pytorch

Implementation of various Vision Transformers I found interesting
Python
82
star
10

adaptive-softmax-pytorch

Adaptive Softmax implementation for PyTorch
Python
80
star
11

sagan-pytorch

Self-Attention Generative Adversarial Networks Implementation in PyTorch
Python
73
star
12

igebm-pytorch

Implicit Generation and Generalization in Energy Based Models in PyTorch
Python
64
star
13

ocr-pytorch

Object-Contextual Representations for Semantic Segmentation in PyTorch
Python
63
star
14

relation-networks-pytorch

Relation Networks for CLEVR implemented in PyTorch
Python
61
star
15

progressive-gan-pytorch

Implementation of Progressive Growing of GANs in PyTorch
Python
60
star
16

imputer-pytorch

Implementation of Imputer: Sequence Modelling via Imputation and Dynamic Programming in PyTorch
Python
58
star
17

depthwise-conv-pytorch

Faster depthwise convolutions for PyTorch
Cuda
56
star
18

fcos-pytorch

Re-implementation of FCOS for personal study
Python
51
star
19

knotter

Implementation of Mapper algorithm for Topological Data Analysis
JavaScript
46
star
20

semantic-pyramid-pytorch

Implementation of Semantic Pyramid for Image Generation (https://arxiv.org/abs/2003.06221) in PyTorch
Python
40
star
21

id-gan-pytorch

Information Distillation Generative Adversarial Network in PyTorch
Python
27
star
22

nerf-pytorch

Python
22
star
23

tensorfn

Weakly opinionated library for implementing ML models. Less boilerplate, More rigor
Python
21
star
24

taming-transformers-pytorch

Implementation of Taming Transformers for High-Resolution Image Synthesis (https://arxiv.org/abs/2012.09841) in PyTorch
17
star
25

film-pytorch

Just another implementation of FiLM in PyTorch
Python
14
star
26

instant-ngp-pytorch

Study for Instant neural graphics primitives (Unofficial)
12
star
27

melgan-pytorch

MelGAN and Tacotron 2 in PyTorch
Python
11
star
28

meshfn

Framework for Human Alignment Learning
Python
8
star
29

nansy-pytorch

Unofficial implementation of Neural Analysis and Synthesis
8
star
30

sarigan-pytorch

Unofficial implementation of Learning Semantic-aware Normalization for Generative Adversarial Networks (SariGAN) in PyTorch
8
star
31

arxiv-sanity

arXiv feed tool heavily inspired by Arxiv Sanity Preserver
Python
6
star
32

lvpga-pytorch

Implementation of Perceptual Generative Autoencoders in PyTorch
Python
5
star
33

esrgan-pytorch

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks in PyTorch
3
star
34

dockerfiles

dockerfiles
Dockerfile
3
star
35

sujip

Non-opinionated utility library for PyTorch
Python
2
star
36

rosinality.github.io

HTML
2
star
37

langfn

A DSL for LLMs
2
star
38

small-logan-pytorch

Small-GAN and LOGAN in PyTorch
2
star
39

maskrcnn-pytorch

Re-implementation of Mask R-CNN for personal study
2
star
40

usrnet-pytorch

Reimplementation of Deep Unfolding Network for Image Super-Resolution for self study.
2
star
41

synapticmap

Synaptic Map - Simple mindmapping program with directional connections
JavaScript
1
star
42

fill-blank

Paragraph embedding by solving the fill in the blank problems
Python
1
star
43

centernet-pytorch

Re-implementation of CenterNet for personal study
1
star