ML Papers
Reviews
- 191210 Thoughts on recent papers
- 200323 Thoughts on recent papers
- 200326 Thoughts on recent papers
- 200403 Thoughts on recent papers
- 200411 Thoughts on recent papers
- 200708 Thoughts on recent papers
- 200717 Thoughts on recent papers
- 200726 Thoughts on recent papers
- 200802 Thoughts on recent papers
- 201118 Thoughts on recent papers
- 201120 Thoughts on recent papers
- 201125 Thoughts on recent papers
- 201126 Thoughts on recent papers 1
- 201126 Thoughts on recent papers 2
- 201204 Thoughts on recent papers
- 210121 Thoughts on recent papers 1
- 210121 Thoughts on recent papers 2
- 210305 Thoughts on recent papers
- 210319 Thoughts on recent papers
- 210323 Thoughts on recent papers
- 210326 Thoughts on recent papers
- 210403 Thoughts on recent papers
- 210412 Thoughts on recent papers
- 210424 Thoughts on recent papers
- 210429 Thoughts on recent papers
- 210430 Thoughts on recent papers 1
- 210430 Thoughts on recent papers 2
- 210505 Thoughts on recent papers
- 210508 Thoughts on recent papers
- 230222 A review of LLM training datasets
Table of contents
- 3d generative model
- activation
- active learning
- adaptation
- adapter
- adversarial training
- antialiasing
- asr
- attention
- audio generation
- audio source separation
- augmentation
- autoregressive model
- backbone
- bayesian
- bert
- bias
- calibration
- causality
- channel attention
- chat
- classification
- computation
- continual learning
- contrastive learning
- convolution
- dataset
- ddpm
- decoding
- deep prior
- detr
- dewarping
- dialog
- differentiable operator
- differentiable tree
- discrete vae
- disentangle
- distillation
- distributed training
- domain adaptation
- dropout
- efficiency
- efficient attention
- efficient training
- embedding
- end2end
- energy based model
- ensemble
- federated learning
- few shot
- finetuning
- flow
- fpn
- gan
- gan inversion
- generalization
- generative model
- graph
- hallucination
- hypernetwork
- hyperparameter
- identifiability
- image editing
- image generation
- img2img
- implicit model
- implicit representation
- in context learning
- instance segmentation
- instruct
- interpolation
- knowledge base
- language generation
- language model
- layout
- lightweight
- line
- llm
- lm
- local attention
- loss
- loss surface
- matting
- memory
- meta learning
- metric
- metric learning
- mixture of experts
- mixup
- mlm
- mlops
- multilingual
- multimodal
- multimodal generation
- multitask
- nas
- nerf
- neural computer
- neural ode
- neural rendering
- nlp
- nmt
- non autoregressive
- norm free
- normalization
- object detection
- ocr
- open set recognition
- optimization
- optimizer
- oriented object detection
- out of distribution
- panoptic segmentation
- perceptual loss
- point cloud
- pooling
- pose
- positional encoding
- practice
- pretraining
- probabilistic model
- prompt
- pruning
- qa
- quantization
- reasoning
- regularization
- reinforcement learning
- rendering
- representation
- resampling
- restoration
- retrieval
- review
- robustness
- saliency
- salient object detection
- scale
- score
- self supervised
- self supervised discovery
- semantic factor
- semantic segmentation
- semi supervised learning
- sgld
- singing voice synthesis
- single image
- speech
- state space model
- structure learning
- style transfer
- stylegan
- super resolution
- table
- text generation
- text2img
- tokenizer
- topic model
- topology
- tracking
- training
- transducer
- transfer
- transformer
- tropical geometry
- tts
- uncertainty
- unsupervised img2img
- unsupervised nmt
- vae
- video
- video transformer
- vision
- vision language
- vision transformer
- visual grounding
- vit
- vocoder
- vqa
- weak supervision
- yolo
- uncategorized
3d generative model
- 211220 3D-aware Image Synthesis via Learning Structural and Textural Representations
- 220615 GRAM-HD
- 220621 EpiGRAF
- 221125 3DDesigner #text2img
- 221126 AvatarGen
- 230209 In-N-Out #gan_inversion
- 230216 3D-aware Conditional Image Synthesis
activation
active learning
- 200630 Similarity Search for Efficient Active Learning and Search of Rare Concepts
- 210729 Batch Active Learning at Scale
adaptation
adapter
adversarial training
antialiasing
- 201120 An Effective Anti-Aliasing Approach for Residual Networks
- 201128 Truly shift-invariant convolutional neural networks
asr
- 200220 Imputer #non-autoregressive #ctc
- 200506 RNN-T Models Fail to Generalize to Out-of-Domain Audio #transducer #out_of_distribution #domain #regularization
- 200510 Listen Attentively, and Spell Once #non-autoregressive
- 200516 Large scale weakly and semi-supervised learning for low-resource video ASR #weak_supervision #semi_supervised_learning
- 200516 Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss #ctc
- 200516 Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition #non-autoregressive
- 200518 Attention-based Transducer for Online Speech Recognition #transducer
- 200518 Iterative Pseudo-Labeling for Speech Recognition
- 200519 Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition #ctc
- 200519 Improved Noisy Student Training for Automatic Speech Recognition #semi_supervised_learning
- 200729 Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability #rnn_t
- 201021 FastEmit #transducer #decoding
- 201027 CASS-NAT #non-autoregressive
- 201125 Streaming end-to-end multi-talker speech recognition #transducer
- 210524 Unsupervised Speech Recognition #unsupervised_training
- 210608 SpeechBrain
- 211012 Word Order Does Not Matter For Speech Recognition #weak_supervision
- 211030 Pseudo-Labeling for Massively Multilingual Speech Recognition #semi_supervised_learning #multilingual
- 211210 Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition #moe
- 220829 A Language Agnostic Multilingual Streaming On-Device ASR System #multilingual
- 220922 Whisper
attention
- 200122 Object Contextual Representations #semantic_segmentation
- 200129 Empirical Attention
- 200130 Axial Attention #generative_model
- 200130 Criss-Cross Attention #semantic_segmentation
- 200212 Capsules with Inverted Dot-Product Attention Routing #capsule
- 200219 Tree-structured Attention with Hierarchical Accumulation #parse
- 200226 Sparse Sinkhorn Attention #sparse_attention
- 200317 Axial-DeepLab #panoptic_segmentation
- 200404 Neural Architecture Search for Lightweight Non-Local Networks
- 200421 Attention is Not Only a Weight #bert
- 200423 Self-Attention Attribution #bert
- 200428 Exploring Self-attention for Image Recognition
- 200510 CTC-synchronous Training for Monotonic Attention Model #asr #ctc
- 200516 Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory #asr #memory
- 200519 Normalized Attention Without Probability Cage
- 200519 Staying True to Your Word
- 200626 Object-Centric Learning with Slot Attention
- 201119 On the Dynamics of Training Attention Models #training
- 210223 Linear Transformers Are Secretly Fast Weight Memory Systems #linear_attention #efficient_attention
- 210225 LazyFormer #bert
- 210517 Pay Attention to MLPs #mlp
- 210524 Self-Attention Networks Can Process Bounded Hierarchical Languages #nlp
- 210826 Train Short, Test Long #positional_encoding
audio generation
audio source separation
augmentation
- 200122 FixMatch #semi_supervised_learning #manifold #mixup
- 200220 Affinity and Diversity
- 200621 AdvAug #mixup #nlp #adversarial_training
- 200710 Meta-Learning Requires Meta-Augmentation #metalearning
- 201117 Sequence-Level Mixed Sample Data Augmentation #nlp
- 201213 Simple Copy-Paste is a Strong Data Augmentation Method for Instance #instance_segmentation
- 201214 Improving Panoptic Segmentation at All Scales #panoptic_segmentation
- 210318 AlignMix #mixup
- 210318 TrivialAugment
- 210429 Ensembling with Deep Generative Views #ensemble #gan_inversion
- 220830 Augraphy
autoregressive model
- 200129 Semi Autoregressive Training
- 201027 Scaling Laws for Autoregressive Generative Modeling #scale
- 211216 Characterizing and addressing the issue of oversmoothing in neural autoregressive sequence modeling
- 220622 Scaling Autoregressive Models for Content-Rich Text-to-Image Generation #image_generation
- 230202 Accelerating Large Language Model Decoding with Speculative Sampling #decoding
backbone
- 190724 MixNet #convolution
- 200123 Antialiasing #invariance
- 200128 Attentive Normalization
- 200128 IBN-Net
- 200128 Selective Kernel
- 200128 SpineNet
- 200128 Squeeze-Excitation
- 200128 Switchable Normalization
- 200128 Switchable Whitening
- 200129 Assembled Techniques #regularization
- 200129 DenseNet
- 200129 Dual Path Networks
- 200129 HarDNet
- 200129 PyramidNet
- 200129 SelecSLS
- 200129 ShuffleNet V2 #efficiency
- 200129 VoVNet
- 200130 FishNet
- 200130 HRNet
- 200130 MixConv #convolution
- 200330 Designing Network Design Spaces #hypernetwork
- 200330 TResNet #antialiasing
- 200419 ResNeSt
- 200630 Deep Isometric Learning for Visual Recognition #normalization #resnet #cnn #norm_free
- 200712 PSConv #cnn #multiscale
- 201015 HS-ResNet #multiscale
- 201221 FcaNet #channel_attention
- 210226 Transformer in Transformer #vision_transformer
- 210304 Barlow Twins #self_supervised #contrastive_learning
- 210310 Involution #convolution #attention
- 210312 Revisiting ResNets #resnet
- 210317 Learning to Resize Images for Computer Vision Tasks #resizing
- 210331 EfficientNetV2
- 210408 SI-Score #robustness #vision_transformer
- 210505 RepMLP #mlp
- 210506 Do You Even Need Attention #mlp
- 210510 ResMLP #mlp
- 210617 Layer Folding #efficiency #pruning
- 210628 Early Convolutions Help Transformers See Better #cnn #vit
- 210718 AS-MLP #mlp
- 210726 Contextual Transformer Networks for Visual Recognition
- 211014 Non-deep Networks
- 211018 HRFormer #vit
- 211227 Augmenting Convolutional networks with attention-based aggregation #vit #cnn
- 220110 A ConvNet for the 2020s #cnn #vit
- 220313 Scaling Up Your Kernels to 31x31
- 220318 Three things everyone should know about Vision Transformers #vit
- 220728 HorNet #cnn
bayesian
- 200207 Bayes Posterior
- 200210 Liberty or Depth #mean_field
- 200514 Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors #ensemble #variational_inference
bert
- 200305 What the [MASK]
- 200405 FastBERT #distillation #lightweight
- 200408 DynaBERT #distillation #pruning
- 200412 XtremeDistil #distillation #lightweight
- 200427 DeeBERT #lightweight
- 200518 Audio ALBERT #audio #representation
- 200601 Amnesic Probing
- 200608 On the Stability of Fine-tuning BERT #finetuning
- 200610 Revisiting Few-sample BERT Fine-tuning #finetuning
- 210906 An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models #few_shot #knowledge_base #prompt
- 210907 Beyond Preserved Accuracy #lightweight #distillation
bias
- 200519 Identifying Statistical Bias in Dataset Replication
- 201202 Learning from others' mistakes #product_of_experts
- 220919 The Biased Artist #image_generation
calibration
- 200221 Calibrating Deep Neural Networks using Focal Loss #loss
- 200223 Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks #bayesian
- 200620 Regression Prior Networks
- 210730 Soft Calibration Objectives for Neural Networks
causality
channel attention
chat
- 200630 PLATO-2 #text_gen #chatbot
classification
- 220107 Generalized Category Discovery #open_set_recognition
computation
- 200213 Training Large Neural Networks with Constant Memory using a New Execution Algorithm
- 201204 Nimble
continual learning
- 201124 Energy-Based Models for Continual Learning #energy_based_model
- 211103 One Pass ImageNet #online_learning
contrastive learning
- 200213 A Simple Framework for Contrastive Learning of Visual Representations #augmentation
- 200309 Improved Baselines with Momentum Contrastive Learning
- 200423 Supervised Contrastive Learning #metric_learning
- 200511 Prototypical Contrastive Learning of Unsupervised Representations
- 200520 What Makes for Good Views for Contrastive Learning
- 200613 Bootstrap your own latent
- 200630 Debiased Contrastive Learning
- 200730 Contrastive Learning for Unpaired Image-to-Image Translation #img2img
- 200803 LoCo
- 201020 BYOL works even without batch statistics
- 201109 Towards Domain-Agnostic Contrastive Learning #mixup #multimodal
- 201116 AdCo #adversarial_training
- 201117 Dense Contrastive Learning for Self-Supervised Visual Pre-Training
- 201119 Heterogeneous Contrastive Learning
- 201119 Propagate Yourself
- 201121 Run Away From your Teacher
- 201123 Boosting Contrastive Self-Supervised Learning with False Negative Cancellation
- 201126 Beyond Single Instance Multi-view Unsupervised Representation Learning #self_supervised #mixup
- 201126 How Well Do Self-Supervised Models Transfer #self_supervised #transfer
- 201127 Self-EMD
- 201201 Towards Good Practices in Self-supervised Representation Learning #self_supervised
- 201204 Seed the Views #mixup
- 201212 Contrastive Learning for Label-Efficient Semantic Segmentation #semantic_segmentation
- 201221 Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning #self_supervised #discrete_vae
- 201226 Spatial Contrastive Learning for Few-Shot Classification #few_shot #attention
- 210325 Rethinking Self-Supervised Learning #training
- 210405 An Empirical Study of Training Self-Supervised Vision Transformers #vision_transformer
- 210426 Multimodal Contrastive Training for Visual Representation Learning #multimodal
- 210429 A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning #video
- 210429 Emerging Properties in Self-Supervised Vision Transformers #saliency #vision_transformer #representation
- 210429 With a Little Help from My Friends #knn
- 210510 Self-Supervised Learning with Swin Transformers #vision_transformer
- 210511 VICReg
- 210517 Divide and Contrast #self_supervised #dataset #distillation
- 210601 Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task
- 211018 Understanding Dimensional Collapse in Contrastive Self-supervised Learning
- 220701 e-CLIP #vision-language #retrieval
- 220727 Contrastive Masked Autoencoders are Stronger Vision Learners #self_supervised #mlm
- 220804 Fine-Grained Semantically Aligned Vision-Language Pre-Training #vision-language
- 221017 Non-Contrastive Learning Meets Language-Image Pre-Training #clip
convolution
- 200316 SlimConv
- 210429 Decoupled Dynamic Filter Networks
- 230221 Hyena Hierarchy #state_space_model
dataset
- 200509 Building a Manga Dataset
- 201130 Image Quality Assessment for Perceptual Image Restoration #score
- 201201 Weakly-Supervised Arbitrary-Shaped Text Detection with Expectation-Maximization Algorithm #ocr #weak_supervision
- 210601 Comparing Test Sets with Item Response Theory
- 210907 Datasets
- 210927 PASS
- 211103 LAION-400M
- 220704 How Much More Data Do I Need
- 230220 Poisoning Web-Scale Training Datasets is Practical
ddpm
- 200619 Denoising Diffusion Probabilistic Models
- 201126 Score-Based Generative Modeling through Stochastic Differential Equations #generative_model
- 201214 Learning Energy-Based Models by Diffusion Recovery Likelihood #energy_based_model
- 210302 Fixing Data Augmentation to Improve Adversarial Robustness #augmentation #generative_model
- 210305 Fixing Data Augmentation to Improve Adversarial Robustness 2 #robustness #augmentation #generative_model
- 210506 DiffSinger #singing_voice_synthesis
- 210511 Diffusion Models Beat GANs on Image Synthesis
- 210528 Gotta Go Fast When Generating Data with Score-Based Models
- 210531 On Fast Sampling of Diffusion Probabilistic Models
- 210607 Learning to Efficiently Sample from Diffusion Probabilistic Models
- 210610 Cascaded Diffusion Models for High Fidelity Image Generation
- 210610 Score-based Generative Modeling in Latent Space
- 210612 D2C
- 210701 Variational Diffusion Models
- 210802 SDEdit
- 210819 ImageBART #vq #autoregressive_model
- 211129 Blended Diffusion for Text-driven Editing of Natural Images #clip #image_editing
- 211130 Diffusion Autoencoders
- 211220 GLIDE #multimodal
- 211220 High-Resolution Image Synthesis with Latent Diffusion Models #vae #vq
- 220201 Progressive Distillation for Fast Sampling of Diffusion Models #distillation
- 220316 Dual Diffusion Implicit Bridges for Image-to-Image Translation
- 220524 Imagen #conditional_generative_model
- 220601 Elucidating the Design Space of Diffusion-Based Generative Models
- 220803 Pyramidal Denoising Diffusion Probabilistic Models
- 220808 Analog Bits
- 220912 Blurring Diffusion Models
- 220912 Soft Diffusion
- 220929 DreamFusion #3d_generative_model
- 221017 Imagic #image_editing
- 221018 Differentially Private Diffusion Models
- 221102 eDiffi #text2img
- 221115 Versatile Diffusion #vae
- 221117 Null-text Inversion for Editing Real Images using Guided Diffusion Models #image_editing
- 221118 Magic3D #3d_generative_model #text2img #nerf
- 221120 Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models #text2img
- 221124 Fast Sampling of Diffusion Models via Operator Learning
- 230126 On the Importance of Noise Scheduling for Diffusion Models
- 230126 simple diffusion
- 230131 Attend-and-Excite #text2img
- 230205 Design Booster #image_editing
- 230206 Zero-shot Image-to-Image Translation #image_editing
- 230207 Long Horizon Temperature Scaling #calibration #lm
- 230208 Q-Diffusion #quantization
- 230212 I$^2$SB #sde #image_restoration
- 230215 PRedItOR #image_editing
- 230216 MultiDiffusion #image_editing
- 230220 Composer #image_editing
- 230221 Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels #semi_supervised_learning #self_supervised
- 230221 On Calibrating Diffusion Probabilistic Models
decoding
- 200516 Layer-Wise Cross-View Decoding for Sequence-to-Sequence Learning
- 200601 Cascaded Text Generation with Markov Transformers #text_generation
- 210608 FastSeq
deep prior
detr
- 210813 Conditional DETR for Fast Training Convergence
- 220726 Group DETR #efficient_training
dewarping
dialog
differentiable operator
differentiable tree
discrete vae
disentangle
- 200130 ID-GAN #gan
- 200130 MixNMatch #conditional_generative_model
- 200515 Face Identity Disentanglement via Latent Space Mapping
distillation
- 200129 Learning by Cheating
- 200209 Understanding and Improving Knowledge Distillation
- 200210 Subclass Distillation
- 200219 Knapsack Pruning with Inner Distillation #pruning #lightweight
- 200221 Residual Knowledge Distillation
- 200309 Knowledge distillation via adaptive instance normalization #normalization
- 200521 Why distillation helps #calibration
- 200629 An EM Approach to Non-autoregressive Conditional Sequence Generation #non-autoregressive
- 200701 Go Wide, Then Narrow #lightweight
- 200702 Interactive Knowledge Distillation
- 210726 Text is Text, No Matter What #multitask
distributed training
domain adaptation
dropout
efficiency
efficient attention
- 200410 Longformer
- 200412 ProFormer
- 200605 Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
- 200608 Linformer
- 210324 Finetuning Pretrained Transformers into RNNs
- 210505 Beyond Self-attention
- 210510 Poolingformer
- 210603 Luna
- 210623 Stable, Fast and Accurate
- 210705 Long-Short Transformer #local_attention
- 210712 Combiner #sparse_attention #local_attention
- 210725 H-Transformer-1D
- 211210 Self-attention Does Not Need $O(n^2)$ Memory
- 220527 FlashAttention
- 220726 DETRs with Hybrid Matching #detr
- 220911 On The Computational Complexity of Self-Attention
- 220921 Mega
efficient training
- 230216 Decoupled Model Schedule for Deep Learning Training #distributed_training
embedding
- 200424 All Word Embeddings from One Embedding
- 200717 A Unifying Perspective on Neighbor Embeddings along the Attraction-Repulsion Spectrum
- 210907 Rare Words Degenerate All Words
end2end
- 200605 End-to-End Adversarial Text-to-Speech #tts
- 200608 FastSpeech 2 #tts
- 201106 Wave-Tacotron #tts
- 210716 Autonomy 2.0
- 211215 SPTS
energy based model
ensemble
federated learning
few shot
- 200228 AdarGCN #graph
- 210608 Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks #adapter #multitask
- 210910 LibFewShot
- 220715 Plex #uncertainty #generalization
finetuning
- 200214 AutoLR #pruning
- 200426 Masking as an Efficient Alternative to Finetuning for Pretrained Language Models
- 200709 Sample-based Regularization #transfer
flow
- 200220 Regularized Autoencoders via Relaxed Injective Probability Flow
- 200227 Woodbury Transformations for Deep Generative Flows
fpn
- 200122 CARAFE #resampling
- 200129 Mixture FPN
- 200506 Scale-Equalizing Pyramid Convolution for Object Detection
- 201201 Dynamic Feature Pyramid Networks for Object Detection
- 201202 Dual Refinement Feature Pyramid Networks for Object Detection
- 201202 Parallel Residual Bi-Fusion Feature Pyramid Network for Accurate Single-Shot Object Detection
- 201225 Implicit Feature Pyramid Network for Object Detection #equilibrium_model #implicit_model
gan
- 170629 Do GANs actually learn the distribution
- 191022 MelGAN #tts
- 200129 Adversarial Lipschitz Regularization
- 200129 GAN generalization metric
- 200129 OneGAN
- 200130 AttentionGAN #attention #img2img
- 200130 Evaluation metrics of GAN #metric #evaluation #generative_model
- 200130 Local GAN #attention
- 200130 Noise Robust GAN #robustness
- 200130 Small-GAN
- 200130 Smoothness and Stability in GANs
- 200206 Unbalanced GANs #vae
- 200210 Unsupervised Discovery of Interpretable Directions in the GAN Latent Space #semantic_factor
- 200211 Improved Consistency Regularization for GANs #augmentation #consistency_regularization
- 200211 Smoothness and Stability in GANs #regularization
- 200212 Image-to-Image Translation with Text Guidance #multimodal #multimodal_generation #img2img
- 200212 Real or Not Real, that is the Question
- 200214 Top-k Training of GANs #regularization
- 200220 The Benefits of Pairwise Discriminators for Adversarial Training #regularization
- 200223 GANHopper #img2img
- 200224 When Relation Networks meet GANs #regularization
- 200225 Freeze the Discriminator #finetuning #transfer
- 200226 On Leveraging Pretrained GANs for Generation with Limited Data #finetuning #transfer
- 200227 Topology Distance #topology #score
- 200228 A U-Net Based Discriminator for Generative Adversarial Networks
- 200304 Creating High Resolution Images with a Latent Adversarial Generator #generative_model #super_resolution
- 200308 Perceptual Image Super-Resolution with Progressive Adversarial Network #super_resolution
- 200312 Your GAN is Secretly an Energy-based Model and You Should use Discriminator Driven Latent Sampling #energy_based_model #sampling
- 200317 Blur, Noise, and Compression Robust Generative Adversarial Networks #noise
- 200318 OpenGAN #metric_learning
- 200325 Improved Techniques for Training Single-Image GANs #single_image
- 200326 Image Generation Via Minimizing Fréchet Distance in Discriminator Feature Space
- 200402 Controllable Orthogonalization in Training DNNs #regularization
- 200404 Feature Quantization Improves GAN Training #discrete_vae
- 200405 Discriminator Contrastive Divergence
- 200407 Inclusive GAN
- 200408 Attentive Normalization for Conditional Image Generation #attention
- 200504 Transforming and Projecting Images into Class-conditional Generative Networks #generative_model
- 200518 Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization #audio_generation
- 200519 CIAGAN
- 200519 Regularization Methods for Generative Adversarial Networks #review #regularization
- 200604 Image Augmentations for GAN Training #augmentation
- 200611 Training Generative Adversarial Networks with Limited Data #augmentation
- 200618 Differentiable Augmentation for Data-Efficient GAN Training #augmentation
- 200618 Diverse Image Generation via Self-Conditioned GANs #generative_model
- 200630 PriorGAN
- 200708 InfoMax-GAN #regularization
- 200713 Closed-Form Factorization of Latent Semantics in GANs #semantic_factor
- 200729 Instance Selection for GANs
- 200729 VocGAN #vocoder
- 200730 Rewriting a Deep Generative Model
- 200804 Open-Edit #image_editing
- 200807 Improving the Speed and Quality of GAN by Adversarial Training #robustness
- 201028 Training Generative Adversarial Networks by Solving Ordinary Differential Equations #neural_ode
- 201109 Learning Semantic-aware Normalization for Generative Adversarial Networks #normalization
- 201109 Towards a Better Global Loss Landscape of GANs #training
- 201118 Style Intervention #semantic_factor
- 201124 Adversarial Generation of Continuous Images #implicit_representation
- 201125 How to train your conditional GAN #img2img #generative_model
- 201125 Omni-GAN #generative_model
- 201127 Image Generators with Conditionally-Independent Pixel Synthesis #implicit_representation
- 201201 Refining Deep Generative Models via Discriminator Gradient Flow #sampling
- 201201 pi-GAN #implicit_representation
- 201203 Self-labeled Conditional GANs #unsupervised_training
- 201204 A Note on Data Biases in Generative Models #bias #generative_model
- 201208 You Only Need Adversarial Supervision for Semantic Image Synthesis #img2img
- 210227 Ultra-Data-Efficient GAN Training #augmentation #few_shot
- 210317 Training GANs with Stronger Augmentations via Contrastive Discriminator #contrastive_learning #augmentation
- 210318 Drop the GAN #single_image #generative_model #patch
- 210330 Dual Contrastive Loss and Attention for GANs #contrastive_learning
- 210401 Partition-Guided GANs
- 210407 Regularizing Generative Adversarial Networks under Limited Data #regularization
- 210408 InfinityGAN
- 210413 DatasetGAN #few_shot
- 210413 Few-shot Image Generation via Cross-domain Correspondence #img2img #generative_model #few_shot
- 210414 Aligning Latent and Image Spaces to Connect the Unconnectable
- 210415 GANcraft #nerf
- 210422 On Buggy Resizing Libraries and Surprising Subtleties in FID Calculation #antialiasing
- 210426 EigenGAN #semantic_factor
- 210608 Data-Efficient Instance Generation from Instance Discrimination #contrastive_learning
- 210614 Improved Transformer for High-Resolution GANs #transformer #efficient_training
- 210623 Alias-Free Generative Adversarial Networks #antialiasing
- 210910 Instance-Conditioned GAN
- 210927 WarpedGANSpace
- 211017 AE-StyleGAN #gan_inversion
- 211101 Projected GANs Converge Faster
- 211215 Efficient Geometry-aware 3D Generative Adversarial Networks #nerf
- 211216 GRAM #3d_generative_model #nerf
- 220201 StyleGAN-XL
- 220219 Truncated Diffusion Probabilistic Models #generative_model #ddpm
- 220224 Self-Distilled StyleGAN
- 220311 The Role of ImageNet Classes in Fréchet Inception Distance
- 220314 InsetGAN for Full-Body Image Generation #pose
- 220414 Any-resolution Training for High-resolution Image Synthesis
- 230123 StyleGAN-T #text2img
gan inversion
- 200330 Exploiting Deep Generative Prior for Versatile Image Restoration and Manipulation #perceptual_loss
- 200331 In-Domain GAN Inversion for Real Image Editing
- 200703 Collaborative Learning for Faster StyleGAN Embedding
- 200803 Encoding in Style #stylegan
- 220223 Near Perfect GAN Inversion
generalization
- 200130 Fantastic Generalization Measures
- 200225 Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
generative model
- 190325 Implicit Generation and Generalization in Energy-Based Models #energy_based_model
- 200129 Controlling Generative Model
- 200129 Deep Automodulator
- 200129 Frechet Joint Distance
- 200129 Spot CNN generated image
- 200130 BIVA
- 200130 Glow #flow
- 200130 IGEBM #energy_based_model
- 200130 Neural Spline Flows #flow
- 200130 VQ-VAE-2 #autoregressive_model
- 200217 Augmented Normalizing Flows #flow
- 200313 Semantic Pyramid for Image Generation #perceptual_loss #image_editing
- 200616 Improved Techniques for Training Score-Based Generative Models #ncsn
- 201117 DeepNAG
- 201202 Improved Contrastive Divergence Training of Energy Based Models #energy_based_model
- 201204 Few-shot Image Generation with Elastic Weight Consolidation #few_shot #continual_learning
- 201209 Positional Encoding as Spatial Inductive Bias in GANs #positional_encoding
- 201224 Soft-IntroVAE #vae
- 210223 Zero-Shot Text-to-Image Generation #discrete_vae #autoregressive_model #multimodal
- 210318 Few-shot Semantic Image Synthesis Using StyleGAN Prior #stylegan #few_shot
- 210824 SimVLM #vision-language
- 211015 MaGNET #sampling
- 220208 MaskGIT #autoregressive_model #non-autoregressive #vq
graph
hallucination
hypernetwork
- 200722 WeightNet #channel_attention
hyperparameter
identifiability
image editing
- 200515 Semantic Photo Manipulation with a Generative Image Prior
- 201123 HistoGAN
- 201127 Navigating the GAN Parameter Space for Semantic Image Editing #semantic_factor
- 210318 Using latent space regression to analyze and leverage compositionality
- 220531 IDE-3D #3d_generative_model
- 220802 An Image is Worth One Word
- 220802 Prompt-to-Prompt Image Editing with Cross Attention Control
- 230202 Dreamix #video
- 230213 3D-aware Blending with Generative NeRFs #3d_generative_model
image generation
img2img
- 200130 FUNIT
- 200305 SketchyCOCO
- 200315 GMM-UNIT #multimodal_generation
- 200319 High-Resolution Daytime Translation Without Domain Labels
- 200330 Semi-supervised Learning for Few-shot Image-to-Image Translation #semi_supervised_learning #few_shot
- 200406 Rethinking Spatially-Adaptive Normalization #lightweight
- 200409 TuiGAN #few_shot #single_image
- 200419 TriGAN #domain_adaptation
- 200702 Deep Single Image Manipulation #single_image #image_editing
- 200709 Improving Style-Content Disentanglement in Image-to-Image Translation #disentangle
- 200714 COCO-FUNIT
- 200715 Transformation Consistency Regularization- A Semi-Supervised Paradigm #augmentation #semi_supervised_learning
- 200723 TSIT
- 200724 The Surprising Effectiveness of Linear Unsupervised Image-to-Image Translation
- 201203 CoCosNet v2 #patch #pose
- 201205 Spatially-Adaptive Pixelwise Networks for Fast Image Translation #implicit_representation
implicit model
implicit representation
- 210408 Modulated Periodic Activations for Generalizable Local Functional Representations #positional_encoding #periodic_activation
- 210506 ACORN #positional_encoding
- 211026 NeRV
- 211122 Neural Fields in Visual Computing and Beyond
- 220117 Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
- 220522 ReLU Fields
- 230202 Factor Fields
in context learning
- 220520 Prototypical Calibration for Few-shot Learning of Language Models
- 220522 Instruction Induction
instance segmentation
- 200129 BlendMask
- 200129 COCO 2018 Instance Segmentation #challenge
- 200129 Deep Snake
- 200130 PointRend
- 200311 Conditional Convolutions for Instance Segmentation
- 200313 PointINS #dynamic_conv
- 200722 Deep Variational Instance Segmentation
- 200730 LevelSet R-CNN
- 201119 DCT-Mask
- 201119 Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions #panoptic_segmentation #dynamic_conv
- 201126 The Devil is in the Boundary
- 201129 End-to-End Video Instance Segmentation with Transformers #end2end #detr #video
- 201203 BoxInst #dataset #weak_supervision
- 210503 ISTR #end2end
- 210505 QueryInst #end2end
- 210604 SOLQ
- 210713 Per-Pixel Classification is Not All You Need for Semantic Segmentation #panoptic_segmentation #semantic_segmentation #detr
- 221110 OneFormer #semantic_segmentation #panoptic_segmentation #detr
instruct
interpolation
- 200804 Autoencoder Image Interpolation by Shaping the Latent Space
- 211018 Learning in High Dimension Always Amounts to Extrapolation #extrapolation
knowledge base
language generation
language model
- 200128 Scaling Laws for LM
- 200205 K-Adapter #multitask #adapter
- 200206 Consistency of a Recurrent Language Model With Respect to Incomplete Decoding #decoding #hallucination #language_generation
- 200222 Training Question Answering Models From Synthetic Data #qa #bert
- 200225 MiniLM #distillation #lightweight
- 200406 Sparse Text Generation #language_generation #sampling
- 200427 Recall and Learn #finetuning #continual_learning
- 200505 Stolen Probability
- 200516 MicroNet for Efficient Language Modeling #lightweight
- 200518 Contextual Embeddings
- 201015 Fine-Tuning Pre-trained Language Model with Weak Supervision #transfer #weak_supervision
- 201023 Rethinking embedding coupling in pre-trained language models #regularization
- 201201 How Can We Know When Language Models Know #qa #calibration
- 201228 Universal Sentence Representation Learning with Conditional Masked Language Model #sentence_embedding #mlm
- 210216 Non-Autoregressive Text Generation with Pre-trained Language Models #non-autoregressive #text_generation
- 210318 GPT Understands, Too #finetuning #prompt
- 210407 Revisiting Simple Neural Probabilistic Language Models
- 210420 Carbon Emissions and Large Neural Network Training #nlp
- 210922 Recursively Summarizing Books with Human Feedback #summarization
layout
- 210601 Incorporating Visual Layout Structures for Scientific Text Classification
- 210902 Skim-Attention
- 220418 LayoutLMv3
- 220517 MATrIX -- Modality-Aware Transformer for Information eXtraction
- 220912 PreSTU
- 220918 ERNIE-mmLayout
lightweight
- 200624 Neural Architecture Design for GPU-Efficient Networks
- 201124 MicroNet
- 210507 Pareto-Optimal Quantized ResNet Is Mostly 4-bit #quantization
- 220409 Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs
line
llm
- 220521 Scaling Laws and Interpretability of Learning from Repeated Data
- 220522 Memorization Without Overfitting
- 220524 Large Language Models are Zero-Shot Reasoners #prompt
- 220711 Exploring Length Generalization in Large Language Models
- 220711 Language Models (Mostly) Know What They Know
- 220926 Can Large Language Models Truly Understand Prompts
- 220929 Compositional Semantic Parsing with Large Language Models #semantic_parsing
- 221017 Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them #prompt #reasoning
- 221020 Transcending Scaling Laws with 0.1% Extra Compute #mlm
- 221103 Inverse scaling can become U-shaped #prompt
- 221109 BLOOM
- 221109 Efficiently Scaling Transformer Inference #efficiency
- 221118 PAL #prompt
- 221118 SmoothQuant #quantization
- 230124 A Watermark for Large Language Models
- 230126 DetectGPT
- 230131 Faithful Chain-of-Thought Reasoning #prompt
- 230131 Grounding Language Models to Images for Multimodal Generation #multimodal_generation #vision-language
- 230131 Large Language Models Can Be Easily Distracted by Irrelevant Context #in_context_learning
- 230209 Toolformer
- 230211 Characterizing Attribution and Fluency Tradeoffs for Retrieval-Augmented Large Language Models #retrieval
- 230215 Learning Performance-Improving Code Edits #in_context_learning
- 230215 The Capacity for Moral Self-Correction in Large Language Models #instruct #ethics
- 230216 Pretraining Language Models with Human Preferences #instruct #alignment
- 230221 ChatGPT #instruct
lm
- 210524 StructuralLM #layout
- 210524 True Few-Shot Learning with Language Models #few_shot
- 210528 ByT5
- 210617 LoRA #adapter #finetuning
- 210623 Charformer #tokenizer
- 210714 Deduplicating Training Data Makes Language Models Better #corpus
- 210714 HTLM
- 210811 DEMix Layers #mixture_of_experts
- 210813 Curriculum Learning #curriculum
- 210816 On the Opportunities and Risks of Foundation Models
- 210902 Do Prompt-Based Models Really Understand the Meaning of their Prompts #prompt
- 210903 Finetuned Language Models Are Zero-Shot Learners #zero-shot
- 210908 A Recipe For Arbitrary Text Style Transfer with Large Language Models #prompt
- 211011 Unsupervised Neural Machine Translation with Generative Language Models Only #unsupervised_nmt
- 211015 Multitask Prompted Training Enables Zero-Shot Task Generalization #zero-shot
- 211016 Invariant Language Modeling #irm
- 211016 MarkupLM #layout
- 211016 Sharpness-Aware Minimization Improves Language Model Generalization #regularization
- 211020 Shaking the foundations #causality
- 211027 Training Verifiers to Solve Math Word Problems
- 211213 GLaM #moe
- 211220 Efficient Large Scale Language Modeling with Mixtures of Experts #mixture_of_experts
- 220210 Red Teaming Language Models with Language Models #safety
- 220213 A Contrastive Framework for Neural Text Generation #decoding
- 220215 General-purpose, long-context autoregressive modeling with Perceiver AR #efficient_attention #autoregressive_model
- 220314 Efficient Language Modeling with Sparse all-MLP #mlp
- 220329 Training Compute-Optimal Large Language Models
- 220413 METRO
- 220414 GPT-NeoX-20B
- 220502 OPT
- 220524 On the Role of Bidirectionality in Language Model Pre-Training #bert
- 220728 Efficient Training of Language Models to Fill in the Middle #mlm
- 220805 Branch-Train-Merge #product_of_experts #ensemble
- 220805 Few-shot Learning with Retrieval Augmented Language Model #retrieval #few_shot
- 221110 The CRINGE Loss #safety
- 230131 In-Context Retrieval-Augmented Language Models #retrieval
local attention
loss
loss surface
matting
memory
meta learning
- 200221 Learning to Continually Learn #continual_learning
- 200312 Online Fast Adaptation and Knowledge Accumulation
- 200401 Editable Neural Networks
- 200706 Meta-Learning Symmetries by Reparameterization #group_equivariance
metric
metric learning
mixture of experts
mixup
- 201220 ResizeMix
- 211228 LINDA #interpolation
mlm
- 200424 Probabilistically Masked Language Model Capable of Autoregressive Generation in Arbitrary Word Order #language_generation
- 210502 Larger-Scale Transformers for Multilingual Masked Language Modeling #multilingual #scale
- 220216 Should You Mask 15% in Masked Language Modeling
- 220715 Position Prediction as an Effective Pretraining Strategy #unsupervised_training
- 220909 Improved Masked Image Generation with Token-Critic #non-autoregressive
- 220929 Bidirectional Language Models Are Also Few-shot Learners #in_context_learning
- 221006 XDoc #layoutlm
- 221114 EVA #clip
- 230204 Representation Deficiency in Masked Language Modeling
mlops
multilingual
- 220512 Lifting the Curse of Multilinguality by Pre-training Modular Transformers #adapter #mixture_of_experts
multimodal
- 200401 Pixel-BERT
- 200513 INFOTABS
- 200514 Behind the Scene
- 201130 Multimodal Pretraining Unmasked
- 210928 VideoCLIP #video_transformer #retrieval
- 220512 A Generalist Agent #reinforcement_learning
- 220527 GIT
- 230110 Scaling Laws for Generative Mixed-Modal Language Models
- 230123 Zorro #video #audio
- 230201 mPLUG-2
multimodal generation
multitask
- 200508 Transforming task representations to perform novel tasks #continual_learning
- 200625 MTAdam
- 210825 Multi-Task Self-Training for Learning General Representations
- 220520 UViM
- 230207 Exploring the Benefits of Training Expert Language Models over Instruction Tuning #instruct
nas
- 200324 BigNAS
- 200326 Are Labels Necessary for Neural Architecture Search #unsupervised_training
- 200406 Network Adjustment
- 200412 FBNetV2
- 200428 Angle-based Search Space Shrinking for Neural Architecture Search
- 200506 Local Search is State of the Art for Neural Architecture Search
- 200507 Noisy Differentiable Architecture Search
- 200602 FBNetV3 #hyperparameter #training #swa
- 200720 NSGANetV2
- 220831 Efficient Sparsely Activated Transformers #moe
nerf
- 201014 NeRF++
- 201125 Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes
- 201127 D-NeRF
- 201203 Learned Initializations for Optimizing Coordinate-Based Neural Representations #implicit_representation
- 201203 pixelNeRF
- 201215 Object-Centric Neural Scene Rendering
- 210225 IBRNet
- 210318 FastNeRF
- 210318 GNeRF
- 210318 MVSNeRF
- 210318 NeMI
- 210324 Mip-NeRF
- 210325 KiloNeRF
- 210325 PlenOctrees for Real-time Rendering of Neural Radiance Fields
- 210706 Depth-supervised NeRF
- 210809 NeuralMVS
- 211019 CIPS-3D #stylegan
- 211129 Deblur-NeRF
- 211129 HDR-NeRF
- 211129 Urban Radiance Fields
- 211210 CityNeRF
- 221010 NerfAcc
- 230204 AV-NeRF
- 230208 Nerfstudio
neural computer
neural ode
- 200207 How to train your neural ODE
- 200520 Neural Controlled Differential Equations
- 200708 Learning Differential Equations that are Easy to Solve
neural rendering
- 200226 Learning to Shadow Hand-drawn Sketches
- 200427 Neural Hair Rendering
- 200506 CONFIG
- 201116 Stylized Neural Painting
- 201119 Creative Sketch Generation
- 201130 Animating Pictures with Eulerian Motion Fields #single_image
- 210319 Paint by Word
- 210512 Enhancing Photorealism Enhancement
- 211013 ADOP
- 220728 Neural Strands
nlp
- 200518 (Re)construing Meaning in NLP
- 200715 Towards Debiasing Sentence Representations #bias
- 220826 What Do NLP Researchers Believe
nmt
- 200207 A Multilingual View of Unsupervised Machine Translation #multilingual
- 200427 Lexically Constrained Neural Machine Translation with Levenshtein Transformer
- 200710 Learn to Use Future Information in Simultaneous Translation #simultaneous_translation
- 201224 Why Neural Machine Translation Prefers Empty Outputs #hallucination
- 211015 Breaking Down Multilingual Machine Translation #multilingual
- 230120 Is ChatGPT A Good Translator #chatgpt
- 230219 Scaling Laws for Multilingual Neural Machine Translation #multilingual #scaling
non autoregressive
- 200403 Aligned Cross Entropy for Non-Autoregressive Machine Translation
- 200415 Non-Autoregressive Machine Translation with Latent Alignments #nmt #ctc
- 200422 A Study of Non-autoregressive Model for Sequence Generation
- 201022 Parallel Tacotron #vae
- 201025 Improved Mask-CTC for Non-Autoregressive End-to-End ASR #ctc
- 201125 FBWave #vocoder #lightweight
- 201207 EfficientTTS #tts
- 211213 Step-unrolled Denoising Autoencoders for Text Generation
- 220520 Lossless Acceleration for Seq2seq Generation with Aggressive Decoding #efficiency
norm free
- 200310 ReZero is All You Need #initialization
normalization
- 200122 Group Norm, Weight Standardization
- 200122 Moving Average Batch Normalization
- 200122 StyleGAN 2 #gan
- 200130 Rethinking Normalization
- 200130 Weight Standardization #weight
- 200224 Batch Normalization Biases Residual Blocks Towards the Identity Function #optimization #norm_free #initialization
- 200306 TaskNorm #meta_learning
- 200406 Evolving Normalization-Activation Layers #nas #activation
- 200427 A Batch Normalized Inference Network Keeps the KL Vanishing Away
- 201128 Batch Normalization with Enhanced Linear Transformation
- 211026 Revisiting Batch Normalization
object detection
- 191118 Anchor-Free
- 191118 CenterMask #instance_segmentation #backbone #1stage
- 191121 EfficientDet
- 200103 BlendMask #instance_segmentation #1stage
- 200122 SABL
- 200129 AP Loss #loss
- 200129 Backbone Reallocation for Detection #backbone #nas
- 200129 Dense RepPoints
- 200129 DetNAS #nas #backbone
- 200129 IOU-aware single stage detector #1stage
- 200130 ATSS #anchor #retinanet #fcos
- 200130 AutoAugment #augmentation #search
- 200130 EfficientDet #fpn
- 200130 Keypoint Triplet #keypoint
- 200130 Learning from Noisy Anchors
- 200130 Multiple Anchor Learning #anchor
- 200130 Objects as Points #keypoint
- 200130 Soft Anchor-Point #anchor
- 200211 Object Detection as a Positive-Unlabeled Problem #positive_unlabeled #dataset
- 200212 Solving Missing-Annotation Object Detection with Background Recalibration Loss #dataset #noise
- 200218 Universal-RCNN #multi_dataset #graph
- 200316 Frustratingly Simple Few-Shot Object Detection #few_shot
- 200317 Revisiting the Sibling Head in Object Detector
- 200319 Revisiting the Sibling Head in Object Detector #review
- 200320 CentripetalNet #keypoint
- 200413 Dynamic R-CNN
- 200423 YOLOv4
- 200511 Scope Head for Accurate Localization in Object Detection
- 200526 End-to-End Object Detection with Transformers #end2end #matching
- 200603 DetectoRS
- 200611 Rethinking Pre-training and Self-training #semi_supervised_learning #transfer
- 200706 LabelEnc #distillation
- 200707 AutoAssign #anchor_free
- 200714 AQD #quantization
- 200715 Probabilistic Anchor Assignment with IoU Prediction for Object Detection #anchor #1stage
- 200716 RepPoints V2 #1stage #anchor_free
- 200723 PP-YOLO #tuning
- 200723 The Devil is in Classification #longtail
- 200727 Corner Proposal Network for Anchor-free, Two-stage Object Detection #anchor_free #2stage
- 201116 Scaled-YOLOv4
- 201118 End-to-End Object Detection with Adaptive Clustering Transformer #detr #end2end #efficiency
- 201121 Rethinking Transformer-based Set Prediction for Object Detection #detr #end2end #efficiency
- 201124 Sparse R-CNN
- 201128 Class-agnostic Object Detection
- 201207 End-to-End Object Detection with Fully Convolutional Network #end2end
- 201223 SWA Object Detection #swa
- 201227 Towards A Category-extended Object Detector without Relabeling or Conflicts #continual_learning
- 210225 Simple multi-dataset detection #multi_dataset
- 210316 You Only Look One-level Feature
- 210325 USB #dataset
- 210417 TransVG #visual_grounding
- 210420 PP-YOLOv2 #yolo
- 210426 MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding #detr #visual_grounding
- 210601 You Only Look at One Sequence #vit
- 210615 Dynamic Head #attention
- 210718 YOLOX #yolo
- 210728 SimROD #domain_adaptation #self_supervised
- 210922 Pix2seq #detr #autoregressive_model
- 210929 Localizing Objects with Self-Supervised Transformers and no Labels #self_supervised #self_supervised_discovery #salient_object_detection
- 211101 PP-PicoDet #lightweight
- 211122 Benchmarking Detection Transfer Learning with Vision Transformers #unsupervised_training #vit
- 211123 Dynamic DETR
- 211129 Sparse DETR #detr
- 220107 Detecting Twenty-thousand Classes using Image-level Supervision #weak_supervision
- 220330 Exploring Plain Vision Transformer Backbones for Object Detection #vit #instance_segmentation
- 220615 A Unified Sequence Interface for Vision Tasks #multitask #instance_segmentation #keypoint
ocr
- 191231 LayoutLM
- 200217 Text Perceptron
- 210415 Rethinking Text Line Recognition Models
- 220107 Data-Efficient Information Extraction from Form-Like Documents #information_extraction
- 220328 Towards End-to-End Unified Scene Text Detection and Layout Analysis
- 220416 Pushing the Performance Limit of Scene Text Recognizer without Human Annotation
open set recognition
optimization
- 200221 The Break-Even Point on Optimization Trajectories of Deep Neural Networks #loss #training
- 200224 The Early Phase of Neural Network Training
- 200227 Using a thousand optimization tasks to learn hyperparameter search strategies #optimizer #hyperparameter
- 200228 A Self-Tuning Actor-Critic Algorithm #reinforcement_learning #hyperparameter #meta_learning
- 200316 Weak and Strong Gradient Directions
- 200403 Gradient Centralization #training
- 200508 An Investigation of Why Overparameterization Exacerbates Spurious Correlations #training
- 200519 One Size Fits All
optimizer
- 200130 LAMB #large_batch
- 211006 8-bit Optimizers via Block-wise Quantization
- 221117 VeLO
- 230118 Learning-Rate-Free Learning by D-Adaptation
- 230213 Symbolic Discovery of Optimization Algorithms #search
oriented object detection
out of distribution
- 200509 Generalizing Outside the Training Set
- 200519 Bridging the Gap Between Training and Inference for Spatio-Temporal Forecasting
panoptic segmentation
- 200129 Bridging the train/infer gap in Panoptic Segmentation
- 200130 Panoptic-DeepLab
- 200218 Towards Bounding-Box Free Panoptic Segmentation #box_free
- 200404 Pixel Consensus Voting for Panoptic Segmentation
- 200421 Panoptic-based Image Synthesis #neural_rendering
- 201123 Scaling Wide Residual Networks for Panoptic Segmentation #scale
- 201201 Fully Convolutional Networks for Panoptic Segmentation #dynamic_conv
- 201201 MaX-DeepLab #detr #end2end
- 201202 Single-shot Path Integrated Panoptic Segmentation #dynamic_conv
- 210910 Panoptic Narrative Grounding #visual_grounding
- 211202 Masked-attention Mask Transformer for Universal Image Segmentation #detr
perceptual loss
- 200206 Image Fine-grained Inpainting #inpainting
- 200515 Enhancing Perceptual Loss with Adversarial Feature Matching for Super-Resolution
- 200626 A Loss Function for Generative Neural Networks Based on Watson's Perceptual Model
- 201223 Focal Frequency Loss for Image Reconstruction and Synthesis #loss
point cloud
pooling
pose
- 200729 Unselfie #inpainting
- 210913 Pose with Style
positional encoding
- 200628 Rethinking Positional Encoding in Language Pre-training
- 210706 Rethinking Positional Encoding
practice
pretraining
- 190620 XLNet #language_model
- 190729 RoBERTa #language_model
- 200128 mBART #machine_translation #nlp
- 200129 ImageBERT #multimodal
- 200129 LM Pretraining #nlp
- 200129 oLMpics #language_model #nlp
- 200130 RoBERTa #language_model #nlp #transformer
- 200130 T5 #nlp #transformer #seq2seq
- 200130 ViLBERT #multimodal
- 200210 Pre-training Tasks for Embedding-based Large-scale Retrieval #retrieval
- 200217 Incorporating BERT into Neural Machine Translation #language_model #bert #nmt
- 200219 CodeBERT #bert
- 200228 UniLMv2 #language_model
- 200317 Calibration of Pre-trained Transformers #calibration
- 200405 Unsupervised Domain Clusters in Pretrained Language Models #domain
- 200412 Pre-training Text Representations as Meta Learning #meta_learning #finetuning
- 200413 Pretrained Transformers Improve Out-of-Distribution Robustness #out_of_distribution
- 200419 Are we pretraining it right #multimodal
- 200420 Adversarial Training for Large Neural Language Models #adversarial_training #language_model #finetuning
- 200420 MPNet #language_model
- 200423 Don't Stop Pretraining #domain
- 200427 LightPAFF #distillation #finetuning
- 200520 Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models #contrastive_learning #sentence_embedding
- 200610 MC-BERT
- 200615 To Pretrain or Not to Pretrain #nlp #finetuning
- 200626 Pre-training via Paraphrasing #retrieval
- 200703 Language-agnostic BERT Sentence Embedding #embedding #multilingual
- 200713 An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models #nlp #multitask
- 200715 InfoXLM #nlp #cross_lingual
- 200804 Taking Notes on the Fly Helps BERT Pre-training #nlp
- 201020 Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition #semi_supervised_learning #asr
- 201021 Self-training and Pre-training are Complementary for Speech Recognition #self_supervised #asr
- 201022 mT5 #language_model #multilingual
- 201109 When Do You Need Billions of Words of Pretraining Data #language_model
- 201117 UP-DETR #detr #end2end #object_detection
- 201127 Progressively Stacking 2.0 #efficiency
- 201201 Pre-Trained Image Processing Transformer #contrastive_learning #vision_transformer #restoration
- 201201 StructFormer #parse #attention #mlm
- 201227 Syntax-Enhanced Pre-trained Model #language_model #syntax
- 210225 SparseBERT #attention #sparse_attention #bert
- 210318 All NLP Tasks Are Generation Tasks #language_model
- 210324 Can Vision Transformers Learn without Natural Images #vision_transformer
- 210402 Robust wav2vec 2.0 #asr
- 210407 Pushing the Limits of Non-Autoregressive Speech Recognition #non-autoregressive #asr #ctc
- 210413 Masked Language Modeling and the Distributional Hypothesis #language_model #mlm
- 210417 mT6 #language_model
- 210418 Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation #multimodal
- 210422 ImageNet-21K Pretraining for the Masses #backbone
- 210510 Are Pre-trained Convolutions Better than Pre-trained Transformers #nlp #convolution #transformer
- 210606 On the Effectiveness of Adapter-based Tuning for Pretrained Language Model Adaptation #finetuning #adapter
- 210606 Rethinking Training from Scratch for Object Detection #object_detection
- 210608 DETReg #detr
- 210614 SAS
- 210615 BEiT #vit #bert
- 210907 How much pretraining data do language models need to learn syntax #bert
- 210910 ReasonBERT #bert #reasoning #qa
- 210913 STraTA #finetuning #semi_supervised_learning #few_shot
- 210914 Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition #asr
- 210914 Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding #finetuning #semi_supervised_learning #few_shot
- 210927 BigSSL #asr #semi_supervised_learning #unsupervised_training
- 211005 Exploring the Limits of Large Scale Pre-training #classification #scaling
- 211018 Unsupervised Finetuning #unsupervised_training #finetuning
- 211026 WavLM #speech
- 211103 VLMo #mixture_of_experts #vision-language
- 211111 Masked Autoencoders Are Scalable Vision Learners #vit
- 211122 ExT5 #multitask
- 211122 Florence #vision-language #transfer
- 211201 Revisiting the Transferability of Supervised Pretraining #transfer
- 211216 Masked Feature Prediction for Self-Supervised Visual Pre-Training #self_supervised
- 211220 Are Large-scale Datasets Necessary for Self-Supervised Pre-training #self_supervised #transfer
- 220429 Vision-Language Pre-Training for Boosting Scene Text Detectors
- 220914 PaLI #vision-language
probabilistic model
prompt
- 220118 ZeroPrompt #zero-shot
- 220916 Text and Patterns
- 230207 Hard Prompts Made Easy #text2img
pruning
- 200130 Rethinking Pruning
- 200218 Picking Winning Tickets Before Training by Preserving Gradient Flow #lottery_ticket
- 200224 HRank #rank
- 200305 Comparing Rewinding and Fine-tuning in Neural Network Pruning
- 200424 Convolution-Weight-Distribution Assumption
- 200514 Bayesian Bits #quantization #variational_inference
- 200515 Movement Pruning
- 200518 Joint Multi-Dimension Pruning
- 200706 Lossless CNN Channel Pruning via Decoupling Remembering and Forgetting
- 200710 To Filter Prune, or to Layer Prune, That Is The Question
qa
quantization
reasoning
regularization
- 200130 DropAttention #dropout
- 200219 Revisiting Training Strategies and Generalization Performance in Deep #metric_learning
- 200225 On Feature Normalization and Data Augmentation #normalization #mixup
- 200228 The Implicit and Explicit Regularization Effects of Dropout #dropout
- 200331 Regularizing Class-wise Predictions via Self-knowledge Distillation #distillation #consistency_regularization
- 200409 Orthogonal Over-Parameterized Training
- 200424 Dropout as an Implicit Gating Mechanism For Continual Learning
- 200427 Scheduled DropHead
- 200513 Implicit Regularization in Deep Learning May Not Be Explainable by Norms #training #optimization
- 200707 RIFLE #finetuning
- 200707 Remix #imbalanced
- 200721 Improving compute efficacy frontiers with SliceOut #efficient_training
- 201122 Stable Weight Decay Regularization
- 220527 Sharpness-Aware Training for Free
reinforcement learning
- 191120 Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
- 200130 Mastering Atari, Go, Chess, Shogi
- 200626 Critic Regularized Regression
- 210929 Vision-Guided Quadrupedal Locomotion in the Wild with Multi-Modal Delay Randomization
- 211030 Mastering Atari Games with Limited Data
- 230210 The Wisdom of Hindsight Makes Language Models Better Instruction Followers #instruct
rendering
representation
- 200220 Neural Bayes #bayesian #clustering
- 200412 Gradients as Features for Deep Representation Learning
- 201223 Noisy Labels Can Induce Good Representations #noise
resampling
restoration
- 200402 Learning to See Through Obstructions
- 200404 Deblurring by Realistic Blurring
- 200406 Self-Supervised Scene De-occlusion
- 201123 Cross-Camera Convolutional Color Constancy
- 201123 Dissecting Image Crops
retrieval
- 210715 Internet-Augmented Dialogue Generation #dialog
- 220124 Text and Code Embeddings by Contrastive Pre-Training
review
- 200130 Filter Response Normalization
- 200227 A Primer in BERTology #bert
- 200306 What is the State of Neural Network Pruning #pruning
- 200311 Improved Baselines with Momentum Contrastive Learning #contrastive_learning
- 200318 A Metric Learning Reality Check #metric_learning
- 200324 A Systematic Evaluation
- 200325 Rethinking Few-Shot Image Classification #meta_learning
- 200408 State of the Art on Neural Rendering #neural_rendering
- 200409 EvoNorm
- 200428 Showing Your Work Doesn't Always Work
- 200619 Augmentation for GANs
- 200627 Denoising Diffusion Probabilistic Models Implementation
- 200717 Semantic factor of GANs
- 200725 Neighbor Embedding
- 200821 Virtual Try On
- 201016 Representation Learning via Invariant Causal Mechanisms
- 201021 BYOL works even without batch statistics
- 201108 Long Range Arena #attention #efficient_attention
- 201112 Learning Semantic-aware Normalization for Generative Adversarial Networks
- 201112 When Do You Need Billions of Words of Pretraining Data
- 210324 A Broad Study on the Transferability of Visual Representations with Contrastive Learning #contrastive_learning
- 210325 Contrasting Contrastive Self-Supervised Representation Learning Models #contrastive_learning
- 210512 When Does Contrastive Visual Representation Learning Work #contrastive_learning #self_supervised #transfer
robustness
- 200211 Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations #adversarial_training
- 200304 A Closer Look at Accuracy vs. Robustness #adversarial_training
- 200810 Informative Dropout for Robust Representation Learning
- 220607 Can CNNs Be More Robust Than Transformers
saliency
salient object detection
scale
- 200712 Learning to Learn Parameterized Classification Networks for Scalable Input Images #hypernetwork
- 201130 Towards Better Accuracy-efficiency Trade-offs
score
self supervised
- 200213 Automatically Discovering and Learning New Visual Categories with Ranking Statistics #weak_supervision
- 200218 MAST #tracking
- 200224 Self-Adaptive Training #noise #dataset
- 200408 Improving BERT with Self-Supervised Attention #bert #distillation
- 200722 CrossTransformers #few_shot
- 201015 Representation Learning via Invariant Causal Mechanisms #causality
- 201117 Neural Semi-supervised Learning for Text Classification Under Large-Scale Pretraining #nlp
- 201125 Can Temporal Information Help with Contrastive Self-Supervised Learning #video #augmentation
- 201224 Self-supervised Pre-training with Hard Examples Improves Visual Representations #mixup
- 210726 Continental-Scale Building Detection from High Resolution Satellite Imagery
- 210827 Injecting Text in Self-Supervised Speech Pretraining #asr
- 210927 Compressive Visual Representations
- 211027 Neural Analysis and Synthesis #audio_synthesis
- 220124 data2vec
- 220216 Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
- 220520 Uniform Masking
- 220526 Green Hierarchical Vision Transformer for Masked Image Modeling
- 220526 MixMIM
- 220526 Revealing the Dark Secrets of Masked Image Modeling #representation
- 220715 Is a Caption Worth a Thousand Images #clip
- 220803 Masked Vision and Language Modeling for Multi-modal Representation Learning #mlm
self supervised discovery
- 200403 Self-Supervised Viewpoint Learning From Image Collections #viewpoint
- 201127 Unsupervised part representation by Flow Capsules
- 210429 MarioNette
semantic factor
- 200307 StyleGAN2 Distillation for Feed-forward Image Manipulation #stylegan
- 200308 PULSE #stylegan
- 200406 GANSpace
- 201222 Time-Travel Rephotography #restoration #stylegan
semantic segmentation
- 200323 Learning Dynamic Routing for Semantic Segmentation
- 200516 Single-Stage Semantic Segmentation from Image Labels
- 200826 EfficientFCN
- 210512 Segmenter
- 220918 SegNeXt
semi supervised learning
- 200218 DivideMix #mixup #noise #dataset
- 200306 Semi-Supervised StyleGAN for Disentanglement Learning #stylegan #mixup
- 200323 Meta Pseudo Labels #meta_learning
- 200627 Laplacian Regularized Few-Shot Learning #few_shot
- 200724 Deep Co-Training with Task Decomposition for Semi-Supervised Domain Adaptation #domain_adaptation
- 201116 On the Marginal Benefit of Active Learning #active_learning #unsupervised_training
- 201118 FROST
- 220811 Semi-supervised Vision Transformers at Scale
- 220829 Open-Set Semi-Supervised Object Detection #open_set_recognition
- 220918 The Geometry of Self-supervised Learning Models and its Impact on Transfer Learning
sgld
singing voice synthesis
single image
speech
- 200129 Speech Recognition
- 200129 WaveFlow #conditional_generative_model
state space model
- 211031 Efficiently Modeling Long Sequences with Structured State Spaces
- 221017 What Makes Convolutional Models Great on Long Sequence Modeling
- 230213 Simple Hardware-Efficient Long Convolutions for Sequence Modeling
structure learning
style transfer
- 200318 A Content Transformation Block For Image Style Transfer
- 200324 Deformable Style Transfer
- 200710 Geometric Style Transfer
stylegan
- 210318 Labels4Free #unsupervised_segmentation
super resolution
table
text generation
text2img
tokenizer
topic model
topology
tracking
- 200402 Tracking Objects as Points #keypoint
- 200402 Tracking by Instance Detection #meta_learning
- 200403 FairMOT
- 200506 PeTra
- 201215 Detecting Invisible People
- 211013 ByteTrack
training
transducer
transfer
- 200130 BiT ResNet #resnet
- 200512 Neural Architecture Transfer #nas
- 200711 Adversarially-Trained Deep Nets Transfer Better #adversarial_training
- 200716 Do Adversarially Robust ImageNet Models Transfer Better #robustness
- 200721 Adversarial Training Reduces Information and Improves Transferability #adversarial_training
- 201122 Ranking Neural Checkpoints
- 211012 Rethinking supervised pre-training for better downstream transferring #classificiation #metric_learning
transformer
- 200129 Are Transformers universal approximators
- 200129 Product Key Memory #attention
- 200129 Reformer #attention
- 200130 Sparse Transformer #generative_model
- 200130 Structured Pruning for LM #pruning
- 200207 Transformer Transducer #asr #transducer
- 200211 On Layer Normalization in the Transformer Architecture #normalization
- 200212 GLU Variants Improve Transformer #activation
- 200214 Transformer on a Diet #efficient_attention
- 200214 Transformers as Soft Reasoners over Language #language
- 200215 Fine-Tuning Pretrained Language Models #bert #finetuning
- 200221 Addressing Some Limitations of Transformers with Feedback Memory #recurrent
- 200305 Talking-Heads Attention #attention
- 200424 Lite Transformer with Long-Short Range Attention #lightweight
- 200515 Finding Experts in Transformer Models
- 200515 JDI-T #tts
- 200516 Conformer #asr
- 200518 Weak-Attention Suppression For Transformer Based Speech Recognition #asr
- 200605 Funnel-Transformer #efficient_attention
- 200707 Do Transformers Need Deep Long-Range Memory #lm #attention
- 200709 Fast Transformers with Clustered Attention #attention
- 200715 AdapterHub #nlp #finetuning
- 200727 Big Bird #attention
- 200802 DeLighT #nlp
- 201217 Taming Transformers for High-Resolution Image Synthesis #discrete_vae #generative_model #autoregressive_model
- 201221 RealFormer #attention
- 201227 SG-Net #syntax #attention
- 210223 Do Transformer Modifications Transfer Across Implementations and Applications
- 210225 Evolving Attention with Residual Convolutions #attention
- 210318 HiT #video #retrieval
- 210318 Looking Beyond Two Frames #tracking
- 210318 TFPose #pose
- 210318 TransCenter #tracking
- 210318 Transformer Tracking #tracking
- 210407 Seeing Out of tHe bOx #multimodal #vision_language
- 210409 Efficient Large-Scale Language Model Training on GPU Clusters #distributed_training
- 210409 Not All Attention Is All You Need
- 210410 UniDrop #regularization
- 210417 Demystifying the Better Performance of Position Encoding Variants for Transformer #positional_encoding
- 210420 RoFormer #positional_encoding
- 210423 M3DeTR #3d
- 210509 FNet #efficient_attention #fourier
- 210613 Thinking Like Transformers
- 210617 Multi-head or Single-head
- 210730 Perceiver IO
- 210809 Making Transformers Solve Compositional Tasks
- 210812 Mobile-Former #backbone
- 210830 A Battle of Network Structures #cnn #mlp #backbone
- 210830 Shatter #bert
- 210908 Panoptic SegFormer #panoptic_segmentation #detr
- 210909 Bag of Tricks for Optimizing Transformer Efficiency #nmt #lightweight
- 210917 Primer #lm #nas
- 210922 Scale Efficiently
- 211018 NormFormer
- 211026 Hierarchical Transformers Are More Efficient Language Models #lm #efficient_attention
- 211122 MetaFormer is Actually What You Need for Vision #vit
- 211124 Sparse is Enough in Scaling Transformers #sparsity #efficiency
- 220221 Transformer Quality in Linear Time #efficient_attention #linear_attention #local_attention
- 220301 DeepNet #normalization
- 220330 Transformer Language Models without Positional Encodings Still Learn Positional Information #lm #positional_encoding
- 220924 In-context Learning and Induction Heads #in_context_learning
- 221004 MOAT #backbone
- 230209 In-Context Learning with Many Demonstration Examples #efficient_attention
tropical geometry
tts
uncertainty
unsupervised img2img
- 200310 Unpaired Image-to-Image Translation using Adversarial Consistency Loss
- 200611 Rethinking the Truly Unsupervised Image-to-Image Translation
- 201201 Unpaired Image-to-Image Translation via Latent Energy Transport
unsupervised nmt
vae
- 200420 Bringing Old Photos Back to Life #restoration
- 200707 NVAE
- 201119 Dual Contradistinctive Generative Autoencoder
- 201120 Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them
video
video transformer
vision
vision language
- 201212 MiniVLM
- 201222 Seeing past words
- 210407 Multimodal Fusion Refiner Networks
- 210727 Is Object Detection Necessary for Human-Object Interaction Recognition #human_object_interaction
- 211103 An Empirical Study of Training End-to-End Vision-and-Language Transformers #multimodal
- 220221 Vision-Language Pre-Training with Triple Contrastive Learning
- 220504 CoCa
- 220612 GLIPv2
- 220615 Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
- 220617 Bridge-Tower
- 220617 Unified-IO #multitask
- 220810 Patching open-vocabulary models by interpolating weights #clip #multitask #domain
- 220822 Image as a Foreign Language #mlm
- 230202 Multimodal Chain-of-Thought Reasoning in Language Models #multimodal
- 230209 Re-ViLM
vision transformer
- 201127 General Multi-label Image Classification with Transformers
- 201223 A Survey on Visual Transformer
- 201223 Training data-efficient image transformers & distillation through attention #distillation
- 210223 Pyramid Vision Transformer
- 210318 CrossViT
- 210318 CvT
- 210318 Multi-Scale Vision Longformer
- 210319 ConViT
- 210319 Scalable Visual Transformers with Hierarchical Pooling
- 210324 Vision Transformers for Dense Prediction #fpn
- 210325 Swin Transformer #local_attention
- 210331 Going deeper with Image Transformers
- 210402 LeViT
- 210421 Token Labeling
- 210422 Multiscale Vision Transformers
- 210422 So-ViT
- 210426 Improve Vision Transformers Training by Suppressing Over-smoothing
- 210426 Visformer
- 210427 ConTNet
- 210428 Twins #local_attention #positional_encoding
- 210509 Conformer
- 210515 Are Convolutional Neural Networks or Transformers more like human vision #cnn #inductive_bias
- 210517 Rethinking the Design Principles of Robust Vision Transformer #robustness
visual grounding
vit
- 210521 Intriguing Properties of Vision Transformers #robustness
- 210526 Aggregating Nested Transformers #local_attention
- 210529 Less is More
- 210603 DynamicViT #sparse_attention
- 210603 When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations #regularization
- 210604 RegionViT #local_attention
- 210607 Refiner #attention
- 210607 Shuffle Transformer
- 210608 Scaling Vision Transformers #scale
- 210609 CoAtNet
- 210614 Delving Deep into the Generalization of Vision Transformers under Distribution Shifts #robustness
- 210615 Revisiting the Calibration of Modern Neural Networks #mlp #calibration
- 210617 XCiT #efficient_attention
- 210624 Exploring Corruption Robustness #robustness #mlp
- 210624 VOLO #efficient_attention
- 210624 Video Swin Transformer #local_attention #video #video_transformer
- 210701 CSWin Transformer #efficient_attention #local_attention
- 210701 Focal Self-attention for Local-Global Interactions in Vision Transformers #local_attention
- 210705 What Makes for Hierarchical Vision Transformer #attention #mlp #local_attention
- 210713 Visual Parser #local_attention
- 210731 CrossFormer
- 210811 ConvNets vs. Transformers #robustness #transfer
- 210819 Do Vision Transformers See Like Convolutional Neural Networks #resnet
- 210908 Scaled ReLU Matters for Training Vision Transformers #cnn
- 211118 Swin Transformer V2
- 211202 Improved Multiscale Vision Transformers for Classification and Detection
- 211210 Deep ViT Features as Dense Visual Descriptors #self_supervised #semantic_segmentation
- 211217 A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation #multiscale
- 220214 How Do Vision Transformers Work #cnn
- 220414 DeiT III
- 220722 An Impartial Take to the CNN vs Transformer Robustness Contest #robustness #cnn
- 220812 BEiT v2 #self_supervised #mlm
- 221110 Demystify Transformers & Convolutions in Modern Image Deep Networks #cnn
- 230202 Dual PatchNorm #normalization
vocoder
vqa
weak supervision
yolo
uncategorized
- 200211 fastai
- 210224 Zero-Shot Text-to-Image Generation
- 210603 The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
- 210606 Referring Transformer
- 210607 ViTAE
- 210614 Non Gaussian Denoising Diffusion Models
- 210909 PIMNet
- 211026 Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers
- 211028 Colossal-AI
- 211215 Value Retrieval with Arbitrary Queries for Form-like Documents
- 220114 DeepSpeed-MoE
- 220203 AlphaCode, Formal Math
- 220204 InstructGPT
- 220323 Pathways
- 220329 Few Could Be Better Than All
- 220405 Text Spotting Transformers
- 220416 Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks
- 220510 UL2
- 220610 A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction
- 220614 RDU
- 220630 DeepSpeed Inference
- 220712 Inner Monologue
- 220720 NUWA-Infinity
- 220722 Multiface
- 220725 CelebV-HQ
- 220725 Neural Generation Meets Real People
- 220725 Towards Complex Document Understanding By Discrete Reasoning
- 220819 FP8 Quantization
- 220823 CLOWER
- 220912 FP8 Formats for Deep Learning
- 220923 Diffusion
- 220928 The Change You Want to See
- 221125 Solving math word problems with process- and outcome-based feedback
- 221204 Languages You Know Influence Those You Learn
- 221215 Constitutional AI
- 221219 MatCha
- 230206 SmoothQuant
- 230207 Efficiently Upgrading Multilingual Machine Translation Models to Support More Languages
- 230207 FP8
- 230208 Google Configuration System
- 230209 Efficient Attention via Control Variates
- 230211 Thoughts on Generative AI
- 230213 Lossy Compression
- 230214 Adding Instructions during Pretraining
- 230214 Score-based Diffusion Models in Function Space
- 230220 DSP
- 230221 Anthropic
- 230222 FlexGen
- 230223 Colossal AI ChatGPT
- 230224 World Models