ML Papers Explained

Explanations of key concepts in ML

Language Models

Paper	Date	Description
Transformer	June 2017	An encoder-decoder model that introduced the multi-head attention mechanism for language translation tasks.
ELMo	February 2018	Deep contextualized word representations that capture both complex characteristics of word usage and how that usage varies across linguistic contexts.
GPT	June 2018	A decoder-only transformer that is autoregressively pretrained and then finetuned for specific downstream tasks using task-aware input transformations.
BERT	October 2018	Introduced pre-training for Encoder Transformers. Uses a unified architecture across different tasks.
Transformer XL	January 2019	Extends the original Transformer model to handle longer sequences of text by introducing recurrence into the self-attention mechanism.
XLNet	June 2019	Extension of the Transformer-XL, pre-trained using a new method that combines ideas from autoregressive (AR) and autoencoding (AE) objectives.
RoBERTa	July 2019	Built upon BERT by carefully optimizing hyperparameters and training data size to improve performance on various language tasks.
Sentence BERT	August 2019	A modification of BERT that uses siamese and triplet network structures to derive sentence embeddings that can be compared using cosine-similarity.
Tiny BERT	September 2019	Uses attention transfer and task-specific distillation to distill BERT.
ALBERT	September 2019	Presents certain parameter reduction techniques to lower memory consumption and increase the training speed of BERT.
Distil BERT	October 2019	Distills BERT on very large batches leveraging gradient accumulation, using dynamic masking and without the next sentence prediction objective.
T5	October 2019	A unified encoder-decoder framework that converts all text-based language problems into a text-to-text format.
BART	October 2019	An encoder-decoder (sequence-to-sequence) model pretrained to reconstruct the original text from corrupted versions of it.
FastBERT	April 2020	A speed-tunable encoder with adaptive inference time having branches at each transformer output to enable early outputs.
MobileBERT	April 2020	Compressed and faster version of the BERT, featuring bottleneck structures, optimized attention mechanisms, and knowledge transfer.
Longformer	April 2020	Introduces a linearly scalable attention mechanism, allowing it to handle texts of extended length.
DeBERTa	June 2020	Enhances BERT and RoBERTa through disentangled attention mechanisms, an enhanced mask decoder, and virtual adversarial training.
Codex	July 2021	A GPT language model finetuned on publicly available code from GitHub.
FLAN	September 2021	An instruction-tuned language model developed through finetuning on various NLP datasets described by natural language instructions.
Gopher	December 2021	Provides a comprehensive analysis of the performance of various Transformer models across different scales, up to 280B parameters, on 152 tasks.
Instruct GPT	March 2022	Fine-tuned GPT using supervised learning (instruction tuning) and reinforcement learning from human feedback to align with user intent.
Chinchilla	March 2022	Investigated the optimal model size and number of tokens for training a transformer LLM within a given compute budget (Scaling Laws).
PaLM	April 2022	A 540B-parameter, densely activated Transformer trained using Pathways (an ML system that enables highly efficient training across multiple TPU Pods).
OPT	May 2022	A suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters. OPT-175B is comparable to GPT-3.
BLOOM	November 2022	A 176B-parameter open-access decoder-only transformer, collaboratively developed by hundreds of researchers, aiming to democratize LLM technology.
Galactica	November 2022	An LLM trained on scientific data, and therefore specialized in scientific knowledge.
ChatGPT	November 2022	An interactive model designed to engage in conversations, built on top of GPT 3.5.
LLaMA	February 2023	A collection of foundation LLMs by Meta ranging from 7B to 65B parameters, trained using publicly available datasets exclusively.
Alpaca	March 2023	A fine-tuned LLaMA 7B model, trained on instruction-following demonstrations generated in the style of self-instruct using text-davinci-003.
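
The Transformer entry above names multi-head attention. Each head is built on scaled dot-product attention, which can be sketched in plain Python as follows. This is a minimal single-head illustration without learned projections, masking, or batching; the function names are assumptions made for this example and do not come from any paper's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys, and values are lists of equal-length float vectors;
    keys and values must contain the same number of rows.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs
```

When all keys are identical the weights are uniform, so the output is simply the mean of the value vectors. A multi-head layer would run several such heads on different learned projections and concatenate the results.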

Vision Models

Paper	Date	Description
Vision Transformer	October 2020	Images are split into patches, which are treated as tokens; the sequence of linear embeddings of these patches is the input to a Transformer.
DeiT	December 2020	A convolution-free vision transformer that uses a teacher-student strategy with attention-based distillation tokens.
Swin Transformer	March 2021	A hierarchical vision transformer that uses shifted windows to address the challenges of adapting the transformer model to computer vision.
BEiT	June 2021	Uses a masked image modeling task inspired by BERT, involving image patches and visual tokens, to pretrain vision Transformers.
MobileViT	October 2021	A lightweight vision transformer designed for mobile devices, effectively combining the strengths of CNNs and ViTs.
Masked AutoEncoder	November 2021	An encoder-decoder architecture that reconstructs input images by masking random patches and leveraging a high proportion of masking for self-supervision.
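
The patch-based tokenization described in the Vision Transformer entry can be sketched as below. This is a minimal illustration that assumes a single-channel image stored as a list of rows; `image_to_patches` is a hypothetical helper name, and the learned linear embedding that follows in the actual model is left out.

```python
def image_to_patches(image, patch_size):
    """Split a 2-D image into flattened, non-overlapping square patches.

    Patches are returned in row-major order, each flattened to a 1-D list,
    which is the sequence a Transformer would consume after embedding.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A 4x4 image with pixel value 4 * row + col, split into 2x2 patches.
example_image = [[4 * r + c for c in range(4)] for r in range(4)]
example_patches = image_to_patches(example_image, 2)
```

Here `example_patches` holds four tokens of four values each. A masked autoencoder in the style of the last entry would then hide a random subset of these tokens before encoding.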

Convolutional Neural Networks

Paper	Date	Description
LeNet	December 1998	Introduced Convolutions.
Alex Net	September 2012	Introduced ReLU activation and Dropout to CNNs. Winner of ILSVRC 2012.
VGG	September 2014	Used a large number of small filters in each layer to learn complex features. Achieved SOTA in ILSVRC 2014.
Inception Net	September 2014	Introduced Inception Modules consisting of multiple parallel convolutional layers, designed to recognize different features at multiple scales.
Inception Net v2 / Inception Net v3	December 2015	Design Optimizations of the Inception Modules which improved performance and accuracy.
Res Net	December 2015	Introduced residual connections, which are shortcuts that bypass one or more layers in the network. Winner of ILSVRC 2015.
Inception Net v4 / Inception ResNet	February 2016	Hybrid approach combining Inception Net and ResNet.
Dense Net	August 2016	Each layer receives input from all the previous layers, creating a dense network of connections between the layers and allowing the network to learn more diverse features.
Xception	October 2016	Based on InceptionV3 but uses depthwise separable convolutions instead of Inception modules.
Res Next	November 2016	Built over ResNet, introduces the concept of grouped convolutions, where the filters in a convolutional layer are divided into multiple groups.
Mobile Net V1	April 2017	Uses depthwise separable convolutions to reduce the number of parameters and computation required.
Mobile Net V2	January 2018	Built upon the MobileNetv1 architecture, uses inverted residuals and linear bottlenecks.
Mobile Net V3	May 2019	Uses AutoML (hardware-aware network architecture search) to find an efficient architecture tuned to mobile CPUs.
Efficient Net	May 2019	Uses a compound scaling method to scale the network's depth, width, and resolution to achieve high accuracy with a relatively low computational cost.
Conv Mixer	January 2022	Processes image patches using standard convolutions for mixing spatial and channel dimensions.
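
Several entries above (Xception, Mobile Net V1) rely on depthwise separable convolutions to cut parameters. The saving can be checked with a short calculation. This is a rough sketch that ignores biases and normalization layers; the function names and the example layer sizes are assumptions chosen for illustration.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard convolution: one k x k x C_in filter per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise k x k convolution followed by a pointwise 1 x 1 convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1 x 1 convolution mixing channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)
separable = depthwise_separable_params(3, 32, 64)
reduction = separable / standard  # equals 1 / out_channels + 1 / kernel_size**2
```

For this layer the separable version needs 2,336 weights instead of 18,432, roughly an eighth of the original.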

Single Stage Object Detectors

Paper	Date	Description
SSD	December 2015	Discretizes bounding box outputs over a span of various scales and aspect ratios per feature map.
Feature Pyramid Network	December 2016	Leverages the inherent multi-scale hierarchy of deep convolutional networks to efficiently construct feature pyramids.
Focal Loss	August 2017	Addresses class imbalance in dense object detectors by down-weighting the loss assigned to well-classified examples.
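
The down-weighting described in the Focal Loss entry can be written out for the binary case. This is a minimal sketch of the formula FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t); the function name is an assumption for this example, and the defaults gamma = 2 and alpha = 0.25 are the commonly cited settings rather than values stated in this document.

```python
import math


def binary_focal_loss(prob, label, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction.

    prob  : predicted probability of the positive class, strictly between 0 and 1
    label : 1 for positive, 0 for negative
    gamma : focusing parameter; gamma = 0 recovers alpha-weighted cross-entropy
    alpha : weight for the positive class (negatives get 1 - alpha)
    """
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    p_t = prob if label == 1 else 1.0 - prob
    alpha_t = alpha if label == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of confidently correct predictions.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # well-classified positive
hard = binary_focal_loss(0.1, 1)  # badly misclassified positive
```

With gamma = 2, a positive predicted at 0.9 has its cross-entropy term scaled by (0.1)^2 = 0.01, so the many easy examples contribute far less to the total loss than the few hard ones.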

Region-based Convolutional Neural Networks

Paper	Date	Description
RCNN	November 2013	Uses selective search for region proposals, CNNs for feature extraction, SVM for classification followed by box offset regression.
Fast RCNN	April 2015	Processes entire image through CNN, employs RoI Pooling to extract feature vectors from ROIs, followed by classification and BBox regression.
Faster RCNN	June 2015	A region proposal network (RPN) and a Fast R-CNN detector collaboratively predict object regions by sharing convolutional features.
Mask RCNN	March 2017	Extends Faster R-CNN to solve instance segmentation tasks, by adding a branch for predicting an object mask in parallel with the existing branch.

Document AI

Paper	Date	Description
Table Net	January 2020	An end-to-end deep learning model designed for both table detection and structure recognition.
Donut	November 2021	An OCR-free Encoder-Decoder Transformer model. The encoder takes in images; the decoder takes in prompts and encoded images to generate the required text.
DiT	March 2022	An Image Transformer pre-trained (self-supervised) on document images.
UDoP	December 2022	Integrates text, image, and layout information through a Vision-Text-Layout Transformer, enabling unified representation.

Layout Transformers

Paper	Date	Description
Layout LM	December 2019	Utilises BERT as the backbone, adds two new input embeddings: 2-D position embedding and image embedding (Only for downstream tasks).
LamBERT	February 2020	Utilises RoBERTa as the backbone and adds Layout embeddings along with relative bias.
Layout LM v2	December 2020	Uses a multi-modal Transformer model, to integrate text, layout, and image in the pre-training stage, to learn end-to-end cross-modal interaction.
Structural LM	May 2021	Utilises BERT as the backbone and feeds text, 1D position, and 2D cell-level layout embeddings to the transformer model.
Doc Former	June 2021	Encoder-only transformer with a CNN backbone for visual feature extraction, combines text, vision, and spatial features through a multi-modal self-attention layer.
LiLT	February 2022	Introduced Bi-directional attention complementation mechanism (BiACM) to accomplish the cross-modal interaction of text and layout.
Layout LM V3	April 2022	A unified text-image multimodal Transformer that learns cross-modal representations; its input is the concatenation of text embeddings and image embeddings.
ERNIE Layout	October 2022	Reorganizes tokens using layout information, combines text and visual embeddings, and utilizes multi-modal transformers with spatial-aware disentangled attention.

Tabular Deep Learning

Paper	Date	Description
Entity Embeddings	April 2016	Maps categorical variables into continuous vector spaces through neural network learning, revealing intrinsic properties.
Wide and Deep Learning	June 2016	Combines memorization of specific patterns with generalization of similarities.
Deep and Cross Network	August 2017	Combines a novel cross network with deep neural networks (DNNs) to efficiently learn feature interactions without manual feature engineering.
Tab Transformer	December 2020	Employs multi-head attention-based Transformer layers to convert categorical feature embeddings into robust contextual embeddings.
Tabular ResNet	June 2021	An MLP with skip connections.
Feature Tokenizer Transformer	June 2021	Transforms all features (categorical and numerical) to embeddings and applies a stack of Transformer layers to the embeddings.

Miscellaneous

Paper	Date	Description
ColD Fusion	December 2022	A method enabling the benefits of multitask learning through distributed computation without data sharing and improving model performance.

Literature Reviewed

Reading Lists

Reach out to Ritvik or Elvis if you have any questions.

If you are interested in contributing, feel free to open a PR.

Join our Discord

dair-ai/ML-Papers-Explained
