[Image: "Dive into Deep Learning", redone by Quanta Magazine]
Attention (wip)
This repository will house a visualization that attempts to convey instant enlightenment about how Attention works in the field of artificial intelligence. Obviously, I believe this algorithm to be one of the most important developments in the history of deep learning. We may be able to use it to solve, well, everything.

In my mind, one good intuitive visualization can bring about more insight and understanding than lengthy, expensive tutoring or courses.
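Until the visualization exists, a minimal code sketch may help ground the discussion. The following is plain-numpy scaled dot-product attention, following the softmax(QKᵀ/√d)·V formulation of the Vaswani et al. paper cited below; the function and variable names here are my own.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the per-row max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # q, k: (seq_len, d_k); v: (seq_len, d_v)
    # scores measure the similarity of every query with every key
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # each row of weights sums to 1: a soft lookup over the whole sequence
    weights = softmax(scores, axis=-1)
    # each output token is a weighted average of the value vectors
    return weights @ v

# toy usage: self-attention over 4 tokens of dimension 8 (q = k = v = x)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8)
```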
Why does it work?
Attention has many interpretations, ranging from physics-based ones to speculations about biological plausibility.

Update: Recently, three papers have concurrently closed in on a connection between self-attention and gradient descent while investigating the in-context learning properties of Transformers (a toy sketch of the equivalence follows the list below)!
- Transformers learn in-context by gradient descent
- What learning algorithm is in-context learning? Investigations with linear models
- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
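To make that connection concrete, here is a toy numpy sketch of my own (not code from any of those papers), in the simplest setting they study: for in-context linear regression, one step of gradient descent from zero weights produces exactly the same prediction as an unnormalized linear self-attention readout, with the in-context examples as keys and values and the test input as the query.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, lr = 5, 16, 0.1

# in-context dataset: n examples of a noiseless linear task y_i = w* . x_i
w_star = rng.normal(size=d)
xs = rng.normal(size=(n, d))
ys = xs @ w_star
x_test = rng.normal(size=d)

# one gradient descent step on 0.5 * sum_i (w . x_i - y_i)^2, starting at w = 0:
# the gradient at w = 0 is -sum_i y_i * x_i, so the step lands at lr * sum_i y_i * x_i
w_one_step = lr * ys @ xs
pred_gd = w_one_step @ x_test

# the same number, computed as unnormalized linear self-attention:
# query = x_test, keys = xs, values = ys, dot-product scores, no softmax
scores = xs @ x_test          # similarity of the query with every key
pred_attn = lr * ys @ scores  # score-weighted sum of the values

print(np.allclose(pred_gd, pred_attn))  # True
```

The papers above go much further than this toy case, but this identity is the kernel of the connection.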
What has Attention accomplished?
- Protein Folding
- Language
- Vision
- Image Segmentation
- Speech Recognition
- Symbolic Mathematics
- Midi Generation
- Theorem Proving
- Gene Expression
- Text to Image
- Attention-only Text to Image
- Text to Video
- Text to Video 2
- Code Generation
- Language+
- Protein Generation
- Multimodal Model
- Video Understanding
- Heart Disease Classification
- Weather Forecasting
- Text to Speech
- Few-Shot Visual Question Answering
- Generalist Agent
- Audio Generation from Raw Waveform
- Sample Efficient World Model
- Audio / Speech Generation
- Nucleic Acid / Protein Binding
- Generalizable Prompting for Robotic Arm Control
- Zero-shot Text to Speech
- Music Generation
- Designing Molecular Scissors for DNA
- Nucleic Language Model
I will keep adding to this list as time goes on.
Other resources
Is it all we need?
No one really knows. All I know is that if attention were ever dethroned by a better algorithm, it would be a momentous event. Part of what motivates me to do some scalable, 21st-century teaching is the hope that someone may find a way to improve on it, or find its replacement. It just takes one discovery!
Potential improvements
Appreciation
A large thanks goes to 3Blue1Brown for showing us that complex mathematics can be taught with such elegance and potency through visualizations.
Citations
```bibtex
@misc{vaswani2017attention,
    title         = {Attention Is All You Need},
    author        = {Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin},
    year          = {2017},
    eprint        = {1706.03762},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}
```

```bibtex
@article{Bahdanau2015NeuralMT,
    title   = {Neural Machine Translation by Jointly Learning to Align and Translate},
    author  = {Dzmitry Bahdanau and Kyunghyun Cho and Yoshua Bengio},
    journal = {CoRR},
    year    = {2015},
    volume  = {abs/1409.0473}
}
```
*Gotta teach the AGI to love.* - Ilya Sutskever