lucidrains/ema-pytorch

Stars
408
Rank 105,946 (Top 3 %)
Language
Python
License
MIT License
Created over 2 years ago
Updated 8 months ago

lucidrains/ema-pytorch

lucidrains

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

A simple way to keep track of an Exponential Moving Average (EMA) version of your pytorch model

EMA - Pytorch

A simple way to keep track of an Exponential Moving Average (EMA) version of your pytorch model

Install

$ pip install ema-pytorch

Usage

import torch
from ema_pytorch import EMA

# your neural network as a pytorch module

net = torch.nn.Linear(512, 512)

# wrap your neural network, specify the decay (beta)

ema = EMA(
    net,
    beta = 0.9999,              # exponential moving average factor
    update_after_step = 100,    # only after this number of .update() calls will it start updating
    update_every = 10,          # how often to actually update, to save on compute (updates every 10th .update() call)
)

# mutate your network, with SGD or otherwise

with torch.no_grad():
    net.weight.copy_(torch.randn_like(net.weight))
    net.bias.copy_(torch.randn_like(net.bias))

# you will call the update function on your moving average wrapper

ema.update()

# then, later on, you can invoke the EMA model the same way as your network

data = torch.randn(1, 512)

output     = net(data)
ema_output = ema(data)

# if you want to save your ema model, it is recommended you save the entire wrapper
# as it contains the number of steps taken (there is a warmup logic in there, recommended by @crowsonkb, validated for a number of projects now)
# however, if you wish to access the copy of your model with EMA, then it will live at ema.ema_model

Todo

address the issue of annealing EMA to 1 near the end of training for BYOL lucidrains/byol-pytorch#82

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

imagen-pytorch

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

stylegan2-pytorch

Simplest working implementation of Stylegan2, state of the art generative adversarial network, in Pytorch. Enabling everyone to experience disentanglement

musiclm-pytorch

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch

x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

lion-pytorch

🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch

toolformer-pytorch

Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI

reformer-pytorch

Reformer, the efficient Transformer, in Pytorch

make-a-video-pytorch

Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch

gigagan-pytorch

Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research into GANs

alphafold2

To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released

lightweight-gan

Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

lambda-networks

Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute

byol-pytorch

Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in Pytorch

self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI

naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch

flamingo-pytorch

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

video-diffusion-pytorch

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch

soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch

CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch

perceiver-pytorch

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch

RETRO-pytorch

Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

mlp-mixer-pytorch

An All-MLP solution for Vision, from Google AI

muse-maskgit-pytorch

Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch

PaLM-pytorch

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways

vector-quantize-pytorch

Vector Quantization, in Pytorch

phenaki-pytorch

Implementation of Phenaki Video, which uses Mask GIT to produce text guided videos of up to 2 minutes in length, in Pytorch

x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

bottleneck-transformer-pytorch

Implementation of Bottleneck Transformer in Pytorch

memorizing-transformers-pytorch

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

TimeSformer-pytorch

Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification

MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch

meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch

nuwa-pytorch

Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch

voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

point-transformer-pytorch

Implementation of the Point Transformer layer, in Pytorch

parti-pytorch

Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch

tab-transformer-pytorch

Implementation of TabTransformer, attention network for tabular data, in Pytorch

alphafold3-pytorch

Implementation of Alphafold 3 in Pytorch

linear-attention-transformer

Transformer based on a variant of attention that is linear complexity in respect to sequence length

magvit2-pytorch

Implementation of MagViT2 Tokenizer in Pytorch

egnn-pytorch

Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch

g-mlp-pytorch

Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch

recurrent-memory-transformer-pytorch

Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch

ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

siren-pytorch

Pytorch implementation of SIREN - Implicit Neural Representations with Periodic Activation Function

enformer-pytorch

Implementation of Enformer, Deepmind's attention network for predicting gene expression, in Pytorch

iTransformer

Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks, out of Tsinghua / Ant group

robotic-transformer-pytorch

Implementation of RT1 (Robotic Transformer) in Pytorch

memory-efficient-attention-pytorch

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

FLASH-pytorch

Implementation of the Transformer variant proposed in "Transformer Quality in Linear Time"

bit-diffusion

Implementation of Bit Diffusion, Hinton's group's attempt at discrete denoising diffusion, in Pytorch

medical-chatgpt

Implementation of ChatGPT, but tailored towards primary care medicine, with the reward being able to collect patient histories in a thorough and efficient manner and come up with a reasonable differential diagnosis

slot-attention

Implementation of Slot Attention from GoogleAI

q-transformer

Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, out of Google Deepmind

BS-RoFormer

Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs

classifier-free-guidance-pytorch

Implementation of Classifier Free Guidance in Pytorch, with emphasis on text conditioning, and flexibility to include multiple text embedding models

transformer-in-transformer

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

axial-attention

Implementation of Axial attention - attending to multi-dimensional data efficiently

conformer

Implementation of the convolutional module from the Conformer paper, for use in Transformers

mixture-of-experts

A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

deformable-attention

Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"

magic3d-pytorch

Implementation of Magic3D, Text to 3D content synthesis, in Pytorch

x-unet

Implementation of a U-net complete with efficient attention as well as the latest research findings

routing-transformer

Fully featured implementation of Routing Transformer

Adan-pytorch

Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch

spear-tts-pytorch

Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch

st-moe-pytorch

Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch

perfusion-pytorch

Implementation of Key-Locked Rank One Editing, from Nvidia AI

equiformer-pytorch

Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and adopted for use by EquiFold for protein folding

segformer-pytorch

Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

sinkhorn-transformer

Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention

pixel-level-contrastive-learning

Implementation of Pixel-level Contrastive Learning, proposed in the paper "Propagate Yourself", in Pytorch

lumiere-pytorch

Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch

local-attention

An implementation of local windowed attention for language modeling

CoLT5-attention

Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch

natural-speech-pytorch

Implementation of the neural network proposed in Natural Speech, a text-to-speech generator that is indistinguishable from human recordings for the first time, from Microsoft Research

soft-moe-pytorch

Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch

se3-transformer-pytorch

Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. This specific repository is geared towards integration with eventual Alphafold2 replication.

block-recurrent-transformer-pytorch

Implementation of Block Recurrent Transformer - Pytorch

Mega-pytorch

Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena

simple-hierarchical-transformer

Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT

med-seg-diff-pytorch

Implementation of MedSegDiff in Pytorch - SOTA medical segmentation using DDPM and filtering of features in fourier space

triton-transformer

Implementation of a Transformer, but completely in Triton

jax2torch

Use Jax functions in Pytorch

flash-cosine-sim-attention

Implementation of fused cosine similarity attention in the same style as Flash Attention

halonet-pytorch

Implementation of the 😇 Attention layer from the paper, Scaling Local Self-Attention For Parameter Efficient Visual Backbones

attention

This repository will house a visualization that will attempt to convey instant enlightenment of how Attention works to someone not working in artificial intelligence, with 3Blue1Brown as inspiration

recurrent-interface-network-pytorch

Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in Pytorch

electra-pytorch

A simple and working implementation of Electra, the fastest way to pretrain language models from scratch, in Pytorch

PaLM-jax

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)

unet-stylegan2

A Pytorch implementation of Stylegan2 with UNet Discriminator