fastdiffusion
- Big resource list: What's the score? Review of latest Score Based Generative Modeling papers.
- List of diffusion papers: Diffusion Reading Group
- labml.ai Annotated PyTorch Paper Implementations
Useful resources
- Stable Diffusion with 🧨 Diffusers - Hugging Face notebooks
- Simple diffusion from Johno
- Introduction to Diffusion Models for Machine Learning - AssemblyAI
- Tutorial - What is a variational autoencoder?
- "Grokking Stable Diffusion" from Johno
- Grokking SD Part 2: Textual Inversion
- What are Diffusion Models? · Lilian Weng
- Generative Modeling by Estimating Gradients of the Data Distribution (Yang Song)
- The Annotated Diffusion Model
- Understanding VQ-VAE (DALL-E Explained Pt. 1)
- Diffusers Interpret. Model explainability, could be adapted to show some nice instructive plots.
- Denoising Diffusion Probabilistic Model in Flax by YiYi Xu, includes P2 weighting, self-conditioning, and EMA
- A Traveler's Guide to the Latent Space
- Denoising diffusion probabilistic models - math+code tutorials in 4 notebooks
- Two articles from Sander Dieleman
Additional papers
- Diffusion Models Beat GANs on Image Synthesis, Dhariwal & Nichol 2021. Proposes architecture improvements (relative to the 2021 state of the art, i.e. DDPM and DDIM) that could give some insight when we write models from scratch. In addition, it introduces classifier guidance to improve conditional image synthesis. This was later replaced by classifier-free guidance, but using a classifier looks like the natural thing to do for conditional generation. (A minimal sketch of the classifier-guidance update follows this list.)
- Fast Sampling of Diffusion Models with Exponential Integrator. DEIS scheduler. The authors claim excellent sampling results with as few as 12 steps. I haven't read it yet.
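A minimal sketch (plain PyTorch) of the classifier-guidance update from the Dhariwal & Nichol bullet above, plus the classifier-free combination that later replaced it. The noise-aware classifier and the cumulative ᾱ schedule are assumed to exist elsewhere; this is an illustration, not a full sampler.

```python
import torch

def classifier_guided_eps(eps, x_t, y, classifier, alpha_bar_t, scale=1.0):
    """Shift the predicted noise using the gradient of log p(y | x_t).

    eps         : noise predicted by the diffusion model for the noisy image x_t
    y           : target class labels, shape (batch,)
    classifier  : assumed to be trained on noisy images and to return logits
    alpha_bar_t : cumulative alpha product at timestep t (scalar tensor)
    """
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_probs = torch.log_softmax(classifier(x_in), dim=-1)
        selected = log_probs[torch.arange(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]   # grad of log p(y | x_t) w.r.t. x_t
    # eps_hat = eps - sqrt(1 - alpha_bar_t) * s * grad   (Dhariwal & Nichol)
    return eps - torch.sqrt(1 - alpha_bar_t) * scale * grad

def classifier_free_eps(eps_uncond, eps_cond, guidance_scale=7.5):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction towards the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```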
Application-oriented papers
Some of these tricks could be effective / didactic.
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. Textual inversion: create new text embeddings from a few sample images. This effectively introduces new terms in the vocabulary that can be used in phrases for text-to-image generation. (Sketched in code after this list.)
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. Similar goal to the textual inversion paper, but a different approach, I think (I haven't read it yet).
- Prompt-to-Prompt Image Editing with Cross Attention Control, Hertz et al. 2022. Manipulates the cross-attention layers to edit text-to-image generations by replacing words, introducing new terms, or re-weighting the importance of existing terms. (A minimal attention re-weighting sketch also follows this list.)
- VToonify: Controllable High-Resolution Portrait Video Style Transfer. High-quality, temporally coherent artistic portrait videos with flexible style controls.
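To make the textual inversion bullet more concrete, here is a minimal sketch of the core trick: add one pseudo-word to the tokenizer and optimise only its embedding row, with everything else frozen. The checkpoint name and hyperparameters are illustrative, and the real objective (noise-prediction MSE through the frozen VAE/UNet on the sample images) is replaced by a stand-in so the snippet stays short.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# 1. Register a new pseudo-word and grow the embedding table by one row.
placeholder = "<my-concept>"
tokenizer.add_tokens([placeholder])
text_encoder.resize_token_embeddings(len(tokenizer))
new_token_id = tokenizer.convert_tokens_to_ids(placeholder)

# 2. Freeze the text encoder, then re-enable gradients only for the embedding table.
text_encoder.requires_grad_(False)
embeddings = text_encoder.get_input_embeddings()
embeddings.weight.requires_grad_(True)
optimizer = torch.optim.AdamW([embeddings.weight], lr=5e-4, weight_decay=0.0)

for step in range(100):
    ids = tokenizer(f"a photo of {placeholder}", return_tensors="pt").input_ids
    prompt_embeds = text_encoder(input_ids=ids).last_hidden_state
    # Stand-in for the real loss: the usual noise-prediction MSE computed
    # through the frozen VAE/UNet on the user's sample images.
    loss = prompt_embeds.pow(2).mean()
    loss.backward()
    # Zero the gradient of every row except the new token's, so only the
    # new embedding actually changes.
    grad = embeddings.weight.grad
    grad[:new_token_id] = 0
    grad[new_token_id + 1:] = 0
    optimizer.step()
    optimizer.zero_grad()
```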
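And for the Prompt-to-Prompt bullet, one hedged reading of the "re-weight the importance of existing terms" trick, written as a standalone cross-attention function (the paper also swaps whole attention maps between prompts, which is not shown here):

```python
import torch

def reweighted_cross_attention(q, k, v, token_weights):
    """Cross-attention where each text token's attention is scaled by a chosen
    factor (1.0 = unchanged, >1 = strengthen that word, <1 = weaken it).

    q             : (B, Nq, d) queries from the image/latent tokens
    k, v          : (B, Nt, d) keys/values from the text embeddings
    token_weights : (Nt,) per-token multipliers
    """
    scale = q.shape[-1] ** -0.5
    attn = torch.softmax((q @ k.transpose(-1, -2)) * scale, dim=-1)  # (B, Nq, Nt)
    attn = attn * token_weights                 # boost or damp chosen words
    attn = attn / attn.sum(-1, keepdim=True)    # renormalise each row
    return attn @ v

# Example: strengthen token 3 of an 8-token prompt (all shapes are made up).
q, k, v = torch.randn(1, 64, 320), torch.randn(1, 8, 320), torch.randn(1, 8, 320)
weights = torch.ones(8); weights[3] = 2.0
out = reweighted_cross_attention(q, k, v, weights)
```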
Improvements on simple diffusion
- Better denoising autoencoder (diffusion model)
  - U-Net
  - Attention
  - P2 weighting
  - EMA (see the sketch after this list)
  - Self-conditioning
  - Predict noise / gradient (score-based diffusion)
- Latent diffusion (the denoiser doesn't have to be a U-Net)
  - Attention
- Better loss functions
  - Perceptual + MSE + GAN (in the VAE)
- Preconditioning/scaling inputs and outputs
- Other crappifiers
- Data augmentation
- Better samplers / optimisers
- Initialisers such as pixelshuffle
- Learnable blur
- Blur noise
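One of the cheaper wins in the list above is keeping an exponential moving average of the model weights and sampling from the EMA copy. A minimal sketch in plain PyTorch (the decay value is just a common default):

```python
import copy
import torch

class EMA:
    """Keeps a shadow copy of the model whose weights follow an exponential
    moving average of the training weights; sample from `self.shadow`."""

    def __init__(self, model, decay=0.999):
        self.decay = decay
        self.shadow = copy.deepcopy(model).eval()
        for p in self.shadow.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # shadow = decay * shadow + (1 - decay) * current
        for s, p in zip(self.shadow.parameters(), model.parameters()):
            s.lerp_(p, 1.0 - self.decay)

# Usage: create once, call after every optimizer step.
#   ema = EMA(unet)
#   ...
#   optimizer.step(); ema.update(unet)
```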
Applications
- Style transfer
- Super-res
- Colorisation
- Remove jpeg noise
- Remove watermarks
- Deblur
- CycleGAN / Pix2Pix -> change subject / location / weather / etc.
Diffusion Applications and Demos
- Stable Diffusion fine-tuning (for specific styles or domains).
- Stable Diffusion morphing / videos. Code by @nateraw, based on a gist by @karpathy.
- Image Variations. Demo, with links to code. Uses the CLIP image embeddings as conditioning for generation, instead of the text embeddings. This requires fine-tuning of the model because, as far as I understand it, the text and image embeddings are not aligned in the embedding space. CLOOB doesn't have this limitation, but I heard (source: Boris Dayma, from a conversation with Katherine Crowson) that attempting to train a diffusion model with CLOOB conditioning instead of CLIP produced less variety in the results. (See the conditioning sketch after this list.)
- Image-to-image generation. Demo: sketch -> image.
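For the Image Variations item above, the core change is swapping the UNet's text conditioning for CLIP image embeddings. A rough sketch using the transformers CLIP vision model; the checkpoint name and whether an extra projection to the UNet's cross-attention width is needed are assumptions, not details from the demo.

```python
import torch
from transformers import CLIPVisionModel, CLIPImageProcessor

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def clip_image_conditioning(pil_image):
    """Encode an image into a sequence of CLIP embeddings to use as
    cross-attention conditioning instead of the text encoder output."""
    pixels = processor(images=pil_image, return_tensors="pt").pixel_values
    return vision(pixels).last_hidden_state   # (1, num_patches + 1, hidden_dim)

# During fine-tuning / sampling these embeddings replace the text embeddings
# (a learned linear projection may be needed if hidden_dim differs from the
# UNet's cross-attention dimension):
#   noise_pred = unet(latents, t, encoder_hidden_states=clip_image_conditioning(img)).sample
```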
Style Transfer
- Vincent's work: https://github.com/VinceMarron/style_transfer/blob/master/vgg_styletrans.py
- Johno's implementation of that plus some different style loss variants: https://colab.research.google.com/drive/1nTcswqeDmiW67WjEaQ8lAZP9v_5gKjCB?usp=sharing
- Inspiration for the Sliced OT version: https://www.youtube.com/watch?v=ZFYZFlY7lgI&t=10s (Aside: NCAs are super cool, I want to research them more as soon as the course craziness subsides)
- ImStack (which I prefer over optimizing raw pixels directly): https://johnowhitaker.github.io/imstack/
- Q: For fast style transfer (where a network does one-shot stylization), which networks and tricks seem to work best?
- Q: Do augmentations help with Gatys-style style transfer? TODO: Johno to test
- Q: Which layers give good results? Would a different network than VGG16 be better? (A minimal Gram-matrix style loss to experiment with is sketched after this list.)
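To make the Gatys-style questions above easier to poke at, here is a minimal Gram-matrix style loss over a few VGG16 layers. The layer indices and the omitted ImageNet normalisation are things to experiment with, not recommendations from this repo.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Requires a recent torchvision (>= 0.13) for the weights enum; older versions
# use vgg16(pretrained=True) instead. Inputs are assumed to be (B, 3, H, W)
# tensors in [0, 1]; ImageNet normalisation is skipped for brevity.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = [3, 8, 15, 22]   # relu1_2, relu2_2, relu3_3, relu4_3

def features(x):
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            feats.append(x)
    return feats

def gram(f):
    b, c, h, w = f.shape
    f = f.reshape(b, c, h * w)
    return (f @ f.transpose(1, 2)) / (c * h * w)

def style_loss(generated, style_image):
    return sum(F.mse_loss(gram(a), gram(b))
               for a, b in zip(features(generated), features(style_image)))
```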
Other model ideas
- Latent space models
- ImageNet
- CLIP
- Noisy CLIP