
Stable Diffusion Playground | 💻 + 🎨 = ❤️

Welcome to stable diffusion playground! Use this repo to generate cool images!

Also - you get reproducibility for free! You'll know exactly how you created all of your images.

The metadata is stored inside the image and the latent information in a .npy file, respectively.

Here are some images I generated using the prompt "a painting of an ai robot having an epiphany moment":

If you generate something cool, tag me on Twitter 🐦 @gordic_aleksa - I'd love to see what you create.

Setup

Follow these steps to run the code:

  1. git clone https://github.com/gordicaleksa/stable_diffusion_playground
  2. Open the Anaconda console and navigate into the project directory: cd path_to_repo
  3. Run conda env create from the project directory (this will create a brand-new conda environment).
  4. Run conda activate sd_playground (for running scripts from your console; or set up the interpreter in your IDE).
  5. Run huggingface-cli login once, before the first time you try to access the model weights.

That's it! It should work out of the box, since the environment.yml file deals with all of the dependencies.

Important note: you have to locally patch the pipeline_stable_diffusion.py file from the diffusers 0.2.4 lib using the code from the main branch. The changes I rely on (having latents as an argument) still haven't propagated to the pip package.

How to use this code

The script can be run from an IDE (such as VS Code, PyCharm, etc.), but it can also be run via the command line thanks to the fire package. fire makes things much more concise than argparse! E.g., if the generate_images function has an argument named <arg_name>, you can call python generate_images.py --<arg_name> <arg_value>.

Next up - a brief explanation of certain script arguments.

output_dir_name is the name of the output directory.

  • Your images will be stored at output/<output_dir_name>/imgs.
  • Your latents will be stored at output/<output_dir_name>/latents.
  • Your metadata will be stored inside the user_comment EXIF tag if save_metadata_to_img==True; otherwise it'll be saved to output/<output_dir_name>/metadata.

All of these paths are relative to the directory from which you're running the code.
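To make the latent round-trip concrete, here is a minimal NumPy sketch of saving and reloading a latent (the run name and file name are illustrative, not the repo's exact naming scheme):

```python
import os
import tempfile

import numpy as np

# Mirror the output/<output_dir_name>/latents layout under a temp root.
root = tempfile.mkdtemp()
latents_dir = os.path.join(root, "output", "my_run", "latents")
os.makedirs(latents_dir)

# A Stable Diffusion latent for a 512x512 image is a (1, 4, 64, 64) tensor
# (the VAE downsamples each spatial dimension by a factor of 8).
latent = np.random.randn(1, 4, 64, 64).astype(np.float32)
latent_path = os.path.join(latents_dir, "000000.npy")
np.save(latent_path, latent)

# Loading it back is bit-exact, which is what makes re-generation and
# interpolation from saved latents reproducible.
restored = np.load(latent_path)
```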

prompt, guidance_scale, seed, num_inference_steps are the main knobs you have at your disposal to control image generation. Check out the code comments for more info.

Finally, the script has 3 modes of execution - let me explain each of them below.

GENERATE_DIVERSE mode

Set execution_mode=ExecutionMode.GENERATE_DIVERSE.

It will generate num_imgs images (at width x height resolution) and store them (along with the other info described above) into the output file structure.

Use the main knobs as described above to control the content and quality of the image.

Here are some images I generated using this mode:

INTERPOLATE mode

Set execution_mode=ExecutionMode.INTERPOLATE.

There are 2 ways to run this mode:

  1. Run GENERATE_DIVERSE and pick the 2 images you like. Grab the paths to their latents (you'll find them under output/<output_dir_name>/latents) and specify them via src_latent_path and trg_latent_path. After that, the code will spherically interpolate num_imgs images between them, generating a (mostly) smooth transition from the source image into the target one.
  2. Don't specify the latents - they will be generated on the fly, so you won't know upfront what your source and target images look like. Everything else remains the same.

As an example, I took the 2 images from above and interpolated between them - here is the resulting grid:

Note: I generated 200 images but had to subsample to only 32 for this grid. In general, there will be sudden jumps in the decoded image space unless you move through the latent space in very fine steps.
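Spherical interpolation between two latents can be sketched in NumPy as follows (a standalone illustration of the idea, not the repo's exact implementation):

```python
import numpy as np


def slerp(t, v0, v1, eps=1e-7):
    """Spherically interpolate between latents v0 and v1 at t in [0, 1]."""
    v0_f, v1_f = v0.ravel(), v1.ravel()
    # Angle between the two latent vectors.
    dot = np.dot(v0_f, v1_f) / (np.linalg.norm(v0_f) * np.linalg.norm(v1_f) + eps)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    if theta < eps:  # nearly parallel: fall back to linear interpolation
        return (1 - t) * v0 + t * v1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1


# Walking t from 0 to 1 in fine steps gives the (mostly) smooth transition;
# each frame is then decoded into an image by the diffusion pipeline.
src = np.random.randn(1, 4, 64, 64)
trg = np.random.randn(1, 4, 64, 64)
frames = [slerp(t, src, trg) for t in np.linspace(0.0, 1.0, 8)]
```

Slerp is preferred over plain linear interpolation here because high-dimensional Gaussian noise concentrates on a shell of roughly constant norm, and slerp keeps the intermediate latents on that shell.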

REPRODUCE mode

Set execution_mode=ExecutionMode.REPRODUCE.

This one is more for debugging purposes.

Specify src_latent_path and metadata_path. For metadata_path, specify either the actual metadata .json file path or simply the image path if the image contains the metadata (depending on the save_metadata_to_img flag).

After this the script will reconstruct the original image - showcasing the reproducibility.
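The metadata file is essentially a record of the generation arguments; a minimal sketch of what saving and re-reading it could look like (the field names are illustrative, not the repo's exact schema):

```python
import json
import os
import tempfile

# The knobs that fully determine an image, given the saved latent.
metadata = {
    "prompt": "a painting of an ai robot having an epiphany moment",
    "guidance_scale": 7.5,
    "seed": 23,
    "num_inference_steps": 50,
}

meta_dir = tempfile.mkdtemp()
meta_path = os.path.join(meta_dir, "000000.json")
with open(meta_path, "w") as f:
    json.dump(metadata, f)

# REPRODUCE mode re-reads these arguments and, together with the saved
# latent, deterministically rebuilds the original image.
with open(meta_path) as f:
    restored = json.load(f)
```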

Hardware requirements

You need a GPU with at least 8 GB of VRAM to run this at 512x512 in fp16 precision.

If you wish to run it in fp32 precision you will need ~16 GB of VRAM (unless you're willing to sacrifice resolution).

The fp16 flag controls whether you load the fp16 or fp32 weights.
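Back-of-the-envelope math on why fp16 halves the weight footprint. The parameter counts below are approximate figures for SD v1, and the real VRAM peak is higher because of activations and attention buffers:

```python
# Rough SD v1 parameter count: UNet (~860M) + VAE (~84M) + text encoder (~123M).
# Approximate figures, used only for back-of-the-envelope math.
params = 860e6 + 84e6 + 123e6

weights_fp16_gb = params * 2 / 1024**3  # 2 bytes per fp16 weight -> ~2 GB
weights_fp32_gb = params * 4 / 1024**3  # 4 bytes per fp32 weight -> ~4 GB
```

The rest of the ~8 GB (fp16) / ~16 GB (fp32) budget goes to activations, which scale with resolution - which is why lowering the resolution lets fp32 fit in less VRAM.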

Learning material

Here is a video walk-through of this repo:

Getting started with Stable Diffusion

(the commit I used in the video is this one)

And here is a deep dive video going through the stable diffusion codebase:

How does Stable Diffusion work

Connect With Me

💼 LinkedIn 🐦 Twitter 👨‍👩‍👧‍👦 Discord

📺 YouTube 📚 Medium 💻 GitHub 📢 AI Newsletter - one day heh

Acknowledgements

Took inspiration from Karpathy's gist.

License

License: MIT
