Resources at the intersection of AI AND Art. Mainly tools and tutorials, but with some inspiring people and places thrown in too!
For a broader resource covering more general creative coding tools (that you might want to use with what is listed here), check out terkelg/awesome-creative-coding or thatcreativecode.page. For resources on AI and deep learning in general, check out ChristosChristofidis/awesome-deep-learning and https://github.com/dair-ai.
Contents
Entries marked ⭐️ signify my favorite resource(s) for that section/subsection (if I HAD to choose a single resource). Additionally, each subsection is usually ordered by specificity of content (most general listed first).
Learning
Courses
General Deep Learning
- Practical Deep Learning for Coders (fast.ai)
- Deep Learning (NYU)
- Introduction to Deep Learning (CMU)
- ⭐️ Deep Learning for Computer Vision (UMich)
- Deep Learning for Computer Vision (Stanford CS231n)
- Natural Language Processing with Deep Learning (Stanford CS224n)
Deep Generative Modeling
- Deep Generative Models (Stanford)
- Deep Unsupervised Learning (UC Berkeley)
- Differentiable Inference and Generative Models (Toronto)
- ⭐️ Learning-Based Image Synthesis (CMU)
- Learning Discrete Latent Structure (Toronto)
- From Deep Learning Foundations to Stable Diffusion (fast.ai)
Creative Coding and New Media
- ⭐️ Deep Learning for Art, Aesthetics, and Creativity (MIT)
- Machine Learning for the Web (ITP/NYU)
- Art and Machine Learning (CMU)
- New Media Installation: Art that Learns (CMU)
- Introduction to Computational Media (ITP/NYU)
Videos
- ⭐️ The AI that creates any picture you want, explained (Vox)
- I Created a Neural Network and Tried Teaching it to Recognize Doodles (Sebastian Lague)
- Neural Network Series (3Blue1Brown)
- Beginner's Guide to Machine Learning in JavaScript (Coding Train)
- Two Minute Papers
Books
- ⭐️ Dive into Deep Learning (Zhang, Lipton, Li, and Smola)
- Deep Learning (Goodfellow, Bengio, and Courville)
- Computer Vision: Algorithms and Applications (Szeliski)
- Procedural Content Generation in Games (Shaker, Togelius, and Nelson)
- Generative Design (Benedikt Groß)
Tutorials and Blogs
Deep Learning
- ⭐️ VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance (Crowson and Biderman)
- Tutorial on Deep Generative Models (IJCAI-ECAI 2018)
- Tutorial on GANs (CVPR 2018)
- Lil'Log (Lilian Weng)
- Distill [on hiatus]
Generative Art
- ⭐️ Making Generative Art with Simple Mathematics
- Book of Shaders: Generative Designs
- Mike Bostock: Visualizing Algorithms (with Eyeo talk)
- Generative Examples in Processing
- Generative Music
Papers/Methods
Diffusion models (and text-to-image)
- SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations: Paper predating Stable Diffusion that describes a method for image synthesis and editing with diffusion-based models.
- GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
- High-Resolution Image Synthesis with Latent Diffusion Models: The latent diffusion paper that Stable Diffusion is based on, and the one that started it all.
- Prompt-to-Prompt Image Editing with Cross-Attention Control: Edit Stable Diffusion outputs by editing the original prompt.
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion: Learns a new "word" (text embedding) from a handful of example images so a specific concept or style can be referenced in prompts. Kinda like style transfer... but with Stable Diffusion.
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation: Similar to Textual Inversion, but fine-tunes the model itself on a few images of a specific subject so it can be rendered in new contexts (e.g. this thing/person/etc. but underwater).
- Novel View Synthesis with Diffusion Models
- AudioGen: Textually Guided Audio Generation
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- Imagic: Text-Based Real Image Editing with Diffusion Models
- MDM: Human Motion Diffusion Model
- Soft Diffusion: Score Matching for General Corruptions
- Multi-Concept Customization of Text-to-Image Diffusion: Like DreamBooth but capable of synthesizing multiple concepts.
- eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
- Elucidating the Design Space of Diffusion-Based Generative Models (EDM)
- Tackling the Generative Learning Trilemma with Denoising Diffusion GANs
- Imagen Video: High Definition Video Generation with Diffusion Models
Neural Radiance Fields (and NeRF-like things)
- Structure-from-Motion Revisited: prior work on sparse modeling (still needed/useful for estimating the camera poses NeRF trains on)
- Pixelwise View Selection for Unstructured Multi-View Stereo: prior work on dense modeling (NeRF kinda replaces this)
- DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation
- Deferred Neural Rendering: Image Synthesis using Neural Textures
- Neural Volumes: Learning Dynamic Renderable Volumes from Images
- ⭐️ NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis: The paper that started it all...
- Neural Radiance Fields for Unconstrained Photo Collections: NeRF in the wild (alternative to MVS)
- Nerfies: Deformable Neural Radiance Fields: Photorealistic NeRF from casual in-the-wild photos and videos (like from a cellphone)
- Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields: NeRF... but BETTER FASTER HARDER STRONGER
- Depth-supervised NeRF: Fewer Views and Faster Training for Free: Train NeRF models faster with fewer images by leveraging depth information
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding: a trainable multiresolution hash-table encoding that makes NeRF training rlllly FAST
- Understanding Pure CLIP Guidance for Voxel Grid NeRF Models: text-to-3D using CLIP
- NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields: NeRF for robots (and cars)
- nerf2nerf: Pairwise Registration of Neural Radiance Fields: aligns two pretrained NeRFs of the same object or scene into a common coordinate frame
- The One Where They Reconstructed 3D Humans and Environments in TV Shows
- ClimateNeRF: Physically-based Neural Rendering for Extreme Climate Synthesis
- Realistic one-shot mesh-based head avatars
- Neural Point Catacaustics for Novel-View Synthesis of Reflections
- 3D Moments from Near-Duplicate Photos
- NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors
3D and point clouds
- DreamFusion: Text-to-3D using 2D Diffusion (Google)
- ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding (Salesforce)
- Extracting Triangular 3D Models, Materials, and Lighting From Images (NVIDIA)
- GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images (NVIDIA)
- 3D Neural Field Generation using Triplane Diffusion
- 🎠 MagicPony: Learning Articulated 3D Animals in the Wild
- ObjectStitch: Generative Object Compositing (Adobe)
- LADIS: Language Disentanglement for 3D Shape Editing (Snap)
- Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion (Microsoft)
- SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation (Snap)
- DiffRF: Rendering-guided 3D Radiance Field Diffusion (Meta)
- Novel View Synthesis with Diffusion Models (Google)
- ⭐️ Magic3D: High-Resolution Text-to-3D Content Creation (NVIDIA)
Unconditional Image Synthesis
- Sampling Generative Networks
- Neural Discrete Representation Learning (VQVAE)
- Progressive Growing of GANs for Improved Quality, Stability, and Variation
- A Style-Based Generator Architecture for Generative Adversarial Networks (StyleGAN)
- ⭐️ Analyzing and Improving the Image Quality of StyleGAN (StyleGAN2)
- Training Generative Adversarial Networks with Limited Data (StyleGAN2-ADA)
- Alias-Free Generative Adversarial Networks (StyleGAN3)
- Generating Diverse High-Fidelity Images with VQ-VAE-2
- Taming Transformers for High-Resolution Image Synthesis (VQGAN)
- Diffusion Models Beat GANs on Image Synthesis
- StyleNAT: Giving Each Head a New Perspective
- StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
Conditional Image Synthesis (and inverse problems)
- Image-to-Image Translation with Conditional Adversarial Nets (pix2pix)
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN)
- High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs (pix2pixHD)
- Semantic Editing of Scenes by Adding, Manipulating or Erasing Objects (SESAME)
- Semantic Image Synthesis with Spatially-Adaptive Normalization (SPADE)
- You Only Need Adversarial Supervision for Semantic Image Synthesis (OASIS)
- Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
- Multimodal Conditional Image Synthesis with Product-of-Experts GANs
- Palette: Image-to-Image Diffusion Models
- Sketch-Guided Text-to-Image Diffusion Models
- HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation
- PiPa: Pixel- and Patch-wise Self-supervised Learning for Domain Adaptative Semantic Segmentation
- MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
- Pretraining is All You Need for Image-to-Image Translation (PITI)
GAN inversion (and editing)
- Generative Visual Manipulation on the Natural Image Manifold (iGAN)
- In-Domain GAN Inversion for Real Image Editing
- Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space?
- Designing an Encoder for StyleGAN Image Manipulation
- Pivotal Tuning for Latent-based Editing of Real Images
- ⭐️ HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing
- StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
- High-Fidelity GAN Inversion for Image Attribute Editing
- Swapping Autoencoder for Deep Image Manipulation
- Sketch Your Own GAN
- Rewriting Geometric Rules of a GAN
- Anycost GANs for Interactive Image Synthesis and Editing
- Third Time’s the Charm? Image and Video Editing with StyleGAN3
Latent Space Interpretation
- ⭐️ Discovering Interpretable GAN Controls (GANspace)
- Interpreting the Latent Space of GANs for Semantic Face Editing
- GAN Dissection: Visualizing and Understanding Generative Adversarial Networks
- Unsupervised Extraction of StyleGAN Edit Directions (CLIP2StyleGAN)
- Seeing What a GAN Cannot Generate
Image Matting
- Deep Image Matting
- Background Matting: The World is Your Green Screen
- Robust Video Matting
- Semantic Image Matting
- Privacy-Preserving Portrait Matting
- Deep Automatic Natural Image Matting
- MatteFormer
- MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition
- ⭐️ Robust Human Matting via Semantic Guidance
Tools
Generative Modeling
- NVIDIA Imaginaire: 2D Image synthesis library
- NVIDIA Omniverse: The platform for creating and operating metaverse applications
- mmgeneration
- Modelverse: Content-Based Search for Deep Generative Models
- PaddleGAN
Creative ML
Deep Learning Frameworks
Runtimes/Deployment
- FFCV: an Optimized Data Pipeline for Accelerating ML Training
- ONNX Runtime
- DeepSpeed (training, inference, compression)
- TensorRT
- TensorFlow Lite
- TorchScript
- TorchServe
- AITemplate
Text-to-Image
- ⭐️ Stable Diffusion
- Imagen
- DALL·E 2
- VQGAN+CLIP
- Parti
- Muse: Text-To-Image Generation via Masked Generative Transformers: More efficient than diffusion or autoregressive text-to-image models; uses masked image modeling with transformers.
Stable Diffusion (SD)
- DreamStudio: Official Stability AI cloud-hosted service.
- ⭐️ Stable Diffusion Web UI: A user-friendly UI for SD with additional features to make common workflows easy.
- AI Render (Blender): Render scenes in Blender using a text prompt.
- Dream Textures (Blender): Plugin to render textures, reference images, and backgrounds with SD.
- lexica.art: SD prompt search.
- koi (Krita): SD plugin for Krita for img2img generation.
- Alpaca (Photoshop): Photoshop plugin (beta).
- Christian Cantrell's Plugin (Photoshop): Another Photoshop plugin.
- Stable Diffusion Studio: Animation focused frontend for SD.
- DeepSpeed-MII: Low-latency and high-throughput inference for a wide variety of models/tasks (20,000+), including SD.
Neural Radiance Fields
Creative Coding
Frameworks
Visual Programming Languages
Datasets
- LAION Datasets: Various very large-scale image-text pair datasets (notably used to train the open-source Stable Diffusion models)
- Unsplash Images
- Open Images: A dataset of ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, visual relationships, and localized narratives.
- Mozilla Common Voice: 17,127 validated hours of transcribed speech covering 104 languages. Additionally, many of the recorded hours include demographic metadata such as age, sex, and accent, which can help improve the accuracy of speech recognition engines.
Faces/People
- Labeled Faces in the Wild (LFW)
- CelebA
- LFWA+
- CelebAMask-HQ
- CelebA-Spoof
- UTKFace
- SHHQ: full-body images, 1024 × 512 px
Video
Products/Apps
- Artbreeder
- Midjourney
- DALL·E 2 (OpenAI)
- Runway - AI-powered video editor.
- Facet AI - AI-powered image editor.
- Adobe Sensei - AI-powered features for the Creative Cloud suite.
- NVIDIA AI Demos
- ClipDrop and cleanup.pictures
Artists
A non-exhaustive list of people doing interesting things at the intersection of art, ML, and design.
- Memo Akten
- Neural Bricolage (helena sarin)
- Sofia Crespo
- Lauren McCarthy
- Philipp Schmitt
- Anna Ridler
- Tom White
- Ivona Tau
- Trevor Paglen
- Sasha Stiles
- Mario Klingemann
- Tega Brain
- Mimi Onuoha
- Allison Parrish
- Caroline Sinders
- Robbie Barrat
- Kyle McDonald
- Golan Levin
Institutions/Places
- STUDIO for Creative Inquiry
- ITP @ NYU
- Gray Area Foundation for the Arts
- Stability AI (EleutherAI, LAION, et al.)
- Goldsmiths @ University of London
- UCLA Design Media Arts
- Berkeley Center for New Media
- Google Artists and Machine Intelligence
- Google Creative Lab
- The Lab at the Google Cultural Institute
- Sony CSL (Tokyo and Paris)
Related lists and collections
- Machine Learning for Art
- Tools and Resources for AI Art (pharmapsychotic) - Big list of Google Colab notebooks for generative text-to-image techniques as well as general tools and resources.
- Awesome Generative Deep Art - A curated list of Generative Deep Art / Generative AI projects, tools, artworks, and models
Contributing
Contributions are welcome! Read the contribution guidelines first.