BindDiffusion: One Diffusion Model to Bind Them All
Inspired by recent progress in multimodal representation learning (ImageBind), we explore using a single diffusion model for multimodality-conditioned image generation. Notably, we leverage a pre-trained diffusion model to consume conditions from diverse, or even mixed, modalities. This design enables many novel applications, such as audio-to-image generation, without any additional training. This repo is still under development. Please stay tuned!
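At a high level, the system replaces the CLIP image embedding that Stable unCLIP conditions on with an ImageBind embedding of whatever input you provide; because ImageBind maps all modalities into one shared space, no retraining is needed. The sketch below illustrates this idea with the public ImageBind API (import paths may differ slightly from the copy bundled in this repo); the image path and the final conditioning call are placeholders rather than this repo's actual interface.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# ImageBind maps images, audio, text, etc. into one shared embedding space.
# pretrained=True downloads the weights if they are not already cached.
bind_model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

inputs = {ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg"], device)}
with torch.no_grad():
    emb = bind_model(inputs)[ModalityType.VISION]  # 1024-dim joint embedding

# Because the embedding space is shared, the slot where Stable unCLIP normally
# receives a CLIP image embedding can be filled from any ImageBind modality.
# `condition_unclip_model` is purely illustrative, not a function in this repo.
# images = condition_unclip_model(emb, prompt="a photo of a dog")
```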
Acknowledgement: This repo is based on the following amazing projects: Stable Diffusion, ImageBind.
Install
pip install -r requirements.txt
Pretrained checkpoints
mkdir -p checkpoints; cd checkpoints;
wget https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip/resolve/main/sd21-unclip-h.ckpt;
wget https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth;
A Jupyter Notebook for beginners
Image-conditioned generation:
python main_bind.py --prompt <prompt> --device cuda --modality image \
--H 768 --W 768 \
--config ./configs/stable-diffusion/v2-1-stable-unclip-h-bind-inference.yaml \
--ckpt ./checkpoints/sd21-unclip-h.ckpt \
--noise-level <noise-level> --init <init-img> --strength <strength-level>
Audio-conditioned generation:
python main_bind.py --prompt <prompt> --device cuda --modality audio \
--H 768 --W 768 \
--config ./configs/stable-diffusion/v2-1-stable-unclip-h-bind-inference.yaml \
--ckpt ./checkpoints/sd21-unclip-h.ckpt \
--strength <strength-level> --noise-level <noise-level> --init <init-audio>
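For audio conditioning, the only difference is that the `--init` file is encoded with ImageBind's audio branch instead of its vision branch; everything downstream is unchanged. A rough sketch, again using the public ImageBind API with a placeholder audio file:

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
bind_model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

# ImageBind converts the waveform into mel-spectrogram clips before encoding.
inputs = {ModalityType.AUDIO: data.load_and_transform_audio_data(["bird_chirping.wav"], device)}
with torch.no_grad():
    audio_emb = bind_model(inputs)[ModalityType.AUDIO]

# audio_emb lives in the same space as the vision embedding above, so it can be
# dropped into the same unCLIP conditioning slot without retraining.
```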
Naive mixed-modality generation:
python main_multi_bind.py --prompt <prompt> --device cuda \
--H 768 --W 768 \
--config ./configs/stable-diffusion/v2-1-stable-unclip-h-bind-inference.yaml \
--ckpt ./checkpoints/sd21-unclip-h.ckpt \
--noise-level <noise-level> --init-image <init-img> --init-audio <init-audio> \
--alpha <alpha>
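Conceptually, `--alpha` controls how the two ImageBind embeddings are blended before conditioning. The sketch below shows one plausible reading (a linear interpolation, with `alpha` weighting the image embedding); the exact weighting used by `main_multi_bind.py` may differ, and the file paths are placeholders.

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"
bind_model = imagebind_model.imagebind_huge(pretrained=True).eval().to(device)

inputs = {
    ModalityType.VISION: data.load_and_transform_vision_data(["beach.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["waves.wav"], device),
}
with torch.no_grad():
    emb = bind_model(inputs)

alpha = 0.5  # corresponds to the --alpha CLI flag
# Naive mix: interpolate the two embeddings in ImageBind's shared space and use
# the result as the single conditioning vector for Stable unCLIP.
mixed = alpha * emb[ModalityType.VISION] + (1.0 - alpha) * emb[ModalityType.AUDIO]
```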
Contributors
We welcome contributions and suggestions from anyone interested in this fun project!
Feel free to explore the profiles of our contributors.
We appreciate your interest and look forward to your involvement!