MVDream
Yichun Shi, Peng Wang, Jianglong Ye, Long Mai, Kejie Li, Xiao Yang
| Project Page | 3D Generation | Paper | HuggingFace Demo (Coming) |
3D Generation
- This repository only includes the diffusion model and 2D image generation code of the MVDream paper.
- For 3D Generation, please check MVDream-threestudio.
Installation
You can use the same environment as Stable Diffusion for this repo, or you can set up the environment by installing the given requirements:
pip install -r requirements.txt
To use MVDream as a Python module, you can install it with pip install -e .
or:
pip install git+https://github.com/bytedance/MVDream
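Either way, a quick import check (using the mvdream module name that the examples below rely on) confirms the installation:
python -c "import mvdream; print('ok')"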
Model Card
Our models are provided on the Hugging Face Model Page under the OpenRAIL license.
| Model | Base Model | Resolution (views x H x W) |
| --- | --- | --- |
| sd-v2.1-base-4view | Stable Diffusion 2.1 Base | 4x256x256 |
| sd-v1.5-4view | Stable Diffusion 1.5 | 4x256x256 |
By default, we use the SD-2.1-base model in our experiments.
Note that you don't have to download the checkpoints manually; the following scripts fetch them automatically.
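If you prefer to fetch a checkpoint yourself (e.g., for the manual loading path in the Usage section), the sketch below uses huggingface_hub. The repo id and filename here are assumptions based on the model card above; check the Hugging Face model page for the exact names.
from huggingface_hub import hf_hub_download

# NOTE: repo_id and filename are assumptions; confirm them on the model page.
ckpt_path = hf_hub_download(
    repo_id="MVDream/MVDream",          # assumed repo id
    filename="sd-v2.1-base-4view.th",   # assumed checkpoint filename (see table above)
)
print(ckpt_path)  # local path inside the Hugging Face cache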
Text-to-Image
You can simply generate multi-view images by running the following command:
python scripts/t2i.py --text "an astronaut riding a horse"
We also provide a gradio script to try out with GUI:
python scripts/gradio_app.py
Usage
Load the Model
We provide two ways to load the MVDream models:
- Automatic: load the model config by model name and download the weights from Hugging Face.
from mvdream.model_zoo import build_model
model = build_model("sd-v2.1-base-4view")
- Manual: load the model with a config file and a checkpoint file.
import torch
from omegaconf import OmegaConf
from mvdream.ldm.util import instantiate_from_config
config = OmegaConf.load("mvdream/configs/sd-v2-base.yaml")
model = instantiate_from_config(config.model)
model.load_state_dict(torch.load("path/to/sd-v2.1-base-4view.th", map_location="cpu"))
Inference
Here is a simple example of model inference (a single denoising step):
import torch
from mvdream.camera_utils import get_camera
model.eval()
model.cuda()
with torch.no_grad():
    noise = torch.randn(4, 4, 32, 32, device="cuda")  # batch of 4 for 4 views; latent size 32 = 256/8
    t = torch.tensor([999] * 4, dtype=torch.long, device="cuda")  # same timestep for all 4 views
    cond = {
        "context": model.get_learned_conditioning([""] * 4).cuda(),  # text embeddings (empty prompts here)
        "camera": get_camera(4).cuda(),  # camera matrices for the 4 views
        "num_frames": 4,
    }
    eps = model.apply_model(noise, t, cond=cond)  # predicted noise, one per view
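apply_model performs a single denoising step and returns the predicted noise. To run the full reverse diffusion and decode actual images, the sketch below uses the DDIMSampler that this codebase inherits from the Stable Diffusion (LDM) layout; the import path and argument names follow the standard LDM API and are assumptions to verify against this repo.
import torch
from mvdream.camera_utils import get_camera
from mvdream.ldm.models.diffusion.ddim import DDIMSampler  # assumed import path (standard LDM layout)

num_frames = 4
camera = get_camera(num_frames).cuda()  # camera matrices for 4 views around the object
sampler = DDIMSampler(model)
with torch.no_grad():
    c = {"context": model.get_learned_conditioning(["an astronaut riding a horse"] * num_frames).cuda(),
         "camera": camera, "num_frames": num_frames}
    uc = {"context": model.get_learned_conditioning([""] * num_frames).cuda(),
          "camera": camera, "num_frames": num_frames}
    # 50 DDIM steps with classifier-free guidance; latent shape 4x32x32 decodes to 256x256 images
    latents, _ = sampler.sample(S=50, conditioning=c, batch_size=num_frames,
                                shape=[4, 32, 32], verbose=False,
                                unconditional_guidance_scale=7.5,
                                unconditional_conditioning=uc, eta=0.0)
    images = model.decode_first_stage(latents)  # (4, 3, 256, 256), roughly in [-1, 1]
    images = ((images + 1) / 2).clamp(0, 1)     # map to [0, 1] for saving/visualization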
Acknowledgement
This repository is heavily based on Stable Diffusion, and we would like to thank its authors for publicly releasing their code.
Citation
@article{shi2023MVDream,
  author  = {Shi, Yichun and Wang, Peng and Ye, Jianglong and Mai, Long and Li, Kejie and Yang, Xiao},
  title   = {MVDream: Multi-view Diffusion for 3D Generation},
  journal = {arXiv:2308.16512},
  year    = {2023},
}