Text2Light: Zero-Shot Text-Driven HDR Panorama Generation
TOG 2022 (Proc. SIGGRAPH Asia)
TL;DR
Text2Light generates HDR panoramas at 4K+ resolution from free-form text alone.
Our high-quality results can be applied directly to downstream tasks, e.g., lighting 3D scenes and immersive VR.
Project Page | Video | Paper | Colab
Updates
[05/2023] Release text descriptions used during inference.
[09/2022] Our online demo in Colab is released!
[09/2022] Paper uploaded to arXiv.
[09/2022] Model weights released.
[09/2022] Code released.
Citation
If you find our work useful for your research, please consider citing this paper:
@article{chen2022text2light,
title={Text2Light: Zero-Shot Text-Driven HDR Panorama Generation},
author={Chen, Zhaoxi and Wang, Guangcong and Liu, Ziwei},
journal={ACM Transactions on Graphics (TOG)},
volume={41},
number={6},
articleno={195},
pages={1--16},
year={2022},
publisher={ACM New York, NY, USA}
}
Installation
We highly recommend using Anaconda to manage your Python environment. You can set up the required environment with the following commands:
conda env create -f environment.yml
conda activate text2light
Text-driven HDRI Generation
Follow the steps below to generate HDR panoramas from free-form text with our models.
Download Pretrained Models
Please download our checkpoints from Google Drive to run the following inference scripts. By default, we use the model trained on our full dataset (local_sampler). We also release models trained separately on outdoor (local_sampler_outdoor) and indoor (local_sampler_indoor) scenes.
All-in-one Inference Script
All inference code is in text2light.py. You can check its usage with:
python text2light.py -h
Here are some examples. The output will be saved in ./generated_panorama:
- Generate an HDR panorama from a single sentence:
python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler_outdoor --outdir ./generated_panorama --text "YOUR SCENE DESCRIPTION" --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4
- Generate HDR panoramas from a list of texts:
# assume your texts are stored in alt.txt
python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler_outdoor --outdir ./generated_panorama --text ./alt.txt --clip clip_emb.npy --sritmo ./logs/sritmo.pth --sr_factor 4
- Generate low-resolution (512x1024) LDR panoramas only:
# assume your texts are stored in alt.txt
python text2light.py -rg logs/global_sampler_clip -rl logs/local_sampler_outdoor --outdir ./generated_panorama --text ./alt.txt --clip clip_emb.npy
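For scripted use, the commands above can be assembled programmatically. The following is a minimal sketch, not part of the repository: build_text2light_cmd is a hypothetical helper that only uses the flags shown above. The checkpoint paths logs/local_sampler and logs/local_sampler_indoor are assumed by analogy with logs/local_sampler_outdoor, so adjust them to wherever you unpacked the checkpoints.

```python
import shlex

# Local sampler checkpoints. Only the outdoor path appears in the commands
# above; the other two paths are assumptions made by analogy.
LOCAL_SAMPLERS = {
    "full": "logs/local_sampler",
    "outdoor": "logs/local_sampler_outdoor",
    "indoor": "logs/local_sampler_indoor",
}


def build_text2light_cmd(text, scene="outdoor", outdir="./generated_panorama",
                         hdr=True, sr_factor=4):
    """Assemble the argument list for text2light.py (hypothetical helper).

    `text` is either a scene description or a path to a text file such as
    ./alt.txt. With hdr=False the --sritmo and --sr_factor flags are left
    out, which matches the low-resolution LDR-only command.
    """
    cmd = [
        "python", "text2light.py",
        "-rg", "logs/global_sampler_clip",
        "-rl", LOCAL_SAMPLERS[scene],
        "--outdir", outdir,
        "--text", text,
        "--clip", "clip_emb.npy",
    ]
    if hdr:
        cmd += ["--sritmo", "./logs/sritmo.pth", "--sr_factor", str(sr_factor)]
    return cmd


# Print the LDR-only command for a list of texts stored in ./alt.txt.
print(shlex.join(build_text2light_cmd("./alt.txt", hdr=False)))
```

To actually launch the run, pass the returned list to subprocess.run(..., check=True) from the root of the Text2Light codebase.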
The generated HDRIs can be used directly to render 3D scenes such as Barcelona Pavillion from the Blender Demo Files.
Rendering
Our generated HDR panoramas can be directly used in any modern graphics pipeline as the environment texture and light source. Here we take Blender as an example.
From GUI
Open Blender -> Select the Shading panel -> Select Shader Type as World -> Add an Environment Texture node -> Browse and select our generated panoramas -> Render
You can also refer to this tutorial.
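The same GUI steps can also be scripted with Blender's Python API. This is a minimal sketch, assuming it runs inside Blender's bundled Python (where bpy is available) and that the scene has a world with the default node tree containing a node named Background. The function name set_world_hdri is hypothetical and not part of this codebase.

```python
def set_world_hdri(hdri_path):
    """Use the panorama at `hdri_path` as the world environment texture."""
    import bpy  # only importable inside Blender's bundled Python

    world = bpy.context.scene.world  # assumes the scene already has a world
    world.use_nodes = True
    nodes = world.node_tree.nodes
    links = world.node_tree.links

    # Assumption: the default world node tree, which has a "Background" node.
    background = nodes["Background"]

    # Add an Environment Texture node and load the generated panorama into it.
    env = nodes.new("ShaderNodeTexEnvironment")
    env.image = bpy.data.images.load(hdri_path)

    # Feed the panorama colors into the background shader.
    links.new(env.outputs["Color"], background.inputs["Color"])
```

Inside Blender (for example, from the Scripting workspace), call set_world_hdri with the path to one of the generated panoramas and then render as usual.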
Here is an example of rendering a landscape in San Francisco using the HDRI generated from the input text "landscape photography of mountain ranges under purple and pink skies".
From Command line
For ease of batch processing, e.g., rendering with multiple HDRIs, we provide command-line scripts for rendering your 3D assets.
- Download the Linux version of Blender from Blender Download Page.
- Unpack it and check the usage of Blender:
# assume your downloaded version is 3.1.2
tar -xvf blender-3.1.2-linux-x64.tar.xz
cd blender-3.1.2-linux-x64
./blender --help
- Add an alias to your .bashrc or .zshrc:
# PATH_TO_DOWNLOADED_BLENDER is the parent directory where you saved the downloaded Blender
alias blender="/PATH_TO_DOWNLOADED_BLENDER/blender-3.1.2-linux-x64/blender"
- Go back to the Text2Light codebase and run the following commands for different rendering setups:
- Render four shader balls given all HDRIs stored at PATH_TO_HDRI:
blender --background --python rendering_shader_ball.py -- ./rendered_balls 100 1000 PATH_TO_HDRI
The results will be saved in ./rendered_balls.
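Blender stops parsing its own options at the bare "--", so everything after it is left for the script given to --python. A common pattern for reading those values is sketched below. Here script_args is a hypothetical helper, and how rendering_shader_ball.py actually interprets its arguments is not shown.

```python
import sys


def script_args(argv=None):
    """Return the arguments that follow the bare "--" separator."""
    argv = sys.argv if argv is None else argv
    if "--" not in argv:
        return []
    return argv[argv.index("--") + 1:]


# The shader-ball command from above, split into an argument list.
example = ["blender", "--background", "--python", "rendering_shader_ball.py",
           "--", "./rendered_balls", "100", "1000", "PATH_TO_HDRI"]
print(script_args(example))
```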
Training
Our training is stage-wise with multiple steps. The details are listed as follows.
Data Preparation
Assuming all your HDRIs for training are stored at PATH_TO_HDR_DATA, please run process_hdri.py to process the data:
python process_hdri.py --src PATH_TO_HDR_DATA
The processed data will be saved to ./data by default and organized as follows:
├── ...
└── Text2Light/
    └── data/
        ├── train/
        │   ├── calib_hdr
        │   ├── ldr
        │   └── raw_hdr
        ├── val/
        │   ├── calib_hdr
        │   ├── ldr
        │   └── raw_hdr
        └── meta/
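Before starting training, it can help to confirm that the processed data matches this layout. The check below is a minimal sketch: missing_dirs is a hypothetical helper, and it assumes that meta/ sits directly under data/ next to train/ and val/.

```python
from pathlib import Path

SPLITS = ("train", "val")
SUBDIRS = ("calib_hdr", "ldr", "raw_hdr")


def missing_dirs(data_root="./data"):
    """List expected sub-directories that do not exist under `data_root`."""
    root = Path(data_root)
    expected = [root / split / sub for split in SPLITS for sub in SUBDIRS]
    expected.append(root / "meta")  # assumption: meta/ is a child of data/
    return [str(path) for path in expected if not path.is_dir()]


# Report what is missing, or confirm that every expected folder exists.
print(missing_dirs("./data") or "layout looks complete")
```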
Stage I - Text-driven LDR Panorama Generation
Stage I training is launched by train_stage1.py. You can check its usage with:
python train_stage1.py -h
- Train the global codebook
python train_stage1.py --base configs/global_codebook.yaml -t True --gpu 0,1,2,3,4,5,6,7
- Train the local codebook
python train_stage1.py --base configs/local_codebook.yaml -t True --gpu 0,1,2,3,4,5,6,7
- Train the text-conditioned global sampler. Please specify the path to global codebook in the config YAML.
python train_stage1.py --base configs/global_sampler_clip.yaml -t True --gpu 0,1,2,3,4,5,6,7
- Train the structure-aware local sampler. Please specify the path to global and local codebooks in the config YAML, respectively.
python train_stage1.py --base configs/local_sampler_spe.yaml -t True --gpu 0,1,2,3,4,5,6,7
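The four steps depend on each other: both samplers need trained codebooks, so the order matters. The sketch below only lists the commands in that order. stage1_commands is a hypothetical helper, and it deliberately does not run anything, because the codebook paths in the sampler config YAMLs have to be filled in by hand between steps.

```python
# Config files in the order the steps are listed above.
STAGE1_CONFIGS = [
    "configs/global_codebook.yaml",      # 1. global codebook
    "configs/local_codebook.yaml",       # 2. local codebook
    "configs/global_sampler_clip.yaml",  # 3. needs the global codebook path
    "configs/local_sampler_spe.yaml",    # 4. needs both codebook paths
]


def stage1_commands(gpus="0,1,2,3,4,5,6,7"):
    """Return the Stage I training commands as argument lists, in order."""
    return [
        ["python", "train_stage1.py", "--base", config, "-t", "True", "--gpu", gpus]
        for config in STAGE1_CONFIGS
    ]


for command in stage1_commands():
    print(" ".join(command))
```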
Stage II - Super-resolution Inverse Tonemapping
Stage II training is launched by train_stage2.py. You can check its usage with:
python train_stage2.py -h
The default setting can be trained on a single A100 GPU without DDP:
# assume you use the default --dst_dir in process_hdri.py, thus the hdr dataset would be stored in ./data
python train_stage2.py --dir ./data --save_dir ./output/bs32_7e-5 --workers 16 --val_ep 5 --gpu 0
To enable distributed training, for example, over 8 GPUs:
python train_stage2.py --dir ./data --save_dir ./output/bs32_7e-5 --workers 8 --val_ep 5 --ddp
Acknowledgements
This work is supported by the National Research Foundation, Singapore under its AI Singapore Programme, NTU NAP, MOE AcRF Tier 2 (T2EP20221-0033), and under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s).
Text2Light is implemented on top of the VQGAN codebase. We also thank CLIP and LIIF for their released models and code. Thanks to this repo for its amazing command-line rendering toolbox.