DreamLLM
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, Hongyu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma and Li Yi
DreamLLM is a learning framework that, for the first time, achieves versatile Multimodal Large Language Models (MLLMs) empowered by the frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. The first focuses on the generative modeling of both language and image posteriors by direct sampling in the raw multimodal space. The second is that DreamLLM fosters the generation of raw, interleaved documents, modeling both text and image contents along with unstructured layouts. As a result, DreamLLM is a zero-shot multimodal generalist capable of both comprehension and creation.
Code, model weights, and demo will be released soon.
Contact
If you have any questions related to the code or the paper, feel free to email Runpei Dong ([email protected]).
License
Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. The license is adapted from the LLaMA license.
See the LICENSE, as well as our accompanying Acceptable Use Policy.
Citation
If you find our work useful in your research, please consider citing DreamLLM:
@article{dong2023dreamllm,
author = {Dong, Runpei and Han, Chunrui and Peng, Yuang and Qi, Zekun and Ge, Zheng and Yang, Jinrong and Zhao, Liang and Sun, Jianjian and Zhou, Hongyu and Wei, Haoran and Kong, Xiangwen and Zhang, Xiangyu and Ma, Kaisheng and Yi, Li},
title = {DreamLLM: Synergistic Multimodal Comprehension and Creation},
journal = {arXiv preprint arXiv:2309.11499},
year = {2023},
}