DreamLLM

[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation

Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, Hongyu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma and Li Yi

DreamLLM is a learning framework that achieves the first versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation. DreamLLM operates on two fundamental principles. First, it performs generative modeling of both language and image posteriors by sampling directly in the raw multimodal space. Second, it fosters the generation of raw, interleaved documents, modeling both the text and image contents along with their unstructured layouts. As a result, DreamLLM is a zero-shot multimodal generalist capable of both comprehension and creation.
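
To make the interleaved-generation idea more concrete, below is a minimal, self-contained PyTorch sketch of what such a decoding loop can look like. This is not the DreamLLM implementation (which has not been released here): the tiny backbone, the DREAM_TOKEN special token, the learned dream_queries, and the toy image decoder are all illustrative placeholders standing in for the real components.

import torch
import torch.nn as nn

# Illustrative vocabulary: ordinary text tokens plus one special token that
# marks "generate an image here" inside an interleaved document.
VOCAB_SIZE = 1000
DREAM_TOKEN = VOCAB_SIZE - 1          # hypothetical special token id
NUM_DREAM_QUERIES = 4                 # hypothetical learned queries per image
HIDDEN = 64

class TinyCausalLM(nn.Module):
    """Stand-in for the multimodal LLM backbone (decoder-only transformer)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, HIDDEN)
        self.block = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)
        self.lm_head = nn.Linear(HIDDEN, VOCAB_SIZE)

    def forward(self, token_embeds):
        h = self.block(token_embeds)          # causal masking omitted for brevity
        return h, self.lm_head(h[:, -1])      # hidden states + next-token logits

class TinyImageDecoder(nn.Module):
    """Stand-in for an image decoder conditioned on query hidden states."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, 3 * 32 * 32)

    def forward(self, query_states):
        # Pool the query states and decode a toy 3x32x32 image.
        return self.proj(query_states.mean(dim=1)).view(-1, 3, 32, 32)

lm, decoder = TinyCausalLM(), TinyImageDecoder()
dream_queries = nn.Parameter(torch.randn(1, NUM_DREAM_QUERIES, HIDDEN))

# Interleaved decoding loop: sample text tokens directly; when the model emits
# the dream token, condition the image decoder on the dream-query states and
# then resume text generation.
tokens = torch.randint(0, VOCAB_SIZE - 1, (1, 5))      # a toy text prefix
embeds = lm.embed(tokens)
document = []
for _ in range(20):
    _, logits = lm(embeds)
    next_token = torch.distributions.Categorical(logits=logits).sample()
    if next_token.item() == DREAM_TOKEN:
        # Run the queries through the backbone in context, then decode an image.
        hidden_q, _ = lm(torch.cat([embeds, dream_queries], dim=1))
        image = decoder(hidden_q[:, -NUM_DREAM_QUERIES:])
        document.append(("image", tuple(image.shape)))
        embeds = torch.cat([embeds, dream_queries], dim=1)
    else:
        document.append(("text", next_token.item()))
        embeds = torch.cat([embeds, lm.embed(next_token)[:, None]], dim=1)

print(document)

The point of the sketch is the control flow, not the modules: text tokens are sampled directly from the language model's output distribution, and an emitted special token switches the loop into conditioning an image decoder on query hidden states before text generation continues, yielding a single document that interleaves text and images.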

Code, model weights, and demo will be released soon.

Contact

If you have any questions related to the code or the paper, feel free to email Runpei Dong ([email protected]).

License

Our model and weights are licensed for both researchers and commercial entities, upholding the principles of openness. The license is a modified version of the LLaMA license.

See the LICENSE, as well as our accompanying Acceptable Use Policy.

Citation

If you find our work useful in your research, please consider citing DreamLLM:

@article{dong2023dreamllm,
  author = {Dong, Runpei and Han, Chunrui and Peng, Yuang and Qi, Zekun and Ge, Zheng and Yang, Jinrong and Zhao, Liang and Sun, Jianjian and Zhou, Hongyu and Wei, Haoran and Kong, Xiangwen and Zhang, Xiangyu and Ma, Kaisheng and Yi, Li},
  title = {DreamLLM: Synergistic Multimodal Comprehension and Creation},
  journal = {arXiv preprint arXiv:2309.11499},
  year = {2023},
}