• Stars
    star
    2
  • Language
    Python
  • License
    MIT License
  • Created 8 months ago
  • Updated 7 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Bring multimodality to the LLaMA model by leveraging ImageBind as the modal encoder. This project supports vision input (both images and short videos) to the LLaMA model, with text output generated by LLaMA.