• Stars: 40
• Rank: 680,660 (Top 14%)
• Language: Python
• License: MIT License
• Created 5 months ago
• Updated 4 months ago


Repository Details

[GenRL] Multimodal foundation world models ground language and video prompts in embodied domains by turning them into sequences of latent world-model states. These latent sequences can be decoded with the model's decoder, so the expected behavior can be visualized before the agent is trained to execute it.
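
The description implies a pipeline of prompt → latent world-model states → decoded preview → agent training. The sketch below only illustrates that data flow under assumed shapes; the class names (PromptEncoder, LatentDynamics, FrameDecoder) and all dimensions are hypothetical placeholders, not GenRL's actual API.

```python
# Hypothetical sketch of the prompt -> latent-state-sequence -> decode flow
# described above. None of these classes or shapes come from the GenRL
# codebase; they are stand-ins that only show how the pieces connect.
import torch
import torch.nn as nn


class PromptEncoder(nn.Module):
    """Maps a (language or video) prompt embedding into the world-model latent space."""
    def __init__(self, prompt_dim: int, latent_dim: int):
        super().__init__()
        self.proj = nn.Linear(prompt_dim, latent_dim)

    def forward(self, prompt_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(prompt_emb)


class LatentDynamics(nn.Module):
    """Rolls an initial latent forward into a sequence of world-model states."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.cell = nn.GRUCell(latent_dim, latent_dim)

    def forward(self, z0: torch.Tensor, horizon: int) -> torch.Tensor:
        states, z = [], z0
        for _ in range(horizon):
            z = self.cell(z, z)
            states.append(z)
        return torch.stack(states, dim=1)  # (batch, horizon, latent_dim)


class FrameDecoder(nn.Module):
    """Decodes each latent state into a flat image so the target behavior can be previewed."""
    def __init__(self, latent_dim: int, frame_pixels: int = 64 * 64 * 3):
        super().__init__()
        self.net = nn.Linear(latent_dim, frame_pixels)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)  # (batch, horizon, pixels); reshape to images downstream


if __name__ == "__main__":
    prompt_dim, latent_dim, horizon = 512, 128, 16  # assumed sizes for illustration
    encoder = PromptEncoder(prompt_dim, latent_dim)
    dynamics = LatentDynamics(latent_dim)
    decoder = FrameDecoder(latent_dim)

    prompt_emb = torch.randn(1, prompt_dim)                  # stand-in for a language/video embedding
    target_states = dynamics(encoder(prompt_emb), horizon)   # latent targets the agent would be trained to reach
    preview = decoder(target_states)                         # decoded preview of the expected behavior
    print(target_states.shape, preview.shape)
```

In this view, the decoded frames serve only as a human-readable preview; the agent itself would be trained against the latent state sequence.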