  • Stars: 126
  • Rank: 284,543 (Top 6 %)
  • Language: Python
  • Created about 1 year ago
  • Updated 4 months ago

Repository Details

GPT-4V in Wonderland: LMMs as Smartphone Agents

GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation

Our code and evaluation benchmark will be out soon!

Demo

A demo figure showing GPT-4V shopping on the Amazon app on an iPhone:

Citation

If you find our work helpful to your research, please consider citing the paper:

@article{yan2023gpt,
  title={GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation},
  author={Yan, An and Yang, Zhengyuan and Zhu, Wanrong and Lin, Kevin and Li, Linjie and Wang, Jianfeng and Yang, Jianwei and Zhong, Yiwu and McAuley, Julian and Gao, Jianfeng and others},
  journal={arXiv preprint arXiv:2311.07562},
  year={2023}
}