• Stars
    19
  • Rank 1,163,249 (Top 23 %)
  • Language
    Python
  • Created 5 months ago
  • Updated 4 months ago

Repository Details

Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"

More Repositories

1

MiniGPT-5

Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
Python
840
star
2

photoswap

Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"
Jupyter Notebook
307
star
3

PEViT

Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"
Python
94
star
4

CPL

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
Python
32
star
5

ComCLIP

Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Python
30
star
6

Discffusion

Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
Python
26
star
7

llm_coordination

Code repository for the paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
Python
21
star
8

MMWorld

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
Python
18
star
9

Aerial-Vision-and-Dialog-Navigation

Codebase of the ACL 2023 (Findings) paper "Aerial Vision-and-Dialog Navigation"
Python
14
star
10

FedVLN

[ECCV 2022] Official PyTorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"
C++
13
star
11

Mitigate-Gender-Bias-in-Image-Search

Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arxiv.org/abs/2109.05433
Python
12
star
12

ACLToolBox

Python
8
star
13

PECTVLM

Code implementation for Findings of EMNLP 2023 paper "Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment"
Smalltalk
7
star
14

T2IAT

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation
Python
7
star
15

MSSBench

Official codebase for the paper "Multimodal Situational Safety"
Python
6
star
16

Naivgation-as-wish

Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"
Python
5
star
17

ViCor

Implementation of the ACL 2024 Findings paper "ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models"
3
star
18

via-video

1
star