MiniGPT-5
Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
photoswap
Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"
PEViT
Official implementation of the AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers"
CPL
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
Discffusion
Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
llm_coordination
Code repository for the paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
Screen-Point-and-Read
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
MMWorld
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
Aerial-Vision-and-Dialog-Navigation
Codebase of the ACL 2023 (Findings) paper "Aerial Vision-and-Dialog Navigation"
FedVLN
[ECCV 2022] Official PyTorch implementation of the paper "FedVLN: Privacy-preserving Federated Vision-and-Language Navigation"
Mitigate-Gender-Bias-in-Image-Search
Code for the EMNLP 2021 Oral paper "Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search" https://arxiv.org/abs/2109.05433
ACLToolBox
PECTVLM
Code implementation for the Findings of EMNLP 2023 paper "Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment"
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation
MSSBench
Official codebase for the paper "Multimodal Situational Safety"
ViCor
Implementation of the ACL 2024 Findings paper "ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models"
via-video