  • Stars: 106
  • Rank 325,871 (Top 7%)
  • Language
  • Created 3 months ago
  • Updated 3 months ago

Repository Details

📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.

More Repositories

1

Awesome-Video-Diffusion

A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
3,195
star
2

Show-1

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Python
1,089
star
3

Tune-A-Video

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Python
1,010
star
4

Image2Paragraph

[A toolbox for fun.] Transform an image into a unique paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, and ControlNet.
Python
781
star
5

MotionDirector

MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Python
747
star
6

Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Python
684
star
7

VideoSwap

Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
342
star
8

Awesome-MLLM-Hallucination

📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
340
star
9

all-in-one

[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
Python
277
star
10

BoxDiff

[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Python
239
star
11

DeVRF

The PyTorch implementation of "DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes"
Python
179
star
12

EgoVLP

[NeurIPS2022] Egocentric Video-Language Pretraining
Python
140
star
13

VisorGPT

[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
Python
129
star
14

Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
109
star
15

ShowAnything

Jupyter Notebook
79
star
16

cosmo

Python
70
star
17

loveu-tgve-2023

Official GitHub repository for the Text-Guided Video Editing (TGVE) competition of LOVEU Workshop @ CVPR'23.
Python
68
star
18

sparseformer

(ICLR 2024, CVPR 2024) SparseFormer
Python
62
star
19

datacentric.vlp

Compress conventional Vision-Language Pre-training data
Python
48
star
20

Region_Learner

The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
Python
42
star
21

ShowRoom3D

This is the project page of ShowRoom3D
24
star
22

Long-form-Video-Prior

Python
22
star
23

DemoVLP

[arXiv 2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
Python
21
star
24

CLVQA

[AAAI2023 (Oral)] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
Python
19
star
25

BYOC

[IEEE-VR 2024] Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters
C#
19
star
26

Q2A

[ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
Python
18
star
27

HOSNeRF

This is the project page for HOSNeRF.
JavaScript
15
star
28

headshot

12
star
29

GEB-Plus

[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
Python
12
star
30

LOVA3

[NeurIPS 2024] "Learning to Visual Question Answering, Asking and Assessment"
Python
12
star
31

Show-Anything-3D

Edit and Generate Anything in 3D world!
11
star
32

Awesome-Long-Context

A curated list of resources about long context in large language models and video understanding.
10
star
33

SCT

[IJCV 2023] Official implementation of "SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels"
Python
10
star
34

VisInContext

Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Python
9
star
35

SOIS

The PyTorch implementation of "Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization"
8
star
36

AVA-AVD

Python
7
star
37

Efficient-CLS

[arXiv2022] Label-Efficient Online Continual Object Detection in Streaming Video
6
star
38

videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python
6
star
39

Tune-An-Ellipse

[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
6
star
40

mist

5
star
41

ColonNeRF

This is the project page for ColonNeRF.
JavaScript
4
star
42

DynVideo-E

This is the project page for DynVideo-E.
JavaScript
3
star
43

VideoLISA

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
3
star
44

TTC-Tuning

Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm
2
star
45

assistq

SCSS
1
star