Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
Show-1
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
Tune-A-Video
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Image2Paragraph
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
MotionDirector
MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
Show-o
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
VideoSwap
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Awesome-MLLM-Hallucination
A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
all-in-one
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
BoxDiff
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
DeVRF
The PyTorch implementation of "DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes"
EgoVLP
[NeurIPS2022] Egocentric Video-Language Pretraining
VisorGPT
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
Awesome-GUI-Agent
A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
Awesome-Unified-Multimodal-Models
This is a repository for organizing papers, codes and other resources related to unified multimodal models.
ShowAnything
cosmo
loveu-tgve-2023
Official GitHub repository for the Text-Guided Video Editing (TGVE) competition of LOVEU Workshop @ CVPR'23.
sparseformer
(ICLR 2024, CVPR 2024) SparseFormer
datacentric.vlp
Compress conventional Vision-Language Pre-training data
Region_Learner
The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
ShowRoom3D
This is the project page of ShowRoom3D
Long-form-Video-Prior
DemoVLP
[Arxiv2022] Revitalize Region Feature for Democratizing Video-Language Pre-training
CLVQA
[AAAI2023 (Oral)] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task
BYOC
[IEEE-VR 2024] Bring Your Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters
Q2A
[ECCV 2022] AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
HOSNeRF
This is the project page for HOSNeRF.
headshot
GEB-Plus
[ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval
LOVA3
[NeurIPS 2024] "Learning to Visual Question Answering, Asking and Assessment"
Show-Anything-3D
Edit and Generate Anything in 3D world!
Awesome-Long-Context
A curated list of resources about long-context in large-language models and video understanding.
SCT
[IJCV2023] Official implementation of "SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels"
VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
SOIS
The PyTorch implementation of "Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization"
AVA-AVD
Efficient-CLS
[arXiv2022] Label-Efficient Online Continual Object Detection in Streaming Video
videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Tune-An-Ellipse
[CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want
mist
ColonNeRF
This is the project page for ColonNeRF.
DynVideo-E
This is the project page for DynVideo-E.
VideoLISA
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
assistq