GFPGAN
GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.
PhotoMaker
PhotoMaker [CVPR 2024]
T2I-Adapter
T2I-Adapter
InstantMesh
InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models
BrushNet
[ECCV 2024] The official implementation of the paper "BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion"
MotionCtrl
Official Code for MotionCtrl [SIGGRAPH 2024]
MasaCtrl
[ICCV 2023] Consistent Image Synthesis and Editing
SEED-Story
SEED-Story: Multimodal Long Story Generation with Large Language Model
LLaMA-Pro
[ACL 2024] Progressive LLaMA with Block Expansion.
Mix-of-Show
NeurIPS 2023, Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models
Open-MAGVIT2
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
AnimeSR
Code for "AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos"
VQFR
ECCV 2022, Oral, VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder
CustomNet
UMT
UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.
MM-RealSR
Code for "Metric Learning based Interactive Modulation for Real-World Super-Resolution"
ViT-Lens
[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
MCQ
Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).
DeSRA
Official code for DeSRA (ICML 2023)
FAIG
NeurIPS 2021, Spotlight, Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution
ArcNerf
NeRF and its extensions, all in one
ST-LLM
[ECCV 2024] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
SurfelNeRF
SurfelNeRF: Neural Surfel Radiance Fields for Online Photorealistic Reconstruction of Indoor Scenes
RepSR
Code for "RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization"
mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
HOSNeRF
HOSNeRF: Dynamic Human-Object-Scene Neural Radiance Fields from a Single Video
FastRealVSR
Code for "Mitigating Artifacts in Real-World Video Super-Resolution Models"
ConMIM
Official code for ConMIM (ICLR 2023)
GVT
Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".
TVTS
Turning to Video for Transcript Sorting
BEBR
Official code for "Binary embedding based retrieval at Tencent"
ViSFT
pi-Tuning
Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.
FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Efficient-VSR-Training
Code for "Accelerating the Training of Video Super-Resolution"
DTN
Official code for "Dynamic Token Normalization Improves Vision Transformer", ICLR 2022.
OpenCompatible
OpenCompatible provides a standard compatible training benchmark, covering practical training scenarios.
BTS
BTS: A Bi-lingual Benchmark for Text Segmentation in the Wild
SGAT4PASS
This is the official implementation of the paper SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (IJCAI 2023)
SFDA
TaCA
Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter".
Plot2Code
common_trainer
Common template for PyTorch projects. Easy to extend and modify for new projects.
TransFusion
The code repo for the ACM MM paper: TransFusion: Multi-Modal Fusion for Video Tag Inference via Translation-based Knowledge Embedding.
BasicVQ-GEN
ArcVis
Interactive visualization of 3D and 2D components.
VTLayout