dvlab-research/Video-P2P

Video-P2P: Video Editing with Cross-attention Control

The official implementation of Video-P2P.

Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

Changelog

2023.03.20 Released the demo.
2023.03.19 Released the code.
2023.03.09 Paper preprint released on arXiv.

Todo

Setup

pip install -r requirements.txt

The code has been tested on a Tesla V100 (32GB) and an RTX 3090 (24GB). At least 20GB of VRAM is required.
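To check the 20GB VRAM requirement before installing everything else, the minimal sketch below queries each visible GPU through PyTorch's `torch.cuda.get_device_properties`. It is not part of the repository, and the helper names are hypothetical. It also treats "20GB" as 20 GiB (1024^3 bytes), which is a simplifying assumption.

```python
def meets_vram_requirement(total_bytes: int, required_gib: float = 20.0) -> bool:
    """Return True if a device with `total_bytes` of memory has at least `required_gib` GiB."""
    return total_bytes / 1024**3 >= required_gib


def check_local_gpus(required_gib: float = 20.0) -> None:
    """Print each visible CUDA device and whether it meets the VRAM threshold."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        print("PyTorch is not installed; cannot query GPUs.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device found.")
        return
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        ok = meets_vram_requirement(props.total_memory, required_gib)
        status = "OK" if ok else "below requirement"
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB, {status}")


if __name__ == "__main__":
    check_local_gpus()
```

Both tested cards pass this check: a 24GB RTX 3090 reports roughly 23.7 GiB, which is still above the threshold.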

The environment is similar to Tune-A-Video and prompt-to-prompt.

On an RTX 3090, xformers may run into a known issue.

Quickstart

Set pretrained_model_path in the config files to the path of your Stable Diffusion model.

To download the pre-trained model, please refer to diffusers.
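If you prefer to edit the configs programmatically, the sketch below rewrites the path in a config file. It assumes that each YAML config stores the model location on a single top-level line of the form `pretrained_model_path: "<path>"`. Both helper names are hypothetical, and the sketch uses a plain regex instead of a YAML parser so that it needs only the standard library.

```python
import re
from pathlib import Path

# Assumption: a top-level line such as `pretrained_model_path: "./checkpoints/..."`.
_KEY_RE = re.compile(r"^(pretrained_model_path:\s*).*$", re.MULTILINE)


def set_pretrained_model_path(config_text: str, model_path: str) -> str:
    """Return `config_text` with the pretrained_model_path value replaced by `model_path`."""
    # A callable replacement avoids backslash escapes in Windows paths being interpreted.
    new_text, count = _KEY_RE.subn(lambda m: f'{m.group(1)}"{model_path}"', config_text)
    if count == 0:
        raise KeyError("pretrained_model_path not found in config")
    return new_text


def update_config_file(config_file: str, model_path: str) -> None:
    """Rewrite one config file in place."""
    path = Path(config_file)
    path.write_text(set_pretrained_model_path(path.read_text(), model_path))
```

For example, `update_config_file("configs/rabbit-jump-tune.yaml", "/path/to/stable-diffusion")` updates the tuning config. The matching `-p2p.yaml` config probably needs the same change.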

# Stage 1: Tune the model for initialization.

# Reduce the number of tuning epochs to speed this up.
python run_tuning.py --config="configs/rabbit-jump-tune.yaml"

# Stage 2: Attention Control

# Fast mode (about 1 min on a V100):
python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml" --fast

# Official mode (about 10 min on a V100; more stable):
python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml"

Results are saved in Video-P2P/outputs/xxx/results.
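The two stages above can be chained from Python. The minimal sketch below assumes that every example follows the `configs/<name>-tune.yaml` / `configs/<name>-p2p.yaml` naming used by the rabbit-jump commands, and that it runs from the repository root. `build_commands` and `run_pipeline` are hypothetical helpers, not part of Video-P2P.

```python
import subprocess
import sys
from typing import List


def build_commands(example: str, fast: bool = False) -> List[List[str]]:
    """Return the Stage 1 (tuning) and Stage 2 (attention control) command lines."""
    tune = [sys.executable, "run_tuning.py", f"--config=configs/{example}-tune.yaml"]
    p2p = [sys.executable, "run_videop2p.py", f"--config=configs/{example}-p2p.yaml"]
    if fast:
        p2p.append("--fast")  # the faster mode described above
    return [tune, p2p]


def run_pipeline(example: str, fast: bool = False) -> None:
    """Run both stages in order, stopping if either one fails."""
    for cmd in build_commands(example, fast):
        # check=True raises CalledProcessError, so Stage 2 never runs on a failed tuning.
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("rabbit-jump", fast=True)` is equivalent to the Stage 1 command followed by the fast Stage 2 command. Using `sys.executable` instead of a bare `python` ensures that the same interpreter (and environment) runs both stages.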

Dataset

We release our dataset here.

Download it into ./data and explore your creativity!

Results

configs/rabbit-jump-p2p.yaml
configs/penguin-run-p2p.yaml
configs/man-motor-p2p.yaml
configs/car-drive-p2p.yaml
configs/tiger-forest-p2p.yaml
configs/bird-forest-p2p.yaml

Gradio demo

Run the following command to launch the local Gradio demo:

python app_gradio.py

An online demo is also available on Hugging Face. The demo code borrows heavily from Tune-A-Video.

Citation

@article{liu2023videop2p,
      author={Liu, Shaoteng and Zhang, Yuechen and Li, Wenbo and Lin, Zhe and Jia, Jiaya},
      title={Video-P2P: Video Editing with Cross-attention Control},
      journal={arXiv:2303.04761},
      year={2023},
}

References

prompt-to-prompt: https://github.com/google/prompt-to-prompt
Tune-A-Video: https://github.com/showlab/Tune-A-Video
diffusers: https://github.com/huggingface/diffusers