thu-ml/controlvideo

Stars
181
Rank 212,110 (Top 5 %)
Language
Python
License
Apache License 2.0
Created over 1 year ago
Updated over 1 year ago

ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing

This is the official implementation for "ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing". The project page is available here. Code will be released soon.

Overview

ControlVideo incorporates three key components: visual conditions on all frames, which amplify the source video's guidance; key-frame attention, which aligns all frames with a selected one; and temporal attention modules, each followed by a zero (zero-initialized) convolutional layer, for temporal consistency and faithfulness. The three components and the corresponding fine-tuned parameters were chosen through a systematic empirical study. At inference, starting from the trained ControlVideo, we apply DDIM inversion and then generate the edited video from the target prompt via DDIM sampling.
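The inference procedure described above (DDIM inversion with the source prompt, then DDIM sampling with the target prompt) can be illustrated with a minimal, dependency-free sketch. It is not the repository's code: `toy_eps` is a hypothetical stand-in for the fine-tuned noise-prediction network, the latent is a short list of floats rather than a video tensor, and the linear beta schedule is an assumption chosen for illustration.

```python
import math


def make_alphas(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative alpha products for steps 0..num_steps (index 0 means no noise)."""
    alphas = [1.0]
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        alphas.append(alphas[-1] * (1.0 - beta))
    return alphas


def ddim_move(x, eps, a_from, a_to):
    """Deterministic DDIM update of a latent from noise level a_from to a_to."""
    out = []
    for xi, ei in zip(x, eps):
        # Predict the clean latent, then re-noise it to the new level.
        x0 = (xi - math.sqrt(1.0 - a_from) * ei) / math.sqrt(a_from)
        out.append(math.sqrt(a_to) * x0 + math.sqrt(1.0 - a_to) * ei)
    return out


def ddim_invert(x0, eps_fn, prompt, alphas):
    """Map a clean latent to a noisy one by running DDIM steps forward in time."""
    x = list(x0)
    for t in range(1, len(alphas)):
        eps = eps_fn(x, t, prompt)  # predictor is queried at the noisier timestep
        x = ddim_move(x, eps, alphas[t - 1], alphas[t])
    return x


def ddim_sample(x_T, eps_fn, prompt, alphas):
    """Map a noisy latent back to a clean one by running DDIM steps backward."""
    x = list(x_T)
    for t in range(len(alphas) - 1, 0, -1):
        eps = eps_fn(x, t, prompt)
        x = ddim_move(x, eps, alphas[t], alphas[t - 1])
    return x


def toy_eps(x, t, prompt):
    """Hypothetical noise predictor: depends only on the timestep and the prompt."""
    shift = 0.05 * len(prompt)
    return [math.sin(t) + shift for _ in x]


alphas = make_alphas(num_steps=50)
source_latent = [0.3, -1.2, 0.7]

# Invert with the source prompt, then sample with the target prompt.
noisy = ddim_invert(source_latent, toy_eps, "a car", alphas)
edited = ddim_sample(noisy, toy_eps, "a red car", alphas)
reconstructed = ddim_sample(noisy, toy_eps, "a car", alphas)
print("edited:", edited)
print("reconstructed:", reconstructed)
```

Because the toy predictor ignores the latent, inverting and re-sampling with the same prompt recovers the input up to floating-point error. With a real network, whose prediction depends on the latent, the reconstruction is only approximate.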

Environment

conda env create -f environment.yml

The environment is similar to that of Tune-A-Video.

Prepare Pretrained Text-to-Image Diffusion Model

Download Stable Diffusion 1.5 and the ControlNet 1.0 models for canny, HED, depth and pose, and put them in the repository root (./).

Quick Start

python main.py --control_type hed --video_path videos/car10.mp4 --source 'a car' --target 'a red car' --out_root outputs/ --max_step 300

control_type selects the type of control and must be one of canny, hed, depth or pose. video_path is the path to the input video. source is the prompt describing the source video. target is the prompt describing the edited video. max_step is the number of training steps. out_root is the directory where results are saved.
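As a rough illustration of the options described above, the following sketch declares them with Python's standard `argparse` module. It is an assumption about how such a command line could be parsed, not a copy of the repository's main.py; in particular, which options are required and the default values for out_root and max_step are guesses taken from the Quick Start command.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options documented in Quick Start."""
    parser = argparse.ArgumentParser(description="One-shot text-to-video editing")
    parser.add_argument("--control_type", required=True,
                        choices=["canny", "hed", "depth", "pose"],
                        help="type of visual condition")
    parser.add_argument("--video_path", required=True,
                        help="path to the input video")
    parser.add_argument("--source", required=True,
                        help="prompt describing the source video")
    parser.add_argument("--target", required=True,
                        help="prompt describing the edited video")
    # Defaults below are assumptions based on the example command.
    parser.add_argument("--out_root", default="outputs/",
                        help="directory where results are saved")
    parser.add_argument("--max_step", type=int, default=300,
                        help="number of training steps")
    return parser


args = build_parser().parse_args([
    "--control_type", "hed",
    "--video_path", "videos/car10.mp4",
    "--source", "a car",
    "--target", "a red car",
])
print(args)
```

Restricting control_type with `choices` makes an unsupported value fail at parse time instead of partway through training.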

Run More Demos

Download the data and put it in videos/.

python run_demos.py
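To run the Quick Start command over several videos of your own, one option is a small helper that assembles the command line for each entry. The sketch below is hypothetical and does not reproduce run_demos.py; `build_command` and the `demos` list are illustrative names, and the only entry shown is the example from Quick Start.

```python
import shlex


def build_command(video_path, source, target, control_type="hed",
                  out_root="outputs/", max_step=300):
    """Return the main.py invocation for one video as an argument list."""
    return [
        "python", "main.py",
        "--control_type", control_type,
        "--video_path", video_path,
        "--source", source,
        "--target", target,
        "--out_root", out_root,
        "--max_step", str(max_step),
    ]


# Add one dictionary per video to edit.
demos = [
    {"video_path": "videos/car10.mp4", "source": "a car", "target": "a red car"},
]

for demo in demos:
    # shlex.join quotes prompts that contain spaces so the line is shell-safe.
    print(shlex.join(build_command(**demo)))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to launch the jobs one after another.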

References

If you find this repository helpful, please cite as:

@article{zhao2023controlvideo,
  title={ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing},
  author={Zhao, Min and Wang, Rongzhen and Bao, Fan and Li, Chongxuan and Zhu, Jun},
  journal={arXiv preprint arXiv:2305.17098},
  year={2023}
}

This implementation is based on Tune-A-Video and Video-p2p.

tianshou

An elegant PyTorch deep reinforcement learning library.

zhusuan

A probabilistic programming library for Bayesian deep learning and generative models, based on TensorFlow

prolificdreamer

ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation (NeurIPS 2023 Spotlight)

unidiffuser

Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"

CRM

[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.

ares

A Python library for adversarial machine learning focusing on benchmarking adversarial robustness.

SageAttention

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

warplda

Cache efficient implementation for Latent Dirichlet Allocation

3D_Corruptions_AD

Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving, CVPR 2023

low-bit-optimizers

Low-bit optimizers for PyTorch

MMTrustEval

A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)

stochastic_gcn

Stochastic training of graph convolutional networks

RoboticsDiffusionTransformer

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation

Attack-Bard

DPM-Solver-v3

Official code for "DPM-Solver-v3: Improved Diffusion ODE Solver with Empirical Model Statistics" (NeurIPS 2023)

tianshou-docs-zh_CN

Tianshou documentation in Chinese

Prior-Guided-RGF

zh-clip

SRPO

Code accompanying the paper "Score Regularized Policy Optimization through Diffusion Behavior" (ICLR 2024).

vflow

Official code for "VFlow: More Expressive Generative Flows with Variational Data Augmentation" (ICML 2020)

AT3D

Towards Effective Adversarial Textured 3D Meshes on Physical Face Recognition, CVPR 2023, Highlight

implicit-normalizing-flows

Code for "Implicit Normalizing Flows" (ICLR 2021 spotlight)

HiDe-Prompt

Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality (NeurIPS 2023, Spotlight)

BigTopicModel

Big Topic Model is a fast engine for running large-scale Topic Models.

NUNO

[ICML 2023] Non-Uniform Neural Operator (NUNO)

IODF

fpovi

Code for "Function Space Particle Optimization for Bayesian Neural Networks"

CF-UIcA

Code for "Collaborative Filtering with User-Item Co-Autoregressive Models"

Zhusuan-Jittor

Zhusuan with backend Jittor

LM-Calibration

mmdcgm-ssl

mmDCGMs for accurate classification and excellent class-conditional generation in semi-supervised learning

Zhusuan-PaddlePaddle

Zhusuan with backend PaddlePaddle

ood-dgm

MEM_DGM

Code for "Learning to Generate with Memory"

ProbML-book-solution

adversarial_training_imagenet

pmd

Population matching discrepancy

CEURL

Official implementation for "PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning" (NeurIPS 2024)

VCAS

Official code for "Efficient Backpropagation with Variance Controlled Adaptive Sampling" (ICLR 2024)

imagenet-a-plus

wmvl

Code for "A Wasserstein Minimum Velocity Approach to Learning Unnormalized Models"

sEM-vr

Code for pLSA and LDA in the paper "Stochastic Expectation Maximization with Variance Reduction"

Efficient-Diffusion-Alignment

Official Codebase for "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control" (NeurIPS 2024)

Noise-Contrastive-Alignment

Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards"

i-DODE

Official code for "Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs" (ICML 2023)

CCA

Code accompanying the paper "Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment"

ACTNN-PaddlePaddle

DBIM

Official codebase for "Diffusion Bridge Implicit Models" (https://arxiv.org/abs/2405.15885).

Jetfire-INT8Training
