• Stars
    star
    707
  • Rank 63,841 (Top 2 %)
  • Language
    Python
  • License
    MIT License
  • Created almost 2 years ago
  • Updated about 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

RAD-NeRF: Real-time Neural Talking Portrait Synthesis

This repository contains a PyTorch re-implementation of the paper: Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition.

Colab notebook demonstration: Open In Colab

Project Page | Arxiv | Data

A GUI for easy visualization:

obama.mp4

Install

Tested on Ubuntu 22.04, Pytorch 1.12 and CUDA 11.6.

git clone https://github.com/ashawkey/RAD-NeRF.git
cd RAD-NeRF

Install dependency

# for ubuntu, portaudio is needed for pyaudio to work.
sudo apt install portaudio19-dev

pip install -r requirements.txt

Build extension (optional)

By default, we use load to build the extension at runtime. However, this may be inconvenient sometimes. Therefore, we also provide the setup.py to build each extension:

# install all extension modules
bash scripts/install_ext.sh

Data pre-processing

Preparation:

## install pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git"

## prepare face-parsing model
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_parsing/79999_iter.pth?raw=true -O data_utils/face_parsing/79999_iter.pth

## prepare basel face model
# 1. download `01_MorphableModel.mat` from https://faces.dmi.unibas.ch/bfm/main.php?nav=1-2&id=downloads and put it under `data_utils/face_tracking/3DMM/`
# 2. download other necessary files from AD-NeRF's repository:
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/exp_info.npy?raw=true -O data_utils/face_tracking/3DMM/exp_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/keys_info.npy?raw=true -O data_utils/face_tracking/3DMM/keys_info.npy
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/sub_mesh.obj?raw=true -O data_utils/face_tracking/3DMM/sub_mesh.obj
wget https://github.com/YudongGuo/AD-NeRF/blob/master/data_util/face_tracking/3DMM/topology_info.npy?raw=true -O data_utils/face_tracking/3DMM/topology_info.npy
# 3. run convert_BFM.py
cd data_utils/face_tracking
python convert_BFM.py
cd ../..

## prepare ASR model
# if you want to use DeepSpeech as AD-NeRF, you should install tensorflow 1.15 manually.
# else, we also support Wav2Vec in PyTorch.

Pre-processing Custom Training Video

  • Put training video under data/<ID>/<ID>.mp4.

    The video must be 25FPS, with all frames containing the talking person. The resolution should be about 512x512, and duration about 1-5min.

    # an example training video from AD-NeRF
    mkdir -p data/obama
    wget https://github.com/YudongGuo/AD-NeRF/blob/master/dataset/vids/Obama.mp4?raw=true -O data/obama/obama.mp4
  • Run script (may take hours dependending on the video length)

    # run all steps
    python data_utils/process.py data/<ID>/<ID>.mp4
    
    # if you want to run a specific step 
    python data_utils/process.py data/<ID>/<ID>.mp4 --task 1 # extract audio wave
  • File structure after finishing all steps:

    ./data/<ID>
    β”œβ”€β”€<ID>.mp4 # original video
    β”œβ”€β”€ori_imgs # original images from video
    β”‚  β”œβ”€β”€0.jpg
    β”‚  β”œβ”€β”€0.lms # 2D landmarks
    β”‚  β”œβ”€β”€...
    β”œβ”€β”€gt_imgs # ground truth images (static background)
    β”‚  β”œβ”€β”€0.jpg
    β”‚  β”œβ”€β”€...
    β”œβ”€β”€parsing # semantic segmentation
    β”‚  β”œβ”€β”€0.png
    β”‚  β”œβ”€β”€...
    β”œβ”€β”€torso_imgs # inpainted torso images
    β”‚  β”œβ”€β”€0.png
    β”‚  β”œβ”€β”€...
    β”œβ”€β”€aud.wav # original audio 
    β”œβ”€β”€aud_eo.npy # audio features (wav2vec)
    β”œβ”€β”€aud.npy # audio features (deepspeech)
    β”œβ”€β”€bc.jpg # default background
    β”œβ”€β”€track_params.pt # raw head tracking results
    β”œβ”€β”€transforms_train.json # head poses (train split)
    β”œβ”€β”€transforms_val.json # head poses (test split)

Usage

Quick Start

We provide some pretrained models here for quick testing on arbitrary audio.

  • Download a pretrained model. For example, we download obama_eo.pth to ./pretrained/obama_eo.pth

  • Download a pose sequence file. For example, we download obama.json to ./data/obama.json

  • Prepare your audio as <name>.wav, and extract audio features.

    # if model is `<ID>_eo.pth`, it uses wav2vec features
    python nerf/asr.py --wav data/<name>.wav --save_feats # save to data/<name>_eo.npy
    
    # if model is `<ID>.pth`, it uses deepspeech features 
    python data_utils/deepspeech_features/extract_ds_features.py --input data/<name>.wav # save to data/<name>.npy

    You can download pre-processed audio features too. For example, we download intro_eo.npy to ./data/intro_eo.npy.

  • Run inference: It takes about 2GB GPU memory to run inference at 40FPS (measured on a V100).

    # save video to trail_obama/results/*.mp4
    # if model is `<ID>.pth`, should append `--asr_model deepspeech` and use `--aud intro.npy` instead.
    python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso
    
    # provide a background image (default is white)
    python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso --bg_img data/bg.jpg
    
    # test with GUI
    python test.py --pose data/obama.json --ckpt pretrained/obama_eo.pth --aud data/intro_eo.npy --workspace trial_obama/ -O --torso --bg_img data/bg.jpg --gui

Detailed Usage

First time running will take some time to compile the CUDA extensions.

# train (head)
# by default, we load data from disk on the fly.
# we can also preload all data to CPU/GPU for faster training, but this is very memory-hungry for large datasets.
# `--preload 0`: load from disk (default, slower).
# `--preload 1`: load to CPU, requires ~70G CPU memory (slightly slower)
# `--preload 2`: load to GPU, requires ~24G GPU memory (fast)
python main.py data/obama/ --workspace trial_obama/ -O --iters 200000

# train (finetune lips for another 50000 steps, run after the above command!)
python main.py data/obama/ --workspace trial_obama/ -O --iters 250000 --finetune_lips

# train (torso)
# <head>.pth should be the latest checkpoint in trial_obama
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --head_ckpt <head>.pth --iters 200000

# test on the test split
python main.py data/obama/ --workspace trial_obama/ -O --test # use head checkpoint, will load GT torso
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test

# test with GUI
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui

# test with GUI (load speech recognition model for real-time application)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --gui --asr

# test with specific audio & pose sequence
# --test_train: use train split for testing
# --data_range: use this range's pose & eye sequence (if shorter than audio, automatically mirror and repeat)
python main.py data/obama/ --workspace trial_obama_torso/ -O --torso --test --test_train --data_range 0 100 --aud data/intro_eo.npy

check the scripts directory for more provided examples.

Acknowledgement

  • The data pre-processing part is adapted from AD-NeRF.
  • The NeRF framework is based on torch-ngp.
  • The GUI is developed with DearPyGui.

Citation

@article{tang2022radnerf,
  title={Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition},
  author={Tang, Jiaxiang and Wang, Kaisiyuan and Zhou, Hang and Chen, Xiaokang and He, Dongliang and Hu, Tianshu and Liu, Jingtuo and Zeng, Gang and Wang, Jingdong},
  journal={arXiv preprint arXiv:2211.12368},
  year={2022}
}

More Repositories

1

stable-dreamfusion

Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.
Python
7,270
star
2

torch-ngp

A pytorch CUDA extension implementation of instant-ngp (sdf and nerf), with a GUI.
Python
1,863
star
3

nerf2mesh

[ICCV2023] Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
Python
743
star
4

Drag3D

DragGAN meets GET3D for interactive mesh generation and editing.
Python
452
star
5

diff-gaussian-rasterization

Cuda
308
star
6

Segment-Anything-NeRF

Segment-anything interactively in NeRF.
Python
277
star
7

chatgpt_please_improve_my_paper_writing

a thin wrapper of chatgpt for improving paper writing.
Python
251
star
8

torch-merf

An unofficial pytorch implementation of MeRF
Python
137
star
9

dreamfields-torch

A pytorch implementation of dreamfields with modifications.
Python
134
star
10

fantasia3d.unofficial

An unofficial reproduction of Fantasia3D
Python
127
star
11

CCNeRF

[NeurIPS 2022] Compressible-composable NeRF via Rank-residual Decomposition.
Python
125
star
12

nerf_template

a simple template for practicing NeRF.
Python
125
star
13

cubvh

CUDA Mesh BVH tools.
Cuda
121
star
14

jiif

[ACM MM 2021] Joint Implicit Image Function for Guided Depth Super-Resolution
Python
90
star
15

raytracing

A CUDA Mesh RayTracer with BVH acceleration, with python bindings and a GUI.
Cuda
83
star
16

volumentations

3D volume data augmentation package inspired by albumentations
Python
78
star
17

kiuikit

A maintained, reusable and trustworthy toolkit for computer vision tasks.
Python
42
star
18

envlight

Environment light tools.
Python
38
star
19

FocalLoss.pytorch

Implementation of focal loss in pytorch for unbalanced classification.
Python
35
star
20

dimr

[ECCV 2022] Disentangled Instance Mesh Reconstruction
Python
27
star
21

NotVeryFastNeRF

an unofficial and partial implementation of FastNeRF
Jupyter Notebook
25
star
22

note

notebook archive
PowerShell
19
star
23

3d_human_poser

a naive 3d human pose editor GUI.
Python
16
star
24

vscode-mesh-viewer

A 3D mesh viewer for vscode
JavaScript
16
star
25

CCA

CCA, DCCA, DCCAE, ConvCCA
Python
14
star
26

grid_put

An operation trying to do the opposite of F.grid_sample
Python
13
star
27

index_grid_sample

Extension to `F.grid_sample` that allows using batch index per grid point.
Cuda
12
star
28

made-in-heaven-timer

create timer videos at any speed.
Python
11
star
29

q10r

A simple web questionnaire application.
Python
6
star
30

ddddsr

A python library for end-to-end image super resolution.
Python
5
star
31

lightnet

light weight convolutional neural network implementation in one c++ file.
C++
5
star
32

bsp_cvae

Python
4
star
33

learn_matmul

Cuda
3
star
34

trojan-privoxy-client

for unraid proxy.
Dockerfile
2
star
35

numpytorch

Monkey-patched numpy with pytorch syntax
Python
2
star
36

point_seg_dist

a CUDA implementation of points to lines/segments distance
C
2
star
37

pytorch_ddp_examples

Python
1
star
38

uuunet

Python
1
star
39

fbxloader

FBX file loader for python (only supports geometry currently)
Python
1
star
40

unraid_tutorial

2021εΉ΄ηš„unraid搭建教程
1
star
41

CapsNet.pytorch

reimplementation of capsule network for MNIST classification.
Python
1
star
42

Uncertainty

program to calculate uncertainty for Physics experiment.
Python
1
star
43

nonsense

NoNSeNSe frontend.
JavaScript
1
star
44

JLGCN

Joing learning of graphs and features
Python
1
star
45

live-speech-recognition

A simple sliding window based real-time speech recognition example.
Python
1
star
46

dullPLYviewer

HTML
1
star
47

MaxClique

Heuristic algorithms to solve the max clique problem.
C++
1
star
48

hawtorch

pytorch extensions for code reuse
Python
1
star