SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation
[Project Page] | [Paper] | [Extension Paper] | [Dataset]
Updates
- [09/2023] arXiv extension paper released.
- [07/2022] Pretrained models are uploaded.
- [07/2022] Project page and dataset are released.
- [07/2022] Code is released.
Introduction
This is the official implementation of Detecting and Recovering Sequential DeepFake Manipulation. We introduce a novel research problem: Detecting Sequential DeepFake Manipulation (Seq-DeepFake), which focuses on detecting the sequences of multi-step facial manipulations. To facilitate the study of Seq-DeepFake, we provide a large-scale Sequential DeepFake dataset and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer).
The framework of the proposed method:
Installation
Download
git clone https://github.com/rshao/SeqDeepFake.git
cd SeqDeepFake
Environment
We recommend using Anaconda to manage the Python environment:
conda create -n seqdeepfake python=3.6
conda activate seqdeepfake
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit==10.1.243
conda install pandas
conda install tqdm
conda install pillow
pip install tensorboard==2.4.1
Dataset Preparation
A brief introduction
We contribute the first large-scale Sequential DeepFake Dataset, Seq-Deepfake, including ~85k sequentially manipulated face images, each annotated with its ground-truth manipulation sequence.
The images are generated with the following two facial manipulation methods, yielding 28 and 26 types of manipulation sequences (including original), respectively. The lengths of the manipulation sequences range from 1 to 5.
- Sequential facial components manipulation (based on CelebAMask-HQ and StyleMapGAN)
- Sequential facial attributes manipulation (based on FFHQ and Talk-To-Edit)
Here are some sample images and statistics:
Annotations
Each image in the dataset is annotated with a list of length 5, indicating the ground-truth manipulation sequence. The labels in the sequence are defined as follows:
For Sequential facial components manipulation:
0: 'NA', 1: 'nose', 2: 'eye', 3: 'eyebrow', 4: 'lip', 5: 'hair'
Note: 'NA' means no manipulation is taken in this step.
For Sequential facial attributes manipulation:
0: 'NA', 1: 'Bangs', 2: 'Eyeglasses', 3: 'Beard', 4: 'Smiling', 5: 'Young'
Note: 'NA' means no manipulation is taken in this step.
Note that label `0` serves as the placeholder for manipulation sequences shorter than 5 steps. For example, the annotation for the manipulation sequence nose-eye-lip would be `[1, 2, 4, 0, 0]`. Original images are annotated with `[0, 0, 0, 0, 0]`.
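The annotation scheme above can be sketched in a few lines of Python. This is an illustration, not code from the official repository; the helper name `encode_sequence` and the dict name `COMPONENT_LABELS` are ours, but the label values follow the tables above.

```python
# Label map for sequential facial components manipulation, as defined above.
COMPONENT_LABELS = {'NA': 0, 'nose': 1, 'eye': 2, 'eyebrow': 3, 'lip': 4, 'hair': 5}

def encode_sequence(steps, label_map, max_len=5):
    """Map a list of manipulation names to a fixed-length label list,
    padding with the 'NA' label (0) up to max_len steps."""
    labels = [label_map[s] for s in steps]
    return labels + [label_map['NA']] * (max_len - len(labels))

print(encode_sequence(['nose', 'eye', 'lip'], COMPONENT_LABELS))  # [1, 2, 4, 0, 0]
print(encode_sequence([], COMPONENT_LABELS))                      # [0, 0, 0, 0, 0] (original image)
```

The same helper works for the attributes split by swapping in the attribute label map.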
Prepare data
You can download the Seq-Deepfake dataset through this link: [Dataset]
After unzipping all the sub-files, the structure of the dataset should be as follows:
```
./
├── facial_attributes
│   ├── annotations
│   │   ├── train.csv
│   │   ├── test.csv
│   │   └── val.csv
│   └── images
│       ├── train
│       │   ├── Bangs-Eyeglasses-Smiling-Young
│       │   │   ├── xxxxxx.jpg
│       │   │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   ├── ...
│       │   ├── Young-Smiling-Eyeglasses
│       │   │   ├── xxxxxx.jpg
│       │   │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   └── original
│       │       ├── xxxxxx.jpg
│       │       │   ...
│       │       └── xxxxxx.jpg
│       ├── test
│       │   % the same structure as in train
│       └── val
│           % the same structure as in train
└── facial_components
    ├── annotations
    │   ├── train.csv
    │   ├── test.csv
    │   └── val.csv
    └── images
        ├── train
        │   ├── eyebrow-eye-hair-nose-lip
        │   │   ├── xxxxxx.jpg
        │   │   │   ...
        │   │   └── xxxxxx.jpg
        │   ├── ...
        │   ├── nose-eyebrow-lip-eye-hair
        │   │   ├── xxxxxx.jpg
        │   │   │   ...
        │   │   └── xxxxxx.jpg
        │   └── original
        │       ├── xxxxxx.jpg
        │       │   ...
        │       └── xxxxxx.jpg
        ├── test
        │   % the same structure as in train
        └── val
            % the same structure as in train
```
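Given this layout, the annotation CSVs and image folders for a split can be located with simple path arithmetic. The helper names below are ours, and the CSV column schema is not documented here, so inspect the downloaded `train.csv` before relying on any particular columns.

```python
import os

def annotation_path(data_dir, dataset_name, split):
    """Path to a split's annotation CSV under the layout above,
    e.g. <data_dir>/facial_components/annotations/train.csv"""
    assert dataset_name in ('facial_components', 'facial_attributes')
    assert split in ('train', 'test', 'val')
    return os.path.join(data_dir, dataset_name, 'annotations', f'{split}.csv')

def image_dir(data_dir, dataset_name, split, sequence_name):
    """Folder holding the images for one manipulation sequence (or 'original')."""
    return os.path.join(data_dir, dataset_name, 'images', split, sequence_name)

print(annotation_path('/data/SeqDeepFake', 'facial_components', 'train'))
# /data/SeqDeepFake/facial_components/annotations/train.csv
```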
Training
Single-GPU
Modify `train.sh` and run:
sh train.sh
Please refer to the following instructions about some arguments:
Args | Description |
---|---|
CONFIG | Path of the network and optimization configuration file. |
DATA_DIR | Directory to the downloaded dataset. |
DATASET_NAME | Name of the selected manipulation type. Choose from 'facial_components' and 'facial_attributes'. |
RESULTS_DIR | Directory to save logs and checkpoints. |
You can change the network and optimization configurations by adding new configuration files under the directory `./configs/`.
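Putting the arguments in the table together, a `train.sh` might look like the sketch below. The script name `train.py`, the flag spellings, and the config filename are assumptions for illustration; check the repository's actual `train.sh` for the authoritative invocation.

```shell
#!/bin/bash
# Sketch of a train.sh configuration; variable names follow the argument
# table above, but the flag spellings are assumptions.
CONFIG=./configs/r50.json          # hypothetical config filename under ./configs/
DATA_DIR=/path/to/SeqDeepFake      # root of the downloaded dataset
DATASET_NAME=facial_components     # or facial_attributes
RESULTS_DIR=./results              # logs and checkpoints land here

python train.py \
    --cfg ${CONFIG} \
    --data_dir ${DATA_DIR} \
    --dataset_name ${DATASET_NAME} \
    --results_dir ${RESULTS_DIR}
```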
Multiple-GPUs (Slurm)
We also provide a Slurm script that supports multi-GPU training:
sh train_slurm.sh
where `PARTITION` and `NODE` should be modified according to your own environment. The number of GPUs to be used can be set through the `NUM_GPU` argument.
Testing
Modify `test.sh` and run:
sh test.sh
For the arguments in `test.sh`, please refer to the training instructions above, plus the following ones:
Args | Description |
---|---|
TEST_TYPE | The evaluation metrics to use. Choose from 'fixed' and 'adaptive'. |
LOG_NAME | Should be set according to the log_name of your trained checkpoint to be tested. |
We also provide a Slurm script for testing:
sh test_slurm.sh
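Analogously to training, a `test.sh` might be configured as below. The script name `test.py` and the flag spellings are assumptions; only the variable names come from the tables above, so verify against the repository's own `test.sh`.

```shell
#!/bin/bash
# Sketch of a test.sh configuration; variable names follow the argument
# tables above, but the flag spellings are assumptions.
CONFIG=./configs/r50.json          # hypothetical config filename
DATA_DIR=/path/to/SeqDeepFake      # root of the downloaded dataset
DATASET_NAME=facial_components     # or facial_attributes
RESULTS_DIR=./results
TEST_TYPE=fixed                    # 'fixed' or 'adaptive' evaluation metric
LOG_NAME=my_run                    # log_name of the trained checkpoint to test

python test.py \
    --cfg ${CONFIG} \
    --data_dir ${DATA_DIR} \
    --dataset_name ${DATASET_NAME} \
    --results_dir ${RESULTS_DIR} \
    --test_type ${TEST_TYPE} \
    --log_name ${LOG_NAME}
```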
Benchmark Results
Here we list the performance of three SOTA deepfake detection methods and our method. Please refer to our paper for more details.
Facial Components Manipulation
Method | Reference | Fixed-Acc | Adaptive-Acc |
---|---|---|---|
DRN | Wang et al. | 66.06 | 45.79 |
MA | Zhao et al. | 71.31 | 52.94 |
Two-Stream | Luo et al. | 71.92 | 53.89 |
SeqFakeFormer | Shao et al. | 72.65 | 55.30 |
Facial Attributes Manipulation
Method | Reference | Fixed-Acc | Adaptive-Acc |
---|---|---|---|
DRN | Wang et al. | 64.42 | 43.20 |
MA | Zhao et al. | 67.58 | 47.48 |
Two-Stream | Luo et al. | 66.77 | 46.38 |
SeqFakeFormer | Shao et al. | 68.86 | 49.63 |
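The two evaluation metrics can be sketched as below, under one plausible reading: Fixed-Acc averages per-step correctness over all 5 positions (padding included), while Adaptive-Acc averages only over the active steps, up to the longer of the predicted and ground-truth manipulation lengths. This is not the official evaluation code; see the paper for the exact definitions.

```python
# Illustrative sketch of the Fixed-Acc / Adaptive-Acc distinction for a
# single sample. NOT the official metric implementation.

def seq_len(seq):
    """Number of non-NA (non-zero) steps in a length-5 annotation."""
    return sum(1 for s in seq if s != 0)

def fixed_acc(pred, gt):
    """Per-step accuracy over all 5 positions, padding included."""
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)

def adaptive_acc(pred, gt):
    """Per-step accuracy over the longer of the two active lengths."""
    n = max(seq_len(pred), seq_len(gt))
    if n == 0:  # both original: count as fully correct
        return 1.0
    return sum(p == g for p, g in zip(pred[:n], gt[:n])) / n

print(fixed_acc([1, 2, 4, 0, 0], [1, 2, 0, 0, 0]))     # 0.8
print(adaptive_acc([1, 2, 4, 0, 0], [1, 2, 0, 0, 0]))  # 0.666...
```

The example shows why Adaptive-Acc is the stricter metric: a spurious extra step is diluted by correct padding under Fixed-Acc but penalized in full under Adaptive-Acc.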
Pretrained Models
We also provide the pretrained models that produce our results in the benchmark tables above:
Model | Description |
---|---|
pretrained-r50-c | Trained on facial_components with resnet50 backbone. |
pretrained-r50-a | Trained on facial_attributes with resnet50 backbone. |
In order to try the pretrained checkpoints, please:

1. Download the checkpoints from the links in the table, unzip the files, and put them under the `./results` folder with the following structure:

```
results
└── resnet50
    ├── facial_attributes
    │   └── pretrained-r50-a
    │       └── snapshots
    │           ├── best_model_adaptive.pt
    │           └── best_model_fixed.pt
    └── facial_components
        ├── pretrained-r50-c
        │   └── snapshots
        │       ├── best_model_adaptive.pt
        │       └── best_model_fixed.pt
```

2. In `test.sh`, modify `DATA_DIR` to the root of your Seq-DeepFake dataset. Set `LOG_NAME` and `DATASET_NAME` to `'pretrained-r50-c'` and `'facial_components'`, or `'pretrained-r50-a'` and `'facial_attributes'`, respectively.

3. Run `sh test.sh`.
Citation
If you find this work useful for your research, please kindly cite our paper:
@inproceedings{shao2022seqdeepfake,
title={Detecting and Recovering Sequential DeepFake Manipulation},
author={Shao, Rui and Wu, Tianxing and Liu, Ziwei},
booktitle={European Conference on Computer Vision (ECCV)},
year={2022}
}