[ECCV 2022] PyTorch code for SeqDeepFake: Detecting and Recovering Sequential DeepFake Manipulation

Updates

  • [09/2023] arXiv extension paper released.
  • [07/2022] Pretrained models are uploaded.
  • [07/2022] Project page and dataset are released.
  • [07/2022] Code is released.

Introduction

This is the official implementation of Detecting and Recovering Sequential DeepFake Manipulation. We introduce a novel research problem: Detecting Sequential DeepFake Manipulation (Seq-DeepFake), which focuses on detecting sequences of multi-step facial manipulations. To facilitate the study of Seq-DeepFake, we provide a large-scale Sequential DeepFake dataset and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer).

The framework of the proposed method:

Installation

Download

git clone https://github.com/rshao/SeqDeepFake.git
cd SeqDeepFake

Environment

We recommend using Anaconda to manage the Python environment:

conda create -n seqdeepfake python=3.6
conda activate seqdeepfake
conda install -c pytorch pytorch=1.6.0 torchvision=0.7.0 cudatoolkit==10.1.243
conda install pandas
conda install tqdm
conda install pillow
pip install tensorboard==2.4.1
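
After setup, a quick sanity check of the environment can save debugging time later. The expected version strings below simply reflect the pins above; adjust if you installed different builds:

# Verify that the pinned PyTorch stack is importable and CUDA-visible.
import torch
import torchvision

print(torch.__version__)          # expected: 1.6.0
print(torchvision.__version__)    # expected: 0.7.0
print(torch.cuda.is_available())  # True if cudatoolkit 10.1 matches your driver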

Dataset Preparation

A brief introduction

We contribute the first large-scale Sequential DeepFake dataset, Seq-DeepFake, consisting of ~85k sequentially manipulated face images, each annotated with its ground-truth manipulation sequence.

The images are generated with two different facial manipulation methods, yielding 28 and 26 types of manipulation sequences (including the original), respectively. The lengths of the manipulation sequences range from 1 to 5.

Here are some sample images and statistics:

Annotations

Each image in the dataset is annotated with a list of length 5, indicating the ground-truth manipulation sequence. The labels in the sequence are defined as follows:

For sequential facial components manipulation:

0: 'NA', 1: 'nose', 2: 'eye', 3: 'eyebrow', 4: 'lip', 5: 'hair'

For sequential facial attributes manipulation:

0: 'NA', 1: 'Bangs', 2: 'Eyeglasses', 3: 'Beard', 4: 'Smiling', 5: 'Young'

Note: 'NA' means no manipulation is applied at that step, so label 0 serves as the placeholder for manipulation sequences shorter than 5 steps. For example, the annotation for the manipulation sequence nose-eye-lip is [1, 2, 4, 0, 0]. Original images are annotated with [0, 0, 0, 0, 0].
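
As a concrete illustration, here is a minimal decoder for these annotations. The label map is taken from the lists above; the helper name is ours, not part of the repository's API:

# Turn a length-5 annotation into a human-readable manipulation sequence.
COMPONENT_LABELS = {0: 'NA', 1: 'nose', 2: 'eye', 3: 'eyebrow', 4: 'lip', 5: 'hair'}

def decode_sequence(annotation, labels=COMPONENT_LABELS):
    # Drop the label-0 placeholders, then map each step to its name.
    steps = [labels[a] for a in annotation if a != 0]
    return '-'.join(steps) if steps else 'original'

print(decode_sequence([1, 2, 4, 0, 0]))  # nose-eye-lip
print(decode_sequence([0, 0, 0, 0, 0]))  # original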

Prepare data

You can download the Seq-Deepfake dataset through this link: [Dataset]

After unzipping all sub-files, the structure of the dataset should be as follows:

./
├── facial_attributes
│   ├── annotations
│   │   ├── train.csv
│   │   ├── test.csv
│   │   └── val.csv
│   └── images
│       ├── train
│       │   ├── Bangs-Eyeglasses-Smiling-Young
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   ...
│       │   ├── Young-Smiling-Eyeglasses
│       │   │   ├── xxxxxx.jpg
│       │   │   ...
│       │   │   └── xxxxxx.jpg
│       │   └── original
│       │       ├── xxxxxx.jpg
│       │       ...
│       │       └── xxxxxx.jpg
│       ├── test
│       │   % the same structure as in train
│       └── val
│           % the same structure as in train
└── facial_components
    ├── annotations
    │   ├── train.csv
    │   ├── test.csv
    │   └── val.csv
    └── images
        ├── train
        │   ├── eyebrow-eye-hair-nose-lip
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   ...
        │   ├── nose-eyebrow-lip-eye-hair
        │   │   ├── xxxxxx.jpg
        │   │   ...
        │   │   └── xxxxxx.jpg
        │   └── original
        │       ├── xxxxxx.jpg
        │       ...
        │       └── xxxxxx.jpg
        ├── test
        │   % the same structure as in train
        └── val
            % the same structure as in train
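
Before training, you can optionally verify that the unzipped data matches this layout. DATA_ROOT below is a placeholder for wherever you unzipped the dataset:

# Check that every split has its annotation CSV and image directory.
import os

DATA_ROOT = './SeqDeepFake-data'  # placeholder: set to your dataset root
for task in ('facial_attributes', 'facial_components'):
    for split in ('train', 'test', 'val'):
        csv_path = os.path.join(DATA_ROOT, task, 'annotations', split + '.csv')
        img_dir = os.path.join(DATA_ROOT, task, 'images', split)
        assert os.path.isfile(csv_path), 'missing ' + csv_path
        assert os.path.isdir(img_dir), 'missing ' + img_dir
print('dataset layout looks correct')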

Training

Single-GPU

Modify train.sh and run:

sh train.sh

Please refer to the following descriptions of the arguments:

Args          Description
CONFIG        Path to the network and optimization configuration file.
DATA_DIR      Root directory of the downloaded dataset.
DATASET_NAME  Manipulation type to train on; one of 'facial_components' or 'facial_attributes'.
RESULTS_DIR   Directory in which logs and checkpoints are saved.

You can change the network and optimization configurations by adding new configuration files under the directory ./configs/.

Multiple-GPUs (Slurm)

We also provide a Slurm script that supports multi-GPU training:

sh train_slurm.sh

where PARTITION and NODE should be modified according to your own environment. The number of GPUs to be used can be set through the NUM_GPU argument.

Testing

Modify test.sh and run:

sh test.sh

For the arguments in test.sh, refer to the training instructions above, plus the following:

Args       Description
TEST_TYPE  Evaluation metric to use; one of 'fixed' or 'adaptive'.
LOG_NAME   Must match the log_name of the trained checkpoint you want to test.
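
For intuition, here is a rough sketch of the difference between the two metrics, under our reading of the paper; the repository's metric code is authoritative. 'fixed' scores all five positions of the padded sequence, while 'adaptive' scores only the positions covered by an actual manipulation step:

# Illustrative only; not the repository's evaluation code.
def fixed_acc(pred, gt):
    # Per-position accuracy over the full length-5 padded sequence.
    return sum(p == g for p, g in zip(pred, gt)) / len(gt)

def adaptive_acc(pred, gt):
    # Per-position accuracy restricted to non-placeholder positions.
    length = max(sum(1 for g in gt if g != 0),
                 sum(1 for p in pred if p != 0), 1)
    return sum(p == g for p, g in zip(pred[:length], gt[:length])) / length

print(fixed_acc([1, 2, 0, 0, 0], [1, 2, 4, 0, 0]))     # 0.8
print(adaptive_acc([1, 2, 0, 0, 0], [1, 2, 4, 0, 0]))  # 0.666...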

We also provide slurm script for testing:

sh test_slurm.sh

Benchmark Results

Here we list the performance of three state-of-the-art deepfake detection methods alongside our method. Please refer to our paper for more details.

Facial Components Manipulation

Method         Reference    Fixed-Acc ↑   Adaptive-Acc ↑
DRN            Wang et al.  66.06         45.79
MA             Zhao et al.  71.31         52.94
Two-Stream     Luo et al.   71.92         53.89
SeqFakeFormer  Shao et al.  72.65         55.30

Facial Attributes Manipulation

Method         Reference    Fixed-Acc ↑   Adaptive-Acc ↑
DRN            Wang et al.  64.42         43.20
MA             Zhao et al.  67.58         47.48
Two-Stream     Luo et al.   66.77         46.38
SeqFakeFormer  Shao et al.  68.86         49.63

Pretrained Models

We also provide the pretrained models that reproduce our results in the benchmark tables:

Model             Description
pretrained-r50-c  Trained on facial_components with a ResNet-50 backbone.
pretrained-r50-a  Trained on facial_attributes with a ResNet-50 backbone.

To use the pretrained checkpoints:

  1. Download the checkpoints from the links in the table, unzip them, and place them under the ./results folder with the following structure:

    results
    └── resnet50
        ├── facial_attributes
        │   └── pretrained-r50-a
        │       └── snapshots
        │           ├── best_model_adaptive.pt
        │           └── best_model_fixed.pt
        └── facial_components
            └── pretrained-r50-c
                └── snapshots
                    ├── best_model_adaptive.pt
                    └── best_model_fixed.pt
    
  2. In test.sh, set DATA_DIR to the root of your Seq-DeepFake dataset, and set LOG_NAME and DATASET_NAME to 'pretrained-r50-c' and 'facial_components', or 'pretrained-r50-a' and 'facial_attributes', respectively.

  3. Run test.sh.
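
If you want to sanity-check a downloaded checkpoint before running the full test script, you can inspect it directly. The internal layout of the .pt files is an assumption here; adjust to what you actually find:

# Load a checkpoint on CPU and peek at its top-level structure.
import torch

ckpt = torch.load(
    './results/resnet50/facial_components/pretrained-r50-c/snapshots/best_model_fixed.pt',
    map_location='cpu',
)
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. model weights, optimizer state, epoch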

Citation

If you find this work useful for your research, please cite our paper:

@inproceedings{shao2022seqdeepfake,
  title={Detecting and Recovering Sequential DeepFake Manipulation},
  author={Shao, Rui and Wu, Tianxing and Liu, Ziwei},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022}
}