2nd place solution for Kaggle Deepfake Detection Challenge

This repository contains the code of Team \WM/ for reproducing our solution to the Deepfake Detection Challenge (DFDC).

Please refer to Model_Summary.pdf for a descriptive summary of our method.

Members (alphabetical order):

Content

  • make_dataset.sh and make_dataset.py: Scripts to extract aligned faces from videos, used for dataset processing.
  • train-wsdan.py: Script to train WS-DAN models.
  • train-xception.py: Script to train the Xception model.
  • best-submission.py: Our best submission on Kaggle.

Environment

We trained our models on our lab's Linux cluster. The environment listed below reflects a typical software/hardware configuration on this cluster.

Hardware:

  • CPU: Xeon Gold 5120
  • GPU: 2080Ti or 1080Ti
  • Mem: > 64GB
  • Data is stored on SSDs.

Software:

  • System: Ubuntu 16.04.6 with Linux 4.4.0 kernel.
  • Python: 3.6 or 3.7 distributed by Anaconda.
  • CUDA: 10.0
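If you want to verify that your setup matches before training, a quick check like the following may help (check_env.py is a hypothetical helper, not part of this repository):

# check_env.py (hypothetical): print the Python / PyTorch / CUDA setup.
import sys
import torch

print("Python:", sys.version.split()[0])          # expect 3.6 or 3.7
print("PyTorch:", torch.__version__)
print("CUDA version:", torch.version.cuda)        # expect 10.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. a 2080Ti or 1080Ti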

Reproduce Guide

Code & Data Dependency

  • For dependent python packages, please refer to requirements.txt.
  • Other external code dependencies are provided as git submodules in the external/ folder.
    • Run git submodule init && git submodule update to fetch these dependencies.
  • Other data dependencies can be downloaded from Google Drive (a quick way to sanity-check the downloads is sketched at the end of this section):
    • RetinaFace-Resnet50-fixed.pth: Pretrained RetinaFace model.
    • ckpt_x.pth: Pretrained weight files for WS-DAN w/ Xception.
    • ckpt_e.pth: Pretrained weight files for WS-DAN w/ EfficientNet-b3.
    • xception-hg-2.pth: Pretrained Xception weight files.
  • External data used by the code that was not generated by us:
    • Pretrained RetinaFace model [1].
    • EfficientNet pretrained on ImageNet [3].
    • Both are publicly accessible and were posted in the External Data Disclosure Thread by multiple other users, in accordance with the competition rules.
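After downloading, you can inspect a checkpoint's contents with plain PyTorch. This is a minimal sketch (inspect_ckpt.py is a hypothetical name; it assumes the files are standard torch.save checkpoints):

# inspect_ckpt.py (hypothetical): list the first few keys of a checkpoint.
import sys
import torch

state = torch.load(sys.argv[1], map_location="cpu")  # e.g. ckpt_x.pth
if isinstance(state, dict):
    for key in list(state)[:10]:  # a state_dict or a wrapper dict
        print(key)
else:
    print(type(state))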

Dataset Processing

The script make_dataset.py extracts aligned faces from a video file and saves them as images. It works like this:

$ mkdir /path/to/output_frames/
$ python make_dataset.py /path/to/video.mp4 /path/to/output_frames/

The script make_dataset.sh finds all mp4 files recursively in a directory and calls make_dataset.py on each one.

Supposing you have downloaded the DFDC dataset and extracted all zip files to videos/, run the following command to process the whole dataset and save the face images to /mnt/ssd0/dfdc/ for training:

$ bash make_dataset.sh videos/ /mnt/ssd0/dfdc/
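If the shell wrapper does not fit your environment, the same loop is easy to write in Python. The sketch below is illustrative only; it assumes one output folder per video mirroring the input layout, which may differ from what make_dataset.sh actually produces:

# process_all.py (hypothetical): recursively process every mp4 in a tree.
import subprocess
import sys
from pathlib import Path

src_root, dst_root = Path(sys.argv[1]), Path(sys.argv[2])
for video in sorted(src_root.rglob("*.mp4")):
    out_dir = dst_root / video.relative_to(src_root).with_suffix("")
    out_dir.mkdir(parents=True, exist_ok=True)  # mirror the input layout
    subprocess.run(["python", "make_dataset.py", str(video), str(out_dir)],
                   check=True)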

Training: WS-DAN

WS-DAN is the core part of our final solution. We have trained two variants of WS-DAN: one with Xception and another with EfficientNet-b3 as feature extractors.

Training configs for both variants are provided in the wsdan-conf/ folder. Check the save_dir, datapath, and pretrained settings before training.

Note the pretrained setting (an illustrative config sketch follows this list):

  • For WS-DAN w/ Xception, we used our previously trained Xception model (see the next section) to initialize the feature extractor. This should be the path to the Xception weight files.
  • For WS-DAN w/ EfficientNet-b3, we used the ImageNet-pretrained weights downloaded by the EfficientNet-PyTorch code, so setting it to any non-empty string is fine.
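For orientation, a config in wsdan-conf/ might look like the sketch below. The values are placeholders and the exact field set should be taken from the real files in the repository:

# wsdan-conf/xception.py (illustrative sketch; values are placeholders)
save_dir = "output/dfdc-wsdan-xception/"     # where checkpoints and logs go
datapath = "/mnt/ssd0/dfdc/"                 # face images from make_dataset.sh
pretrained = "path/to/xception-weights.pth"  # Xception init (see next section)
# For the EfficientNet-b3 variant, any non-empty string works here, since
# ImageNet weights are downloaded by the EfficientNet-PyTorch code itself.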

To train WS-DAN w/Xception:

$ python train-wsdan.py wsdan-conf/xception.py

To train WS-DAN w/EfficientNet:

$ python train-wsdan.py wsdan-conf/efb3.py 

Time estimation: We trained WS-DAN w/ Xception with 6 GPUs for almost a week (50 epochs).

Training: Xception

The Xception part is of less interest: we had been using a two-class Xception as a per-face classifier before we found the more powerful WS-DAN.
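For context, a two-class Xception classifier can be built in a few lines, for example with the timm package (an assumption here; the repository may construct its model differently in train-xception.py):

# Sketch only: a binary (real vs. fake) Xception classifier via timm.
import timm
import torch

model = timm.create_model("xception", pretrained=True, num_classes=2)
faces = torch.randn(4, 3, 299, 299)  # Xception's native input resolution
logits = model(faces)                # shape (4, 2): per-face class scores
print(logits.shape)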

Check xception-conf.py for various path settings. Then run:

$ python train-xception.py xception-conf.py

Time estimation: With 4 GPUs, validation accuracy of 92% or higher should be observed within about 12 hours. We typically trained for more than a day (20+ epochs).

Validation

best-submission.py is the code of our best submission on Kaggle, which scored 0.42842 (private) and 0.28680 (public) log loss. Please modify the paths accordingly for your own testing.
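Those numbers are the competition metric, binary log loss over per-video fake probabilities. A minimal sketch of the computation, with made-up predictions:

# Binary log loss, the DFDC leaderboard metric (example numbers only).
from sklearn.metrics import log_loss

y_true = [0, 0, 1, 1]            # 1 = fake, 0 = real
y_pred = [0.1, 0.3, 0.8, 0.9]    # predicted probability that a video is fake
print(log_loss(y_true, y_pred))  # lower is better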

Summary

The following is a summary of the commands used to train our models:

# Dataset Processing
$ ls dfdc_train_part_*.zip | xargs -i unzip {} -d videos/
$ bash make_dataset.sh videos/ /mnt/ssd0/dfdc/

# Train Xception
$ python train-xception.py xception-conf.py

# Train WS-DAN
$ mkdir -p output/dfdc-wsdan-{xception,efb3}/
$ python train-wsdan.py wsdan-conf/xception.py
$ python train-wsdan.py wsdan-conf/efb3.py

Reference

[1] RetinaFace implementation: biubug6/Pytorch_Retinaface.

[2] WS-DAN implementation: GuYuc/WS-DAN.PyTorch.

[3] EfficientNet implementation: lukemelas/EfficientNet-PyTorch.

[4] Face alignment code is from: deepinsight/insightface.