Leveraging Frequency Analysis for Deep Fake Image Recognition


This is the code repository accompanying our ICML 2020 paper Leveraging Frequency Analysis for Deep Fake Image Recognition.

Deep neural networks can generate images that are astonishingly realistic, so much so that it is often hard for untrained humans to distinguish them from actual photos. These achievements have been largely made possible by Generative Adversarial Networks (GANs). While these deep fake images have been thoroughly investigated in the image domain, a classical approach from the area of image forensics, an analysis in the frequency domain has been missing. This paper addresses this shortcoming, and our results reveal that, in frequency space, GAN-generated images exhibit severe artifacts that can be easily identified. We perform a comprehensive analysis, showing that these artifacts are consistent across different neural network architectures, data sets, and resolutions. In a further investigation, we demonstrate that these artifacts are caused by upsampling operations found in all current GAN architectures, indicating a structural and fundamental problem in the way images are generated via GANs. Based on this analysis, we demonstrate how the frequency representation can be used to automatically identify deep fake images, surpassing state-of-the-art methods.
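The upsampling artifact described above is easy to reproduce in a few lines. The following sketch is our own illustration, not code from this repository: it nearest-neighbor-upsamples a smooth low-resolution image and shows that the 2D DCT spectrum of the result carries far more high-frequency energy than the same scene rendered natively at the higher resolution.

```python
import numpy as np

def dct2(x):
    """Orthonormal 2D DCT-II of a square array, built from the DCT matrix."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c @ x @ c.T

def gaussian_bump(n):
    """A smooth, band-limited test image: a centered 2D Gaussian."""
    g = np.exp(-((np.arange(n) - n / 2) ** 2) / (2.0 * (n / 8) ** 2))
    return np.outer(g, g)

native = gaussian_bump(128)                                        # rendered at 128x128
upsampled = gaussian_bump(64).repeat(2, axis=0).repeat(2, axis=1)  # 2x nearest-neighbor

def high_freq_energy(img):
    """Energy of all DCT coefficients outside the low-frequency 64x64 corner."""
    spec = dct2(img) ** 2
    return spec.sum() - spec[:64, :64].sum()

# Nearest-neighbor upsampling mirrors spectral replicas into the high frequencies,
# while the natively rendered image has virtually no energy there.
ratio = high_freq_energy(upsampled) / (high_freq_energy(native) + 1e-30)
print(ratio > 100)  # True
```

This is the mechanism the paper exploits: upsampling layers leave replicas in the upper frequencies that a classifier can pick up.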

Prerequisites

For ease of use, we provide a Dockerfile which builds a container in which you can execute all experiments, along with a convenience shell script:

Choose: docker.sh {build|shell|tests|clean}
    build - Build the Dockerfile.
    shell - Spawn a shell inside the Docker container.
    tests - Spawn a Docker instance for pytest.
    clean - Clean up directories from training.

Otherwise, you will need a recent Python 3 version and TensorFlow 2.0+ with CUDA support. See requirements.txt for the required packages.

Datasets

We utilize these three popular datasets:

Additionally, we utilize the pre-trained models from these repositories:

Dataset preparation

The datasets have to be converted beforehand. First, run crop_celeba.py or crop_lsun.py, depending on your dataset. This creates a new folder with instances of the training data cropped to 128x128. Then run prepare_dataset.py; depending on the selected mode, the script expects different input. Note that FFHQ is already distributed in a cropped version.

The script expects one directory as input, containing multiple directories each with at least 27,000 images. These directories will be encoded with labels in order of appearance, i.e., as follows:

data
 |--- A_lsun 	-> label 0
 |--- B_ProGAN 	-> label 1
 	...

The script converts all images to DCT-encoded NumPy arrays or TFRecords, depending on the selected mode, and saves the output in three directories: train (100,000), val (10,000), and test (25,000).
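As a rough sketch of what this preprocessing amounts to (our illustration; the actual implementation lives in prepare_dataset.py and may differ in details such as normalization order), each grayscale image is transformed with a 2D DCT and, with the -l flag, log-scaled:

```python
import numpy as np

def dct2(x):
    """Orthonormal 2D DCT-II of a square array, built from the DCT matrix."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c @ x @ c.T

def encode(image, log_scale=True, eps=1e-12):
    """DCT-encode one square grayscale image; log-scale magnitudes as with -l."""
    coeffs = dct2(image.astype(np.float64))
    if log_scale:
        coeffs = np.log(np.abs(coeffs) + eps)
    return coeffs

rng = np.random.default_rng(0)
img = rng.random((128, 128))  # stand-in for a cropped training image
enc = encode(img)
print(enc.shape)  # (128, 128)
```

Dataset-level normalization (the -n flag) would then subtract a mean and divide by a standard deviation computed over the training split.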

usage: prepare_dataset.py [-h] [--raw] [--log] [--color] [--normalize]
                          DIRECTORY {normal,tfrecords} ...

positional arguments:
  DIRECTORY           Directory to convert.
  {normal,tfrecords}  Select the mode {normal|tfrecords}

optional arguments:
  -h, --help          show this help message and exit
  --raw, -r           Save image data as raw image.
  --log, -l           Log scale Images.
  --color, -c         Compute as color instead.
  --normalize, -n     Normalize data.

Example:
python prepare_dataset.py ~/datasets/GANFingerprints/perturbed_experiments/lsun/blur/ -lnc normal

Computing Statistics

To compute all of our statistics we utilize the compute_statistics.py script. This script is run on the raw (cropped) image files.

usage: compute_statistics.py [-h] [--output OUTPUT] [--color]
                             AMOUNT [DATASETS [DATASETS ...]]

positional arguments:
  AMOUNT                The amount of images to load.
  DATASETS              Path to datasets. The first entry is assumed to be the
                        reference one.

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        Output directory. Default: {output_default}.
  --color, -c           Plot each color channel separately.
  
Example:
python compute_statistics.py 10000 ~/datasets/ffhq/real,REAL ~/datasets/ffhq/fake,FAKE
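At its core, the statistic being compared is simple: average the log-scaled DCT spectra over each image set and look at the difference. A minimal NumPy sketch (our illustration, not the script itself):

```python
import numpy as np

def dct2(x):
    """Orthonormal 2D DCT-II of a square array, built from the DCT matrix."""
    n = x.shape[0]
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c @ x @ c.T

def mean_log_spectrum(images, eps=1e-12):
    """Average log-magnitude DCT spectrum over a stack of grayscale images."""
    return np.mean([np.log(np.abs(dct2(img)) + eps) for img in images], axis=0)

rng = np.random.default_rng(0)
real = rng.random((16, 64, 64))                            # stand-in "real" batch
fake = rng.random((16, 32, 32)).repeat(2, 1).repeat(2, 2)  # crude "upsampled" fakes

# The difference plot highlights where the fake spectra deviate from the real ones.
diff = mean_log_spectrum(fake) - mean_log_spectrum(real)
print(diff.shape)  # (64, 64)
```

In the paper's plots, the grid-like high-frequency pattern in such difference spectra is what distinguishes GAN-generated images.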

Experiments

Training your own models

After you have converted the data files as laid out above, you can train a new classifier:

usage: classifer.py train [-h] [--debug] [--epochs EPOCHS]
                          [--image_size IMAGE_SIZE]
                          [--early_stopping EARLY_STOPPING]
                          [--classes CLASSES] [--grayscale]
                          [--batch_size BATCH_SIZE] [--l1 L1] [--l2 L2]
                          MODEL TRAIN_DATASET VAL_DATASET

positional arguments:
  MODEL                 Select model to train {resnet, cnn, nn, log, log1,
                        log2, log3}.
  TRAIN_DATASET         Dataset to load.
  VAL_DATASET           Dataset to load.

optional arguments:
  -h, --help            show this help message and exit
  --debug, -d           Debug mode.
  --epochs EPOCHS, -e EPOCHS
                        Epochs to train for; Default: 50.
  --image_size IMAGE_SIZE
                        Image size. Default: [128, 128, 3]
  --early_stopping EARLY_STOPPING
                        Early stopping criteria. Default: 5
  --classes CLASSES     Classes. Default: 5
  --grayscale, -g       Train on grayscaled images.
  --batch_size BATCH_SIZE, -b BATCH_SIZE
                        Batch size. Default: 32
  --l1 L1               L1 regularizer intensity. Default: 0.01
  --l2 L2               L2 regularizer intensity. Default: 0.01
 
Example:

python classifer.py train log2 datasets/ffhq/data_raw_color_train_tf/data.tfrecords datasets/ffhq/data_raw_color_val_tf/data.tfrecords -b 32 -e 100 --l2 0.01 --classes 1 --image_size 1024

Testing

You can also use our pre-trained models.

usage: classifer.py test [-h] [--image_size IMAGE_SIZE] [--grayscale]
                         [--batch_size BATCH_SIZE]
                         MODEL TEST_DATASET

positional arguments:
  MODEL                 Path to model.
  TEST_DATASET          Dataset to load.

optional arguments:
  -h, --help            show this help message and exit
  --image_size IMAGE_SIZE
                        Image size. Default: [128, 128, 3]
  --grayscale, -g       Test on grayscaled images.
  --batch_size BATCH_SIZE, -b BATCH_SIZE
                        Batch size. Default: 32
                        
Example:
python classifer.py test path/to/classifier datasets/ffhq/data_raw_color_test_tf/data.tfrecords -b 32 --image_size 1024

If you simply want to classify a directory of images, you can do so by utilizing our run_classifier.py script:

usage: run_classifier.py [-h] [--size SIZE] [--batch_size BATCH_SIZE]
                         [--dct DCT]
                         MODEL DATA

positional arguments:
  MODEL                 Model to evaluate.
  DATA                  Directory to classify.

optional arguments:
  -h, --help            show this help message and exit
  --size SIZE, -s SIZE  Only use this amount of images.
  --batch_size BATCH_SIZE, -b BATCH_SIZE
                        Batch size to use; Default: {batch_size}.
  --dct DCT, -d DCT     DCT input

Example:

python run_classifier.py submission_models/ffhq/ridge_pixel ~/datasets/ffhq/fake -s 1000

If you use a DCT variant of the classifier, you also have to provide an estimate of the mean and variance of the dataset so the input can be normalized. These estimates can also be found in the Google Drive:

python run_classifier.py submission_models/ffhq/ridge_dct ~/datasets/ffhq/fake -s 1000 -d mean_var/ffhq_mean_var
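Loosely speaking, the mean/variance files are used to standardize the DCT coefficients before they reach the classifier. A hedged sketch of that step (the on-disk format of the mean_var files is not shown here and the actual loading code may differ):

```python
import numpy as np

def normalize_dct(coeffs, mean, var, eps=1e-12):
    """Standardize DCT coefficients with dataset-level statistics."""
    return (coeffs - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
# Stand-in batch of log-DCT coefficients; in practice mean/var come from the
# precomputed mean_var files, not from the batch itself.
batch = rng.normal(5.0, 3.0, size=(32, 16, 16))
mean, var = batch.mean(axis=0), batch.var(axis=0)

normalized = normalize_dct(batch, mean, var)
print(np.allclose(normalized.mean(axis=0), 0.0))  # True
```

Without this step, the large dynamic range of DCT coefficients would dominate the classifier's input scale.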

Baselines

Baseline experiments are located in the baselines directory. Each experiment (i.e., PRNU, Eigenfaces, and kNN) can be executed using the common baselines.py script.

usage: baselines.py [-h] [--command {train,test,grid_search}]
                    [--n_jobs N_JOBS] --datasets DATASETS --datasets_dir
                    DATASETS_DIR --output_dir OUTPUT_DIR
                    [--classifier_name CLASSIFIER_NAME]
                    {knn,prnu,eigenfaces} ...

Example (grid-search):
  python baselines.py --n_jobs 64 --command grid_search \
    --output_dir baselines/results \
    --datasets_dir datasets \
    --datasets lsun_raw_color_raw_normalized \
    --datasets lsun_raw_color_dct_log_scaled_normalized \
    --datasets celeba_raw_color_raw_normalized \
    --datasets celeba_raw_color_dct_log_scaled_normalized \
    knn

Example (train):
  python baselines.py --command train \
    --output_dir baselines/results \
    --datasets_dir datasets \
    --datasets lsun_raw_color_raw_normalized \
    eigenfaces \
    --pca_target_variance 0.95 \
    --C 0.01

Example (test):
  python3 baselines.py --command test \
    --output_dir baselines/results \
    --datasets_dir datasets \
    --classifier_name classifier_lsun_raw_color_raw_prnu_levels.3_sigma.0.8 \
    --datasets lsun_raw_color_raw \
    prnu
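For intuition, the Eigenfaces baseline boils down to PCA on flattened images followed by a simple classifier. The sketch below is a simplified stand-in, not the repository's implementation: judging from the --pca_target_variance and --C options above, the actual baseline pairs PCA with an SVM, while here a 1-nearest-neighbor classifier and synthetic data keep the example self-contained.

```python
import numpy as np

def pca_fit(x, n_components):
    """Fit PCA on flattened images via SVD of the centered data matrix."""
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(x, mean, components):
    return (x - mean) @ components.T

def knn_predict(train_x, train_y, test_x):
    """1-nearest-neighbor prediction in the PCA subspace."""
    dists = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    return train_y[dists.argmin(axis=1)]

rng = np.random.default_rng(0)
# Two well-separated synthetic "classes" standing in for real/fake features.
real = rng.normal(0.0, 1.0, (40, 64))
fake = rng.normal(3.0, 1.0, (40, 64))
train_x = np.vstack([real[:30], fake[:30]])
train_y = np.array([0] * 30 + [1] * 30)
test_x = np.vstack([real[30:], fake[30:]])
test_y = np.array([0] * 10 + [1] * 10)

mean, comps = pca_fit(train_x, n_components=10)
preds = knn_predict(pca_transform(train_x, mean, comps), train_y,
                    pca_transform(test_x, mean, comps))
accuracy = (preds == test_y).mean()
print(accuracy)  # 1.0 on this toy data
```

Replacing the synthetic arrays with flattened (or DCT-encoded) images from the prepared datasets reproduces the spirit of the Eigenfaces and kNN baselines.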
