Detecting Facial Manipulation Deepfakes - Evan Kravitz, Huazhe Xu - PowerPoint PPT Presentation



SLIDE 1

Detecting Facial Manipulation Deepfakes

Evan Kravitz, Huazhe Xu

April 21, 2020

SLIDE 2

https://www.instagram.com/p/ByaVigGFP2U/

SLIDE 3


https://www.instagram.com/p/ByaVigGFP2U/

SLIDE 4

What is a deepfake?

  • Synthetic image/video of a person that looks realistic to human viewers and can be used to perpetrate fraud or spread misinformation
  • Deepfakes are a form of social engineering attack
  • We have focused our research on detecting facial deepfakes

SLIDE 5

Social engineering attacks

SLIDE 6

Face synthesis

StyleGAN (2019)

SLIDE 7

Face swap


FaceSwap (2016), Deepfake FaceSwap (2020)

SLIDE 8

Face attribute


StarGAN (2018)

SLIDE 9

Facial expression


Face2Face (2016)

SLIDE 10

Protecting against deepfakes

  • We need a system for authenticating media


Authenticator → Real/fake?

SLIDE 11

Convolutional Neural Network (CNN)

SLIDE 12

Convolutional Neural Network (CNN) cont.


Convolutional Neural Network → Real/fake?
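The diagram above can be sketched as a forward pass. A minimal numpy sketch of the CNN idea, not the deck's actual network: one convolution, ReLU, global average pooling, and a sigmoid head that outputs a fake-probability. All weights and names here are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def cnn_real_fake_score(face, kernel, weights, bias):
    """Tiny CNN forward pass: conv -> ReLU -> global average pool -> sigmoid."""
    feat = np.maximum(conv2d(face, kernel), 0.0)   # conv + ReLU
    pooled = feat.mean()                            # global average pooling
    logit = pooled * weights + bias                 # linear "head"
    return 1.0 / (1.0 + np.exp(-logit))             # P(fake)
```

A score above 0.5 would be read as "fake"; a real detector stacks many such layers with learned kernels.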

SLIDE 13

SLIDE 14

Optical flow


Amerini et al., 2019

SLIDE 15

CNNs with optical flow


Convolutional Neural Network → Real/fake?

SLIDE 16

CNNs with self-labeled data (Li et al., 2019)

  1. Generate “negative” examples that contain deepfake generation artifacts
  2. Use the “negative” examples to train a CNN


Convolutional Neural Network → Real/fake?
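Step 1 above can be sketched with numpy: degrade the resolution inside a face region to imitate the resampling/blending artifacts a deepfake pipeline leaves behind. This is an illustrative stand-in for the affine-warp artifacts used by Li et al., not their actual procedure.

```python
import numpy as np

def make_negative(face, box, factor=4):
    """Turn a real face crop into a self-labeled "fake": downsample then
    nearest-neighbour upsample the region in `box`, mimicking the resolution
    mismatch that deepfake blending introduces (stand-in for Li et al., 2019)."""
    fake = face.astype(float).copy()
    y0, x0, y1, x1 = box
    region = fake[y0:y1, x0:x1]
    low = region[::factor, ::factor]                          # downsample
    up = np.repeat(np.repeat(low, factor, 0), factor, 1)      # upsample back
    fake[y0:y1, x0:x1] = up[: y1 - y0, : x1 - x0]
    return fake

# Training pairs: (real face, label 0) and (make_negative(face, box), label 1).
```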

SLIDE 17

Forensic deepfake detection

  • Forensic approach

○ Compute correlations between facial-feature motions across a video to determine a “signature motion” (Agarwal et al., 2019)

SLIDE 18

SLIDE 19

Forensic deepfake detection cont.


[Cor(X1, X1), Cor(X1, X2), …, Cor(Xi, Xj)] → SVM → Real/fake?
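Building that Cor(Xi, Xj) vector can be sketched with numpy: stack the per-frame facial-feature tracks for a video and take the upper triangle of their Pearson correlation matrix as the fixed-length SVM input. The function name and array shapes are assumptions for illustration.

```python
import numpy as np

def correlation_features(tracks):
    """tracks: (n_features, n_frames) array of facial-feature motions for one
    video. Returns every Cor(Xi, Xj) from the upper triangle of the Pearson
    correlation matrix, giving a fixed-length feature vector for the SVM."""
    c = np.corrcoef(tracks)
    i, j = np.triu_indices(c.shape[0])   # includes the Cor(Xi, Xi) diagonal
    return c[i, j]
```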

SLIDE 20

Our contribution

  • We aim to improve upon existing neural network and forensic feature models.
      ✓ Feature augmentation and enhancement
      ✓ Better classification model

SLIDE 21

Dataset


Entire YouTube 8M dataset → cropped faces from video frames → original labeled data
Cropped faces → Face2Face → Face2Face-manipulated video frames → altered labeled data

SLIDE 22

Dataset cont.

  • 704 videos for training (368,135 images)
  • 150 videos for validation (75,526 images)
  • 50 videos for testing (77,745 images)

SLIDE 23

Forensic analysis of facial landmarks

68 (x, y) coordinates = 136 features → PCA → 50 features → Classifier → Prediction
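The pipeline above can be sketched with scikit-learn, assuming it is available; the random arrays stand in for real landmark vectors, and the classifier choice and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# 68 landmarks x (x, y) = 136 raw features per frame.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 136))     # stand-in for real landmark vectors
y = rng.integers(0, 2, size=200)    # 0 = real, 1 = Face2Face-manipulated

# 136 features -> PCA -> 50 features -> classifier -> prediction
model = make_pipeline(PCA(n_components=50),
                      RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(X, y)
pred = model.predict(X)
```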

SLIDE 24

Principal Component Analysis (PCA)

  • Popular technique for dimensionality reduction
  • Transforms the feature space into an orthogonal basis and keeps only the most prominent components
  • Fewer features → less variance, less overfitting
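The transformation can be written out directly: centre the data, take the singular value decomposition, and project onto the top-k right singular vectors. A minimal numpy sketch; the function name and shapes are illustrative.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components
    and report the fraction of total variance each component captures."""
    Xc = X - X.mean(axis=0)                         # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)                 # variance ratio, descending
    return Xc @ Vt[:k].T, explained[:k]
```

For the landmark features this takes 136 columns down to 50 while keeping the directions of largest variance.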

SLIDE 25

Method: Random forest classifier

  • Pros:
      ○ Works with few features
      ○ Lower variance compared to a regular decision tree
      ○ Explainable model
      ○ Low cost
  • Cons:
      ○ Hard to tune

https://towardsdatascience.com/random-forest-classification-and-its-implementation-d5d840dbead0
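A minimal scikit-learn sketch of the random forest classifier on toy separable data (the two Gaussian clusters stand in for real/fake feature vectors; all names and hyperparameters here are assumptions). The `feature_importances_` attribute is what makes the model explainable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy separable data standing in for the PCA-reduced landmark features.
rng = np.random.default_rng(0)
X_real = rng.normal(0.0, 1.0, size=(100, 10))   # "real" cluster
X_fake = rng.normal(2.0, 1.0, size=(100, 10))   # "fake" cluster
X = np.vstack([X_real, X_fake])
y = np.array([0] * 100 + [1] * 100)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Explainability: per-feature importances, which sum to 1.
importances = clf.feature_importances_
```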

SLIDE 26

Method: Support vector machine

  • Pros:
      ○ Supports non-linear decision boundaries
  • Cons:
      ○ Hard to tune kernel and hyperparameters

https://pythonmachinelearning.pro/classification-with-support-vector-machines/
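The non-linear-boundary pro can be demonstrated with a toy ring dataset, where no straight line separates the classes but an RBF-kernel SVM does. A scikit-learn sketch under assumed data and hyperparameters, not the deck's actual model.

```python
import numpy as np
from sklearn.svm import SVC

# Ring data: inner disc (class 0) vs. outer ring (class 1); not linearly separable.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
r = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(2.0, 3.0, 100)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([0] * 100 + [1] * 100)

# The RBF kernel yields a non-linear boundary; C and gamma are what's hard to tune.
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
```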

SLIDE 27

Method: Neural Network with facial landmarks

Facial landmark detector → features → PCA for dimension reduction → FC neural net → output
Loss: cross-entropy loss

  • Pros:
      ○ Lightweight: single-GPU training, large batch size
  • Cons:
      ○ Data hungry
      ○ Needs extensive tuning
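The FC net and its cross-entropy loss can be sketched in numpy (layer sizes and names are illustrative assumptions, not the deck's architecture):

```python
import numpy as np

def fc_forward(x, W1, b1, W2, b2):
    """Fully connected net on PCA-reduced landmark features:
    hidden ReLU layer -> two logits (real, fake) -> softmax probabilities."""
    h = np.maximum(x @ W1 + b1, 0.0)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean cross-entropy loss over the batch; labels are 0 (real) or 1 (fake)."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
```

Training would minimize `cross_entropy` over batches by gradient descent, which is where the large batch size and single-GPU footprint come in.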

SLIDE 28

Metrics

Accuracy: (true positives + true negatives) / total samples
Precision: true positives / all predicted positives
Recall: true positives / all actual positives
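The three definitions translate directly into code; a minimal sketch with an assumed helper name, treating label 1 as "fake" (positive):

```python
def metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = fake)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```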

SLIDE 29

Results: in-distribution samples (small scale)

  • Near-perfect performance for random forest
  • What does this imply? We can near-perfectly detect fake/real across the web if we have labels for part of a clip.
  • 10K training images

Table 1: Accuracy for different models

            SVM      Random Forest   NN
Accuracy    80.00%   98.10%          85.12%

Table 2: Precision and recall for the top 2 models

            Random Forest   NN
Precision   98.52%          92.81%
Recall      98.72%          85.01%

SLIDE 30

Results: out-of-distribution training and testing

  • Both methods drop significantly
  • The neural net performs slightly better (training accuracy: 90% for the NN vs. 99.9% for the random forest)
  • Training data is too little!
  • 14K training images

Table 1: Accuracy for the random forest and NN models

            SVM   Random Forest   NN
Accuracy    N/A   70.50%          73.78%

Table 2: Precision and recall for the top 2 models

            Random Forest   NN
Precision   77.15%          79.23%
Recall      58.82%          63.44%

SLIDE 31

Public benchmark results with ~5 times our current training data

  • Larger net
  • More data
  • Utilize video properties

http://kaldir.vc.in.tum.de/faceforensics_benchmark/index.php?sortby=dface2face

SLIDE 32

Visualized Examples

Original Image Altered Image

SLIDE 33

Next steps


  • Scale up & analysis
  • Temporal features
  • Compare with public benchmark
  • CNN + forensic features

SLIDE 34

Thank you!
