Analyzing Artifacts in Discriminative and Generative Models (PowerPoint presentation transcript)

slide-1
SLIDE 1

Analyzing Artifacts in Discriminative and Generative Models

Richard Zhang (章睿嘉)

Research Scientist, Adobe SF GAMES Webinar Aug 2020

slide-2
SLIDE 2

Example classifications

P(correct class) P(correct class)

slide-3
SLIDE 3

Deep Networks are not Shift-Invariant

P(correct class) P(correct class)

slide-4
SLIDE 4

c.f. Azulay and Weiss. Why do deep convolutional networks generalize so poorly to small image transformations? arXiv, 2018; JMLR, 2019.

Engstrom, Tsipras, Schmidt, Madry. Exploring the Landscape of Spatial Robustness. arXiv, 2017; ICML, 2019.

Deep Networks are not Shift-Invariant

slide-5
SLIDE 5

Why is shift-invariance lost?

“Convolutions are shift-equivariant.” “Pooling builds up shift-invariance.” …but striding ignores the Nyquist sampling theorem and aliases

5

slide-6
SLIDE 6

Nyquist-Shannon theorem

6

If the sampling frequency is less than twice the underlying signal frequency, the samples cannot faithfully represent the underlying signal…

slide-7
SLIDE 7

What is aliasing?

7

…worse, the samples will misrepresent the underlying signal
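The Nyquist condition can be checked numerically. A minimal NumPy sketch (the 7 Hz / 10 Hz example is my own illustration, not from the slides): a sine sampled below its Nyquist rate produces samples identical to those of a lower-frequency alias.

```python
import numpy as np

# A 7 Hz sine sampled at only 10 Hz (below the 14 Hz Nyquist rate)
# yields samples identical to those of a -3 Hz sine (7 - 10 = -3):
# from the samples alone, the two signals cannot be told apart.
fs = 10.0                            # sampling frequency (Hz)
t = np.arange(20) / fs               # sample times

high = np.sin(2 * np.pi * 7 * t)     # true underlying 7 Hz signal
alias = np.sin(2 * np.pi * -3 * t)   # the low-frequency impostor

assert np.allclose(high, alias)      # aliasing: samples misrepresent the signal
```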

slide-8
SLIDE 8

8

Farther squares can appear bigger than closer squares

Rendering

Farther away, the sampling rate decreases

slide-9
SLIDE 9

Temporal aliasing

9

Magical helicopter!

slide-10
SLIDE 10

Adapted from Lectures on Digital Photography. Marc Levoy. https://sites.google.com/site/marclevoylectures/schedule

Image

slide-11
SLIDE 11

11

Adapted from Lectures on Digital Photography. Marc Levoy. https://sites.google.com/site/marclevoylectures/schedule

4× naïve subsampling

slide-12
SLIDE 12

Antialiasing

  • Low sampling rate → signal cannot be represented
  • Just don’t downsample? Expensive…
  • Naïve subsampling → high freqs misrepresented
  • Before subsampling, antialias by blurring → high freqs unrepresented
  • Better than aliasing!
  • Blur / prefilter / antialias / low-pass filter

12

[Figure: signal → naïve subsample → aliased signal; signal → blur/prefilter → anti-aliased subsample]

Adapted from Alexei A. Efros CS 194. https://inst.eecs.berkeley.edu//~cs194-26/fa18/
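The bullets above can be sketched on a 1-D toy signal (my own example, not from the slides): naïve stride-2 subsampling of the highest representable frequency flips completely under a one-pixel shift, while a [1, 2, 1]/4 prefilter makes the subsampled result shift-stable.

```python
import numpy as np

def blur(x):
    # [1, 2, 1]/4 binomial prefilter (circular boundary conditions)
    return (np.roll(x, 1) + 2 * x + np.roll(x, -1)) / 4.0

x = np.array([1.0, 0.0] * 8)   # highest frequency representable at this rate

naive0 = x[::2]                # naïve stride-2 subsample
naive1 = np.roll(x, 1)[::2]    # shift by one pixel first, then subsample
# Aliased: a one-pixel shift flips the output from all ones to all zeros.

aa0 = blur(x)[::2]             # prefilter, then subsample
aa1 = blur(np.roll(x, 1))[::2]
# Antialiased: both shifts give the same (blurred) result.
assert np.allclose(aa0, aa1)
```

The high frequencies are unrepresented after blurring, but no longer misrepresented.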

slide-13
SLIDE 13

Effect of prefiltering

13

Adapted from Lectures on Digital Photography. Marc Levoy. https://sites.google.com/site/marclevoylectures/schedule

Naïvely subsampled | Prefiltered, then subsampled

slide-14
SLIDE 14

Effect of prefiltering

14

Adapted from Lectures on Digital Photography. Marc Levoy. https://sites.google.com/site/marclevoylectures/schedule

Naïvely subsampled | Prefiltered, then subsampled

slide-15
SLIDE 15

Aliasing → Loss of shift-equivariance

15

from Jaakko Lehtinen https://twitter.com/jaakkolehtinen/status/1258102168176951299

Shift to the right: naïve vs. prefiltered

slide-16
SLIDE 16

Making Convolutional Networks Shift-Invariant Again

Richard Zhang

In ICML, 2019

slide-17
SLIDE 17

Alternative downsampling methods

  • Signal processing; image processing; graphics
  • Prefilter before subsampling

  • Deep learning; computer vision
  • Szeliski vision book, Sec 2.3.1 “Sampling and aliasing”
  • Average pooling [LeNet 1989] does some antialiasing
  • Max-pooling gets better accuracy [Scherer 2010]

17

Antialiasing abandoned and forgotten

slide-18
SLIDE 18

18

Re-examining Max-Pooling

slide-19
SLIDE 19

19

Re-examining Max-Pooling

max

(Slides 20–36 repeat this animation: the max-pooling window sliding over a shifting input.)
slide-37
SLIDE 37

37

Re-examining Max-Pooling

Max-pooling breaks shift-equivariance

slide-38
SLIDE 38

Shift-equivariance Testbed

  • CIFAR
  • VGG network
  • 5 max-pools
  • Test shift-equivariance condition
  • Circular convolution/shift

pixels conv1 pool1 conv2 pool2 conv3 pool3 conv4 pool4 conv5 pool5 classifier softmax 32x32 1x1
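The testbed condition above can be sketched with NumPy stand-ins for the layers (the actual testbed uses a VGG on CIFAR): compare layer(shift(x)) against shift(layer(x)) under circular shifts, where a stride-s layer maps s input pixels to 1 output pixel.

```python
import numpy as np

def shift_equivariance_gap(layer, x, shift, stride):
    # Compare layer(shift(x)) with shift(layer(x)); for a stride-s layer,
    # a shift of s input pixels should match 1 output pixel (circular shifts).
    return np.abs(layer(np.roll(x, shift)) -
                  np.roll(layer(x), shift // stride)).max()

x = np.sin(np.linspace(0, 8 * np.pi, 32))
conv = lambda s: np.roll(s, 1) + s     # toy stride-1 circular convolution
sub = lambda s: s[::2]                 # stride-2 subsampling

print(shift_equivariance_gap(conv, x, 2, 1))  # ~0: convolution is equivariant
print(shift_equivariance_gap(sub, x, 2, 2))   # ~0: even shifts survive
print(shift_equivariance_gap(sub, x, 1, 2))   # large: odd shifts break it
```

This is why the per-layer plots on the following slides show convolutions staying near perfect shift-equivariance while each pooling/striding stage deviates.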

slide-39
SLIDE 39

Perfect shift-eq. Large deviation from shift-eq.

39

pixels

conv1

pool1 conv2 pool2 conv3 pool3 conv4 pool4 conv5 pool5 classifier softmax

Shift-equivariance, per layer

Convolution is shift-equivariant

slide-40
SLIDE 40

Perfect shift-eq. Large deviation from shift-eq.

40

pixels conv1

pool1

conv2 pool2 conv3 pool3 conv4 pool4 conv5 pool5 classifier softmax

Shift-equivariance, per layer

Pooling breaks shift-equivariance

slide-41
SLIDE 41

41

pixels conv1 pool1

conv2

pool2 conv3 pool3 conv4 pool4 conv5 pool5 classifier softmax

Shift-equivariance, per layer

Perfect shift-eq. Large deviation from shift-eq.

slide-42
SLIDE 42

pixels conv1 pool1 conv2

pool2

conv3 pool3 conv4 pool4 conv5 pool5 classifier softmax

Shift-equivariance, per layer

Perfect shift-eq. Large deviation from shift-eq.

slide-43
SLIDE 43

pixels conv1 pool1 conv2 pool2

conv3

pool3 conv4 pool4 conv5 pool5 classifier softmax

Shift-equivariance, per layer

Perfect shift-eq. Large deviation from shift-eq.

slide-44
SLIDE 44

pixels conv1 pool1 conv2 pool2 conv3

pool3

conv4 pool4 conv5 pool5 classifier softmax

Shift-equivariance, per layer

Perfect shift-eq. Large deviation from shift-eq.

slide-45
SLIDE 45

pixels conv1 pool1 conv2 pool2 conv3 pool3

conv4

pool4 conv5 pool5 classifier softmax

Shift-equivariance, per layer

Perfect shift-eq. Large deviation from shift-eq.

slide-46
SLIDE 46

Perfect shift-eq. Large deviation from shift-eq.

pixels conv1 pool1 conv2 pool2 conv3 pool3 conv4

pool4

conv5 pool5 classifier softmax

Shift-equivariance, per layer

Every pooling increases periodicity

Goal: Reconcile antialiasing with max-pooling

slide-47
SLIDE 47

Anti-aliased pooling (MaxBlurPool)

Strided-MaxPool: shift-equivariance lost; heavy aliasing

Equivalent interpretation, in three steps:
  • Max (densely): preserves shift-equivariance
  • Blur: preserves shift-equivariance
  • Subsampling: shift-equivariance lost, but with reduced aliasing

The blur and subsampling are evaluated together as “BlurPool”
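The decomposition can be sketched in 1-D NumPy (a toy stand-in for the 2-D PyTorch layers in the paper): dense max, then a [1, 2, 1]/4 blur, then stride-2 subsampling.

```python
import numpy as np

def max_dense(x):
    # Step 1: max over a size-2 window at every position (stride 1, circular).
    return np.maximum(x, np.roll(x, -1))

def blur_subsample(x):
    # Steps 2+3 ("BlurPool"): [1, 2, 1]/4 low-pass filter, stride-2 subsample.
    return ((np.roll(x, 1) + 2 * x + np.roll(x, -1)) / 4.0)[::2]

def max_pool_strided(x):      # baseline MaxPool (stride 2)
    return max_dense(x)[::2]

def max_blur_pool(x):         # MaxBlurPool: dense max -> blur -> subsample
    return blur_subsample(max_dense(x))

x = np.array([0.0, 0.0, 1.0, 1.0] * 4)
a, b = max_pool_strided(x), max_pool_strided(np.roll(x, 1))
c, d = max_blur_pool(x), max_blur_pool(np.roll(x, 1))
# Strided max-pool changes drastically under a one-pixel shift;
# MaxBlurPool's output barely moves.
```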

slide-48
SLIDE 48

52

(1) Max (densely)

max

slide-49
SLIDE 49

53

(2) Blur+Subsample

Shift-equivariance better preserved

slide-50
SLIDE 50

Anti-aliased

Perfect shift-eq. Large deviation

slide-51
SLIDE 51

Baseline

Perfect shift-eq. Large deviation

slide-52
SLIDE 52

Anti-aliased

Perfect shift-eq. Large deviation

slide-53
SLIDE 53

Other downsampling layers

Max Pooling (VGG, AlexNet): MaxPool (stride 2) → Max (stride 1) + BlurPool (stride 2)

Strided-Convolution (ResNet, MobileNetv2): Conv (stride 2) + ReLU → Conv (stride 1) + ReLU + BlurPool (stride 2)

Average Pooling (DenseNet): AvgPool (stride 2) → BlurPool (stride 2)

Antialias any off-the-shelf network

slide-54
SLIDE 54

ImageNet

Shift-invariance Accuracy

slide-55
SLIDE 55

ImageNet

Shift-invariance Accuracy

Baseline Antialiased

slide-56
SLIDE 56

ImageNet

Shift-invariance vs. Accuracy

Antialiasing also improves accuracy

Baseline Antialiased

slide-57
SLIDE 57

Qualitative examples

slide-58
SLIDE 58

Striding aliases. Antialiasing improves:
+ shift-equivariance, accuracy
+ stability to perturbations
+ robustness to corruptions

Code + models: richzhang.github.io/antialiased-cnns/

Ex: models.resnet50() → models_lpf.resnet50()

→ Implications for image generation models?

63

Discussion

slide-59
SLIDE 59

Making fake images is getting easier

64

“Deepfakes” GANs

Can we create a “universal” detector?

slide-60
SLIDE 60

synthetic real

Generative Adversarial Networks

Dataset of CNN-generated fakes

StyleGAN [Karras 2019], BigGAN [Brock 2019], CycleGAN [Zhu 2017], StarGAN [Choi 2018], GauGAN [Park 2019], Seeing in the dark [Chen 2018], Super-res. [Dai 2019], Cascaded refinement [Chen 2017], IMLE [Li 2019], ProGAN [Karras 2018], Faceswap [Rossler 2019]

Perceptual loss Low-level vision Deepfakes

Can we create a “universal” detector?

Train/Test within each generator

Test: ??? et al. 20XX

slide-61
SLIDE 61

synthetic real

Generative Adversarial Networks

Dataset of CNN-generated fakes

StyleGAN [Karras 2019], BigGAN [Brock 2019], CycleGAN [Zhu 2017], StarGAN [Choi 2018], GauGAN [Park 2019], Seeing in the dark [Chen 2018], Super-res. [Dai 2019], Cascaded refinement [Chen 2017], IMLE [Li 2019], ProGAN [Karras 2018], Faceswap [Rossler 2019]

Perceptual loss Low-level vision Deepfakes

Train on one generator; Test: ??? et al. 20XX

Can we create a “universal” detector?

slide-62
SLIDE 62

Many differences (architecture, dataset, objective)… but underlying commonalities may enable generalization

Generative Adversarial Networks

Dataset of CNN-generated fakes

StyleGAN [Karras 2019], BigGAN [Brock 2019], CycleGAN [Zhu 2017], StarGAN [Choi 2018], GauGAN [Park 2019], Seeing in the dark [Chen 2018], Super-res. [Dai 2019], Cascaded refinement [Chen 2017], IMLE [Li 2019], ProGAN [Karras 2018], Faceswap [Rossler 2019]

Perceptual loss Low-level vision Deepfakes

Train / Test

synthetic real

slide-63
SLIDE 63

CNN-generated images are surprisingly easy to spot…for now

Sheng-Yu Wang Oliver Wang Richard Zhang Andrew Owens Alexei A. Efros

In CVPR, 2020 (oral).

slide-64
SLIDE 64

Training on ProGAN

ProGAN detector

Real vs. fake

ProGAN-generated and real images (720K), 20 categories from LSUN
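The training setup is a binary real-vs-fake classifier. The sketch below is a toy logistic regression on made-up 2-D features (the paper trains a ResNet-50 on images); it only illustrates the objective, not the actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in 2-D "features" for real and fake images: two Gaussian blobs.
# Purely illustrative; the real detector learns features from pixels.
real = rng.normal(-1.0, 0.5, size=(200, 2))
fake = rng.normal(+1.0, 0.5, size=(200, 2))
X = np.vstack([real, fake])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 0 = real, 1 = fake

w, b = np.zeros(2), 0.0
for _ in range(200):                      # gradient descent on the BCE loss
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(y)
    b -= 0.1 * (p - y).mean()

acc = ((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
```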

slide-65
SLIDE 65

Testing across architectures


Real vs. fake

ProGAN detector

Synthesized images from other CNNs, and real images

slide-66
SLIDE 66

Average Precision per test set (chance = 50, perfect = 100), without vs. with Blur+JPEG augmentation at training:

Test set       No aug   Blur+JPEG aug
ProGAN          100        100
StyleGAN         96         99
BigGAN           72         88
CycleGAN         84         97
StarGAN         100         95
GauGAN           67         98
CRN              94         99
IMLE             90        100
Seeing dark      96         93
Super-res.       94         64
Deep fake        98         66

Training and testing on ProGAN is trivial

Generalizes above chance, but performance inconsistent
slide-67
SLIDE 67

(Same per-generator Average Precision numbers as the previous slide; chance = 50, perfect = 100.)

Augmentation is not always appropriate: Super-res. and Deep fake AP drop under Blur+JPEG augmentation

slide-68
SLIDE 68

(Same per-generator Average Precision numbers as the previous slides.)

Aggressive augmentation adds surprising generalization: BigGAN, CycleGAN, and GauGAN AP all rise
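The Blur+JPEG training augmentation can be sketched as follows. This is a dependency-free approximation: the Gaussian blur is implemented directly, while the JPEG re-compression step is left as a stub (it needs an image codec); the probabilities and sigma range here are illustrative, not the paper's exact settings.

```python
import numpy as np

def gaussian_blur(img, sigma):
    # Separable Gaussian low-pass filter ('same'-size convolution per axis).
    if sigma <= 0:
        return img
    xs = np.arange(-int(3 * sigma) - 1, int(3 * sigma) + 2)
    k = np.exp(-xs**2 / (2 * sigma**2))
    k /= k.sum()
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, img)
    return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)

def augment(img, rng, blur_prob=0.5):
    # Randomly low-pass the training image. The paper additionally
    # re-compresses with JPEG at some probability; that step is stubbed
    # out here to avoid an image-codec dependency.
    if rng.random() < blur_prob:
        img = gaussian_blur(img, rng.uniform(0.0, 3.0))
    return img

rng = np.random.default_rng(0)
img = rng.random((32, 32))
out = gaussian_blur(img, 2.0)   # blurring removes high-frequency content
```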

slide-69
SLIDE 69

(Same per-generator Average Precision numbers as the previous slides.)

[Plot: AP as a function of test-time blurring]

Indicates exploitable, generalizable artifacts at lower frequency bands

slide-70
SLIDE 70

Discussion

  • Suggests CNN-generated images have common artifacts
  • Artifacts can be detected by a simple classifier!
  • StyleGAN2 (released after our submission): 100% AP on FFHQ
  • Swapping Autoencoder (Park et al.): 95% AP on FFHQ
  • Note: AP is computed on a collection of images; a real/fake decision on a per-image basis is more difficult

  • Situation may not persist
  • GANs train with a discriminator
  • Future architecture changes
  • “Shallow” fakes, e.g., Photoshop

75

slide-71
SLIDE 71

Media manipulation example

https://gizmodo.com/russian-state-tv-photoshops-an-awkward-smile-on-kim-jon-1826529277 June 2018

slide-72
SLIDE 72

https://www.youtube.com/watch?v=5Qqv_C6iVvQ

slide-73
SLIDE 73
slide-74
SLIDE 74

Original

slide-75
SLIDE 75

#1 modification

slide-76
SLIDE 76

#2 modification

slide-77
SLIDE 77

#3 modification

slide-78
SLIDE 78

#4 modification

slide-79
SLIDE 79

Photoshop Undo-er

Original / Photoshopped?

Flow field

Photoshopped → Warp → Recovered original

slide-80
SLIDE 80

Original

Flow field

Warp → Recovered original

PWC-Net

“Ground truth” flow field

Loss

Photoshop Undo-er
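The warp-and-undo step can be sketched with a nearest-neighbor backward warp in NumPy (my own minimal stand-in; the actual model predicts a dense flow field with a network and uses proper bilinear warping).

```python
import numpy as np

def warp(img, flow):
    # Backward-warp img by a per-pixel flow field (dy, dx), nearest neighbor.
    # The Photoshop undo-er predicts such a field and warps back by it.
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[0]).astype(int), 0, h - 1)
    src_x = np.clip(np.rint(xs + flow[1]).astype(int), 0, w - 1)
    return img[src_y, src_x]

img = np.arange(64.0).reshape(8, 8)
flow = np.stack([np.zeros((8, 8)), np.full((8, 8), 2.0)])  # shift content 2 px
manipulated = warp(img, flow)
recovered = warp(manipulated, -flow)   # "undo": warp back with negated flow
# Interior pixels are recovered exactly (borders are clipped).
```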

slide-81
SLIDE 81

Manipulated

slide-82
SLIDE 82

Flow prediction

slide-83
SLIDE 83

Suggested “undo” Prediction

Suggested “undo”

slide-84
SLIDE 84

Original

slide-85
SLIDE 85

Manipulated vs. Original

slide-86
SLIDE 86

Suggested “undo” Prediction Manipulated

Undo vs. Original

slide-87
SLIDE 87

Manipulated

slide-88
SLIDE 88

Suggested “undo” Prediction Manipulated

Flow prediction

slide-89
SLIDE 89

Suggested “undo” Prediction

Suggested “undo”

slide-90
SLIDE 90

Original

slide-91
SLIDE 91

Manipulated vs. Original

slide-92
SLIDE 92

Suggested “undo” Prediction Manipulated

Undo vs. Original

slide-93
SLIDE 93

Reversal evaluation

PSNR (dB)

Before: 29.0   Prediction: 31.2   GT Flow: 42.7

Senses of generalization

  • Held-out artist-generated data
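PSNR, as reported in the evaluation above, is computed from mean squared error; a minimal sketch:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    # Peak signal-to-noise ratio in dB against a reference image.
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

ref = np.full((4, 4), 100.0)
# A smaller residual error gives a higher PSNR.
assert psnr(ref, ref + 8.0) > psnr(ref, ref + 16.0)
```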

slide-94
SLIDE 94

Reversal evaluation

With aug: +0.61 dB; no aug: +0.15 dB

Senses of generalization

  • Heldout artist data
  • Post-processing
  • Different warp, image domains

Facebook post-processing

Data augmentation important (again)

slide-95
SLIDE 95

Original Photo

Snapchat warps

slide-96
SLIDE 96

Manipulated Photo

Snapchat warps

slide-97
SLIDE 97

Suggested “undo” Prediction Manipulated

Flow Prediction

Snapchat warps

slide-98
SLIDE 98

Suggested “Undo”

Snapchat warps

slide-99
SLIDE 99

Original Photo

Snapchat warps

Some generalization across warp methods

slide-100
SLIDE 100

Different image domain

slide-101
SLIDE 101

Different image domain

slide-102
SLIDE 102

Different image domain

slide-103
SLIDE 103

Predicted warp (not successful)

Does not generalize well to arbitrary images

slide-104
SLIDE 104

Warped noise

slide-105
SLIDE 105

Warped noise

slide-106
SLIDE 106

Warped noise

slide-107
SLIDE 107

Warped noise

Successfully identifies warp from noise map

slide-108
SLIDE 108

Discussion

  • Suggests a multi-prong approach
  • For rapidly evolving tools, continuously train and generalize
  • For a relatively static tool, directly specialize
  • Detection is only a piece of the puzzle
  • e.g., Content Authenticity Initiative: https://contentauthenticity.org/,

collaboration between Adobe, New York Times, and Twitter

119

slide-109
SLIDE 109

Thank You!

Making convolutional networks shift-invariant again.

  • R. Zhang. In ICML, 2019.

richzhang.github.io/antialiased-cnns/ (antialiasing code, models, and weights)

CNN-generated images are surprisingly easy to spot…for now.

Wang, Wang, Zhang, Owens, Efros. In CVPR, 2020.

peterwang512.github.io/CNNDetection/

Detecting Photoshopped Images by Scripting Photoshop.

Wang, Wang, Owens, Zhang, Efros. In ICCV, 2019.

peterwang512.github.io/FALdetector/

slide-110
SLIDE 110

Backup

121

slide-111
SLIDE 111

122

slide-112
SLIDE 112

123

slide-113
SLIDE 113

Baseline vs Anti-aliased

slide-114
SLIDE 114

Loss Function

  • End point error :
slide-115
SLIDE 115

Loss Function

  • Multi-scale gradient error:
slide-116
SLIDE 116

Loss Function

  • Reconstruction error:
slide-117
SLIDE 117

Loss Function

  • Reconstruction error:
slide-118
SLIDE 118

Loss Function

  • Reconstruction error: