ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness
Geirhos et al. (2019), PowerPoint presentation


SLIDE 1

IMAGENET-TRAINED CNNS ARE BIASED TOWARDS TEXTURE; INCREASING SHAPE BIAS IMPROVES ACCURACY AND ROBUSTNESS

Geirhos et al. (2019)

SLIDE 2

  • ImageNet classification with CNNs
  • Which image cues are learned
  • How influential they are
  • Comparison with humans

Introduction

SLIDE 3

Shape Hypothesis

“The network acquires complex knowledge about the kinds of shapes associated with each category. [...] High-level units appear to learn representations of shapes occurring in natural images” (Kriegeskorte, 2015)

Intermediate CNN layers recognize “parts of familiar objects, and subsequent layers [...] detect objects as combinations of these parts” (LeCun et al., 2015)

Testing Hypothesis

SLIDE 4

Testing Hypothesis

Texture Hypothesis

CNNs can still classify texturised images perfectly well, even if the global shape structure is completely destroyed (Gatys et al., 2017; Brendel & Bethge, 2019). Standard CNNs are bad at recognizing object sketches, where object shapes are preserved yet all texture cues are missing (Ballester & de Araújo, 2016).
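The texture hypothesis can be illustrated with a toy "texturisation": shuffling non-overlapping image patches destroys global shape while leaving local texture statistics intact. A minimal NumPy sketch (an illustration only, not the stylisation method of Gatys et al.):

```python
import numpy as np

def shuffle_patches(img, patch=8, seed=0):
    """Shuffle non-overlapping patches to destroy global shape.

    Local texture statistics are untouched, so a texture-biased
    classifier can still succeed on the result. `patch` must divide
    both image dimensions.
    """
    h, w = img.shape[:2]
    assert h % patch == 0 and w % patch == 0
    rows, cols = h // patch, w // patch
    # Cut the image into a grid of patches.
    patches = [img[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
               for r in range(rows) for c in range(cols)]
    # Reassemble the grid in a random order.
    order = np.random.default_rng(seed).permutation(len(patches))
    out = np.empty_like(img)
    for i, j in enumerate(order):
        r, c = divmod(i, cols)
        out[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = patches[j]
    return out
```

The output contains exactly the same pixels as the input, just rearranged at the patch level.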

Shape Hypothesis

“The network acquires complex knowledge about the kinds of shapes associated with each category. [...] High-level units appear to learn representations of shapes occurring in natural images” (Kriegeskorte, 2015)

Intermediate CNN layers recognize “parts of familiar objects, and subsequent layers [...] detect objects as combinations of these parts” (LeCun et al., 2015)

SLIDE 5

Set-up

Psychophysical experiments

  • 97 observers
  • 48,560 trials
  • Trial sequence: 300 ms fixation square + 200 ms image + 200 ms pink noise + 1500 ms category selection
  • Breaks after every 256 trials
  • Practice session of 320 trials

Model experiments

  • AlexNet
  • GoogLeNet
  • VGG-16
  • ResNet-50
SLIDE 6

Experiments

Original

160 color images, 10 per category, white background

SLIDE 7

Experiments

Original

160 color images, 10 per category, white background

Greyscale

As original but greyscale

SLIDE 8

Experiments

Original

160 color images, 10 per category, white background

Greyscale

As original but greyscale

Silhouette

As original, but only a manually created black mask

SLIDE 9

Experiments

Original

160 color images, 10 per category, white background

Greyscale

As original but greyscale

Silhouette

As original, but only a manually created black mask

Edge

Canny edge extractor applied to the original dataset
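The Edge condition uses the Canny extractor; as a rough illustration of the idea, the sketch below keeps only Canny's gradient stage (Sobel magnitude plus a threshold) and omits the smoothing, non-maximum suppression, and hysteresis steps of the full algorithm:

```python
import numpy as np

def sobel_edges(gray, thresh=0.25):
    """Binary edge map from Sobel gradient magnitude plus a threshold.

    A simplified stand-in for Canny: the real extractor adds Gaussian
    smoothing, non-maximum suppression, and hysteresis thresholding.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Correlate with both Sobel kernels via shifted views of the padding.
    for i in range(3):
        for j in range(3):
            win = pad[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-12  # normalize so the threshold is relative
    return mag > thresh
```

On a vertical step image, the edge map fires only along the boundary columns.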
SLIDE 10

Experiments

Original

160 color images, 10 per category, white background

Greyscale

As original but greyscale

Silhouette

As original, but only a manually created black mask

Edge

Canny edge extractor applied to the original dataset

Texture

For items with no textured areas, e.g. “bottles”, a cluster of those objects is considered as texture

SLIDE 11

Experiments

Filled silhouette experiment

Texture images masked inside the silhouettes; the textures had 360-degree data augmentation

Cue conflict experiment

Using iterative style transfer (Gatys et al., 2016) to combine original content images with original texture images

SLIDE 12

Cue Conflict Results

Human observers (red circles), AlexNet (purple diamonds), VGG-16 (blue triangles), GoogLeNet (turquoise circles), ResNet-50 (grey squares)
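The shape bias plotted in these results is, in the paper's metric, the fraction of decisions matching the shape category among the cue-conflict trials answered with either the shape or the texture category (other errors are ignored). A minimal sketch of that metric:

```python
def shape_bias(responses, shape_labels, texture_labels):
    """Fraction of cue-conflict decisions matching the shape category,
    counting only trials answered with either the shape or the texture
    category; all other responses are excluded from the denominator."""
    shape_hits = 0
    texture_hits = 0
    for resp, shape, texture in zip(responses, shape_labels, texture_labels):
        if resp == shape:
            shape_hits += 1
        elif resp == texture:
            texture_hits += 1
    total = shape_hits + texture_hits
    return shape_hits / total if total else float("nan")
```

For example, two shape-consistent answers and one texture-consistent answer (with any further responses off-category) give a shape bias of 2/3.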

SLIDE 13

Overcoming the texture bias

Stylized-ImageNet (SIN): created by applying AdaIN style transfer (Huang et al., 2017) to ImageNet images
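AdaIN itself is a simple operation: re-normalize each content feature channel to the style channel's mean and standard deviation (Huang et al., 2017). A NumPy sketch on raw (C, H, W) arrays; in the actual SIN pipeline this runs on VGG encoder feature maps, not pixels:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive Instance Normalization: shift/scale each content
    channel so its mean and std match the style channel's statistics.
    Both inputs are (C, H, W) feature maps."""
    c_mu = content.mean(axis=(1, 2), keepdims=True)
    c_sd = content.std(axis=(1, 2), keepdims=True)
    s_mu = style.mean(axis=(1, 2), keepdims=True)
    s_sd = style.std(axis=(1, 2), keepdims=True)
    # Whiten the content channel, then recolor with the style stats.
    return s_sd * (content - c_mu) / (c_sd + eps) + s_mu
```

After the transform, each output channel carries the style's per-channel statistics while retaining the content's spatial layout.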

SLIDE 14

Model Metrics

Top-5 accuracy of the Stylized-ImageNet-trained models compared to the ImageNet-trained models
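Top-5 accuracy counts a prediction as correct when the true label is among a model's five highest-scoring classes. A minimal NumPy sketch of the metric:

```python
import numpy as np

def top5_accuracy(logits, labels):
    """Top-5 accuracy: the true label must appear among the five
    highest-scoring classes. logits: (N, C) scores, labels: (N,)."""
    # argsort is ascending, so the last five columns are the top-5 classes.
    top5 = np.argsort(logits, axis=1)[:, -5:]
    hits = (top5 == labels[:, None]).any(axis=1)
    return hits.mean()
```

With six classes, a sample whose true class ranks last misses, while one whose true class ranks anywhere in the top five counts as a hit.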

SLIDE 15

Model Metrics

Top-5 accuracy of the Stylized-ImageNet-trained models compared to the ImageNet-trained models

Shape-ResNet is the model trained jointly on SIN and IN and fine-tuned on IN
SLIDE 16

Bias Results

Human observers (red circles), ResNet-50 trained on Stylized-ImageNet (orange squares), ResNet-50 trained on ImageNet (grey squares)

SLIDE 17

Distortion Robustness Results

SLIDE 18

Questions?