SLIDE 1

Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang

SLIDE 8

Understanding Adversarial Samples

  • Idea: is the classification result of a model mainly based on human-perceptible attributes?

[Figure: legitimate input (model: A.J. Buckley) vs. C&W2 attack (model: Isla Fisher), compared with human perception; pixel-wise differences magnified ×50]
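The figure magnifies the pixel-wise differences between the legitimate input and the C&W2 sample 50 times so that the otherwise invisible perturbation becomes visible. A minimal sketch of such a visualization, assuming 8-bit grayscale images stored as nested lists and clipping at 255 (how the slide's image was actually rendered is not stated; `magnified_difference` is a hypothetical helper):

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def magnified_difference(legit: Image, adversarial: Image, factor: int = 50) -> List[List[int]]:
    """Per-pixel |adversarial - legit| scaled by `factor` and clipped to the 8-bit maximum (assumed)."""
    if len(legit) != len(adversarial) or any(len(a) != len(b) for a, b in zip(legit, adversarial)):
        raise ValueError("images must have the same shape")
    return [
        # a: legitimate pixel, b: adversarial pixel
        [min(255, abs(b - a) * factor) for a, b in zip(row_l, row_a)]
        for row_l, row_a in zip(legit, adversarial)
    ]


legit = [[120, 121], [119, 200]]
adversarial = [[121, 121], [117, 206]]
print(magnified_difference(legit, adversarial))  # [[50, 0], [100, 255]]
```

A 1-level change becomes 50, while a 6-level change saturates at 255, which is why such visualizations look high-contrast even for tiny perturbations.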

SLIDE 16

Architecture of AmI

  • Input
  • Step 1: Landmark generation
  • Step 2: Attribute annotation (✓ Left eye ✓ Right eye ✓ Nose ✓ Mouth ✓ …)
  • Step 3: Attribute witness extraction
  • Step 4: Attribute-steered model, run alongside the original model
  • Step 5: Consistency observer (compares the outputs of the two models)
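Steps 4 and 5 can be sketched on a toy network. This is a minimal sketch under stated assumptions, not the authors' implementation: following the AmI paper's description, the attribute-steered model strengthens witness neurons and weakens the others, here through hypothetical scaling factors `strengthen` and `weaken` applied to one hidden layer of a two-layer ReLU network; the consistency observer is assumed to flag an input when the two models' top-1 labels differ. All function names and weights are illustrative.

```python
from typing import List, Sequence, Set

Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    # Dense layer without bias: one dot product per output neuron.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def argmax(v: Sequence[float]) -> int:
    return max(range(len(v)), key=lambda i: v[i])


def original_model(x: Sequence[float], w1: Matrix, w2: Matrix) -> int:
    # Top-1 label of a two-layer ReLU network.
    return argmax(matvec(w2, relu(matvec(w1, x))))


def steered_model(x: Sequence[float], w1: Matrix, w2: Matrix, witnesses: Set[int],
                  strengthen: float = 1.5, weaken: float = 0.2) -> int:
    hidden = relu(matvec(w1, x))
    # Hypothetical steering: amplify witness neurons, damp all other hidden neurons.
    hidden = [a * (strengthen if i in witnesses else weaken) for i, a in enumerate(hidden)]
    return argmax(matvec(w2, hidden))


def consistency_observer(x: Sequence[float], w1: Matrix, w2: Matrix, witnesses: Set[int]) -> bool:
    # True means "flag as adversarial": the two models disagree on the top-1 label.
    return original_model(x, w1, w2) != steered_model(x, w1, w2, witnesses)


# Toy weights: hidden neuron 0 is the (assumed) attribute witness, neuron 1 is not.
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 0.0], [0.0, 1.0]]
print(consistency_observer([1.0, 0.2], W1, W2, {0}))  # False
print(consistency_observer([1.0, 1.3], W1, W2, {0}))  # True
```

In the second call the perturbation inflates the non-witness neuron enough to flip the original model to label 1, while the steered model still follows the witness and answers 0, so the disagreement exposes the input.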
SLIDE 22

Challenges

  • Are there correspondences between attributes and neurons?
  • If so, how can the corresponding neurons be extracted?
  • Proposed approach: bi-directional reasoning
      • Forward: attribute changes → neuron activation changes
      • Backward: neuron activation changes → attribute changes
      • Backward: no attribute changes → no neuron activation changes
SLIDE 27

Attribute Witness Extraction

[Figure: Input → model; (A) attribute substitution → model → ⊖ → (C) feature variants; (B) attribute preservation → model → ⊖ → (D) feature invariants; C and D → (E) attribute witnesses]
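Steps A through E can be sketched as set operations over one layer's activations. Assumptions not stated on the slide: ⊖ is taken as the mean absolute activation difference from the original input, `vary_threshold` and `invariant_threshold` are hypothetical cutoffs, and E is taken as the intersection of C and D, i.e., neurons that change when the attribute is substituted (forward reasoning) and stay stable when it is preserved (backward reasoning).

```python
from typing import List, Sequence, Set


def mean_abs_change(base: Sequence[float], others: List[Sequence[float]]) -> List[float]:
    # The "⊖" step (assumed form): average |activation(other) - activation(base)| per neuron.
    return [sum(abs(o[i] - base[i]) for o in others) / len(others) for i in range(len(base))]


def extract_witnesses(base: Sequence[float], substituted: List[Sequence[float]],
                      preserved: List[Sequence[float]], vary_threshold: float = 0.3,
                      invariant_threshold: float = 0.1) -> Set[int]:
    # A -> C: neurons that change when the attribute is substituted (forward reasoning).
    variants = {i for i, d in enumerate(mean_abs_change(base, substituted)) if d > vary_threshold}
    # B -> D: neurons that stay stable when the attribute is preserved (backward reasoning).
    invariants = {i for i, d in enumerate(mean_abs_change(base, preserved)) if d < invariant_threshold}
    # E: assumed to be the neurons that satisfy both conditions.
    return variants & invariants


# Made-up activations of a 4-neuron layer for the input and its modified versions.
base = [0.9, 0.1, 0.5, 0.7]
substituted = [[0.2, 0.1, 0.5, 0.1], [0.3, 0.15, 0.55, 0.2]]
preserved = [[0.85, 0.6, 0.1, 0.2], [0.9, 0.5, 0.9, 0.75]]
print(extract_witnesses(base, substituted, preserved))  # {0}
```

Neuron 3 reacts to substitution but also moves when the attribute is preserved, so only neuron 0 passes both tests; this is how the two reasoning directions prune the candidate set.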

SLIDE 33

Experimental Results

  • Attribute witnesses
      • Fewer than 20 witnesses are extracted, even though each layer has 64–4096 neurons
  • Adversary detection
      • AmI achieves 94% detection accuracy across 7 different kinds of attacks, with 9.91% false positives on benign inputs
      • The state-of-the-art technique Feature Squeezing (NDSS '18) achieves only 55% accuracy, with 23.3% false positives, on face recognition systems
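Each detector is summarized by two rates computed over separate input sets. A minimal sketch of how they follow from per-input detector decisions, assuming "detection accuracy" means the fraction of adversarial inputs flagged and the false-positive rate means the fraction of benign inputs flagged; `detector_metrics` is a hypothetical helper and the toy flags are made up, unrelated to the reported results:

```python
from typing import Sequence, Tuple


def detector_metrics(adv_flags: Sequence[bool], benign_flags: Sequence[bool]) -> Tuple[float, float]:
    """Return (detection accuracy on adversarial inputs, false-positive rate on benign inputs)."""
    if not adv_flags or not benign_flags:
        raise ValueError("need at least one adversarial and one benign input")
    # True = the detector flagged the input as adversarial; sum() counts the True values.
    detection_accuracy = sum(adv_flags) / len(adv_flags)
    false_positive_rate = sum(benign_flags) / len(benign_flags)
    return detection_accuracy, false_positive_rate


print(detector_metrics([True, True, False, True], [False, True, False, False]))  # (0.75, 0.25)
```

Reporting both numbers matters: a detector that flags everything reaches 100% detection accuracy but also a 100% false-positive rate.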

SLIDE 34

Thank you! Please visit our poster #99

05:00-07:00 PM @ Room 210 & 230 AB