Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation (PowerPoint PPT Presentation)

SLIDE 1

Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation

Sahil Singla
Joint work with Eric Wallace, Shi Feng, Soheil Feizi
University of Maryland
Pacific Ballroom #69, 6:30-9:00 PM, June 13th 2019
https://github.com/singlasahil14/CASO

SLIDE 2
SLIDE 2

Why Deep Learning Interpretation?

SLIDE 4
SLIDE 4

Why Deep Learning Interpretation?

[Figure: a deep neural network classifies the input image as y = 0 (low-grade glioma).]

SLIDE 5
SLIDE 5

Why Deep Learning Interpretation?

We need to explain AI decisions to humans.

[Figure: a deep neural network classifies the input image as y = 0 (low-grade glioma); a saliency map highlights the salient features behind the prediction.]
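
A minimal sketch of the first-order recipe behind such saliency maps, in PyTorch; the tiny stand-in classifier and all names here are illustrative, not the talk's actual model:

    import torch

    # Stand-in classifier; any differentiable model (e.g. a ResNet) works.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3, padding=1), torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(),
        torch.nn.Linear(8, 10),
    ).eval()

    x = torch.rand(1, 3, 224, 224, requires_grad=True)  # stand-in input image
    logits = model(x)
    # Loss of the predicted class; one backward pass gives the gradient.
    loss = torch.nn.functional.cross_entropy(logits, logits.argmax(dim=1))
    loss.backward()
    saliency = x.grad.abs().max(dim=1).values[0]  # per-pixel importance map

The gradient is a purely linear, per-pixel notion of importance; the next slides spell out the two assumptions this bakes in.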

SLIDE 6
SLIDE 6

Assumptions of Current Methods

Loss function

SLIDE 7
SLIDE 7

Assumptions of Current Methods

Loss function

  • 1. Linear approximation of the loss
SLIDE 8
SLIDE 8

Assumptions of Current Methods

Loss function

  • 1. Linear approximation of the loss
  • 2. Isolated features: each feature i is perturbed while all other features are held fixed (both assumptions are written out below)
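
Written out, those two assumptions amount to the following (standard Taylor notation; the perturbation symbol Δ is mine, since the slide's equation did not survive extraction):

    % 1. Linear approximation of the loss around the input x:
    \ell(x + \Delta) \approx \ell(x) + \nabla_x \ell(x)^\top \Delta
    % 2. Isolated features: each coordinate is scored on its own, e.g. the
    %    saliency of feature i is
    I_i = \left| \frac{\partial \ell(x)}{\partial x_i} \right|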

SLIDE 10

Desiderata of a New Interpretation Framework

SLIDE 11
SLIDE 11

Desiderata of a New Interpretation Framework

Loss function

SLIDE 12
SLIDE 12

Desiderata of a New Interpretation Framework

Loss function

  • 1. Quadratic approximation of the loss

SLIDE 13
SLIDE 13

Desiderata of a New Interpretation Framework

Loss function

  • 1. Quadratic approximation of the loss
  • 2. Group features: find the group of k pixels that jointly maximizes the loss (the combined objective is written out below)
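
Combining the two desiderata gives the quadratic, group-sparse objective optimized by the paper's CASO framework; the weights λ1 and λ2 follow the accompanying repo, so treat this as a reconstruction rather than the slide's verbatim equation:

    % Quadratic approximation of the loss plus sparsity-inducing regularization:
    \Delta^{*} = \arg\max_{\Delta}\;
        \nabla_x \ell(x)^\top \Delta
        + \tfrac{1}{2}\,\Delta^\top H \Delta
        - \lambda_1 \lVert \Delta \rVert_1
        - \lambda_2 \lVert \Delta \rVert_2^2
    % H = \nabla_x^2 \ell(x) is the Hessian of the loss; the \ell_1 penalty
    % selects a sparse group of pixels, the \ell_2 penalty controls concavity.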

SLIDE 14
SLIDE 14

Confronting the Second-Order term

SLIDE 15
SLIDE 15

Confronting the Second-Order term

  • The maximization problem can be non-concave

SLIDE 16
SLIDE 16

Confronting the Second-Order term

  • The maximization problem can be non-concave
  • The Hessian can be VERY LARGE: ~150k x 150k for a 224 x 224 x 3 input

SLIDE 17
SLIDE 17

Confronting the Second-Order term

  • The maximization problem can be non-concave
  • The Hessian can be VERY LARGE: ~150k x 150k for a 224 x 224 x 3 input
  • The objective becomes concave for λ2 > L/2, where L is the largest eigenvalue of the Hessian (checked below)
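
A one-line check of that threshold, using the λ2-weighted L2 penalty from the objective above:

    % The quadratic part of the objective is
    \tfrac{1}{2}\,\Delta^\top H \Delta - \lambda_2 \lVert \Delta \rVert_2^2
        = \tfrac{1}{2}\,\Delta^\top \big(H - 2\lambda_2 I\big)\,\Delta
    % which is concave exactly when H - 2\lambda_2 I \preceq 0,
    % i.e. when \lambda_2 \geq L/2 with L the largest eigenvalue of H.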

SLIDE 18
SLIDE 18

Confronting the Second-Order term

  • The maximization problem can be non-concave
  • The Hessian can be VERY LARGE: ~150k x 150k for a 224 x 224 x 3 input
  • The objective becomes concave for λ2 > L/2, where L is the largest eigenvalue of the Hessian
  • Hessian-vector products can be computed efficiently, without ever materializing the Hessian (sketched below)
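
Both tricks in a minimal PyTorch sketch: a Hessian-vector product by differentiating twice (the Hessian itself is never formed), plus power iteration on top of it to estimate L. Function names are illustrative, not the CASO repo's API:

    import torch

    def hvp(loss, x, v):
        """Hessian-vector product H @ v via double backprop."""
        (g,) = torch.autograd.grad(loss, x, create_graph=True)
        (hv,) = torch.autograd.grad(g, x, grad_outputs=v, retain_graph=True)
        return hv

    def largest_eigenvalue(loss, x, iters=20):
        """Power iteration using only Hessian-vector products; returns an
        estimate of the Hessian eigenvalue of largest magnitude (our L)."""
        v = torch.randn_like(x)
        v = v / v.norm()
        for _ in range(iters):
            v = hvp(loss, x, v)
            v = v / v.norm()
        return torch.dot(v.flatten(), hvp(loss, x, v).flatten()).item()

    # Toy check: for loss = 0.5 * x^T A x the Hessian is exactly A.
    x = torch.randn(5, requires_grad=True)
    A = torch.randn(5, 5)
    A = (A + A.T) / 2
    print(largest_eigenvalue(0.5 * x @ A @ x, x))

With an estimate of L in hand, λ2 can be set just above L/2 so that the maximization becomes concave.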

SLIDE 19
SLIDE 19

When Does Second-Order Matter?

SLIDE 20
SLIDE 20

When Does Second-Order Matter?

  • Theorem: For a deep ReLU network:

SLIDE 21
SLIDE 21

When Does Second-Order Matter?

  • Theorem: For a deep ReLU network, if the probability of the predicted class is close to one and the number of classes is large, the Hessian's impact vanishes and the second-order interpretation reduces to the first-order one (intuition sketched below).
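
A sketch of the intuition, using standard facts about ReLU networks and cross-entropy (my derivation, not the slide's verbatim proof):

    % A ReLU network is piecewise linear, so inside a linear region the logits
    % are z(x) = W x + b, and the cross-entropy Hessian w.r.t. the input is
    H = W^\top \big(\operatorname{diag}(p) - p\,p^\top\big)\, W ,
    \qquad p = \operatorname{softmax}(z(x))
    % As the predicted-class probability approaches one, p becomes one-hot and
    % diag(p) - p p^\top \to 0, so the Hessian's contribution vanishes and the
    % second-order interpretation collapses to the first-order one.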

SLIDE 22
SLIDE 22

Empirical results on the impact of Hessian

SLIDE 23
SLIDE 23

Empirical results on the impact of Hessian

[Plot: impact of the Hessian vs. confidence of the predicted class, for ResNet-50 (uses only ReLU).]

SLIDE 24
SLIDE 24

Empirical results on the impact of Hessian

[Plots: impact of the Hessian vs. confidence of the predicted class, for ResNet-50 (uses only ReLU) and SE-ResNet-50 (uses Sigmoid).]

SLIDE 25
SLIDE 25

Second-Order vs. First-Order (qualitative)

SLIDE 28
SLIDE 28

Confronting the L1 term

SLIDE 30
SLIDE 30

Confronting the L1 term

  • The L1 term is non-smooth

[Plot: y = |x|, which is not smooth at 0.]

SLIDE 31
SLIDE 31

Confronting the L1 term

  • The L1 term is non-smooth

[Plot: y = |x|, which is not smooth at 0.]

  • How do we select λ1?
SLIDE 32
SLIDE 32

Confronting the L1 term

  • The L1 term is non-smooth

[Plot: y = |x|, which is not smooth at 0.]

  • Use proximal gradient descent to optimize the non-smooth objective.
  • How do we select λ1?
SLIDE 33
SLIDE 33

Confronting the L1 term

  • The L1 term is non-smooth

[Plot: y = |x|, which is not smooth at 0.]

  • Use proximal gradient descent to optimize the non-smooth objective (one step is sketched below).
  • How do we select λ1? Select the value whose solution's sparsity falls within the range (0.75, 1).
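
A minimal sketch of one proximal-gradient step for this objective and of the sparsity check used to tune λ1; helper names are illustrative, not the CASO repo's API:

    import torch

    def soft_threshold(z, t):
        """Proximal operator of t * ||.||_1 (soft-thresholding)."""
        return torch.sign(z) * torch.clamp(z.abs() - t, min=0.0)

    def proximal_ascent_step(delta, smooth_grad, lr, lam1):
        """Gradient step on the smooth terms (gradient, Hessian, and L2
        penalties), followed by the exact prox of the non-smooth L1 term."""
        return soft_threshold(delta + lr * smooth_grad, lr * lam1)

    def sparsity(delta, tol=1e-8):
        """Fraction of zeroed entries; lam1 is searched until this lies in
        the target range (0.75, 1)."""
        return (delta.abs() <= tol).float().mean().item()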
SLIDE 34
SLIDE 34

Impact of Group Features

SLIDE 35
SLIDE 35

Impact of Group Features

[Saliency map: First-Order]

SLIDE 36
SLIDE 36

Impact of Group Features

[Saliency maps: First-Order vs. Second-Order]

SLIDE 37
SLIDE 37

Conclusions

  • A new formulation for interpretation
    ➢ Second-Order information
    ➢ Group Features
  • Efficient Computation

Pacific Ballroom #69, 6:30-9:00 PM
https://github.com/singlasahil14/CASO