SLIDE 1
Understanding Impacts of High-Order Loss Approximations and Features in Deep Learning Interpretation
Sahil Singla
Joint work with Eric Wallace, Shi Feng, Soheil Feizi
University of Maryland
Pacific Ballroom #69, 6:30-9:00 PM, June 13th 2019
SLIDE 3
Why Deep Learning Interpretation?
SLIDE 4
Why Deep Learning Interpretation?
[Figure: brain MRI → deep neural network → classified as y=0 (low-grade glioma)]
SLIDE 5
Why Deep Learning Interpretation?
We need to explain AI decisions to humans.
[Figure: brain MRI → deep neural network → classified as y=0 (low-grade glioma), with a saliency map highlighting salient features]
SLIDE 6
Assumptions of Current Methods
Loss function: ℓ(x + Δ), the loss of the network at the perturbed input x + Δ
SLIDE 7
Assumptions of Current Methods
Loss function: ℓ(x + Δ) ≈ ℓ(x) + ∇ℓ(x)ᵀΔ
- 1. Linear approximation of the loss
SLIDE 8
Assumptions of Current Methods
Loss function: ℓ(x + Δ) ≈ ℓ(x) + ∇ℓ(x)ᵀΔ
- 1. Linear approximation of the loss
- 2. Isolated features: perturb each feature (i) keeping all other features fixed (see the gradient-saliency sketch below)
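Under these two assumptions, the interpretation reduces to the standard vanilla-gradient saliency map: feature i is scored by |∇ℓ(x)ᵢ|. A minimal PyTorch sketch, where `model`, `image`, and `label` are hypothetical placeholders:

```python
import torch
import torch.nn.functional as F

def gradient_saliency(model, image, label):
    """First-order saliency: per-pixel magnitude of the loss gradient."""
    x = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), label.unsqueeze(0))
    loss.backward()
    # Each feature is scored in isolation by its gradient magnitude.
    return x.grad.abs()
```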
SLIDE 10
Desiderata of a New Interpretation Framework
SLIDE 11
Desiderata of a New Interpretation Framework
Loss function: ℓ(x + Δ), the loss of the network at the perturbed input x + Δ
SLIDE 12
Desiderata of a New Interpretation Framework
Loss function: ℓ(x + Δ) ≈ ℓ(x) + ∇ℓ(x)ᵀΔ + ½ ΔᵀHΔ, where H = ∇²ℓ(x)
- 1. Quadratic approximation of the loss
SLIDE 13
Desiderata of a New Interpretation Framework
Loss function: ℓ(x + Δ) ≈ ℓ(x) + ∇ℓ(x)ᵀΔ + ½ ΔᵀHΔ, where H = ∇²ℓ(x)
- 1. Quadratic approximation of the loss
- 2. Group features: find the group of k pixels that maximizes the loss (relaxed via L1/L2 penalties; see the sketch after this slide)
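Relaxing the hard k-pixel constraint with an L1 penalty (revisited later in the deck) and adding an L2 penalty gives a regularized objective to maximize over the perturbation Δ. A minimal sketch of evaluating it, assuming the gradient g and a Hessian-vector product Hd = HΔ come from autograd; `second_order_objective`, `lam1`, and `lam2` are hypothetical names:

```python
import torch

def second_order_objective(g, Hd, delta, lam1, lam2):
    """Regularized second-order objective, to be maximized over delta.

    g:  loss gradient at the input, flattened to shape (d,)
    Hd: Hessian-vector product H @ delta, shape (d,)
    The L1 penalty encourages a sparse group of salient pixels; the L2
    penalty keeps the maximization well behaved (see the concavity slide).
    """
    return (g @ delta + 0.5 * (delta @ Hd)
            - lam1 * delta.abs().sum()
            - lam2 * delta.pow(2).sum())
```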
SLIDE 14
Confronting the Second-Order term
SLIDE 15
Confronting the Second-Order term
- Optimization can be non-concave: the maximization objective contains ½ ΔᵀHΔ, which is convex (not concave) in Δ when H has positive eigenvalues
SLIDE 16
Confronting the Second-Order term
- Optimization can be non-concave: the maximization objective contains ½ ΔᵀHΔ, which is convex (not concave) in Δ when H has positive eigenvalues
- Hessian can be VERY LARGE: ~150k × 150k for a 224 × 224 × 3 input (150,528 dimensions)
SLIDE 17
Confronting the Second-Order term
- Optimization can be non-concave: the maximization objective contains ½ ΔᵀHΔ, which is convex (not concave) in Δ when H has positive eigenvalues
- Fix: the objective is concave for λ₂ > L/2, where L is the largest eigenvalue of the Hessian H
- Hessian can be VERY LARGE: ~150k × 150k for a 224 × 224 × 3 input (150,528 dimensions)
SLIDE 18
Confronting the Second-Order term
- Optimization can be non-concave: the maximization objective contains ½ ΔᵀHΔ, which is convex (not concave) in Δ when H has positive eigenvalues
- Fix: the objective is concave for λ₂ > L/2, where L is the largest eigenvalue of the Hessian H
- Hessian can be VERY LARGE: ~150k × 150k for a 224 × 224 × 3 input (150,528 dimensions)
- Fix: we can efficiently compute Hessian-vector products without ever forming H (see the sketch below)
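A minimal PyTorch sketch of matrix-free Hessian-vector products via double backprop, plus power iteration to estimate the largest eigenvalue L needed for the concavity condition. It assumes `loss` is a scalar computed from an input `x` with `requires_grad=True`; the helper names are hypothetical, and power iteration is assumed to converge to a positive dominant eigenvalue:

```python
import torch

def hvp(loss, x, v):
    """Hessian-vector product H @ v by differentiating the gradient (no explicit H)."""
    g = torch.autograd.grad(loss, x, create_graph=True)[0]
    return torch.autograd.grad((g * v).sum(), x, retain_graph=True)[0]

def largest_eigenvalue(loss, x, iters=20):
    """Estimate L = lambda_max(H) with power iteration, using only HVPs."""
    v = torch.randn_like(x)
    v /= v.norm()
    for _ in range(iters):
        v = hvp(loss, x, v)
        v /= v.norm()
    return (v * hvp(loss, x, v)).sum()  # Rayleigh quotient v^T H v
```

Each HVP costs roughly two backward passes, so L can be estimated without materializing the ~150k × 150k matrix.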
SLIDE 19
When Does Second-Order Matter?
SLIDE 20
When Does Second-Order Matter?
- Theorem: For a deep ReLU network, the logits are piecewise linear in the input, so the Hessian of the softmax cross-entropy loss is H = Jᵀ(diag(p) − p pᵀ)J, where J is the Jacobian of the logits and p the vector of predicted probabilities
SLIDE 21
When Does Second-Order Matter?
- Theorem: For a deep ReLU network, the logits are piecewise linear in the input, so the Hessian of the softmax cross-entropy loss is H = Jᵀ(diag(p) − p pᵀ)J, where J is the Jacobian of the logits and p the vector of predicted probabilities
- Theorem: If the probability of the predicted class is close to one and the number of classes is large, the second-order interpretation is approximately equal to the first-order interpretation (intuition sketched below)
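One way to see the second theorem, assuming the Hessian decomposition above (a sketch of the intuition, not the paper's full proof): as the predicted probability vector p approaches the one-hot vector e_y, the softmax curvature factor vanishes.

```latex
H = J^{\top}\bigl(\operatorname{diag}(p) - p p^{\top}\bigr) J,
\qquad
p \to e_y
\;\Rightarrow\;
\operatorname{diag}(p) - p p^{\top}
\to \operatorname{diag}(e_y) - e_y e_y^{\top} = 0
```

So H → 0, the quadratic term drops out of the objective, and the second-order interpretation reduces to the first-order one.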
SLIDE 22
Empirical results on the impact of the Hessian
SLIDE 23
Empirical results on the impact of the Hessian
[Plot: RESNET-50 (uses only ReLU); x-axis: confidence of predicted class]
SLIDE 24
Empirical results on the impact of the Hessian
[Plots: RESNET-50 (uses only ReLU) and SE-RESNET-50 (uses Sigmoid); x-axis of each: confidence of predicted class]
SLIDE 25
Second-Order vs. First-Order (qualitative)
SLIDE 28
Confronting the L1 term
SLIDE 30
Confronting the L1 term
- The λ₁‖Δ‖₁ term is non-smooth
[Plot: y = |x|, not smooth at 0]
SLIDE 31
Confronting the L1 term
- The λ₁‖Δ‖₁ term is non-smooth
[Plot: y = |x|, not smooth at 0]
- How to select λ₁?
SLIDE 32
Confronting the L1 term
- The λ₁‖Δ‖₁ term is non-smooth
[Plot: y = |x|, not smooth at 0]
- Fix: use proximal gradient descent to optimize the objective
- How to select λ₁?
SLIDE 33
Confronting the L1 term
- The λ₁‖Δ‖₁ term is non-smooth
[Plot: y = |x|, not smooth at 0]
- Fix: use proximal gradient descent to optimize the objective (see the sketch below)
- How to select λ₁? Select the value that induces sparsity of Δ within the range (0.75, 1)
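A minimal sketch of one proximal-gradient (ISTA-style) ascent step, plus the sparsity check used to tune λ₁. The proximal operator of the L1 penalty is coordinate-wise soft-thresholding; `grad_smooth` stands for the gradient of the smooth part of the objective (∇ℓ(x) + HΔ − 2λ₂Δ under the quadratic approximation), and all helper names are hypothetical:

```python
import torch

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrink every coordinate toward 0 by t."""
    return torch.sign(z) * torch.clamp(z.abs() - t, min=0.0)

def proximal_step(delta, grad_smooth, lr, lam1):
    """Ascent step on the smooth part of the objective, then the L1 proximal map."""
    return soft_threshold(delta + lr * grad_smooth, lr * lam1)

def sparsity(delta):
    """Fraction of exactly-zero features in delta."""
    return (delta == 0).float().mean().item()
```

One can then bisect on λ₁: increase it while sparsity(Δ) is below 0.75, decrease it if Δ collapses to all zeros.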
SLIDE 34
Impact of Group Features
SLIDE 35
Impact of Group Features
[Saliency maps: First-Order]
SLIDE 36
Impact of Group Features
[Saliency maps: First-Order vs. Second-Order]
SLIDE 37
Conclusions
- A new formulation for interpretation
- Efficient computation
Code: https://github.com/singlasahil14/CASO
Pacific Ballroom #69, 6:30-9:00 PM