

SLIDE 1

The Mythos of Model Interpretability

Zachary C. Lipton https://arxiv.org/abs/1606.03490

SLIDE 2

Outline

  • What is interpretability?
  • What are its desiderata?
  • What model properties confer interpretability?
  • Caveats, pitfalls, and takeaways
SLIDE 3

What is Interpretability?

  • Many papers make axiomatic claims that some model is interpretable and therefore preferable
  • But what interpretability is and precisely what desiderata it serves are seldom defined
  • Does interpretability hold consistent meaning across papers?

SLIDE 4

Inconsistent Definitions

  • Papers use the words interpretable, explainable, intelligible, transparent, and understandable, both interchangeably (within papers) and inconsistently (across papers)
  • One common thread, however, is that interpretability is something other than performance

SLIDE 5

We want good models

[Diagram: a model judged by an evaluation metric]

SLIDE 6

We also want interpretable models

[Diagram: evaluation metric plus interpretation]

SLIDE 7

The Human Wants Something the Metric Doesn’t

[Diagram: evaluation metric plus interpretation]

SLIDE 8

What Gives?

  • So either the metric captures everything and people seeking interpretable models are crazy, or…
  • The metrics / loss functions we optimize are fundamentally mismatched with real-life objectives
  • We hope to refine the discourse on interpretability, introducing more specific language
  • Through the lens of the literature, we create a taxonomy of both objectives & methods

SLIDE 9

Outline

  • What is interpretability?
  • What are its desiderata?
  • What model properties confer interpretability?
  • Caveats, pitfalls, and takeaways
SLIDE 10

Trust

  • Does the model know when it’s uncertain?
  • Does the model make the same mistakes as a human?
  • Are we comfortable with the model?

SLIDE 11

Causality

  • We may want models to tell us something about the natural world
  • Supervised models are trained simply to make predictions, but are often used to take actions
  • Caruana (2015) shows a mortality predictor (for use in triage) that assigns lower risk to asthma patients

SLIDE 12

Transferability

  • The idealized training setups often differ from the real world
  • The real problem may be non-stationary, noisier, etc.
  • Want sanity-checks that the model doesn’t depend on weaknesses in the setup
SLIDE 13

Informativeness

  • We may train a model to make a decision
  • But its real purpose is to aid a person in making a decision
  • Thus an interpretation may simply be valuable for the extra bits it carries

SLIDE 14

Outline

  • What is interpretability?
  • What are its desiderata?
  • What model properties confer interpretability?
  • Caveats, pitfalls, and takeaways
SLIDE 15

Transparency

  • Proposed solutions conferring interpretability tend to fall into two categories
  • Transparency addresses understanding how the model works
  • Explainability concerns the model’s ability to offer some (potentially post-hoc) explanation

SLIDE 16

Simulatability

  • One notion of transparency is simplicity
  • This accords with papers advocating small decision trees
  • A model is transparent if a person can step through the algorithm in reasonable time (see the sketch below)
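
A minimal sketch of simulatability, assuming scikit-learn and its bundled iris dataset: a depth-2 decision tree yields a handful of if/else rules a person can step through by hand.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2).fit(data.data, data.target)

# Print the small rule set a human can simulate directly.
print(export_text(tree, feature_names=data.feature_names))
```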

SLIDE 17

Decomposability

  • A relaxed notion requires understanding individual components of a model
  • Such as: the weights of a linear model or the nodes of a decision tree (see the sketch below)
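
A minimal sketch of decomposability, assuming scikit-learn and its bundled diabetes dataset: each weight of a standardized linear model can be read off as one individually meaningful component.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

data = load_diabetes()
X = StandardScaler().fit_transform(data.data)  # put features on a common scale
model = LinearRegression().fit(X, data.target)

# Each coefficient is an inspectable piece of the model.
for name, weight in zip(data.feature_names, model.coef_):
    print(f"{name}: {weight:+.1f}")
```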

SLIDE 18

Transparent Algorithms

  • A yet weaker notion would require only that we understand the behavior of the algorithm
  • E.g. convergence of convex optimization, generalization bounds (see the bound below)
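
As one standard illustration (a textbook fact, not from the slides): gradient descent with step size $1/L$ on a convex, $L$-smooth objective $f$ satisfies

$$f(x_t) - f(x^*) \le \frac{L \,\lVert x_0 - x^* \rVert^2}{2t},$$

so we understand the algorithm’s behavior even when the learned model itself is opaque.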


SLIDE 19

Post-Hoc Interpretability

“Ah yes, something cool is happening in node 75, 345, 167… maybe it sees a cat? Maybe we’ll see something awesome if we jiggle the inputs?”
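
“Jiggling the inputs” can be made concrete. A minimal sketch, assuming a hypothetical black-box `predict` function that returns a score for one class: perturb each input dimension and watch how much the score moves.

```python
import numpy as np

def perturbation_importance(predict, x: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Score each feature by how much jiggling it changes the prediction."""
    base = predict(x)
    importance = np.zeros(len(x))
    for i in range(len(x)):
        bumped = x.copy()
        bumped[i] += eps  # jiggle one input dimension
        importance[i] = abs(predict(bumped) - base)
    return importance
```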

SLIDE 20

Verbal Explanations

  • Just as people generate explanations (absent transparency), we might train a (possibly separate) model to generate explanations
  • We might consider image captions as interpretations of object predictions (Image: Karpathy et al 2015)

SLIDE 21

Saliency Maps

  • While the full relationship between input and output might be impossible to describe succinctly, local explanations are potentially useful (see the sketch below) (Image: Wang et al 2016)
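
A minimal sketch of one common saliency recipe, assuming PyTorch and a hypothetical pretrained classifier `model`: the gradient of the top class score with respect to the input pixels gives a local, per-pixel explanation.

```python
import torch

def saliency_map(model: torch.nn.Module, image: torch.Tensor) -> torch.Tensor:
    """Gradient saliency for a single (channels, height, width) image."""
    image = image.clone().requires_grad_(True)
    scores = model(image.unsqueeze(0))     # shape: (1, num_classes)
    scores[0, scores.argmax()].backward()  # d(top class score) / d(input)
    # Max over color channels: one importance value per pixel.
    return image.grad.abs().max(dim=0).values
```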

SLIDE 22

Case-Based Explanations

  • Another way to generate a post-hoc explanation might be to retrieve labeled items that are deemed similar by the model (see the sketch below)
  • For some models, we can retrieve histories from similar patients (Image: Mikolov et al 2014)
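
A minimal sketch, assuming the model exposes an embedding vector for each item (the embedding step itself is hypothetical here): explain a prediction by returning the most similar labeled training items.

```python
import numpy as np

def similar_cases(query_vec: np.ndarray, train_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k training items nearest the query in embedding space."""
    # Cosine similarity between the query and every training embedding.
    sims = train_vecs @ query_vec / (
        np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    return np.argsort(-sims)[:k]
```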

SLIDE 23

Outline

  • What is interpretability?
  • What are its desiderata?
  • What model properties confer interpretability?
  • Caveats, pitfalls, and takeaways
SLIDE 24

Discussion Points

  • Linear models not strictly more interpretable than deep learning

  • Claims about interpretability must be qualified
  • Transparency may be at odds with the goals of AI
  • Post-hoc interpretations may potentially mislead
SLIDE 25

Thanks!

Acknowledgments: Zachary C. Lipton was supported by the Division of Biomedical Informatics at UCSD, via training grant (T15LM011271) from the NIH/NLM. Thanks to Charles Elkan, Julian McAuley, David Kale, Maggie Makar, Been Kim, Lihong Li, Rich Caruana, Daniel Fried, Jack Berkowitz, & Sepp Hochreiter

References:
  • The Mythos of Model Interpretability (ICML Workshop on Human Interpretability 2016) - ZC Lipton http://arxiv.org/abs/1606.03490
  • Directly Modeling Missing Data with RNNs (MLHC 2016) - ZC Lipton, DC Kale, R Wetzel http://arxiv.org/abs/1606.04130
  • Learning to Diagnose (ICLR 2016) - ZC Lipton, DC Kale, C Elkan, R Wetzel http://arxiv.org/abs/1511.03677
  • Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission (KDD 2015) - R Caruana et al. http://dl.acm.org/citation.cfm?id=2788613