Robust and Stable Black Box Explanations - Hima Lakkaraju - PowerPoint PPT Presentation



SLIDE 1

Robust and Stable Black Box Explanations

Hima Lakkaraju (Harvard University), Nino Arsov (Macedonian Academy of Arts & Sciences), Osbert Bastani (University of Pennsylvania)
SLIDE 2

Motivation

§ ML models are increasingly proprietary and complex, and are therefore not interpretable
§ Several post hoc explanation techniques proposed in recent literature

§ E.g., LIME, SHAP, MUSE, Anchors, MAPLE

SLIDE 3

Motivation

§ However, post hoc explanations have been shown to be unstable and unreliable

§ Small perturbations to the input can substantially change the explanations; running the same algorithm multiple times results in different explanations (Ghorbani et al.)
§ High-fidelity explanations can have very different covariates than the black box (Lakkaraju & Bastani)
§ They are also not robust to distribution shifts

SLIDE 4

Why can explanations be unstable?

§ Distribution P(x1, x2) where x1 and x2 are perfectly correlated
§ Black box f*(x1, x2) = I(x1 ≥ 0)
§ Explanation e(x1, x2) = I(x2 ≥ 0)
§ e has perfect fidelity, but is completely different from f*!

§ If P(x1, x2) shifts, e may no longer have high fidelity
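This failure mode is easy to reproduce. Below is a minimal sketch (synthetic data, NumPy) of the example on this slide: the explanation e = I(x2 ≥ 0) matches the black box f* = I(x1 ≥ 0) perfectly while x1 and x2 are perfectly correlated, then breaks once x2 shifts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Perfectly correlated covariates: x2 is an exact copy of x1.
x1 = rng.normal(size=10_000)
x2 = x1.copy()

def blackbox(a, b):
    return (a >= 0).astype(int)       # f*(x1, x2) = I(x1 >= 0)

def explanation(a, b):
    return (b >= 0).astype(int)       # e(x1, x2) = I(x2 >= 0)

# On the original distribution the explanation has perfect fidelity.
fidelity = np.mean(blackbox(x1, x2) == explanation(x1, x2))

# Shift x2 only: the correlation breaks and fidelity collapses.
x2_shifted = x2 - 1.0
fid_shift = np.mean(blackbox(x1, x2_shifted) == explanation(x1, x2_shifted))
print(fidelity, fid_shift)  # 1.0 vs. roughly 0.66
```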

SLIDE 5

Why do we care?

§ Domain experts rely on explanations to validate properties of the black box model

§ Check if model uses spurious or sensitive attributes [Caruana 2015, Bastani 2017, Rudin 2019]

§ Poor explanations may mislead experts into drawing incorrect conclusions

SLIDE 6

Our Contributions: ROPE

§ We propose ROPE (RObust Post hoc Explanations)

§ Framework for generating stable and robust explanations
§ It is flexible, e.g., it can be instantiated for local vs. global explanations as well as linear vs. rule-based explanations
§ First approach to generating explanations robust to distribution shifts
§ Our experiments show that ROPE significantly improves robustness on real-world distribution shifts

SLIDE 7

Robust Learning Objective

§ ROPE ensures robustness via a minimax objective
§ The maximum in the objective is over possible distribution shifts P'(x) = P(x - δ)
§ This ensures the explanation e has high fidelity for all distributions P'(x)


[Equation annotations: standard supervised learning loss for P'(x); worst-case taken over distribution shifts]
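The objective itself did not survive the transcript. A plausible LaTeX reconstruction, consistent with the notation on this slide (explanation e, black box f*, shift δ ∈ Δ) and offered only as a sketch:

```latex
% Reconstruction (sketch): minimize the worst-case disagreement with the
% black box f^* over all shifts \delta \in \Delta; \ell is a fidelity loss,
% e.g., the 0/1 disagreement.
\min_{e} \; \max_{\delta \in \Delta} \;
  \mathbb{E}_{x \sim \mathcal{P}}
  \Big[ \ell\big( e(x + \delta),\, f^*(x + \delta) \big) \Big]
```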

SLIDE 8

Robust Learning Objective


§ We can upper bound the objective as follows:
§ Thus, we can approximate the explanation e as follows:

SLIDE 9

Class of Distribution Shifts

§ Key question: How to choose Δ?

§ Determines the distributions P' to which the explanation e is robust

§ Our choice

§ The L0 constraint induces sparsity, i.e., only a few covariates are perturbed
§ The L∞ constraint bounds the magnitude of the perturbation, i.e., covariates do not change too much
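Concretely, the class described by the two bullets above can be written as follows (a reconstruction; the bound names k and b are illustrative, not taken from the slide):

```latex
% Sparse, bounded perturbations: at most k covariates move, each by at most b.
\Delta = \{\, \delta \in \mathbb{R}^d \;:\; \|\delta\|_0 \le k,\;
              \|\delta\|_\infty \le b \,\}
```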

SLIDE 10

Robust Linear Explanations

§ Use adversarial training, i.e., approximate stochastic gradient descent on the objective
§ The worst-case shift δ* can be approximated using a linear program
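The adversarial-training loop can be sketched as below. This is not the paper's method verbatim: the LP-based inner maximization is replaced here by a search over a user-supplied set of candidate shifts, and all names are illustrative.

```python
import numpy as np

def fit_robust_linear_explanation(X, blackbox, deltas, lr=0.1, epochs=50):
    """Adversarial-training sketch: each step picks the candidate shift that
    currently hurts fidelity most, then takes a gradient step on the logistic
    loss at the shifted points (a stand-in for ROPE's LP-based inner max)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0

    def loss_at(delta):
        Xs = X + delta
        y = blackbox(Xs)                          # labels come from the black box
        p = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))   # explanation's probability
        nll = np.mean(-y * np.log(p + 1e-9) - (1 - y) * np.log(1 - p + 1e-9))
        return nll, Xs, y, p

    for _ in range(epochs):
        # Inner max: worst candidate shift for the current (w, b).
        worst = max(deltas, key=lambda delta: loss_at(delta)[0])
        # Outer min: one gradient step against that shift.
        _, Xs, y, p = loss_at(worst)
        g = p - y
        w -= lr * (Xs.T @ g) / n
        b -= lr * np.mean(g)
    return w, b
```

For a black box like I(x1 ≥ 0), training against shifts of either sign concentrates the learned weight on the first covariate rather than on any correlated one.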



SLIDE 11

Robust Rule Based Explanations

§ Approximate the objective using sampling
§ Adjust the learning algorithm to handle a maximum over a finite set

§ For rule lists and decision sets, only count a point (x, f*(x)) as correct if e(x) = f*(x + δ) for all of the possible perturbations δ


[Figure: distribution over shifts δ ∈ Δ]
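The counting rule on this slide can be sketched in a few lines (names illustrative; `sampled_deltas` plays the role of the sampled shifts δ ∈ Δ):

```python
import numpy as np

def robust_accuracy(X, explanation, blackbox, sampled_deltas):
    """Count a point as correct only if the explanation's prediction matches
    the black box at *every* sampled perturbation of that point."""
    correct = np.ones(len(X), dtype=bool)
    for delta in sampled_deltas:
        correct &= explanation(X) == blackbox(X + delta)
    return correct.mean()
```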

SLIDE 12

Experimental Evaluation

§ Real-world distribution shifts
§ Approach

§ Generate explanation on one distribution (e.g., first court)
§ Evaluate fidelity on shifted distribution (e.g., second court)


Dataset    | # of Cases                 | Attributes                                                  | Outcomes
Bail       | 31K defendants (2 courts)  | Criminal History, Demographic Attributes, Current Offenses  | Bail (Yes/No)
Healthcare | 22K patients (2 hospitals) | Symptoms, Demographic Attributes, Current & Past Conditions | Diabetes (Yes/No)
Academic   | 19K students (2 schools)   | Grades, Absence Rates, Suspensions, Tardiness Scores        | Graduated High School on Time (Yes/No)
SLIDE 13

Experimental Evaluation

§ Baselines

§ LIME, SHAP, MUSE
§ All state-of-the-art post hoc explanation tools

§ Instantiations of ROPE

§ Linear models (comparison to LIME and SHAP)
§ Decision sets (comparison to MUSE)
§ Focus on global explanations

SLIDE 14

Robustness to Real Distribution Shifts

§ Report fidelity on both original and shifted distributions, as well as percentage drop in fidelity
§ ROPE is substantially more robust without sacrificing fidelity on original distribution
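The two quantities reported here can be sketched as follows (fidelity as agreement with the black box, and the percentage drop between the original and shifted distributions; the function names are illustrative, not from the paper):

```python
def fidelity(expl_preds, bb_preds):
    """Fraction of points on which the explanation agrees with the black box."""
    return sum(e == b for e, b in zip(expl_preds, bb_preds)) / len(bb_preds)

def pct_drop(fid_original, fid_shifted):
    """Percentage drop in fidelity from the original to the shifted distribution."""
    return 100.0 * (fid_original - fid_shifted) / fid_original
```

For example, an explanation whose fidelity falls from 0.90 on the original distribution to 0.72 after the shift has a 20% drop.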

SLIDE 15

Percentage Drop in Fidelity vs. Size of Distribution Shift

§ Use synthetic data and vary size of shift
§ Report percentage drop in fidelity

SLIDE 16

Structural Match with the Black Box

§ Choose “black box” from the same model class as explanation (e.g., linear or decision set)
§ Report match between explanation and black box
§ ROPE explanations match black box substantially better

SLIDE 17

Conclusions

§ We have proposed the first framework for generating stable and robust explanations
§ Our approach significantly improves explanation robustness to real-world distribution shifts
