  1. Explaining Machine Learning Models Armen Donigian Director of Data Science Engineering

  2. Roadmap
     + Definition of Interpretability
     + The Need for Interpretability
     + Role of Interpretability in the Data Science Process
     + Relevant Application Domains
     + Barriers to Adoption
     + Conveying Interpretations
     + Research Directions

  3. Working Definition of Interpretability: “The ability to explain or to present in understandable terms to a human.” (from the paper "Towards A Rigorous Science of Interpretable Machine Learning")

  4. The Need for Interpretability
     In supervised ML, we learn a model to accomplish a specific goal by minimizing a loss function. The purpose of interpretability is to trust & understand how the model uses inputs to make predictions. Validation loss is not enough! We can't encode the needs below into a single loss function:
     - Bias: non-stationarity
     - Fairness: overlooked gender-biased word embeddings (or other protected classes); refer to the paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings"
     - Safety: infeasible to test all failure scenarios
     - Regulatory compliance: Adverse Action & Disparate Impact
     - Mismatched objectives: a single-objective model overly associates wolves with snow; multi-objective trade-offs, e.g. privacy vs. prediction quality
     - Security: is the model vulnerable to an adversarial user? e.g. a user asks for an increase in their credit ceiling to increase their credit score
     (Figure: train/validation/test split diagram)

  5. Interpretability: The Need to Keep Up
     As our methods for learning patterns from data become more complex (e.g. GoogLeNet), their failure modes become less intuitive. A key failure mode is adversarial examples: small, carefully constructed noise changes the prediction (Figure 1, from the paper "Explaining and Harnessing Adversarial Examples").
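
As a concrete illustration of "small, carefully constructed noise", here is a minimal sketch of the fast gradient sign method from that paper, written in PyTorch; the model, input, label, and epsilon below are placeholders (assumptions), not something shown in the talk.

    # Minimal fast gradient sign method (FGSM) sketch from "Explaining and
    # Harnessing Adversarial Examples". Model, input, label, and epsilon
    # are placeholders.
    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, label, epsilon=0.01):
        """Return the input plus a small, carefully constructed perturbation."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), label)   # loss the attacker wants to increase
        loss.backward()
        # One step in the direction that increases the loss, bounded by epsilon.
        return (x + epsilon * x.grad.sign()).detach()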

  6. Role of Interpretability in the Data Science Process
     (Figure 1: how interpretability connects the application, the ML engineer, and the end user)

  7. Application Domains for Interpretability
     Think of the cost of an incorrect prediction!
     - Credit underwriting (Equal Credit Opportunity Act): Adverse Action & Disparate Impact
     - Neural machine translation: bridge the translation gap between source & target languages; large corpora contain unwanted co-occurrences of words which bias the model (Figure 1)
     - Medical diagnoses: show the physician the regions of the retina where lesions appear (Figure 2)
     - Autonomous driving: saliency map of what the model used to predict the orientation & direction of steering
     - Scientific discoveries: show how molecules interact with enzymes; potential to learn causal relationships (Figure 3)

  8. Barriers to Adoption in Underwriting: The Explainable Machine Learning Challenge
     FICO teams up with Google, UC Berkeley, Oxford, Imperial, MIT, and UC Irvine to sponsor a contest to generate new research in the area of algorithmic explainability.
     - Home Equity Line of Credit (HELOC) dataset
     - Lines of credit from $5,000 to $150,000
     "The black box nature of machine learning algorithms means that they are currently neither interpretable nor explainable… Without explanations, these algorithms cannot meet regulatory requirements, and thus cannot be adopted by financial institutions." - FICO blog

  9. Catalogue Methods by Output
     - Visualizations (intuitive): partial dependence plots, correlations, dimensionality reduction, clustering. DeepDream, asked to find bananas, finds bananas in noise (Figure 1).
     - Text: for image captioning, we can use t-SNE (stochastic neighborhood embedding) in n dimensions to find relative neighborhoods (Figure 2).
     - Examples: influence functions find the most influential training samples by reweighting individual samples & observing the sensitivity of the model (Figure 3).
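
A minimal sketch of the t-SNE idea mentioned above, using scikit-learn; the digits dataset and plot are stand-ins for whatever features and figure the slide refers to.

    # Hypothetical t-SNE example: project high-dimensional features to 2-D so
    # relative neighborhoods can be inspected visually. The digits dataset is
    # only a stand-in for the features referenced in the slide.
    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE

    X, y = load_digits(return_X_y=True)
    embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

    plt.scatter(embedding[:, 0], embedding[:, 1], c=y, s=5, cmap="tab10")
    plt.title("t-SNE embedding: nearby points are similar examples")
    plt.show()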

  10. Ways to Convey Interpretability (Feature Level)
     Naturally interpretable models (next slide)
     Sensitivity analysis: “What makes the shark less/more a shark?”
     ● Measure the sensitivity of the output to changes made in the input features
     ● Randomly shuffle feature values one column at a time and measure the change in performance
     ● Saliency map of what the model was looking at when it made the decision
       ○ Which pixels lead to an increase/decrease of the prediction score when changed?
     Approach: permutation impact (Figure 1)
     Decomposition: “What makes the shark a shark?”
     ● Breaks down the relevance of each feature to the prediction as a whole
     ● Done with respect to some reference (e.g. select the bottom tier of good loans)
     ● Feature attributions must add up to the whole prediction (normalizing factor)
     Approach: backprop (Figure 2)

  11. Naturally Interpretable Models
     Linear models: f(x) = a_1 x_1 + a_2 x_2 + b, with contrib(x_i) = a_i x_i and f(x) - f(x_0) = Σ_i contrib(x_i) for the baseline x_0 = (0, 0). Example: Boston housing prices dataset (Figure 1).
     ● Sensitivity: take the gradient of the model with respect to the input; the coefficients remain.
     ● Decomposition: assigns blame to causes relative to some reference cause.
     Decision trees: trace the path of each decision & observe how it changes the regression value.
     ● Feature importances: how often is a feature used to make a decision?
     Check out SHapley Additive exPlanations (SHAP) and treeinterpreter.
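
A small sketch of the linear-model decomposition above, assuming scikit-learn and synthetic data in place of the Boston housing example; it checks that the per-feature contributions a_i * x_i add up to the prediction minus the intercept (the x_0 = 0 baseline).

    # Sketch of the linear-model decomposition: with baseline x0 = 0,
    # contrib(x_i) = a_i * x_i, and the contributions sum to f(x) - b.
    # Synthetic data stands in for the Boston housing example.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 + rng.normal(scale=0.1, size=200)

    model = LinearRegression().fit(X, y)

    x = X[0]
    contrib = model.coef_ * x                         # per-feature attribution
    reconstructed = model.intercept_ + contrib.sum()  # completeness: parts add up

    print("contributions:", contrib)
    print("reconstructed:", reconstructed, "predicted:", model.predict(x[None])[0])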

  12. Permutation Feature Importance
     Randomly shuffle feature values one column at a time and measure the change in performance.
     Pros: simple implementation; model agnostic.
     Cons: no variable interaction; computationally expensive; works only when a few features are important & operate independently; a single-pixel perturbation does not change the prediction.
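
A minimal sketch of permutation feature importance under these assumptions: `model` is any fitted estimator with a `predict` method and `metric` is a score where higher is better; both are placeholders.

    # Minimal permutation-importance sketch. `model` (anything with .predict)
    # and `metric` (higher is better, e.g. accuracy or R^2) are assumptions.
    import numpy as np

    def permutation_importance(model, X_val, y_val, metric, n_repeats=5, seed=0):
        rng = np.random.default_rng(seed)
        baseline = metric(y_val, model.predict(X_val))
        importances = np.zeros(X_val.shape[1])
        for j in range(X_val.shape[1]):
            drops = []
            for _ in range(n_repeats):
                X_shuffled = X_val.copy()
                # Shuffle one column: break that feature's link to the target.
                X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])
                drops.append(baseline - metric(y_val, model.predict(X_shuffled)))
            importances[j] = np.mean(drops)   # average score drop = importance
        return importances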

  13. Surrogate Models (LIME): Local Interpretable Model-Agnostic Explanations
     Learn a simple interpretable model around the test point using proximity-weighted samples (Figure 1; Figure 2: top 3 predicted classes).
     Pros: model-agnostic.
     Cons: computationally expensive (Figure 3).
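
A rough sketch of the LIME idea (not the lime package itself): sample perturbations around the test point, weight them by proximity, and fit a simple weighted linear surrogate. `black_box_predict`, the noise scale, and the kernel width are all assumptions.

    # Rough sketch of the LIME idea: perturb around the test point, weight
    # samples by proximity, fit a simple linear surrogate.
    import numpy as np
    from sklearn.linear_model import Ridge

    def local_surrogate(black_box_predict, x, n_samples=500, scale=0.5,
                        kernel_width=1.0, seed=0):
        rng = np.random.default_rng(seed)
        Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))  # perturbations
        y = black_box_predict(Z)                      # black-box scores for one class
        dist = np.linalg.norm(Z - x, axis=1)
        weights = np.exp(-(dist ** 2) / kernel_width ** 2)  # proximity kernel
        surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
        return surrogate.coef_                        # local feature weights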

  14. Backpropagation-Based Approaches: Gradients (Saliency Map)
     ● Start with a particular output
     ● Assign importance scores to the neurons in the layer below, depending on the function connecting those two layers
     ● Repeat the process until you reach the input
     ● With a single backward pass, you get importance scores for all features in the input
     The gradient with respect to the inputs gives us feature attributions.
     Pros: simple and efficient; GPU-optimized implementation.
     Cons: fails in flat regions (e.g. ReLU), giving 0 when the contribution isn't zero.
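
A minimal PyTorch sketch of the gradient saliency map described above; `model` is assumed to be any differentiable classifier taking a batch of size 1.

    # Gradient saliency sketch: one backward pass gives an importance score
    # for every input feature. `model` is an assumed differentiable classifier.
    import torch

    def saliency_map(model, x, target_class):
        x = x.clone().detach().requires_grad_(True)
        score = model(x)[0, target_class]   # scalar output we want to explain
        score.backward()                    # d(score)/d(input) in one backward pass
        return x.grad.abs().squeeze(0)      # per-feature / per-pixel importance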

  15. Backprop Approaches: Improving Gradients
     Dealing with the absence of signal: if two feature vectors x and x_0 differ only on a single feature but have different predictions (f(x) ≠ f(x_0)), then the differing feature should have a non-zero attribution, e.g. attribution(x_2) > 0.
     Towards decomposition, define a set of axioms:
     ● Sensitivity
     ● Implementation invariance
     ● Completeness/additivity: f(x) - f(x_0) = Σ_i attr(x_i)
     ● Linearity

  16. Backprop Approaches
     Better ways to backprop through ReLUs:
     - DeconvNet: equivalent to gradients, but with the ReLU applied in the backwards direction
     - Guided Backprop: gradients, but with the ReLU applied in both directions
     - PatternNet/PatternAttribution: corrects the gradient for correlated, distracting noise
     - Layerwise Relevance Propagation: positive & negative contribution scores
     Some other interesting approaches:
     - Integrated Gradients: path integral of gradients from a baseline; equivalent to input-scaled gradients. Pick a starting value, scale up linearly from the reference to the actual value, and compute gradients along the way.
     - DeepLIFT: compares the activation of each neuron to its reference activation and assigns contribution scores based on the difference relative to the baseline. A true decomposition even with a discrete jump; positive & negative contribution scores; importance is propagated even when the gradient is 0.
     - Deep Taylor Decomposition: a Taylor approximation about a baseline for each neuron; generalizes to all activations.
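
A short sketch of Integrated Gradients as described above, in PyTorch: interpolate from a baseline to the input, average the gradients along the straight-line path, and scale by (input - baseline). The model, baseline, and step count are assumptions.

    # Integrated Gradients sketch: average gradients along a straight-line
    # path from a baseline to the input, then scale by (input - baseline).
    import torch

    def integrated_gradients(model, x, baseline, target_class, steps=50):
        x, baseline = x.detach(), baseline.detach()      # shape (1, ...) each
        alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * (x.dim() - 1)))
        path = (baseline + alphas * (x - baseline)).requires_grad_(True)
        scores = model(path)[:, target_class].sum()
        grads = torch.autograd.grad(scores, path)[0]
        avg_grad = grads.mean(dim=0, keepdim=True)       # average gradient on the path
        # Completeness: these attributions approximately sum to f(x) - f(baseline).
        return (x - baseline) * avg_grad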

  17. Evaluating Interpretability Methods
     If we have a set of feature contributions, we can compare attribution methods with Spearman's rank-order correlation, or measure what % of the top-K features intersect.
     Experimental evaluation approaches: assign a user (domain expert) tasks based on the produced feature attributions:
     - Show saliency maps and ask the user to choose which classifier generalizes better
     - Show attributions and ask the user to perform feature selection to improve the model
     - Ask the user to identify classifier failure modes
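
A small sketch of the two agreement measures named above, using SciPy; the attribution vectors here are synthetic placeholders.

    # Spearman rank correlation between two attribution vectors, plus their
    # top-K overlap. The attribution vectors are made up for illustration.
    import numpy as np
    from scipy.stats import spearmanr

    def topk_intersection(attr_a, attr_b, k=10):
        top_a = set(np.argsort(-np.abs(attr_a))[:k])
        top_b = set(np.argsort(-np.abs(attr_b))[:k])
        return len(top_a & top_b) / k

    attr_a = np.random.default_rng(0).normal(size=50)                      # e.g. gradients
    attr_b = attr_a + np.random.default_rng(1).normal(scale=0.3, size=50)  # e.g. another method

    rho, _ = spearmanr(attr_a, attr_b)
    print("Spearman rho:", rho, "top-10 overlap:", topk_intersection(attr_a, attr_b))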

  18. Adversarial Examples
     Interpretability can suffer from adversarial attacks independently of the prediction (Figure 1; paper "Interpretation of Neural Networks is Fragile").
     Attack types (Figure 2):
     - Top-k attack: take the top 5 features and create a distortion which drops their rank
     - Center attack: take the center of mass of the saliency map and try to move it as far as possible under some constrained distortion
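
A rough sketch of the top-k attack idea in PyTorch: nudge the input within a small distortion budget so that the saliency mass on the originally top-ranked features shrinks. This needs second-order gradients; for piecewise-linear ReLU networks a smooth surrogate (e.g. softplus) is typically substituted when computing them. The model, budget, and step size here are assumptions, not the paper's exact procedure.

    # Rough sketch of a top-k interpretation attack: shrink the saliency mass
    # on the originally top-ranked features while staying inside a small
    # distortion budget. Model, budget, and step size are assumptions.
    import torch

    def topk_attack(model, x, target_class, k=5, epsilon=0.01, steps=20, lr=0.002):
        x0 = x.clone().detach()

        # Saliency of the clean input and its top-k feature indices.
        xg = x0.clone().requires_grad_(True)
        sal = torch.autograd.grad(model(xg)[0, target_class], xg)[0].abs()
        topk_idx = sal.flatten().topk(k).indices

        x_adv = x0.clone()
        for _ in range(steps):
            x_adv = x_adv.detach().requires_grad_(True)
            grad = torch.autograd.grad(model(x_adv)[0, target_class], x_adv,
                                       create_graph=True)[0]
            loss = grad.abs().flatten()[topk_idx].sum()   # saliency on original top-k
            step = torch.autograd.grad(loss, x_adv)[0]
            x_adv = x_adv - lr * step.sign()              # push saliency off those features
            x_adv = x0 + (x_adv - x0).clamp(-epsilon, epsilon)  # distortion budget
        return x_adv.detach()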

  19. Research Directions
     - Better loss functions for interpretability
     - Understand what makes certain models more interpretable and how interpretability fails
     - Explain models in unsupervised learning, sequence learning (RNNs), and reinforcement learning, e.g. generating text explanations of the actions of a reinforcement learning agent
     - Develop interpretability techniques into tools for model diagnostics, security, and compliance
