

SLIDE 1

Web and Complex Systems Lab, Wright State University

“Explainable(?)” Statistical ML

Derek Doran

Dept. of Computer Science and Engineering
Wright State University, Dayton, OH, USA
May 8, 2017

“Explainable(?)” Statistical Machine Learning

SLIDE 2

Context

We may be aware that AI, Machine Learning, “Thinking Machines” are major technology concepts in society.

◮ To the layman (or expert), and worse, to professionals making or interpreting a machine’s decision, the view looks more like...

SLIDE 3

Context

Rationalizing the decisions an AI makes is crucial!

http://www.darpa.mil/program/explainable-artificial-intelligence

SLIDE 4

Peering in

Yet not all is opaque! May I propose “white”, “grey”, and “black” box machine learning algorithms:

◮ White-box: You can see model mechanisms simple enough to trace how inputs map to outputs

◮ Grey-box: You have some visibility into the mechanisms, but parameters are numerous, decisions are probabilistic, or inputs get “lost” (transformed)

◮ Black-box: The model is so complex, and the number and space of parameters so large, that it is impossible to decipher any mechanisms

SLIDE 5

Peering in

◮ WB: Regression, decision trees, association rule mining, linear SVMs
◮ GB: Clustering, Bayesian nets, genetic algorithms, logic programming
◮ BB: DNNs, matrix factorizations, non-linear dimensionality reduction
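To make the white-box end concrete, here is a minimal sketch of tracing a decision input-by-input. The model, weights, and feature names are illustrative (not from the talk): a hand-rolled logistic regression where each feature's effect on the output can be read off directly.

```python
import math

# Minimal white-box sketch: a logistic regression whose decision can be
# traced term by term. Weights and feature names are illustrative.
weights = {"account_balance": 2.0, "num_late_payments": -1.5}
bias = -0.5

def predict_with_trace(x):
    # Each feature's contribution to the log-odds is directly visible.
    contributions = {k: w * x[k] for k, w in weights.items()}
    logit = bias + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))
    return prob, contributions

prob, trace = predict_with_trace({"account_balance": 1.0, "num_late_payments": 2.0})
# trace maps each input to its exact additive effect on the log-odds
```

Because each input enters as an additive term in the log-odds, the mapping from inputs to output is fully traceable, which is what puts such models in the white-box class.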

The problem: statisticians (notionally) think they can already explain their models (“% of variance explained”), while some ML/DL engineers think “explainability” is inherent to theirs.

SLIDE 6

The Automatic Statistician [3]

Is the machine helping me understand the data?

SLIDE 7

“Inherent Explainability” in ML

“Explainability” in ML models comes from linear decision boundaries (think regression, decision trees, SVMs). The red lines are the “explanation” (essentially a rule): classify x3 as positive because x3 satisfies E2 and E3.

Turner, Ryan. “A Model Explanation System: Latest Updates and Extensions.” arXiv preprint arXiv:1606.09517 (2016).

SLIDE 8

“Inherent Explainability” in ML

This is certainly interpretable, e.g., classify teal if x2 < 0.3, but there is no explanation for why data points having x2 < 0.3 should be classified teal.

◮ What is the relation of x2 to the teal data?
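The slide's threshold rule can be written out literally (a hypothetical sketch): it pins down exactly when the model outputs teal, yet says nothing about why that threshold should be right.

```python
# Hypothetical sketch of the slide's rule: "classify teal if x2 < 0.3".
# The rule is interpretable (we know exactly when teal is predicted),
# but it is not an explanation of why x2 < 0.3 should imply teal.
def classify(x2):
    return "teal" if x2 < 0.3 else "other"
```

This is the interpretability/explainability gap in miniature: the rule is a complete description of the behavior, but carries no rationale.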

Hara, Satoshi, and Kohei Hayashi. “Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach.” arXiv preprint arXiv:1606.09066 (2016).

SLIDE 9

“Inherent Explainability” in ML

“Explainability” in DNNs is getting hotter, but much work focuses on rich labeling of input features, not explanations of decision making.

Dong, Yinpeng, et al. “Improving Interpretability of Deep Neural Networks with Semantic Information.” arXiv preprint arXiv:1703.04096 (2017).

SLIDE 10

A Reasoning Hierarchy

There is evidence that the ML community holds differing definitions of what “explainable” means. And where do statistical models that “explain” covariate relationships fit in? Perhaps there is a hierarchy of interpretable statistical ML:

◮ Interpretable: I can identify why an input goes to an output.
◮ Explainable: I can explain how inputs are mapped to outputs.
◮ Reasonable: I can conceptualize how inputs are mapped to outputs.

SLIDE 11

So you were denied a loan.

Say you go to a bank and you are denied a loan. You ask “Why was I denied?”

◮ (Interpretation): “Well, the account balance covariate in the logistic regression model we use to make decisions explains 89.9% of residual variance.” “What.... does that mean?”

◮ (Explanation): “It means our system denies loan applicants with low bank account balances.” “What is the rationale for that?”

◮ (Reasoning): “Because the system does not want to award loans to those who do not show evidence of being able to pay them off.”

SLIDE 12

Explainability in DNN [2, 1]

Colors: LSTM activations in an RNN generating source code. This is much closer to Explainable under the hierarchy: I can explain how the source code was generated; the RNN created some whitespace, learned the structure of a switch statement, etc.

SLIDE 13

Towards Reasoning DNNs

What about building explanations that let us approach reasoning for DNNs that make decisions? This is really new (very recently funded) work with:

◮ Ning Xie (WSU; PhD Student)
◮ Md Kamruzzaman Sarker (WSU; PhD Student)
◮ Pascal Hitzler (WSU)
◮ Mike Raymer (WSU)
◮ Eric Nichols (Wright State Research Institute)

(Funding provided under the Human Centered Big Data project by the Ohio Federal Research Network)

SLIDE 14

Towards Reasoning DNNs

Key idea: If input features carry meaning (e.g., semantics), and internal node activations are driven by the inputs, then semantics describing internal node activations could be derived from the input semantics.
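One simple way the key idea could be realized (a hypothetical sketch, not the project's actual method): describe each hidden node by the semantically labeled inputs whose weights drive it most strongly.

```python
# Hypothetical sketch: attach candidate semantics to hidden nodes by
# ranking the labeled inputs that most strongly drive each node.
def node_semantics(weights, input_labels, top_k=2):
    # weights[j][i] = weight from input i into hidden node j
    semantics = []
    for row in weights:
        ranked = sorted(range(len(row)), key=lambda i: -abs(row[i]))
        semantics.append([input_labels[i] for i in ranked[:top_k]])
    return semantics

labels = ["wheel", "window", "sky"]
w = [[0.9, 0.1, 0.0],   # node 0: driven mostly by "wheel"
     [0.0, 0.2, 0.8]]   # node 1: driven mostly by "sky"
```

Here each hidden node inherits tentative meaning from its strongest inputs; the labels and weights above are invented for illustration.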

SLIDE 15

Towards Reasoning DNNs

Semantics attached to internal nodes give us a chance to reason about their activations, developing real meaning!

Engineering problem: How do we bias the network to learn node activations that are inherently explainable?

SLIDE 16

Towards Reasoning DNNs

We investigate kinds of regularization in a generic loss function

∑_i L(f(x_i, Θ), y_i) + λ R(Θ)

where L is an error penalty and Θ are the model parameters. One candidate regularizer:

◮ R(Θ) = λ1 ∑_{m,n} w_{mn} − λ2 ∑_{i,j=1}^{k,m} w_{ij} log w_{ij} − λ3 ∑_{i,j=1}^{m,k} w_{ji} log w_{ji}
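A rough sketch of how such a regularizer could be computed, under assumptions the slide leaves open: weights are taken in magnitude, and the entropy-style terms are computed over each row's (incoming) and column's (outgoing) weight magnitudes normalized to sum to 1, so the logarithm is well defined. This is illustrative, not the project's implementation.

```python
import math

def entropy(ws):
    # Shannon entropy of |weights| normalized to a distribution.
    total = sum(abs(w) for w in ws)
    if total == 0:
        return 0.0
    ps = [abs(w) / total for w in ws if w != 0]
    return -sum(p * math.log(p) for p in ps)

def regularizer(W, lam1=0.1, lam2=0.01, lam3=0.01):
    # lam1 * sum of weight magnitudes, plus entropy terms over each
    # node's incoming weights (rows) and outgoing weights (columns).
    l1 = sum(abs(w) for row in W for w in row)
    row_ent = sum(entropy(row) for row in W)
    col_ent = sum(entropy(col) for col in zip(*W))
    return lam1 * l1 + lam2 * row_ent + lam3 * col_ent
```

Note that the minus signs in front of the −∑ w log w sums make each a (positive) entropy, so minimizing the total loss pushes each node's weight distribution toward being peaked, i.e., sparse.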

SLIDE 17

Proposed architectures

But we do not expect a 1:1 correspondence between an input and a single internal node.

◮ Ideally, “related” inputs drive some subset of internal nodes

This motivates topographic sparse coding as a regularizer: group related input features into separate ℓ1 penalties, encouraging sparser activations when many features in the group are present.

R(Θ) = λ1 ∑_{g_i ∈ G} √( ∑_{k ∈ g_i} a_k² + ǫ ) + λ2 ||Θ||²

where G is some partitioning of network activities (inputs, nodes, weights, etc.)
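A sketch of the group penalty in code; the groups, activities, and ǫ value are illustrative, and the quantity squared inside each group is taken to be the activity a_k of that unit.

```python
import math

def topographic_penalty(groups, theta, lam1=1.0, lam2=0.1, eps=1e-8):
    # lam1 * sum over groups of sqrt(sum of squared activities + eps),
    # plus a standard lam2 * ||theta||^2 term.
    group_term = sum(math.sqrt(sum(a * a for a in g) + eps) for g in groups)
    l2 = sum(t * t for t in theta)
    return lam1 * group_term + lam2 * l2

# Concentrating energy in one group costs less than spreading it out:
spread = topographic_penalty([[1.0, 0.0], [0.0, 1.0]], theta=[])
packed = topographic_penalty([[1.0, 1.0], [0.0, 0.0]], theta=[])
```

Because the square root is taken per group, zeroing out an entire group reduces the penalty more than shrinking individual activities, which is what encourages the grouped sparsity pattern described above.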

SLIDE 18

Current Progress

(Preliminary!) We experiment with the ADE20k dataset

◮ Input is an image; output is a scene label
◮ ADE20k annotates data with the objects present and a scene mask

We train a 2-layer fully connected architecture with a binary feature vector as input.

◮ Topographic sparse coding regularization over random groupings of nodes or weights (we are experimenting with both schemes)

SLIDE 19

Some Results

Example experiments with two fully connected hidden layers show a ∼10% reduction in classification accuracy.

SLIDE 20

Looking Forward

Some feature representations are ‘naturally local’ as they learn, e.g., a self-organizing map.

SLIDE 21

Looking Forward

Some feature representations are ‘naturally local’ as they learn. Left: 2× fully connected, 1st layer. Right: SOM. (Note: activations are pre-scaling.) The overfitting struggle is real...

SLIDE 22

Looking Forward

Turning toward convolutional architectures: one can think of Convolutions+MaxPools for feature representation as carrying natural localization, as in fact shown at CVPR ’16.

◮ Can we explain what CNNs recognize in images, semantically?

Experiments are active!

SLIDE 23

Thanks!

Thank you!

SLIDE 24

References

[1] A. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078, 2015.

[2] V. Krakovna and F. Doshi-Velez. Increasing the interpretability of recurrent neural networks using hidden Markov models. arXiv preprint arXiv:1606.05320, 2016.

[3] J. R. Lloyd, D. Duvenaud, R. Grosse, J. B. Tenenbaum, and Z. Ghahramani. Automatic construction and natural-language description of nonparametric regression models. In Association for the Advancement of Artificial Intelligence (AAAI), 2014.