On Learning Sparse Boolean Formulae For Explaining AI Decisions - PowerPoint PPT Presentation



SLIDE 1

On Learning Sparse Boolean Formulae For Explaining AI Decisions

Susmit Jha, SRI International; Vasumathi Raman, Zoox Inc.; Alessandro Pinto, Tuhin Sahai, and Michael Francis, United Technologies Research Center, Berkeley

5/20/2017


SLIDE 2

Ubiquitous AI and the need for explanation


Why did we take the San Mateo Bridge instead of the Bay Bridge?

  • This route is faster.
  • There is traffic on the Bay Bridge.
  • There is an accident just after the Bay Bridge backing up traffic.

SLIDE 3

Decision/Recommendation/Classification


Autonomous Systems Certification • Medical Diagnosis • Incorrect ML Recommendations

SLIDE 4

Decision/Recommendation/Classification


Autonomous Systems Certification • Medical Diagnosis • Incorrect ML Recommendations
Scalable but less interpretable: neural networks, support vector machines.
Interpretable but less scalable: decision trees, propositional rules.

SLIDE 5

Even 'algorithmic' decision making: A* Path Planning


A* is an algorithm that uses a heuristic to guide search while ensuring that it computes a path with minimum cost. A* computes the function f(n) = g(n) + h(n), combining "actual cost" and "estimated cost": g(n) is the cost from the starting node to reach n, and h(n) is an estimate of the cost of the cheapest path from n to the goal node.
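The cost function f(n) = g(n) + h(n) above can be sketched as a minimal grid-based A*; the grid encoding, unit step costs, and Manhattan-distance heuristic are illustrative assumptions, not details from the talk:

```python
import heapq

def astar(grid, start, goal):
    """A* on a 2D grid: priority f(n) = g(n) + h(n), with Manhattan
    distance as the admissible heuristic h(n).  grid[r][c] == 1 marks
    an obstacle.  Returns a minimum-cost path as a list of cells, or None."""
    def h(n):
        return abs(n[0] - goal[0]) + abs(n[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    g = {start: 0}                        # g(n): best known cost to reach n
    parent = {}
    frontier = [(h(start), start)]        # priority queue ordered by f(n)
    while frontier:
        _, n = heapq.heappop(frontier)
        if n == goal:                     # reconstruct path back to start
            path = [n]
            while n in parent:
                n = parent[n]
                path.append(n)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            m = (n[0] + dr, n[1] + dc)
            if 0 <= m[0] < rows and 0 <= m[1] < cols and grid[m[0]][m[1]] == 0:
                if g[n] + 1 < g.get(m, float("inf")):
                    g[m] = g[n] + 1       # unit step cost per move
                    parent[m] = n
                    heapq.heappush(frontier, (g[m] + h(m), m))
    return None
```

Because the Manhattan heuristic never overestimates on a 4-connected grid, the first time the goal is popped the path is guaranteed minimum-cost, which is the property the talk's explanations are about.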

SLIDE 6

Example: A* Path Planner


SLIDE 7

Example: A* Path Planner


Why didn’t we go through X / Z?

SLIDE 8

Example: A* Path Planner


  • 1. The internal details of the algorithm and its implementation are unknown to the human observer/user, so the explanation must be in terms of a common vocabulary.
  • 2. Some explanations need further deduction and inference that were not performed by the AI algorithm while making the original decision.
  • 3. Decision making is often accomplished through a complex composition of a number of AI algorithms; hence, an explanation process is practical only if it does not require detailed modeling of each AI algorithm.

SLIDE 9

Local Explanations of Complex Models


SLIDE 10

Local Explanations of Complex Models


Sufficient Cause

SLIDE 11

Local Explanations of Complex Models


Simplified Sufficient Cause

SLIDE 12

Local Explanations in AI


Simplified Sufficient Cause Formulation in AI:

  • Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?: Explaining the Predictions of Any Classifier." Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
  • Hayes, Bradley, and Julie A. Shah. "Improving Robot Controller Transparency Through Autonomous Policy Explanation." Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction. ACM, 2017.

The formulation trades off a measure of how well the explanation g approximates the model f against a measure of the complexity of g.
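In the notation of Ribeiro et al. (2016), this trade-off is the familiar local-explanation objective, where f is the model, g a candidate explanation from a class G, π_x a locality measure around the instance x, and Ω a complexity penalty:

```latex
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)
```

Here L measures how poorly g approximates f in the locality defined by π_x, and Ω(g) penalizes the complexity of g.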

SLIDE 13

Model Agnostic Explanation through Boolean Learning


Why does the path not go through Green? Let each point in k dimensions (for some k) correspond to a map: in some maps the optimum path goes via Green, and in others it does not. Find a Boolean formula φ such that φ ⇔ Path_contains_Green, or at least φ ⇒ Path_contains_Green.

SLIDE 14

Example: Explanations in A*

A*

φ_explain: expressed in the explanation vocabulary (e.g., obstacle presence). φ_query: some property of the output (e.g., some cells not selected).

φ_explain ⇒ φ_query, or φ_explain ⇔ φ_query

SLIDE 15

Example: Explanations in A*

A*

φ_explain: expressed in the explanation vocabulary (e.g., obstacle presence). φ_query: some property of the output (e.g., some cells not selected).

Learn decision trees for φ_explain using labels for φ_query.

SLIDE 16

Example: Explanations in A*

A*

Learn decision trees for φ_explain using labels for φ_query.

A 50×50 grid has 2^(2^(50×50)) possible explanations even if the vocabulary only considers the presence/absence of obstacles.

Scalability: usually the feature space or vocabulary is large. For a map, it is of the order of the number of features in the map; for an image, of the order of the image's resolution.

Guarantee: is the sampled space of maps enough to generate the explanation with some quantifiable probabilistic guarantee?

SLIDE 17

Example: Explanations in A*

Theoretical result: learning a Boolean formula even approximately is hard; 3-DNF is not learnable in the Probably Approximately Correct (PAC) framework unless RP = NP.


SLIDE 18

Two Key Ideas

Actively learn the Boolean formula φ_explain instead of learning from a fixed sample. Explanations are often short and involve only a few variables!

  • 1. Vocabulary is large.
  • 2. How many samples (and what distribution) to consider for learning the explanation?
  • 3. Learning Boolean formulae with PAC guarantees is hard.

SLIDE 19

Two Key Ideas

Actively learn the Boolean formula φ_explain instead of learning from a fixed sample. Explanations are often short and involve only a few variables!

SLIDE 20

Two Key Ideas

Actively learn the Boolean formula φ_explain instead of learning from a fixed sample. Explanations are often short and involve only a few variables! This example involves only two variables; if we knew which two, there would be only 2^(2^2) = 16 possible explanations. How do we find these relevant variables?
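The count above follows because a Boolean formula over k variables is determined by its truth table of 2^k rows. A short illustrative sketch (not from the talk):

```python
from itertools import product

def num_boolean_functions(k):
    """Semantically distinct Boolean formulae over k variables: each is
    determined by a truth table with 2**k rows, giving 2**(2**k) formulae."""
    return 2 ** (2 ** k)

# With only 2 relevant variables, the candidate explanations can simply be
# enumerated as truth tables over the 4 possible input assignments.
rows = list(product([0, 1], repeat=2))                 # the 4 assignments
candidates = list(product([0, 1], repeat=len(rows)))   # the 16 truth tables
```

This is why sparsity matters: the same count for the full 2500-variable map vocabulary, 2^(2^2500), is hopeless, but after isolating the few relevant variables the candidate space becomes trivially enumerable.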

SLIDE 21

Actively Learning Boolean Formula

Oracle for φ: evaluates assignments and returns True/False.

Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0)

A*

φ_query: some property of the output (e.g., some cells not selected). φ_explain(V): expressed in the explanation vocabulary (e.g., obstacle presence).

SLIDE 22

Actively Learning Relevant Variables

Assignments to V: m1 = (0,0,0,1,1,0,1). Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m1: True

SLIDE 23

Actively Learning Relevant Variables

Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0). Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m1: True, m2: False

Randomly sample until the oracle's answers differ.

SLIDE 24

Actively Learning Relevant Variables

Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0). Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m1: True, m2: False

SLIDE 25

Actively Learning Relevant Variables

Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0). Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m1: True, m2: False, m3: True

SLIDE 26

Actively Learning Relevant Variables

Assignments to V: m1 = (0,0,0,1,1,0,1), m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0). Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m1: True, m2: False, m3: True

Hamming distance(m1, m2) = 4; the midpoint m3 is at Hamming distance 2 from each.

SLIDE 27

Assignments to V: m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0), m4 = (0,0,1,1,1,1,0)

Actively Learning Relevant Variables

Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m2: False, m3: True, m4: True

Hamming distance(m2, m3) = 2; the midpoint m4 is at Hamming distance 1 from each.

SLIDE 28

Assignments to V: m2 = (0,0,1,1,0,1,0), m3 = (0,0,0,1,1,1,0), m4 = (0,0,1,1,1,1,0)

Actively Learning Relevant Variables

Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m2: False, m3: True, m4: True

Hamming distance(m2, m3) = 2; the midpoint m4 is at Hamming distance 1 from each.

SLIDE 29

Assignments to V: m2 = (0,0,1,1,0,1,0), m4 = (0,0,1,1,1,1,0)

Actively Learning Relevant Variables

Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

m2: False, m4: True

Hamming distance(m2, m4) = 1

The fifth variable v5 is relevant!

SLIDE 30

Actively Learning Relevant Variables

Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

Random sampling until the oracle differs: ln(1/(1 − λ)) samples. Binary search over Hamming distance: ln(|V|) oracle queries. Repeated 2^|U| times, once for each assignment to the relevant variables.

Relevant variables of φ_explain are found with confidence λ using 2^|U| ln(|V|/(1 − λ)) oracle queries.
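The binary-search step above can be sketched as follows, given one positive and one negative example (as produced by the random-sampling step). The 7-variable vocabulary and the oracle that depends only on the fifth variable are illustrative assumptions matching the slides:

```python
def find_relevant_variable(oracle, pos, neg):
    """Given a positive example `pos` (oracle(pos) is True) and a negative
    example `neg` (oracle(neg) is False), binary-search over the Hamming
    distance between them to isolate one variable the oracle depends on.
    Each step flips half of the differing positions, halving the distance,
    so only O(log |V|) oracle queries are needed."""
    while True:
        diff = [i for i in range(len(pos)) if pos[i] != neg[i]]
        if len(diff) == 1:
            return diff[0]               # a single differing bit: relevant
        mid = pos[:]                     # midpoint between pos and neg
        for i in diff[: len(diff) // 2]:
            mid[i] = neg[i]
        if oracle(mid):                  # keep the pair whose labels differ
            pos = mid
        else:
            neg = mid

# Oracle as in the slides' example: phi_explain depends only on the fifth
# variable (index 4) of a 7-variable vocabulary.
oracle = lambda m: m[4] == 1
m1 = [0, 0, 0, 1, 1, 0, 1]   # oracle: True
m2 = [0, 0, 1, 1, 0, 1, 0]   # oracle: False
relevant = find_relevant_variable(oracle, m1, m2)   # index 4, the fifth variable
```

Finding the remaining relevant variables repeats this search from fresh differing pairs, which is where the 2^|U| factor in the query bound comes from.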

SLIDE 31

Actively Learning Boolean Formula

Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|.

Build Truth Table for the relevant variables U

Worst case: 2^|U| entries.

Used the distinguishing-input-based approach from ICSE’10.

φ_explain is found with confidence λ using 2^|U| (1 + ln(|V|/(1 − λ))) oracle queries.

SLIDE 32

Distinguishing Example

Space of all 2^(2^|U|) possible formulae; each dot represents a semantically unique Boolean formula.

SLIDE 33

Distinguishing Example

Randomly select an assignment e1 to the Boolean variables. Oracle O labels e1 as true iff it satisfies the target φ. Some of the 2^(2^|U|) formulae are no longer consistent with the example set {e1}.

SLIDE 34

Distinguishing Example

Randomly select an assignment e1 to the Boolean variables. Oracle O labels e1 as true iff it satisfies the target φ. Some of the 2^(2^|U|) formulae are no longer consistent with the example set {e1}. φ1, φ2: two candidate formulae remain consistent with the example set.

SLIDE 35

Distinguishing Example

Candidates φ1 and φ2 are still consistent with {e1}. Find a distinguishing example e2 ⊨ φ1 ⊕ φ2; Oracle O labels e2 as true iff it satisfies the target φ.

SLIDE 36

Distinguishing Example

After the distinguishing example e2 ⊨ φ1 ⊕ φ2 is labeled by Oracle O, some of the 2^(2^|U|) formulae are no longer consistent with the example set {e1, e2}.

SLIDE 37

Distinguishing Example

Candidates φ1 and φ2 remain consistent with {e1, e2}. Find a distinguishing example e3 ⊨ φ1 ⊕ φ2; Oracle O labels e3 as true iff it satisfies the target φ.

SLIDE 38

Distinguishing Example

After e3 ⊨ φ1 ⊕ φ2 is labeled by Oracle O, some of the 2^(2^|U|) formulae are no longer consistent with the example set {e1, e2, e3}.

SLIDE 39

Distinguishing Example

Repeat: find a distinguishing example e_l ⊨ φ1 ⊕ φ2 and have Oracle O label it, pruning formulae inconsistent with the example set {e1, e2, e3, …, e_l}. Terminate when a single formula φ remains consistent with all the examples.
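The distinguishing-example loop above can be sketched for a small number of relevant variables by representing each candidate formula as a truth table. The brute-force candidate set and the example target are illustrative assumptions; the talk's approach uses distinguishing inputs generated by a solver rather than explicit enumeration:

```python
from itertools import product

def learn_by_distinguishing(oracle, k):
    """Oracle-guided learning sketch: each Boolean formula over k variables
    is a truth table (a dict from assignments to 0/1).  While two candidates
    consistent with the labeled examples still disagree somewhere, query the
    oracle on a distinguishing example (an assignment satisfying phi1 XOR
    phi2) and prune inconsistent candidates."""
    assignments = list(product([0, 1], repeat=k))
    candidates = [dict(zip(assignments, bits))
                  for bits in product([0, 1], repeat=len(assignments))]
    examples = {}                                  # labeled examples so far
    while len(candidates) > 1:
        phi1, phi2 = candidates[0], candidates[1]
        # distinguishing example: phi1 and phi2 disagree on it
        e = next(a for a in assignments if phi1[a] != phi2[a])
        examples[e] = oracle(e)                    # ask the oracle for a label
        candidates = [c for c in candidates
                      if all(c[a] == lbl for a, lbl in examples.items())]
    return candidates[0]

# Hypothetical target over k = 2 variables: phi = v0 AND NOT v1.
target = lambda a: a[0] == 1 and a[1] == 0
learned = learn_by_distinguishing(target, 2)
```

Each query eliminates at least one candidate, and the target's own truth table is never pruned, so the loop terminates with exactly the target formula.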

SLIDE 40

Actively Learning Boolean Formula

Find U such that φ_explain(V) ≡ φ_explain(U), where |U| ≪ |V|. Used the distinguishing-input-based approach from ICSE’10.

φ_explain is found with confidence λ using 2^|U| (1 + ln(|V|/(1 − λ))) oracle queries.

Correctness guarantee: if there exists an explanation using variables in V, this approach is guaranteed to find it (learn the correct Boolean formula). If no explanation exists over V, the approach will either detect this or find a wrong explanation (learn an incorrect Boolean formula). In practice, since the dependence on |V| is only logarithmic, we use a large vocabulary.

SLIDE 41

Experiments


Explaining A* planning: |V| = 2500, |U| ≤ 4, runtime < 3 minutes.
Reactive exploration strategy: |V| = 96, |U| ≤ 2, runtime < 5 seconds.
Image classification: |V| = 784, |U| ≤ 27, runtime < 5 minutes.


SLIDE 42

Challenges

Improving the selection of inputs to run, to reduce the number of introspections needed:

  • Exploit structural assumptions on the nature of the explanation
  • Exploit the probability distribution over inputs

Exploitation of partial knowledge about AI algorithms to enable efficient sampling.
Extension to randomized decision-making algorithms, which may make different decisions for the same input.
Automated/iterative selection of the vocabulary.


SLIDE 43

Thanks!!


φ_explain is found with confidence λ in O(2^|U| ln(|V|/(1 − λ))).
