SLIDE 1

MLIC: A MaxSAT-Based framework for learning interpretable classification rules

Dmitry Malioutov (IBM Research, USA)
Kuldeep S. Meel (School of Computing, National University of Singapore)

CP 2018


SLIDE 2

The Rise of Artificial Intelligence

  • “In Phoenix, cars are self-navigating the streets. In many homes, people are barking commands at tiny machines, with the machines responding. On our smartphones, apps can now recognize faces in photos and translate from one language to another.” (New York Times, 2018)

  • “AI is the new electricity” (Andrew Ng, 2017)

SLIDE 3

The Need for Interpretable Models

  • Core public agencies, such as those responsible for criminal justice, healthcare, welfare, and education (e.g., “high stakes” domains), should no longer use “black box” AI and algorithmic systems (AI Now Institute, 2018)

  • Practitioners adopt techniques that they can interpret and validate

  • Medical and education domains already see usage of techniques such as classification rules, decision rules, and decision lists

SLIDE 6

Prior Work

  • Long history of learning interpretable classification models from data, such as decision trees, decision lists, checklists, etc., with tools such as C4.5, CN2, RIPPER, and SLIPPER

  • The problem of learning optimal interpretable models is computationally intractable

  • Prior work, mostly rooted in the late 1980s and 1990s, focused on greedy approaches

SLIDE 9

Our Approach

Objective: Learn rules that are accurate and interpretable. The learning procedure is offline, so learning does not need to happen in real time.

Approach:

  • The problem of rule learning is inherently an optimization problem
  • The past few years have seen a SAT revolution and the development of tools that employ SAT as their core engine
  • Can we take advantage of the SAT revolution, in particular progress on MaxSAT solvers?

SLIDE 12

Key Contributions

  • A MaxSAT-based framework, MLIC, that provably trades off accuracy vs. interpretability of rules

  • A prototype implementation capable of finding optimal (or high-quality near-optimal) classification rules from large data sets

SLIDE 13

Part I: From Rule Learning to MaxSAT

SLIDE 14

Binary Classification

  • Features: x = {x1, x2, · · · , xm}
  • Input: set of training samples {Xi, yi}
    – each vector Xi ∈ X contains the valuation of the features for sample i
    – yi ∈ {0, 1} is the binary label for sample i
  • Output: classifier R, i.e., y = R(x)
  • Our focus: classifiers that can be represented as CNF formulas
    R := C1 ∧ C2 ∧ · · · ∧ Ck
  • Size of a classifier: |R| = Σi |Ci|
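The CNF classifier above can be made concrete with a small sketch. This is illustrative pure Python, not the MLIC implementation, and it assumes clauses over positive literals only (negated features can be handled by adding complemented features as extra columns):

```python
# A clause is a set of feature indices; it fires if any listed feature is 1.
# R(x) is the conjunction of its clauses: every clause must fire.

def eval_clause(clause, x):
    """Clause Ci fires if any of its features is set in sample x."""
    return any(x[j] for j in clause)

def eval_rule(R, x):
    """R = C1 ∧ ... ∧ Ck: the sample is labeled 1 iff all clauses fire."""
    return int(all(eval_clause(C, x) for C in R))

def rule_size(R):
    """|R| = sum of clause sizes."""
    return sum(len(C) for C in R)

# Example: R = (x0 ∨ x2) ∧ (x1)
R = [{0, 2}, {1}]
print(eval_rule(R, [1, 1, 0]))  # 1: first clause fires via x0, second via x1
print(eval_rule(R, [1, 0, 0]))  # 0: the second clause does not fire
print(rule_size(R))             # 3
```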

SLIDE 15

Constraint Learning vs Machine Learning

Input: set of training samples {Xi, yi}
Output: classifier R

  • Constraint Learning:

    min_R |R|   such that R(Xi) = yi, ∀i

  • Machine Learning:

    min_R |R| + λ|E_R|   such that R(Xi) = yi, ∀i ∉ E_R

SLIDE 17

MLIC

Step 1: Discretization of features
Step 2: Transformation to a MaxSAT query
Step 3: Invoke a MaxSAT solver and extract R from the MaxSAT solution
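Step 1 can be sketched as follows. This is a minimal illustration of threshold discretization, not MLIC's exact scheme: a continuous feature becomes binary indicator features of the form "f > t", so rules like "sepal length > 6.3" reduce to Boolean variables. The thresholds here are invented for the example:

```python
# Turn one continuous feature into 0/1 threshold-indicator features.
def discretize(values, thresholds):
    """Map each value to a list of indicators [v > t for each threshold t]."""
    return [[int(v > t) for t in thresholds] for v in values]

sepal_length = [5.1, 6.4, 7.0, 4.9]
thresholds = [5.0, 6.3]               # hypothetical cut points
print(discretize(sepal_length, thresholds))
# [[1, 0], [1, 1], [1, 1], [0, 0]]
```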

SLIDE 18

Encoding to MaxSAT

Input: features x = {x1, x2, · · · , xm}; training data {Xi, yi} over m features
Output: R of k clauses

Key Ideas

  • k × m binary coefficients, denoted {b_1^1, b_1^2, · · · , b_1^m, · · · , b_k^m}, such that R_i = (b_i^1 x1 ∨ b_i^2 x2 ∨ . . . ∨ b_i^m xm)
  • For every sample i, a noise variable η_i encodes whether sample i should be considered as noise or not

  1. R(x → Xi) = ∧_{l=1}^{k} R_l(x → Xi): output of substituting the valuation of the feature vector of the i-th sample
  2. D_i := (¬η_i → (y_i ↔ R(x → Xi)));  W(D_i) = ⊤
     If η_i is False, y_i is equivalent to the prediction of the rule
  3. V_i^j := (¬b_i^j);  W(V_i^j) = 1
     We want as few b_i^j to be true as possible
  4. N_i := (¬η_i);  W(N_i) = λ
     We want as few η_i to be true as possible

SLIDE 23

Encoding to MaxSAT

Construction: Let

  Q_k = ∧_i D_i ∧ ∧_i N_i ∧ ∧_{i,j} V_i^j

If σ* = MaxSAT(Q_k, W), then xj ∈ R_i iff σ*(b_i^j) = 1.

Remember, R_i = (b_i^1 x1 ∨ b_i^2 x2 ∨ . . . ∨ b_i^m xm)
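On a toy instance, the whole query can be checked by brute-force enumeration standing in for a MaxSAT solver. This is an illustrative sketch of the encoding for k = 1, m = 2 (the dataset and λ are invented; a real deployment would hand the hard clauses D_i and weighted soft clauses to an actual weighted MaxSAT solver):

```python
from itertools import product

X = [(1, 0), (0, 1), (1, 1), (0, 0)]  # toy samples
y = [1, 0, 1, 0]                       # labels (here y = first feature)
m, lam = 2, 10.0                       # lam plays the role of λ

def predict(b, x):
    """R(x) for a single clause: OR over the features selected by b."""
    return int(any(bj and xj for bj, xj in zip(b, x)))

best = None
for b in product([0, 1], repeat=m):            # coefficients b^j
    for eta in product([0, 1], repeat=len(X)):  # noise variables η_i
        # Hard constraints D_i: a non-noise sample must be predicted correctly.
        if any(not eta[i] and predict(b, X[i]) != y[i] for i in range(len(X))):
            continue
        # Soft weight: 1 per false b^j (small rule), λ per false η_i (low noise).
        w = sum(1 - bj for bj in b) + lam * sum(1 - e for e in eta)
        if best is None or w > best[0]:
            best = (w, b, eta)

w, b, eta = best
print("selected features:", [j for j, bj in enumerate(b) if bj])  # [0]
print("noise samples:", [i for i, e in enumerate(eta) if e])      # []
```

The optimum picks exactly the single feature that explains the labels and marks no sample as noise, matching the intended reading of the soft clauses V_i^j and N_i.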

SLIDE 24

Provable Guarantees

Theorem (Provable trade-off of accuracy vs interpretability of rules)
Let R1 ← MLIC(X, y, k, λ1) and R2 ← MLIC(X, y, k, λ2). If λ2 > λ1, then |R1| ≤ |R2| and |E_R1| ≥ |E_R2|.

SLIDE 25

Learning DNF Rules

  • (y = S(x)) ↔ (¬y = ¬S(x))
  • If S is a DNF formula, then ¬S is a CNF formula
  • To learn a DNF rule S, simply call MLIC with ¬y as input and negate the learned CNF rule
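The negation step is plain De Morgan, sketched below. Literals are encoded as signed ints, with -j standing for the negation of feature j; this convention is an assumption for the example, not taken from the paper:

```python
# Learn a CNF rule on the flipped labels ¬y, then negate it to get a DNF
# rule for y: ¬(C1 ∧ C2 ∧ ...) = ¬C1 ∨ ¬C2 ∨ ..., and each negated clause
# ¬(l1 ∨ l2 ∨ ...) becomes the conjunction (¬l1 ∧ ¬l2 ∧ ...).

def negate_cnf(cnf):
    """Negate a CNF (list of clauses) into a DNF (list of terms)."""
    return [[-lit for lit in clause] for clause in cnf]

# CNF learned for ¬y: (x1 ∨ x2) ∧ (¬x3)
cnf_for_not_y = [[1, 2], [-3]]
print(negate_cnf(cnf_for_not_y))  # [[-1, -2], [3]]  i.e. (¬x1 ∧ ¬x2) ∨ (x3)
```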

SLIDE 26

Part II: Experimental Results

SLIDE 27

Illustrative Example

  • Iris classification
  • Features: sepal length, sepal width, petal length, and petal width
  • MLIC learned R :=
    1. (sepal length > 6.3 ∨ sepal width > 3.0 ∨ petal width <= 1.5) ∧
    2. (sepal width <= 2.7 ∨ petal length > 4.0 ∨ petal width > 1.2) ∧
    3. (petal length <= 5.0)

SLIDE 28

Accuracy

Test accuracy, with runtime shown in parentheses:

Dataset      | Size  | # Features | RIPPER        | Log Reg     | NN          | RF            | SVM            | MLIC
TomsHardware | 28170 | 830        | 0.968 (92.8)  | 0.976 (0.2) | 0.977 (3.4) | 0.976 (64.9)  | Timeout        | 0.969 (2000)
Twitter      | 49990 | 1050       | 0.938 (187.3) | 0.963 (0.2) | 0.965 (6.8) | 0.962 (250.9) | 0.962 (1010.0) | 0.958 (2000)
adult-data   | 32560 | 262        | 0.852 (0.5)   | 0.801 (0.3) | 0.866 (3.0) | 0.844 (41.8)  | Timeout        | 0.755 (2000)
credit-card  | 30000 | 334        | 0.811 (0.7)   | 0.781 (0.1) | 0.822 (3.9) | 0.82 (25.5)   | Timeout        | 0.82 (2000)
ionosphere   | 350   | 564        | 0.886 (0.1)   | 0.909 (0.1) | 0.926 (1.2) | 0.909 (1.3)   | 0.886 (0.1)    | 0.889 (15.04)
PIMA         | 760   | 134        | 0.774 (0.1)   | 0.749 (0.1) | 0.764 (1.3) | 0.761 (1.3)   | 0.77 (21.4)    | 0.736 (2000)
parkinsons   | 190   | 392        | 0.868 (0.1)   | 0.884 (0.1) | 0.921 (1.2) | 0.895 (1.1)   | 0.879 (1.6)    | 0.895 (245)
Trans        | 740   | 64         | 0.78 (0.0)    | 0.759 (0.0) | 0.788 (1.2) | 0.788 (1.2)   | 0.765 (372.3)  | 0.797 (1177)
WDBC         | 560   | 540        | 0.961 (0.1)   | 0.936 (0.0) | 0.961 (1.3) | 0.943 (1.4)   | 0.955 (3.0)    | 0.946 (911)

SLIDE 29

Interpretability

Rule sizes (smaller is more interpretable):

Dataset      | Size  | # Features | RIPPER | MLIC
TomsHardware | 28170 | 830        | 57.5   | 4
Twitter      | 49990 | 1050       | 78.5   | 15
adult-data   | 32560 | 262        | 74.5   | 51.5
credit-card  | 30000 | 334        | 7.5    | 4
ionosphere   | 350   | 564        | 3      | 5.5
PIMA         | 760   | 134        | 5      | 9
parkinsons   | 190   | 392        | 6.5    | 6
Trans        | 740   | 64         | 6      | 4

SLIDE 30

Learning Rate

[Plot omitted: test error (0.02–0.14) vs. training data size (10%–90%), with curves labeled test:1.0, train:1.0, test:5.0, train:5.0]

Figure: Plot demonstrating the behavior of training and test accuracy vs. size of training data for WDBC.

SLIDE 31

Monotonicity

[Plot omitted]

Figure: Plot demonstrating the monotone behavior of training accuracy vs. λ for CNF and DNF rules with k = 1 and 2.

SLIDE 32

Part III: Conclusion

SLIDE 33

Summary

  • Need for interpretable machine learning systems for the use of AI in core public functions
  • The learning task is offline, so it allows the use of formal reasoning tools that can provide a certificate of correctness
  • Long history of prior work: heuristics to work around the combinatorial hardness of the optimization problems
  • The success of MaxSAT solvers offers an opportunity to design techniques with rigorous formal guarantees
  • MLIC introduces an approach that uses MaxSAT solvers to compute small CNF/DNF rules

SLIDE 34

Call to the MaxSAT community

Incremental Solving

  • The performance of MaxSAT solvers degrades as the problem size increases
  • For training data of size |D|, MLIC constructs a query of size |D| × k to learn k-clause rules
  • State-of-the-art ML techniques learn continuously. Incremental MaxSAT solving? Streaming MaxSAT?

Encodings

  • Boolean formulas can express any function
  • That should allow us to learn other popular structures such as decision trees, decision lists, etc.
  • We need to know more about the effect of encodings on MaxSAT problems

The area of interpretable machine learning systems will be crucial in the next decade, and the MaxSAT community can play a central role.

Multiple postdoc and Ph.D. positions available at the National University of Singapore. Remember, Singapore has been rated as the best city in the world to live in. And of course, you get to see the sun every day!