SLIDE 1

IMLI: An Incremental Framework for MaxSAT-Based Learning of Interpretable Classification Rules

Bishwamittra Ghosh Joint work with Kuldeep S. Meel

SLIDE 2

Applications of Machine Learning

SLIDE 3

Example Dataset

SLIDE 4

Representation of an interpretable model and a black box model

A sample is Iris Versicolor if (sepal length > 6.3 OR sepal width > 3 OR petal width ≤ 1.5) AND (sepal width ≤ 2.7 OR petal length > 4 OR petal width > 1.2) AND (petal length ≤ 5)

[Figure: side-by-side comparison of an interpretable model and a black box model]

SLIDE 5

Formula

◮ A CNF (Conjunctive Normal Form) formula is a conjunction of clauses, where each clause is a disjunction of literals

◮ A DNF (Disjunctive Normal Form) formula is a disjunction of clauses, where each clause is a conjunction of literals

◮ Example

◮ CNF: (a ∨ b ∨ c) ∧ (d ∨ e)
◮ DNF: (a ∧ b ∧ c) ∨ (d ∧ e)

◮ Decision rules in CNF and DNF are highly interpretable

[Malioutov’18; Lakkaraju’19]
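These definitions can be checked with a tiny evaluator. The list-of-(name, polarity) clause encoding below is an illustrative assumption, not a standard representation:

```python
# A CNF/DNF formula represented as a list of clauses; each literal is a
# (variable_name, polarity) pair, e.g. ("a", True) for a and ("a", False) for ¬a.
def eval_cnf(clauses, assignment):
    # CNF: every clause (a disjunction) must contain at least one true literal.
    return all(any(assignment[v] == pol for v, pol in clause) for clause in clauses)

def eval_dnf(clauses, assignment):
    # DNF: at least one clause (a conjunction) must have all literals true.
    return any(all(assignment[v] == pol for v, pol in clause) for clause in clauses)

# The slide's examples: CNF (a ∨ b ∨ c) ∧ (d ∨ e) and DNF (a ∧ b ∧ c) ∨ (d ∧ e).
cnf = [[("a", True), ("b", True), ("c", True)], [("d", True), ("e", True)]]
dnf = [[("a", True), ("b", True), ("c", True)], [("d", True), ("e", True)]]

assignment = {"a": True, "b": False, "c": False, "d": False, "e": True}
print(eval_cnf(cnf, assignment))  # True: both disjunctions are satisfied
print(eval_dnf(dnf, assignment))  # False: neither conjunction holds fully
```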

SLIDE 6

Expectation from a ML model

◮ Model needs to be interpretable
◮ End users should understand the reasoning behind decision-making

◮ Examples of interpretable models:

◮ Decision tree
◮ Decision rules (If-Else rules)
◮ ...
SLIDE 7

Definition of Interpretability in Rule-based Classification

◮ There exist different notions of interpretability of rules
◮ Rules with fewer terms are considered interpretable in medical domains [Letham’15]

◮ We consider rule size as a proxy for the interpretability of rule-based classifiers

◮ Rule size = number of literals
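Under this definition, rule size is trivial to compute; the nested-list rule encoding here is an assumption for illustration:

```python
# Rule size as defined on the slide: the total number of literals in the rule.
# A rule is a list of clauses, each clause a list of literals (sketch encoding).
def rule_size(rule):
    return sum(len(clause) for clause in rule)

# The 2-clause CNF rule (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x4) has 5 literals.
rule = [["x1", "x2", "x3"], ["x1", "x4"]]
print(rule_size(rule))  # 5
```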

SLIDE 8

Outline

◮ Introduction
◮ Preliminaries
◮ Motivation
◮ Proposed Framework
◮ Experimental Evaluation
◮ Conclusion

SLIDE 9

Motivation

◮ Recently, a MaxSAT-based interpretable rule learning framework, MLIC, has been proposed [Malioutov’18]

◮ MLIC learns interpretable rules expressed as CNF
◮ The number of clauses in the MaxSAT query is linear in the number of samples in the dataset

◮ MLIC suffers from poor scalability on large datasets

SLIDE 10

Can we design a sound framework that

◮ takes benefit of the success of MaxSAT solving
◮ scales to large datasets
◮ provides interpretability
◮ achieves competitive prediction accuracy?

SLIDE 11

IMLI: Incremental approach to MaxSAT-based Learning of Interpretable Rules

◮ p is the number of partitions
◮ n is the number of samples
◮ The number of clauses in each MaxSAT query is O(n/p)

SLIDE 12
Continued...

◮ Consider a binary variable bi for each feature i
◮ bi = 1{feature i is selected in R}
◮ Consider the assignment b1 = 1, b2 = 0, b3 = 0, b4 = 1

R = (1st feature OR 4th feature)

SLIDE 13
Continued...

In MaxSAT:

◮ Hard Clause: must always be satisfied, weight = ∞
◮ Soft Clause: can be falsified, weight ∈ R+

MaxSAT finds an assignment that satisfies all hard clauses and maximizes the total weight of satisfied soft clauses.

SLIDE 14
Continued...

From the (i − 1)-th partition we learn the assignment

◮ b1 = 0
◮ b2 = 1
◮ b3 = 0
◮ b4 = 1

For the i-th partition we construct the soft unit clauses

◮ ¬b1
◮ b2
◮ ¬b3
◮ b4

SLIDE 15

Experimental Results

SLIDE 16

Accuracy and training time of different classifiers

Dataset        | Size  | Features | RF            | SVC            | RIPPER        | MLIC            | IMLI
PIMA           | 768   | 134      | 76.62 (1.99)  | 75.32 (0.37)   | 75.32 (2.58)  | 75.97 (Timeout) | 73.38 (0.74)
Tom's HW       | 28179 | 844      | 97.11 (27.11) | 96.83 (354.15) | 96.75 (37.81) | 96.61 (Timeout) | 96.86 (23.67)
Adult          | 32561 | 262      | 84.31 (36.64) | 84.39 (918.26) | 83.72 (37.66) | 79.72 (Timeout) | 80.84 (25.07)
Credit-default | 30000 | 334      | 80.87 (37.72) | 80.69 (847.93) | 80.97 (20.37) | 80.72 (Timeout) | 79.41 (32.58)
Twitter        | 49999 | 1050     | 95.16 (67.83) | Timeout        | 95.56 (98.21) | 94.78 (Timeout) | 94.69 (59.67)

Table: For every cell in the classifier columns, the first value is the test accuracy (%) on unseen data and the value in parentheses is the average training time in seconds.

SLIDE 17

Size of interpretable rules of different classifiers

Dataset    | RIPPER | MLIC | IMLI
Parkinsons | 2.6    | 2    | 8
Ionosphere | 9.6    | 13   | 5
WDBC       | 7.6    | 14.5 | 2
Adult      | 107.55 | 44.5 | 28
PIMA       | 8.25   | 16   | 3.5
Tom's HW   | 30.33  | 2    | 2.5
Twitter    | 21.6   | 20.5 | 6
Credit     | 14.25  | 6    | 3

Table: Size of the rule of interpretable classifiers.

SLIDE 18

Rule for WDBC Dataset

Tumor is diagnosed as malignant if
standard area of tumor > 38.43 OR
largest perimeter of tumor > 115.9 OR
largest number of concave points of tumor > 0.1508

SLIDE 19

Conclusion

◮ We propose IMLI: an incremental MaxSAT-based framework for learning interpretable classification rules

◮ IMLI achieves up to three orders of magnitude runtime improvement without loss of accuracy or interpretability

◮ The generated rules appear to be reasonable, intuitive, and more interpretable

SLIDE 20

Thank You !!

SLIDE 22

MaxSAT

◮ MaxSAT is the optimization version of the SAT problem
◮ It tries to maximize the number of satisfied clauses in the formula
◮ A variant of general MaxSAT is weighted partial MaxSAT

◮ Maximize the total weight of satisfied clauses
◮ Consider two types of clauses:

1. Hard clause: weight is infinity, hence always satisfied
2. Soft clause: priority is set based on a positive real-valued weight

◮ The cost of a solution is the total weight of unsatisfied soft clauses
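For intuition, weighted partial MaxSAT can be sketched as a brute-force search (real systems use dedicated solvers; this enumeration is only illustrative, with signed-integer literals in the DIMACS convention):

```python
from itertools import product

INF = float("inf")

# Brute-force weighted partial MaxSAT. Clauses are (weight, [literals]);
# a literal is a signed integer: +v means variable v true, -v means v false.
def maxsat(n_vars, clauses):
    best_cost, best_assignment = INF, None
    for bits in product([False, True], repeat=n_vars):
        def sat(lits):
            return any(bits[abs(l) - 1] == (l > 0) for l in lits)
        # Hard clauses (infinite weight) must all hold.
        if any(w == INF and not sat(lits) for w, lits in clauses):
            continue
        # Cost = total weight of falsified soft clauses.
        cost = sum(w for w, lits in clauses if w != INF and not sat(lits))
        if cost < best_cost:
            best_cost, best_assignment = cost, bits
    return best_cost, best_assignment

# Hard: x1 ∨ x2; soft: ¬x1 (weight 2), ¬x2 (weight 1).
cost, assignment = maxsat(2, [(INF, [1, 2]), (2, [-1]), (1, [-2])])
print(cost, assignment)  # 1 (False, True)
```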

SLIDE 23

Example of MaxSAT

1 : x
2 : y
3 : z
∞ : ¬x ∨ ¬y
∞ : x ∨ ¬z
∞ : y ∨ ¬z

Optimal Assignment: ¬x, y, ¬z
Cost of the solution is 1 + 3 = 4
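The optimal assignment for this instance can be verified by exhaustive search:

```python
from itertools import product

# Soft clauses: (1, x), (2, y), (3, z); hard: ¬x∨¬y, x∨¬z, y∨¬z.
best = None
for x, y, z in product([False, True], repeat=3):
    if not ((not x or not y) and (x or not z) and (y or not z)):
        continue  # a hard clause is violated
    # Cost = total weight of falsified soft clauses.
    cost = (0 if x else 1) + (0 if y else 2) + (0 if z else 3)
    if best is None or cost < best[0]:
        best = (cost, (x, y, z))
print(best)  # (4, (False, True, False)): ¬x, y, ¬z with cost 1 + 3 = 4
```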

SLIDE 26

Solution Outline

◮ Reduce the learning problem to an optimization problem
◮ Define the objective function
◮ Define the decision variables
◮ Define the constraints
◮ Choose a proper solver to find an assignment of the decision variables
◮ Construct the rule

SLIDE 27

Input Specification

◮ The discrete optimization problem requires the dataset to be binary
◮ Categorical and real-valued features can be converted to binary by standard techniques, e.g., one-hot encoding and comparison of feature values with predefined thresholds

◮ Input instance {X, y}, where X ∈ {0, 1}^(n×m) and y ∈ {0, 1}^n
◮ x = {x1, . . . , xm} is the boolean feature vector
◮ Learn a k-clause CNF rule
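The binarization step can be sketched as follows; the feature names and thresholds are made up for illustration:

```python
# One-hot encoding for categorical features, threshold comparison for real
# values, as the slide describes (a minimal sketch, not IMLI's preprocessing).
def binarize(rows, thresholds, categories):
    X = []
    for row in rows:
        features = []
        for name, t in thresholds:       # real-valued: compare to threshold
            features.append(1 if row[name] > t else 0)
        for name, values in categories:  # categorical: one-hot
            features.extend(1 if row[name] == v else 0 for v in values)
        X.append(features)
    return X

rows = [{"glucose": 148, "age": 50, "sex": "F"},
        {"glucose": 85, "age": 31, "sex": "M"}]
X = binarize(rows,
             thresholds=[("glucose", 125), ("age", 25)],
             categories=[("sex", ["F", "M"])])
print(X)  # [[1, 1, 1, 0], [0, 1, 0, 1]]
```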

SLIDE 28

Objective Function

◮ Let |R| = number of literals in the rule
◮ ER = set of samples misclassified by R
◮ λ = data fidelity parameter
◮ We find a classifier R as follows:

min_R |R| + λ|ER|   such that   ∀ Xi ∉ ER, yi = R(Xi)

◮ |R| defines interpretability (sparsity)
◮ |ER| defines classification error
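The objective can be evaluated directly for a candidate rule; the (index, polarity) literal encoding and the toy data below are illustrative assumptions:

```python
# Objective |R| + λ|E_R| for a CNF rule given as a list of clauses of
# (feature_index, positive?) literals over a binary dataset (sketch only).
def objective(rule, X, y, lam):
    def predict(x):
        # CNF: every clause must contain at least one satisfied literal.
        return all(any(x[j] == int(pos) for j, pos in clause) for clause in rule)
    size = sum(len(clause) for clause in rule)           # |R|
    errors = sum(1 for xi, yi in zip(X, y)               # |E_R|
                 if int(predict(xi)) != yi)
    return size + lam * errors

X = [[0, 1, 1], [1, 0, 1]]
y = [1, 0]
rule = [[(1, True), (2, True)]]   # single clause: x2 ∨ x3
print(objective(rule, X, y, lam=10))  # rule size 2, one error -> 2 + 10*1 = 12
```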

SLIDE 29

Decision Variables

Two types of decision variables:

1. Feature variable b^l_j
◮ Feature xj can participate in the l-th clause of the CNF rule R
◮ If b^l_j is assigned true, feature xj is present in the l-th clause of R
◮ Let R = (x1 ∨ x2 ∨ x3) ∧ (x1 ∨ x4)
◮ For feature x1, the decision variables b^1_1 and b^2_1 are assigned true

2. Noise variable (classification error) ηq
◮ If ηq is assigned true, the q-th sample is misclassified by R
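The two variable families can be mapped to the positive integer ids a MaxSAT solver expects; the numbering scheme below is an assumed convention, not IMLI's actual one:

```python
# Feature variables b[(l, j)] (feature j in clause l of a k-clause CNF rule)
# and noise variables eta[q] (sample q misclassified), numbered 1..k*m+n.
def make_variables(k, m, n):
    b = {(l, j): (l - 1) * m + j
         for l in range(1, k + 1) for j in range(1, m + 1)}
    eta = {q: k * m + q for q in range(1, n + 1)}
    return b, eta

b, eta = make_variables(k=2, m=4, n=3)
print(b[(1, 1)], b[(2, 4)], eta[1])  # 1 8 9
```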

SLIDE 31

MaxSAT Constraints Qi

◮ A MaxSAT constraint is a CNF formula where each clause has a weight

◮ Qi is the MaxSAT constraint for the i-th partition
◮ Qi consists of three sets of clauses

SLIDE 32
1. Soft Clause for Feature Variables

◮ IMLI tries to falsify each feature variable b^l_j for sparsity
◮ If a feature variable was assigned true in R_(i−1), IMLI keeps the previous assignment:

V^l_j := b^l_j if xj ∈ clause(R_(i−1), l), otherwise ¬b^l_j;   W(V^l_j) = 1
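Generating these soft clauses can be sketched as follows, assuming R_(i−1) is given as a set of (l, j) pairs; the tuple encoding of literals is an illustrative assumption, not IMLI's data structure:

```python
# Soft clauses V_j^l: prefer b_j^l false (sparsity), but keep it true if the
# previous partition's rule selected feature x_j in clause l. Weight 1 each.
def feature_soft_clauses(k, m, prev_rule):
    # prev_rule: set of (l, j) pairs with x_j in clause l of R_(i-1)
    return [((("b", l, j), (l, j) in prev_rule), 1)
            for l in range(1, k + 1) for j in range(1, m + 1)]

# R_(i-1) = (x1 ∨ x2) ∧ (x1), as in the running example (k = 2, m = 3).
clauses = feature_soft_clauses(k=2, m=3, prev_rule={(1, 1), (1, 2), (2, 1)})
for (var, positive), w in clauses:
    print(("" if positive else "¬") + f"b^{var[1]}_{var[2]}", "weight", w)
```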

SLIDE 35

Example

Xi = [ 0 1 1 ; 1 0 1 ];   yi = [ 1 ; 0 ]

◮ #samples n = 2, #features m = 3
◮ We learn a 2-clause rule, i.e., k = 2

Let R_(i−1) = (x1 ∨ x2) ∧ (x1), i.e., b^1_1, b^1_2, b^2_1 were assigned true.

Now V^1_1 = (b^1_1); V^1_2 = (b^1_2); V^1_3 = (¬b^1_3); V^2_1 = (b^2_1); V^2_2 = (¬b^2_2); V^2_3 = (¬b^2_3)

SLIDE 36
2. Soft Clause for Noise Variables

◮ IMLI tries to falsify as many noise variables as possible
◮ As the data fidelity parameter λ is proportionate to accuracy, IMLI puts weight λ on the following soft clause:

Nq := (¬ηq);   W(Nq) = λ

SLIDE 37

Example

Xi = [ 0 1 1 ; 1 0 1 ];   yi = [ 1 ; 0 ]

N1 := (¬η1)
N2 := (¬η2)

SLIDE 38
3. Hard Clause

◮ A hard clause must always be satisfied
◮ If a sample is predicted correctly, the class label equals the prediction of the generated rule and the noise variable is assigned false
◮ Otherwise, the noise variable is assigned true

SLIDE 39
3. Hard Clause (continued)

◮ The “◦” operator returns the dot product between two vectors
◮ u is a vector of constants
◮ v is a vector of feature variables
◮ u ◦ v = ∨_i (ui ∧ vi), where ui and vi denote the variable/constant at the i-th index of u and v respectively
◮ Here “∧” has the standard interpretation, i.e., a ∧ 1 = a, a ∧ 0 = 0
◮ Let B^l = {b^l_j | j ∈ [1, m]} be the vector of feature variables for the l-th clause

Dq := (¬ηq → (yq ↔ ∧_(l=1..k) (Xq ◦ B^l)));   W(Dq) = ∞
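The "◦" operator can be sketched directly: it keeps the feature variables at positions where the sample has a 1, i.e., the disjunction that decides whether clause l fires on the sample (the tuple encoding of variables is an assumption):

```python
# X_q ◦ B^l: collect the feature variables b_j^l at positions where the
# binary sample has a 1; their disjunction is clause l's value on the sample.
def dot(sample, l):
    return [("b", l, j + 1) for j, bit in enumerate(sample) if bit == 1]

# Sample X_1 = (0, 1, 1): clause l of R covers it iff b_2^l or b_3^l is set.
print(dot([0, 1, 1], l=1))  # [('b', 1, 2), ('b', 1, 3)]
```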

SLIDE 41

Example

Xi = [ 0 1 1 ; 1 0 1 ];   yi = [ 1 ; 0 ]

Dq := (¬ηq → (yq ↔ ∧_(l=1..k) (Xq ◦ B^l)));   W(Dq) = ∞

For sample 1: X1 ◦ B^1 = b^1_2 ∨ b^1_3 and X1 ◦ B^2 = b^2_2 ∨ b^2_3, so

D1 := (¬η1 → ((b^1_2 ∨ b^1_3) ∧ (b^2_2 ∨ b^2_3)))

For sample 2: X2 ◦ B^1 = b^1_1 ∨ b^1_3 and X2 ◦ B^2 = b^2_1 ∨ b^2_3, so

D2 := (¬η2 → (¬(b^1_1 ∨ b^1_3) ∨ ¬(b^2_1 ∨ b^2_3)))

SLIDE 42

MaxSAT constraint Qi

Qi is the conjunction of all soft and hard clauses:

Qi := ∧_(l,j) V^l_j ∧ ∧_q Nq ∧ ∧_q Dq

SLIDE 43

MaxSAT Constraint Qi

1 : b^1_1
1 : b^1_2
1 : ¬b^1_3
1 : b^2_1
1 : ¬b^2_2
1 : ¬b^2_3
λ : ¬η1
λ : ¬η2
∞ : ¬η1 → ((b^1_2 ∨ b^1_3) ∧ (b^2_2 ∨ b^2_3))
∞ : ¬η2 → (¬(b^1_1 ∨ b^1_3) ∨ ¬(b^2_1 ∨ b^2_3))
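This full query is small enough to check by exhaustive search (for illustration only; λ = 10 is an assumed value):

```python
from itertools import product

# Brute-force the query Q_i: 6 feature variables, 2 noise variables.
LAM = 10
best = None
for b11, b12, b13, b21, b22, b23, e1, e2 in product([0, 1], repeat=8):
    # Hard clauses: a correctly classified sample must match the rule.
    if not e1 and not ((b12 or b13) and (b22 or b23)):
        continue
    if not e2 and not ((not (b11 or b13)) or (not (b21 or b23))):
        continue
    # Cost: falsified unit clauses V_j^l (weight 1) and N_q (weight λ).
    cost = ((1 - b11) + (1 - b12) + b13 + (1 - b21) + b22 + b23
            + LAM * (e1 + e2))
    if best is None or cost < best[0]:
        best = (cost, (b11, b12, b13, b21, b22, b23, e1, e2))
print("optimal cost:", best[0])  # optimal cost: 2
```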

SLIDE 44

Construction of Rule R

R consists of the features whose variables are assigned true

Construction:

Let σ* = MaxSAT(Qi, W). Then xj ∈ clause(Ri, l) iff σ*(b^l_j) = true.
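The construction can be sketched as follows; the dict representation of the assignment σ* is an illustrative assumption:

```python
# Feature x_j appears in clause l of R_i exactly when the solver's optimal
# assignment sets b_j^l to true.
def build_rule(k, m, assignment):
    # assignment: dict mapping (l, j) -> bool for the feature variables b_j^l
    return [[f"x{j}" for j in range(1, m + 1) if assignment[(l, j)]]
            for l in range(1, k + 1)]

sigma = {(1, 1): True, (1, 2): True, (1, 3): False,
         (2, 1): False, (2, 2): True, (2, 3): False}
print(build_rule(2, 3, sigma))  # [['x1', 'x2'], ['x2']]
```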

SLIDE 45

Effect of #partition on rule size

[Plot: rule size vs. number of partitions p ∈ {2, 4, 8, 16}, for DNF(1), CNF(1), DNF(2), CNF(2)]

SLIDE 46

Effect of data fidelity on rule size

[Plot: rule size vs. data fidelity parameter λ ∈ {2, 4, 6, 8, 10}, for CNF(1), CNF(2)]

SLIDE 47

Effect of #partition on training time

[Plot: training time (s) vs. number of partitions p ∈ {2, 4, 8, 16}, for DNF(1), CNF(1), DNF(2), CNF(2)]

SLIDE 48

Effect of #partition on training accuracy

[Plot: training accuracy (%) vs. number of partitions p ∈ {2, 4, 8, 16}, for DNF(1), CNF(1), DNF(2), CNF(2)]

SLIDE 49

Effect of #partition on validation accuracy

[Plot: validation accuracy (%) vs. number of partitions p ∈ {2, 4, 8, 16}, for DNF(1), CNF(1), DNF(2), CNF(2)]

SLIDE 50

Effect of data fidelity on training time

[Plot: training time (s) vs. data fidelity parameter λ ∈ {2, 4, 6, 8, 10}, for CNF(1), CNF(2)]

SLIDE 51

Interpretable Rule: Twitter Dataset

A topic is popular if
Number of Created Discussions at time 1 > 78 OR
Attention Level measured with number of authors at time 6 > 0.000365 OR
Attention Level measured with number of contributions at time 0 > 0.00014 OR
Attention Level measured with number of contributions at time 1 > 0.000136 OR
Number of Authors at time 0 > 147 OR
Average Discussions Length at time 3 > 205.4 OR
Average Discussions Length at time 5 > 654.0

SLIDE 52

Interpretable Rule: Parkinson’s Disease Dataset

A person has Parkinson’s disease if
(minimum vocal fundamental frequency ≤ 87.57 Hz OR
minimum vocal fundamental frequency > 121.38 Hz OR
Shimmer:APQ3 ≤ 0.01 OR MDVP:APQ > 0.02 OR
D2 ≤ 1.93 OR NHR > 0.01 OR HNR > 26.5 OR spread2 > 0.3)
AND
(maximum vocal fundamental frequency ≤ 200.41 Hz OR
HNR ≤ 18.8 OR spread2 > 0.18 OR D2 > 2.92)

SLIDE 53

Rule for Pima Indians Diabetes Database

Tested positive for diabetes if
Plasma glucose concentration > 125 AND
Triceps skin fold thickness ≤ 35 mm AND
Diabetes pedigree function > 0.259 AND
Age > 25 years

SLIDE 54

Rule for Blood Transfusion Service Center Dataset

A person will donate blood if
Months since last donation ≤ 4 AND
Total number of donations > 3 AND
Total donated blood ≤ 750.0 c.c. AND
Months since first donation ≤ 45
