

SLIDE 1

Born-Again Tree Ensembles

Thibaut Vidal¹, Maximilian Schiffer², with the support of Toni Pacheco¹

¹ Computer Science Department, Pontifical Catholic University of Rio de Janeiro
² TUM School of Management, Technical University of Munich

SLIDE 2

Our Concept

• We propose the first exact algorithm that transforms a tree ensemble into a born-again decision tree (BA tree) that is:
  ◮ Optimal in size (number of leaves or depth), and
  ◮ Faithful to the tree ensemble in its entire feature space.

• The BA tree is effectively a different representation of the same decision function: we seek a single, minimal-size decision tree that faithfully reproduces the decision function of the random forest.


SLIDE 3

Why interpretability is critical

• Machine learning is becoming widespread, even for high-stakes decisions:
  ◮ Recurrence predictions in medicine
  ◮ Custody decisions in criminal justice
  ◮ Credit risk evaluations...

• Some studies suggest that there is a trade-off between algorithm accuracy and interpretability.
  ◮ This is not always the case [1].

• We need interpretable and accurate algorithms to leverage the best of both worlds.


SLIDE 4

Related Research

Thinning tree ensembles
  ◮ Pruning some weak learners [18, 21, 22, 25]
  ◮ Replacing the tree ensemble by a simpler classifier [2, 7, 19, 23]
  ◮ Rule extraction via Bayesian model selection [14]
  ◮ Extracting a single tree from a tree ensemble by actively sampling training points [3, 4]

Thinning neural networks
  ◮ Model compression and knowledge distillation [8, 15]: using a "teacher" to train a compact "student" with similar knowledge.
  ◮ Creating soft decision trees from a neural network [11], or decomposing the gradient in knowledge distillation [12].
  ◮ Simplifying neural networks [9, 10] or synthesizing them as an interpretable simulation model [17].

Optimal decision trees
  ◮ Linear programming algorithms have been exploited to find linear combination splits [5].
  ◮ Extensive study of global optimization methods, based on mixed-integer programming or dynamic programming, for the construction of optimal decision trees [6, 13, 16, 20, 24].

Thinning algorithms do not guarantee faithfulness


SLIDE 5

Methodology

Construction Process

[Figure: construction process. The splits of the ensemble's trees (e.g., x1 ≤ 2, x1 ≤ 4, x1 ≤ 7, x2 ≤ 2, x2 ≤ 4) partition the feature space into regions and cells; the majority class of the ensemble is computed for each cell, and a dynamic program assembles the born-again tree from these cells.]
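As an illustration of the first step, the sketch below (assuming scikit-learn; not the authors' code) collects the distinct split levels used by a fitted random forest, feature by feature. These "hyperplane levels" define the cell grid on which the dynamic program of the following slides operates.

```python
# Sketch: enumerate the split levels ("hyperplane levels") of a fitted
# scikit-learn random forest, feature by feature. The distinct levels of each
# feature define the cell grid; the ensemble's decision function is constant
# inside each cell, so the born-again tree only needs to distinguish cells.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

levels = {j: set() for j in range(X.shape[1])}   # feature index -> split thresholds
for tree in forest.estimators_:
    t = tree.tree_
    for node in range(t.node_count):
        if t.children_left[node] != -1:          # internal node (leaves are marked -1)
            levels[t.feature[node]].add(t.threshold[node])

hyperplane_levels = {j: sorted(v) for j, v in levels.items() if v}
```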


SLIDE 6

Methodology

Problem 1 (Born-Again Tree Ensemble). Given a tree ensemble $\mathcal{T}$, we search for a decision tree $T$ of minimal size such that $F_{\mathcal{T}}(x) = F_T(x)$ for all $x \in \mathbb{R}^p$.

Theorem 1. Problem 1 is NP-hard when optimizing depth, number of leaves, or any hierarchy of these two objectives. Verifying that a given solution is feasible (faithful) is also NP-hard.
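Since exact verification of faithfulness is NP-hard, a sampling-based spot check is the natural practical sanity test. The sketch below is illustrative only (ensemble_predict, ba_tree, and the sampling box are assumptions, not part of the paper); per Theorem 1 it can falsify faithfulness but never prove it.

```python
# Illustrative sketch, not the authors' code: F_T(x) as a majority vote over the
# trees of a scikit-learn forest (class labels assumed to be 0..K-1), plus a
# random-sampling spot check against a candidate born-again tree `ba_tree`.
import numpy as np

def ensemble_predict(forest, X):
    votes = np.stack([tree.predict(X).astype(int) for tree in forest.estimators_])
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

def spot_check(forest, ba_tree, low, high, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(n_samples, len(low)))
    return float(np.mean(ensemble_predict(forest, X) == ba_tree.predict(X)))
```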


SLIDE 7

Methodology

Dynamic Program 1

Let $\Phi(z^l, z^r)$ be the depth of an optimal born-again decision tree for a region $(z^l, z^r)$. Then:

$$
\Phi(z^l, z^r) =
\begin{cases}
0 & \text{if } \mathrm{id}(z^l, z^r), \\[4pt]
\min\limits_{1 \le j \le p} \;\; \min\limits_{z^l_j \le l < z^r_j}
\Big( 1 + \max\big\{ \Phi(z^l, z^r_{jl}),\ \Phi(z^l_{jl}, z^r) \big\} \Big) & \text{otherwise,}
\end{cases}
$$

where $z^r_{jl}$ and $z^l_{jl}$ denote the borders of the two sub-regions obtained by splitting $(z^l, z^r)$ on feature $j$ at level $l$, and $\mathrm{id}(z^l, z^r)$ takes value True iff all cells $z$ such that $z^l \le z \le z^r$ are from the same class (i.e., the base case).

Issue 1: Detecting base cases.
Issue 2: Numerous recursive calls.
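A minimal memoized sketch of this recursion, assuming regions are given as tuples of cell indices and the purity predicate id(z^l, z^r) is available as a function same_class (both assumptions for illustration, not part of the slide):

```python
# Sketch of Dynamic Program 1: Phi(zl, zr) = optimal depth of a born-again tree
# for the region (zl, zr), where zl and zr are tuples of cell indices.
from functools import lru_cache

def make_phi(same_class):
    # same_class(zl, zr): True iff all cells z with zl <= z <= zr share the same
    # majority class under the ensemble (the id(zl, zr) predicate); assumed given.
    @lru_cache(maxsize=None)
    def phi(zl, zr):
        if same_class(zl, zr):                        # base case: pure region, depth 0
            return 0
        best = float("inf")
        for j in range(len(zl)):                      # candidate split feature j
            for l in range(zl[j], zr[j]):             # candidate split level l
                zr_jl = zr[:j] + (l,) + zr[j + 1:]    # upper border of left sub-region
                zl_jl = zl[:j] + (l + 1,) + zl[j + 1:]  # lower border of right sub-region
                best = min(best, 1 + max(phi(zl, zr_jl), phi(zl_jl, zr)))
        return best
    return phi
```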


SLIDE 8

Circumventing Issue 1

We tried several alternatives to efficiently check base cases. The best approach we found consisted of including the base-case evaluation within the DP:

Dynamic Program 2

Let $\Phi(z^l, z^r)$ be the depth of an optimal born-again decision tree for a region $(z^l, z^r)$. Then:

$$
\Phi(z^l, z^r) =
\min_{1 \le j \le p} \;\; \min_{z^l_j \le l < z^r_j}
\Big( \mathbb{1}_{jl}(z^l, z^r) + \max\big\{ \Phi(z^l, z^r_{jl}),\ \Phi(z^l_{jl}, z^r) \big\} \Big),
$$

where

$$
\mathbb{1}_{jl}(z^l, z^r) =
\begin{cases}
0 & \text{if } \Phi(z^l, z^r_{jl}) = \Phi(z^l_{jl}, z^r) = 0 \text{ and } F_{\mathcal{T}}(z^l) = F_{\mathcal{T}}(z^r), \\
1 & \text{otherwise.}
\end{cases}
$$
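A sketch of the same recursion with the base-case test folded in, under the same assumed region representation as above; ensemble_class(z) stands for $F_{\mathcal{T}}$ evaluated at a single cell z (an assumed helper):

```python
# Sketch of Dynamic Program 2: no separate region-purity check; the indicator
# 1_jl contributes 0 when both halves are pure and agree on the class.
from functools import lru_cache

def make_phi2(ensemble_class):
    @lru_cache(maxsize=None)
    def phi(zl, zr):
        if zl == zr:                                   # single cell: trivially pure
            return 0
        best = float("inf")
        for j in range(len(zl)):
            for l in range(zl[j], zr[j]):
                zr_jl = zr[:j] + (l,) + zr[j + 1:]
                zl_jl = zl[:j] + (l + 1,) + zl[j + 1:]
                left, right = phi(zl, zr_jl), phi(zl_jl, zr)
                indicator = 0 if (left == right == 0 and
                                  ensemble_class(zl) == ensemble_class(zr)) else 1
                best = min(best, indicator + max(left, right))
        return best
    return phi
```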


SLIDE 9

Circumventing Issue 2

We exploit two simple properties to reduce the number of recursive calls:

Property 2. If $\Phi(z^l, z^r_{jl}) \ge \Phi(z^l_{jl}, z^r)$, then for all $l' > l$:

$$
\mathbb{1}_{jl}(z^l, z^r) + \max\big\{ \Phi(z^l, z^r_{jl}),\ \Phi(z^l_{jl}, z^r) \big\}
\;\le\;
\mathbb{1}_{jl'}(z^l, z^r) + \max\big\{ \Phi(z^l, z^r_{jl'}),\ \Phi(z^l_{jl'}, z^r) \big\}.
$$

Property 3. If $\Phi(z^l, z^r_{jl}) \le \Phi(z^l_{jl}, z^r)$, then for all $l' < l$:

$$
\mathbb{1}_{jl}(z^l, z^r) + \max\big\{ \Phi(z^l, z^r_{jl}),\ \Phi(z^l_{jl}, z^r) \big\}
\;\le\;
\mathbb{1}_{jl'}(z^l, z^r) + \max\big\{ \Phi(z^l, z^r_{jl'}),\ \Phi(z^l_{jl'}, z^r) \big\}.
$$

[Figure: a region $(z^L, z^R)$ split on feature $j$, with sub-tree depths $\phi = 2$ and $\phi = 1$ on the two sides of the split level.]

These properties allow us to search for the best hyperplane level of each feature with a binary search.
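A sketch of how Properties 2 and 3 could drive that binary search over the levels of a fixed feature j, reusing the assumed helpers of the earlier sketches (not the authors' implementation):

```python
# Sketch: binary search for the best split level l of feature j inside region
# (zl, zr). Property 2 rules out larger levels when the left half is at least as
# deep; Property 3 rules out smaller levels when the right half is at least as deep.
def best_level_for_feature(phi, ensemble_class, zl, zr, j):
    lo, hi = zl[j], zr[j] - 1                      # candidate levels l in [zl_j, zr_j)
    best = float("inf")
    while lo <= hi:
        l = (lo + hi) // 2
        zr_jl = zr[:j] + (l,) + zr[j + 1:]
        zl_jl = zl[:j] + (l + 1,) + zl[j + 1:]
        left, right = phi(zl, zr_jl), phi(zl_jl, zr)
        indicator = 0 if (left == right == 0 and
                          ensemble_class(zl) == ensemble_class(zr)) else 1
        best = min(best, indicator + max(left, right))
        if left == right:                          # both properties apply: stop here
            break
        if left > right:                           # Property 2: larger l cannot improve
            hi = l - 1
        else:                                      # Property 3: smaller l cannot improve
            lo = l + 1
    return best
```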


SLIDE 10

Experimental Analyses

Datasets We used datasets from diverse applications, including medicine (BC, PD), criminal justice (COMPAS), and credit scoring (FICO).

Data set              n       p   K   Class distr.   Source
BC – Breast-Cancer    683     9   2   65-35          UCI
CP – COMPAS           6907    12  2   54-46          HuEtAl
FI – FICO             10459   17  2   52-48          HuEtAl
HT – HTRU2            17898   8   2   91-9           UCI
PD – Pima-Diabetes    768     8   2   65-35          SmithEtAl
SE – Seeds            210     7   3   33-33-33       UCI

(n: number of samples, p: number of features, K: number of classes)

Data Preparation
  ◮ One-hot encoding for categorical variables.
  ◮ Continuous variables binned into ten ordinal scales.
  ◮ Training and test samples generated for all data sets by ten-fold cross-validation.
  ◮ For each fold and each dataset, a random forest composed of 10 trees with a depth of 3 is generated.
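A possible realization of this preparation with scikit-learn (an assumption about tooling; the column lists and dataset loading are left to the caller):

```python
# Sketch of the data preparation described above: continuous features binned into
# ten ordinal levels, categorical features one-hot encoded, and a 10-tree random
# forest of depth 3 evaluated with 10-fold cross-validation.
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder

def run_cv(X, y, continuous_cols, categorical_cols, seed=0):
    pre = ColumnTransformer([
        ("bin", KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile"),
         continuous_cols),
        ("ohe", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    model = Pipeline([
        ("prep", pre),
        ("rf", RandomForestClassifier(n_estimators=10, max_depth=3, random_state=seed)),
    ])
    scores = []
    for train, test in StratifiedKFold(n_splits=10, shuffle=True,
                                       random_state=seed).split(X, y):
        model.fit(X[train], y[train])
        scores.append(model.score(X[test], y[test]))
    return float(np.mean(scores))
```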


SLIDE 11

Experimental Analyses

Scalability

[Plots: computational time (ms) of the DP as a function of the number of samples n (250 to 10,000), the number of features p (2 to 17), and the number of trees (3 to 20).]


SLIDE 12

Experimental Analyses

Simplicity

Depth and number of leaves of the born-again trees, for each objective (D: depth, L: number of leaves, DL: depth, then number of leaves):

                 D                  L                  DL
Data set   Depth  # Leaves    Depth  # Leaves    Depth  # Leaves
BC          12.5    2279.4     18.0     890.1     12.5    1042.3
CP           8.9     119.9      8.9      37.1      8.9      37.1
FI           8.6      71.3      8.6      39.2      8.6      39.2
HT           6.0      20.2      6.3      11.9      6.0      12.0
PD           9.6     460.1     15.0     169.7      9.6     206.7
SE          10.2     450.9     13.8     214.6     10.2     261.0
Avg.         9.3     567.0     11.8     227.1      9.3     266.4

Analysis: The decision function of a random forest is visibly complex. One main reason: incompatible feature combinations are still represented, and the decision function of the RF is not necessarily uniform on these regions due to the other features.


SLIDE 13

Experimental Analyses

Post-Pruning: eliminate inexpressive tree sub-regions, from bottom to top (a sketch follows below):

  • Verify whether both sides of a split contain at least one training sample.
  • Eliminate every split with an empty side.
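A minimal sketch of this bottom-up pruning pass, under an assumed explicit node representation for the born-again tree (not the authors' data structures):

```python
# Sketch: walk the born-again tree bottom-up and collapse any split for which one
# side contains no training samples, keeping only the non-empty child. After this
# step, faithfulness over the full feature space is no longer guaranteed.
class Node:
    def __init__(self, feature=None, level=None, left=None, right=None, label=None):
        self.feature, self.level = feature, level
        self.left, self.right, self.label = left, right, label

    def is_leaf(self):
        return self.label is not None

def post_prune(node, X):
    # X: numpy array of the training samples routed to this node.
    if node.is_leaf():
        return node
    go_left = X[:, node.feature] <= node.level
    node.left = post_prune(node.left, X[go_left])
    node.right = post_prune(node.right, X[~go_left])
    if not go_left.any():                 # left side empty: keep only the right child
        return node.right
    if go_left.all():                     # right side empty: keep only the left child
        return node.left
    return node
```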


SLIDE 14

Experimental Analyses

Analysis: With post-pruning, faithfulness is no longer guaranteed by definition. We need to experimentally evaluate:

◮ Impact on simplicity
◮ Impact on accuracy

Depth and number of leaves:

              RF       BA-Tree            BA+P
Data set  # Leaves   Depth  # Leaves   Depth  # Leaves
BC           61.1     12.5    2279.4     9.1      35.9
CP           46.7      8.9     119.9     7.0      31.2
FI           47.3      8.6      71.3     6.5      15.8
HT           42.6      6.0      20.2     5.1      13.2
PD           53.7      9.6     460.1     9.4      79.0
SE           55.7     10.2     450.9     7.5      21.5
Avg.         51.2      9.3     567.0     7.4      32.8

Accuracy and F1 score comparison:

              RF             BA-Tree           BA+P
Data set   Acc    F1      Acc    F1       Acc    F1
BC        0.953  0.949   0.953  0.949    0.946  0.941
CP        0.660  0.650   0.660  0.650    0.660  0.650
FI        0.697  0.690   0.697  0.690    0.697  0.690
HT        0.977  0.909   0.977  0.909    0.977  0.909
PD        0.746  0.692   0.746  0.692    0.750  0.700
SE        0.790  0.479   0.790  0.479    0.790  0.481
Avg.      0.804  0.728   0.804  0.728    0.803  0.729


SLIDE 15

Conclusions

• Compact representations of the decision functions of random forests, as a single, minimal-size decision tree.

• Sheds new light on random forest visualization and interpretability.

• Progressing towards interpretable models is an important step towards addressing bias and data mistakes in learning algorithms.

• Optimal classifiers can be fairly complex: BA-trees reproduce the complete decision function for all regions of the feature space.
  ◮ Pruning can address this issue.
  ◮ Heuristics can be used for datasets that are too large to be solved to optimality.


SLIDE 16

Bibliography I

[1] Angelino, E., N. Larus-Stone, D. Alabi, M. Seltzer, C. Rudin. 2018. Learning certifiably optimal rule lists for categorical data. Journal of Machine Learning Research 18 1–78.
[2] Bai, J., Y. Li, J. Li, Y. Jiang, S. Xia. 2019. Rectified decision trees: Towards interpretability, compression and empirical soundness. arXiv preprint arXiv:1903.05965.
[3] Bastani, O., C. Kim, H. Bastani. 2017. Interpretability via model extraction. arXiv preprint arXiv:1706.09773.
[4] Bastani, O., C. Kim, H. Bastani. 2017. Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504.
[5] Bennett, K. 1992. Decision tree construction via linear programming. Proceedings of the 4th Midwest Artificial Intelligence and Cognitive Science Society Conference, Utica, Illinois.
[6] Bertsimas, D., J. Dunn. 2017. Optimal classification trees. Machine Learning 106(7) 1039–1082.
[7] Breiman, L., N. Shang. 1996. Born again trees. Tech. rep., University of California, Berkeley.
[8] Buciluă, C., R. Caruana, A. Niculescu-Mizil. 2006. Model compression. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[9] Clark, K., M.-T. Luong, U. Khandelwal, C. D. Manning, Q. V. Le. 2019. BAM! Born-again multi-task networks for natural language understanding. arXiv preprint arXiv:1907.04829.


SLIDE 17

Bibliography II

[10] Frankle, J., M. Carbin. 2018. The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635.
[11] Frosst, N., G. Hinton. 2017. Distilling a neural network into a soft decision tree. arXiv preprint arXiv:1711.09784.
[12] Furlanello, T., Z. C. Lipton, M. Tschannen, L. Itti, A. Anandkumar. 2018. Born again neural networks. arXiv preprint arXiv:1805.04770.
[13] Günlük, O., J. Kalagnanam, M. Menickelly, K. Scheinberg. 2018. Optimal decision trees for categorical data via integer programming. arXiv preprint arXiv:1612.03225.
[14] Hara, S., K. Hayashi. 2016. Making tree ensembles interpretable: A Bayesian model selection approach. arXiv preprint arXiv:1606.09066.
[15] Hinton, G., O. Vinyals, J. Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
[16] Hu, X., C. Rudin, M. Seltzer. 2019. Optimal sparse decision trees. Advances in Neural Information Processing Systems.
[17] Kisamori, K., K. Yamazaki. 2019. Model bridging: To interpretable simulation model from neural network. arXiv preprint arXiv:1906.09391.
[18] Margineantu, D., T. Dietterich. 1997. Pruning adaptive boosting. Proceedings of the Fourteenth International Conference on Machine Learning.
[19] Meinshausen, N. 2010. Node harvest. The Annals of Applied Statistics 2049–2072.
[20] Nijssen, S., E. Fromont. 2007. Mining optimal decision trees from itemset lattices. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.


SLIDE 18

Bibliography III

[21] Rokach, L. 2016. Decision forest: Twenty years of research. Information Fusion 27 111–125.
[22] Tamon, C., J. Xiang. 2000. On the boosting pruning problem. Proceedings of the 11th European Conference on Machine Learning.
[23] Tan, H. F., G. Hooker, M. T. Wells. 2016. Tree space prototypes: Another look at making tree ensembles interpretable. arXiv preprint arXiv:1611.07115.
[24] Verwer, S., Y. Zhang. 2019. Learning optimal classification trees using a binary linear program formulation. Proceedings of the AAAI Conference on Artificial Intelligence.
[25] Zhang, Y., S. Burer, W. N. Street. 2006. Ensemble pruning via semi-definite programming. Journal of Machine Learning Research 7(Jul) 1315–1338.
