SLIDE 1

Intelligible Models for Classification and Regression

Yin Lou¹, Rich Caruana², Johannes Gehrke¹

¹Department of Computer Science, Cornell University
²Microsoft Research, Microsoft Corporation

Aug. 13, 2012

SLIDE 2

Motivation

Simple Model

Linear regression, logistic regression

Regression: y = β0 + β1x1 + ... + βnxn
Classification: logit(y) = β0 + β1x1 + ... + βnxn

Intelligible but usually less accurate

Figure: a linear regression fit.


SLIDE 4

Motivation

Complex Model

Random forests, SVMs with RBF kernel, etc.

y = f(x1, ..., xn)

Unintelligible but usually more accurate

Figure: a random forest.


SLIDE 6

Motivation

The tradeoff

Figure: the intelligibility-complexity spectrum. Linear regression and logistic regression sit at the intelligible end; SVMs with RBF kernel and random forests sit at the complex end. What fills the gap in between?

Intelligibility is important:
  • Medical applications
  • Domains where we want scientific understanding
  • Efficient model engineering (e.g., impact of features in a ranker)


SLIDE 8

Outline

1. Motivation
2. Towards More Accurate Models
3. Algorithms
4. Experiments
5. Discussion
6. Conclusion


SLIDE 10

Generalized Additive Models

Developed by Hastie and Tibshirani.

Regression: y = f1(x1) + ... + fn(xn)
Classification: logit(y) = f1(x1) + ... + fn(xn)

Each feature is "shaped" by a shape function fi.
Intelligible and accurate.

  • T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman & Hall/CRC, 1990.
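
To make the form concrete, here is a minimal sketch of fitting a GAM in Python with the third-party pyGAM package; the library choice, data, and smoothing settings are assumptions of this sketch, not something the talk prescribes.

```python
# Minimal GAM sketch (assumption: pyGAM, `pip install pygam`; the talk
# does not prescribe a library). One smooth term s(j) per feature gives
# y = f1(x1) + f2(x2) + f3(x3).
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = X[:, 0] + np.sin(6 * X[:, 1]) + X[:, 2] ** 2 + rng.normal(0, 0.1, 500)

gam = LinearGAM(s(0) + s(1) + s(2)).fit(X, y)
print(gam.predict(X[:5]))  # additive predictions
```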


SLIDE 11

Example

y = x1 + x2² + √x3 + log x4 + exp(x5) + 2 sin x6 + ε

Figure: shape functions f1(x1), ..., f6(x6) for the synthetic dataset.
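
For reference, a sketch of how such a dataset can be generated; the sampling range and noise scale are assumptions (the slide only gives the formula), with n = 10,000 taken from the datasets table later in the deck.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Inputs kept positive so that sqrt and log are defined (an assumption).
X = rng.uniform(0.1, 2.0, size=(n, 6))
eps = rng.normal(0, 0.1, n)  # noise scale is an assumption
y = (X[:, 0] + X[:, 1] ** 2 + np.sqrt(X[:, 2]) + np.log(X[:, 3])
     + np.exp(X[:, 4]) + 2 * np.sin(X[:, 5]) + eps)
```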


SLIDE 12

Model Space

Model                        Form                             Intelligibility  Accuracy
Linear Model                 y = β0 + β1x1 + ... + βnxn       +++              +
Generalized Linear Model     g(y) = β0 + β1x1 + ... + βnxn    +++              +
Additive Model               y = f1(x1) + ... + fn(xn)        ++               ++
Generalized Additive Model   g(y) = f1(x1) + ... + fn(xn)     ++               ++
Full Complexity Model        y = f(x1, ..., xn)               +                +++

Table: From Linear to Additive Models.



SLIDE 14

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Shape Functions:
  • Splines (SP)
  • Single Tree (TR)
  • Bagged Trees (bagTR)
  • Boosted Trees (bstTR)
  • Boosted Bagged Trees (bbTR)

Learning Methods:
  • Penalized Least Squares (P-LS/P-IRLS)
  • Backfitting (BF)
  • Gradient Boosting (BST)


SLIDE 16

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Shape Function: Splines (SP)

fi(xi) = Σ_{k=1..d} βk bk(xi)

Figure: an example spline shape function.
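
A sketch of the idea in Python: expand a feature into spline basis functions bk, then fit the coefficients βk by penalized regression. scikit-learn's SplineTransformer plus ridge is a crude stand-in for the penalized splines (P-LS) the paper actually uses.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(300, 1))          # one feature
y = np.sin(x[:, 0]) + rng.normal(0, 0.2, 300)  # response

basis = SplineTransformer(degree=3, n_knots=10)  # cubic B-spline basis b_k
B = basis.fit_transform(x)                       # B[i, k] = b_k(x_i)
f_i = Ridge(alpha=1.0).fit(B, y)                 # fits the beta_k
print(f_i.predict(B[:5]))
```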


SLIDE 17

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Shape Function: Single Tree (TR)

fi(xi) = RegressionTree(xi, response)
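
A sketch of a single-tree shape function (an illustration, not the paper's code; the tree size is an assumption):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(300, 1))          # a single feature
y = np.sin(x[:, 0]) + rng.normal(0, 0.2, 300)  # response (or residual)

tree = DecisionTreeRegressor(max_leaf_nodes=8).fit(x, y)
f_i = tree.predict  # piecewise-constant shape function fi(xi)
```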


SLIDE 18

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Shape Function: Bagged Trees (bagTR)

fi(xi) = (1/B) Σ_{j=1..B} RegressionTree(xi, bootstrap sample j)
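
Sketch of a bagged-tree shape function: average B trees, each fit to a bootstrap sample. B and the tree size here are assumptions.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(300, 1))
y = np.sin(x[:, 0]) + rng.normal(0, 0.2, 300)

# Each of the B=100 trees is trained on a bootstrap sample of (x, y);
# predictions are averaged, which mainly reduces variance.
bag = BaggingRegressor(DecisionTreeRegressor(max_leaf_nodes=8),
                       n_estimators=100).fit(x, y)
f_i = bag.predict
```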


SLIDE 19

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Shape Function: Boosted Trees (bstTR)

fi(xi) = Σ_{j=1..B} RegressionTree(xi, residual_j)
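
Sketch of a boosted-tree shape function: each tree fits the residual left by the sum of the previous trees (B and the tree size are assumptions).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(300, 1))
y = np.sin(x[:, 0]) + rng.normal(0, 0.2, 300)

trees, residual = [], y.copy()
for _ in range(100):  # B = 100 boosting steps
    t = DecisionTreeRegressor(max_leaf_nodes=4).fit(x, residual)
    trees.append(t)
    residual -= t.predict(x)  # shrink the remaining error

f_i = lambda xi: sum(t.predict(xi) for t in trees)
```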


SLIDE 20

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Shape Function: Boosted Bagged Trees (bbTR)

fi(xi) = Σ_{j=1..B} BaggedRegressionTree(xi, residual_j)
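
Sketch: boosted bagged trees are the same boosting loop, with each weak learner replaced by a small bagged ensemble (step counts and ensemble sizes are assumptions).

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(300, 1))
y = np.sin(x[:, 0]) + rng.normal(0, 0.2, 300)

ensembles, residual = [], y.copy()
for _ in range(20):  # B = 20 boosting steps, each a 25-tree bag
    e = BaggingRegressor(DecisionTreeRegressor(max_leaf_nodes=4),
                         n_estimators=25).fit(x, residual)
    ensembles.append(e)
    residual -= e.predict(x)

f_i = lambda xi: sum(e.predict(xi) for e in ensembles)
```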


SLIDE 21

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Learning Method: Penalized Least Squares (P-LS/P-IRLS)

Works only with splines (fi(xi) = Σ_{k=1..d} βk bk(xi)).

Converts the optimization problem into fitting linear regression/logistic regression with a different basis.

  • S. Wood. Generalized Additive Models: An Introduction with R. CRC Press, 2006.


SLIDE 22

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Learning Method: Backfitting (BF)

1: fj ← 0 for all j
2: for m = 1 to M do
3:   for j = 1 to n do
4:     R ← {(xij, yi − Σ_{k≠j} fk(xik))}, i = 1, ..., N
5:     Learn shape function S : xj → y using R as training dataset
6:     fj ← S
7:   end for
8: end for

  • T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman & Hall/CRC, 1990.
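
A runnable Python sketch of the pseudocode above, with regression trees as the shape functions (one of the paper's options; M and the tree size are assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def backfit(X, y, M=10, max_leaf_nodes=8):
    n = X.shape[1]
    f = [np.zeros(len(y)) for _ in range(n)]  # fj evaluated on X
    shapes = [None] * n
    for _ in range(M):
        for j in range(n):
            r = y - (sum(f) - f[j])           # line 4: leave fj out
            S = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
            S.fit(X[:, [j]], r)               # line 5
            shapes[j] = S                     # line 6: fj <- S (replace)
            f[j] = S.predict(X[:, [j]])
    return shapes

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 2.0, size=(1000, 3))
y = X[:, 0] + X[:, 1] ** 2 + np.sin(X[:, 2]) + rng.normal(0, 0.1, 1000)
shapes = backfit(X, y)
```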


SLIDE 23

Fitting GAMs

g(y) = f1(x1) + ... + fn(xn)

Learning Method: Gradient Boosting (BST)

1: fj ← 0 for all j
2: for m = 1 to M do
3:   for j = 1 to n do
4:     R ← {(xij, yi − Σk fk(xik))}, i = 1, ..., N
5:     Learn shape function S : xj → y using R as training dataset
6:     fj ← fj + S
7:   end for
8: end for

  • J. Friedman. Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29:1189–1232, 2001.
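
The same sketch adapted to the boosting variant: the only changes are fitting the full-model residual (line 4) and accumulating rather than replacing fj (line 6). M and the tree size remain assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_gam(X, y, M=50, max_leaf_nodes=4):
    n = X.shape[1]
    f = [np.zeros(len(y)) for _ in range(n)]
    shapes = [[] for _ in range(n)]           # each fj is a sum of trees
    for _ in range(M):
        for j in range(n):
            r = y - sum(f)                    # line 4: full-model residual
            S = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes)
            S.fit(X[:, [j]], r)               # line 5
            shapes[j].append(S)               # line 6: fj <- fj + S
            f[j] += S.predict(X[:, [j]])
    return shapes
```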


SLIDE 24

Contributions

  • First large-scale study that uses trees as shape functions for GAMs
  • Novel methods for using trees as shape functions
  • Largest empirical study of fitting GAMs


SLIDE 26

Datasets

Dataset      Size    Attributes  %Pos

Regression
  Concrete    1030      9         -
  Wine        4898     12         -
  Delta       7192      6         -
  CompAct     8192     22         -
  Music      50000     90         -
  Synthetic  10000      6         -

Classification
  Spambase    4601     58        39.40
  Insurance   9823     86         5.97
  Magic      19020     11        64.84
  Letter     20000     17        49.70
  Adult      46033     9/43      16.62
  Physics    50000     79        49.72


SLIDE 27

Methods

Shape Function         Least Squares   Gradient Boosting   Backfitting
Splines                P-LS/P-IRLS     BST-SP              BF-SP
Single Tree            N/A             BST-TRx             BF-TR
Bagged Trees           N/A             BST-bagTRx          BF-bagTR
Boosted Trees          N/A             BST-TRx             BF-bstTRx
Boosted Bagged Trees   N/A             BST-bagTRx          BF-bbTRx

Table: Notation for learning methods and shape functions.

9 different methods; 5-fold cross-validation for each method.



SLIDE 33

Results

Model            Regression  Classification  Mean
Linear/Logistic     1.68         1.22        1.45
P-LS/P-IRLS         1.00         1.00        1.00
BST-SP              1.04         1.00        1.02
BF-SP               1.00         1.00        1.00
BST-bagTR2          0.96         0.96        0.96
BST-bagTR3          0.97         0.95        0.96
BST-bagTR4          0.99         0.95        0.97
BST-bagTRX          0.95         0.94        0.95
Random Forest       0.88         0.80        0.84

Table: error normalized to P-LS/P-IRLS = 1.00; lower is better.

Observations

  • Two accuracy gaps: shaping and interactions
  • Tree-based shaping methods are more accurate than spline-based methods


SLIDE 35

Bias-Variance Decomposition

Expected Loss = (bias)² + variance + noise

Figure: bias-variance analysis of P-LS, BST-SP, BF-SP, BST-TR2/3/4, BST-bagTR2/3/4, BF-TR, and BF-bagTR on (a) Concrete and (b) Wine.
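
A sketch of how such an analysis can be run on synthetic data, where the true function is known; the bootstrap protocol and learner here are assumptions, not necessarily the paper's exact setup.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(0, 2, size=(n, 1))
f_true = np.sin(3 * X[:, 0])
y = f_true + rng.normal(0, 0.3, n)

# Refit the learner on bootstrap resamples and collect its predictions.
preds = []
for _ in range(50):
    idx = rng.integers(0, n, n)
    m = DecisionTreeRegressor().fit(X[idx], y[idx])  # deep tree: low bias
    preds.append(m.predict(X))
preds = np.array(preds)

bias2 = np.mean((preds.mean(axis=0) - f_true) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 = {bias2:.3f}, variance = {variance:.3f}")
```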


SLIDE 36

Learned Shape Functions: Splines vs. Trees

Figure: Shapes of five features of the "Concrete" dataset (Blast Furnace Slag, Fly Ash, Superplasticizer, Coarse Aggregate, Fine Aggregate) produced by P-LS (top) and BST-bagTR3 (bottom).



SLIDE 38

Conclusion

  • Generalized additive models are accurate and intelligible
  • Trees have low bias but high variance
  • Bagging reduces variance and makes tree-based methods stand out
  • Bagged shallow trees with gradient boosting are most accurate


SLIDE 39

Future Work

  • Feature selection
  • Scalability
  • Statistical interaction detection


SLIDE 40

Thank You

Questions?
