

slide-1
SLIDE 1

CRTREES: AN IMPLEMENTATION OF

CLASSIFICATION AND REGRESSION TREES (CART) & RANDOM FORESTS IN STATA

Ricardo Mora

Universidad Carlos III de Madrid

Madrid, October 2019

1 / 52

slide-2
SLIDE 2

Outline

1. Introduction
2. Algorithms
3. crtrees
4. Examples
5. Simulations

2 / 52

slide-3
SLIDE 3

Introduction

Introduction

3 / 52


slide-6
SLIDE 6

Introduction

Decision trees

Decision tree-structured models are predictive models that use tree-like diagrams

Classification trees: the target variable takes values in a finite set
Regression trees: the target variable takes real values

Each branch in the tree represents a sample split criterion

Several approaches:

Chi-square automatic interaction detection, CHAID (Kass 1980; Biggs et al. 1991)
Classification and Regression Trees, CART (Breiman et al. 1984)
Random Forests (Breiman 2001; Scornet et al. 2015)

4 / 52

slide-7
SLIDE 7

Introduction

A simple tree structure

y(x1, x2) = y1   if x1 ≤ s1
          = y2   if x1 > s1 and x2 ≤ s2
          = y3   if x1 > s1 and x2 > s2

[Tree diagram: split on x1 ≤ s1; "yes" leads to y = y1, "no" leads to a second split on x2 ≤ s2, with y = y2 ("yes") and y = y3 ("no")]
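As a minimal illustration, the same prediction rule can be written in one line of Stata; x1 and x2 are assumed to exist in the data, and the thresholds and leaf values below are placeholders standing in for the slide's s1, s2, y1, y2, y3, not estimates from any dataset:

* placeholder thresholds and leaf values; replace with the estimated ones
local s1 = 10
local s2 = 20
generate double y_hat = cond(x1 <= `s1', 1, cond(x2 <= `s2', 2, 3))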

5 / 52


slide-11
SLIDE 11

Introduction

CART

CART's objective is to estimate a binary tree structure

It performs three algorithms:

Tree-growing: step-optimal recursive partitioning (LS on 50 cells with at most two terminal nodes ≈ 6 × 10^14 models)
Tree-pruning
Obtaining the honest tree

The last two algorithms attempt to minimize overfitting (growing trees with no external validity) via a test sample, cross-validation, or the bootstrap

In Stata, the module <chaid> performs CHAID and <cart> performs CART analysis for failure time data

6 / 52


slide-14
SLIDE 14

Introduction

Random Forests

Random forests is an ensemble learning method to generate predictions using tree structures

Ensemble learning method: use of many strategically generated models

First step: create a multitude of (presumably over-fitted) trees with the tree-growing algorithm

The multitude of trees is obtained by random sampling (bagging) and by random choice of splitting variables

Second step: case predictions are built using modes (in classification) and averages (in regression)

In Stata, <sctree> is a Stata wrapper for the R functions "tree()", "randomForest()", and "gbm()"

It covers classification trees with optimal pruning, bagging, boosting, and random forests

7 / 52

slide-15
SLIDE 15

Algorithms

Algorithms

8 / 52


slide-19
SLIDE 19

Algorithms

Growing the tree (CART & Random Forests)

Requires a so-called training or learning sample

At iteration i, with tree structure T_i, consider all terminal nodes t*(T_i)

Classification: let i(T_i) be an overall impurity measure (using the Gini or entropy index)
Regression: let i(T_i) be the residual sum of squares over all terminal nodes
The best split at iteration i identifies the terminal node and split criterion that maximize i(T_i) − i(T_{i+1}) (see the sketch below)

Recursive partitioning ends with the largest possible tree, T_MAX, where there are no nodes left to split or the number of observations reaches a lower limit (stopping rule)
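For reference, a sketch of the node-impurity and split-improvement quantities behind this step, using the standard CART definitions (Breiman et al. 1984); the exact crtrees internals may differ in details. For a terminal node t, let p(k|t) be the within-node share of class k and p(t) the share of observations falling in t:

Gini:     i(t) = sum_k p(k|t) (1 − p(k|t))
Entropy:  i(t) = − sum_k p(k|t) log p(k|t)
Overall:  i(T) = sum over terminal nodes t of p(t) i(t)

In regression, i(T) = sum over terminal nodes t of sum_{n in t} (y_n − ybar_t)^2, and the best split maximizes the improvement Δ = i(T_i) − i(T_{i+1})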

9 / 52


slide-23
SLIDE 23

Algorithms

Overfitting and aggregation bias

In a trivial setting, the result is equivalent to dividing the sample into all possible cells and computing within-cell least squares

Overfitting: T_MAX will usually be too complex in the sense that it has no external validity, and some terminal nodes should be aggregated

Moreover, a simpler structure will normally lead to more accurate estimates, since the number of observations in each terminal node grows as aggregation takes place

However, if aggregation goes too far, aggregation bias becomes a serious problem

10 / 52


slide-27
SLIDE 27

Algorithms

Pruning the tree: Error-complexity clustering (CART)

To avoid overfitting, CART identifies a sequence of nested trees obtained by recursive aggregation of nodes of T_MAX with a clustering procedure

For a given value α, let R(α, T) = R(T) + α |T|, where |T| denotes the number of terminal nodes, or complexity, of tree T, and R(T) is the MSE in regression or the misclassification rate in classification

The optimal tree for a given α, T(α), minimizes R(α, T) within the set of subtrees of T_MAX

T(α) belongs to a much broader set than the sequence of trees obtained in the growing algorithm

Pruning identifies a sequence of positive real numbers {α_0, α_1, ..., α_M} such that α_j < α_{j+1} and T_MAX ≡ T(α_0) ≻ T(α_1) ≻ T(α_2) ≻ ... ≻ {root}
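How the thresholds α_j are found can be sketched with the standard weakest-link computation (Breiman et al. 1984); this is a reference sketch of the usual CART algebra, not necessarily the exact crtrees code path. For each internal node t of the current tree T(α_j), let T_t denote the branch rooted at t; its link strength is

g(t) = ( R(t) − R(T_t) ) / ( |T_t| − 1 )

Then α_{j+1} = min over internal nodes t of g(t), and T(α_{j+1}) is obtained from T(α_j) by pruning the branch(es) attaining that minimum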

11 / 52


slide-29
SLIDE 29

Algorithms

Honest tree (CART)

Out of the sequence of optimal trees, {T(α_j)}_j, T_MAX has the lowest R(T) in the learning sample by construction, and R(·) increases with α

The honest tree algorithm chooses the simplest tree that minimizes R(T) + s × SE(R(T)), with s ≥ 0

With partitioning into a learning and a test sample, R(T) and SE(R(T)) are obtained using the test sample
With V-fold cross-validation, the sample is randomly partitioned V times into a learning and a test sample; for each α_j, R(T) and SE(R(T)) are obtained by averaging the results over the V partitions
With the bootstrap (in regression problems), s > 0 and SE(R(T_MAX)) is obtained using the bootstrap
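Stated compactly (a restatement of the rule above, with R(·) and SE(·) computed from the test sample, the V-fold averages, or the bootstrap as described):

T* = T(α_{j*}),  where j* indexes the simplest tree attaining  min_j [ R(T(α_j)) + s × SE(R(T(α_j))) ]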

12 / 52

slide-30
SLIDE 30

Algorithms

T_MAX: 5 terminal nodes

[Tree diagram: root node 1 splits into nodes 2 and 3; node 2 splits into terminal nodes 4 and 5; node 3 splits into node 6 and terminal node 7; node 6 splits into terminal nodes 8 and 9]

13 / 52

slide-31
SLIDE 31

Algorithms

T_1: Node 2 becomes terminal

[Tree diagram: root node 1 splits into terminal node 2 and node 3; node 3 splits into node 6 and terminal node 7; node 6 splits into terminal nodes 8 and 9]

14 / 52


slide-34
SLIDE 34

Algorithms

T_2: Node 1 becomes terminal

[Tree diagram: only the root node 1 remains]

The sequence of optimal trees is {T_MAX, T_1, T_2 ≡ {root}}, with |T_MAX| = 5, |T_1| = 4, |T_2| = 1

Using a test sample, among the three we would choose the tree with the smallest R^ts(T) + s × SE(R^ts(T))

15 / 52


slide-38
SLIDE 38

Algorithms

CART properties

Some basic results for recursive partitioning can be found in Breiman et al. (1984, ch. 12)

Consistency requires an ever denser sample in all n-dimensional balls of the input space

Cost-complexity minimization together with a test-sample R(·) should help ensure that this condition is not too strong

For small samples, correlation among splitting variables:

induces instability in the tree topology
makes the interpretation of the contribution of each splitting variable problematic

16 / 52


slide-42
SLIDE 42

Algorithms

Bagging (Random Forests)

Bagging is an ensemble method that reduces the problem of overfitting by trading off large bias in each individual model against the higher accuracy and lower bias obtained by aggregating results from all models considered:

bootstrapping to generate a multitude of models
aggregating to make a final prediction (mode in classification; average in regression)

Two methods to simultaneously obtain alternative models:

sampling observations
sampling splitting variables

Focus in Random Forests is on prediction, not interpretation
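To make the two steps concrete, here is a minimal, hedged sketch of the bagging loop in Stata; it is not the crtrees implementation, and it uses a plain regression as a stand-in base learner because the point is the resample-and-average structure. The dataset, variables, number of replications, and seed are arbitrary choices:

sysuse auto, clear
set seed 101
local B = 20
generate double p_bag = 0
forvalues b = 1/`B' {
    preserve
    bsample                               // bootstrap sample of the observations
    quietly regress price weight length   // stand-in base learner (a tree in Random Forests)
    restore                               // back to the full original sample
    quietly predict double p_tmp          // b-th model's prediction for every observation
    quietly replace p_bag = p_bag + p_tmp/`B'
    drop p_tmp
}
summarize p_bag                           // bagged (averaged) prediction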

17 / 52

slide-43
SLIDE 43

Algorithms

Large sample properties

Breiman et al. (1984): Consistency in recursive splitting algorithms
Sexton and Laake (2009): Jackknife standard error estimator in bagged ensembles
Mentch and Hooker (2014): Asymptotic sampling distribution in Random Forests
Efron (2014): Estimators for standard errors for the predictions in bagged Random Forests (Infinitesimal Jackknife and the Jackknife-after-Bootstrap)
Scornet et al. (2015): First consistency result for the original Breiman (2001) algorithm in the context of regression models

18 / 52

slide-44
SLIDE 44

crtrees

crtrees

19 / 52

slide-45
SLIDE 45

crtrees

The crtrees ado

crtrees depvar varlist [if] [in] [, options]

depvar: output variable (discrete in classification)
varlist: splitting variables (binary, ordinal, or cardinal)

The command implements both CART and Random Forests for classification and regression problems

By default, the command performs Regression Trees (CART in a regression problem) with a constant in each terminal node, using a test sample with 50 percent of the original sample size and the 0 SE rule for estimating the honest tree
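For instance, a minimal call with all the defaults might look like this (auto data; the variable list and seed are arbitrary illustrative choices, not taken from the slides):

sysuse auto, clear
crtrees price weight length trunk, seed(101)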

20 / 52

slide-46
SLIDE 46

crtrees

Model Options

rforests: performs tree-growing and bagging (by default, crtrees performs CART)
classification: performs classification trees
generate(newvar): new variable name for model predictions; required when options st_code and/or rforests are used
bootstraps(#): only available for regression trees (to obtain SE(T_MAX)) and for rforests (for bagging)
seed(#), stop(#): seed for the random-number generator and stopping rule for growing the tree
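A hedged sketch combining some of these options for a classification problem (the later examples abbreviate classification as class; variables, seed, and stopping value are arbitrary):

sysuse auto, clear
crtrees foreign price trunk weight, classification seed(101) stop(10)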

21 / 52

slide-47
SLIDE 47

crtrees

Options for regression problems (both in CART and Random Forests)

regressors(varlist): controls in terminal nodes; a regression line is estimated in each terminal node
noconstant: the regression line does not include a constant
level(#): sets the confidence level for the regression output display when a test sample is used (this option is available with CART)
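A hedged sketch of a regression tree with a within-node regression line and a 90 percent confidence level in the display (variables and seed are arbitrary):

sysuse auto, clear
crtrees price trunk length foreign gear_ratio, regressors(weight) level(90) seed(101)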

22 / 52

slide-48
SLIDE 48

crtrees

Options for classification problems (both in CART and Random Forests)

impurity(string): impurity measure, either "gini" or "entropy"
priors(string): name of a Stata matrix with prior class probabilities (learning-sample frequencies by default)
costs(string): name of a Stata matrix with misclassification costs; by default, 0 on the diagonal and 1 elsewhere
detail: displays additional statistics for terminal nodes
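A hedged sketch using the entropy impurity and the additional terminal-node statistics (variables and seed are arbitrary; priors() and costs() would instead take the name of a previously defined Stata matrix):

sysuse auto, clear
crtrees foreign price trunk, classification impurity(entropy) detail seed(101)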

23 / 52

slide-49
SLIDE 49

crtrees

CART options

lssize(#): proportion of the learning sample (default is 0.5)
tsample(newvar): identifies test-sample observations (e(sample) also includes the learning sample)
vcv(#): sets the V-fold cross-validation parameter
rule(#): SE rule used to identify the honest tree
tree: text representation of the estimated tree
st_code: Stata code to generate tree predictions
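A hedged sketch using a 60 percent learning sample, a flag for the test sample, the 1 SE rule, and a printed tree (variables and seed are arbitrary):

sysuse auto, clear
crtrees price trunk weight length, lssize(0.6) tsample(ts) rule(1) tree seed(101)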

24 / 52

slide-50
SLIDE 50

crtrees

Random Forests options

rsplitting(#): relative size of the subsample of splitting variables (default is 0.33)
rsampling(#): relative subsample size (default is 1, with replacement; otherwise, sampling is without replacement)
oob: out-of-bag misclassification costs, computed using the observations not included in their bootstrap sample (the default uses all observations)
ij: standard errors using the Infinitesimal Jackknife (the nonparametric delta method); only available with regression problems; the default is the jackknife-after-bootstrap
savetrees(string): name of the file in which to save the Mata matrices from the multitude of trees

The saved file is required to run predict after crtrees with the rforests option. No automatic replacement of an existing file is allowed. If unspecified, crtrees saves the file matatrees in the current working directory
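A hedged sketch of a classification forest with out-of-bag misclassification costs that saves the forest for later prediction (the file name, number of bootstraps, and seed are arbitrary):

sysuse auto, clear
crtrees foreign price trunk weight, classification rforests generate(cl_hat) bootstraps(200) oob savetrees("myforest") seed(101)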

25 / 52

slide-51
SLIDE 51

crtrees

crtrees_p

After crtrees, we can use predict to obtain model predictions in the same or in alternative samples

The model predictions are computed using the honest tree under CART, the average prediction over all bagged trees with rforests in a regression problem, or the most popular vote over all bagged trees with rforests in a classification problem

With rforests in a regression problem, predict also creates a new variable with the standard error of the prediction, using all bagged trees
With rforests in a classification problem, predict also creates a new variable containing the bootstrap misclassification cost (by default, the probability of misclassification), using all bagged trees

26 / 52

slide-52
SLIDE 52

Examples

Examples with auto data

27 / 52

slide-53
SLIDE 53

Examples

Regression trees without controls

crtrees price trunk weight length foreign gear_ratio, seed(12345) rule(2)

Regression trees with sample partition, learning sample 0.5, and the 2 SE rule

The seed is required to ensure replicability because the sample partitioning is random

28 / 52

slide-54
SLIDE 54

Examples

Regression trees without controls (cont’d)

Regression Trees with learning and test samples (SE rule: 2)

Learning Sample                     Test Sample
|T*| = 2
Number of obs =       37            Number of obs =       37
R-squared     =   0.5330            R-squared     =   0.3769
Avg Dep Var   = 6205.378            Avg Dep Var   = 6125.135
Root MSE      = 2133.378            Root MSE      = 2287.073

Terminal node results:

Node 2:
  Characteristics: 1760<=weight<=3740  2.24<=gear_ratio<=3.89
  Number of obs = 32
  Average = 5329.125   Std.Err. = 329.8

Node 3:
  Characteristics: 3830<=weight<=4840  149<=length<=233  2.19<=gear_ratio<=3.81
  Number of obs = 5
  Average = 11813.4   Std.Err. = 1582

29 / 52

slide-55
SLIDE 55

Examples

Regression trees with controls

crtrees price trunk weight length foreign gear_ratio, reg(weight) stop(5) lssize(0.6) generate(y_hat) seed(12345) rule(1)

The variable weight is both a splitting variable and a control

Growing the tree stops when the regression cannot be computed or when the number of observations is smaller than or equal to 5

The new variable y_hat contains the predictions

30 / 52

slide-56
SLIDE 56

Examples

Regression trees with controls (cont’d)

Regression Trees with learning and test samples (SE rule: 1)

Learning Sample                     Test Sample
|T*| = 2
Number of obs =       44            Number of obs =       30
R-squared     =   0.5814            R-squared     =   0.4423
Avg Dep Var   = 6175.091            Avg Dep Var   = 6150.833
Root MSE      = 2008.796            Root MSE      = 2258.638

Terminal node results:

Node 2:
  Characteristics: 147<=length<=233  foreign==0  2.19<=gear_ratio<=3.81
  Number of obs = 29
  R-squared = 0.4900

       price       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      weight    3.185787    .6643858    4.80   0.000     1.883614    4.487959
      _const   -4520.597      2219.4   -2.04   0.042     -8870.54    -170.653

Node 3:
  Characteristics: foreign==1  2.24<=gear_ratio<=3.89
  Number of obs = 15
  R-squared = 0.7650

       price       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      weight    5.277319     .607164    8.69   0.000       4.0873    6.467339
      _const   -5702.361    1452.715   -3.93   0.000    -8549.629   -2855.092

31 / 52

slide-57
SLIDE 57

Examples

Classification trees with V-fold cross-validation

crtrees foreign price trunk, class stop(10) vcv(20) seed(12345) detail tree rule(0.5)

In each partition, 100/20 = 5 percent of the sample is the test sample

Additional information is presented for the terminal nodes

A text representation of the tree is displayed

32 / 52

slide-58
SLIDE 58

Examples

Classification trees with V-fold cross-validation (cont’d)

Classification Trees with V-fold Cross Validation (SE rule: .5)
Impurity measure: Gini

Sample                              V-fold cross validation
Number of obs =       74            V         =      20
|T*| = 3
R(T*)         =   0.1622            R(T*)     =  0.2472
                                    SE(R(T*)) =  0.1104

Text representation of tree:
At node 1, if trunk <= 15.5 go to node 2; else go to node 3
At node 2, if price <= 5006.5 go to node 4; else go to node 5

Terminal node results:

Node 3:
  Characteristics: 16<=trunk<=23
  Class predictor = 0   r(t) = 0.065
  Number of obs = 31
  Pr(foreign=0) = 0.935   Pr(foreign=1) = 0.065

Node 4:
  Characteristics: 3291<=price<=4934  5<=trunk<=15
  Class predictor = 0   r(t) = 0.259
  Number of obs = 27
  Pr(foreign=0) = 0.741   Pr(foreign=1) = 0.259

Node 5:
  Characteristics: 5079<=price<=15906  5<=trunk<=15
  Class predictor = 1   r(t) = 0.188
  Number of obs = 16
  Pr(foreign=0) = 0.188   Pr(foreign=1) = 0.812

33 / 52

slide-59
SLIDE 59

Examples

Automatic generation of Stata code

crtrees foreign price trunk, class stop(10) vcv(20) seed(12345) detail tree rule(0.5) st_code gen(pr_class)

Options generate() and st_code are required

In the output display, we can find Stata code lines that generate the predictions

This code can be copied and pasted into do-files, or used as guidance to generate code in other software

34 / 52

slide-60
SLIDE 60

Examples

Automatic generation of Stata code (cont’d)

Classification Trees with V-fold Cross Validation (SE rule: .5)
Impurity measure: Gini

[Same estimation summary, tree text representation, and terminal-node results as on the previous slide, now followed by the generated code:]

// Stata code to generate predictions
generate pr_class=.
replace pr_class=0 if 3291<=price & price<=15906 & 16<=trunk & trunk<=23
replace pr_class=0 if 3291<=price & price<=4934 & 5<=trunk & trunk<=15
replace pr_class=1 if 5079<=price & price<=15906 & 5<=trunk & trunk<=15
// end of Stata code to generate predictions

35 / 52

slide-61
SLIDE 61

Examples

Random forests with regression

crtrees price trunk weight, rforests regressors(weight) generate(p_hat) bootstraps(500)

Random Forests requires options rforests, generate(), and bootstraps()

Subsampling and the random selection of splitting variables are controlled with options rsampling() and rsplitting()

36 / 52

slide-62
SLIDE 62

Examples

Random forests with regression (cont’d)

Random Forests: Regression

Bootstrap replications (550)
[progress dots omitted]

Dep. Variable       = price
Splitting Variables = trunk weight
Regressors          = weight
Bootstraps          = 550
Number of obs       = 74
R-squared           = 0.6079
Model root SS       = 19649
Residual root SS    = 16098
Total root SS       = 25201

    Variable   Obs       Mean   Std. Dev.        Min        Max
       p_hat    74   5954.731    2299.715  -2284.176   12357.13
    p_hat_se    74   2418.164    3974.634   346.2865   31753.67

Jackknife-after-Bootstrap Standard Errors
(Note: computing time: 4.62 seconds)

37 / 52

slide-63
SLIDE 63

Examples

predict

Under CART, the model uses the honest tree:

. crtrees price trunk weight length, seed(12345)
. predict price_hat

Under Random Forests, crtrees creates a Mata matrix file in which all trees in the forest are stored (by default this file is named matatrees and saved in the working directory)

. !rm -f mytrees
. crtrees price trunk weight length foreign gear_ratio ///
      in 1/50, reg(weight foreign) stop(5) lssize(0.6) ///
      generate(p_hat) seed(12345) rsplitting(.4) rforests ///
      bootstraps(500) ij savetrees("mytrees")
. predict p_hat2 p_hat_sd in 51/l, opentrees("mytrees")

38 / 52

slide-64
SLIDE 64

Examples

predict (cont’d)

Random Forests: Regression

Bootstrap replications (500)
[progress dots omitted]

Dep. Variable       = price
Splitting Variables = trunk weight length foreign gear_ratio
Regressors          = weight foreign
Bootstraps          = 500
Number of obs       = 50
R-squared           = 0.7851
Model root SS       = 19466
Residual root SS    = 7141
Total root SS       = 21968

    Variable   Obs       Mean   Std. Dev.        Min        Max
       p_hat    50   6149.417    2780.845   3613.405   13514.61
    p_hat_se    50   787.3381    680.1503   153.0508   3034.261

Infinitesimal Jackknife Standard Errors

    Variable   Obs       Mean   Std. Dev.        Min        Max
      p_hat2    24   4114.012    389.0997   3049.175   4691.118
    p_hat_sd    24   1722.967    1096.124   571.7336    4666.17

39 / 52

slide-65
SLIDE 65

Simulations

Simulations

40 / 52

slide-66
SLIDE 66

Simulations

Simulation 1: Regression Trees with constant

ε ∼ N(0, 1),  s1 ∈ {2, 4, 6, 8},  s2 ∈ {3, 6, 9, 12},  s3 = 0.9 × s1

y = −1.64 + ε   if s1 ≤ 4 and s2 ≤ 3
y = ε           if s1 ≤ 4 and s2 > 3
y = 1.64 + ε    if s1 > 4

[Tree diagram: first split on s1 ≤ 4; within the s1 ≤ 4 branch, split on s2 ≤ 3]
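A hedged sketch of a data-generating process consistent with the diagram above; the total of 1,000 observations matches the 524 + 476 reported on the next slide, while the seed and the uniform draws for s1 and s2 are assumptions (the slides do not show the simulation code):

clear
set seed 12345
set obs 1000
generate s1 = 2*ceil(4*runiform())    // s1 in {2, 4, 6, 8}
generate s2 = 3*ceil(4*runiform())    // s2 in {3, 6, 9, 12}
generate s3 = 0.9*s1
generate eps = rnormal()
generate y = cond(s1 <= 4, cond(s2 <= 3, -1.64 + eps, eps), 1.64 + eps)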

41 / 52

slide-67
SLIDE 67

Simulations

crtrees y s1 s2 s3, stop(5) rule(2)

Regression Trees with learning and test samples (SE rule: 2)

Learning Sample                     Test Sample
|T*| = 3
Number of obs =      524            Number of obs =      476
R-squared     =   0.5294            R-squared     =   0.6102
Avg Dep Var   =    0.637            Avg Dep Var   =    0.654
Root MSE      =    1.034            Root MSE      =    0.972

Terminal node results:

Node 3:
  Characteristics: 6<=s1<=8
  Number of obs = 255
  Average = 1.638653   Std.Err. = .06302

Node 4:
  Characteristics: 2<=s1<=4  s2==3
  Number of obs = 60
  Average = -1.600958   Std.Err. = .1316

Node 5:
  Characteristics: 2<=s1<=4  6<=s2<=12
  Number of obs = 209
  Average = .0571202   Std.Err. = .06808

42 / 52

slide-68
SLIDE 68

Simulations

Simulation 2: RT with regression line

x, ε ∼ N(0, 1),  s1 ∈ {2, 4, 6, 8},  s2 ∈ {3, 6, 9, 12}

y = −1.64 + x + ε   if s1 ≤ 4 and s2 ≤ 3
y = ε               if s1 ≤ 4 and s2 > 3
y = 1.64 + ε        if s1 > 4

[Tree diagram: first split on s1 ≤ 4; within the s1 ≤ 4 branch, split on s2 ≤ 3]

43 / 52

slide-69
SLIDE 69

Simulations

crtrees y s1 s2, reg(x1) stop(5)

Regression Trees with learning and test samples (SE rule: 2)

Learning Sample                     Test Sample
|T*| = 3
Number of obs =      504            Number of obs =      496
R-squared     =   0.6420            R-squared     =   0.5200
Avg Dep Var   =    0.620            Avg Dep Var   =    0.690
Root MSE      =    0.987            Root MSE      =    1.030

Terminal node results:

Node 3:
  Characteristics: 6<=s1<=8
  Number of obs = 248
  R-squared = 0.0121

           y       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
           x     .117363    .0700327    1.68   0.094    -.0198986    .2546246
      _const    1.758492    .0643814   27.31   0.000     1.632307    1.884677

Node 4:
  Characteristics: 2<=s1<=4  s2==3
  Number of obs = 76
  R-squared = 0.5551

           y       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
           x    1.087398    .1084246   10.03   0.000     .8748901    1.299907
      _const   -1.529997    .1171627  -13.06   0.000    -1.759632   -1.300362

Node 5:
  Characteristics: 2<=s1<=4  6<=s2<=12
  Number of obs = 180
  R-squared = 0.0150

           y       Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
           x   -.1136537    .0710472   -1.60   0.110    -.2529037    .0255962
      _const   -.0210631    .0738107   -0.29   0.775    -.1657295    .1236033

44 / 52

slide-70
SLIDE 70

Simulations

Simulation 3: Classification trees

Class ∈ {0, 1},  s1 ∈ {2, 4, 6, 8},  s2 ∈ {3, 6, 9, 12}

Class = 0 w.p. 0.7   if s1 ≤ 4 and s2 ≤ 6
Class = 0 w.p. 0.3   if s1 ≤ 4 and s2 > 6
Class = 0 w.p. 0.1   if s1 > 4

[Tree diagram: first split on s1 ≤ 4; within the s1 ≤ 4 branch, split on s2 ≤ 6]

45 / 52

slide-71
SLIDE 71

Simulations

crtrees Class s1 s2, class

Classification Trees with learning and test samples (SE rule: 1)
Impurity measure: Gini

Learning Sample                     Test Sample
Number of obs =      526            Number of obs =      474
|T*| = 3
R(T*)         =   0.1958            R(T*)     =  0.2229
                                    SE(R(T*)) =  0.0191

Terminal node results:

Node 3:
  Characteristics: 6<=s1<=8
  Class predictor =     r(t) = 0.097
  Number of obs = 277

Node 4:
  Characteristics: 2<=s1<=4  3<=s2<=6
  Class predictor = 1   r(t) = 0.289
  Number of obs = 121

Node 5:
  Characteristics: 2<=s1<=4  9<=s2<=12
  Class predictor =     r(t) = 0.320
  Number of obs = 128

46 / 52

slide-72
SLIDE 72

Simulations

Simulation 4: Classification trees with 3 classes

Class ∈ {1, 2, 3},  s1 ∈ {2, 4, 6, 8},  s2 ∈ {3, 6, 9, 12}

Class = 1 w.p. 0.7, 2 w.p. 0.3   if s1 ≤ 4 and s2 ≤ 6
Class = 1 w.p. 0.3, 2 w.p. 0.7   if s1 ≤ 4 and s2 > 6
Class = 1 w.p. 0.1, 3 w.p. 0.9   if s1 > 4

[Tree diagram: first split on s1 ≤ 4; within the s1 ≤ 4 branch, split on s2 ≤ 6]

47 / 52

slide-73
SLIDE 73

Simulations

crtrees Class s1 s2, class stop(5) rule(0)

Classification Trees with learning and test samples (SE rule: 0)
Impurity measure: Gini

Learning Sample                     Test Sample
Number of obs =      522            Number of obs =      478
|T*| = 3
R(T*)         =   0.1973            R(T*)     =  0.2038
                                    SE(R(T*)) =  0.0184

Terminal node results:

Node 3:
  Characteristics: 6<=s1<=8
  Class predictor = 3   r(t) = 0.112
  Number of obs = 250

Node 4:
  Characteristics: 2<=s1<=4  3<=s2<=6
  Class predictor = 1   r(t) = 0.311
  Number of obs = 148

Node 5:
  Characteristics: 2<=s1<=4  9<=s2<=12
  Class predictor = 2   r(t) = 0.234
  Number of obs = 124

48 / 52

slide-74
SLIDE 74

Extensions

Extensions

combining splitting variables in a single step
categorical splitting variables
graphs producing the tree representation and sequences of R(T) estimates
boosting
use of random forests for PO-based inference in high-dimensional parameters

49 / 52

slide-75
SLIDE 75

Extensions

Thank you

50 / 52

slide-76
SLIDE 76

Extensions

Biggs, D., B. De Ville, and E. Suen (1991). A method of choosing multiway partitions for classification and decision trees. Journal of Applied Statistics 18(1), 49–62.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.

Breiman, L., J. Friedman, R. Olshen, and C. Stone (1984). Classification and Regression Trees. Belmont, CA: Wadsworth.

Efron, B. (2014). Estimation and accuracy after model selection. Journal of the American Statistical Association 109(507), 991–1007.

Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society: Series C (Applied Statistics) 29(2), 119–127.

Mentch, L. and G. Hooker (2014). Ensemble trees and CLTs: Statistical inference for supervised learning. stat 1050, 25.

Scornet, E., G. Biau, J.-P. Vert, et al. (2015). Consistency of random forests. The Annals of Statistics 43(4), 1716–1741.

51 / 52

slide-77
SLIDE 77

Extensions

Sexton, J. and P. Laake (2009). Standard errors for bagged and random forest estimators. Computational Statistics & Data Analysis 53(3), 801–811.

52 / 52