

SLIDE 1

Chapter 5. Tree-based Methods

Wei Pan

Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455 Email: weip@biostat.umn.edu

PubH 7475/8475 © Wei Pan

SLIDE 2

Regression And Classification Tree (CART)

◮ §9.2: Breiman et al. (1984); ≈ C4.5 (Quinlan 1993).

◮ Main idea: approximate any f(x) by a piece-wise constant $\hat{f}(x)$.

◮ Use recursive partitioning (Fig 9.2):

1) Partition the x space into two regions $R_1$ and $R_2$ by a split $x_j < c_j$; 2) partition $R_1$ and $R_2$; 3) then their sub-regions, ... until the model fits the data well.

◮ The fitted function $\hat{f}(x) = \sum_m c_m I(x \in R_m)$ can be represented as a (decision) tree.
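To make the representation concrete, here is a minimal R sketch of a piece-wise constant $\hat{f}$ with two regions; the split point and the constants $c_m$ are made up for illustration.

```r
## Piece-wise constant f_hat with two made-up regions:
## R1 = {x < 3}, R2 = {x >= 3}, with constants c_1 = 2.5, c_2 = 7.0.
f_hat <- function(x) {
  c_m <- c(2.5, 7.0)             # illustrative fitted constants c_1, c_2
  region <- ifelse(x < 3, 1, 2)  # which region each x falls in: I(x in R_m)
  c_m[region]
}

f_hat(c(1, 2.9, 3, 10))          # returns 2.5 2.5 7.0 7.0
```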

SLIDE 3

[Figure 9.2, Elements of Statistical Learning (2nd Ed.), © Hastie, Tibshirani & Friedman 2009, Chap 9.]

FIGURE 9.2. Partitions and CART. Top right panel shows a partition of a two-dimensional feature space by recursive binary splitting, as used in CART, applied to some fake data. Top left panel shows a general partition that cannot be obtained from recursive binary splitting.

SLIDE 4

Regression Tree

◮ Y: continuous.

◮ Key: 1) determine the splitting variables and split points (e.g. $x_j < t_j$) $\Rightarrow$ $R_1, R_2, \ldots$; 2) determine $c_m$ in each $R_m$.

◮ In 1), use a sequential (greedy) search: for each j and s, consider the split $x_j < s$ with $R_1(j,s) = \{x \mid x_j < s\}$ and $R_2(j,s) = \{x \mid x_j \ge s\}$, and solve
$$\min_{j,s} \Big[ \min_{c_1} \sum_{X_i \in R_1(j,s)} (Y_i - c_1)^2 + \min_{c_2} \sum_{X_i \in R_2(j,s)} (Y_i - c_2)^2 \Big].$$

◮ In 2), given $R_1$ and $R_2$, $\hat{c}_k = \text{Ave}(Y_i \mid X_i \in R_k)$ for k = 1, 2.

◮ Repeat the process on $R_1$ and $R_2$ respectively, ...

◮ When to stop? We have to stop when all the $Y_i$'s in an $R_m$ are equal or there are too few of them; the tree size gives a measure of model complexity!
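A minimal R sketch of the greedy search over split points s for a single predictor (in CART the same scan runs over every predictor j); the simulated data are purely illustrative.

```r
## Greedy search for the best split point s on one predictor x:
## for each candidate s, set c1 = mean(Y in R1), c2 = mean(Y in R2),
## and keep the s that minimizes the total residual sum of squares.
best_split <- function(x, y) {
  s_cand <- sort(unique(x))[-1]        # candidates; drop the min so R1 is nonempty
  rss <- sapply(s_cand, function(s) {
    left <- x < s                      # R1(j, s) = {x : x < s}
    sum((y[left]  - mean(y[left]))^2) +
    sum((y[!left] - mean(y[!left]))^2)
  })
  list(s = s_cand[which.min(rss)], rss = min(rss))
}

set.seed(1)
x <- runif(100)
y <- ifelse(x < 0.4, 1, 3) + rnorm(100, sd = 0.2)
best_split(x, y)   # recovers a split point near 0.4
```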

SLIDE 5

◮ A strategy: first grow a large tree, then prune it.

◮ Cost-complexity criterion for tree T:

$$C_\alpha(T) = \text{RSS}(T) + \alpha|T| = \sum_m \sum_{X_i \in R_m} (Y_i - \hat{c}_m)^2 + \alpha|T|,$$

where |T| is the number of terminal nodes (leaves) and $\alpha > 0$ is a tuning parameter to be determined by CV.
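A sketch of the grow-then-prune strategy using the rpart package, whose cp parameter plays the role of α (rescaled by the root-node error); the Boston housing data from MASS serve only as a stand-in example.

```r
library(rpart)

## Grow a large tree (cp = 0 disables the default early stopping),
## then prune back with the complexity parameter chosen by the
## built-in 10-fold cross-validation.
fit <- rpart(medv ~ ., data = MASS::Boston, method = "anova",
             control = rpart.control(cp = 0, xval = 10))

cv <- fit$cptable                          # columns: CP, nsplit, rel error, xerror, xstd
best_cp <- cv[which.min(cv[, "xerror"]), "CP"]

## Fig 9.4 uses the one-standard-error rule; here we simply take the CV minimum.
pruned <- prune(fit, cp = best_cp)         # the subtree minimizing C_alpha(T)
```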

SLIDE 6

[Figure 9.4, Elements of Statistical Learning (2nd Ed.), © Hastie, Tibshirani & Friedman 2009, Chap 9.]

FIGURE 9.4. Results for spam example. The blue curve is the 10-fold cross-validation estimate of misclassification rate as a function of tree size, with standard error bars. The minimum occurs at a tree size with about 17 terminal nodes (using the "one-standard-error" rule). The orange curve is the test error, which tracks the CV error quite closely. The cross-validation is indexed by values of α, shown above. The tree sizes shown below refer to |T_α|, the size of the original tree indexed by α.

SLIDE 7

Classification Tree

◮ $Y_i \in \{1, 2, \ldots, K\}$.

◮ Classify obs's in node m to the majority class:
$$\hat{p}_{mk} = \frac{1}{n_m} \sum_{X_i \in R_m} I(Y_i = k), \qquad k(m) = \arg\max_k \hat{p}_{mk}.$$

◮ Impurity measure $Q_m(T)$ (squared error was used in regression trees):

1. Misclassification error: $\frac{1}{n_m} \sum_{X_i \in R_m} I(Y_i \ne k(m)) = 1 - \hat{p}_{m,k(m)}$.

2. Gini index: $\sum_{k=1}^{K} \hat{p}_{mk} (1 - \hat{p}_{mk})$.

3. Cross-entropy or deviance: $-\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}$.

◮ For K = 2, measures 1-3 reduce to $1 - \max(\hat{p}, 1 - \hat{p})$, $2\hat{p}(1 - \hat{p})$ and $-\hat{p} \log \hat{p} - (1 - \hat{p}) \log(1 - \hat{p})$ respectively, where $\hat{p}$ is the proportion in one of the two classes. They look similar; see Fig 9.3.

◮ Example: ex5.1.r
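The three two-class impurities, written out as R functions matching the formulas above; plotting them reproduces the comparison in Fig 9.3.

```r
## Node impurities for the two-class case, as functions of p = p_hat(class 1).
misclass <- function(p) 1 - pmax(p, 1 - p)
gini     <- function(p) 2 * p * (1 - p)
entropy  <- function(p) -p * log(p) - (1 - p) * log(1 - p)

p <- seq(0.01, 0.99, by = 0.01)
## All three peak at p = 0.5 (maximal impurity) and vanish as p -> 0 or 1;
## Gini and entropy are smooth, the misclassification error is not.
plot(p, entropy(p), type = "l", ylab = "impurity")
lines(p, gini(p), lty = 2)
lines(p, misclass(p), lty = 3)
```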

SLIDE 8

◮ Advantages:

1. Easy to incorporate unequal losses of misclassifications: $\frac{1}{n_m} \sum_{X_i \in R_m} w_i I(Y_i \ne k(m))$ with $w_i = C_k$ if $Y_i = k$ (see the rpart sketch after this list).

2. Handling missing data: use a surrogate splitting variable/value at each node (to best approximate the selected one).
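For item 1, rpart accepts a loss matrix through its parms argument; here is a sketch on the kyphosis data that ships with rpart, with an illustrative 4:1 cost ratio that is not from the slides.

```r
library(rpart)

## Unequal misclassification losses via rpart's loss matrix:
## rows = true class, columns = predicted class (0 on the diagonal),
## in the order of the factor levels ("absent", "present").
## Illustratively, calling a 'present' case 'absent' costs 4x as much.
L <- matrix(c(0, 1,
              4, 0), nrow = 2, byrow = TRUE)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             parms = list(loss = L))
```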

◮ Extensions:

1. May use non-binary splits;

2. A linear combination of multiple variables as the splitting variable: more flexible, but better?

◮ +: easy interpretation (decision trees!)

◮ −: unstable due to the greedy search and discontinuity; predictive performance is not the best.

◮ R packages tree, rpart; commercial CART.

◮ Other implementations: C4.5/C5.0; FIRM by Prof Hawkins (U of M), to detect interactions; methods by Prof Loh's group (UW-Madison), for count, survival, ... data, with regression in each terminal node; ...

SLIDE 9

[Figure 9.5, Elements of Statistical Learning (2nd Ed.), © Hastie, Tibshirani & Friedman 2009, Chap 9.]

FIGURE 9.5. The pruned tree for the spam example.

SLIDE 10

[Figure 9.6, Elements of Statistical Learning (2nd Ed.), © Hastie, Tibshirani & Friedman 2009, Chap 9. Axes: specificity vs. sensitivity; legend: Tree (0.95), GAM (0.98), Weighted Tree (0.90).]

FIGURE 9.6. ROC curves for the classification rules fit to the spam data. Curves that are closer to the northeast corner represent better classifiers. In this case the GAM classifier dominates the trees. The weighted tree achieves better sensitivity for higher specificity than the unweighted tree. The numbers in the legend represent the area under the curve.

SLIDE 11

Application: personalized medicine

◮ Also called subgroup analysis (or precision medicine): identify subgroups of patients that would benefit most from a treatment.

◮ Statistical problem: detect (qualitative) treatment-predictor interactions! Quantitative interactions differ in magnitude but are in the same direction; qualitative interactions differ in direction (see the toy simulation at the end of this slide).

◮ Many approaches ... one of them is to use trees.

◮ Prof Loh's GUIDE: http://www.stat.wisc.edu/~loh/guide.html

◮ An example: http://onlinelibrary.wiley.com/doi/10.1002/sim.6454/abstract

◮ Another example: https://www.ncbi.nlm.nih.gov/pubmed/24983709
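A toy simulation (with made-up effect sizes) contrasting the two kinds of interaction; here the treatment helps the subgroup with x = 1 and harms the subgroup with x = 0, a qualitative interaction that a subgroup method should detect.

```r
## Toy illustration with made-up effect sizes: the treatment effect is +2
## when x = 1 and -2 when x = 0 (direction flips: qualitative interaction).
## For a quantitative interaction, use instead
##   y <- (1 + 2*x)*trt + rnorm(n)   # effects +1 and +3, same direction.
set.seed(1)
n   <- 400
trt <- rbinom(n, 1, 0.5)                     # randomized treatment indicator
x   <- rbinom(n, 1, 0.5)                     # binary predictor defining subgroups
y   <- 2*trt*x - 2*trt*(1 - x) + rnorm(n)    # qualitative trt-x interaction

## The fitted trt main effect is about -2 and the trt:x coefficient about +4,
## so the estimated treatment effect changes sign across the two subgroups.
summary(lm(y ~ trt * x))
```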