  1. Decision Trees (COMPSCI 371D Machine Learning)

  2. Outline
     1 Motivation
     2 Recursive Splits and Trees
     3 Prediction
     4 Purity
     5 How to Split
     6 When to Stop Splitting

  3. Motivation: Linear Predictors → Trees → Forests
     • Linear predictors:
       + Few parameters → good generalization, efficient training
       + Convex risk → unique minimum risk, easy optimization
       + Score-based → a measure of confidence
       - Few parameters → limited expressiveness:
         • Regressor is an affine function
         • Classifier is a set of convex regions in X
     • Decision trees:
       • Score-based (in a sophisticated way)
       • Arbitrarily expressive: flexible, but generalizes poorly
       • Interpretable: we can audit a decision
     • Random decision forests:
       • Ensembles of trees that vote on an answer
       • Expressive (somewhat less than trees), generalize well

  4. Recursive Splits and Trees: Splitting X Recursively
     [Figure: a plot over [0, 1] × [0, 1] illustrating recursive axis-aligned splits of X]

  5. Recursive Splits and Trees: A Decision Tree
     Choose splits to maximize purity
     [Figure: a decision tree whose internal nodes a through e each store a split dimension d and threshold t (e.g. a: d = 2, t = 0.265) and whose leaves store label distributions p such as [0, 1, 0], [1, 0, 0], and [0, 0, 1]]

  6. Recursive Splits and Trees: What's in a Node
     • Internal node:
       • Split parameters: dimension j ∈ {1, ..., d}, threshold t ∈ R
       • Pointers to children, corresponding to subsets of the data S that reaches the node:
         L = { (x, y) ∈ S | x_j ≤ t }
         R = { (x, y) ∈ S | x_j > t }
     • Leaf: distribution p of the training values y in this subset of X (discrete for classification, a histogram for regression)
     • At inference time, return a summary of p as the value for the leaf:
       • Mode (majority) for a classifier
       • Mean or median for a regressor
       • (Remember k-NN?)
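
     As a concrete illustration (not from the slides), here is a minimal Python sketch of these two node types; the field names j, t, L, R, and p mirror the notation above.

        from dataclasses import dataclass
        from typing import Dict, Union

        @dataclass
        class Leaf:
            p: Dict[object, float]   # empirical distribution of training labels at this leaf

        @dataclass
        class Internal:
            j: int                   # split dimension
            t: float                 # split threshold
            L: "Node"                # child that receives samples with x_j <= t
            R: "Node"                # child that receives samples with x_j > t

        Node = Union[Leaf, Internal]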

  7. Recursive Splits and Trees: Why Store p?
     • Can't we just store summary(p) at the leaves?
     • With p, we can compute a confidence value
     • (More important) We need p at every node during training to evaluate purity

  8. Prediction
     function y ← predict(x, τ, summary)
         if leaf?(τ) then
             return summary(τ.p)
         else
             return predict(x, split(x, τ), summary)
         end if
     end function

     function τ ← split(x, τ)
         if x_{τ.j} ≤ τ.t then
             return τ.L
         else
             return τ.R
         end if
     end function
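
     A runnable Python version of this pseudocode, assuming the Leaf/Internal classes sketched after slide 6; the majority helper is one possible summary for a classifier.

        def split(x, tau):
            # Route x to the child selected by the node's split rule.
            return tau.L if x[tau.j] <= tau.t else tau.R

        def predict(x, tau, summary):
            # Descend until a leaf is reached, then summarize its label distribution.
            if isinstance(tau, Leaf):
                return summary(tau.p)
            return predict(x, split(x, tau), summary)

        def majority(p):
            # Mode of the distribution p: the label with the largest empirical probability.
            return max(p, key=p.get)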

  9. Purity: Design Decisions for Training
     • How to define (im)purity
     • How to find the optimal split parameters j and t
     • When to stop splitting

  10. Purity: Impurity Measure 1, the Error Rate
     • Simplest option: i(S) = err(S) = 1 − max_y p(y | S)
     • S: the subset of T that reaches the given node
     • Interpretation:
       • Put yourself at node τ
       • The distribution of the training-set labels that are routed to τ is that of the labels in S
       • The best the classifier can do is to pick the label with the highest fraction, max_y p(y | S)
       • If the distribution is representative, err(S) is the probability that the classifier is wrong at τ (the empirical risk)
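
     In code, with the distribution p(y | S) stored as a dictionary of label frequencies (a small helper of my own, not from the slides):

        def error_rate(p):
            # i(S) = err(S) = 1 - max_y p(y | S): the risk of always predicting the mode.
            return 1.0 - max(p.values())

        # Example: error_rate({'a': 0.5, 'b': 0.49, 'c': 0.01}) returns 0.5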

  11. Purity: Impurity Measure 2, the Gini Index
     • A classifier that always picks the most likely label does best at inference time
     • However, it ignores all other labels at training time:
       p = [0.5, 0.49, 0.01] has the same error rate as q = [0.5, 0.25, 0.25]
     • In p, we have almost eliminated the third label
     • q is closer to uniform, perhaps less desirable
     • For evaluating splits (only), consider a stochastic predictor:
       ŷ = h_Gini(x), which returns label ŷ with probability p(ŷ | S(x))
     • The Gini index measures the empirical risk for the stochastic predictor (it looks at all of p, not just p_max)
     • It says that p is a bit better than q: p is less impure than q
     • i(S_p) ≈ 0.51 and i(S_q) ≈ 0.62
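
     These numbers follow from the Gini formula on the next slide: for p, 1 − (0.5² + 0.49² + 0.01²) = 1 − 0.4902 ≈ 0.51; for q, 1 − (0.5² + 0.25² + 0.25²) = 1 − 0.375 ≈ 0.62. The error rate, by contrast, is 1 − 0.5 = 0.5 for both distributions.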

  12. Purity: The Gini Index
     • Stochastic predictor: ŷ = h_Gini(x), which returns label ŷ with probability p(ŷ | S(x))
     • What is the empirical risk for h_Gini?
     • The true answer is y with probability p(y | S(x))
     • If the true answer is y, then ŷ is wrong with probability ≈ 1 − p(y | S) (because h_Gini picks y with probability p(y | S(x)))
     • Therefore, the impurity defined as the empirical risk of h_Gini is
       i(S) = L_S(h_Gini) = Σ_{y ∈ Y} p(y | S) (1 − p(y | S)) = 1 − Σ_{y ∈ Y} p(y | S)²
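
     A direct Python translation of this formula, checked against the two distributions from the previous slide (the helper name is mine):

        def gini(p):
            # i(S) = 1 - sum_y p(y | S)^2: empirical risk of the stochastic predictor h_Gini.
            return 1.0 - sum(v * v for v in p.values())

        p = {'a': 0.5, 'b': 0.49, 'c': 0.01}
        q = {'a': 0.5, 'b': 0.25, 'c': 0.25}
        print(round(gini(p), 2), round(gini(q), 2))   # prints 0.51 0.62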

  13. How to Split
     • Split at training time: if the training subset S made it to the current node, put every sample in S into either L or R according to the split rule
     • Split at inference time: send x either to τ.L or to τ.R
     • Either way:
       • Choose a dimension j in {1, ..., d}
       • Choose a threshold t
       • Any data point for which x_j ≤ t goes to τ.L
       • All other points go to τ.R
     • How to pick j and t?

  14. How to Split: How to Pick j and t at Each Node?
     • Try all possibilities and pick the best
     • "Best" maximizes the decrease in impurity:
       Δi(S, L, R) = i(S) − (|L| / |S|) i(L) − (|R| / |S|) i(R)
     • "All possibilities": the choices are finite in number
       • Sorted unique values in x_j across T: x_j^(0), ..., x_j^(u_j)
       • Possible thresholds: t = t_j^(1), ..., t_j^(u_j),
         where t_j^(ℓ) = (x_j^(ℓ−1) + x_j^(ℓ)) / 2 for ℓ = 1, ..., u_j
     • Nested loop:
       for j = 1, ..., d
           for t = t_j^(1), ..., t_j^(u_j)
     • Efficiency hacks are possible
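
     A sketch of this exhaustive search in Python, with the Gini index as the impurity; it takes the node's samples S as a list of (x, y) pairs and the number of dimensions d, and it draws candidate thresholds from the values present in S (no efficiency hacks).

        from collections import Counter

        def distribution(S):
            # Empirical label distribution p(y | S).
            counts = Counter(y for _, y in S)
            return {y: c / len(S) for y, c in counts.items()}

        def impurity(S):
            # Gini index of the samples S.
            return 1.0 - sum(v * v for v in distribution(S).values())

        def best_split(S, d):
            base, best = impurity(S), None
            for j in range(d):
                values = sorted({x[j] for x, _ in S})
                # Candidate thresholds: midpoints between consecutive unique values of x_j.
                for lo, hi in zip(values, values[1:]):
                    t = (lo + hi) / 2
                    L = [(x, y) for x, y in S if x[j] <= t]
                    R = [(x, y) for x, y in S if x[j] > t]
                    # Decrease in impurity, weighted by the fractions |L|/|S| and |R|/|S|.
                    delta = base - len(L) / len(S) * impurity(L) - len(R) / len(S) * impurity(R)
                    if best is None or delta > best[0]:
                        best = (delta, j, t, L, R)
            return best   # (delta_i, j, t, L, R), or None if all samples coincide in every dimension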

  15. When to Stop Splitting: Stopping Too Soon is Dangerous
     • Temptation: stop when impurity does not decrease
     [Figure: a two-dimensional scatter of interleaved '+' and 'o' training samples]

  16. When to Stop Splitting
     • Possible stopping criteria:
       • Impurity is zero
       • Too few samples would end up in either L or R
       • Maximum depth reached
     • Overgrow the tree, then prune it
       • There is no optimal pruning method (finding the optimal tree is NP-hard; reduction from the set cover problem, Hyafil and Rivest)
     • Better option: random decision forests
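
     One possible way to bundle these criteria into a single test (the hyperparameter names and default values are my own, and the node size is checked before splitting as a simple proxy for the sizes of L and R):

        def should_stop(S, depth, min_samples=2, max_depth=10):
            # Stop when the node is pure, too small to split further, or too deep.
            pure = len({y for _, y in S}) == 1
            return pure or len(S) < min_samples or depth >= max_depth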

  17. When to Stop Splitting: Summary of Training a Decision Tree
     • Use exhaustive search at the root of the tree to find the dimension j and threshold t that split T with the biggest decrease in impurity
     • Store j and t at the root of the tree
     • Make new children with L and R
     • Repeat on the two subtrees until some stopping criterion is met
     [Figure: a plot over [0, 1] × [0, 1] showing the resulting recursive splits of X]
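
     Putting the pieces together, a minimal recursive trainer; it reuses the Leaf and Internal classes, distribution, best_split, and should_stop from the sketches above, so it is illustrative rather than the course's reference implementation.

        def build_tree(S, d, depth=0):
            # Stop according to the criteria on the previous slide, or when no split is possible.
            if should_stop(S, depth):
                return Leaf(p=distribution(S))
            found = best_split(S, d)
            if found is None:          # all samples coincide in every dimension
                return Leaf(p=distribution(S))
            _, j, t, L, R = found
            return Internal(j=j, t=t,
                            L=build_tree(L, d, depth + 1),
                            R=build_tree(R, d, depth + 1))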

  18. When to Stop Splitting: Summary of Predicting with a Decision Tree
     • Use j and t at the root τ to see whether x belongs in τ.L or τ.R
     • Go to the appropriate child
     • Repeat until a leaf is reached
     • Return summary(p)
     • summary is the majority for a classifier, and the mean or median for a regressor

  19. When to Stop Splitting: From Trees to Forests
     • Trees are flexible → good expressiveness
     • Trees are flexible → poor generalization
     • Pruning is an option, but messy
     • Random decision forests let several trees vote
     • Use the bootstrap to give different trees different views of the data
     • Randomize the split rules to make the trees even more independent
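
     A minimal sketch of bootstrap aggregation with tree voting, reusing build_tree, predict, and majority from the earlier sketches; the second source of randomness mentioned above (randomized split rules) is omitted for brevity.

        import random

        def build_forest(S, d, n_trees=10):
            forest = []
            for _ in range(n_trees):
                # Bootstrap: resample |S| training pairs with replacement.
                boot = [random.choice(S) for _ in range(len(S))]
                forest.append(build_tree(boot, d))
            return forest

        def forest_predict(x, forest):
            # Each tree votes with its majority label; return the most common vote.
            votes = [predict(x, tree, majority) for tree in forest]
            return max(set(votes), key=votes.count)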
