Decision Trees
Decision Trees / Discrete Variables

Training data:

  Season  Location   Fun?
  Summer  prison     -1
  Summer  beach      +1
  Winter  ski-slope  +1
  Winter  beach      -1

Learned tree: split on Location.
- Prison -> -1
- Ski Slope -> +1
- Beach -> split on Season: Summer -> +1, Winter -> -1
Decision Trees / Continuous Variables

Training data:

  Mass  Temperature  Explosion?
  1     100          -1
  3.4   945          -1
  10    32           -1
  11.5  1202         +1

Learned tree: split on Mass > 8.
- no  -> -1
- yes -> split on Temperature > 500: no -> -1, yes -> +1
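The tree above is just nested threshold tests. A minimal sketch (function name hypothetical):

```python
def predict_explosion(mass, temperature):
    """Decision tree from the slide: split on Mass > 8, then Temperature > 500."""
    if mass > 8:
        if temperature > 500:
            return +1  # heavy and hot: explosion
        return -1
    return -1  # light samples never explode in the training data

# Check against the four training examples.
data = [(1, 100, -1), (3.4, 945, -1), (10, 32, -1), (11.5, 1202, +1)]
assert all(predict_explosion(m, t) == y for m, t, y in data)
```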
Decision Trees
X>3 Y>5
- 1
+1
- 1
no yes yes no
X Y 3 5 +1
- 1
- 1
Decision trees
- Popular because very flexible and easy to interpret.
- Learning a decision tree = finding a tree with small
error on the training set.
- 1. Start with the root node.
- 2. At each step, split one of the leaves.
- 3. Repeat until a termination criterion is met.
Which node to split?
- We want the children to be more “pure” than the
parent.
- Example:
- Parent node is 50%+, 50%-.
- Child nodes are (90%+,10%-),(10%+,90%-)
- How can we quantify improvement in purity?
First approach: minimize error
Notation, for a split of the parent node into children A and B:
- P(+1) = probability of label +1 at the parent node.
- P(A), P(B) = probability of reaching each child; P(A) + P(B) = 1.
- P(+1|A), P(+1|B) = probability of label +1 conditioned on each child.
- P(+1) = P(+1|A)P(A) + P(+1|B)P(B)

At the parent: if P(+1) > P(-1) predict +1, else predict -1.
  Error rate = min(P(+1), P(-1)) = min(P(+1), 1 - P(+1))
At node A: if P(+1|A) > P(-1|A) predict +1, else predict -1.
  Error rate = min(P(+1|A), 1 - P(+1|A))
At node B: if P(+1|B) > P(-1|B) predict +1, else predict -1.
  Error rate = min(P(+1|B), 1 - P(+1|B))

Combined error of A and B:
  P(A) min(P(+1|A), 1 - P(+1|A)) + P(B) min(P(+1|B), 1 - P(+1|B))
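These quantities can be computed directly. A small sketch (function names hypothetical):

```python
def err(p):
    """Classification error rate of a node with P(+1) = p (predict the majority label)."""
    return min(p, 1 - p)

def split_error(p_a, p_plus_a, p_plus_b):
    """Weighted error of children A and B, where p_a = P(A) and P(B) = 1 - p_a."""
    p_b = 1 - p_a
    return p_a * err(p_plus_a) + p_b * err(p_plus_b)

# The purity example from the previous slide: a 50%/50% parent splitting into
# (90%+, 10%-) and (10%+, 90%-) children drops the error from 0.5 to 0.1.
print(err(0.5), split_error(0.5, 0.9, 0.1))
```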
The problem with classification error.
Define err(p) = min(p, 1 - p). Then

  error rate at parent - error rate at children
    = err(P(+1)) - [P(A) err(P(+1|A)) + P(B) err(P(+1|B))]

We also know that P(+1) = P(+1|A)P(A) + P(+1|B)P(B).
Therefore, if [P(+1|A) > 1/2 and P(+1|B) > 1/2]
or [P(+1|A) < 1/2 and P(+1|B) < 1/2],
then the change in the error is zero.
The problem with classification error (pictorially)
P(+1| A) = 0.7, P(+1) = 0.8, P(+1| B) = 0.9
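Plugging these numbers in shows the problem. Since P(+1) = P(+1|A)P(A) + P(+1|B)P(B), these values force P(A) = 0.5, and the split reduces the classification error by exactly zero:

```python
def err(p):
    return min(p, 1 - p)

p_parent, p_plus_a, p_plus_b = 0.8, 0.7, 0.9
# Solve P(+1) = P(+1|A)P(A) + P(+1|B)(1 - P(A)) for P(A):
w_a = (p_parent - p_plus_b) / (p_plus_a - p_plus_b)  # 0.5
reduction = err(p_parent) - (w_a * err(p_plus_a) + (1 - w_a) * err(p_plus_b))
# Both children are majority-+1, so the reduction is zero (up to floating-point
# rounding) and minimizing the error would never choose this split.
print(reduction)
```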
Fixing the problem
P(+1| A) = 0.7, P(+1) = 0.8, P(+1| B) = 0.9
Instead of err(p) = min(p, 1 - p), use half the entropy:

  H(p)/2 = -(1/2) [p log2(p) + (1-p) log2(1-p)]

Any strictly concave function can be used, for example:

  H(p) = -[p log p + (1-p) log(1-p)]
  Circle(p) = sqrt(1/4 - (p - 1/2)^2)
  Gini(p) = p(1 - p)
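Comparing these impurity functions on the same split (parent at 0.8, equal-weight children at 0.7 and 0.9) shows why strict concavity matters. A sketch with hypothetical function names:

```python
from math import log2, sqrt

def err(p):
    return min(p, 1 - p)

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def gini(p):
    return p * (1 - p)

def circle(p):
    return sqrt(0.25 - (p - 0.5) ** 2)

# err gives zero gain on this split; every strictly concave measure gives a
# strictly positive gain, so the split is (correctly) seen as an improvement.
for f in (err, entropy, gini, circle):
    gain = f(0.8) - 0.5 * (f(0.7) + f(0.9))
    print(f.__name__, round(gain, 4))
```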
Decision tree learning algorithm
- Learning a decision tree = finding a tree with small
error on the training set.
- 1. Start with the root node.
- 2. At each step, split one of the leaves.
- 3. Repeat until a termination criterion is met.
The splitting step
- Given: current tree.
- For each leaf and each feature,
- find all possible splitting rules (finite because data
is finite).
- compute the reduction in entropy.
- Find the (leaf, feature, split rule) combination that minimizes the weighted entropy of the children.
- Add the selected rule to split the selected leaf.
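For a single leaf, the search over features and thresholds might be sketched as follows (pure Python, entropy criterion, continuous features; names hypothetical):

```python
from math import log2

def entropy(p):
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def label_entropy(labels):
    """Entropy of a list of +1/-1 labels (0 for an empty list)."""
    if not labels:
        return 0.0
    return entropy(sum(1 for y in labels if y == +1) / len(labels))

def best_split(rows, labels):
    """Try every feature and every threshold midway between consecutive sorted
    values; return (score, feature, threshold) minimizing weighted child entropy."""
    best = None
    n = len(labels)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * label_entropy(left)
                     + len(right) * label_entropy(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

# Mass/Temperature table from the earlier slide: the best split is on
# feature 0 (Mass) at threshold 10.75, giving two pure children.
rows = [(1, 100), (3.4, 945), (10, 32), (11.5, 1202)]
labels = [-1, -1, -1, +1]
print(best_split(rows, labels))
```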
Enumerating splitting rules
- If the feature has a fixed, small number of values, then either:
- Split on all values (Location is beach/prison/ski-slope),
- or split on equality to one value (Location = beach).
- If the feature is continuous (Temperature), then either:
- Sort the records by feature value and search for the best split,
- or split on percentiles: 1%, 2%, ..., 99%.
Splitting on percentiles
- Suppose the data is in an RDD with 100 million examples.
- Sorting by each feature value is very expensive.
- Instead: use sample(false, 0.0001).collect() to get a sample of about
10,000 examples.
- Sort the sample (it is small, so sort on the head node).
- Pick the examples at positions 100, 200, ... as boundaries. Call those
feature values T1, T2, ..., T99.
- Broadcast the boundaries to all partitions.
- Each partition computes its contribution to
P(+1 | Ti <= f < Ti+1)
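The percentile scheme can be sketched in plain Python, with local lists standing in for the RDD operations (the sampling and counting here are a single-machine stand-in, not Spark code):

```python
import random
from bisect import bisect_right

random.seed(0)
# Stand-in for the full dataset: (feature value, label) pairs.
data = [(random.gauss(0, 1), +1 if random.random() < 0.5 else -1)
        for _ in range(100_000)]

# Driver side: sample a small subset (Spark would use sample(...).collect()),
# sort it, and take every 100th value as a percentile boundary.
sample = sorted(v for v, _ in random.sample(data, 1000))
boundaries = sample[::100][1:]  # 9 boundaries -> 10 buckets

# Each partition would compute per-bucket (+1 count, total) contributions and
# the driver would sum them; here we do it in one local pass.
pos = [0] * (len(boundaries) + 1)
tot = [0] * (len(boundaries) + 1)
for v, y in data:
    b = bisect_right(boundaries, v)  # bucket index for Ti <= v < Ti+1
    tot[b] += 1
    pos[b] += (y == +1)

p_plus = [p / t for p, t in zip(pos, tot) if t]  # P(+1 | bucket)
```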
Pruning trees
- Trees are very flexible.
- A “fully grown” tree is one where all leaves are “pure”, i.e. each leaf contains
only +1 examples or only -1 examples.
- A fully grown tree has training error zero.
- If the tree is large and the data is limited, the test error of the tree is
likely to be high: the tree overfits the data.
- Statisticians say that trees are “high variance” or “unstable”.
- One way to reduce overfitting is “pruning”: the fully grown tree is made
smaller by removing leaves that have few examples and contribute little to the training set performance.
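One way pruning could look in code, assuming a hypothetical dict-based tree format in which each node records its example count and the training errors it would make if it were a leaf:

```python
def prune(node, min_examples=5, tol=1):
    """Bottom-up pruning sketch (hypothetical format). A leaf is
    {'label': y, 'n': count, 'errors': e}; an internal node additionally has
    'left' and 'right' children. Collapse an internal node into a leaf when it
    holds few examples and predicting its majority label adds at most `tol`
    training errors over keeping the subtree."""
    if 'left' not in node:
        return node
    node['left'] = prune(node['left'], min_examples, tol)
    node['right'] = prune(node['right'], min_examples, tol)
    extra = node['errors'] - (node['left']['errors'] + node['right']['errors'])
    if node['n'] <= min_examples and extra <= tol:
        return {'label': node['label'], 'n': node['n'], 'errors': node['errors']}
    return node

# A 3-example subtree whose split fixes only one training error gets collapsed.
leaf_a = {'label': +1, 'n': 2, 'errors': 0}
leaf_b = {'label': -1, 'n': 1, 'errors': 0}
inner = {'left': leaf_a, 'right': leaf_b, 'label': +1, 'n': 3, 'errors': 1}
print(prune(inner))
```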
Bagging
- Bagging, invented by Leo Breiman in the 90s, is a
different way to reduce the variance of trees.
- Instead of pruning the tree, we generate many trees,
using randomly selected subsets of the training data.
- We predict using the majority vote over the trees.
- A more sophisticated and currently very popular method to reduce variance
is “Random Forests”, which we will discuss in a later lesson.
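The bagging procedure itself is short. Here is a sketch using a toy 1-D “stump” learner in place of a full decision tree (all names hypothetical):

```python
import random
from collections import Counter

def build_stump(sample):
    """Toy stand-in for a tree learner: threshold a 1-D feature at the midpoint
    of the two class means (for illustration only)."""
    pos = [x for x, y in sample if y == +1]
    neg = [x for x, y in sample if y == -1]
    if not pos or not neg:  # degenerate bootstrap: constant prediction
        return (lambda x: +1) if pos else (lambda x: -1)
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: +1 if x > t else -1

def bag(train, build_tree, n_trees=25, seed=0):
    """Train n_trees models, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        boot = [train[rng.randrange(len(train))] for _ in train]
        trees.append(build_tree(boot))
    return trees

def bagging_predict(trees, x):
    """Majority vote over the individual predictions."""
    return Counter(t(x) for t in trees).most_common(1)[0][0]

train = [(x, +1 if x > 0 else -1)
         for x in (-3, -2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2, 3)]
trees = bag(train, build_stump)
print(bagging_predict(trees, 2.0), bagging_predict(trees, -2.0))
```

The individual stumps vary with their bootstrap samples; the majority vote averages that variability away, which is exactly the variance reduction bagging is after.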