
CS440/ECE448: Intro to Artificial Intelligence
Lecture 23: Decision Trees
Prof. Julia Hockenmaier, juliahmr@illinois.edu
http://cs.illinois.edu/fa11/cs440


  1. Decision tree learning

     Training data D = {(x_1, y_1), …, (x_N, y_N)}
     – each x_i = (x_1^i, …, x_d^i) is a d-dimensional feature vector
     – each y_i is the target label (class) of the i-th data point

     Training algorithm:
     – Initial tree = the root, corresponding to all items in D.
     – A node is a leaf if all its data items have the same label y.
     – At each non-leaf node: find the attribute with the highest information gain, create a new child for each value of that attribute, and distribute the items accordingly. (A sketch of this procedure follows below.)

     [Figure: an example decision tree. The root splits on drink? (coffee / tea), each branch then splits on milk? (yes / no), and the leaves predict sugar / no sugar.]
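     The following is a minimal sketch of the training algorithm above (ID3-style), assuming categorical attributes and the entropy-based information gain defined on the next slide; the class and function names (DecisionNode, build_tree) and the toy data are illustrative, not from the slides.

```python
import math
from collections import Counter

class DecisionNode:
    def __init__(self, label=None, attribute=None):
        self.label = label          # for leaves: the predicted class
        self.attribute = attribute  # for internal nodes: index of the attribute to split on
        self.children = {}          # attribute value -> child node

def entropy(labels):
    """H(S): entropy of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(data, labels, attr):
    """H(S_parent) minus the weighted entropy of the children after splitting on attr."""
    n = len(labels)
    remainder = 0.0
    for value in set(x[attr] for x in data):
        child = [y for x, y in zip(data, labels) if x[attr] == value]
        remainder += (len(child) / n) * entropy(child)
    return entropy(labels) - remainder

def build_tree(data, labels, attributes):
    # A node is a leaf if all its items have the same label
    # (or if there are no attributes left to split on).
    if len(set(labels)) == 1 or not attributes:
        return DecisionNode(label=Counter(labels).most_common(1)[0][0])
    # Otherwise split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(data, labels, a))
    node = DecisionNode(attribute=best)
    remaining = [a for a in attributes if a != best]
    for value in set(x[best] for x in data):
        subset = [(x, y) for x, y in zip(data, labels) if x[best] == value]
        xs, ys = zip(*subset)
        node.children[value] = build_tree(list(xs), list(ys), remaining)
    return node

# Hypothetical toy data: attribute 0 = drink, attribute 1 = milk.
X = [("coffee", "no"), ("coffee", "yes"), ("tea", "yes"), ("tea", "no")]
y = ["no sugar", "sugar", "sugar", "no sugar"]
tree = build_tree(X, y, attributes=[0, 1])
```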

  2. Information Gain

     How much information are we gaining by splitting node S on attribute A with values V(A)?

     Information required before the split: H(S_parent)
     Information required after the split: Σ_{i ∈ V(A)} P(S_child_i) · H(S_child_i)

     Gain(S_parent, A) = H(S_parent) − Σ_{i ∈ V(A)} (|S_child_i| / |S_parent|) · H(S_child_i)

     Dealing with numerical attributes

     Many attributes are not boolean (0/1) or nominal (classes):
     – the number of times a word appears in a text
     – the RGB values of a pixel
     – height, weight, …

     Splitting on integer or real-valued attributes:
     – Find a split point θ: A_i < θ or A_i ≥ θ? (A sketch of this threshold search follows below.)

     [Figure: "Complete Training Data" shows the full example space of + and − labels; "Our training data" is only a small sample of it.]
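     Below is a minimal sketch of choosing a split point θ for a real-valued attribute by maximizing the information gain defined above, using midpoints between consecutive sorted values as candidates; the function name and example data are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try the midpoints between consecutive sorted values as candidate split
    points theta (A < theta vs. A >= theta) and keep the one with the highest
    information gain."""
    parent_h = entropy(labels)
    n = len(labels)
    pairs = sorted(zip(values, labels))
    best_gain, best_theta = -1.0, None
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue
        theta = (v1 + v2) / 2
        left = [y for v, y in pairs if v < theta]
        right = [y for v, y in pairs if v >= theta]
        remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        gain = parent_h - remainder
        if gain > best_gain:
            best_gain, best_theta = gain, theta
    return best_theta, best_gain

# Hypothetical example: how often a word appears in a document vs. the document's class.
word_counts = [0, 1, 3, 8, 9, 12]
classes = ["-", "-", "-", "+", "+", "+"]
theta, gain = best_threshold(word_counts, classes)   # theta = 5.5, gain = 1.0
```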

  3. The example space and generalization

     We need to label unseen examples accurately. But:
     – The training data is only a very small sample of the example space: we won't have seen all possible combinations of attribute values.
     – The training data may be noisy: some items may have incorrect attributes or labels.

     When does learning stop?
     The tree will grow until all leaf nodes have only one label.

     The effect of noise
     If the training data are noisy, the noise may introduce incorrect splits.

     [Figure: a noisy data set split on A2: true / A2: false. If one false value should have been true, we wouldn't split on A2 at all; if one + label should have been −, we wouldn't have to split any further.]

  4. The effect of incomplete data

     If the training data are incomplete, we may miss important generalizations.

     [Figure: the full example space next to the training data, with the tree learned from each. The tree learned from the sample splits on A2; the tree learned from the full space splits on A4. We should have split on A4, not A2.]

     Overfitting
     The decision tree might overfit the particularities of the training data.

     [Figure: accuracy as a function of tree size. Accuracy on the training data keeps rising as the tree grows, while accuracy on test data eventually drops.]

     Reducing Overfitting in Decision Trees
     – Limit the depth of the tree: no deeper than N (say 3, or 12, or 86; how to choose?).
     – Require a minimum number of examples used to select a split: need at least M (is 10 enough? 20?). Want significance: statistical hypothesis testing can help.
     – BEST: learn an overfit tree and prune it, using validation (held-out) data.

     Pruning a decision tree
     1. Train a decision tree on the training data (keep a part of the training data as unseen validation data).
     2. Prune from the leaves. Simplest method: replace (prune) each non-leaf node whose children are all leaves with its majority label; keep this change if accuracy on the validation set does not degrade. (A sketch of this pruning loop follows below.)
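     A minimal sketch of this pruning step, operating on trees built by the earlier training sketch (it reuses that sketch's DecisionNode fields: label, attribute, children); the single bottom-up pass and the helper names are illustrative assumptions.

```python
from collections import Counter

def predict(node, x):
    # Assumes every attribute value seen at prediction time also appeared in training.
    while node.label is None:
        node = node.children[x[node.attribute]]
    return node.label

def accuracy(tree, data, labels):
    return sum(predict(tree, x) == y for x, y in zip(data, labels)) / len(labels)

def prune(root, node, data, labels, val_data, val_labels):
    """Bottom-up reduced-error pruning: `data`/`labels` are the training items
    that reach `node`; a collapse is kept only if validation accuracy does not degrade."""
    if node.label is not None:                       # already a leaf
        return
    # First prune the subtrees, routing the training items down to each child.
    for value, child in node.children.items():
        subset = [(x, y) for x, y in zip(data, labels) if x[node.attribute] == value]
        if subset:
            xs, ys = zip(*subset)
            prune(root, child, list(xs), list(ys), val_data, val_labels)
    # If all children are now leaves, try replacing this node with its majority label.
    if all(child.label is not None for child in node.children.values()):
        before = accuracy(root, val_data, val_labels)
        saved = (node.attribute, node.children)
        node.label = Counter(labels).most_common(1)[0][0]
        node.attribute, node.children = None, {}
        if accuracy(root, val_data, val_labels) < before:   # undo if accuracy degrades
            node.label = None
            node.attribute, node.children = saved

# Usage (with the earlier sketch): prune(tree, tree, X_train, y_train, X_val, y_val)
```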

  5. Dealing with overfitting

     Overfitting is a very common problem in machine learning.
     Many machine learning algorithms have parameters that can be tuned to improve performance (because they reduce overfitting). We use a held-out data set to set these parameters.

     Bias-variance tradeoff
     – Bias: What kind of hypotheses do we allow? We want hypotheses rich enough to capture the target function f(x).
     – Variance: How much does our learned hypothesis change if we resample the training data? Rich hypotheses (e.g. large decision trees) need more data, which we may not have.

     Reducing variance: bagging
     – Create a new training set by sampling (with replacement) N items from the original data set.
     – Repeat this K times to get K training sets (K is an odd number, e.g. 3, 5, …).
     – Train one classifier on each of the K training sets.
     – Testing: take the majority vote of these K classifiers. (A sketch of bagging follows below.)

     Regression
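     A minimal sketch of bagging as described in the bullets above; the base learner `train_fn` and predictor `predict_fn` are passed in as parameters (e.g. the decision-tree sketch from earlier) and are assumptions, not specified by the slides.

```python
import random
from collections import Counter

def bagging_train(data, labels, train_fn, K=5, seed=0):
    """Train K classifiers, each on a bootstrap sample: N items drawn
    with replacement from the original training set of size N."""
    rng = random.Random(seed)
    n = len(data)
    classifiers = []
    for _ in range(K):
        idx = [rng.randrange(n) for _ in range(n)]          # sample with replacement
        classifiers.append(train_fn([data[i] for i in idx],
                                    [labels[i] for i in idx]))
    return classifiers

def bagging_predict(classifiers, predict_fn, x):
    """Testing: take the majority vote of the K classifiers."""
    votes = [predict_fn(clf, x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]
```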

  6. Polynomial curve fitting

     Given some data {(x, y), …}, with x, y ∈ R, find a function f such that f(x) = y.

     f(x) = w_0 + w_1 x + w_2 x^2 + … + w_m x^m = Σ_{i=0}^{m} w_i x^i

     Task: find the weights w_0, …, w_m that best fit the data. This requires a loss (error) function.

     [Figure: a scatter of data points (x, y) to be fit by a curve.]

     Squared Loss
     We want to find a weight vector w which minimizes the loss (error) on the training data {(x_1, y_1), …, (x_N, y_N)}:

     L(w) = Σ_{i=1}^{N} L_2(f_w(x_i), y_i) = Σ_{i=1}^{N} (y_i − f_w(x_i))^2

     Accounting for model complexity
     We would like to find the simplest polynomial that fits our data, so we need to penalize the degree of the polynomial. We can add a regularization term to the loss which penalizes overly complex functions. (A sketch of the regularized fit follows below.)
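     A minimal sketch of fitting an m-degree polynomial by minimizing the squared loss, with an optional L2 penalty on the weights standing in for the regularization term mentioned above (the slides do not fix a particular regularizer, so the penalty form and the parameter name `lam` are assumptions).

```python
import numpy as np

def fit_polynomial(x, y, m, lam=0.0):
    """Minimize  sum_i (y_i - f_w(x_i))^2  +  lam * ||w||^2  in closed form."""
    X = np.vander(np.asarray(x, dtype=float), N=m + 1, increasing=True)  # columns: 1, x, x^2, ..., x^m
    w = np.linalg.solve(X.T @ X + lam * np.eye(m + 1), X.T @ np.asarray(y, dtype=float))
    return w                                     # w[0] = w_0, ..., w[m] = w_m

def f_w(w, x):
    """Evaluate the fitted polynomial f(x) = w_0 + w_1 x + ... + w_m x^m."""
    return sum(w_i * x ** i for i, w_i in enumerate(w))

# Hypothetical example: noisy samples of a quadratic, fit with a small penalty.
xs = np.linspace(-1.0, 1.0, 20)
ys = 1.0 - 2.0 * xs + 0.5 * xs ** 2 + 0.05 * np.random.randn(20)
w = fit_polynomial(xs, ys, m=2, lam=1e-3)
```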

  7. Linear regression

     Given some data {(x, y), …}, with x, y ∈ R, find a function f(x) = w_1 x + w_0 such that f(x) = y.

     [Figure: a scatter of data points (x, y) to be fit by a line.]

     We need to minimize the loss on the training data: w* = argmin_w Loss(f_w).
     To do so, we set the partial derivatives of Loss(f_w) with respect to w_1 and w_0 to zero.
     This has a closed-form solution (see book). (A sketch of this solution follows below.)
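     A minimal sketch of that closed-form solution for the one-dimensional case, obtained by setting the partial derivatives of the squared loss with respect to w_1 and w_0 to zero; the function name and example data are illustrative.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of f(x) = w_1 * x + w_0."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Setting dLoss/dw1 = 0 and dLoss/dw0 = 0 gives:
    #   w_1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    #   w_0 = y_mean - w_1 * x_mean
    w1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
    w0 = y_mean - w1 * x_mean
    return w0, w1

# Example: points on the line y = 2x + 1 are recovered exactly.
w0, w1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])   # w0 = 1.0, w1 = 2.0
```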
