Lecture 8: Regression Trees
Instructor: Saravanan Thirumuruganathan
CSE 5334 Saravanan Thirumuruganathan
Outline
1 Regression
2 Linear Regression
3 Regression Trees
Regression and Linear Regression
Dataset:
Training (labeled) data: $D = \{(x_i, y_i)\}$, $x_i \in \mathbb{R}^d$
Test (unlabeled) data: $x_0 \in \mathbb{R}^d$
Tasks:
Classification: $y_i \in \{1, 2, \ldots, C\}$
Regression: $y_i \in \mathbb{R}$
Objective: Given $x_0$, predict $y_0$. This is supervised learning, since $y_i$ was given during training.
Predict the cost of a house from its details
Predict a job's salary from its description
Predict SAT and GRE scores
Predict the future price of petrol from past prices
Predict the future GDP of a country or the valuation of a company
Salary is color-coded from low (blue, green) to high (yellow, red).
Years is the most important factor in determining Salary: players with less experience earn lower salaries than more experienced players. Given that a player is less experienced, the number of Hits he made in the previous year seems to play little role in his Salary. But among players who have been in the major leagues for five or more years, the number of Hits made in the previous year does affect Salary: players who made more Hits last year tend to have higher salaries. The tree is surely an over-simplification, but compared to a regression model, it is easy to display, interpret and explain.
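The rules just described amount to a three-leaf tree, which can be written as nested if/else statements. This is a minimal sketch, assuming the tree predicts log-salary; the cutpoints (4.5 years, 117.5 hits) and the leaf values are hypothetical placeholders chosen for illustration, not values fitted in this lecture.

```python
# Hypothetical cutpoints and leaf values, for illustration only (not fitted here).
YEARS_CUT = 4.5   # separates "five or more years" from fewer, for whole-number Years
HITS_CUT = 117.5  # hypothetical threshold on Hits in the previous year
LEAF_LESS_EXPERIENCED = 5.11       # hypothetical predicted log-salary
LEAF_EXPERIENCED_FEW_HITS = 6.00   # hypothetical predicted log-salary
LEAF_EXPERIENCED_MANY_HITS = 6.74  # hypothetical predicted log-salary


def predict_log_salary(years: float, hits: float) -> float:
    """Walk the tree from the root: split on Years first, then on Hits."""
    if years < YEARS_CUT:
        # Less experienced players: Hits plays no role in the prediction.
        return LEAF_LESS_EXPERIENCED
    if hits < HITS_CUT:
        # Experienced players with fewer Hits last year.
        return LEAF_EXPERIENCED_FEW_HITS
    # Experienced players with many Hits last year.
    return LEAF_EXPERIENCED_MANY_HITS


print(predict_log_salary(3, 200))   # Hits ignored for a 3-year player
print(predict_log_salary(10, 150))  # experienced player with many Hits
```

Each leaf returns a single constant, which is why the model is so easy to display and explain: every prediction can be traced to at most two comparisons.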
Classification Tree: Quality of split measured by a general “Impurity measure”
Regression Tree: Quality of split measured by “Squared error”
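To make the contrast concrete, here is a minimal sketch of both ways of scoring a candidate split. Gini impurity stands in for the general impurity measure only as one common example (entropy is another); all function names are our own, not from any library.

```python
from collections import Counter


def gini_impurity(labels):
    """Impurity of a classification node: 1 - sum over classes of p_c^2."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def squared_error(values):
    """Squared error of a regression node: sum of (y - node mean)^2."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def classification_split_cost(left, right):
    """Size-weighted average impurity of the two children (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


def regression_split_cost(left, right):
    """Total squared error (RSS) of the two children (lower is better)."""
    return squared_error(left) + squared_error(right)


print(gini_impurity(["a", "a", "b", "b"]))           # maximally mixed two-class node
print(regression_split_cost([1, 1], [5, 5]))         # each child is constant
```

Both costs are zero for a split that produces perfectly homogeneous children; the regression version simply replaces "how mixed are the classes" with "how far are the responses from their mean".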
We divide the feature space into $J$ distinct and non-overlapping regions $R_1, R_2, \ldots, R_J$. For every observation that falls into region $R_i$, we make the same prediction, which is simply the mean of the response values for the training observations in $R_i$.
Objective: Find boxes $R_1, R_2, \ldots, R_J$ that minimize the Residual Sum of Squares (RSS)
$$\mathrm{RSS} = \sum_{i=1}^{J} \sum_{j \in R_i} \left(y_j - \hat{y}_{R_i}\right)^2,$$
where $\hat{y}_{R_i}$ is the mean response for the training observations in the $i$-th box.
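The RSS objective can be computed directly once the training responses are grouped by region. This is a minimal sketch, assuming each region is given as a plain list of its responses $y_j$; the example numbers are hypothetical and only show that a good partition lowers the RSS.

```python
def rss(regions):
    """RSS of a partition: each region R_i predicts the mean of its own responses."""
    total = 0.0
    for ys in regions:
        y_hat = sum(ys) / len(ys)                    # \hat{y}_{R_i}, the region mean
        total += sum((y - y_hat) ** 2 for y in ys)   # inner sum over j in R_i
    return total


# Hypothetical responses, already grouped by region.
print(rss([[1, 3, 10, 12, 14]]))    # a single box: 130.0
print(rss([[1, 3], [10, 12, 14]]))  # two boxes: 10.0
```

Splitting the five responses into two boxes drops the RSS from 130.0 to 10.0, because each box's mean is much closer to its own members than the overall mean is.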
We first select the feature $X_i$ and the cutpoint $s$ such that splitting the feature space into the regions $\{X \mid X_i < s\}$ and $\{X \mid X_i \geq s\}$ leads to the greatest possible reduction in RSS. Next, we repeat the process, looking for the best feature and cutpoint to split the data further so as to minimize the RSS within each of the resulting regions. The process continues until a stopping criterion is reached; for instance, we may continue until no region contains more than five observations. This top-down, greedy procedure is known as recursive binary splitting.
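The greedy splitting procedure can be sketched in plain Python. This is a minimal sketch under stated assumptions: features are numeric, candidate cutpoints are midpoints between consecutive distinct feature values, and the tree is stored as nested dicts. The names `best_split`, `build_tree`, `predict` and `max_leaf_size` are hypothetical; production libraries such as scikit-learn's `DecisionTreeRegressor` are far more efficient.

```python
def rss(ys):
    """RSS of one region when it predicts the mean of its responses."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(X, y):
    """Return (feature, cutpoint, total RSS) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidate cutpoints: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] < s]
            right = [yi for row, yi in zip(X, y) if row[f] >= s]
            if not left or not right:
                continue
            total = rss(left) + rss(right)
            if best is None or total < best[2]:
                best = (f, s, total)
    return best


def build_tree(X, y, max_leaf_size=5):
    """Grow the tree top-down; stop once a region has at most max_leaf_size points."""
    split = best_split(X, y) if len(y) > max_leaf_size else None
    if split is None:
        # Small enough (or all feature values identical): predict the region mean.
        return {"leaf": sum(y) / len(y)}
    f, s, _ = split
    left = [i for i, row in enumerate(X) if row[f] < s]
    right = [i for i, row in enumerate(X) if row[f] >= s]
    return {
        "feature": f,
        "cutpoint": s,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_leaf_size),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_leaf_size),
    }


def predict(tree, x):
    """Follow the splits from the root to a leaf and return its mean response."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] < tree["cutpoint"] else tree["right"]
    return tree["leaf"]


# Hypothetical one-feature data: the response jumps between x = 6 and x = 7.
X = [[v] for v in range(1, 13)]
y = [1.0] * 6 + [10.0] * 6
tree = build_tree(X, y)
print(tree["cutpoint"], predict(tree, [3]), predict(tree, [11]))
```

The split is chosen greedily at each node: the algorithm never revisits an earlier cutpoint, which keeps the search cheap but means the final set of boxes is not guaranteed to minimize the overall RSS.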
Geometric Interpretation of Classification Decision Trees
Slides from the ISLR book
Slides by Piyush Rai
Slides from the OpenIntro Statistics book (http://www.webpages.uidaho.edu/~stevel/251/slides/os2_slides_07.pdf)
See also the footnotes.