
Workshop 11: Classification and Regression Trees
Murray Logan, 26-11-2013



Limitations of Linear Models

• Feature complexity
  – non-linear trends (GAM)
  – complex (multi-way) interactions
  – non-additive interactions
  – single models fail to capture this complexity
• Prediction
• Relative importance
• (Multi)collinearity

Classification & Regression Trees

Advantages
• Feature complexity
• Prediction
• Relative importance
• (Multi)collinearity

Disadvantages
• over-fitting (over-learning)

Classification
• categorical response
Regression
• continuous response

CART: Simple regression trees
• split (partition) the data up into major chunks
  – maximising the change in explained deviance
  – with Gaussian errors, equivalent to
    ∗ maximising the between-group SS
    ∗ minimising SSerror
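The split criterion above can be made concrete with a short language-neutral sketch (Python here; the workshop's own code is R). For one predictor, every candidate cut-point is tried and the one that minimises the pooled within-group sum of squares (equivalently, maximises the between-group SS) is kept. The function name and toy data are illustrative, not from the workshop.

```python
def best_split(x, y):
    """Find the cut-point on predictor x that minimises the
    pooled within-group sum of squares of the response y."""
    def ss(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    # candidate cut-points: midpoints between adjacent unique x values
    xs = sorted(set(x))
    best_cut, best_ss = None, float("inf")
    for lo, hi in zip(xs, xs[1:]):
        cut = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < cut]
        right = [yi for xi, yi in zip(x, y) if xi >= cut]
        sserr = ss(left) + ss(right)  # pooled within-group SS
        if sserr < best_ss:
            best_cut, best_ss = cut, sserr
    return best_cut, best_ss

# two clearly separated groups: the best cut falls between them
cut, sserr = best_split([1, 2, 3, 10, 11, 12],
                        [1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
```

A real CART implementation repeats this search over every predictor at every node; this sketch shows only the single-predictor step.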

Figures 1–3: Simple regression trees: split (partition) the data up into major chunks.

Figures 4–7: Simple regression trees: split these subsets in turn.

Figure 8: Simple regression trees
• recursively partition (split)
• decision tree

• simple trees tend to overfit
  – error is fitted along with the model

Figure 9: Simple regression trees.

Pruning
• reduce overfitting
  – deviance at each terminal node (leaf)

Predictions
• partial plots

Classification and Regression Trees: R packages
• simple CART


library(tree)

• an extension that facilitates (some) non-Gaussian errors:

library(rpart)

Classification and Regression Trees: Limitations
• crude overfitting protection
• low resolution
• limited error distributions
• little scope for random effects

Boosted Regression Trees: Boosting
• machine learning meets predictive modelling
• ensemble models
  – a sequence of simple trees (10,000+ trees)
  – each built to predict the residuals of the previous tree
  – shrinkage
  – together they produce an excellent fit

Boosted Regression Trees: Over-fitting
• over- vs under-fitting
• residual error vs precision
• minimising squared-error loss
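The residual-fitting loop at the heart of boosting can be sketched in a few lines (a Python illustration of the idea, not the gbm implementation used in the worked example; the fixed-cut stump, function names, and toy data are all ours). Each weak learner is fitted to whatever the ensemble has not yet explained, and its contribution is damped by the shrinkage factor.

```python
def fit_stump(x, resid, cut):
    """Weak learner: predict the mean residual on each side of a fixed cut."""
    left = [r for xi, r in zip(x, resid) if xi < cut]
    right = [r for xi, r in zip(x, resid) if xi >= cut]
    ml, mr = sum(left) / len(left), sum(right) / len(right)
    return lambda xi: ml if xi < cut else mr

def boost(x, y, n_trees=200, shrinkage=0.1, cut=0.0):
    """Sequentially fit stumps to the residuals of the ensemble so far,
    adding each stump's prediction scaled by the shrinkage factor."""
    pred = [0.0] * len(y)
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # still unexplained
        stump = fit_stump(x, resid, cut)
        pred = [pi + shrinkage * stump(xi) for pi, xi in zip(pred, x)]
    return pred

x = [-2, -1, 1, 2]
y = [1.0, 1.0, 3.0, 3.0]
pred = boost(x, y)  # converges towards the two group means, 1.0 and 3.0
```

With shrinkage 0.1, each pass removes only 10% of the remaining residual, which is why boosted models need thousands of trees and why the number of trees must be chosen carefully to avoid over-fitting.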

Boosted Regression Trees: minimising squared-error loss
• test (validation) data
  – 75% train, 25% test
• out of bag
  – 50% in, 50% out
• cross-validation
  – 3 folds

Boosted Regression Trees: Predictions

Boosted Regression Trees: Variable importance

   var rel.inf
x1  x1   55.06
x2  x2   44.94

Boosted Regression Trees: R²
[1] 0.5952
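All three schemes reduce to the same mechanic: track a held-out estimate of the loss after every tree and stop where it bottoms out. A minimal sketch of that selection step, using a hypothetical made-up loss curve (the function name is ours):

```python
def best_iteration(heldout_loss):
    """Return the 1-based iteration whose held-out loss is smallest."""
    return min(range(len(heldout_loss)), key=heldout_loss.__getitem__) + 1

# hypothetical U-shaped validation curve: improves, then over-fits
losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
n_trees = best_iteration(losses)
```

The rise after the minimum is the over-fitting region: additional trees keep reducing training error while the held-out error climbs.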


Worked Examples

paruelo <- read.table('../data/paruelo.csv', header=TRUE, sep=',', strip.white=TRUE)
head(paruelo)

    C3   LAT   LONG MAP  MAT JJAMAP DJFMAP
1 0.65 46.40 119.55 199 12.4   0.12   0.45
2 0.65 47.32 114.27 469  7.5   0.24   0.29
3 0.76 45.78 110.78 536  7.2   0.24   0.20
4 0.75 43.95 101.87 476  8.2   0.35   0.15
5 0.33 46.90 102.82 484  4.8   0.40   0.14
6 0.03 38.87  99.38 623 12.0   0.40   0.11

library(tree)
paruelo.tree <- tree(C3 ~ LAT + LONG + MAP + MAT + JJAMAP, data=paruelo)
plot(residuals(paruelo.tree) ~ predict(paruelo.tree))
plot(paruelo.tree)
text(paruelo.tree, cex=0.75)
plot(prune.tree(paruelo.tree))


paruelo.tree1 <- prune.tree(paruelo.tree, best=4)
plot(residuals(paruelo.tree1) ~ predict(paruelo.tree1))
plot(paruelo.tree1)
text(paruelo.tree1)
summary(paruelo.tree1)

Regression tree:
snip.tree(tree = paruelo.tree, nodes = c(7L, 5L, 4L, 6L))


Variables actually used in tree construction:
[1] "LAT" "MAT"
Number of terminal nodes: 4
Residual mean deviance: 0.0304 = 2.1 / 69
Distribution of residuals:
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -0.4220 -0.1010 -0.0325  0.0000  0.0787  0.4820

paruelo.tree1$frame

     var  n    dev   yval splits.cutleft splits.cutright
1    LAT 73 4.9093 0.2714        <42.785         >42.785
2    MAT 50 1.6364 0.1592          <7.25           >7.25
4 <leaf> 12 0.4384 0.3425
5 <leaf> 38 0.6674 0.1013
3    MAT 23 1.2762 0.5152           <6.9            >6.9
6 <leaf> 12 0.6912 0.4083
7 <leaf> 11 0.2984 0.6318

library(scales)
#ys <- with(paruelo, rescale(C3, from=c(min(C3), max(C3)), to=c(0.8, 0)))
#plot(paruelo$LONG, paruelo$LAT, col=grey(ys), pch=20, xlab="Longitude", ylab="Latitude")
#partition.tree(paruelo.tree1, ordvars=c("MAT", "LAT"), add=TRUE)
#partition.tree(paruelo.tree1, add=TRUE)

# Prediction
xlat <- seq(min(paruelo$LAT), max(paruelo$LAT), l=100)
pred <- predict(paruelo.tree1,
                newdata=data.frame(LAT=xlat,
                                   LONG=mean(paruelo$LONG),
                                   MAT=mean(paruelo$MAT),
                                   MAP=mean(paruelo$MAP),
                                   JJAMAP=mean(paruelo$JJAMAP)))
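To make the printed frame concrete, the four terminal-node predictions can be reproduced by walking the splits by hand. The thresholds and yval leaf means below come straight from the paruelo.tree1$frame output; the function itself is our Python sketch, not part of the workshop code.

```python
def predict_c3(lat, mat):
    """Follow the fitted rules from paruelo.tree1$frame:
    split on LAT at 42.785, then on MAT within each branch."""
    if lat < 42.785:
        return 0.3425 if mat < 7.25 else 0.1013  # nodes 4 and 5
    return 0.4083 if mat < 6.9 else 0.6318       # nodes 6 and 7
```

Every observation falling in a given leaf receives that leaf's mean, which is why tree predictions are step functions in the partial plots below.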

par(mfrow=c(1, 2))
plot(C3 ~ LAT, paruelo, type="p", pch=16, cex=0.2)
points(I(predict(paruelo.tree1) - resid(paruelo.tree1)) ~ LAT,
       paruelo, type="p", pch=16, col="grey")
lines(pred ~ xlat, col="red", lwd=2)

xmat <- seq(min(paruelo$MAT), max(paruelo$MAT), l=100)
pred <- predict(paruelo.tree1,
                newdata=data.frame(LAT=mean(paruelo$LAT),
                                   LONG=mean(paruelo$LONG),
                                   MAT=xmat,
                                   MAP=mean(paruelo$MAP),
                                   JJAMAP=mean(paruelo$JJAMAP)))
plot(C3 ~ MAT, paruelo, type="p", pch=16, cex=0.2)
points(I(predict(paruelo.tree1) - resid(paruelo.tree1)) ~ MAT,
       paruelo, type="p", pch=16, col="grey")
lines(pred ~ xmat, col="red", lwd=2)

#xlong <- seq(min(paruelo$LONG), max(paruelo$LONG), l=100)
#pred <- predict(paruelo.tree1,
#                newdata=data.frame(LAT=mean(paruelo$LAT), LONG=xlong,
#                                   MAT=mean(paruelo$MAT), MAP=mean(paruelo$MAP),
#                                   JJAMAP=mean(paruelo$JJAMAP)))
#plot(C3 ~ LONG, paruelo, type="p", pch=16, cex=0.2)
#points(I(predict(paruelo.tree1) - resid(paruelo.tree1)) ~ LONG,
#       paruelo, type="p", pch=16, col="grey")
#lines(pred ~ xlong, col="red", lwd=2)


## Now GBM
library(gbm)
paruelo.gbm <- gbm(C3 ~ LAT + LONG + MAP + MAT + JJAMAP + DJFMAP,
                   data=paruelo,
                   distribution="gaussian",
                   n.trees=10000,
                   interaction.depth=3,  # 1: additive model, 2: two-way interactions, etc.
                   cv.folds=3,
                   train.fraction=0.75,
                   bag.fraction=0.5,
                   shrinkage=0.001,
                   n.minobsinnode=2)

## Determine the optimal number of iterations by each method
(best.iter <- gbm.perf(paruelo.gbm, method="test"))
[1] 1533
(best.iter <- gbm.perf(paruelo.gbm, method="OOB"))
[1] 1197
(best.iter <- gbm.perf(paruelo.gbm, method="OOB", oobag.curve=TRUE, overlay=TRUE, plot.it=TRUE))
[1] 1197
(best.iter <- gbm.perf(paruelo.gbm, method="cv"))


[1] 1844

par(mfrow=c(1, 2))
best.iter <- gbm.perf(paruelo.gbm, method="cv", oobag.curve=TRUE, overlay=TRUE, plot.it=TRUE)
best.iter
[1] 1844

