Ensembles
Wisdom of the crowd: groups of people can often make better decisions than individuals
Questions:
- Ways to combine base learners into ensembles
- We might be able to use simple learning algorithms as base learners
- Inherent parallelism in training
- Boosting: a method that takes classifiers that are only slightly better than chance and learns an arbitrarily good classifier
Voting multiple classifiers
Most of the learning algorithms we saw so far are deterministic
- If you train a decision tree multiple times on the same dataset, you will get the same tree
Two ways of getting multiple classifiers:
- Change the learning algorithm
➡ Given a dataset (say, for classification)
➡ Train several classifiers: decision tree, kNN, logistic regression, neural networks with different architectures, etc.
➡ Call these classifiers f1(x), f2(x), ..., fM(x)
➡ Take the majority of predictions: ŷ = majority(f1(x), f2(x), ..., fM(x)) (see the sketch after this list)
- For regression use the mean or median of the predictions
- Change the dataset
➡ How do we get multiple datasets?
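As a concrete illustration of the voting idea, here is a minimal sketch assuming scikit-learn is available; the toy dataset and all variable names are illustrative, not from the lecture:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Illustrative toy dataset, not from the slides
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = [DecisionTreeClassifier(random_state=0),
          KNeighborsClassifier(),
          LogisticRegression(max_iter=1000)]
# Stack the M per-classifier predictions into an (M, n_test) array
preds = np.array([m.fit(X_train, y_train).predict(X_test) for m in models])
# Majority vote per test point (labels assumed to be non-negative integers)
y_hat = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, preds)
print((y_hat == y_test).mean())

scikit-learn also ships a VotingClassifier that packages this pattern.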
Bagging
Option: split the data into K pieces and train a classifier on each
- A drawback is that each classifier is likely to perform poorly
Bootstrap resampling is a better alternative
- Given a dataset D sampled i.i.d. from an unknown distribution 𝒟, if we construct a new dataset D̂ by random sampling with replacement from D, then D̂ is also an i.i.d. sample from 𝒟
Bootstrap aggregation (bagging) of classifiers [Breiman 94]
- Obtain datasets D1, D2, ..., DN using bootstrap resampling from D
- Train classifiers on each dataset and average their predictions (sketched below)
[Figure: sampling with replacement from D produces D̂; there will be repetitions]
Probability that the first point will not be selected: (1 − 1/N)^N → 1/e ≈ 0.3679 as N → ∞. Since 1 − 1/e ≈ 0.632, roughly only 63% of the original data will be contained in any bootstrap sample.
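A quick numeric check of this limit, assuming NumPy; N and the random seed are arbitrary choices:

import numpy as np
N = 10_000
print((1 - 1/N) ** N)              # ~0.3679, approaching 1/e
rng = np.random.default_rng(0)
boot = rng.integers(0, N, size=N)  # one bootstrap resample: N draws with replacement
print(np.unique(boot).size / N)    # ~0.632: fraction of distinct points that survive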
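And here is a minimal bagging sketch, again assuming scikit-learn; the toy dataset, the ensemble size M = 25, and all names are illustrative. Each classifier is a decision tree trained on its own bootstrap resample, and the predicted class probabilities are averaged:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n, M = len(X_train), 25
probs = np.zeros((len(X_test), 2))    # assumes both classes appear in every resample
for _ in range(M):
    idx = rng.integers(0, n, size=n)  # bootstrap: n draws with replacement
    tree = DecisionTreeClassifier(random_state=0).fit(X_train[idx], y_train[idx])
    probs += tree.predict_proba(X_test)
y_hat = (probs / M).argmax(axis=1)    # average the predictions, then pick the winner
print((y_hat == y_test).mean())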
Random ensembles
One drawback of ensemble learning is that the training time increases
- For example, when training an ensemble of decision trees, the expensive step is choosing the splitting criterion
Random forests are an efficient and surprisingly effective alternative
- Choose trees with a fixed structure and random features
➡ Instead of finding the best feature for splitting at each node, choose a random subset of size k and pick the best among these
➡ Train decision trees of depth d
➡ Average results from multiple randomly trained trees
- When k=1, no training is involved: we only need to record the values at the leaf nodes, which is significantly faster
Random forests tend to work better than bagged decision trees because bagging tends to produce highly correlated trees: a good feature is likely to be used in all the bootstrap samples
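As a sketch of these ideas with scikit-learn's stock implementations (the toy dataset and the values k = 4, d = 8 are illustrative choices): max_features plays the role of k and max_depth the role of d, while ExtraTreesClassifier pushes toward the k = 1 extreme by also drawing split thresholds at random, so per-node optimization nearly disappears.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, ExtraTreesClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: at each node, search over a random subset of k features
rf = RandomForestClassifier(n_estimators=100, max_features=4, max_depth=8, random_state=0)
print(rf.fit(X_train, y_train).score(X_test, y_test))

# Extremely randomized trees: random feature subsets and random thresholds
et = ExtraTreesClassifier(n_estimators=100, max_features=1, random_state=0)
print(et.fit(X_train, y_train).score(X_test, y_test))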