SLIDE 1

Overview of Decision Trees, Ensemble Methods and Reinforcement Learning

CMSC 678 UMBC

SLIDE 2

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning

SLIDE 3

“20 Questions”: http://20q.net/

Goals:
  1. Figure out what questions to ask
  2. In what order to ask them
  3. Determine how many questions are enough
  4. What to predict at the end

Decision Trees

Adapted from Hamed Pirsiavash

SLIDE 4

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 5

Rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 6

Questions are features; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 7

Questions are features; responses are feature values; rating is the label

Idea: Predict the label by forming a tree where each node branches on values of particular features

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 8

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root node “Easy?”]

SLIDE 9

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root node “Easy?” with branches Easy: yes and Easy: no; node “AI?” under the Easy: yes branch]

SLIDE 10

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root “Easy?” (branches Easy: yes / Easy: no); under Easy: yes, node “AI?” (branches AI: yes / AI: no); subtrees continue (….)]

SLIDE 11

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root “Easy?”; under Easy: yes, node “AI?” (branches AI: yes / AI: no); under Easy: no, node “Sys?”; subtrees continue (….)]

SLIDE 12

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root “Easy?”; under Easy: yes, node “AI?” (branches AI: yes / AI: no); under Easy: no, node “Sys?” (branches Sys: yes / Sys: no); subtrees continue (….)]

SLIDE 13

CIML, Ch 1

Predicting with a Decision Tree is Done Easily and Recursively
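As a concrete aside (my own sketch, not the lecture's code), recursive prediction fits in a few lines; the nested-dict node format and the “like”/“nah” labels are illustrative assumptions:

```python
def predict(tree, example):
    """Recursively walk the tree by the example's feature values."""
    if not isinstance(tree, dict):            # base case: leaf holds the label
        return tree
    answer = example[tree["feature"]]         # answer this node's "question"
    return predict(tree["children"][answer], example)

# Illustrative tree for the course-ratings example on the earlier slides
tree = {"feature": "Easy?", "children": {
    "yes": {"feature": "AI?",  "children": {"yes": "like", "no": "nah"}},
    "no":  {"feature": "Sys?", "children": {"yes": "nah",  "no": "like"}},
}}
print(predict(tree, {"Easy?": "yes", "AI?": "no"}))   # -> nah
```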

SLIDE 14

There Are Many Ways to Learn a Decision Tree

  1. Greedy/count: what is the most accurate feature at each decision point?
     (see CIML Ch. 1, the next slides, and the sketch after this list)
  2. Maximize information gain at each step
     (most popular approaches: ID3, C4.5)
  3. Account for statistical significance
     (example: chi-square automatic interaction detection, CHAID)
  4. Other task-specific approaches (including clustering-based ones)
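A rough sketch of approach 1, the greedy counting learner in the style of CIML Ch. 1 (the data format and helper names are my own assumptions, not the book's); note it is counting-based, recursive, and has simple base cases, exactly as the next slides summarize:

```python
from collections import Counter

def train(data, features):
    """data: list of (feature_dict, label) pairs. Greedy/count learner."""
    labels = [y for _, y in data]
    # Simple base cases: the labels are pure, or there is nothing to split on
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    def accuracy_if_split(f):
        # Count how many examples a one-level split on f would get right
        correct = 0
        for v in {x[f] for x, _ in data}:
            branch = [y for x, y in data if x[f] == v]
            correct += Counter(branch).most_common(1)[0][1]
        return correct

    best = max(features, key=accuracy_if_split)   # most accurate feature
    rest = [f for f in features if f != best]
    return {"feature": best,
            "children": {v: train([(x, y) for x, y in data if x[best] == v], rest)
                         for v in {x[best] for x, _ in data}}}
```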

SLIDE 15

CIML, Ch 1

SLIDE 16

CIML, Ch 1

counting

SLIDE 17

CIML, Ch 1

counting, recursive

SLIDE 18

CIML, Ch 1

counting, recursive, simple base cases

SLIDE 19

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning

SLIDE 20

Key Idea: “Wisdom of the crowd”: groups of people can often make better decisions than individuals. Apply this to ML: learn multiple classifiers and combine their predictions.

Ensembles

SLIDE 21

Train several classifiers and take the majority of their predictions

Combining Multiple Classifiers by Voting

Courtesy Hamed Pirsiavash

SLIDE 22

Train several classifiers and take the majority of their predictions. For regression, use the mean or median of the predictions. For ranking and collective classification, use some form of averaging.

Combining Multiple Classifiers by Voting
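As a concrete illustration (a sketch of mine, not from the deck), these combination rules are one-liners in Python:

```python
from collections import Counter
from statistics import mean, median

# Classification: majority vote across the ensemble's predictions
votes = ["cat", "dog", "cat"]
majority = Counter(votes).most_common(1)[0][0]   # -> "cat"

# Regression: mean or median of the ensemble's predictions
preds = [2.1, 1.9, 6.0]
print(majority, mean(preds), median(preds))      # median resists the outlier
```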

SLIDE 23

Train several classifiers and take the majority of their predictions. For regression, use the mean or median of the predictions. For ranking and collective classification, use some form of averaging.

Combining Multiple Classifiers by Voting

A common family of approaches is called bagging

SLIDE 24

Bagging: Split the Data

Q: What can go wrong with option 1?

Option 1: Split the data into K pieces and train a classifier on each

SLIDE 25

Bagging: Split the Data

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

SLIDE 26

Option 2: Bootstrap aggregation (bagging) via resampling

Bagging: Split the Data

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

SLIDE 27

Option 2: Bootstrap aggregation (bagging) via resampling: obtain datasets D1, D2, …, DN using bootstrap resampling from D

Bagging: Split the Data

sampling with replacement

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

Given a dataset D…

get new datasets D̂ by random sampling with replacement from D

Courtesy Hamed Pirsiavash

SLIDE 28

Option 2: Bootstrap aggregation (bagging) via resampling: obtain datasets D1, D2, …, DN using bootstrap resampling from D. Train a classifier on each dataset and average their predictions.

Bagging: Split the Data

sampling with replacement

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

Given a dataset D…

get new datasets D̂ by random sampling with replacement from D

Courtesy Hamed Pirsiavash
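A minimal sketch of the bootstrap step, assuming D is a plain Python list:

```python
import random

def bootstrap(D, N):
    """Return N datasets, each the size of D, drawn by sampling with replacement."""
    return [[random.choice(D) for _ in range(len(D))] for _ in range(N)]

D = list(range(10))
D1, D2, D3 = bootstrap(D, 3)   # duplicates within each D-hat are expected
```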

SLIDE 29

Averaging reduces the variance of estimators

Why does averaging work?

Courtesy Hamed Pirsiavash

SLIDE 30

Averaging reduces the variance of estimators

Why does averaging work?

Courtesy Hamed Pirsiavash

y: observed data; f: generating line; hⱼ: learned polynomial regression

SLIDE 31

Averaging reduces the variance of estimators. Averaging is a form of regularization: each model can individually overfit, but the average is able to overcome the overfitting.

Why does averaging work?

[Figure: 50 samples]

Courtesy Hamed Pirsiavash
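A quick numerical check of the variance claim (my own demo, assuming NumPy): averaging N independent estimators, each with variance σ², gives an estimator with variance σ²/N.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 trials of N = 25 independent noisy estimates of a true value 3.0
estimates = 3.0 + rng.normal(0.0, 1.0, size=(10_000, 25))
print(estimates[:, 0].var())         # single estimator: about 1.0
print(estimates.mean(axis=1).var())  # average of 25:    about 1/25 = 0.04
```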

SLIDE 32

Bagging Decision Trees

How would it work?

SLIDE 33

Bagging Decision Trees

How would it work? Bootstrap S samples {(X1, Y1), …, (XS, YS)}; train a tree ts on (Xs, Ys); at test time: ŷ = avg(t1(x), …, tS(x))

SLIDE 34

Bagging trees with one modification: at each split point, choose a random subset of features of size k and pick the best among these. Train decision trees of depth d. Average results from multiple randomly trained trees.

Random Forests

Q: What’s the difference between bagging decision trees and random forests?

Courtesy Hamed Pirsiavash

SLIDE 35

Bagging trees with one modification: at each split point, choose a random subset of features of size k and pick the best among these. Train decision trees of depth d. Average results from multiple randomly trained trees.

Random Forests

Q: What’s the difference between bagging decision trees and random forests?

Courtesy Hamed Pirsiavash

A: Bagging alone tends to produce highly correlated trees (they reuse the same good features); the random feature subsets in a random forest decorrelate them.
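In scikit-learn terms, the contrast looks roughly like this (a hedged sketch; the dataset and hyperparameters are illustrative). max_features is exactly the “random subset of k features per split” that separates a random forest from plain bagged trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagged trees: every split may consider all features -> correlated trees
bagged = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                           n_estimators=100, random_state=0).fit(X, y)

# Random forest: each split sees only a random subset of features
forest = RandomForestClassifier(n_estimators=100, max_depth=5,
                                max_features="sqrt", random_state=0).fit(X, y)
```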

SLIDE 36

Random Forests: Human Pose Estimation (Shotton et al., CVPR 2011)

Training: 3 trees of depth 20, 300k training images per tree, 2000 training example pixels per image, 2000 candidate features θ, and 50 candidate thresholds τ per feature (takes about 1 day on a 1000-core cluster)

SLIDE 37

(Shotton et al., CVPR 2011)

SLIDE 38

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning

SLIDE 39

There’s an entire book!

http://incompleteideas.net/book/the-book-2nd.html

SLIDE 40

Reinforcement Learning

[Diagram: agent and environment]
(Robot image: openclipart.org; background: https://static.vecteezy.com/system/resources/previews/000/090/451/original/four-seasons-landscape-illustrations-vector.jpg)

SLIDE 41

Reinforcement Learning

[Diagram: agent takes action in the environment]

SLIDE 42

Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

SLIDE 43

Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

SLIDE 44

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

SLIDE 45

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions

SLIDE 46

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs

SLIDE 47

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution

SLIDE 48

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

SLIDE 49

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀

SLIDE 50

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ

SLIDE 51

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)

SLIDE 52

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

SLIDE 53

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward

SLIDE 54

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward:
max_π Σ_{t>0} γᵗ rₜ

SLIDE 55

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward:
max_π Σ_{t>0} γᵗ rₜ

“solution”: the policy π* that maximizes the expected (average) time-discounted reward

SLIDE 56

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward:
max_π Σ_{t>0} γᵗ rₜ

“solution”: π* = argmax_π 𝔼[Σ_{t>0} γᵗ rₜ ; π]
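A toy rollout making the loop and the discounted sum concrete (the two-state MDP, its transition table, and its rewards are invented for illustration):

```python
import random

# Hypothetical 2-state MDP: p(next | state, action) and R(state, action)
p = {("s0", "go"): [("s0", 0.2), ("s1", 0.8)],
     ("s1", "go"): [("s1", 1.0)]}
R = {("s0", "go"): 0.0, ("s1", "go"): 1.0}
gamma = 0.9

state, G = "s0", 0.0
for t in range(1, 51):                         # for t = 1 to ...
    action = "go"                              # a fixed (trivial) policy
    states, probs = zip(*p[(state, action)])
    state = random.choices(states, probs)[0]   # s_t ~ p(. | s_{t-1}, a_t)
    G += gamma**t * R[(state, action)]         # accumulate gamma^t * r_t
print(G)   # one sample of the time-discounted reward the policy is judged by
```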

SLIDE 57

Designing Rewards is Highly Task Dependent

Rewards indicate what we want to accomplish, NOT how we want to accomplish it.

Shaping: the positive reward is often very “far away”, so add rewards for achieving subgoals (this takes domain knowledge); one can also adjust the initial policy or initial value function.

Example: robot in a maze
  episodic task, not discounted; +1 when out, 0 for each step

Example: chess
  GOOD: +1 for winning, -1 for losing
  BAD: +0.25 for taking opponent’s pieces (can earn high reward even when losing)

Slide courtesy/adapted: Peter Bodík

SLIDE 58

Overview: Learning Strategies

Dynamic Programming; Q-learning; Monte Carlo approaches

SLIDE 59

Q-learning

Q: (s, a) → ℝ

Goal: learn a function Q that computes a “goodness” score for taking a particular action a in state s
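A hedged sketch of the classic tabular Q-learning update (the ε-greedy helper and the constants are my own additions, not from the slides):

```python
from collections import defaultdict
import random

Q = defaultdict(float)            # Q[(s, a)] -> "goodness" score, starts at 0
alpha, gamma, eps = 0.1, 0.9, 0.1

def q_update(s, a, r, s_next, actions):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def choose_action(s, actions):
    """Epsilon-greedy: mostly exploit the current Q, sometimes explore."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```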

SLIDE 60

Deep/Neural Q-learning

Q(s, a; θ) ≈ Q*(s, a)
(left: neural network; right: desired optimal solution)

Approach: form (and learn) a neural network to model our optimal Q function
SLIDE 61

Deep/Neural Q-learning

Q(s, a; θ) ≈ Q*(s, a)
(left: neural network; right: desired optimal solution)

Approach: form (and learn) a neural network to model our optimal Q function

Learn the weights (parameters) θ of our neural network
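A minimal PyTorch-style sketch of learning θ (assumes PyTorch; simplified, with no target network or replay buffer, which real deep Q-learning typically adds):

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q(s, a; theta): maps a state vector to one score per action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s):
        return self.net(s)

q = QNet(state_dim=4, n_actions=2)
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
gamma = 0.99

def td_step(s, a, r, s_next):
    """One gradient step of theta toward the TD target."""
    with torch.no_grad():
        target = r + gamma * q(s_next).max()   # r + gamma * max_a' Q(s', a')
    loss = (q(s)[a] - target) ** 2             # squared TD error
    opt.zero_grad()
    loss.backward()
    opt.step()
```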

SLIDE 62

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning