SLIDE 1

Overview of Decision Trees, Ensemble Methods and Reinforcement Learning

CMSC 678 UMBC

SLIDE 2

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning

SLIDE 3

“20 Questions”: http://20q.net/

Goals:
  1. Figure out what questions to ask
  2. In what order to ask them
  3. Determine how many questions are enough
  4. What to predict at the end

Decision Trees

Adapted from Hamed Pirsiavash

SLIDE 4

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 5

Rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 6

Questions are features; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 7

Questions are features; responses are feature values; rating is the label

Idea: Predict the label by forming a tree where each node branches on values of particular features

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

SLIDE 8

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root node “Easy?”]

SLIDE 9

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root node “Easy?” with branches Easy: yes and Easy: no; node “AI?” under the Easy: yes branch]

SLIDE 10

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root “Easy?” (branches Easy: yes / Easy: no); under Easy: yes, node “AI?” (branches AI: yes / AI: no); subtrees continue (….)]

SLIDE 11

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root “Easy?”; under Easy: yes, node “AI?” (branches AI: yes / AI: no); under Easy: no, node “Sys?”; subtrees continue (….)]

SLIDE 12

Questions are features; responses are feature values; rating is the label

Example: Learning a decision tree

Course ratings dataset

Adapted from Hamed Pirsiavash

[Tree diagram: root “Easy?”; under Easy: yes, node “AI?” (branches AI: yes / AI: no); under Easy: no, node “Sys?” (branches Sys: yes / Sys: no); subtrees continue (….)]

SLIDE 13

CIML, Ch 1

Predicting with a Decision Tree is Done Easily and Recursively
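As a concrete aside (my own sketch, not the lecture's code), recursive prediction fits in a few lines; the nested-dict node format and the “like”/“nah” labels are illustrative assumptions:

```python
def predict(tree, example):
    """Recursively walk the tree by the example's feature values."""
    if not isinstance(tree, dict):            # base case: leaf holds the label
        return tree
    answer = example[tree["feature"]]         # answer this node's "question"
    return predict(tree["children"][answer], example)

# Illustrative tree for the course-ratings example on the earlier slides
tree = {"feature": "Easy?", "children": {
    "yes": {"feature": "AI?",  "children": {"yes": "like", "no": "nah"}},
    "no":  {"feature": "Sys?", "children": {"yes": "nah",  "no": "like"}},
}}
print(predict(tree, {"Easy?": "yes", "AI?": "no"}))   # -> nah
```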

SLIDE 14

There Are Many Ways to Learn a Decision Tree

  1. Greedy/count: what is the most accurate feature at each decision point?
     (see CIML Ch. 1, the next slides, and the sketch after this list)
  2. Maximize information gain at each step
     (most popular approaches: ID3, C4.5)
  3. Account for statistical significance
     (example: chi-square automatic interaction detection, CHAID)
  4. Other task-specific approaches (including clustering-based ones)
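A rough sketch of approach 1, the greedy counting learner in the style of CIML Ch. 1 (the data format and helper names are my own assumptions, not the book's); note it is counting-based, recursive, and has simple base cases, exactly as the next slides summarize:

```python
from collections import Counter

def train(data, features):
    """data: list of (feature_dict, label) pairs. Greedy/count learner."""
    labels = [y for _, y in data]
    # Simple base cases: the labels are pure, or there is nothing to split on
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]

    def accuracy_if_split(f):
        # Count how many examples a one-level split on f would get right
        correct = 0
        for v in {x[f] for x, _ in data}:
            branch = [y for x, y in data if x[f] == v]
            correct += Counter(branch).most_common(1)[0][1]
        return correct

    best = max(features, key=accuracy_if_split)   # most accurate feature
    rest = [f for f in features if f != best]
    return {"feature": best,
            "children": {v: train([(x, y) for x, y in data if x[best] == v], rest)
                         for v in {x[best] for x, _ in data}}}
```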

SLIDE 15

CIML, Ch 1

SLIDE 16

CIML, Ch 1

counting

SLIDE 17

CIML, Ch 1

counting, recursive

SLIDE 18

CIML, Ch 1

counting, recursive, simple base cases

SLIDE 19

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning

SLIDE 20

Key Idea: “Wisdom of the crowd”: groups of people can often make better decisions than individuals. Apply this to ML: learn multiple classifiers and combine their predictions.

Ensembles

SLIDE 21

Train several classifiers and take the majority of their predictions

Combining Multiple Classifiers by Voting

Courtesy Hamed Pirsiavash

SLIDE 22

Train several classifiers and take the majority of their predictions. For regression, use the mean or median of the predictions. For ranking and collective classification, use some form of averaging.

Combining Multiple Classifiers by Voting
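As a concrete illustration (a sketch of mine, not from the deck), these combination rules are one-liners in Python:

```python
from collections import Counter
from statistics import mean, median

# Classification: majority vote across the ensemble's predictions
votes = ["cat", "dog", "cat"]
majority = Counter(votes).most_common(1)[0][0]   # -> "cat"

# Regression: mean or median of the ensemble's predictions
preds = [2.1, 1.9, 6.0]
print(majority, mean(preds), median(preds))      # median resists the outlier
```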

SLIDE 23

Train several classifiers and take the majority of their predictions. For regression, use the mean or median of the predictions. For ranking and collective classification, use some form of averaging.

Combining Multiple Classifiers by Voting

A common family of approaches is called bagging

SLIDE 24

Bagging: Split the Data

Q: What can go wrong with option 1?

Option 1: Split the data into K pieces and train a classifier on each

SLIDE 25

Bagging: Split the Data

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

SLIDE 26

Option 2: Bootstrap aggregation (bagging) via resampling

Bagging: Split the Data

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

SLIDE 27

Option 2: Bootstrap aggregation (bagging) via resampling: obtain datasets D1, D2, …, DN using bootstrap resampling from D

Bagging: Split the Data

sampling with replacement

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

Given a dataset D…

get new datasets D̂ by random sampling with replacement from D

Courtesy Hamed Pirsiavash

SLIDE 28

Option 2: Bootstrap aggregation (bagging) via resampling: obtain datasets D1, D2, …, DN using bootstrap resampling from D. Train a classifier on each dataset and average their predictions.

Bagging: Split the Data

sampling with replacement

Q: What can go wrong with option 1? A: Small sample → poor performance

Option 1: Split the data into K pieces and train a classifier on each

Given a dataset D…

get new datasets D̂ by random sampling with replacement from D

Courtesy Hamed Pirsiavash
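A minimal sketch of the bootstrap step, assuming D is a plain Python list:

```python
import random

def bootstrap(D, N):
    """Return N datasets, each the size of D, drawn by sampling with replacement."""
    return [[random.choice(D) for _ in range(len(D))] for _ in range(N)]

D = list(range(10))
D1, D2, D3 = bootstrap(D, 3)   # duplicates within each D-hat are expected
```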

SLIDE 29

Averaging reduces the variance of estimators

Why does averaging work?

Courtesy Hamed Pirsiavash

SLIDE 30

Averaging reduces the variance of estimators

Why does averaging work?

Courtesy Hamed Pirsiavash

y: observed data; f: generating line; hⱼ: learned polynomial regression

SLIDE 31

Averaging reduces the variance of estimators. Averaging is a form of regularization: each model can individually overfit, but the average is able to overcome the overfitting.

Why does averaging work?

[Figure: 50 samples]

Courtesy Hamed Pirsiavash
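A quick numerical check of the variance claim (my own demo, assuming NumPy): averaging N independent estimators, each with variance σ², gives an estimator with variance σ²/N.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 trials of N = 25 independent noisy estimates of a true value 3.0
estimates = 3.0 + rng.normal(0.0, 1.0, size=(10_000, 25))
print(estimates[:, 0].var())         # single estimator: about 1.0
print(estimates.mean(axis=1).var())  # average of 25:    about 1/25 = 0.04
```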

SLIDE 32

Bagging Decision Trees

How would it work?

SLIDE 33

Bagging Decision Trees

How would it work? Bootstrap S samples {(X1, Y1), …, (XS, YS)}; train a tree ts on (Xs, Ys); at test time: ŷ = avg(t1(x), …, tS(x))

SLIDE 34

Bagging trees with one modification: at each split point, choose a random subset of features of size k and pick the best among these. Train decision trees of depth d. Average results from multiple randomly trained trees.

Random Forests

Q: What’s the difference between bagging decision trees and random forests?

Courtesy Hamed Pirsiavash

SLIDE 35

Bagging trees with one modification: at each split point, choose a random subset of features of size k and pick the best among these. Train decision trees of depth d. Average results from multiple randomly trained trees.

Random Forests

Q: What’s the difference between bagging decision trees and random forests?

Courtesy Hamed Pirsiavash

A: Bagging alone tends to produce highly correlated trees (they reuse the same good features); the random feature subsets in a random forest decorrelate them.
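In scikit-learn terms, the contrast looks roughly like this (a hedged sketch; the dataset and hyperparameters are illustrative). max_features is exactly the “random subset of k features per split” that separates a random forest from plain bagged trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagged trees: every split may consider all features -> correlated trees
bagged = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                           n_estimators=100, random_state=0).fit(X, y)

# Random forest: each split sees only a random subset of features
forest = RandomForestClassifier(n_estimators=100, max_depth=5,
                                max_features="sqrt", random_state=0).fit(X, y)
```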

SLIDE 36

Random Forests: Human Pose Estimation (Shotton et al., CVPR 2011)

Training: 3 trees of depth 20, 300k training images per tree, 2000 training example pixels per image, 2000 candidate features θ, and 50 candidate thresholds τ per feature (takes about 1 day on a 1000-core cluster)

SLIDE 37

(Shotton et al., CVPR 2011)

SLIDE 38

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning

SLIDE 39

There’s an entire book!

http://incompleteideas.net/book/the-book-2nd.html

SLIDE 40

Reinforcement Learning

[Diagram: agent and environment]
(Robot image: openclipart.org; background: https://static.vecteezy.com/system/resources/previews/000/090/451/original/four-seasons-landscape-illustrations-vector.jpg)

SLIDE 41

Reinforcement Learning

[Diagram: agent takes action in the environment]

SLIDE 42

Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

SLIDE 43

Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

SLIDE 44

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

SLIDE 45

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions

SLIDE 46

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs

SLIDE 47

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution

SLIDE 48

Markov Decision Process: Formalizing Reinforcement Learning

[Diagram: agent takes action; environment returns new state and/or reward]

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

SLIDE 49

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀

SLIDE 50

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ

SLIDE 51

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)

SLIDE 52

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

SLIDE 53

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward

SLIDE 54

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward:
max_π Σ_{t>0} γᵗ rₜ

SLIDE 55

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward:
max_π Σ_{t>0} γᵗ rₜ

“solution”: the policy π* that maximizes the expected (average) time-discounted reward

SLIDE 56

Markov Decision Process: Formalizing Reinforcement Learning

Markov Decision Process: (𝒮, 𝒜, ℛ, 𝑝, 𝛾)

𝒮: set of possible states; 𝒜: set of possible actions; ℛ: reward for (state, action) pairs; 𝑝: state-action transition distribution; 𝛾: discount factor

Start in initial state s₀
for t = 1 to …:
  choose action aₜ
  “move” to next state sₜ ∼ p(⋅ | sₜ₋₁, aₜ)
  get reward rₜ = ℛ(sₜ, aₜ)

Objective: maximize time-discounted reward:
max_π Σ_{t>0} γᵗ rₜ

“solution”: π* = argmax_π 𝔼[Σ_{t>0} γᵗ rₜ ; π]
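A toy rollout making the loop and the discounted sum concrete (the two-state MDP, its transition table, and its rewards are invented for illustration):

```python
import random

# Hypothetical 2-state MDP: p(next | state, action) and R(state, action)
p = {("s0", "go"): [("s0", 0.2), ("s1", 0.8)],
     ("s1", "go"): [("s1", 1.0)]}
R = {("s0", "go"): 0.0, ("s1", "go"): 1.0}
gamma = 0.9

state, G = "s0", 0.0
for t in range(1, 51):                         # for t = 1 to ...
    action = "go"                              # a fixed (trivial) policy
    states, probs = zip(*p[(state, action)])
    state = random.choices(states, probs)[0]   # s_t ~ p(. | s_{t-1}, a_t)
    G += gamma**t * R[(state, action)]         # accumulate gamma^t * r_t
print(G)   # one sample of the time-discounted reward the policy is judged by
```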

SLIDE 57

Designing Rewards is Highly Task Dependent

Rewards indicate what we want to accomplish, NOT how we want to accomplish it.

Shaping: the positive reward is often very “far away”, so add rewards for achieving subgoals (this takes domain knowledge); one can also adjust the initial policy or initial value function.

Example: robot in a maze
  episodic task, not discounted; +1 when out, 0 for each step

Example: chess
  GOOD: +1 for winning, -1 for losing
  BAD: +0.25 for taking opponent’s pieces (can earn high reward even when losing)

Slide courtesy/adapted: Peter Bodík

SLIDE 58

Overview: Learning Strategies

Dynamic Programming; Q-learning; Monte Carlo approaches

SLIDE 59

Q-learning

Q: (s, a) → ℝ

Goal: learn a function Q that computes a “goodness” score for taking a particular action a in state s
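A hedged sketch of the classic tabular Q-learning update (the ε-greedy helper and the constants are my own additions, not from the slides):

```python
from collections import defaultdict
import random

Q = defaultdict(float)            # Q[(s, a)] -> "goodness" score, starts at 0
alpha, gamma, eps = 0.1, 0.9, 0.1

def q_update(s, a, r, s_next, actions):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def choose_action(s, actions):
    """Epsilon-greedy: mostly exploit the current Q, sometimes explore."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```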

SLIDE 60

Deep/Neural Q-learning

Q(s, a; θ) ≈ Q*(s, a)
(left: neural network; right: desired optimal solution)

Approach: form (and learn) a neural network to model our optimal Q function
SLIDE 61

Deep/Neural Q-learning

Q(s, a; θ) ≈ Q*(s, a)
(left: neural network; right: desired optimal solution)

Approach: form (and learn) a neural network to model our optimal Q function

Learn the weights (parameters) θ of our neural network
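A minimal PyTorch-style sketch of learning θ (assumes PyTorch; simplified, with no target network or replay buffer, which real deep Q-learning typically adds):

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q(s, a; theta): maps a state vector to one score per action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, s):
        return self.net(s)

q = QNet(state_dim=4, n_actions=2)
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
gamma = 0.99

def td_step(s, a, r, s_next):
    """One gradient step of theta toward the TD target."""
    with torch.no_grad():
        target = r + gamma * q(s_next).max()   # r + gamma * max_a' Q(s', a')
    loss = (q(s)[a] - target) ** 2             # squared TD error
    opt.zero_grad()
    loss.backward()
    opt.step()
```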

SLIDE 62

Outline

Decision Trees
Ensemble Methods
  Bagging
  Random Forests
Reinforcement Learning