Decision tree learning
Introduction to Machine Learning (PowerPoint PPT presentation)


SLIDE 1

INTRODUCTION TO MACHINE LEARNING

Decision tree learning

SLIDE 2

Introduction to Machine Learning

Task of classification

  • Automatically assign a class to observations with features
  • Observation: a vector of features, with a class
  • Automatically assign a class to a new observation with features, using previous observations
  • Binary classification: two classes
  • Multiclass classification: more than two classes
SLIDE 3

Introduction to Machine Learning

Example

  • A dataset consisting of persons
  • Features: age, weight and income
  • Class:
    • binary: happy or not happy
    • multiclass: happy, satisfied or not happy
SLIDE 4

Introduction to Machine Learning

Examples of features

  • Features can be numerical
    • age: 23, 25, 75, …
    • height: 175.3, 179.5, …
  • Features can be categorical
    • travel_class: first class, business class, coach class
    • smokes?: yes, no
SLIDE 5

Introduction to Machine Learning

The decision tree

  • Suppose you’re classifying patients as sick or not sick
  • Intuitive way of classifying: ask questions

    Is the patient young or old?
      Young → Vaccinated against the measles?   Yes → … | No → …
      Old   → Smoked for more than 10 years?    Yes → … | No → …

It’s a decision tree!
SLIDE 11

Introduction to Machine Learning

Define the tree

    A
    ├── B
    │   ├── D
    │   └── E
    └── C
        ├── F
        └── G

  • Nodes: A, B, C, D, E, F, G, connected by edges
  • Root: A
  • Leaves: D, E, F, G
  • Children of A: B and C
  • Children of B and C (grandchildren of A): D, E, F and G
SLIDE 18

Introduction to Machine Learning

Questions to ask

    age <= 18?
      yes → vaccinated?   yes: not sick | no: sick
      no  → smoked?       yes: sick | no: not sick
SLIDE 19

Introduction to Machine Learning

Categorical feature

  • A categorical feature can be a feature test on its own
  • travel_class: coach, business or first

    travel_class?
      coach → …   business → …   first → …

SLIDE 20

Introduction to Machine Learning

Classifying with the tree

    age <= 18?
      yes → vaccinated?   yes: not sick | no: sick
      no  → smoked?       yes: sick | no: not sick

Observation: a patient of 40 years, vaccinated, who didn’t smoke
Walk the tree: age <= 18? no → smoked? no → not sick

Prediction: not sick
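The walk-through above can be sketched as a tiny hard-coded function (my own illustration of the slide’s tree, not course code):

```python
# Hard-coded version of the slide's decision tree (illustrative sketch only).
def classify(age, vaccinated, smoked):
    """Return 'sick' or 'not sick' by walking the tree from the slides."""
    if age <= 18:                                   # root feature test
        return "not sick" if vaccinated else "sick"
    else:
        return "sick" if smoked else "not sick"

# Patient of 40 years, vaccinated, who didn't smoke:
print(classify(40, vaccinated=True, smoked=False))  # not sick
```

Each `if` corresponds to one node of the tree; the returned strings are its leaves.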

SLIDE 26

Introduction to Machine Learning

Learn a tree

  • Use the training set
  • Come up with queries (feature tests) at each node
SLIDE 27

Introduction to Machine Learning

    training set
      age <= 18?
        TRUE  → part of training set
        FALSE → part of training set

  • Split into parts: 2 parts for a binary test
  • Each part is split again by a new feature test
  • Keep splitting until the leaves contain only a small portion of the training set
SLIDE 30

Introduction to Machine Learning

Learn the tree

  • Goal: end up with pure leaves — leaves that contain observations of one particular class
  • In practice: almost never the case, because of noise
  • When classifying a new instance, it ends up in a leaf
  • Assign the class of the majority of the training instances in that leaf
SLIDE 33

Introduction to Machine Learning

Learn the tree

  • At each node:
    • iterate over different feature tests
    • choose the best one
  • Comes down to two parts:
    • make a list of feature tests
    • choose the test with the best split
SLIDE 34

Introduction to Machine Learning

Construct list of tests

  • Categorical features
    • use tests that parents/grandparents/… didn’t use yet
  • Numerical features
    • choose a feature
    • choose a threshold
SLIDE 35

Introduction to Machine Learning

Choose best feature test

  • More complex
  • Use splitting criteria to decide which test to use
  • Information gain ~ entropy
SLIDE 36

Introduction to Machine Learning

Information gain

  • Information gained from a split based on a feature test
  • Test leads to nicely divided classes → high information gain
  • Test leads to scrambled classes → low information gain
  • The test with the highest information gain will be chosen
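The slides give no formulas, but entropy and information gain for a binary split can be sketched with their standard definitions (my own illustration, not course code):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["sick"] * 5 + ["not sick"] * 5
# A test that divides the classes nicely gives high information gain:
print(information_gain(parent, ["sick"] * 5, ["not sick"] * 5))        # 1.0
# A test that leaves the classes scrambled gives low information gain:
left  = ["sick", "sick", "not sick", "not sick", "not sick"]
right = ["sick", "sick", "sick", "not sick", "not sick"]
print(round(information_gain(parent, left, right), 3))                 # 0.029
```

The learner would evaluate `information_gain` for every candidate feature test and keep the highest-scoring one.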
SLIDE 37

Introduction to Machine Learning

Pruning

  • The number of nodes influences the chance of overfitting
  • Restrict the size — higher bias
    • decreases the chance of overfitting
  • Done by pruning the tree
SLIDE 38

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

SLIDE 39

INTRODUCTION TO MACHINE LEARNING

k-Nearest Neighbors

SLIDE 40

Introduction to Machine Learning

Instance-based learning

  • Save the training set in memory
  • No real model, unlike a decision tree
  • Compare unseen instances to the training set
  • Predict using the comparison of the unseen data and the training set
SLIDE 41

Introduction to Machine Learning

k-Nearest Neighbor

  • Form of instance-based learning
  • Simplest form: 1-Nearest Neighbor or Nearest Neighbor
SLIDE 42

Introduction to Machine Learning

Nearest Neighbor - example

  • 2 features: X1 and X2
  • Class: red or blue
  • Binary classification
SLIDE 43

Introduction to Machine Learning

Nearest Neighbor - example

[scatter plot: training observations in the (X1, X2) plane, colored by class]
SLIDE 44

Introduction to Machine Learning

Nearest Neighbor - example

  • Save the complete training set
  • Given: an unseen observation with features X = (1.3, -2)
  • Compare the training set with the new observation
  • Find the closest observation — the nearest neighbor — and assign the same class

just Euclidean distance, nothing fancy
SLIDE 48

Introduction to Machine Learning

k-Nearest Neighbors

  • k is the number of neighbors
  • If k = 5
    • use the 5 most similar observations (neighbors)
    • the assigned class will be the most represented class within the 5 neighbors
SLIDE 49

Introduction to Machine Learning

Distance metric

  • Important aspect of k-NN
  • Euclidean distance: d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )
  • Manhattan distance: d(p, q) = Σᵢ |pᵢ − qᵢ|
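A bare-bones k-NN sketch using the Euclidean distance (my own illustration; `knn_predict` and the toy training set are made up for this example):

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(training, new_x, k=1):
    """training: list of (features, label) pairs. Return the majority label
    among the k observations closest to new_x."""
    neighbors = sorted(training, key=lambda obs: dist(obs[0], new_x))[:k]
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

# Toy training set: two features (X1, X2), classes red or blue.
training = [((1.0, -2.1), "red"), ((4.0, 3.0), "blue"), ((1.5, -1.8), "red")]
print(knn_predict(training, (1.3, -2.0), k=1))  # red
```

With k = 1 this is plain Nearest Neighbor; raising `k` makes the prediction a majority vote over more neighbors.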
SLIDE 52

Introduction to Machine Learning

Scaling - example

  • Dataset with
    • 2 features: weight and height
    • 3 observations

       height (m)   weight (kg)
    1  1.83         80
    2  1.83         80.5
    3  1.70         80

  distance(1, 2) = 0.5
  distance(1, 3) = 0.13
SLIDE 53

Introduction to Machine Learning

Scaling - example

  • Dataset with
    • 2 features: weight and height
    • 3 observations

       height (cm)  weight (kg)
    1  183          80
    2  183          80.5
    3  170          80

  distance(1, 2) = 0.5
  distance(1, 3) = 13

Scale influences distance!
SLIDE 54

Introduction to Machine Learning

Scaling

  • Normalize all features
    • e.g. rescale values between 0 and 1
  • Gives a better measure of the real distance
  • Don’t forget to scale new observations as well
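The rescaling above, sketched on the slides’ own height/weight data (a minimal min-max normalization, my own illustration):

```python
def rescale(values):
    """Min-max normalization: map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [183, 183, 170]   # cm
weights = [80, 80.5, 80]    # kg
print(rescale(heights))  # [1.0, 1.0, 0.0]
print(rescale(weights))  # [0.0, 1.0, 0.0]
```

After rescaling, a 13 cm height difference and a 0.5 kg weight difference both map onto the same [0, 1] range. Note that for new observations you would reuse the `lo`/`hi` computed on the training set rather than recompute them.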
SLIDE 55

Introduction to Machine Learning

Categorical features

  • How to use them in a distance metric?
  • Dummy variables
    • 1 categorical feature with N possible outcomes becomes N binary features (2 outcomes each)
SLIDE 56

Introduction to Machine Learning

Dummy variables — Example

mother tongue: Spanish, Italian or French

  mother_tongue   spanish  italian  french
  Spanish         1        0        0
  Italian         0        1        0
  Italian         0        1        0
  Spanish         1        0        0
  French          0        0        1
  French          0        0        1
  French          0        0        1
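The expansion above could be sketched as follows (my own illustration; `dummies` is a hypothetical helper, not course code):

```python
def dummies(values, categories):
    """Expand one categorical feature into len(categories) binary features."""
    return [[1 if v == c else 0 for c in categories] for v in values]

tongues = ["Spanish", "Italian", "Italian", "Spanish", "French", "French", "French"]
for row in dummies(tongues, ["Spanish", "Italian", "French"]):
    print(row)  # e.g. first row: [1, 0, 0]
```

Each row now consists of 0/1 values, so it can be fed into any numerical distance metric.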

SLIDE 57

INTRODUCTION TO MACHINE LEARNING

Let’s practice!

SLIDE 58

INTRODUCTION TO MACHINE LEARNING

Introducing: The ROC curve

SLIDE 59

Introduction to Machine Learning

Introducing

  • Very powerful performance measure
  • For binary classification
  • Receiver Operating Characteristic curve (ROC curve)
SLIDE 60

Introduction to Machine Learning

Probabilities as output

  • Used decision trees and k-NN to predict a class
  • They can also output the probability that an instance belongs to a class
SLIDE 61

Introduction to Machine Learning

Probabilities as output - example

  • Binary classification
  • Decide whether a patient is sick or not sick
  • Define a probability threshold from which you decide a patient to be sick

New patient: 70% sick, 30% not sick
Decision tree: probability higher than 50% → classify as sick
Avoid sending a sick patient home: lower the threshold to 30%

The threshold is the decision function!

More patients classified as sick, but also more healthy patients classified as sick
SLIDE 62

Introduction to Machine Learning

Confusion matrix

  • Other performance measure for classification
  • Important to construct the ROC curve
SLIDE 63

Introduction to Machine Learning

Confusion matrix

  • Binary classifier: positive or negative (1 or 0)

               Prediction
                 P     N
   Truth   p    TP    FN
           n    FP    TN
SLIDE 64

Introduction to Machine Learning

Confusion matrix

  • Binary classifier: positive or negative (1 or 0)

               Prediction
                 P     N
   Truth   p    TP    FN
           n    FP    TN

  • True Positives (TP): prediction P, truth p
  • False Negatives (FN): prediction N, truth p
  • False Positives (FP): prediction P, truth n
  • True Negatives (TN): prediction N, truth n
SLIDE 68

Introduction to Machine Learning

Ratios in the confusion matrix

  • True positive rate (TPR) = recall
      TPR = TP / (TP + FN)
      (truly positive / all actually positive)
  • False positive rate (FPR)
      FPR = FP / (FP + TN)
      (falsely positive / all actually negative)
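These two ratios can be computed directly from paired truth/prediction labels (a minimal sketch, my own illustration):

```python
def rates(truth, pred):
    """Return (TPR, FPR) for binary labels, where 1 means positive."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

truth = [1, 1, 1, 0, 0]
pred  = [1, 1, 0, 1, 0]
tpr, fpr = rates(truth, pred)
print(round(tpr, 2), fpr)  # 0.67 0.5
```

Here 2 of the 3 actual positives are caught (TPR = 2/3) and 1 of the 2 actual negatives is falsely flagged (FPR = 1/2).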

SLIDE 72

Introduction to Machine Learning

ROC curve

  • Horizontal axis: FPR
  • Vertical axis: TPR
  • How to draw the curve?

SLIDE 73

Introduction to Machine Learning

Draw the curve

  • Need a classifier which outputs probabilities
  • The decision function: the probability threshold from which you decide to diagnose sick
SLIDE 75

Introduction to Machine Learning

Draw the curve

  • Threshold 50%: probability >= 50% → sick, < 50% → healthy
  • Threshold 0%: everyone classified as sick
  • Threshold 100%: everyone classified as healthy

Each threshold gives one point (FPR, TPR) on the ROC curve.
SLIDE 78

Introduction to Machine Learning

Interpreting the curve

  • Is it a good curve?
  • Closer to the left upper corner = better
  • Good classifiers have a big area under the curve
SLIDE 79

Introduction to Machine Learning


AUC = 0.905

Area under the curve (AUC) > 0.9 = very good
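Sweeping the threshold and accumulating (FPR, TPR) points, plus the trapezoid rule for the area under the curve, might look like this (my own sketch; `roc_points` and `auc` are made-up names, not course code):

```python
def roc_points(probs, truth):
    """Sweep the threshold over every predicted probability, from high to low,
    and return the resulting (FPR, TPR) points, starting at (0, 0)."""
    pos = sum(truth)
    neg = len(truth) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(probs), reverse=True):
        pred = [1 if p >= thr else 0 for p in probs]
        tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
        fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve via the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

probs = [0.9, 0.8, 0.7, 0.3, 0.2]
truth = [1, 1, 0, 1, 0]
print(round(auc(roc_points(probs, truth)), 3))  # 0.833
```

Lowering the threshold moves along the curve toward (1, 1); a classifier whose high probabilities line up with the true positives gets an area close to 1.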

SLIDE 80

INTRODUCTION TO MACHINE LEARNING

Let’s practice!