Decision Trees
Gavin Brown
Every Learning Method has Limitations
Linear model? KNN? SVM?
Explain your decisions
Sometimes we need interpretable results from our techniques. How do you explain the above decision?
Different types of data
Rugby players: height and weight can be plotted in 2-d. How do you plot hair colour (Black, Brown, Blonde)?
Predicting heart disease: how do you plot blood type (A, B, O)?
In general, how do you deal with categorical data?
The Tennis Problem
You are working for the local tennis club. They want a program that will advise inexperienced new members on whether they are likely to enjoy a game today, given the current weather conditions. However, they need the program to produce interpretable rules, so they can be sure it's not giving bad advice. They provide you with some historical data...
The Tennis Problem
     Outlook   Temperature  Humidity  Wind    Play Tennis?
 1   Sunny     Hot          High      Weak    No
 2   Sunny     Hot          High      Strong  No
 3   Overcast  Hot          High      Weak    Yes
 4   Rain      Mild         High      Weak    Yes
 5   Rain      Cool         Normal    Weak    Yes
 6   Rain      Cool         Normal    Strong  No
 7   Overcast  Cool         Normal    Strong  Yes
 8   Sunny     Mild         High      Weak    No
 9   Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No
Note: 9 examples say ’yes’, 5 examples say ’no’.
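For reference, here is the same table as a small Python dataset (an illustrative convenience used by the code sketches that follow; the dictionary keys abbreviate the column names):

# The 14 tennis examples from the table above, one dict per row.
DATA = [
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Play": "No"},
]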
A Decision Tree for the Tennis Problem
This tree works for any example in the table — try it!
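The tree figure itself is not reproduced in this text, but a tree consistent with the slide's claim (the well-known tree for this dataset, with Outlook at the root) can be written as a short Python function; predict_play is an illustrative name, and DATA is the list defined above:

def predict_play(x):
    # Root split: Outlook.
    if x["Outlook"] == "Overcast":
        return "Yes"                                         # examples 3, 7, 12, 13
    if x["Outlook"] == "Sunny":
        return "No" if x["Humidity"] == "High" else "Yes"    # sunny: humidity decides
    return "No" if x["Wind"] == "Strong" else "Yes"          # rain: wind decides

# "Works for any example in the table": the tree reproduces all 14 labels.
assert all(predict_play(x) == x["Play"] for x in DATA)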
Learning a Decision Tree: basic recursive algorithm

tree ← learntree( data )
    if all examples in data have the same label
        return leaf node with that label
    else
        pick the most “important” feature, call it F
        for each possible value v of F
            data(v) ← all examples where F == v
            add branch ← learntree( data(v) )
        endfor
        return tree
    endif
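A runnable Python sketch of the same algorithm, under two assumptions: examples are dictionaries like DATA above, and the most “important” feature is left as a placeholder (here simply the first remaining feature), since choosing it well is exactly what the following slides address:

from collections import Counter

def learntree(data, features):
    labels = [x["Play"] for x in data]
    # Base case: all examples share one label -> return a leaf.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    F = features[0]  # placeholder for the most "important" feature
    tree = {F: {}}
    for v in {x[F] for x in data}:   # one branch per observed value of F
        subset = [x for x in data if x[F] == v]
        tree[F][v] = learntree(subset, features[1:])
    return tree

tree = learntree(DATA, ["Outlook", "Temp", "Humidity", "Wind"])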
Example: partitioning data by “wind” feature
     Outlook   Temp  Humid   Wind    Play?
 2   Sunny     Hot   High    Strong  No
 6   Rain      Cool  Normal  Strong  No
 7   Overcast  Cool  Normal  Strong  Yes
11   Sunny     Mild  Normal  Strong  Yes
12   Overcast  Mild  High    Strong  Yes
14   Rain      Mild  High    Strong  No
3 examples say yes, 3 say no.
     Outlook   Temp  Humid   Wind  Play?
 1   Sunny     Hot   High    Weak  No
 3   Overcast  Hot   High    Weak  Yes
 4   Rain      Mild  High    Weak  Yes
 5   Rain      Cool  Normal  Weak  Yes
 8   Sunny     Mild  High    Weak  No
 9   Sunny     Cool  Normal  Weak  Yes
10   Rain      Mild  Normal  Weak  Yes
13   Overcast  Hot   Normal  Weak  Yes
6 examples say yes, 2 examples say no.
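Both splits and their ‘yes’/‘no’ counts can be recomputed from the DATA list defined earlier; a minimal check:

from collections import Counter

for wind in ("Strong", "Weak"):
    subset = [x for x in DATA if x["Wind"] == wind]
    print(wind, Counter(x["Play"] for x in subset))
# Strong: 3 'Yes', 3 'No'
# Weak:   6 'Yes', 2 'No'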
Learning a Decision Tree : Basic recursive algorithm
tree ← learntree( data ) if all examples in data have same label, return leaf node with that label else pick the most “important” feature , call it F for each possible value v of F data(v) ← all examples where F == v add branch ← learntree( data(v) ) endfor return tree endif Which is the most important feature?
Thinking in Probabilities...
Before the split: 9 ’yes’, 5 ’no’ ......... p(’yes’) = 9/14 ≈ 0.64
On the left branch (Wind = Strong): 3 ’yes’, 3 ’no’ ....... p(’yes’) = 3/6 = 0.5
On the right branch (Wind = Weak): 6 ’yes’, 2 ’no’ ...... p(’yes’) = 6/8 = 0.75

Remember... p(’no’) = 1 − p(’yes’)
The ‘Information’ contained in a variable - Entropy

More uncertainty = Less information: H(X) = 1.0 (e.g., a two-valued variable with p = 0.5)

Lower uncertainty = More information: H(X) = 0.72193 (e.g., p = 0.8)
Entropy
The amount of randomness in a variable X is called the ’entropy’:

H(X) = − Σ_i p(x_i) log₂ p(x_i)    (1)

The log is base 2, giving us units of measurement: ’bits’.
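Equation (1) for a two-valued variable, as a short Python sketch (entropy2 is an illustrative name); it reproduces the two H(X) values quoted on the previous slides:

import math

def entropy2(p):
    """Entropy in bits of a binary variable with p('yes') = p."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no randomness
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

print(entropy2(0.5))  # 1.0        -- maximum uncertainty
print(entropy2(0.8))  # ≈ 0.72193  -- lower uncertainty, more information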
Reducing Entropy = Maximising Information Gain

The variable of interest is T (for tennis), taking on values ’yes’ or ’no’.

Before the split: 9 ’yes’, 5 ’no’ ......... p(’yes’) = 9/14 ≈ 0.64

In the whole dataset, the entropy is:

H(T) = − Σ_i p(x_i) log₂ p(x_i)
     = − ( (5/14) log₂ (5/14) + (9/14) log₂ (9/14) )
     = 0.94029
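Continuing the sketch, the information gain of the Wind split is the entropy before the split minus the expected entropy after it, weighting each branch by the fraction of examples it receives (counts from the split above; entropy2 as defined earlier):

H_before = entropy2(9 / 14)     # 0.94029 (9 'yes', 5 'no')

H_strong = entropy2(3 / 6)      # 1.0       (3 'yes', 3 'no')
H_weak   = entropy2(6 / 8)      # ≈ 0.81128 (6 'yes', 2 'no')

# Expected entropy after the split: 6 of 14 examples go left, 8 of 14 go right.
H_after = (6 / 14) * H_strong + (8 / 14) * H_weak

print(H_before - H_after)       # ≈ 0.048 bits gained by splitting on Wind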