Machine Learning
Decision Trees: Representation
Some slides from Tom Mitchell, Dan Roth and others
Key issues in machine learning
Modeling: How to formulate your problem as a machine learning problem? How to represent data? Which …
| Name | Name has punctuation? | Second character of first name | Length of first name > 5? | Same first letter in two names? | Label |
|---|---|---|---|---|---|
| Claire Cardie | No | l | Yes | Yes | |
| | No | e | No | No | |
| Eric Baum | No | r | No | No | |
| Haym Hirsh | No | a | No | Yes | |
| Kaelbling | No | e | Yes | No | |
| Yoav Freund | No | o | No | No | |
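The four attributes in the table are simple functions of the name string, so they can be computed mechanically. The following is a minimal sketch, not the slides' own code: the helper `name_features` and its dictionary keys are hypothetical, and it assumes that "two names" means the first and last name and that letter comparisons ignore case.

```python
import string


def name_features(full_name):
    """Compute the four table attributes for one name (illustrative helper)."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]  # assumption: "two names" = first and last
    return {
        "has_punctuation": any(ch in string.punctuation for ch in full_name),
        "second_char": first[1].lower(),
        "first_name_longer_than_5": len(first) > 5,
        "same_first_letter": first[0].lower() == last[0].lower(),
    }


# Only the complete names that survive in the table are used here.
for name in ["Claire Cardie", "Eric Baum", "Haym Hirsh", "Yoav Freund"]:
    print(name, name_features(name))
```

Each name becomes a fixed-length feature vector, which is the representation a decision tree learner would consume.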
With these four attributes, how many unique rows are possible? $2 \times 26 \times 2 \times 2 = 208$. If there are 100 attributes, all binary, how many unique rows are possible? $2 \times 2 \times \cdots \times 2$ (100 times) $= 2^{100}$.
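The counts above are products of the attribute domain sizes. A short sketch of the arithmetic follows; the dictionary keys are hypothetical names, and the size 26 assumes the second character is one of the letters a to z.

```python
from math import prod

# Number of distinct values each attribute can take.
domain_sizes = {
    "has_punctuation": 2,
    "second_char": 26,  # assumption: letters a-z only
    "first_name_longer_than_5": 2,
    "same_first_letter": 2,
}

unique_rows = prod(domain_sizes.values())
print(unique_rows)  # 2 * 26 * 2 * 2

# 100 binary attributes: one factor of 2 per attribute.
binary_rows = 2 ** 100
print(binary_rows)  # roughly 1.27e30
```

Even with only binary attributes, the number of distinct rows grows exponentially in the number of attributes, so a training set can cover only a tiny fraction of them.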
[Figure: decision tree with leaf nodes Label=A, Label=C, Label=B]
Any Boolean function can be represented as a decision tree.
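One way to see this claim is to build a tree directly from the truth table: test one variable per level and store the function's value at each leaf. The sketch below is illustrative only; `build_tree` and `evaluate` are hypothetical helper names, and the construction shown is the naive one, which produces $2^n$ leaves for $n$ variables.

```python
from itertools import product


def build_tree(f, n_vars, assigned=()):
    """Build a full decision tree for a Boolean function f of n_vars inputs.

    A leaf is a bool; an internal node is a tuple
    (variable_index, subtree_if_false, subtree_if_true).
    """
    if len(assigned) == n_vars:
        return f(*assigned)  # all variables fixed: store the output
    i = len(assigned)  # next variable to test
    return (
        i,
        build_tree(f, n_vars, assigned + (False,)),
        build_tree(f, n_vars, assigned + (True,)),
    )


def evaluate(tree, x):
    """Follow the branch selected by each tested variable until a leaf."""
    while isinstance(tree, tuple):
        var, if_false, if_true = tree
        tree = if_true if x[var] else if_false
    return tree


def xor(a, b):
    return a != b


tree = build_tree(xor, 2)
for x in product([False, True], repeat=2):
    print(x, evaluate(tree, x))
```

The tree agrees with the function on every input by construction. The price is size: this naive tree is exponential in the number of variables, although many functions admit much smaller trees.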
(color=blue, second letter=e, etc.)
[Figure: plot with axes X and Y; extracted tick values 1, 3 (by X) and 7, 5 (by Y)]
Decision boundaries can be non-linear
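A tree over numeric features tests one feature against a threshold at each node, so the regions it carves out are axis-parallel rectangles. The following sketch shows why the resulting boundary need not be a single line. The tree itself is hypothetical: the thresholds 1, 3, 5 and 7 and the labels "+" and "-" are assumptions chosen for illustration, not the tree from the slides.

```python
def predict(x, y):
    """Hypothetical decision tree with threshold tests on X and Y."""
    if x < 1:
        return "-"
    if x < 3:
        return "+" if y < 7 else "-"
    return "+" if y < 5 else "-"


# Two points labeled "+" whose midpoint is labeled "-".
p, q = (2, 6), (6, 4)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
print(predict(*p), predict(*q), predict(*mid))
```

If a single straight line separated the two classes, the "+" side would be a half-plane, and the midpoint of any two "+" points would also be "+". Here the midpoint falls in a "-" region, so this tree's decision boundary is a staircase of axis-parallel segments rather than one line.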
[Figure: decision tree with leaf nodes Label=A, Label=C, Label=B]
(Think about what it means for two trees to be structurally different.)