CS786 Lecture Slides (c) 2012 P. Poupart
Information gain
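- A chosen attribute A divides the training set E into subsets E_1, …, E_v according to their values for A, where A has v distinct values. E contains p positive and n negative examples, and subset E_i contains p_i positive and n_i negative examples:
$$\text{remainder}(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} \, I\left(\frac{p_i}{p_i + n_i}, \frac{n_i}{p_i + n_i}\right)$$
- Information Gain (IG), or reduction in uncertainty from the attribute test:
$$IG(A) = I\left(\frac{p}{p + n}, \frac{n}{p + n}\right) - \text{remainder}(A)$$
- Choose the attribute with the largest IG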
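A minimal Python sketch of the two formulas, assuming a two-class problem in which each subset E_i is summarized only by its counts (p_i, n_i), and I is the binary entropy in bits with the usual convention 0 · log2(0) = 0. The function names (`entropy`, `remainder`, `information_gain`, `choose_attribute`) and the toy data are illustrative assumptions, not part of the slides.

```python
import math
from typing import Dict, List, Tuple

Split = List[Tuple[int, int]]  # one (p_i, n_i) pair per value of the attribute


def entropy(p: int, n: int) -> float:
    """I(p/(p+n), n/(p+n)) in bits; by convention 0 * log2(0) = 0."""
    total = p + n
    bits = 0.0
    for count in (p, n):
        if count > 0:
            q = count / total
            bits -= q * math.log2(q)
    return bits


def remainder(split: Split) -> float:
    """Entropy left after the test, weighting each E_i by (p_i + n_i) / (p + n)."""
    total = sum(p_i + n_i for p_i, n_i in split)
    return sum((p_i + n_i) * entropy(p_i, n_i) for p_i, n_i in split) / total


def information_gain(split: Split) -> float:
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A)."""
    p = sum(p_i for p_i, _ in split)
    n = sum(n_i for _, n_i in split)
    return entropy(p, n) - remainder(split)


def choose_attribute(splits: Dict[str, Split]) -> str:
    """Return the attribute with the largest information gain."""
    return max(splits, key=lambda a: information_gain(splits[a]))


# Hypothetical toy data (not from the slides): X separates the classes
# perfectly, Y leaves both subsets mixed.
toy = {"X": [(3, 0), (0, 3)], "Y": [(2, 1), (1, 2)]}
print(choose_attribute(toy))  # X
```

Working from counts rather than raw examples keeps the sketch close to the formulas, which depend on E only through p, n, p_i and n_i.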
Information gain
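For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit.
Consider the attributes Patrons and Type (and others too):
$$IG(\text{Patrons}) = 1 - \left[\frac{2}{12} I(0, 1) + \frac{4}{12} I(1, 0) + \frac{6}{12} I\left(\frac{2}{6}, \frac{4}{6}\right)\right] = 0.541 \text{ bits}$$
$$IG(\text{Type}) = 1 - \left[\frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{2}{12} I\left(\frac{1}{2}, \frac{1}{2}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right) + \frac{4}{12} I\left(\frac{2}{4}, \frac{2}{4}\right)\right] = 0 \text{ bits}$$
Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root.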
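The arithmetic above can be checked with a short Python sketch. The (p_i, n_i) counts are read off the terms of the two IG equations (for example, the 2/12 I(0, 1) term for Patrons is a subset with 0 positive and 2 negative examples). The attribute values are left unnamed because the slide does not list them, and the helper names are illustrative assumptions.

```python
import math


def entropy(p, n):
    """Binary entropy I(p/(p+n), n/(p+n)) in bits, with 0 * log2(0) = 0."""
    total = p + n
    return -sum(c / total * math.log2(c / total) for c in (p, n) if c > 0)


def information_gain(split):
    """IG(A) = I(p/(p+n), n/(p+n)) - remainder(A) for a list of (p_i, n_i)."""
    p = sum(p_i for p_i, _ in split)
    n = sum(n_i for _, n_i in split)
    # remainder(A): entropy of each subset weighted by its share of the examples
    rem = sum((p_i + n_i) * entropy(p_i, n_i) for p_i, n_i in split) / (p + n)
    return entropy(p, n) - rem


# (p_i, n_i) per attribute value, taken from the IG equations on the slide
patrons = [(0, 2), (4, 0), (2, 4)]
type_ = [(1, 1), (1, 1), (2, 2), (2, 2)]

ig = {"Patrons": information_gain(patrons), "Type": information_gain(type_)}
for name, gain in ig.items():
    print(f"IG({name}) = {gain:.3f} bits")
print("root:", max(ig, key=ig.get))
```

Running it prints IG(Patrons) = 0.541 bits, IG(Type) = 0.000 bits and root: Patrons, matching the slide.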