SLIDE 29 Handling Missing Values: Proportional Distribution
Attach a weight w_i to each example (x_i, y_i).
– At the root of the tree, all examples have a weight of 1.0
Modify all mutual information computations to use weights instead of counts
When considering a test on attribute j, only consider those examples for which x_ij is not missing
When splitting the examples on attribute j:
– Let p_L be the probability that a non-missing example is sent to the left child and p_R be the probability that it is sent to the right child
– For each example (x_i, y_i) that is missing attribute j, send it to both children. Send it to the left child with weight w_i := w_i · p_L and to the right child with weight w_i := w_i · p_R
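The training-time steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes each example is a tuple (x, y, w), that a missing attribute value is stored as None, and that the test is a numeric threshold on x[j]. It also assumes p_L is estimated as the weighted fraction of non-missing examples that go left, in keeping with the use of weights in place of counts. The function names are hypothetical.

```python
import math
from collections import defaultdict


def weighted_entropy(examples):
    """Entropy of the labels, with example weights used in place of counts."""
    totals = defaultdict(float)
    for _x, y, w in examples:
        totals[y] += w
    total = sum(totals.values())
    if total == 0:
        return 0.0
    return -sum((t / total) * math.log2(t / total)
                for t in totals.values() if t > 0)


def weighted_information_gain(examples, j, threshold):
    """Gain of the test x[j] <= threshold, using only examples where x[j] is known."""
    known = [e for e in examples if e[0][j] is not None]
    total = sum(w for _x, _y, w in known)
    if total == 0:
        return 0.0
    left = [e for e in known if e[0][j] <= threshold]
    right = [e for e in known if e[0][j] > threshold]
    remainder = sum(
        (sum(w for _x, _y, w in child) / total) * weighted_entropy(child)
        for child in (left, right)
    )
    return weighted_entropy(known) - remainder


def split_on_attribute(examples, j, threshold):
    """Split weighted examples; examples missing x[j] go to both children."""
    known = [e for e in examples if e[0][j] is not None]
    missing = [e for e in examples if e[0][j] is None]
    left = [e for e in known if e[0][j] <= threshold]
    right = [e for e in known if e[0][j] > threshold]

    known_weight = sum(w for _x, _y, w in known)
    # Assumption: p_L is the weighted fraction of non-missing examples sent left.
    # If nothing is known, fall back to an even split (also an assumption).
    p_left = (sum(w for _x, _y, w in left) / known_weight
              if known_weight > 0 else 0.5)
    p_right = 1.0 - p_left

    # Each missing example is sent to both children with a scaled weight.
    left = left + [(x, y, w * p_left) for x, y, w in missing]
    right = right + [(x, y, w * p_right) for x, y, w in missing]
    return left, right, p_left, p_right


# Four root examples with weight 1.0; the last one is missing attribute 0.
examples = [([1.0], "a", 1.0), ([2.0], "a", 1.0),
            ([3.0], "b", 1.0), ([None], "b", 1.0)]
left, right, p_left, p_right = split_on_attribute(examples, j=0, threshold=2.0)
```

Because p_L + p_R = 1, the total weight across both children equals the total weight at the parent, so no example is counted more than once overall.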
When classifying an example that is missing attribute j:
– Send it down the left subtree. Let P(ŷ_L | x) be the resulting prediction
– Send it down the right subtree. Let P(ŷ_R | x) be the resulting prediction
– Return p_L · P(ŷ_L | x) + p_R · P(ŷ_R | x)
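The classification rule above can be sketched as a recursive function. This is an illustration under stated assumptions: the Node class and its fields are hypothetical, each internal node is assumed to store the p_L estimated during training, leaves are assumed to hold a class-probability dictionary, and a missing value is represented as None.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    # Leaf: class -> probability. Internal nodes leave this as None.
    distribution: Optional[Dict[str, float]] = None
    attribute: Optional[int] = None      # index j of the tested attribute
    threshold: Optional[float] = None    # test is x[j] <= threshold
    p_left: float = 0.5                  # p_L stored at training time
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict_proba(node, x):
    """Return P(y | x), blending both subtrees when the tested value is missing."""
    if node.distribution is not None:
        return dict(node.distribution)

    value = x[node.attribute]
    if value is not None:
        child = node.left if value <= node.threshold else node.right
        return predict_proba(child, x)

    # Attribute j is missing: send x down both subtrees and mix the results
    # as p_L * P(y_L | x) + p_R * P(y_R | x).
    left_pred = predict_proba(node.left, x)
    right_pred = predict_proba(node.right, x)
    p_right = 1.0 - node.p_left
    return {
        label: node.p_left * left_pred.get(label, 0.0)
        + p_right * right_pred.get(label, 0.0)
        for label in set(left_pred) | set(right_pred)
    }
```

The recursion handles several missing attributes along one path without special cases, since each missing test simply blends the predictions of its two subtrees.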