Lecture 6
pr@n2nsi"eIS@n "mAd@lIN (Pronunciation Modeling)
Michael Picheny, Bhuvana Ramabhadran, Stanley F. Chen, Markus Nussbaum-Thom
Watson Group, IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
{picheny,bhuvana,stanchen,nussbaum}@us.ibm.com
wbKIY KIYP IYPwb
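The line above appears to list the triphone contexts of the phone sequence K IY P, with `wb` marking a word boundary on either side. A minimal sketch of how such context triples could be generated; the function name `word_triphones` and the `wb` padding convention are assumptions for illustration, not taken from the slides:

```python
def word_triphones(phones, boundary="wb"):
    """Return (left, center, right) context triples for a single word.

    The word is padded with a boundary symbol on both sides, so the first
    and last phones still get a full three-symbol context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [tuple(padded[i - 1:i + 2]) for i in range(1, len(padded) - 1)]


triples = word_triphones(["K", "IY", "P"])
# Concatenate each triple without separators, as in the line above.
print(" ".join("".join(t) for t in triples))
```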
Five equivalence classes, far fewer than enumerating each of the possibilities. No uncertainty is left in the classes. A node without children is called a leaf; otherwise it is called an internal node.
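The leaf and internal-node terminology can be made concrete with a small sketch. The tree below is hypothetical (three classes rather than the five on the slide), and the feature names `left` and `right` and the questions asked are assumptions chosen only for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    question: Optional[Callable[[dict], bool]] = None  # None at a leaf
    yes: Optional["Node"] = None
    no: Optional["Node"] = None
    label: Optional[str] = None  # equivalence-class name, set only at leaves

    def is_leaf(self) -> bool:
        # A node without children is a leaf; otherwise it is internal.
        return self.yes is None and self.no is None


def classify(node, features):
    """Follow yes/no answers from the root until a leaf is reached."""
    while not node.is_leaf():
        node = node.yes if node.question(features) else node.no
    return node.label


def count_leaves(node):
    """Number of equivalence classes defined by the tree."""
    if node.is_leaf():
        return 1
    return count_leaves(node.yes) + count_leaves(node.no)


# Hypothetical tree: two internal nodes, three leaves.
tree = Node(
    question=lambda f: f["left"] == "wb",
    yes=Node(label="class-1"),
    no=Node(
        question=lambda f: f["right"] in {"P", "T", "K"},
        yes=Node(label="class-2"),
        no=Node(label="class-3"),
    ),
)

print(count_leaves(tree))
print(classify(tree, {"left": "K", "right": "P"}))
```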
[...] and denote it x′, Q′.
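Let $c^l_j$, $c^r_j$ denote the frequency of the $j$th outcome in the left and right child nodes, and let $n_l = \sum_j c^l_j$, $n_r = \sum_j c^r_j$.

The likelihood of the data given question $Q$ is

$$L(x_1, \ldots, x_N \mid Q) = \prod_{j=1}^{M} (p^l_j)^{c^l_j} \prod_{j=1}^{M} (p^r_j)^{c^r_j}$$

$$\log L(x_1, \ldots, x_N \mid Q) = \sum_{j=1}^{M} c^l_j \log p^l_j + \sum_{j=1}^{M} c^r_j \log p^r_j$$

where $p^l_j$, $p^r_j$ are the probabilities of the $j$th outcome in the left and right nodes.

Using the maximum likelihood estimates $p^l_j = c^l_j / n_l$, $p^r_j = c^r_j / n_r$ gives:

$$
\begin{aligned}
\log L(x_1, \ldots, x_N \mid Q)
&= \sum_{j=1}^{M} c^l_j \log \frac{c^l_j}{n_l} + \sum_{j=1}^{M} c^r_j \log \frac{c^r_j}{n_r} \\
&= \sum_{j=1}^{M} c^l_j \log c^l_j - \log n_l \sum_{j=1}^{M} c^l_j + \sum_{j=1}^{M} c^r_j \log c^r_j - \log n_r \sum_{j=1}^{M} c^r_j \\
&= \sum_{j=1}^{M} \left\{ c^l_j \log c^l_j + c^r_j \log c^r_j \right\} - n_l \log n_l - n_r \log n_r
\end{aligned}
$$

$c^l_j$, $c^r_j$, $n_l$, $n_r$ are all non-negative integers.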
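$$\log L(x_1, \ldots, x_N \mid Q) = \sum_{i=1}^{M} \left\{ c^l_i \log c^l_i + c^r_i \log c^r_i \right\} - n_l \log n_l - n_r \log n_r$$

$$\log_2 L(x_1, \ldots, x_{12} \mid Q_A) = \overbrace{1 \log_2 1}^{c^l_p} + \overbrace{4 \log_2 4}^{c^l_f} + \overbrace{3 \log_2 3}^{c^r_p} + \overbrace{4 \log_2 4}^{c^r_\phi} - \overbrace{5 \log_2 5}^{n_l} - \overbrace{7 \log_2 7}^{n_r} = -10.51$$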
$$H(x_1, \ldots, x_{12} \mid Q_A) = -\log_2 L(x_1, \ldots, x_{12} \mid Q_A)/N = 10.51/12 = 0.88 \text{ bits}$$
$$\log_2 L(x_1, \ldots, x_{12} \mid Q_B) = 2 \log_2 2 + 2 \log_2 2 + 3 \log_2 3 + 2 \log_2 2 + 2 \log_2 2 - 7 \log_2 7 - 5 \log_2 5 = -18.51$$
$$H(x_1, \ldots, x_{12} \mid Q_B) = -\log_2 L(x_1, \ldots, x_{12} \mid Q_B)/N = 18.51/12 = 1.54 \text{ bits}$$
$$\log_2 L(x_1, \ldots, x_{12} \mid Q_C) = 2 \log_2 2 + 2 \log_2 2 + 2 \log_2 2 + 2 \log_2 2 + 4 \log_2 4 - 4 \log_2 4 - 8 \log_2 8 = -16.00$$

$$H(x_1, \ldots, x_{12} \mid Q_C) = -\log_2 L(x_1, \ldots, x_{12} \mid Q_C)/N = 16/12 = 1.33 \text{ bits}$$
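The three computations above can be checked mechanically. The sketch below evaluates the count-based log likelihood and the per-sample entropy for each question, then picks the question with the lowest entropy. The grouping of counts into left and right nodes for Q_B and Q_C is an assumption inferred from the node totals in the formulas (the slides' figures did not survive extraction). For Q_B a singleton count is assumed in the 5-sample node; it contributes 1 log 1 = 0 and does not change the result.

```python
from math import log2


def split_log_likelihood(left_counts, right_counts):
    """sum_j {c_l log c_l + c_r log c_r} - n_l log n_l - n_r log n_r (base 2)."""
    def side(counts):
        n = sum(counts)
        if n == 0:
            return 0.0
        # Zero counts are skipped, following the convention 0 log 0 = 0.
        return sum(c * log2(c) for c in counts if c > 0) - n * log2(n)
    return side(left_counts) + side(right_counts)


def split_entropy(left_counts, right_counts):
    """Average bits per sample: -log2 L / N."""
    n = sum(left_counts) + sum(right_counts)
    return -split_log_likelihood(left_counts, right_counts) / n


# (left counts, right counts); groupings for QB and QC are assumed.
questions = {
    "QA": ([1, 4], [3, 4]),
    "QB": ([2, 2, 3], [2, 2, 1]),
    "QC": ([2, 2], [2, 2, 4]),
}

for name, (left, right) in questions.items():
    ll = split_log_likelihood(left, right)
    h = split_entropy(left, right)
    print(f"{name}: log2 L = {ll:.3f}, H = {h:.3f} bits")

best = min(questions, key=lambda q: split_entropy(*questions[q]))
print("best question:", best)
```

Minimizing the entropy is the same as maximizing the likelihood, since N is fixed across questions.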
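Log likelihood of $n$ samples $x_1, \ldots, x_n$ of dimension $p$ under a diagonal-covariance Gaussian:

$$\log L(x_1, \ldots, x_n) = -\frac{1}{2} \sum_{i=1}^{n} \left\{ p \log 2\pi + \sum_{j=1}^{p} \log \sigma_j^2 + \sum_{j=1}^{p} \frac{(x_{ij} - \mu_j)^2}{\sigma_j^2} \right\}$$

The maximum likelihood estimates are

$$\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^{n} x_{ij}, \qquad \hat{\sigma}_j^2 = \frac{1}{n} \sum_{i=1}^{n} x_{ij}^2 - \hat{\mu}_j^2, \qquad j = 1, \ldots, p$$

so that

$$\log L(x_1, \ldots, x_n) = -\frac{1}{2} \sum_{i=1}^{n} \left\{ p \log 2\pi + \sum_{j=1}^{p} \log \hat{\sigma}_j^2 + \sum_{j=1}^{p} \frac{(x_{ij} - \hat{\mu}_j)^2}{\hat{\sigma}_j^2} \right\}$$

The last term simplifies:

$$
\begin{aligned}
\sum_{i=1}^{n} \sum_{j=1}^{p} \frac{(x_{ij} - \hat{\mu}_j)^2}{\hat{\sigma}_j^2}
&= \sum_{j=1}^{p} \frac{1}{\hat{\sigma}_j^2} \left\{ \Big( \sum_{i=1}^{n} x_{ij}^2 \Big) - 2 \hat{\mu}_j \sum_{i=1}^{n} x_{ij} + n \hat{\mu}_j^2 \right\} \\
&= \sum_{j=1}^{p} \frac{1}{\hat{\sigma}_j^2} \left\{ \Big( \sum_{i=1}^{n} x_{ij}^2 \Big) - n \hat{\mu}_j^2 \right\} \\
&= \sum_{j=1}^{p} \frac{1}{\hat{\sigma}_j^2} \, n \hat{\sigma}_j^2 = \sum_{j=1}^{p} n = np
\end{aligned}
$$

Therefore

$$
\begin{aligned}
\log L(x_1, \ldots, x_n)
&= -\frac{1}{2} \left\{ \sum_{i=1}^{n} \Big[ p \log 2\pi + \sum_{j=1}^{p} \log \hat{\sigma}_j^2 \Big] + np \right\} \\
&= -\frac{1}{2} \left\{ np \log 2\pi + n \sum_{j=1}^{p} \log \hat{\sigma}_j^2 + np \right\}
\end{aligned}
$$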
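A quick numerical check of the closed form −½{np log 2π + n Σ_j log σ̂_j² + np}: the sketch below evaluates the diagonal-Gaussian log likelihood term by term at the maximum likelihood estimates and compares it with the closed form. The data values and function names are hypothetical, chosen only for illustration.

```python
from math import log, pi


def ml_estimates(data):
    """Per-dimension ML mean and variance: var_j = (1/n) sum x_ij^2 - mean_j^2."""
    n, p = len(data), len(data[0])
    means = [sum(x[j] for x in data) / n for j in range(p)]
    variances = [sum(x[j] ** 2 for x in data) / n - means[j] ** 2
                 for j in range(p)]
    return means, variances


def log_likelihood_direct(data, means, variances):
    """Sum the Gaussian log density over every sample and dimension."""
    total = 0.0
    for x in data:
        for j, (m, v) in enumerate(zip(means, variances)):
            total += log(2 * pi) + log(v) + (x[j] - m) ** 2 / v
    return -0.5 * total


def log_likelihood_closed_form(data, variances):
    """-1/2 { n p log 2pi + n sum_j log var_j + n p }."""
    n, p = len(data), len(data[0])
    return -0.5 * (n * p * log(2 * pi)
                   + n * sum(log(v) for v in variances)
                   + n * p)


# Hypothetical 2-dimensional samples.
data = [[1.0, 2.0], [2.0, 0.5], [4.0, 1.5], [3.0, 3.0]]
means, variances = ml_estimates(data)
direct = log_likelihood_direct(data, means, variances)
closed = log_likelihood_closed_form(data, variances)
print(direct, closed)
```

The two values agree only when the ML estimates are plugged in; with any other mean or variance the quadratic term no longer collapses to np.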
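For a question that splits the data $Y$ into $Y_l$ (with $n_l$ samples) and $Y_r$ (with $n_r$ samples), the log likelihood of the split is

$$\log L(Y_l \mid Q) + \log L(Y_r \mid Q) = -\frac{1}{2} \left\{ n_l \sum_{j=1}^{p} \log \sigma_{lj}^2 + n_r \sum_{j=1}^{p} \log \sigma_{rj}^2 \right\} + \text{const}$$

where the constant $-\frac{1}{2} (n_l + n_r)\, p\, (\log 2\pi + 1)$ does not depend on the question, and

$$\sigma_{lj}^2 = \frac{1}{n_l} \sum_{y \in Y_l} y_j^2 - \frac{1}{n_l^2} \Big( \sum_{y \in Y_l} y_j \Big)^2$$

$$\sigma_{rj}^2 = \frac{1}{n_r} \sum_{y \in Y_r} y_j^2 - \frac{1}{n_r^2} \Big( \sum_{y \in Y_r} y_j \Big)^2$$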
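To illustrate how the variance-based split criterion could be used to compare candidate questions, here is a minimal sketch. It scores a split by −½{n_l Σ_j log σ²_lj + n_r Σ_j log σ²_rj}, dropping the terms that do not depend on the question. The data points and the two candidate splits are hypothetical, and the sketch assumes every node has nonzero variance in every dimension.

```python
from math import log


def diag_variances(vectors):
    """Per-dimension variance: (1/n) sum y_j^2 - (1/n^2) (sum y_j)^2."""
    n, p = len(vectors), len(vectors[0])
    variances = []
    for j in range(p):
        s1 = sum(y[j] for y in vectors)
        s2 = sum(y[j] ** 2 for y in vectors)
        variances.append(s2 / n - (s1 / n) ** 2)
    return variances


def split_score(left, right):
    """-1/2 { n_l sum_j log var_lj + n_r sum_j log var_rj }; higher is better."""
    score = 0.0
    for side in (left, right):
        score += len(side) * sum(log(v) for v in diag_variances(side))
    return -0.5 * score


# Two hypothetical, well-separated clusters of 2-dimensional vectors.
a = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
b = [[5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]

good = split_score(a, b)                      # question separates the clusters
bad = split_score([a[0], b[0], a[1]],         # question mixes the clusters
                  [b[1], a[2], b[2]])
print(good, bad)
```

A question that separates the clusters leaves small variances in both children and therefore scores higher than one that mixes them.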