SLIDE 1
The Bayesian Chow-Liu Algorithm
Joe Suzuki (Osaka University)
September 19, 2012, Granada, Spain
SLIDE 2
Introduction
Chow-Liu: Tree Approximation
SLIDE 3
Introduction
Example
i        1    1    2    1    2    3
j        2    3    3    4    4    4
I(i,j)  12   10    8    6    4    2

[Figure: four stages of growing the maximum-weight spanning tree on vertices 1-4: {1,2} is connected first, then {1,3}; {2,3} is skipped since it would form a cycle; {1,4} completes the tree.]
SLIDE 4
Introduction
Chow-Liu: Tree Estimation with ML
Estimation

Not P_{1,···,N} but n examples x^n = {(x_i^{(1)}, ···, x_i^{(N)})}_{i=1}^n are available.

Ĥ_n(x^n|E): the empirical entropy w.r.t. the tree E, obtained via the relative frequencies from x^n

Ĥ_n(x^n|E) → min: connect {i, j} with the largest empirical Î(i, j), ···
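The ML rule above can be sketched in runnable form (names are illustrative, natural logarithms assumed): compute the empirical mutual information Î(i, j) for every pair of columns, then connect pairs in decreasing order of Î, skipping any pair that would close a cycle (Kruskal's maximum-spanning-tree rule).

```python
from collections import Counter
from itertools import combinations
from math import log

def empirical_mi(xs, ys):
    """Empirical mutual information (in nats) of two aligned sample vectors."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(c / n * log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def chow_liu_tree(columns):
    """columns: list of N aligned sample vectors; returns the tree edges."""
    N = len(columns)
    scores = sorted(((empirical_mi(columns[i], columns[j]), i, j)
                     for i, j in combinations(range(N), 2)), reverse=True)
    parent = list(range(N))              # union-find for cycle detection
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    edges = []
    for _, i, j in scores:               # largest empirical MI first
        ri, rj = find(i), find(j)
        if ri != rj:                     # connect only if no cycle arises
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

For example, if two columns are identical copies and a third is only weakly related, the edge between the two copies is chosen first.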
SLIDE 5
Introduction
Chow-Liu: Tree Estimation with Bayes (Suzuki, 1993)
R^n(x^n|E) := ∏_{{i,j}∈E} R^n(i,j) / {R^n(i) R^n(j)} · ∏_{i∈V} R^n(i)

α(i): how many values X^(i) takes

R^n(i) := Γ(α(i)/2) / {Γ(n + α(i)/2) Γ(1/2)^{α(i)}} · ∏_{x^(i)} Γ(c_i[x^(i)] + 1/2)

R^n(i,j) := Γ(α(i)α(j)/2) / {Γ(n + α(i)α(j)/2) Γ(1/2)^{α(i)α(j)}} · ∏_{x^(i),x^(j)} Γ(c_{i,j}[x^(i), x^(j)] + 1/2)

J(i,j) := (1/n) log [R^n(i,j) / {R^n(i) R^n(j)}]

π(E) R^n(x^n|E) → max (π: prior probability, assumed uniform): connect {i, j} with the largest J(i,j), ···
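In log space the measures above reduce to sums of log-Gamma terms, which avoids under/overflow for large n. A minimal sketch assuming the Γ-formulas on this slide (`counts` holds the counters c_i[x^(i)] over the α(i) symbol values; a pair is passed as its α(i)α(j) flattened joint counts):

```python
from math import lgamma

def log_Rn(counts):
    """log R^n for one variable (or a flattened pair): the
    Krichevsky-Trofimov mixture, via the Gamma-function formula."""
    n, alpha = sum(counts), len(counts)
    return (lgamma(alpha / 2) - lgamma(n + alpha / 2)
            + sum(lgamma(c + 0.5) - lgamma(0.5) for c in counts))

def J(pair_counts, counts_i, counts_j):
    """Bayesian mutual-information estimator
    J(i, j) = (1/n) log [R^n(i,j) / (R^n(i) R^n(j))]."""
    n = sum(counts_i)
    return (log_Rn(pair_counts) - log_Rn(counts_i) - log_Rn(counts_j)) / n
```

For example, log_Rn([1, 0]) = log(1/2): the first observation of a binary variable gets Krichevsky-Trofimov probability 1/2, and a strongly dependent pair scores a larger J than an independent one with the same marginals.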
SLIDE 6
Introduction
Chow-Liu: Tree Estimation with MDL (Suzuki, 1993)
L(x^n|E) := −log R^n(x^n|E) ≈ Ĥ_n(x^n|E) + (k(E)/2) log n

k(E): # of parameters in the tree

J(i,j) ≈ Î(i,j) − {(α(i) − 1)(α(j) − 1)/(2n)} log n

α(i): how many values X^(i) takes

L(x^n|E) → min: connect X^(i), X^(j) with the largest J(i,j), ···
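A quick numeric illustration of this approximation (a sketch; the function name is illustrative): the penalty decays like (log n)/n, so a weakly dependent pair that is rejected at small n is accepted once n is large enough.

```python
from math import log

def J_mdl(mi_hat, alpha_i, alpha_j, n):
    """MDL approximation of the Bayesian score:
    J(i,j) ~= I^(i,j) - (alpha(i)-1)(alpha(j)-1) log(n) / (2n)."""
    return mi_hat - (alpha_i - 1) * (alpha_j - 1) * log(n) / (2 * n)
```

For a binary pair with Î = 0.01, J_mdl is negative at n = 100 (penalty log(100)/200 ≈ 0.023) but positive at n = 10000 (penalty ≈ 0.0005).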
SLIDE 7
Introduction
ML vs MDL
                  ML                     MDL
selection of E    minimizes              minimizes
                  Ĥ_n(x^n|E)             Ĥ_n(x^n|E) + (k(E)/2) log n
selection of      maximizes              maximizes
{i, j}            Î(i,j)                 Î(i,j) − {(α(i)−1)(α(j)−1)/(2n)} log n
criterion         fitness of x^n to E    fitness of x^n to E and simplicity of E
SLIDE 8
What if both discrete and continuous variables are present?

All the variables are discrete ⟹ unrealistic: in any real database, some fields are discrete and others continuous.

What are the Bayesian measures R^n(i), R^n(j), R^n(i,j) and the Bayesian estimator of mutual information J(i,j) = (1/n) log [R^n(i,j) / {R^n(i) R^n(j)}] for the general case?
SLIDE 9
What if both discrete and continuous variables are present
Estimation of density functions
A_0 := {A} with A := [0, 1); A_{j+1} is a refinement of A_j:

A_1 = {[0, 1/2), [1/2, 1)}
A_2 = {[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)}
...
A_j = {[0, 2^{−j}), [2^{−j}, 2·2^{−j}), ···, [(2^j − 1)·2^{−j}, 1)}
...

Q^n_j: prediction probability w.r.t. A_j^n
s_j: A → A_j (quantization)
λ: Lebesgue measure (interval width)

g^n_j(x^n) := Q^n_j(s_j(x_1), ···, s_j(x_n)) / {λ(s_j(x_1)) ··· λ(s_j(x_n))}, x^n = (x_1, ···, x_n) ∈ A^n

∑_j ω_j = 1, ω_j > 0, g^n(x^n) := ∑_j ω_j g^n_j(x^n)
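Concretely, the quantizer s_j and the measure λ of this slide can be sketched as follows, taking A_j to contain 2^j dyadic intervals of width 2^(−j) (function names are illustrative):

```python
def s(j, x):
    """Quantizer s_j: maps x in [0, 1) to its dyadic interval in A_j."""
    k = int(x * 2 ** j)                       # index of the bin containing x
    return (k * 2.0 ** (-j), (k + 1) * 2.0 ** (-j))   # the interval [a, b)

def lam(interval):
    """Lebesgue measure lambda: the width of an interval."""
    a, b = interval
    return b - a
```

For instance, x = 0.3 falls in [0, 1/2) at level 1 and in [1/4, 1/2) at level 2, and every level-j interval has measure 2^(−j).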
SLIDE 10
What if both discrete and continuous variables are present
Ryabko 2009
f_j(x) := P(s_j(x)) / λ(s_j(x)) (density function for level j)
f^n(x^n) := f(x_1) ··· f(x_n)

Proposition. Suppose we choose {A_j} s.t. D(f‖f_j) := E[log {f(X)/f_j(X)}] → 0 as j → ∞. Then, for any f, as n → ∞, a.e.

(1/n) log {f^n(x^n)/g^n(x^n)} → 0    (1)
SLIDE 11
What if both discrete and continuous variables are present
Estimation of generalized density functions
B_0 := {B} with B := {1, 2, 3, ···}
B_1 := {{1}, {2, 3, ···}}
B_2 := {{1}, {2}, {3, 4, ···}}
...
B_k := {{1}, {2}, ···, {k}, {k + 1, k + 2, ···}}
...

Q^n_k: prediction probability w.r.t. B_k^n
t_k: B → B_k (quantization)
η({k}) = 1/k − 1/(k + 1)

g^n_k(y^n) := Q^n_k(t_k(y_1), ···, t_k(y_n)) / {η(t_k(y_1)) ··· η(t_k(y_n))}, y^n = (y_1, ···, y_n) ∈ B^n

∑_k ω_k = 1, ω_k > 0, g^n(y^n) := ∑_k ω_k g^n_k(y^n)
SLIDE 12
What if both discrete and continuous variables are present
Suzuki 2011
f(y) := (dP/dη)(y), f_k(y) := P(t_k(y)) / η(t_k(y))

Suppose that η is σ-finite and that P ≪ η.

Theorem 1 (estimation of generalized density functions). Suppose we choose {B_k} s.t. D(f‖f_k) := E[log {f(Y)/f_k(Y)}] → 0 as k → ∞. Then, for any f, as n → ∞, a.e.

(1/n) log {f^n(y^n)/g^n(y^n)} → 0    (2)
SLIDE 13
What if both discrete and continuous variables are present
(X, Y) ∈ A × B

Q^n_{jk}: prediction probability w.r.t. (A_j × B_k)^n

g^n_{jk}(x^n, y^n) := Q^n_{jk}(s_j(x_1), ···, s_j(x_n), t_k(y_1), ···, t_k(y_n)) / {λ(s_j(x_1)) ··· λ(s_j(x_n)) · η(t_k(y_1)) ··· η(t_k(y_n))}

∑_{j,k} ω_{jk} = 1, ω_{jk} > 0, g^n(x^n, y^n) := ∑_{j,k} ω_{jk} g^n_{jk}(x^n, y^n)

For any f, as n → ∞, a.e.

(1/n) log {f^n(x^n, y^n)/g^n(x^n, y^n)} → 0    (3)
SLIDE 14
What if both discrete and continuous variables are present
Estimation of Mutual Information
Given X^n = x^n and Y^n = y^n, from the strong law of large numbers:

(1/n) log [f^n(x^n, y^n) / {f^n(x^n) f^n(y^n)}] = (1/n) ∑_{i=1}^n log [f(x_i, y_i) / {f(x_i) f(y_i)}] → I(X, Y)

and (1)(2)(3), we obtain

Theorem 2. (1/n) log [g^n(x^n, y^n) / {g^n(x^n) g^n(y^n)}] → I(X, Y) a.e. as n → ∞
SLIDE 15
What if both discrete and continuous variables are present
A Generalized Version of the Chow-Liu Algorithm with Bayes/MDL

R^n(x^n|E): a measure; g^n(x^n|E): a generalized density function (contains R^n as a special case)

R^n(i), R^n(j), R^n(i,j) and J(i,j) = (1/n) log [R^n(i,j) / {R^n(i) R^n(j)}] are replaced by the generalized versions

g^n(i), g^n(j), g^n(i,j) and J(i,j) = (1/n) log [g^n(i,j) / {g^n(i) g^n(j)}]

g^n(x^n|E) → max: connect X^(i), X^(j) with the largest J(i,j), ···
SLIDE 16
What if both discrete and continuous variables are present
Computing g^n(x^n): O(nJ)

x^n = (x_1, ···, x_n)

1. g^n := 0
2. for j = 1, ···, J:
   1. c[a] := 0 for a ∈ A_j
   2. g^n_j := 1
   3. for i = 1, ···, n:
      1. a := s_j(x_i)  // quantization
      2. g^n_j := g^n_j · (c[a] + 1/2) / {(i − 1 + |A_j|/2) · λ(a)}  // KT prediction probability
      3. c[a] := c[a] + 1
   4. g^n := g^n + ω_j · g^n_j
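The procedure above in runnable form, assuming the per-sample factor is the Krichevsky-Trofimov prediction probability (c[a] + 1/2)/(i − 1 + |A_j|/2) divided by the interval width λ(a), with dyadic A_j (|A_j| = 2^j) and weights ω_j ∝ 2^(−j); these concrete choices are a sketch, not necessarily the slide's exact ones.

```python
def g_n(xs, J=8):
    """Mixture estimate g^n(x^n) for samples xs in [0, 1), in O(nJ) time."""
    g, weight_sum = 0.0, 0.0
    for j in range(1, J + 1):
        bins = 2 ** j                     # |A_j|
        c = [0] * bins                    # counts c[a] per interval
        gj = 1.0                          # g^n_j
        for i, x in enumerate(xs):        # i = number of samples seen so far
            a = int(x * bins)             # quantization a := s_j(x_i)
            # KT predictive probability of the i-th sample, over lambda(a) = 1/bins
            gj *= (c[a] + 0.5) / (i + bins / 2) * bins
            c[a] += 1
        w = 2.0 ** (-j)                   # omega_j (normalized below)
        g += w * gj
        weight_sum += w
    return g / weight_sum
```

A sample concentrated in a short interval receives a much larger value than a spread-out sample of the same size, as a density estimate should.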
SLIDE 17
What if both discrete and continuous variables are present
Computing g^n(x^n, y^n): O(nJK)

x^n = (x_1, ···, x_n), y^n = (y_1, ···, y_n)

1. g^n := 0
2. for j = 1, ···, J and k = 1, ···, K:
   1. c[a, b] := 0 for (a, b) ∈ A_j × B_k
   2. g^n_{jk} := 1
   3. for i = 1, ···, n:
      1. a := s_j(x_i); b := t_k(y_i)  // quantization
      2. g^n_{jk} := g^n_{jk} · (c[a, b] + 1/2) / {(i − 1 + |A_j||B_k|/2) · λ(a) η(b)}  // KT prediction probability
      3. c[a, b] := c[a, b] + 1
   4. g^n := g^n + ω_{jk} · g^n_{jk}
SLIDE 18
What if both discrete and continuous variables are present
Experiments (1)
{A_j}_{j=1}^J, {B_k}_{k=1}^K with J = K

J = K   n      (1/n) log {f^n(x^n, y^n)/g^n(x^n, y^n)}   time (ms)
2       100    0.0307                                    1.23
2       1000   0.0281                                    10.67
4       100    0.0049                                    3.29
4       1000   0.0021                                    28.71

The larger J and K, the smaller the error. Computation time is linear in J, K, and n.
SLIDE 19
What if both discrete and continuous variables are present
Experiments (2)
{A_j}_{j=1}^J with J = 4 and n = 1000

f = f_2
j   g^n_j(x^n)
1   0.307
2   0.981
3   0.198
4   0.097

f = f_4
j   g^n_j(x^n)
1   0.083
2   0.141
3   0.198
4   0.797

g^n(x^n) = ∑_j ω_j g^n_j(x^n) vs. max_j ω_j g^n_j(x^n)
SLIDE 20