SLIDE 1

The Bayesian Chow-Liu Algorithm

Joe Suzuki

Osaka University

September 19, 2012 Granada, Spain

SLIDE 2

Introduction

Chow-Liu: Tree Approximation (1968)

X(1), ..., X(N): N (≥ 1) discrete random variables

P_{1,...,N}(x(1), ..., x(N)): the distribution of X(1) = x(1), ..., X(N) = x(N)

Assume V := {1, ..., N} and E ⊆ {{i, j} | i ≠ j, i, j ∈ V} constitute a tree.

Q_{1,...,N}(x(1), ..., x(N) | E) = ∏_{{i,j}∈E} [ P_{i,j}(x(i), x(j)) / (P_i(x(i)) P_j(x(j))) ] · ∏_{i∈V} P_i(x(i))

D(P_{1,...,N} || Q_{1,...,N}) → min

Connect {i, j} with the largest I(i, j) if no loop is generated
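
Operationally this is a maximum-weight spanning tree over the pairwise mutual informations. Below is a minimal Python sketch, assuming the joint table P is available as an N-dimensional array; the function names (mutual_information, chow_liu_edges) are illustrative, not from the slides.

```python
import itertools
import numpy as np

def mutual_information(P, i, j):
    """I(X(i); X(j)) from an N-dimensional joint probability table P (i < j)."""
    other = tuple(k for k in range(P.ndim) if k not in (i, j))
    Pij = P.sum(axis=other)                    # joint marginal of (X(i), X(j))
    Pi = Pij.sum(axis=1, keepdims=True)        # marginal of X(i)
    Pj = Pij.sum(axis=0, keepdims=True)        # marginal of X(j)
    mask = Pij > 0
    return float((Pij[mask] * np.log(Pij[mask] / (Pi * Pj)[mask])).sum())

def chow_liu_edges(P):
    """Edges of the tree maximizing the sum of pairwise mutual informations."""
    N = P.ndim
    scored = sorted(((mutual_information(P, i, j), i, j)
                     for i, j in itertools.combinations(range(N), 2)),
                    reverse=True)
    parent = list(range(N))                    # union-find: detects loops
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    edges = []
    for _, i, j in scored:
        ri, rj = find(i), find(j)
        if ri != rj:                           # connect only if no loop is generated
            parent[ri] = rj
            edges.append((i, j))
    return edges
```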

SLIDE 3

Introduction

Example

i         1    1    2    1    2    3
j         2    3    3    4    4    4
I(i, j)  12   10    8    6    4    2

[Figure: the four nodes 1-4, with edges added one at a time in decreasing order of I(i, j); edges that would create a loop are skipped, leaving the tree {1,2}, {1,3}, {1,4}.]
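
As a quick check, the greedy selection can be traced on this table with a few lines of Python (a sketch; the variable names are illustrative): {1, 2} and {1, 3} are accepted, {2, 3} is skipped because it would create a loop, {1, 4} is accepted, and the remaining edges are skipped.

```python
scores = {(1, 2): 12, (1, 3): 10, (2, 3): 8, (1, 4): 6, (2, 4): 4, (3, 4): 2}
parent = {v: v for v in (1, 2, 3, 4)}

def find(v):                       # root of the component containing v
    while parent[v] != v:
        v = parent[v]
    return v

tree = []
for (i, j), _ in sorted(scores.items(), key=lambda e: -e[1]):
    if find(i) != find(j):         # adding {i, j} does not create a loop
        parent[find(i)] = find(j)
        tree.append((i, j))

print(tree)                        # [(1, 2), (1, 3), (1, 4)]
```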

SLIDE 4

Introduction

Chow-Liu: Tree Estimation with ML

Estimation: not P_{1,...,N} itself, but n examples xⁿ = {(x_i(1), ..., x_i(N))}_{i=1}^n are available

Ĥn(xⁿ|E): the empirical entropy w.r.t. the tree, obtained via the relative frequencies from xⁿ

Ĥn(xⁿ|E) → min

Connect {i, j} with the largest empirical Î(i, j) ...
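
The empirical score is computed directly from counts; a small sketch (the function name empirical_mi is illustrative):

```python
from collections import Counter
import math

def empirical_mi(samples, i, j):
    """Empirical mutual information Î(i, j) from rows (x(1), ..., x(N))."""
    n = len(samples)
    cij = Counter((row[i], row[j]) for row in samples)   # joint counts
    ci = Counter(row[i] for row in samples)              # marginal counts of X(i)
    cj = Counter(row[j] for row in samples)              # marginal counts of X(j)
    return sum(c / n * math.log(c * n / (ci[a] * cj[b]))
               for (a, b), c in cij.items())
```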

SLIDE 5

Introduction

Chow-Liu: Tree Estimation with Bayes (Suzuki, 1993)

Rⁿ(xⁿ|E) := ∏_{{i,j}∈E} [ Rⁿ(i, j) / (Rⁿ(i) Rⁿ(j)) ] · ∏_{i∈V} Rⁿ(i)

α(i): how many values X(i) takes

Rⁿ(i) := [ Γ(α(i)/2) / { Γ(n + α(i)/2) Γ(1/2)^{α(i)} } ] · ∏_{x(i)} Γ(c_i[x(i)] + 1/2)

Rⁿ(i, j) := [ Γ(α(i)α(j)/2) / { Γ(n + α(i)α(j)/2) Γ(1/2)^{α(i)α(j)} } ] · ∏_{x(i),x(j)} Γ(c_{i,j}[x(i), x(j)] + 1/2)

J(i, j) := (1/n) log [ Rⁿ(i, j) / (Rⁿ(i) Rⁿ(j)) ]

π(E) Rⁿ(xⁿ|E) → max   (π: prior probability over trees, assumed uniform)

Connect {i, j} with the largest J(i, j) ...
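
In log-space these scores are straightforward to evaluate with log-Gamma functions. The sketch below assumes Rⁿ is the Dirichlet(1/2, ..., 1/2) (Jeffreys / Krichevsky-Trofimov) marginal likelihood of the count tables, which is how the formulas above read; the function names are illustrative and scipy.special.gammaln supplies log Γ.

```python
import numpy as np
from scipy.special import gammaln

def log_Rn(counts):
    """log R^n for one count table (a vector for R^n(i), a matrix for R^n(i, j)):
    the Dirichlet(1/2, ..., 1/2) marginal likelihood of the observed counts."""
    counts = np.asarray(counts, dtype=float).ravel()
    alpha = counts.size            # alpha(i), or alpha(i) * alpha(j) for a pair
    n = counts.sum()
    return (gammaln(alpha / 2.0) - gammaln(n + alpha / 2.0)
            - alpha * gammaln(0.5) + gammaln(counts + 0.5).sum())

def J(counts_ij):
    """J(i, j) = (1/n) log [ R^n(i, j) / (R^n(i) R^n(j)) ] from the joint count table."""
    counts_ij = np.asarray(counts_ij, dtype=float)
    n = counts_ij.sum()
    return (log_Rn(counts_ij)
            - log_Rn(counts_ij.sum(axis=1))        # counts of X(i)
            - log_Rn(counts_ij.sum(axis=0))) / n   # counts of X(j)
```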

SLIDE 6

Introduction

Chow-Liu: Tree Estimation with MDL (Suzuki, 1993)

L(xⁿ|E) := − log Rⁿ(xⁿ|E) ≈ Ĥn(xⁿ|E) + (1/2) k(E) log n,   k(E): # of parameters in the tree

J(i, j) ≈ Î(i, j) − (1/(2n)) (α(i) − 1)(α(j) − 1) log n,   α(i): how many values X(i) takes

L(xⁿ|E) → min

Connect X(i), X(j) with the largest J(i, j) ...
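
Read as code, the edge score is the empirical mutual information minus an explicit penalty for the (α(i) − 1)(α(j) − 1) extra parameters the edge introduces; a one-function sketch (illustrative name):

```python
import math

def j_mdl(empirical_mi, alpha_i, alpha_j, n):
    """MDL-style edge score: fit minus a complexity penalty."""
    penalty = (alpha_i - 1) * (alpha_j - 1) * math.log(n) / (2 * n)
    return empirical_mi - penalty
```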

SLIDE 7

Introduction

ML vs MDL

                       ML                          MDL
selection of E         minimizes Ĥn(xⁿ|E)          minimizes Ĥn(xⁿ|E) + (1/2) k(E) log n
selection of {i, j}    maximizes Î(i, j)           maximizes Î(i, j) − (1/(2n))(α(i) − 1)(α(j) − 1) log n
criterion              fitness of xⁿ to E          fitness of xⁿ to E + simplicity of E

SLIDE 8

What if both discrete and continuous variables are present

What if discrete and continuous variables are present

Assuming that all the variables are discrete is unrealistic: in any real database, some fields are discrete and others continuous.

What are the Bayesian measures Rⁿ(i), Rⁿ(j), Rⁿ(i, j) and the Bayesian estimator of mutual information J(i, j) = (1/n) log [ Rⁿ(i, j) / (Rⁿ(i) Rⁿ(j)) ] in this general case?

SLIDE 9

What if both discrete and continuous variables are present

Estimation of density functions

A₀ := {A} with A := [0, 1);   A_{j+1} is a refinement of A_j:

A₁ = {[0, 1/2), [1/2, 1)}
A₂ = {[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)}
...
A_j = {[0, 2^{-j}), [2^{-j}, 2·2^{-j}), ..., [(2^j − 1)·2^{-j}, 1)}
...

Qⁿ_j: prediction probability w.r.t. A_jⁿ
s_j: A → A_j (quantization)
λ: Lebesgue measure (interval width)

gⁿ_j(xⁿ) := Qⁿ_j(s_j(x₁), ..., s_j(x_n)) / [ λ(s_j(x₁)) ··· λ(s_j(x_n)) ],   xⁿ = (x₁, ..., x_n) ∈ Aⁿ

Σ_j ω_j = 1, ω_j > 0,   gⁿ(xⁿ) := Σ_j ω_j gⁿ_j(xⁿ)
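
A compact sketch of one level's score gⁿ_j is given below. It assumes that Qⁿ_j is the Krichevsky-Trofimov (Jeffreys) sequential probability of the quantized sequence, an assumption since the slide only says "prediction probability"; each xᵢ is mapped to one of the 2ʲ dyadic cells and the result is divided by the product of the cell widths.

```python
import math

def log_gnj(xs, j):
    """log g^n_j(x^n) for the dyadic partition A_j of [0, 1) into 2^j cells."""
    m = 2 ** j                          # |A_j|; every cell has width 2^(-j)
    counts = [0] * m
    log_q = 0.0                         # accumulates log Q^n_j(s_j(x_1), ..., s_j(x_n))
    for i, x in enumerate(xs):          # i = number of points seen before x
        a = min(int(x * m), m - 1)      # s_j(x): index of the dyadic cell containing x
        log_q += math.log((counts[a] + 0.5) / (i + m / 2.0))   # KT predictive probability
        counts[a] += 1
    return log_q - len(xs) * math.log(1.0 / m)   # divide by prod_i lambda(s_j(x_i))
```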

SLIDE 10

What if both discrete and continuous variables are present

Ryabko 2009

f_j(x) := P(s_j(x)) / λ(s_j(x))   (density function at quantization level j)

fⁿ(xⁿ) := f(x₁) ··· f(x_n)

Proposition. Suppose we choose {A_j} such that D(f || f_j) := E[log f(X)/f_j(X)] → 0 as j → ∞. Then, for any f, as n → ∞, a.e.

(1/n) log [ fⁿ(xⁿ) / gⁿ(xⁿ) ] → 0    (1)

SLIDE 11

What if both discrete and continuous variables are present

Estimation of generalized density functions

B₀ := {B} with B := {1, 2, 3, ...}

B₁ := {{1}, {2, 3, ...}}
B₂ := {{1}, {2}, {3, 4, ...}}
...
B_k := {{1}, {2}, ..., {k}, {k+1, k+2, ...}}
...

Qⁿ_k: prediction probability w.r.t. B_kⁿ
t_k: B → B_k (quantization)
η({k}) = 1/k − 1/(k+1)

gⁿ_k(yⁿ) := Qⁿ_k(t_k(y₁), ..., t_k(y_n)) / [ η(t_k(y₁)) ··· η(t_k(y_n)) ],   yⁿ = (y₁, ..., y_n) ∈ Bⁿ

Σ_k ω_k = 1, ω_k > 0,   gⁿ(yⁿ) := Σ_k ω_k gⁿ_k(yⁿ)
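
The quantization and the reference measure η are explicit enough to write down directly; the two helpers below (illustrative names) return the index of the cell of B_k containing y and its η-mass, the tail cell {k+1, k+2, ...} having mass Σ_{m>k} (1/m − 1/(m+1)) = 1/(k+1).

```python
def t_k(y, k):
    """Index of the cell of B_k containing y: 0..k-1 for the singletons, k for the tail."""
    return y - 1 if y <= k else k

def eta_of_cell(y, k):
    """eta(t_k(y)): mass of the cell of B_k containing y."""
    if y <= k:
        return 1.0 / y - 1.0 / (y + 1)   # eta({y})
    return 1.0 / (k + 1)                 # tail cell {k+1, k+2, ...}
```

gⁿ_k(yⁿ) is then formed exactly as in the continuous case, with η(t_k(y_i)) in place of λ(s_j(x_i)).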

SLIDE 12

What if both discrete and continuous variables are present

Suzuki 2011

f(y) := (dP/dη)(y),   f_k(y) := P(t_k(y)) / η(t_k(y))

Suppose that η is σ-finite and that P ≪ η.

Theorem 1 (estimation of generalized density functions). Suppose we choose {B_k} such that D(f || f_k) := E[log f(Y)/f_k(Y)] → 0 as k → ∞. Then, for any f, as n → ∞, a.e.

(1/n) log [ fⁿ(yⁿ) / gⁿ(yⁿ) ] → 0    (2)

SLIDE 13

What if both discrete and continuous variables are present

(X, Y ) ∈ A × B

Qⁿ_{jk}: prediction probability w.r.t. (A_j × B_k)ⁿ

gⁿ_{jk}(xⁿ, yⁿ) := Qⁿ_{jk}(s_j(x₁), ..., s_j(x_n), t_k(y₁), ..., t_k(y_n)) / [ λ(s_j(x₁)) ··· λ(s_j(x_n)) · η(t_k(y₁)) ··· η(t_k(y_n)) ]

Σ_{j,k} ω_{jk} = 1, ω_{jk} > 0,   gⁿ(xⁿ, yⁿ) := Σ_{j,k} ω_{jk} gⁿ_{jk}(xⁿ, yⁿ)

For any f, as n → ∞, a.e.

(1/n) log [ fⁿ(xⁿ, yⁿ) / gⁿ(xⁿ, yⁿ) ] → 0    (3)

SLIDE 14

What if both discrete and continuous variables are present

Estimation of Mutual Information

Given Xⁿ = xⁿ and Yⁿ = yⁿ, the strong law of large numbers gives

(1/n) log [ fⁿ(xⁿ, yⁿ) / (fⁿ(xⁿ) fⁿ(yⁿ)) ] = (1/n) Σ_{i=1}^n log [ f(x_i, y_i) / (f(x_i) f(y_i)) ] → I(X, Y),

and combining this with (1), (2), (3) we obtain

Theorem 2. (1/n) log [ gⁿ(xⁿ, yⁿ) / (gⁿ(xⁿ) gⁿ(yⁿ)) ] → I(X, Y) a.e. as n → ∞
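
In code, Theorem 2 gives the mutual information estimate as a single log-ratio of mixture scores; the sketch below assumes log-domain routines log_gn_x, log_gn_y and log_gn_xy for the two marginal mixtures and the joint mixture (hypothetical names, for instance built from per-level scores like the one sketched earlier).

```python
def mi_estimate(xs, ys, log_gn_x, log_gn_y, log_gn_xy):
    """(1/n) log [ g^n(x^n, y^n) / (g^n(x^n) g^n(y^n)) ] for paired samples."""
    n = len(xs)
    return (log_gn_xy(xs, ys) - log_gn_x(xs) - log_gn_y(ys)) / n
```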

SLIDE 15

What if both discrete and continuous variables are present

A Generalized Version of the Chow-Liu Algorithm with Bayes/MDL

Rⁿ(xⁿ|E): a measure;   gⁿ(xⁿ|E): a generalized density function (contains Rⁿ as a special case)

Rⁿ(i), Rⁿ(j), Rⁿ(i, j) and J(i, j) = (1/n) log [ Rⁿ(i, j) / (Rⁿ(i) Rⁿ(j)) ]

are replaced by their generalized versions

gⁿ(i), gⁿ(j), gⁿ(i, j) and J(i, j) = (1/n) log [ gⁿ(i, j) / (gⁿ(i) gⁿ(j)) ]

gⁿ(xⁿ|E) → max

Connect X(i), X(j) with the largest J(i, j) ...

SLIDE 16

What if both discrete and continuous variables are present

Computing gⁿ(xⁿ): O(nJ)

xⁿ = (x₁, ..., x_n)

1. gⁿ := 0
2. for j = 1, ..., J
   1. c[a] := 0 for a ∈ A_j
   2. gⁿ_j := 1
   3. for i = 1, ..., n
      1. a := s_j(x_i)                                  // quantization
      2. c[a] := c[a] + 1
      3. gⁿ_j := gⁿ_j · (c[a] + 1/2) / (i + |A_j|/2) / λ(a)
   4. gⁿ := gⁿ + w_j · gⁿ_j
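
The outer mixture over levels can be written in a few lines once a per-level routine is available; the log-space sketch below assumes a callable log_level(xs, j) returning log gⁿ_j(xⁿ) (for instance the per-level sketch given after SLIDE 9) and a dict weights mapping each level j to w_j. Both names are illustrative.

```python
import math

def log_g_n(xs, weights, log_level):
    """log g^n(x^n) = log sum_j w_j g^n_j(x^n), accumulated in log-space."""
    terms = [math.log(w) + log_level(xs, j) for j, w in weights.items()]
    top = max(terms)                     # log-sum-exp avoids underflow for large n
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

For example, log_g_n(xs, {j: 1.0 / 4 for j in range(1, 5)}, log_gnj) corresponds to J = 4 with uniform weights.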

SLIDE 17

What if both discrete and continuous variables are present

Computing gⁿ(xⁿ, yⁿ): O(nJK)

xⁿ = (x₁, ..., x_n),   yⁿ = (y₁, ..., y_n)

1. gⁿ := 0
2. for j = 1, ..., J and k = 1, ..., K
   1. c[a, b] := 0 for (a, b) ∈ A_j × B_k
   2. gⁿ_{jk} := 1
   3. for i = 1, ..., n
      1. a := s_j(x_i); b := t_k(y_i)                   // quantization
      2. c[a, b] := c[a, b] + 1
      3. gⁿ_{jk} := gⁿ_{jk} · (c[a, b] + 1/2) / (i + |A_j||B_k|/2) / {λ(a) η(b)}
   4. gⁿ := gⁿ + w_{jk} · gⁿ_{jk}

SLIDE 18

What if both discrete and continuous variables are present

Experiments (1)

{A_j}_{j=1}^J, {B_k}_{k=1}^K with J = K

J = K    n       (1/n) log [ fⁿ(xⁿ, yⁿ) / gⁿ(xⁿ, yⁿ) ]    time (ms)
2        100     0.0307                                      1.23
2        1000    0.0281                                     10.67
4        100     0.0049                                      3.29
4        1000    0.0021                                     28.71

The larger J and K, the more accurate the estimate (smaller per-sample divergence); computation time grows linearly in J, K, and n.

SLIDE 19

What if both discrete and continuous variables are present

Experiments (2)

{A_j}_{j=1}^J with J = 4 and n = 1000

f = f₂:
j            1       2       3       4
gⁿ_j(xⁿ)     0.307   0.981   0.198   0.097

f = f₄:
j            1       2       3       4
gⁿ_j(xⁿ)     0.083   0.141   0.198   0.797

gⁿ(xⁿ) = Σ_j w_j gⁿ_j(xⁿ) vs. max_j w_j gⁿ_j(xⁿ)

SLIDE 20

What if both discrete and continuous variables are present

Conclusion

Reformulate (Suzuki 1993) as the Bayesian Chow-Liu algorithm

Apply Bayes measures without assuming the variables are either discrete or continuous (Suzuki 2011)

Propose a Bayesian mutual information estimator

Merits:
- easy to execute (could be R commands)
- easy to embed prior information (a merit of Bayes)
- a mixture of quantizations is more robust than selecting a single quantization

Other applications:
- Bayesian network structure estimation
- Markov order estimation (either discrete or continuous)
