
Computationally efficient probabilistic inference with noisy threshold models based on a CP tensor decomposition

Jirka Vomlel and Petr Tichavský

Institute of Information Theory and Automation (ÚTIA), Academy of Sciences of the Czech Republic

Contents

  • Motivation
  • Noisy threshold models
  • CP-decomposition of conditional probability tables
  • Experiments
  • Conclusions
Quick Medical Reference - Decision Theoretic (QMR-DT) Miller et al. (1986) and Shwe et al. (1991).

  • 570 diseases in the first level
  • 4075 observations in the second level
  • all variables are binary
  • conditional probability tables are noisy-or models

[Figure: a two-level Bayesian network with disease nodes X1, . . . , X6 and observation nodes Y1, Y2]

Definition (The inference task)

Given a subset of observations (e.g. Y1 and Y2), compute the probabilities of diseases, e.g. P(Xi | Y1 = y1, Y2 = y2) for i = 1, . . . , 6.
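The inference task above can be sketched by brute-force enumeration over disease states. The sketch below is purely illustrative: it uses a toy two-disease, two-observation network with made-up priors, noisy-or link strengths, and leak probabilities (none of these numbers come from QMR-DT, and the helper names are ours).

```python
from itertools import product

# Toy QMR-like network: 2 diseases, 2 observations (all parameters hypothetical).
prior = [0.01, 0.02]               # P(Xi = 1)
# pi[j][i] = probability that disease i, when present, activates observation Yj
pi = [[0.8, 0.3],                  # link strengths into Y1
      [0.1, 0.9]]                  # link strengths into Y2
leak = [0.05, 0.05]                # leak probability for each Yj

def p_obs(y, j, x):
    """Noisy-or CPT: P(Yj = y | diseases x)."""
    p0 = 1.0 - leak[j]
    for i, xi in enumerate(x):
        if xi:
            p0 *= 1.0 - pi[j][i]
    return p0 if y == 0 else 1.0 - p0

def posterior(i, evidence):
    """P(Xi = 1 | evidence) by summing over all joint disease states."""
    num = den = 0.0
    for x in product([0, 1], repeat=len(prior)):
        p = 1.0
        for k, xk in enumerate(x):
            p *= prior[k] if xk else 1.0 - prior[k]
        for j, yj in evidence.items():
            p *= p_obs(yj, j, x)
        den += p
        if x[i] == 1:
            num += p
    return num / den

print(posterior(0, {0: 1, 1: 0}))  # P(X1 = 1 | Y1 = 1, Y2 = 0)
```

Enumeration is exponential in the number of diseases, which is exactly why the decomposition techniques in the rest of the talk matter for the full 570-disease network.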

Noisy threshold - a generalization of noisy-or

[Figure: node Y with parents X′1, X′2, . . . , X′k, where each X′i is a noisy copy of Xi]

Y takes value 1 if at least ℓ out of the k parents take value 1:

P(Y = 1 | X′1 = x′1, . . . , X′k = x′k) = 1 if x′1 + . . . + x′k ≥ ℓ, and 0 otherwise.

Noise: for i = 1, . . . , k

P(X′i = 1 | Xi = xi) = 0 if xi = 0, and πi otherwise.
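A minimal sketch of the noisy threshold CPT, computed directly from the definition above by summing over the hidden noisy copies X′i (the function name and parameter layout are ours, for illustration):

```python
from itertools import product

def noisy_threshold(x, pi, ell):
    """P(Y = 1 | X = x) for the noisy threshold model: each Xi = 1 is passed
    through as X'i = 1 with probability pi[i] (X'i = 0 whenever Xi = 0),
    and Y = 1 iff at least ell of the X'i equal 1."""
    k = len(x)
    total = 0.0
    for xp in product([0, 1], repeat=k):   # sum over hidden noisy copies
        p = 1.0
        for i in range(k):
            if x[i] == 0:
                p *= 1.0 if xp[i] == 0 else 0.0   # absent parent never fires
            else:
                p *= pi[i] if xp[i] == 1 else 1.0 - pi[i]
        if sum(xp) >= ell:
            total += p
    return total

# With pi = 1 everywhere and ell = 1 this reduces to the deterministic OR:
print(noisy_threshold((0, 0, 0, 0), [1, 1, 1, 1], 1))  # 0.0
print(noisy_threshold((0, 1, 0, 0), [1, 1, 1, 1], 1))  # 1.0
```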
An example for k = 4, ℓ = 1, and πi = 1, i = 1, . . . , k

  • i.e., for the deterministic OR function

P(Y = 1 | X1 = x1, . . . , X4 = x4) is the 2 × 2 × 2 × 2 tensor with every entry equal to 1 except the entry at (x1, x2, x3, x4) = (0, 0, 0, 0), which is 0. Writing it as the all-ones tensor minus the indicator of the all-zeros cell gives a rank-2 decomposition:

P(Y = 1 | X1 = x1, . . . , X4 = x4) = (1, 1) ⊗ (1, 1) ⊗ (1, 1) ⊗ (1, 1) − (1, 0) ⊗ (1, 0) ⊗ (1, 0) ⊗ (1, 0) = (1, 1)^⊗k − (1, 0)^⊗k (here k = 4).

Compilation of the threshold model for ℓ = 1

  • the standard approach

Lauritzen and Spiegelhalter (1988), Jensen et al. (1990), Shafer and Shenoy (1990)

[Figure: the network X1, . . . , X4 → Y and its triangulated graph, a single clique over X1, . . . , X4, Y]

The total table size is 2^5 = 32.

Compilation of the threshold model for ℓ = 1

  • after the suggested decomposition

Díez and Galán (2002), Vomlel (2002), Savický and Vomlel (2007)

[Figure: the network X1, . . . , X4 → Y transformed by adding an auxiliary variable B between X1, . . . , X4 and Y]

The total table size is 5 · 2^2 = 20.
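The table-size accounting of the last two slides can be sketched in code. The generalization is our assumption: the standard junction tree keeps one clique over all k + 1 binary variables (size 2^(k+1)), while the decomposition keeps k + 1 small tables, each pairing one binary variable with an auxiliary variable B whose number of states equals the rank r (r = 2 for ℓ = 1, matching 5 · 2^2 = 20 at k = 4):

```python
def jt_table_size(k):
    """Standard junction tree: one clique over X1..Xk and Y, all binary."""
    return 2 ** (k + 1)

def cp_table_size(k, r):
    """After the CP decomposition: k + 1 tables, each over one binary
    variable and the rank-r auxiliary variable B (our reading of the
    accounting on the slide; r = 2 for l = 1)."""
    return (k + 1) * 2 * r

print(jt_table_size(4), cp_table_size(4, 2))   # 32 20
for k in (4, 10, 20):                          # exponential vs linear growth
    print(k, jt_table_size(k), cp_table_size(k, 2))
```

The gap is the whole point: the junction-tree size grows exponentially in k, the decomposed size only linearly.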

Decomposition of T(ℓ, k) into a sum of tensor products

  • P(Y = 1 | X = x) can be viewed as a tensor T(ℓ, k).
  • All dimensions of T(ℓ, k) are equal to 2.
  • T(ℓ, k) is symmetric.

Definition (Symmetric rank)

The symmetric rank (srank) is the minimum number r such that

T(ℓ, k) = Σ_{i=1}^{r} b_i · a_i^⊗k

where for i = 1, . . . , r:

  • b_i ∈ R and
  • a_i are real-valued vectors of length 2.
  • This decomposition is called the Canonical Polyadic (CP), CANDECOMP/PARAFAC, or tensor rank-one decomposition.

Theoretical results

Results in the proceedings:

  • srank(T(0, k)) = 1.
  • srank(T(k, k)) = 1.
  • srank(T(1, k)) = 2.
  • srank(T(k − 1, k)) = k.
  • srank(T(ℓ, k)) ≤ k for ℓ = 3, . . . , k − 2.
  • An algorithm for the CP-decomposition into k factors.
  • For the noisy threshold the above values represent upper bounds.

New results (not in the proceedings):

  • srank(T(ℓ, k)) ≤ k − 1 for ℓ = 3, . . . , k − 2.
  • An algorithm for the CP-decomposition into k − 1 factors. But we do not yet know whether complex numbers can be avoided for some values of ℓ.
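The rank-1 cases and the ℓ = 1 case can be verified directly for small k. A NumPy sketch; the explicit rank-one witnesses are our choice (T(0, k) is the all-ones tensor, hence (1,1)^⊗k; T(k, k) is the indicator of the all-ones index, hence (0,1)^⊗k; T(1, k) = (1,1)^⊗k − (1,0)^⊗k as on the earlier example slide):

```python
import numpy as np
from itertools import product

def threshold_tensor(ell, k):
    """T(ell, k): entry is 1 iff at least ell of the k binary indices equal 1."""
    T = np.zeros((2,) * k)
    for x in product([0, 1], repeat=k):
        T[x] = 1.0 if sum(x) >= ell else 0.0
    return T

def rank1(a, k):
    """The symmetric rank-one tensor a^⊗k."""
    t = np.array(a, dtype=float)
    for _ in range(k - 1):
        t = np.multiply.outer(t, np.array(a, dtype=float))
    return t

k = 6
assert np.array_equal(threshold_tensor(0, k), rank1([1, 1], k))   # srank = 1
assert np.array_equal(threshold_tensor(k, k), rank1([0, 1], k))   # srank = 1
assert np.array_equal(threshold_tensor(1, k),                     # srank <= 2
                      rank1([1, 1], k) - rank1([1, 0], k))
print("rank-1 and rank-2 cases verified for k =", k)
```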

Experimental results

Comparisons of the total table size:

  • the standard junction tree method versus the CP tensor decomposition
  • using QMR subnetworks after 14 observations and after 28 observations.

Relation to arithmetic circuits (ACs) of Darwiche et al.

  • The model after the CP decomposition can be used as an input for Ace (Chavira and Darwiche).
  • Ace supports parent divorcing for noisy-or (i.e., ℓ = 1).
  • In our PGM’08 paper we reported comparisons of the ACs’ size for random QMR-like networks:

[Figure: scatter plot of log10 AC size, original model (x-axis) versus transformed model (y-axis), both axes ranging from 3 to 9]

Conclusions

  • Theoretical results that give upper bounds on the symmetric rank of tensors corresponding to threshold functions.
  • An algorithm for the CP decomposition of these tensors.
  • The CP tensor decomposition led to computational gains of several orders of magnitude and made many intractable models manageable.

Acknowledgments

Thanks to:

  • Frank Jensen from Hugin for providing the Hugin optimal triangulation method and
  • Gregory F. Cooper from the University of Pittsburgh for the structural part of the QMR-DT model.