SLIDE 1

Fast approximate inference in hybrid Bayesian networks using dynamic discretisation

Helge Langseth¹, David Marquez², Martin Neil²

¹Dept. of Computer and Information Science, The Norwegian University of Science and Technology, Norway

²School of Electronic Engineering and Computer Science, Queen Mary, University of London, UK

IWINAC 2013, June 2013

Dynamic discretisation

1

SLIDE 2

Exact inference and continuous variables

The exact calculation procedure for BNs only supports a restricted set of “classical” distributional families:

  • Continuous variables must have Gaussian distributions.
  • Discrete variables can only have discrete parents.
  • Gaussian parents of a Gaussian child enter its mean linearly, through partial regression coefficients.

[Figure: example parent/child pairs: X1 (discrete), Y1 (discrete); X2 (discrete), Y2 ∼ N(µ_x, σ_x²); X3 (continuous), Y3 (discrete).]
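The restrictions above can be read as a per-node compatibility check. This is a minimal sketch, assuming a hypothetical `Node` record whose `kind` is `"discrete"` or `"gaussian"` (anything else stands for a non-Gaussian continuous family); it only tests the parent-type rules and simply assumes that every Gaussian node's mean is linear in its Gaussian parents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical node record: kind is "discrete" or "gaussian"."""
    name: str
    kind: str
    parents: List["Node"] = field(default_factory=list)


def supported_by_exact_inference(node: Node) -> bool:
    """Check the conditional linear Gaussian restrictions for a single node."""
    if node.kind == "discrete":
        # Discrete variables can only have discrete parents.
        return all(p.kind == "discrete" for p in node.parents)
    if node.kind == "gaussian":
        # Assumed: the mean is linear in the Gaussian parents (one set of
        # regression coefficients per configuration of the discrete parents).
        return True
    # Any other continuous family falls outside the supported set.
    return False


x2 = Node("X2", "discrete")
y2 = Node("Y2", "gaussian", [x2])
x3 = Node("X3", "gaussian")
y3 = Node("Y3", "discrete", [x3])
print(supported_by_exact_inference(y2), supported_by_exact_inference(y3))
```

The pair X3 → Y3 is rejected because a discrete child has a continuous parent, which is exactly the kind of model that motivates discretisation.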

SLIDE 3

Requirements for efficient inference in BNs

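A distribution family $\mathcal{F}$ over $\mathbf{X}$ must be closed under three operations:

  • Restriction: $f(\mathbf{x}) \in \mathcal{F} \Rightarrow f(\mathbf{y}, \mathbf{E} = \mathbf{e}) \in \mathcal{F}$ for any subset of variables $\mathbf{Y} = \{Y_1, \ldots, Y_k\} \subseteq \{X_1, \ldots, X_n\}$ and $\mathbf{E} = \mathbf{X} \setminus \mathbf{Y}$.
  • Combination: $f_1(\mathbf{x}) \in \mathcal{F},\ f_2(\mathbf{y}) \in \mathcal{F} \Rightarrow f(\mathbf{x} \cup \mathbf{y}) = f_1(\mathbf{x}) \cdot f_2(\mathbf{y}) \in \mathcal{F}$.
  • Elimination: $f(\mathbf{y}, \mathbf{z}) \in \mathcal{F} \Rightarrow \int_{\mathbf{y}} f(\mathbf{y}, \mathbf{z}) \, d\mathbf{y} \in \mathcal{F}$ for every $\mathbf{Y} \subseteq \mathbf{X}$ and $\mathbf{Z} = \mathbf{X} \setminus \mathbf{Y}$.

This is very convenient from an operational point of view, as all the operations required during the inference process can then be carried out using a single data structure with bounded complexity. Discrete variables are one example of such a family, which suggests using discretisation. By discretization, we mean translating a continuous variable $X$ into a discrete one whose labels partition $X$’s domain into hypercubes. E.g., $\mathbb{R}$ can be mapped into the states $\omega_x^{(1)} = (-\infty, 0]$, $\omega_x^{(2)} = (0, 1]$ and $\omega_x^{(3)} = (1, \infty)$. Thus, $f(x)$ is replaced by a probability distribution $P(X \in \omega_x^{(\ell)})$.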
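For a concrete picture of that replacement, the sketch below discretises a density into the three states above. The choice $X \sim \mathcal{N}(0, 1)$ is an assumption for illustration only; the slide does not fix $f$.

```python
from statistics import NormalDist
from typing import List


def discretise(dist: NormalDist, edges: List[float]) -> List[float]:
    """Replace the density of X by P(X in omega_l) for consecutive intervals.

    edges = [e0, e1, ..., eL] defines the intervals (e_{l-1}, e_l].
    """
    return [dist.cdf(b) - dist.cdf(a) for a, b in zip(edges, edges[1:])]


# The three states from the slide: (-inf, 0], (0, 1], (1, inf).
edges = [float("-inf"), 0.0, 1.0, float("inf")]
probs = discretise(NormalDist(0.0, 1.0), edges)  # assumed X ~ N(0, 1)
print([round(p, 4) for p in probs])  # [0.5, 0.3413, 0.1587]
```

After this step every factor is an ordinary probability table, so restriction, combination and elimination stay inside the discrete family.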

SLIDE 4

Common ideas for discretization

Complexity increases with the number of discretized states, so we want a representation that is both accurate and efficient. Two common choices:

  • “Equal width”: each bin has the same length.
  • “Equal mass”: each bin has the same probability mass.

[Figure: discretisations of a univariate density, plotted for x ∈ [−4, 4].]

The two behave differently, but which one is “better”?
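The two schemes can be compared in a few lines. This is a minimal sketch, assuming a standard normal density and, for equal width, a finite range `[lo, hi]` chosen by the user; equal-mass bins come from the inverse CDF, so the outermost bins extend to ±∞.

```python
from statistics import NormalDist
from typing import List


def equal_width_edges(lo: float, hi: float, k: int) -> List[float]:
    """k bins of identical length on the finite range [lo, hi]."""
    return [lo + (hi - lo) * i / k for i in range(k + 1)]


def equal_mass_edges(dist: NormalDist, k: int) -> List[float]:
    """k bins that each carry probability mass 1/k (outer bins are unbounded)."""
    inner = [dist.inv_cdf(i / k) for i in range(1, k)]
    return [float("-inf")] + inner + [float("inf")]


def bin_masses(dist: NormalDist, edges: List[float]) -> List[float]:
    """Probability mass of each interval (e_{l-1}, e_l]."""
    return [dist.cdf(b) - dist.cdf(a) for a, b in zip(edges, edges[1:])]


std = NormalDist(0.0, 1.0)
width_masses = bin_masses(std, equal_width_edges(-4.0, 4.0, 8))
mass_masses = bin_masses(std, equal_mass_edges(std, 8))
print([round(m, 4) for m in width_masses])
print([round(m, 4) for m in mass_masses])
```

Equal width spends bins on nearly empty tails (here each outer bin holds roughly 0.13% of the mass), while equal mass packs bins around the mode and leaves very wide tail bins. Neither criterion measures how well the density itself is approximated.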

SLIDE 5

Distance measure: Kullback-Leibler Divergence

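The Kullback-Leibler divergence from $f$ to $g$ is defined as
$$D(f \,\|\, g) = \int_x f(x) \log \frac{f(x)}{g(x)} \, dx.$$

With $\bar f$ a discretization of $f$ with hypercubes $\omega_\ell$ (so $\bar f$ takes the constant value $\bar f_\ell$ on $\omega_\ell$), note that
$$D(f \,\|\, \bar f) = \sum_\ell \int_{x \in \omega_\ell} f(x) \log \frac{f(x)}{\bar f_\ell} \, dx.$$

Each term can be bounded (Kozlov & Koller, 1997):
$$\int_{x \in \omega_\ell} f(x) \log \frac{f(x)}{\bar f_\ell} \, dx \le \left[ \frac{f^{\uparrow}_\ell - \bar f_\ell}{f^{\uparrow}_\ell - f^{\downarrow}_\ell} \, f^{\downarrow}_\ell \log \frac{f^{\downarrow}_\ell}{\bar f_\ell} + \frac{\bar f_\ell - f^{\downarrow}_\ell}{f^{\uparrow}_\ell - f^{\downarrow}_\ell} \, f^{\uparrow}_\ell \log \frac{f^{\uparrow}_\ell}{\bar f_\ell} \right] |\omega_\ell|,$$
where $f^{\uparrow}_\ell$ and $f^{\downarrow}_\ell$ are the maximum and minimum of $f$ on $\omega_\ell$.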
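The bound can be checked numerically on a single interval. This sketch assumes $f = \mathcal{N}(0, 1)$ and $\omega_\ell = [0, 2]$, where $f$ is decreasing, so $f^{\uparrow}_\ell$ and $f^{\downarrow}_\ell$ are simply the density at the two end points; the exact term is approximated with a midpoint rule. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def kl_term(pdf, a, b, mass, n=20_000):
    """Midpoint-rule value of the integral of f * log(f / f_bar) over [a, b]."""
    f_bar = mass / (b - a)
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        fx = pdf(a + (k + 0.5) * h)
        total += fx * math.log(fx / f_bar) * h
    return total


def kl_bound(f_down, f_up, f_bar, width):
    """Kozlov & Koller (1997) bound on the same integral."""
    if f_up == f_down:
        return 0.0  # f is constant on the interval, so the term is zero
    w_down = (f_up - f_bar) / (f_up - f_down)
    w_up = (f_bar - f_down) / (f_up - f_down)
    return (w_down * f_down * math.log(f_down / f_bar)
            + w_up * f_up * math.log(f_up / f_bar)) * width


f = NormalDist(0.0, 1.0)
a, b = 0.0, 2.0
mass = f.cdf(b) - f.cdf(a)
exact = kl_term(f.pdf, a, b, mass)
bound = kl_bound(f.pdf(b), f.pdf(a), mass / (b - a), b - a)
print(exact, bound)
```

For this interval the exact term is roughly 0.061 and the bound roughly 0.145: valid, but loose, which matters once the bound is used to decide where to split.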

SLIDE 6

A KL-based strategy

Efficient calculation of $f^{\downarrow}_\ell$ and $f^{\uparrow}_\ell$ (Neil et al., 2007).

[Figure: a univariate density plotted for x ∈ (0, 2].]

An obvious approach to discretizing a univariate distribution:

1. Roughly initialize, then calculate the KL bound for each interval $\omega_\ell$.
2. Choose the “worst” interval with respect to the KL bound, and insert a new split-point in the middle of that interval.
3. Calculate the KL bounds for the two new intervals and their neighbors. (The bounds for the other intervals are unchanged.)
4. Go to step 2.
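The four steps can be sketched for a density that can be evaluated directly. This is a hedged illustration, not Neil et al.'s implementation: it assumes a standard normal target on the finite range [−5, 5], uses a fixed number of splits as the stopping rule, and approximates $f^{\uparrow}_\ell$ and $f^{\downarrow}_\ell$ by sampling the density inside each interval instead of using the efficient scheme cited above. Because each bound here only looks at its own interval, the neighbours' bounds cannot change after a split, so only the two new intervals are recomputed.

```python
import math
from statistics import NormalDist


def interval_bound(dist, a, b, samples=33):
    """KL bound for [a, b]; f-up and f-down are approximated by sampling the pdf."""
    vals = [dist.pdf(a + (b - a) * k / (samples - 1)) for k in range(samples)]
    f_down, f_up = min(vals), max(vals)
    f_bar = (dist.cdf(b) - dist.cdf(a)) / (b - a)
    f_bar = min(max(f_bar, f_down), f_up)  # guard against sampling error
    if f_up - f_down < 1e-15 or f_bar <= 0.0:
        return 0.0
    t_down = f_down * math.log(f_down / f_bar) if f_down > 0.0 else 0.0
    t_up = f_up * math.log(f_up / f_bar)
    return ((f_up - f_bar) * t_down + (f_bar - f_down) * t_up) / (f_up - f_down) * (b - a)


def discretise_kl(dist, lo, hi, n_init=4, n_splits=20):
    # Step 1: rough equal-width initialisation, then one bound per interval.
    edges = [lo + (hi - lo) * i / n_init for i in range(n_init + 1)]
    bounds = [interval_bound(dist, a, b) for a, b in zip(edges, edges[1:])]
    for _ in range(n_splits):  # step 4: fixed budget instead of a convergence test
        # Step 2: split the interval with the largest bound at its midpoint.
        i = max(range(len(bounds)), key=bounds.__getitem__)
        mid = 0.5 * (edges[i] + edges[i + 1])
        edges.insert(i + 1, mid)
        # Step 3: fresh bounds for the two new intervals only.
        bounds[i:i + 1] = [interval_bound(dist, edges[i], mid),
                           interval_bound(dist, mid, edges[i + 2])]
    return edges, bounds


std = NormalDist(0.0, 1.0)
edges, bounds = discretise_kl(std, -5.0, 5.0)
print([round(e, 3) for e in edges])
```

Printing `edges` shows where the splits concentrate; the results slide that follows discusses how this bound-driven choice behaves.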

SLIDE 7

A KL-based strategy – Results

Results: “optimal” results (approximated through simulated annealing) versus results from the proposed strategy.

[Figure: (a) discretisation with 10 intervals; (b) discontinuity points, 24 splits; both plotted for x ∈ [−3, 3].]

The proposed method focuses too much on the steepest areas:

  • the bound is looser there;
  • the approximations of $f^{\downarrow}_\ell$ and $f^{\uparrow}_\ell$ are less accurate when $|f'|$ is large.

SLIDE 8

Discretization of a Bayes net

Discretization of a full Bayesian net is difficult because…

1. When discretizing a variable, we are also determining how we can discretize its children in the model. In the model $X \rightarrow Y$, assume $X \sim \mathrm{Uniform}(0, 1)$ and $Y \mid \{X = x\} \sim \mathcal{N}(x, \sigma^2)$. Let $X$ be discretized into the two intervals $\omega_x^{(1)} = (0, \tfrac{1}{2}]$ and $\omega_x^{(2)} = (\tfrac{1}{2}, 1]$. The conditional distribution for $Y$ in the discretized model can only be defined through $P(Y \in \omega_y^{(\cdot)} \mid X \in \omega_x^{(j)})$ for $j = 1, 2$. Therefore, it can be impossible to capture the correlation between $X$ and $Y$, in particular if $\sigma$ is small. This is the case no matter how many intervals are used to discretize $Y$.
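The loss of correlation can be checked by simulation. This is a hedged sketch with an assumed $\sigma = 0.01$, seed and sample size. “Discretised model” here means that, once the interval containing $X$ is known, $Y$ is drawn from $P(Y \mid X \in \omega_x^{(j)})$ independently of where $X$ sits inside that interval, which is all a discretised parent can express.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_models(sigma=0.01, n=100_000, seed=1):
    rng = random.Random(seed)
    true_xy, disc_xy = [], []
    for _ in range(n):
        x = rng.random()  # X ~ Uniform(0, 1)
        true_xy.append((x, rng.gauss(x, sigma)))  # continuous model
        # Discretised model: Y only "sees" which half X falls in, so it is
        # drawn via a fresh point from that half, independently of x itself.
        lo = 0.0 if x <= 0.5 else 0.5
        x_other = lo + 0.5 * rng.random()
        disc_xy.append((x, rng.gauss(x_other, sigma)))
    return pearson(*zip(*true_xy)), pearson(*zip(*disc_xy))


corr_true, corr_disc = sample_models()
print(round(corr_true, 3), round(corr_disc, 3))
```

The continuous model gives a correlation close to 1, while the two-interval model comes out close to 0.75: the ratio of the variance of the two interval means (1/16) to the variance of $X$ (1/12). Refining $Y$ does not change this, because the limitation sits in the parent.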

2. A discretization that is clever before evidence is observed can be useless afterwards. Assume $X \rightarrow Y$, $X \sim \mathcal{N}(0, 1)$ and $Y \mid \{X = x\} \sim \mathcal{N}(x, 0.1^2)$.

[Figure: (a) $f(x)$; (b) $f(x \mid y = 2)$.]
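The effect is easy to quantify because this model is conjugate: $X \mid \{Y = y\} \sim \mathcal{N}(y/1.01,\ 0.01/1.01)$, i.e. a posterior standard deviation of about 0.0995 around 1.98. The sketch below assumes a ten-bin equal-mass discretisation of the prior (an illustrative choice; the slide does not say which discretisation it has in mind) and measures how much posterior mass each prior bin receives.

```python
from statistics import NormalDist

prior = NormalDist(0.0, 1.0)  # X ~ N(0, 1)
sigma_y, y_obs = 0.1, 2.0     # Y | X = x ~ N(x, 0.1^2), evidence y = 2

# Conjugate Gaussian update for X | Y = y_obs.
post_var = 1.0 / (1.0 / prior.variance + 1.0 / sigma_y ** 2)
post = NormalDist(post_var * y_obs / sigma_y ** 2, post_var ** 0.5)

# Assumed prior-based discretisation: 10 equal-mass bins.
k = 10
edges = [float("-inf")] + [prior.inv_cdf(i / k) for i in range(1, k)] + [float("inf")]
prior_mass = [prior.cdf(b) - prior.cdf(a) for a, b in zip(edges, edges[1:])]
post_mass = [post.cdf(b) - post.cdf(a) for a, b in zip(edges, edges[1:])]
print([round(m, 3) for m in post_mass])
```

Every bin holds 10% of the prior mass, yet practically all posterior mass falls into the single unbounded top bin, so the discretised posterior carries almost no information about where $X$ lies.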

SLIDE 12

An apparently naïve approach

This apparently naïve approach to dynamic discretization was proposed by Neil et al. (2007):

1. Initialize by discretizing each continuous variable “roughly” based on its marginal. Continuous evidence nodes are discretized so that there is one interval closely around the observation.
2. Do a belief update in the discretized model.
3. For each unobserved continuous variable: if applicable, add one new split-point where it helps that marginal the most.
4. If we are not finished, go to step 2.

Mathematical property: this algorithm minimizes
$$\sum_i D\big(f(x_i) \,\|\, \bar f(x_i)\big)$$
instead of
$$D\big(f(\mathbf{x}) \,\|\, \bar f(\mathbf{x})\big) = \sum_i D\big(f(x_i \mid \mathrm{pa}(x_i)) \,\|\, \bar f(x_i \mid \mathrm{pa}(x_i))\big).$$

Stress-test: a worst-case scenario for the naïve algorithm is the model $X \rightarrow Y$, $X \sim \mathcal{N}(0, \sigma_x^2)$ and $Y \mid \{X = x\} \sim \mathcal{N}(x, \sigma_y^2)$ when $\sigma_x^2 \gg \sigma_y^2$.
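The four steps can be sketched on a two-node chain. This is a toy illustration under loudly stated assumptions, not the AgenaRisk implementation: a hypothetical model $X \sim \mathcal{N}(0, 1)$, $Y \mid \{X = x\} \sim \mathcal{N}(x, 0.5^2)$ without evidence; uniform rough grids on finite ranges in step 1; each $X$ interval represented by its midpoint when building $P(Y \in \omega_y \mid X \in \omega_x)$; $f^{\uparrow}_\ell$ and $f^{\downarrow}_\ell$ read off the continuous density implied by the current discretised parent; and a fixed number of iterations in place of the stopping rule. All names are hypothetical.

```python
import math
from statistics import NormalDist

# Hypothetical toy model (no evidence): X ~ N(0, 1), Y | X = x ~ N(x, SIGMA_Y^2).
SIGMA_Y = 0.5
PRIOR_X = NormalDist(0.0, 1.0)


def kl_bound(f_down, f_up, f_bar, width):
    """Kozlov & Koller-style bound on one interval's contribution to the KL."""
    f_bar = min(max(f_bar, f_down), f_up)  # guard against sampling error
    if f_up - f_down < 1e-15 or f_bar <= 0.0:
        return 0.0
    t_down = f_down * math.log(f_down / f_bar) if f_down > 0.0 else 0.0
    t_up = f_up * math.log(f_up / f_bar)
    return ((f_up - f_bar) * t_down + (f_bar - f_down) * t_up) / (f_up - f_down) * width


def split_worst(edges, probs, density, samples=9):
    """Step 3: add one midpoint split where the marginal's KL bound is largest."""
    scores = []
    for p, a, b in zip(probs, edges, edges[1:]):
        vals = [density(a + (b - a) * k / (samples - 1)) for k in range(samples)]
        scores.append(kl_bound(min(vals), max(vals), p / (b - a), b - a))
    i = max(range(len(scores)), key=scores.__getitem__)
    return edges[: i + 1] + [0.5 * (edges[i] + edges[i + 1])] + edges[i + 1:]


def belief_update(x_edges, y_edges):
    """Step 2: exact propagation in the discretised chain X -> Y."""
    px = [PRIOR_X.cdf(b) - PRIOR_X.cdf(a) for a, b in zip(x_edges, x_edges[1:])]
    z = sum(px)
    px = [p / z for p in px]
    # Simplification: each X interval is represented by its midpoint.
    conds = [NormalDist(0.5 * (a + b), SIGMA_Y) for a, b in zip(x_edges, x_edges[1:])]
    py = [sum(p * (c.cdf(hi) - c.cdf(lo)) for p, c in zip(px, conds))
          for lo, hi in zip(y_edges, y_edges[1:])]
    z_y = sum(py)
    py = [p / z_y for p in py]

    def y_density(y):
        # Continuous density of Y implied by the current discretised X.
        return sum(p * c.pdf(y) for p, c in zip(px, conds)) / z_y

    return px, py, y_density


def dynamic_discretisation(iterations=30):
    x_edges = [-6.0 + 2.0 * k for k in range(7)]  # step 1: rough grid on [-6, 6]
    y_edges = [-7.0 + 2.0 * k for k in range(8)]  # step 1: rough grid on [-7, 7]
    for _ in range(iterations):  # step 4: fixed budget stands in for "not finished"
        px, py, y_density = belief_update(x_edges, y_edges)
        x_edges = split_worst(x_edges, px, PRIOR_X.pdf)
        y_edges = split_worst(y_edges, py, y_density)
    px, py, _ = belief_update(x_edges, y_edges)
    return x_edges, y_edges, px, py


x_edges, y_edges, px, py = dynamic_discretisation()
```

Each split is chosen per marginal, which is the $\sum_i D(f(x_i) \,\|\, \bar f(x_i))$ criterion above rather than the joint divergence.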

SLIDE 13

Results of the stress-test

Model: $X \rightarrow Y$, $X \sim \mathcal{N}(0, 10^{10})$ and $Y \mid \{X = x\} \sim \mathcal{N}(x, 10^{-6})$. Task: calculate $f(y)$, although we know it is $\mathcal{N}(0, 10^{10} + 10^{-6})$.

Vanilla version of the algorithm:

[Figure: estimated f(y) for y ∈ [−3·10⁵, 3·10⁵]; density axis in units of 10⁻⁶.]

Unsatisfactory: the result is far too “bumpy”.

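For the vanilla version, examination of the error shows that the problem is due to numerical instability when we calculate
$$P(Y \in \omega_y \mid X \in \omega_x) \propto \int_{x \in \omega_x} \underbrace{\left( \int_{y \in \omega_y} f(y \mid x) \, dy \right)}_{\text{small support}} f(x) \, dx,$$
where the small support of the inner integral, viewed as a function of $x$, makes the outer integral difficult to evaluate.

We propose a smoothing technique based on the tempering used in MCMC. It resolves the numerical problems without a significant increase in computational burden.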
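Why the integral is hard shows up with a plain quadrature rule. The sketch below is an illustration of the difficulty only, not the tempering scheme proposed here: it takes the stress-test model and a hypothetical pair of intervals $\omega_x = [0, 10^5]$ and $\omega_y = [1000, 1010]$, and evaluates the unnormalised outer integral with a midpoint rule. With 100 nodes spaced 1000 apart, every node lies hundreds of thousands of $\sigma_y$ away from $\omega_y$, so the inner integral is exactly zero in floating point at every node. Placing the nodes where $f(y \mid x)$ has support recovers the mass, which for such a small $\sigma_y$ is essentially $P(1000 < X \le 1010)$.

```python
from statistics import NormalDist

PRIOR_X = NormalDist(0.0, 1e5)  # X ~ N(0, 10^10)
SIGMA_Y = 1e-3                  # Y | X = x ~ N(x, 10^-6)


def inner(x, c, d):
    """P(Y in [c, d] | X = x); non-negligible only within a few SIGMA_Y of [c, d]."""
    cond = NormalDist(x, SIGMA_Y)
    return cond.cdf(d) - cond.cdf(c)


def outer(a, b, c, d, n):
    """Midpoint rule for the integral over [a, b] of inner(x) * f(x) dx."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += inner(x, c, d) * PRIOR_X.pdf(x) * h
    return total


c, d = 1000.0, 1010.0                        # hypothetical omega_y
naive = outer(0.0, 1e5, c, d, n=100)         # generic grid over omega_x = [0, 1e5]
focused = outer(c - 10 * SIGMA_Y, d + 10 * SIGMA_Y, c, d, n=10_000)
reference = PRIOR_X.cdf(d) - PRIOR_X.cdf(c)  # limit as SIGMA_Y -> 0
print(naive, focused / reference)
```

In a full discretised model such mismatches occur for many interval pairs at once, which is consistent with the “bumpy” estimate on the previous plot.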

Tempering/smoothing version of the algorithm:

[Figure: estimated f(y) for y ∈ [−3·10⁵, 3·10⁵]; density axis in units of 10⁻⁶.]

Satisfactory: the result is close to the correct one.

SLIDE 16

Conclusions

  • We have considered dynamic discretization for approximate inference in hybrid Bayesian networks.
  • The algorithm is considerably faster than related techniques (e.g., Kozlov & Koller’s algorithm).
  • The method can be implemented on top of any standard BN inference scheme for discrete models.
  • We have investigated the nature of the approximations being made, tested their validity, and proposed a scheme to circumvent difficult situations.
  • The dynamic discretization algorithm is implemented in AgenaRisk. Download a free version from http://www.agenarisk.com/.
