Fast approximate inference in hybrid Bayesian networks using dynamic discretisation


  1. Fast approximate inference in hybrid Bayesian networks using dynamic discretisation. Helge Langseth¹, David Marquez², Martin Neil². ¹Dept. of Computer and Information Science, The Norwegian University of Science and Technology, Norway. ²School of Electronic Engineering and Computer Science, Queen Mary, University of London, UK. IWINAC 2013, June 2013.

  2. Exact inference and continuous variables. The BN's exact calculation procedure only supports a restricted set of "classical" distributional families:
     Continuous variables must have Gaussian distributions.
     Discrete variables may only have discrete parents.
     Gaussian parents of Gaussian variables enter linearly, as partial regression coefficients in their children's means.
     [Figure: example hybrid network with discrete nodes and continuous (Gaussian) nodes $X_1, X_2, X_3, Y_1, Y_2, Y_3$.]
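
To make the last restriction concrete, a sketch of the conditional linear Gaussian (CLG) form is given below; the regression-coefficient notation $\beta_{j,d}$ is ours, introduced for illustration, and is not taken from the slides.

```latex
% Conditional linear Gaussian (CLG) node: a continuous Y with discrete parents D
% and Gaussian parents X_1, ..., X_k. For each configuration d of D, the Gaussian
% parents enter linearly, as partial regression coefficients beta_{j,d}:
Y \mid \{ D = d,\; X_1 = x_1, \dots, X_k = x_k \}
  \;\sim\; \mathcal{N}\!\left( \beta_{0,d} + \sum_{j=1}^{k} \beta_{j,d}\, x_j ,\; \sigma_d^2 \right)
```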

  3. Requirements for efficient inference in BNs. A distribution family $\mathcal{F}$ over $\mathbf{X}$ must be closed under three operations:
     Restriction: $f(\mathbf{x}) \in \mathcal{F} \Rightarrow f(\mathbf{y}, \mathbf{E} = \mathbf{e}) \in \mathcal{F}$ for any subset of variables $\{Y_1, \dots, Y_k\} \subseteq \{X_1, \dots, X_n\}$ and $\mathbf{E} = \mathbf{X} \setminus \mathbf{Y}$.
     Combination: $f_1(\mathbf{x}) \in \mathcal{F},\, f_2(\mathbf{y}) \in \mathcal{F} \Rightarrow f(\mathbf{x} \cup \mathbf{y}) = f_1(\mathbf{x}) \cdot f_2(\mathbf{y}) \in \mathcal{F}$.
     Elimination: $f(\mathbf{y}, \mathbf{z}) \in \mathcal{F} \Rightarrow \int_{\mathbf{y}} f(\mathbf{y}, \mathbf{z})\, d\mathbf{y} \in \mathcal{F}$ for every $\mathbf{Y} \subseteq \mathbf{X}$ and $\mathbf{Z} = \mathbf{X} \setminus \mathbf{Y}$.
     This is very convenient from an operational point of view, as all the operations required during the inference process can be carried out using a single data structure with bounded complexity. One example is the family of discrete variables, which suggests the idea of discretisation. By discretisation we mean translating a continuous variable $X$ into a discrete one, with labels that partition $X$'s domain into hypercubes. E.g., $\mathbb{R}$ maps into the states $\omega_x^{(1)} = (-\infty, 0]$, $\omega_x^{(2)} = (0, 1]$, $\omega_x^{(3)} = (1, \infty)$. Thus, $f(x)$ is replaced by a probability distribution $P\bigl(X \in \omega_x^{(\ell)}\bigr)$.
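
As a concrete illustration of replacing a density $f(x)$ by interval probabilities, the following Python sketch discretises a standard normal over the example partition $(-\infty, 0]$, $(0, 1]$, $(1, \infty)$; the function name is ours, chosen for illustration.

```python
from scipy.stats import norm

def discretise(cdf, cut_points):
    """Turn a continuous distribution (given by its CDF) into P(X in omega_l)
    for the intervals defined by the cut points, including the two
    unbounded end intervals."""
    edges = [float("-inf")] + list(cut_points) + [float("inf")]
    return [cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])]

# Example from the slide: R partitioned into (-inf, 0], (0, 1], (1, inf)
probs = discretise(norm.cdf, [0.0, 1.0])
print(probs)  # [0.5, 0.3413..., 0.1586...]
```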

  4. Common ideas for discretization. Complexity increases with the number of discretized states, so we want an accurate yet efficient representation.
     "Equal width": each bin has the same length.
     "Equal mass": each bin has the same probability mass.
     [Plot: a density on (-4, 4) discretised under the two schemes.]
     Different behavior, but which one is "better"?
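
A minimal sketch (ours, for illustration) of the two schemes for a standard normal restricted to [-4, 4]:

```python
import numpy as np
from scipy.stats import norm

def equal_width_edges(lo, hi, k):
    """Equal width: every bin has the same length."""
    return np.linspace(lo, hi, k + 1)

def equal_mass_edges(dist, lo, hi, k):
    """Equal mass: every bin carries the same probability mass
    (of the mass that falls inside [lo, hi])."""
    p_lo, p_hi = dist.cdf(lo), dist.cdf(hi)
    return dist.ppf(np.linspace(p_lo, p_hi, k + 1))

print(equal_width_edges(-4, 4, 8))
print(equal_mass_edges(norm, -4, 4, 8))
```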

  5. Distance measure: Kullback-Leibler divergence. The Kullback-Leibler divergence from $f$ to $g$ is defined as
     $$ D(f \,\|\, g) = \int_x f(x) \log\frac{f(x)}{g(x)}\, dx . $$
     With $\bar{f}$ a discretization of $f$ with hypercubes $\omega_\ell$, note that
     $$ D\bigl(f \,\|\, \bar{f}\bigr) = \sum_\ell \int_{x \in \omega_\ell} f(x) \log\frac{f(x)}{\bar{f}_\ell}\, dx . $$
     Each term can be bounded (Kozlov & Koller, 1997):
     $$ \int_{x \in \omega_\ell} f(x) \log\frac{f(x)}{\bar{f}_\ell}\, dx \;\le\; \left[ \frac{\bar{f}_\ell - f^{\downarrow}_\ell}{f^{\uparrow}_\ell - f^{\downarrow}_\ell}\, f^{\uparrow}_\ell \log\frac{f^{\uparrow}_\ell}{\bar{f}_\ell} + \frac{f^{\uparrow}_\ell - \bar{f}_\ell}{f^{\uparrow}_\ell - f^{\downarrow}_\ell}\, f^{\downarrow}_\ell \log\frac{f^{\downarrow}_\ell}{\bar{f}_\ell} \right] |\omega_\ell| . $$
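
The per-interval bound only needs $\bar{f}_\ell$, $f^{\downarrow}_\ell$, $f^{\uparrow}_\ell$ and $|\omega_\ell|$. A small Python sketch under our own naming, assuming $\bar{f}_\ell$ is the mean density on the interval and $f^{\downarrow}_\ell \le f^{\uparrow}_\ell$:

```python
import math

def kl_bound(f_bar, f_lo, f_hi, width):
    """Upper bound on the KL contribution of one interval (Kozlov & Koller, 1997),
    computed from the mean density f_bar, the minimum f_lo and maximum f_hi of f
    on the interval, and the interval length |omega_l|."""
    if f_hi <= f_lo:  # (near-)constant density on the interval: no approximation error
        return 0.0
    w_hi = (f_bar - f_lo) / (f_hi - f_lo)
    w_lo = (f_hi - f_bar) / (f_hi - f_lo)
    term_hi = f_hi * math.log(f_hi / f_bar) if f_hi > 0 else 0.0
    term_lo = f_lo * math.log(f_lo / f_bar) if f_lo > 0 else 0.0
    return width * (w_hi * term_hi + w_lo * term_lo)
```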

  6. A KL-based strategy. Efficient calculation of $f^{\downarrow}_\ell$ and $f^{\uparrow}_\ell$ (Neil et al., 2007). [Plot illustrating the calculation of $f^{\downarrow}_\ell$ and $f^{\uparrow}_\ell$ on (0, 2).] The obvious approach to discretize a univariate density:
     1. Roughly initialize, then calculate the KL bound for each interval $\omega_\ell$.
     2. Choose the "worst" interval w.r.t. the KL bound, and insert a new split-point in the middle of that interval.
     3. Calculate KL bounds for the two new intervals and their neighbors. (The bounds for the other intervals are unchanged.)
     4. Go to 2.
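
A sketch of the greedy refinement loop in Python, reusing the `kl_bound` helper from the previous sketch. The bookkeeping is our simplification: it works on a fixed bounded support, estimates $f^{\downarrow}_\ell$ and $f^{\uparrow}_\ell$ from a crude grid, and recomputes all bounds each round rather than only those of the new intervals and their neighbors.

```python
import numpy as np
from scipy.stats import norm
# kl_bound(f_bar, f_lo, f_hi, width) is the helper sketched on the previous slide.

def refine(pdf, cdf, edges, n_splits):
    """Greedy KL-based discretisation of a univariate density on a fixed support:
    repeatedly split the interval with the largest KL bound at its midpoint."""
    edges = list(edges)
    for _ in range(n_splits):
        bounds = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            xs = np.linspace(lo, hi, 50)                    # crude f_lo / f_hi estimates
            f_lo, f_hi = pdf(xs).min(), pdf(xs).max()
            f_bar = (cdf(hi) - cdf(lo)) / (hi - lo)         # mean density on the interval
            f_lo, f_hi = min(f_lo, f_bar), max(f_hi, f_bar) # keep f_bar inside [f_lo, f_hi]
            bounds.append(kl_bound(f_bar, f_lo, f_hi, hi - lo))
        worst = int(np.argmax(bounds))                      # the "worst" interval
        edges.insert(worst + 1, 0.5 * (edges[worst] + edges[worst + 1]))
    return edges

edges = refine(norm.pdf, norm.cdf, np.linspace(-3, 3, 5), n_splits=10)
print(edges)
```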

  7. A KL-based strategy – Results. Comparison of "optimal" results (approximated through simulated annealing) with results from the proposed strategy. [Plots: (a) Discretisation with 10 intervals; (b) Discontinuity points, 24 splits.] The proposed method focuses too much on the steepest areas: the bound is looser there, and the approximations of $f^{\downarrow}_\ell$ and $f^{\uparrow}_\ell$ are less accurate when $|f'|$ is large.

  8. Discretization of a Bayes net. Discretization of a full Bayesian net is difficult because...
     1. When discretizing a variable, we also determine how we can discretize its children in the model: in the model $X \to Y$, assume $X$ is Uniform$(0, 1)$ and $Y \mid \{X = x\} \sim N(x, \sigma^2)$. Let $X$ be discretized into the two intervals $\omega_x^{(1)} = (0, \tfrac{1}{2}]$ and $\omega_x^{(2)} = (\tfrac{1}{2}, 1]$. The conditional distribution for $Y$ in the discretized model can only be defined through $P\bigl(Y \in \omega_y^{(\cdot)} \mid X \in \omega_x^{(j)}\bigr)$ for $j = 1, 2$. Therefore, it can be impossible to capture the correlation between $X$ and $Y$, in particular if $\sigma$ is small. This is the case no matter how many intervals are used to discretize $Y$.
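
A small numerical check of this point (our own illustration, not from the slides): with $X$ discretized into $(0, \tfrac{1}{2}]$ and $(\tfrac{1}{2}, 1]$, the discretized conditional $P(Y \in \omega_y \mid X \in \omega_x^{(j)})$ smears $Y$ over the whole half-interval, no matter how small $\sigma$ is.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

sigma = 0.01  # small conditional noise: Y | X=x ~ N(x, sigma^2)

def p_y_in(a, b, x_lo, x_hi):
    """P(Y in (a, b] | X in (x_lo, x_hi]) with X ~ Uniform(0, 1):
    average the conditional interval probability over the X interval."""
    integrand = lambda x: norm.cdf(b, loc=x, scale=sigma) - norm.cdf(a, loc=x, scale=sigma)
    val, _ = quad(integrand, x_lo, x_hi)
    return val / (x_hi - x_lo)

# Given only X in (0, 1/2], Y is spread almost uniformly over (0, 1/2],
# even though Y tracks X very closely in the continuous model:
for a in np.arange(0.0, 0.5, 0.1):
    print(f"P(Y in ({a:.1f}, {a + 0.1:.1f}] | X in (0, 0.5]) = {p_y_in(a, a + 0.1, 0.0, 0.5):.3f}")
```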

  9. Discretization of a Bayes net. Discretization of a full Bayesian net is difficult because...
     1. When discretizing a variable, we also determine how we can discretize its children in the model.
     2. A discretization that is clever before evidence is observed can be useless afterwards: assume $X \to Y$, $X \sim N(0, 1)$ and $Y \mid \{X = x\} \sim N(x, 0.1^2)$. [Plots: (a) $f(x)$; (b) $f(x \mid y = 2)$.]

  10. An apparently naïve approach. This apparently naïve approach to dynamic discretization was proposed by Neil et al. (2007):
     1. Initialize by discretizing each continuous variable "roughly" based on its marginal. Continuous evidence nodes are discretized so that there is one interval closely around the observation.
     2. Do a belief update in the discretized model.
     3. For each unobserved continuous variable: if applicable, add one new split-point where it helps that marginal the most.
     4. If we are not finished: go to step 2.

  11. An apparently naïve approach (continued). Mathematical property: this algorithm minimizes $\sum_i D\bigl(f(x_i) \,\|\, \bar{f}(x_i)\bigr)$ instead of
     $$ D\bigl(f(\mathbf{x}) \,\|\, \bar{f}(\mathbf{x})\bigr) = \sum_i D\bigl(f(x_i \mid \mathrm{pa}(x_i)) \,\|\, \bar{f}(x_i \mid \mathrm{pa}(x_i))\bigr) . $$

  12. An apparently naïve approach (continued). Stress-test: a worst-case scenario for the naïve algorithm is the model $X \to Y$, $X \sim N(0, \sigma_x^2)$ and $Y \mid \{X = x\} \sim N(x, \sigma_y^2)$, when $\sigma_x^2 \gg \sigma_y^2$.
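
A high-level Python sketch of the loop described on slides 10 to 12. The `net` object with its `initial_intervals`, `propagate` and `split_worst_interval` methods is a hypothetical placeholder of ours, standing in for an actual junction-tree update and the marginal-based split rule; it is not part of the original slides.

```python
def dynamic_discretisation(net, evidence, n_iterations):
    """Naive dynamic discretisation (after Neil et al., 2007): alternate belief
    updates in the discretised model with greedy refinement of each unobserved
    continuous variable's discretisation."""
    # 1. Rough initial discretisation based on each variable's marginal;
    #    continuous evidence gets one narrow interval around the observation.
    disc = {v: net.initial_intervals(v, evidence.get(v)) for v in net.continuous_variables}

    for _ in range(n_iterations):
        # 2. Belief update in the discretised model (e.g. junction tree).
        marginals = net.propagate(disc, evidence)
        # 3. For each unobserved continuous variable, add one split-point
        #    where it helps that variable's marginal the most.
        for v in net.continuous_variables:
            if v not in evidence:
                disc[v] = net.split_worst_interval(disc[v], marginals[v])
        # 4. Repeat until finished (here: a fixed number of iterations).

    return net.propagate(disc, evidence)  # final update on the refined discretisation
```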

  13. Results of the stress-test. Model: $X \to Y$, $X \sim N(0, 10^{10})$ and $Y \mid \{X = x\} \sim N(x, 10^{-6})$. Task: calculate $f(y)$ (although we know it is $N(0, 10^{10} + 10^{-6})$). Vanilla version of the algorithm: [Plot: the estimated $f(y)$.] Unsatisfactory: the result is way too "bumpy".

  14. Results of the stress-test. Model: $X \to Y$, $X \sim N(0, 10^{10})$ and $Y \mid \{X = x\} \sim N(x, 10^{-6})$. Task: calculate $f(y)$ (although we know it is $N(0, 10^{10} + 10^{-6})$). Vanilla version of the algorithm: examination of the error shows the problem is due to numerical instability when we calculate
     $$ P(Y \in \omega_y \mid X \in \omega_x) \;\propto\; \int_{x \in \omega_x} \Bigl( \int_{y \in \omega_y} f(y \mid x)\, dy \Bigr) f(x)\, dx , $$
     where $f(y \mid x)$ has small support, making the inner integral the difficult part. We propose a smoothing technique based on the tempering used in MCMC. It resolves the numerical problems without a significant increase in computational burden.
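
The instability can be reproduced with a direct quadrature of the expression above (our own illustration, not the authors' implementation): the inner factor is essentially an indicator of the narrow $Y$ interval, which is vanishingly small relative to the width of $\omega_x$ (around $10^5$), so a naive fixed-grid outer quadrature easily misses it entirely.

```python
import numpy as np
from scipy.stats import norm

sigma_x, sigma_y = 1e5, 1e-3   # X ~ N(0, 1e10), Y | X=x ~ N(x, 1e-6)

def p_y_given_x_interval(y_lo, y_hi, x_lo, x_hi, points=50):
    """Unnormalised P(Y in (y_lo, y_hi] | X in (x_lo, x_hi]) by naive fixed-grid
    quadrature of the slide's double integral:
    integrand(x) = [Phi((y_hi - x)/sigma_y) - Phi((y_lo - x)/sigma_y)] * f_X(x)."""
    xs = np.linspace(x_lo, x_hi, points)
    inner = norm.cdf(y_hi, loc=xs, scale=sigma_y) - norm.cdf(y_lo, loc=xs, scale=sigma_y)
    return np.mean(inner * norm.pdf(xs, scale=sigma_x)) * (x_hi - x_lo)

# The inner factor is (numerically) nonzero only for x inside the narrow Y interval,
# which is tiny compared to the width of the X interval (~1e5), so the fixed grid
# misses it and the estimate collapses to ~0 instead of the true value of ~4e-6:
print(p_y_given_x_interval(0.0, 1.0, -5e4, 5e4))
```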

  15. Results of the stress-test. Model: $X \to Y$, $X \sim N(0, 10^{10})$ and $Y \mid \{X = x\} \sim N(x, 10^{-6})$. Task: calculate $f(y)$ (although we know it is $N(0, 10^{10} + 10^{-6})$). Tempering/smoothing version of the algorithm: [Plot: the estimated $f(y)$.] Satisfactory: the results are close to the correct result.
