

  1. Approximate inference: Sampling methods
  Probabilistic Graphical Models, Sharif University of Technology, Spring 2018, Soleymani

  2. Approximate inference
  - Approximate inference techniques
    - Deterministic approximation
      - Variational algorithms
    - Stochastic simulation / sampling methods

  3. Sampling-based estimation
  - Assume that $\mathcal{D} = \{x^{(1)}, \ldots, x^{(N)}\}$ is a set of i.i.d. samples drawn from the desired distribution $p$.
  - For any distribution $p$ and function $f$, we can estimate $E_p[f]$ by the empirical expectation:
    $E_p[f] \approx \frac{1}{N} \sum_{n=1}^{N} f(x^{(n)})$
  - Expectations reveal interesting properties of the distribution $p$
    - Means and variances of $p$
    - Probabilities of events
    - E.g., we can find $p(x = k)$ by estimating $E_p[f]$ with the indicator function $f(x) = \mathbb{I}[x = k]$
  - We can thus use a stochastic (sample-based) representation of a complex distribution
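To make the empirical expectation concrete, here is a minimal sketch (not from the slides; the target distribution, test function, and sample size are illustrative choices) that estimates an expectation and an event probability from i.i.d. samples:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# i.i.d. samples x^(1), ..., x^(N) from p; here p = N(0, 1) purely as an example.
x = rng.normal(loc=0.0, scale=1.0, size=N)

# Empirical expectation: E_p[f] ~ (1/N) * sum_n f(x^(n)), with f(x) = x^2.
f = lambda t: t ** 2
print("E_p[x^2] estimate:", f(x).mean())       # true value: 1.0

# Probability of an event via an indicator function: p(x > 1) = E_p[ I(x > 1) ].
print("p(x > 1) estimate:", (x > 1.0).mean())  # true value: about 0.159
```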

  4. Bounds on error
  - Hoeffding bound (additive bound on the error):
    $P_{\mathcal{D}}\left(\hat{p} \notin [p - \epsilon,\ p + \epsilon]\right) \le 2 e^{-2N\epsilon^2}$
    where $\hat{p} = \frac{1}{N} \sum_{n=1}^{N} f(\boldsymbol{x}^{(n)})$ with $\boldsymbol{x}^{(n)} \sim p(\boldsymbol{x})$, and $p = E_{p(\boldsymbol{x})}[f(\boldsymbol{x})]$
  - Chernoff bound (multiplicative bound on the error):
    $P_{\mathcal{D}}\left(\hat{p} \notin [p(1 - \epsilon),\ p(1 + \epsilon)]\right) \le 2 e^{-N p \epsilon^2 / 3}$
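As a quick worked example (my own numbers, not from the slide), the additive Hoeffding bound can be inverted to tell how many samples guarantee a given accuracy: requiring $2e^{-2N\epsilon^2} \le \delta$ gives $N \ge \ln(2/\delta) / (2\epsilon^2)$.

```python
import math

# Illustrative targets: additive error at most eps, failure probability at most delta.
eps, delta = 0.01, 0.05

# Solve 2*exp(-2*N*eps^2) <= delta for N.
N = math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
print(N)  # 18445 samples suffice for a +/-0.01 error with 95% confidence
```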

  5. The mean and variance of the estimator
  - For samples drawn independently from the distribution $p$:
    $\hat{f} = \frac{1}{N} \sum_{n=1}^{N} f(x^{(n)})$
    $E[\hat{f}] = E[f]$
    $\mathrm{var}[\hat{f}] = \frac{1}{N} E\left[(f - E[f])^2\right]$
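A small empirical check of these two facts (an illustrative setup, not from the slide): repeating the estimator many times shows its mean staying at $E[f]$ while its variance shrinks roughly as $1/N$.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda t: t ** 2   # example integrand; E[f] = 1 under a standard normal

for N in (10, 100, 1000):
    # 2000 independent estimates f_hat, each averaging N samples.
    estimates = f(rng.normal(size=(2000, N))).mean(axis=1)
    # The mean stays near E[f] = 1; the variance drops by about a factor of 10 per row.
    print(N, round(estimates.mean(), 3), round(estimates.var(), 5))
```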

  6. Monte Carlo methods
  - Use a set of samples to answer an inference query
    - Expectations can be approximated using sample-based averages
  - Asymptotically exact and easy to apply to arbitrary problems
  - Challenges:
    - Drawing samples from many distributions is not trivial
    - Are the gathered samples enough?
    - Are all samples useful, or equally useful?

  7. Generating samples from a distribution
  - Assume that we have an algorithm that generates (pseudo-)random numbers distributed uniformly over (0,1)
  - How do we generate samples from other distributions? First, we consider simple cases:
    - Bernoulli
    - Multinomial
    - Other standard distributions
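A minimal sketch of the two simple cases, assuming only a uniform(0,1) generator is available (the parameter values are illustrative):

```python
import random

def sample_bernoulli(p):
    # Returns 1 with probability p, else 0.
    return 1 if random.random() < p else 0

def sample_multinomial(probs):
    # Partition (0,1) into intervals of lengths probs[k] and report which
    # interval the uniform draw falls into.
    u, cumulative = random.random(), 0.0
    for k, p_k in enumerate(probs):
        cumulative += p_k
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against floating-point round-off

print(sample_bernoulli(0.3))
print(sample_multinomial([0.2, 0.5, 0.3]))
```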

  8. Transformation technique
  - We intend to generate samples from standard distributions by mapping the values produced by a uniform random number generator so that the mapped samples have the desired distribution.
  - Choose a function $f(\cdot)$ such that the resulting values $y = f(x)$ have some specific desired distribution $p(y)$:
    $p(y) = p(x) \left| \frac{dx}{dy} \right|$
  - Since $p(x) = 1$ on $(0,1)$, we have $x = \int_{-\infty}^{y} p(y')\, dy'$
  - If we define $h(y) \equiv \int_{-\infty}^{y} p(y')\, dy'$, then $y = h^{-1}(x)$

  9. Transformation technique
  - CDF sampling: if $x \sim U(0,1)$ and $h(\cdot)$ is the CDF of $p$, then $h^{-1}(x) \sim p$.
  - Since we need to calculate and then invert the indefinite integral of $p$, this is only feasible for a limited number of simple distributions.
  - Thus, we next look at rejection sampling and importance sampling (in the following slides), which can be used as important components in more general sampling techniques.
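For instance (an illustrative case, not from the slide), the exponential distribution is one of the few where the CDF $h(y) = 1 - e^{-\lambda y}$ inverts in closed form, giving a direct transformation-technique sampler:

```python
import math
import random

def sample_exponential(rate):
    x = random.random()                # x ~ U(0, 1)
    return -math.log(1.0 - x) / rate   # y = h^{-1}(x) for h(y) = 1 - exp(-rate*y)

samples = [sample_exponential(rate=2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))     # should be close to 1/rate = 0.5
```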

  10. Rejection sampling
  - Suppose we wish to sample from $p(\boldsymbol{x}) = \tilde{p}(\boldsymbol{x})/Z$
    - $p(\boldsymbol{x})$ is difficult to sample from, but $\tilde{p}(\boldsymbol{x})$ is easy to evaluate
  - We choose a simpler (proposal) distribution $q(\boldsymbol{x})$ that we can sample from more easily, where $\exists k:\ k q(\boldsymbol{x}) \ge \tilde{p}(\boldsymbol{x})$ for all $\boldsymbol{x}$
  - Sample from $q(\boldsymbol{x})$: $\boldsymbol{x}^* \sim q(\boldsymbol{x})$
  - Accept $\boldsymbol{x}^*$ with probability $\frac{\tilde{p}(\boldsymbol{x}^*)}{k q(\boldsymbol{x}^*)}$
  [Figure: the envelope $k q(x)$ lying above $\tilde{p}(x)$, with a proposed sample $x^*$]
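A minimal rejection-sampling sketch under illustrative assumptions: the unnormalized target $\tilde{p}(x) = x(1-x)$ on $[0,1]$ (a Beta(2,2) up to $Z = 1/6$), a uniform proposal $q$, and $k = 0.25$, which satisfies $k q(x) \ge \tilde{p}(x)$ everywhere.

```python
import random

def p_tilde(x):
    # Unnormalized target density on [0, 1].
    return x * (1.0 - x)

def rejection_sample(k=0.25):
    while True:
        x_star = random.random()                   # x* ~ q(x) = Uniform(0, 1)
        if random.random() < p_tilde(x_star) / k:  # accept w.p. p_tilde(x*) / (k q(x*))
            return x_star

samples = [rejection_sample() for _ in range(50_000)]
print(sum(samples) / len(samples))  # Beta(2,2) has mean 0.5
```

In this toy setup the acceptance probability is $Z/k = (1/6)/0.25 \approx 0.67$, consistent with the expression derived on the next slide.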

  11. Rejection sampling
  - Correctness: the density of accepted samples is
    $\frac{q(\boldsymbol{x})\, \frac{\tilde{p}(\boldsymbol{x})}{k q(\boldsymbol{x})}}{\int q(\boldsymbol{x})\, \frac{\tilde{p}(\boldsymbol{x})}{k q(\boldsymbol{x})}\, d\boldsymbol{x}} = \frac{\tilde{p}(\boldsymbol{x})}{\int \tilde{p}(\boldsymbol{x})\, d\boldsymbol{x}} = p(\boldsymbol{x})$
  - Probability of acceptance:
    $p(\mathrm{accept}) = \int \frac{\tilde{p}(\boldsymbol{x})}{k q(\boldsymbol{x})}\, q(\boldsymbol{x})\, d\boldsymbol{x} = \frac{\int \tilde{p}(\boldsymbol{x})\, d\boldsymbol{x}}{k} = \frac{Z}{k}$

  12. Adaptive rejection sampling
  - It is often difficult to determine a suitable analytic form for $q$
  - We can use envelope functions to define $q$ when $p(x)$ is log concave
    - Intersections of tangent lines are used to construct $q$
    - Initially, gradients are evaluated at an initial set of grid points, and the corresponding tangent lines are found.
    - In each iteration, a sample is drawn from the envelope distribution.
      - The envelope distribution is a piecewise exponential distribution, so drawing a sample from it is straightforward.
    - If the sample is rejected, it is incorporated into the set of grid points, a new tangent line is computed, and $q$ is thereby refined.
  [Figure: piecewise-linear upper envelope of $\ln p(x)$ built from tangents at grid points $x_1$, $x_2$, $x_3$]

  13. High-dimensional rejection sampling
  - Problem: rejection sampling has a low acceptance rate in high-dimensional spaces
    - The acceptance rate decreases exponentially with dimensionality
  - Example:
    - Using $q = \mathcal{N}(\boldsymbol{\mu}, \sigma_q^2 \boldsymbol{I})$ to sample from $p = \mathcal{N}(\boldsymbol{\mu}, \sigma_p^2 \boldsymbol{I})$
    - If $\sigma_q$ exceeds $\sigma_p$ by 1% and $D = 1000$, then $k = (\sigma_q / \sigma_p)^D \approx 20{,}000$, so the optimal acceptance rate is about $1/20{,}000$, which is far too small
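The slide's arithmetic, reproduced numerically (a tiny sketch; the 1% ratio and $D = 1000$ come from the slide):

```python
# With q = N(mu, sigma_q^2 I) proposing for p = N(mu, sigma_p^2 I), the smallest
# valid constant is k = (sigma_q / sigma_p)^D, so the best acceptance rate is 1/k.
ratio, D = 1.01, 1000
k = ratio ** D
print(round(k), 1.0 / k)  # k is roughly 21,000, i.e. about one acceptance in 20,000
```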

  14. Importance sampling
  - Suppose sampling from $p$ is hard, so a simpler proposal distribution $q$ is used instead.
  - If $q$ dominates $p$ (i.e., $q(\boldsymbol{x}) > 0$ whenever $p(\boldsymbol{x}) > 0$), we can sample from $q$ and reweight the obtained samples:
    $E_p[f(\boldsymbol{x})] = \int f(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x} = \int f(\boldsymbol{x})\, \frac{p(\boldsymbol{x})}{q(\boldsymbol{x})}\, q(\boldsymbol{x})\, d\boldsymbol{x}$
    $E_p[f(\boldsymbol{x})] \approx \frac{1}{N} \sum_{n=1}^{N} f(\boldsymbol{x}^{(n)})\, \frac{p(\boldsymbol{x}^{(n)})}{q(\boldsymbol{x}^{(n)})} = \frac{1}{N} \sum_{n=1}^{N} f(\boldsymbol{x}^{(n)})\, w^{(n)}, \quad \boldsymbol{x}^{(n)} \sim q(\boldsymbol{x}), \quad w^{(n)} = \frac{p(\boldsymbol{x}^{(n)})}{q(\boldsymbol{x}^{(n)})}$
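A minimal sketch of importance sampling for the case where $p$ can be evaluated exactly; the target $\mathcal{N}(0,1)$, the proposal $\mathcal{N}(0,2^2)$, and $f(x) = x^2$ are illustrative choices, not from the slides:

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(2)
N = 100_000

x = rng.normal(loc=0.0, scale=2.0, size=N)             # x^(n) ~ q = N(0, 2^2)
w = normal_pdf(x, 0.0, 1.0) / normal_pdf(x, 0.0, 2.0)  # w^(n) = p(x^(n)) / q(x^(n))
f = x ** 2

print((w * f).mean())  # estimate of E_p[x^2]; the true value under p = N(0, 1) is 1.0
```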

  15. Normalized importance sampling
  - Suppose that we can only evaluate $\tilde{p}(\boldsymbol{x})$, where $p(\boldsymbol{x}) = \tilde{p}(\boldsymbol{x})/Z_p$ (and similarly $q(\boldsymbol{x}) = \tilde{q}(\boldsymbol{x})/Z_q$):
    $E_p[f(\boldsymbol{x})] = \int f(\boldsymbol{x})\, p(\boldsymbol{x})\, d\boldsymbol{x} = \frac{Z_q}{Z_p} \int f(\boldsymbol{x})\, \frac{\tilde{p}(\boldsymbol{x})}{\tilde{q}(\boldsymbol{x})}\, q(\boldsymbol{x})\, d\boldsymbol{x} = \frac{Z_q}{Z_p} \int f(\boldsymbol{x})\, r(\boldsymbol{x})\, q(\boldsymbol{x})\, d\boldsymbol{x}, \quad r(\boldsymbol{x}) = \frac{\tilde{p}(\boldsymbol{x})}{\tilde{q}(\boldsymbol{x})}$
    $\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde{p}(\boldsymbol{x})\, d\boldsymbol{x} = \int \frac{\tilde{p}(\boldsymbol{x})}{\tilde{q}(\boldsymbol{x})}\, q(\boldsymbol{x})\, d\boldsymbol{x} = \int r(\boldsymbol{x})\, q(\boldsymbol{x})\, d\boldsymbol{x}$
    $\Rightarrow\ E_p[f(\boldsymbol{x})] = \frac{\int f(\boldsymbol{x})\, r(\boldsymbol{x})\, q(\boldsymbol{x})\, d\boldsymbol{x}}{\int r(\boldsymbol{x})\, q(\boldsymbol{x})\, d\boldsymbol{x}}$
  - With $\boldsymbol{x}^{(n)} \sim q(\boldsymbol{x})$:
    $E_p[f(\boldsymbol{x})] \approx \frac{\frac{1}{N} \sum_{n=1}^{N} f(\boldsymbol{x}^{(n)})\, r^{(n)}}{\frac{1}{N} \sum_{m=1}^{N} r^{(m)}} = \sum_{n=1}^{N} f(\boldsymbol{x}^{(n)})\, w^{(n)}, \quad w^{(n)} = \frac{r^{(n)}}{\sum_{m=1}^{N} r^{(m)}}$
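The self-normalized version, for when only $\tilde{p}$ is available (illustrative target and proposal: $\tilde{p}$ is an unnormalized $\mathcal{N}(1, 0.5^2)$ and $q = \mathcal{N}(0, 2^2)$):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000

p_tilde = lambda x: np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)  # unnormalized target
q_pdf   = lambda x: np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))
f       = lambda x: x

x = rng.normal(loc=0.0, scale=2.0, size=N)  # x^(n) ~ q
r = p_tilde(x) / q_pdf(x)                   # r^(n) = p_tilde(x^(n)) / q(x^(n))
w = r / r.sum()                             # normalized weights w^(n)

print((w * f(x)).sum())  # estimate of E_p[x]; the true mean of p is 1.0
```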

  16. Importance sampling: problem
  - Importance sampling depends on how well $q$ matches $p$
  - When the distributions are mismatched, the weights may be dominated by a few samples with large weights, the remaining weights being relatively insignificant
  - It is common that $p(\boldsymbol{x}) f(\boldsymbol{x})$ is strongly varying and has a significant proportion of its mass concentrated in a small region
    - The problem is severe if none of the samples falls in the regions where $p(\boldsymbol{x}) f(\boldsymbol{x})$ is large: the estimate of the expectation may be severely wrong while the variance of the $r^{(n)}$ remains small
  - A key requirement for $q(\boldsymbol{x})$ is that it should not be small or zero in regions where $p(\boldsymbol{x})$ may be significant.
  [Figure: $f(x)$, $p(x)$, and a mismatched proposal $q(x)$; from the Bishop book]
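Weight degeneracy of this kind is often monitored with the effective sample size, $\mathrm{ESS} = (\sum_n r^{(n)})^2 / \sum_n (r^{(n)})^2$; this diagnostic is standard practice but is not part of the slide itself.

```python
import numpy as np

def effective_sample_size(r):
    # Equals N when all weights are equal and approaches 1 when one weight dominates.
    r = np.asarray(r, dtype=float)
    return r.sum() ** 2 / (r ** 2).sum()

print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))    # 4.0: every sample contributes
print(effective_sample_size([100.0, 0.1, 0.1, 0.1]))  # ~1.0: a single sample dominates
```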

  17. Sampling methods for graphical models
  - Directed graphical models (DGMs):
    - Forward (or ancestral) sampling
    - Likelihood-weighted sampling
  - For undirected graphical models (UGMs), there is no one-pass sampling strategy that can sample even from the prior distribution with no observed variables.
    - Instead, computationally more expensive techniques such as Gibbs sampling are used; these will be introduced in the next slides.

  18. Sampling the joint distribution represented by a BN
  - Sample the joint distribution by ancestral sampling
  - Example:
    - Sample from $P(D) \Rightarrow D = d^1$
    - Sample from $P(I) \Rightarrow I = i^0$
    - Sample from $P(G \mid i^0, d^1) \Rightarrow G = g^3$
    - Sample from $P(S \mid i^0) \Rightarrow S = s^0$
    - Sample from $P(L \mid g^3) \Rightarrow L = l^0$
  - One sample $(d^1, i^0, g^3, s^0, l^0)$ has been generated

  19. Forward sampling in a BN
  - Given a BN and a number of samples $N$
  - Choose a topological ordering of the variables, e.g., $X_1, \ldots, X_n$
  - For $i = 1$ to $N$
    - For $j = 1$ to $n$
      - Sample $x_j^{(i)}$ from the distribution $P(X_j \mid \boldsymbol{x}_{Pa_{X_j}}^{(i)})$
    - Add $\{x_1^{(i)}, \ldots, x_n^{(i)}\}$ to the sample set
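A minimal forward-sampling sketch. The two-variable network (Difficulty -> Grade) and its CPDs below are hypothetical stand-ins for the slide's larger student-network example; variables are visited in a topological order and each is sampled given its already-sampled parents.

```python
import random

def sample_categorical(probs):
    # probs maps each value to its probability (summing to 1).
    u, cumulative = random.random(), 0.0
    for value, p in probs.items():
        cumulative += p
        if u < cumulative:
            return value
    return value  # fall back to the last value on floating-point round-off

# Hypothetical CPDs, each a function of the partially built sample.
cpds = {
    "D": lambda s: {"d0": 0.6, "d1": 0.4},
    "G": lambda s: {"g1": 0.5, "g2": 0.3, "g3": 0.2} if s["D"] == "d0"
                   else {"g1": 0.1, "g2": 0.3, "g3": 0.6},
}
order = ["D", "G"]  # a topological ordering X_1, ..., X_n

def sample_joint():
    sample = {}
    for var in order:  # sample X_j from P(X_j | parents already in `sample`)
        sample[var] = sample_categorical(cpds[var](sample))
    return sample

samples = [sample_joint() for _ in range(10_000)]
# P(G = g3) = 0.6*0.2 + 0.4*0.6 = 0.36 under these hypothetical CPDs.
print(sum(s["G"] == "g3" for s in samples) / len(samples))
```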

  20. Sampling for a conditional probability query
  - $P(i^1 \mid l^0, s^0) = ?$
  - Looking at the samples, we can count:
    - $N$: total number of samples
    - $N_e$: number of samples in which the evidence holds ($L = l^0$, $S = s^0$)
    - $N_q$: number of samples in which the query and the evidence both hold ($L = l^0$, $S = s^0$, $I = i^1$)
  - For a large enough $N$:
    - $N_e / N \approx P(l^0, s^0)$
    - $N_q / N \approx P(i^1, l^0, s^0)$
  - And so we can set $P(i^1 \mid l^0, s^0) = \frac{P(i^1, l^0, s^0)}{P(l^0, s^0)} \approx N_q / N_e$
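Continuing the hypothetical two-variable sketch above, the counting estimate for a conditional query such as $P(g^3 \mid d^1)$ looks like this:

```python
# Reuses sample_joint() from the forward-sampling sketch above.
samples = [sample_joint() for _ in range(100_000)]

N_e = sum(s["D"] == "d1" for s in samples)                     # evidence holds
N_q = sum(s["D"] == "d1" and s["G"] == "g3" for s in samples)  # query and evidence hold
print(N_q / N_e)  # should approach P(g3 | d1) = 0.6 under the hypothetical CPDs
```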

  21. Using rejection sampling to compute $P(\boldsymbol{X} \mid \boldsymbol{e})$
  - Given a BN, a query $P(\boldsymbol{X} \mid \boldsymbol{e})$, and a number of samples $N$
  - Choose a topological ordering of the variables, e.g., $X_1, \ldots, X_n$
  - $i = 1$
  - While $i \le N$
    - For $j = 1$ to $n$
      - Sample $x_j^{(i)}$ from the distribution $P(X_j \mid \boldsymbol{x}_{Pa_{X_j}}^{(i)})$
    - If $\{x_1^{(i)}, \ldots, x_n^{(i)}\}$ is consistent with the evidence $\boldsymbol{e}$, add it to the sample set and set $i = i + 1$
  - Use the samples to compute $P(\boldsymbol{X} \mid \boldsymbol{e})$
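The rejection-sampling variant of the same query, again reusing the hypothetical sample_joint() sketch from above: samples inconsistent with the evidence are simply discarded until enough consistent ones have been collected.

```python
def rejection_query(evidence, n_samples):
    kept = []
    while len(kept) < n_samples:
        s = sample_joint()
        # Keep the sample only if it agrees with every evidence assignment.
        if all(s[var] == val for var, val in evidence.items()):
            kept.append(s)
    return kept

kept = rejection_query({"D": "d1"}, 10_000)
print(sum(s["G"] == "g3" for s in kept) / len(kept))  # again approaches 0.6
```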
