Advanced Sampling Algorithms

Mobashir Mohammad, Hirak Sarkar, Parvathy Sudhir, Yamilet Serrano Llerena, Aditya Kulkarni, Tobias Bertelsen, Nirandika Wanigasekara, Malay Singh


  1. 2D Ising Model (3)
     - P(s) = exp(−β H(s)) / Z, where H(s) = −Σ_{⟨i,j⟩} s(x_i) s(x_j) is the energy of configuration s and Z is the normalizing constant
     - If β = 0, all spin configurations have the same probability
     - If β > 0, lower-energy configurations are preferred

  2. Exact Inference is Hard
     - Posterior distribution over s: if y is an observed variable, P(s | y) = p(s, y) / p(y)
     - Computing it exactly is intractable: the joint distribution ranges over 2^N possible configurations of s (N = number of sites)
     - Likewise for the marginal probability distribution at a site, and for MAP estimation

  3. The Big Question
     - Given a distribution π on S = {s_1, s_2, …, s_k}, simulate a random object with distribution π

  4. Methods
     - Generate random samples to estimate a quantity; samples are generated "Markov-chain style"
     - Markov Chain Monte Carlo (MCMC)
     - Propp-Wilson simulation: a Las Vegas variant
     - Sandwiching: an improvement on Propp-Wilson

  5. Markov Chain Monte Carlo Method (MCMC), presented by Yamilet R. Serrano Llerena

  6. Recall
     - Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π?

  7. Intuition
     - Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π when S is a high-dimensional space?

  8. MCMC
     1. Construct an irreducible and aperiodic Markov chain [X_0, X_1, …] whose stationary distribution is π.
     2. Run the chain with an arbitrary initial distribution; the Markov chain convergence theorem then guarantees that the distribution of the chain at time n converges to π.
     3. Hence, if we run the chain for a sufficiently long time n, the distribution of X_n will be very close to π, so X_n can be used as a sample.

  9. MCMC (2)
     - Generally, two types of MCMC algorithms:
     - Metropolis-Hastings
     - Gibbs sampling

  10. Metropolis-Hastings Algorithm
     - Original method: Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953)
     - Generalized by Hastings in 1970
     - Rediscovered by Tanner and Wong (1987) and by Gelfand and Smith (1990)
     - One way to implement MCMC (pictured: Nicholas Metropolis)

  11. Metropolis-Hastings Algorithm: Basic Idea
     - GIVEN: a probability distribution π on S = {s_1, …, s_k}
     - GOAL: approximately sample from π
     - Start with a proposal distribution Q(x, y), where x is the current state and y is the proposed new state
     - Q(x, y) specifies the transitions of the Markov chain; it plays the role of the transition matrix
     - By accepting/rejecting the proposals, MH simulates a Markov chain whose stationary distribution is π

  12. Metropolis-Hastings Algorithm: Algorithm
     - GIVEN: a probability distribution π on S = {s_1, …, s_k}
     - GOAL: approximately sample from π
     - Given the current sample x:
     - Draw y from the proposal distribution Q(x, ·)
     - Draw U ~ Uniform(0, 1) and move to y if U ≤ α(x, y), otherwise stay at x, where the acceptance probability is α(x, y) = min(1, [π(y) Q(y, x)] / [π(x) Q(x, y)])
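The accept/reject loop above can be sketched in a few lines of Python. This is our illustration, not the presenters' code: the target is an arbitrary unnormalized weight vector, and the proposal is uniform over states, so the Hastings ratio simplifies to π(y)/π(x).

```python
import random

def metropolis_hastings(weights, k, n_steps, seed=0):
    """Sample from pi proportional to `weights` on states 0..k-1,
    using the symmetric uniform proposal Q(x, y) = 1/k."""
    rng = random.Random(seed)
    x = rng.randrange(k)                 # arbitrary initial state
    for _ in range(n_steps):
        y = rng.randrange(k)             # draw proposal y ~ Q(x, .)
        # symmetric proposal => acceptance probability min(1, pi(y)/pi(x))
        if rng.random() < min(1.0, weights[y] / weights[x]):
            x = y                        # accept the proposal
        # else reject: stay at x
    return x

# Usage: draw many (approximate) samples from pi = (1/6, 2/6, 3/6)
# and compare empirical frequencies against pi.
weights = [1.0, 2.0, 3.0]
samples = [metropolis_hastings(weights, 3, 200, seed=s) for s in range(3000)]
freqs = [samples.count(i) / len(samples) for i in range(3)]
```

Each returned state is only approximately distributed as π (the point of the later Propp-Wilson slides is to remove exactly this approximation).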

  13. Metropolis-Hastings Algorithm: The Ising Model
     - Consider m sites around a circle. Each site i can have one of two spins x_i ∈ {−1, 1}
     - The target distribution: π(x) ∝ exp(β Σ_{i=1}^{m} x_i x_{i+1}), with indices taken modulo m

  14. Metropolis-Hastings Algorithm: The Ising Model
     - Target distribution: π(x) ∝ exp(β Σ_i x_i x_{i+1})
     - Proposal distribution: 1. randomly pick one of the m spins; 2. flip its sign
     - Acceptance probability (say the i-th spin is flipped): since the proposal is symmetric, α = min(1, π(x′)/π(x)) = min(1, exp(−2β x_i (x_{i−1} + x_{i+1})))
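For the circular Ising chain, the ratio π(x′)/π(x) only involves the flipped spin and its two neighbours. A minimal sketch, assuming the target π(x) ∝ exp(β Σ_i x_i x_{i+1}) as reconstructed above (function and variable names are ours):

```python
import math
import random

def ising_circle_mh(m, beta, n_steps, seed=0):
    """Metropolis sampler for the Ising model on m sites around a circle,
    target pi(x) ~ exp(beta * sum_i x_i * x_{i+1}), indices mod m."""
    rng = random.Random(seed)
    x = [rng.choice([-1, 1]) for _ in range(m)]   # random initial spins
    for _ in range(n_steps):
        i = rng.randrange(m)                      # pick one of the m spins
        # flipping x[i] multiplies pi by exp(-2*beta*x_i*(x_{i-1}+x_{i+1}))
        ratio = math.exp(-2.0 * beta * x[i] * (x[i - 1] + x[(i + 1) % m]))
        if rng.random() < min(1.0, ratio):
            x[i] = -x[i]                          # accept the flip
    return x

# Usage: one (approximate) sample of a 20-site circle at beta = 0.5
state = ising_circle_mh(m=20, beta=0.5, n_steps=5000, seed=1)
```

Note that `x[i - 1]` with `i = 0` picks up `x[m - 1]` via Python's negative indexing, which is exactly the circular boundary condition.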

  15. Metropolis-Hastings Algorithm: The Ising Model
     - Image taken from "Monte Carlo investigation of the Ising model" (2006) by Tobin Fricke

  16. Disadvantages of MCMC
     - No matter how large n is taken to be, there will still be some discrepancy between the distribution of the output and the target distribution π
     - To make that error small, we need to figure out how large n needs to be

  17. Bounds on Convergence of MCMC, presented by Aditya Kulkarni

  18. Séminaire de Probabilités (Strasbourg, 1983), David Aldous
     - If asymptotic bounds on the convergence time of MCMC algorithms for Ising models are difficult to obtain, use quantitative bounds
     - Use characteristics of the Ising-model MCMC algorithm to obtain quantitative bounds

  19. What we need to ensure
     - As time goes on, we approach the stationary distribution in a monotonically decreasing fashion: d(t) → 0 as t → ∞
     - When we stop, the sample should follow a distribution that is no further from the stationary distribution than some tolerance ε: d(t) ≤ ε

  20. Total Variation Distance || · ||_TV
     - Given two distributions p and q over a finite set of states S,
     - the total variation distance || · ||_TV is ||p − q||_TV = (1/2) ||p − q||_1 = (1/2) Σ_{x∈S} |p(x) − q(x)|
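As a quick sanity check, the formula is a one-liner (our helper, not from the slides):

```python
def tv_distance(p, q):
    """Total variation distance ||p - q||_TV = (1/2) * sum_x |p(x) - q(x)|,
    for two distributions given as aligned lists of probabilities."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

# Example: TV distance between a fair coin and a 90/10 coin is 0.4,
# and between two point masses on different states it is 1.
d_coins = tv_distance([0.5, 0.5], [0.9, 0.1])
d_points = tv_distance([1.0, 0.0], [0.0, 1.0])
```

The 1/2 factor makes the distance lie in [0, 1]: 0 for identical distributions, 1 for distributions with disjoint support.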

  21. Convergence Time
     - Let X = (X_0, X_1, …) be a Markov chain on a state space S with stationary distribution π
     - Define d(t) as the worst-case total variation distance at time t: d(t) = max_{x∈S} || P(X_t ∈ · | X_0 = x) − π ||_TV
     - The mixing time τ(ε) is the minimum time t such that d(t) is at most ε: τ(ε) = min { t : d(t) ≤ ε }
     - Define τ = τ(1/(2e)) = min { t : d(t) ≤ 1/(2e) }
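For a chain small enough to write its transition matrix down, d(t) and τ(ε) can be computed exactly by powering the matrix. A sketch, with a two-state toy chain of our own choosing:

```python
def worst_tv(Pt, pi):
    """d(t): worst-case TV distance of the rows of P^t from pi."""
    return max(0.5 * sum(abs(Pt[i][j] - pi[j]) for j in range(len(pi)))
               for i in range(len(Pt)))

def mixing_time(P, pi, eps):
    """tau(eps) = min { t : d(t) <= eps }, by repeated matrix multiplication."""
    n = len(P)
    Pt = [row[:] for row in P]           # P^1
    t = 1
    while worst_tv(Pt, pi) > eps:
        # Pt <- Pt @ P (plain triple loop, no external libraries)
        Pt = [[sum(Pt[i][k] * P[k][j] for k in range(n)) for j in range(n)]
              for i in range(n)]
        t += 1
    return t

# Illustrative two-state chain with uniform stationary distribution:
# here d(t) = 0.5 * 0.8^t, so tau(0.01) = 18.
P = [[0.9, 0.1],
     [0.1, 0.9]]
pi = [0.5, 0.5]
tau = mixing_time(P, pi, 0.01)
```

This brute-force computation is exactly what the quantitative bounds on the next slides avoid: for interesting chains (e.g. the Ising model), the matrix is far too large to write down.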

  22. Quantitative results
     - [Figure: d(t) plotted against t, decreasing from 1 toward 0; the levels 1/(2e) and ε on the d(t)-axis mark the times τ and τ(ε)]

  23. Lemma
     - Consider two random walks started from state i and state j, and define d̄(t) as the worst-case total variation distance between their respective probability distributions at time t
     - (a) d̄(t) ≤ 2 d(t)
     - (b) d̄ is submultiplicative, and d(t) is decreasing

  24. Upper bound on d(t): proof
     - From part (a) and the definition τ = τ(1/(2e)) = min { t : d(t) ≤ 1/(2e) } we get d̄(τ) ≤ e^{−1}
     - From part (b), an upper bound on d̄ at a particular time t_0 gives upper bounds for later times: d̄(t) ≤ (d̄(t_0))^n for n t_0 ≤ t ≤ (n + 1) t_0, hence d̄(t) ≤ (d̄(t_0))^{t/t_0 − 1}
     - Substituting τ for t_0: d̄(t) ≤ (d̄(τ))^{t/τ − 1}, and therefore d(t) ≤ exp(1 − t/τ) for t ≥ 0

  25. Upper bound on τ(ε): proof
     - Algebraic calculations:
     - d(t) ≤ exp(1 − t/τ)
     - log d(t) ≤ 1 − t/τ
     - t/τ ≤ 1 − log d(t) = 1 + log(1/d(t))
     - t ≤ τ · (1 + log(1/d(t))), hence τ(ε) ≤ τ · (1 + log(1/ε))

  26. Upper bounds
     - d(t) ≤ min(1, exp(1 − t/τ)), t ≥ 0
     - τ(ε) ≤ τ · (1 + log(1/ε)), 0 < ε < 1

  27. Entropy of the initial distribution
     - Measure of randomness: ent(μ) = −Σ_{x∈S} μ(x) log μ(x)
     - where μ is the step distribution driving the walk

  28. A few more lemmas
     1. Let (X_t) be the random walk driven by μ, and let μ_t be the distribution of X_t; then ent(μ_t) ≤ t · ent(μ)
     2. If ν is a distribution on S such that ||ν − π||_TV ≤ ε, then ent(ν) ≥ (1 − ε) log |S|

  29. Lower bound on d(t): proof
     - From lemma 2, applied with ν = μ_t and ε = d(t): ent(μ_t) ≥ (1 − d(t)) log |S|
     - so ent(μ_t) / log |S| ≥ 1 − d(t), i.e. d(t) ≥ 1 − ent(μ_t) / log |S|
     - and by lemma 1, d(t) ≥ 1 − t · ent(μ) / log |S|

  30. Lower bound on τ(ε): proof
     - From lemma 1, ent(μ_t) ≤ t · ent(μ), so t ≥ ent(μ_t) / ent(μ)
     - From lemma 2, at any time t with d(t) ≤ ε we have ent(μ_t) ≥ (1 − ε) log |S|
     - Hence t ≥ (1 − ε) log |S| / ent(μ), and τ(ε) ≥ (1 − ε) log |S| / ent(μ)

  31. Lower bounds
     - d(t) ≥ 1 − t · ent(μ) / log |S|
     - τ(ε) ≥ (1 − ε) log |S| / ent(μ)

  32. Propp-Wilson, presented by Tobias Bertelsen

  33. An exact version of MCMC
     - Problems with MCMC: (A) it has an accuracy error, which depends on the starting state; (B) we must know the number of iterations to run
     - James Propp and David Wilson proposed Coupling From The Past (1996), a.k.a. the Propp-Wilson algorithm
     - Idea: solve both problems by (conceptually) running the chain infinitely long

  34. An exact version of MCMC
     - Theoretical: runs all configurations infinitely; literally takes infinite time; impossible
     - Coupling from the past: runs all configurations for finite time; might take thousands of years; infeasible
     - Sandwiching: runs a few configurations for finite time; takes seconds; practicable

  35. Theoretical exact sampling
     - Recall the convergence theorem: we approach the stationary distribution as the number of steps goes to infinity
     - Intuitive approach: to sample perfectly, start a chain and run it forever; start at t = 0, sample at t = ∞. Problem: we never get a sample
     - Alternative approach: to sample perfectly, take a chain that has already been running for an infinite amount of time; start at t = −∞, sample at t = 0

  36. Theoretical independence of the starting state
     - A sample from a Markov chain in MCMC depends solely on the starting state X_{−∞} and the sequence of random numbers U
     - We want to be independent of the starting state
     - For a given sequence of random numbers U_{−∞}, …, U_{−1}, we want to ensure that the starting state X_{−∞} has no effect on X_0

  37. Theoretical independence of the starting state
     - Collisions: for a given U_{−∞}, …, U_{−1}, if two Markov chains are in the same state at some time t′, they will continue on together

  38. Coupling from the past
     - At some finite past time t = −N, all past chains have already run infinitely long and have coalesced into one (∞ − N = ∞)
     - We want to continue that coalesced chain to t = 0
     - But we don't know which state it will be in at t = −N
     - So run chains from all states starting at −N instead of −∞

  39. Coupling from the past
     1. Let U = (U_{−1}, U_{−2}, U_{−3}, …) be the sequence of independent uniform random numbers
     2. For N_j ∈ {1, 2, 4, 8, …}:
        1. Extend U to length N_j, keeping the earlier values U_{−1}, …, U_{−N_{j−1}} the same
        2. Start one chain from each state at t = −N_j
        3. For t from −N_j to zero: simulate the chains using U_t
        4. If all chains have coalesced to the same state s_i at t = 0, return s_i
        5. Else repeat the loop
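The loop above can be written directly for a small chain. This sketch is ours: it runs one chain per state, reuses the same U values when N doubles, and drives everything through the k-state update function that appears on the "Using the same U" slides, whose stationary distribution is π = (1/2, 1/4, …, 2^{−(k−1)}, 2^{−(k−1)}).

```python
import random

def cftp(k, phi, seed=0):
    """Coupling from the past: start one chain from every state at t = -N,
    drive them all with the same numbers U_{-N}, ..., U_{-1}, and double N
    until every chain has coalesced into the same state at t = 0."""
    rng = random.Random(seed)
    us = []                                   # us[j] = U_{-(j+1)}, kept across runs
    n = 1
    while True:
        while len(us) < n:                    # extend U, keeping earlier values
            us.append(rng.random())
        states = list(range(k))               # one chain per starting state
        for t in range(n - 1, -1, -1):        # simulate from t = -N up to t = 0
            states = [phi(s, us[t]) for s in states]
        if len(set(states)) == 1:             # coalesced: X_0 is an exact sample
            return states[0]
        n *= 2                                # otherwise restart further in the past

# Update function of the k-state example chain (k = 3 here, 0-indexed):
# go to state 0 with probability 1/2, otherwise one step up, sticking at the top.
def phi(state, u, k=3):
    return 0 if u < 0.5 else min(state + 1, k - 1)

samples = [cftp(3, phi, seed=s) for s in range(4000)]
freqs = [samples.count(i) / len(samples) for i in range(3)]  # ~ (1/2, 1/4, 1/4)
```

Unlike the Metropolis-Hastings samples, these are exact draws from π: the empirical frequencies deviate from (1/2, 1/4, 1/4) only by sampling noise, with no MCMC bias term.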

  40. Coupling from the past [figure]

  41. Questions
     - Why do we double the lengths? The worst case is < 4 N_opt steps, where N_opt is the minimal N at which coalescence is achieved. Compare with trying N ∈ {1, 2, 3, 4, 5, …}, which takes O(N_opt²) steps
     - Why do we have to use the same random numbers? Different samples might take longer or shorter to converge, so we must evaluate the same sample in each iteration. The sample should depend only on U, not on the different N

  42. Using the same U
     - We have k states with the following update function:
     - φ(s_i, U) = s_1 if U < 1/2, s_{i+1} otherwise
     - φ(s_k, U) = s_1 if U < 1/2, s_k otherwise
     - π(s_i) = 2^{−i}, π(s_k) = 2^{−(k−1)}

  43. Using the same U
     - The probability of s_1 depends only on the last random number: P(s_1) = P(U_{−1} < 1/2) = 1/2
     - Let's assume we generate a new U for each run: U¹, U², U³, …
     - N = 1, U = [U¹_{−1}]: accumulated P(s_1) = P(U¹_{−1} < 1/2) = 50%
     - N = 2, U = [U²_{−2}, U²_{−1}]: accumulated P(s_1) = P(U¹_{−1} < 1/2 ∨ U²_{−1} < 1/2) = 75%
     - N = 4, U = [U³_{−4}, …, U³_{−1}]: accumulated P(s_1) = 81.25%
     - N = 8, U = [U⁴_{−8}, …, U⁴_{−1}]: accumulated P(s_1) = 81.64%
     - The accumulated probability of returning s_1 exceeds 1/2, so the output is biased

  44. Using the same U
     - The probability of s_1 depends only on the last random number: P(s_1) = P(U_{−1} < 1/2) = 1/2
     - Let's instead use the same U = U¹ for each run:
     - N = 1, U = [U¹_{−1}]: accumulated P(s_1) = P(U¹_{−1} < 1/2) = 50%
     - N = 2, 4, 8, …: still P(U¹_{−1} < 1/2) = 50%, since U¹_{−1} is reused
     - The output stays unbiased

  45. Problem
     - In each step we update up to k chains, so the total execution time is O(N_opt · k)
     - BUT k = 2^{L²} in the Ising model
     - Worse than the naïve O(k) approach

  46. Sandwiching, presented by Nirandika Wanigasekara

  47. Sandwiching
     - With many vertices, running k Propp-Wilson chains will take too much time; the algorithm is impractical for large k
     - Can we choose a relatively small subset of the state space and still get the same results?
     - Try sandwiching

  48. Sandwiching
     - Idea: find two chains bounding all other chains
     - If we have two such boundary chains, check whether those two chains have coalesced
     - If so, all other chains have also coalesced

  49. Sandwiching
     - To come up with the boundary chains we need a way to order the states: s_1 ≤ s_2 ≤ s_3 ≤ …
     - A chain in a higher state must not cross a chain in a lower state
     - This requires a Markov chain obeying certain monotonicity properties

  50. Sandwiching
     - Consider a fixed set of k states, state space S = {1, …, k}, and the transition matrix with
     - P_{1,1} = P_{1,2} = 1/2
     - P_{k,k} = P_{k,k−1} = 1/2
     - for i = 2, …, k − 1: P_{i,i−1} = P_{i,i+1} = 1/2
     - all other entries 0

  51. Sandwiching
     - What is this Markov chain doing? At each integer time it takes one step up or one step down the ladder, each with probability 1/2
     - At the top (state k) or the bottom (state 1), it stays where it is with probability 1/2
     - This is the ladder walk on k vertices
     - Its stationary distribution π is uniform: π_i = 1/k for i = 1, …, k
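The ladder walk's transition matrix is easy to build, and the claim that the uniform distribution is stationary amounts to checking πP = π. A small sketch (0-based indices, our helper):

```python
def ladder_matrix(k):
    """Transition matrix of the ladder walk on states 0..k-1:
    P[0][0] = P[0][1] = 1/2, P[k-1][k-1] = P[k-1][k-2] = 1/2,
    and P[i][i-1] = P[i][i+1] = 1/2 in between; all other entries 0."""
    P = [[0.0] * k for _ in range(k)]
    P[0][0] = P[0][1] = 0.5
    P[k - 1][k - 1] = P[k - 1][k - 2] = 0.5
    for i in range(1, k - 1):
        P[i][i - 1] = P[i][i + 1] = 0.5
    return P

# Check that pi_i = 1/k is stationary: (pi P)_j = sum_i pi_i P[i][j] = pi_j
k = 5
P = ladder_matrix(k)
pi = [1.0 / k] * k
pi_next = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
```

The check succeeds because every column of P also sums to 1 (the matrix is doubly stochastic), which is exactly why the uniform distribution is stationary.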

  52. Sandwiching
     - Propp-Wilson for this Markov chain with:
     - a valid update function φ: if U < 1/2 then step down, if U ≥ 1/2 then step up
     - negative starting times N_1, N_2, … = 1, 2, 4, 8, …
     - number of states k = 5

  53. Sandwiching [figure]

  54. Sandwiching
     - The update function preserves the ordering between states:
     - for all U ∈ [0, 1] and all i, j ∈ {1, …, k} such that i ≤ j, we have φ(i, U) ≤ φ(j, U)
     - So it is sufficient to run only 2 chains rather than k
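Because φ is monotone, the full Propp-Wilson run reduces to just the bottom and top chains: when those two meet at t = 0, every intermediate chain has been squeezed into the same state. A sketch for the ladder walk (our code, 0-based states, k = 5 as on the slides):

```python
import random

def monotone_cftp_ladder(k, seed=0):
    """Propp-Wilson with sandwiching for the ladder walk on {0,...,k-1}:
    only the bottom (0) and top (k-1) chains are simulated, since the
    update function phi preserves the ordering of states."""
    def phi(state, u):                   # valid, monotone update function
        if u < 0.5:
            return max(state - 1, 0)     # step down (sticking at the bottom)
        return min(state + 1, k - 1)     # step up (sticking at the top)

    rng = random.Random(seed)
    us = []                              # us[j] = U_{-(j+1)}, reused across runs
    n = 1
    while True:
        while len(us) < n:
            us.append(rng.random())
        lo, hi = 0, k - 1                # bounding chains started at t = -N
        for t in range(n - 1, -1, -1):
            lo, hi = phi(lo, us[t]), phi(hi, us[t])
        if lo == hi:                     # bounds met => every chain coalesced
            return lo
        n *= 2

# The ladder walk's stationary distribution is uniform, and the sandwiched
# Propp-Wilson output is an exact draw from it.
samples = [monotone_cftp_ladder(5, seed=s) for s in range(2000)]
freqs = [samples.count(i) / len(samples) for i in range(5)]
```

Per run, this does 2 updates per time step instead of k, which is the whole point: for the Ising model k = 2^{L²}, but the two bounding chains still suffice.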

  55. Sandwiching
     - Are these conditions always met? No, not always
     - But there are frequent instances where they are met
     - Especially useful when k is large; the Ising model is a good example

  56. Ising Model, presented by Malay Singh

  57. 2D Ising Model
     - A grid with sites numbered from 1 to L², where L is the grid size
     - Each site i can have spin x_i ∈ {−1, +1}
     - V is the set of sites; S = {−1, 1}^V defines all possible configurations (states)
     - Magnetisation of a state s: m(s) = (Σ_i s(x_i)) / L²
     - Energy of a state s: H(s) = −Σ_{⟨i,j⟩} s(x_i) s(x_j)

  58. Ordering of states
     - For two states s, s′ we say s ≤ s′ if s(x) ≤ s′(x) for all x ∈ V
     - Maximum: m = 1, s_max(x) = +1 for all x ∈ V
     - Minimum: m = −1, s_min(x) = −1 for all x ∈ V
     - Hence s_min ≤ s ≤ s_max for all s

  59. The update function
     - We use the sequence of random numbers U_n, U_{n−1}, …, U_0
     - To update the state X_n, we choose a site x in {1, …, L²} uniformly, then set
     - X_{n+1}(x) = +1 if U_{n+1} < exp(2β(k₊(x, s) − k₋(x, s))) / (exp(2β(k₊(x, s) − k₋(x, s))) + 1), and −1 otherwise
     - where k₊(x, s) and k₋(x, s) are the numbers of neighbours of x with spin +1 and −1 respectively

  60. Maintaining ordering
     - Ordering after an update (from X_n to X_{n+1}): we choose the same site x to update in both chains
     - The spin of x at X_{n+1} depends on the update function
     - We want to check that
     - exp(2β(k₊(x, s) − k₋(x, s))) / (exp(2β(k₊(x, s) − k₋(x, s))) + 1) ≤ exp(2β(k₊(x, s′) − k₋(x, s′))) / (exp(2β(k₊(x, s′) − k₋(x, s′))) + 1)
     - which is equivalent to checking k₊(x, s) − k₋(x, s) ≤ k₊(x, s′) − k₋(x, s′)

  61. Maintaining ordering
     - As s ≤ s′ we have k₊(x, s) ≤ k₊(x, s′) and k₋(x, s) ≥ k₋(x, s′)
     - Subtracting the second inequality from the first: k₊(x, s) − k₋(x, s) ≤ k₊(x, s′) − k₋(x, s′)
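Putting the pieces together: the single-site update above, the s_min/s_max bounding chains, and coupling from the past give an exact Ising sampler. This is our sketch, not the presenters' code; the free boundary condition, the function names, and the iteration cap are our assumptions.

```python
import math
import random

def ising_cftp(L, beta, seed=0, max_doublings=18):
    """Propp-Wilson with sandwiching for the 2D Ising model on an L x L grid
    (free boundary): the chosen site gets spin +1 with probability
    exp(2*beta*(k_plus - k_minus)) / (exp(2*beta*(k_plus - k_minus)) + 1)."""
    rng = random.Random(seed)
    moves = []                          # moves[j] = (site, u) used at time -(j+1)

    def neighbors(i):
        r, c = divmod(i, L)
        if r > 0: yield i - L
        if r < L - 1: yield i + L
        if c > 0: yield i - 1
        if c < L - 1: yield i + 1

    def step(s, site, u):
        kp = sum(1 for j in neighbors(site) if s[j] == +1)   # k_plus
        km = sum(1 for j in neighbors(site) if s[j] == -1)   # k_minus
        e = math.exp(2.0 * beta * (kp - km))
        s[site] = +1 if u < e / (e + 1.0) else -1

    n = 1
    for _ in range(max_doublings):
        while len(moves) < n:           # extend the reused randomness
            moves.append((rng.randrange(L * L), rng.random()))
        top = [+1] * (L * L)            # s_max: all spins +1
        bot = [-1] * (L * L)            # s_min: all spins -1
        for t in range(n - 1, -1, -1):  # simulate from t = -N up to t = 0
            site, u = moves[t]
            step(top, site, u)
            step(bot, site, u)
        if top == bot:                  # bounding chains met: exact sample
            return top
        n *= 2
    return None                         # did not coalesce within the budget

# Usage: a small grid at low beta (high temperature) coalesces quickly
sample = ising_cftp(L=4, beta=0.2, seed=42)
```

At low β the bounding chains coalesce after a modest N; near and below the critical temperature coalescence times grow sharply, which is what the N values on the following slides illustrate.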

  62. Ising Model, L = 4
     - T = 3.5 and N = 512; T = 4.8 and N = 128

  63. Ising Model, L = 8
     - T = 5.9 and N = 512

  64. Ising Model, L = 16
     - T = 5.3 and N = 16 384

  65. Summary
     - Exact sampling: with Markov chain Monte Carlo we never know when we have converged
     - Propp-Wilson with sandwiching to the rescue

  66. Questions?
