Statistics and learning: Monte Carlo Markov Chains (methods)


  1. Statistics and learning: Monte Carlo Markov Chains (methods). Emmanuel Rachelson and Matthieu Vignes, ISAE SupAero, 22nd March 2013.

  2. Monte Carlo computation: why, what?
  ◮ An old experiment that conceived the idea of Monte Carlo methods is that of "Buffon's needle": you throw an $l$-length needle on a flat surface ruled with parallel lines with spacing $D$ ($D > l$). Under ideal conditions, $P(\text{the needle crosses one of the lines}) = \frac{2l}{\pi D}$. This allows an estimation of $\pi$ thanks to a large number of thrown needles: $\pi = \lim_{n \to \infty} \frac{2l}{D\,P_n}$, where $P_n$ is the proportion of crosses in $n$ such throws (a simulation sketch follows this slide).
  ◮ The basic concept here is that of simulating random processes in order to help evaluate some quantities of interest.
  ◮ First intensive use during WW II, in order to make good use of computing facilities (ENIAC): neutron random diffusion for atomic bomb design and the estimation of eigenvalues of the Schrödinger equation. Intensively developed by (statistical) physicists.
  ◮ Main interest: when no closed-form solution is tractable.
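To make the needle experiment concrete, here is a minimal Python sketch (not from the original slides; the function name buffon_pi and the defaults l = 1, D = 2 are illustrative choices):

```python
import math
import random

def buffon_pi(n, l=1.0, D=2.0):
    """Estimate pi by simulating n throws of an l-length needle
    on a surface ruled with parallel lines spaced D apart (D > l)."""
    crosses = 0
    for _ in range(n):
        # Distance from the needle's centre to the nearest line: uniform on [0, D/2].
        x = random.uniform(0.0, D / 2.0)
        # Acute angle between the needle and the lines: uniform on [0, pi/2].
        # (Using math.pi here is a simulation shortcut; the physical
        # experiment itself requires no prior knowledge of pi.)
        theta = random.uniform(0.0, math.pi / 2.0)
        # The needle crosses a line iff x <= (l/2) sin(theta).
        if x <= (l / 2.0) * math.sin(theta):
            crosses += 1
    P_n = crosses / n            # proportion of crosses among n throws
    return 2.0 * l / (D * P_n)   # invert P = 2l / (pi D)

print(buffon_pi(1_000_000))  # typically within about 0.01 of pi
```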

  3. Typical problems
  1. Integral computation: $I = \int h(x) f(x)\,dx$ can be assimilated to $E_f[h]$ if $f$ is a probability density. If $f$ is not a density, it can be rewritten as $\int \frac{h(x) f(x)}{g(x)}\, g(x)\,dx = E_g[hf/g]$ for a density $g$, provided $\mathrm{Supp}(f) \subset \mathrm{Supp}(g)$.
  2. Optimisation: $\max_{x \in X} f(x)$ or $\mathrm{argmax}_{x \in X} f(x)$ ($\min$ can replace $\max$); a random-search sketch follows this slide.
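The optimisation problem lends itself to the same simulation idea. As a hedged illustration, here is a pure random-search sketch (the objective and the search interval are our choices, not the slides'):

```python
import random

def random_search(f, low, high, n=100_000):
    """Monte Carlo optimisation by pure random search: sample n uniform
    candidates in [low, high] and keep the one with the largest f value."""
    best_x, best_val = None, float("-inf")
    for _ in range(n):
        x = random.uniform(low, high)
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Maximise f(x) = -(x - 2)^2 on [-10, 10]; the argmax is x = 2.
print(random_search(lambda x: -(x - 2) ** 2, -10.0, 10.0))
```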

  4. Need of Monte Carlo techniques: integration
  ◮ An essential part of many scientific problems is the computation of $I = \int_D f(x)\,dx$.
  ◮ If we can draw iid random samples $x^{(1)}, \dots, x^{(n)}$ uniformly from $D$, we can compute $\hat{I}_n = \frac{1}{n}\sum_j f(x^{(j)})$. The LLN says that $\lim_n \hat{I}_n = I$ with probability 1, and the CLT gives the convergence rate: $\sqrt{n}\,(\hat{I}_n - I) \to \mathcal{N}(0, \sigma^2)$, where $\sigma^2 = \mathrm{var}(f(x))$ (see the sketch after this list).
  ◮ In dimension 1, Riemann's approximation gives an $O(1/n)$ error rate, but deterministic methods fail when dimensionality increases.
  ◮ However, there is no free lunch: in a high-dimensional $D$, (i) $\sigma^2$, which reflects how far $f$ is from uniform, can be quite large, and (ii) producing uniformly distributed samples in $D$ is itself an issue.
  ◮ Again, importance sampling theoretically solves this, but the choice of the sampling distribution is a challenge.
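A sketch of the dimension argument, with our own illustrative integrand: over $[0,1]^{10}$ the exact value of $\int e^{-(x_1 + \cdots + x_d)}\,dx$ is $(1 - e^{-1})^{10}$, so the CLT-based error bar can be checked against the truth.

```python
import math
import random

def mc_integral(d=10, n=100_000):
    """Plain Monte Carlo estimate of I = integral over [0,1]^d of
    exp(-(x_1 + ... + x_d)) dx; the exact value is (1 - e^{-1})^d."""
    values = []
    for _ in range(n):
        x = [random.random() for _ in range(d)]   # uniform point in [0,1]^d
        values.append(math.exp(-sum(x)))
    I_hat = sum(values) / n
    # CLT-based standard error: sigma_hat / sqrt(n), independent of d.
    var_hat = sum((v - I_hat) ** 2 for v in values) / (n - 1)
    return I_hat, math.sqrt(var_hat / n)

I_hat, se = mc_integral()
print(I_hat, "+/-", se, "(exact:", (1.0 - math.exp(-1.0)) ** 10, ")")
```

The $O(1/\sqrt{n})$ rate shows up in the reported standard error regardless of $d$, which is exactly why Monte Carlo beats deterministic quadrature in high dimension.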

  5. Integration: a classical Monte Carlo approach
  If we try to evaluate $I = \int f(x)\,g(x)\,dx$, where $g$ is a density function, then $I = E_g[f]$ and:
  Classical Monte Carlo method: $\hat{I}_n = \frac{1}{n}\sum_{i=1}^n f(x_i)$, where $x_i \sim \mathcal{L}(g)$.
  This is justified by the LLN and the CLT provided $\int f^2 g < \infty$. (A minimal sketch follows this slide.)
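A minimal sketch of the classical method, assuming we can sample from the density $g$ (the choices $f(x) = x^2$ and $g = \mathcal{N}(0,1)$, for which $I = E[X^2] = 1$, are ours):

```python
import random

def classical_mc(f, sample_g, n=100_000):
    """Classical Monte Carlo: estimate I = E_g[f] by averaging f over n iid draws from g."""
    return sum(f(sample_g()) for _ in range(n)) / n

# f(x) = x^2 with g the standard normal density: I = E[X^2] = 1.
print(classical_mc(lambda x: x * x, lambda: random.gauss(0.0, 1.0)))
```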

  6. Integration: no density at first
  If $f$ is not a density (or not a "good" one), then for any density $g$ whose support contains the support of $f$: $I = \int \frac{h(x) f(x)}{g(x)}\, g(x)\,dx = E_g[hf/g]$. Similarly:
  Importance sampling Monte Carlo method: $\hat{I}_n = \frac{1}{n}\sum_{i=1}^n h(y_i) f(y_i) / g(y_i)$, where $y_i \sim \mathcal{L}(g)$.
  The same justification applies, provided $\int h^2 f^2 / g < \infty$; equivalently, $\mathrm{Var}_g(\hat{I}_n) = \mathrm{Var}_g\!\left(\frac{1}{n}\sum_{i=1}^n h(Y_i) f(Y_i)/g(Y_i)\right)$ must be finite, so $g$ must have a heavier tail than that of $f$. How to choose $g$?
  Theorem (Rubinstein). The density $g^*$ which minimises $\mathrm{Var}(\hat{I}_n)$ (for all $n$) is
  $$g^*(x) = \frac{|h(x)|\,f(x)}{\int |h(y)|\,f(y)\,dy}.$$
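To illustrate both the heavier-tail requirement and the shape of Rubinstein's optimal density, here is a hedged sketch (the tail-probability target and the shifted-exponential proposal are our choices, not the slides'):

```python
import math
import random

def tail_prob_is(a=4.0, n=100_000):
    """Importance sampling estimate of I = P(Z > a) for Z ~ N(0,1),
    i.e. h(x) = 1{x > a} and f the standard normal density. The proposal g
    is a unit-rate exponential shifted to start at a: its tail is heavier
    than f's, and it roughly mimics the optimal g* proportional to |h| f."""
    def f(x):                  # standard normal density
        return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    def g(y):                  # shifted exponential density on (a, infinity)
        return math.exp(-(y - a))
    total = 0.0
    for _ in range(n):
        y = a + random.expovariate(1.0)  # y ~ g, so h(y) = 1 automatically
        total += f(y) / g(y)             # importance weight h(y) f(y) / g(y)
    return total / n

print(tail_prob_is())  # exact value is about 3.17e-5
```

Naive Monte Carlo with draws from $f$ would need on the order of $10^5$ samples just to see one point beyond $a = 4$; concentrating the proposal where $|h| f$ lives is precisely what Rubinstein's theorem formalises.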
