Statistics and learning
Markov Chain Monte Carlo methods
Emmanuel Rachelson and Matthieu Vignes
ISAE SupAero, 22nd March 2013
E. Rachelson & M. Vignes (ISAE) SAD 2013 1 / 19
Monte Carlo computation: why, what?
◮ An old experiment that conceived the idea of Monte Carlo methods is “Buffon’s needle”: you throw a needle of length l on a flat surface ruled with parallel lines with spacing D (> l). Under ideal conditions, P(needle crosses one of the lines) = 2l / (πD). This yields an estimation of π from a large number of thrown needles: π = lim_{n→∞} 2l / (D · P_n), where P_n is the proportion of crosses in n such throws.
◮ The basic concept here is that of simulating random processes in order to help evaluate some quantities of interest.
◮ First intensive use during WW II, in order to make good use of computing facilities (ENIAC): neutron random diffusion for atomic bomb design, and the estimation of eigenvalues of the Schrödinger equation. Intensively developed by (statistical) physicists.
◮ Main interest: when no tractable closed-form solution is available.
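The Buffon's needle estimator above can be sketched as follows (a minimal simulation; the needle's centre-to-line distance and angle are drawn uniformly, which is the "ideal conditions" assumption):

```python
import math
import random

def buffon_pi_estimate(n, l=1.0, D=2.0, seed=0):
    """Estimate pi by throwing n needles of length l on lines spaced D apart (l < D)."""
    rng = random.Random(seed)
    crosses = 0
    for _ in range(n):
        # distance from the needle's centre to the nearest line, and needle angle
        x = rng.uniform(0.0, D / 2)
        theta = rng.uniform(0.0, math.pi / 2)
        # the needle crosses a line iff its projection reaches the line
        if x <= (l / 2) * math.sin(theta):
            crosses += 1
    p_n = crosses / n
    # P(cross) = 2l / (pi * D)  =>  pi ~ 2l / (D * P_n)
    return 2 * l / (D * p_n) if p_n > 0 else float("inf")
```

With a few hundred thousand throws the estimate typically lands within a few hundredths of π, illustrating the slow O(1/√n) convergence discussed next.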
Typical problems
1. Integral computation: I = ∫ h(x) f(x) dx can be read as E_f[h] if f is a probability density. If f is not a density, and Supp(f) ⊂ Supp(g) for some density g, it can be rewritten as ∫ (h(x) f(x) / g(x)) g(x) dx = E_g[hf/g].
2. Optimisation: max_{x∈X} f(x) or argmax_{x∈X} f(x) (min can replace max).
Need for Monte Carlo techniques: integration
◮ Essential part in many scientific problems: computation of I = ∫_D f(x) dx.
◮ If we can draw iid uniform random samples x^(1), ..., x^(n) from D, we can compute Î_n = (1/n) Σ_j f(x^(j)) (times the volume of D, if it is not 1); the LLN says lim_{n→∞} Î_n = I with probability 1, and the CLT gives the convergence rate: √n (Î_n − I) → N(0, σ²), where σ² = Var(f(X)).
◮ In dimension 1, Riemann approximation gives an O(1/n) error rate, but deterministic methods fail when dimensionality increases.
◮ However, no free lunch: in a high-dimensional D, (i) σ², which measures how far f is from being uniform, can be quite large, and (ii) producing uniformly distributed samples in D is itself an issue.
◮ Again, importance sampling theoretically solves this, but the choice of the sampling distribution is a challenge.
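A minimal sketch of the plain estimator Î_n over the unit cube (volume 1, so no rescaling is needed); the test integrand Σ_i x_i² is an illustrative choice, with exact integral d/3:

```python
import random

def mc_integrate(f, d, n, seed=0):
    """Plain Monte Carlo estimate of the integral of f over the unit cube [0,1]^d,
    using n iid uniform samples. The O(1/sqrt(n)) error rate holds in ANY dimension d,
    which is why this beats deterministic grids when d is large."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        total += f(x)
    return total / n

# Example: the integral of sum_i x_i^2 over [0,1]^d equals d/3 exactly,
# so the estimate in d = 10 should be close to 10/3.
```

Note that a Riemann grid with m points per axis would need m^d evaluations in dimension d, while the Monte Carlo error depends on n and σ² only, not on d directly.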
Integration: a classical Monte Carlo approach
If we try to evaluate I = ∫ f(x) g(x) dx, where g is a density function, then I = E_g[f] and:
Classical Monte Carlo method: Î_n = (1/n) Σ_{i=1}^n f(x_i), where x_i ∼ L(g).
Justified by the LLN & CLT if ∫ f² g < ∞.
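The classical estimator can be sketched generically (the choice g = N(0,1) and f(x) = x², with E_g[f] = 1, is an illustrative example, not from the slides):

```python
import random

def classical_mc(f, sampler, n):
    """Classical Monte Carlo: I_hat = (1/n) * sum f(x_i), with x_i drawn from the density g
    represented by `sampler` (a zero-argument function returning one draw from g)."""
    return sum(f(sampler()) for _ in range(n)) / n

rng = random.Random(1)
# Example: E_g[X^2] = 1 for g = N(0,1), so the estimate should be close to 1.
est = classical_mc(lambda x: x * x, lambda: rng.gauss(0.0, 1.0), 200_000)
```

The condition ∫ f² g < ∞ is exactly what makes the CLT applicable here: it says f has finite variance under g.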
Integration: no density at first
If f is not a density (or not a “good” one), then for any density g whose support contains the support of f: I = ∫ (h(x) f(x) / g(x)) g(x) dx = E_g[hf/g]. Similarly:
Importance sampling Monte Carlo method: Î_n = (1/n) Σ_{i=1}^n h(y_i) f(y_i) / g(y_i), where y_i ∼ L(g).
Same justification, but Var_g(Î_n) = Var_g((1/n) Σ_{i=1}^n h(Y_i) f(Y_i) / g(Y_i)) must be finite, which is equivalent to ∫ h² f² / g < ∞; g must have a heavier tail than that of f. How to choose g?
Theorem (Rubinstein). The density g* which minimises Var(Î_n) (for all n) is
g*(x) = |h(x)| f(x) / ∫ |h(y)| f(y) dy.
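A minimal sketch of the importance sampling estimator. The application chosen here (the Gaussian tail probability P(X > 4), i.e. h the indicator of (4, ∞) and f the N(0,1) density, with a shifted proposal g = N(4,1)) is an illustrative assumption, not from the slides; it shows why g must put mass where hf does:

```python
import math
import random

def phi(x, mu=0.0):
    """Normal(mu, 1) density."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def importance_sampling(h, f, g_density, g_sampler, n):
    """I_hat = (1/n) * sum h(y_i) * f(y_i) / g(y_i), with y_i ~ g."""
    total = 0.0
    for _ in range(n):
        y = g_sampler()
        total += h(y) * f(y) / g_density(y)
    return total / n

rng = random.Random(0)
# P(X > 4) for X ~ N(0,1) is about 3.2e-5: naive sampling from N(0,1) almost
# never produces a point past 4, but the shifted proposal g = N(4,1) does,
# and the weights f(y)/g(y) correct for the change of distribution.
est = importance_sampling(
    h=lambda y: 1.0 if y > 4.0 else 0.0,
    f=lambda y: phi(y),
    g_density=lambda y: phi(y, mu=4.0),
    g_sampler=lambda: rng.gauss(4.0, 1.0),
    n=100_000,
)
```

This proposal roughly mimics Rubinstein's optimal g* ∝ |h| f, which here concentrates its mass just past 4; a proposal with lighter tails than hf would make the weights, and hence the variance, blow up.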