  1. Modern Computational Statistics
     Lecture 4: Numerical Integration
     Cheng Zhang, School of Mathematical Sciences, Peking University
     September 23, 2019

  2. Overview
  ◮ Statistical inference often depends on intractable integrals $I(f) = \int_\Omega f(x)\,dx$
  ◮ This is especially true in Bayesian statistics, where the posterior distribution is usually non-trivial
  ◮ In some situations, the likelihood itself may depend on intractable integrals, so frequentist methods would also require numerical integration
  ◮ In this lecture, we start by discussing some simple numerical methods that can be easily used in low-dimensional problems
  ◮ Next, we will discuss several Monte Carlo strategies that can be implemented even when the dimension is high

  3. Newton-Cotes Quadrature
  ◮ Consider a one-dimensional integral of the form $I(f) = \int_a^b f(x)\,dx$
  ◮ A common strategy for approximating this integral is to use a tractable approximating function $\tilde{f}(x)$ that can be integrated easily
  ◮ We typically constrain the approximating function to agree with $f$ on a grid of points: $x_1, x_2, \ldots, x_n$

  4. Newton-Cotes Quadrature
  ◮ Newton-Cotes methods use equally-spaced grids
  ◮ The approximating function is a polynomial
  ◮ The integral is then approximated with a weighted sum $\hat{I} = \sum_{i=1}^{n} w_i f(x_i)$
  ◮ In the simplest case, we can use the Riemann rule by partitioning the interval $[a, b]$ into $n$ subintervals of length $h = \frac{b-a}{n}$; then $\hat{I}_L = h \sum_{i=0}^{n-1} f(a + ih)$. This is obtained using a piecewise constant function $\tilde{f}$ that matches $f$ at the left endpoint of each subinterval
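
To make the left Riemann rule concrete, here is a minimal Python sketch; the function name `riemann_left` and the test integrand are illustrative, not from the lecture.

```python
import numpy as np

def riemann_left(f, a, b, n):
    """Left Riemann rule: piecewise-constant approximation that matches f
    at the left endpoint of each of the n equal subintervals."""
    h = (b - a) / n
    x = a + h * np.arange(n)        # left endpoints a, a + h, ..., a + (n-1)h
    return h * np.sum(f(x))

# Example: integrate exp(x) over [0, 1]; the exact value is e - 1.
approx = riemann_left(np.exp, 0.0, 1.0, n=1000)
```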

  5. Newton-Cotes Quadrature
  ◮ Alternatively, the approximating function could agree with the integrand at the right or middle point of each subinterval: $\hat{I}_R = h \sum_{i=1}^{n} f(a + ih)$, $\hat{I}_M = h \sum_{i=0}^{n-1} f\left(a + (i + \tfrac{1}{2})h\right)$
  ◮ In either case, the approximating function is a zero-order polynomial
  ◮ To improve the approximation, we can use the trapezoidal rule, with a piecewise linear function that agrees with $f(x)$ at both ends of each subinterval: $\hat{I} = \frac{h}{2} f(a) + h \sum_{i=1}^{n-1} f(x_i) + \frac{h}{2} f(b)$
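
A sketch of the midpoint and trapezoidal rules in the same style; the function names are again illustrative.

```python
import numpy as np

def midpoint(f, a, b, n):
    """Midpoint rule: zero-order polynomial matching f at subinterval midpoints."""
    h = (b - a) / n
    return h * np.sum(f(a + h * (np.arange(n) + 0.5)))

def trapezoid(f, a, b, n):
    """Trapezoidal rule: piecewise-linear approximation agreeing with f
    at both endpoints of each subinterval."""
    h = (b - a) / n
    interior = a + h * np.arange(1, n)      # grid points x_1, ..., x_{n-1}
    return h / 2 * (f(a) + f(b)) + h * np.sum(f(interior))
```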

  6. Newton-Cotes Quadrature
  ◮ We can further improve the approximation by using higher-order polynomials
  ◮ Simpson's rule uses a quadratic approximation over each subinterval: $\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \frac{x_{i+1} - x_i}{6} \left[ f(x_i) + 4 f\left(\frac{x_i + x_{i+1}}{2}\right) + f(x_{i+1}) \right]$
  ◮ In general, we can use any polynomial of degree $k$
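
A composite Simpson's rule sketch that follows the formula above; the helper name is hypothetical.

```python
import numpy as np

def simpson(f, a, b, n):
    """Composite Simpson's rule: quadratic approximation on each of n subintervals."""
    h = (b - a) / n
    x = a + h * np.arange(n + 1)      # subinterval endpoints x_0, ..., x_n
    mid = (x[:-1] + x[1:]) / 2        # subinterval midpoints
    return h / 6 * np.sum(f(x[:-1]) + 4 * f(mid) + f(x[1:]))

# Simpson's rule is exact for cubics: integrating x**3 on [0, 1]
# returns 0.25 up to floating-point rounding.
approx = simpson(lambda x: x ** 3, 0.0, 1.0, n=10)
```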

  7. Gaussian Quadrature
  ◮ Newton-Cotes rules require equally spaced grids
  ◮ With a suitably flexible choice of $n + 1$ nodes $x_0, x_1, \ldots, x_n$ and corresponding weights $A_0, A_1, \ldots, A_n$, the sum $\sum_{i=0}^{n} A_i f(x_i)$ gives the exact integral for all polynomials of degree less than or equal to $2n + 1$
  ◮ This is called Gaussian quadrature, which is especially useful for integrals of the form $\int_a^b f(x) w(x)\,dx$, where $w(x)$ is a nonnegative function and $\int_a^b x^k w(x)\,dx < \infty$ for all $k \geq 0$
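
As a quick numerical illustration, assuming the weight $w(x) = 1$ on $[-1, 1]$ (the Gauss-Legendre case) and using NumPy's built-in nodes and weights:

```python
import numpy as np

# 5-node Gauss-Legendre quadrature is exact for polynomials up to degree 2*5 - 1 = 9.
nodes, weights = np.polynomial.legendre.leggauss(5)
exact = 2.0 / 9.0                        # \int_{-1}^{1} x^8 dx
approx = np.sum(weights * nodes ** 8)    # agrees with `exact` to machine precision
```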

  8. Orthogonal Functions
  ◮ In general, for square-integrable functions, i.e. $\int_a^b f(x)^2 w(x)\,dx < \infty$, denoted as $f \in L^2_{w,[a,b]}$, we define the inner product as $\langle f, g \rangle_{w,[a,b]} = \int_a^b f(x) g(x) w(x)\,dx$, where $f, g \in L^2_{w,[a,b]}$
  ◮ We say two functions are orthogonal if $\langle f, g \rangle_{w,[a,b]} = 0$. If $f$ and $g$ are also scaled so that $\langle f, f \rangle_{w,[a,b]} = 1$ and $\langle g, g \rangle_{w,[a,b]} = 1$, then $f$ and $g$ are orthonormal

  9. Orthogonal Polynomials
  ◮ We can define a sequence of orthogonal polynomials by a recursive rule $T_{k+1}(x) = (\alpha_{k+1} + \beta_{k+1} x) T_k(x) - \gamma_{k+1} T_{k-1}(x)$
  ◮ Example: Chebyshev polynomials (first kind). $T_0(x) = 1$, $T_1(x) = x$, $T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)$
  ◮ The $T_n(x)$ are orthogonal with respect to $w(x) = \frac{1}{\sqrt{1 - x^2}}$ on $[-1, 1]$: $\int_{-1}^{1} T_n(x) T_m(x) \frac{1}{\sqrt{1 - x^2}}\,dx = 0$ for all $n \neq m$
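
A small sketch of the Chebyshev recursion together with a numerical orthogonality check; the substitution $x = \cos\theta$ turns the weighted integral into an unweighted one. The function name is illustrative.

```python
import numpy as np

def chebyshev_T(k, x):
    """Evaluate the Chebyshev polynomial T_k(x) via the three-term recursion."""
    t_prev, t_curr = np.ones_like(x), x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

# Orthogonality check for T_2 and T_3: with x = cos(theta),
# \int_{-1}^{1} T_n T_m / sqrt(1 - x^2) dx = \int_0^{pi} T_n(cos t) T_m(cos t) dt.
theta = np.linspace(0.0, np.pi, 10001)
x = np.cos(theta)
inner = np.mean(chebyshev_T(2, x) * chebyshev_T(3, x)) * np.pi   # ~ 0
```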

  10. Orthogonal Polynomials
  ◮ In general, orthogonal polynomials are not unique, since $\langle f, g \rangle = 0$ implies $\langle cf, dg \rangle = 0$
  ◮ To make the orthogonal polynomials unique, we can use one of the following standardizations:
    make the polynomials orthonormal, $\langle T_j, T_j \rangle = 1$;
    or set the leading coefficient of $T_j(x)$ to 1
  ◮ Orthogonal polynomials form a basis for $L^2_{w,[a,b]}$, so any function in this space can be written as $f(x) = \sum_{n=0}^{\infty} a_n T_n(x)$, where $a_n = \frac{\langle f, T_n \rangle}{\langle T_n, T_n \rangle}$

  11. Gaussian Quadrature
  ◮ Let $\{T_n(x)\}_{n=0}^{\infty}$ be a sequence of orthogonal polynomials with respect to $w$ on $[a, b]$
  ◮ Denote the $n + 1$ roots of $T_{n+1}(x)$ by $a < x_0 < x_1 < \ldots < x_n < b$
  ◮ We can find weights $A_0, A_1, \ldots, A_n$ such that $\int_a^b P(x) w(x)\,dx = \sum_{i=0}^{n} A_i P(x_i)$ for all $\deg(P) \leq 2n + 1$
  ◮ To do that, we first show: there exist weights $A_0, A_1, \ldots, A_n$ such that $\int_a^b P(x) w(x)\,dx = \sum_{i=0}^{n} A_i P(x_i)$ for all $\deg(P) < n + 1$

  12. Gaussian Quadrature
  ◮ Sketch of proof. We only need to satisfy $\int_a^b x^k w(x)\,dx = \sum_{i=0}^{n} A_i x_i^k$ for all $k = 0, 1, \ldots, n$. This leads to a system of linear equations
    $\begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix} \begin{pmatrix} A_0 \\ A_1 \\ \vdots \\ A_n \end{pmatrix} = \begin{pmatrix} I_0 \\ I_1 \\ \vdots \\ I_n \end{pmatrix}$
    where $I_k = \int_a^b x^k w(x)\,dx$. The determinant of the coefficient matrix is a Vandermonde determinant, and is non-zero since $x_i \neq x_j$ for all $i \neq j$
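
A minimal sketch of this construction for the Legendre case ($w(x) = 1$ on $[-1, 1]$): the nodes are the roots of $P_{n+1}$, the moments $I_k$ have a closed form, and the weights come from solving the Vandermonde system. Variable names are illustrative.

```python
import numpy as np

n = 4
# Roots of the Legendre polynomial P_{n+1} (the quadrature nodes)
nodes = np.polynomial.legendre.legroots([0.0] * (n + 1) + [1.0])
# Coefficient matrix: row k holds x_0^k, x_1^k, ..., x_n^k (a Vandermonde matrix)
V = np.vander(nodes, increasing=True).T
# Moments I_k = \int_{-1}^{1} x^k dx: 2/(k+1) for even k, 0 for odd k
moments = np.array([2.0 / (k + 1) if k % 2 == 0 else 0.0 for k in range(n + 1)])
weights = np.linalg.solve(V, moments)
# Sanity check: these should match NumPy's built-in Gauss-Legendre weights,
# np.polynomial.legendre.leggauss(n + 1)[1]
```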

  13. Gaussian Quadrature
  ◮ Now we show that the above Gaussian quadrature is exact for polynomials of degree $\leq 2n + 1$
  ◮ Let $P(x)$ be a polynomial with $\deg(P) \leq 2n + 1$. There exist polynomials $g(x)$ and $r(x)$ such that $P(x) = g(x) T_{n+1}(x) + r(x)$ with $\deg(g) \leq n$ and $\deg(r) \leq n$. Since $T_{n+1}$ is orthogonal to all polynomials of degree $\leq n$ and vanishes at the nodes $x_i$, we have $\int_a^b P(x) w(x)\,dx = \int_a^b r(x) w(x)\,dx = \sum_{i=0}^{n} A_i r(x_i) = \sum_{i=0}^{n} A_i P(x_i)$

  14. Monte Carlo Method
  ◮ We now discuss the Monte Carlo method, mainly in the context of statistical inference
  ◮ As before, suppose we are interested in estimating $I(h) = \int_a^b h(x)\,dx$
  ◮ If we can draw iid samples $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$ uniformly from $(a, b)$, we can approximate the integral as $\hat{I}_n = (b - a) \frac{1}{n} \sum_{i=1}^{n} h(x^{(i)})$
  ◮ Note that we can think of the integral as $(b - a) \int_a^b h(x) \cdot \frac{1}{b - a}\,dx$, where $\frac{1}{b - a}$ is the density of $\mathrm{Uniform}(a, b)$
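
A minimal sketch of this plain Monte Carlo estimator; the function name and test integrand are illustrative.

```python
import numpy as np

def mc_integrate(h, a, b, n, seed=0):
    """Estimate \int_a^b h(x) dx with n iid Uniform(a, b) draws."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(a, b, size=n)
    return (b - a) * np.mean(h(x))

# Example: \int_0^1 exp(x) dx = e - 1 ~ 1.718, estimated from 10^5 samples
estimate = mc_integrate(np.exp, 0.0, 1.0, n=100_000)
```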

  15. Monte Carlo Method
  ◮ In general, we are interested in integrals of the form $\int_{\mathcal{X}} h(x) f(x)\,dx$, where $f(x)$ is a probability density function
  ◮ Analogous to the above argument, we can approximate this integral (or expectation) by drawing iid samples $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$ from the density $f(x)$ and then $\hat{I}_n = \frac{1}{n} \sum_{i=1}^{n} h(x^{(i)})$
  ◮ Based on the law of large numbers, we know that $\hat{I}_n \xrightarrow{p} I$ as $n \to \infty$
  ◮ And based on the central limit theorem, $\sqrt{n}(\hat{I}_n - I) \to N(0, \sigma^2)$, where $\sigma^2 = \mathrm{Var}(h(X))$
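
For instance, taking $f$ to be the standard normal density and $h(x) = x^2$, so that the true value is $\mathrm{E}[X^2] = 1$, a short sketch of the estimator and its CLT-based standard error (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)            # iid draws from f = N(0, 1)
h_vals = samples ** 2                                # h(x) = x^2, true mean is 1
estimate = np.mean(h_vals)                           # Monte Carlo estimate of E[h(X)]
std_err = np.std(h_vals) / np.sqrt(h_vals.size)      # CLT-based standard error
```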

  16. Example: estimating π
  ◮ Let $h(x) = \mathbf{1}_{B(0,1)}(x)$; then $\pi = 4 \int_{[-1,1]^2} h(x) \cdot \frac{1}{4}\,dx$
  ◮ Monte Carlo estimate of $\pi$: $\hat{I}_n = \frac{4}{n} \sum_{i=1}^{n} \mathbf{1}_{B(0,1)}(x^{(i)})$, where $x^{(i)} \sim \mathrm{Uniform}([-1, 1]^2)$
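
A direct implementation of this estimator (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.uniform(-1.0, 1.0, size=(n, 2))        # x^(i) ~ Uniform([-1, 1]^2)
inside = np.sum(x ** 2, axis=1) <= 1.0          # indicator of the unit ball B(0, 1)
pi_hat = 4.0 * np.mean(inside)                  # Monte Carlo estimate of pi
std_err = 4.0 * np.std(inside) / np.sqrt(n)     # CLT-based standard error
```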

  17. Example: estimating π (figure slide)

  18. Monte Carlo vs Quadrature
  ◮ Convergence rate for Monte Carlo: $O(n^{-1/2})$, since $p\left(|\hat{I}_n - I| \leq \frac{\sigma}{\sqrt{n\delta}}\right) \geq 1 - \delta$ for all $\delta$; this is often slower than quadrature methods ($O(n^{-2})$ or better)
  ◮ However, the convergence rate of Monte Carlo does not depend on dimensionality
  ◮ On the other hand, quadrature methods are difficult to extend to multidimensional problems because of the curse of dimensionality: the actual convergence rate becomes $O(n^{-k/d})$ for any order-$k$ method in dimension $d$
  ◮ This makes the Monte Carlo strategy very attractive for high-dimensional problems

  19. Exact Simulation
  ◮ Monte Carlo methods require sampling a set of points chosen randomly from a probability distribution
  ◮ For a simple distribution $f(x)$ whose inverse cumulative distribution function (CDF) exists, we can sample $x$ from $f$ as follows: $x = F^{-1}(u)$, $u \sim \mathrm{Uniform}(0, 1)$, where $F^{-1}$ is the inverse CDF of $f$
  ◮ Proof. $p(a \leq x \leq b) = p(F(a) \leq u \leq F(b)) = F(b) - F(a)$
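
A generic inverse-CDF sampler in Python; the function name is hypothetical.

```python
import numpy as np

def inverse_cdf_sample(F_inv, n, seed=0):
    """Draw n samples from the distribution whose inverse CDF is F_inv."""
    u = np.random.default_rng(seed).uniform(size=n)
    return F_inv(u)

# Example: Exponential(1) draws via F^{-1}(u) = -log(1 - u), see the next slide
exp_draws = inverse_cdf_sample(lambda u: -np.log(1 - u), n=10_000)
```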

  20. Examples
  ◮ Exponential distribution: $f(x) = \theta \exp(-\theta x)$. The CDF is $F(a) = \int_0^a \theta \exp(-\theta x)\,dx = 1 - \exp(-\theta a)$; therefore $x = F^{-1}(u) = -\frac{1}{\theta} \log(1 - u) \sim f(x)$. Since $1 - u$ also follows the uniform distribution, we often use $x = -\frac{1}{\theta} \log(u)$ instead
  ◮ Normal distribution: $f(x) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{x^2}{2})$. Box-Muller transform: $X = \sqrt{-2 \log U_1} \cos(2\pi U_2)$, $Y = \sqrt{-2 \log U_1} \sin(2\pi U_2)$, where $U_1 \sim \mathrm{Uniform}(0, 1)$, $U_2 \sim \mathrm{Uniform}(0, 1)$
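
Both examples translated into a short NumPy sketch; the parameter value for θ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Exponential(theta) via the inverse CDF: x = -log(u) / theta, u ~ Uniform(0, 1)
theta = 2.0
x_exp = -np.log(rng.uniform(size=n)) / theta

# Standard normal pairs via the Box-Muller transform
u1, u2 = rng.uniform(size=n), rng.uniform(size=n)
x_norm = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
y_norm = np.sqrt(-2.0 * np.log(u1)) * np.sin(2.0 * np.pi * u2)
```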
