 
              Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
Outline 1 Introduction 2 Newton–Cotes quadrature 3 Gaussian quadrature 4 Laplace approximation 5 Conclusion 2 / 29
Motivation While many statistics problems rely on optimization, there are also some that require numerical integration. Bayesian statistics is almost exclusively integration. data admits a likelihood function L ( θ ); θ unknown, so assign it a weight function π ( θ ); combine prior and data using Bayes’s formula L ( θ ) π ( θ ) π ( θ | x ) = Θ L ( θ ′ ) π ( θ ′ ) d θ ′ . � Need to compute probabilities and expectations — integrals! Some “non-Bayesian” problems may involve integration, e.g., random- or mixed-effects models. Other approaches besides Bayesian and frequentist... 3 / 29
Intuition There are a number of classical numerical integration techniques, simple and powerful. Think back to calculus class where integral was defined: approximate function by a constant on small intervals; compute area of rectangles and sum them up; integral defined as the limit of this sum as mesh → 0. Numerical integration, or quadrature , is based on this definition and refinements thereof. Basic principle: 3 approximate the function on a small interval by a nice one that you know how to integrate. Works well for one- or two-dim integrals; for higher-dim integrals, other tools are needed. 3 This was essentially the same principle that motivated the various methods we discussed for optimization! 4 / 29
Notation Suppose that f ( x ) is a function that we’d like to integrate over an interval [ a , b ]. Take n relatively large and set h = ( b − a ) / n . Let x i = a + ih , i = 0 , . . . , n − 1. Key point: if f ( x ) is “nice,” then it can be approximated by a simple function on the small interval [ x i , x i +1 ]. A general strategy is to approximate the integral by � b n − 1 � x i +1 n − 1 m � � � f ( x ) dx = f ( x ) ≈ A ij f ( x ij ) , a x i i =0 i =0 j =0 for appropriately chosen A ij ’s and m . 5 / 29
Outline 1 Introduction 2 Newton–Cotes quadrature 3 Gaussian quadrature 4 Laplace approximation 5 Conclusion 6 / 29
Polynomial approximation Consider the following sequence of polynomials: x − x ik � p ij ( x ) = , j = 0 , . . . , m . x ij − x ik k � = j Then m � p i ( x ) = p ij ( x ) f ( x ij ) j =0 is an m th degree polynomial that interpolates f ( x ) at the nodes x i 0 , . . . , x im . Furthermore, � x i +1 � x i +1 � x i +1 m � f ( x ) dx ≈ p ( x ) dx = p ij ( x ) dx f ( x ij ) . x i x i x i j =0 � �� � A ij 7 / 29
Riemann rule: m = 0 Approximate f ( x ) on [ x i , x i +1 ] by a constant. Here x i 0 = x i and p i 0 ( x ) ≡ 1, so � b n − 1 n − 1 � � f ( x ) dx ≈ f ( x i )( x i +1 − x i ) = h f ( x i ) . a i =0 i =0 Features of Riemann’s rule: Very easy to program — only need f ( x 0 ) , . . . , f ( x n ). Can be slow to converge, i.e., lots of x i ’s may be needed to get a good approximation. 8 / 29
Trapezoid rule: m = 1 Approximate f ( x ) on [ x i , x i +1 ] by a linear function. In this case: x i 0 = x i and x i 1 = x i +1 . A i 0 = A i 1 = ( x i +1 − x i ) / 2 = h / 2. Therefore, � b n − 1 f ( x ) dx ≈ h � � � f ( x i ) + f ( x i +1 ) . 2 a i =0 Still only requires function evaluations at the x i ’s. More accurate then Riemann because the linear approximation is more flexible than constant. Can derive bounds on the approximation error... 9 / 29
Trapezoid rule (cont.) A general tool which we can use to study the precision of the trapezoid rule is the Euler–Maclaurin formula. Suppose that g ( x ) is twice differentiable; then � n n � � � n g ( t ) dt + 1 2 { g (0) + g ( n ) } + C 1 g ′ ( t ) g ( t ) ≈ 0 , 0 t =0 � n 0 | g ′′ ( t ) | dt . where | LHS − RHS | ≤ C 2 How does this help? Trapezoid rule is � 1 � 2 g (0) + g (1) + · · · + g ( n − 1) + 1 T ( h ) := h 2 g ( n ) where g ( x ) = f ( a + h t ). 10 / 29
Trapezoid rule (cont.) Apply Euler–Maclaurin to T ( h ): n � g ( t ) − h T ( h ) = h 2 { g (0) + g ( n ) } t =0 � n � � � � n g ′ ( t ) ≈ h g ( t ) dt + h C 1 0 0 � b 1 h f ( x ) dx + h C 1 { hf ′ ( b ) − hf ′ ( a ) } . = h a Therefore, � b � � � = O ( h 2 ) , � � � T ( h ) − f ( x ) dx h → 0 . a 11 / 29
Trapezoid rule (cont.) Can trapezoid error O ( h 2 ) be improved? Our derivation above is not quite precise; the next smallest term in the expansion is O ( h 4 ). Romberg recognized that a manipulation of T ( h ) will cancel the O ( h 2 ) term, leaving only the O ( h 4 ) term! Romberg’s rule is � b 4 T ( h 2 ) − T ( h ) f ( x ) dx + O ( h 4 ) , = h → 0 . 3 a Can be iterated to improve further; see Sec. 5.2 in G&H. 12 / 29
Simpson rule: m = 2 Approximate f ( x ) on [ x i , x i +1 ] by a quadratic function. Similar arguments as above gives the x i ’s and A ij ’s. Simpson’s rule approximation is � b n − 1 � � x i + x i +1 � � f ( x ) dx ≈ h � f ( x i ) + 4 f + f ( x i +1 ) . 6 2 a i =0 More accurate than the trapezoid rule — error is O ( n − 4 ). If n is taken to be even, then the formula simplifies a bit; see Equation (5.20) in G&H and my R code. 13 / 29
Remarks This approach works for generic m and the approximation improves as m increases. Can be extended to functions of more than one variable, but details get complicated real fast. In R, integrate does one-dimensional integration. Numerical methods and corresponding software work very well, but care is still needed — see Section 5.4 in G&H. 14 / 29
Example: Bayesian analysis of binomial Suppose X ∼ Bin( n , θ ) with n known and θ unknown. Prior for θ is the so-called semicircle distribution with density π ( θ ) = 8 π − 1 � 1 2 ) 2 � 1 / 2 , 4 − ( θ − 1 θ ∈ [0 , 1] . The posterior density is then θ x (1 − θ ) n − x � 1 2 ) 2 � 1 / 2 4 − ( θ − 1 π x ( θ ) = . � 1 0 u x (1 − u ) n − x � 1 2 ) 2 � 1 / 2 du 4 − ( θ − 1 Calculating the Bayes estimate of θ , the posterior mean, requires a numerical integration. 15 / 29
Example: mixture densities Mixture distributions are very common models, flexible. Useful for density estimation and heavy-tailed modeling. General mixture model looks like � p ( y ) = k ( y | x ) f ( x ) dx , where kernel k ( y | x ) is a pdf (or pmf) in y for each x f ( x ) is a pdf (or pmf). Easy to check that p ( y ) is a pdf (or pmf) depending on k . Evaluation of p ( y ) requires integration for each y . 16 / 29
Example 5.1 in G&H Generalized linear mixed model: � i = 1 , . . . , n ind λ ij = e γ i e β 0 + β 1 j , Y ij ∼ Pois( λ ij ) , j = 1 , . . . , J iid ∼ N(0 , σ 2 where γ 1 , . . . , γ n γ ). Model parameters are ( β 0 , β 1 , σ 2 γ ). Marginal likelihood for θ = ( β 0 , β 1 , σ 2 γ ) is n J � � � Pois( Y ij | e γ i e β 0 + β 1 j )N( γ i | 0 , σ 2 L ( θ ) = γ ) d γ i . i =1 j =1 Goal is to maximize over θ ... 17 / 29
Example 5.1 in G&H (cont.) Taking log we get n J � � � Pois( Y ij | e γ i e β 0 + β 1 j )N( γ i | 0 , σ 2 ℓ ( θ ) = log γ ) d γ i . i =1 j =1 � �� � L i ( θ ) ∂ G&H consider evaluating ∂β 1 L 1 ( θ ), or � � J � � j ( Y 1 j − e γ 1 e β 0 + β 1 j ) j =1 � J � � Pois( Y 1 j | e γ 1 e β 0 + β 1 j ) N( γ 1 | 0 , σ 2 × γ ) d γ 1 . j =1 Reproduce Tables 5.2–5.4 using R codes. 18 / 29
Outline 1 Introduction 2 Newton–Cotes quadrature 3 Gaussian quadrature 4 Laplace approximation 5 Conclusion 19 / 29
Very brief summary Gaussian quadrature is an alternative Newton–Cotes methods. Useful primarily in problems where integration is with respect to a “non-uniform” measure, e.g., an expectation. Basic idea is that the measure identifies a sequence of “orthogonal polynomials.” Approximations of f via these polynomials turns out to be better than Newton–Cotes approximations, at least as far as integration is concerned. Book gives minimal details, and we won’t get into it here. 20 / 29
Outline 1 Introduction 2 Newton–Cotes quadrature 3 Gaussian quadrature 4 Laplace approximation 5 Conclusion 21 / 29
Setup The Laplace approximation is a tool that allows us to approximate certain integrals — based on optimization! The type of integrals to be considered are � b f ( x ) e ng ( x ) dx , J n := n → ∞ a endpoints a < b can be finite or infinite; f and g are sufficiently nice functions; g has a unique maximizer ˆ x = arg max g ( x ) in interior of ( a , b ). Claim: when n is large, the major contribution to the integral x , the maximizer of g . 4 comes from a neighborhood of ˆ 4 For a proof of this claim, see Section 4.7 in Lange. 22 / 29
Formula Assuming the “claim,” it suffices to restrict the range of integration to a small neighborhood around ˆ x , where x ) 2 . + 1 g ( x ) ≈ g (ˆ x ) + ˙ g (ˆ x )( x − ˆ x ) 2 ¨ g (ˆ x )( x − ˆ � �� � =0 Plug this into integral: � x )+ 1 x ) 2 } dx f ( x ) e n { g (ˆ 2 n ¨ g (ˆ x )( x − ˆ J n ≈ nbhd � x ) 2 dx . f ( x ) e − 1 = e ng (ˆ x ) 2 [ − n ¨ g (ˆ x )]( x − ˆ nbhd 23 / 29
Recommend
More recommend