SLIDE 1

Stat 451 Lecture Notes 03: Numerical Integration [1][2]

Ryan Martin, UIC
www.math.uic.edu/~rgmartin

[1] Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange.
[2] Updated: February 11, 2016.

SLIDE 2

Outline

1. Introduction
2. Newton–Cotes quadrature
3. Gaussian quadrature
4. Laplace approximation
5. Conclusion

SLIDE 3

Motivation

While many statistics problems rely on optimization, there are also some that require numerical integration. Bayesian statistics is almost exclusively integration.

- Data admits a likelihood function L(θ).
- θ is unknown, so assign it a weight function π(θ).
- Combine prior and data using Bayes's formula:

$$\pi(\theta \mid x) = \frac{L(\theta)\,\pi(\theta)}{\int_\Theta L(\theta')\,\pi(\theta')\,d\theta'}.$$

Need to compute probabilities and expectations — integrals!

Some “non-Bayesian” problems may involve integration, e.g., random- or mixed-effects models. Other approaches besides Bayesian and frequentist...

SLIDE 4

Intuition

There are a number of classical numerical integration techniques, simple and powerful. Think back to calculus class, where the integral was defined:

- approximate the function by a constant on small intervals;
- compute the areas of the rectangles and sum them up;
- the integral is defined as the limit of this sum as the mesh → 0.

Numerical integration, or quadrature, is based on this definition and refinements thereof. Basic principle [3]: approximate the function on a small interval by a nice one that you know how to integrate. Works well for one- or two-dimensional integrals; for higher-dimensional integrals, other tools are needed.

[3] This was essentially the same principle that motivated the various methods we discussed for optimization!

SLIDE 5

Notation

Suppose that f(x) is a function that we'd like to integrate over an interval [a, b].

Take n relatively large and set h = (b − a)/n. Let x_i = a + ih, i = 0, 1, …, n. Key point: if f(x) is "nice," then it can be approximated by a simple function on the small interval [x_i, x_{i+1}]. A general strategy is to approximate the integral by

$$\int_a^b f(x)\,dx = \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} f(x)\,dx \approx \sum_{i=0}^{n-1} \sum_{j=0}^{m} A_{ij}\, f(x_{ij}),$$

for appropriately chosen A_{ij}'s and m.

SLIDE 6

Outline

1. Introduction
2. Newton–Cotes quadrature
3. Gaussian quadrature
4. Laplace approximation
5. Conclusion

SLIDE 7

Polynomial approximation

Consider the following sequence of polynomials:

$$p_{ij}(x) = \prod_{k \neq j} \frac{x - x_{ik}}{x_{ij} - x_{ik}}, \quad j = 0, \dots, m.$$

Then

$$p_i(x) = \sum_{j=0}^{m} p_{ij}(x)\, f(x_{ij})$$

is an mth-degree polynomial that interpolates f(x) at the nodes x_{i0}, …, x_{im}. Furthermore,

$$\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \int_{x_i}^{x_{i+1}} p_i(x)\,dx = \sum_{j=0}^{m} \underbrace{\int_{x_i}^{x_{i+1}} p_{ij}(x)\,dx}_{A_{ij}}\, f(x_{ij}).$$
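To make the A_{ij}'s concrete, here is a small R sketch (my own illustration, not the course's posted code; the function names are made up): it evaluates the Lagrange basis polynomials numerically and integrates them to get the weights.

```r
# Sketch: j-th Lagrange basis polynomial for the given nodes, and the
# quadrature weights A_j = integral of that basis polynomial over [lo, hi]
lagrange_basis <- function(x, nodes, j) {
  others <- nodes[-j]
  sapply(x, function(xx) prod((xx - others) / (nodes[j] - others)))
}

quad_weights <- function(nodes, lo, hi) {
  sapply(seq_along(nodes), function(j)
    integrate(lagrange_basis, lo, hi, nodes = nodes, j = j)$value)
}

# Nodes {0, 1/2, 1} on [0, 1] recover Simpson's weights 1/6, 4/6, 1/6
quad_weights(c(0, 0.5, 1), 0, 1)
```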

SLIDE 8

Riemann rule: m = 0

Approximate f(x) on [x_i, x_{i+1}] by a constant. Here x_{i0} = x_i and p_{i0}(x) ≡ 1, so

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n-1} f(x_i)(x_{i+1} - x_i) = h \sum_{i=0}^{n-1} f(x_i).$$

Features of Riemann's rule:

- Very easy to program: only need f(x_0), …, f(x_n); see the sketch below.
- Can be slow to converge, i.e., lots of x_i's may be needed to get a good approximation.
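A minimal R sketch of the left-endpoint Riemann rule (the function name is my own, not from the posted course code):

```r
# Left-endpoint Riemann rule on [a, b] with n subintervals
riemann <- function(f, a, b, n) {
  h <- (b - a) / n
  x <- a + (0:(n - 1)) * h    # left endpoints x_0, ..., x_{n-1}
  h * sum(f(x))
}

riemann(sin, 0, pi, 1e4)      # true value is 2
```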

SLIDE 9

Trapezoid rule: m = 1

Approximate f(x) on [x_i, x_{i+1}] by a linear function. In this case:

- x_{i0} = x_i and x_{i1} = x_{i+1};
- A_{i0} = A_{i1} = (x_{i+1} − x_i)/2 = h/2.

Therefore,

$$\int_a^b f(x)\,dx \approx \frac{h}{2} \sum_{i=0}^{n-1} \{f(x_i) + f(x_{i+1})\}.$$

- Still only requires function evaluations at the x_i's.
- More accurate than Riemann because the linear approximation is more flexible than a constant.
- Can derive bounds on the approximation error...
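A matching R sketch of the composite trapezoid rule (again an illustration, under the same assumed interface as above):

```r
# Composite trapezoid rule on a uniform grid over [a, b]
trapezoid <- function(f, a, b, n) {
  h <- (b - a) / n
  fx <- f(a + (0:n) * h)
  h * (sum(fx) - (fx[1] + fx[n + 1]) / 2)   # halve the two endpoint terms
}

trapezoid(sin, 0, pi, 100)    # close to 2
```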

SLIDE 10

Trapezoid rule (cont.)

A general tool which we can use to study the precision of the trapezoid rule is the Euler–Maclaurin formula. Suppose that g is twice differentiable; then

$$\sum_{t=0}^{n} g(t) \approx \int_0^n g(t)\,dt + \frac{1}{2}\{g(0) + g(n)\} + C_1\, g'(t)\Big|_0^n,$$

where $|\text{LHS} - \text{RHS}| \le C_2 \int_0^n |g''(t)|\,dt$.

How does this help? The trapezoid rule is

$$T(h) := h\Big\{\tfrac{1}{2}g(0) + g(1) + \cdots + g(n-1) + \tfrac{1}{2}g(n)\Big\}, \quad \text{where } g(t) = f(a + ht).$$

SLIDE 11

Trapezoid rule (cont.)

Apply Euler–Maclaurin to T(h):

$$T(h) = h\sum_{t=0}^{n} g(t) - \frac{h}{2}\{g(0) + g(n)\} \approx h\int_0^n g(t)\,dt + h\,C_1\, g'(t)\Big|_0^n = \int_a^b f(x)\,dx + h\,C_1\{h f'(b) - h f'(a)\},$$

using the substitution x = a + ht, which gives $h\int_0^n g(t)\,dt = \int_a^b f(x)\,dx$ and $g'(t) = h f'(a + ht)$.

Therefore,

$$\Big| T(h) - \int_a^b f(x)\,dx \Big| = O(h^2), \quad h \to 0.$$
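A quick empirical check of the O(h²) rate, using the trapezoid() sketch from the earlier slide: halving h should cut the error by about a factor of 4.

```r
# Error of the trapezoid rule for sin on [0, pi], whose integral is 2
err <- function(n) abs(trapezoid(sin, 0, pi, n) - 2)
err(100) / err(200)           # roughly 4, consistent with O(h^2)
```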

SLIDE 12

Trapezoid rule (cont.)

Can the trapezoid error O(h²) be improved? Our derivation above is not quite precise; the next smallest term in the expansion is O(h⁴). Romberg recognized that a manipulation of T(h) will cancel the O(h²) term, leaving only the O(h⁴) term! Romberg's rule is

$$\frac{4\,T(h/2) - T(h)}{3} = \int_a^b f(x)\,dx + O(h^4), \quad h \to 0.$$

Can be iterated to improve further; see Sec. 5.2 in G&H.
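One step of Romberg's rule is easy to sketch on top of the trapezoid() function above (names assumed, as before):

```r
# One Romberg extrapolation step: cancel the O(h^2) term of the trapezoid rule
romberg_step <- function(f, a, b, n) {
  (4 * trapezoid(f, a, b, 2 * n) - trapezoid(f, a, b, n)) / 3
}

abs(romberg_step(sin, 0, pi, 100) - 2)   # error is now O(h^4)
```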

SLIDE 13

Simpson rule: m = 2

Approximate f(x) on [x_i, x_{i+1}] by a quadratic function. Similar arguments as above give the x_{ij}'s and A_{ij}'s. Simpson's rule approximation is

$$\int_a^b f(x)\,dx \approx \frac{h}{6} \sum_{i=0}^{n-1} \Big\{ f(x_i) + 4 f\Big(\frac{x_i + x_{i+1}}{2}\Big) + f(x_{i+1}) \Big\}.$$

- More accurate than the trapezoid rule; error is $O(n^{-4})$.
- If n is taken to be even, then the formula simplifies a bit; see Equation (5.20) in G&H and my R code. A generic sketch follows below.
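A sketch of the composite Simpson rule exactly as displayed above, using the midpoint of each subinterval (my illustration; the course R code has the version tied to Equation (5.20)):

```r
# Composite Simpson's rule with subinterval midpoints
simpson <- function(f, a, b, n) {
  h <- (b - a) / n
  x <- a + (0:n) * h
  mid <- (x[-1] + x[-(n + 1)]) / 2       # midpoints (x_i + x_{i+1}) / 2
  (h / 6) * sum(f(x[-(n + 1)]) + 4 * f(mid) + f(x[-1]))
}

abs(simpson(sin, 0, pi, 100) - 2)        # much smaller error than trapezoid
```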

SLIDE 14

Remarks

This approach works for generic m, and the approximation improves as m increases. It can be extended to functions of more than one variable, but the details get complicated very quickly. In R, integrate does one-dimensional integration. Numerical methods and the corresponding software work very well, but care is still needed; see Section 5.4 in G&H.
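For example, R's built-in adaptive quadrature:

```r
# One-dimensional adaptive quadrature in base R
integrate(dnorm, -Inf, Inf)   # returns 1 with an absolute error estimate
```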

SLIDE 15

Example: Bayesian analysis of binomial

Suppose X ∼ Bin(n, θ) with n known and θ unknown. Prior for θ is the so-called semicircle distribution with density

$$\pi(\theta) = \frac{8}{\pi}\Big\{\tfrac{1}{4} - \big(\theta - \tfrac{1}{2}\big)^2\Big\}^{1/2}, \quad \theta \in [0, 1].$$

The posterior density is then

$$\pi_x(\theta) = \frac{\theta^x (1-\theta)^{n-x} \big\{\tfrac{1}{4} - (\theta - \tfrac{1}{2})^2\big\}^{1/2}}{\int_0^1 u^x (1-u)^{n-x} \big\{\tfrac{1}{4} - (u - \tfrac{1}{2})^2\big\}^{1/2}\, du}.$$

Calculating the Bayes estimate of θ, the posterior mean, requires a numerical integration.
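A hedged sketch of that computation with integrate() (an illustration, not the notes' posted code): the posterior mean is a ratio of two one-dimensional integrals.

```r
# Posterior mean E(theta | x) for the binomial/semicircle model
post_mean <- function(x, n) {
  num <- function(u) u^(x + 1) * (1 - u)^(n - x) * sqrt(0.25 - (u - 0.5)^2)
  den <- function(u) u^x * (1 - u)^(n - x) * sqrt(0.25 - (u - 0.5)^2)
  integrate(num, 0, 1)$value / integrate(den, 0, 1)$value
}

post_mean(x = 7, n = 10)      # compare with the MLE, 0.7
```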

SLIDE 16

Example: mixture densities

Mixture distributions are very common, flexible models, useful for density estimation and heavy-tailed modeling. The general mixture model looks like

$$p(y) = \int k(y \mid x)\, f(x)\, dx,$$

where

- the kernel k(y | x) is a pdf (or pmf) in y for each x;
- f(x) is a pdf (or pmf).

Easy to check that p(y) is a pdf (or pmf), depending on k. Evaluation of p(y) requires an integration for each y.
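As a concrete (hypothetical) illustration: a normal scale mixture with an Exp(1) mixing density, evaluated with one integrate() call per y.

```r
# p(y) = integral over x > 0 of N(y | 0, x) * Exp(x | 1) dx
p_mix <- function(y) {
  sapply(y, function(yy)
    integrate(function(x) dnorm(yy, 0, sqrt(x)) * dexp(x, 1), 0, Inf)$value)
}

p_mix(c(0.5, 2, 5))           # heavier tails than a standard normal
```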

SLIDE 17

Example 5.1 in G&H

Generalized linear mixed model:

$$Y_{ij} \overset{\text{ind}}{\sim} \text{Pois}(\lambda_{ij}), \qquad \lambda_{ij} = e^{\gamma_i} e^{\beta_0 + \beta_1 j}, \qquad i = 1, \dots, n, \;\; j = 1, \dots, J,$$

where γ_1, …, γ_n are iid N(0, σ_γ²).

Model parameters are θ = (β_0, β_1, σ_γ²). The marginal likelihood for θ is

$$L(\theta) = \prod_{i=1}^{n} \int \prod_{j=1}^{J} \text{Pois}(Y_{ij} \mid e^{\gamma_i} e^{\beta_0 + \beta_1 j})\, N(\gamma_i \mid 0, \sigma_\gamma^2)\, d\gamma_i.$$

Goal is to maximize over θ...

SLIDE 18

Example 5.1 in G&H (cont.)

Taking log we get ℓ(θ) =

n

  • i=1

log

  • J
  • j=1

Pois(Yij | eγieβ0+β1j)N(γi | 0, σ2

γ) dγi

  • Li(θ)

. G&H consider evaluating

∂ ∂β1 L1(θ), or

J

  • j=1

j(Y1j − eγ1eβ0+β1j)

  • ×

J

  • j=1

Pois(Y1j | eγ1eβ0+β1j)

  • N(γ1 | 0, σ2

γ) dγ1.

Reproduce Tables 5.2–5.4 using R codes.
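A sketch of how one of the L_i(θ)'s could be evaluated by quadrature (hypothetical data and names, not the code behind Tables 5.2–5.4):

```r
# L_1(theta): integrate the Poisson product times the normal density over gamma_1
L1 <- function(beta0, beta1, sig2, y1) {
  J <- length(y1)
  integrand <- function(g)                 # g = gamma_1; vectorized over g
    sapply(g, function(gg) {
      lam <- exp(gg + beta0 + beta1 * (1:J))
      prod(dpois(y1, lam)) * dnorm(gg, 0, sqrt(sig2))
    })
  integrate(integrand, -Inf, Inf)$value
}

L1(beta0 = 0.5, beta1 = 0.1, sig2 = 1, y1 = c(2, 3, 1, 4, 2))
```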

SLIDE 19

Outline

1. Introduction
2. Newton–Cotes quadrature
3. Gaussian quadrature
4. Laplace approximation
5. Conclusion

SLIDE 20

Very brief summary

Gaussian quadrature is an alternative to Newton–Cotes methods. It is useful primarily in problems where integration is with respect to a "non-uniform" measure, e.g., an expectation. The basic idea is that the measure identifies a sequence of "orthogonal polynomials." Approximations of f via these polynomials turn out to be better than Newton–Cotes approximations, at least as far as integration is concerned. The book gives minimal details, and we won't get into it here.

SLIDE 21

Outline

1. Introduction
2. Newton–Cotes quadrature
3. Gaussian quadrature
4. Laplace approximation
5. Conclusion

SLIDE 22

Setup

The Laplace approximation is a tool that allows us to approximate certain integrals, based on optimization! The type of integrals to be considered are

$$J_n := \int_a^b f(x)\, e^{n g(x)}\, dx, \quad n \to \infty,$$

where

- the endpoints a < b can be finite or infinite;
- f and g are sufficiently nice functions;
- g has a unique maximizer x̂ = arg max g(x) in the interior of (a, b).

Claim: when n is large, the major contribution to the integral comes from a neighborhood of x̂, the maximizer of g. [4]

[4] For a proof of this claim, see Section 4.7 in Lange.

SLIDE 23

Formula

Assuming the “claim,” it suffices to restrict the range of integration to a small neighborhood around x̂, where

$$g(x) \approx g(\hat{x}) + \underbrace{\dot{g}(\hat{x})}_{=0}(x - \hat{x}) + \frac{1}{2}\ddot{g}(\hat{x})(x - \hat{x})^2.$$

Plug this into the integral:

$$J_n \approx \int_{\text{nbhd}} f(x)\, e^{n\{g(\hat{x}) + \frac{1}{2}\ddot{g}(\hat{x})(x - \hat{x})^2\}}\, dx = e^{n g(\hat{x})} \int_{\text{nbhd}} f(x)\, e^{-\frac{1}{2}[-n\ddot{g}(\hat{x})](x - \hat{x})^2}\, dx.$$

SLIDE 24

Formula (cont.)

From the previous slide:

$$J_n \approx e^{n g(\hat{x})} \int_{\text{nbhd}} f(x)\, e^{-\frac{1}{2}[-n\ddot{g}(\hat{x})](x - \hat{x})^2}\, dx.$$

Two observations:

- since x̂ is a maximizer, g̈(x̂) < 0;
- on a small nbhd, f(x) ≈ f(x̂).

Therefore,

$$J_n \approx f(\hat{x})\, e^{n g(\hat{x})} \int_{\text{nbhd}} e^{-\frac{1}{2}[-n\ddot{g}(\hat{x})](x - \hat{x})^2}\, dx \approx (2\pi)^{1/2} f(\hat{x})\, e^{n g(\hat{x})}\, \{-n\ddot{g}(\hat{x})\}^{-1/2}.$$
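The whole recipe fits in a few lines of R. A minimal sketch, assuming a bounded search interval for x̂ and using a numerical second derivative (all names here are my own, not from the notes):

```r
# Laplace approximation to J_n = integral of f(x) * exp(n * g(x)) dx
laplace <- function(f, g, n, lower, upper, eps = 1e-4) {
  xhat <- optimize(g, c(lower, upper), maximum = TRUE)$maximum
  g2 <- (g(xhat + eps) - 2 * g(xhat) + g(xhat - eps)) / eps^2  # g''(xhat) < 0
  sqrt(2 * pi) * f(xhat) * exp(n * g(xhat)) / sqrt(-n * g2)
}

# e.g., with g(x) = log(x) - x and f = 1 (Stirling, next slide), n = 10:
laplace(function(x) 1, function(x) log(x) - x, n = 10, 0.1, 10)
```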

SLIDE 25

Example: Stirling’s formula

Stirling's formula is a useful approximation of factorials. It starts by writing the factorial as a gamma function:

$$n! = \Gamma(n + 1) = \int_0^\infty z^n e^{-z}\, dz.$$

Make a change of variable x = z/n to get

$$n! = n^{n+1} \int_0^\infty e^{n g(x)}\, dx, \qquad g(x) = \log x - x.$$

g(x) has maximizer x̂ = 1 in the interior of (0, ∞). For large n, the Laplace approximation gives

$$n! \approx n^{n+1} \cdot (2\pi)^{1/2} e^{n g(1)} \{-n\ddot{g}(1)\}^{-1/2} = (2\pi)^{1/2}\, n^{n + 1/2}\, e^{-n}.$$
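A quick numerical check of Stirling's formula (my own snippet):

```r
# Stirling approximation vs. the exact factorial
stirling <- function(n) sqrt(2 * pi) * n^(n + 0.5) * exp(-n)
stirling(10) / factorial(10)   # about 0.992, i.e., under 1% relative error
```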

SLIDE 26

Example: Bayesian posterior expectations

Recall the Bayesian ingredients:

- L(θ) is the likelihood based on n iid samples;
- π(θ) is a prior density.

Then a posterior expectation looks like

$$E\{h(\theta) \mid \text{data}\} = \frac{\int h(\theta)\, L(\theta)\, \pi(\theta)\, d\theta}{\int L(\theta)\, \pi(\theta)\, d\theta}.$$

When n is large, applying Laplace to both the numerator and the denominator gives E{h(θ) | data} ≈ h(θ̂), where θ̂ is the MLE. So the previous binomial example, where the posterior mean was close to the MLE, was not a coincidence...
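This can be seen numerically with the post_mean() sketch from the earlier binomial example: holding the MLE x/n fixed at 0.7 and letting n grow, the posterior mean approaches 0.7, as the Laplace argument predicts.

```r
# Posterior mean with x/n = 0.7 fixed, for increasing n
sapply(c(10, 50, 250), function(n) post_mean(x = 0.7 * n, n = n))
```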

SLIDE 27

Remarks

It can be shown that the error in the Laplace approximation is $O(n^{-1})$. [5] The basic principle of the Laplace approximation is that, locally, the integrals look like Gaussian integrals. This principle extends to integrals over more than one dimension; this multivariate version is the most useful. There is also a version of the Laplace approximation for the case when the maximizer of g is on the boundary. Then the principle is to make the integral look like an exponential or gamma integral. Details of this version can be found in Sec. 4.6 of Lange.

[5] Can be improved with some extra care.

SLIDE 28

Outline

1. Introduction
2. Newton–Cotes quadrature
3. Gaussian quadrature
4. Laplace approximation
5. Conclusion

SLIDE 29

Remarks

Quadrature methods are very powerful. In principle, these methods can be developed for integrals of any dimension, but they only work well in one or two dimensions. "Curse of dimensionality": if the dimension is large, then one needs very many grid points to get good approximations.

The Laplace approximation can work in high dimensions, but only for certain kinds of integrals; fortunately, the stat-related integrals are often of this form. For higher dimensions, Monte Carlo methods are preferred:

- generally very easy to do;
- approximation accuracy is independent of dimension.

We will talk in detail later about Monte Carlo.
