Lecture 7: Kernel Density Estimation (Applied Statistics 2015)



SLIDE 1

Outline:
• Kernel Density Estimator
• Measures of discrepancy
• Practical bandwidth choices
• Assignments

Lecture 7: Kernel Density Estimation

Applied Statistics 2015

SLIDE 2

Histogram

The oldest density estimator is the histogram. Suppose that we have a dissection of the real line into bins; then the estimator is defined by
\[ \hat f_{\mathrm{hist}}(x) = \frac{1}{n} \cdot \frac{\#\{X_i \text{ in the same bin as } x\}}{\text{width of the bin containing } x}. \]

[Figure: a histogram density estimate over roughly (−3, 4), with density values up to about 0.35.]
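As a concrete illustration, the histogram estimator can be sketched in a few lines of Python (the bin width, origin, and sample below are illustrative choices, not from the slides):

```python
import numpy as np

def hist_density(x, data, bin_width=0.5, origin=0.0):
    """Histogram density estimate at x:
    (number of X_i in the bin containing x) / (n * bin width)."""
    data = np.asarray(data)
    n = len(data)
    # the bin containing x is [origin + k*w, origin + (k+1)*w)
    k = np.floor((x - origin) / bin_width)
    lo = origin + k * bin_width
    count = np.sum((data >= lo) & (data < lo + bin_width))
    return count / (n * bin_width)

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
print(hist_density(0.25, sample))
```

Summing the estimate over a grid of bin centers times the bin width recovers 1, since the histogram is itself a density.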
SLIDE 3-5

Naive Density Estimator

Let \(X_1, \ldots, X_n\) be a random sample from an unknown distribution function \(F\). Suppose that \(F\) has a continuous density \(f\). How can we estimate \(f\) from the sample using non-parametric methods?

Note that \(f = h(F) = F'\). Plugging in the edf \(\hat F_n\) does not work here, because \(\hat F_n\) is a discrete distribution function and does not have a density. On the other hand,
\[ f(x) = \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h}. \]
Hence, for small \(h\), we consider
\[ \hat f_n(x) = \frac{\hat F_n(x+h) - \hat F_n(x-h)}{2h}. \]
This is called the naive density estimator.
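The definition translates directly into code. A minimal sketch (the seed, sample size, and evaluation point are illustrative):

```python
import numpy as np

def edf(x, data):
    """Empirical distribution function: F_n(x) = #{X_i <= x} / n."""
    return np.mean(np.asarray(data) <= x)

def naive_density(x, data, h):
    """Naive estimator: (F_n(x + h) - F_n(x - h)) / (2h)."""
    return (edf(x + h, data) - edf(x - h, data)) / (2 * h)

rng = np.random.default_rng(1)
sample = rng.normal(size=100)   # n = 100, as in the plots that follow
print(naive_density(0.0, sample, h=0.5))
```

Equivalently, the estimate at x is the fraction of observations falling in (x − h, x + h], divided by 2h.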

SLIDE 6

Naive Density Estimator

Theorem. If \(h = h_n \to 0\) and \(n h_n \to \infty\) as \(n \to \infty\), then, for any \(y\),
\[ \hat f_n(y) \xrightarrow{p} f(y), \quad n \to \infty. \]
So it is a consistent estimator.

\(\hat f_n\) is a probability density function. The fact that
\[ n\bigl(\hat F_n(x+h) - \hat F_n(x-h)\bigr) \sim \mathrm{Bin}\bigl(n,\, F(x+h) - F(x-h)\bigr) \]
leads to
\[ \mathrm{E}\,\hat f_n(x) = \frac{F(x+h) - F(x-h)}{2h}, \qquad \mathrm{Var}\,\hat f_n(x) = \frac{\bigl(F(x+h) - F(x-h)\bigr)\bigl(1 - F(x+h) + F(x-h)\bigr)}{4nh^2}. \]
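The binomial identity behind these moment formulas is easy to check by simulation. A sketch (the point x, bandwidth h, sample size n, the N(0, 1) sampling distribution, and the seed are all illustrative choices):

```python
import math
import numpy as np

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, h, x = 50, 0.2, 0.0
p = Phi(x + h) - Phi(x - h)               # binomial success probability
mean_theory = p / (2 * h)
var_theory = p * (1 - p) / (4 * n * h**2)

rng = np.random.default_rng(2)
data = rng.normal(size=(20000, n))        # 20000 replicated samples of size n
counts = ((data > x - h) & (data <= x + h)).sum(axis=1)
estimates = counts / (2 * n * h)          # naive estimator at x, per replication

print(mean_theory, estimates.mean())      # these pairs should agree closely
print(var_theory, estimates.var())
```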

SLIDE 7-8

Naive Density Estimator

Plots of \(\hat f_n\), for \(n = 100\) and \(h = 0.1\). [Figure: the estimate over roughly (−3, 3), with a zoomed-in panel near x ≈ −2 showing its step-function shape.]

\(\hat f_n\) is a step function, and thus discontinuous. The empirical distribution assigns probability \(1/n\) to each observation \(X_i\); the naive estimator spreads this mass over the interval \([X_i - h, X_i + h]\) according to a uniform distribution. The discontinuities of the uniform density at the end points result in the ragged appearance of \(\hat f_n\).

SLIDE 9

Kernel Density Estimator

For the naive density estimator we have
\[ \hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} \tfrac12\, I(x - h < X_i \le x + h) = \frac{1}{nh} \sum_{i=1}^{n} \tfrac12\, \mathbf{1}_{[-1,1)}\!\left(\frac{x - X_i}{h}\right). \]

A kernel is a continuous and symmetric (around zero) density function \(K\) satisfying
\[ \int x^2 K(x)\,dx < \infty \quad \text{and} \quad \int K^2(x)\,dx < \infty. \]

In the construction of \(\hat f_n\), replacing the uniform density \(\tfrac12\,\mathbf{1}_{[-1,1)}(\cdot)\) by a kernel \(K(\cdot)\), we obtain a kernel density estimator of \(f\):
\[ \hat f_n(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right). \]

Here \(h\) is called the bandwidth. It is a smoothing parameter.
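A direct implementation of the kernel density estimator, here with a Gaussian kernel (the kernel choice, bandwidth, and sample are illustrative):

```python
import numpy as np

def kde(x, data, h, K):
    """Kernel density estimator: f_n(x) = (1/(n h)) * sum_i K((x - X_i) / h)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = (x[:, None] - np.asarray(data)[None, :]) / h   # scaled distances
    return K(u).mean(axis=1) / h

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

rng = np.random.default_rng(3)
sample = rng.normal(size=200)
grid = np.linspace(-5, 5, 201)
fhat = kde(grid, sample, h=0.4, K=gaussian)

# f_n is itself a density: a Riemann sum over the grid should be close to 1
print(fhat.sum() * (grid[1] - grid[0]))
```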

SLIDE 10

Kernel Density Estimator

The estimator depends on the kernel \(K\) and the bandwidth \(h\). The following are some commonly used kernels:
\[
K(u) = \begin{cases}
\tfrac34 (1 - u^2)\,\mathbf{1}_{[-1,1]}(u), & \text{Epanechnikov;} \\[2pt]
\tfrac{15}{16} (1 - u^2)^2\,\mathbf{1}_{[-1,1]}(u), & \text{biweight;} \\[2pt]
\tfrac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac12 u^2\right), & \text{Gaussian;} \\[2pt]
(1 - |u|)\,\mathbf{1}_{[-1,1]}(u), & \text{triangular;} \\[2pt]
\tfrac12\,\mathbf{1}_{[-1,1]}(u), & \text{uniform.}
\end{cases}
\]
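These kernels can be written down and sanity-checked numerically: each should integrate to 1 and have a finite second moment \(\tau^2 = \int u^2 K(u)\,du\). The grid below is an illustrative discretization:

```python
import numpy as np

def epanechnikov(u):
    return 0.75 * (1 - u**2) * (np.abs(u) <= 1)

def biweight(u):
    return (15 / 16) * (1 - u**2) ** 2 * (np.abs(u) <= 1)

def gaussian(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def triangular(u):
    return (1 - np.abs(u)) * (np.abs(u) <= 1)

def uniform(u):
    return 0.5 * (np.abs(u) <= 1)

u = np.linspace(-6, 6, 120001)       # fine grid for Riemann sums
du = u[1] - u[0]
for K in (epanechnikov, biweight, gaussian, triangular, uniform):
    mass = np.sum(K(u)) * du         # total mass, should be ~1
    tau2 = np.sum(u**2 * K(u)) * du  # second moment, finite for all five
    print(f"{K.__name__:12s} mass={mass:.4f} tau2={tau2:.4f}")
```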

SLIDE 11

Example 1

Data: 100 observations from Exp(1). Kernel: Epanechnikov. We compare the effect of different bandwidths; the black curve indicates the real density. [Figure: kernel density estimates with bandwidths 0.05, 0.15, and 0.5, together with the true density, over roughly (0, 10).]

A larger bandwidth leads to a smoother, but more biased, estimator.

SLIDE 12

Example 2

Data: 100 observations from Exp(1). Kernel: Gaussian. We compare the effect of different bandwidths; the black curve indicates the real density. [Figure: kernel density estimates with bandwidths 0.05, 0.15, and 0.5, together with the true density.]

A larger bandwidth leads to a smoother, but more biased, estimator.
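The effect seen in Examples 1 and 2 can be reproduced numerically without plotting. In this sketch (seed and error measures are illustrative), `rough` is a crude proxy for how wiggly the estimate is: it should shrink as the bandwidth grows.

```python
import numpy as np

def kde_gauss(x, data, h):
    """Gaussian-kernel density estimator on a grid of points x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(4)
data = rng.exponential(scale=1.0, size=100)   # 100 observations from Exp(1)

grid = np.linspace(0.01, 8.0, 800)
true = np.exp(-grid)                          # the Exp(1) density
for h in (0.05, 0.15, 0.5):                   # the bandwidths from the slides
    fhat = kde_gauss(grid, data, h)
    ise = np.sum((fhat - true) ** 2) * (grid[1] - grid[0])
    rough = np.sum(np.diff(fhat) ** 2)        # smoother estimates score lower
    print(f"h={h}: ISE={ise:.4f} roughness={rough:.5f}")
```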

SLIDE 13

To see the performance of the estimator, consider the bias and the mean square error of \(\hat f_n(x)\) for fixed \(x\).

Theorem. Let \(f\) be three times differentiable with bounded third derivative in a neighborhood of \(x\). Let the kernel \(K\) satisfy \(\int |y|^3 K(y)\,dy < \infty\). If \(\lim_{n\to\infty} h_n = 0\), then, with \(\tau^2 = \int y^2 K(y)\,dy\),
\[ \mathrm{E}\bigl[\hat f_n(x) - f(x)\bigr] = \tfrac12\, h_n^2\, f''(x)\,\tau^2 + o(h_n^2). \]
If in addition \(\lim_{n\to\infty} n h_n = \infty\), then
\[ \mathrm{Var}\bigl(\hat f_n(x)\bigr) = \frac{1}{n h_n}\, f(x) \int K^2(y)\,dy + o\!\left(\frac{1}{n h_n}\right). \]
Thus,
\[ \mathrm{MSE}\bigl(\hat f_n(x)\bigr) = \mathrm{E}\bigl(\hat f_n(x) - f(x)\bigr)^2 = \tfrac14\, h_n^4 \bigl(f''(x)\bigr)^2 \tau^4 + \frac{f(x)}{n h_n} \int K^2(y)\,dy + o\!\left(h_n^4 + \frac{1}{n h_n}\right). \]

SLIDE 14

Optimal bandwidth: bias-variance trade-off

Observe that as \(h\) increases, the bias becomes large while the variance decreases. To find the optimal value of \(h\), we minimize the MSE. This leads to
\[ h_{\mathrm{opt1}} = \left( \frac{f(x) \int K^2(z)\,dz}{\bigl(f''(x)\bigr)^2 \tau^4} \right)^{1/5} \frac{1}{n^{1/5}}. \]
It follows that the corresponding MSE and variance are both of the order \(n^{-4/5}\).
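The minimization behind \(h_{\mathrm{opt1}}\) is a short calculus step, spelled out here from the MSE expansion above:

```latex
\mathrm{MSE}(h) \approx \tfrac14 h^4 \bigl(f''(x)\bigr)^2 \tau^4
  + \frac{f(x)\int K^2(y)\,dy}{nh},
\qquad
\frac{d}{dh}\,\mathrm{MSE}(h)
  = h^3 \bigl(f''(x)\bigr)^2 \tau^4 - \frac{f(x)\int K^2(y)\,dy}{n h^2} = 0.

% Solving for h:
h^5 = \frac{f(x)\int K^2(y)\,dy}{n\,\bigl(f''(x)\bigr)^2 \tau^4}
\quad\Longrightarrow\quad
h_{\mathrm{opt1}} = \left( \frac{f(x)\int K^2(z)\,dz}{\bigl(f''(x)\bigr)^2 \tau^4} \right)^{1/5} n^{-1/5}.
```

Substituting \(h_{\mathrm{opt1}}\) back shows that the squared-bias and variance terms are both of order \(n^{-4/5}\).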

SLIDE 15

Optimal kernel

The optimal bandwidth results in the following mean square error:
\[ \mathrm{MSE}_{h_{\mathrm{opt1}}} = c(n, f) \left( \int y^2 K(y)\,dy \left( \int K^2(y)\,dy \right)^{2} \right)^{2/5}, \]
where \(c(n, f)\) is a constant depending only on \(n\) and \(f(x)\). The optimal kernel is then obtained by minimizing the term
\[ \int y^2 K(y)\,dy \left( \int K^2(y)\,dy \right)^{2} \]
(this is the same as minimizing \(\mathrm{MSE}(h_{\mathrm{opt1}})\)). The desired kernel is
\[ k_{\mathrm{opt}}(u) = \frac{1}{c}\, k_{\mathrm{epa}}(u/c), \]
where \(c\) is a positive number and \(k_{\mathrm{epa}}\) is the Epanechnikov kernel.

SLIDE 16

Global behavior of \(\hat f_n\)

View \(\hat f_n\) as an estimator of the whole function \(f\). We consider the mean integrated square error as a risk function. Define the integrated square error of \(\hat f_n\):
\[ \int_{-\infty}^{\infty} \bigl(\hat f_n(x) - f(x)\bigr)^2\,dx. \]
This takes into account the performance of the estimator over all \(x\); it is a random variable. Next we take the expectation. The mean integrated square error (MISE) is defined by
\[ \mathrm{E} \int_{-\infty}^{\infty} \bigl(\hat f_n(x) - f(x)\bigr)^2\,dx. \]

SLIDE 17

Global behavior of \(\hat f_n\)

Observe that
\[ \mathrm{MISE}(\hat f_n) = \int_{-\infty}^{\infty} \mathrm{E}\bigl(\hat f_n(x) - f(x)\bigr)^2\,dx = \int_{-\infty}^{\infty} \mathrm{MSE}\bigl(\hat f_n(x)\bigr)\,dx = \mathrm{IMSE}(\hat f_n). \]
It can be shown (cf. Rosenblatt (1971)) that
\[ \mathrm{MISE}(\hat f_n) = \tfrac14\, h_n^4\, \tau^4 \int \bigl(f''(x)\bigr)^2\,dx + \frac{1}{n h_n} \int K^2(y)\,dy + o\!\left(h_n^4 + \frac{1}{n h_n}\right). \]
Thus \(\mathrm{MISE}(\hat f_n) \to 0\) and further
\[ \int_{-\infty}^{\infty} \bigl(\hat f_n(x) - f(x)\bigr)^2\,dx \xrightarrow{P} 0. \]

SLIDE 18

Global behavior of \(\hat f_n\)

Minimizing the MISE leads to the following optimal bandwidth:
\[ h_{\mathrm{opt2}} = \left( \frac{\int K^2(z)\,dz}{\int \bigl(f''(x)\bigr)^2\,dx\; \tau^4} \right)^{1/5} \frac{1}{n^{1/5}}. \]
The resulting MISE is of the order \(n^{-4/5}\). The optimal kernel is again of Epanechnikov type.
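The \(n^{-4/5}\) rate can be checked by simulation: with \(h \propto n^{-1/5}\), the estimated MISE times \(n^{4/5}\) should stay roughly constant across \(n\). A sketch with N(0, 1) data and a Gaussian kernel (the sample sizes, replication count, and seed are illustrative):

```python
import numpy as np

def kde_gauss(x, data, h):
    """Gaussian-kernel density estimator on a grid of points x."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(5)
grid = np.linspace(-5, 5, 200)
dx = grid[1] - grid[0]
true = np.exp(-0.5 * grid**2) / np.sqrt(2 * np.pi)   # N(0, 1) density

mise = {}
for n in (100, 400, 1600):
    h = n ** (-1 / 5)                    # bandwidth of the optimal order
    ises = []
    for _ in range(100):                 # Monte Carlo estimate of the MISE
        data = rng.normal(size=n)
        fhat = kde_gauss(grid, data, h)
        ises.append(np.sum((fhat - true) ** 2) * dx)
    mise[n] = np.mean(ises)
    print(n, round(mise[n], 5), round(mise[n] * n ** (4 / 5), 3))
```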

SLIDE 19

Both locally and globally, the optimal bandwidth is of the order \(n^{-1/5}\), and the convergence rate is \(n^{-4/5}\). The bandwidth plays a more important role than the kernel: the choice of kernel does not affect the order of the bandwidth or the rate of mean square convergence. Any kernel from a large class satisfying the assumptions can be used.

SLIDE 20

Practical bandwidth choices

The theoretically optimal bandwidth, \(h_{\mathrm{opt2}}\), depends on the unknown density \(f\) through \(\int (f''(x))^2\,dx\). The actual choice of \(h\) is a critical issue, and there are different approaches to choose \(h\) in practice. Write
\[ h_{\mathrm{opt2}} = n^{-1/5}\, \frac{C(K)}{\bigl( \int (f''(x))^2\,dx \bigr)^{1/5}}, \]
where \(C(K)\) is a constant depending only on \(K\).

Straight plug-in estimate. Let \(\hat g_n\) be a pilot estimator of \(f\). Then
\[ h_{\mathrm{pi}} = n^{-1/5}\, \frac{C(K)}{\bigl( \int (\hat g_n''(x))^2\,dx \bigr)^{1/5}}. \]
This estimator is very basic and naive, because \(\hat g_n''\) is usually not a good estimate of \(f''\) even if \(\hat g_n\) is a good estimator of \(f\).

Plug-in equation solving. The bandwidth is defined as a suitably selected root of the equation
\[ h^*_{\mathrm{pi}} = n^{-1/5}\, \frac{C(K)}{\bigl( \int (\hat f''_{n,\,h^*_{\mathrm{pi}}}(x))^2\,dx \bigr)^{1/5}}, \]
where \(\hat f_{n,h} = \hat f_n\), the kernel estimator using bandwidth \(h\).
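A sketch of plug-in equation solving for the Gaussian kernel, for which the second derivative of the estimate has the closed form \(\hat f''_{n,h}(x) = \frac{1}{nh^3}\sum_i K''\bigl(\frac{x - X_i}{h}\bigr)\) with \(K''(u) = (u^2 - 1)\varphi(u)\). The fixed-point iteration, starting value, integration grid, and seed are all illustrative choices, not a prescription from the slides:

```python
import numpy as np

def phi(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde_second_deriv(x, data, h):
    """f''_{n,h}(x) for the Gaussian kernel: K''(u) = (u^2 - 1) * phi(u)."""
    u = (x[:, None] - data[None, :]) / h
    return ((u**2 - 1) * phi(u)).mean(axis=1) / h**3

rng = np.random.default_rng(6)
data = rng.normal(size=200)
n = len(data)

# Gaussian kernel: tau^2 = 1 and int K^2 = 1/(2 sqrt(pi)),
# so C(K) = (int K^2 / tau^4)^{1/5} = (1 / (2 sqrt(pi)))^{1/5}.
CK = (1 / (2 * np.sqrt(np.pi))) ** 0.2

grid = np.linspace(data.min() - 3, data.max() + 3, 400)
dx = grid[1] - grid[0]

h = np.std(data) * n ** (-0.2)                 # crude starting value
for _ in range(25):                            # fixed-point iteration on h
    curvature = np.sum(kde_second_deriv(grid, data, h) ** 2) * dx
    h_new = n ** (-0.2) * CK / curvature ** 0.2
    if abs(h_new - h) < 1e-10:
        break
    h = h_new
print(h)
```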

SLIDE 21

Practical bandwidth choices

Reference bandwidth. Choose an auxiliary parametric family, say the normal distributions, to choose \(h\) (not to estimate \(f\)). We plug the density of \(N(\mu, \sigma^2)\), that is \(f(x) = \frac{1}{\sigma}\,\varphi\!\left(\frac{x - \mu}{\sigma}\right)\), into the formula for \(h_{\mathrm{opt2}}\):
\[ h = n^{-1/5}\, \frac{C(K)}{\bigl( \int (f''(x))^2\,dx \bigr)^{1/5}} = n^{-1/5}\, C(K) \left( \frac{8\sqrt{\pi}}{3} \right)^{1/5} \sigma. \]
It is recommended to estimate \(\sigma\) by \(\min(S, R/1.35)\), where \(S\) is the sample standard deviation and \(R\) is the sample interquartile range, \(R = \hat F_n^{-1}(0.75) - \hat F_n^{-1}(0.25)\). (Note that \(\Phi^{-1}(0.75) - \Phi^{-1}(0.25) \approx 1.35\).) This gives
\[ h_r = n^{-1/5}\, C(K) \left( \frac{8\sqrt{\pi}}{3} \right)^{1/5} \min(S, R/1.35). \]
Taking the Epanechnikov kernel,
\[ h_r = 2.34\, \min(S, R/1.35)\, n^{-1/5}. \]
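The reference rule is simple enough to state as code. A sketch (using the sample standard deviation with `ddof=1` and `np.quantile` for R, which are implementation choices, not from the slides):

```python
import numpy as np

def reference_bandwidth(data):
    """Normal-reference bandwidth for the Epanechnikov kernel:
    h_r = 2.34 * min(S, R / 1.35) * n^{-1/5}."""
    data = np.asarray(data)
    n = len(data)
    s = np.std(data, ddof=1)                               # sample standard deviation S
    r = np.quantile(data, 0.75) - np.quantile(data, 0.25)  # interquartile range R
    return 2.34 * min(s, r / 1.35) * n ** (-0.2)

rng = np.random.default_rng(7)
print(reference_bandwidth(rng.normal(size=400)))
```

By construction the rule is scale-equivariant: doubling the data doubles the bandwidth.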

SLIDE 22

Group Presentation (March 23)

Group 12. Consider the density
\[ f(x) = \tfrac12\,\varphi(x - 2) + \tfrac12\,\varphi(x + 2). \]
Simulate a sample of size \(n = 200\) from this density. Compare visually and give a ranking of the following estimates of the density:
• a parametric estimator assuming the \(N(\mu, \sigma^2)\) model, using the MLE of the parameters;
• a histogram;
• kernel density estimates with at least two kernels and a few different bandwidths.
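One possible way to carry out the sampling step of this assignment is label-then-draw: pick a fair component label, then draw a normal centered at +2 or −2. This is only the simulation step, not the requested comparison, and the seed is illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
labels = rng.integers(0, 2, size=n)                    # fair coin: which component
centers = np.where(labels == 1, 2.0, -2.0)
sample = rng.normal(loc=centers, scale=1.0, size=n)    # N(center, 1) draws

# the mixture has mean 0 and variance 1 + 4 = 5, so sd around 2.24
print(sample.mean(), sample.std())
```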

SLIDE 23

Group Presentation (March 23)

Group 13. Download the dataset Forensic Glass Data from http://www.stat.cmu.edu/~larry/all-of-nonpar/data.html. Estimate the density of the first variable (refractive index):
• using the Epanechnikov kernel;
• choosing bandwidths with two approaches: plug-in equation solving, and the reference bandwidth. You might also try other choices of bandwidth.