A general procedure to combine estimators. Frédéric Lavancier. PowerPoint presentation.



SLIDE 1

Examples Method Theory Simulations Conclusion

A general procedure to combine estimators

Frédéric Lavancier

Laboratoire de Mathématiques Jean Leray, University of Nantes

Joint work with Paul Rochet (University of Nantes)

SLIDE 2

Introduction

Let θ be an unknown quantity in a statistical model, and consider a collection of k estimators T1, ..., Tk of θ. Aim: combine these estimators to obtain a better estimate.

SLIDE 3

1. Some examples
2. The method
3. Theoretical results
4. Simulations: back to the examples
5. Conclusion


SLIDE 5

Example 1: mean and median

Let x1, . . . , xn be n i.i.d. realisations of an unknown distribution on the real line. Assume this distribution is symmetric around some parameter θ ∈ R. Two natural choices to estimate θ: the mean T1 = x̄n and the median T2 = x(n/2).

The idea of combining these two estimators goes back to Pierre Simon de Laplace. In the Second Supplement of the Théorie Analytique des Probabilités (1812), he wrote: "En combinant les résultats de ces deux méthodes, on peut obtenir un résultat dont la loi de probabilité des erreurs soit plus rapidement décroissante." [By combining the results of these two methods, one can obtain a result whose probability law of error decreases more rapidly.]


SLIDE 10

Example 1: mean and median

Laplace considered the combination λ1 x̄n + λ2 x(n/2) with λ1 + λ2 = 1.

  • 1. He proved that the asymptotic law of this combination is Gaussian (in 1812!).
  • 2. Minimizing the asymptotic variance in λ1, λ2, he concluded that:

If the underlying distribution is Gaussian, the best combination is λ1 = 1 and λ2 = 0. For any other distribution, the best combination depends on that distribution: "L'ignorance où l'on est de la loi de probabilité des erreurs des observations rend cette correction impraticable" [When one does not know the distribution of the errors of observation, this correction is not feasible.]

Is it possible to estimate λ1 and λ2?

SLIDE 11

Example 2: Weibull model

Let x1, . . . , xn be i.i.d. from the Weibull distribution with density

f(x) = (β/η) (x/η)^(β−1) exp(−(x/η)^β),  x > 0.

We consider 3 standard methods to estimate β and η:
  • the maximum likelihood estimator (ML)
  • the method of moments (MM)
  • the ordinary least squares method, or Weibull plot (OLS)

SLIDE 12

Example 2: Weibull model

Distribution of β̂ when β = 0.5 (left) and β = 3 (right), with η = 10 and n = 20. Simulations based on 10⁴ replications.

[Figure: boxplots of the three estimators ML, MM and OLS of β; left panel β = 0.5, right panel β = 3.]

Which one to choose? Can we combine them to get a better estimate?

SLIDE 13

Example 3: kernel density estimation

Let x1, . . . , xn be a sample from a real random variable with density f. The kernel density estimator of f at x ∈ R is

f̂_{n,h}(x) = (1/(nh)) Σ_{i=1}^{n} K((x − xi)/h),

where K is the kernel and h the smoothing bandwidth. In this setting θ = f and we assume that θ ∈ L²(R). For a fixed kernel K (say the Gaussian kernel), one may consider k different choices of bandwidth h1, . . . , hk, leading to the estimators T1 = f̂_{n,h1}, . . . , Tk = f̂_{n,hk}.
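As a concrete illustration, here is a minimal sketch of this estimator with the Gaussian kernel (the function name and the toy sample are my own, not from the slides):

```python
import numpy as np

def kde(x, data, h):
    """Kernel density estimate f_hat_{n,h}(x) = (1/(n h)) sum_i K((x - x_i)/h),
    here with the Gaussian kernel K."""
    u = (x - data[:, None]) / h                       # shape (n, len(x))
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)    # Gaussian kernel values
    return K.mean(axis=0) / h

rng = np.random.default_rng(0)
data = rng.normal(size=500)
grid = np.linspace(-3, 3, 7)
est = kde(grid, data, h=0.4)   # one such estimator T_j is obtained per bandwidth h_j
```

Each bandwidth h_j yields one estimator T_j of f; these are the objects the averaging procedure combines.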

SLIDE 14

Example 3: kernel density estimation

For instance, in R, 5 choices are proposed in the function density (option bw): nrd0 (Silverman's rule of thumb), nrd (a variation), ucv and bcv (unbiased and biased cross-validation), and SJ (the Sheather and Jones method). Example with a mixture distribution and the Cauchy distribution (n = 500):

[Figure: kernel density estimates for the proposed bandwidth choices; left panel the mixture distribution, right panel the Cauchy distribution.]

SLIDE 15

Other examples

  • Any parametric model where several estimators are available.
  • Any method involving tuning parameters.
  • In forecasting (of a time series, or of a model output): combination of several forecasts. This special case has been widely studied, and specific procedures have been developed.


SLIDE 17

The oracle

Let θ ∈ R and consider a collection of k estimators T = (T1, ..., Tk)⊤. Following P. S. de Laplace, we look for the best linear combination

λ⊤T = Σ_{i=1}^{k} λi Ti,  where Σ_{i=1}^{k} λi = 1.

We denote Λmax = {λ ∈ R^k : λ⊤1 = 1}, where 1 is the vector 1 = (1, ..., 1)⊤. The best non-random combination, in the mean square sense, is the so-called oracle:

θ̂* = λ*⊤T,  where λ* = argmin_{λ∈Λmax} E(λ⊤T − θ)².

This is a standard optimisation problem, which yields

λ* = Σ⁻¹1 / (1⊤Σ⁻¹1),

where Σ is the mean square error (MSE) matrix of T, i.e. Σ = E(T − θ1)(T − θ1)⊤ = (E(Ti − θ)(Tj − θ))_{i,j=1,...,k}.
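Since λ⊤1 = 1 implies λ⊤T − θ = λ⊤(T − θ1), the criterion equals λ⊤Σλ, and λ* follows from a short Lagrange-multiplier computation (a standard derivation, sketched here for completeness):

```latex
% Minimise \lambda^{\top}\Sigma\lambda subject to \lambda^{\top}\mathbf{1}=1.
\mathcal{L}(\lambda,\mu)
  = \lambda^{\top}\Sigma\lambda - \mu\,(\lambda^{\top}\mathbf{1}-1),
\qquad
\nabla_{\lambda}\mathcal{L} = 2\Sigma\lambda - \mu\mathbf{1} = 0
\;\Rightarrow\;
\lambda = \tfrac{\mu}{2}\,\Sigma^{-1}\mathbf{1}
\;\Rightarrow\;
\lambda^{*} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^{\top}\Sigma^{-1}\mathbf{1}},
```

where the last step uses the constraint λ⊤1 = 1 to pin down μ.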


SLIDE 20

The average estimator

The oracle is therefore

θ̂* = λ*⊤T = (1⊤Σ⁻¹ / (1⊤Σ⁻¹1)) T.

In practice, the optimal weight λ* is not known and must be estimated, which reduces to estimating the MSE matrix Σ. Denoting by Σ̂ some estimate of Σ, we obtain the average estimator

θ̂ = λ̂⊤T = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T.
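The average estimator is a one-line computation once Σ̂ is available. A minimal sketch (the function name and the numbers are hypothetical, for illustration only):

```python
import numpy as np

def average_estimator(T, Sigma_hat):
    """theta_hat = (1' Sigma_hat^{-1} / (1' Sigma_hat^{-1} 1)) T, weights summing to 1."""
    one = np.ones(len(T))
    w = np.linalg.solve(Sigma_hat, one)   # Sigma_hat^{-1} 1, without forming the inverse
    lam = w / (one @ w)                   # estimated weights lambda_hat
    return lam @ np.asarray(T), lam

T = np.array([1.02, 0.95, 1.10])          # three hypothetical estimates of the same theta
Sigma_hat = np.array([[0.04, 0.01, 0.00],
                      [0.01, 0.09, 0.00],
                      [0.00, 0.00, 0.25]])
theta_hat, lam = average_estimator(T, Sigma_hat)
```

As expected, the weights sum to 1 and the estimator with the smallest estimated MSE receives the largest weight.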

SLIDE 21

Estimation of Σ

In practice, the estimation of Σ may be conducted in several ways, depending on the underlying model:

  • In a fully parametric model, the law of T only depends on θ, so Σ = Σ(θ). If Σ(θ) is explicitly known, a natural choice is the plug-in estimator Σ̂ = Σ(θ̂0), where θ̂0 is some estimator of θ (for instance one of the initial Ti, or their simple average). Otherwise, Σ(θ̂0) may be approximated by parametric bootstrap. Note that in these cases, Σ̂ (and so the average estimator θ̂) does not require the initial data used to produce T, but only T itself.
  • In a non-parametric setting, Σ may be estimated by standard non-parametric bootstrap. Alternatively, a closed-form parametric expression of Σ may be available asymptotically; we can then use a plug-in method to get Σ̂.
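The parametric-bootstrap route can be sketched generically as follows (a toy Gaussian model with the mean and the median as the two estimators; all names are illustrative, not from the slides):

```python
import numpy as np

def bootstrap_mse_matrix(theta0, sampler, estimators, B=200, rng=None):
    """Parametric bootstrap of the MSE matrix Sigma(theta0): resample from the
    fitted model, re-apply each estimator, and average the outer products."""
    rng = rng or np.random.default_rng(0)
    k = len(estimators)
    Sigma = np.zeros((k, k))
    for _ in range(B):
        x = sampler(theta0, rng)                        # one synthetic sample
        d = np.array([est(x) for est in estimators]) - theta0
        Sigma += np.outer(d, d)
    return Sigma / B

# Toy model: N(theta, 1); estimators: mean and median
sampler = lambda th, rng: rng.normal(th, 1.0, size=100)
ests = [np.mean, np.median]
Sigma_hat = bootstrap_mse_matrix(1.0, sampler, ests, B=300)
```

For this Gaussian toy model the bootstrap reproduces the known ordering: the mean has a smaller MSE than the median.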


SLIDE 24

Generalization: combination of several parameters simultaneously

Assume θ = (θ1, . . . , θd)⊤ and we have access to several collections of estimators T1, . . . , Td, one for each component θj (the Tj may have different sizes). To estimate, say, θ1:

  • We can consider the simple combination θ̂1 = λ̂1⊤T1, where λ̂1 is a vector of weights of the same size as T1. This is the previous setting, with the constraint λ̂1⊤1 = 1.
  • Or we can consider the full combination θ̂1 = λ̂1⊤T1 + · · · + λ̂d⊤Td, where each vector of weights λ̂j is of the same size as Tj. We then impose the constraints λ̂1⊤1 = 1 and, for all j ≠ 1, λ̂j⊤1 = 0.

The oracle then depends on the MSE block matrix, with blocks E(Tj − θj1)(Tj′ − θj′1)⊤.


SLIDE 28

Oracle inequality

For Λ ⊂ Λmax and two matrices A and B, we introduce the divergence

δΛ(A|B) = sup_{λ∈Λ} |1 − tr(λ⊤Aλ)/tr(λ⊤Bλ)|,

and δΛ(A, B) = max{δΛ(A|B), δΛ(B|A)}.

Theorem. Let Λ be a non-empty closed convex subset of Λmax and Σ̂ a symmetric positive definite k × k matrix. The averaging estimator θ̂ = λ̂⊤T satisfies

(θ̂ − θ̂*)² ≤ [inf_{λ∈Λ} E(λ⊤T − θ)²] [2δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²] ‖Σ^(−1/2)(T − θ1)‖²,   (1)

where θ̂* is the oracle.

The first factor (green on the slide) is the MSE of the oracle. The second (blue) should be small, provided Σ̂ is "close" to Σ. The third (orange) plays the role of a constant, in view of E‖Σ^(−1/2)(T − θ1)‖² = k.

SLIDE 29

Asymptotic results

Let n denote the size of the sample used to produce T, and define

αn := E(θ̂*n − θ)² = λ*n⊤ Σn λ*n,   α̂n := λ̂n⊤ Σ̂n λ̂n.

Theorem. If Σ̂n Σn⁻¹ →p I, then

(θ̂n − θ)² = (θ̂*n − θ)² + op(αn).

Moreover, if the vector of initial estimators T is asymptotically Gaussian, then

α̂n^(−1/2) (θ̂n − θ) →d N(0, 1).   (2)

(2) allows us to construct asymptotic confidence intervals for θ, without further approximation (since α̂n is already computed to get θ̂). This interval is of minimal length (asymptotically) amongst all possible confidence intervals based on a linear combination of T.


SLIDE 31

Example 1: mean and median

x1, . . . , xn i.i.d. ∼ f, with variance σ², where f is symmetric around θ. T = (T1, T2)⊤ with T1 = x̄n and T2 = x(n/2). The average estimator over Λmax is

θ̂ = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T.

Two choices for Σ̂:
  • The asymptotic form of Σ (obtained by P. S. de Laplace) is n⁻¹W, where

        W = [ σ²                    E|X − θ| / (2f(θ)) ]
            [ E|X − θ| / (2f(θ))   1 / (4f(θ)²)        ]

    All entries of W can be estimated naturally given an initial estimate θ̂0 (we choose θ̂0 = x(n/2)); this leads to a first estimate Σ̂ and to the average estimator θ̂AV.
  • Σ is estimated by non-parametric bootstrap (i.e. resampling), leading to another average estimator, denoted θ̂AVB.
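A minimal plug-in sketch of θ̂AV along these lines, estimating f(θ) with a Gaussian-kernel density estimate at the median (the bandwidth rule and all helper names are my own assumptions, not from the slides):

```python
import numpy as np

def mean_median_combination(x, h=None):
    """Plug-in version of the asymptotic MSE matrix n^{-1} W for (mean, median):
    W = [[sigma^2, E|X-theta|/(2 f(theta))], [sym., 1/(4 f(theta)^2)]].
    f(theta) is replaced by a Gaussian-kernel density estimate at the median."""
    n = len(x)
    t0 = np.median(x)                                  # initial estimate theta_0
    h = h or 1.06 * x.std() * n ** (-1 / 5)            # rule-of-thumb bandwidth
    f0 = np.exp(-0.5 * ((t0 - x) / h) ** 2).sum() / (n * h * np.sqrt(2 * np.pi))
    c = np.mean(np.abs(x - t0)) / (2 * f0)             # off-diagonal entry of W
    W = np.array([[x.var(), c],
                  [c, 1 / (4 * f0 ** 2)]])
    Sigma_hat = W / n
    one = np.ones(2)
    w = np.linalg.solve(Sigma_hat, one)
    lam = w / (one @ w)
    return lam @ np.array([x.mean(), t0]), lam

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=200)
theta_hat, lam = mean_median_combination(x)
```

For Gaussian data the estimated weight of the mean should be close to 1, in line with Laplace's conclusion.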

SLIDE 32

Example 1: mean and median

Estimated MSE based on 10⁴ replications, for several distributions f with θ = 0:

              n=30                          n=50                          n=100
          MEAN   MED    AV     AVB      MEAN   MED    AV    AVB       MEAN   MED    AV    AVB
Cauchy    2·10⁶  9      8.95   8.99     4·10⁷  5.07   4.92  4.9       2·10⁷  2.56   2.49  2.49
St(4)     6.68   5.71   5.4    5.43     4.12   3.53   3.33  3.34      1.99   1.74   1.61  1.62
St(7)     4.8    5.51   4.6    4.64     2.82   3.32   2.74  2.8       1.42   1.67   1.37  1.38
Logistic  10.89  12.7   10.76  10.87    6.64   7.93   6.52  6.6       3.3    4      3.2   3.26
Gauss     3.39   5.11   3.53   3.61     2.04   3.1    2.1   2.15      1      1.51   1.02  1.06
Mix       16.79  87     15.03  13.41    10.08  66.53  7.57  6.68      5.05   42.35  3.09  2.36

θ̂AV and θ̂AVB behave similarly: they outperform both the mean and the median in all cases except the Gaussian law. For the Gaussian law, we know that x̄n is the best estimator; however, the performances of θ̂AV and θ̂AVB are very close to it, meaning that the optimal weight (1, 0) is well estimated. For the Cauchy distribution, θ̂AV and θ̂AVB perform surprisingly well given that x̄n should not have been used at all; this means that the optimal weight (0, 1) is well estimated.

SLIDE 33

Example 2: Weibull model

η is estimated by MLE, and 3 estimators are considered for β: T1 = MLE, T2 = MM, T3 = OLS. The average estimator of β over Λmax is

β̂AV = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T.

Here Σ = Σ(β, η), but its closed-form expression is not known. However, Σ can be estimated by parametric bootstrap:

1. Resample B Weibull samples of size n with parameters β̂0 and η̂0, where β̂0 and η̂0 are initial estimators (we chose the mean of T1, T2, T3 for β̂0 and the MLE for η̂0).
2. For each sample b, compute T1^(b), T2^(b) and T3^(b).
3. Σ̂ is the empirical MSE matrix of T; for instance, E(T1 − θ)(T2 − θ) is estimated by (1/B) Σ_{b=1}^{B} (T1^(b) − β̂0)(T2^(b) − β̂0).
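The three steps can be sketched as follows. For brevity this sketch combines only two estimators of β, the Weibull-plot (OLS) estimator and a log-scale moment estimator standing in for ML and MM (which would require numerical optimisation); all helper names are illustrative:

```python
import numpy as np

def beta_ols(x):
    """Weibull-plot (OLS) estimator: regress log(-log(1 - F_hat)) on log(x)."""
    xs = np.sort(x)
    n = len(x)
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)        # plotting positions
    return np.polyfit(np.log(xs), np.log(-np.log(1 - F)), 1)[0]

def beta_logmoment(x):
    """Moment estimator on the log scale, using Var(log X) = pi^2 / (6 beta^2)."""
    return np.pi / np.sqrt(6 * np.log(x).var())

def combine_by_parametric_bootstrap(x, estimators, B=200, rng=None):
    """Steps 1-3: fit beta_0 and eta_0, resample Weibull samples, estimate the
    MSE matrix empirically, and return the average estimator of beta."""
    rng = rng or np.random.default_rng(0)
    T = np.array([est(x) for est in estimators])
    b0 = T.mean()                                      # initial beta_0
    eta0 = np.mean(x ** b0) ** (1 / b0)                # crude plug-in scale estimate
    D = np.empty((B, len(T)))
    for b in range(B):
        xb = eta0 * rng.weibull(b0, size=len(x))       # numpy: scale * Weibull(shape)
        D[b] = [est(xb) for est in estimators]
    d = D - b0
    Sigma = d.T @ d / B                                # empirical MSE matrix
    one = np.ones(len(T))
    w = np.linalg.solve(Sigma, one)
    lam = w / (one @ w)
    return lam @ T, lam

rng = np.random.default_rng(2)
x = 10 * rng.weibull(2.0, size=50)                     # true beta = 2, eta = 10
beta_av, lam = combine_by_parametric_bootstrap(x, [beta_ols, beta_logmoment])
```

The same loop extends directly to the three estimators of the slides once ML and MM are implemented.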

SLIDE 34

Example 2: Weibull model

Simulations for several values of β and different sample sizes n (η = 10). Estimated MSE (10⁴ replications), with standard errors in parentheses:

          n=10                              n=20                              n=50
          ML      MM      OLS     AV       ML      MM      OLS     AV       ML      MM      OLS     AV
β = 0.5   35.53   76.95   24.41   25.27    12.06   35.57   13.74   10.5     3.7     14.19   6.04    3.52
          (0.91)  (1.27)  (0.40)  (0.64)   (0.26)  (0.52)  (0.19)  (0.19)   (0.07)  (0.20)  (0.08)  (0.06)
β = 1     152.4   131.6   98.1    85.5     49.2    53.6    54.2    36.9     14.4    19.3    23.9    12.8
          (3.8)   (3.1)   (1.5)   (1.7)    (1.1)   (1.1)   (0.7)   (0.7)    (0.2)   (0.3)   (0.3)   (0.2)
β = 2     596.4   444.6   399.4   355.5    194.5   164.5   218     163.3    57.9    53.9    94.8    54.3
          (14.4)  (11.9)  (6.3)   (6.7)    (3.8)   (3.3)   (2.8)   (2.7)    (1.0)   (0.9)   (1.3)   (0.9)
β = 3     1369    1080    905     770      452     394     486     343      128     122     211     120
          (34.6)  (29.7)  (14.6)  (18.1)   (9.8)   (8.9)   (6.7)   (6.2)    (2.2)   (2.0)   (2.7)   (1.9)

SLIDE 35

Example 2: Weibull model

Distribution of β̂ when β = 0.5 and β = 3 (η = 10, n = 20).

[Figure: boxplots of β̂ for ML, MM, OLS and the average estimator (AG); left panel β = 0.5, right panel β = 3.]

SLIDE 36

Example 2: Weibull model

Combination of all estimators to estimate η. Let η̂ML denote the maximum likelihood estimator of η, and T1, T2, T3 the previous estimators of β. We consider

η̂AV = η̂ML + λ1T1 + λ2T2 + λ3T3,  with λ1 + λ2 + λ3 = 0.

Estimated MSE (10⁴ replications), with standard errors in parentheses:

          n=10               n=20               n=50
          ML      AV         ML      AV         ML      AV
β = 0.5   60.59   55.61      25.96   24.56      9.57    9.38
          (1.60)  (1.48)     (0.53)  (0.5)      (0.17)  (0.17)
β = 1     11.15   10.88      5.53    5.43       2.23    2.22
          (0.18)  (0.17)     (0.08)  (0.08)     (0.03)  (0.03)
β = 2     2.71    2.74       1.36    1.37       0.55    0.56
          (0.04)  (0.04)     (0.02)  (0.02)     (0.01)  (0.01)
β = 3     1.21    1.23       0.61    0.61       0.247   0.248
          (0.02)  (0.02)     (0.01)  (0.01)     (0.003) (0.004)

SLIDE 37

Example 3: kernel density estimation

Estimation of a density f based on a sample of size n. We choose the Gaussian kernel and consider 4 choices of bandwidth in the function density (option bw): h1: nrd0 (Silverman's rule of thumb), h2: nrd (a variation), h3: ucv (unbiased cross-validation), h4: SJ (the Sheather and Jones method). Denoting the initial estimators by T = (f̂_{n,h1}, . . . , f̂_{n,h4})⊤, the average estimator of f over Λmax is

f̂AV = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T,

where Σ is the MISE matrix with entries ∫ E(f̂_{n,hi}(x) − f(x))(f̂_{n,hj}(x) − f(x)) dx.

SLIDE 38

Example 3: kernel density estimation

To estimate the MISE matrix, we use its asymptotic expression, whose entries (for the Gaussian kernel) are

AMISE(hi, hj) = 1 / (n √(2π(hi² + hj²))) + (hi² hj² / 4) ∫ (f″(x))² dx.

The integral ∫ (f″(x))² dx is estimated by the standard plug-in method proposed by Jones and Sheather.

Estimated MSE (10⁴ replications) for different densities and sample sizes:

          n=250                              n=500                              n=1000
          h1     h2     h3     h4     AV     h1     h2     h3     h4     AV     h1     h2     h3      h4     AV
Gauss     29.9   27.2   26.8   29.9   24.9   17.7   16.2   16.2   17.3   14.4   10.5   9.7    9.8     10.1   8.4
Mix       24.0   27.5   27.1   25.2   26.7   14.8   17.6   15.3   14.9   14.2   9.1    11.1   8.9     8.8    7.4
Gamma     28.0   32.7   29.5   28.9   27.9   17.1   20.6   17.0   17.2   15.8   10.3   12.7   10.0    10.3   9.0
Cauchy    31.2   37.0   830    132    32.8   18.9   23.2   945    180    18.7   11.4   14.4   1068    226    10.6


SLIDE 40

Conclusion

The best combination θ̂* = Σ_{i=1}^{k} λi Ti with Σ_{i=1}^{k} λi = 1 is

θ̂* = (1⊤Σ⁻¹ / (1⊤Σ⁻¹1)) T.

  • The oracle θ̂* is better than each Ti, but it is not known in practice.
  • The average estimator θ̂ approximates the oracle: Σ is replaced by an estimate Σ̂.
  • The estimation of Σ can be carried out with the same data as those used to compute the initial estimators T1, . . . , Tk. In a fully parametric setting, the initial data are not even necessary to compute θ̂: only the estimators T1, . . . , Tk are needed.
  • θ̂ is (in some sense) asymptotically equivalent to θ̂*, and in our examples the approximation works well for moderate sample sizes.
  • Once θ̂ is obtained, an asymptotic confidence interval comes for free.