A general procedure to combine estimators
Frédéric Lavancier, Laboratoire de Mathématiques Jean Leray, University of Nantes
Joint work with Paul Rochet (University of Nantes)
Introduction
Let θ be an unknown quantity in a statistical model and consider a collection of k estimators T_1, ..., T_k of θ. Aim: combine these estimators to obtain a better estimate.
Outline
1. Some examples
2. The method
3. Theoretical results
4. Simulations: back to the examples
5. Conclusion
1. Some examples
Example 1 : mean and median
Let x_1, ..., x_n be n i.i.d. realisations of an unknown distribution on the real line. Assume this distribution is symmetric around some parameter θ ∈ R. Two natural choices to estimate θ:
- the mean T_1 = x̄_n
- the median T_2 = x_(n/2)

The idea of combining these two estimators goes back to Pierre-Simon de Laplace. In the Second Supplement of the Théorie Analytique des Probabilités (1812), he wrote: "En combinant les résultats de ces deux méthodes, on peut obtenir un résultat dont la loi de probabilité des erreurs soit plus rapidement décroissante." [By combining the results of these two methods, one can obtain a result whose probability law of error decreases more rapidly.]
Laplace considered the combination λ_1 x̄_n + λ_2 x_(n/2) with λ_1 + λ_2 = 1.
1. He proved that the asymptotic law of this combination is Gaussian (in 1812!).
2. Minimizing the asymptotic variance in (λ_1, λ_2), he concluded that:
   - if the underlying distribution is Gaussian, then the best combination is λ_1 = 1 and λ_2 = 0;
   - for other distributions, the best combination depends on the distribution: "L'ignorance où l'on est de la loi de probabilité des erreurs des observations rend cette correction impraticable." [When one does not know the distribution of the errors of observation, this correction is not feasible.]

Is it possible to estimate λ_1 and λ_2?
Example 2 : Weibull model
Let x_1, ..., x_n be i.i.d. from the Weibull distribution with density
$$ f(x) = \frac{\beta}{\eta} \left(\frac{x}{\eta}\right)^{\beta - 1} e^{-(x/\eta)^{\beta}}, \qquad x > 0. $$
We consider 3 standard methods to estimate β and η:
- the maximum likelihood estimator (ML)
- the method of moments (MM)
- the ordinary least squares method, or Weibull plot (OLS)

(A minimal R sketch of these three estimators of β is given below.)
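As a rough illustration only, here is one way the three estimators of the shape β could be computed in R. The moment and Weibull-plot versions below are common textbook variants, not claimed to be the exact implementations behind the slides; `MASS::fitdistr` is assumed to be available for the ML fit.

```r
library(MASS)  # for fitdistr()

## Maximum likelihood estimate of the Weibull shape.
beta_ml <- function(x) unname(fitdistr(x, "weibull")$estimate["shape"])

## Method of moments: the squared coefficient of variation depends only on beta,
## CV^2 = Gamma(1 + 2/beta) / Gamma(1 + 1/beta)^2 - 1, so solve for beta numerically.
beta_mm <- function(x) {
  cv2 <- var(x) / mean(x)^2
  uniroot(function(b) gamma(1 + 2 / b) / gamma(1 + 1 / b)^2 - 1 - cv2,
          interval = c(0.05, 50))$root
}

## Weibull plot (OLS): regress log(-log(1 - F)) on log(x) using the
## empirical plotting positions (i - 0.5)/n; the slope estimates beta.
beta_ols <- function(x) {
  x <- sort(x)
  p <- (seq_along(x) - 0.5) / length(x)
  unname(coef(lm(log(-log(1 - p)) ~ log(x)))[2])
}
```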
Distribution of β̂ when β = 0.5 and β = 3 (η = 10, n = 20). Simulations based on 10⁴ replications.
[Figure: boxplots of β̂ for ML, MM and OLS, for each value of β.]
Which one to choose? Can we combine them to get a better estimate?
Example 3 : kernel density estimation
Let x_1, ..., x_n be a sample from a real random variable with density f. The kernel density estimator of f at x ∈ R is
$$ \hat f_{n,h}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), $$
where K is the kernel and h the smoothing bandwidth. In this setting θ = f and we assume that θ ∈ L²(R). For a fixed kernel K (say the Gaussian kernel), one may consider k different choices of bandwidth h_1, ..., h_k, leading to the estimators
$$ T_1 = \hat f_{n,h_1}, \ \dots, \ T_k = \hat f_{n,h_k}. $$
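In R, the initial estimators T_1, ..., T_k can be obtained directly from `density()` by varying its `bw` argument; a minimal sketch (with an arbitrary illustrative sample and grid) could look as follows.

```r
## Build k kernel density estimates with different bandwidth selectors,
## all evaluated on a common grid; column i of 'T' is T_i = f_hat_{n, h_i}.
x    <- rnorm(500)                        # illustrative sample
bws  <- c("nrd0", "nrd", "ucv", "SJ")     # bandwidth selectors (see next slide)
grid <- seq(min(x) - 1, max(x) + 1, length.out = 512)
T <- sapply(bws, function(b)
  density(x, bw = b, kernel = "gaussian",
          from = min(grid), to = max(grid), n = length(grid))$y)
```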
For instance, in R, 5 choices are proposed in the function density (option bw): nrd0 (Silverman's rule of thumb), nrd (a variation), ucv and bcv (unbiased and biased cross-validation), SJ (Sheather and Jones' method). Example with a mixture distribution and the Cauchy distribution (n = 500):
[Figure: kernel density estimates with the different bandwidth choices, for the mixture sample (left) and the Cauchy sample (right).]
Other examples
- Any parametric model where several estimators are available.
- Any method involving tuning parameters.
- In forecasting (of a time series, or of a model output): combination of several forecasts. This special case has been widely studied and specific procedures have been developed.
2. The method
The oracle
Let θ ∈ R and consider a collection of k estimators T = (T_1, ..., T_k)ᵀ. Following P. S. de Laplace, we look for the best linear combination
$$ \lambda^\top T = \sum_{i=1}^{k} \lambda_i T_i, \qquad \text{where } \sum_{i=1}^{k} \lambda_i = 1. $$
We denote Λ_max = {λ ∈ R^k : λᵀ1 = 1}, where 1 is the vector 1 = (1, ..., 1)ᵀ.

The best non-random combination, in the mean square sense, is the so-called oracle:
$$ \hat\theta^{*} = \lambda^{*\top} T, \qquad \lambda^{*} = \arg\min_{\lambda \in \Lambda_{\max}} \mathbb{E}\big(\lambda^\top T - \theta\big)^{2}. $$
This is a standard optimisation problem, whose solution is (a short derivation is sketched below)
$$ \lambda^{*} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}, $$
where Σ is the mean square error (MSE) matrix of T, i.e.
$$ \Sigma = \mathbb{E}\big[(T - \theta\mathbf{1})(T - \theta\mathbf{1})^\top\big] = \big(\mathbb{E}(T_i - \theta)(T_j - \theta)\big)_{i,j=1,\dots,k}. $$
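The closed form of λ* is only stated on the slide; for completeness, here is the standard Lagrange-multiplier sketch behind it. Since λᵀ1 = 1, we have λᵀT − θ = λᵀ(T − θ1), so E(λᵀT − θ)² = λᵀΣλ: a quadratic form to be minimized under a linear constraint. With a Lagrange multiplier μ,
$$ \nabla_\lambda \big(\lambda^\top \Sigma \lambda - \mu(\lambda^\top \mathbf{1} - 1)\big) = 2\Sigma\lambda - \mu\mathbf{1} = 0 \;\Longrightarrow\; \lambda = \tfrac{\mu}{2}\,\Sigma^{-1}\mathbf{1}, $$
and the constraint λᵀ1 = 1 gives μ/2 = 1/(1ᵀΣ⁻¹1), hence λ* = Σ⁻¹1 / (1ᵀΣ⁻¹1).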
The average estimator
The oracle is therefore
$$ \hat\theta^{*} = \lambda^{*\top} T = \frac{\mathbf{1}^\top \Sigma^{-1}}{\mathbf{1}^\top \Sigma^{-1}\mathbf{1}}\, T. $$
In practice, the optimal weight λ* is not known and must be estimated. This reduces to the estimation of the MSE matrix Σ. Denoting by Σ̂ some estimate of Σ, we obtain the average estimator
$$ \hat\theta = \hat\lambda^\top T = \frac{\mathbf{1}^\top \hat\Sigma^{-1}}{\mathbf{1}^\top \hat\Sigma^{-1}\mathbf{1}}\, T. $$
(A minimal R sketch is given below.)
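Once Σ̂ is available, the averaging step itself is a couple of lines; a minimal sketch, assuming T is the vector of initial estimates and Sigma_hat the estimated MSE matrix:

```r
## Average estimator: weights lambda_hat = Sigma_hat^{-1} 1 / (1' Sigma_hat^{-1} 1),
## combined estimate theta_hat = lambda_hat' T.
average_estimator <- function(T, Sigma_hat) {
  w <- solve(Sigma_hat, rep(1, length(T)))   # Sigma_hat^{-1} 1
  lambda_hat <- w / sum(w)                   # normalise: sum(lambda_hat) = 1
  list(lambda = lambda_hat, theta = sum(lambda_hat * T))
}
```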
Estimation of Σ
In practice, the estimation of Σ may be conducted in several ways, depending on the underlying model:
- In a fully parametric model, the law of T only depends on θ, so Σ = Σ(θ). If Σ(θ) is explicitly known, a natural choice is the plug-in estimator Σ̂ = Σ(θ̂_0), where θ̂_0 is some estimator of θ (for instance one of the initial T_i, or their simple average). Otherwise, Σ(θ̂_0) may be approximated by parametric bootstrap. Note that in these cases, Σ̂ (and so the average estimator θ̂) does not require the initial data used to produce T, but only T itself.
- In a non-parametric setting, Σ may be estimated by standard non-parametric bootstrap. Alternatively, a closed-form expression of Σ may be available asymptotically; a plug-in estimation method then yields Σ̂.

(A non-parametric bootstrap sketch is given below.)
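As an illustration of the non-parametric option, here is a hedged bootstrap sketch; the function names and the choice of centring the replicates at a pilot estimate theta0 are assumptions, not a prescription from the slides.

```r
## Non-parametric bootstrap estimate of the MSE matrix Sigma.
## 'x' is the original sample, 'estimators' a list of k functions mapping a
## sample to an estimate of theta, and 'theta0' a pilot estimate of theta.
bootstrap_Sigma <- function(x, estimators, theta0, B = 200) {
  Tb <- replicate(B, {
    xb <- sample(x, replace = TRUE)                        # resample the data
    vapply(estimators, function(est) est(xb), numeric(1))  # recompute T on xb
  })                                                       # k x B matrix
  D <- Tb - theta0                                         # centred replicates
  (D %*% t(D)) / B                                         # empirical MSE matrix
}
```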
Generalization : Combination of several parameters simultaneously
Assume θ = (θ_1, ..., θ_d)ᵀ and that we have access to several collections of estimators T_1, ..., T_d, one for each component θ_j (the T_j may have different sizes). To estimate, say, θ_1:
- We can consider the simple combination θ̂_1 = λ̂_1ᵀ T_1, where λ̂_1 is a vector of weights of the same size as T_1. This is the previous setting, with the constraint λ̂_1ᵀ 1 = 1.
- Or we can consider the full combination
$$ \hat\theta_1 = \hat\lambda_1^\top T_1 + \dots + \hat\lambda_d^\top T_d, $$
where each vector of weights λ̂_j is of the same size as T_j. We then impose the constraints λ̂_1ᵀ 1 = 1 and λ̂_jᵀ 1 = 0 for all j ≠ 1.

The oracle then depends on the MSE block matrix, with blocks E[(T_j − θ_j 1)(T_{j'} − θ_{j'} 1)ᵀ]. (A sketch of the corresponding constrained weights is given below.)
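Under these constraints the oracle weights again come from an equality-constrained quadratic problem, min λᵀΣλ subject to Aλ = e_target, whose solution is λ = Σ⁻¹Aᵀ(AΣ⁻¹Aᵀ)⁻¹e_target. A hedged R sketch (function and argument names are assumptions):

```r
## Oracle weights for the full combination. 'Sigma' is the block MSE matrix of
## the stacked vector (T_1, ..., T_d), 'sizes' the lengths of the T_j, and
## 'target' the index of the component theta_j being estimated.
oracle_full <- function(Sigma, sizes, target = 1) {
  d <- length(sizes)
  A <- matrix(0, d, sum(sizes))
  blocks <- split(seq_len(sum(sizes)), rep(seq_len(d), sizes))
  for (j in seq_len(d)) A[j, blocks[[j]]] <- 1   # row j sums the weights of T_j
  b <- as.numeric(seq_len(d) == target)          # constraints: A lambda = e_target
  SiAt <- solve(Sigma, t(A))                     # Sigma^{-1} A'
  drop(SiAt %*% solve(A %*% SiAt, b))            # Sigma^{-1}A'(A Sigma^{-1}A')^{-1} b
}
```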
3. Theoretical results
Oracle inequality
For Λ ⊂ Λ_max and two matrices A and B, we introduce the divergence
$$ \delta_{\Lambda}(A \mid B) = \sup_{\lambda \in \Lambda} \left| 1 - \frac{\operatorname{tr}(\lambda^\top A \lambda)}{\operatorname{tr}(\lambda^\top B \lambda)} \right|, \qquad \delta_{\Lambda}(A, B) = \max\{\delta_{\Lambda}(A \mid B),\, \delta_{\Lambda}(B \mid A)\}. $$

Theorem. Let Λ be a non-empty closed convex subset of Λ_max and Σ̂ a symmetric positive definite k × k matrix. The averaging estimator θ̂ = λ̂ᵀT satisfies
$$ \big(\hat\theta - \hat\theta^{*}\big)^{2} \;\le\; \inf_{\lambda \in \Lambda} \mathbb{E}\big(\lambda^\top T - \theta\big)^{2} \;\Big(2\,\delta_{\Lambda}(\hat\Sigma, \Sigma) + \delta_{\Lambda}(\hat\Sigma, \Sigma)^{2}\Big)\; \big\|\Sigma^{-1/2}(T - \theta\mathbf{1})\big\|^{2}, \tag{1} $$
where θ̂* is the oracle.
- The first factor is the MSE of the oracle.
- The second factor should be small, provided Σ̂ is "close" to Σ.
- The third factor plays the role of a constant, in view of E‖Σ^{-1/2}(T − θ1)‖² = k.
Asymptotic results
Let n denote the size of the sample used to produce T, and set
$$ \alpha_n := \mathbb{E}\big(\hat\theta^{*}_{n} - \theta\big)^{2} = \lambda^{*\top}_{n} \Sigma_{n} \lambda^{*}_{n}, \qquad \hat\alpha_n = \hat\lambda^{\top}_{n} \hat\Sigma_{n} \hat\lambda_{n}. $$

Theorem. If Σ̂_n Σ_n^{-1} → I in probability, then
$$ \big(\hat\theta_n - \theta\big)^{2} = \big(\hat\theta^{*}_{n} - \theta\big)^{2} + o_{p}(\alpha_n). $$
Moreover, if the vector of initial estimators T is asymptotically Gaussian, then
$$ \hat\alpha_n^{-1/2}\,\big(\hat\theta_n - \theta\big) \xrightarrow{\;d\;} \mathcal{N}(0, 1). \tag{2} $$

(2) allows the construction of asymptotic confidence intervals for θ without further approximation (since α̂_n is already computed to obtain θ̂). This interval is (asymptotically) of minimal length amongst all confidence intervals based on a linear combination of T. (A one-line sketch is given below.)
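Concretely, (2) gives the usual normal-quantile interval; a minimal sketch, assuming theta_hat and alpha_hat = λ̂ᵀΣ̂λ̂ come from the averaging step:

```r
## Asymptotic confidence interval for theta based on (2).
confint_average <- function(theta_hat, alpha_hat, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  theta_hat + c(-1, 1) * z * sqrt(alpha_hat)
}
```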
4. Simulations: back to the examples
Example 1 : mean and median
x_1, ..., x_n i.i.d. ∼ f, with variance σ², where f is symmetric around θ. T = (T_1, T_2)ᵀ with T_1 = x̄_n and T_2 = x_(n/2). The average estimator over Λ_max is
$$ \hat\theta = \frac{\mathbf{1}^\top \hat\Sigma^{-1}}{\mathbf{1}^\top \hat\Sigma^{-1}\mathbf{1}}\, T. $$
Two choices for Σ̂:
- The asymptotic form of Σ (obtained by P. S. de Laplace) is n⁻¹W, where
$$ W = \begin{pmatrix} \sigma^{2} & \dfrac{\mathbb{E}|X - \theta|}{2 f(\theta)} \\[6pt] \dfrac{\mathbb{E}|X - \theta|}{2 f(\theta)} & \dfrac{1}{4 f(\theta)^{2}} \end{pmatrix}. $$
All entries of W can be estimated naturally given an initial estimate θ̂_0 (we choose θ̂_0 = x_(n/2)); this leads to a first estimate Σ̂ and to the average estimator θ̂_AV (a sketch is given below).
- Σ is estimated by non-parametric bootstrap (i.e. resampling), leading to another average estimator denoted θ̂_AVB.
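A hedged sketch of θ̂_AV along these lines; the plug-in choices (sample variance for σ², a kernel estimate of f at the median for f(θ)) are natural readings of the slide, not necessarily the exact ones used.

```r
## Average of mean and median with plug-in estimate of the asymptotic MSE matrix.
theta_AV <- function(x) {
  n      <- length(x)
  T      <- c(mean(x), median(x))
  theta0 <- T[2]                                  # pilot estimate: the median
  d      <- density(x)
  f0     <- approx(d$x, d$y, xout = theta0)$y     # plug-in estimate of f(theta)
  cov12  <- mean(abs(x - theta0)) / (2 * f0)      # E|X - theta| / (2 f(theta))
  W      <- matrix(c(var(x), cov12, cov12, 1 / (4 * f0^2)), 2, 2)
  w      <- solve(W / n, c(1, 1))                 # Sigma_hat^{-1} 1
  sum(w / sum(w) * T)                             # lambda_hat' T
}
```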
Estimated MSE based on 10⁴ replications, for several distributions f with θ = 0:

              n=30                          n=50                          n=100
          MEAN    MED    AV     AVB     MEAN    MED    AV     AVB     MEAN    MED    AV     AVB
Cauchy    2·10⁶   9      8.95   8.99    4·10⁷   5.07   4.92   4.9     2·10⁷   2.56   2.49   2.49
St(4)     6.68    5.71   5.4    5.43    4.12    3.53   3.33   3.34    1.99    1.74   1.61   1.62
St(7)     4.8     5.51   4.6    4.64    2.82    3.32   2.74   2.8     1.42    1.67   1.37   1.38
Logistic  10.89   12.7   10.76  10.87   6.64    7.93   6.52   6.6     3.3     4      3.2    3.26
Gauss     3.39    5.11   3.53   3.61    2.04    3.1    2.1    2.15    1       1.51   1.02   1.06
Mix       16.79   87     15.03  13.41   10.08   66.53  7.57   6.68    5.05    42.35  3.09   2.36
θ̂_AV and θ̂_AVB behave similarly: they outperform both the mean and the median in all cases except for the Gaussian law.
- For the Gaussian law: we know that x̄_n is the best estimator. However, the performances of θ̂_AV and θ̂_AVB are very close to it, meaning that the optimal weight (1, 0) is well estimated.
- For the Cauchy distribution: surprisingly good performances of θ̂_AV and θ̂_AVB, given that x̄_n should not have been used at all. This means that the optimal weight (0, 1) is well estimated.
Example 2 : Weibull model
η is estimated by MLE, and 3 estimators are considered for β: T_1 = ML, T_2 = MM, T_3 = OLS. The average estimator of β over Λ_max is
$$ \hat\beta_{AV} = \frac{\mathbf{1}^\top \hat\Sigma^{-1}}{\mathbf{1}^\top \hat\Sigma^{-1}\mathbf{1}}\, T. $$
Here Σ = Σ(β, η), but its closed-form expression is not known. However, Σ̂ can be obtained by parametric bootstrap (a sketch follows this list):
1. Resample B Weibull samples of size n with parameters β̂_0 and η̂_0, where β̂_0 and η̂_0 are initial estimators (we chose the mean of T_1, T_2, T_3 for β̂_0 and the MLE for η̂_0).
2. For each sample b, compute T_1^{(b)}, T_2^{(b)} and T_3^{(b)}.
3. Σ̂ is the empirical MSE matrix of T; for instance E(T_1 − θ)(T_2 − θ) is estimated by
$$ \frac{1}{B} \sum_{b=1}^{B} \big(T_1^{(b)} - \hat\beta_0\big)\big(T_2^{(b)} - \hat\beta_0\big). $$
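A hedged sketch of this parametric bootstrap in R; `estimators` is assumed to be a list of the three functions sample → β̂ (e.g. the ML/MM/OLS sketches given earlier), and the centring at β̂_0 follows step 3.

```r
## Parametric-bootstrap estimate of Sigma for the Weibull shape beta.
Sigma_weibull <- function(estimators, beta0, eta0, n, B = 500) {
  Tb <- replicate(B, {
    xb <- rweibull(n, shape = beta0, scale = eta0)          # step 1: resample
    vapply(estimators, function(est) est(xb), numeric(1))   # step 2: re-estimate
  })                                                        # k x B matrix (here 3 x B)
  D <- Tb - beta0                                           # step 3: centre at beta0
  (D %*% t(D)) / B                                          # empirical MSE matrix
}
## beta_AV then follows from the weights Sigma_hat^{-1} 1 / (1' Sigma_hat^{-1} 1).
```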
Simulations for several values of β and different sample sizes n (η = 10). Estimated MSE (10⁴ replications) with standard errors in parentheses:

              n=10                             n=20                             n=50
          ML      MM      OLS     AV       ML      MM      OLS     AV       ML      MM      OLS     AV
β = 0.5   35.53   76.95   24.41   25.27    12.06   35.57   13.74   10.5     3.7     14.19   6.04    3.52
          (0.91)  (1.27)  (0.40)  (0.64)   (0.26)  (0.52)  (0.19)  (0.19)   (0.07)  (0.20)  (0.08)  (0.06)
β = 1     152.4   131.6   98.1    85.5     49.2    53.6    54.2    36.9     14.4    19.3    23.9    12.8
          (3.8)   (3.1)   (1.5)   (1.7)    (1.1)   (1.1)   (0.7)   (0.7)    (0.2)   (0.3)   (0.3)   (0.2)
β = 2     596.4   444.6   399.4   355.5    194.5   164.5   218     163.3    57.9    53.9    94.8    54.3
          (14.4)  (11.9)  (6.3)   (6.7)    (3.8)   (3.3)   (2.8)   (2.7)    (1.0)   (0.9)   (1.3)   (0.9)
β = 3     1369    1080    905     770      452     394     486     343      128     122     211     120
          (34.6)  (29.7)  (14.6)  (18.1)   (9.8)   (8.9)   (6.7)   (6.2)    (2.2)   (2.0)   (2.7)   (1.9)
Distribution of β̂ when β = 0.5 and β = 3 (η = 10, n = 20).
[Figure: boxplots of β̂ for ML, MM, OLS and the average estimator (AG), for each value of β.]
Combination of all estimators to estimate η:
- η̂_ML: the maximum likelihood estimator of η
- T_1, T_2, T_3: the previous estimators of β

We consider η̂_AV = η̂_ML + λ_1 T_1 + λ_2 T_2 + λ_3 T_3 with λ_1 + λ_2 + λ_3 = 0.

Estimated MSE (10⁴ replications) with standard errors in parentheses:

              n=10                 n=20                n=50
          ML       AV          ML       AV         ML       AV
β = 0.5   60.59    55.61       25.96    24.56      9.57     9.38
          (1.60)   (1.48)      (0.53)   (0.5)      (0.17)   (0.17)
β = 1     11.15    10.88       5.53     5.43       2.23     2.22
          (0.18)   (0.17)      (0.08)   (0.08)     (0.03)   (0.03)
β = 2     2.71     2.74        1.36     1.37       0.55     0.56
          (0.04)   (0.04)      (0.02)   (0.02)     (0.01)   (0.01)
β = 3     1.21     1.23        0.61     0.61       0.247    0.248
          (0.02)   (0.02)      (0.01)   (0.01)     (0.003)  (0.004)
Example 3 : kernel density estimation
Estimation of a density f based on a sample of size n. We choose the Gaussian kernel and consider 4 choices of bandwidth in the function density (option bw):
- h_1: nrd0 (Silverman's rule of thumb)
- h_2: nrd (a variation)
- h_3: ucv (unbiased cross-validation)
- h_4: SJ (Sheather and Jones' method)

Denoting the initial estimators by T = (f̂_{n,h_1}, ..., f̂_{n,h_4})ᵀ, the average estimator of f over Λ_max is
$$ \hat f_{AV} = \frac{\mathbf{1}^\top \hat\Sigma^{-1}}{\mathbf{1}^\top \hat\Sigma^{-1}\mathbf{1}}\, T, $$
where Σ is the MISE matrix with entries
$$ \Sigma_{ij} = \int \mathbb{E}\Big[\big(\hat f_{n,h_i}(x) - f(x)\big)\big(\hat f_{n,h_j}(x) - f(x)\big)\Big]\, dx. $$
To estimate the MISE matrix, we consider its asymptotic expression, whose entries (when the Gaussian kernel is used) are
$$ \mathrm{AMISE}(h_i, h_j) = \frac{1}{n\sqrt{2\pi\,(h_i^{2} + h_j^{2})}} + \frac{h_i^{2} h_j^{2}}{4} \int \big(f''(x)\big)^{2}\, dx. $$
The integral ∫(f''(x))² dx is estimated by the standard plug-in method proposed by Jones and Sheather.

Estimated MSE (10⁴ replications) for different densities and sample sizes (an R sketch of this AMISE-based combination is given after the table):
              n=250                             n=500                             n=1000
        h1     h2     h3     h4     AV     h1     h2     h3     h4     AV     h1     h2     h3     h4     AV
Gauss   29.9   27.2   26.8   29.9   24.9   17.7   16.2   16.2   17.3   14.4   10.5   9.7    9.8    10.1   8.4
Mix     24.0   27.5   27.1   25.2   26.7   14.8   17.6   15.3   14.9   14.2   9.1    11.1   8.9    8.8    7.4
Gamma   28.0   32.7   29.5   28.9   27.9   17.1   20.6   17.0   17.2   15.8   10.3   12.7   10.0   10.3   9.0
Cauchy  31.2   37.0   830    132    32.8   18.9   23.2   945    180    18.7   11.4   14.4   1068   226    10.6
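For illustration, here is a hedged end-to-end sketch of this combination in R. One simplification relative to the slides: the curvature integral ∫(f'')² is replaced by the Gaussian reference rule 3/(8√π σ⁵) rather than the Jones-Sheather plug-in, and the grid and function names are assumptions.

```r
## Averaged kernel density estimate with AMISE-based weights (Gaussian kernel).
f_AV <- function(x, bws = c("nrd0", "nrd", "ucv", "SJ"), npts = 512) {
  n  <- length(x)
  rg <- range(x) + c(-1, 1) * 0.1 * diff(range(x))        # evaluation grid
  ds <- lapply(bws, function(b)
    density(x, bw = b, kernel = "gaussian", from = rg[1], to = rg[2], n = npts))
  hs <- sapply(ds, `[[`, "bw")                            # selected bandwidths
  Ts <- sapply(ds, `[[`, "y")                             # npts x k matrix of T_i
  R2 <- 3 / (8 * sqrt(pi) * sd(x)^5)                      # reference-rule int (f'')^2
  Sigma_hat <- 1 / (n * sqrt(2 * pi * outer(hs^2, hs^2, "+"))) +
               outer(hs^2, hs^2) / 4 * R2                 # AMISE matrix
  w <- solve(Sigma_hat, rep(1, length(bws)))
  lambda <- w / sum(w)                                    # averaging weights
  list(x = ds[[1]]$x, y = drop(Ts %*% lambda), lambda = lambda)
}
```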
5. Conclusion