A general procedure to combine estimators (Frédéric Lavancier)


SLIDE 1

Examples Method Theory Simulations Conclusion

A general procedure to combine estimators

Frédéric Lavancier

Laboratoire de Mathématiques Jean Leray, University of Nantes

Joint work with Paul Rochet (University of Nantes)

SLIDE 2

Introduction

Let θ be an unknown quantity in a statistical model, and consider a collection of k estimators T1, ..., Tk of θ. Aim : combine these estimators to obtain a better estimate.

SLIDE 3

1. Some examples
2. The method
3. Theoretical results
4. Simulations : back to the examples
5. Conclusion


SLIDE 5

Example 1 : mean and median

Let x1, . . . , xn be n i.i.d. realisations of an unknown distribution on the real line. Assume this distribution is symmetric around some parameter θ ∈ R. Two natural choices to estimate θ :

  • the mean T1 = x̄n
  • the median T2 = x(n/2)

The idea of combining these two estimators goes back to Pierre Simon de Laplace. In the Second Supplement of the Théorie Analytique des Probabilités (1812), he wrote : "En combinant les résultats de ces deux méthodes, on peut obtenir un résultat dont la loi de probabilité des erreurs soit plus rapidement décroissante." [By combining the results of these two methods, one can obtain a result whose probability law of errors decreases more rapidly.]


SLIDE 7

Example 1 : mean and median

Laplace considered the combination λ1 x̄n + λ2 x(n/2) with λ1 + λ2 = 1.

  • 1. He proved that the asymptotic law of this combination is Gaussian (in 1812)!
  • 2. Minimizing the asymptotic variance in λ1, λ2, he concluded that :
    - if the underlying distribution is Gaussian, then the best combination is λ1 = 1 and λ2 = 0 ;
    - for other distributions, the best combination depends on the distribution : "L'ignorance où l'on est de la loi de probabilité des erreurs des observations rend cette correction impraticable" [When one does not know the distribution of the errors of observation, this correction is not feasible.]

Is it possible to estimate λ1 and λ2?
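Laplace's conclusion for the Gaussian case can be checked numerically. The oracle weights minimizing the asymptotic variance are λ* = W⁻¹1/(1⊤W⁻¹1), where W is the asymptotic covariance matrix (per 1/n) of (x̄n, x(n/2)); for a N(θ, σ²) sample, Var(x̄n) = σ², Cov = σ² and Var(x(n/2)) = πσ²/2. A minimal Python sketch (variable names ours):

```python
import numpy as np

# Asymptotic covariance (times n) of (mean, median) for N(theta, sigma^2):
# Var(mean) = sigma^2, Cov = E|X - theta| / (2 f(theta)) = sigma^2,
# Var(median) = 1 / (4 f(theta)^2) = pi * sigma^2 / 2.
sigma2 = 1.0
W = sigma2 * np.array([[1.0, 1.0],
                       [1.0, np.pi / 2]])

one = np.ones(2)
w = np.linalg.solve(W, one)   # W^{-1} 1
lam = w / (one @ w)           # oracle weights, summing to one
print(lam)                    # all weight on the mean: lam ~ (1, 0)
```

The solver recovers λ* = (1, 0): under Gaussian errors the median adds nothing to the mean, exactly as Laplace found.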




SLIDE 11

Example 2 : Boolean model

The standard Boolean model Γ is the union of random discs where

  • the centres come from a homogeneous Poisson point process with intensity ρ
  • the radii are independently distributed according to some probability law µ

[Figure : examples on [0, 1]² when µ = U([0.01, 0.1]) and ρ = 50, 100, 200]

SLIDE 12

Example 2 : Boolean model

Assume the law of the radii µ is known and we want to estimate ρ. Denote by p̂ = |Γ ∩ W|/|W| the volume fraction on the observation window W.

First estimator : denoting R ∼ µ,

    T1 = −log(1 − p̂) / (π E(R²))

Second estimator : let n⁺_ν be the number of lower tangent points of Γ in direction ν (ν ∈ [0, 2π]). Then

    ρ̂_ν = n⁺_ν / (|W| (1 − p̂))

is a consistent estimator of ρ. Sample K independent directions νi uniformly on [0, 2π], then

    T2 = (1/K) (ρ̂_ν1 + · · · + ρ̂_νK)

T2 has a smaller asymptotic variance than ρ̂_ν, for all ν (Molchanov (1995)).
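To make T1 concrete, here is a hypothetical Python sketch (function name, grid resolution and edge handling are our choices, not from the talk): it simulates a Boolean model of discs on [0, 1]², measures the volume fraction p̂ on a pixel grid, and returns T1 = −log(1 − p̂)/(π E(R²)).

```python
import numpy as np

def boolean_T1(rho, r_low=0.01, r_high=0.1, grid=400, rng=None):
    """Simulate a Boolean model of discs on [0, 1]^2 and return the
    volume-fraction estimator T1 = -log(1 - p_hat) / (pi E(R^2)).
    Germs are drawn on an enlarged window to reduce edge effects."""
    rng = rng or np.random.default_rng(1)
    pad = r_high
    n = rng.poisson(rho * (1 + 2 * pad) ** 2)   # Poisson number of germs
    cx = rng.uniform(-pad, 1 + pad, n)
    cy = rng.uniform(-pad, 1 + pad, n)
    r = rng.uniform(r_low, r_high, n)
    xs = (np.arange(grid) + 0.5) / grid          # pixel centres
    X, Y = np.meshgrid(xs, xs)
    covered = np.zeros_like(X, dtype=bool)
    for i in range(n):                           # union of the discs
        covered |= (X - cx[i]) ** 2 + (Y - cy[i]) ** 2 <= r[i] ** 2
    p_hat = covered.mean()
    ER2 = (r_high**3 - r_low**3) / (3 * (r_high - r_low))  # E(R^2) for U(a, b)
    return -np.log(1 - p_hat) / (np.pi * ER2)

print(boolean_T1(rho=100))  # close to the true intensity rho = 100
```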



SLIDE 15

Example 2 : Boolean model

n⁺_ν : number of lower tangent points in direction ν.

[Figure : for ν = π/4, we obtain n⁺_ν = 6]

SLIDE 16

Example 2 : Boolean model

Simulation study on 10³ replications on [0, 1]² when µ = U([0.01, 0.1]) and ρ = 50, 100, 200.

[Boxplots of T1 and T2 for ρ = 50, ρ = 100 and ρ = 200]

Which one to choose? Can we combine them to get a better estimate?


SLIDE 18

Example 3 : Thomas cluster process

A Thomas cluster process is a Poisson cluster process with 3 parameters :

  • κ : intensity of the Poisson process of cluster centres (the parents)
  • µ : expected number of points per cluster (the offspring)
  • σ : given the parents, each offspring is distributed according to a Gaussian law centred at its parent, with standard deviation σ

[Figure : examples on [0, L]² with L = 1, 2, 3 (κ = 10, µ = 10, σ = 0.05)]

SLIDE 19

Example 3 : Thomas cluster process

Several standard methods exist to estimate the 3 parameters :

  • Minimum contrast method based on the pair correlation function g
  • Minimum contrast method based on Ripley's K function
  • Composite likelihood estimation
  • Palm likelihood estimation

How to choose in practice, given that

  • no method is uniformly better than the others (it depends on the values of the parameters) ;
  • each method depends on some tuning parameters, which are not easy to calibrate?

Alternative : combine several estimators (from different methods and/or from different choices of the tuning parameters).

SLIDE 20

Other examples

  • Any parametric model where several estimators are available.
  • In kernel-based estimation (e.g. density estimation) : combination of estimators associated with different bandwidths.
  • In forecasting (of a time series, or of a model output) : combination of several forecasts. This special case has been widely studied and specific procedures have been developed.

SLIDE 21

1. Some examples
2. The method
3. Theoretical results
4. Simulations : back to the examples
5. Conclusion

SLIDE 22

The oracle

Let θ ∈ R and consider a collection of k estimators T = (T1, ..., Tk)⊤. Following P. S. de Laplace, we look for the best linear combination

    λ⊤T = λ1T1 + · · · + λkTk,  where λ1 + · · · + λk = 1.

We denote Λmax = {λ ∈ R^k : λ⊤1 = 1}, where 1 is the vector 1 = (1, ..., 1)⊤.

The best non-random combination, in the mean square sense, is the so-called oracle :

    θ̂* = λ*⊤T,  where λ* = argmin_{λ ∈ Λmax} E(λ⊤T − θ)².

This is a standard optimisation problem, whose solution is

    λ* = Σ⁻¹1 / (1⊤Σ⁻¹1),

where Σ is the Mean Square Error matrix of T, i.e. Σ = E[(T − θ1)(T − θ1)⊤] = (E[(Ti − θ)(Tj − θ)])_{i,j=1,...,k}.
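The closed form for λ* can be sanity-checked numerically: by definition of Σ, E(λ⊤T − θ)² = λ⊤Σλ for any non-random λ with λ⊤1 = 1, and this quadratic form is minimized at λ* = Σ⁻¹1/(1⊤Σ⁻¹1). A sketch with a hypothetical MSE matrix (values ours, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical MSE matrix Sigma for k = 3 estimators.
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
one = np.ones(3)
w = np.linalg.solve(Sigma, one)
lam_star = w / (one @ w)          # oracle weights, summing to one

def mse(lam):
    # E(lam^T T - theta)^2 = lam^T Sigma lam for non-random feasible lam
    return lam @ Sigma @ lam

# lam_star beats 1000 random weight vectors summing to one:
for _ in range(1000):
    lam = rng.normal(size=3)
    if abs(lam.sum()) < 1e-6:     # avoid degenerate normalisation
        continue
    lam /= lam.sum()
    assert mse(lam_star) <= mse(lam) + 1e-12
```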



SLIDE 25

The average estimator

The oracle is therefore

    θ̂* = λ*⊤T = (1⊤Σ⁻¹ / (1⊤Σ⁻¹1)) T.

In practice, the optimal weight λ* is not known and must be estimated, which reduces to the estimation of the MSE matrix Σ. Denoting by Σ̂ some estimate of Σ, we obtain the average estimator

    θ̂ = λ̂⊤T = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T.

SLIDE 26

Estimation of Σ

In practice, the estimation of Σ may be conducted in several ways, depending on the underlying model :

  • In a fully parametric model, the law of T only depends on θ, so Σ = Σ(θ). If Σ(θ) is explicitly known, a natural choice is the plug-in estimator Σ̂ = Σ(θ̂0), where θ̂0 is some estimator of θ (for instance one of the initial Ti, or their simple average). Otherwise, Σ(θ̂0) may be approximated by parametric bootstrap. Note that in these cases, Σ̂ (and so the average estimator θ̂) does not require the initial data used to produce T, but only T.
  • In a non-parametric setting, Σ may be estimated by standard non-parametric bootstrap. Alternatively, a closed-form expression of Σ may be available asymptotically ; then a plug-in estimation method yields Σ̂.
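As an illustration of the parametric-bootstrap option, here is a sketch in a toy setting (the N(θ, 1) model, the function name and B = 500 are our choices): Σ(θ̂0) for T = (mean, median) is approximated by resampling from the model fitted at θ̂0.

```python
import numpy as np

def sigma_hat_bootstrap(theta0, n, B=500, rng=None):
    """Parametric-bootstrap estimate of the MSE matrix of T = (mean, median)
    in a toy N(theta, 1) model, fitted at the plug-in value theta0."""
    rng = rng or np.random.default_rng(0)
    T = np.empty((B, 2))
    for b in range(B):
        x = rng.normal(theta0, 1.0, size=n)   # resample from the fitted model
        T[b] = x.mean(), np.median(x)
    D = T - theta0                             # errors around theta0
    return D.T @ D / B                         # empirical MSE matrix

Sig = sigma_hat_bootstrap(theta0=0.0, n=100)
print(np.round(Sig * 100, 2))  # roughly [[1, 1], [1, pi/2]]: n * Sigma ~ W
```

Multiplying by n, the result should approach Laplace's asymptotic matrix: for the Gaussian model, n·Σ ≈ [[σ², σ²], [σ², πσ²/2]].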



SLIDE 29

Generalization : Combination of several parameters simultaneously

Assume θ = (θ1, . . . , θd)⊤ and we have access to several collections of estimators T1, . . . , Td, one for each component θj (the Tj may have different sizes). To estimate, say, θ1 :

  • We can consider the simple combination θ̂1 = λ̂1⊤T1, where λ̂1 is a vector of weights of the same size as T1. This is the previous setting, with the constraint λ̂1⊤1 = 1.
  • Or we can consider the full combination θ̂1 = λ̂1⊤T1 + · · · + λ̂d⊤Td, where each vector of weights λ̂j is of the same size as Tj, under the constraints λ̂1⊤1 = 1 and λ̂j⊤1 = 0 for all j ≠ 1.

The oracle then depends on the MSE block matrix, with blocks E[(Tj − θj 1)(Tj′ − θj′ 1)⊤].
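The constrained minimisation behind the full combination has a closed form via Lagrange multipliers: stack the weights into one vector λ and minimise λ⊤Σλ subject to Cλ = e1, where row j of C sums the weights of block j and e1 = (1, 0, . . . , 0)⊤. A sketch (helper name and test matrix ours):

```python
import numpy as np

def full_combination_weights(Sigma, sizes):
    """Weights of the full combination: minimise lam^T Sigma lam subject to
    lam_1^T 1 = 1 and lam_j^T 1 = 0 for j != 1, where Sigma is the MSE block
    matrix of the stacked estimators.
    Lagrange solution: lam = Sigma^{-1} C^T (C Sigma^{-1} C^T)^{-1} e1."""
    d, k = len(sizes), sum(sizes)
    C = np.zeros((d, k))                 # row j sums the weights of block j
    start = 0
    for j, s in enumerate(sizes):
        C[j, start:start + s] = 1.0
        start += s
    e1 = np.zeros(d)
    e1[0] = 1.0
    Si_Ct = np.linalg.solve(Sigma, C.T)  # Sigma^{-1} C^T
    return Si_Ct @ np.linalg.solve(C @ Si_Ct, e1)

# With a block-diagonal MSE matrix (no cross-correlation between blocks),
# the solution puts zero weight on the other block and reduces to the
# simple combination.
Sigma = np.block([[np.array([[2.0, 0.5], [0.5, 1.0]]), np.zeros((2, 2))],
                  [np.zeros((2, 2)), np.eye(2)]])
print(full_combination_weights(Sigma, [2, 2]))  # weights ~ [0.25, 0.75, 0, 0]
```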



SLIDE 32

1. Some examples
2. The method
3. Theoretical results
4. Simulations : back to the examples
5. Conclusion

SLIDE 33

Oracle inequality

For Λ ⊂ Λmax and two matrices A and B, we introduce the divergence

    δΛ(A|B) = sup_{λ ∈ Λ} |1 − tr(λ⊤Aλ)/tr(λ⊤Bλ)|,

and δΛ(A, B) = max{δΛ(A|B), δΛ(B|A)}.

Theorem. Let Λ be a non-empty closed convex subset of Λmax and Σ̂ a symmetric positive definite k × k matrix. The averaging estimator θ̂ = λ̂⊤T satisfies

    (θ̂ − θ̂*)² ≤ inf_{λ ∈ Λ} E(λ⊤T − θ)² × (2 δΛ(Σ̂, Σ) + δΛ(Σ̂, Σ)²) × ‖Σ^(−1/2)(T − θ1)‖²,   (1)

where θ̂* is the oracle. In this bound :

  • the first factor is the MSE of the oracle (restricted to Λ) ;
  • the second factor should be small, provided Σ̂ is "close" to Σ ;
  • the last factor plays the role of a constant, in view of E‖Σ^(−1/2)(T − θ1)‖² = k.

SLIDE 34

Asymptotic results

Let n denote the size of the sample used to produce T, and define

    αn := E(θ̂*_n − θ)² = λ*_n⊤ Σn λ*_n,   α̂n := λ̂n⊤ Σ̂n λ̂n.

Theorem. If Σ̂n Σn⁻¹ → I in probability, then

    (θ̂n − θ)² = (θ̂*_n − θ)² + op(αn).

Moreover, if the vector of initial estimators T is asymptotically Gaussian, then

    α̂n^(−1/2) (θ̂n − θ) → N(0, 1) in distribution.   (2)

(2) allows one to construct asymptotic confidence intervals for θ, without further approximation (since α̂n is already computed to get θ̂). This interval is (asymptotically) of minimal length amongst all confidence intervals based on a linear combination of T.
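The confidence interval implied by (2) can be sketched as follows (function name ours; z fixed at the 97.5% Gaussian quantile): the interval θ̂ ± z√α̂n comes for free, since α̂n = λ̂⊤Σ̂λ̂ is a by-product of the averaging step.

```python
import numpy as np

def average_estimate_ci(T, Sigma_hat, z=1.959964):
    """Average estimator and its asymptotic 95% confidence interval, eq. (2):
    lam = Sigma_hat^{-1} 1 / (1^T Sigma_hat^{-1} 1), theta_hat = lam^T T,
    alpha_hat = lam^T Sigma_hat lam, CI = theta_hat +/- z sqrt(alpha_hat)."""
    T = np.asarray(T, dtype=float)
    one = np.ones(len(T))
    w = np.linalg.solve(Sigma_hat, one)
    lam = w / (one @ w)
    theta_hat = lam @ T
    alpha_hat = lam @ Sigma_hat @ lam   # equals 1 / (1^T Sigma_hat^{-1} 1)
    half = z * np.sqrt(alpha_hat)
    return theta_hat, (theta_hat - half, theta_hat + half)

# Hypothetical values of two estimators and of their estimated MSE matrix:
est, (lo, hi) = average_estimate_ci([1.2, 0.8], np.array([[0.04, 0.01],
                                                          [0.01, 0.09]]))
print(round(est, 3), (round(lo, 3), round(hi, 3)))  # -> 1.091 (0.741, 1.441)
```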

SLIDE 35

1. Some examples
2. The method
3. Theoretical results
4. Simulations : back to the examples
5. Conclusion

SLIDE 36

Example 1 : mean and median

Let x1, . . . , xn be i.i.d. ∼ f, with variance σ², where f is symmetric around θ. Take T = (T1, T2)⊤ with T1 = x̄n and T2 = x(n/2). The average estimator over Λmax is θ̂ = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T.

Two choices for Σ̂ :

  • The asymptotic form of Σ (obtained by P. S. de Laplace) is n⁻¹W, where

        W = [ σ²                  E|X − θ|/(2f(θ)) ]
            [ E|X − θ|/(2f(θ))    1/(4f(θ)²)       ]

    All entries of W can be estimated naturally given an initial estimate θ̂0 (we choose θ̂0 = x(n/2)) ; this leads to a first estimate Σ̂ and the average estimator θ̂AV.
  • Σ is estimated by non-parametric bootstrap (i.e. resampling), leading to another average estimator denoted θ̂AVB.
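A plug-in sketch of θ̂AV (function name ours; the density f(θ̂0) is estimated with a Gaussian kernel and Silverman's rule-of-thumb bandwidth, our choices for details the slide leaves open):

```python
import numpy as np

def theta_av(x):
    """Plug-in average of mean and median, a sketch of theta_AV:
    estimate the entries of W at theta0 = median, then combine with
    lam = W^{-1} 1 / (1^T W^{-1} 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta0 = np.median(x)
    s2 = x.var(ddof=1)                               # sigma^2
    m1 = np.abs(x - theta0).mean()                   # E|X - theta|
    h = 1.06 * x.std(ddof=1) * n ** (-1 / 5)         # Silverman bandwidth
    # Gaussian-kernel density estimate of f at theta0:
    f0 = np.exp(-((x - theta0) / h) ** 2 / 2).sum() / (n * h * np.sqrt(2 * np.pi))
    W = np.array([[s2, m1 / (2 * f0)],
                  [m1 / (2 * f0), 1 / (4 * f0 ** 2)]])
    one = np.ones(2)
    w = np.linalg.solve(W, one)
    lam = w / (one @ w)
    return lam @ np.array([x.mean(), theta0])

rng = np.random.default_rng(42)
print(theta_av(rng.normal(0.0, 1.0, size=200)))  # close to the true theta = 0
```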

SLIDE 37

Example 1 : mean and median

Estimated MSE based on 10⁴ replications, for several distributions f with θ = 0 :

                    n = 30                       n = 50                       n = 100
          MEAN   MED    AV     AVB     MEAN   MED    AV     AVB     MEAN   MED    AV     AVB
Cauchy    2·10⁶  9      8.95   8.99    4·10⁷  5.07   4.92   4.9     2·10⁷  2.56   2.49   2.49
St(4)     6.68   5.71   5.4    5.43    4.12   3.53   3.33   3.34    1.99   1.74   1.61   1.62
St(7)     4.8    5.51   4.6    4.64    2.82   3.32   2.74   2.8     1.42   1.67   1.37   1.38
Logistic  10.89  12.7   10.76  10.87   6.64   7.93   6.52   6.6     3.3    4      3.2    3.26
Gauss     3.39   5.11   3.53   3.61    2.04   3.1    2.1    2.15    1      1.51   1.02   1.06
Mix       16.79  87     15.03  13.41   10.08  66.53  7.57   6.68    5.05   42.35  3.09   2.36

  • θ̂AV and θ̂AVB behave similarly : they outperform both the mean and the median in all cases except the Gaussian law.
  • For the Gaussian law, we know that x̄n is the best estimator. However, the performances of θ̂AV and θ̂AVB are very close to it, meaning that the optimal weight (1, 0) is well estimated.
  • For the Cauchy distribution, θ̂AV and θ̂AVB perform surprisingly well, given that x̄n should not have been used. This means that the optimal weight (0, 1) is well estimated.
SLIDE 38

Example 2 : Boolean model

Boolean model on [0, 1]² with intensity ρ ; the law of the radii µ is known. ρ is estimated by T = (T1, T2)⊤ with

    T1 = −log(1 − p̂)/(π E(R²)),   T2 = (1/K) (ρ̂_ν1 + · · · + ρ̂_νK),  where ρ̂_ν = n⁺_ν/(|W|(1 − p̂)).

The average estimator over Λmax is ρ̂AV = (1⊤Σ̂⁻¹ / (1⊤Σ̂⁻¹1)) T.

Here Σ = Σ(ρ, µ), but its closed-form expression is not known. However, Σ can be estimated by parametric bootstrap :

1. Resample B Boolean models with parameters ρ̂0 and µ, where ρ̂0 is an initial estimator of ρ (we choose the mean of T1 and T2).
2. For each Boolean sample b, compute T1^(b) and T2^(b).
3. Σ̂ is the empirical MSE matrix of T ; for instance, E[(T1 − θ)(T2 − θ)] is estimated by (1/B) times the sum over b = 1, ..., B of (T1^(b) − ρ̂0)(T2^(b) − ρ̂0).

SLIDE 39

Example 2 : Boolean model

Simulations on [0, 1]² when µ = U([0.01, 0.1]) and ρ = 50, 100, 200.

Estimated MSE (10³ replications of each) :

        ρ = 50   ρ = 100   ρ = 200
T1        91       210       670
T2        79       266      1754
ρ̂AV       75       199       655

Coverage rate of the 95% asymptotic confidence interval based on ρ̂AV :

        ρ = 50   ρ = 100   ρ = 200
        95.2%    96.7%     96%

SLIDE 40

Example 2 : Boolean model

Distribution of the weights (λ̂1, λ̂2)⊤ for ρ = 50, 100, 200.

[Boxplots of λ̂1 and λ̂2 for each value of ρ]

SLIDE 41

Example 3 : Thomas cluster model

Thomas process X on [0, L]², with 3 parameters κ, µ and σ. Two estimation methods are considered :

  • Minimum contrast based on K : kppm(X) in spatstat.
  • Minimum contrast based on g : kppm(X, statistic = "pcf") in spatstat.

    T1 = (κ̂K, κ̂g)⊤,   T2 = (µ̂K, µ̂g)⊤,   T3 = (σ̂K, σ̂g)⊤.

Component-wise combination, over Λmax :

    κ̂AV = (1⊤Σ̂1⁻¹ / (1⊤Σ̂1⁻¹1)) T1,   µ̂AV = (1⊤Σ̂2⁻¹ / (1⊤Σ̂2⁻¹1)) T2,   σ̂AV = (1⊤Σ̂3⁻¹ / (1⊤Σ̂3⁻¹1)) T3.

Σ̂1, Σ̂2 and Σ̂3 are estimated by parametric bootstrap (100 replications).

SLIDE 42

Example 3 : Thomas cluster model

Simulations with L = 1, 2 and 3 (κ = 10, µ = 10, σ = 0.05).

Estimated MSE based on 10³ replications :

           L = 1               L = 2               L = 3
      κ     µ     σ       κ     µ     σ       κ     µ     σ
K     51    18    13      13.8  9.6   9.5     7.0   5.2   5.6
g     44    17    10      8.6   6.2   4.0     3.8   2.6   1.5
AV    41    17    10      7.9   5.8   3.6     3.6   2.4   1.4

Coverage rate of the 95% asymptotic confidence interval based on κ̂AV, µ̂AV, σ̂AV :

           L = 1               L = 2               L = 3
      κ     µ     σ       κ     µ     σ       κ     µ     σ
AV    98.3  86.2  87.0    94.0  91.4  87.5    94.6  92.4  88.4

SLIDE 43

Example 3 : Thomas cluster model

Combination with collaboration of all estimators, e.g.

    κ̂AV2 = λ1,1 κ̂K + λ1,2 κ̂g + λ2,1 µ̂K + λ2,2 µ̂g + λ3,1 σ̂K + λ3,2 σ̂g,

where λ1,1 + λ1,2 = 1 and λ2,1 + λ2,2 = 0, λ3,1 + λ3,2 = 0.

Estimated MSE based on 10³ replications :

           L = 1               L = 2               L = 3
      κ     µ     σ       κ     µ     σ       κ     µ     σ
K     51    18    13      13.8  9.6   9.5     7.0   5.2   5.6
g     44    17    10      8.6   6.2   4.0     3.8   2.6   1.5
AV    41    17    10      7.9   5.8   3.6     3.6   2.4   1.4
AV2   33    20    14      8.1   5.6   3.8     3.6   2.3   1.2

Coverage rate of the asymptotic confidence interval :

           L = 1               L = 2               L = 3
      κ     µ     σ       κ     µ     σ       κ     µ     σ
AV    98.3  86.2  87.0    94.0  91.4  87.5    94.6  92.4  88.4
AV2   97.5  82.8  85.0    93.7  88.8  89.3    93.8  91.0  90

SLIDE 44

1. Some examples
2. The method
3. Theoretical results
4. Simulations : back to the examples
5. Conclusion

SLIDE 45

The best combination θ̂* = λ1T1 + · · · + λkTk with λ1 + · · · + λk = 1 is

    θ̂* = (1⊤Σ⁻¹ / (1⊤Σ⁻¹1)) T.

  • The oracle θ̂* is better than each Ti, but it is not known in practice.
  • The average estimator θ̂ approximates the oracle, in that Σ is replaced by an estimate Σ̂.
  • The estimation of Σ can be carried out with the same data as those used to compute the initial estimators T1, . . . , Tk. In a fully parametric setting, the initial data are not necessary to compute θ̂ : only the estimators T1, . . . , Tk are needed.
  • θ̂ is (in some sense) asymptotically equivalent to θ̂*, and in our examples the approximation works well for moderate data sizes.
  • Once θ̂ is obtained, an asymptotic confidence interval can be provided for free.