

SLIDE 1

On the asymptotics of the m.l. estimators

Matematiikan päivät 4.-5.1.2006, Tampere

Esko Valkeila
Teknillinen korkeakoulu

4.1.2006

SLIDE 2

Outline of the talk

◮ Motivation
◮ Some technical facts
◮ One result of Le Cam
◮ Multi-dimensional parameters
◮ Abstract filtered models
◮ Examples
◮ Conclusions

SLIDE 9

Motivation

Basic setup

We work with statistical models/experiments

E_n(Θ) := (Ω_n, F_n, P^θ_n; θ ∈ Θ);

here (Ω_n, F_n) is a model for the observation scheme, and P^θ_n, θ ∈ Θ, is a model for the different statistical theories concerning the observations. We are interested in asymptotics: what happens as n → ∞? More precisely, we would like to understand the minimal assumptions that guarantee that the maximum likelihood estimator is asymptotically normal, efficient, . . .

SLIDE 10

Motivation

Text book information

How are these problems treated in the text books of Statistics? I would describe the situation as follows:

◮ Cook books on Statistics: under some regularity assumptions the m.l.e. is asymptotically normal. Typically, no proof is given.
◮ Main stream books on Statistics: the log-likelihood is smooth (C² in a neighbourhood of the true parameter), there is some domination of the remainder term, and the support of the true distribution does not depend on the parameter. A detailed proof is given.

SLIDE 12

Motivation

Text book information, cont.

Next I will make some comments:

◮ If we want to understand the minimal conditions for the good properties of the m.l.e. to be valid, we should forget cook books on Statistics.
◮ Main stream books on Statistics are inaccurate: the support can depend on the parameter if the dependence is smooth (take f(x; θ) = (x − θ)e^{−(x−θ)} 1_{x≥θ}).
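The counterexample above can be checked numerically. The following NumPy sketch (my illustration, not from the talk) confirms that f(x; θ) = (x − θ)e^{−(x−θ)}1_{x≥θ} is a probability density whose support depends on θ, while the density vanishes at the boundary x = θ with a finite one-sided slope, so the dependence on the parameter is smooth:

```python
import numpy as np

# Illustrative check (not from the talk): the density with parameter-dependent
# support, f(x; theta) = (x - theta) * exp(-(x - theta)) for x >= theta.
def f(x, theta):
    y = np.asarray(x, dtype=float) - theta
    return np.where(y >= 0.0, y * np.exp(-y), 0.0)

theta = 1.5
x = np.linspace(theta, theta + 50.0, 200001)
dx = x[1] - x[0]

total = float(np.sum(f(x, theta)) * dx)        # total mass, ~ 1
boundary = float(f(theta, theta))              # density at the support edge, = 0
slope = (float(f(theta + 1e-6, theta)) - boundary) / 1e-6  # finite one-sided slope

print(total, boundary, slope)
```

The total mass is that of a Gamma(2, 1) distribution shifted by θ, which is one.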

SLIDE 14

Some technical facts

L2-differentiability

The following definition will be very useful. We work with a statistical model/experiment (Ω, F, P^θ; θ ∈ Θ) with the following additional property: there exists a probability measure Q such that P^θ ≪ Q for all θ ∈ Θ. Put f^θ := dP^θ/dQ. Notation: Θ ⊂ R^d, and (u, v) is the inner product in R^d. The model is differentiable in L2 if there exists a random variable w^{θ,2} ∈ L2(Q) such that for all u_n → 0 we have

E_Q [ (√f^{θ+u_n} − √f^θ)/|u_n| − (u_n, w^{θ,2}) ]² → 0 as n → ∞.
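The definition can be tested numerically in a concrete model. The sketch below (my illustration, not from the talk) assumes the Gaussian location model N(θ, 1) and uses the equivalent likelihood-ratio form of the condition that appears on the later slides, with score v(x) = x − θ; the L2 error indeed tends to zero as u → 0:

```python
import numpy as np

# Illustrative check (not from the talk): L2-differentiability in the model
# N(theta, 1), written in likelihood-ratio form. With z = x - theta under
# P^theta, sqrt(L^{theta+u,theta})(z) = exp(u * z / 2 - u**2 / 4) and the
# score is v(z) = z.
z = np.linspace(-10.0, 10.0, 40001)
dz = z[1] - z[0]
phi = np.exp(-z**2 / 2) / np.sqrt(2.0 * np.pi)   # N(0, 1) density

def l2_error(u):
    # E_P [ (sqrt(L) - 1)/u - v/2 ]^2, computed by quadrature (u > 0)
    g = (np.exp(u * z / 2 - u**2 / 4) - 1.0) / u - z / 2
    return float(np.sum(g**2 * phi) * dz)

errs = [l2_error(u) for u in (0.5, 0.1, 0.02)]
print(errs)   # decreasing towards 0 as u -> 0
```

A Taylor expansion shows the error is of order u², which the printed values reflect.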

SLIDE 15

Some technical facts

Lq-differentiability

We formally generalize the L2-differentiability: take q > 2. The model is differentiable in Lq if there exists a random variable w^{θ,q} ∈ Lq(Q) such that for all u_n → 0 we have

E_Q | ((f^{θ+u_n})^{1/q} − (f^θ)^{1/q})/|u_n| − (u_n, w^{θ,q}) |^q → 0 as n → ∞.

SLIDE 16

Some technical facts

Score and Lq-differentiability

To simplify the discussion, we assume that P^θ ∼ Q and put L^{η,θ} = f^η/f^θ; L^{η,θ} is the likelihood. One can show the following:

E_Q | (f^{θ+u_n} − f^θ)/|u_n| − (u_n, w^{θ,1}) | → 0

with some random variable w^{θ,1} ∈ L1(Q) if and only if

E_{P^θ} | (L^{θ+u_n,θ} − 1)/|u_n| − (u_n, v^θ) | → 0

with some random variable v^θ ∈ L1(P^θ). The vector v^θ is the score vector.

SLIDE 17

Some technical facts

Score and Lq-differentiability, cont.

Moreover, the model is differentiable in Lq if and only if

E_{P^θ} | ((L^{θ+u_n,θ})^{1/q} − 1)/|u_n| − (1/q)(u_n, v^θ) |^q → 0.

Hence w^{θ,2} = (1/2) √f^θ v^θ and w^{θ,q} = (1/q) (f^θ)^{1/q} v^θ; for an L2-differentiable model the Fisher information matrix I(θ) automatically exists, and I_{ij}(θ) = E_{P^θ}[v^θ_i v^θ_j].

SLIDE 18

One result of Le Cam

Consider next the case when the statistical model E_n(Θ) is a product experiment

E_n(Θ) = (Ω_n, ⊗_{k=1}^n F, P^θ_n; θ ∈ Θ);

here P^θ_n is the product measure P^θ_n = ⊗_{k=1}^n P^θ. One can show that the experiment E_n(Θ) is Lq-differentiable if and only if the coordinate experiment e(Θ) = (Ω, F, P^θ; θ ∈ Θ) is Lq-differentiable. Let θ̂_n be the m.l. estimator of the parameter θ in the product experiment, i.e. the m.l. estimator based on n independent and identical observations from the model (Ω, F, P^θ; θ ∈ Θ).

SLIDE 19

One result of Le Cam, cont.

We can now formulate the result of Le Cam for product experiments. Assume that

◮ Θ ⊂ R is open and bounded.
◮ The model (Ω, F, P^θ; θ ∈ Θ) is L2-differentiable.
◮ 0 < inf_θ I(θ), sup_θ I(θ) < ∞, and the map θ ↦ I(θ) is continuous.

Then the following facts hold:

◮ There exist maximum likelihood estimators θ̂_n.
◮ The sequence √n (θ̂_n − θ) is asymptotically normal under P^θ with the limit N(0, 1/I(θ)).
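Le Cam's conclusion can be illustrated by simulation. The following sketch (an illustration, not from the talk) assumes the coordinate model Exp(θ) with density θe^{−θx}, which is L2-differentiable with I(θ) = 1/θ², and checks that √n(θ̂_n − θ) is approximately centered with variance 1/I(θ) = θ²:

```python
import numpy as np

# Illustrative simulation (not from the talk): Exp(theta) coordinate model,
# m.l.e. hat_theta_n = 1 / x_bar, Fisher information I(theta) = 1 / theta**2,
# so sqrt(n) * (hat_theta_n - theta) should be ~ N(0, theta**2).
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 500, 4000

samples = rng.exponential(scale=1.0 / theta, size=(reps, n))
mle = 1.0 / samples.mean(axis=1)               # m.l.e. in each replication
standardized = np.sqrt(n) * (mle - theta)

print(standardized.mean())   # close to 0 (a small O(1/sqrt(n)) bias remains)
print(standardized.var())    # close to theta**2 = 4
```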

SLIDE 25

One result of Le Cam, discussion

Essentially, the good properties of the m.l.e. follow from L2-differentiability when the parameter is one-dimensional. In the main stream text books the proof is based on a two-term Taylor expansion with a correction term. This is not possible here, because we do not have a two-term Taylor expansion, only a one-term one. In the proof one must control the terms

sup_{|u|≤δ} |L^{θ+u,θ}_n − 1|

by using the Kolmogorov criterion for the modulus of continuity; here L^{θ+u,θ}_n is the likelihood in the product experiment. If the parameter is one-dimensional, then L2-differentiability is sufficient for the control we are looking for.

SLIDE 26

Multi-dimensional parameters

Assume now that Θ ⊂ R^d, where d ≥ 2. We still assume that the model is L2-differentiable, the parameter set Θ is an open and bounded subset of R^d, and the Fisher information is continuous and strictly non-degenerate; in addition, the score vector v^θ satisfies v^θ ∈ Lq(P^θ) for some q > d. Then

◮ There exist maximum likelihood estimators θ̂_n.
◮ The sequence √n (θ̂_n − θ) is asymptotically normal under P^θ with the limit N(0, I(θ)^{−1}).
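The multi-dimensional statement can be checked in the simplest possible case. The sketch below (illustrative, not from the talk) assumes a two-dimensional Gaussian location model with known covariance Σ, for which the m.l.e. is the sample mean and I(θ) = Σ^{−1}; the empirical covariance of √n(θ̂_n − θ) should then be close to I(θ)^{−1} = Σ:

```python
import numpy as np

# Illustrative simulation (not from the talk): two-dimensional Gaussian
# location model N(theta, sigma) with known covariance sigma; the m.l.e. is
# the sample mean and I(theta) = sigma^{-1}, so the limit covariance of
# sqrt(n) * (hat_theta_n - theta) is I(theta)^{-1} = sigma.
rng = np.random.default_rng(1)
theta = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 400, 3000

x = rng.multivariate_normal(theta, sigma, size=(reps, n))
mle = x.mean(axis=1)                           # sample mean, shape (reps, 2)
standardized = np.sqrt(n) * (mle - theta)
emp_cov = np.cov(standardized.T)               # empirical limit covariance

print(emp_cov)   # entrywise close to sigma
```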

SLIDE 27

Multi-dimensional parameters, discussion

As explained earlier, the main problem with this approach is to control the expression

sup_{|u|≤δ} |L^{θ+u,θ}_n − 1|, or equivalently sup_{|u|≤δ} |(L^{θ+u,θ}_n)^{1/q} − 1|.

If v^θ ∈ Lq(P^θ), then the experiment is also Lq-differentiable, and this makes the desired control possible. The proof of these results is essentially in Ibragimov and Has'minskii, but the role of Lq-differentiability in their arguments is missing.

SLIDE 28

General observation schemes

Filtered experiments

We now work with filtered models (Ω, F, F, P^θ; θ ∈ Θ). Here F = (F_t)_{0≤t≤T} is an increasing family of sigma-fields, a so-called filtration. Assume that P^θ ∼ Q and define the density processes by

z^θ_t = dP^θ_t / dQ_t;

here P^θ_t = P^θ|F_t (and Q_t = Q|F_t). We get the following for free: the density processes z^θ are (F, Q)-martingales.

SLIDE 29

General observation schemes

Filtered experiments: differentiability and Fisher information

Assume that the filtered experiment is differentiable at time T at some θ ∈ Θ. Then the experiment is differentiable for every t ≤ T. Moreover, the score process V^θ is now a square integrable (F, P^θ)-martingale. Then there exists a unique predictable matrix-valued process ⟨V^θ, V^θ⟩,

⟨V^θ, V^θ⟩ =
| ⟨V^{θ,1}, V^{θ,1}⟩  · · ·  ⟨V^{θ,1}, V^{θ,d}⟩ |
|        ...          ...          ...         |
| ⟨V^{θ,d}, V^{θ,1}⟩  · · ·  ⟨V^{θ,d}, V^{θ,d}⟩ |

such that the process V^{θ,i} V^{θ,j} − ⟨V^{θ,i}, V^{θ,j}⟩ is an (F, P^θ)-martingale.

SLIDE 30

General observation schemes

Filtered experiments: differentiability and Fisher information

The process ⟨V^θ, V^θ⟩ has componentwise bounded variation on compacts. If I_t(θ) is the Fisher information matrix for the filtered experiment at time t, the process ⟨V^θ, V^θ⟩_t is the 'predictable' Fisher information. For a discussion of various (Fisher) information concepts for stochastic processes we refer to Barndorff-Nielsen and Sørensen, where the authors discuss various sample information concepts like V^θ_t (V^θ_t)^∗ or the raw bracket [V^θ, V^θ]_t. We have

I^{ij}_t(θ) = E_{P^θ}[V^{θ,i}_t V^{θ,j}_t] = E_{P^θ}⟨V^{θ,i}, V^{θ,j}⟩_t = E_{P^θ}[V^{θ,i}, V^{θ,j}]_t.
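The three information expressions can be compared by Monte Carlo in the simplest filtered model. The sketch below (my illustration, not from the talk) assumes a homogeneous Poisson observation with intensity λ^θ = θ, for which the score is V_t = X_t/θ − t, the predictable bracket ⟨V, V⟩_t = t/θ is deterministic, and the raw bracket is [V, V]_t = X_t/θ²:

```python
import numpy as np

# Illustrative Monte Carlo (not from the talk): homogeneous Poisson model with
# lambda^theta = theta. The score at time t is V_t = X_t / theta - t, the
# predictable bracket is <V, V>_t = t / theta (deterministic), and the raw
# bracket is [V, V]_t = X_t / theta**2; all three expectations equal t / theta.
rng = np.random.default_rng(2)
theta, t, reps = 3.0, 5.0, 200000

x_t = rng.poisson(theta * t, size=reps)        # X_t ~ Poisson(theta * t)
v_t = x_t / theta - t                          # score at time t
pred_bracket = t / theta                       # <V, V>_t
raw_bracket = x_t / theta**2                   # [V, V]_t

print(np.mean(v_t**2), pred_bracket, np.mean(raw_bracket))  # all ~ t/theta
```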

SLIDE 31

General observation schemes

Lq-differentiability

We know that Lq-differentiability follows from the property V^θ_t ∈ Lq(P^θ). But the score process V^θ is a martingale, and hence there are well-known criteria to check this. One can use the Burkholder-Davis-Gundy inequality to check the condition V^θ_t ∈ Lq(P^θ). The other possibility is to use the so-called Rosenthal inequalities. We will illustrate how this works with an example.

SLIDE 32

Example

Counting process observations

We work with counting process models: the P^θ-intensity is λ^θ; here λ^θ is a predictable, integrable, positive process. Our observation is a trajectory of a counting process X. Then we know that

X_t − ∫_0^t λ^θ_s ds

is an (F^X, P^θ)-martingale. We take as the reference measure Q the Poisson measure: X_t − t is an (F^X, Q)-martingale.
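The martingale property of the compensated process can be checked by simulation. The sketch below (illustrative, not from the talk) assumes a deterministic intensity λ_s = 1 + 0.5 sin(s) and simulates the counting process by Lewis-Shedler thinning; the mean of X_t − ∫_0^t λ_s ds should then be approximately zero:

```python
import numpy as np

# Illustrative simulation (not from the talk): a counting process with the
# deterministic intensity lambda_s = 1 + 0.5 * sin(s), simulated on [0, t_max]
# by Lewis-Shedler thinning; the compensated process X_t - int_0^t lambda_s ds
# is a martingale, so its mean at t_max should be ~ 0.
rng = np.random.default_rng(3)
t_max, lam_max, reps = 10.0, 1.5, 20000

def lam(s):
    return 1.0 + 0.5 * np.sin(s)

counts = np.empty(reps)
for r in range(reps):
    n_prop = rng.poisson(lam_max * t_max)          # candidate points, rate lam_max
    s = rng.uniform(0.0, t_max, size=n_prop)
    keep = rng.uniform(size=n_prop) < lam(s) / lam_max
    counts[r] = keep.sum()

compensator = t_max + 0.5 * (1.0 - np.cos(t_max))  # int_0^{t_max} lambda_s ds
gap = counts.mean() - compensator

print(gap)   # ~ 0: E[X_t] equals the compensator
```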

SLIDE 33

Example

Counting process observations

The density process z^θ is

z^θ_t = dP^θ_t / dQ_t = exp( ∫_0^t log λ^θ_s dX_s − ∫_0^t (λ^θ_s − 1) ds ).

If the model is smooth with respect to θ, then the score will be [we write the one-dimensional case only]

V^θ_t = d log z^θ_t / dθ = ∫_0^t (d log λ^θ_s / dθ) dX_s − ∫_0^t (dλ^θ_s / dθ) ds.
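For a constant intensity λ^θ_s = θ the formulas above simplify and can be checked directly. The sketch below (my illustration, not from the talk) uses log z^θ_t = X_t log θ − (θ − 1)t and verifies that the score V^θ_t = X_t/θ − t is its θ-derivative and vanishes at the m.l.e. θ̂ = X_t/t:

```python
import numpy as np

# Illustrative check (not from the talk): constant intensity lambda^theta = theta.
# Then log z^theta_t = X_t * log(theta) - (theta - 1) * t and the score is
# V^theta_t = X_t / theta - t, the theta-derivative of log z^theta_t.
def log_z(theta, x_t, t):
    return x_t * np.log(theta) - (theta - 1.0) * t

def score(theta, x_t, t):
    return x_t / theta - t

x_t, t, theta, h = 17.0, 5.0, 2.5, 1e-6
numeric = (log_z(theta + h, x_t, t) - log_z(theta - h, x_t, t)) / (2.0 * h)

print(numeric, score(theta, x_t, t))   # the score matches the derivative
print(score(x_t / t, x_t, t))          # 0: the m.l.e. is hat_theta = X_t / t
```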

SLIDE 34

Example

Counting process observations

Put u^θ = d log λ^θ / dθ. The Burkholder-Davis-Gundy inequality tells us that V^θ_t ∈ Lq(P^θ) if and only if

E_{P^θ} [ ∫_0^t (u^θ_s)² dX_s ]^{q/2} < ∞.

The Rosenthal inequality tells us that V^θ_t ∈ Lq(P^θ) if and only if

E_{P^θ} [ ( ∫_0^t (u^θ_s)² λ^θ_s ds )^{q/2} + ∫_0^t |u^θ_s|^q λ^θ_s ds ] < ∞.

SLIDE 35

Conclusion

◮ The Lq-differentiability of statistical models allows us to extend Le Cam's one-dimensional result to the multi-dimensional case.
◮ Using martingale methods one can derive simple sufficient criteria for Lq-differentiability.

SLIDE 37

References

Barndorff-Nielsen, O.E., and Sørensen, M. (1994). A review of some aspects of asymptotic likelihood theory for stochastic processes. International Statistical Review, 62, 133-165.

Dzhaparidze, K., and Valkeila, E. (1990). On Hellinger type distances for filtered experiments. Probability Theory and Related Fields, 85, 105-117.

Ibragimov, I.A., and Has'minskii, R.Z. (1981). Statistical Estimation: Asymptotic Theory. Springer, New York.

Le Cam, L. (1970). On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Annals of Mathematical Statistics, 41, 802-828.

Witting, H. (1985). Mathematische Statistik I.