SLIDE 1
On the asymptotics of the m.l. estimators
Matematiikan päivät 4.-5.1.2006, Tampere
Esko Valkeila, Teknillinen korkeakoulu
4.1.2006
SLIDE 2
Outline of the talk
◮ Motivation
◮ Some technical facts
◮ One result of Le Cam
◮ Multi-dimensional parameters
◮ Abstract filtered models
◮ Examples
◮ Conclusions
SLIDE 9
Motivation
Basic setup
We work with statistical models/experiments

E_n(Θ) := (Ω_n, F_n, P^θ_n; θ ∈ Θ);

here (Ω_n, F_n) is a model for the observation scheme, and P^θ_n, θ ∈ Θ, is a model for the different statistical theories concerning the observations. We are interested in asymptotics: what happens as n → ∞? More precisely, we would like to understand the minimal assumptions that guarantee that the maximum likelihood estimator is asymptotically normal, efficient, ...
SLIDE 10
Motivation
Text book information
How are these problems treated in statistics textbooks? I would describe the situation as follows:
◮ Cookbooks on statistics: under some regularity assumptions the m.l.e. is asymptotically normal. Typically, no proof is given.
◮ Mainstream books on statistics: the log-likelihood is smooth (C2 in a neighbourhood of the true parameter), some domination on the remainder term is assumed, and the support of the true distribution does not depend on the parameter. A detailed proof is given.
SLIDE 12
Motivation
Text book information, cont.
Next I will make some comments:
◮ If we want to understand the minimal conditions for the good properties of the m.l.e. to hold, we should forget the cookbooks on statistics.
◮ Mainstream books on statistics are inaccurate: the support can depend on the parameter if the dependence is smooth (take f(x; θ) = (x − θ)e^{−(x−θ)} 1_{x≥θ}).
SLIDE 14
Some technical facts
L2-differentiability
The following definition will be very useful. We work with a statistical model/experiment (Ω, F, P^θ; θ ∈ Θ) with the following additional property: there exists a probability measure Q such that P^θ ≪ Q for all θ ∈ Θ. Put f^θ := dP^θ/dQ. Notation: Θ ⊂ R^d, and (u, v) is the inner product in R^d. The model is differentiable in L2 if there exists a random vector w^{θ,2} ∈ L2(Q) such that for all u_n → 0 we have

E_Q [ ( √f^{θ+u_n} − √f^θ − (u_n, w^{θ,2}) ) / |u_n| ]^2 → 0 as n → ∞.
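The defining limit can be checked numerically in a concrete model. The sketch below is my own illustration, not from the talk: it takes the Gaussian location model P^θ = N(θ, 1) with reference measure Q = N(0, 1), so that f^θ(x) = exp(θx − θ²/2), the score is v^θ(x) = x − θ, and w^{θ,2} = (1/2)√f^θ v^θ. The Monte Carlo estimate of the L2 error should shrink (at rate u²) as u → 0.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.5
x = rng.standard_normal(10**6)            # sample from Q = N(0, 1)

# f^th := dP^th/dQ for P^th = N(th, 1):  f^th(x) = exp(th*x - th^2/2)
def sqrt_f(th):
    return np.exp(0.5 * (th * x - th**2 / 2))

score = x - theta                          # score of the location model: v^th(x) = x - th
w2 = 0.5 * sqrt_f(theta) * score           # w^{th,2} = (1/2) sqrt(f^th) v^th

errs = []
for u in [0.5, 0.1, 0.02]:
    # L2 error of the first-order expansion of sqrt(f^{th+u})
    errs.append(np.mean(((sqrt_f(theta + u) - sqrt_f(theta) - u * w2) / abs(u)) ** 2))
    print(f"u = {u}: L2 error = {errs[-1]:.2e}")
```

The printed errors decrease roughly proportionally to u², which is exactly the L2-differentiability of this model.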
SLIDE 15
Some technical facts
Lq-differentiability
We formally generalize L2-differentiability: take q > 2. The model is differentiable in Lq if there exists a random vector w^{θ,q} ∈ Lq(Q) such that for all u_n → 0 we have

E_Q | ( (f^{θ+u_n})^{1/q} − (f^θ)^{1/q} − (u_n, w^{θ,q}) ) / |u_n| |^q → 0 as n → ∞.
SLIDE 16
Some technical facts
Score and Lq-differentiability
To simplify the discussion, we assume that P^θ ∼ Q and put L^{η,θ} = f^η / f^θ; L^{η,θ} is the likelihood ratio. One can show the following:

E_Q | ( f^{θ+u_n} − f^θ − (u_n, w^{θ,1}) ) / |u_n| | → 0

with some random vector w^{θ,1} ∈ L1(Q), if and only if

E_{P^θ} | ( L^{θ+u_n,θ} − 1 − (u_n, v^θ) ) / |u_n| | → 0

with some random vector v^θ ∈ L1(P^θ). The vector v^θ is the score vector.
SLIDE 17
Some technical facts
Score and Lq-differentiability, cont.
Moreover, the model is differentiable in Lq if and only if

E_{P^θ} | ( (L^{θ+u_n,θ})^{1/q} − 1 − (1/q)(u_n, v^θ) ) / |u_n| |^q → 0.

Hence w^{θ,2} = (1/2) √f^θ v^θ and w^{θ,q} = (1/q) (f^θ)^{1/q} v^θ; for an L2-differentiable model the Fisher information matrix I(θ) exists automatically, with I_{ij}(θ) = E_{P^θ}[v^θ_i v^θ_j].
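The identity I_{ij}(θ) = E_{P^θ}[v^θ_i v^θ_j] can be illustrated numerically. The sketch below is my own illustration, not part of the talk: it assumes the two-parameter Gaussian model N(μ, σ²) with θ = (μ, σ), whose score vector is v^θ(x) = ((x − μ)/σ², ((x − μ)² − σ²)/σ³) and whose Fisher information is known to be diag(1/σ², 2/σ²), and checks the identity by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=10**6)      # sample from P^theta, theta = (mu, sigma)

# Score vector of N(mu, sigma^2) with respect to theta = (mu, sigma)
v = np.stack([(x - mu) / sigma**2,
              ((x - mu)**2 - sigma**2) / sigma**3])

I_mc = v @ v.T / x.size                    # Monte Carlo estimate of E[v v^T]
I_exact = np.diag([1 / sigma**2, 2 / sigma**2])
print(I_mc)
print(I_exact)
```

The two printed matrices agree up to Monte Carlo error, confirming that the Fisher information is the second moment matrix of the score.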
SLIDE 18
One result of Le Cam
Consider next the case where the statistical model E_n(Θ) is a product experiment

E_n(Θ) = (Ω^n, ⊗_{k=1}^n F, P^θ_n; θ ∈ Θ);

here P^θ_n is the product measure P^θ_n = ⊗_{k=1}^n P^θ. One can show that the experiment E_n(Θ) is Lq-differentiable if and only if the coordinate experiment e(Θ) = (Ω, F, P^θ; θ ∈ Θ) is Lq-differentiable. Let θ̂_n be the m.l. estimator of the parameter θ in the product experiment, i.e. the m.l. estimator based on n independent and identically distributed observations from the model (Ω, F, P^θ; θ ∈ Θ).
SLIDE 19
One result of Le Cam, cont.
We can now formulate the result of Le Cam for product experiments. Assume that
◮ Θ ⊂ R is open and bounded.
◮ The model (Ω, F, P^θ; θ ∈ Θ) is L2-differentiable.
◮ 0 < inf_θ I(θ), sup_θ I(θ) < ∞, and the map θ ↦ I(θ) is continuous.
Then the following facts hold:
◮ There exist maximum likelihood estimators θ̂_n.
◮ The sequence √n(θ̂_n − θ) is asymptotically normal under P^θ with limit N(0, 1/I(θ)).
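The statement can be checked by a small simulation. The sketch below is my own illustration, assuming the exponential model P^θ = Exp(θ) with rate θ: the m.l.e. from n observations is θ̂_n = n/Σx_i, the Fisher information is I(θ) = 1/θ², and the empirical variance of √n(θ̂_n − θ) should be close to 1/I(θ) = θ².

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 400, 20_000

# reps replications of n i.i.d. Exp(theta) observations (rate theta, mean 1/theta)
x = rng.exponential(1 / theta, size=(reps, n))
mle = 1 / x.mean(axis=1)                   # m.l.e. of the rate: n / sum(x_i)

z = np.sqrt(n) * (mle - theta)
print("empirical variance of sqrt(n)(mle - theta):", z.var())
print("1/I(theta) = theta^2:", theta**2)
```

The exponential model is L2-differentiable with continuous, non-degenerate I(θ) on any bounded open Θ ⊂ (0, ∞), so it falls under the theorem.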
SLIDE 25
One result of Le Cam, discussion
Essentially the good properties of the m.l.e. follow from L2-differentiability when the parameter is one-dimensional. In the mainstream textbooks the proof is based on a Taylor expansion with two terms plus a correction term. That is not possible here, because we have a Taylor expansion with only one term, not two. In the proof one must control the terms

sup_{|u|≤δ} |L^{θ+u,θ}_n − 1|

by using the Kolmogorov criterion for the modulus of continuity; here L^{θ+u,θ}_n is the likelihood ratio in the product experiment. If the parameter is one-dimensional, then L2-differentiability is sufficient for the control we are looking for.
SLIDE 26
Multi-dimensional parameters
Assume now that Θ ⊂ R^d, where d ≥ 2. We still assume that the model is L2-differentiable, the parameter set Θ is an open and bounded subset of R^d, the Fisher information is continuous and strictly non-degenerate, and the score vector v^θ satisfies v^θ ∈ Lq(P^θ) for some q > d. Then
◮ There exist maximum likelihood estimators θ̂_n.
◮ The sequence √n(θ̂_n − θ) is asymptotically normal under P^θ with limit N(0, I(θ)^{−1}).
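The multi-dimensional statement can also be checked by simulation. The sketch below is my own illustration, assuming the model N(μ, σ²) with θ = (μ, σ) ∈ R²: the empirical covariance of √n(θ̂_n − θ) should be close to I(θ)^{−1} = diag(σ², σ²/2).

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 2.0, 500, 20_000

x = rng.normal(mu, sigma, size=(reps, n))
mu_hat = x.mean(axis=1)
# m.l.e. of sigma (divisor n, not n - 1)
sigma_hat = np.sqrt(((x - mu_hat[:, None])**2).mean(axis=1))

z = np.sqrt(n) * np.stack([mu_hat - mu, sigma_hat - sigma])
cov_mc = np.cov(z)                         # empirical covariance of sqrt(n)(theta_hat - theta)
cov_exact = np.diag([sigma**2, sigma**2 / 2])
print(cov_mc)
print(cov_exact)
```

Here d = 2 and the Gaussian score has moments of all orders, so the requirement v^θ ∈ Lq(P^θ) with q > d holds trivially.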
SLIDE 27
Multi-dimensional parameters, discussion
As explained earlier, the main problem with this approach is to control the expression

sup_{|u|≤δ} |L^{θ+u,θ}_n − 1|, or equivalently sup_{|u|≤δ} |(L^{θ+u,θ}_n)^{1/q} − 1|.

If v^θ ∈ Lq(P^θ), then the experiment is also Lq-differentiable, and this makes the desired control possible. The proof of these results is essentially in Ibragimov and Has'minskii, but the role of Lq-differentiability in their arguments is not made explicit.
SLIDE 28
General observation schemes
Filtered experiments
We work now with filtered models (Ω, F, F, P^θ; θ ∈ Θ). Here F = (F_t)_{0≤t≤T} is an increasing family of sigma-fields, a so-called filtration. Assume that P^θ ∼ Q and define the density processes by

z^θ_t = dP^θ_t / dQ_t; here P^θ_t = P^θ|F_t (and Q_t = Q|F_t).

We get the following for free: the density processes z^θ are (F, Q)-martingales.
SLIDE 29
General observation schemes
Filtered experiments: differentiability and Fisher information
Assume that the filtered experiment is differentiable at time T at some θ ∈ Θ. Then the experiment is differentiable for every t ≤ T. Moreover, the score process V^θ is now a square-integrable (F, P^θ)-martingale. Hence there exists a unique predictable matrix-valued process ⟨V^θ, V^θ⟩ with entries ⟨V^θ, V^θ⟩_{ij} = ⟨V^{θ,i}, V^{θ,j}⟩ such that each process V^{θ,i} V^{θ,j} − ⟨V^{θ,i}, V^{θ,j}⟩ is an (F, P^θ)-martingale.
SLIDE 30
General observation schemes
Filtered experiments: differentiability and Fisher information
The process ⟨V^θ, V^θ⟩ has componentwise bounded variation on compacts. If I_t(θ) is the Fisher information matrix for the filtered experiment at time t, the process ⟨V^θ, V^θ⟩_t is the 'predictable' Fisher information. For a discussion of various (Fisher) information concepts for stochastic processes we refer to Barndorff-Nielsen and Sørensen, where the authors discuss various sample information concepts like V^θ_t (V^θ_t)^* or the raw bracket [V^θ, V^θ]_t. We have

I^{ij}_t(θ) = E_{P^θ}[V^{θ,i}_t V^{θ,j}_t] = E_{P^θ}[⟨V^{θ,i}, V^{θ,j}⟩_t] = E_{P^θ}[[V^{θ,i}, V^{θ,j}]_t].
SLIDE 31
General observation schemes
Lq-differentiability
We know that Lq-differentiability follows from the property V^θ_t ∈ Lq(P^θ). But the score process V^θ is a martingale, and hence there are well-known criteria to check this. One can use the Burkholder-Davis-Gundy inequality to check the condition V^θ_t ∈ Lq(P^θ). Another possibility is to use the so-called Rosenthal inequalities for the same purpose. We will illustrate how this works with an example.
SLIDE 32
Example
Counting process observations
We work with counting process models: the P^θ-intensity is λ^θ; here λ^θ is a predictable, integrable, positive process. Our observation is a trajectory of a counting process X. Then we know that

X_t − ∫_0^t λ^θ_s ds

is an (F^X, P^θ)-martingale. We take as the reference measure Q the Poisson measure: X_t − t is an (F^X, Q)-martingale.
SLIDE 33
Example
Counting process observations
The density process z^θ is

z^θ_t = dP^θ_t / dQ_t = exp( ∫_0^t log λ^θ_s dX_s − ∫_0^t (λ^θ_s − 1) ds ).

If the model is smooth with respect to θ, then the score is [we write the one-dimensional case only]

V^θ_t = d log z^θ_t / dθ = ∫_0^t (d log λ^θ_s / dθ) dX_s − ∫_0^t (dλ^θ_s / dθ) ds.
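For the simplest counting process model, constant intensity λ^θ_s ≡ θ (a homogeneous Poisson process), the density formula reduces to z^θ_t = θ^{X_t} e^{−(θ−1)t}. The martingale property of z^θ under Q then gives E_Q[z^θ_t] = z^θ_0 = 1, which the sketch below (my own illustration, not part of the talk) checks by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, t, reps = 1.7, 3.0, 10**6

# Under the reference measure Q, X is a unit-rate Poisson process, so X_t ~ Poisson(t)
X_t = rng.poisson(t, size=reps)

# z^theta_t = theta^{X_t} * exp(-(theta - 1) t) for constant intensity lambda^theta = theta
z_t = theta**X_t * np.exp(-(theta - 1) * t)
print("E_Q[z_t] =", z_t.mean())            # martingale property: equals z_0 = 1
```

Analytically, E_Q[θ^{X_t}] = e^{t(θ−1)} (the probability generating function of Poisson(t)), so the expectation is exactly 1; the simulation reproduces this up to Monte Carlo error.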
SLIDE 34
Example
Counting process observations
Put u^θ = d log λ^θ / dθ. The Burkholder-Davis-Gundy inequality tells us that V^θ_t ∈ Lq(P^θ) if and only if

E_{P^θ} [ ( ∫_0^t (u^θ_s)^2 dX_s )^{q/2} ] < ∞.

The Rosenthal inequality tells us that V^θ_t ∈ Lq(P^θ) if and only if

E_{P^θ} [ ( ∫_0^t (u^θ_s)^2 λ^θ_s ds )^{q/2} + ∫_0^t |u^θ_s|^q λ^θ_s ds ] < ∞.
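For constant intensity λ^θ_s ≡ θ we get u^θ = 1/θ, so the score from the previous slide is V^θ_t = X_t/θ − t, with predictable bracket ⟨V^θ⟩_t = t/θ and raw bracket [V^θ]_t = X_t/θ²; both moment conditions above are then trivially finite for every q. The sketch below (my own illustration, not part of the talk) checks numerically that E_{P^θ}[V^θ_t] = 0 and that Var(V^θ_t) and E_{P^θ}[[V^θ]_t] both equal I_t(θ) = t/θ.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, t, reps = 1.7, 3.0, 10**6

# Under P^theta with constant intensity lambda^theta = theta, X_t ~ Poisson(theta * t)
X_t = rng.poisson(theta * t, size=reps)

V_t = X_t / theta - t                      # score: V^theta_t = X_t / theta - t
raw_bracket = X_t / theta**2               # [V^theta]_t = sum of squared jumps (u^theta = 1/theta)

print("E[V_t]   =", V_t.mean())            # ~ 0   (martingale property of the score)
print("Var[V_t] =", V_t.var())             # ~ t / theta = I_t(theta)
print("E[[V]_t] =", raw_bracket.mean())    # ~ t / theta
```

This also illustrates the identity I_t(θ) = E_{P^θ}[⟨V^θ⟩_t] = E_{P^θ}[[V^θ]_t] from the earlier slide in the simplest possible case.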
SLIDE 35
Conclusion
◮ The Lq-differentiability of statistical models allows us to extend Le Cam's one-dimensional result to the multi-dimensional case.
◮ Using martingale methods one can give simple sufficient criteria for Lq-differentiability.