COMPSTAT 2010: Score moment estimates, Zdeněk Fabián

SLIDE 1

COMPSTAT 2010

Score moment estimates

Zdeněk Fabián
Institute of Computer Sciences, Prague
August 17, 2010

SLIDE 4

Motivation

Apart from the fact that the ML estimates $\hat\theta_{ML}$ are often influenced by outliers, the solution $f(x;\hat\theta_{ML})$ of the parametric estimation problem has some other drawbacks:

Instead of $f(x;\hat\theta_{ML})$, a few numbers characterizing the data would be useful in further analysis. However, the moments $m_k = E(X-m_1)^k$, $m_1 = EX$, are often awkward expressions containing special functions, and moments of heavy-tailed distributions may not exist, so the approach $\hat m_k = m_k(\hat\theta_{ML})$ is not used.

Complex problems are solved by using 'pure' data, not 'adapted' to the assumed model by an adequate inference function (Pearson correlation coefficient).

SLIDE 6

Problem

The reason: The score function
$$r(x;\theta) = (r_{\theta_1},\dots,r_{\theta_m}), \qquad r_{\theta_j}(x;\theta) = \frac{\partial}{\partial\theta_j}\log f(x;\theta),$$
is a vector function, suitable for estimation of parameters, but too complicated to suggest sensible numeric characteristics of distributions and too complicated to be used in more complex problems.

The problem: To find a relevant scalar inference function $S(x;\theta)$ reflecting basic features of the model distribution, and to use the moments
$$M_k(\theta) = \int_{\mathcal X} S^k(x;\theta)\,f(x;\theta)\,dx$$
for generalized moment estimates.

SLIDE 8

Location distributions

Location distribution $g(y-\mu)$, $\mu\in\mathbb R$, with $g$ unimodal, regular, with support $\mathbb R$.

Scalar score
$$r_\mu(y;\mu) = \frac{\partial}{\partial\mu}\log g(y-\mu) = S_G(y-\mu),$$
where the function
$$S_G(y) = -\frac{g'(y)}{g(y)}$$
is obtained by differentiating with respect to the variable.

Scalar score of a distribution with support $\mathbb R$:
$$S_G(y;\theta) = -\frac{1}{g(y;\theta)}\,\frac{d}{dy}\,g(y;\theta)$$
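A minimal numerical sketch of the scalar score $S_G(y) = -g'(y)/g(y)$ may help here. The three prototype densities (standard normal, logistic, Cauchy) and the finite-difference step `h` are illustrative assumptions, not taken from the slides. The closed forms used for comparison, $y$, $\tanh(y/2)$ and $2y/(1+y^2)$, follow directly from the definition.

```python
import math

def scalar_score(g, y, h=1e-5):
    """S_G(y) = -g'(y)/g(y), with g' approximated by a central difference."""
    return -(g(y + h) - g(y - h)) / (2.0 * h * g(y))

def normal_pdf(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def logistic_pdf(y):
    e = math.exp(-y)
    return e / (1.0 + e) ** 2

def cauchy_pdf(y):
    return 1.0 / (math.pi * (1.0 + y * y))

# Numerical score next to its closed form for each prototype.
for y in (-3.0, 0.5, 10.0):
    print(
        y,
        scalar_score(normal_pdf, y), y,                        # unbounded score
        scalar_score(logistic_pdf, y), math.tanh(y / 2.0),     # bounded by 1
        scalar_score(cauchy_pdf, y), 2.0 * y / (1.0 + y * y),  # bounded, redescending
    )
```

The normal score grows without bound, while the Cauchy score stays within $[-1, 1]$ and returns towards zero for large $|y|$.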

SLIDE 9

Log-location distributions I

The log-location distribution (Lawless 2003) $F$ of a random variable $X = \eta^{-1}(Y)$ with support $\mathcal X = (0,\infty)$ has density
$$f(x;\tau) = g(u)\,\eta'(x),$$
where $g(y-\mu)$ is the density of the 'prototype' distribution on $\mathbb R$, $u = \eta(x) - \eta(\tau)$, and the 'log-location' parameter $\tau = \eta^{-1}(\mu)$ is the 'image' of the location $\mu$ of the prototype.

SLIDE 12

Log-location distributions II

Theorem.
$$\frac{\partial}{\partial\tau}\log f(x;\tau) = S_G(u)\,\eta'(\tau)$$

Transformation-based score (t-score):
$$T(x;\tau) \equiv S_G(u) = -\frac{1}{f(x;\tau)}\,\frac{d}{dx}\left(\frac{1}{\eta'(x)}\,f(x;\tau)\right)$$

Scalar score:
$$S_\tau(x) = \eta'(\tau)\,T(x;\tau)$$
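Both identities follow from the chain rule applied to $f(x;\tau) = g(u)\,\eta'(x)$ with $u = \eta(x) - \eta(\tau)$. A short sketch of the steps, using only the definitions given on these slides:

```latex
\begin{align*}
\log f(x;\tau) &= \log g\bigl(\eta(x)-\eta(\tau)\bigr) + \log \eta'(x), \\
\frac{\partial}{\partial\tau}\log f(x;\tau)
  &= -\eta'(\tau)\,\frac{g'(u)}{g(u)} = \eta'(\tau)\,S_G(u), \\
-\frac{1}{f(x;\tau)}\,\frac{d}{dx}\left(\frac{f(x;\tau)}{\eta'(x)}\right)
  &= -\frac{1}{g(u)\,\eta'(x)}\,\frac{d}{dx}\,g(u)
   = -\frac{g'(u)\,\eta'(x)}{g(u)\,\eta'(x)} = S_G(u).
\end{align*}
```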

SLIDE 15

Generalizations

$F$ on a general interval support $\mathcal X \subseteq \mathbb R$, $\eta : \mathcal X \to \mathbb R$.

t-score (a general concept):
$$T(x;\theta) = -\frac{1}{f(x;\theta)}\,\frac{d}{dx}\left(\frac{1}{\eta'(x)}\,f(x;\theta)\right),$$
where (Johnson, 1949)
$$\eta(x) = \begin{cases} \log(x-a) & \text{if } \mathcal X = (a,\infty), \\ \log\dfrac{x-a}{b-x} & \text{if } \mathcal X = (a,b). \end{cases}$$

However, to use the relation
$$\frac{\partial}{\partial\tau}\log f(x;\theta) = \eta'(\tau)\,T(x;\theta),$$
$\theta$ has to be in the form $\theta = (\eta^{-1}(\mu), \theta_2, \dots, \theta_m)$.
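The general t-score can be checked numerically. The sketch below evaluates $T(x) = -\frac{1}{f}\frac{d}{dx}\bigl(f/\eta'\bigr)$ with a central difference, using Johnson's $\eta$ for the two support types. The helper names, the step `h`, and the two test densities (an exponential with $\tau = 2$ and a uniform on $(0, 3)$) are illustrative assumptions. The closed forms $x/\tau - 1$ and $2x/3 - 1$ follow from the definition.

```python
import math

def eta_prime(x, a, b=None):
    """Derivative of Johnson's map: log(x - a) on (a, inf), log((x - a)/(b - x)) on (a, b)."""
    if b is None:
        return 1.0 / (x - a)
    return 1.0 / (x - a) + 1.0 / (b - x)

def t_score(f, x, a, b=None, h=1e-6):
    """T(x) = -(1/f(x)) d/dx [f(x)/eta'(x)], derivative by central difference."""
    def w(z):
        return f(z) / eta_prime(z, a, b)
    return -(w(x + h) - w(x - h)) / (2.0 * h * f(x))

tau = 2.0      # hypothetical parameter of the exponential example
width = 3.0    # hypothetical upper end of the uniform support

def exponential_pdf(x):
    return math.exp(-x / tau) / tau

def uniform_pdf(x):
    return 1.0 / width

# Numerical t-score next to the closed form, for points inside both supports.
for x in (0.5, 2.0, 2.5):
    print(
        x,
        t_score(exponential_pdf, x, a=0.0), x / tau - 1.0,
        t_score(uniform_pdf, x, a=0.0, b=width), 2.0 * x / width - 1.0,
    )
```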

SLIDE 18

Starting point

$$S_\tau(x;\tau) = \eta'(\tau)\,T(x;\tau)$$

Example:
$$f(x;\tau) = \frac{1}{\tau}\,e^{-x/\tau}, \qquad T(x;\tau) = \frac{x}{\tau} - 1, \qquad S_\tau(x;\tau) = \frac{1}{\tau}\left(\frac{x}{\tau} - 1\right)$$

$\tau$ is usually taken as a scale parameter, but $\tau = \eta^{-1}(\mu)$ and $T(\tau;\theta) = 0$. Perhaps the most important value is not the parameter, but the 'center' of the distribution itself.
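For the exponential example, the t-score can be worked out step by step from the general formula with $\eta(x) = \log x$, the map for support $(0,\infty)$:

```latex
\begin{align*}
\eta(x) &= \log x, \qquad \eta'(x) = \frac{1}{x}, \\
\frac{f(x;\tau)}{\eta'(x)} &= \frac{x}{\tau}\,e^{-x/\tau}, \\
\frac{d}{dx}\left(\frac{x}{\tau}\,e^{-x/\tau}\right)
  &= \frac{1}{\tau}\,e^{-x/\tau}\left(1 - \frac{x}{\tau}\right)
   = f(x;\tau)\left(1 - \frac{x}{\tau}\right), \\
T(x;\tau) &= \frac{x}{\tau} - 1, \qquad T(\tau;\tau) = 0, \\
S_\tau(x;\tau) &= \eta'(\tau)\,T(x;\tau) = \frac{1}{\tau}\left(\frac{x}{\tau} - 1\right).
\end{align*}
```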

SLIDE 21

Definitions

Measure of central tendency: the t-mean $x^*(\theta)$, the solution of $T(x;\theta) = 0$.

Inference function: the scalar score $S(x;\theta) \equiv \eta'(x^*)\,T(x;\theta)$.

$E_\theta S^2$ is the Fisher information for $x^*$.

SLIDE 22

Example: Scalar scores of beta-prime distribution

$$f(x) = \frac{1}{B(p,q)}\,\frac{x^{p-1}}{(x+1)^{p+q}}, \qquad T(x) = \frac{qx-p}{x+1}, \qquad x^* = \frac{p}{q}, \qquad S(x) = \frac{q}{p}\,\frac{qx-p}{x+1}$$
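A small Monte Carlo sketch can illustrate these quantities. It assumes the standard representation of a beta-prime variable as a ratio of independent gamma variables, and picks $p = 2$, $q = 0.5$ for illustration (with $q \le 1$ the ordinary mean is infinite, while the t-mean $p/q$ is finite). The sample size, seed and function names are arbitrary choices.

```python
import random

def beta_prime_t_score(x, p, q):
    """t-score of the beta-prime distribution: T(x) = (q x - p) / (x + 1)."""
    return (q * x - p) / (x + 1.0)

def sample_beta_prime(p, q, n, seed=0):
    """Beta-prime(p, q) samples as a ratio of independent Gamma(p) and Gamma(q) draws."""
    rng = random.Random(seed)
    return [rng.gammavariate(p, 1.0) / rng.gammavariate(q, 1.0) for _ in range(n)]

p, q = 2.0, 0.5                      # illustrative values; the ordinary mean is infinite
xs = sample_beta_prime(p, q, 100000)
ts = [beta_prime_t_score(x, p, q) for x in xs]

x_star = p / q                       # t-mean, root of T(x) = 0
scale = q / p                        # eta'(x*) for eta(x) = log x
mean_t = sum(ts) / len(ts)           # should be close to 0
mean_s2 = sum((scale * t) ** 2 for t in ts) / len(ts)   # estimate of E S^2
omega = mean_s2 ** -0.5              # score standard deviation estimate

print("t-mean:", x_star)
print("sample mean of T:", mean_t)
print("range of T:", min(ts), max(ts))   # stays within (-p, q)
print("E S^2 estimate:", mean_s2, "omega estimate:", omega)
```

The t-score is confined to the interval $(-p, q)$, so a single extreme observation has a limited effect on the sample averages.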

SLIDE 25

Consequences

Measure of variability: the score variance, the reciprocal Fisher information,
$$\omega^2(\theta) = \frac{1}{E_\theta S^2}$$

'Center' and 'radius' of the distribution: $x^*(\theta)$, $\omega(\theta)$.

Estimates: What matters is not the estimates of $\theta$, but the sample t-mean $\hat x^* = x^*(\hat\theta_{ML})$ and the sample score standard deviation $\hat\omega = \omega(\hat\theta_{ML})$, which make it possible to compare results for various models with different parameters.

SLIDE 29

Score moment estimators

$\hat\theta_{SM}$ by a generalized moment method:
$$\frac{1}{n}\sum_{i=1}^{n} S^k(x_i;\theta) = E_\theta S^k, \qquad k = 1, \dots, m$$

Scalar score moment estimates are M-estimates, and the equations are 'simple' ($E_\theta S^k$ is often expressed by a simple function of the parameters).

Scalar scores of heavy-tailed distributions are bounded: the estimates are robust.

In cases of heavy-tailed distributions, the estimates have asymptotic efficiencies $\sim 0.9$.

SLIDE 30

Inverted gamma distribution

Support $(0,\infty)$, densities and t-scores
$$f(x) = \frac{\gamma^{\alpha}}{x\,\Gamma(\alpha)}\,x^{-\alpha}e^{-\gamma/x}, \qquad T(x) = \alpha - \frac{\gamma}{x}$$
$$x^* = \frac{\gamma}{\alpha}, \qquad ET^2 = \alpha, \qquad \omega^2 = \frac{(x^*)^2}{ET^2} = \frac{\gamma^2}{\alpha^3}, \qquad S(x) = \frac{\alpha^2}{\gamma}\left(1 - \frac{x^*}{x}\right)$$

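SLIDE 31

Estimation

Since $T(x) = \alpha - \gamma/x = \alpha\,(1 - x^*/x)$ with $ET = 0$ and $ET^2 = \alpha$, the score moment equations are
$$\sum_{i=1}^{n}\left(1 - \frac{x^*}{x_i}\right) = 0, \qquad \frac{1}{n}\sum_{i=1}^{n}\left(1 - \frac{x^*}{x_i}\right)^2 = \frac{1}{\alpha}.$$
The first equation shows that $\hat x^*$ is the harmonic mean.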
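The two estimating equations can be solved in closed form, which the following sketch illustrates on simulated data. It assumes the second equation is read as matching the sample mean of $T^2(x_i)$, with $T(x) = \alpha(1 - x^*/x)$, to $ET^2 = \alpha$. The function name, the true parameter values and the seed are hypothetical choices. Samples are drawn using the fact that $1/X$ is gamma distributed with shape $\alpha$ and rate $\gamma$.

```python
import random
import statistics

def inverted_gamma_score_moments(xs):
    """Score moment estimates (x*, alpha, gamma) for the inverted gamma model."""
    # First equation: sum(1 - x*/x_i) = 0  =>  x* is the harmonic mean.
    x_star = statistics.harmonic_mean(xs)
    # Second equation: mean((1 - x*/x_i)^2) = 1/alpha (from E T^2 = alpha).
    m2 = sum((1.0 - x_star / x) ** 2 for x in xs) / len(xs)
    alpha = 1.0 / m2
    gamma = alpha * x_star           # since x* = gamma / alpha
    return x_star, alpha, gamma

rng = random.Random(0)
alpha_true, gamma_true = 3.0, 6.0    # illustrative values, so x* = 2
# 1/X ~ Gamma(shape=alpha, scale=1/gamma); gammavariate takes (shape, scale).
xs = [1.0 / rng.gammavariate(alpha_true, 1.0 / gamma_true) for _ in range(20000)]

x_star_hat, alpha_hat, gamma_hat = inverted_gamma_score_moments(xs)
omega_hat = x_star_hat / alpha_hat ** 0.5   # omega = x* / sqrt(E T^2)
print(x_star_hat, alpha_hat, gamma_hat, omega_hat)
```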

SLIDE 32

Generalized beta family

Support $\mathcal X = (0,\infty)$ and densities
$$f(x;\tau,\alpha,\nu) = \frac{1}{\nu^{\alpha}B(\nu\alpha,\alpha)}\,\frac{(x/\tau)^{\nu\alpha-1}}{[(x/\tau)+1/\nu]^{(1+\nu)\alpha}},$$
where $B$ is the beta function. The t-score is
$$T(x;\tau,\alpha,\nu) = \alpha\,\frac{(x/\tau)-1}{(x/\tau)+1/\nu}.$$
The first three moments of the t-score normalized by $\alpha$,
$$E\,\frac{T}{\alpha} = 0, \qquad E\left(\frac{T}{\alpha}\right)^2 = \frac{\nu}{(\nu+1)\alpha+1}, \qquad E\left(\frac{T}{\alpha}\right)^3 = \frac{2\nu(1-\nu)}{[(\nu+1)\alpha+1]\,[(\nu+1)\alpha+2]},$$
are independent of $\tau$.

SLIDE 33

Generalized beta family, τ = 1

By setting $\tau = 1$ we obtain the equations
$$\hat\nu: \quad \sum_{i=1}^{n}\frac{x_i - 1}{x_i + 1/\nu} = 0$$
and
$$\hat\alpha = \frac{\hat\nu/\rho - 1}{\hat\nu + 1}, \qquad \text{where} \quad \rho = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - 1}{x_i + 1/\hat\nu}\right)^2.$$
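The equation for $\hat\nu$ has no closed-form solution, so a root finder is needed. The sketch below uses plain bisection on a geometric grid, which is an assumption of this example and not necessarily the method used by the author. It also assumes the bracket `[lo, hi]` contains a sign change. The test data are simulated by taking $X = t/\nu$ with $t$ a ratio of independent Gamma($\nu\alpha$) and Gamma($\alpha$) variables. All names, parameter values and the seed are hypothetical.

```python
import random

def score_sum(xs, nu):
    """Left-hand side of the estimating equation for nu (with tau = 1)."""
    return sum((x - 1.0) / (x + 1.0 / nu) for x in xs)

def fit_generalized_beta_tau1(xs, lo=1e-3, hi=1e3, iters=60):
    """Solve score_sum(xs, nu) = 0 by bisection, then compute alpha from rho."""
    f_lo, f_hi = score_sum(xs, lo), score_sum(xs, hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("no sign change on the bracket [lo, hi]")
    for _ in range(iters):
        mid = (lo * hi) ** 0.5       # geometric midpoint: nu spans several decades
        f_mid = score_sum(xs, mid)
        if f_lo * f_mid <= 0.0:
            hi, f_hi = mid, f_mid
        else:
            lo, f_lo = mid, f_mid
    nu = (lo * hi) ** 0.5
    rho = sum(((x - 1.0) / (x + 1.0 / nu)) ** 2 for x in xs) / len(xs)
    alpha = (nu / rho - 1.0) / (nu + 1.0)
    return nu, alpha

rng = random.Random(1)
nu_true, alpha_true = 2.0, 3.0       # illustrative values
xs = [
    rng.gammavariate(nu_true * alpha_true, 1.0) / rng.gammavariate(alpha_true, 1.0) / nu_true
    for _ in range(20000)
]
nu_hat, alpha_hat = fit_generalized_beta_tau1(xs)
print(nu_hat, alpha_hat)
```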

SLIDE 34

Estimation of the Threshold Parameter

Uniform distribution on $[0, \gamma]$. The ML estimator is $\hat\gamma_{ML} = x_{(n)}$. The t-score is
$$T(x) = \frac{2x}{\gamma} - 1, \qquad \text{so that} \qquad \frac{1}{n}\sum_{i=1}^{n}\frac{2x_i}{\gamma} = 1.$$
The score moment solution is
$$\hat\gamma_{SM} = \max(x_{(n)},\, 2\bar x).$$
For $n = 5, 10, 20$ and $50$ we obtained after 10 000 experiments $\hat\gamma_{ML} \approx 0.87, 0.91, 0.95$ and $0.98$, respectively, whereas $\hat\gamma_{SM} = 1$ with accuracy to three decimal places.
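The experiment can be sketched as a small simulation. The slide does not state the true threshold or which summary of the 10 000 replications was reported, so the code below assumes $\gamma = 1$ and averages each estimator over the replications. Its output should not be expected to reproduce the quoted figures exactly. Solving the moment equation gives $\gamma = 2\bar x$, and taking the maximum with $x_{(n)}$ keeps every observation inside the estimated support.

```python
import random

def gamma_ml(xs):
    """ML estimate of the threshold: the sample maximum x_(n)."""
    return max(xs)

def gamma_sm(xs):
    """Score moment estimate: max(x_(n), 2 * sample mean)."""
    return max(max(xs), 2.0 * sum(xs) / len(xs))

def simulate(n, reps=10000, gamma=1.0, seed=0):
    """Average both estimators over `reps` samples of size n from U(0, gamma)."""
    rng = random.Random(seed)
    ml_total = 0.0
    sm_total = 0.0
    for _ in range(reps):
        xs = [rng.uniform(0.0, gamma) for _ in range(n)]
        ml_total += gamma_ml(xs)
        sm_total += gamma_sm(xs)
    return ml_total / reps, sm_total / reps

for n in (5, 10, 20, 50):
    ml_avg, sm_avg = simulate(n)
    print(n, round(ml_avg, 3), round(sm_avg, 3))
```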

SLIDE 35

Confidence intervals

Confidence intervals for $\hat x^*_{SM}$ can be established by a modification of the Rao score test or by the use of the distance
$$d(\hat x^*_{SM}, x_0) = \frac{|S(\hat x^*_{SM}) - S(x_0)|}{ES^2}.$$
As
$$\omega^2 = \frac{1}{ES^2} = \frac{(x^*)^2}{ET^2},$$
the estimate is
$$\hat\omega = \frac{\hat x^*_{SM}}{\left[\dfrac{1}{n}\displaystyle\sum_{i=1}^{n} T^2(x_i;\hat x^*_{SM})\right]^{1/2}}.$$
