New approaches for statistical modelling - Jelena Jocković



SLIDE 1

The Double Pareto Lognormal Distribution (dPlN) | Algebraic structures concerning probability densities | Edgeworth expansions

New approaches for statistical modelling

Jelena Jocković

ADVISORS: Pepa Ramírez Cobo, Prof. Fernando López Blázquez

DOC-COURSE IMUS, University of Seville

May 25 2010

SLIDE 2

Outline

  • The Double Pareto Lognormal Distribution (dPlN)
  • Algebraic structures concerning probability densities
  • Edgeworth expansions

SLIDE 3

Heavy tailed distributions

  • samples with some extreme values
  • cannot be modelled by the normal distribution
  • applications: insurance, finance, hydrology, internet traffic...
  • models: Pareto, Pareto mixtures, Log-normal,..., dPlN

dPlN introduced in: Reed, W. and Jorgensen, M. (2004). The Double Pareto Lognormal distribution - a new parametric model for size distributions. Communications in Statistics, Theory and Methods, 33(8):1733-1753.

SLIDE 4

dPlN: Definition

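  • Reed and Jorgensen, 2004.
  • Define $Y = W + Z$ with $W$ and $Z$ independent, where $Z \sim N(\nu, \tau^2)$ and $W \sim f_W(w)$ (skewed Laplace distribution):

\[
f_W(w) =
\begin{cases}
\dfrac{\alpha\beta}{\alpha+\beta}\, e^{\beta w}, & w \le 0, \\[2mm]
\dfrac{\alpha\beta}{\alpha+\beta}\, e^{-\alpha w}, & w > 0,
\end{cases}
\qquad \alpha, \beta > 0.
\]

  • $Y \sim NL(\alpha, \beta, \nu, \tau)$ and $X = \exp(Y) \sim dPlN(\alpha, \beta, \nu, \tau)$.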
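
The construction X = exp(W + Z) translates directly into a sampler. The sketch below is only an illustration, not code from the talk: it assumes the standard representation of the skewed Laplace variable as W = E1/α − E2/β with E1, E2 independent standard exponentials (which has the density f_W of this slide), and the function name `sample_dpln` and the parameter values are hypothetical.

```python
import math
import random

def sample_dpln(alpha, beta, nu, tau, rng):
    """Draw one dPlN(alpha, beta, nu, tau) variate as X = exp(W + Z)."""
    # Assumption: W = E1/alpha - E2/beta, with E1, E2 iid standard
    # exponentials, follows the skewed Laplace density f_W.
    w = rng.expovariate(1.0) / alpha - rng.expovariate(1.0) / beta
    z = rng.gauss(nu, tau)  # Z ~ N(nu, tau^2), tau is the standard deviation
    return math.exp(w + z)

# Illustrative parameter values (not taken from the slides).
alpha, beta, nu, tau = 3.0, 2.0, 0.5, 0.4
rng = random.Random(0)
logs = [math.log(sample_dpln(alpha, beta, nu, tau, rng)) for _ in range(200_000)]

# Y = log X is normal-Laplace, so E[Y] = nu + 1/alpha - 1/beta.
mean_y = sum(logs) / len(logs)
expected_mean_y = nu + 1.0 / alpha - 1.0 / beta
print(mean_y, expected_mean_y)
```

Comparing the sample mean of log X with ν + 1/α − 1/β is a cheap sanity check that the two exponential components carry the intended signs.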

SLIDE 5

CDF

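\[
P[X \le x] = \Phi\!\left(\frac{\log x - \nu}{\tau}\right)
- \frac{\beta x^{-\alpha}}{\beta + \alpha} \exp\!\left(\frac{\alpha^2\tau^2}{2} + \alpha\nu\right) \Phi\!\left(\frac{\log x - \nu - \tau^2\alpha}{\tau}\right)
+ \frac{\alpha x^{\beta}}{\alpha + \beta} \exp\!\left(\frac{\beta^2\tau^2}{2} - \beta\nu\right) \Phi^c\!\left(\frac{\log x - \nu + \tau^2\beta}{\tau}\right),
\]

where $\Phi$ is the standard normal d.f. and $\Phi^c = 1 - \Phi$.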
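
As a rough numerical companion to the CDF on this slide, the following sketch evaluates it with the standard library only. It assumes the sign pattern "Φ term, minus the x^(−α) term, plus the x^β term" and Φ^c = 1 − Φ; the helper names and the parameter values are illustrative, and the direct evaluation may lose accuracy for extreme x.

```python
import math

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def Phi_c(z):
    """Complementary standard normal distribution function, 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def dpln_cdf(x, alpha, beta, nu, tau):
    """P[X <= x] for X ~ dPlN(alpha, beta, nu, tau), x > 0."""
    lx, t2 = math.log(x), tau * tau
    upper = (beta / (alpha + beta)
             * math.exp(-alpha * lx + alpha * nu + 0.5 * alpha ** 2 * t2)
             * Phi((lx - nu - alpha * t2) / tau))
    lower = (alpha / (alpha + beta)
             * math.exp(beta * lx - beta * nu + 0.5 * beta ** 2 * t2)
             * Phi_c((lx - nu + beta * t2) / tau))
    return Phi((lx - nu) / tau) - upper + lower

# Illustrative parameters; the grid runs from 1e-3 to 1e3 on a log scale.
params = (3.0, 2.0, 0.5, 0.4)
grid = [10.0 ** (k / 4.0) for k in range(-12, 13)]
values = [dpln_cdf(x, *params) for x in grid]
print(values[0], values[len(values) // 2], values[-1])
```

On this grid the values should increase from nearly 0 to nearly 1, which is a quick check that the signs of the two correction terms are right.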

SLIDE 6

PDF, Moments

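  • PDF

\[
f(x) = \frac{\beta}{\alpha+\beta} f_1(x) + \frac{\alpha}{\alpha+\beta} f_2(x),
\]
\[
f_1(x) = \alpha x^{-\alpha-1} \exp\!\left(\alpha\nu + \frac{\alpha^2\tau^2}{2}\right) \Phi\!\left(\frac{\log x - \nu - \alpha\tau^2}{\tau}\right),
\]
\[
f_2(x) = \beta x^{\beta-1} \exp\!\left(-\beta\nu + \frac{\beta^2\tau^2}{2}\right) \Phi^c\!\left(\frac{\log x - \nu + \beta\tau^2}{\tau}\right).
\]

  • Moments: The MGF does not exist in closed form. However, moments of order r < α can be obtained.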
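
The remark on moments can be checked numerically. The sketch below codes the PDF of this slide and integrates it on a logarithmic scale; the closed form E[X^r] = αβ / ((α − r)(β + r)) · exp(rν + r²τ²/2) for r < α is not stated on the slide and is used here as an assumption that follows from X = exp(W + Z) with independent W and Z. Function names, the integration range and the parameter values are illustrative.

```python
import math

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def Phi_c(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def dpln_pdf(x, alpha, beta, nu, tau):
    """Mixture form f = beta/(alpha+beta) f1 + alpha/(alpha+beta) f2."""
    lx, t2 = math.log(x), tau * tau
    f1 = (alpha * math.exp(-(alpha + 1.0) * lx + alpha * nu + 0.5 * alpha ** 2 * t2)
          * Phi((lx - nu - alpha * t2) / tau))
    f2 = (beta * math.exp((beta - 1.0) * lx - beta * nu + 0.5 * beta ** 2 * t2)
          * Phi_c((lx - nu + beta * t2) / tau))
    return (beta * f1 + alpha * f2) / (alpha + beta)

def integrate_on_log_scale(g, lo=-15.0, hi=15.0, n=6000):
    """Trapezoid rule for the integral of g(x) dx after substituting x = exp(y)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * g(x) * x  # dx = x dy
    return total * h

alpha, beta, nu, tau = 3.0, 2.0, 0.5, 0.4  # illustrative values, alpha > 1
total_mass = integrate_on_log_scale(lambda x: dpln_pdf(x, alpha, beta, nu, tau))
mean_numeric = integrate_on_log_scale(lambda x: x * dpln_pdf(x, alpha, beta, nu, tau))
# Assumed closed form for r = 1 (requires alpha > 1).
mean_closed = alpha * beta / ((alpha - 1.0) * (beta + 1.0)) * math.exp(nu + 0.5 * tau ** 2)
print(total_mass, mean_numeric, mean_closed)
```

The restriction r < α shows up directly in the factor 1 / (α − r): the r-th moment integral diverges once the power-law upper tail is no longer integrable against x^r.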

SLIDE 7

dPlN properties

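  • Power law tail behaviour:

\[
f(x) \sim \alpha A(\alpha, \nu, \tau)\, x^{-\alpha-1}, \quad x \to \infty, \qquad
f(x) \sim \beta A(-\beta, \nu, \tau)\, x^{\beta-1}, \quad x \to 0
\]

  • Closure under power-law transformations:

\[
X \sim dPlN(\alpha, \beta, \nu, \tau^2), \quad a, b > 0 \;\Longrightarrow\; W = aX^b \sim dPlN(\alpha/b,\; \beta/b,\; b\nu + \log a,\; b^2\tau^2)
\]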
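
The closure property can be justified in a few lines. The sketch below assumes the definition X = exp(Y), Y = W_L + Z from the definition slide (the skewed Laplace component is written W_L here to avoid a clash with W = aX^b) and reads the last parameter as the variance of the normal component.

```latex
\begin{align*}
\log(aX^b) &= \log a + bY = (\log a + bZ) + bW_L, \\
\log a + bZ &\sim N\!\left(b\nu + \log a,\; b^2\tau^2\right), \\
f_{bW_L}(w) &= \frac{1}{b}\, f_{W_L}\!\left(\frac{w}{b}\right)
  = \begin{cases}
      \dfrac{(\alpha/b)(\beta/b)}{\alpha/b + \beta/b}\, e^{(\beta/b) w}, & w \le 0, \\[2mm]
      \dfrac{(\alpha/b)(\beta/b)}{\alpha/b + \beta/b}\, e^{-(\alpha/b) w}, & w > 0.
    \end{cases}
\end{align*}
```

So bW_L is again skewed Laplace, with parameters α/b and β/b, and it stays independent of the normal part; exponentiating the sum gives the stated dPlN law.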


SLIDE 9

New results

X, Y ∼ dPlN, Z, W ∼ NL. What is the distribution of:

  • Z + W, Z − W? obtained
  • X · Y, X/Y? obtained
  • X + Y, X − Y? very hard!

\[
\int \exp(ax)\, \varphi(x + b)\, \Phi(x)\, dx \quad (!?)
\]

SLIDE 10

Future work

  • more general formulas for NL, dPlN
  • lack of identifiability of dPlN

\[
f(x_1, x_2, \ldots, x_n \mid \theta) = f(x_1, x_2, \ldots, x_n \mid \theta'), \qquad \theta = (\alpha, \beta, \nu, \tau), \quad \theta \ne \theta'
\]

(f is the likelihood function). Sometimes, parameters are not estimated well!

  • queueing models

SLIDE 11

dPlN and queueing systems

  • queueing systems closely related to heavy tailed modelling (congestion in teletraffic systems, ruin problems in insurance...)
    Cooper, R. (1981). Introduction to Queueing Theory. North Holland, 2nd edition.
  • GI/M/c described in:
    Ausin, M., Lillo, R., and Wiper, M. (2007). Bayesian control of the number of servers in a GI/M/c queueing system. Journal of Statistical Planning and Inference, 137:3043-3057.
  • dPlN/M/1, M/dPlN/1 analyzed in:
    Ramirez, P., Lillo, R., Wilson, S., and Wiper, M. (2010). Bayesian inference for Double Pareto Lognormal queues. To appear in Annals of Applied Statistics.
  • next: dPlN/G/c queueing system! Optimizing the number of servers!

SLIDE 13

Motivation

  • bringing together applied probability and algebra (known applications in analysis of variance, multivariate analysis and stationary processes)

Some classical references:
Girardin, V. and Senoussi, R. (2003). Semigroup stationary processes and spectral representation. Bernoulli, 9(5):857-876.
Grenander, U. (1963). Probabilities on Algebraic Structures. John Wiley, New York.
Hannan, E. (1965). Group representations and applied probability. J. Appl. Prob., 2:1-68.

  • What is the family of densities f?

Most distribution families are not closed under convolution. We have to define new operations!

SLIDE 14

New results

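Gamma distribution:

\[
V = \{ g(\alpha, \beta) \mid \alpha > 0, \beta > 0 \}, \qquad g(\alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}.
\]
\[
\oplus : V \times V \to V, \qquad g(\alpha, \beta) \oplus g(\alpha_1, \beta_1) = g(\alpha\alpha_1, \beta\beta_1)
\]
\[
\otimes : \mathbb{R} \times V \to V, \qquad c \otimes g(\alpha, \beta) = g(\alpha^{c}, \beta^{c})
\]
\[
\text{inner product:} \quad \langle g(\alpha, \beta), g(\alpha_1, \beta_1) \rangle = \log\alpha \cdot \log\alpha_1 + \log\beta \cdot \log\beta_1
\]

Then, the structure $(V, \mathbb{R}, \oplus, \otimes, \langle \cdot, \cdot \rangle)$ is a pre-Hilbert space.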
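
The operations on the Gamma family act only on the parameter pair, so they are easy to experiment with. The sketch below represents g(α, β) by the tuple (α, β) and spot-checks a few vector-space and inner-product axioms numerically; the function names and test values are illustrative, and a numerical check of this kind is of course not a proof.

```python
import math

def oplus(g, h):
    """g(a, b) (+) g(a1, b1) = g(a * a1, b * b1)."""
    return (g[0] * h[0], g[1] * h[1])

def otimes(c, g):
    """c (x) g(a, b) = g(a ** c, b ** c)."""
    return (g[0] ** c, g[1] ** c)

def inner(g, h):
    """<g(a, b), g(a1, b1)> = log a * log a1 + log b * log b1."""
    return math.log(g[0]) * math.log(h[0]) + math.log(g[1]) * math.log(h[1])

def close(u, v):
    return all(math.isclose(x, y, rel_tol=1e-12) for x, y in zip(u, v))

g, h, k = (2.0, 0.5), (3.0, 4.0), (1.5, 0.25)  # illustrative parameter pairs
zero = (1.0, 1.0)                              # g(1, 1) plays the role of the zero vector
c, d = 1.7, -0.6

checks = {
    "commutative": close(oplus(g, h), oplus(h, g)),
    "zero element": close(oplus(g, zero), g),
    "additive inverse": close(oplus(g, otimes(-1.0, g)), zero),
    "distributive over vectors": close(otimes(c, oplus(g, h)), oplus(otimes(c, g), otimes(c, h))),
    "distributive over scalars": close(otimes(c + d, g), oplus(otimes(c, g), otimes(d, g))),
    "inner product additive": math.isclose(inner(oplus(g, h), k), inner(g, k) + inner(h, k)),
    "inner product homogeneous": math.isclose(inner(otimes(c, g), k), c * inner(g, k)),
}
print(checks)
```

All of these hold because (α, β) ↦ (log α, log β) turns ⊕ and ⊗ into ordinary addition and scalar multiplication in the plane.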

SLIDE 15

New results

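Normal distribution:

\[
V = \{ f(\nu, \tau^2) \mid \nu \in \mathbb{R}, \tau^2 > 0 \}, \qquad f(\nu, \tau^2) = \frac{1}{\tau\sqrt{2\pi}}\, e^{-\frac{1}{2} \frac{(x-\nu)^2}{\tau^2}}.
\]
\[
\oplus : V \times V \to V, \qquad f(\nu, \tau^2) \oplus f(\nu_1, \tau_1^2) = f(\nu + \nu_1, \tau^2\tau_1^2)
\]
\[
\otimes : \mathbb{R} \times V \to V, \qquad c \otimes f(\nu, \tau^2) = f(c\nu, (\tau^2)^{c})
\]
\[
\text{inner product:} \quad \langle f(\nu, \tau^2), f(\nu_1, \tau_1^2) \rangle = \nu\nu_1 + \log\tau^2 \cdot \log\tau_1^2
\]

Then, the structure $(V, \mathbb{R}, \oplus, \otimes, \langle \cdot, \cdot \rangle)$ is a pre-Hilbert space.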
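
One way to see why the axioms hold for the normal family is to transport the structure to the plane. The map T below is introduced here only for illustration and is not part of the slides.

```latex
\begin{align*}
T : V \to \mathbb{R}^2, \qquad & T\big(f(\nu, \tau^2)\big) = (\nu, \log\tau^2), \\
T\big(f(\nu, \tau^2) \oplus f(\nu_1, \tau_1^2)\big) &= (\nu + \nu_1,\; \log\tau^2 + \log\tau_1^2)
   = T\big(f(\nu, \tau^2)\big) + T\big(f(\nu_1, \tau_1^2)\big), \\
T\big(c \otimes f(\nu, \tau^2)\big) &= (c\nu,\; c\log\tau^2) = c\, T\big(f(\nu, \tau^2)\big), \\
\big\langle f(\nu, \tau^2), f(\nu_1, \tau_1^2) \big\rangle &= T\big(f(\nu, \tau^2)\big) \cdot T\big(f(\nu_1, \tau_1^2)\big).
\end{align*}
```

T is a bijection, so V inherits the inner-product space structure of the Euclidean plane; the zero element is f(0, 1), the standard normal density.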

SLIDE 16

Conclusions and future work

Operations ⊕ and ⊗ can be applied to:

  • any family of densities defined by two real parameters (at least one positive)
  • moment generating functions, characteristic functions (example: stable distributions)

SLIDE 17

Motivation

Central Limit Theorem:

\[
\sup_{x \in \mathbb{R}} | F_n(x) - \Phi(x) | \le \frac{C_0}{n^{1/2}}
\]

Error may be too large! A way to improve it:

\[
\left| F_n(x) - \sum_{j=0}^{k} \frac{A_j(x)}{n^{j/2}} \right| \le \frac{C_k(x)}{n^{(k+1)/2}}, \qquad A_0(x) = \Phi(x)
\]

SLIDE 18

Definition

F: d.f. to be approximated, f: its c.f., {κ_r}: its cumulants.

We want to find the expansion based on a d.f. Ψ with c.f. ψ and cumulants {γ_r}:

\[
f(t) = \exp\!\left[ \sum_{r=1}^{+\infty} (\kappa_r - \gamma_r) \frac{(it)^r}{r!} \right] \psi(t) \qquad \text{(holds)}
\]

Under certain conditions and after applying the inverse Fourier transform:

\[
F(x) = \exp\!\left[ \sum_{r=1}^{+\infty} (\kappa_r - \gamma_r) \frac{(-D_x)^r}{r!} \right] \Psi(x) \qquad \text{(Charlier differential series)}
\]

$D_x$: differential operator with respect to x

SLIDE 19

Edgeworth expansions - definition

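\[
F_n(x) = P\!\left[ \sqrt{n}\, \frac{\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu}{\sigma} \le x \right],
\]

$X_i$: iid r.v. with mean µ and variance σ², Φ: standard normal distribution

Collecting terms according to powers of n...

Edgeworth expansion:

\[
f_n(t) = \left[ 1 + \sum_{j=1}^{\infty} \frac{P_j(it)}{n^{j/2}} \right] \exp(-t^2/2), \qquad P_j \text{: polynomial of degree } 3j,
\]
\[
F_n(x) = \Phi(x) + \sum_{j=1}^{\infty} \frac{P_j(-D_x)}{n^{j/2}}\, \Phi(x) \qquad (P_j \text{: Cramer-Edgeworth polynomials})
\]

Convergent series, can be truncated with arbitrarily small error!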

SLIDE 20

Validity of Edgeworth expansions

Can be proved in these cases:

  • Cramer’s condition holds: there exists M > 0 such that $\sup \{ |\varphi_X(t)| : |t| > M \} < 1$.
  • lattice distributions (support consists of equidistant points)

Some classical references: Chebyshev, P. (1890), Edgeworth, F. (1905), Cramer, H. (1928), Esseen, C. (1945)

SLIDE 21

Univariate case - inversion formulas

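Lattice case: Support is $\{ x_m : x_m = \alpha + \beta m, m \in \mathbb{Z} \}$, $\alpha \in \mathbb{R}$, $\beta > 0$.

Theorem

Let X be a discrete r.v. with probability function $p_m = P[X = x_m]$, $x_m = \alpha + \beta m$, $m \in \mathbb{Z}$, $\beta > 0$, $E|X| < \infty$. Then

\[
P[X \ge x_m] = \frac{1}{2} + \frac{\beta}{2\pi}\, PV \int_{-\pi/\beta}^{\pi/\beta} \frac{\varphi_X(t) \exp(-i x_m t)}{1 - \exp(-i\beta t)}\, dt .
\]

If Cramer’s condition holds:

Theorem

Let X be a r.v. that satisfies Cramer’s condition and $E|X| < \infty$. Then

\[
P[X > x] = \frac{1}{2} + \frac{1}{2\pi i}\, PV \int_{-\infty}^{\infty} \frac{\varphi_X(t) \exp(-ixt)}{t}\, dt, \qquad x \in \mathbb{R}.
\]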
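
The second inversion formula can be tried out on a case where the answer is known. The sketch below assumes X standard normal, so that φ_X(t) = exp(−t²/2); pairing t with −t then reduces the principal-value integral to P[X > x] = 1/2 − (1/π) ∫₀^∞ sin(tx) exp(−t²/2) / t dt. The truncation point and step count are illustrative choices.

```python
import math

def normal_tail_by_inversion(x, upper=12.0, steps=12000):
    """P[X > x] for X ~ N(0, 1) via the characteristic-function inversion formula.

    Uses P[X > x] = 1/2 - (1/pi) * int_0^inf sin(t x) exp(-t^2 / 2) / t dt,
    truncated at `upper` and evaluated with the trapezoid rule.
    """
    h = upper / steps

    def integrand(t):
        if t == 0.0:
            return x  # limit of sin(t x) / t as t -> 0
        return math.sin(t * x) * math.exp(-0.5 * t * t) / t

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, steps):
        total += integrand(i * h)
    return 0.5 - total * h / math.pi

x0 = 1.0
tail_inversion = normal_tail_by_inversion(x0)
tail_direct = 0.5 * math.erfc(x0 / math.sqrt(2.0))
print(tail_inversion, tail_direct)
```

For characteristic functions that decay slowly the truncated integral converges much less comfortably, which is one reason asymptotic expansions are attractive in practice.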

SLIDE 22

Univariate case - expansions

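Lattice case: $\{X_n\}_{n \ge 1}$: nondegenerate, discrete, iid r.v., $p_m = P[X_1 = m]$, $m \in \mathbb{Z}$,

\[
\bar{X}_n = \sum_{i=1}^{n} X_i / n, \qquad \mu = E X_1, \qquad \sigma^2 = Var(X_1), \qquad Z_n = \sqrt{n} \left( \bar{X}_n - \mu \right).
\]

($Z_n$ is supported in the lattice set $\{ -\sqrt{n}\mu + m/\sqrt{n},\; m \in \mathbb{Z} \}$)

$r_n$ defined by: $n\mu + \sqrt{n} x + r_n(x) = [n\mu + \sqrt{n} x] + 1$

$H_i(x)$: Hermite polynomials

Edgeworth expansion:

\[
\begin{aligned}
P[Z_n \ge x] = {} & 1 - \Phi\!\left(\frac{x}{\sigma}\right)
+ \varphi\!\left(\frac{x}{\sigma}\right) \left[ \frac{\kappa_3}{6\sigma^3} H_2\!\left(\frac{x}{\sigma}\right) + \frac{1}{\sigma} \left( \frac{1}{2} - r_n(x) \right) \right] n^{-1/2} \\
& + \varphi\!\left(\frac{x}{\sigma}\right) \left[ \frac{\kappa_3^2}{72\sigma^6} H_5\!\left(\frac{x}{\sigma}\right)
+ \frac{1}{6\sigma^4} \left( \frac{\kappa_4}{4} + \kappa_3 \left( \frac{1}{2} - r_n(x) \right) \right) H_3\!\left(\frac{x}{\sigma}\right) \right. \\
& \qquad \left. + \frac{1}{2\sigma^2} \left( \frac{1}{6} - r_n(x)(1 - r_n(x)) \right) H_1\!\left(\frac{x}{\sigma}\right) \right] n^{-1} + o(n^{-1}).
\end{aligned}
\]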
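
To get a feeling for the size of the lattice corrections, the sketch below evaluates the expansion of this slide for sums of Bernoulli(p) variables and compares it with the exact binomial tail. It assumes probabilists' Hermite polynomials (H1 = y, H2 = y² − 1, H3 = y³ − 3y, H5 = y⁵ − 10y³ + 15y) and picks a point x for which nµ + √n·x is not an integer; the function names, n, p and the threshold are illustrative.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def lattice_edgeworth_tail(x, n, mu, sigma, k3, k4):
    """Two-term lattice Edgeworth approximation of P[Z_n >= x]."""
    t = n * mu + math.sqrt(n) * x
    r = math.floor(t) + 1 - t          # r_n(x), defined by t + r_n(x) = [t] + 1
    y = x / sigma
    h1, h2 = y, y * y - 1.0
    h3, h5 = y ** 3 - 3.0 * y, y ** 5 - 10.0 * y ** 3 + 15.0 * y
    first = k3 / (6.0 * sigma ** 3) * h2 + (0.5 - r) / sigma
    second = (k3 ** 2 / (72.0 * sigma ** 6) * h5
              + (k4 / 4.0 + k3 * (0.5 - r)) / (6.0 * sigma ** 4) * h3
              + (1.0 / 6.0 - r * (1.0 - r)) / (2.0 * sigma ** 2) * h1)
    return 1.0 - Phi(y) + phi(y) * (first / math.sqrt(n) + second / n)

def binomial_tail(j, n, p):
    """Exact P[S >= j] for S ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(j, n + 1))

# Bernoulli(p) summands: mean, variance and cumulants of order 3 and 4.
n, p = 20, 0.3
mu, var = p, p * (1.0 - p)
sigma = math.sqrt(var)
k3 = var * (1.0 - 2.0 * p)
k4 = var * (1.0 - 6.0 * var)

threshold = 8.3                         # value of n*mu + sqrt(n)*x, deliberately non-integer
x = (threshold - n * mu) / math.sqrt(n)
exact = binomial_tail(math.ceil(threshold), n, p)
normal_only = 1.0 - Phi(x / sigma)
edgeworth = lattice_edgeworth_tail(x, n, mu, sigma, k3, k4)
print(exact, normal_only, edgeworth)
```

The term (1/2 − r_n(x))/σ at order n^(−1/2) plays the role of the familiar continuity correction, while the 1/6 − r_n(1 − r_n) term only appears at order n^(−1).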

SLIDE 23

Univariate case - expansions

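If Cramer’s condition holds:

Theorem

Let $\{X_n\}_{n \ge 1}$ be a sequence of nondegenerate, iid r.v. Moreover, suppose that $X_1$ satisfies Cramer’s condition and that the first k + 2 cumulants exist (k ≥ 1). Let $Z_n = \sqrt{n} \left( \bar{X}_n - \mu \right)$. Then,

\[
P[Z_n > x] = \sum_{j=0}^{k} \frac{P_j(-D_x) \left( 1 - \Phi\!\left(\frac{x}{\sigma}\right) \right)}{n^{j/2}} + o(n^{-k/2}), \qquad x \in \mathbb{R},
\]

$P_j(z)$: j-th Cramer-Edgeworth polynomial associated to $X_1$ (with $P_0 \equiv 1$).

Edgeworth expansion:

\[
P[Z_n > x] = 1 - \Phi\!\left(\frac{x}{\sigma}\right)
+ \varphi\!\left(\frac{x}{\sigma}\right) \left[ \frac{\kappa_3}{6\sigma^3} H_2\!\left(\frac{x}{\sigma}\right) n^{-1/2}
+ \left( \frac{\kappa_3^2}{72\sigma^6} H_5\!\left(\frac{x}{\sigma}\right) + \frac{\kappa_4}{24\sigma^4} H_3\!\left(\frac{x}{\sigma}\right) \right) n^{-1} \right] + o(n^{-1}).
\]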
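
A quick experiment shows how much the two correction terms help in the non-lattice case. The sketch below assumes iid Exp(1) summands (µ = σ = 1, κ3 = 2, κ4 = 6), for which the exact tail of Z_n = √n (X̄_n − µ) is available because the sum has a Gamma(n, 1) law; the grid, n and function names are illustrative, and the Hermite polynomials are taken in the probabilists' normalisation.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def edgeworth_tail(x, n, sigma, k3, k4):
    """Approximation of P[Z_n > x] with the n^(-1/2) and n^(-1) terms."""
    y = x / sigma
    h2 = y * y - 1.0
    h3 = y ** 3 - 3.0 * y
    h5 = y ** 5 - 10.0 * y ** 3 + 15.0 * y
    first = k3 / (6.0 * sigma ** 3) * h2
    second = k3 ** 2 / (72.0 * sigma ** 6) * h5 + k4 / (24.0 * sigma ** 4) * h3
    return 1.0 - Phi(y) + phi(y) * (first / math.sqrt(n) + second / n)

def exact_tail_exponential(x, n):
    """P[sqrt(n) (Xbar_n - 1) > x] for iid Exp(1): the sum is Gamma(n, 1)."""
    s = n + math.sqrt(n) * x
    if s <= 0.0:
        return 1.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= s / k              # s^k / k!
        total += term
    return math.exp(-s) * total    # Gamma(n, 1) survival function at s

n = 10
grid = [-2.0 + 0.1 * i for i in range(51)]   # x from -2 to 3
err_normal = max(abs(1.0 - Phi(x) - exact_tail_exponential(x, n)) for x in grid)
err_edgeworth = max(abs(edgeworth_tail(x, n, 1.0, 2.0, 6.0) - exact_tail_exponential(x, n))
                    for x in grid)
print(err_normal, err_edgeworth)
```

Already for n = 10 the maximal error on this grid should drop by roughly an order of magnitude compared with the plain normal approximation, mostly thanks to the skewness term in κ3.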

SLIDE 24

Multivariate case

What is different?

  • much more complicated notation
  • multidimensional lattice

\[
L = \left\{ x_m = \alpha + \sum_{j=1}^{d} m_j v_j : m \in \mathbb{Z}^d \right\}, \qquad \text{with } \alpha \in \mathbb{R}^d
\]

  • case of mixed vectors

$X = (X^{(1)\prime}, X^{(2)\prime})^{\prime}$: d-dim. r.v. with positive definite covariance matrix, $E X^{(1)} = \mu$

$X^{(1)}$: $d_1$-dim. r.v., with support in a lattice, $L = \mathbb{Z}^{d_1}$, in $\mathbb{R}^{d_1}$

$X^{(2)}$ is a $d_2$-dim. r.v., such that for any $x^{(1)} \in L$ with $P[X^{(1)} = x^{(1)}] > 0$ the conditional distribution of $X^{(2)} \mid X^{(1)} = x^{(1)}$ has a positive definite covariance matrix and satisfies Cramer’s condition

SLIDE 25

Multivariate case-mixed vectors

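Under the conditions for mixed vectors:

\[
P\!\left[ Z_n^{(1)} \ge x^{(1)},\; Z_n^{(2)} > x^{(2)} \right] = \sum_{j=0}^{k-2} \frac{(T\delta_j)(x)}{n^{j/2}} + o(n^{-(k-2)/2}), \qquad x \in \mathbb{R}^d,
\]

where $\delta_j(z)$, $j = 0, \ldots, k - 2$, is the j-th polynomial obtained by convolution:

\[
\delta(z) = P(z) * b(z_1) * \cdots * b(z_{d_1}) * c_{x^{(1)}}(z^{(1)}),
\]

$P_j$: j-th Cramer-Edgeworth polynomial associated to X; $b_k$ defined by:

\[
\sum_{k=0}^{\infty} \frac{b_k(z)}{n^{k/2}} = 1/g(z), \qquad g(z) = \frac{\sqrt{n}}{z} \left( 1 - \exp\!\left( -\frac{z}{\sqrt{n}} \right) \right);
\]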
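
The first few b_k can be written out explicitly. The worked expansion below assumes g(z) = (√n / z)(1 − exp(−z/√n)) and uses the classical series u / (1 − e^{−u}) = 1 + u/2 + u²/12 − u⁴/720 + ..., whose coefficients are Bernoulli numbers with the convention B_1 = +1/2; the substitution u = z/√n is introduced here only for this illustration.

```latex
\begin{align*}
\frac{1}{g(z)} &= \frac{u}{1 - e^{-u}} \Big|_{u = z/\sqrt{n}}
  = 1 + \frac{z}{2}\, n^{-1/2} + \frac{z^2}{12}\, n^{-1} + O(n^{-2}), \\
b_0(z) &= 1, \qquad b_1(z) = \frac{z}{2}, \qquad b_2(z) = \frac{z^2}{12}, \qquad b_3(z) = 0 .
\end{align*}
```

These coefficients are the source of the constants 1/2 and 1/12 that combine with the r_n-dependent terms in the lattice corrections, for example 1/2 − r_n and (1/2)(r_n² − r_n + 1/6).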

SLIDE 26

Multivariate case-mixed vectors

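$c_x$ defined by:

\[
c_x(z) = \left\{ \frac{(-1)^j}{j!} \left( r_n'(x)\, z \right)^j,\; j \ge 0 \right\},
\qquad
x + \frac{1}{\sqrt{n}}\, r_n(x) = \sqrt{n} \left( \frac{j_n(x)}{n} - \mu \right),
\qquad
j_{n,s} = \inf_{p \in \mathbb{Z}} \{ p \mid p \ge n\mu_s + \sqrt{n}\, x_s \}
\]

T being an operator acting on multivariate polynomials:

\[
(TQ)(x) = Q(-D_x) \left( \Phi_d(x; V) \right).
\]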

SLIDE 27

Multivariate case - mixed vectors

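Polynomials δ:

\[
\delta_0(z) = 1;
\]
\[
\delta_1(z) = \frac{1}{6}\kappa_{30} z_1^3 + \frac{1}{6}\kappa_{03} z_2^3 + \frac{1}{2}\kappa_{12} z_1 z_2^2 + \frac{1}{2}\kappa_{21} z_1^2 z_2 + \left( \frac{1}{2} - r_1 \right) z_1;
\]
\[
\begin{aligned}
\delta_2(z) = {} & \frac{1}{72}\kappa_{30}^2 z_1^6 + \frac{1}{72}\kappa_{03}^2 z_2^6 + \frac{1}{12}\kappa_{30}\kappa_{21} z_1^5 z_2 + \frac{1}{12}\kappa_{03}\kappa_{12} z_1 z_2^5 \\
& + \left( \frac{1}{8}\kappa_{21}^2 + \frac{1}{12}\kappa_{30}\kappa_{12} \right) z_1^4 z_2^2
+ \left( \frac{1}{8}\kappa_{12}^2 + \frac{1}{12}\kappa_{03}\kappa_{21} \right) z_1^2 z_2^4
+ \left( \frac{1}{36}\kappa_{30}\kappa_{03} + \frac{1}{4}\kappa_{21}\kappa_{12} \right) z_1^3 z_2^3 \\
& + \left( \frac{1}{24}\kappa_{40} + \frac{1}{6}\kappa_{30} \left( \frac{1}{2} - r_1 \right) \right) z_1^4
+ \frac{1}{24}\kappa_{04} z_2^4
+ \left( \frac{1}{6}\kappa_{13} + \frac{1}{6}\kappa_{03} \left( \frac{1}{2} - r_1 \right) \right) z_1 z_2^3 \\
& + \left( \frac{1}{6}\kappa_{31} + \frac{1}{2} \left( \frac{1}{2} - r_1 \right) \kappa_{21} \right) z_1^3 z_2
+ \left( \frac{1}{4}\kappa_{22} + \frac{1}{2}\kappa_{12} \left( \frac{1}{2} - r_1 \right) \right) z_1^2 z_2^2
+ \frac{1}{2} \left( r_1^2 - r_1 + \frac{1}{6} \right) z_1^2 .
\end{aligned}
\]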

SLIDE 28

Thank you for your attention! :-)
