SLIDE 1

Formulation of the . . . Need for Strong . . . What We Do in This Talk Skew Normal . . . Analysis of the Problem Scale Invariance Main Result Discussion Proof Home Page Title Page ◭◭ ◮◮ ◭ ◮ Page 1 of 18 Go Back Full Screen Close Quit

Why Cannot We Have a Strongly Consistent Family of Skew Normal (and Higher Order) Distributions

Thongchai Dumrongpokaphan (1) and Vladik Kreinovich (2)

(1) Department of Mathematics, Faculty of Science, Chiang Mai University, Thailand, tcd43@hotmail.com

(2) Department of Computer Science, University of Texas at El Paso, 500 W. University, El Paso, Texas 79968, USA, vladik@utep.edu

SLIDE 2

1. Formulation of the Problem

Often, the only information that we have about a probability distribution is its first few moments.

Many statistical techniques require us to select a single distribution.

It is therefore desirable to select, out of all possible distributions with these moments, a single “most representative” one.

When we know the first two moments, a natural idea is to select a normal distribution.

This selection is strongly consistent in the following sense: if a random variable is a sum of several independent ones, and we select a normal distribution for each of them, then the sum is also normally distributed.
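
The strong consistency of the normal selection can be checked numerically. The following is a minimal sketch, not part of the original slides: it approximates the density of a sum of two independent normal variables by a convolution integral and compares it with the normal density that has the summed mean and variance. The helper names (`normal_pdf`, `convolve_at`), the parameter values, and the integration grid are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def convolve_at(x, pdf1, pdf2, lo=-30.0, hi=30.0, steps=6000):
    """Density of X1 + X2 at x for independent X1, X2 (midpoint rule on [lo, hi])."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h
        total += pdf1(t) * pdf2(x - t)
    return total * h

# Illustrative (assumed) means and variances of the two terms
mu1, v1 = 1.0, 4.0
mu2, v2 = -2.0, 2.25

max_gap = 0.0
for x in [-6.0, -3.0, -1.0, 0.0, 2.0, 5.0]:
    via_sum = convolve_at(x,
                          lambda t: normal_pdf(t, mu1, v1),
                          lambda t: normal_pdf(t, mu2, v2))
    direct = normal_pdf(x, mu1 + mu2, v1 + v2)
    max_gap = max(max_gap, abs(via_sum - direct))

# The gap should be at the level of rounding error
print(max_gap)
```

Selecting a normal distribution for each term and then adding gives the same density as selecting a normal distribution for the sum directly, which is exactly the property that the rest of the talk tries to extend to three moments.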

SLIDE 3

2. Need for Strong Consistency

Often, the random variable of interest has several components.

For example, an overall income consists of salaries, pensions, unemployment benefits, interest, etc.

Each of these categories, in turn, can be subdivided into more subcategories.

If, for each of these categories, we know only the first few moments, then we can apply the selection either to the overall sum or separately to each term.

It seems reasonable to require that the resulting distribution for the overall sum be the same in both cases.

SLIDE 4

3. What We Do in This Talk

When we know three moments, there is also a widely used selection: a skew-normal distribution.

However, this selection is not strongly consistent in the above sense.

In this talk, we show that this absence of strong consistency is not a fault of a specific selection but a general feature of the problem: namely, for third and higher order moments, no strongly consistent selection is possible (under the natural scale-invariance requirement formulated later in the talk).
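
The lack of strong consistency of the skew-normal family can be illustrated with a short computation. This is a sketch under stated assumptions, not the authors' argument: it uses the standard closed-form skewness and excess kurtosis of the skew normal distribution (the kurtosis formula does not appear in the slides), and the shape value `alpha = 3.0` is an arbitrary illustrative choice. If the sum of two independent identical skew-normal variables were again skew normal, its excess kurtosis would have to equal the kurtosis of the skew normal with the same skewness; the code shows that the two values differ.

```python
import math

B2 = 2.0 / math.pi  # the constant 2/pi from the skew normal moment formulas

def shape_ratio(alpha):
    """u = (2 delta^2/pi) / (1 - 2 delta^2/pi), where delta = alpha / sqrt(1 + alpha^2)."""
    d2 = alpha * alpha / (1.0 + alpha * alpha)
    return B2 * d2 / (1.0 - B2 * d2)

def skewness_from_u(u):
    # Standard skew normal skewness: (4 - pi)/2 * u^(3/2)
    return 0.5 * (4.0 - math.pi) * u ** 1.5

def excess_kurtosis_from_u(u):
    # Standard skew normal excess kurtosis: 2 * (pi - 3) * u^2 (assumed, not from the slides)
    return 2.0 * (math.pi - 3.0) * u ** 2

alpha = 3.0  # illustrative shape parameter
u = shape_ratio(alpha)
skew = skewness_from_u(u)
kurt = excess_kurtosis_from_u(u)

# For the sum of two independent copies, all cumulants double, so
# skewness = 2*k3 / (2*k2)^(3/2) = skew / sqrt(2) and excess kurtosis = 2*k4 / (2*k2)^2 = kurt / 2.
skew_sum = skew / math.sqrt(2.0)
kurt_sum = kurt / 2.0

# The skew normal distribution with the same skewness as the sum:
u_match = (2.0 * skew_sum / (4.0 - math.pi)) ** (2.0 / 3.0)
kurt_match = excess_kurtosis_from_u(u_match)

print("excess kurtosis of the actual sum:        ", kurt_sum)
print("excess kurtosis of the matched skew normal:", kurt_match)
```

The two printed values differ for every nonzero `alpha`, since matching the skewness scales `u` by 2^(-1/3), so the matched kurtosis is scaled by 2^(-2/3) rather than by 1/2.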

SLIDE 5

4. Skew Normal Distributions

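In addition to the first two moments $\mu$ and $M_2$, we may also know the third moment $M_3$.

This can be described by the mean $\mu$, the variance $V = \sigma^2$, and the third central moment $m_3 \stackrel{\text{def}}{=} E[(X - \mu)^3]$.

There is a widely used selection, called skew normal:

$$\rho(x) = \frac{2}{\omega} \cdot \varphi\left(\frac{x - \eta}{\omega}\right) \cdot \Phi\left(\alpha \cdot \frac{x - \eta}{\omega}\right),$$

where $\varphi(x) = \frac{1}{\sqrt{2\pi}} \cdot \exp\left(-\frac{x^2}{2}\right)$ and $\Phi(x) = \int_{-\infty}^{x} \varphi(t)\, dt$.

Here, $\mu = \eta + \omega \cdot \delta \cdot \sqrt{\frac{2}{\pi}}$, where $\delta \stackrel{\text{def}}{=} \frac{\alpha}{\sqrt{1 + \alpha^2}}$, $\sigma^2 = \omega^2 \cdot \left(1 - \frac{2\delta^2}{\pi}\right)$, and

$$m_3 = \frac{4 - \pi}{2} \cdot \sigma^3 \cdot \frac{\left(\delta \cdot \sqrt{2/\pi}\right)^3}{\left(1 - 2\delta^2/\pi\right)^{3/2}}.$$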
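
As a sanity check of the formulas above, the closed-form mean, variance, and third central moment can be compared with direct numerical integration of the skew normal density. This is a minimal sketch using only the Python standard library; the function names, the parameter values (eta = 1, omega = 2, alpha = 3), and the integration grid are illustrative assumptions.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def skew_normal_pdf(x, eta, omega, alpha):
    z = (x - eta) / omega
    return 2.0 / omega * phi(z) * Phi(alpha * z)

def closed_form_moments(eta, omega, alpha):
    """Mean, variance, and third central moment from the closed-form expressions."""
    delta = alpha / math.sqrt(1.0 + alpha ** 2)
    b = delta * math.sqrt(2.0 / math.pi)
    mean = eta + omega * b
    var = omega ** 2 * (1.0 - b * b)
    sigma = math.sqrt(var)
    m3 = 0.5 * (4.0 - math.pi) * sigma ** 3 * b ** 3 / (1.0 - b * b) ** 1.5
    return mean, var, m3

def numeric_moments(eta, omega, alpha, half_width=12.0, steps=24000):
    """The same three quantities by midpoint-rule integration of the density."""
    lo = eta - half_width * omega
    h = 2.0 * half_width * omega / steps
    xs = [lo + (k + 0.5) * h for k in range(steps)]
    ws = [skew_normal_pdf(x, eta, omega, alpha) * h for x in xs]
    mean = sum(w * x for w, x in zip(ws, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs))
    m3 = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs))
    return mean, var, m3

exact = closed_form_moments(1.0, 2.0, 3.0)
approx = numeric_moments(1.0, 2.0, 3.0)
print("closed form:", exact)
print("numerical:  ", approx)
```

The two triples should agree up to the accuracy of the numerical integration.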

SLIDE 6

5. Analysis of the Problem

We want to assign, to each triple $(\mu, V, m_3)$, a probability distribution $\rho(x, \mu, V, m_3)$.

Let us list the natural properties of this assignment.

Moments are rarely known exactly; we usually know them only with some accuracy.

It is reasonable to require that if the moments change slightly, then $\rho(x, \mu, V, m_3)$ should not change much.

In other words, it is reasonable to require that the function $\rho(x, \mu, V, m_3)$ is continuous.

Comment: in our proof, we will only use the fact that $\rho(x, \mu, V, m_3)$ is measurable.

Strong consistency: if $X_1$ and $X_2$ are independent, $X_1 \sim \rho(x, \mu_1, V_1, m_{31})$, and $X_2 \sim \rho(x, \mu_2, V_2, m_{32})$, then $X_1 + X_2 \sim \rho(x, \mu_1 + \mu_2, V_1 + V_2, m_{31} + m_{32})$.

SLIDE 7

6. Scale Invariance

Numerical values of different quantities depend on the choice of a measuring unit.

E.g., income can be described in Baht or in dollars.

If we change the unit to one that is $\lambda$ times smaller, then the actual incomes will not change, but the numerical values will change: $x \to x' = \lambda \cdot x$.

If we perform the selection in the original units, then we get $\rho(x, \mu, V, m_3)$.

If we simply re-scale $x$ to $x' = \lambda \cdot x$, then for $x'$, we get a new distribution $\rho'(x') = \frac{1}{\lambda} \cdot \rho\left(\frac{x'}{\lambda}, \mu, V, m_3\right)$.

SLIDE 8

7. Scale Invariance (cont-d)

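If we re-scale $\rho(x, \mu, V, m_3)$, we get $\rho'(x') = \frac{1}{\lambda} \cdot \rho\left(\frac{x'}{\lambda}, \mu, V, m_3\right)$.

We should get exactly the same distribution if we make a selection after the re-scaling, i.e., for $\mu' = \lambda \cdot \mu$, $V' = \lambda^2 \cdot V$, and $m_3' = \lambda^3 \cdot m_3$.

In the new units, we get $\rho(x', \lambda \cdot \mu, \lambda^2 \cdot V, \lambda^3 \cdot m_3)$.

A natural requirement is that the resulting selection should be the same:

$$\frac{1}{\lambda} \cdot \rho\left(\frac{x'}{\lambda}, \mu, V, m_3\right) = \rho(x', \lambda \cdot \mu, \lambda^2 \cdot V, \lambda^3 \cdot m_3).$$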
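
To make the scale-invariance requirement concrete, here is a minimal sketch that checks it for the two-moment normal selection, where it is known to hold (the third argument $m_3$ is simply absent). The function names, the values of `mu` and `V`, and the test points are illustrative assumptions.

```python
import math

def normal_selection(x, mu, V):
    """Two-moment selection: the normal density with mean mu and variance V."""
    return math.exp(-(x - mu) ** 2 / (2.0 * V)) / math.sqrt(2.0 * math.pi * V)

def selected_then_rescaled(x_new, lam, mu, V):
    """Select in the original units, then change the unit: (1/lam) * rho(x'/lam, mu, V)."""
    return normal_selection(x_new / lam, mu, V) / lam

def rescaled_then_selected(x_new, lam, mu, V):
    """Change the unit first, then select: rho(x', lam*mu, lam^2 * V)."""
    return normal_selection(x_new, lam * mu, lam ** 2 * V)

mu, V = 2.0, 1.5  # illustrative moments in the original units
worst = max(
    abs(selected_then_rescaled(x_new, lam, mu, V) - rescaled_then_selected(x_new, lam, mu, V))
    for lam in (0.01, 0.5, 3.0, 35.0)
    for x_new in (-4.0, 0.0, 1.5, 10.0)
)
print(worst)  # difference at the level of rounding error
```

Both orders of operations give the same density, so the result of the selection does not depend on whether incomes are expressed in Baht or in dollars.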

SLIDE 9

8. Definitions

We say that a tuple $(\mu, V, m_3)$ is possible if there exists a probability distribution with mean $\mu$, variance $V$, and third central moment $m_3$.

By a 3-selection, we mean a measurable mapping $\rho(x, \mu, V, m_3)$ defined for all possible tuples.

We say that a 3-selection is strongly consistent if, for independent $X_1$ and $X_2$, $X_i \sim \rho(x, \mu_i, V_i, m_{3i})$ implies $X_1 + X_2 \sim \rho(x, \mu_1 + \mu_2, V_1 + V_2, m_{31} + m_{32})$.

We say that a 3-selection is scale-invariant if for every possible tuple $(\mu, V, m_3)$, for every $\lambda > 0$, and for every $x'$:

$$\frac{1}{\lambda} \cdot \rho\left(\frac{x'}{\lambda}, \mu, V, m_3\right) = \rho(x', \lambda \cdot \mu, \lambda^2 \cdot V, \lambda^3 \cdot m_3).$$

SLIDE 10

9. Main Result

Proposition. No 3-selection is strongly consistent and scale-invariant.

A similar result can be formulated for the case when we also know higher order moments.

In this case, instead of the original moments, we can consider cumulants $\kappa_n$.

Cumulants are the coefficients at $\frac{i^n \cdot t^n}{n!}$ in the Taylor expansion of $\ln(E[\exp(i \cdot t \cdot X)])$.

For $n = 1$, $n = 2$, and $n = 3$, we get exactly the mean, the variance, and the third central moment.

Cumulants are additive: if $X = X_1 + X_2$ and $X_1$ and $X_2$ are independent, then $\kappa_n(X) = \kappa_n(X_1) + \kappa_n(X_2)$.
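
The additivity of cumulants can be checked on small discrete distributions. The sketch below is illustrative only: the two distributions `p` and `q` are arbitrary assumptions, and the fourth cumulant is computed by the standard identity $\kappa_4 = \mu_4 - 3V^2$ (with $\mu_4$ the fourth central moment), which is not stated in the slides. It also shows why cumulants rather than central moments are used for higher orders: the fourth central moment is not additive.

```python
from itertools import product

def convolve(p, q):
    """Distribution of X1 + X2 for independent X1 ~ p, X2 ~ q (dicts: value -> probability)."""
    out = {}
    for (a, pa), (b, pb) in product(p.items(), q.items()):
        out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

def central_moment(p, n):
    mean = sum(x * w for x, w in p.items())
    return sum((x - mean) ** n * w for x, w in p.items())

def cumulants(p):
    """First four cumulants: mean, variance, third central moment, mu4 - 3 * variance^2."""
    mean = sum(x * w for x, w in p.items())
    var = central_moment(p, 2)
    return (mean, var, central_moment(p, 3), central_moment(p, 4) - 3.0 * var ** 2)

# Two arbitrary (assumed) discrete distributions
p = {0: 0.7, 1: 0.2, 5: 0.1}
q = {-1: 0.5, 2: 0.25, 3: 0.25}

k_p, k_q, k_sum = cumulants(p), cumulants(q), cumulants(convolve(p, q))
additivity_gap = max(abs(s - (a + b)) for s, a, b in zip(k_sum, k_p, k_q))

# In contrast, the fourth central moment of the sum exceeds the sum of the
# fourth central moments by 6 * Var(X1) * Var(X2).
mu4_gap = central_moment(convolve(p, q), 4) - central_moment(p, 4) - central_moment(q, 4)

print("largest deviation from cumulant additivity:", additivity_gap)
print("excess of the fourth central moment:", mu4_gap)
```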

SLIDE 11

10. Discussion

Since we cannot make a strongly consistent selection, what should we do?

The operations min and max are also natural in many applications; for example, in econometrics, if there are several ways to invest money with the same level of risk, then an investor selects the one that leads to the largest interest rate.

From this viewpoint, it is reasonable to consider minima and maxima of normal variables.

In some cases, these minima and maxima are distributed according to the skew normal distribution.

This may be an additional argument in favor of using these distributions.
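
One such case is easy to verify: for two independent standard normal variables, the maximum has cumulative distribution function $\Phi(x)^2$, whose derivative $2 \cdot \varphi(x) \cdot \Phi(x)$ is the skew normal density with $\eta = 0$, $\omega = 1$, and $\alpha = 1$. The sketch below checks this numerically; the function names and the evaluation points are illustrative assumptions, and the slides do not specify which cases the authors have in mind.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def skew_normal_pdf(x, eta=0.0, omega=1.0, alpha=1.0):
    z = (x - eta) / omega
    return 2.0 / omega * phi(z) * Phi(alpha * z)

def max_cdf(x):
    """P(max(Z1, Z2) <= x) = P(Z1 <= x) * P(Z2 <= x) for independent standard normal Z1, Z2."""
    return Phi(x) ** 2

def max_pdf_numeric(x, h=1e-5):
    """Density of the maximum via a central finite difference of its CDF."""
    return (max_cdf(x + h) - max_cdf(x - h)) / (2.0 * h)

gap = max(abs(max_pdf_numeric(x) - skew_normal_pdf(x)) for x in (-3.0, -1.0, 0.0, 0.7, 2.5))
print(gap)  # small: limited only by the finite-difference step
```

By symmetry, the minimum of the same two variables has density $2 \cdot \varphi(x) \cdot \Phi(-x)$, i.e., the skew normal density with $\alpha = -1$.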

SLIDE 12

11. Proof

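For sums of independent random variables $X = X_1 + X_2$, it is convenient to use characteristic functions $\chi_X(\omega) \stackrel{\text{def}}{=} E[\exp(i \cdot \omega \cdot X)]$, for which $\chi_X(\omega) = \chi_{X_1}(\omega) \cdot \chi_{X_2}(\omega)$.

For characteristic functions $\chi(\omega, \mu, V, m_3)$, strong consistency takes the form

$$\chi(\omega, \mu_1 + \mu_2, V_1 + V_2, m_{31} + m_{32}) = \chi(\omega, \mu_1, V_1, m_{31}) \cdot \chi(\omega, \mu_2, V_2, m_{32}).$$

This requirement becomes even simpler if we take the logarithm of both sides: for $\ell \stackrel{\text{def}}{=} \ln(\chi)$,

$$\ell(\omega, \mu_1 + \mu_2, V_1 + V_2, m_{31} + m_{32}) = \ell(\omega, \mu_1, V_1, m_{31}) + \ell(\omega, \mu_2, V_2, m_{32}).$$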
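
The product rule for characteristic functions, on which the proof relies, can be illustrated with two small discrete distributions. This is a minimal sketch; the distributions (a fair die and a biased coin), the function names, and the test frequencies are illustrative assumptions.

```python
import cmath
from itertools import product

def char_fn(p, w):
    """Characteristic function E[exp(i * w * X)] of a discrete distribution p (value -> probability)."""
    return sum(prob * cmath.exp(1j * w * x) for x, prob in p.items())

def convolve(p, q):
    """Distribution of the sum of independent X1 ~ p and X2 ~ q."""
    out = {}
    for (a, pa), (b, pb) in product(p.items(), q.items()):
        out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

die = {k: 1.0 / 6.0 for k in range(1, 7)}  # fair die
coin = {0: 0.3, 1: 0.7}                    # biased coin
total = convolve(die, coin)

gap = max(
    abs(char_fn(total, w) - char_fn(die, w) * char_fn(coin, w))
    for w in (-2.0, 0.3, 1.0, 4.5)
)
print(gap)  # rounding-error level
```

Taking logarithms turns this product into a sum, which is what makes the functional equation for $\ell$ additive.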

SLIDE 13

12. Proof (cont-d)

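It is known that the only measurable functions with this additivity property are linear functions, so $\ell(\omega, \mu, V, m_3) = \mu \cdot \ell_1(\omega) + V \cdot \ell_2(\omega) + m_3 \cdot \ell_3(\omega)$ for some $\ell_i(\omega)$.

Let us now use the scale invariance requirement.

When we replace $x$ with $x' = \lambda \cdot x$, then $\chi_{X'}(\omega) = \chi_X(\lambda \cdot \omega)$, since $E[\exp(i \cdot \omega \cdot \lambda \cdot X)] = \chi_X(\lambda \cdot \omega)$.

Thus, the re-scaled function $\chi(\lambda \cdot \omega, \mu, V, m_3)$ should be equal to what we get from the re-scaled moments, i.e., to $\chi(\omega, \lambda \cdot \mu, \lambda^2 \cdot V, \lambda^3 \cdot m_3)$:

$$\chi(\lambda \cdot \omega, \mu, V, m_3) = \chi(\omega, \lambda \cdot \mu, \lambda^2 \cdot V, \lambda^3 \cdot m_3).$$

Their logarithms should also be equal:

$$\ell(\lambda \cdot \omega, \mu, V, m_3) = \ell(\omega, \lambda \cdot \mu, \lambda^2 \cdot V, \lambda^3 \cdot m_3).$$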

SLIDE 14

13. Proof (cont-d)

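Substituting the above linear expression for the function $\ell(\omega, \mu, V, m_3)$ into this equality, we conclude that

$$\mu \cdot \ell_1(\lambda \cdot \omega) + V \cdot \ell_2(\lambda \cdot \omega) + m_3 \cdot \ell_3(\lambda \cdot \omega) = \lambda \cdot \mu \cdot \ell_1(\omega) + \lambda^2 \cdot V \cdot \ell_2(\omega) + \lambda^3 \cdot m_3 \cdot \ell_3(\omega).$$

This equality must hold for all possible triples $(\mu, V, m_3)$.

Thus, the coefficients at $\mu$, $V$, and $m_3$ on both sides must coincide.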

SLIDE 15

14. Proof (final)

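By equating coefficients at $\mu$, we conclude that $\ell_1(\lambda \cdot \omega) = \lambda \cdot \ell_1(\omega)$.

In particular, for $\omega = 1$, we conclude that $\ell_1(\lambda) = \lambda \cdot \ell_1(1)$, i.e., that $\ell_1(\omega) = c_1 \cdot \omega$ for some constant $c_1$.

By equating coefficients at $V$ and $m_3$, we similarly get $\ell_2(\omega) = c_2 \cdot \omega^2$ and $\ell_3(\omega) = c_3 \cdot \omega^3$.

Thus, $\ell(\omega, \mu, V, m_3) = c_1 \cdot \mu \cdot \omega + c_2 \cdot V \cdot \omega^2 + c_3 \cdot m_3 \cdot \omega^3$, and

$$\chi(\omega, \mu, V, m_3) = \exp(c_1 \cdot \mu \cdot \omega + c_2 \cdot V \cdot \omega^2 + c_3 \cdot m_3 \cdot \omega^3).$$

However, the Fourier transform of the above expression is, in general, not an everywhere non-negative function.

Thus, it cannot serve as a probability density function.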
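
The last step can be illustrated numerically. The sketch below makes several assumptions that are not in the slides: it takes the constants $c_1 = i$, $c_2 = -1/2$, $c_3 = -i/6$ that match the cumulant expansion, sets $\mu = 0$, $V = 1$, $m_3 = 2$ as illustrative values, and inverts the Fourier transform by a truncated midpoint rule. With these choices, the would-be density is $\frac{1}{\pi} \int_0^\infty \exp(-V \omega^2/2) \cdot \cos(\omega x + m_3 \omega^3/6)\, d\omega$, and the code scans a grid of $x$ values for the smallest value.

```python
import math

def inverse_fourier(x, V, m3, w_max=10.0, steps=4000):
    """(1/pi) * integral over w in [0, w_max] of exp(-V w^2/2) * cos(w x + m3 w^3/6).

    This is the inverse Fourier transform of exp(-V w^2/2 - i m3 w^3/6) (assumed constants),
    truncated at w_max, where the Gaussian factor is negligible.
    """
    h = w_max / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * h
        total += math.exp(-0.5 * V * w * w) * math.cos(w * x + m3 * w ** 3 / 6.0)
    return total * h / math.pi

V, m3 = 1.0, 2.0  # illustrative values
grid = [-8.0 + 0.1 * k for k in range(161)]
values = [inverse_fourier(x, V, m3) for x in grid]

lowest = min(values)
where = grid[values.index(lowest)]
print("smallest value", lowest, "attained near x =", where)
```

For these values, the scan should find a clearly negative minimum in the left tail, so the function cannot be a probability density; with `m3 = 0` the same routine reproduces the standard normal density.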

SLIDE 16

15. Comment

If we only consider two moments, then the above proof leads to the characteristic function $\chi(\omega, \mu, V) = \exp(c_1 \cdot \mu \cdot \omega + c_2 \cdot V \cdot \omega^2)$.

This characteristic function describes the Gaussian distribution.
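
The constants can be pinned down by a short worked step. This is a sketch under the assumption, implicit in the definition of a selection, that the selected distribution indeed has mean $\mu$ and variance $V$; comparing the second-order Taylor expansion of the logarithm of a characteristic function with the form obtained in the proof gives:

```latex
\begin{align*}
\ln \chi(\omega) &= i \cdot \mu \cdot \omega - \frac{V \cdot \omega^2}{2} + o(\omega^2)
  && \text{(any distribution with mean $\mu$ and variance $V$)}, \\
\ln \chi(\omega, \mu, V) &= c_1 \cdot \mu \cdot \omega + c_2 \cdot V \cdot \omega^2
  && \text{(form obtained in the proof)}, \\
\Rightarrow \quad c_1 &= i, \quad c_2 = -\tfrac{1}{2}, \quad
\chi(\omega, \mu, V) = \exp\left(i \cdot \mu \cdot \omega - \frac{V \cdot \omega^2}{2}\right).
\end{align*}
```

The last expression is the characteristic function of the normal distribution with mean $\mu$ and variance $V$.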

Thus, we have, in effect, proven the following auxiliary result.

SLIDE 17

16. Auxiliary Definitions

By a 2-selection, we mean a measurable mapping $\rho(x, \mu, V)$ defined for all possible tuples.

We say that a 2-selection is strongly consistent if, for independent $X_1$ and $X_2$, $X_i \sim \rho(x, \mu_i, V_i)$ implies $X_1 + X_2 \sim \rho(x, \mu_1 + \mu_2, V_1 + V_2)$.

We say that a 2-selection is scale-invariant if for every possible tuple $(\mu, V)$, for every $\lambda > 0$, and for every $x'$:

$$\frac{1}{\lambda} \cdot \rho\left(\frac{x'}{\lambda}, \mu, V\right) = \rho(x', \lambda \cdot \mu, \lambda^2 \cdot V).$$

Proposition. Every strongly consistent and scale-invariant 2-selection assigns, to each possible tuple $(\mu, V)$, the Gaussian distribution with mean $\mu$ and variance $V$.

SLIDE 18

17. Acknowledgments

This work was supported in part by the National Science Foundation grants HRD-0734825 and HRD-1242122 (Cyber-ShARE Center of Excellence) and DUE-0926721, and by an award from Prudential Foundation.