Sampling Distribution Of The Variance - pierre.douillet@ensait.fr


SLIDE 1

WSC 2009

Sampling Distribution Of The Variance

pierre.douillet@ensait.fr

École Nationale Supérieure des Arts et Industries Textiles Roubaix, France

SLIDE 2

founded 1881

SLIDE 3

www.douillet.info WSC 2009

⇒ • Well-Known Results (slide 4): notations; shape; chi-square statistic; batch mean method
  • Closed Form Results for m2 (slide 9)
  • Experimental Results (slide 14)
  • Variations of the Sample Variance (slide 19)
  • Useful and Useless Statistics (slide 25)
  • Conclusions (slide 29)

Ensait - Roubaix - France 3

SLIDE 4

Well-Known Results

notations

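  • random variable $\xi \in \Omega$ with pdf $\varphi(\xi)$
  • sample of size $n$: $\omega \in \Phi \doteq \Omega^n$, where the $x_i \in \omega$ are i.i.d.
  • population: $\mu = E(\xi)$, $\mu_2 = \sigma^2 = \mathrm{var}(\xi)$, $\mu_4 = E\left((\xi-\mu)^4\right)$
  • sample: $m = E_\omega(x)$, $m_2 = s^2$, $m_4 = \frac{n}{n-1}\,E_\omega\left((x-m)^4\right)$
  • $E_\Phi(f)$, $\mathrm{var}_\Phi(f)$ refer to the Φ-distribution of some $f(\omega)$, using the product measure.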

SLIDE 5

shape

  • mean, variance, shape (everything else)
  • centered moments of increasing order depend more and more on rare events
  • Fisher's skewness is $\gamma_1 \doteq E\left((\xi-\mu)^3\right)/\sigma^3$
  • usual values: $\gamma_1(\text{gauss}) = 0$, $\gamma_1(\chi^2_\nu) = \sqrt{8/\nu}$, $\gamma_1(\text{exp.}) = 2$
  • not bounded (e.g. lognormal)

SLIDE 6

chi-square statistic

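  • $A_0, A_1, \cdots, A_\nu$: partition of $\Omega$, with $\forall j$: $p_j \doteq \Pr(\xi \in A_j) > 0$.
  • For a sample $\omega$, $n_j$ is the number of $x_i$ that belong to $A_j$, and

$$\chi^2_{Pearson}(\omega) = \sum_{j=0}^{\nu} \frac{(n\,p_j - n_j)^2}{n\,p_j}$$

  • without any other assumption, $E_\Phi\left(\chi^2_{Pearson}(\omega)\right) = \nu$ and

$$\mathrm{var}_\Phi\left(\chi^2_{Pearson}(\omega)\right) = 2\nu\,\frac{n-1}{n} + \frac{1}{n}\sum_{j=0}^{\nu}\left(\frac{1}{p_j} - \nu - 1\right)$$

  • $\chi^2_{std} \doteq \left(\chi^2_{Pearson} - \nu\right)/\sqrt{2\nu}$, even when the $\chi^2_{Pearson}$ statistic is not $\chi^2_\nu$ distributed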
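The three quantities of this slide can be sketched in a few lines of Python. This is only an illustration of the formulas as reconstructed above; the function names are hypothetical and do not come from the presentation.

```python
import math
from fractions import Fraction

def pearson_chi2(counts, probs):
    """Pearson statistic: sum over the cells of (n p_j - n_j)^2 / (n p_j)."""
    n = sum(counts)
    return sum((n * p - c) ** 2 / (n * p) for c, p in zip(counts, probs))

def pearson_variance(n, probs):
    """Variance of the Pearson statistic for nu + 1 cells, as given on the slide."""
    nu = len(probs) - 1
    return 2 * nu * Fraction(n - 1, n) + sum(1 / p - nu - 1 for p in probs) / n

def chi2_std(chi2, nu):
    """Standardized statistic (chi2 - nu) / sqrt(2 nu)."""
    return (chi2 - nu) / math.sqrt(2 * nu)
```

For equiprobable cells every term of the sum vanishes, so the variance reduces to 2ν(n − 1)/n. As a plausibility check, `chi2_std(25.10, 36)` is close to the −1.28 quoted on the normal-distribution slide; the value ν = 36 is an assumption here, since the number of cells is not stated.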

SLIDE 7

batch mean method

  • each result has been obtained with N = 200000 replications of the n-sized sample
  • batching keeps rounding errors contained and allows parallelization (with a suitable random generator)
  • batching also gives an estimate of the sd of the estimators (and a check for independence)
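A minimal sketch of the batch mean idea, assuming that replications are simply grouped into equal-sized batches whose means are then averaged. The batch sizes, the seed and the helper names are illustrative choices, not those of the presentation.

```python
import math
import random

def sample_variance(xs):
    """Sample variance m2 = s^2 with the n - 1 denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def batch_means(statistic, n_batches, batch_size, rng):
    """Average `statistic(rng)` inside each batch, then across batches.

    Returns (estimate, estimated sd of the estimate)."""
    means = []
    for _ in range(n_batches):
        means.append(sum(statistic(rng) for _ in range(batch_size)) / batch_size)
    estimate = sum(means) / n_batches
    spread = sum((b - estimate) ** 2 for b in means) / (n_batches - 1)
    return estimate, math.sqrt(spread / n_batches)

def m2_uniform(rng, n=5, a=10.0):
    """One replication: m2 of an n-sized sample, uniform on [-a, a]."""
    return sample_variance([rng.uniform(-a, a) for _ in range(n)])
```

Calling `batch_means(m2_uniform, 50, 400, random.Random(2009))` gives an estimate of E(m2) together with its standard deviation. Each batch sum stays short, which limits the accumulation of rounding errors, and batches could be computed on separate workers provided each one has an independent random stream.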

SLIDE 8

✓ • Well-Known Results (slide 4)
⇒ • Closed Form Results for m2 (slide 9): normal distribution; normal law behaves abnormally; n=2, n=3; n=2, n=3, R-uniform −a ≤ x ≤ a
  • Experimental Results (slide 14)
  • Variations of the Sample Variance (slide 19)
  • Useful and Useless Statistics (slide 25)
  • Conclusions (slide 29)

SLIDE 9

Closed Form Results for m2

normal distribution

  • 200 000 samples (n = 8)
  • plot all the m2 (ω)
  • well known model: $\chi^2_7$
  • goodness of fit: $\chi^2_{Pearson} = 25.10$, $\chi^2_{std} = -1.28$

[Figure: histogram of m2; legend: obs, nor, chi; ⊕ = observed, solid = chi2(7)]

SLIDE 10

normal law behaves abnormally

  • Random variates m and m2 are fully independent if and only if the sampled population Ω is normal. In such a case, (n − 1) m2/µ2 is $\chi^2_{n-1}$ distributed.
  • Most of the time, this is stated in the "Gaussian distribution" chapter of statistics books
  • It is almost never recalled in the "χ2" chapter...
  • full independence is the key property for χ2
  • χ2 is not a model, not even an approximate model, for the sample variance when Ω is not Gaussian.
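One way to see the last point numerically is to compare the exact variance of (n − 1) m2/µ2 with the χ2 value 2(n − 1). The sketch below relies on the classical formula var(m2) = (µ4 − (n − 3)/(n − 1) µ2²)/n, which is not stated on the slides; the function names are hypothetical.

```python
from fractions import Fraction

def var_scaled_m2(n, kurtosis):
    """Exact variance of (n - 1) m2 / mu2, where kurtosis = mu4 / mu2^2."""
    var_m2_over_mu2_squared = (kurtosis - Fraction(n - 3, n - 1)) / n
    return (n - 1) ** 2 * var_m2_over_mu2_squared

def var_chi2(n):
    """Variance of a chi-square variate with n - 1 degrees of freedom."""
    return 2 * (n - 1)
```

With a kurtosis of 3 (normal population) both functions agree for every n. With a kurtosis of 9/5 (continuous uniform population) and n = 8, `var_scaled_m2` is less than half of `var_chi2(8)`, so the χ2 model already fails at the level of the second moment.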

SLIDE 11

n=2, n=3

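  • Very special situations, excluded from the forthcoming general formulae
  • A direct attack leads to (with $s = \sqrt{m_2}$):

$$\mathrm{pdf}_2(m_2) = \sqrt{\frac{2}{m_2}} \int_{\mathbb{R}} \varphi(t)\,\varphi\left(t + \sqrt{2\,m_2}\right) dt$$

$$\mathrm{pdf}_3(m_2) = \int_{t=0}^{t=s} \frac{4\sqrt{3}}{\sqrt{m_2 - t^2}} \int_{\mathbb{R}} \varphi(u-t)\,\varphi(u+t)\,\varphi\left(u + \sqrt{3\,m_2 - 3\,t^2}\right) du\,dt$$

  • Applied to a Gaussian distribution, leads back to $\chi^2_1$ and $\chi^2_2$
  • m and m2 are linearly but not fully independent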

SLIDE 12

n=2, n=3, R-uniform −a ≤ x ≤ a

  • n = 2, $0 \le m_2 \le 2a^2$:

$$\mathrm{pdf}_2(m_2) = \frac{1}{a\sqrt{2\,m_2}} - \frac{1}{2a^2}$$

  • n = 3, $0 \le s^2 = m_2 \le 4a^2/3$ and

$$\mathrm{pdf}_3(m_2) = \frac{3\sqrt{3}}{a^2}\left(\frac{\pi}{6} - \frac{s}{2a}\right) \qquad (0 < s < a)$$

$$\mathrm{pdf}_3(m_2) = \frac{3\sqrt{3}}{a^2}\left(\arcsin\frac{a}{s} - \frac{\pi}{3} - \frac{s}{2a} + \sqrt{\frac{s^2}{a^2} - 1}\right) \qquad (a < s)$$

  • the sample belongs to a cube; we have to measure the set of all the ω that share the same value of m2; the shape, and therefore the description, changes when ω travels from center (hexagon) to corner (triangle).
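A quick sanity check of these closed forms is that each density integrates to 1 over its support. The sketch below does this with a plain midpoint rule in the variable s (so that the 1/√m2 singularity at the origin disappears); the function names and the step count are illustrative assumptions.

```python
import math

def pdf2_uniform(m2, a):
    """n = 2: density of m2 on 0 < m2 <= 2 a^2."""
    return 1.0 / (a * math.sqrt(2.0 * m2)) - 1.0 / (2.0 * a * a)

def pdf3_uniform(m2, a):
    """n = 3: density of m2 on 0 <= m2 <= 4 a^2 / 3, with s = sqrt(m2)."""
    s = math.sqrt(m2)
    scale = 3.0 * math.sqrt(3.0) / (a * a)
    if s <= a:
        return scale * (math.pi / 6.0 - s / (2.0 * a))
    return scale * (math.asin(a / s) - math.pi / 3.0 - s / (2.0 * a)
                    + math.sqrt(s * s / (a * a) - 1.0))

def total_mass(pdf, a, m2_max, steps=20000):
    """Midpoint rule in the variable s, using m2 = s^2 and d(m2) = 2 s ds."""
    h = math.sqrt(m2_max) / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += pdf(s * s, a) * 2.0 * s * h
    return total
```

For a = 10, `total_mass(pdf2_uniform, 10.0, 200.0)` and `total_mass(pdf3_uniform, 10.0, 400.0 / 3.0)` should both be close to 1. Most of the n = 3 mass comes from the branch s < a (the hexagon regime); the branch a < s only contributes about one percent.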

SLIDE 13

✓ • Well-Known Results (slide 4)
✓ • Closed Form Results for m2 (slide 9)
⇒ • Experimental Results (slide 14): Z-uniform; R-uniform; lognormal; Student-like t-statistic
  • Variations of the Sample Variance (slide 19)
  • Useful and Useless Statistics (slide 25)
  • Conclusions (slide 29)

SLIDE 14

Experimental Results

Z-uniform

[Figure, two panels: distribution of m2 for a Z-uniform population, a = 10, n = 5, # = 617; legend: obs, nor, chi]

neither chi2 nor normal

SLIDE 15

R-uniform

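[Figure, panel 1: distribution of m2, a = 10, n = 5; legend: obs, nor, chi] γ1 ≈ 0.40 (the $\chi^2_4$ value would be $\sqrt{8/4} = 1.41$)

[Figure, panel 2: distribution of m2, a = 10, n = 8; legend: obs, nor, chi] quite normal, γ1 ≈ 0.27 (the $\chi^2_7$ value would be $\sqrt{8/7} = 1.07$)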

SLIDE 16

lognormal

ln M = E (ln ξ), ln K = var (ln ξ); M = 7, K = 2, n = 8

[Figure, panel 1: distribution of m2, usual scale, γ1 ≈ 39; legend: obs, nor, chi]

[Figure, panel 2: distribution of m2, log scale, γ1 ≈ 0.05; legend: obs, nor, chi]

SLIDE 17

Student-like t-statistic

[Figure, panel 1: R-uniform, n = 5, t = (m − µ) /s, horizontal axis from −4 to 4; legend: obs, nor, stu] tail ≈ Student

[Figure, panel 2: lognormal, n = 8, horizontal axis from −4 to 4; legend: obs, nor, stu] t very skew, far away from models

SLIDE 18

✓ • Well-Known Results (slide 4)
✓ • Closed Form Results for m2 (slide 9)
✓ • Experimental Results (slide 14)
⇒ • Variations of the Sample Variance (slide 19): expectation of products; experimental values of theoretical formulae; new results and proof of correctness; some applications
  • Useful and Useless Statistics (slide 25)
  • Conclusions (slide 29)

SLIDE 19

Variations of the Sample Variance

expectation of products

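  • estimation of monomials $\alpha \doteq \prod_j \mu_j^{\alpha_j}$ relative to $\Omega$, using monomials $\beta \doteq \prod_k m_k^{\beta_k}$ relative to $\omega$.
  • degrees: $\mathrm{dgm}\,\beta \doteq \sum_k \beta_k$, the number of $m_k$ occurring in $\beta$; $\mathrm{dgx}\,\beta \doteq \sum_k k\,\beta_k$, the number of factors $x_i$ occurring in $\beta$.
  • $E_\Phi(\beta) \in \mathrm{Span}\{\alpha \mid \mathrm{dgx}\,\alpha = \mathrm{dgx}\,\beta\}$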

SLIDE 20

experimental values of theoretical formulae

"Science is what we understand well enough to explain to a computer. Art is everything else we do." (Knuth)

  • for each n in [2, N], expand β as a polynomial in the $x_1 \cdots x_n$
  • substitute each $x_i^j$ (j > 1) by $\mu_j$, and then each $x_i$ by 0
  • for each n, obtain a polynomial $P_n = \sum_\alpha c(n, \alpha) \times \alpha$, where $c(n, \alpha) \in \mathbb{Q}$
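The three steps above can be sketched with exact rational arithmetic. This is a small illustration, not the author's program: polynomials are stored as dictionaries from exponent tuples to `Fraction` coefficients, and all function names are hypothetical. Replacing a lone $x_i$ by 0 amounts to assuming a centered variable, which is harmless because the $m_k$ are translation invariant.

```python
from collections import defaultdict
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(Fraction)
    for ea, ca in p.items():
        for eb, cb in q.items():
            out[tuple(x + y for x, y in zip(ea, eb))] += ca * cb
    return dict(out)

def central_moment_poly(n, k):
    """sum_i (x_i - m)^k / (n - 1) as a polynomial in x_1..x_n (k = 2 gives m2)."""
    total = defaultdict(Fraction)
    for i in range(n):
        # x_i - m = sum_j (delta_ij - 1/n) x_j
        dev = {tuple(int(t == j) for t in range(n)):
               Fraction(int(i == j)) - Fraction(1, n) for j in range(n)}
        power = dev
        for _ in range(k - 1):
            power = poly_mul(power, dev)
        for exps, c in power.items():
            total[exps] += c / (n - 1)
    return dict(total)

def expectation(poly):
    """Substitute x_i^j -> mu_j (j > 1) and x_i -> 0; returns {alpha: c(n, alpha)}."""
    out = defaultdict(Fraction)
    for exps, c in poly.items():
        if 1 in exps:  # a lone factor x_i has zero mean for a centered variable
            continue
        alpha = tuple(sorted((e for e in exps if e > 1), reverse=True))
        out[alpha] += c
    return {a: c for a, c in out.items() if c != 0}
```

For instance, with `m2 = central_moment_poly(4, 2)`, the call `expectation(poly_mul(m2, m2))` returns the coefficients of $\mu_4$ (key `(4,)`) and $\mu_2^2$ (key `(2, 2)`) in $E_\Phi(m_2^2)$ for n = 4. Repeating this for several n yields the list of values from which a closed form in n is then guessed.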

SLIDE 21

experimental values of theoretical formulae (2)

  • each c (n, α) has a closed form, a quotient of polynomials in n whose degrees cannot exceed dgx β
  • general algorithm AeqB, implemented as gfun (Maple)
  • each denominator is a divisor of $n^p (n-1)^q$ where p + q + 2 = dgx β and q + 1 = dgm β.
  • closed form of the polynomial numerator from a list of values: divided differences (Newton)
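The last step can be illustrated with Newton's divided differences in exact arithmetic. The sample values below are an assumption made for the example: they are the numerators $n(n-1)\,c(n, \mu_2^2)$ of the coefficient of $\mu_2^2$ in $E_\Phi(m_2^2)$, taken from the classical formula for the variance of $s^2$, for n = 2, 3, 4, 5.

```python
from fractions import Fraction

def divided_differences(xs, ys):
    """Newton coefficients f[x0], f[x0,x1], ... computed in place."""
    coef = [Fraction(y) for y in ys]
    for level in range(1, len(xs)):
        for i in range(len(xs) - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form by Horner's scheme."""
    result = Fraction(0)
    for c, xk in zip(reversed(coef), reversed(xs)):
        result = result * (x - xk) + c
    return result

ns = [2, 3, 4, 5]
numerators = [3, 6, 11, 18]  # assumed values of n (n - 1) c(n, mu_2^2)
coef = divided_differences(ns, numerators)
```

Here the third-order difference vanishes, which signals a polynomial of degree 2; expanding the Newton form gives $n^2 - 2n + 3$, and `newton_eval(ns, coef, n)` evaluates it at any other n. A vanishing tail of divided differences is the practical stopping criterion when the degree bound dgx β is not tight.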

SLIDE 22

new results and proof of correctness

  • Fisher (1929) started the process.
  • n = 11 now, n = 12 soon after Xmas (?)
  • Error prone process...
  • Test: the determinant of all the β over all the α of same dgx splits into linear factors.

$$\Delta_4 = \frac{(n-2)(n-3)}{n\,(n-1)}$$

$$\Delta_{11} = \frac{(n-2)^{14}(n-3)^{12}(n-4)^{10}(n-5)^{7}(n-6)^{5}(n-7)^{3}(n-8)^{2}(n-9)(n-10)}{n^{28}\,(n-1)^{27}}$$
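As a worked check of the smallest case, assume that $\Delta_4$ is the 2×2 determinant of the coefficients of $E_\Phi(m_4)$ and $E_\Phi(m_2^2)$ on $(\mu_4, \mu_2^2)$. The ordering of rows and columns is an assumption, and the two expectations are the classical ones, written with the $m_4$ of the notations slide:

```latex
\begin{aligned}
E_\Phi(m_4) &= \frac{n^2-3n+3}{n^2}\,\mu_4 + \frac{3(2n-3)}{n^2}\,\mu_2^2,\\
E_\Phi(m_2^2) &= \frac{1}{n}\,\mu_4 + \frac{n^2-2n+3}{n(n-1)}\,\mu_2^2,\\
\Delta_4 &= \frac{(n^2-3n+3)(n^2-2n+3) - 3(2n-3)(n-1)}{n^3(n-1)}
          = \frac{n^2(n-2)(n-3)}{n^3(n-1)}
          = \frac{(n-2)(n-3)}{n(n-1)}.
\end{aligned}
```

The numerator expands to $n^4 - 5n^3 + 6n^2$, so the determinant does split into linear factors, in agreement with the value quoted above.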

SLIDE 23

some applications

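  • define

$$V \doteq \frac{1}{(n-2)(n-3)} \left( \sum_i (x_i - m)^4 - \frac{n^2-3}{n}\, m_2^2 \right)$$

  • then $E_\Phi(V) = \mathrm{var}_\Phi(m_2)$
  • skewness of statistic m2 is:

$$\gamma_1(m_2) = \frac{1}{\sqrt{n}} \cdot \frac{8\,\kappa_2^3 + 4\,\kappa_3^2 + 12\,\kappa_2\,\kappa_4 + \kappa_6}{\left(2\,\kappa_2^2 + \kappa_4\right)^{3/2}} + O\left(\frac{1}{n}\right)$$

  • formulae with cumulants are better looking... but equally ill conditioned (cancellations)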
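The unbiasedness of V can be checked exactly on a toy population by enumerating every possible sample. The sketch below assumes the reading of V given above (a sum over the sample of the fourth powers of the deviations); the three-point population used in the usage note is a hypothetical example.

```python
from fractions import Fraction
from itertools import product

def m2_and_V(sample):
    """Return (m2, V) for one sample of size n >= 4, in exact arithmetic."""
    n = len(sample)
    m = Fraction(sum(sample), n)
    s4 = sum((x - m) ** 4 for x in sample)
    m2 = sum((x - m) ** 2 for x in sample) / (n - 1)
    V = (s4 - Fraction(n * n - 3, n) * m2 ** 2) / ((n - 2) * (n - 3))
    return m2, V

def exact_check(values, n):
    """Enumerate all equally likely n-samples; return (var(m2), E(V))."""
    samples = list(product(values, repeat=n))
    weight = Fraction(1, len(samples))
    pairs = [m2_and_V(s) for s in samples]
    mean_m2 = sum(weight * a for a, _ in pairs)
    var_m2 = sum(weight * (a - mean_m2) ** 2 for a, _ in pairs)
    mean_V = sum(weight * v for _, v in pairs)
    return var_m2, mean_V
```

With `exact_check((0, 1, 3), 4)` the two returned fractions coincide, as expected from $E_\Phi(V) = \mathrm{var}_\Phi(m_2)$. Unbiasedness says nothing about the dispersion of V itself, which is the subject of the next section.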

SLIDE 24

✓ • Well-Known Results (slide 4)
✓ • Closed Form Results for m2 (slide 9)
✓ • Experimental Results (slide 14)
✓ • Variations of the Sample Variance (slide 19)
⇒ • Useful and Useless Statistics (slide 25): xper V (R-uniform, n=5, 8, 12, 50); usefulness; some examples
  • Conclusions (slide 29)

SLIDE 25

Useful and Useless Statistics

xper V (R-uniform, n=5, 8, 12, 50)

[Figure: four panels, one per sample size n = 5, 8, 12, 50; only axis ticks survive in this transcript]

SLIDE 26

usefulness

  • probable error PE, defined by Pr (|X − µ| ≤ PE) = 1/2: below PE, don't discuss; above PE, begin to discuss.
  • when almost all of Ω lies in [µ ± σ], the distribution is dominated by rare events, and "not huge" samples are meaningless
  • when the nominal value of a statistic α > 0 is ᾱ, Pr (X ∉ [2ᾱ/3, 4ᾱ/3]) should be small, or... we have a bad feeling about a statistic that is not known any better
  • definition: a statistic α > 0 is useless when $cv = \sigma_\alpha/\mu_\alpha > 1/3$

SLIDE 27

some examples

pdf                  m2    spec   m2^2   m4     V      spec
uniform              10           31     20     21
normal               19           74     97     128
chi-square ν = 15    26           106    341    503
exponential          72    36     305    1638   1934   148

  • normal, n = 19; then α = 18 m2/µ2 is $\chi^2_{18}$; the one sigma range of ᾱ is $[18 \pm \sqrt{36}] = [12, 24]$; cv = 1/3
  • it happens that Pr (outside) ≈ 31%.
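The normal example can be turned into a small search for the sample size at which m2 stops being "useless". This sketch assumes that the criterion is cv ≤ 1/3 and uses the classical formula var(m2) = (µ4 − (n − 3)/(n − 1) µ2²)/n, which is not stated on the slides; the exact convention behind the table is not given, so it need not reproduce every entry.

```python
from fractions import Fraction

def cv_squared_m2(n, kurtosis):
    """(cv of m2)^2 = var(m2) / mu2^2, where kurtosis = mu4 / mu2^2."""
    return (kurtosis - Fraction(n - 3, n - 1)) / n

def smallest_useful_n(kurtosis, threshold=Fraction(1, 3), n_max=10_000):
    """Smallest n >= 2 with cv(m2) <= threshold, or None if none is found."""
    for n in range(2, n_max + 1):
        if cv_squared_m2(n, kurtosis) <= threshold ** 2:
            return n
    return None
```

For a normal population (kurtosis 3) the search stops at n = 19, where cv is exactly 1/3, in line with the worked example above. The kurtosis is the only input, so heavier tails translate directly into larger sample sizes.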

SLIDE 28

✓ • Well-Known Results (slide 4)
✓ • Closed Form Results for m2 (slide 9)
✓ • Experimental Results (slide 14)
✓ • Variations of the Sample Variance (slide 19)
✓ • Useful and Useless Statistics (slide 25)
⇒ • Conclusions (slide 29)

SLIDE 29

Conclusions
