Sampling Distribution Of The Variance
pierre.douillet@ensait.fr
École Nationale Supérieure des Arts et Industries Textiles, Roubaix, France (founded 1881)
www.douillet.info
WSC 2009
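- Well-Known Results: notations, shape, chi-square statistic, batch mean method
- Closed Form Results for m2: normal distribution, normal law behaves abnormally, n=2 and n=3, n=2 and n=3 for R-uniform
- Experimental Results: Z-uniform, R-uniform, lognormal, Student-like t-statistic
- Variations of the Sample Variance: expectation of products, experimental values of theoretical formulae, new results and proof of correctness, some applications
- Useful and Useless Statistics: xper V (R-uniform, n=5, 8, 12, 50), usefulness, some examples
- Conclusions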
Well-Known Results
notations
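- random variable $\xi \in \Omega$ with pdf $\varphi(\xi)$
- sample of size $n$: $\omega \in \Phi \doteq \Omega^n$, where the $x_i \in \omega$ are i.i.d.
- population moments: $\mu = E(\xi)$, $\mu_2 = \sigma^2 = \mathrm{var}(\xi)$, $\mu_4 = E\left((\xi-\mu)^4\right)$
- sample statistics: $m = E_\omega(x)$, $m_2 = s^2$, $m_4 = \frac{n}{n-1}\,E_\omega\left((x-m)^4\right)$
- $E_\Phi(f)$, $\mathrm{var}_\Phi(f)$: taken over the $\Phi$-distribution of some $f(\omega)$, using the product measure.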
shape
- mean, variance, shape (everything else)
- centered moments of increasing index involve rare events more and more
- Fisher's skewness is $\gamma_1 \doteq E\left((\xi-\mu)^3\right)/\sigma^3$
- usual values: $\gamma_1(\text{gauss}) = 0$, $\gamma_1(\chi^2_\nu) = \sqrt{8/\nu}$, $\gamma_1(\text{exp.}) = 2$
- not bounded (e.g. lognormal)
chi-square statistic
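- $A_0, A_1, \cdots, A_\nu$: partition of $\Omega$, with $\forall j: p_j \doteq \Pr(\xi \in A_j) > 0$.
- For a sample $\omega$, $n_j$ is the number of $x_i$ that belong to $A_j$, and
  $\chi^2_{Pearson}(\omega) = \sum_{j=0}^{\nu} \frac{(n\,p_j - n_j)^2}{n\,p_j}$
- without any other assumption, $E_\Phi\left(\chi^2_{Pearson}(\omega)\right) = \nu$ and
  $\mathrm{var}_\Phi\left(\chi^2_{Pearson}(\omega)\right) = 2\nu\,\frac{n-1}{n} + \frac{1}{n} \sum_{j=0}^{\nu} \left(\frac{1}{p_j} - \nu - 1\right)$
- $\chi^2_{std} = \left(\chi^2_{Pearson} - \nu\right)/\sqrt{2\nu}$, even when the $\chi^2_{Pearson}$ statistic is not $\chi^2_\nu$ distributed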
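The two moment formulas for the Pearson statistic can be checked exactly on a small case. The sketch below enumerates every multinomial outcome with exact rational arithmetic; the function names and the example cell probabilities (1/2, 1/3, 1/6) are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction
from math import factorial

def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def pearson_moments(n, probs):
    """Exact mean and variance of Pearson's statistic over all multinomial outcomes."""
    mean = Fraction(0)
    second = Fraction(0)
    for counts in compositions(n, len(probs)):
        weight = Fraction(factorial(n))          # multinomial probability of `counts`
        for c, p in zip(counts, probs):
            weight *= p ** c / factorial(c)
        stat = sum((n * p - c) ** 2 / (n * p) for c, p in zip(counts, probs))
        mean += weight * stat
        second += weight * stat * stat
    return mean, second - mean ** 2

def variance_formula(n, probs):
    """Closed form from the slide: 2 nu (n-1)/n + (1/n) sum_j (1/p_j - nu - 1)."""
    nu = len(probs) - 1
    return Fraction(2 * nu * (n - 1), n) + sum(1 / p - nu - 1 for p in probs) / n

if __name__ == "__main__":
    probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical partition
    mean, var = pearson_moments(5, probs)
    print("mean:", mean, "variance:", var, "formula:", variance_formula(5, probs))
```

The enumeration grows quickly with n and with the number of cells, so this is only practical as a sanity check on small cases.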
batch mean method
- each result has been obtained with N = 200000 replications of the n-sized sample
- containing rounding errors, allowing parallelization (with a suitable random generator)
- estimation of the sd of the estimators (and checking for independence)
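The slides do not detail how the replications are batched. The following is a minimal sketch of the usual batch mean method, assuming 20 batches, a standard normal population and m2 as the statistic of interest; all three choices, and the function names, are hypothetical.

```python
import random
import statistics

def sample_variance(xs):
    """Unbiased sample variance m2 = s^2 of one n-sized sample."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def replicate_m2(n, replications, rng):
    """Draw `replications` independent n-sized normal samples and return their m2."""
    return [sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(replications)]

def batch_summary(values, batches=20):
    """Return (grand mean, sd of the grand mean, lag-1 autocorrelation of batch means)."""
    size = len(values) // batches
    means = [statistics.fmean(values[b * size:(b + 1) * size]) for b in range(batches)]
    grand = statistics.fmean(means)
    # The spread of the batch means estimates the sd of the overall estimator.
    sd_of_estimator = statistics.stdev(means) / batches ** 0.5
    # A lag-1 autocorrelation far from 0 would suggest dependent batches.
    centered = [m - grand for m in means]
    lag1 = sum(a * b for a, b in zip(centered, centered[1:])) / sum(c * c for c in centered)
    return grand, sd_of_estimator, lag1

if __name__ == "__main__":
    rng = random.Random(2009)
    grand, sd, lag1 = batch_summary(replicate_m2(8, 20000, rng))
    print(f"E(m2) ~ {grand:.4f} +/- {sd:.4f}, lag-1 autocorrelation {lag1:+.3f}")
```

Each batch can be computed by a separate worker with its own random stream, which is where the parallelization mentioned above comes in.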
Closed Form Results for m2
normal distribution
- 200 000 samples (n = 8)
- plot all the m2(ω)
- well known model: $\chi^2_7$
- goodness of fit: $\chi^2_{Pearson} = 25.10$, $\chi^2_{std} = -1.28$
[Figure: histogram of m2; ⊕ = observed, solid = chi2(7)]
normal law behaves abnormally
- Random variates m and m2 are fully independent if and only if the sampled population Ω is normal. In such a case, $(n-1)\,m_2/\mu_2$ is $\chi^2_{n-1}$ distributed.
- Most of the time, this is stated in the "Gaussian distribution" chapter of statistics books
- It is almost never recalled in the "χ2" chapter...
- full independence is the key property for χ2
- χ2 is not a model, not even an approximate model, for the sample variance when Ω is not Gaussian.
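A quick simulation illustrates the contrast between a normal and a non-normal population. The sketch below estimates the correlation between m and m2 for normal and exponential populations; the population choices, sample size and function names are assumptions made for illustration. A vanishing correlation is only a necessary condition for independence, so the normal case is consistent with, not a proof of, the statement above.

```python
import math
import random

def mean_and_variance(xs):
    """Return (m, m2) for one sample: mean and unbiased variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def correlation(us, vs):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    suv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    suu = sum((u - mu) ** 2 for u in us)
    svv = sum((v - mv) ** 2 for v in vs)
    return suv / math.sqrt(suu * svv)

def corr_m_m2(draw, n, replications, rng):
    """Correlation between m and m2 over many n-sized samples drawn with `draw`."""
    pairs = [mean_and_variance([draw(rng) for _ in range(n)])
             for _ in range(replications)]
    return correlation([p[0] for p in pairs], [p[1] for p in pairs])

def draw_normal(rng):
    return rng.gauss(0.0, 1.0)

def draw_exponential(rng):
    return rng.expovariate(1.0)

if __name__ == "__main__":
    rng = random.Random(2009)
    print("normal     :", round(corr_m_m2(draw_normal, 8, 20000, rng), 3))
    print("exponential:", round(corr_m_m2(draw_exponential, 8, 20000, rng), 3))
```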
n=2, n=3
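- Very special situations, excluded from the general formulae that follow
- A direct attack leads to:
  $\mathrm{pdf}_2(m_2) = \sqrt{\frac{2}{m_2}} \int_{\mathbb{R}} \varphi(t)\,\varphi\left(t + \sqrt{2 m_2}\right) dt$
  $\mathrm{pdf}_3(m_2) = \int_{t=0}^{t=s} \frac{4\sqrt{3}}{\sqrt{m_2 - t^2}} \int_{\mathbb{R}} \varphi(u-t)\,\varphi(u+t)\,\varphi\left(u + \sqrt{3 m_2 - 3 t^2}\right) du\,dt$, where $s = \sqrt{m_2}$
- Applied to a Gaussian distribution, this leads back to $\chi^2_1$ and $\chi^2_2$
- m and m2 are linearly but not fully independent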
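For n = 2 the direct attack is short enough to write out. The following derivation is a sketch of one way to reach the formula for pdf_2; the auxiliary symbols d and g are introduced here for illustration and do not appear in the slides.

```latex
% n = 2: m_2 = s^2 = (x_1 - x_2)^2 / 2, hence |x_1 - x_2| = \sqrt{2 m_2}
\begin{align*}
g(d) &= \int_{\mathbb{R}} \varphi(t)\,\varphi(t+d)\,dt
  && \text{density of } d = x_2 - x_1, \text{ with } g(-d) = g(d) \\
\mathrm{pdf}_{|d|}(d) &= 2\,g(d), \qquad d \ge 0 \\
\mathrm{pdf}_2(m_2) &= 2\,g\!\left(\sqrt{2 m_2}\right)\,\frac{d\sqrt{2 m_2}}{d m_2}
  = \sqrt{\frac{2}{m_2}} \int_{\mathbb{R}} \varphi(t)\,\varphi\!\left(t+\sqrt{2 m_2}\right) dt
\end{align*}
% Gaussian check: d is N(0, 2\sigma^2), so m_2/\sigma^2 = (d/(\sigma\sqrt{2}))^2 is \chi^2_1.
```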
n=2, n=3, R-uniform −a ≤ x ≤ a
- n = 2, $0 \le m_2 \le 2a^2$: $\mathrm{pdf}_2(m_2) = \frac{1}{a\sqrt{2 m_2}} - \frac{1}{2a^2}$
- n = 3, $0 \le s^2 = m_2 \le 4a^2/3$ and
  $\mathrm{pdf}_3(m_2) = \frac{3\sqrt{3}}{a^2} \left( \frac{\pi}{6} - \frac{s}{2a} \right)$ for $0 < s < a$
  $\mathrm{pdf}_3(m_2) = \frac{3\sqrt{3}}{a^2} \left( \arcsin\frac{a}{s} - \frac{\pi}{3} - \frac{s}{2a} + \sqrt{\frac{s^2}{a^2} - 1} \right)$ for $a < s$
- the sample belongs to a cube; we have to measure the set of all the ω that share the same value of m2; the shape, and therefore the description, changes when ω travels from center (hexagon) to corner (triangle).
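As a sanity check on the two closed forms, the sketch below integrates each density numerically over its support and expects a total mass of 1. The function names, the midpoint rule and the value a = 10 are illustrative choices. The substitution m2 = s^2 is used to remove the integrable singularity of pdf_2 at m2 = 0.

```python
import math

def pdf2_uniform(m2, a):
    """n = 2, population uniform on [-a, a], support 0 < m2 <= 2 a^2."""
    return 1.0 / (a * math.sqrt(2.0 * m2)) - 1.0 / (2.0 * a * a)

def pdf3_uniform(m2, a):
    """n = 3, population uniform on [-a, a], support 0 < m2 <= 4 a^2 / 3."""
    s = math.sqrt(m2)
    c = 3.0 * math.sqrt(3.0) / (a * a)
    if s <= a:
        return c * (math.pi / 6.0 - s / (2.0 * a))
    return c * (math.asin(a / s) - math.pi / 3.0 - s / (2.0 * a)
                + math.sqrt(s * s / (a * a) - 1.0))

def total_mass(pdf, a, s_max, steps=20000):
    """Midpoint rule for the integral of pdf(m2) dm2, with m2 = s^2 and dm2 = 2 s ds."""
    h = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += pdf(s * s, a) * 2.0 * s
    return total * h

if __name__ == "__main__":
    a = 10.0
    print("n = 2:", total_mass(pdf2_uniform, a, math.sqrt(2.0) * a))
    print("n = 3:", total_mass(pdf3_uniform, a, 2.0 * a / math.sqrt(3.0)))
```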
Experimental Results
Z-uniform
[Figure: histograms of m2, Z-uniform, a = 10, n = 5, # = 617: neither chi2 nor normal]
R-uniform
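[Figure: histogram of m2, R-uniform, a = 10, n = 5: γ1 ≈ 0.40, where √(8/ν) = 1.41]
[Figure: histogram of m2 against normal ("nor") and chi2 ("chi") models, R-uniform, a = 10, n = 8: quite normal, γ1 ≈ 0.27, where √(8/ν) = 1.07]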
lognormal
- parameters: ln M = E(ln ξ), ln K = var(ln ξ); here M = 7, K = 2, n = 8
[Figure: histogram of m2, usual scale: γ1 ≈ 39]
[Figure: histogram of m2, log scale, against normal ("nor") and chi2 ("chi") models: γ1 ≈ 0.05]
Student-like t-statistic
- t = (m − µ)/s
[Figure: histogram of t over [−4, 4], R-uniform, n = 5: tail ≈ Student]
[Figure: histogram of t over [−4, 4] against normal ("nor") and Student ("stu") models, lognormal, n = 8: t very skewed, far away from models]
Variations of the Sample Variance
expectation of products
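- estimation of monomials $\alpha \doteq \prod_j \mu_j^{\alpha_j}$ relative to Ω, using monomials $\beta \doteq \prod_k m_k^{\beta_k}$ relative to ω.
- degrees: $\mathrm{dg}_m\,\beta \doteq \sum_k \beta_k$ is the number of $m_k$ occurring in β; $\mathrm{dg}_x\,\beta \doteq \sum_k k\,\beta_k$ is the number of factors $x_i$ occurring in β.
- $E_\Phi(\beta) \in \mathrm{Span}\{\alpha \mid \mathrm{dg}_x\,\alpha = \mathrm{dg}_x\,\beta\}$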
experimental values of theoretical formulae
"Science is what we understand well enough to explain to a computer. Art is everything else we do." (Knuth)
- for each n in [2, N], expand β as a polynomial in the $x_1 \cdots x_n$
- substitute each $x_i^j$ (j > 1) by $\mu_j$, and then each remaining $x_i$ by 0
- for each n, obtain a polynomial $P_n = \sum_\alpha c(n,\alpha) \times \alpha$, where $c(n,\alpha) \in \mathbb{Q}$
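The three steps above can be sketched for β = m2^2 with exact rational arithmetic. This is a minimal illustration, not the author's Maple code: the dictionary representation of polynomials and all function names are assumptions, and the population is taken as centered (µ = 0), which is how the substitution of isolated x_i by 0 is read here.

```python
from collections import defaultdict
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(Fraction)
    for ea, ca in p.items():
        for eb, cb in q.items():
            out[tuple(a + b for a, b in zip(ea, eb))] += ca * cb
    return dict(out)

def m2_poly(n):
    """m2 = (sum x_i^2 - (sum x_i)^2 / n) / (n - 1) as a polynomial in x_1..x_n."""
    def unit(i, power):
        return tuple(power if k == i else 0 for k in range(n))
    total = {unit(i, 1): Fraction(1) for i in range(n)}
    out = defaultdict(Fraction)
    for i in range(n):
        out[unit(i, 2)] += Fraction(1, n - 1)
    for exps, c in poly_mul(total, total).items():
        out[exps] -= c / (n * (n - 1))
    return dict(out)

def expectation(poly):
    """Apply x_i^j -> mu_j (j > 1), then x_i -> 0; result keys are the monomials alpha."""
    out = defaultdict(Fraction)
    for exps, c in poly.items():
        if 1 in exps:          # an isolated x_i has zero mean in a centered population
            continue
        alpha = tuple(sorted((e for e in exps if e > 1), reverse=True))
        out[alpha] += c
    return dict(out)

if __name__ == "__main__":
    for n in range(2, 7):
        m2 = m2_poly(n)
        # keys: (4,) stands for mu4 and (2, 2) for mu2^2
        print(n, expectation(poly_mul(m2, m2)))
```

For each n the output is the pair of rational coefficients c(n, µ4) and c(n, µ2^2); the cost grows quickly with n and with dg_x β, which is why a closed form in n is sought next.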
experimental values of theoretical formulae (2)
- each c(n, α) has a closed form, a quotient of polynomials in n whose degrees cannot exceed $\mathrm{dg}_x\,\beta$
- general algorithm AeqB, implemented as gfun (Maple)
- each denominator is a divisor of $n^p (n-1)^q$ where $p + q + 2 = \mathrm{dg}_x\,\beta$ and $q + 1 = \mathrm{dg}_m\,\beta$.
- closed form of the polynomial numerator from a list of values: divided differences (Newton)
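The last step can be sketched as follows for β = m2^2 and α = µ2^2, where dg_x β = 4 and dg_m β = 2 give p = q = 1, hence the denominator n(n − 1). The values of c(n, α) for n = 2..6 are hard-coded here and assumed to come from the expansion step; function names are illustrative.

```python
from fractions import Fraction

def newton_coefficients(xs, ys):
    """Divided differences: coefficients of the Newton form through (xs, ys)."""
    coeffs = [Fraction(y) for y in ys]
    for level in range(1, len(xs)):
        for i in range(len(xs) - 1, level - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - level])
    return coeffs

def to_monomial_basis(xs, coeffs):
    """Expand the Newton form into [c0, c1, ...] meaning c0 + c1 n + c2 n^2 + ..."""
    poly = [Fraction(0)]
    for c, x in zip(reversed(coeffs), reversed(xs)):
        shifted = [Fraction(0)] + poly                    # poly * n
        scaled = [-x * p for p in poly] + [Fraction(0)]   # poly * (-x)
        poly = [s + t for s, t in zip(shifted, scaled)]
        poly[0] += c                                      # poly * (n - x) + c
    while len(poly) > 1 and poly[-1] == 0:
        poly.pop()
    return poly

if __name__ == "__main__":
    ns = [2, 3, 4, 5, 6]
    # c(n, mu2^2) in E(m2^2), assumed known from the expansion for each n
    c_values = [Fraction(3, 2), Fraction(1), Fraction(11, 12),
                Fraction(9, 10), Fraction(9, 10)]
    numerators = [c * n * (n - 1) for c, n in zip(c_values, ns)]
    print(to_monomial_basis(ns, newton_coefficients(ns, numerators)))
```

Trailing zero divided differences indicate that enough values of n were used to pin down the numerator.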
new results and proof of correctness
- Fisher (1929) started the process.
- n = 11 now, n = 12 soon after Xmas (?)
- Error prone process...
- Test: the determinant of all the β over all the α of same $\mathrm{dg}_x$ splits into linear factors.
  $\Delta_4 = \frac{(n-2)(n-3)}{n\,(n-1)}$
  $\Delta_{11} = \frac{(n-2)^{14}(n-3)^{12}(n-4)^{10}(n-5)^{7}(n-6)^{5}(n-7)^{3}(n-8)^{2}(n-9)(n-10)}{n^{28}\,(n-1)^{27}}$
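The degree 4 case of this test is small enough to check directly. The sketch below assumes the standard closed forms for E(m4) and E(m2^2) on the basis (µ4, µ2^2), written with the normalization of m4 used in these slides; these closed forms and the function names are assumptions brought in for the illustration.

```python
from fractions import Fraction

def expectation_matrix(n):
    """Rows: beta in (m4, m2^2); columns: alpha in (mu4, mu2^2)."""
    n = Fraction(n)
    row_m4 = [(n * n - 3 * n + 3) / n ** 2, 3 * (2 * n - 3) / n ** 2]
    row_m2_squared = [1 / n, (n * n - 2 * n + 3) / (n * (n - 1))]
    return [row_m4, row_m2_squared]

def delta4(n):
    """Determinant of the 2 x 2 expectation matrix for dg_x = 4."""
    m = expectation_matrix(n)
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

if __name__ == "__main__":
    for n in range(2, 8):
        print(n, delta4(n), Fraction((n - 2) * (n - 3), n * (n - 1)))
```

The determinant vanishes at n = 2 and n = 3, consistent with these sample sizes being treated as special cases.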
some applications
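- define $V \doteq \frac{1}{(n-2)(n-3)} \left( \sum_i (x_i - m)^4 - \frac{n^2-3}{n}\, m_2^2 \right)$; then $E_\Phi(V) = \mathrm{var}_\Phi(m_2)$
- skewness of statistic m2 is: $\gamma_1(m_2) = \frac{1}{\sqrt{n}}\,\frac{8\kappa_2^3 + 4\kappa_3^2 + 12\kappa_2\kappa_4 + \kappa_6}{\left(2\kappa_2^2 + \kappa_4\right)^{3/2}} + O\left(\frac{1}{n}\right)$
- formulae with cumulants are better looking... but equally ill conditioned (cancellations)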
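The leading term of the skewness formula is easy to evaluate, and doing so shows the cancellation issue. The sketch below drops the O(1/n) remainder; the function name is illustrative, and the cumulants of the uniform law on [−a, a] (κ2 = a²/3, κ3 = 0, κ4 = −2a⁴/15, κ6 = 16a⁶/63) are standard values supplied here rather than taken from the slides.

```python
import math

def leading_skewness(n, k2, k3, k4, k6):
    """Leading 1/sqrt(n) term of gamma_1(m2), given population cumulants."""
    numerator = 8 * k2 ** 3 + 4 * k3 ** 2 + 12 * k2 * k4 + k6
    denominator = (2 * k2 ** 2 + k4) ** 1.5
    return numerator / denominator / math.sqrt(n)

if __name__ == "__main__":
    # Normal population: only kappa_2 is non-zero, giving sqrt(8 / n).
    print("normal, n = 8 :", leading_skewness(8, 1.0, 0.0, 0.0, 0.0))
    # Uniform on [-1, 1]: the three numerator terms nearly cancel.
    terms = (8 / 27, 12 * (1 / 3) * (-2 / 15), 16 / 63)
    print("uniform numerator terms:", terms, "sum:", sum(terms))
    print("uniform, n = 5:", leading_skewness(5, 1 / 3, 0.0, -2 / 15, 16 / 63))
```

In the uniform case the numerator terms are about 0.30, −0.53 and 0.25, so their sum keeps far fewer significant digits than the inputs.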
Useful and Useless Statistics
xper V (R-uniform, n=5, 8, 12, 50)
[Figure: histograms of V, R-uniform, n = 5, 8, 12, 50]
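An experiment of this kind can be sketched as a small Monte Carlo run. The code below simulates V for an R-uniform population and compares its average with the variance of m2; it assumes V is built from the sum of (x_i − m)^4 over the sample, uses the standard formula var(m2) = (µ4 − (n−3)/(n−1) µ2²)/n as a reference, and picks a = 1, n = 5 and 50000 replications arbitrarily.

```python
import random

def m2_and_v(xs):
    """Return the sample variance m2 and the statistic V for one sample."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    s4 = sum((x - m) ** 4 for x in xs)
    v = (s4 - (n * n - 3) / n * m2 * m2) / ((n - 2) * (n - 3))
    return m2, v

def simulate(n, a, replications, rng):
    """Return (empirical variance of m2, empirical mean of V) for uniform [-a, a]."""
    m2s, vs = [], []
    for _ in range(replications):
        m2, v = m2_and_v([rng.uniform(-a, a) for _ in range(n)])
        m2s.append(m2)
        vs.append(v)
    mean_m2 = sum(m2s) / replications
    var_m2 = sum((x - mean_m2) ** 2 for x in m2s) / (replications - 1)
    return var_m2, sum(vs) / replications

def theoretical_var_m2(n, a):
    """var(m2) for uniform [-a, a], with mu2 = a^2/3 and mu4 = a^4/5."""
    mu2, mu4 = a * a / 3, a ** 4 / 5
    return (mu4 - (n - 3) / (n - 1) * mu2 * mu2) / n

if __name__ == "__main__":
    rng = random.Random(2009)
    var_m2, mean_v = simulate(5, 1.0, 50000, rng)
    print("var(m2) observed:", var_m2)
    print("mean(V) observed:", mean_v)
    print("var(m2) theory  :", theoretical_var_m2(5, 1.0))
```

Collecting the individual values of V instead of their mean gives the raw material for histograms like the ones this slide showed.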
usefulness
- probable error PE, defined by Pr(|X − µ| ≤ PE) = 1/2: below PE, don't discuss; above PE, begin to discuss.
- when almost all of Ω is in [µ ± σ], the distribution is dominated by rare events, and "not huge" samples are meaningless
- when the nominal value of a statistic α > 0 is ᾱ, Pr(X ∉ [2ᾱ/3, 4ᾱ/3]) should be small, or... we have a bad feeling about a statistic that is not better known
- definition: a statistic α > 0 is useless when cv = σ_α/µ_α > 1/3
some examples
pdf                  m2   spec   m2^2   m4     V      spec
uniform              10          31     20     21
normal               19          74     97     128
chi-square, ν = 15   26          106    341    503
exponential          72   36     305    1638   1934   148
- normal, n = 19; then $\alpha = 18\,m_2/\mu_2$ is $\chi^2_{18}$; the one sigma range of ᾱ is $[18 \pm \sqrt{36}] = [12, 24]$; cv = 1/3
- it happens that Pr(outside) ≈ 31%.
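The 31% figure can be reproduced without a statistics library, because the chi-square tail has a finite closed form when the number of degrees of freedom is even. The sketch below uses that identity (a Poisson sum); the function names are illustrative.

```python
import math

def chi2_sf_even(x, dof):
    """Pr(chi2_dof > x) for even dof: exp(-x/2) * sum_{j < dof/2} (x/2)^j / j!."""
    if dof % 2 != 0:
        raise ValueError("closed form used here requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(dof // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total

def prob_outside(low, high, dof):
    """Pr(chi2_dof < low) + Pr(chi2_dof > high)."""
    return (1.0 - chi2_sf_even(low, dof)) + chi2_sf_even(high, dof)

if __name__ == "__main__":
    print("Pr(outside [12, 24]) =", prob_outside(12.0, 24.0, 18))
```

The two tails contribute about 15% each, so the one sigma range [2ᾱ/3, 4ᾱ/3] is missed roughly one time in three at the cv = 1/3 threshold.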
Conclusions