sampling distribution of the variance

Sampling Distribution Of The Variance pierre.douillet@ensait.fr



  1. WSC 2009. Sampling Distribution Of The Variance. pierre.douillet@ensait.fr, École Nationale Supérieure des Arts et Industries Textiles, Roubaix, France

  2. founded 1881

  3. www.douillet.info WSC 2009. Outline:
     ⇒ • Well-Known Results (slide 4): notations; shape; chi-square statistic; batch mean method
     • Closed Form Results for $m_2$ (slide 9)
     • Experimental Results (slide 14)
     • Variations of the Sample Variance (slide 19)
     • Useful and Useless Statistics (slide 25)
     • Conclusions (slide 29)
     Ensait - Roubaix - France

  4. Well-Known Results: notations
     • random variable $\xi \in \Omega$ with pdf $\varphi(\xi)$
     • sample of size $n$: $\omega \in \Phi \doteq \Omega^n$, where the $x_i \in \omega$ are i.i.d.
     • population: $\mu = E(\xi)$, $\mu_2 = \sigma^2 = \mathrm{var}(\xi)$, $\mu_4 = E\left((\xi-\mu)^4\right)$
     • sample: $m = E_\omega(x)$, $m_2 = s^2$, $m_4 = \frac{n}{n-1}\, E_\omega\left((x-m)^4\right)$
     • $\Phi$-distribution of some $f(\omega)$, with $E_\Phi(f)$ and $\mathrm{var}_\Phi(f)$, using the product measure.

  5. shape
     • mean, variance, shape (everything else)
     • centered moments of increasing index depend more and more on rare events
     • Fisher's skewness is $\gamma_1 \doteq E\left((\xi-\mu)^3\right)/\sigma^3$
     • usual values: $\gamma_1(\text{gauss}) = 0$, $\gamma_1(\chi^2_\nu) = \sqrt{8/\nu}$, $\gamma_1(\text{exp.}) = 2$
     • not bounded (e.g. lognormal)

  6. chi-square statistic
     • $A_0, A_1, \cdots, A_\nu$ is a partition of $\Omega$, with $\forall j: p_j \doteq \Pr(\xi \in A_j) > 0$.
     • For a sample $\omega$, $n_j$ is the number of $x_i$ that belong to $A_j$, and
       $\chi^2_{Pearson}(\omega) = \sum_{j=0}^{\nu} \frac{(n p_j - n_j)^2}{n p_j}$
     • without any other assumption,
       $E_\Phi\left(\chi^2_{Pearson}(\omega)\right) = \nu$ and $\mathrm{var}_\Phi\left(\chi^2_{Pearson}(\omega)\right) = 2\nu\,\frac{n-1}{n} + \frac{1}{n}\sum_{j=0}^{\nu}\left(\frac{1}{p_j} - \nu - 1\right)$
     • $\chi^2_{std} = \left(\chi^2_{Pearson} - \nu\right)/\sqrt{2\nu}$ can be used even when the $\chi^2_{Pearson}$ statistic is not $\chi^2_\nu$ distributed
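The chi-square slide can be checked numerically. The following is a minimal sketch, not the author's code: the function names are illustrative, and the two-cell exact enumeration is an assumption added here only to confirm that the mean is $\nu$ and that the variance formula, read as $2\nu\frac{n-1}{n} + \frac{1}{n}\sum_j(\frac{1}{p_j} - \nu - 1)$, matches a case that can be computed exactly.

```python
from fractions import Fraction
from math import comb, sqrt

def pearson_chi2(counts, probs):
    """Pearson statistic: sum over cells of (n*p_j - n_j)^2 / (n*p_j)."""
    n = sum(counts)
    return sum((n * p - c) ** 2 / (n * p) for c, p in zip(counts, probs))

def standardized(chi2, nu):
    """Centre by the mean nu and scale by sqrt(2*nu)."""
    return (chi2 - nu) / sqrt(2 * nu)

def variance_formula(n, probs):
    """2*nu*(n-1)/n + (1/n) * sum_j (1/p_j - nu - 1), with nu = len(probs) - 1."""
    nu = len(probs) - 1
    return Fraction(2 * nu * (n - 1), n) + sum(1 / p - nu - 1 for p in probs) / n

def exact_moments_two_cells(n, p):
    """Exact mean and variance of the statistic for two cells (nu = 1),
    obtained by enumerating n_0 ~ Binomial(n, p). Illustrative check only."""
    q = 1 - p
    mean = second = Fraction(0)
    for k in range(n + 1):
        weight = comb(n, k) * p**k * q**(n - k)
        x = pearson_chi2([k, n - k], [p, q])
        mean += weight * x
        second += weight * x * x
    return mean, second - mean**2

mean, var = exact_moments_two_cells(12, Fraction(1, 3))
print(mean, var, variance_formula(12, [Fraction(1, 3), Fraction(2, 3)]))
```

Exact rational arithmetic (`Fraction`) is used so that the comparison between the enumerated variance and the formula is an equality rather than a floating-point approximation.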

  7. batch mean method
     • each result has been obtained with $N = 200000$ replications of the $n$-sized sample
     • batching keeps rounding errors contained and allows parallelization (with a suitable random generator)
     • it also gives an estimate of the sd of the estimators (and a check for independence)
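A minimal sketch of the batch mean idea, under stated assumptions: the batch count (20), the reduced replication count (20 000 rather than the 200 000 of the slides), the seed and all function names are choices made here for illustration. Each batch averages the estimator over its own replications, and the spread of the batch means gives the sd of the overall estimate.

```python
import random
from math import sqrt
from statistics import mean, stdev

def m2(sample):
    """Sample variance s^2 with the n - 1 denominator."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / (n - 1)

def batch_means(estimator, draw, n, replications=20000, batches=20, seed=2009):
    """Split the replications into equal batches and average the estimator per batch.

    Returns (overall mean, estimated sd of the overall mean, list of batch means).
    """
    # A single generator is used here; running batches in parallel would
    # need independent random streams, which this sketch does not provide.
    rng = random.Random(seed)
    per_batch = replications // batches
    means = []
    for _ in range(batches):
        values = [estimator([draw(rng) for _ in range(n)]) for _ in range(per_batch)]
        means.append(mean(values))
    return mean(means), stdev(means) / sqrt(batches), means

# Standard normal population, n = 8: E(m_2) should be close to mu_2 = 1.
estimate, sd, means = batch_means(m2, lambda rng: rng.gauss(0.0, 1.0), n=8)
print(estimate, sd)
```

Summing within batches before combining is also what limits the accumulation of rounding errors, since no single running sum spans all replications.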

  8. Outline:
     √ • Well-Known Results (slide 4)
     ⇒ • Closed Form Results for $m_2$ (slide 9): normal distribution; normal law behaves abnormally; n=2, n=3; n=2, n=3, R-uniform $-a \le x \le a$
     • Experimental Results (slide 14)
     • Variations of the Sample Variance (slide 19)
     • Useful and Useless Statistics (slide 25)
     • Conclusions (slide 29)

  9. Closed Form Results for $m_2$: normal distribution
     • 200 000 samples ($n = 8$)
     • plot all the $m_2(\omega)$
     • well-known model: $\chi^2_7$
     • goodness of fit: $\chi^2_{Pearson} = 25.10$, $\chi^2_{std} = -1.28$
     [Figure: histogram of $m_2$; ⊕ = observed, solid = chi2(7); legend: obs, nor, chi]

  10. normal law behaves abnormally
     • The random variates $m$ and $m_2$ are fully independent if and only if the sampled population $\Omega$ is normal. In that case, $(n-1)\,m_2/\mu_2$ is $\chi^2_{n-1}$ distributed.
     • Most of the time, this is stated in the "Gaussian distribution" chapter of statistics books.
     • It is almost never recalled in the "$\chi^2$" chapter...
     • Full independence is the key property for $\chi^2$.
     • $\chi^2$ is not a model, not even an approximate one, for the sample variance when $\Omega$ is not Gaussian.
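One way to see the mismatch, using only second moments: the block below is a worked comparison added here, not taken from the slides. It relies on the standard formula for the variance of the sample variance of an i.i.d. sample, written in the notation of the slides. Agreement of the variances is only a necessary condition, so this illustrates the claim rather than proving it.

```latex
% Variance of the sample variance for any i.i.d. population with finite \mu_4:
\mathrm{var}_\Phi(m_2) = \frac{\mu_4}{n} - \frac{n-3}{n(n-1)}\,\mu_2^2

% Variance implied by the model (n-1)\,m_2/\mu_2 \sim \chi^2_{n-1}:
\mathrm{var}_{\chi^2}(m_2) = \frac{2\,\mu_2^2}{n-1}

% The two agree if and only if
\frac{\mu_4}{n} = \mu_2^2\left(\frac{2}{n-1} + \frac{n-3}{n(n-1)}\right) = \frac{3\,\mu_2^2}{n}
\quad\Longleftrightarrow\quad \mu_4 = 3\,\mu_2^2

% Example: R-uniform on [-a, a] has \mu_2 = a^2/3 and \mu_4 = a^4/5 = \tfrac{9}{5}\,\mu_2^2,
% so the chi-square model overstates the variance of m_2.
```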

  11. n=2, n=3
     • Very special situations, excluded from the general formulae that follow
     • A direct attack leads to:
       $pdf_2(m_2) = \sqrt{\frac{2}{m_2}} \int_R \varphi(t)\,\varphi\left(t+\sqrt{2 m_2}\right) dt$
       $pdf_3(m_2) = 4\sqrt{3} \int_{t=0}^{t=s} \int_R \frac{\varphi(u-t)\,\varphi(u+t)\,\varphi\left(u+\sqrt{3 m_2-3 t^2}\right)}{\sqrt{m_2-t^2}}\, du\, dt$
     • Applied to a Gaussian distribution, this leads back to $\chi^2_1$ and $\chi^2_2$
     • $m$ and $m_2$ are linearly but not fully independent

  12. n=2, n=3, R-uniform $-a \le x \le a$
     • $n = 2$: $0 \le m_2 \le 2a^2$, $pdf_2(m_2) = \frac{1}{a\sqrt{2 m_2}} - \frac{1}{2a^2}$
     • $n = 3$: $0 \le s^2 = m_2 \le 4a^2/3$ and
       $pdf_3(m_2) = \frac{3\sqrt{3}}{a^2}\left(\frac{\pi}{6} - \frac{s}{2a}\right)$ for $0 < s < a$
       $pdf_3(m_2) = \frac{3\sqrt{3}}{a^2}\left(\arcsin\frac{a}{s} - \frac{\pi}{3} - \frac{s}{2a} + \sqrt{\frac{s^2}{a^2}-1}\right)$ for $a < s$
     • the sample belongs to a cube; we have to measure the set of all the $\omega$ that share the same value of $m_2$; the shape, and therefore the description, changes when $\omega$ travels from the center (hexagon) to a corner (triangle).
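A quick sanity check of the R-uniform closed forms, as a sketch under stated assumptions: the densities are coded as read from the slide (the formulas are repeated in the docstrings), while the midpoint integration in the variable $s = \sqrt{m_2}$ and the function names are choices made here. Both densities should integrate to 1 and have mean $\mu_2 = a^2/3$.

```python
from math import asin, pi, sqrt

def pdf2(m2, a):
    """n = 2: 1/(a*sqrt(2*m2)) - 1/(2*a^2) on 0 < m2 <= 2*a^2."""
    if 0 < m2 <= 2 * a * a:
        return 1 / (a * sqrt(2 * m2)) - 1 / (2 * a * a)
    return 0.0

def pdf3(m2, a):
    """n = 3, with s = sqrt(m2) and c = 3*sqrt(3)/a^2:
    c*(pi/6 - s/(2a)) for 0 < s <= a,
    c*(asin(a/s) - pi/3 - s/(2a) + sqrt(s^2/a^2 - 1)) for a < s <= 2a/sqrt(3)."""
    s = sqrt(m2)
    c = 3 * sqrt(3) / (a * a)
    if 0 < s <= a:
        return c * (pi / 6 - s / (2 * a))
    if a < s <= 2 * a / sqrt(3):
        return c * (asin(a / s) - pi / 3 - s / (2 * a) + sqrt(s * s / (a * a) - 1))
    return 0.0

def integrate_in_s(pdf, a, s_max, power=0, steps=20000):
    """Midpoint rule for the integral of m2^power * pdf(m2) dm2, using m2 = s^2
    (dm2 = 2 s ds) so that the 1/sqrt(m2) singularity of pdf2 disappears."""
    h = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        total += (s * s) ** power * pdf(s * s, a) * 2 * s * h
    return total

a = 10.0
print(integrate_in_s(pdf2, a, sqrt(2) * a), integrate_in_s(pdf3, a, 2 * a / sqrt(3)))
print(integrate_in_s(pdf3, a, 2 * a / sqrt(3), power=1), a * a / 3)
```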

  13. Outline:
     √ • Well-Known Results (slide 4)
     √ • Closed Form Results for $m_2$ (slide 9)
     ⇒ • Experimental Results (slide 14): Z-uniform; R-uniform; lognormal; Student-like t-statistic
     • Variations of the Sample Variance (slide 19)
     • Useful and Useless Statistics (slide 25)
     • Conclusions (slide 29)

  14. Experimental Results: Z-uniform
     [Figure: two histograms of $m_2$; legend: obs, nor, chi]
     $a = 10$, $n = 5$, # = 617: neither chi2 nor normal

  15. R-uniform
     [Figure: two histograms of $m_2$; legend: obs, nor, chi]
     • $a = 10$, $n = 5$: $\gamma_1 \approx 0.40 \neq 1.41$
     • $a = 10$, $n = 8$: quite normal, $\gamma_1 \approx 0.27 \neq 1.07$

  16. lognormal
     [Figure: two histograms of $m_2$; legend: obs, nor, chi]
     $M = 7$, $K = 2$, with $\ln M = E(\ln \xi)$ and $\ln K = \mathrm{var}(\ln \xi)$; $n = 8$
     • $m_2$, usual scale: $\gamma_1 \approx 39$
     • $m_2$, log scale: $\gamma_1 \approx 0.05$

  17. Student-like t-statistic
     [Figure: two histograms of $t$ over $-4 \ldots 4$; legend: obs, nor, stu]
     • R-uniform, $n = 5$: $t = (m - \mu)/s$, tail $\approx$ Student
     • lognormal, $n = 8$: $t$ very skew, far away from models

  18. Outline:
     √ • Well-Known Results (slide 4)
     √ • Closed Form Results for $m_2$ (slide 9)
     √ • Experimental Results (slide 14)
     ⇒ • Variations of the Sample Variance (slide 19): expectation of products; experimental values of theoretical formulae; new results and proof of correctness; some applications
     • Useful and Useless Statistics (slide 25)
     • Conclusions (slide 29)

  19. Variations of the Sample Variance: expectation of products
     • estimation of monomials $\alpha \doteq \prod_j \mu_j^{\alpha_j}$ relative to $\Omega$, using monomials $\beta \doteq \prod_k m_k^{\beta_k}$ relative to $\omega$.
     • degree: $dg_m\,\beta \doteq \sum_k \beta_k$ is the number of $m_k$ occurring in $\beta$; $dg_x\,\beta \doteq \sum_k k\,\beta_k$ is the number of factors $x_i$ occurring in $\beta$.
     • $E_\Phi(\beta) \in \mathrm{Span}\{\alpha \mid dg_x\,\alpha = dg_x\,\beta\}$

  20. experimental values of theoretical formulae
     "Science is what we understand well enough to explain to a computer. Art is everything else we do" (Knuth).
     • for each $n$ in $[2, N]$, expand $\beta$ as a polynomial in $x_1 \cdots x_n$
     • substitute each $x_i^j$ ($j > 1$) by $\mu_j$, and then each $x_i$ by 0
     • for each $n$, obtain a polynomial $P_n = \sum_\alpha c(n, \alpha) \times \alpha$, where $c(n, \alpha) \in Q$
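The expand-and-substitute steps can be sketched with exact rational arithmetic. This is an illustrative implementation, not the author's program: the dictionary representation of polynomials and every function name are assumptions made here. It also assumes a centered population ($\mu = 0$), which loses nothing for $m_2$ since it is translation invariant, and it uses the independence of the $x_i$ to replace each $x_i^j$ by $\mu_j$.

```python
from collections import defaultdict
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(Fraction)
    for ea, ca in p.items():
        for eb, cb in q.items():
            out[tuple(i + j for i, j in zip(ea, eb))] += ca * cb
    return dict(out)

def poly_add(p, q, scale=1):
    """p + scale * q."""
    out = defaultdict(Fraction, p)
    for e, c in q.items():
        out[e] += scale * c
    return dict(out)

def sample_variance_poly(n):
    """m_2 = (sum x_i^2 - (sum x_i)^2 / n) / (n - 1) as a polynomial in x_1..x_n."""
    unit = lambda i, k: tuple(k if j == i else 0 for j in range(n))
    s1 = {unit(i, 1): Fraction(1) for i in range(n)}
    s2 = {unit(i, 2): Fraction(1) for i in range(n)}
    diff = poly_add(s2, poly_mul(s1, s1), scale=Fraction(-1, n))
    return {e: c / (n - 1) for e, c in diff.items()}

def expectation(poly):
    """Replace x_i^j (j > 1) by mu_j and any remaining lone x_i by 0.
    Keys of the result are sorted tuples of moment indices, e.g. (2, 2) for mu_2^2."""
    out = defaultdict(Fraction)
    for exps, c in poly.items():
        if 1 in exps:  # a lone x_i has expectation mu_1 = 0
            continue
        out[tuple(sorted(e for e in exps if e > 1))] += c
    return {k: v for k, v in out.items() if v != 0}

for n in range(2, 6):
    m2 = sample_variance_poly(n)
    print(n, expectation(poly_mul(m2, m2)))
```

For $\beta = m_2^2$ this gives, for each $n$, the coefficients $c(n, \mu_4)$ and $c(n, \mu_2^2)$ as exact fractions, which is the kind of table the next slide turns into closed forms in $n$.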

  21. experimental values of theoretical formulae (2)
     • each $c(n, \alpha)$ has a closed form, a quotient of polynomials in $n$ whose degrees cannot exceed $dg_x\,\beta$
     • general algorithm AeqB, implemented as gfun (Maple)
     • each denominator is a divisor of $n^p (n-1)^q$, where $p + q + 2 = dg_x\,\beta$ and $q + 1 = dg_m\,\beta$.
     • closed form of the polynomial numerator from a list of values: divided differences (Newton)
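Recovering a polynomial numerator from a list of values by Newton divided differences can be sketched as follows. The function names are illustrative, and the sample values are an assumption chosen here: they are $c(n, \mu_2^2) \times n(n-1)$ for $\beta = m_2^2$ at $n = 2, 3, 4, 5$, taken from the standard formula $E(m_2^2) = \mu_4/n + \mu_2^2\,(n^2 - 2n + 3)/(n(n-1))$.

```python
from fractions import Fraction

def divided_differences(xs, ys):
    """Newton coefficients f[x0], f[x0,x1], ..., computed in place."""
    coef = [Fraction(y) for y in ys]
    for level in range(1, len(xs)):
        for i in range(len(xs) - 1, level - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - level])
    return coef

def newton_to_monomial(xs, coef):
    """Expand sum_k coef[k] * prod_{i<k} (x - xs[i]) into ascending
    monomial coefficients, dropping trailing zeros."""
    poly = [Fraction(0)]
    for k in range(len(coef) - 1, -1, -1):
        # Horner step: poly <- poly * (x - xs[k]) + coef[k]
        shifted = [Fraction(0)] + poly
        scaled = [-xs[k] * c for c in poly] + [Fraction(0)]
        poly = [a + b for a, b in zip(shifted, scaled)]
        poly[0] += coef[k]
    while len(poly) > 1 and poly[-1] == 0:
        poly.pop()
    return poly

ns = [2, 3, 4, 5]
values = [3, 6, 11, 18]  # numerator values at n = 2..5
print(newton_to_monomial(ns, divided_differences(ns, values)))
```

A vanishing top divided difference signals that enough values have been supplied: here the cubic coefficient is zero, so the numerator is a quadratic in $n$.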

  22. new results and proof of correctness
     • Fisher (1929) started the process.
     • n = 11 now, n = 12 soon after Xmas (?)
     • Error prone process...
     • Test: the determinant of all the $\beta$ over all the $\alpha$ of same $dg_x$ splits into linear factors.
       $\Delta_4 = \frac{(n-2)(n-3)}{n\,(n-1)}$
       $\Delta_{11} = \frac{(n-2)^{14} (n-3)^{12} (n-4)^{10} (n-5)^{7} (n-6)^{5} (n-7)^{3} (n-8)^{2} (n-9)(n-10)}{n^{28}\,(n-1)^{27}}$
