Chi-squared (χ²) (1.10.5) and F-tests (9.5.2) for the variance of a normal distribution; χ² tests for goodness of fit and independence (3.5.4–3.5.5). Prof. Tesler, Math 283, Fall 2016.


  1. Chi-squared (χ²) (1.10.5) and F-tests (9.5.2) for the variance of a normal distribution. χ² tests for goodness of fit and independence (3.5.4–3.5.5). Prof. Tesler, Math 283, Fall 2016.

  2. Tests of means vs. tests of variances

Data x_1, ..., x_n with sample mean x̄ and sample variance s_X²; data y_1, ..., y_m with sample mean ȳ and sample variance s_Y².

Tests for mean:
One-sample test: H_0: μ = μ_0 vs. H_1: μ ≠ μ_0. Test statistic z = (x̄ − μ_0)/(σ/√n), or t = (x̄ − μ_0)/(s/√n) with df = n − 1.
Two-sample test: H_0: μ_X = μ_Y vs. H_1: μ_X ≠ μ_Y. Test statistic z = (x̄ − ȳ)/√(σ_X²/n + σ_Y²/m), or t = (x̄ − ȳ)/(s_p √(1/n + 1/m)) with df = n + m − 2.

Tests for variance:
One-sample test: H_0: σ² = σ_0² vs. H_1: σ² ≠ σ_0². Test statistic ("chi-squared") χ² = (n − 1)s²/σ_0² with df = n − 1.
Two-sample test: H_0: σ_X² = σ_Y² vs. H_1: σ_X² ≠ σ_Y². Test statistic F = s_Y²/s_X², with m − 1 and n − 1 degrees of freedom.
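As an added illustration (not part of the original slides), here is a minimal R sketch of the two variance-test statistics from the table above; R is used since the slides already quote qchisq/pchisq. The sample x is the one used later in these slides, while y is a made-up second sample for the two-sample case.

    # One-sample chi-squared statistic for H_0: sigma^2 = sigma0^2
    x <- c(650, 510, 470, 570, 410, 370)        # data used later in these slides
    sigma0sq <- 10000
    n <- length(x)
    chisq.stat <- (n - 1) * var(x) / sigma0sq   # compare to chi-squared with n - 1 df

    # Two-sample F statistic for H_0: sigmaX^2 = sigmaY^2
    y <- c(620, 480, 500, 550, 440)             # hypothetical second sample
    m <- length(y)
    F.stat <- var(y) / var(x)                   # compare to F with m - 1 and n - 1 df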

  3. Application: the fine print in the z- and t-tests

One-sample z-test, H_0: μ = μ_0 vs. H_1: μ ≠ μ_0. This assumes that you know the value of σ², say σ² = σ_0². A χ² test could be used to verify that the data is consistent with H_0: σ² = σ_0² instead of H_1: σ² ≠ σ_0².

Two-sample z-test, H_0: μ_X = μ_Y vs. H_1: μ_X ≠ μ_Y. This assumes that you know the values of σ_X² and σ_Y². Separate χ² tests for σ_X² and σ_Y² could be performed to verify consistency with the assumed values.

Two-sample t-test, H_0: μ_X = μ_Y vs. H_1: μ_X ≠ μ_Y. This assumes σ_X² = σ_Y² (but doesn't assume that this common value is known to you). An F-test could be used to verify that the data is consistent with H_0: σ_X² = σ_Y² instead of H_1: σ_X² ≠ σ_Y². If the variances are unequal, Welch's t-test can be used instead of the regular two-sample t-test (Ewens & Grant pp. 127–128).
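A small R sketch of the checks described on this slide; the two normal samples are simulated here purely for illustration and are not from the slides.

    set.seed(1)
    x <- rnorm(20, mean = 5, sd = 2)   # hypothetical sample X
    y <- rnorm(25, mean = 5, sd = 3)   # hypothetical sample Y
    var.test(x, y)                     # F test of H_0: sigmaX^2 = sigmaY^2
    t.test(x, y, var.equal = TRUE)     # ordinary two-sample t test (assumes equal variances)
    t.test(x, y)                       # Welch's t test (R's default; allows unequal variances)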

  4. The χ² ("Chi-squared") distribution

Used for confidence intervals and hypothesis tests on the unknown parameter σ² of the normal distribution, based on the test statistic s² (the sample variance). It has the same "degrees of freedom" as for the t distribution. Point these out on the graphs:

The chi-squared distribution with k degrees of freedom has
Range: [0, ∞)
Mean: μ = k
Mode: χ² = k − 2 for k ≥ 2 (the pdf is maximum at χ² = k − 2); χ² = 0 for k = 1, 2
Median: ≈ k(1 − 2/(9k))³; it lies between k − 2/3 and k and decreases asymptotically to k − 2/3 as k → ∞
Variance: σ² = 2k
PDF: x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)), i.e. a Gamma distribution with shape r = k/2 and rate λ = 1/2

Unlike z and t, the pdf for χ² is NOT symmetric.
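A quick R check of the facts listed above for one value of k (k = 8 is an arbitrary choice); this is an added sketch, not part of the slides.

    k <- 8
    c(mean = k, variance = 2 * k)            # mean and variance of chi-squared(k)
    qchisq(0.5, df = k)                      # exact median: 7.3441
    k * (1 - 2 / (9 * k))^3                  # median approximation from the slide
    dchisq(3, df = k)                        # chi-squared pdf at x = 3 ...
    dgamma(3, shape = k / 2, rate = 1 / 2)   # ... equals the Gamma(k/2, rate 1/2) pdf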

  5. [Plots of the χ² pdf for k = 1, 2, 3, and 8 degrees of freedom.]

The graphs for 1 and 2 degrees of freedom are decreasing:
k = 1: mean μ = 1, mode χ² = 0, median χ² = chi2inv(.5,1) = qchisq(.5,1) = 0.4549
k = 2: mean μ = 2, mode χ² = 0, median χ² = chi2inv(.5,2) = qchisq(.5,2) = 1.3863

The rest are "hump" shaped and skewed to the right:
k = 3: mean μ = 3, mode χ² = 1, median χ² = chi2inv(.5,3) = qchisq(.5,3) = 2.3660
k = 8: mean μ = 8, mode χ² = 6, median χ² = chi2inv(.5,8) = qchisq(.5,8) = 7.3441
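The four plots can be reproduced in R along these lines (an added sketch using base graphics):

    par(mfrow = c(2, 2))
    for (k in c(1, 2, 3, 8)) {
      curve(dchisq(x, df = k), from = 0, to = 2 * k + 4, n = 400,
            ylab = "pdf", main = paste("chi-squared, df =", k))
      abline(v = qchisq(0.5, df = k), lty = 2)   # mark the median
    }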

  6. χ² ("Chi-squared") distribution: cutoffs

[Plots for df = 5, α = 0.05: the two-sided acceptance region between χ²_{0.025,5} = 0.8312116 and χ²_{0.975,5} = 12.83250, and a left-sided critical region below χ²_{α,df}.]

Define χ²_{α,df} as the number where the cdf (the area to the left of it) is α: P(χ²_df ≤ χ²_{α,df}) = α. This is different notation than z_α and t_{α,df} (which put area α on the right), since the χ² pdf isn't symmetric.

Matlab and R:
χ²_{0.025,5} = chi2inv(.025,5) = qchisq(.025,5) = 0.8312
χ²_{0.975,5} = chi2inv(.975,5) = qchisq(.975,5) = 12.8325
chi2cdf(0.8312,5) = pchisq(0.8312,5) = 0.025
chi2cdf(12.8325,5) = pchisq(12.8325,5) = 0.975
chi2pdf(0.8312,5) = dchisq(0.8312,5) = 0.0665
chi2pdf(12.8325,5) = dchisq(12.8325,5) = 0.0100
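Both cutoffs can be obtained with a single vectorized call in R, using the same qchisq/pchisq/dchisq functions quoted above (an added sketch):

    alpha <- 0.05; df <- 5
    cuts <- qchisq(c(alpha / 2, 1 - alpha / 2), df)   # 0.8312116 12.8325020
    pchisq(cuts, df)                                  # 0.025 0.975
    dchisq(cuts, df)                                  # pdf heights at the cutoffs: 0.0665 0.0100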

  7. Two-sided cutoff

[Plot: the two-sided acceptance region for df = 5, α = 0.05, between χ²_{0.025,5} = 0.8312116 and χ²_{0.975,5} = 12.83250.]

The mean, median, and mode are different, so it may not be obvious which values of χ² are "more consistent" with the null H_0: σ² = 10000 vs. the alternative H_1: σ² ≠ 10000. Closer to the median of χ² is "more consistent" with H_0.

For two-sided hypothesis tests or confidence intervals with α = 5%, we still put 95% of the area in the middle and 2.5% at each end, but the pdf is not symmetric, so the lower and upper cutoffs are determined separately instead of being ± each other.

  8. Two-sided hypothesis test for variance

Test H_0: σ² = 10000 vs. H_1: σ² ≠ 10000 at significance level α = .05. (In general, replace 10000 by σ_0²; here, σ_0 = 100.)

Decision procedure:
1. Get a sample x_1, ..., x_n: 650, 510, 470, 570, 410, 370, with n = 6.
2. Calculate m = (x_1 + ... + x_n)/n and s² = (1/(n − 1)) Σ_{i=1}^n (x_i − m)².
   Here m = 496.67, s² = 10666.67, s = 103.28.
3. Calculate the test statistic χ² = (n − 1)s²/σ_0² = Σ_{i=1}^n (x_i − m)²/σ_0².
   Here χ² = (n − 1)s²/σ_0² = (6 − 1)(10666.67)/10000 = 5.33.
4. Accept H_0 if χ² is between χ²_{α/2, n−1} and χ²_{1−α/2, n−1}; reject H_0 otherwise.
   Here χ²_{.025,5} = .8312 and χ²_{.975,5} = 12.8325. Since χ² = 5.33 is between these, we accept H_0. (Or: there is insufficient evidence to reject σ² = 10000.)
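The four-step decision procedure, carried out in R on the slide's data (an added sketch):

    x <- c(650, 510, 470, 570, 410, 370)
    n <- length(x); sigma0sq <- 10000; alpha <- 0.05
    m <- mean(x); s2 <- var(x)                          # 496.67 and 10666.67
    chisq.stat <- (n - 1) * s2 / sigma0sq               # 5.33
    cuts <- qchisq(c(alpha / 2, 1 - alpha / 2), n - 1)  # 0.8312 and 12.8325
    chisq.stat >= cuts[1] && chisq.stat <= cuts[2]      # TRUE, so accept H_0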

  9. Doing the same test with a P-value

[Plot: the χ²_5 pdf with the median 4.35 marked. Values near the median support H_0 better; values in the tails support H_1 better. The observed value 5.33 leaves 37.69% of the area to its right, matched by the 37.69% below 3.50 on the left, with 24.61% in between.]

P(χ²_5 ≤ 5.33) = 0.6231 is the area left of 5.33 for χ² with 5 d.f. (Matlab: chi2cdf(5.33,5); R: pchisq(5.33,5).)

Values at least as extreme as this are those at the 62.31st percentile or higher, OR at the 37.69th percentile or lower, so P = (1 − .6231) + .3769 = 2(.3769) = 0.7539.

P > α (0.75 > 0.05), so accept H_0.

To turn a one-sided P-value p_1 into a two-sided P-value, use P = 2 min(p_1, 1 − p_1).
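The same two-sided P-value in R, using P = 2 min(p_1, 1 − p_1) (an added sketch):

    chisq.stat <- 5.33; df <- 5
    p1 <- pchisq(chisq.stat, df)   # 0.6231, area to the left of the statistic
    P  <- 2 * min(p1, 1 - p1)      # 0.7539, two-sided P-value
    P > 0.05                       # TRUE, so accept H_0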

  10. Two-sided 95% confidence interval for the variance

Continue with the data 650, 510, 470, 570, 410, 370, which has n = 6, m = 496.67, s² = 10666.67, s = 103.28.

Get bounds on σ² in terms of s² from the two-sided test:
0.95 = P(χ²_{.025,5} < χ² < χ²_{.975,5})
     = P(0.8312 < χ² < 12.8325)
     = P(0.8312 < (6 − 1)S²/σ² < 12.8325)
     = P((6 − 1)S²/0.8312 > σ² > (6 − 1)S²/12.8325)

A two-sided 95% confidence interval for the variance σ² is
((6 − 1)S²/12.8325, (6 − 1)S²/0.8312) = (4156.11, 64164.26).

A two-sided 95% confidence interval for σ is
(√((6 − 1)S²/12.8325), √((6 − 1)S²/0.8312)) = (64.47, 253.31).
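The same confidence intervals computed in R (an added sketch):

    x <- c(650, 510, 470, 570, 410, 370)
    n <- length(x); s2 <- var(x); alpha <- 0.05
    ci.var <- (n - 1) * s2 / qchisq(c(1 - alpha / 2, alpha / 2), n - 1)
    ci.var         # 4156.11 64164.26  (95% CI for sigma^2)
    sqrt(ci.var)   # 64.47 253.31      (95% CI for sigma)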

  11. Properties of the Chi-squared distribution

1. Definition of the Chi-squared distribution: Let Z_1, ..., Z_k be independent standard normal variables and let χ²_k = Z_1² + ... + Z_k². The pdf of the random variable χ²_k is the "chi-squared distribution with k degrees of freedom."

2. Pooling property: If U and V are independent χ² random variables with q and r degrees of freedom respectively, then U + V is a χ² random variable with q + r degrees of freedom.

3. Sample variance: Pick X_1, ..., X_n from a normal distribution N(μ, σ²). It turns out that
   Σ_{i=1}^n (X_i − X̄)²/σ² = (n − 1)S²/σ² = SS/σ²
   has a χ² distribution with df = n − 1, so we test on χ² = (n − 1)s²/σ_0².
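A quick Monte Carlo check of property 3 in R; the sample size, normal parameters, and number of replicates are arbitrary choices for illustration (an added sketch):

    set.seed(283)
    n <- 6; mu <- 500; sigma <- 100
    stats <- replicate(10000, {
      x <- rnorm(n, mu, sigma)
      (n - 1) * var(x) / sigma^2    # should follow chi-squared with n - 1 df
    })
    c(mean(stats), var(stats))      # close to n - 1 = 5 and 2(n - 1) = 10
    hist(stats, freq = FALSE, breaks = 50, main = "simulated (n-1)S^2/sigma^2")
    curve(dchisq(x, df = n - 1), add = TRUE)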
