

SLIDE 1

Interval Estimation

Edwin Leuven

SLIDE 2

Interval estimation

While an estimator may be unbiased or consistent, a given estimate will never equal the true value:

◮ this point estimate does not give a sense of "closeness", while
◮ the variance estimate does not give a sense of location

We could try to combine both in statements like "We are confident that θ lies somewhere between … and …", where we would like to

1. give a specific interval, and
2. be precise about how confident we are

This is the aim of a confidence interval (CI), which is a particular type of probability interval.

SLIDE 3

Interval estimation

Formally we define a confidence interval as follows:

Pr(L̂ < θ < Û) = 1 − α

where we construct estimates of a lower bound L̂ and an upper bound Û such that the interval [L̂, Û] covers the parameter of interest with probability 1 − α.

We call 1 − α the confidence level.

CIs are random intervals because they differ across random samples. The confidence level is thus a probability relative to the sampling distribution!

SLIDE 4

Probability intervals

We will consider probability intervals Pr(a < X < b) for a continuous r.v. X with density f(x) and support on the real line. This equals

Pr(a < X < b) = ∫_a^b f(x) dx = ∫_{−∞}^b f(x) dx − ∫_{−∞}^a f(x) dx = F(b) − F(a)

Note that since X is continuous, Pr(X = x) = 0 and

Pr(a < X < b) = Pr(a ≤ X < b) = Pr(a < X ≤ b) = Pr(a ≤ X ≤ b)

SLIDE 5

Probability intervals (X ∼ χ2(3))

[Figure: density of X ∼ χ²(3); the shaded area between a and b equals Pr(a < X < b) = F(b) − F(a)]

SLIDE 6

Probability intervals

We often need to compute either

◮ p = F(a) ≡ Pr(X ≤ a), or
◮ a = F⁻¹(p)

In R we can do this using the pxxx and qxxx functions, e.g. for the normal distribution:

pnorm(1.96)
## [1] 0.9750021
qnorm(0.975)
## [1] 1.959964
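As a sketch, the same pxxx idea recovers the shaded-area probability from the earlier χ²(3) figure; the endpoints a = 1 and b = 6 here are illustrative choices, not from the slides.

```r
# Pr(a < X < b) = F(b) - F(a) for X ~ chi-squared(3),
# computed with the CDF function pchisq
a <- 1; b <- 6
p <- pchisq(b, df = 3) - pchisq(a, df = 3)
p
```

The result can be cross-checked by numerically integrating the density dchisq over (a, b).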

SLIDE 7

What is the probability that our estimator is close to θ?

We can write this as:

Pr(|θ̂ − θ| < ε) = Pr(θ − ε < θ̂ < θ + ε)

[Figure: sampling density of θ̂, with the area between θ − ε and θ + ε shaded]

SLIDE 8

Interval estimation

The probability that our estimator is no further than ε from θ equals

Pr(θ − ε < θ̂ < θ + ε)

which is the probability that the r.v. θ̂ is in a fixed interval with unknown boundaries.

Note though that we can rewrite this as follows:

Pr(θ − ε < θ̂ < θ + ε) = Pr(θ̂ − ε < θ < θ̂ + ε)

which is the probability that the random interval (θ̂ − ε, θ̂ + ε) covers the fixed number θ.

How do we construct such intervals and compute their corresponding confidence levels?

SLIDE 9

CI for the mean – Normal data, variance known

Let X ∼ N(µ, σ²); then X̄ ∼ N(µ, σ²/n).

Now consider taking a random sample of size n; then

Pr(µ − ε < X̄ < µ + ε) = Pr(−ε/(σ/√n) < (X̄ − µ)/(σ/√n) < ε/(σ/√n))
                      = Φ(ε/(σ/√n)) − Φ(−ε/(σ/√n))
                      = 2Φ(ε/(σ/√n)) − 1 = 1 − α

For a given confidence level 1 − α we get the following ε:

ε = z_{1−α/2} · σ/√n, where z_{1−α/2} = Φ⁻¹(1 − α/2)

SLIDE 10

CI for the mean – Normal data, variance known

[Figure: standard normal CDF Φ(x), with quantiles z_{α/2} and z_{1−α/2} cutting off probability mass α/2 in each tail]

SLIDE 11

CI for the mean – Normal data, variance known

Since Pr(µ − ε < X̄ < µ + ε) = Pr(X̄ − ε < µ < X̄ + ε), the following is a (1 − α)100% CI:

(X̄ − z_{1−α/2} · σ/√n, X̄ + z_{1−α/2} · σ/√n)

For a given sample, µ will be either inside or outside this interval. But before drawing the sample, there is a (1 − α)100% chance that an interval constructed this way will cover the true parameter µ.

SLIDE 12

CI for the mean – Normal data, variance known

For example, if we set the confidence level at 1 − α = 0.90, then

z_{1−0.10/2} = Φ⁻¹(0.95) = −Φ⁻¹(0.05) ≈ 1.645

qnorm(.95)
## [1] 1.6448536

With n = 10 random draws from X ∼ N(µ, 1):

mean(rnorm(10))
## [1] -0.38315741

we get the following 90% confidence interval:

(−0.38 − 1.645 · 1/√10, −0.38 + 1.645 · 1/√10) ≈ (−0.90, 0.14)

SLIDE 13

CI for the mean – Normal data, variance known

We know that we need to cover the true parameter 90% of the time:

n = 10; nrep = 1e5; z = qnorm(0.95)
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  x = rnorm(n, 0, 1)
  m = mean(x); se = 1 / sqrt(n)
  ci0 = m - z * se; ci1 = m + z * se
  cover[i] = ci0 < 0 & 0 < ci1
}
mean(cover)
## [1] 0.90217

SLIDE 14

90% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 15

90% CI for the mean, n = 40

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 16

90% CI for the mean, n = 160

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 17

90% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 18

95% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 19

99% CI for the mean, n = 10

[Figure: 50 simulated CIs plotted against sample number; intervals that miss the true mean stand out]

SLIDE 20

Computing sample size

Suppose you plan to collect data and want to know the sample size needed to achieve a certain level of confidence in your interval estimate.

Since ε = z_{1−α/2} · σ/√n, solving for n we obtain

n = (z_{1−α/2} · σ / ε)²

Note that we need a larger sample (n increases) if

◮ we require greater precision (ε decreases)
◮ we want to be more confident (α decreases)
◮ there is more dispersion in the population (σ increases)
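The formula above can be sketched as a small helper in R; the function name and the example values are illustrative, not from the slides, and the result is rounded up since n must be an integer.

```r
# n = (z_{1-alpha/2} * sigma / eps)^2, rounded up to an integer
ci_sample_size <- function(eps, sigma, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  ceiling((z * sigma / eps)^2)
}
ci_sample_size(eps = 0.5, sigma = 2)                # 95% confidence
ci_sample_size(eps = 0.5, sigma = 2, alpha = 0.10)  # 90%: smaller n suffices
```

Lowering the confidence level shrinks the required n, while a larger σ or a smaller ε raises it, matching the bullets above.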

SLIDE 21

CI for the Variance – Normal data

The sample variance

S² = 1/(n − 1) · Σ_{i=1}^n (X_i − X̄)²

is our estimator for the population variance. When the X_i follow a normal distribution, then (n − 1)S²/σ² follows a so-called Chi-squared distribution with n − 1 degrees of freedom:

Chi-squared distribution
If Z_i ∼ N(0, 1) and V = Σ_{i=1}^k Z_i², then V ∼ χ²(k), where k are the degrees of freedom. E[V] = k and Var(V) = 2k.

SLIDE 22

CI for the Variance – χ2 distribution

[Figure: densities of χ²(1), χ²(2), χ²(3) and χ²(9)]

SLIDE 23

CI for the Variance – Normal data

Because the Chi-squared distribution is asymmetric, we need to make sure that we set the boundaries of the CI such that we have α/2 probability mass on each side:

Pr(c_{n−1, .025} < (n − 1)S²/σ² < c_{n−1, .975}) = 0.95

where the quantile c_{n−1, p} can be computed in R using qchisq(p, k). We can rewrite the above as

Pr((n − 1)S²/c_{n−1, .975} < σ² < (n − 1)S²/c_{n−1, .025}) = 0.95
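Applied to a single sample, the interval looks as follows; the seed and the parameters of the simulated sample are illustrative choices, not from the slides.

```r
# 95% CI for sigma^2 from one normal sample of size n
set.seed(1)                        # illustrative seed
n  <- 30
x  <- rnorm(n, mean = 0, sd = 2)   # true variance is 4
s2 <- var(x)
ci <- c((n - 1) * s2 / qchisq(.975, n - 1),
        (n - 1) * s2 / qchisq(.025, n - 1))
ci
```

Note that, unlike the CI for the mean, this interval is not symmetric around the point estimate s2, reflecting the asymmetry of the χ² distribution.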

SLIDE 24

CI for the Variance – Normal data

[Figure: sampling density of S²]

SLIDE 25

CI for the Variance – Normal data

If α = .05 we know that we need to cover the true parameter 95% of the time:

n = 4; nrep = 1e5
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  v = var(rnorm(n, 1, 10))
  ci0 = (n - 1) * v / qchisq(.975, n - 1)
  ci1 = (n - 1) * v / qchisq(.025, n - 1)
  cover[i] = ci0 < 100 & 100 < ci1
}
mean(cover)
## [1] 0.94955

SLIDE 26

CI for the mean – Normal data, variance unknown

We considered X ∼ N(µ, σ²) and assumed we knew σ². In practice we probably don't know σ², but we have an estimator

S² = 1/(n − 1) · Σ_{i=1}^n (X_i − X̄)²

A simple solution is to replace σ with S:

(x̄ − z_{1−α/2} · S/√n, x̄ + z_{1−α/2} · S/√n)

But how does this work?

SLIDE 27

CI for the mean – Normal data, variance unknown

If α = .1 we know that we need to cover the true parameter 90% of the time:

n = 10; nrep = 1e5; z = qnorm(.95)
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  x = rnorm(n, 1, 10)
  m = mean(x); se = sd(x) / sqrt(n)
  ci0 = m - z * se; ci1 = m + z * se
  cover[i] = ci0 < 1 & 1 < ci1
}
mean(cover)
## [1] 0.86477

SLIDE 28

Student’s t-distribution

It turns out that Z = (X̄ − µ)/(S/√n) does not follow a Normal but a so-called t-distribution with n − 1 degrees of freedom.

t distribution
If Z ∼ N(0, 1) and V ∼ χ²(k), Z and V are independent, and T = Z/√(V/k), then T ∼ t(k), where k are the degrees of freedom. E[T] = 0 (for k > 1) and Var(T) = k/(k − 2) (for k > 2).

SLIDE 29

Student’s t-distribution

[Figure: densities of N(0, 1) = t(∞), t(4) and t(1); the t densities have heavier tails]

SLIDE 30

Student’s t-distribution

Quantiles t_{1−α/2}(k) compared with the Normal quantiles; the t quantiles shrink toward the Normal ones as k grows (standard table values):

k (degrees of freedom)   t_{0.95}(k)   t_{0.975}(k)   t_{0.995}(k)
2                          2.920         4.303          9.925
5                          2.015         2.571          4.032
10                         1.812         2.228          3.169
20                         1.725         2.086          2.845
50                         1.676         2.009          2.678
∞ (Normal)                 1.645         1.960          2.576

SLIDE 31

CI for the mean – Normal data, variance unknown

We know that we need to cover the true parameter 95% of the time:

n = 10; nrep = 1e5; z = qt(.975, n - 1)
cover = rep(FALSE, nrep)
for(i in 1:nrep) {
  x = rnorm(n, 1, 10)
  m = mean(x); se = sd(x) / sqrt(n)
  ci0 = m - z * se; ci1 = m + z * se
  cover[i] = ci0 < 1 & 1 < ci1
}
mean(cover)
## [1] 0.95102

SLIDE 32

Computing sample size – unknown variance

We saw that

n = z_{1−α/2}² · σ² / ε²

This means that without knowing the population variance σ² we cannot set the sample size.

When X is Bernoulli we know that Var(X) = p(1 − p); while this depends on the unknown p, we do know that 0 ≤ p(1 − p) ≤ 0.25. This means that

n = z_{1−α/2}² · σ² / ε² ≤ z_{1−α/2}² · 0.25 / ε²
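As a worked example of the worst-case bound, take a desired half-width of ε = 0.03 at 95% confidence (an illustrative ε, not from the slides):

```r
# Worst-case sample size for a proportion, using p(1 - p) <= 0.25
eps   <- 0.03
z     <- qnorm(.975)
n_max <- ceiling(z^2 * 0.25 / eps^2)
n_max                            # on the order of a thousand observations
```

This bound is conservative: if p is known to be far from 0.5, p(1 − p) is smaller and a smaller sample suffices.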

SLIDE 33

CI for the mean – Non Normal data

Up until now we assumed that our data came from a Normal distribution. It turns out that as long as our sample is large enough, the Normal distribution is a good approximation thanks to the Central Limit Theorem.

Central Limit Theorem
Let X_1, …, X_n be i.i.d. random variables with E[X_i] = µ and Var(X_i) = σ² < ∞; then (X̄ − µ)/(σ/√n) → N(0, 1) in distribution.

Consider the sampling distribution of X̄ when X ∼ χ²(3).
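The CLT claim can be checked by simulation: standardized means of χ²(3) draws should behave like N(0, 1) draws. The sample sizes and seed below are illustrative choices, not from the slides.

```r
# Standardized means of chi-squared(3) samples vs. N(0, 1)
set.seed(3)                        # illustrative seed
n <- 200; nrep <- 1e4
mu <- 3; sigma2 <- 6               # E[X] and Var(X) for X ~ chi-squared(3)
z  <- replicate(nrep, (mean(rchisq(n, df = 3)) - mu) / sqrt(sigma2 / n))
c(mean(z), var(z))                 # close to 0 and 1
mean(abs(z) < qnorm(.975))         # close to 0.95
```

Even though each X_i is strongly skewed, the standardized mean is already close to normal at this n, which is what justifies normal-based CIs for non-normal data.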

SLIDE 34

CI for the mean – Non Normal data

[Figure: sampling distribution of X̄ for X ∼ χ²(3), centered at E[X]]

SLIDE 35

CI’s for statistics other than the mean

Beyond the scope of this course, but some pointers:

◮ when sampling distributions are known, use these, otherwise:

CIs when n is large
◮ use the bootstrap

CIs when n is small
◮ rely on non-parametric or permutation tests

SLIDE 36

Summary

Confidence intervals (CIs) are random intervals that cover the true parameter with a given probability. We call this probability the confidence level; 0.95 is commonly used.

Pay attention to the interpretation!

◮ Before drawing the sample, there is a (1 − α)100% chance that an interval constructed this way will cover the true parameter.

We saw how to construct CIs for the mean. For a given confidence level we can set CI widths by choosing the appropriate sample sizes.