Statistical Data Analysis (DS GA 1002: Statistical and Mathematical Models, slide presentation)


SLIDE 1

Statistical Data Analysis

DS GA 1002 Statistical and Mathematical Models

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall16 Carlos Fernandez-Granda

SLIDE 2

Descriptive statistics
Statistical estimation

SLIDE 3

Histogram

Technique to visualize one-dimensional data
Bin the range of the data, then count the number of instances in each bin
The width of the bins can be adjusted to yield higher or lower resolution
Approximation to the pmf or pdf if the data are iid
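The binning procedure can be sketched with NumPy (a minimal illustration; the dataset and bin count below are made up):

```python
import numpy as np

# Hypothetical one-dimensional dataset
data = np.array([2.1, 3.5, 3.7, 4.0, 5.2, 5.3, 5.9, 7.4, 8.8, 9.1])

# Bin the range of the data and count the instances in each bin;
# more bins yield higher resolution, fewer bins a smoother estimate
counts, edges = np.histogram(data, bins=4)

# Normalizing by (n * bin width) turns the counts into a pdf approximation
density = counts / (len(data) * np.diff(edges))
```

The normalized histogram integrates to one, which is what makes it an approximation to a pdf.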

SLIDE 4

Temperature in Oxford

[Histograms of temperatures in Oxford (degrees Celsius) for January and August]

SLIDE 5

GDP per capita

[Histogram of GDP per capita across countries, in thousands of dollars]

SLIDE 6

Empirical mean

Let {x1, x2, ..., xn} be a set of real-valued data
The empirical mean is defined as

    av(x1, x2, ..., xn) := (1/n) ∑_{i=1}^{n} xi

Temperature data: 6.73 °C in January and 21.3 °C in August
GDP per capita: $16,500
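The definition translates directly into code (a minimal sketch; the sample inputs are made up):

```python
import numpy as np

def empirical_mean(x):
    # av(x1, ..., xn) := (1/n) * sum of the x_i
    x = np.asarray(x, dtype=float)
    return x.sum() / len(x)
```

This coincides with `np.mean` for one-dimensional input.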

SLIDE 7

Empirical mean

Let {x1, x2, ..., xn} be a set of d-dimensional real-valued data
The empirical mean is defined as

    av(x1, x2, ..., xn) := (1/n) ∑_{i=1}^{n} xi

Centering a dataset by subtracting its empirical mean is a common preprocessing step

SLIDE 8

Empirical variance

Let {x1, x2, ..., xn} be a set of real-valued data
The empirical variance is defined as

    var(x1, x2, ..., xn) := (1/(n−1)) ∑_{i=1}^{n} (xi − av(x1, x2, ..., xn))²

The sample standard deviation is the square root of the empirical variance
Temperature data: 1.99 °C in January and 1.73 °C in August
GDP per capita: $25,300
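A direct implementation of the definition, with the 1/(n−1) normalization (a sketch; the test data are made up):

```python
import numpy as np

def empirical_variance(x):
    # var(x1, ..., xn) := (1/(n-1)) * sum of (x_i - mean)^2
    x = np.asarray(x, dtype=float)
    n = len(x)
    return ((x - x.sum() / n) ** 2).sum() / (n - 1)
```

This matches `np.var(x, ddof=1)`; NumPy's default `ddof=0` uses 1/n instead.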

SLIDE 9

Temperature dataset

In January the temperature in Oxford is around 6.73 ◦C give or take 2 ◦C

SLIDE 10

GDP dataset

Countries typically have a GDP per capita of about $16 500 give or take $25 300

SLIDE 11

Quantiles and percentiles

Let x(1) ≤ x(2) ≤ ... ≤ x(n) denote the ordered elements of a dataset {x1, x2, ..., xn}
The q quantile of the data, for 0 < q < 1, is x(⌈q(n+1)⌉)
The q quantile is also known as the 100q percentile
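The definition can be sketched as follows (the clamping of the index to [1, n] for extreme q is an assumption, since the formula ⌈q(n+1)⌉ can exceed n):

```python
import math

def quantile(data, q):
    # q quantile for 0 < q < 1: the ceil(q * (n + 1))-th smallest element
    xs = sorted(data)
    k = math.ceil(q * (len(xs) + 1))
    k = min(max(k, 1), len(xs))  # clamp to a valid index (edge handling is an assumption)
    return xs[k - 1]
```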

SLIDE 12

Quartiles and median

The 0.25 and 0.75 quantiles are the first and third quartiles
The 0.5 quantile is the empirical median
If n is even, the empirical median is usually set to (x(n/2) + x(n/2+1)) / 2
The difference between the third and first quartiles is the interquartile range (IQR)

SLIDE 13

Quartiles and median

◮ Temperature data (January):
  ◮ Sample mean: 6.73 °C
  ◮ Median: 6.80 °C
  ◮ Interquartile range: 2.9 °C

◮ Temperature data (August):
  ◮ Sample mean: 21.3 °C
  ◮ Median: 21.2 °C
  ◮ Interquartile range: 2.1 °C

SLIDE 14

Quartiles and median

◮ GDP per capita:
  ◮ Sample mean: $16,500 (71% of the countries have lower GDP per capita!)
  ◮ Median: $6,350
  ◮ Interquartile range: $18,200
  ◮ Five-number summary: $130, $1,960, $6,350, $20,100, $188,000

SLIDE 15

Boxplot of temperature data

[Boxplots of temperature (degrees Celsius) for January, April, August and November]

SLIDE 16

Boxplot of GDP data

[Boxplot of GDP per capita, in thousands of dollars]

SLIDE 17

Multidimensional data

Each dimension represents a feature
We can visualize two-dimensional data using scatter plots

SLIDE 18

Scatter plot

[Scatter plot of April temperatures against August temperatures]

SLIDE 19

Scatter plot

[Scatter plot of minimum temperature against maximum temperature]

SLIDE 20

Empirical covariance

Data: {(x1, y1), (x2, y2), ..., (xn, yn)}
The empirical covariance is defined as

    cov((x1, y1), ..., (xn, yn)) := (1/(n−1)) ∑_{i=1}^{n} (xi − av(x1, ..., xn)) (yi − av(y1, ..., yn))
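A direct implementation of the definition (a sketch; the test pairs are made up):

```python
import numpy as np

def empirical_cov(x, y):
    # cov := (1/(n-1)) * sum of (x_i - av(x)) * (y_i - av(y))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return ((x - x.mean()) * (y - y.mean())).sum() / (n - 1)
```

Note that `empirical_cov(x, x)` reduces to the empirical variance of x.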

SLIDE 21

Empirical correlation coefficient

Data: {(x1, y1), (x2, y2), ..., (xn, yn)}
The empirical correlation coefficient is defined as

    ρ((x1, y1), ..., (xn, yn)) := cov((x1, y1), ..., (xn, yn)) / (std(x1, ..., xn) std(y1, ..., yn))

Cauchy-Schwarz inequality: for any vectors a and b

    −1 ≤ aᵀb / (||a||₂ ||b||₂) ≤ 1

Consequence: −1 ≤ ρ((x1, y1), ..., (xn, yn)) ≤ 1
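The coefficient and its ±1 extremes under exact linear dependence can be checked with a short sketch (test data made up):

```python
import numpy as np

def empirical_corr(x, y):
    # rho := cov(x, y) / (std(x) * std(y)); always in [-1, 1] by Cauchy-Schwarz
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
```

The (n−1) factors in the covariance and the two standard deviations cancel, which is why they do not appear in the code.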

SLIDE 22

ρ = 0.269

[Scatter plot of April temperatures against August temperatures]

SLIDE 23

ρ = 0.962

[Scatter plot of minimum temperature against maximum temperature]

SLIDE 24

Empirical covariance matrix

Data: {x1, x2, ..., xn} (d features)
The empirical covariance matrix is defined as

    Σ(x1, ..., xn) := (1/(n−1)) ∑_{i=1}^{n} (xi − av(x1, ..., xn)) (xi − av(x1, ..., xn))ᵀ

The (i, j) entry, 1 ≤ i, j ≤ d, is given by

    Σ(x1, ..., xn)_{ij} = var((x1)_i, ..., (xn)_i)                        if i = j,
    Σ(x1, ..., xn)_{ij} = cov(((x1)_i, (x1)_j), ..., ((xn)_i, (xn)_j))    if i ≠ j.
SLIDE 29

Empirical variance in a certain direction

Let v be a unit-norm vector aligned with a direction of interest

    var(vᵀx1, ..., vᵀxn) = (1/(n−1)) ∑_{i=1}^{n} (vᵀxi − av(vᵀx1, ..., vᵀxn))²
                         = (1/(n−1)) ∑_{i=1}^{n} (vᵀ(xi − av(x1, ..., xn)))²
                         = vᵀ ( (1/(n−1)) ∑_{i=1}^{n} (xi − av(x1, ..., xn)) (xi − av(x1, ..., xn))ᵀ ) v
                         = vᵀ Σ(x1, ..., xn) v
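The identity var(vᵀx1, ..., vᵀxn) = vᵀΣv can be verified numerically (a sketch on synthetic data; the sample size, dimension and direction are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features (synthetic)

Sigma = np.cov(X, rowvar=False)          # empirical covariance matrix (1/(n-1) normalization)
v = np.array([1.0, 2.0, -1.0])
v = v / np.linalg.norm(v)                # unit-norm direction of interest

lhs = np.var(X @ v, ddof=1)              # empirical variance of the projected samples
rhs = v @ Sigma @ v                      # v^T Sigma v
```

The two quantities agree up to floating-point error, since the identity is exact.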

SLIDE 30

Eigendecomposition of the covariance matrix

The empirical covariance matrix is symmetric, so it admits an eigendecomposition

    Σ(x1, ..., xn) = U Λ Uᵀ = [u1 u2 ··· ud] diag(λ1, λ2, ..., λd) [u1 u2 ··· ud]ᵀ

SLIDE 31

Eigendecomposition of the covariance matrix

For any symmetric matrix A ∈ R^{n×n} with normalized eigenvectors u1, u2, ..., un and corresponding eigenvalues λ1 ≥ λ2 ≥ ... ≥ λn

    λ1 = max_{||v||₂ = 1} vᵀAv
    u1 = arg max_{||v||₂ = 1} vᵀAv
    λk = max_{||v||₂ = 1, v ⊥ u1, ..., u(k−1)} vᵀAv
    uk = arg max_{||v||₂ = 1, v ⊥ u1, ..., u(k−1)} vᵀAv
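The variational characterization can be checked on a small symmetric matrix (the matrix below is a made-up example with eigenvalues 3 and 1):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric matrix, eigenvalues 3 and 1

lam, U = np.linalg.eigh(A)               # eigh returns eigenvalues in ascending order
lam1, u1 = lam[-1], U[:, -1]             # largest eigenvalue and its eigenvector

# u1 attains the maximum of v^T A v over unit-norm vectors v
attained = u1 @ A @ u1
```

Any other unit-norm v gives vᵀAv ≤ λ1, which the tests probe with a random direction.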

SLIDE 32

Principal component analysis

Compute the eigenvectors of the empirical covariance matrix to determine the directions of maximum variation
Application: dimensionality reduction
Example: seeds from 3 varieties of wheat (Kama, Rosa and Canadian)
7 features: area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove
Aim: visualize in two dimensions
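The procedure can be sketched as follows. The seeds dataset itself is not reproduced here, so a synthetic stand-in of the same shape (210 samples, 7 features, an assumption) is used:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(210, 7))            # synthetic stand-in for the 210 x 7 seeds data

Xc = X - X.mean(axis=0)                  # center: subtract the empirical mean
Sigma = np.cov(Xc, rowvar=False)         # 7 x 7 empirical covariance matrix
lam, U = np.linalg.eigh(Sigma)           # eigenvalues in ascending order

W = U[:, ::-1][:, :2]                    # top-2 principal directions (descending eigenvalues)
Z = Xc @ W                               # two-dimensional projection for visualization
```

The variance of the first projected coordinate equals the largest eigenvalue, so it is at least that of the second.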

SLIDE 33

PCA dimensionality reduction

[Scatter plot of the seeds data projected onto the first and second principal components]

SLIDE 34

PCA dimensionality reduction

[Scatter plot of the seeds data projected onto the (d−1)th and dth principal components]

SLIDE 35

Descriptive statistics
Statistical estimation

SLIDE 36

Statistical estimation

Data: realization of an iid sequence
Aim: estimate a parameter associated to the underlying distribution
Frequentist viewpoint: the parameter is deterministic

SLIDE 37

Estimator

A deterministic function of the data x1, x2, ..., xn:

    yn := h(x1, x2, ..., xn)

SLIDE 38

Estimator

Under the iid assumption

    Y(n) := h(X(1), X(2), ..., X(n))

◮ Does Y(n) converge to γ as n → ∞?
◮ For finite n, what is the probability that γ is approximated by the estimator up to a certain accuracy?

SLIDE 39

Sampling from a population

Population of m individuals
We are interested in a feature associated to each person (cholesterol level, salary, who they are voting for, ...)
The feature has k possible values {z1, z2, ..., zk}
mj = number of people for whom the feature equals zj

SLIDE 40

Sampling from a population

Data: values of the feature for a subset of individuals, modeled as a sequence X
If individuals are chosen uniformly at random with replacement,

    p_{X(i)}(zj) = P(the feature for the ith chosen person equals zj) = mj / m,   1 ≤ j ≤ k

The sequence is iid

SLIDE 41

Mean square error

The mean square error (MSE) of an estimator Y that approximates a parameter γ is

    MSE(Y) := E((Y − γ)²)

SLIDE 45

Bias-variance decomposition

    MSE(Y) = E((Y − γ)²)
           = E((Y − E(Y) + E(Y) − γ)²)
           = E((Y − E(Y))²) + (E(Y) − γ)² + 2 (E(Y) − γ) E(Y − E(Y))    [E(Y − E(Y)) = E(Y) − E(Y) = 0]
           = E((Y − E(Y))²) + (E(Y) − γ)²

where the first term is the variance and the second is the squared bias
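The decomposition can be checked by simulation. A deliberately biased estimator (a shrunk empirical mean, an assumption chosen so that both terms are nonzero) illustrates it:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 5.0                              # true parameter (made up)

# Each row of `samples` is one independent realization of a dataset of 20 points;
# the estimator is 0.9 times the empirical mean, so it is biased
samples = rng.normal(gamma, 2.0, size=(100000, 20))
Y = 0.9 * samples.mean(axis=1)

mse = np.mean((Y - gamma) ** 2)
variance = np.var(Y)                     # E((Y - E(Y))^2), ddof=0 to match the identity
bias_sq = (Y.mean() - gamma) ** 2        # squared bias, here close to (0.1 * gamma)^2 = 0.25
```

MSE = variance + bias² holds exactly for the empirical moments, not just in expectation.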
SLIDE 46

Unbiased estimator

An estimator Y that approximates γ is unbiased if and only if E (Y ) = γ

SLIDE 50

Empirical mean is unbiased

The empirical mean of an iid sequence X with mean µ,

    Y(n) := (1/n) ∑_{i=1}^{n} X(i),

is unbiased:

    E(Y(n)) = E( (1/n) ∑_{i=1}^{n} X(i) ) = (1/n) ∑_{i=1}^{n} E(X(i)) = µ

The empirical variance is also unbiased
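Both claims can be probed by simulation (a sketch; the distribution, µ and sample sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 3.0

# 200000 independent realizations of the empirical mean of n = 10 iid N(mu, 1) samples;
# averaging them approximates E(Y(n)), which should equal mu
Y = rng.normal(mu, 1.0, size=(200000, 10)).mean(axis=1)
mean_of_estimator = Y.mean()

# The empirical variance with the 1/(n-1) normalization should average to the true variance 1.0
mean_of_variance = rng.normal(mu, 1.0, size=(200000, 10)).var(axis=1, ddof=1).mean()
```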

SLIDE 51

Consistency

An estimator Y(n) := h(X(1), X(2), ..., X(n)) that approximates γ is consistent if it converges to γ as n → ∞ in mean square, with probability one, or in probability

SLIDE 52

Consistency

The empirical mean of an iid sequence X with mean µ,

    Y(n) := (1/n) ∑_{i=1}^{n} X(i),

is consistent by the law of large numbers if the variance is bounded

SLIDE 53

Estimating the average height

Population of 25,000 people
Goal: estimate the average height from iid samples X
The average of the population is the mean of the iid sequence

    E(X(i)) := ∑_{j=1}^{m} P(person j is chosen) · (height of person j) = (1/m) ∑_{j=1}^{m} hj = av(h1, ..., hm)

SLIDE 54

Estimating the average height

[Histogram of the heights in the population, in inches]

SLIDE 55

Estimating the average height

[Empirical mean of the height samples (inches) as n grows from 1 to 1000, compared with the true mean]

SLIDE 56

Empirical median is consistent

The empirical median of an iid sequence X is consistent even if the mean is not well defined or the variance is unbounded
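The contrast is easy to see with a standard Cauchy sequence, whose mean is undefined and whose median is 0 (a sketch; the sample size and seed are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(100000)          # iid Cauchy: mean undefined, variance unbounded

sample_median = np.median(x)             # consistent: converges to the distribution median, 0
sample_mean = np.mean(x)                 # unreliable: the law of large numbers does not apply
```

Rerunning with different seeds, the median stays near 0 while the mean can land almost anywhere.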

SLIDE 57

Proof

Aim: show that for any ε > 0

    lim_{n→∞} P(|Y(n) − γ| ≥ ε) = 0

We will prove that

    lim_{n→∞} P(Y(n) ≥ γ + ε) = 0

The same argument allows us to prove

    lim_{n→∞} P(Y(n) ≤ γ − ε) = 0

SLIDE 59

Proof

Assuming n is odd, Y(n) equals the ((n+1)/2)th ordered element
The event Y(n) ≥ γ + ε implies that at least (n+1)/2 of the elements are larger than γ + ε
For each individual X(i), the probability that X(i) > γ + ε is

    p := 1 − F_{X(i)}(γ + ε) = 1/2 − ε′

Distribution of the number of X(i) above γ + ε (each with probability p)? Binomial Bn with parameters n and p

SLIDE 66

Proof

    P(Y(n) ≥ γ + ε) ≤ P((n+1)/2 or more samples ≥ γ + ε)
                    = P(Bn ≥ (n+1)/2)
                    = P(Bn − np ≥ (n+1)/2 − np)
                    ≤ P(|Bn − np| ≥ nε′ + 1/2)
                    ≤ Var(Bn) / (nε′ + 1/2)²        by Chebyshev's inequality
                    = np(1−p) / (n²(ε′ + 1/(2n))²)
                    = p(1−p) / (n(ε′ + 1/(2n))²)

which tends to 0 as n → ∞

SLIDE 67

Cauchy iid sequence: empirical mean

[Moving average of 50 Cauchy iid samples, compared with the median of the sequence]

SLIDE 68

Cauchy iid sequence: empirical mean

[Moving average of 500 Cauchy iid samples, compared with the median of the sequence]

SLIDE 69

Cauchy iid sequence: empirical mean

[Moving average of 5000 Cauchy iid samples, compared with the median of the sequence]

SLIDE 70

Cauchy iid sequence: empirical median

[Moving median of 50 Cauchy iid samples, compared with the median of the sequence]

SLIDE 71

Cauchy iid sequence: empirical median

[Moving median of 500 Cauchy iid samples, compared with the median of the sequence]

SLIDE 72

Cauchy iid sequence: empirical median

[Moving median of 5000 Cauchy iid samples, compared with the median of the sequence]

SLIDE 73

Consistency

The empirical variance is consistent if the fourth moment is bounded
The covariance matrix converges under similar conditions

SLIDE 74

PCA: n = 5

[Principal directions of the true covariance compared with the empirical covariance, estimated from 5 samples]

SLIDE 76

PCA: n = 20

SLIDE 77

PCA: n = 100

SLIDE 78

Confidence intervals

Aim: quantify the accuracy of an estimator for a fixed number of data
A 1 − α confidence interval I for a parameter γ satisfies

    P(γ ∈ I) ≥ 1 − α,   where 0 < α < 1

SLIDE 79

Confidence interval for the mean of an iid sequence

Let X be an iid sequence with mean µ and variance σ² ≤ b² for some b > 0
For any 0 < α < 1,

    In := [Yn − b/√(αn), Yn + b/√(αn)],   Yn := av(X(1), X(2), ..., X(n)),

is a 1 − α confidence interval for µ
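The interval is straightforward to compute (a sketch; the usage numbers are the bear example that appears later in the deck: Y = 200, b = 880, n = 300, α = 0.05):

```python
import math

def chebyshev_ci(sample_mean, b, n, alpha):
    # I_n = [Y_n - b / sqrt(alpha * n), Y_n + b / sqrt(alpha * n)]
    half_width = b / math.sqrt(alpha * n)
    return sample_mean - half_width, sample_mean + half_width

lo, hi = chebyshev_ci(200.0, 880.0, 300, 0.05)   # roughly [-27.2, 427.2]
```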

SLIDE 83

Proof

    P(µ ∈ [Yn − b/√(αn), Yn + b/√(αn)]) = 1 − P(|Yn − µ| > b/√(αn))
                                        ≥ 1 − αn Var(Yn) / b²        by Chebyshev's inequality
                                        = 1 − α σ² / b²
                                        ≥ 1 − α

SLIDE 84

Bears in Yosemite

Aim: estimate the average weight of bears in Yosemite
A scientist captures 300 bears; their average weight is Y := 200 lbs
We need a bound on the variance
Maximum weight = 880 lbs
For a randomly selected bear X,

    σ² = E(X²) − E²(X) ≤ E(X²) ≤ 880²   because X ≤ 880 =: b

SLIDE 85

Bears in Yosemite

    [Y − b/√(αn), Y + b/√(αn)] = [−27.2, 427.2]

is a 95% confidence interval for the average weight of the whole population

SLIDE 86

Central limit theorem with empirical standard deviation

Let X be an iid discrete sequence with mean µ such that its variance and fourth moment E(X(i)⁴) are bounded. The sequence

    √n (av(X(1), ..., X(n)) − µ) / std(X(1), ..., X(n))

converges in distribution to a standard Gaussian random variable
SLIDE 87

Q function

For x > 0,

    Q(x) := ∫_{x}^{∞} (1/√(2π)) exp(−u²/2) du

If U is a standard Gaussian random variable and y < 0,

    P(U < y) = Q(−y)
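The Q function can be evaluated through the complementary error function in the standard library, since Q(x) = erfc(x/√2)/2:

```python
import math

def Q(x):
    # Q(x) = integral from x to infinity of the standard Gaussian pdf,
    # computed via the complementary error function: Q(x) = erfc(x / sqrt(2)) / 2
    return 0.5 * math.erfc(x / math.sqrt(2))
```

Sanity checks: Q(0) = 1/2, Q(1.95) ≈ 0.025 (the value used in the bear example), and Q(−y) = 1 − Q(y).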

SLIDE 88

Approximate confidence interval for the mean

Let X be an iid discrete sequence with mean µ such that its variance and fourth moment E(X(i)⁴) are bounded. For any 0 < α < 1,

    In := [Yn − (Sn/√n) Q⁻¹(α/2), Yn + (Sn/√n) Q⁻¹(α/2)],

    Yn := av(X(1), X(2), ..., X(n)),   Sn := std(X(1), X(2), ..., X(n)),

is an approximate 1 − α confidence interval for µ, i.e.

    P(µ ∈ In) ≈ 1 − α

SLIDE 92

Approximate confidence interval for the mean

    P(µ ∈ In) = 1 − P(Yn > µ + (Sn/√n) Q⁻¹(α/2)) − P(Yn < µ − (Sn/√n) Q⁻¹(α/2))
              = 1 − P(√n (Yn − µ)/Sn > Q⁻¹(α/2)) − P(√n (Yn − µ)/Sn < −Q⁻¹(α/2))
              ≈ 1 − 2 Q(Q⁻¹(α/2))
              = 1 − α
SLIDE 93

Bears in Yosemite

The empirical standard deviation is Sn = 100 lbs
Given that Q(1.95) ≈ 0.025,

    [Y − (Sn/√n) Q⁻¹(α/2), Y + (Sn/√n) Q⁻¹(α/2)] ≈ [188.8, 211.3]

is an approximate 95% confidence interval
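The computation behind the interval can be sketched as follows (usage numbers from the slide: Y = 200, Sn = 100, n = 300, and Q⁻¹(0.025) ≈ 1.95):

```python
import math

def approx_ci(sample_mean, sample_std, n, z):
    # I_n = [Y_n - (S_n / sqrt(n)) * z, Y_n + (S_n / sqrt(n)) * z], with z = Q^{-1}(alpha / 2)
    half_width = z * sample_std / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

lo, hi = approx_ci(200.0, 100.0, 300, 1.95)   # roughly [188.7, 211.3]
```

Note how much tighter this is than the Chebyshev interval [−27.2, 427.2], which only uses the crude bound b = 880.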

SLIDE 94

Interpreting confidence intervals

The average weight is between 188.8 and 211.3 lbs with probability 0.95

SLIDE 95

Interpreting confidence intervals

If we repeat the process of sampling the population and computing the confidence interval, then the true parameter will lie in the interval 95% of the time

SLIDE 96

Estimating the average height

We compute 40 confidence intervals of the form

    In := [Yn − (Sn/√n) Q⁻¹(α/2), Yn + (Sn/√n) Q⁻¹(α/2)],

    Yn := av(X(1), X(2), ..., X(n)),   Sn := std(X(1), X(2), ..., X(n)),

for 1 − α = 0.95 and different values of n
SLIDE 97

Estimating the average height: n = 50

[40 confidence intervals computed from n = 50 samples, plotted against the true mean]

SLIDE 98

Estimating the average height: n = 200

[40 confidence intervals computed from n = 200 samples, plotted against the true mean]

SLIDE 99

Estimating the average height: n = 1000

[40 confidence intervals computed from n = 1000 samples, plotted against the true mean]