Probability and Statistics for Computer Science (PowerPoint PPT Presentation)


SLIDE 1

Probability and Statistics for Computer Science

Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.15.2020. Credit: wikipedia

"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." H. G. Wells
SLIDE 2

Last Lecture

✺ Review of sample mean and confidence interval
✺ Bootstrap simulation of other sample statistics
✺ Hypothesis test intro
SLIDE 3

Q.

[Figure: histogram of the bootstrap sample medians; x-axis: sample_median, y-axis: Frequency]

✺ Given the histogram of the bootstrap samples' statistic, we want to get its 95% confidence interval. Where is the left side threshold?

  • A. 0.025 quantile
  • B. 0.05 quantile
  • C. 0.975 quantile
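As a quick sketch of how such a 95% bootstrap interval is read off in practice (the dataset here is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=200.0, size=200)  # hypothetical sample

# Bootstrap the sample median: resample with replacement, take the median each time
medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))
    for _ in range(3000)
])

# 95% CI: thresholds at the 0.025 (left side) and 0.975 quantiles
lo, hi = np.quantile(medians, [0.025, 0.975])
print(lo, hi)
```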
SLIDE 4

Objectives

✺ Hypothesis test
✺ Chi-square test
✺ Maximum Likelihood Estimation
SLIDE 5

A hypothesis

✺ Ms. Smith's vote percentage is 55%. This is what we want to test, often called the null hypothesis H0.
✺ Should we reject this hypothesis given the poll data (sample mean 51%)?
SLIDE 6

Fraction of “less extreme” samples

✺ Assuming the hypothesis H0 is true
✺ Define a test statistic:

  x = ((sample mean) − (hypothesized value)) / (standard error)

✺ Since N > 30, x should come from a standard normal
✺ So, the fraction of "less extreme" samples is:

  f = (1/√(2π)) ∫_{−|x|}^{|x|} exp(−u²/2) du
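This two-sided fraction can be evaluated directly with the standard normal CDF; a minimal sketch using only the standard library (via math.erf):

```python
import math

def less_extreme_fraction(x):
    """f = (1/sqrt(2*pi)) * integral of exp(-u^2/2) from -|x| to |x|,
    which equals Phi(|x|) - Phi(-|x|) = erf(|x| / sqrt(2))."""
    return math.erf(abs(x) / math.sqrt(2))

x = 2.0  # hypothetical test statistic value
f = less_extreme_fraction(x)
p = 1 - f  # the "extreme fraction", i.e. the two-sided p-value
print(f, p)
```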

SLIDE 7

Rejection region of null hypothesis H0

✺ Assuming the hypothesis H0 is true
✺ Define a test statistic:

  x = ((sample mean) − (hypothesized value)) / (standard error)

✺ Since N > 30, x should come from a standard normal
✺ Rejection region (2α)

Credit: J. Orloff et al.
SLIDE 8

P-value: Rejection region ("the extreme fraction")

✺ It is conventional to report the p-value of a hypothesis test
✺ Since N > 30, x should come from a standard normal
✺ Rejection region (2α). By convention: 2α = 0.05. That is: if p < 0.05, reject H0

  p = 1 − f = 1 − (1/√(2π)) ∫_{−|x|}^{|x|} exp(−u²/2) du
SLIDE 9

p-value: election polling

✺ H0: Ms. Smith's vote percentage is 55%
✺ The sample mean is 51% and the stderr is 1.44%
✺ The test statistic:

  x = (51 − 55) / 1.44 = −2.7778

✺ And the p-value for the test is:

  p = 1 − (1/√(2π)) ∫_{−2.7778}^{2.7778} exp(−u²/2) du = 0.00547 < 0.05

✺ So we reject the hypothesis
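Checking the arithmetic above numerically (again using math.erf for the standard normal integral):

```python
import math

x = (51 - 55) / 1.44                 # test statistic, about -2.7778
f = math.erf(abs(x) / math.sqrt(2))  # fraction of "less extreme" samples
p = 1 - f                            # two-sided p-value, about 0.0055
print(x, p)
```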

SLIDE 10

Hypothesis test if N < 30

✺ Q: what distribution should we use to test the hypothesis of a sample mean if N < 30?

  • A. Normal distribution
  • B. t-distribution with degree = 30
  • C. t-distribution with degree = N
  • D. t-distribution with degree = N − 1
SLIDE 11

The use and misuse of p-value

✺ p-value use in scientific practice
✺ Usually used to reject the null hypothesis that the data is random noise
✺ Common practice is that p < 0.05 is considered significant evidence for something interesting
✺ Caution about p-value hacking
✺ Rejecting the null hypothesis doesn't mean the alternative is true
✺ p < 0.05 is arbitrary and often is not enough for controlling the false positive phenomenon
SLIDE 12

Be wary of one tailed p-values

✺ The one-tailed p-value should only be considered when the realized sample mean or difference will for sure fall on only one side of the distribution.
✺ Sometimes scientists are tempted to use a one-tailed test because it gives a smaller p-value. But this is bad statistics!
SLIDE 13

Chi-square distribution

✺ If Z1, Z2, ..., Zm are independent variables with the standard normal distribution, then

  X = Z1² + Z2² + ... + Zm² = Σ_{i=1}^{m} Zi²

has a Chi-square distribution with degree of freedom m, X ∼ χ²(m)

✺ We can test the goodness of fit for a model using a statistic C against this distribution, where

  C = Σ_{i=1}^{m} (fo(εi) − ft(εi))² / ft(εi)
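A minimal sketch of the goodness-of-fit statistic C, using made-up die-roll counts (fo) against a fair-die model (ft):

```python
# Hypothetical observed counts for 60 rolls of a die (fo) vs the
# fair-die model's expected counts (ft = 10 per face).
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

# C = sum over cells of (observed - expected)^2 / expected
C = sum((fo - ft) ** 2 / ft for fo, ft in zip(observed, expected))
print(C)  # compare against a chi-square with 6 - 1 = 5 degrees of freedom
```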
SLIDE 14

Independence analysis using Chi-square

✺ Given the two-way table, test whether the column and row are independent

           Boy   Girl   Total
  Grades   117   130    247
  Popular   50    91    141
  Sports    60    30     90
  Total    227   251    478
SLIDE 15

Independence analysis using Chi-square

✺ The theoretical expected values if independent

           Boy        Girl       Total
  Grades   117.29916  129.70084  247
  Popular   66.96025   74.03975  141
  Sports    42.74059   47.25941   90
  Total    227        251        478
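The expected counts under independence are (row total × column total) / grand total, which reproduces the table above:

```python
import numpy as np

counts = np.array([[117, 130],
                   [50, 91],
                   [60, 30]])  # rows: Grades/Popular/Sports, cols: Boy/Girl

row_totals = counts.sum(axis=1)
col_totals = counts.sum(axis=0)
grand = counts.sum()

# Outer product of the margins, divided by the grand total
expected = np.outer(row_totals, col_totals) / grand
print(expected)
```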
SLIDE 16

The degree of the chi-square distribution for the two way table

✺ The degree of freedom for the chi-square distribution for an r by c table is (r−1) × (c−1), where r > 1 and c > 1

✺ Because the degree df = n − 1 − p = rc − 1 − (r−1) − (c−1) = (r−1) × (c−1), which is 2 for the table above

See textbook Pg 171-172

n is the number of cells of data; p is the number of unknown parameters
SLIDE 17

Chi-square test for the popular kid data

✺ The Chi-statistic: 21.455
✺ P-value: 2.193e-05
✺ It's very unlikely the two categories are independent

  chisq.test(data_BG)
  Pearson's Chi-squared test
  data: data_BG
  X-squared = 21.455, df = 2, p-value = 2.193e-05
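The call above is R; a scipy equivalent for the same table (scipy applies Yates' correction only for 2x2 tables, so it matches here):

```python
import numpy as np
from scipy.stats import chi2_contingency

data_BG = np.array([[117, 130],
                    [50, 91],
                    [60, 30]])

# Returns the statistic, p-value, degrees of freedom, and expected counts
stat, p, df, expected = chi2_contingency(data_BG)
print(stat, df, p)
```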
SLIDE 18
  • Q. What is the degree of freedom for this?

✺ The following 2-way table for a chi-square test has a degree of freedom equal to:

  • A. 8
  • B. 6
  • C. 3
  • D. 4

  Class  Male  Female
  1st    118     4
  2nd    154    13
  3rd    387    89
  Crew   670     3
SLIDE 19
  • Q. What is the degree of freedom for this?

✺ The following 2-way table for a chi-square test has a degree of freedom equal to:

  • A. 8
  • B. 6
  • C. 3 ✓ (df = (4−1) × (2−1) = 3)
  • D. 4

  Class  Male  Female
  1st    118     4
  2nd    154    13
  3rd    387    89
  Crew   670     3
SLIDE 20

Chi-square test is very versatile

✺ The Chi-square test is so versatile that it can be utilized in many ways, either for discrete data or for continuous data via intervals
✺ Please check out the worked-out examples in the textbook and read more about its applications.
SLIDE 21

Maximum likelihood estimation

SLIDE 22

The parameter estimation problem

✺ Suppose we have a dataset that we know comes from a distribution (e.g. Binomial, Geometric, or Poisson, etc.)
✺ What is the best estimate of the parameters (θ or θs) of the distribution?
✺ Examples:
  ✺ For binomial and geometric distributions, θ = p (probability of success)
  ✺ For Poisson and exponential distributions, θ = λ (intensity)
  ✺ For normal distributions, θ could be μ or σ².
SLIDE 23

Motivation: Poisson example

✺ Suppose we have data on the number of babies born each hour in a large hospital
✺ We can assume the data comes from a Poisson distribution
✺ What is your best estimate of the intensity λ?

  hour         1   2   ...  N
  # of babies  k1  k2  ...  kN

Credit: David Varodayan
SLIDE 24

Maximum likelihood estimation (MLE)

✺ We write the probability of seeing the data D given parameter θ:

  L(θ) = P(D|θ)

✺ The likelihood function L(θ) is not a probability distribution
✺ The maximum likelihood estimate (MLE) of θ is

  θ̂ = arg max_θ L(θ)
SLIDE 25

Why is L(θ) not a probability distribution?

  • A. It doesn't give the probability of all the possible θ values.
  • B. We don't know whether the sum or integral of L(θ) over all possible θ values is one or not.
  • C. Both.
SLIDE 26

Likelihood function: Binomial example

✺ Suppose we have a coin with unknown probability of coming up heads
✺ We toss it N times and observe k heads
✺ We know that this data comes from a binomial distribution
✺ What is the likelihood function L(θ) = P(D|θ)?
SLIDE 27

Likelihood function: binomial example

✺ Suppose we have a coin with unknown probability of coming up heads
✺ We toss it N times and observe k heads
✺ We know that this data comes from a binomial distribution
✺ The likelihood function L(θ) = P(D|θ) is:

  L(θ) = (N choose k) θ^k (1 − θ)^(N−k)
SLIDE 28

MLE derivation: binomial example

  L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

In order to find θ̂ = arg max_θ L(θ), we set:

  dL(θ)/dθ = 0
SLIDE 29

MLE derivation: binomial example

  L(θ) = (N choose k) θ^k (1 − θ)^(N−k)
SLIDE 30

MLE derivation: binomial example

  L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

  dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0
SLIDE 31

MLE derivation: binomial example

  L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

  dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

  k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)
SLIDE 32

MLE derivation: binomial example

  L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

  dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

  k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)

  k − kθ = Nθ − kθ
SLIDE 33

MLE derivation: binomial example

  L(θ) = (N choose k) θ^k (1 − θ)^(N−k)

  dL(θ)/dθ = (N choose k) (k θ^(k−1) (1 − θ)^(N−k) − θ^k (N − k)(1 − θ)^(N−k−1)) = 0

  k θ^(k−1) (1 − θ)^(N−k) = θ^k (N − k)(1 − θ)^(N−k−1)

  k − kθ = Nθ − kθ

  θ̂ = k / N    (the MLE of p)
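A quick numerical sanity check of θ̂ = k/N, scanning the binomial likelihood on a grid (the values of N and k are made up):

```python
from math import comb

N, k = 10, 3  # hypothetical: 3 heads in 10 tosses

def L(theta):
    """Binomial likelihood L(theta) = C(N, k) * theta^k * (1 - theta)^(N - k)."""
    return comb(N, k) * theta**k * (1 - theta)**(N - k)

# Evaluate L on a fine grid over (0, 1) and find the maximizer
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=L)
print(best)  # the grid maximizer sits at k/N = 0.3
```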
SLIDE 34

Likelihood function: geometric example

✺ Suppose we have a die with unknown probability of coming up six
✺ We roll it and it comes up six for the first time on the kth roll
✺ We know that this data comes from a geometric distribution
✺ What is the likelihood function L(θ) = P(D|θ)? Assume θ is p.
SLIDE 35

MLE derivation: geometric example

  L(θ) = (1 − θ)^(k−1) θ
SLIDE 36

MLE derivation: geometric example

  L(θ) = (1 − θ)^(k−1) θ

  dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0
SLIDE 37

MLE derivation: geometric example

  L(θ) = (1 − θ)^(k−1) θ

  dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

  (1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ
SLIDE 38

MLE derivation: geometric example

  L(θ) = (1 − θ)^(k−1) θ

  dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

  (1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ

  1 − θ = kθ − θ
SLIDE 39

MLE derivation: geometric example

  L(θ) = (1 − θ)^(k−1) θ

  dL(θ)/dθ = (1 − θ)^(k−1) − (k − 1)(1 − θ)^(k−2) θ = 0

  (1 − θ)^(k−1) = (k − 1)(1 − θ)^(k−2) θ

  1 − θ = kθ − θ

  θ̂ = 1 / k    (the MLE of p)
SLIDE 40

MLE with data from IID trials

✺ If the dataset D = {x} comes from IID trials
✺ Each xi is one observed result from an IID trial

  L(θ) = P(D|θ) = ∏_{xi∈D} P(xi|θ)
SLIDE 41

Q: MLE with data from IID trials

✺ If the dataset D = {x} comes from IID trials, L(θ) = P(D|θ) = ∏_{xi∈D} P(xi|θ)
✺ Why is the above function defined by the product?

  • A. IID samples are independent
  • B. Each trial has an identical probability function
  • C. Both.
SLIDE 42

MLE with data from IID trials

✺ If the dataset D = {x} comes from IID trials, L(θ) = P(D|θ) = ∏_{xi∈D} P(xi|θ)
✺ The likelihood function is hard to differentiate in general, except for the binomial and geometric cases.
✺ Clever trick: take the (natural) log
SLIDE 43

Log-likelihood function

✺ Since log is a strictly increasing function, we can aim to maximize the log-likelihood function:

  θ̂ = arg max_θ L(θ) = arg max_θ log L(θ)

✺ The log-likelihood function is usually much easier to differentiate:

  log L(θ) = log P(D|θ) = log ∏_{xi∈D} P(xi|θ) = Σ_{xi∈D} log P(xi|θ)
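Maximizing log L instead of L gives the same θ̂, which is easy to check numerically on a small made-up IID Bernoulli dataset:

```python
import math

data = [1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical IID coin flips (5 heads of 8)

def L(theta):
    """Likelihood: product of per-trial probabilities."""
    p = 1.0
    for x in data:
        p *= theta if x == 1 else (1 - theta)
    return p

def logL(theta):
    """Log-likelihood: sum of per-trial log probabilities."""
    return sum(math.log(theta if x == 1 else (1 - theta)) for x in data)

grid = [i / 1000 for i in range(1, 1000)]
print(max(grid, key=L), max(grid, key=logL))  # both land at 5/8 = 0.625
```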
SLIDE 44

Log-likelihood function: Poisson example

✺ Suppose we have data on the number of babies born each hour in a large hospital
✺ We can assume the data comes from a Poisson distribution with intensity λ
✺ What is the log likelihood function log L(θ)?

  hour         1   2   ...  N
  # of babies  k1  k2  ...  kN
SLIDE 45

Log-likelihood function: Poisson example

  L(θ) = ∏_{i=1}^{N} e^(−θ) θ^(ki) / ki!

  log L(θ) = log( ∏_{i=1}^{N} e^(−θ) θ^(ki) / ki! ) = Σ_{i=1}^{N} log( e^(−θ) θ^(ki) / ki! )

           = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)
SLIDE 46

MLE : Poisson example

  log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)
SLIDE 47

MLE : Poisson example

  log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

  d/dθ log L(θ) = 0  ⇒  Σ_{i=1}^{N} (−1 + ki/θ − 0) = 0
SLIDE 48

MLE : Poisson example

  log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

  d/dθ log L(θ) = 0  ⇒  Σ_{i=1}^{N} (−1 + ki/θ − 0) = 0

  −N + (Σ_i ki)/θ = 0
SLIDE 49

MLE : Poisson example

  log L(θ) = Σ_{i=1}^{N} (−θ + ki log θ − log ki!)

  d/dθ log L(θ) = 0  ⇒  Σ_{i=1}^{N} (−1 + ki/θ − 0) = 0

  −N + (Σ_i ki)/θ = 0

  θ̂ = (Σ_i ki) / N    (the MLE of λ)
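θ̂ = (Σ ki)/N is just the sample mean of the counts; a grid search over the Poisson log-likelihood confirms it (the counts are made up):

```python
import math

counts = [2, 4, 3, 5, 1, 3, 2, 4]  # hypothetical babies born per hour

def logL(theta):
    """Poisson log-likelihood: sum of -theta + k*log(theta) - log(k!)."""
    return sum(-theta + k * math.log(theta) - math.log(math.factorial(k))
               for k in counts)

grid = [i / 100 for i in range(1, 1001)]  # theta in (0, 10]
best = max(grid, key=logL)
print(best)  # the maximizer is sum(counts)/len(counts)
```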
SLIDE 50

MLE for normal distribution

✺ Suppose we model the dataset D = {x} as normally distributed
✺ What should be the likelihood function? Is the method of modeling the same as for the Poisson distribution?

  • A. Yes  B. No
SLIDE 51

MLE for normal distribution

✺ Suppose we model the dataset D = {x} as normally distributed
✺ What should be the likelihood function? Is the method of modeling the same as for the Poisson distribution? Yes and no. The idea is similar, but the normal distribution is continuous, so we need to use the probability density instead.
SLIDE 52

MLE for normal distribution

✺ Suppose we model the dataset D = {x} as normally distributed
✺ The likelihood function of a normal distribution:

  L(µ, σ) = ∏_{i=1}^{n} (1/(√(2π) σ)) exp(−(xi − µ)²/(2σ²))
SLIDE 53

MLE for normal distribution

✺ Suppose we model the dataset D = {x} as normally distributed
✺ There are two parameters to estimate: μ and σ
✺ If we fix σ and set θ = μ:

  θ̂ = (1/N) Σ_{i=1}^{N} xi

✺ If we fix μ and set θ = σ:

  θ̂ = √( (1/N) Σ_{i=1}^{N} (xi − µ)² )
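The two closed-form estimates above in numpy; note the MLE of σ divides by N, not the unbiased N − 1 (the dataset is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=1000)  # hypothetical dataset

mu_hat = x.mean()                                 # (1/N) * sum(x_i)
sigma_hat = np.sqrt(((x - mu_hat) ** 2).mean())   # MLE: divides by N

print(mu_hat, sigma_hat)
```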
SLIDE 54

Drawbacks of MLE

✺ Maximizing some likelihood or log-likelihood functions is mathematically hard
✺ If there are very few data items, the MLE estimate may be very unreliable
  ✺ If we observe 3 heads in 10 coin tosses, should we accept that p(heads) = 0.3?
  ✺ If we observe 0 heads in 2 coin tosses, should we accept that p(heads) = 0?
SLIDE 55

Confidence intervals for MLE estimates

✺ An MLE parameter estimate θ̂ depends on the data that was observed
✺ We can construct a confidence interval for θ̂ using the parametric bootstrap
✺ Use the distribution with parameter θ̂ to generate a large number of bootstrap samples
✺ From each "synthetic" dataset, re-estimate the parameter using MLE
✺ Use the histogram of these re-estimates to construct a confidence interval
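The steps above can be sketched for the Poisson case (the data and true λ here are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.poisson(lam=3.0, size=50)  # hypothetical observed counts

lam_hat = data.mean()  # MLE of lambda

# Parametric bootstrap: simulate from Poisson(lam_hat), re-estimate each time
boot = np.array([
    rng.poisson(lam=lam_hat, size=data.size).mean()
    for _ in range(2000)
])

lo, hi = np.quantile(boot, [0.025, 0.975])  # 95% CI from the histogram
print(lam_hat, (lo, hi))
```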
SLIDE 56

Assignments

✺ Finish Chapter 7 of the textbook
✺ Next time: Maximum likelihood estimate, Bayesian inference
SLIDE 57

Additional References

✺ Robert V. Hogg, Elliot A. Tanis and Dale L. Zimmerman. "Probability and Statistical Inference"
✺ Morris H. DeGroot and Mark J. Schervish. "Probability and Statistics"
SLIDE 58

We are interested in comparing sample means

✺ Are the average daily body temperatures of the two beavers the same?
✺ We need to model the difference between two sample means
SLIDE 59

How do we model the difference between two samples means?

✺ We know that when the sample size N is large, the sample mean random variable approaches normal*.
✺ So our problem becomes finding the model of the difference between two normally distributed random variables.

* Assume the daily temperatures at different times are independent.
SLIDE 60

Background: sum of independent normals

✺ We know

  X1 ∼ normal(µ1, σ1²)
  X2 ∼ normal(µ2, σ2²)

✺ The sum of X1 and X2 is still normal (proof omitted, ref. …)

  X1 + X2 ∼ normal(µ1 + µ2, σ1² + σ2²) ?
SLIDE 61

Background: sum of independent normals

✺ We know

  X1 ∼ normal(µ1, σ1²)
  X2 ∼ normal(µ2, σ2²)

✺ So

  X1 + X2 ∼ normal(µ1 + µ2, σ1² + σ2²)

✺ By the linearity of expected value and the sum rule of variance for two independent random variables.
SLIDE 62

Background: sum of independent normals

✺ We know

  X1 ∼ normal(µ1, σ1²)
  X2 ∼ normal(µ2, σ2²)

✺ So

  X1 + X2 ∼ normal(µ1 + µ2, σ1² + σ2²)

✺ By the properties:

  E[X1 + X2] = E[X1] + E[X2]
  var[X1 + X2] = var[X1] + var[X2]
SLIDE 63

Difference of independent normals

✺ We know

  X1 ∼ normal(µ1, σ1²)
  X2 ∼ normal(µ2, σ2²)

✺ The difference of X1 and X2 is still normal (proof omitted)

  X1 − X2 ∼ ?
SLIDE 64

Difference of independent normals

✺ We know

  X1 ∼ normal(µ1, σ1²)
  X2 ∼ normal(µ2, σ2²)

✺ So

  X1 − X2 ∼ normal(µ1 − µ2, σ1² + σ2²)  **

✺ By the linearity of expected value, the sum rule of variance for two independent random variables, and the scaling property of variance.
SLIDE 65

Derivation of the mean and variance of difference of independent normals

✺ Because

  X1 − X2 ∼ normal(µ1 − µ2, σ1² + σ2²)  **
SLIDE 66

Derivation of the mean and variance of difference of independent normals

✺ Because

  E[X1 − X2] = E[X1] − E[X2] = µ1 − µ2
  var[X1 − X2] = var[X1 + (−X2)] = var[X1] + var[−X2] = var[X1] + var[X2] = σ1² + σ2²
SLIDE 67

Derivation of the mean and variance of difference of independent normals

✺ Because

  E[X1 − X2] = E[X1] − E[X2] = µ1 − µ2
  var[X1 − X2] = var[X1 + (−X2)] = var[X1] + var[−X2] = var[X1] + var[X2] = σ1² + σ2²
SLIDE 68

Derivation of the mean and variance of difference of independent normals

✺ Because

  E[X1 − X2] = E[X1] − E[X2] = µ1 − µ2
  var[X1 − X2] = var[X1 + (−X2)] = var[X1] + var[−X2] = var[X1] + var[X2] = σ1² + σ2²
SLIDE 69

Derivation of the mean and variance of difference of independent normals

✺ Because

  E[X1 − X2] = E[X1] − E[X2] = µ1 − µ2
  var[X1 − X2] = var[X1 + (−X2)] = var[X1] + var[−X2] = var[X1] + var[X2] = σ1² + σ2²

using the scaling property var[c · X2] = c² var[X2]
SLIDE 70

Derivation of the mean and variance of difference of independent normals

✺ Because

  E[X1 − X2] = E[X1] − E[X2] = µ1 − µ2
  var[X1 − X2] = var[X1 + (−X2)] = var[X1] + var[−X2] = var[X1] + var[X2] = σ1² + σ2²
SLIDE 71

Derivation of the mean and variance of difference of independent normals

✺ Because

  E[X1 − X2] = E[X1] − E[X2] = µ1 − µ2
  var[X1 − X2] = var[X1 + (−X2)] = var[X1] + var[−X2] = var[X1] + var[X2] = σ1² + σ2²

we get

  X1 − X2 ∼ normal(µ1 − µ2, σ1² + σ2²)  **
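A simulation check of the difference-of-normals result (the parameters are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
mu1, s1, mu2, s2 = 2.0, 1.0, 5.0, 2.0
n = 200_000

x1 = rng.normal(mu1, s1, n)
x2 = rng.normal(mu2, s2, n)
d = x1 - x2

# Empirical mean and variance should match mu1 - mu2 and s1^2 + s2^2
print(d.mean(), d.var())
```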
SLIDE 72

Now we are ready to check the differences between sample means

✺ Because sample means are roughly normal when N is large:

  X1 − X2 ∼ normal(µ1 − µ2, σ1² + σ2²)  **
SLIDE 73

The difference between two sample means

✺ Suppose we draw samples from two populations {x} and {y}
✺ From a sample of size kx from {x}, we get sample mean X̄(kx)
✺ From a sample of size ky from {y}, we get sample mean Ȳ(ky)
SLIDE 74

The difference between two sample means

✺ Define the random variable

  D = X̄(kx) − Ȳ(ky)

as the difference between the sample means
✺ If we hypothesize that popmean({x}) = popmean({y}), then

  E[D] = E[X̄(kx)] − E[Ȳ(ky)] = 0
SLIDE 75

Standard error of the difference between two sample means

✺ Recall the standard error is roughly the standard deviation of a sample mean
✺ By the property of variance of the difference between two independent normals:

  var[D] ≈ stderr({x})² + stderr({y})²

  std[D] ≈ √( stderr({x})² + stderr({y})² )
         = √( stdunbiased({x})²/kx + stdunbiased({y})²/ky )
         = stderr[D]
SLIDE 76

P-value for testing the equality of two means

✺ Define the test statistic

  g = (mean({x}) − mean({y})) / stderr(D)

✺ If kx ≥ 30 and ky ≥ 30:

  p = 1 − f = 1 − (1/√(2π)) ∫_{−|g|}^{|g|} exp(−u²/2) du
SLIDE 77

P-value: Rejection region ("the extreme fraction")

✺ It is conventional to report the p-value of a hypothesis test
✺ Since N > 30, g should come from a standard normal
✺ Rejection region (2α). By convention: 2α = 0.05. That is: if p < 0.05, reject H0

  p = 1 − f = 1 − (1/√(2π)) ∫_{−|g|}^{|g|} exp(−u²/2) du
SLIDE 78

Comparing the body temperatures of two beavers

✺ kx = 114 for beaver {x} and ky = 100 for beaver {y}
✺ mean({x}) = 36.86219
✺ mean({y}) = 37.5967
✺ stderr({x}) = stdunbiased({x})/√114
✺ stderr({y}) = stdunbiased({y})/√100
✺ stderr(D) = √( stderr({x})² + stderr({y})² ) = 0.04821181
SLIDE 79

Comparing the body temperatures of two beavers

✺ Hypothesis H0: the mean temperatures of the two beavers are the same
✺ The test statistic g = (36.86219 − 37.5967) / 0.04821181 = −15.235
✺ The p-value

  p = 1 − (1/√(2π)) ∫_{−15.235}^{15.235} exp(−u²/2) du ≃ 0

✺ So we can reject the hypothesis that the mean temperatures are the same
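Reproducing the beaver comparison numerically, using only the summary numbers from the slide:

```python
import math

mean_x, mean_y = 36.86219, 37.5967
stderr_D = 0.04821181

g = (mean_x - mean_y) / stderr_D         # about -15.235
p = 1 - math.erf(abs(g) / math.sqrt(2))  # two-sided p-value, numerically 0
print(g, p)
```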
SLIDE 80

What if N < 30?

✺ There are general solutions for either N ≥ 30 or N < 30 if the data sets are random samples from normally distributed data.
✺ The difference between sample means can be modeled as a t-distribution with degree (kx + ky − 2) when the population standard deviations are the same
✺ Or the difference between sample means can be approximated with a t-distribution with another proper degree of freedom.
✺ There are built-in t-test procedures in Python and R
SLIDE 81

Compare the two mean temperatures of two beavers with t.test

✺ Hypothesis H0: the mean temperatures of the two beavers are the same
✺ p < 2.2e-16, so we also reject the hypothesis
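In Python the analogous call is scipy's Welch t-test. The raw beaver measurements aren't in the slides, so this sketch uses synthetic data whose means and sizes mimic the summary statistics (the standard deviations 0.19 and 0.57 are assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical temperature samples mimicking the two beavers
x = rng.normal(36.86, 0.19, 114)
y = rng.normal(37.60, 0.57, 100)

# equal_var=False gives Welch's t-test, like R's default t.test
t_stat, p_val = stats.ttest_ind(x, y, equal_var=False)
print(t_stat, p_val)
```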
SLIDE 82

See you next time!