SLIDE 1

MATH 185 – University of California San Diego – Ery Arias-Castro 1 / 30

Univariate Continuous Data

MATH 185 – Introduction to Computational Statistics
University of California, San Diego
Instructor: Ery Arias-Castro
http://math.ucsd.edu/~eariasca/math185.html

SLIDE 2

Lung dysfunction in workers in the detergent industry

We consider the following dataset (quoted in Larsen & Marx, exercise 5.3.1).

> str(bacillus)
'data.frame': 19 obs. of 1 variable:
 $ ratio: num 0.61 0.7 0.63 0.76 0.67 0.72 0.64 0.82 0.88 0.82 ...
NULL

The FEV1/VC ratio is a measure of lung capacity.

FEV1: forced expiratory volume
VC: vital capacity
The normal FEV1/VC ratio is 0.80.

This ratio was measured for certain workers in the detergent industry exposed to a Bacillus subtilis enzyme.

SLIDE 3

Boxplot

A basic plot that helps visualize how the data is spread out.

[Figure: boxplot of ratio; axis from 0.60 to 0.85]

SLIDE 4

Boxplot

The middle box represents the inter-quartile range and contains the middle 50% of the data.

The upper edge (hinge) of the box indicates the 75th percentile of the data set.

The lower hinge indicates the 25th percentile.

The line within the box indicates the median of the data.

The ends of the lines, or "whiskers", indicate the minimum and maximum data values, unless outliers are present, in which case each whisker extends to the most extreme data point lying within 1.5 times the inter-quartile range of the box.

The points outside the ends of the whiskers are outliers or suspected outliers.
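
The boxplot rules above can be sketched in base R. This is only an illustration under stated assumptions: the vector x holds the ten ratios visible in the str() output (the other nine are elided, so the numbers will not match the slides), and quantile() with its default method stands in for the hinges, which boxplot() itself computes slightly differently through fivenum().

```r
# Illustrative values only: the ten ratios shown in the str() output,
# not the full 19-observation dataset.
x <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)

q <- quantile(x, c(0.25, 0.50, 0.75))  # lower quartile, median, upper quartile
iqr <- IQR(x)                          # inter-quartile range

# Fences: 1.5 * IQR beyond the box on each side.
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr

# Whiskers reach the most extreme points inside the fences;
# anything beyond the fences is flagged as a (suspected) outlier.
whiskers <- range(x[x >= lower_fence & x <= upper_fence])
outliers <- x[x < lower_fence | x > upper_fence]
```

For these ten values no point falls outside the fences, so the whiskers coincide with the minimum and maximum.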
SLIDE 5

Boxplot

The following table provides similar information.

> summary(ratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.6100  0.7100  0.7800  0.7663  0.8350  0.8800

SLIDE 6

Histogram

The histogram is more detailed: it approximates the actual distribution of the data.

[Figure: "Histogram of ratio"; x-axis: ratio (0.60 to 0.90), y-axis: Frequency]

SLIDE 7

Histogram

A bin’s width is the range it covers.

A bin’s height is proportional to the number of points that fall within that range.

The histogram is a (piecewise constant) approximation of the population’s probability density function.
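
To see why the histogram approximates a density, one can rescale the bin counts so that the bars have total area 1. The sketch below assumes bins of width 0.05 on [0.60, 0.90], chosen here for illustration, and uses only the ten ratios visible in the str() output rather than the full dataset.

```r
# Illustrative values only: the ten ratios shown in the str() output.
x <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)

# Bin the data without drawing anything; the breaks are an assumption.
h <- hist(x, breaks = seq(0.60, 0.90, by = 0.05), plot = FALSE)

widths <- diff(h$breaks)                  # width of each bin
dens <- h$counts / (length(x) * widths)   # height on the density scale

# The piecewise-constant estimate integrates to 1.
total_area <- sum(dens * widths)
```

The rescaled heights dens agree with the density component that hist() returns.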

SLIDE 8

Boxplot v. Histogram

> library(UsingR)
> simple.hist.and.boxplot(ratio)

[Figure: "Histogram of x" shown together with a boxplot of the same data; axis from 0.60 to 0.90]

SLIDE 9

Main question

Are workers exposed to a Bacillus subtilis enzyme more likely to suffer from lung dysfunction?

Since we know the normal level of the FEV1/VC ratio (0.80), we want to compare the observations to this baseline.

SLIDE 10

Testing the Mean – Student’s t-Test

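Let µ be the population mean. Consider H0 : µ = 0.80 versus H1 : µ < 0.80.

The Student t-test rejects for large negative values of T, where

\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}}, \quad \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \quad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

and µ is set to its value under H0, namely 0.80.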

SLIDE 11

Testing the Mean – Student’s t-Test

> t.test(ratio, mu = 0.8)

        One Sample t-test

data:  ratio
t = -1.7091, df = 18, p-value = 0.1046
alternative hypothesis: true mean is not equal to 0.8
95 percent confidence interval:
 0.7249096 0.8077219
sample estimates:
mean of x
0.7663158

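The p-value is based on the assumption that the observations are an i.i.d. sample from a normal distribution. Note that the output above is for the two-sided alternative, which is the default in t.test. For the one-sided alternative H1 : µ < 0.80, pass alternative = "less"; since t is negative here, this halves the p-value.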
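
As a rough check of how the statistic and the one-sided p-value fit together, the sketch below computes T by hand and compares it with t.test called with alternative = "less". It uses only the ten ratios visible in the str() output, so its numbers differ from the output on the slide; the variable names are chosen here for illustration.

```r
# Illustrative values only: the ten ratios shown in the str() output.
x <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
mu0 <- 0.80                 # value of the mean under H0
n <- length(x)

# T = (sample mean - mu0) / (S / sqrt(n)); sd() uses the n - 1 denominator.
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# One-sided p-value for H1: mu < mu0, from the t distribution with n - 1 df.
p_less <- pt(t_stat, df = n - 1)

# The same test through the built-in function.
fit <- t.test(x, mu = mu0, alternative = "less")
```

Here fit$statistic and fit$p.value agree with t_stat and p_less.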

SLIDE 12

Quantile-Quantile Plot

A quantile-quantile plot helps visualize whether a sample comes from a given translation/scale (location-scale) family of distributions. Here we compare with the normal family.

[Figure: "Normal Q-Q Plot"; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

If the points lie close to the line, then the normality assumption is reasonable.
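
The coordinates behind such a plot can be reproduced by hand, which makes clear what "close to the line" refers to. This is a sketch using only the ten ratios visible in the str() output; the reference line through the quartiles mirrors what qqline draws by default, and the variable names are illustrative.

```r
# Illustrative values only: the ten ratios shown in the str() output.
x <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
n <- length(x)

# Theoretical standard normal quantiles paired with the ordered sample.
theoretical <- qnorm(ppoints(n))
sample_q <- sort(x)

# Reference line through the first and third quartiles.
q_x <- quantile(x, c(0.25, 0.75))
q_z <- qnorm(c(0.25, 0.75))
slope <- unname(diff(q_x) / diff(q_z))
intercept <- unname(q_x[1] - slope * q_z[1])

# The built-in function returns the same coordinates (without plotting).
qq <- qqnorm(x, plot.it = FALSE)
```

If the sample were normal, slope and intercept would roughly estimate its standard deviation and mean.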

SLIDE 13

Wilcoxon Sign Test

Let m be the population median. Consider H0 : m = m0 versus H1 : m < m0.

– The Wilcoxon sign test rejects for small values of N+, where

\[ N_+ = \#\{i : X_i > m_0\} \]

– In fact, for large n, the following statistic Z is approximately standard normal under H0:

\[ Z = \frac{N_+ + N_0/2 - n/2}{\sqrt{n/4}}, \quad N_0 = \#\{i : X_i = m_0\} \]

Here we get a p-value of 0.4092729.
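
The sign test is simple enough to compute directly. The sketch below uses only the ten ratios visible in the str() output, so it does not reproduce the p-value quoted above, and the slides do not say whether that value came from the exact binomial distribution or the normal approximation; both are shown here as assumptions about how one might proceed.

```r
# Illustrative values only: the ten ratios shown in the str() output.
x <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
m0 <- 0.80                  # hypothesized median under H0
n <- length(x)

n_plus <- sum(x > m0)       # N+: observations above m0
n_zero <- sum(x == m0)      # N0: observations equal to m0

# Exact version: after dropping ties, N+ is Binomial(n - N0, 1/2) under H0,
# and small values of N+ support H1: m < m0.
p_exact <- pbinom(n_plus, size = n - n_zero, prob = 0.5)

# Large-sample version: the Z statistic from the slide.
z <- (n_plus + n_zero / 2 - n / 2) / sqrt(n / 4)
p_approx <- pnorm(z)
```

The exact p-value matches binom.test(n_plus, n - n_zero, p = 0.5, alternative = "less").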

SLIDE 14

Wilcoxon Signed Rank Test

Let F be the population’s cumulative distribution function, which we assume to be symmetric. Let m be the median of F.

Say we want to test H0 : m = m0 versus H1 : m < m0.

– The Wilcoxon signed-rank test rejects for small values of W, where

\[ W = \sum_{i=1}^{n} Y_i R_i, \quad R_i = \operatorname{rank}(|X_i - m_0|), \quad Y_i = \operatorname{sign}(X_i - m_0) \]

(Actually, R returns V = W/2 + n(n + 1)/4, the sum of the ranks R_i over the observations with X_i > m0.)

– In fact, for large n, the following statistic Z is approximately standard normal under H0:

\[ Z = \frac{V - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} = \frac{W}{\sqrt{n(n+1)(2n+1)/6}} \]

Here we get p-value = 0.07084
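
The relation between the signed-rank sum and the statistic reported by R can be checked numerically. This sketch uses only the ten ratios visible in the str() output, so it does not reproduce the p-value above. The normal approximation shown ignores the correction for tied ranks that wilcox.test applies, so only the statistic, not the p-value, is compared with the built-in function.

```r
# Illustrative values only: the ten ratios shown in the str() output.
x <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
m0 <- 0.80                       # hypothesized median under H0

d <- x - m0
d <- d[d != 0]                   # observations equal to m0 are dropped
n <- length(d)
r <- rank(abs(d))                # ranks of |X_i - m0| (ties get average ranks)

w_signed <- sum(sign(d) * r)     # sum of signed ranks
v_pos <- sum(r[d > 0])           # sum of ranks of positive differences

# Large-sample statistic; small (negative) values support H1: m < m0.
z <- (v_pos - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_approx <- pnorm(z)

# Built-in test; exact = FALSE requests the normal approximation.
fit <- wilcox.test(x, mu = m0, alternative = "less", exact = FALSE)
```

The statistic V in fit equals v_pos, and v_pos equals w_signed / 2 + n * (n + 1) / 4.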
