
SLIDE 1

We have recorded the average high temperature in October at the San Diego airport, each year from 1950 through 2016.

Source: http://www.wrcc.dri.edu/cgi-bin/cliMAIN.pl?ca7740

[Histogram: Average October High Temperature in San Diego (1950-2016). x-axis: average high temperature (degrees Fahrenheit); y-axis: frequency]

The distribution is approximately symmetric. Mean = 74.0, Median = 74.0, Q1 = 72.0, Q3 = 75.6, s = 2.7. For symmetric distributions, the mean and median are nearly the same.

SLIDE 2

We have recorded the amount of precipitation in October at the San Diego airport, each year from 1950 through 2016.

Source: http://www.wrcc.dri.edu/cgi-bin/cliMAIN.pl?ca7740

[Histogram: October Precipitation Totals in San Diego (1950-2016). x-axis: precipitation (inches); y-axis: frequency]

The distribution is skewed to the right, with one outlier (4.98 inches of rain in October 2004). Mean = 0.43, Median = 0.14, Q1 = 0.01, Q3 = 0.58, s = 0.75. For distributions skewed to the right, the mean tends to be larger than the median, and the upper quartile tends to be farther from the median than the lower quartile.
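A minimal sketch of how these summary statistics can be computed with Python's standard `statistics` module. The numbers below are hypothetical right-skewed values, not the airport record, and quartile conventions differ between packages, so Q1 and Q3 may not match the software used for these slides exactly.

```python
import statistics


def summarize(data):
    """Mean, median, quartiles and sample SD of a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points. Its default
    # "exclusive" method is one of several quartile conventions, so other
    # software can report slightly different Q1 and Q3.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "s": statistics.stdev(data),   # sample standard deviation (divides by n - 1)
    }


# Hypothetical right-skewed monthly rainfall totals (inches), not the real data.
rain = [0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.5, 1.2, 4.9]
stats = summarize(rain)
print(stats["mean"] > stats["median"])                                  # True
print(stats["q3"] - stats["median"] > stats["median"] - stats["q1"])    # True
```

The one large value pulls the mean well above the median, while the median and quartiles barely move.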

SLIDE 3

We have the Shannon Biodiversity Index for 632 soil samples collected from Scripps Coastal Reserve by BILD 4 students.

Source: data provided to the instructor by Professor Stanley Lo

[Histogram: Biodiversity in soil samples from Scripps Coastal Reserve. x-axis: Shannon Biodiversity Index; y-axis: frequency]

The distribution is skewed to the left. Mean = 3.08, Median = 3.13, Q1 = 2.96, Q3 = 3.24. For distributions skewed to the left, the mean tends to be smaller than the median, and the upper quartile tends to be closer to the median than the lower quartile.

SLIDE 4

Data on eruptions of the Old Faithful geyser in Yellowstone.

Source: http://rweb.stat.umn.edu/R/library/alr3/help/oldfaith

[Histogram: Old Faithful Eruptions (October, 1980). x-axis: duration of eruption (seconds); y-axis: frequency]

The distribution is bimodal. Mean = 209.9, Median = 240, Q1 = 129.8, Q3 = 268, s = 68.4. For bimodal distributions, summary statistics do not describe the distribution well, so one should always plot the data.

SLIDE 5

Examples of Correlations

[Four scatterplots, with r = 0.01, r = 0.46, r = 0.76, and r = 1]

SLIDE 6

Examples of Correlations

[Four scatterplots, with r = -0.33, r = -0.69, r = -0.98, and r = -1]

SLIDE 7

[Scatterplot; r = -0.05]

Here there is a strong association but almost no correlation. Correlation measures only linear association and should not be used to describe nonlinear relationships.

SLIDE 8

[Scatterplot with one outlier; r = 0.44]

Here there is a weak correlation of 0.44. If the outlier is removed, there is a very strong correlation of 0.99. Outliers can have a large effect on correlation.
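The two lessons above (correlation misses curved patterns, and a single outlier can dominate r) can be checked numerically. A sketch with hypothetical data, computing Pearson's r directly from its definition rather than from any particular library:

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Perfect but nonlinear (U-shaped) relationship: r is essentially 0.
xs = [i / 10 for i in range(11)]
ys = [(x - 0.5) ** 2 for x in xs]
print(correlation(xs, ys))

# Perfect linear relationship, then the same points plus one outlier.
xs2 = [1, 2, 3, 4, 5]
ys2 = [1, 2, 3, 4, 5]
print(correlation(xs2, ys2))                # 1.0
print(correlation(xs2 + [10], ys2 + [0]))   # negative: one point reverses the sign
```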

SLIDE 9

Data on SAT scores and GPAs of 1000 students.

Source: http://www.cvgs.k12.va.us/DIGSTATS/main/inferant/a gpas.html

[Scatterplot: College GPA and SAT Scores. x-axis: combined SAT score; y-axis: college GPA]

The correlation between combined SAT score and college GPA is 0.46. The correlation between high school GPA and college GPA is 0.54. In these data, high school GPA is a better predictor of college performance than SAT scores.

SLIDE 10

Data on student evaluations in 1184 UCSD Math courses (Fall 2010 – Spring 2017)

Source: http://www.cape.ucsd.edu/responses/Results.aspx

[Scatterplot: Learning and Study Time. x-axis: hours of studying per week; y-axis: amount learned (1-5 scale)]

Correlation between “Hours” and “Learned”: 0.29
Correlation between “Hours” and “Recommend course”: -0.08
Correlation between “Hours” and “Recommend instructor”: 0.08

SLIDE 11

Data on 157 countries with population over 1 million.

Source: https://www.cia.gov/library/publications/the-world-factbook/index.html

[Scatterplot: Life Expectancy and Fertility Rate. x-axis: children per woman; y-axis: life expectancy]

There is a moderately strong negative correlation of -0.77. High fertility rates do not cause shorter lifespans. Correlation does not imply causation. Economic conditions are a lurking variable.

SLIDE 12

Data on 599 women aged 45-74 in Whickham, UK. The women were surveyed in 1972-1974 and again 20 years later.

Source: Appleton, D.R., French, J.M., and Vanderpump, M.V. Ignoring a covariate: An example of Simpson’s paradox. The American Statistician, 50 (1996), 340-341.

Overall: smokers, 107 of 271 (39.5%) died; non-smokers, 153 of 328 (46.6%) died.

Age 45-54: smokers, 27 of 130 (20.8%) died; non-smokers, 12 of 78 (15.4%) died.
Age 55-64: smokers, 51 of 105 (48.6%) died; non-smokers, 40 of 121 (33.1%) died.
Age 65-74: smokers, 29 of 36 (80.6%) died; non-smokers, 101 of 129 (78.3%) died.

Simpson’s paradox: a higher percentage of non-smokers died overall, but a higher percentage of smokers died in each age group.

The positive association between smoking and living 20 more years does not mean that smoking causes people to live longer. Age is a lurking variable.
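A short Python check of the paradox using the counts above. Pooling reverses the comparison because most smokers were in the two younger age groups, where death rates were low, while many non-smokers were in the oldest group.

```python
# Whickham counts: (deaths, women surveyed) in each age group.
smokers = {"45-54": (27, 130), "55-64": (51, 105), "65-74": (29, 36)}
non_smokers = {"45-54": (12, 78), "55-64": (40, 121), "65-74": (101, 129)}


def death_rate(groups):
    """Overall death rate after pooling all age groups."""
    deaths = sum(d for d, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return deaths / total


for age in smokers:
    s_d, s_n = smokers[age]
    n_d, n_n = non_smokers[age]
    print(f"{age}: smokers {s_d / s_n:.1%}, non-smokers {n_d / n_n:.1%}")

print(f"Overall: smokers {death_rate(smokers):.1%}, "
      f"non-smokers {death_rate(non_smokers):.1%}")
```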

SLIDE 13

Data on graduate admissions at U.C. Berkeley.

Source: D. Freedman, R. Pisani, R. Purves. Statistics. 4th ed. Norton, New York.

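In the fall of 1973, U.C. Berkeley admitted 44% of the 8442 men who applied for graduate school, and 35% of the 4321 women. Applicants and acceptance rates in six departments:
Department A: 825 male applicants (62% accepted), 108 female applicants (82% accepted)
Department B: 560 male applicants (63% accepted), 25 female applicants (68% accepted)
Department C: 325 male applicants (37% accepted), 593 female applicants (34% accepted)
Department D: 417 male applicants (33% accepted), 375 female applicants (35% accepted)
Department E: 191 male applicants (28% accepted), 393 female applicants (24% accepted)
Department F: 373 male applicants (6% accepted), 341 female applicants (7% accepted)
Most individual departments admitted a similar percentage of male and female applicants. However, men applied more often to the departments with high acceptance rates (A and B), while most women applied to departments with low acceptance rates.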
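Pooling the six departments in the table, weighting each department's acceptance rate by its number of applicants, reproduces the paradox. This sketch covers only the six departments listed, so its pooled rates are not the campus-wide 44% and 35%.

```python
# Six departments: (applicants, percent accepted).
men = {"A": (825, 62), "B": (560, 63), "C": (325, 37),
       "D": (417, 33), "E": (191, 28), "F": (373, 6)}
women = {"A": (108, 82), "B": (25, 68), "C": (593, 34),
         "D": (375, 35), "E": (393, 24), "F": (341, 7)}


def pooled_rate(depts):
    """Acceptance rate after pooling departments (weighted by applicants)."""
    admitted = sum(n * pct / 100 for n, pct in depts.values())
    applied = sum(n for n, _ in depts.values())
    return admitted / applied


women_higher = [d for d in men if women[d][1] > men[d][1]]
print("Departments where women had the higher rate:", women_higher)
print(f"Pooled rate, men:   {pooled_rate(men):.1%}")
print(f"Pooled rate, women: {pooled_rate(women):.1%}")
```

Women have the higher rate in four of the six departments, yet the pooled rate for men is far higher, because the pooled rate depends heavily on where each group applied.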

SLIDE 14

Data on GPA and SAT scores of 1000 college students.

Source: http://www.cvgs.k12.va.us/DIGSTATS/main/inferant/a gpas.html

[Scatterplot: Predicting College GPA from SAT Scores. x-axis: combined SAT score; y-axis: college GPA]

Predictor coefficients: Constant .002; SAT Score .00239.

Regression line: $\hat{y} = .002 + .00239x$, where x is the combined SAT score. For SAT = 1200, predicted GPA = .002 + (.00239)(1200) = 2.87.
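A minimal least-squares sketch: the slope and intercept come from the usual formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. The SAT/GPA data are not reproduced here, so the fitting function is checked on hypothetical points, and the prediction uses the coefficients reported above.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def predict(b0, b1, x):
    """Value of the fitted line at x."""
    return b0 + b1 * x


# Coefficients reported on the slide for college GPA vs. combined SAT score.
print(round(predict(0.002, 0.00239, 1200), 2))   # 2.87

# Hypothetical points lying exactly on y = 1 + 2x.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))      # (1.0, 2.0)
```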

SLIDE 15

Data on winning times in the Boston marathon, 1940-1990.

Source: http://www.bostonmarathonmediaguide.com/champions/

[Scatterplot: Winning times in the Boston marathon (1940-1990). x-axis: year; y-axis: winning time (minutes)]

$\hat{y} = 1072.77 - .47519x$, where x is the year and $\hat{y}$ is the predicted winning time in minutes.

SLIDE 16

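Predicted and actual winning times (minutes):
1991: predicted 126.7, actual 131.1, error 4.4
1992: predicted 126.2, actual 128.3, error 2.1
2016: predicted 114.8, actual 132.8, error 18.0
2017: predicted 114.3, actual 129.6, error 15.3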

[Scatterplot: Winning times in the Boston marathon (1940-2017). x-axis: year; y-axis: winning time (minutes)]

Extrapolation (making predictions outside the range of the data) can be dangerous.
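Plugging later years into the line fitted to the 1940-1990 data shows how the errors grow as we extrapolate further. A small sketch:

```python
def predicted_time(year):
    """Winning time in minutes from the line fitted to 1940-1990 data."""
    return 1072.77 - 0.47519 * year


actual = {1991: 131.1, 1992: 128.3, 2016: 132.8, 2017: 129.6}
for year, time in actual.items():
    pred = predicted_time(year)
    print(f"{year}: predicted {pred:.1f}, actual {time}, error {time - pred:.1f}")

# Far enough out, the line predicts an impossible negative winning time.
print(predicted_time(2300) < 0)   # True
```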

SLIDE 17

Outliers in Regression

It is important to consider carefully the effect of outliers on the position of the regression line. We say a point has high leverage if it is extreme in the x-direction. We say a point is influential if removing it would greatly change the position of the regression line. A high leverage point is often (not always) influential, because the regression line tends to be drawn towards high leverage points. Regression is usually not appropriate when there is an influential point.
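For simple linear regression, a standard leverage measure is $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$, which grows with the distance of $x_i$ from $\bar{x}$. The sketch below uses hypothetical data: it computes leverages, then checks influence by refitting without the extreme point.

```python
def leverages(xs):
    """Leverage of each point in a simple linear regression of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return [1 / n + (x - mx) ** 2 / sxx for x in xs]


def slope(xs, ys):
    """Least-squares slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: five points on y = x, plus one point far out in x
# that does not follow the trend.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 2.0]
ys = [0.0, 0.2, 0.4, 0.6, 0.8, 0.2]

h = leverages(xs)
print([round(v, 2) for v in h])   # the last point has by far the largest leverage
# Slope with and without the last point: removing it changes the line a lot.
print(round(slope(xs, ys), 3), round(slope(xs[:-1], ys[:-1]), 3))
```

Here the high-leverage point is also influential. Had its y-value been 2.0 (on the line y = x), it would have the same leverage but would not move the line at all.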

SLIDE 18

[Three scatterplots: High Leverage, Not Influential; High Leverage, Influential; Low Leverage, Not Influential]

SLIDE 19

Data on mortality rates of white males from melanoma in the 48 contiguous states, 1950-1959.

Source: G. van Belle, L. Fisher, P. Heagerty, T. Lumley. (2004) Biostatistics. 2nd ed. Wiley.

[Scatterplot: Mortality Rates from Melanoma (1950-1959). x-axis: latitude; y-axis: mortality rate per 10,000,000]

$\hat{y} = 388.31 - 5.9665x$, where x is the latitude of the state, and $R^2 = .685$. The value of $R^2$ means that 68.5 percent of the variation in mortality rates from melanoma is explained by the linear relationship with the latitude of the state.
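$R^2$ can be computed as $1 - SSE/SST$: one minus the fraction of the total variation in y that is left over in the residuals. A sketch with hypothetical numbers, since the melanoma data are not reproduced here:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1


def r_squared_and_residuals(xs, ys):
    """R^2 = 1 - SSE/SST, together with the residuals y - y_hat."""
    b0, b1 = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)      # variation left unexplained
    sst = sum((y - my) ** 2 for y in ys)      # total variation in y
    return 1 - sse / sst, residuals


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2, residuals = r_squared_and_residuals(xs, ys)
print(round(r2, 3))   # 0.6
print(residuals)      # these are what a residual plot shows against x
```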

SLIDE 20

[Residual Plot: residuals vs latitude]

The residual plot looks like a random scatter. There are no apparent patterns. In particular, there is no indication of curvature and there are no outliers. Consequently, linear regression is appropriate.

SLIDE 21

Data on length and weight of 42 rainbow trout.

Source: http://www.seattlecentral.org/qelp/sets/023/023.html

Goal: to predict weight from length so that it is not necessary to weigh fish.

[Scatterplot: Weights and lengths of rainbow trout. x-axis: length (millimeters); y-axis: weight (grams)]

$\hat{y} = -881.9 + 3.92x$, $R^2 = .939$

SLIDE 22

[Residual Plot: residuals vs length (millimeters)]

The residual plot shows some curvature. This suggests that the relationship between the variables is not linear. Because weight is roughly proportional to volume, which grows like the cube of length, try taking the cube root of the weights.

SLIDE 23

[Scatterplot: Cube root of weight vs Length. x-axis: length (millimeters); y-axis: cube root of weight]

$\widehat{y^{1/3}} = .7383 + .01994x$, $R^2 = .971$. For a fish of length 400 millimeters, our prediction for the cube root of the weight is .7383 + (.01994)(400) = 8.7143, so our prediction for the weight in grams is $8.7143^3 = 661.8$. Caution: when making predictions, do not forget to convert back to the original units.
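Converting back to the original units is the step most easily forgotten. A sketch using the fitted coefficients above, with the untransformed straight-line fit shown for comparison:

```python
def predict_weight(length_mm):
    """Predicted weight (grams) from the fit of cube root of weight on length."""
    cube_root = 0.7383 + 0.01994 * length_mm   # prediction on the transformed scale
    return cube_root ** 3                      # convert back to grams


def predict_weight_linear(length_mm):
    """Prediction from the untransformed straight-line fit, for comparison."""
    return -881.9 + 3.92 * length_mm


print(round(predict_weight(400), 1))          # 661.8
print(round(predict_weight_linear(400), 1))
```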

SLIDE 24

[Residual Plot: residuals vs length (millimeters), cube-root model]

The residual plot shows a random scatter, and there is no curvature. Therefore, this re-expression appears to give us a good method for predicting the weight of the fish. Caution: do not use $R^2$ to compare models with different y variables. Judge the suitability of the re-expression based on whether the scatterplot and residual plot are free of curvature.

SLIDE 25

Data on brain and body weights of 62 mammals.

Source: http://www.seattlecentral.org/qelp/sets/017/017.html

[Scatterplot: Body and brain weights of 62 mammals. x-axis: body weight (kilograms); y-axis: brain weight (grams)]

There are three outliers: the African elephant, the Asian elephant, and the human. The point in the upper right (African elephant) is a high leverage point and an influential point. Regression on these data gives little insight into the smaller mammals.

SLIDE 26

[Scatterplot: Mammals with brain weights less than 1000 grams. x-axis: body weight (kilograms); y-axis: brain weight (grams)]

Here we have removed three outliers. There is still a cluster of points in the lower-left corresponding to the small mammals. Try taking natural logarithms of both variables.

SLIDE 27

[Histograms: Brain weights of 62 mammals (x-axis: brain weight in grams); Logarithms of brain weights of 62 mammals (x-axis: log(brain weight))]

SLIDE 28

[Scatterplot of log(Brain weight) vs log(Body weight)]

$\log(\text{brain weight}) = 2.12 + 0.746\,\log(\text{body weight})$, $R^2 = .919$ (natural logarithms; brain weight in grams, body weight in kilograms). For a mammal of body weight 5 kilograms, our prediction for the log of the brain weight is 2.12 + (0.746)(log 5) = 3.32. Our prediction for the brain weight in grams is $e^{3.32} = 27.66$.
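The same back-transformation for the log-log model, assuming natural logarithms as on the earlier slide: predict on the log scale, then exponentiate.

```python
import math


def predict_brain_weight(body_kg):
    """Predicted brain weight (grams) from the log-log regression."""
    log_brain = 2.12 + 0.746 * math.log(body_kg)   # math.log is the natural logarithm
    return math.exp(log_brain)                     # back to grams


# About 27.68; the slide rounds the log to 3.32 first and gets 27.66.
print(round(predict_brain_weight(5), 2))
```

Because the model is linear in the logs, it says brain weight is roughly proportional to body weight raised to the power 0.746.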

SLIDE 29

[Residual Plot: residuals vs log(Body weight)]

The residual plot shows a random scatter. There is no curvature and there are no outliers. This regression gives us an appropriate way of predicting brain weight from body weight.

SLIDE 30

Example 1: Wayne Gretzky’s Scoring

Source: www.stat.ualberta.ca/people/schmu/preprints/poisson.pdf

Wayne Gretzky played 9 seasons with Edmonton, scoring 1669 points (goals + assists) in 696 games, an average of 1669/696 = 2.398 points per game. We compare the data to a Poisson distribution with λ = 2.398.

SLIDE 31

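Number of games in which Gretzky scored each number of points, compared with the number expected under the Poisson model:
0 points: 69 observed, 63.3 expected
1 point: 155 observed, 151.7 expected
2 points: 171 observed, 181.9 expected
3 points: 143 observed, 145.4 expected
4 points: 79 observed, 87.1 expected
5 points: 57 observed, 41.8 expected
6 points: 14 observed, 16.7 expected
7 points: 6 observed, 5.7 expected
8 points: 2 observed, 1.7 expected
9+ points: 0 observed, 0.6 expected

Gretzky scored 0 points in 69 games. The Poisson model predicts that he would score 0 points in $696 \cdot \frac{e^{-2.398}(2.398)^0}{0!} = 63.3$ games.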
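Expected counts under a Poisson model are $n \cdot e^{-\lambda}\lambda^k/k!$, with a final bin collecting everything above some cutoff. A sketch that builds each probability from the previous one, estimating λ as the total count divided by the number of games or intervals:

```python
import math


def poisson_expected(n, lam, max_k):
    """Expected counts n * P(X = k) for k = 0..max_k, then one bin for k > max_k."""
    counts = []
    p = math.exp(-lam)                 # P(X = 0)
    for k in range(max_k + 1):
        counts.append(n * p)
        p *= lam / (k + 1)             # P(X = k + 1) = P(X = k) * lam / (k + 1)
    counts.append(n - sum(counts))     # tail bin, so the expected counts add up to n
    return counts


# Gretzky: 696 games, lam = 1669 / 696; bins 0, 1, ..., 8 and 9+.
gretzky = poisson_expected(696, 1669 / 696, 8)
print([round(c, 1) for c in gretzky])

# Rutherford-Geiger: 2608 intervals, lam = 10097 / 2608; bins 0..13 and 14+.
polonium = poisson_expected(2608, 10097 / 2608, 13)
print([round(c, 1) for c in polonium])
```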

SLIDE 32

Example 2: Radioactive decay of polonium.

Source: D. J. Hand, F. Daly, A. D. Lunn, K. J. McConway, and E. Ostrowski. A Handbook of Small Data Sets. London, Chapman and Hall, 1994. Original Source: E. Rutherford and M. Geiger (1910). The probability variations in the distribution of alpha-particles. Philosophical Magazine, Series 6, 20, 698-704.

A total of 10,097 particles decayed over 2608 intervals of length 7.5 seconds, an average of 3.87 per interval. We compare the number that decay in an interval to the Poisson distribution with λ = 3.87.

SLIDE 33

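Number of particles decaying in an interval, number of intervals observed, and number expected under the Poisson model:
0: 57 observed, 54.3 expected
1: 203 observed, 210.3 expected
2: 383 observed, 407.1 expected
3: 525 observed, 525.3 expected
4: 532 observed, 508.4 expected
5: 408 observed, 393.7 expected
6: 273 observed, 254.0 expected
7: 139 observed, 140.5 expected
8: 45 observed, 68.0 expected
9: 27 observed, 29.2 expected
10: 10 observed, 11.3 expected
11: 4 observed, 4.0 expected
12: 0 observed, 1.3 expected
13: 1 observed, 0.4 expected
14: 1 observed, 0.1 expected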

SLIDE 34

Example 3: Police shootings in the United States, 2015-2016.

Source: https://github.com/washingtonpost/data-police-shootings

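The Washington Post found records of 1958 people who were shot and killed by police in 2015 or 2016. This was a 731-day period, so we compare the numbers of shootings each day to the Poisson distribution with λ = 1958/731 ≈ 2.68.

Number of shootings in a day, number of days observed, and number expected:
0: 50 observed, 50.2 expected
1: 149 observed, 134.4 expected
2: 162 observed, 180.0 expected
3: 155 observed, 160.1 expected
4: 115 observed, 107.7 expected
5: 60 observed, 57.7 expected
6: 24 observed, 25.7 expected
7: 12 observed, 9.9 expected
8+: 4 observed, 4.6 expected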

SLIDE 35

Data on homicides in the greater San Diego area, 2008-2012.

Source: http://data.sandiegodata.org/dataset/clarinova com-crime-incidents-casnd-7ba4-extract Based on information provided by SANDAG (San Diego Association of Governments)

[Histogram: Times between homicides in San Diego area. x-axis: days between homicides; y-axis: frequency. Mean = 4.968 days, N = 367]

Excellent fit to the exponential distribution.
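Under an exponential model with mean μ, the expected number of the N waiting times that fall in a bin [a, b) is $N(e^{-a/\mu} - e^{-b/\mu})$; comparing these with the histogram bar heights is one way to judge the fit. A sketch with hypothetical bin edges, since the histogram's exact bins are not given:

```python
import math


def exponential_expected(n, mean, edges):
    """Expected counts of n exponential waiting times in bins [edges[i], edges[i+1])."""
    rate = 1 / mean
    return [n * (math.exp(-rate * a) - math.exp(-rate * b))
            for a, b in zip(edges, edges[1:])]


# Homicide gaps: N = 367 intervals with mean 4.968 days.
# The bin edges (in days) are hypothetical; the last bin is open-ended.
edges = [0, 3, 6, 9, 12, 18, 24, math.inf]
expected = exponential_expected(367, 4.968, edges)
for (a, b), count in zip(zip(edges, edges[1:]), expected):
    print(f"[{a}, {b}): {count:.1f}")
```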

SLIDE 36

Data on 256 earthquakes of magnitude 5.0 or greater in Southern California from 1932-2012.

Source: Southern California Earthquake Data Center: http://www.data.scec.org/eq-catalogs/date mag loc.php

[Histogram: Southern California earthquakes (1932-2012). x-axis: times between earthquakes (days); y-axis: frequency. Mean = 114.0 days, N = 255]

Poor fit to exponential distribution.

SLIDE 37

Because of aftershocks, there are more very short intervals (0 or 1 days) than there would be if the times were exponentially distributed. To discard aftershocks from the data, we count only earthquakes that occurred at least 3 days after the previous earthquake.

[Histogram: Earthquakes at least 3 days apart. x-axis: times between earthquakes (days); y-axis: frequency. Mean = 204.6 days, N = 142]

Good fit to exponential distribution, indicating that earthquakes appear at unpredictable times.

SLIDE 38

Survey Sampling (Presidential Polls)

Sources:

D. Freedman, R. Pisani, R. Purves. Statistics. 3rd ed. Norton, 1998.

http://poll.gallup.com http://www.presidentelect.org/ http://fivethirtyeight.com/

The Literary Digest correctly predicted the winner of the presidential election in 1916, 1920, 1924, 1928, and 1932 by mailing questionnaires.

1936 Literary Digest Poll
Questionnaires were mailed to 10 million people, and 2.4 million responded.
Literary Digest Prediction: Landon 57%, Roosevelt 43%.
Actual Results: Roosevelt 60.6%, Landon 36.8%.
Problem: The sample was not representative. The names of the people who were sent surveys came from phone books and club membership lists, so the poor were undersampled.

SLIDE 39

1936 Gallup Poll
Gallup Prediction: Roosevelt 55.7%, Landon 44.3%.
Actual Results: Roosevelt 60.6%, Landon 36.8%.
Quota sampling: Interviewers were assigned to survey a specific number of people in different categories, based on race, gender, age, and income. This gave better results than the Literary Digest survey.

1948 Gallup Poll
Gallup Prediction: Dewey 49.5%, Truman 44.5%.
Actual Results: Truman 49.8%, Dewey 45.1%.
Problem: Convenience sampling. Interviewers had to survey an assigned number of people in each category, but within each category they could choose whomever they liked.

SLIDE 40

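1952-2012 Gallup Poll (percentages are the winner's share of the vote)
1952 Eisenhower: Gallup 51.0, actual 54.9, error 3.9
1956 Eisenhower: Gallup 59.5, actual 57.6, error 1.9
1960 Kennedy: Gallup 49.0, actual 49.7, error 0.7
1964 Johnson: Gallup 64.0, actual 60.6, error 3.4
1968 Nixon: Gallup 43.0, actual 43.4, error 0.4
1972 Nixon: Gallup 62.0, actual 60.3, error 1.7
1976 Carter: Gallup 49.0, actual 50.1, error 1.1
1980 Reagan: Gallup 47.0, actual 50.8, error 3.8
1984 Reagan: Gallup 59.0, actual 59.2, error 0.2
1988 Bush: Gallup 56.0, actual 53.4, error 2.6
1992 Clinton: Gallup 49.0, actual 43.0, error 6.0
1996 Clinton: Gallup 52.0, actual 49.2, error 2.8
2000 Bush: Gallup 48.0, actual 47.9, error 0.1
2004 Bush: Gallup 49.0, actual 50.7, error 1.7
2008 Obama: Gallup 55.0, actual 52.9, error 2.1
2012 Obama: Gallup 48.0, actual 51.0, error 3.0

In 1952, the Gallup poll began choosing people at random, and it predicted the winner correctly in 15 consecutive elections from 1952 through 2008. In 2008, the sample size was only 3050.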

SLIDE 41

2012 election
The Gallup poll predicted Mitt Romney to win by 1 point, but Barack Obama won the election.
Nate Silver of FiveThirtyEight correctly predicted the winner in all 50 states using a model that incorporates information from hundreds of polls.

2016 election
Nate Silver predicted Hillary Clinton to win the popular vote by 3.6 percentage points over Donald Trump. She won the popular vote by 2.1 points but lost the election.
The 1.5-point error was not unusual compared with past elections, but there were larger polling errors in Midwestern states.

Possible problem: nonresponse bias. Response rates are now typically under 10 percent, which makes polls inaccurate if respondents differ systematically from nonrespondents.

SLIDE 42

Random sampling
1) Simple random sample. Choose n people so that every sample of size n is equally likely to be chosen.
2) Stratified random sample. Divide the population into groups called strata, then do simple random sampling in each stratum. (Example: sample 500 men and 500 women rather than any 1000 people.) This can reduce variability.
3) Cluster sample. Divide the population into clusters. Select a few clusters at random and sample only from those selected. (Example: exit polling is done at only a few polling stations.) This can reduce cost.
4) Voluntary response sample. Many people are invited to respond, and all who respond are counted. (Example: surveys done through the internet or on radio talk shows suffer from voluntary response bias and have no scientific value.)
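A sketch of the first three designs using Python's `random` module on hypothetical populations. Voluntary response samples are not shown, since nobody is chosen by chance at all.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Simple random sample of n_per_stratum units from each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep everyone in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(0)
# Hypothetical population of 10,000 people, half men ("M") and half women ("F").
people = [("M" if i % 2 == 0 else "F", i) for i in range(10_000)]
srs = simple_random_sample(people, 1000, rng)
strat = stratified_sample(people, lambda p: p[0], 500, rng)
print(sum(1 for p in srs if p[0] == "M"))     # number of men varies by chance
print(sum(1 for p in strat if p[0] == "M"))   # exactly 500 men by design

# Hypothetical clusters: 50 polling stations with 200 voters each.
stations = {s: [(s, v) for v in range(200)] for s in range(50)}
clustered = cluster_sample(stations, 5, rng)
print(len(clustered))                          # 1000 voters from 5 stations
```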

SLIDE 43

[Histogram: Number of heads in 5 coin tosses (10,000 simulations). x-axis: number of heads; y-axis: frequency]

The distribution is symmetric, but not really bell-shaped, and only 6 values are possible.

SLIDE 44

[Histogram: Number of heads in 10 coin tosses (10,000 simulations). x-axis: number of heads; y-axis: frequency]

The distribution of the number of heads is much closer to being bell-shaped with 10 tosses of the coin.

SLIDE 45

[Histogram: Number of heads in 20 coin tosses (10,000 simulations). x-axis: number of heads; y-axis: frequency]

The distribution of the number of heads in 20 tosses of a coin is well approximated by a normal distribution. In this case np = n(1 − p) = 10, so the usual conditions np ≥ 10 and n(1 − p) ≥ 10 for using a normal approximation to the binomial distribution are met.
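A simulation in the spirit of these three slides, assuming a fair coin and the rule of thumb np ≥ 10 and n(1 − p) ≥ 10. The seed is arbitrary, so the exact counts will not match the slides.

```python
import random
from collections import Counter


def simulate_heads(n_tosses, n_sims, rng, p=0.5):
    """Distribution of the number of heads in n_tosses, over n_sims simulations."""
    return Counter(sum(rng.random() < p for _ in range(n_tosses))
                   for _ in range(n_sims))


def normal_approx_ok(n, p):
    """Rule of thumb: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


rng = random.Random(1)
for n in (5, 10, 20):
    counts = simulate_heads(n, 10_000, rng)
    print(n, normal_approx_ok(n, 0.5), sorted(counts.items()))
```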

SLIDE 46

Sample of 10,000 values from a Uniform(0,1) distribution.

[Histogram and Normal Probability Plot of the values: Mean = 0.4992, StDev = 0.2878, N = 10000, Anderson-Darling statistic AD = 111.104, P-Value < 0.005]

In addition to examining a histogram to check for normality, we can look at a normal probability plot. If the data follow approximately a normal distribution, the normal probability plot should be approximately a straight line.
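A normal probability plot pairs the ordered data with standard normal quantiles. The sketch below uses plotting positions (i − 0.5)/n, which is one common convention (statistical packages may use slightly different ones), and summarizes straightness by the correlation of the plotted points, a simple stand-in for the Anderson-Darling test reported on the slides.

```python
import math
import random
from statistics import NormalDist


def normal_plot_points(data):
    """(normal score, ordered value) pairs for a normal probability plot."""
    n = len(data)
    z = NormalDist()   # standard normal
    return [(z.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(data), start=1)]


def straightness(points):
    """Correlation of the plotted points: close to 1 means a nearly straight line."""
    xs, ys = zip(*points)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
normal_sample = [rng.gauss(0.5, 0.29) for _ in range(1000)]
uniform_sample = [rng.random() for _ in range(1000)]
print(round(straightness(normal_plot_points(normal_sample)), 4))
print(round(straightness(normal_plot_points(uniform_sample)), 4))   # noticeably lower
```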

SLIDE 47

Means of a sample of size 2 from a Uniform(0,1) distribution (10,000 samples)

[Histogram and Normal Probability Plot of the averages: Mean = 0.4989, StDev = 0.2033, N = 10000, AD = 10.437, P-Value < 0.005]

SLIDE 48

Means of a sample of size 3 from a Uniform(0,1) distribution (10,000 samples)

[Histogram and Normal Probability Plot of the averages: Mean = 0.4995, StDev = 0.1662, N = 10000, AD = 3.933, P-Value < 0.005]

SLIDE 49

Means of a sample of size 6 from a Uniform(0,1) distribution (10,000 samples)

[Histogram and Normal Probability Plot of the averages: Mean = 0.5002, StDev = 0.1174, N = 10000, AD = 1.106, P-Value = 0.007]

SLIDE 50

Means of a sample of size 20 from a Uniform(0,1) distribution (10,000 samples)

[Histogram and Normal Probability Plot of the averages: Mean = 0.4999, StDev = 0.06459, N = 10000, AD = 0.281, P-Value = 0.641]

SLIDE 51

Data on 157 countries with population over 1 million.

Source: https://www.cia.gov/library/publications/the-world-factbook/index.html

[Histogram: Per Capita GDP in 157 Countries. x-axis: per capita GDP; y-axis: frequency]

The data are highly skewed and not close to normally distributed.

SLIDE 52

Sample means of 100 samples of size 2.

[Histogram and Normal Probability Plot of the averages: Mean = 10434, StDev = 8985, N = 100, AD = 4.473, P-Value < 0.005]

SLIDE 53

Sample means of 100 samples of size 5.

[Histogram and Normal Probability Plot of the averages: Mean = 11066, StDev = 6574, N = 100, AD = 1.667, P-Value < 0.005]

SLIDE 54

Sample means of 100 samples of size 10.

[Histogram and Normal Probability Plot of the averages: Mean = 11386, StDev = 4014, N = 100, AD = 0.261, P-Value = 0.701]

SLIDE 55

Sample means of 100 samples of size 30.

[Histogram and Normal Probability Plot of the averages: Mean = 11677, StDev = 2009, N = 100, AD = 0.176, P-Value = 0.921]

SLIDE 56

Sample means of 100 samples of size 60.

[Histogram and Normal Probability Plot of the averages: Mean = 11894, StDev = 1288, N = 100, AD = 0.425, P-Value = 0.311]

The number of samples whose means are plotted (in this case 100) is not important. The histogram may be smoother with more samples, but the basic shape is unchanged. What matters for the Central Limit Theorem is the number of values being averaged (in this case 60). For highly skewed data such as these, a common rule of thumb is that this number should be approximately 30 or more before we expect an approximately normal distribution.
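A simulation sketch of the Uniform(0,1) experiment from the earlier slides. The Central Limit Theorem predicts that the standard deviation of the sample means is $\sigma/\sqrt{n}$, with $\sigma = \sqrt{1/12} \approx 0.2887$ for Uniform(0,1). The seed is arbitrary, so the numbers will be close to, but not identical to, the slides.

```python
import random
import statistics


def sample_means(sample_size, n_samples, rng):
    """Means of n_samples independent samples of sample_size Uniform(0, 1) draws."""
    return [statistics.fmean(rng.random() for _ in range(sample_size))
            for _ in range(n_samples)]


rng = random.Random(42)
sigma = (1 / 12) ** 0.5            # SD of a single Uniform(0, 1) value
for size in (1, 2, 3, 6, 20):
    means = sample_means(size, 10_000, rng)
    print(size,
          round(statistics.fmean(means), 4),    # close to 0.5
          round(statistics.stdev(means), 4),    # simulated SD of the means
          round(sigma / size ** 0.5, 4))        # CLT prediction sigma / sqrt(n)
```

Uniform data are symmetric, so their means look normal after only a handful of values; skewed data such as the GDP figures need considerably more.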

SLIDE 57

Example 1

Source: http://lib.stat.cmu.edu/DASL/Datafiles/differencetestdat.html Original source: W. S. Gosset (1908). The probable error of a mean. Biometrika, 6, 1-25.

To determine whether regular seed or kiln-dried seed is better for corn growth, corn was planted in 11 adjacent pairs of plots. For each adjacent pair of plots, one of the two plots was picked at random to be planted with regular seed, while the other was planted with kiln-dried seed. The corn yields from the 22 plots were measured in pounds per acre.

SLIDE 58

Hypothesis test: Two-sided, paired t-test.

Corn yields (pounds per acre), with Difference = Regular − Kiln-Dried:
Regular  Kiln-Dried  Difference
1903  2009  -106
1935  1915  20
1910  2011  -101
2496  2463  33
2108  2180  -72
1961  1925  36
2060  2122  -62
1444  1482  -38
1612  1542  70
1316  1443  -127
1511  1535  -24

SLIDE 59

[Normal probability plot of differences between yields from regular and kiln-dried seed (pounds/acre): Mean = -33.73, StDev = 66.17, N = 11, AD = 0.267, P-Value = 0.612]

Because the sample of 11 differences is small, we check that the differences are approximately normally distributed. The normal probability plot indicates that they are, so we proceed with the test.

SLIDE 60

Solution: Let $\mu_d$ be the true mean difference between the corn yield with regular seed and the corn yield with kiln-dried seed. Let $\bar{d} = -33.7$ and $s_d = 66.2$ be the sample mean and sample standard deviation of the observed differences. We test $H_0: \mu_d = 0$ against $H_A: \mu_d \neq 0$. If $H_0$ is true, the test statistic $T = \frac{\bar{d}}{s_d/\sqrt{n}}$ has approximately a t distribution with $n - 1 = 10$ degrees of freedom. We observe $T = \frac{-33.7}{66.2/\sqrt{11}} \approx -1.69$. If $H_0$ is true, then $P(|T| > 1.69) \approx .122$. Therefore, the p-value is .122, so we fail to reject $H_0$. There is not enough evidence to say that either type of seed is better.
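The test statistic can be reproduced from the 11 pairs of yields with the standard library. Computing the p-value needs the t distribution (a t table, or for example `scipy.stats.ttest_rel(regular, kiln_dried)` if SciPy is installed, which should report the same statistic).

```python
import math
import statistics

# Yields in pounds per acre for the 11 adjacent pairs of plots.
regular = [1903, 1935, 1910, 2496, 2108, 1961, 2060, 1444, 1612, 1316, 1511]
kiln_dried = [2009, 1915, 2011, 2463, 2180, 1925, 2122, 1482, 1542, 1443, 1535]


def paired_t(x, y):
    """Paired t statistic for the differences x - y, with its degrees of freedom."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)                 # sample SD of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1, d_bar, s_d


t, df, d_bar, s_d = paired_t(regular, kiln_dried)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, T = {t:.2f}, df = {df}")
# d_bar = -33.73, s_d = 66.17, T = -1.69, df = 10
```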

SLIDE 61

Example 2

Source: http://lib.stat.cmu.edu/DASL/ Original source: R. Lyle et al. (1987). Blood pressure and metabolic effects of calcium supplementation in normotensive white and black men. JAMA 257, pp. 1772-1776.

From observational studies, it was believed that calcium intake may reduce blood pressure, especially in African-American men. We have data from an experiment involving 21 African-American men, 11 of whom were randomly assigned to take a calcium supplement for 12 weeks and 10 of whom took a placebo. The blood pressures were measured at the beginning and at the end of the 12-week period, and the amount of decrease in blood pressure was recorded. We want to determine whether calcium reduces blood pressure.

SLIDE 62

Hypothesis test: One-sided, two-sample t-test.

[Boxplots: The effect of Calcium on blood pressure. Decrease in blood pressure for the Placebo and Calcium groups]

The boxplot suggests that blood pressure was reduced in the men who took the calcium supplement but not in those who took the placebo.

SLIDE 63

[Normal Probability Plot for Treatment Group (decrease in blood pressure): Mean = 5, StDev = 8.743, N = 10, AD = 0.422, P-Value = 0.256]

[Normal Probability Plot for Control Group (decrease in blood pressure): Mean = -0.6364, StDev = 5.870, N = 11, AD = 0.478, P-Value = 0.187]

We have small samples of sizes 10 and 11, so the t-test is only valid if the data are approximately normally distributed. Although it is difficult to check for normality in such small samples, there are no outliers and the normal probability plots are at least reasonably straight. We will proceed with the test.

SLIDE 64

Two-sample T for Difference
Treatment  N   Mean   StDev
Calcium    10  5.00   8.74
Placebo    11  -0.64  5.87

Difference = mu (Calcium) - mu (Placebo)
Estimate for difference: 5.64
95% lower bound for difference: -0.12
T-Test of difference = 0 (vs > 0): T-Value = 1.72  P-Value = 0.053  DF = 15

Conclusion: We have some evidence that calcium lowers blood pressure in African-American men, but not quite enough to reject our null hypothesis at significance level .05.
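The output reports an unpooled (Welch) two-sample t statistic, with its fractional Welch-Satterthwaite degrees of freedom apparently truncated to an integer. A sketch from the summary statistics, taking the placebo mean as −0.6364 (the estimated difference of 5.64 implies the negative sign). If SciPy is available, `scipy.stats.ttest_ind_from_stats(..., equal_var=False)` should give the same statistic.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2      # squared standard errors
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Calcium vs placebo: decrease in blood pressure.
t_bp, df_bp = welch_t(5.0, 8.743, 10, -0.6364, 5.870, 11)
print(f"T = {t_bp:.2f}, df = {df_bp:.1f}")     # T = 1.72, df = 15.5

# The same function applies to the log-scale iron data in the next example.
t_fe, df_fe = welch_t(1.901, 0.659, 18, 2.090, 0.574, 18)
print(f"T = {t_fe:.2f}, df = {df_fe:.1f}")
```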

SLIDE 65

Example 3

Source: http://statmaster.sdu.dk/courses/st111/data/index.html

We want to determine whether either of two forms of iron, Fe2+ or Fe3+, is retained in the body better than the other. If one were retained better, then it would make a better dietary supplement. Each of the two forms of iron was given to 18 mice. Because the iron was radioactively labeled, it was possible to measure what percentage of the iron was retained at a later time.

SLIDE 66

Hypothesis test: Two-sided, two-sample t-test.

[Histograms: Percentage of iron retained (Fe2+) and Percentage of iron retained (Fe3+). x-axis: percentage of iron retained; y-axis: frequency]

The histograms show that the distributions of the percentages of iron retained by the mice are skewed. Because the samples are small (18 mice in each group) and the data are skewed, the assumptions for the t-test do not hold.

SLIDE 67

Take logarithms of the data.

[Normal Probability Plot (Fe2+) of log(percentage of iron retained): Mean = 1.901, StDev = 0.6585, N = 18, AD = 0.288, P-Value = 0.577]

[Normal Probability Plot (Fe3+) of log(percentage of iron retained): Mean = 2.090, StDev = 0.5737, N = 18, AD = 0.582, P-Value = 0.112]

The normal probability plots suggest that the distributions of the logarithms of the percentages of iron retained by the mice are approximately normal. We will therefore proceed with the two-sample t-test, using the logarithms of the percentages.

SLIDE 68

Two-sample T for Log(retained)
Iron  N   Mean   StDev
Fe2+  18  1.901  0.659
Fe3+  18  2.090  0.574

Difference = mu (Fe2+) - mu (Fe3+)
Estimate for difference: -0.189
95% CI for difference: (-0.607, 0.230)
T-Test of difference = 0 (vs not =): T-Value = -0.92  P-Value = 0.366  DF = 33

Conclusion: There is no evidence that either type of iron is retained better than the other.

SLIDE 69

Example 4

Source: A. Agresti and C. Franklin (2007). Statistics: The Art and Science of Learning from Data. Pearson Prentice Hall.

Original source: D. Strayer and W. Johnston (2001). Psych. Science 12, 462-466.

We want to determine whether cell phones slow the reaction time of drivers.

Thirty-two subjects used a machine that simulated driving situations and were asked to press a brake button when they saw a red light. Each subject did this once when talking on a cell phone and once when not talking on the phone, and their reaction times (in milliseconds) were recorded.

SLIDE 70

Hypothesis test: One-sided, paired t-test.

[Histogram: Difference in reaction times with and without cell phones. x-axis: reaction time difference (milliseconds); y-axis: frequency]

There are no outliers, and there is no extreme skewness. Since the sample size is larger than 30, the assumptions required for the t-test are satisfied.

SLIDE 71

Paired T for Yes - No (Yes = talking on a cell phone; reaction times in milliseconds)
            N   Mean   StDev
Yes         32  585.2  89.6
No          32  534.6  66.4
Difference  32  50.62  52.49

95% lower bound for mean difference: 34.89
T-Test of mean difference = 0 (vs > 0): T-Value = 5.46  P-Value = 0.000

Conclusion: We have overwhelming evidence that cell phones slow the reaction times of drivers.
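The paired statistic and the one-sided 95% lower bound can be recovered from the summary line for the differences. The critical value 1.6955 (t distribution with 31 degrees of freedom) is an assumed input taken from a t table, not something the output above reports.

```python
import math

# Summary of the 32 differences (phone minus no phone, milliseconds).
n, d_bar, s_d = 32, 50.62, 52.49
t_star = 1.6955   # assumed: one-sided 95% critical value for t with n - 1 = 31 df

se = s_d / math.sqrt(n)            # standard error of the mean difference
t = d_bar / se                     # paired t statistic
lower = d_bar - t_star * se        # one-sided 95% lower confidence bound
print(f"T = {t:.2f}, lower bound = {lower:.2f}")   # T = 5.46, lower bound = 34.89
```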

SLIDE 72

We have data on global temperatures between 1970 and 2016.

Source: https://www.ncdc.noaa.gov/monitoring-references/faq/anomalies.php

[Time plot: Global Temperature Anomalies, 1970-2016. x-axis: year; y-axis: temperature anomaly (degrees Celsius)]

Note: temperatures are global average temperatures in degrees Celsius, expressed as departures from the 1901-2000 average.

SLIDE 73

[Residual Plot: residuals vs year]

The residual plot shows a random scatter. There is no curvature, and there are no outliers. The spread of residuals around the line is roughly constant. Linear regression appears to be appropriate for this problem.

SLIDE 74

[Normal Probability Plot of Residuals]

The normal probability plot of the residuals is approximately a straight line. This indicates that the distribution of the errors is approximately normal, and the assumptions required for regression inference are satisfied.

SLIDE 75

The regression equation is Temperature Anomaly = −34.459 + 0.0175 Year

Predictor  Coef       SE Coef   T       P
Constant   -34.459    1.905     -18.09  0.000
Year       0.0174850  0.000956  18.30   0.000

S = 0.0888727, R-Sq = 88.15%, R-Sq(adj) = 87.89%

Problem 1: Do we have enough evidence to conclude that global temperatures are changing over time? Test $H_0: \beta_1 = 0$ against $H_A: \beta_1 \neq 0$. The test statistic is $T = \frac{b_1}{SE(b_1)} = \frac{0.0174850}{0.000956} \approx 18.30$. The p-value is close to zero ($1.8 \times 10^{-22}$), so we reject $H_0$.

2013 Report of Intergovernmental Panel on Climate Change (IPCC): “Warming of the climate system is unequivocal.”

SLIDE 76

Report of Intergovernmental Panel on Climate Change (IPCC): “It is extremely likely that human influence has been the dominant cause of the observed warming since the mid-20th century.” The data shown previously do not provide information about the cause of global warming. The IPCC conclusion is based on other science, including climate models.

SLIDE 77

Predictor  Coef       SE Coef   T       P
Constant   -34.459    1.905     -18.09  0.000
Year       0.0174850  0.000956  18.30   0.000

Problem 2: Find a 95 percent confidence interval for the slope of the true regression line.

$b_1 = 0.0175$, $SE(b_1) = 0.000956$. Critical value $t^*_{45} = 2.014$. (Here $n = 47$, so $df = n - 2 = 45$.)

Margin of error: $ME = t^*_{45}\,SE(b_1) = (2.014)(0.000956) = 0.0019$.

95 percent confidence interval: (0.0175 - 0.0019, 0.0175 + 0.0019) = (0.0156, 0.0194). Interpretation: We are 95 percent confident that global temperatures are increasing at a rate of between 0.0156 and 0.0194 degrees Celsius per year.
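The slope test and confidence interval reduce to a few lines of arithmetic. The critical value 2.014 is the one quoted above; with SciPy it could instead come from `scipy.stats.t.ppf(0.975, 45)`.

```python
# Slope estimate and its standard error from the regression output.
b1, se_b1 = 0.0174850, 0.000956
n = 47                  # years 1970 through 2016
t_star = 2.014          # two-sided 95% critical value for n - 2 = 45 df

t = b1 / se_b1          # test statistic for H0: beta1 = 0
me = t_star * se_b1     # margin of error
print(f"df = {n - 2}, T = {t:.2f}")   # 18.29 here; the output shows 18.30, presumably from an unrounded SE
print(f"95% CI for the slope: ({b1 - me:.4f}, {b1 + me:.4f})")   # (0.0156, 0.0194)
```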

SLIDE 78

Article at http://www.townhall.com on February 28, 2008: “Scientists: 100 Years of Global Warming – ERASED!” “Now there is word that all four major global temperature tracking outlets have released data showing that temperatures have dropped significantly over the last year. California meteorologist Anthony Watts says the amount of cooling ranges from 65-hundredths of a degree Centigrade to 75-hundreds of a degree. He says that is a value large enough to erase nearly all the global warming recorded over the past 100 years.”

SLIDE 79

[Time plot: Monthly Global Temperature Anomalies (1/1970-10/2017). x-axis: year; y-axis: temperature anomaly (degrees Celsius)]

Temperature anomaly for January 2007: 0.89
Temperature anomaly for January 2008: 0.28
Comparing two individual months like these says little about the long-term trend. One should avoid “cherry-picking” data points that do not fit the overall pattern.