
PRE-ORIENTATION REVIEW SESSION
ENV710 Applied Data Analysis for Environmental Science
17 August 2017
Elizabeth A. Albright, Ph.D., Assistant Professor of the Practice

OUTLINE FOR TODAY
• Introductions
• Overview of diagnostic exam
• Review/Practice Problems

OVERVIEW OF DIAGNOSTIC
• 20 questions
• One hour and 15 minutes
• No calculators
• No credit for work without a correct answer
• Z-distribution table will be supplied

POTENTIAL TOPICS
• Basic math and algebra
• Descriptive statistics
• Probability
• Sampling
• Inference
• Confidence intervals
• Comparison of means
• Type I and Type II errors

The Statistics Review Website
http://sites.nicholas.duke.edu/statsreview

BASIC MATH
• Rounding/Significant digits
• Algebra
• Exponents and their rules
• Logarithms and their rules

BASIC MATH PRACTICE PROBLEMS
• 0.306 contains how many significant digits?
• $3^6 \cdot 3^2 = ?$
• $\log_{10}(8) - \log_{10}(2) = ?$
• Simplify: $(x^4 x^{-2})^{-3}$
• Simplify: $6!/2!$

BASIC MATH SOLUTIONS
• 0.306 contains three significant digits
• $3^6 \cdot 3^2 = 3^8$
• $\log_{10}(8) - \log_{10}(2) = \log_{10}(4)$
• Simplify: $(x^4 x^{-2})^{-3} = (x^2)^{-3} = x^{-6}$
• Simplify: $6!/2! = (6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)/(2 \cdot 1) = 720/2 = 360$
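The exponent, logarithm, and factorial answers above can be spot-checked numerically. The snippet below is only a quick sanity check, not a proof; the value chosen for `x` is arbitrary.

```python
import math

# 3^6 * 3^2 = 3^(6+2): exponents add when the bases match.
assert 3**6 * 3**2 == 3**8

# log10(8) - log10(2) = log10(8/2): a difference of logs is the log of a quotient.
log_difference = math.log10(8) - math.log10(2)
assert math.isclose(log_difference, math.log10(4))

# (x^4 * x^-2)^-3 = x^-6, spot-checked at an arbitrary nonzero x.
x = 1.7
simplified = (x**4 * x**-2) ** -3
assert math.isclose(simplified, x**-6)

# 6!/2! = 720/2
factorial_ratio = math.factorial(6) // math.factorial(2)
print(factorial_ratio)  # 360
```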

DESCRIPTIVE STATISTICS

DESCRIPTIVE STATISTICS
• Measure of central tendency
  • Mean
  • Median
  • Mode
• Measure of spread
  • Standard deviation
  • Variance
  • IQR
  • Range
• Skewness
• Outliers

QUESTION OF INTEREST
Do Nicholas or Fuqua faculty members have larger transportation carbon footprints?

THE STEPS
• Design the study
• Random sampling
• Collect the data
• Describe the data
• Infer from the samples to the populations

CO2 EMISSIONS (METRIC TONS) FROM TRANSPORTATION SOURCES FOR 10 RANDOMLY SELECTED NSOE FACULTY
7 1 2 4 2 8 7 15 2 2

MEASURE OF CENTRAL TENDENCY
• Mean = 5 metric tons CO2
• Median = 3 metric tons CO2
• Mode = 2 metric tons CO2

The Mean (Expected Value)
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

MEDIAN
• If odd number of observations: middle value (50th percentile)
• If even number of observations: halfway between the middle two values

SPREAD OF A DISTRIBUTION
• Range: 15 - 1 = 14 metric tons CO2
  • Largest observation minus smallest observation
• Variance = 18.9 (metric tons)$^2$
• Standard deviation s = 4.3 metric tons
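The measures of central tendency and spread reported for the ten NSOE faculty can be reproduced with Python's standard `statistics` module. This is a minimal sketch; it assumes the variance on the slide is the sample variance (divisor n - 1), which is what `statistics.variance` computes and what matches the reported 18.9.

```python
import statistics

# CO2 emissions (metric tons) for the 10 randomly selected NSOE faculty.
co2 = [7, 1, 2, 4, 2, 8, 7, 15, 2, 2]

mean = statistics.mean(co2)          # sum of the values divided by n
median = statistics.median(co2)      # halfway between the two middle values (n is even)
mode = statistics.mode(co2)          # most frequent value
data_range = max(co2) - min(co2)     # largest minus smallest observation
variance = statistics.variance(co2)  # sample variance, divisor n - 1
std_dev = statistics.stdev(co2)      # square root of the sample variance

print(mean, median, mode, data_range)          # 5 3.0 2 14
print(round(variance, 1), round(std_dev, 1))   # 18.9 4.3
```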

VARIANCE
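The formula on this slide did not survive in the transcript. The usual sample variance, written here with n - 1 in the denominator (an assumption, but one consistent with the 18.9 reported for the faculty data), is:

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}
```

For the ten CO2 observations the squared deviations from the mean of 5 sum to 170, so $s^2 = 170/9 \approx 18.9$ and $s \approx 4.3$.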

PROBABILITY

RANDOM VARIABLE
• A variable whose value is a function of a random process
  • Discrete
  • Continuous
• If X is a random variable, then p(X=x) is the probability that the value x will occur

Which of the following is a discrete random variable?
I. The height of a randomly selected MEM student.
II. The annual number of lottery winners from Durham.
III. The number of presidential elections in the United States in the 20th century.
(A) I only (B) II only (C) III only (D) I and II (E) II and III

PROPERTIES OF PROBABILITY
• The events A and B are mutually exclusive if they have no outcomes in common and so can never occur together.
• If A and B are mutually exclusive then P(A or B) = P(A) + P(B)
Example: Roll a die. What's the probability of getting a 1 or a 2?

P(A OR B)
What if events A and B are not mutually exclusive?
P(A or B) = P(A) + P(B) - P(A and B)

DECK OF CARDS

P(A OR B)
Example: What's the probability of pulling a black card or a ten from a deck of cards?

P(A OR B)
Example: What's the probability of pulling a black card or a ten from a deck of cards?
P(black) = 26/52
P(10) = 4/52
P(black and 10) = 2/52 (the two black tens are counted in both of the probabilities above)
Probability of a black card OR a ten = 26/52 + 4/52 - 2/52 = 28/52
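The addition rule for the card example can be checked by enumerating a standard 52-card deck. This is a sketch, assuming a deck of 13 ranks in 4 suits with no jokers; the helper name `prob` is introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colors = {"spades": "black", "clubs": "black", "hearts": "red", "diamonds": "red"}
deck = list(product(ranks, suit_colors))  # 52 (rank, suit) pairs


def prob(event):
    """Probability of an event when every card is equally likely to be drawn."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_black(card):
    return suit_colors[card[1]] == "black"


def is_ten(card):
    return card[0] == "10"


p_black = prob(is_black)                               # 26/52
p_ten = prob(is_ten)                                   # 4/52
p_both = prob(lambda c: is_black(c) and is_ten(c))     # 2/52
p_either = prob(lambda c: is_black(c) or is_ten(c))    # counted directly

# Direct count agrees with P(A) + P(B) - P(A and B).
assert p_either == p_black + p_ten - p_both
print(p_either)  # 7/13, i.e. 28/52 in lowest terms
```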

P(A AND B)
If A and B are independent, p(A and B) = p(A) * p(B)
• Two consecutive flips of a coin, A and B
• A = [heads on first flip]
• B = [heads on second flip]
• p(A and B) = ???
• p(A and B) = ½ * ½ = 1/4
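The coin-flip result can be confirmed by listing the four equally likely outcomes of two flips. This is a small sketch assuming a fair coin; it shows that the product rule gives the same answer as counting outcomes directly.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two flips: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
total = len(outcomes)

p_a = Fraction(sum(1 for first, _ in outcomes if first == "H"), total)    # heads on first flip
p_b = Fraction(sum(1 for _, second in outcomes if second == "H"), total)  # heads on second flip
p_a_and_b = Fraction(sum(1 for o in outcomes if o == ("H", "H")), total)  # heads on both

# The flips are independent, so the joint probability equals the product.
assert p_a_and_b == p_a * p_b
print(p_a_and_b)  # 1/4
```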

THE NORMAL DISTRIBUTION

THE NORMAL DISTRIBUTION
Figure source: Normal Distribution (2012). Last accessed September 2012 from http://www.comfsm.fm/~dleeling/statistics/notes06.html.

Z SCORE
• How do you convert any normal curve to the standard normal curve?
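The transcript does not preserve the answer to this question. The standard conversion, stated here as the usual textbook definition rather than as the slide's own wording, subtracts the mean and divides by the standard deviation:

```latex
z = \frac{x - \mu}{\sigma}
```

If X is normal with mean $\mu$ and standard deviation $\sigma$, then z is normal with mean 0 and standard deviation 1, so a single Z table serves every normal curve.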

NORMAL DISTRIBUTION CALCULATIONS
If X is normally distributed around a mean of 32 and a standard deviation of 8, find:
a. p(X>32)
b. p(X>48)
c. p(X<24)
d. p(40<X<48)

SOLUTIONS
a. p(X>32) = p(z>0) = 0.5
b. p(X>48) = p(z>2) = 0.0228
c. p(X<24) = p(z<-1) = 0.1587
d. p(40<X<48) = p(1<z<2) = 0.1587 - 0.0228 = 0.136
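The four table lookups can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for verification only; the diagnostic itself supplies a Z table rather than software.

```python
from statistics import NormalDist

# X is normal with mean 32 and standard deviation 8.
x_dist = NormalDist(mu=32, sigma=8)

p_a = 1 - x_dist.cdf(32)               # p(X > 32) = p(z > 0)
p_b = 1 - x_dist.cdf(48)               # p(X > 48) = p(z > 2)
p_c = x_dist.cdf(24)                   # p(X < 24) = p(z < -1)
p_d = x_dist.cdf(48) - x_dist.cdf(40)  # p(40 < X < 48) = p(1 < z < 2)

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
# 0.5 0.0228 0.1587 0.1359
```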

NORMAL DISTRIBUTION PRACTICE PROBLEM
• Crop yield is typically measured as the amount of the crop produced per acre. For example, cotton is measured in pounds per acre. It has been demonstrated that the normal distribution can be used to characterize crop yields.
• Historical data suggest that the probability distribution of next summer's cotton yield for a particular North Carolina farm can be characterized by a normal distribution with mean 1,500 pounds per acre and standard deviation 250 pounds per acre. The farm in question will be profitable if it produces at least 1,600 pounds per acre.
• What is the probability that the farm will lose money next summer?

NORMAL DISTRIBUTION PRACTICE PROBLEM
Historical data suggest that the probability distribution of next summer's cotton yield for a particular North Carolina farm can be characterized by a normal distribution with mean 1,500 pounds per acre and standard deviation 250 pounds per acre. The farm in question will be profitable if it produces at least 1,600 pounds per acre.
• What is the probability that the farm will lose money next summer?
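One way to work the cotton problem is sketched below. It assumes that "lose money" means a yield below the 1,600 pounds per acre profitability threshold, which the problem implies but does not state outright.

```python
from statistics import NormalDist

mean_yield = 1500   # pounds per acre
sd_yield = 250      # pounds per acre
break_even = 1600   # the farm is profitable at or above this yield

# Standardize the threshold, then look up the area to its left.
z = (break_even - mean_yield) / sd_yield   # (1600 - 1500) / 250 = 0.4
p_loss = NormalDist().cdf(z)               # p(z < 0.4)

print(z, round(p_loss, 4))  # 0.4 0.6554
```

With a Z table the same steps apply: compute z = 0.4, then read the cumulative probability for 0.4.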

SAMPLING AND THE CENTRAL LIMIT THEOREM

SAMPLING
• Why do we sample?
• In simple random sampling every unit in the population has an equal probability of being sampled.
• Sampling error
  • Samples will vary because of the random process

CENTRAL LIMIT THEOREM
As the sample size increases, the sampling distribution of X bar concentrates more and more around µ. The shape of the distribution also gets closer and closer to normal.
[Figure: the population distribution and the sampling distributions of X bar for n=5 and n=100]

PROFUNDITY OF CENTRAL LIMIT THEOREM
• As sample size gets larger, even if you start with a non-normal distribution, the sampling distribution of the sample mean approaches a normal distribution
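A short simulation can make the theorem concrete. The sketch below draws repeated samples from a right-skewed exponential population (an illustrative choice, not one taken from the slides, with mean 1 and standard deviation 1) and compares the spread of the sample means at n=5 and n=100. The seed, the number of repetitions, and the function names are arbitrary.

```python
import random
import statistics

random.seed(710)  # arbitrary seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1 and SD 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


def simulate(n, reps=5000):
    """Return the mean and SD of `reps` simulated sample means of size n."""
    means = [sample_mean(n) for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)


results = {n: simulate(n) for n in (5, 100)}
for n, (center, spread) in results.items():
    theoretical_se = 1 / n**0.5  # population SD divided by sqrt(n)
    print(f"n={n}: mean of sample means={center:.3f}, "
          f"SD of sample means={spread:.3f}, sigma/sqrt(n)={theoretical_se:.3f}")
```

The sample means stay centered near the population mean while their spread shrinks roughly in proportion to 1/sqrt(n); a histogram of the simulated means for n=100 would look close to a bell curve even though the population is skewed.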

SAMPLING DISTRIBUTION OF THE SAMPLE MEANS
• Mean of the sample means
• Standard Error
  • Standard deviation of the sampling distribution of sample means

SE VS. SD
• What is the difference between standard deviation and standard error?
• SD is the typical deviation from the average. SD does not depend on random sampling.
• SE is the typical deviation from the expected value in a random sample. SE results from random sampling.
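The distinction can be illustrated with the faculty CO2 data from the descriptive statistics slides. This sketch assumes the common estimate SE = s / sqrt(n) for the standard error of the sample mean; the slides do not show that formula explicitly.

```python
import statistics
from math import sqrt

# CO2 emissions (metric tons) for the 10 randomly selected NSOE faculty.
co2 = [7, 1, 2, 4, 2, 8, 7, 15, 2, 2]

sd = statistics.stdev(co2)   # spread of individual observations around the mean
se = sd / sqrt(len(co2))     # estimated spread of the sample mean across repeated samples

print(round(sd, 1), round(se, 1))  # 4.3 1.4
```

The SD describes how much individual faculty members differ from the average; the SE describes how much the average itself would move from one random sample of 10 to the next.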

INFERENCE

INFERENCE
• We infer from a sample to a population.
• Need to take into account sampling error.
• Confidence intervals
• Comparison of means tests

CONFIDENCE INTERVAL WITH KNOWN STANDARD DEVIATION
• Let's construct a 95% confidence interval: (X bar - 1.96*SE < µ < X bar + 1.96*SE)
• Where did I get the 1.96 (the multiplier)?
• Very important!!! It is the confidence interval that varies, not the population mean.

CI PRACTICE PROBLEM
We want to construct a 95% confidence interval around the mean number of hours that Nicholas MEM students (who are enrolled in statistics) spend studying statistics each week. We randomly sample 36 students and find that the average study time is eight hours. The standard deviation of study time of the population of all students in statistics is 2 hours. Calculate the 95% confidence interval of the mean study time. How do you interpret the confidence interval?

CONFIDENCE INTERVAL SOLUTION
• (X bar - 1.96*SE < µ < X bar + 1.96*SE)
• X bar = 8 hours
• σ = 2 hours
• SE = 2/sqrt(36) = 2/6 = 0.333
• (8 - 1.96*0.333 < µ < 8 + 1.96*0.333)
• (7.35 hours < µ < 8.65 hours)
We are 95% confident that the interval (7.35 hrs, 8.65 hrs) covers the true average number of hours MEM students spend studying statistics.
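The interval, and the origin of the 1.96 multiplier, can be reproduced as follows. This is a sketch using the standard library's `NormalDist`; the multiplier is the 97.5th percentile of the standard normal distribution, which leaves 2.5% in each tail.

```python
from math import sqrt
from statistics import NormalDist

confidence = 0.95
# A 95% interval leaves 2.5% in each tail, so the multiplier is the 0.975 quantile.
multiplier = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

x_bar = 8    # sample mean study time, hours
sigma = 2    # known population standard deviation, hours
n = 36       # sample size

se = sigma / sqrt(n)              # 2 / 6 = 0.333
lower = x_bar - multiplier * se
upper = x_bar + multiplier * se

print(round(multiplier, 2), round(lower, 2), round(upper, 2))  # 1.96 7.35 8.65
```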

COMPARISON OF MEANS TESTS
• One sample
  • Is the average dissolved oxygen concentration less than 5 mg/L?
• Two independent samples
  • Do residents of North Carolina spend more on organic food than residents of South Carolina?
• Matched pairs/repeated samples
  • Are individuals' left hands larger than their right hands?

ONE-SAMPLE HYPOTHESIS TESTING APPROACH
• Set up a 'null hypothesis' (typically hypothesizing there is no difference between the population mean and a given value)
• Establish an alternative hypothesis (that there is a difference between the population mean and a given value)
• Calculate the sample mean, standard deviation, and standard error
• Calculate the test statistic and a p-value
• The smaller the p-value, the more statistically significant the results
• Interpret results
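The steps above can be sketched with the study-time numbers from the confidence interval problem. Two assumptions are made for illustration: the null value `mu_0 = 7.5` hours is hypothetical and does not appear in the slides, and the population standard deviation is treated as known, so the test statistic is a z score. When the standard deviation is estimated from the sample, a t distribution would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist

x_bar = 8     # sample mean, hours
sigma = 2     # population standard deviation, hours (treated as known)
n = 36        # sample size
mu_0 = 7.5    # hypothetical null value, chosen only for illustration

# H0: mu = mu_0   versus   Ha: mu != mu_0 (two-sided)
se = sigma / sqrt(n)
z = (x_bar - mu_0) / se                        # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails beyond |z|

print(round(z, 2), round(p_value, 4))  # 1.5 0.1336
```

A p-value of about 0.13 would not be judged statistically significant at the conventional 0.05 level, so under these assumptions the data would not lead to rejecting the null hypothesis.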
