Statistical Methods for Plant Biology PBIO 3150/5150 Anirudh V. S. - PowerPoint PPT Presentation

Statistical Methods for Plant Biology PBIO 3150/5150
Anirudh V. S. Ruhil
January 26, 2016
The Voinovich School of Leadership and Public Affairs

Table of Contents
1 Sampling Distributions
2 Measuring Uncertainty around an Estimate
   The Standard Error of an Estimate
   Confidence Intervals
3 Worked Examples

Sampling Distributions

Sampling Distributions
• Recall that population values are parameters (µ, σ², σ), while our sample values are estimates (Ȳ, s², s)
• In fact, these sample values are point estimates: single values that are supposed to reflect their corresponding population parameters
Definition: A point estimator is a sample statistic that predicts the value of the corresponding population parameter
• Desirable point estimators have the following properties:
  1. The sampling distribution of the point estimator is centered around the population parameter (unbiasedness)
  2. The point estimator has the smallest possible standard deviation (efficiency)
  3. The point estimator tends toward the population parameter as the sample size increases (consistency)
• What guarantees that these hold? Let us see ...

Understanding Sampling Distributions
Let a population of four scores be [2, 4, 6, 8]. How many random samples of two scores can we construct, and what would the sample mean be in each sample? Note: N = 4; n = 2. Drawing with replacement and counting order gives 4 × 4 = 16 possible samples:
  #   Y1   Y2    Ȳ
  1    2    2    2
  2    2    4    3
  3    2    6    4
  4    2    8    5
  5    4    2    3
  6    4    4    4
  7    4    6    5
  8    4    8    6
  9    6    2    4
 10    6    4    5
 11    6    6    6
 12    6    8    7
 13    8    2    5
 14    8    4    6
 15    8    6    7
 16    8    8    8
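
The enumeration on this slide can be reproduced in a few lines. The following is a minimal sketch, assuming ordered samples drawn with replacement (which is what the 16 rows imply); the variable names are illustrative and not part of the original deck.

```python
from collections import Counter
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]  # the four scores from the slide
n = 2                      # sample size

# Every ordered sample of size n drawn with replacement: 4**2 = 16 samples.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Tally how often each sample mean occurs; this tally is the
# sampling distribution of the mean for this tiny population.
distribution = Counter(sample_means)

for number, (sample, y_bar) in enumerate(zip(samples, sample_means), start=1):
    print(number, sample, y_bar)

for value in sorted(distribution):
    print(value, "#" * distribution[value])

print("mean of the sample means:", mean(sample_means))
print("population mean:", mean(population))
```

The tally peaks at a sample mean of 5, which is also the population mean, and the average of the 16 sample means equals the population mean.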

Plotting the Distribution of Sample Means
[Figure: distribution of the sample means]

Mapping the Genome
• Human Genome Project identified approximately 20,500 genes in human beings
• Top panel: Population of gene lengths (N = 20,290)
• Parameters: µ = 2,622; σ = 2,036.967; Min = 60; Max = 99,631
• Bottom panel: Random sample of gene lengths (n = 100)
• Estimates: Ȳ = 2,777; s = 1,875.814; Min = 87; Max = 10,503
[Figure: two histograms, "Population" (top) and "Random Sample (n=100)" (bottom); x-axis: gene length (number of nucleotides); y-axis: probability]

What if we drew multiple samples?
[Figure: histogram of sample means, 100 random samples of n = 100, with µ = 2622 marked; x-axis: sample mean; y-axis: frequency]

But what if we increased the sample size for each draw?
[Figure: histogram of sample means, 100 random samples of n = 1000, with µ = 2622 marked; x-axis: sample mean; y-axis: frequency]

What if we increased the sample size even further?
[Figure: histogram of sample means, 100 random samples of n = 10,000, with µ = 2622 marked; x-axis: sample mean; y-axis: frequency]

What if we increased the sample size even further?
[Figure: histogram of sample means, 100 random samples of n = 15,000, with µ = 2622 marked; x-axis: sample mean; y-axis: frequency]

What if we drew all possible samples of n = 100?
[Figure: histogram of sample means, all random samples of n = 100, with µ = 2622 marked; x-axis: sample mean; y-axis: frequency]

The Sampling Distribution
Definition: The sampling distribution of Ȳ is the probability distribution of all possible values of the sample mean Ȳ
• What we are saying is that, for any given random sample, the expected value of Ȳ, denoted E(Ȳ), equals µ
• Intuitively, unless we mess up our sampling, on average we should end up with a sample mean that equals the population mean (because overestimates and underestimates of µ balance out across all possible samples)
• The preceding simulations show that the larger the sample, the more likely we are to end up with a sample mean close to the population mean ... larger samples yield more precise estimates
• “Likely to equal µ” is one thing, but how can we measure the precision of our sample-based estimate of the population mean?
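
The repeated-sampling idea behind the preceding histograms can be sketched as a small simulation. The gene-length data are not available here, so the population below is a hypothetical right-skewed stand-in with a mean near 2,622 (an assumption for illustration only); the seed and the helper name `draw_sample_means` are also illustrative.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical stand-in population: 20,000 right-skewed values with a mean
# near 2,622. These are NOT the real gene lengths from the slides.
population = [rng.expovariate(1 / 2622) for _ in range(20_000)]
mu = mean(population)

def draw_sample_means(n, draws=100):
    """Return the means of `draws` random samples, each of size n."""
    return [mean(rng.sample(population, n)) for _ in range(draws)]

spread = {}
for n in (100, 1000):
    means = draw_sample_means(n)
    spread[n] = stdev(means)  # spread of the simulated sampling distribution
    print(f"n={n}: average of sample means={mean(means):.1f}, "
          f"SD of sample means={spread[n]:.1f}, mu={mu:.1f}")
```

In both cases the sample means center near µ, and the spread of the sample means is noticeably smaller for n = 1000 than for n = 100.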

Measuring Uncertainty around an Estimate

Measuring Uncertainty around an Estimate
• The question now is: How far would we expect, on average, our sample mean to be from the population mean, for a given sample size?
• The standard error provides the answer: σ_Ȳ = σ / √n
Definition: The standard error of an estimate is the standard deviation of the estimate’s sampling distribution.
• Two things govern the standard error:
  1. How the population varies (σ)
  2. Sample size (n)
• In fact, we seldom know the population standard deviation (σ) and so have to work with the sample standard deviation (s) when calculating the standard error
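
The definition can be checked directly on the four-score population [2, 4, 6, 8] used earlier in the deck. This is a minimal sketch, assuming samples of n = 2 drawn with replacement, so that the draws are independent and σ / √n applies exactly.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]
n = 2

# Standard error from the formula: population SD divided by sqrt(n).
sigma = pstdev(population)  # population SD (divides by N, not N - 1)
standard_error = sigma / sqrt(n)

# Standard deviation of the sampling distribution, computed the long way
# from the means of all 16 possible samples.
sample_means = [mean(s) for s in product(population, repeat=n)]
sd_of_means = pstdev(sample_means)

print("sigma / sqrt(n):    ", standard_error)
print("SD of sample means: ", sd_of_means)
print("agree:", isclose(standard_error, sd_of_means))
```

Both routes give the same number, which is the sense in which the standard error is the standard deviation of the sampling distribution.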

The Standard Error of an Estimate
Definition: The standard error of the mean is estimated from the sample at hand and calculated as SE_Ȳ = s / √n
Note: When calculating SE_Ȳ we divide by √n and not by √(n − 1)
• When n = 30; s = 1522.082; SE_Ȳ = 1522.082 / √30 = 277.8929
• When n = 60; s = 1522.082; SE_Ȳ = 1522.082 / √60 = 196.4999
• When n = 100; s = 1522.082; SE_Ȳ = 1522.082 / √100 = 152.2082
• Of course, if σ is large then s will tend to be large as well, and as a result so will SE_Ȳ
• Note also that every estimate (median, correlation coefficient, etc.) has a standard error associated with it
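
The three calculations on this slide can be reproduced with a short helper. This is only a sketch of the arithmetic; the function name `standard_error` is illustrative, and the value of s is the one quoted on the slide.

```python
from math import sqrt

def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return s / sqrt(n)

s = 1522.082  # sample standard deviation quoted on the slide

for n in (30, 60, 100):
    print(f"n = {n}: SE = {standard_error(s, n):.4f}")
```

With s held fixed, the standard error at n = 100 is well below the value at n = 30. It shrinks with √n, so quadrupling the sample size only halves it.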

Confidence Intervals
• Since we do not see the population and have a single estimate drawn from the sample (say, Ȳ), how sure can we be that we are close to µ?
• Confidence intervals help us answer this question
Definition: A range of plausible values that surrounds the sample estimate and is likely to contain the population parameter
• Confidence intervals typically used: 95% or 99%, and you hear folks say “we can be 95% confident that the true parameter (e.g., the population mean) lies between values x and y” [popular phrasing]
• What they should say is that “if we drew all possible samples of size n and calculated a 95% confidence interval around each resulting sample mean, 95% of those intervals would trap the population mean”
• Rule of thumb: 95% confidence interval ≈ Ȳ ± 2 SE_Ȳ
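
The repeated-sampling reading of a confidence interval can be illustrated with a simulation. The following is a rough sketch using the rule-of-thumb interval Ȳ ± 2 SE_Ȳ. The population is a hypothetical right-skewed stand-in rather than the gene-length data, and the seed, the number of runs, and the function names are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(2016)  # fixed seed so the run is repeatable

# Hypothetical right-skewed population (NOT the real gene lengths).
population = [rng.expovariate(1 / 2622) for _ in range(20_000)]
mu = mean(population)

def approx_ci(sample):
    """Rule-of-thumb 95% interval: sample mean +/- 2 standard errors."""
    y_bar = mean(sample)
    se = stdev(sample) / sqrt(len(sample))
    return y_bar - 2 * se, y_bar + 2 * se

def coverage(n, runs=1000):
    """Share of `runs` intervals, each from a fresh sample of size n, that contain mu."""
    hits = 0
    for _ in range(runs):
        low, high = approx_ci(rng.sample(population, n))
        hits += low <= mu <= high
    return hits / runs

cov = coverage(n=100)
print(f"share of intervals containing mu: {cov:.3f}")
```

The printed share should land close to, though not exactly at, 0.95. Any single interval either contains µ or it does not; the 95% describes the procedure over many samples.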

Confidence Interval Simulation
[Figure: 95% confidence intervals from 100 sample runs, n = 20; x-axis: gene length (number of nucleotides); y-axis: sample run]
Note: Only 94 CIs touch µ = 2,622 (the hashed red line)

... once more
[Figure: 95% confidence intervals from 100 sample runs, n = 100; x-axis: gene length (number of nucleotides); y-axis: sample run]
Note: Only 95 CIs touch µ = 2,622 (the hashed red line)

Worked Examples

Worked Example 1
Practice Problem #2
  1. The standard error of the mean time to rigor mortis is 0.22 hours (which is approximately 13.27 minutes)
  2. The standard error measures the spread of the sampling distribution of mean time to rigor mortis
  3. That the data represent a random sample of time to rigor mortis

Worked Example 2
Practice Problem #7
  1. Mean flash duration is 95.94 milliseconds
  2. No, it is very unlikely because this estimate is based upon a small sample of 35 male fireflies
  3. The standard error is 1.86 milliseconds
  4. The standard error tells us how far, on average, we might expect our sample mean to be from the population mean.
  5. The approximate 95% CI is: 95.94286 ± 2(1.858409) = (92.22604, 99.65968)
  6. We can be roughly 95% confident that the true population mean of flash duration lies in this interval of (92.23, 99.66) milliseconds.
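
The interval in item 5 follows directly from the rule of thumb. Here is a minimal sketch of that arithmetic, using the mean and standard error quoted in the worked example; the function name `approx_95_ci` is illustrative.

```python
def approx_95_ci(estimate, se):
    """Rule-of-thumb 95% interval: estimate +/- 2 standard errors."""
    margin = 2 * se
    return estimate - margin, estimate + margin

y_bar = 95.94286  # mean flash duration in milliseconds, from the worked example
se = 1.858409     # standard error in milliseconds, from the worked example

low, high = approx_95_ci(y_bar, se)
print(f"approximate 95% CI: ({low:.5f}, {high:.5f}) milliseconds")
```

The multiplier 2 is the slide's approximation. An exact interval would use a critical value from the t distribution for the sample size at hand, which is outside what this deck covers.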
