
Statistical Methods for Plant Biology PBIO 3150/5150 Anirudh V. S.



  1. Statistical Methods for Plant Biology, PBIO 3150/5150. Anirudh V. S. Ruhil, January 26, 2016. The Voinovich School of Leadership and Public Affairs

  2. Table of Contents
     1. Sampling Distributions
     2. Measuring Uncertainty around an Estimate: The Standard Error of an Estimate; Confidence Intervals
     3. Worked Examples

  3. Sampling Distributions

  4. Sampling Distributions
     • Recall that population values are parameters ($\mu$, $\sigma^2$, $\sigma$), while our sample values are estimates ($\bar{Y}$, $s^2$, $s$)
     • In fact, these sample values are point estimates: single values that are supposed to reflect their corresponding population parameters
     Definition: A point estimator is a sample statistic that predicts the value of the corresponding population parameter.
     • Desirable point estimators have the following properties:
       1. The sampling distribution of the point estimator is centered around the population parameter (unbiasedness)
       2. The point estimator has the smallest standard deviation among competing unbiased estimators (efficiency)
       3. The point estimator tends toward the population parameter as the sample size increases (consistency)
     • What guarantees that these hold? Let us see ...
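One way to see what unbiasedness means is to average an estimator over its entire sampling distribution. The Python sketch below does this for two candidate estimators of $\sigma^2$, using the four-score population [2, 4, 6, 8] from the next slide and all 16 ordered samples of size 2 drawn with replacement. `statistics.variance` divides by n − 1 (the usual $s^2$) and `statistics.pvariance` divides by n; the comparison is our illustration, not part of the original slides.

```python
import itertools
import statistics

# Toy population (the four scores used in the next slide's example).
population = [2, 4, 6, 8]
sigma2 = statistics.pvariance(population)  # population variance: divides by N

# Every ordered sample of size 2, drawn with replacement: 4**2 = 16 samples.
samples = list(itertools.product(population, repeat=2))

# Average each estimator over the whole sampling distribution.
mean_s2_nminus1 = statistics.mean(statistics.variance(s) for s in samples)  # divides by n - 1
mean_s2_n = statistics.mean(statistics.pvariance(s) for s in samples)       # divides by n

print("sigma^2:", sigma2)
print("average of s^2 with n - 1 divisor:", mean_s2_nminus1)  # centered on sigma^2 (unbiased)
print("average of s^2 with n divisor:    ", mean_s2_n)        # systematically too small (biased)
```

The n − 1 version averages exactly to $\sigma^2$ = 5, while dividing by n averages to 2.5, which is why $s^2$ is computed with n − 1.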

  5. Understanding Sampling Distributions
     Let a population of four scores be [2, 4, 6, 8] (N = 4). How many random samples of two scores (n = 2) can we construct, and what would the sample mean be in each sample? Drawing with replacement and keeping track of order, there are 4 × 4 = 16 possible samples:

        #   Y1  Y2  Mean   |   #   Y1  Y2  Mean
        1    2   2    2    |   9    6   2    4
        2    2   4    3    |  10    6   4    5
        3    2   6    4    |  11    6   6    6
        4    2   8    5    |  12    6   8    7
        5    4   2    3    |  13    8   2    5
        6    4   4    4    |  14    8   4    6
        7    4   6    5    |  15    8   6    7
        8    4   8    6    |  16    8   8    8
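A short Python sketch (an illustration, not from the slides) can enumerate the same 16 samples, tally how often each sample mean occurs, and confirm that the sample means average out to the population mean of 5. It assumes sampling with replacement with order kept, as in the table.

```python
import itertools
import statistics
from collections import Counter

population = [2, 4, 6, 8]          # N = 4
mu = statistics.mean(population)   # population mean

# All ordered samples of size n = 2 drawn with replacement (4**2 = 16).
samples = list(itertools.product(population, repeat=2))
sample_means = [statistics.mean(s) for s in samples]

# Tally how often each sample mean occurs and print a text histogram.
freq = Counter(sample_means)
for value in sorted(freq):
    print(f"mean {value}: {'#' * freq[value]}")

# The sample means average out to the population mean: E(Ybar) = mu.
print("mean of the 16 sample means:", statistics.mean(sample_means), "| mu:", mu)
```

The tally shows that 5, the population mean, is the most frequent sample mean (4 of 16), with counts falling off symmetrically on either side.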

  6. Plotting the Distribution of Sample Means

  7. Mapping the Genome
     • The Human Genome Project identified approximately 20,500 genes in human beings
     • Top panel: Population of gene lengths (N = 20,290)
     • Parameters: $\mu$ = 2,622; $\sigma$ = 2,036.967; Min = 60; Max = 99,631
     • Bottom panel: Random sample of gene lengths (n = 100)
     • Estimates: $\bar{Y}$ = 2,777; $s$ = 1,875.814; Min = 87; Max = 10,503
     [Figure: two histograms, "Population" (top) and "Random Sample (n=100)" (bottom); x-axis: gene length (number of nucleotides); y-axis: probability]

  8. What if we drew multiple samples?
     [Figure: histogram of the means of 100 random samples of n = 100; x-axis: sample mean; y-axis: frequency; $\mu$ = 2,622 indicated]

  9. But what if we increased the sample size for each draw?
     [Figure: histogram of the means of 100 random samples of n = 1,000; x-axis: sample mean; y-axis: frequency; $\mu$ = 2,622 indicated]

  10. What if we increased the sample size even further?
     [Figure: histogram of the means of 100 random samples of n = 10,000; x-axis: sample mean; y-axis: frequency; $\mu$ = 2,622 indicated]

  11. What if we increased the sample size even further?
     [Figure: histogram of the means of 100 random samples of n = 15,000; x-axis: sample mean; y-axis: frequency; $\mu$ = 2,622 indicated]

  12. What if we drew all possible samples of n = 100?
     [Figure: histogram of the means of all random samples of n = 100; x-axis: sample mean; y-axis: frequency; $\mu$ = 2,622 indicated]
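The gene-length data are not reproduced here, so the Python sketch below substitutes a hypothetical right-skewed population (20,290 log-normal values from `random.lognormvariate`; the parameters 7.5 and 0.7 are arbitrary choices) and repeats the experiment on these slides: 100 random samples, each drawn without replacement, at each sample size, reporting the mean and standard deviation of the 100 sample means. The exact numbers depend on the seed, but the pattern should match the slides: the sample means stay centered near $\mu$ while their spread shrinks as n grows.

```python
import random
import statistics

random.seed(2016)  # fixed seed so the run is reproducible

# Hypothetical right-skewed population standing in for the gene lengths (NOT the real data).
N = 20_290
population = [random.lognormvariate(7.5, 0.7) for _ in range(N)]
mu = statistics.fmean(population)

results = {}
for n in (100, 1_000, 10_000, 15_000):
    # 100 simple random samples of size n, each drawn without replacement.
    means = [statistics.fmean(random.sample(population, n)) for _ in range(100)]
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(f"n = {n:>6,}: mean of sample means = {results[n][0]:8.1f}, "
          f"SD of sample means = {results[n][1]:6.1f} (mu = {mu:.1f})")
```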

  13. The Sampling Distribution
     Definition: The sampling distribution of $\bar{Y}$ is the probability distribution of all possible values of the sample mean $\bar{Y}$.
     • What we are saying is that, for a random sample of any size n, the expected value of $\bar{Y}$, denoted $E(\bar{Y})$, equals $\mu$
     • Intuitively, unless we mess up our sampling, on average we should end up with a sample mean that equals the population mean. This is not because $\mu$ is the most common value in the population (in the four-score example, no score equals 5), but because each randomly drawn observation has expected value $\mu$, so their average does too: the 16 possible sample means in the four-score example average exactly 5 = $\mu$
     • The preceding simulations show that the larger the sample, the more likely we are to end up with a sample mean close to the population mean; larger samples yield more precise estimates
     • "Likely to be close to $\mu$" is one thing, but how can we measure the precision of our sample-based estimate of the population mean?

  14. Measuring Uncertainty around an Estimate

  15. Measuring Uncertainty around an Estimate
     • The question now is: how far would we expect, on average, our sample mean to be from the population mean, for a given sample size?
     • The standard error provides the answer: $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$
     Definition: The standard error of an estimate is the standard deviation of the estimate's sampling distribution.
     • Two things govern the standard error:
       1. How much the population varies ($\sigma$)
       2. The sample size ($n$)
     • In practice, we seldom know the population standard deviation ($\sigma$) and so have to work with the sample standard deviation ($s$) when calculating the standard error
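Because the four-score example gives the complete sampling distribution, the formula $\sigma_{\bar{Y}} = \sigma / \sqrt{n}$ can be checked exactly rather than by simulation. The Python sketch below computes the standard deviation of the 16 sample means and compares it with $\sigma / \sqrt{2}$. It assumes sampling with replacement; when sampling without replacement from a small finite population, the formula needs a finite population correction.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8]
sigma = statistics.pstdev(population)  # population SD (divides by N): sqrt(5)
n = 2

# Complete sampling distribution of the mean: all 16 ordered samples, with replacement.
sample_means = [statistics.mean(s) for s in itertools.product(population, repeat=n)]

# The 16 means are every value of the sampling distribution, so pstdev
# (which divides by the count) gives its exact standard deviation.
sd_of_means = statistics.pstdev(sample_means)

print("SD of the 16 sample means:", sd_of_means)
print("sigma / sqrt(n):          ", sigma / math.sqrt(n))
```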

  16. The Standard Error of an Estimate
     Definition: The standard error of the mean is estimated from the sample at hand and calculated as $SE_{\bar{Y}} = s / \sqrt{n}$
     Note: When calculating $SE_{\bar{Y}}$ we divide $s$ by $\sqrt{n}$, not by $\sqrt{n - 1}$ (the $n - 1$ correction is already part of $s$)
     • When n = 30 and s = 1522.082: $SE_{\bar{Y}} = 1522.082 / \sqrt{30} = 277.8929$
     • When n = 60 and s = 1522.082: $SE_{\bar{Y}} = 1522.082 / \sqrt{60} = 196.4999$
     • When n = 100 and s = 1522.082: $SE_{\bar{Y}} = 1522.082 / \sqrt{100} = 152.2082$
     • Of course, if $\sigma$ is large then so will be $s$, and as a result so will be $SE_{\bar{Y}}$
     • Note also that every estimate (median, correlation coefficient, etc.) has a standard error associated with it
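The three standard errors on this slide are plain arithmetic; the Python lines below should reproduce them to four decimals.

```python
import math

s = 1522.082  # sample standard deviation used on the slide

# Standard error of the mean, SE = s / sqrt(n), for each sample size on the slide.
ses = {n: s / math.sqrt(n) for n in (30, 60, 100)}

for n, se in ses.items():
    print(f"n = {n:3d}: SE = {se:.4f}")
```

Quadrupling the sample size only halves the standard error, since n enters through its square root.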

  17. Confidence Intervals
     • Since we do not see the population and have only a single estimate drawn from the sample (say, $\bar{Y}$), how sure can we be that we are close to $\mu$?
     • Confidence intervals help us answer this question
     Definition: A confidence interval is a range of plausible values surrounding the sample estimate that is likely to contain the population parameter.
     • Commonly used confidence levels are 95% and 99%, and you hear folks say "we can be 95% confident that the true parameter (e.g., the population mean) lies between values x and y" [popular phrasing]
     • What they should say is: "if we drew all possible samples of size n and calculated a 95% confidence interval from each, 95% of those intervals would trap the population mean"
     • Rule of thumb: the 95% confidence interval is approximately $\bar{Y} \pm 2\,SE_{\bar{Y}}$
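The multiplier 2 in the rule of thumb is a rounded version of the standard normal quantile that leaves 2.5% in each tail. Python's `statistics.NormalDist` can compute it, along with the corresponding multiplier for a 99% interval. This normal-based multiplier is a large-sample approximation; for small samples a t-distribution multiplier, which is somewhat larger, is the usual choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Quantiles leaving 2.5% (for 95%) and 0.5% (for 99%) in each tail.
z95 = std_normal.inv_cdf(0.975)
z99 = std_normal.inv_cdf(0.995)

print(f"95% multiplier: {z95:.3f}")  # the rule of thumb rounds this to 2
print(f"99% multiplier: {z99:.3f}")
```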

  18. Confidence Interval Simulation
     [Figure: 95% confidence intervals from 100 sample runs, n = 20; x-axis: gene length (number of nucleotides); y-axis: sample run]
     Note: Only 94 of the 100 CIs contain $\mu$ = 2,622 (the dashed red line)

  19. ... once more
     [Figure: 95% confidence intervals from 100 sample runs, n = 100; x-axis: gene length (number of nucleotides); y-axis: sample run]
     Note: 95 of the 100 CIs contain $\mu$ = 2,622 (the dashed red line)
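The coverage experiment on these two slides can be sketched in Python as follows. Since the gene-length data are not included here, the sketch assumes a hypothetical normal population with $\mu$ = 50 and $\sigma$ = 10, builds a rule-of-thumb interval ($\bar{Y} \pm 2\,SE_{\bar{Y}}$) from each of 2,000 samples of size 100, and counts how many intervals contain $\mu$. With this many runs the coverage proportion should land close to 95%; any single set of 100 runs, like the ones plotted, can come out a little above or below.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical normal population with known mean and SD (an assumption for illustration).
MU, SIGMA = 50.0, 10.0
n, runs = 100, 2_000

hits = 0
for _ in range(runs):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    ybar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # estimated SE = s / sqrt(n)
    lower, upper = ybar - 2 * se, ybar + 2 * se    # rule-of-thumb 95% CI
    if lower <= MU <= upper:
        hits += 1

coverage = hits / runs
print(f"{hits} of {runs} intervals contain mu ({coverage:.1%})")
```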

  20. Worked Examples

  21. Worked Example 1 (Practice Problem #2)
     1. The standard error of the mean time to rigor mortis is 0.22 hours (approximately 13.27 minutes)
     2. The standard error measures the spread of the sampling distribution of mean time to rigor mortis
     3. That the data represent a random sample of time to rigor mortis

  22. Worked Example 2 (Practice Problem #7)
     1. Mean flash duration is 95.94 milliseconds
     2. No, it is very unlikely, because this estimate is based on a small sample of 35 male fireflies
     3. The standard error is 1.86 milliseconds
     4. The standard error tells us how far, on average, we might expect our sample mean to be from the population mean
     5. The approximate 95% CI is: $95.94286 \pm 2(1.858409) = (92.22604, 99.65968)$
     6. We can be roughly 95% confident that the true population mean flash duration lies in the interval (92.23, 99.66) milliseconds
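The interval in this worked example is simple arithmetic; the Python lines below recompute it from the mean and standard error given in the answer.

```python
mean_flash = 95.94286  # sample mean flash duration (ms), from the worked example
se_flash = 1.858409    # standard error of the mean (ms), from the worked example

# Rule-of-thumb 95% CI: mean +/- 2 standard errors.
lower = mean_flash - 2 * se_flash
upper = mean_flash + 2 * se_flash
print(f"approximate 95% CI: ({lower:.5f}, {upper:.5f}) ms")
```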
