
Inferential Statistics, Chapters 6 & 7

IMGD 2905, 5/1/2017. Overview: use simple statistics to infer population parameters.


  1. Overview
     • Use simple statistics to infer population parameters
     [Image source: http://3.bp.blogspot.com/_94E2PdKwaXE/S-xQRuoiKAI/AAAAAAAAABY/xvDRcG_Mcj0/s1600/120909_0159_1.png]

     Outline
     • Overview (done)
     • Foundation (next)
     • Confidence Intervals
     • Hypothesis Testing

     Dice Rolling (1 of 4)
     • Have 1d6, sample (i.e., roll 1 die)
     • What is the probability distribution of values?
     [Figure: "square" distribution. Source: http://www.investopedia.com/articles/06/probabilitydistribution.asp]

  2. Dice Rolling (2 of 4)
     • Have 1d6, sample twice and sum (i.e., roll 2 dice)
     • What is the probability distribution of values?
     [Figure: "triangle" distribution. Source: http://www.investopedia.com/articles/06/probabilitydistribution.asp]

     Dice Rolling (3 of 4)
     • Have 1d6, sample thrice and sum (i.e., roll 3 dice)
     • What is the probability distribution of values?
     • What's happening to the shape?
     [Figure source: http://www.investopedia.com/articles/06/probabilitydistribution.asp]

     Dice Rolling (4 of 4)
     • Same holds for experiments with dice (i.e., observing sample sum and mean of dice rolls)
     [Figure source: http://www.muelaner.com/uncertainty-of-measurement/]
     • OK, neat. But what about experiments with other distributions?
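The "square" and "triangle" shapes, and the rounding that starts with three dice, can be checked by enumerating every outcome. This is a minimal sketch rather than anything taken from the slides; the function name `dice_sum_distribution` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def dice_sum_distribution(num_dice, sides=6):
    """Return {sum: exact probability} for the sum of num_dice fair dice."""
    faces = range(1, sides + 1)
    # Enumerate every equally likely outcome and count how often each sum occurs.
    counts = Counter(sum(roll) for roll in product(faces, repeat=num_dice))
    total = sides ** num_dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


# Text histogram: one '#' per percentage point of probability.
for num_dice in (1, 2, 3):
    print(f"{num_dice}d6")
    for s, p in dice_sum_distribution(num_dice).items():
        bar = "#" * round(float(p) * 100)
        print(f"  {s:2d} {bar} {float(p):.3f}")
```

Enumeration is exact but grows as 6^k, so it only suits a handful of dice. For more dice, random simulation is the usual substitute.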

  3. Sampling Distributions
     • With "enough" samples, looks "bell-shaped" → Normal!
     • How many is enough?
       – 30 (15 if symmetric distribution)
     • Central Limit Theorem
       – Sum (or mean) of many independent random variables tends towards a Normal distribution
     [Figure source: http://flylib.com/books/2/528/1/html/2/images/figu115_1.jpg]

     Why do we care about sample means following normal distribution?
     • What if we had only a sample mean and no measure of spread
       – e.g., mean rank for Overwatch is 50
     • What can we say about population mean?
       – Not a whole lot!
       – Yes, population mean could be 50. But could be 100. How likely is each? No idea!
     • Remember this?
       [Figure: samples drawn from a population; sample mean versus population mean. Source: http://www.six-sigma-material.com/images/PopSamples.GIF]
     • With mean and standard deviation, can predict a range to bound the population mean
       – Sample mean → probable range of population mean

     Outline
     • Overview (done)
     • Foundation (done)
     • Confidence Intervals (next)
     • Hypothesis Testing
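The dice slides ask what happens with other distributions. A rough way to see the Central Limit Theorem at work is to draw sample means from a clearly non-Normal population and watch them become bell-shaped as the sample size grows. The sketch below assumes an Exponential(1) population (an arbitrary choice, not from the slides). It uses the share of values within one standard deviation as a crude bell-shape check: about 0.68 for a Normal distribution, about 0.86 for the raw exponential.

```python
import random
import statistics


def sample_means(sample_size, trials, rng):
    """Means of `trials` independent samples from an Exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(trials)
    ]


def fraction_within_one_sd(values):
    """Share of values within one standard deviation of their own mean."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    return sum(abs(v - center) <= spread for v in values) / len(values)


rng = random.Random(2905)  # fixed seed so reruns give the same numbers
for n in (1, 5, 30):
    means = sample_means(n, trials=5000, rng=rng)
    print(f"n={n:2d}  mean={statistics.fmean(means):.3f}  "
          f"sd={statistics.stdev(means):.3f}  "
          f"within 1 sd={fraction_within_one_sd(means):.3f}")
```

The spread of the means should also shrink roughly as 1/sqrt(n), which is the standard error idea that the later slides build on.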

  4. Sampling Error (1 of 2)
     • Population of 200 game times
       – Mean μ = 69.637
       – Std Dev σ = 10.411
     • Experiment w/20 samples
       – Each 15 game times
     • Observations?
       – Statistics differ each time!
       – Sometimes higher, sometimes lower than population (μ, σ)
       – Sample range varies a lot more than sample standard deviation
       – Population mean always within sample range
     • This variation → sampling error

     Sampling Error (2 of 2)
     • Error from estimating population parameters from sample statistics
     • Exact error often cannot be known (do not know population parameters)
     • But size of error based on:
       – Variation in population (σ) itself: more variation, more sample statistic variation
       – Sample size (N): larger sample, lower error
     • Q: Why can't we just make sample size super large?
     • How much does it vary? → Standard error

     Standard Error (1 of 2)
     • Amount sample means vary from sample to sample
     • Also likelihood that sample statistic is near population parameter
       – Depends upon sample size (N)
       – Depends upon standard deviation
     (Example next)

     Standard Error (2 of 2)
     [Figure: standard error, 100 samples, N=3 versus N=20. Source: http://www.biostathandbook.com/standarderror.html]
     • What will happen to x's, dots, and bars for N = 20?
     • Estimate population parameter → confidence interval
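The N=3 versus N=20 comparison can be approximated with a short simulation. The slides' 200 game times are not reproduced in the text, so this sketch substitutes a hypothetical population generated with roughly the same mean and standard deviation. It estimates the standard error in the usual way, as s / sqrt(N), and samples with replacement so the simple formula applies.

```python
import math
import random
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(N)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def spread_of_sample_means(population, sample_size, num_samples, rng):
    """Standard deviation of the means of repeated samples (with replacement)."""
    means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


rng = random.Random(2905)
# Hypothetical stand-in: 200 values with roughly the slide's mean and std dev.
population = [rng.gauss(69.637, 10.411) for _ in range(200)]
sigma = statistics.pstdev(population)

for n in (3, 20):
    observed = spread_of_sample_means(population, n, num_samples=100, rng=rng)
    print(f"N={n:2d}  observed spread of means={observed:.2f}  "
          f"sigma/sqrt(N)={sigma / math.sqrt(n):.2f}")
```

With only 100 samples per setting the observed spread is itself noisy, but it should land near sigma / sqrt(N) and be clearly smaller for N = 20 than for N = 3.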

  5. Confidence Interval
     • Range of values with specific certainty that population parameter is within
       – e.g., 90% confidence interval for mean League of Legends match duration: [28.5 minutes, 32.5 minutes]
     • Have sample of durations
     • Compute interval containing population duration (with 90% confidence)
     [Figure: normal curve with the central 90% between 28.5 and 32.5. Source: http://www.comfsm.fm/~dleeling/statistics/notes009_normalcurve90.png]

     Confidence Interval for the Mean
     • In general: probability of μ in interval [c1, c2]
       – P(c1 < μ < c2) = 1 - α
       – [c1, c2] is confidence interval
       – α is significance level
       – 100(1 - α) is confidence level
     • Typically want α small so confidence level 90%, 95% or 99% (more on effect later)
     • Say, α = 0.1. Could do k experiments, find sample means, sort
       – Cumulative distribution
     • Interval from distribution:
       – Lower bound: 5%
       – Upper bound: 95%
       → 90% confidence interval
     • We have to do k experiments, each of size n?

     Confidence Interval Estimate
     • Estimate interval from 1 experiment/sample, size n
     • Compute sample mean, sample standard error (SE)
     • Multiply SE by t value (from t distribution)
     • Add/subtract from sample mean
       → Confidence interval
     • e.g., mean 30.5, SE x t is 2
       – 30.5 - 2 = 28.5
       – 30.5 + 2 = 32.5
       – [28.5, 32.5]
     • Ok, what is t distribution?
       – Parameterized by α and n

     t distribution
     • Looks like standard normal, but bit "squashed"
     • Gets more squashed as n gets smaller
     • Note, can use standard normal (z distribution) when large enough sample size (N = 30+)
     • aka Student's t distribution ("Student" was anonymous name used when published by William Gosset)
     [Figure source: http://ci.columbia.edu/ci/premba_test/c0331/images/s7/6317178747.gif]

     Confidence Interval Example
     • Game Time (sorted):
         1.9  2.7  2.8  2.8  2.8  2.9  3.1  3.1
         3.2  3.2  3.3  3.4  3.6  3.7  3.8  3.9
         3.9  3.9  4.1  4.1  4.2  4.2  4.4  4.5
         4.5  4.8  4.9  5.1  5.1  5.3  5.6  5.9
     • x̄ = 3.90, stddev s = 0.95, n = 32
     • A 90% confidence interval (α is 0.1) for population mean (μ):
         3.90 ± (1.645 × 0.95) / √32 = [3.62, 4.18]
       – Lookup 1.645 in table (z value), or use =TINV(0.1,31) (t value, about 1.696)
     • With 90% confidence, μ in that interval. Chance of error 10%.
     • But, what does that mean? (See next slide for depiction of meaning)

     Meaning of Confidence Interval (α)
     • If 100 experiments and confidence level is 90%: in 90 cases interval includes μ, in 10 cases does not include μ
     • e.g., α = 0.1

         Experiment/Sample    Includes μ?
         1                    yes
         2                    yes
         3                    no
         …
         100                  yes
         Total yes            ≥ 100(1 - α)    90
         Total no             ≤ 100α          10
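The interval calculation and the "100 experiments" reading of it can be sketched in a few lines. This sketch uses the 32 sorted game times from the example and the standard normal (z) value, which the slides allow for N = 30+. A t value would need a t-distribution routine that the Python standard library lacks. The function names and the Normal population used for the coverage check are assumptions for illustration. Because the code works from the unrounded data, its bounds may differ in the last digit from a calculation that starts from the rounded mean and standard deviation.

```python
import math
import random
import statistics

# The 32 sorted game times listed in the confidence interval example.
GAME_TIMES = [
    1.9, 2.7, 2.8, 2.8, 2.8, 2.9, 3.1, 3.1,
    3.2, 3.2, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9,
    3.9, 3.9, 4.1, 4.1, 4.2, 4.2, 4.4, 4.5,
    4.5, 4.8, 4.9, 5.1, 5.1, 5.3, 5.6, 5.9,
]


def mean_confidence_interval(sample, confidence=0.90):
    """Two-sided interval: mean +/- z * s / sqrt(n) (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    alpha = 1.0 - confidence
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mean - z * se, mean + z * se


def coverage(mu, sigma, n, confidence, experiments, rng):
    """Fraction of simulated experiments whose interval contains the true mean."""
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = mean_confidence_interval(sample, confidence)
        hits += low <= mu <= high
    return hits / experiments


low, high = mean_confidence_interval(GAME_TIMES, 0.90)
print(f"90% CI for mean game time: [{low:.2f}, {high:.2f}]")

rng = random.Random(2905)
# Hypothetical Normal population, only to illustrate what "90% confidence" means.
share = coverage(3.9, 0.95, 32, 0.90, 1000, rng)
print(f"share of 1000 intervals containing the true mean: {share:.3f}")
```

The coverage figure should come out close to 0.90. It tends to sit slightly below because z is a little smaller than the matching t value at n = 32.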

  6. How does Confidence Interval Size Change?
     • With number of samples (N)
     • With confidence level (α)

     How does Confidence Interval Change (1 of 2)?
     • What happens to confidence interval when sample larger (N increases)?
       – Hint: think about Standard Error

     How does Confidence Interval Change (2 of 2)?
     • 90% CI = [6.5, 9.4]
       – 90% chance population value is between 6.5, 9.4
     • 95% CI = [6.1, 9.8]
       – 95% chance population value is between 6.1, 9.8
     • Why is interval wider when we are "more" confident?

     Using Confidence Interval (1 of 2)
     • Indicator of spread → Error bars
     • CI can be more informative than standard deviation → indicates range of population parameter (make sure 30+ samples!)
     [Figure source: http://vassarstats.net/textbook/f1002.gif]
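Both questions (larger N, higher confidence) can be explored numerically from the half-width z × s / sqrt(N). The sketch below again assumes the normal approximation and borrows s = 0.95 from the game-time example purely as an illustrative value. `ci_half_width` is a made-up helper name.

```python
import math
import statistics


def ci_half_width(stddev, n, confidence):
    """Half-width of a two-sided interval for the mean: z * s / sqrt(n)."""
    z = statistics.NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * stddev / math.sqrt(n)


s = 0.95  # assumed sample standard deviation, borrowed from the game-time example

print("half-width by sample size (90% confidence)")
for n in (32, 128, 512):
    print(f"  N={n:3d}  +/- {ci_half_width(s, n, 0.90):.3f}")

print("half-width by confidence level (N=32)")
for level in (0.90, 0.95, 0.99):
    print(f"  {level:.0%}  +/- {ci_half_width(s, 32, level):.3f}")
```

Quadrupling N halves the interval, since the standard error falls with sqrt(N). Raising the confidence level widens it, because a larger multiplier is needed to capture more of the sampling distribution.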
