Univariate Continuous Data



  1. Univariate Continuous Data
     MATH 185 – Introduction to Computational Statistics
     University of California, San Diego
     Instructor: Ery Arias-Castro
     http://math.ucsd.edu/~eariasca/math185.html
     MATH 185 – University of California San Diego – Ery Arias-Castro

  2. Lung dysfunction in workers in the detergent industry
     We consider the following dataset (quoted in Larsen & Marx, exercise 5.3.1).
         > str(bacillus)
         'data.frame': 19 obs. of 1 variable:
          $ ratio: num 0.61 0.7 0.63 0.76 0.67 0.72 0.64 0.82 0.88 0.82 ...
     The FEV1/VC ratio is a measure of lung capacity.
     • FEV1: forced expiratory volume
     • VC: vital capacity
     • A normal FEV1/VC ratio is 0.80.
     This ratio was measured for certain workers in the detergent industry exposed to a Bacillus subtilis enzyme.

  3. Boxplot
     A basic plot that helps visualize how the data is spread out.
     [Figure: boxplot of ratio; vertical axis from 0.60 to 0.85]

  4. Boxplot
     • The middle box represents the inter-quartile range and contains the middle 50% of the data.
     • The upper edge (hinge) of the box indicates the 75th percentile of the data set.
     • The lower hinge indicates the 25th percentile.
     • The line within the box indicates the median value of the data.
     • The ends of the vertical lines, or "whiskers", indicate the minimum and maximum data values, unless outliers are present, in which case each whisker extends at most 1.5 times the inter-quartile range beyond its hinge.
     • The points outside the ends of the whiskers are outliers or suspected outliers.
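
The whisker rule above can be checked numerically. The following is a minimal sketch, not the course's own code: it assumes only the first ten of the 19 observations (the ones printed by `str(bacillus)`), so its numbers will not match the figures for the full dataset. It uses Tukey's hinges from `fivenum()`, which are close to but not always identical to the 25th and 75th percentiles.

```r
# Only the first ten of the 19 observations are printed on the slides,
# so this is an illustrative subset, not the full dataset.
ratio <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)

# Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(ratio)
lower_hinge <- five[2]
upper_hinge <- five[4]
box_height <- upper_hinge - lower_hinge

# Whiskers may reach at most 1.5 box heights beyond the hinges.
lower_fence <- lower_hinge - 1.5 * box_height
upper_fence <- upper_hinge + 1.5 * box_height

# Each whisker stops at the most extreme observation inside the fences;
# anything beyond the fences is flagged as a suspected outlier.
inside <- ratio >= lower_fence & ratio <= upper_fence
whiskers <- range(ratio[inside])
suspected_outliers <- ratio[!inside]

# boxplot.stats() computes the numbers that boxplot() draws.
bs <- boxplot.stats(ratio)
print(five)
print(whiskers)
print(bs$stats[c(1, 5)])   # whisker ends according to boxplot.stats()
print(suspected_outliers)
```

When no observation falls outside the fences, the whiskers simply reach the minimum and maximum, as described in the slide.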

  5. Boxplot
     The following table provides similar information.
         > summary(ratio)
            Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          0.6100  0.7100  0.7800  0.7663  0.8350  0.8800

  6. Histogram
     The histogram is more detailed: it approximates the actual distribution of the data.
     [Figure: "Histogram of ratio"; x-axis: ratio (0.60 to 0.90); y-axis: Frequency (0 to 7)]

  7. Histogram
     • A bin's width is the range it covers.
     • A bin's height is proportional to the number of points that fall within that range.
     • The histogram is a (piecewise constant) approximation of the population's probability density function.
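
To make the link between bin heights and a density concrete, here is a small sketch under stated assumptions: it uses only the ten observations printed by `str(bacillus)`, and the bin edges (width 0.05 from 0.60 to 0.90) are a guess based on the axis ticks of the plotted histogram, not something the slides state.

```r
# Illustrative subset: the ten values shown on the slides (the full data has 19).
ratio <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)

# Assumed bin edges; plot = FALSE returns the histogram object without drawing.
h <- hist(ratio, breaks = seq(0.60, 0.90, by = 0.05), plot = FALSE)
widths <- diff(h$breaks)

# Counts are the bar heights on the frequency scale.
print(h$counts)

# On the density scale, height = count / (n * width), so the bars
# form a piecewise constant function whose total area is 1.
manual_density <- h$counts / (length(ratio) * widths)
total_area <- sum(h$density * widths)
print(h$density)
print(total_area)
```

Calling `hist(ratio, freq = FALSE)` draws the density-scale version directly.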

  8. Boxplot v. Histogram
         > library(UsingR)
         > simple.hist.and.boxplot(ratio)
     [Figure: "Histogram of x" shown together with a boxplot of the same data; axis from 0.60 to 0.90]

  9. Main question
     • Are workers exposed to a Bacillus subtilis enzyme more likely to suffer from lung dysfunction?
     • Since we know what the normal level for the FEV1/VC ratio is (0.80), we want to compare the observations to this baseline.

  10. Testing the Mean – Student's t-Test
      • Let $\mu$ be the population mean.
      • Consider $H_0 : \mu = 0.80$ versus $H_1 : \mu < 0.80$.
      • The Student t-test rejects for large negative values of $T$, where
        $$T = \frac{\bar X - \mu_0}{S/\sqrt{n}}, \qquad \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar X)^2,$$
        and $\mu_0 = 0.80$ is the value of $\mu$ under $H_0$.
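
The statistic can be computed by hand and compared with `t.test()`. This is a sketch only: it assumes the ten observations printed by `str(bacillus)` rather than the full sample of 19, so the resulting numbers are not those of the lecture.

```r
# Illustrative subset: the ten values shown on the slides (the full data has 19).
ratio <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
mu0 <- 0.80                      # value of the mean under H0

n <- length(ratio)
xbar <- mean(ratio)              # sample mean
s <- sd(ratio)                   # sample standard deviation (divisor n - 1)

# T = (xbar - mu0) / (S / sqrt(n)); reject H0 for large negative values.
t_stat <- (xbar - mu0) / (s / sqrt(n))

# One-sided p-value: lower tail of the t distribution with n - 1 df.
p_less <- pt(t_stat, df = n - 1)

# The built-in test with the one-sided alternative should agree.
fit <- t.test(ratio, mu = mu0, alternative = "less")
print(c(manual = t_stat, builtin = unname(fit$statistic)))
print(c(manual = p_less, builtin = fit$p.value))
```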

  11. Testing the Mean – Student's t-Test
          > t.test(ratio, mu = 0.8)
                  One Sample t-test
          data:  ratio
          t = -1.7091, df = 18, p-value = 0.1046
          alternative hypothesis: true mean is not equal to 0.8
          95 percent confidence interval:
           0.7249096 0.8077219
          sample estimates:
          mean of x
          0.7663158
      The p-value is based on the assumption that the observations are an i.i.d. sample from a normal distribution.
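
Note that this output is for the default two-sided alternative ("not equal to 0.8"), whereas the hypotheses were stated with a one-sided alternative. The sketch below uses only the numbers printed in the output above (`t`, `df`, and the sample mean) to recover the two-sided p-value and the confidence interval, and to obtain the corresponding one-sided p-value; the variable names are illustrative.

```r
# Values copied from the t.test() output on the slide.
t_obs <- -1.7091
df <- 18
xbar <- 0.7663158
mu0 <- 0.8

# Two-sided p-value: both tails beyond |t|.
p_two_sided <- 2 * pt(-abs(t_obs), df)

# One-sided p-value for H1: mu < 0.80 (lower tail only).
# Because t is negative, this is half the two-sided value.
p_less <- pt(t_obs, df)

# Standard error implied by t = (xbar - mu0) / se,
# then the 95% interval xbar +/- t-quantile * se.
se <- (xbar - mu0) / t_obs
ci <- xbar + c(-1, 1) * qt(0.975, df) * se

print(p_two_sided)
print(p_less)
print(ci)
```

Running `t.test(ratio, mu = 0.8, alternative = "less")` on the full data would report the one-sided p-value directly.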

  12. Quantile-Quantile Plot
      Helps visualize whether a sample comes from a given translation/scale family of distributions. Here we compare with the normal family.
      [Figure: "Normal Q-Q Plot"; x-axis: Theoretical Quantiles (−2 to 2); y-axis: Sample Quantiles (0.60 to 0.85)]
      If the points lie close to the line, then the assumption of normality is reasonable.
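
A normal Q-Q plot pairs the sorted observations with standard normal quantiles at evenly spread probability points. The sketch below reproduces those coordinates by hand and compares them with `qqnorm()`; it assumes the ten observations printed by `str(bacillus)`, not the full sample, and the correlation at the end is only an informal summary of how straight the point cloud is.

```r
# Illustrative subset: the ten values shown on the slides (the full data has 19).
ratio <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
n <- length(ratio)

# Theoretical quantiles: standard normal quantiles at the
# plotting positions returned by ppoints(n).
theoretical <- qnorm(ppoints(n))

# Sample quantiles: the order statistics of the data.
sample_q <- sort(ratio)

# qqnorm() returns the same pairs, in the original data order.
qq <- qqnorm(ratio, plot.it = FALSE)

# Informal linearity check: values near 1 suggest a good fit.
print(cor(theoretical, sample_q))

# Draw the plot and reference line when running interactively.
if (interactive()) {
  qqnorm(ratio)
  qqline(ratio)
}
```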

  13. Wilcoxon Sign Test
      • Let $m$ be the population median.
      • Consider $H_0 : m = m_0$ versus $H_1 : m < m_0$.
        - The Wilcoxon sign test rejects for small values of $N_+$, where $N_+ = \#\{i : X_i > m_0\}$.
        - In fact, for large $n$, the following statistic $Z$ is approximately standard normal:
          $$Z = \frac{N_+ + N_0/2 - n/2}{\sqrt{n/4}}, \qquad N_0 = \#\{i : X_i = m_0\}.$$
      • Here we get a p-value of 0.4092729.
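
The counts and the normal approximation above are straightforward to compute. This sketch assumes the ten observations printed by `str(bacillus)` and $m_0 = 0.80$, so it will not reproduce the p-value quoted for the full data. The exact binomial version at the end, which drops observations equal to $m_0$, is a common alternative and not something the slides use.

```r
# Illustrative subset: the ten values shown on the slides (the full data has 19).
ratio <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
m0 <- 0.80                        # hypothesized median

n <- length(ratio)
n_plus <- sum(ratio > m0)         # N+: observations above m0
n_zero <- sum(ratio == m0)        # N0: observations equal to m0

# Normal approximation: Z = (N+ + N0/2 - n/2) / sqrt(n/4).
z <- (n_plus + n_zero / 2 - n / 2) / sqrt(n / 4)

# Reject for small N+, so the p-value is the lower tail.
p_normal <- pnorm(z)

# Exact alternative (an assumption, not from the slides): under H0,
# N+ is Binomial(n - N0, 1/2) once ties with m0 are discarded.
exact <- binom.test(n_plus, n - n_zero, p = 0.5, alternative = "less")

print(c(n_plus = n_plus, n_zero = n_zero))
print(c(z = z, p_normal = p_normal, p_exact = exact$p.value))
```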

  14. Wilcoxon Signed Rank Test
      • Let $F$ be the population's cumulative distribution function, which we assume to be symmetric about its median $m$.
      • Say we want to test $H_0 : m = m_0$ versus $H_1 : m < m_0$.
        - The Wilcoxon signed-rank test rejects for small values of $W$, where
          $$W = \sum_{i=1}^{n} Y_i R_i, \qquad R_i = \operatorname{rank}(|X_i - m_0|), \qquad Y_i = \operatorname{sign}(X_i - m_0).$$
          (Actually, R returns $V = \sum_{i : X_i > m_0} R_i$, the sum of the ranks of the positive differences; when no $X_i$ equals $m_0$, $V = W/2 + n(n+1)/4$.)
        - In fact, for large $n$, the following statistic $Z$ is approximately standard normal under $H_0$:
          $$Z = \frac{V - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}}.$$
      • Here we get p-value = 0.07084.
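
The signed-rank statistic can be built step by step and compared with the `V` reported by `wilcox.test()`. This is a sketch under assumptions: it uses the ten observations printed by `str(bacillus)` with $m_0 = 0.80$, and the plain normal approximation it computes ignores ties, whereas `wilcox.test()` may adjust the variance for tied ranks, so the two p-values need not coincide exactly.

```r
# Illustrative subset: the ten values shown on the slides (the full data has 19).
ratio <- c(0.61, 0.70, 0.63, 0.76, 0.67, 0.72, 0.64, 0.82, 0.88, 0.82)
m0 <- 0.80                        # hypothesized median

# Differences from m0; observations equal to m0 are dropped.
d <- ratio - m0
d <- d[d != 0]
n <- length(d)

# Ranks of the absolute differences (ties get average ranks).
r <- rank(abs(d))

# Signed-rank sum and the sum of ranks of positive differences.
w_signed <- sum(sign(d) * r)
v <- sum(r[d > 0])

# Normal approximation based on V (no tie correction here).
z <- (v - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
p_normal <- pnorm(z)

# Built-in test, asking for the normal approximation without
# continuity correction so it is comparable to the formula above.
fit <- wilcox.test(ratio, mu = m0, alternative = "less",
                   exact = FALSE, correct = FALSE)

print(c(w_signed = w_signed, v = v, builtin_V = unname(fit$statistic)))
print(c(z = z, p_normal = p_normal, p_builtin = fit$p.value))
```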

