
Announcements U 2: P - PowerPoint PPT Presentation



  1. U 2: P
     L 3: N D P
     Statistics 101
     Nicole Dalzell
     May 21, 2015

     Announcements
     Lab 3a due tomorrow at 6 PM
     Project Proposal

     Project
     Project instructions posted: https://stat.duke.edu/~nmd16/courses/Summer15/sta101.001-1/problem_sets/project.pdf
     Think about research questions to explore.
     Decide whether you'll collect your own observational data, conduct an experiment, or use previously collected data (from a published study or public database).
     Brainstorm due Tuesday May 25.
     Proposal due Thursday, June 4.
     Project due Thursday, June 18.

     Project
     Data resources: Data and GIS Services
     http://guides.library.duke.edu/stat101

     Statistics 101 (Nicole Dalzell) U2 - L2: Normal distribution May 21, 2015

  2. Project
     Data resources: Data and GIS Services - office hours
     For questions related to finding data and getting it into R only, not statistical analysis questions.
     http://library.duke.edu/data/about/schedule.html

     Project
     Data resources: Hillary Mason & Bit.ly
     http://bitly.com/bundles/hmason/1

     Project
     Data resources: Reddit
     http://www.reddit.com/r/datasets

     Project
     Data resources: DASL
     http://lib.stat.cmu.edu/DASL/

  3. Project
     Data resources: Citizen Statistician
     http://citizen-statistician.org/2012/11/07/data-sets-a-list-in-flux/

     Project
     Data resources
     and many others, get creative!

     Project
     Getting Data in R
     In lab you have been given nicely formatted data that can be directly loaded into R using either load or source. This will rarely happen outside of this class, and for your project you will need to convert your data into a format R can read.
     Ideally use a plaintext format: csv, tab delimited, etc. (read.csv, read.table, read.delim)
     Avoid proprietary formats (usually doable but require extra work)
     Programs like Excel are useful to convert and clean up data
     Find your data early; if you run into trouble, ask for help

     The Normal Distribution
     Participation question
     Scores on a standardized test are normally distributed with a mean of 100 and a standard deviation of 20. If these scores are converted to standard normal Z scores, which of the following statements will be correct?
     (a) Both the mean and median score will equal 0.
     (b) The mean will equal 0, but the median cannot be determined.
     (c) The mean of the z-scores will equal 100.
     (d) The mean of the z-scores will equal 5.
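The "Getting Data in R" advice above can be illustrated with a minimal sketch. The file names and the three-row data set are hypothetical, created in a temporary location only so that the example runs on its own; a real project would point `read.csv` or `read.delim` at its own plaintext file.

```r
# Hypothetical example data, written to a temporary file so the sketch is
# self-contained. In a real project this file would already exist.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,100", "2,120", "3,80"), csv_path)

# read.csv expects comma-separated fields and a header row by default.
scores <- read.csv(csv_path)

# The same data saved as tab-delimited text (hypothetical file again).
tsv_path <- tempfile(fileext = ".tsv")
write.table(scores, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim is the tab-separated counterpart of read.csv.
scores_tab <- read.delim(tsv_path)

# Inspect the column names and types that R inferred.
str(scores)
```

Checking the result with `str` right after reading is a cheap way to catch columns that were read as text when numbers were expected.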

  4. The Normal Distribution
     Percentiles
     Percentile is the percentage of observations that fall below a given data point.
     Graphically, percentile is the area below the probability distribution curve to the left of that observation.
     [Figure: normal curve, horizontal axis from 600 to 2400]

     The Normal Distribution
     Approximating percentiles
     Approximately what percent of students score below 1800 on the SAT? The mean SAT score is 1500, with a standard deviation of 300.
     (Hint: Use the 68-95-99.7% rule.)
     [Figure: normal curve, horizontal axis from 600 to 2400]

     The Normal Distribution
     Calculating percentiles - using computation
     There are many ways to compute percentiles/areas under the curve:
     R:
       > pnorm(1800, mean = 1500, sd = 300)
       [1] 0.8413447
     Applet: http://www.socr.ucla.edu/htmls/SOCR_Distributions.html

     The Normal Distribution
     Z-Scores
     Z-Score: The z-score for a data value, $x_i$, is $z = \frac{x_i - \bar{x}}{s}$
     Values farther from 0 are more extreme.
     A z-score puts values on a common scale
     A z-score is the number of standard deviations a value falls from the mean
     For a normal distribution, about 95% of all z-scores fall between -2 and 2.
     z-scores beyond -2 or 2 can be considered extreme
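As a rough sketch of how the SAT question above ties together the 68-95-99.7% rule, the z-score formula, and `pnorm`, the following uses the slide's numbers (mean 1500, SD 300, score 1800). The variable names are illustrative, and the population mean and SD stand in for $\bar{x}$ and $s$.

```r
mu    <- 1500   # mean SAT score from the slide
sigma <- 300    # standard deviation from the slide
x     <- 1800   # score of interest

# z-score: number of standard deviations x lies from the mean.
z <- (x - mu) / sigma

# 68-95-99.7 rule: about 68% of scores lie within 1 SD of the mean,
# so about 34% lie between the mean and 1 SD above it.
# Adding the 50% that lie below the mean gives the approximation.
approx_below <- 0.50 + 0.68 / 2

# Computed value, on the original scale and on the z scale.
exact_below   <- pnorm(x, mean = mu, sd = sigma)
exact_below_z <- pnorm(z)

print(c(z = z, approx = approx_below, exact = exact_below))
```

The two `pnorm` calls agree because standardizing does not change the area to the left of the observation; the rule-of-thumb value of about 84% is close to the computed 0.8413447.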

  5. The Normal Distribution
     Calculating percentiles - using tables

                        Second decimal place of Z
       Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
       0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
       0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
       0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
       0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
       0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
       0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
       0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
       0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
       0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
       0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
       1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
       1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
       1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015

     Z-score = 1

     The Normal Distribution
     Participation question
     Which of the following is false?
     (a) Z scores are helpful for determining how unusual a data point is compared to the rest of the data in the distribution.
     (b) The majority of Z scores in a right skewed distribution are negative.
     (c) Regardless of the shape of the distribution (symmetric vs. skewed) the Z score of the mean is always 0.
     (d) In a normal distribution, Q1 and Q3 are more than one SD away from the mean.

     The Normal Distribution
     What percent of the standard normal distribution is above Z = 0.82? Choose the closest answer.
     (a) 79.4%
     (b) 20.6%
     (c) 82%
     (d) 18%
     (e) Need to be provided the mean and the standard deviation of the distribution in order to be able to solve this problem.

     The Normal Distribution
     Example
     The average daily temperature in June in LA is 77 F, with a standard deviation of 5 degrees. Suppose the temperatures in June closely follow a normal distribution. What is the probability of observing a temperature of at most 83 F on a randomly chosen day in June?
     $T \sim N(\text{mean} = 77, \text{sd} = 5)$
     $P(T \le 83) = P\left(Z \le \frac{83 - 77}{5}\right) = P(Z \le 1.2)$
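The questions on this page can be checked numerically with `pnorm` and `qnorm`. The sketch below is one way to do so, not necessarily how the lecture worked them; the variable names are illustrative.

```r
# Temperature example: T ~ N(mean = 77, sd = 5), find P(T <= 83).
z_temp       <- (83 - 77) / 5                  # standardize: z = 1.2
p_at_most_83 <- pnorm(83, mean = 77, sd = 5)   # area to the left of 83
p_via_z      <- pnorm(z_temp)                  # same area on the z scale

# Area ABOVE Z = 0.82 on the standard normal curve.
# pnorm returns the area below, so subtract from 1
# (equivalently, pnorm(0.82, lower.tail = FALSE)).
p_above_082 <- 1 - pnorm(0.82)

# Quartiles of the standard normal, in SD units from the mean.
quartiles_z <- qnorm(c(0.25, 0.75))

print(round(c(p_at_most_83 = p_at_most_83, p_above_082 = p_above_082), 4))
print(round(quartiles_z, 3))
```

The first probability rounds to 0.8849, the table entry in row 1.2 and column 0.00, and the area above Z = 0.82 is about 0.206. The quartiles fall at roughly ±0.674, which is less than one SD from the mean.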
