introduction to probability distributions and testing


  1. Introduction to probability distributions and testing: Lecture 2

  2. Summary of this week
     ■ This week we introduce the normal distribution and how to estimate population means from samples using confidence intervals.
     ■ In RStudio we will use many built-in functions for calculating summary statistics, probabilities and critical values (quantiles).

  3. Learning objectives for the week: by actively following the lecture and practical and carrying out the independent study, the successful student will be able to:
     ■ Explain the properties of ‘normal distributions’ and their use in statistics
     ■ Define, select and calculate probabilities, quantiles and confidence intervals in R

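  4. What is a probability distribution? A variable is some information about an individual (the property that we measure). A probability distribution is a formula or a table used to assign probabilities to each possible value of a random variable X. Example: you flip a coin twice. What is the probability of getting heads both times? There are 4 equally likely outcomes (HH, HT, TH, TT) and, assuming a fair coin, the probability of getting a head on each flip is 0.5, so P(two heads) = 0.5 × 0.5 = 0.25. Listing the probability of each possible number of heads (0, 1 or 2) gives a probability distribution!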

  5. What is a probability distribution? (continued) [Figure: bar chart of probability (y axis, 0 to 0.6) against the number of heads (x axis: 0, 1, 2), labelled "This is a probability distribution!"]
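A minimal R sketch of the coin example, assuming a fair coin (probability of a head 0.5 on each flip). It uses dbinom(), R's binomial probability function, to tabulate the whole distribution rather than only the two-heads case:

```r
# Number of heads in two flips of a fair coin (assumption: P(head) = 0.5)
heads <- 0:2
probs <- dbinom(heads, size = 2, prob = 0.5)

# The probability distribution as a table: 0.25, 0.50, 0.25
coin_table <- data.frame(heads = heads, probability = probs)
print(coin_table)

# Probability of getting heads both times
p_two_heads <- dbinom(2, size = 2, prob = 0.5)
print(p_two_heads)

# The probabilities over all possible values sum to 1
print(sum(probs))
```

barplot(probs, names.arg = heads) draws a bar chart like the one on the slide.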

  6. What is a probability distribution? A probability distribution may be either discrete or continuous:
     ■ Discrete distribution: X can take one of a countable number of distinct values (e.g. the binomial distribution, as in the previous coin example, where X is 0, 1 or 2)
     ■ Continuous distribution: X can take any value within a range, i.e. infinitely many different values (e.g. normal, uniform and many others)
     ■ We will focus on normal distributions in this module. Why? Because normal distributions are central to most of the statistical tests we use.
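A short R sketch of the difference. The discrete case reuses the coin example; for the continuous case it assumes a uniform distribution on 0 to 1 (one of the "many others"), where probabilities come from areas over intervals, computed here with punif():

```r
# Discrete (binomial, 2 fair coin flips): each value has its own probability
p_one_head <- dbinom(1, size = 2, prob = 0.5)   # P(X = 1) = 0.5

# Continuous (assumed uniform on 0 to 1): probabilities are areas over intervals
p_interval <- punif(0.3) - punif(0.1)           # P(0.1 < X < 0.3) = 0.2

# A single exact value has zero width, so zero probability
p_exact <- punif(0.3) - punif(0.3)              # 0

print(c(p_one_head = p_one_head, p_interval = p_interval, p_exact = p_exact))
```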

  7. What is the normal distribution? The normal distribution (a.k.a. Gaussian distribution, bell-shaped curve) is a symmetrical distribution used to compute probabilities for continuous data. [Figure: bell-shaped curve; y axis = density f(x), the height of the curve for a given value on the x axis; x axis = the values the variable x can take (a.k.a. quantiles)]

  8. All normal distributions have these properties… Pierce (2017) 'Normal Distribution', Math Is Fun, Available at: <http://www.mathsisfun.com/data/standard-normal-distribution.html>

  9. All normal distributions have these properties… Standard deviation (SD): a measure of how spread out the data points are. About 68% of observations lie within 1 SD of the mean, and 95% of observations lie within 1.96 SD of the mean.

  10. All normal distributions have these properties… Standard deviation: a measure of how spread out the data points are. Example: x = 1, 2, 3, 4, 5, 6, so n = 6 and the mean is $\bar{x} = 3.5$:
      $$\text{SD} = \sqrt{\frac{(1-3.5)^2 + (2-3.5)^2 + (3-3.5)^2 + (4-3.5)^2 + (5-3.5)^2 + (6-3.5)^2}{6}} = \sqrt{2.92} \approx 1.7$$
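The same calculation in R. Note that the slide divides by n (the population SD), whereas R's built-in sd() divides by n − 1 (the sample SD), so sd() gives a slightly larger value for these six numbers. This matters later, where sd() is used to compute the standard error.

```r
x <- c(1, 2, 3, 4, 5, 6)
n <- length(x)
x_bar <- mean(x)                          # 3.5

# Divide by n, as on the slide: sqrt(17.5 / 6) = sqrt(2.92), about 1.71
sd_slide <- sqrt(sum((x - x_bar)^2) / n)

# R's sd() divides by n - 1: sqrt(17.5 / 5) = sqrt(3.5), about 1.87
sd_r <- sd(x)

print(c(mean = x_bar, sd_slide = sd_slide, sd_r = sd_r))
```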

  11. All normal distributions have these properties… About 68% of observations lie within 1 SD of the mean (e.g. if the data above were normally distributed, about 68% would lie within 3.5 ± 1.7), and 95% of observations lie within 1.96 SD of the mean (e.g. 95% within 3.5 ± 3.3, since 1.96 × 1.7 ≈ 3.3).
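These percentages can be checked in R with pnorm(), which gives the area under a normal curve below a value (by default for the standard normal, with mean 0 and SD 1). The last step treats the slide's data as if they were normal with mean 3.5 and SD 1.7, an assumption made only for illustration:

```r
# Area within 1 SD and within 1.96 SD of the mean of a standard normal
within_1sd    <- pnorm(1) - pnorm(-1)        # about 0.683
within_1.96sd <- pnorm(1.96) - pnorm(-1.96)  # about 0.950

# Same idea on the slide's scale, assuming normal(mean 3.5, SD 1.7)
lower <- 3.5 - 1.96 * 1.7
upper <- 3.5 + 1.96 * 1.7
within_slide <- pnorm(upper, mean = 3.5, sd = 1.7) - pnorm(lower, mean = 3.5, sd = 1.7)

print(round(c(within_1sd, within_1.96sd, within_slide), 3))
```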

  12. …but they vary based on their PARAMETERS. Normal distributions have two parameters: Parameter 1: μ (mean); Parameter 2: σ (standard deviation). What you need to understand: only that changing the parameters alters the shape of the curve (μ moves its centre, σ sets its spread).

  13. Why? (without going into too much detail) The equation (function) for the normal distribution is
      $$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
      where f(x) is the ‘density’ (the height of the curve for a given value on the x axis), x = a given value of the variable, μ = mean, σ = standard deviation, e = base of the natural logarithm and π = the constant pi. Only x, μ and σ alter the density; all other terms are constants.
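One way to see the formula at work is to write it out as an R function and compare it with R's built-in dnorm(). The function name normal_density is hypothetical, chosen for this sketch:

```r
# Hypothetical helper: the normal density written out from the formula
normal_density <- function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- c(70, 85, 100, 115, 130)
by_hand  <- normal_density(x, mu = 100, sigma = 15)
built_in <- dnorm(x, mean = 100, sd = 15)
print(all.equal(by_hand, built_in))   # TRUE: same values

# Changing sigma changes the shape: a larger sigma gives a lower, wider curve
print(dnorm(100, mean = 100, sd = 15) > dnorm(100, mean = 100, sd = 30))   # TRUE
```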

  14. Extracting probabilities using R: pnorm() calculates an area (probability) and qnorm() calculates a quantile (value of x). pnorm maps a value to a (cumulative) probability; qnorm maps a probability to a value.
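A quick R check that pnorm() and qnorm() undo each other, shown on the standard normal (mean 0, SD 1), which is their default:

```r
p <- pnorm(1.96)   # value -> probability: area below 1.96, about 0.975
q <- qnorm(p)      # probability -> value: back to 1.96
print(c(p = p, q = q))

# The mean splits a symmetrical distribution in half
print(pnorm(0))    # 0.5
print(qnorm(0.5))  # 0
```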

  15. Extracting probabilities using R. I.Q. in the U.K. is normally distributed with μ = 100, σ = 15. What is the probability of an individual having an IQ > 115?
      mu <- 100
      sd <- 15
      IQ <- 115
      pnorm(IQ, mu, sd, lower.tail = FALSE)
      The arguments are: the value you are interested in (IQ), the mean (mu), the standard deviation (sd), and whether you are interested in the lower or upper part of the distribution (lower.tail; here the upper part, so FALSE).

  16. Extracting probabilities using R (continued). For the same question, pnorm(IQ, mu, sd, lower.tail = FALSE) returns [1] 0.1586553. Answer = 15.9%.

  17. Extracting probabilities using R. I.Q. in the U.K. is normally distributed with μ = 100, σ = 15. What I.Q. value are 0.159 (15.9%) of people above?
      mu <- 100
      sd <- 15
      P <- 0.159
      qnorm(P, mu, sd, lower.tail = FALSE)
      This returns approximately 115 (about 114.98 rather than exactly 115, because 0.159 is rounded from 0.1586553).
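A sketch tying the two IQ questions together. It names the standard deviation sigma rather than sd (a choice made for this sketch, to avoid confusion with R's sd() function) and shows that lower.tail = FALSE gives the same result as subtracting the lower-tail probability from 1:

```r
mu <- 100
sigma <- 15

# Upper-tail probability of IQ > 115
upper <- pnorm(115, mean = mu, sd = sigma, lower.tail = FALSE)   # about 0.1586553
# Same value from the lower tail
upper_alt <- 1 - pnorm(115, mean = mu, sd = sigma)

# Using the unrounded probability gives back 115 (up to floating-point error)
iq_exact <- qnorm(upper, mean = mu, sd = sigma, lower.tail = FALSE)
# Using the rounded probability 0.159 gives a value just below 115
iq_rounded <- qnorm(0.159, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(upper = upper, upper_alt = upper_alt, iq_exact = iq_exact, iq_rounded = iq_rounded))
```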

  18. MINI SUMMARY
      ● Normal distributions are continuous, symmetrical distributions
      ● They have certain set properties:
         ○ the mode, median and mean are the same
         ○ 95% of observations are within 1.96 SD of the mean
      ● They differ according to their parameters, which are the mean and standard deviation
      ● In R, we can extract:
         ○ cumulative probabilities using pnorm()
         ○ quantiles using qnorm()

  19. Using the normal distribution. We often want to know information about populations, e.g. the mean value μ (mu) in the whole population (this is a true value). But we can only measure a sample of that population, e.g. the sample mean x̄ (said "x bar"). We can use the properties of the normal distribution to help us estimate population parameters from samples.

  20. Sampling distribution of the mean. Example: I.Q. in the U.K. is normally distributed with μ = 100, σ = 15. If we take a sample of size n from this population (mean = 100, sd = 15), the mean of the sample will differ from the population mean by chance (it is unlikely to be 100 spot on).

  21. Sampling distribution of the mean. Example: I.Q. in the U.K. is normally distributed with μ = 100, σ = 15. Take many samples of size n from the population (mean = 100, sd = 15) and plot the means of all the samples: this plot is the sampling distribution of the mean.
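The sampling distribution of the mean can be simulated in R. This sketch assumes a sample size of n = 10 and 10,000 repeated samples (both illustrative choices, not from the slides):

```r
set.seed(1)            # make the simulation reproducible
mu <- 100
sigma <- 15
n <- 10                # assumed sample size
n_samples <- 10000     # assumed number of repeated samples

# Mean of each of many samples of size n drawn from the population
sample_means <- replicate(n_samples, mean(rnorm(n, mean = mu, sd = sigma)))

# Centre: close to the population mean of 100
print(mean(sample_means))
# Spread: much smaller than 15 (close to 15 / sqrt(10), about 4.74)
print(sd(sample_means))
```

hist(sample_means) plots the sampling distribution itself.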

  22. Sampling distribution of the mean. [Figure: population distribution beside the sampling distribution] The sampling distribution of the mean has the same mean as the parent population.

  23. Sampling distribution of the mean. [Figure: population distribution beside the narrower sampling distribution] The sampling distribution has a different (lower) standard deviation than the parent population. The standard deviation of the sampling distribution is called the standard error of the mean (often shortened to ‘standard error’, or just ‘se’).

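  24. Sampling distribution of the mean:
      $$se = \frac{sd}{\sqrt{n}}$$
      [Figure: population distribution (standard deviation = 15) beside the sampling distribution (standard error = 15/√n = 4.7)] A standard error of 4.7 corresponds to samples of about n = 10, since 15/√10 ≈ 4.74.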
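The standard error formula in R, for a few illustrative sample sizes (assumed values, not from the slides). Because n sits under a square root, quadrupling the sample size only halves the standard error:

```r
sigma <- 15
n <- c(10, 40, 100)          # illustrative sample sizes
se <- sigma / sqrt(n)
print(round(se, 2))          # 4.74 2.37 1.50

# Four times the sample size gives half the standard error
print(se[1] / se[2])         # 2
```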

  25. We want to know about populations... …but only have samples! We use the samples to infer information about the population. So we need an idea of how confident we can be in our inferences (is our sample mean any good?). Confidence intervals do this!

  26. Confidence intervals
      ■ How confident can we be that our sample mean is a good estimate of the true value?
      ■ Confidence intervals give the highest and lowest likely values
      ■ ‘Likely’ usually means 95% (most common), 99% or 99.9%
      e.g. if the 95% confidence interval for the mean I.Q. of the U.K. population is 100 (± 4.7), we are 95% confident that the mean I.Q. of the U.K.'s population is between 95.3 and 104.7.

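  27. Confidence intervals on the mean rely on the fact that 95% of observations fall within 1.96 s.d. of the mean in a normal distribution. Applied to the sampling distribution of the mean, this means 95% of sample means fall within 1.96 standard errors of the population mean, so across repeated samples an interval of ±1.96 standard errors around the sample mean contains the population mean 95% of the time.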

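  28. Confidence intervals on the mean. For LARGE samples the 95% confidence interval is
      $$\bar{x} \pm 1.96 \times se$$
      i.e., we are 95% confident the population mean is between $\bar{x} - 1.96 \times se$ and $\bar{x} + 1.96 \times se$. WARNING: this method of calculating CIs requires large sample sizes (~30+), because it assumes the sampling distribution of the mean is normal and that the sample SD is a good estimate of σ.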
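A simulation sketch of what "95% confident" means, and of why the sample-size warning matters. The helper ci_coverage is hypothetical, written for this example: it draws many normal samples (assumed μ = 100, σ = 15), builds x̄ ± 1.96 × se for each, and records how often the interval contains the true mean:

```r
set.seed(42)   # make the simulation reproducible

# Hypothetical helper: share of large-sample 95% CIs (mean +/- 1.96 * se)
# that contain the true mean mu, over `reps` simulated samples of size n
ci_coverage <- function(n, mu = 100, sigma = 15, reps = 5000) {
  hits <- replicate(reps, {
    x <- rnorm(n, mean = mu, sd = sigma)
    se <- sd(x) / sqrt(n)
    abs(mean(x) - mu) <= 1.96 * se   # TRUE if mu lies inside the interval
  })
  mean(hits)
}

cov_50 <- ci_coverage(50)   # large sample: close to (slightly below) 0.95
cov_5  <- ci_coverage(5)    # small sample: clearly below 0.95 (roughly 0.88)
print(c(n_50 = cov_50, n_5 = cov_5))
```

With small samples the interval built from 1.96 and the sample SD is too narrow, which is why this method is restricted to large samples.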

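  29. Confidence intervals on the mean. For LARGE samples: $\bar{x} \pm 1.96 \times se$. Where does the 1.96 come from? It is the value of x (quantile) of the standard normal distribution when P = 0.975:
      > qnorm(0.975)
      [1] 1.959964
      [Figure: standard normal curve; P = 0.975 of the area lies below the quantile 1.96 and P = 0.025 lies below the quantile −1.96]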

  30. Why qnorm(0.975) and not qnorm(0.95)? A 2-tailed test allots half of your alpha to testing for statistical significance in one direction and half to testing in the other direction (0.05 in total, 0.025 in each tail). Regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions. The upper cut-off therefore has 1 − 0.025 = 0.975 of the distribution below it.
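Comparing the candidate cut-offs in R makes the difference concrete. The last lines assume the usual rule of splitting (1 − confidence level) equally between the two tails, which gives the multipliers for 99% and 99.9% intervals as well:

```r
print(qnorm(0.975))   # 1.959964: leaves 2.5% in the upper tail
print(qnorm(0.025))   # -1.959964: leaves 2.5% in the lower tail
print(qnorm(0.95))    # 1.644854: leaves 5% in ONE tail (a one-tailed cut-off)

# The area between -1.96 and 1.96 is 95%
print(pnorm(qnorm(0.975)) - pnorm(qnorm(0.025)))

# Two-tailed multipliers for other confidence levels
conf <- c(0.95, 0.99, 0.999)
z <- qnorm(1 - (1 - conf) / 2)
print(round(z, 2))    # 1.96 2.58 3.29
```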

  31. What is the Km for Arginine-tRNA synthetase when ATP is the substrate? We have 100 measurements of Km (in μM) and use CIs to estimate the true Km value. (Km is the concentration of substrate which permits the enzyme to achieve half of Vmax.)

  32. Confidence intervals for LARGE samples: Km for Arginine-tRNA synthetase
      km <- read.table("../data/km.txt", header = FALSE)
      hist(km$V1)

  33. Confidence intervals for large samples
      m <- mean(km$V1); m                          # mean
      [1] 255
      se <- sd(km$V1) / sqrt(length(km$V1)); se    # standard error (= sd/√n)
      [1] 3.919647
      q <- qnorm(0.975); q                         # quantile
      [1] 1.959964
      amount <- round(q * se, 1); amount           # amount to add/subtract
      [1] 7.7
      m + amount                                   # upper confidence limit (x̄ + 1.96 × se)
      [1] 262.7
      m - amount                                   # lower confidence limit (x̄ − 1.96 × se)
      [1] 247.3
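The km.txt data are not reproduced here, so this sketch (a) checks the slide's limits from its printed mean and standard error, and (b) wraps the same steps in a hypothetical helper, large_sample_ci, run on simulated stand-in data (invented values, for illustration only). Unlike the slide, it does not round before adding, but the limits agree with the slide to one decimal place:

```r
# Hypothetical helper: large-sample confidence interval for a mean,
# following the steps on the slide (mean, se = sd / sqrt(n), normal quantile)
large_sample_ci <- function(x, level = 0.95) {
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)
  q <- qnorm(1 - (1 - level) / 2)   # 1.959964 when level = 0.95
  c(lower = m - q * se, mean = m, upper = m + q * se)
}

# (a) The slide's numbers: mean 255, se 3.919647
slide_limits <- 255 + c(-1, 1) * qnorm(0.975) * 3.919647
print(round(slide_limits, 1))   # 247.3 262.7

# (b) Simulated stand-in for km$V1 (invented data, NOT the real measurements)
set.seed(7)
fake_km <- rnorm(100, mean = 255, sd = 39)
print(large_sample_ci(fake_km))
```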
