Introduction to probability distributions and testing
Lecture 2
■ This week we introduce the normal distribution and how to estimate population means from samples using confidence intervals.
■ In RStudio we will use many built-in functions for calculating summary statistics, probabilities and critical values (quantiles).
By actively following the lecture and practical and carrying out the independent study the successful student will be able to:
■ Explain the properties of ‘normal distributions’ and their use in statistics
■ Define, select and calculate with R probabilities, quantiles and confidence intervals
Variable: some information about an individual (the property that we measure)
Example: You flip a coin twice. What’s the probability of getting heads both times?
There are 4 equally likely outcomes (HH, HT, TH, TT), so the probability of getting a head both times is 1/4 = 0.25.
[Figure: bar chart of the probability distribution – number of heads on the x axis, probability on the y axis]
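The probabilities for this example can be checked in R with the built-in `dbinom()` function (a minimal sketch):

```r
# X = number of heads in two fair coin flips ~ Binomial(size = 2, prob = 0.5)
probs <- dbinom(0:2, size = 2, prob = 0.5)
probs   # [1] 0.25 0.50 0.25
```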
A probability distribution may be either discrete or continuous:
■ Discrete distribution means that X can assume one of a finite number of values (e.g. binomial – what the previous example was)
■ Continuous distribution means that X can assume one of an infinite number of different values (e.g. normal, uniform, lots of others)
Ø We will focus on normal distributions in this module
Ø Why? Because normal distributions are central to most statistical tests we do
Values the variable (x) can take (a.k.a. quantiles)
Density f(x) – the height for a given value on the x axis
Pierce (2017) ‘Normal Distribution’, Math Is Fun. Available at: http://www.mathsisfun.com/data/standard-normal-distribution.html
95% of observations are within 1.96 SD of the mean
68% of observations are within 1 SD of the mean
Standard deviation: a measure of how spread out the data points are
x = 1, 2, 3, 4, 5, 6   Mean (x̄) = 3.5   n = 6
sd = √[((1−3.5)² + (2−3.5)² + (3−3.5)² + (4−3.5)² + (5−3.5)² + (6−3.5)²) / 6] = √2.92 ≈ 1.7
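The calculation above can be reproduced in R. Note that the slide divides by n (the population SD), whereas R’s built-in `sd()` divides by n − 1 (the sample SD), so the two differ slightly:

```r
x <- c(1, 2, 3, 4, 5, 6)
n <- length(x)
# Population SD, dividing by n as in the worked example
sd_pop <- sqrt(sum((x - mean(x))^2) / n)
round(sd_pop, 1)   # [1] 1.7
# R's sd() divides by n - 1, so it is slightly larger
round(sd(x), 2)    # [1] 1.87
```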
95% of observations are within 1.96 SD of the mean (e.g. 95% within 3.5 ± 3.3)
68% of observations are within 1 SD of the mean (e.g. 68% within 3.5 ± 1.7)
Normal distributions have two parameters:
Parameter 1: μ (mean)
Parameter 2: σ (standard deviation)
(you only need to understand that the parameters alter the shape of the distribution)
Equation (function) for the normal distribution:
f(x) = (1 / (σ√(2π))) × e^(−(x − μ)² / (2σ²))
x = given value of the variable
μ = mean
σ = standard deviation
e = base of the natural logarithm
π = constant (pi)
‘Density’ f(x) is the height for a given value on the x axis. x, μ and σ are the only terms that alter the density – all other terms are already defined/constants.
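As a quick sketch, R’s `dnorm()` computes exactly this density; coding the formula by hand confirms it (μ = 100, σ = 15 are the I.Q. example values used later):

```r
# Coding the normal density formula by hand and comparing with dnorm()
mu <- 100; sigma <- 15; x <- 115
by_hand <- (1 / (sigma * sqrt(2 * pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
all.equal(by_hand, dnorm(x, mean = mu, sd = sigma))   # [1] TRUE
```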
pnorm() maps a value to a probability; qnorm() maps a probability to a value.
Use pnorm() to calculate an area (probability) and qnorm() to calculate a quantile (value of x).
I.Q. in the U.K. is normally distributed μ = 100, σ = 15
mu <- 100
sd <- 15
IQ <- 115
pnorm(IQ, mu, sd, lower.tail = FALSE)
Arguments: IQ = the value you are interested in; mu = the mean; sd = the standard deviation; lower.tail = whether you are interested in the lower or upper part (here – the upper part)
mu <- 100
sd <- 15
IQ <- 115
pnorm(IQ, mu, sd, lower.tail = FALSE)
[1] 0.1586553
I.Q. in the U.K. is normally distributed μ = 100, σ = 15
I.Q. in the U.K. is normally distributed μ = 100, σ = 15
mu <- 100
sd <- 15
P <- 0.159
qnorm(P, mu, sd, lower.tail = FALSE)
[1] 115
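A sketch showing that pnorm() and qnorm() are inverses of each other for this example:

```r
mu <- 100; sd <- 15
p <- pnorm(115, mu, sd, lower.tail = FALSE)   # value -> probability
q <- qnorm(p, mu, sd, lower.tail = FALSE)     # probability -> value
p   # [1] 0.1586553
q   # [1] 115
```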
Ø Normal distributions have the same mode, median & mean
Ø 95% of observations are within 1.96 standard deviations of the mean
○ Cumulative probabilities using pnorm()
○ Quantiles using qnorm()
We often want to know information about populations: e.g. the mean value μ (mu) in the whole population (this is a true value)
But we can only measure a sample of that population: e.g. the sample mean (x̄, said ‘x bar’) in the sample
We can use the properties of the normal distribution to help us estimate population parameters from samples
Example: I.Q. in the U.K. is normally distributed, μ = 100, σ = 15 (mean = 100, sd = 15)
Take a sample of size n: the mean of the sample will differ from the population mean by chance (unlikely to be exactly 100).
Example: I.Q. in the U.K. is normally distributed, μ = 100, σ = 15 (mean = 100, sd = 15)
Take many samples of size n and plot the mean of all the samples = the sampling distribution.
The sampling distribution of the mean has the same mean as the parent population
The sampling distribution has a different (and lower) standard deviation than the parent population. The standard deviation of the sampling distribution is called the standard error of the mean (often shortened to ‘standard error’ or just ‘se’).
se = sd /√n
Standard deviation = 15; Standard error = 15 / √n (e.g. for n = 10, 15 / √10 ≈ 4.7)
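A simulation sketch of the sampling distribution (the sample size n = 10 is an assumption, chosen so that 15/√n ≈ 4.7 as on the slide):

```r
# Simulation sketch of the sampling distribution of the mean
set.seed(1)
n <- 10
sample_means <- replicate(10000, mean(rnorm(n, mean = 100, sd = 15)))
sd(sample_means)   # empirical standard error, close to the theoretical value
15 / sqrt(n)       # theoretical standard error, approximately 4.74
```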
…but we only have samples! We use the samples to infer information about the population, so we need an idea of how confident we can be in our inferences… (is our sample mean any good?) Confidence intervals do this!
■ How confident can we be that our sample mean is close to the true population mean?
■ Confidence intervals give the highest and lowest values between which the population mean is likely to lie
■ ‘Likely’ means 95% (most common), 99% or 99.9%
e.g. the mean I.Q. of the population of the U.K. is 100 ± 1.96 × 4.7 ≈ 100 ± 9.2 = 95% certain that the mean I.Q. of the U.K.’s population is between 90.8 and 109.2
WARNING: This method of calculating CIs requires large sample sizes (~30+) – it assumes a normal distribution
i.e., 95% certain the population mean is between x̄ − 1.96 × s.e. and x̄ + 1.96 × s.e.
Where does this number come from? It is the value of x (quantile) when P = 0.975:
> qnorm(0.975)
[1] 1.959964
P = 0.975: quantile = 1.96
P = 0.025: quantile = −1.96 (since 1 − 0.025 = 0.975)
A 2-tailed test allots half of your alpha to testing the statistical significance in one direction and half to testing the statistical significance in the other direction (0.05 total across both tails, 0.025 in each tail). Regardless of the direction of the relationship you hypothesize, you are testing for the possibility of the relationship in both directions.
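A quick check in R that 0.025 in each tail adds up to α = 0.05:

```r
# alpha = 0.05 split across both tails of the standard normal: 0.025 in each
lower <- pnorm(-1.96)                      # area below -1.96
upper <- pnorm(1.96, lower.tail = FALSE)   # area above +1.96
lower + upper                              # close to 0.05
```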
100 measures of Km in μM. Use CIs to estimate the true Km value…
Km is the concentration of substrate which permits the enzyme to achieve half Vmax
km <- read.table("../data/km.txt", header = FALSE)
hist(km$V1)
Km for Arginine-tRNA synthetase
m <- mean(km$V1); m
[1] 255
(x̄ − 1.96 × s.e.) and (x̄ + 1.96 × s.e.)
se <- sd(km$V1)/sqrt(length(km$V1)); se
[1] 3.919647
q <- qnorm(0.975); q
[1] 1.959964
amount <- round(q*se, 1); amount
[1] 7.7
m + amount
[1] 262.7
m - amount
[1] 247.3
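A self-contained sketch of the same calculation; since km.txt is not included here, simulated values stand in for the 100 Km measurements (the mean of 255 and sd of ~39 are back-calculated from the slide’s output, so treat them as illustrative):

```r
# Simulated stand-in for km.txt: 100 hypothetical Km measures (in uM)
set.seed(42)
km_values <- rnorm(100, mean = 255, sd = 39)
m  <- mean(km_values)
se <- sd(km_values) / sqrt(length(km_values))
q  <- qnorm(0.975)
c(lower = m - q * se, upper = m + q * se)   # 95% CI for the true Km
```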
19 students make a lactate dehydrogenase solution to a recipe that should yield a concentration of 1.5 μmols l-1
Recipe should yield a concentration of 1.5 μmols l-1
ldh <- read.table("ldh.txt", header = TRUE)
mean(ldh$ldh)
[1] 1.373684
x̄ ± t[d.f.] × s.e.
m <- mean(ldh$ldh); m
[1] 1.373684
se <- sd(ldh$ldh)/sqrt(length(ldh$ldh)); se
[1] 0.03230167
df <- length(ldh$ldh)-1; df
[1] 18
t <- qt(0.975, df = df); t
[1] 2.100922
round(m + t*se, 2)
[1] 1.44
round(m - t*se, 2)
[1] 1.31
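A sketch of why the t distribution is used for this small sample: with only 18 degrees of freedom the t quantile exceeds the normal’s 1.96, giving a slightly wider interval.

```r
# t quantile (18 df, as in the ldh example) vs the normal quantile
qt(0.975, df = 18)   # [1] 2.100922
qnorm(0.975)         # [1] 1.959964
# With more degrees of freedom the t quantile approaches 1.96:
qt(0.975, df = 1000)
```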
Outline: Distributions in general and in R, and hypothesis testing using the binomial distribution as an example. By actively following the lecture and practical and carrying out the independent study the successful student will be able to:
install.packages("ggplot2")   # run once
library(ggplot2)              # run once each session
Based on the ‘Grammar of Graphics’:
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + <GEOM_FUNCTION>()
or equivalently:
ggplot(data = <DATA>) + <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
ggplot(data = binodf, aes(x = y, y = probability)) +
  geom_bar(stat = "identity")

ggplot(data = binodf, aes(x = y, y = probability)) +
  geom_bar(stat = "identity") +
  ylim(0, 1)

ggplot(data = binodf, aes(x = y, y = probability)) +
  geom_point()