Marc Mehlman
Distributions
Marc H. Mehlman
marcmehlman@yahoo.com
University of New Haven
Marc Mehlman (University of New Haven) Distributions 1 / 49
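Table of Contents
1 Distributions
2 Discrete Random Variables
3 Common Discrete Distributions
4 Continuous Distributions
5 Common Continuous Distributions
6 Sampling Distributions
7 Central Limit Theorem, CLT
8 Is it Normal?
9 Chapter #4 and #5 R Assignment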
Distributions
Discrete Random Variables
Definition
Define $P(x) \overset{\text{def}}{=} P(X = x)$, the probability that X = x. P(x) is called the density function. A probability histogram is a relative frequency histogram where the vertical scale shows probabilities instead of relative frequencies.
Note that $0 \le P(x) \le 1$ and $\sum_x P(x) = 1$.
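Theorem (Expectation)
Given a discrete random variable, X,
$\mu = \sum_x x\,P(x)$, $\sigma^2 = \sum_x (x - \mu)^2 P(x)$ and $\sigma = \sqrt{\sigma^2}$.
Notice: If all N outcomes of X are equally likely, then summing over all outcomes,
$\mu = \frac{1}{N} \sum x$ and $\sigma^2 = \frac{1}{N} \sum (x - \mu)^2$.
Definition
Usual values of X lie within 2σ of µ; unusual values lie at least 2σ away from µ.
c is an unusually high value of X ⇔ c ≥ µX + 2σX.
c is an unusually low value of X ⇔ c ≤ µX − 2σX.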
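The expectation formulas and the 2σ convention can be sketched in a few lines of Python. This is only an illustration: the density below (number of heads in two tosses of a fair coin) and the helper names `mean_var_sd` and `is_unusual` are assumptions made for the example, not part of the slides.

```python
import math

# Hypothetical density: X = number of heads in two tosses of a fair coin.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

def mean_var_sd(pmf):
    """Return (mu, sigma^2, sigma) for a discrete density given as {x: P(x)}."""
    if not math.isclose(sum(pmf.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var, math.sqrt(var)

def is_unusual(c, mu, sigma):
    """2-sigma convention: c is unusually high or unusually low."""
    return c >= mu + 2 * sigma or c <= mu - 2 * sigma

mu, var, sigma = mean_var_sd(pmf)
print(mu, var, sigma)
print(is_unusual(2, mu, sigma))  # 2 heads is within 2 sigma of mu here
```

With only three possible outcomes, none of them is unusual under this convention; the helper is more informative for distributions with longer tails.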
Common Discrete Distributions
Discrete Uniform, X ∼ DU(n)
Binomial, X ∼ BIN(n, p)
Bernoulli Distribution, X ∼ BIN(1, p)
Model: X = # heads after tossing a coin once, where the probability of heads on each toss equals p.
Binomial Distribution, X ∼ BIN(n, p)
Model: X = # heads after tossing a coin n times, where the probability of heads on each toss equals p.
Theorem
If X ∼ BIN(n, p) and j is an integer between 0 and n inclusive, then
$P(j) = P(X = j) = \binom{n}{j} p^j (1 - p)^{n - j}$.
Furthermore $\mu_X = np$, $\sigma_X^2 = np(1 - p)$ and $\sigma_X = \sqrt{np(1 - p)}$.
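As a numerical check of the theorem, the sketch below evaluates the binomial density directly and compares the resulting mean and variance with np and np(1 − p). The values n = 10 and p = 0.3 and the function name `binom_pmf` are hypothetical choices for illustration.

```python
from math import comb

def binom_pmf(j, n, p):
    """P(X = j) for X ~ BIN(n, p): (n choose j) p^j (1 - p)^(n - j)."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)

n, p = 10, 0.3  # hypothetical parameters
probs = [binom_pmf(j, n, p) for j in range(n + 1)]

total = sum(probs)                                  # should be 1
mean = sum(j * q for j, q in enumerate(probs))      # should be n * p
var = sum((j - mean) ** 2 * q for j, q in enumerate(probs))  # n * p * (1 - p)

print(total, mean, var)
```

Setting n = 1 recovers the Bernoulli case, where P(0) = 1 − p and P(1) = p.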
1 The probability of getting H···H (j heads) followed by T···T (n − j tails) is $p^j (1 - p)^{n - j}$.
2 The # of ways to get j heads in n tosses is $\binom{n}{j}$.
Therefore $P(X = j) = \binom{n}{j} p^j (1 - p)^{n - j}$.
Geometric, X ∼ GEO(p)
Continuous Distributions
Common Continuous Distributions
Uniform Distribution, UNIF(a, b)
Normal Distribution, X ∼ N(µ, σ)
One particularly important class of density curves is the class of Normal curves, which describe Normal distributions. Each Normal curve is described by its mean µ and standard deviation σ.
A Normal distribution is described by a Normal density curve. Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ.
The mean of a Normal distribution is the center of the symmetric Normal curve.
The standard deviation is the distance from the center to the change-of-curvature points on either side.
We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ).
Definition
A continuous random variable, X, has a normal distribution, X ∼ N(µ, σ), if its density is
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}$.
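The density formula can be coded directly and compared against Python's standard-library `statistics.NormalDist`. The parameter values µ = 10 and σ = 2 and the function name `normal_pdf` are assumptions made for this sketch.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma): exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

mu, sigma = 10.0, 2.0  # hypothetical parameters
for x in (mu - sigma, mu, mu + sigma):
    # The hand-coded formula and the library value should agree.
    print(x, normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))
```

The curve is symmetric about µ, so the densities at µ − σ and µ + σ coincide, and the peak at µ has height 1/(√(2π) σ).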
The 68-95-99.7 Rule
In the Normal distribution with mean µ and standard deviation σ:
Approximately 68% of the observations fall within σ of µ.
Approximately 95% of the observations fall within 2σ of µ.
Approximately 99.7% of the observations fall within 3σ of µ.
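The three percentages in the rule can be checked numerically. The sketch below uses the standard Normal distribution from Python's `statistics` module; because of standardization the same areas hold for any µ and σ. The variable name `within` is just for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(f"within {k} sigma: {area:.4f}")
```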
All Normal distributions are the same if we measure in units of size σ from the mean µ as center.
If a variable x has a distribution with mean µ and standard deviation σ, then the standardized value of x, or its z-score, is
$z = \frac{x - \mu}{\sigma}$.
The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1. That is, the standard Normal distribution is N(0,1).
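A small sketch of standardization follows. The numbers (a N(64.5, 2.5) variable and the observation x = 68) are hypothetical and chosen only to illustrate that an area under N(µ, σ) equals the corresponding area under N(0, 1) after converting to a z-score.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardized value of x: how many standard deviations x lies from mu."""
    return (x - mu) / sigma

mu, sigma = 64.5, 2.5   # hypothetical mean and standard deviation
x = 68.0                # hypothetical observation

z = z_score(x, mu, sigma)
area_original = NormalDist(mu, sigma).cdf(x)   # P(X <= x) under N(mu, sigma)
area_standard = NormalDist().cdf(z)            # P(Z <= z) under N(0, 1)

print(z, area_original, area_standard)
```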
Because all Normal distributions are the same when we standardize, we can find areas under any Normal curve from a single table.
The Standard Normal Table
Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z.
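In software, a Table A lookup corresponds to evaluating the standard Normal cumulative distribution function. The sketch below mimics a four-decimal table entry; the helper name `table_a` is hypothetical, and areas to the right or between two values follow by subtraction.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

def table_a(z):
    """Area under the standard Normal curve to the left of z, to 4 decimals."""
    return round(std_normal.cdf(z), 4)

left = table_a(1.00)                       # area to the left of z = 1.00
right = round(1 - std_normal.cdf(1.00), 4) # area to the right of z = 1.00
between = round(std_normal.cdf(1.96) - std_normal.cdf(-1.96), 4)

print(left, right, between)
```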
Sampling Distributions
As we begin to use sample data to draw conclusions about a wider population, we must be clear about whether a number describes a sample or a population.
A parameter is a number that describes some characteristic of the population. In statistical practice, the value of a parameter is not known because we cannot examine the entire population.
A statistic is a number that describes some characteristic of a sample. The value of a statistic can be computed directly from the sample data. We often use a statistic to estimate an unknown parameter.
Remember s and p: statistics come from samples and parameters come from populations.
We write µ (the Greek letter mu) for the population mean and σ for the population standard deviation. We write x̄ (x-bar) for the sample mean and s for the sample standard deviation.
The process of statistical inference involves using information from a sample to draw conclusions about a wider population. Different random samples yield different statistics. We need to be able to describe the sampling distribution of possible statistic values in order to perform statistical inference. We can think of a statistic as a random variable because it takes numerical values that describe the outcomes of the random sampling process.
[Diagram: Population to Sample: collect data from a representative sample. Sample to Population: make an inference about the population.]
Different random samples yield different statistics. This basic fact is called sampling variability: the value of a statistic varies in repeated random sampling. To make sense of sampling variability, we ask, “What would happen if we took many samples?”
[Diagram: many samples drawn from the same population.]
The law of large numbers assures us that if we measure enough subjects, the statistic x-bar will eventually get very close to the unknown parameter µ. If we took every one of the possible samples of a certain size, calculated the sample mean for each, and graphed all of those values, we’d have a sampling distribution.
The population distribution of a variable is the distribution of values of the variable among all individuals in the population.
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Mean of a sampling distribution of a sample mean
There is no tendency for a sample mean to fall systematically above or below µ, even if the distribution of the raw data is skewed. Thus, the mean of the sampling distribution equals µ, and the sample mean is an unbiased estimate of the population mean µ.
Standard deviation of a sampling distribution of a sample mean
The standard deviation of the sampling distribution measures how much the sample statistic varies from sample to sample. It is smaller than the standard deviation of the population by a factor of √n. Averages are less variable than individual observations.
When we choose many SRSs from a population, the sampling distribution of the sample mean is centered at the population mean µ and is less spread out than the population distribution. Here are the facts.
The Sampling Distribution of Sample Means
Suppose that x̄ is the mean of an SRS of size n drawn from a large population with mean µ and standard deviation σ. Then:
The mean of the sampling distribution of x̄ is $\mu_{\bar{x}} = \mu$.
The standard deviation of the sampling distribution of x̄ is $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
Note: These facts about the mean and standard deviation of x̄ are true no matter what shape the population distribution has.
If individual observations have the N(µ,σ) distribution, then the sample mean of an SRS of size n has the N(µ, σ/√n) distribution regardless of the sample size n.
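The two facts about the sampling distribution of the sample mean can be illustrated by simulation. The sketch below is built on assumptions chosen for the example: a skewed exponential population with µ = 1 and σ = 1, a sample size of 25, 5000 repeated samples, and a fixed random seed. It approximates the sampling distribution rather than enumerating all possible samples.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 25        # sample size (hypothetical)
reps = 5000   # number of simulated samples (hypothetical)

# Skewed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

sample_means = []
for _ in range(reps):
    sample = [random.expovariate(1.0) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
sd_of_means = statistics.stdev(sample_means)     # should be close to sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (theory {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (theory {sigma / n ** 0.5})")
```

Even though the population is strongly skewed, the simulated mean and standard deviation of x̄ land near µ and σ/√n.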
Central Limit Theorem, CLT
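The sample mean of n independent observations from a population with mean µ and standard deviation σ is a random variable
1 with mean µ, i.e. unbiased.
2 with standard deviation σ/√n.
3 with (approximately, for large n) normal distribution.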
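Let $\bar{X}_n$ denote the mean of n independent observations $X_1, \dots, X_n$, each with mean µ and standard deviation σ.
1 If the $X_j$’s are normal then $\bar{X}_n \sim N(\mu, \sigma/\sqrt{n})$.
2 $\mu_{\bar{X}_n} = \mu$.
3 $\sigma_{\bar{X}_n} = \sigma / \sqrt{n}$.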
Any linear combination of independent Normal random variables is also Normally distributed. More generally, the central limit theorem notes that the distribution of a sum or average of many small random quantities is close to Normal. Finally, the central limit theorem also applies to discrete random variables.
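To illustrate that the central limit theorem also applies to discrete random variables, the sketch below computes the exact distribution of the sum of 30 fair dice and checks how much probability the average places within one and two standard deviations of its mean. The choice of dice, n = 30, and the helper name `sum_distribution` are assumptions made for this example. If the average is close to Normal, the two probabilities should be near 68% and 95%.

```python
from math import sqrt

def sum_distribution(n, faces=6):
    """Exact distribution of the sum of n fair dice, as {total: probability}."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for face in range(1, faces + 1):
                new[total + face] = new.get(total + face, 0.0) + prob / faces
        dist = new
    return dist

n = 30                            # number of dice (hypothetical)
mu, sigma = 3.5, sqrt(35 / 12)    # mean and sd of a single fair die
sd_mean = sigma / sqrt(n)         # sd of the average of n dice

dist = sum_distribution(n)
within_one = sum(p for total, p in dist.items() if abs(total / n - mu) <= sd_mean)
within_two = sum(p for total, p in dist.items() if abs(total / n - mu) <= 2 * sd_mean)

print(f"P(average within 1 sd of mean) = {within_one:.3f}")
print(f"P(average within 2 sd of mean) = {within_two:.3f}")
```

Because the average of dice takes only finitely many values, the agreement with 68% and 95% is approximate rather than exact.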
$\bar{X}_n = \frac{1}{n} \sum_{j=1}^{n} X_j = \frac{1}{n} Y$ is approximately normal for large n
Example (without continuity correction)
Untreated Varicella (chickenpox) in newborns results in probability of death equal to 0.3. What is the probability that a village with 100 newborns with chickenpox has 25 or fewer deaths?
Solution: Let $X_j$ = number of deaths (0 or 1) for the jth newborn ∼ BIN(1, 0.3). Let
$Y \overset{\text{def}}{=} \sum_{j=1}^{100} X_j$ = # of deaths ∼ BIN(100, 0.3).
Since 100(0.3) ≥ 5 and 100(0.7) ≥ 5 the above convention (CLT) says
$P(Y \le 25) = P\left( \frac{Y - 100(0.3)}{\sqrt{100(0.3)(0.7)}} \le \frac{25 - 100(0.3)}{\sqrt{100(0.3)(0.7)}} \right) \approx P\left( Z \le \frac{-5}{\sqrt{21}} \right)$, where Z ∼ N(0, 1).
The real answer is $\sum_{j=0}^{25} \binom{100}{j} (0.3)^j (0.7)^{100 - j}$, which differs too much from the approximate answer (over 15%).
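The two quantities in this example can be computed directly. The sketch below evaluates the Normal approximation without continuity correction and the exact binomial sum, then reports the relative error with respect to the exact answer (that choice of denominator is an assumption, since the slide does not say how the percentage was measured). Variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.3, 25   # newborns, death probability, threshold from the example

mu = n * p                      # mean of Y
sigma = sqrt(n * p * (1 - p))   # standard deviation of Y, equal to sqrt(21)

# Normal approximation without continuity correction: P(Z <= (25 - 30) / sqrt(21)).
approx = NormalDist().cdf((k - mu) / sigma)

# Exact binomial probability P(Y <= 25).
exact = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

rel_error = abs(exact - approx) / exact

print(f"approximate: {approx:.4f}")
print(f"exact:       {exact:.4f}")
print(f"relative error: {rel_error:.1%}")
```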
Is it Normal?
One way to assess if a distribution is indeed approximately normal is to plot the data on a normal quantile plot. The data points are ranked and the percentile ranks are converted to z-scores with Table A. The z-scores are then used for the x axis against which the data are plotted on the y axis of the normal quantile plot.
If the distribution is approximately normal, the points on the plot lie close to a straight line, indicating a good match between the data and a normal distribution.
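The construction of a normal quantile plot can be sketched without any plotting library by computing the points that would be drawn. Several details below are assumptions: the data values are made up, the percentile rank of the i-th smallest of n values is taken as (i − 0.5)/n (one common convention among several), and the straightness of the plot is summarized by a correlation coefficient.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data set.
data = [8.1, 9.4, 10.2, 10.9, 11.5, 12.0, 12.6, 13.3, 14.1, 15.4]

def normal_quantile_points(values):
    """Return (z-score, data value) pairs for a normal quantile plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Percentile rank (i + 0.5) / n for zero-based i, converted to a z-score.
    return [(std_normal.inv_cdf((i + 0.5) / n), y) for i, y in enumerate(ordered)]

def correlation(points):
    """Pearson correlation of the plotted points; near 1 means nearly a line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

points = normal_quantile_points(data)
r = correlation(points)

for z, y in points:
    print(f"{z:6.3f}  {y:5.1f}")
print(f"correlation: {r:.3f}")
```

The printed pairs are the coordinates a plotting tool would place on the theoretical-quantile (x) and sample-quantile (y) axes.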
[Figure: two Normal Q-Q plots, each showing Sample Quantiles (y axis) against Theoretical Quantiles (x axis).]
Chapter #4 and #5 R Assignment
1 Create a Normal Quantile Plot of the height of loblolly trees from the