Review
- Probability
- Basic definitions: randomization experiment, sample spaces, elementary outcomes, events
- Basic operations: conditional probability
- Bayes' theorem
Objectives
- Random variables: discrete and continuous
- Two probability distributions: the binomial distribution and the normal distribution
Random variables
- A random variable is a function that assigns numeric values to the different events in a sample space. We usually denote a random variable by a capital letter such as X, Y, or Z.
- NOTE: two key ingredients: (1) randomness; (2) numeric values.
- Example 1: Randomly select a student from a class. Let X = the student's number of siblings. X could be 0, 1, 2, …
- Example 2: Randomly select a student from a class. Let X = the student's height. X could be any value greater than 0.
Two types of random variables
1. Discrete random variable: its outcomes form a set of discrete (isolated) values.
   - E.g., X = number of siblings
2. Continuous random variable: its possible values cannot be enumerated; it can take infinitely many values, and every individual outcome has probability zero: Pr(X = x) = 0 for every x.
   - E.g., X = the student's height
- EG1. Tossing two coins
Let X = number of heads.
Notation: X denotes the random variable; x denotes an observed value.

Outcome | TT | HT | TH | HH
x       | 0  | 1  | 1  | 2
Probability distribution function
- A probability distribution function (pdf) is a mathematical relationship, or rule, that assigns to each possible value x of a discrete random variable X the probability Pr(X = x). (For a discrete random variable this is also called a probability mass function.)
Probability Distribution of the Random Variable
X = number of heads.

Outcome  | TT  | HT, TH | HH
x        | 0   | 1      | 2
Pr(X=x)  | 1/4 | 1/2    | 1/4

[Figure: probability histogram of X]
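The table can be reproduced by brute-force enumeration. Below is a minimal Python sketch (Python is our choice here; the slides themselves use Excel), assuming two fair coins so that the four outcomes are equally likely:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of tossing two fair coins.
outcomes = list(product("HT", repeat=2))  # ('H','H'), ('H','T'), ('T','H'), ('T','T')

# X = number of heads in each outcome; count how often each value of X occurs.
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability = occurrences / total number of outcomes (all equally likely).
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(f"Pr(X={x}) = {p}")  # 1/4, 1/2, 1/4 for x = 0, 1, 2
```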
- EG2. Tossing two dice
Y: the sum of the dots on the two dice. What are the possible values of Y?

Probability Distribution of the Random Variable
Y: the sum of the dots on the two dice.

$$\text{Probability} = \frac{\text{frequency of occurrences}}{\text{frequency of all possible occurrences}}, \qquad 0 \le \text{Probability} \le 1$$
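Applying the frequency ratio above to all 36 ordered pairs gives the full distribution of Y. A minimal Python sketch, assuming two fair six-sided dice:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die
# All 36 equally likely ordered pairs (first die, second die).
pairs = list(product(faces, repeat=2))
# Y = sum of the dots; count the occurrences of each possible sum.
counts = Counter(a + b for a, b in pairs)
# Probability = frequency of occurrences / frequency of all possible occurrences.
pmf_y = {y: Fraction(c, len(pairs)) for y, c in sorted(counts.items())}

for y, p in pmf_y.items():
    print(f"Pr(Y={y}) = {p}")
```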
Relative frequency
In practice, the probability of an event can be estimated by its relative frequency "in the long run". If the experiment is repeated many times, the relative frequency histogram should look very much like the probability histogram.
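The long-run idea can be checked by simulation. This sketch assumes fair dice and uses Python's `random` module with an arbitrary fixed seed; `relative_frequency_of_seven` is a hypothetical helper name, and the exact value 6/36 is Pr(Y = 7) from the distribution of Y.

```python
import random

def relative_frequency_of_seven(n_rolls: int, seed: int = 1) -> float:
    """Roll two fair dice n_rolls times; return the share of rolls whose sum is 7."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = sum(1 for _ in range(n_rolls)
               if rng.randint(1, 6) + rng.randint(1, 6) == 7)
    return hits / n_rolls

exact = 6 / 36  # Pr(Y = 7) from the probability distribution
for n in (100, 10_000, 100_000):
    # The relative frequency drifts toward the exact probability as n grows.
    print(n, round(relative_frequency_of_seven(n), 4), round(exact, 4))
```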
Data set vs. probability distributions
- Sample properties: based on a data set x_1, …, x_n.
  Sample mean: $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
  Sample variance: $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
- Model or population properties: based on a probability distribution, where x_1, …, x_R are the R possible values of X.
  Population mean: $$\mu = \sum_{i=1}^{R} x_i \Pr(X = x_i)$$
  Population variance: $$\sigma^2 = \sum_{i=1}^{R} (x_i - \mu)^2 \Pr(X = x_i)$$
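To contrast the two sets of formulas, the sketch below computes the population mean and variance of Y (the sum of two fair dice) from its distribution, then the sample mean and variance of a simulated data set. The simulation, sample size, and seed are illustrative assumptions, not data from the slides.

```python
import random
import statistics
from fractions import Fraction
from itertools import product

# Population (model) properties of Y = sum of two fair dice, from its distribution.
pairs = list(product(range(1, 7), repeat=2))
pr = Fraction(1, len(pairs))                               # each pair has probability 1/36
mu = sum((a + b) * pr for a, b in pairs)                   # sum of y * Pr(Y = y)
sigma2 = sum(((a + b) - mu) ** 2 * pr for a, b in pairs)   # sum of (y - mu)^2 * Pr(Y = y)

# Sample properties from a simulated data set of n rolls (seeded for reproducibility).
rng = random.Random(2024)
data = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(10_000)]
x_bar = statistics.mean(data)      # (1/n) * sum of x_i
s2 = statistics.variance(data)     # divides by n - 1

print(mu, sigma2)                  # 7 35/6
print(round(x_bar, 3), round(s2, 3))
```

The sample values vary from one data set to the next; the population values are fixed properties of the model.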
Mean of Random Variable
The mean or expected value of X, denoted E(X) or µ, is defined as
$$E(X) = \mu = \sum_{i=1}^{R} x_i \Pr(X = x_i)$$
It is the sum of the possible values, each weighted by its probability.
The expectation represents the "average" value of the random variable.
Mean of X
X = number of heads.

x          | 0   | 1   | 2
Pr(X=x)    | 1/4 | 1/2 | 1/4
x Pr(X=x)  | 0   | 1/2 | 1/2

$$E(X) = \mu = \sum_{i=1}^{3} x_i \Pr(X = x_i) = 0 + \tfrac{1}{2} + \tfrac{1}{2} = 1$$
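The weighted sum above can be written as a one-line function. A minimal Python sketch; `expected_value` is a hypothetical helper name, and the distribution is the two-coin table for X.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * Pr(X = x): each possible value weighted by its probability."""
    return sum(x * p for x, p in pmf.items())

# Distribution of X = number of heads in two fair coin tosses.
heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(expected_value(heads))  # 1
```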
Variance of Random Variable
The variance of X is the expected squared distance from the population mean:
$$\operatorname{Var}(X) = \sigma^2 = \sum_{i=1}^{R} (x_i - \mu)^2 \Pr(X = x_i)$$
The standard deviation σ is the square root of the variance:
$$\operatorname{sd}(X) = \sigma = \sqrt{\operatorname{Var}(X)}$$
Variance of X
X = number of heads.

x     | Pr(X=x) | (x − µ)² Pr(X=x)
0     | 0.25    | (0 − 1)² × 0.25 = 0.25
1     | 0.50    | (1 − 1)² × 0.50 = 0
2     | 0.25    | (2 − 1)² × 0.25 = 0.25
Total |         | 0.50

Thus σ² = 0.5 and σ = √0.5 ≈ 0.707.
In summary, µ and σ are computed from the probability distribution; they are population properties.
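The same table, as code. A minimal Python sketch; `variance` is a hypothetical helper that first computes µ = E(X) and then the probability-weighted squared distances.

```python
import math
from fractions import Fraction

def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * Pr(X = x), with mu = E(X)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

heads = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
var = variance(heads)
sd = math.sqrt(float(var))   # sigma = sqrt(Var(X))
print(var, round(sd, 4))     # 1/2 0.7071
```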
Continuous random variables
- A balanced spinning pointer can stop anywhere on the circle.
- X = the proportion of the total circumference at which it lands.
- X can be any value between 0 and 1: infinitely many values.
- Pr(0.25 ≤ X ≤ 0.75) = 0.5
- Pr(X = 0.5) = 0, because X can take on infinitely many values.
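The spinner can be simulated, assuming X ~ Uniform(0, 1) and using Python's `random.random()` as the spin. One caveat: a computer draws from a finite set of floating-point numbers, so Pr(X = 0.5) is only negligibly small here, not exactly zero as in the continuous model.

```python
import random

rng = random.Random(7)          # seeded for reproducibility
n = 200_000
# Model the balanced pointer as X ~ Uniform(0, 1): the proportion of the circumference.
spins = [rng.random() for _ in range(n)]   # values in [0, 1)

middle_half = sum(0.25 <= x <= 0.75 for x in spins) / n   # estimates Pr(0.25 <= X <= 0.75)
exactly_half = sum(x == 0.5 for x in spins) / n          # estimates Pr(X = 0.5)
print(round(middle_half, 3), exactly_half)
```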
Probability density function (pdf) of X
- The curve y = f(x) is the probability density function (pdf) of the random variable X.
- Pr(a ≤ X ≤ b) is the area under the curve between the x values a and b:
$$\Pr(a \le X \le b) = \int_a^b f(x)\,dx$$
- The total area under the density curve over the entire range of possible values of the random variable is 1:
$$\Pr(-\infty \le X \le \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$$
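"Probability = area under the curve" can be checked numerically. This sketch uses a hypothetical density f(x) = 2x on [0, 1] (not from the slides) and the midpoint rule; `area` is a hypothetical helper. An interval of zero width gets zero area, which matches Pr(X = x) = 0.

```python
def f(x: float) -> float:
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of f from a to b with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(round(area(f, 0, 0.5), 6))   # Pr(0 <= X <= 0.5): 0.25
print(round(area(f, 0, 1), 6))     # total area over the range of X: 1.0
```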
Probability density function (pdf) of X
- The pdf has large values in regions of high probability and small values in regions of low probability.
- Pr(X = x) = 0 for any specific value x.
- Generally, no distinction is made between probabilities such as Pr(X < x) and Pr(X ≤ x), or Pr(a ≤ X ≤ b) and Pr(a < X < b), when X is continuous, since single points carry zero probability.
Expectation and variance of a continuous random variable
- Mean: the center of the probability density
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
- Variance: the spread of the probability density
$$\sigma^2 = \operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$
- The standard deviation σ is the square root of the variance:
$$\sigma = \sqrt{\operatorname{Var}(X)}$$
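As an illustration, the two integrals can be evaluated numerically for the spinner, assuming its density is Uniform(0, 1), i.e. f(x) = 1 on [0, 1]. The exact answers are µ = 1/2 and σ² = 1/12; `integrate` is a hypothetical midpoint-rule helper.

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Assumed spinner density, Uniform(0, 1): f(x) = 1 on [0, 1], zero elsewhere."""
    return 1.0 if 0 <= x <= 1 else 0.0

# The density is zero outside [0, 1], so integrating over [0, 1] covers (-inf, inf).
mu = integrate(lambda x: x * f(x), 0, 1)                 # E(X)
var = integrate(lambda x: (x - mu) ** 2 * f(x), 0, 1)    # Var(X)
sd = var ** 0.5                                          # sigma
print(round(mu, 4), round(var, 4), round(sd, 4))         # 0.5 0.0833 0.2887
```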
Two distributions
- Binomial: discrete
- Normal: continuous
Bernoulli trial
Examples:
- A heads-or-tails coin toss
- A win-or-lose football game
- A pass-or-fail automotive smog inspection
Properties:
- Two outcomes: success or failure
- The success probability (p) is the same in each trial
- Trials are independent
Binomial random variable
- X is the number of successes in n repeated Bernoulli trials, each with probability p of success.
- The success probability (p) is the same in each trial.
- Trials are independent.
Binomial random variable
Probability distribution: the probability of obtaining k successes in n trials, with success probability p, is
$$\Pr(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \dots, n$$
- $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$ counts all possible ways of getting k successes and n − k failures, where $n! = n \times (n-1) \times \cdots \times 1$.
- $p^k (1-p)^{n-k}$ is the probability of any one particular sequence with k successes and n − k failures.
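The formula translates directly into code. A minimal Python sketch (Python 3.8+ for `math.comb`); `binomial_pmf` is a hypothetical name. As a check, two fair coin tosses give the same distribution as the earlier table.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k) = C(n, k) * p**k * (1 - p)**(n - k) for X ~ Binomial(n, p)."""
    # comb(n, k) = n! / (k! (n - k)!) counts the ways to place k successes among n trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Two fair coin tosses: X = number of heads is Binomial(2, 0.5).
print([binomial_pmf(k, 2, 0.5) for k in range(3)])   # [0.25, 0.5, 0.25]
```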
Mean and Variance of the Binomial Distribution
$$\mu = np, \qquad \sigma^2 = np(1-p)$$
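The shortcut formulas can be checked against the general discrete definitions. The parameters n = 20 and p = 0.3 below are arbitrary illustrative choices.

```python
from math import comb

n, p = 20, 0.3   # illustrative parameters (chosen arbitrarily)
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and variance computed directly from the definitions for a discrete X...
mu = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf))

# ...agree (up to rounding) with the shortcuts mu = np and sigma^2 = np(1 - p).
print(round(mu, 6), round(n * p, 6))
print(round(var, 6), round(n * p * (1 - p), 6))
```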
Exercise
Newborns were screened for HIV in a Massachusetts hospital. The positive rate for inner-city babies is p = 0.01. If 500 newborns are screened:
1. What is the exact binomial probability of 5 HIV-positive test results?
Answer: EXCEL: BINOMDIST(5,500,0.01,FALSE)
$$\Pr(X = 5) = \binom{500}{5} (0.01)^5 (1 - 0.01)^{495} = 0.176$$
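The same exact probability in Python rather than Excel, as a minimal sketch; it evaluates the formula above directly.

```python
from math import comb

n, p, k = 500, 0.01, 5
# Exact binomial probability Pr(X = 5), the quantity BINOMDIST(5,500,0.01,FALSE) returns.
prob = comb(n, k) * p**k * (1 - p) ** (n - k)
print(round(prob, 3))   # 0.176
```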
Exercise
Newborns were screened for HIV in a Massachusetts hospital. The positive rate for inner-city babies is p = 0.01. If 500 newborns are screened:
2. What is the exact binomial probability of at least 5 HIV-positive test results?
Answer: EXCEL: F(4) = BINOMDIST(4,500,0.01,TRUE)
$$\Pr(X \ge 5) = 1 - \Pr(X \le 4) = 1 - F(4) = 1 - 0.44 = 0.56$$
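The complement rule in Python. A minimal sketch; `binomial_cdf` is a hypothetical helper that sums the pmf, playing the role of BINOMDIST(x, n, p, TRUE).

```python
from math import comb

def binomial_cdf(x: int, n: int, p: float) -> float:
    """F(x) = Pr(X <= x) = sum of Pr(X = k) for k = 0, ..., x."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

F4 = binomial_cdf(4, 500, 0.01)
print(round(F4, 2), round(1 - F4, 2))   # 0.44 0.56
```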
Normal distribution
- The normal distribution is also called the Gaussian distribution, after the well-known mathematician Carl Friedrich Gauss (1777-1855, "the Prince of Mathematicians").
Normal distribution
- The normal distribution is very useful.
- Many things closely follow a normal distribution:
- Heights of people
- Errors in measurement
- Blood pressure
- Scores on a test
- Many other distributions (e.g., the binomial) can be approximated by a normal distribution, sometimes after a transformation.
- Most statistical methods considered in this text are based on the normal distribution.
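The pdf of the normal distribution
- The normal distribution is defined by its pdf, which, for some parameters µ and σ > 0, is given by
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty$$
- This distribution is denoted N(µ, σ²).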
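The density formula can be coded directly and cross-checked against Python's standard library (`statistics.NormalDist`, Python 3.8+). `normal_pdf` is a hypothetical helper name; the parameters in the check are borrowed from the N(80, 12²) example used later.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1 / (sqrt(2*pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

print(round(normal_pdf(0, 0, 1), 4))                                      # 0.3989
print(math.isclose(normal_pdf(90, 80, 12), NormalDist(80, 12).pdf(90)))   # True
```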
Other properties of Normal pdf
- Mean = median = mode
- Symmetric about the center
- 50% of values are less than the mean

Location is measured by µ
- In the graph, µ₂ > µ₁.

Spread is measured by σ² (equivalently, σ)
- In the graph, σ₂ > σ₁.
Standard normal distribution N(0, 1)
- A normal distribution with mean 0 and variance 1 is called a standard normal distribution, denoted N(0, 1).
- In the following, we examine the standard normal distribution N(0, 1) in detail.
- We will see that any information concerning a general normal distribution N(µ, σ²) can be obtained from appropriate manipulations of an N(0, 1) distribution.
Density of standard normal N(0,1)
With µ = 0 and σ = 1, the normal pdf becomes
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
Properties of the standard normal N(0, 1)
- It can be shown that about 68% of the area under the standard normal density lies between −1 and +1, about 95% lies between −2 and +2, and about 99% lies between −2.5 and +2.5.
NOTE: More precisely, for Z ~ N(0, 1), Pr(−1 < Z < 1) = 0.6827, Pr(−1.96 < Z < 1.96) = 0.95, and Pr(−2.576 < Z < 2.576) = 0.99.
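These areas can be checked with `statistics.NormalDist` from Python's standard library (our choice of tool; the slides use Excel), using Pr(−c < Z < c) = F(c) − F(−c):

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal N(0, 1)

for c in (1, 2, 2.5, 1.96, 2.576):
    # Pr(-c < Z < c) = F(c) - F(-c): the central area under the density.
    print(c, round(Z.cdf(c) - Z.cdf(-c), 4))
```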
Cumulative probability
- The cumulative distribution function (cdf) of a standard normal distribution is denoted by Φ(x) = Pr(Z ≤ x), where Z ~ N(0, 1). In what follows it is also written
$$F(a) = \Pr(Z \le a)$$
Excel: F(a) = NORMSDIST(a)

$$\Pr(a \le Z \le b) = F(b) - F(a)$$
Excel: F(1) = NORMSDIST(1); F(-1) = NORMSDIST(-1)
- E.g., $$\Pr(-1 \le Z \le 1) = F(1) - F(-1) = 0.8413 - 0.1587 = 0.6826$$

$$\Pr(Z \ge a) = 1 - F(a)$$
- E.g., $$\Pr(Z \ge 1) = 1 - F(1) = 1 - 0.8413 = 0.1587$$
Excel: F(1) = NORMSDIST(1)
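The same calculations with Python's `statistics.NormalDist().cdf`, which plays the role of NORMSDIST here:

```python
from statistics import NormalDist

F = NormalDist().cdf          # F(a) = Pr(Z <= a) for Z ~ N(0, 1)

print(round(F(1), 4), round(F(-1), 4))      # 0.8413 0.1587
print(round(F(1) - F(-1), 4))               # Pr(-1 <= Z <= 1): 0.6827
print(round(1 - F(1), 4))                   # Pr(Z >= 1): 0.1587
```

The slide's 0.6826 comes from subtracting the already-rounded values 0.8413 and 0.1587; subtracting before rounding gives 0.6827.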
How to standardize the normal distribution?
If X ~ N(µ, σ²), define
$$Z = \frac{X - \mu}{\sigma}$$
Then Z has a standard normal distribution, Z ~ N(0, 1).
Standardization
- If X ~ N(µ, σ²) and Z = (X − µ)/σ, then Z ~ N(0, 1), and
$$\Pr(a < X < b) = \Pr\!\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) = F\!\left(\frac{b-\mu}{\sigma}\right) - F\!\left(\frac{a-\mu}{\sigma}\right)$$
Use standardization for many problems
- Example: If X ~ N(80, 12²), what is Pr(90 < X < 100)?
- Solution:
$$\Pr(90 < X < 100) = \Pr\!\left(\frac{90-80}{12} < Z < \frac{100-80}{12}\right) = \Pr(0.833 < Z < 1.667) = F(1.667) - F(0.833) = 0.9522 - 0.7977 \approx 0.155$$
Always draw a graph.
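A short Python check of this example (using `statistics.NormalDist`, our choice of tool) that computes the probability both by standardizing the endpoints and directly from N(80, 12²). The two routes should agree.

```python
from statistics import NormalDist

mu, sigma = 80, 12
a, b = 90, 100
F = NormalDist().cdf                        # standard normal cdf

# Standardize the endpoints: z = (x - mu) / sigma.
za, zb = (a - mu) / sigma, (b - mu) / sigma
p_standardized = F(zb) - F(za)

# Same probability without standardizing, using N(80, 12^2) directly.
X = NormalDist(mu, sigma)
p_direct = X.cdf(b) - X.cdf(a)

print(round(za, 3), round(zb, 3), round(p_standardized, 3))   # 0.833 1.667 0.155
```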
Exercise
Suppose we know that among men aged 30-34 who have ever smoked, the mean number of years they smoked is 12.8, with a standard deviation of 5.1 years. Assuming that the duration of smoking is normally distributed, what proportion of men in this age group have smoked for more than 20 years?
Answer: We have X ~ N(12.8, 5.1²), and we need to compute Pr(X > 20):
$$\Pr(X > 20) = 1 - \Pr(X \le 20) = 1 - \Pr\!\left(Z \le \frac{20 - 12.8}{5.1}\right) = 1 - F(1.412) = 1 - 0.9210 = 0.079$$
EXCEL: 1 - NORMDIST(20,12.8,5.1,TRUE), or 1 - NORMSDIST(1.412)
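A Python version of the same answer, as a sketch under the exercise's normality assumption. Like NORMDIST(…, TRUE), `NormalDist.cdf` returns Pr(X ≤ 20), so the complement is taken explicitly.

```python
from statistics import NormalDist

X = NormalDist(mu=12.8, sigma=5.1)      # years smoked, assumed normal
z = (20 - 12.8) / 5.1                    # standardized cutoff, about 1.412

p_more_than_20 = 1 - X.cdf(20)           # Pr(X > 20) = 1 - Pr(X <= 20)
print(round(z, 3), round(p_more_than_20, 3))   # 1.412 0.079
```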