Review Probability Basic definitions: Randomization experiment - - PowerPoint PPT Presentation

SLIDE 1
SLIDE 2

Review

  • Probability
  • Basic definitions:

Randomization experiment Sample spaces Elementary outcomes Event

  • Basic operations; conditional probability
  • Bayes' theorem
SLIDE 3

Objectives

  • Random Variable

Discrete random variable Continuous random variable

  • Two probability distributions

Binomial distribution Normal distribution

slide-4
SLIDE 4

Random variables

  • A random variable is a function that assigns a numeric value to each outcome in a sample space. We usually denote a random variable by a capital letter: X, Y or Z.
  • NOTE: two key ingredients, (1) randomness and (2) numeric values.
  • Example 1: Randomly select a student from a class.
    X = the student's number of siblings. X could be 0, 1, 2, …
  • Example 2: Randomly select a student from a class.
    X = the student's height. X could be any value greater than 0.

SLIDE 5

Two types of random variables

1. Discrete random variable: its possible values form a set of discrete (isolated) values.
  • E.g., X = number of siblings

2. Continuous random variable: its possible values cannot be enumerated (there are infinitely many of them), and every individual value has probability zero: Pr(X = x) = 0 for every x.
  • E.g., X = the student's height

SLIDE 6
  • Example 1: Tossing two coins

Let X = number of heads.

Notation: X denotes the random variable; x denotes an observed value.

Outcome:  TT   HT   TH   HH
x:         0    1    1    2

SLIDE 7

Probability distribution function

  • A probability distribution function (pdf) is a mathematical relationship, or rule, that assigns to each possible value x of a discrete random variable X the probability Pr(X = x).

SLIDE 8

Probability Distribution of the Random Variable

X = number of heads.

Outcome:   TT    HT, TH    HH
x:          0       1       2
P(X=x):    1/4     1/2     1/4

[Figure: probability histogram of X]
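The distribution above can be reproduced by brute-force enumeration of the sample space. This is a minimal Python sketch, assuming fair coins so that all outcomes are equally likely; the function name `coin_distribution` is hypothetical, chosen only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def coin_distribution(n_coins=2):
    """Return {x: Pr(X = x)} where X = number of heads in n_coins fair tosses."""
    outcomes = list(product("HT", repeat=n_coins))  # equally likely elementary outcomes
    counts = Counter(outcome.count("H") for outcome in outcomes)  # X maps each outcome to a number
    total = len(outcomes)
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}

dist = coin_distribution()
print(dist)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

Exact fractions are used so the probabilities match the table without rounding.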

SLIDE 9
  • Example 2: Tossing two dice

Let Y = the sum of the dots on the two dice. What are the possible values of Y?

SLIDE 10

Probability Distribution of the Random Variable

Y = the sum of the dots on the two dice.
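The distribution of Y can be obtained the same way as for the coins, by counting the 36 equally likely (die 1, die 2) pairs. A minimal Python sketch, assuming two fair six-sided dice; `dice_sum_distribution` is a hypothetical name, and the text bars stand in for the probability histogram.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def dice_sum_distribution():
    """Return {y: Pr(Y = y)} for Y = sum of the dots on two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {y: Fraction(c, 36) for y, c in sorted(counts.items())}

dist = dice_sum_distribution()
for y, p in dist.items():
    # one '#' per favourable pair out of 36: a text version of the probability histogram
    print(f"{y:>2}  {str(p):>4}  {'#' * int(p * 36)}")
```

The possible values are 2 through 12, with 7 the most likely (6 of 36 pairs).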

SLIDE 11

Relative frequency

$$\text{Probability} = \frac{\text{frequency of occurrences}}{\text{frequency of all possible occurrences}}, \qquad 0 \le \text{Probability} \le 1$$

In practice, the probability of an event can be estimated by its relative frequency "in the long run": if the experiment is repeated many times, the relative frequency histogram should look very much like the probability histogram.
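The long-run idea can be illustrated with a short simulation of the two-coin experiment. This Python sketch assumes fair coins and uses a fixed seed (an arbitrary choice) so that runs are reproducible; `relative_frequencies` is a hypothetical helper.

```python
import random
from collections import Counter

def relative_frequencies(n_trials, seed=0):
    """Toss two fair coins n_trials times; return the relative frequency of each x = number of heads."""
    rng = random.Random(seed)  # seeded generator so the output is reproducible
    counts = Counter(
        sum(rng.random() < 0.5 for _ in range(2))  # number of heads in one toss of two coins
        for _ in range(n_trials)
    )
    return {x: counts[x] / n_trials for x in range(3)}

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequencies(n))
```

As n grows, the printed frequencies settle near the probabilities 1/4, 1/2 and 1/4; for small n they can be far off.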

SLIDE 12

Data set vs. Probability distributions

  • Sample properties, based on a data set x_1, …, x_n:

Sample mean: $$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Sample variance: $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

  • Model or population properties, based on the probability distribution (X takes the values x_1, …, x_R):

Population mean: $$\mu = \sum_{i=1}^{R} x_i \Pr(X = x_i)$$

Population variance: $$\sigma^2 = \sum_{i=1}^{R} (x_i - \mu)^2 \Pr(X = x_i)$$
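To make the contrast concrete, here is a small Python sketch. The eight observations in `data` are hypothetical (imagined repetitions of the two-coin experiment), not data from the slides. `statistics.variance` divides by n − 1, matching the sample-variance formula, while the population quantities are computed from the probability distribution of X.

```python
import statistics
from fractions import Fraction

# Sample properties: computed from observed data (hypothetical counts of heads).
data = [1, 0, 2, 1, 1, 2, 1, 0]
x_bar = statistics.mean(data)      # (1/n) * sum of x_i
s2 = statistics.variance(data)     # sum of (x_i - x_bar)^2 divided by n - 1

# Population properties: computed from the probability distribution of X.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
mu = sum(x * p for x, p in pmf.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(x_bar, s2, mu, sigma2)
```

Here the sample mean happens to equal µ = 1, but the sample variance (4/7) differs from σ² = 1/2: sample properties vary from data set to data set, while population properties are fixed by the model.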

SLIDE 13

Mean of Random Variable

  • The mean or expected value of X, denoted E(X) or µ, is defined as

$$E(X) = \mu = \sum_{i=1}^{R} x_i \Pr(X = x_i)$$

  • It is the sum of the possible values, each weighted by its probability.
  • The expectation represents the "average" value of the random variable.

SLIDE 14

Mean of X

X = number of heads.

Outcome:   TT    HT, TH    HH
x:          0       1       2
P(X=x):    1/4     1/2     1/4
xP(x):      0      1/2     1/2

$$\mu = E(X) = \sum_{i=1}^{3} x_i \Pr(X = x_i) = 0 + \tfrac{1}{2} + \tfrac{1}{2} = 1$$
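The weighted sum is a one-liner once the distribution is stored as a {value: probability} mapping. A minimal Python sketch; `expected_value` is a hypothetical helper, and the single fair die is an extra illustration not taken from the slides.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum of x * Pr(X = x) over the possible values of a discrete random variable."""
    return sum(x * p for x, p in pmf.items())

coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # X = number of heads
die = {face: Fraction(1, 6) for face in range(1, 7)}                # one fair die (illustration)

print(expected_value(coins))  # 1
print(expected_value(die))    # 7/2
```

Note that E(X) need not be a value X can actually take: a die never shows 3.5.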

SLIDE 15

Variance of Random Variable

  • The variance of X is the expected squared distance from the population mean:

$$\sigma^2 = Var(X) = \sum_{i=1}^{R} (x_i - \mu)^2 \Pr(X = x_i)$$

  • The standard deviation σ is the square root of the variance:

$$sd(X) = \sigma = \sqrt{Var(X)}$$

SLIDE 16

Variance of X

X = number of heads.

x       P(x)    (x − µ)² P(x)
0       0.25    (0 − 1)² × 0.25 = 0.25
1       0.50    (1 − 1)² × 0.50 = 0
2       0.25    (2 − 1)² × 0.25 = 0.25
Total           0.50

$$\sigma^2 = 0.5, \qquad \sigma = \sqrt{0.5} \approx 0.707$$

In summary, µ and σ are computed from the probability distribution; they are population properties.
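The variance table can be checked in a few lines. A minimal Python sketch using exact fractions; the helper name `variance` is hypothetical.

```python
import math
from fractions import Fraction

def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * Pr(X = x), where mu = E(X)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}  # X = number of heads
var = variance(coins)
sd = math.sqrt(var)        # sigma = sqrt(Var(X)), converted to float
print(var, round(sd, 4))   # 1/2 0.7071
```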

SLIDE 17

Two types of random variables

1. Discrete random variable: its possible values form a set of discrete (isolated) values.

2. Continuous random variable: its possible values cannot be enumerated (there are infinitely many of them), and every individual value has probability zero: Pr(X = x) = 0 for every x.

SLIDE 18

Continuous random variables

  • A balanced spinning pointer can stop anywhere on the circle.
  • X = the proportion of the total circumference it lands on.
  • X can be any value between 0 and 1: infinitely many values.
  • Pr(0.25 ≤ X ≤ 0.75) = 0.5
  • Pr(X = 0.5) = 0, because X can take on an infinite number of values.

SLIDE 19

Probability density function (pdf) of X

  • The curve y = f(x) is the probability density function (pdf) of the random variable X.
  • Pr(a ≤ X ≤ b) is the area under the curve between the x values a and b:

$$P(a \le X \le b) = \int_a^b f(x)\,dx$$

  • The total area under the density curve over the entire range of possible values of the random variable is 1:

$$P(-\infty \le X \le \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$$
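Areas under a density can be approximated numerically. This Python sketch assumes the balanced spinner has the uniform density f(x) = 1 on [0, 1] (consistent with Pr(0.25 ≤ X ≤ 0.75) = 0.5 on the spinner slide) and uses the midpoint rule, a standard numerical-integration technique swapped in here for illustration; `integrate` and `spinner_pdf` are hypothetical names.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule (n subintervals)."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def spinner_pdf(x):
    """Assumed density of the balanced spinner: uniform on [0, 1], zero elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

print(round(integrate(spinner_pdf, 0.25, 0.75), 6))  # area between 0.25 and 0.75: 0.5
print(round(integrate(spinner_pdf, 0.0, 1.0), 6))    # total area: 1.0
print(integrate(spinner_pdf, 0.5, 0.5))              # a single point has zero area: 0.0
```

The last line mirrors Pr(X = x) = 0: an interval of zero width contributes no area.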

SLIDE 20

Probability density function (pdf) of X

  • The pdf has large values in regions of high probability and small values in regions of low probability.
  • Pr(X = x) = 0 for any specific value x.
  • Consequently, when X is continuous, no distinction is usually made between probabilities such as Pr(X < x) and Pr(X ≤ x), or Pr(a ≤ X ≤ b) and Pr(a < X < b).

[Figure: density curve y = f(x)]

SLIDE 21

Expectation and variance of a continuous random variable

  • Mean (the center of the probability density):

$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

  • Variance (the spread of the probability density):

$$\sigma^2 = Var(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$$

  • The standard deviation σ is the square root of the variance, that is,

$$\sigma = \sqrt{Var(X)}$$
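The two integrals can be evaluated numerically in the same way as areas. This Python sketch again assumes the spinner's density is uniform on [0, 1] and uses the midpoint rule; because that density is zero outside [0, 1], integrating over [0, 1] covers the whole real line. For this assumed density the exact answers are µ = 1/2 and σ² = 1/12.

```python
def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def spinner_pdf(x):
    """Assumed density: uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

mu = integrate(lambda x: x * spinner_pdf(x), 0.0, 1.0)               # E(X)
var = integrate(lambda x: (x - mu) ** 2 * spinner_pdf(x), 0.0, 1.0)  # Var(X)
sigma = var ** 0.5
print(round(mu, 4), round(var, 4), round(sigma, 4))  # 0.5 0.0833 0.2887
```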

SLIDE 22

Two distributions

  • Binomial: discrete
  • Normal: continuous

SLIDE 23

Bernoulli trial

Examples:

  • A heads-or-tails coin toss
  • A win-or-lose football game
  • A pass-or-fail automotive smog inspection

Properties:

  • Two outcomes: success or failure
  • The success probability p is the same in each trial
  • Trials are independent

SLIDE 24

Binomial random variable

  • X is the number of successes in n repeated Bernoulli trials, each with probability p of success.
  • The success probability p is the same in each trial.
  • The trials are independent.

SLIDE 25

Binomial random variable

Probability distribution: the probability of obtaining k successes in n trials, each with success probability p, is

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

where

  • $$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$ counts all possible ways of getting k successes and n − k failures, with n! = n × (n − 1) × … × 1;
  • $$p^k (1-p)^{n-k}$$ is the probability of any one particular sequence of k successes and n − k failures.
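The formula translates directly into code with `math.comb` (Python 3.8+) for the binomial coefficient. A minimal sketch; `binomial_pmf` is a hypothetical helper name. As a check, the number of heads in two fair coin tosses is Binomial(2, 0.5), which should reproduce the earlier coin table.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

print([binomial_pmf(k, 2, 0.5) for k in range(3)])  # [0.25, 0.5, 0.25]
```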

SLIDE 26

Mean and Variance of the Binomial Distribution

$$\mu = np, \qquad \sigma^2 = np(1-p)$$
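As a numerical sanity check (not a proof), the sketch below computes the mean and variance of a binomial distribution directly from its pmf, using the discrete definitions of µ and σ², and compares them with np and np(1 − p). The values n = 500 and p = 0.01 anticipate the exercise that follows; `binomial_moments` is a hypothetical helper.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_moments(n, p):
    """Mean and variance computed from the pmf via the definitions of E(X) and Var(X)."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mu = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf))
    return mu, var

mu, var = binomial_moments(500, 0.01)
print(mu, var)  # should agree with np = 5 and np(1 - p) = 4.95 up to floating-point rounding
```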

SLIDE 27

Exercise

Newborns were screened for HIV in a Massachusetts hospital. The positive rate for inner-city babies is p = 0.01. If 500 newborns are screened:

  1. What is the exact binomial probability of 5 HIV-positive test results?

SLIDE 28

Exercise

Answer to question 1 (n = 500, p = 0.01):

$$P(X = 5) = \binom{500}{5} (0.01)^5 (1 - 0.01)^{495} = 0.176$$

EXCEL: BINOMDIST(5,500,0.01,FALSE)
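The same number can be computed outside Excel. A minimal Python sketch; `binomial_pmf` is a hypothetical helper that plays the role of `BINOMDIST(k, n, p, FALSE)`.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), the quantity BINOMDIST(k, n, p, FALSE) returns."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p5 = binomial_pmf(5, 500, 0.01)
print(round(p5, 3))  # 0.176
```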

SLIDE 29

Exercise

For the same 500 screened newborns (p = 0.01):

  2. What is the exact binomial probability of at least 5 HIV-positive test results?

SLIDE 30

Exercise

Answer to question 2:

$$P(X \ge 5) = 1 - P(X \le 4) = 1 - F(4) = 1 - 0.44 = 0.56$$

EXCEL: F(4) = BINOMDIST(4,500,0.01,TRUE)
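The complement rule is the key step: summing P(X = 5), …, P(X = 500) directly is possible but wasteful, while 1 − F(4) needs only five terms. A minimal Python sketch; `binomial_cdf` is a hypothetical helper standing in for `BINOMDIST(k, n, p, TRUE)`.

```python
from math import comb

def binomial_cdf(k, n, p):
    """F(k) = P(X <= k) for X ~ Binomial(n, p), the quantity BINOMDIST(k, n, p, TRUE) returns."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

f4 = binomial_cdf(4, 500, 0.01)
at_least_5 = 1 - f4  # complement rule: P(X >= 5) = 1 - P(X <= 4)
print(round(f4, 2), round(at_least_5, 2))  # 0.44 0.56
```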

SLIDE 31

Normal distribution

  • The normal distribution is also called the Gaussian distribution, after the well-known mathematician Carl Friedrich Gauss (1777-1855, “the Prince of Mathematicians”).

SLIDE 32

Normal distribution

  • The normal distribution is very useful.
  • Many things closely follow a normal distribution:
    • Heights of people
    • Errors in measurement
    • Blood pressure
    • Scores on a test
  • Many other distributions (binomial, etc.) can be made approximately normal by transformation.
  • Most statistical methods considered in this text are based on the normal distribution.

SLIDE 33

The pdf of normal distribution

  • The normal distribution is defined by its pdf, which is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty$$

for some parameters µ and σ > 0.
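The density formula can be coded directly and cross-checked against `statistics.NormalDist` from Python's standard library (3.8+). A minimal sketch; `normal_pdf` is a hypothetical name.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) * sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

# Hand-coded formula vs. the standard library at the same point.
print(normal_pdf(1.0, 0.0, 1.0), NormalDist(0.0, 1.0).pdf(1.0))
```

The peak value at x = µ is 1/(√(2π)σ), so a larger σ gives a lower, wider curve.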

SLIDE 34

Other properties of Normal pdf

  • Mean = median = mode
  • Symmetric about the center
  • 50% of the values are less than the mean

SLIDE 35

Location is measured by µ

  • In the graph, µ₂ > µ₁

SLIDE 36

Spread is measured by σ²

  • In the graph, σ₂ > σ₁

SLIDE 37

Standard normal distribution N(0, 1)

  • A normal distribution with mean 0 and variance 1 is called a standard normal distribution, denoted N(0, 1).
  • In the following, we examine the standard normal distribution N(0, 1) in detail.
  • We will see that any information concerning a general normal distribution N(µ, σ²) can be obtained from appropriate manipulations of an N(0, 1) distribution.

SLIDE 38

Density of standard normal N(0,1)

With µ = 0 and σ = 1, the normal pdf reduces to

$$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad -\infty < x < \infty$$

SLIDE 39

Properties of the standard normal N(0, 1)

  • It can be shown that about 68% of the area under the standard normal density lies between −1 and +1, about 95% lies between −2 and +2, and about 99% lies between −2.5 and +2.5.

NOTE: You will see that, more precisely, Pr(−1 < Z < 1) = 0.6827, Pr(−1.96 < Z < 1.96) = 0.95 and Pr(−2.576 < Z < 2.576) = 0.99.
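These central areas can be checked with the standard library's `statistics.NormalDist` (Python 3.8+), whose default parameters are µ = 0 and σ = 1. A minimal sketch; `central_area` is a hypothetical helper.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def central_area(c):
    """Pr(-c < Z < c), the area under the standard normal density between -c and +c."""
    return Z.cdf(c) - Z.cdf(-c)

for c in (1, 1.96, 2, 2.5, 2.576):
    print(c, round(central_area(c), 4))
```

The ±2 and ±2.5 rows show that "95%" and "99%" are rounded rules of thumb; ±1.96 and ±2.576 are the exact cut-offs.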

SLIDE 40

Cumulative probability

  • The cumulative distribution function (cdf) of the standard normal distribution is denoted Φ(x) = Pr(Z ≤ x), where Z ~ N(0, 1). It is also written F:

$$F(a) = \Phi(a) = P(Z \le a)$$

Excel: F(a): NORMSDIST(a)

SLIDE 41

$$P(a \le Z \le b) = F(b) - F(a)$$

Excel: F(1): NORMSDIST(1); F(-1): NORMSDIST(-1)

$$P(-1 \le Z \le 1) = F(1) - F(-1) = 0.8413 - 0.1587 = 0.6826$$

SLIDE 42
  • E.g., for an upper-tail probability:

$$P(Z \ge a) = 1 - F(a)$$

$$P(Z \ge 1) = 1 - F(1) = 1 - 0.8413 = 0.1587$$

Excel: F(1): NORMSDIST(1)
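Outside Excel, F can be written with the error function using the standard identity F(a) = (1 + erf(a/√2))/2. A minimal Python sketch; `std_normal_cdf` is a hypothetical name playing the role of `NORMSDIST(a)`.

```python
import math

def std_normal_cdf(a):
    """F(a) = P(Z <= a) for Z ~ N(0, 1), via F(a) = (1 + erf(a / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

F = std_normal_cdf
print(round(F(1), 4))          # F(1): 0.8413
print(round(F(1) - F(-1), 4))  # P(-1 <= Z <= 1): 0.6827
print(round(1 - F(1), 4))      # P(Z >= 1): 0.1587
```

The 0.6827 here differs from the slide's 0.6826 only because the slide subtracts values already rounded to four decimals.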

SLIDE 43

How to standardize the normal distribution?

SLIDE 44

How to standardize the normal distribution?

If X ~ N(µ, σ²), define

$$Z = \frac{X - \mu}{\sigma}$$

Then Z has a standard normal distribution, Z ~ N(0, 1).

SLIDE 45

Standardization

  • If X ~ N(µ, σ²) and

$$Z = \frac{X - \mu}{\sigma},$$

then Z ~ N(0, 1), and

$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right) = F\left(\frac{b-\mu}{\sigma}\right) - F\left(\frac{a-\mu}{\sigma}\right)$$

SLIDE 46

Use standardization for many problems

  • Example: If X ~ N(80, 12²), what is Pr(90 < X < 100)?
  • Solution:

$$\Pr(90 < X < 100) = \Pr\left(\frac{90-80}{12} < \frac{X-80}{12} < \frac{100-80}{12}\right) = \Pr(0.833 < Z < 1.667) = F(1.667) - F(0.833) = 0.9522 - 0.7977 \approx 0.155$$
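The standardization recipe can be packaged as a small function. A minimal Python sketch using the standard normal cdf from `statistics.NormalDist`; `prob_between` is a hypothetical helper, and the example reuses the numbers from the slide.

```python
from statistics import NormalDist

def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), computed by standardizing to Z ~ N(0, 1)."""
    F = NormalDist().cdf   # standard normal cdf
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return F(z_b) - F(z_a)

p = prob_between(90, 100, mu=80, sigma=12)
print(round(p, 3))  # 0.155
```

Standardizing first is what makes a single N(0, 1) table (or `NORMSDIST`) enough for every normal distribution; `NormalDist(80, 12).cdf` would give the same result directly.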

SLIDE 47

Always draw a graph…

SLIDE 48

Exercise

  • Suppose we know that among men aged 30-34 who have ever smoked, the mean number of years they smoked is 12.8, with a standard deviation of 5.1 years. Assuming that the duration of smoking is normally distributed, what proportion of men in this age group have smoked for more than 20 years?

SLIDE 49

Exercise

Answer: We have

$$X \sim N(12.8,\ 5.1^2)$$

and we need to compute P(X > 20):

$$P(X > 20) = 1 - P(X \le 20) = 1 - P\left(Z \le \frac{20 - 12.8}{5.1}\right) = 1 - F(1.412) = 1 - 0.9210 = 0.079$$

EXCEL: 1 - NORMDIST(20,12.8,5.1,TRUE), or 1 - NORMSDIST(1.412)
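The same answer in Python, as a minimal sketch using `statistics.NormalDist` with the mean and standard deviation given in the exercise. Note that the cdf gives P(X ≤ 20), so the upper-tail answer is one minus it.

```python
from statistics import NormalDist

mu, sigma = 12.8, 5.1          # mean and SD of years smoked, from the exercise
X = NormalDist(mu, sigma)

z = (20 - mu) / sigma          # standardized cut-off
p_more_than_20 = 1 - X.cdf(20) # P(X > 20) = 1 - P(X <= 20)
print(round(z, 3), round(p_more_than_20, 3))  # 1.412 0.079
```

So roughly 7.9% of men in this group are estimated to have smoked for more than 20 years, under the normality assumption.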