SLIDE 1

6.8, 6.9 Probability

P. Danziger

Probability

Up to now we have been using the assumption of equal likelihood; we want a more general definition of probability.

Definition 1. A probability function P on a sample space S is a function which maps events to R, i.e. P : P(S) → R, which satisfies:

P1. P(S) = 1.
P2. For all A ⊆ S, P(A) ≥ 0.
P3. If A, B ⊆ S with A ∩ B = φ, then P(A ∪ B) = P(A) + P(B).

The particular form of the probability function defines the likelihood of particular events.

SLIDE 2

From these general properties several important consequences follow.

Theorem 2. Given a probability function defined on a sample space S and events A, B ⊆ S, the following hold:

  • 1. If B ⊆ A, then P(B) ≤ P(A).
  • 2. 0 ≤ P(A) ≤ 1.
  • 3. P(Ac) = 1 − P(A).
  • 4. P(φ) = 0.


SLIDE 3

Proof:

  • 1. Let B ⊆ A. Then A − B and B are disjoint and (A − B) ∪ B = A. So by P3, P(A) = P(A − B) + P(B). Now by P2, P(A − B) ≥ 0, so P(A) ≥ P(B).

  • 2. P(A) ≥ 0 by P2. Every event A ⊆ S by definition and P(S) = 1, so P(A) ≤ 1 by the previous point.

  • 3. Note that A ∩ Ac = φ and A ∪ Ac = S. So by P1 and P3,
    1 = P(S)          (P1)
      = P(A ∪ Ac)     (S = A ∪ Ac)
      = P(A) + P(Ac)  (P3)
    So P(Ac) = 1 − P(A).

  • 4. φ = Sc, and so by 3 above and P1, P(φ) = 1 − P(S) = 1 − 1 = 0.

SLIDE 4

Theorem 3 (Generalised disjoint addition rule). Let A1, A2, . . . , An be pairwise disjoint events. Then P(A1 ∪ A2 ∪ . . . ∪ An) = P(A1) + P(A2) + . . . + P(An).

Proof: Exercise (induction on n and P3).

Theorem 4 (Generalised inclusion/exclusion rule). P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof: Exercise (see book).
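These rules are easy to check numerically. A minimal sketch in Python, assuming equal likelihood on a single die roll (the helper name `prob` and the chosen events are our own illustration, not from the slides):

```python
from fractions import Fraction

# Equal-likelihood probability of an event within a finite sample space.
def prob(event, space):
    return Fraction(len(event & space), len(space))

S = frozenset(range(1, 7))           # one roll of a die
A = {x for x in S if x % 2 == 0}     # even roll
B = {x for x in S if x > 3}          # roll above 3

# Inclusion/exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
lhs = prob(A | B, S)
rhs = prob(A, S) + prob(B, S) - prob(A & B, S)
print(lhs, rhs)   # both 2/3
```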

SLIDE 5

Probability Functions: Equal Likelihood

We now show that P(A) = |A|/|S| is a probability function by this definition.

Theorem 5. The equal likelihood probability function P(A) = |A|/|S| satisfies P1, P2, P3 above.

Proof:
P1: P(S) = |S|/|S| = 1.
P2: Let A ⊆ S. |A| ≥ 0 and |S| > 0, so P(A) = |A|/|S| ≥ 0.
P3: Let A, B ⊆ S with A ∩ B = φ. Then |A ∪ B| = |A| + |B|. Now
P(A ∪ B) = |A ∪ B|/|S| = (|A| + |B|)/|S| = |A|/|S| + |B|/|S| = P(A) + P(B).

Thus equal likelihood is a probability distribution.
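Since the sample space is finite, Theorem 5 can also be confirmed by brute force. The sketch below (helper names are our own) checks P1–P3 over every pair of events of a small S:

```python
from fractions import Fraction
from itertools import chain, combinations

S = frozenset(range(1, 5))

# Equal-likelihood probability function.
def P(A):
    return Fraction(len(A), len(S))

def events(space):
    """All subsets of the space, i.e. the power set P(S)."""
    xs = list(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

assert P(S) == 1                                    # P1
assert all(P(A) >= 0 for A in events(S))            # P2
assert all(P(A | B) == P(A) + P(B)                  # P3 (disjoint A, B)
           for A in events(S) for B in events(S) if not A & B)
print("P1-P3 hold")
```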

SLIDE 6

Binomial Distribution

Suppose that we are rolling a die, but we only “win” if a 3 is rolled. The probability of a 3 is p = 1/6, and the probability of not rolling a 3 is q = 1 − p = 5/6.

If we roll the die 5 times we may ask: what is the probability of exactly two 3s?

This is an example of a binomial distribution. Given an experiment with n independent trials, called Bernoulli trials, each having a desired outcome with probability p and probability of failure q = 1 − p, the probability of exactly k successes is

C(n, k) p^k q^(n−k).

Such an experiment is said to have a binomial distribution since

1 = (p + q)^n = Σ_{k=0}^{n} C(n, k) p^k q^(n−k).
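The formula C(n, k) p^k q^(n−k) transcribes directly; the sketch below (the function name is ours) answers the question posed above, exactly two 3s in five rolls:

```python
from fractions import Fraction
from math import comb

# Binomial probability of exactly k successes in n Bernoulli trials.
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = Fraction(1, 6)                 # probability of rolling a 3
ans = binom_pmf(2, 5, p)           # exactly two 3s in five rolls
print(ans, float(ans))             # 625/3888 ≈ 0.1608
```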

SLIDE 7

Returning to the dice example above, we may get two 3s by rolling any of the following combinations: 33xxx, 3x3xx, 3xx3x, 3xxx3, x33xx, . . ., where x is a non-3.

This is the number of ways of choosing the two positions from the 5. Each will occur with probability p²q³ (3 occurs twice, each with probability p, and non-3 appears 3 times, each with probability q).

Example 6

  • 1. Suppose we flip a coin 25 times. What is the probability of getting exactly 5 heads? In this case p = q = 1/2.

P(H = 5) = C(25, 5) (1/2)^5 (1/2)^20 = 53130 (1/2)^25 ≈ 0.0015834

SLIDE 8
  • 2. Suppose we flip a coin 25 times. What is the probability of getting 5 heads or fewer? Again p = q = 1/2.

The sample space is S = {x ∈ {0, 1}* | |x| = 25}. Note that the events Ak = {x ∈ S | |x|₁ = k} are all disjoint from each other. Thus the generalised addition rule applies, so

P(A0 ∪ A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5) = Σ_{k=0}^{5} P(Ak)
= Σ_{k=0}^{5} C(25, k) (1/2)^25
= (1 + 25 + 300 + 2300 + 12650 + 53130) (1/2)^25
= 68406 (1/2)^25 ≈ 0.002038658
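The sum above can be checked directly:

```python
from math import comb

# Probability of at most 5 heads in 25 fair flips, summing the
# disjoint events A_k (exactly k heads), as in the generalised addition rule.
total = sum(comb(25, k) for k in range(6))   # 1+25+300+2300+12650+53130
prob = total / 2**25
print(total, prob)   # 68406 ≈ 0.002038658
```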

SLIDE 9

The full table of values C(25, k) and C(25, k)(1/2)^25 for 25 flips:

 k    C(25,k)   C(25,k)(1/2)^25     k    C(25,k)   C(25,k)(1/2)^25
 0          1   2.98023224×10^−8   13    5200300   1.54981017×10^−1
 1         25   7.45058060×10^−7   14    4457400   1.32840872×10^−1
 2        300   8.94069672×10^−6   15    3268760   9.74166393×10^−2
 3       2300   6.85453415×10^−5   16    2042975   6.08853996×10^−2
 4      12650   3.76999378×10^−4   17    1081575   3.22334468×10^−2
 5      53130   1.58339739×10^−3   18     480700   1.43259764×10^−2
 6     177100   5.27799129×10^−3   19     177100   5.27799129×10^−3
 7     480700   1.43259764×10^−2   20      53130   1.58339739×10^−3
 8    1081575   3.22334468×10^−2   21      12650   3.76999378×10^−4
 9    2042975   6.08853996×10^−2   22       2300   6.85453415×10^−5
10    3268760   9.74166393×10^−2   23        300   8.94069672×10^−6
11    4457400   1.32840872×10^−1   24         25   7.45058060×10^−7
12    5200300   1.54981017×10^−1   25          1   2.98023224×10^−8

SLIDE 10

Note: If n is large and p = 1/2 then the binomial distribution provides a good approximation to the normal distribution of a real variable. The binomial distribution with p = 1/2 is the discrete form of the normal distribution.

SLIDE 11

Conditional Probability

Suppose that we are rolling a die and counting the number of 3s as above. If we are to roll the die 5 times, what is the probability of getting exactly two 3s?

The sample space is S = {x ∈ {0, 1}* | |x| = 5}, and the event is B = {x ∈ S | |x|₁ = 2}.

P(B) = C(5, 2) (1/6)² (5/6)³ = 10 · 5³/6⁵ ≈ 0.160751

Now suppose that we roll the first die and it is a 3; what is the probability now that we will get two 3s? Let A = {x ∈ S | x starts with a 1}; we now wish to find P(A ∩ B).

In general we may wish to find the probability of an event B conditional on A having occurred. We write this as P(B|A).

SLIDE 12

Definition 7 (Conditional Probability). Given two events A and B, the probability of B conditional on A is

P(B|A) = P(A ∩ B)/P(A),

or equivalently

P(A ∩ B) = P(B|A)P(A).

Note: P(A ∩ B) is the probability that both A and B happen. P(B|A) is the probability that B happens given that A has already occurred. We often wish to know the value of P(A ∩ B) for use in the generalised inclusion/exclusion rule.
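Definition 7 can be checked by counting. The sketch below (names are ours) weights each 0/1 string of five rolls, where 1 marks a 3, and recovers P(B|A) as P(A ∩ B)/P(A):

```python
from fractions import Fraction
from itertools import product

p, q = Fraction(1, 6), Fraction(5, 6)   # P(roll a 3), P(not a 3)

def weight(x):
    """Probability of one particular 0/1 outcome string."""
    return p**sum(x) * q**(len(x) - sum(x))

S = list(product((0, 1), repeat=5))
A = [x for x in S if x[0] == 1]          # first roll is a 3
B = [x for x in S if sum(x) == 2]        # exactly two 3s

P_A  = sum(weight(x) for x in A)
P_AB = sum(weight(x) for x in B if x[0] == 1)
print(P_AB / P_A)   # P(B|A) = 125/324 ≈ 0.3858
```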

SLIDE 13

Example 8

  • 1. Continuing the example given above, what is the probability that there are exactly two 3s, one of which occurs on the first throw?

The question actually asks for P(A ∩ B). In this case the conditional probability P(B|A) is relatively easy to calculate. We wish to find the probability that we get two 3s, conditional on the first throw being a 3. This is equivalent to throwing exactly one more 3 in the remaining four throws. So

P(B|A) = C(4, 1) (1/6) (5/6)³ = 4 · 125/1296 ≈ 0.3858.

Now P(A ∩ B) = P(B|A)P(A) ≈ 0.3858 × 1/6 ≈ 0.0643.
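A quick cross-check of this example, once via the product P(B|A)·P(A) and once by direct counting (the variable names are ours):

```python
from math import comb

# Via the product rule: one more 3 in the remaining four throws.
P_B_given_A = comb(4, 1) * (1/6) * (5/6)**3
P_A = 1/6
print(P_B_given_A * P_A)     # ≈ 0.0643

# Direct count: 5-roll sequences with a 3 first and exactly two 3s.
favourable = comb(4, 1) * 5**3   # place the second 3, fill in non-3s
print(favourable / 6**5)         # ≈ 0.0643
```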

SLIDE 14
  • 2. A die is rolled two times. What is the probability that both rolls are 3? That there is at least one 3?

Let A be the event that the first roll is a 3 and B that the second roll is a 3. Note that P(A) = P(B) = 1/6.

The first part of the question asks for P(A ∩ B) = P(B|A)P(A). However P(B|A) = P(B), since there is no connection between the rolls, thus P(A ∩ B) = P(B)P(A) = 1/36. (Note that we could have calculated this by the multiplication rule.)

For the second part we are asked for P(A ∪ B). By the addition rule

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/6 + 1/6 − 1/36 = 11/36.

SLIDE 15
  • 3. Suppose that two cards are drawn from a standard 52 card deck. What is the probability that they are both hearts? What is the probability that at least one is a heart?

Let A be the event that the first card is a heart and B that the second card is a heart. P(A) = 13/52 = 1/4.

For the first part, P(A ∩ B) = P(B|A)P(A) = 12/51 · 1/4 = 1/17.

The value of P(B) will depend on whether a heart was chosen first, i.e. on A: P(B|A) = 12/51 and P(B|Ac) = 13/51. Note that B = (B ∩ A) ∪ (B ∩ Ac), and that the sets (B ∩ A) and (B ∩ Ac) are disjoint. Thus by the addition rule P(B) = P(B ∩ A) + P(B ∩ Ac). Now P(B ∩ A) = P(B|A)P(A) = 12/51 · 1/4 = 12/204 and P(B ∩ Ac) = P(B|Ac)P(Ac) = P(B|Ac)(1 − P(A)) = 13/51 · 3/4 = 39/204, so

P(B) = P(B ∩ A) + P(B ∩ Ac) = 12/204 + 39/204 = 51/204 = 1/4.

For the second part, by the addition rule

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/4 + 1/4 − 1/17 = 15/34 ≈ 0.441.
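The card computation transcribes directly with exact fractions (the variable names are ours):

```python
from fractions import Fraction

# Law of total probability for the two-card draw.
P_A          = Fraction(13, 52)   # first card a heart
P_B_given_A  = Fraction(12, 51)   # second a heart, given first was
P_B_given_Ac = Fraction(13, 51)   # second a heart, given first wasn't

P_B      = P_B_given_A * P_A + P_B_given_Ac * (1 - P_A)
P_AB     = P_B_given_A * P_A
P_A_or_B = P_A + P_B - P_AB
print(P_B, P_AB, P_A_or_B)   # 1/4  1/17  15/34
```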

SLIDE 16
  • 4. Suppose that 5 cards are drawn from a standard 52 card deck. What is the probability that the fifth card is the second spade drawn?

The probability that the fifth card is the second spade drawn is the same as the probability that the fifth card is a spade, conditional on exactly one spade having been drawn in the previous 4 draws.

S = {x ∈ {♥, ♦, ♠, ♣}* | |x| = 5}. Let B = {x ∈ S | 5th card is a ♠} and A = {x ∈ S | first 4 cards contain exactly 1 ♠}; we wish to find P(A ∩ B).

By the multiplication rule the number of ways of choosing exactly 1 spade in 4 draws is the number of ways of choosing 1 spade from the 13 times the number of ways of choosing 3 non-spades from the 39 non-spades in the deck. So |A| = C(13, 1) C(39, 3). Thus

P(A) = C(13, 1) C(39, 3) / C(52, 4).

SLIDE 17

To find P(B|A) we note that given A, 1 spade has been drawn and so 13 − 1 = 12 of the remaining 52 − 4 = 48 cards are spades. Thus P(B|A) = 12/48 = 1/4. Now

P(A ∩ B) = P(B|A)P(A) = (1/4) · C(13, 1) C(39, 3) / C(52, 4) ≈ 0.10971.

We can see that conditional probabilities are related to the multiplication rule. Indeed the multiplication rule is obtained when the probability of event B is unrelated to the event A, as in example 2 above. In this case P(B|A) = P(B) and so P(A ∩ B) = P(B)P(A).

Definition 9. Two events A and B are independent if and only if P(B|A) = P(B).

If two events A and B are independent the multiplication rule applies and P(A ∩ B) = P(B)P(A) (see example 2 above).
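Using `math.comb` the value in example 4 follows directly (variable names are ours):

```python
from math import comb

# P(fifth card is the second spade) = P(B|A) · P(A).
P_A = comb(13, 1) * comb(39, 3) / comb(52, 4)  # exactly 1 spade in 4 draws
P_B_given_A = 12 / 48                          # 12 spades left among 48 cards
print(P_B_given_A * P_A)                       # ≈ 0.10971
```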

SLIDE 18

Random Variables and Expectation

Definition 10 (Random Variable). A random variable is a function from a sample space to R:

X : S → R

Notes:

  • A random variable is neither random nor a variable, despite its name.
  • Though the image space is R, the actual image is usually a small subspace. In particular, for discrete probabilities a random variable usually takes only a small range of values in N.

We often use random variables to define events. So X(s) = 3 defines a set of elements of S, A = {s ∈ S | X(s) = 3}; this set is often denoted X = 3, and we talk about P(X = 3) = P(A).

SLIDE 19

Example 11

  • 1. A coin is tossed 3 times; X is the number of heads.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
X(HHH) = 3, X(HHT) = X(HTH) = X(THH) = 2, X(TTH) = X(THT) = X(HTT) = 1, X(TTT) = 0.
The set of values that X can take is {0, 1, 2, 3}. X = 2 is the set {HHT, HTH, THH}, so P(X = 2) is the probability that exactly two heads are tossed.

  • 2. Two dice are rolled; X = the sum of the dice rolled. P(X < 7) is the probability that the sum is less than 7.

SLIDE 20
  • 3. 5 cards are drawn from a deck; X = the number of 7s, Y = the number of diamonds. P(X = 2|Y = 3) is the probability that the hand contains two 7s given that it contains three diamonds.

  • 4. Two dice are rolled; let X1 be the value of the first roll, X2 the value of the second roll, and X = X1 + X2. What is the probability that the total rolled is 8? This question asks for P(X = 8), where X = X1 + X2.

SLIDE 21

Given a random variable X we can ask what the expected value of the variable is.

Definition 12 (Expected Value). Given a (discrete) random variable X : S → {a1, a2, . . . , an}, the expected value of X is given by

E(X) = Σ_{i=1}^{n} ai P(X = ai).

The expected value of a random variable is also known as the mean.

SLIDE 22

Example 13

  • 1. Suppose we play a game where a die is rolled and we receive a payout of double the value on the die in dollars. What is a fair price to pay to participate in this game?

Let X be the payout, so X(1) = 2, X(2) = 4, X(3) = 6, X(4) = 8, X(5) = 10, X(6) = 12. Each outcome is equally likely, so P(X = k) = 1/6 for each k.

E(X) = Σ_{i=1}^{n} ai P(X = ai) = Σ_{i=1}^{6} 2i · 1/6 = (2 + 4 + 6 + 8 + 10 + 12)/6 = 42/6 = 7.

So we would expect this game to pay an average of $7 over many trials.
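Definition 12 transcribes directly; a short sketch of the dice game (names are ours):

```python
from fractions import Fraction

# Expected payout E(X) = Σ a_i P(X = a_i) for the double-the-roll game.
payout = {roll: 2 * roll for roll in range(1, 7)}   # double the face value
E = sum(a * Fraction(1, 6) for a in payout.values())
print(E)   # 7
```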

SLIDE 23
  • 2. A coin is flipped until a head is tossed. How many rounds would we expect?

S = {x ∈ {H, T}* | x = T*H}. The probability of getting T^n H is (1/2)^(n+1). Since the number of flips N satisfies P(N > n) = (1/2)^n (the first n flips are all tails), the tail-sum formula E(N) = Σ_{n=0}^{∞} P(N > n) gives the expected number of flips:

Σ_{n=0}^{∞} (1/2)^n = lim_{N→∞} (1 − (1/2)^(N+1))/(1 − 1/2) = 2.

  • 3. St. Petersburg Paradox. The following problem was posed by Daniel Bernoulli in 1738.

A coin is flipped until a head appears. The pot starts at $1; each time a tail appears it doubles, and when a head appears you get the pot. If the first toss is tails you get $2; if the first 2 tosses are tails you get $4; if the first 3 tosses are tails you get $8; . . . In general, if the first n tosses are tails you get $2^n.

How much would you be willing to pay to participate in this game?

SLIDE 24

Let X be the amount won.
The probability of winning $2 is (1/2)² = 1/4 (1 tail followed by a head).
The probability of winning $4 is (1/2)³ = 1/8 (2 tails followed by a head).
The probability of winning $8 is (1/2)⁴ = 1/16 (3 tails followed by a head).
. . .
In general the probability of winning $2^n is (1/2)^(n+1).

E(X) = Σ_{n=1}^{∞} 2^n · (1/2)^(n+1) = Σ_{n=1}^{∞} 1/2 = ∞

The expected payout is infinite! We should be willing to pay any amount to play this game, but few would. Note that 3/4 of the time we win $2 or less. Though a run of 20 tails would pay out over a million dollars, the chance of this happening is less than 1 in 2 million.
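Truncating the series shows the divergence concretely: each additional possible round contributes $1/2 to the expectation (the function name is ours):

```python
# Expected payout if the St. Petersburg game is capped at max_rounds:
# each term 2^n · (1/2)^(n+1) equals 1/2, so the sum grows without bound.
def expected_payout(max_rounds):
    return sum(2**n * (1/2)**(n + 1) for n in range(1, max_rounds + 1))

for m in (10, 20, 40):
    print(m, expected_payout(m))   # 5.0, 10.0, 20.0
```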

SLIDE 25

Theorem 14. Expectation is a linear function. That is, given any random variables X and Y:

  • 1. For any constant c ∈ R, E(cX) = cE(X).
  • 2. E(X + Y) = E(X) + E(Y).

Proof:

  • 1. Let c ∈ R and X be a random variable with possible outcomes x1, x2, . . . , xn.
E(cX) = Σ_{i=1}^{n} c xi P(xi) = c Σ_{i=1}^{n} xi P(xi) = cE(X).

  • 2. Let X and Y be random variables with possible outcomes x1, x2, . . . , xn and y1, y2, . . . , yn respectively.
E(X + Y) = Σ_{i=1}^{n} (xi P(xi) + yi P(yi)) = Σ_{i=1}^{n} xi P(xi) + Σ_{i=1}^{n} yi P(yi) = E(X) + E(Y).

SLIDE 26

Given a random variable with a given mean, we would like to know how variable it is, i.e. how close the variable stays to the mean. We measure this by calculating the average squared deviation from the mean of the variable.

Definition 15 (Variance). Given a (discrete) random variable X : S → {a1, a2, . . . , an}, the variance of X is given by

σ² = E((X − E(X))²).

σ² is the average squared deviation from the mean of the variable.

Theorem 16. Given a binomial distribution with probability p of success, let Xn = the number of successes in n trials. Then E(Xn) = np and σ² = npq.

SLIDE 27

Example 17

  • 1. Consider the following two games:

(a) We flip a coin. If the result is heads we win $7; if tails we win nothing.
(b) We roll a die; we win whatever the die shows in dollars.

In both cases the expected value (mean) is $3.50. However in the first case the variance is 12.25, whereas in the second it is 35/12 ≈ 2.92. The lower variance effectively means that there is less volatility in the game.
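Both variances can be computed from the definition σ² = E((X − E(X))²) (helper names are ours):

```python
from fractions import Fraction

# Mean and variance of a discrete distribution given as (value, prob) pairs.
def mean(dist):
    return sum(x * p for x, p in dist)

def variance(dist):
    mu = mean(dist)
    return sum((x - mu)**2 * p for x, p in dist)

coin = [(7, Fraction(1, 2)), (0, Fraction(1, 2))]     # game (a)
die  = [(k, Fraction(1, 6)) for k in range(1, 7)]     # game (b)
print(mean(coin), variance(coin))   # 7/2  49/4
print(mean(die), variance(die))     # 7/2  35/12
```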

SLIDE 28

Bayes Theorem

Theorem 18 (Bayes Theorem). Given a sample space S which is a union of disjoint sets B1, B2, . . . , Bn, let A ⊆ S be an event such that P(A) ≠ 0. Then for any integer k with 0 < k ≤ n,

P(Bk|A) = P(A|Bk)P(Bk) / (P(A|B1)P(B1) + P(A|B2)P(B2) + . . . + P(A|Bn)P(Bn))

Example 19

  • 1. There are three urns, each containing coloured balls: the first urn contains 3 red, 4 white and 1 blue balls, the second contains 3 red, 1 white and 4 blue balls, and the third 2 red, 3 white and 5 blue balls.

Suppose that an urn is chosen at random and a ball is randomly chosen from that urn. Given that a red ball was selected, what is the probability that urn 1 was chosen?

SLIDE 29

Let A be the set of red balls and Bk the set of balls in urn k. P(Bk) = 1/3 for each k, and
P(A|B1) = 3/8, P(A|B2) = 3/8, P(A|B3) = 2/10.

P(B1|A) = (3/8 · 1/3) / (3/8 · 1/3 + 3/8 · 1/3 + 2/10 · 1/3)
        = (3/8) / (3/8 + 3/8 + 2/10)
        = 15/38 ≈ 0.3947
  • 2. Bayes’ Theorem is often applied to medical tests. A false positive occurs when a test indicates that the patient has the condition when in fact they do not. A false negative occurs when a test indicates that they do not have the condition when in fact they do.

Generally the tested condition is rare. This means that if a random person is tested they are much more likely not to have the disease than to have it.

SLIDE 30

Suppose that a disease affects 1 in every 1000 people. A diagnostic test has been developed with a 4.5% false positive rate and a 1% false negative rate. If a patient tests positive, what is the probability that they actually have the disease?

The sample space S is the set of results of all people screened with the test. Let A be the event of a positive result, let B1 be the event that the person has the disease, and B2 that they don't.

Now, P(Ac|B1) = 0.01 (the false negative rate), so P(A|B1) = 0.99. P(A|B2) = 0.045 (the false positive rate). Also, since 1 in 1000 people have the disease, P(B1) = 0.001 and P(B2) = 0.999.

P(B1|A) = (0.99 × 0.001) / (0.99 × 0.001 + 0.045 × 0.999)
        = 0.00099 / (0.00099 + 0.044955) ≈ 0.02155

So if a random person tests positive there is only a roughly 2% chance that they actually have the disease.
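The diagnostic-test computation transcribes directly (the variable names are ours):

```python
# Bayes' Theorem for the diagnostic test.
p_disease = 0.001                # P(B1): prevalence, 1 in 1000
p_pos_given_disease = 0.99       # P(A|B1) = 1 - false negative rate
p_pos_given_healthy = 0.045      # P(A|B2) = false positive rate

numerator = p_pos_given_disease * p_disease
denominator = numerator + p_pos_given_healthy * (1 - p_disease)
print(numerator / denominator)   # ≈ 0.02155
```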

SLIDE 31

The Birthday Problem

There are n people in a room; what is the probability that at least 2 of them have the same birthday? It is actually much easier to calculate the complementary event: what is the probability that in a group of n people no two have the same birthday?

How many ways can we choose n different days from 365? The first person chooses a birthday, out of 365; the next can now choose out of the remaining 364 days, and so on:

P(365, n) = 365!/(365 − n)!

The number of ways of choosing any set of n birthdays is 365^n.

SLIDE 32

So the chance P(n) of having two birthdays the same is 1 − P(365, n)/365^n:

 n   P(n)       n   P(n)       n   P(n)
 2   0.00274   21   0.44369   41   0.90315
 3   0.00820   22   0.47570   42   0.91403
 4   0.01636   23   0.50730   43   0.92392
 5   0.02714   24   0.53834   44   0.93289
 6   0.04046   25   0.56870   45   0.94098
 7   0.05624   26   0.59824   46   0.94825
 8   0.07434   27   0.62686   47   0.95477
 9   0.09462   28   0.65446   48   0.96060
10   0.11695   29   0.68097   49   0.96578
11   0.14114   30   0.70632   50   0.97037
12   0.16702   31   0.73045   51   0.97443
13   0.19441   32   0.75335   52   0.97800
14   0.22310   33   0.77497   53   0.98114
15   0.25290   34   0.79532   54   0.98388
16   0.28360   35   0.81438   55   0.98626
17   0.31501   36   0.83218   56   0.98833
18   0.34691   37   0.84873   57   0.99012
19   0.37912   38   0.86407   58   0.99166
20   0.41144   39   0.87822   59   0.99299
               40   0.89123   60   0.99412
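The table can be reproduced with a short computation; building the no-match probability incrementally avoids the huge factorials (the function name is ours):

```python
# Birthday problem: P(n) = 1 - 365!/((365 - n)! * 365^n).
def birthday(n):
    no_match = 1.0
    for i in range(n):
        no_match *= (365 - i) / 365   # person i+1 avoids all earlier birthdays
    return 1 - no_match

print(birthday(23))   # ≈ 0.5073
print(birthday(60))   # ≈ 0.9941
```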

SLIDE 33

Some surprising results:

Amongst 23 people the probability of two having the same birthday is over 50%.
For 32 people the probability is over 3/4.
If 60 people are in a room the probability that 2 of them share a birthday is over 99%!

SLIDE 34

Pseudo-random Numbers

In many (computer) applications it is desirable to have a sequence of random numbers. Unfortunately such a sequence is not readily available. In this case we generate a sequence of numbers, xn, which (we hope) are random in the sense that any number within a given range may occur with equal likelihood.

Considerable effort has gone into finding and evaluating the efficacy of such sequences and into finding measures of their randomness.

A common method is the linear-congruential method for pseudo-random number generation. In this case x0 is the initial seed, usually taken from the system timer, and

xn = (a·xn−1 + c) mod k,  n > 0,

where a, c and k are carefully chosen constants. The value xn/k will yield a number between 0 and 1.
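A linear-congruential generator is only a few lines. The constants below are one commonly used choice, assumed here for illustration rather than taken from these slides:

```python
# Linear-congruential generator: x_n = (a*x_{n-1} + c) mod k.
# Constants a, c, k are an assumed, commonly used choice.
def lcg(seed, a=1664525, c=1013904223, k=2**32):
    """Yield an endless stream of pseudo-random values in [0, 1)."""
    x = seed
    while True:
        x = (a * x + c) % k
        yield x / k

gen = lcg(seed=42)
print([round(next(gen), 4) for _ in range(3)])
```

Note that the sequence is completely determined by the seed; re-seeding with the same value reproduces the same stream, which is why the seed is usually taken from the system timer.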

SLIDE 35

Example 20. Microsoft Visual Basic uses the linear-congruential method for pseudo-random number generation in its RND function. In this case a = 1140671485, c = 12820163 and k = 2^24.