SLIDE 1

Binomial Distribution

Binomial Experiment

1 The same experiment is repeated a fixed number of times.
2 There are only two possible outcomes, success and failure:

$P(\text{success}) = p$, $P(\text{failure}) = 1 - p$.

3 The repeated trials are independent, so that the probability of success remains the same for each trial.

The Binomial Distribution is
$$P(\text{exactly } k \text{ successes in } n \text{ trials}) = C(n, k)\, p^k (1 - p)^{n - k}.$$
Examples from before are a 2 showing in a roll of a die, or H in a toss of a coin, but NOT the birthday problem.
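The formula can be checked numerically with a short Python sketch. The function name `binomial_pmf` is our own choice, and `math.comb` (Python 3.8+) plays the role of C(n, k).

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, success probability p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Example: probability of exactly two 2s in five rolls of a fair die.
print(binomial_pmf(2, 5, 1 / 6))  # 1250/7776, about 0.1608
```

Summing `binomial_pmf(k, n, p)` over k = 0, ..., n gives 1, as it should for a probability distribution.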

Dan Barbasch Math 1105 Chapter 8 Week of September 17 1 / 22

SLIDE 2

Binomial Distribution, Examples I

Example (#48 Section 8.4)

A hospital receives 1/5 = 0.2 of its flu vaccine shipments from Company X and the remainder of its shipments from other companies. Each shipment contains a very large number of vaccine vials. For Company X's shipments, 10% of the vials are ineffective. For every other company, 2% of the vials are ineffective. The hospital tests 30 randomly selected vials from a shipment and finds that one vial is ineffective. What is the probability that this shipment came from Company X?

SLIDE 3

Binomial Distribution, Examples II

Answer.

Here D means an ineffective vial, NX means "not from Company X", and 1D/30 means exactly one ineffective vial among the 30 tested. For X, p = 0.1 and 1 − p = 0.9; for NX, p = 0.02 and 1 − p = 0.98.

$P(X) = 0.2$, $P(NX) = 0.8$, $P(D \mid X) = 0.1$, $P(D \mid NX) = 0.02$,
$P(1D/30 \mid X) = C(30, 1)\,(0.1)^1 (0.9)^{29}$,
$P(1D/30 \mid NX) = C(30, 1)\,(0.02)^1 (0.98)^{29}$.

Draw the usual tree diagram for Bayes's theorem and compute:
$$P(X \mid 1D/30) = \frac{P(1D/30 \text{ and } X)}{P(1D/30)} = \frac{C(30,1)\,(0.1)^1(0.9)^{29}\cdot 0.2}{C(30,1)\,(0.1)^1(0.9)^{29}\cdot 0.2 + C(30,1)\,(0.02)^1(0.98)^{29}\cdot 0.8}.$$
The answer is approximately 0.096, close to 0.1.
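The Bayes computation can be sketched in Python as follows. The function and parameter names are hypothetical; the default values are the numbers from the example (prior 0.2, defect rates 0.1 and 0.02, one ineffective vial among 30).

```python
from math import comb


def posterior_company_x(prior_x=0.2, p_x=0.1, p_other=0.02, n=30, k=1):
    """P(shipment from X | k ineffective vials among n tested), by Bayes's theorem."""
    like_x = comb(n, k) * p_x**k * (1 - p_x) ** (n - k)              # P(kD/n | X)
    like_other = comb(n, k) * p_other**k * (1 - p_other) ** (n - k)  # P(kD/n | NX)
    joint_x = like_x * prior_x               # P(kD/n and X), one branch of the tree
    joint_other = like_other * (1 - prior_x)  # P(kD/n and NX), the other branch
    return joint_x / (joint_x + joint_other)


print(round(posterior_company_x(), 3))  # 0.096
```

Note that the factor C(30, 1) cancels between numerator and denominator, so it does not affect the answer.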

SLIDE 4

Pascal’s Triangle I

The triangular array of numbers shown below is called Pascal's triangle in honor of the French mathematician Blaise Pascal (1623-1662), who was one of the first to use it extensively. The triangle was known long before Pascal's time and appears in Chinese and Islamic manuscripts from the eleventh century.

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1

The array provides a quick way to find binomial probabilities. The nth row of the triangle, where n = 0, 1, 2, 3, . . . , gives the coefficients C(n, r) for r = 0, 1, 2, 3, . . . , n. For example, for n = 4, 1 = C(4, 0), 4 = C(4, 1), 6 = C(4, 2), and so on. Each number in the

SLIDE 5

Pascal’s Triangle II

triangle is the sum of the two numbers directly above it. For example, in the row for n = 4, the first 1 equals the 1 above it (the only number above it), 4 = 1 + 3, 6 = 3 + 3, and so on. The general formula is
$$C(n, r) = C(n - 1, r - 1) + C(n - 1, r).$$
To see why, consider choosing r items out of {1, . . . , n}. Either item n is left out, and all r items are chosen from {1, . . . , n − 1}, in C(n − 1, r) ways; or item n is chosen, and the remaining r − 1 items are chosen from {1, . . . , n − 1}, in C(n − 1, r − 1) ways.
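The rule C(n, r) = C(n − 1, r − 1) + C(n − 1, r) is all that is needed to build the triangle row by row. A minimal Python sketch (the function name `pascal_rows` is our own):

```python
from math import comb


def pascal_rows(n_max):
    """Rows 0..n_max of Pascal's triangle, built with C(n, r) = C(n-1, r-1) + C(n-1, r)."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        # Each interior entry is the sum of the two entries directly above it;
        # the first and last entries of every row are 1.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows


for row in pascal_rows(5):
    print(row)

# Cross-check against the closed form C(n, r) from the standard library.
print(all(row == [comb(n, r) for r in range(n + 1)]
          for n, row in enumerate(pascal_rows(10))))  # True
```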

SLIDE 6

Example, Sports I

In many sports championships, such as the World Series in baseball and the Stanley Cup final series in hockey, the winner is the first team to win four games. For this exercise, assume that each game is independent of the others, with a constant probability p that one specified team (say, the National League team) wins.

  • a. Find the probability that the series lasts for four, five, six, and seven games when p = 0.5.
  • b. Morrison and Schmittlein have found that the Stanley Cup finals can be described by letting p = 0.73 be the probability that the better team wins each game. Find the probability that the series lasts for four, five, six, and seven games. Source: Chance.

SLIDE 7

Example, Sports II

Answer.

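$P(\text{ends in exactly 4}) = P(AAAA \text{ or } BBBB) = 2 \cdot (0.5)^4$,
$P(\text{ends in exactly 5}) = C(4, 1)(0.5)^5 + C(4, 3)(0.5)^5$,
$P(\text{ends in exactly 6}) = C(5, 2)(0.5)^6 + C(5, 3)(0.5)^6$,
$P(\text{ends in exactly 7}) = C(6, 3)(0.5)^7 + C(6, 3)(0.5)^7$.

The two terms in each line correspond to the two teams winning the final game; for example, a series ends in exactly 5 games when the winner takes game 5 and exactly 3 of the first 4 games. From the triangle,
C(4, 1) = C(3, 1) + C(3, 0) = 3 + 1 = 4,
C(4, 3) = C(3, 3) + C(3, 2) = 1 + 3 = 4,
C(5, 2) = C(4, 2) + C(4, 1) = 6 + 4 = 10,
C(5, 3) = C(4, 3) + C(4, 2) = 4 + 6 = 10,
C(6, 3) = C(5, 3) + C(5, 2) = 10 + 10 = 20.
So the series lasts 4, 5, 6, or 7 games with probabilities 2/16 = 0.125, 8/32 = 0.25, 20/64 = 0.3125, and 40/128 = 0.3125, which sum to 1.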
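The same counting works for any p, which is what part b needs. A sketch in Python, assuming (as the exercise does) independent games with a constant probability p that team A wins; the function name and the `wins_needed` parameter are our own:

```python
from math import comb


def series_length_probs(p, wins_needed=4):
    """P(series ends after exactly g games) for g = w, ..., 2w - 1, where w = wins_needed.

    The series ends at game g when the eventual winner takes game g and
    exactly w - 1 of the first g - 1 games.
    """
    q = 1 - p
    w = wins_needed
    probs = {}
    for g in range(w, 2 * w):
        ways = comb(g - 1, w - 1)  # positions of the winner's first w - 1 wins
        # First term: team A wins the series; second term: team B wins it.
        probs[g] = ways * (p**w * q ** (g - w) + q**w * p ** (g - w))
    return probs


print(series_length_probs(0.5))   # {4: 0.125, 5: 0.25, 6: 0.3125, 7: 0.3125}
print(series_length_probs(0.73))  # part b
```

For part b the function gives approximately 0.289, 0.322, 0.235, and 0.153 for 4, 5, 6, and 7 games; these also sum to 1.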

SLIDE 8

Fermat and Pascal I

F and P are playing a game. They toss a coin with p = P(H) = 0.3; F scores a point on H and P scores a point on T. F leads 8 to 7, and the game ends as soon as one of them reaches 20 points. What is the probability that the game ends after exactly (1) 12, (2) 20 more tosses?

SLIDE 9

Fermat and Pascal II

Answer.

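For (1), the answer is $p^{12}$: F needs 12 more points and P needs 13, so only F can win in 12 tosses, and only by getting H every time. For (2), the game ends on toss 20 either when F wins it (toss 20 is H and the first 19 tosses contain 11 H and 8 T) or when P wins it (toss 20 is T and the first 19 tosses contain 12 T and 7 H). The answer is the sum of the probabilities that F wins and that P wins:
$$C(19, 11)\,p^{12}(1 - p)^{8} + C(19, 12)\,p^{7}(1 - p)^{13}.$$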
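A Python sketch that computes the probability that the game ends on exactly the m-th further toss, for any m, by counting the arrangements in which each player wins the last toss. The function name and the parameters `need_f`, `need_p` (points still needed: 12 for F, 13 for P) are our own.

```python
from math import comb


def end_after_exactly(m, p, need_f=12, need_p=13):
    """P(game ends on exactly the m-th further toss).

    F scores on H (probability p), P scores on T (probability 1 - p).
    A player wins on toss m when they win toss m and have exactly
    (points needed - 1) of their points among the first m - 1 tosses,
    while the other player is still short of their target.
    """
    q = 1 - p
    total = 0.0
    if need_f <= m < need_f + need_p:  # F can be the one who finishes at toss m
        total += comb(m - 1, need_f - 1) * p**need_f * q ** (m - need_f)
    if need_p <= m < need_f + need_p:  # P can be the one who finishes at toss m
        total += comb(m - 1, need_p - 1) * q**need_p * p ** (m - need_p)
    return total


print(end_after_exactly(12, 0.3))  # equals 0.3 ** 12
print(end_after_exactly(20, 0.3))
```

As a sanity check, the game must end somewhere between 12 and 24 further tosses, so `sum(end_after_exactly(m, 0.3) for m in range(12, 25))` is 1 up to rounding.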

SLIDE 10

Random Variables I

Random Variable: A random variable X is a function that assigns a real number to each outcome of an experiment.
Probability Distribution: The probability distribution of a random variable is $\{P(X = k) = p_k\}$ with $0 \le p_k \le 1$ and $\sum_{k=0}^{n} p_k = 1$. This definition is for the case when X takes only finitely many values.
Expected Value: $E(X) = \sum_k k\,P(X = k)$.

Example

Toss a coin. Let X = 1 if H, and X = 0 if T. The coin satisfies P(H) = p and P(T) = 1 − p. The probability distribution is P(X = 1) = p and P(X = 0) = 1 − p. Then EX = 1 · p + 0 · (1 − p) = p. If instead X = 1 if H and X = −1 if T, then EX = 1 · p + (−1) · (1 − p) = 2p − 1.
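The definition of E(X) for a finite distribution translates directly into code. A minimal Python sketch using exact fractions; the value p = 3/10 is a hypothetical choice for illustration only, and the function name `expected_value` is our own.

```python
from fractions import Fraction


def expected_value(dist):
    """E(X) = sum of k * P(X = k) over the finitely many values k of X."""
    assert sum(dist.values()) == 1, "probabilities must sum to 1"
    return sum(k * pk for k, pk in dist.items())


p = Fraction(3, 10)  # hypothetical value of P(H), chosen for illustration

print(expected_value({1: p, 0: 1 - p}))   # 3/10, i.e. p
print(expected_value({1: p, -1: 1 - p}))  # -2/5, i.e. 2p - 1
```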

SLIDE 11

Random Variables II

Example

Toss two fair dice. Let X be the sum of the faces. Then
P(X = 2) = 1/36, P(X = 3) = 2/36, P(X = 4) = 3/36, P(X = 5) = 4/36, P(X = 6) = 5/36, P(X = 7) = 6/36,
P(X = 8) = 5/36, P(X = 9) = 4/36, P(X = 10) = 3/36, P(X = 11) = 2/36, P(X = 12) = 1/36,
and
$$EX = \frac{2 \cdot 1 + 3 \cdot 2 + 4 \cdot 3 + 5 \cdot 4 + 6 \cdot 5 + 7 \cdot 6 + 8 \cdot 5 + 9 \cdot 4 + 10 \cdot 3 + 11 \cdot 2 + 12 \cdot 1}{36} = \frac{252}{36} = 7.$$
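The distribution of the sum can be obtained by listing all 36 equally likely outcomes. A short Python sketch; note that `Fraction` prints reduced fractions, so 2/36 appears as 1/18.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, prob in dist.items():
    print(f"P(X = {s}) = {prob}")

ev = sum(s * prob for s, prob in dist.items())
print("EX =", ev)  # EX = 7
```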

SLIDE 12

Motivation

Suppose you have a coin that comes up H 30% of the time and T 70% of the time. You get paid $2 if H, and you pay out $1 if T. What do you expect to have after 100 tosses? Intuition says it should be the average: in 100 tosses you expect about 30 H and 70 T, so 2 · 30 − 1 · 70 = −10. For one toss you'd expect 2 · 0.3 + (−1) · 0.7 = −0.1. Repeat this 100 times, and you expect to have lost $10. The mathematics behind this is the Expected Value. We interpret P(H) = 0.3 and P(T) = 0.7. The expected value is EX = 2 · P(H) + (−1) · P(T). For n tosses you expect n · EX.

SLIDE 13

Expected Value of a Sum of Independent Variables I

Definition

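Two random variables X1, X2 are called independent if $P(X_1 = a, X_2 = b) = P(X_1 = a) \cdot P(X_2 = b)$ for all values a, b. More generally, X1, . . . , Xn are called independent if
$$P(X_{i_1} = a_1, \ldots, X_{i_k} = a_k) = P(X_{i_1} = a_1) \cdots P(X_{i_k} = a_k)$$
for any choice of a subset $X_{i_1}, \ldots, X_{i_k}$ of the variables and any values $a_1, \ldots, a_k$.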

Theorem

Let X1, . . . , Xn be independent random variables. Then $E(X_1 + \cdots + X_n) = EX_1 + \cdots + EX_n$.
We illustrate the proof for the case n = 2, E(X1 + X2) = EX1 + EX2, when each variable takes two values. This is the warmup. Suppose
$P(X_1 = a_1) = p$, $P(X_1 = a_2) = 1 - p$, $P(X_2 = b_1) = q$, $P(X_2 = b_2) = 1 - q$.

SLIDE 14

Expected Value of a Sum of Independent Variables II

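$$EX_1 = a_1 p + a_2 (1 - p), \qquad EX_2 = b_1 q + b_2 (1 - q).$$
By independence, the four outcomes (a1, b1), (a1, b2), (a2, b1), (a2, b2) have probabilities pq, p(1 − q), (1 − p)q, (1 − p)(1 − q), so
$$E(X_1 + X_2) = (a_1 + b_1)pq + (a_1 + b_2)p(1 - q) + (a_2 + b_1)(1 - p)q + (a_2 + b_2)(1 - p)(1 - q).$$
Gather the terms according to the a's and b's, and do the algebra:
$$\begin{aligned}
a_1\big(pq + p(1 - q)\big) &= a_1 p,\\
a_2\big((1 - p)q + (1 - p)(1 - q)\big) &= a_2 (1 - p),\\
b_1\big((1 - p)q + pq\big) &= b_1 q,\\
b_2\big(p(1 - q) + (1 - p)(1 - q)\big) &= b_2 (1 - q).
\end{aligned}$$
Adding the four lines gives $E(X_1 + X_2) = EX_1 + EX_2$.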
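The two-variable computation can be checked numerically. In this Python sketch the values a1, a2, b1, b2 and the probabilities p, q are hypothetical numbers chosen only for illustration; the joint probabilities are formed as products, exactly as independence allows.

```python
from fractions import Fraction
from itertools import product

# Hypothetical values and probabilities, chosen only for illustration.
a1, a2, p = 5, -2, Fraction(1, 3)
b1, b2, q = 4, 10, Fraction(3, 4)

x1 = {a1: p, a2: 1 - p}  # distribution of X1
x2 = {b1: q, b2: 1 - q}  # distribution of X2


def expected(dist):
    return sum(v * pr for v, pr in dist.items())


# Independence: P(X1 = a, X2 = b) = P(X1 = a) * P(X2 = b).
e_sum = sum((a + b) * x1[a] * x2[b] for a, b in product(x1, x2))

print(e_sum, expected(x1) + expected(x2))  # 35/6 35/6
```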

SLIDE 15

Expected Value of a Sum of Independent Variables III

Example (Binomial distribution)

For n independent identical trials, each with two possible outcomes S and F with probabilities p and 1 − p, let X be the number of S. The distribution is $P(X = k) = C(n, k)\,p^k (1 - p)^{n - k}$. The expected value is
$$EX = \sum_{k=0}^{n} k\, C(n, k)\, p^k (1 - p)^{n - k} = np.$$
For a single trial, EX = p · 1 + 0 · (1 − p) = p. The general case can be computed directly using algebra.
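Before doing the algebra, one can confirm the identity for particular n with exact arithmetic. A Python sketch; p = 3/10 is a hypothetical value and `binomial_mean_direct` is our own name.

```python
from fractions import Fraction
from math import comb


def binomial_mean_direct(n, p):
    """Sum_{k=0}^{n} k * C(n, k) * p^k * (1 - p)^(n - k), computed exactly."""
    return sum(k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


p = Fraction(3, 10)  # hypothetical success probability
for n in (1, 5, 30):
    print(n, binomial_mean_direct(n, p), n * p)  # the last two columns agree
```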

SLIDE 16

Binomial Distribution

We apply this to the binomial distribution: X1, . . . , Xn are i.i.d. (independent identically distributed) random variables with probability distribution P(X = 1) = p, P(X = 0) = 1 − p. Then EX = 1 · p + 0 · (1 − p) = p. So
$$E(X_1 + \cdots + X_n) = \underbrace{p + \cdots + p}_{n} = np.$$
The case of two dice is similar; X = X1 + X2. Then EX1 = EX2 = 1·1/6 + 2·1/6 + 3·1/6 + 4·1/6 + 5·1/6 + 6·1/6 = 21/6 = 7/2. So E(X1 + X2) = 7/2 + 7/2 = 7.

SLIDE 17

Statistics, Measures of Central Tendency I

We are considering a random variable X with a probability distribution that depends on some parameters, and we want to get an idea of what these parameters are. We perform an experiment n times and record the outcomes. This means we have X1, . . . , Xn i.i.d. random variables, each with the same probability distribution as X. We want to use the outcomes to infer the parameters.

Mean: The outcomes are x1, . . . , xn. The Sample Mean is $\bar{x} := \frac{x_1 + \cdots + x_n}{n}$, also sometimes called the average. The expected value of X, EX, is also called the mean of X, is often denoted by µ, and is sometimes called the population mean.
Median: The number such that half the values are below it and half are above. If the sample has even size, take the average of the two middle terms.
Mode: The value that occurs most frequently. There could be several modes, or no mode.
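Python's standard `statistics` module computes all three measures. The sample below is hypothetical, chosen so that it has even size and a single mode.

```python
import statistics

sample = [2, 3, 3, 5, 7, 10]  # hypothetical sample, for illustration only

print(statistics.mean(sample))       # (2+3+3+5+7+10)/6 = 5
print(statistics.median(sample))     # even size: average of 3 and 5 = 4.0
print(statistics.multimode(sample))  # [3]
```

`statistics.multimode` (Python 3.8+) returns a list, so several modes are reported together; when no value repeats it returns every value, which is the situation the slide calls "no mode".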

SLIDE 18

Statistics, Measures of Central Tendency II

Example

You have a coin for which you know that P(H) = p and P(T) = 1 − p. You would like to estimate p. You toss it n times and count the number of heads. The sample mean should be an estimate of p: with Xi = 1 if toss i is H and Xi = 0 otherwise, EXi = p and E(X1 + · · · + Xn) = np, so
$$E\left(\frac{X_1 + \cdots + X_n}{n}\right) = p.$$
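A simulation sketch in Python: toss a simulated coin n times and use the fraction of heads as the estimate. The true value 0.3 and the seed are hypothetical choices; the seed only makes the run reproducible.

```python
import random


def estimate_p(p_true, n, seed=0):
    """Toss a simulated coin n times and return the sample mean of the 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(1 if rng.random() < p_true else 0 for _ in range(n))
    return heads / n


for n in (10, 100, 10_000):
    print(n, estimate_p(0.3, n))
```

The estimates for small n can be far from 0.3; as n grows they settle near the true value, in line with E of the sample mean being p.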

SLIDE 19

Descriptive Statistics I

Frequency Distribution: Divide the range of the data into a number of equal disjoint intervals. For each interval, count the number of elements of the sample that fall in it.
Histogram: See the next slide.
Grouped Data Mean: Essentially, calculate the mean of the frequency distribution. Intervals are used rather than single values, and it is assumed that all the values in an interval are located at its midpoint. The letter $x_M$ is used to represent the midpoints and $f$ represents the frequencies:
$$\bar{x} = \frac{\sum x_M f}{n},$$
where $n = \sum f$ is the number of data values.
Frequency Polygon: Connect the midpoints of the tops of the bars for each interval.

SLIDE 20

Histogram

A histogram is a graphical representation of the distribution of numerical data. It is a kind of bar graph. To construct a histogram, the first step is to "bin" the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.

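| Bin | Count |
|---|---|
| −3.5 to −2.51 | 9 |
| −2.5 to −1.51 | 32 |
| −1.5 to −0.51 | 109 |
| −0.5 to 0.49 | 180 |
| 0.5 to 1.49 | 132 |
| 1.5 to 2.49 | 34 |
| 2.5 to 3.49 | 4 |

Mean, using the bin midpoints −3, −2, . . . , 3 and n = 500:
$$\bar{x} = \frac{(-3)\cdot 9 + (-2)\cdot 32 + (-1)\cdot 109 + 0\cdot 180 + 1\cdot 132 + 2\cdot 34 + 3\cdot 4}{500} = \frac{12}{500} = 0.024.$$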
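The grouped-data mean is a weighted average of the midpoints x_M with weights f. A Python sketch applied to the histogram counts above; the midpoints −3, ..., 3 are the ones used in the slide's mean, and `grouped_mean` is our own name.

```python
from fractions import Fraction

# (midpoint x_M, frequency f) for the seven bins of the histogram table
grouped = [(-3, 9), (-2, 32), (-1, 109), (0, 180), (1, 132), (2, 34), (3, 4)]


def grouped_mean(data):
    """Grouped-data mean: sum of x_M * f divided by n = sum of f."""
    n = sum(f for _, f in data)
    return Fraction(sum(x * f for x, f in data), n)


m = grouped_mean(grouped)
print(m, float(m))  # 3/125 0.024
```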

SLIDE 21

Example

The table on the next page gives the number of days in June and July of recent years in which the temperature reached 90 degrees or higher in New York's Central Park. Source: The New York Times and Accuweather.com.

  • a. Prepare a frequency distribution with a column for intervals and frequencies. Use seven intervals, starting with [0, 4].
  • b. Sketch a histogram and a frequency polygon, using the intervals in part a.
  • c. Find the mean for the original data.
  • d. Find the mean using the grouped data from part a.
  • e. Explain why your answers to parts c and d are different.
  • f. Find the median and the mode for the original data.

SLIDE 22

Temperature Data

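| Year | Days | Year | Days | Year | Days |
|---|---|---|---|---|---|
| 1972 | 11 | 1985 | 4 | 1998 | 5 |
| 1973 | 8 | 1986 | 8 | 1999 | 24 |
| 1974 | 11 | 1987 | 14 | 2000 | 3 |
| 1975 | 3 | 1988 | 21 | 2001 | 4 |
| 1976 | 8 | 1989 | 10 | 2002 | 13 |
| 1977 | 11 | 1990 | 6 | 2003 | 11 |
| 1978 | 5 | 1991 | 21 | 2004 | 1 |
| 1979 | 7 | 1992 | 4 | 2005 | 12 |
| 1980 | 12 | 1993 | 25 | 2006 | 5 |
| 1981 | 12 | 1994 | 16 | 2007 | 4 |
| 1982 | 11 | 1995 | 14 | 2008 | 10 |
| 1983 | 20 | 1996 |  | 2009 |  |
| 1984 | 7 | 1997 | 10 | 2010 | 20 |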
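The steps of the exercise (frequency distribution, raw mean, grouped mean, median, mode) can be sketched in Python. Caution: the values for 1996 and 2009 are missing from this copy of the table, so the sketch works only with the 37 recorded years and its numbers are not the answers to the exercise; the intervals [0, 4], [5, 9], ..., [30, 34] are our reading of "seven intervals, starting with [0, 4]".

```python
import statistics

# Days at or above 90 degrees, from the table above; None marks the two years
# (1996, 2009) whose values are missing from this copy, so they are skipped.
days = {
    1972: 11, 1973: 8, 1974: 11, 1975: 3, 1976: 8, 1977: 11, 1978: 5,
    1979: 7, 1980: 12, 1981: 12, 1982: 11, 1983: 20, 1984: 7,
    1985: 4, 1986: 8, 1987: 14, 1988: 21, 1989: 10, 1990: 6, 1991: 21,
    1992: 4, 1993: 25, 1994: 16, 1995: 14, 1996: None, 1997: 10,
    1998: 5, 1999: 24, 2000: 3, 2001: 4, 2002: 13, 2003: 11, 2004: 1,
    2005: 12, 2006: 5, 2007: 4, 2008: 10, 2009: None, 2010: 20,
}
values = [d for d in days.values() if d is not None]

# Seven intervals of width 5 starting with [0, 4]: [0, 4], [5, 9], ..., [30, 34].
intervals = [(lo, lo + 4) for lo in range(0, 35, 5)]
freq = [sum(lo <= v <= hi for v in values) for lo, hi in intervals]

for (lo, hi), f in zip(intervals, freq):
    print(f"[{lo}, {hi}]: {f}")

raw_mean = statistics.mean(values)
# Grouped mean: every value in an interval is treated as its midpoint.
grouped_mean = sum((lo + hi) / 2 * f for (lo, hi), f in zip(intervals, freq)) / len(values)
print(raw_mean, grouped_mean)
print(statistics.median(values), statistics.multimode(values))
```

On the recorded values the raw mean (391/37) and the grouped mean (399/37) differ because the grouped version replaces each value by the midpoint of its interval, which is the point of part e.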
