SLIDE 1

Independence and Conditional Probability

August 5, 2019

SLIDE 2

Midterm

• The midterm is next week, on Tuesday, August 13.
• Approximately 50 multiple choice questions.
• You do not need a scantron.
• Questions will be mostly conceptual.
• You may bring any basic or graphing calculator.
• I will bring extra scratch paper.

SLIDE 3

Extra Credit Opportunity

Write an exam question that would be appropriate for your midterm. The midterm will cover material from Chapters 1, 2, and 3. Your exam question must come from material covered in class, your homeworks, or your labs. Questions may be either multiple choice or short answer. To receive any credit, you must write an original question and provide both the question and the correct answer. These can be submitted on iLearn (Assignments tab). It opens today at 9:30am and will close on Thursday at 11:59pm.

SLIDE 4

Independence

Independence of random processes is similar to independence of variables and observations. We say that two random processes are independent if knowing the outcome of one provides no useful information about the outcome of the other.

SLIDE 5

Independence

For example, consider our discussion on rolling 2 six-sided dice. The roll of the first die has no effect on the roll of the second die. Thus our two dice rolls are independent of one another.

SLIDE 6

Independence

We’ve already calculated the probability of the two rolls both being a 1:

• 1/6 of the time, the first roll is a 1.
• A further 1/6 of those times, the second is also a 1.

So we decided that the probability was (1/6) × (1/6) = 1/36. Multiplying these probabilities together works because the two events are independent.

SLIDE 7

Multiplication Rule for Independent Processes

Let A and B be events from two different and independent processes. Then the probability that both A and B occur can be calculated as the product of their separate probabilities:

P(A and B) = P(A) × P(B)

Similarly, if there are k events A1, . . . , Ak from k independent processes, then the probability they all occur is

P(A1) × P(A2) × · · · × P(Ak)
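As a quick sketch of this rule in Python (the dice values are the running example from the slides; math.prod is the standard-library product function):

```python
from math import prod

# Independent events: two dice each showing a 1.
p_events = [1/6, 1/6]

# Multiplication rule for independent processes:
# P(A1 and ... and Ak) = P(A1) * ... * P(Ak)
p_all = prod(p_events)
print(p_all)  # 0.0277... = 1/36
```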

SLIDE 8

Example

About 9% of people are left-handed. Suppose 2 people are selected at random from the U.S. population. Because the sample size of 2 is very small relative to the population, it is reasonable to assume these two people are independent.

1. What is the probability that both are left-handed?
2. What is the probability that both are right-handed?

SLIDE 9

Example: Both Left-Handed

What is the probability that both are left-handed? Let L1 be the event that the first person is left-handed and L2 the event that the second person is left-handed. We are told that 9% of people are left-handed, so P(L1) = P(L2) = 0.09.

SLIDE 10

Example: Both Left-Handed

What is the probability that both are left-handed? We are assuming that these people are independent, so we can use the multiplication rule:

P(L1 and L2) = P(L1) × P(L2) = 0.09 × 0.09 = 0.0081

or 0.81% (this is highly unlikely!).

SLIDE 11

Example: Both Right-Handed

What is the probability that both are right-handed? First, assume that everyone is either right- or left-handed. Then L1c is the event that the first person is right-handed and L2c is the event that the second person is right-handed. From the previous slide, we decided that P(L1) = P(L2) = 0.09, so

P(L1c) = 1 − P(L1) = 1 − 0.09 = 0.91 and P(L2c) = 0.91

SLIDE 12

Example: Both Right-Handed

What is the probability that both are right-handed? We are still assuming that these people are independent, so we can again use the multiplication rule:

P(L1c and L2c) = P(L1c) × P(L2c) = 0.91 × 0.91 = 0.8281

or 82.81%.
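Both handedness calculations fit in a few lines of Python; this is a sketch with variable names of my choosing:

```python
# Handedness example: assume everyone is either left- or right-handed.
p_left = 0.09
p_right = 1 - p_left  # complement rule: 0.91

# Multiplication rule for two independent people:
p_both_left = p_left * p_left      # 0.0081, or 0.81%
p_both_right = p_right * p_right   # 0.8281, or 82.81%
print(p_both_left, p_both_right)
```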

SLIDE 13

Disjoint Events - Independent?

If two events are disjoint, are they independent?

SLIDE 14

Disjoint Events - Independent?

If two events are disjoint, are they independent? Recall that independent events have no relationship with one another. This means that if we know something about event A, we don’t get any information about event B. For disjoint events, if event A occurs, we can be totally certain that event B did not occur. Therefore they are dependent.

SLIDE 15

Example

Consider two disjoint events for rolling a six-sided die. Let A = {1} be the event that I roll a 1 and B = {2} the event that I roll a 2. If I know that A occurred, then I can be 100% sure that B did not occur.

If I know that A did not occur, then I know that the roll must be a 2, 3, 4, 5, or 6.

Now there are five possible options instead of six! We’ve narrowed down our options, so knowing that I did not roll a 1 has given us some useful information.

Therefore A and B can’t be independent.

SLIDE 16

Conditional Probability

We can get far more information out of the relationships between multiple variables than we can from a single variable. For example, recall our case study on the malaria vaccine. We can look at P(infection), but that doesn’t tell us anything about the efficacy of the vaccine. Instead, we want to look at the probability that a person develops an infection if they were vaccinated. We compare this to the probability that a person develops an infection if they were not vaccinated.

SLIDE 17

Contingency Table Probabilities

Let’s consider a data set on a machine learning classifier. The classifier is designed to take images and determine whether each one is about fashion. The classifier groups 1822 photos into either "fashion" or "not fashion". Separately, these photos are grouped into "fashion" and "not fashion" by a group of people.

We take these groupings as the truth that the classifier is trying to get at.

SLIDE 18

Contingency Table Probabilities

We can take these groupings and build them into a contingency table.

                      truth
classifier    Fashion    Not   Total
Fashion           197     22     219
Not               112   1491    1603
Total             309   1513    1822

SLIDE 19

Contingency Table Probabilities

We think about this a lot with classification problems!

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

When we build our classifier, we want to know the rate at which it correctly and incorrectly identifies fashion and not fashion. This will give us an idea of how successful our classifier is.

Is it a good classifier? Should we try a different machine learning algorithm?

SLIDE 20

Example: Contingency Table Probabilities

1. If the photo is actually about fashion, what is the probability that the classifier correctly identified it as being about fashion?
2. If the classifier predicted that a photo was not about fashion, what is the probability that it was incorrect?

SLIDE 21

Example: Contingency Table Probabilities

If the photo is actually about fashion, what is the probability that the classifier correctly identified it as being about fashion?

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

We know that the photo is actually about fashion, so we focus our attention on the column where truth is fashion. Then within this column, we look for the number of times the classifier pred fashion out of the total number of fashion photos.

SLIDE 22

Example: Contingency Table Probabilities

If the photo is actually about fashion, what is the probability that the classifier correctly identified it as being about fashion?

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

P(classifier is pred fashion given truth is fashion) = 197/309

or 0.638, a reasonable correct identification rate for fashion.
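A sketch of the same lookup in Python (the nested-dict layout and names are my own, not from the slides):

```python
# counts[classifier prediction][truth], from the contingency table above.
counts = {
    "pred fashion": {"fashion": 197, "not fashion": 22},
    "pred not":     {"fashion": 112, "not fashion": 1491},
}

# Condition on truth = fashion: restrict to that column, then take the
# fraction of those photos the classifier predicted as fashion.
n_truth_fashion = sum(row["fashion"] for row in counts.values())  # 309
p = counts["pred fashion"]["fashion"] / n_truth_fashion           # 197/309
print(round(p, 3))  # ≈ 0.638
```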

SLIDE 23

Example: Contingency Table Probabilities

If the classifier predicted that a photo was not about fashion, what is the probability that it was incorrect?

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

We know that the classifier is pred not fashion, so we focus our attention on this row. We want to know the probability that it was incorrect, i.e. that the truth is fashion.

SLIDE 24

Example: Contingency Table Probabilities

If the classifier predicted that a photo was not about fashion, what is the probability that it was incorrect?

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

P(truth is fashion given classifier is pred not) = 112/1603

or 0.070, a low misidentification rate for fashion photos.

SLIDE 25

Marginal and Joint Probabilities

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

We’ve now used our contingency table to think about two types of probabilities:

• The probability for a single event (from the row and column of totals).
• The probability for multiple events together (from the numbers in the middle).

SLIDE 26

Marginal Probabilities

A marginal probability is a probability based on a single variable. Think of the margins as the edges of a contingency table where we have the information for each variable individually.

SLIDE 27

Marginal Probabilities

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

A probability based solely on our classifier is a marginal probability. It is based on a single variable without regard to any other variables.

P(classifier is pred fashion) = 219/1822

SLIDE 28

Joint Probabilities

A joint probability is a probability for two or more variables together. Think of this as a probability that two or more variables occur jointly (together).

SLIDE 29

Joint Probabilities

                        truth
classifier       fashion   not fashion   Total
pred fashion         197            22     219
pred not             112          1491    1603
Total                309          1513    1822

The probability that our classifier is pred fashion and the truth is fashion is a joint probability. It is based on two variables together.

P(classifier is pred fashion and truth is fashion) = 197/1822

SLIDE 30

Table Proportions

We can examine marginal and joint probabilities using table proportions. Table proportions are computed by dividing each count in a contingency table by the table’s grand total.

                        truth
classifier       fashion   not fashion   Total
pred fashion       0.108         0.012   0.120
pred not           0.062         0.818   0.880
Total              0.170         0.830   1.000
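A short Python sketch of the conversion from counts to table proportions (the dict layout is mine):

```python
# Counts from the contingency table on the earlier slides.
counts = {
    "pred fashion": {"fashion": 197, "not fashion": 22},
    "pred not":     {"fashion": 112, "not fashion": 1491},
}

grand_total = sum(n for row in counts.values() for n in row.values())  # 1822

# Divide every cell by the grand total to get table proportions.
proportions = {
    clf: {truth: round(n / grand_total, 3) for truth, n in row.items()}
    for clf, row in counts.items()
}
print(proportions)  # e.g. proportions["pred fashion"]["fashion"] == 0.108
```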

SLIDE 31

Joint Probability Distributions

A joint probability distribution is just a probability distribution for multiple variables together.

Joint Outcome                                          Probability
classifier is pred fashion and truth is fashion              0.108
classifier is pred fashion and truth is not fashion          0.012
classifier is pred not and truth is fashion                  0.062
classifier is pred not and truth is not fashion              0.818
Total                                                        1.000

Note: A marginal probability distribution is the type of probability distribution we introduced last week!

SLIDE 32

Marginal and Joint Probabilities

We can compute marginal probabilities using joint probabilities.

Joint Outcome                                          Probability
classifier is pred fashion and truth is fashion              0.108
classifier is pred fashion and truth is not fashion          0.012
classifier is pred not and truth is fashion                  0.062
classifier is pred not and truth is not fashion              0.818
Total                                                        1.000

For example,

P(truth is fashion)
  = P(classifier is pred fashion and truth is fashion)
      + P(classifier is pred not and truth is fashion)
  = 0.108 + 0.062
  = 0.170
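The same marginalization in a Python sketch (the tuple-keyed dict is my own representation):

```python
# Joint distribution keyed by (classifier, truth).
joint = {
    ("pred fashion", "fashion"): 0.108,
    ("pred fashion", "not fashion"): 0.012,
    ("pred not", "fashion"): 0.062,
    ("pred not", "not fashion"): 0.818,
}

# Marginal P(truth is fashion): sum joint probabilities over classifier outcomes.
p_truth_fashion = sum(p for (_, truth), p in joint.items() if truth == "fashion")
print(round(p_truth_fashion, 3))  # 0.17
```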

SLIDE 33

Marginal and Joint Probabilities

This makes sense based on our table proportions!

                        truth
classifier       fashion   not fashion   Total
pred fashion       0.108         0.012   0.120
pred not           0.062         0.818   0.880
Total              0.170         0.830   1.000

All of these numbers are directly proportional to our original contingency table. The row and column of totals represent the marginal probabilities. These totals are the actual sums of their respective rows/columns.

SLIDE 34

Defining Conditional Probability

The classifier predicts whether a photo is about fashion, but it is not perfect. We’d like to know how we can use these predictions to improve our understanding of the second variable, the truth.

We might want to know, for example, the probability that the truth is fashion given that the classifier predicts fashion.

SLIDE 35

Defining Conditional Probability

The probability that a random photo from the data set is actually about fashion is 0.17. Suppose we know that classifier is pred fashion. Now we can get a better estimate of the probability that the truth is fashion. We do this by restricting our attention to the 219 cases where the classifier is pred fashion. Then we look at the fraction of these photos where the truth is fashion (197 cases).

P(truth is fashion given classifier is pred fashion) = 197/219

SLIDE 36

Defining Conditional Probability

When we are given some useful information that allows us to restrict our attention, we call these probabilities conditional probabilities. We can say that we condition based on some given information, or that we computed the probability under the condition that the classifier is pred fashion.

SLIDE 37

Defining Conditional Probability

There are two important aspects to a conditional probability:

1. The outcome of interest is whatever we want to know about.
2. The condition is information we know to be true, a known outcome or event.

SLIDE 38

Conditional Probability Notation

We separate our outcome of interest from our condition in our probability notation with a vertical bar:

P(truth is fashion given classifier is pred fashion)

becomes

P(truth is fashion | classifier is pred fashion) = 197/219

We read the vertical bar as the word given.

SLIDE 39

Defining Conditional Probability

Earlier, we computed P(truth is fashion given classifier is pred fashion) = 0.900 by restricting our attention to the data where classifier is pred fashion. From this row where classifier is pred fashion, we took the number of cases where truth is fashion and divided by the row total to get our answer.

SLIDE 40

Defining Conditional Probability

However, we don’t always have access to the count data. Instead we are given only the probabilities.

                        truth
classifier       fashion   not fashion   Total
pred fashion       0.108         0.012   0.120
pred not           0.062         0.818   0.880
Total              0.170         0.830   1.000

SLIDE 41

Defining Conditional Probability

Suppose we took a sample of 1000 photos. We could multiply each probability by 1000 to get an estimate of how many would fall into each place in our contingency table. We would anticipate 0.120 × 1000 = 120 to be the number of cases where classifier is pred fashion. We would expect to see 0.108 × 1000 = 108 cases where truth is fashion and classifier is pred fashion.

SLIDE 42

Defining Conditional Probability

We can use these numbers to compute our conditional probability. (Using our count data, we found 197/219 = 0.90.)

P(truth is fashion given classifier is pred fashion)
  = [# cases where truth is fashion and classifier is pred fashion] / [# cases where classifier is pred fashion]
  = 108/120
  = (0.108 × 1000)/(0.120 × 1000)
  = 0.108/0.120
  = 0.90

SLIDE 43

Defining Conditional Probability

This is the ratio, or fraction, of two probabilities. We can rewrite this as

P(truth is fashion given classifier is pred fashion)
  = P(truth is fashion and classifier is pred fashion) / P(classifier is pred fashion)
  = 0.108/0.120
  = 0.90

SLIDE 44

Defining Conditional Probability

This leads us to the general conditional probability formula. Let A and B be outcomes. The conditional probability of outcome A occurring, given the condition that B has occurred, is

P(A|B) = P(A and B) / P(B)
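A minimal Python sketch of this formula (the function name is mine):

```python
def conditional(p_a_and_b: float, p_b: float) -> float:
    """Conditional probability: P(A|B) = P(A and B) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B.")
    return p_a_and_b / p_b

# Fashion example from the slides: P(truth fashion | pred fashion).
print(conditional(0.108, 0.120))  # ≈ 0.90
```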

SLIDE 45

Example

Find the probability that the classifier is incorrect when classifying a photo about fashion.

SLIDE 46

Example

Find the probability that the classifier is incorrect when classifying a photo about fashion. We know that the photo is about fashion.

We can write that truth is fashion. This information is given, or our condition.

From that, we want to know the probability that the classifier is wrong.

We want to know the probability that the classifier results in not fashion.

SLIDE 47

Example

Find the probability that the classifier is incorrect when classifying a photo about fashion. Putting this all together, we want P(classifier is not fashion | truth is fashion)

SLIDE 48

Example

Using our formula

P(A|B) = P(A and B) / P(B)

we let A be the event that classifier is not fashion and B the event that truth is fashion. Then

P(classifier is not fashion | truth is fashion)
  = P(classifier is not fashion and truth is fashion) / P(truth is fashion)

SLIDE 49

Example

                        truth
classifier       fashion   not fashion   Total
pred fashion       0.108         0.012   0.120
pred not           0.062         0.818   0.880
Total              0.170         0.830   1.000

P(classifier is not fashion | truth is fashion)
  = P(classifier is not fashion and truth is fashion) / P(truth is fashion)
  = 0.062/0.170
  ≈ 0.365

(The rounding in the proportions makes this slightly different from the exact count-based answer, 112/309 ≈ 0.362.)

SLIDE 50

Example: Smallpox

The smallpox data set is a sample of 6224 individuals from the year 1721.

               inoculated
result        yes     no   Total
lived         238   5136    5374
died            6    844     850
Total         244   5980    6224

SLIDE 51

Example: Smallpox

The smallpox data set has the following table proportions:

               inoculated
result         yes      no   Total
lived        0.038   0.825   0.863
died         0.001   0.136   0.137
Total        0.039   0.961   1.000

Let’s find the probability that an inoculated person died from smallpox.

SLIDE 52

Example: Smallpox

Find the probability that an inoculated person died from smallpox. We are told that the person is inoculated. This is our condition. We want to know the probability that this person died. This is the probability that a person died given that they were inoculated:

P(died | inoculated)

SLIDE 53

Example: Smallpox

Find the probability that an inoculated person died from smallpox.

               inoculated
result         yes      no   Total
lived        0.038   0.825   0.863
died         0.001   0.136   0.137
Total        0.039   0.961   1.000

P(died | inoculated)
  = P(died and inoculated) / P(inoculated)
  = 0.001/0.039
  ≈ 0.026
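The same division in a Python sketch (variable names are mine):

```python
p_died_and_inoculated = 0.001   # joint proportion from the table
p_inoculated = 0.039            # marginal proportion

p_died_given_inoculated = p_died_and_inoculated / p_inoculated
print(round(p_died_given_inoculated, 3))  # ≈ 0.026
```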

SLIDE 54

General Multiplication Rule

In the previous section, we talked about the multiplication rule for independent events. The general multiplication rule is for all events, whether or not they are independent. Let A and B be any two outcomes or events. Then

P(A and B) = P(A|B) × P(B)

Notice that this is not new information! This is just a rearrangement of the formula for conditional probability.

SLIDE 55

Example

Let’s return to the smallpox data set, but suppose we only have two pieces of information:

1. 96.08% of people were not inoculated.
2. 85.88% of people who were not inoculated ended up surviving.

Can we compute the probability that a resident was not inoculated and lived?

SLIDE 56

Example

Compute the probability that a resident was not inoculated and lived. First, let’s rewrite the information we were given in probability notation.

96.08% of people were not inoculated → P(inoculated = no) = 0.9608
85.88% of people who were not inoculated ended up surviving → P(result = lived | inoculated = no) = 0.8588

SLIDE 57

Example

Compute the probability that a resident was not inoculated and lived. Then we use this information with the general multiplication rule.

P(result = lived and inoculated = no)
  = P(result = lived | inoculated = no) × P(inoculated = no)
  = 0.8588 × 0.9608
  = 0.8251
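A Python sketch of this multiplication (variable names are mine):

```python
# General multiplication rule applied to the smallpox numbers.
p_not_inoculated = 0.9608    # P(inoculated = no)
p_lived_given_not = 0.8588   # P(lived | inoculated = no)

p_lived_and_not = p_lived_given_not * p_not_inoculated
print(round(p_lived_and_not, 4))  # 0.8251
```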

SLIDE 58

Sum of Conditional Probabilities

Let A1, . . . , Ak represent all the disjoint outcomes for a variable or process. Then if B is some event,

P(A1|B) + · · · + P(Ak|B) = 1

The rule for complements also holds when an event and its complement are conditioned on the same information:

P(A|B) = 1 − P(Ac|B)

Why are these true? Let’s look at a Venn diagram.
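A quick numerical check of the first identity, using the fashion-classifier proportions (a sketch; the assignments are mine):

```python
# Conditional probabilities over all disjoint outcomes sum to 1.
# Condition on classifier = pred fashion:
p_fashion_given_pred = 0.108 / 0.120   # P(truth fashion | pred fashion)
p_not_given_pred = 0.012 / 0.120       # P(truth not fashion | pred fashion)
print(p_fashion_given_pred + p_not_given_pred)  # ≈ 1.0
```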

SLIDE 59

Independence Considerations

For two independent events, knowing the outcome of one should give us no information about the probability of the other. Consider X and Y, the outcomes for rolling two six-sided dice.

1. Find P(X = 1).
2. Find P(X = 1 and Y = 1).
3. Find P(Y = 1|X = 1).

Knowing the outcome of X doesn’t give us any additional information about Y .

SLIDE 60

Independence Considerations

We can use the Multiplication Rule to show that the conditioning information has no influence for independent processes:

P(Y = 1 | X = 1)
  = P(Y = 1 and X = 1) / P(X = 1)
  = [P(Y = 1) × P(X = 1)] / P(X = 1)
  = P(Y = 1)
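A simulation sketch that checks this for the two dice (using Python's random module; names are mine):

```python
import random

random.seed(0)
n = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)]

# Unconditional P(Y = 1), estimated from all rolls.
p_y1 = sum(y == 1 for _, y in rolls) / n

# Conditional P(Y = 1 | X = 1), estimated from the rolls where X = 1.
x1_rolls = [y for x, y in rolls if x == 1]
p_y1_given_x1 = sum(y == 1 for y in x1_rolls) / len(x1_rolls)

print(round(p_y1, 3), round(p_y1_given_x1, 3))  # both near 1/6 ≈ 0.167
```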

SLIDE 61

Example: The Gambler’s Fallacy

A roulette wheel has 18 black slots, 18 red slots, and 2 green slots (38 total slots). Ron is watching a roulette table in a casino and notices that the last five outcomes were black. He figures that the chances of getting black six times in a row is very small (about 1/64) and puts his paycheck on red. What is wrong with his reasoning?

SLIDE 62

Example: The Gambler’s Fallacy

What is wrong with Ron’s reasoning? It’s true that there is close to a 1/64 = 0.016 chance that we get black six times in a row:

P(black1) × · · · × P(black5) × P(black6) = (9/19)^6 = 0.011

But there’s also a 1/64 chance that we get black five times in a row followed by red:

P(black1) × · · · × P(black5) × P(red6) = (9/19)^6 = 0.011

SLIDE 63

Example: The Gambler’s Fallacy

What is wrong with Ron’s reasoning? Each spin is independent of the previous spins! This means that each spin has an 18/38 chance of being black. Ron has a 1 − 18/38 = 20/38 ≈ 0.526 chance of losing his entire paycheck.
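A Python sketch of the relevant roulette probabilities (variable names are mine):

```python
p_black = 18 / 38   # probability of black on any single spin
p_red = 18 / 38     # red has the same probability

print(round(p_black ** 6, 3))           # six blacks in a row ≈ 0.011
print(round(p_black ** 5 * p_red, 3))   # five blacks then red: same ≈ 0.011

p_lose = 1 - p_red                      # Ron's red bet loses on black or green
print(round(p_lose, 3))                 # 20/38 ≈ 0.526
```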

SLIDE 64

Tree Diagrams

Tree diagrams help organize outcomes and probabilities based on the structure of the data. They are especially useful when the data can be put into some kind of sequential structure.

SLIDE 65

Tree Diagrams

The smallpox data can be structured this way. We split the data by inoculation (yes or no). Then we split by result (lived or died).

SLIDE 66

Tree Diagrams

[Tree diagram for the smallpox data: a primary branch for inoculated (yes/no), then secondary branches for result (lived/died).]

SLIDE 67

Tree Diagrams

The first branch, for inoculation, is called the primary branch. All other branches, in this case for result, are secondary branches.

SLIDE 68

Tree Diagrams

The probabilities for the primary branch are marginal.

For inoculation is yes, the marginal probability is P(inoculation is yes) = 0.0392.

The probabilities for the secondary branches are conditional.

For result is lived on the inoculation is yes branch, we have P(result is lived | inoculation is yes) = 0.9754

SLIDE 69

Tree Diagrams

Joint probabilities are shown to the right of each secondary branch. These are computed using the General Multiplication Rule P(A and B) = P(A|B) × P(B) where the primary branch represents event B and the secondary branch event A.

SLIDE 70

Example: Exam Scores

Consider the midterm and final for a statistics class. Suppose 13% of students earned an A on the midterm. Of those students who earned an A on the midterm, 47% received an A on the final. 11% of the students who earned lower than an A on the midterm received an A on the final. You pick up a final exam at random and notice the student received an A. What is the probability that this student earned an A on the midterm?

SLIDE 71

Example: Exam Scores

Let’s start by writing the given information in probability notation.

P(midterm = A) = 0.13
P(final = A | midterm = A) = 0.47
P(final = A | midterm = not A) = 0.11

We want to know the probability that a student who earned an A on the final also earned an A on the midterm: P(midterm = A | final = A)

SLIDE 72

Example: Exam Scores

Now that we’ve formalized the information from the problem statement, we can consider our next steps. It’s not yet clear how to calculate P(midterm = A | final = A), so let’s use what we know to draw a tree diagram.

SLIDE 73

Example: Exam Scores

We will use this information to draw our tree diagram.

P(midterm = A) = 0.13
P(final = A | midterm = A) = 0.47
P(final = A | midterm = not A) = 0.11

SLIDE 74

Example: Exam Scores

Can we use this to calculate P(midterm = A | final = A)?

SLIDE 75

Example: Exam Scores

First, consider our conditional probability formula.

P(midterm = A | final = A) = P(midterm = A and final = A) / P(final = A)

We can get all of the probabilities on the right hand side of the formula by using our tree diagram!

SLIDE 76

Example: Exam Scores

First, P(midterm = A and final = A) = 0.13 × 0.47 = 0.0611.

SLIDE 77

Example: Exam Scores

Then

P(final = A)
  = P(midterm = not A and final = A) + P(midterm = A and final = A)
  = (0.87 × 0.11) + (0.13 × 0.47)
  = 0.0957 + 0.0611
  = 0.1568

SLIDE 78

Example: Exam Scores

Plugging these in,

P(midterm = A | final = A)
  = P(midterm = A and final = A) / P(final = A)
  = 0.0611/0.1568
  = 0.3897

So the probability that a student earned an A on the midterm, given that their final exam score was an A, is about 39%.
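The whole tree-diagram calculation fits in a short Python sketch (variable names are mine):

```python
# Tree-diagram branches for the exam-scores example.
p_mid_A = 0.13
p_final_A_given_mid_A = 0.47
p_final_A_given_mid_not_A = 0.11

# Joint probabilities along each branch (general multiplication rule).
p_both_A = p_mid_A * p_final_A_given_mid_A                  # 0.0611
p_not_A_then_A = (1 - p_mid_A) * p_final_A_given_mid_not_A  # 0.0957

# Total probability of an A on the final, then the conditional probability.
p_final_A = p_both_A + p_not_A_then_A                       # 0.1568
print(round(p_both_A / p_final_A, 4))                       # ≈ 0.3897
```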

SLIDE 79

Bayes’ Theorem

That was a lot of work! Bayes’ Theorem will help minimize this work so that we can more easily calculate P(statement about variable 1 | statement about variable 2) when we have information about P(statement about variable 2 | statement about variable 1).
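As a preview, the standard statement P(A|B) = P(B|A) × P(A) / P(B) can be wrapped in a small helper (a sketch; the function name is mine):

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Exam-scores example: A = "midterm A", B = "final A".
print(round(bayes(0.47, 0.13, 0.1568), 4))  # ≈ 0.3897
```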
