Probability and Random Processes - PowerPoint PPT Presentation




SLIDE 1

ECS 315 Probability and Random Processes

Asst. Prof. Dr. Prapun Suksompong
prapun@siit.tu.ac.th

1 Probability and You

Office Hours: Rangsit Library: Tuesday 16:20-17:20; BKD3601-7: Thursday 14:40-16:00

SLIDE 2

Life is random

SLIDE 3

Life is random

In 2005, this statement (which is true) showed up all over the world…

SLIDE 4

Life is random

SLIDE 5

Applications of Probability Theory

  • The subject of probability can be traced back to the 17th century, when it arose out of the study of gambling games.
  • The range of applications extends beyond games into business decisions, insurance, law, medical tests, and the social sciences.
  • The stock market, “the largest casino in the world,” cannot do without it.
  • The telephone network, call centers, and airline companies, with their randomly fluctuating loads, could not have been economically designed without probability theory.

SLIDE 6

“The Perfect Thing”

What is this?

SLIDE 7

“The Perfect Thing”

SLIDE 8

Perfect?!...

SLIDE 9

What about the shuffle function?

http://ipod.about.com/od/advanceditunesuse/a/itunes-random.htm
http://electronics.howstuffworks.com/ipod-shuffle2.htm
http://www.cnet.com.au/itunes-just-how-random-is-random-339274094.htm

SLIDE 10

USA Currency Coins

  • Penny = 1 cent (Abraham Lincoln)
  • Nickel = 5 cents (Thomas Jefferson)
  • Dime = 10 cents (Franklin D. Roosevelt)
  • Quarter = 25 cents (George Washington)

SLIDE 11

Coin Tossing: Relative Frequency

[Figure: relative frequency of heads, $N_n(A)/n$, versus the number of tosses $n$, in four panels: $n = 1, 2, \ldots, 10$; $n = 1, 2, \ldots, 100$; $n = 1, 2, \ldots, 1000$; and $n = 1, 2, \ldots, 10^6$.]

If a fair coin is flipped a large number of times, the proportion of heads will tend to get closer to 1/2 as the number of tosses increases.

SLIDE 12

Coin Tossing: Relative Freq. vs. #H-#T

If a fair coin is flipped a large number of times, the proportion of heads will tend to get closer to 1/2 as the number of tosses increases.

This statement does not say that the difference between #H and #T will be close to 0.

[Figure: #H - #T versus the number of tosses $n$, for $n$ up to $10^6$.]

The difference between #H and #T will not converge to 0.
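The contrast between the two statements can be reproduced with a small simulation. This is a minimal sketch. It assumes Python's `random` module stands in for a fair coin, and the function name `toss_summary` and the seed 315 are illustrative choices that do not come from the slides. It reports $\#H/n$ and $\#H - \#T$ at $n = 10, 100, \ldots, 10^6$.

```python
import random

def toss_summary(n_tosses: int, seed: int = 315) -> list[tuple[int, float, int]]:
    """Simulate fair-coin tosses; at n = 10, 100, ..., return (n, #H/n, #H - #T)."""
    rng = random.Random(seed)  # fixed seed only so that runs are reproducible
    checkpoints = {10 ** k for k in range(1, 7) if 10 ** k <= n_tosses}
    heads = 0
    rows = []
    for n in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1 (a head)
        if n in checkpoints:
            tails = n - heads
            rows.append((n, heads / n, heads - tails))
    return rows

for n, rel_freq, diff in toss_summary(10 ** 6):
    print(f"n = {n:>8}: #H/n = {rel_freq:.4f}, #H - #T = {diff:+d}")
```

In typical runs the relative frequency settles near 0.5 while $\#H - \#T$ keeps wandering. The standard deviation of $\#H - \#T$ is $\sqrt{n}$, so at $n = 10^6$ a gap on the order of 1000 is unremarkable.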

SLIDE 13

Another trial

[Figure: a second simulated run. Panels show the relative frequency of heads versus $n$ (for $n$ up to 10, 100, 1000, and $10^6$) and #H - #T versus $n$.]

SLIDE 14

Another trial

[Figure: another simulated run with the same panels: relative frequency of heads versus $n$, and #H - #T versus $n$.]

SLIDE 15

2 Review of Set Theory

SLIDE 16

Venn diagram

SLIDE 17

Venn diagram: Examples

SLIDE 18

Partitions

SLIDE 19

3 Classical Probability

SLIDE 20

Real coins are biased

  • From a group of Stanford researchers

http://gajitz.com/up-in-the-air-coin-tosses-not-as-neutral-as-you-think/
http://www.codingthewheel.com/archives/the-coin-flip-a-fundamentally-unfair-proposition
http://www-stat.stanford.edu/~susan/papers/headswithJ.pdf

SLIDE 21

Example

  • In drawing a card from a deck, there are 52 equally likely outcomes, 13 of which are diamonds. This leads to a probability of 13/52, or 1/4.

SLIDE 22

The word “dice”

  • Historically, dice is the plural of die.
  • In modern standard English, dice is used as both the singular and the plural.

Example of 19th Century bone dice

SLIDE 23

“Advanced” dice

[ http://gmdice.com/ ]

SLIDE 24

Dice Simulator

  • http://www.dicesimulator.com/
  • Supports up to 6 dice and also has some background information on dice and random numbers.

SLIDE 25

Two Dice

SLIDE 26

Two-Dice Statistics

SLIDE 27

Two Dice

  • A pair of dice

Double six

SLIDE 28

Two dice: Simulation

[ http://www2.whidbey.net/ohmsmath/webwork/javascript/dice2rol.htm ]

SLIDE 29

Two dice

  • Assume that the two dice are fair and independent.
  • P[sum of the two dice = 5] = 4/36, because exactly four of the 36 equally likely ordered outcomes have sum 5: (1,4), (2,3), (3,2), (4,1).
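The classical-probability count can be checked by brute force. This is a minimal sketch that assumes the 36 ordered outcomes of two fair, independent dice are equally likely. The names `outcomes`, `sum_counts`, and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (die 1, die 2); each is equally likely for fair, independent dice.
outcomes = list(product(range(1, 7), repeat=2))
sum_counts = Counter(a + b for a, b in outcomes)

# Classical probability: (number of favorable outcomes) / (number of outcomes).
pmf = {s: Fraction(c, len(outcomes)) for s, c in sorted(sum_counts.items())}

print(pmf[5])  # 4/36, printed in lowest terms as 1/9
for s, p in pmf.items():
    print(f"P[sum = {s:2d}] = {p}")
```

Exact `Fraction` arithmetic keeps the answers in the same form as the slide (ratios of counts) instead of rounded decimals.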

SLIDE 30

Two dice

  • Assume that the two dice are fair and independent.

SLIDE 31

Two-Dice Statistics

SLIDE 32

4 Combinatorics

SLIDE 33

Combinatorics

  • Mathematics of Choice: How to count without counting
  • By Ivan Niven
  • Topics: permutations, combinations, binomial coefficients, the inclusion-exclusion principle, combinatorial probability, partitions of numbers, generating polynomials, the pigeonhole principle, and much more.

SLIDE 34

Heads, Bodies and Legs flip-book

SLIDE 35

Heads, Bodies and Legs flip-book (2)

SLIDE 36

One Hundred Thousand Billion Poems

  • Cent mille milliards de poèmes

SLIDE 37

One Hundred Thousand Billion Poems (2)

SLIDE 38

Example: Sock It Two Me

  • Jack is so busy that he's always throwing his socks into his top drawer without pairing them. One morning Jack oversleeps. In his haste to get ready for school (and still a bit sleepy), he reaches into his drawer and pulls out 2 socks.
  • Jack knows that 4 blue socks, 3 green socks, and 2 tan socks are in his drawer.
  • 1. What are Jack's chances that he pulls out 2 blue socks to match his blue slacks?
  • 2. What are the chances that he pulls out a pair of matching socks?

[Greenes, 1977]
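One way to work both questions is to count unordered pairs. This is a minimal sketch under an assumption of ours: the 2 socks are drawn without replacement and every pair of the 9 socks is equally likely. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

socks = {"blue": 4, "green": 3, "tan": 2}
total = sum(socks.values())   # 9 socks in the drawer
pairs = comb(total, 2)        # 36 equally likely unordered pairs

# Q1: both socks blue.
p_two_blue = Fraction(comb(socks["blue"], 2), pairs)
# Q2: both socks the same color (blue pair, green pair, or tan pair).
p_matching = Fraction(sum(comb(k, 2) for k in socks.values()), pairs)

print(p_two_blue)   # 6/36 = 1/6
print(p_matching)   # (6 + 3 + 1)/36 = 5/18
```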

SLIDE 39

“Origin” of Probability Theory

  • Probability theory was originally inspired by gambling problems.
  • In 1654, Chevalier de Méré invented a gambling system which bet even money on case B.
  • When he began losing money, he asked his mathematician friend Blaise Pascal to analyze his gambling system.
  • Pascal discovered that the Chevalier's system would lose about 51 percent of the time.
  • Pascal became so interested in probability that, together with another famous mathematician, Pierre de Fermat, he laid the foundation of probability theory.

Fermat is best known for Fermat's Last Theorem [http://www.youtube.com/watch?v=MrVD4q1m1Vo]

SLIDE 40

Example: The Seven Card Hustle

  • Take five red cards and two black cards from a pack.
  • Ask your friend to shuffle them and then, without looking at the faces, lay them out in a row.
  • Bet that they can't turn over three red cards.
  • The probability that they CAN do it is

$$P = \frac{\binom{5}{3}}{\binom{7}{3}} = \frac{\frac{5!}{3!\,2!}}{\frac{7!}{3!\,4!}} = \frac{5 \cdot 4 \cdot 3}{7 \cdot 6 \cdot 5} = \frac{2}{7}.$$

[Lovell, 2006]
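The value 2/7 can be cross-checked by listing every possible layout. This is a minimal sketch. The shuffled row is identified by where the two black cards land, and we assume (our assumption) that each of the $\binom{7}{2} = 21$ placements is equally likely. Because the row is random, fixing the friend's three positions as 0, 1, 2 loses nothing.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# A shuffled row is determined by the positions of the two black cards.
layouts = list(combinations(range(7), 2))   # 21 equally likely layouts

# By symmetry, fix the three cards the friend turns over (illustrative choice).
chosen = {0, 1, 2}
wins = sum(1 for blacks in layouts if chosen.isdisjoint(blacks))

p_brute = Fraction(wins, len(layouts))
p_formula = Fraction(comb(5, 3), comb(7, 3))
print(p_brute, p_formula)  # both 2/7
```

At 2/7 (about 29%) for the friend, the bet favors the hustler.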

SLIDE 41

Finger-Smudge on Touch-Screen Devices

  • Fingers leave an oily smear on the screen.
  • Different apps give different finger-smudges.
  • Latent smudges may be usable to infer recently and frequently touched areas of the screen, a form of information leakage.

[http://www.ijsmblog.com/2011/02/ipad-finger-smudge-art.html]

SLIDE 42

Andre Woolery Art

(Panels: Fruit Ninja, Facebook, Angry Bird, Mail)

SLIDE 43

For sale… Andre Woolery Art

[http://www.andrewooleryart.com/collections/ipad-abstract-art/products/mail-abstract-art-polychrome]

SLIDE 44

Lockscreen PIN / Passcode

[http://lifehacker.com/5813533/why-you-should-repeat-one-digit-in-your-phones-4+digit-lockscreen-pin]

SLIDE 45

Smudge Attack

  • Touchscreen smudge may give away your password/passcode.
  • Four distinct fingerprints reveal the four numbers used for the passcode lock.

[http://www.engadget.com/2010/08/16/shocker-touchscreen-smudge-may-give-away-your-android-password/2]

SLIDE 46

Suggestion: Repeat One Digit

  • Unknown numbers:
    The number of distinct 4-digit passcodes = $10^4$
  • Exactly four different numbers:
    The number of distinct 4-digit passcodes = $4! = 24$
  • Exactly three different numbers:
    The number of distinct 4-digit passcodes = $3 \times \binom{4}{2} \times 2! = 36$
    (3 ways to choose the number that will be repeated; $\binom{4}{2}$ ways to choose the locations of the two non-repeated numbers; $2!$ ways to order the two non-repeated numbers in those locations)
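These counts can be confirmed by brute force over all $10^4$ codes. This is a minimal sketch. The helper `count_using_exactly` is hypothetical, and the digit sets "1234" and "123" are arbitrary stand-ins for the smudged keys.

```python
from itertools import product

def count_using_exactly(digits: set[str]) -> int:
    """Count 4-digit passcodes whose set of distinct digits is exactly `digits`."""
    return sum(1 for code in product("0123456789", repeat=4) if set(code) == digits)

print(count_using_exactly(set("1234")))  # four smudged keys
print(count_using_exactly(set("123")))   # three smudged keys, one digit repeated
print(10 ** 4)                           # no smudge information
```

This is the reasoning behind the slide's suggestion. With three smudged keys an attacker faces 36 candidates instead of 24, which is slightly more work.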

SLIDE 47

News: Most Common Lockscreen PINs

  • Passcodes of users of the Big Brother Camera Security iPhone app
  • 15% of all passcode sets were represented by only 10 different passcodes (out of 204,508 recorded passcodes)

[http://amitay.us/blog/files/most_common_iphone_passcodes.php (2011)]
SLIDE 48

Even easier in Splinter Cell

  • Decipher the keypad's code by the heat left on the buttons.
  • Here's the keypad viewed with your thermal goggles. (Numbers added for emphasis.)

Again, the stronger the signature, the more recent the keypress.

  • The code is 1456.

SLIDE 49

Actual Research

  • University of California, San Diego
  • The researchers have shown that codes can be easily discerned from quite a distance (at least seven metres away), and image-analysis software can automatically find the correct code in more than half of cases even one minute after the code has been entered.
  • This figure rose to more than eighty percent if the thermal camera was used immediately after the code was entered.

K. Mowery, S. Meiklejohn, and S. Savage. 2011. “Heat of the Moment: Characterizing the Efficacy of Thermal-Camera Based Attacks”. Proceedings of WOOT 2011.
http://cseweb.ucsd.edu/~kmowery/papers/thermal.pdf
http://wordpress.mrreid.org/2011/08/27/hacking-pin-pads-using-thermal-vision/

SLIDE 50

The Birthday Problem (Paradox)

  • How many people do you need to assemble before the probability is greater than 1/2 that some two of them have the same birthday (month and day)?
  • Birthdays consist of a month and a day with no year attached.
  • Ignore February 29, which only comes in leap years.
  • Assume that every day is as likely as any other to be someone’s birthday.
  • In a group of r people, what is the probability that two or more people have the same birthday?

SLIDE 51

Probability of birthday coincidence

  • Probability that there are at least two people who have the same birthday in a group of r persons:

$$p(r) = \begin{cases} 1 - \dfrac{\overbrace{365 \cdot 364 \cdots (365 - r + 1)}^{r \text{ terms}}}{365^{r}}, & \text{if } 0 \le r \le 365, \\[1ex] 1, & \text{if } r \ge 366. \end{cases}$$
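The formula on this slide can be evaluated directly to answer the question from the previous slide. This is a minimal sketch that assumes the same 365 equally likely birthdays. The function name `p_shared_birthday` is illustrative.

```python
def p_shared_birthday(r: int, days: int = 365) -> float:
    """P(at least two of r people share a birthday), all `days` equally likely."""
    if r > days:
        return 1.0  # more people than days: a repeat is certain (pigeonhole)
    p_all_distinct = 1.0
    for k in range(r):
        p_all_distinct *= (days - k) / days  # r factors: 365/365, 364/365, ...
    return 1.0 - p_all_distinct

# Smallest group size for which a shared birthday is more likely than not.
r = next(r for r in range(1, 367) if p_shared_birthday(r) > 0.5)
print(r, round(p_shared_birthday(r), 4))  # 23, with probability about 0.507
```

Multiplying the factors one at a time avoids forming $365^r$ and the falling product as huge numbers before dividing.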

SLIDE 52

Probability of birthday coincidence

SLIDE 53

The Birthday Problem (cont’d)

  • With 88 people, the probability is greater than 1/2 of having three people with the same birthday.
  • With 187 people, the probability is greater than 1/2 of having four people with the same birthday.

SLIDE 54

Birthday Coincidence: 2nd Version

  • How many people do you need to assemble before the probability is greater than 1/2 that at least one of them has the same birthday (month and day) as you?
  • In a group of r people, what is the probability that at least one of them has the same birthday (month and day) as you?
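Under the same assumptions, each of the r people independently misses your birthday with probability 364/365. The answer is therefore $1 - (364/365)^r$. This is a minimal sketch of that calculation, and the function name is illustrative.

```python
def p_someone_shares_mine(r: int, days: int = 365) -> float:
    """P(at least one of r other people has your birthday)."""
    # Complement: all r people miss your birthday, each with probability (days - 1)/days.
    return 1.0 - ((days - 1) / days) ** r

r = next(r for r in range(1, 10_000) if p_someone_shares_mine(r) > 0.5)
print(r)  # 253
```

This version needs a much larger group than the first. Only the r pairs that involve you count here, rather than all $r(r-1)/2$ pairs in the group.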
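SLIDE 55

Distinct Passcodes (revisit)

  • Unknown numbers:
    The number of distinct 4-digit passcodes = $10^4$
  • Exactly four different numbers:
    The number of distinct 4-digit passcodes = $4! = 24$
  • Exactly three different numbers:
    The number of distinct 4-digit passcodes = $3 \times \binom{4}{2} \times 2! = 36$
  • Exactly two different numbers:
    The number of distinct 4-digit passcodes = $\binom{4}{3} + \binom{4}{2} + \binom{4}{1} = 14$
    (choose which 3, 2, or 1 of the four positions hold the first of the two numbers; the remaining positions hold the second)
  • Exactly one number:
    The number of distinct 4-digit passcodes = 1
  • Check (choose which digits appear, then count the passcodes that use exactly those digits):
    $\binom{10}{4} \cdot 24 + \binom{10}{3} \cdot 36 + \binom{10}{2} \cdot 14 + \binom{10}{1} \cdot 1 = 10{,}000$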
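The check can also be carried out by brute force. This is a minimal sketch that sorts all $10^4$ passcodes by how many distinct digits they use and compares the result with $\binom{10}{k}$ times the per-set counts from the slide. The variable names are illustrative.

```python
from collections import Counter
from itertools import product
from math import comb

# Brute force: tally all 10^4 passcodes by how many distinct digits they use.
tally = Counter(len(set(code)) for code in product(range(10), repeat=4))

# Slide's decomposition: C(10, k) choices of digit set x passcodes per set.
per_set = {4: 24, 3: 36, 2: 14, 1: 1}
by_formula = {k: comb(10, k) * n for k, n in per_set.items()}

print(dict(sorted(tally.items())))
print(by_formula, sum(by_formula.values()))
```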

SLIDE 56

Ex: Poker Probability

Need more practice? [ http://en.wikipedia.org/wiki/Poker_probability ]

SLIDE 57

Binomial Theorem

$$(x_1 + y_1)(x_2 + y_2) = x_1 x_2 + x_1 y_2 + y_1 x_2 + y_1 y_2$$

$$(x_1 + y_1)(x_2 + y_2)(x_3 + y_3) = x_1 x_2 x_3 + x_1 x_2 y_3 + x_1 y_2 x_3 + x_1 y_2 y_3 + y_1 x_2 x_3 + y_1 x_2 y_3 + y_1 y_2 x_3 + y_1 y_2 y_3$$

Setting $x_1 = x_2 = x_3 = x$ and $y_1 = y_2 = y_3 = y$:

$$(x + y)^2 = xx + xy + yx + yy = x^2 + 2xy + y^2$$

$$(x + y)^3 = xxx + xxy + xyx + xyy + yxx + yxy + yyx + yyy = x^3 + 3x^2 y + 3x y^2 + y^3$$
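The counting idea behind these expansions can be made explicit. Each term of $(x+y)^n$ is a length-$n$ word that picks $x$ or $y$ from every factor, so the coefficient of $x^{n-k}y^k$ is the number of words with $k$ letters $y$, namely $\binom{n}{k}$. This is a minimal sketch, and the function name `expand_words` is illustrative.

```python
from collections import Counter
from itertools import product
from math import comb

def expand_words(n: int) -> Counter:
    """Expand (x + y)^n as all length-n words over {x, y}, grouped by number of y's."""
    # Each word picks x or y from each of the n factors, e.g. "xyx" for n = 3.
    words = ("".join(w) for w in product("xy", repeat=n))
    return Counter(w.count("y") for w in words)

n = 3
coeffs = expand_words(n)
print([coeffs[k] for k in range(n + 1)])  # coefficients of x^3, x^2 y, x y^2, y^3
assert all(coeffs[k] == comb(n, k) for k in range(n + 1))
```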

SLIDE 58

Success Runs (1/4)

  • Suppose that two people are separately asked to toss a fair coin 120 times and take note of the results. Heads is noted as a “one” and tails as a “zero”.
  • Results: two lists of compiled zeros and ones:

[Tijms, 2007, p 192]

SLIDE 59

Success Runs (2/4)

  • Which list is more likely?

[Tijms, 2007, p 192]

SLIDE 60

Success Runs (3/4)

  • Fact: One of the two individuals has cheated and has fabricated a list of numbers without having tossed the coin.
  • Which list is more likely to be the fabricated list?

[Tijms, 2007, p 192]

SLIDE 61

Success Runs (4/4)

  • Fact: In 120 tosses of a fair coin, there is a very large probability that at some point during the tossing process, a sequence of five or more heads or five or more tails will naturally occur.
  • The probability of this is approximately 0.9865.
  • In contrast to the second list, the first list shows no such sequence of five heads in a row or five tails in a row. In the first list, the longest sequence of either heads or tails consists of three in a row.
  • In 120 tosses of a fair coin, the probability that the longest sequence consists of three or fewer in a row is equal to 0.000053, which is extremely small.
  • Thus, the first list is almost certainly a fake.
  • Most people tend to avoid noting long sequences of consecutive heads or tails. Truly random sequences do not share this human tendency!

[Tijms, 2007, p 192]
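Both probabilities quoted from Tijms can be reproduced exactly with a short counting recursion. This is a sketch of one standard approach and not necessarily Tijms's own method. A toss sequence is split into runs that alternate between heads and tails, so it is fixed by its first toss together with a composition of $n$ into run lengths. The function name is illustrative.

```python
def prob_longest_run_at_most(n: int, m: int) -> float:
    """P(every run of identical outcomes has length <= m) in n fair-coin tosses."""
    # ways[k] = number of ways to split k tosses into consecutive runs of length 1..m
    # (compositions of k with all parts <= m).
    ways = [1] + [0] * n
    for k in range(1, n + 1):
        ways[k] = sum(ways[k - j] for j in range(1, min(m, k) + 1))
    # Runs alternate H/T, so a run-length pattern plus the first toss fixes the sequence.
    return 2 * ways[n] / 2 ** n

n = 120
p_run_of_5 = 1 - prob_longest_run_at_most(n, 4)   # some run of >= 5 heads or >= 5 tails
p_max_run_3 = prob_longest_run_at_most(n, 3)      # longest run is 3 or less
print(f"{p_run_of_5:.4f}  {p_max_run_3:.6f}")
```

Python integers are exact, so `ways[120]` and `2 ** 120` are computed without overflow. Only the final division rounds to a float.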

SLIDE 62

Fun Reading …

  • Entertaining Mathematical Puzzles (1986)
  • By Martin Gardner (1914-2010)
  • It includes a mixture of old and new riddles covering a variety of mathematical topics: money, speed, plane and solid geometry, probability (Part VII), topology, tricky puzzles, and more.
  • Carefully explained solutions follow each problem.

SLIDE 63

Fun Books…

SLIDE 64

Exercise from Mlodinow’s talk

  • At 10:14 into the video, Mlodinow shows three probabilities.
  • Can you derive the first two?
  • http://www.youtube.com/watch?v=F0sLuRsu1Do
  • [Mlodinow, 2008, p. 180-181]

SLIDE 65

II. Events-Based Probability Theory

SLIDE 66

5 Foundation of Probability Theory

SLIDE 67

Kolmogorov

  • Andrey Nikolaevich Kolmogorov
  • Soviet Russian mathematician
  • Advanced various scientific fields: probability theory, topology, classical mechanics, computational complexity.
  • 1922: Constructed a Fourier series that diverges almost everywhere, gaining international recognition.
  • 1933: Published the book Foundations of the Theory of Probability, laying the modern axiomatic foundations of probability theory and establishing his reputation as the world's leading living expert in this field.

SLIDE 68

I learned probability theory from

Rick Durrett, Eugene Dynkin, Philip Protter, Gennady Samorodnitsky, Terrence Fine, Xing Guo, Toby Berger

SLIDE 69

Not too far from Kolmogorov

You can be 4th-generation probability theorists.

SLIDE 70

Event-Based Properties

SLIDE 71

Daniel Kahneman

  • Daniel Kahneman
  • Israeli-American psychologist
  • 2002 Nobel laureate in Economics
  • Hebrew University, Jerusalem, Israel
  • Professor emeritus of psychology and public affairs at Princeton University's Woodrow Wilson School
  • With Amos Tversky, Kahneman studied and clarified the kinds of misperceptions of randomness that fuel many of the common fallacies.

SLIDE 72

K&T: Q1

  • K&T presented this description to a group of 88 subjects and asked them to rank the eight statements (shown on the next slide) on a scale of 1 to 8 according to their probability, with 1 representing the most probable and 8 the least.

[Daniel Kahneman, Paul Slovic, and Amos Tversky, eds., Judgment under Uncertainty: Heuristics and Biases (Cambridge: Cambridge University Press, 1982), pp. 90–98.]

Imagine a woman named Linda, 31 years old, single, outspoken, and very bright. In college she majored in philosophy. While a student she was deeply concerned with discrimination and social justice and participated in antinuclear demonstrations.

[outspoken = given to expressing yourself freely or insistently]

SLIDE 73

K&T: Q1 - Results

  • Here are the results, from most to least probable:

[feminist = of or relating to or advocating equal rights for women]

SLIDE 74

K&T: Q1 – Results (2)

  • At first glance there may appear to be nothing unusual in these results: the description was in fact designed to be
    representative of an active feminist and
    unrepresentative of a bank teller or an insurance salesperson.

[Chart: statements ranked from most probable to least likely.]

SLIDE 75

K&T: Q1 – Results (3)

  • Let’s focus on just three of the possibilities and their average ranks.
  • This is the order in which 85 percent of the respondents ranked the three possibilities:
  • If nothing about this looks strange, then K&T have fooled you.

SLIDE 76

K&T: Q1 - Contradiction

The probability that two events will both occur can never be greater than the probability that each will occur individually!
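In symbols, take any events $A$ and $B$. For Linda, one natural choice is $A$ = “Linda is a bank teller” and $B$ = “Linda is active in the feminist movement.” The event $A$ splits into the disjoint pieces $A \cap B$ and $A \cap B^{c}$, so additivity of the probability measure gives the bound in one step:

```latex
P(A) = P(A \cap B) + P(A \cap B^{c}) \;\ge\; P(A \cap B),
\qquad\text{and likewise}\qquad
P(B) \;\ge\; P(A \cap B).
```

So no probability assignment can rank “bank teller and feminist” above “bank teller.”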

SLIDE 77

K&T: Q2

  • K&T were not surprised by the result because they had given their subjects a large number of possibilities, and the connections among the three scenarios could easily have gotten lost in the shuffle.
  • So they presented the description of Linda to another group, but this time they presented only three possibilities:
    Linda is active in the feminist movement.
    Linda is a bank teller and is active in the feminist movement.
    Linda is a bank teller.

SLIDE 78

K&T: Q2 - Results

  • To their surprise, 87 percent of the subjects in this trial also incorrectly ranked the probability that “Linda is a bank teller and is active in the feminist movement” higher than the probability that “Linda is a bank teller”.
  • If the details we are given fit our mental picture of something, then the more details in a scenario, the more real it seems and hence the more probable we consider it to be,
    even though any act of adding less-than-certain details to a conjecture makes the conjecture less probable.
  • Even highly trained doctors make this error when analyzing symptoms.
    91 percent of the doctors fall prey to the same bias.

[Amos Tversky and Daniel Kahneman, “Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment,” Psychological Review 90, no. 4 (October 1983): 293–315.]

SLIDE 79

Related Topic

  • Pages 34-37
  • Tversky and Shafir @ Princeton University

SLIDE 80

K&T: Q3

  • Which is greater:
    the number of six-letter English words having “n” as their fifth letter, or
    the number of six-letter English words ending in “-ing”?
  • Most people choose the group of words ending in “-ing”. Why? Because words ending in “-ing” are easier to think of than generic six-letter words having “n” as their fifth letter.
  • The group of six-letter words having “n” as their fifth letter includes all six-letter words ending in “-ing”, so it is at least as large.
  • Psychologists call this type of mistake the availability bias.
  • In reconstructing the past, we give unwarranted importance to memories that are most vivid and hence most available for retrieval.

[Amos Tversky and Daniel Kahneman, “Availability: A Heuristic for Judging Frequency and Probability,” Cognitive Psychology 5 (1973): 207–32.]

SLIDE 81

Misuse of probability in law

  • It is not uncommon for experts in DNA analysis to testify at a criminal trial that a DNA sample taken from a crime scene matches that taken from a suspect.
  • How certain are such matches?
  • When DNA evidence was first introduced, a number of experts testified that false positives are impossible in DNA testing.
  • Today DNA experts regularly testify that the odds of a random person’s matching the crime sample are less than 1 in 1 million or 1 in 1 billion.
  • In Oklahoma a court sentenced a man named Timothy Durham to more than 3,100 years in prison even though eleven witnesses had placed him in another state at the time of the crime.

[Mlodinow, 2008, p 36-37]

SLIDE 82

Lab/Human Error

  • There is another statistic that is often not presented to the jury, one having to do with the fact that labs make errors, for instance, in collecting or handling a sample, by accidentally mixing or swapping samples, or by misinterpreting or incorrectly reporting results.
  • Each of these errors is rare but not nearly as rare as a random match.
  • The Philadelphia City Crime Laboratory admitted that it had swapped the reference samples of the defendant and the victim in a rape case.
  • A testing firm called Cellmark Diagnostics admitted a similar error.

[Mlodinow, 2008, p 36-37]

SLIDE 83

Timothy Durham’s case

  • It turned out that in the initial analysis the lab had failed to completely separate the DNA of the rapist and that of the victim in the fluid they tested, and the combination of the victim’s and the rapist’s DNA produced a positive result when compared with Durham’s.
  • A later retest turned up the error, and Durham was released after spending nearly four years in prison.

[Mlodinow, 2008, p 36-37]

SLIDE 84

DNA-Match Error + Lab Error

  • Estimates of the error rate due to human causes vary, but many experts put it at around 1 percent.
  • Most jurors assume that, given the two types of error (the 1 in 1 billion accidental match and the 1 in 100 lab-error match), the overall error rate must be somewhere in between, say 1 in 500 million, which is still, for most jurors, beyond a reasonable doubt.

[Mlodinow, 2008, p 36-37]
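The flaw in the “somewhere in between” intuition shows up in a one-line calculation. This is a minimal arithmetic sketch under an assumption of ours that the slide does not state: the two error sources are independent. With either error able to produce a false match, the overall rate is dominated by the larger one, roughly 1 in 100.

```python
p_random_match = 1e-9   # "1 in 1 billion" accidental match
p_lab_error = 1e-2      # "around 1 percent" human/lab error

# Assumption (ours): the two error sources are independent.
# P(at least one error) = 1 - P(no random match) * P(no lab error)
p_any_error = 1 - (1 - p_random_match) * (1 - p_lab_error)

print(f"{p_any_error:.6f}")  # about 0.01, i.e. 1 in 100, not 1 in 500 million
```

Even without the independence assumption, the chance of at least one error can never be smaller than the larger of the two individual error rates.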

SLIDE 85

Wait!…

  • Even if the DNA match is extremely accurate and the lab error rate is very small,
    there is also another probability concept that should be taken into account.
  • More about this later.
  • Right now, back to the notes for more properties of probability measure.

SLIDE 86

6.1 Conditional Probability

Office Hours: Rangsit Library: Tuesday 16:20-17:20; BKD3601-7: Thursday 16:00-17:00

SLIDE 87

SLIDE 88

Disease Testing

  • Suppose we have a diagnostic test for a particular disease which is 99% accurate.
  • A person is picked at random and tested for the disease.
  • The test gives a positive result.
  • Q1: What is the probability that the person actually has the disease?
  • Natural answer: 99%, because the test gets it right 99% of the time.

SLIDE 89

99% accurate test?

  • Two kinds of error
  • If you use this test on many persons with the disease, the test will indicate correctly that those persons have the disease 99% of the time.
    False negative rate = 1% = 0.01
  • If you use this test on many persons without the disease, the test will indicate correctly that those persons do not have the disease 99% of the time.
    False positive rate = 1% = 0.01

SLIDE 90

Disease Testing: The Question

  • Suppose we have a diagnostic test for a particular disease which is 99% accurate.
  • A person is picked at random and tested for the disease.
  • The test gives a positive result.
  • Q1: What is the probability that the person actually has the disease?
  • Natural answer: 99%, because the test gets it right 99% of the time.
  • Q2: Can the answer be 1% or 2%?
  • Q3: Can the answer be 50%?

SLIDE 91

Disease Testing: The Answer

Q1: What is the probability that the person actually has the disease?

A1: The answer actually depends on how common or how rare the disease is!

SLIDE 92

Why?

  • Let’s assume a rare disease.
  • The disease affects about 1 person in 10,000.
  • Try an experiment with $10^6$ people.
  • Approximately 100 people will have the disease.
  • What would the (99%-accurate) test say?

Test $10^6$ people

SLIDE 93

Results of the test

Test $10^6$ people (approximately):
  • 100 people w/ disease:
    99 of them will test positive
    1 of them will test negative
  • 999,900 people w/o disease:
    9,999 of them will test positive
    989,901 of them will test negative

SLIDE 94

Results of the test

Test $10^6$ people (approximately):
  • 100 people w/ disease:
    99 of them will test positive
    1 of them will test negative
  • 999,900 people w/o disease:
    9,999 of them will test positive
    989,901 of them will test negative

Of those who test positive, only $\dfrac{99}{99 + 9{,}999} \approx 1\%$ actually have the disease!
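The tally of $10^6$ tested people can be reproduced as expected counts. This is a minimal sketch that assumes, as on the slides, a prevalence of 1 in 10,000 and a test that is correct 99% of the time on both diseased and healthy people. The variable names are illustrative.

```python
population = 10 ** 6
p_disease = 1e-4      # "affects about 1 person in 10,000"
accuracy = 0.99       # correct on people with and without the disease

sick = round(population * p_disease)           # 100
healthy = population - sick                    # 999,900
true_pos = round(sick * accuracy)              # 99
false_pos = round(healthy * (1 - accuracy))    # 9,999

share_sick = true_pos / (true_pos + false_pos)
print(sick, healthy, true_pos, false_pos, f"{share_sick:.2%}")
```

The 9,999 false positives come from the huge healthy group, and they swamp the 99 true positives.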

SLIDE 95

Bayes’ Theorem

Using the concept of conditional probability and Bayes’ Theorem, you can show that the probability that a person will have the disease given that the test is positive is given by

$$P(D \mid TP) = \frac{(1 - p_{TE})\, p_D}{(1 - p_{TE})\, p_D + p_{TE}\,(1 - p_D)},$$

where, in our example, $p_D = 10^{-4}$ and $p_{TE} = 1 - 0.99 = 0.01$.

SLIDE 96

Bayes’ Theorem

Using the concept of conditional probability and Bayes’ Theorem, you can show that the probability $P(D \mid TP)$ that a person will have the disease given that the test result is positive is given by

$$P(D \mid TP) = \frac{(1 - p_{TE})\, p_D}{(1 - p_{TE})\, p_D + p_{TE}\,(1 - p_D)}.$$

When a different value of $p_D$ is assumed, we get a different value of $P(D \mid TP)$.

Conclusion: any value between 0 and 1 can be obtained by varying the value of $p_D$.

[Figure: $P(D \mid TP)$ versus $p_D$, both axes from 0 to 1.]
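The formula can also be run backwards to answer Q2 and Q3 directly. Setting $P(D \mid TP) = t$ and solving for $p_D$ gives $p_D = t\,p_{TE} / \big(t\,p_{TE} + (1 - p_{TE})(1 - t)\big)$. This is a minimal sketch. The helpers `posterior` and `prevalence_for` are hypothetical names, and $p_{TE} = 0.01$ is the slide's example value.

```python
def posterior(p_d: float, p_te: float = 0.01) -> float:
    """P(D | TP) for prevalence p_d and test error probability p_te."""
    true_pos = (1 - p_te) * p_d
    false_pos = p_te * (1 - p_d)
    return true_pos / (true_pos + false_pos)

def prevalence_for(target: float, p_te: float = 0.01) -> float:
    """Invert posterior(): the p_D at which P(D | TP) equals `target` (0 < target < 1)."""
    return target * p_te / (target * p_te + (1 - p_te) * (1 - target))

for p_d in (1e-6, 1e-4, 1e-2, 0.5):
    print(f"p_D = {p_d:g}: P(D|TP) = {posterior(p_d):.4f}")

for target in (0.01, 0.02, 0.5):   # Q2 and Q3
    print(f"P(D|TP) = {target}: need p_D = {prevalence_for(target):.6f}")
```

For example, a 50% answer arises exactly when $p_D = p_{TE} = 0.01$. At that prevalence the true and false positives are equally frequent.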

SLIDE 97

In log scale…

[Figure: $P(D \mid TP)$ versus $p_D$, plotted on logarithmic axes, with $p_D$ from $10^{-6}$ to 1.]

SLIDE 98

Effect of $p_{TE}$

[Figure: $P(D \mid TP)$ versus $p_D$ (both axes from 0 to 1) for three test error probabilities: $p_{TE} = 1 - 0.99 = 0.01$, $p_{TE} = 1 - 0.9 = 0.1$, and $p_{TE} = 1 - 0.5 = 0.5$.]

SLIDE 99

Wrap-up

  • Q1: What is the probability that the person actually has the disease?
  • A1: The answer actually depends on how common or how rare the disease is! (The answer depends on the value of $p_D$.)
  • Q2: Can the answer be 1% or 2%?
  • A2: Yes.
  • Q3: Can the answer be 50%?
  • A3: Yes.

SLIDE 100

Example: A Revisit

  • Roll a fair dice.
  • Sneak peek:

SLIDE 101

Prosecutor’s fallacy

  • O. J. Simpson
    At the time a well-known celebrity, famous both as a TV actor and as a retired professional football star.
  • Defense lawyer: Alan Dershowitz
    Renowned attorney and Harvard Law School professor
  • Murder case
    “one of the biggest media events of 1994–95”
    “the most publicized criminal trial in American history”

[Mlodinow, 2008, p. 119-121], [Tijms, 2007, Ex 8.7]

(the trial of a case in court) (lawyer)

SLIDE 102

The murder of Nicole

  • Nicole Brown was murdered at her home in Los Angeles on the night of June 12, 1994.
  • So was her friend Ronald Goldman.
  • The prime suspect was her (ex-)husband O.J. Simpson.
    (They were divorced in 1992.)

(suspect)

SLIDE 103

Prosecutors’ argument

  • Prosecutors* spent the first ten days of the trial entering evidence of Simpson’s history of physically abusing her and claimed that this alone was a good reason to suspect him of her murder.
  • As they put it, “a slap is a prelude to homicide.”

Prosecutor* = a government official who conducts criminal prosecutions on behalf of the state (public prosecutor)

(the side bringing the suit / the plaintiff) (homicide)

SLIDE 104

Counterargument

  • The defense attorneys argued
    that the prosecution* had spent two weeks trying to mislead the jury
    and that the evidence that O. J. had battered Nicole on previous occasions meant nothing.
  • Dershowitz’s reasoning:
    4 million women are battered annually by husbands and boyfriends in the US.
    In 1992, a total of 1,432, or 1 in 2,500, were killed by their (ex-)husbands or boyfriends.
    Therefore, few men who slap or beat their domestic partners go on to murder them.
  • True? …Yes… Convincing?

(lawyer for the defendant) (to beat)

SLIDE 105

The verdict:

Not guilty of the two murders!

The verdict was seen live on TV by more than half of the U.S. population, making it one of the most watched events in American TV history.

SLIDE 106

The Truth: Another number…

  • It is important to make use of the crucial fact that Nicole Brown was murdered.
  • The relevant number is not the probability that a man who batters his wife will go on to kill her (1 in 2,500) but rather the probability that a battered wife who was murdered was murdered by her abuser.
  • According to the Uniform Crime Reports for the United States and Its Possessions in 1993, the probability Dershowitz (or the prosecution) should have reported was this one:
    of all the battered women murdered in the United States in 1993, some 90 percent were killed by their abuser.
  • That statistic was not mentioned at the trial.

This event has happened and should be used in probability evaluation.

SLIDE 107

A Simplified Diagram

[Diagram: events “Physically abused (battered) by husband”, “Murdered by husband”, and “Murdered”.]

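SLIDE 108

Probability Comparison

[Diagram, shown twice: events “Physically abused by husband”, “Murdered by husband”, and “Murdered”.]

$P(\text{murdered by husband} \mid \text{physically abused by husband})$: 1 in 2,500 (0.04%)

$P(\text{murdered by husband} \mid \text{physically abused by husband and murdered})$: 90%

The orange event is ignored.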

SLIDE 109

The Whole Truth …

  • Dershowitz may have felt justified in misleading the jury because, in his words, “the courtroom oath—‘to tell the truth, the whole truth and nothing but the truth’—is applicable only to witnesses.

  • Defense attorneys, prosecutors, and judges don’t take this oath . . . indeed, it is fair to say the American justice system is built on a foundation of not telling the whole truth.”

[Mlodinow, The Drunkard's Walk: How Randomness Rules Our Lives]

SLIDE 110

Simpson's paradox: Berkeley gender bias case

  • The University of California, Berkeley was sued for bias against women who had applied for admission to graduate schools there.
  • The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted, and the difference was so large that it was unlikely to be due to chance.

| | Applicants | Admitted |
|---|---|---|
| Men | 8442 | 44% |
| Women | 4321 | 35% |

slide-111
SLIDE 111

Simpson's paradox: Berkeley gender bias case

26

• But when examining the individual departments, it appeared that no department was significantly biased against women.

• In fact, most departments had a "small but statistically significant bias in favor of women."

• How?

slide-112
SLIDE 112

Simpson's paradox: Berkeley gender bias case

27

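• The data from the six largest departments are listed below.

| Department | Men: Applicants | Men: Admitted | Women: Applicants | Women: Admitted |
|---|---:|---:|---:|---:|
| A | 825 | 62% | 108 | 82% |
| B | 560 | 63% | 25 | 68% |
| C | 325 | 37% | 593 | 34% |
| D | 417 | 33% | 375 | 35% |
| E | 191 | 28% | 393 | 24% |
| F | 272 | 6% | 341 | 7% |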
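Pooling these six departments reproduces the paradox. A minimal sketch: it multiplies each applicant count by the listed admission percentage, so because those percentages are rounded, the pooled admitted counts are only approximate. The names `DEPARTMENTS` and `pooled_rate` are illustrative.

```python
# Six largest departments, as listed on the slide:
# department -> (men applicants, men admit rate, women applicants, women admit rate)
DEPARTMENTS = {
    "A": (825, 0.62, 108, 0.82),
    "B": (560, 0.63, 25, 0.68),
    "C": (325, 0.37, 593, 0.34),
    "D": (417, 0.33, 375, 0.35),
    "E": (191, 0.28, 393, 0.24),
    "F": (272, 0.06, 341, 0.07),
}


def pooled_rate(pairs):
    """Pooled admission rate = total (approximate) admitted / total applicants."""
    admitted = sum(n * rate for n, rate in pairs)
    applicants = sum(n for n, _ in pairs)
    return admitted / applicants


men = [(m, rm) for m, rm, _, _ in DEPARTMENTS.values()]
women = [(w, rw) for _, _, w, rw in DEPARTMENTS.values()]

# Departments in which women were admitted at a higher rate than men.
favors_women = [d for d, (_, rm, _, rw) in DEPARTMENTS.items() if rw > rm]

print(f"Pooled rate, men:   {pooled_rate(men):.1%}")
print(f"Pooled rate, women: {pooled_rate(women):.1%}")
print("Departments where women's rate is higher:", favors_women)
```

Run as-is, it reports pooled rates of about 46% for men and 30% for women, even though women's rate is higher in departments A, B, D, and F. The reversal comes from the applicant mix: most women applied to departments C through F, where admission rates are low for everyone, while more than half of the men applied to A and B.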

slide-113
SLIDE 113

Simpson's paradox: Berkeley gender bias case: Conclusion

28

• Women tended to apply to competitive departments with low rates of admission even among qualified applicants (such as in the English Department).

• Men tended to apply to less-competitive departments with high rates of admission among the qualified applicants (such as in engineering and chemistry).


slide-114
SLIDE 114
Asst. Prof. Dr. Prapun Suksompong

prapun@siit.tu.ac.th

6.2 Independence

1

Probability and Random Processes

ECS 315

Office Hours: Rangsit Library: Tuesday 16:20-17:20; BKD3601-7: Thursday 16:00-17:00

slide-115
SLIDE 115

Sally Clark

2

[http://www.sallyclark.org.uk/] [http://en.wikipedia.org/wiki/Sally_Clark] [http://www.timesonline.co.uk/tol/comment/obituaries/article1533755.ece]

slide-116
SLIDE 116

Sally Clark

3

• Falsely accused of the murder of her two sons.

• Clark's first son died suddenly within a few weeks of his birth in 1996.

• After her second son died in a similar manner, she was arrested in 1998 and tried for the murder of both sons.

• The case went to appeal, but the convictions and sentences were confirmed in 2000.

• Released in 2003 by the Court of Appeal.

• Wrongfully imprisoned for more than 3 years.

• Never fully recovered from the effects of this appalling miscarriage of justice.

slide-117
SLIDE 117

Misuse of statistics in the courts

4

• Her prosecution was controversial due to statistical evidence.

• This evidence was presented by a medical expert witness, Professor Sir Roy Meadow.

• Meadow testified that the frequency of sudden infant death syndrome (SIDS, or “cot death”) in families having some of the characteristics of the defendant’s family is 1 in 8500.

• He went on to square this figure to obtain a value of 1 in 73 million for the frequency of two cases of SIDS in such a family.

$$\left(\frac{1}{8500}\right)^{2} \approx 1.4 \times 10^{-8}$$

slide-118
SLIDE 118

Royal Statistical Society

5

• “This approach is, in general, statistically invalid.”

• “It would only be valid if SIDS cases arose independently within families, an assumption that would need to be justified empirically.”

• “There are very strong a priori reasons for supposing that the assumption will be false.”

• “There may well be unknown genetic or environmental factors that predispose families to SIDS, so that a second case within the family becomes much more likely.”

[http://www.rss.org.uk]
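With $A_i$ denoting the event that the $i$-th child dies of SIDS (notation introduced here), the chain rule always gives $P(A_1 \cap A_2) = P(A_1)\,P(A_2 \mid A_1)$. Squaring replaces $P(A_2 \mid A_1)$ by $P(A_2) = 1/8500$, which is valid only if the two deaths are independent. The sketch contrasts the two; the conditional probability of 1 in 100 is purely hypothetical, chosen only to show how much a dependence of the kind the RSS describes can change the answer. (Squaring 1/8500 exactly gives 1 in 72,250,000; the quoted 1 in 73 million presumably comes from a slightly less rounded single-case frequency.)

```python
from fractions import Fraction

P_FIRST = Fraction(1, 8500)  # frequency of one SIDS death in such a family, as testified


def p_two_cases(p_first, p_second_given_first):
    """Chain rule: P(A1 and A2) = P(A1) * P(A2 | A1). Valid with or without independence."""
    return p_first * p_second_given_first


# Meadow's calculation: assumes independence, i.e. P(A2 | A1) = P(A2) = P(A1).
independent = p_two_cases(P_FIRST, P_FIRST)

# HYPOTHETICAL dependence, for illustration only: suppose a first SIDS death
# raised the chance of a second to 1 in 100.
dependent = p_two_cases(P_FIRST, Fraction(1, 100))

print("independence:", independent)          # 1/72250000
print("hypothetical dependence:", dependent)  # 1/850000
```

Under this illustrative assumption the joint probability is 85 times larger than the squared figure, which is why the independence assumption has to be justified rather than taken for granted.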

slide-119
SLIDE 119

Aftermath

6

• Clark's release in January 2003 prompted the Attorney General to order a review of hundreds of other cases.

• Two other women convicted of murdering their children had their convictions overturned and were released from prison.

• Trupti Patel, who was also accused of murdering her three children, was acquitted in June 2003.

• In each case, Roy Meadow had testified about the unlikelihood of multiple cot deaths in a single family.

slide-120
SLIDE 120

How Juries Are Fooled by Statistics

7

• By Peter Donnelly, Professor of Statistical Science (Dept. of Statistics) at the University of Oxford

http://www.youtube.com/watch?v=kLmzxmRcUTo

http://www.stats.ox.ac.uk/people/academic_staff/peter_donnelly

@ 11:15-13:50: Disease Testing; @ 13:50-18:30: Sally Clark

slide-121
SLIDE 121

Prosecutor’s Fallacy

8

• Aside from its invalidity, figures such as the 1 in 73 million are very easily misinterpreted.

• Some press reports at the time stated that this was the chance that the deaths of Sally Clark's two children were accidental.

• This (mis)interpretation is a serious error of logic known as the Prosecutor's Fallacy.

• The jury needs to weigh up two competing explanations for the babies' deaths: (1) SIDS or (2) murder.

• Two deaths by SIDS and two murders are each quite unlikely, but one of them has apparently happened in this case.

• What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation (in this case SIDS, according to the evidence as presented).
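If the two explanations are treated as mutually exclusive and as the only ones on the table, conditioning on "one of them happened" gives $P(\text{SIDS} \mid \text{two deaths}) = p_S/(p_S + p_M)$, where $p_S$ and $p_M$ are the probabilities of two SIDS deaths and of two murders (symbols introduced here). The sketch takes $p_S$ as the 1 in 73 million figure; the values of $p_M$ are hypothetical, chosen only to show that a tiny $p_S$ says nothing on its own.

```python
from fractions import Fraction


def p_sids_given_two_deaths(p_double_sids, p_double_murder):
    """P(SIDS explanation | two deaths occurred).

    Assumes the two explanations are mutually exclusive and the only ones
    considered, so we condition on their union.
    """
    return p_double_sids / (p_double_sids + p_double_murder)


P_DOUBLE_SIDS = Fraction(1, 73_000_000)  # the figure presented at trial

# HYPOTHETICAL double-murder probabilities, for illustration only:
# equally rare, and ten times rarer than double SIDS.
for p_double_murder in (P_DOUBLE_SIDS, P_DOUBLE_SIDS / 10):
    print(p_sids_given_two_deaths(P_DOUBLE_SIDS, p_double_murder))
# prints 1/2, then 10/11
```

In both cases the probability that the deaths were natural is far from 1 in 73 million; it depends on how the two small probabilities compare, not on how small one of them is.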

slide-122
SLIDE 122
6.3 Bernoulli Trials

9

slide-123
SLIDE 123

Kakashi and Gai are eternal rivals.

10

slide-124
SLIDE 124
Discrete Random Variable

1

slide-125
SLIDE 125

X  Uniform({1,2,…,6})

2

Roll a fair die. Record the result.

Again, roll a fair die. Record the result. (Repeat.)

Generate X 200 times. Put the results in a table of size 20×10.
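The 20×10 table can be sketched in Python with the standard `random` module. The helper `roll_table` and the seed value are illustrative choices, not part of the course material; in the deck's own MATLAB, `randi(6,20,10)` would be the analogous call (the larger experiment later uses `randi(6,100,100)`).

```python
import random


def roll_table(rows=20, cols=10, seed=None):
    """Roll a fair six-sided die rows*cols times and arrange the results as a rows x cols table."""
    rng = random.Random(seed)  # separate generator; the seed only makes runs reproducible
    # randint(1, 6) includes both endpoints, so every face 1..6 is equally likely
    return [[rng.randint(1, 6) for _ in range(cols)] for _ in range(rows)]


X = roll_table(seed=315)  # hypothetical seed
for row in X:
    print(" ".join(str(v) for v in row))
```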

slide-126
SLIDE 126

Histogram

3

[N, x] = hist(reshape(X,1,prod(size(X))),1:6);
bar(x,N)
grid on

[Figure: bar chart of frequency vs. x]
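In Python, `collections.Counter` can play the role of `hist` here: it counts how many of the 200 entries equal each face. The helper name `histogram_counts`, the seed, and the text-based bar chart (a stand-in for `bar`) are illustrative assumptions.

```python
import random
from collections import Counter


def histogram_counts(table, faces=range(1, 7)):
    """Return (N, x): N[i] is how many entries of the 2-D table equal face x[i]."""
    counts = Counter(v for row in table for v in row)  # flatten, like reshape(X,1,...)
    x = list(faces)
    N = [counts[f] for f in x]  # a Counter returns 0 for faces that never appeared
    return N, x


rng = random.Random(315)  # hypothetical seed
X = [[rng.randint(1, 6) for _ in range(10)] for _ in range(20)]
N, x = histogram_counts(X)
for face, n in zip(x, N):
    print(face, "#" * n)  # one '#' per occurrence: a text stand-in for bar(x,N)
```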

slide-127
SLIDE 127

Relative Frequency

4

rf = N/prod(size(X));
bar(x,rf)
grid on

stem(x,rf,'filled','LineWidth',1.5)
grid on

[Figure: relative frequency vs. x, shown as a bar chart and as a stem plot]
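Relative frequency is just each count divided by the total number of samples, the Python counterpart of `rf = N/prod(size(X))`. A minimal sketch; `relative_frequency` and the seed are illustrative.

```python
import random
from collections import Counter


def relative_frequency(samples, faces=range(1, 7)):
    """rf[i] = (number of samples equal to faces[i]) / (total number of samples)."""
    counts = Counter(samples)
    total = len(samples)
    return [counts[f] / total for f in faces]


rng = random.Random(315)  # hypothetical seed
samples = [rng.randint(1, 6) for _ in range(200)]  # the 200 rolls, already flattened
rf = relative_frequency(samples)
print([round(p, 3) for p in rf])  # each entry should be near 1/6, about 0.167
```

Because the relative frequencies always sum to 1, they can be compared directly with the probabilities $1/6$ of the uniform distribution.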

slide-128
SLIDE 128

With a larger number of samples

5

rf = N/prod(size(X));
bar(x,rf)
grid on

stem(x,rf,'filled','LineWidth',1.5)
grid on

[Figure: relative frequency vs. x, shown as a bar chart and as a stem plot]

X = randi(6,100,100);
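The larger table (`randi(6,100,100)`, i.e. 10,000 rolls) brings the relative frequencies closer to 1/6. One rough way to see this is to track the largest gap between any face's relative frequency and 1/6 as the number of rolls grows. The helper `max_gap_from_uniform`, its seed, and the 100,000-roll case are illustrative assumptions.

```python
import random
from collections import Counter


def max_gap_from_uniform(n, seed=None):
    """Roll a fair die n times; return the max over faces of |relative frequency - 1/6|."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n))
    return max(abs(counts[f] / n - 1 / 6) for f in range(1, 7))


# 200 rolls (20x10), 10,000 rolls (100x100), and a larger hypothetical run.
for n in (200, 10_000, 100_000):
    print(f"n = {n:>7}: max gap = {max_gap_from_uniform(n, seed=315):.4f}")
```

The gap typically shrinks roughly like $1/\sqrt{n}$, since the standard deviation of a relative frequency is $\sqrt{p(1-p)/n}$, though any single seeded run can fluctuate.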