Probability and Random Processes
ECS 315
Asst. Prof. Dr. Prapun Suksompong
prapun@siit.tu.ac.th
1 Probability and You
Office Hours: Rangsit Library: Tuesday 16:20-17:20; BKD3601-7: Thursday 14:40-16:00
The subject of probability can be traced back to the 17th century.
The range of applications extends beyond games into business …
The stock market, “the largest casino in the world,” cannot …
The telephone network, call centers, and airline …
http://ipod.about.com/od/advanceditunesuse/a/itunes-random.htm
http://electronics.howstuffworks.com/ipod-shuffle2.htm
http://www.cnet.com.au/itunes-just-how-random-is-random-339274094.htm
Penny = 1 cent
Nickel = 5 cents
Dime = 10 cents
Quarter = 25 cents
[Figure: relative frequency of heads $N_A(n)/n$ (the fraction of the first $n$ tosses that are heads) against $n$, for $n = 1, 2, \ldots, 10$; $n = 1, 2, \ldots, 100$; $n = 1, 2, \ldots, 1000$; and $n = 1, 2, \ldots, 10^6$.]
If a fair coin is flipped a large number of times, the proportion of heads will tend to get closer to 1/2 as the number of tosses increases.
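The statement above can be checked with a quick simulation. This is a minimal sketch that assumes Python's `random` module is an adequate stand-in for a fair coin; the seed value is arbitrary and only makes runs reproducible.

```python
import random

def proportion_of_heads(n, seed=0):
    """Flip a simulated fair coin n times and return the fraction of heads."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

for n in (10, 100, 1000, 10**6):
    print(n, proportion_of_heads(n))
```

For small $n$ the proportion can be far from 1/2; by $n = 10^6$ it is typically within about 0.001 of it.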
[Figures: further coin-tossing simulation plots against the number of tosses $n$, at scales from $n \le 10$ up to $n \le 10^6$.]
From a group of Stanford researchers
http://gajitz.com/up-in-the-air-coin-tosses-not-as-neutral-as-you-think/
http://www.codingthewheel.com/archives/the-coin-flip-a-fundamentally-unfair-proposition
http://www-stat.stanford.edu/~susan/papers/headswithJ.pdf
In drawing a card from a deck, there are 52 equally likely outcomes.
Historically, dice is the plural of die. In modern standard English, dice is used as both the singular and the plural.
Example of 19th Century bone dice
[ http://gmdice.com/ ]
http://www.dicesimulator.com/ Supports up to 6 dice and also has some background …
A pair of dice
Double six
[ http://www2.whidbey.net/ohmsmath/webwork/javascript/dice2rol.htm ]
Assume that the two dice are fair and independent. P[sum of the two dice = 5] = 4/36
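The value 4/36 comes from counting: with two fair, independent dice all 36 ordered pairs are equally likely, and exactly four of them sum to 5. A small enumeration sketch makes the count explicit (the variable names are illustrative).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die) for two fair, independent dice.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) == 5]
p_sum_5 = Fraction(len(favourable), len(outcomes))
print(favourable)   # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(p_sum_5)      # 1/9, i.e. 4/36
```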
Assume that the two dice are fair and independent.
Mathematics of Choice
By Ivan Niven: permutations, combinations, …
Cent mille milliards de poèmes (A Hundred Thousand Billion Poems)
Jack is so busy that he's always throwing his socks into his top drawer without pairing them. One morning Jack oversleeps. In his haste to get ready for school (and still a bit sleepy), he reaches into his drawer and pulls out 2 socks.
Jack knows that 4 blue socks, 3 green socks, and 2 tan socks are in his drawer.
… his blue slacks?
… socks?
[Greenes, 1977]
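The two questions are cut off in the slide text, so this sketch assumes they ask for (1) the probability that both socks are blue, to match his blue slacks, and (2) the probability that the two socks form a matching pair of any colour. Every unordered pair of the 9 socks is taken as equally likely.

```python
from fractions import Fraction
from math import comb

socks = {"blue": 4, "green": 3, "tan": 2}
total = sum(socks.values())          # 9 socks in the drawer
pairs = comb(total, 2)               # 36 equally likely unordered pairs

# Assumed question 1: both socks blue.
p_both_blue = Fraction(comb(socks["blue"], 2), pairs)
# Assumed question 2: both socks the same colour (blue, green, or tan).
p_matching = Fraction(sum(comb(k, 2) for k in socks.values()), pairs)

print(p_both_blue)   # 1/6
print(p_matching)    # 5/18
```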
Probability theory was originally inspired by gambling.
In 1654, Chevalier de Méré invented a gambling system …
When he began losing money, he asked his mathematician …
Pascal discovered that the Chevalier's system would lose …
Pascal became so interested in probability and together with …
… best known for Fermat's Last Theorem [http://www.youtube.com/watch?v=MrVD4q1m1Vo]
Take five red cards and two black cards from a pack. Ask your friend to shuffle them and then, without looking at the faces, lay them out in a row.
Bet that they can’t turn over three red cards. The probability that they CAN do it is
$$\frac{\binom{5}{3}}{\binom{7}{3}} = \frac{\frac{5!}{3!\,2!}}{\frac{7!}{3!\,4!}} = \frac{5 \cdot 4 \cdot 3}{7 \cdot 6 \cdot 5} = \frac{2}{7}.$$
[Lovell, 2006]
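The ratio of binomial coefficients can be confirmed by brute force: list every set of 3 positions among the 7 face-down cards (all equally likely after a fair shuffle) and count the sets that hold only red cards. A minimal sketch:

```python
from fractions import Fraction
from itertools import combinations

cards = ["R"] * 5 + ["B"] * 2   # five red, two black
# The friend turns over 3 of the 7 face-down cards; every 3-subset is equally likely.
hands = list(combinations(range(len(cards)), 3))
all_red = sum(all(cards[i] == "R" for i in hand) for hand in hands)
p = Fraction(all_red, len(hands))
print(all_red, len(hands), p)   # 10 35 2/7
```

Since 2/7 is below 1/2, the bet favours the person offering it.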
Fingers’ oily smear on the …
Different apps give different …
Latent smudges may be usable …
[http://www.ijsmblog.com/2011/02/ipad-finger-smudge-art.html]
[Figure: finger-smudge patterns left by Fruit Ninja, Facebook, Angry Birds, and Mail]
[http://www.andrewooleryart.com/collections/ipad-abstract-art/products/mail-abstract-art-polychrome]
[http://lifehacker.com/5813533/why-you-should-repeat-one-digit-in-your-phones-4+digit-lockscreen-pin]
Touchscreen smudge may give away your password/passcode.
Four distinct fingerprints reveal the four numbers used for …
[http://www.engadget.com/2010/08/16/shocker-touchscreen-smudge-may-give-away-your-android-password/2]
Unknown numbers:
The number of different 4-digit passcodes = $10^4$
Exactly four different numbers:
The number of different 4-digit passcodes = $4! = 24$
Exactly three different numbers:
The number of different 4-digit passcodes = $\binom{3}{1} \times (4 \times 3) = 36$
Choose the number that will be repeated; then choose the locations of the two non-repeated numbers.
Passcodes of users of the Big Brother Camera Security iPhone app:
15% of all passcode sets were represented by only 10 …
[http://amitay.us/blog/files/most_common_iphone_passcodes.php (2011)]
Decipher the keypad's code by the heat left on the buttons. Here's the keypad viewed with your thermal camera.
The code is 1456.
University of California San Diego: the researchers have shown that codes can be easily discerned from quite a distance (at least seven metres away), and image-analysis software can automatically find the correct code in more than half of cases even one minute after the code has been entered.
This figure rose to more than eighty percent if the thermal camera was used immediately after the code was entered.
“… Characterizing the Efficacy of Thermal-Camera Based Attacks”. Proceedings of WOOT 2011.
http://cseweb.ucsd.edu/~kmowery/papers/thermal.pdf
http://wordpress.mrreid.org/2011/08/27/hacking-pin-pads-using-thermal-vision/
How many people do you need to assemble before the …
Birthdays consist of a month and a day with no year attached.
Ignore February 29, which only comes in leap years.
Assume that every day is as likely as any other to be someone’s birthday.
In a group of r people, what is the probability that two or more of them have the same birthday?
Probability that there are at least two people who have the same birthday:
$$p(r) = \begin{cases} 1 - \underbrace{\dfrac{365}{365}\cdot\dfrac{364}{365}\cdots\dfrac{365-r+1}{365}}_{r \text{ terms}}, & \text{if } 0 \le r \le 365,\\[1ex] 1, & \text{if } r \ge 366. \end{cases}$$
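The product formula is easy to evaluate numerically. This sketch uses the same assumptions as the slide (365 equally likely days, February 29 ignored) and searches for the smallest group size whose probability exceeds 1/2.

```python
from math import prod

def p_shared_birthday(r, days=365):
    """P(at least two of r people share a birthday), equally likely days, no Feb 29."""
    if r > days:
        return 1.0
    p_all_distinct = prod((days - k) / days for k in range(r))   # r terms
    return 1.0 - p_all_distinct

smallest = next(r for r in range(1, 366) if p_shared_birthday(r) > 0.5)
print(smallest, round(p_shared_birthday(smallest), 4))   # 23 0.5073
```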
With 88 people, the probability is greater than 1/2 of having three people with the same birthday.
187 people give a probability greater than 1/2 of four people having the same birthday.
How many people do you need to assemble before the …
In a group of r people, what is the probability that at least one …
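The question above is cut off in the slide text. Assuming it asks for the probability that at least one of the r people shares *your* birthday (with the same 365-equally-likely-days model), each person independently misses your birthday with probability 364/365, which gives the sketch below.

```python
def p_someone_shares_mine(r, days=365):
    # Assumed reading of the question: at least one of r others has your birthday.
    # Each person independently misses it with probability (days - 1) / days.
    return 1.0 - ((days - 1) / days) ** r

smallest = next(r for r in range(1, 10_000) if p_someone_shares_mine(r) > 0.5)
print(smallest)   # 253
```

Far more people are needed than for the "any two people" version, because only matches with one fixed date count.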
Unknown numbers:
The number of different 4-digit passcodes = $10^4$
Exactly four different numbers:
The number of different 4-digit passcodes = $4! = 24$
Exactly three different numbers:
The number of different 4-digit passcodes = $\binom{3}{1} \times (4 \times 3) = 36$
Exactly two different numbers:
The number of different 4-digit passcodes = $\binom{4}{3} + \binom{4}{2} + \binom{4}{1} = 4 + 6 + 4 = 14$
Exactly one number:
The number of different 4-digit passcodes = 1
Check:
$$\binom{10}{4}\cdot 24 + \binom{10}{3}\cdot 36 + \binom{10}{2}\cdot 14 + \binom{10}{1}\cdot 1 = 5040 + 4320 + 630 + 10 = 10{,}000$$
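The counts 24, 36, 14, and 1, and the check that they add up to $10^4$, can be verified by brute force over all 4-digit passcodes. The helper `codes_using_exactly` is a name introduced here for illustration: it counts codes that use every digit of a given set and no other digit, which is what the smudges reveal.

```python
from collections import Counter
from itertools import product
from math import comb

# Tally all 10^4 four-digit passcodes by how many distinct digits they use.
by_distinct = Counter(len(set(code)) for code in product(range(10), repeat=4))

# Passcodes that use *exactly* a given set of k digits (what the smudges reveal).
def codes_using_exactly(digits):
    return sum(set(code) == set(digits) for code in product(digits, repeat=4))

per_set = {k: codes_using_exactly(range(k)) for k in range(1, 5)}
print(per_set)   # {1: 1, 2: 14, 3: 36, 4: 24}
# Each tally is C(10, k) possible digit sets times the per-set count.
assert all(by_distinct[k] == comb(10, k) * per_set[k] for k in per_set)
print(sum(by_distinct.values()))   # 10000
```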
[ http://en.wikipedia.org/wiki/Poker_probability ] Need more practice?
$$(x_1+y_1)(x_2+y_2) = x_1x_2 + x_1y_2 + y_1x_2 + y_1y_2$$
$$(x_1+y_1)(x_2+y_2)(x_3+y_3) = x_1x_2x_3 + x_1x_2y_3 + x_1y_2x_3 + x_1y_2y_3 + y_1x_2x_3 + y_1x_2y_3 + y_1y_2x_3 + y_1y_2y_3$$
Setting $x_1 = x_2 = x_3 = x$ and $y_1 = y_2 = y_3 = y$:
$$(x+y)^2 = (x+y)(x+y) = xx + xy + yx + yy = x^2 + 2xy + y^2$$
$$(x+y)^3 = (x+y)(x+y)(x+y) = xxx + xxy + xyx + xyy + yxx + yxy + yyx + yyy = x^3 + 3x^2y + 3xy^2 + y^3$$
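The expansion pattern generalizes: each term picks either $x$ or $y$ from every factor, and the coefficient of $x^{n-k}y^k$ is the number of such words containing $k$ copies of $y$, namely $\binom{n}{k}$. A small sketch that builds the words and collects like terms (`expand_words` is an illustrative name):

```python
from collections import Counter
from itertools import product
from math import comb

def expand_words(n):
    """Expand (x+y)^n factor by factor: one word per choice of x or y from each factor."""
    return ["".join(w) for w in product("xy", repeat=n)]

words = expand_words(3)
print(words)   # ['xxx', 'xxy', 'xyx', 'xyy', 'yxx', 'yxy', 'yyx', 'yyy']

# Collecting like terms: the coefficient of x^(n-k) y^k is the number of words with k y's.
coeffs = Counter(w.count("y") for w in words)
print([coeffs[k] for k in range(4)])   # [1, 3, 3, 1]
assert all(coeffs[k] == comb(3, k) for k in range(4))
```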
Suppose that two people are separately asked to toss a fair coin 120 times …
Results: Two lists of compiled zeros and ones:
[Tijms, 2007, p 192]
Which list is more likely?
[Tijms, 2007, p 192]
Fact: One of the two individuals has cheated and has …
Which list is more likely to be the fabricated list?
[Tijms, 2007, p 192]
Fact: In 120 tosses of a fair coin, there is a very large probability …
The probability of this is approximately 0.9865.
In contrast to the second list, the first list shows no such sequence: its longest sequence of either heads or tails consists of three in a row.
In 120 tosses of a fair coin, the probability that the longest sequence consists of three or fewer in a row is 0.000053, which is extremely small.
Thus, the first list is almost certainly a fake. Most people tend to avoid noting long sequences of consecutive …
[Tijms, 2007, p 192]
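Both probabilities quoted above can be computed exactly by counting heads/tails sequences whose longest run stays below a limit. This is a sketch of one such method, a small dynamic program over the length of the current run, with exact integer arithmetic; it is not necessarily how the cited source obtained the numbers.

```python
from fractions import Fraction

def count_max_run_at_most(n, k):
    """Number of length-n H/T sequences in which no run of identical symbols exceeds k."""
    # ending[j] = number of valid sequences whose final run has length j+1 (j = 0..k-1)
    ending = [2] + [0] * (k - 1)          # n = 1: "H" or "T", each a run of length 1
    for _ in range(n - 1):
        switch = sum(ending)              # start a new run with the other symbol
        ending = [switch] + ending[:-1]   # or extend the current run (not past length k)
    return sum(ending)

def prob_max_run_at_most(n, k):
    return Fraction(count_max_run_at_most(n, k), 2 ** n)

p_le_3 = prob_max_run_at_most(120, 3)       # longest run is 3 or fewer
p_ge_5 = 1 - prob_max_run_at_most(120, 4)   # some run of 5 or more
print(float(p_le_3))   # about 5.3e-05
print(float(p_ge_5))   # about 0.9865
```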
Entertaining Mathematical Puzzles
By Martin Gardner (1914-2010). It includes a mixture of old and new …
Carefully explained solutions follow …
At 10:14 into the video, Mlodinow shows three probabilities. Can you derive the first two? http://www.youtube.com/watch?v=F0sLuRsu1Do [Mlodinow, 2008, p. 180-181]
Andrey Nikolaevich Kolmogorov: Soviet Russian mathematician who advanced various scientific fields, including probability theory, topology, classical mechanics, and computational complexity.
1922: Constructed a Fourier series that diverges almost everywhere, gaining international recognition.
1933: Published the book Foundations of the Theory of Probability, laying the modern axiomatic foundations of probability theory and establishing his reputation as the world's leading living expert in this field.
Rick Durrett, Eugene Dynkin, Philip Protter, Gennady Samorodnitsky, Terrence Fine, Xing Guo, Toby Berger
Daniel Kahneman: Israeli-American psychologist; 2002 Nobel laureate in Economics.
Hebrew University, Jerusalem, Israel. Professor emeritus of psychology and public affairs …
With Amos Tversky, Kahneman studied and …
K&T presented this description to a group of 88 subjects and …
[Daniel Kahneman, Paul Slovic, and Amos Tversky, eds., Judgment under Uncertainty: Heuristics and Biases (Cambridge: Cambridge University Press, 1982), pp. 90–98.]
Imagine a woman named Linda, 31 years old, single, outspoken, and very bright. In college she majored in philosophy. While a student she was deeply concerned with discrimination and …
[outspoken = given to expressing yourself freely or insistently]
Here are the results, from most to least probable:
[feminist = of or relating to or advocating equal rights for women]
At first glance there may appear to be nothing unusual in … representative of an active feminist and unrepresentative of a bank teller or an insurance salesperson.
[Figure: the statements ranked from most probable to least likely]
Let’s focus on just three of the possibilities and their average …
This is the order in which 85 percent of the respondents …
If nothing about this looks strange, then K&T have fooled you.
K&T were not surprised by the result because they had given …
So they presented the description of Linda to another group, …
Linda is active in the feminist movement.
Linda is a bank teller and is active in the feminist movement.
Linda is a bank teller.
To their surprise, 87 percent of the subjects in this trial also …
If the details we are given fit our mental picture of …
… even though any act of adding less-than-certain details to a conjecture makes the conjecture less probable.
Even highly trained doctors make this error when analyzing symptoms.
91 percent of the doctors fall prey to the same bias.
[Amos Tversky and Daniel Kahneman, “Extensional versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment,” Psychological Review 90, no. 4 (October 1983): 293–315.]
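The rule these subjects violate is the conjunction rule. As a short derivation, with $A$ and $B$ generic events (the Linda statements are one instance):

```latex
% A \cap B is contained in A, so by monotonicity of probability:
A \cap B \subseteq A \;\Longrightarrow\; P(A \cap B) \le P(A).
% Equivalently, when P(A) > 0:
P(A \cap B) = P(A)\,P(B \mid A) \le P(A), \quad \text{because } P(B \mid A) \le 1.
% Instance: A = "Linda is a bank teller",
%           B = "Linda is active in the feminist movement".
```

However well the extra detail fits the description, "bank teller and feminist" can never be more probable than "bank teller" alone.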
Pages 34-37, Tversky and Shafir …
Which is greater: the number of six-letter English words having “n” as their fifth letter, or the number of six-letter English words ending in “-ing”?
Most people choose the group of words ending in “-ing”. Why? Because words ending in “-ing” are easier to think of than generic six-letter words having “n” as their fifth letter.
The group of six-letter words having “n” as their fifth letter includes all six-letter words ending in “-ing”.
Psychologists call this type of mistake the availability bias.
In reconstructing the past, we give unwarranted importance to memories that are most vivid and hence most available for retrieval.
[Amos Tversky and Daniel Kahneman, “Availability: A Heuristic for Judging Frequency and Probability,” Cognitive Psychology 5 (1973): 207–32.]
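The subset argument can be made concrete in a few lines: any six-letter word ending in "-ing" has "n" in position 5, so for every vocabulary the "-ing" group can be no larger. The word list below is a hypothetical sample chosen for illustration, not a real frequency count.

```python
def fifth_letter_n(word):
    return len(word) == 6 and word[4] == "n"

def ends_in_ing(word):
    return len(word) == 6 and word.endswith("ing")

# Hypothetical sample vocabulary, for illustration only.
words = ["acting", "boring", "casing", "absent", "advent", "garden", "parent", "little"]

ing = {w for w in words if ends_in_ing(w)}
fifth_n = {w for w in words if fifth_letter_n(w)}
print(sorted(ing))       # ['acting', 'boring', 'casing']
print(sorted(fifth_n))   # ['absent', 'acting', 'advent', 'boring', 'casing', 'parent']
assert ing <= fifth_n    # every "-ing" word also has "n" as its fifth letter
```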
It is not uncommon for experts in DNA analysis to testify at a …
How certain are such matches? When DNA evidence was first introduced, a number of experts …
Today DNA experts regularly testify that the odds of a random person’s matching the crime sample are less than 1 in 1 …
In Oklahoma a court sentenced a man named Timothy Durham to more than 3,100 years in prison even though eleven witnesses had placed him in another state at the time of the crime.
[Mlodinow, 2008, p 36-37]
There is another statistic that is often not presented to the … reporting results.
Each of these errors is rare but not nearly as rare as a random match.
The Philadelphia City Crime Laboratory admitted that it had swapped the reference sample of the defendant and the victim in a rape case.
A testing firm called Cellmark Diagnostics admitted a similar error.
[Mlodinow, 2008, p 36-37]
It turned out that in the initial analysis the lab had failed to …
A later retest turned up the error, and Durham was released …
[Mlodinow, 2008, p 36-37]
Estimates of the error rate due to human causes vary, but …
Most jurors assume that, given the two types of error, the 1 …
[Mlodinow, 2008, p 36-37]
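Why a rare lab error matters more than an ultra-rare random match can be seen with a two-line calculation. Both rates below are hypothetical placeholders (the source's own figures are cut off), and treating the two kinds of error as independent is an added assumption.

```python
# Hypothetical rates for illustration only (not figures from the source):
p_random_match = 1e-9    # chance that an unrelated person's DNA matches
p_lab_error = 1e-2       # chance that the lab mishandles or misreports a sample

# The reported match is wrong if either error occurs (assuming the two are independent).
p_wrong = 1 - (1 - p_random_match) * (1 - p_lab_error)
print(p_wrong)   # about 0.01: essentially p_lab_error; the random-match rate barely matters
```

Quoting only the random-match probability therefore understates the chance of a wrong match by many orders of magnitude.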
Even if the DNA match were extremely accurate and the lab …
there is also another probability concept that should be taken into account.
More about this later. Right now, back to the notes for more properties of probability.
Office Hours: Rangsit Library: Tuesday 16:20-17:20 BKD3601-7: Thursday 16:00-17:00
Suppose we have a diagnostic test for a particular disease …
A person is picked at random and tested for the disease. The test gives a positive result. Q1: What is the probability that the person actually has the disease?
Natural answer: 99%, because the test gets it right 99% of the time.
Two kinds of error:
If you use this test on many persons with the disease, the …
False negative rate $= P(\text{test negative} \mid \text{disease}) = 1\% = 0.01$
If you use this test on many persons without the disease, the …
False positive rate $= P(\text{test positive} \mid \text{no disease}) = 1\% = 0.01$
Q2: Can the answer be 1% or 2%? Q3: Can the answer be 50%?
Let’s assume a rare disease.
The disease affects about 1 person in 10,000.
Try an experiment with $10^6$ people. Approximately 100 people will have the disease. What would the (99%-accurate) test say?
Test $10^6$ people
Approximately:
100 people w/ disease: 99 of them will test positive; 1 of them will test negative.
999,900 people w/o disease: 989,901 of them will test negative; 9,999 of them will test positive.
Of those who test positive, only $\frac{99}{99 + 9{,}999} \approx 1\%$ actually have the disease!
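The head count above is Bayes' rule in disguise. This sketch assumes, as in the example, that the false positive and false negative rates are both equal to one `error_rate` (1%), and takes the disease prevalence `p_d` as a parameter.

```python
def p_disease_given_positive(p_d, error_rate=0.01):
    """Bayes' rule for P(D | test positive) with equal false-positive and false-negative rates."""
    true_pos = (1 - error_rate) * p_d          # P(positive and diseased)
    false_pos = error_rate * (1 - p_d)         # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

print(p_disease_given_positive(1e-4))   # about 0.0098, matching 99 / (99 + 9,999)
print(p_disease_given_positive(1e-2))   # 0.5 when p_d = 0.01
```

Varying `p_d` shows how strongly the answer to Q1 depends on how common the disease is.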
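$$P(D \mid TP) = \frac{(1-TE)\,p_D}{(1-TE)\,p_D + TE\,(1-p_D)} = \frac{1}{1 + \dfrac{TE}{1-TE}\cdot\dfrac{1-p_D}{p_D}}$$
where $p_D = P(D)$ is the fraction of the population with the disease, $TP$ is the event that the test is positive, and $TE$ is the test error rate (1% here).
[Figure: $P(D \mid TP)$ plotted against $p_D$, on a logarithmic scale and on a linear scale.]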
Q1: What is the probability that the person actually has the disease?
A1: The answer actually depends on how common or how rare the disease is.
Q2: Can the answer be 1% or 2%? A2:
Q3: Can the answer be 50%? A3:
Roll a fair dice. Sneak peek:
O. J. Simpson
At the time, a well-known celebrity, famous both as a TV actor and as a retired professional football star.
Defense lawyer: Alan Dershowitz
Renowned attorney and Harvard Law School professor
[Mlodinow, 2008, p. 119-121], [Tijms, 2007, Ex 8.7]
Murder case
“one of the biggest media events of 1994–95” “the most publicized criminal trial in American history”
(trial in court) (lawyer)
Nicole Brown was murdered at her home.
So was her friend Ronald Goldman.
The prime suspect was her (ex-)husband, O. J. Simpson.
(They were divorced in 1992.)
(suspect)
Prosecutors* spent the first ten days of the trial entering …
As they put it, …
Prosecutor* = a government official who conducts criminal prosecutions on behalf of the state (public prosecutor)
(the party bringing the charge; plaintiff) (murder)
The defense attorneys argued that the prosecution* had spent two weeks trying to mislead the jury and that the evidence that O. J. had battered Nicole on …
Dershowitz’s reasoning:
4 million women are battered annually by husbands and boyfriends.
In 1992, a total of 1,432, or 1 in 2,500, were killed by their (ex)husbands or boyfriends.
Therefore, few men who slap or beat their domestic partners go on to murder them.
True? …Yes… Convincing?
(defense attorney) (batter = beat)
It is important to make use of the crucial fact that Nicole was murdered.
The relevant number is not the probability that a man who …
According to the Uniform Crime Reports for the United States, …
That statistic was not mentioned at the trial.
This event has happened and should be used in probability evaluation.
[Figure (two panels): Venn diagrams of the events “physically abused by husband”, “murdered by husband”, and “murdered”.]
The orange event is ignored.
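The difference between the two conditional probabilities can be sketched with counts per 100,000 battered women. The "1 in 2,500" figure is the one quoted in Dershowitz's reasoning; the number killed by someone other than the partner is a HYPOTHETICAL placeholder (the relevant statistic is cut off in the slide text), so only the structure of the calculation should be taken from this.

```python
# Counts per 100,000 battered women in one year.
killed_by_partner = 100_000 / 2_500      # 40, from the "1 in 2,500" figure
killed_by_someone_else = 5               # HYPOTHETICAL placeholder, not data from the source

# Dershowitz's number: P(killed by partner | battered)
p_ignoring_murder = killed_by_partner / 100_000
# The relevant number also conditions on the fact that the woman was murdered:
p_given_murdered = killed_by_partner / (killed_by_partner + killed_by_someone_else)
print(p_ignoring_murder)   # 0.0004
print(p_given_murdered)    # about 0.89
```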
Dershowitz may have felt justified in misleading the jury …
Defense attorneys, prosecutors, and judges don’t take this …
[Mlodinow, The Drunkard's Walk: How Randomness Rules Our Lives]
University of California, Berkeley was sued for bias against women …
The admission figures for the fall of 1973 showed that men applying were more likely than women to be admitted:
           Applicants   Admitted
Men        8442         44%
Women      4321         35%
But when examining the individual departments, it appeared …
In fact, most departments had a "small but statistically …
How?
The data from the six largest departments are listed below.
Department   Men: Applicants   Men: Admitted   Women: Applicants   Women: Admitted
A            825               62%             108                 82%
B            560               63%             25                  68%
C            325               37%             593                 34%
D            417               33%             375                 35%
E            191               28%             393                 24%
F            272               6%              341                 7%
Women tended to apply to competitive departments with low rates of admission.
Men tended to apply to less-competitive departments with high rates of admission.
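Pooling the six departments in the table reproduces the paradox: women have the higher admission rate in four of the six departments, yet the lower rate overall. Because the table gives rounded percentages, the admitted counts computed below are approximate, which is enough for the comparison.

```python
# (men applicants, men admitted %, women applicants, women admitted %) per department
departments = {
    "A": (825, 62, 108, 82),
    "B": (560, 63, 25, 68),
    "C": (325, 37, 593, 34),
    "D": (417, 33, 375, 35),
    "E": (191, 28, 393, 24),
    "F": (272, 6, 341, 7),
}

men_apps = sum(ma for ma, mp, wa, wp in departments.values())
men_adm = sum(ma * mp / 100 for ma, mp, wa, wp in departments.values())
women_apps = sum(wa for ma, mp, wa, wp in departments.values())
women_adm = sum(wa * wp / 100 for ma, mp, wa, wp in departments.values())

print(f"men: {men_adm / men_apps:.1%}, women: {women_adm / women_apps:.1%}")  # men: 46.0%, women: 30.3%
women_ahead = [d for d, (ma, mp, wa, wp) in departments.items() if wp > mp]
print(women_ahead)   # ['A', 'B', 'D', 'F']
```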
[http://www.sallyclark.org.uk/]
[http://en.wikipedia.org/wiki/Sally_Clark]
[http://www.timesonline.co.uk/tol/comment/obituaries/article1533755.ece]
Falsely accused of the murder of her two sons.
Clark's first son died suddenly within a few weeks of his birth in 1996.
After her second son died in a similar manner, she was arrested in 1998 and tried for the murder of both sons.
The case went to appeal, but the convictions and sentences were confirmed in 2000.
Released in 2003 by the Court of Appeal. Wrongfully imprisoned for more than 3 years. Never fully recovered from the effects of this appalling miscarriage of justice.
Her prosecution was controversial due to statistical evidence.
This evidence was presented by a …
Meadow testified that the frequency of sudden infant death …
He went on to square this figure to obtain a value of 1 in 73 million:
$$\left(\frac{1}{8500}\right)^{2} \approx 10^{-8}$$
“This approach is, in general, statistically invalid.”
“It would only be valid if SIDS cases arose independently …”
“There are very strong a priori reasons for supposing that the …”
“There may well be unknown genetic or environmental …”
[http://www.rss.org.uk]
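Squaring is valid only under independence. This sketch uses `Fraction` for exact arithmetic and contrasts Meadow's calculation with one in which a first SIDS death raises the chance of a second; that conditional probability is a HYPOTHETICAL value, not a figure from the source.

```python
from fractions import Fraction

p_one_sids = Fraction(1, 8500)   # the single-death frequency used in the squaring step

# Meadow's calculation: treat the two deaths as independent and square.
p_two_independent = p_one_sids ** 2
print(p_two_independent)   # 1/72250000, close to the quoted "1 in 73 million"

# If SIDS runs in families, P(second | first) can be much larger than 1/8500.
p_second_given_first = Fraction(1, 100)   # HYPOTHETICAL value, for illustration only
p_two_dependent = p_one_sids * p_second_given_first
print(p_two_dependent)     # 1/850000
```

Under this made-up dependence the double-death probability is about 85 times larger than the squared figure.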
Clark's release in January 2003 prompted the Attorney General …
Two other women convicted of murdering their children …
Trupti Patel, who was also accused of murdering her three …
In each case, Roy Meadow had testified about the …
By Peter Donnelly
http://www.youtube.com/watch?v=kLmzxmRcUTo
http://www.stats.ox.ac.uk/people/academic_staff/peter_donnelly
11:15-13:50: Disease testing; 13:50-18:30: Sally Clark
Professor of Statistical Science (Dept. of Statistics) at the University of Oxford
Aside from its invalidity, figures such as the 1 in 73 million are …
Some press reports at the time stated that this was the chance that the deaths of Sally Clark's two children were accidental.
This (mis-)interpretation is a serious error of logic known as the Prosecutor's Fallacy.
The jury needs to weigh up two competing explanations for the babies' deaths: 1) SIDS or 2) murder.
Two deaths by SIDS or two murders are each quite unlikely, but …
What matters is the relative likelihood of the deaths under each explanation, not just how unlikely they are under one explanation (in this case SIDS, according to the evidence as presented).
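The "relative likelihood" point can be sketched numerically. Assuming SIDS and murder are the only two explanations, and given that two deaths did occur, the probability of each explanation is its rate divided by the sum of both rates. Both rates below are HYPOTHETICAL placeholders chosen only to show the calculation; they are not estimates from the case.

```python
# Rates of the two competing explanations for two infant deaths in one family.
# Both numbers are HYPOTHETICAL placeholders, chosen only to show the calculation.
p_double_sids = 1e-7
p_double_murder = 5e-8

# Given that two deaths occurred (and assuming these are the only explanations),
# weigh the explanations against each other rather than looking at one in isolation.
p_sids_given_deaths = p_double_sids / (p_double_sids + p_double_murder)
print(round(p_sids_given_deaths, 3))   # 0.667
```

Both explanations are very unlikely a priori, yet with these placeholder rates SIDS would be the more probable one; a tiny probability under one explanation says little on its own.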
Roll a fair dice. Record the result.
Again, roll a fair dice. Record the result. Repeat many times.
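Generate X 200 times. Put the results in a table of size 20 × 10.
X = randi(6, 20, 10);          % 200 rolls of a fair dice, stored as a 20-by-10 table
[N, x] = hist(X(:), 1:6);      % N(k) = number of rolls that landed on face x(k)
bar(x, N)
grid on
xlabel('x'); ylabel('frequency')
[Figure: bar chart of the frequency of each face x = 1, ..., 6.]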
rf = N / numel(X);             % relative frequency of each face
bar(x, rf)
grid on
xlabel('x'); ylabel('relative frequency')
figure
stem(x, rf, 'filled', 'LineWidth', 1.5)
grid on
xlabel('x'); ylabel('relative frequency')
[Figure: bar and stem plots of the relative frequency of each face x = 1, ..., 6.]
Repeat with a larger sample of 100 × 100 = 10,000 rolls and rerun the same code:
X = randi(6, 100, 100);        % 10,000 rolls of a fair dice
[Figure: bar and stem plots of the relative frequency of each face x = 1, ..., 6 for the larger sample.]