Seeing the unseen: from coin flips to statistical inverse problems
Alberto J. Coca, StatsLab, University of Cambridge
Topics Taster, University of Cambridge Open Day, 5th July 2018
Outline
1 Introduction
  Mathematical Statistics
  Examples
2 Seeing the unseen
  Coin flips
  Statistical inverse problems
Mathematical Statistics
Question: what is Statistics? An answer: extracting information and drawing conclusions from (random) data. We are in the era of data, and Statistics is of key importance! We face many challenges: e.g., developing new statistical methods to analyse very large and complex data sets (old methods cannot handle them!), making these methods computationally efficient, etc. Mathematical Statistics: understanding the mathematical properties of statistical methods to be sure that they are sensible (in some sense).
Example I: Netflix prize
$1M contest open from October 2006 to August 2009. Predict un-rated films. Data (about 98.7% of ratings missing!):

User    F1  F2  F3  F4  F5  F6  ...  F18k
U1       4   5   ?   ?   2   ?  ...    1
U2       4   5   3   ?   3   ?  ...    ?
U3       5   4   2   4   ?   ?  ...    2
U4       2   ?   ?   5   ?   2  ...    ?
U5       1   ?   4   5   ?   2  ...    5
...
U480k    ?   ?   1   1   ?   ?  ...    5

A matrix-completion algorithm fills in the missing entries with predicted ratings (e.g., U1's missing rating of F3 is predicted to be 2). Is this matrix-completion algorithm mathematically sensible? E.g., if we have more and more data, will it recover the “true ratings”?
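A toy sketch of one classical matrix-completion idea (alternating a low-rank SVD projection with restoring the observed entries); the rank, data and function name here are illustrative, and this is not the prize-winning algorithm:

    # Iterative low-rank completion of a ratings matrix with missing entries.
    import numpy as np

    def complete(R, mask, rank=2, iters=200):
        """R: ratings with 0 at missing entries; mask: True where observed."""
        X = R.astype(float).copy()
        X[~mask] = R[mask].mean()          # initialise gaps with the global mean
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            s[rank:] = 0                   # keep only the top `rank` singular values
            X = (U * s) @ Vt               # best rank-`rank` approximation of X
            X[mask] = R[mask]              # put the observed ratings back
        return X

    # Tiny example in the spirit of the slide's table (0 stands for "?"):
    R = np.array([[4, 5, 0, 0, 2],
                  [4, 5, 3, 0, 3],
                  [5, 4, 2, 4, 0],
                  [2, 0, 0, 5, 0],
                  [1, 0, 4, 5, 0]])
    mask = R > 0
    print(np.round(complete(R, mask), 1))  # the former 0's now hold predictions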
Example II: medical imaging
We need non-invasive ways to explore/diagnose patients: e.g., ultrasound, CT scan, MRI, etc. Ultrasound (very simplified!):
- send many sound pulses from a probe that travel into your body;
- they hit boundaries between tissues and get reflected; and,
- an image is created using the times the echoes take to return to the probe.
Given tissues produce specific echoes. The machine “inverts” this indirect (and incomplete!) information and deals with (random) instrumental errors: a statistical inverse problem. Does it do better as the instrumental errors decrease?
Coin flips: experiment I
Let us simplify things and flip some coins: with your phones, go to https://albertococacabrero.wordpress.com/openday/ or google “Alberto J Coca Cambridge” and add openday/ to the end of my URL.
Coin flips: experiment I, MLE
We do not know the probability p ∈ [0, 1] of this coin landing Heads (typically p = 1/2, i.e. a fair coin). How to guess p from our data? The “frequentist” guess is p̂ = #Heads/#flips. Is this a sensible guess? Yes: each coin flip has options and probabilities given by

Options:   Heads   Tails
Probabs.:  p       1 − p

If our data is Heads, Tails, Heads, it has probability or likelihood L = p(1 − p)p = p^2(1 − p); and, more generally, if we flip n coins and obtain m Heads, the likelihood of our data is L(p, m, n) = p^m(1 − p)^(n−m). Homework: p̂ = m/n maximises q → L(q, m, n) over q ∈ [0, 1]! Indeed, p̂ is called the Maximum Likelihood Estimator (MLE).
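A quick numerical check of the “homework” claim, not from the slides (the values of m and n are illustrative):

    # Verify numerically that p_hat = m/n maximises q -> L(q, m, n).
    import numpy as np

    def likelihood(q, m, n):
        # L(q, m, n) = q^m (1 - q)^(n - m)
        return q**m * (1 - q)**(n - m)

    m, n = 7, 10                      # e.g. 7 Heads in 10 flips
    q = np.linspace(0, 1, 10001)      # fine grid over [0, 1]
    q_star = q[np.argmax(likelihood(q, m, n))]
    print(q_star, m / n)              # both are ~0.7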
Coin flips: experiment I, MLE
The MLE p̂ = m/n enjoys mathematically desirable properties: e.g.,
Law of Large Numbers (LLN): p̂ → p as n → ∞.
How fast does Error = |p̂ − p| → 0? Mathematical results guarantee it cannot be faster than 1/√n. Note that if Error ≈ 1/n^a = n^(−a) for some a > 0, then log Error ≈ −a log n. Hence, to find the value of a, plot x = log n vs. y = log Error ≈ −ax and compute the slope. The plot suggests a = 1/2, i.e. Error ≈ 1/√n! Thus, the MLE is optimal in convergence rates! (It is optimal in other senses too, in view of, e.g., the Central Limit Theorem, but there is no time to explain this.) Other estimators are optimal too:
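A minimal simulation of the log–log slope computation just described (the seed, sample sizes and repetition count are illustrative choices):

    # Estimate the exponent a in Error ~ n^(-a) by regressing log Error on log n.
    import numpy as np

    rng = np.random.default_rng(0)
    p, reps = 0.5, 2000
    ns = np.array([10, 30, 100, 300, 1000, 3000, 10000])
    errs = []
    for n in ns:
        heads = rng.binomial(n, p, size=reps)          # m in each repeated experiment
        errs.append(np.mean(np.abs(heads / n - p)))    # average |p_hat - p|
    a = -np.polyfit(np.log(ns), np.log(errs), 1)[0]    # minus the fitted slope
    print(a)                                           # ~0.5, i.e. Error ~ 1/sqrt(n)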
Coin flips: experiment I, Bayes
“Bayesian” method: before conducting the experiment, guess probabilities for the unknown p, i.e., Prob(p = q | no data) = G(q), with G, e.g.:
- Green: I am (obviously!) respectable and the coin is probably fair;
- Red: I am (absolutely) not respectable and the coin is unfair;
- Blue: I would rather not guess, to avoid confrontations...
The initial guess G(q) evolves through Bayes' rule as we get the data, giving
B(q) = L(q, m, n)G(q) / ∫₀¹ L(r, m, n)G(r) dr,  q ∈ [0, 1].
The normalisation ensures ∫₀¹ B(q) dq = ∫₀¹ Prob(p = q | data) dq = 1.
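A minimal sketch of this computation on a grid; the prior shape and the data are illustrative choices, not those on the slides:

    # Compute the posterior B(q) on a grid of q values by Bayes' rule.
    import numpy as np

    q = np.linspace(0, 1, 1001)
    dq = q[1] - q[0]
    G = np.exp(-0.5 * ((q - 0.5) / 0.1) ** 2)   # a "coin is probably fair" prior
    m, n = 7, 10                                # data: 7 Heads in 10 flips
    L = q**m * (1 - q)**(n - m)                 # likelihood L(q, m, n)
    B = L * G
    B /= B.sum() * dq                           # normalise so that B integrates to 1
    print(B.sum() * dq)                         # 1.0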
Coin flips: experiment I, Bayes
How does the data-update of G given by B look? B gives much more information than p̂! It does so without having to maximise the likelihood! Mathematical properties? Optimal!
Bernstein–von Mises Theorem (BvM): If G(p) > 0, then B(q) ≈ (p̂ + √(p(1 − p))/√n · N(0, 1))(q) as n → ∞. (Here N(0, 1) is the “Bell curve”.)
Indeed, the Bayesian method is extensively used in practice. However, it is much harder to analyse mathematically: no-free-lunch principle!
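A numerical sketch comparing the exact posterior against its BvM Gaussian approximation; a flat prior G is assumed here for simplicity:

    # Posterior vs. the BvM approximation N(p_hat, p(1-p)/n) for large n.
    import numpy as np

    rng = np.random.default_rng(1)
    p, n = 0.5, 2000
    m = rng.binomial(n, p)                           # observed number of Heads
    q = np.linspace(0.35, 0.65, 2001)                # grid around the truth
    dq = q[1] - q[0]

    logL = m * np.log(q) + (n - m) * np.log(1 - q)   # log-likelihood (flat prior)
    B = np.exp(logL - logL.max())
    B /= B.sum() * dq                                # exact posterior density

    p_hat = m / n
    bvm = np.exp(-0.5 * (q - p_hat)**2 * n / (p * (1 - p)))
    bvm /= bvm.sum() * dq                            # the BvM Gaussian density
    print(np.max(np.abs(B - bvm)))                   # shrinks as n grows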
Coin flips: experiment II, MLE
Let H = Heads = 0 and T = Tails = 1. Now we do not see the coin flips directly but “their sum” every 2 tosses. E.g., if the coin flips are H,H (0+0=0), T,H (1+0=1), H,T (0+1=1), T,T (1+1=2), the data is 0, 1, 1, 2. How to guess the probability p of the underlying coin landing Heads now? I.e., how to see the unseen? Each sum of a pair of coin flips has options and probabilities

Options:   0      1           2
Probabs.:  p^2    2p(1 − p)   (1 − p)^2

Easy guesses are given by “inverting table entries”: e.g., p̂ = √(#0s/#data). This cannot be optimal as it does not use all the info from the data. If n0 = #0s, n1 = #1s and n2 = #2s, the likelihood of the data is L(p, n0, n1, n2) = p^(2n0) (2p(1 − p))^(n1) (1 − p)^(2n2). The MLE is p̂ = (1/2)(1 + (n0 − n2)/n), where n = n0 + n1 + n2. The MLE “inverts the table entries” optimally, enjoying the same properties as before (and more): as n → ∞, p̂ → p with Error = |p̂ − p| ≈ 1/√n.
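A simulation sketch comparing the “table-inverting” guess with the closed-form MLE on this slide (the true p, seed and sample size are illustrative):

    # Experiment II: naive estimate sqrt(#0s/#data) vs. the MLE.
    import numpy as np

    rng = np.random.default_rng(2)
    p, n = 2/3, 5000                                         # p = prob. of Heads (= 0)
    sums = rng.binomial(1, 1 - p, size=(n, 2)).sum(axis=1)   # each toss: T = 1 w.p. 1-p
    n0, n1, n2 = [(sums == j).sum() for j in (0, 1, 2)]

    p_naive = np.sqrt(n0 / n)                 # inverts only the "0" table entry
    p_mle = 0.5 * (1 + (n0 - n2) / n)         # the closed-form MLE from the slide
    print(p_naive, p_mle, p)                  # both close to p; the MLE uses all counts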
Coin flips: experiment II, Bayes
Now we appreciate further the superiority of the Bayesian method: with the same initial guess G(q) as before, we again take
B(q) = L(q, n0, n1, n2)G(q) / ∫₀¹ L(r, n0, n1, n2)G(r) dr,  q ∈ [0, 1].
B in action (p = 2/3 ≈ 67%): [posterior plots]
Again, B gives much more information than p̂! B is much easier than “inverting table entries” (maximising the likelihood)! In fact, we input the table entries and B “inverts” them automatically and optimally: a BvM Theorem holds!
Coin flips: experiment III, MLE
Now we do not see the coin flips directly but flip a second coin with probability qH ∈ [0, 1] of landing Heads: if it lands H (T), we observe the “sum” of 2 (resp. 4) tosses of the first coin. E.g., if the second coin's flips are 0, 1, 0 (not observed!) and the first coin's flips are H,H (0+0=0), T,H,H,T (1+0+0+1=2), T,T (1+1=2), the data is 0, 2, 2. If qH is known, how to guess p now? I.e., how to see the unseen? Each (random) sum of coin flips has options and probabilities given by

Opts.:   0                           1                                     ...  4
Probs.:  p0 = qH·p^2 + (1−qH)·p^4    p1 = qH·2p(1−p) + (1−qH)·4p^3(1−p)    ...  p4 = (1−qH)·(1−p)^4

A guess is obtained by inverting p4, i.e. p̂ = 1 − (#4s/((1 − qH)·#data))^(1/4). Not optimal! If nj = #js, the likelihood of the data is L(p, n0, ..., n4) = ∏_{j=0}^{4} pj(p)^(nj). A unique maximiser of p → L(p, n0, ..., n4) exists (the MLE) but not in closed form: “inverting the table entries” explicitly is too hard! The implicit MLE still enjoys the same desirable properties.
Coin flips: experiment III, Bayes
The superiority of the Bayesian method in this experiment is clear: with the same initial guess G(q) as before, we again take
B(q) = L(q, n0, ..., n4)G(q) / ∫₀¹ L(r, n0, ..., n4)G(r) dr,  q ∈ [0, 1].
B in action (qH = 2/3, p = 1/5 = 20%): [posterior plots]
Again, B gives much more information than p̂! B has a simpler, explicit expression! We input the table entries and B “inverts” them automatically and optimally: a BvM Theorem holds!
Statistical inverse problems: random processes
The data of Experiment III (qH = p = 1/2) can be visualised as
[Plot: observed sums over n = 100 observations, values between 0 and 4]
Generalisation (n = 1000): instead of a second coin, a die with infinitely many faces
[Plot: observed sums over n = 1000 observations, values up to about 12]
These are examples of random or stochastic processes: their estimation and study are large areas of the fields of statistics and probability.
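A sketch of how such a data stream can be simulated, assuming (as in Experiment III) a hidden second coin choosing between sums of 2 or 4 tosses; the seed and sizes are illustrative:

    # Simulate the Experiment III data stream (qH = p = 1/2).
    import numpy as np

    rng = np.random.default_rng(3)
    qH, p, n = 0.5, 0.5, 100
    k = np.where(rng.random(n) < qH, 2, 4)   # hidden second coin: 2 or 4 tosses
    data = rng.binomial(k, 1 - p)            # observed sum of k tosses (T = 1)
    print(data[:20])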