Seeing the unseen: from coin flips to statistical inverse problems
Alberto J. Coca, StatsLab, University of Cambridge
Topics Taster, University of Cambridge Open Day, 5th July 2018
Outline
1 Introduction
  Mathematical Statistics
  Examples
2 Seeing the unseen
  Coin flips
  Statistical inverse problems
Mathematical Statistics
Question: what is Statistics? An answer: extracting information and drawing conclusions from (random) data. We are in the era of data, and Statistics is of key importance! We face many challenges: e.g., developing new statistical methods to analyse very large and complex data sets (old methods cannot handle them!), making these methods computationally efficient, etc. Mathematical Statistics: understanding the mathematical properties of statistical methods to be sure that they are sensible (in some sense).
Example I: Netflix prize
$1M contest open from October 2006 to August 2009. Predict un-rated films. Data (about 98.7% of ratings missing!):

User    F1  F2  F3  F4  F5  F6  ...  F18k
U1       4   5   ?   ?   2   ?  ...    1
U2       4   5   3   ?   3   ?  ...    ?
U3       5   4   2   4   ?   ?  ...    2
U4       2   ?   ?   5   ?   2  ...    ?
U5       1   ?   4   5   ?   2  ...    5
...
U480k    ?   ?   1   1   ?   ?  ...    5

A matrix-completion algorithm fills in the missing entries with predicted ratings (e.g., U1's missing rating of F3 is predicted to be 2). Is this matrix-completion algorithm mathematically sensible? E.g., if we have more and more data, will it recover the “true ratings”?
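A toy sketch of one classical matrix-completion idea (alternating a low-rank SVD projection with restoring the observed entries); the rank, data and function name here are illustrative, and this is not the prize-winning algorithm:

    # Iterative low-rank completion of a ratings matrix with missing entries.
    import numpy as np

    def complete(R, mask, rank=2, iters=200):
        """R: ratings with 0 at missing entries; mask: True where observed."""
        X = R.astype(float).copy()
        X[~mask] = R[mask].mean()          # initialise gaps with the global mean
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            s[rank:] = 0                   # keep only the top `rank` singular values
            X = (U * s) @ Vt               # best rank-`rank` approximation of X
            X[mask] = R[mask]              # put the observed ratings back
        return X

    # Tiny example in the spirit of the slide's table (0 stands for "?"):
    R = np.array([[4, 5, 0, 0, 2],
                  [4, 5, 3, 0, 3],
                  [5, 4, 2, 4, 0],
                  [2, 0, 0, 5, 0],
                  [1, 0, 4, 5, 0]])
    mask = R > 0
    print(np.round(complete(R, mask), 1))  # the former 0's now hold predictions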
Example II: medical imaging
We need non-invasive ways to explore/diagnose patients: e.g., ultrasound, CT scan, MRI, etc. Ultrasound (very simplified!):
- send many sound pulses from a probe that travel into your body;
- they hit boundaries between tissues and get reflected; and,
- an image is created using the times the echoes take to return to the probe.
Given tissues produce specific echoes. The machine “inverts” this indirect (and incomplete!) information and deals with (random) instrumental errors: a statistical inverse problem. Does it do better as the instrumental errors decrease?
Coin flips: experiment I
Let us simplify things and flip some coins: with your phones, go to https://albertococacabrero.wordpress.com/openday/ or google “Alberto J Coca Cambridge” and add openday/ to the end of my URL.
Coin flips: experiment I, MLE
We do not know the probability p ∈ [0, 1] of this coin landing Heads (typically p = 1/2, i.e. a fair coin). How to guess p from our data? The “frequentist” guess is p̂ = #Heads/#flips. Is this a sensible guess? Yes: each coin flip has options and probabilities given by

Options:   Heads   Tails
Probabs.:  p       1 − p

If our data is Heads, Tails, Heads, it has probability or likelihood L = p(1 − p)p = p^2(1 − p); and, more generally, if we flip n coins and obtain m Heads, the likelihood of our data is L(p, m, n) = p^m(1 − p)^(n−m). Homework: p̂ = m/n maximises q → L(q, m, n) over q ∈ [0, 1]! Indeed, p̂ is called the Maximum Likelihood Estimator (MLE).
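A quick numerical check of the “homework” claim, not from the slides (the values of m and n are illustrative):

    # Verify numerically that p_hat = m/n maximises q -> L(q, m, n).
    import numpy as np

    def likelihood(q, m, n):
        # L(q, m, n) = q^m (1 - q)^(n - m)
        return q**m * (1 - q)**(n - m)

    m, n = 7, 10                      # e.g. 7 Heads in 10 flips
    q = np.linspace(0, 1, 10001)      # fine grid over [0, 1]
    q_star = q[np.argmax(likelihood(q, m, n))]
    print(q_star, m / n)              # both are ~0.7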
Coin flips: experiment I, MLE
The MLE p̂ = m/n enjoys mathematically desirable properties: e.g.,
Law of Large Numbers (LLN): p̂ → p as n → ∞.
How fast does Error = |p̂ − p| → 0? Mathematical results guarantee it cannot be faster than 1/√n. Note that if Error ≈ 1/n^a = n^(−a) for some a > 0, then log Error ≈ −a log n. Hence, to find the value of a, plot x = log n vs. y = log Error ≈ −ax and compute the slope. The plot suggests a = 1/2, i.e. Error ≈ 1/√n! Thus, the MLE is optimal in convergence rates! (It is optimal in other senses too, in view of, e.g., the Central Limit Theorem, but there is no time to explain this.) Other estimators are optimal too:
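A minimal simulation of the log–log slope computation just described (the seed, sample sizes and repetition count are illustrative choices):

    # Estimate the exponent a in Error ~ n^(-a) by regressing log Error on log n.
    import numpy as np

    rng = np.random.default_rng(0)
    p, reps = 0.5, 2000
    ns = np.array([10, 30, 100, 300, 1000, 3000, 10000])
    errs = []
    for n in ns:
        heads = rng.binomial(n, p, size=reps)          # m in each repeated experiment
        errs.append(np.mean(np.abs(heads / n - p)))    # average |p_hat - p|
    a = -np.polyfit(np.log(ns), np.log(errs), 1)[0]    # minus the fitted slope
    print(a)                                           # ~0.5, i.e. Error ~ 1/sqrt(n)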
Coin flips: experiment I, Bayes
“Bayesian” method: before conducting the experiment, guess probabilities for the unknown p, i.e., Prob(p = q | no data) = G(q), with G, e.g.:
- Green: I am (obviously!) respectable and the coin is probably fair;
- Red: I am (absolutely) not respectable and the coin is unfair;
- Blue: I would rather not guess, to avoid confrontations...
The initial guess G(q) evolves through Bayes' rule as we get the data, giving
B(q) = L(q, m, n)G(q) / ∫₀¹ L(r, m, n)G(r) dr,  q ∈ [0, 1].
The normalisation ensures ∫₀¹ B(q) dq = ∫₀¹ Prob(p = q | data) dq = 1.
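A minimal sketch of this computation on a grid; the prior shape and the data are illustrative choices, not those on the slides:

    # Compute the posterior B(q) on a grid of q values by Bayes' rule.
    import numpy as np

    q = np.linspace(0, 1, 1001)
    dq = q[1] - q[0]
    G = np.exp(-0.5 * ((q - 0.5) / 0.1) ** 2)   # a "coin is probably fair" prior
    m, n = 7, 10                                # data: 7 Heads in 10 flips
    L = q**m * (1 - q)**(n - m)                 # likelihood L(q, m, n)
    B = L * G
    B /= B.sum() * dq                           # normalise so that B integrates to 1
    print(B.sum() * dq)                         # 1.0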
Coin flips: experiment I, Bayes
How does the data-update of G given by B look? B gives much more information than p̂! It does so without having to maximise the likelihood! Mathematical properties? Optimal!
Bernstein–von Mises Theorem (BvM): If G(p) > 0, then B(q) ≈ (p̂ + √(p(1 − p))/√n · N(0, 1))(q) as n → ∞. (Here N(0, 1) is the “Bell curve”.)
Indeed, the Bayesian method is extensively used in practice. However, it is much harder to analyse mathematically: no-free-lunch principle!
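A numerical sketch comparing the exact posterior against its BvM Gaussian approximation; a flat prior G is assumed here for simplicity:

    # Posterior vs. the BvM approximation N(p_hat, p(1-p)/n) for large n.
    import numpy as np

    rng = np.random.default_rng(1)
    p, n = 0.5, 2000
    m = rng.binomial(n, p)                           # observed number of Heads
    q = np.linspace(0.35, 0.65, 2001)                # grid around the truth
    dq = q[1] - q[0]

    logL = m * np.log(q) + (n - m) * np.log(1 - q)   # log-likelihood (flat prior)
    B = np.exp(logL - logL.max())
    B /= B.sum() * dq                                # exact posterior density

    p_hat = m / n
    bvm = np.exp(-0.5 * (q - p_hat)**2 * n / (p * (1 - p)))
    bvm /= bvm.sum() * dq                            # the BvM Gaussian density
    print(np.max(np.abs(B - bvm)))                   # shrinks as n grows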
Coin flips: experiment II, MLE
Let H = Heads = 0 and T = Tails = 1. Now we do not see the coin flips directly but “their sum” every 2 tosses. E.g., if the coin flips are H,H (0+0=0), T,H (1+0=1), H,T (0+1=1), T,T (1+1=2), the data is 0, 1, 1, 2. How to guess the probability p of the underlying coin landing Heads now? I.e., how to see the unseen? Each sum of a pair of coin flips has options and probabilities

Options:   0      1           2
Probabs.:  p^2    2p(1 − p)   (1 − p)^2

Easy guesses are given by “inverting table entries”: e.g., p̂ = √(#0s/#data). This cannot be optimal as it does not use all the info from the data. If n0 = #0s, n1 = #1s and n2 = #2s, the likelihood of the data is L(p, n0, n1, n2) = p^(2n0) (2p(1 − p))^(n1) (1 − p)^(2n2). The MLE is p̂ = (1/2)(1 + (n0 − n2)/n), where n = n0 + n1 + n2. The MLE “inverts the table entries” optimally, enjoying the same properties as before (and more): as n → ∞, p̂ → p with Error = |p̂ − p| ≈ 1/√n.
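A simulation sketch comparing the “table-inverting” guess with the closed-form MLE on this slide (the true p, seed and sample size are illustrative):

    # Experiment II: naive estimate sqrt(#0s/#data) vs. the MLE.
    import numpy as np

    rng = np.random.default_rng(2)
    p, n = 2/3, 5000                                         # p = prob. of Heads (= 0)
    sums = rng.binomial(1, 1 - p, size=(n, 2)).sum(axis=1)   # each toss: T = 1 w.p. 1-p
    n0, n1, n2 = [(sums == j).sum() for j in (0, 1, 2)]

    p_naive = np.sqrt(n0 / n)                 # inverts only the "0" table entry
    p_mle = 0.5 * (1 + (n0 - n2) / n)         # the closed-form MLE from the slide
    print(p_naive, p_mle, p)                  # both close to p; the MLE uses all counts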
Coin flips: experiment II, Bayes
Now we appreciate further the superiority of the Bayesian method: with the same initial guess G(q) as before, we again take
B(q) = L(q, n0, n1, n2)G(q) / ∫₀¹ L(r, n0, n1, n2)G(r) dr,  q ∈ [0, 1].
B in action (p = 2/3 ≈ 67%): [posterior plots]
Again, B gives much more information than p̂! B is much easier than “inverting table entries” (maximising the likelihood)! In fact, we input the table entries and B “inverts” them automatically and optimally: a BvM Theorem holds!
Coin flips: experiment III, MLE
Now we do not see the coin flips directly but flip a second coin with probability qH ∈ [0, 1] of landing Heads: if it lands H (T), we observe the “sum” of 2 (resp. 4) tosses of the first coin. E.g., if the second coin's flips are 0, 1, 0 (not observed!) and the first coin's flips are H,H (0+0=0), T,H,H,T (1+0+0+1=2), T,T (1+1=2), the data is 0, 2, 2. If qH is known, how to guess p now? I.e., how to see the unseen? Each (random) sum of coin flips has options and probabilities given by

Opts.:   0                           1                                     ...  4
Probs.:  p0 = qH·p^2 + (1−qH)·p^4    p1 = qH·2p(1−p) + (1−qH)·4p^3(1−p)    ...  p4 = (1−qH)·(1−p)^4

A guess is obtained by inverting p4, i.e. p̂ = 1 − (#4s/((1 − qH)·#data))^(1/4). Not optimal! If nj = #js, the likelihood of the data is L(p, n0, ..., n4) = ∏_{j=0}^{4} pj(p)^(nj). A unique maximiser of p → L(p, n0, ..., n4) exists (the MLE) but not in closed form: “inverting the table entries” explicitly is too hard! The implicit MLE still enjoys the same desirable properties.
Coin flips: experiment III, Bayes
The superiority of the Bayesian method in this experiment is clear: with the same initial guess G(q) as before, we again take
B(q) = L(q, n0, ..., n4)G(q) / ∫₀¹ L(r, n0, ..., n4)G(r) dr,  q ∈ [0, 1].
B in action (qH = 2/3, p = 1/5 = 20%): [posterior plots]
Again, B gives much more information than p̂! B has a simpler, explicit expression! We input the table entries and B “inverts” them automatically and optimally: a BvM Theorem holds!
Statistical inverse problems: random processes
The data of Experiment III (qH = p = 1/2) can be visualised as
[Plot: observed sums over n = 100 observations, values between 0 and 4]
Generalisation (n = 1000): instead of a second coin, a die with infinitely many faces
[Plot: observed sums over n = 1000 observations, values up to about 12]
These are examples of random or stochastic processes: their estimation and study are large areas of the fields of statistics and probability.
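A sketch of how such a data stream can be simulated, assuming (as in Experiment III) a hidden second coin choosing between sums of 2 or 4 tosses; the seed and sizes are illustrative:

    # Simulate the Experiment III data stream (qH = p = 1/2).
    import numpy as np

    rng = np.random.default_rng(3)
    qH, p, n = 0.5, 0.5, 100
    k = np.where(rng.random(n) < qH, 2, 4)   # hidden second coin: 2 or 4 tosses
    data = rng.binomial(k, 1 - p)            # observed sum of k tosses (T = 1)
    print(data[:20])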