SLIDE 1

Data-driven Model Selection for Approximate Bayesian Computation via Multiple Logistic Regression

Ben Rohrlach
Prof. Nigel Bean, Dr Jonathan Tuke

University of Adelaide

November 6, 2014

SLIDE 2

Table of Contents

1. Introduction
2. Approximate Bayesian Computation
3. Model Selection
4. Multiple Logistic Regression (MLR)
5. Conclusions

SLIDES 3-11

Some motivation.

Consider the Beringian Steppe Bison. Population numbers dropped at some time in the past. Did it happen slowly over time? Did it happen abruptly? If it did happen abruptly, when did it happen? How can we work this out if all we have is some DNA from old bones?

Figure: Rise and fall of the Beringian steppe bison, Shapiro et al. [4].

SLIDES 12-15

Bayesian vs Frequentist.

Frequentist Approach:
  Data comes from a repeatable experiment.
  The parameters are constant.
  The parameters are fixed.

Bayesian Approach:
  Data comes from a realised experiment.
  The parameters are unknown.
  The data is fixed.

SLIDES 16-18

Bayesian vs Frequentist.

In a frequentist analysis we:
  Set α in advance and find L(X | H0),
  Accept H0 if L(X | H0) ≥ α,
  Report point estimates and confidence intervals for parameters.

SLIDES 19-21

Bayesian vs Frequentist.

In a Bayesian analysis we:
  From π(θ) we (inductively) find P(θ | X),
  Describe the posterior distribution of θ,
  Report highest posterior density intervals for parameters.

SLIDES 22-24

Bayesian Statistics.

That is: we aim to describe the probability of model parameters given the data we have observed via

    P(θ | X) = L(X | θ) π(θ) / P(X),

where L(X | θ) is the likelihood function for the data, π(θ) is the 'prior distribution' for θ (my prior beliefs about the possible parameter values), and P(X) is the 'marginal likelihood' of the data (sometimes called the 'model evidence').
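
As a toy illustration of these ingredients (my own example, not from the slides), the sketch below evaluates each term of Bayes' theorem on a grid: a Binomial likelihood, a uniform prior, and the marginal likelihood obtained by summing over the grid.

import numpy as np
from math import comb

# Toy data: 7 successes in 10 Bernoulli(theta) trials.
k, n = 7, 10

theta = np.linspace(0.001, 0.999, 999)        # grid over the parameter
prior = np.ones_like(theta) / len(theta)      # uniform pi(theta)
like = comb(n, k) * theta**k * (1 - theta)**(n - k)   # L(X | theta)

evidence = np.sum(like * prior)               # P(X), the marginal likelihood
posterior = like * prior / evidence           # P(theta | X) via Bayes' theorem

print(theta[np.argmax(posterior)])            # posterior mode, roughly 0.7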

SLIDES 25-27

ABC's History.

First considered by Donald Rubin in the 1980s via the 'Acceptance-Rejection Algorithm' [1]. Particularly useful when the likelihood function L(X | θ) is difficult or impossible to obtain. Relies on being able to simulate data efficiently.

SLIDES 28-32

The Rejection-Acceptance Algorithm.

Consider obtaining ℓ posterior samples using some observed data X_obs:

1: Set i = 0
2: while i < ℓ do
3:   Sample θ* from π(θ)
4:   Simulate X* from f(X | θ*)
5:   if X* = X_obs then
6:     accept θ*
7:     i = i + 1
8:   end if
9: end while
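
A minimal runnable sketch of this exact-match sampler, in a toy discrete setting of my own choosing where X* = X_obs has positive probability (Binomial counts with a discrete uniform prior):

import numpy as np

rng = np.random.default_rng(0)

# Toy setting: X ~ Binomial(20, theta), theta drawn from a
# discrete uniform prior on {0.1, 0.2, ..., 0.9}.
prior_grid = np.arange(0.1, 1.0, 0.1)
x_obs = 14            # the observed count
ell = 1000            # number of posterior samples wanted

accepted = []
while len(accepted) < ell:
    theta = rng.choice(prior_grid)       # theta* ~ pi(theta)
    x_sim = rng.binomial(20, theta)      # X* ~ f(X | theta*)
    if x_sim == x_obs:                   # exact match: X* = X_obs
        accepted.append(theta)           # accept theta*

# The accepted values are draws from the true posterior P(theta | X_obs).
vals, counts = np.unique(accepted, return_counts=True)
print(dict(zip(np.round(vals, 1), counts / ell)))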

SLIDES 33-36

The Rejection-Acceptance Algorithm.

Gives the true posterior distribution P(θ | X_obs). Extremely slow convergence in cases where our data has high dimensionality. Could consider accepting data that is 'close enough': if "X* = X_obs" is unrealistic, try "X* ≈ X_obs".

SLIDES 37-38

The Rejection-Acceptance Algorithm.

For some distance function ρ(X, Y), and some 'tolerance' parameter ε, the algorithm now becomes:

1: Set i = 0
2: while i < ℓ do
3:   Sample θ* from π(θ)
4:   Simulate X* from f(X | θ*)
5:   if ρ(X*, X_obs) < ε then
6:     accept θ*
7:     i = i + 1
8:   end if
9: end while

SLIDES 39-42

The Rejection-Acceptance Algorithm.

Gives an approximate posterior distribution P̂(θ | X_obs). P̂(θ | X_obs) → P(θ | X_obs) as ε → 0. Still slow convergence for small ε: data being 'similar' can still be very unlikely.

SLIDES 43-45

ABC Using Summary Statistics.

What are summary statistics? A summary statistic is a function of the data (e.g. the sample mean X̄). Summary statistics are used to reduce the dimensionality of data.

SLIDES 46-48

ABC Using Summary Statistics.

What are sufficient summary statistics? Sufficient summary statistics contain all of the information about a parameter that is available in a sample (e.g. X̄ is sufficient for µ). A summary statistic S(X) is sufficient if the likelihood can be written in Fisher-Neyman factorised form:

    L(X | θ) = g(X) h_θ(S(X))
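
As a standard worked instance of this factorisation (the usual textbook example, not one from the slides): for X1, . . . , Xn i.i.d. N(µ, σ²) with σ² known, completing the square splits the likelihood through S(X) = X̄:

L(X \mid \mu)
  = (2\pi\sigma^2)^{-n/2} \exp\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\mu)^2\Big)
  = \underbrace{(2\pi\sigma^2)^{-n/2} \exp\Big(-\tfrac{1}{2\sigma^2}\sum_{i=1}^{n}(X_i-\bar{X})^2\Big)}_{g(X)}
    \underbrace{\exp\Big(-\tfrac{n}{2\sigma^2}(\bar{X}-\mu)^2\Big)}_{h_\mu(S(X))}

The first factor is free of µ, and the second depends on the data only through X̄, so X̄ is sufficient for µ.
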
SLIDES 49-50

ABC Using Summary Statistics.

It can be shown that P(θ | X_obs) = P(θ | S(X_obs)). That is, we can compare sufficient summary statistics to obtain the exact posterior distribution for θ.

SLIDE 51

The Modified Rejection-Acceptance Algorithm.

For some distance function ρ(S(X), S(Y)), and some 'tolerance' parameter ε, the algorithm now becomes:

1: Set i = 0
2: while i < ℓ do
3:   Sample θ* from π(θ)
4:   Simulate X* from f(X | θ*)
5:   if ρ(S(X*), S(X_obs)) < ε then
6:     accept θ*
7:     i = i + 1
8:   end if
9: end while
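
A minimal runnable sketch of this summary-statistic sampler, under assumed toy ingredients (Normal data, a Uniform prior on the mean, the sample mean as S(·), and absolute difference as ρ; none of these choices come from the slides):

import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: data are N(theta, 1), prior is U(-10, 10).
x_obs = rng.normal(3.0, 1.0, size=50)
s_obs = x_obs.mean()                          # S(X_obs), the sample mean

def abc_rejection(s_obs, n_samples=500, eps=0.1):
    # Keep theta* whenever rho(S(X*), S(X_obs)) < eps.
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-10, 10)              # theta* ~ pi(theta)
        x_sim = rng.normal(theta, 1.0, size=50)   # X* ~ f(X | theta*)
        if abs(x_sim.mean() - s_obs) < eps:       # rho(S(X*), S(X_obs)) < eps
            accepted.append(theta)                # accept theta*
    return np.array(accepted)

post = abc_rejection(s_obs)
print(post.mean(), post.std())   # the posterior mean should sit near s_obs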

SLIDES 52-56

ABC Using Summary Statistics.

Gives the same posterior distribution P̂(θ | S(X_obs)) if S(X) is sufficient. Again, P̂(θ | S(X_obs)) → P(θ | X_obs) as ε → 0. Convergence can now be faster. However, sufficient summary statistics are rarely available when required. Choosing a 'best summary statistic' was the focus of my Masters [2].

SLIDES 57-61

Approximately Sufficient Summary Statistics

We have insufficient summary statistics S = {S1, · · · , ST}. We have parameters of interest Φ = {φ1, · · · , φP}. Create Γ simulations, which gives Γ × T summary statistics with known input parameters (call this TrainDat). For each n ∈ {1, · · · , P}, perform linear regression on TrainDat so that we can form the predictions

    φ̂_n = β̂_0^(n) + Σ_{j=1}^{T} β̂_j^(n) s_j.

We now have a 'best predicted parameter value' whenever we have summary statistics; a sketch follows.
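
A small sketch of this regression step on simulated training data (a hypothetical setup of my own: Γ simulations, T = 3 raw summaries, a single parameter φ, and an ordinary least-squares fit):

import numpy as np

rng = np.random.default_rng(2)

# Assumed toy TrainDat: Gamma simulations with known input parameter phi,
# each reduced to T = 3 raw summary statistics.
Gamma, n = 2000, 40
phi = rng.uniform(0, 5, size=Gamma)                   # known input parameters
sims = rng.normal(phi[:, None], 1.0, size=(Gamma, n))
S = np.column_stack([sims.mean(1), sims.var(1), np.median(sims, 1)])

# Linear regression of phi on the summaries: phi_hat = b0 + sum_j bj * sj.
A = np.column_stack([np.ones(Gamma), S])              # intercept column
beta, *_ = np.linalg.lstsq(A, phi, rcond=None)

def phi_hat(s):
    # Predicted parameter value from a vector of T summaries.
    return beta[0] + s @ beta[1:]

# The fitted linear combination acts as a new one-dimensional,
# approximately sufficient summary statistic for phi.
print(phi_hat(S[0]), phi[0])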

SLIDES 62-64

Model Selection in ABC.

How do we choose which model we might wish to simulate data under? Consider models M = {M1, · · · , Mq}. We can add a step which selects which model we might simulate under.

SLIDES 65-70

The Very Modified Rejection-Acceptance Algorithm.

Let R(Mk) be the probability of Model k, and πk(θ) be the prior distribution for parameters under Model k. Consider obtaining ℓ posterior samples from a possible q models using some observed data X_obs:

1: Set i = 0
2: while i < ℓ do
3:   Randomly select some model k to simulate via R(·)
4:   Sample θ* from πk(θ)
5:   Simulate X* from fk(X | θ*)
6:   if ρ(S(X*), S(X_obs)) < ε then
7:     accept θ*
8:     i = i + 1
9:   end if
10: end while
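
A sketch of the same loop with the model-selection step added (toy assumptions of my own: two Normal models that differ in scale, uniform R(·), and the sample mean and standard deviation as summaries):

import numpy as np

rng = np.random.default_rng(3)

# Two assumed toy models for the same data: k = 0 has sigma = 1,
# k = 1 has sigma = 3; both put a U(-10, 10) prior on the mean.
def simulate(k, theta, n=50):
    return rng.normal(theta, 1.0 if k == 0 else 3.0, size=n)

def summaries(x):
    return np.array([x.mean(), x.std()])

x_obs = rng.normal(2.0, 3.0, size=50)       # data really from model 1
s_obs = summaries(x_obs)

kept_models, kept_thetas = [], []
while len(kept_models) < 300:
    k = rng.integers(2)                     # select model k via uniform R(.)
    theta = rng.uniform(-10, 10)            # theta* ~ pi_k(theta)
    s_sim = summaries(simulate(k, theta))   # X* ~ f_k(X | theta*)
    if np.linalg.norm(s_sim - s_obs) < 0.5: # rho(S(X*), S(X_obs)) < eps
        kept_models.append(k)
        kept_thetas.append(theta)           # accept theta* (with its model)

# The retained model labels estimate the posterior model probabilities.
print(np.bincount(kept_models, minlength=2) / len(kept_models))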

SLIDES 71-72

Model Selection in ABC.

How can we choose which Mi best fits our data? A common approach is to use 'Bayes Factors' Bij, for i ≠ j ∈ {1, · · · , q}.

SLIDES 73-76

Bayes Factors.

The Bayes Factor for Models i and j is:

    Bij = P(X | Mi) / P(X | Mj)
        = [P(Mi | X) P(X) / R(Mi)] / [P(Mj | X) P(X) / R(Mj)]
        = P(Mi | X) / P(Mj | X),  if R(·) has a uniform distribution.

SLIDES 77-81

Bayes Factors.

The Bayes Factor for Models i and j is Bij = P(Mi | X) / P(Mj | X). This is just the 'posterior ratio' for Models i and j. Imagine that, out of 300 retained posterior parameter samples, 200 are from Model i and 100 are from Model j. Then

    Bij = (200/300) / (100/300) = 2.
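
As a two-line check of that arithmetic (using the slide's own made-up counts):

retained = {"model_i": 200, "model_j": 100}   # out of 300 kept samples
total = sum(retained.values())
B_ij = (retained["model_i"] / total) / (retained["model_j"] / total)
print(B_ij)   # 2.0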

SLIDES 82-85

A Fundamental Flaw of Bayes Factors.

It can be shown that [3]:

    Bij = [P(Mi | X) / P(Mj | X)] × [hj(X | S(X)) / hi(X | S(X))],

so that

    Bij = P(Mi | X) / P(Mj | X)  ⇒  hj(X | S(X)) = hi(X | S(X)).

That is, Bij will be biased unless the probability of seeing the data, given the observed summary statistics, is equal for each model.

SLIDES 86-88

Post-Hoc Model Comparison.

Consider other problems with Bij (and any post-hoc model comparison method). Posterior distributions are sensitive to choices of prior distributions. A particularly poor choice of πj(θ) may reduce the number of retained simulations under Model j, and hence inflate Bij.

SLIDES 89-90

Post-Hoc Model Comparison.

We would like a model selection algorithm that avoids comparing posterior distributions. Given that our 'semi-automatic summary selection' version of ABC is an example of 'supervised learning', we could consider a similar method for model selection.

SLIDES 91-95

Multiple Logistic Regression.

Let X be our data (the collection of Γ × T summary statistics). Let x_m = (s_1^m, · · · , s_T^m) be the mth row of X (the summary statistics from the mth simulation). Let Y_m be the category of the mth observation (the model used for the mth simulation). Let β_c = (β_0^c, · · · , β_T^c) be the vector of coefficients for category c. We aim to best fit the model

    ln[ P(Y_m = c | X) / P(Y_m = q | X) ] = β_c · x_m,  for c = 1, · · · , q − 1.

SLIDE 96

Multiple Logistic Regression.

We end up with a predictive model such that, for new data X_NEW, we can predict P(Y = c | X_NEW) = p_c for each c ∈ {1, · · · , q}, with Σ_{i=1}^{q} p_i = 1.
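
A compact sketch of fitting and using such a classifier on simulated summaries (hypothetical stand-ins throughout: scikit-learn's LogisticRegression for the MLR fit, and a toy two-model Normal simulator in place of the real population-genetic simulations):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)

def simulate(model, n_sims, n=50):
    # Toy stand-in simulator: model 0 draws N(theta, 1) samples,
    # model 1 draws N(theta, 3) samples, with theta ~ U(-10, 10).
    theta = rng.uniform(-10, 10, size=n_sims)
    sigma = 1.0 if model == 0 else 3.0
    x = rng.normal(theta[:, None], sigma, size=(n_sims, n))
    return np.column_stack([x.mean(1), x.std(1), np.ptp(x, 1)])

# trainDat analogue: summary statistics with known model labels.
S_train = np.vstack([simulate(0, 5000), simulate(1, 5000)])
y_train = np.repeat([0, 1], 5000)
mlr = LogisticRegression(max_iter=1000).fit(S_train, y_train)

# testDat analogue: independent simulations and classification accuracy.
S_test = np.vstack([simulate(0, 2000), simulate(1, 2000)])
y_test = np.repeat([0, 1], 2000)
print((mlr.predict(S_test) == y_test).mean())

# For any new summaries, predict_proba returns (p_1, ..., p_q),
# which sum to one: the estimated model probabilities.
print(mlr.predict_proba(S_test[:1]))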

SLIDES 97-98

Multiple Logistic Regression Example.

Consider two opposing models of population dynamics:

[Figure: effective population size Ne(t) against generations before present (t) under the Bottleneck and Exponential models.]

The Bottleneck Model: a sudden reduction to between 20% and 40% of the effective population size occurs before the species dies out.

The Exponential Model: there was no sudden population size reduction; the species just died out (relatively) slowly over 3000 generations.

SLIDE 99

Multiple Logistic Regression Example.

However, we don't know which model fits our data best.

If the data came from the Bottleneck Model, my prior belief is that N(16000) = 150,000, N(15500) ~ U(30,000, 75,000) and N(12000) ~ U(300, 12,500).

If the data came from the Exponential Model, my prior belief is that N(16000) = 150,000, N(15500) = 150,000 and N(12000) ~ U(300, 7,500).

SLIDES 100-104

Multiple Logistic Regression Example.

I produced 10,000 training simulations of this form (5000 from each model, ≈ 2 mins) and fit the MLR (call this trainDat). I then produced another 10,000 independent simulations (call this testDat). Finally, I used the MLR to predict which model had produced each of the testDat simulations. The model predicted correctly for 99.53% of the testDat simulations (4.5 minutes in total). A corresponding Bayes Factor analysis returned 17.03% accuracy (21 minutes in total).

SLIDE 105

Multiple Logistic Regression Example.

Recall the two opposing models of population dynamics:

[Figure: effective population size Ne(t) against generations before present (t) under the Bottleneck and Exponential models.]

SLIDE 106

Multiple Logistic Regression Example.

[Figure: plot of components Comp.1, Comp.2 and Comp.3.]

SLIDES 107-110

Conclusions.

In my thesis we performed a four-model Semi-Automatic ABC analysis. Our MLR classification returned > 96% accuracy over > 250,000 simulations. A complementary Bayes Factor analysis never returned a correct post-hoc analysis for our simulated data. Our method does not require ABC to be performed on all possible models (just simulations).

SLIDE 111

Thanks.

Dr Barbara Holland and Dr Jeremy Sumner.
Prof. Nigel Bean and Dr Jono Tuke.
Prof. Alan Cooper and everyone at ACAD.
ACEMS for funding my visit.

SLIDE 112

References

[1] M. A. Beaumont. Approximate Bayesian Computation in Evolution and Ecology. Annual Review of Ecology, Evolution, and Systematics, 41:379-406, 2010.

[2] P. Fearnhead and D. Prangle. Constructing Summary Statistics for Approximate Bayesian Computation: Semi-automatic Approximate Bayesian Computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):419-474, June 2012.

[3] C. Robert, J-M. Cornuet, J-M. Marin, and N. S. Pillai. Lack of Confidence in Approximate Bayesian Computation Model Choice. Proceedings of the National Academy of Sciences, 108(37):15112-15117, 2011. doi: 10.1073/pnas.1102900108. URL http://www.pnas.org/content/108/37/15112.abstract.

[4] B. Shapiro, A. J. Drummond, A. Rambaut, M. C. Wilson, P. E. Matheus, A. V. Sher, O. G. Pybus, M. T. P. Gilbert, I. Barnes, J. Binladen, E. Willerslev, A. J. Hansen, G. F. Baryshnikov, J. A. Burns, S. Davydov, J. C. Driver, D. G. Froese, C. R. Harington, G. Keddie, P. Kosintsev, M. L. Kunz, L. D. Martin, R. O. Stephenson, J. Storer, R. Tedford, S. Zimov, and A. Cooper. Rise and Fall of the Beringian Steppe Bison. Science, 306:1561-1565, November 2004.