

SLIDE 1

Bayesian Phylogenetics

Mark Holder (with big thanks to Paul Lewis)

SLIDE 2

Outline

  • Intro
    – What is Bayesian Analysis?
    – Why be a Bayesian?

  • What is required to do a Bayesian Analysis? (Priors)
  • How can the required calculations be done? (MCMC)
  • Prospects and Warnings
SLIDE 3

Simple Example: Vesicoureteral Reflux (VUR) - valves between the ureters and bladder do not shut fully.

  • leads to urinary tract infections
  • if not corrected, can cause serious kidney damage
  • effective diagnostic tests are available, but they are expensive and invasive

SLIDE 4
  • ≈ 1% of children will have VUR
  • ≈ 80% of children with VUR will see a doctor about an infection
  • ≈ 2% of all children will see a doctor about an infection

Should a child with 1 infection be screened for VUR?

SLIDE 5

1% of the population has VUR: Pr(V) = 0.01. (Figure: each "v" marks 0.1% of the population.)

SLIDE 6

80% of kids with VUR get an infection: Pr(I|V) = 0.8. Pr(I|V) is a conditional probability.

SLIDE 7

So, 0.8% of the population has VUR and will get an infection: Pr(V)Pr(I|V) = 0.01 × 0.8 = 0.008, so Pr(I,V) = 0.008. Pr(I,V) is a joint probability.

SLIDE 8

2% of the population gets an infection: Pr(I) = 0.02.

SLIDE 9

We just calculated that 0.8% of kids have VUR and get an infection.

SLIDE 10

The other 1.2% must not have VUR. So, 40% of kids with infections have VUR: Pr(V|I) = 0.4.

SLIDE 11

Pr(V|I) = Pr(V)Pr(I|V) / Pr(I) = (0.01 × 0.8) / 0.02 = 0.40

SLIDE 12

Pr(I) is higher for females: Pr(I|♀) = 0.03, while Pr(I|♂) = 0.01.

Pr(V|I, ♀) = (0.01 × 0.8) / 0.03 = 0.267

Pr(V|I, ♂) = (0.01 × 0.8) / 0.01 = 0.8

SLIDE 13

Bayes' Rule

Pr(A|B) = Pr(A)Pr(B|A) / Pr(B)

Pr(Hypothesis|Data) = Pr(Hypothesis)Pr(Data|Hypothesis) / Pr(Data)
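A minimal Python sketch applying Bayes' rule to the VUR numbers from the slides above:

```python
# Bayes' rule applied to the VUR example from the slides above.
prior = 0.01        # Pr(V): 1% of children have VUR
likelihood = 0.8    # Pr(I|V): 80% of kids with VUR get an infection
marginal = 0.02     # Pr(I): 2% of all children get an infection

posterior = prior * likelihood / marginal  # Pr(V|I)
print(posterior)                           # 0.4

# Conditioning on sex changes the marginal Pr(I), and hence the posterior:
for sex, pr_infection in [("female", 0.03), ("male", 0.01)]:
    print(sex, prior * likelihood / pr_infection)  # 0.267 and 0.8
```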

SLIDE 14

Pr(Tree|Data) = Pr(Tree)Pr(Data|Tree) / Pr(Data)

We can ignore Pr(Data) (2nd half of this lecture).

SLIDE 15

Pr(Tree|Data) ∝ Pr(Tree)Pr(Data|Tree)

Pr(Tree) is the prior probability of the tree.

SLIDE 16

Pr(Tree|Data) ∝ Pr(Tree)Pr(Data|Tree)

Pr(Tree) is the prior probability of the tree. Pr(Data|Tree) is the likelihood of the tree.

Pr(Tree|Data) ∝ Pr(Tree)L(Tree)

SLIDE 17

SLIDE 18

Pr(Tree|Data) ∝ Pr(Tree)L(Tree)

Pr(Tree) is the prior probability of the tree. L(Tree) is the likelihood of the tree. Pr(Tree|Data) is the posterior probability of the tree.

SLIDE 19

The posterior probability is a great way to evaluate trees:

  • Ranks trees
  • Intuitive measure of confidence
  • Is the ideal “weight” for a tree in secondary analyses
  • Closely tied to the likelihood
SLIDE 20

Our models don't give us L(Tree). They give us things like L(Tree, κ, α, ν1, ν2, ν3, ν4, ν5).

(Figure: a four-taxon tree relating A, B, C, and D, with branch lengths ν1 through ν5.)

SLIDE 21

“Nuisance Parameters”: aspects of the evolutionary model that we don’t care about, but that appear in the likelihood equation.

SLIDE 22
(Figure: Ln Likelihood Profile, with ln likelihood plotted against κ.)

SLIDE 23
(Figure: the same Ln Likelihood Profile for κ, with the MLE marked at the maximum LnL.)

SLIDE 24

Marginalizing over (integrating out) nuisance parameters:

L(Tree) = ∫ L(Tree, κ) Pr(κ) dκ

  • Removes the nuisance parameter
  • Takes the entire likelihood function into account

SLIDE 25
  • Avoids estimation errors
  • Requires a prior for the parameter
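A small numerical sketch of this integral, assuming a toy likelihood profile and an exponential prior for κ (both hypothetical, chosen only to show the quadrature):

```python
import numpy as np

# Toy marginalization over a nuisance parameter kappa:
# L(tree) = integral of L(tree, kappa) * Pr(kappa) d kappa.
# The likelihood below is a made-up stand-in for a real phylogenetic one.

def likelihood(kappa, peak=4.0, width=2.0):
    """Hypothetical likelihood profile for one fixed tree."""
    return np.exp(-((kappa - peak) ** 2) / (2 * width ** 2))

def prior(kappa, mean=5.0):
    """Exponential prior density on kappa."""
    return np.exp(-kappa / mean) / mean

kappas = np.linspace(0.0, 50.0, 10_001)
integrand = likelihood(kappas) * prior(kappas)
marginal_likelihood = np.trapz(integrand, kappas)  # trapezoid rule
print(marginal_likelihood)
```

Because the whole curve is integrated, regions where the likelihood is mediocre but wide can contribute as much as a narrow peak, which is how marginalizing can disagree with the MLE.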
SLIDE 26

When there is substantial uncertainty in a parameter’s value, marginalizing can give qualitatively different answers than using the MLE.

(Figure: the likelihood plotted against the nuisance parameter.)

SLIDE 27

(Figure: joint posterior probability density for trees and ω.)

SLIDE 28

(Figure: marginalize over ω by summing probability along the ω axis, giving the posterior probability for each of trees 1 through 15.)

SLIDE 29

(Figure: marginalize over trees by summing probability along the tree axis, giving the marginal posterior probability density for ω.)
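To make the two summing directions concrete, here is a small sketch with a made-up joint posterior grid (rows indexing the 15 trees, columns a discretized ω); the array values are arbitrary placeholders:

```python
import numpy as np

# Hypothetical joint posterior over 15 trees x a grid of omega values.
rng = np.random.default_rng(0)
joint = rng.random((15, 50))
joint /= joint.sum()                  # normalize so the grid sums to 1

tree_posterior = joint.sum(axis=1)    # marginalize over omega (slide 28)
omega_posterior = joint.sum(axis=0)   # marginalize over trees (slide 29)
print(tree_posterior)                 # one posterior probability per tree
```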

SLIDE 30

The Bayesian Perspective

Pros:
  • Posterior probability is the ideal measure of support
  • Focus of inference is flexible
  • Marginalizes over nuisance parameters

Cons:
  • Is it robust?
  • Requires a prior

SLIDE 31

Priors

  • Probability distributions
  • Specified before analyzing the data
  • Needed for
    – Hypotheses (trees)
    – Parameters

SLIDE 32

Probability Distributions

Reflect the action of random forces

SLIDE 33

Probability Distributions

Reflect the action of random forces OR (if you're a Bayesian) reflect your uncertainty

SLIDE 34

(Figure: slide courtesy of Derrick Zwickl.)

SLIDE 35

(Figure: slide courtesy of Derrick Zwickl.)

SLIDE 36

(Figure: slide courtesy of Derrick Zwickl.)

SLIDE 37

SLIDE 38

Considerations when choosing a prior for a parameter

  • What values are most likely?
SLIDE 39

(Figure: a subjective prior density on p = Pr(Heads), over the interval 0 to 1.)

SLIDE 40

Considerations when choosing a prior for a parameter

  • What values are most likely?
  • How do you express ignorance?
    – vague distributions

SLIDE 41

(Figure: a flat (uniform) prior density on p = P(Heads), over the interval 0 to 1.)

SLIDE 42

“Non-informative” priors

  • Misleading term
  • Used by many Bayesians to mean “prior that is expected to have the smallest effect on the posterior”

  • Not always a uniform prior
SLIDE 43

Considerations when choosing a prior for a parameter

  • What values are most likely?
  • How do you express ignorance?
    – vague distributions
    – How easily can the likelihood discriminate between parameter values?

SLIDE 44

Jeffreys (Default) Prior

(Figure: the Jeffreys prior density on p = P(Heads), over the interval 0 to 1.)
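For a binomial proportion like P(Heads), the Jeffreys prior is the Beta(1/2, 1/2) distribution; a brief sketch evaluating it with SciPy:

```python
import numpy as np
from scipy.stats import beta

# Jeffreys prior for a binomial proportion p is Beta(1/2, 1/2):
# the density is high near 0 and 1 and lowest at p = 0.5, reflecting
# how easily the likelihood discriminates between values of p.
p = np.linspace(0.01, 0.99, 99)
density = beta.pdf(p, 0.5, 0.5)
print(density[0], density[49], density[-1])  # high, low, high
```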

SLIDE 45

Example: The Kimura model (K80; states A, C, G, T)

κ = r_ti / r_tv is the ratio of the transition and transversion rates; it ranges over (0, ∞).

φ = r_ti / (r_ti + 2 r_tv) is the proportion of transitions; it ranges over (0, 1).

Slide by Zwickl
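Dividing the numerator and denominator of φ by r_tv gives φ = κ/(κ + 2), with inverse κ = 2φ/(1 − φ); a small sketch of this mapping, illustrating why a flat prior on one parameterization is not flat on the other:

```python
def kappa_to_phi(kappa):
    """Map the ts/tv rate ratio kappa in (0, inf) to the
    proportion of transitions phi in (0, 1)."""
    return kappa / (kappa + 2.0)

def phi_to_kappa(phi):
    """Inverse mapping: phi in (0, 1) back to kappa in (0, inf)."""
    return 2.0 * phi / (1.0 - phi)

# Equal steps in kappa are not equal steps in phi, so a flat prior
# on one parameterization is informative on the other.
for kappa in [0.5, 1.0, 2.0, 4.0, 8.0]:
    print(kappa, kappa_to_phi(kappa))
```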

SLIDE 46

  • κ and φ map onto the predictions of K80 very differently

(Figure: proportion transitions vs. ratio of rates.)

Slide by Zwickl

SLIDE 47

K80: κ and φ

  • The likelihood surface is tied to the model predictions
  • The curve shapes (and integrals) are quite different
  • The ML estimates are equivalent

Slide by Zwickl

SLIDE 48

Effects of the Prior in the GTR model

(Figure: posterior density of the C<->T rate under a Dirichlet prior and under a U(0,200) prior on the G<->T rate scale; MLE = 45.2.)

SLIDE 49

SLIDE 50

Minimizing the effect of priors

  • Flat ≠ non-informative
  • Familiar model parameterizations may perform poorly in a Bayesian analysis with flat priors.

SLIDE 51

Considerations when choosing a prior for a parameter

  • What values are most likely?
  • How do you express ignorance? (minimally informative priors)

  • Are some errors better than others?
SLIDE 52
(Figure: log-likelihood for 3 trees plotted against internal branch length.)

SLIDE 53

The Tree's Posterior vs. the Branch Length's Prior Mean

(Figure: posterior probability, from 0.4 to 1, plotted against the internal branch prior mean, from 0.01 to 100 on a log scale.)

SLIDE 54

We might make analyses more conservative by

  • Favoring short internal branch lengths
  • Placing some prior probability on “star” trees (Lewis et al.)

SLIDE 55

We need to worry about sensitivity of our conclusions to all “inputs”

  • Data
  • Model
  • Priors

Often priors will be the least of our concerns

SLIDE 56

(Figure: slide courtesy of Derrick Zwickl.)

SLIDE 57

SLIDE 58

The prior can be a benefit (not just a necessity) of Bayesian analysis

  • Incorporate previous information
  • Make the analysis more conservative

But...

SLIDE 59

It can be hard to say “I don’t know”

Priors can strongly affect the analysis if...

  • The prior strongly favors some parameter values, OR
  • The data (via the likelihood) are not very informative (little data or complex model)

Because Bayesian inference relies on marginalization, the priors for all parameters can affect the posterior probabilities of the hypotheses of interest.

SLIDE 60

How do we calculate a posterior probability?

Pr(Tree|Data) = Pr(Tree)L(Tree) / Pr(Data)

In particular, how do we calculate Pr(Data)?

SLIDE 61

Pr(Data) is the marginal probability of the data, so

Pr(Data) = Σ_i Pr(Tree_i)L(Tree_i)

But this is a sum over all trees (there are lots of trees). Recall that even L(Tree_i) involves multiple integrals.

SLIDE 62

Pr(D) = Σ_i ∫···∫ L(Tree_i, κ, α, ν1, ν2, ν3, ν4, ν5) Pr(Tree_i) Pr(κ) Pr(α) Pr(ν1) ··· Pr(ν5) dκ dα dν1 ··· dν5

SLIDE 63

Markov chain Monte Carlo

  • Simulates a walk through parameter/tree space
  • Lets us estimate posterior probabilities for any aspect of the model
  • Relies on the ratio of posterior densities between two points

SLIDE 64

R = Pr(Point2|Data) / Pr(Point1|Data)

R = [Pr(Point2)L(Point2) / Pr(Data)] / [Pr(Point1)L(Point1) / Pr(Data)]

R = Pr(Point2)L(Point2) / [Pr(Point1)L(Point1)]

Pr(Data) cancels out of the ratio, so it never has to be computed.
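A minimal Metropolis-style sketch for a single continuous parameter, with a made-up likelihood and prior (toy stand-ins, not a phylogenetic implementation); note that only the ratio R of Pr(Point)L(Point) terms is ever used:

```python
import math
import random

# Minimal Metropolis MCMC over one parameter (a toy stand-in for a
# walk through tree/parameter space). Only ratios of
# prior(x) * likelihood(x) appear, so Pr(Data) cancels.

def log_likelihood(x):
    """Hypothetical log-likelihood with its peak at x = 4."""
    return -((x - 4.0) ** 2) / 2.0

def log_prior(x, mean=5.0):
    """Exponential(mean) prior on x > 0, on the log scale."""
    return -math.inf if x <= 0 else -x / mean

def log_posterior_unnorm(x):
    return log_prior(x) + log_likelihood(x)

random.seed(1)
x, samples = 1.0, []
for step in range(100_000):
    proposal = x + random.gauss(0.0, 0.5)         # symmetric proposal
    log_r = log_posterior_unnorm(proposal) - log_posterior_unnorm(x)
    if math.log(random.random()) < log_r:         # accept with prob min(1, R)
        x = proposal
    samples.append(x)

# Sample frequencies estimate posterior probabilities/densities.
print(sum(samples[1000:]) / len(samples[1000:]))  # posterior mean estimate
```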