


SLIDE 1

2/25/09 1

CSCI1950-Z Computational Methods for Biology, Lecture 9

Ben Raphael, February 23, 2009

http://cs.brown.edu/courses/csci1950-z/

Outline

Searching Through Trees

  • 1. Branch‐swapping: NNI, SPR, TBR.
  • 2. MCMC

Consensus Trees and Supertrees

SLIDE 2

Heuristic Search

  • 1. Start with an arbitrary tree T.
  • 2. Check “neighbors” of T.
  • 3. Move to a neighbor if it provides the best improvement in parsimony/likelihood score.

Caveat: The search can get stuck in a local optimum and never reach the global optimum.
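The three steps above amount to greedy local search. A minimal sketch, assuming a lower score is better (as with a parsimony cost); the names `hill_climb`, `neighbors`, and `score` are illustrative and not from the lecture, and the toy landscape stands in for real trees:

```python
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def hill_climb(start: S,
               neighbors: Callable[[S], Iterable[S]],
               score: Callable[[S], float]) -> S:
    """Greedy local search; lower score is better (e.g. a parsimony cost)."""
    current, current_score = start, score(start)
    while True:
        # Step 2: score every neighbor of the current state.
        candidates = [(score(n), n) for n in neighbors(current)]
        if not candidates:
            return current
        best_score, best = min(candidates, key=lambda pair: pair[0])
        # Step 3: move only if the best neighbor strictly improves the score.
        if best_score >= current_score:
            return current  # local optimum, not necessarily global
        current, current_score = best, best_score


# Toy landscape (hypothetical): positions 0..4 with these costs.
landscape = [5, 3, 4, 1, 0]


def line_neighbors(x: int) -> list:
    return [y for y in (x - 1, x + 1) if 0 <= y < len(landscape)]


stuck_at = hill_climb(0, line_neighbors, lambda x: landscape[x])
```

Starting from position 0, the search stops at position 1 (cost 3) even though position 4 has cost 0, which illustrates the local-optimum caveat.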

Trees and Splits

Given a set X, a split A | B is a partition of X into two non-empty subsets A and B. For a phylogenetic tree T with leaves L, each edge e defines a split L_e = A | B, where A and B are the leaf sets of the two subtrees obtained by removing e.

[Figure: an edge e separating leaf sets A and B]
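The edge-to-split correspondence can be sketched in code. This example assumes a tree is written as nested tuples with string leaf names, rooted at an arbitrary internal node; that encoding and the function names are choices made for illustration only.

```python
def splits(tree):
    """Return the set of splits, one per edge, as frozenset({A, B})."""

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(leaves(child) for child in node))

    all_leaves = leaves(tree)
    found = set()

    def visit(node):
        # Leaves below this node; the edge above the node induces below | rest.
        if isinstance(node, str):
            below = frozenset([node])
        else:
            below = frozenset().union(*(visit(child) for child in node))
        if below != all_leaves:  # the root has no edge above it
            found.add(frozenset([below, all_leaves - below]))
        return below

    visit(tree)
    return found


# Unrooted binary tree on five leaves, written from its middle node.
example_tree = (("a", "b"), "c", ("d", "e"))
example_splits = splits(example_tree)
```

For this five-leaf tree the result contains seven splits, one per edge: five trivial splits (a single leaf against the rest) and the two non-trivial splits ab | cde and abc | de.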

SLIDE 3

Computing the Splits Metric

A phylogenetic tree T defines a collection of splits Σ(T) = { L_e | e is an edge in T }.

Theorem: ρ(T1, T2) = |Σ(T1) \ Σ(T2)| + |Σ(T2) \ Σ(T1)|
                   = |Σ(T1)| + |Σ(T2)| - 2 |Σ(T1) ∩ Σ(T2)|

Proof: (whiteboard)

Notation: A \ B = {x : x ∈ A, x ∉ B}
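A small numerical check of the two expressions for ρ, assuming each tree is already given by its set of splits. The helper names and the two example split sets are hypothetical; trivial splits are omitted because both trees share them, so they cancel in the distance.

```python
def split(a, b):
    """An unordered split A | B as a hashable value."""
    return frozenset([frozenset(a), frozenset(b)])


def splits_distance(sigma1, sigma2):
    """rho(T1, T2): number of splits found in exactly one of the two trees."""
    return len(sigma1 - sigma2) + len(sigma2 - sigma1)


# Non-trivial splits of two five-leaf trees (illustrative data).
sigma_t1 = {split("ab", "cde"), split("abc", "de")}
sigma_t2 = {split("ab", "cde"), split("abd", "ce")}

distance = splits_distance(sigma_t1, sigma_t2)
via_intersection = len(sigma_t1) + len(sigma_t2) - 2 * len(sigma_t1 & sigma_t2)
```

Both expressions give 2 here: each tree has one split that the other lacks.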

Nearest Neighbor Interchange (NNI)

Rearrange the four subtrees defined by one internal edge.

Claim: The number of NNI neighbors of a binary tree with n leaves is 2(n-3).
Proof: (whiteboard)
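The count 2(n-3) can be checked by enumeration: an unrooted binary tree with n leaves has n-3 internal edges, and each one allows two distinct rearrangements. The sketch below assumes the tree is stored as an adjacency dictionary with string node names; the representation and function names are illustrative.

```python
import copy


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def reattach(adj, node, old, new):
    """Detach `node` from `old` and attach it to `new` (in place)."""
    adj[old].discard(node)
    adj[node].discard(old)
    adj[new].add(node)
    adj[node].add(new)


def nni_neighbors(adj):
    """All trees one NNI away from an unrooted binary tree."""
    result = []
    for u in adj:
        for v in adj[u]:
            # An internal edge joins two degree-3 nodes; u < v visits it once.
            if len(adj[u]) != 3 or len(adj[v]) != 3 or not u < v:
                continue
            _, b = sorted(adj[u] - {v})     # one subtree hanging off u
            for c in sorted(adj[v] - {u}):  # each subtree hanging off v
                new = copy.deepcopy(adj)
                reattach(new, b, u, v)      # swap subtree b ...
                reattach(new, c, v, u)      # ... with subtree c
                result.append(new)
    return result


# Five leaves a..e and internal nodes x, y, z.
tree = build_adjacency([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"),
                        ("y", "z"), ("d", "z"), ("e", "z")])
nni = nni_neighbors(tree)
```

With n = 5 leaves there are two internal edges, so `nni` holds 2(5-3) = 4 trees.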

SLIDE 4

Subtree Pruning and Regrafting (SPR)

  • 1. Remove a branch.
  • 2. Reconnect the incident vertex by subdividing a branch.

Claim: The number of SPR neighbors of a binary tree with n leaves is 2(n-3)(2n-7).
Proof: (whiteboard)

SLIDE 5

Tree Bisection and Reconnection (TBR)

  • 1. Remove a branch.
  • 2. Reconnect the subtrees by adding a new branch that subdivides a branch in each.

Relationship between Operations

  • Every NNI is an SPR and every SPR is a TBR.
  • Every TBR is a single SPR or a composition of two SPRs.
  • All three types of operations are invertible: if an operation α takes T to T’, then α⁻¹ takes T’ to T.

Theorem: For all T and T’ in B(n), there is a sequence of NNI (or SPR or TBR) operations that transforms T into T’.

SLIDE 6


Heuristic Search

  • 1. Start with an arbitrary tree T.
  • 2. Check “neighbors” of T.
  • 3. Move to a neighbor if it provides the best improvement in parsimony/likelihood score.

PAUP* (a widely used phylogenetic package) includes the command:

    hsearch nreps=num swap=type

where type = NNI, SPR, or TBR.

SLIDE 7

From Likelihood to Bayesian

Given data X = (x1, …, xn), we found the tree T and branch lengths t* that maximized likelihood Pr[X | T, t*]. What about other trees? Could we compute Pr[T, t* | X]?

Back to Coin Flipping

Flip a coin with p = Pr[heads] unknown. Earlier we computed the maximum likelihood estimate of p, using L(p) = Pr[D | p]. Bayes' theorem gives the posterior:

Pr[p | D] = Pr[p, D] / Pr[D] = Pr[D | p] Pr[p] / Pr[D]

[Figure: prior and posterior for p after 11 tosses (5 heads) and 44 tosses (20 heads)]
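The coin posterior can be approximated on a grid of p values. This sketch assumes a uniform prior when none is supplied, since the slide does not say which prior it used; the grid size and function names are also illustrative.

```python
def grid_posterior(heads, tosses, prior=None, grid_size=1001):
    """Posterior Pr[p | D] on an evenly spaced grid of p values in [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    if prior is None:
        prior = [1.0] * grid_size  # assumption: uniform prior over p
    # Numerator of Bayes' theorem: likelihood times prior at each grid point.
    unnormalized = [(p ** heads) * ((1 - p) ** (tosses - heads)) * w
                    for p, w in zip(grid, prior)]
    evidence = sum(unnormalized)   # plays the role of Pr[D]
    return grid, [u / evidence for u in unnormalized]


def mean_and_sd(grid, posterior):
    mean = sum(p * w for p, w in zip(grid, posterior))
    var = sum((p - mean) ** 2 * w for p, w in zip(grid, posterior))
    return mean, var ** 0.5


grid, post_11 = grid_posterior(heads=5, tosses=11)
_, post_44 = grid_posterior(heads=20, tosses=44)
mode_44 = grid[post_44.index(max(post_44))]
```

Under the uniform prior the posterior mode coincides with the maximum likelihood estimate (20/44 for the larger sample), and the posterior for 44 tosses is narrower than the one for 11 tosses.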

SLIDE 8

Bayesian Methods

Bayes' theorem:

Pr[T, t* | X] = Pr[X, T, t*] / Pr[X]
              = Pr[X | T, t*] Pr[T, t*] / Pr[X]
              = Pr[X | T, t*] Pr[T, t*] / ( Σ_{T’, t’} Pr[X | T’, t’] Pr[T’, t’] )

Here Pr[T, t*] is the prior and Pr[T, t* | X] is the posterior.

Problem: Cannot compute the denominator, which sums over all trees and branch lengths.
Solution: Use the power of Markov chains to draw (“sample”) trees according to the distribution Pr[T, t* | X].

SLIDE 9

Markov Chain Monte Carlo

To sample from a distribution π:

  • 1. Define a Markov chain with equilibrium distribution π.
  • 2. Simulate the chain through many transitions.
  • 3. After many transitions (e.g. ~10000), the chain will be close to equilibrium π (“burn-in”).
  • 4. Output every n-th state (n ~ 50).

Example: the Jukes-Cantor model of DNA, with states A, C, G, T. Equilibrium distribution: q_A = q_C = q_G = q_T = 1/4.
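Burn-in and thinning can be tried on a discrete-time chain in the spirit of Jukes-Cantor, where every substitution is equally likely. The per-step substitution probability `mu` and the function names are assumptions made for this sketch, not values from the lecture.

```python
import random
from collections import Counter

BASES = "ACGT"


def jc_step(state, rng, mu=0.3):
    """With probability mu, replace the base by one of the other three."""
    if rng.random() < mu:
        return rng.choice([b for b in BASES if b != state])
    return state


def sample_chain(n_samples, burn_in=10000, thin=50, seed=0):
    rng = random.Random(seed)
    state = "A"
    for _ in range(burn_in):       # discard the burn-in transitions
        state = jc_step(state, rng)
    samples = []
    while len(samples) < n_samples:
        for _ in range(thin):      # keep only every thin-th state
            state = jc_step(state, rng)
        samples.append(state)
    return samples


counts = Counter(sample_chain(4000))
freqs = {b: counts[b] / 4000 for b in BASES}
```

The sampled frequencies should each be close to 1/4, matching the equilibrium distribution stated above.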


SLIDE 10

MCMC on Trees

[Figure: NNI neighborhood for trees with 5 leaves]

  • 1. Define a Markov chain:
      • States are trees T.
      • Equilibrium distribution is the posterior Pr[T, t* | X].
  • 2. Simulate the Markov chain for many steps (burn-in).
  • 3. Output T from every n-th (e.g. n = 50) step.

For transitions, can use NNI, SPR, TBR, or other operations.

Can define* the transition probabilities of this Markov chain without computing Z = Σ_{T’, t’} Pr[X | T’, t’] Pr[T’, t’] (Metropolis algorithm).

*“involves burning of incense, casting of chicken bones, use of magical incantations, and invoking the opinions of more prestigious colleagues.” -- Felsenstein
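The reason Z is never needed is that the Metropolis rule only uses ratios of unnormalized weights. A minimal sketch on a toy state space, assuming a symmetric proposal; the four-state ring, the weights, and all names here are hypothetical stand-ins for trees and Pr[X | T, t] Pr[T, t]:

```python
import random
from collections import Counter


def metropolis(weight, propose, start, n_samples,
               burn_in=10000, thin=50, seed=0):
    """Sample states with probability proportional to weight(state).

    Assumes propose() is symmetric and weight() is strictly positive.
    """
    rng = random.Random(seed)
    state = start
    samples = []
    total_steps = burn_in + n_samples * thin
    for step in range(1, total_steps + 1):
        candidate = propose(state, rng)
        # Accept with probability min(1, w(candidate) / w(state)).
        # The normalizing constant Z cancels in this ratio.
        if rng.random() < weight(candidate) / weight(state):
            state = candidate
        if step > burn_in and (step - burn_in) % thin == 0:
            samples.append(state)
    return samples


# Toy target: four states on a ring with unnormalized weights 1, 2, 3, 4.
WEIGHTS = [1.0, 2.0, 3.0, 4.0]


def ring_proposal(state, rng):
    return (state + rng.choice([-1, 1])) % len(WEIGHTS)


draws = metropolis(lambda s: WEIGHTS[s], ring_proposal, start=0, n_samples=4000)
empirical = {s: c / len(draws) for s, c in Counter(draws).items()}
```

The empirical frequencies should approach 0.1, 0.2, 0.3, 0.4 although the code never sums the weights. For trees, the proposal would be a random NNI, SPR, or TBR move.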

How Many Times Did Wings Evolve?

  • Previous studies had shown loss of wings: winged → wingless transitions.
  • Gain of wings (wingless → winged transition) appears to be much more complicated.

SLIDE 11

Phylogeny of Insects

Build a phylogeny of winged and wingless stick insects.

Used data from:

  • 18S ribosomal DNA (~1,900 base pairs (bp))
  • 28S rDNA (2,250 bp)
  • Portion of histone 3 (H3, 372 bp)

Used multiple tree reconstruction techniques (Nature 2003).

Most Parsimonious Evolutionary Tree of Winged and Wingless Insects

  • All most parsimonious reconstructions gave a wingless ancestor.
  • All required multiple transitions between winged and wingless.

SLIDE 12

[Figure: Most Parsimonious Evolutionary Tree of Winged and Wingless Insects]

Will Wingless Insects Fly Again?

  • All most parsimonious reconstructions required the re-invention of wings.
  • It is likely that wing developmental pathways are conserved in wingless stick insects.

SLIDE 13

Next Questions

  • How to combine/merge trees?
  • How to determine “confidence” in a particular tree/branch?

Multiple Trees?

SLIDE 14

Consensus Trees

Strict Consensus Tree

SLIDE 15

Strict Consensus

No non‐trivial splits in common! Strict consensus tree is unresolved.
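In terms of splits, the strict consensus keeps only the splits present in every input tree. A minimal sketch, assuming each tree is given by its set of non-trivial splits; the helper names and example data are illustrative.

```python
def split(a, b):
    """An unordered split A | B as a hashable value."""
    return frozenset([frozenset(a), frozenset(b)])


def strict_consensus(split_sets):
    """Splits shared by every tree in a non-empty list of split sets."""
    return set.intersection(*(set(s) for s in split_sets))


# Non-trivial splits of three five-leaf trees (illustrative data).
tree1 = {split("ab", "cde"), split("abc", "de")}
tree2 = {split("ab", "cde"), split("abd", "ce")}
tree3 = {split("ac", "bde"), split("acd", "be")}

shared_12 = strict_consensus([tree1, tree2])
shared_all = strict_consensus([tree1, tree2, tree3])
```

Here `shared_12` contains only ab | cde, while `shared_all` is empty: the three trees have no non-trivial split in common, so their strict consensus is the unresolved star tree.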

Splits Equivalence Theorem

A phylogenetic tree T defines a collection of splits Σ(T) = { L_e | e is an edge in T }.

Splits A1 | B1 and A2 | B2 are pairwise compatible if at least one of A1 ∩ A2, A1 ∩ B2, B1 ∩ A2, and B1 ∩ B2 is the empty set.

Splits Equivalence Theorem: Let Σ be a collection of splits. There is a phylogenetic tree T such that Σ(T) = Σ if and only if the splits in Σ are pairwise compatible.

The Pairwise Compatibility Theorem (for binary characters) follows from this theorem.

SLIDE 16

Majority Consensus Tree
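The slide gives only the title, so the following uses the usual majority-rule definition as an assumption: keep every split that occurs in more than half of the input trees. The helper names and example data are illustrative.

```python
from collections import Counter


def split(a, b):
    """An unordered split A | B as a hashable value."""
    return frozenset([frozenset(a), frozenset(b)])


def majority_consensus(split_sets):
    """Splits occurring in strictly more than half of the trees."""
    counts = Counter(s for splits in split_sets for s in set(splits))
    return {s for s, c in counts.items() if c > len(split_sets) / 2}


# Non-trivial splits of three five-leaf trees (illustrative data).
trees = [
    {split("ab", "cde"), split("abc", "de")},
    {split("ab", "cde"), split("abd", "ce")},
    {split("ac", "bde"), split("acd", "be")},
]

majority = majority_consensus(trees)
```

The split ab | cde appears in two of the three trees and is kept; every other split appears once and is dropped. Any two splits that each occur in more than half of the trees must occur together in at least one tree, so the retained splits are pairwise compatible.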