


2/25/09 1

CSCI1950-Z Computational Methods for Biology
Lecture 8

Ben Raphael
February 18, 2009

http://cs.brown.edu/courses/csci1950-z/

Outline

Searching Through Trees

  • 1. Optimizing branch lengths in ML.
  • 2. Computing distances between trees.

Probabilistic Model

Given a tree (T, t*) with leaves labeled by characters in M, Pr[ M | T, t*] is the probability of observing M at the leaves, summed over all labelings of the ancestral nodes.

Assume:

  • 1. Characters evolve independently: Pr[ M | T, t*] = Πj Pr[ Mj | T, t*], so consider each character separately.
  • 2. Constant rate of mutation on each branch.
  • 3. State of a vertex depends only on its parent and the branch length, i.e. Pr[ x | y, t] depends only on y and t (Markov process).

Pr[ x | y, t] = probability that y mutates to x in time t

[Figure: a branch of length t from parent state y to child state x]

Probabilistic Model

n species: x_1, x_2, …, x_n. Let α(i) = ancestor of node i. Let a_{n+1}, a_{n+2}, …, a_{2n-1} = characters on internal nodes, where nodes are numbered from the internal vertices up to the root (node 2n-1).

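\[
\Pr[x_1, \ldots, x_n \mid T, t_1, \ldots, t_{2n-2}] =
\sum_{a_{n+1}, a_{n+2}, \ldots, a_{2n-1}} q_{a_{2n-1}}
\prod_{i=n+1}^{2n-2} \Pr[a_i \mid a_{\alpha(i)}, t_i]
\prod_{i=1}^{n} \Pr[x_i \mid a_{\alpha(i)}, t_i]
\]

Follows from the Law of Total Probability: P(X) = Σi P(X | Yi) P(Yi).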
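As a small worked instance of the sum over ancestral labelings, take n = 2: the only internal node is the root (node 2n-1 = 3), so the product over non-root internal nodes is empty. Reading q_a as the probability of state a at the root (an interpretation, not stated on the slide), the likelihood reduces to:

```latex
\[
\Pr[x_1, x_2 \mid T, t_1, t_2]
  = \sum_{a_3} q_{a_3} \, \Pr[x_1 \mid a_3, t_1] \, \Pr[x_2 \mid a_3, t_2]
\]
```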


Felsenstein’s Algorithm

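Let Pr[T_k | a] = probability of the leaf nodes “below” node k, given a_k = a. Compute via dynamic programming:

\[
\Pr[T_k \mid a] = \Big( \sum_{b} \Pr[b \mid a, t_i] \, \Pr[T_i \mid b] \Big) \Big( \sum_{c} \Pr[c \mid a, t_j] \, \Pr[T_j \mid c] \Big)
\]

where i and j are the children of node k.

Initial conditions. For k = 1, …, n (leaf nodes):

Pr[T_k | a] = 1 if a = x_k, and 0 otherwise.

[Figure: a node with state a and two children with states b and c]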
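The recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Jukes-Cantor parametrization in `jc`, the uniform root distribution q_a = 1/4, and the nested-tuple tree encoding `(left, t_left, right, t_right)` are all choices made here, not taken from the slides.

```python
import math

STATES = "ACGT"

def jc(b, a, t):
    """Pr[b | a, t] under Jukes-Cantor (assumed form; t in expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def conditional(node):
    """Return {a: Pr[T_k | a]} for the subtree rooted at `node`.

    A leaf is a one-letter string (the observed character x_k).
    An internal node is a tuple (left, t_left, right, t_right) (hypothetical encoding).
    """
    if isinstance(node, str):
        # Initial condition: 1 if a equals the observed character, 0 otherwise.
        return {a: 1.0 if a == node else 0.0 for a in STATES}
    left, t_left, right, t_right = node
    p_left, p_right = conditional(left), conditional(right)
    return {
        a: sum(jc(b, a, t_left) * p_left[b] for b in STATES)
           * sum(jc(c, a, t_right) * p_right[c] for c in STATES)
        for a in STATES
    }

def likelihood(root):
    """Sum over root states a of Pr[T_root | a] * q_a, with q_a = 1/4 assumed."""
    cond = conditional(root)
    return sum(0.25 * cond[a] for a in STATES)

# One character on a 3-leaf tree ((A, C), G) with arbitrary branch lengths.
tree = (("A", 0.1, "C", 0.2), 0.3, "G", 0.4)
print(likelihood(tree))
```

Each node is visited once and does constant work per pair of states, so one character costs time linear in the number of nodes.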

Maximum Likelihood when T unknown

Must search over all trees T. Complexity unknown until recently:

– Felsenstein book (2004): “There has also been no proof that the problem is NP-hard (as there has been for many other methods)”
– Shamir notes (2000): “[Maximum likelihood] not proven to be NP-complete.”

  • ML is NP‐hard (B. Chor and T. Tuller, RECOMB 2005).

– Use Jukes‐Cantor model.

Find T, t* that maximize:

\[
\Pr[x_1, \ldots, x_n \mid T, t^*] = \sum_{a} \Pr[T_{2n-1} \mid a] \, q_a
\]


Unknown branch lengths

  • T fixed, branch lengths t* are unknown.
  • Use a local optimization routine, e.g. Newton’s method or Expectation Maximization.
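To make the Newton step concrete, here is a minimal sketch for the simplest possible case: a single branch between two aligned sequences under an assumed Jukes-Cantor model, where k of m sites differ. The function name, the starting point t0 = k/m, and the step damping are illustrative choices. A full tree has many branch lengths, which are typically optimized in turn or jointly.

```python
import math

def jc_branch_mle(m, k, tol=1e-12, max_iter=100):
    """Newton's method for the ML branch length t between two sequences.

    Assumes Jukes-Cantor with t in expected substitutions per site, so that,
    with u = exp(-4t/3), the log-likelihood is (up to a constant)
        l(t) = (m - k) * log(1 + 3u) + k * log(1 - u).
    Requires 0 < k < 3m/4, otherwise no finite positive maximizer exists.
    """
    if not 0 < k < 0.75 * m:
        raise ValueError("need 0 < k < 3m/4")
    t = k / m  # start at the observed fraction of differing sites
    for _ in range(max_iter):
        u = math.exp(-4.0 * t / 3.0)
        h = 3.0 * (m - k) / (1.0 + 3.0 * u) - k / (1.0 - u)              # dl/du
        dh = -9.0 * (m - k) / (1.0 + 3.0 * u) ** 2 - k / (1.0 - u) ** 2  # d2l/du2
        grad = -4.0 * u / 3.0 * h                                        # dl/dt
        hess = 16.0 * u / 9.0 * h + 16.0 * u * u / 9.0 * dh              # d2l/dt2
        step = grad / hess
        while t - step <= 0.0:  # damp the step so that t stays positive
            step /= 2.0
        t -= step
        if abs(step) < tol:
            break
    return t

print(jc_branch_mle(100, 10))
```

For this one-branch case the maximizer is also available in closed form, t = -(3/4) ln(1 - 4k/(3m)), which gives a convenient check on the iteration. Newton's method is only locally convergent, so a poor starting point can fail.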

Finding the Optimal tree

Large Parsimony Problem
Input: M: an n x m character matrix.
Output: A tree T with:

  • n leaves labeled by the n rows of matrix M
  • labeling of the internal vertices of T minimizing the parsimony score over all possible trees and all possible labelings of internal vertices

Maximum Likelihood
Input: M: an n x m character matrix.
Output: A tree T and branch lengths t* with:

  • n leaves labeled by the n rows of matrix M
  • Pr[ M | T, t*] maximized.


Finding the Optimal tree

  • Both problems are NP-hard.
  • Possible search space is huge, especially as n increases:

– (2n – 3)!! possible rooted trees
– (2n – 5)!! possible unrooted trees

  • Exhaustive search only possible with small n (< 10).
  • Thus, heuristic search techniques (branch and bound, simulated annealing, genetic algorithms, etc.) are used.
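To get a feel for how fast these counts grow, the double factorials can be evaluated directly. The function names below are illustrative; the counts are for binary trees with n labeled leaves.

```python
def double_factorial(k):
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (returns 1 for k <= 1)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def num_rooted_trees(n):
    """(2n - 3)!! rooted binary trees on n labeled leaves (n >= 2)."""
    return double_factorial(2 * n - 3)

def num_unrooted_trees(n):
    """(2n - 5)!! unrooted binary trees on n labeled leaves (n >= 3)."""
    return double_factorial(2 * n - 5)

for n in (5, 10, 20):
    print(n, num_rooted_trees(n), num_unrooted_trees(n))
# n = 5:  105 rooted, 15 unrooted
# n = 10: 34459425 rooted, 2027025 unrooted
```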

Heuristic Search

  • 1. Start with an arbitrary tree T.
  • 2. Check “neighbors” of T.
  • 3. Move to a neighbor if it provides the best improvement in parsimony/likelihood score.

Caveats: Could get stuck in a local optimum and not reach the global optimum.
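The three steps amount to greedy local search. The sketch below is generic: `neighbors` and `score` are hypothetical callables supplied by the caller, and lower scores are taken to be better (as with parsimony). The toy example uses integers instead of trees, purely to show the local-optimum caveat.

```python
def hill_climb(start, neighbors, score):
    """Greedy local search: move to the best-scoring neighbor until none improves.

    `neighbors(x)` returns an iterable of candidates and `score(x)` a number to
    minimize; both are assumptions of this sketch, not part of the slides.
    """
    current = start
    while True:
        best = min(neighbors(current), key=score, default=current)
        if score(best) >= score(current):
            return current  # no neighbor improves: a local optimum
        current = best

# Toy landscape: positions 0..5 with these scores; neighbors are adjacent positions.
scores = [5, 3, 4, 2, 1, 6]

def adjacent(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(scores)]

print(hill_climb(0, adjacent, scores.__getitem__))  # stops at 1 (score 3), a local optimum
print(hill_climb(3, adjacent, scores.__getitem__))  # reaches 4 (score 1), the global optimum
```

The outcome depends on the starting point, which is why such searches are often restarted from several initial trees.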

Tree Perturbation

Simple operation: add or remove an edge.

ρ(T1, T2) = min{k : there exist simple operations α1, . . . , αk such that αk ◦ αk−1 ◦ ... ◦ α1(T1) = T2}

Trees and Splits

Given a set X, a split is a partition of X into two non-empty subsets A and B, written A | B. For a phylogenetic tree T with leaves L, each edge e defines a split Le = A | B, where A and B are the leaf sets of the two subtrees obtained by removing e.

[Figure: an edge e separating leaf sets A and B]


Computing the Splits Metric

A phylogenetic tree T defines a collection of splits Σ(T) = { Le | e is an edge in T }.

Theorem: ρ(T1, T2) = |Σ(T1) \ Σ(T2)| + |Σ(T2) \ Σ(T1)| = |Σ(T1)| + |Σ(T2)| - 2 |Σ(T1) ∩ Σ(T2)|

Proof: (whiteboard)

Notation: A \ B = {x : x ∈ A, x ∉ B}

Example

|Σ(T1)| = |E(T1)| = 8. |Σ(T2)| = |E(T2)| = 8. |Σ(T1) ∩ Σ(T2)| = |E(T)| = 6, so ρ(T1, T2) = 8 + 8 - 2·6 = 4.

From: Semple and Steel (2003)


Splits Metric

Note: ρ(T1, T2) = |Σ(T1) \ Σ(T2)| + |Σ(T2) \ Σ(T1)| = |Σ(T1) Δ Σ(T2)| (symmetric difference).

Also called the Robinson-Foulds Metric (1981).
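The splits metric is easy to compute once each tree is reduced to its set of splits. In the sketch below, trees are nested tuples of leaf names (a hypothetical encoding chosen for brevity), and each split A | B is stored as the side that does not contain the smallest leaf, so that equal splits compare equal. Both trees are assumed to have the same leaf set.

```python
def leaf_set(tree):
    """All leaf names below `tree` (a leaf is a string, an internal node a tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Σ(T): one split per edge, stored as the side without the reference leaf."""
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if node is not tree:  # every non-root node hangs from exactly one edge
            below = leaf_set(node)
            found.add(all_leaves - below if ref in below else below)
        if not isinstance(node, str):
            for child in node:
                visit(child)

    visit(tree)
    return found

def splits_distance(t1, t2):
    """ρ(T1, T2) = |Σ(T1) Δ Σ(T2)| (Robinson-Foulds)."""
    return len(splits(t1) ^ splits(t2))

t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
print(splits_distance(t1, t2))  # 2: AB|CDE is replaced by AC|BDE
```

Recomputing `leaf_set` at every node makes this quadratic in the number of leaves, which keeps the sketch short; a single bottom-up pass would do the same work in linear time.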

Nearest Neighbor Interchange

A Greedy Algorithm

  • A Branch Swapping algorithm
  • Only evaluates a subset of all possible trees
  • Defines a neighbor of a tree as one reachable by a nearest neighbor interchange

– A rearrangement of the four subtrees defined by one internal edge
– Only three different arrangements per edge (the current one and two alternatives)


Nearest Neighbor Interchange

Rearrange four subtrees defined by one internal edge

Figure: Jones and Pevzner

Nearest Neighbor Interchange

B(n) := (unrooted) binary phylogenetic trees with n leaves.

Theorem (Robinson 1971): For all T and T’ in B(n), there is a sequence of NNIs that transforms T into T’.


Nearest Neighbor Interchange

ρNNI(T1, T2) = min{k : There exist β1, . . . , βk such that βk ◦ βk−1 ◦ ... ◦ β1(T1) = T2}

Claim: ρ ≤ 2ρNNI

Proof: Every NNI can be obtained by deleting an edge and inserting an edge.

Computing ρNNI for binary trees is NP-complete (Li and Zhang 1999)


Nearest Neighbor Interchange

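Claim: The number of NNI neighbors of a binary tree with n leaves is 2(n-3).

Proof: (whiteboard) Sketch: an unrooted binary tree with n leaves has n-3 internal edges, and each internal edge gives 2 alternative arrangements.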
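The count can be checked by enumeration on small trees. The sketch below stores an unrooted tree as an adjacency dict of sets and compares trees by their split sets, since internal node names carry no meaning. The `caterpillar` helper and all names are illustrative assumptions.

```python
def side_leaves(adj, start, blocked):
    """Leaves reachable from `start` without crossing the edge to `blocked`."""
    seen, stack = {start, blocked}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(x for x in seen - {blocked} if len(adj[x]) == 1)

def split_set(adj):
    """Canonical form: for each edge, the leaf side without the smallest leaf."""
    leaves = frozenset(x for x in adj if len(adj[x]) == 1)
    ref = min(leaves)
    out = set()
    for u in adj:
        for v in adj[u]:
            side = side_leaves(adj, v, u)
            out.add(leaves - side if ref in side else side)
    return frozenset(out)

def nni_neighbors(adj):
    """All trees obtained by one NNI: two per internal edge (u, v)."""
    result = []
    for u in adj:
        for v in adj[u]:
            if u < v and len(adj[u]) == 3 and len(adj[v]) == 3:
                b = next(x for x in adj[u] if x != v)  # one subtree on u's side
                for c in (x for x in adj[v] if x != u):  # each subtree on v's side
                    new = {k: set(nb) for k, nb in adj.items()}
                    # Swap subtree b (at u) with subtree c (at v).
                    new[u].remove(b); new[u].add(c)
                    new[v].remove(c); new[v].add(b)
                    new[b].remove(u); new[b].add(v)
                    new[c].remove(v); new[c].add(u)
                    result.append(new)
    return result

def caterpillar(leaves):
    """Unrooted binary caterpillar tree on the given leaf names (at least 4)."""
    n = len(leaves)
    inner = [f"n{i}" for i in range(n - 2)]
    adj = {x: set() for x in list(leaves) + inner}
    def link(a, b):
        adj[a].add(b); adj[b].add(a)
    for i in range(n - 3):
        link(inner[i], inner[i + 1])
    link(leaves[0], inner[0])
    link(leaves[-1], inner[-1])
    for i in range(1, n - 1):
        link(leaves[i], inner[i - 1])
    return adj

tree = caterpillar("ABCDE")
distinct = {split_set(t) for t in nni_neighbors(tree)}
print(len(distinct))  # 4 = 2 * (5 - 3)
```

Each neighbor differs from the starting tree in exactly one split, so its splits distance to the original is 2.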

Neighboring Trees

[Figure: parsimony scores for trees; NNI neighborhood for trees with 5 leaves]


Nearest Neighbor Interchange

Subtree Pruning and Regrafting (SPR)

http://artedi.ebc.uu.se/course/BioInfo-10p-2001/Phylogeny/Phylogeny-TreeSearch/SPR.gif

  • 1. Remove a branch.
  • 2. Reconnect the incident vertex by subdividing a branch.
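The two SPR steps can be sketched on an adjacency-dict representation of an unrooted binary tree. This is an illustrative implementation with hypothetical names, not a reference one. The pruned subtree keeps its attachment vertex u, the gap left behind is closed by joining u's two other neighbors, and u is then spliced into some other branch of the remaining tree.

```python
def side_nodes(adj, start, blocked):
    """Nodes reachable from `start` without crossing the edge to `blocked`."""
    seen, stack = {start, blocked}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {blocked}

def spr_neighbors(adj):
    """All trees obtained by one SPR move (the same tree may appear several times)."""
    out = []
    for u in adj:
        if len(adj[u]) != 3:
            continue  # the attachment vertex must be internal
        for v in adj[u]:
            pruned = side_nodes(adj, v, u) | {u}  # subtree at v plus its attachment vertex
            x, y = [w for w in adj[u] if w != v]
            # Candidate branches: edges of the remaining tree, except the new edge (x, y).
            rest_edges = [(p, q) for p in adj for q in adj[p]
                          if p < q and p not in pruned and q not in pruned]
            for p, q in rest_edges:
                new = {k: set(nb) for k, nb in adj.items()}
                # 1. Remove a branch: detach u from x and y, and join x to y.
                new[x].discard(u); new[y].discard(u)
                new[x].add(y); new[y].add(x)
                # 2. Reconnect: subdivide branch (p, q) with u.
                new[p].discard(q); new[q].discard(p)
                new[u] = {v, p, q}
                new[p].add(u); new[q].add(u)
                out.append(new)
    return out

def leaf_splits(adj):
    """Canonical form used to compare trees regardless of internal node names."""
    leaves = frozenset(x for x in adj if len(adj[x]) == 1)
    ref = min(leaves)
    out = set()
    for u in adj:
        for v in adj[u]:
            side = frozenset(side_nodes(adj, v, u)) & leaves
            out.add(leaves - side if ref in side else side)
    return frozenset(out)

# The 4-leaf tree AB|CD: one SPR move reaches both other 4-leaf trees.
quartet = {"A": {"n0"}, "B": {"n0"}, "C": {"n1"}, "D": {"n1"},
           "n0": {"A", "B", "n1"}, "n1": {"n0", "C", "D"}}
print(len({leaf_splits(t) for t in spr_neighbors(quartet)}))  # 2
```

Every NNI is a special case of SPR in which the pruned subtree is regrafted on a branch next to its original position, so the SPR neighborhood contains the NNI neighborhood.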

Tree Bisection and Reconnection (TBR)

  • 1. Remove a branch.
  • 2. Reconnect the subtrees by adding a new branch that subdivides branches in both.
