

3/10/09

CSCI1950-Z Computational Methods for Biology, Lecture 11

Ben Raphael, March 2, 2009

http://cs.brown.edu/courses/csci1950-z/

Algorithm Summary

Method                     Input               Output    Approach
Neighbor Joining           Distance matrix D   T, B      Distance based
UPGMA                      Distance matrix D   T, B      Distance based
Sankoff’s & Fitch’s Alg.   Characters, T       A, B      Parsimony
Perfect Phylogeny          Characters          A, B, T   Compatibility
Felsenstein                Characters, T, B    A         Probabilistic (Likelihood)

T = tree topology, B = branch lengths, A = ancestral states.

Heuristic search methods are used to find T and B in the parsimony and likelihood approaches.


Using Multiple Methods

  • Relying on a single method or dataset for phylogenetic analysis often gives an incomplete picture.

  • If different methods (parsimony, distance-based, etc.) applied to the same or different datasets give the same result, we have greater confidence that it is the correct answer.

  • Consensus or supertree methods can be used to combine this evidence.
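The slides do not say which consensus method is meant. As one illustration, here is a minimal sketch of majority-rule consensus, assuming each method's result is a rooted binary tree written as nested tuples with leaf names as strings (a hypothetical encoding). It counts how often each clade (the set of leaves below a node) appears and keeps the clades found in more than half of the trees.

```python
from collections import Counter
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a rooted binary tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set below every internal node of the tree."""
    found: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = visit(node[0]) | visit(node[1])
        found.add(below)
        return below

    visit(tree)
    return found


def majority_clades(trees: List[Tree]) -> Set[FrozenSet[str]]:
    """Clades that appear in more than half of the input trees."""
    counts = Counter(c for t in trees for c in clades(t))
    return {c for c, k in counts.items() if k > len(trees) / 2}


# Trees for four taxa, e.g. produced by three different reconstruction methods.
t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "B"), "D"), "C")
t3 = ((("A", "C"), "B"), "D")
print(sorted(sorted(c) for c in majority_clades([t1, t2, t3])))
# → [['A', 'B'], ['A', 'B', 'C'], ['A', 'B', 'C', 'D']]
```

Any two clades that each occur in more than half of the trees must co-occur in at least one tree, so they are compatible; the kept clades therefore always define a (possibly non-binary) consensus tree.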

Phylogeny of Insects

Build a phylogeny of winged and wingless stick insects.
Data used:
  • 18S ribosomal DNA (~1,900 base pairs (bp))
  • 28S rDNA (2,250 bp)
  • A portion of histone 3 (H3, 372 bp)
Multiple tree reconstruction techniques were used (Nature 2003).


Further Problems…

Contradictory answers are sometimes not a fault of the data but a result of overly simplistic assumptions about the evolutionary process:

  • No homoplasy: characters change state only once.
  • Independence of characters.
  • Modeling mutations in DNA.
  • Genes/genomes evolve only by single-letter mutations.

Biology 101


Cell Division and Mutation

Types of mutation (figure): single nucleotide change, copy number change, structural change.

Whole‐Genome Phylogeny

Finding the same gene (descended from a common ancestor) in different genomes is non-trivial.


Phylogeny of Insects (continued)

The genes used in the stick-insect study (18S rDNA, 28S rDNA, and histone H3) were chosen because they are assumed to be highly conserved across large evolutionary distances.

Outline

Whole Genome Phylogeny

  • Gene Trees vs. Species Trees
  • Reconciling Trees
  • Genome Rearrangements

Genome sequencing is now routine, so data for these methods are increasingly available and useful.


Gene Trees vs. Species Trees

These trees indicate different phylogenetic relationships. Is one of them wrong?

Gene Clusters/Families

Gene duplication is a common mechanism for the evolution of new gene function (Ohno 1970).

Gene Trees and Species Trees

Evolution of a gene family inside the species tree: duplications and losses occur.

Gene Trees and Species Trees

Hypothetical duplications explain the discrepancy between gene and species trees.


Gene Trees and Species Trees

Duplications are observed, but we do not know which copies of the gene descended from a common ancestor.

Evolution of Gene Tree Inside Species Tree

Three events:

  • 1. Speciation
  • 2. Loss
  • 3. Duplication

Orthologs vs. Paralogs

Three events:

  • 1. Speciation
  • 2. Loss
  • 3. Duplication

Orthologs: genes descended from a common ancestor through speciation.

Paralogs: genes related by duplication.

Distinguishing orthologs from paralogs is difficult! Sequence similarity is not enough.
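Once a gene tree has been reconciled, so that each internal node is labeled as a speciation or a duplication, the two definitions give a direct test: two genes are orthologs if their lowest common ancestor in the gene tree is a speciation, and paralogs if it is a duplication. The sketch below assumes a hypothetical nested-tuple encoding `(event, left, right)` with event `"S"` or `"D"`; obtaining those labels is exactly the hard part that reconciliation addresses.

```python
from typing import Set, Tuple, Union

# Hypothetical encoding: a leaf is a gene name; an internal node is
# (event, left, right) with event "S" (speciation) or "D" (duplication).
GeneTree = Union[str, Tuple[str, "GeneTree", "GeneTree"]]


def leaves(tree: GeneTree) -> Set[str]:
    """All gene names below this node."""
    if isinstance(tree, str):
        return {tree}
    return leaves(tree[1]) | leaves(tree[2])


def lca_event(tree: GeneTree, a: str, b: str) -> str:
    """Event label at the lowest common ancestor of distinct genes a and b."""
    node = tree
    while True:
        # Step into the child subtree that still contains both genes, if any.
        for child in node[1:]:
            if not isinstance(child, str) and {a, b} <= leaves(child):
                node = child
                break
        else:
            return node[0]


def relation(tree: GeneTree, a: str, b: str) -> str:
    return "orthologs" if lca_event(tree, a, b) == "S" else "paralogs"


# A duplication followed by speciation into human and mouse.
family = ("D", ("S", "human_A", "mouse_A"), ("S", "human_B", "mouse_B"))
print(relation(family, "human_A", "mouse_A"))  # orthologs
print(relation(family, "human_A", "mouse_B"))  # paralogs
```

Note that human_A and mouse_B are paralogs even though they sit in different species, which is one reason sequence similarity alone cannot settle the question.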

Gene-Species Tree Reconciliation

Given: a rooted binary gene tree TG and a rooted binary species tree TS.
Find: an embedding of TG in TS that minimizes the number of duplications (and losses).
The embedded tree is called a reconciled tree (Goodman et al. 1979).


Reconciliation Example


Reconciliation Algorithm

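Zmasek and Eddy (2001): each node g of TG is mapped to $M(g) := \lambda_{G,T}(g)$, the lowest node of TS whose subtree contains the species of every gene below g. Node g is a duplication if M(g) equals M of one of its children; otherwise it is a speciation.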
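A minimal sketch of the LCA mapping and the duplication test, assuming rooted binary trees encoded as nested tuples of (hypothetical) leaf names and a dictionary `species_of` from each gene to its species. It computes M(g) bottom-up and flags g as a duplication when M(g) equals M of one of its children. LCAs are found by walking up parent pointers, which is simple but not the fastest option.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a rooted binary tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def index_tree(ts: Tree):
    """Parent and depth of every node of the species tree TS."""
    parent: Dict = {ts: None}
    depth: Dict = {ts: 0}
    stack = [ts]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child] = node
                depth[child] = depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Walk the deeper node up until both meet; O(height of TS) per query."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def duplications(tg: Tree, ts: Tree, species_of: Dict[str, str]) -> List[Tree]:
    """Gene-tree nodes g with M(g) equal to M(child) for some child of g."""
    parent, depth = index_tree(ts)
    dups: List[Tree] = []

    def mapping(g: Tree):
        if isinstance(g, str):
            return species_of[g]          # M(leaf) = species of that gene
        m_left, m_right = mapping(g[0]), mapping(g[1])
        m = lca(m_left, m_right, parent, depth)
        if m == m_left or m == m_right:   # duplication rule
            dups.append(g)
        return m

    mapping(tg)
    return dups


TS = (("human", "mouse"), "chicken")
species_of = {"h1": "human", "h2": "human", "m1": "mouse", "m2": "mouse", "c1": "chicken"}
TG = ((("h1", "m1"), "c1"), ("h2", "m2"))
print(duplications(TG, TS, species_of))  # the root of TG is the only duplication
```

In this example the root of TG maps to the root of TS, and so does its left child, so the root is inferred to be a duplication; every other node is a speciation.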

Run-Time Analysis

n = number of leaves in TG.

Initialization:
  • O(n): number the nodes of TS.
  • O(n): label the external nodes (using a hash table).

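Reconciliation Algorithm

(Figure: algorithm with step costs O(n^2), O(n), and O(n log n).)

The algorithm takes O(n^2) time in the worst case, since each of the O(n) LCA computations can walk O(n) nodes of TS. Using algorithms that compute the LCA in O(1) time gives an O(n) algorithm (Zhang 1997; Chen et al. 2001).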
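Constant-time LCA queries can be illustrated with a standard textbook technique: record an Euler tour of TS, then answer LCA(u, v) as the shallowest node visited between the first occurrences of u and v, using a sparse table for range-minimum queries. This sketch spends O(n log n) time on preprocessing (linear-preprocessing methods exist but are more involved) and then answers each query in O(1). It reuses a hypothetical nested-tuple tree encoding and is not the specific method of Zhang (1997) or Chen et al. (2001).

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a rooted binary tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


class EulerLCA:
    """LCA queries in O(1) after O(n log n) preprocessing (Euler tour + sparse table)."""

    def __init__(self, root: Tree):
        self.euler: List[Tree] = []   # nodes in Euler-tour order
        self.depths: List[int] = []   # depth of each tour entry
        self.first: Dict = {}         # first tour position of each node
        self._tour(root, 0)
        self._build_sparse_table()

    def _tour(self, node: Tree, d: int) -> None:
        self.first[node] = len(self.euler)
        self.euler.append(node)
        self.depths.append(d)
        if isinstance(node, tuple):
            for child in node:
                self._tour(child, d + 1)
                # Revisit the parent after returning from each child.
                self.euler.append(node)
                self.depths.append(d)

    def _shallower(self, i: int, j: int) -> int:
        return i if self.depths[i] <= self.depths[j] else j

    def _build_sparse_table(self) -> None:
        m = len(self.depths)
        # table[k][i] = tour index of the shallowest entry in [i, i + 2**k)
        self.table = [list(range(m))]
        k = 1
        while (1 << k) <= m:
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append(
                [self._shallower(prev[i], prev[i + half]) for i in range(m - (1 << k) + 1)]
            )
            k += 1

    def query(self, u: Tree, v: Tree) -> Tree:
        i, j = sorted((self.first[u], self.first[v]))
        k = (j - i + 1).bit_length() - 1  # largest power of two fitting in [i, j]
        best = self._shallower(self.table[k][i], self.table[k][j - (1 << k) + 1])
        return self.euler[best]


TS = (("human", "mouse"), "chicken")
lca = EulerLCA(TS)
print(lca.query("human", "mouse"))    # ('human', 'mouse')
print(lca.query("mouse", "chicken"))  # (('human', 'mouse'), 'chicken')
```

The reconciliation algorithm needs one LCA query per internal node of TG, so with O(1) queries the total cost is dominated by preprocessing: O(n log n) for this sketch, O(n) with linear-time LCA preprocessing.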

Gene Trees and Species Trees: Extensions

  • 1. Species tree TS unknown.

    – Use minimum duplication/loss as the objective function to search tree space.
    – NP-hard (Ma et al. 1998).
    – Heuristic search (NNI, SPR, TBR, etc.).

  • 2. Multiple gene trees $T_{G_1}, T_{G_2}, \dots, T_{G_N}$.

    Minimize $\sum_{i=1}^{N} c(T_{G_i}, S)$, where $c(T_{G_i}, S)$ is the number of duplications/losses on the reconciled tree for $T_{G_i}$.
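A sketch of the multiple-gene-tree objective, assuming c(TGi, S) counts duplications plus losses and using a hypothetical nested-tuple encoding with a `species_of` map from genes to species. Losses follow a common convention, assumed here rather than stated on the slide: on the edge from gene node g to child c, a speciation at M(g) accounts for d(M(c)) - d(M(g)) - 1 losses and a duplication for d(M(c)) - d(M(g)), where d is depth in S.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a rooted binary tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def index_tree(s: Tree):
    """Parent and depth of every node of the species tree S."""
    parent: Dict = {s: None}
    depth: Dict = {s: 0}
    stack = [s]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child], depth[child] = node, depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def cost(tg: Tree, s: Tree, species_of: Dict[str, str]) -> int:
    """c(TG, S): duplications plus losses in the reconciled tree."""
    parent, depth = index_tree(s)

    def visit(g: Tree):
        # Returns (M(g), duplications, losses) for the subtree rooted at g.
        if isinstance(g, str):
            return species_of[g], 0, 0
        (ml, dl, ll), (mr, dr, lr) = visit(g[0]), visit(g[1])
        m = lca(ml, mr, parent, depth)
        dup = m == ml or m == mr
        # A speciation passes into a child of M(g) without a loss.
        offset = 0 if dup else 1
        losses = (depth[ml] - depth[m] - offset) + (depth[mr] - depth[m] - offset)
        return m, dl + dr + int(dup), ll + lr + losses

    _, dups, losses = visit(tg)
    return dups + losses


def total_cost(gene_trees: List[Tree], s: Tree, species_of: Dict[str, str]) -> int:
    """The objective: sum of c(TGi, S) over all gene trees."""
    return sum(cost(tg, s, species_of) for tg in gene_trees)


S = (("human", "mouse"), "chicken")
species_of = {"h1": "human", "h2": "human", "m1": "mouse", "m2": "mouse", "c1": "chicken"}
TG1 = ((("h1", "m1"), "c1"), ("h2", "m2"))  # 1 duplication + 1 loss (chicken copy)
TG2 = (("h1", "m1"), "c1")                   # agrees with S: cost 0
print(total_cost([TG1, TG2], S, species_of))  # 2
```

When S itself is unknown, this function would serve as the score evaluated on each candidate species tree during a heuristic search.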


Rooting by Duplication

(Figure: two rootings of the same gene tree, implying 1 duplication and 3 duplications, respectively.)

  • Gene trees are often unrooted.
  • The root is determined using an outgroup: species known to be distantly related to all remaining species.
  • Duplications can be used to determine the outgroup.
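A sketch of rooting by duplication, assuming an unrooted binary gene tree given as an adjacency map over hypothetical node names (genes as leaves, internal nodes of degree 3) and a rooted species tree in a nested-tuple encoding. It roots the gene tree on each edge in turn, counts duplications with the LCA-mapping rule, and returns a rooting with the fewest. Ties are common, and this sketch simply reports the first tied edge in sorted order.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a rooted binary tree is a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def index_tree(ts: Tree):
    """Parent and depth of every node of the species tree TS."""
    parent: Dict = {ts: None}
    depth: Dict = {ts: 0}
    stack = [ts]
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            for child in node:
                parent[child], depth[child] = node, depth[node] + 1
                stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def count_duplications(tg: Tree, ts: Tree, species_of: Dict[str, str]) -> int:
    parent, depth = index_tree(ts)

    def visit(g: Tree):
        # Returns (M(g), number of duplications in the subtree rooted at g).
        if isinstance(g, str):
            return species_of[g], 0
        (ml, dl), (mr, dr) = visit(g[0]), visit(g[1])
        m = lca(ml, mr, parent, depth)
        return m, dl + dr + int(m == ml or m == mr)

    return visit(tg)[1]


def root_on_edge(adj: Dict[str, List[str]], u: str, v: str) -> Tree:
    """Place the root on edge (u, v) and orient every other edge away from it."""

    def build(node: str, came_from: str) -> Tree:
        kids = [w for w in adj[node] if w != came_from]
        if not kids:
            return node                  # leaf gene
        left, right = kids               # internal nodes have degree 3
        return (build(left, node), build(right, node))

    return (build(u, v), build(v, u))


def best_root(adj, ts, species_of):
    """(fewest duplications, edge) over all possible root positions."""
    edges = sorted({tuple(sorted((a, b))) for a in adj for b in adj[a]})
    return min((count_duplications(root_on_edge(adj, a, b), ts, species_of), (a, b))
               for a, b in edges)


TS = (("human", "mouse"), "chicken")
species_of = {"h1": "human", "h2": "human", "m1": "mouse", "m2": "mouse", "c1": "chicken"}
# Unrooted gene tree; x, y, z are hypothetical internal node names.
adj = {
    "h1": ["x"], "m1": ["x"], "c1": ["y"], "h2": ["z"], "m2": ["z"],
    "x": ["h1", "m1", "y"], "y": ["x", "c1", "z"], "z": ["y", "h2", "m2"],
}
print(best_root(adj, TS, species_of))  # (1, ('c1', 'y'))
```

Here rooting next to h1 implies two duplications, while several other edges need only one, so the duplication count does discriminate between candidate roots without an outgroup.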

Rooting by Duplication

Tree of life: three major branches (bacteria, archaea, eukaryotes). No outgroup!