SLIDE 1

Alignment of Trees and Directed Acyclic Graphs

Gabriel Valiente

Algorithms, Bioinformatics, Complexity and Formal Methods Research Group, Technical University of Catalonia
Computational Biology and Bioinformatics Research Group, Research Institute of Health Science, University of the Balearic Islands
Centre for Genomic Regulation, Barcelona Biomedical Research Park

Ben-Gurion University of the Negev, Israel, April 27, 2009

Gabriel Valiente (UPC) Alignment of Directed Acyclic Graphs BGU 2009 1 / 35

SLIDE 2

Abstract

It is well known that the string edit distance and the alignment of strings coincide, while the alignment of trees differs from the tree edit distance. In this talk, we recall various constraints on directed acyclic graphs that allow for a unique (up to isomorphism) representation, called the path multiplicity representation, and present a new method for the alignment of trees and directed acyclic graphs that exploits the path multiplicity representation to produce a meaningful optimal alignment in polynomial time.

SLIDE 3

Plan of the Talk

  • String edit distance and alignment
  • Tree edit distance and alignment
  • DAG representation of phylogenetic networks
  • Path multiplicity representation
  • DAG alignment
  • Tree alignment as DAG alignment
  • Tool support
      • BioPerl module
      • Web interface to the BioPerl module
  • Conclusion

SLIDE 6

String edit distance and alignment

Definition

The edit distance between two strings is the smallest number of insertions, deletions, and substitutions needed to transform one string into the other

Definition

An alignment of two strings is an arrangement of the two strings as the rows of a matrix, with gaps (dashes) inserted between their elements so that some or all of the remaining (aligned) columns contain identical elements, and no column has a gap in both strings

Example (Optimal alignment)

-GCTTCCGGCTCGTATAATGTGTGG
 |||||*|*||   |||||* |
TGCTTCTGACT---ATAATA-G---
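Why edit distance and alignment coincide for strings can be seen in the classic dynamic program: every column of an alignment is a match, a substitution, an insertion or a deletion, so an alignment with k non-matching columns is an edit script with k operations, and vice versa. A minimal Python sketch, assuming unit costs for insertions, deletions and substitutions (the slide does not state the scoring behind the example above):

```python
def align(s: str, t: str) -> tuple[int, str, str]:
    """Unit-cost edit distance between s and t, plus one optimal alignment.

    Every alignment column is a match, a substitution, an insertion or a
    deletion, so the number of non-matching columns equals the distance.
    """
    n, m = len(s), len(t)
    # d[i][j] = edit distance between the prefixes s[:i] and t[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete s[i-1]
                          d[i][j - 1] + 1,        # insert t[j-1]
                          d[i - 1][j - 1] + sub)  # match or substitute
    # Trace back from d[n][m]; "-" marks a gap, and no column is gapped twice.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return d[n][m], "".join(reversed(top)), "".join(reversed(bottom))


dist, top, bottom = align("kitten", "sitting")
print(dist)    # 3
print(top)     # kitten-
print(bottom)  # sitting
```

The traceback prefers the diagonal move, so ties between equally good alignments are broken towards matches and substitutions.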

SLIDE 8

Tree edit distance and alignment

Definition

The edit distance between two trees is the smallest number of insertions, deletions, and substitutions needed to transform one tree into the other

Example (Edit distance)

[Figure: trees with nodes labeled a, b, c, d, e, f illustrating an edit sequence]
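A minimal Python sketch of the tree edit distance, under assumptions the slide does not fix: ordered labeled trees and unit cost for every insertion, deletion and non-identical substitution. It uses the classic recursion on forests that repeatedly removes the rightmost root; it is meant for small examples, not as an efficient algorithm.

```python
from functools import lru_cache

# Assumptions: ordered trees, unit costs.
# A tree is (label, children), where children is a tuple of trees;
# a forest is a tuple of trees. Tuples are hashable, so results can be cached.


def tree_edit_distance(t1, t2) -> int:
    """Unit-cost edit distance between two ordered labeled trees."""

    @lru_cache(maxsize=None)
    def forest_dist(f, g) -> int:
        if not f and not g:
            return 0
        if not g:  # only deletions remain: delete the rightmost root of f
            _, kids = f[-1]
            return forest_dist(f[:-1] + kids, g) + 1
        if not f:  # only insertions remain: insert the rightmost root of g
            _, kids = g[-1]
            return forest_dist(f, g[:-1] + kids) + 1
        (a, a_kids), (b, b_kids) = f[-1], g[-1]
        return min(
            # delete root a: its children take its place in the forest
            forest_dist(f[:-1] + a_kids, g) + 1,
            # insert root b
            forest_dist(f, g[:-1] + b_kids) + 1,
            # map a to b (cost 1 if labels differ), then align the subtrees
            # below them and the remaining forests independently
            forest_dist(a_kids, b_kids) + forest_dist(f[:-1], g[:-1]) + (a != b),
        )

    return forest_dist((t1,), (t2,))


t1 = ("a", (("b", (("c", ()),)),))  # a(b(c))
t2 = ("a", (("c", ()),))            # a(c)
print(tree_edit_distance(t1, t2))   # 1: delete b
```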

SLIDE 10

Tree edit distance and alignment

Definition

An alignment of two trees is an arrangement of the trees in which nodes labeled with a space are inserted so that the structures of the two trees coincide

Example (Optimal alignment)

[Figure: optimal alignment of two trees with nodes labeled a, b, c, d, e, f]

SLIDE 12

Tree edit distance and alignment

Remark

An alignment of trees is a restricted form of tree edit distance in which all the insertions precede all the deletions

Remark

With insertion cost 1, deletion cost 1, identical substitution cost 0, and non-identical substitution cost 2, an optimal tree edit yields a largest common subtree and an optimal alignment yields a smallest common supertree

  • T. Jiang, L. Wang, and K. Zhang. Alignment of trees—an alternative to tree edit. Theoretical Computer Science, 143(1):137–148, 1995

SLIDE 14

Tree edit distance and alignment

  • H. Bunke, X. Jiang, and A. Kandel. On the minimum common supergraph of two graphs. Computing, 65(1):13–25, 2000
  • M.-L. Fernández and G. Valiente. A graph distance measure combining maximum common subgraph and minimum common supergraph. Pattern Recognition Letters, 22(6–7):753–758, 2001

Theorem

The problems of finding a largest common subtree and a smallest common supertree of two trees, in each case together with a pair of witness (minor, topological, homeomorphic, or isomorphic) embeddings, are reducible to each other in time linear in the size of the trees

  • F. Rosselló and G. Valiente. An algebraic view of the relation between largest common subtrees and smallest common supertrees. Theoretical Computer Science, 362(1–3):33–53, 2006

SLIDE 15

Tree edit distance and alignment

Example

  • A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment and planar tanglegram layout. In Proc. 7th Workshop on Algorithms in Bioinformatics, volume 4645 of Lecture Notes in Bioinformatics, pages 98–110. Springer, 2007
  • A. Lozano, R. Pinter, O. Rokhlenko, G. Valiente, and M. Ziv-Ukelson. Seeded tree alignment. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 5(4):503–513, 2008

SLIDE 16

DAG representation of phylogenetic networks

  • D. H. Huson and D. Bryant. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol., 23(2):254–267, 2006

SLIDE 17

DAG representation of phylogenetic networks

Definition

A phylogenetic network is a directed acyclic graph whose terminal nodes are labeled by taxa names and whose internal nodes are either tree nodes (if they have only one parent) or hybrid nodes (if they have two or more parents)
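Node types can be read off the in-degrees. A small sketch that takes the network as a list of (parent, child) pairs (an input format chosen here for illustration) and assumes the input is already acyclic; a terminal node with several parents would simply be reported as 'leaf'.

```python
from collections import defaultdict


def classify_nodes(edges):
    """Label each node of a network given as (parent, child) pairs as
    'root', 'leaf', 'tree' (one parent) or 'hybrid' (two or more parents).
    Assumption: the edge list describes a directed acyclic graph."""
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    kinds = {}
    for v in set(parents) | set(children):
        if not parents[v]:
            kinds[v] = "root"
        elif not children[v]:
            kinds[v] = "leaf"  # terminal node, labeled by a taxon name
        elif len(parents[v]) == 1:
            kinds[v] = "tree"
        else:
            kinds[v] = "hybrid"
    return kinds


# Hypothetical network: h has two parents (a and b), so it is a hybrid node.
edges = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
         ("a", "1"), ("b", "2"), ("h", "3")]
print(sorted(classify_nodes(edges).items()))
```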

SLIDE 18

DAG representation of phylogenetic networks

Example

44 polymorphic sites in a sample of the single gene encoding alcohol dehydrogenase in 11 sequences from 5 natural populations of D. melanogaster

Wa-S   CCGCAATAATGGCGCTACTCTCACAATAACCCACTAGACAGCCT
Fl-1S  CCCCAATATGGGCGCTACTTTCACAATAACCCACTAGACAGCCT
Af-S   CCGCAATATGGGCGCTACCCCCCGGAATCTCCACTAAACAGTCA
Fr-S   CCGCAATATGGGCGCTGTCCCCCGGAATCTCCACTAAACTACCT
Fl-2S  CCGAGATAAGTCCGAGGTCCCCCGGAATCTCCACTAGCCAGCCT
Ja-S   CCCCAATATGGGCGCGACCCCCCGGAATCTCTATTCACCAGCTT
Fl-F   CCCCAATATGGGCGCGACCCCCCGGAATCTGTCTCCGCCAGCCT
Fr-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
Wa-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
Af-F   TGCAGATAAGTCGGCGACCCCCCGGAATCTGTCTCCGCGAGCCT
Ja-F   TGCAGGGGAGGGCTCGACCCCACGGGATCTGTCTCCGCCAGCCT

  • M. Kreitman. Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature, 304(5925):412–417, 1983

SLIDE 19

DAG representation of phylogenetic networks

Example

[Figure: phylogenetic reconstruction with leaves Fl-F, Ja-S, Fl-1S, Fr-S, Af-S, Wa-S, Fl-2S, Wa-F, Fr-F, Af-F, Ja-F]

SLIDE 20

DAG representation of phylogenetic networks

Example

[Figure: phylogenetic reconstruction with leaves Fl-1S, Af-S, Wa-S, Fr-S, Fl-2S, Ja-S, Af-F, Wa-F, Fr-F, Ja-F, Fl-F]

SLIDE 22

DAG representation of phylogenetic networks

Definition

A phylogenetic network is called tree-sibling if every hybrid node has at least one sibling that is a tree node

Remark

The biological meaning of the tree-sibling condition is that, in each recombination or hybridization process, at least one of the species involved also has some descendant through mutation

SLIDE 24

DAG representation of phylogenetic networks

Definition

A phylogenetic network is called tree-child if every internal node has at least one child that is a tree node

Remark

The biological meaning of the tree-child condition is that every non-extant species has some descendant through mutation
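Both the tree-sibling and the tree-child conditions are local and easy to test. The sketch below assumes a hypothetical (parent, child) edge-list input and the usual convention that any node with at most one parent, leaves included, counts as a tree node (otherwise a node whose children are all leaves could never satisfy the tree-child condition).

```python
from collections import defaultdict


def _adjacency(edges):
    parents, children = defaultdict(set), defaultdict(set)
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)
    return parents, children


def _is_tree_node(v, parents) -> bool:
    # Convention: a node with at most one parent (including a leaf) is a tree node.
    return len(parents.get(v, ())) <= 1


def is_tree_child(edges) -> bool:
    """Every internal node has at least one child that is a tree node."""
    parents, children = _adjacency(edges)
    return all(any(_is_tree_node(c, parents) for c in kids)
               for kids in children.values())


def is_tree_sibling(edges) -> bool:
    """Every hybrid node has at least one sibling that is a tree node."""
    parents, children = _adjacency(edges)
    for h, ps in parents.items():
        if len(ps) < 2:
            continue  # not a hybrid node
        siblings = {s for p in ps for s in children[p]} - {h}
        if not any(_is_tree_node(s, parents) for s in siblings):
            return False
    return True


# Hypothetical examples: in `bad`, the only child of a is the hybrid node h.
good = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
        ("a", "1"), ("b", "2"), ("h", "3")]
bad = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"), ("b", "1"), ("h", "2")]
print(is_tree_child(good), is_tree_sibling(good))  # True True
print(is_tree_child(bad), is_tree_sibling(bad))    # False True
```

The second example shows that tree-sibling is the weaker condition: h still has the tree sibling 1 through its other parent b.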

SLIDE 26

DAG representation of phylogenetic networks

Definition

A phylogenetic network is time-consistent if there is a temporal representation of the network, that is, an assignment of times to the nodes of the network that strictly increases on tree edges (those edges whose head is a tree node) and remains the same on hybrid edges (whose head is a hybrid node)

Remark

The biological meaning of a temporal representation is the time at which certain species exist or certain hybridization processes occur; hybrid edges keep the same time because, for such a process to take place, the species involved must coexist in time
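Time consistency can be decided by merging the two endpoints of every hybrid edge (they must share a time) and then checking that the tree edges between the merged classes contain no cycle; a longest-path layering then yields integer times. This is our own sketch of one way to decide the property, not necessarily the method used by the tools described later; the edge-list input and integer times are assumptions.

```python
from collections import defaultdict, deque


def temporal_representation(edges):
    """Return {node: time} if the network is time-consistent, else None.

    Assumption: edges are (parent, child) pairs. Hybrid edges (child has two
    or more parents) force equal times; tree edges force strictly larger ones.
    """
    nodes = {v for e in edges for v in e}
    indeg = defaultdict(int)
    for _, c in edges:
        indeg[c] += 1
    # Union-find: merge the endpoints of every hybrid edge into one class.
    rep = {v: v for v in nodes}

    def find(v):
        while rep[v] != v:
            rep[v] = rep[rep[v]]  # path halving
            v = rep[v]
        return v

    for p, c in edges:
        if indeg[c] >= 2:
            rep[find(p)] = find(c)
    # Tree edges become strict "earlier than" constraints between classes.
    succ, pred_count = defaultdict(set), defaultdict(int)
    for p, c in edges:
        if indeg[c] < 2:
            a, b = find(p), find(c)
            if a == b:
                return None  # would need t(p) < t(p)
            if b not in succ[a]:
                succ[a].add(b)
                pred_count[b] += 1
    # Kahn's algorithm with longest-path layering on the class graph.
    classes = {find(v) for v in nodes}
    time = {c: 0 for c in classes}
    queue = deque(c for c in classes if pred_count[c] == 0)
    seen = 0
    while queue:
        a = queue.popleft()
        seen += 1
        for b in succ[a]:
            time[b] = max(time[b], time[a] + 1)
            pred_count[b] -= 1
            if pred_count[b] == 0:
                queue.append(b)
    if seen < len(classes):
        return None  # a cycle of strict constraints
    return {v: time[find(v)] for v in nodes}


# Hypothetical networks: in the second, h would need t(r) = t(h) = t(a) > t(r).
consistent = [("r", "a"), ("r", "b"), ("a", "h"), ("b", "h"),
              ("a", "1"), ("b", "2"), ("h", "3")]
inconsistent = [("r", "a"), ("r", "h"), ("a", "h"), ("a", "1"), ("h", "2")]
print(temporal_representation(consistent))
print(temporal_representation(inconsistent))  # None
```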

SLIDE 27

DAG representation of phylogenetic networks

Example (Time consistency)

[Figure: example network illustrating time consistency]

SLIDE 28

DAG representation of phylogenetic networks

[Figure: nested classes, from outermost to innermost: phylogenetic networks, tree-sibling, tree-child, galled-trees, phylogenetic trees; a region marks networks that are not time-consistent]

SLIDE 29

DAG representation of phylogenetic networks

Number of phylogenetic trees, galled-trees, tree-child, and tree-sibling networks (in this order)

ρ = 0: 2000 2000 2000 2000
ρ = 1: 173 638 1023 1616
ρ = 2: 27 152 331 983
ρ = 4: 1 22 48 252
ρ = 8: 1 1 16

  • M. Arenas, G. Valiente, and D. Posada. Characterization of phylogenetic reticulate networks based on the coalescent with recombination. Molecular Biology and Evolution, 25(12):2517–2520, 2008

SLIDE 31

Path multiplicity representation

Definition

The µ-representation of a tree-child phylogenetic network is the multiset of µ-vectors µ(u), one for each node u, where the i-th entry of µ(u) is the number of paths from u to the i-th leaf

Example

1000 0100 0010 0001 0010 0110 0110 0111 1110 0121 1231

SLIDE 45

Path multiplicity representation

Lemma

The µ-representation of a tree-child phylogenetic network can be obtained in polynomial time

Example

1000 0100 0010 0001 0010 0110 0110 0111 1110 0121 1231
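The lemma can be illustrated with a single bottom-up pass: a leaf's µ-vector is a unit vector, and every other node's µ-vector is the sum of its children's, because each path from a node to a leaf starts with an edge to one of its children. A minimal Python sketch; the edge list is an assumed network that we reconstructed so that it reproduces the µ-vectors listed in the example (the drawing is not part of this text), and the node names R, P, Q, S, K, T, H are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def mu_vectors(edges, leaves):
    """Map each node u to µ(u): entry i counts the paths from u to leaves[i]."""
    children = defaultdict(list)
    for p, c in edges:
        children[p].append(c)
    index = {leaf: i for i, leaf in enumerate(leaves)}
    # Passing children as "predecessors" makes static_order() yield every
    # node after all of its children, i.e. bottom-up.
    graph = {v: children[v] for v in list(children) + list(leaves)}
    mu = {}
    for v in TopologicalSorter(graph).static_order():
        if v in index:  # a leaf: unit vector
            vec = [0] * len(leaves)
            vec[index[v]] = 1
        else:  # each path out of v starts with an edge to one of its children
            vec = [sum(col) for col in zip(*(mu[c] for c in children[v]))]
        mu[v] = tuple(vec)
    return mu


# Assumed tree-child network: K and H are hybrid nodes (two parents each).
edges = [("R", "P"), ("R", "Q"), ("P", "1"), ("P", "K"), ("Q", "S"), ("Q", "H"),
         ("S", "K"), ("S", "4"), ("K", "T"), ("T", "2"), ("T", "H"), ("H", "3")]
mu = mu_vectors(edges, ["1", "2", "3", "4"])
print(sorted("".join(map(str, v)) for v in mu.values()))
```

Each edge is inspected once per coordinate, so the pass takes time proportional to the number of edges times the number of leaves.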

SLIDE 47

Path multiplicity representation

Theorem

Two tree-child phylogenetic networks are isomorphic if and only if they have the same µ-representation

Example

First network: 1231 1110 1000 0121 0111 0110 0110 0100 0010 0010 0001
Second network: 1000 0100 0010 0001 0010 0110 0110 0111 1110 0121 1231

SLIDE 56

Path multiplicity representation

Definition

The µ-distance between two tree-child phylogenetic networks N and N′ is dµ(N,N′) = |µ(N)△µ(N′)|, where △ denotes the symmetric difference of multisets

Example (dµ(N,N′) = 6)

N: 1000 0100 0010 0001 0010 0110 0110 0111 1110 0121 1231
N′: 1000 0100 0010 0001 0010 0110 0011 1110 1121
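Since µ-representations are multisets, dµ sums, over all vectors, the difference between their multiplicities in the two networks. A short sketch with collections.Counter, writing µ-vectors as the digit strings used on these slides (an encoding that only works while every multiplicity is a single digit, as here). Applied to the two listings of the isomorphism example it returns 0, as the theorem predicts.

```python
from collections import Counter


def mu_distance(rep1, rep2) -> int:
    """|µ(N) △ µ(N′)| for µ-representations given as lists of vectors."""
    c1, c2 = Counter(rep1), Counter(rep2)
    # Multiset symmetric difference: count each vector |mult1 - mult2| times.
    return sum(abs(c1[x] - c2[x]) for x in c1.keys() | c2.keys())


N = "1000 0100 0010 0001 0010 0110 0110 0111 1110 0121 1231".split()
N2 = "1000 0100 0010 0001 0010 0110 0011 1110 1121".split()
print(mu_distance(N, N2))  # 6

# The two listings in the isomorphism example describe the same multiset.
first = "1231 1110 1000 0121 0111 0110 0110 0100 0010 0010 0001".split()
print(mu_distance(first, N))  # 0
```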

SLIDE 58

Path multiplicity representation

Theorem

The µ-distance induces a metric on the space of tree-child phylogenetic networks that generalizes the bipartition distance for phylogenetic trees

  • G. Cardona, F. Rosselló, and G. Valiente. Comparison of tree-child phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009

Theorem

The µ-distance induces a metric on the space of semi-binary tree-sibling time-consistent phylogenetic networks that generalizes the bipartition distance for phylogenetic trees

  • G. Cardona, M. Llabrés, F. Rosselló, and G. Valiente. A distance metric for a class of tree-sibling phylogenetic networks. Bioinformatics, 24(13):1481–1488, 2008

SLIDE 59

DAG alignment

Definition

For every v ∈ V and v′ ∈ V′ of two phylogenetic networks N = (V,E) and N′ = (V′,E′), let

$$m(v,v') = |\mu(v) - \mu(v')|, \qquad \chi(v,v') = \begin{cases} 0 & \text{if } v, v' \text{ are both tree nodes or both hybrid nodes} \\ 1 & \text{otherwise} \end{cases}$$

The weight of the pair (v,v′) is

$$w(v,v') = m(v,v') + \frac{\chi(v,v')}{2n}$$

and the total weight of a matching M : V → V′ is

$$w(M) = \sum_{v \in V} w(v, M(v))$$
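A sketch of these weights, with two readings made explicit. First, |µ(v) − µ(v′)| is taken as the sum of the absolute coordinate differences; this reading reproduces every entry of the weight table in the example of this section. Second, n is left as a parameter because the definition does not spell it out; the values in that table correspond to 1/(2n) = 0.1, that is, n = 5. Fraction keeps the χ term exact.

```python
from fractions import Fraction


def pair_weight(mu_v, mu_w, hybrid_v: bool, hybrid_w: bool, n: int) -> Fraction:
    """w(v, v') = m(v, v') + χ(v, v') / (2n).

    Assumption: m is the sum of absolute coordinate differences of the
    µ-vectors; n is passed in because the definition leaves it implicit.
    """
    m = sum(abs(a - b) for a, b in zip(mu_v, mu_w))
    chi = 0 if hybrid_v == hybrid_w else 1
    return m + Fraction(chi, 2 * n)


def total_weight(matching, mu1, mu2, hybrid1, hybrid2, n):
    """w(M): sum of w(v, M(v)) over the nodes v of the first network."""
    return sum(pair_weight(mu1[v], mu2[w], v in hybrid1, w in hybrid2, n)
               for v, w in matching.items())


# Node r (tree node) against node X (hybrid node) of the example, with n = 5:
print(pair_weight((1, 1, 2, 3, 1), (0, 1, 0, 0, 0), False, True, 5))  # 71/10
```

Because every m is an integer and each χ term is at most 1/(2n), the χ terms can only break ties between matchings with the same sum of m values as long as their total stays below 1.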

SLIDE 61

DAG alignment

Definition

An optimal alignment between two phylogenetic networks is a matching with the smallest total weight among all possible matchings

Lemma

A matching M between two phylogenetic networks N = (V,E) and N′ = (V′,E′) is an optimal alignment if and only if it minimizes the sum $\sum_{v \in V} m(v, M(v))$ and, among those matchings minimizing this sum, it maximizes the number of nodes that are sent to nodes of the same type

  • G. Cardona, F. Rosselló, and G. Valiente. Comparison of tree-child phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009

SLIDE 62

DAG alignment

Example (Optimal alignment of two phylogenetic networks)

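[Figure: two phylogenetic networks on leaves 1 to 5: N, with internal nodes r, a, b, A, c, d, e, B, and N′, with internal nodes r′, u, v, x, X, y, z, Y]

µ-vectors of N: r (1,1,2,3,1), b (0,0,1,2,1), a (1,1,1,1,0), A (0,0,1,1,0), c (1,1,0,0,0), d (0,0,1,1,0), e (0,0,0,1,1), B (0,0,0,1,0)

µ-vectors of N′: r′ (1,2,1,2,1), u (1,1,0,0,0), v (0,1,1,2,1), x (0,1,1,1,0), y (0,0,1,1,0), z (0,0,0,1,1), X (0,1,0,0,0), Y (0,0,0,1,0)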

SLIDE 63

DAG alignment

Example (Optimal alignment of two phylogenetic networks)

Weights w(v,v′); rows are nodes of the first network, columns are nodes of the second

       r′     u      v      x      y      z      X      Y
r      3      6      3      5      6      6      7.1    7.1
b      3      6      1      3      2      2      5.1    3.1
a      3      2      3      1      2      4      3.1    3.1
A      5.1    4.1    3.1    1.1    0.1    2.1    3      1
c      5      0      5      3      4      4      1.1    3.1
d      5      4      3      1      0      2      3.1    1.1
e      5      4      3      3      2      0      3.1    1.1
B      6.1    3.1    4.1    2.1    1.1    1.1    2      0

SLIDE 64

DAG alignment

Example (Optimal alignment of two phylogenetic networks)

[Figure: optimal alignment between the two networks, with the matched pairs of nonzero weight annotated 3, 1, 1, 3]
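The optimal alignment of this example can be checked by brute force. The sketch assumes that leaves are matched to equally labeled leaves at weight 0, so only the eight internal nodes need to be assigned, and that A, B, X, Y are the hybrid nodes, which is what the weight table implies. Exhaustive search over 8! bijections is only for illustration; a polynomial-time implementation would solve the same assignment problem with, for example, the Hungarian algorithm.

```python
from itertools import permutations

# µ-vectors of the internal nodes of N and N′, copied from the example.
N = {"r": (1, 1, 2, 3, 1), "b": (0, 0, 1, 2, 1), "a": (1, 1, 1, 1, 0),
     "A": (0, 0, 1, 1, 0), "c": (1, 1, 0, 0, 0), "d": (0, 0, 1, 1, 0),
     "e": (0, 0, 0, 1, 1), "B": (0, 0, 0, 1, 0)}
N2 = {"r′": (1, 2, 1, 2, 1), "u": (1, 1, 0, 0, 0), "v": (0, 1, 1, 2, 1),
      "x": (0, 1, 1, 1, 0), "y": (0, 0, 1, 1, 0), "z": (0, 0, 0, 1, 1),
      "X": (0, 1, 0, 0, 0), "Y": (0, 0, 0, 1, 0)}
HYBRID = {"A", "B", "X", "Y"}  # assumption: inferred from the weight table


def optimal_alignment(n=5):
    """Exhaustive search over bijections N -> N′ (feasible for 8 nodes only)."""
    left, right = list(N), list(N2)
    # Scaled weight 2n * w(v, v') = 2n * m + χ is an integer: exact comparisons.
    W = {(v, w): 2 * n * sum(abs(p - q) for p, q in zip(N[v], N2[w]))
         + ((v in HYBRID) != (w in HYBRID))
         for v in left for w in right}
    best_cost, best = None, None
    for perm in permutations(right):
        cost = sum(W[v, w] for v, w in zip(left, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, dict(zip(left, perm))
    return best_cost / (2 * n), best


total, M = optimal_alignment()
print(total)  # 8.0
print(M)
```

The optimum has total weight 8 and sends r→r′, b→v, a→x, c→u, d→y and e→z, with the hybrid nodes A and B sent to X and Y. Sending A→Y and B→X gives the same total, so the optimal alignment is not unique here.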

SLIDE 65

Tree alignment as DAG alignment

Remark

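If this alignment method is restricted to phylogenetic trees, every node is a tree node and µ(v) is the 0/1 indicator vector of the cluster CL(v) (the set of leaves below v), so the weight of a pair of nodes (v1,v2) is simply |CL(v1) △ CL(v2)|. This can be seen as an unnormalized version of the score used in TreeJuxtaposer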

  • T. Munzner, F. Guimbretière, S. Tasiran, L. Zhang, and Y. Zhou. TreeJuxtaposer: Scalable tree comparison using focus+context with guaranteed visibility. ACM T. Graphics, 22(3):453–462, 2003

SLIDE 66

Tree alignment as DAG alignment

Example (Optimal alignment of two phylogenetic trees)

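[Figure: two phylogenetic trees on leaves 1 to 5]

Clusters are written as 0/1 vectors over the leaves 1 to 5.

Weights |CL(v1) △ CL(v2)|:

         00011   00111   01111
11000    4       5       4
11100    5       4       3
11110    4       3       2

Ratios |CL(v1) ∩ CL(v2)| / |CL(v1) ∪ CL(v2)|:

         00011   00111   01111
11000    0/4     0/5     1/5
11100    0/5     1/5     2/5
11110    1/5     2/5     3/5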
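For trees the weight reduces to set operations on clusters. A sketch that recomputes both tables of the example from the 0/1 cluster vectors; reading the second table as |A ∩ B| / |A ∪ B| is our interpretation, which matches all nine entries. Since |A △ B| = |A ∪ B| − |A ∩ B|, the weight equals |A ∪ B| times (1 − ratio), one way to see in what sense it is unnormalized.

```python
from fractions import Fraction
from itertools import permutations


def cluster(bits: str) -> frozenset:
    """Read a 0/1 cluster vector over leaves 1..k, e.g. '11000' -> {1, 2}."""
    return frozenset(i + 1 for i, ch in enumerate(bits) if ch == "1")


def cluster_weight(a: frozenset, b: frozenset) -> int:
    """Tree-case pair weight |CL(v1) △ CL(v2)|."""
    return len(a ^ b)


def overlap(a: frozenset, b: frozenset) -> Fraction:
    """|A ∩ B| / |A ∪ B|, our reading of the second table."""
    return Fraction(len(a & b), len(a | b))


rows = ["11000", "11100", "11110"]  # clusters of one tree
cols = ["00011", "00111", "01111"]  # clusters of the other tree
W = [[cluster_weight(cluster(r), cluster(c)) for c in cols] for r in rows]
S = [[overlap(cluster(r), cluster(c)) for c in cols] for r in rows]
print(W)  # [[4, 5, 4], [5, 4, 3], [4, 3, 2]]
# Smallest total weight over all one-to-one matchings of the three clusters
best = min(sum(W[i][p[i]] for i in range(3)) for p in permutations(range(3)))
print(best)  # 10
```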

SLIDE 67

Tool support: BioPerl module

Bio::PhyloNetwork

The Perl module Bio::PhyloNetwork implements all the data structures needed to work with phylogenetic networks, as well as algorithms for

  • reconstructing a network from its eNewick string
  • reconstructing a network from its µ-representation
  • exploding a network into the set of its induced subtrees
  • computing the µ-representation of a network
  • computing the µ-distance between two networks
  • computing an optimal alignment between two networks
  • computing the set of tripartitions of a network
  • computing the tripartition error between two networks
  • testing if a network is time-consistent
  • computing a temporal representation of a time-consistent network

SLIDE 70

Tool support: Web interface to the BioPerl module

The web interface at http://dmi.uib.es/~gcardona/BioInfo/alignment.php allows the user to input one or two phylogenetic networks, given by their eNewick strings. A Perl script processes these strings and uses the Bio::PhyloNetwork package to compute all available data for them, including a plot of the networks that can be downloaded in PS format; these plots are generated through the application GraphViz and its companion Perl package

Given two networks on the same set of leaves, their µ-distance is also computed, as well as an optimal alignment between them. If their sets of leaves differ, their topological restrictions to the set of common leaves are computed first, followed by the µ-distance and an optimal alignment

A Java applet displays the networks side by side, and whenever a node is selected, the corresponding node in the other network (with respect to the optimal alignment) is highlighted, if it exists. This also extends to edges. Similarities and differences between the networks are thus evident at a glance

SLIDE 75

Conclusion

String edit distance and alignment of strings coincide, but alignment of trees differs from tree edit distance, and alignment of graphs differs from graph edit distance

The alignment of trees and directed acyclic graphs based on the path multiplicity representation can be computed in polynomial time

The alignment method can be applied to any directed acyclic graphs with the same set of terminal node labels

  • G. Valiente. Algorithms on Trees and Graphs. Springer, 2002
  • G. Valiente. Combinatorial Pattern Matching Algorithms in Computational Biology using Perl and R. Taylor & Francis/CRC Press, 2009
