Phylogenetic Trees in ACL2, Warren A. Hunt Jr. and Serita M. Nelesen

SLIDE 1

Phylogenetic Trees in ACL2

Warren A. Hunt Jr. and Serita M. Nelesen
The University of Texas at Austin

Phylogenetic Trees in ACL2 – p.1/14

SLIDE 2

Phylogenetic Trees

Representation of the evolutionary relationship between species

[Figure: example tree with a time axis running from Very Long Ago through Long Ago to Present]

SLIDE 3

From Organisms to Trees

Pipeline, from a set of taxa to a consensus tree:
  1. DNA sequencing produces unaligned sequences
  2. Multiple sequence alignment produces aligned sequences
  3. Maximum parsimony search produces a set of optimal trees
  4. Consensus analysis produces a consensus tree

  Unaligned Sequences      Aligned Sequences
  Ape:  ACCGTAGCTT         Ape:  ACCGTAGCTT
  Bear: ATAGTAACT          Bear: ATAGTAACT-
  Dog:  CCGTATTT           Dog:  -CCGTA-TTT
  Emu:  CGCATAGC           Emu:  CGCATAGC--
  Frog: CCTAAAC            Frog: C-C-TA-AAC
  Goat: GTAATAGAAC         Goat: GTAATAGAAC

[Figure: three optimal trees and their consensus tree over the leaves A, B, D, E, F, G]

SLIDE 5

Lots and lots of trees

Number of possible trees grows faster than exponentially with the number of leaves in the tree
Two main methods are used to search for the correct tree:
  • A heuristic search through tree space
  • A Bayesian estimation of phylogeny using Markov chain Monte Carlo
Both of these methods may produce hundreds or thousands of trees, which are then the input to further processing
Need a system to store these trees efficiently and to perform post-tree analysis.
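To give a feel for the size of tree space, here is a minimal Python sketch. It relies on the standard combinatorial result that there are (2n-5)!! distinct unrooted binary trees on n labeled leaves; that formula and the function name `unrooted_binary_trees` are not from the slides.

```python
def unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary trees on n labeled leaves: (2n-5)!! for n >= 3."""
    if n_leaves < 3:
        return 1  # one or two leaves admit a single tree shape
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for odd in range(3, 2 * n_leaves - 4, 2):
        count *= odd
    return count


for n in (4, 5, 10, 20):
    print(n, unrooted_binary_trees(n))
```

Already at 10 leaves there are 2,027,025 candidate trees, which is why exhaustive search gives way to heuristic or sampling-based methods.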

SLIDE 6

Why Use ACL2?

Standard answer: Accuracy
  • Explicit specification of input and output for all functions, together with proof that the code meets the specification (guards)
  • Two representations of trees, with proof that we can accurately move from one representation to the other and back

Additional answers: Storage space and performance speed
  • Hash-consing gives greatly reduced storage space
  • Memoization gives improved performance speed

Overall: Medical systems of the future

SLIDE 7

Representation

[Figure: five example trees over taxa A through G]

TASPI High-Level Representation:
  (((A B) C) ((D E) (F G)))
  (((A B) C) ((D E) F G))
  (((A B) C) (D (E (F G))))
  ((A (B C)) ((D E F) G))
  ((A (B C)) ((D E) (F G)))

SLIDE 8

Representation


TASPI Low-Level Representation:
  ((#1=((A B) C) #5=(#6=(D E) #9=(F G)))
   (#1# (#6# F G))
   (#1# (D (E #9#)))
   (#12=(A (B C)) ((D E F) G))
   (#12# #5#))

SLIDE 9

Reduced Storage Space

[Figure: storage size in bytes (log scale, 10K to 1G) for data sets 1 through 12, comparing Newick and TASPI.bhz]

SLIDE 13

Bipartition Representation

[Figure: three example trees over taxa A through F]

Parenthetical Notation:
  Tree 1: (A B (C ((D E) F)))
  Tree 2: (A (B ((D E) F)) C)
  Tree 3: (A B ((C (D E)) F))

Bipartition Representation:
  Tree 1       Tree 2       Tree 3
  AB | CDEF    AC | BDEF    AB | CDEF
  ABC | DEF    ABC | DEF    ABF | CDE
  ABCF | DE    ABCF | DE    ABCF | DE

Our Bipartitions:
  Tree 1           Tree 2           Tree 3
  (A B C D E F)    (A B C D E F)    (A B C D E F)
  (C D E F)        (B D E F)        (C D E F)
  (D E F)          (D E F)          (C D E)
  (D E)            (D E)            (D E)
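The step from parenthetical notation to bipartitions can be sketched in Python. This is an illustration under assumptions, not the TASPI code: trees are nested tuples of single-letter taxon names, and `fringe`, `fringes`, and `bipartitions` are hypothetical helper names. Each "fringe" is the leaf set below one internal node, listed in the taxon ordering.

```python
def fringe(tree):
    """Return the leaves below a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in fringe(child)]


def fringes(tree, ordering):
    """Collect the fringe of every internal node, sorted by the taxon ordering."""
    rank = {taxon: i for i, taxon in enumerate(ordering)}
    found = []

    def walk(node):
        if isinstance(node, str):
            return
        found.append(tuple(sorted(fringe(node), key=rank.__getitem__)))
        for child in node:
            walk(child)

    walk(tree)
    return found


def bipartitions(tree, ordering):
    """Write each proper fringe as a 'rest | fringe' split."""
    splits = []
    for f in fringes(tree, ordering):
        rest = [taxon for taxon in ordering if taxon not in f]
        if rest:  # the full taxon set splits nothing
            splits.append("".join(rest) + " | " + "".join(f))
    return splits


ORDERING = ("A", "B", "C", "D", "E", "F")
tree1 = ("A", "B", ("C", (("D", "E"), "F")))
print(fringes(tree1, ORDERING))
print(bipartitions(tree1, ORDERING))  # ['AB | CDEF', 'ABC | DEF', 'ABCF | DE']
```

Storing only one side of each split, as in the "Our Bipartitions" column, loses nothing because the other side is the rest of the taxon list.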

SLIDE 14

Relationship of Representations

(defthm paren-partition-paren
  (implies (and <properties of input tree>
                <properties of ordering>
                <properties of tree and ordering>)
           (equal (tree-from-fringes (get-fringes tree ordering)
                                     ordering)
                  tree)))
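The round trip stated by the theorem can be tried on a small example. The Python sketch below is an informal illustration, not the ACL2 definitions: it borrows the names `get_fringes` and `tree_from_fringes` but implements them from scratch, and it assumes that taxa are distinct strings, that the fringes form a nested family including the full taxon set, and that each node's children appear in order of their earliest taxon. Those assumptions are this sketch's own and are not the hypotheses elided on the slide.

```python
def get_fringes(tree, ordering):
    """Return the set of fringes (leaf tuples sorted by ordering) of a tree."""
    rank = {taxon: i for i, taxon in enumerate(ordering)}
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return [node]
        below = [leaf for child in node for leaf in leaves(child)]
        found.add(tuple(sorted(below, key=rank.__getitem__)))
        return below

    leaves(tree)
    return found


def tree_from_fringes(fringes, ordering):
    """Rebuild a nested-tuple tree from its fringes."""
    rank = {taxon: i for i, taxon in enumerate(ordering)}
    sets = sorted((frozenset(f) for f in fringes), key=len, reverse=True)

    def build(members, inside):
        children, covered = [], set()
        for f in inside:  # largest first, so maximal sub-fringes come first
            if covered.isdisjoint(f):
                covered |= f
                subtree = build(f, [g for g in inside if g < f])
                children.append((min(rank[t] for t in f), subtree))
        for taxon in members - covered:  # taxa attached directly as leaves
            children.append((rank[taxon], taxon))
        children.sort(key=lambda pair: pair[0])
        return tuple(subtree for _, subtree in children)

    return build(sets[0], sets[1:])


ORDERING = ("A", "B", "C", "D", "E", "F")
tree = ("A", ("B", (("D", "E"), "F")), "C")
rebuilt = tree_from_fringes(get_fringes(tree, ORDERING), ORDERING)
print(rebuilt == tree)
```

A test like this only checks individual trees; the point of the ACL2 theorem is that the equality is proved for every tree and ordering satisfying the hypotheses.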

SLIDE 15

Strict and Majority Consensus

  • Strict consensus: any branch that appears in every input tree is in the consensus tree
  • Majority consensus: any branch that appears in more than half of the input trees is in the consensus tree
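Both definitions reduce to counting how often each branch occurs. The Python sketch below is a simplified illustration rather than TASPI's algorithm: it assumes nested-tuple trees with single-letter taxon names, identifies a branch with the leaf set below it, and returns the retained leaf sets instead of building the consensus tree. The names `fringes` and `consensus_fringes` are hypothetical.

```python
from collections import Counter


def fringes(tree):
    """Return the leaf set below every internal node of a nested-tuple tree."""
    found = []

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.append(below)
        return below

    leaves(tree)
    return found


def consensus_fringes(trees, majority=False):
    """Keep branches found in every tree, or in more than half if majority."""
    counts = Counter(f for tree in trees for f in set(fringes(tree)))
    if majority:
        kept = [f for f, n in counts.items() if n > len(trees) / 2]
    else:
        kept = [f for f, n in counts.items() if n == len(trees)]
    return sorted(("".join(sorted(f)) for f in kept), key=lambda s: (-len(s), s))


TREES = [
    ("A", "B", ("C", (("D", "E"), "F"))),
    ("A", ("B", (("D", "E"), "F")), "C"),
    ("A", "B", (("C", ("D", "E")), "F")),
]
print(consensus_fringes(TREES))                 # ['ABCDEF', 'DE']
print(consensus_fringes(TREES, majority=True))  # ['ABCDEF', 'CDEF', 'DEF', 'DE']
```

For these three trees only the (D E) grouping survives strict consensus, while majority consensus also keeps (C D E F) and (D E F), each of which appears in two of the three trees.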

SLIDE 17

Example

[Figure: the three example trees over taxa A through F, with their majority consensus tree and their strict consensus tree]

SLIDE 18

Improved Consensus Performance

[Figure: total consensus time in seconds for data sets 1 through 12, comparing PAUP, TNT, TASPI, and TASPI.bhz]

SLIDE 19

Conclusion and Future Work

  • TASPI provides accuracy guarantees while delivering state-of-the-art performance in terms of size and speed
  • TASPI is being extended to perform further post-tree analyses, as well as database operations

SLIDE 20

Questions?
