SLIDE 1

An Approximate Approach for Solving the Balanced Minimum Evolution Problem

  • A. Aringhieri*, C. Braghin* and D. Catanzaro†

*Dipartimento di Tecnologie dell’Informazione - University of Milan - Italy

†Service Graphes et Optimisation Mathématique (G.O.M.) - Université Libre de Bruxelles - Belgium

SLIDE 2

From phylogenetics to molecular phylogenetics

SLIDE 3

From phylogenetics to molecular phylogenetics

SLIDE 4

HIV-1 phylogeny

SLIDE 5

HIV-1 phylogeny

SLIDE 6

Applications

  • medical research
  • epidemiology
  • population dynamics
  • drug discovery

SLIDE 7

Phylogenetics

SLIDE 8

Phylogenies

Species            Molecular Sequence
Macaca (A)         AAGCTTCATAGGAGCAACCATTCTAATAATCGCACATGGCCTTACATCATCC
Homo sapiens (B)   AAGCTTCACCGGCGCAGTCATTCTCATAATCGCCCACGGGCTTACATCCTCA
Pan (C)            AAGCTTCACCGGCGCAATTATCCTCATAATCGCCCACGGACTTACATCCTCA
Gorilla (D)        AAGCTTCACCGGCGCAGTTGTTCTTATAATTGCCCACGGACTTACATCATCA
Pongo (E)          AAGCTTCACCGGCGCAACCACCCTCATGATTGCCCATGGACTCACATCCTCC

[Figure: phylogeny with leaves A, B, C, D, E, internal vertices 1, 2, 3, and edge weights wa, wb, wc, wd, we, w1, w2]

SLIDE 9

Phylogenetic estimation criteria

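[Timeline figure: the phylogenetic estimation criteria (ME, Parsimony, and the model-based criteria Maximum Likelihood and Bayesian estimation) placed on a time axis marked 1957, 1963, 1967, 1969, 1981, 2001]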

SLIDE 10

Phylogenetic estimation criteria

ME

  • Ordinary Least Squares (OLS)
  • Weighted Least Squares (WLS)
  • Generalized Least Squares (GLS)
  • Balanced Minimum Evolution (BME)

2004

SLIDE 11

The Minimum Evolution (ME) criterion of phylogenetic estimation

In the absence of convergent or divergent evolution, the evolution of well-conserved molecular sequences can be approximated over time by means of local minimum paths. The minimum is local rather than global because:

  • The neighborhood of possible alleles that can be selected at each instant of the life of a species is finite.
  • The selective pressure may not be constant over time.
  • The size of a population may vary from species to species (with a different influence on fitness).

SLIDE 12

Fundamentals of the balanced minimum evolution criterion

A minimal length phylogeny provides a lower bound on the overall number of mutation events that occurred along the evolution of the set of species analyzed.

The balanced minimum evolution criterion is a variation of ME in which the length of a phylogeny is computed as:

[Formula and figure not recovered from the slide: a phylogeny with leaves i and j, internal vertices 1, 2, 3, and edge weights wb, we, w1, w2]
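For reference, a common way to write the BME length of a phylogeny T is Pauplin's formula. It is given here as an assumption about the intended expression, not as a transcription of the slide. The symbol d_ij (the evolutionary distance between leaves i and j) is introduced for this sketch; τ_ij (the number of edges on the path between leaves i and j in T) follows the notation used on the complexity slide.

```latex
L(T) \;=\; \sum_{i} \sum_{j \neq i} \frac{d_{ij}}{2^{\tau_{ij}}}
     \;=\; \sum_{i<j} 2^{\,1-\tau_{ij}} \, d_{ij}
```

The two sums are equal because each unordered pair {i, j} appears twice in the first one.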

SLIDE 13

Combinatorial interpretation of BME

The phylogeny length under BME is equivalent to the average length of the circular orders associated with a given phylogeny.

[Formula not recovered from the slide; one of its terms is annotated as the sum of the edge weights on the path from leaf x_i to leaf x_{i+1}]
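The length of a single circular order can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the 4-leaf distance matrix are hypothetical. The matrix is the tree metric of the quartet ((0,1),(2,3)) with pendant edges of weight 1 and an internal edge of weight 2, so the tree length is 6.

```python
from typing import Sequence


def circular_order_length(d: Sequence[Sequence[float]], order: Sequence[int]) -> float:
    """Sum of the distances between consecutive leaves of the order, closing the cycle."""
    n = len(order)
    return sum(d[order[k]][order[(k + 1) % n]] for k in range(n))


# Hypothetical tree metric for the quartet ((0,1),(2,3)).
D = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 2],
    [4, 4, 2, 0],
]

# An order compatible with the tree walks every edge exactly twice.
compatible = circular_order_length(D, [0, 1, 2, 3])
# An incompatible order crosses the internal edge four times.
incompatible = circular_order_length(D, [0, 2, 1, 3])
```

On this example the compatible order has length 12, twice the tree length, while the incompatible one is longer.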

SLIDE 14

Complexity of BME

The problem of finding a phylogeny that satisfies the balanced minimum evolution criterion is known as the Balanced Minimum Evolution Problem (BME). It consists of minimizing the function

[formula not recovered from the slide]

with the constraint that {τij} form a phylogeny. BME is in P if [condition not recovered from the slide]. However, in the most general case the complexity of BME is unknown.
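As an illustration of what is being minimized, the sketch below evaluates a BME objective for a fixed phylogeny. It assumes the objective is Pauplin's formula, the sum over ordered leaf pairs of d_ij / 2^τij, where τij is the number of edges between leaves i and j. That form, the function names, and the example data are assumptions of this sketch, not taken from the slide.

```python
from collections import deque
from typing import Dict, List, Sequence


def topological_distances(adj: Dict[int, List[int]], leaves: Sequence[int]) -> Dict[int, Dict[int, int]]:
    """tau[i][j] = number of edges on the path between leaves i and j (BFS from each leaf)."""
    tau = {}
    for src in leaves:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        tau[src] = {leaf: dist[leaf] for leaf in leaves}
    return tau


def bme_length(d: Sequence[Sequence[float]], adj: Dict[int, List[int]], leaves: Sequence[int]) -> float:
    """Assumed BME objective: sum over ordered leaf pairs of d[i][j] / 2**tau[i][j]."""
    tau = topological_distances(adj, leaves)
    return sum(d[i][j] * 2.0 ** (-tau[i][j]) for i in leaves for j in leaves if i != j)


LEAVES = [0, 1, 2, 3]
# Hypothetical tree metric of the quartet ((0,1),(2,3)); its tree length is 6.
D = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]
# Vertices 4 and 5 are internal. ADJ_TRUE encodes ((0,1),(2,3)), ADJ_ALT encodes ((0,2),(1,3)).
ADJ_TRUE = {0: [4], 1: [4], 2: [5], 3: [5], 4: [0, 1, 5], 5: [2, 3, 4]}
ADJ_ALT = {0: [4], 2: [4], 1: [5], 3: [5], 4: [0, 2, 5], 5: [1, 3, 4]}

length_true = bme_length(D, ADJ_TRUE, LEAVES)
length_alt = bme_length(D, ADJ_ALT, LEAVES)
```

On this small example the topology that generated the distances has the smaller length, which is the behaviour the minimization relies on.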

SLIDE 15

A possible approach to solution

SLIDE 16

A possible approach to solution

Leaves   NI-Shapes     Shapes
3        1             3
4        1             15
5        1             105
6        2             945
7        2             10395
8        3             135135
9        4             2027025
10       11            34459425
15       265           1.00E+13
20       11020         1.00E+22
30       14502229      1.00E+39
40       11077270355   1.00E+58

The total number of possible phylogenies with n leaves is (2n-5)!!. However, the number of Non-Isomorphic (NI) phylogenies grows much more slowly. Hence, a possible approach to solution could consist of enumerating all the possible NI phylogenies and then proceeding with the leaf assignments.
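The growth of the labelled counts can be checked with a short sketch. The function names are hypothetical. The code computes (2n-5)!!, the number of unrooted binary phylogenies on n labelled leaves, and also (2n-3)!!, the number of rooted ones. The second count is included because the "Shapes" entries for 3 to 10 leaves in the table coincide with (2n-3)!!.

```python
def double_factorial(k: int) -> int:
    """k!! for odd k >= -1, with (-1)!! = 1!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def unrooted_binary_trees(n: int) -> int:
    """(2n-5)!! unrooted binary phylogenies on n >= 3 labelled leaves."""
    return double_factorial(2 * n - 5)


def rooted_binary_trees(n: int) -> int:
    """(2n-3)!! rooted binary phylogenies on n >= 2 labelled leaves."""
    return double_factorial(2 * n - 3)


for n in (3, 4, 5, 10, 20):
    print(n, unrooted_binary_trees(n), rooted_binary_trees(n))
```

Python integers are arbitrary precision, so the same functions remain exact for 30 or 40 leaves.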

SLIDE 17

A possible approach to solution

In the most general case, this approach is approximate and can be stated as follows:

  • Given a distance matrix, solve the TSP and save the best circular order CO*; set T* = NULL.
  • For each non-isomorphic phylogeny T, and for each clockwise rotation of CO* on T, assign CO* to the phylogeny.
  • If the length decreases, update T*.
  • Run 2-OPT on T.
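The rotation step of this procedure can be sketched as below. This is a minimal illustration under several assumptions, not the authors' implementation. The TSP solution, the enumeration of non-isomorphic phylogenies, and the 2-OPT step are all left out. Each shape is supplied by the caller as a matrix tau of edge counts between its leaf slots, with slots numbered along a circular order of the shape. The length is evaluated with Pauplin's formula. All names are hypothetical.

```python
from math import inf
from typing import List, Optional, Sequence, Tuple


def bme_length_on_slots(d: Sequence[Sequence[float]],
                        tau: Sequence[Sequence[int]],
                        assignment: Sequence[int]) -> float:
    """Assumed BME length when the taxon assignment[a] is placed on leaf slot a."""
    n = len(assignment)
    return sum(d[assignment[a]][assignment[b]] * 2.0 ** (-tau[a][b])
               for a in range(n) for b in range(n) if a != b)


def best_rotation_over_shapes(d: Sequence[Sequence[float]],
                              shapes: Sequence[Sequence[Sequence[int]]],
                              order: Sequence[int]) -> Tuple[float, Optional[int], Optional[List[int]]]:
    """Try every rotation of the circular order on every shape and keep the shortest."""
    best_len, best_shape, best_assign = inf, None, None
    for s, tau in enumerate(shapes):
        for r in range(len(order)):
            assignment = list(order[r:]) + list(order[:r])  # rotate the order by r
            length = bme_length_on_slots(d, tau, assignment)
            if length < best_len:  # "does the length decrease?"
                best_len, best_shape, best_assign = length, s, assignment
    return best_len, best_shape, best_assign


# Hypothetical data: the only quartet shape (slots 0,1 and slots 2,3 form the two cherries)
# and the tree metric of ((0,1),(2,3)).
QUARTET_TAU = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
D = [[0, 2, 4, 4], [2, 0, 4, 4], [4, 4, 0, 2], [4, 4, 2, 0]]

best_len, best_shape, best_assign = best_rotation_over_shapes(D, [QUARTET_TAU], [1, 2, 3, 0])
```

Starting from the order 1, 2, 3, 0, the unrotated assignment pairs taxa 1 and 2 in a cherry, while one rotation later the cherries are (2,3) and (0,1), which gives the shorter length.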

SLIDE 18

Preliminary results: Molecular Datasets


Dataset       Species        Number of Species   Characters   Type
RbcL          Plants         500                 1314         rbcL gene
Rana          Ranoid Frogs   64                  1976         mt. DNA
M37           Insects        37                  2550         mt. DNA
M28           Mammals        28                  2086         mt. DNA
M43           Cetacea        43                  8128         mt. DNA
M62           Fungi          82                  2062         mt. DNA
M82           Hyracoidae     62                  3768         mt. DNA
SeedPlant25   Pinoles        25                  19784        tRNA

SLIDE 19

Preliminary results: Instances


From the previous datasets we extracted two sets of 10 instances, of 20 and 25 species each, respectively. We obtained the corresponding distance matrices by means of the General Time Reversible (GTR) model of DNA sequence evolution. The estimation procedure applied was the one described in:

  • D. Catanzaro, R. Pesenti, and M. C. Milinkovitch. A non-linear optimization procedure to estimate distances and instantaneous substitution rate matrices under the GTR model. Bioinformatics 22(6), 708-715, 2006.

The experiments were run on an Intel(R) Pentium(R) D CPU at 3.20 GHz, equipped with 2 GB RAM, Linux kernel 2.6.20, and gcc version 4.1.2.

SLIDE 20

Preliminary results: Computational Results


                        Strategy: swap on best-so-far phylogeny               Strategy: swap on each phylogeny
Instance    Dimension   Time (sec)   Best-so-far value   Swaps improve?       Time (sec)   Best-so-far value   Swaps improve?
Dataset01   20          6.4200       2858.512695         no                   130.03000    2811.256836         yes
Dataset02   20          6.29000      2942.833984         yes                  127.63000    2942.833984         yes
Dataset03   20          6.11000      2488.034424         no                   127.72000    2452.971680         yes
Dataset04   20          6.43000      2628.945312         no                   127.32000    2611.698242         yes
Dataset05   20          5.97000      2330.825684         no                   129.58000    2330.825684         no
Dataset06   20          6.34000      2659.358398         no                   128.94000    2614.793457         yes
Dataset07   20          6.34000      2775.332031         no                   131.13000    2754.525391         yes
Dataset08   20          6.41000      2636.678955         no                   130.48000    2636.678955         no
Dataset09   20          6.38000      2511.175781         yes                  129.37000    2511.175781         yes
Dataset10   20          6.35000      2567.597656         no                   131.95000    2541.879883         yes

The non-isomorphic enumeration is completed in 0.72 sec for instances containing 20 species. 84 sec are needed to enumerate the non-isomorphic phylogenies for instances containing 25 species.

SLIDE 21

Summary and Conclusion


In summary:

  • Some cases of BME are in P. However, deciding the complexity of BME is still an open problem.
  • This is the first attempt at solving BME by approximate algorithms. No exact algorithm is currently known in the literature.
  • Brute force enumeration for phylogenetic estimation under ME is unable to tackle instances larger than 12.
  • The enumerative approach can be applied to any phylogenetic estimation method.
  • The enumerative approach can be combined with exact approaches (e.g., exact leaf assignment).
  • As a drawback, the enumerative procedure is exponential.
  • Computational results are encouraging. However, tackling larger instances still warrants additional analysis.
