
The worst case complexity of Maximum Parsimony
◮ Amir Carmel
◮ Noa Musa-Lempel
◮ Dekel Tsur
◮ Michal Ziv-Ukelson
Ben-Gurion University
June 12, 2014

What’s a phylogeny?
Phylogenies:
◮ Graph-like structures whose topology describes the inferred evolutionary history among a set of species.
◮ Modeled as either rooted or unrooted labeled binary trees, where the input entities are assigned to the leaf vertices.

Character based methods for phylogenetic reconstruction
◮ Each species is characterized by a sequence of letters.
◮ We are given a substitution scoring matrix over the letters.
◮ Position independence is assumed.
1: A C A G G T T
2: C C A G A T T
3: C C G G G T A
4: T G A G G T A
5: T G A G G T T

Rooted/unrooted phylogeny
◮ The decision whether to model phylogenies as rooted or unrooted depends on the substitution scoring matrix.
◮ Modeling phylogenies as unrooted trees requires the assumption of symmetric scoring matrices.
◮ Today, many applications use asymmetric scoring matrices.

Parsimony Maximization
◮ A classical approach for phylogenetic reconstruction.
◮ The Parsimony Maximization approach seeks the phylogenetic tree that assumes the least amount of evolutionary change needed to explain the observed data.
◮ There are two classical problems derived from phylogenetic parsimony maximization: Small Parsimony (SP) and Maximum Parsimony (MP).


Small Parsimony Problem (SP)
Input: multiple alignment, tree topology on n leaves.
1: A C A G G T T
2: C C A G A T T
3: C C G G G T A
4: T G A G G T A
5: T G A G G T T
Goal: Assignment to internal vertices that minimizes the scoring function.
[Figure: two assignments to the internal vertices of a tree on leaves 1 to 5, with Score = 1 and Score = 2.]
We note that known algorithms for Small Parsimony traverse the tree in a bottom-up manner.
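The bottom-up traversal can be illustrated with a minimal sketch of Fitch's algorithm under Hamming distance. The nested-tuple tree encoding, the function names and the topology `example_tree` are assumptions made for this illustration; they are not taken from the slides.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is its integer label, an internal vertex is a pair.
Tree = Union[int, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[int, str]) -> Tuple[Set[str], int]:
    """Bottom-up pass for one alignment column.

    Returns the candidate state set at the root of `tree` and the
    number of substitutions needed below it.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: keep both options and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree: Tree, alignment: Dict[int, str]) -> int:
    """Sum the per-column scores (position independence is assumed)."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {leaf: seq[j] for leaf, seq in alignment.items()})[1]
        for j in range(length)
    )


# The five sequences from the slides.
alignment = {1: "ACAGGTT", 2: "CCAGATT", 3: "CCGGGTA", 4: "TGAGGTA", 5: "TGAGGTT"}
# Assumed topology, chosen only for illustration.
example_tree: Tree = (((1, 2), 3), (4, 5))

second_column = {leaf: seq[1] for leaf, seq in alignment.items()}
print(fitch_site(example_tree, second_column)[1])
print(fitch_score(example_tree, alignment))
```

Each internal vertex is visited once per column, which is why one assignment operation costs O(m) under this scoring scheme.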


Maximum Parsimony Problem (MP)
Input: multiple alignment
1: A C A G G T T
2: C C A G A T T
3: C C G G G T A
4: T G A G G T A
5: T G A G G T T
Goal: a topology and assignments to internal vertices that minimize the SP score.
[Figure: three candidate topologies on leaves 1 to 5 with assignments to internal vertices, with Score = 1, Score = 2 and Score = 2.]
The Maximum Parsimony (MP) problem is NP-hard [L. R. Foulds and R. L. Graham (1982)].

Measuring SP and MP complexity in terms of assignment operations
◮ Assignment operation: the computation of the assignment for a single vertex.
◮ Its running time depends on the scoring scheme employed, for example: Fitch’s algorithm (Hamming distance) takes $O(m)$, Sankoff’s algorithm (weighted edit distance) takes $O(m \Sigma^2)$.
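To make the quadratic factor concrete, here is a hedged sketch of a single Sankoff-style assignment operation for one vertex and one alignment column. The names `sankoff_vertex`, `leaf_cost` and `unit_cost` are hypothetical, and the unit cost matrix is only a stand-in for a real substitution scoring matrix.

```python
import math
from typing import Dict

ALPHABET = "ACGT"
Cost = Dict[str, float]


def leaf_cost(observed: str) -> Cost:
    """Cost vector of a leaf: 0 for the observed letter, infinity otherwise."""
    return {a: (0.0 if a == observed else math.inf) for a in ALPHABET}


def sankoff_vertex(left: Cost, right: Cost, sub: Dict[str, Dict[str, float]]) -> Cost:
    """One assignment operation for one column.

    cost[a] = min_b(sub[a][b] + left[b]) + min_c(sub[a][c] + right[c]),
    where sub[a][b] is the cost of changing parent letter a into child
    letter b. The matrix may be asymmetric.
    """
    return {
        a: min(sub[a][b] + left[b] for b in ALPHABET)
        + min(sub[a][c] + right[c] for c in ALPHABET)
        for a in ALPHABET
    }


# Stand-in matrix: every substitution costs 1 (this reduces to Hamming distance).
unit_cost = {a: {b: (0.0 if a == b else 1.0) for b in ALPHABET} for a in ALPHABET}

parent = sankoff_vertex(leaf_cost("A"), leaf_cost("C"), unit_cost)
print(parent)
```

For every candidate letter of the parent the sketch scans every letter of each child, so one column costs a number of steps quadratic in the alphabet size; repeating this over m columns gives the bound quoted above.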

Our contribution
Previous results:
◮ Cavalli-Sforza and Edwards (1967): $(n-1) \cdot (2n-3)!!$ assignment operations.
◮ Hendy and Penny (1982): branch-and-bound algorithm for MP.
Where $(2n-3)!! = 1 \times 3 \times 5 \times \ldots \times (2n-3)$.

New results:
◮ Hendy and Penny (1982) branch-and-bound algorithm for MP: worst case running time of $\Theta(\sqrt{n} \cdot (2n-3)!!)$ assignment operations.
◮ A new, faster algorithm which executes $\Theta((2n-3)!!)$ assignment operations.
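The three growth terms can be compared numerically. The sketch below is only an illustration with hypothetical function names; the Θ bounds hide constant factors, so the second and third columns show growth terms rather than exact operation counts.

```python
import math


def odd_double_factorial(n: int) -> int:
    """(2n-3)!! = 1 * 3 * 5 * ... * (2n-3), for n >= 2."""
    result = 1
    for k in range(1, 2 * n - 2, 2):
        result *= k
    return result


def cse_operations(n: int) -> int:
    """Assignment operations of exhaustive enumeration: (n-1) * (2n-3)!!."""
    return (n - 1) * odd_double_factorial(n)


for n in (5, 10, 15, 20):
    trees = odd_double_factorial(n)
    print(n, cse_operations(n), round(math.sqrt(n) * trees), trees)
```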


The algorithm of Cavalli-Sforza and Edwards
◮ Cavalli-Sforza and Edwards showed that the number of rooted phylogenies with n leaves is $(2n-3)!!$.
◮ The algorithm enumerates all phylogenies with n leaves, and then solves the Small Parsimony (SP) problem on each tree.
◮ Each phylogeny has exactly $n-1$ internal vertices, therefore the algorithm has a running time of $(n-1) \cdot (2n-3)!!$ assignment operations.
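A minimal sketch of this exhaustive strategy follows. It assumes Hamming-distance scoring (Fitch) for the SP step and builds the rooted phylogenies by attaching each new leaf to every edge of a smaller tree; the tree encoding and all function names are hypothetical choices for the illustration, not the original implementation.

```python
from typing import Dict, Iterator, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is its integer label, an internal vertex is a pair.
Tree = Union[int, Tuple["Tree", "Tree"]]


def insertions(tree: Tree, x: int) -> Iterator[Tree]:
    """Yield every tree obtained by attaching leaf x to one edge of `tree`."""
    yield (tree, x)  # attach on the edge above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, x):
            yield (new_left, right)
        for new_right in insertions(right, x):
            yield (left, new_right)


def all_phylogenies(leaves: List[int]) -> List[Tree]:
    """Enumerate all rooted binary phylogenies on the given leaves."""
    trees: List[Tree] = [leaves[0]]
    for x in leaves[1:]:
        trees = [bigger for t in trees for bigger in insertions(t, x)]
    return trees


def fitch_site(tree: Tree, states: Dict[int, str]) -> Tuple[Set[str], int]:
    """Fitch's bottom-up pass for one column: (root state set, substitutions)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree: Tree, alignment: Dict[int, str]) -> int:
    """SP score of a fixed topology under Hamming distance."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {leaf: seq[j] for leaf, seq in alignment.items()})[1]
        for j in range(length)
    )


def exhaustive_mp(alignment: Dict[int, str]) -> Tuple[int, Tree]:
    """Solve SP on every phylogeny and keep one with the minimum score."""
    scored = ((fitch_score(t, alignment), t) for t in all_phylogenies(sorted(alignment)))
    return min(scored, key=lambda pair: pair[0])


# The five sequences from the slides.
alignment = {1: "ACAGGTT", 2: "CCAGATT", 3: "CCGGGTA", 4: "TGAGGTA", 5: "TGAGGTT"}
best_score, best_tree = exhaustive_mp(alignment)
print(len(all_phylogenies(sorted(alignment))), best_score, best_tree)
```

With five leaves the sketch scores 105 phylogenies, each with 4 internal vertices, which matches the $(n-1) \cdot (2n-3)!!$ count of 420 assignment operations per column.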

The algorithm of Hendy and Penny
[Figure: the search space tree, shown in three stages (Preliminaries, Enumeration space, Assignment operations). Its nodes are partial phylogenies on leaves 1 and 2, then 1 to 3, then 1 to 4.]
◮ The search space tree is developed in top-down order, while the recalculation of assignments is done in bottom-up order.
◮ The complexity of the algorithm equals the number of assignment operations.

The algorithm of Hendy and Penny
Their algorithm was originally proposed for the purpose of branch and bound, and its worst case bound was not previously properly analyzed. Using combinatorial methods we managed to achieve an exact bound.


The number of assignment operations
◮ Let $NumAnc(v)$ denote the number of ancestors of $x$ in $F_v$.
[Figure: example phylogeny with labels 1 to 7 and x.]
◮ The number of ancestors of $x$ in $F_v$ is equal to the number of assignment operations executed at node $v$.
◮ Let $H_i$ be the sum of $NumAnc(v)$ over all nodes $v$ in level $i + 1$.
◮ By definition, $\sum_{v} NumAnc(v) = \sum_{i=1}^{n-1} H_i$.
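The level sums can be computed by brute force for small n. The sketch below rests on assumptions the slides do not spell out: that $F_v$ is the rooted phylogeny stored at search node $v$, that $x$ is the leaf most recently inserted to obtain $F_v$, and that level 1 of the search space holds the single-leaf tree. All names are hypothetical.

```python
import math
from typing import Iterator, List, Tuple, Union

# Hypothetical encoding: a leaf is its integer label, an internal vertex is a pair.
Tree = Union[int, Tuple["Tree", "Tree"]]


def insertions_with_anc(tree: Tree, x: int, depth: int = 0) -> Iterator[Tuple[Tree, int]]:
    """Yield (new tree, number of ancestors of x) for every edge insertion.

    `depth` is the number of ancestors of the current subtree root.
    """
    # The new internal vertex takes over `depth` ancestors and is itself
    # an ancestor of x, hence depth + 1.
    yield (tree, x), depth + 1
    if isinstance(tree, tuple):
        left, right = tree
        for new_left, anc in insertions_with_anc(left, x, depth + 1):
            yield (new_left, right), anc
        for new_right, anc in insertions_with_anc(right, x, depth + 1):
            yield (left, new_right), anc


def level_sums(n: int) -> List[int]:
    """Return [H_1, ..., H_{n-1}], assuming level i + 1 holds the trees on i + 1 leaves."""
    trees: List[Tree] = [1]
    sums: List[int] = []
    for x in range(2, n + 1):
        next_trees: List[Tree] = []
        total = 0
        for t in trees:
            for bigger, anc in insertions_with_anc(t, x):
                next_trees.append(bigger)
                total += anc
        sums.append(total)
        trees = next_trees
    return sums


def odd_double_factorial(n: int) -> int:
    """(2n-3)!! for n >= 2."""
    return math.prod(range(1, 2 * n - 2, 2))


for n in range(3, 8):
    total = sum(level_sums(n))
    # Ratio of the brute-force total to the sqrt(n) * (2n-3)!! growth term.
    print(n, total, total / (math.sqrt(n) * odd_double_factorial(n)))
```

Under these assumptions, the printed ratio gives a rough empirical check of the stated worst case bound for small n; it is not a proof and says nothing about the constant for large n.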
