Maximum Agreement Subtrees
Seth Sullivant
North Carolina State University
March 24, 2018

Phylogenetics
Problem. Given a collection of species, find the tree that explains their history.
[Figure: a phylogenetic tree, from Yeates, Meier, Wiegman, 2015.]

Rooted Binary X-Trees
Definition. A rooted tree $T$ has a distinguished vertex $\rho$, the root. A rooted binary phylogenetic $X$-tree $T$ is a binary tree that has a distinguished root vertex and whose leaves are labeled by $X$.
[Figure: a rooted binary phylogenetic tree with leaves labeled 1 through 8.]
In phylogenetics, we only have access to data on extant (not extinct) species. We have no data or information about the species corresponding to internal nodes of the tree.

Induced subtrees
Let $X$ be a label set, with $n = |X|$. Let $T$ be a binary rooted phylogenetic $X$-tree. Given $S \subseteq X$, $T|_S$ is the binary restriction tree.
[Figure: a tree on eight leaves and its restriction to a subset of the leaves.]
Definition. Given binary rooted phylogenetic $X$-trees $T_1, T_2$,
$\mathrm{MAST}(T_1, T_2) = \max\{\#S : S \subseteq X \text{ and } T_1|_S = T_2|_S\}$.
This is the size of a maximum agreement subtree.
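
The definition can be checked directly on small examples. The sketch below is only an illustration under stated assumptions: trees are encoded as nested Python tuples (a leaf is its label, an internal vertex is a pair of subtrees), and the names `restrict`, `canon`, and `mast_brute` are hypothetical helpers, not part of the talk. It enumerates subsets, so it takes exponential time and is meant for tiny trees only.

```python
from itertools import combinations

def restrict(t, S):
    """Return T|S as nested tuples, or None if no leaf of t lies in S.

    Vertices left with a single child are suppressed, so the result is binary.
    """
    if not isinstance(t, tuple):          # t is a leaf label
        return t if t in S else None
    left, right = restrict(t[0], S), restrict(t[1], S)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)

def canon(t):
    """Order the two children of every vertex so equal trees compare equal."""
    if not isinstance(t, tuple):
        return t
    a, b = canon(t[0]), canon(t[1])
    return (a, b) if repr(a) <= repr(b) else (b, a)

def mast_brute(t1, t2, X):
    """Largest #S with T1|S = T2|S, found by trying subsets from large to small."""
    X = list(X)
    for k in range(len(X), 0, -1):
        for S in combinations(X, k):
            S = set(S)
            if canon(restrict(t1, S)) == canon(restrict(t2, S)):
                return k
    return 0

# Usage: the two trees below differ only in where leaves 3 and 4 attach.
T1 = (((1, 2), 3), 4)
T2 = (((1, 2), 4), 3)
print(restrict(T1, {1, 3, 4}))           # ((1, 3), 4)
print(mast_brute(T1, T2, [1, 2, 3, 4]))  # 3, e.g. S = {1, 2, 3}
```

Comparing canonical forms is one simple way to test equality of unordered rooted trees; it assumes that distinct labels have distinct `repr` strings.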

Example
[Figure: two rooted binary trees $T_1$ and $T_2$ on leaves 1 through 8, together with examples of agreement subtrees.]
$\mathrm{MAST}(T_1, T_2) = 3$
Theorem (Steel-Warnow 1993). There is an $O(n^2)$ algorithm to compute $\mathrm{MAST}(T_1, T_2)$ for binary rooted phylogenetic $X$-trees.
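
To give a feel for why polynomial time is possible, here is a minimal sketch of a memoized recursion over pairs of subtrees. It is not claimed to be the Steel-Warnow algorithm as published; it is the standard recurrence for rooted binary trees, written under the assumption that trees are nested tuples (a leaf is its label, an internal vertex is a pair). The function names `leaves` and `mast` are illustrative.

```python
from functools import lru_cache

def is_leaf(t):
    # Assumption: leaf labels are never tuples themselves.
    return not isinstance(t, tuple)

@lru_cache(maxsize=None)
def leaves(t):
    """Set of leaf labels below t."""
    if is_leaf(t):
        return frozenset([t])
    return leaves(t[0]) | leaves(t[1])

@lru_cache(maxsize=None)
def mast(u, v):
    """Size of a maximum agreement subtree of the subtrees rooted at u and v."""
    if is_leaf(u):
        return 1 if u in leaves(v) else 0
    if is_leaf(v):
        return 1 if v in leaves(u) else 0
    (a, b), (c, d) = u, v
    return max(
        mast(a, c) + mast(b, d),   # match the children straight
        mast(a, d) + mast(b, c),   # match the children crossed
        mast(a, v), mast(b, v),    # drop one side of u
        mast(u, c), mast(u, d),    # drop one side of v
    )

# Usage
T1 = (((1, 2), 3), 4)
T2 = (((1, 2), 4), 3)
print(mast(T1, T2))  # 3
```

There is one subproblem per pair (vertex of $T_1$, vertex of $T_2$), so the number of subproblems is quadratic in $n$. The actual running time of this sketch is somewhat worse because hashing nested tuples is not constant time, and very deep trees may exceed Python's recursion limit.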

What is the distribution of $\mathrm{MAST}(T_1, T_2)$?
Problem. Determine the distribution of $\mathrm{MAST}(T_1, T_2)$ for reasonable ("nice") probability distributions on rooted binary trees. Two natural candidates: the uniform distribution and the Yule-Harding distribution.
Remark. Simulations [Bryant-Mackenzie-Steel 2003] suggest that under both the uniform distribution and the Yule-Harding distribution
$E[\mathrm{MAST}(T_1, T_2)] \sim c\sqrt{n}$,
where $n = |X|$, for some constant $c$ depending on the distribution.

Motivation: Comparing New Phylogenetic Methods
Suppose we come up with a new phylogenetic method. This method takes a data set $D$ and constructs the tree $M(D)$. If we know the correct tree $T$, we can evaluate the method by computing $\mathrm{MAST}(T, M(D))$. If $\mathrm{MAST}(T, M(D))$ is consistently small (for many different $D$), then we conclude that the new method does not work well.
How small is small? Is it smaller than what you would expect to see by random chance? To answer this, we need to know the distribution of $\mathrm{MAST}(T, T')$.

Motivation: Cospeciation
Let $T_H$ be a phylogenetic tree of host species, and $T_P$ a phylogenetic tree of parasite species. Hosts and parasites are paired, so $T_H$ and $T_P$ have the same label set.
If $\mathrm{MAST}(T_H, T_P)$ is "large", reject the hypothesis that $T_H$ and $T_P$ evolved independently; that is, large $\mathrm{MAST}(T_H, T_P) \Longrightarrow$ cospeciation.
To perform this hypothesis test, we need the distribution of $\mathrm{MAST}(T_1, T_2)$ for random trees under the null hypothesis of independence.
Hafner, M.S., Nadler, S.A. (1988) Nature 332: 258-259

Motivation: Cool Math
Suppose that both $T_1$ and $T_2$ are comb trees.
[Figure: two comb trees on nine leaves, one with leaves labeled $1, \dots, 9$ and one with leaves labeled $w_1, \dots, w_9$.]
A maximum agreement subtree corresponds to a longest increasing subsequence of the permutation $w = w_1 w_2 w_3 w_4 w_5 w_6 w_7 w_8 w_9$, whose length is denoted $L(w)$.
$\mathrm{MAST}(T_1, T_2)$ for uniformly random comb trees is equivalent to $L(w)$ for uniformly random permutations $w \in S_n$.
Theorem (Baik-Deift-Johansson 1999).
$E[L(w)] = 2\sqrt{n} - c\,n^{1/6} + o(n^{1/6})$, with $c \approx 1.77108$, and
$\dfrac{L(w) - 2\sqrt{n}}{n^{1/6}} \to$ Tracy-Widom random variable.
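
For experimenting with $L(w)$, the length of a longest increasing subsequence can be computed in $O(n \log n)$ time by patience sorting. The sketch below is a standard textbook routine, not something taken from the talk; the function name `lis_length` is illustrative, and the input is assumed to be a sequence of distinct comparable values such as a permutation.

```python
from bisect import bisect_left

def lis_length(w):
    """Length of a longest strictly increasing subsequence of w."""
    # tails[i] is the smallest possible last element of an increasing
    # subsequence of length i + 1 seen so far; tails stays sorted.
    tails = []
    for x in w:
        i = bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)   # x extends the longest subsequence found so far
        else:
            tails[i] = x      # x gives a smaller tail for length i + 1
    return len(tails)

# Usage
print(lis_length([3, 1, 4, 2, 5]))  # 3, e.g. 1, 2, 5
```

Averaging `lis_length` over many random permutations (for instance produced with `random.shuffle`) gives a quick empirical comparison with the leading term $2\sqrt{n}$ in the theorem above.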

Random Trees
Biologists are interested in models for random trees as models for speciation processes.
Uniform distribution: Select a tree uniformly at random from all $(2n-3)!!$ rooted binary phylogenetic trees.
Yule-Harding distribution: Grow a random tree by successively splitting leaves selected uniformly at random, then apply leaf labels at random.
[Figure: a rooted binary tree with leaves labeled 2, 1, 5, 3, 4.]
Other models: $\beta$-splitting model, $\alpha$-splitting model, etc.
Question. How well do the different random tree models match the shape and structure of phylogenetic trees occurring in nature?
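
Both distributions are easy to sample. The sketch below is one possible implementation under stated assumptions, not code from the talk: the function names are hypothetical, at least one label is assumed, and the returned trees are nested tuples (a leaf is its label, an internal vertex is a pair). The Yule-Harding sampler follows the description above literally. The uniform sampler uses the standard leaf-insertion construction: attaching leaf $k+1$ to one of the $2k-1$ edges (counting a root edge) of a uniform tree on $k$ leaves yields a uniform tree on $k+1$ leaves, consistent with the count $(2n-3)!!$.

```python
import random

def freeze(node):
    """Convert the mutable list encoding into nested tuples."""
    if len(node) == 1:                 # a leaf is a one-element list [label]
        return node[0]
    return (freeze(node[0]), freeze(node[1]))

def yule_harding(labels, rng=random):
    """Split uniformly chosen leaves, then assign the labels at random."""
    labels = list(labels)
    root = [None]
    leaf_nodes = [root]
    while len(leaf_nodes) < len(labels):
        i = rng.randrange(len(leaf_nodes))
        left, right = [None], [None]
        leaf_nodes[i][:] = [left, right]   # the chosen leaf becomes internal
        leaf_nodes[i] = left
        leaf_nodes.append(right)
    rng.shuffle(labels)
    for leaf, lab in zip(leaf_nodes, labels):
        leaf[0] = lab
    return freeze(root)

def uniform_tree(labels, rng=random):
    """Insert leaves one at a time on a uniformly chosen edge."""
    labels = list(labels)
    root = [labels[0]]
    nodes = [root]                     # each node stands for the edge above it
    for lab in labels[1:]:
        x = rng.choice(nodes)
        old = list(x)                  # the subtree hanging below the chosen edge
        new_leaf = [lab]
        x[:] = [old, new_leaf]         # subdivide the edge and attach the leaf
        nodes.extend([old, new_leaf])
    return freeze(root)

def leaf_labels(t):
    """List of leaf labels of a nested-tuple tree."""
    if not isinstance(t, tuple):
        return [t]
    return leaf_labels(t[0]) + leaf_labels(t[1])

# Usage: draw one tree from each model on the label set {1, ..., 5}.
rng = random.Random(2018)
print(yule_harding(range(1, 6), rng))
print(uniform_tree(range(1, 6), rng))
```

Passing a seeded `random.Random` instance makes simulations reproducible; by default the module-level generator is used.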

Properties of Random Trees
Proposition. Both Yule-Harding and uniform random trees satisfy exchangeability and sampling consistency.
Exchangeability: Relabeling the leaves does not change the probability of a tree.
[Figure: $P(T) = P(T')$, where $T$ has leaves labeled 1, 2, 3, 4, 5 and $T'$ is the same tree with leaves relabeled 2, 1, 5, 3, 4.]
Sampling Consistency: If $T$ is a random tree and $S \subseteq X$, then $T|_S$ is a random tree from the same distribution on leaf label set $S$.
Theorem (Aldous). The expected depth of a uniformly random tree is $\Theta(\sqrt{n})$. The expected depth of a Yule-Harding random tree is $\Theta(\log n)$.

Conjecture About The Maximum Agreement Subtree
Conjecture. For any exchangeable sampling consistent distribution on rooted binary phylogenetic $X$-trees,
$E[\mathrm{MAST}(T_1, T_2)] = \Theta(\sqrt{n})$, where $n = |X|$.
Recall that $f(n) = \Theta(\sqrt{n})$ means that there are positive constants $c$ and $C$ such that $c\sqrt{n} \le f(n) \le C\sqrt{n}$. Note that the constants $c$ and $C$ might depend on the probability distribution.
We hope further that we can show that, asymptotically,
$E[\mathrm{MAST}(T_1, T_2)] \sim d\sqrt{n}$
for some $d$ (depending on the distribution) as $n \to \infty$.

Upper bounds
Theorem (BHLSSS). For any exchangeable sampling consistent distribution on rooted binary phylogenetic trees, $E[\mathrm{MAST}(T_1, T_2)] = O(\sqrt{n})$.
Proof sketch for uniform distribution. For $S \subseteq X$ let $X_S = 1$ if $T_1|_S = T_2|_S$, and $X_S = 0$ otherwise.
Let $Y_{n,k} = \sum_{S \subseteq X,\ \#S = k} X_S$ = number of agreement sets of size $k$. Then
$E[Y_{n,k}] = \binom{n}{k} P(X_S = 1) = \binom{n}{k} \dfrac{1}{(2k-3)!!} \longrightarrow 0$ if $k > c\sqrt{n}$.
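
The last step of the sketch can be made explicit with elementary estimates. The following derivation is an illustration, not the argument as presented in the talk: it assumes $k \ge 2$ and uses the standard bounds $\binom{n}{k} \le (en/k)^k$ and $(2k-1)!! \ge (2k/e)^k$ (both follow from Stirling-type inequalities). The constant $e/\sqrt{2}$ that comes out is one admissible choice of $c$ and is not claimed to be optimal.

```latex
\begin{align*}
E[Y_{n,k}] &= \binom{n}{k}\frac{1}{(2k-3)!!}
            = \binom{n}{k}\frac{2k-1}{(2k-1)!!} \\
           &\le (2k-1)\left(\frac{en}{k}\right)^{k}\left(\frac{e}{2k}\right)^{k}
            = (2k-1)\left(\frac{e^{2}n}{2k^{2}}\right)^{k}.
\end{align*}
% If k \ge c\sqrt{n} with c > e/\sqrt{2}, then q := e^{2}/(2c^{2}) < 1 and
\[
E[Y_{n,k}] \le (2k-1)\,q^{k} \longrightarrow 0 \qquad (n \to \infty).
\]
% Every subset of an agreement set is again an agreement set, so
% MAST(T_1,T_2) \ge k exactly when Y_{n,k} \ge 1. Markov's inequality gives
\[
P\bigl(\mathrm{MAST}(T_1,T_2) \ge k\bigr) = P(Y_{n,k} \ge 1) \le E[Y_{n,k}],
\]
% and, taking k = \lceil c\sqrt{n}\,\rceil and using MAST \le n,
\[
E[\mathrm{MAST}(T_1,T_2)] \le k + n\,P\bigl(\mathrm{MAST}(T_1,T_2) \ge k\bigr)
  \le k + n(2k-1)\,q^{k} = O(\sqrt{n}).
\]
```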
