Lecture 1: Trees, tree metric and tree spaces Piotr Zwiernik - PowerPoint PPT Presentation

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Lecture 1: Trees, tree metric and tree spaces Piotr Zwiernik University of Genoa Algebraic Statistics 2015 Genova June 11, 2015 Lecture 1: Trees, tree metric and tree spaces 1 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Trees Definition : Tree = undirected graph without cycles tree T = ( V , E ): V vertices, E edges r undirected rooted rooted tree often depicted as. . . leaves = degree one nodes inner nodes = degree ≥ 2 nodes Lecture 1: Trees, tree metric and tree spaces 2 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Latent tree models Graphical models on trees have many nice properties exponential families with explicit formulas for the MLE dynamic programming for efficient computation of various probabilistic quantities Making some of the variables hidden gives greater flexibility Definition ∗ : Tree-decomposable distribution = marginal distribution of a tree distribution. hidden variables are marginalized out Tree-decomposable distributions discussed by Judea Pearl as a natural extension of star-decomposable distributions (naive Bayes model, latent class model) Judea Pearl, Fusion, Propagation, and Structuring in Belief Networks , Artificial Inteligence, 1986. Lecture 1: Trees, tree metric and tree spaces 3 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Motivation Applications in: linguistics and bioinformatics – to model evolutionary processes hierarchical clustering image processing Important concept in causality Many well known statistical models are special cases examples: hidden Markov models, naive Bayes models general results can be used for these special cases Understand models with hidden data the most tractable family of models with hidden variables identifiability, geometry of the likelihood function Alan S. Willsky, Multiresolution Markov Models for Signal and Image Processing , 2002. Martin J. Wainwright, Michael I. Jordan, Graphical Models, Exponential Families, and Variational Inference , 2008. Lecture 1: Trees, tree metric and tree spaces 4 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Short overview Lecture 1: Trees, tree metrics and tree spaces Lecture 2: Latent tree graphical models Lecture 3: Tree inference and parameter estimation Lecture 4: Likelihood geometry and model identifiability Main theme : phylogenetic combinatorics and results on tree metrics give a greater insight into the class of latent tree models Lecture 1: Trees, tree metric and tree spaces 5 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Semi-labeled trees and phylogenetic trees semi-labeled tree T = ( T , φ ): φ : { 1 , . . . , m } → V all degree ≤ 2 nodes need to be labeled multiple labels at a node are allowed phylogenetic tree = semi-labeled tree such that: only leaves are labeled (there are no degree 2 nodes) no multiple labels allowed 3 4 3 1 5 1 4 5 , 6 2 6 2 phylogenetic semi-labeled this makes sense for both rooted and undirected trees Charles Semple, Mike Steel, Phylogenetics, 2003. Lecture 1: Trees, tree metric and tree spaces 6 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Binary phylogenetic trees are universal Undirected binary tree = every inner node has degree three Rooted binary tree = every internal node has two children Let e = u − v be an edge of a semi-labeled tree T . T /e is the semi-labeled tree obtained from T by identifying u and v and removing e . The labeling sets of u, v are joined. this operation is called edge contraction Remark: Every semi-labeled tree can be obtained from a binary phylogenetic tree by edge contractions. Lecture 1: Trees, tree metric and tree spaces 7 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Binary expansion A binary expansion of a semi-labeled tree T is a binary phylogenetic tree T ∗ such that T can be obtained from T ∗ by edge contractions. (typically not unique) 3 3 4 1 1 4 = ⇒ = ⇒ 5 , 6 5 , 6 2 2 3 4 3 4 1 5 1 5 = ⇒ 6 2 2 6 Lecture 1: Trees, tree metric and tree spaces 8 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Tree metrics T semi-labeled tree with labeling set [ m ] := { 1 , . . . , m} Attach a positive number d e to each edge e of T For every two labeled nodes i, j ∈ [ m ] ij denotes the path between i and j in T d ij := � e∈ij d e is the T -distance between i and j in T  0 5 . 5 9 . 5 8  3 1 2 . 5 2 · 0 11 9 . 5 5   1 3 . 5   · · 0 3 . 5 4   · · · 0 2 Lecture 1: Trees, tree metric and tree spaces 9 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Tree metrics (2) T a semi-labeled tree with labeling set [ m ]. D = [ d ij ] ∈ R m×m a symmetric matrix with zeros on the diagonal. Definition: D is a T -metric if there exists a collection of edge lengths d e of T such that d ij = � e∈ij d e for all i, j ∈ [ m ]. Definition: D is a tree metric if it is a T -metric for some semi-labeled tree T . Question: Given a symmetric matrix D with d ii = 0 and d ij > 0 for i � = j , can we say if it is a tree metric? If yes, can we identify the underlying tree T and the edge lengths d e ? Lecture 1: Trees, tree metric and tree spaces 10 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Tree metric theorem Theorem[Buneman,1974]: A symmetric matrix D = [ d ij ] with d ii = 0 is a tree metric if and only if for any four (not necessarily distinct) i, j, k, l ∈ [ m ] � d ik + d jl d ij + d kl ≤ max d il + d jk . Moreover, a tree metric defines the defining T and the edge lengths d e uniquely. Every tree metric is a metric ≡ satisfies the triangle inequality. Lecture 1: Trees, tree metric and tree spaces 11 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts The space of tree metrics b a c c b a Billera, L. J., Holmes, S. P., & Vogtmann, K. (2001). Geometry of the Space of Phylogenetic Trees. Advances in Applied Mathematics, 27(4). Lecture 1: Trees, tree metric and tree spaces 12 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Phylogenetic oranges T a semi-labeled tree with labeling set [ m ] = { 1 , . . . , m} Attach a number ρ e ∈ [0 , 1] to each edge of T . For every two labeled nodes i, j ∈ [ m ], ρ ij := � e∈ij ρ e . Write Σ = [ ρ ij ] ∈ PO( T ), ρ ii = 1. That Σ is positive semidefinite will be shown later. PO( m ) := � T semi − labeled PO( T ) Moulton, Steel, P eeling phylogenetic oranges , 2004. Kim, Slicing hyperdimensional oranges: the geometry of phylogenetic estimation , 2000. Engstr¨ om, Hersh, and Sturmfels, Toric cubes , 2012. Lecture 1: Trees, tree metric and tree spaces 13 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Relation to tree metrics Note: all ρ e � = 0 if and only if all ρ ij � = 0 PO > ( m ) := PO( m ) ∩ (0 , 1]( m 2 ) Proposition : Points in PO > ( m ) are in one-to-one correspondence with tree metrics over [ m ]. define d ij := − log ρ ij , d e := − log ρ e , then d ij , d e ≥ 0 and d ij = � e∈ij d e (because ρ ij = � e∈ij ρ e ) The space of phylogenetic oranges arises naturally for various statistical models on trees, which we will see later. Tree metrics are well studied and many authors exploit this link to propose efficient learning algorithms. Lecture 1: Trees, tree metric and tree spaces 14 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Semi-labeled forests If some ρ ij = 0, then Σ does not map to a tree metric. if ρ ij → 0, then − log ρ ij → ∞ ρ ij = � e∈ij ρ e and so ρ ij = 0 if and only if ρ e = 0 for some e ∈ ij . if ρ ij � = 0 and ρ jk � = 0 then ρ ik � = 0 and so i ∼ j iff ρ ij � = 0 defines an equivalence relation Every equivalence relation on [ m ] gives a partition B 1 / · · · /B r of [ m ] into equivalence classes (blocks). A semi-labeled forest F with labeling set [ m ] is a collection of semi-labeled trees with labeling sets B 1 , . . . , B r that are disjoint and � B i = [ m ]. Lecture 1: Trees, tree metric and tree spaces 15 / 23

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Tuffley poset Consider all semi-labeled forests on [ m ]. They form a partially ordered set, called the Tuffley poset. If F is a semi-labeled forest then F/e is a semi-labeled forest obtained from F by contracting e If F is a semi-labeled forest then F \ e is a semi-labeled forest obtained from F by removing e (some post-processing is needed) We say that T ≤ T ′ in the Tuffley poset if T can be obtained from T ′ by edge contractions and edge deletions Lecture 1: Trees, tree metric and tree spaces 16 / 23

Lecture 1: Trees, tree metric and tree spaces Piotr Zwiernik - PowerPoint PPT Presentation

Introduction Trees Tree metrics Phylogenetic oranges Tree correlations Further concepts Lecture 1: Trees, tree metric and tree spaces Piotr Zwiernik University of Genoa Algebraic Statistics 2015 Genova June 11, 2015 Lecture 1: Trees,

Welcome back... Metric spaces. Approximate metric using a tree. Tree metric: 16 16 A metric

Dynamical Systems Continuous maps of metric spaces We work with metric spaces, usually a

Trees Chapter 11 Chapter Summary Introduction to Trees Applications of Trees Tree

Metric Spaces Definition If d is a metric on X , then the metric topology on X induced by d is

Trees Trees CSE, IIT KGP Trees and Spanning Trees Trees and Spanning Trees A graph having

Trees Eric McCreath Overview In this lecture we will explore: general trees, binary trees,

CALCULUS ON METRIC SPACES: BEYOND THE POINCAR INEQUALITY New Examples of Differentiability

( ( ) ) ( ) ( ) = = Work = h log t n B- B -Trees Trees B B- -Trees

Covering Metric Spaces by Few Trees Yair Bartal Nova Fandina Ofer neiman Tree Covers Let

A Few Pearls in the Theory of Quasi-Metric Spaces Jean Goubault-Larrecq ANR Blanc CPP TACL

61A Lecture 21 Announcements Binary Trees Binary Tree Class 4 Binary Tree Class class

Trees (Part 2) 1 / 59 Trees (Part 2) Recap Recap 2 / 59 Trees (Part 2) Recap B + Tree A B

Trees I think that I shall never see Trees I A poem lovely as a tree -- J. Kilmer Trees

Tyrol Hill Park Phase 4 Elementary Campbell Elementary Campbell Park Spaces Open Park

Tournament Trees Winner trees. Loser Trees. Winner Trees Complete binary tree with n external

Binary trees Binary trees David Morgan Binary trees Binary trees elements have up to 2

The Biology of Amphibians Mark Mandica Executive Director The Amphibian Foundation

Swinging from Tree to Tree: Rearrangement Operations and their Metrics Stefan Grnewald

Phylogenetic Trees and Networks Konstantinos Mampentzidis PhD Defense Aarhus University, Aarhus,

GPU computing and the tree of life Michael P . Cummings Center for Bioinformatics and

The journey of a tropical geometer through four countries Mar a Ang elica Cueto

Building an Extensible File System via Policy-based Data

Todays research computing UF Research Computing Introduction to Galaxy at UF HPC

Matroids From Hypersimplex Splits Michael Joswig TU Berlin Berlin, 15 December 2016 joint w/