An Optimal Reconciliation Algorithm for Gene Trees with Polytomies
Manuel Lafond, Krister M. Swenson, Nadia El Mabrouk
DIRO, Université de Montréal

Introduction
- Gene family
  - Several similar genes that have evolved from a common ancestor
  - Usually identified by sequence similarity
- Dup-loss model: evolution scenario determined by three kinds of events
  - Speciation: a new species is created, one copy of the gene existing in both species
  - Duplication: the gene is duplicated, giving the species at least two copies of it
  - Loss: the gene disappears from the family

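Gene family history
[Figure: a species tree with internal nodes g, e, f and leaves a, b, c, d, beside a gene tree with leaves a1, b1, b2, c1, d1. Nodes are marked as speciation, duplication or loss; the embedded history has leaves a1, a2, b1, b2, c1, d1.]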

Reconciliation
- Given: a set of genes in the same family, a gene tree G and a species tree S
- Infer: the evolutionary events that have led to the observed gene tree
[Figure: gene tree with leaves a1, b1, b2, c1, d1, and the species tree.]

Reconciliation
- A reconciliation is an "extension" of G that is consistent with S, i.e. reflects the same phylogeny
[Figure: species tree (internal nodes g, e, f; leaves a, b, c, d), gene tree (leaves a1, b1, b2, c1, d1), and a reconciliation tree with node labels g, e, f, e, e and leaves a1, b1, a2, b2, c1, d1.]

Reconciliation
- Parsimony criterion: minimum number of duplications + losses (mutation cost)
[Figure: species tree, gene tree and reconciliation tree with leaves a1, b1, a2, b2, c1, d1.]

LCA Mapping
- Many possible reconciliation trees
- LCA Mapping (Bonizzoni et al., 2003)
  - Map each node of G to the lowest common ancestor, in S, of the species of its leaves
  - Minimizes the duplication+loss cost in linear time
- The label of a node x is the LCA mapping of x
[Figure: species tree and gene tree, each gene tree node labeled with its LCA mapping (g, e, f, ...), with one node marked as a duplication.]
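The LCA mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene tree topology `(((a1, b1), b2), (c1, d1))` is a guess at the figure, the species tree is taken to be `((a, b)e, (c, d)f)g`, and the helper names (`PARENT`, `lca`, `lca_map`) are hypothetical. The LCA is found by a naive walk to the root, so this sketch is not the linear-time method the slide refers to, and it only flags duplications (it does not count losses).

```python
# Species tree ((a, b)e, (c, d)f)g stored as child -> parent (assumed shape).
PARENT = {"a": "e", "b": "e", "c": "f", "d": "f", "e": "g", "f": "g", "g": None}

def ancestors(s):
    """Return s followed by all of its ancestors up to the root."""
    path = []
    while s is not None:
        path.append(s)
        s = PARENT[s]
    return path

def lca(s, t):
    """Naive lowest common ancestor of species s and t."""
    seen = set(ancestors(s))
    for u in ancestors(t):
        if u in seen:
            return u
    raise ValueError("nodes are not in the same tree")

def lca_map(node, dups):
    """Map a gene tree node to a species; record duplication nodes in dups.

    A gene tree is either a species label (a leaf gene) or a pair of subtrees.
    A node is flagged as a duplication when it maps to the same species as
    one of its children.
    """
    if isinstance(node, str):
        return node
    left = lca_map(node[0], dups)
    right = lca_map(node[1], dups)
    here = lca(left, right)
    if here in (left, right):
        dups.append(here)
    return here

# Hypothetical gene tree (((a1, b1), b2), (c1, d1)), genes written by species.
GENE_TREE = ((("a", "b"), "b"), ("c", "d"))
dups = []
root_species = lca_map(GENE_TREE, dups)
print(root_species, dups)  # g ['e']
```

On this assumed tree the root maps to g and a single duplication is placed at e.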

Motivation
- Most known methods work with binary gene trees
- In case of uncertainty, a gene tree can be non-binary (weak edges)
- Non-binary nodes are called polytomies
- Reconciliation trees are binary
[Figure: species tree S (internal nodes g, e, f; leaves a, b, c, d) and a non-binary gene tree G.]

Polytomies
- Each polytomy can be solved independently (Chang & Eulenstein, 2006)
- Cubic time algorithm for each polytomy
[Figure, shown in four steps: species tree S and gene tree G, with the polytomies G1, G2 and G3 of G highlighted in turn.]

The core problem
- Find the minimum cost reconciliation between a species tree and a polytomy
[Figure: species tree S and a polytomy G with leaves a, b, b, c, c.]

Resolution
- A resolution of G is a reconciliation between S and a binary refinement of G
- B(G) is a binary refinement of G
- R(B(G)) is a reconciliation between S and B(G)
[Figure, shown in three steps: species tree S with the polytomy G (leaves a, b, b, c, c), then a binary refinement B(G), then the reconciliation R(B(G)).]

Problem statement
- Given: a binary species tree S and a polytomy G
- Find: a minimum mutation cost resolution of G
[Figure: species tree S and polytomy G with leaves a, b, b, c, c.]

Partial resolution at node s
- A tree obtained from G in which every subtree rooted at a node labeled s is consistent with the species tree
- Every descendant of s is part of one of these subtrees
[Figure: species tree S, polytomy G with leaves a, a, a, a, b, b, c, and a partial resolution G' containing subtrees rooted at nodes labeled e and a.]

Partial resolution cost
- The mutation cost of a partial resolution is the sum of the costs of all of its subtrees
[Figure: species tree S, polytomy G with leaves a, a, a, a, b, b, c, and the partial resolution G'.]

k-partial resolution at node s
- A partial resolution with exactly k maximal subtrees rooted at s
[Figure, shown in two steps: species tree S, polytomy G with leaves a, a, a, a, b, b, c, and partial resolutions G' with different numbers of maximal subtrees rooted at e.]

Methodology
- Idea: an optimal resolution contains a minimum k-partial resolution at s, for every node s in V(S)
[Figure: species tree S and polytomy G.]

Methodology
- R(B(G)) has a 1-partial resolution at e
- It also has a 2-partial resolution at e
- For which k's does the optimal resolution contain a k-partial resolution?
[Figure: species tree S and the reconciliation R(B(G)).]

Methodology
- M(s, k) denotes the minimum cost of a k-partial resolution at s
- M(root(S), 1) is the minimum cost of the full resolution of G
- The solution is a 1-partial resolution at root(S)
[Figure: R(B(G)), a 1-partial resolution at g = root(S).]

Computation of M(s, k)
- We compute the values of M(s, k) for each node s in V(S) in a bottom-up manner, and for every k
[Figure: species tree S and polytomy G, with an empty table whose columns are k = 1, ..., 6 and whose rows are M(a, k), M(b, k), M(c, k), M(d, k), M(e, k), M(f, k), M(g, k).]

Computation of M(s, k)
- M(a, 4) = 0
- M(a, 5) = 1 (one loss in a)
- M(a, 3) = 1 (one duplication in a)

k        1  2  3  4  5  6
M(a, k)        1  0  1

Computation of M(s, k)
- Let nb(s) denote the number of leaves of G labeled s
- For instance, nb(a) = 4, nb(b) = 2, ...
- In general, if s is a leaf of S, then M(s, k) = |k - nb(s)|
[Figure: polytomy G with leaves a, a, a, a, b, b, c.]

Computation of M(s, k)
- The leaf values are easy to compute
- M(s, k) = |k - nb(s)|

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)
M(f, k)
M(g, k)
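As a small illustration of the leaf rule M(s, k) = |k - nb(s)|, the sketch below rebuilds the four leaf rows for k = 1, ..., 6. The names `leaf_row`, `G_LEAVES` and `leaf_table` are hypothetical, and the leaf multiset of G (four a's, two b's, one c, no d) is read off the slides.

```python
from collections import Counter

def leaf_row(nb, kmax):
    """Row M(s, k) = |k - nb| for k = 1..kmax, where s is a leaf of S."""
    return [abs(k - nb) for k in range(1, kmax + 1)]

# Leaves of the polytomy G, written by species (assumed from the slides).
G_LEAVES = ["a", "a", "a", "a", "b", "b", "c"]
nb = Counter(G_LEAVES)  # a species absent from G, such as d, counts as 0

leaf_table = {s: leaf_row(nb[s], 6) for s in "abcd"}
for s, row in leaf_table.items():
    print(f"M({s}, k)", row)
```

For b, with nb(b) = 2, this gives the row 1 0 1 2 3 4.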

Computation of M(s, k)
- Computing M(e, k)

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)

Computation of M(s, k)
- Either
  - M(e, 2) = M(a, 2) + M(b, 2) (from above: indicates a speciation)
  - M(e, 2) = M(e, 1) + 1 (from the left: indicates a loss)
  - M(e, 2) = M(e, 3) + 1 (from the right: indicates a duplication)

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)  x  y  z
(from x to y: +1 loss; from z to y: +1 dup)

Computation of M(s, k)
- Temporarily let M(s, k) = M(s1, k) + M(s2, k) for every k, where s1 and s2 are the children of s

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)  4  2  2  2  4  6

Computation of M(s, k)
- Keep the minimum values only
- If several cells attain the minimum, they are adjacent

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)     2  2  2

Computation of M(s, k)
- Extend the minimums, adding one for each cell traversed

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)  3  2  2  2  3  4
(+1 for each cell traversed away from the minimums)
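The three steps for an internal node (sum the two child rows, keep the minimums, extend them by +1 per cell) can be sketched as follows. Rather than literally erasing non-minimum cells, this sketch relaxes the summed row with one left-to-right sweep (a loss per step) and one right-to-left sweep (a duplication per step), which gives the same row on this example. The function name `internal_row` is hypothetical, and the sketch assumes, as in the slides, that every leaf of G maps to a leaf of S.

```python
def internal_row(left, right):
    """Row M(s, .) of an internal node s from the rows of its two children."""
    # Speciation: k subtrees at s come from k subtrees at each child.
    row = [x + y for x, y in zip(left, right)]
    # From the left: reaching k from k - 1 costs one loss.
    for i in range(1, len(row)):
        row[i] = min(row[i], row[i - 1] + 1)
    # From the right: reaching k from k + 1 costs one duplication.
    for i in range(len(row) - 2, -1, -1):
        row[i] = min(row[i], row[i + 1] + 1)
    return row

m_a = [3, 2, 1, 0, 1, 2]
m_b = [1, 0, 1, 2, 3, 4]
m_e = internal_row(m_a, m_b)
print(m_e)  # [3, 2, 2, 2, 3, 4]
```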

Computation of M(s, k)
- The whole table can be filled this way
- The minimum cost of a resolution of G is M(g, 1) = 4

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)  3  2  2  2  3  4
M(f, k)  1  2  3  4  5  6
M(g, k)  4  4  5  6  7  8
[Figure: species tree S (g with children e and f; e with leaves a, b; f with leaves c, d) and polytomy G with leaves a, a, a, a, b, b, c.]
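A minimal sketch of the whole bottom-up computation on the running example follows. The dictionaries `CHILDREN` and `NB`, the bound `KMAX = 6` and the function `fill_table` are illustrative names, not from the slides; the species tree shape ((a, b)e, (c, d)f)g is inferred from the table. Truncating at `KMAX` is assumed to be safe here because no species has more than six leaves in G.

```python
CHILDREN = {"g": ("e", "f"), "e": ("a", "b"), "f": ("c", "d")}  # species tree S
NB = {"a": 4, "b": 2, "c": 1, "d": 0}  # leaves of G per species
KMAX = 6

def fill_table(root):
    """Fill M(s, k), k = 1..KMAX, for every node s of S, children first."""
    table = {}

    def visit(s):
        if s not in CHILDREN:  # leaf of S
            table[s] = [abs(k - NB[s]) for k in range(1, KMAX + 1)]
            return
        s1, s2 = CHILDREN[s]
        visit(s1)
        visit(s2)
        row = [x + y for x, y in zip(table[s1], table[s2])]
        for i in range(1, KMAX):            # from the left: +1 loss
            row[i] = min(row[i], row[i - 1] + 1)
        for i in range(KMAX - 2, -1, -1):   # from the right: +1 duplication
            row[i] = min(row[i], row[i + 1] + 1)
        table[s] = row

    visit(root)
    return table

table = fill_table("g")
for s in "abcdefg":
    print(f"M({s}, k)", table[s])
print("minimum cost:", table["g"][0])  # minimum cost: 4
```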

Building the resolution
- Using the table, we find the number of duplications and losses for each node of S
- Backtrack to find where the value of M(g, 1) came from

k        1  2  3  4  5  6
M(a, k)  3  2  1  0  1  2
M(b, k)  1  0  1  2  3  4
M(c, k)  0  1  2  3  4  5
M(d, k)  1  2  3  4  5  6
M(e, k)  3  2  2  2  3  4
M(f, k)  1  2  3  4  5  6
M(g, k)  4  4  5  6  7  8
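One way the backtracking could look is sketched below. The slides shown here do not spell out the procedure, so this is an assumption-laden illustration: at each node s with target k it picks the column j that minimizes M(s1, j) + M(s2, j) + |k - j|, charges j - k duplications (if j > k) or k - j losses (if j < k) to s, and recurses into both children with j. Ties are broken towards the smallest j, which is an arbitrary choice. All names (`CHILDREN`, `NB`, `KMAX`, `m_row`, `backtrack`) are hypothetical.

```python
CHILDREN = {"g": ("e", "f"), "e": ("a", "b"), "f": ("c", "d")}  # species tree S
NB = {"a": 4, "b": 2, "c": 1, "d": 0}  # leaves of G per species
KMAX = 6

def m_row(s):
    """Row M(s, k), k = 1..KMAX (recomputed on demand; fine for a tiny tree)."""
    if s not in CHILDREN:
        return [abs(k - NB[s]) for k in range(1, KMAX + 1)]
    s1, s2 = CHILDREN[s]
    row = [x + y for x, y in zip(m_row(s1), m_row(s2))]
    for i in range(1, KMAX):
        row[i] = min(row[i], row[i - 1] + 1)
    for i in range(KMAX - 2, -1, -1):
        row[i] = min(row[i], row[i + 1] + 1)
    return row

def backtrack(s, k, events):
    """Record (duplications, losses) charged to each node on an optimal path."""
    if s not in CHILDREN:
        j = NB[s]  # a leaf starts from its nb(s) genes
    else:
        s1, s2 = CHILDREN[s]
        r1, r2 = m_row(s1), m_row(s2)
        # Column the value M(s, k) came from; smallest j wins ties.
        j = min(range(1, KMAX + 1),
                key=lambda c: r1[c - 1] + r2[c - 1] + abs(k - c))
        backtrack(s1, j, events)
        backtrack(s2, j, events)
    if j > k:
        events[s] = (j - k, 0)  # duplications merge j subtrees into k
    elif j < k:
        events[s] = (0, k - j)  # losses add k - j missing subtrees

events = {}
backtrack("g", 1, events)
print(events)  # {'a': (2, 0), 'e': (1, 0), 'd': (0, 1)}
```

On the running example this charges two duplications to a, one duplication to e and one loss to d, for a total of 4, matching M(g, 1).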
