
Hierarchical organization of syntenic blocks in large genomic datasets
Daniel Doerr, Faculty of Technology and Center for Biotechnology (CeBiTec), Bielefeld University
Workshop on Data Structures in Bioinformatics, February 4, 2020

Outline: Introduction; Synteny hierarchies for permutations; Synteny hierarchies for sequences; PSyCHO.

Data structures for large-scale comparisons
Objective: multi-species whole-genome comparisons.
Solution: pan-genome data structures ... only suitable for very similar genomes.

Abstraction by decomposition
Genomes are decomposed into syntenic blocks, which is essential for studying genome evolution between distant species. Current studies are restricted to protein-coding genes and omit many other conserved genomic regions.
[Figure: a syntenic block on a double-stranded DNA segment: CCTTGTGCGAGAATGCCCGCCAGTTCTCCCT / GGAACACGCTCTTACGGGCGGTCAAGAGGGA]

What is synteny?
A zoo of definitions: "the same ribbon" (Renwick, 1971); a set of markers co-located on the same chromosome; markers must be collinear; local rearrangements allowed.
Mostly tool-centric: FISH, GRIMM/DRIMM-Synteny, Cyntenator, i-ADHoRe, Sibelia, CoGe, Satsuma, etc.

What is synteny?
Dilemma: there is no one true decomposition of genomes into syntenic blocks.
Syntenic block (SB): a single marker or a set of contiguous syntenic blocks.
Homology assignment: a set $\mathcal{H}$ of pairwise (equivalence) relations.
Definition [Ghiurcuta and Moret, 2014]: Given two genomes $G$, $H$ and a homology assignment $\mathcal{H}$, two SBs $A \subseteq G$ and $B \subseteq H$ are homologous if
$$\forall a \in A:\ \exists (a, h) \in \mathcal{H},\ h \in H \implies (a, b') \in \mathcal{H} \text{ for some } b' \in B,$$
$$\forall b \in B:\ \exists (b, g) \in \mathcal{H},\ g \in G \implies (a', b) \in \mathcal{H} \text{ for some } a' \in A.$$
In words: every marker of $A$ that has a homolog in $H$ has one in $B$, and vice versa.
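The definition can be checked mechanically on small examples. The following is a minimal Python sketch, assuming markers are hashable identifiers that are distinct across the two genomes and that the homology assignment is given as a list of marker pairs, symmetrised internally because homology is taken to be an equivalence relation. The function `are_homologous` is a hypothetical helper, not part of any published tool.

```python
from typing import Hashable, Iterable, Set, Tuple

Marker = Hashable


def are_homologous(A: Set[Marker], B: Set[Marker],
                   G: Set[Marker], H: Set[Marker],
                   hom: Iterable[Tuple[Marker, Marker]]) -> bool:
    """Test whether SBs A ⊆ G and B ⊆ H are homologous (hypothetical helper)."""
    # Symmetrise the homology assignment (assumption: homology is symmetric).
    rel = set()
    for x, y in hom:
        rel.add((x, y))
        rel.add((y, x))
    # Every marker of A with some homolog in H must have a homolog in B.
    for a in A:
        if any((a, h) in rel for h in H) and not any((a, b) in rel for b in B):
            return False
    # Every marker of B with some homolog in G must have a homolog in A.
    for b in B:
        if any((b, g) in rel for g in G) and not any((b, a) in rel for a in A):
            return False
    return True


G = {"g1", "g2", "g3"}
H = {"h1", "h2", "h3"}
hom = [("g1", "h1"), ("g2", "h2"), ("g3", "h3")]
print(are_homologous({"g1", "g2"}, {"h1", "h2"}, G, H, hom))  # True
print(are_homologous({"g1", "g2"}, {"h1"}, G, H, hom))        # False: h2 lies outside B
```

The second call fails because g2 has a homolog in H (h2), but not inside the candidate block B.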

Synteny hierarchy
What are the homologous SBs of G, H? G and H are covered by one homologous SB pair ... but this pair contains several other homologous SB pairs.
[Figure: genomes G and H with nested homologous SB pairs]

Synteny hierarchies for permutations

Common intervals in permutations
Definition: A pair of intervals of two permutations is common if they share the same set of elements.
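As an illustration of the definition, here is a naive O(n²) Python sketch (not the algorithm used in the talk) that lists the common intervals of two permutations. It relies on the fact that an interval p[i..j] holds j − i + 1 distinct elements, so those elements are contiguous in q exactly when their positions in q span j − i + 1 slots. Positions are 0-based and inclusive; the function name is illustrative.

```python
from typing import List, Sequence, Tuple


def common_intervals(p: Sequence[int], q: Sequence[int]) -> List[Tuple[int, int]]:
    """Intervals (i, j) of p, length >= 2, whose elements are contiguous in q."""
    if len(p) != len(q) or set(p) != set(q) or len(set(p)) != len(p):
        raise ValueError("p and q must be permutations of the same elements")
    pos_in_q = {x: k for k, x in enumerate(q)}
    result = []
    for i in range(len(p)):
        lo = hi = pos_in_q[p[i]]
        for j in range(i + 1, len(p)):
            k = pos_in_q[p[j]]
            lo, hi = min(lo, k), max(hi, k)
            # j - i + 1 elements occupying a span of j - i + 1 positions in q
            if hi - lo == j - i:
                result.append((i, j))
    return result


print(common_intervals([1, 2, 3, 4], [2, 1, 4, 3]))  # [(0, 1), (0, 3), (2, 3)]
```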

Synteny hierarchy
PQ-tree [Booth and Lueker, 1976]: "Q"-node: children are collinear (their order is fixed up to reversal); "P"-node: children permute freely.
[Figure: PQ trees for G and H built from P- and Q-nodes]
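To make the node semantics concrete, the sketch below represents a PQ-tree with hypothetical `Leaf`, `PNode` and `QNode` classes and counts how many leaf orders the tree admits: a P-node with k children contributes k! orders, a Q-node only 2 (forward or reversed). This is an illustrative data structure under those assumptions, not Booth and Lueker's construction algorithm.

```python
from dataclasses import dataclass, field
from math import factorial
from typing import List, Union


@dataclass
class Leaf:
    label: str


@dataclass
class PNode:
    children: List["Node"] = field(default_factory=list)


@dataclass
class QNode:
    children: List["Node"] = field(default_factory=list)


Node = Union[Leaf, PNode, QNode]


def count_orders(node: Node) -> int:
    """Number of distinct leaf orders represented by the subtree at `node`."""
    if isinstance(node, Leaf):
        return 1
    sub = 1
    for child in node.children:
        sub *= count_orders(child)
    if isinstance(node, PNode):
        return factorial(len(node.children)) * sub  # children permute freely
    # Q-node: children stay collinear; only the whole order may be reversed
    return (2 if len(node.children) > 1 else 1) * sub


print(count_orders(PNode([Leaf("a"), Leaf("b"), Leaf("c")])))                 # 6
print(count_orders(QNode([PNode([Leaf("a"), Leaf("b")]), Leaf("c"), Leaf("d")])))  # 4
```

The second tree admits ab c d, ba c d, d c ab and d c ba.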

Booth and Lueker
PQ tree construction runs in linear time w.r.t. the input size, i.e., the number of 1s of an $n \times m$ matrix (number of markers: $n$; number of common intervals: $m \in O(n^2)$) ... but in cubic time w.r.t. the output size: the PQ tree has only $O(n)$ nodes!
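A short worked bound makes the cubic claim concrete. Assuming each column of the matrix marks one common interval and each row one of the $n$ markers, the number of 1s equals the total length of all common intervals. There are $n - \ell + 1$ intervals of length $\ell$, and for two identical permutations every interval is common, so the bound is attained:

```latex
\#\text{1s} \;=\; \sum_{I \text{ common}} |I|
\;\le\; \sum_{\ell=1}^{n} \ell\,(n - \ell + 1)
\;=\; \frac{n(n+1)(n+2)}{6} \;\in\; \Theta(n^3),
\qquad \text{whereas the PQ tree has } O(n) \text{ nodes.}
```

A procedure linear in this input is therefore, in the worst case, cubic in the size of the tree it outputs.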

Intervals of a PQ tree
Definition [Bergeron et al., 2008]: The frontier of a node is the set of labels of the leaves of the subtree rooted at this node, or, for a leaf, the singleton comprising its label.
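The frontiers of all nodes can be collected in one post-order traversal. The sketch below assumes a minimal hypothetical `Node` class whose `kind` is "P", "Q" or "leaf"; each frontier is returned as the tuple of leaf labels in left-to-right order, which is the set from the definition together with the order in which the tree currently lists it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                                    # "P", "Q" or "leaf"
    label: str = ""                              # only meaningful for leaves
    children: List["Node"] = field(default_factory=list)


def frontiers(root: Node) -> List[Tuple[str, ...]]:
    """Frontier of every node, collected in post-order."""
    out: List[Tuple[str, ...]] = []

    def visit(node: Node) -> Tuple[str, ...]:
        if node.kind == "leaf":
            f = (node.label,)                    # singleton frontier of a leaf
        else:
            f = tuple(lbl for child in node.children for lbl in visit(child))
        out.append(f)
        return f

    visit(root)
    return out


tree = Node("Q", children=[Node("P", children=[Node("leaf", "a"), Node("leaf", "b")]),
                           Node("leaf", "c"), Node("leaf", "d")])
print(frontiers(tree))
# [('a',), ('b',), ('a', 'b'), ('c',), ('d',), ('a', 'b', 'c', 'd')]
```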

Sets of common intervals in permutations
Definition [Bergeron et al., 2008]: A set of intervals $\mathcal{I}$ is closed if $(1), \dots, (n) \in \mathcal{I}$, $(1..m) \in \mathcal{I}$, and for each pair of intervals $(i..k), (j..l) \in \mathcal{I}$ s.t. $i < j \le k < l$, also $(i..j), (j..k), (k..l), (i..l) \in \mathcal{I}$.

Commuting sets
Definition [Bergeron et al., 2008]: Two intervals $A$, $B$ commute if $A \subseteq B$, $B \subseteq A$, or $A \cap B = \emptyset$ ... and a set of intervals $\mathcal{I}$ is commuting if all pairs of its intervals commute.
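Both conditions translate directly into code. In this sketch intervals are assumed to be (start, end) pairs of positions, both inclusive; the helper names are illustrative.

```python
from itertools import combinations
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def commute(a: Interval, b: Interval) -> bool:
    """A and B commute if one contains the other or they are disjoint."""
    (s1, e1), (s2, e2) = a, b
    nested = (s2 <= s1 and e1 <= e2) or (s1 <= s2 and e2 <= e1)
    disjoint = e1 < s2 or e2 < s1
    return nested or disjoint


def is_commuting(intervals: Iterable[Interval]) -> bool:
    """A set of intervals is commuting if every pair of its intervals commutes."""
    return all(commute(a, b) for a, b in combinations(list(intervals), 2))


print(commute((1, 3), (2, 5)))                  # False: the intervals overlap
print(is_commuting([(1, 4), (1, 2), (3, 4)]))   # True: nested or disjoint pairs only
```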

Strong intervals
Definition [Bergeron et al., 2008]: Given a set of intervals $\mathcal{I}$, an interval $A$ is strong if it commutes with all intervals $B \in \mathcal{I}$. The strong intervals of a closed set of intervals $\mathcal{I}$ are the frontiers of the nodes of the PQ tree of $\mathcal{I}$.
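A brute-force filter extracts the strong intervals of a given set. This quadratic sketch assumes 0-based (start, end) pairs, both inclusive, and uses hypothetical helper names.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, both inclusive


def commute(a: Interval, b: Interval) -> bool:
    """Nested or disjoint intervals commute."""
    (s1, e1), (s2, e2) = a, b
    return (s2 <= s1 and e1 <= e2) or (s1 <= s2 and e2 <= e1) or e1 < s2 or e2 < s1


def strong_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Intervals of the set that commute with every interval of the set."""
    ivs = set(intervals)
    return sorted(a for a in ivs if all(commute(a, b) for b in ivs))


# All intervals of the identity permutation on three markers (positions 0..2):
I = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2), (0, 2)]
print(strong_intervals(I))  # [(0, 0), (0, 2), (1, 1), (2, 2)]
```

The overlapping intervals (0, 1) and (1, 2) are discarded. What remains are the three singletons and the whole range, which are exactly the frontiers of a PQ tree consisting of a single Q-node over three leaves.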

Synteny hierarchies for sequences

SB hierarchy
Context-dependency: two sets of common intervals intersect only if all their intervals intersect in the corresponding sequences.
[Figure: sequences G, H and I]
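The context-dependency condition is a per-sequence overlap test. The sketch below assumes that each set of common intervals is stored as one interval per sequence (a dict from a sequence name to an inclusive (start, end) pair). `contexts_intersect` is a hypothetical name, and it checks only the necessary condition stated on the slide.

```python
from typing import Dict, Tuple

Interval = Tuple[int, int]          # (start, end), 0-based, both inclusive
IntervalSet = Dict[str, Interval]   # sequence name -> interval in that sequence


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] <= b[1] and b[0] <= a[1]


def contexts_intersect(x: IntervalSet, y: IntervalSet) -> bool:
    """True iff the intervals of x and y intersect in every sequence."""
    if x.keys() != y.keys():
        raise ValueError("both sets must span the same sequences")
    return all(overlaps(x[s], y[s]) for s in x)


x = {"G": (0, 4), "H": (10, 14), "I": (3, 7)}
y = {"G": (3, 6), "H": (12, 15), "I": (6, 9)}
z = {"G": (3, 6), "H": (12, 15), "I": (8, 9)}
print(contexts_intersect(x, y))  # True
print(contexts_intersect(x, z))  # False: the intervals in I are disjoint
```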

Sets of common intervals in sequences
Definition: A set of intervals $\mathcal{I}$ is near-closed if $(1), \dots, (n) \in \mathcal{I}$, $(1..m) \in \mathcal{I}$, and for each pair of intervals $(i..k), (j..l) \in \mathcal{I}$ s.t. $i < j \le k < l$, also $(i..l) \in \mathcal{I}$.
Lemma: Let $\mathcal{I}$ be a near-closed set of intervals. Then there exists a unique PQ-tree with frontier $F$ such that for the set of strong intervals $\mathcal{I}' \subseteq \mathcal{I}$ it holds that $\mathcal{I}' \subseteq F$ and $|\mathcal{I}| \ge \lceil \tfrac{1}{2} \cdot |F| \rceil$.
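The overlap condition that distinguishes near-closed from closed sets only requires the union of two overlapping intervals. The sketch below checks that condition alone. It assumes (start, end) pairs and deliberately leaves out the singleton and whole-interval requirements. The function name is hypothetical.

```python
from itertools import combinations
from typing import Iterable, Tuple

Interval = Tuple[int, int]  # (start, end), both inclusive


def union_condition_holds(intervals: Iterable[Interval]) -> bool:
    """For every pair (i..k), (j..l) with i < j <= k < l, require (i..l) in the set."""
    ivs = set(intervals)
    for a, b in combinations(ivs, 2):
        (i, k), (j, l) = sorted((a, b))  # order the pair by start position
        if i < j <= k < l and (i, l) not in ivs:
            return False
    return True


print(union_condition_holds([(0, 2), (1, 3)]))          # False: union (0, 3) missing
print(union_condition_holds([(0, 2), (1, 3), (0, 3)]))  # True
```

Unlike the closed case, (1, 2) and the leftover pieces of the overlap are not required here.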

PSyCHO

PSyCHO: Principled Synteny using Common Intervals and Hierarchical Organization
http://github.com/danydoerr/PSyCHO

Construction of a synteny hierarchy
Input: raw genomic sequences (G, H, I).
1. Genome segmentation, yielding marker-order sequences and a marker similarity graph.
2. Synteny hierarchy construction.
3. Discovery of homologous SBs.

Similarity graph, syntenic contexts, homologous SBs
1. Reference-based reconstruction of syntenic contexts.
2. Handling of insertions/deletions (work in progress).
3. Reference-based discovery of homologous syntenic blocks in each context.
Computational problems: enumerating common intervals in k sequences; finding δ-teams in sequences.
[Figure: a reference sequence, subject to indel handling, compared with G2 and G3]
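For sequences that may repeat markers, the following sketch assumes the usual character-set notion of a common interval: an interval of the reference is reported when its set of markers is also the marker set of some interval in every other sequence. This is a naive enumeration written for illustration. It is not PSyCHO's implementation, and the function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set, Tuple


def interval_char_sets(seq: Sequence[str]) -> Set[FrozenSet[str]]:
    """Marker sets of all contiguous intervals of seq (O(n^2) intervals)."""
    sets: Set[FrozenSet[str]] = set()
    for i in range(len(seq)):
        chars: Set[str] = set()
        for j in range(i, len(seq)):
            chars.add(seq[j])
            sets.add(frozenset(chars))
    return sets


def common_intervals_k(reference: Sequence[str],
                       others: List[Sequence[str]]) -> List[Tuple[int, int]]:
    """Intervals (i, j) of `reference` (0-based, inclusive) whose marker set
    is also the marker set of some interval in every other sequence."""
    other_sets = [interval_char_sets(s) for s in others]
    hits = []
    for i in range(len(reference)):
        chars: Set[str] = set()
        for j in range(i, len(reference)):
            chars.add(reference[j])
            if all(frozenset(chars) in s for s in other_sets):
                hits.append((i, j))
    return hits


print(common_intervals_k("abc", ["cab", "bca"]))  # [(0, 0), (0, 2), (1, 1), (2, 2)]
```

Here {a, b} is contiguous in "cab" but not in "bca", and {b, c} is not contiguous in "cab", so only the singletons and the full interval are common to all three sequences.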
