The Potential of Family-Free Genome Comparison

Marília D. V. Braga, Cedric Chauve, Daniel Doerr, Katharina Jahn, Jens Stoye, Annelyse Thévenin, Roland Wittler
(Bielefeld, Bordeaux, Rio de Janeiro, Vancouver)
MAGE, 26 August 2013

Introduction: Comparative genomics
Two levels of genome evolution:
◮ Small-scale mutations: point mutations
◮ Large-scale mutations: rearrangements, duplications, insertions, deletions
Structural organization provides insights into:
◮ phylogeny and evolution
◮ gene function and interactions
The Potential of Family-Free Genome Comparison (5 / 27) Jens Stoye

[Figure: two gene orders, 1 2 3 4 5 6 7 8 and 6 5 4 3 2 1 7 8]

Introduction: Comparative genomics with gene families
The picture with gene families:
◮ Simple and powerful data type
◮ Many databases and tools available
◮ Produces reasonable results

Introduction: The family-free principle
A more realistic picture:
◮ Computational prediction of gene families is (mostly) unsupervised.
◮ Predicted families do not always correspond to biological gene families.
◮ Wrong gene family assignments may produce incorrect results in subsequent analyses.

Introduction: The family-free principle
Gene family assignments are not necessary:
◮ if subsequent analyses can deal with the original data,
◮ for example gene similarity scores.
We may even invert the scenario:
◮ Integrated analysis: ortholog assignment and gene order analysis
◮ Gene family assignment based on positional orthology

Introduction: The family-free principle
[Figure: overview of the family-free principle, i.e. combined methods for conserved structure detection, ancestral genome reconstruction and gene family prediction. Topics shown: gene similarities; conserved structures (pairwise proximities, gene set proximities); rearrangements (single operations, combined operation models); ancestral genome reconstruction (median-of-three, whole genome duplication); other applications (contig layouting, gene family prediction, phylogenetic distances).]

Conserved structures

Conserved structures: Gene similarity graph
Gene similarity graph of 3 genomes: [figure not preserved]

Conserved structures: Partial k-matching
Partial k-(dimensional) matching: given a gene similarity graph $B = (G_1, \dots, G_k, E)$, a partial k-matching $M \subseteq E$ is a selection of edges such that in each connected component $C$ of $B_M := (G_1, \dots, G_k, M)$ no two genes belong to the same genome.
For $k = 3$ there are $2^k - 1 = 7$ types of valid components, one per nonempty subset of the genomes.
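A minimal sketch of checking this definition, assuming genes are encoded as hypothetical `(genome_id, gene_name)` tuples and M is a list of edges. It merges the genes joined by M with union-find and rejects any component that holds two genes from the same genome; genes untouched by M form singleton components and are always valid. Whether the edges actually belong to E is not checked.

```python
from collections import defaultdict


def _find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def is_partial_k_matching(matching):
    """True if every connected component of B_M has at most one gene per genome.

    Genes are (genome_id, gene_name) tuples; `matching` lists edges (u, v).
    """
    parent = {}
    for u, v in matching:
        parent.setdefault(u, u)
        parent.setdefault(v, v)
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    genomes_in = defaultdict(set)  # component root -> genomes seen so far
    for gene in parent:
        root = _find(parent, gene)
        if gene[0] in genomes_in[root]:
            return False  # two genes of one genome share a component
        genomes_in[root].add(gene[0])
    return True


M = [(("G1", "a"), ("G2", "x")), (("G2", "x"), ("G3", "p"))]
print(is_partial_k_matching(M))  # True: a path through three different genomes
```

Extending that path with an edge `(("G3", "p"), ("G1", "b"))` would put two genes of G1 into one component, so the check would fail.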

Conserved structures: Partial k-matching
Given the gene similarity graph of 3 genomes, how can such a matching be constructed?



Conserved structures: Assessing matching properties
Adjacency: a proximity relation between two genes.
Adjacency score for consecutive genes (g, g′) in genome G and (h, h′) in genome H:
$$s(g, g', h, h') = \begin{cases} \sqrt{w(e_{g,h}) \cdot w(e_{g',h'})} & \text{if } (g, g') \text{ and } (h, h') \text{ form a conserved adjacency,} \\ 0 & \text{otherwise.} \end{cases}$$
Adjacency measure in M:
$$adj(M) = \sum_{G, H} \; \sum_{\substack{g \text{ left of } g' \text{ in } G \\ h, h' \text{ in } H}} s(g, g', h, h')$$
Similarity measure in M:
$$edg(M) = \sum_{e \in M} w(e)$$
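For two genomes, both measures can be sketched in a few lines. The following assumptions are ours rather than the talk's: genes are unsigned names, M is a set of `(g, h)` pairs with g from G and h from H, `w` maps each edge to its weight, the adjacency score is the square root of the product of the two edge weights, and an adjacency counts as conserved when the partners of consecutive genes g, g′ are also consecutive in H, in either order. Because M is a matching, only the matched partners of g and g′ can contribute to the inner sum, which the loop exploits.

```python
import math


def adj(M, w, G, H):
    """adj(M) for two genomes G and H (lists of unsigned gene names)."""
    partner = dict(M)  # in a matching, each gene of G has at most one partner
    pos = {h: i for i, h in enumerate(H)}
    total = 0.0
    for g, g2 in zip(G, G[1:]):  # consecutive genes g, g' in G
        h, h2 = partner.get(g), partner.get(g2)
        if h is None or h2 is None:
            continue  # at least one of the two genes is unmatched
        if abs(pos[h] - pos[h2]) == 1:  # partners are consecutive in H
            total += math.sqrt(w[(g, h)] * w[(g2, h2)])
    return total


def edg(M, w):
    """edg(M): total similarity of the selected edges."""
    return sum(w[e] for e in M)


G = ["a", "b", "c"]
H = ["x", "y", "z"]
w = {("a", "x"): 0.9, ("b", "y"): 0.4, ("c", "z"): 1.0, ("a", "z"): 0.3}
M = {("a", "x"), ("b", "y"), ("c", "z")}
print(adj(M, w, G, H), edg(M, w))
```

Here both adjacencies (a, b) and (b, c) are conserved, so adj(M) is √0.36 + √0.4, while edg(M) is 0.9 + 0.4 + 1.0.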

Conserved structures: Family-free Adjacencies Problem
Find a matching M that maximizes
$$F_\alpha(M) = \alpha \cdot adj(M) + (1 - \alpha) \cdot edg(M).$$
The parameter α ranges from 0 (similarity only) to 1 (synteny only).
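The optimization can be illustrated by exhaustive search on a toy instance with two genomes, where a partial 2-matching is simply a set of edges that uses every gene at most once. This is only a sketch under stated assumptions (unsigned genes, an adjacency score equal to the square root of the product of the two edge weights, partners counted as adjacent when consecutive in H in either order); the function names are hypothetical, and because it enumerates all edge subsets it is exponential and unsuitable for real genomes.

```python
import itertools
import math


def f_alpha(M, w, G, H, alpha):
    """F_alpha(M) = alpha * adj(M) + (1 - alpha) * edg(M) for two genomes."""
    partner = dict(M)
    pos = {h: i for i, h in enumerate(H)}
    adj = sum(
        math.sqrt(w[(g, partner[g])] * w[(g2, partner[g2])])
        for g, g2 in zip(G, G[1:])
        if g in partner and g2 in partner
        and abs(pos[partner[g]] - pos[partner[g2]]) == 1
    )
    edg = sum(w[e] for e in M)
    return alpha * adj + (1 - alpha) * edg


def best_matching(w, G, H, alpha):
    """Exhaustive search over all matchings; exponential, toy sizes only."""
    edges = list(w)
    best, best_score = frozenset(), 0.0
    for r in range(1, len(edges) + 1):
        for subset in itertools.combinations(edges, r):
            if len({g for g, _ in subset}) < r or len({h for _, h in subset}) < r:
                continue  # some gene is used twice: not a matching
            score = f_alpha(subset, w, G, H, alpha)
            if score > best_score:
                best, best_score = frozenset(subset), score
    return best, best_score


G = ["a", "b"]
H = ["x", "y"]
w = {("a", "x"): 0.3, ("b", "y"): 0.3, ("a", "y"): 0.9}
print(best_matching(w, G, H, 0.0))
print(best_matching(w, G, H, 1.0))
```

On this instance α = 0 selects the single high-similarity edge (a, y), while α = 1 selects the two weaker edges (a, x) and (b, y), which preserve the adjacency of a and b.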

Conserved structures: Gene set proximities (gene clusters)
Relaxation: conserved neighborhoods of up to θ > 0 genes.
Scoring θ-adjacencies:
$$s_\theta(g, g', h, h') = \begin{cases} \sqrt{w(e_{g,h}) \cdot w(e_{g',h'})} & \text{if } (g, g') \text{ and } (h, h') \text{ form a } \theta\text{-adjacency,} \\ 0 & \text{otherwise.} \end{cases}$$
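A sketch of the relaxed score, assuming (our reading, not stated on the slide) that two genes are θ-adjacent when at most θ other genes lie between them, so that θ = 0 would recover ordinary adjacency. The square-root form of the score, the unsigned-gene encoding and the names `is_theta_adjacent` and `s_theta` are likewise assumptions.

```python
import math


def is_theta_adjacent(genome, a, b, theta):
    """True if distinct genes a and b have at most `theta` genes between them."""
    pos = {gene: i for i, gene in enumerate(genome)}
    return a != b and abs(pos[a] - pos[b]) - 1 <= theta


def s_theta(g, g2, h, h2, M, w, G, H, theta):
    """Score of the pair (g, g') in G against (h, h') in H under matching M."""
    if (g, h) not in M or (g2, h2) not in M:
        return 0.0  # both edges must be selected
    if is_theta_adjacent(G, g, g2, theta) and is_theta_adjacent(H, h, h2, theta):
        return math.sqrt(w[(g, h)] * w[(g2, h2)])
    return 0.0


G = ["a", "q", "b"]  # gene q separates a and b
H = ["x", "y"]
w = {("a", "x"): 0.4, ("b", "y"): 0.9}
M = {("a", "x"), ("b", "y")}
print(s_theta("a", "b", "x", "y", M, w, G, H, theta=0))  # 0.0
print(s_theta("a", "b", "x", "y", M, w, G, H, theta=1))
```

With θ = 0 the intervening gene q breaks the adjacency; with θ = 1 the pair scores √(0.4 · 0.9) = 0.6.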

Conserved structures: Gene set proximities (gene clusters)
Based on θ-adjacencies, gene clusters can be defined as pairs of intervals with a large maximum weight matching M.


Conserved structures: Gene set proximities (consimilar intervals)
Computing a maximum matching for all pairs of intervals is expensive, so the unweighted gene similarity graph is used instead.
Consimilar intervals: many edges inside, no edges to neighbors.
Algorithm: $O(n^3)$ time.
The resulting interval pairs are ranked by the score of a maximum weight matching inside the intervals.
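The ranking step can be sketched as follows, assuming the candidate interval pairs have already been found (the O(n^3) consimilar-interval algorithm itself is not reproduced) and are given as inclusive index ranges into G and H. The matching score is computed exactly by a bitmask dynamic program, which is only practical for short intervals; the function names and input encoding are hypothetical.

```python
from functools import lru_cache


def max_weight_matching(I, J, w):
    """Exact maximum weight matching between gene lists I and J.

    w maps (g, h) to a similarity; absent pairs are non-edges. Bitmask DP over
    the genes of J, so only suitable for short intervals.
    """
    @lru_cache(maxsize=None)
    def best(i, used):
        if i == len(I):
            return 0.0
        result = best(i + 1, used)  # leave I[i] unmatched
        for j, h in enumerate(J):
            if not (used & (1 << j)) and (I[i], h) in w:
                result = max(result, w[(I[i], h)] + best(i + 1, used | (1 << j)))
        return result

    return best(0, 0)


def rank_interval_pairs(candidates, G, H, w):
    """Sort candidate pairs ((i1, i2), (j1, j2)) by matching score, best first."""
    scored = []
    for (i1, i2), (j1, j2) in candidates:
        score = max_weight_matching(G[i1:i2 + 1], H[j1:j2 + 1], w)
        scored.append((score, ((i1, i2), (j1, j2))))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


G = ["a", "b", "c", "d"]
H = ["x", "y", "z"]
w = {("a", "x"): 0.8, ("b", "y"): 0.7, ("b", "x"): 0.9, ("d", "z"): 0.5}
candidates = [((2, 3), (2, 2)), ((0, 1), (0, 1))]
for score, pair in rank_interval_pairs(candidates, G, H, w):
    print(pair, round(score, 3))
```

In the first interval pair the matching {(a, x), (b, y)} with score 1.5 beats the single heavier edge (b, x), so that pair is ranked above the second one (score 0.5).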

Rearrangements

Rearrangements: DCJ (Double Cut and Join)
DCJ accounts for the rearrangement events inversion, translocation, fusion, fission, transposition and block interchange.
From the adjacency graph, the distance is
$$d_{DCJ} = N - \left(C + \frac{I}{2}\right),$$
where N is the number of genes, C the number of cycles and I the number of odd paths in the adjacency graph.
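The formula can be evaluated with a short adjacency-graph computation. This sketch covers only the classical setting in which both genomes contain the same genes, each exactly once; the encoding (a genome as a list of `(signed_genes, is_circular)` chromosomes, extremities as `(gene, "t")` and `(gene, "h")`) and the function names are hypothetical. A component with as many edges as vertices is a cycle; any other component is a path, which is odd when its number of edges is odd.

```python
from collections import defaultdict


def adjacencies(genome):
    """Vertices of one genome: 2-tuples (adjacencies) or 1-tuples (telomeres)."""
    verts = []
    for genes, circular in genome:
        # (first, last) extremity of each gene in reading order
        ends = [((abs(x), "t"), (abs(x), "h")) if x > 0 else ((abs(x), "h"), (abs(x), "t"))
                for x in genes]
        for (_, last), (first, _) in zip(ends, ends[1:]):
            verts.append((last, first))
        if circular:
            verts.append((ends[-1][1], ends[0][0]))
        else:
            verts.append((ends[0][0],))
            verts.append((ends[-1][1],))
    return verts


def dcj_distance(genome_a, genome_b):
    """d_DCJ = N - (C + I/2) for two genomes over the same gene set."""
    va, vb = adjacencies(genome_a), adjacencies(genome_b)
    where_b = {ext: j for j, v in enumerate(vb) for ext in v}
    graph = defaultdict(list)  # multigraph: one edge per shared extremity
    for i, v in enumerate(va):
        for ext in v:
            a, b = ("A", i), ("B", where_b[ext])
            graph[a].append(b)
            graph[b].append(a)
    n_genes = sum(len(genes) for genes, _ in genome_a)
    cycles = odd_paths = 0
    seen = set()
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # collect the connected component of `start`
            v = stack.pop()
            comp.append(v)
            for u in graph[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        edges = sum(len(graph[v]) for v in comp) // 2
        if edges == len(comp):
            cycles += 1
        elif edges % 2 == 1:
            odd_paths += 1
    return n_genes - (cycles + odd_paths / 2)


print(dcj_distance([([1, 2, 3], False)], [([1, -2, 3], False)]))  # 1.0 (one inversion)
```

For the inversion example the graph has one cycle and two odd paths, giving 3 − (1 + 2/2) = 1.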

Rearrangements: DCJ (Double Cut and Join)
From the gene similarity graph . . .
