Mining the semantics of genome super-blocks to infer ancestral - PowerPoint PPT Presentation

Mining the semantics of genome super-blocks to infer ancestral architectures
Macha Nikolski (Université de Bordeaux), macha@labri.fr
AlBio, Moscow, 07/10/2008

Introduction
Challenge: uncovering the principal events that punctuate the evolution of species.
Approach: reconstruct plausible architectures of ancestral genomes.
Two-fold problem:
1. determine ancestral architectures;
2. trace the rearrangement events that lead from the ancestors to contemporary genomes.

Modeling evolution
Hannenhalli and Pevzner theory.
[Figure: contemporary genomes derived from an unknown common ancestor (?) through rearrangements and content change.]
Rearrangement operations: inversion, translocation, fusion, fission.

Mathematical vs. experimental approach
Results from the two techniques do not necessarily agree.
Rearrangement distance: genome sequences of human, mouse, rat and chicken; resolution: the gene (Bourque & Pevzner 2002; Bourque & Pevzner 2006).
Chromosomal painting: hybridization of DNA probes across the Eutherian clade (≈ 80 species); resolution: ≈ 4 Mb (Froenicke 2006; Rocchi 2006).
Possible solution: integrate more biological knowledge into the mathematical approach.

Hannenhalli and Pevzner theory
Signed permutation model: a genome is a set of signed permutations.
Human (chromosome 17): +1 +2 +3 +4 +5 +6 +7 +8 +9 = tp53 stat5b csf3 hoxb@ supt4h ddx5 scn4a tkl p4hb
Cat (chromosome E1): -9 -8 +6 +7 +2 +3 +4 +5 +1 = p4hb tkl ddx5 scn4a stat5b csf3 hoxb@ supt4h tp53
Method: mimic multichromosomal rearrangement operations by reversals on a single permutation.
Example:
genome Π: (1 2 3 4) (5 6 7) (8)
Reversal: (1 -4 -3 -2) (5 6 7) (8)
Translocation: (1 -6 -5) (2 3 4 7) (8)
Translocation: (6 -1) (-8 -5) (2 3 4 7)
Fusion, giving genome Γ: (6 -1 -7 -4 -3 -2) (5 8)
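
The operations of the example can be replayed directly on chromosomes. The following is a minimal Python sketch, assuming linear chromosomes stored as tuples of signed gene labels; the helpers `flip`, `reversal`, `translocation`, `fusion` and `canonical` are hypothetical names, and the chromosome boundaries of Π are a reading of the slide's figure. Unlike the Hannenhalli and Pevzner method, which simulates these operations by reversals on a single concatenated permutation, the sketch applies them to the chromosomes themselves.

```python
from typing import List, Tuple

Chromosome = Tuple[int, ...]
Genome = List[Chromosome]


def flip(chrom: Chromosome) -> Chromosome:
    """Read a linear chromosome from the other end: reverse the order, negate the signs."""
    return tuple(-g for g in reversed(chrom))


def canonical(genome: Genome) -> List[Chromosome]:
    """Orientation- and order-independent form of a genome, used for comparisons."""
    return sorted(min(c, flip(c)) for c in genome)


def reversal(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Invert the segment chrom[i:j]."""
    return chrom[:i] + flip(chrom[i:j]) + chrom[j:]


def translocation(x: Chromosome, y: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange the suffixes x[i:] and y[j:] of two chromosomes."""
    return x[:i] + y[j:], y[:j] + x[i:]


def fusion(x: Chromosome, y: Chromosome) -> Chromosome:
    """Join two chromosomes end to end."""
    return x + y


# Replay of the slide's example.
pi: Genome = [(1, 2, 3, 4), (5, 6, 7), (8,)]

# Reversal of the segment 2 3 4: (1 -4 -3 -2) (5 6 7) (8)
step1 = [reversal(pi[0], 1, 4), pi[1], pi[2]]

# Translocation: read (1 -4 -3 -2) as (2 3 4 -1), then swap the suffix -1 with 7:
# (2 3 4 7) (5 6 -1), where (5 6 -1) is (1 -6 -5) read from the other end.
x, y = translocation(flip(step1[0]), step1[1], 3, 2)
step2 = [x, y, step1[2]]

# Translocation: cut (5 6 -1) after 5 and exchange (6 -1) with the whole of (8): (5 8) (6 -1)
x, y = translocation(step2[1], step2[2], 1, 0)
step3 = [step2[0], x, y]

# Fusion of (6 -1) with (2 3 4 7) read backwards: (6 -1 -7 -4 -3 -2)
gamma = [fusion(step3[2], flip(step3[0])), step3[1]]

for genome in (pi, step1, step2, step3, gamma):
    print(canonical(genome))
```

Comparing genomes through `canonical` matters here: a linear chromosome read from either end, and chromosomes listed in any order, describe the same genome.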

Ancestors as median genomes
Formulation as the median genome problem: given $G_1, \dots, G_N$, find $M$ such that, for a distance $d$, $\sum_{i=1}^{N} d(M, G_i)$ is minimal.
Different distances: rearrangement, breakpoint, double cut and join (DCJ).
This problem is NP-complete even for N = 3, both for the breakpoint distance (Bryant 1998; Pe'er & Shamir 1998) and for the rearrangement distance (Caprara 1999; Caprara 2003).
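
To make the objective concrete, here is a brute-force median under the breakpoint distance. It is a minimal Python sketch, assuming single linear chromosomes capped by the telomere 0 at both ends; `canon`, `adjacencies`, `breakpoint_distance` and `brute_force_median` are hypothetical helpers, not the method of the talk. Exhaustive search visits all 2^n · n! signed permutations, so it is only usable for a handful of genes, which is the practical face of the NP-completeness result.

```python
from itertools import permutations, product
from typing import Optional, Sequence, Set, Tuple

Adjacency = Tuple[int, int]


def canon(a: int, b: int) -> Adjacency:
    """a.b and -b.-a are the same adjacency; keep the smaller tuple."""
    return min((a, b), (-b, -a))


def adjacencies(perm: Sequence[int]) -> Set[Adjacency]:
    """Adjacencies of one linear chromosome, capped by the telomere 0 at both ends."""
    framed = (0, *perm, 0)
    return {canon(a, b) for a, b in zip(framed, framed[1:])}


def breakpoint_distance(m: Sequence[int], g: Sequence[int]) -> int:
    """Number of adjacencies of m that do not occur in g."""
    return len(adjacencies(m) - adjacencies(g))


def brute_force_median(genomes, n: int) -> Tuple[Optional[Tuple[int, ...]], Optional[int]]:
    """Exhaustive search for M minimising sum_i d(M, G_i) over signed permutations of 1..n."""
    best, best_score = None, None
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            cand = tuple(s * x for s, x in zip(signs, order))
            score = sum(breakpoint_distance(cand, g) for g in genomes)
            if best_score is None or score < best_score:
                best, best_score = cand, score
    return best, best_score


median, score = brute_force_median([(1, 2, 3, 4), (1, 2, 3, 4), (1, -3, -2, 4)], 4)
print(median, score)
```

Ties are resolved in favour of the first candidate found, which illustrates the "high number of equivalent solutions" issue: many genomes can reach the same minimal score.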

Limitations
It is misleading to speak of "the" ancestral genome: what is computed is a median genome.
Algorithmic and interpretation problems: the problem is computationally intractable, so heuristics are needed in practice, and there is a high number of equivalent solutions (Bourque & Pevzner 2002; Eriksen 2007).
Ideas: look for common features present in the ancestral genome architecture; (re-)introduce biologically pertinent features such as breakpoints.

Adjacencies, breakpoints and frequencies
A reversal of the segment $\pi_{k+1} \dots \pi_l$ transforms $\Pi = \pi_1 \dots \pi_k\, \pi_{k+1} \dots \pi_l\, \pi_{l+1} \dots \pi_n$ into $\Gamma = \pi_1 \dots \pi_k\, {-\pi_l} \dots {-\pi_{k+1}}\, \pi_{l+1} \dots \pi_n$.
Breakpoints: $a = \pi_k.\pi_{k+1}$ and $c = \pi_l.\pi_{l+1}$ in $\Pi$; $b = \pi_k.{-\pi_l}$ and $d = {-\pi_{k+1}}.\pi_{l+1}$ in $\Gamma$.
Particular case of telomeres: $0.\pi_1$ and $\pi_n.0$.
Example: $G_1$ = {1 2 3 4, 5 6}, $G_2$ = {1 2 3 4, −5 6}, $G_3$ = {3 1 4 2 −5, 6}, $G_4$ = {2 1 3 4, 5 6}.
Adjacency frequencies (number of genomes containing the adjacency; a.b and −b.−a are the same adjacency):
4: 6.0
3: 3.4, 0.5, 4.0
2: 5.6, 2.3, 1.2, 0.1
1: −5.6, 2.−5, 4.2, 1.4, 1.3, 3.1, 2.1, 0.6, 5.0, 0.3, 0.2
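
The frequency table can be recomputed mechanically. A minimal Python sketch, assuming each genome is a list of linear chromosomes (tuples of signed genes) and each chromosome is capped by the telomere 0; `canon`, `genome_adjacencies` and `adjacency_frequencies` are hypothetical helpers. Adjacencies are stored in the orientation chosen by `canon`, so for instance 6.0 is printed as (0, -6).

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Adjacency = Tuple[int, int]


def canon(a: int, b: int) -> Adjacency:
    """a.b and -b.-a denote the same adjacency; 0 is a telomere (and -0 == 0)."""
    return min((a, b), (-b, -a))


def genome_adjacencies(genome: Iterable[Tuple[int, ...]]) -> Set[Adjacency]:
    """All adjacencies of a multichromosomal genome, telomeres included."""
    adj = set()
    for chrom in genome:
        framed = (0, *chrom, 0)
        adj.update(canon(a, b) for a, b in zip(framed, framed[1:]))
    return adj


def adjacency_frequencies(genomes) -> Counter:
    """Frequency of an adjacency = number of genomes that contain it."""
    freq: Counter = Counter()
    for genome in genomes:
        freq.update(genome_adjacencies(genome))
    return freq


G = [
    [(1, 2, 3, 4), (5, 6)],    # G1
    [(1, 2, 3, 4), (-5, 6)],   # G2
    [(3, 1, 4, 2, -5), (6,)],  # G3
    [(2, 1, 3, 4), (5, 6)],    # G4
]
freq = adjacency_frequencies(G)
for f in sorted(set(freq.values()), reverse=True):
    print(f, sorted(adj for adj, k in freq.items() if k == f))
```

Counting a set per genome, rather than every occurrence, matches the table's notion of frequency as the number of genomes that share an adjacency.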

Adjacency graph
Each element $\pi_i$ is represented by two vertices $g = 2\pi_i - 1$ and $h = 2\pi_i$ (Hannenhalli & Pevzner 1995).
Notation: $\pi_i.\pi_j$ is denoted by $(g_1 h_1).(g_2 h_2)$ and $\pi_i.{-\pi_j}$ by $(g_1 h_1).(h_2 g_2)$.
Example: the adjacency graph for the set $A = \{(g_1 h_1).(g_2 h_2)\}$ has four vertices $g_1, h_1, g_2, h_2$; two edges stand for the elements $e_1 = (g_1, h_1)$ and $e_2 = (g_2, h_2)$, and one edge stands for the adjacency $e_3 = (h_1, g_2)$.
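
The construction can be written out in a few lines. A minimal Python sketch, assuming adjacencies are given as pairs of signed elements with 0 for the telomere (treated as a single vertex 0); `extremities` and `adjacency_graph` are hypothetical names. Each element contributes an element edge (g, h), and each adjacency a.b contributes an edge from the right extremity of a to the left extremity of b.

```python
from typing import Iterable, List, Set, Tuple


def extremities(x: int) -> Tuple[int, int]:
    """(left, right) vertices of a signed element x, with g = 2|x| - 1 and h = 2|x|.
    +x is read (g h) and -x is read (h g); the telomere 0 is the single vertex 0."""
    if x == 0:
        return 0, 0
    g, h = 2 * abs(x) - 1, 2 * abs(x)
    return (g, h) if x > 0 else (h, g)


def adjacency_graph(adjs: Iterable[Tuple[int, int]]) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Vertices and edges (a multigraph, as a list) of the adjacency graph of a set of adjacencies."""
    edges: List[Tuple[int, int]] = []
    elements: Set[int] = set()
    for a, b in adjs:
        # Adjacency edge: right extremity of a to left extremity of b.
        edges.append((extremities(a)[1], extremities(b)[0]))
        elements.update(abs(x) for x in (a, b) if x != 0)
    for x in sorted(elements):
        edges.append(extremities(x))  # element edge (g, h)
    vertices = {v for edge in edges for v in edge}
    return vertices, edges


# The slide's example A = {(g1 h1).(g2 h2)}, i.e. the adjacency 1.2:
V, E = adjacency_graph([(1, 2)])
print(sorted(V), E)  # vertices 1..4; adjacency edge (2, 3), element edges (1, 2) and (3, 4)
```

For 1.−2 the adjacency edge becomes (h1, h2) = (2, 4), matching the notation $(g_1 h_1).(h_2 g_2)$.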

Intuition
For a set of genomes $\{G_i\}$, the higher the frequency of an adjacency, the higher the probability that it is present in a median genome.
Build partial assemblies of median genomes:
1. Build a partition P of adjacencies where each part is composed of inter-dependent adjacencies. P is partially ordered by the adjacency frequency of the parts' elements.
2. Inspect P in decreasing order of its parts, and construct the partial assemblies by favoring adjacencies with higher frequency.
Assemble these partial assemblies into potential medians.
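
The "favor higher frequency" idea can be illustrated with a deliberately simplified greedy stand-in. It is a minimal Python sketch, not the procedure of the talk: it processes single adjacencies instead of the partition P into groups, breaks frequency ties by input order, and rejects an adjacency when it would reuse an extremity or close a cycle that avoids the telomere 0. The names `extremities` and `greedy_partial_assembly` are hypothetical.

```python
from typing import Dict, List, Tuple

Adjacency = Tuple[int, int]


def extremities(x: int) -> Tuple[int, int]:
    """(left, right) vertices of signed element x: g = 2|x| - 1, h = 2|x|; 0 is the telomere."""
    if x == 0:
        return 0, 0
    g, h = 2 * abs(x) - 1, 2 * abs(x)
    return (g, h) if x > 0 else (h, g)


def greedy_partial_assembly(freq: Dict[Adjacency, int]) -> List[Adjacency]:
    """Accept adjacencies in decreasing frequency, skipping those that conflict with accepted ones."""
    parent: Dict[int, int] = {}  # union-find over elements; each component is a path

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    used = set()
    accepted: List[Adjacency] = []
    for (a, b), _ in sorted(freq.items(), key=lambda kv: -kv[1]):
        u, v = extremities(a)[1], extremities(b)[0]
        if (u and u in used) or (v and v in used):
            continue  # vertex contradiction: an extremity would get degree > 2
        if u and v:
            ru, rv = find(abs(a)), find(abs(b))
            if ru == rv:
                continue  # cycle contradiction: would close a cycle without 0
            parent[ru] = rv
        used.update(w for w in (u, v) if w)
        accepted.append((a, b))
    return accepted


# Frequencies of the four-genome example, in the slide's orientation.
FREQ = {
    (6, 0): 4,
    (3, 4): 3, (0, 5): 3, (4, 0): 3,
    (5, 6): 2, (2, 3): 2, (1, 2): 2, (0, 1): 2,
    (-5, 6): 1, (2, -5): 1, (4, 2): 1, (1, 4): 1, (1, 3): 1, (3, 1): 1,
    (2, 1): 1, (0, 6): 1, (5, 0): 1, (0, 3): 1, (0, 2): 1,
}
print(greedy_partial_assembly(FREQ))
```

On this example every frequency-1 adjacency conflicts with an accepted one, and the result is exactly the adjacency set of $G_1$ = {1 2 3 4, 5 6}.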

Dependent adjacencies
Let $a = (g_{a_1} h_{a_1}).(g_{a_2} h_{a_2})$ and $b = (g_{b_1} h_{b_1}).(g_{b_2} h_{b_2})$, and let $G = (V, E)$ be the adjacency graph for $\{a, b\}$, with $d(v)$ the degree of vertex $v$.
Definition. We say that a and b complement each other if either (i) $\exists v_1, v_2 \in V$ such that $d(v_1) = d(v_2) = 1$ and $\forall v \neq v_i$, $i \in [1, 2]$, we have $v \neq 0$ and $d(v) = 2$, or (ii) $\exists v \in V$ such that $v = 0$ and $\forall v \in V$ we have $d(v) = 2$.
We say that a and b contradict each other if either (i) $\exists v \in V$ such that $d(v) > 2$, or (ii) $\forall v \in V$ we have $v \neq 0$ and $d(v) = 2$.
[Figure: examples of a complement, a cycle contradiction and a vertex contradiction]
Adjacency choice for the ancestral genome architecture, when $u(a) > 1$: complementary adjacencies indicate multiple agreement; contradictory adjacencies indicate multiple breakpoints.
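
The definition translates directly into a degree check on the adjacency graph of the pair. A minimal Python sketch, assuming adjacencies are pairs of signed elements with 0 for the telomere and $d(v)$ is the vertex degree; `degrees`, `relation` and the label "independent" (for pairs that neither complement nor contradict each other) are hypothetical.

```python
from collections import Counter
from typing import Tuple

Adjacency = Tuple[int, int]


def extremities(x: int) -> Tuple[int, int]:
    """(left, right) vertices of signed element x: g = 2|x| - 1, h = 2|x|; 0 is the telomere."""
    if x == 0:
        return 0, 0
    g, h = 2 * abs(x) - 1, 2 * abs(x)
    return (g, h) if x > 0 else (h, g)


def degrees(a: Adjacency, b: Adjacency) -> Counter:
    """Vertex degrees in the adjacency graph of {a, b} (adjacency edges plus element edges)."""
    edges = [(extremities(x)[1], extremities(y)[0]) for x, y in (a, b)]
    elements = {abs(x) for x in (*a, *b) if x != 0}
    edges += [extremities(x) for x in elements]
    deg: Counter = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def relation(a: Adjacency, b: Adjacency) -> str:
    deg = degrees(a, b)
    V = set(deg)
    if any(d > 2 for d in deg.values()):
        return "contradict"  # (i) vertex contradiction
    if all(d == 2 for d in deg.values()):
        # (ii) a cycle: complement if it passes through the telomere 0, contradiction otherwise
        return "complement" if 0 in V else "contradict"
    ends = [v for v in V if deg[v] == 1]
    if len(ends) == 2 and all(v != 0 and deg[v] == 2 for v in V - set(ends)):
        return "complement"  # (i) a single path whose interior avoids 0
    return "independent"


print(relation((1, 2), (2, 3)), relation((1, 2), (1, 3)), relation((1, 2), (2, 1)))
```

The three printed cases correspond to the figure: 1.2 and 2.3 form a path (complement), 1.2 and 1.3 share the extremity h1 (vertex contradiction), and 1.2 and 2.1 close a cycle without 0 (cycle contradiction).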

Relative frequency
Setting: N genomes $\{G_i\}$, $d$ the rearrangement distance, $C$ the set of all contradictory adjacencies.
Lemma. For any pair of adjacencies $\{a, b\} \in C$ and two genomes $M_a$ and $M_b$ identical up to 2 adjacencies with $a \in M_a$ and $b \in M_b$, it holds that $\sum_{i=1}^{N} d(M_a, G_i) - \sum_{i=1}^{N} d(M_b, G_i) \le N$.
If $u(a) > u(b)$, then $\sum_{i=1}^{N} d(M_a, G_i) - \sum_{i=1}^{N} d(M_b, G_i) \ll N$.
Similarly for the breakpoint distance.
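
The bound of the lemma can be motivated by the triangle inequality. A short derivation, under two assumptions not stated on the slide: $d$ is a metric, and a single rearrangement operation transforms $M_a$ into $M_b$, so that $d(M_a, M_b) = 1$.

```latex
\begin{align*}
d(M_a, G_i) &\le d(M_a, M_b) + d(M_b, G_i) = 1 + d(M_b, G_i)
  && \text{(triangle inequality, for each } i\text{)} \\
\sum_{i=1}^{N} d(M_a, G_i) - \sum_{i=1}^{N} d(M_b, G_i)
  &= \sum_{i=1}^{N} \bigl( d(M_a, G_i) - d(M_b, G_i) \bigr)
  \le \sum_{i=1}^{N} 1 = N .
\end{align*}
```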

Groups of adjacencies
Let $P(A)$ be a partition of $A$, the set of all adjacencies. $P_0(A)$: the elementary cycles without 0, plus singletons.
Merging of parts: $\sqcup$ defines a partition of $A$ such that for any $p \in \sqcup(P(A))$, either $\exists p_1 \in P(A)$ s.t. $p = p_1$, or $\exists p_1, p_2 \in P(A)$ s.t. $p = p_1 \cup p_2$ and moreover $\exists a \in p_1$ and $\exists b \in p_2$ s.t. $u(a) = u(b) = u(p_1) = u(p_2)$ and either a and b are dependent or a and b participate in a cycle $c \in G$ without vertex $v = 0$ s.t. $\forall v \in c$ we have $u(v) \ge u(a)$.
Definition. A group g is a part of $\sqcup^n(P_0(A))$, where $\sqcup^n(P_0(A))$ is the fixed point of $\sqcup$.
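
Computing the groups amounts to applying the merge operator until the partition stops changing. A minimal Python sketch of that fixed-point iteration; `merge_step`, `groups` and the predicate `should_merge` are hypothetical, and the predicate stands in for the slide's merge condition (equal frequencies $u$ plus dependency or a shared cycle without 0), which is not implemented here. In each step every output part is either an input part or the union of two input parts, as in the definition of $\sqcup$.

```python
from typing import Callable, FrozenSet, Hashable, List

Part = FrozenSet[Hashable]
Predicate = Callable[[Part, Part], bool]


def merge_step(parts: List[Part], should_merge: Predicate) -> List[Part]:
    """One application of the merge operator: each output part is either an
    input part or the union of two input parts accepted by should_merge."""
    taken = [False] * len(parts)
    out: List[Part] = []
    for i, p in enumerate(parts):
        if taken[i]:
            continue
        taken[i] = True
        for j in range(i + 1, len(parts)):
            if not taken[j] and should_merge(p, parts[j]):
                taken[j] = True
                p = p | parts[j]
                break
        out.append(p)
    return out


def groups(parts: List[Part], should_merge: Predicate) -> List[Part]:
    """Iterate merge_step until it no longer merges anything (the fixed point)."""
    while True:
        merged = merge_step(parts, should_merge)
        if len(merged) == len(parts):
            return merged
        parts = merged


def consecutive(p: Part, q: Part) -> bool:
    """Toy predicate for illustration: merge parts holding two consecutive integers."""
    return any(abs(x - y) == 1 for x in p for y in q)


result = groups([frozenset({1}), frozenset({2}), frozenset({3}), frozenset({10})], consecutive)
print(result)
```

With the toy predicate the iteration needs three steps: {1} and {2} merge first, then {1, 2} absorbs {3}, and the third step changes nothing, so {1, 2, 3} and {10} are the groups.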

Groups of adjacencies, continued
Example: $G_1$ = {1 2 3 4, 5 6}, $G_2$ = {1 2 3 4, −5 6}, $G_3$ = {3 1 4 2 −5, 6}, $G_4$ = {2 1 3 4, 5 6}

Groups of adjacencies, continued
Example, in the (g h) vertex notation:
$G_1$ = {(1 2)(3 4)(5 6)(7 8), (9 10)(11 12)}
$G_2$ = {(1 2)(3 4)(5 6)(7 8), (10 9)(11 12)}
$G_3$ = {(5 6)(1 2)(7 8)(3 4)(10 9), (11 12)}
$G_4$ = {(3 4)(1 2)(5 6)(7 8), (9 10)(11 12)}
[Figure: adjacency graph on the vertices 0 to 12]

Groups of adjacencies, continued
Same example: $P_0(A) = \{(9\ 10).(11\ 12);\ (10\ 9).(11\ 12)\} \cup$ singletons

Groups of adjacencies, continued
Same example: $P_0(A) = \{(9\ 10).(11\ 12);\ (10\ 9).(11\ 12)\} \cup$ singletons
$P_1(A) = P_0(A) \cup \{(5\ 6).(7\ 8);\ (7\ 8).0\} \cup$ singletons

Groups of adjacencies, continued
Same example: $P_0(A) = \{(9\ 10).(11\ 12);\ (10\ 9).(11\ 12)\} \cup$ singletons
$P_1(A) = P_0(A) \cup \{(5\ 6).(7\ 8);\ (7\ 8).0\} \cup$ singletons
$P_2(A) = P_1(A) \cup \{0.(1\ 2),\ (1\ 2).(3\ 4),\ (3\ 4).(5\ 6),\ (2\ 1).(4\ 3)\} \cup$ singletons
