
Mining the semantics of genome super-blocks to infer ancestral - PowerPoint PPT Presentation



  1. Mining the semantics of genome super-blocks to infer ancestral architectures
     Macha Nikolski (macha@labri.fr), Université de Bordeaux
     AlBio, Moscow, 07/10/2008

  2. Introduction
     Challenge: uncovering the principal events that punctuate the evolution of species.
     Approach: plausible genome architectures of ancestral genomes.
     Two-fold problem:
     • determine ancestral architectures;
     • trace the rearrangement events that lead from the ancestors to contemporary genomes.

  3. Modeling evolution
     Hannenhalli and Pevzner theory. Two kinds of events relate contemporary genomes to their unknown common ancestor:
     • rearrangements
     • content change
     Rearrangement operations: inversion, fusion, fission, translocation.

  4. Mathematical vs. experimental approach
     Results from the two techniques do not necessarily agree:
     • Rearrangement distance: human, mouse, rat and chicken; genome sequences; resolution: gene (Bourque & Pevzner 2002; Bourque & Pevzner 2006).
     • Chromosomal painting: Eutherian clade (≈ 80 sp.); hybridization of DNA probes; resolution: ≈ 4 Mb (Froenike 2006; Rocchi 2006).
     Possible solution: integrate more biological knowledge into the mathematical approach.

  5. Hannenhalli and Pevzner theory
     Signed permutation model: a genome = a set of signed permutations.
     Cat (chromosome E1): -9 -8 +6 +7 +2 +3 +4 +5 +1; genes: tp53, p4hb, tkl, ddx5, scn4a, stat5b, csf3, hoxb@, supt4h
     Human (chromosome 17): +1 +2 +3 +4 +5 +6 +7 +8 +9; genes: tp53, stat5b, csf3, hoxb@, supt4h, ddx5, scn4a, tkl, p4hb
     Method: mimicking multichromosomal rearrangement operations by reversals on a single permutation.
     [Figure: genome Π (elements 1 to 8) transformed into genome Γ by a reversal, two translocations and a fusion]
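
The reversal operation of the signed permutation model can be sketched in Python as follows. This is a minimal illustration and not code from the presentation: the function name `reversal` and the 0-based inclusive indices are assumptions made for the example.

```python
def reversal(perm, i, j):
    """Reverse the segment perm[i..j] (0-based, inclusive) and flip its signs."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


# Hypothetical single-chromosome genome on elements 1..8.
pi = [1, 2, 3, 4, 5, 6, 7, 8]
print(reversal(pi, 1, 3))  # [1, -4, -3, -2, 5, 6, 7, 8]
```

Applying the same reversal twice restores the original permutation, which is a quick sanity check on the sign handling.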

  6. Ancestors as median genomes
     Formulation as the median genome problem: given $G_1, \ldots, G_N$, find $M$ such that, for a distance $d$, $\sum_{i=1}^{N} d(M, G_i)$ is minimal.
     Different distances: rearrangement, breakpoint, double cut and join.
     This problem is NP-complete even for $N = 3$:
     • breakpoint distance (Bryant 1998, Pe'er & Shamir 1998)
     • rearrangement distance (Caprara 1999, Caprara 2003)
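
To make the median objective concrete, here is a small Python sketch that scores a candidate $M$ by the sum of its distances to the input genomes and finds a median by exhaustive search. Everything in it is an assumption for illustration: genomes are single linear chromosomes, the breakpoint distance is taken as the number of adjacencies of one genome that are absent from the other, and the names `median_score` and `brute_force_median` are hypothetical. The search enumerates all $2^n n!$ signed permutations, so it is only usable for tiny $n$, in line with the intractability noted above.

```python
from itertools import permutations, product


def adjacencies(perm):
    """Adjacencies of one linear chromosome, with 0 as telomere marker.

    a.b and -b.-a are the same adjacency read in opposite directions,
    so one representative of each pair is kept.
    """
    ext = [0] + list(perm) + [0]
    return {max((a, b), (-b, -a)) for a, b in zip(ext, ext[1:])}


def breakpoint_distance(p, q):
    """Number of adjacencies of p that do not occur in q (assumed definition)."""
    return len(adjacencies(p) - adjacencies(q))


def median_score(m, genomes):
    """Objective of the median problem: sum of distances from m to every genome."""
    return sum(breakpoint_distance(m, g) for g in genomes)


def brute_force_median(genomes):
    """Try every signed permutation and keep one with the smallest score."""
    n = len(genomes[0])
    candidates = (
        tuple(s * x for s, x in zip(signs, order))
        for order in permutations(range(1, n + 1))
        for signs in product((1, -1), repeat=n)
    )
    return min(candidates, key=lambda m: median_score(m, genomes))


genomes = [(1, 2, 3), (1, 2, 3), (1, -2, 3)]
print(brute_force_median(genomes), median_score((1, 2, 3), genomes))
```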

  7. Limitations
     Misleading to speak of an ancestral genome ⇒ median genome.
     Algorithmic and interpretation problems:
     • computationally intractable, in practice need heuristics;
     • high number of equivalent solutions (Bourque & Pevzner 2002, Eriksen 2007).
     Ideas:
     • look for common features present in ancestral genome architecture;
     • (re-)introduce biologically pertinent features: breakpoints.

  8. Adjacencies, breakpoints and frequencies
     $\Pi = \pi_1 \ldots \pi_k \; \pi_{k+1} \ldots \pi_l \; \pi_{l+1} \ldots \pi_n$
     $\Gamma = \pi_1 \ldots \pi_k \; {-\pi_l} \ldots {-\pi_{k+1}} \; \pi_{l+1} \ldots \pi_n$
     Breakpoints: $a, c \in \Pi$ and $b, d \in \Gamma$.
     Particular case of telomeres: $0.\pi_1$ and $\pi_n.0$.
     Example:
     $G_1$ = {1 2 3 4, 5 6}
     $G_2$ = {1 2 3 4, −5 6}
     $G_3$ = {3 1 4 2 −5, 6}
     $G_4$ = {2 1 3 4, 5 6}

     frequency   adjacencies
     4           6.0
     3           3.4, 0.5, 4.0
     2           5.6, 2.3, 1.2, 0.1
     1           −5.6, 2.−5, 4.2, 1.4, 1.3, 3.1, 2.1, 0.6, 5.0, 0.3, 0.2
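
The frequency table of this example can be recomputed with a short Python sketch. It is an illustration under stated assumptions rather than the author's implementation: 0 marks a telomere, an adjacency $a.b$ is identified with $-b.-a$ (the same adjacency read from the other strand), and the helper names `canon`, `genome_adjacencies` and `adjacency_frequencies` are hypothetical.

```python
from collections import Counter


def canon(a, b):
    """Pick one representative for the equivalent adjacencies a.b and -b.-a."""
    return max((a, b), (-b, -a))


def genome_adjacencies(genome):
    """Set of adjacencies of a genome given as a list of signed chromosomes."""
    adj = set()
    for chrom in genome:
        ext = [0] + list(chrom) + [0]  # 0 marks both telomeres
        adj.update(canon(a, b) for a, b in zip(ext, ext[1:]))
    return adj


def adjacency_frequencies(genomes):
    """Count in how many genomes each adjacency occurs."""
    freq = Counter()
    for genome in genomes:
        freq.update(genome_adjacencies(genome))
    return freq


GENOMES = [
    [(1, 2, 3, 4), (5, 6)],      # G1
    [(1, 2, 3, 4), (-5, 6)],     # G2
    [(3, 1, 4, 2, -5), (6,)],    # G3
    [(2, 1, 3, 4), (5, 6)],      # G4
]

freq = adjacency_frequencies(GENOMES)
print(freq[canon(6, 0)])  # 4
print(freq[canon(0, 5)])  # 3
```

With this identification, the $-5.0$ of $G_3$ counts as $0.5$, which is why $0.5$ reaches frequency 3 in the table.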

  9. Adjacency graph
     Each element is replaced by two extremities (Hannenhalli & Pevzner 1995): $\pi_i \rightarrow g = 2\pi_i - 1,\ h = 2\pi_i$.
     Denoted: $\pi_i.\pi_j$ by $(g_1 h_1).(g_2 h_2)$, and $\pi_i.{-\pi_j}$ by $(g_1 h_1).(h_2 g_2)$.
     Example: the adjacency graph for a set $A = \{(g_1 h_1).(g_2 h_2)\}$ has
     • 4 vertices $g_1$, $h_1$, $g_2$ and $h_2$;
     • two edges that stand for the elements, $e_1 = (g_1, h_1)$ and $e_2 = (g_2, h_2)$;
     • one edge that stands for the adjacency, $e_3 = (h_1, g_2)$.
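
The encoding of elements into extremities can be sketched in Python as below. The function names `extremities`, `encode_chromosome` and `adjacency_edge` are hypothetical, and treating the telomere 0 as a single vertex 0 is an assumption made for this example.

```python
def extremities(x):
    """Map a signed element to its ordered extremities: (g, h) for +x, (h, g) for -x."""
    g, h = 2 * abs(x) - 1, 2 * abs(x)
    return (g, h) if x > 0 else (h, g)


def encode_chromosome(chrom):
    """Rewrite a signed chromosome as a list of extremity pairs."""
    return [extremities(x) for x in chrom]


def adjacency_edge(a, b):
    """Edge for the adjacency a.b: right extremity of a joined to left extremity of b."""
    u = extremities(a)[1] if a != 0 else 0
    v = extremities(b)[0] if b != 0 else 0
    return (u, v)


print(encode_chromosome((-5, 6)))  # [(10, 9), (11, 12)]
print(adjacency_edge(-5, 6))       # (9, 11)
```

Under this encoding the chromosome −5 6 becomes (10 9)(11 12), the form used for the same example genomes later in the deck.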

  10. Intuition
      For a set of genomes $\{G_i\}$, the higher the frequency of an adjacency, the higher the probability that it is present in a median genome.
      Build partial assemblies of median genomes:
      1. Build a partition $P$ of adjacencies where each part is composed of inter-dependent adjacencies. $P$ is partially ordered by the adjacency frequency of the parts' elements.
      2. Inspect $P$ in decreasing order of its parts, and construct the partial assemblies by favoring adjacencies with higher frequency.
      Assemble these partial assemblies into potential medians.
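
The idea of favoring frequent adjacencies can be illustrated with a plain greedy pass in Python. This is a deliberately simplified stand-in, not the partition-based procedure of the slides: adjacencies are visited by decreasing frequency and kept unless they reuse an already used extremity or close a cycle. The function name `greedy_assembly`, the tie-breaking by sorted order and the union-find cycle test are all assumptions of this sketch; the input is the frequency table of the four-genome example.

```python
def extremities(x):
    """(g, h) for +x and (h, g) for -x, with g = 2|x| - 1 and h = 2|x|."""
    g, h = 2 * abs(x) - 1, 2 * abs(x)
    return (g, h) if x > 0 else (h, g)


def greedy_assembly(freq):
    """Keep adjacencies by decreasing frequency while they stay compatible."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    used, chosen = set(), []
    for (a, b), _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        ends = []
        if a != 0:
            ends.append(extremities(a)[1])  # right extremity of a
        if b != 0:
            ends.append(extremities(b)[0])  # left extremity of b
        if any(e in used for e in ends):
            continue  # an extremity would get a second neighbour
        if a != 0 and b != 0:
            ra, rb = find(abs(a)), find(abs(b))
            if ra == rb:
                continue  # would close a cycle without telomere
            parent[ra] = rb
        used.update(ends)
        chosen.append((a, b))
    return chosen


FREQ = {
    (6, 0): 4,
    (3, 4): 3, (0, 5): 3, (4, 0): 3,
    (5, 6): 2, (2, 3): 2, (1, 2): 2, (0, 1): 2,
    (-5, 6): 1, (2, -5): 1, (4, 2): 1, (1, 4): 1, (1, 3): 1, (3, 1): 1,
    (2, 1): 1, (0, 6): 1, (5, 0): 1, (0, 3): 1, (0, 2): 1,
}
print(sorted(greedy_assembly(FREQ)))
```

On this input the kept adjacencies are 0.1, 1.2, 2.3, 3.4, 4.0, 0.5, 5.6 and 6.0, which spell out the chromosomes 1 2 3 4 and 5 6. Turning kept adjacencies into chromosomes, and handling ties or groups of inter-dependent adjacencies, is left out of the sketch.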

  11. Dependent adjacencies
      Let $a = (g^a_1 h^a_1).(g^a_2 h^a_2)$ and $b = (g^b_1 h^b_1).(g^b_2 h^b_2)$, and let $G = (V, E)$ be the adjacency graph for $\{a, b\}$, where $d(v)$ denotes the degree of vertex $v$.
      Definition. We say that $a$ and $b$ complement each other if either
      (i) $\exists v_1, v_2 \in V$ such that $d(v_1) = d(v_2) = 1$ and $\forall v \neq v_i$, $i \in [1, 2]$, we have $v \neq 0$ and $d(v) = 2$, or
      (ii) $\exists v \in V$ such that $v = 0$ and $\forall v \in V$ we have $d(v) = 2$.
      We say that $a$ and $b$ contradict each other if either
      (i) $\exists v \in V$ such that $d(v) > 2$, or
      (ii) $\forall v \in V$ we have $v \neq 0$ and $d(v) = 2$.
      [Figure: three example graphs illustrating complement, cycle contradiction and vertex contradiction]
      Adjacency choice for the ancestral genome architecture, $u(a) > 1$:
      • complementary adjacencies: multiple agreement
      • contradictory adjacencies: multiple breakpoints
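
The two definitions can be checked mechanically on a pair of adjacencies. The Python sketch below builds the adjacency graph for $\{a, b\}$ and applies the degree conditions literally. It is an illustration with several assumptions: adjacencies are given as pairs of signed elements with 0 as telomere, $d(v)$ is read as the vertex degree, the function name `classify` is hypothetical, and the label "independent" for pairs that satisfy neither definition is introduced here and does not come from the slides.

```python
from collections import Counter


def extremities(x):
    """(g, h) for +x and (h, g) for -x, with g = 2|x| - 1 and h = 2|x|."""
    g, h = 2 * abs(x) - 1, 2 * abs(x)
    return (g, h) if x > 0 else (h, g)


def classify(a, b):
    """Return 'complement', 'contradict' or 'independent' for adjacencies a and b."""
    element_edges, adjacency_edges = set(), []
    for left, right in (a, b):
        u = v = 0  # 0 is the telomere vertex
        if left != 0:
            element_edges.add(frozenset(extremities(left)))
            u = extremities(left)[1]
        if right != 0:
            element_edges.add(frozenset(extremities(right)))
            v = extremities(right)[0]
        adjacency_edges.append((u, v))

    degree = Counter()
    for edge in element_edges:
        for vertex in edge:
            degree[vertex] += 1
    for u, v in adjacency_edges:
        degree[u] += 1
        degree[v] += 1

    if any(d > 2 for d in degree.values()):
        return "contradict"  # vertex contradiction
    if all(d == 2 for d in degree.values()):
        # closed walk: through the telomere it complements, otherwise it is a cycle
        return "complement" if 0 in degree else "contradict"
    ends = [v for v, d in degree.items() if d == 1]
    inner = [v for v in degree if degree[v] != 1]
    if len(ends) == 2 and 0 not in inner and all(degree[v] == 2 for v in inner):
        return "complement"  # a single path between two free extremities
    return "independent"


print(classify((1, 2), (2, 3)))  # complement
print(classify((1, 2), (1, 3)))  # contradict
print(classify((1, 2), (2, 1)))  # contradict
```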

  12. Relative frequency
      Setting: $N$ genomes $\{G_i\}$; $d$ the rearrangement distance; $C$ the set of all contradictory adjacencies; $M_a$ and $M_b$ identical up to two adjacencies.
      Lemma. For any pair of adjacencies $\{a, b\} \in C$ and two genomes $M_a$ and $M_b$ identical up to 2 adjacencies with $a \in M_a$ and $b \in M_b$, it holds that
      $$\sum_{i=1}^{N} d(M_a, G_i) - \sum_{i=1}^{N} d(M_b, G_i) \le N.$$
      If $u(a) > u(b)$, then
      $$\sum_{i=1}^{N} d(M_a, G_i) - \sum_{i=1}^{N} d(M_b, G_i) \ll N.$$
      Similarly for the breakpoint distance.
      [Figure: genomes $M_a$, $M_b$, $G_a$ and $G_b$]

  13. Groups of adjacencies
      Let $P(A)$ be a partition of $A$, the set of all adjacencies.
      $P_0(A)$: elementary cycles without 0, plus singletons.
      Merging of parts $\sqcup$ defines a partition of $A$ such that for any $p \in \sqcup(P(A))$:
      • $\exists p_1 \in P(A)$ s.t. $p = p_1$, or
      • $\exists p_1, p_2 \in P(A)$ s.t. $p = p_1 \cup p_2$ and moreover $\exists a \in p_1$ and $\exists b \in p_2$ s.t. $u(a) = u(b) = u(p_1) = u(p_2)$ and either $a$ and $b$ are dependent, or $a$ and $b$ participate in a cycle $c \in G$ without vertex $v = 0$ s.t. $\forall v \in c$ we have $u(v) \ge u(a)$.
      Definition. A group $g$ is a part of $\sqcup^n(P_0(A))$, where $\sqcup^n(P_0(A))$ is the fixed point of $\sqcup$.

  14. Groups of adjacencies, continued
      Example:
      $G_1$ = {1 2 3 4, 5 6}
      $G_2$ = {1 2 3 4, −5 6}
      $G_3$ = {3 1 4 2 −5, 6}
      $G_4$ = {2 1 3 4, 5 6}

  15. Groups of adjacencies, continued
      Example:
      $G_1$ = {(1 2)(3 4)(5 6)(7 8), (9 10)(11 12)}
      $G_2$ = {(1 2)(3 4)(5 6)(7 8), (10 9)(11 12)}
      $G_3$ = {(5 6)(1 2)(7 8)(3 4)(10 9), (11 12)}
      $G_4$ = {(3 4)(1 2)(5 6)(7 8), (9 10)(11 12)}
      [Figure: adjacency graph on vertices 0 to 12]

  16. Groups of adjacencies, continued
      Example:
      $G_1$ = {(1 2)(3 4)(5 6)(7 8), (9 10)(11 12)}
      $G_2$ = {(1 2)(3 4)(5 6)(7 8), (10 9)(11 12)}
      $G_3$ = {(5 6)(1 2)(7 8)(3 4)(10 9), (11 12)}
      $G_4$ = {(3 4)(1 2)(5 6)(7 8), (9 10)(11 12)}
      $P_0(A)$ = {(9 10).(11 12); (10 9).(11 12)} ∪ singletons
      [Figure: adjacency graph on vertices 0 to 12]

  17. Groups of adjacencies, continued
      Example:
      $G_1$ = {(1 2)(3 4)(5 6)(7 8), (9 10)(11 12)}
      $G_2$ = {(1 2)(3 4)(5 6)(7 8), (10 9)(11 12)}
      $G_3$ = {(5 6)(1 2)(7 8)(3 4)(10 9), (11 12)}
      $G_4$ = {(3 4)(1 2)(5 6)(7 8), (9 10)(11 12)}
      $P_0(A)$ = {(9 10).(11 12); (10 9).(11 12)} ∪ singletons
      $P_1(A)$ = $P_0(A)$ ∪ {(5 6).(7 8); (7 8).0} ∪ singletons
      [Figure: adjacency graph on vertices 0 to 12]

  18. Groups of adjacencies, continued
      Example:
      $G_1$ = {(1 2)(3 4)(5 6)(7 8), (9 10)(11 12)}
      $G_2$ = {(1 2)(3 4)(5 6)(7 8), (10 9)(11 12)}
      $G_3$ = {(5 6)(1 2)(7 8)(3 4)(10 9), (11 12)}
      $G_4$ = {(3 4)(1 2)(5 6)(7 8), (9 10)(11 12)}
      $P_0(A)$ = {(9 10).(11 12); (10 9).(11 12)} ∪ singletons
      $P_1(A)$ = $P_0(A)$ ∪ {(5 6).(7 8); (7 8).0} ∪ singletons
      $P_2(A)$ = $P_1(A)$ ∪ {0.(1 2), (1 2).(3 4), (3 4).(5 6), (2 1).(4 3)} ∪ singletons
      [Figure: adjacency graph on vertices 0 to 12]
