An Algorithmic View on Multi-related-segments: a unifying model for approximate common interval



  1. An Algorithmic View on Multi-related-segments: a unifying model for approximate common interval. X. Yang, F. Sikora, G. Blin, S. Hamel, R. Rizzi, S. Aluru. GSAP, Broad Institute of MIT & Harvard, USA; Université Paris-Est, LIGM, UMR 8049, France; DIRO, Université de Montréal, QC, Canada; DIMI, Università di Udine, Udine, Italy; Lehrstuhl für Bioinformatik, Friedrich-Schiller-Universität Jena, Germany; DECE, Iowa State University, USA. May 2012. Guillaume Blin, An Algorithmic View on MRS

  2. Comparing genomes ◮ A set of genes located in close proximity on multiple chromosomes often implies their origin from the same ancestral genomic segment or their involvement in the same biological process ◮ ... hence the search for gene clusters between genomes. ◮ A gene cluster = a set of genes appearing in spatial proximity along chromosomes.

  3. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: ancestral gene set A and three segments (1, 2, 3), with β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19]

  4. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: example with β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19] ◮ evidence of any gene of interest as being ancestral: observing a minimum of β occurrences of any gene of A ⇒ reduces the possibility of misinterpreting what is in fact a chance occurrence

  5. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: example with β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19] ◮ evidence of any gene of interest as being ancestral: β ◮ sufficient contribution of each segment to A: each segment contains at least ε_m different ancestral genes

  6. Key properties for modeling gene proximity ◮ Hoberman and Durand 2005: based on observing the co-occurrence of a gene set A (ancestral genes) in different chromosomal segments ◮ A is subject to evolutionary constraints [Figure: example with β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19] ◮ evidence of any gene of interest as being ancestral: β ◮ sufficient contribution of each segment to A: ε_m ◮ local and global ancestral gene density: at most α interleaving genes between two consecutive ancestral genes, a maximum of ε_l gene losses per segment, and a maximum of ε_t total gene losses among all segments
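The five parameters above can be read as a feasibility test on a set of segments. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `check_mrs` is hypothetical, segments are plain lists of gene labels, and a "gene loss" is assumed to mean an ancestral gene of A that is absent from a segment.

```python
from collections import Counter

def check_mrs(segments, A, beta, eps_m, alpha, eps_l, eps_t):
    """Return True if the segments satisfy the five constraints w.r.t. A.

    Assumption: the gene loss of a segment is the number of genes of A
    that do not appear in it.
    """
    A = set(A)
    occurrences = Counter()  # number of segments containing each ancestral gene
    total_losses = 0
    for seg in segments:
        present = A.intersection(seg)
        # eps_m: each segment contributes enough distinct ancestral genes
        if len(present) < eps_m:
            return False
        # eps_l: bounded gene loss per segment (assumed definition)
        losses = len(A) - len(present)
        if losses > eps_l:
            return False
        total_losses += losses
        # alpha: bounded number of interleaving genes between two
        # consecutive ancestral genes
        positions = [i for i, g in enumerate(seg) if g in A]
        if any(q - p - 1 > alpha for p, q in zip(positions, positions[1:])):
            return False
        occurrences.update(present)
    # eps_t: bounded total gene loss; beta: each ancestral gene is seen
    # in at least beta segments
    return total_losses <= eps_t and all(occurrences[g] >= beta for g in A)

# Hypothetical toy data, using the parameter values shown on the slide.
A = {"a", "b", "c", "d"}
segments = [["a", "b", "x", "c"], ["b", "c", "d"], ["a", "y", "d", "c"]]
print(check_mrs(segments, A, beta=2, eps_m=3, alpha=1, eps_l=4, eps_t=19))  # True
print(check_mrs(segments, A, beta=2, eps_m=3, alpha=0, eps_l=4, eps_t=19))  # False
```

With α = 0 the first segment fails because the non-ancestral gene `x` sits between `b` and `c`.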

  7. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Conserved segments: require full conservation

  8. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Common intervals: genes must occur consecutively, regardless of their order
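As an illustration of the common interval definition, here is a minimal sketch assuming each genome is a permutation given as a Python list (every gene appears exactly once). The function name `is_common_interval` is hypothetical.

```python
def is_common_interval(genes, genomes):
    """True if the gene set occupies consecutive positions in every genome.

    Assumes each genome is a permutation, so each gene occurs at most once.
    """
    genes = set(genes)
    if not genes:
        return False
    for genome in genomes:
        positions = [i for i, g in enumerate(genome) if g in genes]
        # every gene of the set must be present in this genome
        if len(positions) != len(genes):
            return False
        # consecutive occurrence: the span covers exactly len(genes) positions
        if positions[-1] - positions[0] + 1 != len(genes):
            return False
    return True

g1 = [1, 2, 3, 4, 5]
g2 = [5, 3, 2, 4, 1]
print(is_common_interval({2, 3, 4}, [g1, g2]))  # True: order differs, block is contiguous
print(is_common_interval({1, 2}, [g1, g2]))     # False: 1 and 2 are separated in g2
```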

  9. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Conserved intervals: common intervals framed by the same two genes

  10. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Gene teams: genes in a cluster must not be interrupted by long stretches of genes not belonging to the cluster

  11. Existing models ◮ Gene cluster definitions ◮ conserved segments, common intervals, conserved intervals, gene teams, approximate common intervals ◮ Approximate common intervals: common intervals that may contain a few genes from outside the cluster

  12. Multi-related-segments model ◮ A unified model to capture approximate common intervals ◮ An MRS is a set of maximal segments capturing the previously mentioned key properties ({β, ε_m, α, ε_l, ε_t}) ◮ It captures existing models: ◮ MRS = CI when β = k, ε_m = |A| and α = 0 ◮ MRS = GT when α ≥ 0 ◮ MRS further captures gene loss events without strong pairwise similarity information [Figure: example with β = 2, ε_m = 3, α = 1, ε_l = 4, ε_t = 19]

  13. Finding MRS ◮ The problem then consists in identifying the MRS in a set of k chromosomes ◮ Considering the ancestral gene set A as known a priori, the problem, termed LocateMRS, consists in locating, given k chromosomes S = {S_1, S_2, ..., S_k} represented as strings, a feasible MRS originating from A. ◮ LocateMRS is NP-hard even in the restricted case where the S_i's are permutations and no gene insertions are allowed (α = 0) ⇒ reduction from Exact Cover by 3-Sets ◮ LocateMRS is fixed-parameter tractable with respect to parameter |A| when α = 0

  14. Finding MRS ◮ When A is unknown, identifying all MRS is hard to approximate (APX-hard, by reduction from Minimum Set Cover), even in the restricted case where the S_i's are permutations ◮ Removing the constraint on the maximum number of gene losses (i.e. ε_t = ∞) and the constraint on the maximum number of substrings per input sequence (i.e. α = ∞) yields a polynomial algorithm.

  15. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ Segments can be pruned considering ε_l and A. Since α = 0, one has to select exactly one substring of interest in each sequence S_j.

  16. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ A naive algorithm = try all such combinations and check the parameters ⇒ an exponential running time.
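The naive strategy can be sketched as a brute-force enumeration. This is an illustrative sketch only: the name `naive_locate` is hypothetical, candidate substrings are modeled as sets of genes, exactly one candidate is picked per sequence, and only the β constraint is checked (the other parameters are assumed to have been handled by pruning).

```python
from itertools import product

def naive_locate(candidates, A, beta):
    """Try every way of picking one candidate substring per sequence.

    candidates[j] is the list of candidate substrings (sets of genes) of
    the j-th sequence. The number of combinations is the product of the
    list lengths, hence exponential in the number of sequences.
    """
    for choice in product(*candidates):
        # beta: every ancestral gene must occur in at least beta chosen substrings
        if all(sum(g in sub for sub in choice) >= beta for g in A):
            return choice
    return None

# Hypothetical toy instance with three sequences and two candidates each.
A = {"a", "b", "c"}
candidates = [
    [{"a", "b"}, {"c"}],
    [{"a", "c"}, {"b", "c"}],
    [{"b", "c"}, {"a"}],
]
print(naive_locate(candidates, A, beta=2))
print(naive_locate(candidates, A, beta=3))  # None: no single candidate of sequence 1 holds a, b and c
```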



  19. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ By using an efficient dynamic programming strategy, one may confine the exponential factor to the size of the ancestral gene set.

  20. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ There is no need to compute the exact number of times each character occurs, but only to ensure that it occurs in at least β (usually β = 2) substrings of the solution.

  21. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: ancestral gene set A and sequences 1, 2, 3] ◮ Considering a fixed ordering (a_1, a_2, ..., a_|A|) of the characters of A, one has to store a count vector C = (c_1, c_2, ..., c_|A|), where c_i ∈ {0, 1, ..., β} denotes the number of substrings containing a_i, capped at β. Here, C = (2, 2, 1, 0, 2, 1, 0)
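A count vector of this kind can be computed as follows. This is a small sketch with hypothetical names (`count_vector`, gene labels `a1` to `a7`) and hypothetical substrings, chosen so that the result matches the vector C shown on the slide; counts are capped at β because only reaching β matters.

```python
def count_vector(substrings, order, beta):
    """c_i = number of selected substrings containing a_i, capped at beta."""
    return tuple(min(beta, sum(g in s for s in substrings)) for g in order)

order = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]  # fixed ordering of A
selected = [                                         # hypothetical substrings, as gene sets
    {"a1", "a2", "a3"},
    {"a1", "a2", "a5", "a6"},
    {"a5"},
]
print(count_vector(selected, order, beta=2))  # (2, 2, 1, 0, 2, 1, 0)

# Adding a third substring containing a1 leaves c_1 at beta = 2.
print(count_vector(selected + [{"a1"}], order, beta=2))  # (2, 2, 1, 0, 2, 1, 0)
```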

  22. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: candidate substrings S^1_j, S^2_j, S^3_j in each sequence j = 1, 2, 3] ◮ The main property of this representation is that, given A, there are only (β + 1)^|A| possible vectors, since each c_i takes one of the β + 1 values in {0, ..., β}.

  23. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: candidate substrings S^1_j, S^2_j, S^3_j in each sequence j = 1, 2, 3] ◮ We define a boolean dynamic programming table D indexed by the last substring added to the solution and the vector C of the current solution: D(S^i_j, (c_1, ..., c_|A|))

  24. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: candidate substrings S^1_j, S^2_j, S^3_j in each sequence j = 1, 2, 3] ◮ We then proceed row by row: D(S^1_1, (1, 1, 1, 0, 0, 0, 0)) = 1; D(S^2_1, (0, 0, 0, 1, 1, 1, 0)) = 1; D(S^3_1, (0, 0, 0, 0, 1, 0, 0)) = 1

  25. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: candidate substrings S^1_j, S^2_j, S^3_j in each sequence j = 1, 2, 3] ◮ We then proceed row by row: D(S^1_2, (0, 0, 1, 0, 0, 1, 1)) = 1

  26. LocateMRS is FPT ◮ To show this, we provide a dynamic programming solution. [Figure: candidate substrings S^1_j, S^2_j, S^3_j in each sequence j = 1, 2, 3] ◮ We then proceed row by row: D(S^1_2, (0, 0, 1, 0, 0, 1, 1)) = 1; D(S^1_2, (1, 1, 2, 0, 0, 1, 1)) = 1
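The row-by-row filling of D can be sketched as below. This is a hedged reconstruction, not the authors' algorithm: the name `fill_table` is hypothetical, S^i_j is read as the i-th candidate substring of sequence j, each candidate is given as a 0/1 membership vector over the ordered genes of A, and only the β constraint is tracked. Following the slide's example, where S^1_2 appears both alone and combined with S^1_1, a substring may either start a solution or extend one whose last substring lies in an earlier row.

```python
def fill_table(rows, beta):
    """rows[j-1][i-1] is the 0/1 membership vector of candidate S^i_j.

    Returns (D, feasible), where D[(i, j)] is the set of count vectors C
    with D(S^i_j, C) = 1, and feasible tells whether some solution reaches
    beta for every gene.
    """
    def add(c, v):
        # componentwise sum, capped at beta
        return tuple(min(beta, x + y) for x, y in zip(c, v))

    D = {}
    seen = set()  # vectors reachable using earlier rows only
    for j, row in enumerate(rows, start=1):
        new = set()
        for i, vec in enumerate(row, start=1):
            vec = tuple(vec)
            # start a solution with S^i_j, or extend a solution from earlier rows
            D[(i, j)] = {vec} | {add(c, vec) for c in seen}
            new |= D[(i, j)]
        seen |= new  # updated after the row, so at most one substring per sequence
    target = (beta,) * len(rows[0][0])
    return D, target in seen

# First row and S^1_2 as on the slides; the remaining rows are omitted here.
rows = [
    [(1, 1, 1, 0, 0, 0, 0), (0, 0, 0, 1, 1, 1, 0), (0, 0, 0, 0, 1, 0, 0)],
    [(0, 0, 1, 0, 0, 1, 1)],
]
D, feasible = fill_table(rows, beta=2)
print((0, 0, 1, 0, 0, 1, 1) in D[(1, 2)])  # True
print((1, 1, 2, 0, 0, 1, 1) in D[(1, 2)])  # True
```

Because counts are capped at β, the number of distinct vectors stored depends only on β and |A|, not on the lengths of the sequences.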
