
ALGORITHMS AND METHODS FOR LARGE-SCALE GENOME REARRANGEMENTS



  1. DEPARTMENT OF COMPUTER ARCHITECTURE. DOCTORAL THESIS: ALGORITHMS AND METHODS FOR LARGE-SCALE GENOME REARRANGEMENTS IDENTIFICATION. Presented by Jose Antonio Arjona Medina, under the supervision of Prof. Dr. Oswaldo Trelles

  2. Algorithms and methods for large-scale genome rearrangements identification
     Jose Antonio Arjona Medina (arjona@uma.es)
     Supervised by Dr. Oswaldo Trelles Milan Mann

  3. Publications supporting the thesis
     • “Computational Synteny Block: A Framework to Identify Evolutionary Events” (IEEE Transactions on NanoBioscience, 2015)
     • “Refining borders of genome-rearrangements including repetitions” (BMC Genomics, 2016)
     • “Computational workflow for the fine-grained analysis of metagenomic samples” (BMC Genomics, 2016)
     • “A multiple comparison framework for Synteny Block detection” (IWBBIO, 2017)
     • “Ancestral sequence reconstruction: A framework to detect Synteny Blocks and revert rearrangements” (in progress)

  4. Overview
     • Introduction
     • Background
     • Methods
     • Results
     • Conclusions and future work

  5. Introduction: Synteny Blocks, Large-Scale Genome Rearrangements and Break Points; General Overview

  6. Synteny Blocks
     • The idea: conserved blocks that share the same order and strand
     [Figure: High-scoring Segment Pairs (HSPs) produced by GECKO and the resulting Synteny Blocks (SBs); Genome 0: M. agalactiae 5632, Genome 1: M. bovis PG45]
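As a rough illustration of grouping HSPs into blocks that keep the same order and strand, the sketch below chains HSPs that are consecutive on both genomes. It is not GECKO-CSB: the `HSP` record, its fields and `chain_into_blocks` are hypothetical, and adjacency is judged only by rank order along each genome, ignoring gap sizes and repeats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    x: int        # start on genome 0 (hypothetical field layout)
    y: int        # start on genome 1
    length: int
    strand: str   # "f" (forward) or "r" (reverse)


def chain_into_blocks(hsps):
    """Group HSPs into blocks of consecutive, same-strand, collinear HSPs."""
    by_x = sorted(hsps, key=lambda h: h.x)
    # Rank of each HSP along genome 1; neighbours in a block differ by exactly one rank.
    y_rank = {h: r for r, h in enumerate(sorted(hsps, key=lambda h: h.y))}
    blocks = []
    for h in by_x:
        if blocks:
            prev = blocks[-1][-1]
            # Forward blocks advance along genome 1, reverse blocks move backwards.
            step = 1 if h.strand == "f" else -1
            if h.strand == prev.strand and y_rank[h] - y_rank[prev] == step:
                blocks[-1].append(h)
                continue
        blocks.append([h])
    return blocks
```

A strand change or a jump in genome-1 order closes the current block, which is exactly where a BP would later be placed.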

  7. Large-Scale Genome Rearrangement
     • An LSGR is an operation that changes the order or the strand of an SB
     • Inversion: changes the strand
     • Transposition: changes the order; moves the block to another position within the same chromosome
     • Duplication: copies the block
     • Translocation: changes the order; moves the block to a position in another chromosome
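A minimal sketch of the four LSGR types, assuming (hypothetically) that a chromosome is a list of signed block IDs, where the sign encodes the strand, and that a genome is a list of chromosomes. The function names and index conventions are illustrative, not taken from the thesis.

```python
def invert(chrom, i, j):
    """Inversion: reverse blocks i..j-1 and flip their strand (sign)."""
    return chrom[:i] + [-b for b in reversed(chrom[i:j])] + chrom[j:]


def transpose(chrom, i, j, k):
    """Transposition: cut blocks i..j-1 and reinsert them at position k of
    the remaining chromosome (same chromosome, order changes)."""
    seg, rest = chrom[i:j], chrom[:i] + chrom[j:]
    return rest[:k] + seg + rest[k:]


def duplicate(chrom, i, k):
    """Duplication: insert a copy of block i at position k."""
    return chrom[:k] + [chrom[i]] + chrom[k:]


def translocate(genome, src, i, j, dst, k):
    """Translocation: move blocks i..j-1 of chromosome src into chromosome
    dst at position k (k is counted after the blocks have been removed)."""
    g = [list(c) for c in genome]  # copy so the input genome is untouched
    seg = g[src][i:j]
    del g[src][i:j]
    g[dst][k:k] = seg
    return g
```

For example, inverting blocks 2 and 3 of `[1, 2, 3, 4]` with `invert([1, 2, 3, 4], 1, 3)` gives `[1, -3, -2, 4]`.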

  8. Break Point
     • The point (or region) in the sequence between two SBs that have undergone an LSGR
     [Figure: the SB in the middle has undergone an LSGR (inversion); dots represent BPs in the sequence]
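Under the classical pairwise view (the one questioned later as the BP contradiction), a BP is an adjacency of one genome that is not conserved in the other. The sketch assumes each SB occurs exactly once in both genomes, encoded as signed integers; `breakpoints` is a hypothetical helper. An adjacency read on the opposite strand, such as (-3, -2) for (2, 3), counts as conserved.

```python
def breakpoints(genome_a, genome_b):
    """Return the adjacencies of genome_b that are broken relative to genome_a.

    Blocks are signed integers (sign = strand); each block occurs exactly
    once in each genome (no repeats).
    """
    # Signed position of every block in genome_a (1-based).
    pos = {abs(b): (i + 1) * (1 if b > 0 else -1) for i, b in enumerate(genome_a)}
    # Express genome_b in genome_a's coordinates, capped with 0 and n+1.
    perm = [0] + [pos[abs(b)] * (1 if b > 0 else -1) for b in genome_b]
    perm.append(len(genome_a) + 1)
    # A conserved adjacency increases by exactly one, on either strand.
    return [(perm[i], perm[i + 1]) for i in range(len(perm) - 1)
            if perm[i + 1] - perm[i] != 1]
```

For the inversion on this slide, `breakpoints([1, 2, 3, 4, 5], [1, -3, -2, 4, 5])` returns `[(1, -3), (-2, 4)]`: one BP on each side of the inverted SB.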

  9. General Overview
     • GECKO (Torreño and Trelles, 2015): HSPs; the starting point
     • GECKO-CSB (Arjona and Trelles, 2015): SB and rearrangements detection (pairwise comparison)
     • GECKO-Refinement (Arjona, Perez and Trelles, 2016): refining SB borders and BPs
     • GECKO-Evol (Arjona and Trelles, 2018?; in progress): rearrangements reconstruction (multiple comparison)
     • Related tools: Meta-GECKO (Perez, Arjona, Torreño, Ulzurrun and Trelles, 2016); GECKO-MGV (Diaz del Pino, Arjona, Torreño, Benavides and Trelles, 2016)

  10. Objectives
     • Formal definition and detection of SBs
     • Detection of LSGRs and BPs
     • Refinement of SB borders
     • Reversion of LSGRs

  11. Background: “If I have seen further, it is by standing on the shoulders of giants”

  12. State of the art
     • SB and BP detection
       – No formal definition (methods are difficult to compare)
       – The granularity problem
       – The BP contradiction
       – Dealing with repetitions
     • Methods to reverse LSGRs
       – Oriented to the “sorting permutation problem”
       – Reference-dependent
       – Not designed to deal with repetitions

  13. The granularity problem
     • Fine-grained: SBs: many (shorter and well conserved); BPs: many (shorter and of better quality); LSGRs: a small subset of the total (short cycles)
     • …
     • Coarse: SBs: few (larger, with a low percentage of identity); BPs: few (larger and noisy: many short SBs are included); LSGRs: a small subset of the total (the big picture)

  14. An example
     [Figure panels: Fine-grained; Coarse]

  15. The break point contradiction
     • Rearrangements do not occur randomly
     • Fragile regions in the sequence are predisposed to LSGRs (hotspots)
       – A BP should therefore not be defined as a relation between two genomes
       – Although comparison is (so far) the only way to detect them
       – Most methods to refine SBs take for granted that BPs are not conserved regions

  16. Dealing with repetitions
     • Repetitions drive evolution in many ways
     • They are mostly associated with mobile elements
     • Repetitions increase the model complexity
       – Most methods to detect SBs avoid repetitions

  17. The sorting permutation problem
     • Transform one sequence into another (the reference) through operations
     • Proven to be NP-hard for several variants (e.g., sorting unsigned permutations by reversals)
       – A reference is needed
       – No “natural” way to include repetitions in the model
       – No use of inside-block information
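To make the "transform one sequence into the reference" formulation concrete, the sketch below finds, by brute-force breadth-first search, the minimum number of signed reversals that turn a block order into the reference, assumed to be relabelled as 1..n. It explores every reachable order, so it only works for a handful of blocks, and it exhibits the limitations listed above: the reference is fixed, each block must occur once, and nothing inside the blocks is used. `reversal_distance` is a hypothetical name.

```python
from collections import deque


def reversal_distance(perm):
    """Minimum number of signed reversals turning perm into (1, ..., n).

    Exhaustive BFS over block orders: exponential, toy inputs only.
    """
    start = tuple(perm)
    target = tuple(range(1, len(perm) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return dist[state]
        # Try every reversal of a contiguous segment i..j-1 (flips strands too).
        for i in range(len(state)):
            for j in range(i + 1, len(state) + 1):
                nxt = state[:i] + tuple(-b for b in reversed(state[i:j])) + state[j:]
                if nxt not in dist:
                    dist[nxt] = dist[state] + 1
                    queue.append(nxt)
    return None  # only reached if perm is not a signed permutation of 1..n
```

`reversal_distance([2, 1])` returns 3: with signed reversals, swapping two blocks costs three operations.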

  18. Methods: pairwise comparison method, block refinement and multiple comparison framework (definitions and methods)

  19. Methods Overview
     • 1) Pairwise SB and LSGR detection (GECKO-CSB)
     • 2) SB refinement
     • 3) Multi-genome SB and LSGR detection and reconstruction

  20. 1 - Computational Synteny Blocks: a pairwise framework to detect LSGRs
     • A set of properties to detect SBs
     • Arrows represent strand

  21. 1 - Computational Synteny Blocks: a pairwise framework to detect LSGRs
     • These properties also describe rearrangements

  22. 2 - Synteny Block refinement
     • Uses repetitions (if any) to refine SB borders
     • Does not force the BP to be either a point or a region

  23. Refining based on transitions including repeats
     [Figure: illustrative representation of the Region of Interest (ROI). (a) ROI in an inversion event (CSB B). (b) Virtual CSBs and repetitions. (c) The same representation, including identity vectors and vector-difference graphs.]

  24. Finite State Machine to detect identity transitions
     [Figure: % identity across SB, repetitions, SB]
     • The FSM detects the coordinate where the vector-difference value was last at a given threshold (U1) before reaching a second threshold (U2)
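The transition rule on this slide can be sketched as a two-state machine scanning the vector-difference values. This assumes that `diff` holds per-position differences between two identity vectors (for example, SB identity versus repetition identity) and that `u1 < u2`; the function name and the rule for re-arming after a transition are assumptions, not the thesis code.

```python
def detect_transitions(diff, u1, u2):
    """Return the indices where diff was last <= u1 before reaching >= u2.

    BELOW: remember the latest position at or under u1.
    ABOVE: a transition was reported; wait until diff falls back to u1.
    """
    BELOW, ABOVE = 0, 1
    state, last_low, hits = BELOW, None, []
    for i, v in enumerate(diff):
        if state == BELOW:
            if v <= u1:
                last_low = i
            elif v >= u2 and last_low is not None:
                hits.append(last_low)   # coordinate of the transition
                state = ABOVE
        elif v <= u1:                   # back to low values: re-arm the detector
            last_low = i
            state = BELOW
    return hits
```

With `detect_transitions([0, 1, 0, 2, 5, 9, 12, 8, 3, 1, 0, 4, 11], 2, 10)` the result is `[3, 10]`: the last positions at or below U1 before each climb past U2, not the positions where U2 is crossed.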

  25. Result of the refinement
     [Figure, panels 1, 2 and 3: CSBs before and after the refinement]
     • At the end of the refinement process, we detect BPs.
     • We also extract PRASB and GAP sequences to analyse the accuracy of the method.
     • PRASB and BP have the same length.

  26. 3 - Multiple comparison framework
     • Motivation
       – Formal SB definition
       – Solve the BP contradiction
       – Solve the granularity problem
       – Not reference-based
       – Combine sequence information and rearrangements

  27. The Synteny Block concept
     • An SB has two aspects
       – Block: the sequence
       – Synteny: the relation with other blocks

  28. Block Element
     • A subsequence of a sequence

  29. Unitary Block Element
     • A Block Element that does not overlap with any other
     [Figure: Unitary Block Elements]

  30. Unitary Conserved Element
     • A Block Element that originates from a comparison

  31. The Unitary Conserved Element problem
     A) Two overlapping HSPs. B) Result of the trimming process: two fragments still overlap. C) The new overlapping Conserved Elements trigger a new trimming process. D) Final result of the recursive trimming process: the final pairs of Conserved Elements do not overlap.
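The recursive trimming can be sketched as cut propagation. Every HSP boundary becomes a cut on its sequence; a cut falling inside an HSP is projected onto the partner sequence (mirrored for reverse-strand HSPs); this repeats until no new cut appears. Afterwards, any two fragments on the same sequence are either identical or disjoint. The `(x, y, length, strand)` tuple layout, half-open coordinates and the name `trim` are assumptions for illustration, restricted to one pair of sequences A and B.

```python
def trim(hsps):
    """Split HSPs until no fragment partially overlaps another on A or B.

    hsps: list of (x, y, length, strand); x on sequence A, y on sequence B,
    half-open intervals, strand "f" (forward) or "r" (reverse).
    """
    cuts_a = {c for x, _, n, _ in hsps for c in (x, x + n)}
    cuts_b = {c for _, y, n, _ in hsps for c in (y, y + n)}
    changed = True
    while changed:  # propagate cuts between the two sequences until stable
        changed = False
        for x, y, n, s in hsps:
            for c in [c for c in cuts_a if x < c < x + n]:
                p = y + (c - x) if s == "f" else y + n - (c - x)
                if p not in cuts_b:
                    cuts_b.add(p)
                    changed = True
            for c in [c for c in cuts_b if y < c < y + n]:
                p = x + (c - y) if s == "f" else x + n - (c - y)
                if p not in cuts_a:
                    cuts_a.add(p)
                    changed = True
    fragments = []
    for x, y, n, s in hsps:
        bounds = [x] + sorted(c for c in cuts_a if x < c < x + n) + [x + n]
        for lo, hi in zip(bounds, bounds[1:]):
            # On a reverse HSP, the A interval [lo, hi) maps to B mirrored.
            yy = y + (lo - x) if s == "f" else y + n - (hi - x)
            fragments.append((lo, yy, hi - lo, s))
    return fragments
```

For the HSPs `(0, 0, 100, "f")` and `(50, 200, 100, "f")`, which overlap on A, `trim` returns four 50 bp fragments; the two fragments covering `[50, 100)` on A now coincide exactly instead of partially overlapping.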

  32. The Unitary Conserved Element problem (II)
     Representation of the trimming process in a multiple comparison. In comparison AB there is an inversion that triggers a trimming process in comparison BC. As a result, another trimming process is triggered in comparison DC.

  33. Unitary Synteny Element
     • A set of Unitary Conserved Elements from different sequences
       – More than one block
       – Same length
       – Every Unitary Conserved Element belongs to one and only one Unitary Synteny Element
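Because every Unitary Conserved Element belongs to exactly one Unitary Synteny Element, the elements form a partition, which a union-find structure builds directly from the pairwise matches. In this sketch, each UCE is assumed to be a hypothetical `(sequence, start, length)` tuple and `links` the matched pairs reported by the comparisons; the equal-length property is checked explicitly.

```python
def synteny_elements(links):
    """Group Unitary Conserved Elements into Unitary Synteny Elements.

    links: pairs (u, v) of UCEs matched by a pairwise comparison; each UCE
    is a (sequence, start, length) tuple. Returns a list of disjoint sets.
    """
    parent = {}

    def find(u):
        parent.setdefault(u, u)
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps trees shallow
            u = parent[u]
        return u

    for u, v in links:
        parent[find(u)] = find(v)  # merge the two components

    groups = {}
    for u in list(parent):
        groups.setdefault(find(u), set()).add(u)

    for members in groups.values():
        # Property of a Unitary Synteny Element: all members have the same length.
        if len({length for _, _, length in members}) != 1:
            raise ValueError(f"members differ in length: {sorted(members)}")
    return list(groups.values())
```

Each returned set holds at least two UCEs, since it is built from at least one link.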

  34. Unitary Synteny Element
     [Figure: graphic representation]

  35. Break Point
     • Defined as the region (or point) between two Unitary Conserved Elements

  36. The transitivity property of Synteny Blocks: inferred HSPs
     • This method does not increase the number of Unitary Conserved Elements
     • It only reveals synteny relations that were not detected by the chosen comparison method
       – Hence, SBs must be defined in an N-dimensional space
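The transitivity property can be sketched as a graph search: UCEs are nodes, detected matches are edges, and every pair inside a connected component that is not joined by a direct edge is an inferred HSP. No new UCE is created, only new relations. The string IDs and `inferred_links` are hypothetical, and IDs are assumed to be sortable.

```python
from itertools import combinations


def inferred_links(links):
    """Return UCE pairs related by transitivity but not reported directly."""
    graph = {}
    for u, v in links:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    seen, inferred = set(), set()
    for start in graph:
        if start in seen:
            continue
        component, stack = [], [start]  # depth-first search over one component
        seen.add(start)
        while stack:
            u = stack.pop()
            component.append(u)
            for w in graph[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        for u, v in combinations(sorted(component), 2):
            if v not in graph[u]:      # related, but missed by the comparison
                inferred.add((u, v))
    return inferred
```

With matches A1-B1 and B1-C1, `inferred_links` returns `{("A1", "C1")}`.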

  37. Synteny Block concatenation
     • If the succession is the same
     • and all these Unitary Conserved Elements each form a Unitary Synteny Element
     • and the sign relation between them is the same along adjacent Elementary Conserved Blocks

  38. SB concatenation: Example (I)

  39. Synteny Block concatenation
     • Then Unitary Synteny Elements π−1, π and π+1 can be merged into a single one by concatenating their Unitary Conserved Elements (see Example II)
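A minimal sketch of the concatenation test, assuming each genome is given as a list of signed Unitary Synteny Element IDs (sign = strand), each ID appearing once per genome. Two neighbours are merged when every genome has them in the same succession and sign relation, read on either strand, so (a, b) and (−b, −a) denote the same adjacency. `concatenate` is a hypothetical name and returns the first genome's order split into merged runs.

```python
def concatenate(genomes):
    """Merge runs of USEs that are adjacent, in the same succession and with
    the same relative sign, in every genome."""

    def adjacencies(order):
        adj = set()
        for a, b in zip(order, order[1:]):
            adj.add((a, b))
            adj.add((-b, -a))  # the same adjacency read on the opposite strand
        return adj

    shared = set.intersection(*(adjacencies(g) for g in genomes))
    first = genomes[0]
    runs = [[first[0]]]
    for a, b in zip(first, first[1:]):
        if (a, b) in shared:
            runs[-1].append(b)  # extend the current merged element
        else:
            runs.append([b])    # a broken adjacency starts a new element
    return runs
```

For `[1, 2, 3, 4]`, `[1, 2, -4, -3]` and `[-2, -1, 3, 4]`, the result is `[[1, 2], [3, 4]]`: 3 and 4 keep their sign relation in every genome even though the pair is inverted in the second one.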

  40. SB concatenation: Example (II)

  41. Inversions
     • If …
     • And …
     • Then either α or β, γ, …, ω are inversions
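The "either α or β, γ, …, ω" conclusion reflects that, without a reference, the data cannot tell which side was inverted. The sketch below makes that ambiguity explicit for one comparison, assuming the relative strand of each Unitary Synteny Element (genome B versus genome A) is already known. It returns both explanations, smaller first; preferring the smaller one is a parsimony assumption, not a rule stated on the slide. `inversion_explanations` is a hypothetical name.

```python
def inversion_explanations(relative_signs):
    """relative_signs: {element: +1 or -1}, the strand of each USE in genome B
    relative to genome A.

    Flipping the '-' set or the '+' set explains the data equally well;
    both are returned, the smaller (more parsimonious) set first.
    """
    minus = sorted(e for e, s in relative_signs.items() if s < 0)
    plus = sorted(e for e, s in relative_signs.items() if s > 0)
    if not minus or not plus:
        return []  # every element agrees: no inversion between A and B
    return sorted([minus, plus], key=len)
```

`inversion_explanations({"alpha": -1, "beta": 1, "gamma": 1, "omega": 1})` returns `[["alpha"], ["beta", "gamma", "omega"]]`.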

  42. Detection of an Inversion: Example

  43. Transpositions
     • If …
     • And …
     • Then either α or β, γ, …, ω are transpositions
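For transpositions, a common heuristic (swapped in here, not necessarily the thesis rule) keeps a longest increasing subsequence of the block order as "in place" and reports the remaining blocks as moved; the complement is the alternative explanation. The sketch assumes same-strand USEs identified by their position in genome A, listed in the order they appear along genome B. `transposition_candidates` is a hypothetical name.

```python
from bisect import bisect_left


def transposition_candidates(order):
    """Return the elements outside one longest increasing subsequence of order.

    order: positions (in genome A) of same-strand USEs, as read along genome B.
    """
    tails, tail_idx, prev = [], [], [-1] * len(order)
    for i, v in enumerate(order):
        k = bisect_left(tails, v)  # patience sorting step
        if k == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[k] = v
            tail_idx[k] = i
        prev[i] = tail_idx[k - 1] if k > 0 else -1
    # Walk back from the end of the longest subsequence to mark kept elements.
    keep, i = set(), tail_idx[-1] if tail_idx else -1
    while i != -1:
        keep.add(i)
        i = prev[i]
    return [order[i] for i in range(len(order)) if i not in keep]
```

`transposition_candidates([1, 4, 2, 3, 5])` returns `[4]`: moving block 4 alone restores the order.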

  44. Detection of a Transposition: Example

  45. Insertions and deletions
     • When concatenating, undetected inserted blocks can be inferred if the new Synteny Element does not have the same length in every sequence
       – A multiple alignment is needed
     • An insertion can be detected as shown in the following example
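A minimal sketch of the length check, assuming that for each genome we have the `(start, end)` coordinates of the Unitary Conserved Elements merged into one Synteny Element. A genome whose merged span is longer than the shortest one by more than `tolerance` bp is flagged as a candidate insertion (or a deletion in the others). It only flags candidates; locating the inserted block still needs the multiple alignment mentioned above. `indel_candidates` and `tolerance` are hypothetical.

```python
def indel_candidates(components, tolerance=0):
    """components: {genome: [(start, end), ...]} of the UCEs merged into one
    Synteny Element. Returns {genome: excess length in bp} for genomes whose
    span exceeds the shortest span by more than tolerance."""
    spans = {g: max(e for _, e in iv) - min(s for s, _ in iv)
             for g, iv in components.items()}
    shortest = min(spans.values())
    return {g: n - shortest for g, n in spans.items() if n - shortest > tolerance}
```

For `{"A": [(0, 100), (100, 200)], "B": [(50, 150), (400, 500)]}`, the result is `{"B": 250}`: genome B carries 250 bp between the two concatenated elements.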

  46. Detection of an Insertion/deletion: Example
