ALGORITHMS AND METHODS FOR LARGE-SCALE GENOME REARRANGEMENTS IDENTIFICATION
Presented by
Jose Antonio Arjona Medina
Under the supervision of
Prof. Dr. Oswaldo Trelles
DOCTORAL THESIS DEPARTMENT OF COMPUTER ARCHITECTURE
arjona@uma.es 3
Genome 0: M. agalactiae 5632 Genome 1: M. bovis PG45
High-scoring Segment Pairs (HSPs) produced by GECKO
Synteny Blocks (SBs)
Change the strand (inversion)
Change the order: move the block to another position within the same chromosome (transposition)
Copy the block (duplication)
Change the order: move the block to another position in another chromosome (translocation)
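The four operations above can be sketched on a toy model. This is only an illustration, assuming a chromosome is a list of signed integers (one integer per block, with the sign encoding the strand); the function names are hypothetical and not part of GECKO.

```python
from typing import List, Tuple

# Assumption: a chromosome is a list of signed block identifiers;
# a negative sign means the block lies on the reverse strand.
Chromosome = List[int]


def change_strand(chrom: Chromosome, i: int) -> Chromosome:
    """Flip the strand of the block at index i."""
    out = list(chrom)
    out[i] = -out[i]
    return out


def move_within(chrom: Chromosome, i: int, k: int) -> Chromosome:
    """Move the block at index i to index k of the same chromosome."""
    out = list(chrom)
    block = out.pop(i)
    out.insert(k, block)
    return out


def copy_block(chrom: Chromosome, i: int, k: int) -> Chromosome:
    """Insert a copy of the block at index i at index k."""
    out = list(chrom)
    out.insert(k, chrom[i])
    return out


def move_between(src: Chromosome, dst: Chromosome, i: int, k: int) -> Tuple[Chromosome, Chromosome]:
    """Move the block at index i of src to index k of dst."""
    new_src = list(src)
    block = new_src.pop(i)
    new_dst = list(dst)
    new_dst.insert(k, block)
    return new_src, new_dst
```

For example, `change_strand([1, 2, 3], 1)` gives `[1, -2, 3]`, and `move_between([1, 2, 3], [4, 5], 1, 0)` gives `([1, 3], [2, 4, 5])`.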
The SB in the middle has undergone an LSGR (inversion). Dots represent BPs in the sequence.
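A minimal sketch of how BPs can be located in this situation, assuming the SBs of one genome are written as a signed permutation relative to the other genome. It uses the standard breakpoint definition for signed permutations (two neighbouring blocks are adjacent when the second value equals the first plus one); this is a textbook model and not necessarily the procedure used by GECKO-CSB, and the function name is hypothetical.

```python
from typing import List, Sequence


def breakpoints(perm: Sequence[int]) -> List[int]:
    """Return the positions (in the extended permutation) that are breakpoints.

    The permutation is framed with 0 and n + 1; position i is a breakpoint
    when the blocks at i and i + 1 are not consecutive on the same strand.
    """
    ext = [0] + list(perm) + [len(perm) + 1]
    return [i for i in range(len(ext) - 1) if ext[i + 1] - ext[i] != 1]


# Three SBs, the middle one inverted: one BP on each side of the inversion.
print(breakpoints([1, -2, 3]))  # [1, 2]
```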
HSPs: GECKO (Torreño and Trelles, 2015)
SB and rearrangements pairwise detection (starting point): GECKO-CSB (Arjona and Trelles, 2015)
Refining SB borders and BPs: GECKO-Refinement (Arjona and Trelles, 2016)
Rearrangements reconstruction (multi comparison, in progress): GECKO-Evol (Arjona, Perez and Trelles, 2018?)
GECKO-MGV (Diaz del Pino, Arjona, Torreño, Benavides and Trelles, 2016)
Meta-GECKO (Perez, Arjona, Torreño, Ulzurrun and Trelles, 2016)
Granularity   Fine-grained                         Coarse
BP            Many (shorter and better quality)    Few (larger and noisy: many short SBs are included)
SB            Many (shorter and well conserved)    Few (larger and low percentage of identity)
LSGR          Small subset (short cycles)          Small subset (big picture)
Fine-grained Coarse
Illustrative representation of the Region of Interest (ROI). (a) ROI in an inversion event (CSB B). (b) Virtual CSBs and repetitions. (c) Same representation, including identity vectors and vector difference graphs.
The FSM detects the coordinate at which the vector difference value was last at the first threshold (U1) before reaching the second threshold (U2).
Figure labels: SB, Repetitions, % Identity
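A minimal sketch of this two-threshold scan, assuming the vector difference is available as a sequence of numbers and reading "at U1" as "at or below U1". The function name, the state variable and the sample values are hypothetical; the actual FSM in GECKO-Refinement may use different states and conditions.

```python
from typing import List, Optional, Sequence


def last_u1_before_u2(diff: Sequence[float], u1: float, u2: float) -> List[int]:
    """For each time the signal reaches u2, report the last index at or below u1.

    Two states are tracked: 'armed' (waiting for the signal to reach u2) and
    'fired' (u2 already reported; wait until the signal drops back to u1).
    """
    hits: List[int] = []
    last_low: Optional[int] = None
    armed = True
    for i, value in enumerate(diff):
        if value <= u1:
            last_low = i   # remember the most recent coordinate at U1
            armed = True   # re-arm after returning to the low level
        elif value >= u2 and armed and last_low is not None:
            hits.append(last_low)
            armed = False  # ignore further samples above U2 in this run
    return hits


signal = [0, 0, 1, 3, 6, 9, 9, 2, 0, 0, 5, 9]
print(last_u1_before_u2(signal, u1=1, u2=8))  # [2, 9]
```

Reporting the last low coordinate rather than the point where U2 is crossed places the candidate border at the start of the change rather than at its peak.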
CSBs before and after the refinement. At the end of the refinement process, we detect BPs. We also extract PRASB and GAP sequences to analyse the accuracy of the method. PRASB and BP have the same length.
(A) Two overlapping HSPs. (B) Result of the trimming process; two fragments still overlap. (C) The newly overlapping Conserved Elements trigger a new trimming process. (D) Final result of the recursive trimming process: the final pairs of Conserved Elements do not overlap.
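The recursive trimming in panels (A) to (D) can be sketched as a fixed-point iteration. This is only an illustration under strong assumptions: each HSP is ungapped and on the forward strand, so it is described by `(x_start, y_start, length)`, and a cut at some offset in one sequence maps to the same offset in the other. The names `HSP`, `split_at` and `trim` are hypothetical. In this sketch, the final fragments either share exactly the same interval on a sequence or are disjoint on it; partial overlaps are removed.

```python
from typing import List, Set, Tuple

# Assumption: an HSP is (x_start, y_start, length), ungapped, forward strand.
HSP = Tuple[int, int, int]


def split_at(frag: HSP, offsets: Set[int]) -> List[HSP]:
    """Cut one fragment at the given offsets (only those strictly inside it)."""
    x, y, n = frag
    bounds = [0] + sorted(o for o in offsets if 0 < o < n) + [n]
    return [(x + a, y + a, b - a) for a, b in zip(bounds, bounds[1:])]


def trim(hsps: List[HSP]) -> List[HSP]:
    """Split fragments at each other's borders until no new cut appears."""
    frags = set(hsps)
    while True:
        x_edges = {e for x, _, n in frags for e in (x, x + n)}
        y_edges = {e for _, y, n in frags for e in (y, y + n)}
        new_frags = set()
        for x, y, n in frags:
            # A border of any fragment, on either sequence, becomes a cut here.
            offsets = {e - x for e in x_edges} | {e - y for e in y_edges}
            new_frags.update(split_at((x, y, n), offsets))
        if new_frags == frags:  # fixed point: nothing left to trim
            return sorted(frags)
        frags = new_frags       # new borders may trigger further trimming


# Two HSPs that overlap on the x sequence over [5, 10).
print(trim([(0, 0, 10), (5, 20, 10)]))
# [(0, 0, 5), (5, 5, 5), (5, 20, 5), (10, 25, 5)]
```

The loop terminates because every cut lands on an integer coordinate inside an existing fragment, so the number of fragments is bounded by the total length of the input HSPs.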
Representation of the trimming process in a multiple comparison. In the comparison AB there is an inversion that triggers a trimming process in the comparison BC. As a result, another trimming process is triggered in the comparison DC.
arjona@bioinf.jku.at 48
(a) GECKO-CSB detects three SBs (A, B and C). (b) progressiveMauve detects
(a) GECKO-CSB detects one SB. (b) progressiveMauve detects three SBs (B, C and D).