Lecture 9: Mapping Reads to a Reference – Burrows-Wheeler Transform and FM Index


  1. Lecture 9: Mapping Reads to a Reference – Burrows-Wheeler Transform and FM Index
     Spring 2020. March 3 and 5, 2020.

  2. Outline
     - Problem Definition
     - Different Solutions
     - Burrows-Wheeler Transformation (BWT)
     - Ferragina-Manzini (FM) Index
     - Search Using FM Index
     - Alignment Using FM Index

  3. Mapping Reads
     Problem: We are given a read, R, and a reference sequence, S. Find the best occurrence, or all occurrences, of R in S.
     Example:
       R = AAACGAGTTA
       S = TTAATGC AAACGAGTTA CCCAATATATAT AAACCAGTTA TT
     (The spaces in S only set off the two candidate occurrences; S is one contiguous sequence.)
     - Allowing no errors: one occurrence.
     - Allowing up to 1 substitution error: two occurrences.
     - Allowing up to 10 substitution errors: many meaningless occurrences!
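
The example on this slide can be checked with a brute-force scan. The following is a minimal sketch, not a method from the lecture: the function name `find_occurrences` and the 0-based start positions are assumptions made for illustration, and S is written without the highlighting spaces.

```python
def find_occurrences(read, ref, max_subs=0):
    """Return 0-based start positions where read matches ref
    with at most max_subs substitutions (no indels)."""
    hits = []
    for start in range(len(ref) - len(read) + 1):
        mismatches = 0
        for a, b in zip(read, ref[start:start + len(read)]):
            if a != b:
                mismatches += 1
                if mismatches > max_subs:
                    break  # too many substitutions, give up on this window
        if mismatches <= max_subs:
            hits.append(start)
    return hits

R = "AAACGAGTTA"
S = "TTAATGCAAACGAGTTACCCAATATATATAAACCAGTTATT"
print(find_occurrences(R, S, 0))        # [7]
print(find_occurrences(R, S, 1))        # [7, 29]
print(len(find_occurrences(R, S, 10)))  # 32, i.e. every window "matches"
```

With 10 allowed substitutions every window of length 10 is reported, which is why such hits are meaningless. The scan costs time proportional to |R| times |S|, which motivates the indexed approaches on the following slides.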

  4. Mapping Reads (continued)
     Variations:
     - Sequencing error
       ◦ No error: R is an exact substring of S.
       ◦ Substitution errors only: R matches a substring of S up to a few substitutions.
       ◦ Indel and substitution errors: R matches a substring of S up to a few short indels and substitutions.
     - Junctions (for instance, in alternative splicing)
       ◦ Fixed order/orientation: R = R_1 R_2 … R_n, where the R_i map to different non-overlapping loci in S, on the same strand and preserving their order.
       ◦ Arbitrary order/orientation: R = R_1 R_2 … R_n, where the R_i map to different non-overlapping loci in S.

  5. Different Solutions
     - Alignment, such as the Smith-Waterman algorithm
       ◦ Pro: adequate for all variations.
       ◦ Con: computationally expensive; not suitable for next-generation sequencing.
     - Seed-and-Extend
       ◦ Pro: can handle errors and junctions more efficiently.
       ◦ Con: slow when there are no (or few) errors.
     - Ferragina-Manzini (FM) Index Search
       ◦ Pro: computationally efficient when there are no errors.
       ◦ Con: exponential in the maximum number of errors.
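
To make the seed-and-extend idea concrete, here is a rough sketch under stated assumptions; it is not the lecture's implementation. The names `build_kmer_index` and `seed_and_extend`, the seed length `k=4`, and the substitution-only "extension" (a plain mismatch count) are all illustrative choices. Non-overlapping seeds are used so that a read with at most `max_subs` substitutions still contains at least one exact seed, provided the read holds more than `max_subs` seeds.

```python
from collections import defaultdict

def build_kmer_index(ref, k):
    """Map every k-mer of ref to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def seed_and_extend(read, ref, k=4, max_subs=1):
    """Find candidate loci from exact k-mer seeds, then verify each
    candidate by counting substitutions over the whole read."""
    index = build_kmer_index(ref, k)
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):  # non-overlapping seeds
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset                   # implied start of the read
            if 0 <= start <= len(ref) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        window = ref[start:start + len(read)]
        if sum(a != b for a, b in zip(read, window)) <= max_subs:
            hits.append(start)
    return hits

R = "AAACGAGTTA"
S = "TTAATGCAAACGAGTTACCCAATATATATAAACCAGTTATT"
print(seed_and_extend(R, S, k=4, max_subs=1))  # [7, 29]
```

Only the loci suggested by a seed hit are verified, so most of S is never compared against the read. A real aligner would extend with a gapped alignment rather than a mismatch count.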

  6. Burrows-Wheeler Transformation
     Example: mississippi
     Step 1. Append to the input string a special character, $, that is smaller than every character of the alphabet.
       mississippi$

  7. Burrows-Wheeler Transformation (cont'd)
     Example: mississippi
     Step 2. Generate all rotations.
       m i s s i s s i p p i $
       i s s i s s i p p i $ m
       s s i s s i p p i $ m i
       s i s s i p p i $ m i s
       i s s i p p i $ m i s s
       s s i p p i $ m i s s i
       s i p p i $ m i s s i s
       i p p i $ m i s s i s s
       p p i $ m i s s i s s i
       p i $ m i s s i s s i p
       i $ m i s s i s s i p p
       $ m i s s i s s i p p i

  8. Burrows-Wheeler Transformation (cont'd)
     Example: mississippi
     Step 3. Sort the rotations in alphabetical order.
       $ m i s s i s s i p p i
       i $ m i s s i s s i p p
       i p p i $ m i s s i s s
       i s s i p p i $ m i s s
       i s s i s s i p p i $ m
       m i s s i s s i p p i $
       p i $ m i s s i s s i p
       p p i $ m i s s i s s i
       s i p p i $ m i s s i s
       s i s s i p p i $ m i s
       s s i p p i $ m i s s i
       s s i s s i p p i $ m i

  9. Burrows-Wheeler Transformation (cont'd)
     Example: mississippi
     Step 4. Output the last column (shown to the right of the bar).
       $ m i s s i s s i p p | i
       i $ m i s s i s s i p | p
       i p p i $ m i s s i s | s
       i s s i p p i $ m i s | s
       i s s i s s i p p i $ | m
       m i s s i s s i p p i | $
       p i $ m i s s i s s i | p
       p p i $ m i s s i s s | i
       s i p p i $ m i s s i | s
       s i s s i p p i $ m i | s
       s s i p p i $ m i s s | i
       s s i s s i p p i $ m | i

  10. Burrows-Wheeler Transformation (cont'd)
      Example: mississippi
      BWT output: ipssm$pissii
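
The four steps (append $, rotate, sort, take the last column) can be written almost literally in Python. This is a naive sketch for small inputs only: the function name `bwt` is an assumption, and sorting full rotations uses quadratic memory, whereas practical tools typically build the transform from a suffix array. It relies on `$` sorting before letters in ASCII.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via explicitly sorted rotations."""
    if sentinel in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + sentinel                                   # step 1: append $
    rotations = [s[i:] + s[:i] for i in range(len(s))]    # step 2: all rotations
    rotations.sort()                                      # step 3: sort them
    return "".join(rot[-1] for rot in rotations)          # step 4: last column

print(bwt("mississippi"))  # ipssm$pissii
```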

  11. Ferragina-Manzini Index
      Example: mississippi
      First column: F. Last column: L.
        F                         L
        $ | m i s s i s s i p p | i
        i | $ m i s s i s s i p | p
        i | p p i $ m i s s i s | s
        i | s s i p p i $ m i s | s
        i | s s i s s i p p i $ | m
        m | i s s i s s i p p i | $
        p | i $ m i s s i s s i | p
        p | p i $ m i s s i s s | i
        s | i p p i $ m i s s i | s
        s | i s s i p p i $ m i | s
        s | s i p p i $ m i s s | i
        s | s i s s i p p i $ m | i
      Let's make an L to F map.
      Observation: the n-th i in L is the n-th i in F.

  12. Ferragina-Manzini Index (cont'd)
      L to F map
      Store or compute a two-dimensional Occ(j, 'c') table of the number of occurrences of char 'c' up to position j (inclusive), and a one-dimensional Cnt('c') table.

      Occ(j, 'c'):
         j   L    $  i  m  p  s
         1   i    0  1  0  0  0
         2   p    0  1  0  1  0
         3   s    0  1  0  1  1
         4   s    0  1  0  1  2
         5   m    0  1  1  1  2
         6   $    1  1  1  1  2
         7   p    1  1  1  2  2
         8   i    1  2  1  2  2
         9   s    1  2  1  2  3
        10   s    1  2  1  2  4
        11   i    1  3  1  2  4
        12   i    1  4  1  2  4

      Cnt('c'):
                  $  i  m  p  s
                  1  4  1  2  4
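
Both tables can be filled in one left-to-right pass over L. The sketch below is illustrative only: `build_occ_and_cnt` is a hypothetical name, and the extra all-zero row `occ[0]` is a convenience added here so that `occ[j]` lines up with the slide's 1-based position j.

```python
def build_occ_and_cnt(L):
    """Return (alphabet, occ, cnt) for a BWT string L.

    occ[j][c] = number of occurrences of c in positions 1..j of L (inclusive);
    occ[0] is an all-zero row. cnt[c] = total number of occurrences of c.
    """
    alphabet = sorted(set(L))
    running = {c: 0 for c in alphabet}
    occ = [dict(running)]            # row 0: nothing read yet
    for ch in L:
        running[ch] += 1
        occ.append(dict(running))    # snapshot after reading this position
    cnt = dict(running)              # the final totals are the Cnt table
    return alphabet, occ, cnt

alphabet, occ, cnt = build_occ_and_cnt("ipssm$pissii")
print(alphabet)      # ['$', 'i', 'm', 'p', 's']
print(occ[9]["s"])   # 3
print(cnt)           # {'$': 1, 'i': 4, 'm': 1, 'p': 2, 's': 4}
```

Storing a full snapshot per position is the simplest layout; real indexes usually keep only sampled rows of Occ to save memory.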

  13. Ferragina-Manzini Index
      L to F map, worked for the 's' in L at row 9:
        [Cnt('$') + Cnt('i') + Cnt('m') + Cnt('p') = 8]    (rows before the 's' section)
        + [Occ(9, 's') = 3]
        = 11

         1  $ m i s s i s s i p p i
         2  i $ m i s s i s s i p p
         3  i p p i $ m i s s i s s
         4  i s s i p p i $ m i s s
         5  i s s i s s i p p i $ m
         6  m i s s i s s i p p i $
         7  p i $ m i s s i s s i p
         8  p p i $ m i s s i s s i
         9  s i p p i $ m i s s i s    's' section starts
        10  s i s s i p p i $ m i s
        11  s s i p p i $ m i s s i
        12  s s i s s i p p i $ m i    's' section ends
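
The same arithmetic, written as a small function. This is a sketch with an assumed name, `lf_map`, and it recomputes the counts directly from L on every call instead of reading precomputed Cnt and Occ tables, which keeps it short but slow.

```python
def lf_map(L, j):
    """Map the character at 1-based row j of L to its row in F."""
    c = L[j - 1]
    smaller = sum(1 for x in L if x < c)  # sum of Cnt over characters smaller than c
    rank = L[:j].count(c)                 # Occ(j, c), inclusive of row j
    return smaller + rank

L = "ipssm$pissii"
print(lf_map(L, 9))                          # 11
print([lf_map(L, j) for j in range(1, 13)])  # [2, 7, 9, 10, 6, 1, 8, 3, 11, 12, 4, 5]
```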

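  14. Ferragina-Manzini Index
      Reverse traversal
      Start at row 1, read its L character, and follow the L to F map to the next row.
      Rows visited, in order, with the L character read at each:
        (1) i  (2) p  (7) p  (8) i  (3) s  (9) s  (11) i  (4) s  (10) s  (12) i  (5) m  (6) $

        Row  Rotation                  Visited at step
         1   $ m i s s i s s i p p i    1
         2   i $ m i s s i s s i p p    2
         3   i p p i $ m i s s i s s    5
         4   i s s i p p i $ m i s s    8
         5   i s s i s s i p p i $ m   11
         6   m i s s i s s i p p i $   12
         7   p i $ m i s s i s s i p    3
         8   p p i $ m i s s i s s i    4
         9   s i p p i $ m i s s i s    6
        10   s i s s i p p i $ m i s    9
        11   s s i p p i $ m i s s i    7
        12   s s i s s i p p i $ m i   10

      The characters read, i p p i s s i s s i m, spell mississippi backwards; reaching $ ends the traversal.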
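
The reverse traversal amounts to inverting the BWT. Below is a minimal sketch, assuming a single sentinel that sorts before every other character (so row 1 is the rotation starting with $); the name `inverse_bwt` and the returned list of visited rows are illustrative additions.

```python
def inverse_bwt(L, sentinel="$"):
    """Recover the original text from its BWT by following the L to F map.

    Returns (text, visited), where visited lists the 1-based rows in the
    order they are traversed.
    """
    if L.count(sentinel) != 1:
        raise ValueError("expected exactly one sentinel in L")
    # smaller[c] = number of characters in L that are smaller than c
    smaller, total = {}, 0
    for c in sorted(set(L)):
        smaller[c] = total
        total += L.count(c)
    # lf[j-1] = row in F of the character at row j of L
    seen, lf = {}, []
    for ch in L:
        seen[ch] = seen.get(ch, 0) + 1
        lf.append(smaller[ch] + seen[ch])
    row, visited, chars = 1, [], []
    for _ in range(len(L)):
        visited.append(row)
        ch = L[row - 1]
        if ch == sentinel:        # wrapped around to the start of the text
            break
        chars.append(ch)          # characters come out last-to-first
        row = lf[row - 1]
    return "".join(reversed(chars)), visited

text, visited = inverse_bwt("ipssm$pissii")
print(text)     # mississippi
print(visited)  # [1, 2, 7, 8, 3, 9, 11, 4, 10, 12, 5, 6]
```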

  15. Ferragina-Manzini Index
      Search issi, matching the pattern from its last character to its first. Row ranges after each step:
        (start)  (1)-(12)
        i        (2)-(5)
        si       (9)-(10)
        ssi      (11)-(12)
        issi     (4)-(5)

         1  $ m i s s i s s i p p i
         2  i $ m i s s i s s i p p
         3  i p p i $ m i s s i s s
         4  i s s i p p i $ m i s s
         5  i s s i s s i p p i $ m
         6  m i s s i s s i p p i $
         7  p i $ m i s s i s s i p
         8  p p i $ m i s s i s s i
         9  s i p p i $ m i s s i s
        10  s i s s i p p i $ m i s
        11  s s i p p i $ m i s s i
        12  s s i s s i p p i $ m i
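
The shrinking row range can be computed from L alone, using the same two quantities as the L to F map. The sketch below is an assumption-laden illustration, not the lecture's code: `backward_search` is a hypothetical name, rows are 1-based and inclusive to match the slide, and Occ is recomputed by scanning L, where a real FM index would look it up in a precomputed table.

```python
def backward_search(L, pattern):
    """Exact-match search on a BWT string L.

    Returns (ranges, final): ranges is the list of 1-based inclusive row
    ranges after each step, final is the last range or None if no match.
    """
    # smaller[c] = number of characters in L that are smaller than c
    smaller, total = {}, 0
    for c in sorted(set(L)):
        smaller[c] = total
        total += L.count(c)

    def occ(j, c):
        """Occurrences of c in rows 1..j of L (inclusive)."""
        return L[:j].count(c)

    lo, hi = 1, len(L)
    ranges = [(lo, hi)]
    for c in reversed(pattern):           # last character first
        if c not in smaller:
            return ranges, None
        lo = smaller[c] + occ(lo - 1, c) + 1
        hi = smaller[c] + occ(hi, c)
        if lo > hi:                       # empty range: pattern is absent
            return ranges, None
        ranges.append((lo, hi))
    return ranges, (lo, hi)

L = "ipssm$pissii"
print(backward_search(L, "issi")[0])  # [(1, 12), (2, 5), (9, 10), (11, 12), (4, 5)]
print(backward_search(L, "pi")[1])    # (7, 7)
print(backward_search(L, "ssp")[1])   # None
```

The number of steps depends only on the pattern length, not on the length of the reference, which is the efficiency the FM index is valued for when there are no errors.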

  16. Ferragina-Manzini Index
      Search pi, matching the pattern from its last character to its first. Row ranges after each step:
        (start)  (1)-(12)
        i        (2)-(5)
        pi       (7)-(7)

         1  $ m i s s i s s i p p i
         2  i $ m i s s i s s i p p
         3  i p p i $ m i s s i s s
         4  i s s i p p i $ m i s s
         5  i s s i s s i p p i $ m
         6  m i s s i s s i p p i $
         7  p i $ m i s s i s s i p
         8  p p i $ m i s s i s s i
         9  s i p p i $ m i s s i s
        10  s i s s i p p i $ m i s
        11  s s i p p i $ m i s s i
        12  s s i s s i p p i $ m i
