suffix tree and suffix array

Suffix tree and Suffix array
Karatsuba
CS214: Algorithms and Complexity, Shanghai Jiao Tong University, 2016.12.22

Q: How to find a match of S in a target DNA sequence?


  1. Application with Suffix Tries (2): find the longest common substring of S and t, where S = abaa$ and t = babaa. Walk t through the suffix trie of S. Max common so far = ba. [Figure: suffix trie of abaa$]

  2. Application with Suffix Tries (2): find the longest common substring of S and t, where S = abaa$ and t = babaa. Max common = abaa. [Figure: suffix trie of abaa$]
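The search on the two slides above can be sketched in Python. This is a minimal illustration, not the lecture's own code: the function names `build_suffix_trie` and `longest_common_substring` are hypothetical, the trie is a plain nested dictionary, and `$` is assumed to be a terminator that never occurs in t.

```python
def build_suffix_trie(s):
    """Insert every suffix of s into a nested-dict trie (quadratic time and space)."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root


def longest_common_substring(s, t, terminator="$"):
    """Walk every suffix of t down the suffix trie of s and keep the deepest match."""
    trie = build_suffix_trie(s)
    best = ""
    for i in range(len(t)):
        node, length = trie, 0
        for ch in t[i:]:
            # Stop at the first character that leaves the trie.
            if ch == terminator or ch not in node:
                break
            node = node[ch]
            length += 1
        if length > len(best):
            best = t[i:i + length]
    return best


print(longest_common_substring("abaa$", "babaa"))  # abaa
```

Starting at t[0] the walk matches ba and then fails; starting at t[1] it matches abaa, which mirrors the two frames above.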

  3. Constructing Suffix Tries

  4. To build a suffix trie of S: first build a suffix trie of S[0], then add the remaining characters one by one into the suffix trie.

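  5. [Figure: the empty trie, only the root r]

  6. S=abaa$. Prefix read so far: a. [Figure: root r with an edge a]

  7. S=abaa$. Prefix: a. Suffix: a. [Figure: root r with an edge a]

  8. S=abaa$. Prefix: ab. Suffix: ab. [Figure: path r, a, b]

  9. S=abaa$. Prefix: ab. Suffixes: ab, b. [Figure: trie containing ab and b]

  10. S=abaa$. Prefix: ab. Suffixes: ab, b. [Figure: same frame as slide 9]

  11. S=abaa$. Prefix: ab. Suffixes: ab, b. [Figure: same frame as slide 9]

  12. S=abaa$. Prefix: aba. Suffix: aba. [Figure: trie after extending ab to aba]

  13. S=abaa$. Prefix: aba. Suffixes: aba, ba. [Figure: trie containing aba and ba]

  14. S=abaa$. Prefix: aba. Suffixes: aba, ba. [Figure: same frame as slide 13]

  15. S=abaa$. Prefix: aba. Suffixes: aba, ba, a. [Figure: trie containing aba, ba and a]

  16. S=abaa$. Prefix: aba. Suffixes: aba, ba, a. [Figure: same frame as slide 15]

  17. S=abaa$. Prefix: abaa$. Suffixes: abaa$, baa$, aa$, a$, $. [Figure: the complete suffix trie rooted at r]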
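The character-by-character construction walked through above can be sketched as follows. This is a naive quadratic-time illustration under stated assumptions, not the linear-time algorithm: the names `build_suffix_trie_online` and `suffixes_in_trie` are hypothetical, nodes are plain dictionaries, and the input is assumed to end in a unique terminator `$`.

```python
def build_suffix_trie_online(s):
    """Build a suffix trie by appending one character of s at a time."""
    root = {}
    # Nodes where a suffix of the prefix read so far ends.
    # The root stands for the empty suffix.
    active = [root]
    for ch in s:
        next_active = [root]
        for node in active:
            # Extend every current suffix by ch, creating the child if needed.
            next_active.append(node.setdefault(ch, {}))
        active = next_active
    return root


def suffixes_in_trie(node, prefix=""):
    """List all root-to-leaf paths; with a unique terminator these are the suffixes."""
    if not node:
        return [prefix]
    out = []
    for ch in sorted(node):
        out.extend(suffixes_in_trie(node[ch], prefix + ch))
    return out


trie = build_suffix_trie_online("abaa$")
print(suffixes_in_trie(trie))  # ['$', 'a$', 'aa$', 'abaa$', 'baa$']
```

After each character the `active` list holds one node per suffix of the current prefix, which corresponds to the "Suffix:" labels on the slides.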

  18. How many nodes can a suffix trie have?
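One way to explore this question: every node of a suffix trie other than the root corresponds to one distinct non-empty substring of S, so the node count can be computed without building the trie. The sketch below uses a hypothetical helper `suffix_trie_node_count` and, as an assumed test family, strings of the form a^n b^n $, for which the count grows quadratically in the string length.

```python
def suffix_trie_node_count(s):
    """Number of suffix-trie nodes = distinct non-empty substrings of s, plus the root."""
    substrings = {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}
    return len(substrings) + 1


print(suffix_trie_node_count("abaa$"))  # 14

# Assumed worst-case style family: n copies of a, n copies of b, then the terminator.
for n in (1, 2, 4, 8, 16):
    s = "a" * n + "b" * n + "$"
    print(len(s), suffix_trie_node_count(s))
```

For this family the count works out to n^2 + 4n + 2 on a string of length 2n + 1, which is what motivates the more compact representation on the following slides.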

  19. Space-Efficient Suffix Trees

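  20. A More Compact Representation: S=abaa$. [Figure: the suffix trie rooted at r, one character per edge]

  21. A More Compact Representation: S=abaa$, positions 12345. Chains of single-child nodes are merged into one edge. [Figure: suffix tree rooted at r with edge labels $, a, baa$, a$, baa$, $]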

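  22. A More Compact Representation: S=abaa$, positions 12345. Each edge label is stored as a start:end pair of positions in S instead of the substring itself. [Figure: suffix tree rooted at r with edge labels 5:5, 3:3, 2:5, 4:5, 2:5, 5:5]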
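The index-pair representation can be sketched with a naive builder that inserts the suffixes one at a time and splits an edge wherever a suffix diverges in the middle of it. This is an illustrative quadratic-time sketch, not the linear-time construction: `Node`, `build_suffix_tree` and `edge_labels` are hypothetical names, edges are stored as 0-based half-open ranges internally, and labels are printed as 1-based inclusive start:end pairs. With that convention baa$ is S[2..5], and the single-character edge a may point at any occurrence of a (this sketch reports 1:1).

```python
class Node:
    def __init__(self):
        # first character of the edge -> (start, end, child), label is s[start:end]
        self.children = {}


def build_suffix_tree(s):
    """Naive compressed suffix tree; assumes s ends in a unique terminator."""
    root, n = Node(), len(s)
    for i in range(n):
        node, pos = root, i
        while pos < n:
            c = s[pos]
            if c not in node.children:
                node.children[c] = (pos, n, Node())  # new leaf edge
                break
            start, end, child = node.children[c]
            k = 0  # number of characters matched along this edge
            while start + k < end and pos + k < n and s[start + k] == s[pos + k]:
                k += 1
            if k == end - start:
                node, pos = child, pos + k  # whole edge matched, descend
                continue
            # Mismatch inside the edge: split it at offset k.
            mid = Node()
            mid.children[s[start + k]] = (start + k, end, child)
            mid.children[s[pos + k]] = (pos + k, n, Node())
            node.children[c] = (start, start + k, mid)
            break
    return root


def edge_labels(node, s):
    """Depth-first list of (start:end, substring) with 1-based inclusive indices."""
    out = []
    for c in sorted(node.children):
        start, end, child = node.children[c]
        out.append((f"{start + 1}:{end}", s[start:end]))
        out.extend(edge_labels(child, s))
    return out


for label, text in edge_labels(build_suffix_tree("abaa$"), "abaa$"):
    print(label, text)
```

The tree for abaa$ has six edges, so the space used is proportional to the number of edges rather than to the total length of the labels.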

  23. How to construct a suffix tree in linear time. Further reading: Ukkonen's algorithm.

  24. Suffix arrays

  25. Suffix Array Example: str = cattcat$. Sort the suffixes alphabetically.
      Suffixes in text order: 1 cattcat$, 2 attcat$, 3 ttcat$, 4 tcat$, 5 cat$, 6 at$, 7 t$, 8 $
      Sorted suffixes: 8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$
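The example above can be reproduced with a direct sort of the suffixes. This is a minimal sketch, assuming the 8-character string cattcat$ and 1-based positions as on the slide; the function name `suffix_array` is hypothetical, and sorting whole suffixes this way is simple but not the fastest known construction.

```python
def suffix_array(s):
    """Return the 1-based start positions of the suffixes of s in sorted order."""
    # '$' sorts before lowercase letters in ASCII, matching the slide's ordering.
    return sorted(range(1, len(s) + 1), key=lambda i: s[i - 1:])


s = "cattcat$"
for i in suffix_array(s):
    print(i, s[i - 1:])
```

The printed order is 8, 6, 2, 5, 1, 7, 4, 3, the same as the sorted column in the example.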

  26. Suffix Arrays: What can we do with this?
      1. Counting: how many times does 'at' occur? All the suffixes that start with 'at' will be next to each other in the array. Binary search to find 'at'.
      Sorted suffixes: 8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$
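The counting idea can be sketched with two binary searches over the suffix array: one for the first suffix whose leading characters are not smaller than the pattern, one for the first suffix whose leading characters are larger. The helper names `build_suffix_array` and `count_occurrences` are hypothetical, and positions are 0-based in this sketch.

```python
def build_suffix_array(s):
    """0-based start positions of the suffixes of s in sorted order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def count_occurrences(s, sa, pattern):
    """Count occurrences of pattern in s using binary search on the suffix array sa."""
    m = len(pattern)

    def boundary(include_equal):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = s[sa[mid]:sa[mid] + m]  # first m characters of this suffix
            if prefix < pattern or (include_equal and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    # Suffixes starting with pattern form one contiguous block in sa.
    return boundary(True) - boundary(False)


s = "cattcat$"
sa = build_suffix_array(s)
print(count_occurrences(s, sa, "at"))  # 2
```

Each search inspects O(log n) suffixes and compares at most len(pattern) characters per step.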

  27. Suffix Arrays: What can we do with this?
      2. K-mer counting: find the k-length substrings that occur exactly i times.
      Sorted suffixes: 8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$

  28. Suffix Arrays: K = 2. Scan the sorted suffixes (8 $, 6 at$, 2 attcat$, 5 cat$, 1 cattcat$, 7 t$, 4 tcat$, 3 ttcat$) while keeping a CurrentCount. At 8 $: CurrentCount = 1.

  29. Suffix Arrays: K = 2. At 8 $: CurrentCount = 1. At 6 at$: CurrentCount = 1.

  30. Suffix Arrays: K = 2. At 8 $: CurrentCount = 1. At 6 at$: CurrentCount = 1. At 2 attcat$: CurrentCount = 2.
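The scan shown in these frames can be sketched as one pass over the sorted suffixes, increasing a running count while the first k characters stay the same and resetting it when they change. The names `kmer_counts` and `kmers_with_count` are hypothetical. As an assumption of this sketch (the slides count the $ row as well), suffixes shorter than k and k-mers containing the terminator are skipped.

```python
def kmer_counts(s, k, terminator="$"):
    """Count every k-mer of s by scanning the suffixes in sorted order."""
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    counts = {}
    current, current_count = None, 0
    for start in sa:
        kmer = s[start:start + k]
        if len(kmer) < k or terminator in kmer:
            continue  # assumption: ignore k-mers that run into the terminator
        if kmer == current:
            current_count += 1  # same k-mer as the previous suffix
        else:
            if current is not None:
                counts[current] = current_count
            current, current_count = kmer, 1
    if current is not None:
        counts[current] = current_count
    return counts


def kmers_with_count(s, k, i):
    """k-length substrings of s that occur exactly i times."""
    return [kmer for kmer, c in kmer_counts(s, k).items() if c == i]


print(kmer_counts("cattcat$", 2))          # {'at': 2, 'ca': 2, 'tc': 1, 'tt': 1}
print(kmers_with_count("cattcat$", 2, 2))  # ['at', 'ca']
```

Because equal k-mers are adjacent in the sorted order, a single counter is enough and no hash table of all k-mers is needed during the scan; the dictionary here only collects the results.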
