
Bioinformatics Algorithms (Fundamental Algorithms, module 2)

  1. Bioinformatics Algorithms (Fundamental Algorithms, module 2)
     Zsuzsanna Lipták
     Masters in Medical Bioinformatics, academic year 2018/19, II semester
     Strings and Sequences in Computer Science

  2. Some formalism on strings
     • $\Sigma$ is a finite set called the alphabet
     • its elements are called characters or letters
     • $|\Sigma|$ is the size of the alphabet (the number of different characters)
     • a string over $\Sigma$ is a finite sequence of characters from $\Sigma$
     • we write strings as $s = s_1 s_2 \ldots s_n$, i.e. $s_i$ is the $i$-th character of $s$
     N.B.: We number the characters of a string from 1, not from 0.
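
The 1-based numbering matters when these definitions are turned into code, because Python (like most languages) indexes strings from 0. A minimal sketch, assuming a hypothetical helper `char_at` that is not part of the slides:

```python
def char_at(s, i):
    """Return s_i in the slides' 1-based numbering (s_1 is the first character)."""
    if not 1 <= i <= len(s):
        raise IndexError(f"position {i} is outside 1..{len(s)}")
    return s[i - 1]  # Python itself counts positions from 0


DNA = {"A", "C", "G", "T"}  # an alphabet Sigma with |Sigma| = 4
s = "ACCTG"                 # a string over Sigma

print(len(DNA))                    # 4
print(char_at(s, 1), s[0])         # A A  (s_1 is s[0] in Python)
print(char_at(s, 5), s[len(s) - 1])  # G G  (s_n is s[n - 1] in Python)
```

Wrapping the offset in one helper keeps the `i - 1` shift in a single place instead of scattering it through later code.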

  8. Some formalism on strings (cont.)
     • $|s|$ is the length of string $s$
     • $\varepsilon$ is the empty string, the (unique) string of length 0
     • $\Sigma^n$ is the set of strings of length $n$ over $\Sigma$
     • $\Sigma^* = \bigcup_{n=0}^{\infty} \Sigma^n = \Sigma^0 \cup \Sigma^1 \cup \Sigma^2 \cup \ldots$ is the set of all strings over $\Sigma$
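
To make the sets of strings of a fixed length concrete, the following sketch enumerates them for the DNA alphabet. The function names `strings_of_length` and `strings_up_to` are illustrative assumptions, not from the slides. Since the set of all strings over an alphabet is infinite, the code can only list it up to a chosen maximum length.

```python
from itertools import product


def strings_of_length(alphabet, n):
    """Yield every string of length n over the alphabet (the set Sigma^n)."""
    for chars in product(sorted(alphabet), repeat=n):
        yield "".join(chars)


def strings_up_to(alphabet, max_len):
    """Yield Sigma^0, Sigma^1, ..., Sigma^max_len in turn: a finite part of Sigma^*."""
    for n in range(max_len + 1):
        yield from strings_of_length(alphabet, n)


DNA = {"A", "C", "G", "T"}
print(list(strings_of_length(DNA, 0)))       # [''] : Sigma^0 holds only the empty string
print(len(list(strings_of_length(DNA, 2))))  # 16 = 4^2
print(len(list(strings_up_to(DNA, 2))))      # 21 = 1 + 4 + 16
```

The counts illustrate that there are $|\Sigma|^n$ strings of length $n$, because each of the $n$ positions can be filled independently with any of the $|\Sigma|$ characters.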

  13. Some formalism on strings: Examples
      • DNA: $\Sigma = \{A, C, G, T\}$, alphabet size $|\Sigma| = 4$; $s = \mathtt{ACCTG}$ is a string of length 5 over $\Sigma$, with $s_1 = A$, $s_2 = s_3 = C$, $s_4 = T$, $s_5 = G$
      • RNA: $\Sigma = \{A, C, G, U\}$, again the alphabet size is 4
      • protein: $\Sigma = \{A, C, D, E, F, \ldots, W, Y\}$, alphabet size is 20; ANRFYWNL is a string over $\Sigma$ of length 8
      • English alphabet: $\Sigma = \{a, b, c, \ldots, x, y, z\}$ of size 26; "alphabet" is a string over $\Sigma$ of length 8
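
The examples above can be checked mechanically. In this sketch the helper `is_over` is a hypothetical name, and the protein alphabet is written out as the 20 standard one-letter amino acid codes, which is an assumption about what the elided set on the slide contains.

```python
DNA = set("ACGT")
RNA = set("ACGU")
# Assumption: the 20 standard one-letter amino acid codes.
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")
ENGLISH = set("abcdefghijklmnopqrstuvwxyz")


def is_over(s, alphabet):
    """Return True if s is a string over the alphabet, i.e. uses only its characters."""
    return all(c in alphabet for c in s)


print(len(PROTEIN), is_over("ANRFYWNL", PROTEIN), len("ANRFYWNL"))  # 20 True 8
print(len(ENGLISH), is_over("alphabet", ENGLISH), len("alphabet"))  # 26 True 8
print(is_over("ACCTG", DNA), is_over("ACCTG", RNA))                 # True False
```

The last line shows that the same sequence of characters can be a string over one alphabet but not over another: ACCTG contains T, which is not an RNA character.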

  17. Some formalism on strings
      Let $s = s_1 \ldots s_n$ be a string over $\Sigma$. Ex.: $s = \mathtt{ACCTG}$.
      • $t$ is a substring of $s$ if $t = \varepsilon$ or $t = s_i \ldots s_j$ for some $1 \le i \le j \le n$ (i.e., a "contiguous piece" of $s$). Ex.: CCT, AC, ...
      • $t$ is a prefix of $s$ if $t = \varepsilon$ or $t = s_1 \ldots s_j$ for some $1 \le j \le n$ (i.e., a "beginning" of $s$). Ex.: AC, ACCTG, ...
      • $t$ is a suffix of $s$ if $t = \varepsilon$ or $t = s_i \ldots s_n$ for some $1 \le i \le n$ (i.e., an "end" of $s$). Ex.: CCTG, G, ...
      • $t$ is a subsequence of $s$ if $t$ can be obtained from $s$ by deleting some (possibly 0, possibly all) characters from $s$. Ex.: AT, CCT, ...
      N.B.: string = sequence, but substring $\neq$ subsequence!
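
The four definitions translate almost directly into Python predicates. This is a minimal sketch with illustrative function names. It relies on Python's built-in `in`, `str.startswith` and `str.endswith`, all of which treat the empty string as contained in every string, matching the definitions above.

```python
def is_substring(t, s):
    """t is a contiguous piece of s (or empty)."""
    return t in s


def is_prefix(t, s):
    """t is a beginning of s (or empty)."""
    return s.startswith(t)


def is_suffix(t, s):
    """t is an end of s (or empty)."""
    return s.endswith(t)


def is_subsequence(t, s):
    """t can be obtained from s by deleting zero or more characters."""
    remaining = iter(s)
    # `c in remaining` advances the iterator past the first match,
    # so the characters of t must be found in s from left to right.
    return all(c in remaining for c in t)


s = "ACCTG"
print(is_substring("CCT", s), is_prefix("AC", s), is_suffix("CCTG", s))  # True True True
print(is_subsequence("AT", s), is_substring("AT", s))                    # True False
```

The last line shows the difference stressed in the N.B.: AT is a subsequence of ACCTG but not a substring, because A and T are not adjacent in $s$.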

  27. Substrings etc.
      N.B.
      1. Every substring is a subsequence, but not every subsequence is a substring! Ex.: Let $s = \mathtt{ACCTG}$; then ACT is a subsequence but not a substring.
      2. Every prefix and every suffix is a substring.
      3. $t$ is a substring of $s$ $\Leftrightarrow$ $t$ is a prefix of a suffix of $s$ $\Leftrightarrow$ $t$ is a suffix of a prefix of $s$
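
Remarks 2 and 3 can be checked by brute force on the running example. The sketch below builds the three sets explicitly, using hypothetical helper names. It is a sanity check on one string, not a proof.

```python
def prefixes(s):
    """All prefixes of s, including the empty string and s itself."""
    return {s[:j] for j in range(len(s) + 1)}


def suffixes(s):
    """All suffixes of s, including the empty string and s itself."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    """All substrings of s: the empty string plus every s_i ... s_j with i <= j."""
    n = len(s)
    return {""} | {s[i:j + 1] for i in range(n) for j in range(i, n)}


s = "ACCTG"
prefixes_of_suffixes = {p for suf in suffixes(s) for p in prefixes(suf)}
suffixes_of_prefixes = {q for pre in prefixes(s) for q in suffixes(pre)}

print(prefixes(s) <= substrings(s) and suffixes(s) <= substrings(s))       # True (remark 2)
print(substrings(s) == prefixes_of_suffixes == suffixes_of_prefixes)       # True (remark 3)
```

Remark 3 is what makes prefix-based and suffix-based index structures useful for substring search: looking for a substring of $s$ reduces to looking for a prefix of one of its suffixes.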

  31. Counting substrings, subsequences etc.
      Question: Given $s = s_1 \ldots s_n$, how many
      • prefixes,
      • suffixes,
      • substrings,
      • subsequences
      does $s$ have (exactly, or at most, or at least)?
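
One way to explore this question before answering it is to count by brute force on small inputs. The sketch below, with the hypothetical function name `count_all`, counts distinct strings of each kind and includes the empty string in every count. Enumerating subsequences takes time exponential in $n$, so it is only practical for short strings.

```python
from itertools import combinations


def count_all(s):
    """Count the distinct prefixes, suffixes, substrings and subsequences of s."""
    n = len(s)
    prefixes = {s[:j] for j in range(n + 1)}
    suffixes = {s[i:] for i in range(n + 1)}
    substrings = {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    # A subsequence keeps the characters at some increasing set of positions.
    subsequences = {
        "".join(s[k] for k in kept)
        for r in range(n + 1)
        for kept in combinations(range(n), r)
    }
    return {
        "prefixes": len(prefixes),
        "suffixes": len(suffixes),
        "substrings": len(substrings),
        "subsequences": len(subsequences),
    }


for example in ("ACGT", "ACCTG", "AAAA"):
    print(example, count_all(example))
```

Comparing a string with all characters different (ACGT) against one with a single repeated character (AAAA) suggests why the question asks for "at most" and "at least": the numbers of prefixes and suffixes depend only on $n$, while the numbers of distinct substrings and subsequences also depend on how often characters repeat.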
