SLIDE 1

Bioinformatics Algorithms

(Fundamental Algorithms, module 2)

Zsuzsanna Lipták

Masters in Medical Bioinformatics academic year 2018/19, II semester

Strings and Sequences in Computer Science

SLIDE 7

Some formalism on strings

  • Σ is a finite set called the alphabet
  • its elements are called characters or letters
  • |Σ| is the size of the alphabet (number of different characters)
  • a string over Σ is a finite sequence of characters from Σ
  • we write strings as s = s_1 s_2 … s_n, i.e. s_i is the i-th character of s

N.B.: We number the positions of a string from 1, not from 0.
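
Most programming languages number string positions from 0, so the convention above needs a shift by one in code. A minimal Python sketch, where the helper name char_at is hypothetical and introduced only for illustration:

```python
def char_at(s: str, i: int) -> str:
    """Return s_i, the i-th character of s, counting positions from 1."""
    if not 1 <= i <= len(s):
        raise IndexError(f"position {i} is outside 1..{len(s)}")
    return s[i - 1]  # Python itself counts from 0, hence the shift


s = "ACCTG"
print(char_at(s, 1))  # A
print(char_at(s, 4))  # T
```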

SLIDE 12

Some formalism on strings (cont.)

  • |s| is the length of string s
  • ε is the empty string, the (unique) string of length 0
  • Σ^n is the set of strings of length n
  • Σ^* = ⋃_{n=0}^{∞} Σ^n = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ … is the set of all strings over Σ
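
For a small alphabet, the set of strings of a fixed length can be listed explicitly. The sketch below uses itertools.product from the Python standard library; the function name strings_of_length is an assumption made for this illustration. It also shows that the set of strings of length 0 contains exactly one element, the empty string.

```python
from itertools import product


def strings_of_length(alphabet, n):
    """List all strings of length n over the given alphabet."""
    # product(..., repeat=n) yields every n-tuple of characters;
    # sorting the alphabet only makes the output order reproducible.
    return ["".join(chars) for chars in product(sorted(alphabet), repeat=n)]


dna = {"A", "C", "G", "T"}
print(strings_of_length(dna, 0))       # [''] : only the empty string
print(strings_of_length(dna, 1))       # ['A', 'C', 'G', 'T']
print(len(strings_of_length(dna, 2)))  # 16, i.e. 4 * 4
```

The set of all strings over the alphabet is infinite, so code can only ever enumerate a finite part of it, for example all strings up to some maximum length.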

SLIDE 16

Some formalism on strings: Examples

Examples

  • DNA: Σ = {A,C,G,T}, alphabet size |Σ| = 4,

s = ACCTG is a string of length 5 over Σ, with s_1 = A, s_2 = s_3 = C, s_4 = T, s_5 = G.

  • RNA: Σ = {A,C,G,U}, again alphabet size is 4
  • protein: Σ = {A,C,D,E,F,. . . ,W,Y}, alphabet size is 20,

ANRFYWNL is a string over Σ of length 8

  • English alphabet: Σ = {a,b,c,…,x,y,z} of size 26,

"alphabet" is a string over Σ of length 8
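
Whether a given string is a string over a particular alphabet can be checked character by character. The sketch below is illustrative only: the function name is_string_over is hypothetical, and the protein alphabet is written out on the assumption that the 20 letters meant are the standard one-letter amino acid codes.

```python
def is_string_over(s, alphabet):
    """True if every character of s belongs to the alphabet."""
    return all(ch in alphabet for ch in s)


DNA = set("ACGT")
RNA = set("ACGU")
# Assumption: the 20 standard one-letter amino acid codes.
PROTEIN = set("ACDEFGHIKLMNPQRSTVWY")

print(len(DNA), len(RNA), len(PROTEIN))     # 4 4 20
print(is_string_over("ACCTG", DNA))         # True
print(is_string_over("ACCTG", RNA))         # False: T is not in the RNA alphabet
print(is_string_over("ANRFYWNL", PROTEIN))  # True
print(len("ANRFYWNL"), len("alphabet"))     # 8 8
```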

SLIDE 26

Some formalism on strings

Let s = s_1 … s_n be a string over Σ.

  • ex. s = ACCTG
  • t is a substring of s if t = ε or t = s_i … s_j for some 1 ≤ i ≤ j ≤ n

(i.e., a "contiguous piece" of s), e.g. CCT, AC, …

  • t is a prefix of s if t = ε or t = s_1 … s_j for some 1 ≤ j ≤ n

(i.e., a "beginning" of s), e.g. AC, ACCTG, …

  • t is a suffix of s if t = ε or t = s_i … s_n for some 1 ≤ i ≤ n

(i.e., an "end" of s), e.g. CCTG, G, …

  • t is a subsequence of s if t can be obtained from s by deleting some (possibly 0, possibly all) characters from s, e.g. AT, CCT, …

N.B.

string = sequence, but substring ≠ subsequence!
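
The four definitions translate almost directly into Python. The following is a minimal sketch, not a reference implementation; the function names are chosen here for illustration. The first three rely on built-in string operations, and the subsequence test scans s from left to right, matching the characters of t in order.

```python
def is_prefix(t, s):
    """t is a "beginning" of s (the empty string counts)."""
    return s.startswith(t)


def is_suffix(t, s):
    """t is an "end" of s (the empty string counts)."""
    return s.endswith(t)


def is_substring(t, s):
    """t is a contiguous piece of s (the empty string counts)."""
    return t in s


def is_subsequence(t, s):
    """t can be obtained from s by deleting zero or more characters."""
    remaining = iter(s)
    # `ch in remaining` advances the iterator past the first match,
    # so the characters of t must appear in s in the same order.
    return all(ch in remaining for ch in t)


s = "ACCTG"
print(is_substring("CCT", s), is_prefix("AC", s), is_suffix("CCTG", s))  # True True True
print(is_subsequence("AT", s), is_substring("AT", s))                    # True False
```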

SLIDE 30

Substrings etc.

N.B.

  1. Every substring is a subsequence, but not every subsequence is a substring!
     Ex.: Let s = ACCTG; then ACT is a subsequence but not a substring.
  2. Every prefix and every suffix is a substring.
  3. t is a substring of s ⇔ t is a prefix of a suffix of s ⇔ t is a suffix of a prefix of s
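
The third remark can be checked by brute force on a concrete string. This is only a sanity check on one example, not a proof; the helper names prefixes, suffixes and substrings are assumptions made for this sketch.

```python
def prefixes(s):
    """All prefixes of s, including the empty string and s itself."""
    return {s[:j] for j in range(len(s) + 1)}


def suffixes(s):
    """All suffixes of s, including the empty string and s itself."""
    return {s[i:] for i in range(len(s) + 1)}


def substrings(s):
    """All substrings of s, including the empty string."""
    n = len(s)
    return {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}


s = "ACCTG"
prefixes_of_suffixes = {p for suf in suffixes(s) for p in prefixes(suf)}
suffixes_of_prefixes = {q for pre in prefixes(s) for q in suffixes(pre)}

# All three sets coincide for this example.
print(substrings(s) == prefixes_of_suffixes == suffixes_of_prefixes)  # True
```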

SLIDE 31

Counting substrings, subsequences etc.

Question

Given s = s_1 … s_n. How many

  • prefixes,
  • suffixes,
  • substrings,
  • subsequences

does s have (exactly, or at most, or at least)?
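
One way to test a conjectured answer is to count by brute force on short strings. The sketch below counts distinct strings of each kind, including the empty string; the function name count_distinct is hypothetical. It enumerates every subset of positions, so it is only practical for very short inputs.

```python
from itertools import combinations


def count_distinct(s):
    """Count the distinct prefixes, suffixes, substrings and subsequences of s."""
    n = len(s)
    prefixes = {s[:j] for j in range(n + 1)}
    suffixes = {s[i:] for i in range(n + 1)}
    substrings = {s[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    # A subsequence keeps the characters at some increasing set of positions.
    subsequences = {
        "".join(s[k] for k in positions)
        for r in range(n + 1)
        for positions in combinations(range(n), r)
    }
    return {
        "prefixes": len(prefixes),
        "suffixes": len(suffixes),
        "substrings": len(substrings),
        "subsequences": len(subsequences),
    }


print(count_distinct("ACCTG"))
# {'prefixes': 6, 'suffixes': 6, 'substrings': 15, 'subsequences': 24}
print(count_distinct("AAAA"))
# {'prefixes': 5, 'suffixes': 5, 'substrings': 5, 'subsequences': 5}
```

Comparing strings with repeated characters (such as AAAA) against strings with all characters distinct is a useful way to see where "exactly", "at most" and "at least" differ.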
