Alphabet An alphabet is a set of letters . e.g., { a, b, c, . . . , z - - PDF document

SLIDE 1

Alphabet

An alphabet is a set of letters.
    e.g., {a, b, c, ..., z}
    e.g., {α, β, ..., ω}
    e.g., {a, b, c}
    e.g., {a, b}
    e.g., {0} (unary alphabet)
    e.g., {0, 1} (binary alphabet)
    e.g., {N, S, E, W} (compass alphabet)
We use the symbol Σ for an alphabet and σ for a letter: Σ = {a, b} and σ ∈ Σ, for example, σ = a.

Word

A word is a sequence of letters over some alphabet. A word is also called a string.
    e.g., aab over the alphabet {a, b}
    e.g., aabb over the alphabet {a, b}
    e.g., b over the alphabet {a, b}
    e.g., aab is also a word over {a, b, c}
0011 is a word over the binary alphabet; we call such words "binary strings".

Length of a word

A word has a length, which is the number of letters in the word. For example, suppose Σ = {a, b, c} and w = abc; then |w| = 3. If w = aa then |w| = 2.

Suppose we have the binary alphabet {0, 1}.
    How many words of length 1? 2 (0, 1)
    How many words of length 2? 4
    How many words of length 3? 8
Claim: the number of words is the number of letters in the alphabet raised to the number of letters in the word. That is, there are |Σ|^i words of length i over Σ.

Suppose our alphabet is {a, b, c}. Then the words of length 1 are: a, b, c. The words of length 2 are: aa, ab, ac, ba, bb, bc, ca, cb, cc.

Empty word

How many words of length 0 over the alphabet {a, b}? 2^0 = 1
How many words of length 0 over the alphabet {a, b, c}? 3^0 = 1
We use ε to represent the empty word. ε is a word over every alphabet. Thus if we have two words x, y with |x| = 0 and |y| = 0, then x = y.

ε is the empty word: it is a word of length zero and therefore consists of no letters. An alphabet is a set of letters, so ε is never a letter in an alphabet. That is, words and letters are different types. Another point: the one-letter word "a" is not the same as the letter "a".
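The counting claim above (the number of words of length i is the alphabet size raised to the power i) can be checked by brute force for small cases. The following is a minimal Python sketch, not part of the original slides: the function name words_of_length is hypothetical, words are represented as Python strings, and the empty string "" stands for the empty word.

```python
from itertools import product

def words_of_length(alphabet, i):
    """Return all words of length i over the alphabet, as Python strings."""
    # product(..., repeat=i) yields every i-tuple of letters; joining a
    # tuple gives the corresponding word. For i = 0 the only tuple is (),
    # which joins to "" (the empty word).
    return ["".join(letters) for letters in product(sorted(alphabet), repeat=i)]

sigma = {"a", "b", "c"}
for i in range(4):
    words = words_of_length(sigma, i)
    # The claim: there are |sigma| ** i words of length i.
    assert len(words) == len(sigma) ** i

print(words_of_length(sigma, 2))
```

For i = 0 the function returns a list containing only the empty string, which matches the observation that there is exactly one word of length 0 over any alphabet.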

SLIDE 2

    ALPHABET sigma = {a, b};
    WORD x = empty;                  // x starts as the empty word
    for (INT i = 0; i < 5; ++i) {
        pick LETTER s in sigma;      // choose any letter of the alphabet
        x += s;                      // append that letter to the word
    }
    print x;                         // some word of length 5 over sigma

Concatenation

Concatenation is a fundamental operation on words. It consists of putting one word next to the other. We denote concatenation by juxtaposition where that is unambiguous, or with · when we want to be explicit. For example, let w = abc and w′ = aaa. Then w · w′ = abcaaa, or simply ww′ = abcaaa.

Note that ww′ ≠ w′w in general: here w′w = aaaabc.

When would it be the case that ww′ = w′w? What would be the exact conditions on w and w′? Hint: the obvious answer is when w = w′, but that does not cover all cases.

Suppose you have the words a and b in infinite supply. How many different ways can you concatenate them to create abab?

First way:
    cat(a, b) = ab
    cat(ab, a) = aba
    cat(aba, b) = abab

    x = cat(a, b)
    x = cat(x, a)
    x = cat(x, b)
    print x

Second way:
    cat(a, b) = ab
    cat(a, b) = ab
    cat(ab, ab) = abab

    x = cat(a, b)
    y = cat(a, b)
    x = cat(x, y)
    print x

Third way:
    cat(a, b) = ab
    cat(b, ab) = bab
    cat(a, bab) = abab

SLIDE 3

    x = cat(a, b)
    x = cat(b, x)
    x = cat(a, x)
    print x

and two more (what are they?)

[Figure: tree diagrams of three concatenation orders for abab: (ab)(ab), ((ab)a)b, and a(b(ab))]

Empty concatenation

What happens when we concatenate with the empty word? cat(ε, aba) = aba. In general ε · w = w · ε = w. This also means that εε = ε.

Repeated concatenations

Instead of writing out all the words or letters through juxtaposition, we denote repeated concatenation with an exponent.
    w^2 = ww
    w^3 = www
So w^i is i copies of w juxtaposed. w^0 is zero copies of w, which is ε: for every word w, w^0 = ε.

Think of this like an algorithm:

SLIDE 4

    WORD repeated_concatenation(WORD w, INT amount) {
        WORD retval = "";                    // i.e., epsilon
        for (INT i = 0; i < amount; ++i) {
            retval = cat(retval, w);         // append one more copy of w
        }
        return retval;
    }

    w^i w^j = w^j w^i = w^(i+j)
    (w^i)^j = w^(ij)
    (xy)^i = xyxyxy...xy   (i copies of xy)

In general, (xy)^i ≠ x^i y^i. In some cases, like (ww)^2 = wwww = w^2 w^2, equality holds, but in others, like (ab)^2 = abab ≠ aabb, it does not.

Concatenations of alphabets

Just as we concatenate words and letters, we can concatenate sets of words and sets of letters. The result is all possible combinations.
    Σ = {a, b}
    Σ^1 = {a, b}
    Σ^2 = Σ · Σ = {a, b} · {a, b} = {aa, ab, ba, bb}
    Σ^0 = {ε}
    Σ^2 · Σ = Σ^3
    Σ^i · Σ^0 = Σ^i
    Σ^2 · Σ^0 = {aa, ab, ba, bb} · {ε} = {aaε, abε, baε, bbε} = {aa, ab, ba, bb} = Σ^2
    Σ^i = {σ_1 σ_2 ... σ_i : σ_j ∈ Σ for all 1 ≤ j ≤ i}

Σ is an alphabet and Σ^1 is the set of words of length 1 over Σ. Σ^i is the set of words of length i.

So what does Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ... mean?
    Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ ... = ⋃_{i≥0} Σ^i
which is all words of length 0 over Σ, all words of length 1, all words of length 2, etc. This is simply all words of any length, thus all possible words you can create using letters from Σ. This is a useful notion, so we use Σ⋆ as shorthand for ⋃_{i≥0} Σ^i. This is known as the Kleene star or Kleene closure.

Defining the length of a word

How do we define the length of a word? Informally, "the number of letters in it". But how do we measure that? Suppose we want an algorithm to determine the "number of letters", but we don't have a length() function. Suppose we can only do two things: remove the first letter, and check whether the word is empty.

Procedural version:

SLIDE 5

    INT length(WORD w) {
        INT i = 0;
        while (w != empty) {
            w = w.pop_front();   // w = w[1:], drop the first letter
            ++i;
        }
        return i;
    }

Recursive version:

    INT length(WORD w) {
        // base case: the empty word has length 0
        if (w == empty) return 0;
        // recursive case: one letter plus the length of the rest
        return 1 + length(w.pop_front());
    }

Suppose that w ∈ Σ⋆. If w = ε it has length zero. Otherwise w ≠ ε, so |w| ≥ 1 and thus w has at least one letter. That means that we can rewrite w as ax where a ∈ Σ and x ∈ Σ⋆. x may now be ε (what would w be if this is true?) but a is always a single letter. So we can consider |w| = |ax| = 1 + |x|. Since |x| is smaller than |w|, we can keep repeating this until we reach ε. As in:
    |abc| = 1 + |bc| = 1 + 1 + |c| = 1 + 1 + 1 + |ε| = 1 + 1 + 1 + 0 = 3

These are called recursive definitions and we use them frequently. They make it easier to prove things, and they relate to inductive proofs (base case and inductive case). Here the definition has a base case and a recursive case.

Recursive definition of the length of a word:
    |w| = 0         if w = ε
    |w| = 1 + |x|   if w = ax where a ∈ Σ and x ∈ Σ⋆
Note that for every w ∈ Σ⋆, exactly one of these two cases applies.

Counting letters

Sometimes we want to count the number of occurrences of a particular letter (e.g., a). We use n_σ : Σ⋆ → ℕ as a function that counts the number of σ letters in the input word.
    n_a(abc) = 1
    n_b(bbc) = 2
    n_a(bbb) = 0
What is a recursive definition for n_a? Let's start with an algorithm.

SLIDE 6

    INPUT: some word w
    OUTPUT: a number n such that n_a(w) = n

    INT i = 0;
    while (w != empty) {
        if (w.front() == 'a') {   // w[0] == 'a'
            ++i;
        }
        w = w.pop_front();        // w = w[1:]
    }
    return i;

The algorithm is similar to the length calculation, except there is a condition on whether to "add one". Let's rewrite it as a recursive algorithm:

    INT n_a(WORD w) {
        if (w == empty) return 0;
        if (w.front() == 'a') return 1 + n_a(w.pop_front());
        return n_a(w.pop_front());
    }

Recursive definition for the number of letters. There are three cases, one base and two recursive. The base case is the empty word, which has no letters and thus no letter σ:
    n_σ(w) = 0              if w = ε
The recursive cases have at least one letter, which may or may not be σ:
    n_σ(w) = 1 + n_σ(x)     if w = σx where x ∈ Σ⋆
    n_σ(w) = 0 + n_σ(x)     if w = σ′x where σ′ ∈ Σ \ {σ} and x ∈ Σ⋆

Reverse of a word

The reverse of a word is the word with its letters written backwards. We use w^r to express this. So (abc)^r = cba.

A word x is a palindrome if x = x^r, e.g., redivider, aabbaa, aba, a, ε. The empty word and every single-letter word are always palindromes.

Exercises: give a recursive definition for w^r and for the function pal(w), which returns true if the word is a palindrome (hint: there are 2 base cases and 2 recursive cases).

Substring operations

A prefix is a string that occurs at the start of a word: x is a prefix of z if ∃y ∈ Σ⋆ such that z = xy.

How many prefixes does a have?
    ε is a prefix of a
    a is a prefix of a, because a = aε, meaning y = ε for z = xy.
A strict prefix is a prefix that has at least one letter fewer than the word.
    If our word is aabb then we have the prefixes: {ε, a, aa, aab, aabb}
    If our word is aabb then we have the strict prefixes: {ε, a, aa, aab}
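The prefix definition above can be tried out directly. The sketch below is illustrative only and assumes words are Python strings with "" standing for the empty word; the function names prefixes, strict_prefixes, and is_prefix are hypothetical. is_prefix follows the definition literally by searching for a witness y with z = xy, rather than using a built-in such as str.startswith.

```python
def prefixes(z):
    """All prefixes of z, shortest first: z[:k] for k = 0..|z|."""
    return [z[:k] for k in range(len(z) + 1)]

def strict_prefixes(z):
    """Prefixes with at least one letter fewer than z."""
    return [x for x in prefixes(z) if len(x) < len(z)]

def is_prefix(x, z):
    """x is a prefix of z if z = xy for some word y."""
    # Any witness y must be a suffix of z, so it is enough to try those.
    candidates = (z[k:] for k in range(len(z) + 1))
    return any(x + y == z for y in candidates)

print(prefixes("aabb"))         # ['', 'a', 'aa', 'aab', 'aabb']
print(strict_prefixes("aabb"))  # ['', 'a', 'aa', 'aab']
```

The two printed lists correspond to the prefix and strict prefix sets given for aabb, with '' in place of ε.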

SLIDE 7

The empty word has one prefix: itself. If |w| = n, how many prefixes of w are there? How many strict prefixes?

x is a suffix of z if ∃y ∈ Σ⋆ such that z = yx (z can be written as y concatenated with x). We also have a notion of a strict suffix.

x is a substring of z if ∃w, y ∈ Σ⋆ such that z = wxy. Both w and y can be ε: if w = ε then x is a prefix of z, and if y = ε then x is a suffix of z. For example, let z = abba. With w = a and y = ba, x = b is a substring of z: wxy = a · b · ba = abba = z.

ε is a substring of all words. Is ε a substring of ε?

Languages

A language is a set of strings. A language is written over (or using) an alphabet. The alphabet Σ has the set of words Σ⋆. A language L is a subset of all possible words: L ⊆ Σ⋆.

We can use set operations with languages, e.g., take unions, intersections, and complements. Recall that complement assumes a universe; what is the universe here? The complement of a language is all words not in the language:
    \overline{L} = Σ⋆ \ L
Thus the universe is only the words over the same alphabet, not all words over all possible alphabets.

Example languages:
    Words that have all a's before all b's: {a^i b^j : i, j ≥ 0}
    Words that have all a's before all b's and at least one of each: {a^i b^j : i, j > 0}
    Words that have an even number of a's: {w ∈ {a, b}⋆ : ∃i ∈ ℕ : n_a(w) = 2i}
    Words that have the same number of a's and b's: {x ∈ {a, b}⋆ : n_a(x) = n_b(x)}
    Palindromes over {a, b}: {x ∈ {a, b}⋆ : x = x^r} = {ε, a, b, aa, bb, aba, bab, aaa, bbb, ...}
    Unary primes: {0^i : i is prime}

Exercises:
    write the definition of all words that are one word doubled, like "abab"
    write the definition of all words with all a's before b's before c's, like "aabccc"
    write the definition of all words with all a's and b's before all c's, like "ababacc"
    write the definition of unary numbers congruent to 3 mod 5, like 000 and 00000000.
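Since most of the example languages above are infinite, one way to experiment with them is to treat each language as a membership test and apply it to all words up to some length bound. The following Python sketch does this for three of the examples over {a, b}. It is only an illustration under stated assumptions: the helper names (words_up_to, a_before_b, even_as, palindrome) are hypothetical, "" stands for the empty word, and the bound of 3 is arbitrary.

```python
from itertools import product

def words_up_to(alphabet, max_len):
    """All words over the alphabet of length 0..max_len, shortest first."""
    return ["".join(t) for n in range(max_len + 1)
            for t in product(sorted(alphabet), repeat=n)]

# Membership tests for three of the example languages over {a, b}.
def a_before_b(w):
    # All a's before all b's: over {a, b} this means no b is followed by an a.
    return "ba" not in w

def even_as(w):
    # The number of a's is 2i for some natural number i.
    return w.count("a") % 2 == 0

def palindrome(w):
    # The word equals its own reverse.
    return w == w[::-1]

universe = words_up_to({"a", "b"}, 3)   # finite slice of the universe
for name, member in [("a's before b's", a_before_b),
                     ("even number of a's", even_as),
                     ("palindromes", palindrome)]:
    language = [w for w in universe if member(w)]
    complement = [w for w in universe if not member(w)]
    print(name, language, complement)
```

Within the bounded universe, each word lands in exactly one of the two lists, which mirrors the point that the complement is taken relative to the words over the same alphabet.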

SLIDE 8

Set operations on languages
    L_1 ∪ (L_2 ∩ L_3) = (L_1 ∪ L_2) ∩ (L_1 ∪ L_3)
    L_1 ∩ (L_2 ∪ L_3) = (L_1 ∩ L_2) ∪ (L_1 ∩ L_3)
    \overline{L_1 ∪ L_2} = \overline{L_1} ∩ \overline{L_2}
    \overline{L_1 ∩ L_2} = \overline{L_1} ∪ \overline{L_2}

Other operations
    L^r = {w^r : w ∈ L}
    prefix(L) = {w : ∃x ∈ Σ⋆ : wx ∈ L}
    suffix(L) = {w : ∃x ∈ Σ⋆ : xw ∈ L}
    substring(L) = {w : ∃x, y ∈ Σ⋆ : xwy ∈ L}

Concatenations of languages

If we have languages L_1, L_2 ⊆ Σ⋆ then
    L_1 L_2 = {xy : x ∈ L_1 ∧ y ∈ L_2}
Example: if L_1 = {cross, inter} and L_2 = {section, link} then
    L_1 L_2 = {crosssection, intersection, crosslink, interlink}
Think of this like a cross product: all pairwise combinations are part of the resulting concatenation. The same word can be created in multiple ways, and is only represented once in the set:
    {a, ε} · {a, ε} = {aa, aε, εa, εε} = {aa, a, a, ε} = {aa, a, ε}

Think about unary numbers. What mathematical operation is manifested through a concatenation? Consider {0, 00} (1 and 2) concatenated with itself. The result is {00, 000, 0000}. What about {0} and {0^i : i prime}? This yields {0 · 0^i : i prime} = {0^(i+1) : i prime}.

As with concatenations of alphabets, we have the same exponent notation:
    L^n = LLL...L (n copies of L)
    L^0 = {ε}
    L^1 = L

Kleene star of a language
    L⋆ = L^0 ∪ L^1 ∪ L^2 ∪ ... = ⋃_{i≥0} L^i
We also have positive closure:
    L^+ = L^1 ∪ L^2 ∪ ... = ⋃_{i>0} L^i = L · ⋃_{i≥0} L^i = L · L⋆
We can take the positive closure of an alphabet too: Σ^+.

Question: what is Σ⋆ \ Σ^+ equal to? (Hint: it doesn't matter what Σ is to define it exactly.)
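For finite languages, concatenation and the power notation above can be computed with ordinary sets. The sketch below is a minimal Python illustration, not taken from the slides: concat, power, and star_up_to are hypothetical names, "" stands for the empty word, and star_up_to only builds a finite slice of the Kleene star (the powers 0 through max_power), since the full star is infinite for any language containing a non-empty word.

```python
def concat(l1, l2):
    """Concatenation of languages: every x from l1 followed by every y from l2."""
    # The result is a set, so a word produced in several ways appears once.
    return {x + y for x in l1 for y in l2}

def power(lang, n):
    """n copies of lang concatenated; zero copies gives {empty word}."""
    result = {""}                 # "" stands for the empty word
    for _ in range(n):
        result = concat(result, lang)
    return result

def star_up_to(lang, max_power):
    """Finite slice of the Kleene star: union of powers 0..max_power."""
    out = set()
    for i in range(max_power + 1):
        out |= power(lang, i)
    return out

print(sorted(concat({"cross", "inter"}, {"section", "link"})))
print(sorted(concat({"a", ""}, {"a", ""})))      # three words, not four
print(sorted(concat({"0", "00"}, {"0", "00"})))  # unary 1 and 2 with 1 and 2
print(sorted(star_up_to({"a", "b"}, 2)))
```

Starting power from {""} rather than from the language itself is what makes the zero-copies case come out as the set containing only the empty word.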

SLIDE 9

Theorem 1. L_1(L_2 ∪ L_3) = L_1 L_2 ∪ L_1 L_3

Proof. First, we prove that L_1(L_2 ∪ L_3) ⊆ L_1 L_2 ∪ L_1 L_3.
Choose x ∈ L_1(L_2 ∪ L_3). We can rewrite x as x_1 y_1 where x_1 ∈ L_1 and y_1 ∈ (L_2 ∪ L_3). We have two cases: (i) y_1 ∈ L_2, (ii) y_1 ∈ L_3.
    Case 1: y_1 ∈ L_2. Then x = x_1 y_1 ∈ L_1 L_2 ⊆ L_1 L_2 ∪ L_1 L_3.
    Case 2: y_1 ∈ L_3. Then x = x_1 y_1 ∈ L_1 L_3 ⊆ L_1 L_2 ∪ L_1 L_3.
Second, we prove that L_1 L_2 ∪ L_1 L_3 ⊆ L_1(L_2 ∪ L_3).
Let x ∈ L_1 L_2 ∪ L_1 L_3. We have two cases: (i) x ∈ L_1 L_2, (ii) x ∈ L_1 L_3.
    Case 1: x = x_1 y_1 with x_1 ∈ L_1 and y_1 ∈ L_2, therefore y_1 ∈ L_2 ∪ L_3, therefore x = x_1 y_1 ∈ L_1(L_2 ∪ L_3).
    Case 2: is analogous.

Is this true for L_1(L_2 ∩ L_3)? Give a proof or a counterexample.

To show that a statement is not true in general, it suffices to provide a single counterexample. That means we need an L_1, L_2, and L_3 such that the following statement does not hold:
    L_1(L_2 ∩ L_3) = L_1 L_2 ∩ L_1 L_3
Let L_1 = {a, ε}, L_2 = {b}, and L_3 = {ab}.
    L_2 ∩ L_3 = {}
    L_1(L_2 ∩ L_3) = L_1 · ∅ = ∅
To recall: {a, b} · {a} = {aa, ba}. If there are no words in one of the languages, then there are no resulting words in the concatenation: {a, b} · {} = {}.
    L_1 L_2 = {ab, εb} = {ab, b}
    L_1 L_3 = {aab, εab} = {aab, ab}
    L_1 L_2 ∩ L_1 L_3 = {ab}
Therefore L_1(L_2 ∩ L_3) ≠ L_1 L_2 ∩ L_1 L_3, and the statement cannot be true in general.

Lemma 1. If A ⊆ B and A′ ⊆ B then A ∪ A′ ⊆ B.

Proof. Exercise.

Lemma 2. If A ⊆ B then L · A ⊆ L · B.

Proof. Exercise.
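The languages used in the counterexample for intersection are small enough to compute by hand, and the same computation can be replayed mechanically. The Python sketch below does so, assuming words are strings, "" stands for the empty word, and concat is a hypothetical helper for language concatenation. It also checks the distributive law of Theorem 1 on the same three languages, which is only a sanity check on one instance and not a substitute for the proof.

```python
def concat(l1, l2):
    """Concatenation of languages: every x from l1 followed by every y from l2."""
    return {x + y for x in l1 for y in l2}

l1, l2, l3 = {"a", ""}, {"b"}, {"ab"}   # "" stands for the empty word

# Theorem 1 on this instance: concatenation distributes over union.
assert concat(l1, l2 | l3) == concat(l1, l2) | concat(l1, l3)

# The analogous statement for intersection fails on the same languages.
left = concat(l1, l2 & l3)                  # concatenation with the empty language
right = concat(l1, l2) & concat(l1, l3)
print(left, right)
assert left == set() and right == {"ab"}
```

The left side is empty because concatenating with the empty language produces no words, while the right side contains ab, which arises as a · b in one product and as ε · ab in the other.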

Theorem 2. If L^2 ⊆ L then L^+ ⊆ L.

Recall that:
    L^0 = {ε}
    L^n = L · L^(n−1)
    L^2 = L · L
    L^3 = LLL
    L^+ = L^1 ∪ L^2 ∪ ...

SLIDE 10
Proof. To show that L^+ ⊆ L we need to show that L^i ⊆ L for all i ≥ 1 (by Lemma 1). We prove this by induction on n: if L^2 ⊆ L then L^n ⊆ L for all n ≥ 1.
    Base case: L^1 ⊆ L trivially. L^2 ⊆ L by assumption.
    Inductive hypothesis: assume that L^i ⊆ L for all i < n.
    Inductive step: prove it for L^n. That is, prove that if L^(n−1) ⊆ L then L^n ⊆ L. We have L^n = L · L^(n−1), and we know that L^(n−1) ⊆ L (by our inductive hypothesis). Since L^(n−1) ⊆ L, we have L · L^(n−1) ⊆ L · L (by Lemma 2). This means that L^n ⊆ L^2 ⊆ L.
Therefore, if L^2 ⊆ L, we have L^i ⊆ L for all i ≥ 1, and so L^+ ⊆ L.

Indexing a string

By convention, a string w ∈ Σ⋆ has letters w_1 w_2 ... w_n with w_i ∈ Σ for 1 ≤ i ≤ n and n = |w|. So we can always refer to a particular letter in a word through its subscript. For example, if w = abc then w_1 = a, w_2 = b, and w_3 = c. In programming this would be like w[i], as in if (w[i] == 'a').
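One practical caveat when moving from the subscript convention to code: the notes index letters from 1, while many programming languages, Python included, index strings from 0. The small sketch below bridges the two; the function name letter is hypothetical and the range check is an added assumption about how out-of-range subscripts should be handled.

```python
def letter(w, i):
    """Return the i-th letter of w using the 1-based convention (1 <= i <= |w|)."""
    if not 1 <= i <= len(w):
        raise IndexError(f"index {i} is outside 1..{len(w)}")
    return w[i - 1]          # Python strings are 0-based

w = "abc"
print([letter(w, i) for i in range(1, len(w) + 1)])  # ['a', 'b', 'c']
```

With this helper, letter(w, 1) plays the role of the first subscripted letter, whereas the raw expression w[1] in Python would give the second letter.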