SLIDE 1 CSCI 3136 Principles of Programming Languages
Lexical Analysis and Automata Theory - 1
Summer 2013 Faculty of Computer Science Dalhousie University
1 / 78
SLIDE 2 Phases of Compilation
Character Stream
- Scanner (lexical analysis) → Token Stream
- Parser (syntactic analysis) → Parse Tree
- Semantic Analysis and Intermediate Code Generation → Abstract Syntax Tree or Other Intermediate Form
- Machine-independent Code Improvement (Optional) → Modified Intermediate Form
- Target Code Generation → Target Language (e.g., assembly)
- Machine-specific Code Improvement (Optional) → Modified Target Language
Symbol Table
2 / 78
SLIDE 7
Context-free languages are generated by context-free grammars; a parser (PDA) recognizes them.
Regular languages are generated by regular expressions; a scanner (DFA) recognizes them.
Formal Language (a set of strings)
SLIDE 14
Context-free languages are generated by context-free grammars; a parser (PDA) recognizes them.
Regular languages are generated by regular expressions; a scanner (DFA) recognizes them.
Rules:
- 1. Concatenation
- 2. Alternation
- 3. Kleene closure
- 4. Recursion
Regular expressions use rules 1 to 3; context-free grammars add rule 4.
14 / 78
SLIDE 22 Formal Languages
A formal language L is a set of strings over an alphabet Σ.
- Alphabet Σ: set of characters that can be used to form strings (letters, digits, punctuation, ...)
- String: finite sequence of characters
- ε denotes the empty string (string with no letters: "")
- Length |s| of a string s = number of characters in s
  |ε| = 0, |a| = 1, |abc| = 3, ...
- Σ⁰ = set of strings of length 0: Σ⁰ = {ε}
  Σ¹ = set of strings of length 1: Σ¹ = {a, b, c, ...}
  Σ² = set of strings of length 2: Σ² = {aa, ab, ..., ca, ...}
  ...
- Kleene star: Σ* = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ ... (set of all strings over alphabet Σ)
- A formal language L over alphabet Σ is a subset L ⊆ Σ*
22 / 78
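The definitions on the Formal Languages slide can be tried out on a small alphabet. The sketch below is only an illustration: the alphabet {a, b, c}, the helper names `sigma_power` and `sigma_star_up_to`, and the sample language "strings that start with a" are all assumptions made for the example. Since Σ* is infinite, the code only enumerates strings up to a chosen length.

```python
from itertools import product

SIGMA = ("a", "b", "c")  # example alphabet, chosen for illustration


def sigma_power(sigma, k):
    """Return the set of all strings of length k over sigma."""
    return {"".join(chars) for chars in product(sigma, repeat=k)}


def sigma_star_up_to(sigma, max_len):
    """Return the finite part of Sigma* with strings of length <= max_len."""
    result = set()
    for k in range(max_len + 1):
        result |= sigma_power(sigma, k)
    return result


# A sample formal language: the strings over SIGMA that start with "a".
universe = sigma_star_up_to(SIGMA, 3)
language = {s for s in universe if s.startswith("a")}

print(sigma_power(SIGMA, 0))       # the set containing only the empty string
print(len(sigma_power(SIGMA, 2)))  # 3 * 3 = 9 strings of length 2
print(language <= universe)        # a language is a subset of Sigma*
```

The empty string is the only member of the length-0 set, and each longer set has |Σ|ᵏ members, which matches the definitions on the slide.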
SLIDE 32 Regular Expressions (RegEx)
Alphabet Σ = {a, b, c}
- Concatenation: ab
- Alternation (|): ab|c
- Kleene star (*): a*b|c
- Parentheses: a*(b|c)
32 / 78
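The four example expressions from the Regular Expressions slide can be tried with Python's `re` module, whose `|`, `*`, and parentheses behave like the operators on the slide for patterns this simple. This is a minimal sketch; the helper `matches` and the candidate strings are assumptions made for illustration.

```python
import re


def matches(pattern, text):
    """True if the whole of text belongs to the language of pattern."""
    return re.fullmatch(pattern, text) is not None


# Each pattern is paired with a few hand-picked candidate strings.
examples = {
    "ab": ["ab", "a", "abc"],
    "ab|c": ["ab", "c", "ac"],
    "a*b|c": ["b", "aaab", "c", "aac"],
    "a*(b|c)": ["b", "aaab", "c", "aac"],
}

for pattern, candidates in examples.items():
    accepted = [s for s in candidates if matches(pattern, s)]
    print(f"{pattern!r} accepts {accepted}")
```

The string aac separates the last two patterns: a*(b|c) accepts it, while a*b|c does not, because without parentheses the star and the concatenation apply only to the left alternative.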
SLIDE 42 RegEx Operator Precedence
From highest to lowest precedence:
- Parentheses
- Kleene star (*)
- Concatenation
- Alternation (|)
a*(b|c)a*|a is read as ((a*)(b|c)(a*))|a
42 / 78
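One way to check how precedence groups the example expression is to compare it with a fully parenthesized version on all short strings. The sketch below assumes Python's `re` module follows the same precedence for these operators; the names `IMPLICIT`, `EXPLICIT`, `REGROUPED`, and `language` are hypothetical, and the comparison only covers strings up to a fixed length.

```python
import re
from itertools import product

IMPLICIT = "a*(b|c)a*|a"             # the expression from the slide
EXPLICIT = "(((a*)(b|c))(a*))|(a)"   # grouping implied by the precedence rules
REGROUPED = "a*(b|c)(a*|a)"          # what it would mean if | bound tighter


def language(pattern, alphabet="abc", max_len=5):
    """Return all strings up to max_len that the pattern matches in full."""
    accepted = set()
    for k in range(max_len + 1):
        for chars in product(alphabet, repeat=k):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                accepted.add(s)
    return accepted


print(language(IMPLICIT) == language(EXPLICIT))   # same strings accepted
print("a" in language(IMPLICIT))                  # the right alternative matches "a"
print("a" in language(REGROUPED))                 # regrouping loses the string "a"
```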
SLIDE 53 Examples of Regular Expressions-1
Σ = {0, 1}
- 01*: 0 followed by any number of 1s (i.e., {0, 01, 011, 0111, ...})
- 01*01: 0 followed by any number of 1s, then 01 (i.e., {001, 0101, 01101, 011101, ...})
- (0|1)*0: all binary strings that end in 0
- (0|1)00*: all binary strings that start with 0 or 1, followed by one or more 0s
- (0|1)*: all binary strings
53 / 78
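Each description on the Examples-1 slide can be paired with a candidate regular expression and checked by brute force. The sketch below is an illustration under assumptions: the patterns are one plausible expression per description, the predicates are hand-written restatements of the English text, and the check only covers binary strings up to a fixed length.

```python
import re
from itertools import product


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for k in range(max_len + 1):
        for bits in product("01", repeat=k):
            yield "".join(bits)


# (candidate pattern, predicate restating the slide's description)
CASES = [
    ("01*", lambda s: s[:1] == "0" and set(s[1:]) <= {"1"}),
    ("01*01", lambda s: len(s) >= 3 and s[0] == "0"
        and s.endswith("01") and set(s[1:-2]) <= {"1"}),
    ("(0|1)*0", lambda s: s.endswith("0")),
    ("(0|1)00*", lambda s: len(s) >= 2 and set(s[1:]) == {"0"}),
    ("(0|1)*", lambda s: True),
]


def mismatches(pattern, predicate, max_len=8):
    """Return the strings on which the pattern and the predicate disagree."""
    return [s for s in binary_strings(max_len)
            if (re.fullmatch(pattern, s) is not None) != predicate(s)]


for pattern, predicate in CASES:
    print(pattern, "disagreements:", mismatches(pattern, predicate))
```

An empty list of disagreements does not prove the pattern correct, but it rules out mistakes on every string up to the tested length.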
SLIDE 54 Examples of Regular Expressions-2
Given Σ = {a, b, c}, which regular expression describes the set of strings that are at least four characters long and that either start with 'a' and end with 'b', or start with 'b' and end with 'a'?
54 / 78
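One candidate answer to the Examples-2 question can be checked against the English description by enumerating short strings. This is a sketch, not the slide's official answer: the pattern `CANDIDATE` and the helper names are assumptions, and the check is limited to strings of length at most 6.

```python
import re
from itertools import product

# Candidate: a, at least two arbitrary characters, b; or the mirrored form.
CANDIDATE = "a(a|b|c)(a|b|c)(a|b|c)*b|b(a|b|c)(a|b|c)(a|b|c)*a"


def described(s):
    """Restate the question: length >= 4, starts a/ends b or starts b/ends a."""
    if len(s) < 4:
        return False
    return (s[0], s[-1]) in {("a", "b"), ("b", "a")}


def disagreements(max_len=6):
    """Return strings where the candidate and the description differ."""
    bad = []
    for k in range(max_len + 1):
        for chars in product("abc", repeat=k):
            s = "".join(chars)
            if (re.fullmatch(CANDIDATE, s) is not None) != described(s):
                bad.append(s)
    return bad


print(disagreements())
```

Because alternation has the lowest precedence, the top-level `|` separates the two complete cases without extra parentheses.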
SLIDE 65 Examples of Regular Expressions-3
Given Σ = {0, 1}, write down the following Regular Expressions:
- Strings of length 0 or 1:
- Strings of length 2:
- Strings of even length:
- Strings whose length is a multiple of 3:
- Strings except those of length 1:
- Strings except 0 and 1:
- Strings with two consecutive 0s:
- Strings without two consecutive 0s:
- Strings without three consecutive 0s:
- Strings with an even number of 0s:
65 / 78
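The exercises on the Examples-3 slide can be explored by pairing a candidate expression with a predicate and testing both on all short binary strings. The sketch below is a set of possible answers, not the slide's own: every pattern is an assumption, `x?` is used as shorthand for (x|ε), and agreement is only verified up to a fixed length.

```python
import re
from itertools import product

# (description, candidate pattern, predicate restating the description)
CANDIDATES = [
    ("length 0 or 1", "(0|1)?", lambda s: len(s) <= 1),
    ("length 2", "(0|1)(0|1)", lambda s: len(s) == 2),
    ("even length", "((0|1)(0|1))*", lambda s: len(s) % 2 == 0),
    ("length multiple of 3", "((0|1)(0|1)(0|1))*", lambda s: len(s) % 3 == 0),
    ("all except length 1", "((0|1)(0|1)(0|1)*)?", lambda s: len(s) != 1),
    ("all except 0 and 1", "((0|1)(0|1)(0|1)*)?", lambda s: s not in ("0", "1")),
    ("two consecutive 0s", "(0|1)*00(0|1)*", lambda s: "00" in s),
    ("no two consecutive 0s", "(1|01)*0?", lambda s: "00" not in s),
    ("no three consecutive 0s", "(1|01|001)*(0|00)?", lambda s: "000" not in s),
    ("even number of 0s", "1*(01*01*)*", lambda s: s.count("0") % 2 == 0),
]


def mismatches(pattern, predicate, max_len=8):
    """Return binary strings on which pattern and predicate disagree."""
    bad = []
    for k in range(max_len + 1):
        for bits in product("01", repeat=k):
            s = "".join(bits)
            if (re.fullmatch(pattern, s) is not None) != predicate(s):
                bad.append(s)
    return bad


failures = {label: mismatches(pattern, predicate)
            for label, pattern, predicate in CANDIDATES}
for label, bad in failures.items():
    print(f"{label}: {'ok' if not bad else bad}")
```

Over the alphabet {0, 1} the strings of length 1 are exactly 0 and 1, which is why the same candidate serves for both "except those of length 1" and "except 0 and 1".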
SLIDE 72 Applications of Regular Expressions
Two common operations
- Searching for patterns in a text
- Replacing text portions matching a pattern
Used in
- Text editors: emacs, vim, ...
- System tools: shells, grep, lex, flex, sed, awk, ...
- Programming languages: Perl, Ruby, Python, C/C++ (with regex library), Java (with regex package), ...
72 / 78
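The two operations named on the Applications slide, searching and replacing, look roughly like this in Python's `re` module. The sample text and the pattern `gr(a|e)y` are assumptions chosen for the illustration.

```python
import re

text = "gray grey griy"
PATTERN = "gr(a|e)y"  # concatenation plus one alternation

# Searching: collect every substring that matches the pattern.
found = [m.group(0) for m in re.finditer(PATTERN, text)]

# Replacing: rewrite every match to a single spelling.
replaced = re.sub(PATTERN, "grey", text)

print(found)
print(replaced)
```

`re.finditer` is used instead of `re.findall` here because, with a parenthesized group in the pattern, `findall` would return only the group contents rather than the whole match.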
SLIDE 74 RegEx or CFG?
number → integer | decimal
integer → d d*
decimal → d*(.d|d.)d*
d → 0|1|2|3|4|5|6|7|8|9
RegEx: dd*|d*(.d|d.)d*
RegEx or CFG?
number → integer | decimal
integer → d d*
decimal → d*(.d|d.)d* | decimal
d → 0|1|2|3|4|5|6|7|8|9
74 / 78
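The first definition on the slide turns into a single regular expression by substituting each name with its right-hand side. The sketch below mirrors that substitution with Python string formatting; the variable names and the helper `is_number` are assumptions, the literal dot is escaped because `.` is a wildcard in Python's `re`, and `[0-9]` stands in for 0|1|...|9.

```python
import re

D = "[0-9]"                            # d -> 0|1|2|...|9
INTEGER = f"{D}{D}*"                   # integer -> d d*
DECIMAL = rf"{D}*(\.{D}|{D}\.){D}*"    # decimal -> d*(.d|d.)d*
NUMBER = f"{INTEGER}|{DECIMAL}"        # number -> integer | decimal


def is_number(text):
    """True if the whole of text matches the expanded number expression."""
    return re.fullmatch(NUMBER, text) is not None


for sample in ["42", "3.14", ".5", "5.", ".", "", "1.2.3"]:
    print(repr(sample), is_number(sample))
```

The substitution ends after a few steps because no name refers back to itself. In the second definition on the slide, decimal appears on its own right-hand side, so the same expansion would never terminate.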
SLIDE 78 Scanner Implementation
Also known as RegEx ⇒ Scanner (DFA)
RegEx ⇒ NFA (nondeterministic finite automaton) ⇒ DFA (deterministic finite automaton)
78 / 78
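The NFA ⇒ DFA step can be sketched with the standard subset construction, in which each DFA state is a set of NFA states. The example below is an assumption-laden illustration: the NFA is hand-built for (0|1)*0 rather than derived from the expression automatically, it has no ε-transitions, and all names are hypothetical.

```python
# Hand-built NFA for (0|1)*0: q0 loops on 0 and 1, and on reading 0
# it may also guess that this is the last character and move to q1.
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
}
NFA_START = "q0"
NFA_ACCEPT = {"q1"}
ALPHABET = "01"


def subset_construction(delta, start, accept, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    start_set = frozenset([start])
    dfa_delta = {}
    seen = {start_set}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        for ch in alphabet:
            # Union of all NFA moves on ch from any state in the current set.
            target = frozenset(t for s in current
                               for t in delta.get((s, ch), ()))
            dfa_delta[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    dfa_accept = {s for s in seen if s & accept}
    return dfa_delta, start_set, dfa_accept


def dfa_accepts(dfa, text):
    """Run the DFA on text; characters must come from the alphabet."""
    delta, state, accept = dfa
    for ch in text:
        state = delta[(state, ch)]
    return state in accept


DFA = subset_construction(NFA_DELTA, NFA_START, NFA_ACCEPT, ALPHABET)
print(len({state for state, _ in DFA[0]}))   # number of DFA states reached
print(dfa_accepts(DFA, "1010"), dfa_accepts(DFA, "101"))
```

Only the subsets reachable from the start state are built, which is why this NFA yields two DFA states instead of all four possible subsets.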