CSCI 3136 Principles of Programming Languages Lexical Analysis and - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CSCI 3136 Principles of Programming Languages

Lexical Analysis and Automata Theory - 1

Summer 2013 Faculty of Computer Science Dalhousie University

1 / 78

slide-2
SLIDE 2

Phases of Compilation

Character Stream

  • Scanner (lexical analysis) ⇒ Token Stream
  • Parser (syntactic analysis) ⇒ Parse Tree
  • Semantic Analysis and Intermediate Code Generation ⇒ Abstract Syntax Tree or Other Intermediate Form
  • Machine-independent Code Improvement (Optional) ⇒ Modified Intermediate Form
  • Target Code Generation ⇒ Target Language (e.g., assembly)
  • Machine-specific Code Improvement (Optional) ⇒ Modified Target Language

Symbol Table
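
The first phase can be illustrated with a minimal sketch of a scanner that turns a character stream into a token stream. The token classes (NUMBER, IDENT, OP, SKIP) and the function name `scan` are hypothetical choices for a toy language; they are not taken from the course.

```python
import re

# Hypothetical token classes for a toy language; each one is a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Turn a character stream into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(scan("x = 3 + 42"))
```

The parser would then consume this token stream instead of raw characters.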

2 / 78

slide-14
SLIDE 14

Context-free Languages are generated by Context-free Grammars; a Parser (PDA) recognizes them.

Regular Languages are generated by Regular Expressions; a Scanner (DFA) recognizes them.

Rules:

  1. Concatenation
  2. Alternation
  3. Kleene closure
  4. Recursion

Rules 1-3 are the rules of regular expressions; context-free grammars add rule 4.
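
To see what recursion adds, here is a small sketch of a recognizer for balanced parentheses, a standard example of a language that is context-free but not regular. The grammar S → ( S ) S | ε and the function name `balanced` are illustrative assumptions, not taken from the slides.

```python
def balanced(s):
    """Recognize the language of the grammar S -> ( S ) S | epsilon."""

    def parse_s(i):
        # Returns the index just past an S starting at i, or None on failure.
        if i < len(s) and s[i] == "(":
            j = parse_s(i + 1)                  # the inner S
            if j is None or j >= len(s) or s[j] != ")":
                return None                     # missing closing parenthesis
            return parse_s(j + 1)               # the trailing S
        return i                                # S -> epsilon consumes nothing

    return parse_s(0) == len(s)


print(balanced("(()())"))  # True
print(balanced("(()"))     # False
```

The function calls itself to handle arbitrarily deep nesting, which mirrors the recursive rule in the grammar.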

14 / 78

slide-22
SLIDE 22

Formal Languages

A formal language L is a set of strings over an alphabet Σ.

  • Alphabet Σ: set of characters that can be used to form strings (letters, digits, punctuation, ...)
  • String: finite sequence of characters
  • ε denotes the empty string (the string with no characters: "")
  • Length |s| of a string s = number of characters in s: |ε| = 0, |a| = 1, |abc| = 3, ...
  • Σ⁰ = set of strings of length 0: Σ⁰ = {ε}
    Σ¹ = set of strings of length 1: Σ¹ = {a, b, c, ...}
    Σ² = set of strings of length 2: Σ² = {aa, ab, ..., ca, ...}
    ...
  • Kleene star: Σ* = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ ... (set of all strings over alphabet Σ)
  • A formal language L over alphabet Σ is a subset L ⊆ Σ*
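
These definitions can be made concrete with a short sketch. Since Σ* is infinite, the code only enumerates it up to a length bound; the function names `sigma_power` and `kleene_star_up_to` and the sample language are hypothetical.

```python
from itertools import product


def sigma_power(alphabet, k):
    """Return the set of all strings of length k over the alphabet."""
    return {"".join(chars) for chars in product(sorted(alphabet), repeat=k)}


def kleene_star_up_to(alphabet, max_len):
    """Return the finite part of the Kleene star: all strings of length 0..max_len."""
    result = set()
    for k in range(max_len + 1):
        result |= sigma_power(alphabet, k)
    return result


sigma = {"a", "b", "c"}
print(sigma_power(sigma, 0))       # only the empty string
print(len(sigma_power(sigma, 2)))  # 9 strings of length 2

# A formal language is any subset of the Kleene star, for example the
# strings that start with 'a' (restricted here to length at most 3).
language = {s for s in kleene_star_up_to(sigma, 3) if s.startswith("a")}
print(sorted(language, key=lambda s: (len(s), s)))
```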

22 / 78

slide-32
SLIDE 32

Regular Expressions (RegEx)

Alphabet Σ = {a, b, c}

  • Concatenation: ab
  • Alternation (|): ab|c
  • Kleene star (*): a*b|c
  • Parentheses: a*(b|c)
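
The example expressions can be tried out with Python's `re` module, whose syntax for these operators matches the notation used here. This is only a sketch: the sample strings and the helper `matches` are assumptions chosen for illustration.

```python
import re

# Each expression from the slide, with a few strings it should match in full.
examples = {
    "ab": ["ab"],
    "ab|c": ["ab", "c"],
    "a*b|c": ["b", "ab", "aab", "c"],
    "a*(b|c)": ["b", "c", "ab", "aac"],
}


def matches(pattern, s):
    """True if the whole string s belongs to the language of pattern."""
    return re.fullmatch(pattern, s) is not None


for pattern, strings in examples.items():
    for s in strings:
        assert matches(pattern, s)

# Parentheses change the language: "ac" is only matched by the last expression.
print(matches("a*b|c", "ac"))    # False
print(matches("a*(b|c)", "ac"))  # True
```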

32 / 78

slide-42
SLIDE 42

RegEx Operator Precedence

From highest to lowest precedence:

  • Parentheses
  • Kleene star (*)
  • Concatenation
  • Alternation (|)

a*(b|c)a*|a is therefore read as ((a*)(b|c)(a*))|a
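
The precedence order can be checked by brute force. The sketch below compares the expression with a fully parenthesized version, and with a deliberately wrong grouping in which alternation binds tighter than concatenation. The names `implicit`, `explicit`, and `wrong` are hypothetical, and the comparison only covers strings up to length 5.

```python
import re
from itertools import product

implicit = re.compile(r"a*(b|c)a*|a")
explicit = re.compile(r"((a*)(b|c)(a*))|(a)")  # grouping implied by precedence
wrong = re.compile(r"a*(b|c)(a*|a)")           # alternation applied too early


def all_strings(alphabet, max_len):
    """Yield every string over alphabet of length 0..max_len, shortest first."""
    for k in range(max_len + 1):
        for chars in product(alphabet, repeat=k):
            yield "".join(chars)


strings = list(all_strings("abc", 5))
same = all(bool(implicit.fullmatch(s)) == bool(explicit.fullmatch(s)) for s in strings)
differ = [s for s in strings if bool(implicit.fullmatch(s)) != bool(wrong.fullmatch(s))]

print(same)    # True
print(differ)  # ['a']
```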

42 / 78

slide-53
SLIDE 53

Examples of Regular Expressions-1

Σ = {0, 1}

  • 01*: 0 followed by any number of 1s (i.e., {0, 01, 011, 0111, ...})
  • (01*)(01): 0 followed by any number of 1s, then 01 (i.e., {001, 0101, 01101, 011101, ...})
  • (0|1)*0: all binary strings that end in 0
  • (0|1)00*: a single 0 or 1, followed by one or more 0s (i.e., {00, 10, 000, 100, ...})
  • (0|1)*: all binary strings (including ε)
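
The sets listed above can be reproduced by enumerating short binary strings and keeping those that match. This is a minimal sketch; the helper name `language` and the length bounds are assumptions.

```python
import re
from itertools import product


def language(pattern, max_len, alphabet="01"):
    """List, shortest first, the strings up to max_len that pattern matches in full."""
    found = []
    for k in range(max_len + 1):
        for chars in product(alphabet, repeat=k):
            s = "".join(chars)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found


print(language("01*", 4))        # ['0', '01', '011', '0111']
print(language("(01*)(01)", 6))  # ['001', '0101', '01101', '011101']
print(language("(0|1)00*", 3))   # ['00', '10', '000', '100']
print(all(s.endswith("0") for s in language("(0|1)*0", 6)))  # True
```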

53 / 78

slide-54
SLIDE 54

Examples of Regular Expressions-2

Given Σ = {a, b, c}, which regular expression describes the set of strings that are at least four characters long and that either start with 'a' and end with 'b', or start with 'b' and end with 'a'?
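
One possible answer (not necessarily the one intended in the lecture) is a(a|b|c)(a|b|c)(a|b|c)*b | b(a|b|c)(a|b|c)(a|b|c)*a: the first and last characters are fixed, and at least two more characters must appear in between. The sketch below checks this candidate against the plain-language description for all strings up to length 6; the names `candidate` and `in_language` are hypothetical.

```python
import re
from itertools import product

ANY = "(a|b|c)"  # any single character of the alphabet
candidate = f"a{ANY}{ANY}{ANY}*b|b{ANY}{ANY}{ANY}*a"


def in_language(s):
    """Direct restatement of the exercise, used as the reference answer."""
    return len(s) >= 4 and (
        (s[0] == "a" and s[-1] == "b") or (s[0] == "b" and s[-1] == "a")
    )


mismatches = [
    "".join(chars)
    for k in range(7)
    for chars in product("abc", repeat=k)
    if bool(re.fullmatch(candidate, "".join(chars))) != in_language("".join(chars))
]
print(mismatches)  # []
```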

54 / 78

slide-65
SLIDE 65

Examples of Regular Expressions-3

Given Σ = {0, 1}, write regular expressions for the following languages:

  • Strings of length 0 or 1:
  • Strings of length 2:
  • Strings of even length:
  • Strings whose length is a multiple of 3:
  • Strings except those of length 1:
  • Strings except 0 and 1:
  • Strings with two consecutive 0s:
  • Strings without two consecutive 0s:
  • Strings without three consecutive 0s:
  • Strings with an even number of 0s:
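
The slides leave these as exercises. The sketch below proposes one candidate expression per item and checks each against a direct membership test for all strings up to length 8. The candidates are assumptions, not the lecture's answers, and other correct expressions exist. In Python's syntax, ε is written as an empty alternative, as in `(|0)`.

```python
import re
from itertools import product

B = "(0|1)"  # any one symbol of the alphabet {0, 1}

# (description, candidate regex, direct membership test used as the oracle)
EXERCISES = [
    ("length 0 or 1", f"(|{B})", lambda s: len(s) <= 1),
    ("length 2", f"{B}{B}", lambda s: len(s) == 2),
    ("even length", f"({B}{B})*", lambda s: len(s) % 2 == 0),
    ("length a multiple of 3", f"({B}{B}{B})*", lambda s: len(s) % 3 == 0),
    ("not of length 1", f"(|{B}{B}{B}*)", lambda s: len(s) != 1),
    ("neither 0 nor 1", f"(|{B}{B}{B}*)", lambda s: s not in ("0", "1")),
    ("contains 00", f"{B}*00{B}*", lambda s: "00" in s),
    ("no 00", "(1|01)*(|0)", lambda s: "00" not in s),
    ("no 000", "(1|01|001)*(|0|00)", lambda s: "000" not in s),
    ("even number of 0s", "(1|01*0)*", lambda s: s.count("0") % 2 == 0),
]

STRINGS = ["".join(c) for k in range(9) for c in product("01", repeat=k)]


def agrees(regex, oracle):
    """True if regex and oracle accept exactly the same strings in STRINGS."""
    return all(bool(re.fullmatch(regex, s)) == oracle(s) for s in STRINGS)


results = {name: agrees(regex, oracle) for name, regex, oracle in EXERCISES}
for name, ok in results.items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```

Over Σ = {0, 1} the strings 0 and 1 are exactly the strings of length 1, which is why two of the items share the same candidate.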

65 / 78

slide-72
SLIDE 72

Applications of Regular Expressions

Two common operations

  • Searching for patterns in a text
  • Replacing text portions matching a pattern

Used in

  • Text editors: emacs, vim, ...
  • System tools: shells, grep, lex, flex, sed, awk, ...
  • Programming languages: Perl, Ruby, Python, C/C++ (with regex library), Java (with regex package), ...
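
Both operations can be sketched with Python's `re` module. The sample text and the date pattern are made up for illustration, and the pattern uses extended notation (`\d`, `{4}`) that `re` provides on top of the basic operators.

```python
import re

text = "Due 2013-05-06, revised 2013-06-17."
date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")  # year-month-day

# Searching: find every portion of the text that matches the pattern.
found = date.findall(text)
print(found)  # [('2013', '05', '06'), ('2013', '06', '17')]

# Replacing: rewrite each match as day/month/year using the captured groups.
swapped = date.sub(r"\3/\2/\1", text)
print(swapped)  # Due 06/05/2013, revised 17/06/2013.
```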

72 / 78

slide-74
SLIDE 74

RegEx or CFG?

number → integer | decimal
integer → d d*
decimal → d*(.d|d.)d*
d → 0|1|2|3|4|5|6|7|8|9

RegEx: dd*|d*(.d|d.)d*

RegEx or CFG?

number → integer | decimal
integer → d d*
decimal → d*(.d|d.)d* | decimal
d → 0|1|2|3|4|5|6|7|8|9
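
For the first set of rules, no name is defined in terms of itself, so each name can be replaced by its definition until a single regular expression remains. The sketch below performs that substitution with Python strings. The variable names follow the rules above; the literal dot must be escaped in Python's syntax, and `[0-9]` is used as shorthand for 0|1|...|9.

```python
import re

d = "[0-9]"                          # d -> 0|1|2|3|4|5|6|7|8|9
integer = f"{d}{d}*"                 # integer -> d d*
decimal = rf"{d}*(\.{d}|{d}\.){d}*"  # decimal -> d*(.d|d.)d*
number = f"({integer})|({decimal})"  # number -> integer | decimal

for s in ["42", "3.14", ".5", "5.", ".", "", "1.2.3"]:
    print(repr(s), re.fullmatch(number, s) is not None)
```

The same substitution never terminates for the second set of rules, because `decimal` appears on its own right-hand side.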

74 / 78

slide-78
SLIDE 78

Scanner Implementation

Also known as RegEx ⇒ Scanner (DFA)

  • RegEx ⇒ NFA (nondeterministic finite automaton)
  • NFA ⇒ DFA (deterministic finite automaton)
  • Minimizing the DFA
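
The NFA ⇒ DFA step is commonly done with the subset construction, sketched below under some assumptions: the NFA for (0|1)*0 is written by hand rather than derived from the expression, the state names and the dictionary representation are hypothetical, and minimization is not shown.

```python
from collections import deque

EPS = ""  # label used in this sketch for ε-transitions (none occur in this NFA)

# Hand-built NFA for (0|1)*0: state -> {symbol -> set of next states}
NFA = {
    "q0": {"0": {"q0", "q1"}, "1": {"q0"}},
    "q1": {},
}
START, ACCEPTING, ALPHABET = "q0", {"q1"}, "01"


def eps_closure(states):
    """All NFA states reachable from `states` using only ε-transitions."""
    stack, closure = list(states), set(states)
    while stack:
        state = stack.pop()
        for nxt in NFA[state].get(EPS, ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)


def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({START})
    dfa, queue = {}, deque([start])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # this DFA state was already expanded
        dfa[current] = {}
        for symbol in ALPHABET:
            moved = set()
            for state in current:
                moved |= NFA[state].get(symbol, set())
            target = eps_closure(moved)
            dfa[current][symbol] = target
            queue.append(target)
    accepting = {s for s in dfa if s & ACCEPTING}
    return start, dfa, accepting


def accepts(word):
    """Run the constructed DFA on word: one table lookup per input symbol."""
    start, dfa, accepting = subset_construction()
    state = start
    for symbol in word:
        state = dfa[state][symbol]
    return state in accepting


print(accepts("110"))  # True
print(accepts("01"))   # False
```

The resulting DFA here has two states, {q0} and {q0, q1}; in general the subset construction can produce redundant states, which is why a minimization step follows.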

78 / 78