CSCI 3136 Principles of Programming Languages: Lexical Analysis and Automata Theory - 4



SLIDE 1

CSCI 3136 Principles of Programming Languages

Lexical Analysis and Automata Theory - 4

Summer 2013 Faculty of Computer Science Dalhousie University

1 / 11

SLIDE 2

Regular Expression to NFA (Example)

Regular expression: d∗(.d|d.)d∗

[Figure: NFA built from the regular expression, with transitions labelled d and . joined by ε-transitions.]

SLIDE 3

NFA to DFA (Example)

Regular expression: d∗(.d|d.)d∗

[Figure: the NFA for the regular expression, with its states numbered 1 to 14.]

DFA transition table (each DFA state is a set of NFA states):

State               d                   .
{1}                 {2,3,4,5,8,9}       {6}
{2,3,4,5,8,9}       {2,3,4,5,8,9}       {6,10,11,12,14}
{6}                 {7,11,12,14}        ∅
{6,10,11,12,14}     {7,11,12,13,14}     ∅
{7,11,12,14}        {12,13,14}          ∅
{7,11,12,13,14}     {12,13,14}          ∅
{12,13,14}          {12,13,14}          ∅
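The table above is the result of the subset construction. The following is a minimal Python sketch of that construction, not the course's own code. The slide's 14-state NFA is not recoverable from the text, so the sketch assumes a smaller hand-built NFA for the same expression (states 0 to 5, invented for illustration); its subsets therefore do not match the slide's. The names eps_closure, subset_construction and dfa_accepts are likewise hypothetical.

```python
from collections import deque

EPS = ""  # label used for epsilon moves in this sketch (an assumption)

def eps_closure(states, delta):
    """All NFA states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(start, accepting, delta, alphabet):
    """Build a DFA whose states are sets (frozensets) of NFA states."""
    dfa_start = eps_closure({start}, delta)
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            moved = {t for s in S for t in delta.get((s, a), ())}
            T = eps_closure(moved, delta)
            if not T:
                continue  # empty set: no transition, like the ∅ table entries
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_start, dfa_accepting, dfa_delta

def dfa_accepts(dfa, word):
    """Run the DFA on `word`; a missing transition means rejection."""
    state, accepting, delta = dfa
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hand-built NFA for d*(.d|d.)d* (assumed, not the slide's NFA).
NFA_DELTA = {
    (0, "d"): {0}, (0, EPS): {1},   # leading d*
    (1, "."): {2}, (1, "d"): {3},   # choose .d or d.
    (2, "d"): {4}, (3, "."): {4},
    (4, EPS): {5}, (5, "d"): {5},   # trailing d*
}
DFA = subset_construction(0, {5}, NFA_DELTA, "d.")
```

With this NFA, dfa_accepts(DFA, "dd.dd") holds while dfa_accepts(DFA, "d.d.") does not, because no transition on . leaves an accepting subset.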

SLIDE 4

NFA to DFA (Example)

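[Figure: the DFA drawn with subset states {1}, {2, 3, 4, 5, 8, 9}, {6}, {7, 11, 12, 14}, {6, 10, 11, 12, 14}, {7, 11, 12, 13, 14}, {12, 13, 14} and transitions labelled d and .]

⇐⇒

[Figure: the same DFA with its states relabelled, in the order listed above, 1, 2, 3, 5, 4, 6, 7.]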

SLIDE 5

DFA Minimization Algorithm

  • Create lower-triangular table DISTINCT, initially blank
  • For every pair of states (p,q):

◮ If p is final and q is not, or vice versa, set DISTINCT(p,q) to ε

  • Loop until there is no change in the table contents:

◮ For each pair of states (p,q) and each symbol a in the alphabet:

◮ If DISTINCT(p,q) is empty and DISTINCT(δ(p, a), δ(q, a)) is not empty, set DISTINCT(p, q) to a

  • Combine all states that are not distinct
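The steps above can be sketched in Python roughly as follows. This is an illustrative sketch under stated assumptions: a dictionary stands in for the lower-triangular table, missing transitions are routed to an added sink state called "dead" so that δ is total, and the example transitions are the seven-state DFA from the NFA-to-DFA example with states 4 and 5 numbered as read off the figure order. The name minimize is hypothetical.

```python
from itertools import combinations

def minimize(states, alphabet, delta, finals):
    """Table-filling minimization; returns (groups of merged states, DISTINCT)."""
    DEAD = "dead"  # assumed sink state so every (state, symbol) has a target
    all_states = list(states) + [DEAD]

    def step(p, a):
        return delta.get((p, a), DEAD)

    # DISTINCT maps an unordered pair {p, q} to the mark stored in the table.
    distinct = {}
    for p, q in combinations(all_states, 2):
        if (p in finals) != (q in finals):
            distinct[frozenset((p, q))] = "ε"

    # Loop until a full pass leaves the table unchanged.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(all_states, 2):
            pair = frozenset((p, q))
            if pair in distinct:
                continue
            for a in alphabet:
                succ = frozenset((step(p, a), step(q, a)))
                if succ in distinct:  # successors already known to differ
                    distinct[pair] = a
                    changed = True
                    break

    # Combine states that were never marked distinct.
    groups = []
    for s in states:
        for g in groups:
            if frozenset((s, g[0])) not in distinct:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups, distinct

# Seven-state example DFA (numbering of states 4 and 5 is an assumption).
DELTA = {
    (1, "d"): 2, (1, "."): 3,
    (2, "d"): 2, (2, "."): 4,
    (3, "d"): 5,
    (4, "d"): 6, (5, "d"): 7, (6, "d"): 7, (7, "d"): 7,
}
GROUPS, DISTINCT = minimize([1, 2, 3, 4, 5, 6, 7], "d.", DELTA, {4, 5, 6, 7})
```

On this input the sketch leaves states 1, 2 and 3 separate and merges 4, 5, 6 and 7 into one group. Comparing each state only with the first member of a group is enough because, once the table stops changing, "not distinct" is an equivalence relation.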

SLIDE 6

Minimizing the DFA (Example)

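[Figure: the seven-state DFA (states 1 to 7) and the minimized DFA with states 1, 2, 3 and the merged state {4, 5, 6, 7}.]

DISTINCT table:

2   .
3   d   d
4   ε   ε   ε
5   ε   ε   ε
6   ε   ε   ε
7   ε   ε   ε
    1   2   3   4   5   6

The entries for pairs among states 4, 5, 6 and 7 stay blank, so these states are combined.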

SLIDE 7

Limits of Regular Languages

  • You cannot construct a DFA to recognize these languages:

◮ L = {a^n b^n}, L = {(^n )^n} (parenthesis languages)

◮ L = {set of all syntactically valid C programs}

◮ L = {a^p : p is a prime number}

◮ . . .

  • Not all languages are regular

SLIDE 8

Pumping Lemma for RLs

For any regular language L, there exists a constant n such that any string w ∈ L with |w| ≥ n can be broken into w = xyz such that:

◮ |xy| ≤ n

◮ |y| > 0

◮ xy^k z ∈ L for all k = 0, 1, 2, · · ·

That is: the substring y can be pumped (removed or repeated any number of times, and the resulting string is always in L).
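As a small illustration of the lemma on a language that is regular, the sketch below searches for a split w = xyz of a string in d∗(.d|d.)d∗ that survives pumping. It is a hedged example, not a proof: the constant n = 4 is assumed to be suitable for this language, pumping is only checked for k up to a fixed bound max_k, and the helper names in_language and find_pumpable_split are hypothetical. Membership is tested with Python's re.fullmatch.

```python
import re

def in_language(s):
    """Membership test for d*(.d|d.)d*, with the letter 'd' standing for d."""
    return re.fullmatch(r"d*(\.d|d\.)d*", s) is not None

def find_pumpable_split(w, n, member, max_k=6):
    """Return the first (x, y, z) with w = xyz, |xy| <= n, |y| > 0 and
    x y^k z in the language for k = 0..max_k, or None if no split passes.
    Assumes |w| >= n. Only finitely many k are checked."""
    for i in range(n):                  # i = |x|
        for j in range(i + 1, n + 1):   # j = |xy|
            x, y, z = w[:i], w[i:j], w[j:]
            if all(member(x + y * k + z) for k in range(max_k + 1)):
                return x, y, z
    return None

SPLIT_A = find_pumpable_split("dd.d", 4, in_language)
SPLIT_B = find_pumpable_split(".ddd", 4, in_language)
```

For "dd.d" the first split found pumps the leading d (x is empty). For ".ddd" the dot cannot be part of y, since removing it leaves a string outside the language, so the search settles on the d right after the dot.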

SLIDE 9

Proof Sketch for Pumping Lemma

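Let L be defined by a DFA with n states. If a string w ∈ L has length |w| ≥ n, then reading its first n symbols visits n + 1 states, so by the pigeonhole principle some state q is repeated in the walk. The symbols read between the two visits to q form y: this loop at q can be skipped or repeated without changing the state in which the walk ends.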

SLIDE 10

Proof Sketch for Pumping Lemma

[Figure: the walk of w through the DFA, passing through the repeated state q.]

SLIDE 11

Example: Prove that L = {a^n b^n | n > 0} is not regular

Proof: Assume L is regular. =⇒ The pumping lemma holds.

Choose w = a^m b^m ∈ L, where m is the constant in the pumping lemma. (Note that w must be chosen such that |w| ≥ m.)

The only way to partition w into three parts, w = xyz, is such that x contains 0 or more a’s, y contains 1 or more a’s, and z contains 0 or more a’s concatenated with b^m. This is because of the restrictions |xy| ≤ m and |y| > 0.

So the partition is:

a^m b^m = (a · · · a)(a · · · a)(a · · · a b · · · b), where the three groups are x, y and z, the a’s number m in total, and the b’s number m

y = a^k, k ≥ 1

We have: xyz = a^m b^m

From the Pumping Lemma: xy^i z ∈ L, i = 0, 1, 2, · · ·

Thus: xy^2 z ∈ L

But xy^2 z = xyyz = a^(m+k) b^m, and m + k ≠ m because k ≥ 1, so xy^2 z ∉ L. (A contradiction!)
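The case analysis in this proof can be checked by brute force for small m. The sketch below enumerates every split of a^m b^m allowed by the lemma and confirms that pumping y once more always leaves the language. It only illustrates the argument for the finitely many values of m it tries, and the helper names in_anbn, splits and every_split_fails are hypothetical.

```python
def in_anbn(s):
    """True iff s = a^n b^n for some n > 0."""
    n = len(s) // 2
    return n > 0 and s == "a" * n + "b" * n

def splits(w, m):
    """All (x, y, z) with w = xyz, |xy| <= m and |y| > 0."""
    for i in range(m):                  # i = |x|
        for j in range(i + 1, m + 1):   # j = |xy|
            yield w[:i], w[i:j], w[j:]

def every_split_fails(m):
    """Check that no allowed split of a^m b^m survives pumping with i = 2."""
    w = "a" * m + "b" * m
    for x, y, z in splits(w, m):
        # |xy| <= m forces y to lie inside the block of a's: y = a^k, k >= 1.
        assert set(y) == {"a"}
        pumped = x + y * 2 + z          # equals a^(m+k) b^m
        if in_anbn(pumped):
            return False
    return True

ALL_FAIL = all(every_split_fails(m) for m in range(1, 8))
```

Every split tried has y made only of a’s, so the pumped string has more a’s than b’s, which matches the contradiction reached in the proof.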
