CSE443 Compilers Dr. Carl Alphonce alphonce@buffalo.edu 343 Davis



slide-1
SLIDE 1

CSE443 Compilers

  • Dr. Carl Alphonce

alphonce@buffalo.edu 343 Davis Hall

slide-2
SLIDE 2

Announcements

  • HW-01 posted
  • PR-01 posted
  • Team formation: what is the current status?

slide-3
SLIDE 3

Phases of a compiler

Figure 1.6, page 5 of text

Lexical structure

slide-4
SLIDE 4

Bird's eye view

  • language: a set of strings, e.g. { for, while, x, factorial, … }
  • grammar: rules for generating a language, G = (N, ∑, P, S)
  • regular expression (regex): a form of grammar
  • finite automaton: a machine that recognizes a language
  • C program: generated by FLEX

slide-5
SLIDE 5

Formally, a grammar is defined by 4 items:

  • 1. N, a set of non-terminals
  • 2. ∑, a set of terminals
  • 3. P, a set of productions
  • 4. S, a start symbol

G = (N, ∑, P, S)

languages & grammars

slide-6
SLIDE 6

  • N, a set of non-terminals
  • ∑, a set of terminals (the alphabet), with N ∩ ∑ = {}
  • P, a set of productions of the (right-linear) form
      X -> a
      X -> aY
      X -> ε
    where X ∈ N, Y ∈ N, a ∈ ∑, and ε denotes the empty string
  • S, a start symbol, S ∈ N
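A minimal Python sketch of these four components. It assumes a hypothetical encoding in which each production X -> a, X -> aY, or X -> ε is stored as a (terminal, next non-terminal) pair. The helper `generate`, a name chosen for this example, lists every string of length at most `max_len` that the grammar derives. The example grammar S -> aS | bB, B -> b, which generates a*bb, is also an assumption made for illustration.

```python
from collections import deque

# Hypothetical encoding of P: each non-terminal X maps to a list of pairs.
#   X -> a   is ("a", None)
#   X -> aY  is ("a", "Y")
#   X -> ε   is ("", None)
def generate(productions, start, max_len):
    """Return all strings of length <= max_len derivable from start."""
    results = set()
    # Each queue entry is (terminals derived so far, non-terminal still to expand).
    queue = deque([("", start)])
    while queue:
        prefix, nonterminal = queue.popleft()
        for terminal, next_nt in productions.get(nonterminal, []):
            word = prefix + terminal
            if len(word) > max_len:
                continue  # too long: prune this derivation
            if next_nt is None:
                results.add(word)  # X -> a or X -> ε ends the derivation
            else:
                queue.append((word, next_nt))
    return results


# Example grammar (an assumption): N = {S, B}, ∑ = {a, b}, start symbol S.
P = {
    "S": [("a", "S"), ("b", "B")],
    "B": [("b", None)],
}
print(sorted(generate(P, "S", 4)))  # ['aabb', 'abb', 'bb']
```

The search always terminates because, in a right-linear grammar, every production that keeps a non-terminal also emits one terminal, so derivations longer than `max_len` are cut off.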

languages & grammars

slide-7
SLIDE 7

Lexical Analysis

  • Lexical structure is described by a regular grammar
  • A deterministic finite state machine performs the analysis

slide-8
SLIDE 8

LANGUAGE operations

Base cases:

  • { ε } is a regular language
  • ∀ a ∈ ∑, { a } is a regular language

Recall that ε is the empty string.

slide-9
SLIDE 9

Lⁱ is L concatenated with itself i times: L⁰ = {ε} by definition, L¹ = L, L² = LL, L³ = LLL, etc. L* is the union of all these sets!

LANGUAGE operations

If L and M are regular, so are:

  • union: L ∪ M = { s | s ∈ L or s ∈ M }
  • concatenation: LM = { st | s ∈ L and t ∈ M }
  • Kleene closure: L* = L⁰ ∪ L¹ ∪ L² ∪ … (the union of Lⁱ for i = 0 to ∞)
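These operations can be tried out on finite languages represented as Python sets of strings, with "" standing for ε. This is a sketch, not a general implementation. L* is infinite whenever L contains a non-empty string, so the hypothetical helper `kleene_upto` computes only L⁰ ∪ … ∪ Lᵏ for a chosen k.

```python
def union(L, M):
    """L ∪ M = { s | s ∈ L or s ∈ M }"""
    return L | M


def concat(L, M):
    """LM = { st | s ∈ L and t ∈ M }"""
    return {s + t for s in L for t in M}


def power(L, i):
    """Lⁱ: L concatenated with itself i times, with L⁰ = {ε}."""
    result = {""}                 # "" plays the role of ε
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene_upto(L, k):
    """L⁰ ∪ L¹ ∪ ... ∪ Lᵏ, a finite part of the (usually infinite) L*."""
    result = set()
    for i in range(k + 1):
        result |= power(L, i)
    return result


print(sorted(union({"a"}, {"b"})))        # ['a', 'b']
print(sorted(concat({"a", "b"}, {"c"})))  # ['ac', 'bc']
print(sorted(kleene_upto({"a"}, 3)))      # ['', 'a', 'aa', 'aaa']
```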

slide-10
SLIDE 10

Example of L*

Suppose L is {a, bb}. Then:

  • L⁰ = {ε}, by definition
  • L¹ = L = {a, bb}
  • L² = LL = {aa, abb, bba, bbbb}
  • L³ = LLL = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
  • L⁴ = … and so on
  • L* = L⁰ ∪ L¹ ∪ L² ∪ … = {ε, a, bb, aa, abb, bba, bbbb, aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb, … }
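The table of powers can be checked mechanically. In this Python sketch, `itertools.product(..., repeat=i)` enumerates every sequence of i strings drawn from L, and joining each sequence yields one member of Lⁱ. Collecting the results in a set lists each string once. The name `power` is chosen for this example.

```python
from itertools import product

L = {"a", "bb"}


def power(L, i):
    """Lⁱ: every way to choose i strings from L (repeats allowed), joined in order."""
    return {"".join(parts) for parts in product(sorted(L), repeat=i)}


for i in range(4):
    print(i, sorted(power(L, i)))
# 0 ['']
# 1 ['a', 'bb']
# 2 ['aa', 'abb', 'bba', 'bbbb']
# 3 ['aaa', 'aabb', 'abba', 'abbbb', 'bbaa', 'bbabb', 'bbbba', 'bbbbbb']
```

With `repeat=0`, `product` yields a single empty tuple, which is why L⁰ comes out as {ε}.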

slide-11
SLIDE 11

REGular EXpression (regex): inductive definition, given an alphabet ∑ (ℒ(r) denotes the language that regex r denotes)

  • ε is a regex, and ℒ(ε) = {ε}
  • for each a ∈ ∑, a is a regex, and ℒ(a) = {a}

slide-12
SLIDE 12

Regular expressions (regex): inductive definition

Assume r and s are regexes.

  • r|s is a regex denoting ℒ(r) ∪ ℒ(s)
  • rs is a regex denoting ℒ(r)ℒ(s)
  • r* is a regex denoting (ℒ(r))*
  • (r) is a regex denoting ℒ(r)

Precedence: Kleene closure > concatenation > union

Associativity: all left-associative (this minimizes the use of parentheses: (r|s)|t = r|s|t)
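One way to make the inductive definition concrete is a small abstract syntax tree with one class per rule, plus a function that computes the part of ℒ(r) made up of strings of length at most n. The bound keeps r* finite. The class names and the function `lang` are hypothetical, chosen for this sketch. Precedence and associativity decide the tree shape: (a|b)*abb becomes a left-nested chain of concatenations, and ab* concatenates a with b* rather than starring ab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Eps:       # ε
    pass


@dataclass(frozen=True)
class Sym:       # a, for a ∈ ∑
    a: str


@dataclass(frozen=True)
class Alt:       # r|s
    r: object
    s: object


@dataclass(frozen=True)
class Cat:       # rs
    r: object
    s: object


@dataclass(frozen=True)
class Star:      # r*
    r: object


def lang(e, n):
    """The strings of length <= n in ℒ(e)."""
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n) if len(x + y) <= n}
    if isinstance(e, Star):
        inner = lang(e.r, n)
        result = {""}
        while True:  # keep appending one more piece until nothing new fits
            bigger = result | {x + y for x in result for y in inner if len(x + y) <= n}
            if bigger == result:
                return result
            result = bigger
    raise TypeError(f"not a regex: {e!r}")


a, b = Sym("a"), Sym("b")
abb = Cat(Cat(Cat(Star(Alt(a, b)), a), b), b)   # (a|b)*abb, left-associative
print(sorted(lang(abb, 4)))                     # ['aabb', 'abb', 'babb']
print(sorted(lang(Cat(a, Star(b)), 3)))         # ab*: ['a', 'ab', 'abb']
```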

slide-13
SLIDE 13

Algebraic laws

Assume r, s, and t are regexes.

  • Commutativity: r|s = s|r
  • Associativity: r|(s|t) = (r|s)|t and r(st) = (rs)t
  • Distributivity: r(s|t) = rs|rt and (s|t)r = sr|tr
  • Identity: εr = rε = r
  • Idempotency: r** = r*
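The laws can be spot-checked, though not proved, on small finite languages. This Python sketch truncates every language at a length bound N so that r* stays finite, and it uses Python's set `|` as language union. The sample languages r, s, t and the bound N are assumptions chosen for illustration.

```python
N = 6  # compare languages only on strings of length <= N (keeps r* finite)


def cat(X, Y):
    """Concatenation XY, truncated to strings of length <= N."""
    return {x + y for x in X for y in Y if len(x + y) <= N}


def star(X):
    """Strings of length <= N in X*, found by iterating to a fixed point."""
    result = {""}
    while True:
        bigger = result | cat(result, X)
        if bigger == result:
            return result
        result = bigger


EPS = {""}                               # ℒ(ε)
r, s, t = {"a"}, {"bb"}, {"ab", "b"}     # sample languages chosen for illustration

assert r | s == s | r                               # commutativity of |
assert r | (s | t) == (r | s) | t                   # associativity of |
assert cat(r, cat(s, t)) == cat(cat(r, s), t)       # associativity of concatenation
assert cat(r, s | t) == cat(r, s) | cat(r, t)       # distributivity (left)
assert cat(s | t, r) == cat(s, r) | cat(t, r)       # distributivity (right)
assert cat(EPS, r) == cat(r, EPS) == r              # ε is an identity for concatenation
assert star(star(t)) == star(t)                     # idempotency of *
assert cat(r, s) != cat(s, r)                       # concatenation is NOT commutative
print("all checks passed for strings up to length", N)
```

The last check is a reminder that commutativity holds for | only: ab differs from ba.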

slide-14
SLIDE 14

We can describe a regular language using a regular expression

slide-15
SLIDE 15

The language denoted by a regular expression can be recognized by a finite state machine. Machines:

  • NFA: nondeterministic finite automaton
  • DFA: deterministic finite automaton

slide-16
SLIDE 16

Process of building lexical analyzer

language

1) spell out the language

slide-17
SLIDE 17

Process of building lexical analyzer

language → regex

2) formulate a regular expression

slide-18
SLIDE 18

Process of building lexical analyzer

language → regex → NFA

3) build an NFA

slide-19
SLIDE 19

Process of building lexical analyzer

language → regex → NFA → DFA

4) transform NFA to DFA

slide-20
SLIDE 20

Process of building lexical analyzer

language → regex → NFA → DFA → DFA (minimal)

5) transform DFA to a minimal DFA

slide-21
SLIDE 21

Process of building lexical analyzer

language → regex → NFA → DFA → DFA (minimal)

character stream → lexical analyzer (minimal DFA) → token stream

6) The minimal DFA is our lexical analyzer

slide-22
SLIDE 22

Focus for today

regex → NFA

slide-23
SLIDE 23

Nondeterministic Finite Automata (NFA)

  • A finite set of states S
  • An alphabet ∑, with ε ∉ ∑
  • δ ⊆ S × (∑ ∪ {ε}) × 𝒫(S) (transition function; 𝒫(S) is the set of all subsets of S)
  • s0 ∈ S (a single start state)
  • F ⊆ S (a set of final or accepting states)

slide-24
SLIDE 24

Deterministic Finite Automata (DFA)

  • A finite set of states S
  • An alphabet ∑, with ε ∉ ∑
  • δ ⊆ S × ∑ × S (transition function)
  • s0 ∈ S (a single start state)
  • F ⊆ S (a set of final or accepting states)
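Because δ maps each (state, symbol) pair to exactly one state and there are no ε-moves, running a DFA takes a single table lookup per input symbol. The Python sketch below uses a four-state DFA for (a|b)*abb chosen for illustration (it is not taken from the slides). The names `DELTA`, `START`, `FINAL`, and `accepts` are hypothetical.

```python
# δ for a DFA recognizing (a|b)*abb, keyed by (state, symbol).
# Assumed numbering: state k means "the longest suffix read so far that is
# also a prefix of abb has length k".
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
FINAL = {3}


def accepts(word):
    state = START
    for ch in word:
        if (state, ch) not in DELTA:
            return False          # symbol outside ∑: reject
        state = DELTA[(state, ch)]
    return state in FINAL


print([accepts(w) for w in ["abb", "babb", "ab", "abba"]])  # [True, True, False, False]
```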

slide-25
SLIDE 25

A state is a circle with its state number written inside.

slide-26
SLIDE 26

The initial state has an arrow from nowhere pointing in. State 0 is often the initial state.
slide-27
SLIDE 27


A final state is drawn with a double circle.

slide-28
SLIDE 28

Arrows are labeled with ε or with a symbol a ∈ ∑.

[Figure: a transition labeled ε, and a transition labeled a, for each a ∈ ∑]

slide-29
SLIDE 29

Regex -> NFA

[Figure: NFAs for ε and for a, for each a ∈ ∑; NFA for s|t, built from N(s) and N(t) with four ε-transitions]

slide-30
SLIDE 30

Regex -> NFA

[Figure: NFA for st, built from N(s) and N(t); NFA for s*, built from N(s) with four ε-transitions]
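The cases above can be sketched in Python as a recursive construction, commonly known as Thompson's construction. Regexes are written as nested tuples such as ("alt", r, s). This encoding, the function names, and the integer state numbering are all assumptions of the sketch. In this sketch, st is built by merging the final state of N(s) with the start state of N(t), which is safe because a start state built this way never has incoming edges. A small simulator is included only so the result can be tested.

```python
from itertools import count

EPS = None  # label used for ε-transitions


def thompson(regex):
    """Build an NFA; return (start, final, edges) with edges as (from, label, to)."""
    ids = count()   # fresh state numbers 0, 1, 2, ...
    edges = []

    def build(e):
        kind = e[0]
        if kind in ("eps", "sym"):               # base cases: ε and a ∈ ∑
            i, f = next(ids), next(ids)
            edges.append((i, EPS if kind == "eps" else e[1], f))
            return i, f
        if kind == "alt":                        # s|t: new start/final, four ε-edges
            s_i, s_f = build(e[1])
            t_i, t_f = build(e[2])
            i, f = next(ids), next(ids)
            edges.extend([(i, EPS, s_i), (i, EPS, t_i), (s_f, EPS, f), (t_f, EPS, f)])
            return i, f
        if kind == "cat":                        # st: merge N(t)'s start into N(s)'s final
            s_i, s_f = build(e[1])
            t_i, t_f = build(e[2])
            edges[:] = [(s_f if src == t_i else src, lab, dst) for src, lab, dst in edges]
            return s_i, t_f
        if kind == "star":                       # s*: new start/final, four ε-edges
            s_i, s_f = build(e[1])
            i, f = next(ids), next(ids)
            edges.extend([(i, EPS, s_i), (s_f, EPS, s_i), (s_f, EPS, f), (i, EPS, f)])
            return i, f
        raise ValueError(f"unknown regex node: {e!r}")

    start, final = build(regex)
    return start, final, edges


def accepts(nfa, word):
    """Simulate the NFA by tracking the set of states it could be in."""
    start, final, edges = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, lab, dst in edges:
                if src == q and lab is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({dst for src, lab, dst in edges if src in current and lab == ch})
    return final in current


a, b = ("sym", "a"), ("sym", "b")
regex = ("cat", ("cat", ("cat", ("star", ("alt", a, b)), a), b), b)   # (a|b)*abb
nfa = thompson(regex)
print(accepts(nfa, "abb"), accepts(nfa, "babb"), accepts(nfa, "ab"))   # True True False
```

Built this way, the NFA for (a|b)*abb has 11 states and 8 ε-transitions.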

slide-31
SLIDE 31

Simple example

static

slide-32
SLIDE 32

Simple example

static

[Figure: NFA for static: a chain of states, numbered up to 6, linked by transitions labeled s, t, a, t, i, c]

slide-33
SLIDE 33

Simple example

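static struct

[Figure: NFA for static and struct: the static chain (transitions labeled s, t, a, t, i, c) and the struct chain (states 7 to 13, transitions labeled s, t, r, u, c, t), joined by four ε-transitions from a new initial state and into a final state F]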
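The same pattern works for any list of keywords: one chain of states per keyword, a new initial state with ε-transitions into each chain, and ε-transitions from the last state of each chain into a shared final state. In this Python sketch, the names `keyword_nfa` and `accepts`, the state names "i" and "F", and the use of "" as the ε label are all assumptions. Chain states come out numbered 0 to 6 for static and 7 to 13 for struct.

```python
from collections import defaultdict

EPS = ""  # label used for ε-transitions in this sketch (input symbols are never "")


def keyword_nfa(words):
    """Union of one chain of states per keyword, like the static/struct example."""
    delta = defaultdict(set)          # (state, label) -> set of next states
    start, final = "i", "F"           # hypothetical names for the new initial/final states
    state = 0
    for w in words:
        delta[(start, EPS)].add(state)            # ε into the first state of this chain
        for ch in w:
            delta[(state, ch)].add(state + 1)     # one transition per character
            state += 1
        delta[(state, EPS)].add(final)            # ε from the chain's last state to F
        state += 1                                # next chain starts at a fresh state
    return start, final, delta


def accepts(nfa, word):
    start, final, delta = nfa

    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for nxt in delta.get((stack.pop(), EPS), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({start})
    for ch in word:
        current = closure({n for q in current for n in delta.get((q, ch), ())})
    return final in current


nfa = keyword_nfa(["static", "struct"])
print([accepts(nfa, w) for w in ["static", "struct", "stat", "structs"]])
# [True, True, False, False]
```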

slide-34
SLIDE 34

Process of building lexical analyzer

language → regex → NFA → DFA → DFA (minimal)

character stream → lexical analyzer (minimal DFA) → token stream

6) The minimal DFA is our lexical analyzer

slide-35
SLIDE 35

Focus above: build a non-deterministic recognizer

regex → NFA

slide-36
SLIDE 36

NFA → DFA

Next step: make the recognizer deterministic

slide-37
SLIDE 37

(a|b)*abb

First, we construct an NFA from this regular expression.

slide-38
SLIDE 38

(a|b)*abb

[Figure: NFA for a]

slide-39
SLIDE 39

(a|b)*abb

[Figure: NFAs for a and b]

slide-40
SLIDE 40

(a|b)*abb

[Figure: NFA for a|b, joining the NFAs for a and b with four ε-transitions]

slide-41
SLIDE 41

(a|b)*abb

[Figure: NFA for (a|b)*, adding four more ε-transitions around the NFA for a|b]

slide-42
SLIDE 42

(a|b)*abb

[Figure: NFA for (a|b)*a]

slide-43
SLIDE 43

(a|b)*abb

[Figure: NFA for (a|b)*ab]

slide-44
SLIDE 44

(a|b)*abb

[Figure: NFA for (a|b)*abb]

slide-45
SLIDE 45

(a|b)*abb

[Figure: complete NFA for (a|b)*abb with its states numbered]

slide-46
SLIDE 46

Operations

  • ε-closure(t) is the set of states reachable from state t using only ε-transitions (including t itself).
  • ε-closure(T) is the set of states reachable from any state t ∈ T using only ε-transitions.
  • move(T, a) is the set of states reachable from any state t ∈ T by following a transition on symbol a ∈ ∑.
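A Python sketch of ε-closure and move on a Thompson-style NFA for (a|b)*abb. The edge list and its state numbering (0 to 10) are assumptions for this example and may not match the numbering in the figure. None marks an ε-edge. Following the usual convention, ε-closure(T) includes T itself, since reaching a state by zero ε-transitions counts.

```python
# Assumed Thompson-style NFA for (a|b)*abb as (from, label, to); None marks ε.
EDGES = [
    (0, None, 1), (0, None, 7),
    (1, None, 2), (1, None, 4),
    (2, "a", 3), (4, "b", 5),
    (3, None, 6), (5, None, 6),
    (6, None, 1), (6, None, 7),
    (7, "a", 8), (8, "b", 9), (9, "b", 10),
]


def eps_closure(T):
    """States reachable from any state in T using only ε-transitions (T included)."""
    stack, closure = list(T), set(T)
    while stack:
        t = stack.pop()
        for src, label, dst in EDGES:
            if src == t and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def move(T, a):
    """States reachable from any state in T by one transition on symbol a."""
    return {dst for src, label, dst in EDGES if src in T and label == a}


A = eps_closure({0})
print(sorted(A))                          # [0, 1, 2, 4, 7]
print(sorted(move(A, "a")))               # [3, 8]
print(sorted(eps_closure(move(A, "a"))))  # [1, 2, 3, 4, 6, 7, 8]
```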

slide-47
SLIDE 47

NFA -> DFA algorithm

(set of states construction - page 153 of text)

INPUT: An NFA N = (S, ∑, δ, s0, F)

OUTPUT: A DFA D = (S', ∑, δ', s0', F') such that ℒ(D) = ℒ(N)

ALGORITHM:

    compute s0' = ε-closure(s0), an unmarked set of states
    set S' = { s0' }
    while there is an unmarked T ∈ S'
        mark T
        for each symbol a ∈ ∑
            let U = ε-closure(move(T, a))
            if U ∉ S', add unmarked U to S'
            add transition: δ'(T, a) = U

F' is the set of those T ∈ S' that contain at least one state in F, i.e. F' = { T ∈ S' | T ∩ F ≠ {} }.
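The algorithm translates almost line for line into Python. This sketch uses an assumed Thompson-style NFA for (a|b)*abb (states 0 to 10, final state 10, numbering chosen for the example). DFA states are frozensets of NFA states so they can be stored in a set and used as dictionary keys, and a queue plays the role of the unmarked sets.

```python
from collections import deque

# Assumed Thompson-style NFA for (a|b)*abb as (from, label, to); None marks ε.
EDGES = [
    (0, None, 1), (0, None, 7), (1, None, 2), (1, None, 4),
    (2, "a", 3), (4, "b", 5), (3, None, 6), (5, None, 6),
    (6, None, 1), (6, None, 7), (7, "a", 8), (8, "b", 9), (9, "b", 10),
]
START, FINAL, SIGMA = 0, {10}, ("a", "b")


def eps_closure(T):
    """ε-closure(T) as a frozenset, so it can serve as a DFA state."""
    stack, closure = list(T), set(T)
    while stack:
        t = stack.pop()
        for src, label, dst in EDGES:
            if src == t and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)


def move(T, a):
    return {dst for src, label, dst in EDGES if src in T and label == a}


def subset_construction():
    s0 = eps_closure({START})        # s0' = ε-closure(s0)
    states = {s0}                    # S'
    unmarked = deque([s0])
    delta = {}                       # δ'
    while unmarked:
        T = unmarked.popleft()       # mark T
        for a in SIGMA:
            U = eps_closure(move(T, a))
            if U not in states:      # U ∉ S': add unmarked U
                states.add(U)
                unmarked.append(U)
            delta[(T, a)] = U        # δ'(T, a) = U
    finals = {T for T in states if T & FINAL}   # F': sets containing a state of F
    return s0, states, delta, finals


s0, states, delta, finals = subset_construction()


def dfa_accepts(word):
    T = s0
    for ch in word:
        T = delta[(T, ch)]
    return T in finals


print(len(states), len(finals))                                     # 5 1
print(dfa_accepts("abb"), dfa_accepts("aabb"), dfa_accepts("ab"))  # True True False
```

For this NFA the construction yields five DFA states, exactly one of which (the set containing NFA state 10) is accepting. If move(T, a) were empty, U would be the empty set and would act as a dead state; that does not happen for this particular NFA.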