CSE443 Compilers
- Dr. Carl Alphonce
alphonce@buffalo.edu
343 Davis Hall

Announcements
HW-01 posted
PR-01 posted
Team formation: what is current status?

Phases of a compiler
Figure 1.6, page 5 of text

Lexical structure
Bird's eye view
{ for, while, x, factorial, … }
language: a set of strings
G = (N, ∑, P, S)
grammar: rules for generating language
regular expression
regex: a form of grammar
finite automaton
a machine for recognizing a language
C program
generated by FLEX
Formally, a grammar is defined by 4 items:
G = (N, ∑, P, S)
N, a set of non-terminals
∑, a set of terminals (alphabet), N ∩ ∑ = {}
P, a set of productions of the form (right linear)
  X -> a
  X -> aY
  X -> ε
  where X ∈ N, Y ∈ N, a ∈ ∑, and ε denotes the empty string
S, a start symbol, S ∈ N
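To make the right-linear production forms concrete, here is a minimal Python sketch that enumerates the strings a small grammar generates. The grammar (S -> aS, S -> b) and the names PRODUCTIONS and generate are assumptions chosen for illustration; they do not come from the slides.

```python
# Hypothetical right-linear grammar for the language a*b:
#   S -> aS
#   S -> b
# Each production is a pair (terminal, next_nonterminal).
# next_nonterminal is None for the form X -> a; the pair ("", None)
# would encode the empty-string production.
PRODUCTIONS = {
    "S": [("a", "S"), ("b", None)],
}

def generate(productions, start, max_len):
    """Return every string of length <= max_len derivable from start."""
    results = set()
    # Each work item: (terminals produced so far, pending non-terminal).
    work = [("", start)]
    while work:
        prefix, nonterminal = work.pop()
        for terminal, nxt in productions[nonterminal]:
            s = prefix + terminal
            if len(s) > max_len:
                continue              # prune derivations that grow too long
            if nxt is None:
                results.add(s)        # no non-terminal left: derivation done
            else:
                work.append((s, nxt))
    return results

print(sorted(generate(PRODUCTIONS, "S", 3), key=len))  # ['b', 'ab', 'aab']
```

Because every production of the form X -> aY emits one terminal, bounding the string length is enough to guarantee that the enumeration stops.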
Recall, ε is the empty string
L^i is L concatenated with itself i times:
  L^0 = {ε}, by definition
  L^1 = L
  L^2 = LL
  L^3 = LLL, etc.
L* is the union of all these sets!
L ∪ M = { s | s ∈ L or s ∈ M }      union
LM = { st | s ∈ L and t ∈ M }       concatenation
L* = ∪_{i=0..∞} L^i                 Kleene closure
Suppose L is {a, bb}
  L^0 = {ε}, by definition
  L^1 = L = {a, bb}
  L^2 = LL = {aa, abb, bba, bbbb}
  L^3 = LLL = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
  L^4 = … and so on …
  L* = ∪_{i=0..∞} L^i = {ε, a, bb, aa, abb, bba, bbbb, aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb, … }
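The example above can be reproduced with a short Python sketch that models languages as sets of strings. The function names concat, power and kleene_upto are assumptions for illustration; since L* is infinite whenever L contains a non-empty string, only the union of the first few powers is computed.

```python
def concat(L, M):
    """LM = { st | s in L and t in M }."""
    return {s + t for s in L for t in M}

def power(L, i):
    """L concatenated with itself i times; power(L, 0) is {""}."""
    result = {""}                 # "" plays the role of the empty string
    for _ in range(i):
        result = concat(result, L)
    return result

def kleene_upto(L, n):
    """Finite approximation of L*: the union of L^0 through L^n."""
    out = set()
    for i in range(n + 1):
        out |= power(L, i)
    return out

L = {"a", "bb"}
print(sorted(power(L, 2)))        # ['aa', 'abb', 'bba', 'bbbb']
print(len(power(L, 3)))           # 8
```

Working with sets means duplicate strings collapse automatically, which is why the third power has exactly 8 members.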
Assume r and s are regexes.
  r|s is a regex denoting ℒ(r) ∪ ℒ(s)
  rs is a regex denoting ℒ(r)ℒ(s)
  r* is a regex denoting (ℒ(r))*
  (r) is a regex denoting ℒ(r)
Precedence: Kleene closure > concatenation > union
Associativity: all left-associative
(minimize use of parentheses: (r|s)|t = r|s|t)
Assume r, s and t are regexes.
  Commutativity: r|s = s|r
  Associativity: r|(s|t) = (r|s)|t and r(st) = (rs)t
  Distributivity: r(s|t) = rs|rt and (s|t)r = sr|tr
  Identity: εr = rε = r
  Idempotency: r** = r*
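These laws can be spot-checked with Python's re module by comparing the sets of strings that both sides fully match. This is a bounded sanity check over short strings, not a proof. The sample regexes r, s, t and the helper names alt, cat, star and language are assumptions for illustration.

```python
import re
from itertools import product

def alt(x, y):                    # x|y
    return f"(?:{x}|{y})"

def cat(x, y):                    # xy
    return f"(?:{x})(?:{y})"

def star(x):                      # x*
    return f"(?:{x})*"

def language(pattern, alphabet="ab", max_len=5):
    """Strings over alphabet, up to max_len long, fully matched by pattern."""
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}

r, s, t = "a", "bb", "ab"         # hypothetical sample regexes
EMPTY = ""                        # pattern matching only the empty string

LAWS = {
    "commutativity of |": (alt(r, s), alt(s, r)),
    "associativity of |": (alt(r, alt(s, t)), alt(alt(r, s), t)),
    "associativity of concatenation": (cat(r, cat(s, t)), cat(cat(r, s), t)),
    "left distributivity": (cat(r, alt(s, t)), alt(cat(r, s), cat(r, t))),
    "right distributivity": (cat(alt(s, t), r), alt(cat(s, r), cat(t, r))),
    "left identity": (cat(EMPTY, r), r),
    "right identity": (cat(r, EMPTY), r),
    "idempotency of *": (star(star(r)), star(r)),
}

for name, (lhs, rhs) in LAWS.items():
    print(name, language(lhs) == language(rhs))
```

re.fullmatch is used rather than re.match so that each pattern is treated as denoting a language: a string belongs to it only if the whole string matches.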
[Roadmap diagram: language, regex, NFA, DFA; the DFA takes a character stream to a token stream. Focus: regex and NFA.]
Nondeterministic finite automaton (NFA):
  A finite set of states S
  An alphabet ∑, ε ∉ ∑
  δ ⊆ S × (∑ ∪ {ε}) × 𝒫(S) (transition function)
  s0 ∈ S (a single start state)
  F ⊆ S (a set of final or accepting states)
Deterministic finite automaton (DFA):
  A finite set of states S
  An alphabet ∑, ε ∉ ∑
  δ ⊆ S × ∑ × S (transition function)
  s0 ∈ S (a single start state)
  F ⊆ S (a set of final or accepting states)
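As a rough illustration of the deterministic case, the following Python sketch runs an automaton over an input string. The transition table is a hypothetical DFA for (a|b)*abb, chosen as an example and not taken from the slides; the names DELTA, START, FINALS and dfa_accepts are likewise assumptions.

```python
# Hypothetical DFA for (a|b)*abb: states 0..3, start state 0, final state 3.
# The transition function is a dict from (state, symbol) to the next state.
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
START = 0
FINALS = {3}

def dfa_accepts(delta, start, finals, text):
    """Follow exactly one transition per input symbol; accept in a final state."""
    state = start
    for ch in text:
        if (state, ch) not in delta:
            return False          # no transition defined: reject
        state = delta[(state, ch)]
    return state in finals

print(dfa_accepts(DELTA, START, FINALS, "aabb"))   # True
print(dfa_accepts(DELTA, START, FINALS, "abba"))   # False
```

Because each (state, symbol) pair has at most one successor, the simulation needs only a single current state; an NFA simulation would have to track a set of states instead.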
A state is a circle with its state number written inside.
Initial state has an arrow from nowhere pointing in. State 0 is
A final state is drawn with a double circle.
Arrows are labeled with ε or a ∈ ∑.
[Diagrams: NFAs built from the component automata N(s) and N(t).]
[Roadmap diagram repeated: language, regex, NFA, DFA; character stream to token stream. Focus: NFA and DFA.]
ε-closure(t) is the set of states reachable from state t using only ε-transitions.
ε-closure(T) is the set of states reachable from any state t ∈ T using only ε-transitions.
move(T,a) is the set of states reachable from any state t ∈ T following a transition on symbol a ∈ ∑.
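The closure and move operations defined above can be sketched in Python as follows. The NFA encoding (a dict from (state, symbol) to a set of states, with None standing for an empty-string transition) and the example automaton for (a|b)*abb are assumptions made for illustration.

```python
EPS = None  # assumed marker for an empty-string transition

# Hypothetical NFA for (a|b)*abb with states 0..10 and accepting state 10.
NFA_DELTA = {
    (0, EPS): {1, 7}, (1, EPS): {2, 4},
    (2, "a"): {3},    (4, "b"): {5},
    (3, EPS): {6},    (5, EPS): {6},
    (6, EPS): {1, 7},
    (7, "a"): {8},    (8, "b"): {9},   (9, "b"): {10},
}

def epsilon_closure(delta, states):
    """States reachable from any state in `states` via empty-string moves only."""
    closure = set(states)         # every state reaches itself with zero moves
    stack = list(states)
    while stack:
        t = stack.pop()
        for u in delta.get((t, EPS), ()):
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return closure

def move(delta, states, a):
    """States reachable from any state in `states` by one transition on a."""
    result = set()
    for t in states:
        result |= delta.get((t, a), set())
    return result

print(sorted(epsilon_closure(NFA_DELTA, {0})))     # [0, 1, 2, 4, 7]
print(sorted(move(NFA_DELTA, {0, 1, 2, 4, 7}, "a")))  # [3, 8]
```

The single-state closure is simply the set version called with a one-element set, so one function covers both definitions.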
(set of states construction - page 153 of text)
INPUT: An NFA N = (S, ∑, δ, s0, F)
OUTPUT: A DFA D = (S', ∑, δ', s0', F') such that ℒ(D) = ℒ(N)
ALGORITHM:
  Compute s0' = ε-closure(s0), an unmarked set of states
  Set S' = { s0' }
  while there is an unmarked T ∈ S'
    mark T
    for each symbol a ∈ ∑
      let U = ε-closure(move(T,a))
      if U ∉ S', add unmarked U to S'
      add transition: δ'(T,a) = U
  F' is the subset of S' all of whose members contain a state in F.
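The algorithm above translates fairly directly into Python. This is a sketch under assumptions: the NFA is encoded as a dict from (state, symbol) to a set of states, None marks an empty-string transition, DFA states are frozensets of NFA states, and the example automaton for (a|b)*abb is hypothetical.

```python
def epsilon_closure(delta, states):
    closure, stack = set(states), list(states)
    while stack:
        for u in delta.get((stack.pop(), None), ()):   # None = empty-string move
            if u not in closure:
                closure.add(u)
                stack.append(u)
    return frozenset(closure)     # hashable, so it can serve as a DFA state

def move(delta, states, a):
    return {u for t in states for u in delta.get((t, a), ())}

def subset_construction(delta, alphabet, start, finals):
    d_start = epsilon_closure(delta, {start})
    d_states = {d_start}
    unmarked = [d_start]          # worklist of unmarked DFA states
    d_delta = {}
    while unmarked:
        T = unmarked.pop()        # removing T from the worklist marks it
        for a in alphabet:
            U = epsilon_closure(delta, move(delta, T, a))
            if U not in d_states: # an empty U would act as a dead state
                d_states.add(U)
                unmarked.append(U)
            d_delta[(T, a)] = U
    d_finals = {T for T in d_states if T & finals}
    return d_states, d_delta, d_start, d_finals

# Hypothetical NFA for (a|b)*abb with states 0..10 and accepting state 10.
NFA_DELTA = {
    (0, None): {1, 7}, (1, None): {2, 4}, (2, "a"): {3}, (4, "b"): {5},
    (3, None): {6}, (5, None): {6}, (6, None): {1, 7},
    (7, "a"): {8}, (8, "b"): {9}, (9, "b"): {10},
}

states, trans, s0, accepting = subset_construction(NFA_DELTA, "ab", 0, {10})
print(len(states), len(accepting))   # 5 1
```

Marking is modeled by the worklist: a set of NFA states is unmarked exactly while it sits in `unmarked`, so the loop ends once every reachable DFA state has been processed.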