SLIDE 1
Lexical analysis

Lexical analysis checks the correctness of program words and transforms a program into a stream of tokens:
– removes whitespace and comments;
– identifies keywords, identifiers and literals.
SLIDE 2
SLIDE 3
Regular expressions
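Regular expressions over a (finite) alphabet $\Sigma$:
$$E ::= \emptyset \mid \varepsilon \mid a \mid (E\,E) \mid (E \mid E) \mid E^*, \qquad a \in \Sigma.$$
A regular expression $E$ defines a language $L(E) \subseteq \Sigma^*$:
$$
\begin{aligned}
L(\emptyset) &= \emptyset, & L(E_1 E_2) &= \{uv \mid u \in L(E_1),\ v \in L(E_2)\},\\
L(\varepsilon) &= \{\varepsilon\}, & L(E_1 \mid E_2) &= L(E_1) \cup L(E_2),\\
L(a) &= \{a\}, & L(E^*) &= \textstyle\bigcup_{i \ge 0} L(E)^i,
\end{aligned}
$$
where $L^0 = \{\varepsilon\}$ and $L^{n+1} = L\,L^n$. In other words, $L(E^*)$ consists of all concatenations $w_1 \cdots w_i$ of $i \ge 0$ (not necessarily equal) words $w_j \in L(E)$.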
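The definition of $L(E)$ can be made executable if we only ask for the words up to some length $n$, since $L(E^*)$ is usually infinite. The sketch below assumes a hypothetical tuple encoding of regular expressions; the `star` case keeps prepending words of $L(E)$, so it concatenates different words of $L(E)$, not just powers of a single word.

```python
# Assumed encoding of regular expressions as nested tuples:
#   ("empty",) = ∅, ("eps",) = ε, ("sym", "a") = a,
#   ("cat", E1, E2) = E1 E2, ("alt", E1, E2) = E1 | E2, ("star", E) = E*
def lang(e, n):
    """Return the set of all words of L(e) whose length is at most n."""
    kind = e[0]
    if kind == "empty":
        return set()
    if kind == "eps":
        return {""}
    if kind == "sym":
        return {e[1]} if len(e[1]) <= n else set()
    if kind == "alt":                       # L(E1 | E2) = L(E1) ∪ L(E2)
        return lang(e[1], n) | lang(e[2], n)
    if kind == "cat":                       # L(E1 E2) = {uv | u ∈ L(E1), v ∈ L(E2)}
        left, right = lang(e[1], n), lang(e[2], n)
        return {u + v for u in left for v in right if len(u + v) <= n}
    if kind == "star":                      # L(E*) = union of L(E)^i, i ≥ 0
        base = lang(e[1], n)
        result = {""}                       # L(E)^0 = {ε}
        frontier = {""}
        while frontier:                     # prepend one more word of L(E) to the new words
            frontier = {u + w for u in base for w in frontier if len(u + w) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")


# Example: a(a|b)*
A_AB_STAR = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "a"), ("sym", "b"))))
```

For instance, `lang(A_AB_STAR, 2)` evaluates to the set `{"a", "aa", "ab"}`.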
SLIDE 4
Regular expressions
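Examples (regular expression and the language it defines):
– $a \mid b$ defines $\{a, b\}$;
– $abba$ defines $\{abba\}$;
– $ab^*a$ defines $\{aa, aba, abba, abbba, \dots\}$;
– $(ab)^*$ defines $\{\varepsilon, ab, abab, ababab, \dots\}$.
To reduce the number of parentheses needed, operators have priorities:
– the closure operator $(\cdot)^*$ has the highest priority;
– the choice operator $(\cdot \mid \cdot)$ has the lowest priority, so concatenation binds tighter than choice (e.g. $a \mid bc$ means $a \mid (bc)$).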
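The examples and the priority rules can be checked with Python's `re` module, whose syntax for `|`, `*` and parentheses coincides with the notation used here. This is only a membership check: `re` is a backtracking matcher, not an implementation of the automata discussed below.

```python
import re

# (pattern, words in the language, words not in the language)
EXAMPLES = [
    ("a|b",   ["a", "b"],             ["ab", ""]),
    ("abba",  ["abba"],               ["abb", "aba"]),
    ("ab*a",  ["aa", "aba", "abba"],  ["abab", "a"]),
    ("(ab)*", ["", "ab", "abab"],     ["aba", "b"]),
]

for pattern, members, non_members in EXAMPLES:
    for w in members:
        assert re.fullmatch(pattern, w), (pattern, w)
    for w in non_members:
        assert not re.fullmatch(pattern, w), (pattern, w)

# Priorities: ab*a means a(b*)a, and a|bc means a|(bc) rather than (a|b)c.
assert re.fullmatch("ab*a", "abba") and not re.fullmatch("ab*a", "abab")
assert re.fullmatch("a|bc", "bc") and not re.fullmatch("a|bc", "ac")
```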
SLIDE 5
Regular expressions
A regular description over the alphabet $\Sigma$ is a set of rules
$$d_1 \to E_1, \quad d_2 \to E_2, \quad \dots, \quad d_n \to E_n,$$
where each $d_i$ is a (unique) name and $E_i$ is a regular expression over the alphabet $\Sigma \cup \{d_1, \dots, d_{i-1}\}$.
Short-hand notation for regular expressions:
– nonempty closure: $E^+ = E E^*$;
– option: $E? = \varepsilon \mid E$;
– character classes: e.g. $[a, b, c] = a \mid b \mid c$ or $[a\text{-}z] = a \mid \dots \mid z$.
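The short-hand forms add no expressive power: each one can be rewritten into the core operators. A minimal sketch that produces Python regex strings built only from concatenation, `|`, `*` and parentheses; the helper names `plus`, `opt`, `char_range` and `char_set` are hypothetical, and the empty alternative in `(|E)` plays the role of $\varepsilon$.

```python
import re

def plus(e: str) -> str:
    """E+ = E E*"""
    return f"({e})({e})*"

def opt(e: str) -> str:
    """E? = ε | E  (the empty alternative stands for ε)"""
    return f"(|{e})"

def char_range(lo: str, hi: str) -> str:
    """[lo-hi] = lo | ... | hi"""
    return "(" + "|".join(re.escape(chr(c)) for c in range(ord(lo), ord(hi) + 1)) + ")"

def char_set(*chars: str) -> str:
    """[a, b, c] = a | b | c"""
    return "(" + "|".join(re.escape(c) for c in chars) + ")"
```

On a few inputs, `plus("x")` and `opt("x")` agree with Python's built-in `x+` and `x?`.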
SLIDE 6
Regular expressions
Examples of regular descriptions:
Identifiers:
  Letter → [a-z, A-Z]
  Digit → [0-9]
  Identifier → Letter (Letter | Digit)*
Numeric constants:
  Sign → (+ | -)?
  Integer → 0 | Sign [1-9] Digit*
  Decimal → Integer "." Digit+
  Real → (Integer | Decimal) "E" Integer
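Because $E_i$ may only mention the names $d_1, \dots, d_{i-1}$, a regular description can be expanded into a single regular expression by substituting the definitions in order. The sketch below does this with Python f-strings; it reads Sign as the optional $(+ \mid -)?$ and keeps the exponent of Real mandatory, as in the rules above. The variable names mirror the rule names and are otherwise hypothetical.

```python
import re

# Each rule only refers to earlier rules, so plain textual substitution terminates.
LETTER     = r"[a-zA-Z]"
DIGIT      = r"[0-9]"
IDENTIFIER = rf"{LETTER}({LETTER}|{DIGIT})*"
SIGN       = r"(\+|-)?"
INTEGER    = rf"(0|{SIGN}[1-9]{DIGIT}*)"
DECIMAL    = rf"{INTEGER}\.{DIGIT}+"
REAL       = rf"({INTEGER}|{DECIMAL})E{INTEGER}"
```

With these definitions, `re.fullmatch(REAL, "1.5E-3")` succeeds, while `"1.5"` is rejected because Real requires an exponent.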
SLIDE 7
Finite automata
A finite automaton is a quintuple $A = \langle Q, \Sigma, \delta, q_0, F \rangle$, where
– $Q$ is a finite set of states;
– $\Sigma$ is a finite alphabet;
– $\delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times Q$ is the transition relation;
– $q_0 \in Q$ is the initial state;
– $F \subseteq Q$ is the set of final states.
A finite automaton is deterministic (DFA) if the transition relation is a function $\delta : Q \times \Sigma \to Q$. Otherwise, the finite automaton is nondeterministic (NFA).
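To make the distinction between DFA and NFA concrete, here is a sketch that stores $\delta$ as a set of triples $(p, a, q)$, with the empty string standing for $\varepsilon$ (an assumed encoding), and checks whether $\delta$ is a total function $Q \times \Sigma \to Q$:

```python
# Assumed encoding: δ is a set of triples (p, a, q); a == "" stands for ε.
def is_deterministic(states, alphabet, delta):
    """True iff δ is a (total) function Q × Σ → Q."""
    if any(a == "" for (_, a, _) in delta):
        return False                       # ε-transitions are not allowed in a DFA
    for q in states:
        for a in alphabet:
            targets = {r for (p, b, r) in delta if p == q and b == a}
            if len(targets) != 1:
                return False               # missing or ambiguous transition
    return True
```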
SLIDE 8
Finite automata
Finite automata can be represented by state transition diagrams:
[Figure: state transition diagram with states $q_0, q_1, q_2$ and transitions labelled $a$, $a$, $b$]
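The finite automaton $A = \langle Q, \Sigma, \delta, q_0, F \rangle$ accepts the language
$$L(A) = \{ w \in \Sigma^* \mid (q_0, w, q_f) \in \delta^*,\ q_f \in F \},$$
where $\delta^* \subseteq Q \times \Sigma^* \times Q$ is the reflexive and transitive closure of the transition relation $\delta$.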
Theorem: The class of languages accepted by finite automata is that of regular languages.
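Membership $w \in L(A)$ can be decided directly from the definition via $\delta^*$ by exploring configurations $(q, i)$, meaning "in state $q$ after reading the first $i$ letters of $w$". A breadth-first sketch, using the assumed triple encoding with the empty string for $\varepsilon$; the example NFA is hypothetical.

```python
from collections import deque

# Assumed encoding: δ is a set of triples (p, a, q); a == "" stands for ε.
def accepts(delta, q0, final, w):
    """Decide w ∈ L(A): search for (q0, w, q_f) ∈ δ* with q_f ∈ F."""
    start = (q0, 0)                          # configuration: (state, letters read)
    seen = {start}
    queue = deque([start])
    while queue:
        q, i = queue.popleft()
        if i == len(w) and q in final:
            return True
        for (p, a, r) in delta:
            if p != q:
                continue
            if a == "":
                nxt = (r, i)                 # ε-transition: read nothing
            elif i < len(w) and w[i] == a:
                nxt = (r, i + 1)             # read the next letter of w
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Example: an NFA for a(a|b)* with states 0..4, state 4 final (assumed for illustration)
NFA_DELTA = {(0, "a", 1), (1, "", 2), (1, "", 4), (2, "a", 3), (2, "b", 3),
             (3, "", 2), (3, "", 4)}
```

Since the set of configurations is finite ($|Q| \cdot (|w| + 1)$), the search always terminates, even with $\varepsilon$-cycles such as $2 \to 3 \to 2$.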
SLIDE 9
Converting a regular expression to an automaton
Thompson's construction converts a regular expression into an NFA. For a regular expression $E$, first construct the "automaton" consisting of a single transition $q_0 \xrightarrow{E} q_f$.
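Then transform the "automaton" using the following rules until all transitions have only simple labels (i.e. $\varepsilon$ or a character):
– $q \xrightarrow{E_1 E_2} p$ becomes $q \xrightarrow{E_1} q_1 \xrightarrow{E_2} p$ for a new state $q_1$;
– $q \xrightarrow{E_1 \mid E_2} p$ becomes the two transitions $q \xrightarrow{E_1} p$ and $q \xrightarrow{E_2} p$;
– $q \xrightarrow{E^*} p$ becomes, for new states $q_1, q_2$, the transitions $q \xrightarrow{\varepsilon} q_1$, $q_1 \xrightarrow{E} q_2$, $q_2 \xrightarrow{\varepsilon} p$, $q_2 \xrightarrow{\varepsilon} q_1$ and $q \xrightarrow{\varepsilon} p$.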
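The rewriting rules can be followed literally: keep a list of transitions that are still labelled by regular expressions and replace them until only $\varepsilon$ and single characters remain. This sketch assumes the tuple encoding `("eps",)`, `("sym", a)`, `("cat", E1, E2)`, `("alt", E1, E2)`, `("star", E)`, and reads the $E^*$ rule as $q \xrightarrow{\varepsilon} q_1 \xrightarrow{E} q_2 \xrightarrow{\varepsilon} p$ plus $q \xrightarrow{\varepsilon} p$ and $q_2 \xrightarrow{\varepsilon} q_1$. Handling `("empty",)` for $\emptyset$ by dropping the transition is an addition of this sketch.

```python
import itertools

def thompson(e):
    """Return (initial state, final states, δ) with δ a set of (p, a, q), a == "" for ε."""
    fresh = itertools.count(2)              # 0 = q0, 1 = qf, new states from 2 on
    pending = [(0, e, 1)]                   # transitions still labelled by regular expressions
    delta = set()                           # finished transitions with simple labels
    while pending:
        q, label, p = pending.pop()
        kind = label[0]
        if kind == "empty":
            continue                        # ∅: no transition at all
        elif kind == "eps":
            delta.add((q, "", p))
        elif kind == "sym":
            delta.add((q, label[1], p))
        elif kind == "cat":                 # q -E1-> q1 -E2-> p
            q1 = next(fresh)
            pending += [(q, label[1], q1), (q1, label[2], p)]
        elif kind == "alt":                 # q -E1-> p and q -E2-> p
            pending += [(q, label[1], p), (q, label[2], p)]
        elif kind == "star":                # q -ε-> q1 -E-> q2 -ε-> p, q -ε-> p, q2 -ε-> q1
            q1, q2 = next(fresh), next(fresh)
            pending.append((q1, label[1], q2))
            delta |= {(q, "", q1), (q2, "", p), (q, "", p), (q2, "", q1)}
        else:
            raise ValueError(f"unknown node {kind!r}")
    return 0, {1}, delta


# Example: a(a|b)*
EXAMPLE = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "a"), ("sym", "b"))))
start, finals, delta = thompson(EXAMPLE)
```

For $a(a \mid b)^*$ the result has seven transitions: $a$ from the initial state, $a$ and $b$ between the two states introduced for the closure, and four $\varepsilon$-transitions.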
SLIDE 10
Converting a regular expression to an automaton
Example:
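The regular expression $a(a \mid b)^*$ is expanded step by step:
– $q_0 \xrightarrow{a(a \mid b)^*} q_f$;
– $q_0 \xrightarrow{a} q_1 \xrightarrow{(a \mid b)^*} q_f$;
– $q_0 \xrightarrow{a} q_1 \xrightarrow{\varepsilon} q_2 \xrightarrow{a \mid b} q_3 \xrightarrow{\varepsilon} q_f$, together with $q_1 \xrightarrow{\varepsilon} q_f$ and $q_3 \xrightarrow{\varepsilon} q_2$;
– finally, $q_2 \xrightarrow{a \mid b} q_3$ is replaced by $q_2 \xrightarrow{a} q_3$ and $q_2 \xrightarrow{b} q_3$.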
SLIDE 14
Constructing DFA
Given an NFA $A = \langle Q, \Sigma, \delta, q_0, F \rangle$, construct an equivalent DFA $A' = \langle Q', \Sigma, \delta', q'_0, F' \rangle$ by the subset construction.
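Auxiliary functions:
– the $\varepsilon$-closure function $\varepsilon\text{-}\mathit{closure} : 2^Q \to 2^Q$,
$$\varepsilon\text{-}\mathit{closure}(S) = \{ p \mid q \in S,\ (q, \varepsilon, p) \in \delta^* \};$$
– the single-step function $\mathit{move} : 2^Q \times \Sigma \to 2^Q$,
$$\mathit{move}(S, a) = \{ p \mid q \in S,\ (q, a, p) \in \delta \}.$$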
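The two auxiliary functions translate almost literally. A sketch with $\delta$ as a set of triples (empty string for $\varepsilon$, an assumed encoding); the $\varepsilon$-closure is computed by a depth-first search over $\varepsilon$-transitions, which yields exactly the states $p$ with $(q, \varepsilon, p) \in \delta^*$ for some $q \in S$.

```python
# Assumed encoding: δ is a set of triples (p, a, q); a == "" stands for ε.
def eps_closure(delta, states):
    """All p with (q, ε, p) ∈ δ* for some q in `states` (reflexive: includes `states`)."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in delta:
            if p == q and a == "" and r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def move(delta, states, a):
    """All p with (q, a, p) ∈ δ for some q in `states` (exactly one a-step)."""
    return frozenset(r for (p, b, r) in delta if p in states and b == a)


# Example: an NFA for a(a|b)* with states 0..4, state 4 final (assumed for illustration)
NFA_DELTA = {(0, "a", 1), (1, "", 2), (1, "", 4), (2, "a", 3), (2, "b", 3),
             (3, "", 2), (3, "", 4)}
```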
SLIDE 15
Constructing DFA
Algorithm:
  Q' := ∅; F' := ∅; δ' := ∅;
  q'₀ := ε-closure({q₀}); U := {q'₀};
  while ∃S ∈ U do
    U := U \ {S}; Q' := Q' ∪ {S};
    foreach a ∈ Σ do
      T := ε-closure(move(S, a));
      if T ∉ U ∪ Q' then U := U ∪ {T};
      δ' := δ' ∪ {(S, a) ↦ T};
    end
  end
  F' := {S ∈ Q' | S ∩ F ≠ ∅};
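A sketch of the subset construction in the same triple encoding. DFA states are `frozenset`s of NFA states so that they can serve as dictionary keys; the names `unmarked` (for $U$), `dstates` (for $Q'$) and `dtrans` (for $\delta'$) are hypothetical.

```python
def subset_construction(delta, q0, final, alphabet):
    """Return (Q', δ', q0', F') with DFA states as frozensets of NFA states."""
    def eps_closure(states):
        closure, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for (p, a, r) in delta:
                if p == q and a == "" and r not in closure:
                    closure.add(r)
                    stack.append(r)
        return frozenset(closure)

    def move(states, a):
        return {r for (p, b, r) in delta if p in states and b == a}

    start = eps_closure({q0})
    dstates, dtrans = set(), {}
    unmarked = {start}                      # U: states found but not yet processed
    while unmarked:
        s = unmarked.pop()
        dstates.add(s)
        for a in alphabet:
            t = eps_closure(move(s, a))
            if t not in unmarked and t not in dstates:
                unmarked.add(t)
            dtrans[(s, a)] = t              # the transition is recorded in every case
    dfinal = {s for s in dstates if s & final}
    return dstates, dtrans, start, dfinal


# Example: an NFA for a(a|b)* with states 0..4, state 4 final (assumed for illustration)
NFA_DELTA = {(0, "a", 1), (1, "", 2), (1, "", 4), (2, "a", 3), (2, "b", 3),
             (3, "", 2), (3, "", 4)}
Q, D, START, FINAL = subset_construction(NFA_DELTA, 0, {4}, "ab")
```

On this NFA the construction yields the states $\{0\}$, $\{1, 2, 4\}$ and $\{2, 3, 4\}$, plus the empty set, which acts as a "dead" state (reached from $\{0\}$ on $b$) and is typically omitted from diagrams.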
SLIDE 16
Constructing DFA
Example:
[Figure: the NFA for $a(a \mid b)^*$ with states $q_0, q_1, q_2, q_3, q_f$ and transitions labelled $a$, $b$ and $\varepsilon$]
SLIDE 17
Constructing DFA
Example:
[Figure: the same NFA together with the first DFA state $q'_0 = \varepsilon\text{-}\mathit{closure}(\{q_0\})$]
SLIDE 18
Constructing DFA
Example:
[Figure: the DFA state $q'_1$ is added, with the transition $q'_0 \xrightarrow{a} q'_1$]
SLIDE 19
Constructing DFA
Example:
[Figure: the DFA state $q'_2$ is added, with the transitions $q'_1 \xrightarrow{a} q'_2$ and $q'_1 \xrightarrow{b} q'_2$]
SLIDE 20
Constructing DFA
Example:
[Figure: the complete DFA, now also with the transitions $q'_2 \xrightarrow{a} q'_2$ and $q'_2 \xrightarrow{b} q'_2$]
SLIDE 22
Minimizing DFA
DFA constructed from the regular expression ❛✭❛ ❥ ❜✮❄:
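[Figure: DFA with states $q_0, q_1, q_2$ and transitions $q_0 \xrightarrow{a} q_1$, $q_1 \xrightarrow{a,b} q_2$, $q_2 \xrightarrow{a,b} q_2$]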
An equivalent smaller DFA:
[Figure: DFA with states $q_0, q_1$ and transitions $q_0 \xrightarrow{a} q_1$, $q_1 \xrightarrow{a,b} q_1$]
SLIDE 23
Minimizing DFA
A DFA is minimal if there is no smaller DFA accepting the same language. For every DFA $A = \langle Q, \Sigma, \delta, q_0, F \rangle$ there exists a unique (up to renaming of states) equivalent minimal DFA $A' = \langle Q', \Sigma, \delta', q'_0, F' \rangle$.
Idea: partition the set of states into equivalence classes.
– States $p, q \in Q$ are equivalent, or indistinguishable, if the automata having them as initial states accept the same language, i.e. for every word $w \in \Sigma^*$, one of them accepts $w$ if and only if the other one does.
– For every letter, the transition function maps equivalent states to equivalent states.
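Equivalence of two states can be tested directly from the definition: explore the pairs of states reached by reading the same word from both, and look for a pair in which exactly one state is final. A sketch for a total DFA stored as a dictionary `{(state, letter): state}` (an assumed encoding); the example is the DFA for $a(a \mid b)^*$ completed with a hypothetical dead state 3.

```python
from collections import deque

def distinguishing_word(delta, final, p, q, alphabet):
    """Return a shortest word accepted from exactly one of p, q; None if p and q are equivalent."""
    seen = {(p, q)}
    queue = deque([(p, q, "")])
    while queue:
        s, t, w = queue.popleft()
        if (s in final) != (t in final):
            return w                       # w distinguishes p and q
        for a in alphabet:
            nxt = (delta[(s, a)], delta[(t, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((*nxt, w + a))
    return None


# Example: DFA for a(a|b)*, final states {1, 2}, dead state 3 (assumed for illustration)
DFA = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 2, (1, "b"): 2,
       (2, "a"): 2, (2, "b"): 2, (3, "a"): 3, (3, "b"): 3}
```

Here states 1 and 2 turn out to be equivalent, while 0 and 3 are distinguished by the word `"a"`.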
SLIDE 24
Minimizing DFA
Minimization algorithm:
– remove all states unreachable from the initial state $q_0$;
– on the remaining set of states, find the coarsest partition $\Pi$ into equivalence classes;
– construct the new automaton $A' = \langle Q', \Sigma, \delta', q'_0, F' \rangle$, where
  – the set of states is $Q' = \Pi$;
  – the initial state is $q'_0 = P_0$, where $P_0 \in \Pi$ and $q_0 \in P_0$;
  – the set of final states is $F' = \{ P \in \Pi \mid P \cap F \neq \emptyset \}$;
  – the transition function is $\delta' = \{ (P_i, a) \mapsto P_j \mid \mathit{move}(P_i, a) \subseteq P_j \}$.
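Given the partition $\Pi$, the quotient automaton $A'$ is assembled as listed. A sketch with classes as `frozenset`s and a total DFA as a dictionary (assumed encodings). Because equivalent states move to equivalent states, the target class can be read off from any representative of $P_i$.

```python
def quotient(delta, q0, final, alphabet, partition):
    """Build (Q', δ', q0', F') from a partition Π of the states into equivalence classes."""
    block_of = {q: block for block in partition for q in block}
    new_q0 = block_of[q0]                   # the class P0 containing q0
    new_final = {b for b in partition if b & final}
    new_delta = {}
    for block in partition:
        rep = next(iter(block))             # any member works: equivalent states agree
        for a in alphabet:
            new_delta[(block, a)] = block_of[delta[(rep, a)]]
    return set(partition), new_delta, new_q0, new_final


# Example: DFA for a(a|b)* with dead state 3, where states 1 and 2 are equivalent
DFA = {(0, "a"): 1, (0, "b"): 3, (1, "a"): 2, (1, "b"): 2,
       (2, "a"): 2, (2, "b"): 2, (3, "a"): 3, (3, "b"): 3}
PARTITION = {frozenset({0}), frozenset({1, 2}), frozenset({3})}
Q2, D2, S2, F2 = quotient(DFA, 0, {1, 2}, "ab", PARTITION)
```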
SLIDE 25
Minimizing DFA
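Naive algorithm for finding the partition:
  P := {F, Q \ F};
  do
    Π := P; P := ∅;
    foreach S ∈ Π do
      V := {S};
      foreach a ∈ Σ do
        V := {R ∩ move_a⁻¹(T) | R ∈ V, T ∈ Π} \ {∅};
      end
      P := P ∪ V;
    end
  until Π = P;
where move_a⁻¹(T) = {q ∈ Q | δ(q, a) ∈ T}. Each class S is split according to the classes its states move to, for all letters jointly, so that P is again a partition.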
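The refinement loop can be sketched in Python. Instead of intersecting $S$ with $\mathit{move}_a^{-1}(T)$ letter by letter, this version groups the states of each class by the tuple of classes their successors fall into, which splits $S$ into the same pieces. The test DFA is the textbook automaton for $(a \mid b)^*abb$ (states 0 to 4, state 4 final), assumed here for illustration.

```python
def naive_partition(states, alphabet, delta, final):
    """Refine {F, Q - F} until no class can be split; δ is a total dict {(q, a): q'}."""
    partition = {frozenset(b) for b in (final, states - final) if b}
    while True:
        block_of = {q: b for b in partition for q in b}
        refined = set()
        for block in partition:
            # states stay together iff, for every letter, they move into the same class
            groups = {}
            for q in block:
                key = tuple(block_of[delta[(q, a)]] for a in alphabet)
                groups.setdefault(key, set()).add(q)
            refined |= {frozenset(g) for g in groups.values()}
        if refined == partition:
            return partition
        partition = refined


# Example: textbook DFA for (a|b)*abb, state 4 final (assumed for illustration)
ABB = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 3, (2, "a"): 1, (2, "b"): 2,
       (3, "a"): 1, (3, "b"): 4, (4, "a"): 1, (4, "b"): 2}
```

On this automaton the loop stops after three refinements with states 0 and 2 in one class.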
SLIDE 26
Minimizing DFA
The naive algorithm tries to split every partition at every iteration.
– In the worst case it has quadratic complexity.
– It is enough to consider only those partitions from which one can move into some split partition.
Hopcroft's algorithm for finding the partition:
– uses a work-list of split partitions that have not been examined yet;
– if a partition that is not in the work-list is split, only one (the smaller) subpartition is put into the work-list, which gives a running time of $O(|\Sigma|\, n \log n)$ for $n$ states.
SLIDE 27
Minimizing DFA
Hopcroft's algorithm:
  Π := {F, Q \ F}; W := Π;
  while ∃S ∈ W do
    W := W \ {S};
    foreach a ∈ Σ do
      P := move_a⁻¹(S);
      foreach R ∈ {T ∈ Π | T ∩ P ≠ ∅, T ⊈ P} do
        R₁ := R ∩ P; R₂ := R \ R₁;
        Π := (Π \ {R}) ∪ {R₁, R₂};
        if R ∈ W then W := (W \ {R}) ∪ {R₁, R₂}
        else if |R₁| ≤ |R₂| then W := W ∪ {R₁}
        else W := W ∪ {R₂};
      end
    end
  end
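A sketch of Hopcroft's algorithm in the same dictionary encoding. The inverse transitions are precomputed so that $\mathit{move}_a^{-1}(S)$ is a union of lookups; the work-list is a plain Python set, so classes are examined in arbitrary order, which does not affect the resulting partition. The test DFA is again the textbook automaton for $(a \mid b)^*abb$, assumed for illustration.

```python
def hopcroft(states, alphabet, delta, final):
    """Coarsest partition of a total DFA δ = {(q, a): q'} into equivalence classes."""
    # inverse[(r, a)] = {q | δ(q, a) = r}
    inverse = {}
    for (q, a), r in delta.items():
        inverse.setdefault((r, a), set()).add(q)

    partition = {frozenset(b) for b in (final, states - final) if b}
    work = set(partition)                   # W: classes still to be used as splitters
    while work:
        s = work.pop()
        for a in alphabet:
            pre = set().union(*(inverse.get((r, a), set()) for r in s))  # move_a^{-1}(S)
            for r in [t for t in partition if t & pre and not t <= pre]:
                r1 = frozenset(r & pre)
                r2 = r - r1
                partition = (partition - {r}) | {r1, r2}
                if r in work:
                    work = (work - {r}) | {r1, r2}
                else:
                    work.add(r1 if len(r1) <= len(r2) else r2)   # only the smaller half
    return partition


# Example: textbook DFA for (a|b)*abb, state 4 final (assumed for illustration)
ABB = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 3, (2, "a"): 1, (2, "b"): 2,
       (3, "a"): 1, (3, "b"): 4, (4, "a"): 1, (4, "b"): 2}
```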
SLIDE 28
Minimizing DFA
Example – minimizing the DFA corresponding to the regular expression $(a \mid b)^*abb$:
[Figure: DFA with states $q_0, q_1, q_2, q_3, q_4$ and transitions labelled $a$ and $b$]
SLIDE 33
Minimizing DFA
Example – minimizing the DFA corresponding to the regular expression $(a \mid b)^*abb$:
[Figure: the minimized DFA with states $q_0, q_1, q_3, q_4$; state $q_2$ has been merged into an equivalent state]
SLIDE 34
Scanner generator Flex
[Figure: flex translates foo.l into lex.yy.c; gcc compiles lex.yy.c into a.out; a.out reads file.foo and produces a stream of tokens]
SLIDE 35
Scanner generator Flex
Format of the input file: a Flex input file has three parts:
definitions
%%
rules
%%
user code
The definitions part consists of:
– C code (included header files, definitions of global variables), enclosed between %{ and %};
– regular descriptions;
– definitions of start conditions.
SLIDE 36
Scanner generator Flex
The rules part consists of a sequence of pairs
pattern action
where the pattern must start without indentation and ends at the first unescaped whitespace character; the action must start on the same line as the pattern. A pattern is an (extended) regular expression; an action is an arbitrary C statement.
– If the action is empty, the input matched by the pattern is discarded.
– If the input does not match any pattern, it is copied to the output.
The third part of the Flex input file is C code, which is copied verbatim to the generated file lex.yy.c.
– It may be absent, in which case the second %% separator is not required either.
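The rule semantics can be imitated with a toy scanner loop. This is neither Flex nor its generated code: it is a hypothetical Python sketch assuming Flex's documented disambiguation (the longest match wins, ties go to the rule listed first), with `None` playing the role of an empty action and unmatched characters echoed to the output. Note that Python's `re` picks the first matching alternative inside a single pattern rather than the longest one.

```python
import re

def scan(rules, text):
    """Toy scanner: rules are (pattern, action) pairs; action None discards the match."""
    compiled = [(re.compile(p), act) for p, act in rules]
    out, pos = [], 0
    while pos < len(text):
        best_len, best_act, found = 0, None, False
        for rx, act in compiled:
            m = rx.match(text, pos)
            # strictly longer wins, so on equal length the earlier rule is kept
            if m and m.end() - pos > best_len:
                best_len, best_act, found = m.end() - pos, act, True
        if not found:
            out.append(text[pos])           # no rule matches: copy one character
            pos += 1
            continue
        if best_act is not None:            # empty action: drop the matched input
            out.append(best_act(text[pos:pos + best_len]))
        pos += best_len
    return "".join(out)


# Hypothetical rules: a keyword, identifiers, and whitespace with an empty action
RULES = [
    (r"if", lambda s: "IF "),
    (r"[a-z]+", lambda s: f"ID({s}) "),
    (r"[ \t]+", None),
]
```

For the input `"if iffy ?"`, the keyword rule wins the tie on `if`, the identifier rule wins on the longer `iffy`, whitespace is dropped and `?` is echoed, giving `"IF ID(iffy) ?"`.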
SLIDE 37