Lexical analysis

slide-1
SLIDE 1

Lexical analysis

slide-2
SLIDE 2

Lexical analysis

Lexical analysis checks the correctness of program words and transforms a program into a stream of tokens. It:
– removes whitespace and comments;
– identifies keywords, identifiers and literal constants;
– constructs a symbol table;
– finds line/column numbers of symbols;
– reports lexical errors when necessary.
Lexical analysis is also called scanning, and the corresponding analyser is called a scanner.
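The duties listed above can be illustrated with a minimal hand-written scanner. This is only a sketch under stated assumptions: the toy token classes, the keyword set and the `#` comment syntax are hypothetical and not taken from the slides.

```python
import re

KEYWORDS = {"if", "else", "while"}  # hypothetical keyword set

# Token classes of a hypothetical toy language; earlier alternatives win.
TOKEN_SPEC = [
    ("COMMENT", r"#[^\n]*"),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t]+"),
    ("NUMBER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[-+*/=()]"),
    ("ERROR", r"."),  # any other character is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def scan(text):
    """Return (tokens, symbol table, error messages) for the given text."""
    tokens, symbols, errors = [], {}, []
    line, line_start = 1, 0
    for m in MASTER.finditer(text):
        kind, word = m.lastgroup, m.group()
        col = m.start() - line_start + 1
        if kind == "NEWLINE":
            line += 1
            line_start = m.end()
        elif kind in ("SKIP", "COMMENT"):
            continue  # whitespace and comments are removed
        elif kind == "ERROR":
            errors.append(f"{line}:{col}: unexpected character {word!r}")
        else:
            if kind == "IDENT":
                if word in KEYWORDS:
                    kind = "KEYWORD"
                else:
                    # Symbol table: identifier -> index of first appearance.
                    symbols.setdefault(word, len(symbols))
            tokens.append((kind, word, line, col))
    return tokens, symbols, errors


tokens, symbols, errors = scan("if x1 = 42 # c\ny $ 7\n")
```

Each token carries its class, its text and its line/column position, so a later phase can report errors at the right place.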

slide-3
SLIDE 3

Regular expressions

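Regular expressions over a (finite) alphabet $\Sigma$:

  $E ::= \emptyset \mid \varepsilon \mid a \mid (E\,E) \mid (E \mid E) \mid E^*$, where $a \in \Sigma$.

A regular expression $E$ defines a language $L(E) \subseteq \Sigma^*$:

  $L(\emptyset) = \emptyset$
  $L(\varepsilon) = \{\varepsilon\}$
  $L(a) = \{a\}$
  $L(E_1 E_2) = \{uv \mid u \in L(E_1), v \in L(E_2)\}$
  $L(E_1 \mid E_2) = L(E_1) \cup L(E_2)$
  $L(E^*) = \bigcup_{i \ge 0} L(E)^i$

where, for a language $L$, $L^0 = \{\varepsilon\}$ and $L^{n+1} = \{wv \mid w \in L, v \in L^n\}$.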

slide-4
SLIDE 4

Regular expressions

Examples:

  Regular expression    Defined language
  $a \mid b$            $\{a, b\}$
  $abba$                $\{abba\}$
  $ab^*a$               $\{aa, aba, abba, abbba, \ldots\}$
  $(ab)^*$              $\{\varepsilon, ab, abab, ababab, \ldots\}$

To minimize the number of parentheses needed, operators have priorities:
– the closure operator $(\cdot)^*$ has the highest priority;
– the choice operator $(\cdot \mid \cdot)$ has the lowest priority.
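The example languages can be checked mechanically by enumerating all words of $L(E)$ up to a length bound. The sketch below assumes a hypothetical encoding of regular expressions as nested tuples; the helper names (`sym`, `cat`, `alt`, `star`, `lang`) are illustrative only.

```python
# Regular expressions as nested tuples (an assumed encoding).
EMPTY = ("empty",)
EPS = ("eps",)


def sym(a):
    return ("sym", a)


def cat(e1, e2):
    return ("cat", e1, e2)


def alt(e1, e2):
    return ("alt", e1, e2)


def star(e):
    return ("star", e)


def lang(e, n):
    """Return the words of L(e) whose length is at most n."""
    tag = e[0]
    if tag == "empty":
        return set()
    if tag == "eps":
        return {""}
    if tag == "sym":
        return {e[1]} if n >= 1 else set()
    if tag == "alt":
        return lang(e[1], n) | lang(e[2], n)
    if tag == "cat":
        return {u + v for u in lang(e[1], n) for v in lang(e[2], n)
                if len(u + v) <= n}
    if tag == "star":
        base = lang(e[1], n) - {""}
        result, frontier = {""}, {""}
        while frontier:
            # Append one more word of L(e) to every newly found word.
            frontier = {u + w for u in frontier for w in base
                        if len(u + w) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {tag!r}")


a, b = sym("a"), sym("b")
print(sorted(lang(cat(a, cat(star(b), a)), 4)))  # words of ab*a up to length 4
```

Note that the closure may concatenate different words of $L(E)$, so `lang(star(alt(a, b)), 2)` contains the mixed word `ab`.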

slide-5
SLIDE 5

Regular expressions

A regular description over alphabet $\Sigma$ is a set of rules

  $d_1 \to E_1$
  $d_2 \to E_2$
  $\ldots$
  $d_n \to E_n$

where $d_i$ is a (unique) name and $E_i$ is a regular expression over alphabet $\Sigma \cup \{d_1, \ldots, d_{i-1}\}$.

Short-hand notation for regular expressions:
– nonempty closure: $E^+ = E E^*$;
– option: $E? = \varepsilon \mid E$;
– character classes: e.g. $[a, b, c] = a \mid b \mid c$ or $[a\text{-}z] = a \mid \ldots \mid z$.

slide-6
SLIDE 6

Regular expressions

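Examples of regular descriptions:

Identifiers:
  $\mathit{Letter} \to [a\text{-}z, A\text{-}Z]$
  $\mathit{Digit} \to [0\text{-}9]$
  $\mathit{Identifier} \to \mathit{Letter}\ (\mathit{Letter} \mid \mathit{Digit})^*$

Numeric constants:
  $\mathit{Sign} \to (+ \mid -)?$
  $\mathit{Integer} \to 0 \mid \mathit{Sign}\ [1\text{-}9]\ \mathit{Digit}^*$
  $\mathit{Decimal} \to \mathit{Integer}\ .\ \mathit{Digit}^+$
  $\mathit{Real} \to (\mathit{Integer} \mid \mathit{Decimal})\ E\ \mathit{Integer}$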
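As a rough illustration, the regular descriptions above can be transcribed into Python's `re` syntax, with each name expanded textually into the rules that use it. The variable names and the `matches` helper are assumptions made for this sketch.

```python
import re

# Each rule may refer only to names defined before it.
LETTER = r"[a-zA-Z]"
DIGIT = r"[0-9]"
IDENTIFIER = rf"{LETTER}(?:{LETTER}|{DIGIT})*"

SIGN = r"[+-]?"
INTEGER = rf"(?:0|{SIGN}[1-9]{DIGIT}*)"
DECIMAL = rf"{INTEGER}\.{DIGIT}+"
REAL = rf"(?:{INTEGER}|{DECIMAL})E{INTEGER}"


def matches(pattern, word):
    """True if the whole word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None


print(matches(IDENTIFIER, "x1"), matches(INTEGER, "007"), matches(REAL, "1.5E-3"))
```

Under these rules `007` and `-0` are not integers, because a nonzero integer must start with a digit from 1 to 9.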

slide-7
SLIDE 7

Finite automata

A finite automaton is a quintuple $A = \langle Q, \Sigma, \delta, q_0, F \rangle$, where
– $Q$ is a finite set of states;
– $\Sigma$ is the finite alphabet;
– $\delta \subseteq Q \times (\Sigma \cup \{\varepsilon\}) \times Q$ is the transition relation;
– $q_0 \in Q$ is the initial state;
– $F \subseteq Q$ is a set of final states.
A finite automaton is deterministic (DFA) if the transition relation is a function $\delta : Q \times \Sigma \to Q$. Otherwise, the finite automaton is nondeterministic (NFA).
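The quintuple can be written down directly as a small data structure. The sketch below is one possible encoding, not the slides' own: the class name, the field names and the use of the empty string for $\varepsilon$ are assumptions. The determinism check also accepts a partial transition function, which is a deliberate relaxation of the definition above.

```python
from dataclasses import dataclass

EPS = ""  # assumed encoding of the empty word


@dataclass(frozen=True)
class Automaton:
    states: frozenset
    alphabet: frozenset
    delta: frozenset  # set of triples (q, a, p)
    start: object
    final: frozenset


def is_deterministic(m):
    """True if delta has no EPS-moves and at most one target per (state, letter)."""
    seen = set()
    for (q, a, p) in m.delta:
        if a == EPS or (q, a) in seen:
            return False
        seen.add((q, a))
    return True


dfa = Automaton(
    states=frozenset({"q0", "q1"}),
    alphabet=frozenset("ab"),
    delta=frozenset({("q0", "a", "q1"), ("q1", "a", "q1"), ("q1", "b", "q1")}),
    start="q0",
    final=frozenset({"q1"}),
)
print(is_deterministic(dfa))
```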

slide-8
SLIDE 8

Finite automata

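Finite automata can be represented by state transition diagrams:

[Diagram: state transition diagram with states $q_0, q_1, q_2$ and transitions labelled $a$, $a$, $b$.]

The finite automaton $A = \langle Q, \Sigma, \delta, q_0, F \rangle$ accepts the language

  $L(A) = \{w \in \Sigma^* \mid (q_0, w, q_f) \in \delta^*, q_f \in F\}$

where $\delta^* \subseteq Q \times \Sigma^* \times Q$ is the reflexive and transitive closure of the transition relation $\delta$.

Theorem: The class of languages accepted by finite automata is that of regular languages.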
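Acceptance can be tested without materialising $\delta^*$ by tracking the set of states reachable after each input character. The following is a minimal sketch; the encoding of $\delta$ as a set of triples, the empty string for $\varepsilon$ and the small example automaton are all assumptions made for illustration.

```python
EPS = ""  # assumed encoding of the empty word


def accepts(delta, q0, final, word):
    """True if some path labelled `word` leads from q0 to a final state."""

    def closure(states):
        # Add everything reachable through EPS-transitions only.
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for (s, a, p) in delta:
                if s == q and a == EPS and p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    current = closure({q0})
    for ch in word:
        step = {p for (s, a, p) in delta if s in current and a == ch}
        current = closure(step)
    return bool(current & final)


# Hypothetical NFA for the language a b*: 0 -a-> 1 -EPS-> 2, with a b-loop on 2.
delta = {(0, "a", 1), (1, EPS, 2), (2, "b", 2)}
print(accepts(delta, 0, {2}, "abb"), accepts(delta, 0, {2}, "ba"))
```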

slide-9
SLIDE 9

Converting a regular expression to an automaton

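Thompson's construction for converting a regular expression to an NFA:
– for a regular expression $E$, construct the "automaton" consisting of a single transition from $q_0$ to $q_f$ labelled $E$;
– transform the "automaton" using the following rules until all transitions have only simple labels (i.e. $\varepsilon$ or a character).

[Diagram: rewrite rules. A transition from $q$ to $p$ labelled $E_1 E_2$ becomes two transitions $q \to q_1 \to p$ labelled $E_1$ and $E_2$; a transition labelled $E_1 \mid E_2$ becomes two parallel transitions from $q$ to $p$ labelled $E_1$ and $E_2$; a transition labelled $E^*$ is replaced using new states $q_1, q_2$, one transition labelled $E$ and four $\varepsilon$-transitions.]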
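The rewriting process can be sketched with a work list of transitions that still carry compound labels. The tuple encoding of expressions and the function name are hypothetical, and the exact wiring of the four $\varepsilon$-transitions in the closure rule is an assumption (one common arrangement), since the original diagram is not recoverable.

```python
from itertools import count

EPS = ""  # assumed encoding of the empty word


def thompson(expr):
    """Return (delta, start, final) for a regex given as nested tuples.

    Nodes: ("sym", c), ("eps",), ("empty",), ("cat", e1, e2),
    ("alt", e1, e2), ("star", e).
    """
    fresh = count(1)
    start, final = "q0", "qf"
    work = [(start, expr, final)]  # transitions with compound labels
    delta = set()                  # finished transitions (q, a, p)
    while work:
        q, e, p = work.pop()
        tag = e[0]
        if tag == "sym":
            delta.add((q, e[1], p))
        elif tag == "eps":
            delta.add((q, EPS, p))
        elif tag == "empty":
            pass  # a transition labelled with the empty language is dropped
        elif tag == "cat":
            q1 = f"q{next(fresh)}"
            work += [(q, e[1], q1), (q1, e[2], p)]
        elif tag == "alt":
            work += [(q, e[1], p), (q, e[2], p)]
        elif tag == "star":
            q1, q2 = f"q{next(fresh)}", f"q{next(fresh)}"
            # Assumed wiring: enter, repeat, leave, and skip the loop entirely.
            delta |= {(q, EPS, q1), (q2, EPS, q1), (q2, EPS, p), (q, EPS, p)}
            work.append((q1, e[1], q2))
        else:
            raise ValueError(f"unknown node {tag!r}")
    return delta, start, final


# a(a|b)*
expr = ("cat", ("sym", "a"), ("star", ("alt", ("sym", "a"), ("sym", "b"))))
delta, start, final = thompson(expr)
```

For `a(a|b)*` this yields seven transitions over the five states `q0`, `q1`, `q2`, `q3` and `qf`.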

slide-10
SLIDE 10

Converting a regular expression to an automaton

Example (built up step by step over slides 10-13):

[Diagram: Thompson's construction for $a(a \mid b)^*$. The single transition from $q_0$ to $q_f$ labelled $a(a \mid b)^*$ is expanded in stages into an NFA with states $q_0, q_1, q_2, q_3, q_f$ whose transitions are labelled $a$, $b$ and $\varepsilon$.]

slide-14
SLIDE 14

Constructing DFA

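Given an NFA $A = \langle Q, \Sigma, \delta, q_0, F \rangle$, construct an equivalent DFA $A' = \langle Q', \Sigma, \delta', q_0', F' \rangle$ by subset construction.

Auxiliary functions:
– the $\varepsilon$-closure function $\varepsilon\text{-closure} : 2^Q \to 2^Q$,
  $\varepsilon\text{-closure}(S) = \{p \mid q \in S, (q, \varepsilon, p) \in \delta^*\}$;
– the single step function $\mathit{move} : 2^Q \times \Sigma \to 2^Q$,
  $\mathit{move}(S, a) = \{p \mid q \in S, (q, a, p) \in \delta\}$.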

slide-15
SLIDE 15

Constructing DFA

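Algorithm:

  $Q' := \emptyset$; $F' := \emptyset$; $\delta' := \emptyset$;
  $q_0' := \varepsilon\text{-closure}(\{q_0\})$;
  $U := \{q_0'\}$;
  while $\exists S \in U$ do
    $U := U \setminus \{S\}$; $Q' := Q' \cup \{S\}$;
    foreach $a \in \Sigma$ do
      $T := \varepsilon\text{-closure}(\mathit{move}(S, a))$;
      if $T \notin U \cup Q'$ then $U := U \cup \{T\}$;
      $\delta' := \delta' \cup \{(S, a) \mapsto T\}$;
    end
  end
  $F' := \{S \in Q' \mid S \cap F \ne \emptyset\}$;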
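The algorithm translates almost line by line into Python. This is a sketch under stated assumptions: $\delta$ is encoded as a set of triples, $\varepsilon$ as the empty string, and, unlike the pseudocode, the empty set is not added as a DFA state (so the resulting transition function may be partial). The input NFA is the one obtained for `a(a|b)*`, with the $\varepsilon$-wiring assumed.

```python
EPS = ""  # assumed encoding of the empty word


def eps_closure(delta, states):
    """All states reachable from `states` through EPS-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for (s, a, p) in delta:
            if s == q and a == EPS and p not in seen:
                seen.add(p)
                stack.append(p)
    return frozenset(seen)


def move(delta, states, a):
    """States reachable from `states` by one transition labelled a."""
    return {p for (s, b, p) in delta if s in states and b == a}


def subset_construction(delta, alphabet, q0, final):
    start = eps_closure(delta, {q0})
    dstates, dtrans = set(), {}
    unmarked = [start]  # the set U of the pseudocode
    while unmarked:
        S = unmarked.pop()
        dstates.add(S)
        for a in sorted(alphabet):
            T = eps_closure(delta, move(delta, S, a))
            if not T:
                continue  # deviation: the empty (dead) state is omitted
            if T not in dstates and T not in unmarked:
                unmarked.append(T)
            dtrans[(S, a)] = T
    dfinal = {S for S in dstates if S & final}
    return dstates, dtrans, start, dfinal


# NFA for a(a|b)* with an assumed EPS-wiring of the closure.
nfa = {("q0", "a", "q1"), ("q1", EPS, "q2"), ("q2", "a", "q3"),
       ("q2", "b", "q3"), ("q3", EPS, "q2"), ("q3", EPS, "qf"),
       ("q1", EPS, "qf")}
dstates, dtrans, start, dfinal = subset_construction(nfa, "ab", "q0", {"qf"})
```

On this input the result has three DFA states and five transitions.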

slide-16
SLIDE 16

Constructing DFA

Example (built up step by step over slides 16-21):

[Diagram: subset construction applied to the NFA for $a(a \mid b)^*$ (states $q_0, q_1, q_2, q_3, q_f$; transitions labelled $a$, $b$ and $\varepsilon$). The DFA states are added one at a time, starting from the initial state, with transitions labelled $a$ and $b$.]

slide-22
SLIDE 22

Minimizing DFA

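DFA constructed from the regular expression $a(a \mid b)^*$:

[Diagram: DFA with states $q_0, q_1, q_2$ and transitions labelled $a$, $a$, $b$, $a$, $b$.]

An equivalent smaller DFA:

[Diagram: DFA with states $q_0, q_1$ and transitions labelled $a$, $a$, $b$.]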

slide-23
SLIDE 23

Minimizing DFA

A DFA is minimal if there is no smaller DFA accepting the same language. For every DFA $A = \langle Q, \Sigma, \delta, q_0, F \rangle$ there exists a (unique up to renaming of states) equivalent minimal DFA $A' = \langle Q', \Sigma, \delta', q_0', F' \rangle$.

Idea: partition the set of states into equivalence classes.
– States $p, q \in Q$ are equivalent (indistinguishable) if the automata having these as initial states accept the same language (i.e. for any word $w \in \Sigma^*$, one succeeds exactly when the other does).
– For every letter, the transition function maps equivalent states to equivalent states.

slide-24
SLIDE 24

Minimizing DFA

Minimization algorithm:
– Remove all states unreachable from the initial state $q_0$.
– On the remaining set of states, find the coarsest partition $\Pi$ into equivalence classes.
– Construct the new automaton $A' = \langle Q', \Sigma, \delta', q_0', F' \rangle$, where
  – the set of states is $Q' = \Pi$;
  – the initial state is $q_0' = P_0$, where $P_0 \in \Pi$ and $q_0 \in P_0$;
  – the set of final states is $F' = \{P \in \Pi \mid P \cap F \ne \emptyset\}$;
  – the transition function is $\delta' = \{(P_i, a) \mapsto P_j \mid P_j \cap \mathit{move}(P_i, a) \ne \emptyset\}$.
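Once the partition is known, building the new automaton is mechanical. The sketch below assumes the DFA's transition function is a dictionary from (state, letter) to state, that unreachable states have already been removed, and that the partition is given as a set of frozensets; the function name is hypothetical.

```python
def quotient(partition, alphabet, trans, q0, final):
    """Build the minimized DFA whose states are the classes of `partition`."""
    block_of = {q: block for block in partition for q in block}
    new_trans = {}
    for block in partition:
        rep = next(iter(block))  # equivalent states agree, so any member works
        for a in alphabet:
            target = trans.get((rep, a))
            if target is not None:
                new_trans[(block, a)] = block_of[target]
    new_final = {block for block in partition if block & final}
    return set(partition), new_trans, block_of[q0], new_final


# Three-state DFA for a(a|b)*, with states 1 and 2 equivalent.
trans = {(0, "a"): 1, (1, "a"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
partition = {frozenset({0}), frozenset({1, 2})}
states, new_trans, start, new_final = quotient(partition, "ab", trans, 0, {1, 2})
```

The result has two states and three transitions.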

slide-25
SLIDE 25

Minimizing DFA

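Naive algorithm for finding the partition:

  $P := \{F, Q \setminus F\}$;
  do
    $\Pi := P$; $P := \emptyset$;
    foreach $S \in \Pi$ do
      foreach $a \in \Sigma$ do
        $U := \{T \in \Pi \mid T \cap \mathit{move}(S, a) \ne \emptyset\}$;
        $V := \{S \cap \mathit{move}_a^{-1}(T) \mid T \in U\}$;
        $P := P \cup V$;
      end
    end
  until $\Pi = P$;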
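A runnable variant of naive refinement is sketched below. It is not a literal transcription of the pseudocode: instead of splitting per letter, it groups states by a signature (their own class plus the class reached on every letter), which is a standard Moore-style formulation. The dictionary encoding of the transition function and the integer state names are assumptions.

```python
def refine(states, alphabet, trans, final):
    """Coarsest partition of `states` into indistinguishable classes.

    `trans` maps (state, letter) -> state and may be partial.
    """
    states, final = set(states), set(final)
    partition = [frozenset(b) for b in (final & states, states - final) if b]
    while True:
        block_of = {q: i for i, b in enumerate(partition) for q in b}
        groups = {}
        for q in states:
            # Signature: own class, then the class reached on each letter.
            sig = (block_of[q],) + tuple(
                block_of.get(trans.get((q, a))) for a in sorted(alphabet))
            groups.setdefault(sig, set()).add(q)
        new_partition = [frozenset(g) for g in groups.values()]
        if len(new_partition) == len(partition):
            return set(new_partition)  # nothing was split: stable
        partition = new_partition


# Three-state DFA for a(a|b)*; states 1 and 2 are final.
trans = {(0, "a"): 1, (1, "a"): 2, (1, "b"): 2, (2, "a"): 2, (2, "b"): 2}
print(refine({0, 1, 2}, "ab", trans, {1, 2}))
```

Because every pass can only split classes, an unchanged class count means the partition is stable.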

slide-26
SLIDE 26

Minimizing DFA

The naive algorithm tries to split every class of the partition at every iteration.
– In the worst case it has quadratic complexity.
– It is enough to consider only those classes from which one can move to some class that has been split.

Hopcroft's algorithm for finding the partition:
– uses a work-list of split classes that have not been examined yet;
– if a class that is not in the work-list is split, then only one (the smaller) of its two subclasses is put on the work-list.

slide-27
SLIDE 27

Minimizing DFA

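Hopcroft's algorithm:

  $\Pi := \{F, Q \setminus F\}$; $W := \Pi$;
  while $\exists S \in W$ do
    $W := W \setminus \{S\}$;
    foreach $a \in \Sigma$ do
      $P := \mathit{move}_a^{-1}(S)$;
      foreach $R \in \{T \in \Pi \mid T \cap P \ne \emptyset, T \not\subseteq P\}$ do
        $R_1 := R \cap P$; $R_2 := R \setminus R_1$;
        $\Pi := (\Pi \setminus \{R\}) \cup \{R_1, R_2\}$;
        if $R \in W$ then $W := (W \setminus \{R\}) \cup \{R_1, R_2\}$;
        else if $|R_1| \le |R_2|$ then $W := W \cup \{R_1\}$;
        else $W := W \cup \{R_2\}$;
      end
    end
  end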
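A direct Python rendering of the work-list algorithm is sketched below. The dictionary encoding of the transition function is an assumption, and the example DFA is the usual textbook automaton for `(a|b)*abb` with states numbered 0 to 4; that numbering is assumed, not read off the slides.

```python
def hopcroft(states, alphabet, trans, final):
    """Partition the DFA states into classes of indistinguishable states."""
    states, final = frozenset(states), frozenset(final)
    partition = {b for b in (final, states - final) if b}
    worklist = set(partition)
    while worklist:
        S = worklist.pop()
        for a in alphabet:
            # P: states whose a-transition leads into S (inverse of move).
            P = {q for q in states if trans.get((q, a)) in S}
            for R in list(partition):
                R1, R2 = R & P, R - P
                if not R1 or not R2:
                    continue  # R is not split by P
                partition.remove(R)
                partition |= {R1, R2}
                if R in worklist:
                    worklist.remove(R)
                    worklist |= {R1, R2}
                else:
                    # Only the smaller half needs to be examined later.
                    worklist.add(R1 if len(R1) <= len(R2) else R2)
    return partition


# Textbook DFA for (a|b)*abb; state 4 is final (numbering assumed).
trans = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 3, (2, "a"): 1,
         (2, "b"): 2, (3, "a"): 1, (3, "b"): 4, (4, "a"): 1, (4, "b"): 2}
print(hopcroft(range(5), "ab", trans, {4}))
```

States 0 and 2 end up in the same class, so the minimized automaton has four states.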

slide-28
SLIDE 28

Minimizing DFA

Example (built up step by step over slides 28-33): minimizing the DFA corresponding to the regular expression $(a \mid b)^* abb$.

[Diagram: DFA with states $q_0, q_1, q_2, q_3, q_4$ and transitions labelled $a$ and $b$. In the final step state $q_2$ no longer appears, leaving the states $q_0, q_1, q_3, q_4$.]

slide-34
SLIDE 34

Scanner generator Flex

[Diagram: foo.l is processed by flex into lex.yy.c, which gcc compiles into a.out; a.out reads file.foo and produces tokens.]

slide-35
SLIDE 35

Scanner generator Flex

Format of the input file: an input file of Flex has three parts:

  definitions
  %%
  rules
  %%
  user code

The definitions part consists of:
– C code (included header files, definitions of global variables);
– regular descriptions;
– definitions of start conditions.

slide-36
SLIDE 36

Scanner generator Flex

The rules part consists of a sequence of pairs

  pattern action

where the pattern must start without indentation and ends at the first whitespace character; the action must start on the same line as the pattern. A pattern is an (extended) regular expression; an action is an arbitrary C statement.
– If the action is empty, the input matching the pattern is discarded.
– If the input does not match any pattern, it is copied to the output.

The third part of the Flex input file is C code that is copied verbatim to the generated file lex.yy.c.
– It may be absent, in which case the second separator is also not required.

slide-37
SLIDE 37

Scanner generator Flex

Interface for a parser:

  int yylex(void)     the main function; returns the class of the recognized word, and 0 at EOF
  char *yytext        points to the last scanned word
  int yyleng          the length of the last scanned word
  FILE *yyin          the default input file
  FILE *yyout         the default output file
  int yywrap(void)    should be defined in the third part; if not, use '-lfl' when linking; usually simply returns 1
  YYSTYPE yylval      the structure containing the value of the symbol; defined in the parser (in the included header file parser.tab.h)