SLIDE 1

Compiler Development (CMPSC 401)

Lexical Analysis
Janyl Jumadinova
January 29, 2019

Janyl Jumadinova Compiler Development (CMPSC 401) January 29, 2019 1 / 40

SLIDE 2

Automaton and Regular Expressions

Deterministic Finite Automata (DFAs), Non-deterministic Finite Automata (NFAs) and Regular Expressions (REs) have the same expressive power, i.e. they allow precisely the same patterns (sets of strings) to be specified.

For every RE there is an equivalent NFA.
For every NFA there is an equivalent DFA.
For every DFA there is an equivalent RE.

Together these conversions form a cycle: RE → NFA → DFA → RE.

SLIDE 4

Finite State Automaton

A finite automaton is a machine that has a finite number of states and a finite number of transitions between them.

One state is marked as the initial state.
One or more states are marked as final (accepting) states.
States are sometimes labeled or numbered.

A set of transitions leads from state to state.

Each transition is labeled with a symbol from Σ (the alphabet), or with ε.
The symbols correspond to characters in the input string.
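
The definition above can be written down as a small data structure. This is a minimal sketch rather than anything taken from the slides: the class name `FiniteAutomaton`, the `EPSILON` marker, the helper methods and the example automaton `ends_in_ab` are all assumptions made for illustration.

```python
from dataclasses import dataclass

EPSILON = ""  # hypothetical marker for an ε-labeled transition


@dataclass(frozen=True)
class FiniteAutomaton:
    states: frozenset      # finite set of states
    alphabet: frozenset    # Σ, the input symbols
    transitions: dict      # (state, symbol or EPSILON) -> set of next states
    initial: str           # the single initial state
    finals: frozenset      # one or more final states

    def check(self):
        """Raise ValueError if the parts do not fit together."""
        if self.initial not in self.states:
            raise ValueError("initial state is not a known state")
        if not self.finals or not self.finals <= self.states:
            raise ValueError("final states must be a non-empty subset of states")
        for (src, sym), targets in self.transitions.items():
            if src not in self.states or not set(targets) <= self.states:
                raise ValueError("transition uses an unknown state")
            if sym != EPSILON and sym not in self.alphabet:
                raise ValueError(f"symbol {sym!r} is not in the alphabet")

    def is_deterministic(self):
        """True if there are no ε-transitions and no choice of target."""
        return all(sym != EPSILON and len(targets) <= 1
                   for (_, sym), targets in self.transitions.items())


# Example (assumed): strings over {a, b} that end in "ab".
ends_in_ab = FiniteAutomaton(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"a", "b"}),
    transitions={
        ("q0", "a"): {"q0", "q1"},   # two targets on 'a': a choice point
        ("q0", "b"): {"q0"},
        ("q1", "b"): {"q2"},
    },
    initial="q0",
    finals=frozenset({"q2"}),
)
ends_in_ab.check()
```

Storing a set of targets per `(state, symbol)` pair lets the same structure describe both deterministic and non-deterministic automata.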

SLIDES 5-18

Example

[Figures not recovered.]

SLIDE 22

Finite State Automaton

Operates by reading input symbols (usually characters).

A transition can be taken if it is labeled with the current symbol.
An ε-transition can be taken at any time.

Accept when a final state is reached and there is no more input.

This is slightly different in a scanner, where the FSA is used as a subroutine to find the longest input string that matches a token RE.

Reject if no transition is possible, or if there is no more input and the automaton is not in a final state.
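
The two modes described above (whole-input acceptance and the scanner's longest match) might be sketched as follows for a deterministic automaton. The function names, the dictionary encoding of transitions and the digit-string example are assumptions for illustration, not part of the slides.

```python
def accepts(delta, start, finals, text):
    """Run a DFA over the whole input; accept only if it ends in a final state."""
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:          # no transition possible: reject
            return False
    return state in finals         # no more input: accept only if final


def longest_match(delta, start, finals, text, pos=0):
    """Scanner variant: length of the longest non-empty prefix of text[pos:]
    that the DFA accepts, or 0 if there is none."""
    state, best = start, 0
    for i in range(pos, len(text)):
        state = delta.get((state, text[i]))
        if state is None:          # stuck: fall back to the last final state seen
            break
        if state in finals:
            best = i + 1 - pos
    return best


# Example DFA (assumed): one or more decimal digits.
DIGITS = "0123456789"
delta = {("start", d): "num" for d in DIGITS}
delta.update({("num", d): "num" for d in DIGITS})
```

With this table, `accepts(delta, "start", {"num"}, "12a")` rejects because no transition exists on `a`, while `longest_match(delta, "start", {"num"}, "123+4")` stops at the `+` and reports the three digits matched so far.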

SLIDES 23-35

A More Complex Automaton

[Figures not recovered.]

SLIDE 38

DFA vs. NFA

Deterministic Finite Automata (DFA)

No choice of which transition to make.

Non-deterministic Finite Automata (NFA)

Choice of transition in at least one case.
ε-transitions (arcs): if the current state has any outgoing ε arcs, we can follow any of them without consuming any input.
Modeling choice:

  • Option 1: guess a path, and backtrack if it rejects.
  • Option 2: “clone” at each choice point, and accept if any clone accepts.
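
Option 1 (guess and backtrack) can be illustrated with a short recursive search. This is only a sketch under stated assumptions: the `EPS` marker, the dictionary encoding and the optional-sign example are invented here, and the `seen` set is an added safeguard so that ε-cycles cannot cause endless recursion.

```python
EPS = ""  # hypothetical marker for an ε arc


def accepts_backtracking(nfa, start, finals, text):
    """nfa maps (state, symbol or EPS) to a set of next states."""
    seen = set()  # (state, position) pairs that were already explored

    def try_from(state, pos):
        if (state, pos) in seen:
            return False
        seen.add((state, pos))
        if pos == len(text) and state in finals:
            return True
        # ε arcs: follow any of them without consuming input
        for nxt in nfa.get((state, EPS), ()):
            if try_from(nxt, pos):
                return True
        # arcs labeled with the current symbol: consume one character
        if pos < len(text):
            for nxt in nfa.get((state, text[pos]), ()):
                if try_from(nxt, pos + 1):
                    return True
        return False  # every guess from here failed: backtrack

    return try_from(start, 0)


# Example NFA (assumed): an optional '-' followed by one or more digits.
signed = {("s", EPS): {"d"}, ("s", "-"): {"d"}}
for ch in "0123456789":
    signed[("d", ch)] = {"n"}
    signed[("n", ch)] = {"n"}
```

Because the search is recursive, very long inputs could hit Python's recursion limit; the set-based "clone" approach of Option 2 avoids that.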

SLIDE 41

Simulating an NFA

Start with the initial state, plus every state reachable from it by ε-moves, as the set of current states.

For each character in the input:

For each current state:

  • Follow all transitions labeled with the current character.
  • Add these states to the set of new states.

Add every state reachable by an ε-move to the set of new states.

Accept if there is some way to reach a final state on the given input. Reject if there is no possible way to reach a final state.
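
The procedure above translates fairly directly into code that tracks a set of current states. The following is a sketch, not the course's implementation: `EPS`, `eps_closure`, `simulate_nfa` and the example automaton are assumed names, and the ε-closure of the initial state is taken before reading any input.

```python
EPS = ""  # hypothetical marker for an ε-move


def eps_closure(nfa, states):
    """All states reachable from `states` using only ε-moves (including themselves)."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def simulate_nfa(nfa, start, finals, text):
    """nfa maps (state, symbol or EPS) to a set of next states."""
    current = eps_closure(nfa, {start})
    for ch in text:                       # for each character in the input
        new_states = set()
        for s in current:                 # for each current state
            new_states.update(nfa.get((s, ch), ()))
        current = eps_closure(nfa, new_states)
        if not current:                   # no state left: no way to a final state
            return False
    return not current.isdisjoint(finals)


# Example NFA (assumed): an optional '-' followed by one or more digits.
signed = {("s", EPS): {"d"}, ("s", "-"): {"d"}}
for ch in "0123456789":
    signed[("d", ch)] = {"n"}
    signed[("n", ch)] = {"n"}
```

Each input character is processed once, and the work per character is bounded by the number of states and transitions, so no backtracking is needed.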

SLIDE 42

FAs in Scanners

We want a DFA for speed (no backtracking or cloning).
But conversion from regular expressions to an NFA is easier.
Luckily, there is a well-defined procedure (the subset construction) for converting an NFA to an equivalent DFA.
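
The NFA-to-DFA conversion mentioned above is commonly done with the subset construction, in which each DFA state is a set of NFA states. The sketch below assumes a dictionary encoding of the NFA; all function names and the example automaton are illustrative, and undefined DFA transitions are simply left out instead of adding an explicit dead state.

```python
EPS = ""  # hypothetical marker for an ε-move


def eps_closure(nfa, states):
    """All states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_to_dfa(nfa, start, finals, alphabet):
    """Return (dfa transitions, dfa start state, dfa final states)."""
    start_set = frozenset(eps_closure(nfa, {start}))
    dfa = {}                                  # (frozenset, symbol) -> frozenset
    seen, worklist = {start_set}, [start_set]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            moved = set()
            for s in current:
                moved.update(nfa.get((s, symbol), ()))
            target = frozenset(eps_closure(nfa, moved))
            if not target:
                continue                      # no transition on this symbol
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    dfa_finals = {s for s in seen if not s.isdisjoint(finals)}
    return dfa, start_set, dfa_finals


def run_dfa(dfa, start, finals, text):
    state = start
    for ch in text:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in finals


# Example NFA (assumed): strings over {a, b} that end in "ab".
nfa = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"}, ("q1", "b"): {"q2"}}
dfa, dfa_start, dfa_finals = nfa_to_dfa(nfa, "q0", {"q2"}, "ab")
```

Only subsets reachable from the start state are built, which in practice keeps the DFA far smaller than the worst case of one state per subset of NFA states.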

SLIDE 43

Usefulness of RE to NFA Construction

Lexical Analysis

Specify language tokens (identifiers, numerical constants, symbols, etc.) as REs.
Tools like lex automatically generate automaton-based code to decompose source code into its constituent tokens.

Pattern Matching (e.g. text editors, grep)

The pattern is specified as an RE.
An automaton-based search locates occurrences.

SLIDE 44

Lexical Analysis Generators

Generate a lexical analyzer automatically from “descriptions” (regular expressions/NFAs) of the tokens in the programming language.

Examples:

  • lex/flex for C
  • jFlex for Java

SLIDE 45

Terminology

A token is a group of characters having collective meaning.
A lexeme is an actual character sequence forming a specific instance of a token, such as num.
A pattern is a rule, expressed as a regular expression, describing how a particular token can be formed. For example, [A-Za-z][A-Za-z0-9]* is a rule.
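
The three terms can be told apart in a few lines of code. In this sketch the token names `num` and `id`, the `[0-9]+` pattern and the `classify` helper are assumptions; Python's `re` module stands in for a generated scanner purely for illustration.

```python
import re

# Each entry pairs a token name with the pattern (rule) that describes it.
PATTERNS = [
    ("num", r"[0-9]+"),                 # assumed pattern for the num token
    ("id", r"[A-Za-z][A-Za-z0-9]*"),    # identifier rule, no space in the class
]


def classify(lexeme):
    """Return the name of the first token whose pattern matches the whole lexeme."""
    for token, pattern in PATTERNS:
        if re.fullmatch(pattern, lexeme):
            return token
    return None
```

Here `"42"` and `"x1"` are lexemes, `num` and `id` are the tokens they are instances of, and the regular expressions are the patterns; a string such as `"1x"` fits neither pattern as a whole, so `classify` returns `None`.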

SLIDE 47

(jF)Lex

Input:

  • a description of token structure (regular expressions)
  • information on how to “process” different tokens (actions)

Output: an implementation of an automaton-based function (lex and jFlex generate a DFA) that:

  • recognizes tokens (as specified by RE rules)
  • processes them (as specified by actions)

SLIDE 48

jFlex Program Format

/* User code */
%%
/* Options and declarations */
%%
/* Lexical Rules */

1 User Code (e.g. import statements): included at the top of the generated Java file; often empty.
2 Options and “Macros” (named REs); code to be spliced into the generated Java class.
3 Rule = Pattern + Action.
4 Pattern = Regular Expression.
5 Action = Snippet of Java code (actions are triggered whenever the pattern is matched).
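
The idea that a rule is a pattern plus an action can be mimicked outside jFlex. The sketch below is written in Python, not in jFlex's specification language, and is only an analogy: `make_scanner`, `RULES` and the token names are invented, and Python's backtracking `re` engine stands in for the generated automaton. It follows the usual lex convention of preferring the longest match and, on a tie, the rule listed first.

```python
import re


def make_scanner(rules):
    """rules: list of (pattern, action); an action maps a lexeme to a token or None."""
    compiled = [(re.compile(pattern), action) for pattern, action in rules]

    def scan(text):
        pos, tokens = 0, []
        while pos < len(text):
            best_len, best_action = 0, None
            for regex, action in compiled:
                m = regex.match(text, pos)
                # strictly longer wins, so ties keep the earlier rule
                if m and len(m.group()) > best_len:
                    best_len, best_action = len(m.group()), action
            if best_action is None:
                raise ValueError(f"no rule matches at position {pos}")
            result = best_action(text[pos:pos + best_len])  # the action is triggered
            if result is not None:
                tokens.append(result)
            pos += best_len
        return tokens

    return scan


RULES = [
    (r"[ \t\n]+", lambda lexeme: None),                        # skip whitespace
    (r"if", lambda lexeme: ("IF", lexeme)),
    (r"[A-Za-z][A-Za-z0-9]*", lambda lexeme: ("ID", lexeme)),
    (r"[0-9]+", lambda lexeme: ("NUM", int(lexeme))),
    (r"[+*=]", lambda lexeme: ("OP", lexeme)),
]
scan = make_scanner(RULES)
```

Listing the keyword rule before the identifier rule is what makes `if` a keyword, while a longer lexeme such as `iffy` is still scanned as a single identifier.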

SLIDE 49

jFlex RE Syntax

[Slide content not recovered.]