Compiler Construction Lecture 3: Scanner Generators 2020-01-14

Compiler Construction Lecture 3: Scanner Generators 2020-01-14 Michael Engel Includes material by Jan Christian Meyer

Overview
• DFAs and regular expressions
• Nondeterministic finite automata (NFA)
• From regular expressions to NFAs

The DFA, again (Lexical analysis)
This DFA from the previous week…
[Figure: DFA with states s₁, s₂, s₃ and edges s₁ → s₂ on [0-9], s₂ → s₂ on [0-9], s₂ → s₃ on '.', s₃ → s₃ on [0-9]]
…was able to tell you whether a character sequence is a valid decimal number (integer + optional fractional part) or not
• Start with the initial state s₁, then follow the edges

More about lexemes (Lexical analysis)
Lexeme
• Lexemes are units of lexical analysis, words
• Like dictionary entries
Common patterns in lexemes
• Sequences of specific parts
  • chains of states in the graph
  • [Figure: sₙ → sₙ₊₁ on 'a', sₙ₊₁ → sₙ₊₂ on 'b'] Sequence "ab"
• Repetition
  • loops in the graph
  • [Figure: sₙ → sₙ on 'q'] Any number (>=0) of 'q's
• Alternatives
  • different paths in the graph
  • [Figure: sₙ → sₙ₊₁ on 'a', sₙ → sₙ₊₂ on 'b'] Either 'a' or 'b'

DFA formal notation (Lexical analysis)
Formal definition: DFA = 5-tuple (Q, Σ, δ, q₀, F)
Q is a finite set called the states,
Σ is a finite set called the alphabet,
δ: Q × Σ → Q is the transition function,
q₀ ∈ Q is the start state, and
F ⊆ Q is the set of accepting states
For the decimal-number DFA:
Q = {s₁, s₂, s₃}
Σ = {0,1,2,3,4,5,6,7,8,9,.}
q₀ = s₁
F = {s₂, s₃}
δ =
  δ  | 0   1   2   3   4   5   6   7   8   9   .
  s₁ | s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  er
  s₂ | s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₂  s₃
  s₃ | s₃  s₃  s₃  s₃  s₃  s₃  s₃  s₃  s₃  s₃  er
(er: error, no valid transition)
[Figure: the same DFA drawn as a graph, with edges s₁ → s₂ on [0-9], s₂ → s₂ on [0-9], s₂ → s₃ on '.', s₃ → s₃ on [0-9]]
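
The transition table above translates almost directly into a table-driven recognizer. The following is a minimal sketch, not the lecture's own code: the names `DELTA`, `START`, `ACCEPTING` and `accepts` are assumptions made for this example, and the "er" entries are modeled as missing dictionary keys.

```python
DIGITS = "0123456789"

# Transition function δ as a nested dictionary: DELTA[state][symbol] -> next state.
# Entries marked "er" in the table are simply left out.
DELTA = {
    "s1": {d: "s2" for d in DIGITS},
    "s2": {**{d: "s2" for d in DIGITS}, ".": "s3"},
    "s3": {d: "s3" for d in DIGITS},
}
START = "s1"              # q0
ACCEPTING = {"s2", "s3"}  # F


def accepts(text: str) -> bool:
    """Walk the DFA over text and report whether it stops in an accepting state."""
    state = START
    for ch in text:
        state = DELTA[state].get(ch)
        if state is None:  # "er": no valid transition for this symbol
            return False
    return state in ACCEPTING
```

With this table, `accepts("3.14")` and `accepts("7.")` hold, while `accepts(".5")` and `accepts("1.2.3")` do not, because s₁ and s₃ have no transition on '.'.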

Alphabets in DFAs
• Alphabet: finite set of symbols (characters)
  • {0,1} is the alphabet of binary strings
  • [A-Za-z0-9] is the alphabet of alphanumeric strings
• A language is a set of valid strings (sequences of symbols) over an alphabet
  • L = {000, 010, 100, 110} is the language of “even, non-negative 3-bit binary numbers” (all less than 8)
• A finite automaton accepts a language
  • it decides whether or not a given string belongs to the language described by it

Operations on languages
• Union of languages: s ∈ L₁ ∪ L₂ if s ∈ L₁ or s ∈ L₂
• Concatenation: L₁L₂ = { s₁s₂ | s₁ ∈ L₁ and s₂ ∈ L₂ }
• Concatenation of a language with itself: “multiplication” (Cartesian product):
  LLL = { s₁s₂s₃ | s₁ ∈ L and s₂ ∈ L and s₃ ∈ L }
• Closures
  • L* = ⋃ Lⁱ for i = 0, 1, 2, …: “Kleene closure”: 0 or more strings from L
  • L⁺ = ⋃ Lⁱ for i = 1, 2, …: “Positive closure”: 1 or more strings from L

Operations on languages: examples
• Union of languages: s ∈ L₁ ∪ L₂ if s ∈ L₁ or s ∈ L₂
  • L₁ = {000, 010, 100, 110}, L₂ = {001, 011, 101, 111}
    ⇒ L₁ ∪ L₂ = {000, 001, 010, 011, 100, 101, 110, 111}
• Concatenation: L₁L₂ = { s₁s₂ | s₁ ∈ L₁ and s₂ ∈ L₂ }
  • L₁ = {"ab", "c"}, L₂ = {"x"}
    ⇒ L₁L₂ = {"abx", "cx"}
• Concatenation of a language with itself: “multiplication” (Cartesian product):
  LLL = { s₁s₂s₃ | s₁ ∈ L and s₂ ∈ L and s₃ ∈ L }
  • L = {"a", "b"} ⇒ LLL = { "aaa", "aab", "aba", "abb", "baa", "bab", "bba", "bbb" }
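
For finite languages, these operations can be tried out directly with Python sets. This is only an illustrative sketch; the function names `union`, `concat` and `power` are assumptions of this example, not notation from the lecture.

```python
from itertools import product


def union(l1: set[str], l2: set[str]) -> set[str]:
    """L1 ∪ L2: every string that is in L1 or in L2."""
    return l1 | l2


def concat(l1: set[str], l2: set[str]) -> set[str]:
    """L1 L2: every string s1 + s2 with s1 from L1 and s2 from L2."""
    return {s1 + s2 for s1, s2 in product(l1, l2)}


def power(lang: set[str], n: int) -> set[str]:
    """L^n: the language concatenated with itself n times; L^0 is {ε}."""
    result = {""}  # the empty string plays the role of ε
    for _ in range(n):
        result = concat(result, lang)
    return result
```

Here `concat({"ab", "c"}, {"x"})` yields `{"abx", "cx"}`, and `power({"a", "b"}, 3)` yields the eight strings listed for LLL above.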

Operations on languages: examples
• Closures
  • L* = ⋃ Lⁱ for i = 0, 1, 2, …: “Kleene closure”: 0 or more strings from L
    • 0 strings = empty word ε (“epsilon”)
    • {"ab", "c"}* = { ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "ababc", "abcab", "abcc", "cabab", "cabc", "ccab", "ccc", … }
  • L⁺ = ⋃ Lⁱ for i = 1, 2, …: “Positive closure”: 1 or more strings from L
    • {"a", "b", "c"}⁺ = { "a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab", … }
  • L* = { ε } ∪ L⁺
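
Both closures are infinite for any language that contains a non-empty string, so a program can only enumerate them up to some bound. The sketch below lists all members up to a maximum length; `kleene_closure`, `positive_closure` and the `max_len` cutoff are assumptions introduced for this example.

```python
def kleene_closure(lang: set[str], max_len: int) -> set[str]:
    """All strings of L* that have at most max_len characters."""
    result = {""}    # L^0 = {ε}
    frontier = {""}  # strings found in the previous round
    while frontier:
        # Append one more word from L to each recently found string.
        frontier = {
            prefix + word
            for prefix in frontier
            for word in lang
            if len(prefix + word) <= max_len
        } - result
        result |= frontier
    return result


def positive_closure(lang: set[str], max_len: int) -> set[str]:
    """All strings of L+ = L L* that have at most max_len characters."""
    return {
        word + rest
        for word in lang
        for rest in kleene_closure(lang, max_len)
        if len(word + rest) <= max_len
    }
```

For example, `kleene_closure({"ab", "c"}, 3)` contains exactly ε, "ab", "c", "cc", "abc", "cab" and "ccc", which are the entries of the slide's list with at most three characters.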

Regular expressions (“regexp”)
Given: empty string ε (epsilon), alphabet Σ (sigma)
Recursive definition of regular expressions:
Basis
• ε is a regular expression, L(ε) is the language with only ε in it
• If a is in Σ, then a is also a regular expression, L(a) is the language with only a in it
Induction
• If r₁ and r₂ are regexps ⇒ r₁ | r₂ is a regexp for L(r₁) ∪ L(r₂) (selection)
• If r₁ and r₂ are regexps ⇒ r₁r₂ is a regexp for L(r₁)L(r₂) (concatenation)
• If r is a regular expression ⇒ r* denotes L(r)* (Kleene closure)
• If r is a regular expression ⇒ (r) is a regular expression denoting L(r)
  (we can add parentheses to group parts of the regexp)

DFAs and regular expressions (Lexical analysis)
Again, the DFA which accepts decimal numbers:
[Figure: DFA with edges s₁ → s₂ on [0-9], s₂ → s₂ on [0-9], s₂ → s₃ on '.', s₃ → s₃ on [0-9]]
This DFA corresponds to the following regular expression:
[0-9] [0-9]* ( '.' [0-9]* )?
• '.' in quotes is the literal decimal point from the DFA's edge label
• the parenthesized part is optional, since state s₂ accepts
Abbreviated notation used for regexps:
• . : any character ∈ Σ
• [abc] : either 'a' or 'b' or 'c'
• [a-d] : characters from 'a' to 'd' inclusive
• ? : either zero or one repetition
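
The same language can be checked with an off-the-shelf regexp engine. The sketch below uses Python's standard `re` module; the names `DECIMAL` and `is_decimal` are assumptions of this example. Note that in Python's syntax an unescaped `.` means "any character except newline", so the literal decimal point has to be written as `\.`.

```python
import re

# [0-9] [0-9]* ( '.' [0-9]* )?  written in Python regexp syntax.
DECIMAL = re.compile(r"[0-9][0-9]*(\.[0-9]*)?")


def is_decimal(text: str) -> bool:
    """True if the whole string matches the decimal-number regexp."""
    # fullmatch requires the entire input to match, like a DFA that must
    # end in an accepting state after the last character.
    return DECIMAL.fullmatch(text) is not None
```

`is_decimal("42")`, `is_decimal("3.14")` and `is_decimal("7.")` hold; `is_decimal(".5")` and `is_decimal("1.2.3")` do not, matching the behavior of the DFA.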

Three ways to describe a language
• Graphs
  • provide a quick overview of the structure
• Tables
  • help in writing programs that implement the DFA
• Regular expressions
  • help in generating accepting automata automatically

Regular languages
• All three representations are equivalent
  • We have not shown a formal way to transform one representation into another, and we did not prove the equivalence
  • Maybe you can still see it?
• The family of languages that can be recognized by finite automata/regexps is called regular languages
  • They are an important and powerful class of languages
  • However, they do not cover all use cases
    • e.g., recursion cannot be specified using regexps
    • more on this later…

Combining automata
Wanted: language that includes the words {"all", "and"}
• Simple DFAs to detect each of the words separately:
[Figure: two DFAs, one following the edges 'a', 'l', 'l' and one following the edges 'a', 'n', 'd']
We omit the numbering of states if the specific number is not relevant for an example

Combining automata
Wanted: language that includes the words {"all", "and"}
• Can we build an automaton to detect both words?
• How about combining both DFAs?
• Simply join the starting and accepting states of both:
[Figure: one automaton with a shared start state and a shared accepting state; one path follows 'a', 'l', 'l', the other follows 'a', 'n', 'd']

Now we have a (small) problem
“Walking” the DFA does not work any more
• Starting at s₀ and reading 'a', the next state can be s₁ or s₂
• If we read an 'a', choose s₁ and then read an 'n' ⇒ wrong path
• We would need to go to states s₁ and s₂ at the same time
• Otherwise, we would need some way to backtrack to s₀
[Figure: s₀ → s₁ on 'a', continuing with 'l', 'l' to the accepting state; s₀ → s₂ on 'a', continuing with 'n', 'd' to the accepting state]
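
Being "in states s₁ and s₂ at the same time" can be programmed by tracking a set of current states instead of a single one. The following is a minimal sketch of that idea for the joined automaton. The state names s0, s1 and s2 come from the slide, while s3, s4 and s5 as well as `DELTA` and `accepts` are names assumed for this example.

```python
# Transitions of the joined automaton: (state, character) -> set of next states.
DELTA = {
    ("s0", "a"): {"s1", "s2"},  # the ambiguous step: two targets on 'a'
    ("s1", "l"): {"s3"},
    ("s3", "l"): {"s5"},
    ("s2", "n"): {"s4"},
    ("s4", "d"): {"s5"},
}
START = "s0"
ACCEPTING = {"s5"}


def accepts(text: str) -> bool:
    """Follow all possible paths at once by keeping a set of current states."""
    current = {START}
    for ch in text:
        # Collect every state reachable from any current state on ch.
        current = set().union(*(DELTA.get((s, ch), set()) for s in current))
        if not current:  # every path is stuck
            return False
    return bool(current & ACCEPTING)
```

After reading 'a' the set is {s1, s2}; reading 'n' then drops the s1 path without any backtracking, so `accepts("and")` and `accepts("all")` both hold while `accepts("ald")` does not.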

An obvious solution
Combine states s₁ and s₂
⇒ postpone the decision which path to choose
• Walking the DFA works again!
• Need to determine which parts both words have in common (can that be generalized?)
[Figure: a single 'a' edge from the start state, then a branch: 'l', 'l' on one path and 'n', 'd' on the other, both ending in the accepting state]

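Non-Deterministic Finite Automata
Idea: admit multiple transitions from one state on the same character
• Alternative: allow transitions on the empty input ε (i.e., without reading a character)
• Both notations are equivalent:
[Figure: two equivalent automata for {"all", "and"}: one with two 'a' transitions leaving the start state; one with ε-transitions from the start state into the 'a', 'l', 'l' and 'a', 'n', 'd' paths and ε-transitions from both paths into the accepting state]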
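
An automaton with ε-transitions can be simulated by following all ε-edges "for free" before and after each character. The sketch below does this for the ε-variant of the {"all", "and"} automaton. All state names (`q0`, `a1`…`a4`, `b1`…`b4`, `qf`), the `EPS` marker and the function names are assumptions of this example.

```python
EPS = ""  # marker for ε-transitions; never equal to a real input character

# q0 --ε--> a1 -a-> a2 -l-> a3 -l-> a4 --ε--> qf
# q0 --ε--> b1 -a-> b2 -n-> b3 -d-> b4 --ε--> qf
DELTA = {
    ("q0", EPS): {"a1", "b1"},
    ("a1", "a"): {"a2"}, ("a2", "l"): {"a3"}, ("a3", "l"): {"a4"},
    ("b1", "a"): {"b2"}, ("b2", "n"): {"b3"}, ("b3", "d"): {"b4"},
    ("a4", EPS): {"qf"}, ("b4", EPS): {"qf"},
}


def eps_closure(states: set[str]) -> set[str]:
    """All states reachable from states using only ε-transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in DELTA.get((state, EPS), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(text: str) -> bool:
    """Simulate the ε-NFA on text by tracking the set of possible states."""
    current = eps_closure({"q0"})
    for ch in text:
        moved = set()
        for state in current:
            moved |= DELTA.get((state, ch), set())
        current = eps_closure(moved)
    return "qf" in current
```

The start set is `eps_closure({"q0"})`, i.e. {q0, a1, b1}, which is exactly the "decide later" behavior that the two 'a' transitions express in the other notation.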

NFAs and regular expressions
NFAs can easily be constructed from regular expressions
• For our example, the regexp would be: (all | and)
  (equivalent deterministic variant: a(ll | nd))
• The two sub-automata can easily be identified in the graph:
[Figure: ε-transitions lead from the start state into sub-automaton (“machine”) 1 with edges 'a', 'l', 'l' and into sub-automaton (“machine”) 2 with edges 'a', 'n', 'd'; ε-transitions lead from both sub-automata into the accepting state]
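
That the two regexps describe the same language can be spot-checked by brute force with Python's `re` module. This is a bounded check over short strings, not a proof; the names `ORIGINAL`, `FACTORED` and `language`, the alphabet "alnd" and the length bound are assumptions of this example.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"all|and")   # the selection of both words
FACTORED = re.compile(r"a(ll|nd)")  # common prefix 'a' factored out


def language(pattern: re.Pattern, alphabet: str = "alnd", max_len: int = 4) -> set[str]:
    """All strings over alphabet up to max_len that the pattern matches fully."""
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if pattern.fullmatch("".join(chars))
    }
```

Both `language(ORIGINAL)` and `language(FACTORED)` evaluate to `{"all", "and"}`.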

Constructing a scanner
What are the parts of a regexp again?
1. a (single) character: stands for itself (or ε; that's not shown)
2. concatenation: R₁R₂
3. selection: R₁ | R₂
4. grouping: (R₁)
5. Kleene closure: R₁*
• We can construct an NFA for each of these
  …as long as R₁ and R₂ are regexps (⇒ recursive definition)
• Note: each DFA is also an NFA (with zero ε-transitions)
• Formal: the set of DFAs is a subset of the set of NFAs

Constructing a scanner: characters
Single characters (and epsilons) in a regexp become transitions between two states in an NFA
• For our example (all | and), the transitions are thus:
[Figure: six separate single-character transitions, labeled 'a', 'l', 'l', 'a', 'n', 'd']
Now we can combine these simple regexps…
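
One possible way to combine such single-character pieces is sketched below, in the style of Thompson's construction: every regexp part becomes an NFA fragment with one start and one accepting state, and fragments are glued together with ε-edges. The slides do not name the construction they use, so treat this as an assumption; `Fragment`, `char`, `concat`, `select`, `star`, `word` and `accepts` are names invented for this example. Grouping needs no fragment of its own, since it is expressed by how the calls are nested.

```python
from itertools import count

EPS = None      # label used for ε-edges
_ids = count()  # source of fresh state numbers


class Fragment:
    """An NFA piece with one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start, self.accept = start, accept
        self.edges = edges  # list of (source, label, target)


def char(c):
    """Single character: one transition between two new states."""
    s, t = next(_ids), next(_ids)
    return Fragment(s, t, [(s, c, t)])


def concat(f1, f2):
    """R1 R2: ε-edge from the accepting state of f1 to the start of f2."""
    return Fragment(f1.start, f2.accept,
                    f1.edges + f2.edges + [(f1.accept, EPS, f2.start)])


def select(f1, f2):
    """R1 | R2: new start and accepting states, ε-edges into and out of both."""
    s, t = next(_ids), next(_ids)
    return Fragment(s, t, f1.edges + f2.edges + [
        (s, EPS, f1.start), (s, EPS, f2.start),
        (f1.accept, EPS, t), (f2.accept, EPS, t)])


def star(f):
    """R*: ε-edges allow skipping f entirely or repeating it."""
    s, t = next(_ids), next(_ids)
    return Fragment(s, t, f.edges + [
        (s, EPS, f.start), (s, EPS, t),
        (f.accept, EPS, f.start), (f.accept, EPS, t)])


def word(text):
    """Concatenate the character fragments of a non-empty word."""
    frag = char(text[0])
    for c in text[1:]:
        frag = concat(frag, char(c))
    return frag


def eps_closure(frag, states):
    """All states reachable from states via ε-edges only."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for src, label, dst in frag.edges:
            if src == state and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def accepts(frag, text):
    """Simulate the fragment as an NFA on text."""
    current = eps_closure(frag, {frag.start})
    for ch in text:
        moved = {dst for src, label, dst in frag.edges
                 if src in current and label == ch}
        current = eps_closure(frag, moved)
    return frag.accept in current
```

For the running example, `select(word("all"), word("and"))` builds an automaton with the same shape as the ε-graph shown for the two sub-automata, and `accepts` confirms that it recognizes "all" and "and" but not "al".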
