
CSCI 2320 Lexical Analysis. Ref: Ch 3 + Handout (Nishimura). Mohammad T. Irfan

  1. 9/25/17 CSCI 2320 Lexical Analysis
     Ref: Ch 3 + Handout (Nishimura)
     Mohammad T. Irfan
     Plan: Chomsky Hierarchy; Lexical Analysis

  2. Chomsky Hierarchy
     From the bottom of the hierarchy (faster computation) to the top (more expressive power):
     regular grammar, context-free grammar (CFG/BNF), context-sensitive grammar, unrestricted grammar
     Notation: A, B ∈ N (nonterminals); ω ∈ T* (strings of terminals); α, β ∈ (T ∪ N)*
     ◦ Regular grammar: A → ω B | ω
     ◦ Context-free grammar (CFG/BNF): A → β
     ◦ Context-sensitive grammar: α → β, where |α| ≤ |β|
     ◦ Unrestricted grammar: α → β
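To make the four production forms concrete, here is a small Python sketch that reports the most restrictive class a single production fits. The tuple-of-symbols representation and the names `classify`, `classify_grammar`, and `LEVELS` are assumptions of this sketch. It checks only the *form* of each rule as defined above, not the language the grammar happens to generate.

```python
# Production-form classifier for the Chomsky hierarchy (a sketch).
# Symbols are strings; a production is (lhs, rhs), both tuples of symbols.
LEVELS = ["regular", "context-free", "context-sensitive", "unrestricted"]

def classify(lhs, rhs, nonterminals):
    """Return the most restrictive class whose form lhs -> rhs fits."""
    if len(lhs) == 1 and lhs[0] in nonterminals:
        # Positions of the nonterminals on the right-hand side.
        nt_positions = [i for i, sym in enumerate(rhs) if sym in nonterminals]
        # A -> ω (no nonterminal) or A -> ω B (exactly one, at the end).
        if not nt_positions or nt_positions == [len(rhs) - 1]:
            return "regular"
        return "context-free"          # A -> β
    if len(lhs) <= len(rhs):
        return "context-sensitive"     # α -> β with |α| <= |β|
    return "unrestricted"              # α -> β

def classify_grammar(productions, nonterminals):
    """A grammar is only as restrictive as its least restrictive production."""
    return max((classify(lhs, rhs, nonterminals) for lhs, rhs in productions),
               key=LEVELS.index)

N = {"Integer", "Digit"}
print(classify(("Integer",), ("0", "Integer"), N))        # regular
print(classify(("Integer",), ("Integer", "Digit"), N))    # context-free
```

Taking the maximum in `classify_grammar` reflects that one rule outside a form is enough to push the whole grammar up a level.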

  3. Regular grammar (A, B ∈ N; ω ∈ T*; A → ω B | ω): pros and cons
     Pros
     ◦ Can handle the first layer of abstraction in PL syntax
     ◦ Integer → 0 Integer | 1 Integer | ... | 9 Integer | 0 | 1 | ... | 9
     ◦ Note: the following is not a regular grammar (why?)
       Integer → Integer Digit
       Digit → 0 | 1 | ... | 9
     Cons
     ◦ Cannot check balanced parentheses, braces, etc.
     ◦ Cannot represent {aⁿbⁿ | n ≥ 1}
     CFG/BNF/EBNF (A ∈ N; β ∈ (T ∪ N)*; A → β): pros and cons
     Pros
     ◦ Can handle all layers of abstraction in PL syntax
     ◦ Assignment → Identifier = Expression;
     Cons
     ◦ Can't express many semantic checks:
       Is a variable declared before use?
       Are the operand and operator compatible?
     ◦ Can't represent languages like {ww | w ∈ T⁺}
     ◦ Can check matching counts (aⁿbⁿ), but can't detect repetition (ww)
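The pros and cons above come down to memory. In this minimal Python sketch (function names hypothetical, digits as single-character terminals), following the right-regular Integer productions needs only the current nonterminal, while checking balanced parentheses needs a depth counter that can grow without bound, which no fixed, finite set of states can track.

```python
def derives_integer(s):
    """Follow Integer -> d Integer | d (d a digit), left to right.

    The only thing carried between characters is the current
    nonterminal, a finite amount of information.
    """
    current = "Integer"
    for i, ch in enumerate(s):
        if ch not in "0123456789":
            return False
        # Use Integer -> d on the last character, Integer -> d Integer otherwise.
        current = None if i == len(s) - 1 else "Integer"
    return current is None   # derivation finished (also rejects the empty string)

def balanced(s):
    """Check balanced parentheses with an unbounded depth counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:    # a ')' with no matching '('
                return False
    return depth == 0

print(derives_integer("2320"), derives_integer(""), balanced("(()())"), balanced("())("))
# True False True False
```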

  4. Context-sensitive grammar (A, B ∈ N; ω ∈ T*; α, β ∈ (T ∪ N)*; α → β, where |α| ≤ |β|): pros and cons
     Pros
     ◦ Can represent languages like {aⁿbⁿcⁿ | n ≥ 1}
     Cons
     ◦ Whether a given sentence ω can be derived from a given context-sensitive grammar is decidable (no rule shrinks a sentential form, so the search is finite), but the problem is PSPACE-complete in general
     ◦ No efficient general parsing algorithm
     ◦ Not practical as the basis for a compiler's parser
     Unrestricted grammar (α, β ∈ (T ∪ N)*; α → β): pros and cons
     Pros
     ◦ Equivalent to Turing machines
     ◦ That is, can compute any computable function
     Cons
     ◦ Can we do parsing? In general, no: whether ω can be derived is undecidable
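Because every rule satisfies |α| ≤ |β|, no derivation step shortens a sentential form, so a derivation of ω never passes through a form longer than ω. That bounds the search, and the Python sketch below simply explores all such forms breadth-first. The grammar for {aⁿbⁿcⁿ | n ≥ 1} is a standard textbook construction used here as an assumption (uppercase letters are nonterminals, S is the start symbol); it is noncontracting in the slide's sense. The search is exponential in the worst case, which is the practical reason it is not a parsing strategy.

```python
from collections import deque

# Assumed noncontracting grammar for {a^n b^n c^n | n >= 1}.
# Uppercase = nonterminal, lowercase = terminal, "S" = start symbol.
RULES = [
    ("S", "aSBC"), ("S", "aBC"),
    ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"),
    ("bC", "bc"), ("cC", "cc"),
]

def derives(w, rules=RULES, start="S"):
    """Return True if the grammar derives w.

    No rule shortens a sentential form, so forms longer than w can be
    discarded; only finitely many remain, so the search terminates.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:  # rewrite every occurrence of lhs, one at a time
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

print(derives("aabbcc"), derives("aabbc"))   # True False
```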

  5. Plan: Chomsky Hierarchy; Lexical Analysis
     Lexical Analysis
     ◦ Input: lexemes (typed ASCII characters)
     ◦ Output: tokens (sequence of characters having a collective meaning)
     ◦ Discard: whitespace, comments
     Example: int count = 10;
       Lexeme   Token
       int      keyword
       count    identifier
       =        operator
       10       intLiteral
       ;        separator

  6. Why do lexical analysis separately?
     ◦ Simpler, faster grammar for parsing (next: how?)
     ◦ Lexical analysis can take up to 75% of compile time
     Def. Regular Expressions
       RegExpr   Meaning
       x         a character x
       \x        an escaped character, e.g., \n
       { Z }     a reference to a reg expr Z
       M | N     M or N, where M and N are reg exprs
       M N       M followed by N
       M*        zero or more occurrences of M
       M+        one or more occurrences of M
       M?        zero or one occurrence of M
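Python's `re` module covers most of this notation directly, with one catch: it has no `{ Z }` reference syntax (in `re`, braces are repetition counts, as in `a{3}`). This minimal sketch assumes a hypothetical `expand` helper and a `DEFS` table of named definitions, substitutes references as plain text before compiling, and then checks each operator with `re.fullmatch`.

```python
import re

# Hypothetical table of named definitions, in the slide's {Z} style.
DEFS = {"Letter": "[a-zA-Z]", "Digit": "[0-9]"}

def expand(pattern, defs=DEFS):
    """Replace every {Name} reference with (?:definition), recursively."""
    def substitute(m):
        return "(?:" + expand(defs[m.group(1)], defs) + ")"
    # Only {Name} made of letters counts as a reference, so re's own
    # repetition counts such as {3} are left untouched.
    return re.sub(r"\{([A-Za-z]+)\}", substitute, pattern)

IDENTIFIER = expand("{Letter}({Letter}|{Digit})*")

# One check per operator in the table; fullmatch must consume the whole string.
assert re.fullmatch(r"\n", "\n")            # \x: an escaped character
assert re.fullmatch(r"a|b", "b")            # M | N
assert re.fullmatch(r"ab", "ab")            # M N
assert re.fullmatch(r"a*", "")              # M*: zero occurrences allowed
assert not re.fullmatch(r"a+", "")          # M+: needs at least one
assert re.fullmatch(r"colou?r", "color")    # M?: zero or one
print(IDENTIFIER)
```

Wrapping each expansion in a non-capturing group `(?:...)` keeps a referenced alternation from leaking into the surrounding concatenation.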

  7. Def. Regular Expressions
       RegExpr   Meaning
       [aeiou]   the set of vowels
       [0-9]     the set of digits
       .         any single character (by default, any character except newline in Python's re and most other tools)
     Special symbols: inside [ ], a leading ^ means not (e.g., [^aeiouAEIOU] is a non-vowel)
     CLite regular definitions
       Category     Definition
       AnyChar      [ -~]      from space (ASCII 32) to tilde (ASCII 126)
       Letter       [a-zA-Z]
       Digit        [0-9]
       Whitespace   [ \t]      space and tab
       Eol          \n
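A quick Python check of the AnyChar range and the negated class; the names `ANY_CHAR` and `NON_VOWEL` are illustrative. In Python's `re`, `^` negates only when it comes right after `[`; elsewhere in a pattern it anchors the match at the start of the string.

```python
import re

ANY_CHAR = re.compile(r"[ -~]")            # AnyChar: ASCII 32 (space) through 126 (~)
NON_VOWEL = re.compile(r"[^aeiouAEIOU]")   # ^ right after [ negates the set

# Enumerate the 7-bit ASCII characters that AnyChar accepts.
any_chars = [c for c in map(chr, range(128)) if ANY_CHAR.fullmatch(c)]
print(len(any_chars), repr(any_chars[0]), repr(any_chars[-1]))   # 95 ' ' '~'
```

Note that `NON_VOWEL` also accepts digits, spaces, and punctuation: "non-vowel" means any character outside the set, not just consonants.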

  8. CLite regular definitions (continued)
       Category     Definition
       Keyword      bool | char | else | false | float | if | int | main | true | while
       Identifier   {Letter}({Letter} | {Digit})*
       IntegerLit   {Digit}+
       FloatLit     {Digit}+\.{Digit}+
       CharLit      '{AnyChar}'
       Operator     = | || | && | == | != | < | <= | > | >= | + | - | * | / | ! | [ | ]
       Separator    ; | , | { | } | ( | )
       Comment      // ({AnyChar} | {Whitespace})* {Eol}
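The CLite definitions translate almost line for line into a scanner built on Python's `re`. This is a minimal sketch, not the course's reference lexer: the names `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical, and two design choices are this sketch's own. First, keywords are matched by the Identifier pattern and reclassified afterwards. Second, because `re` takes the first alternative that matches rather than the longest, the alternatives are ordered so that longer lexemes are tried first. The Operator and Separator patterns include `>=`, `;`, and `,`, which CLite programs use.

```python
import re

KEYWORDS = {"bool", "char", "else", "false", "float", "if", "int",
            "main", "true", "while"}

# (category, pattern) pairs transcribed from the CLite definitions.
# Order matters: Comment before the "/" operator, FloatLit before
# IntegerLit, two-character operators before one-character ones.
TOKEN_SPEC = [
    ("Comment",    r"//[ -~\t]*"),        # the {Eol} is consumed separately
    ("Whitespace", r"[ \t]+"),
    ("Eol",        r"\n"),
    ("FloatLit",   r"[0-9]+\.[0-9]+"),
    ("IntegerLit", r"[0-9]+"),
    ("CharLit",    r"'[ -~]'"),
    ("Identifier", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("Operator",   r"\|\||&&|==|!=|<=|>=|[=<>+\-*/!\[\]]"),
    ("Separator",  r"[;,{}()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))
DISCARD = {"Comment", "Whitespace", "Eol"}

def tokenize(text):
    """Return (category, lexeme) pairs; raise SyntaxError on an illegal character."""
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        kind, lexeme = m.lastgroup, m.group()
        # Keywords match the Identifier pattern; reclassify them here, so
        # "ifx" stays one identifier instead of becoming "if" + "x".
        if kind == "Identifier" and lexeme in KEYWORDS:
            kind = "Keyword"
        if kind not in DISCARD:
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens

print(tokenize("int count = 10;"))
# [('Keyword', 'int'), ('Identifier', 'count'), ('Operator', '='), ('IntegerLit', '10'), ('Separator', ';')]
```

As in the Comment definition, a comment here may contain only AnyChar or Whitespace characters; a non-ASCII character inside a comment is reported as illegal.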

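  9. Implementation Using Python
     Python's re package: https://docs.python.org/3/library/re.html
       import re        # regular expressions
       re.split(...)    # use a regex argument to split a string into parts
     Common string-matching regex classes:
       Symbol   Definition
       \d       [0-9]
       \D       [^0-9]
       \w       [a-zA-Z0-9_]
       \W       [^a-zA-Z0-9_]
     (These equivalences hold exactly with the re.ASCII flag; by default, str patterns also match non-ASCII Unicode digits and letters.)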
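A few concrete calls show what `re.split` does and does not give a lexer, and how `\d` behaves on non-ASCII input. The input strings are made up for illustration.

```python
import re

# Splitting on runs of whitespace separates most lexemes...
words = re.split(r"\s+", "int   count =\t10;")      # ['int', 'count', '=', '10;']

# ...but "10;" stays glued together. A capturing group keeps the delimiters:
pieces = re.split(r"(\W)", "count=10;")             # ['count', '=', '10', ';', '']

# \d matches any Unicode decimal digit unless re.ASCII is given.
# U+0663 is ARABIC-INDIC DIGIT THREE.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None            # True
ascii_digit = re.fullmatch(r"\d", "\u0663", re.ASCII) is not None    # False

print(words, pieces, unicode_digit, ascii_digit)
```

Because splitting cannot tell `10` from `;` without a delimiter between them, a real lexer matches token patterns at each position instead of splitting.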

  10. Describe the language:
       1. 0(0|1)+0
       2. ((ε|0)1*)*
       3. 0*10*10*10*
       4. (00|11)*
     Write a regular expression for:
       1. All strings of lowercase letters in which the letters appear in ascending order.
       2. All strings of letters containing the vowels in order.
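One way to explore the "describe the language" exercises is to list the short strings each expression accepts and look for the pattern. A minimal Python sketch; `members`, `EXPRS`, and the length bound are hypothetical choices. Python has no ε symbol, so (ε|0) is written `0?`, which denotes the same language.

```python
import itertools
import re

# The slide's expressions in Python syntax; (ε|0) is written 0?.
EXPRS = {
    "0(0|1)+0": r"0(0|1)+0",
    "((ε|0)1*)*": r"(0?1*)*",
    "0*10*10*10*": r"0*10*10*10*",
    "(00|11)*": r"(00|11)*",
}

def members(pattern, max_len=4, alphabet="01"):
    """List every string over alphabet, up to max_len long, that fully matches pattern."""
    found = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            w = "".join(chars)
            if re.fullmatch(pattern, w):
                found.append(w)
    return found

for name, pattern in EXPRS.items():
    print(name, members(pattern))
```

Enumeration only suggests a description; confirming it still takes an argument about every string, not just the short ones.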

  11. Exam 1: Thursday, Sept 28, start of class (30 min), covering material up to today's class
     Finite State Automata (FSA): behind the scenes of regular expressions

  12. Finite State Automata (FSA)
     ◦ Σ: input alphabet + a unique end symbol ($)
     ◦ Set of states
       Represented by nodes
       Unique start state
       One or more final states
     ◦ State transition function
       Arcs in the graph, labelled with alphabet symbols
     Deterministic F.A. (DFA)
     ◦ There is at most one outgoing arc from any state for any particular input symbol
     ◦ Easy to parse: does x belong to L(G)?
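A DFA is easy to run: at each step there is at most one arc to follow, so a dictionary lookup decides the next state. The sketch below is a hypothetical DFA for the CLite Identifier definition, with arcs labelled by character classes instead of individual characters; the end of the input string stands in for the $ end symbol.

```python
import string

# Hypothetical DFA for Identifier = {Letter}({Letter} | {Digit})*.
START, FINALS = "S0", {"S1"}
DELTA = {
    ("S0", "letter"): "S1",
    ("S1", "letter"): "S1",
    ("S1", "digit"): "S1",
}

def char_class(ch):
    """Map a character to the label used on the DFA's arcs."""
    if ch in string.ascii_letters:
        return "letter"
    if ch in string.digits:
        return "digit"
    return "other"

def dfa_accepts(s):
    state = START
    for ch in s:
        # At most one arc per (state, symbol), so one lookup suffices.
        state = DELTA.get((state, char_class(ch)))
        if state is None:          # no arc for this symbol: reject
            return False
    # End of input plays the role of the end symbol $.
    return state in FINALS

print(dfa_accepts("count1"), dfa_accepts("1count"))   # True False
```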

  13. Non-deterministic F.A. (NFA)
     ◦ Allows multiple outgoing arcs from a state for the same input symbol
     ◦ Allows transitions on the empty string (ε)
     ◦ Easy to express a language, but difficult to parse
     Known algorithms
       1. DFA → regular expression
       2. Regular expression → NFA (language designer → implementation (parsing))
       3. NFA → DFA
     DFA → Regex → NFA → DFA: all 3 are equivalent!
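The "difficult to parse" point shows up as soon as an NFA is simulated: instead of one current state there is a set of them. A minimal Python sketch for the odd-binary-number language (0|1)*1; the state names and dictionary encoding are hypothetical. This particular NFA needs no ε-arcs, so the sketch omits the ε-closure step that a general NFA would require.

```python
# NFA for (0|1)*1: q0 loops on 0 and 1; on a 1 it may also guess
# "this is the last symbol" and move to the final state q1.
NFA = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}
START, FINALS = "q0", {"q1"}

def nfa_accepts(s, nfa=NFA, start=START, finals=FINALS):
    current = {start}                  # every state the NFA could be in
    for ch in s:
        current = set().union(*(nfa.get((q, ch), set()) for q in current))
        if not current:                # every path has died
            return False
    return bool(current & finals)      # accept if ANY path ends in a final state

print(nfa_accepts("1011"), nfa_accepts("10"))   # True False
```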

  14. Example: odd binary numbers
     Regex → NFA → DFA → Regex:  (0|1)*1 → ? → ? → ?
     (In the Nishimura handout, + means |.)
     Regex → NFA idea:
     ◦ For |, the symbols go on the same arc
     ◦ For concatenation, create a new state
     ◦ For *, use a self-loop
     NFA → DFA idea:
     ◦ Start with the NFA start state, plus every state reachable from it by ε-transitions, and tabulate all possible sets of NFA states that you can reach on 0 and 1 transitions
     ◦ Each set of NFA states is a DFA state
     DFA → Regex: state elimination algorithm (more details on the next slide)
     Regex → NFA figure: Scott, Programming Language Pragmatics (2000)
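The NFA → DFA idea above is the subset construction. In this minimal Python sketch, the construction starts from the ε-closure of the NFA start state, and each reachable set of NFA states (a `frozenset`) becomes one DFA state. The dictionary encoding, the use of `""` as the ε label, and all function names are assumptions of this sketch.

```python
from collections import deque

EPS = ""  # label used for ε-transitions in this sketch

def eps_closure(nfa, states):
    """All states reachable from `states` using only ε-arcs."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def subset_construction(nfa, start, finals, alphabet):
    start_set = eps_closure(nfa, {start})
    dfa, seen, work = {}, {start_set}, deque([start_set])
    while work:
        S = work.popleft()
        for a in alphabet:
            moved = set()
            for s in S:
                moved |= nfa.get((s, a), set())
            if not moved:
                continue               # no arc: the DFA has no transition here
            T = eps_closure(nfa, moved)
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    # A DFA state is final if it contains any final NFA state.
    dfa_finals = {S for S in seen if S & finals}
    return start_set, dfa, dfa_finals

# NFA for (0|1)*1: q0 loops on 0 and 1, and q0 -> q1 on 1.
NFA = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}
start, delta, finals = subset_construction(NFA, "q0", {"q1"}, "01")
for (S, a), T in delta.items():
    print(sorted(S), a, "->", sorted(T))
```

On this NFA the construction yields two DFA states, {q0} and {q0, q1}, with {q0, q1} final.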

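  15. NFA/DFA → Regex: state elimination algorithm
     How do we preserve all paths after deleting a node?
     For each node to be deleted:
     ◦ Match each incoming arc with every outgoing arc: an incoming arc labelled r1 and an outgoing arc labelled r3 combine into one arc labelled r1 (r2)* r3, where r2 labels the deleted node's self-loop (omit (r2)* if there is none)
     ◦ If an arc already joins the same two nodes, combine the labels with |
     Class Participation 3
     Do the following for binary numbers with an even number of 0s: regular expression → NFA → DFA → regular expression.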
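The arc-matching rule can be coded directly. This minimal Python sketch assumes arc labels are single symbols or parenthesized expressions and emits Python `re` syntax rather than the handout's + notation. The fresh `_start` and `_final` states are this sketch's own convention: they give the result a single entry and exit. It is demonstrated on a two-state DFA for odd binary numbers.

```python
import itertools
import re

def union(r, s):
    """r | s, treating None as 'no arc'."""
    if r is None:
        return s
    if s is None or r == s:
        return r
    return f"({r}|{s})"

def star(r):
    """(r)*; a missing or ε label contributes ε."""
    return "" if not r else f"({r})*"

def state_elimination(states, start, finals, arcs):
    """Convert an automaton to a regex. arcs maps (p, q) to a label; "" is ε.

    Labels must be single symbols or already parenthesized.
    """
    s_new, f_new = "_start", "_final"       # fresh states (assumed unused)
    g = dict(arcs)
    g[(s_new, start)] = ""
    for f in finals:
        g[(f, f_new)] = ""
    for x in states:
        loop = star(g.pop((x, x), None))
        ins = [(p, r) for (p, q), r in g.items() if q == x]
        outs = [(q, r) for (p, q), r in g.items() if p == x]
        for p, _ in ins:
            del g[(p, x)]
        for q, _ in outs:
            del g[(x, q)]
        # Match each incoming arc with every outgoing arc.
        for (p, r_in), (q, r_out) in itertools.product(ins, outs):
            g[(p, q)] = union(g.get((p, q)), r_in + loop + r_out)
    return g.get((s_new, f_new))            # None means the language is empty

# Two-state DFA for odd binary numbers: B means "last bit read was 1".
ODD = state_elimination(
    ["A", "B"], "A", {"B"},
    {("A", "A"): "0", ("A", "B"): "1", ("B", "A"): "0", ("B", "B"): "1"},
)
print(ODD)
print([w for w in ("1", "10", "011") if re.fullmatch(ODD, w)])   # ['1', '011']
```

The resulting expression is correct but verbose; eliminating the states in a different order changes its shape, not its language.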
