Automatic scanner generation in MiniJava Symbol class We use the - PDF document

Automatic scanner generation in MiniJava Symbol class We use the jflex tool to automatically create a scanner from a Lexemes are represented as instances of class Symbol specification file, Scanner/minijava.jflex class Symbol { (We use the CUP tool to automatically create a parser from a int sym; // which token class? specification file, Parser/minijava.cup , which also Object value; // any extra data for this lexeme generates all the code for the token classes used in the ... scanner, via the Symbol class.) } The MiniJava Makefile automatically rebuilds the scanner A different integer constant is defined for each token class, in the (or parser) whenever its specification file changes sym helper class class sym { static int CLASS = 1; static int IDENTIFIER = 2; static int COMMA = 3; ... } Can use this in printing code for Symbol s • see symbolToString in minijava.jflex Craig Chambers 40 CSE 401 Craig Chambers 41 CSE 401 Token declarations jflex token specifications Declare new token classes in Parser/minijava.cup , Helper definitions for character classes and regular expressions using terminal declarations letter = [a-zA-Z] • include Java type if Symbol stores extra data eol = [\r\n] Examples: (Simple) token definitions are of the form: regexp { Java stmt } /* reserved words: */ terminal CLASS, PUBLIC, STATIC, EXTENDS; regexp can be (at least): ... • a string literal in double-quotes, e.g. "class" , "<=" /* operators: */ • a reference to a named helper, in braces, e.g. {letter} terminal PLUS, MINUS, STAR, SLASH, EXCLAIM; • a character list or range, in square brackets, e.g. [a-zA-Z] ... • a negated character list or range, e.g. [^\r\n] /* delimiters: */ • . (which matches any single character) terminal OPEN_PAREN, CLOSE_PAREN; • regexp regexp , regexp | regexp , regexp * , regexp + , regexp ? , ( regexp ) terminal EQUALS, SEMICOLON, COMMA, PERIOD; ... Java stmt (the accept action) is typically: /* tokens with values: */ • return symbol(sym. CLASS ); for a simple token terminal String IDENTIFIER; • return symbol(sym. CLASS , yytext()); for a token terminal Integer INT_LITERAL; with extra data based on the lexeme string yytext() • empty for whitespace Craig Chambers 42 CSE 401 Craig Chambers 43 CSE 401

Syntactic Analysis / Parsing Context-free grammars (CFG’s) Purpose: stream of tokens ⇒ abstract syntax tree (AST) Syntax specified using CFG’s • RE’s not powerful enough AST: • can’t handle nested, recursive structure • general grammars (GG’s) too powerful • captures hierarchical structure of input program • not decidable ⇒ parser might run forever! • primary representation of program for rest of compiler CFG’s: convenient compromise • capture important structural & nesting characteristics Plan: • some properties checked later during semantic analysis • study how grammars can specify syntax • study algorithms for constructing ASTs from token streams • study MiniJava implementation Common notation for CFG’s: Extended Backus-Naur Form (EBNF) Craig Chambers 44 CSE 401 Craig Chambers 45 CSE 401 CFG terminology EBNF description of initial MiniJava syntax Program ::= MainClassDecl {ClassDecl} MainClassDecl::= class ID { Terminals : alphabet of language defined by CFG public static void main ( String [ ] ID ) { {Stmt} } } ClassDecl ::= class ID [ extends ID] { Nonterminals : symbols defined in terms of terminals and {ClassVarDecl} {MethodDecl} } nonterminals ClassVarDecl ::= Type ID ; MethodDecl ::= public Type ID Production : rule for how a nonterminal (l.h.s.) is defined in ( [Formal { , Formal}] ) { {Stmt} return Expr ; } terms of a finite, possibly empty sequence of terminals & Formal ::= Type ID nonterminals Type ::= int | boolean | ID • recursive productions allowed! Stmt ::= Type ID ; | { {Stmt} } Can have multiple productions for same nonterminal | if ( Expr ) Stmt else Stmt | while ( Expr ) Stmt • alternatives | System.out.println ( Expr ) ; | ID = Expr ; Start symbol : root symbol defining language Expr ::= Expr Op Expr | ! Expr | Expr . ID ( [Expr { , Expr}] ) | ID | this Program ::= Stmt | Integer | true | false Stmt ::= if ( Expr ) Stmt else Stmt | ( Expr ) Stmt ::= while ( Expr ) Stmt Op ::= + | - | * | / | < | <= | >= | > | == | != | && Craig Chambers 46 CSE 401 Craig Chambers 47 CSE 401

Transition diagrams Derivations and parse trees “Railroad diagrams” Derivation: sequence of expansion steps, beginning with start symbol, • another, more graphical notation for CFG’s leading to a string of terminals • look like FSA’s, where arcs can be labelled with nonterminals as well as terminals Parsing: inverse of derivation • given target string of terminals (a.k.a. tokens), want to recover nonterminals representing structure Can represent derivation as a parse tree • concrete syntax tree Craig Chambers 48 CSE 401 Craig Chambers 49 CSE 401 Example grammar Ambiguity E ::= E Op E | - E | ( E ) | id Some grammars are ambiguous : Op ::= + | - | * | / • multiple distinct parse trees with same final string Structure of parse tree captures much of meaning of program; ambiguity ⇒ multiple possible meanings for same program Craig Chambers 50 CSE 401 Craig Chambers 51 CSE 401

Famous ambiguities: “dangling else” Resolving the ambiguity Stmt ::= ... | Option 1: add a meta-rule e.g. “ else associates with closest previous if ” if ( Expr ) Stmt | if ( Expr ) Stmt else Stmt • works, keeps original grammar intact • ad hoc and informal “ if ( e 1 ) if ( e 2 ) s 1 else s 2 ” Craig Chambers 52 CSE 401 Craig Chambers 53 CSE 401 Option 2: rewrite the grammar to resolve ambiguity explicitly Option 3: redesign the language to remove the ambiguity Stmt ::= ... | Stmt ::= MatchedStmt | UnmatchedStmt if Expr then Stmt end | MatchedStmt ::= ... | if Expr then Stmt else Stmt end if ( Expr ) MatchedStmt else MatchedStmt • formal, clear, elegant UnmatchedStmt ::= if ( Expr ) Stmt | if ( Expr ) MatchedStmt • allows sequence of Stmt s in then and else branches, no { , } needed else UnmatchedStmt • extra end required for every if • formal, no additional rules beyond syntax • sometimes obscures original grammar Craig Chambers 54 CSE 401 Craig Chambers 55 CSE 401

Another famous ambiguity: expressions Resolving the ambiguity E ::= E Op E | - E | ( E ) | id Option 1: add some meta-rules, e.g. precedence and associativity rules Op ::= + | - | * | / “ a + b * c ” Example: E ::= E Op E | - E | E ++ | ( E ) | id Op ::= + | - | * | / | % | ** | == | < | && | || operator precedence associativity postfix ++ highest left prefix - right ** (expon.) right * , / , % left + , - left == , < none && left || lowest left Craig Chambers 56 CSE 401 Craig Chambers 57 CSE 401 Option 2: modify the grammar to explicitly resolve the ambiguity Example, redone Strategy: E ::= E0 • create a nonterminal for each precedence level E0 ::= E0 || E1 | E1 left associative • expr is lowest precedence nonterminal, E1 ::= E1 && E2 | E2 left associative each nonterminal can be rewritten with higher E2 ::= E3 ( == | < ) E3 non associative precedence operator, E3 ::= E3 ( + | - ) E4 | E4 left associative highest precedence operator includes atomic exprs E4 ::= E4 ( * | / | % ) E5 | E5 left associative • at each precedence level, use: E5 ::= E6 ** E5 | E6 right associative • left recursion for left-associative operators E6 ::= - E6 | E7 right associative • right recursion for right-associative operators E7 ::= E7 ++ | E8 left associative • no recursion for non-associative operators E8 ::= id | ( E ) Craig Chambers 58 CSE 401 Craig Chambers 59 CSE 401

Designing a grammar Parsing algorithms Concerns: Given grammar, want to parse input programs • accuracy • check legality • unambiguity • produce AST representing structure • formality • be efficient • readability, clarity • ability to be parsed by particular parsing algorithm Kinds of parsing algorithms: • top-down parser ⇒ LL(k) grammar • top-down • bottom-up parser ⇒ LR(k) grammar • bottom-up • ability to be implemented using a particular strategy • by hand • by automatic tools Craig Chambers 60 CSE 401 Craig Chambers 61 CSE 401 Top-down parsing Predictive parsing Build parse tree for input program from the top (start symbol) Predictive parser: down to leaves (terminals) top-down parser that can select correct rhs looking at at most k input tokens (the lookahead ) Basic issue: Efficient: • when "expanding" a nonterminal with some r.h.s., how to pick which r.h.s.? • no backtracking needed • linear time to parse E.g. Stmt ::= Call | Assign | If | While Implementation of predictive parsers: Call ::= Id ( Expr { , Expr } ) • recursive-descent parser Assign ::= Id = Expr ; • each nonterminal parsed by a procedure If ::= if Test then Stmts end | • call other procedures to parse sub-nonterminals, recursively if Test then Stmts else Stmts end • typically written by hand While ::= while Test do Stmts end • table-driven parser • PDA: like table-driven FSA, plus stack to do recursive FSA calls Solution: look at input tokens to help decide • typically generated by a tool from a grammar specification Craig Chambers 62 CSE 401 Craig Chambers 63 CSE 401

Automatic scanner generation in MiniJava Symbol class We use the - PDF document

Automatic scanner generation in MiniJava Symbol class We use the jflex tool to automatically create a scanner from a Lexemes are represented as instances of class Symbol specification file, Scanner/minijava.jflex class Symbol { (We use the CUP

2D & 3D Scanner 2D & 3D Scanner 3D & 2D scanner HD The scanner C800 was specially

Symbol tables COMP 520 Fall 2013 Symbol tables (2) Symbol tables are used to describe and analyse

Getting Input from a File To open a file for reading: Scanner inFile = new Scanner (file);

2. Lexical Analysis 2.1 Tasks of a Scanner 2.2 Regular Grammars and Finite Automata 2.3 Scanner

2. Lexical Analysis 2.1 Tasks of a Scanner 2.2 Regular Grammars and Finite Automata 2.3 Scanner

Lexical Analysis The Scanner CSC 4181 Compiler Construction 1 Scanner 1 Introduction A

INF5110 Compiler Construction Symbol tables Spring 2016 1 / 43 Outline 1. Symbol tables

Introduction to Computer Science I Scanner, Increment/Decrement, Conversion Janyl Jumadinova

Introduction to Computer Science I Scanner, Increment/Decrement, Conversion Janyl Jumadinova 5

HP Presentation Barcode Scanner The HP Presentation Barcode scanner is compatible with CornerStore

Scanning Slides and Negatjves Prepare the scanner #1 1. Open the scanner and remove the document

Retrieving String inputs from the keyboard package PackageName ; // import the Scanner class

Fermilab LBNF CF Far Detector BSI Facilities Engineering Services Section 8/26/2015

Symbol Tables in JastAdd What is a symbol table used for? Determining the origin and the

Automatic Verification of Automatic Verification of Automatic Verification of Automatic

How to Scan 35mm Slides Make sure the scanner cable is plugged into the iMacs USB port. 1 Tie

17. Recursion 2 Input: 3 + 5 * 20 Output: 103 Input: (3 + 5) * 20 Output: 160 Input: -(3 + 5) + 20

UMBC A B M A L T F O U M B C I M Y O R T 1 (Feb. 25, 2002) I E S R C E O

Memory Systems Design & Programming CMPE 310 Memory Types Two basic types: ROM:

Inequality and Instability What we learned and why it matters James K. Galbraith and Batrice

CSC 1800 Organization of Programming Languages Date Syntax 1 Assignment 2 BNF/EBNF for

tr tr

Compiling Techniques Lecture 5: Top-Down Parsing Christophe Dubach 26 September 2017 Christophe

24) Exchange Syntax and Textual 3.Advanced features 1. Mapping text to data types DSLs using

Automatic scanner generation in MiniJava Symbol class We use the - PDF document

Automatic scanner generation in MiniJava Symbol class We use the jflex tool to automatically create a scanner from a Lexemes are represented as instances of class Symbol specification file, Scanner/minijava.jflex class Symbol { (We use the CUP

2D &amp; 3D Scanner 2D &amp; 3D Scanner 3D &amp; 2D scanner HD The scanner C800 was specially

Symbol tables COMP 520 Fall 2013 Symbol tables (2) Symbol tables are used to describe and analyse

Getting Input from a File To open a file for reading: Scanner inFile = new Scanner (file);

2. Lexical Analysis 2.1 Tasks of a Scanner 2.2 Regular Grammars and Finite Automata 2.3 Scanner

2. Lexical Analysis 2.1 Tasks of a Scanner 2.2 Regular Grammars and Finite Automata 2.3 Scanner

Lexical Analysis The Scanner CSC 4181 Compiler Construction 1 Scanner 1 Introduction A

INF5110 Compiler Construction Symbol tables Spring 2016 1 / 43 Outline 1. Symbol tables

Introduction to Computer Science I Scanner, Increment/Decrement, Conversion Janyl Jumadinova

Introduction to Computer Science I Scanner, Increment/Decrement, Conversion Janyl Jumadinova 5

HP Presentation Barcode Scanner The HP Presentation Barcode scanner is compatible with CornerStore

Scanning Slides and Negatjves Prepare the scanner #1 1. Open the scanner and remove the document

Retrieving String inputs from the keyboard package PackageName ; // import the Scanner class

Fermilab LBNF CF Far Detector BSI Facilities Engineering Services Section 8/26/2015

Symbol Tables in JastAdd What is a symbol table used for? Determining the origin and the

Automatic Verification of Automatic Verification of Automatic Verification of Automatic

How to Scan 35mm Slides Make sure the scanner cable is plugged into the iMacs USB port. 1 Tie

17. Recursion 2 Input: 3 + 5 * 20 Output: 103 Input: (3 + 5) * 20 Output: 160 Input: -(3 + 5) + 20

UMBC A B M A L T F O U M B C I M Y O R T 1 (Feb. 25, 2002) I E S R C E O

Memory Systems Design &amp; Programming CMPE 310 Memory Types Two basic types: ROM:

Inequality and Instability What we learned and why it matters James K. Galbraith and Batrice

CSC 1800 Organization of Programming Languages Date Syntax 1 Assignment 2 BNF/EBNF for

tr tr

Compiling Techniques Lecture 5: Top-Down Parsing Christophe Dubach 26 September 2017 Christophe

24) Exchange Syntax and Textual 3.Advanced features 1. Mapping text to data types DSLs using

2D & 3D Scanner 2D & 3D Scanner 3D & 2D scanner HD The scanner C800 was specially

Memory Systems Design & Programming CMPE 310 Memory Types Two basic types: ROM: