
Specifying Tokens with SableCC; Recognizing Tokens with DFAs - PowerPoint PPT Presentation



  1. CS553 Lecture: Scanning and Parsing

     Scanning and Parsing
     Announcements
     – Project 1 is 5% of total grade
     – Project 2 is 10% of total grade
     – Project 3 is 15% of total grade
     – Project 4 is 10% of total grade
     Today
     – Outline of planned topics for course
     – Overall structure of a compiler
     – Lexical analysis (scanning)
     – Syntactic analysis (parsing)

     Structure of a Typical Interpreter / Compiler
     Analysis: character stream -> lexical analysis -> tokens (“words”) -> syntactic analysis -> AST (“sentences”) -> semantic analysis -> annotated AST
     Synthesis: IR code generation -> IR -> optimization -> IR -> code generation -> target language
     Interpreter: annotated AST -> interpreter

     Lexical Analysis (Scanning)
     Break character stream into tokens (“words”)
     – Tokens, lexemes, and patterns
     – Lexical analyzers are usually automatically generated from patterns (regular expressions) (e.g., lex)
     Examples
       token        lexeme(s)        pattern
       const        const            const
       if           if               if
       relation     <,<=,=,!=,...    < | <= | = | != | ...
       identifier   foo,index        [a-zA-Z_]+[a-zA-Z0-9_]*
       number       3.14159,570      [0-9]+ | [0-9]*.[0-9]+
       string       "hi", "mom"      ".*"
     const pi := 3.14159 ⇒ const, identifier(pi), assign, number(3.14159)

     Interaction Between Scanning and Parsing
     character stream -> Lexical analyzer -> token -> Parser -> parse tree or AST
     The parser requests tokens through lexer.next() and lexer.peek().
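The token table above can be turned into a small hand-written scanner. The following is a minimal sketch in Python, not the output of lex or SableCC. The names `TOKEN_SPEC` and `tokenize` are made up for illustration, the `assign` pattern `:=` is inferred from the `const pi := 3.14159` example, the decimal point in the number pattern is escaped, and the string pattern is narrowed to `"[^"]*"` so that one string cannot swallow the next.

```python
import re

# (token name, pattern) pairs adapted from the slide's table.
# Keywords come before identifier so that they win when both could match.
TOKEN_SPEC = [
    ("const",      r"const\b"),
    ("if",         r"if\b"),
    ("identifier", r"[a-zA-Z_]+[a-zA-Z0-9_]*"),
    ("number",     r"[0-9]*\.[0-9]+|[0-9]+"),   # assumption: '.' escaped
    ("string",     r'"[^"]*"'),                 # assumption: non-greedy form of ".*"
    ("assign",     r":="),                      # assumption: taken from the example line
    ("relation",   r"<=|!=|<|="),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]
WHITESPACE = re.compile(r"\s+")


def tokenize(text):
    """Break a character stream into (token, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        blank = WHITESPACE.match(text, pos)
        if blank:
            pos = blank.end()          # whitespace is skipped, not returned
            continue
        for name, regex in COMPILED:
            match = regex.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens


print(tokenize("const pi := 3.14159"))
```

For the slide's example line this yields the same sequence as the slide: const, identifier(pi), assign, number(3.14159).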

  2. Specifying Tokens with SableCC
     Theory meets practice:
     – Regular expressions, formal languages, grammars, parsing…
     SableCC example input file:
       Package minijava;
       Helpers
         all = [0..0xFFFF];
         cr = 13;
         digit = ['0'..'9'];
         letter = ['a'..'z'] | ['A'..'Z'];
         underscore = '_';
         not_star = [all - '*'];
         not_star_slash = [not_star - '/'];
         c_comment = '/*' not_star* ('*' (not_star_slash not_star*)?)* '*/';
       Tokens
         t_plus = '+';
         t_if = 'if';
         t_id = letter (letter | digit | underscore)*;
         t_blank = (' ' | eol | tab)+;
         t_comment = c_comment | line_comment;
       Ignored Tokens
         t_blank,
         t_comment;

     Recognizing Tokens with DFAs
     – 'if' (t_if): state 1 -f-> state 4 -i-> state 5
     – letter (letter | digit)* (t_id): state 1 -letter-> state 2, with a loop on state 2 for letter or digit
     Ambiguity due to matching substrings
     – Longest match
     – Rule priority

     Syntactic Analysis (Parsing)
     Impose structure on token stream
     – Limited to syntactic structure (⇒ high-level)
     – Structure usually represented with an abstract syntax tree (AST)
     – Parsers are usually automatically generated from context-free grammars (e.g., yacc, bison, cup, javacc, sablecc)
     Example
       for i = 1 to 10 do a[i] = x * 5;
     Tokens:
       for id(i) equal number(1) to number(10) do id(a) lbracket id(i) rbracket equal id(x) times number(5) semi
     AST:
       for(i, 1, 10, asg(arr(a, i), tms(x, 5)))

     Interaction Between Scanning and Parsing
     character stream -> Lexical analyzer -> token -> Parser -> parse tree or AST
     The parser requests tokens through lexer.next() and lexer.peek().
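The two disambiguation rules, longest match and rule priority, can be illustrated with a hand-coded DFA that merges the `t_if` and `t_id` automata. This is only a sketch: the state names and the `step` and `longest_match` functions are invented for the example, and a generated scanner would drive a table rather than `if` statements.

```python
# States of the merged DFA for t_if = 'if' and
# t_id = letter (letter | digit | underscore)*.
START, I, IF, ID = "start", "i", "if", "id"

# The state reached after exactly "if" accepts both tokens; it is labelled
# t_if because t_if is listed before t_id (rule priority).
ACCEPTING = {I: "t_id", IF: "t_if", ID: "t_id"}


def is_letter(ch):
    return ("a" <= ch <= "z") or ("A" <= ch <= "Z")


def is_digit(ch):
    return "0" <= ch <= "9"


def step(state, ch):
    """Return the next DFA state, or None when no transition exists."""
    if state == START:
        if ch == "i":
            return I
        return ID if is_letter(ch) else None
    if state == I and ch == "f":
        return IF
    # From I, IF or ID, any further identifier character leads to ID.
    if is_letter(ch) or is_digit(ch) or ch == "_":
        return ID
    return None


def longest_match(text, pos=0):
    """Run the DFA as far as possible and return the last accepted token."""
    state = START
    last_accept = None
    i = pos
    while i < len(text):
        state = step(state, text[i])
        if state is None:
            break
        i += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[pos:i])
    return last_accept


print(longest_match("if"))      # the keyword wins the tie by rule priority
print(longest_match("iffy x"))  # the longer identifier wins over the keyword prefix
```

Remembering the last accepting state while continuing to scan is what implements longest match; the label chosen for the shared accepting state is what implements rule priority.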

  3. Bottom-Up Parsing: Shift-Reduce
     Grammar
       (1) S -> E
       (2) E -> E + T
       (3) E -> T
       (4) T -> id
     Rightmost derivation of a + b + c:
       S -> E
         -> E + T
         -> E + id
         -> E + T + id
         -> E + id + id
         -> T + id + id
         -> id + id + id
     Rightmost derivation: expand rightmost non-terminals first
     SableCC, yacc, and bison generate shift-reduce parsers:
     – LALR(1): look-ahead, left-to-right, rightmost derivation in reverse, 1 symbol lookahead
     – LALR is a parsing table construction method, smaller tables than canonical LR
     Reference: Barbara Ryder’s 198:515 lecture notes

     Shift-Reduce Parsing Example
     Input: a + b + c, using the grammar above
       Stack      Input        Action
       $          a + b + c    shift
       $ a        + b + c      reduce (4)
       $ T        + b + c      reduce (3)
       $ E        + b + c      shift
       $ E +      b + c        shift
       $ E + b    + c          reduce (4)
       $ E + T    + c          reduce (2)
       $ E        + c          shift
       $ E +      c            shift
       $ E + c                 reduce (4)
       $ E + T                 reduce (2)
       $ E                     reduce (1)
       $ S                     accept

     Shift-Reduce Parsing Example (precedence problem)
     Grammar
       (1) S -> E
       (2) E -> E + T
       (3) E -> E * T
       (4) E -> T
       (5) T -> id
       Stack      Input        Action
       $          a + b * c    shift

     Syntax-directed Translation: AST Construction example
     Grammar with production rules
       S: E         { $$ = $1; };
       E: E '+' T   { $$ = new node("+", $1, $3); }
        | T         { $$ = $1; } ;
       T: T_ID      { $$ = new leaf("id", $1); };
     Implicit parse tree for a+b+c:
       S(E(E(E(T(T_ID a)) + T(T_ID b)) + T(T_ID c)))
     AST for a+b+c:
       +(+(a, b), c)
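The stack/input/action trace for a + b + c can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the shift and reduce decisions are hard-coded for the four-rule grammar instead of coming from an LALR(1) table as in SableCC, yacc, or bison, the function name `shift_reduce` is hypothetical, and every token other than `+` is treated as an `id` (so identifiers must not be named `S`, `E`, or `T`).

```python
GRAMMAR_SYMBOLS = {"$", "+", "S", "E", "T"}


def shift_reduce(tokens):
    """Parse with (1) S -> E, (2) E -> E + T, (3) E -> T, (4) T -> id.

    Returns the trace as a list of (stack, remaining input, action) rows.
    """
    stack = ["$"]
    rest = list(tokens)
    trace = []

    def record(action):
        trace.append((" ".join(stack), " ".join(rest), action))

    while True:
        top = stack[-1]
        if top not in GRAMMAR_SYMBOLS:          # an identifier is on top
            record("reduce (4)")
            stack[-1] = "T"
        elif stack[-3:] == ["E", "+", "T"]:     # handle for rule (2)
            record("reduce (2)")
            stack[-3:] = ["E"]
        elif top == "T":
            record("reduce (3)")
            stack[-1] = "E"
        elif stack == ["$", "E"] and not rest:
            record("reduce (1)")
            stack[-1] = "S"
        elif stack == ["$", "S"] and not rest:
            record("accept")
            return trace
        elif rest:
            record("shift")
            stack.append(rest.pop(0))
        else:
            raise SyntaxError("parse error with stack: " + " ".join(stack))


for row in shift_reduce(["a", "+", "b", "+", "c"]):
    print("{:<10} {:<12} {}".format(*row))
```

Checking the `E + T` handle before the lone `T` is what makes `+` left-associative here; a table-driven parser encodes the same choice in its states.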

  4. Using SableCC to specify grammar and generate AST
       Productions
         cst_program {-> program} =
           cst_main_class cst_class_decl*
             {-> New program(cst_main_class.main_class, [cst_class_decl.class_decl])} ;
         cst_exp_list {-> exp* } =
             {many_rule} cst_exp cst_exp_rest* {-> [cst_exp.exp, cst_exp_rest.exp] }
           | {empty_rule} {-> [] } ;
         cst_exp_rest {-> exp* } = t_comma cst_exp {-> [cst_exp.exp] };
       Abstract Syntax Tree
         program = main_class [class_decls]:class_decl*;
         exp = {call} exp t_id [args]:exp* | ...

     Parsing Terms
     CFG (Context-free Grammar)
     – production rule
     – terminal
     – nonterminal
     – FOLLOW(X): “the set of terminals that can immediately follow X”
     BNF (Backus-Naur Form) and EBNF (Extended BNF): equivalent to CFGs

     Parsing Terms cont…
     Top-down parsing
     – LL(1): left-to-right reading of tokens, leftmost derivation, 1 symbol look-ahead
     – Predictive parser: an efficient non-backtracking top-down parser that can handle LL(1)
     – More generally recursive descent parsing may involve backtracking
     Bottom-up Parsing
     – LR(1): left-to-right reading of tokens, rightmost derivation in reverse, 1 symbol lookahead
     – Shift-reduce parsers: for example, bison, yacc, and SableCC generated parsers
     – Methods for producing an LR parsing table
       – SLR, simple LR
       – Canonical LR, most powerful
       – LALR(1)

     Concepts
     Compilation stages in a compiler
     – Scanning, parsing, semantic analysis, intermediate code generation, optimization, code generation
     Lexical analysis or scanning
     – Tools: SableCC, lex, flex, etc.
     Syntactic analysis or parsing
     – Tools: SableCC, yacc, bison, etc.
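As a contrast to the shift-reduce approach, a predictive (non-backtracking) top-down parser can be sketched for the same expression language. The left-recursive rule E -> E + T is not LL(1), so this sketch assumes the equivalent form E -> T ('+' T)* and handles the repetition with a loop. The `Parser` class, its method names, and the tuple-based AST are illustrative choices, not SableCC output.

```python
class Parser:
    """Predictive parser for E -> T ('+' T)*, T -> id, with 1 token of look-ahead."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it, or None at end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        """Consume and return the next token."""
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse_T(self):
        token = self.next()
        if token == "+":
            raise SyntaxError("expected an identifier, found '+'")
        return ("id", token)

    def parse_E(self):
        node = self.parse_T()
        # One symbol of look-ahead decides whether another '+ T' follows.
        while self.peek() == "+":
            self.next()
            node = ("+", node, self.parse_T())   # left-associative AST
        return node

    def parse(self):
        node = self.parse_E()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node


print(Parser(["a", "+", "b", "+", "c"]).parse())
```

Building the node inside the loop keeps the tree left-associative, so a + b + c groups as (a + b) + c, matching the AST produced by the bottom-up grammar.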

  5. Next Time
     Lecture
     – Parser generators
     – More undergraduate compilers review

     Language Implementation Timeline
     For entertainment purposes only!
     [Timeline figure; entries recovered from the two rows, not in chronological order]
     ’50 to ’80: A-0 [Hopper]; Fortran [Backus]; LISP [McCarthy]; Algol [Comm.]; COBOL [Short Range Comm.]; BASIC [Kemeny & Kurtz]; Simula [Dahl & Nygaard]; Pascal [Wirth] & 1st uproc [4004]; C [Ritchie] & ML [Milner et al.]; Prolog [Colmerauer]; Lex & YACC [Johnson]; Copying GC [Cheney]; Value numbering [Cocke & Schwartz]; Modern DFA [Kildall] & Lamport’s parallelism; Dep. vectors [Karp et al.]; GCD test [Banerjee & Towle]; Flow-sens. defined [Banning]; May v. must [Barth]; Parafrase [Kuck]; PRE [Morel et al.]
     ’80 to 2010: Trace sched. [Fisher]; Coloring reg. alloc. [Chaitin]; 1st RISC (IBM 801), Wolfe’s thesis; Smalltalk [Kay] & PFC [Kennedy]; C++ [Stroustrup]; Dragon book [ASU]; PDG [Ferrante]; Perl [Wall]; SW pipelining [Lam]; 486 w/ cache; SSA [Cytron]; Sparse cond. const. [Wegman & Zadeck]; Superblock scheduling [Hwu]; Java [Gosling & Sun]; Ocaml [INRIA]; Itanium ships & Jikes RVM [IBM]; CS553 @ CSU
