


  1. Parsing: continued
David Notkin, Autumn 2008 (CSE401 Au08)

Parsing Algorithms
• Earley’s algorithm (1970)
  – works for all CFGs
  – O(N^3) worst case performance
  – O(N^2) for unambiguous grammars
  – Based on dynamic programming, used primarily for computational linguistics
• Different parsing algorithms generally place various restrictions on the grammar of the language to be parsed
• Top-down
• Bottom-up
• Recursive descent
• LL
• LR
• LALR
• SLR
• CYK
• GLR
• Simple precedence parser
• Bounded context
• …
• ACM digital library returned 5600+ articles matching “parsing algorithm”
• Google Scholar almost 34,000

Top Down Parsing
• Build parse tree from the top (start symbol) down to leaves (terminals)
• Basic issue: when expanding a nonterminal, which right hand side should be selected?
• Solution: look at input tokens to decide

    Stmts  ::= Call | Assign | If | While
    Call   ::= Id ( Expr { , Expr} )
    Assign ::= Id := Expr ;
    If     ::= if Test then Stmts end
             | if Test then Stmts else Stmts end
    While  ::= while Test do Stmts end

Predictive Parser
• Predictive parser: top-down parser that uses at most the next k tokens to select production (the lookahead)
• Efficient: no backtracking needed, linear time to parse
• Implementations (analogous to lexing)
  – recursive-descent parser
    • each nonterminal parsed by a procedure
    • call other procedures to parse sub-nonterminals, recursively
    • typically written by hand
  – table-driven parser
    • push-down automata: essentially a table-driven FSA, plus stack to do recursive calls
    • typically generated by a tool from a grammar specification
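A minimal sketch of "look at input tokens to decide" for the Stmts grammar above. Call and Assign both begin with Id, so this sketch assumes two tokens of lookahead to tell them apart. The Token and Production enums and the function select_stmt are hypothetical names introduced for illustration; they are not from the slides.

```c
/* Hypothetical token kinds for the Stmts grammar; not from the slides. */
typedef enum { TOK_ID, TOK_LPAREN, TOK_ASSIGN, TOK_IF, TOK_WHILE, TOK_OTHER } Token;

/* Which alternative of Stmts ::= Call | Assign | If | While to expand. */
typedef enum { PROD_CALL, PROD_ASSIGN, PROD_IF, PROD_WHILE, PROD_ERROR } Production;

/* Select a production from the next two tokens, with no backtracking. */
Production select_stmt(Token t1, Token t2) {
    switch (t1) {
    case TOK_IF:
        return PROD_IF;          /* one token is enough here */
    case TOK_WHILE:
        return PROD_WHILE;
    case TOK_ID:
        /* Call and Assign share the prefix Id: the second token decides. */
        if (t2 == TOK_LPAREN) return PROD_CALL;
        if (t2 == TOK_ASSIGN) return PROD_ASSIGN;
        return PROD_ERROR;
    default:
        return PROD_ERROR;
    }
}
```

Note that this only picks among the alternatives of Stmts. The two alternatives of If still share the prefix "if Test then Stmts", which no fixed k can resolve without rewriting the grammar.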

  2. LL(k) Grammars
(L: left-to-right scan; L: find leftmost derivation; k: k tokens lookahead)
• Can construct predictive parser automatically and easily if grammar is LL(k)
  – Left-to-right scan of input, finds leftmost derivation
  – k tokens of look ahead needed
• Some restrictions including
  – no ambiguity
  – no common prefixes of length ≥ k:
      If ::= if Test then Stmts end | if Test then Stmts else Stmts end
  – no left recursion (e.g., E ::= E Op E | ...)
• Restrictions guarantee that, given k input tokens, can always select correct right hand side to expand nonterminal.

What if there are common prefixes?
• Left factor common prefixes to eliminate them
  – create new nonterminal for different suffixes
  – delay choice until after common prefix
• Before
    If ::= if Test then Stmts end
         | if Test then Stmts else Stmts end
• After
    If     ::= if Test then Stmts IfCont
    IfCont ::= end | else Stmts end

Left recursion? Rewrite…
• Before
    E ::= E + T | T
    T ::= T * F | F
    F ::= id | ...
• After
    E    ::= T ECon
    ECon ::= + T ECon | ε
    T    ::= F TCon
    TCon ::= * F TCon | ε
    F    ::= id | ...
• May not be as clear; can sugar it
    E ::= T { + T }
    T ::= F { * F }
    F ::= id | ( E ) | …
• Greater distance from concrete syntax to abstract syntax

Table-driven predictive parser
• Automatically compute PREDICT table from grammar
• PREDICT(nonterminal,input-token) => right hand side
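The sugared grammar E ::= T { + T }, T ::= F { * F }, F ::= id | ( E ) maps directly onto loops in a recursive-descent recognizer. The following is a sketch under simplifying assumptions that are not in the slides: every token is a single character, any lowercase letter stands for id, and the names recognize, parse_E, parse_T, and parse_F are hypothetical.

```c
#include <ctype.h>

static const char *p;  /* cursor into the input (one char per token) */
static int ok;         /* cleared on the first syntax error */

static void parse_E(void);

/* F ::= id | ( E ) */
static void parse_F(void) {
    if (islower((unsigned char)*p)) {
        p++;                       /* id */
    } else if (*p == '(') {
        p++;
        parse_E();
        if (*p == ')') p++;
        else ok = 0;               /* missing closing parenthesis */
    } else {
        ok = 0;                    /* no alternative of F starts here */
    }
}

/* T ::= F { * F }  (the braces become a while loop) */
static void parse_T(void) {
    parse_F();
    while (ok && *p == '*') { p++; parse_F(); }
}

/* E ::= T { + T } */
static void parse_E(void) {
    parse_T();
    while (ok && *p == '+') { p++; parse_T(); }
}

/* Returns 1 if the whole string s derives from E, else 0. */
int recognize(const char *s) {
    p = s;
    ok = 1;
    parse_E();
    return ok && *p == '\0';
}
```

With the original left-recursive rule E ::= E + T, parse_E would call itself before consuming any input and never terminate, which is why the rewrite is needed for a top-down parser.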

  3. Compute PREDICT table
• Compute FIRST set for each right hand side
  – All tokens that can appear first in a derivation from that right hand side
• In case right hand side can be empty
  – Compute FOLLOW set for each non-terminal
    • All tokens that can appear immediately after that non-terminal in a derivation
• Compute FIRST and FOLLOW sets mutually recursively
• PREDICT then depends on the FIRST set (and on the FOLLOW set when the right hand side can be empty)

Example for you to do: if you want

PREDICT and LL(1)
• If PREDICT table has at most one entry per cell
  – Then the grammar is LL(1)
  – There is always exactly one right choice
    • So it’s fast to parse and easy to implement
• If multiple entries in any cell
  – Ex: common prefixes, left recursion, ambiguity
  – Can rewrite grammar (sometimes)
  – Can patch table manually, if you “know” what to do
  – Or can use more powerful parsing technique

Top down implementation
• For years the 401 compiler was a top-down predictive parser, implemented by a method for each nonterminal
  – We have shifted to a bottom-up, automatically generated parser
  – But if you’re going to build a simple one, this is usually best
• Examples from http://en.wikibooks.org/wiki/Compiler_construction
  – Helper functions:

    /* sym holds the current token; getsym() advances to the next one. */

    /* Consume the current token if it matches s; report whether it did. */
    int accept(Symbol s) {
        if (sym == s) {
            getsym();
            return 1;
        }
        return 0;
    }

    /* Like accept, but a mismatch is reported as a syntax error. */
    int expect(Symbol s) {
        if (accept(s))
            return 1;
        error("expect: unexpected symbol");
        return 0;
    }
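The FIRST, FOLLOW, and PREDICT computation from the "Compute PREDICT table" slide can be sketched as a fixed-point iteration over bit sets. This sketch assumes the rewritten expression grammar (E ::= T ECon, ECon ::= + T ECon | ε, T ::= F TCon, TCon ::= * F TCon | ε, F ::= id | ( E )) with an added end-of-input token; the encoding and the names compute_sets, predict, and is_ll1 are hypothetical, not from the slides.

```c
/* Terminals first, then nonterminals; END is an assumed end-of-input token. */
enum { PLUS, STAR, ID, LP, RP, END, NTERM,
       E = NTERM, ECON, T, TCON, F, NSYM };

typedef struct { int lhs, len, rhs[3]; } Prod;

static const Prod prods[] = {
    {E, 2, {T, ECON}}, {ECON, 3, {PLUS, T, ECON}}, {ECON, 0, {0}},
    {T, 2, {F, TCON}}, {TCON, 3, {STAR, F, TCON}}, {TCON, 0, {0}},
    {F, 1, {ID}},      {F, 3, {LP, E, RP}}
};
enum { NPROD = sizeof prods / sizeof prods[0] };

unsigned first[NSYM], follow[NSYM];  /* bit t set means token t is in the set */
int nullable[NSYM];                  /* 1 if the symbol can derive empty */

/* Iterate until no set changes (FIRST and FOLLOW computed together). */
void compute_sets(void) {
    int changed = 1;
    for (int s = 0; s < NSYM; s++) {
        first[s] = s < NTERM ? 1u << s : 0;  /* FIRST(token) = {token} */
        follow[s] = 0;
        nullable[s] = 0;
    }
    follow[E] = 1u << END;                   /* END follows the start symbol */
    while (changed) {
        changed = 0;
        for (int i = 0; i < NPROD; i++) {
            const Prod *p = &prods[i];
            unsigned f = first[p->lhs];
            int k = 0;
            /* FIRST(lhs): add FIRST of rhs symbols up to the first non-nullable. */
            while (k < p->len) {
                f |= first[p->rhs[k]];
                if (!nullable[p->rhs[k]]) break;
                k++;
            }
            if (k == p->len && !nullable[p->lhs]) { nullable[p->lhs] = 1; changed = 1; }
            if (f != first[p->lhs]) { first[p->lhs] = f; changed = 1; }
            /* FOLLOW: scan rhs right to left; trailer = tokens that can come next. */
            unsigned trailer = follow[p->lhs];
            for (k = p->len - 1; k >= 0; k--) {
                int s = p->rhs[k];
                if (s >= NTERM && (follow[s] | trailer) != follow[s]) {
                    follow[s] |= trailer;
                    changed = 1;
                }
                trailer = nullable[s] ? (trailer | first[s]) : first[s];
            }
        }
    }
}

/* PREDICT(production i): FIRST(rhs), plus FOLLOW(lhs) if rhs can be empty. */
unsigned predict(int i) {
    const Prod *p = &prods[i];
    unsigned set = 0;
    for (int k = 0; k < p->len; k++) {
        set |= first[p->rhs[k]];
        if (!nullable[p->rhs[k]]) return set;
    }
    return set | follow[p->lhs];
}

/* LL(1) check: productions with the same lhs must have disjoint PREDICT sets. */
int is_ll1(void) {
    for (int i = 0; i < NPROD; i++)
        for (int j = i + 1; j < NPROD; j++)
            if (prods[i].lhs == prods[j].lhs && (predict(i) & predict(j)))
                return 0;
    return 1;
}
```

Call compute_sets() once before predict() or is_ll1(). Two productions of the same nonterminal whose PREDICT sets overlap correspond to a table cell with more than one entry.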

  4. Example method

    /* Parse one statement, choosing the branch from the current token. */
    void statement(void) {
        if (accept(ident)) {
            expect(becomes);
            expression();
        /* … */
        } else if (accept(ifsym)) {
            condition();
            expect(thensym);
            statement();
        } else if (accept(whilesym)) {
            condition();
            expect(dosym);
            statement();
        }
    }

Example method

    /* Parse one factor: an identifier, a number, or a parenthesized expression. */
    void factor(void) {
        if (accept(ident)) {
            ;
        } else if (accept(number)) {
            ;
        } else if (accept(lparen)) {
            expression();
            expect(rparen);
        } else {
            error("factor: syntax error");
            getsym();
        }
    }

“Shift-reduce” strategy
• read (“shift”) tokens until the right hand side of “correct” production has been seen
• reduce handle to nonterminal, then continue
• done when all input read and reduced to start nonterminal
• Illustration on the slide (input, position marker, item): xyzabcdef   ^   A ::= bc . D

Bottom up parsing
• Construct parse tree for input from leaves up
  – reducing a string of tokens to single start symbol by inverting productions
• Bottom-up parsing is more general than top-down parsing and just as efficient
  – generally preferred in practice
• Example (the production on each line is applied to that line's string to give the next line):

    int * int + int      T ::= int
    int * T + int        T ::= int * T
    T + int              T ::= int
    T + T                E ::= T
    T + E                E ::= T + E
    E

• Read the productions found by bottom-up parse bottom to top; this is a rightmost derivation!
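The shift-reduce strategy can be sketched as a table-driven loop over a state stack, using the grammar of the bottom-up example (E ::= T | T + E, T ::= int | int * T). The tables below are an assumption: they were built by hand as SLR(1) tables for this sketch and do not come from the slides. The encoding (positive = shift to that state, negative = reduce by that production, ACC = accept, 0 = error) and the name lr_parse are likewise hypothetical.

```c
enum { INT, STAR, PLUS, END, NTOK };  /* terminals; END marks end of input */
enum { NT_E, NT_T, NNT };             /* nonterminals */
#define ACC 100                       /* accept action */
#define MAXSTACK 64

/* Productions: 1: E ::= T   2: E ::= T + E   3: T ::= int   4: T ::= int * T
   (index 0 is unused so that a reduce can be stored as a negative number) */
static const int rhs_len[5] = { 0, 1, 3, 1, 3 };
static const int lhs[5]     = { NT_E, NT_E, NT_E, NT_T, NT_T };

/* action[state][token]: >0 shift to state, <0 reduce by production, 0 error */
static const int action[8][NTOK] = {
    /*         int   *    +    $  */
    /* 0 */ {   3,   0,   0,   0 },
    /* 1 */ {   0,   0,   0, ACC },
    /* 2 */ {   0,   0,   4,  -1 },
    /* 3 */ {   0,   6,  -3,  -3 },
    /* 4 */ {   3,   0,   0,   0 },
    /* 5 */ {   0,   0,   0,  -2 },
    /* 6 */ {   3,   0,   0,   0 },
    /* 7 */ {   0,   0,  -4,  -4 },
};

/* goto_tab[state][nonterminal]: state entered after a reduction */
static const int goto_tab[8][NNT] = {
    {1, 2}, {0, 0}, {0, 0}, {0, 0}, {5, 2}, {0, 0}, {0, 7}, {0, 0}
};

/* Parse tokens (terminated by END). Stores the production numbers in the
   order they are reduced into out[]. Returns the number of reductions,
   or -1 on a syntax error or if a buffer is too small. */
int lr_parse(const int *tokens, int *out, int max_out) {
    int stack[MAXSTACK];
    int sp = 0, n = 0;
    stack[0] = 0;
    for (;;) {
        int a = action[stack[sp]][*tokens];
        if (a == ACC) {
            return n;
        } else if (a > 0) {                 /* shift: push state, consume token */
            if (sp + 1 >= MAXSTACK) return -1;
            stack[++sp] = a;
            tokens++;
        } else if (a < 0) {                 /* reduce: pop the handle, push goto */
            int prod = -a;
            if (n == max_out) return -1;
            out[n++] = prod;
            sp -= rhs_len[prod];
            stack[sp + 1] = goto_tab[stack[sp]][lhs[prod]];
            sp++;
        } else {
            return -1;                      /* error entry */
        }
    }
}
```

For the input int * int + int this sketch reduces by productions 3, 4, 3, 1, 2, the same sequence as the example trace read top to bottom.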

  5. LR(k)
• LR(k) parsing
  – Left-to-right scan of input, rightmost derivation
  – k tokens of look ahead
• Strictly more general than LL(k)
  – Gets to look at whole right hand side of production before deciding what to do, not just first k tokens
  – Can handle left recursion and common prefixes
  – As efficient as any top-down parsing
• Complex to implement
  – Generally need automatic tools to construct parser from grammar

LR Parsing Tables
• Construct parsing tables implementing a FSA with a stack
  – rows: states of parser
  – columns: token(s) of lookahead
  – entries: action of parser
    • shift, goto state X
    • reduce production “X ::= RHS”
    • accept
    • error
• Algorithm to construct FSA similar to algorithm to build DFA from NFA
  – each state represents set of possible places in parsing
• LR(k) algorithm may build huge tables

Questions?

Ada language/compiler color
• US DoD wanted (roughly) a single, high-level programming language
• They wrote requirements for this language and received 14 bids (1977)
• Four semi-finalists (1978): green (Cii), red (Intermetrics), blue (SofTech), yellow (SRI)
• Two finalists: green and red
  – requirements finalized as Steelman document

  6. York Ada compiler (c. 1986)
“Facts and Figures About the York Ada Compiler” (Wand et al.)
• Written in C
• About 80 KLOC for compiler
  – Front-end about 57 KLOC, code gen about 20 KLOC, VAX-specific code gen about 3 KLOC
• 7 KLOC for run-time
• “It is difficult to make an accurate estimate of the time taken to write the compiler because the compiler writers had other demands on their time (completing PhDs, teaching, etc.). Fourteen individuals have been involved at various times during the project and have contributed approximately 20 man years to the design and construction of the software. The money spent directly to support the construction of the compiler was [approximately $340k], however this included neither the salaries of four members of the project nor the cost of computer time (we used approximately 30% of a VAX-11/780 over a five year period).”

General syntax: examples from Steelman
• 2A. Character Set. The full set of character graphics that may be used in source programs shall be given in the language definition. Every source program shall also have a representation that uses only the following 55 character subset of the ASCII graphics: …
• 2B. Grammar. The language should have a simple, uniform, and easily parsed grammar and lexical structure. The language shall have free form syntax and should use familiar notations where such use does not conflict with other goals.
• 2D. Other Syntactic Issues. Multiple occurrences of a language defined symbol appearing in the same context shall not have essentially different meanings. …
• 2E. Mnemonic identifiers. Mnemonically significant identifiers shall be allowed. There shall be a break character for use within identifiers. The language and its translators shall not permit identifiers or reserved words to be abbreviated. …
• 2G. Numeric Literals. There shall be built-in decimal literals. There shall be no implicit truncation or rounding of integer and fixed point literals
