Parser
Larissa von Witte
Institut für Softwaretechnik und Programmiersprachen
11 January 2016

Contents
  Introduction
  Taxonomy
  Recursive Descent Parser
  Shift Reduce Parser
  Parser Generators
  Parse Tree
  Conclusion

Introduction
A parser
◮ analyses the syntax of an input text against a given grammar or regular expression
◮ returns a parse tree
◮ is important for the subsequent stages of compilation

Lookahead
Definition: Lookahead
A lookahead of k consists of the next k tokens of the text, as provided by the scanner.

Context-free Grammar
Definition: Formal Grammar
A formal grammar is a tuple G = (T, N, S, P), with
◮ T as a finite set of terminal symbols
◮ N as a finite set of nonterminal symbols, with N ∩ T = ∅
◮ S ∈ N as the start symbol
◮ P as a finite set of production rules of the form l → r with l, r ∈ (N ∪ T)*
Definition: Context-free Grammar
A grammar G = (T, N, S, P) is called context-free if, for every rule l → r, the left side l is a single nonterminal symbol, i.e. l ∈ N.

LL(1) Grammar
Definition: First(A)
  First(A) = { t | A ⇒* tα } ∪ { ε | A ⇒* ε }
Definition: Follow(A)
  Follow(A) = { t | S ⇒* αAtβ }
Definition: LL(1) Grammar
A context-free grammar is called an LL(1) grammar if the following conditions hold for every rule A → α_1 | α_2 | ... | α_n and all i ≠ j:
  First(α_i) ∩ First(α_j) = ∅
  ε ∈ First(α_i) ⇒ Follow(A) ∩ First(α_j) = ∅
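The First sets in the definition above can be computed by fixed-point iteration. The following Java sketch is only an illustration: the class name FirstSets, the representation of a grammar as a map from nonterminals to lists of alternatives, and the treatment of "number" as a terminal are assumptions, not part of the slides. It also checks only the first LL(1) condition (pairwise disjoint First sets); the Follow condition is left out.

```java
import java.util.*;

final class FirstSets {
    static final String EPSILON = "ε";

    /** The example grammar of the slides; "number" is treated as a terminal (assumption). */
    static Map<String, List<List<String>>> exampleGrammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("expression", List.of(
                List.of("number"),
                List.of("(", "expression", "operator", "expression", ")")));
        g.put("operator", List.of(List.of("+"), List.of("-"), List.of("*"), List.of("/")));
        return g;
    }

    /** Computes First(A) for every nonterminal A by iterating until nothing changes. */
    static Map<String, Set<String>> compute(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String nonterminal : grammar.keySet()) {
            first.put(nonterminal, new TreeSet<>());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                Set<String> target = first.get(rule.getKey());
                for (List<String> alternative : rule.getValue()) {
                    changed |= target.addAll(firstOfSequence(alternative, grammar, first));
                }
            }
        }
        return first;
    }

    /** First set of a symbol sequence, based on the First sets known so far. */
    static Set<String> firstOfSequence(List<String> sequence,
                                       Map<String, List<List<String>>> grammar,
                                       Map<String, Set<String>> first) {
        Set<String> result = new TreeSet<>();
        for (String symbol : sequence) {
            if (!grammar.containsKey(symbol)) {   // terminal: it starts the sequence
                result.add(symbol);
                return result;
            }
            Set<String> firstOfSymbol = first.get(symbol);
            for (String t : firstOfSymbol) {
                if (!t.equals(EPSILON)) result.add(t);
            }
            if (!firstOfSymbol.contains(EPSILON)) return result;
        }
        result.add(EPSILON);                      // empty sequence, or every symbol can vanish
        return result;
    }

    /** First LL(1) condition only: the alternatives of each rule have disjoint First sets. */
    static boolean alternativesDisjoint(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> first = compute(grammar);
        for (List<List<String>> alternatives : grammar.values()) {
            Set<String> seen = new HashSet<>();
            for (List<String> alternative : alternatives) {
                for (String t : firstOfSequence(alternative, grammar, first)) {
                    if (!seen.add(t)) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> g = exampleGrammar();
        System.out.println(compute(g));
        System.out.println("alternatives disjoint: " + alternativesDisjoint(g));
    }
}
```

A grammar such as S → a b | a c fails the check because both alternatives start with a; this is the situation in which a parser with one token of lookahead cannot choose between the alternatives.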

Recursive Descent Parser
◮ top-down parser
◮ basic idea: create a separate parser parse_A for every nonterminal symbol A
◮ every parser parse_A is essentially a method that consists of a case-by-case analysis
◮ it compares the lookahead with the expected symbols
◮ parsing begins with parse_S and chooses the next parser based on the lookahead k (usually k = 1)
◮ needs an LL(k) grammar so that every decision is unambiguous
◮ the grammar must not be left-recursive because that could lead to a non-terminating parser

Example: Recursive Descent Parser
Example Grammar
    expression → number | ( expression operator expression )
    operator   → + | − | * | /

Example: Recursive Descent Parser

    // Text is the input buffer: getLookahead() peeks at the next character,
    // removeChar() consumes it.
    boolean parseOperator() {
        char op = Text.getLookahead();
        if (op == '+' || op == '-' || op == '*' || op == '/') {
            Text.removeChar(); // removes the operator from the input
            return true;
        }
        throwException();
        return false; // not reached if throwException() always throws
    }

    boolean parseExpression() {
        if (Character.isDigit(Text.getLookahead())) {
            return parseNumber();
        } else if (Text.getLookahead() == '(') {
            Text.removeChar(); // removes '(' from the input
            boolean check = parseExpression() && parseOperator() && parseExpression();
            if (Text.getLookahead() != ')') {
                throwException();
            }
            Text.removeChar(); // removes ')' from the input
            return check;
        }
        throwException();
        return false; // not reached if throwException() always throws
    }
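The slide code relies on helpers that the slides do not define (Text, parseNumber, throwException). The following self-contained Java sketch shows the same idea in runnable form. It is an assumption-laden illustration: the class name ExpressionParser, the use of IllegalStateException for syntax errors, and the rule that a number is a non-empty run of digits are all choices made here, not taken from the slides.

```java
final class ExpressionParser {
    private final String input;
    private int pos = 0;

    private ExpressionParser(String input) {
        this.input = input;
    }

    /** Returns true if the whole input is one expression of the example grammar. */
    static boolean accepts(String input) {
        ExpressionParser parser = new ExpressionParser(input);
        try {
            parser.parseExpression();
            return parser.pos == input.length(); // no trailing characters allowed
        } catch (IllegalStateException syntaxError) {
            return false;
        }
    }

    /** Lookahead of one character; '\0' marks the end of the input. */
    private char lookahead() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    /** Consumes the expected character or reports a syntax error. */
    private void expect(char expected) {
        if (lookahead() != expected) {
            throw new IllegalStateException("expected '" + expected + "' at position " + pos);
        }
        pos++;
    }

    // expression → number | ( expression operator expression )
    private void parseExpression() {
        if (Character.isDigit(lookahead())) {
            parseNumber();
        } else {
            expect('(');
            parseExpression();
            parseOperator();
            parseExpression();
            expect(')');
        }
    }

    // operator → + | - | * | /
    private void parseOperator() {
        char op = lookahead();
        if (op == '+' || op == '-' || op == '*' || op == '/') {
            pos++;
        } else {
            throw new IllegalStateException("operator expected at position " + pos);
        }
    }

    // number → one or more digits (assumption; the slides leave "number" open)
    private void parseNumber() {
        while (Character.isDigit(lookahead())) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("(1+(2*3))"));
        System.out.println(accepts("(1+2"));
    }
}
```

Each method inspects only the next character before deciding what to do, which is the lookahead of k = 1 mentioned in the slides.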

Recursive Descent Parser
◮ often used for hand-written parsers
◮ needs a suitable grammar (LL(k), not left-recursive)
◮ often requires a grammar transformation
◮ usually works with a lookahead of 1
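A typical grammar transformation removes left recursion. As a hypothetical example (not from the slides), the left-recursive rule sum → sum - digit | digit would make a method parseSum call itself without consuming any input. Rewriting it as sum → digit ( - digit )* turns the recursion into a loop. A minimal Java sketch under these assumptions:

```java
final class LeftRecursionDemo {
    // Left-recursive (unsuitable):  sum → sum '-' digit | digit
    // Transformed (used here):      sum → digit ( '-' digit )*

    /** Parses and evaluates a sum such as "9-3-2" from left to right. */
    static int evaluate(String input) {
        int pos = 0;
        int value = digit(input, pos++);
        while (pos < input.length() && input.charAt(pos) == '-') {
            pos++;                           // consume '-'
            value -= digit(input, pos++);    // the loop keeps '-' left-associative
        }
        if (pos != input.length()) {
            throw new IllegalArgumentException("unexpected character at position " + pos);
        }
        return value;
    }

    /** Returns the value of the digit at pos or reports a syntax error. */
    private static int digit(String input, int pos) {
        if (pos >= input.length() || !Character.isDigit(input.charAt(pos))) {
            throw new IllegalArgumentException("digit expected at position " + pos);
        }
        return input.charAt(pos) - '0';
    }

    public static void main(String[] args) {
        System.out.println(evaluate("9-3-2"));
    }
}
```

The loop processes the operands from left to right, so the transformed grammar still yields the left-associative result (9 - 3) - 2 that the left-recursive rule describes.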

Shift Reduce Parser
◮ bottom-up parser
◮ uses a parser table to determine the next operation
◮ the parser table takes the state on top of the stack and the lookahead as input and returns the operation

Shift Reduce Parser
◮ uses a push-down automaton to analyse the syntax of the input
◮ notation: α • au
  ◮ α represents the input that has already been read and partially processed (on the stack)
  ◮ au represents the tokens that have not yet been analysed
◮ possible operations:
  ◮ shift: read the next token a and switch to the state αa • u
  ◮ reduce:
    1. detect the tail α2 of α as the right side of a production rule A → α2
    2. remove α2 from the top of the stack and put A on the stack
    this transforms α1α2 • au into α1A • au with the production rule A → α2

Example: Grammar & items
◮ grammar:
    S′ → S eof   (1)
    S  → ( S )   (2)
       | [ S ]   (3)
       | id      (4)
◮ items:
    S′ → • S eof    S′ → S • eof    S′ → S eof •
    S → • ( S )     S → ( • S )     S → ( S • )     S → ( S ) •
    S → • [ S ]     S → [ • S ]     S → [ S • ]     S → [ S ] •
    S → • id        S → id •

Example: Non-deterministic automaton
[Figure: non-deterministic automaton whose states are the items of the grammar. The start state is S′ → • S eof; the transitions are labelled with the symbols S, (, ), [, ], id and eof, each moving the dot over the corresponding symbol.]

Example: Deterministic automaton
◮ every state is a set of states of the non-deterministic automaton
[Figure: deterministic automaton with start state A and the states B, C, D, E, F, G, H, I and OK; the transitions are labelled S, (, ), [, ], id and eof.]
◮ H, I, E and OK contain reduce items

Example: Parser table
◮ rows: states of the deterministic automaton
◮ columns: terminal and nonterminal symbols
◮ the resulting parser table (a letter is the next state, r(n) means reduce with rule n):

          (     )     [     ]     id    S     eof
    A     C           D           E     B
    B                                         OK
    C     C           D           E     F
    D     C           D           E     G
    E           r(4)        r(4)              r(4)
    F           H
    G                       I
    H           r(2)        r(2)              r(2)
    I           r(3)        r(3)              r(3)
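A table like this can drive a very small parser loop. The Java sketch below is an illustration under several assumptions: the class name BracketParser and the token representation as strings are made up here, the reduce entries are taken to apply to the lookaheads ), ] and eof, and the table includes the transitions from C to C on ( and from D to D on [, which the LR(0) construction yields and which nested inputs such as ( ( id ) ) need.

```java
import java.util.*;

final class BracketParser {
    // Shift and goto entries: state -> (symbol -> next state).
    private static final Map<String, Map<String, String>> MOVE = Map.of(
            "A", Map.of("(", "C", "[", "D", "id", "E", "S", "B"),
            "B", Map.of("eof", "OK"),
            "C", Map.of("(", "C", "[", "D", "id", "E", "S", "F"),
            "D", Map.of("(", "C", "[", "D", "id", "E", "S", "G"),
            "F", Map.of(")", "H"),
            "G", Map.of("]", "I"));

    // Reduce entries: state -> length of the right side that is popped.
    // E: S → id, H: S → ( S ), I: S → [ S ]
    private static final Map<String, Integer> REDUCE = Map.of("E", 1, "H", 3, "I", 3);

    // Lookaheads for which the reduce entries apply (assumption, see lead-in).
    private static final Set<String> REDUCE_LOOKAHEAD = Set.of(")", "]", "eof");

    /** Returns true if the tokens form a sentence of S → ( S ) | [ S ] | id. */
    static boolean accepts(List<String> tokens) {
        List<String> input = new ArrayList<>(tokens);
        input.add("eof");
        Deque<String> stack = new ArrayDeque<>();
        stack.push("A");
        int next = 0;
        while (next < input.size()) {
            String state = stack.peek();
            String lookahead = input.get(next);
            if (REDUCE.containsKey(state)) {
                if (!REDUCE_LOOKAHEAD.contains(lookahead)) return false;
                for (int n = REDUCE.get(state); n > 0; n--) {
                    stack.pop();                               // remove the right side
                }
                stack.push(MOVE.get(stack.peek()).get("S"));   // goto on S
            } else {
                String target = MOVE.getOrDefault(state, Map.of()).get(lookahead);
                if (target == null) return false;              // empty table cell: syntax error
                if (target.equals("OK")) return next == input.size() - 1;
                stack.push(target);                            // shift
                next++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts(List.of("(", "[", "id", "]", ")")));
        System.out.println(accepts(List.of("(", "id", "]")));
    }
}
```

The stack holds only states; because every state is reached by exactly one symbol, the symbols themselves do not have to be stored.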

Shift Reduce Parser
◮ needs an LR(k) grammar, a condition that the grammars of modern languages often meet
◮ usually created by parser generators because the parsers are complex to write by hand

Parser Generators
◮ parser generators automatically generate a parser from a grammar or a regular expression
◮ the generated parsers are often LR or LALR parsers
◮ Yacc (“yet another compiler compiler”) and Bison are well-known LALR parser generators
◮ Bison generates two output files:
  1. the source code of the parser (C by default)
  2. a report with the grammar and the parser table

Example: Input for Bison
◮ the input file consists of three parts that are separated by %%:
  1. declarations of the tokens
  2. production rules
  3. C function that executes the parser (optional)

    %token ID
    %%
    S : '(' S ')'
      | '[' S ']'
      | ID
      ;
    %%

Example: Output of Bison

    Grammar

        0 $accept: S $end

        1 S: '(' S ')'
        2  | '[' S ']'
        3  | ID

    [...]

    State 0

        0 $accept: . S $end

        ID   shift, and go to state 1
        '('  shift, and go to state 2
        '['  shift, and go to state 3

        S  go to state 4

    State 1

        3 S: ID .

        $default  reduce using rule 3 (S)

    State 2

        1 S: '(' . S ')'

        ID   shift, and go to state 1
        '('  shift, and go to state 2
        '['  shift, and go to state 3

        S  go to state 5

    [...]

Parse Tree
◮ describes the derivation of the expression from the grammar
◮ important for the compiling process
Example: expression with a single parse tree
    grammar:    S → S + S | ( S − S ) | id
    expression: id + ( id − id )

    S
    ├── S
    │   └── id
    ├── +
    └── S
        ├── (
        ├── S
        │   └── id
        ├── −
        ├── S
        │   └── id
        └── )

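Parse Tree
Example: ambiguous grammar
    grammar:    S → S + S | S − S | id
    expression: id + id − id

First parse tree, read as (id + id) − id:

    S
    ├── S
    │   ├── S
    │   │   └── id
    │   ├── +
    │   └── S
    │       └── id
    ├── −
    └── S
        └── id

Second parse tree, read as id + (id − id):

    S
    ├── S
    │   └── id
    ├── +
    └── S
        ├── S
        │   └── id
        ├── −
        └── S
            └── id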

Conclusion
◮ the choice of parser type matters because each type has its own advantages
◮ parser generators have made parser development much easier

Questions?
