Recursive-Descent Parsing (22 March 2019, OSU CSE)



slide-1
SLIDE 1

Recursive-Descent Parsing

22 March 2019 OSU CSE 1

slide-2
SLIDE 2

BL Compiler Structure

string of characters (source code) → Tokenizer → string of tokens (“words”) → Parser → abstract program → Code Generator → string of integers (object code)

Note that the parser starts with a string of tokens.

slide-3
SLIDE 3

Plan for the BL Parser

  • Design a context-free grammar (CFG) to specify syntactically valid BL programs
  • Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)

slide-4
SLIDE 4

Parsing

  • A CFG can be used to generate strings in its language
– “Given the CFG, construct a string that is in the language”
  • A CFG can also be used to recognize strings in its language
– “Given a string, decide whether it is in the language”
– And, if it is, construct a derivation tree (or AST)

slide-5
SLIDE 5

Parsing

Parsing generally refers to this last step, i.e., going from a string (in the language) to its derivation tree or, for a programming language, perhaps to an AST for the program.

slide-6
SLIDE 6

A Recursive-Descent Parser

  • One parse method per non-terminal symbol
  • A non-terminal symbol on the right-hand side of a rewrite rule leads to a call to the parse method for that non-terminal
  • A terminal symbol on the right-hand side of a rewrite rule leads to “consuming” that token from the input token string
  • | in the CFG leads to “if-else” in the parser

slide-7
SLIDE 7

Example: Arithmetic Expressions

expr → expr add-op term | term
term → term mult-op factor | factor
factor → ( expr ) | digit-seq
add-op → + | -
mult-op → * | DIV | REM
digit-seq → digit digit-seq | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

slide-8
SLIDE 8

A Problem

Do you see a problem with a recursive-descent parser for the CFG on slide 7? (Hint: look at how the rules for expr and term begin.)

slide-9
SLIDE 9

A Solution

expr → term { add-op term }
term → factor { mult-op factor }
factor → ( expr ) | digit-seq
add-op → + | -
mult-op → * | DIV | REM
digit-seq → digit digit-seq | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

slide-10
SLIDE 10

A Solution


The special CFG symbols { and } mean that the enclosed sequence of symbols occurs zero or more times; this helps change a left-recursive CFG into an equivalent CFG that can be parsed by recursive descent.
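To see why left recursion is a problem, here is a minimal sketch. The class and method names are hypothetical, and java.util.ArrayDeque stands in for the course's Queue<String>. A literal translation of expr → expr add-op term calls itself before consuming any token, so it never makes progress. The depth guard is there only so the demonstration stops instead of overflowing the stack. The { ... } form, by contrast, becomes a loop that consumes tokens on every iteration. A term is simplified to a single number token here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical demo class; not part of the original slides. */
final class LeftRecursionDemo {

    /** Artificial limit so the demo fails fast instead of overflowing the stack. */
    private static final int MAX_DEPTH = 1000;

    /**
     * Literal translation of: expr -> expr add-op term | term.
     * The first thing it does is call itself on the same, unchanged input.
     */
    static void naiveExpr(Deque<String> ts, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException(
                    "no token consumed after " + MAX_DEPTH + " nested calls");
        }
        naiveExpr(ts, depth + 1); // left recursion: ts is exactly as it was
        // The code for add-op and term would go here, but it is never reached.
    }

    /**
     * Translation of: expr -> term { add-op term }, with a term simplified
     * to a single number token. Returns how many terms were recognized.
     */
    static int loopExpr(Deque<String> ts) {
        ts.removeFirst(); // consume the first term
        int terms = 1;
        while ("+".equals(ts.peekFirst()) || "-".equals(ts.peekFirst())) {
            ts.removeFirst(); // consume the add-op
            ts.removeFirst(); // consume the next term
            terms++;
        }
        return terms;
    }

    /** Convenience helper for building a token queue. */
    static Deque<String> tokens(String... words) {
        return new ArrayDeque<>(List.of(words));
    }
}
```

The loop version terminates because the token string is finite and every iteration removes two tokens from it.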

slide-11
SLIDE 11

A Solution

expr → term { add-op term }
term → factor { mult-op factor }
factor → ( expr ) | number
add-op → + | -
mult-op → * | DIV | REM
number → 0 | nz-digit { 0 | nz-digit }
nz-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

The special CFG symbols { and } also simplify a non-terminal for a number that has no leading zeroes.
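As an illustration of how the number rule reads, here is a minimal sketch of a recognizer for it. The class and method names are hypothetical and do not come from the slides. The first alternative accepts only the single digit 0. The second requires an nz-digit first, and the { ... } part becomes a loop.

```java
/** Hypothetical helper; not part of the original slides. */
final class NumberRule {

    /** Returns true if c is an nz-digit, i.e., one of '1' through '9'. */
    static boolean isNzDigit(char c) {
        return '1' <= c && c <= '9';
    }

    /**
     * Returns true if s can be derived from:
     * number -> 0 | nz-digit { 0 | nz-digit }
     */
    static boolean isNumber(String s) {
        if (s.equals("0")) {
            return true; // first alternative: the single digit 0
        }
        if (s.isEmpty() || !isNzDigit(s.charAt(0))) {
            return false; // second alternative must start with an nz-digit
        }
        int i = 1;
        while (i < s.length() && (s.charAt(i) == '0' || isNzDigit(s.charAt(i)))) {
            i++; // { 0 | nz-digit }: zero or more further digits
        }
        return i == s.length(); // every character must have been matched
    }
}
```

With this sketch, "0" and "120" are accepted, while "007" and "00" are rejected because of their leading zeroes.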

slide-12
SLIDE 12

A Recursive-Descent Parser

  • One parse method per non-terminal symbol
  • A non-terminal symbol on the right-hand side of a rewrite rule leads to a call to the parse method for that non-terminal
  • A terminal symbol on the right-hand side of a rewrite rule leads to “consuming” that token from the input token string
  • | in the CFG leads to “if-else” in the parser
  • {...} in the CFG leads to “while” in the parser

slide-13
SLIDE 13

More Improvements

If we treat every number as a token, then things get simpler for the parser: the rules for number and nz-digit drop out of the CFG from slide 11, so there are only 5 non-terminals to worry about.

slide-14
SLIDE 14

More Improvements

If we also treat every add-op and mult-op as a token, then it’s even simpler: there are only 3 non-terminals (expr, term, and factor).

slide-15
SLIDE 15

Improvements

expr → term { add-op term }
term → factor { mult-op factor }
factor → ( expr ) | number
add-op → + | -
mult-op → * | DIV | REM
number → 0 | nz-digit { 0 | nz-digit }
nz-digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Can you write the tokenizer for this language, so every number, add-op, and mult-op is a token?

slide-16
SLIDE 16

Evaluating Arithmetic Expressions

  • For this problem, parsing an arithmetic expression means evaluating it
  • The parser goes from a string of tokens in the language of the CFG on the previous slide, to the value of that expression as an int

slide-17
SLIDE 17

Structure of Solution

string of characters (arithmetic expression) → Tokenizer → string of tokens → Parser → value of arithmetic expression

"4 + 29 DIV 3" → <"4", "+", "29", "DIV", "3", "EOI"> → 13

slide-18
SLIDE 18

Structure of Solution

We will use a Queue<String> to hold a mathematical value like this:

<"4", "+", "29", "DIV", "3", "EOI">

slide-19
SLIDE 19

Structure of Solution

We will also assume the tokenizer adds an “end-of-input” token at the end:

<"4", "+", "29", "DIV", "3", "EOI">
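One way such a tokenizer could look is sketched below. It is not the course's tokenizer: the class name, the method name, and the use of java.util.ArrayDeque in place of Queue<String> are all assumptions made for illustration. It scans the characters left to right, groups digits into number tokens and letters into the words DIV and REM, and appends "EOI" at the end.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical tokenizer sketch; not part of the original slides. */
final class ExprTokenizer {

    /** Assumed spelling of the end-of-input token, as in the example above. */
    static final String END_OF_INPUT = "EOI";

    /**
     * Splits source into number, operator, and parenthesis tokens,
     * then appends the end-of-input token.
     */
    static Deque<String> tokens(String source) {
        Deque<String> ts = new ArrayDeque<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace only separates tokens
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                ts.addLast(source.substring(start, i)); // a number token
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < source.length() && Character.isLetter(source.charAt(i))) {
                    i++;
                }
                String word = source.substring(start, i);
                if (!word.equals("DIV") && !word.equals("REM")) {
                    throw new IllegalArgumentException("unknown word: " + word);
                }
                ts.addLast(word); // a mult-op token spelled as a word
            } else if ("+-*()".indexOf(c) >= 0) {
                ts.addLast(String.valueOf(c)); // a one-character token
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        ts.addLast(END_OF_INPUT);
        return ts;
    }
}
```

Calling ExprTokenizer.tokens("4 + 29 DIV 3") yields the six tokens shown above. The sketch does not reject numbers with leading zeroes; enforcing the number rule of the CFG would need an extra check.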

slide-20
SLIDE 20

Parsing an expr

  • We want to parse an expr, which must start with a term and must be followed by zero or more (pairs of) add-ops and terms:

expr → term { add-op term }

  • An expr has an int value, which is what we want returned by the method to parse an expr

slide-21
SLIDE 21

Contract for Parser for expr

/**
 * Evaluates an expression and returns its value.
 * ...
 * @updates ts
 * @requires
 * [an expr string is a proper prefix of ts]
 * @ensures
 * valueOfExpr = [value of longest expr string at
 *               start of #ts] and
 * #ts = [longest expr string at start of #ts] * ts
 */
private static int valueOfExpr(Queue<String> ts) {...}

slide-22
SLIDE 22

Parsing a term

  • We want to parse a term, which must start with a factor and must be followed by zero or more (pairs of) mult-ops and factors:

term → factor { mult-op factor }

  • A term has an int value, which is what we want returned by the method to parse a term

slide-23
SLIDE 23

Contract for Parser for term

/**
 * Evaluates a term and returns its value.
 * ...
 * @updates ts
 * @requires
 * [a term string is a proper prefix of ts]
 * @ensures
 * valueOfTerm = [value of longest term string at
 *               start of #ts] and
 * #ts = [longest term string at start of #ts] * ts
 */
private static int valueOfTerm(Queue<String> ts) {...}

slide-24
SLIDE 24

Parsing a factor

  • We want to parse a factor, which must start with the token "(" followed by an expr followed by the token ")"; or it must be a number token:

factor → ( expr ) | number

  • A factor has an int value, which is what we want returned by the method to parse a factor

slide-25
SLIDE 25

Contract for Parser for factor

/**
 * Evaluates a factor and returns its value.
 * ...
 * @updates ts
 * @requires
 * [a factor string is a proper prefix of ts]
 * @ensures
 * valueOfFactor = [value of longest factor string at
 *                 start of #ts] and
 * #ts = [longest factor string at start of #ts] * ts
 */
private static int valueOfFactor(Queue<String> ts) {...}

slide-26
SLIDE 26

Code for Parser for expr

private static int valueOfExpr(Queue<String> ts) {
    int value = valueOfTerm(ts);
    while (ts.front().equals("+") || ts.front().equals("-")) {
        String op = ts.dequeue();
        int nextTerm = valueOfTerm(ts);
        if (op.equals("+")) {
            value = value + nextTerm;
        } else /* "-" */ {
            value = value - nextTerm;
        }
    }
    return value;
}

slide-27
SLIDE 27

Code for Parser for expr

Compare the code on slide 26 with the rules it implements:

expr → term { add-op term }
add-op → + | -

slide-28
SLIDE 28

Code for Parser for expr

The method valueOfTerm, which the code on slide 26 calls to evaluate each term, is very similar to valueOfExpr.

slide-29
SLIDE 29

Code for Parser for expr

In the code on slide 26, the loop condition calls ts.front() to look ahead one token in ts to see what’s next.

slide-30
SLIDE 30

Code for Parser for expr

In the code on slide 26, the call ts.dequeue() “consumes” the next token from ts.

slide-31
SLIDE 31

Code for Parser for expr

In the code on slide 26, the call valueOfTerm(ts) inside the loop evaluates the next term.

slide-32
SLIDE 32

Code for Parser for expr

In the code on slide 26, the if-else statement inside the loop evaluates (some of) the expression.

slide-33
SLIDE 33

Code for Parser for term

private static int valueOfTerm(Queue<String> ts) { }


Can you write the body, using valueOfExpr as a guide?

slide-34
SLIDE 34

Code for Parser for factor

private static int valueOfFactor(Queue<String> ts) {
    int value;
    if (ts.front().equals("(")) {
        ts.dequeue();
        value = valueOfExpr(ts);
        ts.dequeue();
    } else {
        String number = ts.dequeue();
        value = Integer.parseInt(number);
    }
    return value;
}

slide-35
SLIDE 35

Code for Parser for factor

Compare the code on slide 34 with the rule it implements:

factor → ( expr ) | number

slide-36
SLIDE 36

Code for Parser for factor

In the code on slide 34, the if condition calls ts.front() to look ahead one token in ts to see what’s next.

slide-37
SLIDE 37

Code for Parser for factor

In the code on slide 34, each ts.dequeue(); statement in the first branch discards its result. What token does it throw away?

slide-38
SLIDE 38

Code for Parser for factor

Though the method called in the code on slide 34 is named parseInt, it is not one of our parser methods; it is a static method from the Java library’s Integer class (with int utilities).

slide-39
SLIDE 39

Code for Parser for factor


Recursive descent: notice that valueOfExpr calls valueOfTerm, which calls valueOfFactor, which here may call valueOfExpr.
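To watch this mutual recursion run end to end, the three methods can be assembled into one stand-alone class, as sketched below. Several things here are assumptions rather than the course's code. java.util.ArrayDeque replaces Queue<String>, so peekFirst and removeFirst play the roles of front and dequeue. The body of valueOfTerm is one possible answer written by analogy with valueOfExpr. DIV and REM are assumed to behave like Java's / and % on int values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-alone assembly of the three parse methods. */
final class ExprEvaluator {

    /** expr -> term { add-op term } */
    static int valueOfExpr(Deque<String> ts) {
        int value = valueOfTerm(ts);
        while ("+".equals(ts.peekFirst()) || "-".equals(ts.peekFirst())) {
            String op = ts.removeFirst();
            int nextTerm = valueOfTerm(ts);
            if (op.equals("+")) {
                value = value + nextTerm;
            } else /* "-" */ {
                value = value - nextTerm;
            }
        }
        return value;
    }

    /** term -> factor { mult-op factor }; written by analogy with valueOfExpr. */
    static int valueOfTerm(Deque<String> ts) {
        int value = valueOfFactor(ts);
        while ("*".equals(ts.peekFirst()) || "DIV".equals(ts.peekFirst())
                || "REM".equals(ts.peekFirst())) {
            String op = ts.removeFirst();
            int nextFactor = valueOfFactor(ts);
            if (op.equals("*")) {
                value = value * nextFactor;
            } else if (op.equals("DIV")) {
                value = value / nextFactor; // assumption: DIV is Java int division
            } else /* "REM" */ {
                value = value % nextFactor; // assumption: REM is Java remainder
            }
        }
        return value;
    }

    /** factor -> ( expr ) | number */
    static int valueOfFactor(Deque<String> ts) {
        int value;
        if ("(".equals(ts.peekFirst())) {
            ts.removeFirst(); // consume "("
            value = valueOfExpr(ts); // the indirect recursion
            ts.removeFirst(); // consume ")"
        } else {
            value = Integer.parseInt(ts.removeFirst());
        }
        return value;
    }

    /** Convenience helper: builds a token queue ending in "EOI". */
    static Deque<String> tokens(String... words) {
        Deque<String> ts = new ArrayDeque<>(List.of(words));
        ts.addLast("EOI");
        return ts;
    }
}
```

For example, ExprEvaluator.valueOfExpr(ExprEvaluator.tokens("4", "+", "29", "DIV", "3")) returns 13 and leaves only "EOI" in the queue, matching the example on slide 17. Like the slide code, the sketch relies on the precondition that the input is well formed and does no error checking.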

slide-40
SLIDE 40

Code for Parser for factor


How do you know this (indirect) recursion terminates?

slide-41
SLIDE 41

A Recursive-Descent Parser

  • One parse method per non-terminal symbol
  • A non-terminal symbol on the right-hand side of a rewrite rule leads to a call to the parse method for that non-terminal
  • A terminal symbol on the right-hand side of a rewrite rule leads to “consuming” that token from the input token string
  • | in the CFG leads to “if-else” in the parser
  • {...} in the CFG leads to “while” in the parser

slide-42
SLIDE 42

Observations

  • This is so formulaic that tools are available that can generate recursive-descent parsers (RDPs) from CFGs
  • In the lab, you will write an RDP for a language similar to the one illustrated here
– The CFG will be a bit different
– There will be no tokenizer, so you will parse a string of characters in a Java StringBuilder
  • See methods charAt and deleteCharAt
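As a rough illustration of parsing characters directly, the sketch below evaluates a run of digits at the front of a StringBuilder. The class and method names are hypothetical and this is not the lab solution. Here charAt(0) plays the role of looking ahead one token, and deleteCharAt(0) plays the role of consuming it.

```java
/** Hypothetical sketch for character-level parsing; not the lab solution. */
final class CharParsing {

    /**
     * Removes the longest run of digits from the front of source and returns
     * its value. Assumes source starts with at least one digit.
     */
    static int valueOfDigitSeq(StringBuilder source) {
        int value = 0;
        while (source.length() > 0 && Character.isDigit(source.charAt(0))) {
            int digit = Character.digit(source.charAt(0), 10); // look ahead
            source.deleteCharAt(0); // "consume" the character
            value = 10 * value + digit;
        }
        return value;
    }
}
```

Starting from new StringBuilder("29+3"), the method returns 29 and leaves "+3" in the StringBuilder for the caller to continue parsing.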


slide-43
SLIDE 43

Resources

  • Wikipedia: Recursive Descent Parser
– http://en.wikipedia.org/wiki/Recursive_descent_parser
  • Java Libraries API: StringBuilder
– http://docs.oracle.com/javase/8/docs/api/