Parserpalloza: Today, we’ll implement a few recursive-descent parsers - PowerPoint PPT Presentation



slide-1
SLIDE 1

Parserpalloza

slide-2
SLIDE 2

Today, we’ll implement a few recursive-descent parsers in groups.
You’ll have to figure this out yourself in Lab 5.
I’ll post this code online after we’re done.

slide-3
SLIDE 3

Take 2 minutes to find 1-2 group mates (you can work by yourself, too, but if you do, you have to commit to programming, not just sitting there).

Everyone must touch the keyboard once today.
If you get stuck, ask the group to your left / right first, not me.
If two groups are stuck, I will help.

slide-4
SLIDE 4

Key rule: at each step of the way, if I see some token next, which production must I choose?

slide-5
SLIDE 5

FIRST(A)

FIRST(A) is the set of terminals that could occur first when I recognize A

slide-6
SLIDE 6

NULLABLE

NULLABLE is the set of nonterminals that can derive ε

slide-7
SLIDE 7

FOLLOW(A)

FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some sentential form

slide-8
SLIDE 8

What is FIRST for each nonterminal?
What is NULLABLE for the grammar?
What is FOLLOW for each nonterminal?

S -> A | B
A -> aAa
B -> bb

slide-9
SLIDE 9

E -> T E'
E' -> + T E'
E' -> ε
T -> F T'
T' -> * F T'
T' -> ε
F -> ( E )
F -> id

What is FIRST for each nonterminal?
What is NULLABLE for the grammar?
What is FOLLOW for each nonterminal?

More practice…
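One way to check answers to the three questions is to compute the sets mechanically. The following is a minimal sketch, not code from the lecture: it assumes nonterminals are written as Racket symbols (`Ep` and `Tp` stand for E' and T'), terminals as strings, and `"$"` as an end-of-input marker. The helper names (`fixpoint`, `first-of-seq`, `suffixes`) are made up for this illustration.

```racket
#lang racket
;; Assumed encoding: nonterminals are symbols, terminals are strings.
;; A production is (lhs rhs ...); an empty right-hand side stands for ε.
(define grammar
  (list '(E T Ep) '(Ep "+" T Ep) '(Ep)
        '(T F Tp) '(Tp "*" F Tp) '(Tp)
        '(F "(" E ")") '(F "id")))

(define nonterminals (remove-duplicates (map car grammar)))

;; Apply `step` repeatedly until the result stops changing.
(define (fixpoint step table)
  (define next (step table))
  (if (equal? next table) table (fixpoint step next)))

;; NULLABLE: nonterminals with a production whose symbols are all nullable.
(define nullable
  (fixpoint
   (lambda (ns)
     (for/fold ([ns ns]) ([p grammar])
       (if (andmap (lambda (s) (set-member? ns s)) (cdr p))
           (set-add ns (car p))
           ns)))
   (set)))

(define (all-nullable? syms)
  (andmap (lambda (s) (set-member? nullable s)) syms))

;; FIRST of a sequence of symbols, given the FIRST table computed so far.
(define (first-of-seq syms tbl)
  (cond
    [(null? syms) (set)]
    [(string? (car syms)) (set (car syms))]          ; a terminal starts itself
    [(set-member? nullable (car syms))               ; may be skipped: look further
     (set-union (hash-ref tbl (car syms))
                (first-of-seq (cdr syms) tbl))]
    [else (hash-ref tbl (car syms))]))

(define empty-table
  (for/hash ([n nonterminals]) (values n (set))))

(define FIRST
  (fixpoint
   (lambda (tbl)
     (for/fold ([tbl tbl]) ([p grammar])
       (hash-update tbl (car p)
                    (lambda (s) (set-union s (first-of-seq (cdr p) tbl))))))
   empty-table))

;; All non-empty suffixes of a list: '(a b) -> '((a b) (b)).
(define (suffixes lst)
  (if (null? lst) '() (cons lst (suffixes (cdr lst)))))

;; FOLLOW: for A -> ... X after, add FIRST(after) to FOLLOW(X),
;; and add FOLLOW(A) as well when `after` can vanish.
(define FOLLOW
  (fixpoint
   (lambda (follow)
     (for*/fold ([follow follow])
                ([p grammar]
                 [tail (suffixes (cdr p))]
                 #:when (symbol? (car tail)))
       (let ([after (cdr tail)])
         (hash-update
          follow (car tail)
          (lambda (s)
            (set-union s
                       (first-of-seq after FIRST)
                       (if (all-nullable? after)
                           (hash-ref follow (car p))
                           (set))))))))
   (hash-set empty-table 'E (set "$"))))
```

Evaluating `nullable`, `(hash-ref FIRST 'T)`, or `(hash-ref FOLLOW 'F)` at the REPL shows the corresponding set, which can be compared with a hand-worked answer.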

slide-10
SLIDE 10

Let’s say I want to parse the following grammar

A -> aAa | bb

slide-11
SLIDE 11

To parse A, I check for either FIRST(aAa) or FIRST(B)

A -> aAa | B
B -> bb

slide-12
SLIDE 12

A -> aAa | B
B -> bb

(define (parse-A)
  (match curtok
    [#\a (begin (accept #\a) (parse-A) (accept #\a))]
    [#\b (parse-B)]))

slide-17
SLIDE 17

A -> aAa | B
B -> bb

(define (parse-B)
  (begin (accept #\b) (accept #\b)))
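The slides use `curtok` and `accept` without showing how they are defined. The sketch below fills them in under assumptions of my own: the input is a list of characters held in a module-level variable, `curtok` is a function rather than a variable, and a parse error is signalled with `error`. The wrapper `in-language?` is a hypothetical name added so the parser can be tried on whole strings.

```racket
#lang racket
;; Hypothetical input handling: the remaining characters of the string.
(define input '())

(define (curtok)                       ; next character, or 'eof at the end
  (if (null? input) 'eof (car input)))

(define (accept c)                     ; consume c, or signal a parse error
  (if (equal? (curtok) c)
      (set! input (cdr input))
      (error 'parse "expected ~a but saw ~a" c (curtok))))

;; A -> aAa | B
(define (parse-A)
  (match (curtok)
    [#\a (accept #\a) (parse-A) (accept #\a)]   ; A -> aAa, chosen on FIRST(aAa) = {a}
    [#\b (parse-B)]                             ; A -> B,   chosen on FIRST(B) = {b}
    [t (error 'parse "unexpected ~a while parsing A" t)]))

;; B -> bb
(define (parse-B)
  (accept #\b)
  (accept #\b))

;; #t if the whole string is derived from A, #f on any parse error.
(define (in-language? str)
  (set! input (string->list str))
  (with-handlers ([exn:fail? (lambda (e) #f)])
    (parse-A)
    (eq? (curtok) 'eof)))
```

For example, `(in-language? "abba")` succeeds, while `(in-language? "abb")` fails at the final `(accept #\a)` because the input has run out.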

slide-18
SLIDE 18

A general comment

You can often “follow your nose” when writing recursive-descent parsers.
In this class, we want you to follow this cookbook method.
Make sure your parser follows the grammar. (If you implement a parser for a different grammar that still works, you will still lose points in lab.)
Comment each production. (I didn’t do this in the slides, for space.)

slide-19
SLIDE 19

Challenge 1: Produce two strings in the language and one string not in the language.
Demonstrate how to parse them (or show the parsing error).

slide-20
SLIDE 20

There are also bottom-up parsers, which produce a rightmost derivation (in reverse)

We won’t talk about them; in general they’re very hard to write / understand, but easier to use

slide-21
SLIDE 21
slide-22
SLIDE 22

Basically everyone uses lex and yacc to write real parsers.
Recursive descent is easy to implement, but requires messing around with the grammar.

slide-23
SLIDE 23

More practice with parsers

slide-24
SLIDE 24

Plus -> num MoreNums
MoreNums -> + num MoreNums | ε

How would you do it? (Hint: Think about NULLABLE)

slide-25
SLIDE 25

Let’s think through this one on the board in pseudo-code

slide-26
SLIDE 26

Plus -> num MoreNums
MoreNums -> + num MoreNums | ε

slide-27
SLIDE 27

(define (parse-Plus)
  (begin (parse-num) (parse-MoreNums)))

(define (parse-MoreNums)
  (match curtok
    ['plus (begin (accept 'plus) (parse-num) (parse-MoreNums))]
    ['eof (void)]))
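To try the Plus / MoreNums grammar end to end, the slide code needs a token source and a `parse-num`. The version below is a sketch under assumptions that are not in the slides: tokens are a list of numbers and the symbol `'plus`, `curtok` is a function that returns `'eof` when the list is empty, and `recognize-plus?` is a made-up wrapper name.

```racket
#lang racket
;; Assumed token stream: numbers and the symbol 'plus, e.g. '(1 plus 2).
(define tokens '())

(define (curtok) (if (null? tokens) 'eof (car tokens)))
(define (advance!) (set! tokens (cdr tokens)))

(define (accept t)
  (if (equal? (curtok) t)
      (advance!)
      (error 'parse "expected ~a but saw ~a" t (curtok))))

(define (parse-num)
  (if (number? (curtok))
      (advance!)
      (error 'parse "expected a number but saw ~a" (curtok))))

;; Plus -> num MoreNums
(define (parse-Plus)
  (parse-num)
  (parse-MoreNums))

;; MoreNums -> + num MoreNums | ε
(define (parse-MoreNums)
  (match (curtok)
    ['plus (accept 'plus) (parse-num) (parse-MoreNums)] ; FIRST(+ num MoreNums) = {+}
    ['eof (void)]                                       ; ε case: eof is in FOLLOW(MoreNums)
    [t (error 'parse "unexpected ~a in MoreNums" t)]))

;; #t if the token list is derived from Plus, #f on a parse error.
(define (recognize-plus? toks)
  (set! tokens toks)
  (with-handlers ([exn:fail? (lambda (e) #f)])
    (parse-Plus)
    #t))
```

The ε branch is where NULLABLE shows up: because MoreNums can vanish, the parser has to decide to stop, and it does so when it sees a token that can follow MoreNums.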

slide-28
SLIDE 28

Yet another (this one in the C++ files)

slide-29
SLIDE 29

START -> E ε
E -> number
E -> identifier
E -> ( E_IN_PARENS )
E_IN_PARENS -> OP E E
OP -> + | - | *

slide-30
SLIDE 30

Now yet another…. This will use the intuition from FOLLOW

slide-31
SLIDE 31

Add -> Term MoreTerms
MoreTerms -> + Term MoreTerms
MoreTerms -> ε
Term -> num MoreNums
MoreNums -> * num MoreNums | ε

slide-32
SLIDE 32

Add -> Term MoreTerms
MoreTerms -> + Term MoreTerms
MoreTerms -> ε
Term -> num MoreNums
MoreNums -> * num MoreNums | ε

Consider how we would implement MoreTerms

slide-33
SLIDE 33

Add -> Term MoreTerms
MoreTerms -> + Term MoreTerms
MoreTerms -> ε
Term -> num MoreNums
MoreNums -> * num MoreNums | ε

If you’re at the beginning of MoreTerms you have to see a +

slide-34
SLIDE 34

Add -> Term MoreTerms
MoreTerms -> + Term MoreTerms
MoreTerms -> ε
Term -> num MoreNums
MoreNums -> * num MoreNums | ε

If you’ve just seen a + you have to see FIRST(Term)

slide-35
SLIDE 35

Add -> Term MoreTerms
MoreTerms -> + Term MoreTerms
MoreTerms -> ε
Term -> num MoreNums
MoreNums -> * num MoreNums | ε

After Term you recognize something in FOLLOW(Term)

slide-36
SLIDE 36

Add -> Term MoreTerms
MoreTerms -> + Term MoreTerms
MoreTerms -> ε
Term -> num MoreNums
MoreNums -> * num MoreNums | ε

Because MoreTerms is NULLABLE, we also have to account for the ε case: the next token may instead be something in FOLLOW(MoreTerms)

slide-37
SLIDE 37

Code up collectively….
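One possible result of that exercise is sketched here. It is a recognizer only (no AST yet) and rests on assumptions of my own: tokens are a list of numbers and the symbols `+` and `*`, `curtok` returns `'eof` at the end of input, and `recognize-add?` is a hypothetical wrapper.

```racket
#lang racket
;; Assumed token stream: numbers and the symbols + and *, e.g. '(1 + 2 * 3).
(define tokens '())

(define (curtok) (if (null? tokens) 'eof (car tokens)))
(define (advance!) (set! tokens (cdr tokens)))

(define (accept t)
  (if (equal? (curtok) t)
      (advance!)
      (error 'parse "expected ~a but saw ~a" t (curtok))))

(define (parse-num)
  (if (number? (curtok))
      (advance!)
      (error 'parse "expected a number but saw ~a" (curtok))))

;; Add -> Term MoreTerms
(define (parse-Add)
  (parse-Term)
  (parse-MoreTerms))

;; MoreTerms -> + Term MoreTerms | ε
(define (parse-MoreTerms)
  (match (curtok)
    ['+ (accept '+) (parse-Term) (parse-MoreTerms)]
    ['eof (void)]                      ; ε: FOLLOW(MoreTerms) = {eof}
    [t (error 'parse "unexpected ~a in MoreTerms" t)]))

;; Term -> num MoreNums
(define (parse-Term)
  (parse-num)
  (parse-MoreNums))

;; MoreNums -> * num MoreNums | ε
(define (parse-MoreNums)
  (match (curtok)
    ['* (accept '*) (parse-num) (parse-MoreNums)]
    [(or '+ 'eof) (void)]              ; ε: FOLLOW(MoreNums) = FOLLOW(Term) = {+, eof}
    [t (error 'parse "unexpected ~a in MoreNums" t)]))

;; #t if the token list is derived from Add, #f on a parse error.
(define (recognize-add? toks)
  (set! tokens toks)
  (with-handlers ([exn:fail? (lambda (e) #f)])
    (parse-Add)
    #t))
```

Note how the two ε branches differ: MoreNums may stop on `+` because a Term can be followed by more terms, while MoreTerms may only stop at the end of input.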

slide-38
SLIDE 38

Let’s say I want to generate an AST

slide-39
SLIDE 39

Model my AST…

(struct add (left right) #:transparent)
(struct times (left right) #:transparent)

slide-40
SLIDE 40

Model my AST…

(struct add (left right) #:transparent)
(struct times (left right) #:transparent)

Now, modify your parser to generate this AST
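A sketch of one way to do this for the Add / Term grammar follows. The design choice here is mine, not the lecture's: each "More" function takes the tree built so far as an argument and returns the extended tree. The token format (numbers plus the symbols `+` and `*`) and the wrapper `parse` are assumptions as well.

```racket
#lang racket
(struct add (left right) #:transparent)
(struct times (left right) #:transparent)

;; Assumed token stream: numbers and the symbols + and *.
(define tokens '())
(define (curtok) (if (null? tokens) 'eof (car tokens)))
(define (advance!) (set! tokens (cdr tokens)))

(define (accept t)
  (if (equal? (curtok) t)
      (advance!)
      (error 'parse "expected ~a but saw ~a" t (curtok))))

;; num: return the number itself as a leaf of the AST.
(define (parse-num)
  (define t (curtok))
  (if (number? t)
      (begin (advance!) t)
      (error 'parse "expected a number but saw ~a" t)))

;; Add -> Term MoreTerms
(define (parse-Add)
  (parse-MoreTerms (parse-Term)))

;; MoreTerms -> + Term MoreTerms | ε   (`left` is the tree built so far)
(define (parse-MoreTerms left)
  (match (curtok)
    ['+ (accept '+) (parse-MoreTerms (add left (parse-Term)))]
    ['eof left]
    [t (error 'parse "unexpected ~a in MoreTerms" t)]))

;; Term -> num MoreNums
(define (parse-Term)
  (parse-MoreNums (parse-num)))

;; MoreNums -> * num MoreNums | ε   (`left` is the tree built so far)
(define (parse-MoreNums left)
  (match (curtok)
    ['* (accept '*) (parse-MoreNums (times left (parse-num)))]
    [(or '+ 'eof) left]
    [t (error 'parse "unexpected ~a in MoreNums" t)]))

(define (parse toks)
  (set! tokens toks)
  (parse-Add))
```

Because the structs are `#:transparent`, the result prints readably and can be compared with `equal?`; for instance `(parse '(1 + 2 * 3))` yields a tree equal to `(add 1 (times 2 3))`.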

slide-41
SLIDE 41

More Recursive-descent practice…

(We’ll skip this for now and you can do it by yourself)

slide-42
SLIDE 42

Write recursive-descent parsers for the following….

slide-43
SLIDE 43

A grammar for S-Expressions

slide-44
SLIDE 44

Parsing mini-Racket / Scheme

datum ::= number | string | identifier | 'SExpr
SExpr ::= (SExprs) | datum
SExprs ::= SExpr SExprs | ε
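As a sketch of how this grammar might turn into code, the parser below works on an already-tokenized input. The token format is an assumption for illustration: the characters `#\(`, `#\)` and `#\'` stand for punctuation, while numbers, strings and symbols are the datum tokens. It returns nested lists, with a quoted expression represented as `(quote ...)`.

```racket
#lang racket
;; Assumed token stream: #\( #\) #\' for punctuation; numbers, strings
;; and symbols (identifiers) for data.
(define tokens '())
(define (curtok) (if (null? tokens) eof (car tokens)))
(define (advance!) (set! tokens (cdr tokens)))

(define (accept t)
  (if (equal? (curtok) t)
      (advance!)
      (error 'parse "expected ~a but saw ~a" t (curtok))))

;; FIRST(datum): a number, string, identifier, or quote mark.
(define (starts-datum? t)
  (or (number? t) (string? t) (symbol? t) (equal? t #\')))

;; datum ::= number | string | identifier | 'SExpr
(define (parse-datum)
  (define t (curtok))
  (cond
    [(equal? t #\') (accept #\') (list 'quote (parse-SExpr))]
    [(starts-datum? t) (advance!) t]
    [else (error 'parse "unexpected ~a in datum" t)]))

;; SExpr ::= (SExprs) | datum
(define (parse-SExpr)
  (cond
    [(equal? (curtok) #\()
     (accept #\()
     (let ([items (parse-SExprs)])
       (accept #\))
       items)]
    [else (parse-datum)]))

;; SExprs ::= SExpr SExprs | ε
(define (parse-SExprs)
  (define t (curtok))
  (if (or (equal? t #\() (starts-datum? t))     ; token is in FIRST(SExpr)
      (let ([first-item (parse-SExpr)])
        (cons first-item (parse-SExprs)))
      '()))                                      ; ε: caller checks for the closing paren

(define (parse toks)
  (set! tokens toks)
  (let ([result (parse-SExpr)])
    (unless (eof-object? (curtok))
      (error 'parse "trailing input: ~a" (curtok)))
    result))
```

For example, `(parse (list #\( 'f 1 #\' 'x #\)))` returns the list `(f 1 (quote x))`.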

slide-45
SLIDE 45

S -> a C H | b H C
H -> b H | d
C -> e C | f C

slide-46
SLIDE 46

E -> A
E -> L
A -> n
A -> i
L -> ( S )
S -> E S’
S’ -> , S
S’ -> ε

slide-47
SLIDE 47

So far, I’ve given you grammars that are amenable to LL(1) parsers…
(Many grammars are not)
(But you can manipulate them to be!)

slide-48
SLIDE 48

What about this grammar?

E -> E - T | T
T -> number

slide-49
SLIDE 49

This grammar is left recursive

E -> E - T | T
T -> number

What happens if we try to write a recursive-descent parser?

slide-50
SLIDE 50

This grammar is left recursive

E -> E - T | T
T -> number

slide-51
SLIDE 51

We really want this grammar, because it corresponds to the correct notion of associativity

slide-52
SLIDE 52

5 - 3 - 1

E -> E - T | T
T -> number

slide-53
SLIDE 53

Infinite loop!

slide-54
SLIDE 54

5 - 3 - 1

E -> E - T | T
T -> number

A recursive-descent parser will first call parse-E, which immediately calls parse-E again without consuming any input, and so on until it crashes

slide-55
SLIDE 55

5 - 3 - 1

Draw the rightmost derivation for this string

E -> E - T | T
T -> number

slide-56
SLIDE 56

If we could only have the rightmost derivation, our problem would be solved

slide-57
SLIDE 57

The problem is, a recursive-descent parser needs to look at the next input immediately

slide-58
SLIDE 58

Recursive-descent parsers work by looking at the next token and making a decision / prediction.
Rightmost derivations require us to delay making choices about the input until later.
As humans, we naturally guess which derivation to use (for small examples).

Thus, LL(k) parsers cannot generate rightmost derivations :(

slide-59
SLIDE 59

We can remove left recursion

slide-60
SLIDE 60

E -> E - T | T
T -> number

Factor!

E -> T E’
E’ -> - T E’
E’ -> ε

slide-61
SLIDE 61

In general, if we have

A -> Aa | bB

Rewrite to…

A -> bB A’
A’ -> a A’ | ε

Generalizes even further

https://en.wikipedia.org/wiki/LL_parser#Left_Factoring
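The rewrite rule can itself be written as a small function. This is a sketch that handles only immediate left recursion for a single nonterminal, under an encoding I am assuming for illustration: each alternative is a list of symbols, the empty list stands for ε, and the caller supplies the fresh name to use for A’. The function name `remove-left-recursion` is hypothetical.

```racket
#lang racket
;; alternatives: the right-hand sides for nonterminal `nt`, each a list
;; of symbols. Returns a list of (name alternative ...) entries.
(define (remove-left-recursion nt nt-prime alternatives)
  ;; Split A -> A α (left-recursive) from A -> β (everything else).
  (define-values (recursive others)
    (partition (lambda (rhs) (and (pair? rhs) (eq? (car rhs) nt)))
               alternatives))
  (if (null? recursive)
      (list (cons nt alternatives))             ; nothing to rewrite
      (list
       ;; A -> β A’ for every non-recursive alternative β
       (cons nt
             (for/list ([beta others])
               (append beta (list nt-prime))))
       ;; A’ -> α A’ for every recursive alternative A α, plus A’ -> ε
       (cons nt-prime
             (append
              (for/list ([rhs recursive])
                (append (cdr rhs) (list nt-prime)))
              (list '()))))))
```

Applied to the slide's example, `(remove-left-recursion 'A 'A2 '((A a) (b B)))` gives `'((A (b B A2)) (A2 (a A2) ()))`, which reads as A -> bB A’ and A’ -> a A’ | ε.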

slide-62
SLIDE 62

But this still doesn’t give us what we want!!!

E -> T E’
E’ -> - T E’
E’ -> ε

E => T E’
  => T - T E’
  => T - T - T E’
  => T - T - T
slide-63
SLIDE 63

So how do we get left associativity?

Answer: basically, hack it into the implementation

slide-64
SLIDE 64

Sub -> num Sub’
Sub’ -> + num Sub’ | ε

Is basically…

Sub -> num (+ num)*

slide-65
SLIDE 65

Intuition: treat this as a while loop; then, when building the parse tree, put it in left-associative order

Sub -> num (+ num)*
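Here is a minimal sketch of that loop, applied to the earlier subtraction grammar (E -> T E’, read as E -> T (- T)*) because subtraction makes the associativity visible. The `sub` struct, the token format (a list of numbers and the symbol `-`), and the `evaluate` helper are assumptions for illustration.

```racket
#lang racket
(struct sub (left right) #:transparent)

;; E -> T (- T)*  with T -> number.
;; The loop carries the tree built so far in `left`, so each new
;; "- number" wraps everything already seen: left-associative.
(define (parse-E toks)
  (unless (and (pair? toks) (number? (car toks)))
    (error 'parse "expected a number at the start"))
  (let loop ([left (car toks)] [remaining (cdr toks)])
    (match remaining
      ['() left]
      [(list '- (? number? n) more ...) (loop (sub left n) more)]
      [_ (error 'parse "unexpected input: ~a" remaining)])))

;; Evaluate the AST to check which grouping was built.
(define (evaluate ast)
  (match ast
    [(sub l r) (- (evaluate l) (evaluate r))]
    [(? number? n) n]))
```

With this, `(parse-E '(5 - 3 - 1))` builds a tree equal to `(sub (sub 5 3) 1)`, which evaluates to 1; the right-nested tree that falls out of the factored grammar would instead mean 5 - (3 - 1) = 3.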

slide-66
SLIDE 66

Sub -> num Sub’
Sub’ -> + num Sub’ | ε

slide-67
SLIDE 67

If you want to get a rightmost derivation, you need to use an LR parser

slide-68
SLIDE 68

input: /* empty */
     | input line
     ;

line: '\n'
    | exp '\n' { printf ("\t%.10g\n", $1); }
    ;

exp: NUM { $$ = $1; }
   | exp exp '+' { $$ = $1 + $2; }
   | exp exp '-' { $$ = $1 - $2; }
   | exp exp '*' { $$ = $1 * $2; }
   | exp exp '/' { $$ = $1 / $2; }
   /* Exponentiation */
   | exp exp '^' { $$ = pow ($1, $2); }
   /* Unary minus */
   | exp 'n' { $$ = -$1; }
   ;

slide-69
SLIDE 69

Parsing is lame, it’s 2017

slide-70
SLIDE 70
slide-71
SLIDE 71

If you can, just use something like JSON / protobufs / etc…
Inventing your own format is probably wrong.
For small / prototypical things, use recursive descent.
For real things, use yacc / bison / ANTLR.