Grammars and Parsing



slide-1
SLIDE 1

Grammars and Parsing

slide-2
SLIDE 2

Forth mini-homework…

slide-3
SLIDE 3

If there is a number on the stack, and we enter dup dup * *, what will be on the stack?

slide-4
SLIDE 4

If there are three numbers on the stack, and we enter

over -1 * over -1 * + + + *,

what will be on the stack?

slide-5
SLIDE 5

If we assume there are 2 values on the top of the stack, and we want to replace them with the sum of their squares, what would we type?

slide-6
SLIDE 6
If we assume there are at least 3 values on the top of the stack, and we want to replace the top three with two values, so that the new top is one less than the old top, and the number right below it is the product of the other two we removed, what should we type?

: iter 1 - rot rot * swap ;
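The stack effect of iter can be checked with a tiny simulator. This is a minimal sketch in Python rather than Forth; the `run` helper and the small set of words it understands are assumptions made for illustration, not part of any real Forth system.

```python
def run(program, stack=None):
    """Run a whitespace-separated string of Forth-like words on a list used as a stack."""
    stack = list(stack or [])
    for word in program.split():
        if word == "dup":
            stack.append(stack[-1])
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "over":
            stack.append(stack[-2])
        elif word == "rot":
            # ( a b c -- b c a ): the third item moves to the top
            stack.append(stack.pop(-3))
        elif word in ("+", "-", "*"):
            b = stack.pop()
            a = stack.pop()
            stack.append({"+": a + b, "-": a - b, "*": a * b}[word])
        else:
            stack.append(int(word))  # anything else is treated as a number
    return stack


ITER = "1 - rot rot * swap"    # body of the iter word above
print(run(ITER, [2, 3, 5]))    # [6, 4]
```

With 2 3 5 on the stack (5 on top), the new top is 4 (one less than 5) and the value below it is 6 (the product of 2 and 3).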

slide-7
SLIDE 7

If commands in FORTH

slide-8
SLIDE 8

: maybeadd1 dup 42 = invert if 1 + then ;
23 ok
maybeadd1 ok
.s <1> 24 ok
drop ok
42 ok
maybeadd1 ok
.s <1> 42 ok

slide-9
SLIDE 9

if <handle-true> (else <handle-else>)? then

An if takes its true branch when the flag on top of the stack is true: -1 is the canonical true flag, and any nonzero value counts as true

: maybeadd1 if 1 + then ;
23 -1 ok
maybeadd1
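To make the flag handling concrete, here is a rough Python transcription of the earlier definition, dup 42 = invert if 1 + then. It is only a sketch: the list-as-stack representation and the `TRUE`/`FALSE` names are assumptions for illustration.

```python
TRUE, FALSE = -1, 0   # Forth's canonical flags


def maybeadd1(stack):
    """Word-by-word mirror of: dup 42 = invert if 1 + then"""
    stack = list(stack)
    stack.append(stack[-1])                      # dup
    stack.append(42)                             # 42
    b, a = stack.pop(), stack.pop()
    stack.append(TRUE if a == b else FALSE)      # =
    stack.append(~stack.pop())                   # invert (bitwise not: -1 <-> 0)
    flag = stack.pop()                           # if consumes the flag
    if flag != 0:                                # any nonzero flag counts as true
        stack.append(stack.pop() + 1)            # 1 +
    return stack                                 # then


print(maybeadd1([23]))   # [24]
print(maybeadd1([42]))   # [42]
```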

slide-10
SLIDE 10

Grammars and Parsing

slide-11
SLIDE 11

(define my-tree '(+ 1 (* 2 3)))

;; Walk the tree: evaluate both operands, then apply the operator.
(define (evaluate-expr e)
  (match e
    [`(+ ,e1 ,e2) (+ (evaluate-expr e1) (evaluate-expr e2))]
    [`(* ,e1 ,e2) (* (evaluate-expr e1) (evaluate-expr e2))]
    [else e]))

This allows us to write interpreters
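The same idea can be sketched in Python, assuming the tree is written as nested tuples of the form (operator, left, right). The tuple encoding is an assumption for this example, not something from the slides.

```python
def evaluate_expr(e):
    """Evaluate a tree of nested (op, left, right) tuples; anything else is a leaf value."""
    if isinstance(e, tuple):
        op, e1, e2 = e
        if op == "+":
            return evaluate_expr(e1) + evaluate_expr(e2)
        if op == "*":
            return evaluate_expr(e1) * evaluate_expr(e2)
        raise ValueError(f"unknown operator: {op!r}")
    return e


my_tree = ("+", 1, ("*", 2, 3))
print(evaluate_expr(my_tree))   # 7
```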

slide-12
SLIDE 12

Expr -> number
Expr -> Expr + Expr
Expr -> Expr * Expr

1 + 2 * 3

Expr
-> Expr + Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number

Expr
-> Expr * Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number

slide-13
SLIDE 13

Expr
-> Expr + Expr
-> number + Expr
-> number + number
-> 1 + number
-> 1 + 2

         Expr
       /  |  \
   Expr   +   Expr
     |          |
  Number     Number
     |          |
     1          2

slide-14
SLIDE 14

This parse tree is a hierarchical representation of the data
A parser is a program that automatically generates a parse tree
A parser will generate an abstract syntax tree for the language

slide-15
SLIDE 15

Expr
-> Expr + Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number

Expr
-> Expr * Expr
-> Expr + Expr * Expr
-> number + Expr * Expr
-> number + number * Expr
-> number + number * number

Exercise: draw the parse trees for the following derivations

slide-16
SLIDE 16

<Expr> ::= <number>
<Expr> ::= <Expr> + <Expr>
<Expr> ::= <Expr> * <Expr>

BNF

(Backus-Naur Form)

A slightly different notation for writing CFGs. The differences are superficial (BNF renders nicely in ASCII, but nothing more). I write colloquially in some mix of BNF and a more math-like style.

slide-17
SLIDE 17

Two kinds of derivations

Leftmost derivation: The leftmost nonterminal is expanded first at each step
Rightmost derivation: The rightmost nonterminal is expanded first at each step
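One way to see the difference is to take a single parse tree and print both derivations from it. The sketch below is Python and assumes a hypothetical encoding in which a tree node is a (nonterminal, children) tuple and a terminal is a plain string; it uses the grammar G -> GG, G -> a and one of the parse trees for aaa.

```python
def show(form):
    """Render a sentential form: subtrees print as their nonterminal, terminals as themselves."""
    return " ".join(x[0] if isinstance(x, tuple) else x for x in form)


def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding one nonterminal per step."""
    form = [tree]                  # items are unexpanded subtrees or terminal strings
    steps = [show(form)]
    while any(isinstance(x, tuple) for x in form):
        pending = [i for i, x in enumerate(form) if isinstance(x, tuple)]
        i = pending[0] if leftmost else pending[-1]
        _, children = form[i]
        form[i:i + 1] = children   # replace the nonterminal by its children
        steps.append(show(form))
    return steps


leaf = ("G", ["a"])
tree = ("G", [("G", [leaf, leaf]), leaf])   # one parse tree for "aaa"

print(derivation(tree, leftmost=True))
# ['G', 'G G', 'G G G', 'a G G', 'a a G', 'a a a']
print(derivation(tree, leftmost=False))
# ['G', 'G G', 'G a', 'G G a', 'G a a', 'a a a']
```

Both derivations describe the same tree; only the order in which nonterminals are expanded differs.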

slide-18
SLIDE 18

Work in groups

slide-19
SLIDE 19

G -> GG
G -> a

Draw the leftmost derivation for…

aaa

Draw the rightmost derivation for…

aaa

slide-20
SLIDE 20

G -> G + G
G -> G / G
G -> number

Draw a leftmost derivation for…

1 / 2 / 3

Now draw another leftmost derivation

slide-21
SLIDE 21

Draw the parse trees for each derivation
What does each parse tree mean?

slide-22
SLIDE 22

A grammar is ambiguous if there is a string with more than one leftmost derivation (Equiv: has more than one parse tree)
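Ambiguity can be observed by brute force on small inputs. The following Python sketch enumerates every parse tree for a token list under G -> G + G, G -> G / G, G -> number; the function names and the tuple encoding of trees are assumptions for illustration, and the enumeration is exponential, so it is only meant for tiny examples.

```python
def parse_trees(tokens):
    """All parse trees for tokens under G -> G + G | G / G | number."""
    if len(tokens) == 1:
        return [tokens[0]] if isinstance(tokens[0], (int, float)) else []
    trees = []
    for i, tok in enumerate(tokens):
        if tok in ("+", "/"):
            # Try this operator as the root, with every tree for each side.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((tok, left, right))
    return trees


def evaluate(tree):
    """Give each tree its arithmetic meaning."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    if op == "+":
        return evaluate(left) + evaluate(right)
    return evaluate(left) / evaluate(right)


for t in parse_trees([1, "/", 2, "/", 3]):
    print(t, "=", evaluate(t))
```

For 1 / 2 / 3 this finds two trees, one grouping as 1 / (2 / 3) and one as (1 / 2) / 3, and they evaluate to different numbers.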

slide-23
SLIDE 23

Generally, we’re going to want our grammar to be unambiguous

slide-24
SLIDE 24

There’s another problem with this grammar: order of operations (OOO)

G -> G + G
G -> G / G
G -> number

slide-25
SLIDE 25

We need to tackle ambiguity

slide-26
SLIDE 26

Idea: introduce extra nonterminals that force you to get left-associativity (this also forces the order of operations)

slide-27
SLIDE 27

Add -> Add + Mul | Mul
Mul -> Mul / Term | Term
Term -> number

Draw the parse tree for 5 / 3 / 1
Write derivation for 5 / 3 / 1

slide-28
SLIDE 28

Add -> Add + Mul | Mul
Mul -> Mul / Term | Term
Term -> number

This grammar is left recursive

slide-29
SLIDE 29

Add -> Add + Mul | Mul
Mul -> Mul / Term | Term
Term -> number

A grammar is left-recursive if any nonterminal A has a production of the form A -> A…

slide-30
SLIDE 30

Add -> Add + Mul | Mul
Mul -> Mul / Term | Term
Term -> number

This will turn out to be bad for one class of parsing algorithms

slide-31
SLIDE 31

Recursive-Descent Parsing

slide-32
SLIDE 32

Recursive-descent parsing is a simple parsing algorithm

slide-33
SLIDE 33

First, a digression on lexing

Let’s assume the get-token function will give me the next token
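A get-token function of this kind can be sketched in a few lines. The version below is Python; `make_get_token`, the regular expression, and the choice of returning None at end of input are all assumptions for illustration, and it only knows about integers and a few operator characters.

```python
import re

# An integer, or one of the single-character tokens - + * / ( )
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def make_get_token(text):
    """Return a get_token() function that yields one token per call, then None at end of input."""
    pos = 0

    def get_token():
        nonlocal pos
        if text[pos:].strip() == "":
            return None                      # nothing left but whitespace
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        pos = m.end()
        tok = m.group(1)
        return int(tok) if tok.isdigit() else tok

    return get_token


get_token = make_get_token("5 / 3 / 1")
print(get_token(), get_token(), get_token())   # 5 / 3
```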

slide-34
SLIDE 34

Let’s say I want to parse the following grammar

S -> aSa | bb

slide-35
SLIDE 35

First, a few questions

S -> aSa | bb

If I were matching the string bb, what would my derivation look like?
If I were matching the string abba, what would my derivation look like?
Is this grammar ambiguous?

slide-36
SLIDE 36

First, a few questions

S -> aSa | bb

Key idea: if I look at the next input, at most one of these productions can “fire”
If I see an a, I know that I must use the first production
If I see a b, I know I must be in the second production

slide-37
SLIDE 37

Slight transformation..

S -> A | B
A -> aSa
B -> bb

slide-38
SLIDE 38

Slight transformation..

S -> A | B
A -> aSa
B -> bb

Now, I write out one function to parse each nonterminal
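A minimal sketch of what "one function per nonterminal" can look like, in Python. It assumes the grammar S -> A | B, A -> a S a, B -> b b (so that it accepts the same strings as S -> aSa | bb), treats each character as a token, and uses a hypothetical `Parser` class; none of these names come from the slides.

```python
class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        """Next character, or None at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def parse_S(self):               # S -> A | B, chosen by looking at the next token
        if self.peek() == "a":
            return self.parse_A()
        if self.peek() == "b":
            return self.parse_B()
        raise SyntaxError(f"expected 'a' or 'b' at position {self.pos}")

    def parse_A(self):               # A -> a S a
        self.expect("a")
        inner = self.parse_S()
        self.expect("a")
        return ("A", "a", inner, "a")

    def parse_B(self):               # B -> b b
        self.expect("b")
        self.expect("b")
        return ("B", "b", "b")


def parse(text):
    p = Parser(text)
    tree = p.parse_S()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at position {p.pos}")
    return tree


print(parse("abba"))   # ('A', 'a', ('B', 'b', 'b'), 'a')
```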

slide-39
SLIDE 39

FIRST(A)

FIRST(A) is the set of terminals that could occur first when I recognize A

Note: ε cannot be a member of FIRST because it is not a character

slide-40
SLIDE 40

NULLABLE

NULLABLE is the set of nonterminals that can derive ε

slide-41
SLIDE 41

FOLLOW(A)

FOLLOW(A) is the set of terminals that can appear immediately to the right of A in some sentential form

slide-42
SLIDE 42

What is FIRST for each nonterminal?
What is NULLABLE for the grammar?
What is FOLLOW for each nonterminal?

S -> A | B
A -> aSa
B -> bb

slide-43
SLIDE 43

More practice…

E -> T E'
E' -> + T E'
E' -> ε
T -> F T'
T' -> * F T'
T' -> ε
F -> ( E )
F -> id

What is FIRST for each nonterminal?
What is NULLABLE for the grammar?
What is FOLLOW for each nonterminal?
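Answers to questions like these can be checked mechanically. The sketch below is Python and computes NULLABLE, FIRST and FOLLOW for the expression grammar E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε, F -> ( E ) | id by repeating the rules until nothing changes. The dictionary encoding of the grammar, the `analyze` function, and the use of "$" as an end-of-input marker in FOLLOW are assumptions for illustration; as on the slides, ε is never put into a FIRST set.

```python
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],     # [] is the ε production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def analyze(grammar, start):
    nullable = set()
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")            # "$" marks end of input

    def size():
        return (len(nullable)
                + sum(len(s) for s in first.values())
                + sum(len(s) for s in follow.values()))

    while True:
        before = size()
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                # NULLABLE: every symbol on the right-hand side is nullable
                if all(sym in nullable for sym in rhs):
                    nullable.add(nt)
                # FIRST: scan left to right until a non-nullable symbol
                for sym in rhs:
                    first[nt] |= first[sym] if sym in grammar else {sym}
                    if sym not in nullable:
                        break
                # FOLLOW: scan right to left, tracking what can come next
                trailer = set(follow[nt])
                for sym in reversed(rhs):
                    if sym in grammar:
                        follow[sym] |= trailer
                        trailer = (trailer | first[sym]) if sym in nullable else set(first[sym])
                    else:
                        trailer = {sym}
        if size() == before:          # fixed point reached
            return nullable, first, follow


nullable, first, follow = analyze(GRAMMAR, "E")
print(sorted(nullable))
for nt in GRAMMAR:
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```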

slide-44
SLIDE 44

Let’s say I want to parse S

A -> aAa | B
B -> bb

I look at the next token, and I have two possible choices

If I see an a, I must parse an A
If I see a b, I must parse a B

slide-45
SLIDE 45

We use the FIRST set to help us design our recursive-descent parser!

slide-46
SLIDE 46

Livecoding this parser in class

slide-47
SLIDE 47

The recursive-descent parsers we will cover are generally called predictive parsers, because they use lookahead to predict which production to handle next

slide-48
SLIDE 48

LL(1)

A grammar is LL(1) if we only have to look at the next token to decide which production will match!

I.e., if S -> A | B, FIRST(A) ∩ FIRST(B) must be empty
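The disjointness condition is easy to test once the FIRST sets of the alternatives are known. This Python sketch uses a hypothetical helper, `is_ll1_choice`, and takes the FIRST sets as given. It only covers the case stated above: when an alternative can derive ε, FOLLOW sets also have to be considered, which this sketch does not do.

```python
from itertools import combinations


def is_ll1_choice(first_sets):
    """True if the FIRST sets of a nonterminal's alternatives are pairwise disjoint."""
    return all(a.isdisjoint(b) for a, b in combinations(first_sets, 2))


# S -> aSa | bb: the alternatives start with a and b respectively
print(is_ll1_choice([{"a"}, {"b"}]))             # True

# E -> E - T | T with T -> number: both alternatives start with a number
print(is_ll1_choice([{"number"}, {"number"}]))   # False
```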

slide-49
SLIDE 49

L: Left to right
L: Leftmost derivation
1: 1 token of lookahead

slide-50
SLIDE 50

Recursive-descent is called top-down parsing because you build a parse tree from the root down to the leaves

slide-51
SLIDE 51

There are also bottom-up parsers, which produce a rightmost derivation (in reverse)

We won’t talk about them. In general they are very hard to write and understand by hand, but easier to use

slide-52
SLIDE 52
slide-53
SLIDE 53

Basically everyone uses lex and yacc to write real parsers
Recursive-descent is easy to implement, but requires lots of messing around with the grammar

slide-54
SLIDE 54

What about this grammar?

E -> E - T | T
T -> number

slide-55
SLIDE 55

This grammar is left recursive

E -> E - T | T
T -> number

What happens if we try to write a recursive-descent parser?

slide-56
SLIDE 56

Infinite loop!
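The loop is easy to reproduce. The Python sketch below transcribes E -> E - T | T directly into functions, assuming tokens arrive as a list and that the first alternative is tried first; both assumptions are for illustration only. Because parse_E calls itself before consuming any token, Python eventually gives up with a RecursionError.

```python
def parse_T(tokens, pos):
    """T -> number"""
    if pos < len(tokens) and isinstance(tokens[pos], int):
        return tokens[pos], pos + 1
    raise SyntaxError("expected a number")


def parse_E(tokens, pos=0):
    """Naive transcription of E -> E - T | T, trying E - T first."""
    left, pos = parse_E(tokens, pos)      # E calls E at the same position: no progress
    if pos < len(tokens) and tokens[pos] == "-":
        right, pos = parse_T(tokens, pos + 1)
        return ("-", left, right), pos
    return left, pos


try:
    parse_E([3, "-", 2, "-", 1])
except RecursionError:
    print("recursion limit hit before any token was consumed")
```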

slide-57
SLIDE 57

We can remove left recursion

slide-58
SLIDE 58

E -> E - T | T
T -> number

E -> T E’
E’ -> - T E’
E’ -> ε

Factor!

slide-59
SLIDE 59

In general, if we have

A -> Aa | bB

Rewrite to…

A -> bB A’
A’ -> a A’ | ε

Generalizes even further

https://en.wikipedia.org/wiki/LL_parser#Left_Factoring
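The rewrite rule is mechanical enough to code up. This is a Python sketch under stated assumptions: productions are lists of symbols, the new nonterminal is named by appending an apostrophe, and only immediate left recursion (A -> A…) is handled. The function name `remove_left_recursion` is made up for this example.

```python
def remove_left_recursion(nt, alternatives):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε (immediate left recursion only)."""
    new_nt = nt + "'"
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nt]   # the α parts
    others = [alt for alt in alternatives if not alt or alt[0] != nt]       # the β parts
    if not recursive:
        return {nt: alternatives}        # nothing to do
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],   # [] stands for ε
    }


print(remove_left_recursion("E", [["E", "-", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['-', 'T', "E'"], []]}
```

Applied to E -> E - T | T, this produces E -> T E' and E' -> - T E' | ε.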

slide-60
SLIDE 60

But this still doesn’t give us what we want!!!

E -> T E’
E’ -> - T E’
E’ -> ε

E -> T E’
-> T - T E’
-> T - T - T E’
-> T - T - T
slide-61
SLIDE 61

So how do we get left associativity?

Answer: Basically, stupid hack in implementation

slide-62
SLIDE 62

Sub -> num Sub’
Sub’ -> + num Sub’ | epsilon

Is basically…

Sub -> num (+ num)*

slide-63
SLIDE 63

Intuition: treat this as while loop, then when building parse tree, put in left-associative order

Sub -> num (+ num)*
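A minimal Python sketch of that intuition, assuming the tokens are already in a list and that trees are nested tuples (both are assumptions for this example). The (+ num)* part becomes a while loop, and each time around the loop the tree built so far becomes the left child of the new node.

```python
def parse_sub(tokens):
    """Parse Sub -> num (+ num)* and build a left-associative tree."""
    pos = 0

    def num():
        nonlocal pos
        if pos >= len(tokens) or not isinstance(tokens[pos], int):
            raise SyntaxError(f"expected a number at token {pos}")
        pos += 1
        return tokens[pos - 1]

    tree = num()
    while pos < len(tokens) and tokens[pos] == "+":   # the (+ num)* part
        pos += 1
        tree = ("+", tree, num())      # the tree so far becomes the LEFT child
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return tree


print(parse_sub([1, "+", 2, "+", 3]))   # ('+', ('+', 1, 2), 3)
```

The same loop shape works for an operator such as -, where the grouping actually changes the result.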

slide-64
SLIDE 64

Sub -> num Sub’
Sub’ -> + num Sub’ | epsilon

slide-65
SLIDE 65

Parsing is lame, it’s 2017

slide-66
SLIDE 66
slide-67
SLIDE 67

If you can, just use something like JSON / protobufs / etc…
Inventing your own format is stupid
For small / prototypical things, recursive-descent
For real things, just use yacc