
SLIDE 1

Compiler Construction

Lecture 7: Bottom-up parsing 2020-01-28 Michael Engel

Includes material by Jan Christian Meyer and Rich Maclin (UNM)

SLIDE 2

Compiler Construction 07: Bottom-up parsing


Overview

Top-down parsing revisited
Bottom-up parsing
Comparison to top-down parsing
Shift-reduce parsers
Conflict resolution

SLIDE 3


Types of languages and automata

Context-free languages are a superset of regular languages
Regular languages can be recognized by DFAs/NFAs
DFAs and NFAs have no memory beyond their current state
Stack machines (also called pushdown automata) add memory by introducing the operations push and pop
The stack enables the machine to memorize (trace) the path it took to get to a state (and to revert to a previous one)
Stack machines are more powerful than DFAs/NFAs

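[Figure: language hierarchy: regular languages (type 3) ⊂ context-free (type 2) ⊂ context-sensitive (type 1) ⊂ recursively enumerable (type 0); regular languages are recognized by finite automata, context-free languages by stack machines]

Syntax analysis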

SLIDE 4


Top-down parsing and the stack

We’ve seen LL(1) tables and manually built recursive descent parsers

Another simple example:


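LL(1) parsing table (row: nonterminal; entries per lookahead symbol):
A:  x: A → xB    y: A → yC    EOF: error
B:  x: B → xB    y: error     EOF: B → ε
C:  x: error     y: C → yC    EOF: C → ε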

A → xB | yC   B → xB | ε   C → yC | ε

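```c
/* sym holds the current lookahead symbol; match(), add_tree() and
   error() are helper routines assumed by the slide (not shown) */
void parse_B(void);
void parse_C(void);

void parse_A(void) {
    switch (sym) {
    case 'x': add_tree(x, B); match('x'); parse_B(); break;  /* A → xB */
    case 'y': add_tree(y, C); match('y'); parse_C(); break;  /* A → yC */
    case EOF: error(); break;
    }
}

void parse_B(void) {
    switch (sym) {
    case 'x': add_tree(x, B); match('x'); parse_B(); break;  /* B → xB */
    case 'y': error(); break;
    case EOF: break;                                         /* B → ε */
    }
}

void parse_C(void) {
    switch (sym) {
    case 'x': error(); break;
    case 'y': add_tree(y, C); match('y'); parse_C(); break;  /* C → yC */
    case EOF: break;                                         /* C → ε */
    }
}
```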

SLIDE 5


Tracing the recursive descent code

Which derivation do we get when parsing "y y y"?

A → yC → yyC → yyyC → yyy

What is the related hierarchy of function calls?


A → xB | yC   B → xB | ε   C → yC | ε

[Figure: call hierarchy over time while parsing "y y y"]

```
Recur:
parse_A: match(y), call parse_C             (A → yC)
  parse_C: match(y), call parse_C           (C → yC)
    parse_C: match(y), call parse_C         (C → yC)
      parse_C: sym is EOF, return           (C → ε)
Unwind:
    parse_C returns
  parse_C returns
parse_A returns: finished
```

SLIDE 6


Memory in recursive descent code

Where is the memory hidden in our parser?
We do not explicitly store and retrieve state
The programming language hides it:
When calling (returning from) a function, state is pushed onto (popped from) the computer’s stack automatically
This state includes the return address of the call site
We can also build LL(1) parsers iteratively, but then we have to implement our own stack…
The stack is needed to match the beginnings and ends of productions:
any production of the form A → xBy where B can contain further instances of x and y, such as:

Expression → (Expression)    Statement → {Statement}    Comment → (* Comment *)


SLIDE 7


Top-down parsing and the syntax tree


LL(1) parsers generate a parse tree from top to bottom:

[Figure: partially derived syntax tree. 𝛕: part of the syntax tree that has already been derived; u0: initial part of the input token stream that has already been derived; uR: input token stream remaining to be read; 𝛃: current nonterminal symbol]

At this point, the parser tries to find a derivation for 𝛃: u0 𝛃 u2 → u0 v u2. To complete parsing, the remaining input uR has to be derivable from v u2 (otherwise: syntax error).

SLIDE 8


Bottom-up parsing


Can we also construct the parse tree from bottom to top?

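[Figure: bottom-up view. u0: initial part of the input that has already been reduced to partial syntax trees 𝛕; uR: input token stream remaining to be read; adjacent symbols v1 v2 are to be combined under a new node 𝛃]

We try to guess a production 𝛃 → v1 v2 whose right-hand side matches these symbols, and reduce them to 𝛃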

SLIDE 9


General idea of bottom-up parsing


Bottom-up parsing starts from the input token stream (whereas top-down parsing starts from the grammar’s start symbol)

It reduces a string to the start symbol by inverting productions: it tries to find a production whose right-hand side matches part of the string and replaces that part by the production’s left-hand side

E → T + E | T    T → int × T | int | ε
Inverted: E ← T + E | T    T ← int × T | int | ε

Consider the input token stream int × int + int:

Reading the productions in reverse (from bottom to top) gives a rightmost derivation

int × int + int   (reduce by T → int)
int × T + int     (reduce by T → int × T)
T + int           (reduce by T → int)
T + T             (reduce by E → T)
T + E             (reduce by E → T + E)
E
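For illustration, the same steps read from the last row upward form the following rightmost derivation; ⇒rm marks a step that expands the rightmost nonterminal of the sentential form:

$$
\begin{aligned}
E &\Rightarrow_{rm} T + E \Rightarrow_{rm} T + T \Rightarrow_{rm} T + \mathit{int} \\
  &\Rightarrow_{rm} \mathit{int} \times T + \mathit{int} \Rightarrow_{rm} \mathit{int} \times \mathit{int} + \mathit{int}
\end{aligned}
$$

Each reduction in the sequence undoes exactly one of these derivation steps, starting with the last one.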

SLIDE 10


A bottom-up parser traces a rightmost derivation in reverse


The resulting parse tree


Reduction sequence: int × int + int, int × T + int, T + int, T + T, T + E, E

[Figure: the resulting parse tree]

```
           E
         / | \
        T  +  E
      / | \   |
    int ×  T  T
           |  |
          int int
```

SLIDE 11


A simple bottom-up parsing algo


```
I = input string
repeat
    select a non-empty substring 𝛾 of I
        where X → 𝛾 is a production in the grammar
    if no such 𝛾 exists, backtrack
    replace one 𝛾 by X in I
until I == "S" /* start symbol */
    or all other possibilities are exhausted /* error */
```
Idea: split the input string (token stream) into two substrings
The right substring (a string of terminal symbols) has not been examined so far
The left substring contains terminals and nonterminals (generated by replacing the right-hand side of a production by its left-hand side)

SLIDE 12


Bottom-up parsing steps

Initially, all input is unexamined, written as: ↑x₁x₂x₃…xₙ

Two kinds of operations:

Shift: move ↑ one place to the right

ABC↑xyz ⇒ ABCx↑yz

Reduce: apply an inverse production at the right end of the left string
If A → xy is a production, then

Cbxy↑ijk ⇒ CbA↑ijk

SLIDE 13


Example with reductions only


E → T + E | T    T → int × T | int | ε

int × int ↑ + int   (reduce T → int)
int × T ↑ + int     (reduce T → int × T)
T + int ↑           (reduce T → int)
T + T ↑             (reduce E → T)
T + E ↑             (reduce E → T + E)

SLIDE 14


Example with shift-reduce parsing


E → T + E | T    T → int × T | int | ε

↑ int × int + int   (shift)
int ↑ × int + int   (shift)
int × ↑ int + int   (shift)
int × int ↑ + int   (reduce T → int)
int × T ↑ + int     (reduce T → int × T)
T ↑ + int           (shift)
T + ↑ int           (shift)
T + int ↑           (reduce T → int)
T + T ↑             (reduce E → T)
T + E ↑             (reduce E → T + E)
E                   (arrived at start symbol!)

SLIDE 15


Implementing the memory


Idea:

The left substring can be implemented by a stack

A shift operation pushes a terminal symbol onto the stack
A reduce operation pops zero or more symbols off the stack (the right-hand side of a production) and pushes a nonterminal symbol onto the stack (the left-hand side of the production)

E → T + E | T  T → int × T | int | ε

Columns: stack contents, input token stream, parser operation: stack operation(s)

[]              ↑ int × int + int   shift: push int
[int]           int ↑ × int + int   shift: push ×
[int, ×]        int × ↑ int + int   shift: push int
[int, ×, int]   int × int ↑ + int   reduce T → int: pop int, push T
[int, ×, T]     int × int ↑ + int   reduce T → int × T: pop int, ×, T, push T
[T]             int × int ↑ + int   …

SLIDE 16


Conflicts in parsing

Problem:

How do we decide when to shift or reduce?
Consider the step int ↑ × int + int
We could reduce using T → int, giving T ↑ × int + int
A fatal mistake: there is no way to reduce this to the start symbol E, because no right-hand side contains T followed by ×
Generic shift-reduce strategy:
If there is a matching pattern (handle) on the stack, reduce
Otherwise, shift
What if there is a choice (between two matching patterns)?
If it is legal to either shift or reduce, there is a shift-reduce conflict
If it is legal to reduce by two different productions, there is a reduce-reduce conflict


SLIDE 17


Source of conflicts and example

Conflicts arise due to:

Ambiguous grammars: they always cause conflicts
But beware: so do many unambiguous grammars
Conflict example


Grammar:
E → E + E | E × E | ( E ) | int

↑ int × int + int   (shift)
…
E × E ↑ + int       (reduce E → E × E)
E ↑ + int           (shift)
E + ↑ int           (shift)
E + int ↑           (reduce E → int)
E + E ↑             (reduce E → E + E)
E ↑

SLIDE 18


Source of conflicts and example

Another derivation is also possible:


E → E + E   | E × E  | ( E )  | int

Reduce first:
↑ int × int + int   (shift)
…
E × E ↑ + int       (reduce E → E × E)
E ↑ + int           (shift)
E + ↑ int           (shift)
E + int ↑           (reduce E → int)
E + E ↑             (reduce E → E + E)
E ↑

Shift first:
↑ int × int + int   (shift)
…
E × E ↑ + int       (shift)
E × E + ↑ int       (shift)
E × E + int ↑       (reduce E → int)
E × E + E ↑         (reduce E → E + E)
E × E ↑             (reduce E → E × E)
E ↑

We can decide to either shift or reduce in the step E × E ↑ + int

The choice whether to shift or reduce determines the relative precedence of × and + (for two occurrences of the same operator, it would determine associativity)!

SLIDE 19


Resolving conflicts: precedence


E → E + E | E × E | ( E ) | int

The choice whether to shift or reduce determines the relative precedence of + and × (and, for equal operators, their associativity)

We could rewrite the grammar to enforce precedence (as seen with top-down parsing)

Alternative: provide precedence declarations
These cause shift-reduce parsers to resolve conflicts in certain ways

Declaring “× has greater precedence than +” causes the parser to reduce at E × E ↑ + int

More precisely, the precedence declaration is used to resolve the conflict between reducing a × and shifting a +

The term “precedence declaration” is misleading: these declarations do not define precedence; they define conflict resolutions

SLIDE 20


What now?

Our key ingredients for bottom-up parsing:
a stack to shift and reduce symbols on
an automaton that can use the stacked history to retrace its footsteps

The LR(k) family of languages can all be parsed using a shift-reduce parser like this

The complexity of the grammars you can handle is related to how elaborate your automaton is

Several variants: SLR, LALR, LR(1)
Let’s start with a simple one, LR(0), in the next lecture
