
Compilation 2007

SableCC

Michael I. Schwartzbach BRICS, University of Aarhus

The SableCC Tool

The input is:

  • a sequence of token definitions
  • a context-free grammar

The output is:

  • an LALR(1) parser for the defined language
  • available as a Java class

Our Favorite Grammar in SableCC

    Helpers
      tab = 9;
      cr = 13;
      lf = 10;

    Tokens
      eol = cr | lf | cr lf;
      blank = ' ' | tab;
      star = '*';
      slash = '/';
      plus = '+';
      minus = '-';
      lpar = '(';
      rpar = ')';
      id = 'x' | 'y' | 'z';

    Ignored Tokens
      blank, eol;

    Productions
      start = {plus} start plus term
            | {minus} start minus term
            | {term} term;

      term = {mult} term star factor
           | {div} term slash factor
           | {factor} factor;

      factor = {id} id
             | {paren} lpar start rpar;

Generated Classes

    drwxr-xr-x 2 mis users 4096 Sep 7 09:28 analysis/
    drwxr-xr-x 2 mis users 4096 Sep 7 09:28 lexer/
    drwxr-xr-x 2 mis users 4096 Sep 7 09:32 node/
    drwxr-xr-x 2 mis users 4096 Sep 7 09:28 parser/
    -rw-r--r-- 1 mis users  536 Sep 7 09:32 xyz.sablecc

We never need to look at this output

The Main Application

    import parser.*;
    import lexer.*;
    import node.*;
    import java.io.*;

    class Main {
      public static void main(String args[]) {
        try {
          Parser p =
            new Parser(
              new Lexer(
                new PushbackReader(new InputStreamReader(System.in))));
          Start tree = p.parse();  /* parse the input */
        } catch (Exception e) {
          System.out.println(e);
        }
      }
    }

An Ambiguous Grammar

X → Λ | a X | a a X

Any string in this language has exponentially many different parse trees:

a a . . . a (n occurrences of a) has exactly Fib(n) parse trees, taking Fib(0) = Fib(1) = 1
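
The count can be checked with a few lines of Java. This is only a sketch: the class name ParseTreeCount and the method trees are made up for this illustration and are not produced by SableCC. It relies on the observation that a parse tree for n copies of a starts with either the alternative "a X" (leaving n-1 copies) or "a a X" (leaving n-2 copies), so the number of trees follows the Fibonacci recurrence with both base cases equal to 1.

```java
import java.math.BigInteger;

// Hypothetical helper (not SableCC output): counts the parse trees of a^n
// for the ambiguous grammar X -> Λ | a X | a a X.
class ParseTreeCount {

    // trees(n) = trees(n-1) + trees(n-2), because the root uses
    // either "a X" (n-1 symbols remain) or "a a X" (n-2 symbols remain).
    static BigInteger trees(int n) {
        BigInteger prev = BigInteger.ONE; // trees(0): only X -> Λ
        BigInteger cur = BigInteger.ONE;  // trees(1): X -> a X, then X -> Λ
        for (int i = 2; i <= n; i++) {
            BigInteger next = cur.add(prev);
            prev = cur;
            cur = next;
        }
        return cur;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            System.out.println("a^" + n + ": " + trees(n) + " parse trees");
        }
    }
}
```

Since the values grow like the Fibonacci numbers, the number of parse trees is exponential in n.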

The SableCC Version

    Tokens
      a = 'a';

    Productions
      x = {empty}
        | {one} a x
        | {two} [first]:a [second]:a x;

Note that all symbols within an alternative must have unique names. The default name for foo is [foo]:

SableCC is Unhappy

    reduce/reduce conflict in state [stack: TA TA PX *] on EOF in {
        [ PX = TA PX * ] followed by EOF (reduce),
        [ PX = TA TA PX * ] followed by EOF (reduce)
    }

The LALR(1) table contains conflicting actions

Solution: Less Stupid Grammar

    Tokens
      a = 'a';

    Productions
      x = {empty}
        | {one} a x;

A Grammar for If-Statements

    Helpers
      tab = 9;
      cr = 13;
      lf = 10;

    Tokens
      eol = cr | lf | cr lf;
      blank = ' ' | tab;
      exp = 'exp';
      if = 'if';
      then = 'then';
      else = 'else';
      assign = 'assign';

    Ignored Tokens
      blank, eol;

    Productions
      stm = {one} if exp then stm
          | {both} if exp then [thenbranch]:stm else [elsebranch]:stm
          | {assign} assign;

SableCC is Unhappy

    shift/reduce conflict in state [stack: TIf TExp TThen PStm *] on TElse in {
        [ PStm = TIf TExp TThen PStm * TElse PStm ] (shift),
        [ PStm = TIf TExp TThen PStm * ] followed by TElse (reduce)
    }

But the grammar does not appear to be stupid...

Solution: Less Natural Grammar

    Productions
      stm = {one} if exp then stm
          | {both} if exp then [thenbranch]:stm2 else [elsebranch]:stm
          | {assign} assign;

      stm2 = {both} if exp then [thenbranch]:stm2 else [elsebranch]:stm2
           | {assign} assign;

Dangling Else Problem

An example statement:

if exp then if exp then assign else assign

To which if does the else belong? The first grammar is ambiguous. Our modified grammar attaches the else to the nearest if and parses the string as:

if exp then ( if exp then assign else assign )
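
The nearest-if rule can also be illustrated outside SableCC. The sketch below is a hand-written recursive-descent parser, not an LALR(1) parser and not code generated by SableCC; the class DanglingElse and its method bracket are hypothetical names chosen for this example. It greedily consumes an else as soon as one follows a then-branch, which gives the same grouping for the example statement as the modified grammar.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical hand-written parser (not SableCC output) that brackets
// if-statements so that each else belongs to the nearest open if.
class DanglingElse {
    private final List<String> tokens;
    private int pos = 0;

    private DanglingElse(String input) {
        tokens = Arrays.asList(input.trim().split("\\s+"));
    }

    // Current token, or null at the end of the input.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    private void expect(String token) {
        if (!token.equals(peek())) {
            throw new IllegalArgumentException(
                "expected " + token + " but found " + peek());
        }
        pos++;
    }

    // stm = if exp then stm [else stm] | assign
    private String stm() {
        if ("assign".equals(peek())) {
            pos++;
            return "assign";
        }
        expect("if");
        expect("exp");
        expect("then");
        String thenBranch = stm();
        if ("else".equals(peek())) {
            pos++; // greedy choice: the else goes with the innermost if
            String elseBranch = stm();
            return "(if exp then " + thenBranch + " else " + elseBranch + ")";
        }
        return "(if exp then " + thenBranch + ")";
    }

    // Returns the statement with explicit parentheses around every if.
    static String bracket(String input) {
        DanglingElse parser = new DanglingElse(input);
        String result = parser.stm();
        if (parser.peek() != null) {
            throw new IllegalArgumentException("unexpected " + parser.peek());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(bracket("if exp then if exp then assign else assign"));
    }
}
```

For the example statement, the inner if ends up carrying the else branch and the outer if has none.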

The Palindrome Grammar

    Tokens
      zero = '0';
      one = '1';

    Productions
      pal = {empty}
          | {one} one
          | {zero} zero
          | {oneone} [first]:one pal [second]:one
          | {zerozero} [first]:zero pal [second]:zero;

SableCC is Unhappy

    shift/reduce conflict in state [stack: TZero *] on TZero in {
        [ PPal = * TZero PPal TZero ] (shift),
        [ PPal = * TZero ] (shift),
        [ PPal = * ] followed by TZero (reduce),
        [ PPal = TZero * ] followed by TZero (reduce)
    }
    shift/reduce conflict in state [stack: TZero *] on TZero in {
        [ PPal = * TZero PPal TZero ] (shift),
        [ PPal = * TZero ] (shift),
        [ PPal = * ] followed by TZero (reduce),
        [ PPal = TZero * ] followed by TZero (reduce)
    }
    shift/reduce conflict in state [stack: TZero *] on TOne in {
        [ PPal = * TOne PPal TOne ] (shift),
        [ PPal = * TOne ] (shift),
        [ PPal = TZero * ] followed by TOne (reduce)
    }

No Solution!

There is no LALR(1) grammar for this language

  • Some grammars are not LALR(1)
  • And some languages are not LALR(1)
  • Some grammars are ambiguous
  • And some languages are ambiguous
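
The language itself is still easy to recognize when the whole input is available. The following sketch mirrors the five alternatives of the pal production directly by comparing the first and last symbols. It is not an LALR(1) parser and not SableCC output, and the class name Palindrome and method isPal are hypothetical. A left-to-right parser with one token of lookahead has no way to look at the far end of the input like this, which gives some intuition for the conflicts reported above.

```java
// Hypothetical recognizer (not SableCC output) for palindromes over {0, 1},
// following the alternatives of the pal production one by one.
class Palindrome {

    static boolean isPal(String s) {
        return isPal(s, 0, s.length());
    }

    // Checks the substring s[lo, hi).
    private static boolean isPal(String s, int lo, int hi) {
        int length = hi - lo;
        if (length == 0) {
            return true; // {empty}
        }
        char first = s.charAt(lo);
        if (first != '0' && first != '1') {
            return false; // not a token of the grammar
        }
        if (length == 1) {
            return true; // {one} or {zero}
        }
        // {oneone} or {zerozero}: both ends must carry the same token
        return first == s.charAt(hi - 1) && isPal(s, lo + 1, hi - 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "0", "0110", "01", "10101"}) {
            System.out.println("\"" + s + "\" -> " + isPal(s));
        }
    }
}
```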

Language Containments

LALR(1) ⊂ Unambiguous ⊂ Context-Free

  • palindromes: unambiguous, but not LALR(1)
  • { a^i b^j c^k | i = j or j = k }: context-free, but ambiguous

EBNF Features

SableCC allows right-hand side abbreviations:

  • Optional: x = y?
  • List: x = y*
  • Non-empty list: x = y+

This has many benefits:

  • shorter
  • less error-prone
  • fewer names must be invented

EBNF Example

    block = lbrace decl* stm+ rbrace ;
    decl = type id init? semicolon ;
    init = equals exp;

EBNF Expansion

    x = y?    becomes    x = {some} y | {none} ;
    x = y*    becomes    x = {zero} | {more} y x ;
    x = y+    becomes    x = {one} y | {more} y x ;
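
The three expansions are mechanical, as the following sketch shows. It only rewrites strings for a production whose right-hand side is a single element with an optional ?, * or + suffix. The class EbnfExpand and the method expand are hypothetical names for this illustration, and nothing here is claimed about how SableCC implements the expansion internally.

```java
// Hypothetical helper (not part of SableCC): rewrites "x = y?", "x = y*"
// and "x = y+" into the plain productions shown above.
class EbnfExpand {

    // lhs is the production name, element is the single right-hand side
    // element, possibly ending in '?', '*' or '+'.
    static String expand(String lhs, String element) {
        if (element.isEmpty()) {
            throw new IllegalArgumentException("empty element");
        }
        char op = element.charAt(element.length() - 1);
        String y = element.substring(0, element.length() - 1);
        switch (op) {
            case '?': // optional: one alternative with y, one without
                return lhs + " = {some} " + y + " | {none} ;";
            case '*': // list: empty, or y followed by the rest of the list
                return lhs + " = {zero} | {more} " + y + " " + lhs + " ;";
            case '+': // non-empty list: a single y, or y followed by more
                return lhs + " = {one} " + y + " | {more} " + y + " " + lhs + " ;";
            default:  // no EBNF operator: nothing to expand
                return lhs + " = " + element + " ;";
        }
    }

    public static void main(String[] args) {
        System.out.println(expand("x", "y?"));
        System.out.println(expand("x", "y*"));
        System.out.println(expand("x", "y+"));
    }
}
```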