Shift-Reduce Parsers for Transition Networks

Shift-Reduce Parsers for Transition Networks
Luca Breveglieri, Stefano Crespi Reghizzi, Angelo Morzenti
Politecnico di Milano
LATA 2014, 10-14 March, Madrid

Introduction: Aim of the Work
Problem statement and research objectives

On the status of LR (bottom-up) syntax analysis:
• LR (bottom-up) is an established methodology for syntax analysis.
• The theory is mostly developed for grammars in Backus-Naur Form (BNF).
• Automated compiler-design tools use it (e.g., Bison).
• Extended BNF (EBNF) grammars, whose rules contain regular expressions, are widely used for specifying technical languages of all sorts.
• Usually EBNF rules are first reduced to BNF rules and only then analyzed!

Objectives of the present research work:
• Develop an Extended LR (ELR) methodology that generalizes the LR one.
• Make it applicable to EBNF grammars represented as Transition Networks (TN).

Introduction: Contents
Outline (table of contents)

1 Introduction
2 Transition Network
3 Parser Control
4 Main Theorem
5 Parser Construction
6 Experimentation
7 Conclusion

Introduction: State of the Art
State of the art in LR syntax analysis

• Classical LR(k) theory for BNF grammars is well developed.
• Compiler-design tools for LR(k) parsers exist (e.g., Bison).
• EBNF grammars are popular for describing technical languages (e.g., syntax charts), but are seldom used directly to obtain the parser.
• More recently, attention has focused on representing an EBNF grammar in the equivalent form of a Transition Network (TN).

For EBNF grammars (or their TNs) there have been many attempts to apply LR analysis, but no simple and standard solution:
• regular expressions are annotated and manipulated directly ⇒ this approach is somewhat distant from practical parsing
• EBNF is turned into BNF ⇒ the grammar is obscured and becomes larger
• EBNF rules are processed directly ⇒ the parser is complicated by the reduction move (at least in the current solutions)

Some of the proposed solutions are incomplete or even wrong.

Transition Network: Definition and Example
EBNF grammar and recursive transition network

An EBNF grammar may have a regular expression (r.e.) in the right part of a rule:

A → ( a | b B* c )+        in general: A → r.e.( a, b, ..., A, B, ... )

Such an extended rule is interpreted as infinitely many BNF rules:

A → a | b c | b B c | b B B c | ... | a b c | b c a | ...

Hence we stipulate that an EBNF grammar has exactly one rule per nonterminal.

• Represent a grammar by a Transition Network (TN): a set of DFAs.
• Each DFA is equivalent to the regular expression in the right part of a rule.
• The TN has a single DFA (called a machine) per nonterminal.
• A transition with a nonterminal label is a call site for another machine.
• So any machine can invoke any other one recursively (even itself).
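To make the "infinitely many BNF rules" reading concrete, here is a minimal sketch that tests whether a symbol string is one of the right parts generated by the extended rule A → ( a | b B* c )+. It treats every grammar symbol, including the nonterminal B, as a single character; the function name is a hypothetical helper, not something from the slides.

```python
import re

# Right part of the extended rule A -> ( a | b B* c )+ .
# Assumption of this sketch: each grammar symbol is one character,
# and the nonterminal B is treated as a plain symbol.
RULE_A = re.compile(r"(?:a|bB*c)+")


def is_right_part_of_A(symbols: str) -> bool:
    """Return True if `symbols` is one of the BNF right parts of A."""
    return RULE_A.fullmatch(symbols) is not None


if __name__ == "__main__":
    # The alternatives listed on the slide, plus two strings that are not right parts.
    for candidate in ["a", "bc", "bBc", "bBBc", "abc", "bca", "bB", ""]:
        print(repr(candidate), is_right_part_of_A(candidate))
```

Every alternative listed on the slide is accepted, while a string such as `bB` (missing the closing `c`) is rejected.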

Transition Network: Definition and Example
Sample transition network

EBNF grammar G of a simple language of expressions (axiom S):

Σ = { a, '(', ')' }        V = { S, T }
S → T*
T → '(' S ')' | a

Transition network of G, with a machine for S (axiomatic) and one for T:

Machine S:  → 0_S -T-> 1_S,  1_S -T-> 1_S                      (final states: 0_S, 1_S)
Machine T:  → 0_T -'('-> 1_T -S-> 2_T -')'-> 3_T,  0_T -a-> 3_T   (final state: 3_T)

The arcs labeled with a nonterminal (T in machine S, S in machine T) are call sites.

• A machine of the TN is a DFA over the alphabet Σ ∪ V.
• The initial state of a machine must not have any ingoing arcs.
• A machine may be in minimal form (except for the initial state).
• For a BNF grammar, each machine is a tree with no loops or confluent paths.
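The sample TN can be written down as a small data structure. The sketch below is only one possible encoding: the state names such as `"0S"` (standing for 0_S), the dictionary layout, and the helper functions are assumptions made for illustration. It checks the stated constraint that initial states have no ingoing arcs and lists the call sites.

```python
# Sample TN of grammar G: ARCS[state][symbol] -> next state.
# State names like "0S" stand for the slide's 0_S (naming is an assumption).
ARCS = {
    "0S": {"T": "1S"},
    "1S": {"T": "1S"},
    "0T": {"(": "1T", "a": "3T"},
    "1T": {"S": "2T"},
    "2T": {")": "3T"},
    "3T": {},
}
FINAL = {"0S", "1S", "3T"}          # final states of the two machines
INITIAL = {"S": "0S", "T": "0T"}    # nonterminal -> initial state of its machine


def ingoing(state):
    """All arcs (source, symbol) that enter `state`."""
    return [(p, x) for p, out in ARCS.items() for x, q in out.items() if q == state]


def call_sites():
    """All arcs labeled with a nonterminal, i.e. calls to another machine."""
    return [(p, x, q) for p, out in ARCS.items() for x, q in out.items() if x in INITIAL]


if __name__ == "__main__":
    for nonterminal, start in INITIAL.items():
        print(f"machine {nonterminal}: initial state {start}, ingoing arcs {ingoing(start)}")
    for p, x, q in call_sites():
        print(f"call site {p} -{x}-> {q}")
```

Because each state maps a symbol to at most one successor, determinism holds by construction in this encoding.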

Transition Network: Alternative Representation
Right linearized grammar of a TN

• A right linearized grammar is a piecewise right linear grammar.
• Its rules are partitioned into right linear groups, which call one another.
• Each such group has only terminal or right recursive linear rules.
• So a right linearized grammar maps a TN [1], but is purely BNF.
• It is a useful theoretical representation, yet unfit for parsing.

Right linearized grammar G_RL of the sample TN (axiom 0_S):

0_S → 0_T 1_S | ε
1_S → 0_T 1_S | ε
0_T → '(' 1_T | a 3_T
1_T → 0_S 2_T
2_T → ')' 3_T
3_T → ε

[1] Heilbrunner defined G_RL ('79), unrelated to TN.
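The rules of G_RL can be read off the machines mechanically: an arc p -X-> q yields the alternative "X q" for a terminal X, or "0_X q" for a nonterminal X, and a final state yields ε. The sketch below applies that reading to the sample TN; the encoding (`ARCS`, `FINAL`, `INITIAL`, state names such as `"0S"`) and the function names are assumptions for illustration, not the authors' construction.

```python
# Sample TN (state names like "0S" stand for 0_S; the encoding is an assumption).
ARCS = {
    "0S": {"T": "1S"},
    "1S": {"T": "1S"},
    "0T": {"(": "1T", "a": "3T"},
    "1T": {"S": "2T"},
    "2T": {")": "3T"},
    "3T": {},
}
FINAL = {"0S", "1S", "3T"}
INITIAL = {"S": "0S", "T": "0T"}


def right_linearized(arcs, final, initial):
    """Map each TN state to the alternatives of its right linearized rule."""
    rules = {}
    for p, out in arcs.items():
        alternatives = []
        for x, q in out.items():
            # A nonterminal label is replaced by the initial state of its machine.
            head = initial[x] if x in initial else x
            alternatives.append((head, q))
        if p in final:
            alternatives.append(())  # the empty alternative, i.e. epsilon
        rules[p] = alternatives
    return rules


def show(rules):
    """Render the rules one per line, in the notation of the slide."""
    return [
        f"{p} → " + " | ".join(" ".join(alt) if alt else "ε" for alt in alts)
        for p, alts in rules.items()
    ]


if __name__ == "__main__":
    print("\n".join(show(right_linearized(ARCS, FINAL, INITIAL))))
```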

Parser Control: Analysis Item
Item structure and its meaning

An item is a pair ⟨p, π⟩ (called state, look-ahead), such that:
• p is a state of (a machine of) the transition network
• π ⊆ Σ ∪ { ⊣ } is a subset of terminals (π ≠ ∅)

An item represents an analysis point reached by the parser:
• a machine (i.e., a rule) matches the input as far as state p
• π contains the terminals expected after the machine ends

[Figure: machine A with the path ... o_A -a-> r_A -B-> s_A -d-> t_A ..., where the arc r_A -B-> s_A is a call site; machine B with the path 0_B -b-> p_B -c-> q_B ...; the item ⟨p_B, {d}⟩ is attached to state p_B.]

If the string to parse is ... a b c d ..., then the item ⟨p_B, {d}⟩ means that machine B (called at site r_A -B-> s_A) has matched symbol b and is now at state p_B, and that when it ends, symbol d is expected.

Parser Control: Analysis Item
Item shift: evolving an existing item

Suppose ⟨p, π⟩ is an existing item, with state p and look-ahead π. Define an item shift (partial) function:

shift : (set of all items) × (Σ ∪ V) → (set of all items)

which works on an item as follows:

shift(⟨p, π⟩, X) = ⟨q, π⟩    if arc p -X-> q is in the TN

where X is any grammar symbol (terminal or nonterminal).

• Using the TN, the shift function matches an item and a grammar symbol, and goes to the next item on the same machine.
• Since the machine of the TN remains the same for the shifted item, the shift function does not change the item look-ahead.
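As a minimal sketch of the item shift, assume an item is a Python pair of a state name and a frozen set of look-ahead terminals, and assume the sample TN of grammar G is encoded as a dictionary of arcs (both encodings are illustrative choices, not taken from the slides). The function returns `None` where the partial function is undefined.

```python
from typing import FrozenSet, Optional, Tuple

# An item <p, pi>: a TN state plus a non-empty set of look-ahead terminals.
Item = Tuple[str, FrozenSet[str]]

# Sample TN of grammar G (state names like "0S" stand for 0_S; an assumption).
ARCS = {
    "0S": {"T": "1S"},
    "1S": {"T": "1S"},
    "0T": {"(": "1T", "a": "3T"},
    "1T": {"S": "2T"},
    "2T": {")": "3T"},
    "3T": {},
}


def shift(item: Item, symbol: str) -> Optional[Item]:
    """Item shift: follow the arc labeled `symbol`, keeping the look-ahead.

    Returns None when no such arc exists (shift is a partial function).
    """
    state, lookahead = item
    target = ARCS[state].get(symbol)
    if target is None:
        return None
    return (target, lookahead)


if __name__ == "__main__":
    item = ("0T", frozenset({")"}))
    print(shift(item, "("))   # moves to state 1T, look-ahead unchanged
    print(shift(item, ")"))   # undefined: no arc labeled ')' leaves 0T
```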

Parser Control: Analysis Item
Item closure: creating a new look-ahead

Closure of a (non-empty) set I of items:

closure(I) = I ∪ { ⟨0_B, π⟩ | ∃ item ⟨r, ρ⟩ ∈ closure(I) and ∃ arc r -B-> s ∈ TN and π = initials( L(s) · ρ ) }

Closure example (sample TN of G):

set I of items:                        ⟨1_T, {...}⟩
new items added to I by closure:       ⟨0_S, { ')' }⟩,  ⟨0_T, { '(', a, ')' }⟩

Closure may create items with an initial TN state and a new look-ahead.
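The closure can be sketched as a fixpoint computation. The code below rests on several assumptions that are not in the slides: the dictionary encoding of the sample TN, reading initials(L(s) · ρ) as "the terminals that can start a string accepted from state s, plus ρ if the empty string is accepted from s", and merging items that share a state by uniting their look-aheads. The end marker ⊣ is written `"⊣"`.

```python
# Sample TN of grammar G (state names like "0S" stand for 0_S; an assumption).
ARCS = {
    "0S": {"T": "1S"}, "1S": {"T": "1S"},
    "0T": {"(": "1T", "a": "3T"}, "1T": {"S": "2T"}, "2T": {")": "3T"}, "3T": {},
}
FINAL = {"0S", "1S", "3T"}
INITIAL = {"S": "0S", "T": "0T"}


def analyse():
    """Fixpoint: states accepting the empty string, and initial terminals per state."""
    nullable = set(FINAL)
    first = {p: set() for p in ARCS}
    changed = True
    while changed:
        changed = False
        for p, out in ARCS.items():
            for x, q in out.items():
                if x in INITIAL:  # call site: look into the called machine
                    callee = INITIAL[x]
                    new = first[callee] | (first[q] if callee in nullable else set())
                    if callee in nullable and q in nullable and p not in nullable:
                        nullable.add(p)
                        changed = True
                else:             # terminal arc
                    new = {x}
                if not new <= first[p]:
                    first[p] |= new
                    changed = True
    return nullable, first


NULLABLE, FIRST = analyse()


def closure(items):
    """Closure of a set of items, given as (state, frozenset of terminals) pairs."""
    result = {}
    for state, lookahead in items:
        result.setdefault(state, set()).update(lookahead)
    changed = True
    while changed:
        changed = False
        for r, rho in list(result.items()):
            for x, s in ARCS[r].items():
                if x not in INITIAL:
                    continue
                # pi = initials(L(s) . rho)
                pi = FIRST[s] | (rho if s in NULLABLE else set())
                known = result.setdefault(INITIAL[x], set())
                if not pi <= known:
                    known |= pi
                    changed = True
    return frozenset((p, frozenset(la)) for p, la in result.items())


if __name__ == "__main__":
    for state, lookahead in sorted(closure({("0S", frozenset({")"}))})):
        print(state, sorted(lookahead))
```

Starting from ⟨0_S, { ')' }⟩, the arc 0_S -T-> 1_S adds the item for 0_T with look-ahead initials(T* · ')') = { '(', a, ')' }, in agreement with the items shown on the slide.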

Parser Control: Graph Construction
Macro-state (m-state) and parser pilot

A macro-state (m-state) is a non-empty set of items, which represent possible analysis points reached by the parser.

The pilot of a TN (grammar) is a finite directed graph, where:
• the nodes are the m-states reachable by the parser
• the arcs connect m-states through grammar symbols

Extend the item shift function shift to the macro-states. For an m-state I = { ⟨p, π⟩, ⟨r, ρ⟩, ... }:

shift(I, X) = { shift(⟨p, π⟩, X), shift(⟨r, ρ⟩, X), ... } = { ⟨q, π⟩, ⟨s, ρ⟩, ... }

In graphic form, an m-state is drawn as a box listing one item per row (state on the left, look-ahead on the right).

For BNF grammars, items are often denoted as marked rules:

B → β • γ, π   ⇔   ⟨p_B, π⟩

where the string β is the path from state 0_B to state p_B in the machine B.

Parser Control: Graph Construction
Algorithm for building the pilot graph

pilot DFA             P = ( Σ ∪ V, R, ϑ, I_0 )
m-state set           R = { I_0, I_1, ... }
transition function   ϑ : R × ( Σ ∪ V ) → R

Pilot graph algorithm (computes R and ϑ of P):

    R := { closure( { ⟨0_S, { ⊣ }⟩ } ) }        -- initial m-state I_0
    repeat
        for each m-state I ∈ R and symbol X ∈ Σ ∪ V do
            I′ := closure( shift( I, X ) )
            add m-state I′ to the m-state set R
            add arc I -X-> I′ to the transition function ϑ
        end for
    until R does not change any more

Graphic form of an m-state I′ reached through an arc I -X-> I′:
• m-state base: the items obtained through shift
• m-state closure: the new items (if any) added to the m-state through closure
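The pilot construction can be sketched end to end for the sample TN. Several choices below are assumptions of this sketch rather than the authors' implementation: the dictionary encoding of the TN, a worklist in place of the repeat/until loop (it reaches the same fixpoint), skipping symbols for which the shifted set is empty (m-states are non-empty by definition), and merging items that share a state by uniting their look-aheads.

```python
from collections import deque

# Sample TN of grammar G (state names like "0S" stand for 0_S; an assumption).
ARCS = {
    "0S": {"T": "1S"}, "1S": {"T": "1S"},
    "0T": {"(": "1T", "a": "3T"}, "1T": {"S": "2T"}, "2T": {")": "3T"}, "3T": {},
}
FINAL = {"0S", "1S", "3T"}
INITIAL = {"S": "0S", "T": "0T"}
END = "⊣"  # end-of-text marker


def analyse():
    """Fixpoint: states accepting the empty string, and initial terminals per state."""
    nullable, first = set(FINAL), {p: set() for p in ARCS}
    changed = True
    while changed:
        changed = False
        for p, out in ARCS.items():
            for x, q in out.items():
                if x in INITIAL:
                    callee = INITIAL[x]
                    new = first[callee] | (first[q] if callee in nullable else set())
                    if callee in nullable and q in nullable and p not in nullable:
                        nullable.add(p)
                        changed = True
                else:
                    new = {x}
                if not new <= first[p]:
                    first[p] |= new
                    changed = True
    return nullable, first


NULLABLE, FIRST = analyse()


def closure(items):
    """Add <0_B, initials(L(s).rho)> for every item <r, rho> and arc r -B-> s."""
    result = {}
    for state, lookahead in items:
        result.setdefault(state, set()).update(lookahead)
    changed = True
    while changed:
        changed = False
        for r, rho in list(result.items()):
            for x, s in ARCS[r].items():
                if x in INITIAL:
                    pi = FIRST[s] | (rho if s in NULLABLE else set())
                    known = result.setdefault(INITIAL[x], set())
                    if not pi <= known:
                        known |= pi
                        changed = True
    return frozenset((p, frozenset(la)) for p, la in result.items())


def shift(mstate, symbol):
    """Shift extended to m-states: shift every item that has an arc for `symbol`."""
    return {(ARCS[p][symbol], la) for p, la in mstate if symbol in ARCS[p]}


def build_pilot():
    """Return the list of m-states R and the transition function theta."""
    initial = closure({(INITIAL["S"], frozenset({END}))})
    states, theta, queue = [initial], {}, deque([initial])
    symbols = sorted({x for out in ARCS.values() for x in out})
    while queue:
        current = queue.popleft()
        for x in symbols:
            base = shift(current, x)
            if not base:          # no arc for x: no m-state is created
                continue
            target = closure(base)
            if target not in states:
                states.append(target)
                queue.append(target)
            theta[(states.index(current), x)] = states.index(target)
    return states, theta


if __name__ == "__main__":
    states, theta = build_pilot()
    for index, mstate in enumerate(states):
        print(f"I{index}:", sorted((p, sorted(la)) for p, la in mstate))
    for (source, x), target in sorted(theta.items()):
        print(f"I{source} -{x}-> I{target}")
```

The m-state numbering printed by this sketch depends on the traversal order chosen here and need not match any numbering used elsewhere in the presentation.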
