Parsing Expression Grammars: A Recognition-Based Syntactic Foundation



  1. Parsing Expression Grammars: A Recognition-Based Syntactic Foundation
     Bryan Ford, Massachusetts Institute of Technology
     January 14, 2004

  2. Designing a Language Syntax

  3. Designing a Language Syntax
     Textbook Method:
       1. Formalize syntax via context-free grammar
       2. Write a YACC parser specification
       3. Hack on grammar until “near-LALR(1)”
       4. Use generated parser

  4. Designing a Language Syntax
     Textbook Method:
       1. Formalize syntax via context-free grammar
       2. Write a YACC parser specification
       3. Hack on grammar until “near-LALR(1)”
       4. Use generated parser
     Pragmatic Method:
       1. Specify syntax informally
       2. Write a recursive descent parser

  5. What exactly does a CFG describe?
     Short answer: a rule system to generate language strings
     Example CFG:
       S → aa S
       S → ε
     Derived from S: ε, aa, aaaa, ...

  6. What exactly does a CFG describe?
     Same slide as 5, with S labeled: Start symbol

  7. What exactly does a CFG describe?
     Same slide as 6, with the derived strings (ε, aa, aaaa, ...) labeled: Output strings
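
To make the "generate" reading concrete, here is a minimal Python sketch that enumerates the strings of the example CFG, assuming the rules are S → aa S and S → ε. The function name `generate` is illustrative and not from the talk.

```python
def generate(count):
    """Yield the first `count` strings derived from S -> aa S | ε, shortest first."""
    prefix = ""
    for _ in range(count):
        yield prefix      # finish the derivation with S -> ε
        prefix += "aa"    # or apply S -> aa S once more and keep going

print(list(generate(3)))  # ['', 'aa', 'aaaa']
```

The grammar is used as a producer here: nothing in the code ever looks at an input string.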

  8. What exactly do we want to describe?
     Proposed answer: a rule system to recognize language strings
     Parsing Expression Grammar (PEG) models recursive descent parsing practice
     Example PEG:
       S ← aa S / ε
     Applied to: a a a a
     Result: S[ a a S[ a a S[ ε ] ] ]

  9. What exactly do we want to describe?
     Same slide as 8, with “a a a a” labeled: Input string

  10. What exactly do we want to describe?
      Same slide as 9, with the nested S nodes labeled: Derive structure
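
The "recognize" reading maps directly onto a recursive descent function. A minimal Python sketch for the example PEG, assuming it reads S ← aa S / ε; the convention of returning the index after the consumed prefix is a choice made for this sketch.

```python
def parse_S(s, i=0):
    """S <- 'aa' S / ε. Return the index just past the matched prefix."""
    # First alternative: 'aa' followed by S.
    if s.startswith("aa", i):
        rest = parse_S(s, i + 2)
        if rest is not None:
            return rest
    # Second alternative: ε always succeeds and consumes nothing.
    return i

print(parse_S("aaaa"))  # 4
print(parse_S("aaa"))   # 2  (only the prefix 'aa' is consumed)
```

The recursion mirrors the nested S structure on the slide: each call handles one “aa” and hands the rest of the input to the next call.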

  11. Take-Home Points
      Key benefits of PEGs:
      ● Simplicity, formalism, analyzability of CFGs
      ● Closer match to syntax practices
        – More expressive than deterministic CFGs (LL/LR)
        – More of the “right kind” of expressiveness: prioritized choice, greedy rules, syntactic predicates
        – Unlimited lookahead, backtracking
      ● Linear-time parsing for any PEG

  12. What kind of recursive descent parsing?
      Key assumptions:
      ● Parsing functions are stateless: depend only on input string
      ● Parsing functions make decisions locally: return at most one result (success/failure)
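
Because a stateless parsing function's result depends only on the input and the position, it can be cached per position. The sketch below uses memoization (the idea behind packrat parsing, which these slides do not name) on a small grammar invented for illustration: A ← P '+' A / P '-' A / P and P ← '(' A ')' / 'x'. The names `make_parser` and `calls` are hypothetical.

```python
from functools import lru_cache

def make_parser(s):
    calls = {"P": 0}  # counts how often the body of P actually runs

    @lru_cache(maxsize=None)
    def A(i):
        # A <- P '+' A / P '-' A / P
        for op in "+-":
            j = P(i)  # re-tried for every alternative, but cached
            if j is not None and s.startswith(op, j):
                k = A(j + 1)
                if k is not None:
                    return k
        return P(i)

    @lru_cache(maxsize=None)
    def P(i):
        # P <- '(' A ')' / 'x'
        calls["P"] += 1
        if s.startswith("(", i):
            j = A(i + 1)
            if j is not None and s.startswith(")", j):
                return j + 1
        return i + 1 if s.startswith("x", i) else None

    return A, calls

A, calls = make_parser("((((x))))")
print(A(0))        # 9
print(calls["P"])  # 5: one evaluation per position where P is tried
```

Without the cache, each nesting level would re-parse its contents once per alternative of A, so the work would grow exponentially with nesting depth.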

  13. Parsing Expression Grammars
      Consists of: (Σ, N, R, e_S)
        – Σ: finite set of terminals (character set)
        – N: finite set of nonterminals
        – R: finite set of rules of the form “A ← e”, where A ∈ N and e is a parsing expression.
        – e_S: a parsing expression called the start expression.

  14. Parsing Expressions
        ε               the empty string
        a               terminal (a ∈ Σ)
        A               nonterminal (A ∈ N)
        e1 e2           a sequence of parsing expressions
        e1 / e2         prioritized choice between alternatives
        e?, e*, e+      optional, zero-or-more, one-or-more
        &e, !e          syntactic predicates
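
The expression forms on slide 14 can be sketched as small Python combinators. This is one possible encoding, not the talk's: every expression is a function `(s, i, g)` that returns the index after the match or `None` on failure, with `g` a dict mapping nonterminal names to expressions. All names below are illustrative.

```python
def empty():                       # ε
    return lambda s, i, g: i

def lit(t):                        # terminal (or terminal string)
    return lambda s, i, g: i + len(t) if s.startswith(t, i) else None

def ref(name):                     # nonterminal, looked up in the rule set g
    return lambda s, i, g: g[name](s, i, g)

def seq(*es):                      # e1 e2
    def parse(s, i, g):
        for e in es:
            i = e(s, i, g)
            if i is None:
                return None
        return i
    return parse

def choice(*es):                   # e1 / e2: first success wins
    def parse(s, i, g):
        for e in es:
            j = e(s, i, g)
            if j is not None:
                return j
        return None
    return parse

def star(e):                       # e*
    def parse(s, i, g):
        while True:
            j = e(s, i, g)
            if j is None or j == i:   # also stop on an empty match to avoid looping
                return i
            i = j
    return parse

def opt(e):                        # e?
    return choice(e, empty())

def plus(e):                       # e+
    return seq(e, star(e))

def and_(e):                       # &e: succeed like e, consume nothing
    return lambda s, i, g: i if e(s, i, g) is not None else None

def not_(e):                       # !e: succeed when e fails, consume nothing
    return lambda s, i, g: i if e(s, i, g) is None else None

# Usage: the rule S <- aa S / ε with start expression S.
grammar = {"S": choice(seq(lit("aa"), ref("S")), empty())}
print(ref("S")("aaaa", 0, grammar))  # 4
```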

  15. How PEGs Express Languages
      Given input string s, a parsing expression either:
        – Matches and consumes a prefix s' of s.
        – Fails on s.
      Example:
        S ← bad
        S matches “badder”
        S matches “baddest”
        S fails on “abad”
        S fails on “babe”

  16. Prioritized Choice with Backtracking
      S ← A / B means: “To parse an S, first try to parse an A. If A fails, then backtrack and try to parse a B.”
      Example:
        S ← if C then S else S / if C then S
        S matches “if C then S foo”
        S matches “if C then S1 else S2”
        S fails on “if C else S”

  17. Prioritized Choice with Backtracking
      S ← A / B means: “To parse an S, first try to parse an A. If A fails, then backtrack and try to parse a B.”
      Example from the C++ standard: “An expression-statement ... can be indistinguishable from a declaration ... In those cases the statement is a declaration.”
        statement ← declaration / expression-statement
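
A minimal Python sketch of ordered choice with backtracking on the if/then/else example. To make it runnable, it works on a list of tokens and assumes two stand-ins that are not on the slide: the condition C is the literal token `c`, and a third alternative `s` serves as the base statement.

```python
ALTERNATIVES = [
    ["if", "c", "then", "S", "else", "S"],   # tried first
    ["if", "c", "then", "S"],                # tried only if the first fails
    ["s"],                                   # assumed base statement
]

def parse_S(toks, i=0):
    """Try each alternative in order; return (tree, next_index) or None."""
    for alt in ALTERNATIVES:
        j, children = i, []
        for sym in alt:
            if sym == "S":                          # nonterminal: recurse
                result = parse_S(toks, j)
                if result is None:
                    break
                subtree, j = result
                children.append(subtree)
            elif j < len(toks) and toks[j] == sym:  # terminal: compare one token
                j += 1
            else:
                break
        else:
            return (alt[0], *children), j           # first success wins
        # This alternative failed: backtrack to i and try the next one.
    return None

tree, end = parse_S("if c then if c then s else s".split())
print(tree)  # ('if', ('if', ('s',), ('s',)))
print(parse_S("if c else s".split()))  # None
```

The `else` attaches to the inner `if` because the longer alternative is tried first at every level, which is how the ordering resolves the dangling-else case.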

  18. Greedy Option and Repetition
        A ← e?    equivalent to    A ← e / ε
        A ← e*    equivalent to    A ← e A / ε
        A ← e+    equivalent to    A ← e e*
      Example:
        I ← L+
        L ← a / b / c / ...
        I matches “foobar”
        I matches “foo(bar)”
        I fails on “123”
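
A minimal Python sketch of greedy repetition, assuming L means "any alphabetic character" (the slide elides the full list). The helper names are illustrative. The last line shows a known consequence of greediness in PEGs: a repetition never gives back input it has consumed.

```python
def letter(s, i):
    # L <- a / b / c / ...  (assumed: any alphabetic character)
    return i + 1 if i < len(s) and s[i].isalpha() else None

def lit(t):
    return lambda s, i: i + len(t) if s.startswith(t, i) else None

def star(e):
    # e* behaves like A <- e A / ε: keep matching e until it fails.
    def parse(s, i):
        while True:
            j = e(s, i)
            if j is None or j == i:   # stop on failure (or on an empty match)
                return i
            i = j
    return parse

def plus(e):
    # e+ is e followed by e*
    def parse(s, i):
        j = e(s, i)
        return None if j is None else star(e)(s, j)
    return parse

def seq(a, b):
    def parse(s, i):
        j = a(s, i)
        return None if j is None else b(s, j)
    return parse

identifier = plus(letter)           # I <- L+
print(identifier("foobar", 0))      # 6
print(identifier("foo(bar)", 0))    # 3
print(identifier("123", 0))         # None

# 'a'* consumes every 'a', so the trailing 'a' can never match.
print(seq(star(lit("a")), lit("a"))("aaa", 0))  # None
```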

  19. Syntactic Predicates
      And-predicate: &e succeeds whenever e does, but consumes no input [Parr '94, '95]
      Not-predicate: !e succeeds whenever e fails
      Example:
        A ← foo &(bar)     A matches “foobar”, A fails on “foobie”
        B ← foo !(bar)     B matches “foobie”, B fails on “foobar”
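
The two predicates amount to peeking at the input without advancing. A minimal Python sketch of the rules A ← foo &(bar) and B ← foo !(bar), written as hand-coded functions that return the index after the consumed text or `None`:

```python
def parse_A(s, i=0):
    # A <- 'foo' &('bar')
    if not s.startswith("foo", i):
        return None
    j = i + 3
    # And-predicate: 'bar' must follow, but the position does not move past it.
    return j if s.startswith("bar", j) else None

def parse_B(s, i=0):
    # B <- 'foo' !('bar')
    if not s.startswith("foo", i):
        return None
    j = i + 3
    # Not-predicate: succeed only if 'bar' does NOT follow; consume nothing.
    return j if not s.startswith("bar", j) else None

print(parse_A("foobar"))  # 3  (only 'foo' is consumed)
print(parse_A("foobie"))  # None
print(parse_B("foobie"))  # 3
print(parse_B("foobar"))  # None
```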

  20. Syntactic Predicates
      And-predicate: &e succeeds whenever e does, but consumes no input [Parr '94, '95]
      Not-predicate: !e succeeds whenever e fails
      Example:
        C ← B I* E
        I ← !E ( C / T )
        B ← (*
        E ← *)
        T ← [any terminal]
        C matches “(*ab*)cd”
        C matches “(*a(*b*)c*)”
        C fails on “(*a(*b*)”

  21. Syntactic Predicates
      Same slide as 20, with B in “C ← B I* E” labeled: Begin marker

  22. Syntactic Predicates
      Same slide as 20, with I* in “C ← B I* E” labeled: Internal elements

  23. Syntactic Predicates
      Same slide as 20, with E in “C ← B I* E” labeled: End marker

  24. Syntactic Predicates
      Same slide as 20, with an arrow (➔) pointing at the rule I ← !E ( C / T )

  25. Syntactic Predicates
      Same slide as 24, with !E annotated: Only if an end marker doesn't start here...

  26. Syntactic Predicates
      Same slide as 25, with ( C / T ) annotated: ...consume a nested comment, or else consume any single character.

  27. Syntactic Predicates
      Same slide as 20, without annotations
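
The nested-comment grammar translates almost line for line into recursive descent. A minimal Python sketch, assuming the rules read C ← B I* E, I ← !E (C / T), B ← (*, E ← *), with T matching any single character:

```python
def parse_C(s, i=0):
    """C <- B I* E, with B <- '(*' and E <- '*)'."""
    if not s.startswith("(*", i):            # B: begin marker
        return None
    i += 2
    while True:                              # I*: internal elements
        j = parse_I(s, i)
        if j is None:
            break
        i = j
    return i + 2 if s.startswith("*)", i) else None   # E: end marker

def parse_I(s, i):
    """I <- !E (C / T)"""
    if s.startswith("*)", i):                # !E: stop if an end marker starts here
        return None
    j = parse_C(s, i)                        # C: try a nested comment first
    if j is not None:
        return j
    return i + 1 if i < len(s) else None     # T: otherwise any single character

print(parse_C("(*ab*)cd"))      # 6
print(parse_C("(*a(*b*)c*)"))   # 11
print(parse_C("(*a(*b*)"))      # None
```

The third input fails because the inner comment closes but the outer one never does, matching the slide's example.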

  28. Unified Grammars
      PEGs can express both lexical and hierarchical syntax of realistic languages in one grammar
      ● Example (in paper): Complete self-describing PEG in 2/3 column
      ● Example (on web): Unified PEG for Java language

  29. Lexical/Hierarchical Interplay
      Unified grammars create new design opportunities
      Example:
        E ← S / ( E ) / ...
        S ← " C* "
        C ← \( E ) / !" !\ T
        T ← [any terminal]
      To get Unicode “∀”, instead of “\u2200”, write “\(0x2200)” or “\(8704)” or “\(FOR_ALL)”

  30. Lexical/Hierarchical Interplay
      Same slide as 29, with the rule for E labeled: General-purpose expression syntax

  31. Lexical/Hierarchical Interplay
      Same slide as 29, with the rule for S labeled: String literals

  32. Lexical/Hierarchical Interplay
      Same slide as 29, with the rule for C labeled: Quotable characters

  33. Lexical/Hierarchical Interplay
      Same slide as 29, without annotations
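
A minimal Python sketch of how the string-literal rules call back into the expression rule. The slide elides the remaining alternatives of E with "...", so the sketch substitutes a hypothetical stand-in: a run of letters, digits, and underscores, enough to accept `0x2200`, `8704`, and `FOR_ALL`. That stand-in is an assumption, not part of the talk's grammar.

```python
def parse_E(s, i=0):
    # E <- S / '(' E ')' / ATOM   (ATOM is the hypothetical stand-in for '...')
    j = parse_S(s, i)
    if j is not None:
        return j
    if s.startswith("(", i):
        j = parse_E(s, i + 1)
        if j is not None and s.startswith(")", j):
            return j + 1
    j = i
    while j < len(s) and (s[j].isalnum() or s[j] == "_"):
        j += 1
    return j if j > i else None

def parse_S(s, i):
    # S <- '"' C* '"'   (string literal)
    if not s.startswith('"', i):
        return None
    i += 1
    while True:
        j = parse_C(s, i)
        if j is None:
            break
        i = j
    return i + 1 if s.startswith('"', i) else None

def parse_C(s, i):
    # C <- '\(' E ')' / !'"' !'\' T   (quotable character)
    if s.startswith("\\(", i):
        j = parse_E(s, i + 2)        # the lexical rule re-enters expression syntax
        if j is not None and s.startswith(")", j):
            return j + 1
    if i < len(s) and s[i] not in '"\\':
        return i + 1
    return None

print(parse_E(r'"\(0x2200)"'))    # 11
print(parse_E(r'"\(FOR_ALL)"'))   # 12
print(parse_E(r'"a \("b") c"'))   # 12  (a string literal nested inside an escape)
print(parse_E(r'"\(oops"'))       # None
```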

  34. Formal Properties of PEGs
      ● Express all deterministic languages - LR(k)
      ● Closed under union, intersection, complement
      ● Some non-context-free languages, e.g., a^n b^n c^n
      ● Undecidable whether L(G) = ∅
      ● Predicate operators can be eliminated
        – ...but the process is non-trivial!
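
The slide gives a^n b^n c^n as an example but not a grammar for it. A commonly cited PEG for this language (n ≥ 1) is S ← &(A 'c') 'a'+ B !. with A ← 'a' A? 'b' and B ← 'b' B? 'c'. A minimal Python sketch under that assumption:

```python
def parse_A(s, i):
    # A <- 'a' A? 'b'   matches a^n b^n for n >= 1
    if not s.startswith("a", i):
        return None
    j = parse_A(s, i + 1)
    if j is None:            # A? is optional: on failure, consume nothing
        j = i + 1
    return j + 1 if s.startswith("b", j) else None

def parse_B(s, i):
    # B <- 'b' B? 'c'   matches b^n c^n for n >= 1
    if not s.startswith("b", i):
        return None
    j = parse_B(s, i + 1)
    if j is None:
        j = i + 1
    return j + 1 if s.startswith("c", j) else None

def recognize(s):
    # S <- &(A 'c') 'a'+ B !.
    j = parse_A(s, 0)
    if j is None or not s.startswith("c", j):   # &(A 'c'): as many b's as a's
        return False
    i = 0
    while s.startswith("a", i):                 # 'a'+ (A succeeded, so at least one 'a')
        i += 1
    return parse_B(s, i) == len(s)              # B, then !. (end of input)

print(recognize("aabbcc"))  # True
print(recognize("aabbc"))   # False
print(recognize("aabcc"))   # False
```

The and-predicate checks the a/b balance without consuming anything, and B then checks the b/c balance on the same stretch of input. Intersecting two context-free conditions this way is what takes the language outside the context-free class.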
