CIS 352
Parsing, Part I
Jim Royer April 2, 2019
CIS 352 ❖ Parsing, Part I 1
Miss Teen South Carolina's Famous Answer
https://kellblog.com/2007/09/01/parsing-the-unparseable-miss-teen-south-carolinas-answer/
Natural Languages
  stream of phonemes --[lexical analysis]--> stream of words --[parsing]--> sentences

Artificial Languages
  stream of characters --[lexical analysis]--> stream of tokens --[parsing]--> abstract syntax

Tokens: variable names, numerals, operators, key-words, . . .

A character stream and the token stream it yields:
  int main(void) { printf("hello, world\n"); return 0; }
  int main ( void ) { printf ( "hello, world\n" ) ; return 0 ; }
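The lexical-analysis step for the hello-world line can be sketched with regular expressions. This is a minimal illustration only: the token classes and patterns below (STRING, NUMBER, NAME, PUNCT) are assumptions chosen to cover this one line, not a real C lexer.

```python
import re

# Hypothetical token classes for a tiny C-like fragment (an assumption,
# just enough to cover the hello-world line from the slide).
TOKEN_SPEC = [
    ("STRING", r'"(?:\\.|[^"\\])*"'),  # string literal, allowing escapes
    ("NUMBER", r"\d+"),                 # numerals
    ("NAME",   r"[A-Za-z_]\w*"),        # key-words and variable names
    ("PUNCT",  r"[(){};,]"),            # punctuation
    ("SKIP",   r"\s+"),                 # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    """Return the list of lexemes in text, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append(m.group())
        pos = m.end()
    return tokens

source = r'int main(void) { printf("hello, world\n"); return 0; }'
print(" ".join(tokenize(source)))
```

Joining the tokens with single spaces reproduces the second line of the slide's example.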
Grammars: rules for organizing
◮ word-streams into sentences
◮ token-streams into abstract syntax (parse trees)

Context Free Grammars (CFGs)
◮ Terminals: concrete syntax (e.g., printf ( . . . ))
◮ Nonterminals: syntactic categories (e.g., Noun-Phrase, key-word, . . . )

Example (Palindromes over { a, b, c })
  A ::= ε | a | b | c | aAa | bAb | cAc
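The palindrome grammar translates almost line for line into a recursive membership test. The function name in_A is hypothetical; the sketch checks one production per case.

```python
def in_A(w):
    """Decide whether the string w is in the category A of
    A ::= ε | a | b | c | aAa | bAb | cAc."""
    if any(ch not in "abc" for ch in w):
        return False                      # not over the alphabet {a, b, c}
    if len(w) <= 1:
        return True                       # A ::= ε | a | b | c
    # A ::= xAx for x in {a, b, c}: equal ends, and the middle is in A.
    return w[0] == w[-1] and in_A(w[1:-1])

for w in ["", "b", "abcba", "abca"]:
    print(repr(w), in_A(w))
```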
Phrases              P ::= C | E | B
Commands             C ::= skip | ℓ := E | C; C | if B then C else C | while B do C
Integer Expressions  E ::= n | !ℓ | E ⊛ E   (⊛ ∈ { +, −, ×, . . . })
Boolean Expressions  B ::= b | E ⊛ E   (⊛ ∈ { =, <, ≥, . . . })
Integers             n ∈ ℤ = { . . . , −3, −2, −1, 0, 1, 2, 3, . . . }
Booleans             b ∈ 𝔹 = { true, false }
Locations            ℓ ∈ 𝕃 = { x0, x1, x2, . . . }
!ℓ ≡ the integer currently stored in ℓ

Example program:
  // Computes the factorial of !x0
  x1 := 1; x2 := !x0;
  while (!x2>0) do
    x1 := (!x1*!x2); x2 := (!x2-1)
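The abstract syntax above can be held as plain data once parsing is done. The sketch below encodes phrases as tagged tuples and runs the factorial program. The tuple tags ("lit", "deref", "op", "assign", and so on) are hypothetical, the evaluation rules are the obvious ones and are an assumption (the slide defines only syntax), and the loop body is assumed to contain both assignments.

```python
import operator

# Operator table; an assumed, partial reading of the slide's operator sets.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "=": operator.eq, "<": operator.lt, ">": operator.gt, ">=": operator.ge}

def eval_exp(e, store):
    """Evaluate an integer or boolean expression against the store."""
    tag = e[0]
    if tag == "lit":                      # n or b
        return e[1]
    if tag == "deref":                    # !l: the integer stored in l
        return store[e[1]]
    if tag == "op":                       # E op E
        _, op, left, right = e
        return OPS[op](eval_exp(left, store), eval_exp(right, store))
    raise ValueError(f"unknown expression {e!r}")

def run(c, store):
    """Execute a command, updating the store in place."""
    tag = c[0]
    if tag == "skip":
        return
    if tag == "assign":                   # l := E
        store[c[1]] = eval_exp(c[2], store)
    elif tag == "seq":                    # C; C
        run(c[1], store)
        run(c[2], store)
    elif tag == "if":                     # if B then C else C
        run(c[2] if eval_exp(c[1], store) else c[3], store)
    elif tag == "while":                  # while B do C
        while eval_exp(c[1], store):
            run(c[2], store)
    else:
        raise ValueError(f"unknown command {c!r}")

def deref(loc):
    return ("deref", loc)

# Abstract syntax of the factorial program from the slide.
body = ("seq",
        ("assign", "x1", ("op", "*", deref("x1"), deref("x2"))),
        ("assign", "x2", ("op", "-", deref("x2"), ("lit", 1))))
factorial = ("seq", ("assign", "x1", ("lit", 1)),
             ("seq", ("assign", "x2", deref("x0")),
              ("while", ("op", ">", deref("x2"), ("lit", 0)), body)))

store = {"x0": 5}
run(factorial, store)
print(store["x1"])
```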
sentence ::= subject verb1 | subject verb2 object
subject  ::= article noun | pronoun
object   ::= that sentence
verb1    ::= swims | pauses | exists
verb2    ::= believes | hopes | imagines
article  ::= a | some | the
noun     ::= lizard | truth | man
pronoun  ::= he | she | it
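A grammar like this one can be stored as a table and used to recognize sentences. The sketch below is a small backtracking recognizer; the names GRAMMAR, ends, and is_sentence are hypothetical, and the production object ::= that sentence is an assumption taken from the derivation example later in these slides. The approach works here because the grammar has no left recursion.

```python
GRAMMAR = {
    "sentence": [["subject", "verb1"], ["subject", "verb2", "object"]],
    "subject":  [["article", "noun"], ["pronoun"]],
    "object":   [["that", "sentence"]],   # assumed from the later derivation
    "verb1":    [["swims"], ["pauses"], ["exists"]],
    "verb2":    [["believes"], ["hopes"], ["imagines"]],
    "article":  [["a"], ["some"], ["the"]],
    "noun":     [["lizard"], ["truth"], ["man"]],
    "pronoun":  [["he"], ["she"], ["it"]],
}

def ends(symbol, words, start):
    """Return every position where a phrase of category `symbol`
    that begins at words[start] can end."""
    if symbol not in GRAMMAR:             # terminal: must match one word
        ok = start < len(words) and words[start] == symbol
        return {start + 1} if ok else set()
    result = set()
    for production in GRAMMAR[symbol]:
        positions = {start}
        for part in production:           # thread positions through X1 ... Xn
            positions = {e for p in positions for e in ends(part, words, p)}
        result |= positions
    return result

def is_sentence(text):
    words = text.split()
    return len(words) in ends("sentence", words, 0)

print(is_sentence("the man believes that some lizard exists"))
print(is_sentence("lizard the swims"))
```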
◮ CFGs recursively specify a finite collection of sets of strings, called syntactic categories.
◮ Each syntactic category is named by a nonterminal symbol. E.g.: object, verb1, and noun.
◮ One of the nonterminals is chosen to be the start symbol; its syntactic category is the language given by the grammar. E.g.: sentence.
◮ A syntactic category (named by nonterminal N) is described by a set of productions of the form
    N ::= X1 . . . Xn
  where each Xi is a terminal or nonterminal (and n could be 0). E.g.:
    sentence ::= subject verb1
    sentence ::= subject verb2 object
Notation: X_e = the nonterminal for reg. exp. e

  For:             Add:
  e = a            X_e ::= a
  e = ε            X_e ::= ε
  e = (e1|e2)      X_e ::= X_{e1} | X_{e2}
  e = (e1 e2)      X_e ::= X_{e1} X_{e2}
  e = (e′)∗        X_e ::= X_{e′} X_e | ε

For e = (01|10)∗:
  X_{(01|10)∗} ::= X_{01|10} X_{(01|10)∗} | ε
  X_{01|10}    ::= X_{01} | X_{10}
  X_{01}       ::= X_0 X_1
  X_{10}       ::= X_1 X_0
  X_0          ::= 0
  X_1          ::= 1
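The table of rules above is itself a recursive procedure over the structure of e. Here is a minimal sketch of it, assuming regular expressions are given as tagged tuples ("sym", "eps", "alt", "cat", "star"); that encoding and the function names are hypothetical, and X[e] stands in for the subscripted nonterminal.

```python
def show(e):
    """Render a regular-expression tree as text."""
    tag = e[0]
    if tag == "sym":
        return e[1]
    if tag == "eps":
        return "ε"
    if tag == "alt":
        return f"{show(e[1])}|{show(e[2])}"
    if tag == "cat":
        # Parenthesize an alternation nested inside a concatenation.
        return "".join(f"({show(s)})" if s[0] == "alt" else show(s)
                       for s in e[1:])
    if tag == "star":
        return f"({show(e[1])})*"
    raise ValueError(f"unknown regular expression {e!r}")

def name(e):
    return f"X[{show(e)}]"

def productions(e, rules=None):
    """Collect the productions for e and all of its subexpressions."""
    if rules is None:
        rules = {}
    if name(e) in rules:
        return rules
    tag = e[0]
    if tag == "sym":
        rules[name(e)] = [e[1]]
    elif tag == "eps":
        rules[name(e)] = ["ε"]
    elif tag == "alt":
        rules[name(e)] = [name(e[1]), name(e[2])]
    elif tag == "cat":
        rules[name(e)] = [f"{name(e[1])} {name(e[2])}"]
    elif tag == "star":
        rules[name(e)] = [f"{name(e[1])} {name(e)}", "ε"]
    else:
        raise ValueError(f"unknown regular expression {e!r}")
    for sub in e[1:]:
        if isinstance(sub, tuple):        # recurse into subexpressions only
            productions(sub, rules)
    return rules

zero, one = ("sym", "0"), ("sym", "1")
e = ("star", ("alt", ("cat", zero, one), ("cat", one, zero)))
for lhs, alternatives in productions(e).items():
    print(lhs, "::=", " | ".join(alternatives))
```

Run on (01|10)*, this yields six nonterminals, matching the six productions listed on the slide.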
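Notation: N ⇓ w means w is in N's syntactic category.

  N1 ⇓ w1   · · ·   Nk ⇓ wk
  ─────────────────────────
           N ⇓ w

  where N ::= u0 N1 u1 . . . Nk uk is a production and w = u0 w1 u1 . . . wk uk.

A dodgy grammar
  exp ::= exp + exp | exp − exp | exp ∗ exp | exp / exp | num | (exp)

Example: a derivation of exp ⇓ 2 + 3 ∗ 4

                 num ⇓ 3     num ⇓ 4
                 ───────     ───────
    num ⇓ 2      exp ⇓ 3     exp ⇓ 4
    ───────      ─────────────────── (⋆)
    exp ⇓ 2          exp ⇓ 3 ∗ 4
    ──────────────────────────────── (†)
            exp ⇓ 2 + 3 ∗ 4

  (⋆) “3*4” = “3” ++ “*” ++ “4”
  (†) “2+3*4” = “2” ++ “+” ++ “3*4”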
exp ::= exp + exp | exp − exp | exp ∗ exp | exp / exp | num | (exp)

        Exp                          Exp
      /  |  \                      /  |  \
   Exp   +   Exp                Exp   *   Exp
    |      /  |  \            /  |  \      |
    2   Exp   *   Exp      Exp   +   Exp   4
         |         |        |         |
         3         4        2         3

Two parses of 2 + 3 ∗ 4

Definition (Ambiguity)
A CFG is ambiguous when some string in the language has two or more distinct parses. (Great for lawyers, not-so-great in computing.)

[From a newspaper discussion of a documentary on Merle Haggard.] “Among those interviewed were his two ex-wives, Kris Kristofferson and Robert Duvall.”
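Ambiguity of the expression grammar can be checked mechanically by counting parse trees. The sketch below does this by dynamic programming over token spans; count_parses is a hypothetical name, and tokens are assumed to be already split into numerals, ASCII operators, and parentheses.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}

def count_parses(tokens):
    """Count the parse trees of tokens under
    exp ::= exp op exp | num | ( exp ), with op in {+, -, *, /}."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways tokens[i:j] derives from exp.
        if j <= i:
            return 0                              # exp never derives ε
        if j - i == 1:
            return 1 if tokens[i].isdigit() else 0    # exp ::= num
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)          # exp ::= ( exp )
        for k in range(i + 1, j - 1):             # exp ::= exp op exp
            if tokens[k] in OPERATORS:
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parses(["2", "+", "3", "*", "4"]))
print(count_parses(["(", "2", "+", "3", ")", "*", "4"]))
```

A count above 1 for any string witnesses that the grammar is ambiguous; parentheses remove the ambiguity for that particular string but not for the grammar.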
Definition
Suppose ⊕ is an operator (e.g., +, ∗, <).
(a) ⊕ is left-associative when a ⊕ b ⊕ c = (a ⊕ b) ⊕ c. (E.g., −, /)
(b) ⊕ is right-associative when a ⊕ b ⊕ c = a ⊕ (b ⊕ c). (E.g., : in Haskell, = in C)
(c) ⊕ is non-associative when a ⊕ b ⊕ c is illegal. (E.g., <)
◮ + and ∗ can be either left- or right-associative.
◮ To be consistent with − and /, we treat them as left-assoc.

  For               rewrite                to
  left-assoc. ⊕     E ::= E ⊕ E | num      E ::= E ⊕ E′ | E′     E′ ::= num
  right-assoc. ⊕    E ::= E ⊕ E | num      E ::= E′ ⊕ E | E′     E′ ::= num

[What is the parse of 1 ⊕ 2 ⊕ 3 under these two grammars?]
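One way to explore the question above is to build the trees directly. The sketch below assumes the input is an alternating list of operands and operators, and uses nested tuples as trees; parse_left and parse_right are hypothetical names that mirror the two rewritten grammars.

```python
def parse_left(tokens):
    """E ::= E op E' | E' : each new operator wraps everything to its left."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        tree = (tree, tokens[i], tokens[i + 1])
    return tree

def parse_right(tokens):
    """E ::= E' op E | E' : the first operand pairs with the parse of the rest."""
    if len(tokens) == 1:
        return tokens[0]
    return (tokens[0], tokens[1], parse_right(tokens[2:]))

tokens = [1, "-", 2, "-", 3]
print(parse_left(tokens))    # groups as (1 - 2) - 3
print(parse_right(tokens))   # groups as 1 - (2 - 3)
```

With subtraction the two groupings have different values (−4 versus 2), which is why the choice of associativity matters.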
Definition
Operators have an ordering called precedence. In an expression a ⊕ b ⊙ c:
◮ if precedence(⊕) > precedence(⊙), then: a ⊕ b ⊙ c = (a ⊕ b) ⊙ c.
◮ if precedence(⊕) < precedence(⊙), then: a ⊕ b ⊙ c = a ⊕ (b ⊙ c).
◮ if precedence(⊕) = precedence(⊙), then:
  ➱ if ⊕ and ⊙ are both left-assoc., then: a ⊕ b ⊙ c = (a ⊕ b) ⊙ c.
  ➱ if ⊕ and ⊙ are both right-assoc., then: a ⊕ b ⊙ c = a ⊕ (b ⊙ c).
  ➱ Otherwise, no standard answer.
exp ::= exp + exp | exp − exp     (level 1 precedence)
      | exp ∗ exp | exp / exp     (level 2 precedence)
      | num | (exp)               (level 3 precedence)

◮ Handle left- and right-associativity as before.
◮ Each level gets its own nonterminal.
◮ Go from lowest to highest precedence levels.

exp1 ::= exp1 + exp2 | exp1 − exp2 | exp2
exp2 ::= exp2 ∗ exp3 | exp2 / exp3 | exp3
exp3 ::= num | (exp1)

[More problems and repairs in the next homework.]
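The layered grammar maps naturally onto one function per precedence level. The sketch below is a small recursive-descent parser along those lines. Note one substitution: exp1 and exp2 are left-recursive, which would make a direct recursive translation loop forever, so each left-recursive production is handled with a while loop that builds the same left-leaning trees. That loop is a common workaround and an assumption of this sketch, not something the slides prescribe; all function names are hypothetical.

```python
import re

def tokenize(text):
    # Numerals, operators, and parentheses; other characters are ignored.
    return re.findall(r"\d+|[-+*/()]", text)

def parse(text):
    """Parse text into nested (left, op, right) tuples."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def exp1():                      # exp1 ::= exp1 (+|-) exp2 | exp2
        tree = exp2()
        while peek() in ("+", "-"):
            op = advance()
            tree = (tree, op, exp2())
        return tree

    def exp2():                      # exp2 ::= exp2 (*|/) exp3 | exp3
        tree = exp3()
        while peek() in ("*", "/"):
            op = advance()
            tree = (tree, op, exp3())
        return tree

    def exp3():                      # exp3 ::= num | ( exp1 )
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            advance()
            tree = exp1()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if tok.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = exp1()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("2+3*4"))
print(parse("2-3-4"))
print(parse("(2+3)*4"))
```

Each input now has exactly one tree: multiplication binds tighter than addition, and repeated subtraction groups to the left.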
Warning: Greek letters!

Notation:
(a) αNβ ⇒ αγβ means αNβ rewrites to αγβ by applying the production N ::= γ.
(b) ⇒∗ = the reflexive-transitive closure of ⇒.

    N ::= γ is in G
    ─────────────────
    G ⊢ αNβ ⇒ αγβ

Example derivation:
  sentence ⇒ subject verb2 object
           ⇒ article noun verb2 object
           ⇒ the noun verb2 object
           ⇒ the man verb2 object
           ⇒ the man believes object
           ⇒ the man believes that sentence
           ⇒ the man believes that subject verb1
           ⇒ the man believes that article noun verb1
           ⇒ the man believes that some noun verb1
           ⇒ the man believes that some lizard verb1
           ⇒ the man believes that some lizard exists
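The rewriting relation ⇒ can be replayed step by step. The sketch below applies a list of productions, each to the leftmost nonterminal, and checks that every production used is in G, as the rule requires. The names GRAMMAR, step, and derive are hypothetical, and the production for object is assumed from the derivation above.

```python
GRAMMAR = {
    "sentence": [["subject", "verb1"], ["subject", "verb2", "object"]],
    "subject":  [["article", "noun"], ["pronoun"]],
    "object":   [["that", "sentence"]],   # assumed from the derivation
    "verb1":    [["swims"], ["pauses"], ["exists"]],
    "verb2":    [["believes"], ["hopes"], ["imagines"]],
    "article":  [["a"], ["some"], ["the"]],
    "noun":     [["lizard"], ["truth"], ["man"]],
    "pronoun":  [["he"], ["she"], ["it"]],
}

def step(form, lhs, rhs):
    """Rewrite the leftmost nonterminal of form, which must be lhs, to rhs."""
    if rhs not in GRAMMAR.get(lhs, []):
        raise ValueError(f"{lhs} ::= {' '.join(rhs)} is not in G")
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:             # first nonterminal from the left
            if symbol != lhs:
                raise ValueError(f"leftmost nonterminal is {symbol}, not {lhs}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")

def derive(start, steps):
    """Return the list of sentential forms produced by the given steps."""
    forms = [[start]]
    for lhs, rhs in steps:
        forms.append(step(forms[-1], lhs, rhs))
    return forms

STEPS = [
    ("sentence", ["subject", "verb2", "object"]),
    ("subject", ["article", "noun"]),
    ("article", ["the"]),
    ("noun", ["man"]),
    ("verb2", ["believes"]),
    ("object", ["that", "sentence"]),
    ("sentence", ["subject", "verb1"]),
    ("subject", ["article", "noun"]),
    ("article", ["some"]),
    ("noun", ["lizard"]),
    ("verb1", ["exists"]),
]

forms = derive("sentence", STEPS)
for form in forms:
    print("⇒", " ".join(form))
```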
See Graham Hutton's slides for Chapter 8 of his “Programming in Haskell” text:
  http://www.cs.nott.ac.uk/~gmh/chapter8.ppt
Also:
◮ Hutton's “Programming in Haskell, 2/e” homepage:
  http://www.cs.nott.ac.uk/~gmh/book.html
◮ Hutton's Example Parsing Library (from the 1st edition; not GHC 8.0.1 compliant):
  http://www.cs.nott.ac.uk/~gmh/Parsing.lhs
◮ Erik Meijer's video lecture based on Hutton's Chapter 8:
  http://channel9.msdn.com/Series/C9-Lectures-Erik-Meijer-Functional-Programming-Fundamentals/C9-Lectures-Dr-Erik-Meijer-Functional-Programming-Fundamentals-Chapter-
  (Skip to time 6:05 for the beginning of the discussion of parsers.) . . .