Parsing, Part I (Jim Royer, CIS 352, April 2, 2019)

SLIDE 1

CIS 352

Parsing, Part I

Jim Royer April 2, 2019

CIS 352 ❖ Parsing, Part I 1

SLIDE 2

Miss Teen South Carolina’s Famous Answer

https://kellblog.com/2007/09/01/parsing-the-unparseable-miss-teen-south-carolinas-answer/
SLIDE 3

The Syntactic Side of Languages (Again)

Natural Languages

stream of phonemes --(via lexical analysis)--> stream of words --(via parsing)--> sentences

Artificial Languages

stream of characters --(via lexical analysis)--> stream of tokens --(via parsing)--> abstract syntax

Tokens: variable names, numerals, operators, key-words, . . .

int main(void) { printf("hello, world\n"); return 0; }

int main ( void ) { printf ( "hello, world\n" ) ; return 0 ; }
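The lexical-analysis step above can be sketched as a small hand-written Haskell lexer. This is a minimal sketch under simplifying assumptions, not a full C lexer: key-words are not told apart from variable names, every operator is a single character, and string literals keep their escapes unprocessed. The names `Token`, `tokenize`, and `stringBody` are hypothetical.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

-- Hypothetical token type for a tiny C-like language.
data Token
  = Ident String     -- variable names and key-words (not told apart here)
  | Numeral Integer  -- unsigned decimal numerals
  | StrLit String    -- text between the quotes, escapes left unprocessed
  | Punct Char       -- single-character operators and punctuation
  deriving (Show, Eq)

-- Turn a stream of characters into a stream of tokens, dropping whitespace.
tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c : cs)
  | isSpace c = tokenize cs
  | isAlpha c || c == '_' =
      let (w, rest) = span isIdentChar s in Ident w : tokenize rest
  | isDigit c =
      let (ds, rest) = span isDigit s in Numeral (read ds) : tokenize rest
  | c == '"' =
      let (body, rest) = stringBody cs in StrLit body : tokenize rest
  | otherwise = Punct c : tokenize cs
  where
    isIdentChar x = isAlphaNum x || x == '_'

-- Read up to the closing quote, stepping over backslash escapes such as \n.
-- Assumption: an unterminated literal simply ends at the end of the input.
stringBody :: String -> (String, String)
stringBody ('\\' : x : rest) = let (b, r) = stringBody rest in ('\\' : x : b, r)
stringBody ('"' : rest) = ("", rest)
stringBody (x : rest) = let (b, r) = stringBody rest in (x : b, r)
stringBody [] = ("", "")
```

On the hello-world program this gives 15 tokens, beginning `Ident "int"`, `Ident "main"`, `Punct '('`, matching the spaced-out token stream shown. Multi-character operators such as `<=` would need extra cases.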

SLIDE 4

Context Free Grammars, 1

Grammars: rules for organizing
◮ word-streams into sentences
◮ token-streams into abstract syntax (parse trees)

Context Free Grammars (CFGs)
◮ Terminals: concrete syntax (e.g., printf, (, . . . )
◮ Nonterminals: syntactic categories (e.g., Noun-Phrase, key-word, . . . )

Example (Palindromes over { a, b, c })

A ::= ε | a | b | c | aAa | bAb | cAc
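Because each production says how a string in A is built, this grammar reads off almost directly as a recursive recognizer, one clause per kind of production. A minimal sketch with hypothetical names `inA` and `palindromes`; the second function runs the productions forwards to list every string of a given length.

```haskell
-- A ::= ε | a | b | c | aAa | bAb | cAc, one clause per kind of production.
inA :: String -> Bool
inA "" = True                          -- A ::= ε
inA [x] = x `elem` "abc"               -- A ::= a | b | c
inA (x : rest) =                       -- A ::= xAx, for x ∈ {a, b, c}
  x `elem` "abc" && last rest == x && inA (init rest)

-- Running the productions forwards: every string of length n derivable from A.
palindromes :: Int -> [String]
palindromes n
  | n < 0 = []
  | n == 0 = [""]
  | n == 1 = ["a", "b", "c"]
  | otherwise = [ [x] ++ middle ++ [x] | x <- "abc", middle <- palindromes (n - 2) ]
```

For example, `inA "abcba"` is `True`, `inA "abca"` is `False`, and `palindromes 3` has 3 × 3 = 9 elements.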

SLIDE 5

CFG Examples: LC

Phrases P ::= C | E | B
Commands C ::= skip | ℓ := E | C; C | if B then C else C | while B do C
Integer Expressions E ::= n | !ℓ | E ⊛ E (⊛ ∈ { +, −, ×, . . . })
Boolean Expressions B ::= b | E ⊛ E (⊛ ∈ { =, <, ≥, . . . })
Integers n ∈ ℤ = { . . . , −3, −2, −1, 0, 1, 2, 3, . . . }
Booleans b ∈ 𝔹 = { true, false }
Locations ℓ ∈ 𝕃 = { x0, x1, x2, . . . }
!ℓ ≡ the integer currently stored in ℓ

x1 := 1; x2 := !x0;   // Computes factorial of !x0
while (!x2>0) do
    x1 := (!x1*!x2);
    x2 := (!x2-1)
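One way to make the grammar concrete is to give each nonterminal its own Haskell data type, so that the abstract syntax of a phrase is a value. The sketch below does this for LC (all type and constructor names are hypothetical), adds a small evaluator over a `Data.Map` store purely to check the encoding (assuming unset locations hold 0), and encodes the factorial program. The grammar alone does not say whether the loop body is both assignments or only the first; the sketch assumes both, which is what computing a factorial needs.

```haskell
import qualified Data.Map as Map

type Loc = String                                      -- locations x0, x1, x2, ...

data IOp = Plus | Minus | Times deriving (Show, Eq)        -- some choices of ⊛ in E
data BOp = EqOp | LtOp | GeOp | GtOp deriving (Show, Eq)   -- some choices of ⊛ in B

data Expr = IntLit Integer | Deref Loc | IBin IOp Expr Expr    -- E ::= n | !ℓ | E ⊛ E
  deriving Show

data BExpr = BoolLit Bool | Cmp BOp Expr Expr                  -- B ::= b | E ⊛ E
  deriving Show

data Cmd                                                       -- C
  = Skip
  | Assign Loc Expr          -- ℓ := E
  | Seq Cmd Cmd              -- C; C
  | If BExpr Cmd Cmd         -- if B then C else C
  | While BExpr Cmd          -- while B do C
  deriving Show

type Store = Map.Map Loc Integer

evalE :: Store -> Expr -> Integer
evalE _ (IntLit n) = n
evalE s (Deref l) = Map.findWithDefault 0 l s   -- assumption: unset locations hold 0
evalE s (IBin op a b) = f (evalE s a) (evalE s b)
  where
    f = case op of
      Plus -> (+)
      Minus -> (-)
      Times -> (*)

evalB :: Store -> BExpr -> Bool
evalB _ (BoolLit b) = b
evalB s (Cmp op a b) = f (evalE s a) (evalE s b)
  where
    f = case op of
      EqOp -> (==)
      LtOp -> (<)
      GeOp -> (>=)
      GtOp -> (>)

exec :: Store -> Cmd -> Store
exec s Skip = s
exec s (Assign l e) = Map.insert l (evalE s e) s
exec s (Seq c1 c2) = exec (exec s c1) c2
exec s (If b c1 c2) = if evalB s b then exec s c1 else exec s c2
exec s loop@(While b body) = if evalB s b then exec (exec s body) loop else s

-- x1 := 1; x2 := !x0; while (!x2>0) do (x1 := (!x1*!x2); x2 := (!x2-1))
factorial :: Cmd
factorial =
  Seq (Assign "x1" (IntLit 1))
    (Seq (Assign "x2" (Deref "x0"))
      (While (Cmp GtOp (Deref "x2") (IntLit 0))
        (Seq (Assign "x1" (IBin Times (Deref "x1") (Deref "x2")))
             (Assign "x2" (IBin Minus (Deref "x2") (IntLit 1))))))
```

With `x0` set to 5, running `exec (Map.fromList [("x0", 5)]) factorial` leaves 120 in `x1`.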

SLIDE 6

CFG Examples: A Fragment of English

sentence ::= subject verb1 | subject verb2 object
subject ::= article noun | pronoun
object ::= that sentence
verb1 ::= swims | pauses | exists
verb2 ::= believes | hopes | imagines
article ::= a | some | the
noun ::= lizard | truth | man
pronoun ::= he | she | it
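A grammar this small can be turned into a recognizer almost word for word. In the sketch below (all names hypothetical), a recognizer for a syntactic category takes a list of words and returns every possible leftover list after reading one phrase of that category; `<+>` runs two recognizers in sequence and `<|>` collects the results of alternatives. A word list is a sentence when some parse leaves nothing over.

```haskell
-- A recognizer returns every possible remainder after reading one phrase.
type Recognizer = [String] -> [[String]]

infixr 6 <+>   -- sequencing:  X Y
infixr 5 <|>   -- alternation: X | Y

(<+>) :: Recognizer -> Recognizer -> Recognizer
(p <+> q) inp = [ r2 | r1 <- p inp, r2 <- q r1 ]

(<|>) :: Recognizer -> Recognizer -> Recognizer
(p <|> q) inp = p inp ++ q inp

-- A terminal: any one of the listed words.
word :: [String] -> Recognizer
word ws (w : rest) | w `elem` ws = [rest]
word _ _ = []

sentence, subject, object, verb1, verb2, article, noun, pronoun :: Recognizer
sentence = subject <+> verb1 <|> subject <+> verb2 <+> object
subject  = article <+> noun <|> pronoun
object   = word ["that"] <+> sentence
verb1    = word ["swims", "pauses", "exists"]
verb2    = word ["believes", "hopes", "imagines"]
article  = word ["a", "some", "the"]
noun     = word ["lizard", "truth", "man"]
pronoun  = word ["he", "she", "it"]

-- Accept when some parse of a sentence consumes every word.
isSentence :: String -> Bool
isSentence s = any null (sentence (words s))
```

So `isSentence "she hopes that it pauses"` holds, while `isSentence "the man believes"` does not, since `object` must start with "that".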

SLIDE 7

CFGs, 2

◮ CFGs recursively specify a finite collection of sets of strings, called syntactic categories.
◮ Each syntactic category is named by a nonterminal symbol. E.g.: object, verb1, and noun.
◮ One of the nonterminals is chosen to be the start symbol; its syntactic category is the language given by the grammar. E.g.: sentence.
◮ A syntactic category (named by nonterminal N) is described by a set of productions of the form:

N ::= X1 . . . Xn

where each Xi is a terminal or nonterminal (and n could be 0). E.g.:

sentence ::= subject verb1
sentence ::= subject verb2 object
object ::= that sentence

SLIDE 8

Example: Translating a regular expression to CFG

Notation: Xe = the nonterminal for reg. exp. e

For:             Add:
e = a            Xe ::= a
e = ε            Xe ::= ε
e = (e1|e2)      Xe ::= Xe1 | Xe2
e = (e1e2)       Xe ::= Xe1 Xe2
e = (e′)∗        Xe ::= Xe′ Xe | ε

For e = (01|10)∗:

X(01|10)∗ ::= X01|10 X(01|10)∗ | ε
X01|10 ::= X01 | X10
X01 ::= X0 X1
X10 ::= X1 X0
X0 ::= 0
X1 ::= 1
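The translation table is a structural recursion over the regular expression, so it can be sketched directly. Here a nonterminal is named by printing its expression after an X, mirroring the Xe notation; the `Regex`, `Sym`, `pp`, and `toCFG` names are hypothetical, and `nub` drops the duplicate productions that arise when a subexpression such as 0 or 1 occurs more than once.

```haskell
import Data.List (nub)

-- Hypothetical regular-expression syntax.
data Regex = Chr Char | Eps | Alt Regex Regex | Cat Regex Regex | Star Regex

-- Grammar symbols: terminals are characters, nonterminals are named Xe.
data Sym = T Char | N String deriving (Eq, Show)

type Production = (String, [Sym])   -- N ::= X1 ... Xn (an empty list means ε)

-- Print e to name its nonterminal; an alternation used as an operand of
-- concatenation is parenthesized so (a|b)c and a|bc get different names.
pp :: Regex -> String
pp (Chr c) = [c]
pp Eps = "ε"
pp (Alt a b) = pp a ++ "|" ++ pp b
pp (Cat a b) = operand a ++ operand b
  where
    operand e@(Alt _ _) = "(" ++ pp e ++ ")"
    operand e = pp e
pp (Star e) = "(" ++ pp e ++ ")*"

name :: Regex -> String
name e = "X" ++ pp e

-- One case per row of the translation table.
toCFG :: Regex -> [Production]
toCFG = nub . go
  where
    go e@(Chr c) = [(name e, [T c])]
    go e@Eps = [(name e, [])]
    go e@(Alt a b) = [(name e, [N (name a)]), (name e, [N (name b)])] ++ go a ++ go b
    go e@(Cat a b) = [(name e, [N (name a), N (name b)])] ++ go a ++ go b
    go e@(Star a) = [(name e, [N (name a), N (name e)]), (name e, [])] ++ go a
```

For (01|10)∗, written `Star (Alt (Cat (Chr '0') (Chr '1')) (Cat (Chr '1') (Chr '0')))`, this yields eight productions over the six nonterminals X(01|10)*, X01|10, X01, X10, X0, and X1.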

SLIDE 9

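A Big-Step Semantics for CFGs

Notation: N ⇓ w means w is in N’s syntactic category.

$$\frac{N_1 \Downarrow w_1 \quad \cdots \quad N_k \Downarrow w_k}{N \Downarrow w} \qquad \text{if } N ::= u_0 N_1 u_1 N_2 \ldots N_k u_k \text{ is a production and } w = u_0 w_1 u_1 \ldots w_k u_k$$

(Each ui is a possibly empty string of terminals.)

A dodgy grammar:

exp ::= exp + exp | exp − exp | exp ∗ exp | exp/exp | num | (exp)

$$\dfrac{\dfrac{\mathit{num} \Downarrow 2}{\mathit{exp} \Downarrow 2} \qquad \dfrac{\dfrac{\mathit{num} \Downarrow 3}{\mathit{exp} \Downarrow 3} \quad \dfrac{\mathit{num} \Downarrow 4}{\mathit{exp} \Downarrow 4}}{\mathit{exp} \Downarrow 3*4}\,(\star)}{\mathit{exp} \Downarrow 2+3*4}\,(\dagger)$$

(⋆) “3*4” = “3”++“*”++“4”    (†) “2+3*4” = “2”++“+”++“3*4”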
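The rule can be read as a recognizer: to show exp ⇓ w, pick a production, cut w into the pieces u0 w1 u1 . . . that the production dictates, and recursively check each Ni ⇓ wi. The sketch below does this for the dodgy grammar by brute force over every possible cut. Assumptions: the input has no whitespace, uses ASCII `-` and `*` for − and ∗, and num means a nonempty string of decimal digits; the function names are hypothetical. It terminates because every recursive call is on a strictly shorter string.

```haskell
import Data.Char (isDigit)

-- Every way to write w as w1 ++ [op] ++ w2.
splitsOn :: Char -> String -> [(String, String)]
splitsOn op w = [ (take i w, drop (i + 1) w) | (i, c) <- zip [0 ..] w, c == op ]

-- num ⇓ w  (assumption: num means a nonempty string of decimal digits)
numDerives :: String -> Bool
numDerives w = not (null w) && all isDigit w

-- exp ⇓ w  for  exp ::= exp + exp | exp − exp | exp ∗ exp | exp/exp | num | (exp)
expDerives :: String -> Bool
expDerives w = binary || numDerives w || parenthesized
  where
    -- exp ::= exp op exp: some cut w = w1 op w2 with exp ⇓ w1 and exp ⇓ w2
    binary = or [ expDerives w1 && expDerives w2 | op <- "+-*/", (w1, w2) <- splitsOn op w ]
    -- exp ::= (exp): w = "(" ++ w' ++ ")" with exp ⇓ w'
    parenthesized = case w of
      '(' : rest | not (null rest) && last rest == ')' -> expDerives (init rest)
      _ -> False
```

`expDerives "2+3*4"` succeeds (in fact along two different cuts, which previews the ambiguity problem), while `expDerives "2++3"` fails.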

SLIDE 10

Parse Trees

exp ::= exp + exp | exp − exp | exp ∗ exp | exp/exp | num | (exp)

Parse 1: [Exp [Exp 2] + [Exp [Exp 3] ∗ [Exp 4]]]
Parse 2: [Exp [Exp [Exp 2] + [Exp 3]] ∗ [Exp 4]]

Two parses of 2 + 3 ∗ 4

Definition (Ambiguity)

A CFG is ambiguous when some string in the language has two (or more) different parse trees. (Great for lawyers, not-so-great in computing.)

[From a newspaper discussion of a documentary on Merle Haggard.] “Among those interviewed were his two ex-wives, Kris Kristofferson and Robert Duvall.”
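Ambiguity can be made concrete by writing the two parse trees of 2 + 3 ∗ 4 as Haskell values. In this sketch the `Exp` type covers only + and ∗, and all names are hypothetical. Both trees flatten back to the same string, but they denote different values.

```haskell
-- Hypothetical parse-tree type for the + and ∗ fragment of the dodgy grammar.
data Exp = Lit Integer | Add Exp Exp | Mul Exp Exp deriving Show

-- The string a tree derives, read off its leaves (no parentheses are added).
yield :: Exp -> String
yield (Lit n) = show n
yield (Add a b) = yield a ++ "+" ++ yield b
yield (Mul a b) = yield a ++ "*" ++ yield b

-- The value each tree denotes.
eval :: Exp -> Integer
eval (Lit n) = n
eval (Add a b) = eval a + eval b
eval (Mul a b) = eval a * eval b

-- The two parses of 2 + 3 ∗ 4 from the slide.
parse1, parse2 :: Exp
parse1 = Add (Lit 2) (Mul (Lit 3) (Lit 4))   -- 2 + (3 ∗ 4)
parse2 = Mul (Add (Lit 2) (Lit 3)) (Lit 4)   -- (2 + 3) ∗ 4
```

Here `yield` gives "2+3*4" for both trees, yet `eval parse1` is 14 while `eval parse2` is 20.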

SLIDE 11

Grammar Repair, 1 (§3.4 in Mogensen)

Definition

Suppose ⊕ is an operator (e.g., +, ∗, <).
(a) ⊕ is left-associative when a ⊕ b ⊕ c = (a ⊕ b) ⊕ c. (E.g., −, /)
(b) ⊕ is right-associative when a ⊕ b ⊕ c = a ⊕ (b ⊕ c). (E.g., :, = in C)
(c) ⊕ is non-associative when a ⊕ b ⊕ c is illegal. (E.g., <)
◮ + and ∗ can be either left- or right-associative.
◮ To be consistent with − and /, we treat them as left-assoc.

For              rewrite                 to
left-assoc. ⊕    E ::= E ⊕ E | num       E ::= E ⊕ E′ | E′
                                         E′ ::= num
right-assoc. ⊕   E ::= E ⊕ E | num       E ::= E′ ⊕ E | E′
                                         E′ ::= num

[What is the parse of 1 ⊕ 2 ⊕ 3 under these two grammars?]
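A recursive-descent sketch makes the two readings of 1 ⊕ 2 ⊕ 3 visible. One caveat: a recursive-descent parser cannot use the left-recursive production E ::= E ⊕ E′ literally, since it would call itself forever without consuming input, so the left-assoc. version below swaps in the usual substitute: parse one E′, then loop over "⊕ E′" while nesting the tree to the left. The token and tree types are hypothetical.

```haskell
-- Hypothetical token and tree types; TOp stands for ⊕.
data Tok = TNum Integer | TOp deriving (Show, Eq)
data T = Leaf Integer | Node T T deriving (Show, Eq)   -- Node l r is l ⊕ r

type Parser = [Tok] -> Maybe (T, [Tok])

-- E′ ::= num
primary :: Parser
primary (TNum n : rest) = Just (Leaf n, rest)
primary _ = Nothing

-- Left-assoc.  E ::= E ⊕ E′ | E′, with the left recursion replaced by a loop:
-- read one E′, then keep absorbing "⊕ E′", nesting the tree to the left.
leftE :: Parser
leftE ts = primary ts >>= uncurry loop
  where
    loop acc (TOp : rest) = do
      (rhs, rest') <- primary rest
      loop (Node acc rhs) rest'
    loop acc rest = Just (acc, rest)

-- Right-assoc.  E ::= E′ ⊕ E | E′: ordinary recursion on the right operand.
rightE :: Parser
rightE ts = do
  (lhs, rest) <- primary ts
  case rest of
    TOp : rest' -> do
      (rhs, rest'') <- rightE rest'
      Just (Node lhs rhs, rest'')
    _ -> Just (lhs, rest)
```

On the tokens for 1 ⊕ 2 ⊕ 3, `leftE` builds `Node (Node (Leaf 1) (Leaf 2)) (Leaf 3)`, i.e. (1 ⊕ 2) ⊕ 3, and `rightE` builds `Node (Leaf 1) (Node (Leaf 2) (Leaf 3))`, i.e. 1 ⊕ (2 ⊕ 3).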

SLIDE 12

Grammar Repair, 2 (§3.4 in Mogensen)

Definition

Operators have an ordering called precedence. In an expression a ⊕ b ⊙ c:
◮ if precedence(⊕) > precedence(⊙), then: a ⊕ b ⊙ c = (a ⊕ b) ⊙ c.
◮ if precedence(⊕) < precedence(⊙), then: a ⊕ b ⊙ c = a ⊕ (b ⊙ c).
◮ if precedence(⊕) = precedence(⊙), then:
    ➱ if ⊕ and ⊙ are both left-assoc., then: a ⊕ b ⊙ c = (a ⊕ b) ⊙ c.
    ➱ if ⊕ and ⊙ are both right-assoc., then: a ⊕ b ⊙ c = a ⊕ (b ⊙ c).
    ➱ Otherwise, no standard answer.
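Haskell's own fixity declarations follow this scheme: each operator gets a level from 0 to 9 (higher binds tighter) plus `infixl`, `infixr`, or `infix`. In the sketch below, three hypothetical operators build parenthesized strings, so each result shows how the expression was grouped. Mixing an `infixl` and an `infixr` operator of the same level without parentheses is rejected by GHC with a precedence parsing error, which is Haskell's version of "no standard answer".

```haskell
-- Hypothetical operators that record their grouping by adding parentheses.
infixl 6 .+.   -- level 6, left-assoc. (the same fixity as + in the Prelude)
infixl 7 .*.   -- level 7, left-assoc. (the same fixity as *)
infixr 8 .^.   -- level 8, right-assoc. (the same fixity as ^)

(.+.), (.*.), (.^.) :: String -> String -> String
a .+. b = "(" ++ a ++ "+" ++ b ++ ")"
a .*. b = "(" ++ a ++ "*" ++ b ++ ")"
a .^. b = "(" ++ a ++ "^" ++ b ++ ")"

examples :: [String]
examples =
  [ "a" .+. "b" .*. "c"   -- higher level binds tighter:  (a+(b*c))
  , "a" .*. "b" .+. "c"   -- ((a*b)+c)
  , "a" .+. "b" .+. "c"   -- same level, left-assoc.:     ((a+b)+c)
  , "a" .^. "b" .^. "c"   -- same level, right-assoc.:    (a^(b^c))
  ]
```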

SLIDE 13

Grammar Repair, 3 (§3.4 in Mogensen)

exp ::= exp + exp | exp − exp      (level 1 precedence)
      | exp ∗ exp | exp/exp        (level 2 precedence)
      | num | (exp)                (level 3 precedence)

◮ Handle left- and right-associativity as before.
◮ Each level gets its own nonterminal.
◮ Go from lowest to highest precedence levels.

exp1 ::= exp1 + exp2 | exp1 − exp2 | exp2
exp2 ::= exp2 ∗ exp3 | exp2/exp3 | exp3
exp3 ::= num | (exp1)

[More problems and repairs in the next homework.]
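The stratified grammar maps onto one parsing function per precedence level, with the parenthesized case of exp3 looping back to exp1. As in any recursive-descent parser, the left-recursive productions are replaced by a loop that keeps the grouping to the left. This sketch evaluates as it parses; it assumes input without whitespace, natural-number literals, ASCII `-` and `*`, and integer division for /, and its names are hypothetical.

```haskell
import Data.Char (isDigit)

-- One parser per precedence level: a value plus the unconsumed input.
type P = String -> Maybe (Integer, String)

-- exp1 ::= exp1 + exp2 | exp1 − exp2 | exp2
exp1 :: P
exp1 s = exp2 s >>= uncurry (chain "+-" exp2)

-- exp2 ::= exp2 ∗ exp3 | exp2/exp3 | exp3
exp2 :: P
exp2 s = exp3 s >>= uncurry (chain "*/" exp3)

-- exp3 ::= num | (exp1)
exp3 :: P
exp3 ('(' : s) = case exp1 s of
  Just (v, ')' : rest) -> Just (v, rest)
  _ -> Nothing
exp3 s = case span isDigit s of
  ("", _) -> Nothing
  (ds, rest) -> Just (read ds, rest)

-- Having parsed a left operand acc, repeatedly absorb "op operand",
-- combining to the left, so a−b−c is computed as (a−b)−c.
chain :: String -> P -> Integer -> String -> Maybe (Integer, String)
chain ops next acc (c : s) | c `elem` ops = do
  (v, rest) <- next s
  chain ops next (apply c acc v) rest
chain _ _ acc s = Just (acc, s)

apply :: Char -> Integer -> Integer -> Integer
apply '+' = (+)
apply '-' = (-)
apply '*' = (*)
apply '/' = div   -- assumption: integer division
apply c = error ("unknown operator " ++ [c])

-- Succeeds only if the whole input is consumed.
evalString :: String -> Maybe Integer
evalString s = case exp1 s of
  Just (v, "") -> Just v
  _ -> Nothing
```

For example, `evalString "2+3*4"` is `Just 14` and `evalString "8-3-2"` is `Just 3`, the left-associative reading.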

SLIDE 14

A Small-Step Semantics for CFGs

Warning: Greek letters!

Notation:
(a) αNβ ⇒ αγβ means αNβ rewrites to αγβ by applying the production N ::= γ. (Here α, β, γ are strings of terminals and nonterminals.)
(b) ⇒∗ = the reflexive-transitive closure of ⇒.

$$\frac{}{G \vdash \alpha N \beta \Rightarrow \alpha \gamma \beta} \qquad \text{if } N ::= \gamma \text{ is in } G$$

sentence
⇒ subject verb2 object
⇒ article noun verb2 object
⇒ the noun verb2 object
⇒ the man verb2 object
⇒ the man believes object
⇒ the man believes that sentence
⇒ the man believes that subject verb1
⇒ the man believes that article noun verb1
⇒ the man believes that some noun verb1
⇒ the man believes that some lizard verb1
⇒ the man believes that some lizard exists
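The ⇒ relation can be run directly once a grammar is stored as data. This sketch (all names hypothetical) encodes the English fragment, implements one step αNβ ⇒ αγβ restricted to the leftmost nonterminal, which is the strategy the derivation above follows, and searches depth-first for a derivation sentence ⇒∗ w. The pruning in `plausible` relies on this particular grammar having no ε-productions.

```haskell
import Control.Monad (msum)

data Sym = T String | N String deriving (Eq, Show)   -- terminal / nonterminal

type Grammar = [(String, [Sym])]                     -- productions N ::= γ

-- The English fragment, as data.
english :: Grammar
english =
  [ ("sentence", [N "subject", N "verb1"])
  , ("sentence", [N "subject", N "verb2", N "object"])
  , ("subject", [N "article", N "noun"])
  , ("subject", [N "pronoun"])
  , ("object", [T "that", N "sentence"])
  ]
    ++ [ (n, [T w]) | (n, ws) <- lexicon, w <- ws ]
  where
    lexicon =
      [ ("verb1", ["swims", "pauses", "exists"])
      , ("verb2", ["believes", "hopes", "imagines"])
      , ("article", ["a", "some", "the"])
      , ("noun", ["lizard", "truth", "man"])
      , ("pronoun", ["he", "she", "it"])
      ]

isN :: Sym -> Bool
isN (N _) = True
isN _ = False

-- All one-step rewrites αNβ ⇒ αγβ where N is the leftmost nonterminal.
stepLeftmost :: Grammar -> [Sym] -> [[Sym]]
stepLeftmost g form = case break isN form of
  (alpha, N n : beta) -> [ alpha ++ gamma ++ beta | (m, gamma) <- g, m == n ]
  _ -> []

-- Search for a leftmost derivation start ⇒∗ target; returns every sentential form.
derive :: Grammar -> String -> [String] -> Maybe [[Sym]]
derive g start target = go [N start]
  where
    goal = map T target
    go form
      | form == goal = Just [form]
      | not (plausible form) = Nothing
      | otherwise = (form :) <$> msum (map go (stepLeftmost g form))
    -- Valid here because this grammar has no ε-productions: a form longer than
    -- the target is hopeless, and its leading terminals must match the target.
    plausible form =
      length form <= length target
        && and (zipWith (==) (takeWhile (not . isN) form) goal)

render :: [Sym] -> String
render = unwords . map name
  where
    name (T w) = w
    name (N n) = n
```

For "the man believes that some lizard exists" the search returns twelve sentential forms, the same ones listed above, and `map render` prints them as text.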

SLIDE 15

Digression

See Graham Hutton’s slides for Chapter 8 of his “Programming in Haskell” text: http://www.cs.nott.ac.uk/~gmh/chapter8.ppt

Also:
◮ Hutton’s “Programming in Haskell, 2/e” homepage: http://www.cs.nott.ac.uk/~gmh/book.html
◮ Hutton’s Example Parsing Library (from the 1st edition; not GHC 8.0.1 compliant): http://www.cs.nott.ac.uk/~gmh/Parsing.lhs
◮ Erik Meijer’s video lecture based on Hutton’s Chapter 8:

http://channel9.msdn.com/Series/C9-Lectures-Erik-Meijer-Functional-Programming-Fundamentals/C9-Lectures-Dr-Erik-Meijer-Functional-Programming-Fundamentals-Chapter-

(Skip to time 6:05 for the beginning of the discussion of parsers.) . . .
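For orientation before reading Hutton's chapter, here is a minimal sketch in the same list-of-successes style. It is not Hutton's code: the names are hypothetical, and it is written for current GHC, which (since 7.10) requires every `Monad` instance to come with `Functor` and `Applicative` instances; that requirement is plausibly why the 1st-edition library no longer compiles. The expression grammar is right-recursive, so it needs no left-recursion workaround; it only has + and ∗, which are associative, so the right-leaning grouping does not change the value.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isDigit, isSpace)

-- A parser maps an input string to a list of (result, unconsumed input) pairs.
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) = p

-- Consume any single character; fail on empty input.
item :: Parser Char
item = P next
  where
    next [] = []
    next (x : xs) = [(x, xs)]

instance Functor Parser where
  fmap f p = P (\inp -> [ (f v, out) | (v, out) <- parse p inp ])

instance Applicative Parser where
  pure v = P (\inp -> [(v, inp)])
  pf <*> px = P (\inp -> [ (f v, out2) | (f, out1) <- parse pf inp, (v, out2) <- parse px out1 ])

instance Monad Parser where
  p >>= f = P (\inp -> concat [ parse (f v) out | (v, out) <- parse p inp ])

-- Left-biased choice: try q only if p produces no results.
instance Alternative Parser where
  empty = P (const [])
  p <|> q = P (\inp -> orElse (parse p inp) (parse q inp))
    where
      orElse [] rs = rs
      orElse rs _ = rs

sat :: (Char -> Bool) -> Parser Char
sat ok = do
  x <- item
  if ok x then return x else empty

-- Run p, skipping any whitespace around it.
token :: Parser a -> Parser a
token p = many (sat isSpace) *> p <* many (sat isSpace)

symbol :: Char -> Parser Char
symbol c = token (sat (== c))

nat :: Parser Integer
nat = token (read <$> some (sat isDigit))

-- expr ::= term (+ expr | ε)    term ::= factor (* term | ε)    factor ::= (expr) | nat
expr, term, factor :: Parser Integer
expr = do
  t <- term
  (do { _ <- symbol '+'; e <- expr; return (t + e) }) <|> return t
term = do
  f <- factor
  (do { _ <- symbol '*'; t <- term; return (f * t) }) <|> return f
factor = (symbol '(' *> expr <* symbol ')') <|> nat
```

For example, `parse expr " 2 * ( 3 + 4 ) "` returns `[(14, "")]`: one result, with all input consumed.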
