Parsing: Episode I Matthew Might University of Utah matt.might.net - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Parsing: Episode I

Matthew Might University of Utah matt.might.net ucombinator.org

slide-2
SLIDE 2

Administrivia

  • Project 1: Use the source!
slide-3
SLIDE 3

Agenda

  • What is parsing?
  • Context-free languages
  • Context-free grammars
  • Recursive descent parsing
  • Properties of grammars
slide-4
SLIDE 4

What is parsing?

A parser converts a token stream from the lexer into a parse tree.

slide-5
SLIDE 5

Example

f x = x

slide-6
SLIDE 6

Example

f x = x

ID(f) ID(x) EQUAL ID(x)

slide-7
SLIDE 7

Example

f x = x

ID(f) ID(x) EQUAL ID(x)

Parse tree, in bracket notation: [Dec [FunDef ID(f) [ArgList [Arg ID(x)]] EQUAL [Expr [Ref ID(x)]]]]

slide-8
SLIDE 8

Parsing methods

  • LALR(k)
  • LR(k)
  • SLR(k)
  • LL(k)
  • Back-tracking search
  • Nondeterministic recursive descent
  • Predictive recursive descent
  • PEG/Packrat
  • Combinators
  • Earley
slide-9
SLIDE 9

Context-free languages

slide-10
SLIDE 10

Context-free languages

  • Natural choice for describing syntax
  • Like regular expressions plus recursion
slide-11
SLIDE 11

Example

  • Language of balanced parentheses
  • The language is context-free
  • But the language is not regular
slide-12
SLIDE 12

As formal language

  • Context-free languages are formal languages
  • Only two operations are allowed: concatenation and union
  • Recursive equations are allowed as well
slide-13
SLIDE 13

Example

$$L_B = \{\epsilon\} \cup \left(\{(\} \cdot L_B \cdot \{)\} \cdot L_B\right).$$

slide-14
SLIDE 14

Problem: Recursion!

How do we assign meaning to recursive definitions?

slide-15
SLIDE 15

Fixed points!

slide-16
SLIDE 16

Fixed points

If x = f(x), then the point x is a fixed point of the function f.

slide-17
SLIDE 17

Fixed points

$$\mathrm{Fix}(f) = \{L : L = f(L)\}.$$

slide-18
SLIDE 18

Algebra

  • $x = x^2 - 1$ is a recursive definition of $x$
  • If $f(v) = v^2 - 1$, then $x = f(x)$.
  • Solutions are the fixed points of $f$.
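Solving the equation directly shows that this $f$ has two fixed points, since they are the roots of a quadratic:

$$x = x^2 - 1 \iff x^2 - x - 1 = 0 \iff x = \frac{1 \pm \sqrt{5}}{2},$$

that is, $x \approx 1.618$ and $x \approx -0.618$. A single recursive definition can therefore have more than one fixed point.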
slide-19
SLIDE 19

[Figure: axes labelled x and f(x)]

slide-20
SLIDE 20

[Figure: graph of f(x) = x² - 1]

slide-21
SLIDE 21

[Figure: graph of f(x) = x² - 1 together with the fixed line f(x) = x; the fixed points are where the two graphs cross]

slide-22
SLIDE 22

Refactoring

$$L_B = f(L_B), \qquad f(L) = \{\epsilon\} \cup \left(\{(\} \cdot L \cdot \{)\} \cdot L\right).$$

slide-23
SLIDE 23

Candidates

$$L_B \in \mathrm{Fix}(f)$$

slide-24
SLIDE 24

Sensible choices

$$\mathrm{lfp}(f) = \bigcap_{L \in \mathrm{Fix}(f)} L$$

$$\mathrm{gfp}(f) = \bigcup_{L \in \mathrm{Fix}(f)} L$$

slide-25
SLIDE 25

Greatest fixed point

  • Includes infinitely long strings!
  • Example: ()()()()()()()...
slide-26
SLIDE 26

Kleene’s theorem (specialized)

If a function f is continuous, then:

$$\mathrm{lfp}(f) = \bigcup_{n \ge 1} f^n(\emptyset)$$

slide-27
SLIDE 27

Continuous

The function $f$ is continuous if, for every ascending chain $x_1 \subseteq x_2 \subseteq \cdots$:

$$f\left(\bigcup_i x_i\right) = \bigcup_i f(x_i).$$

slide-28
SLIDE 28

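Constructive observation

$$\emptyset \subseteq f(\emptyset) \subseteq f^2(\emptyset) \subseteq f^3(\emptyset) \subseteq \cdots$$

For monotone $f$ (and every continuous $f$ is monotone), each iterate contains the previous one, so the least fixed point can be built up by applying $f$ repeatedly, starting from $\emptyset$.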
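Because the iterates only grow, the least fixed point of the balanced-parenthesis function f(L) = {ε} ∪ ({(} · L · {)} · L) can be approximated by applying f to ∅ over and over. The sketch below is illustrative only. It assumes every language is cut off at a hypothetical maximum string length `max_len`, so each iterate is a finite set and the chain stops growing after finitely many steps. The final iterate is then exactly the balanced strings of length at most `max_len`.

```python
def f(lang, max_len):
    """One application of f(L) = {ε} ∪ ({(} · L · {)} · L),
    keeping only strings of length <= max_len."""
    result = {""}  # the {ε} part
    for u in lang:
        for v in lang:
            w = "(" + u + ")" + v
            if len(w) <= max_len:
                result.add(w)
    return frozenset(result)


def kleene_chain(max_len):
    """Return the chain ∅, f(∅), f²(∅), ... up to the first repeated iterate."""
    chain = [frozenset()]  # f^0(∅) = ∅
    while True:
        nxt = f(chain[-1], max_len)
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)


chain = kleene_chain(4)
for i, lang in enumerate(chain):
    print(i, sorted(lang, key=lambda s: (len(s), s)))
```

Each printed iterate is a superset of the one before it, mirroring the ascending chain ∅ ⊆ f(∅) ⊆ f²(∅) ⊆ ⋯.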

slide-29
SLIDE 29

Excursion

slide-30
SLIDE 30

In general

  • In general, for a set of recursive equations over the languages $L_1, \ldots, L_n$, if

$$L_1 = f_1(L_1, \ldots, L_n), \quad L_2 = f_2(L_1, \ldots, L_n), \quad \ldots, \quad L_n = f_n(L_1, \ldots, L_n),$$

then these languages are a fixed point of the function $F : \mathcal{P}(A^*)^n \to \mathcal{P}(A^*)^n$:

$$F(L_1, \ldots, L_n) = \left(f_1(L_1, \ldots, L_n), f_2(L_1, \ldots, L_n), \ldots, f_n(L_1, \ldots, L_n)\right),$$

and by default we take the least fixed point of this function:

$$(L_1, \ldots, L_n) = \mathrm{lfp}(F).$$

slide-31
SLIDE 31

Context-free grammars

slide-32
SLIDE 32

Context-free grammars

A context-free grammar is a quadruple (A, N, R, n0), where:

  • the set A contains the terminal symbols of the language—its alphabet; and
  • the set N contains the non-terminal symbols of the language; and
  • the set R ⊆ N × (A ∪ N)∗ contains the substitution rules, each of which replaces a non-terminal with a string of terminal and non-terminal symbols; and
  • the symbol n0 ∈ N is the top-level “start” symbol.
slide-33
SLIDE 33

Example

A = {(, )}
N = {B}
R ∋ B → (B)B
R ∋ B → ε
n0 = B.

slide-34
SLIDE 34

Recognizing strings

$$\frac{(n \to s_1 \ldots s_n) \in R \qquad w\,n\,w' \in \mathcal{L}(A, N, R, n_0)}{w\,s_1 \ldots s_n\,w' \in \mathcal{L}(A, N, R, n_0)}$$
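The rule can be run forwards: start from the start symbol, keep rewriting non-terminals, and collect the sentential forms that contain only terminals. The sketch below assumes the balanced-parenthesis grammar G_B (B → (B)B, B → ε) encoded as Python tuples. It rewrites only the leftmost non-terminal, which yields the same set of terminal strings. The search is bounded by a hypothetical `max_len` on the number of terminals; for G_B this bound keeps the search finite, but the code is not a general-purpose recognizer.

```python
from collections import deque

# G_B = (A, N, R, n0): balanced parentheses. () encodes an ε right-hand side.
A = {"(", ")"}
N = {"B"}
R = [("B", ("(", "B", ")", "B")), ("B", ())]
n0 = "B"


def one_step(form, rules, nonterminals):
    """From w n w' yield w s1...sk w' for each rule (n -> s1...sk),
    where n is the leftmost non-terminal in form."""
    for i, sym in enumerate(form):
        if sym in nonterminals:
            for lhs, rhs in rules:
                if lhs == sym:
                    yield form[:i] + rhs + form[i + 1:]
            return


def strings_up_to(max_len, terminals, nonterminals, rules, start):
    """All terminal strings derivable from start with at most max_len terminals."""
    found, seen = set(), {(start,)}
    queue = deque([(start,)])  # the start symbol itself is derivable
    while queue:
        form = queue.popleft()
        if sum(s in terminals for s in form) > max_len:
            continue  # terminals are never erased, so this form can only grow
        if all(s in terminals for s in form):
            found.add("".join(form))
            continue
        for nxt in one_step(form, rules, nonterminals):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


print(sorted(strings_up_to(4, A, N, R, n0)))
```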

slide-35
SLIDE 35

Example

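Using $(B \to \epsilon) \in R$, $(B \to (B)B) \in R$ and $B = n_0$: first $B \in \mathcal{L}(G_B)$ because $B = n_0$; rewriting with $B \to (B)B$ gives $(B)B$; rewriting each remaining $B$ with $B \to \epsilon$ gives $()B$ and then $()$:

$$B \in \mathcal{L}(G_B) \;\Rightarrow\; (B)B \in \mathcal{L}(G_B) \;\Rightarrow\; ()B \in \mathcal{L}(G_B) \;\Rightarrow\; () \in \mathcal{L}(G_B).$$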

slide-36
SLIDE 36

Parse trees

  • Convenient diagrammatic notation
  • Demonstrates membership in language
  • Simultaneously shows structure of string
slide-37
SLIDE 37

Example

Parse tree of (), in bracket notation: [B ( [B ε] ) [B ε]]

slide-38
SLIDE 38

Example: Regexes

A = {(, ), a, . . . , z, |, *}
N = {E, T, F, K}
R ∋ E → T | E
R ∋ E → T
R ∋ T → F T
R ∋ T → F
R ∋ F → K*
R ∋ F → K
R ∋ K → (E)
R ∋ K → a, for every a ∈ {a, . . . , z}
n0 = E.

slide-39
SLIDE 39


Parse tree: (a|b)*

In bracket notation: [E [T [F [K ( [E [T [F [K a]]] | [E [T [F [K b]]]]] )] *]]]
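This grammar has no left recursion, and after the shared prefix of a rule (T, F or K) has been parsed, one character of lookahead decides which rule applies, so recursive descent handles it directly. The sketch below is a minimal version under stated assumptions. The input is a plain string with one character per token, the class and method names are hypothetical, and parse trees are nested tuples whose first element is the non-terminal.

```python
class RegexParser:
    """Recursive-descent parser for the regex grammar
         E -> T | E,  E -> T,  T -> F T,  T -> F,
         F -> K*,     F -> K,  K -> (E),  K -> a  (a in a..z)
    Parse trees are nested tuples: (non-terminal, child, child, ...)."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def eat(self, ch):
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1
        return ch

    def starts_k(self):
        """Can the next character begin a K (and hence an F or a T)?"""
        c = self.peek()
        return c == "(" or (c is not None and "a" <= c <= "z")

    def parse(self):
        tree = self.E()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} at {self.pos}")
        return tree

    def E(self):
        t = self.T()
        if self.peek() == "|":                       # E -> T | E
            return ("E", t, self.eat("|"), self.E())
        return ("E", t)                              # E -> T

    def T(self):
        f = self.F()
        if self.starts_k():                          # T -> F T
            return ("T", f, self.T())
        return ("T", f)                              # T -> F

    def F(self):
        k = self.K()
        if self.peek() == "*":                       # F -> K*
            return ("F", k, self.eat("*"))
        return ("F", k)                              # F -> K

    def K(self):
        if self.peek() == "(":                       # K -> (E)
            return ("K", self.eat("("), self.E(), self.eat(")"))
        if self.starts_k():                          # K -> a
            return ("K", self.eat(self.peek()))
        raise SyntaxError(f"unexpected {self.peek()!r} at {self.pos}")


print(RegexParser("(a|b)*").parse())
```

The printed tuple has the same shape as the parse tree for (a|b)*, with one tuple per interior node.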

slide-40
SLIDE 40

Ambiguous grammars

A grammar is ambiguous if there is at least one string that has more than one parse tree.

slide-41
SLIDE 41

Example: Ambiguity

A = {(, ), +, *} ∪ Z
N = {E}
R ∋ E → E + E
R ∋ E → E * E
R ∋ E → z, for every z ∈ Z
n0 = E.

slide-42
SLIDE 42

Example: 3 + 4 * 9

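Two parse trees, in bracket notation:

[E [E 3] + [E [E 4] * [E 9]]], that is, 3 + (4 * 9)

[E [E [E 3] + [E 4]] * [E 9]], that is, (3 + 4) * 9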
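One way to make the ambiguity concrete is to count parse trees. The sketch below uses a hypothetical helper, not taken from the slides, that counts the trees the grammar E → E + E | E * E | z assigns to a token list by dynamic programming over token spans. It assumes integers have already been tokenized as Python `int`s.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E + E | E * E | z,
    where each z is an integer token."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of trees in which E derives tokens[i:j]."""
        if j - i == 1:                    # E -> z
            return 1 if isinstance(tokens[i], int) else 0
        total = 0
        for k in range(i + 1, j - 1):     # try tokens[k] as the root operator
            if tokens[k] in ("+", "*"):   # E -> E + E or E -> E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees([3, "+", 4, "*", 9]))
```

Any count above 1 witnesses ambiguity. With n binary operators and no parentheses, the count is the n-th Catalan number, so it grows quickly.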

slide-43
SLIDE 43

Left-recursion

A grammar is left-recursive if some non-terminal symbol can derive, in one or more steps, a string that begins with that same non-terminal.

slide-44
SLIDE 44

Example: Left-recursion

S → S , x
S → x

slide-45
SLIDE 45

Example: Factoring

S → x , S
S → x
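Left recursion matters for recursive descent. A procedure for S → S , x would call itself at the same input position before consuming anything, and so would never terminate. The refactored rules consume an x first. A minimal sketch, assuming tokens arrive as a list of strings; the function name and return shape are illustrative.

```python
def parse_S(tokens, pos=0):
    """Recursive descent for S -> x , S | x (the refactored, right-recursive rules).

    Returns (list of x's parsed, position after S). A direct transcription of the
    left-recursive rule S -> S , x would call parse_S(tokens, pos) again before
    consuming any token, and so would recurse forever."""
    if pos >= len(tokens) or tokens[pos] != "x":
        raise SyntaxError(f"expected 'x' at position {pos}")
    pos += 1                                        # consume x
    if pos < len(tokens) and tokens[pos] == ",":    # S -> x , S
        rest, pos = parse_S(tokens, pos + 1)
        return ["x"] + rest, pos
    return ["x"], pos                               # S -> x


print(parse_S(["x", ",", "x", ",", "x"]))
```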

slide-46
SLIDE 46

Exercise: Nondeterministic recursive descent

slide-47
SLIDE 47

Grammar

X → ( X* )
X → num
X → sym
X* → X X*
X* → ε
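Nondeterministic recursive descent can be simulated by having each procedure produce every way it could succeed, instead of committing to one choice. A minimal sketch, assuming the lexer has already produced a list of token types (`"("`, `")"`, `"num"`, `"sym"`; these names are hypothetical). Python generators stand in for the nondeterministic choices, and each one yields a partial parse tree together with the position where it ended.

```python
def parse_X(toks, i):
    """Yield every (tree, j) such that X derives toks[i:j]."""
    # Choice 1: X -> ( X* )
    if i < len(toks) and toks[i] == "(":
        for items, j in parse_Xstar(toks, i + 1):
            if j < len(toks) and toks[j] == ")":
                yield ("list", items), j + 1
    # Choices 2 and 3: X -> num, X -> sym
    if i < len(toks) and toks[i] in ("num", "sym"):
        yield toks[i], i + 1


def parse_Xstar(toks, i):
    """Yield every (items, j) such that X* derives toks[i:j]."""
    yield [], i                          # Choice 1: X* -> ε
    for head, j in parse_X(toks, i):     # Choice 2: X* -> X X*
        for tail, k in parse_Xstar(toks, j):
            yield [head] + tail, k


def parse(toks):
    """All parse trees of X that consume the whole token list."""
    return [tree for tree, j in parse_X(toks, 0) if j == len(toks)]


print(parse(["(", "num", "(", "sym", ")", ")"]))
```

Every recursive call to `parse_Xstar` happens only after `parse_X` has consumed at least one token, so the search terminates on this grammar. A left-recursive grammar would lose that guarantee.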

slide-48
SLIDE 48

Exercise: Predictive recursive descent

slide-49
SLIDE 49

Lexer API

  • next() : Token
  • eat(t : TokenType)
  • peek(k : Int) : TokenType
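With this interface, a predictive parser picks a rule by peeking at upcoming tokens instead of trying every alternative. Below is a minimal sketch for the grammar X → ( X* ), X → num, X → sym, X* → X X*, X* → ε. It assumes a hypothetical `Lexer` that wraps a pre-tokenized list of token types, returns the token type itself from `next()`, and reports `"EOF"` past the end. Only one token of lookahead (`peek(1)`) is needed for this grammar.

```python
class Lexer:
    """Hypothetical lexer over a pre-tokenized list of token types,
    exposing next/eat/peek."""

    def __init__(self, tokens):
        self.tokens, self.pos = list(tokens), 0

    def peek(self, k=1):
        """Type of the k-th upcoming token (k = 1 is the next one), or "EOF"."""
        i = self.pos + k - 1
        return self.tokens[i] if i < len(self.tokens) else "EOF"

    def next(self):
        """Consume and return the next token."""
        tok = self.peek(1)
        self.pos += 1
        return tok

    def eat(self, t):
        """Consume the next token, which must have type t."""
        if self.peek(1) != t:
            raise SyntaxError(f"expected {t}, got {self.peek(1)}")
        self.next()


def parse_X(lex):
    t = lex.peek(1)
    if t == "(":                                   # X -> ( X* )
        lex.eat("(")
        items = parse_Xstar(lex)
        lex.eat(")")
        return ("list", items)
    if t in ("num", "sym"):                        # X -> num | X -> sym
        return lex.next()
    raise SyntaxError(f"unexpected {t}")


def parse_Xstar(lex):
    if lex.peek(1) in ("(", "num", "sym"):         # X* -> X X*: next token can start an X
        head = parse_X(lex)
        return [head] + parse_Xstar(lex)
    return []                                      # X* -> ε


def parse(tokens):
    lex = Lexer(tokens)
    tree = parse_X(lex)
    lex.eat("EOF")                                 # the whole input must be consumed
    return tree


print(parse(["(", "num", "(", "sym", ")", ")"]))
```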
slide-50
SLIDE 50

CFG properties

slide-51
SLIDE 51

Nullability

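The nullability function, $\delta : (A \cup N) \to \{\{\epsilon\}, \emptyset\}$, returns the set $\{\epsilon\}$ if the provided symbol can derive the empty string, and $\emptyset$ otherwise:

$$\delta(a) = \emptyset$$

$$\delta(n) \supseteq \delta(s_1) \cdot \ldots \cdot \delta(s_n) \quad \text{if } (n \to s_1 \ldots s_n) \in R$$

$$\delta(n) \supseteq \{\epsilon\} \quad \text{if } (n \to \epsilon) \in R.$$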
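Each constraint can only add {ε}, so the least solution can be found by starting every δ(n) at ∅ and re-applying the rules until nothing changes. A minimal sketch, assuming the grammar is encoded as a list of (non-terminal, right-hand-side tuple) pairs whose terminal and non-terminal names are distinct strings. It represents δ by the set of non-terminals whose δ is {ε}.

```python
def nullable_nonterminals(rules):
    """Least solution of the nullability constraints.

    rules: list of (n, (s1, ..., sk)) pairs; an empty tuple encodes n -> ε.
    Returns the set of non-terminals n with δ(n) = {ε}. Terminals never
    appear in the result, matching δ(a) = ∅."""
    null = set()                 # start every δ(n) at ∅
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # δ(s1)·…·δ(sk) is {ε} exactly when every s_i is nullable;
            # for rhs = () this is vacuously true, covering n -> ε.
            if lhs not in null and all(s in null for s in rhs):
                null.add(lhs)
                changed = True
    return null


sexp_rules = [
    ("X", ("(", "X*", ")")),
    ("X", ("num",)),
    ("X", ("sym",)),
    ("X*", ("X", "X*")),
    ("X*", ()),
]
print(nullable_nonterminals(sexp_rules))
```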

slide-52
SLIDE 52

Nullability


slide-53
SLIDE 53

Inclusion constraints

$$X_1 \supseteq f_1(X_1, \ldots, X_n), \quad \ldots, \quad X_n \supseteq f_n(X_1, \ldots, X_n).$$

slide-54
SLIDE 54

Inclusion constraints


slide-55
SLIDE 55

Solving inclusions

    X_i ← ∅ for all i
    changed ← true
    while (changed)
        changed ← false
        for each i
            X′_i ← f_i(X_1, . . . , X_n)
            if (X_i ≠ X′_i)
                X_i ← X′_i
                changed ← true
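The loop translates almost line for line into code. A minimal sketch, assuming each constraint function f_i is a Python function over frozensets and is monotone (as union and concatenation are). The example system at the end is hypothetical.

```python
def solve_inclusions(fs):
    """Least solution of X_i ⊇ f_i(X_1, ..., X_n) for i = 1..n.

    fs: list of n functions, each taking n sets and returning a set.
    Assumes every f_i is monotone and only finitely many elements can ever
    appear, so the loop terminates."""
    xs = [frozenset() for _ in fs]           # X_i <- ∅ for all i
    changed = True
    while changed:
        changed = False
        for i, f in enumerate(fs):
            new = frozenset(f(*xs))          # X'_i <- f_i(X_1, ..., X_n)
            if new != xs[i]:                 # if X_i ≠ X'_i
                xs[i] = new
                changed = True
    return xs


# X1 ⊇ {a} ∪ X2 and X2 ⊇ X1 ∪ {b}
print(solve_inclusions([
    lambda x1, x2: {"a"} | x2,
    lambda x1, x2: x1 | {"b"},
]))
```

Updating X_i in place lets later constraints in the same pass see the new value. Starting from ∅ with monotone f_i, the values only grow, so the first pass with no change yields the least solution.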

slide-56
SLIDE 56

First sets

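In context-free grammars, first sets are easily computed with subset-inclusion constraints; for every rule $(n \to s_1 \ldots s_m) \in R$:

$$\mathrm{first}(n) \supseteq \bigcup_{i=1}^{m} \delta(s_1 \ldots s_{i-1}) \cdot \mathrm{first}(s_i),$$

where $\mathrm{first}(a) = \{a\}$ for every terminal $a$, and $\delta$ of a sequence is the concatenation of the symbols' $\delta$ values (so the empty sequence gives $\{\epsilon\}$).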
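These constraints can be solved by the same kind of fixed-point iteration, once nullability is known. A minimal sketch, assuming the (rule-list, terminal-set) encoding below, with distinct names for terminals and non-terminals and at least one rule per non-terminal. It computes nullability inline so the block stands alone.

```python
def first_sets(rules, terminals):
    """first(n) for every non-terminal n, as the least solution of
    first(n) ⊇ ∪_i δ(s1…s_{i-1})·first(s_i), with first(a) = {a} for terminals.

    rules: list of (n, (s1, ..., sm)) pairs; () encodes n -> ε."""
    # Nullable non-terminals (δ(n) = {ε}), by fixed-point iteration.
    null, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in null and all(s in null for s in rhs):
                null.add(lhs)
                changed = True

    first = {lhs: set() for lhs, _ in rules}

    def first_of(s):
        return {s} if s in terminals else first[s]

    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for s in rhs:
                new = first_of(s) - first[lhs]
                if new:
                    first[lhs] |= new
                    changed = True
                if s not in null:        # δ(s1…si) = ∅: later symbols contribute nothing
                    break
    return first


sexp_rules = [
    ("X", ("(", "X*", ")")),
    ("X", ("num",)),
    ("X", ("sym",)),
    ("X*", ("X", "X*")),
    ("X*", ()),
]
print(first_sets(sexp_rules, {"(", ")", "num", "sym"}))
```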

slide-57
SLIDE 57

First sets


slide-58
SLIDE 58

Follow sets

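The function $\mathrm{follow} : (A \cup N) \to \mathcal{P}(A)$ is also defined by inclusion constraints; for every rule $n \to s_1 \ldots s_n$:

$$\mathrm{follow}(s_i) \supseteq \bigcup_{j=i}^{n-1} \delta(s_{i+1} \ldots s_j) \cdot \mathrm{first}(s_{j+1}) \;\cup\; \delta(s_{i+1} \ldots s_n) \cdot \mathrm{follow}(n).$$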
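Follow sets yield to the same iteration once nullability and first sets are known. A minimal sketch, assuming those are passed in precomputed (written out by hand below for the S-expression grammar X → ( X* ), X → num, X → sym, X* → X X*, X* → ε). It tracks follow sets only for non-terminals and adds no end-of-input marker.

```python
def follow_sets(rules, terminals, null, first):
    """follow(s) for every non-terminal s, as the least solution of
    follow(s_i) ⊇ ∪_{j=i}^{n-1} δ(s_{i+1}…s_j)·first(s_{j+1}) ∪ δ(s_{i+1}…s_n)·follow(n).

    rules: list of (n, (s1, ..., sk)) pairs; () encodes n -> ε.
    null:  set of nullable non-terminals; first: dict non-terminal -> first set."""
    follow = {lhs: set() for lhs, _ in rules}

    def first_of(s):
        return {s} if s in terminals else first[s]

    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for i, s in enumerate(rhs):
                if s in terminals:
                    continue                   # only non-terminals are tracked
                new, rest_nullable = set(), True
                for t in rhs[i + 1:]:          # t = s_{j+1} for j = i, i+1, ...
                    new |= first_of(t)
                    if t not in null:          # δ(s_{i+1}…s_{j+1}) = ∅ from here on
                        rest_nullable = False
                        break
                if rest_nullable:              # δ(s_{i+1}…s_n) = {ε}
                    new |= follow[lhs]
                if not new <= follow[s]:
                    follow[s] |= new
                    changed = True
    return follow


sexp_rules = [
    ("X", ("(", "X*", ")")),
    ("X", ("num",)),
    ("X", ("sym",)),
    ("X*", ("X", "X*")),
    ("X*", ()),
]
sexp_first = {"X": {"(", "num", "sym"}, "X*": {"(", "num", "sym"}}
print(follow_sets(sexp_rules, {"(", ")", "num", "sym"}, {"X*"}, sexp_first))
```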

slide-59
SLIDE 59

Follow sets


slide-60
SLIDE 60

CFL trivia

  • Are regular languages context-free?
  • Are CFLs closed under complement?
  • Is the intersection of CFLs context-free?
  • Does a CFG accept no strings?
  • Does a CFG accept a finite set?
  • Does a CFG accept every string?
  • Is one CFL a subset of another CFL?