
Parsing: Episode I Matthew Might University of Utah matt.might.net



  1. Parsing: Episode I Matthew Might University of Utah matt.might.net ucombinator.org

  2. Administrivia • Project 1: Use the source!

  3. Agenda • What is parsing? • Context-free languages • Context-free grammars • Recursive descent parsing • Properties of grammars

  4. What is parsing? A parser converts a token stream from the lexer into a parse tree.

  5. Example f x = x

  6. Example f x = x
     Tokens: ID(f) ID(x) EQUAL ID(x)

  7. Example f x = x
     Tokens: ID(f) ID(x) EQUAL ID(x)
     Parse tree: Dec[ FunDef[ ID(f) ArgList[ Arg[ ID(x) ] ] EQUAL Expr[ Ref[ ID(x) ] ] ] ]

  8. Parsing methods • LALR(k) • LR(k) • SLR(k) • LL(k) • Back-tracking search • Nondeterministic recursive descent • Predictive recursive descent • PEG/Packrat • Combinators • Earley

  9. Context-free languages

  10. Context-free languages • Natural choice for describing syntax • Like regular expressions plus recursion

  11. Example • The language of balanced parentheses • This language is context-free • But it is not regular

  12. As formal language • Context-free languages are formal languages • Two operations allowed: catenation, union • Recursive equations are allowed as well

  13. Example $L_B = \{\epsilon\} \cup (\{\texttt{(}\} \cdot L_B \cdot \{\texttt{)}\} \cdot L_B)$.

  14. Problem: Recursion! How do we assign meaning to recursive definitions?

  15. Fixed points!

  16. Fixed points If $x = f(x)$, then the point $x$ is a fixed point of the function $f$.

  17. Fixed points $\mathrm{Fix}(f) = \{ L : L = f(L) \}$.

  18. Algebra • $x = x^2 - 1$ is a recursive definition of $x$ • If $f(v) = v^2 - 1$, then $x = f(x)$. • Solutions are the fixed points of $f$.
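As a quick numerical check of the algebra slide, the following Python sketch solves x = x^2 - 1 (that is, x^2 - x - 1 = 0) with the quadratic formula and confirms that each root is a fixed point of f. The function names `f` and `fixed_points` are illustrative choices, not part of the slides.

```python
import math

def f(v):
    """The function from the slide: f(v) = v^2 - 1."""
    return v * v - 1

def fixed_points():
    """Solve x = x^2 - 1, i.e. x^2 - x - 1 = 0, with the quadratic formula."""
    root = math.sqrt(5)
    return [(1 - root) / 2, (1 + root) / 2]

for x in fixed_points():
    # Each solution should satisfy x == f(x) up to rounding error.
    print(x, f(x), math.isclose(x, f(x)))
```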

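  19. [Plot: empty axes; horizontal axis x, vertical axis f(x)]

  20. [Plot: the curve f(x) = x^2 - 1 on the same axes]

  21. [Plot: the curve f(x) = x^2 - 1 together with a line labeled "fixed line"]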

  22. Refactoring $L_B = f(L_B)$, where $f(L) = \{\epsilon\} \cup (\{\texttt{(}\} \cdot L \cdot \{\texttt{)}\} \cdot L)$.

  23. Candidates $L_B \in \mathrm{Fix}(f)$,

  24. Sensible choices $\mathrm{lfp}(f) = \bigcap_{L \in \mathrm{Fix}(f)} L$ and $\mathrm{gfp}(f) = \bigcup_{L \in \mathrm{Fix}(f)} L$

  25. Greatest fixed point • Includes infinitely long strings! • Example: ()()()()()()() ...

  26. Kleene’s theorem (specialized) If a function $f$ is continuous, then: $\mathrm{lfp}(f) = \bigcup_{n \geq 1}^{\infty} f^n(\emptyset)$

  27. Continuous The function $f$ is continuous only if: $f\left(\bigcup_i x_i\right) = \bigcup_i f(x_i)$

  28. Constructive observation $\emptyset \subseteq f(\emptyset) \subseteq f^2(\emptyset) \subseteq f^3(\emptyset) \subseteq \cdots$
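The chain above suggests a direct way to compute the least fixed point: start from the empty set and keep applying f. Below is a minimal Python sketch for the balanced-parentheses function f from the earlier slides. Because the full language is infinite, the sketch only keeps strings up to a length bound; the names `f`, `lfp_bounded`, and `max_len` are assumptions made for this illustration.

```python
def f(lang, max_len):
    """One application of f(L) = {eps} U ( {"("} . L . {")"} . L ),
    keeping only strings of length <= max_len so the set stays finite."""
    result = {""}
    for u in lang:
        for v in lang:
            s = "(" + u + ")" + v
            if len(s) <= max_len:
                result.add(s)
    return frozenset(result)

def lfp_bounded(max_len):
    """Iterate f starting from the empty set until nothing changes."""
    current = frozenset()
    while True:
        nxt = f(current, max_len)
        if nxt == current:
            return current
        current = nxt

print(sorted(lfp_bounded(4), key=lambda s: (len(s), s)))
# prints ['', '()', '(())', '()()']
```

Truncating by length does not lose short strings, since every balanced string is built from strictly shorter balanced strings.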

  29. Excursion

  30. In general, for a set of recursive equations over the languages $L_1, \ldots, L_n$, if
      $L_1 = f_1(L_1, \ldots, L_n)$
      $L_2 = f_2(L_1, \ldots, L_n)$
      $\vdots$
      $L_n = f_n(L_1, \ldots, L_n)$,
      then these languages are a fixed point of the function $F : \mathcal{P}(A^*)^n \to \mathcal{P}(A^*)^n$:
      $F(L_1, \ldots, L_n) = (f_1(L_1, \ldots, L_n), f_2(L_1, \ldots, L_n), \ldots, f_n(L_1, \ldots, L_n))$,
      and by default, the least fixed point of this function: $(L_1, \ldots, L_n) = \mathrm{lfp}(F)$.

  31. Context-free grammars

  32. Context-free grammars A context-free grammar is a quadruple $(A, N, R, n_0)$, where: • the set $A$ contains the terminal symbols of the language (its alphabet); and • the set $N$ contains the non-terminal symbols of the language; and • the set $R \subseteq N \times (A \cup N)^*$ contains the substitution rules, each replacing a non-terminal with a string of terminals and non-terminals; and • the symbol $n_0 \in N$ is the top-level “start” symbol.

  33. Example $A = \{\texttt{(}, \texttt{)}\}$, $N = \{B\}$, $R \ni B \to \texttt{(}\, B\, \texttt{)}\, B$, $R \ni B \to \epsilon$, $n_0 = B$.

  34. Recognizing strings If $w\, n\, w' \in L(A, N, R, n_0)$ and $(n \to s_1 \ldots s_m) \in R$, then $w\, s_1 \ldots s_m\, w' \in L(A, N, R, n_0)$.

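  35. Example Since $B = n_0$, we have $B \in L(G_B)$. Since $(B \to \texttt{(}\, B\, \texttt{)}\, B) \in R$, we get $\texttt{(}\, B\, \texttt{)}\, B \in L(G_B)$. Applying $(B \to \epsilon) \in R$ to each remaining $B$ gives $\texttt{()} \in L(G_B)$.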
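The substitution rule for recognizing strings can be turned into a brute-force search: start from the start symbol and keep rewriting non-terminals until the target string appears. The Python sketch below does this for the balanced-parentheses grammar only. The names `RULES`, `START`, and `derives` are hypothetical, and the pruning step (dropping forms with more terminals than the target) is an assumption that guarantees termination for this grammar, not for every grammar.

```python
from collections import deque

# B -> ( B ) B | eps, written as lists of symbols.
RULES = {"B": [["(", "B", ")", "B"], []]}
START = "B"

def derives(target):
    """Breadth-first search over sentential forms reachable from START."""
    seen = {(START,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        if all(sym not in RULES for sym in form):
            # Only terminals left: compare against the target string.
            if "".join(form) == target:
                return True
            continue
        # Prune forms that already contain too many terminals.
        if sum(1 for sym in form if sym not in RULES) > len(target):
            continue
        for i, sym in enumerate(form):
            if sym in RULES:
                for rhs in RULES[sym]:
                    new_form = form[:i] + tuple(rhs) + form[i + 1:]
                    if new_form not in seen:
                        seen.add(new_form)
                        queue.append(new_form)
    return False

print(derives("()"), derives("(()"))
```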

  36. Parse trees • Convenient diagrammatic notation • Demonstrates membership in language • Simultaneously shows structure of string

  37. Example Parse tree for (): B[ ( B[ ε ] ) B[ ε ] ]

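  38. Example: Regexes
      $A = \{ \texttt{(}, \texttt{)}, \texttt{a}, \ldots, \texttt{z}, \texttt{|}, \texttt{*} \}$
      $N = \{ E, T, F, K \}$
      $R \ni E \to T\ \texttt{|}\ E$
      $R \ni E \to T$
      $R \ni T \to F\ T$
      $R \ni T \to F$
      $R \ni F \to K\ \texttt{*}$
      $R \ni F \to K$
      $R \ni K \to \texttt{(}\ E\ \texttt{)}$
      $R \ni K \to a$, for every $a \in \{ \texttt{a}, \ldots, \texttt{z} \}$
      $n_0 = E$.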

  39. Parse tree: (a|b)* (using the regex grammar from the previous slide)
      E[ T[ F[ K[ ( E[ T[ F[ K[ a ] ] ] | E[ T[ F[ K[ b ] ] ] ] ] ) ] * ] ] ]

  40. Ambiguous grammars A grammar is ambiguous if there is at least one string that has more than one parse tree.

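  41. Example: Ambiguity $A = \{ \texttt{(}, \texttt{)}, \texttt{+}, \texttt{*} \} \cup \mathbb{Z}$, $N = \{ E \}$, $R \ni E \to E\ \texttt{+}\ E$, $R \ni E \to E\ \texttt{*}\ E$, $R \ni E \to z$ for every $z \in \mathbb{Z}$, $n_0 = E$.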

  42. Example: 3 + 4 * 9 has two parse trees: E[ 3 + E[ 4 * 9 ] ] and E[ E[ 3 + 4 ] * 9 ]
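One way to confirm the ambiguity is to count parse trees mechanically. The Python sketch below counts the trees that the grammar E → E + E, E → E * E, E → z assigns to a token sequence, by trying every operator as the top-level split. The function name `count_parse_trees` and the token representation (Python ints for integers, strings for operators) are assumptions of this sketch.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | z."""

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways to derive tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if isinstance(tokens[lo], int) else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # tokens[mid] is the operator at the root of the tree.
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees((3, "+", 4, "*", 9)))  # prints 2
```

Any count above 1 for some input witnesses that the grammar is ambiguous.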

  43. Left-recursion A grammar is left-recursive if a non-terminal symbol can derive a string with itself in the leftmost position.

  44. Example: Left-recursion
      S → S , x
      S → x

  45. Example: Factoring
      S → x , S
      S → x
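To illustrate why the rewritten grammar matters for recursive descent, here is a minimal Python sketch. A naive procedure for the left-recursive rule S → S , x calls itself before consuming any input, so it never terminates; the procedure for S → x , S consumes an x first. The function names and the list-of-strings token format are hypothetical.

```python
def parse_left_recursive(tokens, pos=0):
    """Naive descent for S -> S , x | x. The first step parses S again
    at the same position, so the recursion never makes progress."""
    end = parse_left_recursive(tokens, pos)
    if tokens[end:end + 2] == [",", "x"]:
        return end + 2
    return end

def parse_right_recursive(tokens, pos=0):
    """Descent for S -> x , S | x. Returns the position after S, or None."""
    if pos >= len(tokens) or tokens[pos] != "x":
        return None
    if pos + 1 < len(tokens) and tokens[pos + 1] == ",":
        rest = parse_right_recursive(tokens, pos + 2)
        if rest is not None:
            return rest
    return pos + 1

def accepts(tokens):
    """True if the whole token list is an S in the factored grammar."""
    return parse_right_recursive(tokens) == len(tokens)

print(accepts(["x", ",", "x"]), accepts(["x", ","]))
```

In Python the left-recursive version fails with a `RecursionError` rather than looping forever, which makes the problem easy to observe.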

  46. Exercise: Nondeterministic recursive descent

  47. Grammar
      $X \to \texttt{(}\ X^*\ \texttt{)}$
      $X \to \mathit{num}$
      $X \to \mathit{sym}$
      $X^* \to X\ X^*$
      $X^* \to \epsilon$.
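One possible approach to the nondeterministic exercise is sketched below in Python: each non-terminal becomes a generator that tries every rule and yields every position where a match could end. This is not the course's reference solution. The tokenizer, the function names, and the treatment of num (all-digit tokens) and sym (any other non-parenthesis token) are assumptions.

```python
def tokenize(source):
    """Split on whitespace, treating each parenthesis as its own token."""
    return source.replace("(", " ( ").replace(")", " ) ").split()

def parse_x(tokens, pos):
    """Yield every position at which an X starting at pos can end."""
    if pos >= len(tokens):
        return
    tok = tokens[pos]
    # X -> ( X* )
    if tok == "(":
        for mid in parse_xstar(tokens, pos + 1):
            if mid < len(tokens) and tokens[mid] == ")":
                yield mid + 1
    # X -> num
    if tok.isdigit():
        yield pos + 1
    # X -> sym
    if tok not in ("(", ")") and not tok.isdigit():
        yield pos + 1

def parse_xstar(tokens, pos):
    """Yield every position at which an X* starting at pos can end."""
    # X* -> X X*
    for mid in parse_x(tokens, pos):
        yield from parse_xstar(tokens, mid)
    # X* -> eps
    yield pos

def accepts(source):
    tokens = tokenize(source)
    return any(end == len(tokens) for end in parse_x(tokens, 0))

print(accepts("(f 1 (g))"), accepts("(f 1 (g)"))
```

Every call to `parse_x` consumes at least one token before recursing, which is why the search terminates for this grammar.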

  48. Exercise: Predictive recursive descent

  49. Lexer API • next() : Token • eat(t : TokenType) • peek(k : Int) : TokenType
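The API above is enough to write a predictive parser for the S-expression grammar from the earlier exercise. The Python sketch below is one possible shape, not the course's implementation. The token type names (`LPAREN`, `RPAREN`, `NUM`, `SYM`, `EOF`), the convention that `peek(1)` is the next unread token, and the use of `SyntaxError` for failures are all assumptions.

```python
from typing import NamedTuple

class Token(NamedTuple):
    type: str
    text: str

def classify(word):
    if word == "(":
        return "LPAREN"
    if word == ")":
        return "RPAREN"
    return "NUM" if word.isdigit() else "SYM"

class Lexer:
    def __init__(self, source):
        words = source.replace("(", " ( ").replace(")", " ) ").split()
        self.tokens = [Token(classify(w), w) for w in words]
        self.pos = 0

    def peek(self, k=1):
        """Type of the k-th upcoming token (assumed 1-based), or EOF."""
        i = self.pos + k - 1
        return self.tokens[i].type if i < len(self.tokens) else "EOF"

    def next(self):
        """Consume and return the next token (an EOF token at the end)."""
        if self.pos >= len(self.tokens):
            return Token("EOF", "")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def eat(self, t):
        """Consume the next token, failing unless it has type t."""
        tok = self.next()
        if tok.type != t:
            raise SyntaxError(f"expected {t}, got {tok.type}")

def parse_x(lx):
    t = lx.peek(1)
    if t == "LPAREN":              # X -> ( X* )
        lx.eat("LPAREN")
        items = parse_xstar(lx)
        lx.eat("RPAREN")
        return items
    if t in ("NUM", "SYM"):        # X -> num | sym
        return lx.next().text
    raise SyntaxError(f"unexpected {t}")

def parse_xstar(lx):
    items = []
    # X* -> X X* while the lookahead can start an X; otherwise X* -> eps.
    while lx.peek(1) in ("LPAREN", "NUM", "SYM"):
        items.append(parse_x(lx))
    return items

def parse(source):
    lx = Lexer(source)
    tree = parse_x(lx)
    lx.eat("EOF")
    return tree

print(parse("(f 1 (g))"))  # prints ['f', '1', ['g']]
```

One token of lookahead suffices here because each rule for X starts with a different token type.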

  50. CFG properties

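  51. Nullability The nullability function, $\delta : (A \cup N) \to \{\{\epsilon\}, \emptyset\}$, returns the set $\{\epsilon\}$ if the provided symbol can derive the empty string, and $\emptyset$ otherwise:
      $\delta(a) = \emptyset$ for every terminal $a \in A$
      $\delta(n) \supseteq \delta(s_1) \cdot \ldots \cdot \delta(s_m)$ if $(n \to s_1 \ldots s_m) \in R$
      $\delta(n) \supseteq \{\epsilon\}$ if $(n \to \epsilon) \in R$.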
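The nullability constraints can be solved by repeatedly scanning the rules until no new nullable non-terminal is found. The Python sketch below represents a grammar as a list of (left-hand side, right-hand side) pairs, which is an assumption of this example, as is the function name `nullable_nonterminals`. It returns the set of symbols n with δ(n) = {ε}.

```python
def nullable_nonterminals(rules):
    """rules: list of (lhs, rhs) pairs, rhs being a tuple of symbols.
    Returns the set of non-terminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # A rule makes lhs nullable when every symbol on its right-hand
            # side is nullable; an empty right-hand side qualifies trivially.
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

# The S-expression grammar: X -> ( X* ) | num | sym,  X* -> X X* | eps
SEXP_RULES = [
    ("X", ("(", "X*", ")")),
    ("X", ("num",)),
    ("X", ("sym",)),
    ("X*", ("X", "X*")),
    ("X*", ()),
]

print(nullable_nonterminals(SEXP_RULES))  # prints {'X*'}
```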


  53. Inclusion constraints
      $X_1 \supseteq f_1(X_1, \ldots, X_n)$
      $\vdots$
      $X_n \supseteq f_n(X_1, \ldots, X_n)$,


  55. Solving inclusions
      X_i ← ∅ for all i
      changed ← true
      while (changed)
          changed ← false
          for each i
              X′_i ← f_i(X_1, ..., X_n)
              if (X_i ≠ X′_i)
                  X_i ← X′_i
                  changed ← true
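The loop above translates almost line for line into Python. In the sketch below, each constraint is a function from the current list of sets to a new set; the name `solve` and this representation are assumptions. The sketch also takes the union of the old and new value, a small deviation from the slide that keeps every X_i growing even if some f_i is not monotone.

```python
def solve(fs):
    """fs: list of functions; fs[i](xs) computes f_i(X_1, ..., X_n).
    Returns the least sets xs with xs[i] containing fs[i](xs) for all i."""
    xs = [frozenset() for _ in fs]
    changed = True
    while changed:
        changed = False
        for i, f in enumerate(fs):
            new_value = xs[i] | f(xs)
            if new_value != xs[i]:
                xs[i] = new_value
                changed = True
    return xs

# Example system: X1 contains {a} U X2, and X2 contains {b} U X1.
constraints = [
    lambda xs: {"a"} | xs[1],
    lambda xs: {"b"} | xs[0],
]
print([sorted(x) for x in solve(constraints)])  # prints [['a', 'b'], ['a', 'b']]
```

The loop terminates whenever the sets can only grow within a finite universe, which is the case for nullability, first sets, and follow sets.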

  56. First sets In context-free grammars, first sets are easily computed with subset-inclusion constraints; for every rule $(n \to s_1 \ldots s_m) \in R$: $\mathit{first}(n) \supseteq \bigcup_{i = 1}^{m} \delta(s_1 \ldots s_{i-1}) \cdot \mathit{first}(s_i)$.
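As a concrete reading of the constraint above, the Python sketch below computes first sets by fixed-point iteration. For each rule it walks the right-hand side from the left, adding first(s_i) as long as every earlier symbol is nullable. The grammar representation, the function name `first_sets`, and the convention that first(a) = {a} for a terminal a are assumptions of this sketch; it also assumes every non-terminal has at least one rule.

```python
def first_sets(rules, terminals):
    """rules: list of (lhs, rhs) pairs; terminals: set of terminal symbols.
    Returns a dict mapping every symbol to its first set."""
    nullable = set()
    first = {a: {a} for a in terminals}
    for lhs, _ in rules:
        first.setdefault(lhs, set())
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # Nullability is computed in the same loop.
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
            for sym in rhs:
                if not first[sym] <= first[lhs]:
                    first[lhs] |= first[sym]
                    changed = True
                if sym not in nullable:
                    break  # later symbols are hidden behind this one
    return first

SEXP_RULES = [
    ("X", ("(", "X*", ")")),
    ("X", ("num",)),
    ("X", ("sym",)),
    ("X*", ("X", "X*")),
    ("X*", ()),
]
SEXP_TERMINALS = {"(", ")", "num", "sym"}

print(sorted(first_sets(SEXP_RULES, SEXP_TERMINALS)["X"]))
# prints ['(', 'num', 'sym']
```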


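  58. Follow sets The function $\mathit{follow} : (A \cup N) \to \mathcal{P}(A)$ maps each symbol to the set of terminals that can appear immediately after it; for every rule $n \to s_1 \ldots s_m$ and every position $i$: $\mathit{follow}(s_i) \supseteq \bigcup_{j = i}^{m-1} \delta(s_{i+1} \ldots s_j) \cdot \mathit{first}(s_{j+1}) \;\cup\; \delta(s_{i+1} \ldots s_m) \cdot \mathit{follow}(n)$.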
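The follow-set constraint can be solved with the same style of iteration. The Python sketch below computes nullability, first sets, and follow sets together in one loop; all three only ever grow, so the loop reaches the least solution. The grammar representation and the function name `follow_sets` are assumptions, and no end-of-input marker is added because the slide does not mention one.

```python
def follow_sets(rules, terminals):
    """rules: list of (lhs, rhs) pairs; terminals: set of terminal symbols.
    Returns a dict mapping every symbol to its follow set."""
    symbols = set(terminals) | {lhs for lhs, _ in rules}
    nullable = set()
    first = {s: ({s} if s in terminals else set()) for s in symbols}
    follow = {s: set() for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
            for sym in rhs:
                if not first[sym] <= first[lhs]:
                    first[lhs] |= first[sym]
                    changed = True
                if sym not in nullable:
                    break
            for i, sym in enumerate(rhs):
                new = set()
                rest_nullable = True
                for nxt in rhs[i + 1:]:
                    # first(s_{j+1}) counts while s_{i+1} .. s_j are nullable.
                    new |= first[nxt]
                    if nxt not in nullable:
                        rest_nullable = False
                        break
                if rest_nullable:
                    # Everything after s_i can vanish: inherit follow(lhs).
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

SEXP_RULES = [
    ("X", ("(", "X*", ")")),
    ("X", ("num",)),
    ("X", ("sym",)),
    ("X*", ("X", "X*")),
    ("X*", ()),
]
SEXP_TERMINALS = {"(", ")", "num", "sym"}

print(sorted(follow_sets(SEXP_RULES, SEXP_TERMINALS)["X*"]))  # prints [')']
```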


  60. CFL trivia • Are regular languages context-free? • Are CFLs closed under complement? • Is the intersection of CFLs context-free? • Does a CFG accept no strings? • Does a CFG accept a finite set? • Does a CFG accept every string? • Is one CFL a subset of another CFL?
