SLIDE 1 Packrat Parsing:
Simple, Powerful, Lazy, Linear Time
Bryan Ford
Massachusetts Institute of Technology
International Conference on Functional Programming, October 2002
SLIDE 2 Overview
- What Is Packrat Parsing?
- What Is It Good (and Not Good) For?
- Practical Experience
- Related Work
- Conclusion
SLIDE 3
What is Packrat Parsing?
Answer: Top-down parsing with backtracking, except that it uses memoization to achieve linear parse time
SLIDE 4
Example Grammar
Additive → Multitive '+' Additive | Multitive
Multitive → Primary '*' Multitive | Primary
Primary → '(' Additive ')' | Decimal
Decimal → '0' | ... | '9'
SLIDE 5
Recursive Descent Parser
SLIDE 6
Recursive Descent Parser
data Result v = Parsed v String | NoParse
SLIDE 8 Recursive Descent Parser
data Result v = Parsed v String | NoParse
Semantic Value (the v field of Parsed)
SLIDE 9 Recursive Descent Parser
data Result v = Parsed v String | NoParse
Remainder String (the String field of Parsed)
SLIDE 11
Recursive Descent Parser
data Result v = Parsed v String | NoParse
pAdditive  :: String -> Result Int
pMultitive :: String -> Result Int
pPrimary   :: String -> Result Int
pDecimal   :: String -> Result Int
SLIDE 12
Recursive Descent Parser
pAdditive :: String -> Result Int
SLIDE 13 Recursive Descent Parser
Multitive '+' Additive
Multitive
pAdditive :: String -> Result Int
SLIDE 14 Recursive Descent Parser
Multitive '+' Additive
Multitive
pAdditive :: String -> Result Int
pAdditive = (do l <- pMultitive
                char '+'
                r <- pAdditive
                return (l + r))
        <|> (do pMultitive)
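The do-notation on these slides presupposes a parser monad that is not shown. The following is a minimal sketch of one possible encoding, not the author's code: the `Parser` newtype, its instances, `runParser`, `char`, and `evalExpr` are assumptions made for illustration. Because the functions are wrapped in a newtype, the parsers here have type `Parser Int` rather than the bare `String -> Result Int` shown above.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (digitToInt, isDigit)

data Result v = Parsed v String | NoParse

-- Hypothetical wrapper so that do-notation and <|> can be used.
newtype Parser v = Parser { runParser :: String -> Result v }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Parsed v rest -> Parsed (f v) rest
    NoParse       -> NoParse

instance Applicative Parser where
  pure v = Parser $ \s -> Parsed v s
  Parser pf <*> Parser pv = Parser $ \s -> case pf s of
    Parsed f rest -> case pv rest of
      Parsed v rest' -> Parsed (f v) rest'
      NoParse        -> NoParse
    NoParse -> NoParse

instance Monad Parser where
  -- Sequencing: run p, then feed its remainder to the next parser.
  Parser p >>= f = Parser $ \s -> case p s of
    Parsed v rest -> runParser (f v) rest
    NoParse       -> NoParse

instance Alternative Parser where
  empty = Parser (const NoParse)
  -- Ordered choice: on failure, backtrack to the same input and try q.
  Parser p <|> Parser q = Parser $ \s -> case p s of
    NoParse -> q s
    ok      -> ok

char :: Char -> Parser ()
char c = Parser $ \s -> case s of
  (x : rest) | x == c -> Parsed () rest
  _                   -> NoParse

pAdditive, pMultitive, pPrimary, pDecimal :: Parser Int
pAdditive = (do l <- pMultitive
                char '+'
                r <- pAdditive
                return (l + r))
        <|> pMultitive

pMultitive = (do l <- pPrimary
                 char '*'
                 r <- pMultitive
                 return (l * r))
         <|> pPrimary

pPrimary = (do char '('
               v <- pAdditive
               char ')'
               return v)
       <|> pDecimal

pDecimal = Parser $ \s -> case s of
  (c : rest) | isDigit c -> Parsed (digitToInt c) rest
  _                      -> NoParse

-- Succeeds only if the whole input is consumed.
evalExpr :: String -> Maybe Int
evalExpr s = case runParser pAdditive s of
  Parsed v "" -> Just v
  _           -> Nothing
```

In GHCi, `evalExpr "2*(3+4)"` evaluates to `Just 14`. Note that this version still backtracks without memoization.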
SLIDE 21
Parsing Example
pAdditive "2*(3+4)"
SLIDE 22
Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
SLIDE 23 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"
SLIDE 24 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
SLIDE 25 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)"
SLIDE 26 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)"   P → '(' A ')'
SLIDE 28 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)"   P → D
SLIDE 29 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)"   P → D
      pDecimal "2*(3+4)"
SLIDE 30 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)"   P → D
      pDecimal "2*(3+4)" ⇒ Parsed 2 "*(3+4)"
SLIDE 31 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)" ⇒ Parsed 2 "*(3+4)"
SLIDE 32 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)" ⇒ Parsed 2 "*(3+4)"
    pChar '*' "*(3+4)"
SLIDE 33 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)" ⇒ Parsed 2 "*(3+4)"
    pChar '*' "*(3+4)" ⇒ Parsed () "(3+4)"
SLIDE 34 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)" ⇒ Parsed 2 "*(3+4)"
    pChar '*' "*(3+4)" ⇒ Parsed () "(3+4)"
    pMultitive "(3+4)"
SLIDE 35 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)" ⇒ Parsed 2 "*(3+4)"
    pChar '*' "*(3+4)" ⇒ Parsed () "(3+4)"
    pMultitive "(3+4)"
      . . .
SLIDE 36 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)"   M → P '*' M
    pPrimary "2*(3+4)" ⇒ Parsed 2 "*(3+4)"
    pChar '*' "*(3+4)" ⇒ Parsed () "(3+4)"
    pMultitive "(3+4)" ⇒ Parsed 7 ""
SLIDE 37 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)" ⇒ Parsed 14 ""
SLIDE 38 Parsing Example
pAdditive "2*(3+4)"   A → M '+' A
  pMultitive "2*(3+4)" ⇒ Parsed 14 ""
  pChar '+' ""
SLIDE 40
Parsing Example
pAdditive "2*(3+4)"   A → M
SLIDE 41 Parsing Example
pAdditive "2*(3+4)"   A → M
  pMultitive "2*(3+4)"
SLIDE 42 Parsing Example
pAdditive "2*(3+4)"   A → M
  pMultitive "2*(3+4)"
    . . .
SLIDE 43 The Backtracking Problem
- Can yield exponential worst-case parse times
SLIDE 44 The Backtracking Problem
- Can yield exponential worst-case parse times
- Typical solution: avoid backtracking by
  - Prediction using one-token lookahead
  - Hacking the grammar
  - Designing the language for easy parsing
SLIDE 45 The Backtracking Problem
- Can yield exponential worst-case parse times
- Typical solution: avoid backtracking by
  - Prediction using one-token lookahead
  - Hacking the grammar
  - Designing the language for easy parsing
- Alternate solution: allow backtracking; memoize all intermediate results.
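To make the exponential worst case concrete for the example grammar, here is a small sketch of a backtracking recognizer (semantic values omitted) that also counts how often Additive, Multitive, and Primary are invoked. It is an illustration only: the names `Counted`, `nested`, and `calls` are hypothetical and do not appear in the slides. On input wrapped in k levels of parentheses, both alternatives of Additive and of Multitive re-parse the same prefix, so each extra level multiplies the work by roughly four.

```haskell
-- Unconsumed input on success (Nothing on failure), paired with the number
-- of Additive/Multitive/Primary calls made to get there.
type Counted = (Maybe String, Int)

-- Additive → Multitive '+' Additive | Multitive
additive :: String -> Counted
additive s = case multitive s of
  (Just ('+' : rest), n1) -> case additive rest of
    (Just rest', n2) -> (Just rest', 1 + n1 + n2)
    (Nothing, n2)    -> retry (n1 + n2)
  (_, n1) -> retry n1
  where
    -- Second alternative: re-parse Multitive from the same position.
    retry spent = let (r, n) = multitive s in (r, 1 + spent + n)

-- Multitive → Primary '*' Multitive | Primary
multitive :: String -> Counted
multitive s = case primary s of
  (Just ('*' : rest), n1) -> case multitive rest of
    (Just rest', n2) -> (Just rest', 1 + n1 + n2)
    (Nothing, n2)    -> retry (n1 + n2)
  (_, n1) -> retry n1
  where
    -- Second alternative: re-parse Primary from the same position.
    retry spent = let (r, n) = primary s in (r, 1 + spent + n)

-- Primary → '(' Additive ')' | Decimal
primary :: String -> Counted
primary ('(' : rest) = case additive rest of
  (Just (')' : rest'), n) -> (Just rest', 1 + n)
  (_, n)                  -> (Nothing, 1 + n) -- Decimal cannot match '('
primary (c : rest) | c `elem` ['0' .. '9'] = (Just rest, 1)
primary _ = (Nothing, 1)

-- "1" wrapped in k pairs of parentheses, e.g. nested 2 == "((1))".
nested :: Int -> String
nested k = replicate k '(' ++ "1" ++ replicate k ')'

-- Total nonterminal calls needed to recognize (nested k).
calls :: Int -> Int
calls k = snd (additive (nested k))
```

`calls 0`, `calls 1`, and `calls 2` evaluate to 7, 35, and 147: every added pair of parentheses (two input characters) roughly quadruples the number of calls.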
SLIDE 46 Memoization of Results
Assumptions:
- Parsing functions depend only on input string.
- Parsing functions yield at most one result.
Implication:
- Requires results table of size m × (n+1)
  - m = number of nonterminals/parsing functions
  - n = length of input string
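As a worked instance of this bound, the sketch below computes the table size; the function name `tableCells` is made up for illustration. Assuming the five result kinds of the running example (Additive, Multitive, Primary, Decimal, and a single-character result) and the 7-character input "2*(3+4)", the table has 5 × 8 = 40 cells.

```haskell
-- Number of memo-table cells for m parsing functions and n input characters.
-- The extra column (n + 1) is the end-of-input position.
tableCells :: Int -> Int -> Int
tableCells m n = m * (n + 1)
```

`tableCells 5 7` evaluates to 40.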
SLIDE 47
Building a Packrat Parser
SLIDE 48
Building a Packrat Parser
data Result v = Parsed v String | NoParse
pAdditive  :: String -> Result Int
pMultitive :: String -> Result Int
pPrimary   :: String -> Result Int
pDecimal   :: String -> Result Int
SLIDE 50
Building a Packrat Parser
data Result v = Parsed v Derivs | NoParse
pAdditive  :: Derivs -> Result Int
pMultitive :: Derivs -> Result Int
pPrimary   :: Derivs -> Result Int
pDecimal   :: Derivs -> Result Int
SLIDE 51
Building a Packrat Parser
data Result v = Parsed v Derivs | NoParse
data Derivs = Derivs {
    dvAdditive  :: Result Int,
    dvMultitive :: Result Int,
    dvPrimary   :: Result Int,
    dvDecimal   :: Result Int,
    dvChar      :: Result Char}
SLIDE 52 Building the Derivs Structure
parse :: String -> Derivs
parse s = d where
SLIDE 53 Building the Derivs Structure
parse :: String -> Derivs
parse s = d where
  d = Derivs add mult prim dec chr
SLIDE 54 Building the Derivs Structure
parse :: String -> Derivs
parse s = d where
  d   = Derivs add mult prim dec chr
  chr = case s of
          (c:s') -> Parsed c (parse s')
          []     -> NoParse
SLIDE 61 Building the Derivs Structure
parse :: String -> Derivs
parse s = d where
  d    = Derivs add mult prim dec chr
  chr  = case s of
           (c:s') -> Parsed c (parse s')
           []     -> NoParse
  add  = pAdditive d
  mult = pMultitive d
  prim = pPrimary d
  dec  = pDecimal d
SLIDE 63
Modifying the Parsing Functions
pAdditive :: Derivs -> Result Int
pAdditive = (do l <- pMultitive
                char '+'
                r <- pAdditive
                return (l + r))
        <|> (do pMultitive)
SLIDE 65
Modifying the Parsing Functions
pAdditive :: Derivs -> Result Int
pAdditive = (do l <- dvMultitive
                char '+'
                r <- dvAdditive
                return (l + r))
        <|> (do dvMultitive)
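Putting the pieces together, the following is a sketch of a complete packrat parser for the example grammar. It uses explicit pattern matching instead of the do-notation above, since that notation needs a monad wrapper the slides do not show. The bodies of `pMultitive`, `pPrimary`, and `pDecimal`, and the helper `eval`, are not on the slides; they are filled in here as assumptions that follow the example grammar.

```haskell
import Data.Char (digitToInt, isDigit)

-- The remainder is a Derivs rather than a String, so any later lookup at
-- that position reuses results that were already computed.
data Result v = Parsed v Derivs | NoParse

-- One column of the memo table: every parsing function's result at a single
-- input position. Fields are lazy, so each is computed at most once, and
-- only if something demands it.
data Derivs = Derivs
  { dvAdditive  :: Result Int
  , dvMultitive :: Result Int
  , dvPrimary   :: Result Int
  , dvDecimal   :: Result Int
  , dvChar      :: Result Char
  }

parse :: String -> Derivs
parse s = d
  where
    d    = Derivs add mult prim dec chr
    add  = pAdditive d
    mult = pMultitive d
    prim = pPrimary d
    dec  = pDecimal d
    chr  = case s of
      (c : s') -> Parsed c (parse s')
      []       -> NoParse

-- Additive → Multitive '+' Additive | Multitive
pAdditive :: Derivs -> Result Int
pAdditive d = case dvMultitive d of
  Parsed l d1
    | Parsed '+' d2 <- dvChar d1
    , Parsed r d3 <- dvAdditive d2 -> Parsed (l + r) d3
  _ -> dvMultitive d -- second alternative: a memoized lookup, not a re-parse

-- Multitive → Primary '*' Multitive | Primary
pMultitive :: Derivs -> Result Int
pMultitive d = case dvPrimary d of
  Parsed l d1
    | Parsed '*' d2 <- dvChar d1
    , Parsed r d3 <- dvMultitive d2 -> Parsed (l * r) d3
  _ -> dvPrimary d

-- Primary → '(' Additive ')' | Decimal
pPrimary :: Derivs -> Result Int
pPrimary d = case dvChar d of
  Parsed '(' d1
    | Parsed v d2 <- dvAdditive d1
    , Parsed ')' d3 <- dvChar d2 -> Parsed v d3
  _ -> dvDecimal d

-- Decimal → '0' | ... | '9'
pDecimal :: Derivs -> Result Int
pDecimal d = case dvChar d of
  Parsed c d1 | isDigit c -> Parsed (digitToInt c) d1
  _ -> NoParse

-- Hypothetical entry point: succeed only if the whole input is consumed.
eval :: String -> Maybe Int
eval s = case dvAdditive (parse s) of
  Parsed v rest | NoParse <- dvChar rest -> Just v
  _ -> Nothing
```

`eval "2*(3+4)"` evaluates to `Just 14`. When the first alternative of `pAdditive` fails, falling back to `dvMultitive d` reads a field that has already been evaluated, which is where the saving over plain backtracking comes from.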
SLIDE 66 Packrat Parsing Example
parse "2*(3+4)"
SLIDE 67 Packrat Parsing Example
[Table: rows A, M, P, D, C; one column per input position. First column: all five cells "?"]
SLIDE 68 Packrat Parsing Example
parse "*(3+4)"
[Table: column '2' now has C = 2; A, M, P, D are "?"]
SLIDE 69 Packrat Parsing Example
parse "(3+4)"
[Table: column '*' added with C = *]
SLIDE 70 Packrat Parsing Example
parse "3+4)"
[Table: column '(' added with C = (]
SLIDE 71 Packrat Parsing Example
parse "+4)"
[Table: column '3' added with C = 3]
SLIDE 72 Packrat Parsing Example
parse "4)"
[Table: column '+' added with C = +]
SLIDE 73 Packrat Parsing Example
parse ")"
[Table: column '4' added with C = 4]
SLIDE 74 Packrat Parsing Example
parse ""
[Table: column ')' added with C = )]
SLIDE 75 Packrat Parsing Example
[Table: end-of-input column added with C = ✘]
(? = not yet evaluated; ✘ = no parse)
         2    *    (    3    +    4    )   end
    A    ?    ?    ?    ?    ?    ?    ?    ?
    M    ?    ?    ?    ?    ?    ?    ?    ?
    P    ?    ?    ?    ?    ?    ?    ?    ?
    D    ?    ?    ?    ?    ?    ?    ?    ?
    C    2    *    (    3    +    4    )    ✘
SLIDE 76 Packrat Parsing Example
(eval) A → M + A
[Table unchanged]
SLIDE 77 Packrat Parsing Example
(eval) M → P * M
[Table unchanged]
SLIDE 78 Packrat Parsing Example
(eval) P → D
[Table unchanged]
SLIDE 79 Packrat Parsing Example
(eval) D → '2'
[Table unchanged]
SLIDE 80 Packrat Parsing Example
(eval) P → D
[Table: column '2' now has D = 2]
SLIDE 81 Packrat Parsing Example
(eval) M → P * M
[Table: column '2' now has P = 2 and D = 2]
SLIDE 84 Packrat Parsing Example
(eval)
(? = not yet evaluated; ✘ = no parse)
         2    *    (    3    +    4    )   end
    A    ?    ?    ?    7    ?    4    ?    ?
    M    ?    ?    7    3    ?    4    ?    ?
    P    2    ?    7    3    ?    4    ?    ?
    D    2    ?    ?    3    ?    4    ?    ?
    C    2    *    (    3    +    4    )    ✘
SLIDE 85 Packrat Parsing Example
(eval) M → P * M
[Table unchanged]
SLIDE 86 Packrat Parsing Example
(eval) A → M + A
[Table: column '2' now has M = 14]
SLIDE 88 Packrat Parsing Example
(eval) A → M
[Table: column '2' has M = 14, P = 2, D = 2; A still "?"]
SLIDE 89 Packrat Parsing Example
(eval)
(? = not yet evaluated; ✘ = no parse)
         2    *    (    3    +    4    )   end
    A   14    ?    ?    7    ?    4    ?    ?
    M   14    ?    7    3    ?    4    ?    ?
    P    2    ?    7    3    ?    4    ?    ?
    D    2    ?    ?    3    ?    4    ?    ?
    C    2    *    (    3    +    4    )    ✘
SLIDE 90
What is Packrat Parsing Good (and not good) For?
SLIDE 91 Theoretical Properties
- Formally developed by Birman in the 1970s
  - Proved existence of linear-time parsing algorithm
  - ...but apparently never implemented
- Recognizable languages:
  - Strictly larger than deterministic parsing algorithms: e.g., LL(k), LR(k)
  - Incomparable to class of context-free languages
SLIDE 92 Scannerless Parsing
- Traditional linear-time parsers limited by fixed (e.g., one-token) lookahead
  - If we only have one lookahead token, then it's easier if tokens are big.
- Packrat parsers provide unlimited lookahead
  - No longer need to separate lexical analysis
- Why scannerless parsing?
  - Simplicity: unified grammar for entire language
  - Power: lexical elements with complex syntax
SLIDE 93 Syntactic Flexibility
- Syntactic predicates
  - Parse X only if Y also matches
  - Parse X only if followed by Y
- Subtractive syntax
  - Parse X only if Y doesn't match
  - Parse X only if not followed by Y
- Semantic predicates
  - Parse X if its semantic value satisfies condition
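The five forms listed on this slide can each be written as a small combinator. The sketch below uses a deliberately simple parser type without memoization, and every name in it (`Parser`, `onlyIf`, `followedBy`, `butNot`, `notFollowedBy`, `satisfying`, and the example parsers) is an assumption chosen for illustration, not something taken from the slides.

```haskell
import Data.Char (digitToInt, isAlpha, isDigit)
import Data.List (stripPrefix)

-- Success returns a value and the unconsumed input.
type Parser a = String -> Maybe (a, String)

-- Syntactic predicate: parse x only if y also matches at the same position.
onlyIf :: Parser a -> Parser b -> Parser a
onlyIf x y s = y s >> x s

-- Syntactic predicate: parse x only if the input after x matches y.
-- y is pure lookahead; the input it matched is not consumed.
followedBy :: Parser a -> Parser b -> Parser a
followedBy x y s = do
  (v, rest) <- x s
  _ <- y rest
  Just (v, rest)

-- Subtractive syntax: parse x only if y does not match at the same position.
butNot :: Parser a -> Parser b -> Parser a
butNot x y s = case y s of
  Nothing -> x s
  Just _  -> Nothing

-- Subtractive syntax: parse x only if the input after x does not match y.
notFollowedBy :: Parser a -> Parser b -> Parser a
notFollowedBy x y s = do
  (v, rest) <- x s
  case y rest of
    Nothing -> Just (v, rest)
    Just _  -> Nothing

-- Semantic predicate: parse x only if its value satisfies a condition.
satisfying :: Parser a -> (a -> Bool) -> Parser a
satisfying x ok s = do
  (v, rest) <- x s
  if ok v then Just (v, rest) else Nothing

-- Small example parsers to exercise the combinators.
literal :: String -> Parser String
literal lit s = (\rest -> (lit, rest)) <$> stripPrefix lit s

letter :: Parser Char
letter (c : rest) | isAlpha c = Just (c, rest)
letter _ = Nothing

digit :: Parser Int
digit (c : rest) | isDigit c = Just (digitToInt c, rest)
digit _ = Nothing

-- The keyword "if", but not the first two letters of an identifier.
keywordIf :: Parser String
keywordIf = literal "if" `notFollowedBy` letter

-- A digit whose numeric value is even.
evenDigit :: Parser Int
evenDigit = digit `satisfying` even
```

For example, `keywordIf "if (x)"` succeeds with `Just ("if", " (x)")`, while `keywordIf "iffy"` is `Nothing`. This kind of rule is one way a scannerless grammar can separate keywords from identifiers.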
SLIDE 94 Limitations
What is a packrat parser not good for?
- General CFG parsing: e.g., ambiguous grammars
  (because of "at-most-one result" limitation)
- Parsing highly "stateful" syntax: e.g., C, C++
  (memoization depends on statelessness)
- Parsing in minimal space
  (LL/LR parser storage grows with stack depth, not input size)
SLIDE 95 Practical Parsers
Example packrat parser for the Java language:
- Unified (scannerless) parser
- Implemented in Haskell
- Three versions:
  1. Hand-coded with monadic combinators
  2. Hand-coded with primitive pattern-matching
  3. Automatically built by prototype parser generator
SLIDE 96 Performance Results (Summary)
Parse Time:
- Reliably linear growth with input size
- 26-52 KB/s (600-1200 lines/s)
  (GHC 5.04, 1.2 GHz Athlon)
- Comparable to Happy-generated LR parser
  (faster for average-size Java sources)
Heap Usage:
- Reliably linear growth with input size
- 300-600× expansion ratio
SLIDE 97 Related Work
Functional/Monadic Parsing:
- Wadler, Fokker, Hutton, Meijer, etc.
Scannerless Parsing:
- Tai, Salomon, Cormack: NSLR(1)
  (linear time, but restrictive of grammar)
- Visser et al.: Generalized-LR
  (not linear time)
Syntactic & Semantic Predicates:
- Parr, Quong: pred-LL(k)
SLIDE 98 Conclusion
Packrat parsing:
- Uses memoization to provide backtracking and unlimited lookahead in a linear-time parser
- Is easily expressed as a lazy data structure
- Provides more flexibility than LL or LR parsing
- Enables practical scannerless parsing
- Has substantial storage cost, but often reasonable
SLIDE 99 For More Information
- Papers, Master's Thesis
- Prototype Packrat Parser Generator
- Source Code for Example Parsers
- Test Suite for Example Java Parsers
available at:
http://pdos.lcs.mit.edu/~baford/packrat/