Introduction to Functional Programming: Lecture 10

slide-1
SLIDE 1 Introduction to Functional Programming: Lecture 10

Introduction to Functional Programming
John Harrison
University of Cambridge

Lecture 10
ML examples II: Recursive Descent Parsing

Topics covered:
  • The parsing problem
  • Recursive descent
  • Parsers in ML
  • Higher order parser combinators
  • Efficiency and limitations.

John Harrison, University of Cambridge, 5 February 1998
slide-2
SLIDE 2: Grammar for terms

We would like to have a parser for our terms, so that we don't have to write them in terms of type constructors.

    term     → name(termlist)
             | name
             | (term)
             | numeral
             | -term
             | term + term
             | term * term
    termlist → term,termlist
             | term

Here we have a grammar for terms, defined by a set of production rules.
slide-3
SLIDE 3: Ambiguity

The task of parsing, in general, is to reverse this, i.e. find a sequence of productions that could generate a given string.

Unfortunately the above grammar is ambiguous, since certain strings can be produced in several ways, e.g.

    term → term + term
         → term + term * term

and

    term → term * term
         → term + term * term

These correspond to different 'parse trees'. Effectively, we are free to interpret x + y * z either as x + (y * z) or (x + y) * z.
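The two readings can be written down as distinct values of a term datatype. The sketch below assumes a declaration built from the constructors Var, Const and Fn that are used later in this lecture; the exact declaration is an assumption, not taken from these slides.

```sml
(* Assumed term type, matching the constructors used later in the lecture. *)
datatype term = Var of string
              | Const of string
              | Fn of string * term list;

(* x + (y * z): the product is a subterm of the sum. *)
val tree1 = Fn("+", [Var "x", Fn("*", [Var "y", Var "z"])]);

(* (x + y) * z: the sum is a subterm of the product. *)
val tree2 = Fn("*", [Fn("+", [Var "x", Var "y"]), Var "z"]);

(* The trees are different values, so the string alone does not fix the meaning. *)
val distinct = tree1 <> tree2;
```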
slide-4
SLIDE 4: Encoding precedences

We can encode operator precedences by introducing extra categories, e.g.

    atom     → name(termlist)
             | name
             | numeral
             | (term)
             | -atom
    mulexp   → atom * mulexp
             | atom
    term     → mulexp + term
             | mulexp
    termlist → term,termlist
             | term

Now it's unambiguous. Multiplication has higher precedence and both infixes associate to the right.
slide-5
SLIDE 5: Recursive descent

A recursive descent parser is a series of mutually recursive functions, one for each syntactic category (term, mulexp etc.). The mutually recursive structure mirrors that in the grammar.

This makes them quite easy and natural to write, especially in ML, where recursion is the principal control mechanism.

For example, the procedure for parsing terms, say term, will, on encountering a - symbol, make a recursive call to itself to parse the subterm, and on encountering a name followed by an opening parenthesis, will make a recursive call to termlist. This in itself will make at least one recursive call to term, and so on.
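The mutual recursion can be seen in a hand-written parser for a deliberately tiny grammar. This is only a sketch: the grammar (item and sum), the function names and the choice to compute an integer directly are all assumptions made for illustration, and it uses plain pattern matching rather than the combinators developed in this lecture.

```sml
exception Noparse;

(* Grammar:   item -> digit | ( sum )
              sum  -> item + sum | item
   One function per syntactic category; each returns the value parsed
   together with the characters it did not consume. *)
fun item (#"(" :: rest) =
      (case sum rest of
           (v, #")" :: rest') => (v, rest')
         | _ => raise Noparse)
  | item (c :: rest) =
      if Char.isDigit c then (ord c - ord #"0", rest) else raise Noparse
  | item [] = raise Noparse
and sum input =
      let val (v1, rest1) = item input
      in case rest1 of
             #"+" :: rest2 =>
               let val (v2, rest3) = sum rest2 in (v1 + v2, rest3) end
           | _ => (v1, rest1)
      end;

(* item calls sum for the parenthesised part, and sum calls item again. *)
val (value, leftover) = sum (String.explode "(1+2)+3");
```

Here value is 6 and leftover is the empty list, since the whole input is consumed.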
slide-6
SLIDE 6: Parsers in ML

We assume that a parser accepts a list of input characters or tokens of arbitrary type.

It returns the result of parsing, which has some other arbitrary type, and also the list of input objects not yet processed. Therefore the type of a parser is:

    (α)list → β × (α)list

For example, when given the input characters (x + y) * z the function atom will process the characters (x + y) and leave the remaining characters * z.

It might return a parse tree for the processed expression using our earlier recursive type, and hence we would have:

    atom "(x + y) * z" = Fn("+",[Var "x", Var "y"]),"* z"
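In ML notation the parser type can be written as a type abbreviation. The following minimal sketch assumes the name parser for the abbreviation and a hypothetical digit parser; the exception Noparse is the one this lecture declares for parse failure.

```sml
exception Noparse;

(* The parser type in ML notation: consume a prefix of the input,
   return a result and the unprocessed remainder. *)
type ('a,'b) parser = 'a list -> 'b * 'a list;

(* Hypothetical example: parse one decimal digit from a list of characters. *)
fun digit (c :: rest) =
      if Char.isDigit c then (ord c - ord #"0", rest) else raise Noparse
  | digit [] = raise Noparse;

(* The annotation checks that digit really has the parser type. *)
val digitParser : (char,int) parser = digit;

val (result, remaining) = digitParser (String.explode "42+x");
```

Only the first character is consumed, so result is 4 and remaining holds the characters of "2+x".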
slide-7
SLIDE 7: Parser combinators

In ML, we can define a series of combinators for plugging parsers together and creating new parsers from existing ones.

By giving some of them infix status, we can make the ML parser program look quite similar in structure to the original grammar.

First we declare an exception to be used where parsing fails:

    exception Noparse;

p1 ++ p2 applies p1 first and then applies p2 to the remaining tokens; many keeps applying the same parser as long as possible.

p >> f works like p but then applies f to the result of the parse.

p1 || p2 tries p1 first, and if that fails, tries p2.

We will give ++, >> and || infix status, in decreasing order of precedence.
slide-8
SLIDE 8: Definitions of the combinators

    fun ++ (parser1,parser2) input =
      let val (result1,rest1) = parser1 input
          val (result2,rest2) = parser2 rest1
      in ((result1,result2),rest2)
      end;

    fun many parser input =
      let val (result,next) = parser input
          val (results,rest) = many parser next
      in ((result::results),rest)
      end
      handle Noparse => ([],input);

    fun >> (parser,treatment) input =
      let val (result,rest) = parser input
      in (treatment(result),rest)
      end;

    fun || (parser1,parser2) input =
      parser1 input
      handle Noparse => parser2 input;
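A small usage sketch may help to see what each combinator returns. It repeats the definitions so that it runs on its own, declares the operators infix first (hence the op prefix in the definitions), and assumes a hypothetical primitive parser called one that accepts a single given character.

```sml
exception Noparse;
infixr 8 ++;
infixr 7 >>;
infixr 6 ||;

fun op ++ (parser1,parser2) input =
  let val (result1,rest1) = parser1 input
      val (result2,rest2) = parser2 rest1
  in ((result1,result2),rest2) end;

fun many parser input =
  let val (result,next) = parser input
      val (results,rest) = many parser next
  in (result::results,rest) end
  handle Noparse => ([],input);

fun op >> (parser,treatment) input =
  let val (result,rest) = parser input
  in (treatment result,rest) end;

fun op || (parser1,parser2) input =
  parser1 input handle Noparse => parser2 input;

(* Hypothetical primitive: accept exactly the character c. *)
fun one c (h::t) = if h = c then (h,t) else raise Noparse
  | one c [] = raise Noparse;

(* Sequencing pairs up the two results. *)
val r1 = (one #"a" ++ one #"b") (String.explode "abc");
(* Repetition collects results until the parser fails. *)
val r2 = many (one #"a") (String.explode "aab");
(* A treatment is applied to the result only; the rest is untouched. *)
val r3 = (one #"a" >> Char.toUpper) (String.explode "ab");
(* The second alternative is tried on the same input after a failure. *)
val r4 = (one #"a" || one #"b") (String.explode "bc");
```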
slide-9
SLIDE 9: Auxiliary functions

We make some of these infix:

    infixr 8 ++;
    infixr 7 >>;
    infixr 6 ||;

We will use the following general functions below:

    fun itlist f [] b = b
      | itlist f (h::t) b = f h (itlist f t b);

    fun K x y = x;

    fun fst(x,y) = x;

    fun snd(x,y) = y;

    val explode = map str o explode;
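As a quick check of what the two less obvious helpers do, the sketch below applies itlist (a right fold with a curried function) and the redefined explode to small inputs. The names total, joined and pieces are chosen here for illustration only.

```sml
fun itlist f [] b = b
  | itlist f (h::t) b = f h (itlist f t b);

(* itlist f [x1,x2,x3] b computes f x1 (f x2 (f x3 b)). *)
val total  = itlist (fn x => fn acc => x + acc) [1,2,3] 0;
val joined = itlist (fn s1 => fn s2 => s1 ^ s2) ["a","b","c"] "";

(* The built-in explode yields characters; composing it with str
   yields one-character strings instead, which the lexer compares. *)
val explode = map str o explode;
val pieces = explode "x+y";
```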
slide-10
SLIDE 10: Atomic parsers

We need a few primitive parsers to get us started.

    fun some p [] = raise Noparse
      | some p (h::t) = if p h then (h,t)
                        else raise Noparse;

    fun a tok = some (fn item => item = tok);

    fun finished input =
      if input = [] then (0,input) else raise Noparse;

The first two accept something satisfying p, and something equal to tok, respectively. The last one makes sure there is no unprocessed input.
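The behaviour of the three primitives, including failure, can be sketched on tiny inputs. The inputs and the result names below are illustrative assumptions; the definitions are repeated so that the block runs on its own.

```sml
exception Noparse;

fun some p [] = raise Noparse
  | some p (h::t) = if p h then (h,t) else raise Noparse;

fun a tok = some (fn item => item = tok);

fun finished input =
  if input = [] then (0,input) else raise Noparse;

(* some accepts the first item when the predicate holds. *)
val r1 = some Char.isDigit [#"7", #"x"];
(* a accepts the first item when it equals the expected token. *)
val r2 = a "+" ["+", "y"];
(* finished succeeds only on empty input, returning a dummy 0. *)
val r3 = finished ([] : string list);

(* Failure is signalled by Noparse, which the caller can handle. *)
val failed = (a "+" ["*", "y"]; false) handle Noparse => true;
```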
slide-11
SLIDE 11: Lexical analysis

First we want to do lexical analysis, i.e. split the input characters into tokens. This can also be done using our combinators, together with a few character discrimination functions. First we declare the type of tokens:

    datatype token = Name of string
                   | Num of string
                   | Other of string;

We want the lexer to accept a string and produce a list of tokens, ignoring spaces, e.g.

    - lex "sin(x + y) * cos(2 * x + y)";
    > val it = [Name "sin", Other "(", Name "x", Other "+",
                Name "y", Other ")", Other "*", Name "cos",
                Other "(", Num "2", Other "*", Name "x",
                Other "+", Name "y", Other ")"] : token list;
slide-12
SLIDE 12: Definition of the lexer

    val lex =
      let fun several p = many (some p)
          fun lowercase_letter s = "a" <= s andalso s <= "z"
          fun uppercase_letter s = "A" <= s andalso s <= "Z"
          fun letter s = lowercase_letter s orelse uppercase_letter s
          fun alpha s = letter s orelse s = "_" orelse s = "'"
          fun digit s = "0" <= s andalso s <= "9"
          fun alphanum s = alpha s orelse digit s
          fun space s = s = " " orelse s = "\n" orelse s = "\t"
          fun collect(h,t) = h^(itlist (fn s1 => fn s2 => s1^s2) t "")
          val rawname = some alpha ++ several alphanum >> (Name o collect)
          val rawnumeral = some digit ++ several digit >> (Num o collect)
          val rawother = some (K true) >> Other
          val token = (rawname || rawnumeral || rawother) ++ several space >> fst
          val tokens = (several space ++ many token) >> snd
          val alltokens = (tokens ++ finished) >> fst
      in fst o alltokens o explode
      end;
slide-13
SLIDE 13: Parsing terms

In order to parse terms, we start with some basic parsers for single tokens of a particular kind:

    fun name (Name s::rest) = (s,rest)
      | name _ = raise Noparse;

    fun numeral (Num s::rest) = (s,rest)
      | numeral _ = raise Noparse;

    fun other (Other s::rest) = (s,rest)
      | other _ = raise Noparse;

Now we can define a parser for terms, in a form very similar to the original grammar. The main difference is that each production rule has associated with it some sort of special action to take as a result of parsing.
slide-14
SLIDE 14: The term parser (take 1)

    fun atom input
          = (name ++ a (Other "(") ++ termlist ++ a (Other ")")
                 >> (fn (f,(_,(a,_))) => Fn(f,a))
          || name
                 >> (fn s => Var s)
          || numeral
                 >> (fn s => Const s)
          || a (Other "(") ++ term ++ a (Other ")")
                 >> (fst o snd)
          || a (Other "-") ++ atom
                 >> snd) input
    and mulexp input
          = (atom ++ a(Other "*") ++ mulexp
                 >> (fn (a,(_,m)) => Fn("*",[a,m]))
          || atom) input
    and term input
          = (mulexp ++ a(Other "+") ++ term
                 >> (fn (a,(_,m)) => Fn("+",[a,m]))
          || mulexp) input
    and termlist input
          = (term ++ a (Other ",") ++ termlist
                 >> (fn (h,(_,t)) => h::t)
          || term
                 >> (fn h => [h])) input;
slide-15
SLIDE 15: Examples

Let us package everything up as a single parsing function:

    val parser = fst o (term ++ finished >> fst) o lex;

To see it in action, we try with and without the printer (see above) installed:

    - parser "sin(x + y) * cos(2 * x + y)";
    > val it = Fn("*",
                  [Fn("sin", [Fn("+", [Var "x", Var "y"])]),
                   Fn("cos", [Fn("+", [Fn("*", [Const "2", Var "x"]),
                                       Var "y"])])]) : term
    - installPP print_term;
    > val it = () : unit
    - parser "sin(x + y) * cos(2 * x + y)";
    > val it = `sin(x + y) * cos(2 * x + y)` : term
slide-16
SLIDE 16: Automating precedence parsing

We can easily let ML construct the 'fixed-up' grammar from our dynamic list of infixes:

    fun binop opr parser input =
      let val (result as (atom1,rest1)) = parser input
      in if rest1 <> [] andalso hd rest1 = Other opr
         then let val (atom2,rest2) = binop opr parser (tl rest1)
              in (Fn(opr,[atom1, atom2]),rest2)
              end
         else result
      end;

    fun findmin l =
      itlist (fn (p1 as (_,pr1)) => fn (p2 as (_,pr2)) =>
                if pr1 <= pr2 then p1 else p2)
             (tl l) (hd l);

    fun delete x (h::t) = if h = x then t else h::(delete x t);

    fun precedence ilist parser input =
      if ilist = [] then parser input
      else let val opp = findmin ilist
               val ilist' = delete opp ilist
           in binop (fst opp) (precedence ilist' parser) input
           end;
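The order in which precedence peels operators off the list can be checked without any parsing at all. In this sketch the infix table and the helper order are hypothetical, and delete gets an extra clause for the empty list, added here only to make the match exhaustive.

```sml
fun itlist f [] b = b
  | itlist f (h::t) b = f h (itlist f t b);

fun findmin l =
  itlist (fn (p1 as (_,pr1)) => fn (p2 as (_,pr2)) =>
            if pr1 <= pr2 then p1 else p2)
         (tl l) (hd l);

(* Extra clause for [] is an addition for this sketch. *)
fun delete x (h::t) = if h = x then t else h::(delete x t)
  | delete _ [] = [];

(* Hypothetical infix table: operator name paired with its precedence. *)
val infixes = [("*", 20), ("+", 10), ("^", 30)];

(* Repeatedly take the lowest-precedence operator, as precedence does. *)
fun order [] = []
  | order l =
      let val opp = findmin l
      in #1 opp :: order (delete opp l) end;

val peeled = order infixes;
```

The operators come out as "+", "*", "^": the loosest-binding operator is handled by the outermost binop and so ends up nearest the root of the parse tree.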
slide-17
SLIDE 17: The term parser (take 2)

Now the main parser is simpler and more general.

    fun atom input
          = (name ++ a (Other "(") ++ termlist ++ a (Other ")")
                 >> (fn (f,(_,(a,_))) => Fn(f,a))
          || name
                 >> (fn s => Var s)
          || numeral
                 >> (fn s => Const s)
          || a (Other "(") ++ term ++ a (Other ")")
                 >> (fst o snd)
          || a (Other "-") ++ atom
                 >> snd) input
    and term input = precedence (!infixes) atom input
    and termlist input
          = (term ++ a (Other ",") ++ termlist
                 >> (fn (h,(_,t)) => h::t)
          || term
                 >> (fn h => [h])) input;

This will dynamically construct the precedence parser using the list of infixes active when it is actually used. Now the basic grammar is simpler.
slide-18
SLIDE 18: Backtracking and reprocessing

Some productions for the same syntactic category have a common prefix. Note that our production rules for term have this property:

    term → name(termlist)
         | name
         | ···

We carefully put the longer production first in our actual implementation, otherwise success in reading a name would cause the abandonment of attempts to read a parenthesized list of arguments.

However, this backtracking can lead to our processing the initial name twice.

This is not very serious here, but it could be in termlist.
slide-19
SLIDE 19: An improved treatment

We can easily replace:

    fun ...
    and termlist input
          = (term ++ a (Other ",") ++ termlist
                 >> (fn (h,(_,t)) => h::t)
          || term
                 >> (fn h => [h])) input;

with

    fun ...
    and termlist input
          = (term ++ many (a (Other ",") ++ term >> snd)
                 >> (fn (h,t) => h::t)) input;

This gives another improvement to the parser, which is now more efficient and slightly simpler. The final version is:
slide-20
SLIDE 20: The term parser (take 3)

    fun atom input
          = (name ++ a (Other "(") ++ termlist ++ a (Other ")")
                 >> (fn (f,(_,(a,_))) => Fn(f,a))
          || name
                 >> (fn s => Var s)
          || numeral
                 >> (fn s => Const s)
          || a (Other "(") ++ term ++ a (Other ")")
                 >> (fst o snd)
          || a (Other "-") ++ atom
                 >> snd) input
    and term input = precedence (!infixes) atom input
    and termlist input
          = (term ++ many (a (Other ",") ++ term >> snd)
                 >> (fn (h,t) => h::t)) input;
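The many-based list rule can be tried in isolation. The sketch below is self-contained and works on plain string tokens; item is a hypothetical stand-in for term (any token other than "," or ")"), and itemlist mirrors the shape of the improved termlist.

```sml
exception Noparse;
infixr 8 ++;
infixr 7 >>;

fun op ++ (p1,p2) input =
  let val (r1,rest1) = p1 input
      val (r2,rest2) = p2 rest1
  in ((r1,r2),rest2) end;

fun many p input =
  let val (r,next) = p input
      val (rs,rest) = many p next
  in (r::rs,rest) end
  handle Noparse => ([],input);

fun op >> (p,f) input =
  let val (r,rest) = p input in (f r,rest) end;

fun some p [] = raise Noparse
  | some p (h::t) = if p h then (h,t) else raise Noparse;
fun a tok = some (fn item => item = tok);
fun snd (x,y) = y;

(* Hypothetical stand-in for term. *)
val item = some (fn s => s <> "," andalso s <> ")");

(* One item, then zero or more ", item" groups. The first item is
   parsed exactly once, so nothing is reprocessed on backtracking. *)
val itemlist = item ++ many (a "," ++ item >> snd) >> (fn (h,t) => h::t);

val (items, rest) = itemlist ["x", ",", "y", ",", "z", ")"];
```

Parsing stops at the closing parenthesis, which is left in rest for the enclosing rule to consume.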
slide-21
SLIDE 21: General remarks

With care, this parsing method can be used effectively. It is a good illustration of the power of higher order functions.

The code of such a parser is highly structured and similar to the grammar, therefore easy to modify.

However it is not as efficient as LR parsers; ML-Yacc is capable of generating good LR parsers automatically.

Recursive descent also has trouble with left recursion. For example, if we had wanted to make the addition operator left-associative in our earlier grammar, we could have used:

    term → term + mulexp
         | mulexp

The naive transcription into ML would loop indefinitely. However we can often replace such constructs with explicit repetitions.
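One way to read "explicit repetition" is to parse a first operand and then loop over the following "+ operand" groups, building the tree leftwards as they arrive. The sketch below is an assumption about how this could be done, not the lecture's own code: it uses a cut-down term type, plain string tokens, and a hypothetical operand parser in place of mulexp.

```sml
exception Noparse;

datatype term = Var of string | Fn of string * term list;

(* Hypothetical operand parser: any token other than "+" is a variable. *)
fun operand (tok :: rest) =
      if tok = "+" then raise Noparse else (Var tok, rest)
  | operand [] = raise Noparse;

(* term -> operand { + operand }
   The accumulator holds the tree built so far, so each new operand
   becomes the right argument and the result is left-associative. *)
fun term input =
  let fun loop (acc, "+" :: rest) =
            let val (next, rest') = operand rest
            in loop (Fn("+", [acc, next]), rest') end
        | loop (acc, rest) = (acc, rest)
  in loop (operand input) end;

val (tree, leftover) = term ["x", "+", "y", "+", "z"];
```

The result groups as (x + y) + z, and no function calls itself before consuming input, so the parser terminates.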