Parsing, Part III: Using the ReadP package


  1. Parsing, Part III: Using the ReadP package
     Jim Royer, CIS 352, April 9, 2019

     On to ReadP
     • ReadP
       • A small, but fairly complete parsing package (shipped with GHC)
       • package docs: http://hackage.haskell.org/package/base-4.12.0.0/docs/Text-ParserCombinators-ReadP.html
     • Parsec
       • A bigger, more complete parsing package
       • Unlike ReadP, it can handle errors in an OK fashion.
       • package docs: http://hackage.haskell.org/package/parsec
       • The Parsec page on the Haskell Wiki: https://wiki.haskell.org/Parsec

     Primitives Repeated from Hutton’s Parser.hs
     • get :: ReadP Char
       Consumes and returns the next character. Fails on an empty input.
     • (<++) :: ReadP a -> ReadP a -> ReadP a
       Equivalent to Hutton’s +++. (+++ means something else in ReadP.)
     • pfail :: ReadP a
       Equivalent to Hutton’s fail.
     • satisfy :: (Char -> Bool) -> ReadP Char
       Equivalent to Hutton’s sat.
     • char :: Char -> ReadP Char
       Same as in Hutton’s
     • string :: String -> ReadP String
       Same as in Hutton’s

     First Examples
         getLetter, openClose :: Parser Char
         getLetter = satisfy isLetter

         openClose = do { char '('
                        ; char ')'
                        }

         anbn :: Parser ()
         anbn = do { char 'a'
                   ; anbn
                   ; char 'b'
                   ; return ()
                   }
                <++ return ()
     • getLetter parses the language { a, b, ..., z, A, B, ..., Z }.
     • openClose parses the language { () }.
     • anbn parses the language { a^n b^n | n ≥ 0 }? (Actually, there are problems.)
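The first examples can be loaded into GHCi roughly as follows. This is a minimal sketch that assumes the slides' `Parser` type is simply an alias for `ReadP`; the slides do not say so, and the alias is added here for illustration only. The `do`-blocks are written with `>>` so that no results are silently discarded.

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isLetter)

-- Assumption: the slides' Parser is an alias for ReadP.
type Parser = ReadP

-- Accepts exactly one letter.
getLetter :: Parser Char
getLetter = satisfy isLetter

-- Accepts "()" and returns the closing parenthesis.
openClose :: Parser Char
openClose = char '(' >> char ')'

-- Tries 'a' anbn 'b' first; only if that branch yields no result
-- does (<++) fall back to the empty parse.
anbn :: Parser ()
anbn = (char 'a' >> anbn >> char 'b' >> return ()) <++ return ()
```

With these definitions, `readP_to_S anbn "aabb"` yields `[((),"")]`, while `readP_to_S anbn "aab"` yields `[((),"aab")]`: the parser never fails outright, it just succeeds without consuming anything. That may be one of the problems the slide alludes to.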

  2. Digression: Running your parser
     • readP_to_S :: ReadP a -> String -> [(a,String)]
       (readP_to_S p str) runs parser p on str and returns the results.

     samples.hs
         sample, openClose :: ReadP Char
         sample = satisfy isLetter
         openClose = do { char '(' ; char ')' }
         ...

     After loading samples.hs ...
         *Main> readP_to_S openClose "()"
         [(')',"")]
         *Main> readP_to_S openClose "(]"
         []

     In our parser files, we’ll usually introduce the alias
         parse = readP_to_S

     Two Handy Definitions
         parse :: ReadP a -> String -> [(a,String)]
         parse = readP_to_S

         parseWith :: ReadP a -> String -> a
         parseWith p s = case [a | (a,t) <- parse p s, all isSpace t] of
                           [a] -> a
                           []  -> error "no parse"
                           _   -> error "ambiguous parse"

     ReadP’s (+++)
     • (+++) :: ReadP a -> ReadP a -> ReadP a
       (p1 +++ p2) runs parsers p1 and p2 “in parallel” and returns the list of results.
       (Not the same as Hutton’s (+++)!)
     Recall that (p1 <++ p2) tries p1, and if that fails, tries p2.

     Examples
         *Main> parse (string "ask" +++ string "as") "ask him"
         [("as","k him"),("ask"," him")]
         *Main> parse (string "ask" <++ string "as") "ask him"
         [("ask"," him")]
         *Main> parse (string "as" <++ string "ask") "ask him"
         [("as","k him")]

     (+++) versus (<++)
     When we mix (+++) and recursion, things get interesting.
         as1, as2 :: ReadP String
         as1 = do { c <- char 'a'
                  ; cs <- as1
                  ; return (c:cs)
                  }
               +++ return ""

         as2 = same as as1 but with <++.

     After loading samples.hs
         *Main> parse as1 "aaaxxx"
         [("","aaaxxx"),
          ("a","aaxxx"),
          ("aa","axxx"),
          ("aaa","xxx")]
         *Main> parse as2 "aaaxxx"
         [("aaa","xxx")]
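To experiment with the two choice operators, the definitions above can be collected into one loadable file. The sketch below writes out `as2` in full (the slide only describes it as "as1 but with <++") and keeps the slides' `parse` and `parseWith` helpers unchanged.

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isSpace)

parse :: ReadP a -> String -> [(a, String)]
parse = readP_to_S

-- Returns the unique result whose leftover input is only whitespace.
parseWith :: ReadP a -> String -> a
parseWith p s = case [a | (a, t) <- parse p s, all isSpace t] of
  [a] -> a
  []  -> error "no parse"
  _   -> error "ambiguous parse"

as1, as2 :: ReadP String
-- Symmetric choice: every prefix of the run of 'a's is a result.
as1 = (do { c <- char 'a'; cs <- as1; return (c : cs) }) +++ return ""
-- Left-biased choice: only the longest run survives.
as2 = (do { c <- char 'a'; cs <- as2; return (c : cs) }) <++ return ""
```

Note that `parseWith as1 "aaa"` still returns `"aaa"`: `as1` produces four results, but only one of them leaves nothing except whitespace behind, so the filter in `parseWith` removes the ambiguity.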

  3. Primitives beyond Hutton’s, munch, munch1
     • many :: (ReadP a) -> (ReadP [a])
       Parses zero or more occurrences of the given parser
     • many1 :: (ReadP a) -> (ReadP [a])
       Parses one or more occurrences of the given parser
     • munch, munch1 :: (Char -> Bool) -> ReadP String
       (munch tst) is a greedy variant of (many (satisfy tst)). For example:
         > parse (many (char 'a')) "aaaa"
         [("","aaaa"), ("a","aaa"), ("aa","aa"), ("aaa","a"), ("aaaa","")]
         > parse (munch (=='a')) "aaaa"
         [("aaaa","")]

     Notes
     • Greedy ≈ parses as much of the string as possible.
     • munch and munch1 commit to the longest match, as if built with (<++).
     • many and many1 use (+++), so they return every possible split.

     Adding Semantics, An Example
         nesting :: Parser Int
         nesting = do { char '('
                      ; n <- nesting
                      ; char ')'
                      ; m <- nesting
                      ; return (max (n+1) m)
                      }
                   +++ return 0
     [Try (parse nesting "(())"), (parse nesting "()((()())())"), etc.]

     A Few Combinators, 1
     Things to look up in the ReadP docs:
     • skipMany (and friends)
     • between
     • sepBy (and friends)
     • endBy (and friends)
     URL: https://hackage.haskell.org/package/base-4.11.0.0/docs/Text-ParserCombinators-ReadP.html
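Because `nesting` uses `(+++)`, `parse nesting` returns one result per prefix it can match, which makes the suggested experiments noisy. The sketch below adds a hypothetical helper, `fullParses` (not in the slides), that keeps only results consuming the whole input. It also includes `intList`, an illustrative parser invented here to show `between` and `sepBy` from the list above in use.

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isDigit)

-- Hypothetical helper: keep only results that consume the whole input.
fullParses :: ReadP a -> String -> [a]
fullParses p s = [a | (a, "") <- readP_to_S p s]

-- Maximum nesting depth of a string of balanced parentheses.
nesting :: ReadP Int
nesting =
  (do { _ <- char '('; n <- nesting; _ <- char ')'; m <- nesting
      ; return (max (n + 1) m) })
  +++ return 0

-- Hypothetical example: a bracketed, comma-separated list of naturals,
-- e.g. "[1, 2,3]". Whitespace is allowed after each comma.
intList :: ReadP [Int]
intList = between (char '[') (char ']') (sepBy int (char ',' >> skipSpaces))
  where
    int :: ReadP Int
    int = read <$> munch1 isDigit
```

For example, `fullParses nesting "(())"` gives `[2]`, `fullParses nesting "()((()())())"` gives `[3]`, and `fullParses intList "[1, 2,3]"` gives `[[1,2,3]]`.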

  4. A Few Combinators, 2: Simple sentence parsing
         word :: ReadP String
         word = munch1 isLetter

         oneOf :: [Char] -> ReadP Char
         oneOf cs = choice [char c | c <- cs]

         separator :: ReadP ()
         separator = skipMany1 (oneOf " ,")

     Simple sentence parsing (continued)
         sentence :: ReadP [String]
         sentence = do { words <- sepBy1 word separator
                       ; oneOf ".?!"
                       ; return words
                       }

         *Main> parseWith sentence "traffic lights are red, blue, and green."
         ["traffic","lights","are","red","blue","and","green"]

     Parsing CSV Files

     A CSV parser (from Real World Haskell)
     CSV: Comma-separated values
     A simple file format used by spreadsheets and databases.
     See: http://en.wikipedia.org/wiki/Comma-separated_values

     A sample
         Year,Make,Model,Description,Price
         1997,Ford,E350,"ac, abs, moon",3000.00
         1999,Chevy,"Venture ""Extended Edition""","",4900.00
         1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
         1996,Jeep,Grand Cherokee,"MUST SELL!
         air, moon roof, loaded",4799.00
     • Commas separate “cells”.
     • Unquoted commas are in red (on the original slide).
     • Inside quoted text "" is a quoted quote.
     • Lines normally end with a newline, but quoted text can cross line boundaries.

     A Grammar for CSV
         ⟨file⟩       ::= ⟨line⟩*
         ⟨line⟩       ::= ((⟨cell⟩ ,)* ⟨cell⟩)? ⟨newline⟩
         ⟨cell⟩       ::= ⟨character⟩+ | ⟨quotedCell⟩
         ⟨quotedCell⟩ ::= " ⟨quotedChar⟩* "
         ⟨quotedChar⟩ ::= ⟨notQuote⟩ | ""
         ⟨notQuote⟩   ::= everything but "
         ⟨newline⟩    ::= \n\r | \r\n | \n | \r
         ⟨character⟩  ::= a | b | ...
     Note: A? ≡ A | ε ≡ 0 or 1 copies of A
     [Stage direction: Copy the grammar to the board.]
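The sentence parser can be tried out as a single file. This sketch repeats the `parse` and `parseWith` helpers from the earlier slides so that it is self-contained, and renames the local variable `words` to `ws` to avoid shadowing the Prelude function; everything else follows the slide.

```haskell
import Text.ParserCombinators.ReadP
import Data.Char (isLetter, isSpace)

parse :: ReadP a -> String -> [(a, String)]
parse = readP_to_S

parseWith :: ReadP a -> String -> a
parseWith p s = case [a | (a, t) <- parse p s, all isSpace t] of
  [a] -> a
  []  -> error "no parse"
  _   -> error "ambiguous parse"

-- A word is a maximal run of letters.
word :: ReadP String
word = munch1 isLetter

-- Matches any single character from the given list.
oneOf :: [Char] -> ReadP Char
oneOf cs = choice [char c | c <- cs]

-- One or more spaces or commas between words.
separator :: ReadP ()
separator = skipMany1 (oneOf " ,")

-- Words separated by spaces/commas, ended by '.', '?' or '!'.
sentence :: ReadP [String]
sentence = do
  ws <- sepBy1 word separator
  _ <- oneOf ".?!"
  return ws
```

Since `sepBy1` and `skipMany1` are built on `(+++)`, they explore shorter matches as well, but only the parse that reaches the final punctuation mark survives, so `parseWith` sees exactly one result.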

  5. A parser for CSV, 1
         ⟨file⟩    ::= ⟨line⟩*
         ⟨newline⟩ ::= \n\r | \r\n | \n | \r

         csvFile :: ReadP [[String]]
         csvFile = endBy line eol

         eol :: ReadP String
         eol = (string "\n\r")
               <++ (string "\r\n")
               <++ (string "\n")
               <++ (string "\r")

     A parser for CSV, 2
         ⟨cell⟩      ::= ⟨character⟩+ | ⟨quotedCell⟩
         ⟨character⟩ ::= a | b | ...

         line :: ReadP [String]
         line = sepBy cell (char ',')

         cell :: ReadP String
         cell = quotedCell
                <++ munch (`notElem` ",\n\r")

     A parser for CSV, 3
         ⟨quotedCell⟩ ::= " ⟨quotedChar⟩* "
         ⟨quotedChar⟩ ::= ⟨notQuote⟩ | ""
         ⟨notQuote⟩   ::= everything but "

         quotedCell :: ReadP String
         quotedCell = between (char '"')
                              (char '"')
                              (many quotedChar)

         quotedChar :: ReadP Char
         quotedChar = satisfy (/= '"')
                      +++ (string "\"\"" >> return '"')

     A parser for CSV, 4: All on one page
     This slide collects the definitions of csvFile, line, cell, quotedCell, quotedChar, and eol from the previous three slides. On the slide, parser combinators (other than <++ and +++) are in bold.
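Put together, the CSV parser fits in one short module. The sketch below copies the slides' definitions and adds one hypothetical wrapper, `parseCSV` (not in the slides), which accepts the input only when exactly one parse consumes all of it.

```haskell
import Text.ParserCombinators.ReadP

csvFile :: ReadP [[String]]
csvFile = endBy line eol

line :: ReadP [String]
line = sepBy cell (char ',')

-- A quoted cell if one is present, otherwise everything up to the
-- next comma or line break (possibly nothing).
cell :: ReadP String
cell = quotedCell <++ munch (`notElem` ",\n\r")

quotedCell :: ReadP String
quotedCell = between (char '"') (char '"') (many quotedChar)

-- Any character except a quote, or a doubled quote standing for one quote.
quotedChar :: ReadP Char
quotedChar = satisfy (/= '"') +++ (string "\"\"" >> return '"')

eol :: ReadP String
eol = string "\n\r" <++ string "\r\n" <++ string "\n" <++ string "\r"

-- Hypothetical wrapper: succeed only on a unique parse of the whole input.
parseCSV :: String -> Maybe [[String]]
parseCSV s = case [rows | (rows, "") <- readP_to_S csvFile s] of
  [rows] -> Just rows
  _      -> Nothing
```

Two behaviors of this sketch are worth knowing. Because `csvFile` uses `endBy`, the last row must be terminated by a line break, so `parseCSV "a,b"` is `Nothing`. An empty line appears to parse in two ways (as a row with one empty cell, or as a row with no cells), so `parseCSV "\n"` is also `Nothing` under the uniqueness check.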
