


Motivation Parsley Design Examples Ongoing Work

The Parsley Data Description Language

Prashanth Mundkur (1), Linda Briesemeister (1), Natarajan Shankar (1), Prashant Anantharaman (2), Sameed Ali (2), Zephyr Lucas (2), Sean Smith (2)

(1) SRI International, (2) Dartmouth College

May 21, 2020


Problem

◮ What kinds of formal grammars can capture data formats?
    ◮ ELF, OpenDocument, PDF
◮ How can untrusted data be parsed securely?
    ◮ e.g. prevent parser exploits that manipulate parsing offsets


Limitations of Typical Formal Grammars

◮ Need for handling long-range context dependency
    ◮ e.g. graph structures in document formats
◮ Very limited or no interface to parsing buffer
    ◮ e.g. offset manipulations, buffer windowing, transformations


Issue: Handling Context Sensitivity

◮ Parsing and semantic actions can be controlled by context
◮ Distant context may need to be threaded through to current parsing operation
◮ Need a bounded mechanism to thread just the relevant context
◮ Relevant context can require structured datatypes


Issue: No First-class Parsing Buffers

◮ Binary data often specify offsets, lengths, etc.
    ◮ Parsing cursor locations are data-dependent
◮ Network packet formats often require checksums
    ◮ Requires windowed views into parsing buffer
◮ Data can often be optionally compressed, encrypted, etc.
    ◮ Requires controlled transformations of windowed views into the parsing buffer
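The windowing requirement can be illustrated outside Parsley. A minimal Python sketch, assuming a hypothetical `window` helper and a toy packet layout (neither comes from the talk), in which a data-dependent length field selects a bounds-checked view that is then checksummed:

```python
def window(buf: bytes, offset: int, length: int) -> memoryview:
    """Return a zero-copy, bounds-checked view of buf[offset:offset+length].

    Hypothetical helper for illustration; not a Parsley API.
    """
    if offset < 0 or length < 0 or offset + length > len(buf):
        raise ValueError("window escapes the parsing buffer")
    return memoryview(buf)[offset:offset + length]


def checksum8(view: memoryview) -> int:
    """Toy checksum (sum of bytes modulo 256), standing in for a real one."""
    return sum(view) % 256


# Toy packet layout (an assumption): 1-byte payload length, payload,
# then a 1-byte checksum over the payload.
packet = bytes([3, 10, 20, 30, 60])
payload = window(packet, 1, packet[0])   # the length field is data-dependent
stored = packet[1 + packet[0]]
checksum_ok = checksum8(payload) == stored
```

The checksum only ever sees the window, so a malformed length field fails at the bounds check rather than reading past the payload.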


Design Goal: Adding Computation to Syntax

We are looking for a DDL that incorporates the computation needed
◮ to store and retrieve contextual information
◮ to evaluate contextual constraints
◮ to operate on parsing buffers


Other Design Goals

◮ Automatic extraction of parsers
◮ Construction of proofs-of-parse to validate correctness (Blaudeau, Shankar CPP 2020)


Parsley Design

◮ Composition of conventional techniques
    ◮ Parsing expression grammars (PEG)
        ◮ for determinism and unambiguity
    ◮ Attribute grammars
        ◮ for structured handling of context-sensitivity
        ◮ specifying semantic actions
    ◮ Functional languages
        ◮ for type-safe computation of constraints and semantic actions
◮ Capture syntax using PEG and attribute grammars
◮ Capture computation using a functional language
◮ Capture contextual and computational state using attributes


Overall Parsley Architecture

[Figure: architecture diagram. Labels: format.ply (in Parsley DDL; focus of this talk), Parsley Compiler, PVS Prover (Safety), format-parser.so (generated parser), format.info (generated properties), app.exe (application code), Untrusted data, Parsed Representation, Independent Verifier (Validity).]


Sublanguages of the Parsley DDL

◮ Parsing expression grammar for production rules
    ◮ Specified in notation similar to EBNF
          NonTerm := rule1 ; rule2 ; ... ;;
◮ Typed functional expression language
    ◮ Used for constraint expressions
          [guard1(args) && (guard2(args) || guard3(args))]
    ◮ and semantic actions
          {let val := expr(args) in nt.syn_attr := val}


Grammar sublanguage

◮ Standard attribute system
    ◮ NonTerm (inh_attr:type) := rules
      Communicates context-sensitive information using inherited attributes
    ◮ NonTerm {syn_attr:type} := rules
      Stores information computed during local parsing in synthesized attributes
◮ Boolean constraints can guard parsing progress of a rule
      ... := prev_rule ; ... [bool_expr(attrs)] ... ; next_rule
    ◮ Expressed in terms of the computed attributes in scope
    ◮ Rewinds on guard failure to the last ordered choice (;) and continues with the next choice
◮ Standard library contains basic primitives like character classes
      [:ascii:], [:alphanum:]
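The guard-and-rewind behaviour of ordered choice can be sketched with a few parser combinators. This is a minimal Python illustration, not the Parsley implementation; the names `literal`, `any_byte`, `guarded`, and `choice` are hypothetical:

```python
from typing import Callable, Optional, Tuple

# A parser takes (input, position) and returns (value, new position),
# or None on failure. A failed parser never moves the caller's position,
# so "rewinding" is simply retrying from the saved start position.
Parser = Callable[[bytes, int], Optional[Tuple[object, int]]]


def literal(s: bytes) -> Parser:
    """Match the exact byte string s."""
    def parse(data, pos):
        if data.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def any_byte(data, pos):
    """Match a single byte and return it as an int."""
    if pos < len(data):
        return data[pos], pos + 1
    return None


def guarded(p: Parser, guard: Callable[[object], bool]) -> Parser:
    """Run p, then check a boolean constraint on its result."""
    def parse(data, pos):
        result = p(data, pos)
        if result is None or not guard(result[0]):
            return None
        return result
    return parse


def choice(*alternatives: Parser) -> Parser:
    """Ordered choice: try each alternative from the same start position."""
    def parse(data, pos):
        for alt in alternatives:
            result = alt(data, pos)
            if result is not None:
                return result
        return None
    return parse


# First alternative: any byte below 0x80; second alternative: the literal 0xff.
rule = choice(guarded(any_byte, lambda b: b < 0x80), literal(b"\xff"))
```

On the input `b"\xff"` the first alternative consumes a byte, fails its guard, and the choice retries from position 0 with the second alternative.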


Expression sublanguage

◮ Simple typed functional language with user-defined types and functions
      fun f(a: arg_type) -> result_type = {...}
◮ Types of grammar non-terminals are represented by records with synthesized attributes as field names
      type non_term_type = {attrname: type}
◮ Standard library contains basic functions such as conversions of parsed data into typed values, and basic data structures like lists, sets, and maps
      string_to_int(), byte_to_int(), ...
      List.append, Map.extend, Set.add, ...


PDF Examples

◮ Null
      Null := "null" ;;
◮ Comments
      Comment := "%" ([:char:] \ "\n")* "\n" ;;
◮ Booleans
      Boolean b {val : bool} :=
          "true"  { b.val := true }
        ; "false" { b.val := false } ;;
◮ Name object
      Name n {val : string} :=
          "/" s=([:alphanum:]+)
          { n.val := normalize_name(s) } ;;


PDF Examples

◮ Context-sensitive Whitespace
      Whitespace (allow_empty : bool) :=
          [allow_empty]    // boolean constraint
          ( " " | "\0" | "\t" | "\r" | "\n"
          | "\x0c"         // form-feed
          | Comment )*
        ; [!allow_empty]
          ( " " | "\0" | "\t" | "\r" | "\n"
          | "\x0c"         // form-feed
          | Comment )+ ;;
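For comparison, the same rule can be written as an ordinary function whose parameter plays the role of the inherited attribute `allow_empty`. A minimal Python sketch; the function names are hypothetical and the comment syntax follows the `Comment` rule from the previous slide:

```python
WS_BYTES = b" \0\t\r\n\x0c"   # space, NUL, tab, CR, LF, form-feed


def skip_comment(data: bytes, pos: int):
    """Match '%' through the next newline; return the new position or None."""
    if data[pos:pos + 1] == b"%":
        end = data.find(b"\n", pos)
        if end != -1:
            return end + 1
    return None


def whitespace(data: bytes, pos: int, allow_empty: bool):
    """Skip whitespace and comments starting at pos.

    Returns the new position, or None when nothing was consumed and
    allow_empty is False (the '+' alternative of the grammar rule).
    """
    start = pos
    while pos < len(data):
        if data[pos] in WS_BYTES:
            pos += 1
            continue
        after_comment = skip_comment(data, pos)
        if after_comment is None:
            break
        pos = after_comment
    if pos == start and not allow_empty:
        return None
    return pos
```

The caller decides from its own context whether empty whitespace is acceptable, which is the context sensitivity the grammar expresses with the two guarded alternatives.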


Tagged-Length-Value (TLV) constructs

type tlv =
    TTL of int
  | Port of int
  | Unknown of byte * [byte]

fun ttl_to_int(b : [byte; 1]) -> int = {
    byte_to_int(b[0])
}

fun port_to_int(b : [byte; 2]) -> int = {
    // combine the high byte b[1] and the low byte b[0]
    256 * byte_to_int(b[1]) + byte_to_int(b[0])
}

TLV t { val: tlv } :=
    tg=([:byte:])
    l=([:byte:])
    v=([[:byte:] * byte_to_int(l)])
    (   [tg = 1] [l = 1] { t.val := tlv::TTL(ttl_to_int(v)) }
      | [tg = 2] [l = 2] { t.val := tlv::Port(port_to_int(v)) }
      | { t.val := tlv::Unknown(tg, v) }
    )
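A rough Python equivalent of the TLV rule, offered as a sketch rather than as output of the Parsley compiler. It assumes the two port bytes are combined as 256 * b[1] + b[0], following the slide's indexing; the class and function names are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class TTL:
    value: int


@dataclass(frozen=True)
class Port:
    value: int


@dataclass(frozen=True)
class Unknown:
    tag: int
    data: bytes


TLV = Union[TTL, Port, Unknown]


def parse_tlv(data: bytes, pos: int) -> Optional[Tuple[TLV, int]]:
    """Parse one tag-length-value record; return (value, new position) or None."""
    if pos + 2 > len(data):
        return None
    tag, length = data[pos], data[pos + 1]
    start, end = pos + 2, pos + 2 + length
    if end > len(data):          # the length field points past the buffer
        return None
    v = data[start:end]
    if tag == 1 and length == 1:
        return TTL(v[0]), end
    if tag == 2 and length == 2:
        # Assumed byte order: v[1] is the high byte, v[0] the low byte.
        return Port(256 * v[1] + v[0]), end
    return Unknown(tag, v), end
```

As in the grammar, the guards on tag and length are tried in order and the `Unknown` case is the final fallback.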


ELF Fragment

type elf_file_hdr = {ph_offset : int}

ELF_File :=
    e=ELF_File_Hdr
    [e.ph_offset < ParseBuffer.size($buffer)]
    @(ParseBuffer.start($buffer) + e.ph_offset)
    p=ELF_Prog_Hdr
    ...
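The check-then-seek pattern in this fragment can be sketched in Python. The sketch assumes a 64-bit little-endian ELF file, where the program header offset (e_phoff) is an 8-byte field at byte 32 of the 64-byte file header and each program header entry is 56 bytes. The function name is hypothetical, and the bounds check is slightly stricter than the guard on the slide because it requires the whole entry to fit:

```python
import struct

ELF64_HDR_SIZE = 64    # size of the ELF64 file header
ELF64_PHDR_SIZE = 56   # size of one ELF64 program header entry


def locate_prog_hdr(buf: bytes) -> memoryview:
    """Return a bounds-checked view of the first program header entry."""
    if len(buf) < ELF64_HDR_SIZE or buf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file header")
    if buf[4] != 2 or buf[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    (ph_offset,) = struct.unpack_from("<Q", buf, 32)   # e_phoff
    # Guard: the offset taken from untrusted data must stay inside the buffer.
    if ph_offset + ELF64_PHDR_SIZE > len(buf):
        raise ValueError("program header offset outside the buffer")
    return memoryview(buf)[ph_offset:ph_offset + ELF64_PHDR_SIZE]
```

The offset is validated before the cursor moves, which mirrors placing the boolean constraint ahead of the `@(...)` repositioning in the grammar.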


Design challenges

◮ Backtracking in ordered choice requires rewinding attribute updates
    ◮ This is easier with L-attributed grammars
◮ Defining how backtracking composes with operations on the parsing buffer
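One simple way to picture the first challenge is to snapshot the attribute state and the cursor before each alternative and restore both on failure. This Python sketch only illustrates the problem and is not how Parsley handles it; `ParseState` and `ordered_choice` are hypothetical names:

```python
import copy


class ParseState:
    """Cursor position plus the attribute values computed so far."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0
        self.attrs = {}


def ordered_choice(state: ParseState, alternatives) -> bool:
    """Try alternatives in order, undoing side effects of failed ones."""
    for alt in alternatives:
        saved_pos = state.pos
        saved_attrs = copy.deepcopy(state.attrs)
        if alt(state):
            return True
        # Rewind both the cursor and any attribute updates the alternative made.
        state.pos = saved_pos
        state.attrs = saved_attrs
    return False


def failing_alt(state: ParseState) -> bool:
    state.attrs["seen"] = ["a"]   # an update that must not survive
    state.pos += 1
    return False


def succeeding_alt(state: ParseState) -> bool:
    state.attrs["kind"] = "name"
    state.pos += 2
    return True


state = ParseState(b"/Nm")
matched = ordered_choice(state, [failing_alt, succeeding_alt])
```

Deep-copying on every choice is expensive, which is one reason a more disciplined attribute flow is attractive.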


Acknowledgements

This work was supported by DARPA under agreement number HR001119C0075. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.


Questions?

Thank you!
