Secure and Efficient Parsing via Programming Language Theory



slide-1
SLIDE 1

Secure and Efficient Parsing via Programming Language Theory

Neel Krishnaswami & Jeremy Yallop

slide-2
SLIDE 2

parsing & security
types & algebras: Γ ⊢ e : τ
staging & speed: ≪ e ≫
speed & correctness: ✓ ✓

slide-3
SLIDE 3

Parsing and security


parsing interpretation

slide-4
SLIDE 4

Parser combinators: appeal

simplicity: parsers are functions
declarative: parsers resemble BNF

star :: Parser a → Parser [a]
star p = ps ⊕ empty
  where empty = return []
        ps = do x ← p
                xs ← star p
                return (x : xs)

sexp = (lparen >> star sexp >> rparen) ⊕ atom

sexp ::= LPAREN sexp* RPAREN | ATOM

slide-5
SLIDE 5

Parser combinators: pitfalls

complexity: exponential (or worse)
declarative? not in practice

p ⊕ q ≢ p ⊕ q

(demonstration)
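The slide's live demonstration is not recorded in the transcript. A minimal sketch of one well-known way the declarative reading breaks down, assuming a left-biased choice operator: the names `parser`, `str`, `<|>` and `whole` are illustrative and not taken from any particular library.

```ocaml
(* A parser consumes a prefix of the input and returns a value together
   with the unconsumed rest, or None on failure. *)
type 'a parser = char list -> ('a * char list) option

let explode (s : string) : char list =
  List.init (String.length s) (String.get s)

(* Match a literal string. *)
let str (lit : string) : string parser = fun input ->
  let rec go cs input =
    match cs, input with
    | [], rest -> Some (lit, rest)
    | c :: cs', d :: rest when c = d -> go cs' rest
    | _ -> None
  in
  go (explode lit) input

(* Left-biased choice (an assumption): commit to the first alternative
   that succeeds and never revisit the second. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser = fun input ->
  match p input with
  | Some _ as r -> r
  | None -> q input

(* Succeed only if the whole input was consumed. *)
let whole (p : 'a parser) (s : string) : 'a option =
  match p (explode s) with
  | Some (v, []) -> Some v
  | _ -> None

let () =
  (* Read as BNF, "ab" | "a" and "a" | "ab" denote the same language... *)
  assert (whole (str "ab" <|> str "a") "ab" = Some "ab");
  (* ...but with biased choice the second ordering never tries "ab". *)
  assert (whole (str "a" <|> str "ab") "ab" = None)
```

Under this assumption the order of alternatives changes which inputs are accepted, so the combinator expression no longer means what the grammar it resembles means.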


slide-7
SLIDE 7

ASP and its aims

asp: a combinator library with an unusual combination of features

Conventional Interface
Unsurprising Semantics
Guaranteed Determinism
Competitive Performance

slide-13
SLIDE 13

ASP: interface

Abstract grammar interface (context-free expressions)

type α t
val chr: char → char t          (real interface: arbitrary tokens)
val eps: unit t
val seq: α t → β t → (α * β) t
val bot: α t
val alt: α t → α t → α t
val fix: (α t → α t) → α t
val map: (α → β) → α t → β t

User-defined functions

let option r = alt (map (fun _ → None) eps) (map (fun x → Some x) r)
(also star, plus, infix, &c.)

Parsers from grammars

val parser: α t → (char Stream.t → α)          (imperative stream; may fail!)
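One way to make the interface concrete is a deep embedding. The sketch below is an assumption for illustration only, not asp's implementation: the GADT constructors, the `run` interpreter and `explode` are invented here, and `run` is a naive left-biased backtracking interpreter rather than the typed, deterministic parser the talk goes on to describe.

```ocaml
(* Deep embedding of the interface as a GADT (constructor names are
   hypothetical). *)
type _ t =
  | Chr : char -> char t
  | Eps : unit t
  | Seq : 'a t * 'b t -> ('a * 'b) t
  | Bot : 'a t
  | Alt : 'a t * 'a t -> 'a t
  | Fix : ('a t -> 'a t) -> 'a t
  | Map : ('a -> 'b) * 'a t -> 'b t

let chr c = Chr c
let eps = Eps
let seq a b = Seq (a, b)
let alt a b = Alt (a, b)
let fix f = Fix f
let map f a = Map (f, a)

(* User-defined combinators, written only against the interface. The
   non-empty alternative comes first because the naive interpreter below
   is left-biased. *)
let option r = alt (map (fun x -> Some x) r) (map (fun _ -> None) eps)

let star r =
  fix (fun self ->
    alt (map (fun (x, xs) -> x :: xs) (seq r self))
        (map (fun () -> []) eps))

(* Naive interpreter: returns the parsed value and the unconsumed input. *)
let rec run : type a. a t -> char list -> (a * char list) option =
  fun g input ->
    match g with
    | Chr c ->
      (match input with
       | d :: rest when c = d -> Some (c, rest)
       | _ -> None)
    | Eps -> Some ((), input)
    | Seq (a, b) ->
      (match run a input with
       | None -> None
       | Some (x, rest) ->
         (match run b rest with
          | None -> None
          | Some (y, rest') -> Some ((x, y), rest')))
    | Bot -> None
    | Alt (a, b) ->
      (match run a input with
       | Some _ as r -> r
       | None -> run b input)
    | Fix f -> run (f g) input          (* unfold the fixed point once *)
    | Map (f, a) ->
      (match run a input with
       | None -> None
       | Some (x, rest) -> Some (f x, rest))

let explode s = List.init (String.length s) (String.get s)

let () =
  assert (run (star (chr 'a')) (explode "aab") = Some (['a'; 'a'], ['b']));
  assert (run (option (chr 'b')) (explode "a") = Some (None, ['a']))
```

Keeping grammars as data is what makes later analysis possible: a type checker or a code generator can inspect the same value before any input is read.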

slide-22
SLIDE 22

accepted or rejected?

Plan: use a type system to decide

alt (map (fun _ → 1) (chr 'a')) (map (fun _ → 2) (chr 'b'))
seq (chr 'a') (option (chr 'b'))
alt (map (fun _ → 1) (chr 'a')) (map (fun _ → 2) (chr 'a'))          ✗ disjunctive non-determinism
seq (option (chr 'a')) (option (chr 'a'))          ✗ sequential non-determinism

(also reject: left recursion, non-left-factored)

slide-23
SLIDE 23

context-free expressions

Context-free expressions (CFEs)

g ::= ⊥ | g ∨ g′ | ϵ | c | g · g′ | x | µx. g

Semantics of CFEs

⟦⊥⟧γ = ∅
⟦g ∨ g′⟧γ = ⟦g⟧γ ∪ ⟦g′⟧γ
⟦ϵ⟧γ = {ε}
⟦c⟧γ = {c}
⟦g · g′⟧γ = {w · w′ | w ∈ ⟦g⟧γ ∧ w′ ∈ ⟦g′⟧γ}
⟦x⟧γ = γ(x)
⟦µx. g⟧γ = fix(λX. ⟦g⟧(γ, X/x))

fix(f) = ⋃_{i∈ℕ} L_i  where L_0 = ∅ and L_{n+1} = f(L_n)
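The semantics can be executed directly if languages are cut off at a maximum word length, which makes every set finite and lets the iteration L_0, L_1, ... stop at a fixed point. A sketch under that assumption: the length bound `k`, the association-list environment and the names `cfe` and `sem` are choices made here, not part of the talk.

```ocaml
module S = Set.Make (String)

type cfe =
  | Bot
  | Alt of cfe * cfe
  | Eps
  | Chr of char
  | Seq of cfe * cfe
  | Var of string
  | Mu of string * cfe

(* Denotation restricted to words of length <= k. *)
let rec sem (k : int) (env : (string * S.t) list) (g : cfe) : S.t =
  match g with
  | Bot -> S.empty
  | Alt (g1, g2) -> S.union (sem k env g1) (sem k env g2)
  | Eps -> S.singleton ""
  | Chr c -> if k >= 1 then S.singleton (String.make 1 c) else S.empty
  | Seq (g1, g2) ->
    let l1 = sem k env g1 and l2 = sem k env g2 in
    (* All concatenations w ^ w' that still fit within the bound. *)
    S.fold (fun w acc ->
        S.fold (fun w' acc ->
            if String.length w + String.length w' <= k
            then S.add (w ^ w') acc
            else acc)
          l2 acc)
      l1 S.empty
  | Var x -> List.assoc x env
  | Mu (x, body) ->
    (* L_0 = empty, L_{n+1} = f(L_n); stop once the set stops growing. *)
    let rec iterate l =
      let l' = sem k ((x, l) :: env) body in
      if S.equal l l' then l else iterate l'
    in
    iterate S.empty

(* µx. ϵ ∨ (a · x), i.e. zero or more a's. *)
let a_star = Mu ("x", Alt (Eps, Seq (Chr 'a', Var "x")))

let () =
  assert (S.elements (sem 3 [] a_star) = [""; "a"; "aa"; "aaa"])
```

For `a_star` with bound 3 the iterates are ∅, {ε}, {ε, a}, {ε, a, aa}, {ε, a, aa, aaa}, after which nothing new is added.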

slide-24
SLIDE 24

ASP: equations

CFEs form an idempotent semiring

g1 ∨ (g2 ∨ g3) = (g1 ∨ g2) ∨ g3
g ∨ g′ = g′ ∨ g
g ∨ ⊥ = g
g ∨ g = g
g1 · (g2 · g3) = (g1 · g2) · g3
g · ϵ = g
(g1 ∨ g2) · g = (g1 · g) ∨ (g2 · g)
g · (g1 ∨ g2) = (g · g1) ∨ (g · g2)
g · ⊥ = ⊥
⊥ · g = ⊥

(along with some equations for µ)

slide-25
SLIDE 25

ASP: types

Types for languages

Types τ ∈ { NULL : 2; FIRST : P(Σ); FLAST : P(Σ) }

τ1 ∨ τ2 = { NULL  = τ1.NULL ∨ τ2.NULL
          ; FIRST = τ1.FIRST ∪ τ2.FIRST
          ; FLAST = τ1.FLAST ∪ τ2.FLAST }

Type predicates

τ1 # τ2 ≜ (τ1.FIRST ∩ τ2.FIRST = ∅) ∧ ¬(τ1.NULL ∧ τ2.NULL)

Properties of types

If L ⊨ τ and M ⊨ τ′ and τ # τ′, then L ∪ M ⊨ τ ∨ τ′.
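The record of three components, the join τ1 ∨ τ2 and the predicate τ1 # τ2 translate almost line for line into code. A sketch: the names `ty`, `ty_alt` and `apart` are invented here, and the types given for ϵ and for a single character are the usual ones but are an assumption, since the slide does not list them.

```ocaml
module CS = Set.Make (Char)

(* NULL, FIRST and FLAST from the slide. *)
type ty = { null : bool; first : CS.t; flast : CS.t }

(* Types of the primitive grammars (assumed, not shown on the slide). *)
let ty_eps = { null = true; first = CS.empty; flast = CS.empty }
let ty_chr c = { null = false; first = CS.singleton c; flast = CS.empty }

(* τ1 ∨ τ2: componentwise join. *)
let ty_alt t1 t2 =
  { null = t1.null || t2.null;
    first = CS.union t1.first t2.first;
    flast = CS.union t1.flast t2.flast }

(* τ1 # τ2: disjoint FIRST sets, and not both nullable. *)
let apart t1 t2 =
  CS.is_empty (CS.inter t1.first t2.first) && not (t1.null && t2.null)

let () =
  (* 'a' versus 'b': one token of lookahead picks the branch. *)
  assert (apart (ty_chr 'a') (ty_chr 'b'));
  (* 'a' versus 'a': disjunctive non-determinism, rejected. *)
  assert (not (apart (ty_chr 'a') (ty_chr 'a')));
  (* Two nullable branches: ambiguous on the empty word, rejected. *)
  assert (not (apart ty_eps ty_eps))
```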

slide-26
SLIDE 26

ASP: type system

Syntactic type system for LL(1) grammars

Γ; ∆ ⊢ g : τ     Γ; ∆ ⊢ g′ : τ′     τ # τ′
──────────────────────────────────────────
Γ; ∆ ⊢ g ∨ g′ : τ ∨ τ′

Semantic soundness

If Γ; ∆ ⊢ g : τ and γ ⊨ Γ and δ ⊨ ∆ then ⟦g⟧(γ, δ) ⊨ τ

Type inference

No type annotations needed (even for fixed points)

slide-27
SLIDE 27

ASP: parsing with types

A simple parsing algorithm

P(Γ; ∆ ⊢ g : τ) ∈ Env(Γ) → Env(∆) → Σ∗ ⇀ Σ∗

P(Γ; ∆ ⊢ g ∨ g′ : τ ∨ τ′) γ̂ δ̂ [] =
    []      when (τ ∨ τ′).NULL
    fail    otherwise

P(Γ; ∆ ⊢ g ∨ g′ : τ ∨ τ′) γ̂ δ̂ ((c :: _) as s) =
    P(Γ; ∆ ⊢ g : τ) γ̂ δ̂ s       when c ∈ τ.FIRST or τ.NULL ∧ c ∉ (τ ∨ τ′).FIRST
    P(Γ; ∆ ⊢ g′ : τ′) γ̂ δ̂ s     when c ∈ τ′.FIRST or τ′.NULL ∧ c ∉ (τ ∨ τ′).FIRST
    fail                         otherwise

The parsing algorithm is sound and complete, i.e. it parses exactly the words of the language.
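The alternation case can be sketched as a recognizer that carries its type alongside its parsing function. This is a simplification made here, not the talk's code: only ϵ, single characters and alternation are covered, FLAST and the environments are left out, and the record layout and names are hypothetical.

```ocaml
module CS = Set.Make (Char)

type ty = { null : bool; first : CS.t }

(* A recognizer maps the input to its unconsumed suffix, or None on
   failure, mirroring the partial function from Σ∗ to Σ∗. *)
type p = { ty : ty; run : char list -> char list option }

let eps = { ty = { null = true; first = CS.empty }; run = (fun s -> Some s) }

let chr c =
  { ty = { null = false; first = CS.singleton c };
    run = (function d :: rest when c = d -> Some rest | _ -> None) }

let alt p q =
  (* Side condition τ # τ′: refuse to build an ambiguous alternation. *)
  if not (CS.is_empty (CS.inter p.ty.first q.ty.first))
     || (p.ty.null && q.ty.null)
  then invalid_arg "alt: branches are not apart";
  let ty = { null = p.ty.null || q.ty.null;
             first = CS.union p.ty.first q.ty.first } in
  let run s =
    match s with
    | [] -> if ty.null then Some [] else None
    | c :: _ ->
      (* Pick the branch from one token of lookahead; never backtrack. *)
      if CS.mem c p.ty.first || (p.ty.null && not (CS.mem c ty.first))
      then p.run s
      else if CS.mem c q.ty.first || (q.ty.null && not (CS.mem c ty.first))
      then q.run s
      else None
  in
  { ty; run }

let () =
  let g = alt (chr 'a') (alt (chr 'b') eps) in
  assert (g.run ['a'; 'x'] = Some ['x']);   (* took the 'a' branch *)
  assert (g.run ['z'] = Some ['z']);        (* nullable branch, nothing consumed *)
  assert ((alt (chr 'a') (chr 'b')).run ['c'] = None)
```

Each call to `run` inspects the next character once and commits, which is the property the linear-time guarantee rests on.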

slide-28
SLIDE 28

ASP: linear-time guarantee

Guarantee: well-typed parsers don’t backtrack

[Plot: run time against input size; the curve is labelled "linear-time"]


slide-30
SLIDE 30

Speed: combinators vs yacc

[Plot: run time against input size for combinators and yacc; one curve is labelled "linear-time", the other "also linear-time"]

How can we close the gap? Staging


slide-34
SLIDE 34

staging removes overhead

Parser combinators abstract over the grammar

Abstraction carries a performance penalty
    abstraction ⟹ overhead

Use staging to specialize code once grammar is known
    Delay (quote) code that accesses the input stream: ≪ peek stream == 'a' ≫
    Evaluate code that depends only on the grammar

abstraction + staging ⟹ no overhead
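Plain OCaml has no quotation brackets, so the following sketch only imitates the split between the two stages with a closure; it is not staged code and not how asp is built. Work that depends only on the grammar (here, a FIRST set given as a list) runs once when the closure is created; work that touches the input runs each time the closure is called. All names are hypothetical.

```ocaml
(* Unstaged: every call searches the grammar-derived list again. *)
let accepts_unstaged (first : char list) (c : char) : bool =
  List.mem c first

(* Specialized by hand: the lookup table is built once, when the grammar
   is known. The returned closure stands in for the delayed (quoted) code
   and performs only an array read per input character. *)
let accepts_specialized (first : char list) : char -> bool =
  let table = Array.make 256 false in
  List.iter (fun c -> table.(Char.code c) <- true) first;
  fun c -> table.(Char.code c)

let digits = ['0'; '1'; '2'; '3'; '4'; '5'; '6'; '7'; '8'; '9']

let () =
  let is_digit = accepts_specialized digits in   (* grammar-only work happens here *)
  assert (is_digit '7');
  assert (not (is_digit 'x'));
  (* Both versions agree; only the cost per character differs. *)
  assert (is_digit '3' = accepts_unstaged digits '3')
```

With real staging the second stage would be generated code rather than a closure, so the remaining indirection of the closure call would also disappear.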


slide-36
SLIDE 36

less-naive staging

Binding-time improvements turn dynamic terms static

CPS-convert:
    if peek stream == 'a' then e
    ⇝ peek (fun c → if c == 'a' then e) stream

c is a char:
    f c
    ⇝ match c with | 'a' → f 'a' | 'b' → f 'b' | 'c' → f 'c' | ...

prune using types:
    ⇝ match c with | 'a' → f 'a' | _ → f 'c'


slide-45
SLIDE 45

faster than yacc!

[Bar chart: throughput (MB/s) on the arith and sexp benchmarks for unstaged, yacc and staged parsers]

slide-46
SLIDE 46

Future: heterogeneous staging

Homogeneity limits both guarantees and performance

[Diagram: a language-expressiveness axis running from low-level (monomorphic, efficient) to high-level (powerful types & abstractions), with host and target languages placed on it for homogeneous and for heterogeneous staging]

slide-47
SLIDE 47

Future: verified staging

Challenge: verifying a staged program

C, code: a typed interface to C
    _;_ : Statement → Statement → Statement
    …

C, logic: reduction semantics
    ⇝-assignment : E ⊢ e ⇒ v → S (x := e) k E ⇝ S nop k (x → v , E)
    …

Parsers, code: parser operations
    seq : Parser α → Parser α → Parser α
    …

Parsers, logic: parser properties
    assoc: seq p (seq q r) ≅ seq (seq p q) r
    …

slide-48
SLIDE 48

parsing & security
types & algebras: Γ ⊢ e : τ
staging & speed: ≪ e ≫
speed & correctness: ✓ ✓