Statically Typed String Sanitation Inside a Python Nathan Fulton - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Statically Typed String Sanitation Inside a Python

Nathan Fulton, Cyrus Omar, Jonathan Aldrich

slide-2
SLIDE 2

The Problem

Applications use strings to build SQL commands

sql_exec("SELECT * FROM users WHERE " + "username = " + input1 + " AND " + "password = " + input2)

01

slide-3
SLIDE 3

The Problem

Applications use strings to build HTML commands

print("You searched for: " + keyword)

02

slide-4
SLIDE 4

The Problem

Applications use strings to build JS commands

print("<script>" + "document.getElementById(" + "'" + input + "'" + ")" + "..." + "</script>")

03

slide-5
SLIDE 5

The Problem

Applications use strings to build shell commands

call("cat " + input)

04

slide-6
SLIDE 6

Arbitrary strings are dangerous.
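
To make the danger concrete, here is a minimal Python sketch of the query construction from the SQL slide. The function name build_login_query is hypothetical, and the query is only built as a string, never executed.

```python
def build_login_query(input1: str, input2: str) -> str:
    """Build the login query by plain concatenation, as on the SQL slide."""
    return ("SELECT * FROM users WHERE "
            + "username = " + input1
            + " AND "
            + "password = " + input2)


# A benign request produces the query the developer had in mind.
benign = build_login_query("'alice'", "'s3cret'")

# A hostile password input appends a condition that is always true.
hostile = build_login_query("'alice'", "'' OR 1 = 1")

print(benign)
print(hostile)
```

The second query is still syntactically valid SQL, but its WHERE clause no longer depends on the password, which is the kind of outcome the rest of the talk aims to rule out statically.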

05

slide-7
SLIDE 7

Existing Solutions

  • Web Frameworks

06

slide-8
SLIDE 8

Existing Solutions

  • Web Frameworks

○ may contain bugs

07

slide-9
SLIDE 9

Existing Solutions

  • Web Frameworks

○ may contain bugs

  • Prepared Statements

08

slide-10
SLIDE 10

Existing Solutions

“Drupal is an open source content management platform powering millions of websites… During a code audit of Drupal extensions for a customer an SQL Injection was found in the way the Drupal core handles prepared statements. A malicious user can inject arbitrary SQL queries… This leads to a code execution as well.”

  • Stefan Horst, 6 days ago

09

slide-11
SLIDE 11

Existing Solutions

  • Web Frameworks

○ may contain bugs

  • Prepared Statements

○ may contain bugs

10

slide-12
SLIDE 12

Existing Solutions

  • Web Frameworks

○ may contain bugs

  • Prepared Statements

○ may contain bugs

  • Problem specific parsers

11

slide-13
SLIDE 13

Existing Solutions

“Three of our Sports API servers had malicious code executed on them… This mutation happened to exactly fit a command injection bug in a monitoring script our Sports team was using at that moment to parse and debug their web logs.”

  • Alex Stamos (Yahoo! CISO), two weeks ago

12

slide-14
SLIDE 14

Existing Solutions

  • Web Frameworks

○ may contain bugs

  • Prepared Statements

○ may contain bugs

  • Problem specific parsers

○ may contain bugs

13

slide-15
SLIDE 15

The Goal: A general approach for specifying and verifying input sanitation procedures, with a minimal trusted core.

14

slide-16
SLIDE 16

Arbitrary strings are dangerous. Static reasoning about strings is easy!

15

slide-17
SLIDE 17

Regular Expression Types

Python, Java, etc.: string
λRS: string[regex]

16

slide-18
SLIDE 18

Contributions

  • Regular Expression Types corresponding to common string and regex library operations.
  • Translation into a language with a bare string type.

Together, these define a type system extension which is implemented in the extensible programming language atlang.

17

slide-19
SLIDE 19

Typing Rule for String Literals

If:

  • s is a string in the language of r

Then:

  • rstr[s] has type stringin[r].
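
The premise of this rule is a language-membership test, which can be sketched in plain Python with the standard re module. This is only a runtime stand-in for what the type checker does statically; check_literal and the pattern [^']* (used here for "no single quote") are assumptions made for illustration.

```python
import re


def check_literal(s: str, r: str) -> str:
    """Accept the literal s at type stringin[r] only if s is in L(r)."""
    # re.fullmatch requires the whole string to match, mirroring s ∈ L(r).
    if re.fullmatch(r, s) is None:
        raise TypeError(f"{s!r} is not in L({r})")
    return s


NO_QUOTE = r"[^']*"  # hypothetical spelling of "strings without a single quote"

accepted = check_literal("alice", NO_QUOTE)

try:
    check_literal("al'ice", NO_QUOTE)
    rejected = False
except TypeError:
    rejected = True
```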

18

slide-20
SLIDE 20

Typing Rule for String Literals

19

slide-21
SLIDE 21

The Security Theorem

If e has type stringin[r], then e evaluates to a string (denoted rstr[s]) such that s ∈ L(r).

20

slide-22
SLIDE 22

"""this function will remove quotes."""
def sanitize(s : string):
    s  # TODO

def get_user(u : string):
    sql_exec("select * from users where " + "username = '" + u + "'")

21

slide-23
SLIDE 23

"""this function will remove quotes."""
def sanitize(s : string):
    s  # TODO

def get_user(u : string):
    sql_exec("select * from users where " + "username = '" + u + "'")

x = "';DELETE FROM users--"

get_user(sanitize(x))

22

slide-24
SLIDE 24

"""this function will remove quotes."""
def sanitize(s : string):
    s  # TODO

def get_user(u : string[!']):
    sql_exec("select * from users where " + "username = '" + u + "'")

x = "';DELETE FROM users--"

get_user(sanitize(x))
         ^ type error! L(.*) is not contained in L(!')

23

slide-25
SLIDE 25

"""this function will remove quotes."""
def sanitize(s : string) -> stringin[!']:
    s.replace(r"'", "")

def get_user(u : string[!']):
    sql_exec("select * from users where " + "username = '" + u + "'")

x = "';DELETE FROM users--"

get_user(sanitize(x))
         ^ OK!
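
For readers who want to run the example, the following is a rough plain-Python approximation. It replaces the static type on u with a runtime check, which is weaker than what the slides propose; get_user_query, NO_QUOTE and the pattern [^']* are hypothetical names, and the query is returned rather than executed.

```python
import re

NO_QUOTE = re.compile(r"[^']*")  # stands in for the slide's !' specification


def sanitize(s: str) -> str:
    """Remove every single quote, so the result lies in L(NO_QUOTE)."""
    return s.replace("'", "")


def get_user_query(u: str) -> str:
    """Build the query; the check plays the role of the static type on u."""
    if NO_QUOTE.fullmatch(u) is None:
        raise ValueError("u must not contain a single quote")
    return "select * from users where " + "username = '" + u + "'"


x = "';DELETE FROM users--"
print(get_user_query(sanitize(x)))
```

Calling get_user_query(x) without sanitizing raises ValueError at run time, whereas the type system in the slides is meant to reject the corresponding program before it runs.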

24

slide-26
SLIDE 26

Regular Expressions

r ::= a | r·r | r ++ r | r*

25

slide-27
SLIDE 27

Regular Languages

r ::= a | r·r | r ++ r | r*

L(psp) = {psp}
L(ps*p) = {pp, psp, pssp, psssp, ...}
L(a ++ b) = {a, b}
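
These language memberships can be spot-checked with Python's re module, writing alternation as | instead of ++. The helper in_language and the candidate list are illustrative assumptions.

```python
import re


def in_language(r: str, s: str) -> bool:
    """True when the whole string s matches r, i.e. s ∈ L(r)."""
    return re.fullmatch(r, s) is not None


candidates = ["pp", "psp", "pssp", "ps", "a", "b", "ab"]

in_ps_star_p = [s for s in candidates if in_language("ps*p", s)]
in_a_or_b = [s for s in candidates if in_language("a|b", s)]

print(in_ps_star_p)  # ['pp', 'psp', 'pssp']
print(in_a_or_b)     # ['a', 'b']
```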

26

slide-28
SLIDE 28

Regexes as Specs

Often Unstated Specifications: !'

27

slide-29
SLIDE 29

Regexes as Specs

Often Unstated Specifications:
!'
(a|b|c|...)*

28

slide-30
SLIDE 30

Regexes as Implementations

Often Unstated Specifications:
!'
(a|b|c|...)*

Implementations:
replace(!', "", input)

29

slide-31
SLIDE 31

Unstated Assertion: implementation meets specification.

30

slide-32
SLIDE 32

The Core Language (1 / 2)

Construct    Abstract Syntax             A Python
Concat       rconcat(e1; e2)             e1 + e2
Substring    rstrcase(e1; e2; x,y.e3)    if e1 == "": e2 else: e3(e1[:1], e1[1:])
Replace      rreplace[r](e1; e2)         e1.sub(r"r", e2)

31

slide-33
SLIDE 33

The Core Language (2 / 2)

Concept      Abstract Syntax             A Python
Coercion     rcoerce[r](e)               e
Checks       rcheck[r](e; x.e1; e2)      if re.search(r"r", e) == None: e2 else: e1(e)
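
The coercion and check rows can be written out as ordinary Python functions. This is a sketch under two assumptions: it uses re.fullmatch rather than the re.search shown in the table so that the test is whole-string membership in L(r), and it passes the failure branch as an already-evaluated value instead of an unevaluated expression.

```python
import re
from typing import Callable, TypeVar

T = TypeVar("T")


def rcoerce(r: str, e: str) -> str:
    """Coercion erases to the string itself; only the static type changes."""
    return e


def rcheck(r: str, e: str, on_match: Callable[[str], T], otherwise: T) -> T:
    """Checked cast: run on_match(e) when e ∈ L(r), else return otherwise."""
    if re.fullmatch(r, e) is None:
        return otherwise
    return on_match(e)


greeting = rcheck(r"[a-z]+", "alice", lambda u: "hello " + u, "rejected")
refused = rcheck(r"[a-z]+", "al'ice", lambda u: "hello " + u, "rejected")
```

Here greeting is "hello alice" and refused is "rejected": the checked cast is the only construct that inspects the string at run time.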

32

slide-34
SLIDE 34

λRS

String Concatenation    rconcat(e; e)
Substrings              rstrcase(e; e; x,y.e)
Substitution            rreplace[r](e; e)
Coercions               rcoerce[r](e)
Checked Casts           rcheck[r](e; x.e; e)

33

slide-35
SLIDE 35

String Concatenation

Recall: if e has type stringin[r] then e evaluates to v and v ∈ L(r).

34

slide-36
SLIDE 36

String Concatenation

Recall: if e has type stringin[r] then e evaluates to v and v ∈ L(r).

If:

  • e1 : stringin[r1]
  • e2 : stringin[r2]

then:

  • rconcat(e1; e2) : stringin[r1·r2].
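
The concatenation rule rests on the fact that joining a string from L(r1) with a string from L(r2) always yields a string in L(r1·r2). A small Python check of that fact on sample strings, with r1, r2 and the samples chosen arbitrarily for illustration:

```python
import re
from itertools import product

r1 = "ab*"
r2 = "c|d"
# Non-capturing groups keep each operand intact when the two are sequenced.
r1_then_r2 = f"(?:{r1})(?:{r2})"

samples1 = ["a", "ab", "abb"]   # each is in L(r1)
samples2 = ["c", "d"]           # each is in L(r2)

concatenations = [s1 + s2 for s1, s2 in product(samples1, samples2)]
all_in_language = all(
    re.fullmatch(r1_then_r2, s) is not None for s in concatenations
)
print(all_in_language)  # True
```

A test over samples is not a proof, of course; it only illustrates the property that the typing rule relies on.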

35

slide-37
SLIDE 37

String Concatenation

Recall: if e has type stringin[r] then e evaluates to v and v ∈ L(r).

36

slide-38
SLIDE 38

Example Typing Derivation

37

slide-39
SLIDE 39

Substrings

""" S = state code then D.O.B. """
def get_state(s : stringin[(a-z0-9)*]):
    rstrcase(s; ''; x + rstrcase(y; ''; x))

38

slide-40
SLIDE 40

Substrings

get_state("WI1956")

39

slide-41
SLIDE 41

Substrings

get_state("WI1956")
⇓ rstrcase("WI1956"; ''; x + rstrcase(y; ''; x))

40

slide-42
SLIDE 42

Substrings

get_state("WI1956")
⇓ rstrcase("WI1956"; ''; x + rstrcase(y; ''; x))
⇓ "W" + rstrcase("I1956"; ''; x)

41

slide-43
SLIDE 43

Substrings

get_state("WI1956")
⇓ rstrcase("WI1956"; ''; x + rstrcase(y; ''; x))
⇓ "W" + rstrcase("I1956"; ''; x)
⇓ "W" + "I" = "WI"
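
The same evaluation can be reproduced in plain Python by following the translation given for rstrcase in the core-language table (empty-string test, then head e1[:1] and tail e1[1:]). Modelling the x,y.e3 binder as a two-argument lambda is an assumption of this sketch.

```python
from typing import Callable


def rstrcase(e1: str, e2: str, e3: Callable[[str, str], str]) -> str:
    """Return e2 for the empty string, else apply e3 to (head, tail)."""
    if e1 == "":
        return e2
    return e3(e1[:1], e1[1:])


def get_state(s: str) -> str:
    """Take the first two characters of s by two nested case analyses."""
    return rstrcase(
        s, "",
        lambda x, y: x + rstrcase(y, "", lambda x2, _y2: x2),
    )


print(get_state("WI1956"))  # WI
```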

42

slide-44
SLIDE 44

Substrings

“Get the first n characters of a string s”

43

slide-45
SLIDE 45

Substrings

“Get the first character of a string s”
“Get everything after the first character of s”

44

slide-46
SLIDE 46

Substrings

“Get the first character of a string s”

lhead(r) = lhead(r, ε)
lhead(ε, r') = ε
lhead(a, r') = a
lhead(r1·r2, r') = lhead(r1, r2)
lhead(r1 + r2, r') = lhead(r1, r') + lhead(r2, r')
lhead(r*, r') = lhead(r', ε) + lhead(r, ε)

45

slide-47
SLIDE 47

Substrings

“Get the first character of a string s”

lhead(r) = lhead(r, ε)
lhead(ε, r') = ε
lhead(a, r') = a
lhead(r1·r2, r') = lhead(r1, r2)
lhead(r1 + r2, r') = lhead(r1, r') + lhead(r2, r')
lhead(r*, r') = lhead(r', ε) + lhead(r, ε)

“Get everything after the first character of s”

ltail(r) = δa(r) + δb(r) + δc(r) + ...

46

slide-48
SLIDE 48

Substrings

Observation: If s ∈ L((a-z)*(0-9)) then get_state(rstr[s]) ⇓ rstr[t] such that t ∈ L((a-z0-9)*).

47

slide-49
SLIDE 49

Substrings

Observation: If s ∈ L((a-z)*(0-9)) then get_state(rstr[s]) ⇓ rstr[t] such that t ∈ L((a-z0-9)*).

48

slide-50
SLIDE 50

On the precision of rstrcase

Note that lhead(r)·ltail(r) ≠ r.

49

slide-51
SLIDE 51

On the precision of rstrcase

Note that lhead(r)·ltail(r) ≠ r.

Example: Choose r = (ab)+(cd), so “ad” ∉ L(r). Note that:

lhead(r) = a + c
ltail(r) = δa(r) + δc(r) = b + d

Therefore, “ad” ∈ L(lhead(r)·ltail(r)).
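
This loss of precision can be checked mechanically. The sketch below is not the slides' definition of lhead: it computes the set of possible first characters directly and uses Brzozowski derivatives for δ, over a tiny regex syntax tree without Kleene star. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


class Regex:
    """Base class for a tiny regular-expression syntax tree."""


@dataclass(frozen=True)
class Empty(Regex):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Regex):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Regex):
    c: str


@dataclass(frozen=True)
class Seq(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


def nullable(r: Regex) -> bool:
    """True when the empty string is in L(r)."""
    if isinstance(r, Eps):
        return True
    if isinstance(r, Seq):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return False  # Empty and Chr


def deriv(r: Regex, a: str) -> Regex:
    """Brzozowski derivative: what may follow a leading character a."""
    if isinstance(r, Chr):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Seq):
        step = Seq(deriv(r.left, a), r.right)
        return Alt(step, deriv(r.right, a)) if nullable(r.left) else step
    return Empty()  # Empty and Eps


def matches(r: Regex, s: str) -> bool:
    """s ∈ L(r) iff the derivative by every character of s is nullable."""
    return nullable(reduce(deriv, s, r))


def first_chars(r: Regex) -> set:
    """Characters that can start a string of L(r) (an over-approximation)."""
    if isinstance(r, Chr):
        return {r.c}
    if isinstance(r, Alt):
        return first_chars(r.left) | first_chars(r.right)
    if isinstance(r, Seq):
        rest = first_chars(r.right) if nullable(r.left) else set()
        return first_chars(r.left) | rest
    return set()


def lhead(r: Regex) -> Regex:
    """Alternation of the possible first characters of r."""
    return reduce(Alt, [Chr(c) for c in sorted(first_chars(r))], Empty())


def ltail(r: Regex) -> Regex:
    """Alternation of the derivatives of r by each possible first character."""
    return reduce(Alt, [deriv(r, c) for c in sorted(first_chars(r))], Empty())


r = Alt(Seq(Chr("a"), Chr("b")), Seq(Chr("c"), Chr("d")))  # (ab)+(cd)
print(matches(r, "ad"))                       # False
print(matches(Seq(lhead(r), ltail(r)), "ad"))  # True
```

The head and tail are each computed soundly, but pairing them forgets which head went with which tail, so the sequenced language is strictly larger than L(r).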

50

slide-52
SLIDE 52

String Replacement

subst(r; s1; s2) reads “substitute s2 for r in s1”

51

slide-53
SLIDE 53

String Replacement

52

slide-54
SLIDE 54

String Replacement

Key Fact: lreplace and subst correspond. If

  • s1 ∈ L(r1), and
  • s2 ∈ L(r2),

then subst(r, s1, s2) ∈ L(lreplace(r, r1, r2)).

53

slide-55
SLIDE 55

String Replacement

subst(r, s1, s2) is in lreplace(r, r1, r2). This does not entail a definition of lreplace given a definition of subst.

54

slide-56
SLIDE 56

Saturation

replace("ee", "Kleeene", "e")

replace ee in "Kleeene" with e

= “Kleene”
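
One reading of this slide is that a single left-to-right replacement pass can leave new occurrences of the pattern behind, so the operation has to be repeated until nothing changes. The Python sketch below illustrates that reading; replace_saturated is a hypothetical helper, and its length guard is an assumption added to guarantee termination.

```python
def replace_once(pattern: str, s: str, replacement: str) -> str:
    """One left-to-right pass over non-overlapping occurrences."""
    return s.replace(pattern, replacement)


def replace_saturated(pattern: str, s: str, replacement: str) -> str:
    """Repeat the pass until the pattern no longer occurs in s."""
    # Each pass that changes s makes it strictly shorter, so the loop ends.
    if len(replacement) >= len(pattern):
        raise ValueError("replacement must be shorter than pattern")
    while pattern in s:
        s = s.replace(pattern, replacement)
    return s


once = replace_once("ee", "Kleeene", "e")
saturated = replace_saturated("ee", "Kleeene", "e")
print(once)       # Kleene
print(saturated)  # Klene
```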

55

slide-57
SLIDE 57

Translation

56

slide-58
SLIDE 58

Translation

Translation defines either an embedding (as a language extension) or, alternatively, an erasure.

57

slide-59
SLIDE 59

58

slide-60
SLIDE 60

Atlang Core

[Diagram: the Atlang Core (inference, subtyping, casting, etc.) with several Type Constructor extensions attached, one of which is Regular Strings; the label "<:" also appears.]

59

slide-61
SLIDE 61

Conclusions

Constrained String Types are a general approach for specifying and verifying input sanitation procedures. Unlike other approaches, constrained strings only require a minimal trusted core.

60

slide-62
SLIDE 62

Future Work

  1. Implement a static analysis and verify a realistic query builder.

  2. Application of replacement operation to program repair in dynamic logic over trace semantics.

○ replacement on hybrid regular programs.

  3. Explore other privacy & security applications of extensible type systems.
slide-63
SLIDE 63