Decision Procedures for String Constraints: Ph.D. Proposal, Pieter Hooimeijer



slide-1
SLIDE 1

1

Decision Procedures for String Constraints

Ph.D. Proposal Pieter Hooimeijer University of Virginia

slide-2
SLIDE 2

2

Motivation

Mitre Corp. data reported on http://www.attrition.org/

slide-3
SLIDE 3

3

Motivation

Mitre Corp. data reported on http://www.attrition.org/

#1 #2

slide-4
SLIDE 4

4

Motivation

“String values have lost their innocence and are being used in many unforeseen contexts.” [Thiemann05]

slide-6
SLIDE 6

6

Motivation

#1 #2

“String values have lost their innocence and are being used in many unforeseen contexts.” [Thiemann05]

slide-7
SLIDE 7

7

Motivation

“String values have lost their innocence and are being used in many unforeseen contexts.” [Thiemann05]

Now what?

slide-8
SLIDE 8

8

Goal

Make string analysis available to a wider class of program analysis tools.

slide-9
SLIDE 9

9

Outline

  • String Constraint Solving
  • Preliminary Results
  • Proposed Research
slide-10
SLIDE 10

10

Outline

  • String Constraint Solving
      – example code
      – definitions
  • Preliminary Results
  • Proposed Research
slide-12
SLIDE 12

12

Example

// v1 and v2 are user inputs
if (!ereg('o(pp)+', v1)) { exit; }
if (!ereg('p*q', v2)) { exit; }
v3 = v1 . v2; // concat
if (v3 != 'oppppq') { exit; }
magic();

slide-13
SLIDE 13

13

Query: Will this code ever execute magic?

slide-14
SLIDE 14

14

Example

// v1 and v2 are user inputs
if (!ereg('o(pp)+', v1)) { exit; }   // constraint 1
if (!ereg('p*q', v2)) { exit; }      // constraint 2
v3 = v1 . v2; // concat
if (v3 != 'oppppq') { exit; }        // constraint 3
magic();

slide-15
SLIDE 15

15

Outline

  • String Constraint Solving
      – example code
      – definitions
  • Preliminary Results
  • Proposed Research
slide-17
SLIDE 17

17

Definitions

String Constraint

  C ::= E ∈ R | E ∉ R
  E ::= V | E ◦ V

  R : regex
  V : variable

slide-18
SLIDE 18

18

Definitions

Constraint System

  S = { C1, ..., Cn }, where each Ci ∈ S is a well-formed string constraint.

slide-19
SLIDE 19

19

Definitions

Decision Procedure

  D : constraint system → { Satisfiable, Unsatisfiable }

slide-20
SLIDE 20

20

Definitions

Soundness:    [D(S) = Sat.] → S is sat.
Completeness: S is sat. → [D(S) = Sat.]
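
To make the two properties concrete, here is a minimal Python sketch of a deliberately naive procedure. It is not the procedure proposed in this talk: the name `bounded_decide`, the tuple encoding of constraints, the alphabet, and the length bound are all assumptions made for illustration. It enumerates assignments up to a fixed length, so every "Satisfiable" answer comes with a witness (sound), while a satisfiable system whose witnesses all exceed the bound is reported "Unsatisfiable" (incomplete).

```python
import re
from itertools import product

def bounded_decide(constraints, variables, alphabet="opq", max_len=3):
    """Brute-force check of a constraint system (illustrative only).

    Each constraint is (expr, regex, member): expr is a tuple of variable
    names that are concatenated, and member says whether the result must
    be in (True) or outside (False) the language of regex.
    """
    # All candidate words over the alphabet, shortest first.
    words = ["".join(p) for n in range(max_len + 1)
             for p in product(alphabet, repeat=n)]
    for values in product(words, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all((re.fullmatch(rx, "".join(env[v] for v in expr)) is not None) == member
               for expr, rx, member in constraints):
            return "Satisfiable", env      # witness found: answer is sound
    return "Unsatisfiable", None           # may be wrong if max_len is too small

# The running example: v1 ∈ o(pp)+, v2 ∈ p*q, v1 ◦ v2 ∈ oppppq
system = [(("v1",), r"o(pp)+", True),
          (("v2",), r"p*q", True),
          (("v1", "v2"), r"oppppq", True)]
```

Calling `bounded_decide(system, ["v1", "v2"])` finds a witness, whereas the same call with `max_len=2` does not, because every satisfying value of v1 has at least three characters.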

slide-25
SLIDE 25

25

Outline

  • String Constraint Solving
      – example code
      – definitions
  • Preliminary Results
  • Proposed Research
slide-26
SLIDE 26

26

Existing Tools

Tool                  Approach
DPRLE [PLDI09]        Automata
Hampi [ISSTA09]       Encode to STP
Rex [ICST10]          Encode to Z3
Kaluza [Oakland10]    Encode to Hampi & STP
Our Prototype         Lazy Automata

slide-27
SLIDE 27

27

Questions

Make string analysis available to a wider class of program analysis tools.

slide-28
SLIDE 28

28

Questions

  • What is acceptable performance?
  • What type of constraints should we allow?

slide-29
SLIDE 29

29

Outline

  • String Constraint Solving
  • Preliminary Results
      – scalability
      – expressive utility
  • Proposed Research
slide-30
SLIDE 30

30

Scalability

Subjects:

  • Decision Procedure for Regular Language Equations [PLDI09]
  • Hampi [ISSTA09]
  • Lazy Prototype

slide-31
SLIDE 31

31

Scalability

Task: find a string that is in both [a-c]*a[a-c]{n+1} and [a-c]*b[a-c]{n}
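
For a feel of what the task asks, the following Python sketch solves it by brute force. It is a baseline for illustration, not one of the tools under comparison; the function name `first_common_string` and the `max_len` cutoff are assumptions. It enumerates candidate strings over {a, b, c} by increasing length and returns the first one that both regular expressions accept in full.

```python
import re
from itertools import product

def first_common_string(n, alphabet="abc", max_len=12):
    """Shortest (then lexicographically first) string in both languages."""
    first = re.compile(r"[a-c]*a[a-c]{%d}" % (n + 1))
    second = re.compile(r"[a-c]*b[a-c]{%d}" % n)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            word = "".join(chars)
            # fullmatch: the whole word must belong to each language
            if first.fullmatch(word) and second.fullmatch(word):
                return word
    return None  # nothing found within the length bound

print(first_common_string(1))  # aba
```

Any solution must have an 'a' at position n+2 from the end and a 'b' at position n+1 from the end, so the shortest answers have length n+2. The enumeration above grows exponentially with n, and a deterministic automaton for "the k-th symbol from the end is a" needs on the order of 2^k states, which is presumably why this family of inputs stresses solvers.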

slide-32
SLIDE 32

32

Scalability

[Plot: Time to Generate First String]

slide-34
SLIDE 34

34

Scalability

  • Existing approaches are less scalable than they could be on the tested benchmarks
  • Interaction with an underlying solver introduces performance artifacts

slide-35
SLIDE 35

35

Outline

  • String Constraint Solving
  • Preliminary Results
      – scalability
      – expressive utility
  • Proposed Research
slide-36
SLIDE 36

36

Expressive Utility

  • Picked 88 PHP projects on SourceForge, totaling 9.6 million LOC
  • Tally: 111 distinct string functions
slide-37
SLIDE 37

37

Expressive Utility

slide-38
SLIDE 38

38

Expressive Utility

Index: 63,003 (substr, strlen, strpos, ...)

Regex: 29,141 (preg_match, preg_replace, ...)

slide-39
SLIDE 39

39

Expressive Utility

  • Existing approaches typically support 'Regex,' but not 'Index' operations
  • 'Index' operations were 2x as common in the sample under study

slide-40
SLIDE 40

40

Outline

  • String Constraint Solving
  • Preliminary Results
      – scalability
      – expressive utility
  • Proposed Research
slide-41
SLIDE 41

41

Outline

  • String Constraint Solving
  • Preliminary Results
  • Proposed Research
      – subset constraints
      – scalability through laziness
      – integer index operations
      – proof strategies

slide-43
SLIDE 43

43

Thesis Statement

It is possible to construct a practical algorithm that decides the satisfiability of constraints that cover both string and integer index operations, scales up to real-world program analysis problems, and admits a machine-checkable proof of correctness.
slide-44
SLIDE 44

44

Outline

  • String Constraint Solving
  • Preliminary Results
  • Proposed Research
      – subset constraints
      – scalability through laziness
      – integer index operations
      – proof strategies

slide-45
SLIDE 45

45

Subset Constraints [PLDI'09]

constants variables concatenation

slide-46
SLIDE 46

46

Approach

1 2 3 Input

slide-47
SLIDE 47

47

Approach

Input → Cross Product: (c1 ◦ c2) ∩ c3

  ✔ Sat.
  ✘ Unsat.

slide-48
SLIDE 48

48

Example

// v1 and v2 are user inputs
if (!ereg('o(pp)+', v1)) { exit; }
if (!ereg('p*q', v2)) { exit; }
v3 = v1 . v2; // concat
if (v3 != 'oppppq') { exit; }
magic();

slide-49
SLIDE 49

49

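[Diagram: automata for the example. Machine a (states a1 to a4; edges o, p, p, ε) accepts o(pp)+; machine b (states b1, b2; edges p, q) accepts p*q; machine d (states d1 to d7) accepts the constant oppppq. The cross product has states a1d1, a2d2, a3d3, a4d4, a2d4, b1d4, a3d5, a4d6, b1d6, b1d5, b2d7.]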

slide-50
SLIDE 50

50

[Diagram: cross-product automaton for the example, as on the previous slide]

Solution I: v1 = { opp }, v2 = { ppq }

slide-51
SLIDE 51

51

[Diagram: cross-product automaton for the example]

Solution I: v1 = { opp }, v2 = { ppq }

Solution II: v1 = { opppp }, v2 = { q }
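
The two solutions can be checked with a short Python sketch. This is not the automata-based Concat-Intersect construction shown on the slides: it relies on the third constraint being a single constant string, so it is enough to try every split point. The function name `concat_solutions` is hypothetical, and the sketch assumes each regex must match the whole value (the slides treat `ereg` as a language-membership test).

```python
import re

def concat_solutions(c1, c2, c3):
    """All (v1, v2) with v1 in L(c1), v2 in L(c2) and v1 + v2 == c3 (a constant)."""
    return [(c3[:i], c3[i:])
            for i in range(len(c3) + 1)          # every way to split c3 in two
            if re.fullmatch(c1, c3[:i]) and re.fullmatch(c2, c3[i:])]

print(concat_solutions(r"o(pp)+", r"p*q", "oppppq"))
# [('opp', 'ppq'), ('opppp', 'q')]
```

When c3 is a regular language rather than one string, splitting no longer works, which is where the cross-product construction comes in.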

slide-52
SLIDE 52

52

Algorithms and a Proof

  • Concat-Intersect (CI) algorithm:
      – two variables, three constants; fixed form
      – mechanically verified proof in Coq 8.1pl3
      – proof size is ~1300 lines
  • Regular Matching Assignments (RMA):
      – implemented in a tool, DPRLE
      – applies CI procedure inductively

slide-53
SLIDE 53

53

Evaluation

  • Find SQL injection vulnerabilities [Wassermann and Su; PLDI07]
  • For each vulnerability:
      – generate SQL + program path
      – check path consistency (Simplify)
      – solve string constraints (DPRLE)

slide-54
SLIDE 54

54

Outline

  • String Constraint Solving
  • Preliminary Results
  • Proposed Research
      – subset constraints
      – scalability through laziness
      – integer index operations
      – proof strategies

slide-55
SLIDE 55

55

Scalability through Laziness

Idea: Cast constraint solving as a search problem. Traverse as little of the search space as possible.
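
One familiar instance of this idea is to intersect two automata on the fly instead of building their full product first. The Python sketch below is an illustration under assumptions, not the design proposed in this talk: the helper names `suffix_nfa` and `lazy_first_common_string` are hypothetical, and the automata are hand-written NFAs for languages of the form [a-c]*x[a-c]{k}. The search visits only product states reachable from the start and stops at the first accepting one.

```python
from collections import deque

def suffix_nfa(marker, tail_len):
    """NFA for [a-c]*<marker>[a-c]{tail_len}; states are 0..tail_len+1."""
    def step(state, ch):
        if state == 0:
            nxt = [0]                # stay in the [a-c]* loop
            if ch == marker:
                nxt.append(1)        # guess that this is the marker character
            return nxt
        if state <= tail_len:
            return [state + 1]       # consume one trailing character
        return []                    # accepting state has no outgoing edges
    return 0, step, (lambda s: s == tail_len + 1)

def lazy_first_common_string(nfa1, nfa2, alphabet="abc"):
    """Breadth-first search over pairs of NFA states, built on demand."""
    (s1, step1, acc1), (s2, step2, acc2) = nfa1, nfa2
    seen = {(s1, s2)}
    queue = deque([((s1, s2), "")])
    while queue:
        (p, q), word = queue.popleft()
        if acc1(p) and acc2(q):
            return word, len(seen)   # first hit is a shortest common string
        for ch in alphabet:
            for p2 in step1(p, ch):
                for q2 in step2(q, ch):
                    if (p2, q2) not in seen:
                        seen.add((p2, q2))
                        queue.append(((p2, q2), word + ch))
    return None, len(seen)           # the intersection is empty
```

For example, `lazy_first_common_string(suffix_nfa("a", n + 1), suffix_nfa("b", n))` returns a string of length n+2 together with the number of product states that were actually created.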

slide-56
SLIDE 56

56

Proposed Approach

datatype searchstate = {
  next   : variable;
  states : variable → pos → status
}

datatype status =
  | Unknown  of status
  | StartsAt of nfastate → status
  | Path     of nfapath → status

slide-57
SLIDE 57

57

Proposed Evaluation

  • Within-domain performance comparison:
      – DPRLE
      – Hampi
  • Use previously-published benchmarks:
      – long strings task [Veanes et al.]
      – set difference task [Veanes et al.]
      – grammar intersection task [Kiezun et al.]
      – CFG Analyzer
      – Rex

slide-58
SLIDE 58

58

Outline

  • String Constraint Solving
  • Preliminary Results
  • Proposed Research
      – subset constraints
      – scalability through laziness
      – integer index operations
      – proof strategies

slide-59
SLIDE 59

59

Integer Index Operations

Idea: Extend the lazy search-based approach to support integer index operations. Make use (if possible) of existing integer arithmetic models that support incremental solving.

slide-60
SLIDE 60

60

Proposed Approach

  • Explicitly-typed constraint language for strings and integer indices
  • Support integer arithmetic on indices using an existing approach

slide-61
SLIDE 61

61

Proposed Evaluation

  • Compare to existing approach [Saxena et al.] where features overlap
  • Develop PHP benchmark based on preliminary results
  • Metrics: running time, proportion of testcases fully expressible

slide-62
SLIDE 62

62

Outline

  • String Constraint Solving
  • Preliminary Results
  • Proposed Research
      – subset constraints
      – scalability through laziness
      – integer index operations
      – proof strategies

slide-63
SLIDE 63

63

Proof Strategies

Idea: Develop a more general approach for formally verifying string decision procedures so that proof and algorithm can co-evolve.

slide-64
SLIDE 64

64

Schedule

slide-65
SLIDE 65

65

Conclusion

  • Presented proposed research on decision procedures, focusing on:
      – expressive utility
      – scalability
      – correctness
  • Research thrusts:
      – subset constraints
      – lazy search
      – integer index operations
      – proof strategies

slide-66
SLIDE 66

We encourage difficult questions.

slide-67
SLIDE 67

67

An Example

void site_exec(char *cmd){
  char *slash;
  char *sp = (char*)strchr(cmd,' ');
  /* sanitize the command-string */
  while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
    cmd = slash + 1;

slide-68
SLIDE 68

68

Example

/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
  cmd = slash + 1;

[Diagram: the buffer / b i n / l s - l s \0 with pointers cmd and sp]

slide-69
SLIDE 69

69

/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
  cmd = slash + 1;

[Diagram: the buffer / b i n / l s - l s \0 with pointers sp, cmd, and slash; the condition (slash=strchr(cmd,'/')) is highlighted]

slide-70
SLIDE 70

70

/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
  cmd = slash + 1;

[Diagram: the buffer / b i n / l s - l s \0 with pointers sp, cmd, and slash; the assignment cmd = slash + 1; is highlighted]

slide-71
SLIDE 71

71

/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
  cmd = slash + 1;

[Diagram: the buffer / b i n / l s - l s \0 with pointers sp, cmd, and slash; the condition (slash=strchr(cmd,'/')) is highlighted]

slide-72
SLIDE 72

72

/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
  cmd = slash + 1;

[Diagram: the buffer / b i n / l s - l s \0 with pointers sp, cmd, and slash; the assignment cmd = slash + 1; is highlighted]

slide-73
SLIDE 73

73

/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
  cmd = slash + 1;

[Diagram: the buffer / b i n / l s - l s \0 with pointers sp and cmd, and slash=0]

slide-74
SLIDE 74

74

char *slash;
char *sp = (char*)strchr(cmd,' ');
/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
  cmd = slash + 1;

string cmd   ∈ Σ*
index  sp    := findfirst(cmd, ' ')
string cmd2  := cmd[:sp]
index  slash := findlast(cmd2, '/')
string cmd3  := cmd[slash + 1:]
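
The correspondence between the loop and the loop-free model can be spot-checked in Python. This is a sketch under assumptions: `sanitize_loop` and `sanitize_model` are hypothetical names, C pointers are represented as string offsets, and a NULL result from strchr is represented by -1.

```python
def sanitize_loop(cmd: str) -> str:
    """Index-based transliteration of the C loop (pointers become offsets)."""
    sp = cmd.find(" ")                 # strchr(cmd, ' ') on the original buffer
    start = 0                          # offset of the current cmd pointer
    while sp != -1:
        slash = cmd.find("/", start)   # strchr(cmd, '/') from the current cmd
        if slash == -1 or slash >= sp:
            break
        start = slash + 1              # cmd = slash + 1
    return cmd[start:]

def sanitize_model(cmd: str) -> str:
    """Loop-free model: findfirst, slice, findlast, slice."""
    sp = cmd.find(" ")
    if sp == -1:                       # sp is NULL, so the loop body never runs
        return cmd
    slash = cmd[:sp].rfind("/")        # -1 when there is no '/' before the space
    return cmd[slash + 1:]

for sample in ["/bin/ls -ls", "ls /etc", "/bin/ls", "a/b c/d", " /x"]:
    assert sanitize_loop(sample) == sanitize_model(sample)
```

On these samples the result can still contain '/' after the first space (for instance "ls /etc"), while the part before the first space never does. A constraint solver would be asked to establish such facts for all inputs rather than for a handful of samples.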

slide-75
SLIDE 75

75

Example: Some Queries

  • Can cmd contain '/'? ✔
  • Can the substring between cmd and sp contain '/bin/rm'? ✘