1
Decision Procedures for String Constraints
Ph.D. Proposal Pieter Hooimeijer University of Virginia
2
Motivation
Mitre Corp. data reported on http://www.attrition.org/
4
Motivation
“String values have lost their innocence and are being used in many unforeseen contexts.” [Thiemann05]
8
Goal
Make string analysis available to a wider class of program analysis tools.
9
Outline
10
Outline
– example code – definitions
12
Example
// v1 and v2 are user inputs
if (!ereg('o(pp)+', v1)){exit;}
if (!ereg('p*q', v2)){exit;}
v3 = v1 . v2; // concat
if (v3 != 'oppppq'){exit;}
magic();
13
Query: Will this code ever execute magic?
14
Example
// v1 and v2 are user inputs
if (!ereg('o(pp)+', v1)){exit;}
if (!ereg('p*q', v2)){exit;}
v3 = v1 . v2; // concat
if (v3 != 'oppppq'){exit;}
magic();
[Callouts on slide: 1, 2, 3]
15
Outline
– example code – definitions
17
Definitions
String Constraint
C ::= E ∈ R | E ∉ R
E ::= V | E ◦ V
R : regex
V : variable
18
Definitions
Constraint System S = { C1,..., Cn }
where each Ci ∈ S is a well-formed string constraint.
19
Definitions
Decision Procedure D : constraint system → { Satisfiable, Unsatisfiable }
20
Definitions
Soundness [D(S) = Sat.] → S is sat. Completeness S is sat. → [D(S) = Sat.]
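The definitions above can be made concrete with a small sketch. This is not any of the tools discussed in the talk; it is a hypothetical bounded brute-force checker, assuming constraints are encoded as tuples (expression, regex, positive) and that Python's `re` syntax stands in for R. It is sound in the slide's sense (a "Satisfiable" answer always comes with a witness) but not complete, so it answers "Unknown" rather than "Unsatisfiable" when the bound is exhausted.

```python
import re
from itertools import product


def bounded_decide(constraints, variables, alphabet, max_len):
    """Try every assignment of strings up to max_len; return the first that works.

    Each constraint is (expr, regex, positive): expr is a tuple of variable
    names to concatenate; positive=True encodes E ∈ R, False encodes E ∉ R.
    (This encoding is an assumption made for the sketch.)
    """
    words = ["".join(t) for n in range(max_len + 1)
             for t in product(alphabet, repeat=n)]
    for values in product(words, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all((re.fullmatch(rx, "".join(env[v] for v in expr)) is not None) == positive
               for expr, rx, positive in constraints):
            return "Satisfiable", env
    # The bound was exhausted: nothing can be concluded about longer strings.
    return "Unknown", None


# The running example: v1 ∈ o(pp)+, v2 ∈ p*q, v1 ◦ v2 ∈ oppppq
system = [
    (("v1",), "o(pp)+", True),
    (("v2",), "p*q", True),
    (("v1", "v2"), "oppppq", True),
]
print(bounded_decide(system, ["v1", "v2"], "opq", 3))
```

The search cost grows exponentially with `max_len`, which is one reason real decision procedures work on automata or solver encodings instead of enumeration.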
25
Outline
– example code – definitions
26
Existing Tools
DPRLE          [PLDI09]      Automata
Hampi          [ISSTA09]     Encode to STP
Rex            [ICST10]      Encode to Z3
Kaluza         [Oakland10]   Encode to Hampi & STP
Our Prototype                Lazy Automata
27
Questions
Make string analysis available to a wider class of program analysis tools.
28
Questions
performance?
should we allow?
29
Outline
– scalability – expressive utility
30
Scalability
Subjects:
Language Equations [PLDI09]
31
Scalability
Task: find a string that is in both [a-c]*a[a-c]{n+1} and [a-c]*b[a-c]{n}
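To make the benchmark task concrete, here is a minimal sketch that finds a common string by brute-force enumeration. It is not how any of the compared tools work; the function name and the `max_extra` bound are assumptions for illustration. Any common string must have 'a' immediately followed by 'b' at distance n from the end, so the shortest witness has length n + 2.

```python
import re
from itertools import product


def find_common_string(n, alphabet="abc", max_extra=2):
    """Return the first string (by length, then lexicographic order) in both languages."""
    r1 = re.compile(r"[a-c]*a[a-c]{%d}" % (n + 1))
    r2 = re.compile(r"[a-c]*b[a-c]{%d}" % n)
    # Members of the first language have length >= n + 2, so start there.
    for length in range(n + 2, n + 2 + max_extra + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if r1.fullmatch(candidate) and r2.fullmatch(candidate):
                return candidate
    return None


print(find_common_string(2))
```

The enumeration visits up to 3^(n+2) candidates per length, so the cost grows exponentially in n.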
32
Scalability
[Figure: Time to Generate First String]
34
Scalability
scalable than they could be on the tested benchmarks
solver introduces performance artifacts
35
Outline
– scalability – expressive utility
36
Expressive Utility
SourceForge = 9.6 million LOC
37
Expressive Utility
38
Expressive Utility
Index: 63,003
(substr, strlen, strpos, ...)
Regex: 29,141
(preg_match, preg_replace, ...)
39
Expressive Utility
support 'Regex,' but not 'Index'
common in the sample under study
40
Outline
– scalability – expressive utility
41
Outline
– subset constraints – scalability through laziness – integer index operations – proof strategies
43
Thesis Statement
It is possible to construct a practical algorithm that decides the satisfiability of constraints that cover both string and integer index [...]
program analysis problems, and admits a machine-checkable proof [...]
44
Outline
– subset constraints – scalability through laziness – integer index operations – proof strategies
45
Subset Constraints [PLDI'09]
constants variables concatenation
46
Approach
1 2 3 Input
47
Approach
Input
Cross Product: (c1 ◦ c2) ∩ c3
Unsat.
48
Example
// v1 and v2 are user inputs
if (!ereg('o(pp)+', v1)){exit;}
if (!ereg('p*q', v2)){exit;}
v3 = v1 . v2; // concat
if (v3 != 'oppppq'){exit;}
magic();
49
[Figure: automata for the example constraints (states a1–a4, b1–b2, d1–d7) and their cross product (states a1d1, a2d2, a4d4, a2d4, b1d4, a3d5, a4d6, b1d6, b1d5, b2d7), with transitions on p, q, and ε]
50
Solution I: v1 = { opp }, v2 = { ppq }
51
Solution I: v1 = { opp }, v2 = { ppq }
Solution II: v1 = { opppp }, v2 = { q }
52
Algorithms and a Proof
– two variables, three constants; fixed form
– mechanically verified proof in Coq 8.1pl3
– proof size is ~1300 lines
– implemented in a tool, DPRLE
– applies CI procedure inductively
53
Evaluation
[Wassermann and Su; PLDI07]
– generate SQL + program path
– check path consistency (Simplify)
– solve string constraints (DPRLE)
54
Outline
– subset constraints – scalability through laziness – integer index operations – proof strategies
55
Scalability through Laziness
Idea: Cast constraint solving as a search problem. Traverse as little of the search space as possible.
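One way to picture the idea is an on-the-fly product of two automata that stops at the first accepting state. The sketch below is an illustration only, not the proposed algorithm: it assumes deterministic automata encoded as plain dictionaries (a hypothetical encoding), builds product states only when the breadth-first search reaches them, and returns a shortest common string.

```python
from collections import deque


def constant_dfa(word):
    """DFA accepting exactly one string (helper for the example)."""
    return {"start": 0, "accept": {len(word)},
            "delta": {i: {ch: i + 1} for i, ch in enumerate(word)}}


def first_common_string(dfa_a, dfa_b):
    """Breadth-first search over product states that are created on demand."""
    start = (dfa_a["start"], dfa_b["start"])
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (qa, qb), word = queue.popleft()
        if qa in dfa_a["accept"] and qb in dfa_b["accept"]:
            return word  # stop early: the rest of the product is never built
        for ch, next_a in dfa_a["delta"].get(qa, {}).items():
            next_b = dfa_b["delta"].get(qb, {}).get(ch)
            if next_b is None:
                continue
            pair = (next_a, next_b)
            if pair not in seen:
                seen.add(pair)
                queue.append((pair, word + ch))
    return None  # no reachable product state accepts


# o(pp)+ as a DFA: 0 -o-> 1 -p-> 2 -p-> 3 (accepting), 3 -p-> 2
A_OPP = {"start": 0, "accept": {3},
         "delta": {0: {"o": 1}, 1: {"p": 2}, 2: {"p": 3}, 3: {"p": 2}}}

print(first_common_string(A_OPP, constant_dfa("opppp")))
```

An eager approach would construct the full cross product first; here only the states along the explored paths are ever materialized.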
56
Proposed Approach
datatype searchstate = { next : variable;
                         states : variable → pos → status }
datatype status = | Unknown of status
                  | StartsAt of nfastate → status
                  | Path of nfapath → status
57
Proposed Evaluation
– DPRLE
– Hampi
– CFG Analyzer
– Rex
– long strings task [Veanes et al.]
– set difference task [Veanes et al.]
– grammar intersection task [Kiezun et al.]
58
Outline
– subset constraints – scalability through laziness – integer index operations – proof strategies
59
Integer Index Operations
Idea: Extend the lazy search-based approach to support integer index operations. Make use (if possible) of existing integer arithmetic models that support incremental solving.
60
Proposed Approach
strings and integer indices
using an existing approach
61
Proposed Evaluation
[Saxena et al.] where features overlap
preliminary results
testcases fully expressible
62
Outline
– subset constraints – scalability through laziness – integer index operations – proof strategies
63
Proof Strategies
Idea: Develop a more general approach for formally verifying string decision procedures so that proof and algorithm can co-evolve.
64
Schedule
65
Conclusion
procedures, focusing on:
– expressive utility – scalability – correctness
– subset constraints – lazy search – integer index operations – proof strategies
We encourage difficult questions.
67
An Example
#include <string.h>

void site_exec(char *cmd){
    char *slash;
    char *sp = (char*)strchr(cmd,' ');
    /* sanitize the command-string */
    while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
        cmd = slash + 1;
    /* ... */
}
68
Example
/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
    cmd = slash + 1;
[Figure, animated over slides 68–73: the buffer / b i n / l s - l s \0 with pointers sp, cmd, and slash advancing through the loop; the final frame shows slash=0]
74
char *slash;
char *sp = (char*)strchr(cmd,' ');
/* sanitize the command-string */
while (sp && (slash=strchr(cmd,'/')) && (slash < sp))
    cmd = slash + 1;

string cmd ∈ Σ*
index sp := findfirst(cmd, ' ')
string cmd2 := cmd[:sp]
index slash := findlast(cmd2, '/')
string cmd3 := cmd[slash + 1:]
75
Example: Some Queries
Can cmd contain '/' ?
Can the substring between cmd and sp contain '/bin/rm' ?
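The first query can be explored with a small sketch that mirrors the C loop and the string/index model using Python string operations. The function names and the sample input are assumptions for illustration; `str.find` and `str.rfind` stand in for findfirst and findlast.

```python
def sanitize_loop(cmd: str) -> str:
    """Mirror of the C loop: skip past each '/' that occurs before the first space."""
    sp = cmd.find(" ")          # -1 plays the role of a NULL sp
    start = 0                   # offset of the moving cmd pointer
    while sp != -1:
        slash = cmd.find("/", start)
        if slash == -1 or slash >= sp:
            break
        start = slash + 1
    return cmd[start:]


def sanitize_model(cmd: str) -> str:
    """Mirror of the string/index model: cut after the last '/' before the first space."""
    sp = cmd.find(" ")
    if sp == -1:                # no space: the loop body never runs
        return cmd
    prefix = cmd[:sp]
    slash = prefix.rfind("/")   # -1 when the prefix has no '/'
    return cmd[slash + 1:]


sample = "/bin/ls -l /s"        # hypothetical input
print(sanitize_loop(sample))
print("/" in sanitize_loop(sample))
```

On this input the result still contains '/', because only slashes before the first space are stripped; an input with no space at all is returned unchanged.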