 
Decision Procedures for String Constraints
Ph.D. Proposal
Pieter Hooimeijer, University of Virginia
Motivation
[Figure: Mitre Corp. data reported on http://www.attrition.org/, with two entries highlighted as #1 and #2]
Motivation
“String values have lost their innocence and are being used in many unforeseen contexts.” [Thiemann05]
Motivation
Now what?
Goal
Make string analysis available to a wider class of program analysis tools.
Outline
• String Constraint Solving
  – example code
  – definitions
• Preliminary Results
• Proposed Research
Example
  // v1 and v2 are user inputs
  if (!ereg('o(pp)+', v1)){exit;}
  if (!ereg('p*q', v2)){exit;}
  v3 = v1 . v2; // concat
  if (v3 != 'oppppq'){exit;}
  magic();
Query: Will this code ever execute magic()?
Example (the three constraints, numbered)
  // v1 and v2 are user inputs
  if (!ereg('o(pp)+', v1)){exit;}   // constraint 1
  if (!ereg('p*q', v2)){exit;}      // constraint 2
  v3 = v1 . v2; // concat
  if (v3 != 'oppppq'){exit;}        // constraint 3
  magic();
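For a query this small, the answer can be checked by brute force. The sketch below is an illustration in Python, not any of the tools discussed in this talk. It assumes each pattern must match the whole input (`re.fullmatch`), which is stricter than PHP's `ereg`; the function name `solve` is hypothetical.

```python
import re

def solve(target, r1, r2):
    """Enumerate (v1, v2) with v1 + v2 == target, v1 in L(r1), v2 in L(r2).

    Assumption: both patterns are anchored, i.e. they must match the
    entire string rather than a substring.
    """
    return [(target[:i], target[i:])
            for i in range(len(target) + 1)
            if re.fullmatch(r1, target[:i]) and re.fullmatch(r2, target[i:])]

solutions = solve("oppppq", r"o(pp)+", r"p*q")
print(solutions)  # [('opp', 'ppq'), ('opppp', 'q')]
```

A non-empty result means some pair of inputs reaches magic(). Trying every split only works because the concatenation is pinned to a single constant string; it does not generalize to arbitrary regular languages.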
Definitions: String Constraint
  C ::= E ∈ R | E ∉ R
  E ::= V | E ◦ V
  R : regex
  V : variable
Definitions: Constraint System
  S = { C_1, ..., C_n }, where each C_i ∈ S is a well-formed string constraint.
Definitions: Decision Procedure
  D : constraint system → { Satisfiable, Unsatisfiable }
Definitions: Soundness and Completeness
  Soundness:    [ D(S) = Sat. ] → S is sat.
  Completeness: S is sat. → [ D(S) = Sat. ]
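To make the definitions concrete, here is a minimal Python sketch of a bounded checker. It is an illustration only, not a procedure from the talk. It reads the grammar as: an expression is a concatenation of variables, and a constraint asserts that the expression is, or is not, in a regular language. The tuple encoding, the name `bounded_decide`, the whole-string matching, and the "Unknown" answer are all assumptions made for this example.

```python
import re
from itertools import product

# A constraint is (variables, negated, regex): the concatenation of the
# listed variables must (or, if negated, must not) be in the language.
def bounded_decide(constraints, alphabet, max_len):
    """Try every assignment of strings up to max_len; return a witness if found."""
    variables = sorted({v for vs, _, _ in constraints for v in vs})
    candidates = ["".join(t) for k in range(max_len + 1)
                  for t in product(alphabet, repeat=k)]
    for values in product(candidates, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all((re.fullmatch(rx, "".join(env[v] for v in vs)) is None) == negated
               for vs, negated, rx in constraints):
            return "Satisfiable", env
    return "Unknown", None   # no witness within the bound; says nothing beyond it

system = [(("v1",), False, r"o(pp)+"),
          (("v1", "v2"), False, r"oppppq"),
          (("v2",), False, r"p*q")]
print(bounded_decide(system, "opq", 5))
print(bounded_decide(system, "opq", 2))
```

The checker is sound in the sense above: it answers Satisfiable only when it holds a satisfying assignment. It is not complete, because a satisfiable system whose witnesses all exceed the bound yields Unknown, as the second call shows.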
Existing Tools
• DPRLE [PLDI09]: automata
• Hampi [ISSTA09]: encode to STP
• Rex [ICST10]: encode to Z3
• Kaluza [Oakland10]: encode to Hampi & STP
• Our prototype: lazy automata
Questions
Goal: make string analysis available to a wider class of program analysis tools.
• What is acceptable performance?
• What type of constraints should we allow?
Outline
• String Constraint Solving
• Preliminary Results
  – scalability
  – expressive utility
• Proposed Research
Scalability
Subjects:
- Decision Procedure for Regular Language Equations (DPRLE) [PLDI09]
- Hampi [ISSTA09]
- Lazy Prototype
Scalability
Task: find a string that is in both [a-c]*a[a-c]{n+1} and [a-c]*b[a-c]{n}
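The task is satisfiable for every n: the first pattern fixes an 'a' at distance n+2 from the end, and the second fixes a 'b' at distance n+1 from the end. The Python sketch below checks one hand-built witness against both patterns. The helper names are hypothetical, and this is only a sanity check of the benchmark, not how any of the measured tools search.

```python
import re

def benchmark_patterns(n):
    """The two regular expressions of the task, for a given n."""
    return (r"[a-c]*a[a-c]{%d}" % (n + 1), r"[a-c]*b[a-c]{%d}" % n)

def witness(n):
    """A shortest common string: 'a', then 'b', then n filler letters."""
    return "ab" + "a" * n

for n in range(50):
    r1, r2 = benchmark_patterns(n)
    w = witness(n)
    # fullmatch: the whole string must belong to each language
    assert re.fullmatch(r1, w) and re.fullmatch(r2, w)
```

No common string can be shorter than n+2 letters, so a solver has to track both suffix counters at once as n grows.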
Scalability
[Figure: Time to Generate First String]
Scalability
• Existing approaches are less scalable than they could be on the tested benchmarks
• Interaction with an underlying solver introduces performance artifacts
Expressive Utility
• Picked 88 PHP projects on SourceForge (9.6 million LOC in total)
• Tally: 111 distinct string functions
Expressive Utility
Index: 63,003 (substr, strlen, strpos, ...)
Regex: 29,141 (preg_match, preg_replace, ...)
Expressive Utility
• Existing approaches typically support 'Regex' operations, but not 'Index' operations
• 'Index' operations were about 2x as common as 'Regex' operations in the sample under study (63,003 vs. 29,141 occurrences)
Outline
• String Constraint Solving
• Preliminary Results
• Proposed Research
  – subset constraints
  – scalability through laziness
  – integer index operations
  – proof strategies
Thesis Statement
It is possible to construct a practical algorithm that decides the satisfiability of constraints that cover both string and integer index operations, scales up to real-world program analysis problems, and admits a machine-checkable proof of correctness.
Subset Constraints [PLDI'09]
[Figure labels: concatenation, constants, variables]
Approach
Input (1, 2, 3) → Cross Product (c_1 ◦ c_2) ∩ c_3 → ✔ Sat. or ✘ Unsat.
[Figure: cross-product automaton for the example. Component machines have states a1 to a4 (edges o, p, p, ε), b1 to b2 (edges p, q), and d1 to d7 (edges o, p, p, p, p, q). Product states shown: a1d1, a2d2, a3d3, a2d4, a4d4, b1d4, a3d5, b1d5, a4d6, b1d6, b2d7.]
Solution I: v1 = { opp }, v2 = { ppq }
Solution II: v1 = { opppp }, v2 = { q }
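The construction in the figure can be replayed in a few lines of Python. This is a simplified sketch, not the verified Concat-Intersect algorithm: it assumes the third language is the single constant string 'oppppq', so a product state is a pair (machine state, position in the string). The transition tables are transcribed from the state names in the figure; the function names and the tuple encoding are hypothetical.

```python
from collections import deque

# Transitions are (source, symbol, target); symbol None marks an ε-move.
M1 = [("a1", "o", "a2"), ("a2", "p", "a3"),
      ("a3", "p", "a4"), ("a4", None, "a2")]   # o(pp)+, accepting in a4
M2 = [("b1", "p", "b1"), ("b1", "q", "b2")]    # p*q, accepting in b2
BRIDGE = ("a4", None, "b1")  # ε-move that concatenates M1 and M2

def reachable(delta, target, start):
    """Product states (q, i) reachable from start; i is a position in target."""
    seen, todo = {start}, deque([start])
    while todo:
        q, i = todo.popleft()
        for src, sym, dst in delta:
            if src != q:
                continue
            if sym is None:                       # ε-move: position unchanged
                nxt = (dst, i)
            elif i < len(target) and target[i] == sym:
                nxt = (dst, i + 1)                # consume one letter of target
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def concat_intersect(m1, m2, bridge, start, accept, target):
    """Each usable crossing of the bridge yields one (v1, v2) split of target."""
    delta = m1 + m2 + [bridge]
    forward = reachable(delta, target, (start, 0))
    goal = (accept, len(target))
    return [(target[:i], target[i:])
            for i in range(len(target) + 1)
            if (bridge[0], i) in forward
            and goal in reachable(delta, target, (bridge[2], i))]

solutions = concat_intersect(M1, M2, BRIDGE, "a1", "b2", "oppppq")
print(solutions)  # [('opp', 'ppq'), ('opppp', 'q')]
```

Position i in the code corresponds to state d(i+1) in the figure, so the two bridge crossings found here are a4d4 → b1d4 and a4d6 → b1d6, matching Solutions I and II.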
Algorithms and a Proof
• Concat-Intersect (CI) algorithm:
  – two variables, three constants; fixed form
  – mechanically verified proof in Coq 8.1pl3
  – proof size is ~1300 lines
• Regular Matching Assignments (RMA):
  – implemented in a tool, DPRLE
  – applies CI procedure inductively
Evaluation
• Find SQL injection vulnerabilities [Wassermann and Su; PLDI07]
• For each vulnerability:
  – generate SQL + program path
  – check path consistency (Simplify)
  – solve string constraints (DPRLE)
Scalability through Laziness
Idea: Cast constraint solving as a search problem. Traverse as little of the search space as possible.
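One generic way to picture the idea is an on-the-fly product search: instead of building the whole intersection automaton first, expand product states only as a breadth-first search reaches them, and stop at the first accepting state. The Python sketch below applies this to the scalability task [a-c]*a[a-c]{n+1} ∩ [a-c]*b[a-c]{n}. It is not the proposed algorithm; the NFA encoding and every name in it are assumptions made for illustration.

```python
import re
from collections import deque

def suffix_nfa(marker, tail):
    """NFA for [a-c]*<marker>[a-c]{tail} as (successor function, accepting state).
    State 0 is initial and loops on every letter; states 1..tail count the tail."""
    def successors(state, ch):
        if state == 0:
            return [0, 1] if ch == marker else [0]
        return [state + 1] if state <= tail else []
    return successors, tail + 1

def lazy_intersection_witness(n, alphabet="abc"):
    """Breadth-first search over product states, generated only on demand."""
    succ1, accept1 = suffix_nfa("a", n + 1)
    succ2, accept2 = suffix_nfa("b", n)
    start, goal = (0, 0), (accept1, accept2)
    parent = {start: None}            # discovered state -> (predecessor, letter)
    todo = deque([start])
    while todo:
        state = todo.popleft()
        if state == goal:             # stop at the first accepting product state
            letters = []
            while parent[state] is not None:
                state, ch = parent[state]
                letters.append(ch)
            return "".join(reversed(letters)), len(parent)
        for ch in alphabet:
            for s1 in succ1(state[0], ch):
                for s2 in succ2(state[1], ch):
                    if (s1, s2) not in parent:
                        parent[(s1, s2)] = (state, ch)
                        todo.append((s1, s2))
    return None, len(parent)

n = 3
word, discovered = lazy_intersection_witness(n)
assert re.fullmatch(r"[a-c]*a[a-c]{%d}" % (n + 1), word)
assert re.fullmatch(r"[a-c]*b[a-c]{%d}" % n, word)
print(word, discovered, "product states discovered of", (n + 3) * (n + 2))
```

The second return value counts the product states discovered before the search stopped, which can be compared against the size of the full product, (n+3)·(n+2) state pairs for these two machines.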
Proposed Approach
  datatype searchstate = { next   : variable;
                           states : variable → pos → status }
  datatype status =
    | Unknown of status
    | StartsAt of nfastate → status
    | Path of nfapath → status
Proposed Evaluation
• Within-domain performance comparison:
  – CFG Analyzer
  – DPRLE
  – Rex
  – Hampi
• Use previously-published benchmarks:
  – long strings task [Veanes et al.]
  – set difference task [Veanes et al.]
  – grammar intersection task [Kiezun et al.]