2/9/2009
An Efficient Black-box Technique for Defeating Web Application Attacks
- R. Sekar
Stony Brook University (Research supported by DARPA, NSF and ONR)
Example: SquirrelMail Command Injection
Attack: use maliciously Incoming
Based on “taint”: degree to
Requires policies
Application-independent
(Untrusted input)
(Security-sensitive operations)
(To databases, backend servers, command interpreters, files, …)
Others: 24%
Format string: 1%
Memory errors: 10%
Input validation/DoS: 9%
Directory traversal: 4%
Cross-site scripting: 19%
Command injection: 18%
SQL injection: 14%
Config/Race errors: 1%
Taint-inference: Black-box technique to infer taint by
(PHP, Java, C, C++,…)
Taint Inference
substring matching
Syntax Analysis
parameters, cookies, …
for SQL, HTML, …
Attack Detection
policy enforcement
A substring u of t is tainted if ED(s, u) < d
Here, ED denotes the edit-distance
Approximate ED by ED#, defined on length-|s| substrings of t
Let U (and V) denote the multiset of characters in u (resp., v)
ED#(u, v) = min(|U - V|, |V - U|)
Compute ED# incrementally
Prove: ED(s, r) < d ⇒ ED#(s, r) < d for all substrings r of t
O(|s|^2) space in the worst case; performs like a linear-time algorithm in practice
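The coarse-filtering idea above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (edit_distance, coarse_distance, infer_taint) are hypothetical, ED is assumed to be unit-cost Levenshtein distance, and only length-|s| windows of t are checked exactly, which is a simplification.

```python
from collections import Counter


def edit_distance(a, b):
    """Unit-cost Levenshtein distance, standing in for ED (an assumption)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def coarse_distance(u, v):
    """ED#(u, v) = min(|U - V|, |V - U|) over character multisets U and V."""
    U, V = Counter(u), Counter(v)
    return min(sum((U - V).values()), sum((V - U).values()))


def infer_taint(s, t, d):
    """Return (start, end, distance) for length-|s| windows of t with ED < d.

    The window's character multiset is updated as the window slides; the
    costly edit distance runs only on windows that pass the ED# filter.
    """
    n = len(s)
    hits = []
    if n == 0 or len(t) < n:
        return hits
    S = Counter(s)
    W = Counter(t[:n])
    for start in range(len(t) - n + 1):
        if start > 0:
            W[t[start - 1]] -= 1       # character leaving the window
            W[t[start + n - 1]] += 1   # character entering the window
        coarse = min(sum((S - W).values()), sum((W - S).values()))
        if coarse < d:                 # cheap filter first
            dist = edit_distance(s, t[start:start + n])
            if dist < d:
                hits.append((start, start + n, dist))
    return hits
```

Here s would be an input parameter and t an outgoing request such as an SQL query. Neighbouring windows around a true match can also fall under the threshold, so a caller would typically keep the lowest-distance hit or merge overlapping spans.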
Policy structure mirrors that of syntax trees
And-Or “trees”
Can specify constraints on values (using regular expressions)
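As a rough illustration of the points above, the following Python sketch encodes a policy as an And-Or combination of leaf conditions and evaluates it over a small syntax tree annotated with taint. Everything here is assumed for illustration: the Node, Match, And, Or and violates names, the node kinds, and the example policy are hypothetical and do not reproduce the policy language used in the talk.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Syntax-tree node with a taint flag (hypothetical representation)."""
    kind: str
    value: str = ""
    tainted: bool = False
    children: List["Node"] = field(default_factory=list)


@dataclass
class Match:
    """Leaf condition: node kind, optional value regex, optional taint flag."""
    kind: str
    pattern: Optional[str] = None
    tainted: Optional[bool] = None

    def holds(self, node):
        if node.kind != self.kind:
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, node.value) is None:
            return False
        return self.tainted is None or node.tainted == self.tainted


@dataclass
class And:
    parts: list

    def holds(self, node):
        return all(p.holds(node) for p in self.parts)


@dataclass
class Or:
    parts: list

    def holds(self, node):
        return any(p.holds(node) for p in self.parts)


def violates(policy, root):
    """True if any node in the tree satisfies the attack policy."""
    stack = [root]
    while stack:
        node = stack.pop()
        if policy.holds(node):
            return True
        stack.extend(node.children)
    return False


# Assumed example policy: flag a tainted separator, a tainted command name,
# or a tainted argument whose value contains "../".
policy = Or([
    Match("separator", tainted=True),
    Match("cmd_name", tainted=True),
    And([Match("arg", tainted=True), Match("arg", pattern=r".*\.\./.*")]),
])
```

In this sketch the policy is written against node kinds rather than grammar productions, so the same And-Or structure could be reused with a different parser that labels nodes the same way.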
separator
Complex application-specific data transformations
Protocol/language-specific transformations handled
A limitation common to taint-based approaches
Detected all attacks in experiments with the exception of a
~21K S
HTTP response splitting on WebGoat
Result of coincidental matches (in taint-inference)
Can be controlled by setting the distance threshold d based on the
Likelihood small even for short strings
No false positives reported in experiments
Implication
Can use large distances for moderate-size strings (len > 10), thus
~5x performance improvement due to pruning
Focus on formal characterization of S
Our contributions
A robust, application-independent technique to infer taint
Policies decoupled from grammar
Applicable to many languages
Flow inference algorithms tuned for simpler data (file names,
And related approaches (AMNES
Require deep analysis/instrumentation of applications