An Efficient Black-box Technique for Defeating Web Application - PowerPoint PPT Presentation

An Efficient Black-box Technique for Defeating Web Application Attacks
R. Sekar, Stony Brook University
(Research supported by DARPA, NSF and ONR)
2/9/2009

Example: SquirrelMail Command Injection
• Attack: use maliciously crafted input to exert unintended control over output operations
• Detect “exertion of control”
  • Based on “taint”: degree to which output depends on input
• Detect if control is intended:
  • Requires policies
  • Application-independent policies are preferable

Diagram:
  Incoming Request (Untrusted input): sendto="nobody; rm -rf *"
  Program:
    $send_to_list = $_GET['sendto']
    $command = "gpg -r $send_to_list 2>&1"
    popen($command)
  Program with the attack input substituted:
    $command = "gpg -r nobody; rm -rf * 2>&1"
    popen($command)
    Attack: removes files
  Outgoing Request/Response (Security-sensitive operations)
    (To databases, backend servers, command interpreters, files, …)

Attack Space of Interest (CVE 2006-07)
[Pie chart; one group of slices is labeled “Generalized Injection Attacks”]
• Cross-site scripting: 19%
• Command injection: 18%
• SQL injection: 14%
• Directory traversal: 4%
• Format string: 1%
• Memory errors: 10%
• Input validation/DoS: 9%
• Config/Race errors: 1%
• Others: 24%

Drawbacks of Taint-Tracking and Motivation for Our Approach
• Intrusive instrumentation
  • Transform every statement in target application
  • Can potentially impact stability and robustness
• High performance overheads
  • Often slow down programs by 2x or more
• Language dependence
  • E.g., they apply either to Java or C/C++

Approach Overview
[Diagram of the Protected System. Components: Internet; Web Server (IIS/Apache); Web App (PHP, Java, C, C++, …); Database/Backend server; Interceptors; System Libraries]
Phases shown in the diagram:
• Syntax Analysis
  • Decode HTTP parameters, cookies, …
  • Construct parse trees for SQL, HTML, …
• Taint Inference
  • Based on approximate substring matching
• Attack Detection
  • Syntax and taint-aware policy enforcement

• Efficient, language-neutral, and non-intrusive
• Consists of
  • Taint inference: black-box technique to infer taint by observing inputs and outputs of protected apps
  • Syntax- and taint-aware policies for detecting unintended use of tainted data

Syntax Analysis: Input Parsing
• Inputs:
  • Parse into components
    • Request type, URL, form parameters, cookies, …
    • Exposes more of protocol semantics to other phases
  • All information mapped to (name, value) pairs
  • Normalize formats to avoid effect of various encoding schemes
    • To cope with evasion techniques
    • To ensure accuracy of taint inference
  • Our implementation uses ModSecurity code
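
The input-parsing step can be illustrated with a minimal Python sketch. It is not the ModSecurity-based implementation the slide mentions: the function names `normalize` and `request_to_pairs`, the `get.`/`post.`/`cookie.` name prefixes, and the choice to undo at most three rounds of URL encoding are all assumptions made for illustration.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, unquote, urlsplit


def normalize(value: str, max_rounds: int = 3) -> str:
    """Undo repeated percent-encoding so double-encoded payloads are exposed.

    The round limit is an illustrative assumption, not a value from the talk.
    """
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
    return value


def request_to_pairs(url: str, cookie_header: str = "", body: str = ""):
    """Flatten one HTTP request into a list of (name, value) pairs."""
    pairs = []
    parts = urlsplit(url)
    pairs.append(("url.path", normalize(parts.path)))
    # parse_qsl already performs one round of decoding ('+' and %XX).
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        pairs.append((f"get.{name}", normalize(value)))
    for name, value in parse_qsl(body, keep_blank_values=True):
        pairs.append((f"post.{name}", normalize(value)))
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    for name, morsel in cookie.items():
        pairs.append((f"cookie.{name}", normalize(morsel.value)))
    return pairs


# A double-encoded variant of the SquirrelMail payload from the first example.
for pair in request_to_pairs(
    "/compose.php?sendto=nobody%253B%2520rm%2520-rf%2520*",
    cookie_header="sid=abc%2520def",
):
    print(pair)
```

Normalizing before taint inference matters because the application will typically decode its inputs before using them, so the decoded form is the one that can be matched against outputs.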

Syntax Tree Construction
• Outputs:
  • Pluggable architecture to parse different output languages
    • HTML, SQL, Shell scripts, …
  • Use “rough” parsing, since accurate parsers:
    • are time-consuming to write
    • may not gracefully handle:
      • errors (especially common in HTML), or
      • language extensions and variations (different shells, different flavors of SQL)
  • Map to a language-neutral representation
  • Implemented using standard tools (Flex/Bison)

Taint Inference
• Infer taint by observing inputs and outputs
• Allow for simple transformations that are common in web applications
  • Space removal (or replacement with “_”)
  • Upper-to-lower case transformation, quoting or unescaping, …
  • Other application-specific changes
    • SquirrelMail, when given the “to” field value “alice, bob; touch /tmp/a” produces an output “-r alice@ -r bob; touch /tmp/a”
• Solution: use approximate substring matching
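
As an illustration of approximate substring matching, here is a minimal Python sketch of the textbook dynamic program (often attributed to Sellers) that finds the substring of an output `t` closest in edit distance to an input `s`. It is a generic quadratic-time algorithm, not the talk's optimized one, and the function name `best_approx_match` is an assumption.

```python
def best_approx_match(s: str, t: str):
    """Return (distance, start, end) such that t[start:end] is a substring
    of t with minimum edit distance to s.

    Row 0 of the table is all zeros, so a match may begin anywhere in t.
    Runs in O(len(s) * len(t)) time.
    """
    n = len(t)
    prev_d = [0] * (n + 1)          # distances for the empty prefix of s
    prev_s = list(range(n + 1))     # where each candidate match starts in t
    for i in range(1, len(s) + 1):
        cur_d = [i] + [0] * n
        cur_s = [0] * (n + 1)
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            best = (prev_d[j - 1] + cost, prev_s[j - 1])      # match/substitute
            if prev_d[j] + 1 < best[0]:                       # s[i-1] missing in t
                best = (prev_d[j] + 1, prev_s[j])
            if cur_d[j - 1] + 1 < best[0]:                    # extra t[j-1] in t
                best = (cur_d[j - 1] + 1, cur_s[j - 1])
            cur_d[j], cur_s[j] = best
        prev_d, prev_s = cur_d, cur_s
    end = min(range(n + 1), key=lambda j: prev_d[j])
    return prev_d[end], prev_s[end], end


# The SquirrelMail example from the slide: the input is transformed on its
# way to the output, yet most of it is still recognizable.
user_input = "alice, bob; touch /tmp/a"
output = "-r alice@ -r bob; touch /tmp/a"
dist, start, end = best_approx_match(user_input, output)
print(dist, repr(output[start:end]))
```

A region of the output would then be treated as tainted when the returned distance falls below the chosen threshold.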

Taint Inference Algorithm
• Standard approximate substring matching algorithms have quadratic time and space complexity
  • Too high, since inputs and outputs can be quite large
• Our contribution
  • A linear-time “coarse-filtering” algorithm
  • More expensive edit-distance algorithm invoked on substrings selected by coarse-filtering algorithm
    • The combination is effectively linear-time
  • Ensures taint identification if distance between two strings is below a user-specified threshold d
    • Contrast with biological computing tools that provide speed-up heuristics, but no such guarantee

Coarse-filtering to speed up Taint Inference
• Definition of taint:
  • A substring $u$ of $t$ is tainted if $ED(s, u) < d$
  • Here, $ED$ denotes the edit distance
• Key idea for coarse-filtering:
  • Approximate $ED$ by $ED^{\#}$, defined on length-$|s|$ substrings of $t$
  • Let $U$ (and $V$) denote a multiset of characters in $u$ (resp., $v$)
  • $ED^{\#}(u, v) = \min(|U - V|, |V - U|)$
  • Slide a window of size $|s|$ over $t$, compute $ED^{\#}$ incrementally
  • Prove: $ED(s, r) < d \Rightarrow ED^{\#}(s, r) < d$ for all substrings $r$ of $t$
• Result:
  • $O(|s|^2)$ space in worst case
  • performs like a linear-time algorithm in practice
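
The sliding-window computation of ED# can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' code: the function name `coarse_filter` is hypothetical, and `d` is assumed to be an absolute number of edits. Because every window has the same length as `s`, the two multiset differences have equal size, so a single running counter suffices.

```python
from collections import Counter


def coarse_filter(s: str, t: str, d: int):
    """Return start offsets i where ED#(s, t[i:i+len(s)]) < d.

    ED# is the size of the multiset difference between the characters of s
    and those of the window. It is updated in O(1) per step as the window
    slides, so the whole scan is linear in len(t).
    """
    m = len(s)
    if m == 0 or len(t) < m:
        return []
    # diff[c] = (occurrences of c in s) - (occurrences of c in the window)
    diff = Counter(s)
    diff.subtract(t[:m])
    # Number of characters of s not covered by the window, i.e. |S - W|.
    excess = sum(v for v in diff.values() if v > 0)
    hits = [0] if excess < d else []
    for i in range(1, len(t) - m + 1):
        leaving, entering = t[i - 1], t[i + m - 1]
        if diff[leaving] >= 0:      # window loses a character s still needs
            excess += 1
        diff[leaving] += 1
        if diff[entering] > 0:      # window gains a character s was missing
            excess -= 1
        diff[entering] -= 1
        if excess < d:
            hits.append(i)
    return hits


# Only windows that pass the filter need the expensive edit-distance check.
print(coarse_filter("bob; rm", "gpg -r bob; rm -rf *", d=1))  # prints [7]
```

Windows that fail the filter can be skipped safely, since a small edit distance implies a small multiset difference; the quadratic edit-distance computation is reserved for the offsets returned here.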

Overview of Syntax+Taint-aware Policies
• Leverage structure+taint to simplify/generalize policy
• Policy structure mirrors that of syntax trees
  • And-Or “trees” (possibly with cycles)
  • Can specify constraints on values (using regular expressions) and taint associated with a parse tree node

[Policy diagram; node labels: ELEMENT, NAME = “script”, OR, PARAM, ELEM_BODY, PARAM_NAME = “src”, PARAM_VALUE]
1. Policy for detecting XSS

Injection attacks and Syntax-aware policies

[Parse tree for the benign command “gpg -r sekar@abc.com”]
  root
    cmd
      name: gpg
      param: -r
      param: sekar@abc.com

[Parse tree for the attack command “gpg -r nobody; rm -rf *”]
  root
    cmd
      name: gpg
      param: -r
      param: nobody
    separator: ;
    cmd
      name: rm
      param: -rf
      param: *

• (2) SpanNodes policy: captures “lexical confinement”
  • tainted data to be contained within a single tree node
• (3) StraddleTrees policy: captures “overflows”
  • Tainted data begins in the middle of one syntactic structure (subtree), then flows into next subtree
• Both are “default deny” policies
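
A SpanNodes-style check can be sketched in Python with a deliberately rough shell tokenizer standing in for the parse tree. The regular expression, the function names `tokens` and `violates_span_nodes`, and the flat token list are simplifying assumptions; the talk's implementation builds real syntax trees with Flex/Bison.

```python
import re

# A rough shell lexer: quoted strings, operator runs, and bare words.
TOKEN = re.compile(
    r"""
    "(?:[^"\\]|\\.)*"       # double-quoted string
  | '[^']*'                 # single-quoted string
  | [;|&<>]+                # separators and redirection operators
  | [^\s;|&<>"']+           # bare word
    """,
    re.VERBOSE,
)


def tokens(cmd: str):
    """Return (start, end, text) for each lexical token in cmd."""
    return [(m.start(), m.end(), m.group()) for m in TOKEN.finditer(cmd)]


def violates_span_nodes(cmd: str, taint_start: int, taint_end: int) -> bool:
    """True if the tainted span cmd[taint_start:taint_end] is not confined
    to a single token (default deny)."""
    for start, end, _ in tokens(cmd):
        if start <= taint_start and taint_end <= end:
            return False
    return True


# Taint spans are given directly here; in the full system they would come
# from taint inference.
benign = "gpg -r sekar@abc.com 2>&1"
attack = "gpg -r nobody; rm -rf * 2>&1"
for cmd, tainted in ((benign, "sekar@abc.com"), (attack, "nobody; rm -rf *")):
    begin = cmd.index(tainted)
    print(violates_span_nodes(cmd, begin, begin + len(tainted)))
# prints False, then True
```

The benign address stays inside one parameter token, while the attack string covers a parameter, a separator and a whole second command, which is exactly the overflow the policy is meant to deny.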

Further Optimization: Pruning Policies
• Most inputs are benign, and cannot lead to violation of policies
  • Policies constrain tainted content, which comes from input
  • Thus, policies implicitly constrain inputs
• Approach:
  • Define “pruning policies” that make these implicit constraints explicit
  • Pruning policies identify subset of inputs that can possibly lead to policy violation
  • For other inputs, we can skip taint inference as well as policy checking algorithms
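
One way to picture a pruning policy is as a cheap test applied to each input value before any taint inference runs. The Python sketch below is a hypothetical example for shell output: the character set in `SHELL_PRUNE` and the function name `needs_full_check` are assumptions, chosen on the reasoning that a value copied verbatim can only cross a token boundary if it contains whitespace or a shell metacharacter.

```python
import re

# Hypothetical pruning rule for shell commands: only values containing
# whitespace or a shell metacharacter can span more than one token.
SHELL_PRUNE = re.compile(r"[\s;|&<>\"'`$()]")


def needs_full_check(pairs):
    """Keep only the (name, value) pairs that could possibly violate a
    lexical-confinement policy; the rest skip taint inference entirely."""
    return [(name, value) for name, value in pairs if SHELL_PRUNE.search(value)]


inputs = [
    ("sendto", "sekar@abc.com"),
    ("subject", "hello"),
    ("sendto2", "nobody; rm -rf *"),
]
print(needs_full_check(inputs))
```

Since most inputs are benign, a filter of this kind would let the majority of requests bypass both taint inference and policy checking. A real pruning policy would also have to account for the transformations that approximate matching tolerates.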

Evaluation: Applications and Policies

Application | Language | LOC (Size) | Environment | Attacks | Notes
phpBB | PHP/C | 34K | Apache or IIS w/MySQL | SQL inj | Popular real-world apps. Exploits from the wild.
SquirrelMail | PHP/C | 35K/42K | Apache or IIS | Shell command inj, XSS | Popular real-world apps. Exploits from the wild.
PHP XMLRPC (library) | PHP/C | 2K | Apache or IIS | PHP command inj | Popular real-world apps. Exploits from the wild.
Apps from gotocode.com | Java/C | 30K | Apache+Tomcat w/ MySQL | SQL inj | Attacks by [Halfond et al] (21K attacks, 4K legitimate)
WebGoat | Java/C | | Tomcat | command inj, HTTP response splitting |
DARPA RedTeam App | PHP | 2K | Apache | SQL inj | App developed by Red Team

• We used the 3 policies described earlier in the talk

False Negatives (and Detection Results)
• Occur due to
  • Complex application-specific data transformations
    • Protocol/language-specific transformations handled
  • Second-order attacks (data written into persistent store, read back subsequently, and used in security-sensitive operations)
    • A limitation common to taint-based approaches
• Experimental results:
  • Detected all attacks in experiments with the exception of a single second-order injection attack in Red Team evaluation
    • Shell and PHP command injections and XSS on …
    • ~21K SQL injection attacks on 5 moderate-size JSP applications (AMNESIA [Halfond et al] dataset)
    • HTTP response splitting on WebGoat

False Positives
• Result of coincidental matches (in taint inference)
• Can be controlled by setting the distance threshold d based on the desired false positive probability
  • Likelihood small even for short strings
• No false positives reported in experiments
• Implication
  • Can use large distances for moderate-size strings (len > 10), thus tolerating significant input transformations

[Plot: logarithmic vertical axis from 1.E+00 down to 1.E-07, horizontal axis from 0 to 70; curves labeled d=0, a=40; d=0.3, a=40; d=0.7, a=70; d=0.7, a=40]

Taint inference overhead
• Coarse filtering optimization
  • 10x to 20x improvement in speed in experiments
  • 50x to 1000x reduction in space
  • time spent in coarse filtering (linear-time algorithm) exceeds time spent inside edit-distance algorithm
  • performance decreases with large values of distance
    • When coincidental probability increases beyond $10^{-6}$

Overhead of different phases
• 60% spent in taint inference
  • After coarse-filtering optimization
• 20% in parsing
• 20% in policy checking
• Overhead of interposition not measured
  • but assumed to be relatively small because of reliance on library interposition
