jÄk: Using Dynamic Analysis to Crawl and Test Modern Web Applications. Giancarlo Pellegrino (1), Constantin Tschürtz (2), Eric Bodden (2), and Christian Rossow (1). 18th International Symposium on Research in Attacks, Intrusions and Defenses.


slide-1
SLIDE 1

jÄk: Using Dynamic Analysis to Crawl and Test Modern Web Applications

Giancarlo Pellegrino(1), Constantin Tschürtz(2), Eric Bodden(2), and Christian Rossow(1)

18th International Symposium on Research in Attacks, Intrusions and Defenses November 3rd, Kyoto, Japan

(1) CISPA, Saarland University, Germany

(2) Fraunhofer SIT / TU Darmstadt, Germany

slide-2
SLIDE 2
  • Nov. 3, 2016

Web Application Scanners

• (Semi-)automated security testing tools
• Follow a dynamic and black-box testing approach

slide-4
SLIDE 4

Architecture

[Diagram: Crawler Module, Attacker Module, Analysis Module]

slide-5
SLIDE 5

Crawler

Seed URL: http://shop.foo

slide-6
SLIDE 6

Crawler

Response for http://shop.foo:

<html>
  <head>
    <title>Online shopping</title>
  </head>
  <body>
    <a href="/contacts">Contacts</a>
    <form action="/search">
      <input type="text" name="q"/>
      <input type="submit"/>
    </form>
  </body>
</html>

  • New URL: /contacts
  • New search HTML form: /search

slide-9
SLIDE 9

Crawler

Next? http://shop.foo/contacts

slide-10
SLIDE 10

Crawler

Response for http://shop.foo/contacts:

<html>
  <head>
    <title>Contact Page</title>
  </head>
  <body>
    <form action="/comments">
      <input type="text" name="msg"/>
      <input type="submit"/>
    </form>
  </body>
</html>

  • New HTML form: /comments
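The crawl walked through on the preceding slides (seed URL, fetch, extract, pick the next URL) can be sketched as a breadth-first loop. This is only an illustration: the `site` object is a hypothetical in-memory stand-in for fetching and parsing shop.foo, and `crawl` is a name introduced here, not part of any tool mentioned in the talk.

```javascript
// Hypothetical in-memory stand-in for the site on the slides: each URL maps
// to the links and form actions that a fetch-and-parse step would report.
const site = {
  "http://shop.foo/": { links: ["/contacts"], forms: ["/search"] },
  "http://shop.foo/contacts": { links: [], forms: ["/comments"] },
};

// Breadth-first crawl: visit each URL once, record forms, enqueue new links.
function crawl(seed) {
  const frontier = [new URL(seed).href]; // normalise the seed URL
  const visited = new Set();
  const forms = new Set();
  while (frontier.length > 0) {
    const url = frontier.shift();
    if (visited.has(url)) continue;
    visited.add(url);
    const page = site[url] || { links: [], forms: [] };
    // Resolve relative targets against the page they were found on.
    for (const action of page.forms) forms.add(new URL(action, url).href);
    for (const link of page.links) {
      const next = new URL(link, url).href;
      if (!visited.has(next)) frontier.push(next);
    }
  }
  return { visited: [...visited], forms: [...forms] };
}

const result = crawl("http://shop.foo");
console.log(result.visited);
console.log(result.forms);
```

The forms collected this way are what a scanner would later hand to its attacker module.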

slide-11
SLIDE 11

Security Testing

<form action="/search">
  <input type="text" name="q"/>
  <input type="submit"/>
</form>

Tests == Attacks

[Diagram: the scanner sends XSS and SQL payloads to shop.foo through the form and examines the responses.]
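A rough sketch of the "tests == attacks" idea for the search form, assuming the form is submitted with GET (it declares no method). The two payloads, the function names, and the reflection check are illustrative assumptions; they are not taken from any scanner discussed in the talk.

```javascript
// Illustrative payloads only; real scanners ship far larger payload lists.
const payloads = {
  xss: "<script>alert(1)</script>",
  sqli: "' OR '1'='1",
};

// Build one test URL per payload for a GET form with the given field.
function buildTests(baseUrl, action, field) {
  return Object.entries(payloads).map(([kind, payload]) => {
    const url = new URL(action, baseUrl);
    url.searchParams.set(field, payload); // value is percent-encoded for us
    return { kind, url: url.href, payload };
  });
}

// Naive reflection check: does the response body echo the payload verbatim?
function isReflected(responseBody, payload) {
  return responseBody.includes(payload);
}

const tests = buildTests("http://shop.foo", "/search", "q");
console.log(tests.map((t) => t.url));
```

A real analysis module would send each URL and inspect the response; `isReflected` only hints at the simplest possible check.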

slide-12
SLIDE 12

Crawler Critical for Coverage

• The crawler explores the web application's attack surface
  • Missing parts → possible vulnerabilities are missed
• Existing crawlers are based on:
  • HTML parsing and pattern matching to extract URLs
  • "Clickable" areas to further explore the surface
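Pattern-matching extraction, as used by the existing crawlers described above, might look like the following minimal sketch. The regular expressions and the `extractStatic` name are assumptions chosen for brevity; production crawlers use proper HTML parsers.

```javascript
// Extract link targets and form actions with regular expressions, the way a
// purely static crawler would. Deliberately naive: it sees only what is
// literally present in the markup.
function extractStatic(html) {
  const grab = (re) => [...html.matchAll(re)].map((m) => m[1]);
  return {
    links: grab(/<a\b[^>]*\bhref\s*=\s*["']([^"']*)["']/gi),
    forms: grab(/<form\b[^>]*\baction\s*=\s*["']([^"']*)["']/gi),
  };
}

// The page from the earlier slides.
const page = `
<body>
  <a href="/contacts">Contacts</a>
  <form action="/search">
    <input type="text" name="q"/>
    <input type="submit"/>
  </form>
</body>`;

const found = extractStatic(page);
console.log(found);
```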

slide-17
SLIDE 17

Crawler and Modern Web Applications

• Complexity of the client side has increased dramatically (i.e., stateful JS programs)
• Links and forms can be built and inserted into the webpage at run-time

var url = scheme() + '://' + domain() + '/' + endpoint();
document.getElementById('myLink').href = url;

➔ HTML parsing and pattern matching are no longer sufficient

• JS is an event-driven language
  • Functions are executed upon events (click, mouse movement, timeout, Ajax response received)
  • These functions can generate URLs/HTML forms, register new events, and send Ajax requests

➔ Lack of support for the event-based execution model

Large parts of web applications remain unexplored!

• We addressed the coverage problem with
  • JavaScript client-side dynamic analysis
  • Model-based crawler
• Built a tool: jÄk
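To illustrate why static extraction falls short here, the sketch below builds a URL only when an event fires. The `link` object is a plain stand-in for a DOM element, and the bodies of `scheme`, `domain`, and `endpoint` as well as the `dispatch` helper are assumptions made so the example runs outside a browser.

```javascript
// Hypothetical helpers standing in for scheme(), domain() and endpoint()
// from the slide; a real application would compute these at run time.
const scheme = () => "http";
const domain = () => "shop.foo";
const endpoint = () => "offers";

// Minimal stand-in for a DOM element with an event-handler table.
const link = { href: null, handlers: {} };
link.addEventListener = function (type, fn) {
  if (!this.handlers[type]) this.handlers[type] = [];
  this.handlers[type].push(fn);
};
link.dispatch = function (type) {
  (this.handlers[type] || []).forEach((fn) => fn.call(this));
};

// The URL is assembled only when the "mouseover" event fires.
link.addEventListener("mouseover", function () {
  this.href = scheme() + "://" + domain() + "/" + endpoint();
});

const before = link.href; // nothing to find before any event has fired
link.dispatch("mouseover");
const after = link.href;
console.log(before, after);
```

A crawler that never fires the event never observes the URL, no matter how thoroughly it parses the initial markup.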

slide-18
SLIDE 18

Our Approach

• Combine dynamic analysis with a model-based crawler
  • Dynamic analysis monitors client-side program execution
  • Crawler builds, maintains, and uses a model of the visited attack surface

[Architecture diagram. Seed URL; Model-based Crawler: Model Inference/Update, Navigator, Action; Dynamic Analysis: Trace Analysis, Probe, JS Engine; traces: APIs, I/O, handler registration.]

slide-22
SLIDE 22

Dynamic Analysis

• Different approaches:
  1) JS engine instrumentation → laborious task, engine-dependent
  2) JS program instrumentation → JS code is not entirely available
  3) Modification of the execution environment

[Architecture diagram as on slide 18, with an additional Environment box.]

slide-23
SLIDE 23

Dynamic Analysis

• Modify the execution environment via function hooking:
  • Intercept API calls (e.g., network I/O and event handler registration)
  • Intercept object manipulations (i.e., object properties)
  • Schedule DOM inspections
• Hooks are installed by injecting our own JS code:
  • Function redefinition
  • Set functions
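Reading "set functions" as property setters, a hook on an object property could be sketched as follows. The `hookProperty` helper, the `trace` array, and the plain `fakeLink` object are assumptions for illustration; in a browser the hook would target a DOM element or its prototype rather than a plain object.

```javascript
// Collected observations; a real probe would forward these to the crawler.
const trace = [];

// Install a setter hook on one property of an object so that every
// assignment is recorded before the value is stored.
function hookProperty(obj, prop, label) {
  let value = obj[prop];
  Object.defineProperty(obj, prop, {
    configurable: true,
    enumerable: true,
    get() { return value; },
    set(v) {
      trace.push({ label, prop, value: v });
      value = v;
    },
  });
}

// Plain object standing in for a DOM anchor element (assumption).
const fakeLink = { href: "" };
hookProperty(fakeLink, "href", "a#myLink");

fakeLink.href = "http://shop.foo/offers"; // application code, unchanged
console.log(trace);
```

The application code is untouched; only the environment it runs in has changed, which is the point of this third approach.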

slide-24
SLIDE 24

Function Redefinition

Application JS code:

function handler() { alert("hello world"); }
el = document.getElementById('img');
el.addEventListener("click", handler);

API (simplified view of the browser's implementation):

Element.prototype.addEventListener = function(e, h) { […] listeners[e].push(h); }

Intercept! The preamble is injected before the application JS code and redefines the API function:

var orig_f = Element.prototype.addEventListener;
Element.prototype.addEventListener = function() {
  console.log("new handler registration");
  return orig_f.apply(this, arguments);
};
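The redefinition on this slide can be tried outside a browser with a small stand-in for `Element`. This is a minimal sketch, not jÄk's actual preamble: the `Element` class, the `registrations` array, and the name `origAddEventListener` are assumptions introduced for illustration.

```javascript
// Minimal stand-in for the browser's Element so the sketch runs outside a
// browser (assumption; in a page, Element is provided by the environment).
class Element {
  constructor() { this.listeners = {}; }
  addEventListener(type, handler) {
    if (!this.listeners[type]) this.listeners[type] = [];
    this.listeners[type].push(handler);
  }
}

// --- preamble: runs before any application code ---
const registrations = [];
const origAddEventListener = Element.prototype.addEventListener;
Element.prototype.addEventListener = function (type, handler) {
  registrations.push({ target: this, type, handler }); // record the call
  return origAddEventListener.apply(this, arguments);  // then forward it
};

// --- application code, unchanged ---
function handler() { console.log("hello world"); }
const el = new Element();
el.addEventListener("click", handler);

console.log(registrations.length, registrations[0].type);
```

Because the original function is still called, the application behaves as before, while the recorded registrations tell a crawler which events are worth firing.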

slide-35
SLIDE 35

Model-based Crawler

• Creates and maintains a web application model
  • Directed graph: nodes are page clusters and edges are URLs, HTML forms, or events
• The model is used to decide on the next action
  • Priority: events → high, URLs/forms → low

[Architecture diagram as on slide 18, with Hooking functions added to the Dynamic Analysis component.]
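One way to picture the model and its priority rule is a small directed graph whose pending edges are ranked by kind. The `Model` class, its method names, and the numeric priorities below are assumptions for this sketch; the talk does not specify how jÄk represents its model internally.

```javascript
// Directed graph: nodes are page clusters, edges are actions (events, URLs,
// or forms). Names and numeric priorities are illustrative assumptions.
const PRIORITY = { event: 0, url: 1, form: 1 }; // lower value = tried first

class Model {
  constructor() { this.edges = []; }
  addAction(from, kind, label) {
    this.edges.push({ from, kind, label, to: null, done: false });
  }
  // Pick the pending action with the highest priority for the given node.
  nextAction(node) {
    const pending = this.edges.filter((e) => e.from === node && !e.done);
    pending.sort((a, b) => PRIORITY[a.kind] - PRIORITY[b.kind]);
    return pending[0] || null;
  }
  // Record where an action led once the navigator has performed it.
  complete(edge, to) { edge.done = true; edge.to = to; }
}

const model = new Model();
model.addAction("home", "url", "/contacts");
model.addAction("home", "form", "/search");
model.addAction("home", "event", "click on #menu");

const first = model.nextAction("home");
model.complete(first, "home+menu");
const second = model.nextAction("home");
console.log(first.kind, second.kind);
```

Trying events first keeps the crawler on the current page until its client-side behaviour has been explored, and only then follows URLs and forms.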

slide-37
SLIDE 37

Assessment

slide-38
SLIDE 38

Our Tool

• New tool: jÄk [pron. Jack]
• Source code on GitHub
  • https://github.com/ConstantinT/jAEk
• Free to run, copy, distribute, study, change and improve
  • Free Software (GPL3) … and also free as in free beer!
slide-39
SLIDE 39

Experiments

• Comparative analysis
  • Skipfish, W3af, Wget, and Crawljax
• Case studies:
  • WIVET (Web Input Vector Extractor Teaser)
    • assess strengths and limitations of existing crawlers
  • 13 web applications
    • studied coverage and vulnerability detection power
slide-40
SLIDE 40

Coverage

• Surface explored by jÄk (# of unique URL structures)
  • x16 (Crawljax) to x2 (Skipfish) bigger
  • [Figure: surface of another crawler vs. jÄk; jÄk's is ~x6 bigger]
• Relative size of the new surface:
  • From +70% (Wget) to +98% (Crawljax) of URLs are new
  • [Figure: new vs. known surface for jÄk; ~86% more]
• Global surface missed by jÄk:
  • From 22% (Skipfish) to 0.5% (Crawljax) are missed
  • [Figure: URLs found by another crawler but unknown to jÄk; ~15% missed]
  • Further analysis:
    • 75% of the missed URLs are due to URL forgery
    • 25% are due to static resources, unsupported actions, and others
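The three coverage figures can be reproduced on toy data with a few set operations. Everything here is an assumption made for the sketch: the definition of a URL structure (path plus sorted parameter names), the denominators used for each share, and the sample URLs, which are invented and unrelated to the reported results.

```javascript
// Reduce a URL to a "structure": path plus sorted parameter names, with
// parameter values dropped. This definition is an assumption for the sketch.
function urlStructure(raw) {
  const u = new URL(raw);
  const names = [...new Set(u.searchParams.keys())].sort();
  return u.pathname + (names.length ? "?" + names.join("&") : "");
}

const structures = (urls) => new Set(urls.map(urlStructure));

// Compare the surface found by one crawler (ours) with that of another.
function compare(ourUrls, otherUrls) {
  const ours = structures(ourUrls);
  const other = structures(otherUrls);
  const fresh = [...ours].filter((s) => !other.has(s));
  const missed = [...other].filter((s) => !ours.has(s));
  const union = new Set([...ours, ...other]);
  return {
    sizeRatio: ours.size / other.size,       // "xN bigger"
    newShare: fresh.length / ours.size,      // share of our surface that is new
    missedShare: missed.length / union.size, // share of combined surface we missed
  };
}

// Invented sample data for illustration only.
const report = compare(
  [
    "http://shop.foo/search?q=a",
    "http://shop.foo/search?q=b",
    "http://shop.foo/contacts",
    "http://shop.foo/cart?id=1",
    "http://shop.foo/api/items?page=2&sort=asc",
  ],
  ["http://shop.foo/contacts", "http://shop.foo/robots.txt"]
);
console.log(report);
```

Note that the two search URLs collapse into one structure, which is why counting structures rather than raw URLs avoids inflating the surface.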

slide-44
SLIDE 44

Conclusion

slide-45
SLIDE 45

Conclusion/Takeaway

• Novel technique based on
  • dynamic analysis of JS programs + model-based crawling
• Built jÄk, a tool implementing our approach
• Assessed against 13 web applications
• Our results show that jÄk explores a surface ~6x larger with +86% new URLs