Statically-informed Dynamic Analysis Tools to Detect Algorithmic Complexity Vulnerabilities


SLIDE 1

learn invent impact

Statically-informed Dynamic Analysis Tools to Detect Algorithmic Complexity Vulnerabilities

16th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2016), October 2, 2016

Benjamin Holland, Ganesh Ram Santhanam, Payas Awadhutkar, and Suresh Kothari
Email: {bholland, gsanthan, payas, kothari}@iastate.edu

Acknowledgement: Team members at Iowa State University and EnSoft, DARPA contracts FA8750-12-2-0126 & FA8750-15-2-0080

SLIDE 2

Motivation

  • DARPA Space/Time Analysis for Cybersecurity (STAC) program

⁻ Given a compiled Java bytecode program
⁻ Discover Algorithmic Complexity (AC) vulnerabilities

Input to XML Parser:

<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ELEMENT lolz (#PCDATA)>
<!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
…
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

Parsing a specially crafted input file of less than a kilobyte creates a string of 10^9 concatenated “lol” strings requiring approximately 3 gigabytes of memory.
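The size claim can be checked with a little arithmetic. The sketch below is only illustrative: `expanded_length` and `expand` are hypothetical helper names, it assumes the elided entities follow the same ten-fold pattern as the ones shown, and it counts one byte per character.

```python
def expanded_length(base: str, fanout: int, levels: int) -> int:
    """Length of the fully expanded top-level entity, computed without building it."""
    return len(base) * fanout ** levels


def expand(base: str, fanout: int, levels: int) -> str:
    """Naive expansion, as a parser without entity limits would perform it.

    Only safe to call for small values of `levels`.
    """
    text = base
    for _ in range(levels):
        text = text * fanout  # each level references the previous entity `fanout` times
    return text


# Three levels are small enough to materialize and compare against the formula.
small = expand("lol", 10, 3)
print(len(small), expanded_length("lol", 10, 3))

# Nine levels (lol1 through lol9) are computed arithmetically only.
print(expanded_length("lol", 10, 9))
```

Nine ten-fold levels over a three-character base give 3 × 10^9 characters, which is where the figure of roughly 3 gigabytes comes from.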

SLIDE 3

Motivation

  • DARPA Space/Time Analysis for Cybersecurity (STAC) program

⁻ Given a compiled Java bytecode program
⁻ Discover Algorithmic Complexity (AC) vulnerabilities
⁻ Vulnerabilities are defined with respect to a budget

  • Example: Max input size 1 KB, execution time exceeds 300 s on a given reference platform

SLIDE 4

Overview

  • Approach
  • Static and Dynamic Analysis Tools
  • Static loop analysis
  • Instrumentation and dynamic analysis
  • Case Study
  • Walkthrough analysis
  • Q/A

SLIDE 5

Approach

  • Algorithmic complexity (AC) vulnerabilities are rooted in the space and time complexities of externally-controlled execution paths with loops.

⁻ Existing tools for computing the loop complexity are limited and cannot prove termination for several classes of loops.
⁻ At the extreme, a completely automated detection of AC vulnerabilities amounts to solving the undecidable halting problem.

  • Key Idea: Combine human intelligence with static and dynamic analysis to achieve scalability and accuracy.

⁻ A lightweight static analysis is used for a scalable exploration of loops in bytecode from large software, and an analyst selects a small subset of these loops for further evaluation using a dynamic analysis for accuracy.

SLIDE 6

Vulnerability Detection Process

  • 1. Automated Exploration: Identify loops, pre-compute their crucial attributes such as intra- and inter-procedural nesting structures and depths, and termination conditions.
  • 2. Hypothesis Generation: Through an interactive inspection of the pre-computed information the analyst hypothesizes plausible AC vulnerabilities and selects candidate loops for further examination using dynamic analysis.
  • 3. Hypothesis Validation: The analyst inserts probes and creates a driver to exercise the program by feeding workloads to measure resource consumption for the selected loops.

SLIDE 7

Statically-informed Dynamic Analysis (SID) Tools

  • Loop Call Graph (LCG)

⁻ Recovers loop headers in bytecode using the DLI algorithm [Wei SAS 2007]
⁻ Combines call relationships to produce a compact visual model to explore intra- and inter-procedural nesting structures of loops.
⁻ Constructed statically, interactive, expandable, corresponds to source

  • Time Complexity Analyzer (TCA)

⁻ A dynamic analyzer that enables the analyst to automatically instrument the selected loops with resource usage probes
⁻ Skeleton driver generation
⁻ Linear regression to estimate complexity

SLIDE 8

Loop Call Graph

Nodes:

  • Methods containing loops (blue)
  • Methods reaching methods containing loops (white)

Edges:

  • Call relationships
  • Color attributes to show placement of call site in loop (legend: Called Outside Loop, Called Inside Loop)
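The node and edge rules above can be sketched as a filter over an ordinary call graph. This is a minimal illustration, not the SID implementation: the function name `loop_call_graph`, the dictionary representation, and the method names are all hypothetical, and the edge attribute that records whether a call site sits inside a loop is left out.

```python
from collections import deque


def loop_call_graph(calls, loop_methods):
    """Keep methods that contain loops plus every method that can reach one.

    calls: dict mapping a caller name to a list of callee names.
    loop_methods: set of method names known to contain at least one loop.
    Returns (nodes, edges) of the reduced graph.
    """
    # Invert the call relation so we can walk from callees back to callers.
    callers_of = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers_of.setdefault(callee, set()).add(caller)

    # Breadth-first walk backwards from every loop-containing method.
    nodes = set(loop_methods)
    queue = deque(loop_methods)
    while queue:
        method = queue.popleft()
        for caller in callers_of.get(method, ()):
            if caller not in nodes:
                nodes.add(caller)
                queue.append(caller)

    # Keep only call edges whose endpoints both survive.
    edges = {
        (caller, callee)
        for caller, callees in calls.items()
        for callee in callees
        if caller in nodes and callee in nodes
    }
    return nodes, edges


# Hypothetical five-method program in which only `scan` contains a loop.
calls = {
    "main": ["parse", "log"],
    "parse": ["scan"],
    "log": ["format"],
    "scan": [],
    "format": [],
}
nodes, edges = loop_call_graph(calls, {"scan"})
print(sorted(nodes), sorted(edges))
```

In this toy input, `log` and `format` drop out because neither contains a loop nor reaches one, which is the kind of reduction that makes the view compact.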

SLIDE 9

Control Flow Loop View

  • Loop levels are shaded darker for each nesting level
  • Branch condition coloring

⁻ Red is false
⁻ Green is true

  • Loop back edge is grey
  • Unconditional is black
SLIDE 10

Interactive Graph Models – Traditional Call Graph

[Figure panels: CFG; 0-Level Call Graph; Call Graph “smart view”]

SLIDE 11

Interactive Graph Models – Traditional Call Graph

[Figure panels: Complete Call Graph; Call Graph “smart view”]

SLIDE 12

Interactive Graph Models – Loop Call Graph (Expandable)

[Figure: Loop Call Graph “smart view”, expandable]

SLIDE 13

Interactive Graph Models – Loop Call Graph

[Figure label: Vulnerability]

SLIDE 14

Time Complexity Analyzer

  • Analyst picks entry point in the app using Loop Call Graph (LCG) view

⁻ LCG: Induced subgraph of reachable methods that contain loops

  • Analyst selects methods from the LCG view to instrument

⁻ Probe choices: Iteration counters & Wall clock timers

  • Automatic probe insertion into Jimple & reassembly into bytecode
  • Automatic driver skeleton generation

⁻ Analyst fills in the driver with code that provides test input

  • Automatic plot of the collected measurements for the given test input

SLIDE 15

TCA Instrumentation

  • Iteration Counters

⁻ Tracks the number of times a loop header is executed
⁻ Platform independent, repeatable

  • Wall Clock Timers

⁻ Uses timestamps to measure the cumulative time spent in a loop
⁻ More prone to noise and inaccuracy, but still useful

  • Consider: caching or garbage collection side effects on the runtime
  • Probes are inserted after selected loop headers
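The two probe kinds can be illustrated by instrumenting a loop by hand. TCA inserts its probes into Jimple automatically; the Python below is only an analogy, and the names `iteration_counts`, `elapsed_seconds`, and `linear_search` are made up for this sketch.

```python
import time
from collections import defaultdict

# Probe storage, keyed by a hypothetical loop identifier.
iteration_counts = defaultdict(int)
elapsed_seconds = defaultdict(float)


def linear_search(items, target):
    """Example target method with a single loop, instrumented by hand."""
    loop_id = "linear_search.loop"
    found = -1
    start = time.perf_counter()          # wall clock probe: taken at loop entry
    for index, item in enumerate(items):
        iteration_counts[loop_id] += 1   # iteration counter: first statement after the loop header
        if item == target:
            found = index
            break
    # Wall clock probe: accumulate the time spent in this execution of the loop.
    elapsed_seconds[loop_id] += time.perf_counter() - start
    return found


position = linear_search(list(range(100)), 99)
print(position, iteration_counts["linear_search.loop"])
print(elapsed_seconds["linear_search.loop"])
```

The counter yields the same value on every run and every machine, while the timer value varies between runs, which matches the trade-off listed on the slide.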

SLIDE 16

Driver Generation

  • Generates driver “skeleton” with callsites to target methods
  • Workload is provided by the user

⁻ Workload should map inputs to a “workload size”
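A driver in this sense might look like the following sketch. It is not the skeleton that TCA generates: `run_driver`, `make_workload`, and the instrumented target `count_inversions` are hypothetical names, and the target simply returns its own iteration count in place of a real probe.

```python
def make_workload(size):
    """Analyst-supplied part: build a test input for a given workload size."""
    return list(range(size, 0, -1))  # a reverse-sorted list of `size` integers


def count_inversions(values):
    """Hypothetical instrumented target: returns (result, loop iterations)."""
    iterations = 0
    inversions = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            iterations += 1              # stands in for an iteration counter probe
            if values[i] > values[j]:
                inversions += 1
    return inversions, iterations


def run_driver(target, workload, sizes):
    """Generated part: call the target once per workload size and record the counter."""
    measurements = []
    for size in sizes:
        _, iterations = target(workload(size))
        measurements.append((size, iterations))
    return measurements


measurements = run_driver(count_inversions, make_workload, [10, 20, 40])
print(measurements)
```

Keeping the workload function separate from the driver loop reflects the split on the slide: the tool can generate the calls, but only the analyst knows how to build inputs of a given size.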

SLIDE 17

Complexity Analysis

  • Plots results on a log-log scale
  • Linear regression to fit measurements
  • R² error value
  • A slope of m on the log-log plot indicates a measured empirical complexity of n^m.
  • Potential use in education for comparing empirical complexities of two algorithms

[Plot: Linear vs. Binary Insertion Sort Performance on Random Data. Axes: log(input size) and log(counter). Linear: slope = 1.83, R² = 0.99. Binary: slope = 1.23, R² = 0.99.]
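The slope estimate can be reproduced with an ordinary least-squares fit on the logarithms of the measurements. The sketch below uses a hypothetical helper `fit_loglog` and synthetic counts that grow exactly as n², not data from the talk.

```python
import math


def fit_loglog(sizes, counts):
    """Least-squares line through (log size, log count); returns (slope, r_squared)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Coefficient of determination: 1 minus residual variation over total variation.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, r_squared


# Synthetic measurements: the counter grows exactly quadratically with input size.
sizes = [10, 100, 1000, 10000]
counts = [n * n for n in sizes]
slope, r_squared = fit_loglog(sizes, counts)
print(round(slope, 3), round(r_squared, 3))
```

Because log(n^m) = m · log(n), a power-law cost appears as a straight line with slope m on these axes, so the fitted slope reads directly as the empirical exponent.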

SLIDE 18

Walkthrough of Blogger

SLIDE 19

Blogger Walkthrough/Workflow

Analyst Goal

– Find most expensive loops reachable in the app
– Verify if they violate resource consumption limit within the budget

Demo: SID tools used to find AC vulnerability

– Loop Call Graph: Find loops reachable from points of interest
– Smart Views: On-demand composable analysis
– Time Complexity Analyzer: Measure runtime performance of loops for inputs within budget

SLIDE 20

Blogger > How we found the AC vulnerability

1. Follow call graphs from entry point to code that serves client requests

– Call graph from JavaWebServer.main() is too large
– Notice that JDK APIs are used to start Threads
– Look at reverse call graph from Thread.start() to see what threads are started

2. Identify use of threads in application server design

– ServerRunnable is listener thread
– ClientHandler is request processor thread

3. Identify loops reachable from ClientHandler using LCG

– Narrow down scope of vulnerability to 25 of the 422 methods

4. Formulate & Validate Hypothesis

– Run dynamic analysis informed by LCG to find method causing vulnerability
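The reverse call graph query in step 1 amounts to walking call edges backwards from a target method. The sketch below shows the idea on a made-up call graph: the helper names and every method name except `Thread.start` are hypothetical and do not describe Blogger's actual call structure.

```python
def direct_callers(calls, target):
    """Methods that contain a call site invoking `target`."""
    return {caller for caller, callees in calls.items() if target in callees}


def reverse_reachable(calls, target):
    """All methods from which `target` can be reached through some chain of calls."""
    callers_of = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers_of.setdefault(callee, set()).add(caller)

    seen = set()
    stack = [target]
    while stack:
        method = stack.pop()
        for caller in callers_of.get(method, ()):
            if caller not in seen:
                seen.add(caller)
                stack.append(caller)
    return seen


# Hypothetical call graph for a small threaded server.
calls = {
    "App.main": ["Server.start", "Config.load"],
    "Server.start": ["Thread.start"],
    "Listener.run": ["Worker.spawn"],
    "Worker.spawn": ["Thread.start"],
    "Config.load": [],
}
print(sorted(direct_callers(calls, "Thread.start")))
print(sorted(reverse_reachable(calls, "Thread.start")))
```

Starting from the thread-creation API and working backwards avoids reading the oversized forward call graph from the entry point, which is the motivation given in step 1.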

SLIDE 21

Step 1 – Locate use of Threads

Zooming into leaves of call graph from JavaWebServer.main() shows JDK APIs are used to start Threads

NanoHTTPD is a threaded web server.

  • Q. Where are threads started in the app? Which threads handle client requests?
SLIDE 22

Step 2 – ClientHandler Thread Handles HTTP Requests

ClientHandler handles client requests

Forward call graph from ClientHandler.run() is still large: 483 nodes, 1135 edges

  • Q. What loops in the app are reachable from ClientHandler.run()?
SLIDE 23

Step 3 – Loop Call Graph

Significantly more compact view than the original call graph

  • 79 nodes, 150 edges in LCG from ClientHandler.run
  • 41 loops reached from ClientHandler.run
  • Compared to 483 nodes, 1135 edges in the call graph
  • Focuses analyst attention on loops, while preserving call reachability
  • Includes the vulnerability: URIVerifier.verify()

Analyst wants to find “interesting” methods to inspect

SLIDE 24

Step 4 – Dynamic Analysis Informed by LCG

  • 1. Analyst uses TCA to probe each of the 41 loops using the Iteration Counter instrument
  • 2. TCA compiles and runs the instrumented jar (the instrumented Blogger server is started)
  • 3. Once the server is started, the analyst interacts with the application using a web browser
  • 4. TCA records the number of iterations for each loop execution
SLIDE 25

Step 4 – Dynamic Analysis Informed by LCG

Analyst issues 3 sample URLs to server

“/” “/test” “/stac/example/Example”

Instrumented server counts and saves # iterations for each loop exercised

2 methods record large iteration counts

  • HTTPSession.findHeaderEnd()
  • URIVerifier.verify()
SLIDE 26

Step 4 – Dynamic Analysis Informed by LCG

  • Single loop
  • Single termination condition
  • Loop induction variable splitbyte:

– Modified in one location inside the loop body
– Monotonically increases up to termination condition

SLIDE 27

Step 4 – Dynamic Analysis Informed by LCG

  • 3 loops
  • Logic behind push and pop on loop induction variable tuples is unclear
  • Analyst decides to instrument URIVerifier.verify() separately
SLIDE 28

Step 4 – Dynamic Analysis Informed by LCG

Analyst uses TCA to instrument URIVerifier.verify() with an iteration counter

A driver tests the method with URL strings of increasing length

SLIDE 29

Step 4 – Dynamic Analysis Informed by LCG

TCA produces a plot of # iterations in URIVerifier.verify() vs. URL string length

Analyst confirms URIVerifier.verify() exceeds budgeted time of 300 seconds for URL strings of length > 35

SLIDE 30

Tools

  • SID Tools: https://ensoftcorp.github.io/SID/

⁻ Eclipse Plugin
⁻ Open Source, MIT License
⁻ Video Demo

  • Atlas

⁻ Supports C/Java/JVM Bytecode (Jimple IR)
⁻ Free for academic use/open source projects
⁻ http://www.ensoftcorp.com/atlas/

  • Soot

⁻ Bytecode to Jimple transformation
⁻ https://sable.github.io/soot/

SLIDE 31

Future Work

  • Better heuristics to guide analyst to problem areas

⁻ Loops with complex termination conditions
⁻ Non-monotonic loops

  • Thinking hard about input generation

SLIDE 32

Thank you.

  • Questions?