Dytan: A Generic Dynamic Taint Analysis Framework


SLIDE 1

Dytan: A Generic Dynamic Taint Analysis Framework

James Clause, Wanchun (Paul) Li, and Alessandro Orso

College of Computing Georgia Institute of Technology

Partially supported by: NSF awards CCF-0541080 and CCR-0205422 to Georgia Tech, DHS and US Air Force Contract No. FA8750-05-2-0214

SLIDE 2

Dynamic taint analysis

(also known as dynamic information-flow analysis)

[Figure: data items A, B, C, and Z annotated with taint tags 1, 2, and 3]

SLIDE 3

Dynamic tainting applications

  • Information policy enforcement
  • Attack detection / prevention
  • Testing
  • Data lifetime / scope

SLIDE 4

Dynamic tainting applications

Attack detection / prevention

Detect / prevent attacks such as SQL injection, buffer overruns, stack smashing, cross site scripting

e.g., Suh et al. 04, Newsome and Song 05, Halfond et al. 06, Kong et al. 06, Qin et al. 06

SLIDE 5

Dynamic tainting applications

Information policy enforcement

Ensure classified information does not leak outside the system

e.g., Vachharajani et al. 04, McCamant and Ernst 06

SLIDE 6

Dynamic tainting applications

Testing

Coverage metrics, test data generation heuristics, ...

e.g., Masri et al. 05, Leek et al. 07

SLIDE 7

Dynamic tainting applications

Data lifetime / scope

Track how long sensitive data, such as passwords or account numbers, remain in the application

e.g., Chow et al. 04

SLIDE 8

Motivation

[Figure: several separate ad-hoc taint analysis implementations, each producing its own results]

SLIDE 9

Motivation

  • Flexible
  • Easy to use
  • Accurate

[Figure: a configuration is fed to the Dytan generic framework, which yields a custom dynamic taint analysis that produces results]

SLIDE 10

Outline

  • Motivation & overview
  • Framework (Dytan)
      • flexibility
      • ease of use
      • accuracy
  • Empirical evaluation
  • Conclusions
SLIDE 11

Framework: flexibility

Configuration

  • Taint sources
  • Propagation policy
  • Taint sinks

SLIDE 12

Framework: flexibility

  • Taint sources
  • Propagation policy
  • Taint sinks

SLIDE 13

Framework: flexibility

Taint sources: which data to tag, and how to tag it

SLIDE 14

Framework: flexibility

Propagation policy: how tags should be propagated at runtime

SLIDE 15

Framework: flexibility

Taint sinks: where and how tags should be checked
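
The three parts of a configuration (taint sources, propagation policy, taint sinks) could be modeled as a small record. This is only a minimal sketch: the class name `TaintConfig`, its fields, and the values used below are hypothetical and do not reflect Dytan's actual configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class TaintConfig:
    """Hypothetical three-part taint analysis configuration (not Dytan's real schema)."""
    sources: list = field(default_factory=list)      # which data to tag, and how
    propagation: dict = field(default_factory=dict)  # how tags propagate at runtime
    sinks: list = field(default_factory=list)        # where and how tags are checked


# Illustrative values only; the keys and strings are assumptions.
config = TaintConfig(
    sources=[{"what": "file:a.txt", "how": "single tag"}],
    propagation={"affecting_data": ["data"], "mapping": "union"},
    sinks=[{"where": "function exec, param 0", "how": "ensure absence of tags"}],
)
```

Keeping the three parts separate mirrors the slides: each part can be changed without touching the other two.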

SLIDE 16

Taint sources

What to tag: identify what program data should be assigned tags

  • Variables (local or global)
  • Function parameters
  • Function return values
  • Data from an input stream (network, filesystem, keyboard, ...)
  • Specific input stream (141.195.121.134:80, a.txt, ...)

How to tag: describe how tags should be assigned for identified data

  • Single tag
  • One tag per source
  • Multiple tags per source
  • ...
SLIDE 17

Taint sources

What to tag: a.txt
How to tag: single tag

[Figure: contents of a.txt, with every element carrying the same tag (1)]

SLIDE 18

Taint sources

What to tag: a.txt
How to tag: multiple tags

[Figure: contents of a.txt, with each element carrying its own tag (1, 2, 3, 4, 5, ..., n)]
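
The difference between the two tagging schemes can be sketched in a few lines. This is an illustrative sketch, not Dytan code: the function names `tag_single` and `tag_per_byte` are hypothetical, and the byte string stands in for the contents of a.txt.

```python
def tag_single(data: bytes, tag: int = 1) -> list:
    """Assign the same tag to every byte read from the source."""
    return [(byte, {tag}) for byte in data]


def tag_per_byte(data: bytes, first_tag: int = 1) -> list:
    """Assign a distinct tag to each byte: first_tag, first_tag + 1, ..."""
    return [(byte, {first_tag + i}) for i, byte in enumerate(data)]


contents = b"hello"  # stand-in for the contents of a.txt (assumption)
single = tag_single(contents)      # every byte carries tag 1
multiple = tag_per_byte(contents)  # bytes carry tags 1, 2, 3, 4, 5
```

With a single tag, a sink can only tell that data came from a.txt; with one tag per byte, it can also tell which part of the file the data came from.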

SLIDE 19

Propagation policy

Affecting data: data that affects the outcome of a statement through

  • Data dependencies
  • Control dependencies

A policy can consider both or only data dependencies.

Mapping function: define how tags associated with affecting data should be combined

  • Union
  • Max
  • ...

[Figure: data items A, B, and C annotated with tags 1, 2, and 3]

SLIDE 20

Propagation policy

if(X) { C = A + B; }

Affecting data: data dependence, control dependence
Mapping function: union, max

[Figure: tags 1, 2, and 3 attached to the data in the statement]

SLIDE 21

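Propagation policy

if(X) { C = A + B; }

Affecting data: data dependence, control dependence
Mapping function: union, max

[Figure: tags 1, 2, and 3 attached to the data in the statement; the selected options are marked; C receives tags 1 and 2]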

SLIDE 22

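Propagation policy

if(X) { C = A + B; }

Affecting data: data dependence, control dependence
Mapping function: union, max

[Figure: tags 1, 2, and 3 attached to the data in the statement; the selected options are marked; C receives tag 3]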
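
The effect of the two policy choices on `if(X) { C = A + B; }` can be sketched as follows. This is a minimal illustration rather than Dytan's implementation: the function `propagate` is hypothetical, and the tag assignment (A has tag 1, B has tag 2, X has tag 3) is an assumption read from the slide layout.

```python
def propagate(data_tags, control_tags, use_control, mapping):
    """Combine the tags of the affecting data into the tag set of the result.

    data_tags:    tag sets of the operands (data dependencies)
    control_tags: tag sets of the enclosing predicates (control dependencies)
    use_control:  whether the policy also considers control dependencies
    mapping:      "union" or "max"
    """
    affecting = list(data_tags) + (list(control_tags) if use_control else [])
    combined = set().union(*affecting)
    if mapping == "union":
        return combined
    if mapping == "max":
        return {max(combined)} if combined else set()
    raise ValueError(f"unknown mapping function: {mapping}")


# Assumed tag assignment for `if(X) { C = A + B; }`: A -> 1, B -> 2, X -> 3.
A, B, X = {1}, {2}, {3}

data_union = propagate([A, B], [X], use_control=False, mapping="union")  # {1, 2}
both_union = propagate([A, B], [X], use_control=True, mapping="union")   # {1, 2, 3}
both_max = propagate([A, B], [X], use_control=True, mapping="max")       # {3}
```

Under these assumptions, a data-only policy with union gives C the tags 1 and 2, while adding the control dependence on X and switching to max leaves C with tag 3 only.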

SLIDE 23

Taint Sinks

Where to check: location in the program to perform a check

  • Function entry / exit
  • Statement type
  • Specific program point

What to check: the data whose tags should be checked

  • Variables
  • Function parameters
  • Function return value

How to check: set of conditions to check and a set of actions to perform if the conditions are not met

  • Validate presence of tags (exit or log)
  • Ensure absence of tags (exit or log)
  • ...
SLIDE 24

Taint Sinks

    cmd = read(file);
    args = read(socket);
    cmd = trim(cmd + args);
    ...
    tok[] = parse(cmd);
    exec(tok[0], tok[1]);

[Figure: tags 2 and 3 attached to the data in the code]

SLIDE 25

Taint Sinks

Where / what to check: function: exec, param: 0
How to check: validate presence of / validate absence of (tag values shown in figure)
Result: (shown in figure)

    cmd = read(file);
    args = read(socket);
    cmd = trim(cmd + args);
    ...
    tok[] = parse(cmd);
    exec(tok[0], tok[1]);

[Figure: tags 2 and 3 on the input data and in the check conditions]

SLIDE 26

Taint Sinks

Where / what to check: function: exec, param: 0
How to check: validate presence of / validate absence of (tag values shown in figure)
Result: (shown in figure)

    cmd = read(file);
    args = read(socket);
    cmd = trim(cmd + args);
    ...
    tok[] = parse(cmd);
    exec(tok[0], tok[1]);

[Figure: tags 2 and 3 on the input data, in the check conditions, and in the result]
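
A sink check of this kind can be sketched with per-character tags. Everything here is illustrative: the `Tainted` class and `check_sink` function are hypothetical, the input strings are made up, and the assignment of tag 2 to file data and tag 3 to socket data is an assumption based on the figure.

```python
class Tainted:
    """A string whose characters each carry a set of taint tags."""

    def __init__(self, text, tags):
        assert len(text) == len(tags)
        self.text, self.tags = text, tags

    @classmethod
    def from_source(cls, text, tag):
        """Tag every character of text with the given source tag."""
        return cls(text, [{tag} for _ in text])

    def __add__(self, other):
        return Tainted(self.text + other.text, self.tags + other.tags)

    def split(self):
        """Split on spaces, keeping the tags of each token's characters."""
        tokens, start = [], None
        for i, ch in enumerate(self.text + " "):
            if ch != " " and start is None:
                start = i
            elif ch == " " and start is not None:
                tokens.append(Tainted(self.text[start:i], self.tags[start:i]))
                start = None
        return tokens

    def all_tags(self):
        return set().union(*self.tags)


def check_sink(value, must_have=frozenset(), must_not_have=frozenset()):
    """Return True if the value's tags satisfy the sink's conditions."""
    tags = value.all_tags()
    return must_have <= tags and not (must_not_have & tags)


FILE_TAG, SOCKET_TAG = 2, 3  # assumed: tag 2 for the file, tag 3 for the socket
cmd = Tainted.from_source("ls ", FILE_TAG)          # stand-in for read(file)
args = Tainted.from_source("-l /tmp", SOCKET_TAG)   # stand-in for read(socket)
tok = (cmd + args).split()                          # stand-in for trim + parse

# Sink: exec, parameter 0 must carry the file tag and no socket tag.
ok = check_sink(tok[0], must_have={FILE_TAG}, must_not_have={SOCKET_TAG})
```

In this run the command name comes entirely from the file, so the check passes; if socket data reached `tok[0]`, the same check would fail and the configured action (exit or log) would apply.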

SLIDE 27

Framework: ease of use

Provide two ways to configure the framework

  • Basic
      • Select sources, propagation policies, and sinks from a set of predefined options
      • XML based configuration
  • Advanced
      • Suitable for more esoteric applications
      • Extend OO implementation

SLIDE 28

Framework: accuracy

  • Dytan operates at the binary level
      • consider the actual program semantics
      • transparently handle libraries
  • Dytan accounts for both data- and control-flow dependencies

SLIDE 29
Framework: accuracy

The most common source of inaccuracy is incorrectly identifying the information produced and consumed by a statement

Two common examples:

  • Implicit operands

    add %eax, %ebx // A = A + B
    produced: %eax, %eflags

  • Address generators

    add %eax, [%ebx] // A = A + *B
    consumed: %eax, [%ebx], %ebx
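
The two examples can be made concrete with a toy model of the `add` instruction. This is a sketch for illustration only: the function `add_operands` and its flags are hypothetical, and it models just the operand sets named on the slide, not a real instruction decoder.

```python
def add_operands(dst, src, src_is_memory=False,
                 implicit_operands=True, address_generators=True):
    """Return (consumed, produced) locations for a toy `add dst, src`.

    With src_is_memory=True, src names the register holding the address,
    so the instruction reads the memory cell [src].
    """
    source = f"[{src}]" if src_is_memory else src
    consumed = {dst, source}
    if src_is_memory and address_generators:
        consumed.add(src)        # the register used to compute the address
    produced = {dst}
    if implicit_operands:
        produced.add("%eflags")  # add writes the flags register implicitly
    return consumed, produced


# Accurate model of `add %eax, [%ebx]`.
consumed, produced = add_operands("%eax", "%ebx", src_is_memory=True)

# Inaccurate model: ignores implicit operands and address generators.
consumed_bad, produced_bad = add_operands(
    "%eax", "%ebx", src_is_memory=True,
    implicit_operands=False, address_generators=False,
)
```

The inaccurate model drops `%ebx` from the consumed set and `%eflags` from the produced set, so tags on the address register or flowing through the flags would be lost.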

SLIDE 30

Outline

  • Motivation & overview
  • Framework
      • flexibility
      • ease of use
      • accuracy
  • Empirical evaluation
  • Conclusions
SLIDE 31

Empirical evaluation

  • RQ1: Can Dytan be used to (easily) implement existing dynamic taint analyses?
  • RQ2: How do inaccurate propagation policies affect the analysis results?
  • In addition: discussion on performance
SLIDE 32

RQ1: flexibility

Goal: show that Dytan can be used to (easily) implement existing dynamic taint analyses

  • Selected two techniques:
      • Overwrite attack detection [Qin et al. 04]
      • SQL injection detection [Halfond et al. 06]
  • Used Dytan to re-implement both techniques
      • Measure implementation time
      • Validate against the original implementation

SLIDE 33

RQ1: results

  • Implementation time:
      • Overwrite attack detection: < 1 hour
      • SQL injection detection: < 1 day
  • Comparison with original implementations:
      • Successfully stopped the same attacks as the original implementations
SLIDE 34

RQ2: accuracy impact

Goal: measure the effect of inaccurate propagation policies on analysis results

  • Selected two subjects:
      • Gzip (75kb w/o libraries)
      • Firefox (850kb w/o libraries)
  • Use Dytan to taint program inputs and measure the amount of heap data tainted at program exit
  • Compare Dytan against inaccurate policies
      • no implicit operands (no IM)
      • no address generators (no AG)
      • no implicit operands, no address generators (no IM, no AG)
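
One way to express the measurement (the share of heap data tainted at program exit) is as a ratio over a shadow map. This is a hedged sketch: the slides do not say how the amount was computed, and the `shadow` dictionary, the function name, and the sample addresses below are all hypothetical.

```python
def tainted_fraction(shadow):
    """Return the share of heap bytes that carry at least one tag.

    shadow maps a heap address to the set of tags on that byte
    (an empty set means the byte is untainted).
    """
    if not shadow:
        return 0.0
    tainted = sum(1 for tags in shadow.values() if tags)
    return tainted / len(shadow)


# Made-up snapshot of four heap bytes at program exit.
heap_at_exit = {0x1000: {1}, 0x1001: set(), 0x1002: {1, 2}, 0x1003: set()}
fraction = tainted_fraction(heap_at_exit)  # 2 of 4 bytes are tainted
```

Comparing this ratio across policies (accurate, no IM, no AG, neither) would show how much taint each inaccurate policy fails to track.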

SLIDE 35

RQ2: results

[Figure: bar chart with a 0% to 100% axis comparing four policies (Dytan; No IM; No AG; No IM, no AG) on three workloads: Firefox (1 page), Firefox (3 pages), and Gzip]

SLIDE 36

Performance

  • In line with existing implementations
  • Designed for experimentation
  • Favors flexibility over performance
  • Implementation can be further optimized
  • Measured for gzip:
      • 30x for data flow
      • 50x for data and control flow
  • High overhead, but...
SLIDE 37

Related work

  • Existing dynamic tainting approaches [Suh et al. 04, Newsome and Song 05, Halfond et al. 06, Kong et al. 06, ...]
      • Ad-hoc
  • Other dynamic taint analysis frameworks [Xu et al. 06, Lam and Chiueh 06]
      • Focused on security applications
      • Single taint mark
      • No control-flow propagation
      • Operate at the source code level
SLIDE 38

Conclusions

  • Dytan
      • a general framework for dynamic tainting
      • allows for instantiating and experimenting with different dynamic taint analysis approaches
  • Initial evaluation
      • flexible
      • easy to use
      • accurate
SLIDE 39

Future directions

  • Tool release (documentation, code cleanup)
      http://www.cc.gatech.edu/~clause/dytan/ (pre-release on request)
  • Optimization (general and specific)
  • Applications
      • Memory protection
      • Debugging
SLIDE 40

Questions?

http://www.cc.gatech.edu/~clause/dytan/