All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution

SLIDE 1

All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask)

Edward J. Schwartz, Thanassis Avgerinos, David Brumley

8/16/2010 Carnegie Mellon University 1

(Yes, we were trying to overflow the title length field on the submission server)

SLIDE 2

A Few Things You Need to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask)

SLIDE 3

The Root of All Evil

Humans write programs

This Talk: Computers Analyzing Programs Dynamically at Runtime

SLIDE 4

Two Essential Runtime Analyses

Dynamic Taint Analysis: What values are derived from user input?

  • Detect exploits [Costa2005, Crandall2005, Newsome2005, Suh2004]
  • Detect packing in malware [Bayer2009, Yin2007]

Forward Symbolic Execution: What input will make execution reach this line of code?

  • Input filter generation [Costa2007, Brumley2008]
  • Automated test case generation [Cadar2008, Godefroid2005, Sen2005]

SLIDE 5

Our Contributions

1: Turn English descriptions into an algorithm

– Operational Semantics

2: Algorithm highlights caveats, issues, and unsolved problems that are deceptively hard

Dynamic Taint Analysis: Is this value affected by user input?
Forward Symbolic Execution: What input will make execution reach this line of code?

Computers Analyzing Programs Dynamically at Runtime

SLIDE 6

Our Contributions (cont’d)

3: Systematize recurring themes in a wealth of previous work

SLIDE 7

Dynamic Taint Analysis: What values are derived from user input?

  1. How it works – example
  2. Desired properties
  3. Example issue (the paper has many more)

SLIDE 8

Taint Introduction

    x = get_input( )
    y = x + 42
    …
    goto y

Input is tainted

Δ (variable values):
    Var   Val
    x     7

τ (variable taint status):
    Var   Tainted?
    x     T

Input rule:
$$\frac{t = \mathit{IsUntrusted}(\mathit{src})}{\mathit{get\_input}(\mathit{src}) \downarrow t}$$

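The Input rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set `UNTRUSTED_SOURCES`, the source names, and the extra `raw_value` parameter are hypothetical and do not come from the talk.

```python
# Hypothetical sketch of the Input rule: a value read from a source
# carries the taint bit t = IsUntrusted(src).
UNTRUSTED_SOURCES = {"network", "stdin"}  # assumed policy, not from the talk


def is_untrusted(src):
    """IsUntrusted(src): True when the source is considered untrusted."""
    return src in UNTRUSTED_SOURCES


def get_input(src, raw_value):
    """Return (value, taint) for a value read from src."""
    return raw_value, is_untrusted(src)


delta = {}  # Δ: variable -> value
tau = {}    # τ: variable -> taint bit

# x = get_input( ), with the example value 7 from the slide
delta["x"], tau["x"] = get_input("network", 7)
print(delta, tau)
```

Keeping Δ and τ as two separate maps mirrors the two tables on the slide; a real tool might instead attach the taint bit to each value.
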
SLIDE 9

Taint Propagation

    x = get_input( )
    y = x + 42
    …
    goto y

Data derived from user input is tainted

Δ (variable values):
    Var   Val
    x     7
    y     49

τ (variable taint status):
    Var   Tainted?
    x     T
    y     T

BinOp rule:
$$\frac{t_1 = \tau[x_1], \quad t_2 = \tau[x_2]}{x_1 + x_2 \downarrow t_1 \lor t_2}$$

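A minimal sketch of the BinOp rule follows, assuming that constants are untainted and that operands are either variable names or integer literals. The helper names `lookup` and `add` are hypothetical.

```python
def lookup(operand, delta, tau):
    """Variables read value and taint from Δ and τ; constants are untainted."""
    if isinstance(operand, str):
        return delta[operand], tau[operand]
    return operand, False


def add(a, b, delta, tau):
    """BinOp rule for +: the result is tainted if either operand is (t1 ∨ t2)."""
    v1, t1 = lookup(a, delta, tau)
    v2, t2 = lookup(b, delta, tau)
    return v1 + v2, t1 or t2


delta = {"x": 7}   # Δ after x = get_input( )
tau = {"x": True}  # τ: x is tainted

# y = x + 42
delta["y"], tau["y"] = add("x", 42, delta, tau)
print(delta["y"], tau["y"])
```
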
SLIDE 10

Taint Checking

    x = get_input( )
    y = x + 42
    …
    goto y

Δ (variable values):
    Var   Val
    x     7
    y     49

τ (variable taint status):
    Var   Tainted?
    x     T
    y     T

Policy for goto (must be true to execute):
$$P_{\mathit{goto}}(t_a) = \lnot t_a$$

Policy Violation Detected

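The goto check can be sketched as a guard that runs before the jump is taken. The exception type and function names below are hypothetical; only the policy `P_goto(t_a) = ¬t_a` comes from the slide.

```python
class TaintPolicyViolation(Exception):
    """Raised when a taint-checking policy evaluates to False."""


def p_goto(t_a):
    """P_goto(t_a) = ¬t_a: the jump target must not be tainted."""
    return not t_a


def goto(target_var, delta, tau):
    """Return the jump target, or raise if the policy is violated."""
    if not p_goto(tau[target_var]):
        raise TaintPolicyViolation(f"goto {target_var}: target is tainted")
    return delta[target_var]


delta = {"x": 7, "y": 49}
tau = {"x": True, "y": True}

try:
    goto("y", delta, tau)
except TaintPolicyViolation as err:
    print("Policy violation detected:", err)
```
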
SLIDE 11

Real Use: Exploit Detection

Example program:

    x = get_input( )
    y = …
    …
    goto y

Real program:

    …
    strcpy(buffer, argv[1]);
    …
    return;

Jumping to overwritten return address

Different Use: Program Control

SLIDE 12

Memory Load

Variables

Δ (variable values):
    Var   Val
    x     7

τ (variable taint status):
    Var   Tainted?
    x     T

Memory

μ (memory values):
    Addr  Val
    7     42

τμ (memory taint status):
    Addr  Tainted?
    7     F

SLIDE 13

Problem: Memory Addresses

    x = get_input( )
    y = load( x )
    …
    goto y

All values derived from user input are tainted??

Δ (variable values):
    Var   Val
    x     7

μ (memory values):
    Addr  Val
    7     42

τμ (memory taint status):
    Addr  Tainted?
    7     F

SLIDE 14

Policy 1: Taint depends only on the memory cell

    x = get_input( )
    y = load( x )
    …
    goto y

Δ (variable values):
    Var   Val
    x     7

μ (memory values):
    Addr  Val
    7     42

τμ (memory taint status):
    Addr  Tainted?
    7     F

Taint Propagation (Load rule):
$$\frac{v = \Delta[x], \quad t = \tau_\mu[v]}{\mathit{load}(x) \downarrow t}$$

Jump target could be any untainted memory cell value

Undertainting: Failing to identify tainted values

  • e.g., missing exploits

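Policy 1 can be sketched as follows, using the table values from the slide. The function name `load_policy1` is hypothetical. Note how the taint of `x` never reaches `y`, even though the attacker chose which cell was read.

```python
def load_policy1(x, delta, mu, tau_mu):
    """Load rule, Policy 1: v = Δ[x], t = τμ[v]; the address taint is ignored."""
    v = delta[x]
    return mu[v], tau_mu[v]


delta = {"x": 7}    # Δ: x holds the user-supplied address 7
tau = {"x": True}   # τ: x is tainted
mu = {7: 42}        # μ: memory cell 7 holds 42
tau_mu = {7: False} # τμ: memory cell 7 is untainted

# y = load( x )
delta["y"], tau["y"] = load_policy1("x", delta, mu, tau_mu)
print(delta["y"], tau["y"])  # y is untainted although x was tainted
```
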
SLIDE 15

Policy 2: If either the address or the memory cell is tainted, then the value is tainted

    x = get_input( )
    y = load(jmp_table + x % 2 )
    …
    goto y

Memory: jmp_table holds the entries printa and printb

Address expression is tainted

Taint Propagation (Load rule):
$$\frac{v = \Delta[x], \quad t = \tau_\mu[v], \quad t_a = \tau[x]}{\mathit{load}(x) \downarrow t \lor t_a}$$

Policy Violation?

Overtainting: Unaffected values are tainted

  • e.g., exploits reported on safe inputs

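The jump-table case under Policy 2 can be sketched like this. The base address `JMP_TABLE = 100` and the function name `load_policy2` are assumptions made for illustration. Every input selects one of two legitimate targets, yet the loaded value is always tainted, so a goto check would flag it.

```python
JMP_TABLE = 100  # hypothetical base address of jmp_table

mu = {100: "printa", 101: "printb"}  # μ: the two table entries
tau_mu = {100: False, 101: False}    # τμ: the table itself is untainted


def load_policy2(addr, t_a, mu, tau_mu):
    """Load rule, Policy 2: the result taint is τμ[addr] ∨ t_a."""
    return mu[addr], tau_mu[addr] or t_a


def dispatch(x, t_x):
    """y = load(jmp_table + x % 2); the address inherits the taint of x."""
    addr = JMP_TABLE + x % 2
    return load_policy2(addr, t_x, mu, tau_mu)


for x in range(4):
    target, tainted = dispatch(x, True)
    print(x, target, tainted)
```
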
SLIDE 16

Research Challenge

State-of-the-art is not perfect for all programs

  • Undertainting: Policy may miss taint
  • Overtainting: Policy may wrongly detect taint

SLIDE 17

Forward Symbolic Execution: What input will make execution reach this line of code?

  • How it works – example
  • Inherent problems of symbolic execution
  • Proposed solutions

SLIDE 18

The Challenge

    packet_len(int header, char *packet) {
        char buf[2048] = "…";
        if (header < 0)
            return 0;
        if (header == 0x12345678)
            strcpy(buf, packet);
        return strlen(buf);
    }

2^32 possible inputs for header; only header == 0x12345678 reaches the strcpy call.

Forward Symbolic Execution: What input will make execution reach this line of code?

SLIDE 19

A Simple Example

packet_len(int header, …): header is symbolic and can have any value. The interpreter forks at each branch, and each fork carries its own path condition:

    if (header < 0)
      t: [header < 0]  return 0;
      f: [header ≥ 0]  if (header == 0x12345678)
           t: [header ≥ 0 ∧ header == 0x12345678]  strcpy(buf, packet);
           f: [header ≥ 0 ∧ header != 0x12345678]  return strlen(buf);

What input will make execution reach this line of code?

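The forking tree above can be sketched with a toy explorer in Python. Everything here is an assumption for illustration: the tuple encoding of the program, the `explore` and `solve` helpers, and the candidate-based "solver", which only tries values next to the constants in the path condition and is adequate for this example rather than a general decision procedure.

```python
import operator

OPS = {"<": operator.lt, ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
NEGATE = {"<": ">=", ">=": "<", "==": "!=", "!=": "=="}
MAGIC = 0x12345678

# Toy encoding of packet_len: ("if", (op, const), then_node, else_node)
PROGRAM = ("if", ("<", 0),
           ("leaf", "return 0"),
           ("if", ("==", MAGIC),
            ("leaf", "strcpy(buf, packet)"),
            ("leaf", "return strlen(buf)")))


def explore(node, path_condition=()):
    """Fork at every branch on symbolic header; yield (path_condition, leaf)."""
    if node[0] == "leaf":
        yield list(path_condition), node[1]
        return
    _, (op, const), then_node, else_node = node
    yield from explore(then_node, path_condition + ((op, const),))
    yield from explore(else_node, path_condition + ((NEGATE[op], const),))


def solve(path_condition):
    """Toy solver: try values adjacent to each constant in the path condition."""
    candidates = {0}
    for _, const in path_condition:
        candidates.update((const - 1, const, const + 1))
    for header in sorted(candidates):
        if all(OPS[op](header, const) for op, const in path_condition):
            return header
    return None


paths = list(explore(PROGRAM))
for condition, leaf in paths:
    print(leaf, condition, solve(condition))
```

A real symbolic executor would hand each path condition to an SMT solver instead of the candidate search used here.
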
SLIDE 20

One Problem: Exponential Blowup Due to Branches

[Figure: an interpreter forking at Branch 1, Branch 2, and Branch 3]

Exponential number of interpreters/formulas in the number of branches

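The growth is easy to see by enumerating paths. The sketch below assumes the simplest case, a straight-line sequence of independent two-way branches, where every combination of outcomes is a distinct path; the function name is hypothetical.

```python
from itertools import product


def enumerate_paths(num_branches):
    """Each path is one taken/not-taken decision per branch."""
    return list(product((True, False), repeat=num_branches))


for n in (1, 2, 3, 10):
    print(n, "branches ->", len(enumerate_paths(n)), "paths")
```
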
SLIDE 21

Path Selection Heuristics

[Figure: symbolic execution tree]

  • Depth-First Search (bounded), Random Search [Cadar2008]
  • Concolic Testing [Sen2005, Godefroid2008]

However, these are heuristics. In the worst case, all of them create a number of formulas that is exponential in the tree height.

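Two of the strategies named above can be sketched over an abstract tree in which a path is just a sequence of branch decisions. This is a simplified illustration and not the implementation used by any cited tool; the function names and the fixed depth bound are assumptions.

```python
import random


def bounded_dfs(depth_bound, path=()):
    """Yield every decision sequence up to depth_bound, depth first."""
    if len(path) == depth_bound:
        yield path
        return
    for taken in (True, False):
        yield from bounded_dfs(depth_bound, path + (taken,))


def random_path(depth_bound, rng):
    """Pick one path by flipping a fair coin at every branch."""
    return tuple(rng.random() < 0.5 for _ in range(depth_bound))


dfs_order = list(bounded_dfs(3))
print(dfs_order[0], len(dfs_order))
print(random_path(3, random.Random(0)))
```

Bounded DFS still visits every path up to the bound, which is why the worst case stays exponential; random search only changes which paths are seen first.
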
SLIDE 22

Symbolic Execution is not Easy

  • Exponential number of interpreters/formulas (from branching)
  • Exponentially-sized formulas (from substitution), e.g. s + s + s + s + s + s + s + s == 42
  • Solving a formula is NP-Complete!

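The eight-term formula can be reproduced by substitution. The sketch assumes a hypothetical program that reads a symbolic input `s` into `x`, executes `x = x + x` three times, and then tests `x == 42`; that program is not shown on the slide, and the helper names are illustrative.

```python
def self_add(expr):
    """Model x = x + x by substituting the current formula for x on both sides."""
    return ("+", expr, expr)


def count_terms(expr):
    """Number of occurrences of the symbolic input in the formula."""
    if isinstance(expr, str):
        return 1
    _, left, right = expr
    return count_terms(left) + count_terms(right)


def render(expr):
    """Print the formula as a flat sum."""
    if isinstance(expr, str):
        return expr
    _, left, right = expr
    return f"{render(left)} + {render(right)}"


x = "s"  # x = get_input( ), symbolic value s
for _ in range(3):
    x = self_add(x)

formula = f"{render(x)} == 42"
print(count_terms(x), formula)
```

Each additional `x = x + x` doubles the number of terms, so n such statements produce a formula with 2^n occurrences of `s`.
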
SLIDE 23

Other Important Issues

Formalization

Π = (s + s + s + s + s + s + s + s) == 42

More complex policies

SLIDE 24

Conclusion

  • Dynamic taint analysis and forward symbolic execution are used extensively in the literature
      – The formal algorithm, and what is done at each possible step of execution, is often not emphasized
  • We provided a formal definition and summarized:
      – Critical issues
      – State-of-the-art solutions
      – Common tradeoffs

SLIDE 25

Questions? Thank You!

thanassis@cmu.edu