  1. All You Ever Wanted to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask)
     (Yes, we were trying to overflow the title length field on the submission server)
     Edward J. Schwartz, Thanassis Avgerinos, David Brumley
     8/16/2010 Carnegie Mellon University

  2. A Few Things You Need to Know About Dynamic Taint Analysis & Forward Symbolic Execution (but might have been afraid to ask)

  3. The Root of All Evil: humans write programs.
     This talk: computers analyzing programs dynamically at runtime.

  4. Two Essential Runtime Analyses
     Dynamic Taint Analysis: What values are derived from user input?
       • Detect exploits [Costa2005, Crandall2005, Newsome2005, Suh2004]
       • Detect packing in malware [Bayer2009, Yin2007]
     Forward Symbolic Execution: What input will make execution reach this line of code?
       • Automated test case generation [Cadar2008, Godefroid2005, Sen2005]
       • Input filter generation [Costa2007, Brumley2008]

  5. Our Contributions
     Computers analyzing programs dynamically at runtime:
       • Dynamic Taint Analysis: Is this value affected by user input?
       • Forward Symbolic Execution: What input will make execution reach this line of code?
     1: Turn English descriptions into an algorithm (operational semantics).
     2: The algorithm highlights caveats, issues, and unsolved problems that are deceptively hard.

  6. Our Contributions (cont’d)
     3: Systematize recurring themes in a wealth of previous work.

  7. Dynamic Taint Analysis: What values are derived from user input?
     1) How it works, by example
     2) Desired properties
     3) An example issue (the paper has many more)

  8. Taint Introduction: input is tainted.
     Program: x = get_input( ); y = x + 42; …; goto y
     After the first statement, Δ (variable values) holds x ↦ 7 and τ (taint status, tainted or untainted) holds x ↦ T.
     Rule (Input): $\dfrac{t = \mathrm{IsUntrusted}(src)}{\mathrm{get\_input}(src) \Downarrow t}$
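
The Input rule can be sketched in C as below. This is a minimal illustration, not the paper's code: it assumes a toy machine whose variables are small integer ids, each with a concrete value (Δ) and a taint bit (τ). The names state_t, source_t, is_untrusted and assign_input, and the choice that only SRC_USER counts as untrusted, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { NVARS = 8 };  /* hypothetical: variables are ids 0..NVARS-1 */

/* Toy machine state: Delta maps variables to values, tau to taint bits. */
typedef struct {
    unsigned delta[NVARS];  /* Delta: variable -> concrete value */
    bool     tau[NVARS];    /* tau:   variable -> tainted?       */
} state_t;

/* Hypothetical input sources. */
typedef enum { SRC_USER, SRC_CONFIG } source_t;

/* IsUntrusted(src) is a policy decision. This sketch assumes that only
   user-supplied input is untrusted. */
static bool is_untrusted(source_t src) {
    return src == SRC_USER;
}

/* x = get_input(src): store the concrete input value in Delta and,
   following rule Input, set tau[x] = IsUntrusted(src). */
static void assign_input(state_t *s, int x, source_t src, unsigned concrete) {
    assert(x >= 0 && x < NVARS);
    s->delta[x] = concrete;
    s->tau[x]   = is_untrusted(src);
}
```

With the slide's input, assign_input(&s, 0, SRC_USER, 7) leaves s.delta[0] == 7 and s.tau[0] == true.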

  9. Taint Propagation: data derived from user input is tainted.
     Program: x = get_input( ); y = x + 42; …; goto y
     After the second statement, Δ holds x ↦ 7, y ↦ 49 and τ holds x ↦ T, y ↦ T.
     Rule (BinOp): $\dfrac{t_1 = \tau[x_1],\ t_2 = \tau[x_2]}{x_1 + x_2 \Downarrow t_1 \lor t_2}$
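
A minimal C sketch of the BinOp rule, assuming a toy machine whose variables are small integer ids, each with a value (Δ) and a taint bit (τ). Every expression evaluates to a value paired with a taint bit. Treating constants as untainted is an assumption of this sketch, and all names (state_t, tval_t, var, constant, add, assign) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { NVARS = 8 };  /* hypothetical: variables are ids 0..NVARS-1 */

typedef struct {
    unsigned delta[NVARS];  /* Delta: variable -> concrete value */
    bool     tau[NVARS];    /* tau:   variable -> tainted?       */
} state_t;

/* Result of evaluating an expression: concrete value plus taint bit. */
typedef struct { unsigned value; bool taint; } tval_t;

/* Var: reading x yields Delta[x] with taint tau[x]. */
static tval_t var(const state_t *s, int x) {
    assert(x >= 0 && x < NVARS);
    return (tval_t){ s->delta[x], s->tau[x] };
}

/* Constants come from the program text, so they are untainted. */
static tval_t constant(unsigned v) {
    return (tval_t){ v, false };
}

/* BinOp for +: the result is tainted if either operand is (t1 OR t2).
   Unsigned arithmetic wraps around like a machine word. */
static tval_t add(tval_t e1, tval_t e2) {
    return (tval_t){ e1.value + e2.value, e1.taint || e2.taint };
}

/* Assignment y = e updates both Delta and tau. */
static void assign(state_t *s, int y, tval_t e) {
    assert(y >= 0 && y < NVARS);
    s->delta[y] = e.value;
    s->tau[y]   = e.taint;
}
```

The slide's y = x + 42 becomes assign(&s, 1, add(var(&s, 0), constant(42))); with x holding a tainted 7, y ends up as a tainted 49.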

  10. Taint Checking: policy violation detected at goto y.
      Program: x = get_input( ); y = x + 42; …; goto y
      Δ holds x ↦ 7, y ↦ 49; τ holds x ↦ T, y ↦ T.
      Policy for jumps (must be true to execute): $P_{\mathrm{goto}}(t_a) = \lnot t_a$
      Because y is tainted, $P_{\mathrm{goto}}$ evaluates to false and the jump is flagged.
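
The check can be sketched as a guard placed in front of the jump. Assumptions: a toy state of values (Δ), taint bits (τ) and a program counter, with hypothetical names p_goto and exec_goto; a real monitor would act on machine-level jumps and returns rather than on variable ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum { NVARS = 8 };  /* hypothetical: variables are ids 0..NVARS-1 */

typedef struct {
    unsigned delta[NVARS];  /* Delta: variable -> concrete value */
    bool     tau[NVARS];    /* tau:   variable -> tainted?       */
    unsigned pc;            /* program counter                   */
} state_t;

/* P_goto(t_a) = NOT t_a: jumping is allowed only to an untainted target. */
static bool p_goto(bool t_a) {
    return !t_a;
}

/* goto y: check the policy before transferring control. On a violation
   the jump is not taken, pc is left unchanged, and false is returned. */
static bool exec_goto(state_t *s, int y) {
    assert(y >= 0 && y < NVARS);
    if (!p_goto(s->tau[y])) {
        fprintf(stderr, "policy violation: tainted jump target %u\n", s->delta[y]);
        return false;
    }
    s->pc = s->delta[y];
    return true;
}
```

With y holding a tainted 49, exec_goto refuses the jump and reports the violation; if y were untainted, control would transfer to 49.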

  11. Real Use: Exploit Detection
      The abstract program x = get_input( ); y = …; goto y corresponds to real program control flow such as strcpy(buffer, argv[1]); … return; where the return jumps to an overwritten return address.

  12. Memory Load
      Variables: Δ holds x ↦ 7, and τ marks x as tainted (T).
      Memory: μ holds address 7 ↦ 42, and τ_μ marks address 7 as untainted (F).

  13. Problem: Memory Addresses
      Program: x = get_input( ); y = load(x); …; goto y
      Δ holds x ↦ 7; μ holds 7 ↦ 42; τ_μ marks address 7 as untainted (F).
      "All values derived from user input are tainted"?? Here y is loaded from an address chosen by user input, but the memory cell itself is untainted.

  14. Policy 1: Taint depends only on the memory cell.
      Program: x = get_input( ); y = load(x); …; goto y, with Δ: x ↦ 7, μ: 7 ↦ 42, τ_μ: 7 ↦ F.
      Undertainting: failing to identify tainted values, e.g., missing exploits. The jump target could be any untainted memory cell value.
      Rule (Load): $\dfrac{v = \Delta[x],\ t = \tau_\mu[v]}{\mathrm{load}(x) \Downarrow t}$
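
Policy 1 as a C sketch, assuming a toy state that adds a memory μ and a per-cell taint map τ_μ to the variable maps Δ and τ. MEMSIZE, load_policy1 and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { NVARS = 8, MEMSIZE = 64 };  /* hypothetical sizes */

typedef struct {
    unsigned delta[NVARS];     /* Delta:  variable -> value    */
    bool     tau[NVARS];       /* tau:    variable -> tainted? */
    unsigned mu[MEMSIZE];      /* mu:     address  -> value    */
    bool     tau_mu[MEMSIZE];  /* tau_mu: address  -> tainted? */
} state_t;

typedef struct { unsigned value; bool taint; } tval_t;

/* Policy 1 Load: v = Delta[x], t = tau_mu[v]. Only the memory cell's taint
   is used; whether the address x itself is tainted (tau[x]) is ignored. */
static tval_t load_policy1(const state_t *s, int x) {
    assert(x >= 0 && x < NVARS);
    unsigned v = s->delta[x];  /* the address being read */
    assert(v < MEMSIZE);
    return (tval_t){ s->mu[v], s->tau_mu[v] };
}
```

With the slide's numbers (x ↦ 7 and tainted, μ[7] = 42 and untainted), load_policy1 returns 42 marked untainted even though user input chose the address, which is the undertainting described above.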

  15. Policy 2: If either the address or the memory cell is tainted, then the value is tainted.
      Program: x = get_input( ); y = load(jmp_table + x % 2); …; goto y, where jmp_table holds printa and printb.
      The memory address expression is tainted, so is goto y a policy violation?
      Overtainting: unaffected values are tainted, e.g., reporting exploits on safe inputs.
      Rule (Load): $\dfrac{v = \Delta[x],\ t = \tau_\mu[v],\ t_a = \tau[x]}{\mathrm{load}(x) \Downarrow t \lor t_a}$
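
Policy 2 differs only in also consulting the taint of the address. In this sketch the names and memory layout are hypothetical: jmp_table is assumed to start at address 10, with untainted entries 100 and 200 standing in for printa and printb.

```c
#include <assert.h>
#include <stdbool.h>

enum { NVARS = 8, MEMSIZE = 64 };  /* hypothetical sizes */

typedef struct {
    unsigned delta[NVARS];  bool tau[NVARS];       /* variables: value, taint */
    unsigned mu[MEMSIZE];   bool tau_mu[MEMSIZE];  /* memory:    value, taint */
} state_t;

typedef struct { unsigned value; bool taint; } tval_t;

/* Policy 2 Load: v = Delta[x], t = tau_mu[v], t_a = tau[x]; the loaded
   value is tainted if the cell OR the address is tainted (t OR t_a). */
static tval_t load_policy2(const state_t *s, int x) {
    assert(x >= 0 && x < NVARS);
    unsigned v = s->delta[x];  /* the address being read */
    assert(v < MEMSIZE);
    bool t   = s->tau_mu[v];
    bool t_a = s->tau[x];
    return (tval_t){ s->mu[v], t || t_a };
}
```

For an input of 5 the address is 10 + 5 % 2 = 11, so load_policy2 returns 200 marked tainted: the jump would go to printb, a safe target, yet a goto check would flag it. This is the overtainting the slide describes.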

  16. Research Challenge: the state of the art is not perfect for all programs.
      • Overtainting: the policy may wrongly detect taint.
      • Undertainting: the policy may miss taint.

  17. Forward Symbolic Execution: What input will make execution reach this line of code?
      • How it works, by example
      • Inherent problems of symbolic execution
      • Proposed solutions

  18. The Challenge: which of the 2^32 possible inputs reaches strcpy?
          int packet_len(int header, char *packet) {
              char buf[2048] = "…";
              if (header < 0)
                  return 0;
              if (header == 0x12345678)
                  strcpy(buf, packet);
              return strlen(buf);
          }
      Only header == 0x12345678 reaches the strcpy.
      Forward Symbolic Execution: What input will make execution reach this line of code?

  19. A Simple Example
      header is symbolic: it can have any value. In packet_len(int header, …), the interpreter forks at each branch and records the condition that leads to each outcome:
        • header < 0: return 0;
        • header ≥ 0 ∧ header == 0x12345678: strcpy(buf, packet);
        • header ≥ 0 ∧ header != 0x12345678: return strlen(buf);
      What input will make execution reach strcpy? Any header satisfying header ≥ 0 ∧ header == 0x12345678.
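
The tree above can be reproduced with a very small forward symbolic executor. This sketch is hypothetical throughout: it models only the branch structure of packet_len (a cfg array of branch nodes), represents each path predicate as a conjunction of comparisons on header, and uses a toy solver that tries a few candidate values instead of a real decision procedure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Atoms of the form `header OP c` over the single symbolic input. */
typedef enum { LT, GE, EQ, NE } op_t;
typedef struct { op_t op; int32_t c; } atom_t;

/* A path predicate: a conjunction of at most MAX_ATOMS atoms. */
enum { MAX_ATOMS = 8 };
typedef struct { atom_t a[MAX_ATOMS]; int n; } path_t;

/* Branch node `if (header OP c)`: targets >= 0 are node indices,
   targets < 0 encode leaves as ~leaf_id. */
typedef struct { op_t op; int32_t c; int on_true, on_false; } branch_t;

enum { RET_ZERO, DO_STRCPY, RET_STRLEN, NLEAVES };

/* Branch structure of packet_len (hypothetical encoding). */
static const branch_t cfg[] = {
    { LT, 0,          ~RET_ZERO,  1           },  /* if (header < 0) return 0;      */
    { EQ, 0x12345678, ~DO_STRCPY, ~RET_STRLEN },  /* if (header == 0x12345678) ... */
};

static op_t negate(op_t op) {
    switch (op) {
    case LT: return GE;
    case GE: return LT;
    case EQ: return NE;
    default: return EQ;  /* NE */
    }
}

/* Fork at each branch: the true side adds the condition, the false side
   its negation. Each leaf receives the path predicate that reaches it. */
static void explore(int target, path_t pi, path_t leaves[NLEAVES]) {
    if (target < 0) { leaves[~target] = pi; return; }
    const branch_t *b = &cfg[target];
    assert(pi.n < MAX_ATOMS);
    path_t t = pi, f = pi;
    t.a[t.n++] = (atom_t){ b->op, b->c };
    f.a[f.n++] = (atom_t){ negate(b->op), b->c };
    explore(b->on_true, t, leaves);
    explore(b->on_false, f, leaves);
}

/* Does a concrete header satisfy every atom of the predicate? */
static bool holds(const path_t *pi, int32_t h) {
    for (int i = 0; i < pi->n; i++) {
        int32_t c = pi->a[i].c;
        bool ok = pi->a[i].op == LT ? h < c
                : pi->a[i].op == GE ? h >= c
                : pi->a[i].op == EQ ? h == c
                :                     h != c;
        if (!ok) return false;
    }
    return true;
}

/* Toy solver: try 0, then each constant in the predicate and its two
   neighbours. Enough for these one-variable predicates; real tools
   typically hand the predicate to an SMT solver. */
static bool solve(const path_t *pi, int32_t *model) {
    if (holds(pi, 0)) { *model = 0; return true; }
    for (int i = 0; i < pi->n; i++) {
        for (int64_t d = -1; d <= 1; d++) {
            int64_t cand = (int64_t)pi->a[i].c + d;
            if (cand < INT32_MIN || cand > INT32_MAX) continue;
            if (holds(pi, (int32_t)cand)) { *model = (int32_t)cand; return true; }
        }
    }
    return false;
}
```

After explore(0, empty, leaves) with an empty starting predicate, solving leaves[DO_STRCPY] (header ≥ 0 ∧ header == 0x12345678) yields header = 0x12345678, the one input out of 2^32 that reaches strcpy.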

  20. One Problem: Exponential Blowup Due to Branches
      Each branch (Branch 1, Branch 2, Branch 3, …) forks the interpreter, so the number of interpreters/formulas is exponential in the number of branches.
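
The blowup can be counted directly. The sketch below assumes a hypothetical program with k branches in sequence, each testing a different symbolic input (if (s_i > 0)), so that every combination of outcomes is feasible; the function name explore is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

enum { MAX_BRANCHES = 20 };

/* Hypothetical program: k branches in sequence, `if (s_i > 0)`, each on a
   different symbolic input s_i, so every combination of outcomes is feasible.
   The interpreter forks at each branch; each leaf is one interpreter holding
   one path formula. Returns the number of leaves. */
static unsigned long explore(int i, int k, bool taken[], bool print) {
    assert(k >= 0 && k <= MAX_BRANCHES);
    if (i == k) {
        if (print) {
            if (k == 0) printf("true");
            for (int j = 0; j < k; j++)
                printf("%s%s(s%d > 0)", j ? " && " : "", taken[j] ? "" : "!", j);
            printf("\n");
        }
        return 1;
    }
    taken[i] = true;   /* fork 1: assume s_i > 0    */
    unsigned long n = explore(i + 1, k, taken, print);
    taken[i] = false;  /* fork 2: assume !(s_i > 0) */
    return n + explore(i + 1, k, taken, print);
}
```

explore(0, 3, taken, false) returns 8 and explore(0, 10, taken, false) returns 1024. With print set to true, k = 3 prints the eight formulas, from (s0 > 0) && (s1 > 0) && (s2 > 0) to !(s0 > 0) && !(s1 > 0) && !(s2 > 0).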

  21. Path Selection Heuristics
      Explore the symbolic execution tree selectively:
        • Depth-first search (bounded), random search [Cadar2008], …
        • Concolic testing [Sen2005, Godefroid2008]
      However, these are heuristics. In the worst case, all of them create a number of formulas exponential in the tree height.
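
Bounding the search depth, as bounded depth-first search does, caps the tree but not its growth rate. This sketch assumes a hypothetical tree of k sequential branches on independent symbolic inputs, with both sides of every branch feasible; bounded_dfs is an illustrative name, not an API of any cited tool.

```c
#include <assert.h>

/* Bounded DFS over a hypothetical tree of k sequential branches whose
   sides are always both feasible: explore both sides of each branch, but
   stop once `bound` branches deep. Returns how many path formulas result. */
static unsigned long bounded_dfs(int depth, int k, int bound) {
    assert(depth >= 0 && bound >= 0);
    if (depth == k || depth == bound) return 1;
    return bounded_dfs(depth + 1, k, bound)    /* true side  */
         + bounded_dfs(depth + 1, k, bound);   /* false side */
}
```

bounded_dfs(0, 30, 4) produces 16 formulas and bounded_dfs(0, 30, 10) produces 1024: the count is 2^min(k, bound), still exponential in the height actually explored.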

  22. Symbolic Execution is not Easy
      • Exponential number of interpreters/formulas due to branching
      • Exponentially-sized formulas due to substitution, e.g., s + s + s + s + s + s + s + s == 42
      • Solving a formula is NP-complete!
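
The substitution blowup can be reproduced by building the formula as a string. The straight-line program in the comment is a hypothetical example chosen so that three doubling steps give the slide's formula; substitute is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical straight-line program:
     x0 = s;  x1 = x0 + x0;  x2 = x1 + x1;  ...  xk = x(k-1) + x(k-1)
   Symbolic execution that substitutes each variable's expression for the
   variable doubles the formula at every step, so xk mentions s 2^k times.
   Returns a malloc'd string (caller frees), or NULL on allocation failure. */
static char *substitute(int k) {
    assert(k >= 0 && k <= 20);  /* keep the demo's memory use small */
    char *e = malloc(2);
    if (e == NULL) return NULL;
    strcpy(e, "s");
    for (int i = 0; i < k; i++) {
        size_t n = strlen(e);
        size_t size = 2 * n + 4;  /* e, " + ", e, and the terminating NUL */
        char *next = malloc(size);
        if (next == NULL) { free(e); return NULL; }
        snprintf(next, size, "%s + %s", e, e);
        free(e);
        e = next;
    }
    return e;
}
```

substitute(3) returns s + s + s + s + s + s + s + s; in general the string has 2^(k+2) - 3 characters, so it grows exponentially in the number of statements.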

  23. Other Important Issues
      • Formalization
      • More complex policies
      Example path predicate: Π = (s + s + s + s + s + s + s + s) == 42

  24. Conclusion
      • Dynamic taint analysis and forward symbolic execution are used extensively in the literature
        – The formal algorithm, and what is done at each possible step of execution, is often not emphasized
      • We provided a formal definition and summarized
        – Critical issues
        – State-of-the-art solutions
        – Common tradeoffs

  25. Thank You! Questions?
      thanassis@cmu.edu
