Qi Gao, Wenbin Zhang, Yan Tang, and Feng Qin The Ohio State - - PowerPoint PPT Presentation



SLIDE 1

Qi Gao, Wenbin Zhang, Yan Tang, and Feng Qin

The Ohio State University

SLIDE 2

Memory Management Bugs are Severe

  • Memory management bugs:
      • Programming errors related to memory management
      • E.g., buffer overflows, dangling pointers, etc.
  • Causing severe problems during production runs:
      • System hangs or crashes
      • System compromises [US-CERT]
      • Long delays for diagnosing and fixing the bugs [Symantec 2006, Arbaugh 2000]

SLIDE 3

Desired Features for Handling Memory Bugs during Production Runs

  • Quick recovery
      • Improves availability
  • Immunity from future errors
      • Covers the time window before official bug fixes
  • Safe
      • Does not introduce new bugs
  • Useful diagnosis reports
      • Assist offline bug diagnosis
  • Low overhead
      • Suitable for production runs

SLIDE 4

Existing Solutions

| Category | Examples | Limitations |
|---|---|---|
| Oblivion-based | Failure-oblivious computing, reactive immune systems | Unsafe |
| Redundancy-based | N-version programming, recovery blocks, DieHard, Exterminator | Expensive |
| Avoidance-based | Rx, Archipelago | Expensive or non-immune |

SLIDE 5

Our Contributions

  • First-Aid: A low-overhead method for surviving and preventing memory bugs
      • Environmental change based failure diagnosis
      • Runtime patches for surviving failures and preventing future errors
  • Evaluation with seven real-world applications
      • Fast diagnosis and failure recovery (0.887 sec on average)
      • Effective in preventing bug reoccurrence
      • Low runtime overhead (3.7% on average)
      • Informative bug reports

SLIDE 6

Outline

  • Motivation & Introduction
  • First-Aid Overview
  • Design and Algorithms
      • Software architecture
      • Diagnosis algorithm
      • Validation algorithm
  • Evaluation
  • Conclusion

SLIDE 7

Environmental Changes for Failure Diagnosis

  • Two types of environmental changes for diagnosis:
      • Preventive changes
      • Exposing changes
  • Execution environments:
      • Everything but the program itself
      • E.g., runtime systems, operating systems, etc.

SLIDE 8

An Example of Preventive and Exposing Changes

  • Preventive change: add padding
      • Enlarging the buffer (the padding is random data) can prevent the failure, but it does not prove that an overflow occurred (it may also cure other bug types by disturbing the memory layout)
  • Exposing change: pad with canary*
      • 1. Detects the overflow
      • 2. Identifies the bug-affected objects

* Canary: a bit pattern that is unlikely to appear in normal execution, e.g. 0xdeadbeef
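The difference between the two changes can be sketched in a few lines of Python. This is a toy model and not First-Aid's allocator extension: the heap is a plain `bytearray`, and `allocate`, `overflowed`, and `PAD_WORDS` are hypothetical names chosen for illustration. Only the 0xdeadbeef pattern comes from the slide.

```python
from typing import Tuple

CANARY = bytes.fromhex("deadbeef")  # bit pattern quoted on the slide
PAD_WORDS = 4                       # assumed padding length, in canary words


def allocate(heap: bytearray, size: int) -> Tuple[int, int]:
    """Append a zeroed object to the toy heap, followed by canary padding."""
    start = len(heap)
    heap.extend(bytes(size))
    heap.extend(CANARY * PAD_WORDS)
    return start, size


def overflowed(heap: bytearray, obj: Tuple[int, int]) -> bool:
    """Exposing check: True if the padding after the object lost its canary."""
    start, size = obj
    pad_len = len(CANARY) * PAD_WORDS
    pad = heap[start + size:start + size + pad_len]
    return bytes(pad) != CANARY * PAD_WORDS


heap = bytearray()
a = allocate(heap, 8)
b = allocate(heap, 8)
# Simulated bug: write 10 bytes into the 8-byte object `a`.
heap[a[0]:a[0] + 10] = b"A" * 10
```

In this sketch `overflowed(heap, a)` is true and `overflowed(heap, b)` is false, so the check both detects the overflow and points at the affected object. With plain padding (the preventive change) the same write would be absorbed without leaving evidence.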

SLIDE 9

Environmental Changes for Different Types of Memory Bugs

| Bug types | Preventive changes | Exposing changes (bug manifestations) | Application points |
|---|---|---|---|
| Buffer overflow | Padding new objects | Padding objects with canary (corruption) | allocation |
| Dangling pointer read | Delay free | Fill objects with canary (failure) | deallocation |
| Dangling pointer write | Delay free | Fill objects with canary (corruption) | deallocation |
| Double free | Delay free | Check parameters (free twice) | deallocation |
| Uninitialized read | Fill new objects with zeros | Fill new objects with canary (failure) | allocation |
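The deallocation-time rows of the table can be illustrated with a small Python model. It is a sketch under stated assumptions, not the real allocator: `ToyHeap`, its `delay` and `expose` parameters, the integer handles, and the one-byte `CANARY_BYTE` are all hypothetical. With `expose=False` freed objects are only held back (delay free); with `expose=True` they are also filled with a canary and a repeated free is reported.

```python
from collections import deque

CANARY_BYTE = 0xDE  # hypothetical one-byte fill pattern


class ToyHeap:
    def __init__(self, delay=2, expose=False):
        self.live = {}           # handle -> bytearray of a live object
        self.delayed = deque()   # (handle, buffer) pairs whose release is delayed
        self.delay = delay       # how many freed objects to hold back
        self.expose = expose     # False: preventive change, True: exposing change
        self.reports = []
        self._next = 0

    def malloc(self, size):
        handle = self._next
        self._next += 1
        self.live[handle] = bytearray(size)
        return handle

    def free(self, handle):
        if handle not in self.live:
            # Not a live object: treat as a repeated free ("check parameters").
            if self.expose:
                self.reports.append(("double free", handle))
            return  # preventive effect: the second free does nothing
        buf = self.live.pop(handle)
        if self.expose:
            # Fill with canary so a later dangling read sees obviously bad data.
            buf[:] = bytes([CANARY_BYTE]) * len(buf)
        self.delayed.append((handle, buf))
        while len(self.delayed) > self.delay:
            self.delayed.popleft()  # the oldest delayed object is really released

    def read_dangling(self, handle):
        """What a dangling read would see, or None once the object is released."""
        for h, buf in self.delayed:
            if h == handle:
                return bytes(buf)
        return None
```

Under the preventive setting a dangling read still returns the old contents, so the program survives; under the exposing setting the same read returns canary bytes, which makes the bug manifest.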

SLIDE 10

Runtime Patches

| Bug types | Preventive changes / Runtime patches | Exposing changes (bug manifestations) | Application points |
|---|---|---|---|
| Buffer overflow | Padding new objects | Padding objects with canary (corruption) | allocation |
| Dangling pointer read | Delay free | Fill objects with canary (failure) | deallocation |
| Dangling pointer write | Delay free | Fill objects with canary (corruption) | deallocation |
| Double free | Delay free | Check parameters (free twice) | deallocation |
| Uninitialized read | Fill new objects with zeros | Fill new objects with canary (failure) | allocation |
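A runtime patch is a preventive change applied only at selected call-sites. The Python sketch below shows one way such a lookup could work at allocation time. It is an assumption for illustration: the `PATCHES` dictionary, the `plan_allocation` function, and the 64-byte padding are invented here, and the three addresses simply reuse the example call-site shown on the Diagnosis Phase II slide.

```python
# Hypothetical patch list: call-site (tuple of return addresses) -> (kind, argument).
PATCHES = {
    ("0x806437b", "0x80651a8", "0x8074d94"): ("add padding", 64),
    ("0xaaaa0001",): ("fill with zero", None),  # made-up call-site
}


def plan_allocation(call_site, size, patches=PATCHES):
    """Return (bytes to reserve, whether to zero-fill) for one allocation request."""
    kind, arg = patches.get(call_site, (None, None))
    if kind == "add padding":
        return size + arg, False   # buffer overflow patch: enlarge the object
    if kind == "fill with zero":
        return size, True          # uninitialized read patch: zero the object
    return size, False             # unpatched call-sites are left alone
```

Because only patched call-sites pay for padding or zero-filling, the cost of a patch stays proportional to how many objects the buggy code actually allocates.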

SLIDE 11

First-Aid Working Scenario

[Figure: First-Aid workflow, with the following labeled elements]

  • Program execution with checkpoints
  • Failure or error detected
  • Bug diagnosis (one diagnosis step: roll back to checkpoint, re-execute with change, analyze result)
  • Patch generation
  • Patch validation (re-execute multiple times with randomization)
  • Patch list
  • Bug report (allocation/deallocation trace, illegal access trace, patch details, diagnosis log)

SLIDE 12

Outline

  • Motivation & introduction
  • First-Aid overview
  • Design and algorithms
      • Software architecture
      • Diagnosis algorithm
      • Validation algorithm
  • Evaluation
  • Summary

SLIDE 13

First-Aid Architecture

[Figure: the application running on top of First-Aid]

  • Application
  • First-Aid components:
      • Memory Allocator Extension
      • Error Monitor(s)
      • Lightweight Checkpoint/Rollback
      • Diagnosis Engine
      • Validation Engine
      • Patch Management

SLIDE 14

Diagnosis Engine

  • Phase I:
      • Is the failure due to memory bug(s)?
      • Which checkpoint to roll back to for diagnosis and patching?
  • Phase II:
      • Which type(s) of memory bug(s) have occurred?
      • What memory objects are potentially affected by the bug?

SLIDE 15

Diagnosis Phase I

Phase I: Is the failure due to memory bug(s)? Which checkpoint to roll back to?

  • Roll back to a checkpoint
  • Re-execute from this checkpoint with all preventive changes on all objects
  • If the re-execution passes, we know:
      • 1. The failure is caused by a memory bug
      • 2. The bug is triggered after this checkpoint
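The Phase I check can be sketched as a loop over checkpoints. This is a simplified reading of the slide, not the authors' code: it assumes checkpoints are tried from newest to oldest, and `passes_with_prevention` is a hypothetical callback that rolls back to a checkpoint, re-executes with all preventive changes on all objects, and reports whether the run passed.

```python
from typing import Callable, Optional, Sequence


def phase_one(checkpoints: Sequence[int],
              passes_with_prevention: Callable[[int], bool]) -> Optional[int]:
    """Return the first checkpoint (newest first) from which the protected
    re-execution passes, or None if no checkpoint helps."""
    for cp in checkpoints:
        if passes_with_prevention(cp):
            # A memory bug, triggered after this checkpoint.
            return cp
    # Preventive changes never helped: likely not a memory bug that
    # these changes can handle.
    return None


# Toy usage: checkpoints are timestamps and the bug is triggered at time 35,
# so only a rollback to a point at or before 35 lets the protected run pass.
trigger_time = 35
chosen = phase_one([40, 30, 20], lambda cp: cp <= trigger_time)
```

In the toy usage `chosen` is 30: the checkpoint at 40 was taken after the bug had already been triggered, so preventive changes applied from there come too late.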

SLIDE 16

Diagnosis Phase II

Phase II: Which bug type? Where to patch?

  • Re-execute: exposing one bug type and preventing the other types, on all memory objects
  • A bug type moves from the undecided set to the identified set when it manifests (in the example: buffer overflow manifested, double free not manifested)
  • Locate the call-sites by:
      • 1. checking corruption, or
      • 2. binary search
  • Call-site: [0x806437b] [0x80651a8] [0x8074d94]
  • We know:
      • 1. Buffer overflow bug
      • 2. Exact call-sites
  • Enough for patch generation
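The binary-search option for locating call-sites can be sketched as recursive halving. This is an illustrative reconstruction and not the published algorithm: `manifests` is a hypothetical oracle that re-executes with the exposing change applied only to objects from the given call-sites (and the preventive change everywhere else) and reports whether the bug manifested. The sketch assumes the bug manifests exactly when the exposed subset contains a bug-triggering call-site.

```python
from typing import Callable, List, Sequence


def locate_call_sites(sites: Sequence[str],
                      manifests: Callable[[Sequence[str]], bool]) -> List[str]:
    """Return the call-sites whose exposure makes the bug manifest."""
    if not sites or not manifests(sites):
        return []                      # nothing in this group triggers the bug
    if len(sites) == 1:
        return list(sites)             # narrowed down to a single call-site
    mid = len(sites) // 2
    # Split the group and search both halves independently.
    return (locate_call_sites(sites[:mid], manifests)
            + locate_call_sites(sites[mid:], manifests))


# Toy usage with made-up call-site names and two buggy sites.
all_sites = [f"site{i}" for i in range(8)]
buggy = {"site2", "site5"}
found = locate_call_sites(all_sites, lambda group: any(s in buggy for s in group))
```

Each call to `manifests` stands for one rollback and re-execution, so halving keeps the number of re-executions far below one per call-site when only a few call-sites are involved.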

SLIDE 17

Validation Engine

Validation: Does the patch have consistent effects? Runs in parallel with the recovered program.

  • Instrumentation records two traces in each iteration:
      • allocation/deallocation trace
      • illegal access trace (e.g. read before initialization, write over boundary, etc.)
  • Iterations 1, 2, and 3 are re-executed with randomized allocation
  • Cross check across iterations:
      • 1. patch triggering
      • 2. illegal accesses
      • 3. offset of each illegal access
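The cross check can be sketched as a comparison of per-iteration records. The record layout here is an assumption made for illustration: each iteration is a dictionary with a `patch_triggered` set of call-sites and an `illegal_accesses` list of `(call_site, kind, offset)` tuples. Offsets are taken relative to the object, so they can agree even though randomized allocation changes absolute addresses.

```python
from typing import Dict, List


def consistent(iterations: List[Dict]) -> bool:
    """True if every iteration triggered the same patches and saw the same
    illegal accesses at the same object-relative offsets."""
    reference = iterations[0]
    for it in iterations[1:]:
        if it["patch_triggered"] != reference["patch_triggered"]:
            return False
        if sorted(it["illegal_accesses"]) != sorted(reference["illegal_accesses"]):
            return False
    return True


# Toy usage: three iterations that agree, then one that does not.
run = {
    "patch_triggered": {"siteA"},
    "illegal_accesses": [("siteA", "write over boundary", 8)],
}
odd = {
    "patch_triggered": {"siteA"},
    "illegal_accesses": [("siteA", "write over boundary", 12)],
}
ok = consistent([run, dict(run), dict(run)])
mismatch = consistent([run, dict(run), odd])
```

A patch whose effects differ between randomized runs is suspicious: the differences suggest the observed behavior depends on memory layout rather than on the diagnosed bug.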

SLIDE 18

Outline

  • Motivation & introduction
  • First-Aid overview
  • Design and algorithms
      • Software architecture
      • Diagnosis algorithm
      • Validation algorithm
  • Evaluation
  • Summary

SLIDE 19

Experimental Setup

  • Implementation:
      • Linux 2.4.22 with Flashback checkpointing support
      • Extension based on Lea allocator (used in GNU libc)
  • Platform:
      • Intel Xeon 3.00 GHz, 2MB L2 cache, 2GB memory
      • 100 Mbps Ethernet connection
  • Applications:
      • Effectiveness: 7 applications (Apache, Squid, CVS, Pine, Mutt, M4, and BC), 7 real bugs, 2 injected bugs
      • Overhead: the above 7 applications, SPEC INT2000, allocation intensive benchmarks

SLIDE 20

Overall Effectiveness

| Application | Diagnosed bugs | Runtime patch (call-sites applied) | Error prevention | Recovery time (s) |
|---|---|---|---|---|
| Apache | dangling pointer read | delay free (7) | Yes | 3.978 |
| Squid | buffer overflow | add padding (1) | Yes | 0.386 |
| CVS | double free | delay free (1) | Yes | 0.121 |
| Pine | buffer overflow | add padding (1) | Yes | 0.722 |
| Mutt | buffer overflow | add padding (1) | Yes | 0.617 |
| M4 | dangling pointer read | delay free (2) | Yes | 1.396 |
| BC | buffer overflow | add padding (3) | Yes | 0.573 |
| Apache-uir* | uninitialized read | fill with zero (1) | Yes | 0.102 |
| Apache-dpw* | dangling pointer write | delay free (1) | Yes | 0.084 |
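As a quick arithmetic check, the nine recovery times on this slide can be averaged and compared with the 0.887 sec figure quoted on the contributions slide. The snippet only restates the slide's numbers; the variable names are arbitrary.

```python
# Recovery times in seconds, copied from the Overall Effectiveness slide.
recovery_times = {
    "Apache": 3.978,
    "Squid": 0.386,
    "CVS": 0.121,
    "Pine": 0.722,
    "Mutt": 0.617,
    "M4": 1.396,
    "BC": 0.573,
    "Apache-uir": 0.102,
    "Apache-dpw": 0.084,
}

average = sum(recovery_times.values()) / len(recovery_times)
print(f"{average:.3f}")  # prints 0.887
```

The mean over all nine bugs, including the two starred Apache cases, reproduces the reported average.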

SLIDE 21

Comparison with Rx and Restart

  • Trigger the buffer overflow bug in Squid periodically after 7 seconds

[Figure: throughput (MB/s) versus elapsed time (s) for Restart, Rx, and First-Aid]

SLIDE 22

Scope of Patch

  • Call-sites and memory objects affected by runtime patches in buggy regions

| Name | Call-sites: First-Aid | Call-sites: Rx | Call-sites: Ratio | Objects: First-Aid | Objects: Rx | Objects: Ratio |
|---|---|---|---|---|---|---|
| Apache | 7 | 32 | 21.88% | 315 | 2567 | 12.23% |
| Squid | 1 | 61 | 1.64% | 1 | 3626 | 0.03% |
| CVS | 1 | 44 | 2.27% | 17 | 306 | 5.56% |
| Pine | 1 | 380 | 0.26% | 11 | 2881 | 0.38% |
| Mutt | 1 | 216 | 0.46% | 2 | 5004 | 0.04% |
| M4 | 2 | 8 | 25.00% | 3 | 183 | 1.64% |
| BC | 3 | 34 | 8.82% | 5 | 732 | 0.68% |

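SLIDE 23

Runtime Overhead

[Figure: bar chart, y-axis from 0 to 1.2; legend: Original Allocator Overall; groups: Applications, SPEC INT2000, Allocation Intensive. Bar labels in chart order:]

| Group | Benchmark | Bar label |
|---|---|---|
| Applications | Apache | 1.02 |
| Applications | Squid | 1.04 |
| Applications | CVS | 1.04 |
| Applications | Mutt | 1.05 |
| Applications | Pine | 1.03 |
| Applications | BC | 1.06 |
| Applications | M4 | 1.02 |
| SPEC INT2000 | 164.gzip | 1.02 |
| SPEC INT2000 | 175.vpr | 1.02 |
| SPEC INT2000 | 176.gcc | 1.02 |
| SPEC INT2000 | 181.mcf | 1.00 |
| SPEC INT2000 | 186.crafty | 1.00 |
| SPEC INT2000 | 197.parser | 1.02 |
| SPEC INT2000 | 252.eon | 1.02 |
| SPEC INT2000 | 253.perlbmk | 1.02 |
| SPEC INT2000 | 255.vortex | 1.03 |
| SPEC INT2000 | 256.bzip2 | 1.03 |
| SPEC INT2000 | 300.twolf | 1.09 |
| Allocation Intensive | cfrac | 1.12 |
| Allocation Intensive | espresso | 1.09 |
| Allocation Intensive | lindsay | 1.01 |
| Allocation Intensive | p2c | 1.06 |
| | Average | 1.04 |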
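The chart's numbers can be checked against the 3.7% average overhead quoted on the contributions slide. The snippet assumes that the first 22 bar labels are execution times normalized to the original run and that the 23rd label (1.04) is the chart's own average, so it is left out of the list.

```python
# The 22 per-benchmark bar labels from the Runtime Overhead slide, in chart order.
normalized = [
    1.02, 1.04, 1.04, 1.05, 1.03, 1.06, 1.02,              # applications
    1.02, 1.02, 1.02, 1.00, 1.00, 1.02, 1.02, 1.02,        # SPEC INT2000 ...
    1.03, 1.03, 1.09,                                      # ... continued
    1.12, 1.09, 1.01, 1.06,                                # allocation intensive
]

mean = sum(normalized) / len(normalized)
overhead_percent = (mean - 1.0) * 100
print(f"{mean:.2f} {overhead_percent:.1f}%")  # prints 1.04 3.7%
```

Under that assumption the mean agrees with both the chart's "Average" bar and the 3.7% figure.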

SLIDE 24

Conclusions and Limitations

  • Avoidance-based methods with accurate diagnosis can efficiently and effectively survive and prevent memory management bugs.
  • Limitations:
      • Cannot handle all types of memory bugs (e.g. memory leaks, incorrect pointer arithmetic)
      • Cannot handle memory bugs that manifest themselves silently
          • Need more powerful error checkers

SLIDE 25

Future Work and Acknowledgements

  • Future Work
      • Evaluate First-Aid with more types of memory bugs in more applications
      • Extend First-Aid to support multi-tier server applications
  • Acknowledgements
      • Our shepherd: Julia Lawall
      • Anonymous reviewers
      • Wei Huang, Matthew Koop, Chris Stewart, Guoqing Xu, and Yuanyuan Zhou