Scheduling Black-box Mutational Fuzzing, ACM CCS 2013, Maverick Woo



slide-1
SLIDE 1

Scheduling Black-box Mutational Fuzzing, ACM CCS 2013

Maverick Woo

Carnegie Mellon University pooh@cmu.edu

slide-2
SLIDE 2

Our Crew

David Brumley, Maverick Woo, Samantha Gottlieb, Sang Kil Cha

slide-3
SLIDE 3

The Story

slide-4
SLIDE 4

Typical Exploit Generation

[Pipeline diagram: Fuzzing, then Bug Triage, then Exploit Generation; edge labels: crashes, bugs; group label: Bug Finding]

slide-5
SLIDE 5

Scheduling is Equally Important

  • Ordering
  • Time Allocation

slide-6
SLIDE 6

Scheduling Black-box Mutational Fuzzing

slide-7
SLIDE 7

Scheduling Black-box Mutational Fuzzing

A common program testing technique popularized by Miller et al. in the late 1980s [18]

  • Use a fuzzer to generate test inputs to the program under test
  • At its simplest, look for crashes: memory corruption, uncaught exceptions, failed assertions, etc.

[Diagram labels: Fuzzer, Test Input, Program, Crash, Termination]

slide-8
SLIDE 8

Scheduling Black-box Mutational Fuzzing

A black-box fuzzer observes a program’s I/O behavior only

  • cf. Whitebox Fuzzing by Godefroid et al. 2012 [11]
  • Simplification: only distinguish termination vs. crash

Detect anomalies by mutating a valid input (= seed)

[Diagram labels: Fuzzer, Seed Input, PRNG(j), Mutated Input, Program, Crash, Termination]

slide-9
SLIDE 9

Scheduling Black-box Mutational Fuzzing

Given a seed input s and a mutation ratio r:

  • 1. Select d = r × |s| bits in s uniformly at random
  • 2. Flip each selected bit with probability ½
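The two mutation steps can be sketched in Python as follows. This is only an illustration, not the fuzzer used in the talk: the function name `mutate`, measuring |s| in bits, and truncating d to an integer are all assumptions. Seeding the PRNG explicitly also shows why a crash can be reproduced from the seed input and the PRNG seed alone.

```python
import random


def mutate(seed: bytes, ratio: float, prng_seed: int) -> bytes:
    """Pick d = ratio * |seed| bit positions, then flip each with probability 1/2."""
    rng = random.Random(prng_seed)      # same prng_seed -> same mutated input
    nbits = len(seed) * 8               # assumption: |s| is measured in bits
    d = int(ratio * nbits)              # assumption: truncate to an integer
    out = bytearray(seed)
    # d distinct bit positions, chosen uniformly at random
    for pos in rng.sample(range(nbits), d):
        if rng.random() < 0.5:          # flip the selected bit with probability 1/2
            out[pos // 8] ^= 1 << (pos % 8)
    return bytes(out)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))
```

Because at most d bits change, every mutated input lies within Hamming distance d of the seed.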

slide-10
SLIDE 10

Scheduling Black-box Mutational Fuzzing

Key Observations:

  • 1. We can reproduce a program crash by storing (a) the seed input and (b) the PRNG seed
  • 2. Mutation = uniform sampling from the Hamming cube of radius d centered at s

slide-11
SLIDE 11

Scheduling Black-box Mutational Fuzzing

“Fuzz Configuration” = (i) program p, (ii) seed input s, (iii) mutation ratio r

slide-12
SLIDE 12

Scheduling Black-box Mutational Fuzzing

“Fuzz Configuration” = (i) program p, (ii) seed input s, (iii) mutation ratio r = 0.04%

slide-13
SLIDE 13

Scheduling Black-box Mutational Fuzzing

“Fuzz Configuration” = “(program, seed) pair” in this talk

slide-14
SLIDE 14

Scheduling Black-box Mutational Fuzzing

A fuzz campaign comprises a sequence of epochs. The campaign:

  • 1. takes a list of (program, seed) pairs as input
  • 2. at the beginning of each epoch, picks one (program, seed) pair to fuzz based on data collected from previous epochs

We investigate two epoch types:

  • Fixed-run: fixed number of fuzz runs in each epoch
    – implemented in CMU CERT BFF v2.6 [14]
  • Fixed-time: fixed amount of time in each epoch
    – proposed in this paper
    – slightly harder to implement
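The difference between the two epoch types can be sketched as below. This is a minimal illustration under assumptions: `fuzz_once` is a hypothetical callable that performs one fuzz run and returns an outcome ID, and the clock is injectable so the fixed-time variant can be exercised without waiting.

```python
import time
from typing import Callable, List


def fixed_run_epoch(fuzz_once: Callable[[], int], runs: int) -> List[int]:
    """Perform exactly `runs` fuzz runs, however long they take."""
    return [fuzz_once() for _ in range(runs)]


def fixed_time_epoch(fuzz_once: Callable[[], int], seconds: float,
                     clock: Callable[[], float] = time.monotonic) -> List[int]:
    """Keep fuzzing until `seconds` have elapsed on `clock`."""
    deadline = clock() + seconds
    outcomes = []
    while clock() < deadline:           # the last run may overshoot the deadline
        outcomes.append(fuzz_once())
    return outcomes
```

In a fixed-time epoch a slow (program, seed) pair gets fewer runs than a fast one, whereas a fixed-run epoch lets a slow pair consume more wall-clock time.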

slide-15
SLIDE 15

Problem Statement

Given a list of K fuzz configurations {(p_1, s_1), . . . , (p_K, s_K)}, the Fuzz Configuration Scheduling (FCS) problem seeks to maximize the number of unique bugs discovered in a fuzz campaign that runs for a duration of length T.

Important Assumptions:

  • 1. Only one configuration can be fuzzed within an epoch
  • 2. Separate program analysis of (p_i, s_i) is not allowed
  • 3. Bugs from different (p_i, s_i) are disjoint

See paper for discussions

slide-16
SLIDE 16

How to Solve the FCS Problem?

Two competing goals during a fuzz campaign:

Explore each (p_i, s_i) sufficiently often so as to identify pairs that can yield new bugs

vs.

Exploit knowledge of (p_i, s_i) that are likely to yield new bugs by fuzzing them more

Good News:

  • Clearly a Multi-Armed Bandit (MAB) problem!

slide-17
SLIDE 17

Multi-Armed Bandits

slide-18
SLIDE 18

MAB in Berlin


slide-19
SLIDE 19

How to Solve the FCS Problem?

Good News:

  • Clearly a Multi-Armed Bandit (MAB) problem!
  • Lots of published MAB algorithms
    – provably optimal algorithms for many settings, e.g., Auer et al. 2002 [2] handles certain adversarial cases

slide-20
SLIDE 20

How to Solve the FCS Problem?

Bad News: recognizing “FCS ∈ MAB” is not enough

  • 1. Classic MAB: once you identify a good beer, it stays good
    ⇒ drink it often to accumulate rewards :)
  • 2. Our Setting: each program has a finite number of bugs
    ⇒ bug exhaustion gives diminishing returns :(

We are not aware of MAB algorithms that cater to our case…
⇒ We need our own algorithms!

slide-22
SLIDE 22

Previously: Scheduling Black-box Mutational Fuzzing

Key Observations:

  • 1. We can reproduce a program crash by storing (a) the seed input and (b) the PRNG seed
  • 2. Mutation = uniform sampling from the Hamming cube of radius d centered at s

slide-23
SLIDE 23

Modeling Black-box Mutational Fuzzing

Consider the repeated fuzzings of a fixed (p_i, s_i) and let outcome_i(j) denote the j-th outcome in the sequence:

  • Termination ⇒ ID 0
  • Crash ⇒ bug ID obtained from bug triage

Key Observation: BMF is memoryless, i.e., outcome_i(j) are i.i.d. RVs for a fixed i

slide-24
SLIDE 24

Coupon Collector’s Problem (CCP)

Suppose every box of breakfast cereal comes with a coupon that is randomly chosen among M different coupon types

  • How many boxes do you expect to buy before you have collected at least one coupon of each type?

Traditional Setting

  • Coupon types are uniformly distributed ⇒ Θ(M log M)

Our Setting

  • Bugs do not occur uniformly at random ⇒ Weighted CCP
  • Prevalence of different bugs is unknown ahead of time
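For reference, the Θ(M log M) figure in the traditional setting follows from a standard argument, sketched here (the symbol N for the number of boxes bought is introduced only for this note). Once k − 1 distinct types are held, each box yields a new type with probability (M − k + 1)/M, so the wait for the k-th new type is geometric with mean M/(M − k + 1):

```latex
\mathbb{E}[N]
  = \sum_{k=1}^{M} \frac{M}{M-k+1}
  = M \sum_{k=1}^{M} \frac{1}{k}
  = M H_M
  = \Theta(M \log M)
```

Here H_M is the M-th harmonic number. In the weighted setting the per-type probabilities differ, so this closed form no longer applies directly.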

slide-25
SLIDE 25

Coupon Collector’s Problem (CCP)

Our Setting

  • Bugs do not occur uniformly at random ⇒ Weighted CCP
  • Prevalence of different bugs is unknown ahead of time

Also observed by Arcuri 2010 [1]

slide-26
SLIDE 26

WCCP w/ Unknown is Intractable

No Free Lunch Theorem

(you did pay the registration, right?)


slide-27
SLIDE 27

WCCP w/ Unknown is Intractable

No Free Lunch Theorem

Wolpert and Macready 2005 [22]

  • “Any two optimization algorithms are equivalent when their performance is averaged across all possible problems”

slide-28
SLIDE 28

“Bring Your Own Prior”

Circumvention may be possible!

  • NFL Theorem does not apply if we focus on distributions that are more likely to occur in practice
  • More accurate model ⇒ More accurate predictions ⇒ More bugs

slide-29
SLIDE 29

Rule of Three

Q: Suppose we have flipped a biased H-T coin n times and every time it comes up H. Does Pr[T] have to be small?

A: No, so long as Pr[T] < 1, our observation is always possible

Confidence Intervals: Pr[T] < 3/n in 95% of all “parallel universes”

Usage:

  • 1. Suppose (p_i, s_i) has yielded n different outcomes so far
  • 2. Collectively call all n outcome types H
  • 3. With 95% confidence, Pr[T (i.e., new outcome)] < 3/n

See discussion in Jovanovic 1997 [15]
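The 3/n bound can be recovered with a short calculation, sketched here with p standing for Pr[T] (a symbol introduced only for this note). Seeing n heads in a row has probability (1 − p)^n; the 95% bound keeps every p for which that probability is at least 0.05:

```latex
(1-p)^n \ge 0.05
\;\Longrightarrow\;
p \le 1 - 0.05^{1/n}
\approx \frac{-\ln 0.05}{n}
\approx \frac{2.996}{n}
\approx \frac{3}{n}
```

The middle approximation uses ln(1 − p) ≈ −p, which is accurate when p is small, i.e., when n is reasonably large.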

slide-30
SLIDE 30

Algorithm Design Space

We explore 3 dimensions in algorithm design and present:

  • 2 Epoch Types
    – fixed-run
    – fixed-time
  • 5 MAB Algorithms
    – Round-Robin
    – Uniform-Random
    – EXP3.S.1 from Auer et al. 2002 [2]
    – Weighted-Random (w.r.t. belief metrics)
    – ε-Greedy (w.r.t. belief metrics)
  • 5 Belief Metrics

2 × (3 + 2 × 5) = 26 Scheduling Algorithms
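The count of 26 can be checked by enumerating the design space. The sketch below assumes, as the formula suggests, that only Weighted-Random and ε-Greedy are paired with a belief metric; the metric names are the five belief metrics used in the talk.

```python
from itertools import product

epoch_types = ["fixed-run", "fixed-time"]
belief_free = ["Round-Robin", "Uniform-Random", "EXP3.S.1"]
belief_based = ["Weighted-Random", "ε-Greedy"]
beliefs = ["RPM", "EWT", "RGR", "Density", "Rate"]

# Belief-free algorithms: one scheduler per epoch type (2 * 3 = 6).
schedulers = [(e, a, None) for e, a in product(epoch_types, belief_free)]
# Belief-based algorithms: one per epoch type and belief metric (2 * 2 * 5 = 20).
schedulers += list(product(epoch_types, belief_based, beliefs))

print(len(schedulers))  # 26
```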

slide-31
SLIDE 31

Belief Metrics

The belief over (p_i, s_i) is a heuristic to estimate the likelihood of yielding a new outcome in the next fuzz run of this pair

  • Weighted-Random & ε-Greedy both bias towards pairs with higher belief

The five metrics:

  • 1. RPM = 3/#runs (No Prior)
  • 2. EWT = 3/time spent (No Prior)
  • 3. RGR = #bugs
  • 4. DENSITY = #bugs/#runs (With “Bug Prior”)
  • 5. RATE = #bugs/time spent (With “Bug Prior”)
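A minimal Python sketch of the five belief metrics and of the two belief-driven selection rules follows. It is an illustration under assumptions rather than the authors' implementation: the `Stats` record, the guards against division by zero, the uniform fallback when all beliefs are zero, and the default ε = 0.1 are all hypothetical choices.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stats:
    runs: int        # fuzz runs spent on this (program, seed) pair so far
    seconds: float   # time spent on this pair so far
    bugs: int        # unique bugs found by this pair so far


# Belief metrics; the max(...) guards are an assumption for pairs never fuzzed.
def rpm(s: Stats) -> float: return 3 / max(s.runs, 1)
def ewt(s: Stats) -> float: return 3 / max(s.seconds, 1e-9)
def rgr(s: Stats) -> float: return float(s.bugs)
def density(s: Stats) -> float: return s.bugs / max(s.runs, 1)
def rate(s: Stats) -> float: return s.bugs / max(s.seconds, 1e-9)


def weighted_random(stats: List[Stats], belief: Callable[[Stats], float],
                    rng: random.Random) -> int:
    """Pick a pair index with probability proportional to its belief."""
    weights = [belief(s) for s in stats]
    if sum(weights) == 0:                      # assumption: fall back to uniform
        return rng.randrange(len(stats))
    return rng.choices(range(len(stats)), weights=weights, k=1)[0]


def epsilon_greedy(stats: List[Stats], belief: Callable[[Stats], float],
                   rng: random.Random, epsilon: float = 0.1) -> int:
    """Explore uniformly with probability epsilon, else pick the highest belief."""
    if rng.random() < epsilon:
        return rng.randrange(len(stats))
    return max(range(len(stats)), key=lambda i: belief(stats[i]))
```

For example, `weighted_random(stats, rate, random.Random(0))` corresponds to the Weighted-Random algorithm with the Rate belief.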

slide-32
SLIDE 32

The Evaluation Challenge

  • 1. Find large & representative data sets
    If an algorithm performs well on such data sets, then we gain confidence that it is superior for current practice
  • 2. How good is an algorithm, really?
    Is an algorithm that finds 200 bugs in 10 days good or bad?
    ⇒ Need to know max #bugs that can be found in 10 days, but this is circular! We are trying to solve this problem!
  • 3. How to try many algorithms affordably?
    Yes, we tried way more than 26 combinations… :)

slide-33
SLIDE 33

How To Pull This Off

http://s3.amazonaws.com/rapgenius/filepicker%2FgkTHRLQsyzS3MggKloYA_money.jpg

slide-34
SLIDE 34

How To Pull This Off

Step 1. Select two representative datasets:

  • Intra-Program: 100 randomly-sampled seeds for FFMPEG
  • Inter-Program: 100 file converters in Debian w/ valid seeds

Step 2. Fuzz each of the 200 pairs on EC2 for 10 days

48,000 CPU hours (~5.5 CPU years) later:

Step 3. Build the FUZZSIM replay system to simulate any scheduling algorithm with no additional fuzzings
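The CPU-time figure is consistent with a simple count, assuming each of the 200 pairs occupies one CPU for the full 10 days (an assumption made for this check):

```latex
200 \times 10 \times 24 = 48{,}000 \text{ CPU hours},
\qquad
\frac{48{,}000}{24 \times 365} \approx 5.5 \text{ CPU years}
```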

slide-35
SLIDE 35

FUZZSIM Overview

  • Example log entry:
    (p=FFMPEG, s=a.avi, timestamp=100, run=42, PRNG=17)
  • Can simulate any schedule using log files
    – Including Offline Optimal (≈ dynamic prog. for BOUNDED KNAPSACK)
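The replay idea can be illustrated with a small Python sketch. This is not FUZZSIM itself: the simplified log format (one list of outcome IDs per pair, in run order, with 0 meaning termination), the function `replay_fixed_run`, the scheduler callback signature, and the seed name `b.avi` are all hypothetical. It covers only fixed-run epochs and relies on the assumption that bugs from different pairs are disjoint.

```python
from typing import Callable, Dict, List, Set, Tuple

Config = Tuple[str, str]        # (program, seed), e.g. ("FFMPEG", "a.avi")
Log = Dict[Config, List[int]]   # hypothetical: outcome IDs in run order, 0 = termination
Picker = Callable[[Dict[Config, Set[int]], Dict[Config, int]], Config]


def replay_fixed_run(log: Log, pick: Picker, epochs: int, runs_per_epoch: int) -> int:
    """Replay a fixed-run campaign from recorded outcomes; return #unique bugs."""
    cursor = {c: 0 for c in log}                        # next unread run per pair
    found: Dict[Config, Set[int]] = {c: set() for c in log}
    for _ in range(epochs):
        c = pick(found, cursor)                         # scheduler sees only past data
        chunk = log[c][cursor[c]:cursor[c] + runs_per_epoch]
        cursor[c] += len(chunk)
        found[c].update(b for b in chunk if b != 0)     # skip terminations
    # Bugs from different pairs are assumed disjoint, so per-pair counts add up.
    return sum(len(bugs) for bugs in found.values())


log = {("FFMPEG", "a.avi"): [0, 1, 0, 1, 2],
       ("FFMPEG", "b.avi"): [0, 0, 0, 0, 0]}
always_a = lambda found, cursor: ("FFMPEG", "a.avi")
least_fuzzed = lambda found, cursor: min(cursor, key=cursor.get)

print(replay_fixed_run(log, always_a, epochs=3, runs_per_epoch=2))      # 2
print(replay_fixed_run(log, least_fuzzed, epochs=2, runs_per_epoch=2))  # 1
```

Because each pair's outcomes are read back in their recorded order, any scheduler can be evaluated against the same logs without fuzzing again.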

slide-37
SLIDE 37

Recommendation 1: Use Weighted Random w/ Rate

slide-38
SLIDE 38

Recommendation 2: Use Fixed-Time Campaigns

slide-39
SLIDE 39

Comparison with CERT BFF v2.6

CERT BFF is the state-of-the-art fuzzing framework

    – Supports fuzzing one program w/ multiple seeds
    – Varies mutation ratio online
    – Fixed-run epochs
    – Weighted-Random MAB algorithm
    – Uses Density (#bugs/#runs) as belief

Fixed-time Weighted-Random Rate finds on average 1.5x more bugs in our datasets (at a fixed mutation ratio)

slide-40
SLIDE 40

[Plots: #bugs vs. days (1 to 10) for RPM, Density, RR, EWT, RGR, Rate, and Offline. Panels: “Intra: FFMPEG Dataset” and “Inter: File Converters Dataset”.]

slide-41
SLIDE 41

Future Work

Vary mutation ratio

  • m mutation ratios ⇒ m-fold cost increase

Online bug triage

  • triage time is currently being discounted

Other program testing techniques

  • black-box generational (grammar-based) fuzzing?
  • concolic execution?


slide-47
SLIDE 47

Summary

  • Start
  • MAB
  • Not Enough!
  • WCCP
  • NFL
  • Rule of Three
  • Algorithm Design
  • Open Science

slide-48
SLIDE 48

http://security.ece.cmu.edu/fuzzsim/