slide-1
SLIDE 1

Behavior-Based Detection

The old way – match syntactic signatures: one-to-one, < 50% detection

The new way – examine underlying behavior: one-to-many

slide-2
SLIDE 2

Specifying Behaviors

NtOpenKey “…\CurrentVersion\Run” NtDeleteValueKey “McAfee Firewall”

slide-3
SLIDE 3

Specifying Behaviors


Behavior-graph representation

– Nodes represent events & arguments

  • System calls, library calls, high-level events

– Edges represent data dependencies

  • Data substring equality, resource generation/use

– Argument values are crucial!
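The representation above can be sketched as a tiny data structure. This is an illustrative sketch only: the class, node IDs, and dependency label are stand-ins, not the paper's actual encoding.

```python
# Minimal sketch of a behavior graph (illustrative, not the real format):
# nodes are events with their arguments, edges are data dependencies.

class BehaviorGraph:
    def __init__(self):
        self.nodes = {}   # node id -> (event name, argument dict)
        self.edges = []   # (src id, dst id, dependency label)

    def add_event(self, nid, name, args):
        self.nodes[nid] = (name, args)

    def add_dependency(self, src, dst, label):
        self.edges.append((src, dst, label))

g = BehaviorGraph()
g.add_event(0, "NtOpenKey", {"key": r"...\CurrentVersion\Run"})
g.add_event(1, "NtDeleteValueKey", {"value": "McAfee Firewall"})
# The handle produced by NtOpenKey is consumed by NtDeleteValueKey:
g.add_dependency(0, 1, "resource generation/use")
```

Argument values (the key path, the value name) live on the nodes, which is what makes such graphs discriminative.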

slide-4
SLIDE 4

Finding the Needle in the Haystack

NtOpenKey “…\CurrentVersion\Run” NtDeleteValueKey “McAfee Firewall” NtOpenKey “…\InternetSettings\...” NtSetValueKey “ProxyBypass”


slide-5
SLIDE 5

Large, Complex Problem

  • Behavior graphs are large

– Between tens of thousands and millions of nodes

  • New malware is ever-present

– Lower bound of 7,933 samples/day in 2009

  • Large, diverse benign application pool

– Windows 7 is backwards compatible to NT/95

  • Manual analysis, brute force not feasible
slide-6
SLIDE 6


Mining Approach (Somesh Jha et al.)

  • New specification-synthesis algorithm

– Performs efficient, large-scale data mining first to uncover suspicious behaviors – Probabilistically refines and optimizes specifications

  • Key algorithms scale to real problem size

– Reduces the window of vulnerability

  • Tunable true positive/false positive rate

– 86% TP for low FP, 100% TP for higher FP

slide-7
SLIDE 7


Holmes: Our Approach to Specification Synthesis

  • Roadmap:

– Workflow

1. Mine significant behaviors 2. Synthesize specification

– Results – Conclusion

slide-8
SLIDE 8

Significant Behaviors

  • Significant behaviors discriminate between

labeled malicious and benign sets

  • Measured statistically via frequency

counting of subgraphs

– Can use information gain, cross entropy, G-test, …

NtOpenKey “…\CurrentVersion\Run” NtDeleteValueKey “McAfee Firewall”
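As a concrete instance of such a frequency-based measure, here is a small information-gain calculation over subgraph hit counts. The counts are invented for illustration; this is a sketch of the idea, not the paper's implementation.

```python
import math

def entropy(pos, neg):
    """Shannon entropy of a two-class (malicious/benign) label distribution."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

def information_gain(n_mal, n_ben, hits_mal, hits_ben):
    """Gain from splitting the corpus on 'graph contains this subgraph'."""
    total = n_mal + n_ben
    h_before = entropy(n_mal, n_ben)
    n_with = hits_mal + hits_ben
    n_without = total - n_with
    h_with = entropy(hits_mal, hits_ben)
    h_without = entropy(n_mal - hits_mal, n_ben - hits_ben)
    return (h_before
            - (n_with / total) * h_with
            - (n_without / total) * h_without)

# A behavior seen in 90 of 100 malware graphs but only 2 of 100 benign
# graphs is highly significant:
print(information_gain(100, 100, 90, 2))
```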

slide-9
SLIDE 9

Key Requirement

  • Significant behavior appears in many

malware graphs, few benign graphs

slide-10
SLIDE 10

Leap Mining: Extracting Significant Behaviors

  • Want to find subgraph that optimizes

significance measure

  • Problem: Number of candidate subgraphs

is factorial in # Nodes + # Edges

Structural leap mining exploits the correlation between structural similarity and similarity in objective score for functions like information gain to quickly prune the candidate search space.

slide-11
SLIDE 11

Leap Mining (Cont'd)

  • Insight: correlation between structural similarity and significance-score similarity can guide the search [Yan et al., SIGMOD ‘08]

– “Leap” over branches in search tree with similar structure

  • Future: Probabilistically compress source

graphs to mine behaviors more efficiently [Chen et al, VLDB ‘09]


slide-12
SLIDE 12

Leap Mining: Example

[Figure: leap-mining search tree with candidate significance scores 0.1, 0.2, 0.8, 0.1, 0.2. A candidate whose significance score is similar to its parent's lets us prune its siblings; the 0.8 candidate is the most significant pattern.]

slide-13
SLIDE 13


Candidate subgraphs are enumerated from small to large in a search tree, and each one's score is checked against the best subgraph discovered so far. Whenever the upper bound of information gain in a search branch is smaller than the score of the best subgraph discovered, the whole branch can be safely discarded. Siblings of previously enumerated candidates are “leaped over,” since they are likely to share similar information gain. This horizontal pruning is effective at pruning subgraphs that represent similar behaviors.
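The bound-based pruning described above can be illustrated with a toy branch-and-bound search. Subsets of scored items stand in for subgraphs, and the scoring and bounding functions are invented; this is not gSpan/LEAP itself, and it omits the sibling "leap".

```python
# Toy illustration of bound-based pruning: enumerate subsets of items
# (standing in for subgraphs) from small to large, keep the best score,
# and discard a branch when its upper bound cannot beat the best so far.

def best_pattern(items, score, upper_bound):
    best = (float("-inf"), frozenset())

    def search(chosen, remaining):
        nonlocal best
        s = score(chosen)
        if s > best[0]:
            best = (s, frozenset(chosen))
        for i, item in enumerate(remaining):
            branch = chosen | {item}
            # Prune: even the most optimistic extension cannot win.
            if upper_bound(branch) <= best[0]:
                continue
            search(branch, remaining[i + 1:])

    search(frozenset(), tuple(items))
    return best

# Score candidates by (made-up) per-item gains; the bound adds every
# positive gain still available, so it never underestimates.
gains = {"a": 0.5, "b": 0.3, "c": -0.4}
score = lambda s: sum(gains[i] for i in s)
bound = lambda s: score(s) + sum(g for i, g in gains.items()
                                 if g > 0 and i not in s)
best_score, pattern = best_pattern(gains, score, bound)
print(best_score, sorted(pattern))
```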

slide-14
SLIDE 14

Naïve Synthesis: Just Significant Behaviors

  • Use all significant behaviors exhibited by

a specific sample

  • Pros:

– Not path-dependent – Significance metric likely to select behaviors that give low false positives

  • Cons:

– Some significant behaviors may be variant-specific → false negatives! – Some samples may not exhibit many mined suspicious behaviors → false positives!

slide-15
SLIDE 15

Searching for the Optimal Specification

  • Insight: significant behaviors are

suspicious behaviors

  • A good spec. is the right combination of

suspicious behaviors

  • Given a malware set, search using

concept analysis

– Concept is a pair: ({malware samples}, {suspicious behaviors}) – Find set of concepts with optimal true/false positive characteristics

slide-16
SLIDE 16

Simulated Annealing

  • Concept space is enormous: factorial in number of suspicious behaviors
  • Simulated annealing: probabilistic search over localized portions of

solution space

– Derive new solutions greedily most of the time – With a certain probability, move to sub-optimal solutions in the search → avoids local minima – Known sampling methods and cooling schedules guarantee optimal convergence
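A generic sketch of the procedure just described, applied to a toy version of the specification-search problem. The per-behavior detection and false-positive counts below are invented for illustration; this is not Holmes's actual objective.

```python
import math, random

def simulated_annealing(score, neighbor, start,
                        temp=1.0, cooling=0.995, steps=2000, rng=None):
    """Mostly greedy search; accepts a worse solution with probability
    exp(delta / temperature), so it can escape local optima."""
    rng = rng or random.Random(0)
    cur, cur_s = start, score(start)
    best, best_s = cur, cur_s
    for _ in range(steps):
        cand = neighbor(cur, rng)
        cand_s = score(cand)
        delta = cand_s - cur_s
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            cur, cur_s = cand, cand_s
        if cur_s > best_s:
            best, best_s = cur, cur_s
        temp *= cooling   # cooling schedule
    return best, best_s

# Toy stand-in for "combination of suspicious behaviors": choose a subset
# maximizing detections minus a penalty on false positives (counts made up).
tp = {0: 40, 1: 35, 2: 5, 3: 25}   # hypothetical per-behavior detections
fp = {0: 1, 1: 0, 2: 30, 3: 2}     # hypothetical per-behavior false positives
score = lambda s: sum(tp[b] for b in s) - 3 * sum(fp[b] for b in s)

def neighbor(s, rng):
    b = rng.choice(list(tp))
    return s ^ {b}                 # flip one behavior in or out

best, best_s = simulated_annealing(score, neighbor, frozenset())
print(sorted(best), best_s)
```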

slide-17
SLIDE 17

Simulated Annealing: Example

[Figure: candidate specifications plotted by detection rate vs. false positives (steps 6, 7, 8, 1, 11, 5); at one step the search probabilistically takes a sub-optimal solution.]

slide-18
SLIDE 18

Workflow

Known Malware + Benign Apps → Behavior Mining → Significant Behaviors

Significant Behaviors + Recent Malware + Benign Apps → Specification Synthesis → Discriminative Specification

slide-19
SLIDE 19

Evaluation Workflow

Known Malware (492 samples) + Benign Apps (11 apps) → Behavior Mining → 166 significant behaviors

166 behaviors + Recent Malware (378 samples) + Benign Apps (28 apps) → Specification Synthesis → 1 discriminative specification (with 10-fold cross-validation)

Behavior-Based Malware Detection: the specification is evaluated on New Malware (42 samples) and Benign Apps (28 apps) → detection results

slide-20
SLIDE 20


Corpus Details

  • 912 malware samples

– 18 AV-labeled families

  • Spyware, worms, bots, filesystem viruses, …

– 492 samples in 6 families for mining – 420 samples in 12 families for synthesis & evaluation

  • 49 benign applications

– Behaviorally-diverse set: browsers, system administration, media…

slide-21
SLIDE 21

Corpus Details (Cont'd)

  • Trace collection accounts for a single

path

– 120 seconds for malware – Typical usage patterns for benign applications

slide-22
SLIDE 22


Behavior Mining Results

  • Mined 109 unique behaviors

– 18.1 per family, on average – 77 manually deemed malicious

  • Non-malicious behaviors due to sample size
  • Most behaviors correspond to those in AV

databases

– Mined some unreported by AV, e.g. code injection & browser reconfiguration in worms and viruses – Some behaviors missing, likely due to single-path collection

slide-23
SLIDE 23


Specification Synthesis Results

  • 0 FP on test corpus for 86.5% detection rate
  • TP/FP tradeoff configurable
  • Better than commercial AV on our corpus: Sana (42.61%), Threatfire (61.70%)


slide-25
SLIDE 25

Performance and Scalability

  • Behavior mining runtime varies between

families

– Worst-case exponential; can tweak the accuracy tradeoff – Similarity between malicious/benign graphs affects runtime – Can easily parallelize for linear speedup

  • Specification synthesis works quickly

– Most specifications found in under one minute (near-optimal solutions) – The optimal solution can be found in exponential time using the same algorithm

slide-26
SLIDE 26


Summary

NtOpenKey “…\CurrentVersion\Run” NtDeleteValueKey “McAfee Firewall”

  • Synthesizing specifications is hard!
  • Holmes utilizes large-scale data mining to

extract suspicious behaviors

  • Holmes probabilistically searches for

near-optimal specifications using suspicious behaviors

  • Detection results beat industry results
  • Algorithms scale to real problem size
slide-27
SLIDE 27

Motivation

  • Thousands of new malware samples appear each day
  • Automatic analysis systems allow us to create thousands of

analysis reports

  • Now a way to group the reports is needed. We would like to

cluster them into sets of malware reports that exhibit similar behavior.

– we require automated clustering techniques

  • Clustering allows us to:

– discard reports of samples that have been seen before – guide an analyst in the selection of those samples that require most attention – derive generalized signatures, implement removal procedures that work for a whole class of samples

slide-28
SLIDE 28

Scalable, Behavior-Based Malware Clustering

  • Malware Clustering:

Find a partitioning of a given set of malware samples into subsets so that subsets share some common traits (i.e., find “virus families”)

  • Behavior-Based: A

malware sample is represented by its actions performed at run-time

  • Scalable: It has to work for

large sets of malware samples

slide-29
SLIDE 29

System Overview

Input → Dynamic Analysis of the Sample → execution trace augmented with taint information and network analysis results → Extraction of the Behavioral Profile → behavioral profile → Clustering → Result

slide-30
SLIDE 30

Dynamic Analysis

  • Full-system emulator

– Generates an execution trace listing all invoked system calls

  • extended with:

– system call dependencies (tainting) – control flow dependencies – network analysis (for accurately describing a sample's network behavior)

  • Output of this step: Execution trace

augmented with taint information and network analysis results

slide-31
SLIDE 31

Extraction Of The Behavioral Profile

  • In this step, we process the execution trace

provided by the 'dynamic analysis' step

  • Goal: abstract from the system call trace

– system calls can vary significantly, even between programs that exhibit the same behavior – remove execution-specific artifacts from the trace

  • A behavioral profile is an abstraction of the

program's execution trace that accurately captures the behavior of the binary

slide-32
SLIDE 32

Reasons For An Abstract Behavioral Description

  • Different ways to read from a file:
  • Different system calls with similar

semantics

– e.g., NtCreateProcess, NtCreateProcessEx

  • You can easily interleave the trace with

unrelated calls:

A: f = fopen(“C:\\test”); read(f, 1); read(f, 1); read(f, 1);
B: f = fopen(“C:\\test”); read(f, 3);
C: f = fopen(“C:\\test”); read(f, 1); readRegValue(..); read(f, 1);


slide-37
SLIDE 37

Elements Of A Behavioral Profile

  • OS Objects: represent a resource such as a file that can be

manipulated via system calls

– has a name and a type

  • OS Operations: generalization of a system call

– carried out on an OS object – the order of operations is irrelevant – the number of operations on a certain resource does not matter

  • Object Dependencies: model dependencies between OS objects

(e.g., a copy operation from a source file to a target file)

– also reflect the true order of operations

  • Control Flow Dependencies: reflect how tainted data is used by

the program (comparisons with tainted data)

slide-38
SLIDE 38

Scalable Clustering

  • Most clustering algorithms require computing the distances between all pairs of points ⇒ O(n²)

  • Use LSH (locality-sensitive hashing), a technique introduced by Indyk and Motwani, to compute an approximate clustering that requires fewer than n² distance computations

  • Clustering algorithm takes as input a set of malware samples

where each malware sample is represented as a set of features

– we have to transform each behavioral profile into a feature set first

  • Our similarity measure: the Jaccard index, defined as

J(a, b) = |a ∩ b| / |a ∪ b|
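A direct sketch of this measure over two hypothetical feature sets (the feature tuples are made up for illustration):

```python
def jaccard(a, b):
    """Jaccard index J(a, b) = |a ∩ b| / |a ∪ b| over feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical behavioral-profile features as (object type, name, op) tuples:
profile1 = {("file", "C:\\x", "write"), ("reg", "Run", "set")}
profile2 = {("file", "C:\\x", "write"), ("net", "80", "connect")}
print(jaccard(profile1, profile2))  # 1 shared feature of 3 total
```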

slide-39
SLIDE 39

LSH Clustering

  • We are performing an approximate,

single-linkage hierarchical clustering:

  • Step 1: Locality Sensitive Hashing

– to cluster a set of samples we have to choose a similarity threshold t – the result is an approximation of the true set of all near pairs (as defined by the parameter t)

  • Step 2: Single-Linkage hierarchical

clustering
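The two-step idea can be sketched with MinHash-based LSH, one common instantiation for Jaccard similarity. The band size, seeds, and feature sets below are invented, and real LSH for Jaccard uses many bands of minhashes, not one.

```python
def minhash_signature(features, seeds):
    # One MinHash coordinate per seed; two sets agree on a coordinate
    # with probability equal to their Jaccard similarity.
    return tuple(min(hash((s, f)) for f in features) for s in seeds)

def lsh_buckets(profiles, seeds):
    """Group samples whose signatures agree on a band of coordinates;
    only samples sharing a bucket need an exact distance computation."""
    buckets = {}
    for name, feats in profiles.items():
        key = minhash_signature(feats, seeds)[:2]   # one band of 2 rows
        buckets.setdefault(key, []).append(name)
    return buckets

# Hypothetical behavioral profiles as feature sets:
profiles = {
    "sample_a": {"f1", "f2", "f3"},
    "sample_b": {"f1", "f2", "f3"},   # identical behavior -> same bucket
    "sample_c": {"f9"},               # unrelated behavior
}
seeds = [0, 1, 2, 3]
buckets = lsh_buckets(profiles, seeds)
```

Single-linkage hierarchical clustering (step 2) then runs only within the candidate pairs that LSH surfaces.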

slide-40
SLIDE 40

Evaluating Clustering Quality (Jha)

  • For assessing the quality of the clustering algorithm, we

compare our clustering results with a reference clustering of the same sample set

– since no reference clustering for malware exists, we had to create it first

  • Reference Clustering:
  • 1. we obtained a random sampling of 14,212 malware samples that

were submitted to Anubis from Oct. 27th 2007 to Jan. 31st 2008

  • 2. we scanned each sample with 6 different virus scanners
  • 3. we selected only those samples for which the majority of the

anti-virus programs reported the same malware family. This resulted in a total of 2,658 samples.

  • 4. we manually corrected classification problems
slide-41
SLIDE 41

Quantitative Evaluation (Jha)

  • We ran our clustering algorithm with a

similarity threshold t = 0.7 on the reference set of 2,658 samples.

  • Our system produced 87 clusters while

the reference clustering consists of 84 clusters.

  • Precision: 0.984

– precision measures how well a clustering algorithm distinguishes between samples that are different

  • Recall: 0.930

– recall measures how well a clustering algorithm recognizes similar samples

slide-42
SLIDE 42

Comparative Evaluation (Jha)

Behavioral Description | Similarity Measure | Clustering | Optimal Threshold | Quality
Our Profile | Jaccard Index | LSH | 0.60 | 0.959
Our Profile | Jaccard Index | Exact | 0.61 | 0.959
Syscalls | Jaccard Index | Exact | 0.19 | 0.656
Bailey-Profile | Jaccard Index | Exact | 0.63 | 0.801
Bailey-Profile | NCD | Exact | 0.75 | 0.916

slide-43
SLIDE 43

Performance Evaluation (Jha)

  • Input: 75,692 malware samples
  • Previous work by Bailey et al. (extrapolated from their results on 500 samples): 2,864,639,432 distance calculations; 1.25 ms per distance calculation; runtime 995 hours (~6 weeks)

  • Our results: 66,528,049 distance calculations; runtime 2h 18min

slide-44
SLIDE 44

Malware Clustering

  • Ulrich Bayer, Paolo Milani Comparetti,

Clemens Hlauschek, Christopher Krügel, Engin Kirda

– Scalable, Behavior-Based Malware Clustering, NDSS 2009

  • ANUBIS by Somesh Jha et al
slide-45
SLIDE 45

Taint Analysis


slide-46
SLIDE 46

Paper

  • Edward J. Schwartz, Thanassis Avgerinos,

and David Brumley

– All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask). – IEEE Symposium on Security and Privacy 2010


slide-47
SLIDE 47

Two Essential Runtime Analyses


Dynamic Taint Analysis: What values are derived from user input?
– Detect exploits [Costa2005, Crandall2005, Newsome2005, Suh2004]
– Detect packing in malware [Bayer2009, Yin2007]

Forward Symbolic Execution: What input will make execution reach this line of code?
– Input filter generation [Costa2007, Brumley2008]
– Automated test case generation [Cadar2008, Godefroid2005, Sen2005]

slide-48
SLIDE 48

Taint Introduction

x = get_input( ); y = x + 42; …; goto y

Input is tainted. After get_input: Δ (Var → Val) maps x → 7; τ (Var → Tainted?) maps x → T.

Rule (Input): t = IsUntrusted(src) ⟹ get_input(src) ↓ t

slide-49
SLIDE 49

Taint Propagation

x = get_input( ); y = x + 42; …; goto y

Data derived from user input is tainted. After y = x + 42: Δ maps x → 7, y → 49; τ maps x → T, y → T.

Rule (BinOp): t1 = τ[x1], t2 = τ[x2] ⟹ x1 + x2 ↓ t1 ∨ t2

slide-50
SLIDE 50

Taint Checking: Policy Violation Detected

x = get_input( ); y = x + 42; …; goto y

Δ maps x → 7, y → 49; τ maps x → T, y → T.

Policy: Pgoto(ta) = ¬ta (must be true to execute); goto y with τ[y] = T violates it.
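The three rules above (Input, BinOp, and the Pgoto check) can be condensed into a toy taint tracker. The helper names and instruction shapes here are simplifications for the running example, not the paper's operational semantics.

```python
# Toy taint tracker for the running example:
#   x = get_input(); y = x + 42; goto y
# Values live in delta (Var -> Val), taint status in tau (Var -> bool).

delta, tau = {}, {}

def get_input(var, value):
    delta[var] = value
    tau[var] = True          # Input rule: untrusted source introduces taint

def assign_add(dst, src, const):
    delta[dst] = delta[src] + const
    tau[dst] = tau[src]      # BinOp rule: t1 v t2 (the constant is untainted)

def check_goto(var):
    # Policy P_goto(ta) = not ta: refuse to jump to a tainted target.
    if tau[var]:
        raise RuntimeError("policy violation: tainted jump target")

get_input("x", 7)
assign_add("y", "x", 42)
try:
    check_goto("y")
except RuntimeError as e:
    print(e)                 # y = 49 is derived from input -> violation
```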

slide-51
SLIDE 51

x = get_input( ); y = …; …; goto y

… strcpy(buffer, argv[1]); … return; (jumping to an overwritten return address)

Real Use: Exploit Detection

slide-52
SLIDE 52

Memory Load

Variables: Δ (Var → Val) maps x → 7; τ (Var → Tainted?) maps x → T.
Memory: μ (Addr → Val) maps 7 → 42; τμ (Addr → Tainted?) maps 7 → F.

slide-53
SLIDE 53

Problem: Memory Addresses

x = get_input( ); y = load( x ); …; goto y

All values derived from user input are tainted??

Δ maps x → 7; μ (Addr → Val) maps 7 → 42; τμ maps addr 7 → untainted (F).

slide-54
SLIDE 54

x = get_input( ); y = load( x ); …; goto y

Policy 1: taint depends only on the memory cell.

Taint propagation rule (Load): v = Δ[x], t = τμ[v] ⟹ load(x) ↓ t

Here Δ maps x → 7 and μ maps addr 7 → 42 (untainted), so the jump target could be any untainted memory cell value.

Undertainting: failing to identify tainted values (e.g., missing exploits)
slide-55
SLIDE 55

x = get_input( ); y = load( jmp_table + x % 2 ); …; goto y (policy violation?)

Memory: the jmp_table entries point to printa and printb.

Policy 2: the address expression is tainted.

Taint propagation rule (Load): v = Δ[x], t = τμ[v], ta = τ[x] ⟹ load(x) ↓ t ∨ ta; if either the address or the memory cell is tainted, then the value is tainted.

Overtainting: unaffected values are tainted (e.g., exploits on safe inputs)

slide-56
SLIDE 56

Research Challenge: state-of-the-art is not perfect for all programs

Undertainting: the policy may miss taint. Overtainting: the policy may wrongly detect taint.

slide-57
SLIDE 57

The Challenge

bad_abs(x is input): if (x < 0) then return -x; if (x == 0x12345678) then return -x; return x

2³² possible inputs. Forward Symbolic Execution: what input will make execution reach this line of code?

slide-58
SLIDE 58

A Simple Example

bad_abs with x symbolic (x can have any value): the interpreter forks at each branch. Taking if (x < 0) gives path condition x < 0 (return -x); on the false side, if (x == 0x12345678) gives x ≥ 0 ∧ x == 0x12345678 (return -x) and x ≥ 0 ∧ x != 0x12345678 (return x). What input will make execution reach this line of code?
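A minimal sketch of this forking: each two-way branch yields a leaf on its true side (both true branches of bad_abs return immediately) and continues with the negated condition on the false side. Conditions are kept as strings here; a real engine would hand them to an SMT solver.

```python
def symex(branches, path=()):
    """Fork on each two-way branch. The true side is a leaf in this sketch
    because both true branches of bad_abs return immediately."""
    if not branches:
        yield path          # fell through every branch: return x
        return
    cond, rest = branches[0], branches[1:]
    yield path + (cond,)    # true side: return -x
    yield from symex(rest, path + (f"not ({cond})",))

paths = list(symex(["x < 0", "x == 0x12345678"]))
for p in paths:
    print(" and ".join(p))
# The three path conditions:
#   x < 0
#   not (x < 0) and x == 0x12345678
#   not (x < 0) and not (x == 0x12345678)
```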

slide-59
SLIDE 59


One Problem: Exponential Blowup Due to Branches

The number of interpreters/formulas is exponential in the number of branches.

slide-60
SLIDE 60


Path Selection Heuristics

Symbolic Execution Tree

  • Depth-First Search (bounded), Random Search [Cadar2008]

  • Concolic Testing [Sen2005, Godefroid2008]

However, these are heuristics. In the worst case, all create an exponential number of formulas in the tree height.

slide-61
SLIDE 61

Symbolic Execution is not Easy

  • Exponential number of interpreters/formulas (branching)
  • Exponentially-sized formulas (substitution)
  • Solving a formula is NP-complete!

s + s + s + s + s + s + s + s == 42

slide-62
SLIDE 62

Other Important Issues

Formalization: Π = (s + s + s + s + s + s + s + s) == 42

More complex policies

slide-63
SLIDE 63

Summary

  • Dynamic taint analysis and forward

symbolic execution used extensively in literature

– Formal algorithm and what is done for each possible step of execution often not emphasized



slide-65
SLIDE 65

Software Fault Isolation

slide-66
SLIDE 66

Software Fault Isolation [Wahbe et al., 1993]

Goal: confine apps running in the same address space – Codec code should not interfere with the media player – Device drivers should not corrupt the kernel

Simple solution: run apps in separate address spaces – Problem: slow if apps communicate frequently

  • requires a context switch per message
slide-67
SLIDE 67

Software Fault Isolation

SFI approach: – Partition process memory into segments

  • Locate unsafe instructions: jmp, load, store

– At compile time, add guards before unsafe instructions – When loading code, ensure all guards are present

Each app gets its own code segment and data segment (app #1, app #2).

slide-68
SLIDE 68

Segment matching technique

  • Designed for MIPS processor. Many registers available.
  • dr1, dr2: dedicated registers not used by binary

– compiler pretends these registers don’t exist – dr2 contains segment ID

  • Indirect load instruction R12 ← [R34] becomes:

dr1 ← R34
scratch-reg ← (dr1 >> 20) : get segment ID
compare scratch-reg and dr2 : validate seg. ID
trap if not equal
R12 ← [dr1] : do load

Guard ensures code does not load data from another segment

slide-69
SLIDE 69

Address sandboxing technique

  • dr2: holds segment ID
  • Indirect load instruction R12 ← [R34] becomes:

dr1 ← R34 & segment-mask : zero out seg bits
dr1 ← dr1 | dr2 : set valid seg ID
R12 ← [dr1] : do load

  • Fewer instructions than segment matching … but does not catch offending instructions
  • Similar guards placed on all unsafe instructions
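The two-instruction guard can be mimicked in a few lines. The segment layout here (12-bit offsets, segment ID 0x5) is invented for illustration; note, as stated above, that the guard silently redirects an out-of-segment address rather than trapping.

```python
# Sketch of the address-sandboxing guard: force every address into the
# allowed segment by masking out the segment bits and OR-ing in the
# segment ID (hypothetical layout: 12-bit offsets, segment ID 0x5).

SEG_SHIFT = 12
SEGMENT_MASK = (1 << SEG_SHIFT) - 1     # keeps only the offset bits
SEG_ID = 0x5 << SEG_SHIFT               # dr2: the one valid segment ID

def sandboxed_load(memory, addr):
    addr &= SEGMENT_MASK                # dr1 <- addr & segment-mask
    addr |= SEG_ID                      # dr1 <- dr1 | dr2
    return memory[addr]                 # load is now always in-segment

memory = {0x5000 + i: i for i in range(0x1000)}   # the app's data segment
print(hex(sandboxed_load(memory, 0x5042)))        # in-segment: 0x42
print(hex(sandboxed_load(memory, 0x9042)))        # out-of-segment: redirected to 0x5042, also 0x42
```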
slide-70
SLIDE 70

Problem: what if jmp [addr] jumps directly into the indirect load (bypassing its guard)? Solution: the jmp guard must ensure [addr] does not bypass the load guard.

slide-71
SLIDE 71

Cross domain calls

The caller domain reaches draw through a call stub and returns through a ret stub; jumps between domains go through jump tables of br addr entries.

  • Only stubs allowed to make cross-domain jumps
  • Jump table contains allowed exit points

– Addresses are hard coded, read-only segment


slide-72
SLIDE 72

SFI Summary

  • Shared memory: use virtual memory hardware

– map same physical page to two segments in addr space

  • Performance

– Usually good: mpeg_play, 4% slowdown

  • Limitations of SFI: harder to implement on x86:

– variable-length instructions: unclear where to put guards – few registers: can't dedicate three to SFI – many instructions affect memory: more guards needed

slide-73
SLIDE 73

Dawn Song

Finding Bugs/Vulnerabilities

  • Attackers:

– Find vulnerabilities – Weaponize them (exploit the vulnerabilities) – Use exploits to compromise machines & systems – Exploits are worth money

Find Vulnerability → Create Exploit → Compromise → $$$

slide-74
SLIDE 74

Market for 0days

  • Sell for $10K–1M
slide-75
SLIDE 75

Finding Bugs/Vulnerabilities

  • Defenders:

– Find vulnerabilities & eliminate them

  • Improve security of software
  • Easier and cheaper to fix a vulnerability before the software is deployed
  • After deployment: patching is expensive

– Ideally, prove a program is free of vulnerabilities

Bug finding → Internal fix → Patch (lower cost → higher cost)

slide-76
SLIDE 76

Example: Static Device Verifier

  • Verifies that drivers are not making illegal function calls or causing system corruption

– SLAM project at Microsoft – http://research.microsoft.com/en-us/projects/slam

  • “The requirements for the Windows logo program (now Windows Hardware Certification Program) state that a driver must not fail while running under Driver Verifier.”

slide-77
SLIDE 77

Techniques & Approaches

Automatic test case generation spans a spectrum from fuzzing and dynamic symbolic execution to static analysis and program verification: the fuzzing end has lower coverage, higher false negatives, and lower false positives; the verification end has higher coverage, lower false negatives, and higher false positives.

slide-78
SLIDE 78

Fuzzing

slide-79
SLIDE 79

Finding bugs in a PDF viewer

PDF viewer → ?

slide-80
SLIDE 80

Black-Box Fuzz Testing

  • Given a program, simply feed it random inputs and see whether it crashes

  • Advantage: really easy
  • Disadvantage: inefficient

– Input often requires structure; random inputs are likely to be malformed – Inputs that would trigger a crash are a very small fraction, so the probability of getting lucky may be very low

slide-81
SLIDE 81

Fuzzing

  • Automatically generate test cases
  • Many slightly anomalous test cases are input into a target
  • The application is monitored for errors
  • Inputs are generally either file based (.pdf, .png, .wav, .mpg)
  • Or network based…

– http, SNMP, SOAP

Input Generator → Inputs → Application → Monitor

slide-82
SLIDE 82

Regression vs. Fuzzing

Regression: run the program on many normal inputs, look for badness. Goal: prevent normal users from encountering errors (e.g., assertion failures are bad).
Fuzzing: run the program on many abnormal inputs, look for badness. Goal: prevent attackers from encountering exploitable errors (e.g., assertion failures are often OK).

slide-83
SLIDE 83

Enhancement I: Mutation-Based Fuzzing

  • Take a well-formed input, randomly perturb it (flipping a bit, etc.)
  • Little or no knowledge of the structure of the inputs is assumed
  • Anomalies are added to existing valid inputs
  • Anomalies may be completely random or follow some heuristics (e.g., remove NUL, shift a character forward)
  • Examples:

– ZZUF, very successful at finding bugs in many real-world programs, http://sam.zoy.org/zzuf/ – Taof, GPF, ProxyFuzz, FileFuzz, Filep, etc.

Take an input → Perturb → Feed to program → Crash?
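The take-perturb-feed-record loop can be sketched directly. The fragile `parser` below is a hypothetical stand-in for the program under test, and the seed input and flip counts are invented; a raised exception stands in for a real crash.

```python
import random

def mutate(data: bytes, n_flips: int = 8, rng=None) -> bytes:
    """Flip a few random bits in a well-formed input (mutation-based fuzzing)."""
    rng = rng or random.Random()
    buf = bytearray(data)
    for _ in range(n_flips):
        i = rng.randrange(len(buf))
        buf[i] ^= 1 << rng.randrange(8)
    return bytes(buf)

def fuzz(target, seed: bytes, iterations: int = 1000, rng=None):
    """Feed mutated inputs to `target`; record the inputs that crash it."""
    rng = rng or random.Random(1234)   # seeded for reproducibility
    crashes = []
    for _ in range(iterations):
        case = mutate(seed, rng=rng)
        try:
            target(case)
        except Exception:              # "crash" stand-in
            crashes.append(case)
    return crashes

# Hypothetical fragile parser standing in for the program under test:
def parser(data: bytes):
    if data[:4] != b"%PDF":
        raise ValueError("bad magic")

crashes = fuzz(parser, b"%PDF-1.4 hello world")
print(len(crashes), "crashing inputs recorded")
```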

slide-84
SLIDE 84

Example: fuzzing a PDF viewer

  • Google for .pdf (about 1 billion results)
  • Crawl pages to build a corpus
  • Use a fuzzing tool (or script):

1. Grab a file 2. Mutate that file 3. Feed it to the program 4. Record if it crashed (and the input that crashed it)

slide-85
SLIDE 85

Mutation-Based Fuzzing In Short

Pros: super easy to set up and automate; little to no protocol knowledge required.
Cons: limited by the initial corpus; may fail for protocols with checksums, or those which depend on a challenge.

slide-86
SLIDE 86

Enhancement II: Generation-Based Fuzzing

  • Test cases are generated from some description of the format: RFC, documentation, etc.

– Using specified protocol/file format info – E.g., SPIKE by Immunity, http://www.immunitysec.com/resources-freesoftware.shtml

  • Anomalies are added to each possible spot in the inputs
  • Knowledge of the protocol should give better results than random fuzzing

Take a spec (RFC, …) → Generate concrete inputs → Feed to program → Crash?

slide-87
SLIDE 87

Example: Protocol Description

//png.spk
//author: Charlie Miller
// Header - fixed.
s_binary("89504E470D0A1A0A");
// IHDRChunk
s_binary_block_size_word_bigendian("IHDR"); //size of data field
s_block_start("IHDRcrc");
s_string("IHDR"); // type
s_block_start("IHDR");
// The following becomes s_int_variable for variable stuff
// 1=BINARYBIGENDIAN, 3=ONEBYTE
s_push_int(0x1a, 1); // Width
s_push_int(0x14, 1); // Height
s_push_int(0x8, 3); // Bit Depth - should be 1,2,4,8,16, based on colortype
s_push_int(0x3, 3); // ColorType - should be 0,2,3,4,6
s_binary("00 00"); // Compression || Filter - shall be 00 00
s_push_int(0x0, 3); // Interlace - should be 0,1
s_block_end("IHDR");
s_binary_block_crc_word_littleendian("IHDRcrc"); // crc of type and data
s_block_end("IHDRcrc");
...

slide-88
SLIDE 88

Generation-Based Fuzzing In Short

Mutation-based: super easy to set up and automate; little to no protocol knowledge required; limited by the initial corpus; may fail for protocols with checksums, or those which depend on a challenge.
Generation-based: writing a generator can be labor intensive for complex protocols; have to have a spec of the protocol (often can find good tools for existing protocols, e.g., http, SNMP); completeness; can deal with complex dependencies, e.g., checksums.

slide-89
SLIDE 89

Fuzzing Tools & Frameworks

Input generation, input injection, bug detection

slide-90
SLIDE 90

Input Generation

  • Existing generational fuzzers for common protocols (ftp, http, SNMP, etc.)

– Mu Dynamics, Codenomicon, PROTOS, FTPFuzz, WebScarab

  • Fuzzing frameworks: providing a fuzz set with a given spec

– SPIKE, Peach, Sulley

  • Mutation-based fuzzers

– Taof, GPF, ProxyFuzz, PeachShark

  • Special-purpose fuzzers

– ActiveX (AxMan), regular expressions, etc.

slide-91
SLIDE 91

Input Injection

  • Simplest

– Run the program on a fuzzed file – Replay a fuzzed packet trace

  • Modify an existing program/client

– Invoke the fuzzer at the appropriate point

  • Use a fuzzing framework

– e.g., Peach automates generating COM interface fuzzers

slide-92
SLIDE 92

Bug Detection

  • See if the program crashed

– The type of crash can tell a lot (SEGV vs. assert fail)

  • Run the program under a dynamic memory error detector (Valgrind/Purify)

– Catches more bugs, but more expensive per run

  • See if the program locks up
  • Write your own checker: e.g., Valgrind skins
slide-93
SLIDE 93

Workflow Automation

  • Sulley, Peach, Mu-4000

– Provide tools to aid setup, running, recording, etc.

  • Virtual machines: help create reproducible workloads

slide-94
SLIDE 94

How Much Fuzzing Is Enough?

  • Mutation-based fuzzers may generate an infinite number of test cases… When has the fuzzer run long enough?

  • Generation-based fuzzers may generate a finite number of test cases. What happens when they're all run and no bugs are found?

slide-95
SLIDE 95

Code Coverage

  • Some of the answers to these questions lie in code coverage

  • Code coverage is a metric which can be used to determine how much code has been executed

  • Data can be obtained using a variety of profiling tools, e.g., gcov

slide-96
SLIDE 96

Line Coverage

if( a > 2 ) a = 2; if( b > 2 ) b = 2;

Line/block coverage measures how many lines of source code have been executed. For the code above, how many test cases (values of the pair (a, b)) are needed for full (100%) line coverage?

slide-97
SLIDE 97

Branch Coverage

if( a > 2 ) a = 2; if( b > 2 ) b = 2;

Branch coverage measures how many branches in the code have been taken (conditional jmps). For the code above, how many test cases are needed for full branch coverage?

slide-98
SLIDE 98

Path Coverage

if( a > 2 ) a = 2; if( b > 2 ) b = 2;

Path coverage measures how many paths have been taken. For the code above, how many test cases are needed for full path coverage?
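The three coverage questions above can be answered by instrumenting the snippet: full line coverage needs 1 test case, full branch coverage needs 2, and full path coverage needs 4.

```python
# The slides' snippet -- if (a > 2) a = 2; if (b > 2) b = 2; -- instrumented
# to record which lines, branch outcomes, and paths a test case exercises.

def run(a, b):
    lines, branches = set(), set()
    lines.add(1); branches.add(("a>2", a > 2))
    if a > 2:
        lines.add(2); a = 2
    lines.add(3); branches.add(("b>2", b > 2))
    if b > 2:
        lines.add(4); b = 2
    path = tuple(sorted(branches))    # the path is the branch-outcome combo
    return lines, branches, path

def coverage(tests):
    lines, branches, paths = set(), set(), set()
    for a, b in tests:
        l, br, p = run(a, b)
        lines |= l; branches |= br; paths.add(p)
    return len(lines), len(branches), len(paths)

print(coverage([(3, 3)]))                          # (4, 2, 1): all 4 lines, 1 test
print(coverage([(3, 3), (0, 0)]))                  # (4, 4, 2): all 4 branches, 2 tests
print(coverage([(3, 3), (3, 0), (0, 3), (0, 0)]))  # (4, 4, 4): all 4 paths, 4 tests
```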

slide-99
SLIDE 99

Code Coverage

  • Benefits:

– How good is this initial file? – Am I getting stuck somewhere?

if(packet[0x10] < 7) { //hot path } else { //cold path }

– How good is fuzzer X vs. fuzzer Y? – Am I getting benefits from running a different fuzzer?

slide-100
SLIDE 100

Problems of code coverage

  • Does full line coverage guarantee finding the bug?

○ Yes  ○ No

mySafeCpy(char *dst, char* src){ if(dst && src) strcpy(dst, src); }

slide-101
SLIDE 101

Problems of code coverage

  • Does full line coverage guarantee finding the bug?

○ Yes  ○ No

  • Does full branch coverage guarantee finding the bug?

○ Yes  ○ No

mySafeCpy(char *dst, char* src){ if(dst && src) strcpy(dst, src); }

slide-102
SLIDE 102

Fuzzing Rules of Thumb

  • Protocol-specific knowledge is very helpful

– Generational tends to beat random; better specs make better fuzzers

  • More fuzzers is better

– Each implementation will vary; different fuzzers find different bugs

  • The longer you run, the more bugs you may find
  • The best results come from guiding the process

– Notice where you're getting stuck; use profiling!

  • Code coverage can be very useful for guiding the process: AFL
  • Can we do better?