Behavior-Based Detection
The old way – match syntactic signatures:
– One-to-one
– < 50% detection
The new way – examine underlying behavior:
– One-to-many
Specifying Behaviors
NtOpenKey “…\CurrentVersion\Run” NtDeleteValueKey “McAfee Firewall”
NtOpenKey “…\InternetSettings\...” NtSetValueKey “ProxyBypass”
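A behavior such as the ones above can be read as an ordered list of system calls with argument constraints. The following is a minimal sketch under stated assumptions: a trace is taken to be a list of (system call, string argument) pairs, arguments are matched by substring, and the names `Event`, `DISABLE_FIREWALL` and `matches` are hypothetical. It ignores data dependencies between calls (for example, that the deleted value belongs to the key that was opened), which a real specification would likely track.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (system call name, string argument); assumed trace format

# Hypothetical encoding of the first behavior: open the Run key, then delete a value.
DISABLE_FIREWALL: List[Event] = [
    ("NtOpenKey", r"\CurrentVersion\Run"),
    ("NtDeleteValueKey", "McAfee Firewall"),
]

def matches(trace: List[Event], behavior: List[Event]) -> bool:
    """True if the behavior's steps occur in order (not necessarily adjacent)."""
    step = 0
    for call, arg in trace:
        want_call, want_arg = behavior[step]
        if call == want_call and want_arg in arg:
            step += 1
            if step == len(behavior):
                return True
    return False
```

One specification of this kind can match many binaries that differ syntactically, which is the one-to-many property the slides contrast with signature matching.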
Structural leap mining exploits the correlation between structural similarity and similarity in significance
Significance score similar to parent! This means we can prune siblings
Most significant pattern!
solution space
– Derive new solutions greedily most of the time
– With certain probability, move to sub-optimal solutions in the search to avoid local minima
– Known sampling methods, cooling schedules to guarantee optimal convergence
[Figure: Detection Rate vs. False Positives]
Probabilistically take sub-optimal solution!
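The search strategy described above (greedy most of the time, occasionally accepting a worse solution, with a cooling schedule) is the shape of simulated annealing. Below is a generic sketch, not the authors' implementation: the function name `anneal`, the geometric cooling factor, and the `neighbors`/`score` callbacks are all assumptions. In this setting `score` could combine detection rate and false positives, but that choice is not specified here.

```python
import math
import random

def anneal(initial, neighbors, score, steps=1000, t0=1.0, cooling=0.995, seed=0):
    """Maximize score(); accept a worse neighbor with probability exp(delta / T)."""
    rng = random.Random(seed)
    current = best = initial
    temp = t0
    for _ in range(steps):
        candidate = rng.choice(neighbors(current))
        delta = score(candidate) - score(current)
        # Greedy move if it does not hurt; otherwise take it only with some probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
        if score(current) > score(best):
            best = current
        temp *= cooling  # geometric cooling schedule (one of several possible schedules)
    return best
```

As the temperature falls, the probability of taking a sub-optimal step shrinks and the search becomes effectively greedy.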
[Diagram: evaluation pipeline. Known Malware (492 samples) and Benign Apps feed Behavior Mining, which yields Significant Behaviors; Specification Synthesis turns these into a Discriminative Specification; Behavior-Based Malware Detection applies it to Recent Malware, New Malware (42 samples) and Benign Apps to produce Detection Results. Other counts shown on the diagram: 11 apps, 166 behaviors, 378 samples, 28 apps, 1 specification (with 10-fold cross-validation), 28 apps.]
(61.70%)
analysis reports
cluster them into sets of malware reports that exhibit similar behavior.
– we require automated clustering techniques
– discard reports of samples that have been seen before
– guide an analyst in the selection of those samples that require most attention
– derive generalized signatures, implement removal procedures that work for a whole class of samples
Find a partitioning of a given set of malware samples into subsets so that subsets share some common traits (i.e., find “virus families”)
malware sample is represented by its actions performed at run-time
large sets of malware samples
[Diagram: Input → Dynamic Analysis of the Sample → Execution Trace augmented with taint information and network analysis results → Extraction → Behavioral Profile → Clustering → Result]
– system calls can vary significantly, even between programs that exhibit the same behavior
– remove execution-specific artifacts from the trace
f = fopen(“C:\\test”); read(f, 1); read(f, 1); read(f, 1);
f = fopen(“C:\\test”); read(f, 3);
f = fopen(“C:\\test”); read(f, 1); readRegValue(..); read(f, 1);
manipulated via system calls
– has a name and a type
– carried out on an OS object
– the order of operations is irrelevant
– the number of operations on a certain resource does not matter
(e.g., a copy operation from a source file to a target file)
– also reflect the true order of operations
the program (comparisons with tainted data)
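To make the abstraction concrete, here is a small sketch of how the three `fopen`/`read` traces shown earlier could collapse into behavioral profiles. It assumes a simplified event format of (object type, object name, operation) and covers only OS objects and operations; dependencies and control-flow information are left out. The function name `profile` is hypothetical.

```python
from typing import FrozenSet, List, Tuple

Event = Tuple[str, str, str]  # (object type, object name, operation); assumed format

def profile(trace: List[Event]) -> FrozenSet[Event]:
    """Drop ordering and repetition: keep only which operation touched which object."""
    return frozenset(trace)

OPEN = ("file", r"C:\test", "open")
READ = ("file", r"C:\test", "read")
REG = ("registry", "..", "read")  # the key name is elided in the source trace

trace_a = [OPEN, READ, READ, READ]   # three 1-byte reads
trace_b = [OPEN, READ]               # one 3-byte read
trace_c = [OPEN, READ, REG, READ]    # reads interleaved with a registry read
```

Here `profile(trace_a)` and `profile(trace_b)` are equal even though their system call sequences differ, while `profile(trace_c)` is a strict superset because of the extra registry access.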
between all pairs of points => O(n^2)
Indyk and Motwani, to compute an approximate clustering that requires fewer than n^2 distance computations
where each malware sample is represented as a set of features
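The idea of avoiding all n^2 comparisons can be sketched with min-hashing and banding, a common way to apply locality sensitive hashing to the Jaccard index. This is an illustration under assumptions, not the system described in the slides: the hash construction, the band and row counts, and the names `jaccard`, `minhash` and `candidate_pairs` are all chosen for the example, and every feature set is assumed to be non-empty.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two feature sets."""
    return len(a & b) / len(a | b) if a | b else 1.0

def _h(seed, feature):
    data = f"{seed}:{feature}".encode()
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

def minhash(features, num_hashes=20):
    """Signature: for each seeded hash function, the minimum over the set."""
    return tuple(min(_h(i, f) for f in features) for i in range(num_hashes))

def candidate_pairs(samples, bands=10, rows=2):
    """Pairs of sample names whose signatures agree on at least one band."""
    buckets = defaultdict(set)
    for name, feats in samples.items():
        sig = minhash(feats, bands * rows)
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].add(name)
    pairs = set()
    for names in buckets.values():
        pairs.update(combinations(sorted(names), 2))
    return pairs
```

Only the candidate pairs need an exact `jaccard` computation afterwards, so samples with little overlap are rarely compared at all.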
compare our clustering results with a reference clustering of the same sample set
– since no reference clustering for malware exists, we had to create it first
were submitted to Anubis from Oct. 27th 2007 to Jan. 31st 2008
anti-virus programs reported the same malware family. This resulted in a total of 2,658 samples.
– precision measures how well a clustering algorithm distinguishes between samples that are different
– recall measures how well a clustering algorithm recognizes similar samples
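One common way to turn these two notions into numbers is sketched below. It is an assumption that this matches the exact definitions used for the results that follow: for precision, each produced cluster is credited with its largest overlap with any reference cluster; for recall, each reference cluster is credited with its largest overlap with any produced cluster. The function name `precision_recall` is hypothetical.

```python
def precision_recall(clusters, reference):
    """clusters, reference: lists of sets that partition the same samples."""
    n = sum(len(c) for c in clusters)
    precision = sum(max(len(c & t) for t in reference) for c in clusters) / n
    recall = sum(max(len(c & t) for c in clusters) for t in reference) / n
    return precision, recall
```

Putting every sample in one cluster gives perfect recall but poor precision; putting every sample in its own cluster gives the opposite, so a useful clustering has to balance the two.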
Behavioral Description   Similarity Measure   Clustering   Optimal Threshold   Quality
Bailey-profile           NCD                  Exact        0.75                0.916
Bailey-profile           Jaccard Index        Exact        0.63                0.801
Syscalls                 Jaccard Index        Exact        0.19                0.656
Our Profile              Jaccard Index        Exact        0.61                0.959
Our Profile              Jaccard Index        LSH          0.60                0.959
Dynamic Taint Analysis: What values are derived from user input?
– Detect exploits [Costa2005, Crandall2005, Newsome2005, Suh2004]
– Detect packing in malware [Bayer2009, Yin2007]
Forward Symbolic Execution: What input will make execution reach this line of code?
– Input filter generation [Costa2007, Brumley2008]
– Automated test case generation [Cadar2008, Godefroid2005, Sen2005]
Data derived from user input is tainted
(Must be true to execute)
…
strcpy(buffer, argv[1]);
…
return;
Jumping to return address
All values derived from user input are tainted??
Jump target could be any untainted memory cell value
values
jmp_table Policy Violation?
printa printb
Address expression is tainted
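The jump-table question can be illustrated with a toy taint tracker. This is a sketch under assumptions: the class `TaintTracker`, its method names, and the `taint_addresses` switch are invented for the example, and the policy is simply "jumping to a tainted target is a violation".

```python
class TaintTracker:
    """Tracks a value and a taint bit per variable."""

    def __init__(self, taint_addresses=False):
        self.val = {}
        self.taint = {}
        self.taint_addresses = taint_addresses  # policy switch (assumption of this sketch)

    def get_input(self, dst, value):
        self.val[dst], self.taint[dst] = value, True   # user input is tainted

    def const(self, dst, value):
        self.val[dst], self.taint[dst] = value, False  # constants are untainted

    def add(self, dst, a, b):
        self.val[dst] = self.val[a] + self.val[b]
        self.taint[dst] = self.taint[a] or self.taint[b]

    def load(self, dst, table, index):
        """dst = table[index]; the table itself holds untainted constants."""
        self.val[dst] = table[self.val[index]]
        self.taint[dst] = self.taint_addresses and self.taint[index]

    def jump_ok(self, target):
        """Policy: a jump to a tainted target is a violation."""
        return not self.taint[target]
```

With `taint_addresses=False` a user-chosen index into `["printa", "printb"]` yields an untainted jump target, so the input steers control flow unnoticed; with `taint_addresses=True` the same legitimate table lookup is reported as a violation. Neither setting is right for every program.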
tainted
inputs
0x12345678
Branch 1, Branch 2, Branch 3
Exponential number of interpreters/formulas in # of branches
[Cadar2008]
However, these are heuristics. In the worst case all create an exponential number of formulas in the tree height.
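The exponential growth is easy to see by enumerating path predicates directly. The sketch below assumes a sequence of independent branches and represents formulas as plain strings; `path_predicates` is a hypothetical helper, not part of any tool named here.

```python
from itertools import product

def path_predicates(conditions):
    """One path predicate per combination of branch outcomes."""
    paths = []
    for outcomes in product([True, False], repeat=len(conditions)):
        clauses = [c if taken else f"not ({c})"
                   for c, taken in zip(conditions, outcomes)]
        paths.append(" and ".join(clauses))
    return paths
```

Three branches already give 8 predicates, and each additional branch doubles the count, which is why heuristics for choosing paths matter in practice.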
s + s + s + s + s + s + s + s == 42
Π = (s + s + s + s + s + s + s + s) == 42
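A formula like the one above arises when a symbolic value is substituted into itself repeatedly. The sketch below assumes, as an illustration that is not stated in the text, a program that executes `x := x + x` three times on a symbolic input `s` before testing `x == 42`; the function name `substitute_doubling` is hypothetical.

```python
def substitute_doubling(rounds, symbol="s"):
    """Symbolic value of x after `rounds` executions of x := x + x, by substitution."""
    expr = symbol
    for _ in range(rounds):
        expr = f"{expr} + {expr}"  # replace each x by its current symbolic value
    return expr

predicate = f"{substitute_doubling(3)} == 42"
```

The number of occurrences of `s` is 2 to the power of `rounds`, so naive substitution makes the formula exponential in program length even on a single path.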
Goal: confine apps running in same address space
– Codec code should not interfere with media player
– Device drivers should not corrupt kernel
Simple solution: run apps in separate address spaces
– Problem: slow if apps communicate frequently
SFI approach:
– Partition process memory into segments
– At compile time, add guards before unsafe instructions
– When loading code, ensure all guards are present
app #1 app #2
– compiler pretends these registers don’t exist
– dr2 contains segment ID

dr1 ← R34
scratch-reg ← (dr1 >> 20)    : get segment ID
compare scratch-reg and dr2  : validate seg. ID
trap if not equal
R12 ← [dr1]                  : do load

dr1 ← R34 & segment-mask     : zero out seg bits
dr1 ← dr1 | dr2              : set valid seg ID
R12 ← [dr1]                  : do load
… but does not catch offending instructions
Problem: what if jmp [addr] jumps directly into the indirect load (bypassing the guard)?
Solution: jmp guard must ensure [addr] does not bypass load guard
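The two guard styles can be simulated on plain integers. This is a sketch under assumptions: the 20-bit shift follows the `dr1 >> 20` in the guard code, the segment ID is passed unshifted (so the sandboxing guard shifts it itself), and the names `segment_matching`, `address_sandboxing` and `SegmentViolation` are hypothetical.

```python
SEG_SHIFT = 20                       # segment ID sits above the low 20 bits
OFFSET_MASK = (1 << SEG_SHIFT) - 1   # keeps the offset, zeroes the segment bits

class SegmentViolation(Exception):
    """Stands in for the hardware trap."""

def segment_matching(addr, seg_id):
    """Check the segment ID and trap on mismatch; pinpoints the offending access."""
    if (addr >> SEG_SHIFT) != seg_id:
        raise SegmentViolation(hex(addr))
    return addr

def address_sandboxing(addr, seg_id):
    """Force the segment bits to the valid ID; cheaper, but never traps."""
    return (addr & OFFSET_MASK) | (seg_id << SEG_SHIFT)
```

For example, with segment ID 0x5 the out-of-segment address 0x312345 raises under segment matching, while sandboxing silently redirects it to 0x512345.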
[Diagram: cross-domain call. Caller domain: call draw → call stub. Callee domain: draw: … return → ret stub. Each stub is a list of "br addr" entries.]
– Addresses are hard coded, read-only segment
– map same physical page to two segments in addr space
– Usually good: mpeg_play, 4% slowdown
– variable length instructions: unclear where to put guards
– few registers: can’t dedicate three to SFI
– many instructions affect memory: more guards needed
Dawn Song
Find Vulnerability → Create Exploit → Compromise → $$$
Bug fixing
– Find vulnerabilities & eliminate them
– Ideally prove a program is free of vulnerabilities
Bug finding → Internal fix → Patch (lower cost → higher cost)
– SLAM project at Microsoft
– http://research.microsoft.com/en-us/projects/slam
Automatic test case generation (Fuzzing, Dynamic Symbolic Execution): lower coverage, lower false positive, higher false negative
Static analysis, Program verification: higher coverage, lower false negative, higher false positive
PDF viewer
– http, SNMP, SOAP
[Diagram: Input Generator → Inputs → Application, observed by Monitor]
Regression vs. Fuzzing
Definition
– Regression: Run program on many normal inputs, look for badness.
– Fuzzing: Run program on many abnormal inputs, look for badness.
Goals
– Regression: Prevent normal users from encountering errors (e.g. assertion failures are bad).
– Fuzzing: Prevent attackers from encountering exploitable errors (e.g. assertion failures are often ok).
character forward)
– E.g., ZZUF, very successful at finding bugs in many real-world programs, http://sam.zoy.org/zzuf/
– Taof, GPF, ProxyFuzz, FileFuzz, Filep, etc.
Take an input → Perturb → Feed to program → Crash?
1. Grab a file
2. Mutate that file
3. Feed it to the program
4. Record if it crashed (and input that crashed it)
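The four steps map directly onto a short loop. The sketch below makes several simplifying assumptions: the target is an in-process Python callable rather than an external program, a "crash" is any raised exception, and the mutation just overwrites a few random bytes. The names `mutate` and `fuzz` are hypothetical.

```python
import random

def mutate(data: bytes, rng: random.Random, flips: int = 4) -> bytes:
    """Overwrite a few randomly chosen bytes with random values."""
    buf = bytearray(data)
    for _ in range(flips):
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)

def fuzz(seed_input: bytes, target, iterations: int = 1000, seed: int = 0):
    """Return a list of (crashing input, error description) pairs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)          # 2. mutate that file
        try:
            target(candidate)                        # 3. feed it to the program
        except Exception as exc:
            crashes.append((candidate, repr(exc)))   # 4. record crash and input
    return crashes
```

A real harness would launch the program under a monitor and detect signals or sanitizer reports instead of catching Python exceptions, but the control flow is the same.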
Mutation-based
– Super easy to setup and automate
– Little to no protocol knowledge required
– Limited by initial corpus
– May fail for protocols with checksums, those which depend on challenge
documentation, etc.
– Using specified protocols/file format info
– E.g., SPIKE by Immunity http://www.immunitysec.com/resources-freesoftware.shtml
Take a spec (RFC, …) → Generate concrete inputs → Feed to program → Crash?
//png.spk
//author: Charlie Miller
// Header - fixed.
s_binary("89504E470D0A1A0A");
// IHDRChunk
s_binary_block_size_word_bigendian("IHDR"); //size of data field
s_block_start("IHDRcrc");
s_string("IHDR"); // type
s_block_start("IHDR");
// The following becomes s_int_variable for variable stuff
// 1=BINARYBIGENDIAN, 3=ONEBYE
s_push_int(0x1a, 1); // Width
s_push_int(0x14, 1); // Height
s_push_int(0x8, 3); // Bit Depth - should be 1,2,4,8,16, based on colortype
s_push_int(0x3, 3); // ColorType - should be 0,2,3,4,6
s_binary("00 00"); // Compression || Filter - shall be 00 00
s_push_int(0x0, 3); // Interlace - should be 0,1
s_block_end("IHDR");
s_binary_block_crc_word_littleendian("IHDRcrc"); // crc of type and data
s_block_end("IHDRcrc");
...
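The same idea, generating a structurally valid PNG header chunk whose length and CRC are recomputed for whatever field values are chosen, can be sketched in Python. This is not a port of the SPIKE script: the helper names `chunk` and `ihdr` are hypothetical, and the CRC is written big-endian, which is the byte order the PNG format uses for all its integers.

```python
import struct
import zlib

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")

def chunk(ctype: bytes, data: bytes) -> bytes:
    """Length (big-endian), type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(ctype + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def ihdr(width=0x1A, height=0x14, bit_depth=8, color_type=3, interlace=0) -> bytes:
    """IHDR chunk; compression and filter bytes are fixed to 0 as in the script."""
    data = struct.pack(">IIBBBBB", width, height, bit_depth, color_type, 0, 0, interlace)
    return chunk(b"IHDR", data)

# A generation-based fuzzer can pick out-of-spec values and still get a valid CRC.
odd_header = PNG_SIGNATURE + ihdr(bit_depth=7, color_type=5)
```

Because the checksum is computed after the fields are chosen, the target's parser gets past CRC validation and actually exercises the code that handles the unusual values, which is where mutation-based fuzzing tends to fail.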
Generation-based
– Writing generator can be labor intensive for complex protocols
– Have to have spec of protocol (often can find good tools for existing protocols e.g. http, SNMP)
– Completeness
– Can deal with complex dependencies e.g. checksums
– Mu Dynamics, Codenomicon, PROTOS, FTPFuzz, WebScarab
– SPIKE, Peach, Sulley
– Taof, GPF, ProxyFuzz, PeachShark
– ActiveX (AxMan), regular expressions, etc.
– Run program on fuzzed file
– Replay fuzzed packet trace
– Invoke fuzzer at appropriate point
– e.g. Peach automates generating COM interface fuzzers
– Type of crash can tell a lot (SEGV vs. assert fail)
– Catch more bugs, but more expensive per run.
test cases... When has the fuzzer run long enough?
test cases. What happens when they're all run and no bugs are found?
if( a > 2 )
  a = 2;
if( b > 2 )
  b = 2;
Line/block coverage: Measures how many lines of source code have been executed.
For the code above, how many test cases (values of pair (a,b)) are needed for full (100%) line coverage?
if( a > 2 )
  a = 2;
if( b > 2 )
  b = 2;
Branch coverage: Measures how many branches in code have been taken (conditional jmps).
For the code above, how many test cases are needed for full branch coverage?
Path coverage: Measures how many paths have been taken.
For the code below, how many test cases are needed for full path coverage?
if( a > 2 )
  a = 2;
if( b > 2 )
  b = 2;
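The three coverage questions for the two-`if` snippet can be checked by brute force. The sketch below re-expresses the snippet in Python with hand-written instrumentation; the function names `clamp` and `coverage` and the labels used for lines and branches are assumptions of this example.

```python
def clamp(a, b):
    """The two-if snippet, instrumented to record lines, branch outcomes and the path."""
    lines = {"if_a", "if_b"}          # both conditions are always evaluated
    take_a, take_b = a > 2, b > 2
    branches = {("a", take_a), ("b", take_b)}
    if take_a:
        lines.add("a=2")
    if take_b:
        lines.add("b=2")
    return lines, branches, (take_a, take_b)

def coverage(tests):
    """Counts of covered lines, branch outcomes and paths (maximum 4 each here)."""
    lines, branches, paths = set(), set(), set()
    for a, b in tests:
        covered_lines, covered_branches, path = clamp(a, b)
        lines |= covered_lines
        branches |= covered_branches
        paths.add(path)
    return len(lines), len(branches), len(paths)
```

A single test such as (3, 3) executes every line; adding (0, 0) takes every branch in both directions; all four true/false combinations are needed before every path has been followed.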
– How good is this initial file?
– Am I getting stuck somewhere?
if(packet[0x10] < 7) {
  //hot path
} else {
  //cold path
}
– How good is fuzzer X vs. fuzzer Y
– Am I getting benefits from running a different fuzzer?
○ Yes   ○ No
mySafeCpy(char *dst, char* src){
  if(dst && src)
    strcpy(dst, src);
}
○ Yes   ○ No
○ Yes   ○ No
– Generational tends to beat random, better specs make better fuzzers
– Each implementation will vary, different fuzzers find different bugs
– Notice where you're getting stuck, use profiling!