Peng Li
UNC, Chapel Hill, NC, USA
Debin Gao
School of Information Systems, Singapore Management University, Singapore
Mike Reiter
UNC, Chapel Hill, NC, USA
Background and Introduction
Overall Structure
Conversion Algorithm
[Figure: two prior models, [S. A. Hofmeyr et al.] and [R. Sekar et al.], each taking the program's syscall stream as input]
Rebuild the model by collecting traces of the updated program
Problems:
1. Setting up a sanitized environment free of attacks
2. Setting up an environment as similar as possible to the one in which the updated program will be run
3. Multiple such environments
Adapt the old model to the changes induced by the patch
[Figure: components labeled Bin Diff Analyzer, Ingredient I, Ingredient II, and Ingredient III]
main () {
1:  int a = 2;
2:  f(a);
}
void f (int x) {
1:  sys_call (1);
2:  if (x == 1)
3:    sys_call (3);
4:  else if (x == 2)
5:    sys_call (5);
}

Call stack 1: main.2, f.1, sys1
Call stack 2: main.2, f.5, sys5
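The two call stacks above can be reproduced with a small simulation. This is only a sketch: the toy program is transcribed into Python, and the `stack` list, the `observations` log, and the extra `site` argument to `sys_call` are hypothetical instrumentation that the slides do not describe.

```python
# Hypothetical instrumentation of the slide's toy program: each system call
# is logged together with the call-site labels that are live on the stack.
observations = []   # list of (call_stack, syscall_number)
stack = []          # labels such as "main.2", pushed at call sites

def sys_call(number, site):
    # Record the stack as it looks when the system call is issued.
    observations.append((tuple(stack + [site]), number))

def f(x):
    sys_call(1, "f.1")
    if x == 1:
        sys_call(3, "f.3")
    elif x == 2:
        sys_call(5, "f.5")

def main():
    a = 2
    stack.append("main.2")   # call site of f(a)
    f(a)
    stack.pop()

main()
for call_stack, number in observations:
    print(" ".join(call_stack), f"sys{number}")
```

Because `a` is 2 in this run, `sys_call (3)` at f.3 is never observed; a model built only from such traces has no record of it.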
White-box technique:

main () {
1:  int a = 2;
2:  f(a);
}
void f (int x) {
1:  sys_call (1);
2:  if (x == 1)
3:    sys_call (3);
4:  else if (x == 2)
5:    sys_call (5);
}

[Figure: per-function graphs for main () and f () with nodes enter, exit, main.2, f.1, f.3, and f.5, labeled with system calls 1, 3, and 5]
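To illustrate what a white-box view of this program covers, the sketch below enumerates system call sequences from hand-written control flow graphs. The `CFG`, `CALLS`, and `SYSCALLS` tables and the `sequences` helper are assumptions made for this example, not the representation used by any particular tool, and the enumeration only works for acyclic graphs like this one.

```python
# Hand-written control flow graphs for the toy program (an assumption of this
# sketch; a real white-box tool would derive them from source or binary).
CFG = {
    "main": {"enter": ["main.2"], "main.2": ["exit"], "exit": []},
    "f": {"enter": ["f.1"], "f.1": ["f.3", "f.5", "exit"],
          "f.3": ["exit"], "f.5": ["exit"], "exit": []},
}
CALLS = {"main.2": "f"}                     # call site -> callee
SYSCALLS = {"f.1": 1, "f.3": 3, "f.5": 5}   # node -> system call number

def sequences(func):
    """Return every system call sequence along enter-to-exit paths of func."""
    def walk(node):
        if node == "exit":
            return [[]]
        here = [[]]                          # what this node itself contributes
        if node in SYSCALLS:
            here = [[SYSCALLS[node]]]
        elif node in CALLS:
            here = sequences(CALLS[node])    # splice in the callee's sequences
        result = []
        for nxt in CFG[func][node]:
            for tail in walk(nxt):
                for head in here:
                    result.append(head + tail)
        return result
    return sorted(walk("enter"))

print(sequences("main"))
```

Unlike a single observed run with `a = 2`, this enumeration includes the path through f.3, because it follows every branch regardless of the input value.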
BinHunt [Gao, Reiter, Song, ICICS08]
A novel technique for finding semantic differences in binary programs
Computes the maximum common induced subgraphs between control flow graphs of the unpatched and patched binaries
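As a rough illustration of what a maximum common induced subgraph is, the sketch below finds one by brute force on two tiny graphs. It is not BinHunt's algorithm: it compares graph structure only and ignores what the basic blocks compute, and the node names and edges are made up for the example.

```python
from itertools import combinations, permutations

def max_common_induced_subgraph(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force search; exponential, for tiny illustrative graphs only."""
    for size in range(min(len(nodes_a), len(nodes_b)), 0, -1):
        for subset_a in combinations(nodes_a, size):
            for subset_b in permutations(nodes_b, size):
                mapping = dict(zip(subset_a, subset_b))
                # Induced: every ordered pair must agree on edge presence.
                if all(((u, v) in edges_a) == ((mapping[u], mapping[v]) in edges_b)
                       for u in subset_a for v in subset_a):
                    return mapping
    return {}

# Hypothetical "unpatched" CFG: a -> b, b -> c, b -> d
nodes_a = ["a", "b", "c", "d"]
edges_a = {("a", "b"), ("b", "c"), ("b", "d")}
# Hypothetical "patched" CFG: a new block n2 was inserted between b2 and c2
nodes_b = ["a2", "b2", "n2", "c2", "d2"]
edges_b = {("a2", "b2"), ("b2", "n2"), ("n2", "c2"), ("b2", "d2")}

mapping = max_common_induced_subgraph(nodes_a, edges_a, nodes_b, edges_b)
print(mapping)
```

On these inputs the match covers four nodes, but a purely structural match pairs `c` with the inserted block rather than with `c2`. That is one reason a binary differ also has to compare the contents of the blocks.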
[Figure: BinHunt produces a Diff of CFG pieces, which is used together with the Old Execution Graph]
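A simple copy step, in which parts of the old graph are carried over through a block matching, might look like the sketch below. The function name `copy_matched`, the `matching` dictionary, and the node labels are hypothetical. The sketch deliberately covers only the trivial case where both endpoints of an edge have a match, and it is not the conversion algorithm from the paper.

```python
def copy_matched(old_nodes, old_edges, matching):
    """Relabel old-graph nodes and edges through `matching` (old -> new label).

    Old nodes with no match, and any edge touching them, are not copied and
    are returned separately so that a caller can handle them another way.
    """
    new_nodes = {matching[n] for n in old_nodes if n in matching}
    new_edges = {(matching[u], matching[v]) for (u, v) in old_edges
                 if u in matching and v in matching}
    unmatched_old = {n for n in old_nodes if n not in matching}
    return new_nodes, new_edges, unmatched_old

# Hypothetical old graph and matching; "old.3" has no counterpart after the patch.
old_nodes = {"old.1", "old.2", "old.3"}
old_edges = {("old.1", "old.2"), ("old.2", "old.3")}
matching = {"old.1": "new.1", "old.2": "new.2"}

new_nodes, new_edges, unmatched_old = copy_matched(old_nodes, old_edges, matching)
print(sorted(new_nodes), sorted(new_edges), sorted(unmatched_old))
```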
[Figure: control flow fragments with nodes labeled noncall, call f’(), jz, syscall3, call f’’(), syscall4, call g’(), jz, syscall3, call g’’()]
When simple copy doesn’t work
“Extended Similarity”
[Figure: f() with nodes enter, syscall, call f’(), syscall, call f’’(), exit; g() with nodes enter, syscall, call g’(), syscall, call g’’(), exit; f’() and g’() each labeled with system calls 3 and 4]
Please see proof in the paper
            Copied            Not copied
            Nodes   Edges     Nodes        Edges
tar         478     1430      0 (0%)       0 (0%)
ncompress   151     489       3 (1.9%)     23 (4.5%)
ProFTPD     775     1850      6 (0.7%)     28 (1.5%)
unzip       374     1004      50 (11.8%)   195 (16.3%)
Statistics for nodes and edges in the converted execution graph
Statistics for the size comparison and algorithm efficiency
            Old Binary         New Binary
            Old EG (trained)   New EG (converted)           New EG (trained)
            nodes   edges      nodes   edges   time (sec)   nodes   edges
tar         478     1430       478     1430    14.5         478     1430
ncompress   151     489        154     512     13.1         151     489
ProFTPD     775     1850       781     1878    17.4         776     1853
unzip       374     1004       424     1199    41.6         377     1017
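The two tables can be cross-checked against each other. The sketch below assumes that copied plus not-copied counts add up to the converted graph's size, and that the "not copied" percentages are taken relative to that total. Both readings are inferences from the numbers rather than statements on the slides; the variable and function names are illustrative.

```python
# (copied nodes, copied edges, not-copied nodes, not-copied edges)
copy_stats = {
    "tar":       (478, 1430, 0, 0),
    "ncompress": (151, 489, 3, 23),
    "ProFTPD":   (775, 1850, 6, 28),
    "unzip":     (374, 1004, 50, 195),
}
# (nodes, edges) of the New EG (converted) column
converted = {
    "tar": (478, 1430), "ncompress": (154, 512),
    "ProFTPD": (781, 1878), "unzip": (424, 1199),
}

def not_copied_share(name):
    """Percentage of converted-graph nodes and edges that were not copied."""
    _, _, nodes_new, edges_new = copy_stats[name]
    nodes_total, edges_total = converted[name]
    return 100 * nodes_new / nodes_total, 100 * edges_new / edges_total

for name, (cn, ce, nn, ne) in copy_stats.items():
    # Assumed relationship: copied + not copied = converted graph size.
    assert (cn + nn, ce + ne) == converted[name]
    node_pct, edge_pct = not_copied_share(name)
    print(f"{name}: {node_pct:.2f}% nodes, {edge_pct:.2f}% edges not copied")
```

Under these assumptions the sums match for all four programs, and the computed shares agree with the reported percentages to within about 0.1 percentage point.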
[Figure: three sets of system call sequences: those obtained by analyzing the CFG, those accepted by the converted model, and those accepted by the trained model]
Please see proof in the paper