Peng Li UNC, Chapel Hill, NC, USA Debin Gao School of Information - - PowerPoint PPT Presentation



SLIDE 1

Peng Li

UNC, Chapel Hill, NC, USA

Debin Gao

School of Information Systems, Singapore Management University, Singapore

Mike Reiter

UNC, Chapel Hill, NC, USA

SLIDE 2

• Background and Introduction
• Overall Structure
• Conversion Algorithm
• Evaluation
• Summary

SLIDE 3

• By statically analyzing the source code or the binary
  • No false alarms
  • Less sensitive
• By training
  • Better tuned to the workload where they operate; more sensitive
  • May suffer from false alarms

SLIDE 4

[Figure: two diagrams, each labeled Input and syscall, attributed to [S. A. Hofmeyr et al.] and [R. Sekar et al.]]

SLIDE 5

• Rebuild the model by collecting traces of the updated program
  Problems:
  1. Setting up a sanitized environment free of attacks
  2. Setting up an environment as similar as possible to the one in which the updated program will be run
  3. Multiple such environments
• Adapt the old model to the changes induced by the patch

SLIDE 6

[Figure: Bin Diff Analyzer, with Ingredient I, Ingredient II, and Ingredient III]

SLIDE 7

main () {
  1: int a = 2;
  2: f(a);
}
void f (int x) {
  1: sys_call (1);
  2: if (x == 1)
  3:   sys_call (3);
  4: else if (x == 2)
  5:   sys_call (5);
}

Call stack 1: main.2, f.1, sys1
Call stack 2: main.2, f.5, sys5

[Figure: graph over main () and f () with system call nodes 1 and 5; labels f.1, main.2, f.5]
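
The two call stacks above share the frame main.2 and differ below it. A minimal Python sketch of that comparison, assuming each call stack is recorded as a list of labels, outermost frame first and ending in the system call. The function name `split_stacks` is hypothetical, and this is an illustration rather than the authors' implementation:

```python
def split_stacks(prev, curr):
    """Return (common prefix, rest of prev, rest of curr) for two call stacks."""
    i = 0
    while i < min(len(prev), len(curr)) and prev[i] == curr[i]:
        i += 1
    return prev[:i], prev[i:], curr[i:]


# Call stacks from the slide, outermost frame first.
stack1 = ["main.2", "f.1", "sys1"]
stack2 = ["main.2", "f.5", "sys5"]

common, left, right = split_stacks(stack1, stack2)
print(common)  # ['main.2']
print(left)    # ['f.1', 'sys1']
print(right)   # ['f.5', 'sys5']
```

In this program both system calls are made from the same activation of f (), so the step from f.1 to f.5 stays inside f () and involves no call or return.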

SLIDE 8

White-box technique:

main () {
  1: int a = 2;
  2: f(a);
}
void f (int x) {
  1: sys_call (1);
  2: if (x == 1)
  3:   sys_call (3);
  4: else if (x == 2)
  5:   sys_call (5);
}

[Figure: statically derived graph. main (): enter, main.2, exit. f (): enter, f.1, f.3, f.5, exit, with system call nodes 1, 3, and 5]

SLIDE 9

BinHunt [Gao, Reiter, Song, ICICS08]

• A novel technique for finding semantic differences in binary programs
• Computes the maximum common induced subgraphs between control flow graphs
  • Maximum match per pair of functions
  • Maximum match between two programs

[Figure: unpatched and patched]

SLIDE 10

• Background and Introduction
• Overall Structure
• Conversion Algorithm
• Evaluation
• Summary

SLIDE 11

[Figure: overall structure, with components labeled BinHunt, CFG pieces, Diff, and Old Execution Graph]

SLIDE 12

• Background and Introduction
• Overall Structure
• Conversion Algorithm
• Evaluation
• Summary

SLIDE 13

• We iteratively convert each pair of matched functions by copying nodes and edges
• For functions that have no match, and for the unmatched portions of matched functions, we resort to static analysis
• We then convert the edges that connect functions (calls and returns)
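
The first two steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm from the paper: the model is reduced to a set of (source, destination) edges, `match` stands in for the output of the binary difference analyzer, and `static_edges` stands in for edges derived statically from the new binary. All names and node labels are hypothetical, and the handling of call and return edges is left out.

```python
def convert(old_edges, match, static_edges):
    """Build edges for the new model (simplified sketch).

    old_edges:    set of (src, dst) pairs from the trained model of the old binary
    match:        dict mapping old node -> new node (hypothetical diff output)
    static_edges: set of (src, dst) pairs derived statically from the new binary
    """
    # Step 1: copy every old edge whose two endpoints both have a match.
    copied = {(match[s], match[d]) for s, d in old_edges
              if s in match and d in match}
    # Step 2: for unmatched parts, fall back to statically derived edges,
    # i.e. edges that touch at least one node with no counterpart in the old binary.
    matched_new = set(match.values())
    added = {(s, d) for s, d in static_edges
             if s not in matched_new or d not in matched_new}
    return copied, added


# Hypothetical example: the patch inserts one new node, new:f.3.
old_edges = {("old:f.1", "old:f.5")}
match = {"old:f.1": "new:f.1", "old:f.5": "new:f.6"}
static_edges = {
    ("new:f.1", "new:f.6"),
    ("new:f.1", "new:f.3"),
    ("new:f.3", "new:f.6"),
}

copied, added = convert(old_edges, match, static_edges)
print(sorted(copied))  # [('new:f.1', 'new:f.6')]
print(sorted(added))   # [('new:f.1', 'new:f.3'), ('new:f.3', 'new:f.6')]
```

In this sketch the matched portion keeps only behavior that was observed in training, while the unmatched portion accepts whatever static analysis permits.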

SLIDE 14

Matched parts for two functions

[Figure: f() and g() side by side, with labels noncall, call f’(), jz, syscall3, call f’’(), syscall4, call g’(), jz, syscall3, call g’’(), and a "?"]

When simple copy doesn’t work

SLIDE 15

“Extended Similarity”

[Figure: f() and g() side by side. Each shows enter, exit, two syscall nodes (3 and 4), and calls to f’() and f’’() in f(), or g’() and g’’() in g(); f’() and g’() are also shown]

Please see proof in the paper

SLIDE 16

• Are the properties of the Execution Graph model preserved after our conversion?
• Will the converted model raise false alarms on the language accepted by the trained model?
• How do we make use of the output from the binary difference analyzer? …

SLIDE 17

• Background and Introduction
• Overall Structure
• Conversion Algorithm
• Evaluation
• Summary

SLIDE 18

• tar
  • Old version has an input validation error
  • Patched part only involves a function call that does not make any syscalls
  • EG unchanged
• ncompress
  • Old version misses a boundary check
  • In the new version, an err-msg printing branch is added
  • Converted EG expanded slightly

SLIDE 19

• ProFTPD
  • Cross-site request forgery is possible
  • Input validation checks added
  • Converted EG expanded slightly
• unzip
  • May pass invalid pointers and potentially execute arbitrary code
  • Patch involves changes in four functions
  • Nodes and edges increase more significantly

SLIDE 20

Statistics for nodes and edges in the converted execution graph

           Copied          Not copied
           Nodes   Edges   Nodes        Edges
tar        478     1430    0 (0%)       0 (0%)
ncompress  151     489     3 (1.9%)     23 (4.5%)
ProFTPD    775     1850    6 (0.7%)     28 (1.5%)
unzip      374     1004    50 (11.8%)   195 (16.3%)
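
The percentages in this table appear to be the not-copied count divided by the total (copied plus not copied). That reading is an assumption, checked here with a short Python sketch; the helper names `totals` and `not_copied_share` are hypothetical. The computed shares are close to the slide's figures (ProFTPD nodes come out at about 0.77% against the slide's 0.7%), and the totals equal the converted EG sizes reported on slide 21.

```python
# (copied nodes, copied edges, not-copied nodes, not-copied edges), from the slide
stats = {
    "tar": (478, 1430, 0, 0),
    "ncompress": (151, 489, 3, 23),
    "ProFTPD": (775, 1850, 6, 28),
    "unzip": (374, 1004, 50, 195),
}


def totals(row):
    """Total nodes and edges in the converted graph."""
    copied_nodes, copied_edges, new_nodes, new_edges = row
    return copied_nodes + new_nodes, copied_edges + new_edges


def not_copied_share(row):
    """Percentage of nodes and of edges that were not copied."""
    _, _, new_nodes, new_edges = row
    total_nodes, total_edges = totals(row)
    return 100 * new_nodes / total_nodes, 100 * new_edges / total_edges


for name, row in stats.items():
    total_nodes, total_edges = totals(row)
    pct_nodes, pct_edges = not_copied_share(row)
    print(f"{name}: {total_nodes} nodes, {total_edges} edges, "
          f"not copied {pct_nodes:.2f}% / {pct_edges:.2f}%")
```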

SLIDE 21

Statistics for the size comparison and algorithm efficiency

           Old binary          New binary
           Old EG (trained)    New EG (converted)          New EG (trained)
           nodes   edges       nodes   edges   time (sec)  nodes   edges
tar        478     1430        478     1430    14.5        478     1430
ncompress  151     489         154     512     13.1        151     489
ProFTPD    775     1850        781     1878    17.4        776     1853
unzip      374     1004        424     1199    41.6        377     1017

SLIDE 22

[Figure: sets of system call sequences related by a "U" symbol: system call sequences by analyzing the CFG, system call sequences accepted by the converted model, and system call sequences accepted by the trained model]

Please see proof in the paper
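
One reading of this slide, taken together with the soundness claims in the summary, is that the trained model accepts a subset of what the converted model accepts, which in turn accepts a subset of what static analysis of the CFG permits. The Python sketch below illustrates that containment on a toy model that is much simpler than an execution graph: each model is just a set of allowed consecutive system call pairs. The sets and the `accepts` helper are hypothetical.

```python
def accepts(model, seq):
    """True if every consecutive pair of system calls in seq is allowed by model."""
    return all(pair in model for pair in zip(seq, seq[1:]))


# Hypothetical transition sets for the three models.
static = {(1, 3), (1, 5), (5, 7)}   # everything the CFG permits
converted = {(1, 5), (5, 7)}        # copied edges plus statically added ones
trained = {(1, 5)}                  # only what was observed in training

# Containment of the models implies containment of the accepted sequences.
assert trained <= converted <= static

print(accepts(trained, [1, 5]))       # True
print(accepts(converted, [1, 5]))     # True
print(accepts(converted, [1, 3]))     # False
print(accepts(static, [1, 3]))        # True
```

In this toy setting, any sequence accepted by the trained model is also accepted by the converted model, so the converted model raises no alarm on it.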

SLIDE 23

• Background and Introduction
• Overall Structure
• Conversion Algorithm
• Evaluation
• Summary

SLIDE 24

• An approach to adapt a trained anomaly detector to software patches
• An algorithm for the conversion
• We show that our algorithm is sound
  • (Proof) The behavior accepted by the converted detector is consistent with the static analysis of the binary
  • (Empirically) The converted detector did not raise alarms on the behavior accepted by the trained detector of the new binary

SLIDE 25

pengli@cs.unc.edu
