VUDDY: A Scalable Approach for Vulnerable Code Clone Detection


slide-1
SLIDE 1

VUDDY: A Scalable Approach for Vulnerable Code Clone Detection

Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh
Korea University
May 23, 2017
IEEE S&P 2017

slide-2
SLIDE 2

Question

  • How many unpatched vulnerabilities are in a smartphone firmware’s source code?

Computer & Communication Security Lab., Korea University

200+ unpatched vulnerable code clones detected!

slide-3
SLIDE 3

Motivation

  • The amount of open source software is increasing

slide-4
SLIDE 4

Motivation

  • Code clones – reused code fragments
  • Major cause of vulnerability propagation

CVE-2016-5195

slide-5
SLIDE 5

Problem: Scalable & Accurate Vulnerable Code Clone Discovery

slide-6
SLIDE 6

Scalable & Accurate Vulnerable Code Clone Discovery

  • Scalability
  • Software systems are getting bigger
  • Linux kernel: 25.4 MLoC
  • “L” Smart TV: 35 MLoC

slide-7
SLIDE 7

Scalable & Accurate Vulnerable Code Clone Discovery

  • Accuracy
  • False positives (FP) mean increased time and effort

slide-8
SLIDE 8

Scalable & Accurate Vulnerable Code Clone Discovery

  • Previous approaches
  • Token-level matching: Kamiya et al., CCFinder (TSE’02)
  • Graph/tree matching: Jiang et al. (ICSE’07)
  • Bag-of-tokens matching: Sajnani et al., SourcererCC (ICSE’16)
  • Line-level matching: Jang et al., ReDeBug (S&P’12)
  • File-level matching: Sasaki et al., FCFinder (MSR’10)

[Figure: the approaches above placed on scalability and accuracy axes]

slide-9
SLIDE 9

Scalable & Accurate Vulnerable Code Clone Discovery

  • Goal

[Figure: the previous approaches (CCFinder, Jiang et al., SourcererCC, ReDeBug, FCFinder) placed on scalability and accuracy axes, with a “?” marking the goal]

slide-10
SLIDE 10

Proposed Method: VUDDY

slide-11
SLIDE 11

Demonstration of VUDDY

slide-16
SLIDE 16

Proposed method: VUDDY

  • VUDDY: VUlnerable coDe clone DiscoverY
  • Searches for vulnerable code clones
  • Scales beyond 1 BLoC target
  • Detects both known & unknown vulnerabilities
  • Low false positive rate

slide-17
SLIDE 17

Proposed method: VUDDY

  • Overview
  • Vulnerable functions → fingerprinting → fingerprint dictionary of vulnerable functions
  • A target program → fingerprinting → fingerprint dictionary of target functions
  • Dictionary comparison of the two → vulnerable code clones

slide-18
SLIDE 18

Collecting vulnerable code

  • Vulnerability patching

Old code (vulnerable) → CVE patch → New code (fixed)

slide-19
SLIDE 19

Collecting vulnerable code

  • Reconstructing vulnerability from security patch

Software repository + CVE patch → Old code (vulnerable)
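
One way to picture the reconstruction step: a unified-diff hunk already contains both sides of the change, so the pre-patch (vulnerable) lines can be recovered by keeping context and `-` lines and dropping `+` lines. The sketch below is only an illustration under that assumption. `old_side_of_hunk` is a hypothetical helper, not part of VUDDY, and the slides do not show how the authors retrieve whole functions from the repository.

```python
def old_side_of_hunk(hunk_lines):
    """Return the pre-patch (vulnerable) lines of one unified-diff hunk."""
    old = []
    for line in hunk_lines:
        if line.startswith("@@"):
            continue              # hunk header, not code
        if line.startswith("+"):
            continue              # added by the patch, absent from the old code
        old.append(line[1:])      # strip the leading ' ' (context) or '-' marker
    return old


# Lines taken from the CVE-2008-3528 patch shown later in the deck.
hunk = [
    "@@ -1,3 +1,3 @@",
    " {",
    "-struct page *page = ext2_get_page(dir, 0);",
    "+struct page *page = ext2_get_page(dir, 0, 0);",
    " ext2_dirent *de = NULL;",
]
for line in old_side_of_hunk(hunk):
    print(line)
```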

slide-20
SLIDE 20

Fingerprinting a program

A Program

slide-21
SLIDE 21

Fingerprinting a program

  • 1. Retrieve all functions from a program

A Program

    int sum (int a, int b) {
        return a + b;
    }

    void increment() {
        int num = 80;
        num++; // no return
    }

    void printer (char* src) {
        printf("%s", src);
    }

slide-22
SLIDE 22

Fingerprinting a program

  • 2. Apply abstraction and normalization to functions

A Program, with each function body after abstraction and normalization:

    sum:        returnfparam+fparam;
    increment:  dtypelvar=80;lvar++;
    printer:    funccall("%s",fparam);

slide-23
SLIDE 23

Fingerprinting a program

  • 3. Compute length and hash value

A Program, with the length and hash value of each normalized function body:

    sum:        returnfparam+fparam;      length: 20   hash val: C94D9910…
    increment:  dtypelvar=80;lvar++;      length: 20   hash val: D6E77882…
    printer:    funccall("%s",fparam);    length: 23   hash val: 9A45E4A1…

slide-24
SLIDE 24

Fingerprinting a program

  • 4. Store in a dictionary

A Program

    length: 20   hash val: C94D9910…
    length: 20   hash val: D6E77882…
    length: 23   hash val: 9A45E4A1…

“Fingerprint dictionary”

    20: [C94D9910, D6E77882]
    23: [9A45E4A1]
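
Steps 3 and 4 can be sketched as below. This is a minimal illustration, not the authors' implementation: the slides do not name the hash function, so MD5 from Python's `hashlib` is an assumption, and `fingerprint` and `build_dictionary` are hypothetical helper names. The printed digests are not expected to reproduce the truncated values on the slide.

```python
import hashlib
from collections import defaultdict


def fingerprint(normalized_body):
    """Return (length, hex digest) for one abstracted and normalized function body."""
    digest = hashlib.md5(normalized_body.encode("utf-8")).hexdigest()  # hash choice is assumed
    return len(normalized_body), digest


def build_dictionary(normalized_bodies):
    """Group digests by body length: {length: {digest, ...}}."""
    dictionary = defaultdict(set)
    for body in normalized_bodies:
        length, digest = fingerprint(body)
        dictionary[length].add(digest)
    return dictionary


bodies = ["returnfparam+fparam;", "dtypelvar=80;lvar++;"]
for length, digests in build_dictionary(bodies).items():
    print(length, sorted(digests))
```

Keying on length means a later lookup only has to compare hashes of functions whose normalized bodies are exactly the same size.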

slide-25
SLIDE 25

Abstraction

  • Transform function by replacing
      • Formal parameters
      • Data types
      • Local variables
      • Function names

Level 0: No abstraction

    void avg (float arr[], int len) {
        static float sum = 0;
        unsigned int i;

        for (i = 0; i < len; i++) {
            sum += arr[i];
        }

        printf("%f %d\n", sum/len, validate(sum));
    }

slide-26
SLIDE 26

Abstraction

Level 1: Formal parameter abstraction

    void avg (float FPARAM[], int FPARAM) {
        static float sum = 0;
        unsigned int i;

        for (i = 0; i < FPARAM; i++) {
            sum += FPARAM[i];
        }

        printf("%f %d\n", sum/FPARAM, validate(sum));
    }

slide-27
SLIDE 27

Abstraction

Level 2: Local variable name abstraction

    void avg (float FPARAM[], int FPARAM) {
        static float LVAR = 0;
        unsigned int LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        printf("%f %d\n", LVAR/FPARAM, validate(LVAR));
    }

slide-28
SLIDE 28

Abstraction

Level 3: Data type abstraction

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        printf("%f %d\n", LVAR/FPARAM, validate(LVAR));
    }

slide-29
SLIDE 29

Abstraction

Level 4: Function call abstraction

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        FUNCCALL("%f %d\n", LVAR/FPARAM, FUNCCALL(LVAR));
    }
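
The four abstraction levels can be approximated with plain text substitution, as in the toy sketch below. It assumes the parameter names, local variable names, and data types have already been extracted and are passed in. A real tool would get them from a parser, and the slides do not show how VUDDY does this. `abstract` and `C_KEYWORDS` are hypothetical names. The regular expressions also touch identifiers inside string literals and comments, which a parser-based implementation would avoid.

```python
import re

# Keywords that may be followed by "(" but are not function calls.
C_KEYWORDS = {"for", "if", "while", "switch", "return", "sizeof"}


def abstract(body, params, local_vars, data_types):
    """Apply abstraction levels 1-4 to a function body, given pre-extracted names."""

    def replace_words(text, words, symbol):
        for word in words:
            text = re.sub(r"\b" + re.escape(word) + r"\b", symbol, text)
        return text

    body = replace_words(body, params, "FPARAM")     # level 1: formal parameters
    body = replace_words(body, local_vars, "LVAR")   # level 2: local variables
    body = replace_words(body, data_types, "DTYPE")  # level 3: data types

    def call_to_symbol(match):
        name = match.group(1)
        return match.group(0) if name in C_KEYWORDS else "FUNCCALL("

    # level 4: any other identifier directly followed by "(" is treated as a call
    return re.sub(r"\b([A-Za-z_]\w*)\s*\(", call_to_symbol, body)


body = r'for (i = 0; i < len; i++) { sum += arr[i]; } printf("%f %d\n", sum/len, validate(sum));'
print(abstract(body, ["arr", "len"], ["sum", "i"], ["float", "int"]))
```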

slide-30
SLIDE 30

Normalization

  • Remove comments, tabs, white spaces, and CRLF
  • Convert into lowercase

Abstracted function:

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        FUNCCALL("%f %d\n", LVAR/FPARAM, FUNCCALL(LVAR));
    }

Normalized function body:

    dtypelvar=0;unsigneddtypelvar;for(lvar=0;lvar<fparam;lvar++){lvar+=fparam[lvar];}funccall("%f %d\n",lvar/fparam,funccall(lvar));
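
The normalization rules listed above map onto a few substitutions. The sketch below is a simplified illustration with a hypothetical `normalize` helper: it strips all whitespace, including any inside string literals, and would mangle a string literal that contains `//` or `/*`, so it should not be read as the authors' exact procedure.

```python
import re


def normalize(body):
    """Remove comments and all whitespace, then lowercase the result."""
    body = re.sub(r"/\*.*?\*/", "", body, flags=re.DOTALL)  # block comments
    body = re.sub(r"//[^\n]*", "", body)                    # line comments
    body = re.sub(r"\s+", "", body)                         # spaces, tabs, CR, LF
    return body.lower()


print(normalize("DTYPE LVAR = 80;\r\n\tLVAR++; // no return\r\n"))
```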

slide-31
SLIDE 31

Vulnerable code clone detection

  • By comparing two fingerprint dictionaries

Repository: fingerprint dictionary of vulnerable functions

    20: [C94D9910, D6E77882]
    23: [9A45E4A1]

Target program: fingerprint dictionary of target functions

    20: [ABCDEF01, C94D9910]
    21: [D155F630]
    22: [C67F45FD, DDBF3838]

  • key_lookup(20): hit → have C94D9910 in common (CLONE!)
  • key_lookup(21): fail
  • key_lookup(22): fail

clone: C94D9910

    int sum (int a, int b) { return a + b; }
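
The lookup sequence above amounts to a keyed set intersection. The sketch below replays it with the dictionary values from the slide. `find_clones` is a hypothetical name, and which dictionary drives the loop is an assumption based on the order of the `key_lookup` calls shown.

```python
def find_clones(vuln_dict, target_dict):
    """Return hashes present under the same length key in both dictionaries."""
    clones = set()
    for length, target_hashes in target_dict.items():
        vuln_hashes = vuln_dict.get(length)
        if vuln_hashes is None:
            continue  # key lookup failed: no vulnerable function of this length
        clones |= set(vuln_hashes) & set(target_hashes)
    return clones


vulnerable = {20: ["C94D9910", "D6E77882"], 23: ["9A45E4A1"]}
target = {20: ["ABCDEF01", "C94D9910"], 21: ["D155F630"], 22: ["C67F45FD", "DDBF3838"]}
print(find_clones(vulnerable, target))
```

Because a failed length lookup skips the hash comparison entirely, most target functions are discarded after a single dictionary access.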

slide-39
SLIDE 39

Performance Evaluation & Case Study

slide-40
SLIDE 40

Performance

  • Scalability evaluation
  • Dataset: 25 K GitHub projects (>1 push, >1 star from Jan 1 to July 28, 2016)
  • Execution time measured as target programs of varying size are given to VUDDY, CCFinderX, DECKARD, ReDeBug, and SourcererCC

[Plot annotations: Memory error (DECKARD); File I/O error (CCFinderX); Explosive increase (SourcererCC); Scales to 1 BLoC (ReDeBug); VUDDY]

slide-41
SLIDE 41

Performance

  • Accuracy evaluation
  • Vulnerability database vs. Apache HTTPD 2.4.23 (350 KLoC)
  • TP: CCFinderX > VUDDY > DECKARD > SourcererCC (the greater, the better)
  • FP: VUDDY < SourcererCC < CCFinderX < DECKARD (the lower, the better)

                   Time      TP    FP     FN    Precision
    VUDDY          22 s      9     0      3     1.000
    SourcererCC    125 s     2     54     10    0.036
    DECKARD        234 s     4     458    8     0.009
    CCFinderX      1201 s    11    63     1     0.147

TABLE I: Accuracy of VUDDY, SourcererCC, DECKARD, and CCFinderX when detecting clones between the vulnerability database and Apache HTTPD 2.4.23
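
The precision column follows from precision = TP / (TP + FP). The short sketch below recomputes it from the counts in Table I. VUDDY's FP count is taken as 0, which is what its reported precision of 1.000 implies. Recomputing gives 0.149 for CCFinderX rather than the 0.147 shown, while the other three rows match. Recall is added only as an illustration and is not a column of the table.

```python
def precision(tp, fp):
    """Fraction of reported clones that are true positives."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual clones that were reported."""
    return tp / (tp + fn)


# (TP, FP, FN) per tool, from Table I; VUDDY's FP = 0 is implied by precision 1.000
table = {
    "VUDDY": (9, 0, 3),
    "SourcererCC": (2, 54, 10),
    "DECKARD": (4, 458, 8),
    "CCFinderX": (11, 63, 1),
}
for tool, (tp, fp, fn) in table.items():
    print(f"{tool}: precision={precision(tp, fp):.3f} recall={recall(tp, fn):.3f}")
```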

slide-42
SLIDE 42

Performance

  • VUDDY vs ReDeBug (CMU, S&P’12)
  • Detecting vulnerable code clones in an Android smartphone’s firmware (15 MLoC)

                            VUDDY       ReDeBug
    Preprocessing time      17 m 3 s    11 m 16 s
    Clone detection time    1.09 s      16 m 59 s
    # initial reports       206         2,090
    # true positives        206         202
    # false positives       0           1,888

TABLE II: Comparison of VUDDY and ReDeBug, targeting Android firmware

slide-43
SLIDE 43

Performance

  • VUDDY vs ReDeBug (CMU, S&P’12)
  • Detecting vulnerable code clones in an Android smartphone’s firmware (15 MLoC)

[Bar chart, time (s), log scale. Preprocessing: VUDDY 1023, ReDeBug 636. Clone detection: VUDDY 1.09, ReDeBug 1019]

  • Generated fingerprints can be reused
  • Actual detection in practice: 1000x faster
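
The “1000x” figure can be checked against the chart values with a few lines of arithmetic. The dictionaries below simply restate the four numbers from the chart, and the variable names are illustrative.

```python
# Times in seconds, as shown on the chart.
vuddy = {"preprocessing": 1023.0, "detection": 1.09}
redebug = {"preprocessing": 636.0, "detection": 1019.0}

# Once fingerprints exist, only the detection phase is repeated.
speedup = redebug["detection"] / vuddy["detection"]
print(f"detection-only speedup: about {speedup:.0f}x")

# First run, when preprocessing cannot be reused.
print(f"first-run totals: VUDDY {sum(vuddy.values()):.0f} s, ReDeBug {sum(redebug.values()):.0f} s")
```

The detection-only ratio comes out a little under 1000, consistent with the rounded figure on the slide.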

slide-44
SLIDE 44

Case study

  • Unknown vulnerability detected in Linux kernel (even in 4.11.1)

Original patch for CVE-2008-3528 targeting ext2 file system

      struct ext2_dir_entry_2 * ext2_dotdot (struct inode *dir, struct page **p)
      {
    -     struct page *page = ext2_get_page(dir, 0);
    +     struct page *page = ext2_get_page(dir, 0, 0);
          ext2_dirent *de = NULL;

          if (!IS_ERR(page)) {

Vulnerable function in nilfs2 file system

    struct nilfs_dir_entry *nilfs_dotdot (struct inode * dir, struct page **p)
    {
        struct page *page = nilfs_get_page(dir, 0);
        struct nilfs_dir_entry *de = NULL;

        if (!IS_ERR(page)) {

Patched function in ext2 file system

    struct ext2_dir_entry *ext2_dotdot (struct inode * dir, struct page **p)
    {
        struct page *page = ext2_get_page(dir, 0, 0);
        struct ext2_dir_entry *de = NULL;

        if (!IS_ERR(page)) {

Could trigger a “printk flood” and DoS on CentOS 7 and Ubuntu 14.04

slide-45
SLIDE 45

Case study

  • Zero-day in Apache HTTPD 2.4.23 (2.4.20 through 2.4.25)
  • HTTPD uses unpatched Expat library for parsing XML
  • vulnerable to CVE-2012-0876
  • Hash DoS attack triggered by sending a crafted packet!

    // Vulnerable function in httpd/srclib/apr-util/xml/expat/lib/xmlparse.c, lines 5429-5433.
    for (i = 0; i < table->size; i++){
        if (table->v[i]) {
            unsigned long newHash = hash(table->v[i]->name);
            size_t j = newHash & newMask;
            step = 0;

slide-50
SLIDE 50

Summary

  • VUDDY is an approach capable of detecting software vulnerabilities using a database of previously security-patched functions
  • Applying abstraction to the functions enables identifying unknown vulnerable functions while still maintaining a low margin of error
  • Function-level granularity and length-based filtering reduce the number of signature comparisons, guaranteeing high scalability
  • Open web service
  • Implementation and testing available at https://iotcube.net