VUDDY: A Scalable Approach for Vulnerable Code Clone Detection

IEEE S&P 2017
VUDDY: A Scalable Approach for Vulnerable Code Clone Detection
Seulbae Kim, Seunghoon Woo, Heejo Lee, and Hakjoo Oh
Korea University
May 23, 2017

Question
• How many unpatched vulnerabilities are there in a smartphone firmware’s source code?
• 200+ unpatched vulnerable code clones detected!
Computer & Communication Security Lab., Korea University

Motivation
• The amount of open source software is increasing

Motivation
• Code clones: reused code fragments
• A major cause of vulnerability propagation
• Example: CVE-2016-5195

Problem: Scalable & Accurate Vulnerable Code Clone Discovery

Scalable & Accurate Vulnerable Code Clone Discovery
• Scalability: software systems are getting bigger
  • Linux kernel: 25.4 MLoC
  • “L” Smart TV: 35 MLoC
[Figure: chart with axes "accuracy" and "scalability"]

Scalable & Accurate Vulnerable Code Clone Discovery
• Accuracy: false positives (FP) mean increased time and effort
[Figure: chart with axes "accuracy" and "scalability"]

Scalable & Accurate Vulnerable Code Clone Discovery
• Previous approaches
  • Line-level matching: Jang et al., ReDeBug (S&P’12)
  • Token-level matching: Kamiya et al., CCFinder (TSE’02)
  • Graph/tree matching: Jiang et al. (ICSE’07)
  • Bag-of-tokens matching: Sajnani et al., SourcererCC (ICSE’16)
  • File-level matching: Sasaki et al., FCFinder (MSR’10)
[Figure: the approaches plotted on a chart with axes "accuracy" and "scalability"]

Scalable & Accurate Vulnerable Code Clone Discovery
• Goal: an approach in the region marked “?” on the chart, high in both accuracy and scalability, which none of the previous approaches (line-level, token-level, graph/tree, bag-of-tokens, and file-level matching) occupies

Proposed Method: VUDDY

Demonstration of VUDDY

Proposed method: VUDDY
• VUDDY: VUlnerable coDe clone DiscoverY
• Searches for vulnerable code clones
• Scales beyond 1 BLoC (billion lines of code) targets
• Detects both known & unknown vulnerabilities
• Low false positive rate

Proposed method: VUDDY
• Overview
  • Vulnerable functions → fingerprinting → fingerprint dictionary of vulnerable functions
  • A target program → fingerprinting → fingerprint dictionary of target functions
  • Dictionary comparison of the two fingerprint dictionaries → vulnerable code clones

Collecting vulnerable code
• Vulnerability patching: old code (vulnerable) + CVE patch → new code (fixed)

Collecting vulnerable code
• Reconstructing vulnerability from security patch: software repository + CVE patch → old code (vulnerable)
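The reconstruction step can be illustrated with a minimal Python sketch. It assumes the CVE patch is available as a unified-diff hunk, where lines prefixed with "-" exist only in the old (vulnerable) code and lines prefixed with "+" exist only in the new (fixed) code. The function name `split_hunk` and the sample patch are hypothetical, not taken from VUDDY.

```python
def split_hunk(hunk_lines):
    """Split one unified-diff hunk into (old_lines, new_lines).

    Assumption: hunk_lines holds a single hunk (no '---'/'+++' file headers).
    """
    old_lines, new_lines = [], []
    for line in hunk_lines:
        # Skip the '@@ ... @@' range header and '\ No newline at end of file'.
        if line.startswith("@@") or line.startswith("\\"):
            continue
        tag, text = line[:1], line[1:]
        if tag == "-":            # present only before the patch
            old_lines.append(text)
        elif tag == "+":          # present only after the patch
            new_lines.append(text)
        else:                     # context line, present in both versions
            old_lines.append(text)
            new_lines.append(text)
    return old_lines, new_lines


# Hypothetical security patch that adds a bounds check.
hunk = [
    "@@ -1,3 +1,4 @@",
    " int copy(char *dst, char *src, int n) {",
    "+    if (n > MAX) return -1;",
    "     memcpy(dst, src, n);",
    " }",
]
vulnerable, fixed = split_hunk(hunk)
```

Here `vulnerable` holds the three pre-patch lines (without the bounds check) and `fixed` holds all four post-patch lines.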

Fingerprinting a program
[Figure: a program]

Fingerprinting a program
1. Retrieve all functions from a program

    int sum (int a, int b)
    {
        return a + b;
    }

    void increment()
    {
        int num = 80;
        num++; // no return
    }

    void printer (char* src)
    {
        printf("%s", src);
    }
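As a rough illustration of step 1, the Python sketch below splits C source into (header, body) pairs by tracking brace depth. This is a simplification and an assumption: the slides do not say how functions are retrieved, and a real tool would need a proper parser (this sketch ignores braces inside strings and comments, and would treat a struct definition as a function). The name `extract_functions` is hypothetical.

```python
def extract_functions(source):
    """Return a list of (header, body) pairs for top-level brace blocks."""
    functions = []
    depth = 0
    prev_end = 0        # index just past the previous top-level '}'
    header = ""
    body_start = 0
    for i, ch in enumerate(source):
        if ch == "{":
            if depth == 0:
                # Text since the previous block is this function's header.
                header = source[prev_end:i].strip()
                body_start = i + 1
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                functions.append((header, source[body_start:i].strip()))
                prev_end = i + 1
    return functions


program = (
    "int sum (int a, int b) { return a + b; }\n"
    "void increment() { int num = 80; num++; }\n"
    'void printer (char* src) { printf("%s", src); }'
)
functions = extract_functions(program)
```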

Fingerprinting a program
2. Apply abstraction and normalization to functions

    int sum (int a, int b) { return a + b; }
        → returnfparam+fparam;
    void increment() { int num = 80; num++; // no return }
        → dtypelvar=80;lvar++;
    void printer (char* src) { printf("%s", src); }
        → funccall("%s", fparam);

Fingerprinting a program
3. Compute length and hash value

    returnfparam+fparam;       → length: 20, hash val: C94D9910…
    dtypelvar=80;lvar++;       → length: 20, hash val: D6E77882…
    funccall("%s", fparam);    → length: 23, hash val: 9A45E4A1…

Fingerprinting a program
4. Store in a dictionary

    length: 20, hash val: C94D9910…
    length: 20, hash val: D6E77882…
    length: 23, hash val: 9A45E4A1…

“Fingerprint dictionary”

    20: [C94D9910, D6E77882]
    23: [9A45E4A1]
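Steps 3 and 4 can be sketched in Python as follows. The slides show only truncated hex digests, so the choice of MD5 here is an assumption, and the digests this sketch produces are not claimed to equal the values on the slides. The names `fingerprint` and `build_dictionary` are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(normalized_body):
    """Return (length, hex digest) for an abstracted, normalized function body."""
    # Assumption: MD5 is used only as a fast content hash, not for security.
    digest = hashlib.md5(normalized_body.encode("utf-8")).hexdigest()
    return len(normalized_body), digest


def build_dictionary(normalized_bodies):
    """Group hash values by body length: {length: [hash, ...]}."""
    dictionary = defaultdict(list)
    for body in normalized_bodies:
        length, digest = fingerprint(body)
        if digest not in dictionary[length]:   # avoid storing duplicates
            dictionary[length].append(digest)
    return dict(dictionary)


bodies = ["returnfparam+fparam;", "dtypelvar=80;lvar++;"]
fingerprint_dictionary = build_dictionary(bodies)
```

Both sample bodies are 20 characters long, so they land under the same key, mirroring the `20: [...]` entry on the slide. Keying by length means a later lookup only has to compare hashes of functions whose lengths already match.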

Abstraction
• Transform function by replacing
  • Formal parameters
  • Data types
  • Local variables
  • Function names

Level 0: No abstraction

    void avg (float arr[], int len) {
        static float sum = 0;
        unsigned int i;

        for (i = 0; i < len; i++) {
            sum += arr[i];
        }

        printf("%f %d\n", sum/len, validate(sum));
    }

Level 1: Formal parameter abstraction

    void avg (float FPARAM[], int FPARAM) {
        static float sum = 0;
        unsigned int i;

        for (i = 0; i < FPARAM; i++) {
            sum += FPARAM[i];
        }

        printf("%f %d\n", sum/FPARAM, validate(sum));
    }

Level 2: Local variable name abstraction

    void avg (float FPARAM[], int FPARAM) {
        static float LVAR = 0;
        unsigned int LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        printf("%f %d\n", LVAR/FPARAM, validate(LVAR));
    }

Level 3: Data type abstraction

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        printf("%f %d\n", LVAR/FPARAM, validate(LVAR));
    }

Level 4: Function call abstraction

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        FUNCCALL("%f %d\n", LVAR/FPARAM, FUNCCALL(LVAR));
    }
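The four abstraction levels can be approximated with the Python sketch below. It assumes a parser has already identified the formal parameters, local variables, data types, and called functions of the function. Those lists are passed in by hand here, and the name `abstract_body` is hypothetical. Replacement is plain whole-word regex substitution, so an identifier that also appears inside a string literal would be replaced as well. A real implementation would work on parsed tokens instead.

```python
import re


def _replace(text, names, symbol, suffix=""):
    """Replace each whole-word occurrence of any name with the given symbol."""
    if not names:
        return text
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b" + suffix
    return re.sub(pattern, symbol, text)


def abstract_body(body, params=(), local_vars=(), data_types=(),
                  called_funcs=(), level=4):
    """Apply abstraction levels 1..level cumulatively to a function body."""
    if level >= 1:
        body = _replace(body, params, "FPARAM")
    if level >= 2:
        body = _replace(body, local_vars, "LVAR")
    if level >= 3:
        body = _replace(body, data_types, "DTYPE")
    if level >= 4:
        # Only replace a name when it is followed by '(' (i.e. it is called).
        body = _replace(body, called_funcs, "FUNCCALL", suffix=r"(?=\s*\()")
    return body


names = dict(params=["arr", "len"], local_vars=["sum", "i"],
             data_types=["float", "int"], called_funcs=["printf", "validate"])
level1 = abstract_body("sum += arr[i];", level=1, **names)
level2 = abstract_body("sum += arr[i];", level=2, **names)
level4 = abstract_body('printf("%f %d\\n", sum/len, validate(sum));', **names)
```

With the `avg` example, `level1` is `sum += FPARAM[i];`, `level2` is `LVAR += FPARAM[LVAR];`, and `level4` is `FUNCCALL("%f %d\n", LVAR/FPARAM, FUNCCALL(LVAR));`.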

Normalization
• Remove
  • comments
  • tabs
  • white spaces
  • CRLF
• Convert into lowercase

Before:

    DTYPE avg (DTYPE FPARAM[], DTYPE FPARAM) {
        DTYPE LVAR = 0;
        unsigned DTYPE LVAR;

        for (LVAR = 0; LVAR < FPARAM; LVAR++) {
            LVAR += FPARAM[LVAR];
        }

        FUNCCALL("%f %d\n", LVAR/FPARAM, FUNCCALL(LVAR));
    }

After:

    dtypelvar=0;unsigneddtypelvar;for(lvar=0;lvar<fparam;lvar++){lvar+=fparam[lvar];}funccall("%f %d\n", lvar/fparam, funccall(lvar));
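A minimal Python sketch of the normalization rules listed above: strip C comments, drop all whitespace (spaces, tabs, CR, LF), and lowercase the result. The function name `normalize` is hypothetical. As a simplification, comment markers inside string literals are not handled, and whitespace inside string literals is removed along with everything else, which may differ from the actual tool.

```python
import re


def normalize(body):
    """Remove comments and whitespace from a function body, then lowercase it."""
    body = re.sub(r"/\*.*?\*/", "", body, flags=re.DOTALL)  # /* block comments */
    body = re.sub(r"//[^\n]*", "", body)                    # // line comments
    body = re.sub(r"\s+", "", body)                         # spaces, tabs, CR, LF
    return body.lower()


normalized = normalize("DTYPE LVAR = 80;\r\n\tLVAR++; // no return\n")
```

For the `increment` example from the fingerprinting slides, `normalized` is `dtypelvar=80;lvar++;`, a 20-character string.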

Vulnerable code clone detection
• By comparing two fingerprint dictionaries

Repository → fingerprint dictionary of vulnerable functions

    20: [ABCDEF01, C94D9910]
    21: [D155F630]
    22: [C67F45FD, DDBF3838]

Target program → fingerprint dictionary of target functions

    20: [C94D9910, D6E77882]
    23: [9A45E4A1]
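The comparison of the two dictionaries can be sketched in Python as a key lookup followed by a set intersection. The function name `find_clones` is hypothetical, and the data is the example shown on the slide. Only lengths present in both dictionaries are examined, so most target functions are ruled out without any hash comparison.

```python
def find_clones(vulnerable, target):
    """Return {length: [hashes present in both dictionaries]}."""
    matches = {}
    for length, target_hashes in target.items():
        vulnerable_hashes = vulnerable.get(length)
        if not vulnerable_hashes:
            continue  # no vulnerable function has this length
        common = set(vulnerable_hashes) & set(target_hashes)
        if common:
            matches[length] = sorted(common)
    return matches


vulnerable_dictionary = {
    20: ["ABCDEF01", "C94D9910"],
    21: ["D155F630"],
    22: ["C67F45FD", "DDBF3838"],
}
target_dictionary = {
    20: ["C94D9910", "D6E77882"],
    23: ["9A45E4A1"],
}
clones = find_clones(vulnerable_dictionary, target_dictionary)
```

With the slide's data, the only shared key is 20 and the only shared hash under it is C94D9910, so `clones` is `{20: ["C94D9910"]}`.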
