QSYM: A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing Insu Yun, Sangho Lee, Meng Xu, Yeongjin Jang†, and Taesoo Kim Georgia Institute of Technology & Oregon State University† 27th USENIX Security Symposium August 16, 2018 1

Two popular ways to find security bugs: Fuzzing & Concolic execution [Figure labels: Fuzzing; Symbolic Execution] 2

Fuzzing and Concolic execution have their own pros and cons • Fuzzing • Good: Finding general inputs • Bad: Finding specific inputs • Concolic execution • Good: Finding specific inputs • Bad: State explosion 3

Hybrid fuzzing can address their problems • Use both techniques: Fuzzing + Concolic execution • Find specific inputs: Using concolic execution • Limit state explosion: Only fork at branches that are hard for fuzzing 4

Hybrid fuzzing has achieved great success in small-scale studies • e.g., Driller: a state-of-the-art hybrid fuzzer • Won 3rd place in the CGC competition • Found 6 new crashes that neither fuzzing nor concolic execution alone could find 5

However, current hybrid fuzzing has problems scaling to real-world applications • Very slow to generate constraints • Incomplete support for system calls • Not effective in generating test cases 6

Our system, QSYM, addresses these issues by introducing several key ideas • Discard intermediate layer for performance • Use concrete environment to support system calls • Introduce heuristics to effectively generate test cases 7

QSYM is scalable to real-world software • 13 previously unknown bugs in open-source software • All applications had already been fuzzed (OSS-Fuzz, AFL, …) • Including ffmpeg, which had been fuzzed by OSS-Fuzz for 2 years • Bugs are hard for pure fuzzing to find: they require complex constraints 8

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage] 9

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage. Highlighted: Performance overhead] 10

Overview: QSYM 1. Instruction-level execution [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; Test cases; Fuzzing; Coverage] 11

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage. Highlighted: Incomplete environment modeling] 12

Overview: QSYM 1. Instruction-level execution 2. Concrete environment modeling [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; Test cases; Fuzzing; Coverage] 13

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage. Highlighted: Ineffective test case generation due to unsatisfiable paths] 14

Overview: QSYM 1. Instruction-level execution 2. Concrete environment modeling 3. Optimistic solving [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; Test cases; Fuzzing; Coverage] 15

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage. Highlighted: Blocked by complex logic] 16

Overview: QSYM 1. Instruction-level execution 2. Concrete environment modeling 3. Optimistic solving 4. Basic block pruning (refer to our paper) [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; Test cases; Fuzzing; Coverage] 17

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage. Highlighted: Performance overhead] 18

Intermediate representations (IR) make implementations easier • Provide architecture-independent interpretations • Can re-use code for all architectures • e.g., angr works on many architectures: x86, ARM, and MIPS 19

Problem 1: IR incurs significant performance overhead • Increases the number of instructions • 4.7 times in VEX (IR used by angr) • Needs to execute a whole basic block symbolically • Due to caching and optimization • Only 30% of instructions need to be symbolically executed 20

Solution 1: Execute instructions directly without an intermediate layer • Remove the IR translation layer • Pay for it with implementation complexity 21
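
As a rough illustration of why working per instruction helps, the sketch below tracks which registers hold input-derived (symbolic) values and counts how many instructions actually need symbolic handling. The three-operand instruction format, the register contents, and the name `count_symbolic` are hypothetical and only stand in for the idea; this is not QSYM's implementation.

```python
# Toy model: an instruction is (dest, src1, src2). All names are hypothetical.
def count_symbolic(instructions, symbolic_regs):
    """Return how many instructions touch symbolic data.

    An instruction needs symbolic execution only if one of its source
    operands is symbolic; otherwise it can simply run concretely.
    """
    symbolic = set(symbolic_regs)
    needed = 0
    for dest, src1, src2 in instructions:
        if src1 in symbolic or src2 in symbolic:
            needed += 1
            symbolic.add(dest)       # result now depends on the input
        else:
            symbolic.discard(dest)   # overwritten with a concrete value
    return needed


block = [
    ("ebx", "ebp", "esp"),  # concrete
    ("ecx", "eax", "ebx"),  # symbolic: eax holds input
    ("edx", "ebx", "ebx"),  # concrete
    ("eax", "edx", "edx"),  # concrete, and it clears eax
    ("esi", "eax", "ecx"),  # symbolic: ecx is still symbolic
]
needed = count_symbolic(block, {"eax"})
print(f"{needed} of {len(block)} instructions need symbolic execution")
```

An engine that works on whole basic blocks would treat all five instructions symbolically here, while the per-instruction check handles only the two that touch input-derived data.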

QSYM reduces the number of instructions to execute symbolically • On 126 CGC binaries: 4x fewer 22

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage. Highlighted: Incomplete environment modeling] 23

State forking can reduce re-execution overhead for constraint generation • No need to re-execute to reach the state • Recover from the snapshot 24

State forking for the kernel is non-trivial • State in concolic execution = Program state + Kernel state • Forking program state is trivial • Save application memory + registers • Save constraints • Forking kernel state is non-trivial • Need to maintain all kernel data structures • e.g., file system, network state, memory system … 25

Problem 2: State forking introduces problems in either completeness or performance • Kernel modeling • e.g., angr • Pros: Small performance overhead • Cons: Incompleteness: angr supports only 22 system calls in Linux • Full kernel emulation • e.g., S2E • Pros: Completeness • Cons: Large performance overhead 26

Solution 2: Re-execute in a concrete environment instead of forking kernel state • Instead of state forking, re-execute from the start • High re-execution overhead, reduced by: • Instruction-level execution • Basic block pruning • Limit constraint solving: Based on coverage from fuzzing 27

Models minimal system calls and uses concrete values • Only model system calls that are relevant to user interactions • e.g., standard input, file read, … • Other system calls: Call the system call using concrete values • e.g., mprotect(addr, sym_size, PROT_R) → mprotect(addr, conc_size, PROT_R) 28
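
The split between modeled and concretized system calls can be sketched as follows. This is a minimal illustration under stated assumptions: the `Sym` class, the `MODELED` set, and the `dispatch` function are hypothetical names, and the real engine intercepts actual system calls rather than returning tuples.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    """A value with a symbolic name and the concrete value seen in this run."""
    name: str
    concrete: int


# Hypothetical set of modeled system calls (those that carry user input).
MODELED = {"read"}


def dispatch(syscall, args):
    """Decide how a system call is handled in this toy model."""
    if syscall in MODELED:
        return ("model", args)  # keep symbolic arguments for the model
    # Everything else runs in the concrete environment with concrete values.
    concrete = [a.concrete if isinstance(a, Sym) else a for a in args]
    return ("native", concrete)


sym_size = Sym("sym_size", 4096)
print(dispatch("mprotect", [0x1000, sym_size, "PROT_R"]))
```

Here `mprotect` receives the concrete size observed in the current run (the `conc_size` of the slide), so no kernel model is needed for it.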

Problem: Concrete environment results in incomplete constraints • Concretizing arguments adds implicit constraints • e.g., mprotect(addr, sym_size, PROT_R) → mprotect(addr, conc_size, PROT_R) • Without knowing the semantics of system calls • Concretize: Over-constrained • Ignore: Under-constrained 29

Unrelated constraint elimination can tolerate incomplete constraints • Code: x = int(input()); y = int(input()); mprotect(addr, x, PROT_R)  # Incomplete constraints; if y * y == 1337 * 1337: bug() • Path constraints: Constraints for x (Incomplete) && y * y == 1337 * 1337 • Branch-dependent constraints: y * y == 1337 * 1337 • Result: x = Use concrete value, y = 1337 30
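
One way to picture unrelated constraint elimination is as a filter that keeps only the path constraints sharing variables, directly or transitively, with the branch being solved. The sketch below assumes each constraint is represented as a (label, set of variables) pair; that representation and the name `related_constraints` are illustrative assumptions, not the engine's data structures.

```python
def related_constraints(path, target):
    """Keep only constraints that share variables, directly or transitively,
    with the target branch constraint.

    Each constraint is a (label, set_of_variables) pair.
    """
    _, target_vars = target
    seen_vars = set(target_vars)
    kept = []
    remaining = list(path)
    changed = True
    while changed:  # repeat until no new related constraint is found
        changed = False
        for c in list(remaining):
            if c[1] & seen_vars:
                kept.append(c)
                seen_vars |= c[1]
                remaining.remove(c)
                changed = True
    return kept


path = [
    ("incomplete constraints on x from mprotect", {"x"}),
    ("y * y == 1337 * 1337", {"y"}),
]
target = ("y * y == 1337 * 1337", {"y"})
print([label for label, _ in related_constraints(path, target)])
```

The incomplete constraint on `x` never reaches the solver, so `x` simply keeps its concrete value while `y` is solved.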

Overview: Hybrid fuzzing in general [Diagram. Program; Basic block: push ebp / mov ebp, esp / …; Intermediate representations: t0 = GET:I32(ebp) / t1 = GET:I32(esp) / t2 = Sub32(t1,0x00000004) / …; Constraints: A[0] == 'A' && A[1] == 'A' && A[2] == 'A' …; State forking; Test cases; Fuzzing; Coverage. Highlighted: Ineffective test case generation due to unsatisfiable paths] 31

Problem 3: Over-constrained paths result in no test cases • Code: type = int(input()); if type == TYPE1: parse_TYPE1(); …; if type == TYPE2: parse_TYPE2() • Path tree: type = int(input()); first branch: type == TYPE1 / type != TYPE1; … + long time; second branch: type == TYPE2 • Unsatisfiable: No test case 32

Problem 3: Over-constrained paths result in no test cases • If these branches are independent • Code: type = int(input()); if type == TYPE1: parse_TYPE1(); …; if type == TYPE2: parse_TYPE2() • Path tree: type = int(input()); first branch: type == TYPE1 / type != TYPE1; … + long time; second branch: type == TYPE2 33

Solution 3: Solve constraints optimistically • Code: type = int(input()); if type == TYPE1: parse_TYPE1(); …; if type == TYPE2: parse_TYPE2() • Path tree: type = int(input()); first branch: type == TYPE1 / type != TYPE1; … + long time; second branch: type == TYPE2 34
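
A minimal sketch of optimistic solving, assuming a brute-force search over a small integer domain as a stand-in for a real SMT solver: first try the full path constraints together with the target branch, and if that is unsatisfiable, solve the target branch alone. The names `solve` and `optimistic_solve` and the values of `TYPE1` and `TYPE2` are hypothetical.

```python
def solve(constraints, domain=range(256)):
    """Brute-force stand-in for an SMT solver over one integer variable."""
    for value in domain:
        if all(c(value) for c in constraints):
            return value
    return None  # unsatisfiable within the domain


def optimistic_solve(path_constraints, target):
    """Try the full path first; on failure, solve only the target branch."""
    full = solve(path_constraints + [target])
    if full is not None:
        return full, "full"
    partial = solve([target])
    if partial is not None:
        return partial, "optimistic"
    return None, "unsat"


TYPE1, TYPE2 = 1, 2              # hypothetical values
path = [lambda t: t == TYPE1]    # branch taken earlier on this path
target = lambda t: t == TYPE2    # branch we want to reach
print(optimistic_solve(path, target))
```

An optimistic result is only a candidate input: it may not actually reach the target branch, so it is handed to the fuzzer, which can check it cheaply by running it.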
