Formal Verification of x86 Machine-Code Programs



slide-1
SLIDE 1

Formal Verification of x86 Machine-Code Programs

Computer Architecture and Program Analysis

Shilpi Goel
shigoel@cs.utexas.edu
Department of Computer Science
The University of Texas at Austin

slide-2
SLIDE 2

Software and Reliability

Can we rely on our software systems?

Recent example of a serious bug: CVE-2016-5195, or “Dirty COW”

  • Privilege-escalation vulnerability in Linux
  • E.g., allowed a user to write to files intended to be read-only
  • Copy-on-Write (COW) breakage of private read-only memory mappings
  • Existed since around v2.6.22 (2007) and was fixed on Oct 18, 2016
slide-4
SLIDE 4

Formal Verification of Software: Example 1

Software Formal Verification: proving or disproving that the implementation of a program meets its specification using mathematical techniques

Suppose you needed to count the number of 1s in the binary representation of a natural number (population count).

Specification:

popcountSpec(v): [v: natural number]
  if v <= 0 then
    return 0
  else
    lsb = v & 1
    v = v >> 1
    return (lsb + popcountSpec(v))
  endif

slide-7
SLIDE 7

Formal Verification of Software: Example 1

Implementation:

int popcount_32 (unsigned int v) {
  v = v - ((v >> 1) & 0x55555555);
  v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
  /* + binds tighter than &, so the mask applies to the sum v + (v >> 4) */
  v = (((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24;
  return(v);
}

Source: Sean Anderson’s Bit-Twiddling Hacks

Specification: popcountSpec(v), as defined above.

Do the specification and implementation behave the same way for all inputs?
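Short of a proof, one way to build intuition for this question is to run both definitions side by side. The sketch below is a Python transcription added for illustration, not part of the original slides; the names popcount_spec, popcount_32, and agree_on, and the explicit 32-bit masking that mimics C's unsigned int, are assumptions. Sampling inputs like this is testing, so it cannot cover all 2^32 inputs the way a proof does.

```python
import random

MASK32 = 0xFFFFFFFF  # assumption: C's unsigned int is 32 bits wide


def popcount_spec(v: int) -> int:
    """Recursive specification: count the 1 bits of a natural number."""
    if v <= 0:
        return 0
    return (v & 1) + popcount_spec(v >> 1)


def popcount_32(v: int) -> int:
    """Transcription of the bit-twiddling routine, masked to 32 bits."""
    v = (v - ((v >> 1) & 0x55555555)) & MASK32
    v = ((v & 0x33333333) + ((v >> 2) & 0x33333333)) & MASK32
    v = ((((v + (v >> 4)) & 0x0F0F0F0F) * 0x01010101) & MASK32) >> 24
    return v


def agree_on(inputs) -> bool:
    """True if spec and implementation agree on every given input."""
    return all(popcount_spec(v) == popcount_32(v) for v in inputs)


if __name__ == "__main__":
    samples = [0, 1, 0x80000000, MASK32]
    samples += [random.getrandbits(32) for _ in range(10_000)]
    print(agree_on(samples))
```

A failing sample would be a counterexample; a passing run only says that no counterexample was found among the inputs tried.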

slide-8
SLIDE 8

Formal Verification of Software: Example 2

Suppose you needed to check if a given natural number is a power of 2.

Specification:

isPowerOfTwoSpec(x): [x: natural number]
  if x == 0 then
    return 0
  else
    if x == 1 then
      return 1
    else
      if remainder(x,2) == 0 then
        return isPowerOfTwoSpec(x/2)
      else
        return 0
      endif
    endif
  endif

slide-12
SLIDE 12

Formal Verification of Software: Example 2

Can you trust your specification?

Correctness of isPowerOfTwoSpec:

  1. If isPowerOfTwoSpec(v) returns 1, then there exists a natural number n such that v = 2^n.
  2. If v = 2^n, where n is a natural number, then isPowerOfTwoSpec(v) returns 1.

Implementation:

bool powerOfTwo (long unsigned int v) {
  bool f;
  f = v && !(v & (v - 1));
  return f;
}

Source: Sean Anderson’s Bit-Twiddling Hacks

Do the specification and implementation behave the same way for all inputs?
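The two correctness properties of the specification, and its agreement with the implementation, can at least be checked on a bounded range. The following is a hedged Python sketch, not from the slides: the function names and the choice of bound are assumptions, and a bounded check is evidence rather than a proof for all natural numbers.

```python
def is_power_of_two_spec(x: int) -> int:
    """Recursive specification from the slide, for natural numbers."""
    if x == 0:
        return 0
    if x == 1:
        return 1
    if x % 2 == 0:
        return is_power_of_two_spec(x // 2)
    return 0


def power_of_two(v: int) -> bool:
    """Transcription of the C routine: nonzero and exactly one bit set."""
    return bool(v and not (v & (v - 1)))


def spec_matches_powers(limit: int) -> bool:
    """Properties 1 and 2 together: spec returns 1 exactly on 2^n, for v < limit."""
    powers = {1 << n for n in range(limit.bit_length() + 1)}
    return all((is_power_of_two_spec(v) == 1) == (v in powers)
               for v in range(limit))


def agree_below(limit: int) -> bool:
    """True if spec and implementation agree for all v < limit."""
    return all(bool(is_power_of_two_spec(v)) == power_of_two(v)
               for v in range(limit))
```

For example, spec_matches_powers(1 << 12) exercises both directions of the correctness statement for every v below 4096.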

slide-13
SLIDE 13

Inspection of a Program’s Behavior

  • Testing:
    ✗ Exhaustive analysis is infeasible
  • Formal Verification:
    ✓ Wide variety of techniques
    • Lightweight: e.g., checking if array indices are within bounds
    • Heavyweight: e.g., proving functional correctness

slide-14
SLIDE 14

Example: Pop-Count Program

Functional Correctness: RAX = popcountSpec(v)

Specification function:

popcountSpec(v): [v: unsigned int]
  if v <= 0 then
    return 0
  else
    lsb = v & 1
    v = v >> 1
    return (lsb + popcountSpec(v))
  endif

popcount_64:

  89 fa                 mov  %edi,%edx
  89 d1                 mov  %edx,%ecx
  d1 e9                 shr  %ecx
  81 e1 55 55 55 55     and  $0x55555555,%ecx
  29 ca                 sub  %ecx,%edx
  89 d0                 mov  %edx,%eax
  c1 ea 02              shr  $0x2,%edx
  25 33 33 33 33        and  $0x33333333,%eax
  81 e2 33 33 33 33     and  $0x33333333,%edx
  01 c2                 add  %eax,%edx
  89 d0                 mov  %edx,%eax
  c1 e8 04              shr  $0x4,%eax
  01 c2                 add  %eax,%edx
  48 89 f8              mov  %rdi,%rax
  48 c1 e8 20           shr  $0x20,%rax
  81 e2 0f 0f 0f 0f     and  $0xf0f0f0f,%edx
  89 c1                 mov  %eax,%ecx
  d1 e9                 shr  %ecx
  81 e1 55 55 55 55     and  $0x55555555,%ecx
  29 c8                 sub  %ecx,%eax
  89 c1                 mov  %eax,%ecx
  c1 e8 02              shr  $0x2,%eax
  81 e1 33 33 33 33     and  $0x33333333,%ecx
  25 33 33 33 33        and  $0x33333333,%eax
  01 c8                 add  %ecx,%eax
  89 c1                 mov  %eax,%ecx
  c1 e9 04              shr  $0x4,%ecx
  01 c8                 add  %ecx,%eax
  25 0f 0f 0f 0f        and  $0xf0f0f0f,%eax
  69 d2 01 01 01 01     imul $0x1010101,%edx,%edx
  69 c0 01 01 01 01     imul $0x1010101,%eax,%eax
  c1 ea 18              shr  $0x18,%edx
  c1 e8 18              shr  $0x18,%eax
  01 d0                 add  %edx,%eax
  c3                    retq

slide-15
SLIDE 15

Case Study: Pop-Count Program

(defthm x86-popcount-64-symbolic-simulation
  (implies (and (x86p x86)
                (equal (model-related-error x86) nil)
                (unsigned-byte-p 64 n)
                (equal n (read 'register *rdi* x86))
                (equal *popcount-64-program*
                       (read 'memory
                             (address-range (read 'pc x86)
                                            (len *popcount-64-program*))
                             x86)))
           (equal (read 'register *rax* (x86-run *num-of-steps* x86))
                  (popcountSpec n))))
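To get a feel for what the theorem claims, the popcount_64 listing can be replayed on a very small register model. The Python below is an illustrative sketch and not the ACL2 model: the tuple encoding of instructions, the run function, and the decision to ignore flags and the final retq are all assumptions made here. Each 32-bit write clears the upper half of the destination register, as x86-64 does.

```python
import random

MASK64 = (1 << 64) - 1

# Each entry: (operation, source register or immediate, destination, width).
# Registers use their 64-bit names; width 32 selects the e-register view.
PROGRAM = [
    ("mov", "rdi", "rdx", 32), ("mov", "rdx", "rcx", 32), ("shr", 1, "rcx", 32),
    ("and", 0x55555555, "rcx", 32), ("sub", "rcx", "rdx", 32),
    ("mov", "rdx", "rax", 32), ("shr", 2, "rdx", 32),
    ("and", 0x33333333, "rax", 32), ("and", 0x33333333, "rdx", 32),
    ("add", "rax", "rdx", 32), ("mov", "rdx", "rax", 32), ("shr", 4, "rax", 32),
    ("add", "rax", "rdx", 32), ("mov", "rdi", "rax", 64), ("shr", 32, "rax", 64),
    ("and", 0x0F0F0F0F, "rdx", 32), ("mov", "rax", "rcx", 32),
    ("shr", 1, "rcx", 32), ("and", 0x55555555, "rcx", 32),
    ("sub", "rcx", "rax", 32), ("mov", "rax", "rcx", 32), ("shr", 2, "rax", 32),
    ("and", 0x33333333, "rcx", 32), ("and", 0x33333333, "rax", 32),
    ("add", "rcx", "rax", 32), ("mov", "rax", "rcx", 32), ("shr", 4, "rcx", 32),
    ("add", "rcx", "rax", 32), ("and", 0x0F0F0F0F, "rax", 32),
    ("imul", 0x01010101, "rdx", 32), ("imul", 0x01010101, "rax", 32),
    ("shr", 24, "rdx", 32), ("shr", 24, "rax", 32), ("add", "rdx", "rax", 32),
]

OPS = {
    "mov": lambda d, a: a,
    "and": lambda d, a: d & a,
    "add": lambda d, a: d + a,
    "sub": lambda d, a: d - a,
    "shr": lambda d, a: d >> a,
    "imul": lambda d, a: d * a,
}


def run(program, regs):
    """Apply each instruction in order; returns a new register file."""
    regs = dict(regs)
    for op, src, dst, width in program:
        mask = (1 << width) - 1
        a = (regs[src] if isinstance(src, str) else src) & mask
        d = regs[dst] & mask
        regs[dst] = OPS[op](d, a) & mask  # 32-bit results are zero-extended
    return regs


def check(n: int) -> bool:
    """Does RAX hold the bit count of n, whatever RAX, RCX, RDX held before?"""
    start = {"rdi": n, "rax": random.getrandbits(64),
             "rcx": random.getrandbits(64), "rdx": random.getrandbits(64)}
    return run(PROGRAM, start)["rax"] == bin(n).count("1")
```

Calling check on a handful of 64-bit values is only testing; the theorem above states the same property for every n satisfying (unsigned-byte-p 64 n).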

slide-22
SLIDE 22

Heavyweight Formal Verification

  • Build a mathematical or formal model of programs
  • Prove theorems about this model in order to establish program properties

ISA model

Instruction Set Architecture: interface between hardware and software

  • Defines the machine language
  • Specification of state (registers, memory), machine instructions, instruction encodings, etc.
  • An ISA model specifies the behavior of each machine instruction in terms of effects made to the processor state.
  • All high-level programs compile down to machine-code programs.
  • A program is just a sequence of machine instructions.
  • We can reason about a program by inspecting the cumulative effects of its constituent instructions on the machine state.

slide-23
SLIDE 23

Why Not Use Abstract Machine Models?

slide-25
SLIDE 25

Why x86 Machine-Code Verification?

  • Why not high-level code verification?
    ✗ Sometimes, high-level code is unavailable (e.g., malware)
    ✗ High-level verification frameworks do not address compiler bugs
      ✓ Verified/verifying compilers can help
      ✗ But these compilers typically generate inefficient code
    ✗ Need to build verification frameworks for many high-level languages
  • Why x86?
    ✓ x86 is in widespread use

slide-29
SLIDE 29

Overview

Goal: Specify and verify properties of x86 programs

  • E.g., correctness w.r.t. behavior, security, resource usage, etc.

specify: in terms of states of computation

  • Program property: statement about a program’s behavior
  • One state, set of states, relationship between a set of final & initial states

[Figure: x86 states x86_0 and x86_1]

verify: reason about symbolic executions

  • Program’s computation: how the execution of each instruction transforms one state to another
  • Symbolic Executions: a final (or next) x86 state is described in terms of symbolic updates made to the initial x86 state
  • Allows consideration of many, if not all, possible executions at once
slide-30
SLIDE 30

Formal Tool Used: ACL2 Theorem-Proving System

  • ACL2: A Computational Logic for Applicative Common Lisp
    • Programming Language
    • Mathematical Logic
    • Mechanical Theorem Prover
  • See ACL2 Home Page for more details.
    • Extensive documentation!
  • ACL2 Research Group located at GDC 7S
slide-32
SLIDE 32

x86 ISA Model

Interpreter-Style Operational Semantics: x86 ISA model is a machine-code interpreter written in ACL2’s formal logic

  • x86 State: specifies the components of the ISA
  • Run Function: takes n steps or terminates early if an error occurs
  • Step Function: fetches, decodes, and executes one instruction
  • Instruction Semantic Functions: specify instructions’ behavior

A Run of the x86 Interpreter that executes k instructions:

  x86_0 --Step 1--> x86_1 --Step 2--> … --Step k--> x86_k

slide-33
SLIDE 33

Run Function

Recursively defined interpreter that specifies the x86 model

run(n, x86):
  if n == 0 then
    return x86
  else
    if model-related error encountered then
      return x86
    else
      return run(n - 1, step(x86))
    end if
  end if
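The run function above can be sketched in Python as follows. This is an illustration under stated assumptions, not the ACL2 definition: the state is a plain dictionary, the "error" key stands in for the model-related error field, toy_step is a made-up stand-in for the real step function, and the recursion is written as a loop (same input/output behavior) to stay clear of Python's recursion limit.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]  # hypothetical state: named fields plus an "error" slot


def run(n: int, x86: State, step: Callable[[State], State]) -> State:
    """Take up to n steps; stop early once the state records an error."""
    while n > 0 and x86.get("error") is None:
        x86 = step(x86)
        n -= 1
    return x86


def toy_step(x86: State) -> State:
    """Stand-in step function: bump a counter, fail when it reaches 3."""
    nxt = dict(x86)
    nxt["count"] = x86["count"] + 1
    if nxt["count"] == 3:
        nxt["error"] = "toy fault"
    return nxt
```

With this toy step, run(2, {"count": 0, "error": None}, toy_step) takes both steps, while asking for ten steps stops after the third because the error slot is set.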

slide-34
SLIDE 34

Step Function

State-transition function that corresponds to the execution of a single x86 instruction

step(x86):
  pc = rip(x86)
  [prefixes, opcode, ... , imm] = Fetch-and-Decode(pc, x86)
  case opcode:
    #x00 -> add-semantic-fn(prefixes, ... , imm, x86)
    ...
    ...
    #xFF -> inc-semantic-fn(prefixes, ... , imm, x86)
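The fetch, decode, and dispatch structure of step can be illustrated with a toy machine. Everything in this Python sketch is hypothetical: the two opcodes 0x01 (add an 8-bit immediate to an accumulator) and 0x02 (increment it) are an invented encoding, not x86, and the dictionary-based state and the "error" field are assumptions chosen to keep the example short.

```python
from typing import Any, Dict, Optional, Tuple

State = Dict[str, Any]  # hypothetical fields: "rip", "acc", "mem", "error"


def fetch_and_decode(pc: int, x86: State) -> Tuple[Optional[int], Optional[int], int]:
    """Return (opcode, immediate, instruction length) for the toy encoding."""
    opcode = x86["mem"].get(pc)
    if opcode == 0x01:  # toy ADD: opcode byte followed by an 8-bit immediate
        return opcode, x86["mem"].get(pc + 1, 0), 2
    return opcode, None, 1


def add_semantic_fn(imm, length, x86: State) -> State:
    """Toy ADD: acc = acc + imm (8-bit wraparound), then advance rip."""
    return {**x86, "acc": (x86["acc"] + imm) & 0xFF, "rip": x86["rip"] + length}


def inc_semantic_fn(imm, length, x86: State) -> State:
    """Toy INC: acc = acc + 1 (8-bit wraparound), then advance rip."""
    return {**x86, "acc": (x86["acc"] + 1) & 0xFF, "rip": x86["rip"] + length}


DISPATCH = {0x01: add_semantic_fn, 0x02: inc_semantic_fn}


def step(x86: State) -> State:
    """Fetch and decode at rip, then hand off to the semantic function."""
    pc = x86["rip"]
    opcode, imm, length = fetch_and_decode(pc, x86)
    semantic_fn = DISPATCH.get(opcode)
    if semantic_fn is None:
        return {**x86, "error": ("unsupported opcode", opcode, pc)}
    return semantic_fn(imm, length, x86)
```

Each call returns a new state and leaves its argument untouched, which mirrors the state-transition reading of step on the slide.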

slide-35
SLIDE 35

Instruction Semantic Functions

  • A semantic function describes the effects of executing an instruction.
    • Input: x86 state and decoded parts of the instruction
    • Output: next x86 state
  • Every instruction has its own semantic function.

add-semantic-fn(prefixes, ... , imm, x86):
  operand1 = getOperand1(prefixes, ... , imm, x86)
  operand2 = getOperand2(prefixes, ... , imm, x86)
  resultSum = fix(operand1 + operand2, ...)
  resultFlags = computeFlags(operand1, operand2, resultSum, x86)
  x86 = updateState(resultSum, dst, resultFlags)
  return x86
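As a concrete but simplified instance of the pseudocode above, the Python sketch below models a 32-bit register-to-register ADD. It is an assumption-laden illustration rather than the model's actual semantic function: only CF, ZF, SF, and OF are computed (PF and AF are left out), the state is a dictionary, and the instruction length of 2 bytes matches encodings such as 01 c2 from the pop-count listing.

```python
from typing import Any, Dict, Tuple

MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

State = Dict[str, Any]  # hypothetical fields: "regs", "flags", "rip"


def add32(operand1: int, operand2: int) -> Tuple[int, Dict[str, int]]:
    """Return the 32-bit sum and the CF, ZF, SF, OF flags it produces."""
    raw = operand1 + operand2
    result = raw & MASK32  # plays the role of fix(...) in the pseudocode
    flags = {
        "cf": int(raw > MASK32),              # unsigned carry out of bit 31
        "zf": int(result == 0),
        "sf": (result >> 31) & 1,
        # signed overflow: both operands differ in sign from the result
        "of": int(((operand1 ^ result) & (operand2 ^ result) & SIGN32) != 0),
    }
    return result, flags


def add_semantic_fn(dst: str, src: str, x86: State) -> State:
    """dst = dst + src on 32-bit views; update flags and the instruction pointer."""
    operand1 = x86["regs"][dst] & MASK32
    operand2 = x86["regs"][src] & MASK32
    result_sum, result_flags = add32(operand1, operand2)
    return {
        **x86,
        "regs": {**x86["regs"], dst: result_sum},
        "flags": {**x86["flags"], **result_flags},
        "rip": x86["rip"] + 2,  # assumed 2-byte register-register encoding
    }
```

For example, add32(0x7FFFFFFF, 1) yields 0x80000000 with SF and OF set and CF clear, the usual signed-overflow case.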

slide-36
SLIDE 36

Obtaining the x86 ISA Specification

[Figure: vendor manuals, ~3000 pages and ~3400 pages]

Running tests on x86 machines

__asm__ volatile
  ("stc\n\t"                  // Set CF.
   "mov $0, %%eax\n\t"        // Set EAX = 0.
   "mov $0, %%ebx\n\t"        // Set EBX = 0.
   "mov $0, %%ecx\n\t"        // Set ECX = 0.
   "mov %4, %%ecx\n\t"        // Set CL = rotate_by.
   "mov %3, %%edx\n\t"        // Set EDX = old_cf = 1.
   "mov %2, %%eax\n\t"        // Set EAX = num.
   "rcl %%cl, %%al\n\t"       // Rotate AL by CL.
   "cmovb %%edx, %%ebx\n\t"   // Set EBX = old_cf if CF = 1.
                              // Otherwise, EBX = 0.
   "mov %%eax, %0\n\t"        // Set res = EAX.
   "mov %%ebx, %1\n\t"        // Set cf = EBX.
   : "=g"(res), "=g"(cf)
   : "g"(num), "g"(old_cf), "g"(rotate_by)
   : "rax", "rbx", "rcx", "rdx");
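A test like the one above is useful when its result can be compared with what a model predicts. As a hedged illustration of the model side, here is a small Python function for RCL on an 8-bit operand. The function name rcl8 is an assumption, only the result byte and CF are modeled (OF is omitted), and the count handling follows the usual description of RCL for byte operands: the count is masked to 5 bits and then reduced modulo 9, because the rotation runs through 8 data bits plus the carry.

```python
from typing import Tuple


def rcl8(value: int, count: int, cf: int) -> Tuple[int, int]:
    """Rotate an 8-bit value left through the carry flag; return (value, cf)."""
    value &= 0xFF
    cf &= 1
    temp_count = (count & 0x1F) % 9  # 9 = 8 data bits + the carry bit
    for _ in range(temp_count):
        msb = (value >> 7) & 1
        value = ((value << 1) | cf) & 0xFF  # old carry enters at bit 0
        cf = msb                            # old top bit becomes the carry
    return value, cf
```

With the carry set, as the stc in the assembly arranges, rcl8(0x80, 1, 1) gives (0x01, 1): the top bit moves into CF and the old CF enters at bit 0. Comparing such predictions with the res and cf values observed on hardware is one way to exercise a model.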

slide-37
SLIDE 37

x86 State

Focus: Intel’s 64-bit mode

[Figure 3-2. 64-Bit Mode Execution Environment. Basic program execution registers: sixteen 64-bit general-purpose registers, six 16-bit segment registers, the 64-bit RFLAGS register, and the 64-bit RIP (instruction pointer) register; address space from 0 to 2^64 - 1. FPU registers: eight 80-bit floating-point data registers, 16-bit control, status, and tag registers, an 11-bit opcode register, and 64-bit FPU instruction pointer and FPU data (operand) pointer registers. MMX registers: eight 64-bit registers. XMM registers: sixteen 128-bit registers. 32-bit MXCSR register.]

Source: Intel Manuals

slide-38
SLIDE 38

x86 State

[Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode. Shows the control registers (CR0 to CR4, CR8), XCR0 (XFEM), and RFLAGS; the descriptor-table registers GDTR, LDTR, and IDTR and the task register (TR); the Global, Local, and Interrupt Descriptor Tables with their segment descriptors, call gates, interrupt and trap gates, and TSS descriptors; the task-state segment (TSS) and interrupt stack table (IST); and the paging structures (PML4, page-directory-pointer table, page directory, page table) that map linear addresses to physical addresses. The page mapping example is for 4-KByte pages and 40-bit physical address size.]

Focus: Intel’s 64-bit mode

Source: Intel Manuals

slide-39
SLIDE 39

Model Validation

How can we know that our model faithfully represents the x86 ISA?

Validate the model to increase trust in the applicability of formal analysis

slide-40
SLIDE 40

Symbolic Execution


slide-41
SLIDE 41

Supporting Symbolic Execution

add %edi, %eax
  1. read instruction from mem
  2. read operands
  3. write sum to eax
  4. write new value to flags
  5. write new value to pc

je 0x400304
  1. read instruction from mem
  2. read flags
  3. write new value to pc

Rules (theorems) describing interactions between these reads and writes to the x86 state enable symbolic execution of programs.
slide-44
SLIDE 44

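Read-over-Write Theorem #1: memory non-interference

[Figure: memory with two distinct locations i and j, where j holds y. In program order, a write Wi(x) to location i is followed by a read Rj of location j; the read still returns y.]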

slide-47
SLIDE 47

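Read-over-Write Theorem #2: overlap

[Figure: memory with location i. In program order, a write Wi(x) to location i is followed by a read Ri of the same location; the read returns x.]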
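The two read-over-write theorems can be stated as executable properties over a simple memory model. The Python below is a sketch under assumptions, not the ACL2 lemmas: memory is a dictionary from addresses to bytes, unwritten addresses read as 0, and the names read, write, read_over_write_1, and read_over_write_2 are chosen here for illustration.

```python
from typing import Dict

Memory = Dict[int, int]  # hypothetical model: address -> byte


def write(addr: int, val: int, mem: Memory) -> Memory:
    """Return a new memory in which addr holds val (as a byte)."""
    new = dict(mem)
    new[addr] = val & 0xFF
    return new


def read(addr: int, mem: Memory) -> int:
    """Read one byte; unwritten addresses read as 0 in this sketch."""
    return mem.get(addr, 0)


def read_over_write_1(i: int, j: int, x: int, mem: Memory) -> bool:
    """Non-interference: a write to i does not change a read from j != i."""
    return i == j or read(j, write(i, x, mem)) == read(j, mem)


def read_over_write_2(i: int, x: int, mem: Memory) -> bool:
    """Overlap: reading i right after writing x to i returns x."""
    return read(i, write(i, x, mem)) == (x & 0xFF)
```

In a theorem prover these are proved once for all addresses and values; here they can only be evaluated on chosen arguments, for instance read_over_write_1(0, 1, 7, {1: 42}).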

slide-53
SLIDE 53

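Write-over-Write Theorem #1: independent writes commute safely

[Figure: memory with two distinct locations i and j. Writing x to i and then y to j in program order (Wi(x), then Wj(y)) yields the same memory as performing the writes in the opposite order (Wj(y), then Wi(x)).]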

slide-58
SLIDE 58

Write-over-Write Theorem #2: visibility of writes

[Figure: memory with location i. Writing x and then y to the same location i in program order (Wi(x), then Wi(y)) yields the same memory as the single write Wi(y).]
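The two write-over-write theorems can likewise be phrased as checks on a toy memory. This Python sketch makes the same kind of assumptions as any small model: memory is a dictionary from addresses to bytes, write returns a fresh dictionary, and the property names are invented for this example.

```python
from typing import Dict

Memory = Dict[int, int]  # hypothetical model: address -> byte


def write(addr: int, val: int, mem: Memory) -> Memory:
    """Return a new memory in which addr holds val (as a byte)."""
    new = dict(mem)
    new[addr] = val & 0xFF
    return new


def write_over_write_1(i: int, j: int, x: int, y: int, mem: Memory) -> bool:
    """Independent writes commute: for i != j the order does not matter."""
    return i == j or write(j, y, write(i, x, mem)) == write(i, x, write(j, y, mem))


def write_over_write_2(i: int, x: int, y: int, mem: Memory) -> bool:
    """Visibility: a later write to i hides an earlier write to i."""
    return write(i, y, write(i, x, mem)) == write(i, y, mem)
```

Rules of this shape let a prover reorder and collapse nested writes so that two descriptions of the same final state become syntactically equal.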

slide-59
SLIDE 59

Symbolic Execution

(implies (preconditions loc val x86)
         (let ((old-rbx (read 'register *rbx* x86))
               (old-pc (read 'pc x86)))
           (equal (x86-run (clk) x86)
                  (write 'register *rax* old-rbx
                         (write 'pc (+ 18 old-pc)
                                (write 'memory loc val x86))))))

These read-over-write and write-over-write lemmas operate on symbolic expressions that describe the program’s behavior. Also, we can project out relevant parts of the resulting state.
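To illustrate what "projecting out" a part of a symbolic state can look like, the Python sketch below represents a final state as nested write terms over an unknown initial state and resolves reads by walking those terms. It is a deliberately simplified assumption-based model, not ACL2's rewriter: states and values are plain tuples and strings, and two locations are treated as different whenever their names differ, whereas a real proof needs preconditions to rule out aliasing.

```python
from typing import Any

# A symbolic state is either the name of an initial state (a string) or a
# term ("write", field, key, value, older_state).


def write(field: str, key: Any, value: Any, state: Any) -> tuple:
    """Build the term for 'state updated with value at (field, key)'."""
    return ("write", field, key, value, state)


def read(field: str, key: Any, state: Any) -> Any:
    """Resolve a read by walking the nested writes, newest first."""
    while isinstance(state, tuple):
        _, w_field, w_key, w_value, older = state
        if (w_field, w_key) == (field, key):
            return w_value  # read-over-write, same location
        state = older       # read-over-write, different location
    return ("read", field, key, state)  # untouched: value from the initial state


# The final state from the theorem above, written as a term over "x86".
old_rbx = ("read", "register", "rbx", "x86")
old_pc = ("read", "pc", None, "x86")
final = write("register", "rax", old_rbx,
              write("pc", None, ("+", 18, old_pc),
                    write("memory", "loc", "val", "x86")))
```

Here read("register", "rax", final) evaluates to the term for the initial RBX, and a read of a register the program never wrote falls through to the initial state.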

slide-60
SLIDE 60

Conclusions


slide-61
SLIDE 61

What I Haven’t Talked About Today…

  1. How to prove theorems using a mechanical theorem prover
    • Useful to reason about both hardware and software
    • Fall’16 Grad-level Course: Programming Languages
    • Spring’17 Grad-level Course: Recursion and Induction
  2. Supervisor-mode features of the x86 ISA
    • Useful for developing and analyzing kernel programs
    • An advanced architecture class
    • An OS class
slide-62
SLIDE 62

Opportunities for Future Research

  • Operating System Verification: detect reliance on non-portable or undefined behaviors
  • User-friendly Program Analysis: automate the discovery of preconditions
  • Multi-process/threaded Program Verification: reason about concurrency-related issues
  • Reasoning about the Memory System: determine if caches are (mostly) transparent, as intended
  • Firmware Verification: formally specify software/hardware interfaces
  • Micro-architecture Verification: x86 ISA model serves as a build-to specification

slide-63
SLIDE 63

Resources


  • See ACL2 Home Page
  • Talk to people on GDC 7S
  • See some publications

We have exciting research and engineering projects in this area! Please feel free to email if you want to know more.

slide-64
SLIDE 64

Publications

  • Shilpi Goel, Warren A. Hunt, Jr., and Matt Kaufmann. Abstract Stobjs and Their Application to ISA Modeling. In Proceedings of the ACL2 Workshop 2013, EPTCS 114, pp. 54–69, 2013.
  • Shilpi Goel and Warren A. Hunt, Jr. Automated Code Proofs on a Formal Model of the x86. In Verified Software: Theories, Tools, Experiments (VSTTE’13), volume 8164 of Lecture Notes in Computer Science, pages 222–241. Springer Berlin Heidelberg, 2014.
  • Shilpi Goel, Warren A. Hunt, Jr., Matt Kaufmann, and Soumava Ghosh. Simulation and Formal Verification of x86 Machine-Code Programs That Make System Calls. In Proceedings of the 14th Conference on Formal Methods in Computer-Aided Design (FMCAD’14), pages 18:91–98, 2014.
  • Shilpi Goel, Warren A. Hunt, Jr., and Matt Kaufmann. Engineering a Formal, Executable x86 ISA Simulator for Software Verification. In Provably Correct Systems (ProCoS), 2015.
  • Shilpi Goel. Formal Verification of Application and System Programs Based on a Validated x86 ISA Model. Ph.D. Dissertation, The University of Texas at Austin, 2016.

slide-65
SLIDE 65

[Source Code] GitHub

[Documentation] x86isa in the ACL2+Community Books Manual