Using LLVM to guarantee program integrity (Simon Cook)



SLIDE 1

Using LLVM to guarantee program integrity

Simon Cook

SLIDE 2

Background

  • Compiling for security is becoming increasingly important
  • Finding bugs through AddressSanitizer, MemorySanitizer, etc.
  • Research programs such as LADA
  • Use of security-enhancing hardware can be added to existing programs by extending its use in the compiler

SLIDE 3

Topics to Discuss

  • Hardware
  • C attributes
  • Clang/Sema, Clang/Codegen
  • LLVM Optimization Tweaks
  • Instruction Lowering/Selection
  • AsmPrinting
  • Creating post-link tools using MC
SLIDE 4

What are we trying to protect?

  • Instruction integrity
      • Detection of any modification to program code at runtime
  • Control flow integrity
      • Ensuring that calls/branches only go to known locations and that return values are correct
  • If either of these is violated, the hardware should trap as soon as possible

SLIDE 5

Encoding Instructions: Hardware

Each instruction becomes dependent on the previous one.

Given an instruction $I_1$ and internal state $S_0$, we can produce the encoded instruction $E_1$ and output state $S_1$:

$$\mathrm{encode}(I_1, S_0) \to (E_1, S_1)$$

    add r0, r1  →  0xbeef

At run time, the hardware can use the same state and, using the encoded instruction, reproduce the original instruction:

$$\mathrm{decode}(E_1, S_0) \to (I_1, S_1)$$

    0xbeef  →  add r0, r1
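The encode/decode pair above can be illustrated with a toy model. This is only a sketch: the slides do not say how the hardware combines an instruction with its state, so the XOR masking, the `next_state` update rule and the 16-bit widths are all assumptions made for illustration.

```python
MASK = 0xFFFF  # toy model: 16-bit instruction words and a 16-bit state


def next_state(state, instr):
    # Hypothetical state update: folds the plain instruction into the state,
    # so every later instruction depends on this one.
    return ((state * 31) ^ instr ^ 0x9E37) & MASK


def encode(instr, state):
    """Return (encoded word, output state) for one plain instruction word."""
    return (instr ^ state) & MASK, next_state(state, instr)


def decode(encoded, state):
    """Return (original word, output state) given the same input state."""
    instr = (encoded ^ state) & MASK
    return instr, next_state(state, instr)


S0 = 0x1234                      # arbitrary starting state
E1, S1 = encode(0x919A, S0)      # 0x919a is the first word of lsli in the slides
I1, S1_again = decode(E1, S0)
print(hex(I1), S1 == S1_again)   # prints: 0x919a True
```

Decoding with any other input state yields a different word, which is what lets hardware of this kind notice that something upstream was changed.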

SLIDE 6

Encoding a Function

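int foo(int x, int y) { return (4*x) + (y&5); }

The compiled function consists of four instructions, $I_1$ to $I_4$:

    lsli $r10, $r2, 2        919a 4000
    andi $r13, $r3, 5        5d87 4002
    add  $r2, $r13, $r10     aa82 0900
    jmp  $r0                 0050

    lsli    0001 0203
    andi    0405 0607
    add     0809 0a0b
    jmp     0c0d

Each instruction is encoded using the state left by the previous one:

$$\mathrm{encode}(I_1, S_0) \to E_1, \quad \mathrm{encode}(I_2, S_1) \to E_2, \quad \mathrm{encode}(I_3, S_2) \to E_3, \quad \mathrm{encode}(I_4, S_3) \to E_4$$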

SLIDE 7

Encoding Branches

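int foo(int x, int y, bool z) { return z ? x : y; }

    ; BB#0:
        movi $r10, 0                 809e 4000
        bne  .LBB0_2, $r4, $r10      e2c6 0100
    ; BB#1:
        mov  $r2, $r3                9812
    .LBB0_2:
        jmp  $r0                     0050

The instructions are $I_1$ to $I_4$. The final jmp ($I_4$) has a single encoding $E_4$, but it can be reached with two different states: $S_3$ when falling through from BB#1, and a different state, written $S_b$ here, when arriving via the branch in BB#0:

$$\mathrm{encode}(I_4, S_3) \to E_4 \qquad \mathrm{encode}(I_4, S_b) \to E_4$$

For two cases, this may be solvable, but not for blocks with many direct predecessors.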

SLIDE 8

Encoding Branches

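int foo(int x, int y, bool z) { return z ? x : y; }

    ; BB#0:
        movi $r10, 0                 809e 4000
        bne  .LBB0_2, $r4, $r10      e2c6 0100
        _correction_value_           ....
    ; BB#1:
        mov  $r2, $r3                9812
    .LBB0_2:
        jmp  $r0                     0050

The stream is now $I_1$, $I_2$, $C$, $I_3$, $I_4$, where $C$ is a correction value placed after the branch. At the branch target, $C$ takes the place of the mismatched state:

$$\mathrm{encode}(I_4, S_3) \to E_4 \qquad \mathrm{encode}(I_4, C) \to E_4$$

The correction value is itself stored in encoded form ($E_C$) next to the branch:

$$\mathrm{encode}(I_2, S_1) \to E_2 \qquad \mathrm{encode}(C, S_1) \to E_C$$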

SLIDE 9

Function Calls

int foo(int x) { return bar(x+2); }

The instructions are $I_1$ to $I_7$:

    subi $r1, $r1, 2         4a16
    stw  [$r1, 0], $r0       4038
    addi $r2, $r2, 2         9214
    bal  bar, $r0            00c2 0000
    ldw  $r0, [$r1, 0]       0828
    addi $r1, $r1, 2         4a14
    jmp  $r0                 0050

  • Calling bar pushes state ($S_4$) to the encoding stack
  • Returning pops this value, so calls can be treated as part of the same BB

SLIDE 10

Scaling up to an entire program

foo.c bar.c baz.c

SLIDE 11

Clang: -mencode-instructions?

Pros

  • Easy to enable, one flag enables system for entire CU

Cons

  • ABI break, flag required across entire project
  • Only affects C, assembly still needs patching
  • Potential concerns about code size

In the end we decided not to go down this route

SLIDE 12

Clang: __attribute__((protected))

Pros

  • Per function granularity
  • Lower cost overhead for “non-secure” functions
  • ABI change is limited to those functions it was requested for

Cons

  • Only affects C, assembly still needs patching
  • Risk of user neglecting to add attribute to all declarations of a function

SLIDE 13

Clang Function Attribute

  • Added as a TypeAttr
  • We want to add error checking, as pointers to protected functions are not the same as pointers to unprotected functions
  • Extend FunctionType to support having protected as a property
  • For calls, add protected as a bit in ExtInfo
  • This is not the same as a different calling convention, as we use different CCs and want to turn this on independently
  • For CodeGen, we map this down to an LLVM function attribute “protected”

SLIDE 14

int (*__attribute__((protected)))()

  • Function pointers present a challenge
  • We need to know what $S_0$ the target function is expecting
  • If $S_0$ is based on the address of the function, we have no problem…
  • … otherwise we need to calculate it
  • Could use the same for each function? Defeats security benefits.
  • Calculate all possible call targets? Not necessarily possible.
  • User should know, let’s ask them!
  • Attribute becomes __attribute__((protected("somestring")))
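One plausible use of the string argument is to derive a shared starting state for every function in the same group, so that a call through a function pointer of that type knows which state to expect. The sketch below assumes the string is reduced to 16 bits by hashing. The hash choice and the function name `group_id` are illustrative, not taken from the slides.

```python
import hashlib


def group_id(group_name):
    """Reduce a group string to a 16-bit value (assumed derivation)."""
    digest = hashlib.sha256(group_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


# Two functions declared with protected("somestring") land in the same group,
# so a pointer of that type can target either of them.
handler_a_group = group_id("somestring")
handler_b_group = group_id("somestring")
assert handler_a_group == handler_b_group
assert 0 <= handler_a_group <= 0xFFFF
```

With only 16 bits, distinct strings can collide, so a real scheme would need to decide whether collisions are acceptable or must be diagnosed.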
SLIDE 15

Changes to Middle-End LLVM

  • None, really…
  • … except one small change to the inliner
      • Avoid inlining secure functions into non-secure
      • Merging non-secure into secure is generally safe
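The inliner rule amounts to a single predicate. The sketch below is a model of the decision only, not LLVM's inliner API, and the function name is hypothetical.

```python
def may_inline(callee_protected, caller_protected):
    """Return False only when a secure callee would end up in a non-secure caller."""
    return not (callee_protected and not caller_protected)


assert may_inline(callee_protected=False, caller_protected=False)
assert may_inline(callee_protected=False, caller_protected=True)   # non-secure into secure
assert may_inline(callee_protected=True, caller_protected=True)
assert not may_inline(callee_protected=True, caller_protected=False)
```

Blocking only that one combination keeps protected code from silently losing its protection, while leaving all other inlining decisions to the usual cost model.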
SLIDE 16

Instruction Selection

  • Update call target nodes with custom flag field
  • Flag field contains:
      • Bit indicating whether function expects security
      • 16-bit representation of group name

    let isCall = 1 in
    def JAL : Inst_rrr<0x2, 0x9, (outs),
                       (ins i64imm:$flags, GR64:$rD, GR64:$rB),
                       "jal\t$rD, $rB",
                       [(AAPcall timm:$flags, GR64:$rD, GR64:$rB)]>;
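The flag operand can be pictured as a small bit field. The layout below (group in the low 16 bits, the security bit directly above it) is an assumption. The slides list the two components but not where they sit in the immediate.

```python
GROUP_MASK = 0xFFFF        # assumed: low 16 bits hold the group representation
PROTECTED_BIT = 1 << 16    # assumed: next bit says the callee expects security


def pack_flags(protected, group):
    """Build the immediate carried on a call node."""
    if not 0 <= group <= GROUP_MASK:
        raise ValueError("group must fit in 16 bits")
    return (PROTECTED_BIT if protected else 0) | group


def unpack_flags(flags):
    """Return (protected, group) from a packed flag value."""
    return bool(flags & PROTECTED_BIT), flags & GROUP_MASK


flags = pack_flags(True, 0xBEEF)
print(hex(flags), unpack_flags(flags))  # prints: 0x1beef (True, 48879)
```

Carrying the value as an immediate operand keeps it attached to the call through instruction selection, so later passes can read it without consulting the original function declaration.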

SLIDE 17

Encoding Control Flow I

  • Just before emission, SecurityAnalysisPass:
      • Prepares a function for annotation
      • Builds lists of branches/calls/jump tables
      • Adds placeholders for correction values
      • Generates report on code size impact

    ===--- CF encoding statistics for 'main' ---===
    Bytes added: 10
    Words added: 5
    NOP gaps added: 3
    Enable/Disable insns added: 1

SLIDE 18

.debug_secure Record Format

  • Start function:  1 | Function Start Address | Group
  • End function:    2 | Function End Address
  • Direct Call:     6 | Call Site | Call Target
  • Jump Table:     11 | Count | Target 1 | Target 2
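A post-link tool would need to read these records back. The parser below is a sketch under a strong assumption: every field, including the tag, is a 32-bit little-endian word. The slides give the tags and field order but not the field widths.

```python
import struct

START, END, CALL, JUMP_TABLE = 1, 2, 6, 11  # record tags from the slide


def parse_records(data):
    """Parse records from bytes; all fields assumed to be 32-bit little-endian."""
    words = [w for (w,) in struct.iter_unpack("<I", data)]
    records, i = [], 0
    while i < len(words):
        tag = words[i]
        if tag == START:
            records.append(("start", words[i + 1], words[i + 2]))  # address, group
            i += 3
        elif tag == END:
            records.append(("end", words[i + 1]))                  # address
            i += 2
        elif tag == CALL:
            records.append(("call", words[i + 1], words[i + 2]))   # site, target
            i += 3
        elif tag == JUMP_TABLE:
            count = words[i + 1]
            records.append(("jump_table", words[i + 2:i + 2 + count]))
            i += 2 + count
        else:
            raise ValueError(f"unknown record tag {tag}")
    return records


# Hypothetical section contents describing one function with one direct call.
blob = (struct.pack("<3I", START, 0x8000000, 7)
        + struct.pack("<3I", CALL, 0x8000006, 0x8000100)
        + struct.pack("<2I", END, 0x800000E))
for record in parse_records(blob):
    print(record)
```

The jump-table record is the only variable-length one, which is why it carries an explicit count ahead of its targets.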

SLIDE 19

Encoding Control Flow II

  • AsmPrinterHandler – Adds hooks to assembly printing
  • Used by us for adding labels/emitting encoding at end of module
  • beginInstruction
  • endInstruction
  • beginFunction
  • endFunction
  • endModule
SLIDE 20

Resolving Values

  1. Reconstruct the control flow graph of all secure functions
  2. Assign correction values/$S_0$ to all functions/groups
  3. Encode each basic block, noting state of each reloc
  4. Validate all values are known
  5. Fill in relocations
  6. Writeback
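Steps 1 and 4 can be sketched on top of function ranges and direct-call records. The code below is illustrative only: the function name, the data layout and the treatment of end addresses as exclusive are assumptions, and the addresses for `bar` are made up.

```python
def build_call_graph(functions, calls):
    """functions: name -> (start, end), end exclusive; calls: list of (site, target)."""

    def owner(addr):
        # Find the secure function whose address range contains addr.
        for name, (start, end) in functions.items():
            if start <= addr < end:
                return name
        return None

    graph = {name: set() for name in functions}
    unresolved = []
    for site, target in calls:
        caller, callee = owner(site), owner(target)
        if caller is None or callee is None:
            unresolved.append((site, target))
        else:
            graph[caller].add(callee)
    if unresolved:
        # Mirrors the validation step: refuse to continue with unknown values.
        raise ValueError(f"calls outside known secure functions: {unresolved}")
    return graph


functions = {"foo": (0x8000000, 0x800000E), "bar": (0x8000100, 0x8000120)}
calls = [(0x8000006, 0x8000100)]          # foo calls bar
graph = build_call_graph(functions, calls)
print(graph["foo"], graph["bar"])         # prints: {'bar'} set()
```

Failing early on an unresolved call keeps the tool from writing back a binary in which some relocation was never assigned a value.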
SLIDE 21

End result

    simon@shadowfax$ llvm-objdump -d a.out

    a.out:  file format ELF32-aap

    Disassembly of section .text:
    Section has correction values, printing real instructions
    foo:
     8000000: [8f39] 91 9a 40 00    lsli $r10, $r2, 2
     8000004: [81ca] 5d 87 40 02    andi $r13, $r3, 5
     8000008: [053b] aa 82 09 00    add $r2, $r13, $r10
     800000c: [93e4] 00 50          jmp $r0

SLIDE 22

Thank you