Weird machines: a model for code-reuse attacks (Sergey Bratus)

Mar 20, 2024

SLIDE 1

Weird machines: a model for code-reuse attacks

Sergey Bratus, Rebecca Shapiro, Anna Shubina
Dartmouth Trust Lab

SLIDE 2

Outline

Code re-use: unexpected computation, programming models
Containing computation: coarse intent-based ABI-level semantics / region-describing types
LangSec: co-design of data & code, via constrained input handlers & input languages

SLIDE 3

Terminology

"Code {re,ab}use" is unexpected computation
Classes of attacks are more: they are unexpected programming models
Essence of code reuse: code becomes part of an emergent programming model

SLIDE 4

Input data is the program

Strings are programs for regexps (DFAs)
Tape is the program for Turing machines
"Everything is an interpreter" (Greg Morrisett)
"Any complex enough input is indistinguishable from bytecode"
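The first point can be made concrete with a minimal sketch. The language chosen here (binary strings with an even number of 1s) and all names are illustrative assumptions, not taken from the slides: the transition table is the fixed "machine", and the input string is the only "program" it ever runs.

```python
# Toy DFA: the transition table is the machine, the input string is its program.
# Hypothetical example language: binary strings with an even number of 1s.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}


def run_dfa(program, start="even", accepting=("even",)):
    """Execute `program` one symbol at a time; each symbol selects a state change."""
    state = start
    for symbol in program:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # symbol outside the alphabet: reject
        state = TRANSITIONS[key]
    return state in accepting


if __name__ == "__main__":
    for s in ("1001", "1011", ""):
        print(repr(s), run_dfa(s))
```

Nothing in `run_dfa` changes between runs; only the data does, which is the sense in which the input "programs" the code.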

SLIDE 5

Invisible machines: stack

Standard function prologues & epilogues are an automaton distributed through code
Data fragments on the stack are its programs
Implements the control flow graph
Aleph1 > Solar Designer > Newsham > gera > Nergal > ...
Return-oriented programming
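A toy model of the idea, assuming a made-up three-gadget "binary" with hypothetical addresses (this is a simulation in Python, not machine code): the dispatch loop plays the role of the function epilogue's return, and the attacker controls nothing but the stack contents.

```python
# Toy model of return-oriented programming. Gadgets stand in for short pieces
# of existing code that end in a return; addresses are hypothetical.
def pop_a(m):
    m["a"] = m["stack"].pop()  # gadget: pop next stack word into register a


def pop_b(m):
    m["b"] = m["stack"].pop()  # gadget: pop next stack word into register b


def add_ab(m):
    m["a"] += m["b"]           # gadget: a = a + b


GADGETS = {0x1000: pop_a, 0x2000: pop_b, 0x3000: add_ab}


def run(stack_image):
    """Repeatedly 'return' to whatever address is on top of the stack."""
    # The list is reversed so that list.pop() yields stack_image[0] first.
    m = {"a": 0, "b": 0, "stack": list(reversed(stack_image))}
    while m["stack"]:
        addr = m["stack"].pop()  # models the epilogue's return
        if addr not in GADGETS:
            break                # models a crash on a bad return address
        GADGETS[addr](m)
    return m["a"]


if __name__ == "__main__":
    # The "program" is pure data: gadget addresses interleaved with operands.
    print(run([0x1000, 40, 0x2000, 2, 0x3000]))
```

The stack image `[0x1000, 40, 0x2000, 2, 0x3000]` computes 40 + 2 without adding any new code to the gadget table.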

SLIDE 6

Invisible machines: heap

Heap management code is a machine, heap metadata its programs
"Once upon a free" (Phrack 57:8), "Vudo malloc tricks" (Phrack 58:9)
ISA: aa4bmo, chunk->flink->blink = chunk->blink
Configured via a series of mallocs: "Heap Feng Shui" (Sotirov 2007), ..., starvation-based machines (Gorenc et al., Recon.cx 2015)
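The "instruction" named above can be sketched on a toy word-addressed memory. The field offsets, addresses, and values are assumptions for illustration, and only the single write shown on the slide is modeled (real allocators also perform a mirrored write and, in modern versions, integrity checks).

```python
# Toy model of the unlink write primitive: chunk->flink->blink = chunk->blink.
FLINK, BLINK = 0, 1  # hypothetical word offsets of the two link fields


def unlink(mem, chunk):
    """Perform the slide's single write using whatever the metadata says."""
    fd = mem[chunk + FLINK]
    bk = mem[chunk + BLINK]
    mem[fd + BLINK] = bk  # write lands wherever the metadata points


def demo():
    mem = {addr: 0 for addr in range(64)}
    chunk = 8    # hypothetical address of the chunk being unlinked
    target = 40  # hypothetical address the attacker wants to overwrite
    # Corrupted metadata acts as the operands of the "instruction".
    mem[chunk + FLINK] = target - BLINK  # so that fd + BLINK == target
    mem[chunk + BLINK] = 0xBEEF          # value to be written
    unlink(mem, chunk)
    return mem[target]


if __name__ == "__main__":
    print(hex(demo()))
```

The allocator code is unchanged and "correct"; the metadata alone decides where the write goes and what it writes.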

SLIDE 7

Invisible machines: signals

Sigreturn-oriented programming (Bosman & Bos, 2014): "portable shellcode" via sigreturn structs
Counterfeit object-oriented programming (COOP, 2015)
"Interrupt-oriented programming" (Tan et al., 2014): "bugdoor" via nesting MSP430 interrupts; fixed-entry, timed-exit "un-gadgets"

SLIDE 8

Symbol-related machines

Dynamic linker (cf. Nergal's RTLD gadget)
Ld.so relocation (Shapiro et al., 2013; cf. LOCREATE)
ELF relocation entries are Turing-complete "bytecode"
DWARF exception handler (helpfully a part of most processes) is Turing-complete (Oakley et al., 2012)
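A rough sketch of why relocation entries read like bytecode. The three kinds below are simplified, hypothetical stand-ins loosely modeled on common ELF relocation semantics (base plus addend, symbol plus addend, copy from a symbol's address); this is not the real ELF format, and it does not demonstrate Turing-completeness, only that the loader is a loop interpreting metadata.

```python
# Toy relocation "interpreter". Kinds, memory layout, and the symbol table
# are illustrative assumptions, not the actual ld.so implementation.
def apply_relocations(mem, symbols, relocs, base=0):
    """Process entries in order, as a loader would; each entry is one 'instruction'."""
    for kind, offset, sym, addend in relocs:
        if kind == "RELATIVE":    # mem[offset] = base + addend
            mem[offset] = base + addend
        elif kind == "SYM64":     # mem[offset] = symbol value + addend
            mem[offset] = symbols[sym] + addend
        elif kind == "COPY":      # mem[offset] = word stored at the symbol's address
            mem[offset] = mem[symbols[sym]]
        else:
            raise ValueError(f"unknown relocation kind: {kind}")
    return mem


if __name__ == "__main__":
    mem = [0] * 8
    symbols = {"x": 5}  # hypothetical symbol whose value is address 5
    relocs = [
        ("RELATIVE", 5, None, 7),  # store the constant 7 at address 5
        ("COPY", 0, "x", 0),       # copy the word at x into address 0
        ("SYM64", 1, "x", 37),     # store 5 + 37 at address 1
    ]
    print(apply_relocations(mem, symbols, relocs))
```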

Diff. between execve() & ld.so:

"All you need is GOT" (Bangert et al., 29c3)

SLIDE 9

The weirdest machine (possibly)

x86 MMU is Turing-complete on GDT + IDT + TSS + page tables (Bangert et al., 2013)
Arbitrary computation can be compiled into a combination of these tables
No instruction is successfully dispatched
#PF & #DF alternate, acting as clock cycles

SLIDE 10

The "weird machine" upshot

Code re-use/code abuse is possible whenever (meta)data guides code into actions
Code re-use likely has an emergent programming model associated with it (a WM)
Data to drive it need not be ill-formed or corrupt memory

SLIDE 11

A verification problem

SLIDE 12

Ab Ovo

Proving correctness from axioms, by deductive construction

Cf. with construction of types ~ proofs ~ programs

SLIDE 13

P { Q } R

P: precondition, Q: code, R: result

SLIDE 14

The root of weirdness?

Assume P { Q } R holds
If P' is not quite right, what does Q do under P'?

SLIDE 15

The root of weirdness?

What can we make "correct" Q compute by varying P it wasn't verified for?
What is "∆R" given "∆P" for a Q?

[Diagram labels: P, ∆P, ∆R]
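A small illustration of ∆P and ∆R, under assumptions of our own choosing: Q is a one-line table lookup, the intended precondition P is `0 <= i < len(table)`, and the slightly wrong P' forgets the lower bound. Python's negative indexing then gives Q a well-defined but unintended behavior on the difference.

```python
# Q is "correct" under P; the question is what it computes under a weaker P'.
def q(table, i):
    """Q: return entry i. Intended precondition P: 0 <= i < len(table)."""
    return table[i]


def p(table, i):
    """The precondition Q was written for."""
    return 0 <= i < len(table)


def p_prime(table, i):
    """A 'not quite right' precondition: the lower bound was forgotten."""
    return i < len(table)


if __name__ == "__main__":
    table = ["first", "second", "last"]
    for i in (1, -1):
        # Delta-P is the set of inputs admitted by P' but not by P (here: i < 0).
        print(i, p(table, i), p_prime(table, i), q(table, i))
```

For `i = -1`, P' holds, P does not, and Q returns the last entry: the ∆R produced by that ∆P is "indexing from the end", a behavior nobody specified.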

SLIDE 16

Proof-carrying code FTW?

"Weird machines in PCC", Vanegue @ 1st IEEE LangSec S&P Workshop, 2014
PCC doesn't capture additional instructions a machine may execute ("divergent machines")
Proof-carrying code can execute untrusted computations not captured by proofs

SLIDE 17

A hypothesis

We need "differential computability": how to easily reason about "∆R" given "∆P" for a Q
We program not with statements {Q} but, implicitly, with tuples P {Q}, yet we rarely capture P explicitly. Hence bugs & WMs.

SLIDE 18

Unforeseen preconditions

The "correct" P is rarely obvious
e.g. "well formed" =/=> safe (ELF, MMU)
Parser differentials ("master key", X.509)
P influenced by opinion & idea/model of a system
P can't reflect not-yet-discovered threats or state
P may be dependent on composition effects!
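A parser differential in miniature. The `key=value;key=value` format and both parsers are hypothetical, written only to show the shape of the problem (two components that disagree on duplicate entries); this is not APK, ZIP, or X.509 parsing.

```python
# Two parsers for the same toy format that disagree on duplicate keys.
def parse_first_wins(text):
    """A 'verifier' that keeps the first value seen for each key."""
    out = {}
    for pair in text.split(";"):
        key, _, value = pair.partition("=")
        out.setdefault(key, value)
    return out


def parse_last_wins(text):
    """A 'loader' that keeps the last value seen for each key."""
    out = {}
    for pair in text.split(";"):
        key, _, value = pair.partition("=")
        out[key] = value
    return out


if __name__ == "__main__":
    msg = "file=signed.bin;file=evil.bin"
    print("verifier sees:", parse_first_wins(msg)["file"])
    print("loader sees:  ", parse_last_wins(msg)["file"])
```

Each parser is defensible on its own; the unsafe behavior exists only in their composition, which is why P is hard to state for either one in isolation.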

SLIDE 19

Constraining Q

If Q is sufficiently "constrained", P doesn't have to be so large
E.g.: P is "input is a formal language of class X"
Question: how can we usefully characterize the power of Q, beyond the Chomsky hierarchy of recognizers?

[Figure labels: Languages, Acceptors]

SLIDE 20

Coarse types for code & data intents

Control flow enforcement (not quite CFI :) )
ELFbac: sections are types (with very coarse semantics by data access & flow)
"Gostak semantics" (The Gostak distims the doshes)
Dependent typing to enforce intended use of data
Range dependencies, intent by range

SLIDE 21

Beyond address ranges

A code section's intended accesses are its type
"You are what you work with/operate on"

SLIDE 22

Beyond address ranges

A code section's intended accesses are its type
"You are what you work with/operate on"

[Diagram nodes: SSL initialization, SSL, libpng, app logic, SSL keys, input buffer, output buffer; access labels: RW, R, RW, R, W, RW]
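A sketch of "a code section's intended accesses are its type" as a lookup table. The pairing of sections with regions and permissions below is an assumption made for illustration; the flattened slide text does not preserve which edge of the diagram carries which label, and this is not ELFbac's actual policy format.

```python
# Hypothetical access policy: (code section, data region) -> allowed accesses.
POLICY = {
    ("ssl_init", "ssl_keys"): "rw",
    ("ssl", "ssl_keys"): "r",
    ("ssl", "input_buffer"): "rw",
    ("libpng", "input_buffer"): "r",
    ("libpng", "output_buffer"): "w",
    ("app_logic", "output_buffer"): "rw",
}


def allowed(code_section, data_region, access):
    """Return True only if `access` ('r' or 'w') is in the section's declared intent."""
    if access not in ("r", "w"):
        return False
    return access in POLICY.get((code_section, data_region), "")


if __name__ == "__main__":
    print(allowed("ssl", "ssl_keys", "r"))     # declared intent
    print(allowed("libpng", "ssl_keys", "r"))  # an image parser has no business here
```

Under such a policy, a bug in the image parser can still misbehave, but only within the regions its "type" names.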

SLIDE 23

LangSec approach to input

Since all input data are programs driving the code, construct input handling as verifiable recognizer automata
Requires regular or context-free languages to avoid undecidability (e.g., in verifying parser equivalence)
Verifying input handlers: big payoff, but underused?
Not all bugs are parser bugs, but the latest biggest ones sure were! (Heartbleed, GnuTLS Hello, BERserk, ...)
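A minimal sketch of recognizer-first input handling, assuming an invented message format (`HB:` + two decimal digits of length + `:` + payload). It is not the TLS heartbeat wire format; it only mirrors the general shape of a declared-length field that must agree with the data actually received. Because the length is bounded, the set of valid messages is finite and therefore regular.

```python
import re

# Hypothetical toy format (NOT TLS): b"HB:" + 2-digit length + b":" + payload.
HEADER = re.compile(rb"HB:(\d{2}):")


def recognize(msg):
    """Accept the whole message or reject it; never hand partial input onward."""
    m = HEADER.match(msg)
    if m is None:
        return None
    declared = int(m.group(1))
    payload = msg[m.end():]
    if len(payload) != declared:
        return None  # declared length disagrees with the data actually present
    return payload


def echo(msg):
    """Processing code runs only on input the recognizer has fully accepted."""
    payload = recognize(msg)
    if payload is None:
        raise ValueError("rejected: not in the input language")
    return payload


if __name__ == "__main__":
    print(echo(b"HB:05:hello"))
    print(recognize(b"HB:99:hi"))  # over-long declared length is rejected
```

The design choice is that `echo` never sees the declared length at all: the mismatch is ruled out before any processing logic can act on it.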

SLIDE 24

More weird machines?

SLIDE 25

Code as a "contour/circuit" with a characteristic "frequency response"?

How does code react to periodically injected failure?
Systems: resource starvation WMs
Networks: packet loss and/or delay
What new behavior patterns can be produced?
Protocol implementations exposed to induced periodic packet loss/delay
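One way to picture the "frequency response" question is a toy stop-and-wait sender under periodic loss. Everything here is an assumption for illustration (the tick-based timing, the fixed retransmission timeout, the drop rule); it models no real protocol and in particular not OpenVPN.

```python
# Toy stop-and-wait sender: every drop_period-th transmission is lost.
def transfer_time(n_packets, drop_period, timeout=1):
    """Ticks needed to deliver n_packets; drop_period=0 means no injected loss."""
    if drop_period == 1:
        raise ValueError("every transmission dropped: transfer never completes")
    time = 0
    attempts = 0
    delivered = 0
    while delivered < n_packets:
        attempts += 1
        time += 1                    # one tick per transmission
        if drop_period and attempts % drop_period == 0:
            time += timeout          # lost: wait out the timeout, then retransmit
        else:
            delivered += 1
    return time


if __name__ == "__main__":
    # Sweep the drop period and observe how completion time responds.
    for period in (0, 2, 3, 5, 10):
        print(period, transfer_time(10, period))
```

Sweeping `drop_period` and plotting the result gives a crude response curve; a real experiment would replace the loop with an actual implementation under an induced loss schedule.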

SLIDE 26

Periodic packet drop vs OpenVPN

[Plot panels, one per cipher: DES-CBC, RC2-CBC, AES256-CBC, Blowfish-CBC]

SLIDE 27

Thank you

IEEE Language-theoretic Security Workshop (LangSec SPW), co-located with IEEE S&P Symposium (San Jose)
http://spw14.langsec.org
http://spw15.langsec.org
