SLIDE 1

Dynamically checking type-correctness of whole programs

(work newly in progress)

Stephen Kell

stephen.kell@cl.cam.ac.uk

Computer Laboratory University of Cambridge

  • libcrunch. . . – p.1/22
SLIDE 4

Wanted (naive version): check this!

    if (obj->type == OBJ_COMMIT) {
        if (process_commit(walker, (struct commit *)obj))   /* <-- CHECK this cast (at run time) */
            return -1;
        return 0;
    }

But also wanted:

  • binary compatible
  • source compatible
  • reasonable performance
  • avoid being C-specific!*

* mostly...

SLIDE 5

This talk in one slide

I will describe libcrunch, which is

  • an infrastructure for run-time type checking
  • encodes type checks as assertions
  • no guarantee of “safety” (but...)
  • support idiomatic unsafe code
  • checks inserted by per-language front-ends
  • no binary interface changes
  • no source changes, usually*

(* but sometimes out-of-band guidance helps)

SLIDE 6

Introducing libcrunch

The user’s view:

    $ crunchcc -o myprog ...              # + other front-ends
    $ ./myprog                            # runs normally
    $ LD_PRELOAD=libcrunch.so ./myprog    # does checks

where

  • myprog contains type assertions (we’ll see how)
  • normally “disabled”
  • enabled when libcrunch is linked in
  • compiler [wrapper] inserts assertions automatically
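
One way to picture assertions that are “disabled” until a checker is linked in is a weak symbol reference. This is only a sketch under assumptions: `demo_is_a`, `demo_check` and `first_double` are hypothetical names, not libcrunch's interface, and the `weak` attribute is a GCC/Clang feature on ELF platforms.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checker entry point. Declared weak, so the program still
 * links and runs when nothing defines it (GCC/Clang, ELF platforms). */
extern int demo_is_a(const void *obj, const char *type_name)
    __attribute__((weak));

/* What an inserted assertion could call: consult the checker only if a
 * definition is present; otherwise the check always passes. */
static inline int demo_check(const void *obj, const char *type_name)
{
    if (demo_is_a == NULL)      /* no checker present: checks "disabled" */
        return 1;
    return demo_is_a(obj, type_name);
}

/* Ordinary code containing a checked cast. */
static inline double first_double(void *p)
{
    assert(demo_check(p, "double"));
    return *(double *)p;
}
```

With no definition of `demo_is_a` present, `first_double` behaves like the unchecked code. Whether preloading a library is enough to supply the definition depends on the toolchain and link options, so this shows only the disabled path.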

SLIDE 7

What is run-time type checking?

Check every program operation is “type-correct”, i.e.

  • program state is a collection of stored values ...
  • allocated as instances of some “data type”
  • data types signify meaning
  • operations consume and produce stored values...

More precise definition wanted...

  • for C, plan to use Cerberus to create formal definition

SLIDE 8

What checks are we interested in?

Recall the example:

    if (obj->type == OBJ_COMMIT) {
        if (process_commit(walker, (struct commit *)obj))
            return -1;
        return 0;
    }

Primitive errors are not our concern

  • even C compilers check primitive type-correctness

First-order and up

  • all about pointers
  • first cut: check casts (& implicit strengthenings) in C

SLIDE 11

How it works, in a nutshell

    if (obj->type == OBJ_COMMIT) {
        if (process_commit(walker,
                (assert(__is_a(obj, "struct commit")),   // or something like this
                 (struct commit *)obj)))
            return -1;
        return 0;
    }

To make this work, we need:

  • type information on every allocation in program
  • efficient run-time representation of types
  • fast __is_a function
  • something to write these assertions for us
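
The rewritten cast can be tried out in miniature. The sketch below is an illustration under assumptions, not libcrunch's implementation: `toy_register`, `toy_is_a` and `TOY_CHECKED_CAST` are hypothetical, the two struct layouts are simplified stand-ins, and the "metadata" is a hand-filled table that matches only exact addresses and exact names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct object { int type; };
struct commit { struct object object; int generation; };

/* Toy stand-in for allocation metadata: one record per known object. */
struct toy_record { const void *addr; const char *type_name; };
static struct toy_record toy_records[8];
static size_t toy_count;

static void toy_register(const void *addr, const char *type_name)
{
    assert(toy_count < 8);                  /* toy table: fixed capacity */
    toy_records[toy_count].addr = addr;
    toy_records[toy_count].type_name = type_name;
    toy_count++;
}

/* Toy check: exact start address and exact type name only. */
static int toy_is_a(const void *obj, const char *type_name)
{
    for (size_t i = 0; i < toy_count; i++)
        if (toy_records[i].addr == obj)
            return strcmp(toy_records[i].type_name, type_name) == 0;
    return 0;                               /* unknown object */
}

/* The rewritten cast: evaluate the check, then yield the cast pointer. */
#define TOY_CHECKED_CAST(T, name, p) \
    (assert(toy_is_a((p), (name))), (T)(p))

static int commit_generation(struct object *obj)
{
    struct commit *c = TOY_CHECKED_CAST(struct commit *, "struct commit", obj);
    return c->generation;
}
```

The comma operator keeps the cast an expression, so the check can be dropped into any place the original cast appeared without restructuring the surrounding statement.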

SLIDE 12

Idealised view of libcrunch operation

[Diagram: pipeline from source files to a running, checked program]

  • source files: .c, .f, .cc, .java
  • deployed binaries (with data-type assertions): /bin/foo, /lib/libxyz.so
  • debugging information (with allocation site information): /bin/.debug/foo, /lib/.debug/libxyz.so
  • precompute unique data types: /bin/.uniqtyp/foo.so
  • load, link and run (ld.so): program image containing __is_a (libcrunch.so), uniqtypes, heap_index
  • example query: 0xdeadbeef, “Widget”? → true

SLIDE 13

Type info for each allocation

Type info for allocation is reasonable because

  • ... to allocate, you need a size
  • three kinds of allocations: static, stack, heap
  • assume all heap allocators are instrumented...

Assume we have debug info

  • handles stack and static cases

SLIDE 14

What happens at run time?

[Diagram: the program image contains __is_a, uniqtypes, heap_index and allocsites, alongside libdl]

  • query: __is_a(0xdeadbeec, “Widget”)?
  • heap_index: lookup(0xdeadbeec) → allocsite: 0x8901234, offset: 0xc
  • allocsites: lookup(0x8901234) → &__uniqtype_Window
  • libdl: lookup(“Widget”) → &__uniqtype_Widget
  • uniqtypes: find(&__uniqtype_Window, &__uniqtype_Widget, 0xc) → found
  • answer: true

SLIDE 15

Looking up object metadata (1)

Recall: need info about an arbitrary object’s allocation

  • ... given an arbitrary pointer

Stack case

  • walk the stack + use debug info for locals/args

Static case

  • use debug info

Heap case

  • hard! might be an interior pointer
  • use clever virtual memory-based data structure (ask me)
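
To make the interior-pointer problem concrete, here is a deliberately simple substitute for the heap index: a binary search over blocks sorted by start address. It is not the virtual memory-based structure mentioned above. The `struct block` layout and `find_block` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one heap block. */
struct block {
    uintptr_t   base;       /* start address */
    size_t      size;       /* length in bytes */
    const char *type_name;  /* type recorded for the block */
};

/* Find the block containing addr in an array sorted by base address.
 * Returns NULL if no block contains addr; otherwise stores addr's
 * distance from the start of the block in *offset. */
static const struct block *find_block(const struct block *blocks, size_t n,
                                      uintptr_t addr, size_t *offset)
{
    assert(offset != NULL);
    size_t lo = 0, hi = n;
    while (lo < hi) {                       /* count blocks with base <= addr */
        size_t mid = lo + (hi - lo) / 2;
        if (blocks[mid].base <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;                        /* addr precedes every block */
    const struct block *b = &blocks[lo - 1];
    if (addr - b->base >= b->size)
        return NULL;                        /* addr falls in a gap */
    *offset = (size_t)(addr - b->base);
    return b;
}
```

Given a block at 0x1000 of size 32, the address 0x100c resolves to that block with offset 0xc: the same shape of answer as an (allocation, offset) pair.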

SLIDE 16

__is_a, containment...

A pointer might satisfy __is_a in more than one way

  • Consider “what is” &my_ellipse, &my_ellipse.ctr, ...

(Subclassing is usually implemented this way.)
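
A small illustration of why one address can have several valid answers. It assumes the `struct ellipse` layout used elsewhere in this talk; `offset_in` and `show_coincident_addresses` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

struct ellipse {
    double maj, min;
    struct { double x, y; } ctr;
};

/* Byte offset of p within the ellipse e. */
static size_t offset_in(const struct ellipse *e, const void *p)
{
    return (size_t)((const char *)p - (const char *)e);
}

static void show_coincident_addresses(void)
{
    struct ellipse my_ellipse = { 2.0, 1.0, { 0.0, 0.0 } };

    /* &my_ellipse is the address of an ellipse and of its first double. */
    assert(offset_in(&my_ellipse, &my_ellipse.maj) == 0);

    /* &my_ellipse.ctr is the address of the inner struct and of ctr.x. */
    assert(offset_in(&my_ellipse, &my_ellipse.ctr) ==
           offset_in(&my_ellipse, &my_ellipse.ctr.x));
}
```

A check at offset 0 therefore has to accept both “ellipse” and “double”, which is why the check is a search over contained types and not a single comparison.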

SLIDE 17

Efficiently reifying data types at run time

    struct ellipse {
        double maj, min;
        struct { double x, y; } ctr;
    };

[Diagram: uniqtype records and the links between them. Entries shown: __uniqtype__int (4, “int”); __uniqtype__double (8, “double”); __uniqtype__anon0x123 (2, 16); __uniqtype__ellipse (3, 32, “ellipse”, 8 8 16 ...)]

Reify data types uniquely, describing containment

  • uniqueness → “exact type” test is a pointer comparison
  • __is_a() is a simple, fast search through this structure
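
The containment search can be sketched as follows. The record layout (`toy_uniqtype`, `toy_member`) and the function `toy_find` are hypothetical, chosen only to illustrate unique type records plus a search by offset. The real representation may differ.

```c
#include <assert.h>
#include <stddef.h>

struct toy_uniqtype;

/* One contained member: where it starts and what type it has. */
struct toy_member { size_t offset; const struct toy_uniqtype *type; };

/* One record per distinct data type, so type identity is address identity. */
struct toy_uniqtype {
    const char              *name;      /* NULL for anonymous types */
    size_t                   size;
    size_t                   nmemb;
    const struct toy_member *members;
};

static const struct toy_uniqtype t_double = { "double", sizeof(double), 0, NULL };

static const struct toy_member ctr_members[] = {
    { 0, &t_double }, { sizeof(double), &t_double },
};
static const struct toy_uniqtype t_ctr = { NULL, 2 * sizeof(double), 2, ctr_members };

static const struct toy_member ellipse_members[] = {
    { 0, &t_double }, { sizeof(double), &t_double }, { 2 * sizeof(double), &t_ctr },
};
static const struct toy_uniqtype t_ellipse = { "ellipse", 4 * sizeof(double), 3, ellipse_members };

/* Does an object of type `have` contain an instance of `want` starting
 * exactly `offset` bytes in? Members are assumed not to overlap. */
static int toy_find(const struct toy_uniqtype *have,
                    const struct toy_uniqtype *want, size_t offset)
{
    assert(have != NULL && want != NULL);
    if (offset == 0 && have == want)
        return 1;                       /* exact type: pointer comparison */
    for (size_t i = 0; i < have->nmemb; i++) {
        const struct toy_member *m = &have->members[i];
        if (offset >= m->offset && offset - m->offset < m->type->size)
            return toy_find(m->type, want, offset - m->offset);
    }
    return 0;
}
```

For example, `toy_find(&t_ellipse, &t_double, 3 * sizeof(double))` descends into `ctr` and matches its second member, while an offset in the middle of a double matches nothing.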
SLIDE 18

Other flavours of check

__is_a is a nominal check, but we can also write

  • like_a – “1-structural” (unwrap one level)
  • phys_a – “*-structural” (unwrap maximally)
  • refines – may instantiate padding (à la sockaddr)
  • named_a – opaque workaround
SLIDE 19

Notes about memory correctness

We (currently) do nothing about memory correctness! E.g.

    void f() {
        int a;
        int bs[2];
        for (int *p = &bs[0]; p <= &bs[2]; ++p) { /* ... */ }
    }

  • bug-finding, not verification, not security...
  • faster! avoid per-pointer (cf. per-object) metadata
  • most memory-incorrect programs are type-incorrect...
  • could “force a cast” after pointer arithmetic

SoftBound + CETS do a pretty good job

  • we could replicate them...

SLIDE 20

Recap

What we’ve just seen is

  • a runtime system for evaluating type assertions
  • fast (biggest slowdown seen 20%; often <10%)
  • (by design) flexible
  • a “whole program” language-neutral design
  • binary compatible

What about source compatibility?

SLIDE 21

libcrunch prototype: C front-end

Who inserts the assertions?

  • instrumentation: “one assertion per pointer cast”
  • analysis: “what data type is being malloc()’d?”
  • ... guess from use of sizeof

[Diagram: CIL-based compiler front-end]

  • source tree: main.c, widget.c, util.c, ...
  • dump allocation sites (dumpallocs): main.i .allocs, widget.i .allocs, util.i .allocs, ...
  • instrument pointer casts
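
The idea of guessing an allocation's type from its sizeof operand can be imitated by hand. The sketch below does so with a macro and a recording allocator. Note the difference: the front-end derives this information by analysis without source changes, whereas this illustration changes the call site. All names (`recording_malloc`, `MALLOC_OF`, `info_for`, `allocated_as`, `struct widget`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct widget { int id; double weight; };

/* Hypothetical record an instrumented allocator might keep per block. */
struct alloc_info {
    void       *base;       /* start of the block */
    size_t      size;       /* size requested */
    const char *type_name;  /* type guessed for this allocation site */
};

#define MAX_ALLOCS 64
static struct alloc_info alloc_table[MAX_ALLOCS];
static size_t alloc_count;

/* Allocate a block and record its guessed type. */
static void *recording_malloc(size_t size, const char *type_name)
{
    void *p = malloc(size);
    if (p != NULL) {
        assert(alloc_count < MAX_ALLOCS);   /* toy table: fixed capacity */
        alloc_table[alloc_count].base = p;
        alloc_table[alloc_count].size = size;
        alloc_table[alloc_count].type_name = type_name;
        alloc_count++;
    }
    return p;
}

/* The operand of sizeof is taken as the type of the new object. */
#define MALLOC_OF(T) ((T *)recording_malloc(sizeof(T), #T))

/* Look up the record for a block by its start address; NULL if unknown. */
static const struct alloc_info *info_for(const void *base)
{
    for (size_t i = 0; i < alloc_count; i++)
        if (alloc_table[i].base == base)
            return &alloc_table[i];
    return NULL;
}

/* Was the block starting at base recorded with this type name? */
static int allocated_as(const void *base, const char *type_name)
{
    const struct alloc_info *info = info_for(base);
    return info != NULL && strcmp(info->type_name, type_name) == 0;
}
```

After `struct widget *w = MALLOC_OF(struct widget);`, the table records the block as a “struct widget”. The toy table never forgets freed blocks, which a real allocator hook would have to handle.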

SLIDE 22

Complications (1)

With metadata

  • dynamic loading (merge uniqtypes)
  • non-standard alloc functions (explicit support)

With compilers (currently false pos/negs)

  • address-taken temporaries (fix compiler for debug info)
  • varargs actuals
  • alloca()

+ assert() usually isn’t quite what you want...

SLIDE 23

Complications (2)

With the C front end (false pos or “intervention required”)

  • very weird uses of sizeof
  • weird avoidance of sizeof
  • char special case
  • object re-use
  • unions (but mostly doable! three cases; ask me)
  • some cases of multiple indirection cause false pos

SLIDE 24

Brutal honesty moment: a real false positive

    void sort_eight_special(void **pt) {
        void *tt[8];
        register int i;
        for (i = 0; i < 8; i++) tt[i] = pt[i];
        for (i = XUP; i <= TUP; i++) {
            pt[i] = tt[2*i];
            pt[OPP_DIR(i)] = tt[2*i+1];
        }
    }

Client then does (making libcrunch print a warning)

    neighbor = (int **)calloc(NDIRS, sizeof(int *));
    /* ... */
    sort_eight_special((void **) neighbor);

Question: is this valid C?

SLIDE 25

What’s in it for REMS

Check “agreement” between libcrunch and Cerberus

  • inclusion, for the relevant subset of complaints

Tool for exploring behaviour of real programs

  • good at turning up “dodgy” code (oft also “correct”!)

Representative of a wider set of tools...

  • insight for bridging between source and run-time worlds
  • linking tie-in...

SLIDE 26

Recap, conclusions

We’ve seen

  • a runtime infrastructure for fast checking
  • a prototype C front-end

Remaining challenges for the run-time part:

  • finish the paper...
  • multi-language story
  • support more complex specifications (“types”)

Code is here: https://github.com/stephenrkell/

Thanks for listening. Questions?
