Checked C: Michael Hicks, The University of Maryland

SLIDE 1

Checked C

Michael Hicks, The University of Maryland
Joint work with David Tarditi (MSR), Andrew Ruef (UMD), Sam Elliott (UW)

SLIDE 2

Motivation

  • Lots of C/C++ code out there.
  • One open source code indexer (openhub.net) found 8.5 billion lines of C code.
  • Huge investment in existing (often unsafe) code
  • Many approaches to “saving C” have been proposed:
    • Static/dynamic analysis (ASAN, SoftBound), fuzzing, OS/HW-level mitigations
    • Rewrite the code in a type-safe language (Java/Rust…)
    • New dialects of C (Cyclone, CCured, Deputy, …)

SLIDE 3

Checked C

Checked C is another take at making system software written in C more reliable and secure. Approach:

  • Extend C so that type-safe programs can be written in it.
  • A new language risks introducing unnecessary differences.
  • Backward compatible; supports incremental conversion of code to be type-safe.
  • Implement in widely used C compilers (Clang/LLVM)
  • Create automated conversion tools
  • Hypothesis: most source code behaves in a type-safe fashion.
  • Evaluate and get real-world use. Iterate based on experience.

https://github.com/Microsoft/checkedc https://github.com/Microsoft/checkedc-clang

SLIDE 4

Design

  • Making code type safe
    • Static/dynamic bounds checking for pointers and arrays
    • Initialization of data, type casts, explicit memory management
    • No temporal safety enforcement at present (use GC)
  • Backward compatibility
    • All pointers binary-compatible (no “fat” pointers)
    • Strict extension (parsing) of C
    • Unchecked and checked code can co-exist (e.g., in a function)
      • The former can use checked types in a looser manner
      • Can prove a “blame” property that characterizes isolation
    • Can work with improving hardware designs

Slide callouts: “Unlike Cyclone”, “Like Cyclone”

SLIDE 5

Clang & LLVM

[Diagram: the Clang Frontend (Analyses, LLVM IR Generator) carries Checked C's changes and feeds LLVM (Analyses, Optimizations, Code Generators), which emits x86 / x64 / ARM / Other Assembly.]

SLIDE 6

Pointers

Kind               C                                  Checked C
Singleton Pointer  T* val = malloc(sizeof(T));        ptr<T> val = malloc(sizeof(T));
Array Pointer      T* vals = calloc(n, sizeof(T));    array_ptr<T> vals = calloc(n, sizeof(T));
Array Pointer      T vals[N] = { ... };               T vals checked[N] = { ... };

SLIDE 7

Bounds Declarations

Declaration                        Invariant
array_ptr<T> p : bounds(l, u)      l ≤ p < u
array_ptr<T> p : count(n)          p ≤ p < p+n
array_ptr<T> p : byte_count(n)     p ≤ p < (char*)p + n

Expressions in bounds(l, u) must be non-modifying

  • No Assignments or Increments/Decrements
  • No Calls
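
The three declaration forms above all reduce to a half-open address range. A minimal sketch in plain C of the predicate a dynamic check would evaluate, assuming hypothetical helper names (in_bounds, in_count, in_byte_count) that are not part of Checked C:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of Checked C): true when address a lies in
 * [lower, upper). All three pointers must point into the same object, or one
 * past its end, for the comparisons to be well defined in standard C. */
bool in_bounds(const void *a, const void *lower, const void *upper) {
    const char *x = a, *l = lower, *u = upper;
    assert(l <= u); /* a bounds pair is expected to be ordered */
    return l <= x && x < u;
}

/* count(n) on an int pointer p corresponds to bounds(p, p + n). */
bool in_count(const int *a, const int *p, size_t n) {
    return in_bounds(a, p, p + n);
}

/* byte_count(n) corresponds to bounds((char *)p, (char *)p + n). */
bool in_byte_count(const void *a, const void *p, size_t n) {
    return in_bounds(a, p, (const char *)p + n);
}
```

In Checked C itself the compiler derives these comparisons from the declaration; the sketch only makes the arithmetic visible.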
SLIDE 8

NUL Terminated Pointers

C                                       NT Array Pointer
char* str = calloc(n, sizeof(char));    nt_array_ptr<char> str : count(n-1) = calloc(n, sizeof(char));
char str[N] = { ... };                  char str checked[N+1] = { …, '\0' };

Declaration                           Invariant
nt_array_ptr<T> p : bounds(l, u)      l ≤ p ≤ u && ∃c ≥ u. *c == '\0'

Can read *u, or write '\0' to it
SLIDE 9

Bounds Expansion

    size_t my_strlcpy(
        nt_array_ptr<char> dst : count(dst_sz - 1),
        nt_array_ptr<char> src,
        size_t dst_sz)
    {
        size_t i = 0;
        nt_array_ptr<char> s : count(i) = src;
        while (s[i] != '\0' && i < dst_sz - 1) {
            // bounds on s may expand by 1
            dst[i] = s[i];
            ++i;
        }
        dst[i] = '\0'; // ok to write to upper bound
        return i;
    }
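
The widening rule can be mimicked in plain C by tracking how many characters are known to be non-NUL. This is a sketch under assumptions: strlcpy_sketch and the variable known are hypothetical names, and nothing here is checked by a compiler the way nt_array_ptr bounds are.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C emulation of the copy loop. `known` plays the role of count(i):
 * src[0..known-1] have been read and found non-NUL, so src[known] (the
 * element at the upper bound) is always readable. */
size_t strlcpy_sketch(char *dst, const char *src, size_t dst_sz) {
    assert(dst != NULL && src != NULL && dst_sz > 0);
    size_t known = 0;
    for (;;) {
        char c = src[known]; /* read at the upper bound */
        if (c == '\0' || known >= dst_sz - 1) {
            break;
        }
        dst[known] = c;
        ++known; /* non-NUL at the bound: the bound widens by one */
    }
    dst[known] = '\0'; /* writing NUL at the upper bound of dst */
    return known;
}
```

Copying "hello" into a 4-byte destination stops after three characters and terminates the result, which is the truncation behavior the slide's loop condition encodes.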

SLIDE 10

Where Dynamic Checks Occur

Operation               Example
Pointer Dereference     int i = *p;   p[n]   p->field
Assignment              *p = 0;
Compound Assignment     *p += 1;
Increment/Decrement     (*p)++;

Elided if the compiler can prove the access is safe

SLIDE 11

Example inspired by the Heartbleed error

    typedef struct {
        size_t payload_len;
        char *payload;
        // ...
    } resp_t;

    bool echo(
        int16_t user_length,
        size_t user_payload_len,
        char *user_payload,
        resp_t *resp)
    {
        char *resp_data = malloc(user_length);
        resp->payload = resp_data;
        resp->payload_len = user_length;
        // memcpy(resp->payload, user_payload, user_length)
        for (int i = 0; i < user_length; i++) {
            resp->payload[i] = user_payload[i];
        }
        return true;
    }

  • user_length is provided by user
  • user_payload_len is from the parser
  • Copy data from user_payload into new buffer in resp object

SLIDE 12

    typedef struct {
        size_t payload_len;
        char *payload;
        // ...
    } resp_t;

    bool echo(
        int16_t user_length,
        size_t user_payload_len,
        char *user_payload,
        resp_t *resp)
    {
        char *resp_data = malloc(user_length);
        resp->payload = resp_data;
        resp->payload_len = user_length;
        // memcpy(resp->payload, user_payload, user_length)
        for (int i = 0; i < user_length; i++) {
            resp->payload[i] = user_payload[i];
        }
        return true;
    }

  • malloc could fail
  • user_length is provided by user
  • user_payload_len is from the parser
  • Copy data from user_payload into new buffer in resp object

SLIDE 13

    typedef struct {
        size_t payload_len;
        char *payload;
        // ...
    } resp_t;

    bool echo(
        int16_t user_length,
        size_t user_payload_len,
        char *user_payload,
        resp_t *resp)
    {
        char *resp_data = malloc(user_length);
        resp->payload = resp_data;
        resp->payload_len = user_length;
        // memcpy(resp->payload, user_payload, user_length)
        for (size_t i = 0; i < user_length; i++) {
            resp->payload[i] = user_payload[i];
        }
        return true;
    }

  • user_length is provided by user
  • user_payload_len is from the parser
  • user_length could be larger than user_payload_len
  • Copy data from user_payload into new buffer in resp object

SLIDE 14

Step 1: Manually Convert to Checked Types

    typedef struct {
        size_t payload_len;
        array_ptr<char> payload;
        // ...
    } resp_t;

    bool echo(
        int16_t user_length,
        size_t user_payload_len,
        array_ptr<char> user_payload,
        ptr<resp_t> resp)
    {
        array_ptr<char> resp_data = malloc(user_length);
        resp->payload = resp_data;
        resp->payload_len = user_length;
        // memcpy(resp->payload, user_payload, user_length)
        for (size_t i = 0; i < user_length; i++) {
            resp->payload[i] = user_payload[i];
        }
        return true;
    }

SLIDE 15

Step 2: Manually Add Bounds Declarations

    typedef struct {
        size_t payload_len;
        array_ptr<char> payload : count(payload_len);
        // ...
    } resp_t;

    bool echo(
        int16_t user_length,
        size_t user_payload_len,
        array_ptr<char> user_payload : count(user_payload_len),
        ptr<resp_t> resp)
    {
        array_ptr<char> resp_data : count(user_length) = malloc(user_length);
        resp->payload = resp_data;
        resp->payload_len = user_length;
        // memcpy(resp->payload, user_payload, user_length)
        for (size_t i = 0; i < user_length; i++) {
            resp->payload[i] = user_payload[i];
        }
        return true;
    }

SLIDE 16

Step 3: Compiler Inserts Checks Automatically

    bool echo(
        int16_t user_length,
        size_t user_payload_len,
        array_ptr<char> user_payload : count(user_payload_len),
        ptr<resp_t> resp)
    {
        array_ptr<char> resp_data : count(user_length) = malloc(user_length);
        dynamic_check(resp != NULL);
        resp->payload = resp_data;
        resp->payload_len = user_length;
        // memcpy(resp->payload, user_payload, user_length)
        for (size_t i = 0; i < user_length; i++) {
            dynamic_check(user_payload != NULL);
            dynamic_check(user_payload <= &user_payload[i]);
            dynamic_check(&user_payload[i] < user_payload + user_payload_len);
            dynamic_check(resp->payload != NULL);
            dynamic_check(resp->payload <= &resp->payload[i]);
            dynamic_check(&resp->payload[i] < resp->payload + resp->payload_len);
            resp->payload[i] = user_payload[i];
        }
        return true;
    }

  • No Memory Disclosure
  • malloc now checked

SLIDE 17

Step 3: Compiler Inserts Checks Automatically

  • No Memory Disclosure
  • malloc now checked

Code Not Bug-Free: Will signal run-time error if either

  • malloc(user_length) fails
  • user_length > user_payload_len

But: Vulnerable Executions Prevented

SLIDE 18

Step 4: Restrictions on bounds expressions may allow removal of checks

    bool echo(
        int16_t user_length,
        size_t user_payload_len,
        array_ptr<char> user_payload : count(user_payload_len),
        ptr<resp_t> resp)
    {
        array_ptr<char> resp_data : count(user_length) = malloc(user_length);
        dynamic_check(resp != NULL);
        resp->payload = resp_data;
        resp->payload_len = user_length;
        dynamic_check(user_payload != NULL);
        dynamic_check(resp->payload != NULL);
        // memcpy(resp->payload, user_payload, user_length)
        for (size_t i = 0; i < user_length; i++) {
            dynamic_check(i < user_payload_len);
            resp->payload[i] = user_payload[i];
        }
        return true;
    }

  • No Memory Disclosure
  • malloc still checked
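
To see what the inserted checks buy at run time, here is a plain C sketch of the same function with the checks written by hand. It rests on assumptions: echo_sketch is a hypothetical name, a failed check returns false instead of signalling a run-time error, and a negative user_length is rejected up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t payload_len;
    char *payload;
} resp_t;

/* Hand-written stand-in for the compiler-inserted checks. On any failed
 * check the function returns false and leaves *resp untouched. */
bool echo_sketch(int16_t user_length, size_t user_payload_len,
                 const char *user_payload, resp_t *resp) {
    /* Null checks, done once before the loop as in Step 4. */
    if (resp == NULL || user_payload == NULL || user_length < 0) {
        return false;
    }
    size_t len = (size_t)user_length;
    char *resp_data = malloc(len ? len : 1); /* avoid a zero-size request */
    if (resp_data == NULL) {
        return false; /* malloc failure is detected, not dereferenced */
    }
    for (size_t i = 0; i < len; i++) {
        if (i >= user_payload_len) { /* upper-bound check on user_payload[i] */
            free(resp_data);
            return false;
        }
        resp_data[i] = user_payload[i];
    }
    assert(len <= user_payload_len); /* every index below len passed the check */
    resp->payload = resp_data;
    resp->payload_len = len;
    return true;
}
```

A request that claims 100 bytes but supplies 4 is refused before any byte beyond the payload is read, which is the Heartbleed-style disclosure the slides describe.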

SLIDE 19

Partially Converted Code

  • We may not want (or be able) to port all at once
  • Can use checked types wherever we want
  • Allows some hard-to-prove-safe idioms
  • But adds risk

    void more(int *b, int idx, ptr<int *> out) {
        int oldidx = idx, c;
        do {
            c = readvalue();
            b[idx++] = c; // could overflow b?
        } while (c != 0);
        *out = b+idx-oldidx; // bad if out corrupted
    }
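
The overflow flagged in the comment goes away once b carries a length. A plain C sketch of that idea follows; the names more_bounded and input_t, the explicit b_len parameter, and the array-backed readvalue are all assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical input source standing in for the slide's readvalue(). */
typedef struct {
    const int *vals;
    size_t n;
    size_t pos;
} input_t;

/* Returns the next value, or 0 once the input is exhausted. */
int readvalue(input_t *in) {
    return in->pos < in->n ? in->vals[in->pos++] : 0;
}

/* Same loop as more(), but every store into b is checked against b_len.
 * Returns false (and leaves *out untouched) instead of overflowing. */
bool more_bounded(int *b, size_t b_len, size_t idx, input_t *in, int **out) {
    if (b == NULL || in == NULL || out == NULL) {
        return false;
    }
    size_t oldidx = idx;
    int c;
    do {
        c = readvalue(in);
        if (idx >= b_len) {
            return false; /* the store below would overflow b */
        }
        b[idx++] = c;
    } while (c != 0);
    assert(idx >= oldidx); /* idx only ever grows */
    *out = b + (idx - oldidx);
    return true;
}
```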

SLIDE 20

Checked Regions

  • Checked regions are scopes (or functions or files) that are internally safe. Made so by disallowing
    • Explicit casts to checked pointers
    • Reads/writes via unchecked pointers
    • Use of varargs, or K&R prototypes
    • Calls to unchecked functions
  • Keep in mind checked pointers can appear anywhere
    • Different from prior systems, which strongly partition safe and unsafe code

SLIDE 21

Checked Regions

    void foo(int *out) {
        _Ptr<int> ptrout;
        if (out != (int *)0) {
            ptrout = (_Ptr<int>)out; // cast OK
        } else {
            return;
        }
        checked {
            int b checked[5][5];
            for (int i = 0; i < 5; i++) {
                for (int j = 0; j < 5; j++) {
                    b[i][j] = -1; // access safe
                }
            }
            *ptrout = b[0][0];
        }
    }

The statements before the checked block are unchecked code; the checked block is checked code.

SLIDE 22

Blame

What assurance do checked regions give us?

  1. Assuming accessible pointers are well-formed, checked region pointer accesses are safe
  2. Checked regions preserve well-formedness
  3. Hence: Checked code cannot be blamed

Proved this property in a simple formalization of the Checked C type system

SLIDE 23

Checked Interfaces

Annotate Unchecked Pointers with Bounds, in code and in library function prototypes

    T* val : itype(ptr<T>) = malloc(sizeof(T));
    T* vals : count(n) = calloc(n, sizeof(T));
    size_t fwrite(void *p : byte_count(s*n), size_t s, size_t n,
                  FILE *st : itype(_Ptr<FILE>));
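
The byte_count(s*n) annotation on fwrite states that the buffer must hold at least s*n bytes. A plain C sketch of a wrapper that enforces the same contract by hand, assuming a hypothetical name fwrite_bounded and an extra p_bytes parameter that carries the buffer size the annotation would otherwise supply:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wrapper: refuses the write (returns 0) unless the buffer of
 * p_bytes bytes can supply s * n bytes. Standard fwrite does the actual I/O. */
size_t fwrite_bounded(const void *p, size_t p_bytes,
                      size_t s, size_t n, FILE *st) {
    assert(st != NULL); /* a null stream is treated as a caller bug */
    if (p == NULL) {
        return 0;
    }
    if (s != 0 && n > SIZE_MAX / s) {
        return 0; /* s * n would overflow size_t */
    }
    if (s * n > p_bytes) {
        return 0; /* buffer shorter than the requested write */
    }
    return fwrite(p, s, n, st);
}
```

With the Checked C annotation the caller does not pass the size separately; the sketch adds the parameter only because plain C has no way to attach bounds to p.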

SLIDE 24

Implementation

SLIDE 25

Implementation

  • Extended C grammar recognized by Clang (not easy!)
  • Guts of the implementation include type checking, bounds inference, subsumption checking, dynamic check insertion
  • Dynamic checks placed so that subsequent LLVM optimization passes can eliminate redundant ones
    • Put serious effort into this, to get good performance

SLIDE 26

Preliminary Evaluation

  • How much code must be changed to port?
  • How hard is it to do?
  • What is the overhead on the converted code?
    • Running time, compilation time, executable size

SLIDE 27

Experimental Setup

  • Olden and Ptrdist Suites: 15 Programs
    • Used by CCured, Deputy, others
  • Converted by hand
    • 2 Conversions Incomplete
  • 12-core Intel Xeon X5650, 2.66 GHz, 32 GB RAM

SLIDE 28

Benchmarks

Benchmark            LoC     Description
Olden: bh            1,162   Barnes & Hut N-body force computation
Olden: bisort          263   Forward & Backward Bitonic Sort
Olden: em3d            478   3D Electromagnetic Wave Propagation
Olden: health          389   Colombian Health Care Simulation
Olden: mst             328   Minimum Spanning Tree
Olden: perimeter       399   Perimeters of Regions on Images
Olden: power           458   Power Pricing Optimisation Solver
Olden: treeadd         180   Recursive Sum over Tree
Olden: tsp             420   Travelling Salesman Problem
Olden: voronoi         814   Computes Voronoi diagram of a set of points
Ptrdist: anagram       362   Finding Anagrams from a Dictionary
Ptrdist: bc          5,194   Arbitrary precision calculator
Ptrdist: ft            893   Minimum Spanning Tree using Fibonacci heaps
Ptrdist: ks            552   Schweikert-Kernighan Graph Partitioning
Ptrdist: yacr2       2,529   VLSI Channel Router

SLIDE 29

Code Modifications

[Figure: three bar-chart panels over 13 benchmarks (bh, bisort, em3d, health, mst, perimeter, power, treeadd, tsp, anagram, ft, ks, yacr2), grouped by Benchmark Suite (Olden, Ptrdist).
  • Lines Modified (%), axis 0% to 30%, value label 14.6%
  • Easy Modifications (%), axis 0% to 100%, value label 83.9%
  • Lines Unchecked (%), axis 0% to 30%, value label 10.6%]

SLIDE 30

Performance Overhead

[Figure: three bar-chart panels over 13 benchmarks (bh, bisort, em3d, health, mst, perimeter, power, treeadd, tsp, anagram, ft, ks, yacr2), grouped by Benchmark Suite (Olden, Ptrdist).
  • Runtime Slowdown (±%), axis −20% to +60%, value label +8.2%
  • Compile Time Slowdown (±%), axis −25% to +75%, value label +19.5%
  • Executable Size Change (±%), axis −20% to +60%, value label +6.3%]

SLIDE 31

In Progress

  • Full implementation of NUL-terminated arrays, general pointer arithmetic
    • Need data flow analysis to handle changing bounds via pointer arithmetic (flow-sensitive types)
  • Automated conversion of C code
    • Partially complete inference of ptr<T> types (based on CCured)
    • Working on inference of bounds expressions via abstract interpretation framework
  • Smaller stuff
    • Definite initialization, better subsumption checks,
  • More experience (and then some)
  • Temporal safety (longer term)

SLIDE 32

Summary

  • Checked C is a new effort to make C safe. Key design choices
    • Binary compatible with C: pointers annotated with bounds declarations, not made “fat”
    • An extension, not a replacement, with a design to help reason about mixed code
    • Part of LLVM, to encourage production use
    • Focus on spatial safety for now; temporal safety via GC in meantime

https://github.com/Microsoft/checkedc https://github.com/Microsoft/checkedc-clang