


SLIDE 1

Scalable Certification for Typed Assembly Language

Dan Grossman (with Greg Morrisett)

Cornell University

2000 ACM SIGPLAN Workshop on Types in Compilation


SLIDE 2

September 2000 TIC00 Montreal 2

Types After Compilation -- Why?

Verifying that object code is “well-behaved” means we needn’t trust the code producer

  • Producer-supplied types guide verification
  • Encourages compiler robustness
  • Promises efficient untrusted plug-ins

To maximize benefit, we want...

SLIDE 3

Certified Code Design Goals

  • Low-level target language
      • avoids performance / trusted computing base trade-off
  • Source-language & compiler independent
      • avoids hacks, promotes re-use, the object-code way
  • Permit efficient object code
      • otherwise, just interpret or monitor at run time
  • Small certificates and fast verification
      • otherwise, only small programs are possible

Still learning how to balance these needs in practice

SLIDE 4

State of the Art

              Low-level   Compiler-independent   Efficient code   Efficient certification
JVML          No          No                     Yes?             Yes
PCC           Yes         No                     Yes              Yes
ECC           Yes         No                     No               Yes
Appel/Felty   Yes!        Yes                    Yes?             ???
TAL           Yes         Yes                    Yes              (This talk)

SLIDE 5

Scalable Certification in 15 mins

  • Classification of Approaches
  • Why Compiler Independence Makes Scalability Harder

  • Techniques that Make TAL Work
  • Experimental Results
  • Summary of some lessons learned

See the paper for much, much more

SLIDE 6

Approach #1 -- Bake It In

If you allow only one way, no annotations are needed and it’s trivial to check

Examples:

  • Grouping code into procedures
  • Function prologues
  • Installing exception handlers

The type system is at a different level of abstraction

An analogy: RISC vs. CISC

SLIDE 7

Approach #2 -- Don’t Optimize

Optimizations that are expensive to prove safe are expensive to certify

Examples:

  • Dynamic type tests
  • Arithmetic (division by zero, array-bounds elimination)
  • Memory initialized before use

Better code can make a system look worse

A new factor for where to optimize?

SLIDE 8

Approach #3 -- Reconstruct

Don’t write down what the verifier can easily determine

Examples:

  • Don’t put types on every instruction/operand
  • Omit proof steps where inversion suffices
  • Re-verify target code at each “call” site (virtual inlining)

Can trade time for space or get a win/win

Analogy: source-level type inference w/o the human factor
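A minimal sketch of the first example (no types on every instruction), written in Python for illustration rather than in the verifier's own implementation language. Only the block's pre-condition is written down; the checker recomputes register types after each instruction. The three-instruction mini language and the names `check_block`, `movi`, `mov` and `add` are assumptions made up for this sketch, not TAL's instruction set or type system.

```python
from typing import Dict, List, Tuple

RegTypes = Dict[str, str]          # register name -> type name
Instr = Tuple[str, str, str]       # (opcode, destination, source)


def check_block(pre: RegTypes, code: List[Instr]) -> RegTypes:
    """Reconstruct register types through one block from its pre-condition.

    Hypothetical opcodes:
      ("movi", dst, "")  loads an int constant into dst
      ("mov",  dst, src) copies src into dst
      ("add",  dst, src) adds src to dst; both must be int
    """
    regs = dict(pre)  # do not mutate the caller's pre-condition
    for op, dst, src in code:
        if op == "movi":
            regs[dst] = "int"
        elif op == "mov":
            if src not in regs:
                raise TypeError(f"{src} has no type at this point")
            regs[dst] = regs[src]
        elif op == "add":
            if regs.get(dst) != "int" or regs.get(src) != "int":
                raise TypeError(f"add needs int operands: {dst}, {src}")
            regs[dst] = "int"
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


# The producer supplies one annotation (the pre-condition), not one per line.
post = check_block(
    {"EAX": "int"},
    [("movi", "EBX", ""), ("add", "EAX", "EBX"), ("mov", "ECX", "EAX")],
)
```

The annotation cost is one record per block; the time cost is the forward pass, which the verifier has to perform anyway.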

SLIDE 9

Approach #4 -- Compress

Let gzip and domain-specific tricks solve our problems

  • For annotation size, no reason not to compress
  • Easy to pipeline decompression, but certification is not I/O bound

Then again, object code compresses too
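A small sketch of why general-purpose compression helps with annotation size, using Python's standard `zlib` module (the DEFLATE algorithm that gzip also uses). The annotation text below is invented for illustration: one pre-condition shape repeated for many labels, as a compiler with fixed invariants might emit. It is not real TAL output, and the ratio it yields says nothing about the talk's measurements.

```python
import zlib

# Hypothetical, highly repetitive annotation stream (an assumption).
PRECONDITION = "{ESP: ret::int::r1, EBP: handler::r2, EBX: a, ESI: b, EDI: c}"
annotations = "\n".join(
    f"label_{i}: {PRECONDITION}" for i in range(1000)
).encode("utf-8")

# Compress at the highest level; the verifier side must decompress first.
compressed = zlib.compress(annotations, 9)
restored = zlib.decompress(compressed)

# Fraction of the original size that remains after compression.
ratio = len(compressed) / len(annotations)
```

Decompression is an extra pass before checking starts, which matters little if certification time is dominated by type checking rather than I/O.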

SLIDE 10

Approach #5 -- Abbreviate

Give the code producer type-level tools for parameterization and re-use

  • Just (terminating) functions at the type level
  • Usually easy for the code producer
  • Improves certificate size, but may hurt certification time

Not much harder than implementing the lambda-calculus
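A rough sketch of the idea, assuming types are modeled as plain strings: an abbreviation is a terminating function from type arguments to a full type, and the certificate stores only the short application. The names `fn_type`, `ABBREVS` and `expand`, and the simplified pre-condition shape, are hypothetical and much smaller than a real TAL calling-convention type.

```python
from typing import Callable, Dict, List

# A type-level abbreviation: type arguments in, fully expanded type out.
Abbrev = Callable[[List[str], str], str]


def fn_type(args: List[str], result: str) -> str:
    """Hypothetical abbreviation for a function pre-condition.

    The stack holds a return address (which expects the result in EAX)
    followed by the arguments and the rest of the stack, r1.
    """
    stack = "::".join(args + ["r1"])
    return_addr = f"{{EAX:{result}, ESP:r1}}"
    return f"{{ESP: {return_addr}::{stack}}}"


ABBREVS: Dict[str, Abbrev] = {"F": fn_type}


def expand(name: str, args: List[str], result: str) -> str:
    """What the verifier does on each use: look up and apply."""
    return ABBREVS[name](args, result)


short = "F [int] int"                 # what the certificate stores
full = expand("F", ["int"], "int")    # what the verifier checks against
```

The certificate shrinks because `short` is stored once per label, while the verifier pays for `expand` on every use, which is the size-versus-time trade-off named above.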

SLIDE 11

Approaches Summary

  • Bake it in
  • Don’t optimize
  • Reconstruct
  • Compress
  • Abbreviate

Now let’s get our hands dirty...

SLIDE 12

An Example – Code Pre-condition

int foo(int x) { return x; }

foo:τ
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: where are the arguments, results, return address, exception handler (what’s an exception anyway), ...

SLIDE 13

Bake it in...

int foo(int x) { return x; }

foo:int→int
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: where are the arguments, results, return address, exception handler (what’s an exception anyway), ...

SLIDE 14

Really bake it in...

int foo(int x) { return x; }

foo_Fii:
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: where are the arguments, results, return address, exception handler (what’s an exception anyway), ...

SLIDE 15

Or spell it all out...

int foo(int x) { return x; }

foo:∀a:T,b:T,c:T,r1:S,r2:S,e1:C,e2:C.
 {ESP: {ESP:int::r1@{EAX:exn,ESP:r2,M:e2}::r2
        EAX:int, EBX:a,ESI:b,EDI:c, M:e1+e2,
        EBP: {EAX:exn,ESP:r2,M:e2}::r2,
       }::int::r1@{EAX:exn,ESP:r2,M:e2}::r2,
  EBP: {EAX:exn,ESP:r2,M:e2}::r2,
  EBX:a, ESI:b, EDI:c, M:e1+e2}
  MOV EAX, [ESP+0]
  RETN

Pre-condition describes calling convention: arguments, results, return address pre-condition, callee-save registers, exception handler, ...

SLIDE 16

What to do?

∀a:T,b:T,c:T,r1:S,r2:S,e1:C,e2:C.
 {ESP: {ESP:int::r1@{EAX:exn,ESP:r2,M:e2}::r2
        EAX:int, EBX:a,ESI:b,EDI:c, M:e1+e2,
        EBP: {EAX:exn,ESP:r2,M:e2}::r2,
       }::int::r1@{EAX:exn,ESP:r2,M:e2}::r2,
  EBP: {EAX:exn,ESP:r2,M:e2}::r2,
  EBX:a, ESI:b, EDI:c, M:e1+e2}

  • Compress (compiler invariants are very repetitious)
  • Don’t optimize (fewer invariants)
  • Abbreviate:

foo: F [int] int

F = λ args λ results . ...

(In the type above, args marks the two argument positions and result marks the result position.)

SLIDE 17

And Reconstruction Too

If we elide a pre-condition, the verifier can re-verify the block for each predecessor

  • Restrict to forward jumps to prevent loops
  • Beware exponential blowup
  • Bad news: Optimal type placement appears intractable
  • Good news: Naive heuristics save significant space
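
A toy model of the blowup, under stated assumptions: the control-flow graph has forward jumps only, verification starts at every annotated block, and a jump to an un-annotated block re-verifies that block in place. The graph shape (`diamond_chain`) and the cost measure (number of block checks) are illustrative choices, not the talk's benchmark or the real verifier's algorithm.

```python
from typing import Dict, List, Set

Cfg = Dict[str, List[str]]  # block label -> successor labels (forward only)


def diamond_chain(n: int) -> Cfg:
    """n if/else diamonds in sequence: d0 -> (l0 | r0) -> d1 -> ... -> dn."""
    cfg: Cfg = {}
    for i in range(n):
        cfg[f"d{i}"] = [f"l{i}", f"r{i}"]
        cfg[f"l{i}"] = [f"d{i + 1}"]
        cfg[f"r{i}"] = [f"d{i + 1}"]
    return cfg


def checks_from(block: str, cfg: Cfg, annotated: Set[str]) -> int:
    """Block checks performed when verification starts at `block`."""
    total = 1  # the block itself
    for succ in cfg.get(block, []):
        # An annotated successor is only compared against its pre-condition;
        # an un-annotated one is re-verified for this predecessor.
        if succ not in annotated:
            total += checks_from(succ, cfg, annotated)
    return total


def total_checks(cfg: Cfg, annotated: Set[str]) -> int:
    """Total work: one verification pass starting at each annotated block."""
    return sum(checks_from(b, cfg, annotated) for b in annotated)


cfg = diamond_chain(10)
entry_only = total_checks(cfg, {"d0"})                        # 4093 checks
join_points = total_checks(cfg, {f"d{i}" for i in range(11)})  # 31 checks
```

With only the entry annotated, work doubles with every diamond; annotating just the join points (a naive heuristic) brings it back to linear while still leaving two of every three blocks un-annotated.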

SLIDE 18

A real application

A bootstrapping compiler from Popcorn to TAL

  • Popcorn:
      • “Java w/o objects, w/ polymorphism and limited pattern-matching”
      • “ML w/o closures or modules, w/ C-like core syntax”
      • “Safe C – pointerful, garbage collection, exceptions”
  • Compiler:
      • Conventional
      • Graph-coloring register allocation, null-check elimination
  • Verifier: OCaml 2.04
  • System: Pentium II, 266MHz, 64MB, NT4.0

SLIDE 19

Bottom line – it works

  • Source code: 18KLOC, 39 files
  • Target code: 816 Kb (335 Kb after strip)
  • Target types: 419 Kb
  • Compilation: 40 secs
  • Assembly: 20 secs
  • Verification: 34.5 secs

And proportional to file size

SLIDE 20

The engineering matters

(Recall: 419Kb of types, 34.5 secs to verify)

  • Without abbreviations: 2041Kb
  • Without pre-condition elision: 550Kb
  • Without either: 4500Kb
  • As much elision as legal: 402Kb, 740 secs
  • gzip reduces the 419Kb to 163Kb
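
The figures above are easier to compare as ratios against the 419Kb / 34.5 sec baseline. The arithmetic below uses only numbers from these slides (the 335 Kb stripped code size comes from the previous slide); the variable names are just labels for this sketch.

```python
BASELINE_KB = 419      # types with abbreviations and heuristic elision
BASELINE_SECS = 34.5   # verification time for that configuration
STRIPPED_CODE_KB = 335

sizes_kb = {
    "without abbreviations": 2041,
    "without pre-condition elision": 550,
    "without either": 4500,
    "as much elision as legal": 402,
    "after gzip": 163,
}

# Size of each configuration relative to the baseline annotations.
relative_size = {
    name: round(kb / BASELINE_KB, 2) for name, kb in sizes_kb.items()
}
# without abbreviations: 4.87, without pre-condition elision: 1.31,
# without either: 10.74, as much elision as legal: 0.96, after gzip: 0.39

# Maximal elision saves about 4% of space but costs about 21x the time.
slowdown = round(740 / BASELINE_SECS, 1)                      # 21.4

# Annotations are about 1.25x the size of the stripped object code.
types_vs_code = round(BASELINE_KB / STRIPPED_CODE_KB, 2)      # 1.25
```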

SLIDE 21

Also studied...

  • Differences among code styles
  • Techniques for speeding up the verifier
  • Other forms of reconstruction
  • Being “gzip-friendly”

SLIDE 22

Some engineering lessons

  • Compiler-independence produces large repetitious annotations.
  • Abbreviations are easy and space-effective, but not time-effective.
  • Overhead should never be proportional to the number of loop-free paths in the code.
  • Certification bottlenecks often do not appear in small, simple programs.