Programming languages and their trustworthy implementation Xavier - - PowerPoint PPT Presentation

programming languages and their trustworthy implementation
SMART_READER_LITE
LIVE PREVIEW

Programming languages and their trustworthy implementation Xavier - - PowerPoint PPT Presentation

Programming languages and their trustworthy implementation Xavier Leroy INRIA Paris Van Wijngaarden award, 2016-11-05 A brief history of programming languages and their compilation Its all zeros and ones, right? 10111000 00000001


slide-1
SLIDE 1

Programming languages and their trustworthy implementation

Xavier Leroy

INRIA Paris

Van Wijngaarden award, 2016-11-05

slide-2
SLIDE 2

A brief history

  • f programming languages

and their compilation

slide-3
SLIDE 3

It’s all zeros and ones, right?

10111000 00000001 00000000 00000000 00000000 10111010 00000010 00000000 00000000 00000000 00111001 11011010 01111111 00000110 00001111 10101111 11000010 01000010 11101011 11110110 11000011

(x86 machine code for the factorial function)

Machine code is. That doesn’t make it a usable language.

slide-4
SLIDE 4

Antiquity (1950): assembly language

A textual representation of machine code, with mnemonic names for instructions, symbolic names for code and data labels, and comments for humans to read.

Example (Factorial in x86 assembly language)

; Input: argument N in register EBX ; Output: factorial N in register EAX Factorial: mov eax, 1 ; initial result = 1 mov edx, 2 ; loop index = 2 L1: cmp edx, ebx ; while loop <= N ... jg L2 imul eax, edx ; multiply result by index inc edx ; increment index jmp L1 ; end while L2: ret ; end Factorial function

slide-5
SLIDE 5

The Renaissance: arithmetic expressions

(FORTRAN 1957)

Express mathematical formulas the way we write them on paper. x1, x2 = −b ± √ b2 − 4ac 2a In assembly: In FORTRAN:

mul t1, b, b sub x1, d, b D = SQRT(B*B - 4*A*C) mul t2, a, c div x1, x1, t3 X1 = (-B + D) / (2*A) mul t2, t2, 4 neg x2, b X2 = (-B - D) / (2*A) sub t1, t1, t2 sub x2, x2, d sqrt d, t1 div x2, x2, t3 mul t3, a, 2

slide-6
SLIDE 6

A historical parallel with mathematics

Brahmagupta, 628: Whatever is the square-root of the rupas multiplied by the square [and] increased by the square of half the unknown, diminish that by half the unknown [and] divide [the remainder] by its square. [The result is] the unknown. Cardano, Vi` ete, et al, 1550–1600: x1, x2 = −b ± √ b2 − 4ac 2a

slide-7
SLIDE 7

The Enlightenment: functions, procedures and recursion

(Lisp, 1958; Algol, 1960)

procedure q u a d r a t i c ( x1 , x2 , a , b , c ) ; value a , b , c ; real a , b , c , x1 , x2 ; begin real d ; d := s q r t (b ∗ b − 4 ∗ a ∗ c ) ; x1 := (−b + d) / (2 ∗ a ) ; x2 := (−b − d) / (2 ∗ a ) end ; integer procedure f a c t o r i a l (n) ; value n ; integer n ; begin i f n < 2 then f a c t o r i a l := 1 else f a c t o r i a l := n ∗ f a c t o r i a l (n−1) end ;

slide-8
SLIDE 8

Industrial revolution and modern times

APL 1962, Algol W 1966, ISWIM 1966, BCPL 1967, Algol 1968, Pascal 1970, C 1972, Prolog 1972, ML 1973, CLU 1974, Modula 1975, Smalltalk 1976, Ada 1983, C++ 1983, Common Lisp 1984, Eiffel 1986, Modula-3 1989, Haskell 1990, Python 1991, Java 1995, OCaml 1996, Javascript 1997, C# 2000, Scala 2003, Go 2009, Rust 2010, Swift 2014

A proliferation of languages that provide support for high-level programming constructs.

slide-9
SLIDE 9

Implementing programming languages

1940 1950 1960 1970 1980 1990 2000 2010 Expressiveness of machine language C

  • m

p l e x i t y

  • f

a p p l i c a t i

  • n

s Expressiveness of programming languages Compilation Programming

slide-10
SLIDE 10

The challenge of compilation

1 Translate faithfully a high-level programming language into

very low-level machine language.

2 “Optimize”, or more exactly improve performance of

generated machine code:

  • by taking advantage of hardware features;
  • by eliminating inefficiencies left by the programmer.
slide-11
SLIDE 11

An example of optimizing compilation

  • a ·

b =

i<n

  • i=0

aibi double dotproduct(int n, double * a, double * b) { double dp = 0.0; int i; for (i = 0; i < n; i++) dp = dp + a[i] * b[i]; return dp; } Compiled with a good compiler, then manually decompiled to C. . .

slide-12
SLIDE 12

double dotproduct(int n, double a[], double b[]) { dp = 0.0; if (n <= 0) goto L5; r2 = n - 3; f1 = 0.0; r1 = 0; f10 = 0.0; f11 = 0.0; if (r2 > n || r2 <= 0) goto L19; prefetch(a[16]); prefetch(b[16]); if (4 >= r2) goto L14; prefetch(a[20]); prefetch(b[20]); f12 = a[0]; f13 = b[0]; f14 = a[1]; f15 = b[1]; r1 = 8; if (8 >= r2) goto L16; L17: f16 = b[2]; f18 = a[2]; f17 = f12 * f13; f19 = b[3]; f20 = a[3]; f15 = f14 * f15; f12 = a[4]; f16 = f18 * f16; f19 = f29 * f19; f13 = b[4]; a += 4; f14 = a[1]; f11 += f17; r1 += 4; f10 += f15; f15 = b[5]; prefetch(a[20]); prefetch(b[24]); f1 += f16; dp += f19; b += 4; if (r1 < r2) goto L17; L16: f15 = f14 * f15; f21 = b[2]; f23 = a[2]; f22 = f12 * f13; f24 = b[3]; f25 = a[3]; f21 = f23 * f21; f12 = a[4]; f13 = b[4]; f24 = f25 * f24; f10 = f10 + f15; a += 4; b += 4; f14 = a[8]; f15 = b[8]; f11 += f22; f1 += f21; dp += f24; L18: f26 = b[2]; f27 = a[2]; f14 = f14 * f15; f28 = b[3]; f29 = a[3]; f12 = f12 * f13; f26 = f27 * f26; a += 4; f28 = f29 * f28; b += 4; f10 += f14; f11 += f12; f1 += f26; dp += f28; dp += f1; dp += f10; dp += f11; if (r1 >= n) goto L5; L19: f30 = a[0]; f18 = b[0]; r1 += 1; a += 8; f18 = f30 * f18; b += 8; dp += f18; if (r1 < n) goto L19; L5: return dp; L14: f12 = a[0]; f13 = b[0]; f14 = a[1]; f15 = b[1]; goto L18; }

slide-13
SLIDE 13

L17: f16 = b[2]; f18 = a[2]; f17 = f12 * f13; f19 = b[3]; f20 = a[3]; f15 = f14 * f15; f12 = a[4]; f16 = f18 * f16; f19 = f29 * f19; f13 = b[4]; a += 4; f14 = a[1]; f11 += f17; r1 += 4; f10 += f15; f15 = b[5]; prefetch(a[20]); prefetch(b[24]); f1 += f16; dp += f19; b += 4; if (r1 < r2) goto L17;

slide-14
SLIDE 14

double dotproduct(int n, double a[], double b[]) { dp = 0.0; if (n <= 0) goto L5; r2 = n - 3; f1 = 0.0; r1 = 0; f10 = 0.0; f11 = 0.0; if (r2 > n || r2 <= 0) goto L19; prefetch(a[16]); prefetch(b[16]); if (4 >= r2) goto L14; prefetch(a[20]); prefetch(b[20]); f12 = a[0]; f13 = b[0]; f14 = a[1]; f15 = b[1]; r1 = 8; if (8 >= r2) goto L16; L16: f15 = f14 * f15; f21 = b[2]; f23 = a[2]; f22 = f12 * f13; f24 = b[3]; f25 = a[3]; f21 = f23 * f21; f12 = a[4]; f13 = b[4]; f24 = f25 * f24; f10 = f10 + f15; a += 4; b += 4; f14 = a[8]; f15 = b[8]; f11 += f22; f1 += f21; dp += f24; L18: f26 = b[2]; f27 = a[2]; f14 = f14 * f15; f28 = b[3]; f29 = a[3]; f12 = f12 * f13; f26 = f27 * f26; a += 4; f28 = f29 * f28; b += 4; f10 += f14; f11 += f12; f1 += f26; dp += f28; dp += f1; dp += f10; dp += f11; if (r1 >= n) goto L5; L19: f30 = a[0]; f18 = b[0]; r1 += 1; a += 8; f18 = f30 * f18; b += 8; dp += f18; if (r1 < n) goto L19; L5: return dp; L14: f12 = a[0]; f13 = b[0]; f14 = a[1]; f15 = b[1]; goto L18; }

slide-15
SLIDE 15

Can you trust your compiler?

slide-16
SLIDE 16

Miscompilation happens

We tested thirteen production-quality C compilers and, for each, found situations in which the compiler generated incorrect code for accessing volatile variables.

  • E. Eide & J. Regehr, EMSOFT 2008

To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler

  • developers. Every compiler we tested was found to crash and

also to silently generate wrong code when presented with valid input.

  • X. Yang, Y. Chen, E. Eide & J. Regehr, PLDI 2011
slide-17
SLIDE 17

Are miscompilation bugs a problem?

For non-critical software:

  • Programmers rarely run into them.
  • Globally negligible compared with bugs in the program itself.

For critical software:

  • A source of concern.
  • Require additional verification activities. (E.g. manual reviews
  • f generated assembly code; more tests.)
  • Reduce the usefulness of formal verification.

(A provably-correct source program can still misbehave at run-time!)

slide-18
SLIDE 18

Addressing miscompilation

A radical solution: why not formally verify the compiler itself? After all, compilers have simple specifications: If compilation succeeds, the generated code should behave as prescribed by the semantics of the source program. As a corollary, we obtain: Any safety property of the observable behavior of the source program carries over to the generated executable code.

slide-19
SLIDE 19

An old idea. . .

Mathematical Aspects of Computer Science, 1967

slide-20
SLIDE 20

An old idea. . .

Machine Intelligence (7), 1972.

slide-21
SLIDE 21

CompCert: a formally-verified C compiler

slide-22
SLIDE 22

The CompCert project

(X. Leroy, S. Blazy, et al)

Develop and prove correct a realistic compiler, usable for critical embedded software.

  • Source language: a very large subset of C 99.
  • Target language: PowerPC/ARM/x86 assembly.
  • Generates reasonably compact and fast code

⇒ careful code generation; some optimizations. Note: compiler written from scratch, along with its proof; not trying to prove an existing compiler.

slide-23
SLIDE 23

The formally verified part of the compiler

CompCert C Clight C#minor Cminor CminorSel RTL LTL Linear Mach Asm PPC Asm ARM Asm x86

side-effects out

  • f expressions

type elimination loop simplifications stack allocation

  • f “&” variables

instruction selection CFG construction

  • expr. decomp.

register allocation (IRC) calling conventions linearization

  • f the CFG

layout of stack frames asm code generation Optimizations: constant prop., CSE, inlining, tail calls

slide-24
SLIDE 24

Formally verified using Coq

The correctness proof (semantic preservation) for the compiler is entirely machine-checked, using the Coq proof assistant.

Theorem transf_c_program_correct: forall (p: Csyntax.program) (tp: Asm.program) (b: behavior), transf_c_program p = OK tp -> program_behaves (Asm.semantics tp) b -> exists b’, program_behaves (Csem.semantics p) b’ /\ behavior_improves b’ b.

Shows refinement of observable behaviors beh:

  • Reduction of internal nondeterminism

(e.g. choose one evaluation order among the several allowed by C)

  • Replacement of run-time errors by more defined behaviors

(e.g. optimize away a division by zero)

slide-25
SLIDE 25

Compiler verification patterns (for each pass)

transformation transformation validator × transformation untrusted solver × checker Verified transformation Verified translation validation External solver with verified validation = formally verified = not verified

slide-26
SLIDE 26

Programmed (mostly) in Coq

All the verified parts of the compiler are programmed directly in Coq’s specification language, using pure functional style.

  • Monads to handle errors and mutable state.
  • Purely functional data structures.

Coq’s extraction mechanism produces executable Caml code from these specifications. Claim: purely functional programming is the shortest path to writing and proving a program.

slide-27
SLIDE 27

The whole Compcert compiler

AST C AST Asm C source Assembly Executable

preprocessing, parsing, AST construction type-checking, de-sugaring Verified compiler printing of asm syntax assembling linking Register allocation Code linearization heuristics

Proved in Coq

(extracted to Caml)

Not proved

(hand-written in Caml) Part of the TCB Not part of the TCB

slide-28
SLIDE 28

Performance of generated code

(On a Power 7 processor) fib qsort fft sha1 aes almabench lists binarytrees fannkuch knucleotide mandelbrot nbody nsieve nsievebits spectral vmach bisect chomp perlin arcode lzw lzss raytracer Execution time gcc -O0 CompCert gcc -O1 gcc -O3

slide-29
SLIDE 29

A tangible increase in quality

The striking thing about our CompCert results is that the middleend bugs we found in all other compilers are

  • absent. As of early 2011, the under-development version
  • f CompCert is the only compiler we have tested for

which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task. The apparent unbreakability of CompCert supports a strong argument that developing compiler

  • ptimizations within a proof framework, where safety

checks are explicit and machine-checked, has tangible benefits for compiler users.

  • X. Yang, Y. Chen, E. Eide, J. Regehr, PLDI 2011
slide-30
SLIDE 30

Conclusions and perspectives

slide-31
SLIDE 31

Ongoing and future work

Verifying program provers & static analyzers Other source languages More assurance More

  • ptimizations

“Bootstrapping” (verified extraction) Shared-memory concurrency Connections w/ hardware verification Other source languages besides C: experiments in progress with functional languages, SPARK Ada and SCADE/Lustre.

slide-32
SLIDE 32

Ongoing and future work

Verifying program provers & static analyzers Other source languages More assurance More

  • ptimizations

“Bootstrapping” (verified extraction) Shared-memory concurrency Connections w/ hardware verification Prove or validate more of the trusted base: preprocessing, lexing, elaboration, assembling, linking, . . .

slide-33
SLIDE 33

Ongoing and future work

Verifying program provers & static analyzers Other source languages More assurance More

  • ptimizations

“Bootstrapping” (verified extraction) Shared-memory concurrency Connections w/ hardware verification Add advanced optimizations, esp. loop optimizations.

slide-34
SLIDE 34

Ongoing and future work

Verifying program provers & static analyzers Other source languages More assurance More

  • ptimizations

“Bootstrapping” (verified extraction) Shared-memory concurrency Connections w/ hardware verification Gain formal confidence in the tools that build CompCert. (Coq’s extraction, OCaml compilation.)

slide-35
SLIDE 35

Ongoing and future work

Verifying program provers & static analyzers Other source languages More assurance More

  • ptimizations

“Bootstrapping” (verified extraction) Shared-memory concurrency Connections w/ hardware verification Race-free programs + concurrent separation logic (A. Appel et al)

  • r: racy programs + hardware memory models (P. Sewell et al).
slide-36
SLIDE 36

Ongoing and future work

Verifying program provers & static analyzers Other source languages More assurance More

  • ptimizations

“Bootstrapping” (verified extraction) Shared-memory concurrency Connections w/ hardware verification Formal specs for architectures & instruction sets, as the missing link between compiler verification and hardware verification.

slide-37
SLIDE 37

Ongoing and future work

Verifying program provers & static analyzers Other source languages More assurance More

  • ptimizations

“Bootstrapping” (verified extraction) Shared-memory concurrency Connections w/ hardware verification The Verasco project: formal verification of a static analyzer based

  • n abstract interpretation.
slide-38
SLIDE 38

In closing. . .

Critical software deserves the most trustworthy tools that computer science can produce. Let’s make this a reality!