Programming languages and their trustworthy implementation
Xavier Leroy, INRIA Paris
Van Wijngaarden award, 2016-11-05

A brief history of programming languages and their compilation

It’s all zeros and ones, right?

    10111000 00000001 00000000 00000000 00000000
    10111010 00000010 00000000 00000000 00000000
    00111001 11011010
    01111111 00000110
    00001111 10101111 11000010
    01000010
    11101011 11110110
    11000011

(x86 machine code for the factorial function)

Machine code is. That doesn’t make it a usable language.

Antiquity (1950): assembly language

A textual representation of machine code, with mnemonic names for instructions, symbolic names for code and data labels, and comments for humans to read.

Example (Factorial in x86 assembly language)

    ; Input: argument N in register EBX
    ; Output: factorial N in register EAX
    Factorial:
        mov eax, 1      ; initial result = 1
        mov edx, 2      ; loop index = 2
    L1: cmp edx, ebx    ; while loop <= N ...
        jg L2
        imul eax, edx   ; multiply result by index
        inc edx         ; increment index
        jmp L1          ; end while
    L2: ret             ; end Factorial function
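For comparison, the same loop can be written in C. This is only a sketch that follows the assembly listing line by line; the function name, the 32-bit unsigned types, and the stated range of n are assumptions made for the illustration, not part of the original slides.

```c
#include <stdint.h>

/* Iterative factorial, structured like the assembly listing:
   result plays the role of EAX, index of EDX, and n of EBX.
   Assumption of this sketch: n <= 12, since 13! no longer fits
   in 32 bits. Unsigned types are used so that arithmetic wraps
   around instead of being undefined on overflow. */
uint32_t factorial(uint32_t n)
{
    uint32_t result = 1;                            /* mov eax, 1            */
    for (uint32_t index = 2; index <= n; index++)   /* cmp, jg, inc, jmp     */
        result *= index;                            /* imul eax, edx         */
    return result;                                  /* ret                   */
}
```

Calling factorial(5) walks the loop for index = 2, 3, 4, 5, exactly as the assembly version does with EDX.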

The Renaissance: arithmetic expressions (FORTRAN 1957)

Express mathematical formulas the way we write them on paper.

$$x_1, x_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

In assembly:

    mul  t1, b, b
    mul  t2, a, c
    mul  t2, t2, 4
    sub  t1, t1, t2
    sqrt d, t1
    mul  t3, a, 2
    sub  x1, d, b
    div  x1, x1, t3
    neg  x2, b
    sub  x2, x2, d
    div  x2, x2, t3

In FORTRAN:

    D = SQRT(B*B - 4*A*C)
    X1 = (-B + D) / (2*A)
    X2 = (-B - D) / (2*A)

A historical parallel with mathematics

Brahmagupta, 628:

    Whatever is the square-root of the rupas multiplied by the square [and] increased by the square of half the unknown, diminish that by half the unknown [and] divide [the remainder] by its square. [The result is] the unknown.

Cardano, Viète, et al., 1550–1600:

$$x_1, x_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The Enlightenment: functions, procedures and recursion (Lisp, 1958; Algol, 1960)

    procedure quadratic(x1, x2, a, b, c);
      value a, b, c; real a, b, c, x1, x2;
    begin
      real d;
      d := sqrt(b * b - 4 * a * c);
      x1 := (-b + d) / (2 * a);
      x2 := (-b - d) / (2 * a)
    end;

    integer procedure factorial(n);
      value n; integer n;
    begin
      if n < 2 then factorial := 1
      else factorial := n * factorial(n - 1)
    end;

Industrial revolution and modern times

APL 1962, Algol W 1966, ISWIM 1966, BCPL 1967, Algol 1968, Pascal 1970, C 1972, Prolog 1972, ML 1973, CLU 1974, Modula 1975, Smalltalk 1976, Ada 1983, C++ 1983, Common Lisp 1984, Eiffel 1986, Modula-3 1989, Haskell 1990, Python 1991, Java 1995, OCaml 1996, JavaScript 1997, C# 2000, Scala 2003, Go 2009, Rust 2010, Swift 2014

A proliferation of languages that provide support for high-level programming constructs.

Implementing programming languages

[Figure: vertical axis "Complexity of applications"; horizontal axis years 1940 to 2010; labels: Programming, Expressiveness of programming languages, Compilation, Expressiveness of machine language.]

The challenge of compilation

1. Translate faithfully a high-level programming language into very low-level machine language.
2. “Optimize”, or more exactly improve performance of generated machine code:
   • by taking advantage of hardware features;
   • by eliminating inefficiencies left by the programmer.
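To make "eliminating inefficiencies left by the programmer" concrete, here is a small before/after sketch in C. The function names and the particular transformations shown (hoisting a loop-invariant expression and replacing a multiplication by a running sum) are illustrative assumptions; the slides do not say which optimizations a given compiler applies.

```c
#include <stddef.h>

/* As a programmer might write it: the expression
   scale * scale + offset is recomputed on every iteration. */
void fill_naive(unsigned *out, size_t n, unsigned scale, unsigned offset)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned)i * (scale * scale + offset);
}

/* What an optimizer may effectively turn it into: the invariant
   expression is computed once (loop-invariant code motion) and the
   per-iteration multiplication becomes an addition (strength
   reduction). Unsigned arithmetic wraps around, so both versions
   store the same values for all inputs. */
void fill_optimized(unsigned *out, size_t n, unsigned scale, unsigned offset)
{
    unsigned step = scale * scale + offset;
    unsigned value = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = value;
        value += step;
    }
}
```

The point of the sketch is that both functions have the same observable behavior, which is exactly what an optimizing compiler must guarantee when it rewrites the first form into the second.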

An example of optimizing compilation

$$\vec{a} \cdot \vec{b} = \sum_{i=0}^{i<n} a_i b_i$$

    double dotproduct(int n, double * a, double * b)
    {
      double dp = 0.0;
      int i;
      for (i = 0; i < n; i++) dp = dp + a[i] * b[i];
      return dp;
    }

Compiled with a good compiler, then manually decompiled to C...

    double dotproduct(int n, double a[], double b[])
    {
      dp = 0.0;
      if (n <= 0) goto L5;
      r2 = n - 3; f1 = 0.0; r1 = 0; f10 = 0.0; f11 = 0.0;
      if (r2 > n || r2 <= 0) goto L19;
      prefetch(a[16]); prefetch(b[16]);
      if (4 >= r2) goto L14;
      prefetch(a[20]); prefetch(b[20]);
      f12 = a[0]; f13 = b[0]; f14 = a[1]; f15 = b[1];
      r1 = 8;
      if (8 >= r2) goto L16;
    L17:
      f16 = b[2]; f18 = a[2]; f17 = f12 * f13;
      f19 = b[3]; f20 = a[3]; f15 = f14 * f15;
      f12 = a[4]; f16 = f18 * f16;
      f19 = f29 * f19; f13 = b[4]; a += 4; f14 = a[1];
      f11 += f17; r1 += 4; f10 += f15;
      f15 = b[5]; prefetch(a[20]); prefetch(b[24]);
      f1 += f16; dp += f19; b += 4;
      if (r1 < r2) goto L17;
    L16:
      f15 = f14 * f15; f21 = b[2]; f23 = a[2]; f22 = f12 * f13;
      f24 = b[3]; f25 = a[3]; f21 = f23 * f21;
      f12 = a[4]; f13 = b[4]; f24 = f25 * f24; f10 = f10 + f15;
      a += 4; b += 4; f14 = a[8]; f15 = b[8];
      f11 += f22; f1 += f21; dp += f24;
    L18:
      f26 = b[2]; f27 = a[2]; f14 = f14 * f15;
      f28 = b[3]; f29 = a[3]; f12 = f12 * f13; f26 = f27 * f26;
      a += 4; f28 = f29 * f28; b += 4;
      f10 += f14; f11 += f12; f1 += f26;
      dp += f28; dp += f1; dp += f10; dp += f11;
      if (r1 >= n) goto L5;
    L19:
      f30 = a[0]; f18 = b[0]; r1 += 1; a += 8;
      f18 = f30 * f18; b += 8;
      dp += f18;
      if (r1 < n) goto L19;
    L5:
      return dp;
    L14:
      f12 = a[0]; f13 = b[0]; f14 = a[1]; f15 = b[1];
      goto L18;
    }
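The decompiled listing is hard to read, but one of its ingredients, processing four elements per iteration with four separate running sums (dp, f1, f10, f11 in the listing), can be sketched by hand in readable C. The name dotproduct_unrolled and the exact loop structure are assumptions of this sketch; it leaves out the prefetching and the software pipelining visible in the compiler output.

```c
/* Reference version, as on the slide. */
double dotproduct(int n, const double *a, const double *b)
{
    double dp = 0.0;
    for (int i = 0; i < n; i++)
        dp = dp + a[i] * b[i];
    return dp;
}

/* Hand-written sketch of 4-way unrolling with four accumulators.
   Note: this changes the order of the additions, so for general
   floating-point inputs the result can differ from the reference
   in the last bits. */
double dotproduct_unrolled(int n, const double *a, const double *b)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i = 0;

    /* Main loop: four independent multiply-adds per iteration. */
    for (; n - i >= 4; i += 4) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }

    /* Combine the partial sums, then handle the 0 to 3 leftovers. */
    double dp = (s0 + s1) + (s2 + s3);
    for (; i < n; i++)
        dp += a[i] * b[i];
    return dp;
}
```

Keeping the four sums independent lets the hardware overlap the multiplications and additions instead of waiting for a single accumulator, which is one way of "taking advantage of hardware features".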

Can you trust your compiler?

Miscompilation happens

    We tested thirteen production-quality C compilers and, for each, found situations in which the compiler generated incorrect code for accessing volatile variables.
    (E. Eide & J. Regehr, EMSOFT 2008)

    To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input.
    (X. Yang, Y. Chen, E. Eide & J. Regehr, PLDI 2011)
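The idea behind randomized testing can be shown at a much smaller scale: feed the same pseudo-random inputs to a trusted reference and to a candidate, and count disagreements. The sketch below is only an analogy. It compares two C functions rather than the outputs of two compilers, and every name in it (count_mismatches, the popcount variants, the random generator) is hypothetical, not taken from Csmith.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*unary_fn)(uint32_t);

/* Small linear congruential generator; good enough for a sketch. */
static uint32_t next_random(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Run both functions on the same pseudo-random inputs and return
   the number of inputs on which their results differ. */
size_t count_mismatches(unary_fn reference, unary_fn candidate,
                        uint32_t seed, size_t trials)
{
    size_t mismatches = 0;
    uint32_t state = seed;
    for (size_t t = 0; t < trials; t++) {
        uint32_t input = next_random(&state);
        if (reference(input) != candidate(input))
            mismatches++;
    }
    return mismatches;
}

/* Reference: count the set bits one at a time. */
uint32_t popcount_reference(uint32_t x)
{
    uint32_t count = 0;
    while (x != 0) { count += x & 1u; x >>= 1; }
    return count;
}

/* Candidate: clear the lowest set bit on each iteration. */
uint32_t popcount_fast(uint32_t x)
{
    uint32_t count = 0;
    while (x != 0) { x &= x - 1u; count++; }
    return count;
}

/* Deliberately wrong candidate: silently ignores the top bit. */
uint32_t popcount_buggy(uint32_t x)
{
    return popcount_fast(x & 0x7FFFFFFFu);
}
```

Like the wrong-code bugs described in the quotes, popcount_buggy never crashes; it only shows up because its answers disagree with the reference on some inputs.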

Are miscompilation bugs a problem?

For non-critical software:
• Programmers rarely run into them.
• Globally negligible compared with bugs in the program itself.

For critical software:
• A source of concern.
• Require additional verification activities. (E.g. manual reviews of generated assembly code; more tests.)
• Reduce the usefulness of formal verification. (A provably-correct source program can still misbehave at run-time!)

Addressing miscompilation

A radical solution: why not formally verify the compiler itself?

After all, compilers have simple specifications:

    If compilation succeeds, the generated code should behave as prescribed by the semantics of the source program.

As a corollary, we obtain:

    Any safety property of the observable behavior of the source program carries over to the generated executable code.
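The specification above can be illustrated on a toy scale. The sketch below defines a tiny expression language with an interpreter (the source semantics), a compiler to stack-machine code, and an executor for that code (the target semantics), then checks that both agree on a given expression. All names, the instruction set, and the fixed buffer sizes are assumptions made for this illustration. Note that it only tests the property on the inputs it is given, whereas a verified compiler establishes it by proof for every source program.

```c
#include <stddef.h>

/* Source language: constants, additions and multiplications. */
typedef enum { EXPR_CONST, EXPR_ADD, EXPR_MUL } expr_kind;

typedef struct expr {
    expr_kind kind;
    unsigned long value;               /* used by EXPR_CONST only */
    const struct expr *left, *right;   /* used by EXPR_ADD, EXPR_MUL */
} expr;

/* Source semantics: a direct recursive interpreter. */
unsigned long eval(const expr *e)
{
    switch (e->kind) {
    case EXPR_CONST: return e->value;
    case EXPR_ADD:   return eval(e->left) + eval(e->right);
    default:         return eval(e->left) * eval(e->right);
    }
}

/* Target language: instructions for a small stack machine. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL } opcode;
typedef struct { opcode op; unsigned long operand; } instr;

/* Compiler: emits code for both operands, then the operator.
   Returns the number of instructions written to code. */
size_t compile(const expr *e, instr *code)
{
    if (e->kind == EXPR_CONST) {
        code[0] = (instr){ OP_PUSH, e->value };
        return 1;
    }
    size_t n = compile(e->left, code);
    n += compile(e->right, code + n);
    code[n] = (instr){ e->kind == EXPR_ADD ? OP_ADD : OP_MUL, 0 };
    return n + 1;
}

/* Target semantics: execute stack-machine code.
   Assumption: the code never needs more than 64 stack slots. */
unsigned long run(const instr *code, size_t length)
{
    unsigned long stack[64];
    size_t sp = 0;
    for (size_t pc = 0; pc < length; pc++) {
        if (code[pc].op == OP_PUSH) {
            stack[sp++] = code[pc].operand;
        } else {
            unsigned long right = stack[--sp];
            unsigned long left = stack[--sp];
            stack[sp++] = code[pc].op == OP_ADD ? left + right
                                                : left * right;
        }
    }
    return stack[0];
}

/* The specification, checked on one expression: the compiled code
   computes what the source semantics prescribes.
   Assumption: e contains at most 32 constants, so that the code
   fits in 64 instructions. */
int preserves_semantics(const expr *e)
{
    instr code[64];
    size_t length = compile(e, code);
    return run(code, length) == eval(e);
}
```

For the expression (2 + 3) * 4, compile produces five instructions (push 2, push 3, add, push 4, mul), and running them yields the same value as eval.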

An old idea...

    Mathematical Aspects of Computer Science, 1967
    Machine Intelligence (7), 1972

CompCert: a formally-verified C compiler

The CompCert project (X. Leroy, S. Blazy, et al.)

Develop and prove correct a realistic compiler, usable for critical embedded software.
• Source language: a very large subset of C99.
• Target language: PowerPC/ARM/x86 assembly.
• Generates reasonably compact and fast code ⇒ careful code generation; some optimizations.

Note: compiler written from scratch, along with its proof; not trying to prove an existing compiler.

The formally verified part of the compiler

    CompCert C
      → (side-effects out of expressions) → Clight
      → (type elimination, loop simplifications) → C#minor
      → (stack allocation of “&” variables) → Cminor
      → (instruction selection) → CminorSel
      → (CFG construction, expr. decomp.) → RTL
      → (register allocation (IRC), calling conventions) → LTL
      → (linearization of the CFG) → Linear
      → (layout of stack frames) → Mach
      → (asm code generation) → Asm x86, Asm ARM, Asm PPC

    Optimizations (on RTL): constant prop., CSE, inlining, tail calls
