slide-1
SLIDE 1

How much is CompCert’s proof worth, qualification-wise?

Xavier Leroy

Inria Paris-Rocquencourt

Dagstuhl seminar “Qualification of FM tools”, April 2015

1 / 29

slide-3
SLIDE 3

In this talk

Some thoughts on taking advantage of CompCert’s correctness proof in a DO-330-style tool qualification.

Based on discussions with two aircraft manufacturers, and work done in ANR project Verasco (http://verasco.imag.fr/).

Disclaimer 1: all this is speculation. Qualification of a C compiler has never been attempted before, and may not be viable economically speaking.

Disclaimer 2: the views presented here are mine and are not endorsed by said manufacturers.

Disclaimer 3: my understanding of tool qualification is very limited.

2 / 29

slide-4
SLIDE 4

The formally-verified part of CompCert

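[Figure: chain of languages and the verified passes between them]

  • CompCert C → Clight: side-effects out of expressions
  • Clight → C#minor: type elimination, loop simplifications
  • C#minor → Cminor: stack allocation of variables
  • Cminor → CminorSel: instruction selection
  • CminorSel → RTL: CFG construction, expression decomposition
  • RTL → LTL: register allocation (IRC), calling conventions
  • LTL → Linear: linearization of the CFG
  • Linear → Mach: layout of stack frames
  • Mach → Asm (PPC, ARM, x86): asm code generation
  • Optimizations: constant prop., CSE, dead code, inlining, tail calls
  • Value and neededness analyses
  • Other label in the figure: parsing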

3 / 29

slide-7
SLIDE 7

A possible mapping to DO-330

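DO-330 artifact and its CompCert counterpart:

  • Tool Operational Requirements: English prose
  • (High-Level) Tool Requirements: Coq specs
  • Low-Level Tool Requirements: Coq code
  • Source code: OCaml code

Links between successive levels:

  • English prose ↔ Coq specs: manual verification
  • Coq specs ↔ Coq code: automated verification (Coq proof checking)
  • Coq code → OCaml code: automatic code generation (Coq’s extraction mechanism)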

4 / 29

slide-8
SLIDE 8

Partitioning the Coq development

The specifications: everything involved in the statement of semantics preservation.

  • Abstract syntax and operational semantics for CompCert C.
  • Abstract syntax and operational semantics for Asm.
  • Notions of semantic preservation: backward simulation diagrams OR preservation of whole-program observable behaviors.
  • Supporting theories: integers, floating-point numbers, memory states, observable events.

5 / 29

slide-9
SLIDE 9

Example of specification

The semantics of sequencing in CompCert C

    Inductive sstep: state -> trace -> state -> Prop :=
      [...]
      | step_seq: forall f s1 s2 k e m,
          sstep (State f (Ssequence s1 s2) k e m)
                E0 (State f s1 (Kseq s2 k) e m)
      | step_skip_seq: forall f s k e m,
          sstep (State f Sskip (Kseq s k) e m)
                E0 (State f s k e m)
      | step_continue_seq: forall f s k e m,
          sstep (State f Scontinue (Kseq s k) e m)
                E0 (State f Scontinue k e m)
      | step_break_seq: forall f s k e m,
          sstep (State f Sbreak (Kseq s k) e m)
                E0 (State f Sbreak k e m)
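The four rules above can be mimicked by a small step function. The C sketch below is an illustration only, not CompCert code: it drops the function, environment, and memory components (f, e, m) of the states, ignores the trace (always E0 here), and the names stmt, cont, state, and step_seq_rules are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified statements: only what the four rules mention. */
typedef enum { S_SKIP, S_CONTINUE, S_BREAK, S_SEQUENCE, S_OTHER } stmt_kind;

typedef struct stmt {
    stmt_kind kind;
    const struct stmt *s1, *s2;      /* used only by S_SEQUENCE */
} stmt;

/* Continuations: K_STOP (nothing left) or K_SEQ (run `next`, then `tail`). */
typedef enum { K_STOP, K_SEQ } cont_kind;

typedef struct cont {
    cont_kind kind;
    const stmt *next;
    const struct cont *tail;
} cont;

/* A state keeps only the statement under focus and its continuation. */
typedef struct { const stmt *s; const cont *k; } state;

/* Tries the four sequencing rules on `in`. Returns 1 and fills *out if one
   applies (using *fresh as storage when a new continuation is built),
   and returns 0 if none of the four rules applies. */
static int step_seq_rules(state in, cont *fresh, state *out)
{
    assert(in.s != NULL && in.k != NULL && fresh != NULL && out != NULL);
    if (in.s->kind == S_SEQUENCE) {              /* step_seq */
        fresh->kind = K_SEQ;
        fresh->next = in.s->s2;
        fresh->tail = in.k;
        out->s = in.s->s1;
        out->k = fresh;
        return 1;
    }
    if (in.k->kind != K_SEQ)
        return 0;
    switch (in.s->kind) {
    case S_SKIP:                                 /* step_skip_seq */
        out->s = in.k->next;
        out->k = in.k->tail;
        return 1;
    case S_CONTINUE:                             /* step_continue_seq */
    case S_BREAK:                                /* step_break_seq */
        out->s = in.s;                           /* statement is kept */
        out->k = in.k->tail;                     /* pending s is dropped */
        return 1;
    default:
        return 0;
    }
}
```

Stepping a sequence "break; s" first focuses on break with s pending, then discards s, as step_break_seq prescribes.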

6 / 29

slide-10
SLIDE 10

Partitioning the Coq development

The specifications: everything involved in the statement of semantics preservation.

The code: algorithms for code generation and optimization; supporting static analyses.

  • Programmed in Coq’s specification language (Gallina).
  • Pure functional style: recursive functions operating by pattern-matching on tree-shaped data structures.

  • Uses monads for mutable state and error reporting.
  • Similar to Haskell code or pure OCaml code.

7 / 29

slide-11
SLIDE 11

Example of Coq code

Building a control-flow graph for a structured expression

    Fixpoint transl_expr (map: mapping) (a: expr) (rd: reg) (nd: node)
                         {struct a}: mon node :=
      match a with
      | Evar v =>
          do r <- find_var map v; add_move r rd nd
      | Eop op al =>
          do rl <- alloc_regs map al;
          do no <- add_instr (Iop op rl rd nd);
          transl_exprlist map al rl no
      | Eload chunk addr al =>
          do rl <- alloc_regs map al;
          do no <- add_instr (Iload chunk addr rl rd nd);
          transl_exprlist map al rl no
      | [...]
      end.

8 / 29

slide-12
SLIDE 12

Partitioning the Coq development

The specifications: everything involved in the statement of semantics preservation.

The code: algorithms for code generation and optimization; supporting static analyses.

The proof: everything that contributes to showing that the code satisfies the semantic preservation specs.

  • Abstract syntax and semantics of intermediate languages.
  • Many other auxiliary definitions.
  • Copious proof scripts.

9 / 29

slide-13
SLIDE 13

Smaller high-level requirements

Thanks to the Coq proof, the high-level requirements do not need to talk about:

  • Intermediate languages.
  • Compilation algorithms.
  • Optimizations and their supporting static analyses.

“Draw me a compiler.” “The compiler is in this box.”

10 / 29

slide-14
SLIDE 14

Partitioning the Coq development

The three aspects (specs, code, proofs) are intertwined in the current CompCert development, and difficult to separate without tool assistance.

    Theorem transf_c_program_preservation:
      forall p tp beh,
      transf_c_program p = OK tp ->
      program_behaves (Asm.semantics tp) beh ->
      exists beh', program_behaves (Csem.semantics p) beh'
                   /\ behavior_improves beh' beh.
    Proof. ... Qed.

(Caption: Specs; Code; Proof.)

Also: some definitions belong both to the specs and to the code.

Also: bits of proofs in the code (dependent types).

11 / 29

slide-15
SLIDE 15

Verification activities

Between high-level and low-level requirements: fully automated Coq proof checking.

  • Checked at every Qed. in interactive proof mode.
  • Re-checked by coqc batch compilation.
  • Re-re-checked by the coqchk validator.

(But: not independent, and doesn’t work currently.)

12 / 29

slide-16
SLIDE 16

Verification activities

Between high-level and low-level requirements: fully automated Coq proof checking.

Between low-level requirements and source code: Coq’s extraction mechanism.

  + Automatic code generation.
  – Need to increase confidence. (No real test suite; few users.)
  + Generated OCaml code looks a lot like the Coq code.
  – But too big for manual code review. (30 000 LOC + 20 000 LOC for the generated parser.)

13 / 29

slide-17
SLIDE 17

Example of extracted code

    let rec transl_expr map a rd nd s =
      match a with
      | Evar v ->
        (match find_var map v s with
         | Error msg0 -> Error msg0
         | OK (a0, s') -> add_move a0 rd nd s')
      | Eop (op, al) ->
        (match alloc_regs map al s with
         | Error msg0 -> Error msg0
         | OK (a0, s') ->
           (match add_instr (Iop (op, a0, rd, nd)) s' with
            | Error msg0 -> Error msg0
            | OK (a1, s'0) -> transl_exprlist map al a0 a1 s'0))
      | Eload (chunk, addr, al) ->
        (match alloc_regs map al s with
         | Error msg0 -> Error msg0
         | OK (a0, s') ->
           (match add_instr (Iload (chunk, addr, a0, rd, nd)) s' with
            | Error msg0 -> Error msg0
            | OK (a1, s'0) -> transl_exprlist map al a0 a1 s'0))

14 / 29

slide-18
SLIDE 18

Verification activities

Between high-level and low-level requirements: fully automated Coq proof checking.

Between low-level requirements and source code: Coq’s extraction mechanism.

Between operational and high-level requirements: here are dragons!

  • The notion of semantic preservation needs explaining.
  • Asm semantics is not too hard to relate with the processor’s documentation.
  • The CompCert C semantics is complex, and the ISO C99 standard is a mess → difficulties relating the two.

15 / 29

slide-19
SLIDE 19

Validating a formal C semantics

By reviews:

  • Need to be fluent in Coq and expert on the C standard.
  • About 2500 LOC of Coq + 5000 for the supporting theories.

By formal proofs of equivalence with other semantics:

  • E.g. Norrish’s Cholera (HOL), Krebber’s CH2O (Coq).
  • Different subsets of C, different refinements.
  • The other semantics haven’t been validated either!

By testing:

  • Using an executable presentation of the semantics.
  • Much easier to test than a whole compiler, because no optimizations and no context dependencies.
  • Also: identifies undefined behaviors precisely.

16 / 29

slide-20
SLIDE 20

The CompCert C reference interpreter

An interpreter for C, written in Coq, proved sound and complete against the operational semantics.

Several levels of tracing:

  • -quiet: only prints program outputs
  • default: prints all observable actions (e.g. volatile accesses)
  • -trace: full trace of execution steps.

Several ways to handle C nondeterminism:

  • Pick one, fixed evaluation order.
  • Randomize the evaluation order.
  • Explore all possible evaluation orders.

Example of use: 3-way differential random testing (CompCert interpreter, CompCert compiler, GCC) using Csmith.
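The three ways of handling nondeterminism can be illustrated on an expression whose value depends on operand order. In next() - next(), C leaves the order of the two calls unspecified. The C sketch below is a hypothetical illustration (it is not the reference interpreter's code, and all names are made up): it forces each order by hand and collects the distinct results.

```c
#include <assert.h>
#include <stddef.h>

static int counter;

/* Side-effecting operand: returns 1, 2, 3, ... on successive calls. */
static int next(void) { return ++counter; }

/* Evaluate `next() - next()` with the left operand evaluated first. */
static int eval_left_first(void)
{
    counter = 0;
    int l = next();
    int r = next();
    return l - r;
}

/* Same expression, with the right operand evaluated first. */
static int eval_right_first(void)
{
    counter = 0;
    int r = next();
    int l = next();
    return l - r;
}

/* "Explore all possible evaluation orders": stores each distinct outcome
   in out[] and returns how many distinct outcomes were found. */
static size_t explore_all_orders(int out[2])
{
    assert(out != NULL);
    size_t n = 0;
    out[n++] = eval_left_first();
    int other = eval_right_first();
    if (other != out[0])
        out[n++] = other;
    return n;
}
```

A compiler is free to produce either result for such an expression, which is one reason an oracle that can explore every order is useful when comparing against compiled code.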

17 / 29

slide-21
SLIDE 21

The full CompCert compiler

[Figure: the full compilation chain]

  • C source → AST C: preprocessing, parsing, construction of an AST; elaboration, type-checking, de-sugaring
  • AST C → AST Asm: verified compiler
  • AST Asm → Assembly: printing of asm syntax
  • Assembly → Executable: assembling, linking
  • Side boxes: type reconstruction, register allocation, code linearization heuristics
  • Legend: Proved in Coq (extracted to Caml); Not proved (hand-written in Caml); Verification needed; No additional verification needed

18 / 29

slide-22
SLIDE 22

What to do with the unproved parts?

Assembly and linking:

  • An unverified validation tool that matches the ELF executable against the Asm AST.

From C source to CompCert C AST:

  • Testing? (mostly local, compositional transformations)
  • Proving more things? (e.g. lexing)
  • Lack of high-level formal specifications.

19 / 29

slide-23
SLIDE 23

When spec = implementation. . .

Example: interpreting the “tag soup”.

    volatile long unsigned inline long static * const f(void) { ... }

→ Function declaration:

    name: f
    storage class: static
    inline: true
    result type: TPtr(TInt(IULongLong, volatile), const)
    parameters: none
    varargs: no
    body: . . .
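As a hedged illustration, the reading above can be cross-checked with an ordinary C compiler: the sketch below first declares f with the specifiers in a conventional order, then defines it with the original soup. The body, the variable cell, and the helper read_f are made up for the example. A compiler accepting both declarations together only shows that they are compatible, and some compilers may warn (the C standard lists storage-class specifiers placed elsewhere than at the beginning of a declaration as an obsolescent feature).

```c
#include <assert.h>

/* Untangled reading: a static inline function returning a const pointer
   to volatile unsigned long long. */
static inline volatile unsigned long long * const f(void);

static volatile unsigned long long cell = 42;   /* hypothetical target */

/* The "tag soup" from the slide, with a made-up body. */
volatile long unsigned inline long static * const f(void)
{
    return &cell;
}

/* Hypothetical helper: reads through the pointer returned by f. */
static unsigned long long read_f(void)
{
    /* The pointee has the size of unsigned long long (f is not called
       inside sizeof). */
    assert(sizeof *f() == sizeof(unsigned long long));
    return *f();
}
```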

20 / 29

slide-24
SLIDE 24

Non-functional properties

CompCert’s proof shows preservation of the functional correctness of the program through compilation. What about non-functional properties and their preservation?

  • WCET: not definable at the source level; best analyzed after compilation, at the machine code level. (AbsInt’s aiT)
  • Stack consumption: semi-definable at the source level; easy to analyze at the machine or assembly code level. (AbsInt’s StackAnalyzer)

  • Code coverage.

21 / 29

slide-25
SLIDE 25

Bidirectional traceability

High-level principle in DO-178C:

  • Every requirement is fulfilled by the executable code.
  • Every line of the source / instruction of the executable is connected to a requirement (i.e. has a purpose).

A particularly strict interpretation of this principle: there exists a test suite that

  • exercises every requirement, and
  • covers the source (MCDC) or executable (instruction + branch coverage).

In particular: no unreachable code in the executable.
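To make the source-level criterion concrete, here is a minimal C sketch of one common reading of MC/DC (unique-cause): each condition of a decision must be shown, by two test vectors differing only in that condition, to change the decision's outcome. The decision a && (b || c) and all names below are hypothetical examples, not taken from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCOND 3

/* Decision under test (hypothetical example): a && (b || c). */
static bool decision(const bool c[NCOND]) { return c[0] && (c[1] || c[2]); }

/* Does the suite contain two vectors that differ only in condition `i`
   and yield different outcomes of the decision? */
static bool shows_independence(bool suite[][NCOND], size_t n, size_t i)
{
    assert(suite != NULL && i < NCOND);
    for (size_t x = 0; x < n; x++)
        for (size_t y = 0; y < n; y++) {
            bool only_i_differs = suite[x][i] != suite[y][i];
            for (size_t j = 0; j < NCOND; j++)
                if (j != i && suite[x][j] != suite[y][j])
                    only_i_differs = false;
            if (only_i_differs && decision(suite[x]) != decision(suite[y]))
                return true;
        }
    return false;
}

/* Unique-cause MC/DC: every condition is shown to act independently. */
static bool achieves_mcdc(bool suite[][NCOND], size_t n)
{
    for (size_t i = 0; i < NCOND; i++)
        if (!shows_independence(suite, n, i))
            return false;
    return true;
}
```

For this decision, the four vectors (T,T,F), (F,T,F), (T,F,F), (T,F,T) satisfy the check, whereas the pair (T,T,T), (F,F,F) does not, even though it exercises both outcomes of the decision.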

22 / 29

slide-26
SLIDE 26

Proving coverage preservation?

Conjecture

If the C source is covered by a test suite (MCDC coverage), the assembly code generated by CompCert is covered by the same test suite (instruction + branch coverage).

Not easy to formulate in a mathematically-precise way.

Wrong in general!

23 / 29

slide-28
SLIDE 28

Failure of coverage preservation

The PowerPC instruction set has a fctiwz instruction to convert a FP number to a signed integer, but no instruction to convert to an unsigned integer. The latter conversion is implemented by C compilers as follows:

    i = (unsigned int) f;
    --->
    if (f < 2^31) i = fctiwz(f); else i = fctiwz(f - 2^31) + 2^31

If the logic of the program is such that f is always in [0, 2^31), the else branch is never exercised.

Reaction #1: “This is bad. Either you change the compiler, or we have to change our source code (to use a signed int).”

Reaction #2: “No problem. This else branch is here for a purpose, we can explain easily.”
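The expansion can be replayed in portable C to see the uncovered branch. In the sketch below, signed_convert is only a stand-in for fctiwz (a plain C cast, which agrees with a truncating conversion for in-range inputs only), and the two hit counters play the role of a coverage tool; all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Branch hit counters, standing in for a coverage tool (hypothetical). */
static unsigned long hits_then, hits_else;

/* Stand-in for fctiwz: double -> signed 32-bit, rounding toward zero.
   The C cast is only defined for inputs that fit in int32_t. */
static int32_t signed_convert(double f) { return (int32_t) f; }

/* Unsigned conversion built from the signed one, following the slide.
   Requires 0 <= f < 2^32. */
static uint32_t unsigned_convert(double f)
{
    const double two31 = 2147483648.0;              /* 2^31 */
    assert(f >= 0.0 && f < 4294967296.0);
    if (f < two31) {
        hits_then++;
        return (uint32_t) signed_convert(f);
    } else {
        hits_else++;
        return (uint32_t) signed_convert(f - two31) + UINT32_C(2147483648);
    }
}
```

A test suite whose inputs all stay below 2^31 covers the single source-level conversion completely but leaves hits_else at zero, which is exactly the gap between source coverage and branch coverage of the generated code.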

24 / 29

slide-29
SLIDE 29

Concluding remarks

25 / 29

slide-30
SLIDE 30

Messages for formal methods people

A formal verification is not a certification/qualification.

A formal verification can contribute usefully to a certification/qualification.

Focus on your specifications. (A clearer spec is worth a 2x increase in proof effort.)

Make provisions for testing your specifications. (Semantics that are executable, for example.)

Focus your verification efforts on parts that cannot be adequately verified by testing.

Rome wasn’t built in a day. Keep bringing nice stones!

26 / 29

slide-31
SLIDE 31

Messages for developers of proof assistants

Yes, it is good mathematical practice to prove some properties after every definition, but quality assurance people don’t work this way.

We desperately need tools to help review a mechanized proof and separate out the specs, the code, and the proofs proper. Just a slicing tool would be very useful already.

Also: make the prover absolutely trustworthy. The proof checker should remain small and exist in at least two independent implementations.

27 / 29

slide-32
SLIDE 32

Messages for quality assurance people

Learn enough of Coq/Isabelle/HOL to be able to read a formal specification. (The proofs can be developed by others.)

Popularize: tutorials with concrete case studies; MOOCs; etc.

Re-evaluate your priorities:

  • Functional correctness matters more than absence of dead code.
  • Quality of final software matters more than the development process.

28 / 29