SLIDE 1

A concrete memory model for CompCert

Frédéric Besson, Sandrine Blazy, Pierre Wilke

Rennes, France

P . Wilke A concrete memory model for CompCert 1 / 28

SLIDE 6

CompCert

  • real-world C to ASM compiler used in industry (commercialised by AbsInt)
  • proven correct in Coq: it does not introduce bugs!

C → Clight → Cminor → RTL → ASM (all languages share the same memory model)

Each language has a Formal Semantics

i.e. a mathematical meaning for programs

Proof of semantic preservation

For every source program S that has a defined semantics, if the compiler succeeds in generating a target program T, then T has the same behavior as S.

SLIDE 7

Goal: Make the semantics of C more defined

Why did C leave some behaviors undefined?

  • Portability
  • Performance

Why do we want to make it more defined?

  • real-life programs use features that are undefined according to the C standard
  • the compilation theorem will be more useful

Which undefined behaviors do we target?

  • undefined pointer arithmetic, e.g. bitwise operators on pointers
  • use of uninitialised memory

Our starting point: CompCert

SLIDE 11

An example of a low-level C program in CompCert

int main() {
    int *p = (int *) malloc(sizeof(int));
    *p = 42;
    int *q = p | 5;
    int *r = (q >> 3) << 3;
    return *r;
}

[Figure: blocks bp, bq and br hold the variables p, q and r; bp contains the pointer (b,0), and block b contains 42.]

Bitwise operators on pointers are undefined behavior!

  • CompCert [JAR’09], KCC [POPL’12], Krebbers [POPL’14], Norrish [PhD’98]: undefined behavior
  • Kang et al. [PLDI’15]: bitwise operators are not modelled

SLIDE 12

Contributions

  • Previous work [APLAS’14]: a memory model for low-level programs
  • This work:
      • integration of the memory model inside CompCert
      • correctness proofs of the memory model
      • correctness proofs of the transformations of the frontend (up to Cminor)

SLIDE 13

Outline

1 CompCert’s memory model
2 New features of the memory model
3 Consistency of the memory models
4 CompCert proof: Overview
5 Conclusion

SLIDE 16

New features of the memory model

Symbolic expressions

val ::= i | (b,o) is not expressive enough.

We change the semantic domain to:

expr ::= val | op1 expr | expr op2 expr

Alignment constraints

We need information about some bits of the concrete address of a pointer.

The alloc primitive takes an extra parameter mask such that, writing A(b) for the concrete address of block b:

A(b) & mask = A(b)
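As a rough illustration of these two features, here is a minimal C sketch. It is not the Coq development: the names `val`, `expr` and `respects_mask`, the operator subsets, and the use of 32-bit unsigned addresses are all assumptions made for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* val ::= i | (b,o) : an integer, or a pointer (block, offset). */
typedef enum { V_INT, V_PTR } val_kind;
typedef struct {
    val_kind kind;
    uint32_t i;      /* integer payload, used when kind == V_INT */
    int      block;  /* block identifier, used when kind == V_PTR */
    int32_t  off;    /* offset in the block, used when kind == V_PTR */
} val;

/* expr ::= val | op1 expr | expr op2 expr */
typedef enum { E_VAL, E_UNOP, E_BINOP } expr_kind;
typedef enum { OP_NOT } unop;                          /* hypothetical subset */
typedef enum { OP_OR, OP_SHR, OP_SHL, OP_ADD } binop;  /* hypothetical subset */

typedef struct expr {
    expr_kind kind;
    val v;                   /* payload when kind == E_VAL */
    unop op1;                /* operator when kind == E_UNOP */
    binop op2;               /* operator when kind == E_BINOP */
    const struct expr *lhs;  /* operand of op1, or left operand of op2 */
    const struct expr *rhs;  /* right operand of op2 */
} expr;

/* Alignment constraint from the slide: A(b) & mask = A(b). */
bool respects_mask(uint32_t addr, uint32_t mask) {
    return (addr & mask) == addr;
}
```

With the mask `~7u`, for example, only addresses whose three low bits are zero satisfy the constraint, which is the kind of information needed to reason about expressions such as `p | 5`.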

SLIDE 17

Interaction with the memory model

What is the semantics of reading from memory, *p?

In CompCert, p evaluates to a pointer (b,i), so we can call load(M,b,i).

In our model, p is a symbolic expression. It must first be transformed into a pointer before we can call load.

normalise : mem → expr → ⌊val⌋

We need to modify the semantics to include calls to normalise for:

  • memory accesses (load and store)
  • conditional branches

SLIDE 22

Back to the example

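int main() {
    int *p = (int *) malloc(sizeof(int));
    *p = 42;
    int *q = p | 5;
    int *r = (q >> 3) << 3;
    return *r;
}

Symbolic values held by the variables:

  • p (block bp): (b,0)
  • q (block bq): (b,0) | 5
  • r (block br): (((b,0) | 5) ≫ 3) ≪ 3

Block b, aligned on 8 bytes, contains 42.

To read *r, normalise turns the symbolic value of r back into the pointer (b,0).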

SLIDE 23

Normalisation specification: concrete memories

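Abstract memory m

[Figure: an abstract memory m whose blocks hold values such as (b2,2) and 5, and six concrete memories cm1, …, cm6 of m, each placing the blocks on an address axis graduated from 8 to 56.]

Concrete memories of m, written cm_i ⊢ m, satisfy:

  • range: ]0;55[
  • no overlap
  • alignment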
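One way to picture normalisation against concrete memories is a brute-force search. The C sketch below is only an illustration under strong assumptions that are not in the slides: expressions mention a single block b, the concrete memories are the 8-aligned addresses from 8 to 48, and an expression is modelled as a C function of the address of b. All names (`expr_fn`, `norm_result`, the `example_*` functions) are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* An expression over one block b, evaluated with cm(b) = base. */
typedef uint32_t (*expr_fn)(uint32_t base);

typedef enum { NORM_NONE, NORM_INT, NORM_PTR } norm_kind;
typedef struct {
    norm_kind kind;
    uint32_t payload;  /* integer value (NORM_INT) or offset in b (NORM_PTR) */
} norm_result;

/* Look for a value that agrees with e in every candidate concrete memory. */
norm_result normalise(expr_fn e) {
    uint32_t first = e(8);
    uint32_t off = first - 8;          /* candidate offset if e is a pointer */
    bool is_int = true, is_ptr = true;
    for (uint32_t base = 8; base <= 48; base += 8) {
        uint32_t r = e(base);
        if (r != first) is_int = false;       /* not a constant integer */
        if (r - base != off) is_ptr = false;  /* not (b, off) */
    }
    if (is_int) return (norm_result){ NORM_INT, first };
    if (is_ptr) return (norm_result){ NORM_PTR, off };
    return (norm_result){ NORM_NONE, 0 };
}

uint32_t example_ptr(uint32_t base) { return base + 4u; }          /* (b,4) */
uint32_t example_int(uint32_t base) { return (base + 4u) - base; } /* 4 */
uint32_t example_bad(uint32_t base) { return base * 2u; }          /* layout-dependent */
```

In this sketch, `example_ptr` normalises to the pointer (b,4), `example_int` to the integer 4, and `example_bad` has no normal form because its value depends on where b is placed.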

SLIDE 24

Normalisation: example 1

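e = (((b,0) | 5) ≫ 3) ≪ 3

Writing ⟦e⟧_cm for the value of e in the concrete memory cm, the value of e in each concrete memory (address axis graduated from 8 to 56) is:

  • cm1: 8 = ⟦(b,o)⟧_cm1
  • cm2: 8 = ⟦(b,o)⟧_cm2
  • cm3: 16 = ⟦(b,o)⟧_cm3
  • cm4: 24 = ⟦(b,o)⟧_cm4
  • cm5: 32 = ⟦(b,o)⟧_cm5
  • cm6: 32 = ⟦(b,o)⟧_cm6

⟦e⟧_cm1 = (((cm1(b) + 0) | 5) ≫ 3) ≪ 3
        = ((8 | 5) ≫ 3) ≪ 3
        = ((0b1000 | 0b0101) ≫ 3) ≪ 3
        = (0b1101 ≫ 3) ≪ 3
        = 0b0001 ≪ 3
        = 0b1000 = 8 = cm1(b)

∀i, ⟦e⟧_cmi = cmi(b), hence e normalises into (b,0)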
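The same computation can be replayed in C for every 8-byte-aligned address. This is only a sketch: `eval_e` and `e_always_equals_b` are hypothetical helpers, addresses are modelled as 32-bit unsigned integers, and the candidate addresses 8 to 48 are an assumption based on the axis of the figure.

```c
#include <stdint.h>
#include <stdbool.h>

/* Value of e = (((b,0) | 5) >> 3) << 3 when block b sits at address base. */
uint32_t eval_e(uint32_t base) {
    return (((base + 0u) | 5u) >> 3) << 3;
}

/* True if e evaluates to the address of b for every 8-aligned base in [8, 48]. */
bool e_always_equals_b(void) {
    for (uint32_t base = 8; base <= 48; base += 8) {
        if (eval_e(base) != base) {
            return false;
        }
    }
    return true;
}
```

The alignment matters: for an address that is not a multiple of 8, such as 12, `eval_e` does not give back the original address, so the expression could not be normalised to (b,0) without the alignment constraint.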

SLIDE 25

Normalisation: example 2

e = (b,0) > (b′,0)

Value of e in each concrete memory (address axis graduated from 8 to 56):

  • cm1: true
  • cm2: true
  • cm3: true
  • cm4: false
  • cm5: false
  • cm6: false

There is no v such that ∀i, ⟦e⟧_cmi = ⟦v⟧_cmi, hence e doesn’t normalise
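A small C sketch makes the failure concrete by enumerating layouts of two blocks. The layout rule here is a simplifying assumption (two blocks placed at distinct 8-aligned addresses between 8 and 48), and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical layout rule: both blocks are 8-aligned and do not overlap. */
bool valid_layout(uint32_t addr_b, uint32_t addr_b2) {
    return addr_b % 8 == 0 && addr_b2 % 8 == 0 && addr_b != addr_b2;
}

/* e = (b,0) > (b',0) has a normal form only if its truth value
   is the same in every valid layout. */
bool comparison_normalises(void) {
    bool seen_true = false, seen_false = false;
    for (uint32_t a = 8; a <= 48; a += 8) {
        for (uint32_t a2 = 8; a2 <= 48; a2 += 8) {
            if (!valid_layout(a, a2)) {
                continue;
            }
            if (a > a2) {
                seen_true = true;
            } else {
                seen_false = true;
            }
        }
    }
    return !(seen_true && seen_false);
}
```

Because some layouts place b above b′ and others below, both truth values are observed and the comparison has no single value to normalise to.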

SLIDE 26

CompCert with symbolic expressions

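C → Clight → Cminor → RTL → ASM, over the memory model

[Figure: each of the five languages is marked with an S; the memory blocks b1, b2 and b3 hold values such as (b2,2), 5, 7 and the symbolic expression (b,o) | 5.]

expr ::= val | op1 expr | expr op2 expr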

SLIDE 28

How does our model compare to CompCert?

[Figure: the behaviors in CompCert compared with the behaviors with symbolic expressions.]

We are an extension of CompCert

SLIDE 29

How does our model compare to CompCert?

Formally,

Lemma expr_add_ok: ∀ v1 v2 m v,
  sem_add v1 v2 m = ⌊v⌋ →
  ∃ e, sem_add_expr v1 v2 m = ⌊e⌋ ∧
       normalise m e = v.

If the addition of v1 and v2 succeeds in CompCert, then it should succeed in our model as well, and the expression we compute should normalise into the same value.

SLIDE 30

Discovery of bugs

Two cases where our model disagrees with CompCert:

  • Bug in CompCert 2.4: pointer comparison to NULL (fixed in CompCert 2.5)
  • Bug in our model: incorrect handling of pointers one past the end

SLIDE 31

Incorrect pointer comparison to NULL

In CompCert:

  • pointers are pairs (b,o)
  • the NULL pointer is represented as the integer 0

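p == 0 was incorrectly defined to always evaluate to false when p is a pointer.

[Figure: block b placed on an address axis graduated from 8 to 56.]

But we need to check that o is a valid offset of b:

  • (b,o) evaluates in a concrete memory cm to cm(b) + o, which is guaranteed to be non-zero only in that case
  • otherwise, a pointer such as (b,−8) can evaluate to zero (for instance when cm(b) = 8)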
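The wrap-around to zero can be checked with a short C sketch. It assumes 32-bit addresses and a simplified notion of valid offset (strictly inside the block, ignoring the one-past-the-end case); the helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Concrete value of the pointer (b,o) when block b is placed at address base. */
uint32_t ptr_value(uint32_t base, int32_t off) {
    return base + (uint32_t)off;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* Simplified validity check: the offset lies inside a block of size bytes. */
bool valid_offset(int32_t off, uint32_t size) {
    return off >= 0 && (uint32_t)off < size;
}

/* p == 0 may be answered "false" without looking at the layout
   only when the offset is valid. */
bool null_test_is_known_false(int32_t off, uint32_t size) {
    return valid_offset(off, size);
}
```

With b at address 8, the pointer (b,−8) has concrete value 0, so answering "false" to p == 0 for every pointer would be wrong for this out-of-bounds offset.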

SLIDE 33

Overview of CompCert architecture

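Frontend: C → Clight → C♯minor → Cminor

Backend: CminorSel → RTL → LTL → Linear → Mach → ASM

[Figure legend: one arrow style marks the passes that conserve the memory layout, the other marks the passes that modify the memory layout.]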

SLIDE 34

Memory injections: a generic memory transformation

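In CompCert C, each local variable has its own block. During compilation, these variables are merged into a stack frame.

mem_inject f m m′

[Figure: f maps the blocks b1, b2 and b3 of m into a single block b′ of m′, at offsets δ1 and δ2; the pointer (b3,o) stored in m becomes (b′,o+δ2) in m′, while the integer contents 1 and 37 are unchanged.]

Adapting to symbolic expressions: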

  • generalization of the injection over values
  • lots of proofs to adapt (relation with normalisation)

SLIDE 35

Memory injections - Central theorem

Theorem norm_inject: ∀ f m m' e e'
  (Minj: inject f m m')
  (Einj: expr_inject f e e'),
  val_inject f (normalise m e) (normalise m' e').

Below, ⟦e⟧_cm denotes the value of e in the concrete memory cm.

  • We can show that: ∃ v, val_inject f (normalise m e) v
  • Let’s now prove that: normalise m′ e′ = v
  • ∀ cm′ ⊢ m′, ⟦e′⟧_cm′ = ⟦v⟧_cm′
  • From the specification of the normalisation of e in m we know:

∀ cm ⊢ m, ⟦e⟧_cm = ⟦normalise m e⟧_cm

  • We need a theorem relating evaluations in cm and cm′!

SLIDE 36

Memory injections - Evaluation

mem_inject f m m′

[Figure: the concrete memories of m and the concrete memories of m′, each on an address axis graduated from 8 to 48.]

pre_cm(f,cm′): recovers a concrete memory as it was before injection

Definition pre_cm f cm' :=
  fun (b: block) ⇒
    let (b', delta) := f b in
    cm' b' + delta.

Theorem expr_inject_eval: ∀ f cm' e e'
  (Einj: expr_inject f e e'),
  ⟦e'⟧_cm' = ⟦e⟧_pre_cm(f,cm').

Here ⟦e⟧_cm denotes the value of e in the concrete memory cm.
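The idea behind pre_cm can be sketched in C for the simplest kind of expression, a pointer (b,o). This is an illustration only: the injection is modelled as a total array of hypothetical `inj_entry` records, concrete memories are arrays of 32-bit addresses, and the two evaluation helpers are invented for the example.

```c
#include <stdint.h>

/* Hypothetical injection entry: a source block maps to target block `block`
   at offset `delta`. */
typedef struct {
    int block;
    uint32_t delta;
} inj_entry;

/* pre_cm f cm' = fun b => let (b', delta) := f b in cm' b' + delta */
uint32_t pre_cm(const inj_entry f[], const uint32_t cm_target[], int b) {
    return cm_target[f[b].block] + f[b].delta;
}

/* Value in cm' of the injected pointer (b', o + delta). */
uint32_t eval_injected_ptr(const inj_entry f[], const uint32_t cm_target[],
                           int b, uint32_t o) {
    return cm_target[f[b].block] + (o + f[b].delta);
}

/* Value of the source pointer (b,o) in the recovered memory pre_cm(f,cm'). */
uint32_t eval_source_ptr(const inj_entry f[], const uint32_t cm_target[],
                         int b, uint32_t o) {
    return pre_cm(f, cm_target, b) + o;
}
```

For pointers, the two evaluations agree by associativity of addition, which is the base case of the equality stated by expr_inject_eval.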

SLIDE 37

Memory injections - Central theorem

Theorem norm_inject: ∀ f m m' e e'
  (Minj: inject f m m')
  (Einj: expr_inject f e e'),
  val_inject f (normalise m e) (normalise m' e').

[Figure: the concrete memories of m and the concrete memories of m′.]

Below, ⟦e⟧_cm denotes the value of e in the concrete memory cm.

expr_inject_eval : expr_inject f e e′ ⇒ ⟦e′⟧_cm′ = ⟦e⟧_pre_cm(f,cm′)

  • We are left to prove:

∀ cm′ ⊢ m′, ⟦e′⟧_cm′ = ⟦v⟧_cm′

  • We rewrite both sides using expr_inject_eval; the goal becomes:

∀ cm′ ⊢ m′, ⟦e⟧_pre_cm(f,cm′) = ⟦normalise m e⟧_pre_cm(f,cm′)

  • From the specification of the normalisation of e in m we know:

∀ cm ⊢ m, ⟦e⟧_cm = ⟦normalise m e⟧_cm

Instantiating cm with pre_cm(f,cm′), which must itself be a concrete memory of m, solves our goal.

SLIDE 39

Conclusion

A semantics for C

  • more precise than CompCert’s
  • compatible with CompCert
  • with a correctness proof nearly as complete as CompCert’s

Future directions

  • finish the proof by adapting the last remaining unproven pass
  • add a more concrete assembly language to the certified compilation chain
  • re-enable optimizations at the RTL level (do they gain precision? are they still sound?)

SLIDE 40

Questions?
