How to get an efficient yet verified arbitrary-precision integer - - PowerPoint PPT Presentation

how to get an efficient yet verified arbitrary precision
SMART_READER_LITE
LIVE PREVIEW

How to get an efficient yet verified arbitrary-precision integer - - PowerPoint PPT Presentation

How to get an efficient yet verified arbitrary-precision integer library Raphal Rieu-Helft (joint work with Guillaume Melquiond and Claude March) TrustInSoft Inria May 28, 2018 1/31 Context, motivation, goals goal: efficient and


slide-1
SLIDE 1

1/31

How to get an efficient yet verified arbitrary-precision integer library

Raphaël Rieu-Helft

(joint work with Guillaume Melquiond and Claude Marché)

TrustInSoft Inria

May 28, 2018

slide-2
SLIDE 2

2/31

Context, motivation, goals

goal: efficient and formally verified large-integer library GMP: widely-used, high-performance library tested, but hard to ensure good coverage (unlikely branches) correctness bugs have been found in the past idea:

1 formally verify GMP algorithms with Why3 2 extract efficient C code

slide-3
SLIDE 3

3/31

Outline

1 Deductive verification with Why3 2 Reimplementing GMP using Why3 3 Extracting to idiomatic C code 4 An example: schoolbook multiplication 5 Benchmarks, conclusions

slide-4
SLIDE 4

4/31

Deductive verification with Why3

slide-5
SLIDE 5

5/31

Deductive verification

program + specification verification conditions proof

slide-6
SLIDE 6

6/31

Kadane’s algorithm

(* | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | *) (* ......|######## max ########|.............. *) (* ..............................|### cur #### *) let maximum_subarray (a: array int): int ensures { forall l h: int. 0 ≤ l ≤ h ≤ length a → sum a l h ≤ result } ensures { exists l h: int. 0 ≤ l ≤ h ≤ length a ∧ sum a l h = result } = let max = ref 0 in let cur = ref 0 in for i = 0 to length a - 1 do cur += a[i]; if !cur < 0 then cur := 0; if !cur > !max then max := !cur done; !max

slide-7
SLIDE 7

7/31

Kadane’s algorithm

(* | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | *) (* ......|######## max ########|.............. *) (* ..............................|### cur #### *) let maximum_subarray (a: array int): int ensures { forall l h: int. 0 ≤ l ≤ h ≤ length a → sum a l h ≤ result } ensures { exists l h: int. 0 ≤ l ≤ h ≤ length a ∧ sum a l h = result } = let max = ref 0 in let cur = ref 0 in let ghost cl = ref 0 in for i = 0 to length a - 1 do invariant { forall l: int. 0 ≤ l ≤ i → sum a l i ≤ !cur } invariant { 0 ≤ !cl ≤ i ∧ sum a !cl i = !cur } cur += a[i]; if !cur < 0 then begin cur := 0; cl := i+1 end; if !cur > !max then max := !cur done; !max

slide-8
SLIDE 8

8/31

Kadane’s algorithm

(* | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | *) (* ......|######## max ########|.............. *) (* ..............................|### cur #### *) let maximum_subarray (a: array int): int ensures { forall l h: int. 0 ≤ l ≤ h ≤ length a → sum a l h ≤ result } ensures { exists l h: int. 0 ≤ l ≤ h ≤ length a ∧ sum a l h = result } = let max = ref 0 in let cur = ref 0 in let ghost cl = ref 0 in let ghost lo = ref 0 in let ghost hi = ref 0 in for i = 0 to length a - 1 do invariant { forall l: int. 0 ≤ l ≤ i → sum a l i ≤ !cur } invariant { 0 ≤ !cl ≤ i ∧ sum a !cl i = !cur } invariant { forall l h: int. 0 ≤ l ≤ h ≤ i → sum a l h ≤ !max } invariant { 0 ≤ !lo ≤ !hi ≤ i ∧ sum a !lo !hi = !max } cur += a[i]; if !cur < 0 then begin cur := 0; cl := i+1 end; if !cur > !max then begin max := !cur; lo := !cl; hi := i+1 end done; !max

slide-9
SLIDE 9

9/31

Computing verification conditions

A Hoare triplet: {P} e {Q} P precondition property e expression Q postcondition property {P}e {Q} if we execute e in a state that satisfies P, then the computation terminates in a state that satisfies Q Predicate transformer WP(e,Q) (Dijkstra) e expression Q postcondition computes the weakest precondition P such that {P}e {Q}

slide-10
SLIDE 10

10/31

Definition of WP

WP(skip,Q) = Q WP(t,Q) = Q[result → t ] WP(x := t,Q) = Q[x → t ] WP(e1 ; e2,Q) = WP(e1,WP(e2,Q)) WP(let v = e1 in e2,Q) = WP(e1,WP(e2,Q)[v → result]) WP(let x = ref e1 in e2,Q) = WP(e1,WP(e2,Q)[x → result]) WP(if t then e1 else e2,Q) = (t → WP(e1,Q)) ∧ (¬t → WP(e2,Q)) WP(assert R,Q) = R ∧(R → Q)

slide-11
SLIDE 11

11/31

Soundness of WP

Theorem

For any e and Q, the triple {WP(e,Q)}e {Q} is valid. Can be proved by induction on the structure of the program e w.r.t. some reasonable semantics (axiomatic, operational, etc.)

Corollary

To show that {P}e {Q} is valid, it suffices to prove P → WP(e,Q). This is what Why3 does.

slide-12
SLIDE 12

12/31

Why3 proof session

slide-13
SLIDE 13

13/31

Reimplementing GMP using Why3

slide-14
SLIDE 14

14/31

General approach

file.mlw

Why3

Alt-Ergo CVC4 Z3 etc. file.ml file.c game plan: implement the GMP algorithms in WhyML verify them with Why3 extract to C difficulties: preserve all GMP implementation tricks prove them correct extract to efficient C code

slide-15
SLIDE 15

15/31

An example: comparison

large integer ≡ pointer to array of unsigned integers a0 ...an−1 called limbs value(a,n) =

n−1

i=0

aiβ i usually β = 264

type ptr 'a = ... exception Return32 int32 let wmpn_cmp (x y: ptr uint64) (sz: int32): int32 = let i = ref sz in try while !i ≥ 1 do i := !i - 1; let lx = x[!i] in let ly = y[!i] in if lx = ly then if lx > ly then raise (Return32 1) else raise (Return32 (-1)) done; with Return32 r → r end

slide-16
SLIDE 16

16/31

Memory model

simple memory model, more restrictive than C

type ptr 'a = abstract { mutable data: array 'a ; offset: int } predicate valid (p:ptr 'a) (sz:int) = 0 ≤ sz ∧ 0 ≤ p.offset ∧ p.offset + sz ≤ plength p

p.data 1 2 3

p.offset

4 5 6 7 8

  • valid(p,5)

val malloc (sz:uint32) : ptr 'a (* malloc(sz * sizeof('a)) *) ... val free (p:ptr 'a) : unit (* free(p) *) ...

no explicit address for pointers

slide-17
SLIDE 17

17/31

Alias control

aliased C pointers ⇔ point to the same memory object aliased Why3 pointers ⇔ same data field

  • nly way to get aliased pointers: incr

type ptr 'a = abstract { mutable data: array 'a ; offset: int } val incr (p:ptr 'a) (ofs:int32): ptr 'a (* p+ofs *) alias { result.data with p.data } ensures { result.offset = p.offset + ofs } ... val free (p:ptr 'a) : unit requires { p.offset = 0 } writes { p.data } ensures { p.data.length = 0 }

Why3 type system: all aliases are known statically ⇒ no need to prove non-aliasing hypotheses

slide-18
SLIDE 18

18/31

Example specification: long multiplication

specifications are defined in terms of value

(** [wmpn_mul r x y sx sy] multiplies [(x, sx)] and [(y,sy)] and writes the result in [(r, sx+sy)]. [sx] must be greater than or equal to [sy]. Corresponds to [mpn_mul]. *) let wmpn_mul (r x y: ptr uint64) (sx sy: int32) : unit requires { 0 < sy ≤ sx } requires { valid x sx } requires { valid y sy } requires { valid r (sy + sx) } writes { r.data.elts } ensures { value r (sy + sx) = value x sx * value y sy }

Why3 typing constraint: r cannot be aliased to x or y simplifies proofs : aliases are known statically we need separate functions for in-place operations

slide-19
SLIDE 19

19/31

Extracting to idiomatic C code

slide-20
SLIDE 20

20/31

Extraction mechanism

goals: simple, straightforward extraction (trusted) performance: no added complexity, no closures or indirections inefficiencies caused by extraction must be optimizable by the compiler tradeoff: handle only a small, C-like fragment of WhyML ✓ loops ✓ references ✓ machine integers ✓ manual memory management ✗ polymorphism, abstract types ✗ higher order ✗ mathematical integers ✗ garbage collection

slide-21
SLIDE 21

21/31

Exceptions and return/break statements

return and break are emulated by exceptions in WhyML recognize the patterns, extract as native return/break reject all other exceptions

let f (args) = ... ; try (* tail position *) ... raise (R e) ... with R v → v end f (args) { ...; ...; return e; ... } try while ... do ... raise B ... done with B → () end while ... { ... break; ... }

slide-22
SLIDE 22

22/31

Comparison: extracted C code

let wmpn_cmp (x y: ptr uint64) (sz: int32): int32 = let i = ref sz in try while !i ≥ 1 do i := !i - 1; let lx = x[!i] in let ly = y[!i] in if lx = ly then if lx > ly then raise (Return32 1) else raise (Return32 (-1)) done; with Return32 r → r end int32_t wmpn_cmp(uint64_t * x, uint64_t * y, int32_t sz) { int32_t i, o; uint64_t lx , ly; i = (sz); while (i >= 1) {

  • = (i - 1); i = o;

lx = (*(x+(i))); ly = (*(y+(i))); if (lx != ly) { if (lx > ly) return (1); else return ( -(1)); } } return (0); }

slide-23
SLIDE 23

23/31

An example: schoolbook multiplication

slide-24
SLIDE 24

24/31

Schoolbook multiplication

simple algorithm, optimal for smaller sizes GMP switches to divide-and-conquer algorithms at ∼ 20 words

mp_limb_t mpn_mul (mp_ptr rp , mp_srcptr up , mp_size_t un , mp_srcptr vp , mp_size_t vn) { /* We first multiply by the low

  • rder
  • limb. This

result can be stored , not added , to rp. We also avoid a loop for zeroing this

  • way. */

rp[un] = mpn_mul_1 (rp , up , un , vp [0]); /* Now accumulate the product

  • f up[] and the

next higher limb from vp []. */ while (--vn >= 1) { rp += 1, vp += 1; rp[un] = mpn_addmul_1 (rp , up , un , vp [0]); } return rp[un]; }

slide-25
SLIDE 25

25/31

Why3 implementation

while !i < sy do invariant { value r (!i + sx) = value x sx * value y !i } ly := get_ofs y !i; let c = addmul_limb !rp x !ly sx in set_ofs !rp sx c; i := !i + 1; !rp := C.incr !rp 1; done; ...

slide-26
SLIDE 26

25/31

Why3 implementation

while !i < sy do invariant { 0 ≤ !i ≤ sy } invariant { value r (!i + sx) = value x sx * value y !i } invariant { (!rp).offset = r.offset + !i } invariant { plength !rp = plength r } invariant { pelts !rp = pelts r } variant { sy - !i } ly := get_ofs y !i; let c = addmul_limb !rp x !ly sx in value_sub_update_no_change (pelts r) ((!rp).offset + sx) r.offset (r.offset + !i) c; set_ofs !rp sx c; i := !i + 1; value_sub_tail (pelts r) r.offset (r.offset + sx + k); value_sub_tail (pelts y) y.offset (y.offset + k); value_sub_concat (pelts r) r.offset (r.offset + k) (r.offset + k + sx); assert { value r (!i + sx) = value x sx * value y !i by ... so ... so ... (* 20+ subgoals *) }; !rp := C.incr !rp 1; done; ...

slide-27
SLIDE 27

26/31

Building block: addmul_limb

(** [addmul_limb r x y sz] multiplies [(x, sz)] by [y], adds the [sz] least significant limbs to [(r, sz)] and writes the result in [(r,sz)]. Returns the most significant limb of the product plus the carry

  • f the addition. Corresponds to [mpn_addmul_1].*)

let addmul_limb (r x: ptr uint64) (y: uint64) (sz: int32): uint64 requires { valid x sz } requires { valid r sz } ensures { value r sz + (power radix sz) * result = value (old r) sz + value x sz * y } writes { r.data.elts } ensures { forall j. j < r.offset ∨ r.offset + sz ≤ j → r[j] = (old r)[j] }

adds y ×x to r does not change the contents of r outside the first sz cells called on r+i, x and yi for 0 ≤ i ≤ sy

slide-28
SLIDE 28

27/31

Extracted code

void mul(uint64_t * r12 , uint64_t * x15 , uint64_t * y13 , int32_t sx4 , int32_t sy4) { uint64_t ly9 , c8 , res16; uint64_t * rp3; int32_t i16 , o28; uint64_t * o29; ly9 = (*( y13 )); c8 = (mul_limb(r12 , x15 , ly9 , sx4 )); *( r12 +( sx4 )) = c8; rp3 = (r12 +(1)); i16 = (1); while (i16 < sy4) { ly9 = *( y13 +( i16 )); res16 = ( addmul_limb (rp3 , x15 , ly9 , sx4 )); *( rp3 +( sx4 )) = res16;

  • 28 = (i16 + 1); i16 = o28;
  • 29 = (rp3 +(1)); (rp3) = o29;

} return (*( rp3 +( sx4 - 1))); }

not as concise as GMP, but close enough to be optimized by the compiler

slide-29
SLIDE 29

28/31

Benchmarks, conclusions

slide-30
SLIDE 30

29/31

Comparison with GMP

we compare with GMP without assembly (option --disable-assembly) we only consider inputs of 20 words or less (∼ 1300 bits) ⇒ above that, GMP uses different algorithms multiplication: less than 5% slower than GMP division: ∼ 10% slower than GMP except for very small inputs except for sx very close to sy ⇒ GMP uses a different algorithm for sy > sx/2, to do performances are very dependent on the compiled code of the primitives

  • ngoing: link to GMP to use the exact same primitives
slide-31
SLIDE 31

30/31

Proof effort

6000 lines of Why3 code

1350 of programs 4650 of specifications and (mostly) assertions

4200 subgoals, around two thirds are for division large proof contexts, nonlinear arithmetic ⇒ many long assertions are needed even for some “easy” goals Ongoing: use computational reflection to automate some proofs and delete the assertions ⇒ ∼ 700 lines of assertions deleted, work in progress ⇒ removes the need for some tedious proofs, but still finicky

slide-32
SLIDE 32

31/31

Conclusions

verified C library, bit-compatible with GMP GMP mpn functions implemented: schoolbook add, sub, mul, div, shifts, divide-and-conquer multiplication (wip) GMP implementation tricks preserved ⇒ satisfactory performances in the handled cases new Why3 features: extraction and memory model for C alias of return value and parameter Why3 framework for proofs by reflection coming soon: divide-and-conquer algorithms for multiplication and division GMP mpz functions extract specifications as well?

slide-33
SLIDE 33

1/1

Arithmetic primitives: the uint64 type

type uint64 val function to_int (n:uint64): int meta coercion function to_int let constant max = 0xffff_ffff_ffff_ffff predicate in_bounds (n:int) = 0 ≤ n ≤ max axiom to_int_in_bounds: forall n:uint64. in_bounds n val mul (x y:uint64): uint64 (* defensive, no overflow *) requires { in_bounds (x * y) } ensures { result = x * y } val mul_mod (x y:uint64): uint64 (* with overflow *) ensures { result = mod (x * y) (max+1) } val mul_double (x y:uint64): (uint64,uint64) (* returns two words*) returns { (l,h) → l + (max+1) * h = x * y }