Salsa An Automatic Tool to Improve the Numerical Accuracy of - - PowerPoint PPT Presentation

salsa
SMART_READER_LITE
LIVE PREVIEW

Salsa An Automatic Tool to Improve the Numerical Accuracy of - - PowerPoint PPT Presentation

Salsa An Automatic Tool to Improve the Numerical Accuracy of Programs Nasrine Damouche & Matthieu Martel University of Perpignan, LAMPS Laboratory Metalibm Workshop 2018 March 12, 2018 N. DAMOUCHE MetaLibm Workshop 1 / 61 SALSA Context


slide-1
SLIDE 1

Salsa

An Automatic Tool to Improve the Numerical Accuracy of Programs

Nasrine Damouche & Matthieu Martel

University of Perpignan, LAMPS Laboratory Metalibm Workshop 2018 March 12, 2018

  • N. DAMOUCHE

MetaLibm Workshop 1 / 61

slide-2
SLIDE 2

SALSA Context

Critical systems perform computations

  • N. DAMOUCHE

MetaLibm Workshop 2 / 61

slide-3
SLIDE 3

SALSA Context

Critical systems perform computations

1998: USS Yorktown ship (drive system stopped) (÷) 1996: Ariane 5 (flight crash) (overflow) 1991: Patriot (failed interception) (accuracy)

  • N. DAMOUCHE

MetaLibm Workshop 3 / 61

slide-4
SLIDE 4

SALSA Context

Critical systems perform computations

1998: USS Yorktown ship (drive system stopped) (÷) 1996: Ariane 5 (flight crash) (overflow) 1991: Patriot (failed interception) (accuracy)

Important to guarantee the accuracy of computations

  • N. DAMOUCHE

MetaLibm Workshop 4 / 61

slide-5
SLIDE 5

SALSA Context

Computer arithmetic is error-prone

Exact arithmetic (a + b) − b

= a + (b − b)

(1.0 + 100.020) − 100.020

= 1.0 + (100.020 − 100.020) 1.0 = 1.0

Computer arithmetic (a + b) − b =

a + (b − b)

(1.0 + 100.020) − 100.020 =

1.0 + (100.020 − 100.020) 0.0

=

1.0

  • N. DAMOUCHE

MetaLibm Workshop 5 / 61

slide-6
SLIDE 6

SALSA Context

Automatic Optimization of Accuracy

Original program a = 1.0; b = 10.0−21; c = (a + b) + b; What we can do with SARDANA [SAS’12] /HERBIE [PLDI’15]? a = 1.0; b = 10.0−21; a = 1.0; b = 10.0−21; c = (a + b) + b; c = a + (b + b); What we cannot do with SARDANA [SAS’12] /HERBIE [PLDI’15]?

a = 1.0; b = 10.0−21; a = 1.0; b = 10.0−21; c = a + b; c = a + b; d = c + b; d = a + (b + b);

  • N. DAMOUCHE

MetaLibm Workshop 6 / 61

slide-7
SLIDE 7

SALSA Context

Motivating Example

θ(t + 1) = θ(t) + ∆θ(t) ∆d(t) = ∆dr(t) + ∆dl(t)

2

∆θ(t) = ∆dr(t) − ∆dl(t)

L

∆dl(t) = sl(t) × C ∆dr(t) = sr(t) × C

x(t + 1) = x(t) + ∆d(t + 1) × cos

  • θ(t) + ∆θ(t + 1)

2

  • y(t + 1) = y(t) + ∆d(t + 1) × sin
  • θ(t) + ∆θ(t + 1)

2

  • ϴ

L Sl y x Sr Trajectory

  • N. DAMOUCHE

MetaLibm Workshop 7 / 61

slide-8
SLIDE 8

SALSA Context

Motivating Example

Straightforward implementation of equations Initial program

sl = [0.52, 0.53] double main(){ x = 0.0; y = 0.0; arg = 0.0; deltad = 0.0; deltadl = 0.0; deltadr = 0.0; deltatheta = 0.0; sr = 0.62831853071; t = 0.0; x = 0.0; y = 0.0; invl = 0.1; c = 12.34; theta = −0.985; while(t < 1.0){ deltadl = c ∗ sl; deltadr = c ∗ sr; deltad = (deltadl + deltadr) ∗ 0.5; deltatheta = (deltadr − deltadl) ∗ invl; arg = theta + (deltatheta ∗ 0.5); z = cosi(arg); x = x + (deltad ∗ z); q = sini(arg); y = y + (deltad ∗ q); theta = theta + deltatheta; t = t + 0.1;

}

return y;

}

double sini(double u){ w = u − ((u ∗ u ∗ u) ∗ 0.1666) + ((u ∗ u ∗ u ∗ u ∗ u) ∗ 0.0083); return w;

}

double cosi(double a){ b = 1.0 − (a ∗ a ∗ 0.5) + ((a ∗ a ∗ a ∗ a) ∗ 0.0416); return b;

}

ϴ L Sl y x Sr Trajectory

  • N. DAMOUCHE

MetaLibm Workshop 8 / 61

slide-9
SLIDE 9

SALSA Context

Motivating Example

Non intuitive accurate implementation

Transformed program

sl = [0.52, 0.53, −0., 0.] double main(){ t = 0.0; theta = −0.985; y = 0.0; x = 0.0; arg = 0.0; deltatheta = 0.0; deltad = 0.0; deltadr = 0.0; deltadl = 0.0; while(t < 1.0){ deltadl = (12.34 ∗ sl); deltadr = 7.753450668961402; deltad = (0.5 ∗ (deltadl + deltad r)); deltatheta = (0.1 ∗ (deltadr − deltad l)); arg = (theta + (deltatheta ∗ 0.5)); z = cosiTMP2(theta, deltatheta); x = (x + (deltad ∗ z)); q = siniTMP6(theta, deltatheta); y = (y + (deltad ∗ q)); theta = (theta + deltatheta); t = (t + 0.1);

};

return y;

}

double siniTMP6(doubletheta, doubledeltatheta){ TMP3 = (theta + (deltatheta ∗ 0.5)); w = ((TMP3 − (((TMP3 ∗ TMP3) ∗ TMP3) ∗ 0.1666)) + (((((TMP3 ∗ TMP3) ∗ TMP3) ∗ TMP3) ∗ TMP3) ∗ 0.0083)); return w;

}

double cosiTMP2(doubletheta, doubledeltatheta){ TMP1 = (theta + (deltatheta ∗ 0.5)); b = ((1.0 − ((TMP1 ∗ TMP1) ∗ 0.5)) + ((((TMP1 ∗ TMP1) ∗ TMP1) ∗ TMP1) ∗ 0.0416)); return b;

}

  • N. DAMOUCHE

MetaLibm Workshop 9 / 61

slide-10
SLIDE 10

SALSA Context

Motivating Example

Original and transformed value of x

It Original Transformed

1

8.681698 8.444116

2

17.038230 16.589474

3

24.756744 24.147995

4

31.549016 30.852965

5

37.163761 36.469708

6

41.398951 40.806275

7

44.114126 43.724118

8

45.242707 45.148775

Original Odometry Transformed Odometry

  • N. DAMOUCHE

MetaLibm Workshop 10 / 61

slide-11
SLIDE 11

SALSA Floating-point arithmetics

Background

Floating-point numbers Computation of errors

  • N. DAMOUCHE

MetaLibm Workshop 11 / 61

slide-12
SLIDE 12

SALSA Floating-point arithmetics

Floating-point numbers F = finite representation of Reals R

1 3 = 0.33333333333333333333333333333333333.....

Computed F result

=

expected R result ⇒ Round-off errors These errors may have catastrophic consequences

  • N. DAMOUCHE

MetaLibm Workshop 12 / 61

slide-13
SLIDE 13

SALSA Floating-point arithmetics

IEEE754 Standard

Binary representation of floating-point numbers

mantissa sign exponent 1

1 1 1 1 1 1 1 1 1

s : sign {+, -} e : exponent (8 or 11 bits) m : mantissa (23 or 52 bits)

Basic IEEE754 Standard floating-point format

Type Mantissa (bits) emax emin Exponent (bits) Single 23+1 +127

  • 126

8 Double 52+1 +1023

  • 1022

11

  • N. DAMOUCHE

MetaLibm Workshop 13 / 61

slide-14
SLIDE 14

SALSA Floating-point arithmetics

IEEE754 Standard

Rounding mode

  • infinity

+infinity To nearest Toward -infinity Toward zero Toward +infinity To nearest Toward +infinity Toward zero Toward -infinity +1/3

  • 1/3
  • N. DAMOUCHE

MetaLibm Workshop 14 / 61

slide-15
SLIDE 15

SALSA Floating-point arithmetics

Floating-point numbers

Representation of 1

3 with p = 10 in b = 2

1.0101010101

Rounding Error 1.0101010101 . . . 0101

  • 1.0101010101

= 0.0000000000 . . . 0101

Rounding function :

↑∼(1/3) = 0.0101010101

Round-off error : ↓∼(1/3) = 0.0000000000 . . . 0101

  • N. DAMOUCHE

MetaLibm Workshop 15 / 61

slide-16
SLIDE 16

SALSA Floating-point arithmetics

Addition of two numbers

[3.141, 3.142] [0.00059, 0.000592] µ1

+

[99.88, 99.99] [0.09, 0.1] µ2

=

[103.021, 103.132] [0.00059, 0.000592] µ1

+

[0.09, 0.1] µ2

=

[103.0, 103.1] [0.00059, 0.000592] µ1

+

[0.09, 0.1] µ2

+

[0.0, 0.032] ↓∼ (x1 + x2)

=

[103.0, 103.1] [0.09059, 0.132592]

Global error

  • N. DAMOUCHE

MetaLibm Workshop 16 / 61

slide-17
SLIDE 17

SALSA Floating-point arithmetics

Multiplication of two numbers

[3.141, 3.142] [0.00059, 0.000592] µ1 × [99.98, 99.99] [0.09, 0.1] µ2

=

[314.03718 [0.28269, 0.3142]

x1 × µ2

, 314.16858]

+ [0.0589882, 0.05919408] x2 × µ1 +

[0.0000531, 0.0000592] µ1 × µ2

=

[314.0, 314.2] [0.28269, 0.3142]

x1 × µ2 + [0.0589882, 0.05919408] x2 × µ1 +

[0.0000531, 0.0000592] µ1 × µ2

+

[0.0, 0.03718] ↓∼ (x1 × x2)

=

[314.0, 314.2] [0.3417313, 0.4123328]

Global error

  • N. DAMOUCHE

MetaLibm Workshop 17 / 61

slide-18
SLIDE 18

SALSA Related work

Related Work

Transformation for Numerical Accuracy

1

Transformation of Expressions

2

Intraprocedural Transformation

3

Interprocedural Transformation

  • N. DAMOUCHE

MetaLibm Workshop 18 / 61

slide-19
SLIDE 19

SALSA Related work

Transformation of Expressions

APEG introduced to limit combinatory [A. Ioualalen]

APEG (Abstract Program Expression Graph)

(a + a + b) × c (a + b + a) × c (b + a + a) × c c × (a + a + b) c × (a + b + a) c × (b + a + a) ((a + a) × c) + ( b × c) (((2 × a) + b ) × c) ((2 × a) × c) + (b × c) (c × b) + (c × (a + a)) (a × c) + (a × c) + (b × c) ((a + b) × c ) + (a × c)

. . .

2 a × + b □ +(a,a,b) × c × + c b c × a a + × × +

APEG for e = ((a + a) + b) × c.

  • N. DAMOUCHE

MetaLibm Workshop 19 / 61

slide-20
SLIDE 20

SALSA Related work

Transformation of Expressions

APEG introduced to limit combinatory [A. Ioualalen]

APEG (Abstract Program Expression Graph)

(a + a + b) × c (a + b + a) × c (b + a + a) × c c × (a + a + b) c × (a + b + a) c × (b + a + a) ((a + a) × c) + ( b × c) (((2 × a) + b ) × c) ((2 × a) × c) + (b × c) (c × b) + (c × (a + a)) (a × c) + (a × c) + (b × c) ((a + b) × c ) + (a × c)

. . .

2 a × + b □ +(a,a,b) × c × + c b c × a a + × × +

APEG for e = ((a + a) + b) × c.

Build the APEG of expression and search an accurate formula among it

([3.141, 3.142], [0.00059, 0.000592]) ([99.98, 99.99], [0.09, 0.1]) because

0.000592 ≤ 0.1

  • N. DAMOUCHE

MetaLibm Workshop 20 / 61

slide-21
SLIDE 21

SALSA Related work

Related Work

Transformation for Numerical Accuracy

1

Transformation of Expressions

2

Intraprocedural Transformation

3

Interprocedural Transformation

  • N. DAMOUCHE

MetaLibm Workshop 21 / 61

slide-22
SLIDE 22

SALSA Related work

Quadruplet Used in Transformation

⟨ c , ẟ , C , β ⟩ Command to optimize Context of c env. Var Expr Blacklist of Var Var Expr Formal Env.

while (t < 100.0 ){ ... } c 12.34 inv_l 0.1 sl = [0.52,0.53]; sr = 0.785398163397; ... c = 12.34 ; [] x

Algorithm

Priority on application of the transformation rules

c, δ, C, β ⇒ν c, δ, C, β

Re-application of transformation rules on resulting program

  • N. DAMOUCHE

MetaLibm Workshop 22 / 61

slide-23
SLIDE 23

SALSA Related work

Intraprocedural Transformation Rules

δ′ = δ[id → e]

id ∈ β

id = e, δ, C, β ⇒ν nop, δ′, C, β (A1)

e′ = δ(e)

σ♯ = [ [C[c]] ]♯ι♯ e′, σ♯ ❀∗ e′′ id = e, δ, C, β ⇒ν id = e′′, δ, C, β (A2) nop ; c , δ, C, β ⇒ν c, δ, C, β (S1) c ; nop , δ, C, β ⇒ν c, δ, C, β (S2)

C′ = C

  • []; c2
  • c1, δ, C′, β ⇒∗

ν c′ 1, δ′, C′, β′ C′′ = C[c′ 1; []]

c2, δ′, C′′, β′ ⇒ν c′

2, δ′′, C′′, β′′

c1 ; c2 , δ, C, β ⇒ν c′

1 ; c′ 2, δ′′, C, β′′

(S3) σ♯ = [ [C[ifΦ e then c1 else c2]] ]♯ι♯ [ [e] ]♯σ♯ = true c1, δ, C, β ⇒ν c′

1, δ′, C, β

ifΦ e then c1 else c2, δ, C, β ⇒ν Ψ(Φ, c′

1), Ψ(Φ, δ′), C, β

(C1) σ♯ = [ [C[ifΦ e then c1 else c2]] ]♯ι♯ [ [e] ]♯σ♯ = false c2, δ, C, β ⇒ν c′

2, δ′, C, β

ifΦ e then c1 else c2, δ, C, β ⇒ν Ψ(Φ, c′

2), Ψ(Φ, δ′), C, β

(C2)

Var(e) ∩ Dom(δ) = ∅

β′ = β ∪ Assigned(c1) ∪ Assigned(c2) c1, δ, C, β′ ⇒ν c′

1, δ1, C, β1

c2, δ, C, β′ ⇒ν c′

2, δ2, C, β2

δ′ = δ1 ∪ δ2 ifΦ e then c1 else c2, δ, C, β ⇒ν ifΦ e then c′

1 else c′ 2, δ′, C, β′

(C3)

V = Var(e) c′ = AddDefs(V, δ)

δ′ = δ|Dom(δ)\V c′; ifΦ e then c1 else c2, δ′, C, β ∪ V ⇒ν c′′, δ′, C, β′ ifΦ e then c1 else c2, δ, C, β ⇒ν c′′, δ′, C, β′ (C4)

Var(e) ∩ Dom(δ) = ∅ C′ = C[whileΦ e do []]

c, δ, C′, β ⇒ν c′, δ′, C′, β′ whileΦ e do c, δ, C, β ⇒ν whileΦ e do c′, δ′, C, β′ (W1)

V = Var(e) ∪ Var(Φ) c′ = AddDefs(V, δ)

δ′ = δ|Dom(δ)\V c′; whileΦ e do c, δ′, C, β ∪ V ⇒ν c′′, δ′, C, β′ whileΦ e do c, δ, C, β ⇒ν c′′, δ′, C, β′ (W2)

  • N. DAMOUCHE

MetaLibm Workshop 23 / 61

slide-24
SLIDE 24

SALSA Related work

Example of Intraprocedural Transformation

Original Program

δ = ∅ β = {z}

C = [] x = a; if (x > 3.) then z = x + 2. + 1.; else z = x − 2. − 1.; 1 Transformed Program: Using Assignment Rules

δ′ = δ[x → a] β = {z}

C = x = a; [] if (x > 3.) then z = x + 2. + 1.; else z = x − 2. − 1.; 2 Transformed Program: Using Conditionals rules

δ = ∅ β = {z, x}

C = [] x = a; if (x > 3.) then z = x + 2. + 1.; else z = x − 2. − 1.; 3 Original Program: Using Partial Evaluation rules

δ = ∅ β = {z, x}

C = x = a; if (x > 3)then [] else [] x = a; if (x > 3.) then z = x + 3; else z = x − 3; 4

  • N. DAMOUCHE

MetaLibm Workshop 24 / 61

slide-25
SLIDE 25

SALSA Related work

Related Work

Transformation for Numerical Accuracy

1

Transformation of Expressions

2

Intraprocedural Transformation

3

Interprocedural Transformation

  • N. DAMOUCHE

MetaLibm Workshop 25 / 61

slide-26
SLIDE 26

SALSA Related work

Interprocedural Transformation Rules

f (u){c; return v}

z = f (e), δ, C, β →ϑ u = e; c; z = v, δ, C, β (Inlining) γ = [ [e] ]♯σ♯ σ♯ = [ [C[c]] ]♯ι♯

g(a){c′; return v} f (u){c; return v}

c, δ[u → γ], [], {v} →v c′, δ, C, β z = f (e), δ, C, β →ϑ z = g(), δ, C, β (Specialization)

f (u){c; return v}c, δ[u → e], [], {v} →z c′, δ, [], {v} g(u){c′; return v}

z = f (e), δ, C, β →ϑ z = g(Var(e)), δ, C, β (Laziness)

  • N. DAMOUCHE

MetaLibm Workshop 26 / 61

slide-27
SLIDE 27

SALSA Related work

Example of Interprocedural Transformation

Specialization of Function Original Program

assert a = [10.0, 20.0] double caller(){ x = 2.0; y = 3.0 ∗ x + 9.0; z = callee(y); return z;

}

double callee(double u){ v = (a ∗ a ∗ u ∗ u ∗ u)

+(a ∗ u ∗ u) +((a∗0.5)∗u)+0.3;

return v;

}

Interprocedural

assert a = [10.0, 20.0] double caller(){ x = 2.0; y = 15.0; z = calleey(); return z;

}

double calleey(){ v = (a∗a∗15.0∗15.0∗15.0)

+(a ∗ 15.0 ∗ 15.0) +((a ∗ 0.5) ∗ 15.0) + 0.3;

return v;

}

Intraprocedural

assert a = [10.0, 20.0] double caller(){ x = 2.0; y = 15.0; z = calleey(); return z;

}

double calleey(){ v = ((0.5 ∗ (15.0 ∗ a))

+((0.3 + ((a ∗ 15.0) ∗ 15.0)) +((((a∗a)∗15.0)∗15.0)∗15.0)));

return v;

}

  • N. DAMOUCHE

MetaLibm Workshop 27 / 61

slide-28
SLIDE 28

SALSA Architecture

Salsa Tool

Architecture Inputs Outputs

  • N. DAMOUCHE

MetaLibm Workshop 28 / 61

slide-29
SLIDE 29

SALSA Architecture

Architecture of Salsa

Parser Static-Analyzer Sardana Salsa-Intra Salsa-Function

  • N. DAMOUCHE

MetaLibm Workshop 29 / 61

slide-30
SLIDE 30

SALSA Architecture

Architecture of Salsa

Parser Static-Analyzer Sardana Salsa-Intra Salsa-Function

  • Programs in C-like language with annotations
  • Puts programs in SSA form
  • Returns the syntactic tree
  • N. DAMOUCHE

MetaLibm Workshop 30 / 61

slide-31
SLIDE 31

SALSA Architecture

Architecture of Salsa

Parser Static-Analyzer Sardana Salsa-Intra Salsa-Function

  • Based on abstract interpretation
  • infers safe ranges
  • computes error bounds
  • N. DAMOUCHE

MetaLibm Workshop 31 / 61

slide-32
SLIDE 32

SALSA Architecture

Architecture of Salsa

Parser Static-Analyzer Sardana Salsa-Intra Salsa-Function

  • Implements intraprocedural rules
  • Takes a source program
  • Calls static-analyzer, Sardana on expr.
  • N. DAMOUCHE

MetaLibm Workshop 32 / 61

slide-33
SLIDE 33

SALSA Architecture

Architecture of Salsa

Parser Static-Analyzer Sardana Salsa-Intra Salsa-Function

  • Implements interprocedural rules
  • Transforms functions
  • Calls Salsa-Intra
  • N. DAMOUCHE

MetaLibm Workshop 33 / 61

slide-34
SLIDE 34

SALSA Architecture

Architecture of Salsa

Parser Static-Analyzer Sardana Salsa-Intra Salsa-Function

  • Improves the accuracy of expressions
  • Takes as input an expression
  • Returns a more accurate expression
  • N. DAMOUCHE

MetaLibm Workshop 34 / 61

slide-35
SLIDE 35

SALSA Inputs

Salsa

Architecture Inputs Outputs

  • N. DAMOUCHE

MetaLibm Workshop 35 / 61

slide-36
SLIDE 36

SALSA Inputs

Inputs (1/5)

assert m = [8.0, 8.0, 0.0, 0.0] assert c = [5.0, 5.0, 0.0, 0.0] << Variables comes from sensors or introduced by user

%Salsa%

<< begin of Program double main() { eold = 0.0; t = 0.0; i = 0.0; kp = 9.4514; kd = 2.8454; dt = 0.2; invdt = 5.0; m = 0.0; while (t < 100.0) { e = c − m; p = kp ∗ e; i = integral(i, m, c, dt) ; << Caller function d = kd ∗ invdt ∗ (e − eold); r = p + i + d; eold = e; m = m + r ∗ 0.01; t = t + dt;

}

return m ; << Reference variable

}

double integral(double ii, double mm, double cc, double ddt) { ki = 0.69006; res = ii + (ki ∗ ddt ∗ (cc − mm)); return res;

}

<< Callee function

  • N. DAMOUCHE

MetaLibm Workshop 36 / 61

slide-37
SLIDE 37

SALSA Inputs

Inputs (2/5)

1

Program in SSA form (Static Single Assignment form)

x = 2 ; If (x > 1 ) th e n x = x * 2 Els e x = x / 2 En d if ; z = x ; x 1 = 2 ; If (x 1 > 1 ) th e n x 2 = x 1 * 2 Els e x 3 = x 1 / 2 En d if ; ϕ (x 4 ,x 2 ,x 3 ) z = x 4 ; 2

Optimization of one Reference Variable

value returned by the main function

3

Assertions id1 ∈ [a, b, c, d],

[a, b] is the range of floating-point value [c, d] is the error range associated to [a,b]

  • N. DAMOUCHE

MetaLibm Workshop 37 / 61

slide-38
SLIDE 38

SALSA Inputs

Inputs (3/5)

Parameters of Salsa

sliceSize: number corresponds to the height of the syntactic tree where we slice the arithmetic expression

x = ((a + (2 .0 * b )) +c ) + ( d + e );

+ +

c d e 2 .0 b a

+ +

*

  • N. DAMOUCHE

MetaLibm Workshop 38 / 61

slide-39
SLIDE 39

SALSA Inputs

Inputs (4/5)

Parameters of Salsa

sliceSize: number corresponds to the height of the syntactic tree where we slice the arithmetic expression

x = ((a + (2.0 * b )) + c ) + ( d + e ); + + + + a * c d e 2.0 b sliceSize = 2 + + + c d e TMP1 = (a + (2.0 * b)) ; x = (TMP1 + c) + (d + e) ; TMP1 Before slicing After slicing

  • N. DAMOUCHE

MetaLibm Workshop 39 / 61

slide-40
SLIDE 40

SALSA Inputs

Inputs (5/5)

Parameters of Salsa ω : width of the intervals corresponding to the values of the

arguments of a function ruleSelector: selects which rule to apply widen : number of iterations allowed before applying a widening in static analysis etc.

  • N. DAMOUCHE

MetaLibm Workshop 40 / 61

slide-41
SLIDE 41

SALSA Outputs

Salsa

Architecture Inputs Outputs

  • N. DAMOUCHE

MetaLibm Workshop 41 / 61

slide-42
SLIDE 42

SALSA Outputs

Outputs (1/2)

sl = [0.52, 0.53, 0., 0.] double main(){ t = 0.0; theta = −0.985; y = 0.0; x = 0.0; arg = 0.0; deltatheta = 0.0; deltad = 0.0; deltadr = 0.0; deltadl = 0.0; while(t < 1.0){ deltadl = (12.34 ∗ sl); deltadr = 7.753450668961402; deltad = (0.5 ∗ (deltadl + deltadr)); deltatheta = (0.1 ∗ (deltadr − deltadl)); arg = (theta + (deltatheta ∗ 0.5)); z = cosiTMP2(theta, deltatheta); x = (x + (deltad ∗ z)); q = siniTMP6(theta, deltatheta); y = (y + (deltad ∗ q)); theta = (theta + deltatheta); t = (t + 0.1);

};

return y;

}

double cosiTMP2(doubletheta, doubledeltatheta){ TMP1 = (theta + (deltatheta ∗ 0.5)); b = ((1.0 − ((TMP1 ∗ TMP1) ∗ 0.5)) + ((((TMP1 ∗ TMP1) ∗ TMP1) ∗ TMP1) ∗ 0.0416)); return b;

}

double siniTMP6(doubletheta, doubledeltatheta){ TMP3 = (theta + (deltatheta ∗ 0.5)); w = ((TMP3 −(((TMP3 ∗TMP3)∗TMP3)∗0.1666))+(((((TMP3 ∗TMP3)∗TMP3)∗TMP3)∗TMP3)∗0.0083)); return w;

}

  • N. DAMOUCHE

MetaLibm Workshop 42 / 61

slide-43
SLIDE 43

SALSA Outputs

Outputs (2/2)

Interval of target variable

Original value (e.g. Odometry 10.1407240387) Transformed value (e.g. Odometry 8.71925051821 )

Interval of the error

Absolute original error (e.g. Odometry 4.3972539271e−14) Transformed absolute error (e.g. Odometry 1.5940113894e−14)

  • N. DAMOUCHE

MetaLibm Workshop 43 / 61

slide-44
SLIDE 44

SALSA Experimental results

Experimentations

Numerical Accuracy Transformation Time Execution Time Data-Types variable optimization Multi-criteria Optimization

  • N. DAMOUCHE

MetaLibm Workshop 44 / 61

slide-45
SLIDE 45

SALSA Experimental results

Case Studies

Odometry PID Controller Simpson’s Method Runge-Kutta 4 Spring-Mass System Newton-Raphson Method Rocket Trajectory

  • 5e+06
  • 4e+06
  • 3e+06
  • 2e+06
  • 1e+06

1e+06 2e+06 3e+06 4e+06

  • 2e+06
  • 1e+06

1e+06 2e+06 3e+06 4e+06 5e+06 6e+06 Rocket Trajectory Computed by the Optimized Code Rocket Trajectory Computed by the Original Code

  • N. DAMOUCHE

MetaLibm Workshop 45 / 61

slide-46
SLIDE 46

SALSA Experimental results

Experimental Results

Numerical accuracy and Transformation time

Code Ori. Abs. Trans. Abs. Ori. Rel. Trans. Rel. Trans. Err. Err. Err. Err. Time (s) Odometry

5.55168e-14 2.13253e-14 7.7719e-16 3.7346e-16 0.047

PID Controller

4.66802e-14 2.13462e-16 1.6226e-15 7.4200e-17 0.121

Lead-Lag System

1.61083e-13 1.42663e-14 7.1038e-17 1.1145e-17 0.009

Runge-Kutta 4

1.26621e-15 2.77237e-16 7.1105e-16 1.6766e-16 0.022

Newton-Raphson

3.21484e-14 1.07444e-15 1.0937e-15 8.9536e-17 0.022

Simpson’s Method

1.91875e-10 9.44669e-15 1.1605e-13 6.2320e-18 0.620

Rocket Trajectory

2.85693e-07 6.04012e-10 3.7310e-11 7.8882e-13 0.086

  • N. DAMOUCHE

MetaLibm Workshop 46 / 61

slide-47
SLIDE 47

SALSA Experimental results

Experimental Results

Improvements of the relative errors of programs

Relative Errors

1e-17 1e-15 1e-12 1e-9 O d

  • m

e t r y P I D L e a d

  • L

a g R K 4 N e w t

  • n

S i m p s

  • n

R

  • c

k e t Original Relative Error Transformed Relative Error

Codes

  • N. DAMOUCHE

MetaLibm Workshop 47 / 61

slide-48
SLIDE 48

SALSA Convergence Acceleration

Iterative methods

Newton-Raphson’s method Jacobi’s method Iterative Gram-Schmidt method Iterated power method

  • N. DAMOUCHE

MetaLibm Workshop 48 / 61

slide-49
SLIDE 49

SALSA Convergence Acceleration

Example : Jacobi’s Method

System of n equations: Ax = b x(k+1)

i

=

bi −

n

  • j=1,j=i

aijx(k)

j

aii stop when

|x(k+1)

i

− xk

i | < ǫ

 

0.62 0.1 0.2

−0.3

0.3 0.602

−0.1

0.2 0.2

−0.3

0.6006 0.1

−0.1

0.2 0.3 0.601

  ·  

x1 x2 x3 x4

  =  

1.0/2.0 1.0/3.0 1.0/4.0 1.0/5.0

 

Sufficient condition to converge in R

|aii| >

j=4

  • j=1,j=i

|aij|

  • N. DAMOUCHE

MetaLibm Workshop 49 / 61

slide-50
SLIDE 50

SALSA Convergence Acceleration

Jacobi’s Method

Original program

e = 1.0; eps = 10e−16; a11 = 0.61; a22 = 0.602; a33 = 0.6006; a44 = 0.601; b1 = 0.5; b2 = 1.0/3.0; b3 = 0.25; b4 = 1.0/5.0; while (e > eps) { xn1 = (b1/a11) − (0.1/a11) ∗ x2 − (0.2/a11) ∗ x3 + (0.3/a11) ∗ x4; xn2 = (b2/a22) − (0.3/a22) ∗ x1 + (0.1/a22) ∗ x3 − (0.2/a22) ∗ x4; xn3 = (b3/a33) − (0.2/a33) ∗ x1 + (0.3/a33) ∗ x2 − (0.1/a33) ∗ x4; xn4 = (b4/a44) + (0.1/a44) ∗ x1 − (0.2/a44) ∗ x2 − (0.3/a44) ∗ x3; e = xn1 − x1; x1 = xn1; x2 = xn2; x3 = xn3; x4 = xn4;

}

  • N. DAMOUCHE

MetaLibm Workshop 50 / 61

slide-51
SLIDE 51

SALSA Convergence Acceleration

Jacobi’s Method

The percentage of the optimization is up to 44.5% Transformed program

e = 1.0; eps = 10e−16; while (e > eps) { TMP1 = (0.553709856035437 − (x1 ∗ 0.498338870431894)); TMP2 = (0.166112956810631 ∗ x3); TMP6 = (0.333000333000333 ∗ x1); xn1 = (((0.819672131147541 − (0.163934426229508 ∗ ((TMP1 + TMP2) − (0.332225913621263 ∗ x4)))) − (0.327868852459016 ∗ (((0.416250416250416 − TMP6) + (0.4995004995005 ∗ x2)) −

(0.166500166500167∗x4))))+(0.491803278688525∗(((0.332778702163062+(0.166389351081531∗

x1)) − (0.332778702163062 ∗ x2)) − (0.499168053244592 ∗ x3)))); xn2

= (((0.553709856035437 − (0.498338870431894 ∗

xn1))

+ (0.166112956810631 ∗ (((0.416250416250416 − TMP6) + (0.4995004995005 ∗ x2)) − (0.166500166500167 ∗ x4)))) − (0.332225913621263 ∗ (((0.332778702163062 + (0.166389351081531 ∗ x1)) − (0.332778702163062 ∗

x2)) − (0.499168053244592 ∗ x3)))); xn3

= (((0.416250416250416 − (0.333000333000333 ∗ xn1)) + (0.4995004995005 ∗ xn2)) − (0.166500166500167 ∗ (((0.332778702163062 + (0.166389351081531 ∗ x1)) − (0.332778702163062 ∗

x2)) − (0.499168053244592 ∗ x3)))); xn4

= (((0.332778702163062 + (0.166389351081531 ∗ xn1)) − (0.332778702163062 ∗ xn2)) − (0.499168053244592 ∗ xn3));

e = (xn4 − x4); x1 = xn1; x2 = xn2; x3 = xn3; x4 = xn4;

}

  • N. DAMOUCHE

MetaLibm Workshop 51 / 61

slide-52
SLIDE 52

SALSA Convergence Acceleration

Jacobi’s Method

Number of iterations of Jacobi’s method before and after optimization to compute xi, 1 ≤ i ≤ 4 xi

Nbre it initial Nbre it optimized Difference Percentage

x1

1891 1628

−263

14.0

x2

2068 1702

−366

17.3

x3

2019 1702

−317

15.7

x4

1953 1628

−325

16.7

  • N. DAMOUCHE

MetaLibm Workshop 52 / 61

slide-53
SLIDE 53

SALSA Convergence Acceleration

Experimental Results

Execution time and code size measurements of programs

Code Exec. timeo(s) Exec. timet % Code sizeo Code sizet

codesizeo codesizet

Odometry

0.049 0.038 22.44 808 2.1 K 0.38

PID

0.055 0.025 54.54 472 573 0.82

Lead-Lag Sys.

0.049 0.046 6.12 618 752 0.82

Runge-Kutta 4

0.047 0.046 2.12 697 1.4 K 0.49

Newton

0.059 0.047 20.33 465 1.1 K 0.42

Simpson’s

0.082 0.051 37.80 648 4.2 K 0.15

Rocket

0.055 0.047 14.54 1.6K 2.9 K 0.55

  • N. DAMOUCHE

MetaLibm Workshop 53 / 61

slide-54
SLIDE 54

SALSA Convergence Acceleration

Floating-point operations needed by numerical iterative methods to converge (Flops)

♯ of ± per it ♯ of ± per it Total ♯ of ± Total ♯ of ± opt Percentage of Method Original Code Optimized Code Original Code Optimized Code Improvement Jacobi

13 15 25389 24420 3.81

Newton

11 11 3465 132 96.19

Eigenvalue

15 15 694080 685995 1.16

Gram-Schmidt

21 19 791364 715996 9.52

♯ of × per it ♯ of × per it Total ♯ of × Total ♯ of × opt Percentage of Method Original Code Optimized Code Original Code Optimized Code Improvement Jacobi

28 14 54684 22792 58.32

Newton

27 26 8505 312 96.33

Eigenvalue

19 19 879168 868927 1.16

Gram-Schmidt

22 20 712316 647560 9.09

  • N. DAMOUCHE

MetaLibm Workshop 54 / 61

slide-55
SLIDE 55

SALSA Single vs Double Precision?

Simpson’s Method

b

a

f (x) dx ≈ h 3

  • f (x0) + 2

n 2 −1

  • j=1

f (x2j) + 4

n 2

  • j=1

f (x2j−1) + f (xn)

  • where

n is the number of subintervals of [a,b] with n is pair h = (b − a)/n is the length of the subintervals xi = a + i × h for i = 0, 1, . . . , n − 1, n

  • N. DAMOUCHE

MetaLibm Workshop 55 / 61

slide-56
SLIDE 56

SALSA Single vs Double Precision?

Simpson’s Method

Developed form of polynomial evaluates very poorly close to a root Problems due to floating-point errors f (x) = (x − 2)7 f (x) = x7 − 14× x6 + 84× x5 − 280× x4 + 560× x3 − 672× x2 + 448× x − 128

  • N. DAMOUCHE

MetaLibm Workshop 56 / 61

slide-57
SLIDE 57

SALSA Single vs Double Precision?

Simpson’s Method

Simulation results of the Simpson’s method with single, double precision and transformed program

  • N. DAMOUCHE

MetaLibm Workshop 57 / 61

slide-58
SLIDE 58

SALSA Multi-criteria optimization

Experimental Results

Multi-criteria optimization

1

Balancing the syntactic tree

2

Reducing the operations number

3

Merging 1 and 2

4

Computing the execution time

  • N. DAMOUCHE

MetaLibm Workshop 58 / 61

slide-59
SLIDE 59

SALSA Multi-criteria optimization

Multi-criteria optimization

Original Program (eq. tree) Transformed Program (eq. tree) Code Initial error Transfo. error transfo. time (s) Initial error Transfo. error transfo. time (s) Odometry

4.9801e−13 1.9829e−13 0.098 2.5637e−14 1.9829e−15 0.090

Jacobi

2.9142e−16 1.7258e−16 1.098 2.5658e−16 1.5517e−16 0.125

PID

4.6680e−14 2.1346e−16 0.125 2.5637e−15 1.9829e−16 0.098

Runge-Kutta 4

4.6666e−13 2.9226e−13 0.098 4.6047e−14 2.9253e−15 0.067

Original Program (op. nbre) Transformed Program (op. nbre) Code Initial error Transfo. error transfo. time (s) Initial error Transfo. error transfo. time (s) Odometry

4.9801e−13 1.9829e−13 0.098 2.4011e−14 1.8574e−15 0.091

Jacobi

2.9142e−16 1.7258e−16 1.098 1.7176e−16 4.4408e−17 0.115

PID

4.6680e−14 2.1346e−16 0.125 2.8574e−15 1.5211e−15 0.100

Runge-Kutta 4

4.6666e−13 2.9226e−13 0.098 4.6805e−14 2.9312e−16 0.751

Original Program (eq. & op.) Transformed Program (eq. & op.) Code Initial error Transfo. error transfo. time (s) Initial error Transfo. error transfo. time (s) Odometry

4.9801e−13 1.9829e−13 0.098 1.5847e−14 1.9229e−15 0.080

Jacobi

2.9142e−16 1.7258e−16 1.098 1.5658e−16 1.5017e−16 0.120

PID

4.6680e−14 2.1346e−16 0.125 2.5622e−15 1.9829e−16 0.070

Runge-Kutta 4

4.6666e−13 2.9226e−13 0.098 3.6095e−14 .9253e−15 0.050

  • N. DAMOUCHE

MetaLibm Workshop 59 / 61

slide-60
SLIDE 60

SALSA Conclusion

Conclusion and Future Work

Mixed Precision and Numerical accuracy (Submitted) Extending our techniques to optimize parallel programs Studying how our technique could improve reproducibility Studying the impact of accuracy on the convergence time of distributed numerical algorithms Find a permanent position ;-)

  • N. DAMOUCHE

MetaLibm Workshop 60 / 61

slide-61
SLIDE 61

SALSA Conclusion

Thank you

  • N. DAMOUCHE

MetaLibm Workshop 61 / 61