Numerical Error Analysis for Statistical Software on Multi-Core Systems

SLIDE 1

Numerical Error Analysis for Statistical Software on Multi-Core Systems

Wenbin Li, Sven Simon
liwn@ipvs.uni-stuttgart.de
Institute of Parallel and Distributed Systems
University of Stuttgart, Germany
www.simtech.uni-stuttgart.de

SLIDE 2

Outline

  • 1. Numerical Errors and Accuracy Control
  • 2. A Method for Accuracy Control
      • e.g. Result: 9.676383250285714 × 10^3
      • The leading 12 digits are accurate; the remaining digits are contaminated due to rounding errors.
      • Accuracy: 12 digits
  • 3. Applied to a Statistical Software: R
  • 4. Parallelization on Multi-Core Systems
  • 5. Conclusion

SLIDE 3

Disasters Caused by Numerical Errors

“Numerical precision is the very soul of science.”

  • - D'Arcy Wentworth Thompson

[Images: three disasters caused by numerical errors]

  • 28 persons killed
  • $7 billion lost
  • $700 million lost

Source: http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html

SLIDE 4

Schemes to Evaluate Numerical Quality

  • Compare the result with a high precision reference
      • Reference is obtained by repeating the computation in arithmetic of increasing precision.
      • e.g. π = 3.14 → 3.14159265 → 3.141592653589793238462…
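The scheme above can be illustrated with a minimal Python sketch. It is not taken from the slides: the input data, the choice of the one-pass variance formula, and the helper names `naive_variance` and `agreeing_digits` are assumptions made for illustration. The same computation is repeated with the standard `decimal` module at 20, 40 and 80 digits, and the double-precision result is compared against each reference.

```python
from decimal import Decimal, getcontext


def naive_variance(xs, number=float):
    """Textbook one-pass sample variance, evaluated in the given number type."""
    xs = [number(x) for x in xs]  # Decimal(float) converts the float exactly
    n = len(xs)
    sum_x = sum(xs, number(0))
    sum_x2 = sum((x * x for x in xs), number(0))
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


def agreeing_digits(value, reference):
    """Rough count of leading decimal digits that value shares with reference."""
    value, reference = Decimal(value), Decimal(reference)
    if value == reference:
        return float("inf")
    if reference == 0:
        return 0.0
    relative_error = abs((value - reference) / reference)
    return max(0.0, float(-relative_error.log10()))


# Hypothetical, ill-conditioned input: large mean, small spread.
data = [1e9 + 0.1, 1e9 + 0.2, 1e9 + 0.3]
double_result = naive_variance(data, float)

# Repeat the computation in arithmetic of increasing precision.
precisions = (20, 40, 80)
references = []
for precision in precisions:
    getcontext().prec = precision
    references.append(naive_variance(data, Decimal))

for precision, reference in zip(precisions, references):
    print(precision, reference, agreeing_digits(double_result, reference))
```

In this sketch the reference is trusted once two consecutive precisions agree to more digits than the working precision offers, which already hints at the question of how many digits are sufficient.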

SLIDE 5

Schemes to Evaluate Numerical Quality

  • Compare the result with a high precision reference
      • How many digits are sufficient?
      • Low performance:
          • Single precision performance >> double precision (e.g. GPU, CellBE, etc.).
          • Arbitrary high precision is several orders of magnitude slower.
      • Hard to modify source codes to generate a high precision reference:
          • Millions of lines of source code.
          • Unknown libraries: is the source code available (readable)?
          • Mismatch in memory:
              • Fortran: common
              • C/C++: union

SLIDE 6

Schemes to Evaluate Numerical Quality

  • Compare the result with a high precision reference
      • Reference is obtained by repeating the computation in arithmetic of increasing precision.
      • e.g. π = 3.14 → 3.14159265 → 3.141592653589793238462…
  • Interval arithmetic
      • e.g. "Your result is guaranteed to be in (−∞, +∞)."
  • Probabilistic error analysis
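The interval arithmetic remark can be made concrete with a small Python sketch. It is an illustrative assumption, not code from the slides: the `Interval` class is hypothetical, and outward rounding is emulated with `math.nextafter` (Python 3.9 or later) rather than with hardware rounding modes. The iteration x ← 2x − x leaves x mathematically unchanged, yet the guaranteed enclosure roughly triples in width at every step.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with outward rounding after each operation."""
    lo: float
    hi: float

    def __sub__(self, other):
        return Interval(math.nextafter(self.lo - other.hi, -math.inf),
                        math.nextafter(self.hi - other.lo, math.inf))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    @property
    def width(self):
        return self.hi - self.lo


two = Interval(2.0, 2.0)
x = Interval(1.0, 1.0)
for _ in range(60):
    # Mathematically the identity map, but both occurrences of x are
    # treated as independent, so the enclosure keeps growing.
    x = two * x - x

print(x.lo, x.hi, x.width)
```

The final interval still contains the exact value 1.0, so the guarantee holds, but the bound has become too wide to say anything about the accuracy of the result.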

SLIDE 7

Probability Distribution of the Rounding Error

  • The calculated result in a program:

    $\hat{R} = r + \varepsilon_{\text{rounding}}$

    $r$: exact result
    $\hat{R}$: approximation of $r$ due to rounding error
    $\varepsilon_{\text{rounding}}$: rounding error

  • First order model ([1]):

    $\varepsilon_{\text{rounding}} = \sum_{i=1}^{n} g_i(d) \cdot 2^{-p} \cdot \alpha_i + O(2^{-2p})$

  • Assume a probability distribution for the rounding error of each elementary operation (OP 1, OP 2, …, OP N).
  • By the Central Limit Theorem, $\hat{R}$ follows a Gaussian distribution (pdf).
  • Unknown: the exact result $r$ and the standard deviation $\sigma(\hat{R})$.

SLIDE 8

Probabilistic Rounding Error Analysis (PREA)

  • Consider a sequence of computations providing an exact result $r$.
  • Perform the computations $N$ times with random rounding (randomly choosing rounding to $-\infty$ or $+\infty$); $N$ results $\hat{R}_i$, $i = 1, \dots, N$, are obtained.

    Input → User's program with random rounding (run N times) → $\hat{R}_1, \hat{R}_2, \dots, \hat{R}_N$

  • The computed result is the sample mean:

    $\bar{R} = \frac{1}{N} \sum_{i=1}^{N} \hat{R}_i$

  • Instead of $\sigma^2(\hat{R})$, we have the sample variance:

    $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left(\hat{R}_i - \bar{R}\right)^2$

  • Define

    $T = \frac{\bar{R} - r}{s / \sqrt{N}}$

    which is a sample from Student's t-distribution with $\nu = N - 1$ degrees of freedom.
  • Probability density function:

    $f(T = t) = \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu + 1}{2}}$
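The procedure on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: random rounding is emulated in software by computing each operation exactly with `fractions.Fraction` and then picking one of the two neighbouring doubles at random, and the names `random_round`, `rr_add`, `rr_div`, `user_program` and `prea` are hypothetical. The "user's program" here is simply a harmonic sum.

```python
import math
import random
import statistics
from fractions import Fraction


def random_round(exact, rng):
    """Round an exact rational toward -inf or +inf with probability 1/2 each."""
    nearest = float(exact)              # correctly rounded to nearest
    if Fraction(nearest) == exact:      # representable: no rounding error
        return nearest
    if Fraction(nearest) < exact:
        down, up = nearest, math.nextafter(nearest, math.inf)
    else:
        down, up = math.nextafter(nearest, -math.inf), nearest
    return rng.choice((down, up))


def rr_add(a, b, rng):
    return random_round(Fraction(a) + Fraction(b), rng)


def rr_div(a, b, rng):
    return random_round(Fraction(a) / Fraction(b), rng)


def user_program(rng, terms=200):
    """Hypothetical user code: harmonic sum under random rounding."""
    total = 0.0
    for k in range(1, terms + 1):
        total = rr_add(total, rr_div(1.0, float(k), rng), rng)
    return total


def prea(program, n_runs=3, seed=0):
    """Run the program N times; return the runs, their mean and sample variance."""
    rng = random.Random(seed)
    results = [program(rng) for _ in range(n_runs)]
    return results, statistics.mean(results), statistics.variance(results)


results, mean, variance = prea(user_program)
print(results, mean, variance)
```

`statistics.variance` uses the N − 1 denominator, so it matches the sample variance $s^2$ defined above.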

SLIDE 9

Number of Significant Digits

  • Confidence interval:

    $\Pr\left(-\tau_\beta < \frac{\bar{R} - r}{s / \sqrt{N}} < \tau_\beta\right) = \beta$

  • With probability $\beta$, the number of significant digits (i.e. accurate digits) of $\bar{R}$ is:

    $N_{\text{significant digits}} \ge \log_{10}\left(\frac{\sqrt{N} \cdot |\bar{R}|}{s \cdot \tau_\beta}\right)$

  • Two hypotheses should hold ([1, 2]):
      • The random rounding error is Gaussian distributed.
          - Not important, because of the robustness of Student's t-test.
      • The first order approximation is legitimate (self validation):
          • The operands of a multiplication must be significant.
          • The divisor of a division must be significant.
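As a worked example of the digit estimate, the following Python sketch evaluates $\log_{10}\left(\sqrt{N}\,|\bar{R}| / (s\,\tau_\beta)\right)$ for a small set of runs. The function name `significant_digits`, the sample values, and the choice β = 0.95 are assumptions for illustration; the table holds the standard two-sided Student's t quantiles for 1 to 4 degrees of freedom.

```python
import math
import statistics

# Two-sided Student's t quantiles tau_beta for beta = 0.95,
# indexed by the degrees of freedom nu = N - 1.
TAU_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}


def significant_digits(samples, tau_table=TAU_95):
    """Estimated number of significant decimal digits of the sample mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)       # square root of the sample variance
    if s == 0.0:
        return math.inf                 # all runs agree to working precision
    if mean == 0.0:
        return 0.0                      # no significant digit can be claimed
    tau = tau_table[n - 1]
    return math.log10(math.sqrt(n) * abs(mean) / (s * tau))


# Hypothetical results of N = 3 runs with random rounding.
runs = [1.0, 1.0 + 1e-10, 1.0 - 1e-10]
digits = significant_digits(runs)
print(digits)
```

For these three runs the standard deviation is about 1e-10, so the estimate lands between 9 and 10 digits; only the integer part is normally reported.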

SLIDE 10

Applied to Complex Statistical Software

R (v2.10.1): a software for statistical computing and graphics.

Benchmarks: NIST StRD (a collection of data sets and certified values)

Accuracy estimation using PREA:

    $\log_{10}\left(\frac{\sqrt{N} \cdot |\bar{R}|}{s \cdot \tau_\beta}\right)$

Reference (true accuracy), with $r$ the certified values from StRD:

    $LRE = -\log_{10}\left(\frac{|\bar{R} - r|}{|r|}\right)$

UNIV (R functions: mean, sd, acf), number of significant digits:

benchmark   Mean          Standard Deviation   Autocorrelation
            PREA  True    PREA  True           PREA  True
Mavro       15    15      13    13             13    14
NumAcc3     15    15      10    10             11    10
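The comparison on this slide can be sketched in Python. The `lre` function follows the usual definition of the log relative error, LRE = −log10(|R − r| / |r|); the helper `classify` and the way the table is tallied are illustrative assumptions. The digit pairs are the (PREA, True) values of the UNIV table on this slide only, so the tally is not the overall frequency reported later in the talk.

```python
import math
from collections import Counter


def lre(computed, certified):
    """Log relative error: number of correct digits against a certified value."""
    if certified == 0:
        raise ValueError("LRE is undefined for a certified value of zero")
    if computed == certified:
        return math.inf
    return -math.log10(abs(computed - certified) / abs(certified))


def classify(prea_digits, true_digits):
    """Compare the PREA estimate with the true number of digits."""
    if prea_digits == true_digits:
        return "exact"
    return "underestimation" if prea_digits < true_digits else "overestimation"


# (PREA, True) digit pairs from the UNIV table: Mavro, then NumAcc3.
UNIV_PAIRS = [(15, 15), (13, 13), (13, 14), (15, 15), (10, 10), (11, 10)]
tally = Counter(classify(p, t) for p, t in UNIV_PAIRS)

print(lre(1.0001, 1.0))   # hypothetical computed vs. certified value
print(tally)
```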


SLIDE 12

Applied to Complex Statistical Software

ANOVA (R function: aov)

benchmark   SST           SSE           MSE           F-statistic
            PREA  True    PREA  True    PREA  True    PREA  True
SiRstv      12    12      12    13      12    13      12    12
SmLs08      4     4       2     2       2     2       2     2

LINR (R function: lm)

benchmark   Coefficient   RSD           R2            F-statistic
            PREA  True    PREA  True    PREA  True    PREA  True
Norris      12    13      14    14      15    15      13    14
Wampler5    5     6       15    15      14    14      14    15

NLINR (R function: nls)

benchmark   Coefficient i   Coefficient j   RSS           RSD
            PREA  True      PREA  True      PREA  True    PREA  True
Lanczos2    5     5         6     6         7     8       8     8
Bennett5    4     4         4     5         9     9       9     9

                     Exact estimation   Underestimation by 1 digit   Overestimation by 1 digit
Relative frequency   67%                29%                          4%

If underestimation by 1 digit is tolerable: reliable estimation in 96% of the cases.

SLIDE 13

Parallelization for Acceleration

  • Why do we need parallel computing?
      • The multiple runs of the code are intensive in computation time.
      • Multi-core processors are the mainstream hardware architecture.
      • To harvest the performance potential of multi-core processors.
  • Parallel execution
      • To get N random rounding results, the user's code can be executed concurrently on different CPU cores.
      • Communication between threads is necessary to:
          • make a unified decision for instructions such as "IF () THEN".
          • perform Self Validation (SV).
          • perform Numerical Instabilities Checking (NIC), i.e. detect sudden accuracy loss due to cancellation in '+' or '-'.
  • How to minimize the communication & synchronization overhead?
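The concurrent execution of the N runs can be sketched in Python with the standard `concurrent.futures` module. This is an assumption-laden illustration rather than the implementation from the talk: `rr_add`, `user_program` and `run_concurrently` are hypothetical names, random rounding is emulated in software, and the sketch leaves out the inter-thread communication for branch decisions, SV and NIC. In CPython the global interpreter lock prevents a real speedup for this pure-Python workload, so the code only shows the structure of one independent run per worker.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def rr_add(a, b, rng):
    """Addition with emulated random rounding toward -inf or +inf."""
    exact = Fraction(a) + Fraction(b)
    nearest = float(exact)
    if Fraction(nearest) == exact:
        return nearest
    lower = nearest if Fraction(nearest) < exact else math.nextafter(nearest, -math.inf)
    return rng.choice((lower, math.nextafter(lower, math.inf)))


def user_program(seed):
    """Hypothetical user code: one run with its own random number generator."""
    rng = random.Random(seed)
    total = 0.0
    for k in range(1, 1001):
        total = rr_add(total, 0.1 * k, rng)
    return total


def run_concurrently(n_runs=3, base_seed=2024):
    """Execute the N random rounding runs on separate worker threads."""
    with ThreadPoolExecutor(max_workers=n_runs) as pool:
        return list(pool.map(user_program, range(base_seed, base_seed + n_runs)))


results = run_concurrently()
print(results)
```

Giving every run its own seeded generator keeps the runs independent and reproducible regardless of how the workers are scheduled.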

SLIDE 14

Parallelization with Asynchronous SV & NIC

[Diagram: execution threads and asynchronous checking threads]

  • A random number generator feeds the computation with random rounding arithmetic.
  • Copy 1 of the code runs in exe_thread 1, …, copy N of the code runs in exe_thread N (from Start to End).
  • Branch decision: exchanged between the exe_threads.
  • Each exe_thread writes operands / intermediate results for SV & NIC into its buffers: buf_SV,NIC (1,1), buf_SV,NIC (1,2), …, buf_SV,NIC (N,1), buf_SV,NIC (N,2).
  • The buffers are multiplexed onto the Self validation & Numerical instability checking (SN) threads SN_thread 1, …, SN_thread M.
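The buffering idea in the diagram can be sketched with Python threads and queues. Everything here is a simplified assumption: the names `exe_thread`, `sn_thread` and `lost_digits` only mirror the labels on the slide, the execution threads use ordinary rounding instead of random rounding, a single checking thread drains the buffers one after another, and cancellation is detected with a simple magnitude ratio and an arbitrary threshold of 4 digits.

```python
import math
import queue
import threading

SENTINEL = None  # marks the end of an execution thread's record stream


def exe_thread(values, buffer):
    """Accumulate values and push every addition into the buffer without waiting."""
    total = 0.0
    for line, value in enumerate(values, start=1):
        new_total = total + value
        buffer.put((line, total, value, new_total))
        total = new_total
    buffer.put(SENTINEL)


def lost_digits(a, b, result):
    """Decimal digits lost to cancellation in result = a + b (rough estimate)."""
    largest = max(abs(a), abs(b))
    if result == 0.0:
        return math.inf if largest > 0.0 else 0.0
    return max(0.0, math.log10(largest / abs(result)))


def sn_thread(buffers, threshold, warnings):
    """Check the buffered operations asynchronously, one buffer after another."""
    for index, buffer in enumerate(buffers):
        while (record := buffer.get()) is not SENTINEL:
            line, a, b, result = record
            if lost_digits(a, b, result) >= threshold:
                warnings.append((index, line))


values = [1.0e8, 1.0, -1.0e8]          # hypothetical data with a cancellation
buffers = [queue.Queue(), queue.Queue()]
warnings = []

workers = [threading.Thread(target=exe_thread, args=(values, b)) for b in buffers]
checker = threading.Thread(target=sn_thread, args=(buffers, 4.0, warnings))
for thread in workers + [checker]:
    thread.start()
for thread in workers + [checker]:
    thread.join()

print(sorted(warnings))
```

Because the queues are unbounded, the execution threads never block on the checker, which is the point of making SV & NIC asynchronous; a production design would bound the buffers and balance them across several checking threads.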

SLIDE 15

Performance of Parallelization with Asynchronous SV & NIC

[Figure: performance comparison. Legend: Sequential CADNA; Parallelization with asynchronous SV & NIC]

SLIDE 16

Performance of Parallelization with Asynchronous SV & NIC

This parallelization approach achieves an almost linear scalability.

[Figure: speedup. Legend: Average speedup; Minimum speedup; Maximum speedup]

SLIDE 17

Conclusion

  • 1. The probabilistic error analysis method is very robust and provides tight rounding error estimation.
  • 2. Numerical instability can be localized.
      • Line 10034: 9.676383250285714 × 10^3
      • Line 10035: 2.786786218987928 × 10^3 (sudden increase in numerical error)
  • 3. Parallelization approaches significantly accelerate the proposed method and show an almost linear scalability.

SLIDE 18

Reference

  • 1. J.-M. Chesneaux, Study of the computing accuracy by using probabilistic approach; Contribution to Computer Arithmetic and Self-Validating Numerical Methods, ed. C. Ulrich (J.C. Baltzer) 1990, 19-30.
  • 2. J. Vignes, Discrete stochastic arithmetic for validating results of numerical software. Numerical Algorithms, Vol. 37, p. 377-390.
  • 3. Douglas C. Montgomery, Applied Statistics and Probability for Engineers, page 257.
  • 4. N. Tajima, FORTRAN benchmark tests. http://www.apphy.fukui-u.ac.jp/tajima/bench/index_e.html

SLIDE 19

Thanks!

Questions? Comments?