Interactions between parallelism and numerical stability, accuracy
Prof. Richard Vuduc
Georgia Institute of Technology CSE/CS 8803 PNA: Parallel Numerical Algorithms [L.13] Tuesday, February 19, 2008
1
Source: Dubey, et al., of Intel (2005)
2
(Source: Pérez, et al., SIGGRAPH 2003)
3
(Source: Pérez, et al., SIGGRAPH 2003)
4
5
Dense LU: assume no structure ⇒ O(n⁶)
Sparse LU: sparsity ⇒ O(n³), needs extra memory, hard to parallelize
CG: symmetric positive definite ⇒ O(n³), a little extra memory
RB SOR: fixed sparsity pattern ⇒ O(n³), no extra memory, easy to parallelize
FFT: eigendecomposition ⇒ O(n² log n)
Multigrid: eigendecomposition ⇒ O(n²) [optimal!]
6
[Figure: right-hand side, true solution, best possible 5-step result, and 5 steps of Jacobi]
7
ε_j^(m) = (λ_jj(R_w))^m · ε_j^(0): the j-th error component is damped by the j-th eigenvalue of the weighted Jacobi iteration matrix R_w.
8
Initial error: "rough," lots of high-frequency components; norm = 1.65
Error after 1 weighted Jacobi step: "smoother," less high-frequency content; norm = 1.06
Error after 2 weighted Jacobi steps: "smooth," little high-frequency content; norm = 0.92, and it won't decrease much more
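To make the smoothing effect concrete, here is a minimal numpy sketch (the matrix size, weight, and random initial error are illustrative assumptions, not the deck's actual example): weighted Jacobi on the 1-D Poisson matrix T = tridiag(−1, 2, −1) damps the rough components of the error within a couple of sweeps.

    import numpy as np

    # Error propagation for weighted Jacobi on T = tridiag(-1, 2, -1):
    # since D = 2I, the update is e <- e - (w/2) * T e.
    n, w = 64, 2.0 / 3.0
    rng = np.random.default_rng(1)
    e = rng.standard_normal(n)  # "rough" initial error (zero right-hand side)

    def smooth(e, w):
        Te = 2.0 * e
        Te[:-1] -= e[1:]   # superdiagonal contribution (Dirichlet boundaries)
        Te[1:] -= e[:-1]   # subdiagonal contribution
        return e - 0.5 * w * Te

    for step in range(3):
        print(step, round(np.linalg.norm(e, np.inf), 3))
        e = smooth(e, w)

The norm drops quickly at first (the high-frequency components are scaled by roughly 1 − 2w each sweep) and then stalls, just as the norms quoted above do.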
9
[Figure: multigrid hierarchy of coarser problems P(1), P(2), P(3)]
10
[Figure: grids at levels k = 2, 3, 4, 5]
11
12
On STI Cell:
SPEED(single) = 14× SPEED(double): 204.8 Gflop/s vs. 14.6 Gflop/s
SPEs are fully IEEE-compliant for double, but support only round-to-zero in single
On regular CPUs with SIMD units:
SPEED(single) ~ 2× SPEED(double)
SSE2: S(single) = 4 flops/cycle vs. S(double) = 2 flops/cycle
PowerPC AltiVec: S(single) = 8 flops/cycle; no double in AltiVec (scalar FPU: 4 flops/cycle)
On a GPU, there might be no double-precision support at all
13
Bisection algorithm computes eigenvalues of a symmetric tridiagonal matrix. The inner kernel is a routine, Count(x), which counts the number of eigenvalues less than x.
Correctness 1: Count(x) must be "monotonic"
Correctness 2: (some approaches) IEEE FP compliance
The ATI Radeon X1900 XT GPU does not strictly adhere to the IEEE floating-point standard, causing errors in some cases, but a workaround is possible.
14
Larger problems magnify errors: round-off, ill-conditioning, instabilities
Reproducibility: a + (b + c) ≠ (a + b) + c (see the demo after this list)
Fast parallel algorithm may be much less stable than fast serial algorithm
Flops cheaper than communication
Speeds at different precisions may vary significantly [e.g., SSEk, Cell]
Perils of arithmetic heterogeneity, e.g., CPU vs. GPU support of IEEE
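The reproducibility bullet is easy to demonstrate in double precision; the operands below are chosen purely for illustration:

    a, b, c = 1e16, -1e16, 1.0

    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0: b + c rounds back to -1e16, because the
                        # spacing between doubles near 1e16 is 2.0

A parallel reduction that regroups a sum across processors can therefore return different bits from run to run, even though every individual operation is correctly rounded.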
15
Use a fast algorithm that may be unstable (or "less" stable)
Check the result at the end
If needed, re-run or fix up using a slow-but-safe algorithm
(A minimal sketch of this check-then-fallback loop follows.)
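A minimal sketch of this paradigm for Ax = b, assuming caller-supplied fast_solve and safe_solve routines (hypothetical names) and using the normwise backward error as the cheap end-check:

    import numpy as np

    def solve_with_fallback(A, b, fast_solve, safe_solve, tol=1e-12):
        x = fast_solve(A, b)  # fast but possibly (less) stable
        # Cheap a posteriori check: normwise backward error of the result.
        eta = np.linalg.norm(b - A @ x, 1) / (
            np.linalg.norm(A, 1) * np.linalg.norm(x, 1) + np.linalg.norm(b, 1))
        return x if eta <= tol else safe_solve(A, b)  # slow-but-safe fix-up

The check costs only a matrix-vector product, so the safe path is paid for only when the fast path actually misbehaves.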
16
Applied Numerical Linear Algebra, by J. Demmel
Accuracy and Stability of Numerical Algorithms, by N. Higham
"Trading off parallelism and numerical stability," by J. Demmel (1992)
"Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy," by J. Langou, et al. (2006)
"Using GPUs to accelerate the bisection algorithm for finding eigenvalues of symmetric tridiagonal matrices," by V. Volkov and J. Demmel (2007)
CS 267 (Demmel, UCB)
Plamen Koev (NCSU)
17
18
Uncertainty in input data, due to measurements, earlier computations, ...
Truncation errors, due to algorithmic approximations
Rounding errors, due to finite-precision arithmetic
Today's focus: rounding error analysis
19
Accuracy: the absolute or relative error in a computed quantity. Precision: the accuracy of the basic operations (+, -, *, /, …). Accuracy is not limited by precision!
One can simulate arbitrarily high precision with a given precision; the cost is speed.
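One classic way to "simulate" extra precision is compensated (Kahan) summation, which carries the rounding error of each addition along in a second variable; a sketch:

    def kahan_sum(xs):
        s = 0.0  # running sum
        c = 0.0  # running compensation: the error committed so far
        for x in xs:
            y = x - c        # correct the next addend by the stored error
            t = s + y        # low-order digits of y are lost here...
            c = (t - s) - y  # ...and recovered here, algebraically
            s = t
        return s

    data = [0.1] * 10**6
    print(sum(data))        # plain sum drifts in the trailing digits
    print(kahan_sum(data))  # accurate to the last bit or so

The result behaves as if the sum were accumulated in roughly twice the working precision, at the cost of a few extra flops per term.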
20
Basic operations (+, -, *, /, …) satisfy fl(a op b) = (a op b)(1 + δ), |δ| ≤ ε, where ε is the "unit round-off" or "machine/format precision"
IEEE 754 single precision (32-bit): ε ≈ 10⁻⁸; double (64-bit): ε ≈ 10⁻¹⁶; extended (80-bit on Intel): ε ≈ 10⁻²⁰
Fused multiply-add: Two ops with only one error
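The unit roundoffs quoted above are easy to check with numpy (np.finfo reports machine epsilon, which is 2u under round-to-nearest; mapping np.longdouble to the x87 80-bit format is a platform assumption, typical of x86 Linux):

    import numpy as np

    for t in (np.float32, np.float64, np.longdouble):
        u = np.finfo(t).eps / 2  # unit roundoff under round-to-nearest
        print(np.dtype(t).name, u)
    # float32   ~6.0e-08
    # float64   ~1.1e-16
    # float128  ~5.4e-20  (x87 80-bit extended, despite the dtype name)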
21
Desired computation: y = f(x). What our algorithm actually computes: ŷ = f̂(x).
22
Absolute error: |ŷ − y|. Relative error: |ŷ − y| / |y|. (Forward) stability: "small" bounds on the error. For vectors and matrices, use "norms":
||x||₂ ≡ sqrt(Σᵢ xᵢ²),  ||x||₁ ≡ Σᵢ |xᵢ|,  ||x||∞ ≡ maxᵢ |xᵢ|
23
[Figure: map from input space to output space; forward error measured between outputs]
24
Forward error analysis bounds the error in the computed output. Backward error analysis bounds the perturbation to the input that would reproduce the computed output exactly. Numerically stable: the bounds are "small."
25
[Figure: forward error measured between outputs; backward error measured between inputs]
26
Backward stability: the computed answer is "near" the exact solution of a nearby problem.
27
Define the (relative) condition number. Rule of thumb: (forward error) ≤ (condition number) × (backward error).
Condition number is problem dependent
Backward stability ⇒ forward stability, but not vice versa: an ill-conditioned problem can have a large forward error even when the backward error is small
Condition number for Ax = b:
||Δx|| / ||x̂|| ≤ ||A⁻¹|| · ||A|| · ( ||ΔA|| / ||A|| + ||Δb|| / (||A|| · ||x̂||) ),
so the condition number is κ(A) = ||A⁻¹|| · ||A||.
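A quick numpy illustration of the rule of thumb, on a random test system (size and seed are arbitrary): measure the backward error of a computed solution via the Rigal-Gaches residual formula, then compare the true forward error against κ(A) times it.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    A = rng.standard_normal((n, n))
    x_true = rng.standard_normal(n)
    b = A @ x_true

    x_hat = np.linalg.solve(A, b)
    r = b - A @ x_hat

    # Normwise backward error (Rigal-Gaches): smallest relative perturbation
    # to (A, b) for which x_hat is an exact solution.
    eta = np.linalg.norm(r) / (
        np.linalg.norm(A, 2) * np.linalg.norm(x_hat) + np.linalg.norm(b))

    fwd = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    print(f"backward error      = {eta:.2e}")
    print(f"kappa(A) * backward = {np.linalg.cond(A) * eta:.2e}  (upper bound)")
    print(f"forward error       = {fwd:.2e}")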
May lose accuracy when subtracting nearly equal values. Example: operands that agree in their leading digits can reduce 12 significant digits to 3.
33
[Figure: computed f(x) = (1 − cos x)/x² plotted against x/π; the exact f satisfies 0 ≤ f(x) < 1/2]
34
Suppose x = 1.2 × 10⁻⁵ and we compute f(x) = (1 − cos x)/x² with c = cos x rounded to 10 significant digits: the subtraction 1 − c is exact, but the result is the same size as the rounding error in c, so almost no correct digits remain.
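The same failure is easy to reproduce in float32, where cos(1.2 × 10⁻⁵) rounds to exactly 1; rewriting with the half-angle identity 1 − cos x = 2 sin²(x/2) removes the subtraction entirely:

    import numpy as np

    x = np.float32(1.2e-5)

    naive = (np.float32(1) - np.cos(x)) / x**2           # 0.0: cos(x) rounds to 1
    half = x / np.float32(2)
    stable = np.float32(0.5) * (np.sin(half) / half)**2  # ~0.5, correct

    print(naive, stable)  # the exact value is just below 1/2

The two formulas are mathematically equivalent but not numerically equivalent, which is exactly the reformulation trick recommended in the guidelines later in this deck.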
35
36
Cancellation is not always a disaster: operands may be exact (e.g., initial data); cancellation may be symptomatic of ill-conditioning in the problem itself; and the effect may be irrelevant, e.g., x + (y − z) is safe if x ≫ y, z.
37
Instability can often be traced to just a few insidious errors, not just the accumulation of lots of error. Example: computing e = lim_{n→∞} (1 + 1/n)ⁿ:

n          (1 + 1/n)ⁿ   error
10         2.593743     1.25×10⁻¹
1,000      2.717051     1.23×10⁻³
10,000     2.718597     3.15×10⁻⁴
1,000,000  2.595227     1.23×10⁻¹

For large n, a single rounding error, made in forming 1 + 1/n, destroys the answer.
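A sketch reproducing the effect (the table's numbers are consistent with single precision, which is an inference here; in double the same collapse simply happens at larger n): the one rounding error made in forming 1 + 1/n is amplified by a factor of roughly n by the exponentiation.

    import numpy as np

    for n in [10, 1_000, 10_000, 1_000_000]:
        approx = (np.float32(1) + np.float32(1) / np.float32(n)) ** n
        print(f"{n:>9}  {approx:.6f}  {abs(approx - np.e):.2e}")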
38
A has 3 eigenvalues: 0, 0.4394…, 1.161… Eigenvalue 0 has eigenvector [1, 1, 1]T Consider power method with initial guess [1, 1, 1]T
One step in exact arithmetic ⇒ 0, so no eigenpair information. With rounding, the power method finds the principal eigenvector in ~40 steps: here the rounding errors are beneficial.
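A sketch of the phenomenon; the deck's actual 3×3 matrix is not reproduced here, so we build a hypothetical stand-in with the same key property: eigenvalues {0, 0.4394, 1.161}, with [1, 1, 1]ᵀ the eigenvector for 0.

    import numpy as np

    v = np.ones(3) / np.sqrt(3)
    Q, _ = np.linalg.qr(np.column_stack([v, [1, 0, 0], [0, 1, 0]]))
    A = Q @ np.diag([0.0, 0.4394, 1.161]) @ Q.T  # A @ [1,1,1] = 0 ... on paper

    x = np.ones(3)
    for k in range(60):
        x = A @ x               # rounding leaks a tiny component along the
        x /= np.linalg.norm(x)  # dominant eigenvector, which then grows

    print(x @ (A @ x))  # Rayleigh quotient ~ 1.161: the principal eigenpair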
39
40
[Example formula omitted] ⇐ higher accuracy in fixed precision
41
Common misconceptions (each of these is false):
Cancellation is always bad
Rounding errors can overwhelm a computation only when many of them accumulate
Rounding errors cannot be beneficial
Accuracy is always limited by precision
The final computed answer cannot be more accurate than intermediate values
A short computation without cancellation, underflow, or overflow is automatically accurate
Increasing precision always increases accuracy
42
Avoid subtracting quantities contaminated by error, if possible
Minimize the size of intermediate quantities
Look for formulations that are mathematically, but not numerically, equivalent
Update paradigm: new = old + correction
Use well-conditioned transformations, e.g., multiply by orthogonal matrices
Avoid unnecessary overflow and underflow
43
44
On Cell:
SPEED(single) = 14× SPEED(double): 204.8 Gflop/s vs. 14.6 Gflop/s
SPEs are fully IEEE-compliant for double, but support only round-to-zero in single
On regular CPUs with SIMD units:
SPEED(single) ~ 2× SPEED(double)
SSE2: S(single) = 4 flops/cycle vs. S(double) = 2 flops/cycle
PowerPC AltiVec: S(single) = 8 flops/cycle; no double in AltiVec (scalar FPU: 4 flops/cycle)
On a GPU, there might be no double-precision support at all
45
46
Inner loop of the iterative refinement algorithm: repeat { r ← b − A·x; solve A·d = r; x ← x + d }
47
Theorem: Given a computed LU factorization of A and a residual computed in higher precision, if η ∝ ε · κ(A) < 1, then repeated iterative refinement reduces the error by a factor of η at each stage, down to a limiting accuracy that is independent of κ(A)!
48
Compute a solution in single precision, e.g., LU ⇒ O(n³) single-precision flops. Then apply one step (or more) of iterative refinement, as in the sketch after this list:
Compute the residual in double ⇒ O(n²) double flops
Solve in single, reusing the LU factors ⇒ O(n²) single flops
Correct in double, round to single ⇒ O(n) double flops
The matrix needs to be not-too-ill-conditioned.
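A minimal scipy sketch of the scheme above (a fixed number of refinement steps rather than a convergence test, for brevity):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def mixed_precision_solve(A, b, steps=5):
        lu, piv = lu_factor(A.astype(np.float32))          # O(n^3), single
        x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
        for _ in range(steps):
            r = b - A @ x                                  # O(n^2), double
            d = lu_solve((lu, piv), r.astype(np.float32))  # O(n^2), single
            x += d                                         # O(n), double
        return x

Each step shrinks the error by roughly κ(A)·ε_single, so the scheme converges only when κ(A) is comfortably below 1/ε_single ≈ 10⁷, matching the "not-too-ill-conditioned" caveat above.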
49
Source: Dongarra, et al. (2007)
50
Source: Dongarra, et al. (2007) STI Cell
51
Theorem: If instead r is computed in the same precision ε, iterative refinement still helps: the resulting backward error bound is smaller than the bound for the original computed solution using LU, though the forward error remains limited by the conditioning.
52
53
Tues 2/19: Floating-point issues in parallel computing by me Tues 2/26: GPGPUs by Prof. Hyesoon Kim Both classes meet in Klaus 1116E
54
New room (dumpier, but cozier?): College of Computing Building (CCB) 101. Accounts: apparently, you already have them. Front-end login node: ccil.cc.gatech.edu (CoC Unix account).
We "own" warp43 through warp56. Some docs (MPI): http://www-static.cc.gatech.edu/projects/ihpcl/mpi.html. Sign up for the mailing list: https://mailman.cc.gatech.edu/mailman/listinfo/ihpc-lab
55
Implement a parallel solver for Ax = b (a serial C version is provided).
Evaluate on three matrices: a 27-pt stencil and two application matrices. "Simplified": no preconditioning. Bonus: reorder, precondition.
Build performance models to understand the scalability of your implementation: make measurements, then build predictive models.
Collaboration encouraged: compare programming models or platforms.
56
57
Algorithms that work on small problems may fail at large sizes:
Round-off accumulates
Condition number increases
Probability of "random instability" increases
A fast (parallel) algorithm may be less stable ⇒ trade-off
58
[Figure: error grows with n: round-off accumulates]
59
[Figure: log-log plot of κ(T_{n×n}) versus n, growing like n²: the condition number of T_{n×n} increases with problem size]
60
If A is an n×n matrix selected at random [Edelman '92]: let η = 10^d · ε. Then if p processors all do plain LU on i.i.d. A matrices, the probability that at least one computed solution has error exceeding n^{3/2} · η = n^{3/2} · 10^d · ε grows with p.
61
Conventional error bound for naive matrix multiply is componentwise: |fl(A·B) − A·B| ≤ n · ε · |A| · |B|. The bound for Strassen's algorithm, O(n^{log₂ 7}) ≈ O(n^{2.81}), is only normwise: ||fl(A·B) − A·B|| ≤ O(ε) · f(n) · ||A|| · ||B|| for a low-degree polynomial f, a weaker guarantee.
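One level of Strassen's recursion is already enough to observe the weaker error behavior empirically; a sketch (the size and data are arbitrary, and the recursion is not carried to the base case):

    import numpy as np

    def strassen_one_level(A, B):
        # One level of Strassen's recursion: 7 half-size multiplies instead
        # of 8; the half-size products themselves use ordinary multiply.
        n = A.shape[0] // 2
        A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
        B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
        M1 = (A11 + A22) @ (B11 + B22)
        M2 = (A21 + A22) @ B11
        M3 = A11 @ (B12 - B22)
        M4 = A22 @ (B21 - B11)
        M5 = (A11 + A12) @ B22
        M6 = (A21 - A11) @ (B11 + B12)
        M7 = (A12 - A22) @ (B21 + B22)
        return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                         [M2 + M4, M1 - M2 + M3 + M6]])

    rng = np.random.default_rng(0)
    n = 256
    A = rng.standard_normal((n, n)).astype(np.float32)
    B = rng.standard_normal((n, n)).astype(np.float32)
    exact = A.astype(np.float64) @ B.astype(np.float64)

    for name, C in [("naive", A @ B), ("strassen", strassen_one_level(A, B))]:
        print(name, np.abs(C - exact).max())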
62
Consider A to be a dense symmetric positive definite matrix, and suppose triangular solve is slow.

Conventional algorithm (Cholesky):
A = Rᵀ · R =
[ R₁₁ᵀ            ]   [ R₁₁ R₁₂ R₁₃ ]
[ R₁₂ᵀ R₂₂ᵀ       ] · [     R₂₂ R₂₃ ]
[ R₁₃ᵀ R₂₃ᵀ R₃₃ᵀ  ]   [         R₃₃ ]
⇒ ||ΔA|| = O(ε) · κ(A)

Fast "block LU" algorithm (no triangular solves):
A = L · U =
[ I           ]   [ U₁₁ U₁₂ U₁₃ ]
[ L₂₁ I       ] · [     U₂₂ U₂₃ ]
[ L₃₁ L₃₂ I   ]   [         U₃₃ ]
⇒ ||ΔA|| = O(ε) · (κ(A))^{3/2}
63
64
Subset of the reals of the form x = ± m · βᵉ⁻ᵗ, where:
β = base (radix)
m = mantissa (significand)
± = sign
e = exponent
t = precision
Representation = sign + 2 integers (m, e); t and β are implicit.
65
Normalization: set the leading digit of m implicitly.
Guarantees a unique representation of each value
Avoids storage of leading zeros
Gains an extra digit (bit) of precision
66
Base-2 representation with m "normalized". IEEE single precision: 32 bits, ε ≈ 6×10⁻⁸ (REAL / float):
x = ± m · 2ᵉ⁻²⁴, −125 = emin ≤ e ≤ emax = 128, 0 ≤ m < 2²⁴ ≈ 16 million
67
x = ± m · 2ᵉ⁻ᵗ, emin ≤ e ≤ emax, 0 ≤ m < 2ᵗ

Format           | Total bits | Exponent bits | (emin, emax)    | t − 1 | ε       | Fortran / C
Single           | 32         | 8             | (−125, 128)     | 23    | 6×10⁻⁸  | REAL*4 / float
Double           | 64         | 11            | (−1021, 1024)   | 52    | 10⁻¹⁶   | REAL*8 / double
Extended (Intel) | 80         | 15            | (−16381, 16384) | 64    | 5×10⁻²⁰ | REAL*10 / long double
68
x = ± m · 2ᵉ⁻²⁴, −125 = emin ≤ e ≤ emax = 128, 0 ≤ m < 2²⁴ ≈ 16 million, with m "normalized"
69
Philosophy: As simple as possible
Correct rounding: round the exact value to the nearest floating-point number.
Round to nearest even to break ties
Other modes: up, down, toward 0
Don't actually need the exact value to round correctly (!)
Applies to +, -, *, /, sqrt, conversion between formats ⇒ model holds
70
What happens when the exact value is not a real number, or is too large or too small to represent?
Overflow/underflow
Invalid, e.g., 0 / 0
Divide by zero
Inexact
Answer: an exception is generated. Either set a flag and continue (the default), or trap to a custom handler.
71
Value exceeds the overflow threshold or falls below the underflow threshold. Gradual underflow permits safely executing: if (a != b) then x = a / (a - b), since with subnormals a − b cannot round to zero when a ≠ b.
72
Infinity (INF): from divide by zero. Not-a-number (NaN): 0 / 0, 0 · INF, INF − INF, INF / INF, sqrt(−1).
Operations involving NaNs generate NaNs (except "max"/"min")
Can be used to represent uninitialized or missing data
Quiet vs. signaling NaNs
73
74
Tridiagonal reduction: transform A to tridiagonal T using, e.g., Householder transformations: O(n³). Then solve Tv = λv: λ is an eigenvalue of A; back-transform v to get the corresponding eigenvector.
75
Bisection: finds eigenvalues in a given interval [a, b) by "search"; it is the inner loop of one algorithm for solving Tv = λv. Count(x) counts the number of eigenvalues of T less than x (a₁..aₙ is the diagonal, b₁..bₙ₋₁ the off-diagonal, with b₀ ≡ 0):

Count(x):
    count ← 0
    d ← 1
    for i = 1 to n do
        d ← (aᵢ − x) − b²ᵢ₋₁ / d
        if d < 0 then count ← count + 1
    return count
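A direct transcription into Python (diagonal a[0..n-1], off-diagonal b[0..n-2]; a production code would also guard d against becoming exactly zero, as LAPACK's stebz does with a pivmin threshold):

    def count_less_than(a, b, x):
        """Number of eigenvalues of T = tridiag(b, a, b) less than x."""
        count = 0
        d = 1.0
        for i in range(len(a)):
            off2 = b[i - 1] ** 2 if i > 0 else 0.0  # b_0 = 0 by convention
            d = (a[i] - x) - off2 / d
            if d < 0:
                count += 1
        return count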
76
[Figure: interval [a, b) on the real line; tick marks = eigenvalues]
Repeatedly subdivide intervals until each one contains exactly one eigenvalue.
77
Easiest parallelization: evaluate Count(x) on multiple intervals simultaneously, at the cost of some redundancy; every task runs the same Count(·), as in the sketch below.
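Since every task runs the identical recurrence, the shifts can also be evaluated in lock-step, data-parallel fashion; a vectorized sketch over many x values at once (same array conventions as count_less_than above):

    import numpy as np

    def count_many(a, b, xs):
        xs = np.asarray(xs, dtype=float)
        d = np.ones_like(xs)
        counts = np.zeros(xs.shape, dtype=int)
        for i in range(len(a)):
            off2 = b[i - 1] ** 2 if i > 0 else 0.0
            d = (a[i] - xs) - off2 / d  # same recurrence, one lane per shift
            counts += d < 0
        return counts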
78
Count(x) must be monotonic for the overall algorithm to work.
Can modify Count(x) slightly to guarantee it is monotonic iff the basic operations on scalars (+, −, *, /) are monotonic
IEEE floating-point semantics guarantee monotonicity
79
80
81
Larger problems magnify errors: round-off, ill-conditioning, instabilities
Reproducibility: a + (b + c) ≠ (a + b) + c
Fast parallel algorithm may be much less stable than fast serial algorithm
Flops cheaper than communication
Speeds at different precisions may vary significantly [e.g., SSEk, Cell]
Perils of arithmetic heterogeneity, e.g., CPU vs. GPU support of IEEE
82
83
[Slide from Demmel]
von Neumann and Goldstine (1947): "Can't expect to solve most big [n > 15] systems without carrying many decimal digits [d > 8], otherwise the computed answer would be completely inaccurate." Turing (1949): backward error analysis. Wilkinson (1961): rediscovers and publicizes the idea (Turing Award 1970). Kahan (late 1970s): IEEE 754 floating-point standard (Turing Award 1989):
Motivated by many years of machines with slightly differing arithmetics
First implemented in the Intel 8087
Now nearly universally implemented
84
Condition number: recall the conditioning relationship for Ax = b from perturbation theory, and consider the bound on the forward error based on the residual r = b − A·x̂:

||Δx|| / ||x̂|| ≤ ||A⁻¹|| · ||A|| · ( ||ΔA|| / ||A|| + ||Δb|| / (||A|| · ||x̂||) )
86