
ENHANCING SCIENTIFIC COMPUTATION USING A VARIABLE PRECISION FPU WITH A RISC-V PROCESSOR

Y. Durand, C. Fabre, A. Bocco, T. Trevisan | IMPRENUM Project | Oct 2019


  1. ENHANCING SCIENTIFIC COMPUTATION USING A VARIABLE PRECISION FPU WITH A RISC-V PROCESSOR
     Y. Durand, C. Fabre, A. Bocco, T. Trevisan | IMPRENUM Project | Oct 2019

  2. USE CASES FOR (LARGE) VARIABLE PRECISION
     Applications:
     • Computational physics
     • Computational chemistry
     • Computational statistics
     • Computational geometry
     • Large PDEs: finite elements, finite differences
     • ODEs
     • Optimization
     Techniques & kernels:
     • Dense/sparse linear algebra: solvers, eigenvalues
     • Numerical integration: RK (Runge-Kutta), but not only
     • Monte Carlo
     • Spectral techniques: FFT and others
     • Interval arithmetic
     Our main focus today: linear algebra solvers. However, there are many other areas in scientific computing where variable precision is sought.

  3. VARIABLE PRECISION FOR SCIENTIFIC COMPUTATION: JACOBI
     Outer strategy: while error > tolerance, augment precision.

         while convergence not reached do
           for i := 1:n do
             $\sigma = 0$
             for j := 1:n do
               if j ≠ i then
                 $\sigma \mathrel{+}= a_{ij}\, x_j^{(k)}$
               end
             end
             $x_i^{(k+1)} = \frac{1}{a_{ii}} (b_i - \sigma)$
           end
           k = k + 1
         end

     • Matrix coefficients ($a_{ij}$): read-only, sparse, doubles; they stay in remote memory.
     • Accumulation ($\sigma$): requires maximum precision; should be done inside the FPU.
     • Vector update ($x$): dense; requires high precision; should be kept in close memory.
     We need:
     1. extended precision operators,
     2. dedicated accumulators in registers inside the FPU,
     3. extended precision storage in close memory.
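The "augment precision while error > tolerance" strategy can be sketched in software. The following is a minimal illustration, not the authors' implementation: it uses Python's standard `decimal` module as a stand-in for the variable precision hardware, and the function names (`residual_inf`, `jacobi`, `solve_with_escalation`), the residual-based stopping test, the doubling schedule and the 2x2 example system are all assumptions made for the sketch.

```python
from decimal import Decimal, localcontext

def residual_inf(a, b, x, digits):
    """Infinity norm of b - A x, evaluated at 2*digits so the check itself adds little rounding."""
    with localcontext() as ctx:
        ctx.prec = 2 * digits
        worst = Decimal(0)
        for i in range(len(b)):
            s = Decimal(b[i])
            for j in range(len(b)):
                s -= Decimal(a[i][j]) * x[j]
            worst = max(worst, abs(s))
        return worst

def jacobi(a, b, digits, tol, max_sweeps=200):
    """Jacobi sweeps with all arithmetic rounded to `digits` significant digits.

    Returns (x, sweeps) once the residual drops below tol, else (None, max_sweeps).
    """
    n = len(b)
    x = [Decimal(0)] * n
    with localcontext() as ctx:
        ctx.prec = digits
        for k in range(1, max_sweeps + 1):
            x_new = []
            for i in range(n):
                sigma = Decimal(0)
                for j in range(n):
                    if j != i:
                        sigma += Decimal(a[i][j]) * x[j]
                x_new.append((Decimal(b[i]) - sigma) / Decimal(a[i][i]))
            x = x_new
            if residual_inf(a, b, x, digits) < tol:
                return x, k
    return None, max_sweeps

def solve_with_escalation(a, b, tol, digits=16, max_digits=128):
    """Outer loop: retry with more digits until the tolerance is met (assumed doubling schedule)."""
    while digits <= max_digits:
        x, sweeps = jacobi(a, b, digits, tol)
        if x is not None:
            return x, digits, sweeps
        digits *= 2  # "augment precision"
    raise ArithmeticError("tolerance not reached within max_digits")

# Hypothetical example: matrix and right-hand side stored as doubles.
a = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 19.0]
x, digits, sweeps = solve_with_escalation(a, b, Decimal("1e-25"))
print(digits, sweeps, x)
```

With this example the exact solution (13/9, 29/9) has no finite decimal expansion, so a 16-digit working vector cannot bring the residual below 1e-25 and the run only succeeds after the precision is raised to 32 digits.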

  4. MORE IN DEPTH WITH JACOBI: EXECUTING ON THE V1 ACCELERATOR

         k = 0
         while convergence not reached do
           for i = 1:n do
             $\sigma = 0$
             for j = 1:n do
               if j ≠ i then
                 $\sigma \mathrel{+}= a_{ij}\, x_j^{(k)}$
               end
             end
             $x_i^{(k+1)} = \frac{1}{a_{ii}} (b_i - \sigma)$
           end
           k = k + 1
         end

     • Input data: read-only, in RAM, double format (sparse).
     • Accumulation: internal format (high precision).
     • Intermediate vector: adjustable format (dense).
     [Block diagram labels: Rocket tile, RISC-V core, FPU, L1 $, L2/L3 $, RAM, RoCC, VP co-processor, load & store (L&S) units, scratchpad.]
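The three data formats on this slide (double inputs, high precision internal accumulation, adjustable format for the intermediate vector) can be mimicked in a single Jacobi sweep. This is only a software sketch under stated assumptions: `decimal.Context` objects stand in for the accelerator's internal and memory formats, and `jacobi_sweep`, `store_digits`, `acc_digits` and the example system are hypothetical names and values, not part of the presented hardware.

```python
from decimal import Context, Decimal

def jacobi_sweep(a, b, x, store_digits, acc_digits):
    """One Jacobi sweep: double inputs, wide accumulation, narrower storage of the result."""
    acc = Context(prec=acc_digits)      # stand-in for the internal accumulator format
    store = Context(prec=store_digits)  # stand-in for the adjustable memory format
    x_new = []
    for i in range(len(b)):
        sigma = Decimal(0)
        for j in range(len(b)):
            if j != i:
                # a[i][j] is a double; Decimal(float) converts it exactly
                sigma = acc.add(sigma, acc.multiply(Decimal(a[i][j]), x[j]))
        update = acc.divide(acc.subtract(Decimal(b[i]), sigma), Decimal(a[i][i]))
        x_new.append(store.plus(update))  # round to the storage format on write-back
    return x_new

# Hypothetical example: 40-digit accumulation, 20-digit intermediate vector.
a = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 19.0]
x = [Decimal(0), Decimal(0)]
for _ in range(80):
    x = jacobi_sweep(a, b, x, store_digits=20, acc_digits=40)
print(x)
```

Keeping `acc_digits` larger than `store_digits` means the rounding to the storage format happens once per vector entry, after the whole row has been accumulated.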

  5. VARIABLE PRECISION SYSTEM
     • Standard core with a Floating Point Unit (FPU) and FPU registers
     • VP unit: large size registers for V.P. accumulation (e.g. 64 512-bit registers) + specialized scratchpad
     • Specific access to the memory hierarchy (L1$)
     • Large size (10s of MB) coherent close memory (LLC$)
     • Distant shared memory

  6. PROGRAMMING MODEL: HARDWARE & SOFTWARE LAYERS
     Layers, from top to bottom:
     • Application
     • Domain specific library
     • Solver & algorithms interface: solvers & algorithms, VP solvers & algorithms
     • Computation routines interface: kernels, variable precision kernels
     • Auxiliary support library
     • Hardware
     Variable precision is contained within calls to kernels (BLAS level) and solvers (LAPACK level).
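The layering can be illustrated with a small sketch in which the application only ever sees doubles, while extended precision stays inside a kernel-level routine and the solver that calls it. All names here (`vp_row_update`, `vp_jacobi_solve`) are hypothetical, the fixed sweep count is a simplification, and Python's `decimal` module is again an assumed software substitute for the variable precision hardware.

```python
from decimal import Decimal, localcontext

# Kernel layer (BLAS level): the arithmetic is done at `digits` significant digits.
def vp_row_update(row, x, b_i, i, digits):
    """Return (b_i - sum over j != i of row[j] * x[j]) / row[i]."""
    with localcontext() as ctx:
        ctx.prec = digits
        sigma = Decimal(0)
        for j, (a_ij, x_j) in enumerate(zip(row, x)):
            if j != i:
                sigma += Decimal(a_ij) * x_j
        return (Decimal(b_i) - sigma) / Decimal(row[i])

# Solver layer (LAPACK level): holds the extended precision vector, built from kernel calls.
def vp_jacobi_solve(a, b, digits=40, sweeps=80):
    x = [Decimal(0)] * len(b)
    for _ in range(sweeps):
        # The comprehension reads the previous x for every row, as Jacobi requires.
        x = [vp_row_update(a[i], x, b[i], i, digits) for i in range(len(b))]
    return [float(v) for v in x]  # back to doubles at the solver interface

# Application layer: plain doubles in, plain doubles out.
solution = vp_jacobi_solve([[4.0, 1.0], [2.0, 5.0]], [6.0, 12.0])
print(solution)
```

The design point is that changing `digits` affects neither the application code nor the data it exchanges with the solver.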

  7. RECAP: BENEFITS OF VARIABLE PRECISION
     • Augmenting accuracy inside the kernel reduces rounding errors → improves stability of the computation
     • Augmenting the mantissa during accumulation is not sufficient
     • Usual solution is to tweak the solver (pre-conditioning, etc.) but this is costly, hazardous and very limited
     • Another solution is to double precision (→ quad!) in the intermediate calculation → huge impact in memory and in calculation time
     • Using specialized data types (GMP, MPFR) has the same pitfalls
       • At even higher cost in memory
     • Our solution:
       • Variable precision, byte-aligned data format for intermediate data in memory
         • Affordable memory footprint for intermediate data
       • Hardware support for variable precision in hardware co-processor
         • Up to 4x64 bits fractional part in internal accumulator
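The first point, that extra accuracy inside the kernel reduces rounding errors, can be seen on a three-term dot product. The sketch below is an illustration only: `dot_double` and `dot_wide` are hypothetical names, the input values are chosen to provoke cancellation, and the wide accumulator is emulated with Python's `decimal` module rather than with the co-processor described here.

```python
from decimal import Decimal, localcontext

def dot_double(x, y):
    """Dot product accumulated in a double: every addition is rounded to 53 bits."""
    acc = 0.0
    for u, v in zip(x, y):
        acc += u * v
    return acc

def dot_wide(x, y, digits=80):
    """Same dot product with an 80-digit accumulator, rounded to double once at the end."""
    with localcontext() as ctx:
        ctx.prec = digits
        acc = Decimal(0)
        for u, v in zip(x, y):
            acc += Decimal(u) * Decimal(v)
        return float(acc)

x = [1e16, 1.0, -1e16]
y = [1.0, 1.0, 1.0]
print(dot_double(x, y))  # 0.0: the 1.0 is lost when added to 1e16
print(dot_wide(x, y))    # 1.0: the wide accumulator keeps it
```

This only illustrates the accumulation step; as the slide notes, a wider accumulator alone is not sufficient, which is why the intermediate data in memory also uses a variable precision format.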

  8. PERSPECTIVES
     • Early investigation carried out by CEA
       • With support of other research projects: OPRECOMP, Imprenum, QUANTEX
     • First use cases
     • Proof of concept = first FPGA prototype
     • Investigation on compiler and library support
     • Mid-term target: proof of realization
       • Re-engineering with actual memory subsystem & infrastructure
       • Improve co-processor integration with processor
       • SW integration (libraries, execution model?)
     • Main publications
       • Andrea Bocco, Yves Durand, and Florent de Dinechin. SMURF: Scalar multiple-precision unum Risc-V floating-point accelerator for scientific computing. In Conference on Next-Generation Arithmetic, March 2019.
       • Tiago Trevisan Jost, Andrea Bocco, Yves Durand, Christian Fabre, Florent de Dinechin, Anca Molnos, and Albert Cohen. Variable Precision Capabilities in RISC-V Processors. RISC-V Workshop Zurich, June 11-13, 2019.
       • Andrea Bocco, Yves Durand, and Florent de Dinechin. Dynamic precision numerics using a variable-precision UNUM type I HW coprocessor. In 26th IEEE Symposium on Computer Arithmetic (ARITH-26), June 2019.
