DYNAMIC PRECISION NUMERICS USING A VARIABLE-PRECISION UNUM TYPE I HW COPROCESSOR

ARITH’26 | BOCCO Andrea | 11 June 2019


  1. DYNAMIC PRECISION NUMERICS USING A VARIABLE-PRECISION UNUM TYPE I HW COPROCESSOR ARITH’26 | BOCCO Andrea | 11 June 2019

  2. INTRODUCTION: STATE OF THE ART
     ➢ Variable Precision (VP) computing has been investigated as a way to improve the convergence of algorithms, both in:
        ▪ Software (SW): GMP [2] and MPFR [3]
           ▪ Slow; they might not meet the requirements of high-speed applications
        ▪ Hardware (HW):
           ▪ Kulisch [4]: large fixed-point accumulator
           ▪ Schulte and Swartzlander [5]: mantissas divided into multiple words
     ➢ None of the previous works shows how to store VP Floating Point (FP) numbers efficiently in main memory
        ▪ They support the IEEE 754 FP format in main memory

     [1] IEEE. 2008. IEEE Standard for Floating-Point Arithmetic. IEEE 754-2008. https://doi.org/10.1109/IEEESTD.2008.4610935
     [2] Torbjörn Granlund and the GMP development team. 2012. GNU MP: The GNU Multiple Precision Arithmetic Library. https://gmplib.org/
     [3] Laurent Fousse, et al. MPFR: A Multiple-Precision Binary Floating-Point Library with Correct Rounding. https://doi.org/10.1145/1236463.1236468
     [4] Ulrich Kulisch. 2013. Computer Arithmetic and Validity: Theory, Implementation, and Applications.
     [5] M. J. Schulte and E. E. Swartzlander. 2000. A Family of Variable-Precision Interval Arithmetic Processors. https://doi.org/10.1109/12.859535

  3. INTRODUCTION: MY WORK
     Our previous work [6]: a VP FP hardware accelerator that:
        • Supports the UNUM type I format in main memory
        • Does computation internally with another (hardware-friendly) FP format
        • Supports Interval Arithmetic (IA)
     This work:
        ▪ Refines the UNUM type I FP format.
        ▪ Proposes a new VP FP architecture.
        ▪ Proposes a new programming model.
        ▪ Benchmarks our system.
     [Figure: Rocket Chip block diagram. Labels: Rocket tile, RISC-V Rocket, FPU, LSU, L1 $, RAM, RoCC, UNUM co-proc, LSU, Scratchpad; markers 1 to 5.]

     [6] A. Bocco, Y. Durand, F. de Dinechin. 2019. SMURF: Scalar Multiple-precision UNUM RISC-V Floating-point Accelerator for Scientific Computing.

  4. OUTLINE
     • Choice of the memory format: the UNUM type I
     • Refinements on the UNUM type I FP format
     • The adopted VP FP architecture
     • The programming model
     • System benchmark: Gauss elimination solver
     • Conclusions

  6. CHOICE OF THE MEMORY FORMAT: THE UNUM TYPE I
     We decided to use the UNUM type I FP format in main memory.
     • It is a self-descriptive FP format with 6 sub-fields, 3 more than conventional IEEE 754 FP numbers:

          s    | e         | f         | u    | es-1          | fs-1
          sign | exponent  | fraction  | ubit | exponent size | fraction size
               | (es bits) | (fs bits) |      |               |

     • Why?
        • UNUM is a VP FP format
        • It self-encodes the exponent and fraction field lengths
     However, UNUM type I has some peculiarities that need to be fixed:
     • How to organize UNUM arrays in main memory
     • How to organize the UNUM fields in memory
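To make the "self-descriptive" property concrete, here is a minimal Python sketch that decodes a UNUM type I bit pattern. It assumes the field order of the diagram on this slide (s, e, f, u, es-1, fs-1 from MSB to LSB) and the usual UNUM type I value formula; neither the formula nor the names `decode_unum`, `unum_value`, `ess` and `fss` (the widths of the two size fields) come from the slides, and special values (NaN, infinities) and the ubit's interval meaning are ignored.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Unum:
    sign: int
    exponent: int
    fraction: int
    ubit: int
    es: int  # exponent field length in bits
    fs: int  # fraction field length in bits


def decode_unum(bits: int, ess: int, fss: int) -> Unum:
    """Split a bit pattern into the six fields, reading from the LSB upwards."""
    fs = (bits & ((1 << fss) - 1)) + 1   # the fs-1 field stores fs minus one
    bits >>= fss
    es = (bits & ((1 << ess) - 1)) + 1   # the es-1 field stores es minus one
    bits >>= ess
    ubit = bits & 1
    bits >>= 1
    fraction = bits & ((1 << fs) - 1)    # field lengths are now known
    bits >>= fs
    exponent = bits & ((1 << es) - 1)
    bits >>= es
    sign = bits & 1
    return Unum(sign, exponent, fraction, ubit, es, fs)


def unum_value(u: Unum) -> Fraction:
    """Exact value of the (s, e, f) part, assuming the standard type I formula."""
    bias = (1 << (u.es - 1)) - 1
    hidden = 1 if u.exponent > 0 else 0  # subnormals have no hidden bit
    mantissa = hidden + Fraction(u.fraction, 1 << u.fs)
    scale = Fraction(2) ** (u.exponent - bias + 1 - hidden)
    return (-1) ** u.sign * mantissa * scale
```

With hypothetical size-field widths `ess = 2` and `fss = 2`, the pattern `0b001100100` decodes to es = 2, fs = 1 and the value 3/2. The point of the sketch is that the decoder cannot locate e and f until it has read the two size fields.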

  8. REFINEMENTS ON THE UNUM TYPE I FP FORMAT: UNUM FIELD ORGANIZATION
     For a UNUM/ubound that spans multiple addresses in main memory, it is important to have the descriptor fields in the lower addresses.
     ➢ We have reorganized the order of the fields for UNUM and ubound (listed from LSB to MSB):
        • UNUM: s | u | es-1 | fs-1 | e | f
        • ubound: s u es-1 fs-1 (left) | s u es-1 fs-1 (right) | e (left) | e (right) | f (left) | f (right)
     [Figure: two memory maps, addresses 00--00 to FF--FF, p bits wide, showing U1 at @1’ and U2 at @2’.]
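One way to see why the descriptor fields belong in the lower addresses: a load unit can work out the total length of a UNUM from its first word alone. The sketch below assumes the reordered UNUM layout from this slide (s, u, es-1, fs-1, e, f from the LSB), with the whole descriptor fitting in the first word. The function name `unum_words_needed` and the widths `ess` and `fss` are hypothetical, and `p = 64` simply mirrors the 64-bit coprocessor parallelism mentioned later in the deck.

```python
def unum_words_needed(first_word: int, ess: int, fss: int, p: int = 64) -> int:
    """Number of p-bit words a UNUM occupies, derived from its first word only.

    Assumed layout, LSB first: s (1 bit), u (1 bit), es-1 (ess bits),
    fs-1 (fss bits), then e (es bits) and f (fs bits).
    """
    bits = first_word >> 2                 # skip the s and u bits
    es = (bits & ((1 << ess) - 1)) + 1     # exponent length in bits
    bits >>= ess
    fs = (bits & ((1 << fss) - 1)) + 1     # fraction length in bits
    total_bits = 2 + ess + fss + es + fs
    return -(-total_bits // p)             # ceiling division
```

For example, with hypothetical widths `ess = 4` and `fss = 9`, a first word whose size fields are all ones describes a 543-bit UNUM, so nine 64-bit words are needed in total. With the original field order the size fields would sit at the far end of the number, and the length would not be known until the last word had been read.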

  9. REFINEMENTS ON THE UNUM TYPE I FP FORMAT: UNUM ARRAY ORGANIZATION
     Handling a two-element UNUM array in main memory with p-bit parallelism:
        • U1 spans two p-bit words (U1_0, U1_1)
        • U2 spans three p-bit words (U2_0, U2_1, U2_2)
     Array support: guarantee an affine addressing scheme.
     [Figure: two memory maps, addresses 00--00 to FF--FF, p bits wide, comparing where U1, U2 and the result U3 = U1*U2 (U3_0 to U3_2) are placed; addresses @1’, @2’ and @2’’.]

  11. THE ADOPTED VP FP ARCHITECTURE
      • 1 integer register file (iRF): 32 integer general-purpose registers (GPRs) + pc, in the main processor.
      • 1 g-bound register file (gRF): 32 entries, in the coprocessor.
      • UNUMs/u-bounds are strictly considered as memory formats:
         • Load operations: load UNUMs/u-bounds from the main memory and convert them into internal g-bounds.
         • Store operations: convert internal g-bounds (entries of the internal gRF) into u-bounds and store the latter in the main memory.
      • The coprocessor internal parallelism is fixed to 64 bits.
      • Coprocessor’s status registers:
         • DUE
         • SUE
         • MBB (NEW!)
         • WGP
      [Figure: Rocket Chip block diagram. Labels: Rocket tile, RISC-V Rocket, FPU, LSU, L1 $, RAM, RoCC, UNUM co-proc, LSU, Scratchpad; markers 1 to 5.]

  12. THE MBB: MAXIMUM BYTE BUDGET
      The UNUM format is variable length (up to a maximum length).
      ▪ Compacted arrays cannot provide random access to their elements.
      ➢ We define the Maximum Byte Budget (MBB) as the maximum length that a UNUM number can have in main memory.
      ➢ The user can address VP FP numbers by specifying their length with byte granularity.
      [Figure: LSU datapath with G2U and BMF stages; g-bounds g0 to g4, UNUMs u0 to u4 and memory values u’0 to u’4, with MBB marking the slot length.]
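The addressing argument can be illustrated with a short Python sketch. It assumes every array element is given a slot of exactly MBB bytes; the function names and the example lengths are hypothetical and only contrast the two layouts.

```python
def element_address(base: int, index: int, mbb: int) -> int:
    """Affine addressing: every element occupies exactly `mbb` bytes."""
    return base + index * mbb


def compact_address(base: int, lengths: list[int], index: int) -> int:
    """Compacted layout: the address depends on the length of every earlier element."""
    return base + sum(lengths[:index])
```

With a fixed byte budget, `element_address(0x1000, 3, 12)` is a single multiply-add. In the compacted layout, `compact_address(0x1000, [5, 9, 12, 7], 3)` needs the lengths of elements 0 to 2, which in practice means reading their descriptors first.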

  13. THE BMF: BOUNDED MEMORY FORMAT
      Each entry occupies MBB*8 bits (bit length 0 to MBB*8); x marks unused bits.

      Case a: MBB >= max UNUM length (es-1: ess’ bits, fs-1: fss’ bits, e: es_max bits, f: fs_max bits)

         row | s | u | es-1  | fs-1  | e     | f      | meaning
         1a  | 0 | 1 | 1...1 | 1...1 | 1...1 | 1...1  | qNaN
         2a  | 1 | 1 | 1...1 | 1...1 | 1...1 | 1...1  | sNaN
         3a  | 0 | 0 | 1...1 | 1...1 | 1...1 | 1...1  | +∞
         4a  | 1 | 0 | 1...1 | 1...1 | 1...1 | 1...1  | -∞
         5a  | 0 | 1 | 1...1 | 1...1 | 1...1 | 1...10 | +∞) right
         6a  | 1 | 1 | 1...1 | 1...1 | 1...1 | 1...10 | (-∞ left
         7a  | 0 | 1 | es-1  | fs-1  | 1...1 | 1...1  | +∞) right
         8a  | 1 | 1 | es-1  | fs-1  | 1...1 | 1...1  | (-∞ left
         9a  | s | u | es-1  | fs-1  | e     | f, x   |

      Case b: MBB < max UNUM length (es-1: ess’’ bits, fs-1: fss’’ bits, e: es bits, f: fs bits)

         row | s | u | es-1  | fs-1  | e     | f      | meaning
         1b  | 0 | 1 | 1...1 | 1...1 |       |        | qNaN
         2b  | 1 | 1 | 1...1 | 1...1 |       |        | sNaN
         3b  | 0 | 0 | 1...1 | 1...1 |       |        | +∞
         4b  | 1 | 0 | 1...1 | 1...1 |       |        | -∞
         5b  | 0 | 1 | es-1  | fs-1  | 1...1 | 1...1  | +∞) right
         6b  | 1 | 1 | es-1  | fs-1  | 1...1 | 1...1  | (-∞ left
         7b  | s | u | es-1  | fs-1  | e     | f, x   |

  15. THE COPROCESSOR PROGRAMMING MODEL
      Our hardware is best suited for VP kernels that exploit three different storage types:
      • The external (main memory) storage
      • The intermediate (L1 cache) storage
      • The internal (register-level) storage

      Example kernel, solving A x = b (legend: outermost loop, intermediate loop, innermost loop):

         k = 0
         while convergence not reached do
            for i := 1:n do
               σ = 0
               for j := 1:n do
                  if j ≠ i then
                     σ += a_ij x_j^(k)
                  end
               end
               x_i^(k+1) = (1 / a_ii) (b_i − σ)
            end
            k = k + 1
         end

      [Figure: Rocket tile block diagram. Labels: RISC-V, FPU, LSU, L1 $, RAM, RoCC, UNUM co-proc, LSU, Scratchpad; markers 1, 2 and 3.]
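The loop nest on this slide has the shape of a Jacobi iteration for A x = b. As a rough software illustration only, here is a runnable Python version in ordinary double precision. The function name, the stopping test on the change between iterates, and the `tol` and `max_iter` parameters are assumptions of this sketch, not the coprocessor's programming interface.

```python
def jacobi(a, b, tol=1e-12, max_iter=1000):
    """Iteratively solve a x = b; converges for e.g. strictly diagonally dominant a."""
    n = len(b)
    x = [0.0] * n                                  # initial guess x^(0)
    for _ in range(max_iter):                      # outermost loop: iterations k
        x_new = []
        for i in range(n):                         # intermediate loop: rows i
            # innermost loop: accumulate sigma over all j != i
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - sigma) / a[i][i])
        # stop when no component moved by more than tol
        if max(abs(p - q) for p, q in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")
```

For the small diagonally dominant system `jacobi([[4.0, 1.0], [2.0, 5.0]], [6.0, 12.0])` the iterates approach x = (1, 2). In the slide's terms, the accumulator `sigma` is the kind of value that would stay in register-level storage, while the vectors and the matrix live further out.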

  17. SYSTEM BENCHMARK: GAUSS ELIMINATION SOLVER
      Our system, benchmarked with a Gauss elimination solver in both UNUM (scalar) and ubound (interval) modes, showed:
      • A gain of up to 65 decimal digits over IEEE double precision
         • The precision of the result is constrained by the precision adopted in memory.
         • Intervals do not always converge, but they are useful for estimating the computational error (Ax-b).
      • A speed-up of 4-10x with respect to the MPFR software library
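To give a feel for this kind of benchmark in software, here is a minimal Python sketch of Gauss elimination with a user-selected working precision, plus the residual A x - b mentioned above. It uses Python's standard `decimal` module as a stand-in for variable-precision arithmetic; it is not the authors' benchmark code, it does no interval arithmetic, and the function names are hypothetical.

```python
from decimal import Decimal, localcontext


def gauss_solve(a, b, digits):
    """Solve a x = b by Gauss elimination with partial pivoting at `digits` decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        n = len(b)
        # Augmented matrix [a | b]
        m = [[Decimal(v) for v in row] + [Decimal(rhs)] for row, rhs in zip(a, b)]
        for col in range(n):
            # Partial pivoting: bring the largest remaining entry of this column up
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            if m[pivot][col] == 0:
                raise ValueError("matrix is singular")
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(col + 1, n):
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
        # Back substitution
        x = [Decimal(0)] * n
        for i in reversed(range(n)):
            s = sum((m[i][j] * x[j] for j in range(i + 1, n)), Decimal(0))
            x[i] = (m[i][n] - s) / m[i][i]
        return x


def residual(a, b, x, digits):
    """Componentwise a x - b, evaluated at `digits` decimal digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return [
            sum((Decimal(v) * xj for v, xj in zip(row, x)), Decimal(0)) - Decimal(rhs)
            for row, rhs in zip(a, b)
        ]
```

A typical use is to solve the same system at two values of `digits` and compare the residuals, evaluating the residual at a higher precision than the solve so that it measures the solver's error rather than its own rounding.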
