

SLIDE 1

CAMPARY CudA Multiple Precision ARithmetic librarY

Valentina Popescu. Joint work with Mioara Joldes and Jean-Michel Muller. March 2015.


SLIDE 4

When do we need more precision?

YouTube view counter: a 32-bit integer register (limit 2,147,483,647). In 2014 the video Psy - Gangnam Style hit that limit, so the counter was upgraded to 64 bits, with a limit of 9,223,372,036,854,775,807 (more than 9 quintillion).

YouTube’s statement: “We never thought a video would be watched in numbers greater than a 32-bit integer... but that was before we met Psy.”

∗courtesy of http://www.reddit.com/r/ProgrammerHumor/

SLIDE 6

More serious things!

Dynamical systems: bifurcation analysis, computing periodic orbits (finding sinks in the Hénon map, iterating the Lorenz attractor), long-term stability of the solar system.

[Figure: two phase-space plots over coordinates x1 and x2, not reproduced.]
Need more precision than standard formats offer (a few hundred bits). Need massive parallel computations: high-performance computing (HPC).

SLIDE 8

Floating-point arithmetic

A real number X is approximated in a machine by a rational x = Mx · 2^(ex−p+1), where:

  • Mx is the significand, a p-digit signed integer in radix 2 s.t. 2^(p−1) ≤ |Mx| ≤ 2^p − 1;
  • ex is the exponent, a signed integer (emin ≤ ex ≤ emax).

Concepts:

  • unit in the last place (Goldberg’s definition): ulp (x) = 2^(ex−p+1);
  • unit in the last significant place: uls (x) = ulp (x) · 2^(zx), where zx is the number of trailing zeros at the end of Mx.

SLIDE 10

Reminder: IEEE 754-2008 standard

Most common formats:

  • Single precision (p = 24): fields s | e | m, of widths 1 | 8 | 23 bits;
  • Double precision (p = 53): fields s | e | m, of widths 1 | 11 | 52 bits.

→ The leading bit of the significand is implicit and not stored.

Rounding modes: 4 rounding modes: RD, RU, RZ, RN. Correct rounding for +, −, ×, ÷, √: the returned result is what we would get with infinitely precise operations followed by rounding. This gives portability and determinism.

SLIDE 12

Multiple precision arithmetic libraries

Two ways of representing numbers in extended precision:

  • multiple-digit representation: a number is represented by a sequence of digits coupled with a single exponent (e.g. GNU MPFR, ARPREC, GARPREC, CUMP);
  • multiple-term representation: a number is expressed as the unevaluated sum of several FP numbers, also called an FP expansion (e.g. QD, GQD).

Need for another multiple-precision library:

  • GNU MPFR: not ported to GPU;
  • GARPREC & CUMP: tuned for big-array operations, where the data is generated on the host and only the operations are performed on the device;
  • QD & GQD: offer only double-double and quad-double precision; the results are not correctly rounded.

SLIDE 17

CAMPARY (CudA Multiple Precision ARithmetic librarY)

Our approach: multiple-term representation. Drawback: the concept is redundant, a number has more than one representation.

Example (p = 5, radix 2): the real number R = 1.11010011e−1 can be represented as:

  • R = x0 + x1 + x2: x0 = 1.1000e−1; x1 = 1.0010e−3; x2 = 1.0110e−6.
  • Most compact, R = z0 + z1: z0 = 1.1101e−1; z1 = 1.1000e−8.
  • Least compact, R = y0 + y1 + y2 + y3 + y4 + y5: y0 = 1.0000e−1; y1 = 1.0000e−2; y2 = 1.0000e−3; y3 = 1.0000e−5; y4 = 1.0000e−8; y5 = 1.0000e−9.

Solution: to ensure that an expansion carries significantly more information than a single FP number, it is required to be nonoverlapping. → (re-)normalization algorithms

SLIDE 20

Nonoverlapping expansions

Definition 1 (P-nonoverlapping, Priest’s definition). An expansion u0, u1, . . . , un−1 is P-nonoverlapping if for all 0 < i < n we have |ui| < ulp (ui−1).

Example: x0 = 1.1010e−2; x1 = 1.1101e−7; x2 = 1.0100e−12; x3 = 1.1000e−18.

Definition 2 (S-nonoverlapping, Shewchuk’s definition). An expansion u0, u1, . . . , un−1 is S-nonoverlapping if for all 0 < i < n we have |ui| < uls (ui−1).

Example: x0 = 1.1000e−1; x1 = 1.0100e−3; x2 = 1.1001e−7; x3 = 1.1010e−12.

SLIDE 21

Error-Free Transforms: 2Sum & 2ProdFMA

Theorem 1 (2Sum algorithm) Let a and b be FP numbers. Algorithm 2Sum computes two FP numbers s and e that satisfy the following: s + e = a + b exactly; s = RN (a + b).

( RN stands for performing the operation in rounding to nearest rounding mode.)

Algorithm 1 (2Sum (a, b)):
s ← RN (a + b)
t ← RN (s − b)
e ← RN ( RN (a − t) + RN (b − RN (s − t)))
return (s, e)

→ 6 FP operations (proved to be optimal unless we have information on the ordering of |a| and |b|)

SLIDE 22

Error-Free Transforms: 2Sum & 2ProdFMA

Theorem 2 (Fast2Sum algorithm) Let a and b be FP numbers that satisfy |a| ≥ |b|. Algorithm Fast2Sum computes two FP numbers s and e that satisfy the following: s + e = a + b exactly; s = RN (a + b).

Algorithm 2 (Fast2Sum (a, b)):
s ← RN (a + b)
z ← RN (s − a)
e ← RN (b − z)
return (s, e)

→ 3 FP operations

SLIDE 23

Error-Free Transforms: 2Sum & 2ProdFMA

Theorem 3 (2ProdFMA algorithm) Let a and b be FP numbers with ea + eb ≥ emin + p − 1. Algorithm 2ProdFMA computes two FP numbers p and e that satisfy the following: p + e = a · b exactly; p = RN (a · b).

Algorithm 3 (2ProdFMA (a, b)):
p ← RN (a · b)
e ← fma(a, b, −p)
return (p, e)

→ 2 FP operations

SLIDE 25

Distillation Algorithms: VecSum

Algorithm 4 (VecSum (x0, . . . , xn−1)):
sn−1 ← xn−1
for i ← n − 1 down to 1 do
  (si−1, ei) ← 2Sum(si, xi−1)
end for
e0 ← s0
return e0, . . . , en−1

Recently proven property: if x0, . . . , xn−1 overlap by at most d ≤ p − 1 digits, then the sequence e0, . . . , en−1 is S-nonoverlapping. Restriction: n ≤ 12 for single precision and n ≤ 39 for double precision.

SLIDE 27

Renormalization algorithms

Priest’s renormalization algorithm [Priest’91] (schematic drawing for n = 5, not reproduced). Drawbacks: many conditional branches → operations cannot be pipelined → slow in practice.

SLIDE 28

New renormalization algorithm

Based on chained levels of 2Sum and Fast2Sum.


SLIDE 29

New renormalization algorithm

Input: x0, . . . , xn−1, FP numbers that satisfy one of the following cases:
 (i) they overlap by at most d ≤ p − 1 digits;
 (ii) they overlap by at most d ≤ p − 2 digits and may contain pairs of at most 2 consecutive terms that overlap by p digits.
Remark: in both cases we allow interleaving zeros. m, input parameter, with 1 ≤ m ≤ n − 1.

Output: ”truncation” to m terms of a P-nonoverlapping FP expansion f = f0, . . . , fn−1 such that x0 + . . . + xn−1 = f and |fi+1| ≤ (1/2 + 2^(−p+2) + 2^(−p)) ulp (fi), for all 0 ≤ i < m − 1.

∗Arithmetic algorithms for extended precision using floating-point expansions, joint work with M. Joldes, O. Marty and J.-M. Muller. Submitted to IEEE Transactions on Computers, January 2015.

SLIDE 30

Renormalization algorithm - First level (VecSum)

Input: x0, . . . , xn−1, FP numbers that overlap by at most d ≤ p − 2 digits, may contain pairs of at most 2 consecutive terms that overlap by p digits, and may contain interleaving zeros.

Output: e = (e0, . . . , en−1) satisfying |e0| > |e1| ≥ . . . > |ei−1| ≥ |ei| > |ei+1| ≥ |ei+2| > . . ., where |ei| > |ei+1| implies the two terms are S-nonoverlapping, and |ei| ≥ |ei+1| implies they are either S-nonoverlapping (strict inequality) or equal to a power of 2.

SLIDE 31

Renormalization algorithm - Second level (VecSumErrBranch)

Input: e = (e0, . . . , en−1) satisfying |e0| > |e1| ≥ . . . > |ei−1| ≥ |ei| > |ei+1| ≥ |ei+2| > . . ., where |ei| > |ei+1| implies the terms are S-nonoverlapping, and |ei| ≥ |ei+1| implies they are either S-nonoverlapping (strict inequality) or equal to a power of 2; m, with 1 ≤ m ≤ n, the required number of output terms.

Output: f = (f0, . . . , fm−1) satisfying |fi+1| ≤ ulp (fi) for all 0 ≤ i < m − 1.

SLIDE 32

Renormalization algorithm - Third level (VecSumErr)

Input: f = (f0, . . . , fm), with |fi+1| ≤ ulp (fi) for all 0 ≤ i ≤ m − 1.

Output: g = (g0, . . . , gm) satisfying |g1| ≤ (1/2 + 2^(−p+2)) ulp (g0) and |gi+1| ≤ ulp (gi), for 0 < i ≤ m − 1.

Remark: we can use Fast2Sum(ρi, fi+1), because we either have ρi = 0 or |fi+1| ≤ |ρi|.

SLIDE 33

Renormalization algorithm - Subsequent levels

After the third level: the P-nonoverlapping condition holds for the first two elements of g; the rest of g keeps the existing bound, |gi+1| ≤ ulp (gi) for all 0 < i ≤ m − 1. Advantage: if zeros appear, they are pushed to the end. Solution: continue applying, in total, m − 1 subsequent levels of VecSumErr on the remaining elements that may overlap.

SLIDE 34

Example

p = 5; n = 6; m = 3. (Schematic drawing of the renormalization levels, not reproduced.)

SLIDE 35

Operation count

RPriest(n) = 20(n − 1) FP operations;
Rnew(n, m) = 13n + (3/2)m² + (3/2)m − 22 FP operations.

Table: FP operation count for the new renormalization algorithm vs. Priest’s one [Priest’91]. We consider that both algorithms compute n − 1 terms in the output expansion.

n                  | 2  | 4  | 7   | 8   | 10  | 12  | 16
New algorithm      | 13 | 60 | 153 | 190 | 273 | 368 | 594
Priest’s algorithm | 20 | 60 | 120 | 140 | 180 | 220 | 300

Observe that for n > 4 Priest’s algorithm performs fewer operations. But, in practice, the last m − 1 levels take advantage of the processor’s pipeline → our algorithm performs better.

SLIDE 41

Conclusions


Available online at: http://homepages.laas.fr/mmjoldes/campary/.

  • Uses the multiple-term format for FP multiple-precision numbers → FP expansions;
  • Implementation: templated class in CUDA C;
  • Support for addition, multiplication, reciprocal/division and square root of FP expansions on both CPU and GPU;
  • Thorough error analysis and explicit error bounds;
  • Renormalization algorithm that guarantees a nonoverlapping result.

Ongoing work: error analysis for the addition and multiplication algorithms.

∗On the computation of the reciprocal of floating point expansions using an adapted Newton-Raphson iteration, joint work with M. Joldes and J.-M. Muller. In IEEE 25th International Conference on Application-Specific Systems, Architectures and Processors (ASAP 2014).

∗Searching for sinks for the Hénon map using a multiple-precision GPU arithmetic library, joint work with M. Joldes and W. Tucker. In SIGARCH Comput. Archit. News.

SLIDE 42

Main algorithm for renormalization of FP expansion

Algorithm 5 (Renormalization algorithm)
Require: FP expansion x = x0 + . . . + xn−1 whose terms overlap by at most d digits, with d ≤ p − 1, or with d ≤ p − 2 if the expansion may contain pairs of at most 2 consecutive terms that overlap by p digits; m, the length of the output FP expansion.
Ensure: FP expansion f = f0 + . . . + fm−1 s.t. |fi+1| ≤ (1/2 + 2^(−p+2) + 2^(−p)) ulp (fi), for all 0 ≤ i < m − 1. (1)

1: e[0 : n − 1] ← VecSum(x[0 : n − 1])
2: f^(0)[0 : m] ← VecSumErrBranch(e[0 : n − 1], m + 1)
3: for i ← 0 to m − 2 do
4:   f^(i+1)[i : m] ← VecSumErr(f^(i)[i : m])
5: end for
6: return FP expansion f = f0^(1) + . . . + fm−2^(m−1) + fm−1^(m−1).