Return of the hardware floating-point elementary functions

ARITH 18, June 25–27, 2007. Return of the hardware floating-point elementary functions. Jérémie Detrey, Florent de Dinechin, and Xavier Pujol. Projet Arénaire – LIP UMR CNRS – ENS Lyon – UCB Lyon – INRIA 5668


slide-1
SLIDE 1

ARITH 18 – June 25–27, 2007

Return of the hardware floating-point elementary functions

Jérémie Detrey, Florent de Dinechin, and Xavier Pujol

Projet Arénaire – LIP UMR CNRS – ENS Lyon – UCB Lyon – INRIA 5668 http://www.ens-lyon.fr/LIP/Arenaire/

CENTRE NATIONAL DE LA RECHERCHE SCIENTIFIQUE

ECOLE NORMALE SUPERIEURE DE LYON

slide-2
SLIDE 2

Outline of the talk

◮ Context
◮ Double-precision exponential
◮ Results
◮ Conclusion

  • J. Detrey, F. de Dinechin, and X. Pujol – Return of the hardware floating-point elementary functions


slide-5
SLIDE 5

A long time ago...

(in a galaxy not so far away)

◮ a bit of paleo-bibliography

  • M. D. Ercegovac (IEEE TC, 1975)
    Radix-16 evaluation of certain elementary functions.
  • G. Paul and M. W. Wilson (ACM TOMS, 1976)
    Should the elementary functions be incorporated into computer instruction sets?
  • C. Wrathall and T. C. Chen (ARITH 4, 1978)
    Convergence guarantee and improvements for a hardware exponential and logarithm evaluation scheme.
  • P. Farmwald (ARITH 5, 1981)
    High-bandwidth evaluation of elementary functions.
  • M. Cosnard, A. Guyot, B. Hochet, J.-M. Muller, H. Ouaouicha, P. Paul, and E. Zysmann (ARITH 8, 1987)
    The FELIN arithmetic coprocessor chip.

slide-8
SLIDE 8

FPUs strike back

◮ ... then came the floating-point unit

  • dedicated efficient hardware operators
  • only basic operations: +, −, ×, ÷ and √

◮ what about elementary functions?

  • comparatively rare operations
  • hardware implementation would be a waste of silicon
  • dedicate silicon to more useful units (ALUs, FPUs, caches)

◮ only software or micro-code implementations


slide-11
SLIDE 11

FPGAs: a new hope?

◮ Field-Programmable Gate Arrays
◮ reconfigurable integrated circuits
◮ architecture based on programmable logic cells and routing resources

  • lower performance than ASICs
  • high flexibility
  • fine-grain parallelism
  • lower cost per unit

◮ 1-billion-transistor FPGAs: huge computational capacity
◮ many application domains:

  • digital signal and image processing
  • cryptography
  • bioinformatics
  • scientific computing
  • ...

slide-15
SLIDE 15

FPGAs and arithmetic

◮ initially: LUT-based logic cells
◮ currently: only integer arithmetic

  • dedicated logic and routing for fast adders
  • small embedded multipliers (18 × 18 bits)
  • multiply-and-accumulate blocks

◮ not enough for many applications
◮ strong need for more complex operators

  • other operations: division, square root, elementary functions, ...
  • other number systems: modular arithmetic, real arithmetic, ...

slide-20
SLIDE 20

FPLibrary

◮ library of portable VHDL operators for floating-point
◮ all operators are parameterized in terms of range and precision

[Table: operators +/−, ×, ÷, log x, e^x and sin x / cos x, with one column for single precision and one for double precision; the cell marks are not recoverable, apart from "?" marks among the elementary-function rows]

◮ single-precision logarithm and exponential

  • hardware-specific algorithms
  • ad-hoc range reduction
  • table-based fixed-point evaluation
  • small and fast operators

slide-23
SLIDE 23

Double precision: using the same method?

◮ range reduction and reconstruction are scalable
◮ table-based method for the actual computation

  • exponential growth of the area
  • estimations w.r.t. single precision: 15× larger for the exponential, and 40× larger for the logarithm!!
  • unacceptable overhead for usual FPGAs

◮ need for another algorithm, suited to higher precisions
◮ iterative method

  • smaller architecture
  • higher scalability
  • longer critical path

slide-24
SLIDE 24

Outline of the talk

◮ Context
◮ Double-precision exponential
◮ Results
◮ Conclusion

slide-26
SLIDE 26

Number format

[Figure: bit fields of a number X: exn_X (2 bits), S_X (1 bit), E_X (wE bits), F_X (wF bits)]

◮ 2 parameters: wE (range) and wF (precision)
◮ inspired by the IEEE-754 standard: $X = (-1)^{S_X} \cdot 1.F_X \cdot 2^{E_X - E_0}$
◮ 2 extra bits for exceptional cases: zero, infinity or Not-a-Number (NaN)
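To make the format concrete, here is a minimal Python sketch that decodes the four fields into a value. Two things are assumptions of this sketch and not stated on the slide: the mapping of the 2-bit exn_X codes (0 = zero, 1 = normal, 2 = infinity, 3 = NaN) and the bias E_0 = 2^(wE−1) − 1, chosen by analogy with IEEE-754. The function name `decode` is hypothetical.

```python
from fractions import Fraction

# Assumed exception codes: the slide only says that two extra bits flag
# zero, infinity or NaN, so this particular mapping is hypothetical.
EXN_ZERO, EXN_NORMAL, EXN_INF, EXN_NAN = 0, 1, 2, 3


def decode(exn, sign, exponent, fraction, wE, wF):
    """Return the value encoded by (exn_X, S_X, E_X, F_X)."""
    if exn == EXN_ZERO:
        return Fraction(0)
    if exn == EXN_INF:
        return "-inf" if sign else "+inf"
    if exn == EXN_NAN:
        return "nan"
    e0 = 2 ** (wE - 1) - 1                          # assumed IEEE-754-style bias E_0
    significand = 1 + Fraction(fraction, 2 ** wF)   # 1.F_X
    value = significand * Fraction(2) ** (exponent - e0)
    return -value if sign else value
```

With (wE, wF) = (8, 23), `decode(EXN_NORMAL, 0, 127, 0, 8, 23)` gives 1, since E_X equals the assumed bias and F_X is zero.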

slide-29
SLIDE 29

Evaluation method

◮ range reduction: $X = k \cdot \log 2 + Y$ with $k \in \mathbb{Z}$ and $0 \le Y < 1$
◮ we obtain: $R = e^X = 2^k \cdot e^Y$
◮ fixed-point $e^Y$? generalization of an idea by Wong and Goto (IEEE TC 1994)

  • successive range reductions of the fixed-point argument Y
  • once the argument is sufficiently reduced, direct evaluation of the exponential
  • reconstructions using rectangular multipliers
  • computes $e^Y - 1$
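The first range reduction can be sketched in a few lines of Python. This is only an illustration in double-precision floating point, not the fixed-point datapath of the operator. The helper names `range_reduce` and `exp_via_reduction` are made up for this sketch, and k is obtained with a floor so that Y stays in [0, log 2), which is one way to satisfy 0 ≤ Y < 1.

```python
import math


def range_reduce(x):
    """Split x as k*log(2) + y, with k an integer and y in [0, log 2)."""
    k = math.floor(x / math.log(2))
    y = x - k * math.log(2)
    return k, y


def exp_via_reduction(x):
    """Rebuild e**x as 2**k * e**y from the reduced argument."""
    k, y = range_reduce(x)
    # math.exp stands in for the fixed-point evaluation of e**y.
    return math.ldexp(math.exp(y), k)
```

The multiplication by 2^k is a pure exponent update, which is why only e^Y on a small interval needs a real evaluation scheme.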

slide-33
SLIDE 33

Iterative method: range reductions

◮ for each step i, we consider the argument $Y_i$ (starting with $Y_0 = Y$)

[Figure: fixed-point layouts of $Y_i$ (sign, leading zeros, then $A_i$ on $\alpha_i - 1$ bits and $B_i$ on $\beta_i$ bits), of $L_i$, and of the difference $Y_{i+1}$, which has more leading zeros]

◮ splitting $Y_i$ as $A_i + B_i$, we address two look-up tables with $A_i$:

  • $e^{A_i} - 1$, rounded to its $\alpha_i$ most significant bits, noted $\widetilde{e^{A_i} - 1}$
  • $L_i = \log\left(\widetilde{e^{A_i}}\right)$, rounded to its $\alpha_i + \beta_i$ most significant bits

◮ by construction, $L_i \approx Y_i$
◮ we then define $Y_{i+1}$ as $Y_i - L_i$:

  • the $\alpha_i - 1$ most significant bits of $Y_i$ are cancelled
  • $Y_{i+1}$ is a $(1 + \beta_i)$-bit number
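One reduction step can be modelled numerically as follows. This is a simplified sketch and not the operator's bit-accurate datapath: the function names `quantize` and `reduce_step` and the parameters `a_frac_bits` and `total_frac_bits` are hypothetical, table contents are computed on the fly with `math.exp` and `math.log1p` instead of being read from ROMs, and widths are expressed as numbers of fractional bits, not as the per-step pair (α_i, β_i).

```python
import math
from fractions import Fraction


def quantize(v, frac_bits, rounding=round):
    """Snap v onto the grid of multiples of 2**-frac_bits."""
    scale = 2 ** frac_bits
    return Fraction(rounding(v * scale), scale)


def reduce_step(y, a_frac_bits, total_frac_bits):
    """Return (tilde_em1, y_next) for one range-reduction step."""
    # A_i: the leading bits of Y_i, used to address both tables.
    a = quantize(y, a_frac_bits, math.floor)
    # First table: e**A_i - 1 rounded to a few bits (the tilde value).
    tilde_em1 = quantize(math.exp(a) - 1, a_frac_bits)
    # Second table: L_i = log(1 + tilde value), kept on the full width.
    l = quantize(math.log1p(tilde_em1), total_frac_bits)
    # Y_{i+1} = Y_i - L_i: the leading bits cancel.
    return tilde_em1, y - l
```

Because L_i is the logarithm of the rounded table entry and not of e^{A_i} itself, the identity e^{Y_i} = (1 + tilde value) · e^{Y_{i+1}} holds up to the rounding of L_i only, which is what makes the later reconstruction work with short multiplicands.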

slide-35
SLIDE 35

Iterative method: computing the exponential

◮ the reduction process is iterated until the step k such that $Y_k < 2^{-\lceil w_F/2 \rceil}$
◮ we can then approximate the exponential as $e^{Y_k} - 1 \approx Y_k$
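A short worked bound, added here as an illustration and not taken from the slides, suggests why this threshold is enough: the first term neglected by the approximation already falls below the last bit of a $w_F$-bit fraction.

```latex
e^{Y_k} - 1 = Y_k + \frac{Y_k^2}{2} + \frac{Y_k^3}{6} + \cdots,
\qquad
|Y_k| < 2^{-\lceil w_F/2 \rceil}
\;\Longrightarrow\;
\frac{Y_k^2}{2} < 2^{-2\lceil w_F/2 \rceil - 1} \le 2^{-w_F - 1}.
```

The remaining terms are smaller still, so under this assumption no table or multiplier is needed for the last step.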

slide-39
SLIDE 39

Iterative method: reconstructions

◮ at each step i, we have:

  • $\widetilde{e^{A_i} - 1}$, from the corresponding range reduction step
  • $e^{Y_{i+1}} - 1$, from the previous reconstruction, with $Y_{i+1} = Y_i - \log\left(\widetilde{e^{A_i}}\right)$

◮ we then compute $e^{Y_i} - 1$ as

$\left(\widetilde{e^{A_i} - 1}\right) \times \left(e^{Y_{i+1}} - 1\right) + \left(\widetilde{e^{A_i} - 1}\right) + \left(e^{Y_{i+1}} - 1\right)$
$= \widetilde{e^{A_i}} \cdot e^{Y_{i+1}} - 1$
$= \widetilde{e^{A_i}} \cdot e^{Y_i} \cdot e^{-\log\left(\widetilde{e^{A_i}}\right)} - 1$
$= e^{Y_i} - 1$
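Putting reductions and reconstructions together gives the following end-to-end sketch. It is a numerical model under stated assumptions, not the published architecture: the name `expm1_iterative`, the fixed chunk size `alpha`, the `guard` bits and the default wF = 23 are all hypothetical choices, tables are computed with `math.exp` and `math.log1p` (so the model is only meaningful well below double precision), and products are kept exact where the hardware would use truncated rectangular multipliers.

```python
import math
from fractions import Fraction


def quantize(v, frac_bits, rounding=round):
    """Snap v onto the grid of multiples of 2**-frac_bits."""
    scale = 2 ** frac_bits
    return Fraction(rounding(v * scale), scale)


def expm1_iterative(y, wF=23, alpha=8, guard=6):
    """Approximate e**y - 1 for 0 <= y < 1 by reductions then reconstructions."""
    total = wF + guard
    y = quantize(y, total, math.floor)            # fixed-point Y_0
    threshold = Fraction(1, 2 ** math.ceil(wF / 2))
    table_values = []                             # tilde(e**A_i - 1) per step
    a_bits = alpha
    while abs(y) >= threshold:
        a = quantize(y, a_bits, math.floor)                   # A_i
        tilde_em1 = quantize(math.exp(a) - 1, a_bits)         # first table
        y -= quantize(math.log1p(tilde_em1), total)           # Y_{i+1} = Y_i - L_i
        table_values.append(tilde_em1)
        a_bits += alpha - 1                       # next chunk of bits
    r = y                                         # e**Y_k - 1 is approximated by Y_k
    for t in reversed(table_values):
        r = t * r + t + r                         # (1 + t) * (1 + r) - 1
    return r
```

Each reconstruction multiplies a short operand (the rounded table value) by a long one (the running result), which is the shape a rectangular multiplier exploits.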

slide-48
SLIDE 48

Architecture

[Figure: block diagram of the exponential operator. Labelled signals and blocks: inputs exn_X, S_X, E_X, F_X; a shift using E_0 that produces X_fix, with overflow/underflow detection; a multiplication by 1/log 2 followed by rounding, giving k; a multiplication by log 2, giving Y = Y_0; the iterative range reductions (A_0, B_0, Y_1, A_1, B_1, ..., Y_k) with their tables log(e^{A_0}) and e^{A_0} − 1; the reconstructions e^{Y_k} − 1, ..., e^{Y_1} − 1, e^{Y_0} − 1 = e^Y − 1; then e^Y, normalize / round and sign / exception handling, giving R ≈ e^X]

slide-49
SLIDE 49

Outline of the talk

◮ Context
◮ Double-precision exponential
◮ Results
◮ Conclusion

slide-52
SLIDE 52

Operator area (exponential)

[Plot: area (in slices, 500 to 2000) and FPGA occupation (0% to 40%) against precision wF (in bits, 6 to 50), for the table-based and iterative methods]

◮ single precision (wE, wF) = (8, 23) (table-based method): 938 slices (18% of a Virtex-II 1000 FPGA)
◮ double precision (wE, wF) = (11, 52) (iterative method): 2045 slices (40%)

slide-53
SLIDE 53

Operator latency (exponential)

[Plot: latency (in ns, 60 to 220) against precision wF (in bits, 6 to 50), for the table-based and iterative methods]

◮ single precision (wE, wF) = (8, 23) (table-based method): 97 ns
◮ double precision (wE, wF) = (11, 52) (iterative method): 229 ns

slide-54
SLIDE 54

Outline of the talk

◮ Context
◮ Double-precision exponential
◮ Results
◮ Conclusion

slide-55
SLIDE 55

Our contribution

◮ exponential and logarithm operators
◮ up to double precision
◮ guaranteed faithful rounding
◮ scalable method
◮ hardware-specific algorithms: fast and cheap operators

slide-57
SLIDE 57

Future work

◮ pipeline
◮ implement double precision for other functions for FPLibrary
◮ study compound functions
◮ careful error analysis:

  • certified algorithms and operators
  • generic proofs (Gappa)

◮ most of this work is not FPGA-specific: extend it to ASICs

slide-59
SLIDE 59

Thank you for your attention

◮ more information & download page: http://www.ens-lyon.fr/LIP/Arenaire/

Questions?