FPGA-specific arithmetic pipeline design using FloPoCo

  1. FPGA-specific arithmetic pipeline design using FloPoCo. Bogdan Pasca, Arénaire. CARAMEL, 17/02/2011

  2. Outline
     - FPGAs and floating-point
     - Datapath design using FloPoCo
     - Inside FloPoCo
     - Back-end for HLS
     - Conclusion

  3. FPGAs and floating-point

  4. What’s an FPGA? Field Programmable Gate Array
     - integrated circuit
     - has a regular architecture (hence "array")
     - logic elements can be programmed to perform various functions

  11. Modern FPGA Architecture
      [Figure: chip layout with LUT, RAM and DSP blocks; the DSP detail is labelled 18, 18 and shift 17]
      - a set of configurable logic elements
      - on-chip memory blocks
      - digital signal processing (DSP) blocks (including multipliers)
      - connected by a configurable wire network
      - all connected to the outside world by I/O pins

  13. A bit of history

      | Year                                 | 1995   | 2011       | 2011      |
      | FPGA                                 | XC4010 | XC6VHX565T | 5SGXAB    |
      | Capacity (K LE)                      | 1      | 500        | 1000      |
      | DSPs                                 | -      | 1K         | 1.5K      |
      | Block RAM                            | -      | 2K (18Kb)  | 2K (20Kb) |
      | Frequency (MHz)                      | 10     | 600        |           |
      | FPAdder ($w_E = 6$, $w_F = 9$) [1]   | 28%    | 0.05%      | 0.025%    |
      | FPMultiplier ($w_E = 6$, $w_F = 9$)  | 44%    | * [2]      | *         |
      | FPDivider ($w_E = 6$, $w_F = 9$)     | 46%    | 0.1%       | 0.05%     |

      FPGAs are now large enough to implement complex datapaths.
      [1] Shirazi et al., Quantitative Analysis of Floating Point Arithmetic on FPGA Based Custom Computing Machines (1995)
      [2] Multiplications are usually implemented using DSPs on modern FPGAs
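
One way to read the area percentages in the slide above is as a bound on how many copies of an operator fit on one chip. The sketch below is only that arithmetic, under the assumption that area scales linearly and that routing, I/O and control overhead are ignored; the function name `operators_per_chip` is made up for this illustration.

```python
from fractions import Fraction

# FPAdder area as a percentage of one chip (wE = 6, wF = 9), copied from the slide.
# Stored as strings so that Fraction parses them exactly (no binary rounding).
ADDER_AREA_PERCENT = {
    "XC4010 (1995)": "28",
    "XC6VHX565T (2011)": "0.05",
    "5SGXAB (2011)": "0.025",
}


def operators_per_chip(area_percent: str) -> int:
    """Upper bound on copies that fit, assuming linear scaling and no overhead."""
    return int(Fraction(100) / Fraction(area_percent))


for chip, percent in ADDER_AREA_PERCENT.items():
    print(f"{chip}: at most {operators_per_chip(percent)} adders")
```

Under these assumptions the 1995 device holds a handful of adders while the 2011 devices hold thousands, which is the point the slide makes about complex datapaths.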

  16. So, are FPGAs any good at floating-point in 2011?
      Today’s basic operations: +, −, ×
      - Highly optimized FPU in the processor
      - Each operator 10x slower in an FPGA
      ⋆ Massive parallelism on an FPGA
      → FPGA faster than PC, but no match to GPGPU, Cell ...
      If you lose according to a metric, change the metric.
      Peak figures for double-precision floating-point exponential [3]:
      - Pentium core: 20 cycles / DPExp @ 3GHz: 150 MDPExp/s
      - FPGA: 1 DPExp/cycle @ 400MHz: 400 MDPExp/s
      - Chip vs chip: 8 Pentium cores vs 150 FPExp/FPGA
      ⋆ Power consumption also better
      (Intel MKL vector libm, vs FPExp in FloPoCo version 2.0.0)
      [3] de Dinechin, Pasca. Floating-point exponential functions for DSP-enabled FPGAs (2010)
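
The peak figures on the slide above follow from clock rate divided by cycles per result. The sketch below reproduces that arithmetic and extends it to the chip-versus-chip line, assuming every core and every FPExp operator runs at its peak rate all the time; the helper name `peak_mdpexp_per_s` and the aggregate totals are illustrative and do not appear on the slide.

```python
def peak_mdpexp_per_s(clock_mhz: float, cycles_per_result: float, units: int = 1) -> float:
    """Peak throughput in millions of double-precision exponentials per second."""
    return units * clock_mhz / cycles_per_result


# Per-unit figures quoted on the slide.
pentium_core = peak_mdpexp_per_s(clock_mhz=3000, cycles_per_result=20)
fpga_operator = peak_mdpexp_per_s(clock_mhz=400, cycles_per_result=1)

# Chip vs chip, assuming all 8 cores / all 150 operators are kept busy at peak.
pentium_chip = peak_mdpexp_per_s(3000, 20, units=8)
fpga_chip = peak_mdpexp_per_s(400, 1, units=150)

print(f"Pentium core:  {pentium_core:.0f} MDPExp/s")
print(f"FPGA operator: {fpga_operator:.0f} MDPExp/s")
print(f"Chip vs chip:  {pentium_chip:.0f} vs {fpga_chip:.0f} MDPExp/s")
```

These are peak numbers only; sustained throughput depends on how fast operands can be fed to the operators.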

  19. The FloPoCo project: Not your neighbour’s FPU
      Useful operators that would not be economical in a processor
      ⋆ Elementary functions (sine, exponential, logarithm...)
      ⋆ Algebraic functions ($\frac{x}{\sqrt{x^2 + y^2}}$, polynomials, ...)
      ⋆ Compound functions ($\log_2(1 \pm 2^x)$, $e^{-Kt^2}$, ...)
      ⋆ Floating-point sums, dot products, sums of squares
      ⋆ Specialized operators: constant multipliers, squarers, ...
      - Complex arithmetic
      ⋆ LNS arithmetic
      ⋆ Decimal arithmetic
      - Interval arithmetic
      - ...
      Oh yes, basic operations, too.

  23. VHDL Limitations
      One instance: double-precision, Virtex4, 400MHz - FPExp:
      - 52 pipeline stages
      - 37 subcomponents
      - 6000 lines of VHDL vs 600 lines of FloPoCo
      Our questions for today:
      - How to productively design an optimized architecture?
      - How to be future-proof?
        - need a different precision
        - target a different FPGA family (different multiplier sizes)
        - need faster frequency

  24. Datapath design using FloPoCo

  25. A question of granularity
      [Figure: abstraction scale from high to low, with productivity at one end and performance at the other; labels: system builder, C-like loop, FPGA management, datapath, arithmetic primitives, FloPoCo]

  28. Sum of squares: performance approach
      $x^2 + y^2 + z^2$ (not a toy example but a useful building block)
      - A square is simpler than a multiplication: half the hardware required
      - $x^2$, $y^2$, and $z^2$ are positive: one half of your FP adder is useless
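
The "half the hardware" remark can be illustrated at the bit level: in a square, the partial products $x_i x_j$ and $x_j x_i$ are equal, so each pair is needed only once with doubled weight, and $x_i x_i$ is just $x_i$. The sketch below is a software model of that idea for unsigned integers. It is not FloPoCo's squarer, and the function names and the term-counting cost model are assumptions made for this illustration.

```python
from typing import Tuple


def multiply_terms(x: int, y: int, n: int) -> Tuple[int, int]:
    """Schoolbook n-bit multiply; returns (product, number of partial-product bits)."""
    total, terms = 0, 0
    for i in range(n):
        for j in range(n):
            total += (((x >> i) & 1) & ((y >> j) & 1)) << (i + j)
            terms += 1
    return total, terms


def square_terms(x: int, n: int) -> Tuple[int, int]:
    """n-bit square using symmetry; returns (square, number of partial-product bits)."""
    total, terms = 0, 0
    for i in range(n):
        xi = (x >> i) & 1
        total += xi << (2 * i)  # diagonal: x_i AND x_i is x_i itself
        terms += 1
        for j in range(i + 1, n):
            xj = (x >> j) & 1
            # x_i*x_j and x_j*x_i are merged into one term of doubled weight.
            total += (xi & xj) << (i + j + 1)
            terms += 1
    return total, terms


product, mult_cost = multiply_terms(201, 201, 8)
square, square_cost = square_terms(201, 8)
print(product == square, mult_cost, square_cost)
```

For n bits the squarer sums n(n+1)/2 terms instead of n², which approaches half as n grows.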
