
FPGA Multipliers, Bogdan PASCA, Arénaire project - PowerPoint PPT Presentation

FPGA Multipliers. Bogdan PASCA, Arénaire project, ENS-Lyon/INRIA/CNRS/Université de Lyon, France. RAIM’11, February 7-10, 2011. Outline: Background & Context; Algorithmic techniques for reducing DSP count of large multipliers


  1. FPGA Multipliers. Bogdan PASCA, Arénaire project, ENS-Lyon/INRIA/CNRS/Université de Lyon, France. RAIM’11, February 7-10, 2011

  2. Outline
     - Background & Context
     - Algorithmic techniques for reducing DSP count of large multipliers
       - Karatsuba-Ofman algorithm
       - Non-standard tilings
       - Squarers
       - Truncated multipliers
     - Conclusions

  3. What’s an FPGA? Field Programmable Gate Array
     - integrated circuit
     - has a regular architecture (hence "array")
     - logic elements can be programmed to perform various functions

  4. Modern FPGA Architecture
     - a set of configurable logic elements
     - on-chip memory blocks
     - digital signal processing (DSP) blocks (including multipliers)
     - connected by a configurable wire network
     - all connected to the outside world by I/O pins

  10. Modern FPGA Architecture (Figure: FPGA floorplan showing LUT, RAM, and DSP blocks; the DSP block detail is labeled with 18-bit inputs and a 17-bit shift.)

  14. What can we compute? A 3-bit by 2-bit multiplication $x_2 x_1 x_0 \times y_1 y_0$ built from LUTs.
      - Partial products, one LUT each: $l_i = y_0 \wedge x_i$ and $u_i = y_1 \wedge x_i$ for $i = 0, 1, 2$
      - Product: $p_4 p_3 p_2 p_1 p_0 = l_2 l_1 l_0 + 2 \cdot u_2 u_1 u_0$, with $p_0 = l_0$ and the remaining bits produced by full adders (FA)
      - (Figure: schematic of the multiplier; the final build replaces the LUTs that perform the addition with FA cells.)
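The 3-bit by 2-bit multiplier on this slide can be sketched in software. The following is a minimal illustration, not the author's implementation: the function names `and_lut`, `full_adder`, and `multiply_3x2` are hypothetical, and the adder is assumed to be a simple ripple-carry chain.

```python
def and_lut(a: int, b: int) -> int:
    """Model one LUT configured as a 2-input AND gate."""
    return a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Model one full-adder (FA) cell; returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def multiply_3x2(x: int, y: int) -> int:
    """Multiply a 3-bit x by a 2-bit y using only AND LUTs and FA cells."""
    xb = [(x >> i) & 1 for i in range(3)]  # x0, x1, x2
    yb = [(y >> i) & 1 for i in range(2)]  # y0, y1

    # Partial products: l_i = y0 AND x_i, u_i = y1 AND x_i
    l = [and_lut(yb[0], xi) for xi in xb]
    u = [and_lut(yb[1], xi) for xi in xb]

    # The u row has weight 2, so u_i lines up with l_(i+1).
    p0 = l[0]
    p1, c = full_adder(l[1], u[0], 0)
    p2, c = full_adder(l[2], u[1], c)
    p3, p4 = full_adder(0, u[2], c)

    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3) | (p4 << 4)
```

Checking `multiply_3x2(x, y) == x * y` for all 8 x 4 input pairs confirms that six AND LUTs plus a three-cell adder chain are enough for this size.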

  15. Need of DSP blocks
      - Multiplication in logic is expensive: an $n \times n$-bit multiplier costs about $n^2 + n(n-1)$ LUTs, where $n^2$ LUTs form the partial products and $n(n-1)$ LUTs form the adder tree
      - $18 \times 18$ bit $\approx 324 + 306 = 630$ LUTs
      - 1 DSP block = 8 LEs (size on FPGA layout)

  16. Need of DSP blocks (continued): DSP blocks are a necessity in modern FPGAs. (Figure: DSP block diagram with inputs A, B, and C, output P, port widths of 18 and 48 bits, and 17-bit shifts.)
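The LUT estimate quoted on these slides is easy to evaluate for other operand sizes. The helper below is a small sketch; the name `lut_cost` is hypothetical, and the formula is the slide's approximation rather than a synthesis result.

```python
def lut_cost(n: int) -> tuple[int, int, int]:
    """Approximate LUT cost of an n x n-bit multiplier built in logic.

    Returns (partial_product_luts, adder_tree_luts, total_luts),
    following the slide's estimate n^2 + n(n - 1).
    """
    partial_products = n * n   # one AND per pair of input bits
    adder_tree = n * (n - 1)   # adders that sum the n partial-product rows
    return partial_products, adder_tree, partial_products + adder_tree


if __name__ == "__main__":
    print(lut_cost(18))  # (324, 306, 630)
```

For `n = 18` this reproduces the 324 + 306 = 630 LUT figure from the slide.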

  17. DSP-Hungry Applications
      - "FPGA floating point performance – a pencil and paper evaluation" [1] → DSP blocks are a scarce resource for accelerating double-precision (DP) applications
      - "Efficient reconfigurable design for pricing Asian options" [2] → LUTs 46%, RAM 4%, DSP 100% (192)
      - "Implementation and evaluation of an arithmetic pipeline on FLOPS-2D: multi-FPGA system" [3] → a) LE 30%, DSP 86%; b) LE 52%, DSP 88%; c) LE 63%, DSP 100%
      - "A temporal coding hardware implementation for spiking neural networks" [4] → 16 PE: LE 22%, RAM 3%, DSP 74% (100/136)
      - Four recipes for saving DSPs

      [1] D. Strenski (HPCWire, 2007)
      [2] Anson H.T. Tse, David B. Thomas, K. H. Tsoi, Wayne Luk (HEART’10)
      [3] H. Morisita, K. Inakagata, Y. Osana, N. Fujita, H. Amano (HEART’10)
      [4] Marco Nuno-Maganda, Cesar Torres-Huitzil (HEART’10)

  22. Perceiving Multiplications Visually
      - classical binary multiplication
      - all sub-products can be properly located inside the diamond
      - rotate the diamond to obtain a rectangle
      - (Figure: the multiplication of X by Y drawn as a diamond; successive builds highlight the operand slices X[2:0], Y[2:0] and then X[5:3], Y[5:3].)
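The idea that every sub-product has a fixed place in the overall multiplication can be sketched as follows. This is an illustrative example only: the name `tiled_multiply` and its `width` and `chunk` parameters are hypothetical, and the 3-bit slices are chosen to match the X[2:0] and X[5:3] labels in the figure.

```python
def tiled_multiply(x: int, y: int, width: int = 6, chunk: int = 3) -> int:
    """Multiply two width-bit unsigned integers by summing sub-products.

    Each operand is cut into chunk-bit slices. The sub-product of
    x[i+chunk-1:i] and y[j+chunk-1:j] is placed at weight 2^(i+j),
    which is its position in the rectangle (or diamond) view.
    """
    mask = (1 << chunk) - 1
    total = 0
    for i in range(0, width, chunk):        # slice offset in x
        for j in range(0, width, chunk):    # slice offset in y
            x_slice = (x >> i) & mask
            y_slice = (y >> j) & mask
            total += (x_slice * y_slice) << (i + j)
    return total
```

With the defaults, a 6-bit by 6-bit product is covered by four 3-bit by 3-bit tiles, and the sum of the shifted tiles equals `x * y` for every pair of 6-bit inputs.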
