FPGA Multipliers. Bogdan PASCA, projet Arénaire, ENS-Lyon/INRIA/CNRS/Université de Lyon, France. RAIM'11, February 7-10, 2011.

Outline: Background & Context; Algorithmic techniques for reducing DSP count of large multipliers (Karatsuba-Ofman algorithm, non-standard tilings, squarers, truncated multipliers); Conclusions.

What's an FPGA? Field Programmable Gate Array: an integrated circuit; has a regular architecture (hence "array"); logic elements can be programmed to perform various functions.

Modern FPGA Architecture: a set of configurable logic elements; on-chip memory blocks; digital signal processing (DSP) blocks (including multipliers); connected by a configurable wire network; all connected to the outside world by I/O pins.

[Figure: FPGA floorplan built up step by step, showing LUT, RAM, and DSP blocks; the DSP detail carries the labels 18, 18, and "shift 17".]

What can we compute? A 3 × 2-bit multiplication $x_2 x_1 x_0 \times y_1 y_0$ built from LUTs. Each LUT computes one partial-product bit: $l_i = y_0 \wedge x_i$ and $u_i = y_1 \wedge x_i$ for $i = 0, 1, 2$. The product $p_4 p_3 p_2 p_1 p_0$ is the sum of the row $l_2 l_1 l_0$ and the row $u_2 u_1 u_0$ shifted left by one bit; the addition is carried out by full adders (FA).
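The slide's construction can be sketched in software as follows. This is only an illustrative bit-level model, not the author's implementation: the function names (`and_lut`, `full_adder`, `multiply_3x2`) are made up for this sketch, and the arrangement of the three full adders as a ripple-carry chain is an assumption read off the garbled figure.

```python
def and_lut(a: int, b: int) -> int:
    """Model of one LUT configured as a 2-input AND gate."""
    return a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out) for three input bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def multiply_3x2(x: int, y: int) -> int:
    """Multiply a 3-bit x by a 2-bit y from AND partial products and full adders."""
    xb = [(x >> i) & 1 for i in range(3)]
    y0, y1 = y & 1, (y >> 1) & 1

    # Partial-product rows: l_i = y0 AND x_i, u_i = y1 AND x_i.
    l = [and_lut(y0, xi) for xi in xb]
    u = [and_lut(y1, xi) for xi in xb]

    # Add row l to row u shifted left by one bit (assumed ripple-carry chain).
    p0 = l[0]
    p1, c = full_adder(l[1], u[0], 0)
    p2, c = full_adder(l[2], u[1], c)
    p3, p4 = full_adder(0, u[2], c)

    return p0 | (p1 << 1) | (p2 << 2) | (p3 << 3) | (p4 << 4)
```

Checking the model against ordinary integer multiplication for all 8 × 4 input pairs is a quick way to confirm that the partial-product rows are aligned correctly.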

Need of DSP blocks: Multiplication in logic is expensive. An $n \times n$-bit multiplier needs approximately $\underbrace{n^2}_{\text{partial products}} + \underbrace{n(n-1)}_{\text{adder tree}}$ LUTs; for 18 × 18 bits this gives 324 LUTs + 306 LUTs = 630 LUTs. 1 DSP block = 8 LEs (size on FPGA layout). DSP blocks are a necessity in modern FPGAs.
[Figure: DSP block diagram with inputs A, B, and C, output P, bit widths 18 and 48, and two 17-bit shifts.]
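The LUT estimate above is simple enough to evaluate directly. The helper below is a minimal sketch of that formula; the name `lut_cost` is hypothetical and the result is the slide's rough approximation, not a synthesis figure.

```python
def lut_cost(n: int) -> int:
    """Approximate LUT count of an n x n-bit multiplier built in logic."""
    partial_products = n * n      # one AND gate per partial-product bit
    adder_tree = n * (n - 1)      # adders summing the n partial-product rows
    return partial_products + adder_tree


# The 18 x 18-bit case from the slide: 324 + 306 LUTs.
cost_18 = lut_cost(18)
```

Because the estimate grows quadratically, doubling the operand width roughly quadruples the logic cost, which is the motivation for dedicated DSP blocks.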

DSP-Hungry Applications
"FPGA floating point performance – a pencil and paper evaluation" [1] → DSP blocks are a scarce resource for accelerating DP apps.
"Efficient reconfigurable design for pricing Asian options" [2] → LUTs 46%, RAM 4%, DSP 100% (192).
"Implementation and evaluation of an arithmetic pipeline on FLOPS-2D: multi-FPGA system" [3] → a) LE 30%, DSP 86%; b) LE 52%, DSP 88%; c) LE 63%, DSP 100%.
"A temporal coding hardware implementation for spiking neural networks" [4] → 16 PE: LE 22%, RAM 3%, DSP 74% (100/136).
Four recipes for saving DSPs.
[1] D. Strenski (HPCWire, 2007).
[2] Anson H.T. Tse, David B. Thomas, K. H. Tsoi, Wayne Luk (HEART'10).
[3] H. Morisita, K. Inakagata, Y. Osana, N. Fujita, H. Amano (HEART'10).
[4] Marco Nuno-Maganda, Cesar Torres-Huitzil (HEART'10).

Perceiving Multiplications Visually: classical binary multiplication; all sub-products can be properly located inside the diamond; rotate the diamond so as to obtain a rectangle.
[Figure: the diamond of partial products of X × Y, with the sub-products of $X_{2:0}$, $Y_{2:0}$ and $X_{5:3}$, $Y_{5:3}$ highlighted in successive builds.]
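The idea that every sub-product has a fixed place inside the diamond can be illustrated numerically. The sketch below assumes 6-bit operands split into 3-bit chunks, matching the $X_{2:0}$ and $X_{5:3}$ labels in the figure; the function name `tiled_multiply` and its parameters are hypothetical, and `width` is assumed to be a multiple of `tile`.

```python
def tiled_multiply(x: int, y: int, width: int = 6, tile: int = 3) -> int:
    """Rebuild x * y from tile x tile-bit sub-products placed at their bit offsets."""
    mask = (1 << tile) - 1
    chunks = width // tile  # assumes width is a multiple of tile
    total = 0
    for i in range(chunks):
        xi = (x >> (i * tile)) & mask        # e.g. X[2:0], then X[5:3]
        for j in range(chunks):
            yj = (y >> (j * tile)) & mask    # e.g. Y[2:0], then Y[5:3]
            # Each sub-product sits at bit offset (i + j) * tile in the result.
            total += (xi * yj) << ((i + j) * tile)
    return total
```

Each `(i, j)` pair corresponds to one tile of the rectangle, and the shift `(i + j) * tile` is the weight of that tile in the final sum.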
