Reconfigurable Computing Applications

Reconfigurable Computing: Applications (Chapter 9)
Prof. Dr.-Ing. Jürgen Teich
Lehrstuhl für Hardware-Software-Co-Design (Chair of Hardware-Software Co-Design)

Overview
FPGAs have been used in the past mostly in:
- Rapid prototyping
- Infrequently reconfigured systems
- Hardware implementation, sometimes specific to the FPGA architecture
The most important application areas are:
- Searching (text, genetic databases, etc.)
- Image processing
- Mechanical control
- Etc.

Searching – pattern matching
- Pattern matching is the basis of search engines
- The purpose is to find (and count) the occurrences of a given pattern in a given text
- Useful in:
  - Dictionaries
  - Document collection indexing
  - Document filtering and classification
  - Spam avoidance
  - Content surveillance

Searching – pattern matching – sliding windows
- Sliding windows (Cockshott & Foulk)
  - Keywords are kept in registers, one character per byte
  - A set of comparators is used, one comparator per byte
  - The hit signal is set whenever the text segment matches the corresponding word
- Advantage:
  - Easy to replace old patterns
- Drawbacks:
  - Not flexible: fixed length of registers
  - Redundancy: more comparators than necessary for words with the same prefix
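The sliding-window scheme can be modelled in software as a rough sketch. The function name `sliding_window_hits` and the list-of-tuples result are illustrative choices, not part of the slides; the inner `all(...)` stands in for the bank of per-byte comparators.

```python
def sliding_window_hits(text, keywords):
    """Return (position, keyword) for every text window that equals a keyword."""
    hits = []
    for pos in range(len(text)):
        for word in keywords:
            window = text[pos:pos + len(word)]
            # One comparator per byte: the hit signal is the AND of all of them.
            if len(window) == len(word) and all(t == k for t, k in zip(window, word)):
                hits.append((pos, word))
    return hits


if __name__ == "__main__":
    print(sliding_window_hits("the content of the contest", ["conte", "the"]))
```

Note that every keyword gets its own full set of comparisons even when keywords share a prefix, which is the redundancy listed as a drawback above.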

Searching – pattern matching – sliding windows
- Avoid redundancy
  - Use only one comparator for common characters in different words
- Data folding (Foulk)
  - Fold the data into the circuit
  - Consider the bit representation of each character
  - Generate a comparator circuit for each character in the words to be searched for
[Figure: a generic 8-bit comparator and a character-specific 01001110 comparator]

Searching – pattern matching – FSM-based
- FSM-based pattern matcher
  - Every regular grammar can be recognized by an FSM
  - In pattern matching, the target words define the regular grammar
  - The target words are compiled into the automaton
  - Each word defines a unique path from the start state to an end state
  - When scanning a text, the automaton changes its state with the appearance of characters
  - Reaching a final state corresponds to the appearance of a word
  - Redundancy is avoided by implementing common prefixes only once
[Figure: FSM recognizer and corresponding state transition table for the word "conte"]

Searching – pattern matching – FSM-based
- FSM-based pattern matcher: RAM implementation
  - One RAM or ROM for storing the state transition table
  - One state register
  - One character register
  - A hit detector
  - The input character and the state register are used to determine the next state
  - The hit detector checks whether the current state is equal to a hit state and sets a hit for the corresponding word
- Advantage:
  - Simple to implement
- Drawback:
  - Expensive in terms of flip-flops
[Figure: RAM/ROM implementation of the word recognizer, with character stream, character register, RAM/ROM, next-state/state register and hit detector]
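A minimal software sketch of the RAM-based recognizer follows. The slide does not say how the transition table is filled; this example assumes the standard KMP-style DFA construction for a single word, and the names `build_transition_ram` and `scan_with_ram` are hypothetical.

```python
import string


def build_transition_ram(word, alphabet):
    """Fill ram[state][char] with the next state of a KMP-style DFA for one word."""
    alphabet = sorted(set(alphabet) | set(word))
    n = len(word)
    ram = [{c: 0 for c in alphabet} for _ in range(n + 1)]
    ram[0][word[0]] = 1
    fallback = 0  # state reached by the longest proper suffix that is also a prefix
    for state in range(1, n + 1):
        for c in alphabet:
            ram[state][c] = ram[fallback][c]      # mismatch: behave like the fallback state
        if state < n:
            ram[state][word[state]] = state + 1   # match: advance along the word
            fallback = ram[fallback][word[state]]
    return ram


def scan_with_ram(text, word, alphabet=string.ascii_lowercase + " "):
    """Return the end positions of every occurrence of word in text."""
    ram = build_transition_ram(word, alphabet)
    hit_state = len(word)
    state = 0                          # state register
    hits = []
    for pos, ch in enumerate(text):    # ch plays the role of the character register
        state = ram[state].get(ch, 0)  # unknown characters reset the FSM (an assumption)
        if state == hit_state:         # hit detector
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(scan_with_ram("a contest of content", "conte"))
```

The table has one row per state and one column per alphabet symbol, which illustrates why the memory footprint grows with both the word length and the alphabet size.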

Searching – pattern matching – FSM-based
- FSM-based pattern matcher: one-hot implementation
  - Each state is coded in one flip-flop
  - The D-input of the flip-flop is obtained by an AND of the output of the previous flip-flop with the result of the comparator
  - The comparator is character-specific
  - Only n flip-flops are used to implement a word of length n
- Advantage:
  - Low cost
  - Reflects the structure of the grammar
- Drawback: character-specific comparators
  - Not easy to build
  - Redundancy in the comparators
[Figure: one-hot chain of flip-flops for the characters c, o, n, t, e]
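The one-hot chain can be simulated clock by clock, as in the sketch below. It assumes that the first flip-flop sees a constant 1 in place of a predecessor output so that a match may start at any character; the slides do not spell this out, and the name `one_hot_match` is illustrative.

```python
def one_hot_match(text, word):
    """Simulate a one-hot FSM: one flip-flop per character of the word."""
    ff = [False] * len(word)
    hits = []
    for pos, ch in enumerate(text):
        compare = [ch == c for c in word]  # character-specific comparators
        # D-input of FF i = output of FF i-1 AND comparator i.
        # FF 0 is assumed to be fed with a constant 1 instead of a predecessor.
        ff = [(ff[i - 1] if i > 0 else True) and compare[i] for i in range(len(word))]
        if ff[-1]:                         # last flip-flop set: the word has been seen
            hits.append(pos)
    return hits


if __name__ == "__main__":
    print(one_hot_match("a contest of content", "conte"))
```

Because several flip-flops can be set at once, overlapping occurrences are found without any fallback logic.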

Searching – pattern matching – FSM-based
- FSM-based pattern matcher: exploiting common prefixes
  - For words with a common prefix, only one common starting path, whose length equals that of the common prefix, is used
  - Redundancy of comparators can be avoided by implementing only one comparator for each character; the result of the comparison is then provided to all gates that use it
[Figure: words with a common prefix and the corresponding FSM]

Searching – pattern matching – FSM-based
- FSM-based pattern matcher: optimized architecture
  - Implement the common prefix
  - Redundancy of comparators is removed: each character in the set is implemented in a position vector, where pos(i) = 1 iff character i is detected
[Figure: block diagram of the optimal pattern matcher]
[Figure: detailed structure of the optimal pattern matcher]
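The two optimizations (shared prefix paths and one comparator per distinct character) can be combined in a small simulation. This is only a sketch under assumptions: the prefix tree is built with plain dictionaries, flip-flop 0 is treated as an always-set start state, and the name `optimized_matcher` is hypothetical.

```python
def optimized_matcher(text, words):
    """One-hot matcher with shared prefixes and one comparator per distinct character."""
    alphabet = sorted(set("".join(words)))   # one comparator per distinct character
    edges = {}                               # (source FF, character) -> target FF
    final = {}                               # final FF -> recognized word
    count = 1                                # FF 0 is the start state
    for word in words:
        src = 0
        for ch in word:
            if (src, ch) not in edges:       # a common prefix reuses existing flip-flops
                edges[(src, ch)] = count
                count += 1
            src = edges[(src, ch)]
        final[src] = word

    ff = [False] * count
    hits = []
    for p, ch in enumerate(text):
        pos = {c: ch == c for c in alphabet}  # position vector: pos[c] is 1 iff c is detected
        ff[0] = True                          # assumed: a match may start at every character
        nxt = [False] * count
        for (src, c), dst in edges.items():
            nxt[dst] = ff[src] and pos[c]     # AND gate in front of each flip-flop
        ff = nxt
        hits.extend((p, final[s]) for s in sorted(final) if ff[s])
    return hits


if __name__ == "__main__":
    print(optimized_matcher("contact conte", ["conte", "conta", "con"]))
```

For the three words in the usage example, the tree needs 6 flip-flops besides the start state instead of 13 (the sum of the word lengths), and 6 comparators, one per distinct character.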

Searching – pattern matching – use of reconfiguration
- FSM-based pattern matcher: use of reconfiguration
  - Replace the character comparators
  - Replace the FSM for a set of words
[Figure: reconfiguration installs a new character comparator and a new set of words]

Signal processing – distributed arithmetic – Motivation
- Signal processing applications (FFT, convolution, filter algorithms) are characterized by MAC-intensive (multiply-accumulate) computations
- Signal processing functions are usually implemented on:
  - Special processors
  - DSPs
  - ASICs
- FPGAs provide the advantage of reconfigurability, but MAC-intensive applications are expensive on them
- However, for MAC computations involving one constant vector, FPGAs present one of the best alternatives to DSPs

Signal processing – distributed arithmetic – Basics
Solution of the following equation, where A is a constant row vector and X is a column vector:
$$Z = A \cdot X = \sum_{i=1}^{n} A_i X_i$$
With the binary representation of each $X_i$:
$$X_i = \sum_{j=0}^{W} X_{ij}\, 2^j$$
the product becomes
$$Z = A \cdot X = \sum_{i=1}^{n} A_i \Bigl( \sum_{j=0}^{W} X_{ij}\, 2^j \Bigr) = \sum_{j=0}^{W} 2^j \Bigl( \sum_{i=1}^{n} A_i X_{ij} \Bigr)$$
The form $Z = \sum_{j} 2^j \bigl( \sum_{i} A_i X_{ij} \bigr)$ is the classical form of distributed arithmetic.
Because the $A_i$ are constant, there exist $2^n$ possible values for $\sum_{i} A_i X_{ij}$.
We can pre-compute the possible values, store them in a LUT (DALUT) and retrieve them on demand at run time.
FPGA advantage: computation is memory-based (use of LUTs).
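As a small sketch of the idea, the DALUT can be pre-computed once for a constant vector and then used to evaluate the dot product with table reads, shifts and additions only. The sketch assumes unsigned inputs of `width` bits and 0-based indices; the names `build_dalut` and `da_dot` are illustrative.

```python
def build_dalut(a):
    """Entry at address addr holds the sum of a[i] over all set bits i of addr."""
    n = len(a)
    return [sum(a[i] for i in range(n) if (addr >> i) & 1) for addr in range(2 ** n)]


def da_dot(a, x, width):
    """Compute sum(a[i] * x[i]) for unsigned x[i] < 2**width without multiplying."""
    lut = build_dalut(a)
    z = 0
    for j in range(width):
        # Bit j of every x[i] forms the LUT address (bit i of the address).
        addr = 0
        for i, xi in enumerate(x):
            addr |= ((xi >> j) & 1) << i
        z += lut[addr] << j   # weight the partial sum with 2**j
    return z


if __name__ == "__main__":
    a, x = [3, -1, 4, 2], [5, 7, 0, 9]
    print(da_dot(a, x, width=4), sum(ai * xi for ai, xi in zip(a, x)))
```

The table has 2^n entries, so this approach pays off for short constant vectors or when the vector is split across several smaller tables.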

Signal processing – distributed arithmetic – Basics
To better understand, we expand the DA equation:
$$\begin{aligned}
Z ={} & \bigl[ X_{10} A_1 + X_{20} A_2 + \dots + X_{(n-1)0} A_{n-1} + X_{n0} A_n \bigr]\, 2^0 \\
 {}+{} & \bigl[ X_{11} A_1 + X_{21} A_2 + \dots + X_{(n-1)1} A_{n-1} + X_{n1} A_n \bigr]\, 2^1 \\
 & \;\vdots \\
 {}+{} & \bigl[ X_{1W} A_1 + X_{2W} A_2 + \dots + X_{(n-1)W} A_{n-1} + X_{nW} A_n \bigr]\, 2^W
\end{aligned}$$
The bits of the variables are used to address the memory and retrieve the required values in a bit-serial way.
The DA datapath implementation is straightforward.

Signal processing – distributed arithmetic – Datapath
[Figure: bit-serial DA datapath. Each input word $X_i$ is held in a register with bits $X_{iW}, X_{i(W-1)}, \dots, X_{i1}, X_{i0}$; the n registers are read in parallel, bit-serially, to form the DA-LUT address. The DA-LUT output passes through a j-shift and a +/- stage to produce Z.]
DA-LUT contents in order of increasing address: $0,\ A_1,\ A_2,\ A_1 + A_2,\ A_3,\ A_3 + A_1,\ A_3 + A_2,\ A_3 + A_2 + A_1,\ A_4,\ \dots$
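A cycle-by-cycle model of the bit-serial datapath might look as follows. It rests on two assumptions that the slide does not state: the inputs are two's-complement numbers of `width` bits, and the +/- stage subtracts the partial sum that belongs to the sign bit. The function name `da_bit_serial` is hypothetical.

```python
def da_bit_serial(a, x, width):
    """Bit-serial distributed arithmetic for signed x[i] in two's complement."""
    n = len(a)
    lut = [sum(a[i] for i in range(n) if (addr >> i) & 1) for addr in range(1 << n)]
    mask = (1 << width) - 1
    regs = [xi & mask for xi in x]   # shift registers holding the bit patterns
    acc = 0
    for j in range(width):           # one clock cycle per bit position, LSB first
        addr = 0
        for i in range(n):
            addr |= (regs[i] & 1) << i   # current LSB of each register forms the address
            regs[i] >>= 1                # shift the register for the next cycle
        term = lut[addr] << j            # j-shift of the LUT output
        if j == width - 1:
            acc -= term                  # sign bit carries weight -2**(width-1)
        else:
            acc += term
    return acc


if __name__ == "__main__":
    a, x = [3, -1, 4], [-2, 5, -8]
    print(da_bit_serial(a, x, width=4), sum(ai * xi for ai, xi in zip(a, x)))
```

The loop runs once per input bit regardless of the number of vector elements, which is the trade-off of the bit-serial form: latency grows with the word width while the LUT grows with the vector length.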

Signal processing – distributed arithmetic – Datapath
[Figure: k-parallel DA datapath. The input words $X_1, \dots, X_n$ (bits $X_{iW}, X_{i(W-1)}, \dots, X_{i1}, X_{i0}$) feed k DA-LUTs (DA-LUT 1 to DA-LUT k), each with its own accumulator (ACC 1 to ACC k); an adder tree combines the accumulator outputs into Z.]

Signal processing – distributed arithmetic – Example
Recursive convolution for the time-domain simulation of optical multimode intra-system interconnects.
Recursive formula to be implemented on 3 intervals:
$$y(t_n) = f_0 \cdot y(t_{n-1}) + f_4 \cdot x_0 - f_5 \cdot x_1 + f_{24} \cdot x_2 + f_{53} \cdot x_3$$
[Figure: comparison of different implementations]
[Figure: Virtex 2000E implementation on the Celoxica RC1000-PP board]
