
Table of Contents Chapter 1 Introduction Chapter 2 First - PowerPoint PPT Presentation



  1. Design and Implementation of an FPGA-Based Scalable Pipelined Associative SIMD Processor Array with Specialized Variations for Sequence Comparison and MSIMD Operation
Hong Wang, Department of Computer Science, Kent State University
Dissertation Defense, Nov 3rd, 2006

Table of Contents
• Chapter 1 – Introduction
• Chapter 2 – First Prototypes of an Associative Computing (ASC) Processor
• Chapter 3 – A Scalable Pipelined ASC Processor With Reconfigurable PE Interconnection Network
• Chapter 4 – A Specialized ASC Processor with Reconfigurable 2D Mesh for Solving the Longest Common Subsequence (LCS) Problem
• Chapter 5 – An ASC Processor to Support Multiple Instruction Stream Associative Computing (MASC)
• Chapter 6 – Conclusions and Future Work

Associative Computing
• Associative computing is particularly well suited to processing records of data in a tabular format.
• As illustrated, each Processing Element (PE) of the SIMD associative computing array can store one record of this tabular data in its memory.

Implementing Associative Computing in the ASC Processor
• Associative search: the Control Unit broadcasts the search key to all PEs, which compare it with their local memory. PEs whose search succeeds are designated responders; each sets its Responder bit and the top of its Mask Stack to ‘1’.
• Processing the responders sequentially: the STEP instruction uses the Responder Resolution Unit and the Mask Stack to process the responding PEs one by one.
• Searching for the maximum/minimum value in a field uses the Falkoff Algorithm, which processes bit slices from left to right (most significant bit first).

Student records, one per PE. The Search, STEP1 and STEP2 columns give each PE's Mask and Responder (RSPD) bits after that operation:

PE    Student Name    ID   Grade   Search      STEP1       STEP2
                                   Mask RSPD   Mask RSPD   Mask RSPD
PE0   John Smith      07   66      0    0      0    0      0    0
PE1   Gary Heath      05   95      1    1      1    0      0    0
PE2   Peter Smith     11   87      0    0      0    0      0    0
PE3   John Smith      04   78      0    0      0    0      0    0
PE4   Tarry Stanley   02   100     1    1      0    1      1    0
PE5   Will Hanson     01   84      0    0      0    0      0    0
PE6   Jane Antony     06   64      0    0      0    0      0    0
PE7   Mark Bloggs     13   88      0    0      0    0      0    0
PE8   Gill Pister     09   75      0    0      0    0      0    0
PE9   Min Lee         10   83      0    0      0    0      0    0
PE10  Goby Carmen     03   83      0    0      0    0      0    0
PE11  Gillian Roger   08   26      0    0      0    0      0    0
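The associative search, STEP and Falkoff maximum search described above can be simulated sequentially. The Python sketch below is a minimal model, not the processor's actual implementation: the `PE` class, the `search`, `step` and `falkoff_max` helpers, and the search key `grade >= 90` are all hypothetical. The key was chosen only because it yields the same two responders (PE1 and PE4) as the table. STEP is modelled as a lowest-index-first resolver that overwrites the top of the mask stack, which reproduces the STEP1 and STEP2 columns.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """One processing element holding one record (hypothetical layout)."""
    name: str
    sid: int
    grade: int
    mask_stack: list = field(default_factory=lambda: [1])  # top entry = mask bit
    rspd: int = 0  # responder bit


def search(pes, predicate):
    """Associative search: every enabled PE compares its record with the
    broadcast key. Responders set RSPD and push 1; all others push 0."""
    for pe in pes:  # a single parallel step in hardware
        hit = int(pe.mask_stack[-1] == 1 and predicate(pe))
        pe.rspd = hit
        pe.mask_stack.append(hit)


def step(pes):
    """STEP: select the first remaining responder (assumed lowest index),
    make it the only PE whose mask is 1, and clear its RSPD bit.
    Returns the selected PE index, or None if no responders remain."""
    chosen = next((i for i, pe in enumerate(pes) if pe.rspd), None)
    for i, pe in enumerate(pes):
        pe.mask_stack[-1] = int(i == chosen)
    if chosen is not None:
        pes[chosen].rspd = 0
    return chosen


def falkoff_max(values, active, width=8):
    """Falkoff-style maximum search over bit slices, most significant first.
    Returns the indices of the active PEs that hold the maximum value."""
    cand = [i for i, a in enumerate(active) if a]
    for b in range(width - 1, -1, -1):
        ones = [i for i in cand if (values[i] >> b) & 1]
        if ones:  # some candidate has a 1 in this slice: drop those with a 0
            cand = ones
    return cand


RECORDS = [("John Smith", 7, 66), ("Gary Heath", 5, 95), ("Peter Smith", 11, 87),
           ("John Smith", 4, 78), ("Tarry Stanley", 2, 100), ("Will Hanson", 1, 84),
           ("Jane Antony", 6, 64), ("Mark Bloggs", 13, 88), ("Gill Pister", 9, 75),
           ("Min Lee", 10, 83), ("Goby Carmen", 3, 83), ("Gillian Roger", 8, 26)]

if __name__ == "__main__":
    pes = [PE(*r) for r in RECORDS]
    search(pes, lambda pe: pe.grade >= 90)
    while (i := step(pes)) is not None:
        print("processing", pes[i].name)  # Gary Heath, then Tarry Stanley
    print("max grade at PE", falkoff_max([pe.grade for pe in pes], [1] * len(pes)))
```

The minimum is found symmetrically by keeping, in each bit slice, the candidates that hold a 0 whenever at least one candidate does.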

  2. Database Processing
• In the following slides I present some applications of our processor.
• Relational Database Processing: O(|B|)
• Intersection, Union, Cartesian Product and Join are basic operations in database processing. Using associative Search and STEP operations, we can achieve much faster processing times.
[Figure: Intersection and Union of Relation A and Relation B (fields Student ID and Class), with one tuple stored per PE (PE7 to PE16), processed in Steps 1 to 4.]

Image Processing (Edge Detection Using Convolution)
[Figure: 6x6 binary Input Image, 3x3 Weight mask whose rows are each -1 0 1, and the resulting 6x6 binary Output Image.]
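One way to read the O(|B|) figure for the relational operations above is that each tuple of B is broadcast once and matched against all tuples of A in a single associative search. The Python sketch below simulates that scheme sequentially; the inner loop stands in for one parallel compare across the PEs holding A. The helpers `intersection` and `union` and the sample relations are hypothetical and are not taken from the slide.

```python
def intersection(a_rows, b_rows):
    """Tuples of A that also appear in B, using |B| associative searches."""
    flag = [0] * len(a_rows)  # per-PE flag field for A's tuples
    for t in b_rows:  # sequential: broadcast one tuple of B per step
        for i, row in enumerate(a_rows):  # one parallel compare in hardware
            if row == t:
                flag[i] = 1
    return [row for row, f in zip(a_rows, flag) if f]


def union(a_rows, b_rows):
    """A plus every tuple of B whose search produces no responder."""
    result = list(a_rows)
    for t in b_rows:  # sequential over B
        if not any(row == t for row in result):  # associative search: no responders
            result.append(t)  # assumption: t is written into a free PE
    return result


A = [(4, 239), (11, 111), (7, 239)]  # (Student ID, Class), hypothetical data
B = [(7, 239), (5, 124), (4, 111)]
print(intersection(A, B))  # [(7, 239)]
print(union(A, B))         # [(4, 239), (11, 111), (7, 239), (5, 124), (4, 111)]
```

Each loop over B costs one broadcast per tuple, which is where the |B| factor comes from in this model; the cost of the comparison itself does not grow with |A| as long as A fits in the PE array.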
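The edge-detection example applies a 3x3 weight mask whose rows are each -1 0 1. In an ASC array each PE would hold one pixel and gather its neighbors over the 2-D network; the Python sketch below computes the same weighted sums sequentially. The input (a 4x4 block of 1s), leaving border pixels at 0, and marking any nonzero response as an edge are assumptions, not details confirmed by the slide.

```python
# Weight mask from the slide: each of the three rows is -1 0 1
KERNEL = ((-1, 0, 1),) * 3


def edge_detect(img):
    """Apply KERNEL around every interior pixel; border pixels stay 0 (assumption)."""
    n, m = len(img), len(img[0])
    out = [[0] * m for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, m - 1):  # in an ASC array, all PEs do this at once
            acc = sum(KERNEL[di + 1][dj + 1] * img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = int(acc != 0)  # assumption: any nonzero response is an edge
    return out


# Hypothetical 6x6 input: a 4x4 block of 1s
image = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)] for r in range(6)]
for row in edge_detect(image):
    print(*row)
```

On this input, columns 1 and 4 are marked in rows 1 to 4, i.e. the left and right edges of the block. Because every row of the mask is -1 0 1, the mask responds only to vertical edges; the top and bottom edges of the block produce a zero response.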
String Matching
[Figure: Associative string matching of the pattern "AB" against the text "@ABAA". Each APE holds one text character in field text$ together with fields counter$ and match$; the Associative Control Unit (CU) holds patt_string = "AB", patt_length = 2, and patt_counter. Successive panels show the responder flags (R) and the counter$ updates as patt_counter advances.]
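A common associative approach to string matching keeps one text character per PE and refines a per-PE match flag with one broadcast per pattern character, so the number of broadcast steps grows with the pattern length rather than with the text length. The Python sketch below illustrates that generic scheme on the slide's text "@ABAA" and pattern "AB". It does not reproduce the exact counter$/match$ bookkeeping shown in the figure, and `assoc_match` is a hypothetical helper. Positions are 0-based, whereas the slide numbers the APEs from 1.

```python
def assoc_match(text, patt):
    """Return the 0-based start positions where patt occurs in text."""
    n, m = len(text), len(patt)
    # Per-PE match flag: PE i is a candidate start if the pattern fits
    match = [1 if i + m <= n else 0 for i in range(n)]
    for k, ch in enumerate(patt):  # one broadcast per pattern character
        for i in range(n):  # all PEs compare in parallel in hardware
            if match[i] and text[i + k] != ch:  # PE i reads its neighbor k places away
                match[i] = 0
    return [i for i, f in enumerate(match) if f]


print(assoc_match("@ABAA", "AB"))  # [1], i.e. APE 2 in the slide's numbering
```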

  3. Scalable ASC (Associative Computing) Processor
[Figure: Block diagram showing the Control Unit (with Instruction and Signal registers and a Data Bus from the Control Unit), the Responder Resolution Unit, the Common Registers, the Network (NWIN and NWOUT registers with a Control register), and the PE Array of PE0 to PE(n-1), each with its own PE and Memory.]

Implementing 1-D and 2-D PE Interconnection Network
• The network is implemented as a large 8xN-bit-wide NWIN register (where N is the number of PEs), an 8xN-bit NWOUT register, and supporting circuitry.
• Data enters the network through the NWIN register, which stores the data for PE j in bits 8j to 8j+7; that data is then routed to the proper place in the NWOUT register.

Implementing 1-D and 2-D PE Interconnection Network
• This version of the ASC processor supports both a 1-D and a 2-D PE interconnection network for those applications that require a network.
[Figure: 1-D and 2-D PE interconnection diagrams.]
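The NWIN/NWOUT layout can be modelled with ordinary integer bit operations: PE j's byte sits at bits 8j to 8j+7, and a network transfer maps source byte lanes to destination byte lanes. In this Python sketch the helpers `pack`, `unpack` and `route`, the mesh row width `W`, and the rule that PEs at the boundary simply receive 0 (no wraparound) are assumptions for illustration.

```python
def pack(values):
    """Build an NWIN-style register: PE j's 8-bit value occupies bits 8j..8j+7."""
    reg = 0
    for j, v in enumerate(values):
        reg |= (v & 0xFF) << (8 * j)
    return reg


def unpack(reg, n):
    """Read back the byte belonging to each of the n PEs."""
    return [(reg >> (8 * j)) & 0xFF for j in range(n)]


def route(nwin, n, dest):
    """Copy byte lane j of NWIN to lane dest(j) of NWOUT.
    Lanes mapped outside 0..n-1 are dropped, so their receivers get 0."""
    nwout = 0
    for j in range(n):
        d = dest(j)
        if 0 <= d < n:
            nwout |= ((nwin >> (8 * j)) & 0xFF) << (8 * d)
    return nwout


N, W = 4, 2  # 4 PEs; viewed as a 2-D mesh they form 2 rows of width W
nwin = pack([1, 2, 3, 4])
print(hex(nwin))                                   # 0x4030201
print(unpack(route(nwin, N, lambda j: j + 1), N))  # 1-D shift: [0, 1, 2, 3]
print(unpack(route(nwin, N, lambda j: j + W), N))  # 2-D, down one row: [0, 0, 1, 2]
east = lambda j: j + 1 if j % W < W - 1 else -1    # 2-D, right within a row
print(unpack(route(nwin, N, east), N))             # [0, 1, 0, 3]
```

The last two calls show why a 2-D mesh needs its own routing rule: a plain 1-D shift would carry PE1's byte into the next row, while the row-aware mapping drops it at the row boundary.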

  4. Pipelined ASC Processor with Reconfigurable Interconnection Network
• I have implemented a scalable pipelined SIMD Associative (ASC) Processor using Altera FPGAs.
• Field Programmable Gate Arrays (FPGAs) are typically used for designs and can be thought of as programmable hardware.
• Five single-clock-cycle pipeline stages are split between the Control Unit (CU) and the PEs:
  - In the Control Unit: Instruction Fetch (IF) and part of Instruction Decode (ID)
  - In the Scalar PE (SPE) and in each Parallel PE (PPE): the rest of Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Data Write Back (WB)

ASC Processor’s Pipelined Architecture
[Figure: Pipelined datapath. Control Unit (CU): Instruction Memory, IF/ID Latch, Decoder, Immediate Data, Register File and Broadcast Register. Sequential PE (SPE) and Parallel PE (PPE) Array: ID/EX, EX/MEM and MEM/WB Latches and Data Memory.]

Processing Element (PE)
[Figure: PE datapath with Mask, Comparator, Register File, MUX, Data Switch, Data Memory, and the ID/EX, EX/MEM and MEM/WB Latches.]
• The Comparator implements associative search: it pushes ‘1’ onto the top of the mask stack for responders and ‘0’ otherwise.
• A ‘0’ on top of the mask stack disables the PE's ID/EX Latch.

Pipelined ASC Processor’s Performance
• Our pipelined ASC Processor has been implemented on an Altera APEX20KC1000 FPGA with 70 8-bit PEs.
• Other 8-bit processor cores implemented on this FPGA and speed grade have clock speeds ranging from 30 to 106 MHz, typically 60-68 MHz.
• Our pipelined ASC Processor has a clock speed of 56.4 MHz, comparable with these other processors.
• With the 5-stage pipeline, our ASC Processor can approach a peak performance of 300 MHz.
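The five-stage split can be visualized with a standard pipeline timing chart. The Python sketch below assumes an ideal pipeline with no stalls or hazards (an assumption; the slides do not discuss hazards), so n instructions finish in 5 + n - 1 cycles. The hypothetical `STAGE_LOCATION` table records where each stage runs according to the slide, with ID split between the CU and the PEs.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
# Where each stage runs, per the slide (ID is split between the CU and the PEs)
STAGE_LOCATION = {"IF": "CU", "ID": "CU + PEs", "EX": "PEs", "MEM": "PEs", "WB": "PEs"}


def total_cycles(n_instr, n_stages=len(STAGES)):
    """Cycles needed to complete n_instr instructions in an ideal, stall-free pipeline."""
    return n_stages + n_instr - 1 if n_instr > 0 else 0


def timing_chart(n_instr):
    """One row per clock cycle; entry i is the stage of instruction i ('' if idle)."""
    chart = []
    for cycle in range(total_cycles(n_instr)):
        row = []
        for i in range(n_instr):
            s = cycle - i  # instruction i enters IF at cycle i
            row.append(STAGES[s] if 0 <= s < len(STAGES) else "")
        chart.append(row)
    return chart


print(" ".join(f"{s}@{STAGE_LOCATION[s]}" for s in STAGES))
for cycle, row in enumerate(timing_chart(3)):
    print(cycle, " ".join(f"{stage:>3}" for stage in row))
```

Once the pipeline is full, one instruction completes per clock cycle, which is the throughput argument behind pipelining the CU and PE stages.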
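The mask-stack behavior of a single PE can be sketched as follows: the comparator pushes a bit, and a 0 on top of the stack keeps broadcast instructions from reaching the execute stage, modelled here by simply skipping the update. The `MaskedPE` class, its `acc` register, and the rule that an already disabled PE always pushes 0 are hypothetical. Popping the stack restores the previous enable state, which is one reason to use a stack rather than a single mask bit: searches can nest.

```python
class MaskedPE:
    """Hypothetical model of one PE's comparator and mask stack."""

    def __init__(self, value):
        self.value = value
        self.mask = [1]  # assumption: the stack starts with an enabled entry
        self.acc = 0     # register written by broadcast instructions

    def compare_push(self, key):
        # Comparator: push 1 for a responder, 0 otherwise (disabled PEs push 0)
        self.mask.append(int(self.mask[-1] == 1 and self.value == key))

    def pop(self):
        self.mask.pop()

    def execute(self, op):
        # A 0 on top of the mask stack disables the ID/EX latch, so the
        # broadcast instruction never executes in this PE
        if self.mask[-1]:
            self.acc = op(self.acc)


pes = [MaskedPE(v) for v in (3, 5, 3)]
for pe in pes:
    pe.compare_push(3)            # associative search for the value 3
for pe in pes:
    pe.execute(lambda a: a + 1)   # only the responders execute
for pe in pes:
    pe.pop()                      # restore the previous mask
for pe in pes:
    pe.execute(lambda a: a + 10)  # all PEs are enabled again
print([pe.acc for pe in pes])     # [11, 10, 11]
```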
