
Reconfigurable and Adaptive Systems (RAS) 7. Adaptive Reconfigurable Processors

Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel. Lecture, summer semester (SS) 2014: Reconfigurable and Adaptive Systems (RAS) 7. Adaptive Reconfigurable Processors


  1. Adaptive Reconfigurable Processors (Lars Bauer, Jörg Henkel)

RAS Topic Overview
1. Introduction
2. Overview
3. Special Instructions
4. Fine-Grained Reconfigurable Processors
5. Configuration Prefetching
6. Coarse-Grained Reconfigurable Processors
7. Adaptive Reconfigurable Processors
8. Fault-tolerance by Reconfiguration

Chapter 7 covers:
• RISPP
• WARP
• Dynamic Instruction Merging (DIM)
• Further relevant architectures / domains

7.1 RISPP: Rotating Instruction Set Processing Platform

L. Bauer, CES, KIT, 2014

  2. RISPP Overview
• Developed at CES, KIT
• Tightly coupled, fine-grained reconfigurable fabric
• Introduces and implements modular SIs
• Provides different performance/area trade-offs at run time
• Realizes high run-time adaptivity, i.e. a run-time system decides which reconfigurations shall be performed and when they shall be performed

Recall
• Some parts were already introduced as a case study in previous lectures
• Instruction format: up to 4 read and 2 write registers, immediate values, 10-bit virtual opcode
• The core ISA (cISA) is used to implement SIs whose reconfiguration is not yet completed (trap handler)
• Special Instructions have access to main memory and to a fast on-chip scratch-pad memory
  ◦ Using two independent 128-bit ports
  ◦ The pipeline stalls while an SI executes in hardware
• Dynamic prefetching (called 'Forecasting') using weighted error back-propagation

Analysis of Special Instruction Execution
• Partition the reconfigurable fabric into so-called SI Containers
  ◦ aka 'Reconfigurable Functional Unit'
  ◦ An SI may be loaded into any free Container
  ◦ Corresponds to OneChip, Chimaera, Proteus, …
• Problems:
  ◦ Relatively long reconfiguration time
  ◦ Limited resource sharing
  ◦ Fragmentation (not the entire available space may be usable)
[Figure: scaled-down core pipeline (IF, ID, EXE, MEM, WB) extended by Special Instruction Containers (SICs); each SIC comprises reconfigurable area, load/store units and interconnect]

RISPP HW Architecture Overview
[Figure: core pipeline, on-chip memory, memory arbiter, data cache, off-chip memory, VGA and ICAP, connected via 32-bit and 128-bit buses to several reconfigurable containers with interconnect; the legend marks the added parts and the system bus]
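To make the benefit of the cISA fallback concrete, here is a minimal Python sketch comparing two policies while an SI's reconfiguration is still in progress: stalling until the hardware is ready versus emulating the SI via the trap handler. The `SpecialInstruction` type, the `run` function, all latencies and the completion cycle are hypothetical, and the model ignores everything except SI latency and a fixed gap of ordinary instructions before each SI execution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecialInstruction:
    name: str
    hw_cycles: int    # latency once the SI's hardware is configured (hypothetical)
    cisa_cycles: int  # latency of the trap-handler emulation with the core ISA (hypothetical)
    ready_at: int     # cycle at which its reconfiguration completes (hypothetical)


def run(si: SpecialInstruction, executions: int, gap: int, cisa_fallback: bool) -> int:
    """Return the cycle at which `executions` SI executions have finished.

    `gap` is the number of ordinary (non-SI) cycles before each SI execution.
    """
    now = 0
    for _ in range(executions):
        now += gap
        if now >= si.ready_at:
            now += si.hw_cycles               # hardware ready: execute on the fabric
        elif cisa_fallback:
            now += si.cisa_cycles             # trap handler emulates the SI in software
        else:
            now = si.ready_at + si.hw_cycles  # stall until reconfiguration completes
    return now


satd = SpecialInstruction("SATD", hw_cycles=4, cisa_cycles=50, ready_at=1000)
print(run(satd, executions=10, gap=10, cisa_fallback=False))  # 1130
print(run(satd, executions=10, gap=10, cisa_fallback=True))   # 600
```

In this toy setting the emulated SI is more than ten times slower than the hardware version, yet the application still finishes earlier because it keeps making progress during the reconfiguration instead of waiting for it.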

  3. Analysis of Special Instruction Execution (cont'd)
[Figure: accumulated SI executions (in thousands) over execution time (million cycles) for four variants: no cISA execution; with cISA execution; with cISA execution & smaller SIs; with cISA execution & upgrades (RISPP's modular SIs). Annotation: all 31,977 SI executions completed. Source: [BSH08a]]

Fundamental Processor Extension: Atom / Molecule Model
• Definition Atom:
  ◦ A computational data path
  ◦ Smallest block that can be reconfigured ('atomic' in that sense)
  ◦ Example: Transform Atom
[Figure: Transform Atom with inputs X00, X10, X20, X30, outputs Y00, Y10, Y20, Y30, adders, <<1 and >>1 shifters, and the control inputs DCT and HT]

• Definition Special Instruction:
  ◦ An assembly instruction
  ◦ Dataflow graph of Atoms
• Example: Sum of Absolute Transformed Differences (SATD)
[Figure: SATD dataflow graph built from the Atom types QSub, Repack, Transform (configured as DCT=0/HT=0 and DCT=0/HT=1) and SAV (Sum of Absolute Values); the legend shows 2 instances each of Repack, Transform and SAV]

• Definition Molecule:
  ◦ Implementation of an SI
  ◦ Using the available (i.e. at that time reconfigured) Atoms
  ◦ Similar to HLS scheduling after allocating a certain number of Atoms
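The Transform Atom and the SATD example can be illustrated with a small Python sketch. It assumes that the Atom's DCT control input selects between the H.264 integer DCT core (the `<< 1` terms) and the Hadamard transform, and that HT=1 halves every output (the `>> 1` shifters); the actual wiring of RISPP's Atom may differ. `transform_atom` and `satd_4x4` are hypothetical names, and SATD normalization differs between encoders.

```python
def transform_atom(x, dct, ht):
    """One 4-point butterfly pass (assumed to mirror H.264's 4x4 transforms).

    dct=True selects the integer DCT core (terms scaled by 2 via << 1),
    dct=False the Hadamard transform; ht=True halves every output (>> 1).
    """
    x0, x1, x2, x3 = x
    a, b = x0 + x3, x1 + x2   # butterfly sums
    c, d = x1 - x2, x0 - x3   # butterfly differences
    if dct:
        y = [a + b, (d << 1) + c, a - b, d - (c << 1)]
    else:
        y = [a + b, d + c, a - b, d - c]
    if ht:
        y = [v >> 1 for v in y]  # arithmetic shift, like a hardware >> 1
    return y


def satd_4x4(cur, ref):
    """Hypothetical SATD Molecule: QSub -> Transform -> Repack -> Transform -> SAV."""
    diff = [[c - r for c, r in zip(cr, rr)] for cr, rr in zip(cur, ref)]  # QSub
    rows = [transform_atom(r, dct=False, ht=False) for r in diff]          # 1st pass (HT=0)
    cols = [list(col) for col in zip(*rows)]                               # Repack (transpose)
    coeffs = [transform_atom(c, dct=False, ht=True) for c in cols]         # 2nd pass (HT=1)
    return sum(abs(v) for col in coeffs for v in col)                      # SAV


cur = [[2, 2, 2, 2]] * 4
ref = [[0, 0, 0, 0]] * 4
print(transform_atom([1, 0, 0, 0], dct=True, ht=False))  # [1, 2, 1, 1]
print(satd_4x4(cur, ref))                                # 16
print(satd_4x4(cur, cur))                                # 0
```

The same `transform_atom` serves both passes with different control inputs, which is the point of the Atom concept: one reconfigured data path is reused by several nodes of the SI's dataflow graph.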

  4. Fundamental Processor Extension: Atom / Molecule Model
[Figure: three-level hierarchy. Special Instructions (SIs): SI A, SI B, SI C. Molecules (an SI can be implemented by any of its Molecules): A cISA, A1, A2, A3; B cISA, B1, B2; C cISA, C1, C2. Atoms 1 to 6 (edge labels denote the number of Atom instances required for this Molecule)]
• For each SI there are different implementations (Molecules)
• Atoms can be shared among different Molecules and SIs
• There is one Molecule that does not need any Atom (software implementation with the core ISA: cISA)
• The implementation of a particular SI can be gradually upgraded by loading more Atoms

Difference to SI Containers
[Figure: SI Containers vs. Atom Containers, each attached to the core pipeline]
• Multiple SIs may share common Atoms
• There is no predetermined maximum number of supported SIs
• But: it is not possible (or at least not easy) to execute two SIs at the same time (as they are no longer independent)
  ◦ Not necessarily a problem, see Molen (single controller unit) and OneChip (memory coherency problems)
• SIs can be upgraded (step by step, by loading more Atoms)

Adaptivity Through Dynamic Modular SIs: Performance vs. Area Trade-off
[Figure: "SI Molecules: Performance vs. Reconfigurable Resources". Execution time [cycles] and area requirements [# loaded Atoms] over hardware resources [Atom Containers, 0 to 13] for IPred VDC 16x16 (I-MB), IPred HDC 16x16 (I-MB) and MC Hz 4 (P-MB)]

Summary
• The concept improves efficiency and flexibility:
  ◦ Atom sharing
  ◦ Reduced fragmentation
  ◦ Reduced reconfiguration overhead (due to SI upgrading)
• The decision how many Atom Containers shall be spent on which SI can be adapted at run time
• However, this adaptivity demands a run-time system that makes this decision, which implies overhead (to execute it)
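The performance vs. area trade-off boils down to a simple query: given a number of Atom Containers, which Molecule of an SI is the fastest one that fits? A minimal Python sketch, assuming a single SI, any Atom fitting into any free container, and made-up Molecules and latencies (`Molecule`, `fastest_within` and the numbers are hypothetical):

```python
from typing import NamedTuple, Sequence, Tuple


class Molecule(NamedTuple):
    name: str
    atoms: Tuple[int, ...]  # required instances per Atom type
    cycles: int             # execution latency in cycles (hypothetical)


def fastest_within(molecules: Sequence[Molecule], containers: int) -> Molecule:
    """Fastest Molecule whose Atoms fit into `containers` Atom Containers.

    Assumes any Atom can be loaded into any free container, so only the total
    number of Atoms matters; the cISA Molecule (no Atoms) always fits.
    """
    fitting = [m for m in molecules if sum(m.atoms) <= containers]
    return min(fitting, key=lambda m: m.cycles)


# A hypothetical SI with two Atom types and four Molecules
molecules = [
    Molecule("cISA", (0, 0), 300),
    Molecule("m1", (1, 0), 40),
    Molecule("m2", (1, 1), 22),
    Molecule("m3", (2, 2), 12),
]
for n in range(6):
    best = fastest_within(molecules, n)
    print(n, best.name, best.cycles)
```

With 2 or 3 containers this SI runs equally fast, so a run-time system might rather spend the third container on a different SI; deciding this across all SIs is exactly the adaptivity that requires a run-time system.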

  5. Run-time System: Simplified Overview
[Figure: the Decode stage of the core pipeline passes SIs and Forecast instructions to the run-time system (Execution Control, Monitoring, Prediction, Selection, Reconfiguration Sequence Scheduling, Replacing), which controls the reconfigurable hardware via status/control signals. Below, an execution timeline with Motion Estimation (ME), Encoding Engine (EE) and Loop Filter (LF) phases, preceded by Prefetching Points (P)]

Run-time System: Simplified Overview (cont'd)
• Decode: detects SIs and Forecasts (for prefetching) and sends them to Execution Control (only SIs) and Monitoring (SIs and Forecasts)
• Execution Control: executes SIs by determining their fastest currently available Molecule (state is maintained in a look-up table) and triggers either the hardware execution (using the Atoms) or the software emulation (using the trap handler)
• Monitoring: counts the executions of each SI
• Prediction: fine-tunes the Forecasts (recall: dynamic prefetching; see below) and resets the monitoring values
• Selection: selects Molecules to implement the forecasted SIs
• Reconfiguration Sequence Scheduling: determines the reconfiguration sequence of the Atoms that are required to implement the selected Molecules
• Replacing: determines which currently configured Atom shall be replaced by a new Atom that is scheduled to be reconfigured

Formal Atom/Molecule Model
• Molecules are represented as vectors of Atoms (entry i = number of instances of Atom type i)
• The example only shows 2 Atom types (A0 and A1), so each vector has 2 entries; in general, a Molecule is a vector in ℕ^n
• Basic operators:
  ◦ How many Atoms are needed for a Molecule
  ◦ Which Atoms two Molecules have in common
  ◦ Which Atoms are needed to fulfill the demands of two Molecules
[Figure: Molecules plotted by # instances of Atom A0 (x-axis) and # instances of Atom A1 (y-axis): o = (1, 4) needs 5 Atoms and p = (5, 2) needs 7; their common Atoms are x = (1, 2) (3 Atoms), and y = (5, 4) (9 Atoms) fulfills the demands of both]
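The three basic operators map onto element-wise operations on Atom vectors: the number of Atoms is the sum of the entries, the common Atoms are the element-wise minimum, and the Atoms that fulfill both demands are the element-wise maximum. A minimal Python sketch, assuming Molecules are plain integer tuples; the names `size`, `common` and `joint` are ours, and the example vectors o = (1, 4) and p = (5, 2) appear to be the ones in the lecture's two-Atom-type figure.

```python
from typing import Tuple

Molecule = Tuple[int, ...]  # entry i = instances of Atom type A_i


def _check(m: Molecule, o: Molecule) -> None:
    if len(m) != len(o):
        raise ValueError("Molecules must cover the same Atom types")


def size(m: Molecule) -> int:
    """How many Atoms are needed for a Molecule: the sum of its entries."""
    return sum(m)


def common(m: Molecule, o: Molecule) -> Molecule:
    """Atoms two Molecules have in common: element-wise minimum."""
    _check(m, o)
    return tuple(min(a, b) for a, b in zip(m, o))


def joint(m: Molecule, o: Molecule) -> Molecule:
    """Atoms needed to fulfill the demands of both Molecules: element-wise maximum."""
    _check(m, o)
    return tuple(max(a, b) for a, b in zip(m, o))


o, p = (1, 4), (5, 2)
x, y = common(o, p), joint(o, p)
print(size(o), size(p))  # 5 7
print(x, size(x))        # (1, 2) 3
print(y, size(y))        # (5, 4) 9
```

Note that `size(joint(o, p))` is smaller than `size(o) + size(p)` whenever the two Molecules share Atoms, which is where Atom sharing saves reconfigurable area.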
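The interplay of Monitoring and Prediction in the run-time system (count SI executions, correct the Forecast, reset the counters) can be sketched as follows. This is a swapped-in simplification: it uses plain exponential smoothing with a hypothetical weight `alpha` instead of RISPP's weighted error back-propagation, and `ForecastTuner` is a hypothetical name.

```python
class ForecastTuner:
    """Monitoring + Prediction, using exponential smoothing as a stand-in.

    This is NOT RISPP's weighted error back-propagation; it only shows the
    count-then-correct-then-reset cycle of the run-time system.
    """

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha    # weight of the newest observation (hypothetical)
        self.forecast = {}    # SI -> expected executions until the next Prediction
        self.counts = {}      # SI -> executions since the last Prediction

    def monitor(self, si: str) -> None:
        """Monitoring: count one execution of `si`."""
        self.counts[si] = self.counts.get(si, 0) + 1

    def predict(self, si: str) -> float:
        """Prediction: move the Forecast towards the observed count, then reset it."""
        observed = self.counts.get(si, 0)
        old = self.forecast.get(si, 0.0)
        self.forecast[si] = old + self.alpha * (observed - old)
        self.counts[si] = 0
        return self.forecast[si]


tuner = ForecastTuner(alpha=0.5)
for period in range(3):
    for _ in range(100):
        tuner.monitor("SATD")
    print(tuner.predict("SATD"))  # 50.0, then 75.0, then 87.5
```

A larger `alpha` makes the Forecast react faster to changed application behavior at the cost of following short-term noise more closely.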
