Reconfigurable and Adaptive Systems (RAS)

Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel. Lecture in summer semester 2013: Reconfigurable and Adaptive Systems (RAS).


  1. Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel
     Lecture in summer semester 2013: Reconfigurable and Adaptive Systems (RAS)
     7. Adaptive Reconfigurable Processors
     Lars Bauer, Jörg Henkel

     RAS Topic Overview
     1. Introduction
     2. Overview
     3. Special Instructions
     4. Fine-Grained Reconfigurable Processors
     5. Configuration Prefetching
     6. Coarse-Grained Reconfigurable Processors
     7. Adaptive Reconfigurable Processors
        • RISPP
        • WARP
        • Dynamic Instruction Merging (DIM)
        • Further relevant architectures / domains
     8. Fault-tolerance by Reconfiguration

     L. Bauer, CES, KIT, 2013

  2. Overview RISPP
     • Developed at CES, KIT
     • Tightly-coupled fine-grained reconfigurable fabric
     • Introduces and implements modular SIs
       ◦ Provide different performance/area trade-offs at run-time
     • Realizes high run-time adaptivity, i.e. a run-time system decides which reconfigurations shall be performed and when they shall be performed

     Recall
     • Some parts were already introduced as case-study in previous lectures
     • Instruction Format (up to 4 read and 2 write registers, immediate values, 10-bit virtual opcode)
     • Using the core ISA (cISA) to implement SIs when their reconfiguration is not completed yet (trap handler)
     • Special Instructions have access to main memory and to a fast on-chip scratch-pad memory
       ◦ Using two independent 128-bit ports
       ◦ Pipeline stalls when SI executes in hardware
     • Dynamic Prefetching (called 'Forecasting') using weighted error-back propagation

     RISPP HW Architecture Overview
     [Figure: block diagram. Recoverable labels: Core Pipeline, Data Cache, Memory Arbiter, On-chip Memory System, Off-Chip Memory, Reconfig. Containers, Interconnect, Load/Store Units, ICAP, VGA, 32-bit and 128-bit buses. Legend: added parts.]

     Analysis of Special Instruction Execution
     • Partition the reconfigurable fabric into so-called SI Containers
       ◦ aka 'Reconfigurable Functional Unit'
     • An SI may be loaded into any free Container
     • Problems:
       ◦ Relatively long reconfiguration time
       ◦ Limited Resource Sharing
       ◦ Fragmentation (not the entire available space may be usable)
     [Figure: Core Pipeline (scaled down; IF, ID, EXE, MEM, WB) next to the reconfigurable area made of Special Instruction Containers (SIC) with Interconnect. Corresponds to OneChip, Chimaera, Proteus, …]
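The SI Container problems listed above (limited sharing, fragmentation) can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `SIContainerFabric`, the area units, and the SI sizes are all hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class SIContainerFabric:
    """Toy model: a fabric split into equally sized SI Containers.

    All sizes are hypothetical area units chosen for illustration.
    """
    num_containers: int
    container_area: int
    # Maps container index -> (SI name, area the SI actually needs).
    loaded: dict = field(default_factory=dict)

    def load(self, si_name: str, si_area: int) -> int:
        """Load a whole SI into the first free Container and return its index."""
        if si_area > self.container_area:
            raise ValueError(f"{si_name} does not fit into one Container")
        for idx in range(self.num_containers):
            if idx not in self.loaded:
                self.loaded[idx] = (si_name, si_area)
                return idx
        raise RuntimeError("no free SI Container")

    def wasted_area(self) -> int:
        """Area inside occupied Containers that the loaded SIs leave unused."""
        return sum(self.container_area - area for _, area in self.loaded.values())


fabric = SIContainerFabric(num_containers=3, container_area=100)
fabric.load("SI_A", 60)   # leaves 40 units of its Container unused
fabric.load("SI_B", 80)   # leaves 20 units of its Container unused
print(fabric.wasted_area())  # 60
```

Because each SI occupies a whole Container, the unused area inside a Container cannot be given to another SI, and two SIs with a common data path still need one Container each.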

  3. Analysis of Special Instruction Execution (cont'd)
     [Figure: '#Accumulated SI Executions (in thousands)' over 'Execution Time [Million cycles]' (0 to 2.0). Curves: No cISA exec.; With cISA exec.; With cISA exec. & smaller SIs; With cISA exec. & upgrades (RISPP's modular SIs). Annotation: All 31,977 SI executions completed. src: [BSH08a]]

     Fundamental Processor Extension: Atom / Molecule Model
     • Definition Atom:
       ◦ A computational data path
       ◦ Smallest block that can be reconfigured ('atomic' in that sense)
     • Example: Transform Atom
     [Figure: Transform Atom data path with inputs X00, X10, X20, X30, outputs Y00, Y10, Y20, Y30 and control inputs DCT and HT, built from adders, subtractors and shifts by 1.]

     Fundamental Processor Extension: Atom / Molecule Model
     • Definition Special Instruction:
       ◦ An assembly instruction
       ◦ Dataflow graph of Atoms
     • Example: Sum of Absolute Transformed Differences (SATD)
     [Figure: SATD dataflow graph from INPUT to OUTPUT using the Atom types QSub, Repack, Transform and SAV (Sum of Absolute Values), with Transform control settings DCT=0 HT=0 and DCT=0 HT=1.]

     Fundamental Processor Extension: Atom / Molecule Model
     • Definition Molecule:
       ◦ Implementation of an SI
       ◦ Using the available (i.e. at that time reconfigured) Atoms
       ◦ Similar to HLS scheduling after allocating a certain number of Atoms
     [Figure: example schedule over cycles 10 to 17 using Repack (2 instances), Transform (2 instances) and SAV (2 instances).]
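The SATD Special Instruction can be sketched in software to show what the chain of Atoms computes. This is a minimal sketch and not the lecture's implementation: it assumes a 4x4 block, assumes that HT denotes a Hadamard transform, omits any normalization factor, and folds the Repack step (assumed here to be data reordering between the two transform passes) into the matrix products. The function names are hypothetical.

```python
# 4x4 Hadamard matrix (assumption: this is what the Transform Atom computes in HT mode).
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul4(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Transpose of a square matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def satd_4x4(current, reference):
    """Sum of Absolute Transformed Differences of two 4x4 blocks (unnormalized)."""
    # QSub step: element-wise difference of the two blocks.
    diff = [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]
    # Transform step: two-dimensional transform H * diff * H^T.
    transformed = matmul4(matmul4(H4, diff), transpose(H4))
    # SAV step: sum of absolute values of all coefficients.
    return sum(abs(v) for row in transformed for v in row)


ones = [[1] * 4 for _ in range(4)]
zeros = [[0] * 4 for _ in range(4)]
print(satd_4x4(ones, zeros))  # 16 (only the DC coefficient is non-zero)
print(satd_4x4(ones, ones))   # 0
```

In hardware, a Molecule with more Transform and SAV instances would process several rows of this computation in parallel, while a smaller Molecule reuses fewer Atoms over more cycles.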

  4. Fundamental Processor Extension: Atom / Molecule Model
     [Figure: three-level hierarchy. SPECIAL INSTRUCTIONS (SIs): SI A, SI B, SI C. MOLECULES (an SI can be implemented by any of its Molecules): A1, A2, A3, A cISA; B1, B2, B cISA; C1, C2, C cISA. ATOMS: Atom 1 to Atom 6 (the numbers on the edges denote the #Atom-instances required for this Molecule).]
     • For each SI there are different implementations (Molecules)
       ◦ There is one Molecule that does not need any Atom (Software Implementation with core-ISA: cISA)
     • Atoms can be shared among different Molecules and SIs
     • Implementation of a particular SI can be gradually upgraded by loading more Atoms

     Difference to SI Containers
     [Figure: Core Pipeline with SI Containers compared to Core Pipeline with Atom Containers.]
     • Multiple SIs may share common Atoms
     • There is no predetermined maximum of supported SIs
     • But: it is not possible/easy to execute two SIs at the same time (as they are no longer independent)
       ◦ Not necessarily a problem, see Molen (single controller unit) and OneChip (memory coherency problems)
     • SIs can be upgraded (step-by-step by loading more Atoms)

     Adaptivity Through Dynamic Performance vs. Area Trade-off
     [Figure: 'SI Molecules: Performance vs. Reconfigurable Resources'. Axes: Execution Time [Cycles] and Area requirements [# loaded Atoms] over Hardware Resources [Atom Containers] (0 to 13). Series: IPred VDC 16x16 (I-MB), IPred HDC 16x16 (I-MB), MC Hz 4 (P-MB).]

     Summary Modular SIs
     • Concept improves the efficiency and flexibility
       ◦ Atom sharing
       ◦ Reduced fragmentation
       ◦ Reduced reconfiguration overhead (due to SI upgrading)
     • Decision how many Atom Containers shall be spent for which SI can be adapted at run time
     • However, this adaptivity demands a run-time system that determines the decision and that implies overhead (to execute it)
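The gradual SI upgrade described above can be sketched as follows. This is an illustration under assumptions, not RISPP's actual mechanism: the Molecule table, the Atom names used as dictionary keys, and all latencies are hypothetical. The only properties taken from the slides are that one Molecule (cISA) needs no Atoms and that an SI executes with the fastest Molecule whose Atoms are currently loaded.

```python
from collections import Counter

# Hypothetical Molecules of one SI: (Atom instances required, latency in cycles).
# The empty requirement models the cISA Molecule (software execution).
MOLECULES = [
    ({}, 400),
    ({"Transform": 1, "SAV": 1}, 60),
    ({"Transform": 2, "SAV": 2}, 30),
]


def fastest_latency(molecules, loaded):
    """Latency of the fastest Molecule whose Atom demands are all loaded."""
    return min(
        latency
        for required, latency in molecules
        if all(loaded[atom] >= count for atom, count in required.items())
    )


def upgrade_trace(molecules, load_order):
    """SI latency before any Atom is loaded and after each single Atom load."""
    loaded = Counter()  # missing Atom types count as 0 instances
    trace = [fastest_latency(molecules, loaded)]
    for atom in load_order:
        loaded[atom] += 1
        trace.append(fastest_latency(molecules, loaded))
    return trace


print(upgrade_trace(MOLECULES, ["Transform", "SAV", "SAV", "Transform"]))
# [400, 400, 60, 60, 30]
```

The trace shows the step-by-step upgrade: the SI is usable from the start through the cISA Molecule and becomes faster whenever a loaded Atom completes the demands of a larger Molecule.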

  5. Run-time System: Simplified Overview
     [Figure: Core Pipeline with Instruction Memory and Reconfigurable HW (Instruction, Status / Control). Run-time System blocks: Decode, Execution Control, Monitoring, Prediction, Selection, Reconf. Sequence, Replacing. Example timeline: P ME, P EE, P LF, P ME, P EE. P: Prefetching Point, ME: Motion Estimation, EE: Encoding Engine, LF: Loop Filter.]

     Run-time System: Simplified Overview (cont'd)
     • Decode: detects SIs and Forecasts (for prefetching) and sends them to the Execution Control (only SIs) and Monitoring (SIs and Forecasts)
     • Execution Control: executes SIs by determining their fastest currently available Molecule (state is maintained in a look-up table) and triggers the hardware execution (using the Atoms) or the software emulation (using the trap handler)
     • Monitoring: Counts the executions for each SI
     • Prediction: Fine-tunes the Forecasts (recall: dynamic prefetching; see below) and resets the monitoring values

     Run-time System: Simplified Overview (cont'd)
     • Selection: Select Molecules to implement the forecasted SIs
     • Reconfiguration Sequence Scheduling: Determine the reconfiguration sequence of the Atoms that are required to implement the selected Molecules
     • Replacing: Determines which currently configured Atom shall be replaced by a new Atom that is scheduled to be reconfigured

     Formal Atom/Molecule Model
     • Representing the Molecules as a vector of Atoms
       ◦ The example only shows 2 Atom Types (A0 and A1), thus each vector has 2 entries; in general: ℕ^n
     • Basic operators
       ◦ How many Atoms are needed for a Molecule
       ◦ Which Atoms have two Molecules in common
       ◦ Which Atoms are needed to fulfill the demands of two Molecules
     [Figure: Molecules plotted as points over '# Instances of Atom A0' (x-axis) and '# Instances of Atom A1' (y-axis); labelled points include (1, 4), (5, 2), (5, 4) and (1, 2).]
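The three basic operators of the formal model can be sketched on Molecule vectors. This is one plausible reading, not necessarily the lecture's exact definitions: the Atom count is taken as the sum of the entries, the common Atoms as the element-wise minimum, and the joint demand as the element-wise maximum. The function names are hypothetical, and the example vectors are read off the figure residue, so they may not match the original slide exactly.

```python
def atom_count(molecule):
    """How many Atoms are needed for a Molecule (sum of all entries)."""
    return sum(molecule)


def common_atoms(m1, m2):
    """Atoms two Molecules have in common (element-wise minimum)."""
    return tuple(min(a, b) for a, b in zip(m1, m2))


def joint_demand(m1, m2):
    """Atoms needed to fulfill the demands of both Molecules (element-wise maximum)."""
    return tuple(max(a, b) for a, b in zip(m1, m2))


# Each vector holds (# instances of Atom A0, # instances of Atom A1).
o = (1, 4)
p = (5, 2)

print(atom_count(o), atom_count(p))   # 5 7
print(common_atoms(o, p))             # (1, 2)
print(joint_demand(o, p))             # (5, 4)
print(atom_count(joint_demand(o, p))) # 9
```

Because shared Atoms are counted only once, loading both Molecules requires 9 Atoms rather than the 12 obtained by adding their individual counts, which is the Atom-sharing benefit expressed in vector form.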
