Reconfigurable and Adaptive Systems (RAS) - Lars Bauer, Jörg Henkel

Institut für Technische Informatik, Chair for Embedded Systems - Prof. Dr. J. Henkel. Lecture in summer semester 2014: Reconfigurable and Adaptive Systems (RAS), Lars Bauer, Jörg Henkel.


  1. Reconfigurable and Adaptive Systems (RAS): 5. Configuration Prefetching

  2. RAS Topic Overview
     1. Introduction
     2. Overview
     3. Special Instructions
     4. Fine-Grained Reconfigurable Processors
     5. Configuration Prefetching
        • Motivation and Definition
        • Static Prefetching
        • Clock Frequency Variation
        • Dynamic Prefetching
        • Area Models
     6. Coarse-Grained Reconfigurable Processors
     7. Adaptive Reconfigurable Processors
     8. Fault-tolerance by Reconfiguration
     L. Bauer, CES, KIT, 2014

     5.1 Motivation and Definition

  3. Recall: Performing Run-time Reconfigurations
     • PRISC
       ◦ Reconfiguration is triggered implicitly by SI execution
       ◦ Reconfiguration time: 100-600 cycles
       ◦ Fast reconfiguration time at the cost of very limited SI complexity
     • XiRISC
       ◦ pGA-load: load a configuration into the array
       ◦ pGA-free: remove a configuration
       ◦ 16 cycles to receive a complete configuration if it is available in the 2nd-level configuration cache
       ◦ Approx. ‘128+startup’ cycles (not explicitly stated by the authors) to receive it from external memory: 8 times slower because the 2nd-level cache bus (256 bit) is 8 times wider than the memory bus (32 bit)

     Recall: Performing Run-time Reconfigurations (cont’d)
     • Garp
       ◦ gaconf reg: load (or switch to) the configuration at the address given by reg
       ◦ gasave reg: save all array data state to memory at the address given by reg
       ◦ garestore reg: restore previously saved data state from memory at the address given by reg
       ◦ Approx. 50 μs to reconfigure 32 rows (12 bus cycles per row plus some startup time)
     • MOLEN
       ◦ p-set address: reconfigure those parts that seldom change
       ◦ c-set address: reconfigure those parts not addressed by p-set
       ◦ set-prefetch address: prefetch the microcode that is responsible for a p-set or c-set operation
       ◦ Reconfiguration time between 2 and 12 ms
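The XiRISC figures above can be cross-checked with a little arithmetic. The sketch below assumes one bus word is transferred per cycle and ignores the startup overhead; the configuration size of 4096 bits is derived from the quoted "16 cycles over a 256-bit bus" and is not stated on the slide. The function name `transfer_cycles` is made up for this illustration.

```python
def transfer_cycles(config_bits, bus_width_bits):
    """Cycles to move a configuration over a bus, assuming one bus word per cycle."""
    return -(-config_bits // bus_width_bits)  # ceiling division

CACHE_BUS_BITS = 256   # 2nd-level configuration cache bus (from the slide)
MEMORY_BUS_BITS = 32   # external memory bus (from the slide)

# Size implied by "16 cycles over the 256-bit bus"; derived, not stated on the slide.
config_bits = 16 * CACHE_BUS_BITS

cache_cycles = transfer_cycles(config_bits, CACHE_BUS_BITS)
memory_cycles = transfer_cycles(config_bits, MEMORY_BUS_BITS)

# Prints the two transfer times and their ratio (startup cycles not included).
print(cache_cycles, memory_cycles, memory_cycles // cache_cycles)
```

Under these assumptions the ratio of the two transfer times equals the ratio of the bus widths, which is where the factor of 8 comes from.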

  4. Recall: Performing Run-time Reconfigurations (cont’d)
     • Reconfiguration can last from a few cycles (if available in cache) over microseconds (for limited SI complexity) to milliseconds (powerful FPGA fabrics)
     • If a configuration is not available when the SI shall execute, then the system performance is significantly affected
       ◦ Either stall the execution until the reconfiguration completes
       ◦ Or use the core ISA to implement the SI functionality (trap handler or conditional branch)
     • Solution: if some region of the reconfigurable fabric is currently free (i.e. not occupied by another configuration), then configuration prefetching can be used to perform the reconfiguration before the SI is needed

     Configuration Prefetching
     • Definition: Start loading the configuration data of a particular SI before that SI is actually used
     • Goal: Minimize performance loss due to pending reconfigurations
       ◦ Typically: try to finish the reconfiguration before the SI is executed
       ◦ Note: sometimes it is better to avoid reconfiguration for an SI and to execute it with the core ISA instead (to avoid thrashing)
       ◦ Configuration prefetching can be used to transfer configuration data from external memory to the configuration cache (preparing a reconfiguration) or to perform the reconfiguration right away
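A minimal sketch of why starting early helps, and of the stall-versus-core-ISA choice for a single SI execution. The cost model is an assumption made for illustration: reconfiguration overlaps perfectly with normal execution, and all names (`visible_stall`, `best_option`) and the latencies of 1 and 200 cycles are hypothetical. Only the 600-cycle reconfiguration time is taken from the PRISC range quoted earlier.

```python
def visible_stall(reconf_cycles, prefetch_distance):
    """Stall cycles left over when the prefetch starts `prefetch_distance`
    cycles before the SI is reached (assumes perfect overlap)."""
    return max(0, reconf_cycles - prefetch_distance)

def best_option(reconf_cycles, prefetch_distance, hw_cycles, sw_cycles):
    """Pick the cheaper way to execute the SI once, under the toy cost model."""
    wait_then_hw = visible_stall(reconf_cycles, prefetch_distance) + hw_cycles
    return "wait_for_hardware" if wait_then_hw < sw_cycles else "core_isa"

RECONF = 600   # upper end of the PRISC range from the slides
HW = 1         # hypothetical SI latency in hardware
SW = 200       # hypothetical latency when emulated with the core ISA

print(best_option(RECONF, 0, HW, SW))     # no prefetching
print(best_option(RECONF, 500, HW, SW))   # prefetch started 500 cycles ahead
```

With no prefetching the full reconfiguration time is visible and the core ISA wins; with a 500-cycle head start only 100 stall cycles remain and waiting for the hardware becomes the cheaper option in this model.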

  5. Configuration Prefetching (cont’d)
     • Example control-flow graph
       ◦ Each node is a basic block (BB)
       ◦ Color indicates the execution frequency
       ◦ Edges show the control flow
       ◦ Red edges are function calls (dashed lines) or returns (solid lines)
     [Figure: control-flow graph annotated with ‘prefetch SI’, ‘Time for reconfiguration’, ‘Execute SI’ and ‘Return from subroutine’; the following slide shows the same graph without the annotations]

  6. Relevant Parameters for Prefetching
     • Temporal distance between starting the prefetching operation and the SI execution
       ◦ Depends on control flow
       ◦ Starting too late → SI is demanded before prefetching completes
       ◦ Starting too early → potential conflicts between currently demanded SIs and SIs that shall be prefetched
     • Probability that the SI executions are reached
       ◦ Depends on control flow
       ◦ Typically: the earlier prefetching is started, the higher the uncertainty whether the SI execution is eventually reached
     • Number of SI executions
       ◦ Depends on control flow
       ◦ If the SI is executed rather seldom, then it might be better to execute it using the core ISA rather than speculating on a prefetch operation

     Aborting prefetching operations
     • False Prefetching: due to control-flow uncertainty, it can happen that prefetching for an SI was triggered and, even before it finishes, it becomes clear that the SI is not going to execute at all
     • ‘Still Pending’ False Prefetching:
       ◦ The prefetching was added to a queue and has not started yet
       ◦ Simply remove it from that queue
     • ‘Already Running’ False Prefetching:
       ◦ The prefetching operation is currently running
       ◦ For line-based reconfigurable fabrics (e.g. Garp or XiRISC), finish prefetching the current line and abort afterwards (short delay)
       ◦ For FPGA-based reconfigurable fabrics, aborting may not be possible (unless prefetching to a cache)
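The two abort cases can be sketched with a toy model of a prefetch unit for a line-based fabric. Everything here is hypothetical (the class `PrefetchQueue`, its methods, and the one-line-per-tick timing); it only illustrates the distinction between removing a still-pending request from the queue and letting a running request finish its current line before dropping it.

```python
from collections import deque

class PrefetchQueue:
    """Toy prefetch unit for a line-based fabric (hypothetical API)."""

    def __init__(self):
        self.pending = deque()        # (si, total_lines) requests not started yet
        self.running = None           # [si, lines_left] while loading, else None
        self.abort_requested = False  # set when the running prefetch is a false one
        self.loaded = []              # SIs whose configuration is complete

    def request(self, si, lines):
        self.pending.append((si, lines))

    def cancel(self, si):
        """Handle a false prefetch and report which case applied."""
        for entry in list(self.pending):
            if entry[0] == si:        # 'still pending': just drop the request
                self.pending.remove(entry)
                return "removed_from_queue"
        if self.running is not None and self.running[0] == si:
            self.abort_requested = True   # 'already running': stop after this line
            return "abort_after_current_line"
        return "not_found"

    def tick(self):
        """Load one configuration line (one line per tick is an assumption)."""
        if self.running is None and self.pending:
            si, lines = self.pending.popleft()
            self.running = [si, lines]
        if self.running is None:
            return
        self.running[1] -= 1              # the current line always completes
        if self.running[1] == 0:
            self.loaded.append(self.running[0])
            self.running = None
        elif self.abort_requested:
            self.running = None           # drop the partially loaded configuration
        self.abort_requested = False

q = PrefetchQueue()
q.request("SI3", 3)
q.request("SI4", 2)
q.tick()                                  # SI3 starts loading its first line
print(q.cancel("SI4"), q.cancel("SI3"))
q.tick()                                  # current line finishes, then SI3 is dropped
print(q.running, q.loaded)
```

An FPGA-based fabric that cannot abort would simply ignore the `cancel` call for a running request; that case is left out of the sketch.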

  7. Aborting prefetching operations (cont’d)
     • In nodes I4 and I3 all 4 SIs may be prefetched
     • When the control flow moves to I1, then it is clear that SIs 3 and 4 are not demanded
       ◦ Their prefetching may be stopped (if possible)
     [Figure: SI-centric control-flow graph with nodes I1 to I4. Circle: set of instructions (potentially with embedded control flow, subroutine calls etc.). Rectangle: usage of an SI. src: [LH02]]

     Aborting prefetching operations (cont’d)
     [Figure: only the labels ‘0’ and ‘1’ survive]

  8. Aborting prefetching operations (cont’d)
     [Figures on two slides: only the labels ‘0’ and ‘1’ survive]

  9. 5.2 Static Prefetching

     Static Prefetching
     • Idea: at compile time, analyze the control-flow graph and embed prefetching instructions into the code that statically decide which SIs shall be prefetched
     • Required: the probability with which each branch will be taken
       ◦ From profiling; shown as edge labels (in percent)
     • Next step: at each node, establish a list of all reachable SIs, sorted by the probability to reach them
     src: [LH02]

  10. Static Prefetching (cont’d)
     • Probability of a node n to reach SI s, where Paths(n, s) is the set of paths from node n to SI s, Edges(p) is the set of edges on path p, and P(e) is the probability that edge e is taken:
       $$P(n, s) := \sum_{p \in \mathrm{Paths}(n, s)} \; \prod_{e \in \mathrm{Edges}(p)} P(e)$$
     • Example: 3 paths to reach SI 3 from I10
       ◦ 0.3 * 0.4 * 0.4 +
       ◦ 0.3 * 0.6 * 0.4 +
       ◦ 0.2 * 0.8 * 0.4
       ◦ Probability to reach SI 3 from node I10: 18.4%
     src: [LH02]

     Static Prefetching (cont’d)
     • Algorithm moving backwards through the graph
       ◦ Initialize a queue with all ‘SI-nodes’ (squares, 100% reachability)
       ◦ Remove a node n from the queue and update the probability information of all its predecessors: they can also reach the SIs that can be reached from node n
       ◦ When all successors of a node are processed, then add it to the queue
       ◦ Iterate until the queue is empty
     [Figure: control-flow graph in which each node is labeled with all SI nodes that can be reached, in decreasing probability: P1,4,3,2; P3,4,1,2; P4,3,2; P1,3,2; P1,2; P4,3; P1; P2; P3; P4. src: [LH02]]
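The backward algorithm above can be sketched as follows, assuming an acyclic control-flow graph (the slides do not say how loops are handled). The function name `reach_probabilities` and the graph are hypothetical: the node names A to E and the exit node X are invented so that the three paths from I10 carry exactly the edge probabilities of the example, since the original figure is not available.

```python
from collections import defaultdict, deque

def reach_probabilities(edges, si_nodes):
    """edges: {node: {successor: branch probability}}, si_nodes: {node: SI name}.
    Returns {node: {SI name: probability of reaching it}} for an acyclic graph."""
    preds = defaultdict(list)
    for node, succs in edges.items():
        for succ in succs:
            preds[succ].append(node)
    nodes = set(edges) | set(preds) | set(si_nodes)
    unprocessed = {n: len(edges.get(n, {})) for n in nodes}  # successors still open
    reach = {n: defaultdict(float) for n in nodes}
    for node, si in si_nodes.items():
        reach[node][si] = 1.0                     # an SI node reaches its own SI
    queue = deque(n for n in nodes if unprocessed[n] == 0)   # leaves, incl. SI nodes
    while queue:
        n = queue.popleft()
        for p in preds[n]:
            for si, prob in reach[n].items():
                reach[p][si] += edges[p][n] * prob   # sum over paths of edge products
            unprocessed[p] -= 1
            if unprocessed[p] == 0:               # all successors of p are done
                queue.append(p)
    return {n: dict(r) for n, r in reach.items()}

# Hypothetical graph reproducing the three paths of the example.
edges = {
    "I10": {"A": 0.3, "B": 0.2, "X": 0.5},
    "A": {"C": 0.4, "D": 0.6},
    "B": {"E": 0.8, "X": 0.2},
    "C": {"SI3": 0.4, "X": 0.6},
    "D": {"SI3": 0.4, "X": 0.6},
    "E": {"SI3": 0.4, "X": 0.6},
}
result = reach_probabilities(edges, {"SI3": "SI 3"})
print(round(result["I10"]["SI 3"], 3))  # 0.184
```

Processing a node only after all of its successors are done means every predecessor receives the complete sum over paths in a single pass, which is the role of the queue in the algorithm.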

  11. Static Prefetching (cont’d)
     • Depending on the capacity of the FPGA, limit the prefetches to e.g. the 2 most probable SIs (this affects I7, I8, I9, I10)
     • Some prefetch instructions are redundant (e.g. due to previously executed prefetches: I1, I4)
       ◦ Prefetch at I2 may not be removed because the control flow may come from node I8 (no SI 2 at I8) or from I5 (SI 1 might have started first and needs to be aborted now)
       ◦ Prefetch at I6 may be removed even though I9 prefetches P3 first; here, it is not beneficial to abort P3 to start P4 from scratch
     [Figure: control-flow graph with node labels P1,4,3,2; P3,4,1,2; P4,3,2; P1,3,2; P1,2; P4,3; P1; P2; P3; P4. src: [LH02]]

     5.3 Clock Frequency Variation
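Limiting a node's prefetch list to the fabric capacity is a top-k selection over its reach probabilities. The sketch below is an illustration only: the function name `prefetch_list` and the probability values are hypothetical, chosen to match the ordering of a node labeled P1,4,3,2. Removing redundant prefetches depends on the predecessors of each node, as the I2 and I6 cases show, and is not covered here.

```python
def prefetch_list(reach, capacity):
    """Return the `capacity` most probable SIs of a node, most probable first.
    reach: {SI name: probability of reaching it from this node}."""
    ranked = sorted(reach.items(), key=lambda item: item[1], reverse=True)
    return [si for si, _ in ranked[:capacity]]

# Hypothetical probabilities for a node labeled P1,4,3,2.
node_reach = {"SI2": 0.05, "SI3": 0.15, "SI1": 0.5, "SI4": 0.3}

print(prefetch_list(node_reach, 4))  # ['SI1', 'SI4', 'SI3', 'SI2']
print(prefetch_list(node_reach, 2))  # ['SI1', 'SI4']
```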
