razor and recycle

Razor and ReCycle (Ameen Akel) - PowerPoint PPT Presentation


  1. Razor and ReCycle, Ameen Akel

  2. Razor

  3. Razor – Motivation
     - Power
       - Today’s designs are extremely power hungry
       - Power is now a limiting factor
       - Overall performance cannot be sacrificed to save power
       - Both power and performance must continue improving
     - Static Voltage Scaling
       - Not adaptive enough
       - Must rely on conservative estimates
       - Gives up power savings for little to no performance benefit
     - Make average silicon matter!

  4. Razor – Approach
     - The Main Idea
       - Circuit delay is data dependent, so why should designers design only for the conservative case?
       - Target the “average” case, just like ReCycle
       - Lower the supply voltage (to sub-critical voltages) to reduce power throughout the chip
     - What happens if execution encounters a “worst-case” path through a pipeline stage?
       - Wrong data can be latched and moved to the next stage
       - Razor combines a few previous designs to solve this

  5. Razor Design Goals
     - Razor hardware must not interfere with error-free operation of a pipeline
       - Nearly invisible in the common case
     - Razor hardware cannot fail; it must always be correct
     - Razor hardware must be minimal in both hardware size and power footprint

  6. Razor – Approach
     - What’s unique to Razor?
       - Counterflow pipeline in the synchronous world
       - Handling metastability
       - Inducing error-prone state
     - What did it inherit?
       - Delayed latch idea (Triple Latch), but not the implementation (Shadow Latch)
       - Error correction method (from DIVA)

  7. Razor – Approach
     - Exploring the Shadow Latch
       - A method to detect and recover from errors in a minimal number of cycles
       - The shadow latch is clocked about 50% of a cycle behind the main flip-flop’s clock in order to catch any timing errors
       - A comparator (an XOR gate) quickly checks whether the data in the main flip-flop matches the data in the shadow latch
       - Pipeline stages are designed so that, even in the absolute worst case, the shadow latch’s setup time is met
       - A detected error invalidates any data coming out of the flip-flop for that cycle
     - [Figure: Razor flip-flop between two logic stages (main flip-flop on clk, shadow latch on clk_delayed, comparator driving the Error signal), with a timing diagram of clk, clk_delayed, D, Q, and Error over cycles 1 to 4.]
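
The detection rule on this slide can be sketched in a few lines of Python. This is a behavioral toy model, not the circuit from the presentation: the names `razor_sample`, `RazorSample`, and `arrival` are hypothetical, time is measured in clock periods, and the shadow-latch delay is fixed at the "about 50%" quoted above.

```python
from dataclasses import dataclass

CLOCK_PERIOD = 1.0   # main flip-flop samples at t = 1.0 (assumed unit period)
SHADOW_DELAY = 0.5   # shadow latch samples about 50% of a cycle later

@dataclass
class RazorSample:
    main: int     # value captured by the main flip-flop
    shadow: int   # value captured by the shadow latch
    error: bool   # comparator output (XOR of main and shadow)

def razor_sample(old_value: int, new_value: int, arrival: float) -> RazorSample:
    """Model one Razor flip-flop for a logic output that settles at `arrival`."""
    if arrival > CLOCK_PERIOD + SHADOW_DELAY:
        # The stage is designed so this never happens, even in the worst case.
        raise ValueError("shadow latch setup time violated")
    # A late-arriving result leaves the stale value in the main flip-flop.
    main = new_value if arrival <= CLOCK_PERIOD else old_value
    # The shadow latch always sees the settled, correct value.
    shadow = new_value
    return RazorSample(main, shadow, bool(main ^ shadow))

if __name__ == "__main__":
    print(razor_sample(0, 1, arrival=0.8))  # on time: no error
    print(razor_sample(0, 1, arrival=1.2))  # late: main is stale, error raised
    print(razor_sample(1, 1, arrival=1.2))  # late but data unchanged: no error
```

The third call shows why delay is data dependent in this model: a late path only produces an error when the late value actually differs from what the flip-flop already held.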

  8. Razor – Approach
     - Metastability: the state in which a signal is neither 0 nor 1. The state usually settles around V_dd/2.
     - The shadow latch can never be metastable, based upon its timing constraints.
     - If the flip-flop becomes metastable, the metastability detector can report that fact (most of the time).
     - There is a small chance that Error itself becomes metastable, which is claimed to be inevitable. In this case, a panic signal is raised and the pipeline is flushed.
     - [Figure 2. Reduced overhead Razor flip-flop and metastability detection circuits.]

  9. Razor Approach – Recovery
     - Clock Gating
     - [Figure: (a) pipeline (IF, ID, EX, MEM, ST, WB) with Razor flip-flops between stages, a stabilizer flip-flop, and per-stage Error and Recover signals gating clk; the Razor latch gets the correct EX value, and the correct value is provided to MEM. (b) Pipeline timing diagram (time in cycles vs. instructions) showing the failing EX* and MEM* cycles and the resulting stall.]

  10. Razor Approach – Recovery
      - Clock Gating
        - The pipeline stalls on any Razor error
        - Forward progress is guaranteed, because the correct value of the failing input is always available in the previous stage’s shadow latch
        - Only a single-cycle stall is required to recompute the next stage’s value; the pipeline then continues
      - Possibly long cycle time
        - The cycle time must be long enough for any stage in the pipeline to deliver a clock gating signal to all the other flip-flops
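
As a rough illustration of the single-cycle stall, the cycle count for a short instruction stream can be estimated as below. This is a back-of-the-envelope sketch: the function name is hypothetical, and it assumes an ideal pipeline with no other hazards in which every Razor error costs exactly one whole-pipeline stall cycle and no two errors fall in the same cycle.

```python
def cycles_with_clock_gating(num_instructions: int,
                             pipeline_depth: int,
                             num_errors: int) -> int:
    """Total cycles to retire all instructions under clock-gating recovery."""
    if num_instructions <= 0:
        return 0
    # Cycles needed before the first instruction reaches the last stage.
    fill_cycles = pipeline_depth - 1
    # One cycle per retired instruction, plus one stall cycle per error.
    return fill_cycles + num_instructions + num_errors

if __name__ == "__main__":
    # Six stages (IF, ID, EX, MEM, ST, WB), four instructions, one timing error.
    print(cycles_with_clock_gating(4, 6, 0))  # 9 cycles without errors
    print(cycles_with_clock_gating(4, 6, 1))  # 10 cycles with one stall
```

The count says nothing about the cost named in the last bullet: the stall itself is cheap, but the gating signal has to reach every flip-flop within one cycle, which can stretch the cycle time.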

  11. Razor Approach – Recovery
      - Counterflow
      - [Figure: (a) pipeline (IF, ID, EX, MEM, ST, WB) with Razor flip-flops, a stabilizer flip-flop, per-stage Bubble, Error, and Recover signals, and flush control (FlushID); Razor detects a fault, forwards a bubble toward WB, and initiates a flush toward IF until the pipeline flush completes. (b) Pipeline timing diagram (time in cycles vs. instructions) showing the failing EX* cycle, the bubble, and the flush moving back through EX, ID, and IF.]

  12. Razor Approach – Recovery
      - Counterflow Pipelining
        - Uses an asynchronous-like design to propagate errors backwards
        - The error propagation is itself pipelined, so it has a minimal effect on the cycle time of each stage
        - The tradeoff: clock gating resumes within one cycle, while counterflow pipelining allows a faster cycle time
        - The error signal travels back through each pipeline register until it reaches the PC, which then restarts execution
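
The tradeoff between the two recovery schemes can be made concrete with a simple average-time model. Everything numeric here is an assumption for illustration only: a 10% longer cycle for clock gating, a six-cycle flush penalty for counterflow pipelining, and the helper name `time_per_instruction`. None of these values come from the slides.

```python
def time_per_instruction(error_rate: float,
                         penalty_cycles: float,
                         cycle_time: float) -> float:
    """Average time per instruction when each error costs `penalty_cycles`."""
    return (1.0 + error_rate * penalty_cycles) * cycle_time

# Hypothetical parameters, chosen only to show the shape of the tradeoff.
GATING = {"penalty_cycles": 1, "cycle_time": 1.10}       # short stall, slower clock
COUNTERFLOW = {"penalty_cycles": 6, "cycle_time": 1.00}  # long flush, faster clock

if __name__ == "__main__":
    for rate in (0.001, 0.01, 0.05):
        gating = time_per_instruction(rate, **GATING)
        counterflow = time_per_instruction(rate, **COUNTERFLOW)
        winner = "counterflow" if counterflow < gating else "clock gating"
        print(f"error rate {rate:.3f}: gating={gating:.4f} "
              f"counterflow={counterflow:.4f} -> {winner}")
```

Under these assumed numbers, counterflow pipelining wins while errors are rare, and clock gating wins once errors are frequent enough that the longer flush dominates.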

  13. Razor Approach – Dynamic Adjustments
      - Focus on a constant reference error rate (E_ref)
      - Change the supply voltage based on the measured error rate
      - Pros
        - Real dynamic changes based on runtime conditions
      - Cons
        - Voltage regulators are slow
        - Slow reaction causes overcompensation
      - [Figure 6. Supply Voltage Control System: pipeline error signals are summed into E_sample; E_diff = E_ref - E_sample feeds a voltage control function and a voltage regulator that sets V_dd; reset and panic signals are also shown.]
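
One way to read the control loop in Figure 6 is as a proportional controller on E_diff. The sketch below is an assumption about the control function, which the slides do not specify: the name `next_voltage`, the gain, and the voltage limits are hypothetical, and the 1.5% default for E_ref is borrowed from the adder experiment later in the deck.

```python
def next_voltage(v_dd: float,
                 e_sample: float,
                 e_ref: float = 0.015,   # target error rate (1.5%)
                 gain: float = 2.0,      # hypothetical volts per unit error rate
                 v_min: float = 0.6,     # hypothetical regulator limits
                 v_max: float = 1.8) -> float:
    """One controller step: move V_dd so the error rate approaches e_ref."""
    e_diff = e_ref - e_sample
    # Fewer errors than the target (e_diff > 0): lower the voltage to save power.
    # More errors than the target (e_diff < 0): raise the voltage.
    proposed = v_dd - gain * e_diff
    return max(v_min, min(v_max, proposed))

if __name__ == "__main__":
    print(next_voltage(1.5, e_sample=0.0))    # no errors seen: voltage drops
    print(next_voltage(1.5, e_sample=0.10))   # too many errors: voltage rises
    print(next_voltage(1.5, e_sample=0.015))  # on target: voltage holds
```

A pure proportional step like this reacts instantly. A real regulator does not, which is the source of the overcompensation listed under Cons.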

  14. Razor – Simulations/Data
      - Alpha-64 Simulation
      - Parameters:
        - In-order pipeline
        - 8 KB I/D caches
        - 192 of 2408 flip-flops were augmented with a shadow latch
      - Important results:
        - 3.1% total power overhead for Razor parts
        - 1% of total power for recovery overhead

  15. Razor – Simulations/Data
      - FPGA Multiplier Simulation
      - [Figure 9. Measured Error Rates for an 18x18-bit FPGA Multiplier Block at 90 MHz and 27 C. Error rate (log scale) vs. supply voltage (V), from 1.78 V down to 1.14 V. Marked voltages: environmental-margin @ 1.69 V, safety-margin @ 1.63 V, zero-margin @ 1.54 V. Annotations: "22% saving", "30% energy saving", "35% energy savings with 1.3% error", "One error every ~20 seconds".]

  16. Razor – Simulations/Data
      - Adder Simulation
      - Fixed voltage sweep
      - Goal: reduce energy without sacrificing IPC
      - [Figure 11. The Qualitative Relationship Between Supply Voltage, Energy and Pipeline Throughput (for a fixed frequency). Curves over decreasing supply voltage: pipeline throughput (IPC), energy of adder operations (E_additions), energy of pipeline recovery (E_recovery), energy of adder w/o Razor support, and total adder energy E_adder = E_additions + E_recovery, with the optimal E_adder marked.]
      - [Figure 12. Relative Adder Energy and Pipeline Throughput for Simulated Benchmarks. Panels: BZIP, GCC. Relative IPC and energy (relative energy, relative performance) vs. voltage, from 1.8 V down to 0.6 V.]
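
The relationship E_adder = E_additions + E_recovery from Figure 11 can be explored numerically. The following is a toy model under loudly stated assumptions: energy per addition scales with the square of the supply voltage, the exponential `error_rate` curve and the `recovery_cost` of ten additions per error are invented for illustration, and none of the resulting numbers are measurements from the presentation.

```python
import math

V_NOMINAL = 1.8  # top of the voltage sweep shown in the figure axes

def error_rate(v: float) -> float:
    """Hypothetical error model: errors grow exponentially as voltage drops."""
    return min(1.0, 1e-6 * math.exp(25.0 * (1.5 - v)))

def adder_energy(v: float, recovery_cost: float = 10.0) -> float:
    """Relative E_adder = E_additions + E_recovery at supply voltage v."""
    e_additions = (v / V_NOMINAL) ** 2           # dynamic energy scales with V^2
    e_recovery = error_rate(v) * recovery_cost * e_additions
    return e_additions + e_recovery

def optimal_voltage() -> float:
    """Grid search over 0.6 V to 1.8 V in 25 mV steps."""
    candidates = [0.6 + 0.025 * i for i in range(49)]
    return min(candidates, key=adder_energy)

if __name__ == "__main__":
    best = optimal_voltage()
    print(f"optimal voltage ~ {best:.3f} V, relative energy {adder_energy(best):.3f}")
    print(f"relative energy at nominal voltage: {adder_energy(V_NOMINAL):.3f}")
    print(f"relative energy at lowest voltage:  {adder_energy(0.6):.3f}")
```

The qualitative shape is the point: lowering the voltage first saves energy, then recovery energy takes over and the total rises again, so the minimum sits at a non-zero error rate.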

  17. Razor – Simulations/Data
      - Dynamic Scaling
        - Target error rate was 1.5%
        - Takes samples in chunks of 5000 cycles
        - Uses those chunks to dynamically scale the voltage
        - Slow reaction times
      - [Figure 13. Adder Error Rate and Voltage Controller Response. Panels: GCC, Gap. Supply voltage and error rate vs. time.]
