Individual Voltage Scaling in Logic and Memory Circuits towards Runtime Energy Optimization in Processors
Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera
Graduate School of Informatics, Kyoto University, Japan
Energy Reduction by Dynamic Voltage Scaling
[Figure: delay and energy (dynamic and static) as functions of the supply voltage $V_{DD}$, tuned by DVFS (Dynamic Voltage and Frequency Scaling), and of the threshold voltage $V_{th}$, tuned by ABB (Adaptive Body Biasing)]
→ $V_{DD}$- and $V_{th}$-tuning technique for energy minimization
Minimum Energy Point Tracking (MEP Tracking)
Energy minimization by voltage scaling under a given frequency
[Figure: MEPT example on Renesas SOTB 65-nm cell-based memory; supply voltage [V] (0.4 to 1.2) versus body bias [V] (0 to -2.0, i.e. small to large $V_{th}$), with energy contours (140 pJ, 90 pJ) and the performance contour]
Minimum Energy Point (MEP): the best combination of $V_{DD}$ and $V_{th}$
Target: MEP tracking technique for processors
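To make the MEP concrete, here is a minimal Python sketch that locates an MEP under a fixed target frequency. It assumes a hypothetical normalized model: an alpha-power-law delay $V_{DD}/(V_{DD}-V_{th})^{A}$ and exponential leakage with constants `A`, `I0`, `S`. These constants and the helper names `delay`, `min_vdd`, `energy`, and `find_mep` are illustrative, not the SOTB measurements. For each $V_{th}$ on a grid, the sketch takes the lowest $V_{DD}$ that still meets the clock period (i.e. a point on the performance contour) and keeps the lowest-energy point.

```python
import math

# Hypothetical normalized toy model (assumption, not the authors' model):
#   delay   d(V_DD, V_th) = V_DD / (V_DD - V_th) ** A       (alpha-power-law style)
#   energy  E = act * V_DD**2                              (dynamic, per cycle)
#             + V_DD * I0 * exp(-V_th / S) * t0            (static, over one period t0)
A, I0, S = 1.5, 100.0, 0.04


def delay(vdd: float, vth: float) -> float:
    return vdd / (vdd - vth) ** A


def min_vdd(vth: float, t0: float, vdd_max: float = 3.0) -> float:
    """Lowest V_DD with delay <= t0, by bisection (delay falls as V_DD rises)."""
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= t0:
            hi = mid
        else:
            lo = mid
    return hi


def energy(vdd: float, vth: float, act: float, t0: float) -> float:
    return act * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * t0


def find_mep(act: float, t0: float, vth_grid: list[float]) -> tuple[float, float, float]:
    """Search along the performance contour and return (energy, V_DD, V_th)."""
    best = None
    for vth in vth_grid:
        vdd = min_vdd(vth, t0)  # timing-safe V_DD for this V_th
        e = energy(vdd, vth, act, t0)
        if best is None or e < best[0]:
            best = (e, vdd, vth)
    return best


VTH_GRID = [0.10 + 0.01 * i for i in range(51)]  # V_th from 0.10 V to 0.60 V
T0 = delay(0.8, 0.3)  # target clock period: the delay at V_DD = 0.8 V, V_th = 0.3 V

e_mep, vdd_mep, vth_mep = find_mep(1.0, T0, VTH_GRID)
print(f"MEP: V_DD = {vdd_mep:.3f} V, V_th = {vth_mep:.2f} V, E = {e_mep:.3f}")
```

Restricting the search to the contour is valid in this model because, at fixed $V_{th}$, energy only grows with $V_{DD}$.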
Activity Factor Dependency of MEP Curves (Activity 100% → 10%)
Issue: MEPs depend heavily on activity factors (toggle rates)
[Figure: supply voltage [V] versus body bias [V] (small to large $V_{th}$) with the performance contour; optimized operating point 12 pJ versus unoptimized 20 pJ]
→ The activity factor is an important parameter determining MEPs
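The activity dependency can be reproduced qualitatively with the same kind of hypothetical toy model (the constants `A`, `I0`, `S` and the function `mep` are assumptions, not measured data). Lowering the activity factor shrinks the dynamic term relative to leakage, so in this model the optimum shifts toward a larger $V_{th}$ and, to hold the frequency, a larger $V_{DD}$.

```python
import math

# Hypothetical normalized toy model (assumption, not measured data):
# alpha-power-law delay and exponential subthreshold leakage.
A, I0, S = 1.5, 100.0, 0.04


def delay(vdd: float, vth: float) -> float:
    return vdd / (vdd - vth) ** A


def min_vdd(vth: float, t0: float, vdd_max: float = 3.0) -> float:
    """Lowest V_DD meeting delay <= t0, by bisection."""
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= t0:
            hi = mid
        else:
            lo = mid
    return hi


def mep(act: float, t0: float, vth_grid: list[float]) -> tuple[float, float]:
    """(V_DD, V_th) minimizing dynamic + static energy on the contour delay = t0."""
    def total_energy(vth: float) -> float:
        vdd = min_vdd(vth, t0)
        return act * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * t0

    vth = min(vth_grid, key=total_energy)
    return min_vdd(vth, t0), vth


T0 = delay(0.8, 0.3)                          # fixed target frequency
GRID = [0.10 + 0.01 * i for i in range(51)]   # V_th from 0.10 V to 0.60 V
meps = {act: mep(act, T0, GRID) for act in (1.0, 0.1, 0.01)}
for act, (vdd, vth) in meps.items():
    print(f"activity {act:>4}: MEP at V_DD = {vdd:.3f} V, V_th = {vth:.2f} V")
```

The MEP moves toward larger $V_{th}$ and $V_{DD}$ as activity falls, the same direction as the memory-MEP shift reported in the silicon measurements.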
Overview of This Work
[Figure: supply voltage [V] versus body bias [V] (small to large $V_{th}$) with the performance contour; the on-chip memory corresponds to the MEP with 10% activity and the logic circuits to the MEP with 100% activity]
→ Individual voltage scaling problem in logic and memory circuits
→ Heuristic algorithm for runtime optimization
Outline
• Background
• Individual Voltage Scaling Problem
• Silicon Measurement
• Conclusion
(Existing) Uniform Voltage Scaling Problem
$\min_{V_{DD},\,V_{th}} E \quad \text{s.t.} \quad D \le D_0$ (circuit energy $E$, circuit delay $D$, target performance $D_0$)
[Figure: MEP curve and performance contour $D = D_0$ in the $(V_{th}, V_{DD})$ plane; tracking moves from an initial point to the solution (finish point) using energy and delay monitoring (MEP check)]
• Existing approach: runtime MEP tracking [5]
 → Tunes $V_{DD}$ and $V_{th}$ iteratively along $D = D_0$
 → Requires only simple circuits
 → Tracks MEPs at runtime even when the target performance, temperature, and activity change dynamically
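The iterative tuning can be sketched as a simple hill climb. This is a hypothetical illustration of the idea, not the circuit implementation of [5]: `min_vdd` stands in for the delay monitor (lowest timing-safe $V_{DD}$), `measure_energy` for the energy monitor, and the toy-model constants, the $V_{th}$ range, and `track_mep` are assumptions.

```python
import math

# Hypothetical normalized toy model and tuning range (assumptions).
A, I0, S = 1.5, 100.0, 0.04
VTH_MIN, VTH_MAX = 0.10, 0.60


def delay(vdd: float, vth: float) -> float:
    return vdd / (vdd - vth) ** A


def min_vdd(vth: float, t0: float, vdd_max: float = 3.0) -> float:
    """Stand-in for the delay monitor: lowest V_DD that still meets t0."""
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= t0:
            hi = mid
        else:
            lo = mid
    return hi


def measure_energy(vth: float, act: float, t0: float) -> float:
    """Stand-in for the energy monitor at the timing-safe V_DD for this V_th."""
    vdd = min_vdd(vth, t0)
    return act * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * t0


def track_mep(act: float, t0: float, vth: float, step: float = 0.01,
              max_iter: int = 100) -> tuple[float, float, float]:
    """Step V_th (re-fitting V_DD each time) until neither neighbour lowers
    the monitored energy, i.e. until the MEP check passes."""
    e = measure_energy(vth, act, t0)
    for _ in range(max_iter):
        for cand in (vth + step, vth - step):
            if VTH_MIN <= cand <= VTH_MAX:
                e_cand = measure_energy(cand, act, t0)
                if e_cand < e:
                    vth, e = cand, e_cand
                    break
        else:
            break  # no improving neighbour: MEP reached
    return min_vdd(vth, t0), vth, e


T0 = delay(0.8, 0.3)
vdd1, vth1, e1 = track_mep(1.0, T0, vth=0.15)  # start from an arbitrary initial point
vdd2, vth2, e2 = track_mep(0.1, T0, vth=vth1)  # activity drops: resume from the old MEP
print(f"act=1.0: V_DD={vdd1:.3f} V, V_th={vth1:.2f} V")
print(f"act=0.1: V_DD={vdd2:.3f} V, V_th={vth2:.2f} V")
```

Resuming from the previous operating point is what lets such a loop follow changes in activity without a full re-search.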
Individual Voltage Scaling Problem
$\min_{V_{DD,L},\,V_{th,L},\,V_{DD,M},\,V_{th,M}} \; E_L + E_M \quad \text{s.t.} \quad D_L + D_M \le D_0$
(logic: voltages $V_{DD,L}, V_{th,L}$, energy $E_L$, delay $D_L$; memory: voltages $V_{DD,M}, V_{th,M}$, energy $E_M$, delay $D_M$)
No runtime algorithm exists, owing to the complex assignment of the delay budget $D_0$ between $D_L$ and $D_M$.
[Figure: power versus delay for logic and memory under the constraint $D_0$; voltage scaling in logic combined with a voltage boost in memory yields a huge energy saving]
This work: a heuristic algorithm for runtime voltage scaling
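A brute-force way to solve the individual problem is to sweep how $D_0$ is split between $D_L$ and $D_M$ and to find each block's MEP for every split. The sketch below does this on a hypothetical toy model: logic and memory share one delay/energy form and differ only in activity (`ACT_L = 1.0`, `ACT_M = 0.1`), and all names and constants are assumptions. The nested search, one MEP search per block per candidate split, illustrates why exhaustive delay assignment is unattractive at runtime.

```python
import math

# Hypothetical normalized toy model (assumption): logic and memory share the
# same delay/energy form, differ only in activity, and both leak for the whole
# clock period D0. D_L = d(V_DD,L, V_th,L) and D_M = d(V_DD,M, V_th,M).
A, I0, S = 1.5, 100.0, 0.04
ACT_L, ACT_M = 1.0, 0.1
VTH_GRID = [0.10 + 0.01 * i for i in range(51)]


def delay(vdd, vth):
    return vdd / (vdd - vth) ** A


def min_vdd(vth, budget, vdd_max=3.0):
    """Lowest V_DD with delay <= budget, or inf if even vdd_max is too slow."""
    if delay(vdd_max, vth) > budget:
        return math.inf
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= budget:
            hi = mid
        else:
            lo = mid
    return hi


D0 = 2 * delay(0.8, 0.3)  # hypothetical clock period shared by D_L + D_M


def block_energy(act, budget):
    """Minimum energy of one block whose delay must fit in `budget`."""
    best = math.inf
    for vth in VTH_GRID:
        vdd = min_vdd(vth, budget)
        best = min(best, act * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * D0)
    return best


def global_search(splits):
    """Exhaustive delay assignment: D_L = f * D0, D_M = (1 - f) * D0 for each f."""
    best_e, best_f = math.inf, None
    for f in splits:
        e = block_energy(ACT_L, f * D0) + block_energy(ACT_M, (1 - f) * D0)
        if e < best_e:
            best_e, best_f = e, f
    return best_e, best_f


def uniform_energy():
    """Uniform voltages: D_L = D_M = d, so the contour is d = D0 / 2."""
    best = math.inf
    for vth in VTH_GRID:
        vdd = min_vdd(vth, D0 / 2)
        e_l = ACT_L * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * D0
        e_m = ACT_M * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * D0
        best = min(best, e_l + e_m)
    return best


e_glob, f_best = global_search([i / 100 for i in range(20, 81)])
e_unif = uniform_energy()
print(f"global: {e_glob:.4f} at D_L = {f_best:.2f} * D0; uniform: {e_unif:.4f}")
```

Because the uniform solution corresponds to the split $D_L = D_M = D_0/2$ in this model, the exhaustive search can never do worse than uniform scaling.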
Various Strategies in Uniform Voltage Scaling
[Figure: delay contour $D_L + D_M = D_0$ in the $(V_{th}, V_{DD})$ plane with three points: logic MEP (min $E_L$), memory MEP (min $E_M$), and processor MEP (min $E_L + E_M$)]
• Logic MEP: $E_L$ optimized, but $E_M$ NOT optimized
• Processor MEP: $E_L$ and $E_M$ balanced → the solution in uniform voltage scaling
Concept of the Proposed Heuristic Algorithm
[Figure: delay contour $D_L + D_M = D_0$ in the $(V_{th}, V_{DD})$ plane with the memory MEP (min $E_M$), processor MEP (min $E_L + E_M$), and logic MEP (min $E_L$); logic voltages $(V_{DD,L}, V_{th,L})$ placed at the logic MEP and memory voltages $(V_{DD,M}, V_{th,M})$ at the memory MEP]
Point: $D_L$ and $D_M$ are constant over the delay contour
→ Enables local minimum energy point operation
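One way to see why $D_L$ and $D_M$ stay constant is the derivation below. It rests on an assumption not stated on the slide: that logic and memory delays (both built from standard cells here) scale with a common voltage-dependent factor $d(V_{DD}, V_{th})$ and differ only by constants $k_L$ and $k_M$.

```latex
\begin{aligned}
D_L &= k_L\, d(V_{DD}, V_{th}), \qquad D_M = k_M\, d(V_{DD}, V_{th}) \\
D_L + D_M = D_0 \;&\Longrightarrow\; d(V_{DD}, V_{th}) = \frac{D_0}{k_L + k_M} \\
&\Longrightarrow\; D_L = \frac{k_L D_0}{k_L + k_M}, \qquad D_M = \frac{k_M D_0}{k_L + k_M}
\end{aligned}
```

Under this assumption, moving the memory voltages to another point on the same contour, $d(V_{DD,M}, V_{th,M}) = D_0/(k_L + k_M)$, leaves $D_M$, and therefore $D_L + D_M = D_0$, unchanged. The memory can thus settle at its own MEP while the logic stays at its MEP.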
Simple Heuristic Algorithm for Individual Voltage Scaling
Step 1: Logic Energy Optimization
 1. Uniform voltage tuning in logic and memory ($V_{DD,L} = V_{DD,M}$, $V_{th,L} = V_{th,M}$) along the delay contour $D_L + D_M = D_0$, starting from the initial point; this allows existing techniques to be applied
 2. Find the logic MEP
Step 2: Memory Energy Optimization
 1. Fix the logic voltages and tune only the memory voltages ($V_{DD,M}$ and $V_{th,M}$)
 2. Find the memory MEP, so that $V_{DD,L} \ne V_{DD,M}$ and $V_{th,L} \ne V_{th,M}$
→ Enables runtime energy optimization
→ Local minimum energy point operation
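The two steps can be sketched on a hypothetical toy model. The sketch assumes proportional delays ($D_L = k_L d$, $D_M = k_M d$ with `K_L = K_M = 1`), which turns Step 2 into a search along the same contour; the constants, activity values, and helper names (`energy`, `contour_argmin`) are illustrative assumptions, and exhaustive 1-D grid searches stand in for the runtime tracking loop.

```python
import math

# Hypothetical normalized toy model (assumption, not the authors' model).
A, I0, S = 1.5, 100.0, 0.04
ACT_L, ACT_M = 1.0, 0.1   # illustrative logic / memory activity factors
K_L, K_M = 1.0, 1.0       # assumed delay scaling: D_L = K_L * d, D_M = K_M * d
VTH_GRID = [0.10 + 0.01 * i for i in range(51)]


def delay(vdd, vth):
    return vdd / (vdd - vth) ** A


def min_vdd(vth, budget, vdd_max=3.0):
    """Lowest V_DD with delay <= budget, by bisection."""
    lo, hi = vth + 1e-6, vdd_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) <= budget:
            hi = mid
        else:
            lo = mid
    return hi


D0 = (K_L + K_M) * delay(0.8, 0.3)  # clock period / delay constraint
D_TARGET = D0 / (K_L + K_M)         # common contour d = D0 / (K_L + K_M)


def energy(act, vdd, vth):
    """Dynamic + static energy per cycle of one block."""
    return act * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * D0


def contour_argmin(cost):
    """Grid V_th (with its timing-safe V_DD) on the contour that minimizes cost."""
    vth = min(VTH_GRID, key=lambda v: cost(min_vdd(v, D_TARGET), v))
    return min_vdd(vth, D_TARGET), vth


# Step 1: uniform tuning (V_DD,L = V_DD,M, V_th,L = V_th,M); locate the logic MEP.
vdd_l, vth_l = contour_argmin(lambda vdd, vth: energy(ACT_L, vdd, vth))
# Step 2: freeze the logic voltages; tune only the memory voltages on the same
# contour, which keeps D_M (and hence D_L + D_M = D0) unchanged.
vdd_m, vth_m = contour_argmin(lambda vdd, vth: energy(ACT_M, vdd, vth))

# Reference: uniform voltage scaling at the processor MEP (min E_L + E_M jointly).
vdd_u, vth_u = contour_argmin(
    lambda vdd, vth: energy(ACT_L, vdd, vth) + energy(ACT_M, vdd, vth))

e_heur = energy(ACT_L, vdd_l, vth_l) + energy(ACT_M, vdd_m, vth_m)
e_unif = energy(ACT_L, vdd_u, vth_u) + energy(ACT_M, vdd_u, vth_u)
print(f"logic V_DD={vdd_l:.3f} V_th={vth_l:.2f}; memory V_DD={vdd_m:.3f} V_th={vth_m:.2f}")
print(f"heuristic {e_heur:.4f} vs uniform {e_unif:.4f}")
```

In this model the heuristic total can never exceed the uniform processor-MEP energy, since each block's minimum on the contour is at most its energy at the uniform point.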
Case Study: 32-bit RISC Processor
Target
• Renesas SOTB 65-nm
• On-chip memory
 - 4 kB I-Cache + TAG
 - 8 kB I-SPM
 - 16 kB D-SPM
 → Standard-cell-based memory
• Individual supply voltages and body biases in logic ($V_{DD,L}$, $V_{BP,L}$) and memory ($V_{DD,M}$, $V_{BN,M}$, $V_{BP,M}$)
 - The body bias for nMOSFETs in logic circuits is fixed at GND
• No level converters between logic and memory
[Figure: block diagram with logic, memory, main memory (DCT loop), and I/O for supply voltages and body biases]
Activity Factor Dependency of Memory MEPs ($V_{DD,L} = V_{DD,M}$, $V_{BB,L} = V_{BB,M}$)
[Figure: Fmax contour of the fabricated processor [MHz]; supply voltage [V] (0.4 to 1.2) versus body bias [V] (0 to -2.0, small to large $V_{th}$), showing the logic MEP and the memory MEPs for $\alpha_M = 1$, $0.1$, and $0.01$]
$\alpha_M$: memory activity factor
• 1: activated in every clock cycle
• 0.1: activated once every 10 clock cycles
• 0.01: activated once every 100 clock cycles
→ MEPs move to the upper right (higher $V_{DD}$, larger $V_{th}$) as $\alpha_M$ decreases
Measurement Results of the Proposed Algorithm ($\alpha_M = 0.01$)
[Figure: Fmax contour of the fabricated processor [MHz]; supply voltage [V] versus body bias [V] (small to large $V_{th}$), showing the logic MEP and the memory MEP]
Step 1: 1. Uniform voltage scaling 2. Find the logic MEP
Step 2: 1. Fix the logic voltages at the logic MEP 2. Tune only the memory voltages and find the memory MEP
→ Individual voltage tuning achieved by the proposed algorithm
Energy Reduction by Individual Voltage Scaling ($\alpha_M = 0.01$)
[Figure: total energy consumption [pJ/cycle] (0 to 100) at Fmax = 4, 8, 20, and 29 MHz, broken down into memory static, memory dynamic, logic static, and logic dynamic energy; reductions of 10% to 16%]
→ Up to 16% energy reduction by individual voltage scaling
Conclusion & Future Work
Conclusion
• Individual voltage scaling problem in logic and memory presented
 - Key: the activity factor gap between logic and memory circuits
• A heuristic algorithm proposed for runtime energy optimization
• Case study using a RISC processor in a 65-nm process
 - Up to 16% energy reduction compared with uniform voltage scaling
Future work
• Energy overhead compared with the global solution
• Energy overhead introduced by fine-grained voltage tuning, etc.
Energy Reduction by Individual Voltage Scaling ($\alpha_M = 0.1$)
[Figure: total energy consumption [pJ/cycle] at Fmax = 4, 8, 20, and 29 MHz, broken down into memory static, memory dynamic, logic static, and logic dynamic energy; reductions of 5% to 11%]
→ No energy improvement when $\alpha_M = 1$
Definition of $\alpha_M$
On-chip memory property
• No clock gating circuits
• Dynamic energy is consumed in every clock cycle
→ The implemented on-chip memory has a large activity factor
• Parameter $\alpha_M$ introduced to scale the activity factor:
$E_{M,\text{evaluated}} = \alpha_M \times E_{M,\text{dyn,measured}} + E_{M,\text{leak,measured}}$
(the dynamic energy is the measured value scaled by $\alpha_M$; the static energy is the measured leakage energy)
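The scaling rule amounts to a one-line formula. A minimal sketch follows; the function name and the example energies are hypothetical, not measured values.

```python
def evaluated_memory_energy(dyn_measured_pj: float, leak_measured_pj: float,
                            alpha_m: float) -> float:
    """Evaluated memory energy per cycle: the measured dynamic energy is scaled
    by alpha_M, while the measured leakage energy is kept as is."""
    if not 0.0 <= alpha_m <= 1.0:
        raise ValueError("alpha_M must lie in [0, 1]")
    return alpha_m * dyn_measured_pj + leak_measured_pj


# Hypothetical per-cycle measurements, for illustration only.
for alpha_m in (1.0, 0.1, 0.01):
    print(alpha_m, evaluated_memory_energy(10.0, 2.0, alpha_m))
```

Only the dynamic part is scaled, matching the idea that a memory accessed once every 1/$\alpha_M$ cycles still leaks in every cycle.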
System-Level Optimization Problem
The problem can be abstracted to a system-level optimization:
• CPU: low activity (corresponds to memory), with its own $V_{DD}$ and $V_{th}$; its execution time corresponds to $D_M$
• DSP: high activity (corresponds to logic), with its own $V_{DD}$ and $V_{th}$; its execution time corresponds to $D_L$
• The deadline corresponds to $D_0$
Future work: applying the heuristic to system-level optimization