Implementation Techniques for Reduced Power and Energy

Implementation Techniques for Reduced Power and Energy, March 16, 2012 Page 1

Professor Per Larsson-Edefors

Computer Science and Engineering, Chalmers University of Technology

Motivation

  • Physical implementation impacts power dissipation.

Field-Effect Transistor Basics

  • MOSFET (metal-oxide-semiconductor field-effect transistor).
  • A voltage is applied to the gate.
  • The electric field regulates the material properties beneath the gate.
  • Threshold voltage (VTH): the gate voltage required to create a conducting channel.

[Figure: MOSFET cross-section showing where a conducting channel can be created, and the body electrode. Source: USC]

Switching Power Dissipation

  • For one full transition, that is, a rising plus a falling output signal:
    $P_{sw} = E/T = QV/T = (CV)V f = C V^2 f$.
  • With signal activity α (the fraction of clock cycles in which a node makes a full transition):
    $P_{sw} = f \alpha C V_{DD}^2$.
  • To reduce switching power…
    – reduce supply voltage (VDD).
    – reduce clock frequency (f).
    – reduce signal activity (α).
    – reduce nodal capacitance (C).
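A minimal sketch of the switching-power formula $P_{sw} = f \alpha C V_{DD}^2$. The values used (α = 0.1, 1 nF of total switched capacitance, 1 GHz) are illustrative assumptions, not data from the slides; the point is the quadratic dependence on VDD versus the linear dependence on f.

```python
def switching_power(alpha: float, c_farad: float, vdd: float, f_hz: float) -> float:
    """First-order switching power P_sw = f * alpha * C * VDD^2, in watts."""
    return f_hz * alpha * c_farad * vdd ** 2

# Illustrative (assumed) values: activity 0.1, 1 nF switched capacitance, 1 GHz.
p_nominal = switching_power(alpha=0.1, c_farad=1e-9, vdd=1.0, f_hz=1e9)
p_half_vdd = switching_power(alpha=0.1, c_farad=1e-9, vdd=0.5, f_hz=1e9)
p_half_f = switching_power(alpha=0.1, c_farad=1e-9, vdd=1.0, f_hz=0.5e9)

print(f"nominal:    {p_nominal:.3f} W")
print(f"VDD halved: {p_half_vdd:.3f} W")  # 4x lower: quadratic in VDD
print(f"f halved:   {p_half_f:.3f} W")    # 2x lower: linear in f
```

Halving VDD buys a 4× reduction while halving f buys only 2×, which is why supply voltage is the first knob on the list.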

Energy, Speed, and Power

Voltage Scaling for Reduced Psw

  • To maintain switching speed, VTH must follow VDD: the gate overdrive VDD − VTH sets the transistor drive current.

Source: LPDE’09/Ch 1
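Why must VTH follow VDD? One common way to see it is the Sakurai-Newton alpha-power law, in which gate delay scales roughly as $t_d \propto V_{DD}/(V_{DD}-V_{TH})^{a}$. The sketch below assumes a = 1.3 and VTH = 0.3 V at VDD = 1.0 V (illustrative values, not from the slides) and compares lowering VDD by 30% with VTH held fixed against lowering both by 30%.

```python
def relative_delay(vdd: float, vth: float, a: float = 1.3) -> float:
    """Alpha-power-law gate delay, up to a constant factor: VDD / (VDD - VTH)^a.

    The exponent a (assumed 1.3 here) models velocity saturation;
    a = 2 recovers the classic long-channel square law.
    """
    if vdd <= vth:
        raise ValueError("model only valid above threshold (VDD > VTH)")
    return vdd / (vdd - vth) ** a

t_ref = relative_delay(vdd=1.0, vth=0.3)

# Scale VDD down by 30% but keep VTH fixed ...
slowdown_fixed_vth = relative_delay(vdd=0.7, vth=0.3) / t_ref
# ... versus scaling VTH down by the same 30%.
slowdown_scaled_vth = relative_delay(vdd=0.7, vth=0.21) / t_ref

print(f"VTH fixed:  {slowdown_fixed_vth:.2f}x slower")
print(f"VTH scaled: {slowdown_scaled_vth:.2f}x slower")
```

With VTH fixed the gate slows down by roughly 45%, whereas scaling VTH along with VDD keeps the penalty near 10% in this model.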

Unwanted Consequences of Scaling, 1

  • But … as VTH decreases, the subthreshold leakage in off-state transistors increases:
    $P_{leak} \propto e^{-V_{TH}}$.

Subthreshold Operation

  • Subthreshold swing S = 60 mV/decade (the ideal room-temperature limit): each 60 mV reduction of VTH increases the subthreshold current tenfold.
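The subthreshold swing S states how many millivolts of gate (or threshold) voltage change the subthreshold current by one decade. A minimal sketch, assuming the standard expression $S = n (kT/q) \ln 10$ with ideality factor n ≥ 1, shows where the 60 mV/decade figure comes from and what a VTH reduction costs in leakage.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # elementary charge, C

def subthreshold_swing(n: float = 1.0, temp_k: float = 300.0) -> float:
    """Subthreshold swing S = n * (kT/q) * ln(10), in volts per decade."""
    return n * K_B * temp_k / Q_E * math.log(10)

def leakage_increase(delta_vth: float, s: float) -> float:
    """Factor by which subthreshold leakage grows when VTH drops by delta_vth volts."""
    return 10 ** (delta_vth / s)

s_ideal = subthreshold_swing()  # about 0.0595 V/decade at 300 K with n = 1
print(f"S = {s_ideal * 1e3:.1f} mV/decade")
print(f"Lowering VTH by 100 mV: {leakage_increase(0.100, s_ideal):.0f}x more leakage")
```

Real devices have n > 1 and operate above room temperature, so S is larger than this ideal value in practice.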

Unwanted Consequences of Scaling, 2

  • Physical dimensions are scaled down to the point where tunneling through gate insulators occurs.
  • Countermeasure: high-K gate insulator and metal gate technology.

Projections on Power


Low-Power Technique 1: Body Biasing

  • In $P_{leak} \propto e^{-V_{TH}}$, VTH depends on the MOSFET terminal voltages.
  • In an NMOSFET, VTH decreases with increasing body voltage (VB).

Binning for Performance and Power

  • Reverse body biasing (RBB) ⇒ VB < 0 V (NMOSFET) ⇒ VTH increases ⇒ leakage decreases.
  • Forward body biasing (FBB) ⇒ VB > 0 V (NMOSFET) ⇒ VTH decreases ⇒ higher speed.
  • Performance and power binning becomes possible.

Source: ABB’02
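The RBB/FBB trade-off can be sketched with the standard long-channel body-effect model $V_{TH} = V_{TH0} + \gamma(\sqrt{2\phi_F - V_{BS}} - \sqrt{2\phi_F})$, combined with a subthreshold swing S to translate VTH shifts into leakage. All parameter values below (VTH0, γ, 2φF, S) are hypothetical, chosen only to show the direction of the effect for an NMOSFET with its source at 0 V, so that VBS = VB.

```python
import math

# Hypothetical NMOSFET parameters (illustrative only).
VTH0 = 0.35       # zero-bias threshold voltage, V
GAMMA = 0.2       # body-effect coefficient, V^0.5
TWO_PHI_F = 0.8   # surface potential term 2*phi_F, V
S = 0.085         # subthreshold swing, V/decade

def vth(vb: float) -> float:
    """Threshold voltage vs. body voltage (source at 0 V, so V_BS = vb)."""
    return VTH0 + GAMMA * (math.sqrt(TWO_PHI_F - vb) - math.sqrt(TWO_PHI_F))

def relative_leakage(vb: float) -> float:
    """Subthreshold leakage relative to zero body bias."""
    return 10 ** (-(vth(vb) - VTH0) / S)

for label, vb in [("RBB", -0.5), ("zero bias", 0.0), ("FBB", 0.3)]:
    print(f"{label:9s} VB={vb:+.1f} V  VTH={vth(vb):.3f} V  "
          f"leakage x{relative_leakage(vb):.2f}")
```

The forward bias is kept small here because a large FBB would forward-bias the source/body junction.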


Delay Distribution of Logic

  • Logic paths exhibit different delays.
  • Critical paths must satisfy the clock rate constraint ⇒ the implementation must ensure that their gates are fast enough.
  • But what about the fast paths … can their intrinsic speed be converted into power reductions?

Source: LPDE’09/Ch 4

Low-Power Technique 2: Multi-VTH

  • Implement fast (non-critical) paths with slow transistors ⇒ use transistors with high VTH ⇒ leakage is reduced.

Source: LPDE’09/Ch 4
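A toy greedy version of multi-VTH assignment: swap gates to high VTH, leakiest first, as long as every path still meets the clock period. The library characteristics (high-VTH gates 30% slower, 10× less leaky), the gate data, and the `Gate`/`assign_high_vth` names are hypothetical; it also assumes each gate belongs to exactly one path, which real tools cannot assume.

```python
from dataclasses import dataclass

# Assumed library characteristics (hypothetical): a high-VTH version of a gate
# is 30% slower but leaks 10x less than its low-VTH version.
HVT_DELAY_FACTOR = 1.3
HVT_LEAKAGE_FACTOR = 0.1

@dataclass
class Gate:
    name: str
    delay_lvt: float    # delay with low-VTH transistors, ps
    leakage_lvt: float  # leakage with low-VTH transistors, nW
    high_vth: bool = False

    def delay(self) -> float:
        return self.delay_lvt * (HVT_DELAY_FACTOR if self.high_vth else 1.0)

    def leakage(self) -> float:
        return self.leakage_lvt * (HVT_LEAKAGE_FACTOR if self.high_vth else 1.0)

def assign_high_vth(paths: list[list[Gate]], t_clk: float) -> None:
    """Greedily swap gates to high VTH while every path still meets t_clk.

    Simplification: each gate belongs to exactly one path.
    """
    for path in paths:
        # Try the leakiest gates first: they give the largest savings per swap.
        for gate in sorted(path, key=lambda g: g.leakage_lvt, reverse=True):
            gate.high_vth = True
            if sum(g.delay() for g in path) > t_clk:
                gate.high_vth = False  # swap would violate timing: undo it

critical = [Gate("c1", 100, 5), Gate("c2", 100, 5)]
fast = [Gate("f1", 50, 5), Gate("f2", 50, 5)]
assign_high_vth([critical, fast], t_clk=200)
for g in critical + fast:
    print(g.name, "HVT" if g.high_vth else "LVT")
print("total leakage:", sum(g.leakage() for g in critical + fast), "nW")
```

The critical path has no slack and stays low-VTH; the fast path absorbs the slower high-VTH gates and its leakage drops tenfold.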

Match VDD to Performance Need

  • To first order, $t_d \propto 1/V_{DD}$ and $P \propto V_{DD}^2$.
  • Reduce VDD for circuits that are not timing critical.
  • How many different VDD levels should be used?

Source: LPDE’09/Ch 4
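Using exactly the first-order model above (delay ∝ 1/VDD, P ∝ VDD² at fixed frequency), the lowest usable supply for a path follows directly from its slack. A minimal sketch; the 1.2 V nominal supply, 1 ns clock period, and path delays are hypothetical.

```python
def min_vdd_for_slack(vdd_nom: float, t_path: float, t_clk: float) -> float:
    """Lowest VDD at which a path still meets t_clk, assuming delay ∝ 1/VDD."""
    # delay(V) = t_path * vdd_nom / V <= t_clk  =>  V >= vdd_nom * t_path / t_clk
    return vdd_nom * t_path / t_clk

def relative_power(vdd: float, vdd_nom: float) -> float:
    """Power relative to nominal, assuming P ∝ VDD^2 at fixed frequency."""
    return (vdd / vdd_nom) ** 2

vdd_nom = 1.2  # V, hypothetical nominal supply
for t_path in (1.0, 0.8, 0.5):  # ns, path delays at vdd_nom; t_clk = 1.0 ns
    v = min_vdd_for_slack(vdd_nom, t_path, t_clk=1.0)
    print(f"path {t_path} ns -> VDD {v:.2f} V, power x{relative_power(v, vdd_nom):.2f}")
```

The first-order model ignores VTH, so it overstates how far VDD can drop for paths with large slack; it also says nothing about the overhead of providing each additional supply level.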

Low-Power Technique 3: Multi-VDD

Dual-VDD ALU example from LPDE’09/Ch 4

Low-Power Technique 4: DVFS

  • Slack can be used for power reductions: Dynamic Voltage and Frequency Scaling.

Read more in CATPE’08/Ch 3
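A minimal DVFS policy sketch: for a task of known cycle count and deadline, pick the lowest-voltage operating point that still finishes in time. Since $E = P \cdot T = (\alpha C V_{DD}^2 f)(N_{cycles}/f) = \alpha C V_{DD}^2 N_{cycles}$, switching energy per task depends on VDD, not on f. The operating-point table and the effective switched capacitance are hypothetical.

```python
# Hypothetical operating points (VDD in volts, f in Hz), e.g. from a DVFS table.
OPERATING_POINTS = [(0.8, 400e6), (1.0, 800e6), (1.2, 1200e6)]

def pick_operating_point(cycles: float, deadline_s: float):
    """Return the lowest-voltage point whose frequency still meets the deadline."""
    feasible = [(v, f) for v, f in OPERATING_POINTS if cycles / f <= deadline_s]
    if not feasible:
        raise ValueError("deadline cannot be met at any operating point")
    return min(feasible)  # tuples compare by voltage first

def task_energy(cycles: float, vdd: float, alpha_c: float = 1e-9) -> float:
    """Switching energy alpha*C*VDD^2*cycles; alpha_c is an assumed effective capacitance."""
    return alpha_c * vdd ** 2 * cycles

cycles, deadline = 3e8, 0.5
v, f = pick_operating_point(cycles, deadline)
saving = task_energy(cycles, v) / task_energy(cycles, OPERATING_POINTS[-1][0])
print(f"run at {v} V / {f / 1e6:.0f} MHz, energy x{saving:.2f} vs. max point")
```

This sketch ignores leakage; with significant static power, running slower for longer also costs leakage energy, which limits how far it pays to scale down.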

Circuit Adaptation for DVFS

  • Aggressive VDD reduction causes timing violations!
  • Implement a feedback system that regulates speed and, in the process, also handles variations.

Read more in CATPE’08/Ch 3.5
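A minimal sketch of the feedback idea: lower VDD step by step while no timing errors are observed, and back off one step whenever an error is flagged. The error detector is modeled as a hypothetical callback with a fixed failure voltage; a real design must also detect and correct the errors themselves, which this sketch assumes is available.

```python
from typing import Callable

def adapt_vdd(has_timing_error: Callable[[float], bool],
              v_start: float, v_min: float, v_max: float,
              step: float, n_iter: int) -> float:
    """Closed-loop supply tuning: lower VDD while no timing errors are seen,
    raise it one step whenever an error is detected."""
    v = v_start
    for _ in range(n_iter):
        if has_timing_error(v):
            v = min(v + step, v_max)  # back off to restore timing margin
        else:
            v = max(v - step, v_min)  # probe for a lower, more efficient VDD
    return v

# Hypothetical die: paths start failing below 0.87 V. The controller does not
# know this value; it only observes the error signal. A slower die would
# simply settle at a higher VDD, which is how the loop absorbs variations.
settled = adapt_vdd(lambda v: v < 0.87, v_start=1.2, v_min=0.6, v_max=1.3,
                    step=0.05, n_iter=50)
print(f"settled near {settled:.2f} V")
```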

Example on Variations

  • Clock arrival times are hard to synchronize.
  • High clock rates or extremely compute-intensive code expose timing issues.

Source: POWER’11

Detection of Timing Failures

Source: ARM’11

Recap 1: Low-Power Techniques

  • Body biasing
  • Multi-VTH
  • Multi-VDD
  • DVFS

  • Operand isolation
  • Factoring
  • Encoding
  • Clock gating
  • Power gating

Low-Power Technique 5: Operand Isolation

[Figure: configurable CRC unit (CRC5, CRC8, CRC16, CRC32) built from distributed MUXes and distributed state registers, with mode select ModeSel[1:0], enable (Enb), clock (Clk), input, output, and feedback; bus widths of 3, 5, 8, 16, and 32 bits.]

  • Reduce Psw by gating input signals that are not needed.
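Operand isolation lowers α in the downstream logic by keeping its inputs still when the result is not used. A minimal sketch, assuming a hold-style isolation (the datapath keeps seeing its last used operand); the operand sequence, enables, and the `isolate` helper are hypothetical. AND-gate isolation, which forces unused inputs to zero, is an alternative style.

```python
def bit_toggles(values: list[int]) -> int:
    """Total number of bit flips between consecutive values on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))

def isolate(operands: list[int], enables: list[int]) -> list[int]:
    """Hold the operand seen by the datapath at its last used value
    whenever the enable says the result will not be needed."""
    seen, held = [], 0
    for x, en in zip(operands, enables):
        if en:
            held = x
        seen.append(held)
    return seen

operands = [0b1010, 0b0101, 0b1111, 0b0000, 0b0110]
enables = [1, 0, 0, 0, 1]  # result used only in cycles 0 and 4
print("without isolation:", bit_toggles(operands), "toggles")
print("with isolation:   ", bit_toggles(isolate(operands, enables)), "toggles")
```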

Multiplier Circuit for Variable Data Width

  • 1×16-bit mode ⇒ >60% power reduction.

Read more in CATPE’08/Ch 4.3

Low-Power Technique 6: Factoring

  • Reduce Psw by reordering gates to minimize the switched capacitance C in $P_{sw} = f \alpha C V_{DD}^2$.

Source: Synopsys
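A toy illustration of factoring, comparing $ab + ac$ with its factored form $a(b + c)$. It assumes every node has the same capacitance and every input transition is equally likely, and counts node toggles over all ordered pairs of input vectors as a crude proxy for switched capacitance. This is not how synthesis tools estimate power.

```python
from itertools import product

def node_toggles(node, inputs) -> int:
    """Count toggles of one node over all ordered pairs of input vectors
    (a crude, uniform-activity proxy for switched capacitance)."""
    values = [node(*x) for x in inputs]
    return sum(v1 != v2 for v1 in values for v2 in values)

inputs = list(product([0, 1], repeat=3))  # all (a, b, c) combinations

# Unfactored: out = a*b + a*c  -> internal nodes ab, ac, plus the output.
unfactored = [lambda a, b, c: a & b,
              lambda a, b, c: a & c,
              lambda a, b, c: (a & b) | (a & c)]
# Factored: out = a*(b + c)  -> internal node b+c, plus the output.
factored = [lambda a, b, c: b | c,
            lambda a, b, c: a & (b | c)]

print("unfactored toggles:", sum(node_toggles(n, inputs) for n in unfactored))
print("factored toggles:  ", sum(node_toggles(n, inputs) for n in factored))
```

Both forms compute the same function, but the factored one has one fewer internal node switching, so less capacitance is charged per input change.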


Low-Power Technique 7: Encoding

  • Encode data before bus transmission.
  • In the example, bus-invert coding reduces the number of transitions from 64 to 53.

Read more in LPDE’09/Ch 6 and CATPE’08/Ch 4.12
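A minimal sketch of bus-invert coding: if more than half of the data lines would toggle, send the inverted word instead and raise one extra "invert" line, so at most about half the lines switch per transfer. The 8-bit width and the word sequence are illustrative; the slide's 64-to-53 figure comes from its own data and is not reproduced here.

```python
def raw_transitions(words: list[int]) -> int:
    """Line transitions when words are sent unencoded (bus starts at 0)."""
    prev, total = 0, 0
    for w in words:
        total += bin(prev ^ w).count("1")
        prev = w
    return total

def bus_invert_encode(words: list[int], width: int):
    """Return the encoded (bus, invert) pairs and the total line transitions,
    counting the extra invert line as well."""
    mask = (1 << width) - 1
    bus, inv, total, encoded = 0, 0, 0, []
    for w in words:
        if bin(bus ^ w).count("1") > width // 2:
            new_bus, new_inv = ~w & mask, 1  # send inverted data
        else:
            new_bus, new_inv = w, 0          # send data as-is
        total += bin(bus ^ new_bus).count("1") + (inv ^ new_inv)
        bus, inv = new_bus, new_inv
        encoded.append((bus, inv))
    return encoded, total

def bus_invert_decode(encoded, width: int) -> list[int]:
    """Receiver side: undo the inversion when the invert line is high."""
    mask = (1 << width) - 1
    return [(b ^ mask) if inv else b for b, inv in encoded]

words = [0x00, 0xFF, 0x0F, 0xF0]
encoded, coded_transitions = bus_invert_encode(words, width=8)
print("unencoded: ", raw_transitions(words), "transitions")
print("bus-invert:", coded_transitions, "transitions")
```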

Low-Power Technique 8: Clock Gating

Source: LPDE’09/Ch 8

Recap 2: Low-Power Techniques

  • Body biasing
  • Multi-VTH
  • Multi-VDD
  • DVFS

  • Operand isolation
  • Factoring
  • Encoding
  • Clock gating
  • Power gating
  • Accelerators ⇒ reduce execution time + energy.

The Energy Perspective, 1


The Energy Perspective, 2

  • Accelerators reduce energy via execution time reductions (E = T × P).
  • But accelerators also increase circuit area and, probably, power dissipation.
  • Will the addition of an accelerator pay off? This depends on the application(s) that will be executed!
  • Can we do something about the power overhead?
    – Power gating ⇒ reduced static power.
    – Operand isolation/clock gating ⇒ reduced Psw.
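Whether an accelerator pays off can be sketched with E = T × P under a simple hypothetical model: a fraction of the original run time is sped up by a given factor (Amdahl's law), and the accelerator's presence raises average power by a fixed overhead over the whole run. All numbers below are illustrative assumptions.

```python
def energy_with_accelerator(t_base: float, p_base: float, frac_accel: float,
                            speedup: float, power_overhead: float) -> float:
    """Energy E = T * P after adding an accelerator (hypothetical model).

    frac_accel: fraction of the original run time that is accelerated.
    power_overhead: relative increase in average power, e.g. 0.3 = +30%.
    """
    t_new = t_base * ((1 - frac_accel) + frac_accel / speedup)
    p_new = p_base * (1 + power_overhead)
    return t_new * p_new

e_base = 1.0 * 1.0  # 1 s at 1 W, hypothetical baseline
for frac in (0.5, 0.1):
    e = energy_with_accelerator(1.0, 1.0, frac, speedup=4.0, power_overhead=0.3)
    verdict = "pays off" if e < e_base else "does not pay off"
    print(f"{frac:.0%} accelerable: E = {e:.3f} J -> {verdict}")
```

The same hardware saves energy for one application and wastes it for another. Power gating and operand isolation/clock gating attack the `power_overhead` term, widening the range of applications for which the accelerator pays off.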

Low-Power Technique 9: Power Gating

[Figure: logic circuit on a virtual supply rail, power switch, and power switch driver with an Enable input.]
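Power gating only saves energy if a block stays idle long enough: turning the power switch off and on (driving the switch, recharging the virtual supply rail) costs energy that must be repaid by avoided leakage. A minimal break-even sketch; the overhead energy, leakage per cycle, and idle periods are hypothetical, and wake-up latency, which also matters in practice, is ignored.

```python
def break_even_cycles(e_overhead: float, p_leak_saved: float) -> float:
    """Idle time (in cycles) after which gating saves more leakage energy
    than it costs to switch the block off and on again."""
    return e_overhead / p_leak_saved

def gated_savings(idle_periods: list[int], e_overhead: float,
                  p_leak_saved: float) -> float:
    """Net energy saved if only idle periods longer than break-even are gated."""
    t_be = break_even_cycles(e_overhead, p_leak_saved)
    return sum(p_leak_saved * t - e_overhead
               for t in idle_periods if t > t_be)

# Hypothetical numbers in arbitrary units: a gating event costs 50 units;
# the idle block would otherwise leak 1 unit per cycle.
idle = [10, 100, 200]  # idle stretches of a unit, e.g. a multiplier, in cycles
print("break-even:", break_even_cycles(50, 1), "cycles")
print("net savings:", gated_savings(idle, 50, 1), "units")
```

This is why identifying activity, e.g. how the multiplier is used in an application, matters: short idle periods are better left ungated.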


Power Gating of Execution Unit

Power-Gated FlexCore P&R

Power-Gated Multiplier

Identify Multiply Activity

Multiplier Utilization in Application

[Figure panels: EEMBC Autocorrelation; EEMBC FFT]

Overall Power Gains from Power Gating

  • Gray: post synthesis.
  • Black: post P&R.


Low-Power Options – Pros and Cons


Source: Cadence

Power Reductions in Design Flow

  • Early design decisions yield higher power reductions than late decisions.
  • It is important for system architects to know what low-power techniques can be used.

Missed Opportunities?

  • Low-power techniques exist, but how do we make use of them in complex systems?
  • Designer’s competence + EDA tools.

Conclusion

  • Quality of physical implementation is vital to power and energy efficiency.
  • There are different reasons for power dissipation and, thus, different techniques to reduce power.
  • Relevant areas I did not discuss:
    – Dedicated vs. flexible hardware: general-purpose systems are power hungry.
    – To reduce power dissipation, you need to be well informed: analysis is complex.

References

  • LPDE’09: Low Power Design Essentials, J. Rabaey, Springer, 2009.
  • ABB’02: “Adaptive Body Bias for Reducing Impacts of Die-to-Die and Within-Die Parameter Variations on Microprocessor Frequency and Leakage”, J. Tschanz et al., JSSC, Nov. 2002.
  • CATPE’08: Computer Architecture Techniques for Power-Efficiency, S. Kaxiras and M. Martonosi, Morgan & Claypool, 2008.
  • POWER’11: “POWER7™, a Highly Parallel, Scalable Multi-Core High End Server Processor”, D. F. Wendel et al., JSSC, 2011.
  • ARM’11: “A Power-Efficient 32 bit ARM Processor Using Timing-Error Detection and Correction for Transient-Error Tolerance and Adaptation to PVT Variation”, D. Bull et al., JSSC, 2011.
  • http://chipdesignmag.com/lpd/elmore/2011/07/21/top-5-reasons-for-power-consumption-waste/
  • And many local Chalmers papers.