The Elusive Metric for Low-Power Architecture Research




SLIDE 1

The Elusive Metric for Low-Power Architecture Research

Center for Experimental Research in Computer Systems
Georgia Institute of Technology
Atlanta, GA 30332

Workshop for Complexity-Effective Design, San Diego, CA, 2003

Hsien-Hsin "Sean" Lee, Joshua B. Fryman, A. Utku Diril, Yuvraj S. Dhillon

SLIDE 2

2 WCED-03

Background Picture

Energy-Delay product (EDP) [Gonzalez & Horowitz 96]
  • "Power" is meaningless (∝ frequency)
  • "Energy per instruction" is elusive (∝ CV²)
  • "Energy × Delay" (J/SPEC or J/IPC) is better
  • Use the alpha-power model for delay
  • Note that EDP has no "physical" meaning

Widespread adoption
  • De facto standard in the community
  • Metric for energy and complexity effectiveness

New architectural techniques have arrived
  • New hardware exploiting low-power opportunities
  • Temperature-aware power detectors
  • Voltage & frequency scaling
  • Multi-threshold voltage

Alpha-power model: $ED \propto \dfrac{C\,V_{dd}^{3}}{(V_{dd} - V_{th})^{\alpha}}$
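To see what the alpha-power expression implies, here is a small Python sketch. It assumes switching energy E ∝ V_dd² per operation and an alpha-power-law delay D ∝ V_dd/(V_dd − V_th)^α, with capacitance treated as a constant and dropped; the function names and the device values (V_th = 0.3 V, α = 1.5) are hypothetical. Setting the derivative of ln(ED) = 3 ln V_dd − α ln(V_dd − V_th) to zero gives V_dd = 3V_th/(3 − α) for 0 < α < 3, and a brute-force grid search should agree.

```python
"""Energy-delay product under the alpha-power delay model (illustrative sketch)."""


def energy(vdd: float) -> float:
    """Switching energy per operation, E ~ Vdd^2 (capacitance held constant)."""
    return vdd ** 2


def delay(vdd: float, vth: float, alpha: float) -> float:
    """Alpha-power-law gate delay, D ~ Vdd / (Vdd - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return vdd / (vdd - vth) ** alpha


def ed_product(vdd: float, vth: float, alpha: float) -> float:
    """ED ~ Vdd^3 / (Vdd - Vth)^alpha, up to a constant factor."""
    return energy(vdd) * delay(vdd, vth, alpha)


def ed_optimal_vdd(vth: float, alpha: float) -> float:
    """Closed-form minimizer from d/dVdd [3 ln Vdd - alpha ln(Vdd - Vth)] = 0."""
    if not 0.0 < alpha < 3.0:
        raise ValueError("closed form requires 0 < alpha < 3")
    return 3.0 * vth / (3.0 - alpha)


vth, alpha = 0.3, 1.5  # hypothetical device parameters
grid = [vth + 0.001 * i for i in range(1, 2001)]  # Vdd from just above Vth to 2.3 V
best = min(grid, key=lambda v: ed_product(v, vth, alpha))
print(f"closed form: {ed_optimal_vdd(vth, alpha):.3f} V, grid search: {best:.3f} V")
```

With α = 1.5 the closed form lands at 2V_th; the point is only that EDP rewards one particular supply voltage, not that this is a realistic operating point.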

SLIDE 3

Outline of the Talk

Potential pitfalls
  • Yeah, we all know, it is obvious… but
  • Which "E" goes in the ED product?
  • Impact of new hardware (more transistors)
  • Methodology matters in deep-submicron processes

Observations

Summary

SLIDE 4

Calculating ED Product

New architectural solutions save energy at the expense of a (supposedly tolerable) performance loss.

A number of research results were reported in the following manner:

Technique "X" for Data Cache
  • Reduces Data Cache energy by 50%
  • Loses 20% IPC
  • EDP = (1 − 0.5) × (1 + 0.2) = 0.60 ⇒ Very energy-efficient

Technique "Y" for Branch Predictor
  • Reduces Branch Predictor energy by 10%
  • Loses 20% IPC
  • EDP = (1 − 0.1) × (1 + 0.2) = 1.08 ⇒ Energy-inefficient
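The per-technique arithmetic can be reproduced with a tiny helper; the Python below is a hypothetical sketch (`local_edp` is our name). Following the slide, a 20% IPC loss is plugged in directly as a 20% delay increase (strictly, a 20% IPC drop lengthens run time by 1/0.8 − 1 = 25%). Both energy savings are relative to the unit alone, which is the "local" view the rest of the talk questions.

```python
def local_edp(energy_saved: float, delay_increase: float) -> float:
    """Normalized EDP computed the way the slide does: (1 - saved) * (1 + slowdown).

    Both inputs are fractions of the *unit's own* baseline (0.5 means 50%).
    A result below 1.0 reads as "energy-efficient" under this local view.
    """
    return (1.0 - energy_saved) * (1.0 + delay_increase)


# Technique "X": 50% of D-cache energy saved, 20% IPC lost (taken as +20% delay)
edp_x = local_edp(0.50, 0.20)
# Technique "Y": 10% of branch-predictor energy saved, same 20% IPC loss
edp_y = local_edp(0.10, 0.20)

print(f"X: {edp_x:.2f}  Y: {edp_y:.2f}")
```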

SLIDE 5

So What is E and What is D in EDP?

Hypothetical black box
  • Battery (i.e., E) shared by CPU, DRAM, chipsets, graphics, TFT, Wi-Fi, HDD, flash disk
  • D typically accounts for some system effects, such as DRAM latency

Proposed improvement:
  • Remove 5% of the flash disk's E
  • No delay incurred

Is this a good design decision?
  • Flash disk is 10% of total E in the system
  • The improvement amounts to a 0.5% system impact (10% × 5%)
  • An "in-the-noise" improvement: is the "complexity" worth the effort?

So, is EDP used in the right way? And is EDP that important?

[Figure: black-box system powered by a battery; components: flash, 802.11, Gfx card, C.S., DDR-DRAM, HDD, TFT Display]
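The flash-disk example is a one-line weighting: a component's saving counts toward system E only in proportion to that component's share. A hypothetical Python sketch, assuming the share of system energy and the local saving are both known:

```python
def system_energy_saving(unit_share: float, local_saving: float) -> float:
    """Fraction of *total* system energy saved when one component's energy drops.

    unit_share   -- component's share of total system energy (0..1)
    local_saving -- fraction of that component's own energy removed (0..1)
    """
    return unit_share * local_saving


def system_edp(unit_share: float, local_saving: float, delay_increase: float = 0.0) -> float:
    """Normalized system-level EDP: (1 - system saving) * (1 + delay increase)."""
    return (1.0 - system_energy_saving(unit_share, local_saving)) * (1.0 + delay_increase)


# Flash disk: 10% of system energy, 5% of its own energy removed, no delay penalty
saving = system_energy_saving(0.10, 0.05)
print(f"system energy saved: {saving:.1%}")              # 0.5%
print(f"system EDP: {system_edp(0.10, 0.05):.3f}")        # 0.995
```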

SLIDE 6

Energy Efficiency: E versus D

[Figure: maximum delay tolerance (log scale, 0.0001 to 100) vs. power distribution of a functional unit (FU) w.r.t. the target system (0.1 to 1); one curve per energy saving, Esaved = 99%, 90%, 58%, 50%, 30%, 10%, 5%]
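One way to read these curves, consistent with the filter-cache delay tolerances quoted later in the talk, is to require the system-level EDP to stay at or below 1. Writing f for the unit's share of system energy and ΔD for the fractional slowdown (notation assumed here, not taken from the slides):

```latex
\begin{aligned}
\mathrm{EDP}_{\mathrm{sys}} &= \bigl(1 - f\,E_{\mathrm{saved}}\bigr)\bigl(1 + \Delta D\bigr) \le 1 \\
\Longrightarrow\quad \Delta D_{\max} &= \frac{1}{1 - f\,E_{\mathrm{saved}}} - 1
  = \frac{f\,E_{\mathrm{saved}}}{1 - f\,E_{\mathrm{saved}}}
\end{aligned}
```

For a unit that is the whole system (f = 1) saving 99%, this gives ΔD_max = 0.99/0.01 = 99, the top of the plotted range; a unit with f = 0.1 saving 5% can afford only 0.005/0.995 ≈ 0.5%.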

SLIDE 7

Example: Energy Efficiency: E vs. D

[Figure: maximum delay tolerance (log scale, 0.0001 to 100) vs. energy distribution w.r.t. the target system (0.1 to 1); one curve per Esaved = 99%, 90%, 58%, 50%, 30%, 10%, 5%]

Tolerate ~25% performance loss

SLIDE 8

Using EDP: Pentium Pro

Data source: [Brooks et al. 00]
  • Assume the CPU is 100% of the system
  • A 40% IFU power reduction can tolerate < 10% performance loss

[Figure: maximum delay tolerance (0.02 to 0.3) vs. energy saved for a functional unit u (0.1 to 1); one curve per unit, labeled with its share of CPU power: IFU (22%), IEU (14%), ROB, DCU (11.1%), RS, FPU, Global Clock (7.9%), RAT, MOB (6.3%), BTB (4.7%)]
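Applying the break-even condition (1 − f·E_saved)(1 + ΔD) = 1 to the Pentium Pro data gives a quick check of the IFU claim. In this hypothetical Python sketch, the 22% IFU share is read from the legend above and the CPU is taken to be the whole system, as the slide assumes:

```python
def max_delay_tolerance(unit_share: float, energy_saved: float) -> float:
    """Largest fractional slowdown d that keeps system EDP <= 1.

    Solves (1 - unit_share * energy_saved) * (1 + d) = 1 for d.
    """
    s = unit_share * energy_saved  # fraction of system energy removed
    if not 0.0 <= s < 1.0:
        raise ValueError("unit_share * energy_saved must be in [0, 1)")
    return s / (1.0 - s)


# IFU is ~22% of Pentium Pro power; CPU assumed to be 100% of the system
ifu = max_delay_tolerance(0.22, 0.40)
print(f"IFU with 40% saving: tolerates up to {ifu:.1%} slowdown")
```

The result sits just under 10%, matching the slide's "< 10% performance loss".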

SLIDE 9

But CPU is not 100% of a System

[Figure: maximum delay tolerance vs. energy saving for a functional unit µ and the energy distribution of µ w.r.t. the CPU only; one surface per CPU share of the system, CPU = 100%, 75%, 50%, 25%]

SLIDE 10

Case Study: Filter Cache [Kin et al. 97,00]

The Filter Cache design as reported:
  • 58% energy savings in the "L1 caches"
  • 21% IPC degradation
  • The ED product as shown, (1 − 0.58)(1 + 0.21) << 1, suggests this is a winning design

The question is "which E?"

SLIDE 11

Filter Cache: E Values

Use the StrongARM SA-110: 43% (◊) of energy is consumed by the caches
  • 27% in the I-cache
  • 16% in the D-cache

"CPU = X%" stands for X% of overall power drawn by the CPU

Delay tolerance
  • 33% : CPU = 100%
  • 21% : CPU = 70%
  • 14% : CPU = 50%
  • 6% : CPU = 25%

Not energy-efficient if CPU < 70%

[Figure: maximum delay tolerance (0.1 to 1.4) vs. energy distribution for a functional unit u w.r.t. the CPU only (0.1 to 1); curves for CPU = 100%, 70%, 50%, 25%, with the FilterCache point for SA-110 (I$ + D$ = 43%)]

Esaved = 58% [Kin et al. 00]
FC slowdown: 21%
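The delay tolerances above can be recomputed from the break-even condition (1 − f·E_saved)(1 + ΔD) ≤ 1 with f = 0.43 × (CPU share). This Python sketch is our reconstruction, not necessarily the authors' exact method: it reproduces the 33%, 21%, and 14% tolerances to rounding and the efficiency boundary near CPU = 70%, while for CPU = 25% it gives about 6.6% rather than the 6% shown.

```python
def max_delay_tolerance(unit_share: float, energy_saved: float) -> float:
    """Largest fractional slowdown d with (1 - unit_share * energy_saved) * (1 + d) <= 1."""
    s = unit_share * energy_saved
    if not 0.0 <= s < 1.0:
        raise ValueError("unit_share * energy_saved must be in [0, 1)")
    return s / (1.0 - s)


CACHE_SHARE_OF_CPU = 0.43  # SA-110: I-cache 27% + D-cache 16% of CPU energy
FC_ENERGY_SAVED = 0.58     # reported filter-cache saving in the L1 caches
FC_SLOWDOWN = 0.21         # reported IPC degradation, taken as a 21% delay increase

local_edp = (1.0 - FC_ENERGY_SAVED) * (1.0 + FC_SLOWDOWN)  # the "as reported" EDP
print(f"local EDP as reported: {local_edp:.3f}")

results = {}
for cpu_share in (1.00, 0.70, 0.50, 0.25):
    share = CACHE_SHARE_OF_CPU * cpu_share  # caches' share of *system* energy
    tolerance = max_delay_tolerance(share, FC_ENERGY_SAVED)
    system_edp = (1.0 - share * FC_ENERGY_SAVED) * (1.0 + FC_SLOWDOWN)
    results[cpu_share] = (tolerance, system_edp)
    verdict = "efficient" if system_edp <= 1.0 else "NOT efficient"
    print(f"CPU={cpu_share:.0%}: tolerance {tolerance:.1%}, "
          f"system EDP {system_edp:.3f} -> {verdict}")
```

The same 21% slowdown that looks like a clear win locally (EDP ≈ 0.51) becomes a net loss once the caches are only a modest slice of system energy.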

SLIDE 12

Rethinking EDP: Switching Activity vs. New Hardware

  • Ignore leakage and short-circuit power
  • Dynamic switching power is dominant
  • The "E" then follows the relations below (a: average switching activity, T: transistor count, f: frequency, C_g: switched capacitance per transistor)

$P_{dyn} = a \cdot C \cdot f \cdot V_{dd}^{2} = a \cdot T \cdot C_g \cdot f \cdot V_{dd}^{2}$

$P_{dyn}^{ref} \ge P_{dyn}^{new} \;\Rightarrow\; a \cdot f \cdot T \ge (a + \Delta a) \cdot (f + \Delta f) \cdot (T + \Delta T)$ (with V_dd and C_g unchanged)

SLIDE 13

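ED Variables

The elegant ratio governing E:

$\dfrac{a}{a + \Delta a} \ge \left(1 + \dfrac{\Delta f}{f}\right)\left(1 + \dfrac{\Delta T}{T}\right) = 1 + \dfrac{\Delta f}{f} + \dfrac{\Delta T}{T} + \dfrac{\Delta f\,\Delta T}{f\,T}$

To include the application delay, D:

$\dfrac{a}{a + \Delta a} \ge \left(1 + \dfrac{\Delta f}{f}\right)\left(1 + \dfrac{\Delta T}{T}\right)\left(1 + \dfrac{\Delta D}{D}\right)^{2}$

These can be applied to macromodeling to determine the trade-off between transistor count and performance degradation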
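The trade-off can be evaluated directly once the condition is fixed. This Python sketch assumes the ED condition has the form a/(a + Δa) ≥ (1 + Δf/f)(1 + ΔT/T)(1 + ΔD/D)², i.e. E·D ∝ a·T·f·D² with V_dd and C_g held fixed; that form is our reading of this slide, and the function names are hypothetical. A negative result means the switching reduction cannot pay for the requested change.

```python
import math


def _budget(switching_reduction: float) -> float:
    """a / (a + da) when the new design cuts average switching activity by this fraction."""
    if not 0.0 <= switching_reduction < 1.0:
        raise ValueError("switching_reduction must be in [0, 1)")
    return 1.0 / (1.0 - switching_reduction)


def max_transistor_increase(switching_reduction, freq_increase=0.0, delay_increase=0.0):
    """Largest dT/T satisfying a/(a+da) >= (1+df/f)(1+dT/T)(1+dD/D)^2."""
    denom = (1.0 + freq_increase) * (1.0 + delay_increase) ** 2
    return _budget(switching_reduction) / denom - 1.0


def max_delay_increase(switching_reduction, transistor_increase=0.0, freq_increase=0.0):
    """Largest dD/D satisfying the same condition."""
    denom = (1.0 + transistor_increase) * (1.0 + freq_increase)
    return math.sqrt(_budget(switching_reduction) / denom) - 1.0


for cut in (0.10, 0.25, 0.30):
    print(f"{cut:.0%} switching reduced: "
          f"dT/T <= {max_transistor_increase(cut):.1%} (no freq or delay change), "
          f"dD/D <= {max_delay_increase(cut):.1%} (no extra transistors)")
```

Passing a nonzero `freq_increase` with `delay_increase=0.0` corresponds to recovering the delay by frequency scaling; passing `delay_increase` with `freq_increase=0.0` corresponds to trading transistors against delay at a fixed clock.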

SLIDE 14

Impact of Additional Transistor Count

[Figure: two panels for a new architecture with a given average switching probability, one curve each for 30%, 25%, and 10% switching reduced. LHS: trading transistors with delay given no frequency scaling (% impact on T, given frequency unchanged, vs. % impact on D). RHS: delay recovered by frequency scaling (% impact on T, given delay unchanged by frequency scaling, vs. % impact on f).]

SLIDE 15

Role of Leakage Energy

As the Deep Sub-Micron (DSM) era is upon us...
  • Source: Intel Corp., Custom Integrated Circuits Conference 2002: more than 50% of power from leakage
  • Ignoring leakage could reverse the conclusions of early architecture evaluation
  • Leakage cannot be isolated from switching during evaluation
  • Additional HW can be harmful

SLIDE 16

Evaluate the Leakage When Adding HW in the Early Stage of Architecture Definition

Example: Dual-speed pipeline [Pyreddy and Tyson '01]
  • The idea appears plausible
  • Identify critical instructions [Tune et al. 01] [Seng et al. 01]
  • Two datapaths: fast and slow
  • Critical instructions → fast pipe; the remainder → slow pipe
  • The slow pipe consumes less E than the fast pipe, e.g., via multi-voltage supply or lower frequency

Let's evaluate, assuming:
  • N instructions: x → slow datapath, (N − x) → fast datapath
  • How does leakage impact efficiency? What value of x achieves energy efficiency?

[Figure: slow and fast datapaths; x% of instructions (non-critical) go to the slow path, the remaining (100 − x)% (critical) to the fast path]
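The slides do not spell out the energy model behind the following curves, so the Python sketch below is a hypothetical accounting only, not the talk's model, and its two ratios are not the talk's r. It assumes a baseline single fast datapath whose energy is normalized to 1 (static share s), a slow datapath whose per-instruction dynamic energy and leakage are fixed fractions ρ_dyn and ρ_leak of the fast one's, unchanged run time, and a fast datapath that keeps leaking in full. Breaking even then requires steering at least x = s·ρ_leak / ((1 − s)(1 − ρ_dyn)) of the instructions to the slow path, a threshold that grows quickly as leakage takes a larger share of total energy.

```python
def energy_ratio(slow_fraction, static_share, slow_dyn_ratio, slow_leak_ratio):
    """E_dual / E_baseline for a hypothetical fast + slow dual datapath.

    Baseline: one fast datapath, energy normalized to 1, split into dynamic
    (1 - static_share) and static (static_share) parts.
    slow_fraction   -- fraction of instructions steered to the slow datapath
    slow_dyn_ratio  -- slow/fast dynamic energy per instruction (assumed)
    slow_leak_ratio -- slow/fast leakage energy (assumed); the fast datapath
                       still leaks fully and run time is assumed unchanged
    """
    dynamic = (1.0 - static_share) * ((1.0 - slow_fraction) + slow_fraction * slow_dyn_ratio)
    static = static_share * (1.0 + slow_leak_ratio)
    return dynamic + static


def break_even_fraction(static_share, slow_dyn_ratio, slow_leak_ratio):
    """Smallest slow_fraction with energy_ratio <= 1; a value above 1 means never."""
    if not (0.0 <= static_share < 1.0 and 0.0 <= slow_dyn_ratio < 1.0):
        raise ValueError("need static_share < 1 and slow_dyn_ratio < 1")
    return (static_share * slow_leak_ratio) / ((1.0 - static_share) * (1.0 - slow_dyn_ratio))


for s in (0.01, 0.20, 0.33, 0.50):
    x = break_even_fraction(s, slow_dyn_ratio=0.5, slow_leak_ratio=0.5)
    print(f"static-to-total {s:.0%}: steer >= {x:.1%} of instructions to break even")
```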

SLIDE 17

Dual Datapath Leakage Impact

[Figure: minimum fraction of instructions sent to the slow datapath (0.05 to 0.5) vs. static-to-total energy ratio (0.1 to 1); one curve per r = 0.9, 0.75, 0.60, 0.5, 0.4, 0.2; "Today" and "Soon to be" marked on the x-axis]

  • "r" is the power ratio of the slow vs. the fast datapath
  • A small r ⇒ impaired performance: the slow path becomes the critical path

SLIDE 18

Dual Datapath Leakage Impact

[Figure: minimum fraction of instructions sent to the slow datapath (0.05 to 0.5) vs. static-to-total energy ratio (0.1 to 1); one curve per r = 0.9, 0.75, 0.60, 0.5, 0.4, 0.2; "Today" and "Soon to be" marked on the x-axis]

  • "r" is the power ratio of the slow vs. the fast datapath
  • A small r ⇒ impaired performance: the slow path becomes the critical path
  • % of non-critical instructions needed for the slow datapath: Today ~17%, Soon ~40%

SLIDE 19

Energy Savings vs. # Inst of Slow Path

[Figure: two panels, r = 75% and r = 50%. X-axis: % of instructions sent to the non-critical datapath; Y-axis: % energy saved; one curve per static-to-total ratio = 1%, 20%, 33%, 50%, 67%, 75%]

If 30% of instructions are sent to the non-critical datapath:
  • Only ~5% energy is saved (savings only on the datapath) in DSM for r = 75%
  • More energy is consumed in DSM for r = 50%

Does the extra complexity pay off?

SLIDE 20

Observations

  • It is insufficient to examine the ED product on a microscale; the entire system must be examined
  • Adding HW complexity for low energy needs to be evaluated thoroughly
  • If the target process is not DSM, the ED product can be examined via simplified ratio analysis
  • For a DSM process:
    • Leakage must be accounted for in both local and system E
    • Additional HW could be overkill

SLIDE 21

Summary

Low-power architecture research:
  • Metric: could be elusive
  • Methodology: more susceptible to reversed conclusions than performance research, if not meticulously applied
  • 2nd-order effect today ⇒ 1st-order effect tomorrow
  • "Complexity" can be ineffective in energy reduction

Purposes of our study:
  • Provide analytical models and methodology for early evaluation
  • No intention to invalidate prior results
  • WCED ≠ WDDD
  • Raise more discussion, to get it right in education

SLIDE 22

That's All Folks!