slide-1
SLIDE 1

The imagination driving Australia’s ICT fu

RUN TIME PREDICTION OF PERFORMANCE AND ENERGY WHEN FREQUENCY SCALING

David Snowdon, Stefan Petters and Gernot Heiser David.Snowdon@nicta.com.au

slide-2
SLIDE 2

➀ Motivation: problems with DVFS
➁ Modelling performance and energy
➂ Evaluation
➃ Future work

slide-3
SLIDE 3

MOTIVATION

➜ Embedded systems are often restricted by battery life.
➜ What matters is total system energy consumption.

Our work looks at effective DVFS in real systems

slide-4
SLIDE 4

MOTIVATION

Theory: $E \propto V^2$

[Figure: normalised total energy vs. CPU frequency (MHz)]

slide-5
SLIDE 5

MOTIVATION

Practice (PXA255-based system):

[Figure: normalised total energy vs. CPU frequency (MHz)]

slide-8
SLIDE 8

MOTIVATION

Why? Simple models:

➜ $P \propto f V^2$
➜ $T \propto \frac{1}{f}$
➜ $V = F(f)$, with $F$ monotonically increasing

Modern systems aren’t simple!

➜ Varying number of switches (workload specific!)
➜ Multiple frequency domains
➜ Frequency-independent (static) power

We want to be able to deal with these nuances
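
The "Theory" curve shown earlier follows from multiplying the simple power and time relations. A short worked derivation, assuming a fixed cycle count so that execution time scales exactly as $1/f$:

```latex
E = P \cdot T \;\propto\; f V^2 \cdot \frac{1}{f} \;=\; V^2
```

Because $V = F(f)$ is monotonically increasing, this simple model predicts that energy always falls as the frequency is lowered. The nuances listed above (workload-specific switching, several frequency domains, static power) are exactly the terms this derivation leaves out.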

slide-10
SLIDE 10

EXECUTION TIME MODEL

➜ Simple execution time model: $T \propto \frac{1}{f_{cpu}}$
➜ i.e. constant cycles
➜ Problem: ignores execution time that is independent of the CPU clock

[Figure: normalised cycles vs. CPU frequency (MHz), two panels: bitcnt and gzip]

Implication:

➜ Memory-bound performance is less dependent on CPU frequency

slide-14
SLIDE 14

EXECUTION TIME MODEL

➜ Task: predict the execution time of a workload in an arbitrary system configuration
➜ Low overhead, cross-architectural, dynamic applications

$$T = \frac{C_{cpu}}{f_{cpu}} + \frac{C_{bus}}{f_{bus}} + \frac{C_{mem}}{f_{mem}} + \frac{C_{io}}{f_{io}} + \dots$$

$C_x$: characterise a priori, or online using performance counters

$$C_{bus} = \alpha_1 PMC_1 + \alpha_2 PMC_2 + \dots$$

$$C_{mem} = \beta_1 PMC_1 + \beta_2 PMC_2 + \dots$$

($C_{cpu}$ inferred from the other results)

Prediction error:

➜ 2-parameter: avg 1.7%, max 7%
➜ CPU frequency only: avg 10%, max 36%
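
The multi-domain time model above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `predict_time` and the per-domain cycle counts are hypothetical, and only the two frequency setpoints (numbers 8 and 18 in the deck's setpoint table) come from the slides.

```python
def predict_time(cycles, freqs_mhz):
    """Predicted run time in seconds: sum over clock domains of C_x / f_x."""
    return sum(cycles[d] / (freqs_mhz[d] * 1e6) for d in cycles)


# Hypothetical cycle counts per clock domain (illustrative, not measured data).
cycles = {"cpu": 300e6, "bus": 20e6, "mem": 40e6}

# Two setpoints from the deck's table, in MHz: same bus/memory clocks,
# CPU clock doubled.
slow = {"cpu": 199.064, "bus": 99.531, "mem": 99.531}
fast = {"cpu": 398.131, "bus": 99.531, "mem": 99.531}

t_slow = predict_time(cycles, slow)
t_fast = predict_time(cycles, fast)

# Speedup predicted by the multi-domain model versus the CPU-only model.
speedup = t_slow / t_fast
cpu_only_speedup = fast["cpu"] / slow["cpu"]

print(f"multi-domain speedup: {speedup:.2f}")
print(f"CPU-only speedup:     {cpu_only_speedup:.2f}")
```

With any non-zero bus or memory cycle count, the multi-domain speedup is smaller than the CPU-only one, which is the effect the memory-bound benchmark illustrates.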

slide-16
SLIDE 16

POWER MODEL

➜ Simple CMOS model: $P \propto f V^2$

Problems:

➜ System power
➜ Static power and leakage
➜ Multiple frequency/voltage domains
➜ Temperature dependence
➜ Conversion inefficiencies

A (slightly more) realistic model:

$$P = \sum_{n=0}^{N} C_n f_n V_n^2 + P_{static}$$

slide-17
SLIDE 17

POWER MODEL

The interaction of run-time and static power:

➜ Dynamic energy increases as frequency increases
➜ Static energy decreases as frequency increases

$$E_{total} = P_{dyn}\Delta t + P_{static}\Delta t$$

[Figure: energy vs. CPU frequency, showing $E_{total}(f)$, $E_{dyn}(f)$ and $E_{static}(f)$]
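
The trade-off between dynamic and static energy can be made concrete with a small sketch. Everything numeric here is an assumption chosen for illustration: the linear voltage curve in `voltage`, the effective capacitance `c_eff`, the static power `p_static` and the cycle count are hypothetical, and the workload is treated as having a fixed cycle count.

```python
def voltage(f_mhz):
    """Hypothetical monotonically increasing V = F(f): 1.0 V at 100 MHz, 1.5 V at 400 MHz."""
    return 1.0 + 0.5 * (f_mhz - 100.0) / 300.0


def energy(f_mhz, cycles=400e6, c_eff=1e-9, p_static=0.5):
    """Return (E_dyn, E_static, E_total) in joules for a fixed-cycle workload."""
    f_hz = f_mhz * 1e6
    dt = cycles / f_hz                        # run time shrinks as f grows
    p_dyn = c_eff * f_hz * voltage(f_mhz) ** 2  # P_dyn = C f V^2
    e_dyn = p_dyn * dt                        # grows with f because V grows
    e_static = p_static * dt                  # shrinks with f because dt shrinks
    return e_dyn, e_static, e_dyn + e_static


freqs = [100, 150, 200, 250, 300, 350, 400]
totals = {f: energy(f)[2] for f in freqs}
best = min(totals, key=totals.get)

for f in freqs:
    e_dyn, e_static, e_total = energy(f)
    print(f"{f:3d} MHz  dyn={e_dyn:.3f} J  static={e_static:.3f} J  total={e_total:.3f} J")
print("lowest total energy at", best, "MHz")
```

With these made-up parameters the minimum total energy falls at an intermediate frequency rather than at the lowest one, which is why the static term cannot be ignored when choosing a setpoint.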

slide-20
SLIDE 20

POWER MODEL

Power/Energy model principles:

➜ Events each use an amount of energy
➜ An event may use energy in more than one voltage domain
➜ Clock cycles count as events
➜ Static power models power not related to events or voltages.
➜ Constant IO power for the benchmarks tested.

For our system:

$$E_{events} = V_{cpu}^2(\alpha_0 PMC_0 + \dots + \alpha_m PMC_m) + \beta_0 PMC_0 + \dots + \beta_m PMC_m$$

$$E_{freqs} = V_{cpu}^2(\gamma_1 f_{cpu} + \gamma_2 f_{bus} + \gamma_3 f_{mem})\Delta t + (\gamma_4 f_{cpu} + \gamma_5 f_{bus} + \gamma_6 f_{mem})\Delta t$$

$$E_{static} = P_{static}\Delta t$$
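
The three energy terms above can be evaluated as follows. This is a sketch under two assumptions: that the total estimate is the plain sum of the three terms, and that the coefficients have already been fitted. The function name `estimate_energy` and all numeric values are hypothetical.

```python
def estimate_energy(v_cpu, pmc, freqs, dt, alpha, beta, gamma, p_static):
    """Estimate energy as E_events + E_freqs + E_static (assumed to add up)."""
    v2 = v_cpu ** 2

    # Event energy: a voltage-dependent part and a voltage-independent part.
    e_events = v2 * sum(a * c for a, c in zip(alpha, pmc)) \
        + sum(b * c for b, c in zip(beta, pmc))

    # Clock-cycle energy: clock cycles count as events in each domain.
    f_cpu, f_bus, f_mem = freqs
    g1, g2, g3, g4, g5, g6 = gamma
    e_freqs = v2 * (g1 * f_cpu + g2 * f_bus + g3 * f_mem) * dt \
        + (g4 * f_cpu + g5 * f_bus + g6 * f_mem) * dt

    # Static energy: power unrelated to events or voltages.
    e_static = p_static * dt
    return e_events + e_freqs + e_static


# Hypothetical inputs: two counter values, made-up coefficients.
example = estimate_energy(
    v_cpu=1.3,
    pmc=[2.0e6, 5.0e5],
    freqs=(398.131e6, 99.531e6, 99.531e6),
    dt=1.5,
    alpha=[1.0e-9, 4.0e-9],
    beta=[2.0e-9, 1.0e-8],
    gamma=(3.0e-10, 1.0e-10, 1.0e-10, 5.0e-11, 2.0e-10, 2.0e-10),
    p_static=0.25,
)
print(f"estimated energy: {example:.3f} J")
```

Keeping the coefficient lists as arguments means the same function serves whichever counters the parameter selection step ends up choosing.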

slide-21
SLIDE 21

POWER MODEL

Parameter selection:

➜ Systematically picking the best model for N counters
➜ Least-squares regression finds the coefficients
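
One way to realise the two steps above is an exhaustive search over counter subsets, fitting each candidate by least squares. The slides do not say how the search was performed, so exhaustive enumeration is an assumption here, and the helper names (`solve`, `least_squares`, `best_subset`) and the toy data are hypothetical.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares(rows, y):
    """Fit y ~ rows . coef through the normal equations (X^T X) coef = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def best_subset(features, y, n):
    """Try every n-feature subset (plus an intercept); keep the lowest squared error."""
    best = None
    for subset in combinations(sorted(features), n):
        rows = [[1.0] + [features[name][i] for name in subset] for i in range(len(y))]
        coef = least_squares(rows, y)
        sse = sum((sum(c * v for c, v in zip(coef, r)) - t) ** 2
                  for r, t in zip(rows, y))
        if best is None or sse < best[0]:
            best = (sse, subset, coef)
    return best


# Toy data: the target depends only on the hypothetical counter "pmc_a".
features = {
    "pmc_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "pmc_b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "pmc_c": [5.0, 3.0, 8.0, 1.0, 2.0, 7.0],
}
y = [2.0 + 3.0 * a for a in features["pmc_a"]]

sse, subset, coef = best_subset(features, y, n=1)
print("selected:", subset, "coefficients:", [round(c, 3) for c in coef])
```

The normal-equation solver keeps the sketch dependency-free; it does not guard against collinear features, so a real calibration run would likely use a numerically sturdier least-squares routine.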

slide-24
SLIDE 24

EVALUATION

➜ Typical embedded platform (PLEB 2, XScale based)
➜ Cycle counter, 2 performance counters, 13 events
➜ 37 benchmarks, each run to completion at every frequency setpoint
➜ 22 frequency setpoints with different $f_{cpu}$, $f_{bus}$ and $f_{mem}$
➜ Voltage varied to three settings for each frequency
➜ Measurements: cycles, frequencies, performance counters, energy
➜ Benchmarks were partitioned for calibration and validation

slide-25
SLIDE 25

EVALUATION

[Figure: parameter selection. R² of the best model at each size (0.92, 0.96, 0.98, 0.98, then 0.99, rising to 1) against the candidate terms: (Intercept), v*v*fcpu, v*fcpu, fcpu, v*v*fbus, fbus, v*v*fmem, fmem, and v*v*PMCn/t and PMCn/t for n = 0 to 13]

slide-26
SLIDE 26

EVALUATION

Power error data (using a measured run-time):

Counters  Param.  R2     Max Err (%)  Avg Err (%)
1         4       0.983  7.5          2.1
2         5       0.987  6.9          2.3
3         6       0.990  4.8          1.3
4         7       0.992  3.8          1.2
5         8       0.993  3.7          0.9
6         9       0.994  2.9          0.9
6         11      0.995  2.7          0.8

slide-27
SLIDE 27

EVALUATION

Combining with our performance model:

➜ Energy estimation error: max 4.9%, avg 1.5% (four counters)

slide-29
SLIDE 29

CONCLUSIONS

Conclusions:

➜ A time estimate is essential for an energy estimate
➜ Our method: low overhead, accurate, generally applicable
➜ Parameter selection is a useful tool for analysing power
➜ Energy can be saved by changing the memory/bus frequency

Future work:

➜ Temperature, IO, interrupts and DMA are unaccounted for.
➜ Investigate the effectiveness for other platforms.
➜ Develop frequency switching policies:
    ➜ Model frequency switching overheads, sleep state energy
    ➜ Operating system and application knowledge

slide-30
SLIDE 30

QUESTIONS?

David.Snowdon@nicta.com.au http://ertos.nicta.com.au

slide-31
SLIDE 31

AVAILABLE PMC EVENTS

PMC   Description
0x0   ICache miss
0x1   ICache stall cycles
0x2   Data dependency stalls
0x3   ITLB miss
0x4   DTLB miss
0x5   Branch instruction executed
0x6   Branch mispredicted
0x7   Instruction executed
0x8   DCache buffer stall cycles
0x9   DCache buffer stall
0xa   DCache access
0xb   DCache miss
0xc   DCache write-back
0xd   Software changed the PC

slide-32
SLIDE 32

AVAILABLE FREQUENCY SETPOINTS

Setpoint  fcpu (MHz)  fbus (MHz)  fmem (MHz)
1         99.531      49.766      99.531
2         117.964     58.981      117.964
3         132.710     66.354      132.710
4         149.299     49.766      99.531
5         176.946     58.981      117.964
6         199.064     49.766      99.531
7         199.064     66.354      132.710
8         199.064     99.531      99.531
9         235.929     58.981      117.964
10        235.929     117.964     117.964
11        265.420     66.354      132.710
12        265.420     132.710     132.710
13        298.598     49.766      99.531
14        298.598     99.531      99.531
15        353.894     58.981      117.964
16        353.894     117.964     117.964
17        398.131     66.354      132.710
18        398.131     99.531      99.531
19        398.131     132.710     132.710
20        398.131     199.064     99.531
21        471.858     117.964     117.964
22        471.858     235.929     117.964

slide-33
SLIDE 33

ERROR DISTRIBUTION

[Figure: energy estimation error (%) vs. actual energy (J)]

slide-34
SLIDE 34

FIXED MEMORY FREQUENCY

[Figure: normalised time vs. CPU core frequency (MHz), with the bus fixed at 66.4 MHz and memory at 132.7 MHz]