From imagination to impact
David Snowdon
david.skellern@nicta.com.au
Saturday, 4 April 2009
slide-1
SLIDE 1
  • From imagination to impact

david.skellern@nicta.com.au

Saturday, 4 April 2009
slide-3
SLIDE 3

Koala

David Snowdon, Etienne Le Sueur, Stefan Petters and Gernot Heiser
A platform for Operating-System-Level Power Management

Saturday, 4 April 2009
* This is a talk about Koala -- a platform that sets aside heuristic-based power-management techniques and instead uses empirical models to allow real trade-offs between reduced performance and energy savings. It solves a serious problem facing power-management researchers -- that platforms don’t behave the way they’re supposed to!
slide-7
SLIDE 7

Talk outline

  • 1. Energy is really important!
  • 2. PM is really hard.
  • 3. Koala helps... how? Workload-aware, realistic models, practical policies.

Saturday, 4 April 2009
  • Hardware is really complicated
  • Over-simplifying assumptions
Koala is workload-aware, uses realistic models and has practical policies. Energy is different from power: you need to save energy, but you need to manage the power. Energy = Power x Time. Koala manages CPU and memory energy.
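The Energy = Power x Time point in the notes can be made concrete with a tiny numeric sketch. All wattages and runtimes below are invented for illustration:

```python
# Energy = Power x Time: a lower-power setting only saves energy if it
# does not stretch the runtime too much.

def energy_joules(power_watts: float, time_seconds: float) -> float:
    """Energy consumed at constant power over a period."""
    return power_watts * time_seconds

# Fast setting: 20 W for 10 s; slow setting: 12 W for 18 s.
fast = energy_joules(20.0, 10.0)   # 200 J
slow = energy_joules(12.0, 18.0)   # 216 J

# The "low power" setting uses MORE energy here, because the time grew
# faster than the power shrank.
assert slow > fast
```

This is exactly why managing power and saving energy are different goals: the product, not either factor alone, is what drains the battery.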
slide-8
SLIDE 8

The importance of energy...

Saturday, 4 April 2009
  • Energy efficiency is really important!
  • Each of you probably has a mobile phone in your pocket, and in this crowd, they’re probably smartphones.
  • Mobile devices are energy-conscious for two reasons:
  • Thermal dissipation -- the devices are small and don’t have space for heatsinks/fans.
  • Battery lifetime -- power limits the number of operations that can be performed, which limits potential applications. What about the cost of energy?
  • Using energy has both an environmental and a monetary impact.
  • A server has about the same CO2 emissions as 1.5 cars! (\cite{Reduce Energy Costs and Go Green With VMware Green IT Solutions})
  • Energy Star compliance has become a big issue.
  • VMware: In the United States alone, datacenters consumed $4.5 billion worth of electricity in 2006.
  • VMware: 4 tons of CO2 per server per year.
For all of these reasons we consider energy efficiency to be one of the premier problems in computer science and engineering.
slide-18
SLIDE 18

Power Management

  • Power management:

– Controlling hardware knobs ➡ Performance vs. power

Sleep States, Dynamic Cache Sizing, Frequency/Voltage Scaling (DVFS)

  • Power management practice

– Linux ondemand: keep utilisation high, but not too high.
– cpuidle menu: choose progressively lower sleep states.

Saturday, 4 April 2009
  • Power management is really all about controlling power-related hardware knobs in order to achieve some goal.
  • Some of those knobs are sleep states, dynamic cache sizing, and DVFS.
  • These knobs trade performance against power.
  • To limit our scope, we’re looking at one of these hardware-controlled knobs -- DVFS -- but there’s no reason that, in the future, this approach couldn’t be applied to other knobs which affect power/performance.
  • These knobs are normally controlled in naive ways: in Linux, for example, there are two main CPU power-management schemes.
  • ondemand applies to DVFS. In academic terms, this is based on Mark Weiser’s 1994 OSDI paper. That work was good, and applied well to systems at the time, but modern computers don’t work in the same way.
  • But there is so much academic research! Why doesn’t it ever get used? Answer: it could be, it just needs to be made practical. The answer is Koala. Koala bridges the gap between the real world and the academic world.
  • Why are we using 1994 technology to run computers in 2009? They’re simply not the same devices that they were.
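The ondemand behaviour on the slide ("keep utilisation high, but not too high") can be sketched as a toy threshold governor. The frequency table and thresholds below are invented for illustration; they are not the kernel's actual values:

```python
# A toy utilisation-threshold DVFS governor in the spirit of Linux's
# "ondemand": jump to the top frequency when utilisation is high, step
# down one level when it is low.

FREQS_MHZ = [600, 800, 1000, 1400, 1800]  # hypothetical P-states
UP_THRESHOLD = 0.80    # go to max speed above this utilisation
DOWN_THRESHOLD = 0.30  # step down below this

def next_freq(current_mhz: int, utilisation: float) -> int:
    """Pick the next CPU frequency from recent utilisation (0.0-1.0)."""
    if utilisation > UP_THRESHOLD:
        return FREQS_MHZ[-1]                 # race to the top
    if utilisation < DOWN_THRESHOLD:
        i = FREQS_MHZ.index(current_mhz)
        return FREQS_MHZ[max(0, i - 1)]      # one step down
    return current_mhz                       # stay put

# A busy system jumps straight to 1800 MHz; an idle one steps down.
assert next_freq(600, 0.95) == 1800
assert next_freq(1800, 0.10) == 1400
```

Note what this policy never consults: whether the workload is CPU- or memory-bound, or what the frequency change does to energy -- which is precisely the gap the talk is pointing at.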
slide-20
SLIDE 20

Power Management

  • Power management:

– Controlling hardware knobs ➡ Performance vs. power

Sleep States, Dynamic Cache Sizing, Frequency/Voltage Scaling (DVFS)

  • Power management theory

– Martin: battery nonlinearities.
– Optimal scheduling.
– Highly refined speed setting.

Why aren’t these techniques used?

Saturday, 4 April 2009
slide-21
SLIDE 21

Dynamic Voltage and Frequency Scaling

  • Reduce performance, lower the power
  • Assumption: constant cycles
  • Assumption:

V_min ∝ f,  P ∝ f·V²,  T ∝ 1/f  ⟹  E = P·T ∝ f²

Saturday, 4 April 2009
* Both real-world systems and lots of research assume some fairly simple models.
* These assumptions are good at the gate level, but don’t work for complex systems where, on each cycle, the gates perform different tasks. They ignore static power, memory, and other effects.
* Koala allows you to manage modern systems, which have more complicated models.
* I’m going to show a summary of the experiments we ran to investigate these assumptions. For real detail, see the paper. Need to be explicit that T is time, not temperature.
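The classical proportionalities can be checked numerically. The sketch below simply encodes the assumptions V_min ∝ f, P ∝ f·V² and T ∝ 1/f (constant cycles), and shows that energy then scales as f²; the constants are arbitrary, since only ratios matter:

```python
# The gate-level scaling argument, made concrete. Under V_min ∝ f,
# P ∝ f·V² and T ∝ 1/f, energy E = P·T ∝ f².

def energy_ratio(f_ratio: float) -> float:
    """Relative energy when frequency is scaled by f_ratio."""
    v = f_ratio           # V_min ∝ f
    p = f_ratio * v ** 2  # P ∝ f·V²  (so P ∝ f³)
    t = 1.0 / f_ratio     # T ∝ 1/f   (constant-cycles assumption)
    return p * t          # E = P·T   (so E ∝ f²)

# Halving the frequency quarters the energy under this model...
assert abs(energy_ratio(0.5) - 0.25) < 1e-9
# ...which is exactly the prediction the measurements below put to the test.
```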
slide-27
SLIDE 27

Performance

[Figure: Cycles ratio vs. CPU frequency (600-1800 MHz) on a Dell Latitude D600, comparing the assumed constant-cycles model with measured GZIP (CPU-bound) and SWIM (memory-bound) workloads.]

Saturday, 4 April 2009
* Let’s look at performance. The commonly assumed models suggest that the number of CPU cycles for a workload is constant across frequency changes -- doubling the CPU frequency halves the execution time.
* Looking at a CPU-bound benchmark, this is indeed the case! The number of cycles stays nearly constant as the CPU frequency is increased. So far so good.
* But let’s look at a memory-bound program. The performance of memory isn’t improved by increased CPU frequency, so a memory-bound workload doesn’t really benefit from increased frequency. Therefore the number of cycles increases as the CPU clock runs faster.
* Those cycles use extra energy, and the CPU voltage must be increased to support the higher frequency.
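One way to capture the CPU-bound vs. memory-bound behaviour in the notes is a simple two-term runtime model. This split is an illustrative assumption of this sketch, not the paper's actual model, and all cycle counts are invented:

```python
# Two-component runtime model: CPU-bound work scales with the core
# frequency, memory-bound stall time does not.

def runtime_s(cpu_cycles: float, mem_time_s: float, f_hz: float) -> float:
    """T(f) = C_cpu / f + T_mem: only the CPU term speeds up with f."""
    return cpu_cycles / f_hz + mem_time_s

# CPU-bound workload: doubling f nearly halves the runtime.
cpu_bound = runtime_s(2e9, 0.1, 1e9) / runtime_s(2e9, 0.1, 2e9)
# Memory-bound workload: doubling f barely helps.
mem_bound = runtime_s(2e8, 1.8, 1e9) / runtime_s(2e8, 1.8, 2e9)

assert cpu_bound > 1.8   # close to the ideal 2x speedup
assert mem_bound < 1.1   # almost no speedup
```

The "growing cycle count" on the slide is the same effect seen from the CPU's perspective: while the memory term sits still, a faster clock ticks off more cycles waiting for it.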
slide-29
SLIDE 29

Multiple Frequencies

[Figure: Cycles ratio vs. CPU frequency for a memory-bound workload (gzip) on XScale, at CPU/bus/memory frequency settings 99/49/99, 199/99/99, 398/199/199, 117/58/117, 235/117/117, 471/235/117, 132/66/132 and 265/132/132 MHz, compared with the assumed model. Lines connect points of equal memory frequency.]

Saturday, 4 April 2009
Things get even more complicated when we start modifying the memory frequency -- on this XScale-based platform, we can’t easily modify the CPU frequency without modifying the memory and bus frequency.
slide-31
SLIDE 31

Energy

[Figure: Energy ratio vs. CPU frequency (600-1800 MHz) for two workloads (swim and gzip) on a Dell Latitude D600, compared with the assumed quadratic model.]

Saturday, 4 April 2009
* Now let’s look at energy. The simplest models would suggest a quadratic relationship between energy consumption and CPU frequency -- the lowest frequency is always the most energy-efficient.
* For the memory-bound benchmark, where the execution time remains nearly constant, this is the case! The workload takes more energy as the CPU frequency increases, although not nearly so much as the assumed model suggests.
* But if we look at the CPU-bound benchmark, the energy used is reduced when we increase the frequency! What’s going on? How can we be so wrong?
* Well... while the power for both benchmarks is definitely increased at higher frequencies, the CPU-bound benchmark runs for a much shorter time at the higher frequencies. Since it runs for a much shorter time, it uses less energy overall.
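Combining a toy power model with a two-term runtime model reproduces the qualitative shape in these notes: the energy-minimal frequency is high for CPU-bound work and low for memory-bound work. Every constant below is invented, chosen only to make the effect visible:

```python
# E(f) = P(f) * T(f) with a toy power model (static + dynamic terms)
# and a CPU-term-plus-memory-term runtime model.

def energy_j(cpu_cycles: float, mem_time_s: float, f_ghz: float) -> float:
    power_w = 5.0 + 4.0 * f_ghz ** 3              # static + dynamic power
    time_s = cpu_cycles / (f_ghz * 1e9) + mem_time_s
    return power_w * time_s

freqs = [0.6, 0.9, 1.2, 1.5, 1.8]  # GHz

# CPU-bound: runtime shrinks with f, so a HIGHER frequency minimises energy.
cpu_best = min(freqs, key=lambda f: energy_j(5e9, 0.0, f))
# Memory-bound: runtime is nearly flat, so the LOWEST frequency wins.
mem_best = min(freqs, key=lambda f: energy_j(1e8, 5.0, f))

assert cpu_best > mem_best
```

The static-power term is what breaks the naive "slowest is always best" rule: a CPU-bound job that finishes sooner pays the static cost for less time.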
slide-35
SLIDE 35 10

Sleep States

gzip - 0W gzip - C2 (13.4W) gzip - C4 (11.4W) gzip - 5W Saturday, 4 April 2009 This assumes that we either use the extra time created by running fast, or we shut the system down. But what if we don’t have anything useful to do with that extra time? The system goes idle... And there are difgerent idle modes. This graph shows what would happen for four difgerent idle states when we execute a particular benchmark (gzip -- CPU bound). Note that the lowest energy frequency to run at is dependent on which sleep state we’ll enter. If we’re going into a higher-power state, we should run at the lowest frequency, and if we’re going to end up in a low-power state, we need to run at a high frequency.
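The fixed-period experiment in that figure can be sketched as follows. The numbers are illustrative, not D600 measurements: the task runs at frequency f, then the system sits in some idle state for the rest of the period, and total energy is run energy plus idle energy.

```python
# Illustrative sketch of the fixed-period experiment: which frequency
# minimises energy depends on the power of the idle state entered afterwards.

P_STATIC = 15.0  # assumed baseline system power while running, in watts

# Hypothetical frequency (MHz) / voltage setpoints.
SETPOINTS = [(900, 1.1), (1200, 1.2), (1500, 1.3), (1800, 1.4)]

def total_energy(freq_mhz, vcore, work_mcycles, period_s, idle_power_w):
    run_time = work_mcycles / freq_mhz             # seconds spent computing
    run_power = P_STATIC + 0.005 * vcore ** 2 * freq_mhz
    idle_time = period_s - run_time                # slack spent in the idle state
    return run_power * run_time + idle_power_w * idle_time

def best_freq(idle_power_w):
    return min(SETPOINTS,
               key=lambda s: total_energy(s[0], s[1], 900.0, 2.0, idle_power_w))[0]

shallow = best_freq(13.4)  # C2-like idle at 13.4 W: run slow -> 900
deep = best_freq(0.0)      # complete shutdown: race to sleep -> 1800
```

A high-power idle state rewards running at the lowest frequency; a low-power one rewards finishing fast and sleeping, matching the crossover the slide describes.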
slide-38
SLIDE 38

Temperature

[Figure: "Power for a Dell Latitude D600 executing gzip" -- Power (W) vs System Temperature (Degrees C); curves: Low Fan, Medium Fan, High Fan]

* There are lots more of these DVFS "gotchas" discussed in the paper. The DVFS behaviour of real systems just doesn't fit the assumed model. We looked at the effect of temperature and CPU fan speed... As the system warms up, it uses more and more power, and when the fan kicks in to cool the system it uses even more power.
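The shape of that plot can be imitated with a deliberately crude model: leakage power grows with temperature, and each fan tier adds its own draw. Every coefficient below is invented for illustration, not fitted to the D600.

```python
# Illustrative-only model of the temperature effect: baseline draw plus
# temperature-dependent leakage plus a stepped fan contribution.

def fan_power(temp_c):
    if temp_c < 60:
        return 0.0   # low fan
    if temp_c < 68:
        return 0.4   # medium fan
    return 0.9       # high fan

def system_power(temp_c, base_w=26.5, leak_w_per_c=0.08, t_ref=55.0):
    # Invented coefficients: ~0.08 W of extra leakage per degree above 55 C.
    return base_w + leak_w_per_c * (temp_c - t_ref) + fan_power(temp_c)
```

The point the slide makes falls out of this shape: any model that predicts power from frequency alone misses both the slow drift with temperature and the step when a fan tier engages.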
slide-43
SLIDE 43

Voltage Regulator Efficiency

[Figure: "Expected Power for a Dell Latitude D600 with artificially added Vcore load" -- Actual Power (W) vs Expected Power (W); curves: Assumed, 1.3V, 1.2V, 1.1V, 1.0V]

* Another one that really had us flummoxed for a while was the efficiency of the system's voltage regulators. In the Dell Latitude D600, the main core regulator's efficiency is highly dependent on the amount of power running through it as well as the input voltage. It meant that, at a particular temperature, the system power actually _increased_ when changing down from 1300MHz to 1200MHz.
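Why regulator losses can invert a frequency step can be shown with a toy calculation. All numbers and the efficiency curve below are invented; the only claim carried over from the slide is that efficiency varies with load and voltage.

```python
# Illustrative sketch: power drawn from the battery is the core load divided
# by the regulator's efficiency, and that efficiency is not constant.

def regulator_efficiency(load_w, vcore):
    # Invented curve: efficiency falls at light load and at lower voltages.
    return min(0.92, 0.55 + 0.02 * load_w + 0.35 * (vcore - 1.0))

def battery_power(core_load_w, vcore, rest_of_system_w=12.0):
    return rest_of_system_w + core_load_w / regulator_efficiency(core_load_w, vcore)

# A hypothetical step down in frequency: the core load drops from 9.0 W to
# 8.6 W, but the regulator efficiency drops faster, so measured system power
# goes UP -- the counter-intuitive effect the slide describes.
p_high_freq = battery_power(9.0, 1.3)
p_low_freq = battery_power(8.6, 1.2)
```

Here `p_low_freq` exceeds `p_high_freq` even though the core itself consumes less: the saving is eaten by conversion losses.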
slide-47
SLIDE 47

And so many more...

  • Temperature
  • Fan power
  • Power supply efficiency
  • Memory performance variation
  • Real-time dependencies
  • Frequency switch overheads
  • Changing hardware configurations
  • Manufacturing variation

We need a realistic model!

* There are lots of other quirks discussed in the paper. It means that the traditional assumptions can actually cause power management schemes to use more energy, not less. This will become increasingly true as we see more and more hardware power management features.
* Hardware platforms behave differently from each other, and workloads behave differently from each other on them. You need to build a model that reflects the actual hardware you're dealing with, and you need to scale workloads independently. And it's not good enough to apply settings system-wide -- in a multi-tasking workload, serious gains can be made by customising the system settings for individual workloads. To do this, we need a more realistic model.
slide-49
SLIDE 49

Koala overview

[Diagram: Performance Counters → Koala (tuning knob α) → Settings]

1% performance loss, 26% system energy saving
46% dynamic energy saved

* Koala elegantly deals with real hardware and real platforms by using models to represent the system. We use:
* CPU performance counters to measure the properties of running workloads;
* a workload-agnostic system tuning knob -- alpha.
* And from these we can select the right combination of settings for the system's scaling knobs.
slide-52
SLIDE 52

The Koala Approach

[Diagram: Workload Statistics → Workload Prediction → Candidate Setpoints → Energy/Performance Models → Selection Policy (with QoS Info) → Setpoint]

  • First, we look at which workloads we are going to run, and predict some characteristics of those workloads based on what they've been doing recently (i.e. we rely on temporal locality).
  • Second, we use the information from the workload prediction to estimate what the performance and energy would be for various candidate setpoints.
  • Third, we use a selection policy to choose from the candidate setpoints, based on quality-of-service constraints.
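The three steps above can be sketched as a minimal control loop. Everything here is a placeholder, not Koala's implementation: `toy_model` stands in for the characterised energy/performance models, and the setpoint table is invented.

```python
# Minimal predict -> evaluate -> select loop, one decision per timeslice.

def predict(history):
    # Step 1: temporal locality -- assume the next timeslice repeats the last.
    return history[-1]

def evaluate(stats, setpoints, model):
    # Step 2: ask the models for (performance, energy) at each candidate.
    return [(sp, model(stats, sp)) for sp in setpoints]

def select(candidates, perf_bound):
    # Step 3: a bounded-performance-degradation policy, best effort if unmet.
    ok = [c for c in candidates if c[1][0] >= perf_bound]
    pool = ok or candidates
    return min(pool, key=lambda c: c[1][1])[0]  # cheapest admissible setpoint

# Hypothetical model: setpoint -> (relative performance, relative energy).
toy_model = lambda stats, sp: {1: (0.755, 0.937), 2: (0.911, 0.903), 3: (1.0, 1.0)}[sp]

stats = predict([{"cycles": 7445578, "pmc0": 56285}])
chosen = select(evaluate(stats, [1, 2, 3], toy_model), perf_bound=0.90)  # -> 2
```

Setpoint 1 misses the 90% performance bound, so the policy picks the cheapest setpoint that meets it.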
slide-53
SLIDE 53

Workload Prediction

[Diagram: Workload Statistics → Workload Prediction → Candidate Setpoints → Energy/Performance Models → Selection Policy (with QoS Info) → Setpoint; Workload Prediction highlighted]

* Our workload predictor is, at present, very simple. There is lots of work around on how this can be done much better, but our present method is to assume locality -- workloads will continue to do what they've been doing -- so we assume that the next timeslice will have the same properties as the previous timeslice.
* Multi-tasking: if you're running multiple workloads, the settings need to be appropriate for the particular application.
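That predictor is simple enough to write down directly: last-value prediction, kept per task so that multi-tasking workloads get per-application settings. The statistics dictionaries here are stand-ins for the performance-counter samples Koala actually collects.

```python
# Per-task last-value predictor: assume the next timeslice behaves like the
# task's previous one (temporal locality).

class LastValuePredictor:
    def __init__(self):
        self._last = {}                      # task id -> last observed stats

    def observe(self, task, stats):
        self._last[task] = stats

    def predict(self, task):
        return self._last.get(task)          # None until first observation

p = LastValuePredictor()
p.observe("gzip", {"cycles": 7445578, "pmc0": 56285})
p.observe("swim", {"cycles": 9100000, "pmc0": 810000})
gzip_guess = p.predict("gzip")               # gzip's own history, not swim's
```

Keeping one entry per task is what lets the selection step choose different setpoints for a CPU-bound and a memory-bound task running side by side.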
slide-55
SLIDE 55

Energy and performance models

[Diagram: Workload Statistics → Workload Prediction → Candidate Setpoints → Energy/Performance Models → Selection Policy (with QoS Info) → Setpoint; Energy/Performance Models highlighted]

Previously published: performance-counter-based performance and energy models. Added extras:

  • Empirical model building techniques
  • Idle power, switching overheads, temperature

Performance:

\frac{C'}{C} = 1 + \beta_0 \,(f'_{cpu} - f_{cpu})\, \frac{PMC_0}{C} + \dots

Energy:

E' = V_{cpu}'^{2} (\alpha_0 C' + \alpha_1 PMC_0 + \dots) + (\gamma_0 PMC_0 + \dots) + P_{static} T'

* Next, I'll talk about a major component of Koala -- the energy models. These are built and characterised off-line for use at run-time.
* The models we use here are similar to those we've discussed in previous work. Our performance model calculates the ratio between the number of cycles at a target frequency and the number of cycles at the sampled frequency. The model you see here is for a single adjustable frequency, but more generic models are possible and are discussed in the papers.
* The energy model is also based on previous work -- it is based on the number of events that occur in both voltage-scaled and static voltage domains. These events might be as simple as CPU cycles, but can include other events like external bus accesses and particular types of instructions which use more energy than others.
* We select the appropriate performance counters and characterise the models off-line for each platform. Note, though, that these models encapsulate all of the platform-specificity in Koala -- if you can build a model for your platform, Koala can make power management decisions.
* We presented a couple of extra things in this paper -- a method for building the empirical models in a scientific way, and modelling of idle mode power, switching overheads, temperature, fans, etc.
* (Talk about the macro-level workflow: characterising off-line, a systematic way of choosing performance counters, etc.)
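The two formulas above transcribe straight into code. The model forms follow the slide; the coefficient values (β₀, α, γ, P_static) are arbitrary placeholders standing in for the per-platform constants found by off-line characterisation, and only the first PMC term of each "…" is kept.

```python
# Koala-style models with placeholder coefficients (NOT characterised values).
BETA0 = 2.0e-4
ALPHA0, ALPHA1 = 1.0e-9, 4.0e-8
GAMMA0 = 2.0e-8
P_STATIC = 5.0  # watts

def predicted_cycles(c, pmc0, f_from_mhz, f_to_mhz):
    """Performance model: C'/C = 1 + beta0 * (f' - f) * PMC0 / C + ..."""
    return c * (1.0 + BETA0 * (f_to_mhz - f_from_mhz) * pmc0 / c)

def predicted_energy(c_new, pmc0, vcore, f_to_mhz):
    """Energy model: E' = V'^2(a0*C' + a1*PMC0) + (g0*PMC0) + Pstatic*T'."""
    t_new = c_new / (f_to_mhz * 1e6)  # predicted runtime at target frequency
    return (vcore ** 2 * (ALPHA0 * c_new + ALPHA1 * pmc0)
            + GAMMA0 * pmc0
            + P_STATIC * t_new)

# Feeding in the sample from the next slide (measured at 1862 MHz), scaled
# down to a hypothetical 798 MHz / 0.988 V setpoint:
c_798 = predicted_cycles(7445578, 56285, 1862.0, 798.0)
e_798 = predicted_energy(c_798, 56285, 0.988, 798.0)
```

Note the memory-term sign: scaling the frequency down shrinks the cycle count slightly (fewer cycles burned waiting on memory) even though the runtime C'/f' grows.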
slide-61
SLIDE 61

Modelling in action

Sample data:

  fcpu    Cycles   PMC0   PMC1
  1862.0  7445578  56285  134734

Via the models:

  fcpu  Vcpu   Performance  Energy
  798   0.988   75.5%        93.7%
  1064  1.068   84.6%        89.8%
  1197  1.100   88.1%        89.2%
  1330  1.148   91.1%        90.3%
  1463  1.180   93.8%        91.3%
  1596  1.228   96.1%        93.9%
  1729  1.260   98.1%        96.0%
  1862  1.308  100.0%       100%

* Let's look at how these models are used in Koala. First, we take a sample of the workload and assume that the next timeslice of the workload is going to behave in a very similar way. Then, for several candidate setpoints, the models predict what the percentage performance and energy will be. Note that the model can be arbitrarily accurate depending on the hardware available -- it can take into account as many of the hardware quirks as possible.
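The percentage columns in that table are the model's absolute predictions normalised against the top-frequency row. A sketch of that step, using hypothetical absolute times and energies (chosen only to land near the table's first row, not taken from measurements):

```python
# Normalise absolute (time, energy) predictions to the top-frequency setpoint.

def normalise(predictions):
    """predictions: list of (fcpu, time_s, energy_j), top frequency last.
    Returns (fcpu, performance %, energy %) relative to the last row."""
    ref_t, ref_e = predictions[-1][1], predictions[-1][2]
    return [(f, 100.0 * ref_t / t, 100.0 * e / ref_e) for f, t, e in predictions]

# Hypothetical absolute predictions for two setpoints (seconds, joules).
rows = normalise([(798, 0.00932, 0.0572), (1862, 0.00704, 0.0610)])
# rows[0] comes out near (798, 75.5%, 93.8%); rows[1] is (1862, 100%, 100%).
```

Performance is a time ratio (reference time over predicted time), so slower setpoints fall below 100% while the reference row is pinned at 100% in both columns.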
slide-64
SLIDE 64

Selection Policy

      Performance  Energy
  1    75.5%        93.7%
  2    84.6%        89.8%
  3    88.1%        89.2%
  4    91.1%        90.3%
  5    93.8%        91.3%
  6    96.1%        93.9%
  7    98.1%        96.0%
  8   100.0%       100%

  • Minimum Energy → setting 3 (88.1%, 89.2%)
  • Maximum Performance → setting 8 (100.0%, 100%)
  • Minimum Power → setting 1 (75.5%, 93.7%)
  • Minimum E*D product → setting 5 (93.8%, 91.3%)
  • Bounded Performance Degradation, 90% → setting 4 (91.1%, 90.3%)

* Now that we have some information about the performance and energy used by the workload at various frequencies, we can try to choose a setting based on our needs. We could choose the minimum-energy setting if we really cared about energy, or the minimum-time (maximum-performance) setting if we really cared about that. If we had problems with thermal dissipation in the system, we could choose the minimum-power setting.
* One issue with choosing the minimum-energy setting is that we might take a large performance hit for very little energy saving. This is addressed by a minimum energy*delay policy, since you can then expect at least an equal energy saving for any performance hit.
* A policy that we've seen in the literature (Frank, and others) and that we can easily implement with Koala is a bounded performance degradation policy. Here we bound the performance degradation at 90%, and so we choose setting 4.
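All five policies above are one-liners over the predicted table. Performance and energy are expressed as fractions of the top setpoint; since time scales as 1/performance, relative power is energy × performance and the energy*delay product is energy / performance.

```python
# (setting, relative performance, relative energy) -- the slide's table.
TABLE = [
    (1, 0.755, 0.937), (2, 0.846, 0.898), (3, 0.881, 0.892), (4, 0.911, 0.903),
    (5, 0.938, 0.913), (6, 0.961, 0.939), (7, 0.981, 0.960), (8, 1.000, 1.000),
]

min_energy = min(TABLE, key=lambda r: r[2])[0]           # -> setting 3
max_perf = max(TABLE, key=lambda r: r[1])[0]             # -> setting 8
min_power = min(TABLE, key=lambda r: r[2] * r[1])[0]     # power ~ E/T -> 1
min_ed = min(TABLE, key=lambda r: r[2] / r[1])[0]        # E*D product -> 5
bounded_90 = min((r for r in TABLE if r[1] >= 0.90),
                 key=lambda r: r[2])[0]                  # -> setting 4
```

Running this reproduces the slide's selections: each policy picks out the setting highlighted on the corresponding build.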
slide-72
SLIDE 72

Bounded performance degradation

[Figure: "Actual vs. requested performance with Koala" -- Actual Performance (%) vs Requested Minimum Performance (%); curves: Ideal, lbm, mcf, swim, gzip, milc, povray, equake, annotated into memory-bound and CPU-bound groups]

* This plot shows real Koala data -- benchmarks running under the bounded performance degradation policy. The CPU-bound benchmarks achieve the requested minimum performance, but the memory-bound benchmarks can't be degraded that far, because even at the lowest frequency their performance wouldn't change much; Koala makes its best effort.
* The thing to take away from this graph is that if we ask for a performance degradation, the CPU-bound benchmarks will definitely deliver it. Set the performance to 90%, and you'll get 90% of the performance for a CPU-bound benchmark.
slide-76
SLIDE 76 22

Bounded performance degradation

[Figure: Actual energy vs. requested minimum performance with Koala. X-axis: Requested Minimum Performance (%), 25–100; Y-axis: Actual Energy (%), 70–150. Lines: lbm, mcf, swim, gzip, milc, povray, equake, grouped as memory-bound and CPU-bound.]

Saturday, 4 April 2009 Looking at the energy, we see that the CPU-bound benchmarks also behave differently from the memory-bound ones. For any minimum-performance bound less than 100%, the CPU-bound benchmarks use _more_ energy; the memory-bound benchmarks use less. A bound of around 90% is found empirically, but that is sub-optimal for both the CPU-bound and the memory-bound
  • benchmarks. Moreover, we lose 10% of the performance on the CPU-bound benchmarks... for an energy INCREASE!
It means that the performance bound really isn’t a globally applicable metric for how much we want to scale. We need a policy that is entirely workload-agnostic!
slide-80
SLIDE 80 23

Generalised Energy-Delay Policy

η = P^(1−α) T^(1+α)

  • α = 1: Maximum Performance (η = T²)
  • α = 0.33: Minimum Energy-Delay (η = (ET)^(2/3))
  • α = 0: Minimum Energy (η = PT = E)
  • α = −1: Minimum Power (η = P²)

Saturday, 4 April 2009
  • What if, instead of minimising energy, or time, or power, we minimised some function that gives us a good trade-off?
  • We came up with such a function, and call the resulting policy generalised E*D, or alpha.
  • By using various different values of alpha, we can express the full spectrum of policies, including minimum energy, minimum
time (maximum performance), minimum energy-delay and, for thermal throttling, minimum power.
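The special cases above follow directly from the exponent form of η. A short sketch (illustrative only; the per-setting power and time values are hypothetical, not from the paper) shows how varying α moves the chosen setting between the extremes:

```python
# Generalised energy-delay metric: eta = P^(1-alpha) * T^(1+alpha).
# alpha = 1 minimises T^2 (maximum performance); alpha = 0 minimises P*T = E
# (minimum energy); alpha = -1 minimises P^2 (minimum power).
settings = {  # setting -> (power in W, time in s); hypothetical values
    "low":  (10.0, 2.0),
    "mid":  (18.0, 1.3),
    "high": (30.0, 1.0),
}

def eta(power, time, alpha):
    return power ** (1 - alpha) * time ** (1 + alpha)

def best(alpha):
    """Setting minimising the generalised energy-delay metric for this alpha."""
    return min(settings, key=lambda s: eta(*settings[s], alpha))

print(best(1.0))   # minimises T^2  -> the fastest setting ("high")
print(best(0.0))   # minimises P*T  -> the lowest-energy setting ("low" here)
print(best(-1.0))  # minimises P^2  -> the lowest-power setting ("low")
```

Sweeping α between −1 and 1 then traces out the full spectrum of policies mentioned in the notes, with a single knob and no per-workload tuning.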
slide-89
SLIDE 89 24

Generalised Energy-Delay Policy

[Figure: Actual performance and actual energy vs. alpha setting (−1 to 1) for POVRAY and MILC. Left axis: Actual Performance (%), 40–100; right axis: Actual Energy (%), 80–170.]

Saturday, 4 April 2009
slide-92
SLIDE 92

Implementation

25
  • Implemented in Linux 2.6.24.
  • Characterised using SPEC2000.
  • Validated using SPEC2006.
  • Measured using a custom-built data logger.

Platforms:
1. Dell Latitude D600
2. IBM T41
3. AMD Opteron Server
4. Intel XEON Server
5. Gumstix
6. UNSW PLEB2
7. NICTA Ibox
8. Menlow
9. Asus EEEPC 901
10. Phycore iMX31

Saturday, 4 April 2009
slide-93
SLIDE 93 26

Ten more reasons to read the paper.

  • More hardware quirks
  • Empirical data from several platforms
  • Parameter and model selection
  • Experimental details
  • Implementation details
  • Multitasking
  • Frequency-switch overheads
  • Calculation overheads
  • Higher-level policies
  • Practicality issues
Saturday, 4 April 2009
slide-94
SLIDE 94

Koala

27
  • The commonly assumed models are wrong.
  • Use empirical models to manage power.
  • Use workload-agnostic policies.
  • Characterised, tested and evaluated on lots of real hardware.

http://ertos.nicta.com.au David.Snowdon@nicta.com.au

Saturday, 4 April 2009 The idea with Koala is that if you can model how a system is likely to behave in various conditions, you can control it. If you can build a model for your particular platform, Koala can control it. If that model encompasses the quirks of your platform, Koala will avoid the pitfalls and take advantage of the opportunities. You just need to build the model.
slide-95
SLIDE 95

From imagination to impact

28 Saturday, 4 April 2009