SLIDE 1

Modelling the Energy Consumption of Soft Real-Time Tasks on Heterogeneous Computing Architectures

H.E. Zahaf1, R. Olejnik1, G. Lipari1, A.E. Benyamina2

1Université de Lille 2University of Oran

January 19, 2016

SLIDE 2

Outline

◮ Introduction
◮ Experimental setting
◮ Time vs. energy
◮ Conclusions and Current work

SLIDE 4

Context and motivation

Computing at the edge

SLIDE 5

Fog Computing

◮ Fog Computing characteristics
  ◮ Computing at the edge means that data are pre-processed before being stored in the cloud
  ◮ thus reducing network load
◮ Fog Computing requirements
  ◮ Multicore, heterogeneous
  ◮ different kinds of computation are needed
  ◮ Low power consumption
  ◮ (Soft real-time)

SLIDE 6

Minimise power consumption

◮ Modern processors have many ways of reducing power consumption
◮ Dynamic Voltage and Frequency Scaling (DVFS)
  ◮ dynamically adjust processor frequency to minimise energy ...
  ◮ ... without reducing performance too much
◮ Dynamic Power Management (DPM)
  ◮ Turn off processors that are not used/needed
  ◮ Pack all computation in a small number of processors ...
  ◮ ... without reducing performance too much
◮ In any case, performance is the key here

SLIDE 7

Soft real-time tasks

◮ A soft real-time task consists of a sequence of processing steps to be executed periodically
  ◮ e.g.: every 20 msec, encode one video frame
  ◮ Period = 20 msec
◮ Usually associated with a deadline
  ◮ every video frame must be encoded within 20 msec
  ◮ Deadline = 20 msec
◮ Goal:
  ◮ find the minimum frequency such that the task completes within its deadline

[Figure: timeline of task τ1, time axis from 0 to 40]

SLIDE 11

The problem

◮ Model:
  ◮ a set of soft real-time tasks
  ◮ with different periods, deadlines, execution time profiles
  ◮ scheduled by an operating system using a real-time scheduling algorithm
  ◮ on a set of heterogeneous processors
◮ Problem:
  ◮ allocate tasks to processors
  ◮ set frequency
  ◮ set scheduling parameters
◮ Objective:
  ◮ minimise total energy without missing deadlines

SLIDE 12

Energy model

◮ In order to solve the problem, we need a model of the energy consumption
◮ Problems:
  ◮ Processor model:
    ◮ Energy saving mechanisms, internal to the chip and transparent to the programmer, try to minimise energy of micro-operations
  ◮ Complexity of the hardware/software interaction
  ◮ Influence of pipeline and cache on execution time
  ◮ Tasks share resources (caches, memory bus, peripherals)
◮ It is impossible to derive an exact model
  ◮ we resort to measurement

SLIDE 14

ARM Big/Little

[Figure: ARM Big/Little. Little's cluster with four ARM Cortex A7 cores and Big's cluster with four ARM Cortex A15 cores; each core has an L1 cache and each cluster an L2 cache; the board also has a Mali GPU and RAM memory]

◮ It is possible to set the frequency of each processor group, but not of individual cores
◮ Each group (Little or Big) has its own characteristics in terms of execution time speed-up and energy consumption

SLIDE 15

Energy Sensors

◮ Sensors:
  ◮ current for Little’s group
  ◮ current for Big’s group
  ◮ current for RAM
◮ A global ammeter for the board
  ◮ used to check consistency of measurements

SLIDE 16

Benchmarks

◮ Three periodic tasks
◮ MATMUL (L)
  ◮ Multiplying two square L×L matrices a certain number of times
◮ FFT
  ◮ Fast Fourier transform of a random input signal
◮ FFMPEG
  ◮ the decoding algorithm of a specific video input
◮ Tasks were executed periodically every T units of time
  ◮ we measured execution time, and energy consumption of processors and memory
◮ Linux OS
  ◮ Frequency governor disabled

SLIDE 18

Execution time

◮ The execution time of a MATMUL(200x200) thread allocated on one little/big core, with no interference

[Figure: execution time (s) versus frequency (MHz), 500 to 2,000 MHz, series B-avg and L-avg]

SLIDE 19

Model of computation time

◮ Computation time varies with frequency according to the following rule:

  $C_i(f) = \frac{f_{max}}{f}\, ct_i + mt_i$

◮ Two components:
  ◮ $ct_i$ represents the number of instruction cycles executed on the processor
  ◮ $mt_i$ represents the main memory accesses
◮ The second component does not vary with frequency, but depends on the number of cache misses
  ◮ hence on the interference of other tasks on the cache and on the bus
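The rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function names are hypothetical, times are taken in milliseconds and frequencies in MHz, and the example values (ct = 254 ms, mt = 17 ms, f_max = 1400 MHz) are the MATMUL(200) Little-core figures quoted elsewhere in the slides. The second function inverts the rule to answer the goal stated earlier: the minimum frequency that still meets a deadline, treating frequency as continuous.

```python
def computation_time_ms(f_mhz, ct_ms, mt_ms, f_max_mhz):
    """C(f) = (f_max / f) * ct + mt, in milliseconds."""
    if f_mhz <= 0:
        raise ValueError("frequency must be positive")
    return ct_ms * f_max_mhz / f_mhz + mt_ms


def min_frequency_mhz(deadline_ms, ct_ms, mt_ms, f_max_mhz):
    """Smallest (continuous) frequency with C(f) <= deadline.

    Returns None when the deadline cannot be met even at f_max.
    Solving C(f) <= D for f gives f >= ct * f_max / (D - mt).
    """
    slack_ms = deadline_ms - mt_ms  # time left once memory time is paid
    if slack_ms <= 0:
        return None
    f = ct_ms * f_max_mhz / slack_ms
    return f if f <= f_max_mhz else None


# Assumed example: MATMUL(200) on a Little core, deadline chosen arbitrarily.
print(computation_time_ms(700, 254, 17, 1400))
print(min_frequency_mhz(600, 254, 17, 1400))
```

On real hardware only a discrete set of frequencies is available, so the returned value would have to be rounded up to the next supported level.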

SLIDE 20

Computing task’s parameters

◮ We can compute both components for each task in a typical setting, with a simple regression
◮ Example: MATMUL(size)
  ◮ (times are expressed in milliseconds)

Size  RSS (KB)  ct (L)  mt (L)  ct (B)  mt (B)
150   1272      98      15      23      7
200   1452      254     17      66      8
250   1651      526     19      146     9
300   1840      978     21      278     10
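One way such a regression could be done, shown here as a sketch rather than the procedure the authors used: since C(f) is linear in x = f_max / f, an ordinary least-squares line through the points (x, C) gives ct as the slope and mt as the intercept. The function name is hypothetical, and the sample data below are synthetic (generated from the model with ct = 254 ms and mt = 17 ms), not measurements.

```python
def fit_ct_mt(freqs_mhz, times_ms, f_max_mhz):
    """Least-squares fit of C(f) = ct * (f_max / f) + mt.

    Returns (ct, mt) in the same time unit as times_ms.
    """
    if len(freqs_mhz) != len(times_ms) or len(freqs_mhz) < 2:
        raise ValueError("need at least two (frequency, time) samples")
    xs = [f_max_mhz / f for f in freqs_mhz]  # regressor x = f_max / f
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(times_ms) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct frequencies")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, times_ms))
    ct = sxy / sxx              # slope: frequency-dependent part
    mt = mean_y - ct * mean_x   # intercept: frequency-independent part
    return ct, mt


# Synthetic samples for illustration only (not measured data).
sample_freqs = [200, 400, 600, 800, 1000, 1200, 1400]
sample_times = [254 * 1400 / f + 17 for f in sample_freqs]
print(fit_ct_mt(sample_freqs, sample_times, 1400))
```

With real measurements the fit would not be exact, and the residuals give a rough idea of how well the two-component model describes the task.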

SLIDE 21

Impact of interference

◮ Co-execution of an interfering task (MATMUL(200x200))
  ◮ the interference increases with the size of the matrix

[Figure, two panels: execution time (s) versus frequency (MHz). First panel, 200 to 1,400 MHz, series L-With-P and L-Without. Second panel, 500 to 2,000 MHz, series B-With-P and B-Without]

SLIDE 22

Dynamic power

◮ Power consumption of MATMUL(150) on Big and Little cores

[Figure: power (W) versus frequency (MHz), 500 to 2,000 MHz, series B-avg and L-avg]

◮ The little at $f_{max} = 1400$ MHz consumes less than the big at $f_{min} = 200$ MHz
◮ Power can be modelled as a polynomial of 3rd degree:

  $P(f) = a f^3 + b f^2 + c f + d$

SLIDE 23

Impact of idle processors

◮ We can only measure the energy consumed by all little cores together
  ◮ one single sensor per group of cores

[Figure: power (W) versus frequency (MHz), 500 to 2,000 MHz, series One-L, Two-L, Three-L, One-B, Two-B, Three-B]

◮ Not easy to understand what is going on:
  ◮ the OS puts the core in low power mode when not executing, also reducing static energy
  ◮ however, there is a shared "base" for all processors

SLIDE 24

Power consumption of RAM

◮ Big core consumes slightly less
  ◮ probably due to the larger L2 cache (fewer cache misses)

[Figure: RAM power versus frequency (MHz), 500 to 2,000 MHz, y-axis ticks from 0.02 to 0.1, series Little core and Big core]

SLIDE 29

Model of energy

◮ We used a 3rd-degree polynomial of frequency to model the power consumption: $P(f) = a f^3 + b f^2 + c f + d$

[Figure: power (W) versus frequency (MHz), 200 to 1,400 MHz, series Real-FFT-1, Rg-FFT-1, Real-MM-1, Rg-MM-1]

◮ Regression:

       FFT          MatMul
  a    4.6·10^-11   5.2·10^-11
  b    2.2·10^-8    4.1·10^-9
  c    3.4·10^-8    7.8·10^-5
  d    4.4·10^-2    1.7·10^-2
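As a quick sanity check of the regression table, the polynomial can be evaluated directly. The sketch below assumes that f is expressed in MHz and P in watts, which is consistent with the axes of the plot but is not stated explicitly on the slide; the names `POWER_COEFFS` and `power_w` are hypothetical.

```python
# Regression coefficients (a, b, c, d) copied from the table.
# Assumption: f in MHz, P(f) in watts.
POWER_COEFFS = {
    "FFT": (4.6e-11, 2.2e-8, 3.4e-8, 4.4e-2),
    "MatMul": (5.2e-11, 4.1e-9, 7.8e-5, 1.7e-2),
}


def power_w(f_mhz, coeffs):
    """P(f) = a*f^3 + b*f^2 + c*f + d, evaluated with Horner's scheme."""
    a, b, c, d = coeffs
    return ((a * f_mhz + b) * f_mhz + c) * f_mhz + d


for name, coeffs in POWER_COEFFS.items():
    for f in (200, 800, 1400):
        print(f"{name:7s} f={f:5d} MHz  P={power_w(f, coeffs):.4f} W")
```

Under that unit assumption the values stay within the 0.1 to 0.4 W range shown on the plot, and the constant term d can be read as the frequency-independent part of the power for each task.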

SLIDE 30

Summary

◮ Fact 1: each task has its own coefficients of power consumption (static and dynamic)
  ◮ due to internal power optimization by the hardware
◮ Fact 2: scalability of computation time varies with the task
  ◮ it depends on size of data vs. size of L1/L2 cache, and interference
◮ We need to put things together

SLIDE 31

Energy consumption

◮ The energy consumed by a task: $E(f) = P(f)\, C_i(f)$
  ◮ (we are not considering deadline constraints in this graph)

[Figure: energy (Wh) versus frequency (MHz), 500 to 2,000 MHz, series FFT-L, FFT-B, MM-L, MM-B]

◮ Fact 3: The optimal frequency is different for each task
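Putting the two models together, the energy-optimal frequency of a single task can be found by scanning the available frequency levels. The sketch below is an illustration under several assumptions: it pairs the MatMul power coefficients with the MATMUL(150) Little-core timing parameters (the slides do not say which matrix size the power regression was fitted on), takes f in MHz and P in watts, reports energy in joules rather than the Wh of the plot, and uses a hypothetical 100 MHz step between levels. All function names are made up for the example.

```python
def computation_time_ms(f_mhz, ct_ms, mt_ms, f_max_mhz):
    """C(f) = ct * f_max / f + mt, in milliseconds."""
    return ct_ms * f_max_mhz / f_mhz + mt_ms


def power_w(f_mhz, coeffs):
    """P(f) = a*f^3 + b*f^2 + c*f + d, in watts (f in MHz assumed)."""
    a, b, c, d = coeffs
    return ((a * f_mhz + b) * f_mhz + c) * f_mhz + d


def energy_j(f_mhz, coeffs, ct_ms, mt_ms, f_max_mhz):
    """E(f) = P(f) * C(f), converted from W*ms to joules."""
    c_ms = computation_time_ms(f_mhz, ct_ms, mt_ms, f_max_mhz)
    return power_w(f_mhz, coeffs) * c_ms / 1000.0


def best_frequency(levels_mhz, coeffs, ct_ms, mt_ms, f_max_mhz, deadline_ms=None):
    """Return (frequency, energy) minimising E(f) over the given levels.

    If deadline_ms is set, levels with C(f) > deadline are skipped.
    Returns None when no level is feasible.
    """
    best = None
    for f in levels_mhz:
        c_ms = computation_time_ms(f, ct_ms, mt_ms, f_max_mhz)
        if deadline_ms is not None and c_ms > deadline_ms:
            continue
        e = energy_j(f, coeffs, ct_ms, mt_ms, f_max_mhz)
        if best is None or e < best[1]:
            best = (f, e)
    return best


MATMUL_POWER = (5.2e-11, 4.1e-9, 7.8e-5, 1.7e-2)  # from the regression table
LEVELS_MHZ = range(200, 1401, 100)                # hypothetical level grid

print(best_frequency(LEVELS_MHZ, MATMUL_POWER, 98, 15, 1400))
print(best_frequency(LEVELS_MHZ, MATMUL_POWER, 98, 15, 1400, deadline_ms=200))
```

The unconstrained minimum sits at an intermediate frequency because lowering f reduces power but stretches the execution time, during which the constant term d keeps being paid. Adding a deadline can push the choice to a higher level than the unconstrained optimum.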

SLIDE 33

Summary

◮ To minimise energy
  ◮ Profiling of applications to find coefficients

  $P_i(f) = a_i f^3 + b_i f^2 + c_i f + d_i$
  $C_i(f) = ct_i \frac{f_{max}}{f} + mt_i$
  $E_i(f) = C_i(f)\, P_i(f)$
  $E_{tot}(f, \Delta) = \sum_{i \in L} E_i(f_L)\, n_i(\Delta) + \sum_{i \in B} E_i(f_B)\, n_i(\Delta) + E_{mem}$

  ◮ Allocating tasks on processors and setting processor frequencies ($f_L$ and $f_B$)
  ◮ Constraints:
    ◮ deadlines must be respected
    ◮ group frequency
◮ A non-linear mixed-integer programming optimisation problem
  ◮ Difficult to solve with exact optimisation tools
  ◮ We are developing heuristics
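To make the shape of the optimisation problem concrete, here is a brute-force sketch for a toy instance. It is not the heuristic the authors are developing, and it rests on several labelled assumptions: n_i(Δ) is read as the number of jobs of task i in a horizon Δ (horizon divided by period), each task is assumed to have a core to itself so the deadline check reduces to C_i(f) ≤ D_i, idle-cluster power is ignored, E_mem is left at zero, and f_max = 2000 MHz is assumed for the Big cluster. The periods, deadlines and the Big-cluster power coefficients are made-up placeholders, not measurements; only the ct/mt pairs and the Little-cluster MatMul coefficients come from the slides.

```python
from itertools import product


def comp_time_ms(f, ct, mt, f_max):
    return ct * f_max / f + mt


def power_w(f, coeffs):
    a, b, c, d = coeffs
    return ((a * f + b) * f + c) * f + d


def total_energy_j(tasks, alloc, freqs, f_max, horizon_ms, e_mem_j=0.0):
    """E_tot for one allocation; None if some task misses its deadline."""
    total = e_mem_j
    for task, cluster in zip(tasks, alloc):
        f = freqs[cluster]  # one frequency per cluster (group constraint)
        ct, mt = task["time"][cluster]
        c_ms = comp_time_ms(f, ct, mt, f_max[cluster])
        if c_ms > task["deadline_ms"]:
            return None
        n_jobs = horizon_ms // task["period_ms"]  # assumed meaning of n_i
        total += power_w(f, task["power"][cluster]) * c_ms / 1000.0 * n_jobs
    return total


def best_configuration(tasks, levels, f_max, horizon_ms):
    """Exhaustive search over allocations and cluster frequencies."""
    best = None
    for alloc in product("LB", repeat=len(tasks)):
        for f_l, f_b in product(levels["L"], levels["B"]):
            e = total_energy_j(tasks, alloc, {"L": f_l, "B": f_b}, f_max, horizon_ms)
            if e is not None and (best is None or e < best[0]):
                best = (e, alloc, f_l, f_b)
    return best


MM_LITTLE = (5.2e-11, 4.1e-9, 7.8e-5, 1.7e-2)  # from the regression table
MM_BIG = (1.0e-10, 0.0, 1.0e-4, 0.5)           # placeholder, NOT measured
tasks = [
    {"period_ms": 1000, "deadline_ms": 400,      # placeholder timing constraints
     "time": {"L": (98, 15), "B": (23, 7)},      # MATMUL(150) from the table
     "power": {"L": MM_LITTLE, "B": MM_BIG}},
    {"period_ms": 1000, "deadline_ms": 600,
     "time": {"L": (254, 17), "B": (66, 8)},     # MATMUL(200) from the table
     "power": {"L": MM_LITTLE, "B": MM_BIG}},
]
levels = {"L": range(200, 1401, 200), "B": range(200, 2001, 200)}
f_max = {"L": 1400, "B": 2000}

print(best_configuration(tasks, levels, f_max, horizon_ms=10_000))
```

The search space grows as 2^n times the number of frequency pairs, which is why exhaustive enumeration only works for toy instances and heuristics are needed in practice.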

SLIDE 34

Profiling

◮ There is no free lunch!
◮ Complex model
  ◮ Energy consumption depends on task code
  ◮ Profiling seems to be the only concrete option available
  ◮ However, the complexity of profiling grows with the complexity of the software
    ◮ need to explore input space!
◮ Further research is needed to reduce the profiling effort

SLIDE 35

Parallelization

◮ Can we parallelize tasks?
◮ Pros:
  ◮ may simplify task allocation
  ◮ may help to meet deadlines
◮ Cons:
  ◮ One more complexity dimension to an already difficult problem

SLIDE 36

Thank you for listening! Questions?