Power and Energy Modelling of Multi-core Processors for System-Level Design Space Exploration



SLIDE 1

This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 318693

Santhosh Kumar Rethinagiri, Oscar Palomar, Osman Unsal, Adrian Cristal

Barcelona Supercomputing Center

Energy-Aware Computing Workshop 2014

Power and Energy Modelling of Multi-core Processors for System-Level Design Space Exploration

SLIDE 2

EACO Workshop, Sept. 10th 2014, Bristol

ParaDIME Consortium

TU Dresden (Germany)

IMEC, Leuven (Belgium)

BSC, Barcelona (Spain)

University of Neuchâtel (Switzerland)

Cloud & Heat Technologies, Dresden (Germany)

SLIDE 3

Why ParaDIME ?

ParaDIME: Parallel Distributed Infrastructure for Minimization of Energy

Rising cost

Hardware cost

Programming efficiency

Runtime optimization

Energy-aware data center computing

SLIDE 4

The ParaDIME Stack

[Diagram: the ParaDIME stack. Components shown: a multi data center scheduler; ParaDIME infrastructure data centers, each with an intra data center scheduler; computing nodes/stacks that run an application/benchmark over an API, Scala with the Akka actor scheduler, the JVM and the OS, on either simulated hardware (cores, accelerators, interconnect, future devices) or real hardware (with hypervisor and VMs). Partner labels: BSC, IMEC, UNINE, TuD, Cloud & Heat Technologies.]

SLIDE 5

Challenges of modelling power of heterogeneous systems

Estimating power and energy is a critical design goal for electronic devices.

Designers must estimate power as early as possible in the design flow.

Design changes are easier in the design phase, and changes at system level have the greatest impact on application power.

A platform is needed that can combine different processors and components.

Functional-level analysis is accurate but coarse-grained, and it is restricted by the need to measure power on the real board.

Fine-grained estimates can be obtained from gate-level simulation, but this requires tools and RTL sources that we do not have, and simulation is very slow.

A power law holds for a simple processor, but whether it holds for a complex processor system remains debatable.

SLIDE 6

Power estimation methodology and tools

McPAT

SLIDE 7

Hybrid design space exploration methodology

SLIDE 8

First step : FLPA ( Functional Level Power Analysis)

SLIDE 9

Functional block (ARM Cortex-A9)

SLIDE 10

Generic Power Model Parameters

The parameters that influence power in a system.

SLIDE 11

Power measurement environment

Courtesy: Open-People project

SLIDE 12

Variation of Power with Instructions Per Cycle (IPC) for ARM Cortex-A8

[Figure: power (mW) versus instructions per cycle (IPC)]

SLIDE 13

Power consumption models generated with FLPA
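
The slides do not reproduce the generated models, so the following is only a minimal sketch of how a model of this kind could be fitted, assuming power is approximated as a linear function of IPC (as the Cortex-A8 plot on the previous slide suggests might be reasonable). The names `IpcPowerModel`, `LinearModel` and `fit` are hypothetical and are not taken from the authors' tool.

```scala
// Hypothetical sketch: fit power (mW) as a linear function of IPC
// with ordinary least squares. Not the authors' actual FLPA models.
object IpcPowerModel {
  /** Linear model: power = intercept + slope * ipc. */
  final case class LinearModel(intercept: Double, slope: Double) {
    def predict(ipc: Double): Double = intercept + slope * ipc
  }

  /** Least-squares fit over (ipc, powerMilliwatts) measurement pairs. */
  def fit(samples: Seq[(Double, Double)]): LinearModel = {
    require(samples.size >= 2, "need at least two samples")
    val n = samples.size.toDouble
    val meanX = samples.map(_._1).sum / n
    val meanY = samples.map(_._2).sum / n
    val sxx = samples.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    require(sxx > 0.0, "IPC values must not all be equal")
    val sxy = samples.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val slope = sxy / sxx
    LinearModel(meanY - slope * meanX, slope)
  }
}
```

In use, the measurement pairs would come from the power measurement environment, for example `IpcPowerModel.fit(measuredPairs).predict(1.2)`, where `measuredPairs` is a placeholder for real board data.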

SLIDE 14

Second Step: System Level Power Analysis
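
As a rough illustration of this step, the sketch below assumes that the simulator reports activity counters per interval and that a power model from the first step maps those counters to watts; energy is then the sum of power times duration over all intervals. The `Activity` fields and the `powerWatts` function are assumptions made for illustration, not the interface of the authors' tool.

```scala
// Hypothetical sketch: combine simulated activity counters with a power
// model to estimate energy. Field names are illustrative assumptions.
object SystemLevelPower {
  /** Activity counters sampled over one simulation interval. */
  final case class Activity(cycles: Long, instructions: Long, frequencyHz: Double) {
    require(cycles > 0 && frequencyHz > 0.0, "cycles and frequency must be positive")
    def ipc: Double = instructions.toDouble / cycles.toDouble
    def seconds: Double = cycles.toDouble / frequencyHz
  }

  /** Energy in joules: per-interval model power (watts) times interval duration. */
  def energyJoules(trace: Seq[Activity], powerWatts: Activity => Double): Double =
    trace.map(a => powerWatts(a) * a.seconds).sum
}
```

Passing the model as a function keeps the integration step independent of which functional-level model is plugged in.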

SLIDE 15

Result Interface

SLIDE 16

Result Interface

SLIDE 17

Results and Comparison (Power estimation)

SLIDE 18

Results and comparison (Energy)

SLIDE 19

Third step: Auto optimization DVFS

Runtime: inter-task DVFS

Programmer annotation-based DVFS

Work-load balancing based on task

Runtime Programmer based request
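
The slides do not detail the DVFS policy, so the following is a minimal sketch of one common inter-task approach: for each task, pick the slowest voltage/frequency operating point that still meets the task's deadline. It assumes the usual CMOS approximation for dynamic power, P = C·V²·f, and that slower operating points also run at lower voltage. All names and operating-point values are hypothetical.

```scala
// Hypothetical sketch of inter-task DVFS: choose the slowest operating
// point that meets a deadline. Not the authors' actual policy.
object InterTaskDvfs {
  /** A voltage/frequency operating point. */
  final case class OperatingPoint(frequencyHz: Double, volts: Double)

  /** Dynamic power under the approximation P = C * V^2 * f. */
  def dynamicPowerWatts(op: OperatingPoint, capacitanceFarads: Double): Double =
    capacitanceFarads * op.volts * op.volts * op.frequencyHz

  /** Slowest point that finishes `cycles` within `deadlineSeconds`, if any. */
  def choose(points: Seq[OperatingPoint], cycles: Long,
             deadlineSeconds: Double): Option[OperatingPoint] =
    points
      .filter(op => cycles.toDouble / op.frequencyHz <= deadlineSeconds)
      .reduceOption((a, b) => if (a.frequencyHz <= b.frequencyHz) a else b)
}
```

Under this power model the energy of a task is C·V²·cycles, so running at the lowest feasible voltage reduces energy as long as the deadline is still met.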

SLIDE 20

Task scheduling

SLIDE 21

Optimization based on work load balancing
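
The balancing algorithm itself is not given in the slides. As a sketch of what task-based workload balancing can look like, the code below uses a standard greedy heuristic (longest task first, each assigned to the currently least-loaded core). The heuristic is swapped in for illustration, and the name `WorkloadBalancer` and the use of cycle counts as task costs are assumptions.

```scala
// Hypothetical sketch: greedy longest-task-first balancing across cores.
// A stand-in heuristic, not the authors' optimization.
object WorkloadBalancer {
  /** Returns, for each core, the list of task costs (in cycles) assigned to it. */
  def balance(taskCycles: Seq[Long], cores: Int): Vector[Vector[Long]] = {
    require(cores > 0, "need at least one core")
    val empty = Vector.fill(cores)(Vector.empty[Long])
    // Place the largest remaining task on the least-loaded core.
    taskCycles.sorted(Ordering[Long].reverse).foldLeft(empty) { (assignment, cost) =>
      val target = assignment.indices.minBy(i => assignment(i).sum)
      assignment.updated(target, assignment(target) :+ cost)
    }
  }
}
```

A more even load leaves more slack per core, which a DVFS policy can then convert into lower frequency and voltage.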

SLIDE 22

Optimization (Inter task DVFS)

SLIDE 23

Conclusion

Our tool's estimates have been shown to be accurate.

The approach is adaptable to any kind of complex processor system.

As an added advantage, it eases rapid prototyping of components and porting of applications.

It makes power estimation and application design easier and more time-efficient.

SLIDE 24

The ParaDIME Stack

[Diagram: the ParaDIME stack, as on slide 4]

SLIDE 25

Hardware Architecture

Energy-Efficient Message Passing

Message passing microarchitecture

Message passing accelerator

Task passing

Operation Below Safe Vdd

Automatic HW lowering of Vdd

SW-guided (low-power annotation)

Errors?

Heterogeneous Computing

Architectural level

Device level

SLIDE 26

Heterogeneous system-level environment

[Diagram: a virtual platform in which Task 1, Task 2 and Task 3 run on ARM Cortex-A9 quad-core ISS, ARM Cortex-A8 quad-core ISS, DSP C64x ISS, FPGA hardware accelerator and GPU accelerator components, together with a bus, memory and data, and an activity counter interface to the PETS tool.]

SLIDE 27

Heterogeneous computing results and comparison

[Figure: energy (J) of the K-means benchmark on ARM Cortex-A9 (quad core), ARM Cortex-A8 (quad core), FPGA (Zynq), DSP C64x and GPU (Tegra 3)]

SLIDE 28

Message passing programming model

Actor model (Akka+Scala)

Annotations to provide information to the hardware

Operation below Safe Vdd

Approximate Computing

@Storage(Array("precise=false", "VF_relax=true"))
var x = 5

@Calculation(Array("VF_relax=true", "VF_det=DMR", "VF_corr=TM"))
def calc(first: Array[Double])

Rewrite/Expand annotated code with Scala Macro Annotations
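
The slides do not show how `@Storage` and `@Calculation` are declared. The sketch below assumes they are ordinary Scala annotations taking an array of "key=value" strings, and adds a small helper of the kind a macro annotation might use to read those options. The class declarations, the `parseOptions` helper and the body of `calc` are hypothetical placeholders; the macro expansion itself is not shown.

```scala
import scala.annotation.StaticAnnotation

// Hypothetical declarations; the project's real definitions are not in the slides.
object ParadimeAnnotations {
  final class Storage(options: Array[String]) extends StaticAnnotation
  final class Calculation(options: Array[String]) extends StaticAnnotation

  /** Split "key=value" option strings into a map, skipping malformed entries. */
  def parseOptions(options: Seq[String]): Map[String, String] =
    options.flatMap { opt =>
      opt.split("=", 2) match {
        case Array(key, value) => Some(key.trim -> value.trim)
        case _                 => None
      }
    }.toMap
}

object AnnotatedExample {
  import ParadimeAnnotations._

  @Storage(Array("precise=false", "VF_relax=true"))
  var x: Int = 5

  // Placeholder body: the slide gives only the signature.
  @Calculation(Array("VF_relax=true", "VF_det=DMR", "VF_corr=TM"))
  def calc(first: Array[Double]): Double = first.sum
}
```

Without a macro or compiler plugin these annotations have no effect at run time; they only mark the code that a rewrite step would expand.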

Programming model

SLIDE 29

This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 318693

Oscar Palomar (BSC) Energy-Aware Computing Workshop 2014

Power and Energy Modeling of Multi-core Processors for System-Level Design Space Exploration

SLIDE 30

Computation of power and measurement of voltage for OMAP

SLIDE 31

Power measurement environment