Toward Runtime Power Management of Exascale Networks by On/Off Control of Links

SLIDE 1

Toward Runtime Power Management of Exascale Networks by On/Off Control of Links

Ehsan Totoni, University of Illinois at Urbana-Champaign, PPL
Charm++ Workshop, April 16, 2013

SLIDE 2

Power challenge

ò Power is a major challenge ò Blue Waters consuming up to 13 MW

ò Enough to electrify a small town ò Power and cooling infrastructure

ò Up to 30% of power in network

ò Projected for future by Peter Kogge ò Saving 25% power in current Cray XT system by turning down network

ò Work from Sandia

2 Ehsan Totoni

SLIDE 3

Network link power

ò Network is not “energy proportional”

ò Consumption is not related to utilization ò Near peak most of the time ò Unlike processor

ò Recent study:

ò Work from Google in ISCA’10 ò 50% of power in network of non-HPC data center ò When CPU’s underutilized

ò Up to 65% of network’s power is in links

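Combining the upper bounds quoted on this slide and the previous one gives a rough ceiling on the share of machine power that link on/off control could address. This is back-of-the-envelope arithmetic that assumes both "up to" figures hold for the same machine at the same time, which the slides do not claim:

```latex
\[
f_{\text{link}} = f_{\text{net}} \times f_{\text{link}\mid\text{net}} \le 0.30 \times 0.65 = 0.195
\]
```

Under that assumption, links would account for at most about 19.5% of total machine power.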
SLIDE 4

Exascale networks

ò Dragonfly

ò IBM PERCS in Power 775 machines ò Cray Aries network in XC30 “Cascade” ò DOE Exascale Report

ò High dimensional Tori

ò 5D Torus in IBM Blue Gen/Q ò 6D Torus in K Computer

ò Higher radix -> a lot of links!

SLIDE 5

Communication patterns

ò Applications’ communication patterns are different ò Network topology designed for a wide range of applications

MILC NPB CG

SLIDE 6

Fraction of links ever used

[Figure: Link usage (%) for NAMD_PME, NAMD, MILC, CG, MG, and BT; series: Full Network, PERCS, 3D Torus, 6D Torus]

SLIDE 7

Nearest neighbor usage

[Figure: Link usage (%) for Jacobi2D, Jacobi3D, and Jacobi4D; series: Full Network, PERCS, 3D Torus, 6D Torus]

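The effect on these slides can be reproduced qualitatively with a toy model. The sketch below is an illustration under stated assumptions, not the setup used for the plots: it assumes a default linear (row-major) mapping of ranks to torus nodes, minimal dimension-ordered routing, and a periodic nearest-neighbor stencil, and it requires every torus dimension to have at least 3 nodes so the + and - links of a dimension are distinct. The function names (`coords`, `rank_of`, `route`, `link_usage_fraction`) are hypothetical.

```python
from math import prod


def coords(rank, dims):
    """Row-major coordinates of `rank` in a grid of shape `dims` (last dim fastest)."""
    out = []
    for size in reversed(dims):
        out.append(rank % size)
        rank //= size
    return tuple(reversed(out))


def rank_of(coord, dims):
    """Inverse of coords(): the row-major rank of `coord`."""
    rank = 0
    for x, size in zip(coord, dims):
        rank = rank * size + x
    return rank


def route(src, dst, dims):
    """Yield the directed links (node, dim, step) of a minimal dimension-ordered path."""
    cur = list(src)
    for k, size in enumerate(dims):
        delta = (dst[k] - cur[k]) % size
        if delta == 0:
            continue
        # Take the shorter way around the ring; ties go in the + direction.
        step = 1 if delta <= size - delta else -1
        hops = delta if step == 1 else size - delta
        for _ in range(hops):
            yield (tuple(cur), k, step)
            cur[k] = (cur[k] + step) % size


def link_usage_fraction(app_dims, net_dims):
    """Fraction of directed torus links ever used by a periodic nearest-neighbor stencil."""
    assert prod(app_dims) == prod(net_dims), "one rank per torus node"
    assert all(size >= 3 for size in net_dims), "+/- links must be distinct"
    used = set()
    for rank in range(prod(app_dims)):
        p = coords(rank, app_dims)
        src = coords(rank, net_dims)  # hypothetical linear mapping: rank r on node r
        for k, size in enumerate(app_dims):
            for step in (1, -1):
                q = list(p)
                q[k] = (q[k] + step) % size
                dst = coords(rank_of(q, app_dims), net_dims)
                used.update(route(src, dst, net_dims))
    # Two directed links (+ and -) per dimension per node.
    total = prod(net_dims) * 2 * len(net_dims)
    return len(used) / total


if __name__ == "__main__":
    print(link_usage_fraction((3, 3, 3), (3, 3, 3)))  # 3D stencil on a 3D torus: 1.0
    print(link_usage_fraction((9, 3), (3, 3, 3)))     # 2D stencil on a 3D torus: ~0.78
```

In the second case the linear mapping sends most traffic along the last two torus dimensions; links in the first dimension are crossed only by the "carry" hops between blocks of three ranks, so about a fifth of all links are never used.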
SLIDE 8

More expensive links

[Figure: Link usage (%) by PERCS link type (LL, LR, D, and all links) for NAMD_PME, NAMD, MILC, CG, MG, and BT]

SLIDE 9

Nearest neighbor

[Figure: Link usage (%) by PERCS link type (LL, LR, D, and all links) for Jacobi2D, Jacobi3D, and Jacobi4D]

SLIDE 10

Solution to power waste

ò Many of the links are never used

ò For common applications

ò Are networks over-built? Maybe

ò FFTs are crucial ò But processors are also overbuilt

ò Let’s make them “energy proportional”

ò Consume according to workload ò Just like processors

ò Turn off unused links

ò Commercial network exists (Motorola)

SLIDE 11

Runtime system solution

ò Hardware can cause delays

ò According to related work ò Not enough application knowledge

ò Small window size

ò Compiler does not have enough info

ò Input dependent program flow

ò Application does not know hardware

ò Significant programming burden to expose

ò Runtime system is the best

ò mediates all communication

ò knows the application ò knows the hardware

SLIDE 12

Feasibility

ò Not probably available for your cluster downstairs

ò Need to convince hardware vendors ò Runtime hints to hardware, small delay penalty if wrong

ò Multiple jobs: interference

ò Isolated allocations are becoming common

ò Blue Genes allocate cubes already

ò Capability machines are for big jobs

SLIDE 13

Software design choices

[Figure: Link usage (%) of Jacobi3D (300K) under Default, Random, and Indirect configurations; series: LL links, LR links, D links, all links]

- Random mapping and indirect routing have similar performance but different link usage

SLIDE 14

Power model

ò We saw many links that are never used ò Used links are not used all the time

ò For only a fraction of iteration time ò Compute-communicate paradigm

ò A power model for “network capacity utilization”

ò “Average” utilization of all the links ò Assume that links are turned magically on and off

ò At the exact right time

ò No switching overhead ò Example: network used one tenth of iteration time

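One way to read the model above in code: with idealized on/off switching, a link draws power only while it is busy, so utilization is the busy time summed over all links (never-used links contribute zero) divided by the number of links times the iteration time. The function names and the interval-list input format are hypothetical, chosen for this sketch.

```python
def busy_time(intervals):
    """Total length of the union of possibly overlapping (start, end) intervals."""
    total = 0.0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping message transfers keep the same link busy.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def network_capacity_utilization(busy, num_links, iteration_time):
    """Average fraction of one iteration during which the links carry traffic.

    busy: dict mapping a link id to its busy (start, end) intervals within one
    iteration; links absent from the dict were never used.
    """
    used = sum(busy_time(intervals) for intervals in busy.values())
    return used / (num_links * iteration_time)


if __name__ == "__main__":
    # Hypothetical example: every link busy for one tenth of a 1 s iteration.
    links = {i: [(0.0, 0.1)] for i in range(4)}
    print(network_capacity_utilization(links, num_links=4, iteration_time=1.0))
```

Under this idealized model, link power would fall to roughly U times its always-on value, which is why a low U signals a large saving opportunity.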
SLIDE 15

Model results

[Figure: Network capacity utilization U (%) for NAMD, MILC, CG, MG, and BT on PERCS and 6D Torus]

SLIDE 16

Scheduling on/offs

ò Runtime roughly knows when a message will arrive

ò For common iterative HPC applications ò Low noise systems (e.g. IBM Blue Genes)

ò There is a delay for switching the link

ò 10μs for current implementation ò Much smaller than iteration time ò Runtime can be conservative

ò Schedule “on”s earlier ò Similar to having more switching delay

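A minimal sketch of how a runtime might turn predicted message times into an on/off schedule for a single link. It assumes that each transition (on or off) takes `delay` seconds during which the link still draws power, that the link must be fully on `guard` seconds before a predicted burst (the conservative early "on"), and that one iteration is analyzed in isolation by clipping to [0, period]. `link_schedule` and its parameters are hypothetical, not the runtime's actual interface.

```python
def link_schedule(busy, delay, guard, period):
    """Return merged powered intervals and the fraction of `period` the link is powered.

    busy:   predicted (start, end) busy intervals of one link within an iteration
    delay:  on/off transition time; the link draws power while switching
    guard:  extra lead time so the link is already on when a message arrives
    period: iteration length; windows are clipped to [0, period]
    """
    # Begin switching on `guard + delay` before each burst and finish
    # switching off `delay` after it ends.
    windows = sorted(
        (max(0.0, s - guard - delay), min(period, e + delay)) for s, e in busy
    )
    merged = []
    for start, end in windows:
        # Overlapping windows mean the idle gap is too short to pay for an
        # off/on cycle, so the link simply stays on.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    powered = sum(end - start for start, end in merged)
    return merged, powered / period


if __name__ == "__main__":
    # Hypothetical 10 ms iteration, 10 us switching delay, 10 us guard.
    windows, fraction = link_schedule([(0.002, 0.003)], delay=10e-6, guard=10e-6, period=0.01)
    print(windows, fraction)  # link powered for roughly 10.3% of the iteration
```

Raising `guard` widens each powered window on the "on" side exactly as a longer switching delay would, which matches the point above about conservative scheduling.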
SLIDE 17

Delay overhead

[Figure: Network capacity utilization U (%) versus link transition delay (0.01 to 10 ms) for NAMD, MILC, CG, MG, and BT]

SLIDE 18

Results summary

[Figure: Machine power saving potential (%) for NAMD_PME, MILC, CG, MG, and BT; series: Basic PERCS, Basic 6D Torus, Schedule with 1 ms delay on PERCS, Schedule with 1 ms delay on 6D Torus]

SLIDE 19

Questions?

Are you convinced?
