Rethinking Power, Resilience, and Sustainability Issues for Large-scale Computing and Storage Systems - PowerPoint PPT Presentation



SLIDE 1

Rethinking Power, Resilience, and Sustainability Issues for Large-scale Computing and Storage Systems

Devesh Tiwari
Assistant Professor, Northeastern University
tiwari@northeastern.edu

Selected Recent Publications

DSN’16, DSN’15, DSN’14; SC’16 (2), SC’15 (4), SC’14; MICRO’16, HPCA’16, HPCA’15; IPDPS’16, IPDPS’14, IPDPS’12; USENIX FAST’13, HPCA’11. Three Best Paper nominations.

Selected Recent PC Service

DSN’17, ICDCS’17, IPDPS’17; CCGrid’17, HotStorage’16; ICDCS’16, IPDPS’16, SC’15

Actionable analytical tools, runtime systems and libraries, and resource managers for improving application and system efficiency under power, temperature, performance, resilience, and operating-cost constraints

Data-intensive applications Large-scale compute & storage systems

SLIDE 2

How to provision, manage, and utilize resources in a data center?

One needs to know how a large-scale system is designed, built, and operated. What are the design trade-offs? What are the practical operational issues?

Research Goal: Improving Cost-efficiency of data centers

If you can’t convert something into dollars, it’s probably worth nothing.

Power-capping, workload performance, reliability, data center capex & opex cost optimizations

[Figure: peak power relative to one core for CoMD+MPI, MiniFE+MPI, and Snap+MPI across i7, Sandy Bridge, and Xeon Phi core counts; CDFs of peak- and average-power prediction error on i7 and Sandy Bridge. Panels (A) and (C).]

Improving operational efficiency, power/cooling, and reliability of heterogeneous data-center systems

SLIDE 3

Effective Management and Utilization of Large-scale Systems Accurate peak power prediction for different workloads across platforms

[Figure: instantaneous power vs. normalized runtime for CoMD and miniFE (MPI and OpenMP) on Sandy Bridge and Xeon Phi at varying core counts; peak power relative to one core vs. number of active cores, with floor and ceiling predictions for i7, Sandy Bridge, and Xeon Phi. Panel (A).]
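The floor and ceiling curves in the figure bracket predicted peak power as the number of active cores grows. As a toy sketch of one way such bounds could be formed (the square-root contention model and all constants below are illustrative assumptions, not the method from the slides):

```python
import math

# Hypothetical sketch: bracket a workload's peak power at n active cores from
# a single-core measurement. "ceil" assumes per-core power adds up fully;
# "floor" assumes shared-resource contention throttles added cores (modeled
# here, purely for illustration, with a square-root law).

def peak_power_bounds(p_idle, p_one_core, n_cores):
    """Return (floor, ceil) peak-power estimates in watts for n active cores.

    p_idle: platform idle (shared/uncore) power
    p_one_core: measured peak power with a single active core
    """
    per_core = p_one_core - p_idle          # dynamic power of one core
    ceil = p_idle + n_cores * per_core      # every core draws full per-core power
    floor = p_idle + per_core * math.sqrt(n_cores)  # contention-throttled cores
    return floor, ceil

# Illustrative numbers: 20 W idle, 35 W peak with one core, scaled to 4 cores
floor, ceil = peak_power_bounds(p_idle=20.0, p_one_core=35.0, n_cores=4)
```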

SLIDE 4

Effective Management and Utilization of Large-scale Systems Accurate peak power prediction for different workloads across platforms

[Figure: peak power relative to one core vs. number of active cores, with floor and ceiling predictions for i7, Sandy Bridge, and Xeon Phi; peak-power prediction error vs. normalized runtime for FT and LU (MPI) on Sandy Bridge. Panel (A).]

SLIDE 5

Effective Management and Utilization of Large-scale Systems

Power, temperature, and reliability driven optimizations for workloads in data centers (guided by machine learning models)

[Figure: real vs. PRACTISE-predicted supercomputer temperature (25-50 °C) over five hours. Panels (a) and (b).]
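PRACTISE predicts machine-room temperature with learned models. As a stand-in sketch only (not PRACTISE's actual method), a one-step temperature forecast can be made by fitting a simple AR(1) model to a trace; the trace below is synthetic:

```python
# Toy sketch: fit t[i+1] ~= a * t[i] + b by ordinary least squares on a
# temperature trace, then forecast one step ahead. The trace is synthetic
# and the AR(1) form is an illustrative assumption, not PRACTISE's model.

def fit_ar1(series):
    """Return (a, b) minimizing squared error of t[i+1] = a * t[i] + b."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

trace = [25.0, 27.0, 30.0, 34.0, 37.0, 39.0]  # degrees C, hourly samples
a, b = fit_ar1(trace)
next_temp = a * trace[-1] + b  # one-step-ahead forecast
```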

SLIDE 6

Careful Provisioning of Large-scale Systems Effective Management and Utilization of Large-scale Systems

[Figure: storage-system topology: two controllers with I/O modules, five disk enclosures with DEMs, baseboards, and disk drives D1-D56, each component fed by house and UPS power supplies.]

Algorithm 1 Spare Provisioning Algorithm

Input: current spare pool SP; replacement log and unit price of each type of FRU; annual budget for spare provisioning B.
Output: spare provisioning results X = [x1, x2, ..., xN].

  Obtain the number of spares in SP, n = [n1, n2, ..., nN];
  Calculate [m1, m2, ..., mN];
  Calculate [MTTR1, MTTR2, ..., MTTRN];
  for i = 1, 2, ..., N do
      Calculate yi, the expected number of failures of FRUi;
      Add yi into Y and MTTRi into MTTR;
      Add mi into m and bi into b;
  end for
  X = ResolveOptimizationModel(Y, MTTR, m, b, B);
  for i = 1, 2, ..., N do
      if ni < xi then add (xi − ni) spares of FRUi into SP;
  end for
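A minimal executable sketch of the provisioning loop above, with one big assumption: instead of the algorithm's optimization model (ResolveOptimizationModel), it greedily buys the spare with the best uncovered-failures-per-dollar ratio until the budget runs out. All FRU counts and prices are illustrative.

```python
# Greedy stand-in for the spare-provisioning optimization: repeatedly add the
# affordable spare type with the highest expected-failure coverage per dollar.

def provision_spares(current, expected_failures, price, budget):
    """Return target spares per FRU type (indices match the input lists)."""
    n = len(current)
    target = list(current)
    remaining = budget
    while True:
        best, best_score = None, 0.0
        for i in range(n):
            uncovered = expected_failures[i] - target[i]
            if uncovered > 0 and price[i] <= remaining:
                score = uncovered / price[i]
                if score > best_score:
                    best, best_score = i, score
        if best is None:          # nothing uncovered is still affordable
            break
        target[best] += 1
        remaining -= price[best]
    return target

# Example: three FRU types (say disk, controller, power supply)
spares = provision_spares(current=[2, 0, 1],
                          expected_failures=[5, 2, 3],
                          price=[100, 400, 150],
                          budget=800)
```

Note the greedy heuristic is only a sketch; the algorithm above solves a proper optimization model that also accounts for MTTR.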

[Figure: average unavailable duration over five years vs. annual provisioning budget (10,000 USD) for 48 SSUs in a RAID-6 configuration: optimized vs. controller-first, enclosure-first, and unlimited-budget policies.]

Capturing system topology, reliability, and performance needs to improve data availability duration (multi-tier storage hierarchy with SSDs)

[Figure: execution time vs. power cap level (25-60 W) for CG, MG, LU, FT, BT, SP, LUD, IS, EP, CFD, and lavaMD, with fitted curves CG: y = 22e^(−0.08x) + 1 and lavaMD: y = 35e^(−0.12x) + 1; instantaneous power traces per cap level; execution time (hours) vs. energy consumption (MWh) under first-order and high-order models.]
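The fitted curves on the slide (e.g., CG: y = 22e^(−0.08x) + 1) model execution time y, normalized to the uncapped run, as a function of the power cap x in watts. A sketch that evaluates this form and picks the lowest cap meeting a slowdown target; the coefficients come from the slide, but the 20% slowdown budget is an illustrative assumption:

```python
import math

# Evaluate the slide's fitted form y = a * exp(-b * x) + 1 and search the
# slide's 25-60 W cap range for the smallest cap within a slowdown budget.

def slowdown(a, b, cap_watts):
    """Normalized execution time at a given power cap."""
    return a * math.exp(-b * cap_watts) + 1.0

def lowest_cap(a, b, max_slowdown, caps=range(25, 61, 5)):
    """Smallest power cap (watts) whose predicted slowdown meets the target."""
    for cap in caps:
        if slowdown(a, b, cap) <= max_slowdown:
            return cap
    return None  # no cap in range meets the target

# Illustrative 20% slowdown budget, using the slide's CG and lavaMD fits
cg_cap = lowest_cap(a=22, b=0.08, max_slowdown=1.20)
lava_cap = lowest_cap(a=35, b=0.12, max_slowdown=1.20)
```

Because lavaMD's slowdown decays faster with the cap (b = 0.12 vs. 0.08), it tolerates a tighter cap than CG for the same slowdown budget.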

[Figure: execution time and energy consumption vs. power cap level and checkpoint interval (hours) for LU, SP, and BT; simulation vs. first-order and high-order models with coefficients α_t and α_e; power cap level vs. data volume (PB).]
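The figure compares first-order and higher-order checkpoint-interval models. A common first-order choice is Young's formula, τ = sqrt(2 · C · M), with checkpoint cost C and mean time between failures M; whether the slide uses exactly this form is an assumption, and the numbers below are illustrative:

```python
import math

# Young's first-order optimal checkpoint interval: tau = sqrt(2 * C * M),
# where C is the time to write one checkpoint and M is the system MTBF.
# Higher-order models refine this by keeping additional terms.

def young_interval(checkpoint_cost_s, mtbf_s):
    """First-order optimal checkpoint interval in seconds."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# e.g., a 5-minute checkpoint on a system with a 24-hour MTBF
tau = young_interval(checkpoint_cost_s=300.0, mtbf_s=24 * 3600.0)
```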

[Figure: MTBF (hours) vs. temperature (°C): lower bound, upper bound, average, and Arrhenius fit.]
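The Arrhenius curve in the figure relates device MTBF to temperature: failure rate scales as exp(−Ea / (k·T)), so MTBF at one temperature can be projected from a reference point. A sketch; the 0.7 eV activation energy and the reference numbers are common illustrative assumptions, not values from the slide:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_mtbf(mtbf_ref_h, temp_ref_c, temp_c, ea_ev=0.7):
    """Project MTBF (hours) from a reference temperature to another (Celsius).

    Acceleration factor: exp((Ea / k) * (1/T_ref - 1/T)), temperatures in kelvin.
    """
    t1 = temp_ref_c + 273.15
    t2 = temp_c + 273.15
    accel = math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t1 - 1.0 / t2))
    return mtbf_ref_h / accel

# Hotter components fail sooner: project a 40 C MTBF of 50 hours to 60 C
mtbf_60 = arrhenius_mtbf(mtbf_ref_h=50.0, temp_ref_c=40.0, temp_c=60.0)
```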

Power-capping, reliability, and performance optimization