SLIDE 6
Careful Provisioning of Large-scale Systems
Effective Management and Utilization of Large-scale Systems
[Figure: Storage system architecture. Two controllers (Controller1, Controller2) connect to five disk enclosures (Disk Enclosure 1 to Disk Enclosure 5). Each enclosure holds disk drives (D1 to D56) attached through disk enclosure modules (DEMs) and I/O modules, with house and UPS power supplies. Component types shown: House Power Supply (Controller), UPS Power Supply (Controller), Controller, I/O Module, House Power Supply (Disk Enclosure), UPS Power Supply (Disk Enclosure), Disk Enclosure, DEM, Baseboard, Disk Drive.]
Algorithm 1 Spare Provisioning Algorithm
Input: Current spare pool SP, replacement log and unit price of each type of FRU, annual budget for spare provisioning B.
Output: Spare provisioning results X = [x1, x2, ..., xN].
  Obtain number of spares in SP, n = [n1, n2, ..., nN];
  Calculate [m1, m2, ..., mN];
  Calculate [MTTR1, MTTR2, ..., MTTRN];
  for i = [1, 2, ..., N] do
    Calculate yi, the expected number of failures of FRUi;
    Add yi into Y, and MTTRi into MTTR;
    Add mi into m, and bi into b;
  end for
  X = ResolveOptimizationModel(Y, MTTR, m, b, B)
  for i = [1, 2, ..., N] do
    if ni < xi then
      Add (xi − ni) spares of FRUi into SP;
    end if
  end for
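The control flow of Algorithm 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the slide does not define ResolveOptimizationModel, so `greedy_solver` is a hypothetical stand-in and not the optimization model itself. The `FruType` fields, the reading of m_i as an impact weight, and the scoring rule are likewise assumptions, and y_i is taken as already computed from the replacement log.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FruType:
    name: str
    unit_price: float         # b_i, unit price of this FRU type
    expected_failures: float  # y_i, assumed precomputed from the replacement log
    mttr_hours: float         # MTTR_i
    impact: float             # m_i; treated here as a weight (assumption)


def greedy_solver(frus: List[FruType], budget: float) -> List[int]:
    """Hypothetical stand-in for ResolveOptimizationModel.

    Buys one spare at a time for the FRU type with the highest
    impact * MTTR / price score, until expected failures are covered
    or no affordable candidate remains.
    """
    x = [0] * len(frus)
    remaining = budget
    while True:
        best, best_score = None, 0.0
        for i, f in enumerate(frus):
            if x[i] >= f.expected_failures or f.unit_price > remaining:
                continue
            score = f.impact * f.mttr_hours / f.unit_price
            if score > best_score:
                best, best_score = i, score
        if best is None:
            return x
        x[best] += 1
        remaining -= frus[best].unit_price


def provision_spares(
    spare_pool: Dict[str, int],
    frus: List[FruType],
    budget: float,
    solver: Callable[[List[FruType], float], List[int]] = greedy_solver,
) -> Dict[str, int]:
    """Top up the spare pool to the solver's targets, as in Algorithm 1."""
    n = [spare_pool.get(f.name, 0) for f in frus]  # current spares n_i
    x = solver(frus, budget)                       # target spares x_i
    for f, n_i, x_i in zip(frus, n, x):
        if n_i < x_i:
            spare_pool[f.name] = n_i + (x_i - n_i)  # add (x_i - n_i) spares
    return spare_pool
```

As in the listing, the budget is handed to the solver as a whole and existing spares only reduce how many units are added afterwards. A real model could be passed in through the `solver` parameter.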
[Figure: Average Unavailable Duration in 5 Years (Hours) versus Annual Provision Budget (10,000 USD), for 48 SSUs in a RAID6 configuration. Legend: Optimized, Controller-first, Enclosure-first, Unlimited Budget.]
Capturing system topology, reliability, and performance needs to improve data availability duration (multi-tier storage hierarchy with SSDs)
[Figure: Execution Time versus Power Cap Level (watt), one panel per benchmark, including MG, LU, FT, BT, SP, LUD, IS, EP, CFD, and lavaMD. Fitted curves: CG: y = 22e^{-0.08x} + 1; lavaMD: y = 35e^{-0.12x} + 1.]
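The two fitted curves for CG and lavaMD can be evaluated directly. The sketch below assumes that x is the power cap level in watts and y is the execution time on the plotted scale. The slide does not state the units of y, so the printed values are only relative.

```python
import math

# Fitted curves as printed on the slide: y = a * exp(-k * x) + c
FITS = {
    "CG": (22.0, 0.08, 1.0),
    "lavaMD": (35.0, 0.12, 1.0),
}


def fitted_value(benchmark: str, power_cap_watts: float) -> float:
    """Evaluate the exponential fit for one benchmark at a given power cap."""
    a, k, c = FITS[benchmark]
    return a * math.exp(-k * power_cap_watts) + c


# Tabulate both fits over the cap range shown on the axis (30 to 60 W).
for cap in (30, 40, 50, 60):
    row = ", ".join(f"{name}: {fitted_value(name, cap):.2f}" for name in FITS)
    print(f"{cap} W -> {row}")
```

Both curves decay toward the constant term 1 as the cap rises, so under this reading each additional watt of cap yields a smaller reduction in execution time.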
[Figure: Power Consumption (watts) versus Time (s), eight panels, one per power cap: 25 W, 30 W, 35 W, 40 W, 45 W, 50 W, 55 W, 60 W.]
[Figures: Execution Time (Hours) and Energy Consumption (MWh) plotted against Power Cap Level (watt) and against Checkpoint Interval (Hours) for the SP and BT benchmarks. Legend: First order, High order. Further panels plot Power Cap Level (watt) against Data Volume (PB). Panel labels: αt+, αt−, αe+, αe−, N.]
[Figure: MTBF (hours) versus Temperature (°C), eight panels. Legend: Upper Bound, Average, Arrhenius.]
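The "Arrhenius" curve in the MTBF plots refers to the standard Arrhenius temperature model. A minimal sketch of how such a curve is commonly computed is given below. The activation energy of 0.7 eV and the reference point are illustrative assumptions, not values taken from the slide, and the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_mtbf(
    mtbf_ref_hours: float,
    temp_ref_c: float,
    temp_c: float,
    activation_energy_ev: float = 0.7,  # assumed value, for illustration only
) -> float:
    """Scale a reference MTBF to another temperature with the Arrhenius model.

    The acceleration factor is exp((Ea / k) * (1/T_ref - 1/T)), with
    temperatures in kelvin; MTBF shrinks by that factor as T rises.
    """
    t_ref = temp_ref_c + 273.15
    t = temp_c + 273.15
    accel = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_ref - 1.0 / t)
    )
    return mtbf_ref_hours / accel


# Example: a hypothetical component with 100,000 h MTBF at 30 °C.
for temp in (30, 40, 50, 60):
    print(f"{temp} °C -> {arrhenius_mtbf(100_000.0, 30.0, temp):.0f} h")
```

Under this model MTBF falls monotonically with temperature, which matches the axes of the plots. How the "Upper Bound" and "Average" curves are derived is not stated in the surviving text.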
Power-capping, reliability, and performance