Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials


slide-1
SLIDE 1

Virtual Melting Temperature:

Managing Server Load to Minimize Cooling Overhead with Phase Change Materials

Matt Skach1, Manish Arora2,3, Dean Tullsen3, Lingjia Tang1, Jason Mars1

University of Michigan1 -- Advanced Micro Devices, Inc.2 -- UC San Diego3

ISCA '18

slide-2
SLIDE 2

Datacenters

Facebook Ireland datacenter

Huge warehouses full of servers that host the internet and the cloud

slide-3
SLIDE 3

Datacenter Cooling

  • Heat must be removed to prevent:

○ Overheating
○ Thermal downclocking
○ Component failure

http://www.asetek.com/media/1031/rackcdu_d2c_datacenter.jpg

slide-4
SLIDE 4

Global Energy Consumption (CIA World Factbook)

Energy Consumption   Electricity Consumption (TWh/year)
1  China             6,100
2  United States     4,100
3  European Union    3,100
4  India             1,300
5  Russia            1,000
6  Japan               980
7  Canada              640

slide-5
SLIDE 5

Datacenter Energy Consumption (Avgerinou, 2017)

Energy Consumption              Electricity Consumption (TWh/year)
1  China                        6,100
2  United States                4,100
3  European Union               3,100
   Datacenters (global, est.)   1,600
4  India                        1,300
5  Russia                       1,000
6  Japan                          980
7  Canada                         640

slide-6
SLIDE 6

Datacenter Energy Consumption (Avgerinou, 2017)

Energy Consumption                     Electricity Consumption (TWh/year)
1  China                               6,100
2  United States                       4,100
3  European Union                      3,100
   Datacenters (global, est.)          1,600
4  India                               1,300
5  Russia                              1,000
6  Japan                                 980
   Datacenter Cooling (global, est.)     650
7  Canada                                640
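
The estimates in the table above imply a rough share of datacenter electricity that goes to cooling. The sketch below only divides the two listed figures; the variable names are illustrative and the numbers are the slide's estimates, not new data.

```python
# Figures as listed in the table above (TWh/year, estimates).
DATACENTER_TOTAL_TWH = 1600
DATACENTER_COOLING_TWH = 650
CANADA_TWH = 640


def share(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


cooling_share = share(DATACENTER_COOLING_TWH, DATACENTER_TOTAL_TWH)
print(f"Cooling share of datacenter electricity: {cooling_share:.1f}%")
print(f"Cooling exceeds Canada's consumption: {DATACENTER_COOLING_TWH > CANADA_TWH}")
```

On these estimates, cooling accounts for roughly two fifths of datacenter electricity use.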

slide-7
SLIDE 7

Datacenter Cooling

  • Datacenter cooling is very expensive

○ Infrastructure can cost tens of millions of dollars for large DCs (Kontorinis, 2014)
○ Generally, more power-efficient systems are more expensive up front

Open Compute cooling system

slide-8
SLIDE 8

Datacenter Workloads

  • Diurnal load is problematic

○ Work is uneven
○ Work is distributed
○ Heat is produced when work is done

Google Search: US Load

slide-9
SLIDE 9

Datacenter Cooling

  • Build a big cooling system for peak load

○ Underutilized most of the time

Expensive: 100% coverage, low utilization

slide-11
SLIDE 11

Datacenter Cooling (cont.)

  • Build a big cooling system for peak load

○ Underutilized most of the time

Expensive: 100% coverage, low utilization
Best: 50% coverage, maximum utilization

slide-12
SLIDE 12

Thermal Time Shifting (TTS) [ISCA '15]

[Figure: cooling load vs. time of day (3am, 7am, 7pm, 12am), coupled vs. decoupled. Annotations: store heat to flatten peak; release heat during off hours]
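
The idea of storing heat at the peak and releasing it off hours can be sketched as a small buffer simulation. This is a minimal illustration, not the model used in the TTS work: `shift_load`, the cooling capacity, the storage capacity, and the sample heat trace are all hypothetical, and units are arbitrary.

```python
def shift_load(heat, cooling_cap, storage_cap):
    """Return the per-step cooling load when excess heat is buffered.

    Heat above `cooling_cap` is absorbed into storage (up to `storage_cap`);
    when heat is below `cooling_cap`, stored heat is released into the spare
    cooling capacity.
    """
    stored = 0.0
    cooling = []
    for h in heat:
        if h > cooling_cap:
            absorbed = min(h - cooling_cap, storage_cap - stored)
            stored += absorbed
            cooling.append(h - absorbed)
        else:
            released = min(cooling_cap - h, stored)
            stored -= released
            cooling.append(h + released)
    return cooling


# Hypothetical diurnal heat trace with a two-step peak.
heat_trace = [40, 40, 100, 100, 40, 40]
shifted = shift_load(heat_trace, cooling_cap=80, storage_cap=40)
print(shifted)
print(max(heat_trace), "->", max(shifted))
```

Total heat removed is unchanged; only the peak the cooling system must handle is lower.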

slide-13
SLIDE 13

Cooling Load

  • Metric of heat that must be removed
  • Datacenter is primarily concerned with IT & support equipment

http://www.slideshare.net/spsu/12-cooling-load-calculations

slide-14
SLIDE 14

A Phase Change Material (PCM)

  • Store energy in a Solid->Liquid phase change
  • Commercial paraffin wax offers the best properties of currently available PCMs (Skach, 2015)
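
To give a feel for how much heat a phase change can absorb, the sketch below applies Q = m * L. The latent heat value is an assumed round figure in the range commonly quoted for paraffin, not a number from the slides, and the wax mass and heat flow are hypothetical. Sensible heating before and after melting is ignored.

```python
# Assumed round figure for paraffin's latent heat of fusion (not from the slides).
LATENT_HEAT_KJ_PER_KG = 200.0


def stored_energy_kj(wax_mass_kg: float, melted_fraction: float = 1.0) -> float:
    """Latent energy absorbed by melting `melted_fraction` of the wax: Q = m * L."""
    return wax_mass_kg * melted_fraction * LATENT_HEAT_KJ_PER_KG


def minutes_to_melt(wax_mass_kg: float, heat_into_wax_w: float) -> float:
    """Time for a constant heat flow (in watts) to fully melt the wax."""
    joules = stored_energy_kj(wax_mass_kg) * 1000.0
    return joules / heat_into_wax_w / 60.0


print(stored_energy_kj(1.0))        # energy stored by 1 kg of fully melted wax, in kJ
print(minutes_to_melt(1.0, 100.0))  # minutes to melt 1 kg at a hypothetical 100 W
```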

slide-15
SLIDE 15

The problem with passive TTS

Thermal Time Shifting:

  • Paraffin has a limited range of melting temperatures
  • Melting temperature cannot be changed
  • Power and temperature profiles vary over lifetime of servers

Wikimedia Commons

slide-16
SLIDE 16

Virtual Melting Temperature

  • Datacenters need more flexibility
  • Create a “virtual” melting temperature separate from the actual melting temperature

Microsoft, Wikimedia Commons

slide-17
SLIDE 17

Test Infrastructure

  • 2U High Throughput Server
  • 2-day Google Workload trace divided between 5 datacenter workloads

slide-18
SLIDE 18

Test Methodology

  • 5 common datacenter workloads

1. Web Search
2. Data Caching
3. Video Encoding
4. Virus Scan
5. Clustering

  • Consider a datacenter where all five are colocated

○ Contention mitigation techniques applied (e.g., Bubble-Up (Mars, 2011) and Protean Code (Laurenzano, 2014))

slide-19
SLIDE 19

Baseline: Load Balancing Schedulers

  • Round Robin and Coolest First

slide-20
SLIDE 20

Baseline: Load Balancing Schedulers

  • Round Robin and Coolest First
  • Problem: Average cluster temperature is too low to melt wax

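
The two baseline policies can be sketched as follows. This is a toy illustration rather than the schedulers evaluated in the talk: the function names, the server temperature map, and the fixed `heat_per_job` increment are assumptions made for the example.

```python
from itertools import cycle


def round_robin(jobs, servers):
    """Assign jobs to servers in a fixed rotation."""
    rotation = cycle(servers)
    return {job: next(rotation) for job in jobs}


def coolest_first(jobs, temps, heat_per_job=1.0):
    """Assign each job to the currently coolest server.

    `temps` maps server name to temperature; each placed job is assumed
    (hypothetically) to raise its server's temperature by `heat_per_job`.
    """
    temps = dict(temps)  # work on a copy so the caller's map is untouched
    placement = {}
    for job in jobs:
        server = min(temps, key=temps.get)
        placement[job] = server
        temps[server] += heat_per_job
    return placement


print(round_robin(["a", "b", "c"], ["s0", "s1"]))
print(coolest_first(["a", "b", "c"], {"s0": 30.0, "s1": 28.0}, heat_per_job=1.5))
```

Both policies spread heat evenly across servers, which is why the average temperature stays below the wax's melting point.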
slide-21
SLIDE 21

Thermal Aware VMT

  • Categorize jobs based upon thermal characteristics

○ Binary classification: Would they melt significant wax in isolation?

slide-22
SLIDE 22

Thermal Aware VMT

  • Grouping Value (GV): Controllable ratio of group size

○ Proportional to hot group size

  • Locate ‘hot jobs’ together in ‘hot group’ to melt wax
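
A minimal sketch of the grouping idea: classify jobs as hot or cold, reserve a hot group of servers, and place hot jobs there. The slides do not define GV's units, so treating it as a percentage of servers (`gv_percent`) is an assumption here, as are the function names, the melt threshold, and the made-up per-job temperatures.

```python
def classify(job_temps, melt_threshold_c):
    """Binary classification: a job is 'hot' if its isolated temperature
    (a hypothetical per-job estimate) reaches the melt threshold."""
    hot = [job for job, t in job_temps.items() if t >= melt_threshold_c]
    cold = [job for job, t in job_temps.items() if t < melt_threshold_c]
    return hot, cold


def split_servers(servers, gv_percent):
    """Reserve the first gv_percent of servers as the hot group (assumed GV meaning)."""
    hot_count = len(servers) * gv_percent // 100
    return servers[:hot_count], servers[hot_count:]


def place(job_temps, servers, melt_threshold_c, gv_percent):
    """Place hot jobs on hot-group servers and cold jobs on the rest."""
    hot_jobs, cold_jobs = classify(job_temps, melt_threshold_c)
    hot_group, cold_group = split_servers(servers, gv_percent)
    placement = {}
    for group, jobs in ((hot_group, hot_jobs), (cold_group, cold_jobs)):
        if jobs and not group:
            raise ValueError("no servers available for this job class")
        for i, job in enumerate(jobs):
            placement[job] = group[i % len(group)]
    return placement


servers = [f"s{i}" for i in range(10)]
job_temps = {"job_a": 70.0, "job_b": 65.0, "job_c": 50.0, "job_d": 45.0}  # made-up values
print(place(job_temps, servers, melt_threshold_c=60.0, gv_percent=20))
```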

slide-23
SLIDE 23

Thermal Aware VMT Results

  • Hot Group sized to melt wax during peak hours

slide-24
SLIDE 24

Thermal Aware VMT Results

  • Balance between melting wax too soon and not melting enough wax

GV=24: Hot group is too big
GV=22: Hot group is just right
GV=20: Hot group is too small

slide-26
SLIDE 26

Wax Aware VMT

  • Begin with same setup as VMT-TA
  • When wax in hot group is fully melted, expand hot group
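
The expansion rule can be sketched as a single check run each scheduling interval. This is an illustrative sketch only: `expand_hot_group`, the `melted_fraction` map, and the one-server `step` are hypothetical, and the slides do not say how melt state is measured or how many servers are added at a time.

```python
def expand_hot_group(hot_group, cold_group, melted_fraction, step=1):
    """Grow the hot group once all of its wax is fully melted.

    `melted_fraction` maps server name to the fraction of its wax that has
    melted (0.0 to 1.0). If every hot-group server is at 1.0, move `step`
    servers from the cold group into the hot group.
    """
    fully_melted = all(melted_fraction[s] >= 1.0 for s in hot_group)
    if not fully_melted or not cold_group:
        return hot_group, cold_group
    return hot_group + cold_group[:step], cold_group[step:]


hot, cold = ["s0", "s1"], ["s2", "s3"]
print(expand_hot_group(hot, cold, {"s0": 1.0, "s1": 1.0, "s2": 0.0, "s3": 0.0}))
print(expand_hot_group(hot, cold, {"s0": 1.0, "s1": 0.8, "s2": 0.0, "s3": 0.0}))
```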

slide-27
SLIDE 27

Wax Aware VMT Results

  • Hot Group slightly too small: automatically expands during peak load

slide-28
SLIDE 28

Wax Aware VMT Results

  • Wax expansion preserves significant cooling load reduction

GV=24: Hot group is too big
GV=22: Hot group is just right
GV=20: Hot group is too small

slide-30
SLIDE 30

VMT-TA vs. VMT-WA

  • Both work well at ideal GV
  • VMT-WA offers much more flexibility for unpredictable load

[Figure labels: Smaller Hot Group, Bigger Hot Group]

slide-31
SLIDE 31

Summary

  • VMT stores thermal energy when passive TTS alone cannot

○ Reduces maximum cooling load of a diurnal workload
○ Configurable for varying datacenter power and load levels

  • VMT-enabled thermal energy storage can:

○ Reduce cooling system size by 12%
○ Or allow up to 14% more servers under the same cooling budget
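
One way to see that the two figures are roughly consistent is to assume peak cooling need scales linearly with server count. The slides do not state how the 14% was derived, so this is a back-of-the-envelope check under that assumption, with a hypothetical helper name.

```python
def extra_servers_pct(cooling_reduction_pct: float) -> float:
    """Extra servers supportable under a fixed cooling budget, assuming
    peak cooling need per server drops by `cooling_reduction_pct` and
    scales linearly with server count (an assumption, not from the slides)."""
    remaining = 1.0 - cooling_reduction_pct / 100.0
    return 100.0 * (1.0 / remaining - 1.0)


print(f"{extra_servers_pct(12.0):.1f}% more servers")
```

Under this assumption a 12% reduction frees room for a little under 14% more servers, in line with the "up to 14%" figure.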

slide-32
SLIDE 32

Thank you!

slide-33
SLIDE 33

Questions?
