From something that fits in your pocket ...
... to, well, this.
The future? ...
Energy
A look at cluster computers and datacenters
Tarun Prabhu, Radha Venkatagiri
Datacenters
Most (all?) of you probably know what they are
Most (all?) of you know what they are used for
Energy usage in datacenters
Used 76,000,000,000 kWh in 2010
2% of all electricity produced in the US
≈1.3% of all electricity produced globally †
†Koomey, J. “Growth in Data Center Electricity Use 2005 to 2010”, Analytics Press, 2011
Increasing datacenter efficiency
Reduce infrastructure overheads
Reduce ancillary (non-computing) costs
Reduce computing costs
How efficient are these?
PUE (Power Usage Effectiveness): indicates how much energy is used for non-computing functions.
Average PUE is 1.8 (this means that for every 1 Watt used for computing, another 0.8 Watts is used in overheads) †

Company    PUE   Comments
Facebook   1.07
Google     1.14  Individual facility goes to 1.06
Yahoo      –     Individual facility goes to 1.08
Amazon     1.45  Assumption by Amazon themselves
Microsoft  1.25  Target for April 2013
Apple      –     They shall never tell ...

Table: Efficiency of datacenter giants‡
†http://www.datacenterknowledge.com/archives/2011/05/10/uptime-institute-the-average-pue-is-1-8/
‡http://gigaom.com/cloud/whose-data-centers-are-more-efficient-facebooks-or-googles/
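The PUE figures above can be read as a simple ratio of total facility energy to the energy that reaches the computing equipment. The sketch below is only an illustration of that arithmetic; the function names and the sample meter readings are made up for this example and do not come from any of the cited sources.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Ratio of everything the facility draws to what the IT gear draws."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_per_it_watt(pue_value: float) -> float:
    """Watts of overhead spent for each watt used for computing."""
    return pue_value - 1.0


if __name__ == "__main__":
    # Hypothetical readings: 180 kWh at the facility meter, 100 kWh at the racks.
    average = pue(180.0, 100.0)
    print(f"PUE = {average:.2f}, overhead per IT watt = {overhead_per_it_watt(average):.2f} W")
    # The table's Facebook entry, read the same way.
    print(f"PUE 1.07 means {overhead_per_it_watt(1.07):.2f} W of overhead per IT watt")
```

Read this way, the gap between the 1.8 average and the values in the table is the gap between 0.8 W and well under 0.5 W of overhead per computing watt.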
Infrastructure overheads - What are they?
Infrastructure overheads - Cooling, power etc.
Heat management to reduce hot spots
Natural cooling
- Air - Buffalo (Yahoo), Lulea (Facebook)
- Sea-water - Hamina (Google)
- Evaporative cooling - Prineville (Facebook)
Optimize power distribution
- Efficient power supplies
- Minimize AC/DC conversion stages
Nifty new ideas, for instance
- Oil baths
Ancillary costs ...
What Facebook did
Toss everything and go back to the drawing board. Literally.
Open Compute Project
Facebook custom-designed ... everything †
- motherboards
- power supplies
- server chassis
- server racks
- battery cabinets
Kept only what was strictly necessary
38% more efficient, 24% cheaper
Made all specifications (CAD drawings etc.) publicly available: http://www.opencompute.org
†https://www.facebook.com/notes/facebook-engineering/building-efficient-data-centers-with-the-open-compute-project/10150144039563920
Reducing computing costs
Tackle under-utilization and overprovisioning
Server utilization
Average server utilization in datacenters is ≈ 50%
Reasons for under-utilization
Planning for traffic spikes
Reliability considerations
System-software maintenance is safer
Real reason
Clients get cranky!
Energy-proportional computing
Under-utilization is a problem because, as things stand today, power consumed is not proportional to work done.
Ideally, the dynamic range of energy consumption should be increased: no power consumed when idle, little power consumed when doing minimal work, and consumption increasing gradually until the machine is fully loaded.
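To make the proportionality argument concrete, here is a minimal sketch using a linear power model. The model itself, the 300 W peak and the 50% idle draw are assumptions chosen for illustration; they are not measurements from the talk.

```python
def power_w(utilization: float, peak_w: float = 300.0, idle_fraction: float = 0.5) -> float:
    """Linear model: idle power plus a load-dependent part (all values hypothetical)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    idle_w = idle_fraction * peak_w
    return idle_w + (peak_w - idle_w) * utilization


def relative_efficiency(utilization: float, idle_fraction: float) -> float:
    """Work per watt at this utilization, relative to the same server at full load."""
    if utilization == 0.0:
        return 0.0
    full_load_w = power_w(1.0, idle_fraction=idle_fraction)
    return utilization * full_load_w / power_w(utilization, idle_fraction=idle_fraction)


if __name__ == "__main__":
    for idle_fraction in (0.5, 0.0):  # 0.0 is the ideal energy-proportional server
        eff = relative_efficiency(0.5, idle_fraction)
        print(f"idle draw {idle_fraction:.0%} of peak: efficiency at 50% load = {eff:.2f}")
```

Under these assumptions a server that idles at half its peak power delivers only about two thirds of its full-load work per watt when it is 50% utilized, while the ideal proportional server loses nothing.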
Reducing compute costs - I
Tackling under-utilization with operating system support
Turn off/suspend hosts during low-usage periods
Intelligent load-balancing
Resource-aware scheduling
Power-aware scheduling
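One way to picture "turn off/suspend hosts during low-usage periods" is as a packing problem: squeeze the current load onto as few hosts as possible and suspend the rest. The sketch below uses first-fit decreasing, a standard bin-packing heuristic picked here purely for illustration; the talk does not say which policy any real scheduler uses, and the load values are invented.

```python
def consolidate(loads: list[float], capacity: float = 1.0) -> list[list[float]]:
    """Pack per-service loads onto few hosts using first-fit decreasing."""
    hosts: list[list[float]] = []  # each inner list holds the loads placed on one host
    for load in sorted(loads, reverse=True):
        if load > capacity:
            raise ValueError("a single load exceeds host capacity")
        for host in hosts:
            # Small tolerance so float rounding does not reject an exact fit.
            if sum(host) + load <= capacity + 1e-9:
                host.append(load)
                break
        else:
            hosts.append([load])  # no existing host had room: keep one more awake
    return hosts


if __name__ == "__main__":
    # Hypothetical low-usage period: six services, one per host before consolidation.
    loads = [0.5, 0.2, 0.4, 0.3, 0.1, 0.5]
    packed = consolidate(loads)
    print(f"hosts kept awake: {len(packed)}, hosts that could be suspended: {len(loads) - len(packed)}")
```

A real policy would also have to respect the reasons for under-utilization listed earlier, such as headroom for traffic spikes and reliability.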
Google’s warehouse computing
Google’s approach to building datacenters
Treat entire datacenter as one BIG computer
Centralized resource management
Provides greater flexibility in decision-making to improve metrics
Reducing compute costs - II
Customizing hardware to applications
An example - FAWN
Fast Array of Wimpy Nodes
Single-core AMD Geode processor (500 MHz)
256 MB DDR SDRAM (400 MHz)
4 GB CompactFlash storage
Intel Atom front-end
Academic research
Jointly optimize computing and cooling energy (ICDCS ’12)
Data-centric approaches by focusing on where to place data to minimize energy consumption (SC ’12)
Improving network and interconnect efficiency by scaling network up and down based on traffic demands (USENIX ’10)
Intelligent allocation of work to compute-units based on job characteristics, environmental conditions etc.
Supercomputers
Tens of thousands (or more) of compute elements operating together
TBs (now PBs) of memory
PBs (nearing EBs) of storage
Uses of supercomputers
Molecular dynamics
Fluid dynamics: Airframe design
Modelling astrophysical phenomena
Earthquake system science
Simulation of spread of contagion
Cosmology (formation of the first galaxies)
Climate modelling and hypothesis confirmation (of global warming perhaps)
Top 500 List
#   Name        Location  Cores   PFLOPS  Power(MW)
1   Titan       USA       560K*   17.59   8.21
2   Sequoia     USA       1572K   16.32   7.89
3   K Computer  Japan     705K    10.5    12.66
4   Mira        USA       786K    8.16    3.95
5   JuQueen     Germany   131K    4.14    1.97
6   SuperMUC    Germany   147K    2.89    3.42
7   Stampede    USA       204K*   2.66    –
8   Tianhe-1A   China     186K    2.56    4.04
9   Fermi       Italy     163K    1.72    0.82
10  DTS         USA       63K     1.51    3.57

Table: Power consumption of the world’s fastest computers
http://www.top500.org/list/2012/06/100
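The PFLOPS and power columns above can be combined into a single efficiency number. Since 1 PFLOPS is 10^15 FLOPS and 1 MW is 10^6 W, dividing one by the other gives GFLOPS per watt directly. The sketch below only does that division on a few rows copied from the table; the dictionary and function names are invented for the example.

```python
# (PFLOPS, power in MW) copied from the Top 500 table above.
TOP500 = {
    "Titan": (17.59, 8.21),
    "Sequoia": (16.32, 7.89),
    "K Computer": (10.5, 12.66),
    "Mira": (8.16, 3.95),
    "JuQueen": (4.14, 1.97),
}


def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """PFLOPS / MW equals GFLOPS / W because 1e15 / 1e6 = 1e9."""
    if megawatts <= 0:
        raise ValueError("power must be positive")
    return pflops / megawatts


if __name__ == "__main__":
    ranked = sorted(TOP500.items(), key=lambda item: gflops_per_watt(*item[1]), reverse=True)
    for name, (pflops, megawatts) in ranked:
        print(f"{name:<11} {gflops_per_watt(pflops, megawatts):.3f} GFLOPS/W")
```

Ranking by this ratio instead of by raw PFLOPS reorders the machines, which is the idea behind an efficiency-ordered list.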
How efficient is this?
These machines can simulate a rat’s brain
How efficient is this?
WARNING: Some Bad math here

Brain weight comparison
$W_{human} = 1400$ g†, $W_{rat} = 2$ g†, $P_{human} \approx 30$ W‡
$\therefore P_{rat\,brain} \approx \frac{2}{1400} \times 30 = 0.043$ W

Metabolism fraction (here $W$ and $P$ are whole-body values: 0.4 kg, 62 kg and 100 W)
$P_{rat} \approx \frac{W_{rat}}{W_{human}} \times P_{human} = \frac{0.4}{62} \times 100$ §‡ $= 0.64$ W
$P_{rat\,brain} = 0.05 \times P_{rat} = 0.032$ W

†http://faculty.washington.edu/chudler/facts.html
‡http://hypertextbook.com/facts/2001/JacquelineLing.shtml
§http://www.biomedcentral.com/1471-2458/12/439
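The two back-of-the-envelope estimates above can be reproduced in a few lines. This is just the slide's own arithmetic written out; the function and parameter names are invented here, and reading 0.4 and 62 as body weights in kg and 100 as whole-body watts is an interpretation of the slide rather than something it states.

```python
def rat_brain_power_by_weight(rat_brain_g: float = 2.0,
                              human_brain_g: float = 1400.0,
                              human_brain_w: float = 30.0) -> float:
    """Scale human brain power by the ratio of brain weights."""
    return rat_brain_g / human_brain_g * human_brain_w


def rat_brain_power_by_metabolism(rat_body_kg: float = 0.4,
                                  human_body_kg: float = 62.0,
                                  human_body_w: float = 100.0,
                                  brain_fraction: float = 0.05) -> float:
    """Scale whole-body power by body weight, then take the brain's share."""
    rat_body_w = rat_body_kg / human_body_kg * human_body_w
    return brain_fraction * rat_body_w


if __name__ == "__main__":
    print(f"brain weight comparison: {rat_brain_power_by_weight():.3f} W")
    print(f"metabolism fraction:     {rat_brain_power_by_metabolism():.3f} W")
```

Both routes land in the range of a few hundredths of a watt, which is what the comparison with the megawatt figures in the Top 500 table relies on.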
One of these ...
is equivalent to ...
Exascale?
Enough computing power to simulate the human brain (2019?)
Exascale?
Needs 700 MW or more?
Typical nuclear power plant produces 400-1200 MW
http://farm5.staticflickr.com/4011/4710638282 5e226f00f6.jpg http://images4.wikia.nocookie.net/ cb20100331223557/simpsons/images/0/0c/Springfield Nuclear Power Plant 1.PNG
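For a feel of where a figure of this size could come from, the sketch below scales a few machines from the Top 500 table linearly up to one exaflop (1000 PFLOPS), assuming their efficiency stays exactly as it is today. This is a naive extrapolation written for illustration; the slides do not say how the 700 MW estimate was obtained, and the function name is made up.

```python
# (PFLOPS, power in MW) copied from the Top 500 table earlier in the talk.
MACHINES = {
    "Titan": (17.59, 8.21),
    "Sequoia": (16.32, 7.89),
    "K Computer": (10.5, 12.66),
}


def exaflop_power_mw(pflops: float, megawatts: float, target_pflops: float = 1000.0) -> float:
    """Power needed at the target speed if efficiency did not improve at all."""
    if pflops <= 0:
        raise ValueError("performance must be positive")
    return megawatts * target_pflops / pflops


if __name__ == "__main__":
    for name, (pflops, megawatts) in MACHINES.items():
        print(f"{name:<11} scaled to 1 EFLOPS: {exaflop_power_mw(pflops, megawatts):.0f} MW")
```

With these rows the scaled figures run from roughly 470 MW for Titan to roughly 1200 MW for K Computer, so they sit in the same range as the output of a nuclear power plant quoted above.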
Improving efficiency of clusters
Clusters                                  Datacenters
Tightly coupled execution                 Requests are typically independent
Usually compute-intensive                 Usually data-intensive
I/O tends to occur in waves               Small amount of I/O most of the time
Drastically different application         Workload for any one application is
characteristics make tuning nearly        the same, so clusters of machines
impossible                                can be tuned
Improving efficiency of clusters
Modelling power consumption at a fine granularity is even harder
Where do you stick the meter?
Improving efficiency of clusters
Divide application into phases and run each phase in the best power mode
Reconfigurable network interconnects
Minimize communication in the program. Or else, exploit patterns in the communication and allocate interacting processes to nearby compute units
Use dynamic voltage and frequency scaling intelligently
Fall back on heterogeneity (each component of the cluster optimized for a specific task, e.g. wimpy nodes for I/O, GPUs for matrix operations)
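The first and fourth points can be combined into a small sketch: pick, for each phase, the voltage/frequency setting that uses the least energy while still meeting a deadline. It uses the common approximation that dynamic power scales as C·V²·f. The two operating points, the capacitance, the static power and the phase sizes are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_ghz: float
    volts: float


# Hypothetical DVFS settings, not taken from any real processor.
HIGH = OperatingPoint(freq_ghz=2.0, volts=1.2)
LOW = OperatingPoint(freq_ghz=1.0, volts=0.9)


def runtime_s(op: OperatingPoint, compute_gcycles: float, stall_s: float) -> float:
    """Compute time shrinks with frequency; memory/I/O stall time does not."""
    return compute_gcycles / op.freq_ghz + stall_s


def energy_j(op: OperatingPoint, compute_gcycles: float, stall_s: float,
             capacitance_nf: float = 1.0, static_w: float = 0.5) -> float:
    """Energy = (C * V^2 * f + static power) * runtime; nF * V^2 * GHz gives watts."""
    dynamic_w = capacitance_nf * op.volts ** 2 * op.freq_ghz
    return (dynamic_w + static_w) * runtime_s(op, compute_gcycles, stall_s)


def best_point(points, compute_gcycles: float, stall_s: float, deadline_s: float) -> OperatingPoint:
    """Lowest-energy setting that meets the deadline; fastest setting if none does."""
    feasible = [p for p in points if runtime_s(p, compute_gcycles, stall_s) <= deadline_s]
    if not feasible:
        return max(points, key=lambda p: p.freq_ghz)
    return min(feasible, key=lambda p: energy_j(p, compute_gcycles, stall_s))


if __name__ == "__main__":
    points = [HIGH, LOW]
    print("compute-bound phase:", best_point(points, compute_gcycles=10.0, stall_s=0.0, deadline_s=6.0))
    print("stall-heavy phase:  ", best_point(points, compute_gcycles=1.0, stall_s=9.0, deadline_s=10.5))
```

In this toy setup the compute-bound phase needs the fast setting to meet its deadline, while the stall-heavy phase loses almost no time at the slow setting and so runs there at much lower energy.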
The hard way
Programmer explicitly coding for energy efficiency?
Potentially nightmarish, but tool support might be possible (gcc -P2)?
Clusters: Green 500 List
#   Name     Country  GFlops/W  Power(kW)  Top 500 #
1   Beacon   USA      2.499     44.89      253
2   SANAM    KSA      2.351     179.5      53
3   Titan    USA      2.142     8209       1
4   Todi     Sui      2.121     129        91
5   JuQueen  Ger      2.102     1970       5
6   (UoT)    Can      2.101     41.09      401
7   (LLNL)   USA      2.101     41.09      399
8   (IBM)    USA      2.101     41.09      400
9   (IBM)    USA      2.101     82.19      140
10  CADMOS   Sui      2.100     82.19      141
How green are the Top 500?
#   Name        Power(MW)  Green 500 #
1   Titan       8.21       3
2   Sequoia     7.89       30
3   K Computer  12.66      85
4   Mira        3.95       29
5   JuQueen     1.97       5
6   SuperMUC    3.42       82
7   Stampede    –          142
8   Tianhe-1A   4.04       87
9   Fermi       0.82       23
10  DTS         3.57       159