

SLIDE 1

Rechen- und Kommunikationszentrum (RZ)

Accelerators in Technical Computing: Is it Worth the Pain?

A TCO Perspective Sandra Wienke, Dieter an Mey, Matthias S. Müller

Center for Computing and Communication JARA – High-Performance Computing RWTH Aachen University

SLIDE 2

TCO of Accelerators Sandra Wienke | Center for Computing and Communication


Agenda

• Introduction
• Modeling
  • Total Cost of Ownership (TCO)
  • Comparison Metrics
• Case Study on Accelerators
  • Programming Models & System Types
  • TCO Components @ RWTH
  • Real-World Application
  • Results
• Conclusion & Outlook

SLIDE 3

Introduction

• Today: variety of HPC clusters
• Use of accelerators (NVIDIA GPU, Intel Xeon Phi) motivated by a promising performance-per-watt ratio
• Comparing systems by performance or performance per watt is not sufficient for a purchase decision
• Total cost of ownership (TCO)
  • Acquisition costs, housing, operation costs, ...
  • Inclusion of manpower costs (administration & programming)
• Comparison of costs per program run (application-dependent)
• Investigation of a real-world software package
  • OpenMP on Intel Sandy Bridge
  • OpenMP + LEO on Intel Xeon Phi
  • OpenCL, OpenACC on NVIDIA Fermi GPU
• Open question: impact of manpower effort / programming model?

SLIDE 4

Modeling – Total Cost of Ownership (TCO)

• Basis: single compute node, extrapolated to the cluster size
• Investment $I = \mathrm{TCO}(n, \tau) = C_{ot}(n) + C_{pa}(n) \cdot \tau$
  • $n$: number of nodes
  • $\tau$: system lifetime
• One-time costs $C_{ot}$
  • Per node: HW acquisition, building/infrastructure, OS/env. installation
  • Per node type: OS/env. installation, programming effort
• Annual costs $C_{pa}$
  • Per node: HW maintenance, building/infrastructure, OS/env. maintenance, power consumption
  • Per node type: OS/env. maintenance, compiler/software, application maintenance
• TCO depends on architecture & application
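The model on this slide, TCO(n, τ) = C_ot(n) + C_pa(n) · τ, could be sketched in Python roughly as follows. The class name `CostModel`, its field names, and all numbers are hypothetical. The sketch also assumes that per-node costs scale linearly with the number of nodes while per-node-type costs are paid once, which is one plausible reading of the slide rather than the authors' spreadsheet logic.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Cost parameters for one node type (EUR; names are illustrative)."""
    one_time_per_node: float       # e.g. HW acquisition, OS/env. installation
    one_time_per_node_type: float  # e.g. programming effort
    annual_per_node: float         # e.g. HW maintenance, power, infrastructure
    annual_per_node_type: float    # e.g. compiler/software, app. maintenance

    def one_time_costs(self, n: int) -> float:
        """C_ot(n): one-time costs for n nodes (assumed linear in n)."""
        return self.one_time_per_node * n + self.one_time_per_node_type

    def annual_costs(self, n: int) -> float:
        """C_pa(n): annual costs for n nodes (assumed linear in n)."""
        return self.annual_per_node * n + self.annual_per_node_type

    def tco(self, n: int, tau: float) -> float:
        """TCO(n, tau) = C_ot(n) + C_pa(n) * tau, with tau in years."""
        return self.one_time_costs(n) + self.annual_costs(n) * tau


# Hypothetical numbers, not taken from the slides.
model = CostModel(
    one_time_per_node=5000.0,
    one_time_per_node_type=1000.0,
    annual_per_node=800.0,
    annual_per_node_type=200.0,
)
print(model.tco(n=10, tau=4))
```

Keeping the per-node and per-node-type parts separate makes it easy to see how programming effort, a per-node-type cost, is amortized over more nodes as the cluster grows.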

SLIDE 5

Modeling – Comparison Metrics

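• Costs per program run $C_{ppr}$
  • Includes investment/TCO & application performance

$$C_{ppr}(n, \tau) = \frac{\mathrm{TCO}(n, \tau)}{n_{ex}(\tau) \cdot n} \quad \text{with} \quad n_{ex}(\tau) = \frac{k \cdot \tau}{t_{par}}$$

• Baseline for comparing a system X: Intel Sandy Bridge (SNB) + OpenMP (OMP)

$$\frac{C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau)}{C_{ppr,OMP}(n_{OMP}, \tau)} \; \begin{cases} < 0 & \text{if X beneficial} \\ \geq 0 & \text{if OMP beneficial} \end{cases}$$

• Break-even investment
  • Min. budget needed so that system X is beneficial over OpenMP on SNB
  • Solve for $I$ with given fixed lifetime $\tau$:

$$C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau) = 0 \quad \text{with} \quad \mathrm{TCO}(n, \tau) = I$$

Symbols: $n$: number of nodes; $\tau$: system lifetime; $n_{ex}$: #app. executions; $k$: system usage rate; $t_{par}$: parallel runtime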
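As a rough illustration of these metrics, the sketch below computes the costs per program run, C_ppr = TCO / (n_ex · n) with n_ex = k · τ / t_par, the relative difference to the OMP baseline, and a break-even investment. All function names and inputs are hypothetical. The break-even solver additionally assumes that, for a fixed lifetime, TCO is linear in the node count (a fixed part plus a per-node part) and that fractional node counts are acceptable. The slides do not state how the original spreadsheet solves for I.

```python
def executions_per_node(k: float, tau: float, t_par: float) -> float:
    """n_ex(tau) = k * tau / t_par; tau and t_par must share one time unit."""
    return k * tau / t_par


def cost_per_program_run(tco: float, n: float, n_ex: float) -> float:
    """C_ppr(n, tau) = TCO(n, tau) / (n_ex(tau) * n)."""
    return tco / (n_ex * n)


def relative_difference(c_ppr_x: float, c_ppr_omp: float) -> float:
    """< 0: system X is beneficial; >= 0: the OMP baseline is beneficial."""
    return (c_ppr_x - c_ppr_omp) / c_ppr_omp


def break_even_investment(fixed_x: float, per_node_x: float, n_ex_x: float,
                          fixed_omp: float, per_node_omp: float,
                          n_ex_omp: float) -> float:
    """Investment I at which C_ppr,X equals C_ppr,OMP.

    Assumes TCO(n, tau) = fixed + per_node * n for the given lifetime, so
    n(I) = (I - fixed) / per_node, and C_ppr = I / (n_ex * n(I)).
    Setting both C_ppr equal gives a linear equation in I.
    """
    a_x = n_ex_x / per_node_x        # runs gained per extra euro on X
    a_omp = n_ex_omp / per_node_omp  # runs gained per extra euro on baseline
    if a_x <= a_omp:
        raise ValueError("X does not gain more runs per euro than the baseline")
    return (a_x * fixed_x - a_omp * fixed_omp) / (a_x - a_omp)


# Hypothetical inputs: EUR over the whole lifetime, runs per node.
i_star = break_even_investment(2000.0, 8000.0, 200.0, 500.0, 6000.0, 100.0)
n_x = (i_star - 2000.0) / 8000.0
n_omp = (i_star - 500.0) / 6000.0
diff = relative_difference(cost_per_program_run(i_star, n_x, 200.0),
                           cost_per_program_run(i_star, n_omp, 100.0))
print(round(i_star, 2), abs(diff) < 1e-9)
```

Under these assumptions, budgets above the returned value favor system X, because its higher fixed cost (for example programming effort) is outweighed by more program runs per euro spent on nodes.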

SLIDE 6

Case Study on Accelerators – Programming Models & System Types

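Programming Model | Accelerator | Host | Compiler
Serial | - | 2x Intel Sandy Bridge, 16 cores, 2 GHz | Intel 13.0.1
OpenMP (simple, vectorized) | - | 2x Intel Sandy Bridge, 16 cores, 2 GHz | Intel 13.0.1
LEO + OpenMP | Intel Xeon Phi 5110P, 60 cores | 1x Intel Westmere, 4 cores, 2.4 GHz | Intel 13.0.1
OpenACC | NVIDIA Tesla C2050 (Fermi), ECC on | 1x Intel Westmere, 4 cores, 2.4 GHz | PGI 12.9
OpenCL | NVIDIA Tesla C2050 (Fermi), ECC on | 1x Intel Westmere, 4 cores, 2.4 GHz | Intel 13.0.1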

SLIDE 7

Case Study on Accelerators – TCO Components @ RWTH

• One-time costs
  • HW purchase: list prices from Bull
  • Building/infrastructure: treated as annual costs since it is amortized over 25 years
  • OS/env. installation: -
  • Programming effort: full-time employee costs of 285.71 € per day
• Annual costs
  • HW maintenance: 5% of HW purchase costs
  • Building/infrastructure: 200,000 € per year; costs per node: divide by 1.6 MW, multiply by max. power consumption of each node
  • OS/env. maintenance: 4 admins, 75% maintenance; cluster of ~2300 nodes: 180,000 € / 2300 = 78 € per node and year
  • Software/compiler: -
  • Power: PUE 1.5, regional electricity costs 0.15 €/kWh
  • Application maintenance: - (small kernels)
• Given lifetime of 4 years & investment → Cppr
  • #nodes, #executions (usage rate 80%)
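The arithmetic behind these components can be sketched as below. The rates (5%, 200,000 € over 1.6 MW, 180,000 € over 2300 nodes, PUE 1.5, 0.15 €/kWh, 285.71 € per day) come from this slide. The function names, the example node, and the assumption that a node draws its average power for all 8760 hours of a year are illustrative and not confirmed by the source.

```python
HOURS_PER_YEAR = 8760  # assumption: the node draws power around the clock


def annual_costs_per_node(hw_price_eur: float, max_power_w: float,
                          avg_power_w: float) -> dict:
    """Annual per-node cost components in EUR, using the rates on this slide."""
    costs = {
        # 5% of the HW purchase costs
        "hw_maintenance": 0.05 * hw_price_eur,
        # 200,000 EUR per year spread over 1.6 MW, scaled by max. node power
        "building": 200_000 / 1.6e6 * max_power_w,
        # 180,000 EUR per year for administration, shared by ~2300 nodes
        "os_env_maintenance": 180_000 / 2300,
        # energy in kWh, times PUE 1.5, times 0.15 EUR/kWh
        "power": avg_power_w / 1000 * HOURS_PER_YEAR * 1.5 * 0.15,
    }
    costs["total"] = sum(costs.values())
    return costs


def programming_costs(effort_days: float) -> float:
    """One-time programming costs at 285.71 EUR per person-day."""
    return effort_days * 285.71


# Hypothetical node: 6,000 EUR list price, 400 W max., 300 W average draw.
node = annual_costs_per_node(6000.0, 400.0, 300.0)
print({name: round(value, 2) for name, value in node.items()})
print(round(programming_costs(5.0), 2))
```

Whether the 80% usage rate should also scale the power term is not stated on the slide, so the sketch leaves it out.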

SLIDE 8

Case Study on Accelerators – Real-World Application

• Basis
  • Serial version
  • Small kernel
  • Assumption: homogeneous app. landscape
• KegelSpan [2]
  • 3D simulation of the bevel gear cutting process
  • Kernel portion artificially increased from 25% to 90%

Source: BMW, ZF, Klingelnberg

[2] C. Brecher, C. Gorgels, and A. Hardjosuwito. Simulation based Tool Wear Analysis in Bevel Gear Cutting. In International Conference on Gears, volume 2108.2 of VDI-Berichte, pp. 1381–1384, Düsseldorf, VDI Verlag, 2010.

SLIDE 9

Case Study on Accelerators – TCO Components of Application

[Figure: bar charts of programming effort [days], power consumption [W], and runtime [s] for OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi), OpenMP-vec (SNB), and OpenMP-simp (SNB). Data labels shown: effort 5.0, 1.5, 4.5, 3.5, 0.5; further labels 119, 140, 158.]

SLIDE 10

Case Study on Accelerators – Results

[Figure, top: break-even investment per variant, axis from 0 € to 10,000 €. Data labels shown: 7,787 €, 1,809 €, 7,231 €.]

[Figure, bottom: costs per program run (relative to OMP-simp, axis from -20% to 20%) over investment (0 € to 200 K€). Data labels shown: -17.15%, 3.62%, -12.09%, -16.82%.]

Series: OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi), OpenMP-vec (SNB)

SLIDE 11

Conclusion

• Are accelerators beneficial? “It depends”
  • TCO spreadsheet [1] for own computations available
• Our results (w/ 90% kernel portion) show
  • Fermi GPU beneficial over a 2-socket Intel SNB server (-17% Cppr relative to SNB-OMP at 4 years, 250 K€)
  • Intel Xeon Phi results disappointing for now (+4% Cppr relative to SNB-OMP at 4 years, 250 K€), mainly due to high acquisition costs
  • NVIDIA Kepler probably similar
  • Programming effort impacts break-even investment (see OpenACC → OpenCL)
• Bigger codes: increase of kernel size ~ increase of break-even investment
  • Projections possible (e.g. hybrid codes)

[1] Wienke, S., an Mey, D., Müller, M.S.: Accelerators for Technical Computing: Is it Worth the Pain? TCO Spreadsheet. https://sharepoint.campus.rwth-aachen.de/units/rz/HPC/public/Shared%20Documents/WienkeEtAl_Accelerators-TCO-Perspective.xlsx, 2013

SLIDE 12

Outlook

• Hybrid code implementation (compare to projections)
• Model extensions
  • New programming models & architectures (OpenMP 4.0, NVIDIA Kepler)
  • Network communication (MPI)
  • Mixed job execution (heterogeneous application landscape)
  • Assessment of decrease in runtime / gaining more results
• Comprehensive TCO calculation with predictive power
  • Performance, power consumption, manpower
• Towards exascale computing, architectures might get more complex
  • More difficult to manage & program
  • Impact of manpower effort might get stronger

Thank you for your attention!