
Accelerators in Technical Computing: Is it Worth the Pain? A TCO Perspective



  1. Accelerators in Technical Computing: Is it Worth the Pain? A TCO Perspective Sandra Wienke, Dieter an Mey, Matthias S. Müller Center for Computing and Communication JARA – High-Performance Computing RWTH Aachen University Rechen- und Kommunikationszentrum (RZ)

  2. Agenda
     - Introduction
     - Modeling
       - Total Cost of Ownership (TCO)
       - Comparison Metrics
     - Case Study on Accelerators
       - Programming Models & System Types
       - TCO Components @ RWTH
       - Real-World Application
       - Results
     - Conclusion & Outlook
     TCO of Accelerators | Sandra Wienke | Center for Computing and Communication

  3. Introduction
     - Today: variety of HPC clusters
       - Usage of accelerators (NVIDIA GPU, Intel Xeon Phi) motivated by a promising performance-per-watt ratio
     - System comparison by performance or performance per watt is not sufficient for a purchase decision
     - Total cost of ownership (TCO)
       - Acquisition costs, housing, operation costs, ...
       - Inclusion of manpower costs (administration & programming)
     - Comparison of costs per program run (application-dependent)
     - Investigation of a real-world software package: impact of manpower effort / programming model?
       - OpenMP on Intel Sandy Bridge
       - OpenMP + LEO on Intel Xeon Phi
       - OpenCL, OpenACC on NVIDIA Fermi GPU

  4. Modeling – Total Cost of Ownership (TCO)
     - Basis: single compute node, extrapolated to cluster size
     - Investment: $I = \mathrm{TCO}(n, \tau) = C_{ot}(n) + C_{pa}(n) \cdot \tau$
       - $n$: number of nodes
       - $\tau$: system lifetime
     - One-time costs $C_{ot}$
       - Per node: HW acquisition, building/infrastructure, OS/env. installation
       - Per node type: OS/env. installation, programming effort
     - Annual costs $C_{pa}$
       - Per node: HW maintenance, building/infrastructure, OS/env. maintenance, power consumption
       - Per node type: OS/env. maintenance, compiler/software, application maintenance
     - TCO depends on architecture & application
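
The TCO model on this slide can be sketched in a few lines of Python, reading the formula as TCO(n, τ) = C_ot(n) + C_pa(n) · τ. This is a minimal sketch, not the authors' spreadsheet: the class name `NodeTypeCosts`, its field names, and the assumption that per-node costs scale linearly with n while per-node-type costs are paid once are illustrative choices. The numbers in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeTypeCosts:
    """Cost inputs for one node type, in EUR (hypothetical container)."""
    one_time_per_node: float   # e.g. HW acquisition
    one_time_per_type: float   # e.g. programming effort
    annual_per_node: float     # e.g. HW maintenance, power, building share
    annual_per_type: float     # e.g. compiler/software, application maintenance


def tco(costs: NodeTypeCosts, n_nodes: float, lifetime_years: float) -> float:
    """TCO(n, tau) = C_ot(n) + C_pa(n) * tau.

    Assumption: per-node costs scale linearly with the number of nodes,
    per-node-type costs are independent of it.
    """
    c_ot = costs.one_time_per_node * n_nodes + costs.one_time_per_type
    c_pa = costs.annual_per_node * n_nodes + costs.annual_per_type
    return c_ot + c_pa * lifetime_years


# Hypothetical example values, not taken from the slides.
example = NodeTypeCosts(
    one_time_per_node=5000.0,
    one_time_per_type=1000.0,
    annual_per_node=1000.0,
    annual_per_type=0.0,
)
example_tco = tco(example, n_nodes=10, lifetime_years=4)
```

With these made-up inputs, `example_tco` is 10 · 5000 + 1000 + (10 · 1000) · 4 = 91,000 €.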

  5. Modeling – Comparison Metrics
     - Costs per program run $C_{ppr}$
       - Includes investment/TCO & application performance
       - $C_{ppr}(n, \tau) = \frac{\mathrm{TCO}(n, \tau)}{n_{ex}(\tau) \cdot n}$ with $n_{ex}(\tau) = \frac{k \cdot \tau}{t_{par}}$
       - $n$: number of nodes, $\tau$: system lifetime, $n_{ex}$: number of application executions, $k$: system usage rate, $t_{par}$: parallel runtime
     - Baseline used for system X: Intel Sandy Bridge (SNB) + OpenMP
       - $\frac{C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau)}{C_{ppr,OMP}(n_{OMP}, \tau)}$ is $< 0$ if X is beneficial and $\geq 0$ if OMP is beneficial
     - Break-even investment
       - Minimum budget needed so that system X is beneficial over OpenMP on SNB
       - Solve for $I$ with a given fixed lifetime $\tau$: $C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau) = 0$ with $\mathrm{TCO}(n, \tau) = I$
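
The comparison metrics can be illustrated with a short Python sketch. It reads the slide's formulas as C_ppr(n, τ) = TCO(n, τ) / (n_ex(τ) · n) with n_ex(τ) = k · τ / t_par, so n_ex is treated as executions per node. The closed-form break-even investment below is not from the slides: it is derived under the added assumptions that TCO is linear in the number of nodes (I = a · n + b, with a the per-node cost over the lifetime and b the per-node-type cost), that fractional node counts are allowed, and that both systems receive the same budget I. All function and parameter names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def executions_per_node(usage_rate, lifetime_years, t_par_s):
    """n_ex(tau) = k * tau / t_par, with tau converted from years to seconds."""
    return usage_rate * lifetime_years * SECONDS_PER_YEAR / t_par_s


def cost_per_run(tco_total, n_nodes, n_ex):
    """C_ppr = TCO / (n_ex * n)."""
    return tco_total / (n_ex * n_nodes)


def relative_cost(c_ppr_x, c_ppr_omp):
    """Negative if system X is beneficial, >= 0 if the OpenMP baseline is."""
    return (c_ppr_x - c_ppr_omp) / c_ppr_omp


def break_even_investment(per_node_x, per_type_x, n_ex_x,
                          per_node_omp, per_type_omp, n_ex_omp):
    """Budget I at which C_ppr,X equals C_ppr,OMP.

    Assumes I = per_node * n + per_type for both systems (linear TCO).
    Returns None if X is not cheaper per run at node level, in which case
    no minimum budget makes X beneficial under this sketch.
    """
    r_x = per_node_x / n_ex_x        # per-node cost per run on X
    r_omp = per_node_omp / n_ex_omp  # per-node cost per run on the baseline
    if r_x >= r_omp:
        return None
    return (r_x * per_type_omp - r_omp * per_type_x) / (r_x - r_omp)


# Hypothetical example: X runs twice as many executions per node,
# but carries 100 EUR of per-type cost (e.g. programming effort).
budget = break_even_investment(10.0, 100.0, 10.0, 10.0, 0.0, 5.0)
```

In this made-up example the break-even budget is 200 €: system X then affords 10 nodes and the baseline 20 nodes, and both reach the same cost per program run.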

  6. Case Study on Accelerators – Programming Models & System Types

     | Programming model                    | Accelerator                            | Host                                    | Compiler     |
     |--------------------------------------|----------------------------------------|-----------------------------------------|--------------|
     | Serial; OpenMP (simple, vectorized)  | -                                      | 2x Intel Sandy Bridge, 16 cores, 2 GHz  | Intel 13.0.1 |
     | LEO + OpenMP                         | Intel Xeon Phi 5110P, 60 cores         |                                         | Intel 13.0.1 |
     | OpenACC                              | NVIDIA Tesla C2050 (Fermi), ECC on     | 1x Intel Westmere, 4 cores, 2.4 GHz     | PGI 12.9     |
     | OpenCL                               | NVIDIA Tesla C2050 (Fermi), ECC on     | 1x Intel Westmere, 4 cores, 2.4 GHz     | Intel 13.0.1 |

  7. Case Study on Accelerators – TCO Components @ RWTH
     - One-time costs
       - HW purchase: list prices from Bull
       - Building/infrastructure: treated as annual costs since it is amortized over 25 years
       - OS/env. installation: -
       - Programming effort: full-time employee costs of 285.71 € per day
     - Annual costs
       - HW maintenance: 5% of HW purchase costs
       - Building/infrastructure: 200,000 € per year; costs per node: divide by 1.6 MW, then multiply by the max. power consumption of each node
       - OS/env. maintenance: 4 admins, 75% cluster maintenance (~2300 nodes): 180,000 € / 2300 = 78 € per node and year
       - Software/compiler: -
       - Power: PUE 1.5, regional electricity costs 0.15 €/kWh
       - Application maintenance: - (small kernels)
     - Given a lifetime of 4 years and an investment, the model yields C_ppr, #nodes, and #executions (usage rate 80%)
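
The RWTH cost components listed on this slide translate into simple per-node arithmetic, sketched below in Python. The constants are taken from the slide; the function names, the split into `max_power_w` and `avg_power_w`, and the assumption that a node draws its average power for all 8760 hours of a year are illustrative and not confirmed by the source. The hardware price and power figures in the example are hypothetical.

```python
# Constants from the slide.
BUILDING_COST_PER_YEAR_EUR = 200_000.0
BUILDING_POWER_CAPACITY_W = 1.6e6      # 1.6 MW
ADMIN_COST_PER_YEAR_EUR = 180_000.0
CLUSTER_NODES = 2300
PUE = 1.5
ELECTRICITY_EUR_PER_KWH = 0.15
HW_MAINTENANCE_RATE = 0.05             # 5% of HW purchase costs
DAILY_RATE_EUR = 285.71                # full-time employee, per day

# Assumption: the node runs (and draws power) all year.
HOURS_PER_YEAR = 8760


def annual_cost_per_node(hw_price_eur, max_power_w, avg_power_w):
    """Sum of the per-node annual cost components, in EUR per year."""
    hw_maintenance = HW_MAINTENANCE_RATE * hw_price_eur
    building = BUILDING_COST_PER_YEAR_EUR / BUILDING_POWER_CAPACITY_W * max_power_w
    admin = ADMIN_COST_PER_YEAR_EUR / CLUSTER_NODES
    power = avg_power_w / 1000.0 * HOURS_PER_YEAR * PUE * ELECTRICITY_EUR_PER_KWH
    return hw_maintenance + building + admin + power


def programming_cost(effort_days):
    """One-time programming effort cost per node type, in EUR."""
    return DAILY_RATE_EUR * effort_days


# Hypothetical node: 5000 EUR purchase price, 400 W max, 300 W average draw.
example_annual = annual_cost_per_node(5000.0, 400.0, 300.0)
example_effort = programming_cost(5.0)
```

For this made-up node the components are 250 € (HW maintenance), 50 € (building share), about 78 € (administration), and about 591 € (power), so roughly 970 € per node and year. Five days of programming effort cost about 1,429 €.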

  8. Case Study on Accelerators – Real-World Application
     - KegelSpan [2] (Source: BMW, ZF, Klingelnberg)
       - 3D simulation of bevel gear cutting process
     - Basis
       - Serial version
       - Small kernel
       - Kernel artificially increased from 25% to 90%
     - Assumption: homogeneous app. landscape
     [2] C. Brecher, C. Gorgels, and A. Hardjosuwito. Simulation based Tool Wear Analysis in Bevel Gear Cutting. In International Conference on Gears, volume 2108.2 of VDI-Berichte, pp. 1381–1384, Düsseldorf, VDI Verlag, 2010.

  9. Case Study on Accelerators – TCO Components of Application
     [Figure: bar charts of runtime [s], power consumption [W], and programming effort [days] for OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi), OpenMP-vec (SNB), and OpenMP-simp (SNB). Legible data labels: 158, 140, 119; effort values 5.0, 4.5, 3.5, 1.5, and 0.5 days.]

  10. Case Study on Accelerators – Results
      [Figure: costs per program run relative to OMP-simp (about -20% to 20%) over investment (0 € to 200K €) for OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi), and OpenMP-vec (SNB). Data labels: 3.62%, -12.09%, -16.82%, -17.15%.]
      [Figure: break-even investment (axis 0 € to 10,000 €). Data labels: 7,787 €, 7,231 €, 1,809 €.]

  11. Conclusion
      - Are accelerators beneficial? "It depends"
        - TCO spreadsheet [1] for own computations available
      - Our results (with 90% kernel portion), relative to SNB-OMP (4 years, 250 K€), show:
        - GPU Fermi beneficial over 2-socket Intel SNB server: -17% C_ppr
        - Intel Xeon Phi results disappointing for now: +4% C_ppr
          - Mainly due to high acquisition costs
          - NVIDIA Kepler probably similar
      - Programming effort impacts break-even investment (see OpenACC vs. OpenCL)
      - Bigger codes: increase of kernel size ~ increase of break-even investment
        - Projections possible (e.g. hybrid codes)
      [1] Wienke, S., an Mey, D., Müller, M.S.: Accelerators for Technical Computing: Is it Worth the Pain? TCO Spreadsheet. https://sharepoint.campus.rwth-aachen.de/units/rz/HPC/public/Shared%20Documents/WienkeEtAl_Accelerators-TCO-Perspective.xlsx, 2013

  12. Outlook
      - Hybrid code implementation (compare to projections)
      - Model extensions
        - New programming models & architectures (OpenMP 4.0, NVIDIA Kepler)
        - Network communication (MPI)
        - Mixed job execution (heterogeneous application landscape)
        - Assessment of decrease in runtime / gaining more results
      - Comprehensive TCO calculation with predictive powers
        - Performance, power consumption, manpower
      - Towards exascale computing, architectures might get more complex
        - More difficult to manage & program
        - Impact of manpower effort might get stronger
      Thank you for your attention!
