

SLIDE 1

“Piz Daint” & “Piz Kesch”: from general purpose supercomputing to an appliance for weather forecasting

Thomas C. Schulthess
GTC 2016, San Jose, Wednesday April 6, 2016

SLIDE 2

“Piz Daint”: Cray XC30 with 5,272 hybrid, GPU-accelerated compute nodes

Compute node:
  • Host: Intel Xeon E5-2670 (Sandy Bridge, 8 cores)
  • Accelerator: NVIDIA K20X GPU (GK110)

SLIDE 3

“Piz Kesch”

“Today’s Outlook: GPU-accelerated Weather Forecasting”, John Russell, September 15, 2015

SLIDE 4

Swiss High-Performance Computing & Networking Initiative (HPCN)

Three-pronged approach of the HPCN Initiative:
  • 1. New, flexible, and efficient building
  • 2. Efficient supercomputers
  • 3. Efficient applications

Timeline (2009 to 2017):
  • Monte Rosa, Cray XT5, 14,762 cores; hex-core upgrade to 22,128 cores; upgrade to Cray XE6, 47,200 cores
  • Begin construction of new building; new building complete
  • High-risk & high-impact projects (www.hp2c.ch)
  • Development & procurement of petaflop/s-scale supercomputer(s): Phase I with Aries network & multi-core, Phase II with K20X-based hybrid, upgrade to Pascal-based hybrid
  • Application-driven co-design of pre-exascale supercomputing ecosystem

SLIDE 5

Platform for Advanced Scientific Computing (PASC)

  • Structuring project of the Swiss University Conference (swissuniversities)
  • 5 domain science networks: Climate, Physics, Solid Earth Dynamics, Materials simulations, Life Sciences
  • Distributed application support
  • >20 projects (see www.pasc-ch.org):
  • 1. ANSWERS
  • 2. Angiogenesis
  • 3. AV-FLOPW
  • 4. CodeWave
  • 5. Coupled Cardiac Simulations
  • 6. DIAPHANE
  • 7. Direct GPU to GPU com.
  • 8. Electronic Structure Calc.
  • 9. ENVIRON
  • 10. Genomic Data Processing
  • 11. GeoPC
  • 12. GeoScale
  • 13. Grid Tools
  • 14. Heterogen. Compiler Platform
  • 15. HPC-ABGEM
  • 16. MD-based drug design
  • 17. Multiscale applications
  • 18. Multiscale economical data
  • 19. Particles and fields
  • 20. Snowball sampling

SLIDE 6

SLIDE 7

Leutwyler, D., O. Fuhrer, X. Lapillonne, D. Lüthi, and C. Schär, 2015: Continental-Scale Climate Simulation at Kilometer Resolution. ETH Zurich Online Resource, DOI: http://dx.doi.org/10.3929/ethz-a-010483656; online video: http://vimeo.com/136588806

SLIDE 8

MeteoSwiss production suite until March 30, 2016

  • ECMWF: 2x per day, 16 km lateral grid, 91 layers
  • COSMO-7: 3x per day, 72 h forecast, 6.6 km lateral grid, 60 layers
  • COSMO-2: 8x per day, 24 h forecast, 2.2 km lateral grid, 60 layers

Some of the products generated from these simulations:

  • Daily weather forecast on TV / radio
  • Forecasting for air traffic control (Skyguide)
  • Safety management in the event of nuclear incidents

SLIDE 9

“Albis” & “Lema”: CSCS production systems for MeteoSwiss until March 2016

Cray XE6, procured in spring 2012, based on 12-core AMD Opteron multi-core processors

SLIDE 10

Improving simulation quality requires higher performance – what exactly and by how much?

Resource-determining factors for MeteoSwiss’ simulations:

  • COSMO-2 (current model, running through spring 2016): 24 h forecast running in 30 min, 8x per day
  • COSMO-1 (new model, starting operation in spring 2016): 24 h forecast running in 30 min, 8x per day (~10x COSMO-2)
  • COSMO-2E (new): 21-member ensemble, 120 h forecast in 150 min, 2x per day (~26x COSMO-2)
  • KENDA (new): 40-member ensemble, 1 h forecast in 15 min, 24x per day (~5x COSMO-2)

The new production system must deliver ~40x the simulation performance of “Albis” and “Lema”.

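As a quick sanity check of the ~40x figure (a back-of-envelope estimate added here, using only the ratios quoted above): the new suite replaces the single COSMO-2 stream with COSMO-1, COSMO-2E, and KENDA, whose relative costs of roughly 10x, 26x, and 5x add up to about 41 times the current COSMO-2 workload.

    // Back-of-envelope check of the ~40x requirement, relative to one COSMO-2
    // production stream (illustrative only; the ratios are the slide's estimates).
    #include <cstdio>

    int main() {
        const double cosmo1  = 10.0;  // COSMO-1: ~10x a COSMO-2 run
        const double cosmo2e = 26.0;  // COSMO-2E ensemble: ~26x
        const double kenda   =  5.0;  // KENDA assimilation cycle: ~5x
        std::printf("required capability: ~%.0fx COSMO-2\n", cosmo1 + cosmo2e + kenda);  // ~41x
        return 0;
    }
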
SLIDE 11

State-of-the-art implementation of a new system for MeteoSwiss

  • New system needs to be installed Q2-3/2015
  • Assuming a 2x improvement in per-socket performance, delivering ~40x the simulation performance still requires ~20x more x86 sockets, which corresponds to about 30 Cray XC cabinets
  • CSCS machine room (floor plan): the current Cray XC30/XC40 platform, with space for 40 XC racks; “Albis” & “Lema”, 3 Cray XE6 cabinets installed Q2/2012; a new MeteoSwiss system built the way the German Weather Service (DWD), the UK Met Office, or ECMWF built theirs would occupy 30 XC racks
  • Thinking inside the box is not a good option!

SLIDE 12

COSMO: old and new (refactored) code

  • Current code: main, physics, and dynamics all in Fortran, on top of MPI and the system; used by most weather services (incl. MeteoSwiss until 3/2016) as well as most HPC centres
  • New (refactored) code, an HP2C/PASC development: main in Fortran; physics in Fortran with OpenMP / OpenACC; dynamics in C++; a generic communication library for boundary conditions & halo exchange and a stencil library targeting x86 and GPU as shared infrastructure; MPI (or whatever) and the system underneath
  • The refactored code has been in production on “Piz Daint” since 01/2014 and for MeteoSwiss since 04/2016

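To make the “dynamics in C++ on a stencil library” idea concrete, here is a minimal sketch of the pattern: the stencil body is written once against an abstract field, and a backend decides how the loop nest (or GPU kernel) is executed. This is not the API of the actual stencil library used by COSMO (STELLA/GridTools); the names Field, Laplacian, and apply_interior are invented for illustration.

    // Minimal sketch of the stencil-library pattern behind the refactored COSMO
    // dynamics: the stencil (a 2D Laplacian here) is expressed once, independent
    // of how it is executed; a backend supplies the iteration. Illustrative only.
    #include <cstdio>
    #include <vector>

    struct Field {
        int ni, nj;
        std::vector<double> data;
        Field(int ni_, int nj_) : ni(ni_), nj(nj_), data(ni_ * nj_, 0.0) {}
        double& operator()(int i, int j)       { return data[i * nj + j]; }
        double  operator()(int i, int j) const { return data[i * nj + j]; }
    };

    // Point-wise stencil body, written without any loop or launch details.
    struct Laplacian {
        void operator()(const Field& in, Field& out, int i, int j) const {
            out(i, j) = -4.0 * in(i, j)
                      + in(i + 1, j) + in(i - 1, j)
                      + in(i, j + 1) + in(i, j - 1);
        }
    };

    // A simple CPU "backend": a loop nest over interior points. A GPU backend
    // would map the same stencil body onto a CUDA or OpenACC kernel instead.
    template <class Stencil>
    void apply_interior(const Stencil& stencil, const Field& in, Field& out) {
        for (int i = 1; i < in.ni - 1; ++i)
            for (int j = 1; j < in.nj - 1; ++j)
                stencil(in, out, i, j);
    }

    int main() {
        Field in(8, 8), out(8, 8);
        in(4, 4) = 1.0;                        // a single spike in the interior
        apply_interior(Laplacian{}, in, out);  // run the stencil on the CPU backend
        std::printf("out(4,4) = %.1f\n", out(4, 4));  // expect -4.0
        return 0;
    }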

SLIDE 13

Piz Kesch / Piz Escha: appliance for meteorology

  • Water-cooled rack (48U)
  • 12 compute nodes, each with:
    • 2 Intel Xeon E5-2690v3 (12 cores @ 2.6 GHz)
    • 256 GB 2133 MHz DDR4 memory
    • 8 NVIDIA Tesla K80 GPUs
  • 3 login nodes
  • 5 post-processing nodes
  • Mellanox FDR InfiniBand
  • Cray CLFS Lustre storage
  • Cray Programming Environment

SLIDE 14

Origin of the factor 40 performance improvement

  • Current production system installed in 2012; new Piz Kesch/Escha installed in 2015
  • Performance of COSMO running on the new “Piz Kesch” compared (in Sept. 2015) to: (1) the previous production system, a Cray XE6 with AMD Barcelona, and (2) “Piz Dora”, a Cray XC40 with Intel Haswell (E5-2690v3)
  • Contributions, grouped on the slide under Moore’s Law and software refactoring: processor performance, improved system utilisation, general software performance, port to GPU architecture, increase in number of processors
  • Individual factors quoted on the slide: 2.8x, 2.8x, 1.3x, 2.3x, 1.7x; total performance improvement: ~40x
  • Bonus: the simulation running on the GPU is 3x more energy efficient compared to a conventional state-of-the-art CPU

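The five individual factors quoted above multiply to roughly the stated total; the small check below (added here, not on the slide) only verifies this combination and does not assert which factor belongs to which contribution.

    // Product of the individual improvement factors quoted on the slide:
    // 2.8 x 2.8 x 1.3 x 2.3 x 1.7 ≈ 40. This only checks that the factors
    // combine to the quoted ~40x total; the per-item attribution is not asserted.
    #include <cstdio>

    int main() {
        const double factors[] = {2.8, 2.8, 1.3, 2.3, 1.7};
        double total = 1.0;
        for (double f : factors) total *= f;
        std::printf("combined improvement: ~%.1fx\n", total);  // prints ~39.9x
        return 0;
    }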

SLIDE 15

A factor 40 improvement with the same footprint

  • Current production system: Albis & Lema
  • New system: Kesch & Escha

SLIDE 16

Roadmap 2011 to 2017+ (timeline figure): Summit and Aurora; Xeon Phi (accelerated), GPU-accelerated hybrid, and multi-core architectures; DARPA HPCS; Tsubame 3.0; U. Tokyo; post-K; MeteoSwiss

Both architectures have heterogeneous memory!