Running the FIM and NIM Weather Models on GPUs Mark - PowerPoint PPT Presentation

Running the FIM and NIM Weather Models on GPUs
Mark Govett, NOAA Earth System Research Laboratory

Next-Generation Weather Models increasingly dependent on computing
• Improved Prediction of
  – Hurricanes
  – Severe Weather
  – Wind & Solar Energy
  – Regional Climate
  – Aviation Weather
  – Transportation
• Global Models: FIM, NIM
• Regional Models: Rapid Refresh, HRRR
• Data Assimilation
• Grids
  – Lat-lon grid
  – 240-km icosahedral grid: 10,242 polygons
  – Real-time FIM forecasts, 10-km: 8.8M polygons
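
The 10,242-polygon figure for the 240-km grid matches the standard cell count of an icosahedral grid, 10·n² + 2, where n is the number of segments each icosahedron edge is divided into. The sketch below assumes the grid comes from five recursive bisections (n = 2^5 = 32). The function name is illustrative and is not taken from FIM or NIM, and the 10-km count is not checked here.

```python
def icosahedral_cell_count(n: int) -> int:
    """Number of cells on an icosahedral grid whose edges are split into n segments.

    Assumption: one cell per grid vertex (12 pentagons, the rest hexagons).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 10 * n * n + 2


# Assumed: five recursive bisections, so each edge has 2**5 = 32 segments.
print(icosahedral_cell_count(2 ** 5))  # 10242
```

Each bisection doubles n, so the cell count grows by roughly a factor of four per refinement level.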

FIM Global Forecast of Hurricane Sandy
Wednesday, October 24, 2012
• Correctly predicted a 948 mb low pressure into Northern New Jersey
• Under-predicted the winds at 50 knots

HRRR Forecast of Hurricane Sandy
• ESRL/GSD’s HRRR model consistently predicted gusts above 70 knots from the southeast over the New York area up to 15 hours in advance.

NIM & FIM Fine-Grain Parallelization
• Goal
  – Maintain a single source code
    • GPU, MIC, CPU, serial, parallel
  – Directives used for parallelization
• Directive-based Compilers
  – NVIDIA GPU: CAPS, PGI, CRAY, F2C-ACC
  – AMD GPU: CAPS
  – MIC: OMP + extensions
  – MPI: Scalable Modeling System (SMS)
    - Developed in ESRL, used for 2 decades
• Code optimization and comparisons
  – Some architecture-friendly code changes explored

NIM Development
• Developed by a team of modelers, computational scientists, parallel programmers
• Designed for Fine-Grain architectures
  – GPUs & Intel Phi
• Targeting 3.5 km resolution
[Timeline figure: 2008 through 2014]

F2C-ACC Compiler
• Directive-based Compiler: !ACC$<directive>
• Developed in 2009 to speed code conversion of NIM
• Goal is to have a single source code that runs on CPU, GPU and MIC
  – Important for code developers (scientists)
  – Reduces development time
  – Allows for direct performance comparisons between CPU, GPU, MIC
• Being used to parallelize
  – NIM, FIM dynamics and WRF / YSU physics
• Working with the GPU compiler vendors on improvements
  – CAPS, PGI, CRAY

F2C-ACC Compiler Improvements
• Ease of Use
  – Automatic generation of data movement
  – Assumes data is resident on the CPU
• Bit-for-bit correctness with CPU
  – Improvements mimic Fortran behavior
    • POW, MAX, MIN intrinsics
  – Variable promotion: adding an array dimension
• Performance
  – Variable demotion: removing an array dimension
  – Control of global, local, shared and register memory
  – Options for blocking and chunking
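
Variable demotion can be pictured with a generic example. The slides do not show how F2C-ACC performs the transformation, so the sketch below only illustrates the idea in Python: an intermediate array that is used within a single loop iteration is replaced by a scalar, which on a GPU could then live in a register instead of global memory. The routine names and the arithmetic are hypothetical.

```python
def scaled_square_with_array(u, dx):
    # Original form: the intermediate 'work' has the same dimension as u.
    n = len(u)
    work = [0.0] * n
    out = [0.0] * n
    for i in range(n):
        work[i] = u[i] * u[i]
        out[i] = work[i] / dx
    return out


def scaled_square_demoted(u, dx):
    # Demoted form: 'work' is only needed inside one iteration,
    # so the array dimension is removed and it becomes a scalar.
    out = [0.0] * len(u)
    for i in range(len(u)):
        work = u[i] * u[i]
        out[i] = work / dx
    return out


u = [0.1 * k for k in range(8)]
# Same operations in the same order, so the results are identical.
assert scaled_square_with_array(u, 0.25) == scaled_square_demoted(u, 0.25)
```

Promotion is the reverse step: a scalar gains an array dimension so that each parallel iteration has its own copy.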

Standalone Test Codes
• From FIM, NIM, WRF routines
• Share with vendors
  – PGI, CAPS, Cray, Intel, NVIDIA
• Fortran Language
  – Test compilers' ability to support our codes
    • Modules, nested routines & kernels, loop structure
    • Fortran intrinsics, user-defined functions
• Correctness
  – GPU results are bit-for-bit exact versus CPU
• Performance
  – Direct comparisons between F2C-ACC and commercial compilers
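
A bit-for-bit check is stricter than comparing within a tolerance: two values pass only if their binary representations are identical. The following is a minimal sketch of such a check for double-precision output, assuming the CPU and GPU results are available as Python lists of floats. The helper names are hypothetical and are not part of the test codes described above.

```python
import struct


def float_bits(x: float) -> bytes:
    # Raw IEEE 754 double-precision representation of x.
    return struct.pack("<d", x)


def bitwise_equal(cpu, gpu) -> bool:
    # True only if both sequences have the same length and every
    # pair of values has an identical bit pattern.
    return len(cpu) == len(gpu) and all(
        float_bits(a) == float_bits(b) for a, b in zip(cpu, gpu)
    )


print(bitwise_equal([1.5, 2.25], [1.5, 2.25]))  # True
print(bitwise_equal([0.1 + 0.2], [0.3]))        # False: differs in the last bit
```

Comparing bit patterns also distinguishes 0.0 from -0.0, which an ordinary `==` comparison would treat as equal.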

Performance Results
Runtimes in seconds
• No changes to the Fortran, only the F2C-ACC directives
• Explicit use of GPU memories was always better than GPU global memory with cache.
• Different F2C-ACC optimizations were effective for different routines

GPU columns: F2C-ACC, 1 socket. CPU column: Westmere, 2 sockets.
Routine: GPU global memory, GPU shared memory, GPU shared + demotion, GPU block or chunking, GPU best, CPU best
trcadv: 2.07, 1.79, 1.72, 1.53, 1.28, 4.22
cnuity: 5.21, 3.20, 1.19, 1.08, 1.46
momtum: 0.57, 0.52, 0.41, 1.67
vdmintv: 12.5, 7.50, 3.68, 58.7*
wrf_pbl: 52.3, 3.04, 39.0*
Rows with fewer than six values had empty cells on the slide; their values are listed in slide order.

FIM Performance – CPU, GPU, MIC
• Run times in seconds
• GPU timings used F2C-ACC compiler
  – Commercial compilers' performance was between Global and Optimized
• Intel Xeon Phi (MIC) – SE10x pre-production chip
  – 61 cores, 1.091 GHz, 8 GB memory

All columns are 1 socket.
FIM Dynamics Routines: Intel CPU Westmere, Fermi GPU Global Mem, Fermi GPU Optimized, Intel CPU SandyBridge, Intel Xeon Phi - KNC
trcadv: 4.04, 2.07, 1.28, 3.13, 1.59
cnuity: 2.89, 5.20, 1.04, 1.46, 0.74
momtum: 1.67, 0.57, 0.41, 0.93, 2.03
hybgen: 6.01, in progress, in progress, 4.09, 3.40
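
The FIM dynamics timings can be turned into speedups by dividing a baseline runtime by the runtime on the target device. The sketch below uses the three routines that have a value in every column and takes the one-socket Westmere time as the baseline. The choice of baseline and the dictionary keys are assumptions made for illustration.

```python
# Runtimes in seconds, copied from the FIM dynamics table (1 socket each).
# hybgen is omitted because its GPU timings are listed as "in progress".
runtimes = {
    "trcadv": {"westmere": 4.04, "gpu_global": 2.07, "gpu_optimized": 1.28,
               "sandybridge": 3.13, "xeon_phi": 1.59},
    "cnuity": {"westmere": 2.89, "gpu_global": 5.20, "gpu_optimized": 1.04,
               "sandybridge": 1.46, "xeon_phi": 0.74},
    "momtum": {"westmere": 1.67, "gpu_global": 0.57, "gpu_optimized": 0.41,
               "sandybridge": 0.93, "xeon_phi": 2.03},
}


def speedup(routine: str, target: str, baseline: str = "westmere") -> float:
    # Values above 1 mean the target ran faster than the baseline.
    row = runtimes[routine]
    return row[baseline] / row[target]


for name in runtimes:
    print(f"{name}: optimized GPU {speedup(name, 'gpu_optimized'):.2f}x, "
          f"Xeon Phi {speedup(name, 'xeon_phi'):.2f}x")
```

On these numbers the optimized GPU version is faster than one Westmere socket for all three routines, while the Xeon Phi is slower than Westmere for momtum.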

Looking Forward to Kepler
• Architectures are diverse and continue to evolve
  – Optimizations can differ depending on chip
• Challenge to retain single source & performance portability

GPU Chip: Tesla (2008) C1060, Fermi (2010) C2050/70, Fermi (2011) C2090, Kepler (2012) K10, Intel MIC Knights Corner
Cores: 240, 448, 512, 2 x 1536, 61
- Clock Speed: 1.15 GHz, 1.3 GHz, 0.74 GHz
- Flops SP: 0.9 TF, 1.0 TF, 1.3 TF, 4.58 TF
Memory: 2 GB, 3-6 GB, 6 GB, 8 GB
- Bandwidth: 102 GB/sec, 144 GB/sec, 177 GB/sec, 320 GB/sec
- Shared/L1: 64 KB, 64 KB, 64 KB
Power: 188 W, 238 W, 225 W, 225 W
Programming Features: CUDA, Cache Mem, Dynamic Parallelism
Rows with fewer than five values had empty cells on the slide; their values are listed in slide order.
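
One way to see how the chips differ is the ratio of peak single-precision throughput to memory bandwidth, meaning how many floating-point operations a chip can perform per byte it reads from memory. The sketch below assumes that the four Flops SP values and the four Bandwidth values belong to the four GPU columns in order; that mapping is an inference, since the slide layout was lost. The dictionary and function names are illustrative.

```python
# (peak single-precision TFLOPS, memory bandwidth in GB/s)
# Assumption: the four values in each row map to the four GPUs in column order.
chips = {
    "Tesla C1060": (0.9, 102.0),
    "Fermi C2050/70": (1.0, 144.0),
    "Fermi C2090": (1.3, 177.0),
    "Kepler K10": (4.58, 320.0),
}


def flops_per_byte(tflops: float, gb_per_sec: float) -> float:
    # 1 TFLOPS = 1000 GFLOPS, so the result is FLOPs per byte of memory traffic.
    return (tflops * 1000.0) / gb_per_sec


for name, (tflops, bandwidth) in chips.items():
    print(f"{name}: {flops_per_byte(tflops, bandwidth):.1f} flops per byte")
```

Under this assumption the K10 has roughly twice the ratio of the Fermi parts, so a memory-bound routine would see less benefit from the extra peak throughput.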

Conclusion & Future Work
• FIM, NIM GPU & MIC parallelization going well
• F2C-ACC development will continue for now
  – Evaluation of commercial GPU compilers planned in 2013
• Optimizations of parallel NIM continuing
  – In earnest once we have access to Titan
• Complete GPU parallelization of FIM
  – Provide as standalone test case to vendors
