Research and Forecasting (WRF) Model Bormin Huang Space Science and - PowerPoint PPT Presentation

CUDA Implementation of the Weather Research and Forecasting (WRF) Model
Bormin Huang
Space Science and Engineering Center, University of Wisconsin-Madison
SC13, NVIDIA Booth #613, Colorado Convention Center

Outline
• Numerical weather prediction (NWP)
• Weather Research and Forecasting (WRF) Model
• GPU WSM5 Optimization
• Benchmarks
• Validation of the results
• Conclusions
Image: Wielicki, Bruce A., and Coauthors, 2013: Achieving Climate Change Absolute Accuracy in Orbit. Bull. Amer. Meteor. Soc., 94, 1519-1539.

What is numerical weather prediction (NWP)?
• Numerical weather prediction uses mathematical models of the atmosphere and oceans to predict the weather based on current weather conditions.
• First attempted in the 1920s
• Computer simulation in the 1950s -> NWP produced realistic results.
• Advances in NWP linked with advances in CS
• Major application in HPC business
Weather models use systems of differential equations based on the laws of physics, fluid motion, and chemistry, and use a coordinate system which divides the planet into a 3D grid. Winds, heat transfer, solar radiation, relative humidity, and surface hydrology are calculated within each grid cell, and the interactions with neighboring cells are used to calculate atmospheric properties in the future.
Image: Atmospheric model schematic, Wikimedia Commons

Grid spacing (resolution)
• Grid spacing (resolution) defines the scale of features you can simulate with the model
• “Global” vs. “regional”: regional = higher resolution over smaller domain
Image: NASA satellite photograph of the Hawaiian Islands, Wikimedia Commons

WRF Overview
• WRF is a mesoscale and global Weather Research and Forecasting model
• Designed for both operational forecasters and atmospheric researchers
• WRF is currently in operational use at numerous weather centers around the world
• WRF is suitable for a broad spectrum of applications across domain scales ranging from meters to hundreds of kilometers.
• Increases in computational power enable
  - Increased vertical as well as horizontal resolution
  - More timely delivery of forecasts
  - Probabilistic forecasts based on ensemble methods
• Why accelerators?
  - Cost performance
  - Need for strong scaling
Image: WRF simulation of Hurricane Rita (2005) tracks, Wikimedia Commons
Image: Welcome Remarks, 14th Annual WRF Users' Workshop.

WRF system components
• The WRF physics categories are microphysics, cumulus parametrization, planetary boundary layer (PBL), land-surface model, and radiation.
Source: Jimy Dudhia, WRF physics options

Performance Profile of WRF
Code lines (f90): WSM5 1553, Others 511557
% Runtime (WSM5 vs. Others): the slide's charts for the two workloads, CONUS 12km * and Jan. 2000 30km *, show the values 25 / 75 and 9 / 91.
* John Michalakes, “Code restructuring to improve performance in WRF model physics on Intel Xeon Phi”, Workshop on Programming weather, climate, and earth-system models on heterogeneous multi-core platforms, September 20, 2013

WRF Microphysics
• Microphysics provides atmospheric heat and moisture tendencies.
• Microphysics includes explicitly resolved water vapor, cloud, and precipitation processes.
• Surface snowfall and rainfall are computed by microphysical schemes.
• Several bulk water microphysics schemes are available within the WRF, with different numbers of simulated hydrometeor classes and methods for estimating their size distributions, fall speeds, and densities
Figure: Microphysics processes in the WSM5 scheme (water vapor, cloud water, cloud ice, rain, and snow, linked by the processes Pidep, Psdep, Pcond, Pigen, Pracw, Psaci, Praut, Psaut, Psevp, Prevp, Psacw, and Psmlt)

Analyzing the WSM5 on CONUS 12 km domain
• Arithmetic intensity (= FLOPS / byte)
  - high arithmetic intensity -> computation bound
  - low arithmetic intensity -> memory bound
• WSM5 CONUS 12km workload, measured using cachegrind (valgrind):
  - 24.25 billion instructions
  - 7.30 billion memory reads
  - 3.18 billion memory writes
  -> 0.83 instructions / byte
• Tesla K20 delivers up to 3519 GFLOPS / 208 GB/s ~ 16.9 FLOPS/byte
• Arithmetic intensity is relatively low -> reduce memory accesses
Figure: Arithmetic intensity spectrum from O(1) through O(log(N)) to O(N), with BLAS 1, BLAS 2, FFT, dense linear algebra (BLAS 3), and N-body (particle methods) as examples. Source: Computer Organization and Design: The Hardware/Software Interface, by David A. Patterson and John L. Hennessy
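The K20 figure quoted on this slide is the ratio of peak compute throughput to peak memory bandwidth. A minimal sketch of that arithmetic follows; the function name machine_balance is illustrative and does not come from the slides.

```python
def machine_balance(peak_gflops: float, peak_gb_per_s: float) -> float:
    """FLOPs the device can execute per byte it can move from memory at peak."""
    return peak_gflops / peak_gb_per_s


# Tesla K20 peak figures as quoted on the slide.
k20_balance = machine_balance(3519.0, 208.0)
print(f"K20 machine balance: {k20_balance:.1f} FLOPS/byte")
```

A kernel whose arithmetic intensity sits well below this ratio cannot keep the arithmetic units busy at peak, which is why the slide concludes that memory accesses should be reduced.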

Parallelization of the computational domain
• WRF domain is a 2D grid parallel to the ground
• Multiple levels correspond to the vertical heights in the atmosphere
• Vertical dependencies
  - Columns are independent
  - Parallelizable in horizontal: two dimensions of parallelism to work with
  - Each thread computes one column at a grid point
• 12km resolution case, grid dimensions: X=433, Y=308, Z=35
Figure: Grid with X, Y, Z axes; one vertical (Z) column is executed in one thread
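The one-thread-per-column decomposition can be sketched on the host side as follows. This is an illustration only: the flat thread numbering in column_of and the running sum in process_column are assumptions chosen to show a vertical dependency, not the WSM5 physics or the authors' kernel.

```python
NX, NY, NZ = 433, 308, 35  # grid dimensions of the 12 km case


def column_of(thread_id: int, nx: int = NX) -> tuple[int, int]:
    """Map a flat thread index to the (x, y) grid point whose column it owns."""
    return thread_id % nx, thread_id // nx


def process_column(column: list[float]) -> list[float]:
    """Placeholder per-column work: a running sum over the levels, used only
    because each level depends on the one processed before it."""
    out, total = [], 0.0
    for value in column:  # sequential in Z, inside a single thread
        total += value
        out.append(total)
    return out


n_threads = NX * NY  # one thread per independent column
print(n_threads, column_of(n_threads - 1))
```

Because columns do not depend on each other, every call to process_column could run concurrently; only the loop over the vertical levels has to stay sequential.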

Additional optimizations for CUDA C
Decreases processing time from 29.6 ms to 25.4 ms on K20
1. Seven additional temporaries were eliminated
2. Four additional loop fusions were performed
3. Several global arrays were prefetched from global memory to registers. Results were written back at the end of the loop.
4. Dead code was eliminated
5. Removed repeated computation of the same array (it was computed three times)
6. After a loop inversion, three loops were fused (2x)
7. Used const and __restrict__ pointer qualifiers to utilize the read-only cache
Reference: Mielikainen, J.; Bormin Huang; Huang, H.A.; Goldberg, M.D., "Improved GPU/CUDA Based Parallel Weather and Research Forecast (WRF) Single Moment 5-Class (WSM5) Cloud Microphysics," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol. 5, No. 4, pp. 1256-1265, 2012.
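Loop fusion and temporary elimination work together: once two loops are fused, the array that carried values between them can shrink to a scalar. The sketch below shows the idea on a made-up computation; the functions unfused and fused are hypothetical and are not taken from the WSM5 code.

```python
def unfused(a: list[float], b: list[float]) -> list[float]:
    """Two passes with a full-size temporary array between them."""
    tmp = [x * 2.0 for x in a]               # pass 1 writes the temporary
    return [t + y for t, y in zip(tmp, b)]   # pass 2 reads it back


def fused(a: list[float], b: list[float]) -> list[float]:
    """One pass; the intermediate value lives in a scalar."""
    out = []
    for x, y in zip(a, b):
        t = x * 2.0          # scalar stands in for a register
        out.append(t + y)
    return out


a, b = [1.0, 2.0], [3.0, 4.0]
print(unfused(a, b) == fused(a, b))
```

On a GPU the temporary array would live in global memory, so removing it saves one store and one load per element, which is consistent with the reduced transaction counts reported for the new WSM5.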

Analysis of WSM5 on Tesla K20
Metric | Old WSM5 | New WSM5 | Description
Processing time | 29.6 ms | 25.4 ms | 14% faster
GFLOPS | 220.5 | 257.0 |
Registers per thread | 56 | 62 | Additional registers are used for data prefetching/temp. removal
Stack frame | 0 bytes | 8 bytes |
Spill stores | 0 bytes | 4 bytes |
Constant memory | 840 bytes | 784 bytes | 7x64-bit pointers were removed
Achieved occupancy | 0.47 | 0.47 | Increase in register usage didn't reduce occupancy
Executed IPC | 1.17 | 1.30 | Increased by loop fusion
L2 hit rate | 46.18% | 57.31% | Increased by temporary elimination
Texture cache hit rate | 53.30% | 59.74% |
Global load transactions | 25,283,839 | 24,217,376 | Reduced by temporary elimination
Global store transactions | 12,078,815 | 8,802,572 | Reduced by temporary elimination
Global load throughput | 93.9 GB/s | 103.8 GB/s |

Limiting factors
• Different types of instructions are executed on different function units within each SM. Performance can be limited if a function unit is overused.
• Achieved compute and memory bandwidth utilization below 60% indicates latency issues.
• Kernel performance is bound by instruction and memory latency.
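The 60% rule can be checked against the figures reported for the new WSM5 kernel (257.0 GFLOPS, 103.8 GB/s global load throughput) and the K20 peaks (3519 GFLOPS, 208 GB/s). The sketch below is a rough illustration: using global load throughput alone as the memory figure is a simplifying assumption, and the helper name utilization is not from the slides.

```python
K20_PEAK_GFLOPS = 3519.0  # peak single precision, from the slides
K20_PEAK_GB_S = 208.0     # peak memory bandwidth, from the slides
THRESHOLD = 0.60          # utilization level cited on the slide


def utilization(achieved: float, peak: float) -> float:
    """Fraction of the peak rate that was actually achieved."""
    return achieved / peak


compute_util = utilization(257.0, K20_PEAK_GFLOPS)  # new WSM5 GFLOPS
memory_util = utilization(103.8, K20_PEAK_GB_S)     # new WSM5 global load throughput
latency_bound = compute_util < THRESHOLD and memory_util < THRESHOLD
print(f"compute {compute_util:.1%}, memory {memory_util:.1%}, latency-bound: {latency_bound}")
```

Both ratios fall under the threshold, which matches the slide's conclusion that the kernel is limited by latency rather than by raw compute or bandwidth.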

Benchmarking GPUs
GPU | Core clock | CUDA cores | Peak single-precision processing power | Peak double-precision processing power | Memory bandwidth (ECC off) | Total memory size
Tesla K20 (Nov. 2012) | 705 MHz (758 MHz *) | 2496 | 3519 GFLOPS | 1173 GFLOPS | 208 GB/s | 5 GB
Tesla K40 (Nov. 2013) | 745 MHz (875 MHz *) | 2880 | 3837 GFLOPS | 1279 GFLOPS | 288 GB/s | 12 GB
• NVIDIA GPU Boost is a feature that makes use of the power headroom to run the SM clock at a higher frequency.
• The default clock is set to the base clock, which is necessary for some applications that are demanding on power (e.g., DGEMM). Many application workloads are less demanding on power and can take advantage of a higher boost clock setting for added performance.
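As a rough cross-check of the K20 row, peak single-precision throughput can be estimated as cores x clock x FLOPs per cycle. The factor of 2 (one fused multiply-add per core per cycle) is an assumption that is not stated on the slide, and the sketch checks only the K20 row; the clock ratios show the largest uplift a purely clock-bound kernel could gain from the higher clock.

```python
def peak_sp_gflops(cuda_cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Estimated peak single-precision GFLOPS (assumes one FMA per core per cycle)."""
    return cuda_cores * clock_mhz * flops_per_cycle / 1000.0


k20_peak = peak_sp_gflops(2496, 705.0)  # K20 cores and base clock from the table
k20_uplift = 758.0 / 705.0              # K20 higher clock relative to base clock
k40_uplift = 875.0 / 745.0              # K40 higher clock relative to base clock
print(f"K20 peak ~ {k20_peak:.0f} GFLOPS, uplift K20 {k20_uplift:.3f}x, K40 {k40_uplift:.3f}x")
```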

Memory Bandwidth and Utilization
Figure: two panels, K40 Base Mode and K40 Boost Mode

NVIDIA K40 vs. Xeon Phi
Metric | Xeon Phi * | Tesla K40
Processing time | 29.7 ms | 16.5 ms
Concurrent CUDA threads | 3840 (60 cores, 4 HT, 16 SIMD) | 14336 (28 warps/MP, 16 MPs)
Vector instructions | 49.73% | 100%
DRAM write throughput | 33.5 GB/s | 57.7 GB/s
DRAM read throughput | 19.0 GB/s | 93.3 GB/s
• Xeon Phi vectorized 1/2 of WSM5; the other half utilizes only multiple cores
• Xeon Phi, with a higher ratio of cache size to number of threads, can serve more memory requests from caches than K40
• K40 is able to hide latency better even with a higher usage of global memory than Xeon Phi
  - a larger number of concurrent threads allows for better latency hiding
* Xeon Phi optimization: John Michalakes, NOAA. Additional optimization: I. Gokhale, L. Meadows, R. Sasanka, Intel Corp.
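The concurrency figures in this comparison follow from the factors given in parentheses. A small sketch of that arithmetic is shown here; the warp size of 32 threads is standard for NVIDIA GPUs but is not stated on the slide.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs; not stated on the slide

xeon_phi_lanes = 60 * 4 * 16        # cores x hardware threads x SIMD lanes
k40_threads = 28 * WARP_SIZE * 16   # warps per MP x threads per warp x MPs
speedup = 29.7 / 16.5               # Xeon Phi time / K40 time

print(xeon_phi_lanes, k40_threads, f"{speedup:.1f}x")
```

The K40 keeps several times as many threads in flight, which is the basis for the latency-hiding argument on the slide.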

Code Validation
- Fused multiply-add was turned off (--fmad=false)
- GNU C math library was used on GPU, i.e. powf(), expf(), sqrt() and logf() are replaced by library routines from GNU C library -> bit-exact output
- Small output differences with fast-math
Figures: Potential temperature; Difference between CPU and GPU outputs
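A bit-exact check is stricter than comparing values within a tolerance: it compares the raw IEEE 754 bit patterns of the two outputs. The sketch below shows one way to do that for single-precision values; the helper names and the sample values are hypothetical and are not the validation code used for the slides.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bit_exact(cpu: list[float], gpu: list[float]) -> bool:
    """True only if both outputs have identical float32 bit patterns."""
    return len(cpu) == len(gpu) and all(
        float32_bits(a) == float32_bits(b) for a, b in zip(cpu, gpu)
    )


def max_abs_diff(cpu: list[float], gpu: list[float]) -> float:
    """Largest element-wise absolute difference between the two outputs."""
    return max(abs(a - b) for a, b in zip(cpu, gpu))


reference = [300.0, 285.5]             # made-up sample values
perturbed = [300.0, 285.5 + 2 ** -15]  # second value is one float32 step away
print(bit_exact(reference, reference), bit_exact(reference, perturbed))
print(max_abs_diff(reference, perturbed))
```

Comparing bit patterns also distinguishes 0.0 from -0.0, which an ordinary equality test treats as equal.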

GPU-accelerated WRF modules
WRF module name | Speedup
Single moment 6-class microphysics | 500x
Eta microphysics | 272x
Purdue Lin microphysics | 692x
Stony-Brook University 5-class microphysics | 896x
Betts-Miller-Janjic convection | 105x
Kessler microphysics | 816x
New Goddard shortwave radiance | 134x
Single moment 3-class microphysics | 331x
New Thompson microphysics | 153x
Double moment 6-class microphysics | 206x
Dudhia shortwave radiance | 409x
Goddard microphysics | 1311x
Double moment 5-class microphysics | 206x
Total Energy Mass Flux surface layer | 214x
Mellor-Yamada Nakanishi Niino surface layer | 113x
Single moment 5-class microphysics | 350x
Pleim-Xiu surface layer | 665x
