Parallelization and Performance of the NIM Weather Model on CPU, GPU and MIC Architectures
Mark Govett
NOAA Earth System Research Laboratory
We Need Better Numerical Weather Prediction
Superstorm Sandy
Second most destructive in U.S.
“A European forecast that closely predicted Hurricane Sandy's onslaught days ahead
raising complaints in the meteorological community.”
"The U.S. does not lead the world; we are not No. 1 in weather forecasting, I'm very sorry to say that," says AccuWeather's Mike Smith…
October 28, 2012 Hurricane Sandy Source: USA Today, October 30, 2012
Superstorm Sandy
Some improvement
Forecast Model intensity forecasts were accurate
forecasts in South Carolina 36 hours in advance (verified) But …
would not make landfall (verified)
– All U.S. models incorrectly predicted landfall
never issued any hurricane watches or warnings for the mainland
– Forecasters relied on the European model for guidance
October 2, 2015
NY Times: Why U.S. weather model has fallen behind
Washington Post: Why the forecast cone of uncertainty is inadequate
– A one hour forecast produced in 8.5 minutes
– Data assimilation, post processing are similarly constrained
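As a rough illustration of the first bullet, the ratio between forecast length and wall-clock time can be worked out directly. The function name below is hypothetical; only the one-hour and 8.5-minute figures come from the slide.

```python
def realtime_speedup(forecast_minutes: float, wallclock_minutes: float) -> float:
    """Return how many times faster than real time a forecast is produced."""
    if wallclock_minutes <= 0:
        raise ValueError("wall-clock time must be positive")
    return forecast_minutes / wallclock_minutes


# Figures from the slide: a one hour forecast produced in 8.5 minutes.
speedup = realtime_speedup(60.0, 8.5)
fraction = 8.5 / 60.0
print(f"{speedup:.2f}x faster than real time ({fraction:.1%} of real time)")
```

The same ratio applies to longer forecasts if the cost per forecast hour stays constant, which is an assumption rather than something the slide states.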
[Diagram: Data Assimilation, NWP, Post-Processing, Forecaster, Stakeholders; label: HPC]
Figure: NCEP Operational Forecast Skill
36 and 72 Hour Forecasts @ 500 MB over North America, 100 * (1 - S1/70) Method
Series: 36 Hour Forecast, 72 Hour Forecast
Axes: years 1955 to 2015; skill 0.0 to 90.0
Annotation: 15 Years
Source: NCEP Central Operations, January 2015
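The skill measure named in the chart title, 100 * (1 - S1/70), can be sketched as a one-line function. The function name and the sample S1 values below are hypothetical and are not read from the chart; they only show how the mapping behaves.

```python
def s1_skill(s1: float) -> float:
    """Skill in percent from an S1 score using the 100 * (1 - S1/70) method."""
    return 100.0 * (1.0 - s1 / 70.0)


# Hypothetical S1 values, chosen only to show the shape of the mapping.
for s1 in (70.0, 35.0, 14.0):
    print(f"S1 = {s1:5.1f} -> skill = {s1_skill(s1):5.1f}")
```

Under this formula an S1 score of 70 maps to zero skill, and lower S1 scores map to higher skill.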
IBM P690 IBM 701 IBM 704 IBM 7090 IBM 7094 CDC 6600 IBM 360/195 CYBER 205 CRAY Y-MP CRAY C90 IBM SP IBM P655+ IBM Power 64-km 1-km
Simulations with GFDL’s variable-resolution FV3, non-hydrostatic (aka cloud-permitting) model. Courtesy of Lin and Harris (2015 manuscript)
More Intense Updrafts Produces a Tornado
Hurricane Joaquin
Figure: Hurricane Joaquin forecast tracks, 00Z October 1, 2015
Legend: European model; US model w/old data assimilation; US model w/new data assimilation; Actual track (through 03Z 07 October)
Map axes: 25°N to 50°N, 80°W to 60°W
Source: Corey Guastini, EMC’s Model Evaluation Group
– 100 to 1000 times more than current models use
– Run on 10K GPUs, 600 MIC, 250K CPU cores
– Tested at 3 km resolution
– Serial, parallel execution on CPU, GPU, MIC
OpenACC, F2C-ACC
OpenMP
OpenMP
SMS
Fine-Grained Parallelism
110 km resolution, 96 vertical levels
Run-time (sec) by year:
Year      Intel CPU   NVIDIA GPU   Intel MIC
2010/11   49.8        23.6
2012      26.8        15.1
2013      20          13.9         16.4
2014      15.9        7.8

Year      Intel CPU (cores)   NVIDIA GPU (cores)   Intel MIC (cores)
2010/11   Westmere (12)       Fermi (448)
2012      SandyBridge (16)    Kepler K20x (2688)
2013      IvyBridge (20)      Kepler K40 (2880)    Knights Corner (61)
2014      Haswell (24)        Kepler K80 (4992)
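To make the chart easier to read, the sketch below computes how much faster each accelerator ran than the CPU of the same year. It assumes the run-time values pair with years and devices in the order they appear on the slide (four CPU values, four GPU values, one MIC value for 2013); the dictionary layout and function name are hypothetical.

```python
# Run-times in seconds for the 110 km, 96-level case, in slide order.
# The pairing of values to years and devices is an assumption.
runtime = {
    "2010/11": {"cpu": 49.8, "gpu": 23.6},
    "2012": {"cpu": 26.8, "gpu": 15.1},
    "2013": {"cpu": 20.0, "gpu": 13.9, "mic": 16.4},
    "2014": {"cpu": 15.9, "gpu": 7.8},
}


def speedup_over_cpu(year: str, device: str) -> float:
    """CPU run-time divided by the accelerator run-time for the same year."""
    return runtime[year]["cpu"] / runtime[year][device]


for year, times in runtime.items():
    for device in times:
        if device != "cpu":
            print(f"{year} {device}: {speedup_over_cpu(year, device):.2f}x vs CPU")
```

These ratios compare one device against another for the same year; they say nothing about cost or power, which the slide does not cover here.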
using F2C-ACC
Symmetric Mode Execution
120 km resolution, 40,962 columns, 96 vertical levels, 100 time steps
Run-time (sec) by node configuration:
– IB20 only: 81
– IB24 only: 74
– MIC only: 73
– GPU only: 58
– IB24 + MIC: 42
– IB20 + GPU: 46
– IB20 + 2 GPU: 33
Numeric values represent node run-times for each configuration
Node Type:
– IB20: Intel IvyBridge, 20 cores, 3.0 GHz
– IB24: Intel IvyBridge, 24 cores, 2.70 GHz
– GPU: Kepler K40, 2880 cores, 745 MHz
– MIC: KNC 7120, 61 cores, 1.23 GHz
Results from: NOAA / ESRL - August 2014
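One way to read the symmetric-mode numbers is to compare each combined configuration against the faster of its two component devices running alone. The sketch below does that, assuming the seven run-times pair with the seven configuration labels in the order listed on the slide; the helper name is hypothetical.

```python
# Node run-times in seconds; pairing with labels follows slide order (assumed).
node_runtime = {
    "IB20 only": 81, "IB24 only": 74, "MIC only": 73, "GPU only": 58,
    "IB24 + MIC": 42, "IB20 + GPU": 46, "IB20 + 2 GPU": 33,
}


def gain_over_best_single(combined: str, *singles: str) -> float:
    """Run-time of the fastest single device divided by the combined run-time."""
    best_single = min(node_runtime[name] for name in singles)
    return best_single / node_runtime[combined]


print(f"IB24 + MIC:   {gain_over_best_single('IB24 + MIC', 'IB24 only', 'MIC only'):.2f}x")
print(f"IB20 + GPU:   {gain_over_best_single('IB20 + GPU', 'IB20 only', 'GPU only'):.2f}x")
print(f"IB20 + 2 GPU: {gain_over_best_single('IB20 + 2 GPU', 'IB20 only', 'GPU only'):.2f}x")
```

A gain above 1 means that using the host CPU and the accelerator together beat either device on its own for this test case.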
NIM Single Node Performance
40,962 Columns, 100 timesteps
Figure: Runtime (seconds) versus number of GPUs; series: Runtime, Communications
GPUs:                  1       2       4       6       8
Cols/GPU:              40962   20481   10241   6827    5120
Parallel Efficiency:           0.95    0.90    0.77    0.71
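The efficiency figures can be turned back into implied speedups, and the Cols/GPU row can be reproduced from the column count. This is a sketch under two assumptions not stated on the slide: that efficiency is defined as speedup over one GPU divided by the number of GPUs, and that the four efficiency values belong to 2, 4, 6 and 8 GPUs. The function names are hypothetical.

```python
COLUMNS = 40962  # horizontal columns, from the slide

# Parallel efficiency per GPU count (assumed pairing with 2, 4, 6, 8 GPUs).
efficiency = {2: 0.95, 4: 0.90, 6: 0.77, 8: 0.71}


def implied_speedup(gpus: int, eff: float) -> float:
    """Speedup over one GPU, assuming efficiency = speedup / gpus."""
    return gpus * eff


def columns_per_gpu(gpus: int) -> int:
    """Columns per GPU, rounded half up using integer arithmetic."""
    return (2 * COLUMNS + gpus) // (2 * gpus)


for gpus, eff in efficiency.items():
    print(f"{gpus} GPUs: {columns_per_gpu(gpus)} cols/GPU, "
          f"implied speedup {implied_speedup(gpus, eff):.2f}x")
```

The falling efficiency is consistent with less work per GPU as the column count per device shrinks while communication remains, though the slide itself does not spell out the cause.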
CPU versus GPU Cost-Benefit, NIM 30 km resolution
Series: CPU only, CPU & GPU
K80s per CPU:        0      1      2      3       4
numCPUs:             40     20     10     7       5
Cost (thousands):    260    230    165    145.5   132.5
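The cost figures can be summarized as a saving relative to the CPU-only configuration. The sketch assumes the five cost values pair with 0 to 4 K80s per CPU in the order shown; the variable and function names are hypothetical, and the slide gives costs only in unspecified "thousands".

```python
# Cost in thousands, keyed by K80s per CPU (pairing by slide order, assumed).
cost_thousands = {0: 260.0, 1: 230.0, 2: 165.0, 3: 145.5, 4: 132.5}


def saving_vs_cpu_only(k80s_per_cpu: int) -> float:
    """Fractional cost reduction relative to the CPU-only configuration."""
    baseline = cost_thousands[0]
    return 1.0 - cost_thousands[k80s_per_cpu] / baseline


for k80s in sorted(cost_thousands):
    print(f"{k80s} K80s per CPU: cost {cost_thousands[k80s]:6.1f}, "
          f"saving {saving_vs_cpu_only(k80s):.1%}")
```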