

SLIDE 1

Improving weather prediction via advancing model initialization

Brian Etherton, with Christopher W. Harrop, Lidia Trailovic, and Mark W. Govett

NOAA/ESRL/GSD 15 November 2016

SLIDE 2

The HPC group at NOAA/ESRL/GSD

  • Strong track record in high-performance computing
  • Massively Parallel Fine Grain (MPFG) computing
  • Graphics Processing Units (GPUs)
  • Many Integrated Core (MIC)
  • Working to advance the state of the art in data assimilation, in particular via improved performance and design
  • NOAA/NCEP GSI has a core limit in the hundreds
  • 4D-Var approaches are time consuming
  • 4D-Ensemble is memory and I/O intensive
  • Wish to use a ‘great’ solver with any model (atmos, ocean…)
  • First steps into data assimilation (started this year)

TP 4DDA - NOAA/ESRL/GSD
WELCOME DATA-ASSIM TIME-PAR RESULTS COMPUTE SUMMARY

SLIDE 3

Keys to accurate weather prediction models

1. Intrinsic Predictability Limitations
   a) Is the system inherently chaotic?
2. Errors in the Model
   a) Does the model represent the system correctly?
   b) Is model resolution sufficient?
   c) Are unresolved physical processes well parameterized?
3. Errors in the Initial Conditions and Boundary Conditions

SLIDE 4

Data Assimilation – What is it?

  • Consider two estimates of the temperature in this room
    – T_F is what we set the thermostat to (a forecast)
    – T_O is the value from my phone (an observation)
  • Use average squared errors (variance) to weight the two estimates
    – σ_O² = error variance associated with T_O
    – σ_F² = error variance associated with T_F
  • The optimal estimate (most likely value) of the temperature in the room, T_A, satisfies:

    T_A – T_F = (σ_F²)(σ_F² + σ_O²)⁻¹ [T_O – T_F]

SLIDE 5

Data Assimilation – What is it?

  • The estimate of the temperature with minimum error variance, the analysis value T_A, is:

    T_A – T_F = (σ_F²)(σ_F² + σ_O²)⁻¹ [T_O – T_F]

  • What if the thermostat is perfect?
    – Then σ_F² = 0
    – Then T_A – T_F = 0, so T_A = T_F
  • What if my phone is perfect?
    – Then σ_O² = 0
    – Then T_A – T_F = T_O – T_F, so T_A = T_O
  • T_A is a weighted average of the observation and the first guess
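The scalar update from the thermostat-and-phone example can be sketched in a few lines. The temperatures and variances below are illustrative values, not figures from the talk:

```python
# Minimal sketch of the scalar analysis update:
#   T_A - T_F = sigma_F^2 / (sigma_F^2 + sigma_O^2) * (T_O - T_F)

def analyze(t_f, t_o, var_f, var_o):
    """Return the minimum-error-variance estimate T_A."""
    weight = var_f / (var_f + var_o)
    return t_f + weight * (t_o - t_f)

# Thermostat says 70, phone says 74, equal error variances:
print(analyze(70.0, 74.0, var_f=1.0, var_o=1.0))  # 72.0 (simple average)

# Perfect forecast (var_f = 0) -> analysis keeps the forecast:
print(analyze(70.0, 74.0, var_f=0.0, var_o=1.0))  # 70.0

# Perfect observation (var_o = 0) -> analysis matches the observation:
print(analyze(70.0, 74.0, var_f=1.0, var_o=0.0))  # 74.0
```

Note how the weight reproduces both limiting cases from the slide: zero forecast-error variance pins the analysis to the forecast, zero observation-error variance pins it to the observation.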

SLIDE 6

All the data we wish to incorporate

From the NASA hyperwall

SLIDE 7

Data Assimilation – Full Model – Challenges

    T_A – T_F = (σ_F²)(σ_F² + σ_O²)⁻¹ [T_O – T_F]

  • Assuming that observation and forecast errors are uncorrelated, the analysis increment (x_a – x_f) that minimizes analysis error variance is (Cohn, 1997):

    x_a – x_f = BHᵀ(HBHᵀ + R)⁻¹ [y – Hx_f]

  • The vectors x_a and x_f have length equal to the number of prediction points (gridpoints × vertical levels × variables) in the model. For the ECMWF global model, that is about 1 billion.
  • The matrix B is, for the full model, 1-billion × 1-billion in size.
  • The matrix H can involve computationally demanding processes.
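The increment equation can be exercised at toy scale with NumPy. Every value below (state size, B, H, R, the observations y) is invented for illustration; at the full-model sizes quoted above, B is never formed explicitly:

```python
import numpy as np

# Toy-sized sketch of the analysis increment (Cohn, 1997):
#   x_a - x_f = B H^T (H B H^T + R)^(-1) (y - H x_f)

n, m = 4, 2                           # state size, observation count
x_f = np.array([1.0, 2.0, 3.0, 4.0])  # first guess (forecast)
B = np.eye(n)                         # background error covariance (assumed)
H = np.zeros((m, n))                  # observation operator:
H[0, 0] = 1.0                         #   observe state point 0
H[1, 2] = 1.0                         #   observe state point 2
R = 0.5 * np.eye(m)                   # observation error covariance (assumed)
y = np.array([1.5, 2.0])              # observations

innovation = y - H @ x_f                           # y - H x_f
gain = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)    # the gain matrix
x_a = x_f + gain @ innovation                      # analysis state
print(x_a)
```

The m × m matrix inverted here is only as large as the number of observations, which is exactly why the increment form is preferred over inverting anything of size n × n.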

SLIDE 8

Four Dimensional Data Assimilation (4D DA)

  • In the prior equations, all data were assumed to be at the analysis time.
  • All data in the (12-hour) time window are assumed to occur at the middle of that window.
  • This introduces some errors ⇒ weather systems move and develop!

SLIDE 9

Four Dimensional Data Assimilation (4D DA)

  • In some approaches, information at different times is incorporated by running a model forward (Tangent-Linear) and backward (Adjoint) in time across the 12-hour window.
  • Optimal results come from a TL and AD that mimic the true model.

SLIDE 10

Time Parallel 4D DA – Motivation

  • The time spent running the TL and AD is, roughly:

    LENGTH OF ASSIM WINDOW × 2 (TL & AD) × 1.5 (TL TAKES LONGER) × 1.5 (AD TAKES LONGER) × NUMBER OF ITERATIONS

  • For a 12-hour window, that is 12 × 2 × 1.5 × 1.5 = 54 hours per iteration; with 40 iterations, 54 × 40 = 2160 hours, or 90 days.
  • This is, perhaps, 6× longer than the forecast itself; this must be improved.
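The cost estimate on this slide is simple arithmetic, reproduced here with the slide's own factors:

```python
# Back-of-envelope serial 4DVAR cost, in model-hours, using the
# slide's factors: window length x 2 (TL & AD) x 1.5 x 1.5 x iterations.
window_hours = 12    # assimilation window length
tl_and_ad = 2        # run both tangent-linear and adjoint models
tl_penalty = 1.5     # TL takes ~1.5x the forward model
ad_penalty = 1.5     # AD takes ~1.5x as well
iterations = 40

per_iteration = window_hours * tl_and_ad * tl_penalty * ad_penalty
total_hours = per_iteration * iterations
print(per_iteration, total_hours, total_hours / 24)  # 54.0 2160.0 90.0
```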

SLIDE 11

Time Parallel 4D DA – Motivation

Traditional 4DVAR involves taking one state (bucket) and moving it all the way from start to finish and back to start. Time-parallel 4DVAR sends a number of states (buckets) from one time to the adjacent time.

(Figure: TRADITIONAL vs. TIME PARALLEL)

SLIDE 12

Time Parallel 4D DA – Motivation

We did not invent time-parallel 4DVAR; ECMWF has done this sort of work, as have others (Virginia Tech). Our goal is not to develop a brand-new DA system, but to explore promising existing approaches.

(Figure: TRADITIONAL vs. TIME PARALLEL)

SLIDE 13

Time Parallel 4D DA – Motivation

  • If the assimilation window could be broken into 48 ¼-hour windows, then run time could be closer to 2 model days (rather than 90).
  • That would take ~27 minutes to compute for a model running at 1% of real time.
  • Achieve scaling when your model is no longer scaling.
  • If scaling is achieved, is the solution from this time-parallel version just as good?

SLIDE 14

Results – Assimilation Methods

Test 1: The eastward propagation of a 1D sine wave

  • Timing results (seconds):

    3DVAR        35.0
    4DVAR       108.8
    4DVAR-TP-1  108.6
    4DVAR-TP-3   44.2

  • Results show that the 3-OMP-thread time-parallel 4DVAR gives a substantial reduction in run time.

SLIDE 15

Results – Assimilation Methods

Test 2: The Lorenz96 Model

  • The time-parallel 4DVAR (yellow line) performed better than 3DVAR, but not quite as well as 4DVAR.
  • No great performance statistics here; the 40-point problem was not taxing.
  • Nonetheless, the time-parallel 4DVAR results encourage us to continue.
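For reference, a sketch of the Lorenz96 test bed. This is the standard formulation with N = 40 variables and forcing F = 8 (the usual chaotic setting); the talk's exact configuration and time step are assumptions here:

```python
import numpy as np

# Lorenz96: dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F,
# with cyclic indexing over N variables.

def lorenz96_tendency(x, forcing=8.0):
    """Right-hand side of the Lorenz96 ODE (cyclic in i)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def step_rk4(x, dt=0.05, forcing=8.0):
    """Advance one step with fourth-order Runge-Kutta."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = np.full(40, 8.0)   # start at the unstable fixed point x_i = F
x[0] += 0.01           # small perturbation to kick off chaotic growth
for _ in range(100):
    x = step_rk4(x)
```

Its small size and tunable chaos are what make it a common first test bed for assimilation schemes before moving to a full atmospheric model.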

SLIDE 16

Results – Time to Completion

Performance of the procedural implementation: Lorenz model with 4000 points.

SLIDE 17

4D DA – Full Earth – Requirements

  • From Govett et al. (BAMS paper): G11 NIM (3.75 km resolution) using 64 × 20 = 1280 K80 GPUs runs in 1.6% of forecast time.
  • A 12-hour forecast takes 12 minutes (time enough for only ONE iteration of 4DVAR).
  • Time-parallel could do 40 iterations in ~30 minutes if the iterations could be subdivided into ~48 sections (60,000 K80s, or 30,000 Pascals).
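The GPU counts quoted above follow directly from the slide's figures:

```python
# Arithmetic behind the slide's GPU requirement (all inputs from the talk).
gpus_per_section = 64 * 20        # 1280 K80 GPUs for one NIM forecast
sections = 48                     # time-parallel subdivisions of the window
total_k80 = gpus_per_section * sections
print(total_k80)                  # 61440, i.e. the slide's "~60,000 K80s"
```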

SLIDE 18

NOAA Fine Grain System

  • NOAA has received a large GPU cluster from Cray / NVIDIA
  • Status
    – Delivered October 2016
    – Now in acceptance
  • Plans
    – Support development of FV3 and next-gen data assimilation
    – Parallelization of FV3 in progress
  • 760 Pascal GPUs
    – 3584 cores / GPU
  • Cray CS Storm, 8 GPUs / node
  • Mellanox InfiniBand
    – QDR (40 Gb/s)

(Figure: Cray CS Storm node architecture)

SLIDE 19

Thoughts for the future

  • What would it take to produce a 3 km resolution global analysis of the atmosphere (10 billion prediction points)?

SLIDE 20

Thoughts for the future

  • Total amount of memory required for the analysis: 40 GB (10 billion points, 4 bytes per value)
  • For time parallel with 48 intervals in a window: ~2 PB
  • Observational data could also be quite sizable (TBs)
  • Our issues are not just processing, but also the speed of memory and I/O

SLIDE 21

This Presentation is Now Complete