Convergence of computation and data workflows: IS-ENES Workshop on Workflows and Metadata Generation, Lisbon, Portugal. V. Balaji (balaji@princeton.edu), NOAA/GFDL and Princeton University, 28 September 2016.


SLIDE 1

Convergence of computation and data workflows

IS-ENES Workshop on Workflows and Metadata Generation Lisbon, PORTUGAL

  • V. Balaji

NOAA/GFDL and Princeton University

28 September 2016

  • V. Balaji (balaji@princeton.edu)

Convergence 28 September 2016 1 / 35

SLIDE 2

Amy Langenhorst 1977-2016

Principal developer of the FMS Runtime Environment (FRE).


SLIDE 3

Outline

1

Hardware Directions GPUs, MICs, ARM Inexact computing Energy cost of algorithms and data movement

2

A Graph Approach Directed Acyclic Graphs Convergence of computation and data Fault tolerance across the workflow

3

Metadata and provenance Development and production workflow Statistical and scientific reproducibility

4

Summary


SLIDE 4

Outline

1

Hardware Directions GPUs, MICs, ARM Inexact computing Energy cost of algorithms and data movement

2

A Graph Approach Directed Acyclic Graphs Convergence of computation and data Fault tolerance across the workflow

3

Metadata and provenance Development and production workflow Statistical and scientific reproducibility

4

Summary


SLIDE 5

Power-8 with NVLink

Figure courtesy IBM.


SLIDE 6

KNL Overview

Figure courtesy Intel.


SLIDE 7

The inexorable triumph of commodity computing

... means ARM? From The Platform, Hemsoth (2015).


SLIDE 8

Irreproducible Computing, Inexact Hardware

Figure 1 from Düben et al, Phil. Trans. A, 2016. Which bits can we allow to be “inexactly” flipped? Lorenz 96 as canonical test case of non-linearity and chaos.
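Which-bits questions of this kind can be explored in software long before inexact hardware exists. Below is a minimal reduced-precision emulation sketch (my own illustration, not the emulator used by Düben et al.): it zeroes the low bits of an IEEE-754 double's 52-bit significand, so each arithmetic result can be degraded to a chosen precision.

```python
import struct

def truncate_significand(x, keep_bits):
    """Keep only the top `keep_bits` of a double's 52-bit significand,
    zeroing the rest -- a crude model of reduced-precision hardware."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]
```

One could then integrate a toy system such as Lorenz '96 at various `keep_bits` settings and compare trajectories against the full-precision run, which is essentially the experiment behind the cited figure.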


SLIDE 9

Irreproducible Computing, Inexact Hardware

Figure 2 from Düben et al, Phil. Trans. A, 2016.


SLIDE 10

COSMO: energy to solution

  • T. Schulthess

ENES HPC Workshop, Hamburg, March 17, 2014

[Bar chart: energy to solution, in kWh per ensemble member, for the current production code versus the new HP2C-funded code on Cray XE6 (Nov. 2011), Cray XK7 (Nov. 2012), Cray XC30 (Nov. 2012), and Cray XC30 hybrid with GPUs (Nov. 2013); the rewritten code improves energy to solution by factors ranging from 1.41x up to 6.89x on the hybrid machine.]

Figure courtesy Thomas Schulthess, CSCS.


SLIDE 11

JPSY comparison across ESMs

Model   Machine   Resol.     SYPD   CHSY    JPSY
CM4     gaea/c2   1.2×10^8   4.5    16000   8.92×10^8
CM4     gaea/c3   1.2×10^8   10     7000    3.40×10^8

Comparative measures of capability (SYPD: simulated years per day), capacity (CHSY: core-hours per simulated year), and energy cost (JPSY: joules per simulated year) per "unit of science". Can you have codes that are "slower but greener"? Algorithms that are "less accurate but more eco-friendly"? From Balaji et al (2016), in review at GMDD. http://goo.gl/Nj1c2N
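The three metrics are linked by simple arithmetic. A sketch, with variable names of my own choosing and assuming a fixed per-core power draw:

```python
def chsy(ncores, sypd):
    """Capacity: core-hours per simulated year.
    ncores cores each run 24/SYPD wall-clock hours per simulated year."""
    return ncores * 24.0 / sypd

def jpsy(ncores, sypd, watts_per_core):
    """Energy cost: joules per simulated year.
    Total power draw times the 86400/SYPD wall-clock seconds per simulated year."""
    return ncores * watts_per_core * 86400.0 / sypd
```

For the gaea/c2 row, CHSY = 16000 at 4.5 SYPD implies 3000 cores; an assumed ~15.5 W per core then reproduces a JPSY near the quoted 8.92×10^8.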


SLIDE 12

Workflows for the exascale

  • Billion-way concurrency is still a daunting challenge for everyone: no magic bullets anywhere to be found.
  • Exotic hardware is on the way; this is quite likely the last generation of conventional hardware.
  • Computing is likely to become irreproducible.
  • Software investment is paid back in power savings (Schulthess); energy to solution will become a key metric.
  • More threading needs to be found: to fit 10^18 op/s within a 1 MW power budget, an operation should cost 1 pJ. Data movement is ∼10 pJ to main memory and ∼100 pJ across the network!
  • DARPA: commodity improvements will slow to a trickle within 10 years. Go back to specialized computing?
  • DOE: double the investment in exascale.
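The power-budget arithmetic above is worth spelling out: power is simply operation rate times energy per operation.

```python
PJ = 1e-12   # one picojoule, in joules
OPS = 1e18   # exascale: 10^18 operations per second

power_compute = OPS * 1 * PJ     # 1 pJ/op          -> about 1 MW
power_memory  = OPS * 10 * PJ    # every op to DRAM -> about 10 MW
power_network = OPS * 100 * PJ   # every op on net  -> about 100 MW
```

This is why locality, not raw operation count, dominates the exascale energy budget.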


SLIDE 13

A network of compute and data nodes

FRE and other elements in the GFDL modeling environment manage the complex scheduling of jobs across a distributed computing resource.


SLIDE 14

... a global network of compute and data nodes

Workflow task is to minimize data flow across the global network. Figure courtesy IPSL.


SLIDE 15

Outline

1

Hardware Directions GPUs, MICs, ARM Inexact computing Energy cost of algorithms and data movement

2

A Graph Approach Directed Acyclic Graphs Convergence of computation and data Fault tolerance across the workflow

3

Metadata and provenance Development and production workflow Statistical and scientific reproducibility

4

Summary


SLIDE 16

Examples of DAG parallelism

DAG example: Cholesky Inversion

Source: Stan Tomov, ICL, University of Tennessee, Knoxville

DAG = Directed Acyclic Graph. Can IFS use this technology?

ECMWF Seminar 2013

Figure courtesy George Mozdzynski, ECMWF.
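The scheduling idea behind such tile algorithms is generic: a task fires as soon as all of its predecessors have completed. A minimal sequential sketch of that dependency-driven ordering (Kahn's algorithm; the API is illustrative, not any particular library's):

```python
from collections import deque

def run_dag(tasks, deps, execute):
    """Dependency-driven scheduling: run each task as soon as all of
    its predecessors (deps[task]) have completed."""
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for child, parents in deps.items():
        for p in parents:
            indegree[child] += 1
            children[p].append(child)
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        execute(t)             # in a real runtime, all ready tasks run in parallel
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(tasks):
        raise ValueError("cycle detected in task graph")
    return order
```

A parallel runtime replaces the sequential `execute` call by farming out every ready task at once; the graph itself exposes the available concurrency.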


SLIDE 17

SWARM for DAGs

Jeffrey et al, IEEE Micro 2016.


SLIDE 18

KNL Overview

Figure courtesy Intel.


SLIDE 19

SWARM for DAGs: hardware implementation

Jeffrey et al, IEEE Micro 2016.


SLIDE 20

NVRAM will blur distinction between memory and filesystem

Hemsoth, 2014: http://goo.gl/3ZeOXt
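Today's mmap already gives a taste of this blurring: a file becomes byte-addressable memory, and a store instruction becomes I/O. Byte-addressable NVRAM would make that same view both persistent and fast. A small stdlib illustration:

```python
import mmap

def poke_through_mmap(path, data):
    """Write `data` into `path` via a memory mapping: to the program it
    is an ordinary memory store; to the rest of the system, file I/O."""
    with open(path, "r+b") as f:
        buf = mmap.mmap(f.fileno(), len(data))
        buf[: len(data)] = data   # byte assignment into "memory"...
        buf.flush()               # ...lands in the filesystem
        buf.close()
```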



SLIDE 22

Work avoidance

Work avoidance: find the minimal path to the complete output.

  • make: traverses the tree backwards; state is the filesystem state.
  • cylc/chaco: traverses the tree forwards; each task is formulated as a no-op if its outputs exist; fred contains state, including tasks in flight.
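The forward, cylc-style no-op rule is easy to sketch (a toy of my own, not cylc's actual API): a task runs only if one of its declared outputs is missing.

```python
import os

def run_if_needed(task, outputs):
    """Forward work avoidance: skip the task when every declared output
    already exists, so re-running a suite only executes missing pieces."""
    if all(os.path.exists(p) for p in outputs):
        return "skipped"
    task()
    return "ran"
```

make's backward traversal asks the dual question, walking from the requested target back to whichever prerequisites are out of date.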



SLIDE 25

Use of cross-network message queues

[Diagram: IPSL users, via browser, command line, or desktop, talk to an MQ cluster (apps, API, databases) at IPSL; JSON messages flow through MQ relays at TGCC, IDRIS, CINES, CNRM, and other sites.]

IPSL have tested handling O(10^5) enqueues/dequeues per day. Google reports a RabbitMQ service handling O(10^6) messages per second (more than all SMS/WhatsApp/etc. combined)! https://goo.gl/GBlAAz AMQP: active messages containing instructions as well as data. Figure courtesy Sébastien Denvil, IPSL.
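An active message in the AMQP sense pairs an instruction with its data, and the receiving relay dispatches on the instruction. A toy JSON dispatcher (handler and field names invented for illustration):

```python
import json

HANDLERS = {}

def handler(action):
    """Register a function as the handler for one message type."""
    def register(fn):
        HANDLERS[action] = fn
        return fn
    return register

@handler("publish")
def publish(data):
    # Hypothetical action: announce that a dataset is available.
    return "published " + data["dataset"]

def dispatch(raw):
    """Decode a JSON active message and run the instruction it carries."""
    msg = json.loads(raw)
    return HANDLERS[msg["action"]](msg["data"])
```

A relay at each site would pull such messages off its queue and dispatch them locally, so the instruction travels with the data rather than being hard-wired at the receiver.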


SLIDE 26

Outline

1

Hardware Directions GPUs, MICs, ARM Inexact computing Energy cost of algorithms and data movement

2

A Graph Approach Directed Acyclic Graphs Convergence of computation and data Fault tolerance across the workflow

3

Metadata and provenance Development and production workflow Statistical and scientific reproducibility

4

Summary


SLIDE 27

Development and production workflow

Model developers have different workflow priorities and requirements. Production workflow benefits from coherence and similarity across runs; development workflow requires extremely fine-grained access to code, namelists, and scripts. A lot of rules get broken:

  • The favored IDE/UI is called vi!
  • Source code edits in user directories.
  • Input file modifications on the fly.

Analysis workflow requires random access to local disk: inspiration-driven rather than industrial-strength. It still benefits from a regression testing harness: multiple compilers and platforms. Emulators? e.g. SoftFloat (http://www.jhauser.us/arithmetic/SoftFloat.html). Provenance and metadata requirements are relaxed for the development workflow.


SLIDE 28

Statistical comparison across model versions

Live monitoring of model runs. From GFDL MDT Tracking Page...


SLIDE 29

Statistical comparison across model versions

Are two runs the same or different? What difference in inputs is responsible for the discrepancy? From GFDL MDT Tracking Page...
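The "same or different?" question can be posed as a statistical test on matched ensembles. A crude sketch of my own, comparing ensemble means with a two-sample z statistic (real model verification uses more careful field-by-field tests):

```python
import math
import statistics

def ensembles_consistent(a, b, z_crit=1.96):
    """Two-sample z-test on ensemble means: return True when runs a and b
    are statistically indistinguishable at roughly 95% confidence."""
    se = math.sqrt(statistics.variance(a) / len(a) +
                   statistics.variance(b) / len(b))
    z = abs(statistics.mean(a) - statistics.mean(b)) / se
    return z < z_crit
```

Applied to a diagnosed scalar (say, global-mean surface temperature) across ensemble members, this flags when two model versions have drifted apart beyond internal variability.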


SLIDE 30

Multi-model ensembles for climate projection

Figure SPM.7 from the IPCC AR5 Report. Can be interpreted as the most general and rigorous test of scientific reproducibility.


SLIDE 31

Multi-model ensembles for climate projection

Critically depends on software, metadata, and data standards: the Earth System Grid Federation (http://esgf.org), a 3 PB federated archive, with workflows for replication, versioning, subsetting, QC, and citation.


SLIDE 32

Outline

1

Hardware Directions GPUs, MICs, ARM Inexact computing Energy cost of algorithms and data movement

2

A Graph Approach Directed Acyclic Graphs Convergence of computation and data Fault tolerance across the workflow

3

Metadata and provenance Development and production workflow Statistical and scientific reproducibility

4

Summary



SLIDE 34

Summary

  • The community is struggling to move from synchronous to asynchronous data flow on the coming hardware platforms.
  • Hardware is blurring the line between cache and memory, and between memory and storage (a deep hierarchy).
  • Energy to solution as a benchmark across the entire workflow.
  • Express the entire workflow as graphs (DAGs): maximize parallelism across the entire graph; minimize graph traversal during fault recovery.
  • Accommodate the different needs of development and production workflows: relax provenance and metadata requirements during development.
  • Irreproducible computing: include statistical consistency testing in the workflow.
  • CMIP8 is going to be awesome!


SLIDE 35

Thank you

Thank you.
