Exploring Scientific Discovery with Large-Scale Parallel Scripting



slide-1
SLIDE 1

Exploring Scientific Discovery with Large-Scale Parallel Scripting

Tim Armstrong (1), Justin M. Wozniak (2), Michael Wilde (1, 2)

(1) University of Chicago, (2) Argonne National Laboratory

May 15, 2013

slide-2
SLIDE 2

Outline: Parallel Scripting with Swift/T | SciColSim Application | Scaling up SciColSim with Swift/T

Overview

Parallel scripting: massive scalability with (relative) ease

  • Scaling up real science applications is difficult:
      • Must adapt code to a radically different programming model
      • Concurrency bugs
      • Load balancing, data management, etc.
  • SciColSim: compute-intensive science app
  • Swift/T: super-scalable high-performance scripting system for parallel composition of existing code

slide-3
SLIDE 3

The Scripting Paradigm

  • Low-level language (e.g. C) + high-level language (e.g. Python)
  • High-level script orchestrates optimized performance-critical functions
slide-4
SLIDE 4

Parallel Scripting

  • Can retrofit parallelism onto sequential scripting languages:
      • Threads
      • Message passing (MPI, etc.)
      • Abstractions (MapReduce, etc.)
  • But parallelism is a second-class concept in the language...
  • Q: Why can’t I express parallelism with loops, conditionals, variables, etc.?

slide-5
SLIDE 5

Parallel Scripting in Swift/T

  • Q: Why can’t I express parallelism with loops, conditionals, variables?
  • A: You can in Swift!
  • The Swift parallel scripting language [WFI+09]:
      • Implicit dataflow parallelism
      • Language statements execute concurrently in dataflow order
      • Single-assignment variables guarantee determinism
      • Determinism extends to additional, rich data structures: arrays, hash tables, structs.

    float results[];
    file data = input_file("my.data");
    foreach i in [1:N] {                  // Independent parallel iterations
      if (predicate(i)) {                 // Dataflow dependencies
        results[i] = compute(i, data);
      }
    }
    mean, stdev = stat_summ(results);

Swift code with implied parallel dataflow
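For readers coming from sequential scripting languages, the Swift fragment above can be loosely mirrored in Python with explicit futures. This is only an analogy, not how Swift/T executes: `predicate`, `compute` and `stat_summ` are hypothetical stand-ins for the functions named on the slide, and the thread pool is an assumption made for illustration. The explicit submit-and-wait plumbing is what Swift's implicit dataflow removes.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def predicate(i):
    # Hypothetical filter: keep even indices only.
    return i % 2 == 0


def compute(i, data):
    # Hypothetical stand-in for an expensive, independent task.
    return float(i) * data


def stat_summ(values):
    # Summary statistics over all results (population standard deviation).
    return mean(values), pstdev(values)


def run(n, data):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each selected iteration is submitted independently, like the
        # iterations of the foreach loop over [1:N] (inclusive).
        futures = {
            i: pool.submit(compute, i, data)
            for i in range(1, n + 1)
            if predicate(i)
        }
        # Each slot is written exactly once, echoing single assignment.
        results = {i: f.result() for i, f in futures.items()}
    # The summary can only run once every result is available.
    return stat_summ([results[i] for i in sorted(results)])
```

Calling `run(10, 2.0)` computes values for i = 2, 4, 6, 8, 10 and summarizes them once all of them have finished.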

slide-6
SLIDE 6

Swift/T Scalable Implementation[WAW+]

  • Can harness tens or hundreds of thousands of cores
  • All runtime components distributed and scalable: data store, task distributor & script executor
  • Optimizing compiler (stc) reduces messaging

[Figure: Swift/T runtime services breakdown (left) and task dispatch (right). Labels: shared state, data store, task queue, load balancing, control flow, rule engine, task execution; server processes and execution control/worker processes. Legend: process, task flow.]

slide-7
SLIDE 7

SciColSim Application: Simulating Scientific Discovery

  • Ongoing research at University of Chicago
  • Want to understand process of scientific discovery: [ER10]
  • How do scientists select hypotheses to work on?
  • What are the most effective strategies?
  • Can explore with simulation:
  • Model knowledge as graph of concepts
  • Simulate different graph exploration strategies
  • Can measure how “efficient” strategy is
  • Computational characteristics:
  • Each simulation implemented with sequential C++ code
  • Floating point intensive: many probability calculations
slide-8
SLIDE 8

Evaluating model parameters

  • “Ensemble” of randomized simulations
  • Results of simulation are averaged to evaluate “goodness” of current parameters
  • Task duration is 0.2-20 s. Runtime depends on input parameters, plus significant random variation.

[Figure: Evaluating objective function and updating parameters. For each parameter set (i, i + 1, i + 2), an ensemble of randomized simulations runs, then the results are analyzed and new parameters are chosen. Legend: task, dataflow variable, data dependency.]
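The ensemble evaluation described on this slide can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the SciColSim code: `simulate` is a hypothetical stand-in for the sequential C++ simulation, the seeding scheme is invented for reproducibility, and Python threads only show the fan-out and fan-in structure rather than real multi-core speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(params, seed):
    # Hypothetical stand-in for one randomized simulation run:
    # a deterministic part plus seeded random noise.
    rng = random.Random(seed)
    return sum(params) + rng.gauss(0.0, 1.0)


def evaluate(params, ensemble_size, base_seed=0):
    # Fan out: one independent task per ensemble member.
    seeds = [base_seed + k for k in range(ensemble_size)]
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda s: simulate(params, s), seeds))
    # Fan in: average the results to score the current parameters.
    return sum(scores) / len(scores)
```

Because each run has its own seed and `pool.map` preserves input order, repeated calls with the same arguments return the same average.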

slide-9
SLIDE 9

Simulated Annealing

  • Want to find “best” set of simulation parameters
  • Optimize using a simulated annealing algorithm
  • Basic idea:
      1. Perturb one parameter
      2. Evaluate objective function for current parameters
      3. Depending on result, maybe undo parameter change
      4. Repeat...

[Figure: Visualization of parallel simulated annealing with 8-way parallelized objective function: 10x independent simulated annealing instances, each performing 500-1000x parameter updates. Real runs have 1000-way parallelism.]
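The four steps of the basic idea can be sketched as a short Python loop. The slides do not specify the acceptance rule, step size, or cooling schedule, so the Metropolis criterion, the uniform perturbation, and the geometric cooling used here are assumptions, as is the convention that lower objective values are better.

```python
import math
import random


def anneal(objective, params, n_updates, temperature=1.0, cooling=0.99, seed=0):
    rng = random.Random(seed)
    current = list(params)
    current_score = objective(current)
    best, best_score = list(current), current_score
    for _ in range(n_updates):
        # 1. Perturb one parameter (assumed: uniform step of at most 0.5).
        k = rng.randrange(len(current))
        old_value = current[k]
        current[k] = old_value + rng.uniform(-0.5, 0.5)
        # 2. Evaluate the objective function for the current parameters.
        score = objective(current)
        # 3. Depending on the result, maybe undo the change
        #    (assumed: Metropolis acceptance, minimizing the objective).
        delta = score - current_score
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_score = score
            if score < best_score:
                best, best_score = list(current), score
        else:
            current[k] = old_value
        # 4. Repeat, with a slowly decreasing temperature.
        temperature *= cooling
    return best, best_score
```

In the application, step 2 is the expensive part: each evaluation is itself an ensemble of simulations, which is where the parallelism inside each annealing instance comes from.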
slide-10
SLIDE 10

Scale-up requirements

  • Optimization + validation: 0.25–0.5M CPU-hours per model
  • Fast feedback needed: scientists want to iterate models
  • Need to get high speedup: 4000x+ to get timely results
  • Relatively short-lived tasks: 0.2-20 s. Fan-out and fan-in every 1-2 minutes.
  • Unpredictable task duration: need to dynamically assign tasks to processors in a scalable way
  • High-performance dynamic task allocation mandatory

[Figure: 10x independent simulated annealing instances, each performing 500-1000x parameter updates]
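A back-of-the-envelope calculation shows why a speedup of this order is needed. The sketch below assumes the speedup is measured against a single core and is sustained for the whole run, which is an idealization; the function name is made up for illustration.

```python
def wall_clock_hours(cpu_hours, speedup):
    # Ideal wall-clock time if the whole workload runs at the given speedup.
    return cpu_hours / speedup


# Figures from the slide: 0.25M to 0.5M CPU-hours per model, 4000x speedup.
low = wall_clock_hours(0.25e6, 4000)   # 62.5 hours
high = wall_clock_hours(0.5e6, 4000)   # 125.0 hours
```

Under these assumptions one model takes roughly 2.6 to 5.2 days of wall-clock time at 4000x, compared with decades on a single core.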

slide-11
SLIDE 11

Adapting for Swift/T

  • Kept compute-intensive simulation logic in C++
  • Converted simulated annealing algorithm to Swift/T:
  • Nested parallel loops
  • Sequential iteration
  • Logic and formulas to update parameters
  • Logging and output

                 Original               Swift/T Version
Lines of Code    Python: 33 lines       Swift/T: 269 lines
                 C++: 1175 lines        C++: 861 lines
Scalability      One node, many cores   Many cores, 100’s or 1000’s of nodes

slide-12
SLIDE 12

Scaling up!

  • Strong scaling results for production workload at different compiler optimization levels: scales well!
  • Mainly limited by amount of parallelism in workload ⇒ could scale further with different optimization algorithm
  • STC compiler optimization: reduces messaging ⇒ better scaling

[Figure: Strong scaling for the down-scaled problem at different STC optimization levels (O0, O1, O2, O3, and ideal; left) and for the full-scale problem (measured and ideal; right). Both panels plot iters/sec against cores.]

slide-13
SLIDE 13

Task Prioritization

  • Key technique enabled by Swift/T: task prioritization
  • Improves resource utilization and time-to-solution
  • Exploits application knowledge:
      • “Catch-up” heuristic for slower optimization chains
      • Prioritize long-running tasks: target parameter correlated with runtime

    @prio=100*(niters - iter) + target run_simulation(...);

[Figure: Busy cores over time (seconds), without priorities and with priorities]
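The effect of the priority expression can be illustrated with a small Python sketch. Only the formula comes from the slide; the heap-based queue, the task tuples, and the function names are assumptions made for illustration and say nothing about how the Swift/T runtime orders tasks internally.

```python
import heapq


def priority(niters, iter_index, target):
    # Formula from the slide: chains with more iterations left come first
    # ("catch-up"), and a larger target (longer expected runtime) breaks ties.
    return 100 * (niters - iter_index) + target


def dispatch_order(tasks, niters):
    # tasks: list of (name, iter_index, target) tuples (hypothetical format).
    heap = []
    for order, (name, iter_index, target) in enumerate(tasks):
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(heap, (-priority(niters, iter_index, target), order, name))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

For example, with `niters = 10`, a task at iteration 3 with target 2 has priority 702 and is dispatched before tasks at iteration 5 with targets 40 (priority 540) and 10 (priority 510).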

slide-14
SLIDE 14

References

  • [ER10] J. Evans and A. Rzhetsky, Machine science, Science 329 (2010), no. 5990.
  • [WAW+] J. M. Wozniak, T. G. Armstrong, M. Wilde, D. S. Katz, E. Lusk, and I. T. Foster, Swift/T: Large-scale application composition via distributed-memory data flow processing, Proc. CCGrid ’13.
  • [WFI+09] M. Wilde, I. Foster, K. Iskra, P. Beckman, Z. Zhang, A. Espinosa, M. Hategan, B. Clifford, and I. Raicu, Parallel scripting for applications at the petascale and beyond, Computer 42 (2009), no. 11.

Acknowledgements

This research is supported in part by the U.S. DOE Office of Science under contract DE-AC02-06CH11357, FWP-57810. This research was supported in part by NIH through resources provided by the Computation Institute and the Biological Sciences Division of the University of Chicago and Argonne National Laboratory, under grant S10 RR029030-01.

slide-15
SLIDE 15

Demo

  • Compile application from scratch to illustrate toolchain
  • Production-scale run of SciColSim on 8400 cores of Beagle Cray XE6 supercomputer @ UChicago

slide-16
SLIDE 16

Conclusions

  • Can scale up existing applications with parallel scripting
  • Quick development cycle: easy to debug and modify code, compared with alternative cluster programming models
  • Appropriate for applications that can be implemented as user-defined tasks with explicit data dependences
  • Much better for moderately fine-grained workloads on large clusters than traditional centralized workflow systems
  • Does not support wide-area grids/clouds (yet)
slide-17
SLIDE 17

Task Dispatch Speed

  • Cray XE6
  • On 10 nodes, 24 cores per node
  • Many independent 0 s (zero-duration) tasks

[Figure: Work tasks/s (millions) for ADLB, O0, O1, O2, and O3]

slide-18
SLIDE 18

Scaling up to 10^5

  • Experiment on Blue Gene/P Intrepid at Argonne National Lab
  • 100s task durations
  • Experiment used old version of Swift/T. Many improvements since.

slide-19
SLIDE 19

Optimizations

  • O0: Only optimize write reference counts.
  • O1: Basic optimizations: constant folding, dead code elimination, forward data flow, and loop fusion.
  • O2: More aggressive optimizations: asynchronous op expansion, wait coalescing, hoisting, and small loop expansion.
  • O3: All optimizations: function inlining, pipeline fusion, loop unrolling, intra-block instruction reordering, and simple algebra.
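Two of the basic optimizations listed under O1, constant folding and dead code elimination, can be illustrated on a toy program. This is a generic textbook-style sketch and not STC's implementation: the tuple-based expression format, the function names, and the assumption that assignments are listed with definitions before uses are all invented for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    # Constant folding: replace subexpressions whose operands are all
    # constants with their computed value.
    if not isinstance(expr, tuple):
        return expr  # a number or a variable name
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


def variables(expr):
    # Collect the variable names an expression reads.
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        return variables(expr[1]) | variables(expr[2])
    return set()


def eliminate_dead(assignments, live):
    # Dead code elimination: walk backwards and keep only assignments
    # that contribute to a live variable (single assignment assumed).
    needed, kept = set(live), []
    for target, expr in reversed(assignments):
        if target in needed:
            kept.append((target, expr))
            needed |= variables(expr)
    return list(reversed(kept))


program = [("a", ("*", 2, 3)), ("b", ("+", "a", "x")), ("c", ("-", 10, 4))]
folded = [(target, fold(expr)) for target, expr in program]
optimized = eliminate_dead(folded, live={"b"})
```

Here `a` is folded to the constant 6, and the assignment to `c` is dropped because nothing live depends on it. In a dataflow runtime, every removed operation is also a task or message that never has to be sent.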

slide-20
SLIDE 20

Comparison with Other Systems

  • Hadoop
      • Fixed communication pattern
      • Must reorganize code to fit MapReduce model
      • Minimize per-data-item overhead versus minimize per-task overhead
  • Swift and other workflow systems
      • Single master node limits scalability
      • Optimizing compiler
      • Better foreign-function interface for directly calling C++ code
      • No support yet for wide-area systems
slide-21
SLIDE 21

Comparison with PGAS

  • Scripting paradigm versus one language for computation + coordination

  • Focus on simplicity
  • No explicit data placement: managed by runtime
  • Strong safety guarantees (e.g. determinism)