SLIDE 1
Grid Computing in Numerical Relativity and Astrophysics

Gabrielle Allen: gallen@cct.lsu.edu
Depts. Computer Science & Physics, Center for Computation & Technology (CCT), Louisiana State University

Challenge Problems

  • Cosmology
  • Black Hole and Neutron Star Models
  • Supernovae
  • Astronomical Databases
  • Gravitational Wave Data Analysis
  • Drive HEC & Grids

SLIDE 2

Gravitational Wave Physics

(Diagram: Observations, Models, Analysis & Insight, linked by complex simulations)

SLIDE 3

Computational Science Needs

  • Requires an incredible mix of technologies & expertise!
  • Many scientific/engineering components
    – Physics, astrophysics, CFD, engineering, ...
  • Many numerical algorithm components
    – Finite difference? Finite volume? Finite elements?
    – Elliptic equations: multigrid, Krylov subspace, ...
    – Mesh refinement
  • Many different computational components
    – Parallelism (HPF, MPI, PVM, ???)
    – Multipatch
    – Architecture (MPP, DSM, Vector, PC Clusters, FPGA, ???)
    – I/O (generates TBs/simulation, checkpointing, ...)
    – Visualization of all that comes out!
  • New technologies
    – Grid computing
    – Steering, data archives
  • Such work cuts across many disciplines and areas of CS ...
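The finite-difference component above can be made concrete with a toy sketch: a 1D wave equation evolved with second-order centered differences and leapfrog time stepping. Grid size, CFL factor and initial data here are invented for illustration; this is not any production relativity code.

```python
# Minimal 1D wave equation u_tt = c^2 u_xx on [0,1], fixed boundaries,
# second-order centered finite differences + leapfrog time stepping.
# An illustrative sketch only; all parameters are made up.
import numpy as np

def evolve_wave(nx=201, nt=400, c=1.0, cfl=0.5):
    dx = 1.0 / (nx - 1)
    dt = cfl * dx / c                           # CFL <= 1 for stability
    x = np.linspace(0.0, 1.0, nx)
    u_prev = np.exp(-100.0 * (x - 0.5) ** 2)    # Gaussian initial data
    u_curr = u_prev.copy()                      # zero initial velocity
    for _ in range(nt):
        u_next = np.zeros_like(u_curr)
        u_next[1:-1] = (2.0 * u_curr[1:-1] - u_prev[1:-1]
                        + (c * dt / dx) ** 2
                        * (u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]))
        u_prev, u_curr = u_curr, u_next         # boundaries held at zero
    return x, u_curr

x, u = evolve_wave()
```

The same stencil idea underlies the real codes; they add mesh refinement, parallel domain decomposition and far more complex equations.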

Cactus Code

  • Freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multi-dimensional simulations
  • Developed for Numerical Relativity, but now a general framework for parallel computing (CFD, astrophysics, climate modeling, chemical engineering, quantum gravity, ...)
  • Finite difference, adaptive mesh refinement (Carpet, Samrai, Grace); adding FE/FV, multipatch
  • Active user and developer communities; main development now at LSU and AEI
  • Open source, documentation, etc.

SLIDE 4

Cactus Einstein

  • Cactus modules (thorns) for numerical relativity
  • Many additional thorns available from other groups (AEI, CCT, ...)
  • Agree on some basic principles (e.g. names of variables) and then can share evolution, analysis, etc.
  • Can choose whether or not to use e.g. gauge choice, macros, masks, matter coupling, conformal factor
  • Over 100 relativity papers & 30 student theses: a production research code

Thorns (grouped):
  – Evolve: ADM, EvolSimple
  – Analysis: ADMAnalysis, ADMConstraints, AHFinder, Extract, PsiKadelia, TimeGeodesic
  – InitialData: IDAnalyticBH, IDAxiBrillBH, IDBrillData, IDLinearWaves, IDSimple
  – Gauge Conditions: CoordGauge, Maximal
  – Base: ADMBase, ADMCoupling, ADMMacros, StaticConformal, SpaceMask
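The "agree on variable names, then mix and match thorns" idea can be sketched in a few lines. Everything below (the decorator, the `GridVariables` dict, the thorn names) is a hypothetical Python illustration of the principle, not the actual Cactus API:

```python
# Sketch: modules ("thorns") that only agree on shared variable names
# can be combined freely. Hypothetical illustration, not the Cactus API.
class GridVariables(dict):
    """Shared dictionary of named grid variables."""

registry = []

def thorn(func):
    """Register a function as a 'thorn' acting on shared variables."""
    registry.append(func)
    return func

@thorn
def id_simple(gv):          # initial-data thorn: fills the variable 'phi'
    gv["phi"] = [0.0, 1.0, 0.0]

@thorn
def evolve_simple(gv):      # evolution thorn: only relies on the name 'phi'
    gv["phi"] = [v * 0.5 for v in gv["phi"]]

@thorn
def analysis_norm(gv):      # analysis thorn: works with ANY evolver
    gv["phi_norm"] = sum(v * v for v in gv["phi"]) ** 0.5

gv = GridVariables()
for t in registry:          # a trivial stand-in for the scheduler
    t(gv)
print(gv["phi_norm"])       # prints 0.5
```

Because the analysis thorn depends only on the agreed name `phi`, it can be reused unchanged with a different initial-data or evolution thorn, which is the point of the shared-principles design.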

Grand Challenge Collaborations

NSF Black Hole Grand Challenge
  • 8 US Institutions
  • 5 years
  • Attack the colliding black hole problem

NASA Neutron Star Grand Challenge
  • 5 US sites
  • 3 years
  • Colliding neutron star problem

EU Astrophysics Network
  • 10 EU sites
  • 3 years
  • Continuing these problems

Examples of the Future of Science & Engineering:
  • Require large-scale simulations, beyond the reach of any machine
  • Require large geo-distributed cross-disciplinary collaborations
  • Require Grid technologies, but not yet using them!

SLIDE 5

New Paradigm: Grid Computing

  • Computational resources across the world
    – Compute servers (double each 18 months)
    – File servers
    – Networks (double each 9 months)
    – Playstations, cell phones, etc.
  • Grid computing integrates communities and resources
  • How to take advantage of this for scientific simulations?
    – Harness multiple sites and devices
    – Models with a new level of complexity and scale, interacting with data
    – New possibilities for collaboration and advanced scenarios

NLR and Louisiana Optical Network (LONI)

State initiative ($40M) to support research:
  • 40 Gbps optical network
  • Connects 7 sites
  • Grid resources (IBM P5) at sites
  • LIGO/CAMD

New possibilities:
  • Dynamical provisioning and scheduling of network bandwidth
  • Network-dependent scenarios
  • "EnLIGHTened" Computing (NSF)

SLIDE 6

Current Grid Application Types

  • Community Driven
    – Distributed communities share resources
    – Video conferencing
    – Virtual collaborative environments
  • Data Driven
    – Remote access of huge data, data mining
    – E.g. gravitational wave analysis, particle physics, astronomy
  • Process/Simulation Driven
    – Demanding simulations of science and engineering
    – Task farming, resource brokering, distributed computations, workflow
  • Remote visualization, steering and interaction, etc.

Typical scenario:
  • Find remote resources (task farm, distribute)
  • Launch jobs (static)
  • Visualize, collect results

Prototypes and demos exist; these need to move to:
  • Fault tolerance
  • Robustness
  • Scaling
  • Ease of use
  • Complete solutions

New Paradigms for Dynamic Grids

  • Addressing large, complex, multidisciplinary problems with collaborative teams of varied researchers ...
  • Code/User/Infrastructure should be aware of the environment
    – Discover and monitor resources available NOW
    – What is my allocation on these resources?
    – What is bandwidth/latency?
  • Code/User/Infrastructure should make decisions
    – Slow part of simulation can run independently ... spawn it off!
    – New powerful resources just became available ... migrate there!
    – Machine went down ... reconfigure and recover!
    – Need more memory (or less!)? Get it by adding (dropping) machines!
  • Dynamically provision and use new high-end resources and networks

SLIDE 7

Future Dynamic Grid Computing

(Scenario diagram: "We see something, but too weak. Please simulate to enhance signal!" The simulation finds a black hole, loads a new component, looks for the horizon, calculates/outputs gravitational waves and invariants, finds the best resources (free CPUs at NCSA, SDSC, RZG, LRZ), archives data, adds more resources, clones the job with a steered parameter, finds a new machine when queue time is over, runs further calculations at AEI, and archives results to the LIGO experiment.)

SLIDE 8

New Grid Scenarios

  • Intelligent parameter surveys, speculative computing, Monte Carlo
  • Dynamic Staging: move to a faster/cheaper/bigger machine
  • Multiple Universe: create a clone to investigate a steered parameter
  • Automatic Component Loading: needs of the process change; discover/load/execute a new calculation component on an appropriate machine
  • Automatic Convergence Testing
  • Look Ahead: spawn off and run a coarser resolution to predict the likely future
  • Spawn Independent/Asynchronous Tasks: send to a cheaper machine, main simulation carries on
  • Routine Profiling: best machine/queue; choose resolution parameters based on queue
  • Dynamic Load Balancing: inhomogeneous loads, multiple grids
  • Inject dynamically acquired data
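The "Automatic Convergence Testing" scenario can be sketched concretely: run the same discretized problem at three resolutions and estimate the convergence order from the pairwise differences. The toy integrand and resolutions below are made up for illustration; real simulations apply the same Richardson-style check to evolved fields.

```python
# Sketch of automatic convergence testing: solve at resolutions h, h/2,
# h/4 and estimate the order p from (s1 - s2)/(s2 - s3) -> 2^p.
# Toy problem (trapezoid-rule integration); illustrative only.
import math

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def convergence_order(f, a, b, n):
    # Solutions at three successively doubled resolutions:
    s1, s2, s3 = (trapezoid(f, a, b, k * n) for k in (1, 2, 4))
    # For an order-p method the difference ratio tends to 2^p.
    return math.log2(abs(s1 - s2) / abs(s2 - s3))

p = convergence_order(math.sin, 0.0, math.pi, 16)
print(round(p, 2))   # close to 2: the trapezoid rule is second order
```

An automated version of this check, spawned alongside the main run, can flag when a simulation is no longer in its convergent regime.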

But … Need Grid Apps and Programming Tools

  • Need application programming tools for Grid environments
    – Frameworks for developing Grid applications
    – Toolkits providing Grid functionality
    – Grid debuggers and profilers
    – Robust, dependable, flexible Grid tools
  • Challenging CS problems:
    – Missing or immature grid services
    – Changing environment
    – Different and evolving interfaces to the "grid"
    – Interfaces are not simple for scientific application developers
  • Application developers need easy, robust and dependable tools

SLIDE 9

GridLab Project

  • EU 5th Framework ($7M)
  • Partners in Europe and US
    – PSNC (Poland), AEI & ZIB (Germany), VU (Netherlands), MASARYK (Czech), SZTAKI (Hungary), ISUFI (Italy), Cardiff (UK), NTUA (Greece), Chicago, ISI & Wisconsin (US), Sun, Compaq/HP, LSU
  • Application and testbed oriented (Cactus + Triana)
    – Numerical relativity
    – Dynamic use of grids
  • Main goal: develop an application programming environment for the Grid

www.gridlab.org

Grid Application Toolkit (GAT)

  • Abstract programming interface between applications and Grid services
  • Designed for applications (move file, run remote task, migrate, write to remote file)
  • Led to the GGF Simple API for Grid Applications (SAGA)
  • Main result from the GridLab project

www.gridlab.org/GAT
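The core GAT idea, that the application codes against one simple call while pluggable adaptors map it onto whatever middleware is actually available, can be sketched as follows. All class and method names here are hypothetical illustrations, not the real GAT or SAGA API:

```python
# Sketch of the adaptor pattern behind GAT: the application calls one
# copy_file(); adaptors are tried in turn until one works. Names are
# hypothetical, not the real GAT/SAGA signatures.
import os
import shutil
import tempfile

class LocalAdaptor:
    """Fallback adaptor: plain local filesystem copy."""
    def copy_file(self, src, dst):
        shutil.copyfile(src, dst)

class UnavailableGridAdaptor:
    """Stands in for e.g. a wide-area transfer adaptor whose service is down."""
    def copy_file(self, src, dst):
        raise ConnectionError("grid service unreachable")

class GAT:
    def __init__(self, adaptors):
        self.adaptors = adaptors
    def copy_file(self, src, dst):
        # The application never sees which adaptor actually did the work.
        for adaptor in self.adaptors:
            try:
                return adaptor.copy_file(src, dst)
            except Exception:
                continue
        raise RuntimeError("no adaptor could copy the file")

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "input.dat")
dst = os.path.join(workdir, "output.dat")
with open(src, "w") as f:
    f.write("simulation data")

gat = GAT([UnavailableGridAdaptor(), LocalAdaptor()])
gat.copy_file(src, dst)          # grid adaptor fails, local one succeeds
result = open(dst).read()
```

This late binding is what lets the same application binary run with or without grid services present, one of the "changing environment" problems listed above.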

SLIDE 10

Distributed Computation

  • Issues
    – Bandwidth (increasing faster than CPU)
    – Latency
    – Communication needs, topology
    – Communication/computation ratio
  • Techniques to be developed
    – Overlapping communication and computation
    – Extra ghost zones to reduce latency
    – Compression
    – Algorithms to do this for the scientist
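The "extra ghost zones" technique can be sketched in one dimension: exchange a ghost layer of width g once, then take g local stencil steps before the next exchange, trading a little redundant computation for fewer high-latency messages. The stencil, domain split and sizes below are invented for illustration:

```python
# Sketch of wide ghost zones: with g ghost cells exchanged per round,
# each subdomain can take g local steps of a 3-point stencil and still
# reproduce the serial result exactly. Illustrative 1D example.
import numpy as np

def step(u):
    """One 3-point averaging update on the interior; endpoints fixed."""
    v = u.copy()
    v[1:-1] = (u[:-2] + u[1:-1] + u[2:]) / 3.0
    return v

def run_serial(u, nsteps):
    for _ in range(nsteps):
        u = step(u)
    return u

def run_two_domains(u, nsteps, g):
    """Split u in half; exchange width-g ghost layers every g steps."""
    n = len(u) // 2
    assert nsteps % g == 0
    left, right = u[:n].copy(), u[n:].copy()
    for _ in range(nsteps // g):
        # "communication": widen each half with g cells from its neighbour
        lw = np.concatenate([left, right[:g]])
        rw = np.concatenate([left[-g:], right])
        for _ in range(g):            # g cheap local steps, no messages
            lw, rw = step(lw), step(rw)
        left, right = lw[:n], rw[g:]  # keep only the owned cells
    return np.concatenate([left, right])

u0 = np.sin(np.linspace(0.0, np.pi, 64))
serial = run_serial(u0, 8)
decomposed = run_two_domains(u0, 8, g=4)   # 2 exchanges instead of 8
```

With g = 4 the decomposed run exchanges data twice instead of eight times yet matches the serial answer, which is exactly the latency trade described above.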

Harnessing Multiple Computers

Why do this?
  • Capacity: single computers can't keep up with needs
  • Throughput: combine resources

Example distributed run:
  • SDSC IBM SP: 1024 procs; 5x12x17 = 1020 used
  • NCSA Origin Array: 256+128+128 procs; 5x12x(4+2+2) = 480 used
  • GigE within sites: 100 MB/sec; OC-12 line between them (but only 2.5 MB/sec achieved)

Cactus + MPICH-G2: communications dynamically adapt to the application and environment. Works with any Cactus application. Scaling improved from 15% to 85%.

"Gordon Bell Prize" (with U. Chicago/Northern, Supercomputing 2001, Denver)

Dynamic Adaptive Distributed Computation

SLIDE 11

Remote Viz & Steering

  • HTTP
  • Streaming HDF5, auto-downsampling
  • Any viz client: LCA Vision, OpenDX
  • Changing steerable parameters: physics, algorithms, performance

Cactus Worm (SC2000)

  • Cactus simulation starts, launched from portal
  • Migrates itself to another site
    – Grid technologies
  • Registers new location
  • User tracks/steers, using HTTP, streaming data, etc.
  • Continues around Europe ...
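Worm-style migration boils down to checkpoint, move, restart. A minimal sketch, assuming in-process dictionaries stand in for real sites and `pickle` stands in for real checkpoint files; the class and site names are invented:

```python
# Sketch of checkpoint-based migration: serialize the full simulation
# state, "move" it, and resume elsewhere. Sites are just labels here.
import pickle

class Simulation:
    def __init__(self, steps_total):
        self.step = 0
        self.steps_total = steps_total
        self.history = []            # records which site ran which step

    def run(self, site_name, steps):
        for _ in range(steps):
            if self.step >= self.steps_total:
                break
            self.step += 1
            self.history.append(site_name)

    def checkpoint(self):
        return pickle.dumps(self)    # full state -> portable bytes

    @staticmethod
    def restore(blob):
        return pickle.loads(blob)

sim = Simulation(steps_total=6)
sim.run("AEI", 2)                    # start at one site
blob = sim.checkpoint()              # queue time over: checkpoint ...
sim = Simulation.restore(blob)       # ... and restart at the next site
sim.run("NCSA", 2)
blob = sim.checkpoint()
sim = Simulation.restore(blob)
sim.run("SDSC", 2)
print(sim.history)   # ['AEI', 'AEI', 'NCSA', 'NCSA', 'SDSC', 'SDSC']
```

The real worm added the grid machinery around this core: discovering the next site, transferring the checkpoint, and registering the new location so the user could keep tracking the run.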

SLIDE 12

Task Spawning (SC2001)

User only has to invoke the Cactus "Spawner" thorn; appropriate analysis tasks are spawned automatically to free resources worldwide.

  • Cactus "Spawner" thorn automatically prepares analysis tasks for spawning
  • Grid technologies find resources, manage tasks, collect data
  • Intelligence to decide when to spawn
  • SC2001: used the resources of the GGTC testbed; the main Cactus BH simulation started at one site and spawned apparent horizon finding tasks across the grid

Global Grid Testbed Collaboration

  • 5 continents and over 14 countries
  • Around 70 machines, 7500+ processors
  • Many hardware types, including PS2, IA32, IA64, MIPS
  • Many OSs, including Linux, Irix, AIX, OSF, Tru64, Solaris, Hitachi
  • Many organizations: DOE, NSF, MPG, universities, vendors
  • All ran the same Grid infrastructure, used for different applications

Supercomputing 2001 prizes for most heterogeneous and most distributed testbed
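The spawning pattern itself is simple: the main evolution loop hands each analysis task to a pool of workers and carries on without waiting. A sketch using a worker pool as a stand-in for remote resources; the "horizon finder" is a toy computation, not the real analysis:

```python
# Sketch of task spawning: the evolution loop submits independent
# analysis tasks (a stand-in for apparent-horizon finding) and does
# NOT block on them. Workers model "free resources elsewhere".
from concurrent.futures import ThreadPoolExecutor

def find_horizon(iteration, data):
    # Toy analysis: some independent post-processing of a data snapshot.
    return iteration, sum(x * x for x in data) ** 0.5

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = []
    data = [1.0, 2.0, 2.0]
    for it in range(5):                          # main evolution loop
        data = [x * 1.1 for x in data]           # "evolve" one step
        if it % 2 == 0:                          # spawn analysis, don't wait
            futures.append(pool.submit(find_horizon, it, list(data)))
    results = sorted(f.result() for f in futures)

print([it for it, _ in results])   # analysis ran for iterations [0, 2, 4]
```

Note the snapshot (`list(data)`) passed to each task: the spawned analysis works on a copy while the main loop keeps evolving, which is what lets the expensive analysis run on cheaper machines without slowing the simulation.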

SLIDE 13

Black Hole Task Farming (SC2002)

  • Main Cactus BH simulation started in California
  • Dozens of low-resolution jobs test the corotation parameter
  • Each job returns an error measure to a black hole "server", which controls tasks and steers the main job
  • Huge job generates remote data, visualized in Baltimore
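The farming logic of this demo can be sketched as: survey a parameter with cheap low-resolution jobs, collect an error measure from each, and steer the main job to the best value. The quadratic "error" and the surveyed range below are toy stand-ins for the real corotation-parameter measure:

```python
# Sketch of task-farming a parameter survey: cheap trial jobs return
# (error, param) pairs and the "server" picks the best. The error
# function is a made-up toy with its minimum at 0.42.
from concurrent.futures import ThreadPoolExecutor

def low_res_job(param):
    """Cheap low-resolution trial run returning (error_measure, param)."""
    error = (param - 0.42) ** 2
    return error, param

params = [i / 100.0 for i in range(0, 100, 5)]      # survey 0.00 .. 0.95
with ThreadPoolExecutor(max_workers=8) as farm:     # "remote resources"
    best_error, best_param = min(farm.map(low_res_job, params))

print(best_param)   # the surveyed value closest to the toy optimum 0.42
```

In the SC2002 demonstration the same control flow ran across grid resources, with the black hole server both collecting the error measures and steering the main high-resolution job.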

Job Migration

GridLab demonstration SC2003

SLIDE 14

Notification and Information

(Architecture diagram: a GridSphere Portal holds user details, notification preferences and simulation information, and connects "The Grid" and a replica catalog to SMS, mail and IM servers.)

“Grid-enabled” Gravitational Physics

  • Adaptive, intelligent simulation codes able to adapt to their environment
  • Simulation data stored across geographically distributed spaces
    – Organization, access, mining issues
    – Analysis of federated data sets by virtual organizations
  • Data analysis of LIGO, GEO, LISA signals
    – Interacting with simulation data
    – Managing parameter space/signal analysis
  • Now working on domain-specific information and knowledge-based services:
    – Gravitational physics description language
      • Schema for describing, searching, encoding simulation results
      • Automated logging of simulations: reproducibility
    – Notification and data sharing services to enable collaboration
    – Relativity services
      • Remote servers running e.g. waveform extraction, horizon finding, etc.
      • Connection to publications and information
      • Automated analysis

SLIDE 15

Credits

  • This talk describes work carried out over a number of years by physicists, computer scientists, mathematicians etc. of the joint AEI-LSU numerical relativity groups and colleagues.