Grid Computing in Numerical Relativity and Astrophysics
  1. Grid Computing in Numerical Relativity and Astrophysics
     Gabrielle Allen: gallen@cct.lsu.edu
     Depts. of Computer Science & Physics, Center for Computation & Technology (CCT), Louisiana State University

     Challenge Problems
     • Cosmology
     • Black Hole and Neutron Star Models
     • Supernovae
     • Astronomical Databases
     • Gravitational Wave Data Analysis
     • Drive HEC & Grids

  2. Gravitational Wave Physics
     [diagram: Observations and Models feed Complex Simulations, which yield Analysis & Insight]

  3. Computational Science Needs
     Requires an incredible mix of technologies & expertise!
     • Many scientific/engineering components
       – Physics, astrophysics, CFD, engineering, ...
     • Many numerical algorithm components
       – Finite difference? Finite volume? Finite elements?
       – Elliptic equations: multigrid, Krylov subspace, ...
       – Mesh refinement
     • Many different computational components
       – Parallelism (HPF, MPI, PVM, ???)
       – Multipatch
       – Architecture (MPP, DSM, Vector, PC Clusters, FPGA, ???)
       – I/O (generate TBs/simulation, checkpointing, ...)
       – Visualization of all that comes out!
     • New technologies
       – Grid computing
       – Steering, data archives
     Such work cuts across many disciplines and areas of CS.

     Cactus Code
     • Freely available, modular, portable and manageable environment for collaboratively developing parallel, high-performance multi-dimensional simulations
     • Developed for numerical relativity, but now a general framework for parallel computing (CFD, astrophysics, climate modeling, chemical engineering, quantum gravity, ...)
     • Finite difference, adaptive mesh refinement (Carpet, SAMRAI, GrACE); adding FE/FV and multipatch
     • Active user and developer communities; main development now at LSU and the AEI
     • Open source, documentation, etc.
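
     The numerical core that frameworks like Cactus orchestrate is typically a stencil update over a structured grid. As a minimal, self-contained illustration (this is not Cactus code; the 1D wave equation, grid size, and pulse width are arbitrary choices for the sketch):

        /* Sketch of the kind of finite-difference update such frameworks
         * orchestrate: the 1D wave equation, leapfrog in time. */
        #include <math.h>
        #include <stdio.h>

        #define N 101

        int main(void) {
            double u_old[N], u[N], u_new[N];
            const double c = 1.0, dx = 0.01, dt = 0.005;   /* CFL: c*dt/dx <= 1 */
            const double r2 = (c * dt / dx) * (c * dt / dx);

            /* Initial data: a Gaussian pulse, initially at rest. */
            for (int i = 0; i < N; i++) {
                double x = (i - N / 2) * dx;
                u[i] = u_old[i] = exp(-x * x / 0.001);
            }

            for (int step = 0; step < 100; step++) {
                /* Second-order centered stencil on the interior points. */
                for (int i = 1; i < N - 1; i++)
                    u_new[i] = 2.0 * u[i] - u_old[i]
                             + r2 * (u[i+1] - 2.0 * u[i] + u[i-1]);
                u_new[0] = u_new[N-1] = 0.0;               /* fixed boundaries */

                for (int i = 0; i < N; i++) { u_old[i] = u[i]; u[i] = u_new[i]; }
            }

            printf("u at center after 100 steps: %g\n", u[N/2]);
            return 0;
        }

     In production the same loop runs in parallel over a distributed grid, with MPI filling ghost zones between steps (see the halo-exchange sketch on the distributed-computation slide).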

  4. Cactus Einstein
     • Cactus modules (thorns) for numerical relativity
     • Many additional thorns available from other groups (AEI, CCT, ...)
     • Thorns agree on some basic principles (e.g. names of variables) and can then share evolution, analysis, etc.
     • Can choose whether or not to use e.g. gauge choice, macros, masks, matter coupling, conformal factor
     • Over 100 relativity papers & 30 student theses: a production research code
     [diagram of thorn categories: base: ADMBase; evolve: ADM, EvolSimple; analysis: ADMAnalysis, ADMConstraints, AHFinder, Extract, PsiKadelia, TimeGeodesic; initial data: IDAnalyticBH, IDAxiBrillBH, IDBrillData, IDLinearWaves, IDSimple; gauge conditions: CoordGauge, Maximal; infrastructure: SpaceMask, ADMCoupling, ADMMacros, StaticConformal]

     Grand Challenge Collaborations
     • NASA Neutron Star Grand Challenge: 5 US sites, 3 years, colliding neutron star problem
     • NSF Black Hole Grand Challenge: 8 US institutions, 5 years, attack the colliding black hole problem
     • EU Astrophysics Network: 10 EU sites, 3 years, continuing these problems
     Examples of the future of science & engineering:
     • Require large-scale simulations, beyond the reach of any machine
     • Require large geo-distributed cross-disciplinary collaborations
     • Require Grid technologies, but are not yet using them!
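
     The "agree on variable names, then share" principle can be shown with a toy registry. Every name below (lookup, initial_data, analysis, phi) is invented for the sketch; this is not the actual Cactus API, just the idea that independent modules cooperate only through commonly named variables:

        /* Toy illustration of thorn modularity: independent "thorns" agree
         * only on variable names, and a scheduler wires them together. */
        #include <stdio.h>
        #include <string.h>

        #define NVARS 2
        #define NX 8

        /* The shared "grid variables", looked up by agreed-upon name. */
        static const char *var_names[NVARS] = { "phi", "pi" };
        static double vars[NVARS][NX];

        static double *lookup(const char *name) {
            for (int v = 0; v < NVARS; v++)
                if (strcmp(var_names[v], name) == 0) return vars[v];
            return NULL;
        }

        /* Two independent modules: each knows only the agreed names. */
        static void initial_data(void) {       /* cf. an initial-data thorn */
            double *phi = lookup("phi");
            for (int i = 0; i < NX; i++) phi[i] = (i == NX / 2) ? 1.0 : 0.0;
        }

        static void analysis(void) {           /* cf. an analysis thorn */
            double *phi = lookup("phi");
            double sum = 0.0;
            for (int i = 0; i < NX; i++) sum += phi[i];
            printf("integral of phi: %g\n", sum);
        }

        int main(void) {                       /* the "flesh" scheduling both */
            initial_data();
            analysis();
            return 0;
        }

     Because neither module includes the other's headers, an evolution thorn from one group can be combined with an analysis thorn from another, which is how the Cactus Einstein thorns interoperate.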

  5. New Paradigm: Grid Computing
     • Computational resources across the world
       – Compute servers (double each 18 months)
       – File servers
       – Networks (double each 9 months)
       – Playstations, cell phones, etc.
     • Grid computing integrates communities and resources
     • How to take advantage of this for scientific simulations?
       – Harness multiple sites and devices
       – Models with a new level of complexity and scale, interacting with data
       – New possibilities for collaboration and advanced scenarios

     NLR and Louisiana Optical Network (LONI)
     • State initiative ($40M) to support research: 40 Gbps optical network
     • Connects 7 sites; Grid resources (IBM P5) at the sites; LIGO/CAMD
     • New possibilities: dynamical provisioning and scheduling of network bandwidth; network-dependent scenarios
     • "EnLIGHTened" Computing (NSF)
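
     To put the two doubling rates on a common footing (this arithmetic is an addition, not from the slides): over a three-year span,

        $2^{36/18} = 4\times$ (compute)  vs.  $2^{36/9} = 16\times$ (networks)

     so wide-area bandwidth gains roughly a factor of four relative to CPU over that period, which is why the distributed-computation slide later treats bandwidth as the growing resource and latency as the hard limit.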

  6. Current Grid Application Types
     • Community driven
       – Distributed communities share resources
       – Video conferencing
       – Virtual collaborative environments
     • Data driven
       – Remote access of huge data sets, data mining
       – E.g. gravitational wave analysis, particle physics, astronomy
     • Process/simulation driven
       – Demanding simulations of science and engineering
       – Task farming, resource brokering, distributed computations, workflow
       – Remote visualization, steering and interaction, etc.
     Typical scenario: find remote resources (task farm, distribute), launch jobs (static), visualize and collect results.
     Prototypes and demos exist; we need to move to fault tolerance, robustness, scaling, ease of use, complete solutions.

     New Paradigms for Dynamic Grids
     Addressing large, complex, multidisciplinary problems with collaborative teams of varied researchers ...
     • Code/user/infrastructure should be aware of the environment
       – Discover and monitor resources available NOW
       – What is my allocation on these resources?
       – What is the bandwidth/latency?
     • Code/user/infrastructure should make decisions (see the sketch below)
       – The slow part of a simulation can run independently ... spawn it off!
       – New powerful resources just became available ... migrate there!
       – The machine went down ... reconfigure and recover!
       – Need more memory (or less!)? Get it by adding (dropping) machines!
     • Dynamically provision and use new high-end resources and networks
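
     A sketch of the "code makes decisions" idea: the simulation polls its environment between steps and reacts. All helper functions here are hypothetical stubs invented for illustration, not a real Grid API:

        /* Illustrative only: an environment-aware simulation loop.
         * Probes are stubbed with fixed values. */
        #include <stdio.h>

        static double free_memory_gb(void)       { return 1.5; }
        static int    better_machine_found(void) { return 1; }

        static void checkpoint(const char *dir) { printf("checkpoint to %s\n", dir); }
        static void resubmit(const char *site)  { printf("resubmit at %s\n", site); }

        int main(void) {
            for (int step = 0; step < 3; step++) {
                /* ... one simulation step would run here ... */

                if (free_memory_gb() < 2.0) {
                    /* Running out of memory: checkpoint, recover elsewhere. */
                    checkpoint("/scratch/ckpt");
                    resubmit("bigger-site");
                    break;
                }
                if (better_machine_found()) {
                    /* New powerful resource available: migrate there. */
                    checkpoint("/scratch/ckpt");
                    resubmit("faster-site");
                    break;
                }
            }
            return 0;
        }

     The hard parts in practice are exactly the probes stubbed out here: reliable resource discovery, allocation queries, and bandwidth/latency measurement, which is what the Grid toolkits discussed below aim to provide.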

  7. Future Dynamic Grid Computing
     "We see something, but too weak. Please simulate to enhance the signal!"
     [diagram: simulation codes S1, S2 and parameter sets P1, P2 distributed across sites]
     [diagram of an example scenario spanning NCSA, SDSC, RZG, LRZ and the AEI: find best resources; archive data; queue time over, find a new machine; free CPUs, add more resources; clone the job with a steered parameter; calculate/output invariants, further calculations; found a black hole: load a new component, calculate/output the horizon; look for gravitational waves; archive to the LIGO experiment]

  8. New Grid Scenarios
     • Intelligent parameter surveys, speculative computing, Monte Carlo (see the task-farming sketch below)
     • Dynamic staging: move to a faster/cheaper/bigger machine
     • Multiple universe: create a clone to investigate a steered parameter
     • Automatic component loading: the needs of the process change; discover/load/execute a new calculation component on an appropriate machine
     • Automatic convergence testing
     • Look ahead: spawn off a coarser-resolution run to predict the likely future
     • Spawn independent/asynchronous tasks: send them to a cheaper machine, the main simulation carries on
     • Routine profiling: best machine/queue; choose resolution parameters based on the queue
     • Dynamic load balancing: inhomogeneous loads, multiple grids
     • Inject dynamically acquired data

     But ... Need Grid Apps and Programming Tools
     • Need application programming tools for Grid environments
       – Frameworks for developing Grid applications
       – Toolkits providing Grid functionality
       – Grid debuggers and profilers
       – Robust, dependable, flexible Grid tools
     • Challenging CS problems:
       – Missing or immature grid services
       – A changing environment
       – Different and evolving interfaces to the "grid"
       – Interfaces are not simple for scientific application developers
     • Application developers need easy, robust and dependable tools
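
     A minimal sketch of the task-farming pattern behind parameter surveys, assuming plain MPI: rank 0 hands out parameter values, workers "simulate" and return results. The payload simulate() is a trivial stand-in and all sizes are arbitrary:

        #include <mpi.h>
        #include <stdio.h>

        #define NTASKS 8
        #define TAG_WORK 1
        #define TAG_STOP 2

        /* Stand-in for a real simulation at one parameter value. */
        static double simulate(int param) { return param * param * 0.5; }

        int main(int argc, char **argv) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            if (rank == 0) {                  /* master: farm out parameters */
                int next = 0, active = 0, dummy = 0;
                for (int w = 1; w < size; w++) {
                    if (next < NTASKS) {
                        MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                        next++; active++;
                    } else {
                        MPI_Send(&dummy, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
                    }
                }
                while (active > 0) {
                    double res[2];            /* {param, result} */
                    MPI_Status st;
                    MPI_Recv(res, 2, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                             MPI_COMM_WORLD, &st);
                    printf("param %g -> result %g\n", res[0], res[1]);
                    if (next < NTASKS) {      /* keep the worker busy */
                        MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                                 MPI_COMM_WORLD);
                        next++;
                    } else {
                        MPI_Send(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                                 MPI_COMM_WORLD);
                        active--;
                    }
                }
            } else {                          /* worker: run until told to stop */
                for (;;) {
                    int param;
                    MPI_Status st;
                    MPI_Recv(&param, 1, MPI_INT, 0, MPI_ANY_TAG,
                             MPI_COMM_WORLD, &st);
                    if (st.MPI_TAG == TAG_STOP) break;
                    double res[2] = { (double)param, simulate(param) };
                    MPI_Send(res, 2, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                }
            }
            MPI_Finalize();
            return 0;
        }

     In a Grid setting the same master/worker logic runs across sites, with the job submission and data movement handled by a toolkit such as the GAT described on the next slide.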

  9. GridLab Project
     • EU 5th Framework ($7M)
     • Partners in Europe and US
       – PSNC (Poland), AEI & ZIB (Germany), VU (Netherlands), MASARYK (Czech Republic), SZTAKI (Hungary), ISUFI (Italy), Cardiff (UK), NTUA (Greece), Chicago, ISI & Wisconsin (US), Sun, Compaq/HP, LSU
     • Application and testbed oriented (Cactus + Triana)
       – Numerical relativity
       – Dynamic use of grids
     • Main goal: develop an application programming environment for the Grid
     • www.gridlab.org

     Grid Application Toolkit (GAT)
     • An abstract programming interface between applications and Grid services
     • Designed around application needs (move a file, run a remote task, migrate, write to a remote file)
     • Led to the GGF Simple API for Grid Applications (SAGA)
     • The main result from the GridLab project: www.gridlab.org/GAT
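
     To give the flavor of such an abstraction layer: the application states *what* it wants (move a file, run a remote task) and the toolkit binds to whatever service is available. Every type and function name below is invented for illustration; this is not the real GAT interface:

        #include <stdio.h>

        typedef struct { const char *name; } grid_context;

        /* Hypothetical operations, stubbed to print what they would do. */
        static int grid_file_move(grid_context *ctx, const char *src,
                                  const char *dst) {
            printf("[%s] move %s -> %s\n", ctx->name, src, dst);
            return 0;
        }
        static int grid_task_run(grid_context *ctx, const char *host,
                                 const char *exe) {
            printf("[%s] run %s on %s\n", ctx->name, exe, host);
            return 0;
        }

        int main(void) {
            grid_context ctx = { "demo" };
            /* The application logic is service-agnostic: the same two
               calls could be bound to GridFTP + GRAM, or to scp + ssh. */
            grid_file_move(&ctx, "gsiftp://siteA/data.h5",
                                 "gsiftp://siteB/data.h5");
            grid_task_run(&ctx, "siteB", "/bin/analysis");
            return 0;
        }

     The design point the GAT makes is precisely this indirection: application code compiles once against the abstract interface, and the binding to concrete, evolving Grid services is deferred to runtime.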

  10. Distributed Computation: Harnessing Multiple Computers
      Why do this?
      • Capacity: single computers can't keep up with needs
      • Throughput: combine resources
      Issues:
      • Bandwidth (increasing faster than CPU)
      • Latency
      • Communication needs, topology
      • Communication/computation ratio
      Techniques to be developed (see the halo-exchange sketch below):
      • Overlapping communication and computation
      • Extra ghost zones to reduce latency
      • Compression
      • Algorithms that do this for the scientist

      Dynamic Adaptive Distributed Computation
      [diagram: NCSA Origin array (256+128+128 procs, using 5x12x(4+2+2) = 480) and SDSC IBM SP (1024 procs, using 5x12x17 = 1020); GigE at 100 MB/sec within a site, an OC-12 line (but only 2.5 MB/sec) between sites]
      • Cactus + MPICH-G2: "Gordon Bell Prize" (with U. Chicago/Northern, Supercomputing 2001, Denver)
      • Communications dynamically adapt to the application and environment
      • Works for any Cactus application
      • Scaling: 15% -> 85%
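
      A sketch of the overlap technique with MPI: post the ghost-zone (halo) exchange, update the interior points that need no halo data, then finish the edges once the exchange completes. This shows a 1D decomposition of a toy smoothing stencil with one ghost point per side; the "extra ghost zones" trick would exchange wider halos less frequently. Sizes and data are arbitrary:

         #include <mpi.h>
         #include <stdio.h>

         #define NLOCAL 64                    /* interior points per process */

         int main(int argc, char **argv) {
             int rank, size;
             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
             MPI_Comm_size(MPI_COMM_WORLD, &size);

             double u[NLOCAL + 2], unew[NLOCAL + 2];     /* +2 ghost zones */
             for (int i = 0; i < NLOCAL + 2; i++) u[i] = rank;  /* toy data */

             /* MPI_PROC_NULL at the physical boundary: the recv is a no-op
                and the ghost value simply keeps its old contents. */
             int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
             int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

             MPI_Request req[4];
             /* 1. Post the halo exchange (non-blocking). */
             MPI_Irecv(&u[0],          1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
             MPI_Irecv(&u[NLOCAL + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
             MPI_Isend(&u[1],          1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
             MPI_Isend(&u[NLOCAL],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

             /* 2. Overlap: update interior points that need no ghost data. */
             for (int i = 2; i <= NLOCAL - 1; i++)
                 unew[i] = 0.5 * u[i] + 0.25 * (u[i - 1] + u[i + 1]);

             /* 3. Wait for the halo, then finish the two edge points. */
             MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
             unew[1]      = 0.5 * u[1]      + 0.25 * (u[0]          + u[2]);
             unew[NLOCAL] = 0.5 * u[NLOCAL] + 0.25 * (u[NLOCAL - 1] + u[NLOCAL + 1]);

             if (rank == 0) printf("step done, unew[mid] = %g\n", unew[NLOCAL / 2]);
             MPI_Finalize();
             return 0;
         }

      On a slow wide-area link such as the OC-12 line above, step 2 is what hides the transfer time: the larger the interior relative to the halo, the more of the communication latency disappears behind useful computation.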
