Compute and data management strategies for grid deployment of high throughput protein structure studies
Ian Stokes-Rees, Piotr Sliz Harvard Medical School Many Task Computing on Grids and Supercomputers 2010
Ian Stokes-Rees - portal.nebiogrid.org - Harvard Medical School MTAGS10, November 2010
Overview:
- Context: structural biology computing (think proteins)
- Infrastructure: Open Science Grid
- Computational model: application, data, workflow
- Identity management and security
- Perspectives & conclusions
Collaborating institutions (map slide): Rice University (Y.J. Tao); CalTech; Stanford; UCSF (JJ Miranda); UC Davis; UCSD; WesternU; Washington U. School of Medicine; Vanderbilt Center for Structural Biology; Rosalind Franklin; Harvard and affiliates (S. Walker, T. Walz, S.C. Harrison); NE-CAT; Cornell U.; Brandeis U.; Tufts U.; UMass Medical; NIH; Yale U.; Columbia U.; Rockefeller U.; Thomas Jefferson.
Not pictured: University of Toronto (L. Howell, E. Pai, F. Sicheri); NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan.
User community:
- Biomedical researchers, life sciences
- Universities, hospitals, government agencies
- Currently Boston-focused
Pipeline: sample → imaging → data → structure (X-ray crystallography, cryo-electron microscopy, ...); fragments, O(1e5), processed using grid infrastructure.
Typical workload: 100,000 iterations, 20,000 core-hours, 12 hours wall-clock; that is roughly 12 core-minutes per iteration, running on roughly 1,700 cores concurrently.
Open Science Grid:
- US national cyberinfrastructure, primarily used for high energy physics computing
- 80 sites
- O(1e5) job slots
- O(1e6) core-hours per day
- PB-scale aggregate storage
[OSG usage chart; VOs shown include LIGO, Engage, SBGrid; 4,654,878 (value from chart, units not recoverable)]
Hierarchical execution model (goal: eliminate shell scripts):
- Command-line application (e.g. a Fortran binary)
- Friendly application API wrapper (Python API)
- Batch execution wrapper for N iterations (multi-exec wrapper)
- Results extraction and aggregation (result aggregator; map-reduce pattern)
- Grid job management wrapper
- Web interface: forms, views, static HTML results
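The bottom two layers can be sketched as a thin Python wrapper that turns one CLI invocation into a structured record. This is a minimal illustration under assumed conventions, not the actual SBGrid wrapper; the metadata fields mirror the per-job results format (runtime, exit code) used for aggregation.

```python
import subprocess
import time

def run_app(cmd, args, workdir=".", timeout=None):
    """Run one CLI invocation and capture outcome metadata.

    A sketch of the 'friendly application API wrapper' layer:
    the command name and argument style are placeholders, not the
    actual SBGrid tool interface.
    """
    start = time.time()
    proc = subprocess.run(
        [cmd] + [str(a) for a in args],
        cwd=workdir,
        capture_output=True,  # keep STDOUT/STDERR for later retention
        text=True,
        timeout=timeout,
    )
    return {
        "exitcode": proc.returncode,
        "runtime": time.time() - start,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

A multi-exec wrapper would then call `run_app` N times in a loop, amortizing job-scheduling overhead across many short invocations.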
Why Python:
✓ Rich set of easy-to-use file system processing
✓ Quick to translate an "experimental" script into a reusable script
✓ Portable
✓ Good modularization
✓ Good Web/RPC integration
✓ Good error handling
✓ Rich data structures
✓ GUI interfaces possible
Wrapper requirements:
- Single CLI execution
- Job submission
- Configuration API for invocation
- Results suitable for aggregation
- Multi-exec format (important for short invocations)
- Meta-data suitable for MTC management and metrics
Components:
- Application binary
- API wrapper (single invocation): shex, a Python module for shell-like operations; xconfig, a Python module for environment and module configuration
- Grid wrapper (single invocation): grid job description for a single invocation
- Workflow generator: creates the DAG and job descriptors
- Standard results format; standard meta-data format
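The workflow-generator step can be sketched as code that emits one job descriptor per invocation plus a DAG file tying them together. Condor/DAGMan syntax is used here because OSG submission commonly goes through Condor; the file layout, submit attributes, and wrapper script name (`multi_exec_wrapper.sh`) are illustrative assumptions, not the actual SBGrid descriptors.

```python
import os

def write_dag(jobset, jobs, outdir):
    """Emit a Condor submit description per job and a DAGMan file.

    A hypothetical sketch of the workflow generator: names and
    attributes are placeholders, not SBGrid's actual descriptors.
    """
    os.makedirs(outdir, exist_ok=True)
    dag_lines = []
    for job in jobs:
        sub = os.path.join(outdir, f"{jobset}-{job}.sub")
        with open(sub, "w") as f:
            f.write("universe = vanilla\n")
            f.write("executable = multi_exec_wrapper.sh\n")
            f.write(f"arguments = {jobset} {job}\n")
            f.write(f"output = {jobset}-{job}.out\n")
            f.write(f"error = {jobset}-{job}.err\n")
            f.write("queue\n")
        dag_lines.append(f"JOB {jobset}-{job} {sub}")
    dag_path = os.path.join(outdir, f"{jobset}.dag")
    with open(dag_path, "w") as f:
        f.write("\n".join(dag_lines) + "\n")
    return dag_path
```

The resulting `.dag` file would then be handed to `condor_submit_dag`, which manages submission and retries for the whole jobset.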
Modules:
- shex: http://portal.nebiogrid.org/devel/projects/shex/shex
- xconfig: http://portal.nebiogrid.org/devel/projects/xconfig/xconfig
Results: one whitespace-delimited line per job:

    jobset  job     status  start       runtime  exitcode  score
    ba9     1scza_  OK      1287230825  635      0         614

Job meta-data: JOB_MARKER entry:

    JOB_MARKER WQCG-Harvard-OSG tuscany.med.harvard.edu 1287198043 ba9-1c5pa_ sbgrid@tuscany01.med.harvard.edu:/scratch/condor/execute/dir_16947/glide_e16995/execute/dir_27129 Sat Oct 16 03:00:43 UTC 2010
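A few lines of Python suffice to parse and aggregate the per-job results lines; this is a hypothetical helper (the numeric typing of fields is an assumption), shown to illustrate how the one-line-per-job format supports retry selection and metrics.

```python
from collections import Counter

# Field names follow the results header shown above.
FIELDS = ("jobset", "job", "status", "start", "runtime", "exitcode", "score")

def parse_result(line):
    """Parse one results line, e.g. 'ba9 1scza_ OK 1287230825 635 0 614'."""
    rec = dict(zip(FIELDS, line.split()))
    for key in ("start", "runtime", "exitcode", "score"):
        rec[key] = int(rec[key])  # typing assumption: these are integers
    return rec

def status_counts(lines):
    """Aggregate status across a jobset, e.g. to pick retry candidates."""
    return Counter(parse_result(l)["status"] for l in lines if l.strip())
```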
Application deployment:
- Locally host a "gold standard" copy
- Replicate to a predictable location at all sites: $OSG_APP/sbgrid

System configuration:
- Sanity-check basic prerequisites (memory, disk space, applications, common data sets, directory existence and permissions, network)
- Environment: PATH, LD_LIBRARY_PATH, PYTHONPATH, etc.
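The sanity-check step might look like the following sketch; the thresholds, environment-variable names, and directory list are illustrative placeholders, not actual SBGrid site policy.

```python
import os
import shutil

def sanity_check(workdir, min_free_gb=1, required_env=("PATH",),
                 required_dirs=()):
    """Pre-flight checks before running jobs at a site.

    Returns a list of problem descriptions (empty means healthy).
    Defaults here are assumed values for illustration.
    """
    problems = []
    # Disk space on the working area
    free_gb = shutil.disk_usage(workdir).free / 1e9
    if free_gb < min_free_gb:
        problems.append(f"low disk: {free_gb:.1f} GB free")
    # Required environment variables (PATH, LD_LIBRARY_PATH, ...)
    for var in required_env:
        if var not in os.environ:
            problems.append(f"missing env var: {var}")
    # Directory existence and write permission
    for d in required_dirs:
        if not os.path.isdir(d) or not os.access(d, os.W_OK):
            problems.append(f"missing or unwritable dir: {d}")
    return problems
```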
Per-job data:
- Minimize to the smallest unique set per job
- Even then, may need to pre-stage data to a remote file server
- Staged using the job manager, or pulled by rsync, curl (HTTP), or scp
- Removed on job completion

Per-jobset (workflow instance) data:
- Pre-staged to each site at jobset creation time: $OSG_DATA/users/$USERNAME/workflows/$WORKFLOWNAME
- Fetched by each job to worker-node local disk (or read from NFS)
- Removed on jobset cleanup or by a weekly tmpwatch sweep
- New: large data sets for a workflow instance are pre-staged to UCSD and pulled from there
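The pull-with-fallback staging described above can be sketched as trying each transfer command in turn. The rsync/curl/scp argument lists in the docstring are standard tool usage with hypothetical source paths; the helper itself is illustrative, not SBGrid code.

```python
import subprocess

def stage_in(dest, attempts, timeout=600):
    """Try each transfer command in order until one succeeds.

    `attempts` is a list of argv lists, tried in order, e.g.
    (hypothetical sources):
        [["rsync", "-a", "fileserver:/data/job42/", dest],
         ["curl", "-sSf", "-o", dest, "http://fileserver/job42.tar"],
         ["scp", "fileserver:/data/job42.tar", dest]]
    Returns the name of the tool that succeeded.
    """
    for cmd in attempts:
        try:
            subprocess.run(cmd, check=True, timeout=timeout)
            return cmd[0]
        except (OSError, subprocess.CalledProcessError,
                subprocess.TimeoutExpired):
            continue  # fall through to the next transfer method
    raise RuntimeError(f"all transfer methods failed for {dest}")
```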
User project data:
- Pre-staged and manually managed at each site by the user: $OSG_DATA/users/$USERNAME/projects/$PROJECTNAME
- Fetched by each job to worker-node local disk (or read from NFS)
- Removed by the user, or manually by administrators on a quota basis

Static data:
- Maintain a "gold standard" copy; rsync or bulk-update as required
- 20 GB of protein models pre-staged to $OSG_DATA/sbgrid/biodb
Results handling:
- Continuous aggregation: an "in progress" view of the data; accept the possibility of corruption
- Track errors: sort by execution site, the key predictor of network, disk, library, and configuration problems
- Retain only key output: STDOUT, STDERR, and a single per-job "results" file are enough to easily retry arbitrary subsets of the overall jobset (timeout, error, etc.)
- On-demand updates: user-driven "expensive" status updates on queued, running, complete, and failed jobs, plus aggregated results and report generation
- Finalized results: cleaned results; augmented results (inclusion of static per-job information)
Per-job status taxonomy:
- OK: job executed the application correctly and usable results are returned (done)
- NO_SOLUTION: job executed, but no usable results (failed; don't rerun)
- ERROR: job failed to execute properly (failed; rerun, up to the retry limit)
- SHORT: job executed and produced output, but the runtime is suspicious (complete, but don't trust; rerun)
- TIMEOUT: job was aborted before completing; no results available (cancelled; don't rerun)
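This taxonomy reduces to a small decision table. A sketch follows; the retry limit is a parameter the slides mention but do not quantify, so the default of 3 is an assumption.

```python
# Decision table for the per-job status taxonomy above:
# status -> (outcome label, eligible for rerun)
DECISIONS = {
    "OK":          ("done", False),
    "NO_SOLUTION": ("failed", False),
    "ERROR":       ("failed", True),    # rerun up to the retry limit
    "SHORT":       ("suspect", True),   # output present but untrusted
    "TIMEOUT":     ("cancelled", False),
}

def should_rerun(status, attempts, retry_limit=3):
    """Return True if a job with this status should be resubmitted."""
    outcome, rerun = DECISIONS[status]
    return rerun and attempts < retry_limit
```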
Web portal access to files:
- X.509 access control (full file management)
- .htpasswd read-only sharing
- CLI or web interaction with running jobs
- Web view of data (files, tables, reports, AJAX)
- Web "file browsing" of all results, with augmented hyperlinking to details or static information

ssh/CLI access to files:
- Users need to be able to drill down into the roughly 1 million files and 5 GB of data generated by the execution of their workflow
Relying heavily on OSG facilities for the federated environment:
- X.509 DNs, proxy certificates, MyProxy
- LDAP for local accounts
- Access control: mod_gridsite and GACL policies
- Data access: Apache and mod_gridsite
- Service access: web portal and GSI-enabled ssh

Challenge: making these facilities available to the user community; alternatives to the web portal and gsi-ssh that run local to the user would be nice.
Identity management:
- A mixture of .htpasswd, PAM, X.509, and application-specific IDs
- The complexity of X.509 (and its associated paraphernalia) confuses users during account creation, use, and management

Virtual Organization hierarchies and user-driven collaborations:
- Inheritance of rights/policies
- How to allow users to easily create and manage groups

Merging security policies:
- Site/resource, VO, and user policies need to be merged

Encryption and privacy preservation:
- Generic mechanisms for encryption and key management
- Preserving the privacy of actions and data in a federated grid environment
Ian Stokes-Rees, http://sbgrid.org
Meta-data system:
- Provide more generic pointers to ACLs and encryption keys

Extension of the GACL system:
- Include non-X.509 ID tokens as policy principals
- Allow GACL policies to apply to web framework objects (pyGACL)

Simple replicated key system for file encryption:
- Use the meta-data framework to point to the encryption key (and its replicas)
- Use GACL to control key access (the key is a regular file)
- Libraries to automatically read/write encrypted files

Future:
- VO hierarchies
- Tools for user-driven ACL management
- Tools for policy management (merging site, VO, and user policies)
Conclusions:
- Hierarchical execution model: application, API, cluster, grid, multi-exec. There is tension between experimentation, development, and debugging; Python provides the right mix for this.
- Necessity of a clear model for data and binary deployment: lifetime, configuration, ownership.
- Workflow management has an application-specific part and an infrastructure-specific part.
- Web interfaces, command line, and APIs are all required.
- Identity management, access control, and the security model are tough!
Credits:
- Piotr Sliz: PI and SBGrid team leader
- Peter Doherty: Grid Research IT Admin
- Ian Levesque: Systems Architect
- Ben Eisenbraun: Software Curator
Please be in touch if you have questions: http://portal.nebiogrid.org/ ijstokes@hkl.hms.harvard.edu
Ease of use:
- No command line
- X.509 (initial request, VOs, proxies, roles, etc.) is really complicated
- Support infrastructure needed (mailing lists, tickets, phone, training)

Killer apps:
- Users will adopt the grid if they see peers using it to advance scientific goals
- They will use it if novel workflows or workflow patterns are established
- Data management is a big problem for everyone (see bonus, time permitting); we believe grid infrastructure could provide a solution

Security:
- Data needs to be secure ...
- ... but users still want to control sharing/access

Roadblocks:
- Reliability of the underlying infrastructure, and difficulty in debugging
- Applications tied to GUIs; rudimentary interfaces