SLIDE 1

Challenges for Grids

Markus Schulz CERN IT GD LCG/EGEE

SLIDE 2

Disclaimer

  • All views expressed are mine and are not necessarily shared by the projects or organizations that I am associated with
    – Don’t blame: EGEE, LCG, CERN…
    – Critique, flames, and the like should be directed to:
      • Markus.schulz@cern.ch
SLIDE 3

Approach

  • Thinking a few years ahead
    – Based on what we know
    – Ignoring problems like:
      • software quality (far from perfect)
      • lack of fabric management on sites
      • site admins’ fear of losing total control
    – Focused on structural problems:
      • Make production grids work at the required scale
      • Expand the systems to other domains (industry, micro-VOs, …)
      • Move closer to the grid vision
SLIDE 4

Babylonian Confusion

  • What is called “Grid” covers:
    – Standalone clusters
    – Clusters for scaling a single service
    – Intra-organizational clusters
      • With central administrative control
    – Community computing
      • SETI@home, BOINC
    – I. Foster’s definition <------- This is what I will use:
      • “Coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.”
      • “On-demand, ubiquitous access to computing, data, and services.”

SLIDE 5

The Dangers of Success

  • Early success
    – Constraints from existing infrastructures
      • Users depend on them
    – Research ---> production transition is very hard
    – Restricts standardization
      • The curse of backwards compatibility
      • Example: EGEE, WLCG, OSG, ARC
        – > 70 VOs

SLIDE 6

EGEE:
> 190 sites, 40 countries
> 24,000 processors, ~ 5 PB storage
~ 70 Virtual Organizations

[Charts: EGEE Grid Sites, Q1 2006. Growth in the number of sites (Apr 2004 to Dec 2005) and in the number of CPUs (Apr 2004 to Feb 2006).]

SLIDE 7

EGEE Operations

  • Grid operator on duty
    – 6 teams working in weekly rotation
      • CERN, IN2P3, INFN, UK/I, Ru, Taipei
    – Crucial in improving site stability and management
    – Expanding to all ROCs in EGEE-II
  • Operations coordination
    – Weekly operations meetings
    – Regular ROC managers meetings
    – Series of EGEE Operations Workshops
      • Nov 04, May 05, Sep 05, June 06
  • Geographically distributed responsibility for operations:
    – There is no “central” operation
    – Tools are developed/hosted at different sites:
      • GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon)
  • Procedures described in the Operations Manual
    – Introducing new sites
    – Site downtime scheduling
    – Suspending a site
    – Escalation procedures
    – etc.

SLIDE 8

Use of the infrastructure

[Charts: number of jobs/day, total and non-LCG (Jan 2005 to Apr 2006); CPU time delivered in SI2K-hours/month and in CPU-years/month (Jun 2005 to Apr 2006) for the VOs alice, atlas, biomed, cms, geant4, and lhcb.]

Sustained & regular workloads of > 30K jobs/day
  • spread across the full infrastructure
  • doubling/tripling in the last 6 months, with no effect on operations
  • will increase to at least 150K jobs/day in the next 18 months

SLIDE 9

Use of the infrastructure

  • Massive data transfers > 1.5 GB/s
  • Several applications now depend on EGEE as their primary computing resource
  • Sustainability:
    – Usage can (and does) grow without need for additional operational effort
SLIDE 10

A global, federated e-Infrastructure

EGEE infrastructure:
  • ~ 200 sites in 39 countries
  • ~ 20,000 CPUs
  • > 5 PB storage
  • > 35,000 concurrent jobs per day
  • > 80 Virtual Organisations

[Map: related infrastructures: EUIndiaGrid, EUMedGrid, SEE-GRID, EELA, BalticGrid, EUChinaGrid, OSG, NAREGI]

SLIDE 11

OSG – Currently ~20,000 Jobs/Day

[Chart: OSG jobs per day by VO: CDF, ATLAS, CMS, GLOW, STAR, D0]

SLIDE 12

This all looks very promising….

  • But…
    – Interoperation between grids
      • Lack of standardization
      • Several larger sites have to support multiple interfaces
    – Managing diversity inside grids
      • OS versions
        – Applications are sensitive and sites have preferences
        – Sites and users move independently
      • Batch systems
        – Each requires extensive work to interface
        – Limited to the smallest set of shared functionality (see the sketch after this list)
          » Frustrates users AND resource managers
          » Lack of standardization
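To make the “smallest set of shared functionality” point concrete, here is a minimal sketch of an abstraction layer over several batch systems. It is not any project’s actual API; the class and method names are invented for illustration. The point it shows: only operations that every backend supports can appear in the common interface, so backend-specific features are lost behind it.

```python
from abc import ABC, abstractmethod

class BatchSystem(ABC):
    """Hypothetical least-common-denominator interface over batch systems.

    Only operations that *every* backend (PBS, LSF, Condor, ...) provides
    can appear here; anything a single backend lacks (array jobs, advance
    reservations, fine-grained priorities, ...) cannot be exposed.
    """

    @abstractmethod
    def submit(self, job_script: str) -> str:
        """Submit a job description; return an opaque job id."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return a coarse state: 'queued', 'running', or 'done'."""

    @abstractmethod
    def cancel(self, job_id: str) -> None:
        """Cancel the job; the only control operation all backends share."""
```

Each additional batch system added to the grid shrinks, or at best preserves, this common surface, which is why both users and resource managers end up frustrated.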

SLIDE 13

More problems….

  • Storage, DBs…
    – Different storage management systems are established
      • HSMs, disk pools with shared file systems
    – Different security and storage models; lack of standards
  • VO management
    – Creation of a VO is straightforward
    – Getting access to resources requires:
      • Negotiation with resource providers
      • Significant effort by sites to host an additional VO
    – Accounting, dynamic prioritization, and quotas are problematic
      • on a global level (between different VOs)
      • inter-VO
      • constrained by national privacy laws
    – No market for resources

SLIDE 14

More problems….

  • Achievable reliability is limited
    – The more complex services have to interact, the higher the probability that the overall service fails (a back-of-envelope calculation follows below)
      • ‘Russian Doll Performance Sink’, here: file open
    – Applies to many services
  • Grid interfaces need to be native interfaces
    – STANDARDS

[Diagram: a file open traversing nested service layers (GFAL, SRM, MSS); information system interactions are left out]
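The “Russian Doll” effect can be quantified with a back-of-envelope calculation: when a file open has to traverse several services in series, the end-to-end success rate is the product of the per-service rates. The 99% figures below are assumptions for illustration only.

```python
import math

# Assumed per-service success rates for one file open passing through a
# chain of services (e.g. client library, catalogue, storage manager,
# mass storage system); 0.99 each is an illustrative guess.
rates = [0.99, 0.99, 0.99, 0.99]

end_to_end = math.prod(rates)               # product of the chain
print(f"end-to-end success: {end_to_end:.3f}")  # ~0.961
print(f"failure rate: {1 - end_to_end:.1%}")    # ~3.9% of opens fail
```

Four individually respectable services already lose roughly one open in twenty-five, which is the argument for native grid interfaces instead of additional wrapper layers.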

SLIDE 15

State of Standardization

  • First round of tentative standards
    – Mostly based on research work
      • Missed the deployment- and operations-related parts
    – Production grids started with ‘de facto standards’
    – Now: OGSA
      • Much more detailed, recycles established standards
      • But: additional layers, and old services will be wrapped!!!

Diagram from Globus Alliance

SLIDE 16

[Diagram (from the Globus Alliance): OGSA service families: Context Services, Information Services, Infrastructure Services, Security Services, Resource Mgmt Services, Execution Mgmt Services, Data Services, and Self Mgmt Services; individual services include policy mgmt, VO mgmt, access, integration, transfer, replication, boundary traversal, integrity, authorization, authentication, WSRF, WSN, WSDM, event mgmt, monitoring, discovery, job mgmt, logging, execution planning, workflow mgmt, workload mgmt, provisioning, execution, deployment, configuration, reservation, naming, heterogeneity mgmt, service level attainment, QoS mgmt, and optimization]

SLIDE 17

Relevant Specifications

Grid computing, distributed computing, and utility computing are different views of the same important problem domain.

[Diagram: map of relevant specifications across systems management, utility computing, and grid computing: core services and the Base Profile, HTTP(S)/SOAP, WSDL, WS-Addressing, WS-Security, WS-Base Notification, WSRF-RP/-RL/-RAP, WSDM, CIM/JSIM, SAML/XACML, X.509, OGSA-EMS, ByteIO, WS-DAI, GFD-C.16, GGF-UR data model; cross-cutting concerns include naming, discovery, trust, privacy, information, VO management, and data transport; use cases & applications include distributed query processing, ASP, data centre, collaboration, multimedia, and persistent archive]

SLIDE 18

Is there Hope?

  • Diversity on the OS level
    – Virtualization is making progress (Xen, …)
  • Experience-based standardization
    – Information systems, etc.
  • Interoperation efforts start to influence standardization
  • Core services start to work on native grid interfaces
    – DBs, batch systems, storage
    – Still in an early state, but with huge potential
      • Solid, well-managed standards are needed
      • Otherwise a wrapper is the ‘best’ solution
SLIDE 19

Detailed ‘Solvable’ Problem 1

  • Easy introduction and destruction of VOs is at the core of the grid vision
  • We can ease the config work, but access to resources is still based on negotiations
    – The N*M problem (see the sketch below)
  • For VOs and resource providers a system is needed for:
    – Trading resources (resource against resource, or money)
    – Managing global priorities
    – Managing priorities between different groups inside a VO
    – And the same for quotas
    – Needed for: CPU, storage, and bandwidth
    – Has to be dynamic and leave control with the resource owners
    – For oil and frozen orange juice the problem has been solved….
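A quick illustration of the N*M problem, using roughly the scale quoted earlier in this talk (~70 VOs, ~200 sites): bilateral negotiation grows with the product of the two sides, while a common trading or brokering system that each party integrates with once grows only with their sum.

```python
vos = 70         # virtual organizations (order of magnitude from EGEE)
providers = 200  # resource-providing sites

# Every VO negotiating separately with every resource provider:
print(vos * providers)   # 14000 bilateral agreements

# A shared market / clearing house each party connects to once:
print(vos + providers)   # 270 integrations
```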

SLIDE 20

Illustration from HEP

  • The ATLAS VO has ~20 research groups (b-physics, top, Higgs, …)
    – The members of these groups have different roles (about 5)
      • User, storage admin, leading researcher…
    – There are several experiments with a similar structure
    – The association can be expressed via the VOMS proxy extensions
  • On Monday ATLAS has a standard split of:
    – 10% for b-physics
    – 20% for top
    – 60% for Higgs
    – The rest equally split…
    – The lead researcher should get top priority
  • On Tuesday rumors spread that the student Judith from the SUSY team of CMS has an indication of a signal (a signal is a ticket to Stockholm)
    – ATLAS now needs, in almost real time, to:
      • Shift 90% of their resources and top priority to student Jack of their corresponding team
  • On Friday Judith gives a presentation in which she explains that she mixed Monte Carlo data with real data
    – ATLAS now has to switch quickly back to standard mode… (a sketch of such a share table follows below)
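A minimal sketch of the share table such a prioritization system would have to manage; the percentages come from the slide, while the function and group names are hypothetical. VOMS proxy extensions already tell a site which group and role a job carries; what is missing is machinery to map them onto shares that can be shifted in near real time and rolled back.

```python
# Standard ATLAS split from the slide; 'other' is the equally split rest.
standard_split = {"b-physics": 0.10, "top": 0.20, "higgs": 0.60, "other": 0.10}

def emergency_split(base, hot_group, hot_share):
    """Give hot_group hot_share of the VO's resources and scale the
    standard split into whatever is left, so shares still sum to 1."""
    rest = 1.0 - hot_share
    shares = {group: share * rest for group, share in base.items()}
    shares[hot_group] = shares.get(hot_group, 0.0) + hot_share
    return shares

shares = emergency_split(standard_split, "susy", 0.90)  # Tuesday: shift 90%
shares = standard_split                                 # Friday: roll back
```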

SLIDE 21

The Resource Provider’s Story

  • There are a few hundred or even thousands of resource providers
  • We pick one:
    – The computing center of the physics department of College Town
  • Funding by:
    – A national grid project, the department’s budget (which is in CMS), a donation by the foundation for top physics, …
  • The center is open to all ATLAS and CMS groups
    – But, over the long term, resources have to be provided based on funding
    – This is currently solved with static configuration of fair-share schedulers (see the sketch below)
      • Because there is NO trading system or currency
  • The site can’t change the configuration on the fly
    – As at most grid sites, a fraction of an admin runs the grid aspect
  • A system that would allow management of computing currencies and provide a market to establish a price would simplify the situation
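A sketch of what the static fair-share configuration amounts to: entitlements frozen in proportion to funding. The three funding sources are the ones named on the slide; the amounts and identifiers are invented for illustration.

```python
# Invented contribution figures for the funding sources named above.
funding = {
    "national_grid_project": 50_000,
    "cms_department_budget": 30_000,
    "top_physics_foundation": 20_000,
}

total = sum(funding.values())
fair_shares = {source: amount / total for source, amount in funding.items()}
print(fair_shares)   # {'national_grid_project': 0.5, ..., 0.2}

# Without a trading system or currency, this table is baked into the
# scheduler configuration and cannot follow demand on the fly.
```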

SLIDE 22

Detailed ‘Solvable’ Problem 2

  • Access to storage
    – For large files, where latency is a minor issue, solutions are underway
      • Interfaces to MSS, FTS for reliable transport, replica catalogue
      • Latency is on the order of several seconds to minutes
  • Missing:
    – The replacement for the user’s home directory on the grid
    – Characterization:
      • Many, many files (> 10^6 per user)
      • Average size is small (~1 MB per file; totals from 1 GB to a few 100 GB)
      • In a work session the user will create several files
      • And access quite a few, O(100)
      • Access is almost random
      • Latency matters since the user works interactively with these files
        – Statistical data, plots, etc. (a back-of-envelope latency budget follows below)
    – Hint:
      • Central storage, or replicating all files to all sites, is not an acceptable solution
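The interactivity requirement can be quantified with a back-of-envelope budget. The O(100) accesses per session comes from the slide; the per-open latencies are assumptions, contrasting a local disk with the several-seconds MSS path mentioned above.

```python
# O(100) file accesses per interactive work session (from the slide).
accesses = 100

# Assumed per-open latencies: local disk vs. a grid MSS/SRM path whose
# latency the slide puts at several seconds to minutes.
for label, open_latency_s in [("local disk", 0.001), ("grid MSS path", 5.0)]:
    session_wait = accesses * open_latency_s
    print(f"{label}: {session_wait:.1f} s waiting per session")

# local disk: 0.1 s; grid MSS path: 500.0 s (over eight minutes of pure
# waiting), which rules out interactive work without a low-latency tier.
```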