SLIDE 1

Grid-3 and the Open Science Grid in the U.S.

Lothar A. T. Bauerdick, Fermilab
International Symposium on Grid Computing ISGC 2004
Academia Sinica, Taipei, Taiwan
July 27, 2004

SLIDE 2

U.S. Grids Science Drivers

 Science drivers for U.S. Physics Grid Projects:

 iVDGL, GriPhyN and PPDG (“Trillium”)

Science driver                               Data volume             Timeframe
ATLAS & CMS experiments at the CERN LHC      100s of Petabytes       2007 - ?
High Energy & Nuclear Physics experiments    ~1 Petabyte (1000 TB)   1997 - present
LIGO (gravity wave search)                   100s of Terabytes       2002 - present
Sloan Digital Sky Survey                     10s of Terabytes        2001 - present

[Chart: data growth and community growth, 2001-2009]

Future Grid resources: massive CPU (PetaOps), large distributed datasets (>100 PB), global communities (1000s)

SLIDE 3

Globally Distributed Science Teams

 Sharing and federating vast Grid resources

SLIDE 4

Gravitational Wave Observatory

 Grid-enabled GW Pulsar Search using the Pegasus system

 Goal: implement a production-level blind galactic-plane search for Gravitational Wave pulsar signals
 Run 30 days on ~5-10x more resources than LIGO has, using the grid (e.g., 10,000 CPUs for 1 month); millions of individual jobs

 Planning by GriPhyN Chimera/Pegasus
 Execution by Condor DAGMan (a sketch of such a DAG follows below)
 File cataloging by Globus RLS
 Metadata by Globus MCS

 Achieved: access to ~6000 CPUs for 1 week
 ~5% utilization due to bottlenecks
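To make the workflow layer concrete, here is a minimal sketch, not the actual LIGO/Pegasus code, of the kind of Condor DAGMan input a planner ultimately produces: many independent search jobs followed by a merge step. The submit-file names, the band parameter, and the node count are hypothetical.

#!/usr/bin/env python
# Illustrative stand-in for the Chimera/Pegasus planning step: emit a Condor
# DAGMan description for a split/search/merge pulsar-search workflow.
# File names, submit files, and the band count are made-up placeholders.

NBANDS = 4  # number of frequency bands searched in parallel (hypothetical)

def write_dag(path="pulsar_search.dag"):
    lines = []
    # One search node per frequency band, each using the same submit file.
    for i in range(NBANDS):
        lines.append("JOB search%d search.sub" % i)
        lines.append('VARS search%d band="%d"' % (i, i))
    lines.append("JOB merge merge.sub")
    # DAGMan runs the merge node only after every search node has finished.
    for i in range(NBANDS):
        lines.append("PARENT search%d CHILD merge" % i)
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_dag()
    # The resulting DAG would then be handed to DAGMan, e.g.:
    #   condor_submit_dag pulsar_search.dag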

SLIDE 5

Sloan Digital Sky Survey

 Galaxy Cluster Finding: redshift analysis, weak lensing effects

 Using the GriPhyN Chimera and Pegasus tools
 Coarse-grained DAG works fine (batch system)
 Fine-grained DAG has scaling issues (virtual data system)

SLIDE 6

Large Hadron Collider

 Energy-frontier, high-luminosity proton-proton collider at CERN
 An order-of-magnitude step in energy and luminosity for particle physics

[Figure: constituent center-of-mass energy (GeV) versus year of first physics for e+e- and hadron colliders (SPEAR, PETRA, LEP 1 and 2, HERA, SppS, Tevatron, LHC), annotated with discoveries such as the J/psi (1974), the gluon (1979), the W and Z (1983), three families (1989), and the top quark (1995).]

SLIDE 7

Emerging LHC Production Grids

 LHC first to put “real”, multi-organizational, global Grids to work

 large resources become available to experiments “opportunistically”

SLIDE 8

Grid2003 project

 In 2003, U.S. science projects and Grid projects came together to build a multi-organizational infrastructure: Grid3

[Diagram: contributing efforts (the virtual data grid laboratory, virtual data research, end-to-end HENP applications, and US LHC project testbeds and data challenges) with participants including RHIC, Tevatron, BaBar, BTeV, CMS, U. Buffalo, Korea, and the VDT.]

SLIDE 9

Grid3: Initial multi-organizational Grid infrastructure

 Common Grid operating as a coherent, loosely-coupled infrastructure
 Applications running on Grid3 (Trillium, U.S. LHC) and benefiting: LHC (3), SDSS (2), LIGO (1), Biology (2), Computer Science (3)

25 universities, 4 national labs, 2800 CPUs

[Site status snapshot: July 26, 2004, 11:35 pm CDT]

SLIDE 10

Resource Sharing Works

 Example: U.S. CMS Data Challenge simulation production
 Running on Grid3 since Nov 2003
 At least 40% of the resources used in the first quarter of 2004 were non-CMS resources

SLIDE 11

Important Role of Tier2 Centers

 Tier2 facilities are logically grouped around their Tier1 regional center

 20 - 40% of a Tier1?
 “1-2 FTE support”: commodity CPU & disk, no hierarchical storage
 Essential university role in the extended computing infrastructure
 Validated by 3 years of experience with proto-Tier2 sites

 Specific functions for science collaborations
 Physics analysis
 Simulation
 Experiment software
 Support for smaller institutions

 Official role in the Grid hierarchy (U.S.)
 Sanctioned by MOU (ATLAS, CMS, LIGO)
 Local P.I. with reporting responsibilities
 Selection by the collaboration via a careful process

SLIDE 12

Grid3 infrastructure built upon the Virtual Data Toolkit

 Grid environment built from core Globus and Condor middleware, as delivered through the Virtual Data Toolkit (VDT)
 GRAM, GridFTP, MDS, RLS, VDS, VOMS, … (a basic GridFTP transfer sketch follows below)
 VDT sponsored through GriPhyN and iVDGL, with contributions from LCG

 …equipped with VO and multi-VO security, monitoring, and operations services

 …allowing federation with other Grids where possible, e.g. the CERN LHC Computing Grid (LCG)
 U.S. ATLAS: GriPhyN Virtual Data System execution on LCG sites
 U.S. CMS: storage element interoperability (SRM/dCache)
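As a concrete illustration of the data-movement layer, the following is a minimal sketch of a site-to-site file copy using the globus-url-copy client shipped with the VDT; the hostnames and paths are hypothetical, and a valid grid proxy is assumed to exist already.

#!/usr/bin/env python
# Minimal sketch of moving a file between two Grid3 storage elements with the
# Globus globus-url-copy client. Hostnames and paths are placeholders.

import subprocess

SRC = "gsiftp://se.site-a.example.edu/grid3/data/sample.root"
DST = "gsiftp://se.site-b.example.edu/grid3/data/sample.root"

# With two gsiftp URLs this is a third-party transfer between the two
# GridFTP servers; the client only brokers the connection.
subprocess.check_call(["globus-url-copy", SRC, DST])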

SLIDE 13

Grid3 Principles

 Simple approach:

 Sites consisting of
 Computing element (CE)
 Storage element (SE)
 Information and monitoring services

 VO level, and multi-VO
 VO information services
 Operations (iGOC)

 Minimal use of grid-wide systems
 No centralized resource broker, replica/metadata catalogs, or command-line interface; these are to be provided by the individual VOs (a simple VO-side selection sketch follows this list)

 Application driven
 Adapt the application to work with Grid3 services
 Prove the application on VO testbeds
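Because there is no central resource broker, each VO does its own brokering. A minimal sketch, with made-up gatekeeper names, of what a VO-level job-to-site assignment could look like; this is an illustration of the principle, not an actual Grid3 VO tool.

#!/usr/bin/env python
# Illustrative only: a VO keeps its own list of site gatekeepers and assigns
# jobs to them round-robin. The gatekeeper contact strings are hypothetical.

import itertools

SITE_GATEKEEPERS = [
    "gatekeeper.site-a.example.edu/jobmanager-condor",
    "gatekeeper.site-b.example.edu/jobmanager-pbs",
    "gatekeeper.site-c.example.edu/jobmanager-lsf",
]

def assign_sites(jobs):
    """Pair each job with a gatekeeper, cycling through the site list."""
    return list(zip(jobs, itertools.cycle(SITE_GATEKEEPERS)))

if __name__ == "__main__":
    for job, site in assign_sites(["job%03d" % i for i in range(5)]):
        print(job, "->", site)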

SLIDE 14

A “loosely coupled” set of services

 The Grid3 environment consists of a “loosely coupled” set of services

 Processing Service
 Globus GRAM bridge from Condor-G for central submission (a submission sketch follows this list)
 Four separate queueing systems are supported

 Data Transfer Services
 GridFTP interfaces on all sites through gateway systems
 Files are transferred into processing sites
 Results are transferred directly into the MSS GridFTP door
 CMS has moved to SRM-based storage element functionality

 VO Management Services
 A central service is needed for authentication: VOMS

 Monitoring Services
 System- and application-level monitoring allows status verification and diagnosis

 Software Distribution Services
 Lightweight, based on Pacman

 Information Services
 To help applications and monitoring, based on MDS
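For the processing service above, this is a minimal sketch, not Grid3 production code, of central submission through Condor-G: a submit description targeting a remote Globus GRAM gatekeeper is handed to condor_submit. The gatekeeper host and job script are hypothetical placeholders.

#!/usr/bin/env python
# Sketch of Condor-G submission: write a submit description for the
# (era-appropriate) globus universe and queue it with condor_submit.
# Condor-G then forwards the job to the remote site's GRAM gatekeeper,
# which maps it into the local batch system.

import subprocess

SUBMIT_FILE = "grid3_job.sub"

submit_text = """\
universe        = globus
globusscheduler = gatekeeper.example.edu/jobmanager-condor
executable      = run_simulation.sh
transfer_executable = true
output          = job.out
error           = job.err
log             = job.log
queue
"""

with open(SUBMIT_FILE, "w") as f:
    f.write(submit_text)

subprocess.check_call(["condor_submit", SUBMIT_FILE])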

SLIDE 15

Site Services and Installation

 Goal is to install and configure with minimal human intervention
 Use the Pacman tool and distributed software “caches”
 Registers the site with VO- and Grid3-level services
 Accounts, application install areas & working directories

 A Grid3 site (compute element plus storage) is installed with
   % pacman -get iVDGL:Grid3
 which pulls in the VDT, the VO service, GIIS registration, information providers, the Grid3 schema, and log management, and sets up the $app and $tmp areas

 4 hours to install and validate

SLIDE 16

VO-centric model

 “What are the services needed to enable application VOs?”
 “What do providers need in order to provide resources to their VOs?”

 Lightweight-ness at the cost of centrally provided functionality; examples of this approach:
 Flexible VO security infrastructure
 DOEGrids Certificate Authority
 PPDG and iVDGL Registration Authorities, with VO or site sponsorship
 Automated multi-VO authorization, using the EDG-developed VOMS
 Each VO manages a service and its members
 Each Grid3 site is able to generate and locally adjust its gridmap file with an authenticated query to each VO service (a sketch follows this slide)
 VOs negotiate policies & priorities with providers directly
 VOs can run their own storage services
 U.S. CMS sites run SRM/dCache storage services on the Tier-1 and Tier-2s

[Diagram: VOMS servers for the iVDGL, US CMS, US ATLAS, LSC, SDSS, and BTeV VOs feed the gridmap files at the Grid3 sites.]
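The per-site gridmap generation can be pictured with the following sketch. Real Grid3 sites used tools such as edg-mkgridmap against the VOMS servers; here the per-VO membership lists are simply assumed to have been fetched already, and all VO names, DNs, and account names are hypothetical.

#!/usr/bin/env python
# Illustrative sketch of building a grid-mapfile from per-VO membership lists.

# Mapping of VO name -> local account its members run under (placeholders).
VO_ACCOUNTS = {
    "uscms":   "uscms01",
    "usatlas": "usatlas1",
    "ivdgl":   "ivdgl",
}

# Stand-in for the authenticated query to each VO service: a list of
# certificate subject DNs per VO (placeholders).
VO_MEMBERS = {
    "uscms":   ["/DC=org/DC=doegrids/OU=People/CN=Alice Example 12345"],
    "usatlas": ["/DC=org/DC=doegrids/OU=People/CN=Bob Example 67890"],
    "ivdgl":   [],
}

def write_gridmap(path):
    with open(path, "w") as f:
        for vo, dns in sorted(VO_MEMBERS.items()):
            account = VO_ACCOUNTS[vo]
            for dn in dns:
                # grid-mapfile format: quoted subject DN, then local account.
                f.write('"%s" %s\n' % (dn, account))

if __name__ == "__main__":
    write_gridmap("grid-mapfile")  # write locally for inspection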

SLIDE 17

Information Provider Services

[Diagram: hierarchical information services. Site GRIS servers (at ANL, BNL, Boston U, U. Chicago, Caltech, FNAL, Rice U, UFL, …) register into per-VO index services (6 VO GIISes, e.g. the ATLAS and USCMS GIIS), which in turn register into the top-level Grid3 Index Service; published attributes include the Grid3 location and the data, application, and temporary directories. A hedged query sketch follows.]
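Since MDS publishes these attributes over LDAP, any LDAP client can read them. A minimal sketch, assuming the python-ldap module is available and using placeholder values for the GIIS host, port, base DN, and filter; real Grid3 clients typically used the Globus MDS tools instead.

#!/usr/bin/env python
# Illustrative LDAP query against a GIIS for GLUE compute-element attributes.

import ldap

GIIS_URL = "ldap://giis.example.edu:2135"   # hypothetical GIIS contact
BASE_DN  = "mds-vo-name=local,o=grid"       # placeholder base DN

conn = ldap.initialize(GIIS_URL)
# Ask for compute-element entries and print a couple of GLUE attributes.
results = conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE, "(objectClass=GlueCE)",
                        ["GlueCEUniqueID", "GlueCEInfoTotalCPUs"])
for dn, attrs in results:
    print(dn, attrs)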

SLIDE 18

Monitoring Services Framework

[Diagram: monitoring services framework. Producers (Ganglia, MonALISA, the site GRIS, the site catalog, the ACDC job database, job-manager log files, system configuration, and OS-level data such as /proc) feed intermediaries (VO GIIS, MonALISA repository), which serve consumers such as the MonALISA client, MDViewer, web reports, and other information-service client tools.]

SLIDE 19

iVDGL Operations Center (iGOC)

 Co-located with the Abilene NOC, hosted by Indiana University
 Hosts/manages multi-VO services:
 Top-level Ganglia collectors
 MonALISA web server and archival service
 Top-level GIIS
 VOMS servers for iVDGL, BTeV, SDSS
 Site Catalog service
 iVDGL Pacman caches

 Trouble ticket systems
 Phone (24 hr), web, and email based collection and reporting system
 Investigation and resolution of grid middleware problems at the level of 16-20 contacts per week

 Weekly operations meetings for troubleshooting

SLIDE 20

Science Applications

 Rule: install applications through grid jobs; encourage self-contained environments

 11 applications
 LHC (ATLAS, CMS)
 Astrophysics (LIGO/SDSS)
 Biochemical
 Molecular X-ray diffraction
 GADU/Gnare: compares protein sequences
 Computer Science
 Adaptive data placement and scheduling algorithms
 Grid Exerciser

 Over 100 users authorized to run on Grid3
 Managed to add new sites and new applications
 U. Buffalo (Biology), Nov. 2003
 Rice U. (CMS Heavy Ions), Feb. 2004

SLIDE 21

Grid3 Metrics

 Grid2003 project milestone metrics, demonstrated at SC2003 in Nov 2003:

Metric                                    Target      Achieved
Number of CPUs                            400         2762 (27 sites)
Number of users                           > 10        102 (16)
Number of applications                    > 4         10 (+CS)
Number of sites running concurrent apps   > 10        17
Data transfer per day                     > 2-3 TB    4.4 TB (11.12.03)
Peak number of concurrent jobs            1000        1100

SLIDE 22

Grid3 Metrics

 ~40,000 jobs/month sustained running

SLIDE 23

Grid3 and LCG

 Strategy of interoperability and joint projects with the LCG

 Particular to Grid2003:
 Common Virtual Data Toolkit (VDT) delivery and support team
 Data Movement and Storage Management: U.S. CMS demonstration between Grid2003 and CERN
 Job Execution: U.S. ATLAS Grid3 application submission to LCG sites using the Chimera Virtual Data System (VDS)
 Merge of information attributes (GLUE Schema extensions) from Grid3 and LCG

 Other collaborative efforts:
 Virtual Organization Management Project (VOX) collaboration with the European Data Grid and the LCG Security Working Group
 Contributions to and from the wider CMS and ATLAS software and computing deliverables
 Presentations at and discussions with LCG committees
 Participation in the High Energy Physics Joint Technical Board and the Global Grid Forum Particle and Nuclear Physics Applications Research Group

SLIDE 24

Grid3 Lessons Learned

 Grid3 was an evolutionary change in environment
 The infrastructure was built on established testbed efforts from the participating experiments

 However, it was a revolutionary change in scale
 The available computing and storage resources increased by a factor of 4-5
 The human interactions in the project increased

 Architecture and operations lessons -> “Lessons learned” document
 e.g. scalability issues with headnodes, certificate infrastructure, etc.

 Grid3 in continuous use since 10/03, now running ATLAS DC2
 Undergoes adiabatic upgrades while operating (Grid3+)
 Next step: OSG-0!

SLIDE 25

Roadmap towards Open Science Grid

 Iteratively build and extend Grid3 into a national infrastructure of shared resources, benefiting a broad set of scientific applications
 ➡ Open Science Grid
 Build the OSG by contributing our LHC resources into a coherent infrastructure to provide the initial federation
 The US LHC Tier-1 and Tier-2 centers are a significantly sized infrastructure!

 Laboratories, universities, campus Grids etc. participate in OSG, opening their resources to the common infrastructure
 Examples at UW Madison, U. Florida, Fermilab, Purdue, and many others: significantly sized shared “Grid Farms” and storage facilities

 Goal: develop the OSG such that it will attract and support partnerships with and contributions from other sciences
 Build it generic enough, end-to-end, to benefit others

SLIDE 26

Open Science Grid Consortium

 Joining our U.S. forces into the Open Science Grid consortium: an inclusive collaboration between application scientists, technology providers, resource owners, ...
 Realizing the magnitude of the task and the need for ongoing support and effort

 Provide a lightweight framework for joint activities...
 Coordinating activities that cannot be done within one project
 Achieve technical coherence and a strategic vision (roadmap)
 Enable communication and provide liaising

 ... and for reaching common decisions when necessary
 Technical Groups propose, organize, oversee activities, peer, and liaise
 Now Security and Storage Services; soon Support Centers, Policies

 Activities are well-defined, scoped sets of contributing tasks provided by the participants joining the activity

SLIDE 27

Stakeholders and Activities

[Diagram: OSG process framework. A Consortium Board (1) and a number of small Technical Groups (0…n) relate the stakeholders (research Grid projects, VOs and their researchers, sites, service providers, campuses, and labs) to a number of large activities (0…N) and small joint committees (0…N). Participants provide resources, management, and project steering groups.]

SLIDE 28

Applications, Infrastructure, Facilities

[Diagram: the OSG as three layers. Science Communities (applications and analysis) sit on top of the Production Grid Infrastructure (grid fabric, connectivity and middleware; system monitoring, information, software repositories, support centers), which sits on top of the Facilities (laboratory and university resources: storage, processing, databases, ...).]


SLIDE 30

Grid Life Cycle

 Sustained, persistent production Grid infrastructure
 Continuous need for a Grid Lab for development and integration
 Instrumental for developing Grid3 and the OSG…: the U.S. CMS/ATLAS testbeds!
 Important role for iVDGL to deliver and maintain this lab environment for the U.S. Grids!

SLIDE 31

A National Shared Federated Infrastructure for Science…

 Develop OSG-0 by increasing functionality and scale
 Services for persistent and temporary storage, data transfer services
 Policy, accounting, authorization, security, configuration infrastructure

 OSG-0 milestone planned for Feb 2005

SLIDE 32

… As Part of a Global Federation of Grids

[Diagram: the LHC computing hierarchy as part of a global federation of Grids: CERN at the centre; Tier-1 centres (Fermilab and Brookhaven in the USA, plus UK, France, Italy, NL, Germany, …); Tier-2 centres; laboratory and university sites; down to physics department and desktop resources. The Open Science Grid federates the U.S. Tier-1, Tier-2, laboratory, and university sites.]

 Open Science Grid: Universities, Labs, LHC Tier-1, Tier-2, ...

SLIDE 33

Partnerships for Open Science Grid Shared Infrastructure

 Complex matrix of partnerships — opportunities!

 NSF and DOE, universities and labs, physics and computing departments
 Application sciences, computer sciences, information technologies
 Science and education
 U.S., Europe, Asia, North and South America
 U.S. LHC and the Trillium Grid Projects
 Middleware and infrastructure projects
 CERN and the U.S. HEP labs, LCG and the experiments, EGEE and OSG

 ~85 participants on the OSG stakeholder mail list
 ~50 authors on the OSG CHEP abstract

 Joint Steering Meetings (Trillium, U.S. LHC) to define the OSG program
 OSG Blueprint meetings July 12-15 and Sept 7-8
 OSG workshop hosted by U.S. ATLAS in Boston, Sept 9-10

SLIDE 34

Conclusions

 The success of Grid3 has demonstrated that
 Grids bring real value to science experiments
 The U.S. can use its Tier-1 & Tier-2 facilities in a common Grid environment
 We can operate and use a multi-organization Grid with distributed ownership and authority as a coherent system
 A Grid with >20 sites can run robustly and performantly for simple applications
 Feasibility of the strategy to federate & share resources ➡ the Open Science Grid roadmap

 We are making concrete workplans to
 Operate the current infrastructure for the LHC data challenges
 Evolve to increased capability and performance for the start of 2005
 Start longer-term engineering of services and capabilities
 Continue interoperability and joint projects with the LCG

SLIDE 35

Acknowledgments

 Acknowledge the excellent work of the Grid3 teams:

 Participation was at the level of 58 people: 8 worked full time, 10 worked half time, and 20 site administrators worked quarter time. Total effort ~7 FTE-years (17 FTEs for 5 months).

 Special thanks for help on this talk with slides, graphics, ideas etc. to
 Ruth Pordes/Fermilab, Rob Gardner/U. Chicago, and the Grid3 Taskforce leads
 Ian Fisk/Fermilab, J. Rodrigues/U. Florida, Leigh Grundhoefer/U. Indiana

Rob Gardner, Mike Wilde, Ian Fisk, Scott Koranda, Rich Baker, Jim Annis, Peter Couvares, Jorge Rodriguez, Leigh Grundhoeffer, Nickolai Kuropatine, Xin Zhao, John Hicks, Ed May, Alain Roy, Brian Moe, Fred Luerhing, Iowna Sakrejda, Yuri Smirnov, Marco Mambelli, Anzar Afaq, Suresh Singh, Carey Kireyev, Alain DeSmet, Jerry Gieraltowski, Doug Olson, Brian Tierney, Saul Youssef, Anne Heavey, Terrence Martin, Andrew Zahn, Scott Gose, Vijay Sekhri, Dantong Yu, Lawrence Sorrillo, Yong Xia, Rob Quick, Michael Ernst, Greg Graham, Bobby Brown, Bockjoo Kim, Jens Voekler, Ruth Pordes, Matt Allen, Yujun Wu, Lisa Giacchetti, Joe Kaiser, Erik Paulson, George Fekete, Dan Engh, Kihyeon Cho, James Letts, Tim Thomas, John Weigand, Iosif Legrand, Mark Green, Craig Prescott, Nosa Olomu, Ben Clifford, Dan Bradley, Timur Perelmutov, Patrick McGuigan, Shawn McKee, Guarang Mehta