Evolution of the LHC Computing Models
Ian Fisk, CD/FNAL
May 22, 2014
About Me
- I am a scientist at Fermilab
- I have spent the last 14 years working on LHC computing problems
- I helped build the first Tier-2 prototype computing center in the US
- I was responsible for integration and commissioning of the CMS computing system from 2006 to 2010
- I was the computing coordinator of CMS for LHC Run 1
Final Steps
- Software and computing are the final steps in a long series needed to realize the physics potential of the experiment
- As the environment has become more complex and demanding, computing and software have had to become faster and more capable
- The storage and computing systems must:
  – Serve the data
  – Reconstruct the physics objects
  – Analyze the events
To get this, you need this
Distributed Computing
- Computing models are based roughly on the MONARC model
  – Developed more than a decade ago
  – Foresaw tiered computing facilities to meet the needs of the LHC experiments
- Assumes poor networking
- Hierarchy of functionality and capability
[Figure 4-1: Computing for an LHC experiment based on a hierarchy of computing centers, model circa 2005. CERN/CMS (350k SI95, 350 TB disk, robot) connects at 622 Mbit/s to Tier-1s such as FNAL/BNL (70k SI95, 70 TB disk, robot), which connect to Tier-2 centers (20k SI95, 20 TB disk, robot) and on to Tier-3 university working groups. Capacities for CPU and disk are representative and are provided to give an approximate scale.]
Distributed Computing at the Beginning

- Before the LHC, most of the computing capacity was located at the experiment
  – Most experiments evolved and added distributed computing later
- The LHC began with a global distributed computing system (OSG, LCG, NDG)
Grid Services
- During the evolution the low-level services have stayed largely the same
- Most of the changes come from the actions and expectations of the experiments

[Diagram: lower-level services (CE, SE, Information System) provide consistent interfaces to facilities; higher-level services include FTS, BDII, WMS, and VOMS; experiment services sit above the site. The CE provides the connection to batch (Globus- and CREAM-based), the SE the connection to storage (SRM or xrootd).]
Successes
- When the WLCG started there was a lot of concern about the viability of the Tier-2 program
  – A university-based grid of often small sites
- The total system now uses close to half a million processor cores continuously

[Pie charts: shares of capacity among Tier-0, Tier-1, and Tier-2 of 44%, 38%, and 18% in 2009 and 47%, 33%, and 20% in 2013; total capacity grows by a factor of 2.5]
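The headline growth number implies a steady compound rate. A quick back-of-the-envelope check (my arithmetic, not a figure from the talk):

```python
# Implied annual growth rate for a 2.5x capacity increase over the
# four years from 2009 to 2013.
rate = 2.5 ** (1 / 4) - 1
print(round(rate * 100, 1))  # ~25.7% per year
```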
Moving Forward
– The strict hierarchy of connections becomes more of a mesh
– Divisions in functionality, especially for chaotic activities like analysis, become more blurred
– More access over the wide area

[Diagram: Tier-0 with CAF (prompt reconstruction, storage, commissioning), Tier-1s (re-reconstruction, simulation, archiving, data serving), Tier-2s (simulation and user analysis)]

- Model changes have been an evolution
- Not all experiments have emphasized the same things
- Each is pushing farther in particular directions
Evolution
- We have had evolution all through the history of the project
  – Slow changes and improvements
- Some examples:
  – Use of Tier-2s for analysis in LHCb
  – Full mesh transfers in ATLAS and CMS
  – Data federation in ALICE
  – Better use of the network by all the experiments
- But many things are surprisingly stable:
  – Architectures of hardware (x86 with ever-increasing cores)
  – Services, both in terms of architectures and interfaces
Looking back
- In June of 2010 we had a workshop on Data Access and Management in Amsterdam
- Areas we worried about at the time:
  – Making the system less deterministic and more flexible
  – Providing better access to the data for analysis
  – Being more efficient
- Some things we were not worrying about:
  – New architectures for hardware
  – Clouds
  – Opportunistic computing
Progress: Networking

- One of the areas of progress has been better use of wide area networking to move data and to make efficient use of the distributed computing
  – Limited dedicated networking
  – Much shared-use R&E networking
- LHCOPN: a dedicated resource for Tier-0 to Tier-1 and Tier-1 to Tier-1 transfers
- LHCONE: a new initiative for Tier-2 networking
Mesh Transfers
- Change from hierarchical transfers, in which data flowed through the Tier-1s, to a full mesh in which any site can transfer to any other

[Diagram: transfers east and west among Tier-1s and Tier-2s at rates of 150 to 300 MB/s]
Completing the Mesh
- Tier-2 to Tier-2 transfers are now similar to Tier-1 to Tier-2 transfers in CMS
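One way to see why completing the mesh matters is to count transfer links. This small calculation uses hypothetical round site counts, not the actual WLCG topology:

```python
# Compare transfer-link counts: strict hierarchy vs. full mesh.
# Site counts below are illustrative, not actual WLCG numbers.
n_tier1 = 10
n_tier2 = 50

# Hierarchy: each Tier-2 linked to one Tier-1, each Tier-1 to the Tier-0.
hierarchy_links = n_tier2 + n_tier1

# Full mesh: every pair of sites (Tier-1s and Tier-2s) may transfer directly.
n_sites = n_tier1 + n_tier2
mesh_links = n_sites * (n_sites - 1) // 2

print(hierarchy_links, mesh_links)  # 60 vs. 1770
```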
Overlay Batch
- One of the challenges of the grid is that, despite a consistent set of protocols, actually getting access to resources takes a lot of workflow development
- Pilot jobs are centrally submitted and start on worker nodes, reporting back that they are available
- This builds up an enormous overlay batch queue
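A minimal sketch of the pilot pattern described above. The queue, payloads, and pilot function are stand-ins invented for illustration; real pilot frameworks (glideinWMS, PanDA, DIRAC) are far more elaborate:

```python
import queue
import subprocess

# Central overlay batch queue of experiment work (a stand-in for the
# real pilot frameworks such as glideinWMS, PanDA, or DIRAC).
work_queue = queue.Queue()
for cmd in (["echo", "reco"], ["echo", "simulation"], ["echo", "analysis"]):
    work_queue.put(cmd)

def pilot():
    """A pilot lands on a worker node, reports back that the slot is
    available, then pulls and runs real payloads until none remain."""
    completed = []
    while True:
        try:
            payload = work_queue.get_nowait()  # "report back" and fetch work
        except queue.Empty:
            break  # no more work: the pilot exits and frees the slot
        result = subprocess.run(payload, capture_output=True, text=True)
        completed.append(result.stdout.strip())
    return completed

print(pilot())  # runs all three queued payloads
```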
So what changes next?
- The LHC is currently in a two-year shutdown to improve the machine
- The energy will increase to ~13 TeV and the luminosity will grow by a factor of ~2
- Both CMS and ATLAS aim to collect data at about 1 kHz
- Events are more complex and take longer to reconstruct
- All experiments need to continue to improve efficiency
Resource Provisioning
- The switch to pilot submissions opens other improvements in resource provisioning
- Instead of submitting pilots through CEs:
  – We can submit pilots through local batch systems
  – We can submit requests to cloud provisioning systems that start VMs with pilots
- Currently both ATLAS and CMS provision their online trigger farms through an OpenStack cloud
- The CERN Tier-0 will also be provisioned this way
- Before the start of Run 2, ~20% of the resources could be allocated with cloud interfaces
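The idea can be sketched as a provisioner that asks a cloud interface for VMs whose boot script starts a pilot. The `CloudAPI` class and the boot-script path here are hypothetical stand-ins for a real interface such as the OpenStack compute API:

```python
# Sketch of cloud-based pilot provisioning. CloudAPI is a hypothetical
# stand-in for a real cloud interface (e.g. OpenStack Nova); only the
# overall shape matches what the slides describe.

PILOT_BOOT_SCRIPT = "#!/bin/sh\nexec /opt/pilot/start-pilot.sh\n"  # hypothetical path

class CloudAPI:
    """Toy cloud that records VM requests instead of booting machines."""
    def __init__(self):
        self.vms = []

    def boot(self, image, user_data, count):
        for i in range(count):
            self.vms.append({"image": image, "user_data": user_data, "id": i})
        return count

def provision_pilots(cloud, demand_slots, slots_per_vm=8):
    """Request enough VMs (each contextualized to start a pilot on boot)
    to cover the current demand from the overlay batch system."""
    n_vms = -(-demand_slots // slots_per_vm)  # ceiling division
    return cloud.boot("pilot-image", PILOT_BOOT_SCRIPT, n_vms)

cloud = CloudAPI()
print(provision_pilots(cloud, demand_slots=100))  # 13 VMs cover 100 slots
```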
Evolving the Infrastructure
- In the new resource provisioning model the pilot
infrastructure communicates with the resource provisioning tools directly
– Requesting groups of machines for periods of time
[Diagram: the resource provisioning layer receives resource requests and submits pilots either through a cloud interface, which boots VMs with pilots, or through a CE and batch queue, which starts worker nodes with pilots]
Local Environment
- Once you arrive on a worker node, you need something to run
- Environment distribution has come a long way
- LHC experiments use the same read-only environment, centrally distributed to nearly half a million processor cores via CVMFS

[Diagram: a central CVMFS repository served through a hierarchy of Squid caches down to a local FUSE client on each worker node]
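The caching idea behind CVMFS can be sketched as a chain of caches consulted in order, each filling itself from the level above. File names and contents below are invented for illustration; real CVMFS is a content-addressed read-only filesystem mounted via FUSE:

```python
# Toy model of CVMFS's layered caching: the worker-node cache asks the
# site Squid, which asks the central repository. Paths and contents are
# invented for illustration.

central_repo = {"/cvmfs/experiment.example/bin/reco": b"reco-binary-v42"}

class Cache:
    def __init__(self, upstream):
        self.store = {}
        self.upstream = upstream  # next level: another Cache or the repo dict

    def get(self, path):
        if path not in self.store:  # cache miss: fetch from the level above
            if isinstance(self.upstream, Cache):
                self.store[path] = self.upstream.get(path)
            else:
                self.store[path] = self.upstream[path]
        return self.store[path]

site_squid = Cache(central_repo)     # site-level Squid proxy
worker_cache = Cache(site_squid)     # local cache on the worker node

data = worker_cache.get("/cvmfs/experiment.example/bin/reco")
print(data)  # fetched once through both cache levels; later reads are local
```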
High Performance Computing
- As modern worker nodes get more and more cores per box, these systems look like HPC
- All LHC experiments are working on multi-processing and/or multi-threaded versions of their code
- We are transitioning how we schedule pilots: a single pilot comes in and takes over an entire box or group of cores
- The overlay batch then schedules the appropriate mix of work to use all the cores, and tightly coupled applications can run too
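The whole-node scheduling idea can be sketched as packing a mix of single-core and multi-core payloads onto the cores a pilot has claimed. The job mix and widths below are invented; real schedulers also weigh memory, priority, and runtime:

```python
# Sketch of an overlay batch scheduler filling a whole-node pilot slot
# with a mix of single-core and multi-core payloads.

def schedule(cores_free, jobs):
    """Greedily place jobs (name, cores_needed) into the free cores,
    returning what was placed, the idle cores, and what remains queued."""
    placed, remaining = [], []
    for name, width in jobs:
        if width <= cores_free:
            placed.append(name)
            cores_free -= width
        else:
            remaining.append((name, width))
    return placed, cores_free, remaining

jobs = [("reco-mt", 8), ("sim", 4), ("analysis-1", 1), ("analysis-2", 1),
        ("merge", 4)]
placed, idle, left = schedule(cores_free=16, jobs=jobs)
print(placed, idle, left)  # merge (4 cores) must wait; 2 cores stay idle
```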
Wide Area Access
- All experiments are looking at sending data directly to the worker node, even over long distances
  – Sending data directly to applications over the WAN
- It is not immediately obvious that this increases wide area network transfers
Data Moved
- There is a small hit in efficiency
- A lot of work goes into predictive read-ahead and caching
- Currently we see about 400 MB/s read over the wide area, from thousands of active transfers
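Predictive read-ahead amortizes WAN latency by fetching a large block on each miss, so that subsequent sequential reads are served from the local buffer. Block and read sizes here are invented for illustration:

```python
# Toy model of read-ahead over a high-latency WAN link: each remote
# request fetches a large aligned block, so sequential small reads
# mostly hit the local buffer.

READAHEAD = 4 * 1024 * 1024  # fetch 4 MiB per remote round trip

class RemoteFile:
    def __init__(self, size):
        self.size = size
        self.remote_reads = 0        # count of WAN round trips
        self.buffer = (None, b"")    # (block start offset, block data)

    def read(self, offset, length):
        start, data = self.buffer
        if start is None or not (start <= offset and
                                 offset + length <= start + len(data)):
            # miss: fetch a whole aligned read-ahead block over the WAN
            block_start = (offset // READAHEAD) * READAHEAD
            block_len = min(READAHEAD, self.size - block_start)
            self.buffer = (block_start, b"\0" * block_len)
            self.remote_reads += 1
            start, data = self.buffer
        return data[offset - start: offset - start + length]

f = RemoteFile(size=64 * 1024 * 1024)
for off in range(0, 8 * 1024 * 1024, 64 * 1024):  # 128 sequential 64 KiB reads
    f.read(off, 64 * 1024)
print(f.remote_reads)  # 2 WAN round trips instead of 128
```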
Network Improvements
- While CPU (25%/year) and disk (20%/year) performance improvements at fixed cost have both slowed, the network is still improving at above 30% per year
- The cost of 100 Gb/s optics is falling
- For CMS, we expect 30% of our Tier-2 resources at universities to be connected at 100 Gb/s within a year
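Compounded over a few years, those rates diverge quickly. A quick check of the quoted figures over a hypothetical five-year span (my arithmetic, not a number from the talk):

```python
# Compound the per-year improvement rates quoted above over five years.
years = 5
cpu  = 1.25 ** years   # CPU at 25%/year
disk = 1.20 ** years   # disk at 20%/year
net  = 1.30 ** years   # network at 30%/year
print(round(cpu, 2), round(disk, 2), round(net, 2))  # 3.05 2.49 3.71
```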
Changes at CERN
- CERN recently deployed half of its computing in Budapest, with 2 x 100 Gb/s links connecting the two facilities
- Geneva is expensive for people, power, and space
- All the disks are at CERN and half the worker nodes are in Hungary
- We see a 5% drop in analysis efficiency
[Figure: evolution of the Tier-0. Analysis-job CPU efficiency from 2013-10 through 2014-03 for Meyrin SLC6 virtual and Wigner SLC6 virtual nodes]
A more flexible system
- Slowly we are seeing a breakdown of the boundaries of the sites and of the hierarchy of responsibilities and functionality
- Sites are not restricted to a specific set of functions
- Even the concept of boundaries between sites is fading
Legacy of MONARC (Jamboree 16/06/10)
- A lot of the structure and hierarchy of the MONARC computing models remains for several of the LHC experiments
- Transparency of data placement and access has been replaced with a reasonably structured environment
- The MONARC report just turned 10; there have been improvements in computing services and it may be time to revisit it
[Diagram: the CMS tier hierarchy with its MONARC-era functional assignments (Tier-0: prompt reconstruction, storage, commissioning, CAF; Tier-1: re-reconstruction, simulation, archiving, data serving; Tier-2: simulation and user analysis), with Tier-1 ≠ Tier-2]
Reducing Boundaries
- One of the first changes we are seeing is a flattening of the tiered structure of LHC computing
  – The functional differences between the layers are being reduced, and we want to use the system as a distributed system, not a collection of sites
- One concrete action is to separate the archival functionality from the other site functions
  – This can be done with separate instances, as CERN has done with Castor and EOS
  – Or with service classes, as ATLAS has done
Changes how we think of tiers
- Once you introduce the concept of an archival service that is decoupled from the Tier-1:
  – The functional difference between Tier-1 and Tier-2 is based more on availability and support than on the size of services
- The difference between Tier-1 and Tier-2 from a functional perspective is small
  – The model begins to look less MONARC-like

[Diagram: the original MONARC picture of CERN, national Tier-1s (FermiLab, Brookhaven, UK, France, Italy, NL, Germany), Tier-2s, university groups, and physics-department desktops, alongside the Open Science Grid]
Site View
- The current site view for several of the LHC experiments looks like a walled city with a couple of gates
- The CE accepts jobs: we run thousands of similar requests and we authorize all of them
- The SE transfers data in and out
- Data is preloaded into the sites and jobs come and find it

[Diagram: a site with disk and tape behind its walls, exposed only through the CE, SE, and information system, the lower-level services providing consistent interfaces to facilities]
Stretches into Other Elements

- After Long Shutdown 1, CMS and ATLAS will likely reconstruct a portion of the data the first time at Tier-1s, in close to real time
  – There is very little unique about the functionality of the Tier-0
  – Some prompt calibration work uses Express data, but even that could probably be exported
  – Calibration in ATLAS was exported to a dedicated Tier-2
- We are also looking at other opportunistic computing resources
  – The nature of PromptReco becomes more flexible

[Diagram: the Tier-0 role shared among Tier-1s, the HLT farm, and opportunistic resources]
Evolution
- After many years of operations and work, our system continues to evolve
  – We are reducing how strictly we define the functionality in each tier
    - Lines and capabilities are blurring together
  – We have much better access to data
    - With it we begin to eliminate even the boundaries between sites
  – We will have a much more diverse set of resources
    - Cloud and opportunistic access will be big areas of growth