
LHCnet: Proposal for LHC Network Infrastructure Extending Globally to Tier2 and Tier3 Sites



  1. LHCnet: Proposal for LHC Network Infrastructure Extending Globally to Tier2 and Tier3 Sites. Artur Barczyk, Harvey Newman, California Institute of Technology / US LHCNet. LHCT2S Meeting, CERN, January 13th, 2011.

  2. THE PROBLEM TO SOLVE

  3. LHC Computing Infrastructure. WLCG in brief: 1 Tier-0 (CERN); 11 Tier-1s on 3 continents; 164 Tier-2s on 5 (6) continents; plus O(300) Tier-3s worldwide.

  4. CMS Data Movements (All Sites and Tier1-Tier2). Over 120 days (June-October), daily average total throughput reached over 2 GBytes/s, with daily average Tier1-Tier2 rates of 1-1.8 GBytes/s. Over the last 132 hours of Tier1-Tier2 traffic, 1-hour averages reached up to 3.5 GBytes/s during dataset reprocessing and repopulation, with Tier2-Tier2 traffic at ~25% to ~50%. [Charts: throughput in GBy/s per day, June-October.]

  5. Worldwide Data Distribution and Analysis (F. Gianotti). Total throughput of ATLAS data through the Grid, 1st January to November [plot: MB/s per day]: around 6 GB/s versus ~2 GB/s design, with peaks of 10 GB/s reached. Grid-based analysis in Summer 2010: >1000 different users, >15M analysis jobs. The excellent Grid performance has been crucial for fast release of physics results; e.g. for ICHEP, the full data sample taken until Monday was shown at the conference on Friday.

  6. Changing LHC Data Models. Three recurring themes:
     • Flat(ter) hierarchy: any site might in the future pull data from any other site hosting it.
     • Data caching: analysis sites will pull datasets from other sites "on demand", including from Tier2s in other regions, possibly in combination with strategic pre-placement of data sets (a minimal caching sketch follows below).
     • Remote data access: jobs executing locally, using data cached at a remote site in quasi-real time, possibly in combination with local caching.
     Expect variations by experiment.
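To make the "data caching" theme concrete, here is a minimal, hypothetical sketch of an on-demand cache: a job asks for a dataset, the site serves it locally if already present, and otherwise pulls it from a remote site and caches it. All names (fetch_from_remote, CACHE_DIR, the example remote base URL) are illustrative, not part of any experiment's actual software.

```python
import shutil
import urllib.request
from pathlib import Path

CACHE_DIR = Path("/data/cache")          # hypothetical local cache area

def fetch_from_remote(dataset: str, remote_base: str) -> Path:
    """Pull a dataset from a remote site and cache it locally (illustrative only)."""
    local_path = CACHE_DIR / dataset
    local_path.parent.mkdir(parents=True, exist_ok=True)
    url = f"{remote_base}/{dataset}"      # e.g. a data door at another Tier2
    with urllib.request.urlopen(url) as src, open(local_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return local_path

def open_dataset(dataset: str, remote_base: str):
    """Serve from the local cache if present, otherwise fetch on demand."""
    local_path = CACHE_DIR / dataset
    if not local_path.exists():
        local_path = fetch_from_remote(dataset, remote_base)
    return open(local_path, "rb")
```

Strategic pre-placement would simply mean calling fetch_from_remote for popular datasets before the analysis jobs arrive.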

  7. Ian Bird, CHEP conference, Oct 2010.

  8. Remote Data Access and Local Processing with Xrootd (CMS)
     • Useful for smaller sites with less (or even no) data storage.
     • Only selected objects are read (with object read-ahead); no transfer of entire data sets.
     • CMS demonstrator: Omaha diskless Tier3, served data from Caltech and Nebraska (Xrootd).
     • Strategic decision: remote access vs. data transfers. Similar operations in ALICE for years.
     (Brian Bockelman, September 2010)
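A minimal sketch of the kind of remote, partial read Xrootd enables, assuming the XRootD Python bindings are installed as the `XRootD` package (the exact call signatures should be checked against the bindings' documentation, and the server and file path below are placeholders, not real CMS endpoints):

```python
from XRootD import client

# Placeholder URL: an Xrootd door at a remote site serving experiment data.
url = "root://xrootd.example.org//store/data/example/file.root"

f = client.File()
status, _ = f.open(url)            # open the remote file; no bulk transfer happens
if not status.ok:
    raise RuntimeError(status.message)

# Read only a selected byte range (e.g. the region holding one object),
# instead of copying the entire data set to local storage.
status, data = f.read(offset=0, size=64 * 1024)
print(f"read {len(data)} bytes remotely")
f.close()
```

This is the behaviour the slide contrasts with whole-dataset transfers: a diskless Tier3 only ever pulls the bytes its jobs actually touch.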

  9. Ian Bird, CHEP conference, Oct 2010.

  10. Requirements Summary (from Kors' document)
     • Bandwidth: ranging from 1 Gbps (Minimal site) to 5-10 Gbps (Nominal) to N x 10 Gbps (Leadership). No need for a full mesh at full rate, but several full-rate connections between Leadership sites.
     • Scalability is important: sites are expected to migrate Minimal → Nominal → Leadership. Bandwidth growth: Minimal = 2x/year, Nominal & Leadership = 2x per 2 years (a worked projection follows below).
     • "Staging": facilitate good connectivity to so far (network-wise) underserved sites.
     • Flexibility: it should be possible to include or remove sites at any time.
     • Budget neutrality: the solution should be cost neutral [or at least affordable, A/N].
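As a quick illustration of those growth rates (my own arithmetic, not from the slides): a Minimal site doubling every year and a Nominal site doubling every two years can be projected with a few lines of Python. The starting points (1 Gbps and 10 Gbps) are taken from the slide's bandwidth ranges.

```python
def projected_bandwidth(start_gbps: float, doubling_period_years: float, years: int) -> float:
    """Bandwidth after `years`, doubling once per `doubling_period_years`."""
    return start_gbps * 2 ** (years / doubling_period_years)

# Minimal site: 1 Gbps today, doubling every year (2x/yr)
# Nominal site: 10 Gbps today, doubling every two years (2x/2yr)
for year in range(5):
    minimal = projected_bandwidth(1, 1, year)
    nominal = projected_bandwidth(10, 2, year)
    print(f"year {year}: Minimal ~{minimal:.0f} Gbps, Nominal ~{nominal:.1f} Gbps")
```

After four years this gives roughly 16 Gbps for a Minimal site and 40 Gbps for a Nominal one, which is why scalability of the design matters more than any single site's starting capacity.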

  11. SOLUTION PROPOSAL

  12. Lessons Learned
     • The LHC OPN has proven itself; we shall learn from it.
     • Simple architecture: point-to-point Layer 2 circuits; flexible and scalable topology.
     • Grew organically: from star to partial mesh; open to several technology choices, each of which satisfies the requirements.
     • Federated governance model: coordination between stakeholders, no single administrative body required; made extensions and funding straightforward.
     • Remaining challenge: monitoring and reporting, which calls for more of a systems approach.

  13. Design Inputs
     • Given the scale, geographical distribution and diversity of the sites, as well as the funding, only a federated solution is feasible.
     • The current LHC OPN is not modified: the OPN will become part of a larger whole; some operations will be purely Tier2/Tier3.
     • The architecture has to be open and scalable: scalability in bandwidth, extent and scope.
     • Resiliency in the core; allow resilient connections at the edge.
     • Bandwidth guarantees → determinism: reward effective use; end-to-end systems approach.
     • Operation at Layer 2 and below: advantages in performance, cost and power consumption.

  14. Design Inputs, cont.
     • Most or all R&E networks can (technically) offer Layer 2 services; where not, commercial carriers can. Some advanced networks offer dynamic (user-controlled) allocation.
     • Leverage existing infrastructures and collaborations as much as possible: GLIF, DICE, GLORIAD, …
     • Last but not least: this would be the perfect occasion to start using IPv6, therefore we should (at least) encourage IPv6, but support IPv4. Admittedly the challenge is above Layer 3.

  15. Design Proposal
     • A design satisfying all requirements: a switched core with a routed edge.
     • Sites are interconnected through Lightpaths: site-to-site Layer 2 connections, static or dynamic.
     • Switching is far more robust and cost-effective for high-capacity interconnects.
     • Routing (from the end-site viewpoint) is deemed necessary.

  16. Switched Core
     • Strategically placed core exchange points: e.g. start with 2-3 in Europe, 2 in North America, 1 in South America, 1-2 in Asia; e.g. existing devices at Tier1s, GOLEs, GEANT nodes, …
     • Interconnected through high-capacity trunks: 10-40 Gbps today, soon 100 Gbps.
     • Trunk links can be CBF, multi-domain Layer 1 / Layer 2 links, …; e.g. Layer 1 circuits with virtualised sub-rate channels, sub-dividing 100G links in the early stages (see the allocation sketch below).
     • Resiliency, where needed, is provided at Layer 1 / Layer 2: e.g. SONET/SDH Automatic Protection Switching, Virtual Concatenation.
     • At a later stage, automated Lightpath exchanges will enable flexible "stitching" of dynamic circuits; see the demonstration (proof of principle) at the last GLIF meeting and SC10.
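A toy illustration (my own, not from the proposal) of sub-dividing a trunk into virtualised sub-rate channels: a 100G trunk carved into fixed-size channels handed out to site-to-site Lightpaths. The site pairs are invented examples, and real Layer 1 provisioning (e.g. OTN/SDH virtual concatenation) is done by the transport equipment, not by code like this.

```python
class Trunk:
    """Toy model of a core trunk carved into fixed-size sub-rate channels."""

    def __init__(self, capacity_gbps: int, channel_gbps: int):
        self.channel_gbps = channel_gbps
        self.free = capacity_gbps // channel_gbps   # unallocated channels
        self.lightpaths = {}                        # name -> channels reserved

    def allocate(self, name: str, gbps: int) -> bool:
        """Reserve enough whole channels for a Lightpath, if capacity remains."""
        needed = -(-gbps // self.channel_gbps)      # ceiling division
        if needed > self.free:
            return False
        self.free -= needed
        self.lightpaths[name] = needed
        return True

# A 100G trunk split into 10G sub-rate channels (the early-stage case on the slide).
trunk = Trunk(capacity_gbps=100, channel_gbps=10)
print(trunk.allocate("SiteA-SiteB", 40))   # True: 4 channels reserved
print(trunk.allocate("SiteC-SiteD", 50))   # True: 5 channels reserved
print(trunk.allocate("SiteE-SiteF", 20))   # False: only 1 channel left
```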

  17. One Possible Core Technology: Carrier Ethernet
     • IEEE standard 802.1Qay (PBB-TE): separation of the backbone and customer networks through MAC-in-MAC; no flooding, no Spanning Tree; scalable to 16M services.
     • Provides OAM comparable to SONET/SDH: 802.1ag end-to-end service OAM (Continuity Check Messages, loopback, linktrace; a toy continuity-check sketch follows below); 802.3ah link OAM (remote loopback, loopback control, remote failure indication).
     • Cost effective: e.g. an NSP study indicates TCO ~43% lower for COE (PBB-TE) vs. MPLS-TE.
     • 802.1Qay with the ITU-T G.8031 Ethernet Linear Protection standard provides 1+1 and 1:1 protection switching, similar to SONET/SDH APS; it works by Y.1731 message exchange (ITU-T standard).
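To give a feel for the continuity-check part of service OAM, here is a toy model only: real 802.1ag CCMs are Ethernet frames generated in hardware, often at millisecond intervals, not Python objects. The idea is simply that a maintenance endpoint declares the far end down after missing a few consecutive heartbeats.

```python
import time

CCM_INTERVAL_S = 1.0   # toy interval; real CCM rates range from milliseconds to seconds
LOSS_THRESHOLD = 3     # roughly: a defect is declared after ~3.5 missed CCMs

class MaintenanceEndpoint:
    """Toy continuity-check receiver tracking one remote endpoint."""

    def __init__(self):
        self.last_ccm = time.monotonic()

    def on_ccm_received(self):
        """Called whenever a continuity-check 'message' arrives from the peer."""
        self.last_ccm = time.monotonic()

    def peer_alive(self) -> bool:
        """The peer is considered down after several missed intervals."""
        return (time.monotonic() - self.last_ccm) < LOSS_THRESHOLD * CCM_INTERVAL_S

mep = MaintenanceEndpoint()
mep.on_ccm_received()
print("peer alive:", mep.peer_alive())   # True right after a CCM
```

The protection switching mentioned on the slide is triggered by exactly this kind of loss-of-continuity detection, just far faster and in the data plane.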

  18. Routed Edge
     • End sites (might) require Layer 3 connectivity in the LAN; otherwise a true Layer 2 solution might be adequate.
     • Lightpaths terminate on a site's router: the site's border router or, preferably, the router closest to the storage elements.
     • All IP peerings are point-to-point, site-to-site: this reduces convergence time and avoids issues with flapping links.
     • Each site decides and negotiates with which remote sites it wishes to peer (e.g. based on the experiment's connectivity design).
     • The router (via BGP) advertises only the storage element (SE) subnet(s) through the configured Lightpath (see the filtering sketch below).
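A minimal sketch of the "advertise only the SE subnets" policy, written as plain Python rather than real router configuration; in practice this would be a BGP prefix list or route map on the site's router, and the subnets below are documentation prefixes, not real SE addresses.

```python
from ipaddress import ip_network

# Prefixes routed inside the campus LAN (illustrative only).
site_prefixes = [
    ip_network("192.0.2.0/24"),      # storage element (SE) subnet
    ip_network("198.51.100.0/24"),   # worker nodes
    ip_network("203.0.113.0/24"),    # general campus network
]

# Only the SE subnet(s) may be announced over the Lightpath peering.
se_subnets = {ip_network("192.0.2.0/24")}

def advertised_over_lightpath(prefixes):
    """Filter the site's prefixes down to the SE subnets allowed on the peering."""
    return [p for p in prefixes if p in se_subnets]

print(advertised_over_lightpath(site_prefixes))   # [IPv4Network('192.0.2.0/24')]
```

Keeping the announcement restricted to the SE subnets ensures that only storage traffic, not general campus traffic, is drawn onto the Lightpath.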

  19. Lightpath Termination
     • Avoid LAN connectivity issues when terminating a Lightpath at the campus edge.
     • The Lightpath should be terminated as close as possible to the storage elements, but this can be challenging if not impossible (support a dedicated border router?).
     • Alternatively, provide a "local lightpath" (e.g. a VLAN with appropriate bandwidth, or a dedicated link where possible); the border router then does the "stitching".

  20. IP Backup
     • Foresee IP routed paths as backup: the end-site's border router is configured both for default IP connectivity and for direct peering through the Lightpath; the direct peering takes precedence (see the route-selection sketch below).
     • This also works for dynamic Lightpaths.
     • For fully dynamic Lightpath setup, dynamic end-site configuration through e.g. LambdaStation or TeraPaths will be used.
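Why the direct peering "takes precedence" can be illustrated with ordinary longest-prefix-match route selection: the specific SE prefix learned over the Lightpath wins over the default route used for general IP connectivity. A toy lookup, with placeholder addresses:

```python
from ipaddress import ip_address, ip_network

# Toy routing table at the end site's border router.
routes = {
    ip_network("0.0.0.0/0"):    "general IP path (R&E backbone)",   # default, backup
    ip_network("192.0.2.0/24"): "Lightpath to remote site",         # SE prefix via direct peering
}

def next_hop(destination: str) -> str:
    """Longest-prefix match: the most specific matching route wins."""
    matches = [net for net in routes if ip_address(destination) in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(next_hop("192.0.2.17"))    # -> Lightpath to remote site
print(next_hop("203.0.113.5"))   # -> general IP path (R&E backbone)
```

If the Lightpath or its BGP session fails, the specific prefix is withdrawn and traffic falls back to the default route, which is exactly the backup behaviour the slide describes.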

  21. Resiliency
     • Resiliency in the core is provided by protection switching, depending on the technology used between core nodes: SONET/SDH or OTN protection switching (Layer 1), MPLS failover, PBB-TE protection switching, or Ethernet LAG.
     • Sites can opt for additional resiliency (e.g. where protected trunk links are not available) by forming transit agreements with other sites, akin to the current LHC OPN use of CBF.

  22. Layer 1 through Layer 3

  23. Scalability
     • Assuming Layer 2 point-to-point operations, a natural scalability limitation is the 4k VLAN ID space.
     • This problem is naturally resolved by:
       – PBB-TE (802.1Qay), through MAC-in-MAC encapsulation; the backbone frame is: B-DA | B-SA | Ethertype 0x88A8 + B-VID | Ethertype 0x88E7 + I-SID | Customer Frame incl. Header+FCS | B-FCS.
       – dynamic bandwidth allocation with re-use of VLAN IDs.
     • The only constraint is that no two connections through the same network element may use the same VLAN (see the sketch below).
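Two small illustrations of the numbers and the constraint above (my own sketch, not from the slides): the 12-bit VLAN ID versus the 24-bit I-SID explains "4k vs. 16M services", and a per-element allocator shows how VLAN IDs can be re-used network-wide as long as no two circuits crossing the same network element share an ID. The element names are invented.

```python
print(f"VLAN IDs (12-bit): {2**12}")        # 4096 -> the ~4k limitation
print(f"I-SID services (24-bit): {2**24}")  # 16777216 -> "scalable to 16M services"

class VlanAllocator:
    """Re-use VLAN IDs network-wide, but never twice on the same element."""

    def __init__(self):
        self.used = {}   # element name -> set of VLAN IDs in use there

    def assign(self, circuit_path, vlan_id: int) -> bool:
        """Accept the circuit only if vlan_id is free on every element it crosses."""
        if any(vlan_id in self.used.get(elem, set()) for elem in circuit_path):
            return False
        for elem in circuit_path:
            self.used.setdefault(elem, set()).add(vlan_id)
        return True

alloc = VlanAllocator()
print(alloc.assign(["NodeA", "NodeB"], 3000))   # True
print(alloc.assign(["NodeB", "NodeC"], 3000))   # False: 3000 already in use at NodeB
print(alloc.assign(["NodeD", "NodeC"], 3000))   # True: same ID re-used on disjoint elements
```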
