SLIDE 1

Regional WLCG connectivity in India and LHC Open Network Environment

Third Annual Workshop - Indian Institute of Technology, Guwahati (IITG)

Brij Kishor Jashal
Tata Institute of Fundamental Research, Mumbai
Email: brij.jashal@tifr.res.in, brij.Kishor.jashal@cern.ch

17th Dec 2014

SLIDE 2

Agenda

  • LHC data challenges and evolution of LHC data access model
  • LHC open network environment - LHCONE
  • India-CMS site infrastructure
  • Network at T2_IN_TIFR
  • Expectations from networks for LHC Run2
  • WLCG regional connectivity in India by NKN
SLIDE 3

Complexity of LHC experiments

  • 1 billion collisions per second (for each experiment).
  • Each collision generates particles that often decay in complex ways into even more particles.
  • Electronic circuits record the passage of each particle through a detector as a series of electronic signals and send the data to the CERN Data Centre (DC) for digital reconstruction.
  • 1 million gigabytes (1 PB) per second of raw data.
  • The digitized summary is recorded as a "collision event".
  • Physicists must sift through the data produced to determine whether the collisions have thrown up any interesting physics.

Source – cern.ch

SLIDE 4

Source: Bill Johnston, ESNet

SLIDE 5

Evolution of computing models -1

  • When the computing models for the 4 LHC experiments were developed in the late nineties, networking was still a scarce resource, and almost all models reflect this, although in different ways.
  • Instead of relying on the ability to provide the data when needed for analysis, the data is replicated to many places on the grid shortly after it has been produced, so that it is readily available for user analysis.
  • This model has proved to work well for the early stages of analysis, but it is limited by the ever-increasing need for disk space as the data volume from the machine grows with time.

SLIDE 6

Evolution of computing models

  • Bos-Fisk report on Tier 2 (T2) requirements for the Large Hadron Collider (LHC)
  • From hierarchical to distributed
  • The network is the most reliable resource for LHC computing.
  • This makes it possible to reconsider the data models and not rely on pre-placement of the data for analysis, with jobs running only where the data is. Instead, jobs could pull the data from somewhere else if it is not already available locally.
  • The data needs decide whether to copy the data locally or to access the data remotely.
  • If the data is copied locally, the storage turns into a cache that is likely to hold the selection of the data that is most popular for analysis at that time.

SLIDE 7

Managing large scale science traffic in a shared infrastructure

  • The traffic from T0 to T1 was well served by the LHCOPN.
  • The traffic from the Tier 1 data centres to the Tier 2 sites (mostly universities), where the data analysis is done, is now large enough that it must be managed separately from the general R&E traffic.
  • In aggregate, the Tier 1 to Tier 2 traffic is equal to the Tier 0 to Tier 1 traffic (there are about 170 Tier 2 sites).
  • Managing this across all possible combinations of Tier 2 sites required special infrastructure: the LHC Open Network Environment (LHCONE) was designed for this purpose.
  • LHCONE is an overlay network supported by over 15 national and international RENs.
  • Several Open Exchange Points, including NetherLight, StarLight, MANLAN and others.
  • Trans-Atlantic connectivity provided by ACE, GEANT, NORDUNET and USLHCNET.
  • Over 50 end sites connected to LHCONE: 45 Tier 2s and 10 Tier 1s.
    https://twiki.cern.ch/twiki/bin/view/LHCONE/LhcOneVRF#Connected_sites

SLIDE 8

Source: Bill Johnston, ESnet

SLIDE 9

LHCONE L3VPN architecture

TierX sites connect to national VRFs or continental VRFs.

What the LHCONE L3VPN is:

  • A Layer 3 (routed) Virtual Private Network
  • A dedicated worldwide backbone connecting Tier 1s and Tier 2s (and Tier 3s) at high bandwidth
  • Reserved for LHC data transfers and analysis
SLIDE 10

SLIDE 11

The TierX site needs to have:

  • Public IP addresses
  • A public Autonomous System (AS) number
  • A BGP-capable router

LHCONE Acceptable Use Policy (AUP):

  • Use of LHCONE should be restricted to WLCG-related traffic
  • IP addresses announced to LHCONE:
      • should be assigned only to WLCG servers
      • cannot be assigned to generic campus devices (desktop and portable computers, wireless devices, printers, VoIP phones, ...)

Routing setup:

  • A BGP peering is established between the TierX and the VRF border routers
  • The TierX announces only the IP subnets used for WLCG servers
  • The TierX accepts all the prefixes announced by the LHCONE VRF router
  • The TierX must ensure traffic symmetry: it injects only packets sourced by the announced subnets (see the sketch below)
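
As an illustration of the AUP and routing rules above, the following Python sketch (all prefixes are hypothetical examples, not real TIFR allocations; only the standard ipaddress module is used) checks that prefixes a TierX plans to announce fall inside its WLCG server subnets, and that an outgoing packet's source address belongs to an announced prefix before it is injected into LHCONE:

    # lhcone_checks.py - minimal sketch of the LHCONE announcement and symmetry rules.
    # All prefixes below are hypothetical examples, not real TIFR allocations.
    import ipaddress

    # Subnets actually hosting WLCG servers at the TierX site (assumption).
    WLCG_SERVER_SUBNETS = [ipaddress.ip_network("192.0.2.0/25")]

    # Prefixes the site intends to announce to the LHCONE VRF border router.
    ANNOUNCED_PREFIXES = [ipaddress.ip_network("192.0.2.0/26")]

    def announcement_is_valid(prefix):
        """AUP rule: announced prefixes must cover only WLCG servers."""
        return any(prefix.subnet_of(net) for net in WLCG_SERVER_SUBNETS)

    def may_inject(src_ip):
        """Symmetry rule: inject into LHCONE only packets sourced from announced prefixes."""
        addr = ipaddress.ip_address(src_ip)
        return any(addr in prefix for prefix in ANNOUNCED_PREFIXES)

    for p in ANNOUNCED_PREFIXES:
        print(p, "announcement OK" if announcement_is_valid(p) else "violates AUP")
    for ip in ("192.0.2.10", "198.51.100.7"):  # a WLCG server vs. a campus desktop
        print(ip, "inject via LHCONE" if may_inject(ip) else "send via general IP path")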

SLIDE 12

Symmetric paths must be ensured. To achieve symmetry, any one of the following techniques can be used:

  • Policy-based routing (source-destination routing)
  • Physically separated networks
  • Virtually separated networks (VRF)
  • Science DMZ

Policy-based routing (source-destination routing): if a single border router is used to connect to both the general Internet and LHCONE, source-destination routing must be used; the sketch below illustrates the selection logic.
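
A minimal Python model of the source-destination decision such a border router has to make (prefixes and next-hop names are hypothetical): traffic from the announced WLCG subnets towards LHCONE prefixes takes the LHCONE VRF next hop, everything else follows the default route, which keeps LHCONE paths symmetric.

    # pbr_sketch.py - toy model of policy-based (source-destination) routing
    # on a single border router. Prefixes and next hops are hypothetical.
    import ipaddress

    LOCAL_WLCG_SUBNETS = [ipaddress.ip_network("192.0.2.0/25")]
    LHCONE_PREFIXES = [ipaddress.ip_network("198.51.100.0/24")]  # learned from the LHCONE VRF

    LHCONE_NEXT_HOP = "lhcone-vrf-peer"
    DEFAULT_NEXT_HOP = "general-internet-gw"

    def in_any(ip, networks):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in networks)

    def next_hop(src_ip, dst_ip):
        """Pick the next hop based on BOTH source and destination (PBR),
        so that LHCONE traffic stays on the LHCONE path in both directions."""
        if in_any(src_ip, LOCAL_WLCG_SUBNETS) and in_any(dst_ip, LHCONE_PREFIXES):
            return LHCONE_NEXT_HOP
        return DEFAULT_NEXT_HOP

    print(next_hop("192.0.2.10", "198.51.100.20"))   # WLCG server -> LHCONE site
    print(next_hop("192.0.2.200", "203.0.113.5"))    # campus host -> general Internet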

SLIDE 13

Physically separated networks: different routers can be used for generic and LCG hosts.

Virtually separated networks (VRF): traffic separation is achieved with virtual routing instances on the same physical box. VRF = Virtual Routing and Forwarding (a virtual routing instance).

SLIDE 14

SLIDE 15

SLIDE 16

T2_IN_TIFR Resources

Networking

  • Dedicated P2P link to LHCONE, upgraded to 4 G guaranteed with best effort up to 10 G
  • Two 2.5 Gigabit links - one to Europe and the other to Singapore through TEIN4
  • Internal network upgraded with a new chassis switch with a total switching throughput of 3.5 Tbps, and storage nodes upgraded with 10 G links
  • T2_IN_TIFR has been part of LHCONE via CERNLite since its inception

Computing

  • Total number of physical cores: 1024; total HEP-SPEC06 benchmark capacity (the standard performance measure for HEP code): 7218.12

Storage

  • Total storage capacity of the 28 DPM disk nodes aggregates to more than 1 PB (1020 TB)
  • Regional XrootD federation redirector for India
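
For a quick sanity check of the figures on this slide, a few lines of Python reproduce the derived per-core and per-node numbers (pure arithmetic on the values quoted above):

    # t2_tifr_capacity.py - back-of-the-envelope numbers from the slide above.
    cores = 1024
    hepspec06_total = 7218.12
    disk_nodes = 28
    storage_tb = 1020

    print(f"HEP-SPEC06 per core : {hepspec06_total / cores:.2f}")   # ~7.05
    print(f"TB per DPM disk node: {storage_tb / disk_nodes:.1f}")   # ~36.4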
SLIDE 17

Network at T2_IN_TIFR

Last week GEANT upgraded the backbone link between the CERNLite router (where the Indian link terminates at CERN) and the GEANT PoP to 10 G.

SLIDE 18

WLCG site availability and reliability report, India-CMS TIFR, 2014 (Dec-13 to Sep-14)

Month     Site Availability   Site Reliability   CPU usage (HEP-SPEC06 hours)
Dec-13    81%                 81%                  250,432
Jan-14    98%                 100%                 601,448
Feb-14    74%                 76%                  323,452
Mar-14    96%                 100%               1,186,232
Apr-14    93%                 100%               1,203,460
May-14    93%                 93%                1,198,572
Jun-14    85%                 85%                  944,016
Jul-14    98%                 98%                1,086,608
Aug-14    96%                 96%                  871,916
Sep-14    98%                 100%                 731,964

Source:
https://espace2013.cern.ch/WLCG-document-repository/ReliabilityAvailability/Forms/AllItems.aspx?RootFolder=%2fWLCG-document-repository%2fReliabilityAvailability%2f2014&FolderCTID=0x0120000B00526BA2DC7C4AAEB1B0E517F001F8
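
A short Python snippet (values copied from the table above) gives simple, unweighted averages of the monthly figures over the reporting period:

    # availability_summary.py - averages over the Dec-13 .. Sep-14 period above.
    availability = [81, 98, 74, 96, 93, 93, 85, 98, 96, 98]
    reliability  = [81, 100, 76, 100, 100, 93, 85, 98, 96, 100]

    print(f"Average availability: {sum(availability) / len(availability):.1f}%")  # 91.2%
    print(f"Average reliability : {sum(reliability) / len(reliability):.1f}%")    # 92.9%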

SLIDE 19

SLIDE 20

SLIDE 21

SLIDE 22

SLIDE 23

Total cumulative data transfers for the last five months: 700 TB

(Chart legend: Production Instance Uploads / Downloads, Debug Instance Uploads / Downloads)

SLIDE 24

Country-wise WLCG traffic to India (total volume in the last one year: 557 TB)

Country              Volume (TB)
Austria              0.972
Belgium              3
Brazil               0.025
China                0.000829
Estonia              0.01
Finland              0.008
France               21
Germany              68
Hungary              0.824
Swiss / Italy        156
Netherlands          8
Others               5
Portugal             5
Russia               17
Russian Federation   0.102
South Korea          1
Spain                18
Switzerland          5
Taiwan               2
UK                   57
Ukraine              0.004
USA                  194

Only FTS + PhEDEx
Period: 2013-12-01 00:00 UTC to 2014-12-15 00:00 UTC
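
To see which sources dominate, a few lines of Python turn the per-country volumes above into percentage shares (numbers copied from this slide; contributors below 1 TB are left out of this illustration):

    # traffic_to_india.py - percentage shares of WLCG traffic into India (TB, from the slide).
    volumes_tb = {
        "USA": 194, "Swiss / Italy": 156, "Germany": 68, "UK": 57, "France": 21,
        "Spain": 18, "Russia": 17, "Netherlands": 8, "Portugal": 5, "Switzerland": 5,
        "Others": 5, "Belgium": 3, "Taiwan": 2, "South Korea": 1,
    }
    total = sum(volumes_tb.values())  # contributors below 1 TB omitted here

    for country, tb in sorted(volumes_tb.items(), key=lambda kv: -kv[1]):
        print(f"{country:15s} {tb:6.1f} TB  {100 * tb / total:5.1f}%")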

SLIDE 25

Country-wise WLCG traffic from India (FTS + PhEDEx), total volume: 408 TB

Country        Share
Austria        0%
Belgium        0%
Brazil         0%
China          0%
Estonia        0%
France         2%
Germany        3%
Hungary        0%
India          0%
Italy          4%
South Korea    1%
Spain          39%
Swiss          1%
Taiwan         3%
UK             41%
USA            4%
Others         0%

Only FTS + PhEDEx
Period: 2013-12-01 00:00 UTC to 2014-12-15 00:00 UTC

SLIDE 26

Collaborating Indian Institutes at LHC (14 or more)

  • Tata Institute of Fundamental Research (TIFR, Mumbai) - WLCG site
  • Variable Energy Cyclotron Centre (VECC, Kolkata) - WLCG site
  • Bhabha Atomic Research Centre (BARC, Mumbai)
  • Delhi University
  • Saha Institute of Nuclear Physics (SINP, Kolkata)
  • Punjab University
  • Indian Institute of Technology, Bombay (IITB, Mumbai)
  • Indian Institute of Technology, Madras (IITM, Chennai)
  • Raja Ramanna Centre for Advanced Technology (RRCAT, Indore)
  • Indian Institute of Technology Bhubaneswar (IITBBS)
  • Institute for Plasma Research (IPR, Ahmedabad)
  • National Institute of Science Education and Research (NISER, Bhubaneswar)
  • Vishva-Bharti University (Santiniketan, WB)
  • Indian Institute of Science Education and Research, Pune

(More than 75 active users from these institutes have accounts at T2_IN_TIFR and the TIFR Tier 3.)

SLIDE 27

IndiaCMS - NKN - TIFR - LHCONE

  • Currently NKN traffic to LHCONE and CERN goes via the TEIN4 link: NKN-TEIN4 (Mumbai) - Madrid - GEANT - LHCONE, on a 2.5 G link.

NKN is an integral part of the connectivity for the regional component of the Worldwide LHC Computing Grid (WLCG).

slide-28
SLIDE 28

APAN-38 28

Traceroute analysis

traceroute to srm-cms.cern.ch (128.142.162.99), 30 hops max, 40 byte packets
 1  router.puhep.res.in (144.16.112.177)  2.862 ms  2.992 ms  3.149 ms
 2  202.141.110.229 (202.141.110.229)  0.353 ms  0.368 ms  0.372 ms
 3  10.144.20.5 (10.144.20.5)  38.901 ms  38.968 ms  38.976 ms
 4  10.255.238.189 (10.255.238.189)  38.659 ms  38.643 ms  38.625 ms
 5  mb-xe-01-v4.bb.tein3.net (202.179.249.41)  38.128 ms  38.254 ms  38.076 ms
 6  eu-mad-pr-v4.bb.tein3.net (202.179.249.118)  144.511 ms  144.504 ms  144.507 ms
 7  ae3.mx1.par.fr.geant.net (62.40.98.65)  170.136 ms  170.188 ms  170.033 ms
 8  switch-bckp-gw.mx1.par.fr.geant.net (62.40.124.82)  224.438 ms  190.136 ms  190.056 ms
 9  e513-e-rbrxl-2-te20.cern.ch (192.65.184.70)  177.140 ms  177.028 ms  177.135 ms
10  e513-e-rbrxl-1-ne0.cern.ch (192.65.184.37)  173.402 ms  163.651 ms  172.637 ms

traceroute to 144.16.111.24 (144.16.111.24), 30 hops max, 40 byte packets
 1  router.puhep.res.in (144.16.112.177)  0.685 ms  1.063 ms  1.262 ms
 2  202.141.110.229 (202.141.110.229)  0.357 ms  0.403 ms  0.375 ms
 3  * * *
 4  * * *
 5  202.141.153.30 (202.141.153.30)  31.708 ms  32.060 ms  31.628 ms
 6  202.141.153.29 (202.141.153.29)  38.695 ms  38.578 ms  38.657 ms
 7  ci-connect.indiacms.res.in (144.16.111.24)  38.329 ms  38.317 ms  38.291 ms
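
For routine monitoring of paths like the two above, a small Python wrapper around the system traceroute can record the hop count and last-hop RTT over time; this is a sketch assuming a Linux traceroute binary with its standard -n, -q and -m options:

    # trace_monitor.py - record hop count and last-hop RTT for a path of interest.
    # Assumes a Linux 'traceroute' binary is installed and on the PATH.
    import re
    import subprocess

    def trace(target, max_hops=30):
        out = subprocess.run(
            ["traceroute", "-n", "-q", "1", "-m", str(max_hops), target],
            capture_output=True, text=True,
        ).stdout
        hops = [line for line in out.splitlines()[1:] if line.strip()]
        rtts = re.findall(r"([\d.]+) ms", hops[-1]) if hops else []
        return len(hops), (float(rtts[0]) if rtts else None)

    n_hops, last_rtt = trace("srm-cms.cern.ch")
    print(f"hops: {n_hops}, last-hop RTT: {last_rtt} ms")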

SLIDE 29

[root@cmst3ui2 ~]# traceroute -I 192.65.184.73
traceroute to 192.65.184.73 (192.65.184.73), 30 hops max, 60 byte packets
 1  172.16.11.252 (172.16.11.252)  2.353 ms  2.642 ms  2.883 ms
 2  172.16.0.254 (172.16.0.254)  0.580 ms  0.574 ms  0.560 ms
 3  vpn2.saha.ac.in (14.139.193.1)  0.945 ms  0.969 ms  0.968 ms   (NKN)
 4  10.118.248.93 (10.118.248.93)  0.951 ms  0.954 ms  0.954 ms    (NKN private core)
 5  * * *
 6  * * *
 7  10.255.221.34 (10.255.221.34)  31.804 ms  31.700 ms  31.694 ms (NKN private core)
 8  115.249.209.6 (115.249.209.6)  37.302 ms  37.541 ms  37.541 ms (RCOM - Andhra)
 9  * * *
10  * * *
11  62.216.147.73 (62.216.147.73)  46.588 ms  46.577 ms  46.556 ms (UK)
12  xe-0-0-0.0.pjr03.ldn001.flagtel.com (85.95.26.238)  186.642 ms  174.002 ms  173.979 ms
13  xe-5-2-0.0.cji01.ldn004.flagtel.com (62.216.128.114)  187.455 ms  187.519 ms  187.693 ms
14  80.150.171.69 (80.150.171.69)  295.319 ms  293.396 ms  293.372 ms (Germany)
15  217.239.43.29 (217.239.43.29)  305.694 ms  309.873 ms  306.677 ms (Deutsche Telekom AG)
16  e513-e-rbrxl-1-ne1.cern.ch (192.65.184.73)  221.143 ms  230.249 ms  215.501 ms (CERN)

SLIDE 30

  • At present only TIFR traffic utilizes the dedicated link and follows LHCONE routing symmetry.
  • VECC Kolkata, with 6000 CPU cores and 240 TB of storage, is one of the major ALICE T2 sites, but at present its traffic goes via the TEIN4 link.
  • In the framework of AAA (any time, any data, anywhere), it has become easier for the smaller T3 clusters to do HEP analysis on LHC data, as they do not have to worry about the data location. But this is only efficient if they have access to the LHCONE network.
  • All the WLCG member institutes of India are already operating (or coming up with) T3 computing clusters (e.g. SINP - 200-core cluster).
  • To improve the networking experience of Indian WLCG institutes, LHC traffic needs to be segregated from general internet traffic.
  • The Cisco Catalyst 9120, which is our main router, needs to be upgraded for IPv6 features (it belongs to NIC); we recently upgraded one SR SFP for 10 G.
  • Enabling and testing jumbo frames (9000 bytes) on the NKN L3VPN in India?
  • The LHCONE traffic and ecosystem have already adopted jumbo frames.
  • It is already proven that CPU overhead is reduced significantly and efficiency is improved. (A simple way to verify jumbo-frame support end to end is sketched below.)
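
A quick way to check whether a 9000-byte MTU actually survives a path is to send non-fragmentable pings just under the jumbo size; this is a sketch assuming a Linux host with iputils ping, and the target hostname is only a placeholder:

    # jumbo_check.py - test whether ~9000-byte jumbo frames survive the path.
    # Assumes Linux iputils ping ('-M do' sets the Don't Fragment bit).
    import subprocess

    def jumbo_ok(host, payload=8972):
        """8972-byte ICMP payload + 8-byte ICMP header + 20-byte IP header = 9000 bytes."""
        result = subprocess.run(
            ["ping", "-M", "do", "-c", "3", "-s", str(payload), host],
            capture_output=True, text=True,
        )
        return result.returncode == 0

    host = "lhcone-test.example.org"   # placeholder, not a real endpoint
    print("jumbo frames OK" if jumbo_ok(host) else "path MTU below 9000 bytes")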

SLIDE 31

Thank you