17th Dec 2014
Regional WLCG connectivity in India and LHC Open Network Environment
Third Annual Workshop, Indian Institute of Technology Guwahati (IITG)
Regional WLCG connectivity in India and LHC Open Network Environment
Brij Kishor Jashal, Tata Institute of Fundamental Research, Mumbai
Email: brij.jashal@tifr.res.in
2
Agenda
- LHC data challenges and evolution of LHC data access model
- LHC open network environment - LHCONE
- India-CMS site infrastructure
- Network at T2_IN_TIFR
- Expectations from networks for LHC Run2
- WLCG regional connectivity in India by NKN
Complexity of LHC experiments
- 1 billion collisions per second (for each experiment). Each collision generates particles that often decay in complex ways into even more particles.
- Electronic circuits record the passage of each particle through a detector as a series of electronic signals, and send the data to the CERN Data Centre (DC) for digital reconstruction.
- 1 million gigabytes (1 PB) per second of raw data. The digitized summary is recorded as a "collision event".
- Physicists must sift through the data produced to determine whether the collisions have thrown up any interesting physics.
Source – cern.ch
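As a quick sanity check on the figures above, the quoted rates imply roughly 1 MB of raw data per collision:

```python
# Sanity check on the slide's figures (both numbers taken from the text above):
# ~1e9 collisions/s per experiment and ~1 PB/s of raw data imply
# about 1 MB of raw data per collision event.
collisions_per_second = 1_000_000_000   # 1 billion
raw_bytes_per_second = 10**15           # 1 PB/s (SI: 1 PB = 10^15 bytes)

bytes_per_collision = raw_bytes_per_second / collisions_per_second
print(f"{bytes_per_collision / 1e6:.1f} MB per collision")  # -> 1.0 MB per collision
```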
4
Source: Bill Johnston, ESnet
5
Evolution of computing models - 1
- When the computing models for the 4 LHC experiments were developed in the late nineties, networking was still a scarce resource, and almost all the models reflect this, although in different ways
- Instead of relying on the ability to provide the data when needed for analysis, the data is replicated to many places on the grid shortly after it has been produced, so that it is readily available for user analysis
- This model has proven to work well for the early stages of analysis, but it is limited by the ever-increasing need for disk space as the data volume from the machine grows with time
6
Evolution of computing models
- Bos-Fisk report on Tier 2 (T2) requirements for the Large Hadron Collider (LHC)
- From hierarchical to distributed
- The network is the most reliable resource for LHC computing
- This makes it possible to reconsider the data models and not rely on pre-placement of the data for analysis, or on running jobs only where the data is. Instead, jobs can pull the data from somewhere else if it is not already available locally
- The data needs decide whether to copy the data locally or to access it remotely
- If the data is copied locally, the storage turns into a cache that is likely to hold the selection of the data that is most popular for analysis at that time
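The caching behaviour described above can be sketched as a toy recency-based cache. This is an illustration only; the dataset names, sizes and the LRU eviction policy are assumptions, not any real WLCG component:

```python
from collections import OrderedDict

class DatasetCache:
    """Toy cache illustrating 'site storage as a cache of popular data'.
    Sizes in TB; purely illustrative, not a real WLCG service."""
    def __init__(self, capacity_tb):
        self.capacity = capacity_tb
        self.used = 0.0
        self.datasets = OrderedDict()   # name -> size, most recently used last

    def access(self, name, size_tb):
        if name in self.datasets:           # hit: data already cached locally
            self.datasets.move_to_end(name)
            return "local"
        # miss: evict least-recently-used datasets until the new one fits,
        # then pull the data from a remote site and cache it
        while self.used + size_tb > self.capacity and self.datasets:
            _, evicted_size = self.datasets.popitem(last=False)
            self.used -= evicted_size
        self.datasets[name] = size_tb
        self.used += size_tb
        return "pulled"

cache = DatasetCache(capacity_tb=10)
cache.access("/store/data/Run2014A", 6)   # pulled remotely, cached
cache.access("/store/data/Run2014B", 6)   # pulled, evicts Run2014A
cache.access("/store/data/Run2014B", 6)   # served locally
```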
APAN-38 7
Managing large-scale science traffic in a shared infrastructure
- The traffic from T0 to T1 is well served by the LHCOPN
- The traffic from the Tier 1 data centres to the Tier 2 sites (mostly universities), where the data analysis is done, is now large enough that it must be managed separately from the general R&E traffic
- In aggregate, the Tier 1 to Tier 2 traffic is equal to the Tier 0 to Tier 1 traffic (there are about 170 Tier 2 sites)
- Managing this across all possible Tier 1 to Tier 2 combinations required special infrastructure: the LHC Open Network Environment (LHCONE) was designed for this purpose
- LHCONE is an overlay network supported by over 15 national and international RENs
- Several Open Exchange Points, including NetherLight, StarLight, MANLAN and others
- Trans-Atlantic connectivity provided by ACE, GEANT, NORDUnet and USLHCNet
- Over 50 end sites connected to LHCONE: 45 Tier 2s and 10 Tier 1s
https://twiki.cern.ch/twiki/bin/view/LHCONE/LhcOneVRF#Connected_sites
8
Source: Bill Johnston, ESnet
9
LHCONE L3VPN architecture
TierX sites connect to national or continental VRFs
What LHCONE L3VPN is:
- A Layer 3 (routed) Virtual Private Network
- A dedicated worldwide backbone connecting Tier 1s and Tier 2s (and Tier 3s) at high bandwidth
- Reserved for LHC data transfers and analysis
10
11
The TierX site needs to have:
- Public IP addresses
- A public Autonomous System (AS) number
- A BGP-capable router
LHCONE Acceptable Use Policy (AUP):
- Use of LHCONE should be restricted to WLCG-related traffic
- IP addresses announced to LHCONE:
  - should be assigned only to WLCG servers
  - cannot be assigned to generic campus devices (desktop and portable computers, wireless devices, printers, VoIP phones, ...)
Routing setup:
- A BGP peering is established between the TierX and the VRF border routers
- The TierX announces only the IP subnets used for WLCG servers
- The TierX accepts all the prefixes announced by the LHCONE VRF router
- The TierX must ensure traffic symmetry: it injects only packets sourced by the announced subnets
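The symmetry rule above amounts to a source filter at the LHCONE edge. A minimal sketch using Python's ipaddress module; the subnets are hypothetical documentation ranges, not TIFR's real allocations:

```python
import ipaddress

# Illustrative subnets only (RFC 5737 documentation prefixes), standing in for
# the site's real WLCG-server and campus address space.
wlcg_server_subnets = [ipaddress.ip_network("192.0.2.0/27")]   # announced to LHCONE
campus_subnets = [ipaddress.ip_network("198.51.100.0/24")]     # NOT announced

def may_inject_into_lhcone(src_ip):
    """Traffic symmetry: inject only packets sourced from announced subnets."""
    src = ipaddress.ip_address(src_ip)
    return any(src in net for net in wlcg_server_subnets)

may_inject_into_lhcone("192.0.2.10")     # True: a WLCG storage node
may_inject_into_lhcone("198.51.100.7")   # False: a generic campus desktop
```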
12
Symmetric paths must be ensured. To achieve symmetry, we can use any one of the following techniques:
- Policy-based routing (source-destination routing)
- Physically separated networks
- Virtually separated networks (VRF)
- Science DMZ
Policy-based routing (source-destination routing): if a single border router is used to connect to the Internet and LHCONE, source-destination routing must be used
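The policy-based routing decision can be illustrated as a lookup keyed on the (source, destination) pair rather than destination alone. All prefixes below are hypothetical documentation ranges, not real LHCONE routes:

```python
import ipaddress

# Sketch of the source-destination routing decision: a packet leaves via the
# LHCONE path only when BOTH its source is a local WLCG server subnet AND its
# destination matches a prefix learned from the LHCONE VRF.
local_wlcg = ipaddress.ip_network("192.0.2.0/27")          # hypothetical
lhcone_prefixes = [ipaddress.ip_network("203.0.113.0/24")] # hypothetical

def next_hop(src_ip, dst_ip):
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    if src in local_wlcg and any(dst in p for p in lhcone_prefixes):
        return "lhcone"       # dedicated LHCONE link / VRF
    return "general-ip"       # default route to the commodity Internet

next_hop("192.0.2.10", "203.0.113.50")   # -> "lhcone"
next_hop("192.0.2.10", "8.8.8.8")        # -> "general-ip"
```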
13
Physically separated networks: different routers can be used for generic and LCG hosts
Virtually separated networks (VRF): traffic separation is achieved with virtual routing instances on the same physical box
VRF = Virtual Routing and Forwarding (virtual routing instance)
14
15
16
T2_IN_TIFR Resources
Networking
- Dedicated P2P link to LHCONE, upgraded to 4 G guaranteed, with best effort up to 10 G
- Two 2.5 Gbit/s links: one to Europe and the other to Singapore, through TEIN4
- Internal network upgraded with a new chassis switch with a total switching throughput of 3.5 Tbps; storage nodes upgraded with 10 G links
- T2_IN_TIFR has been part of LHCONE, via CERNlight, since its inception
Computing
- Total number of physical cores: 1024; total HEP-SPEC06 score (Special Performance Evaluation for HEP code): 7218.12
Storage
- Total storage capacity of the 28 DPM disk nodes aggregates to more than 1 PB (1020 TB)
- Regional XRootD federation redirector for India
17
Network at T2_IN_TIFR
Last week GEANT upgraded the backbone link between the CERNlight router (where the Indian link terminates at CERN) and the GEANT PoP to 10 G
18
[Bar chart: monthly site availability and reliability, Dec-13 to Sep-14; values tabulated below]
Source:
https://espace2013.cern.ch/WLCG-document-repository/ReliabilityAvailability/Forms/AllItems.aspx?RootFolder=%2fWLCG-document-repository%2fReliabilityAvailability%2f2014&FolderCTID=0x0120000B00526BA2DC7C4AAEB1B0E517F001F8
WLCG site availability and reliability report, India-CMS TIFR, 2014 (Dec-13 to Sep-14)

Month                         Dec-13   Jan-14   Feb-14    Mar-14    Apr-14    May-14   Jun-14    Jul-14   Aug-14   Sep-14
Site availability                81%      98%      74%       96%       93%       93%      85%       98%      96%      98%
Site reliability                 81%     100%      76%      100%      100%       93%      85%       98%      96%     100%
CPU usage (HEP-SPEC06-hrs)   250,432  601,448  323,452 1,186,232 1,203,460 1,198,572  944,016 1,086,608  871,916  731,964
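From the monthly figures above, the ten-month averages work out to about 91% availability and 93% reliability:

```python
# Ten-month averages computed from the availability/reliability table above.
availability = [81, 98, 74, 96, 93, 93, 85, 98, 96, 98]
reliability  = [81, 100, 76, 100, 100, 93, 85, 98, 96, 100]

avg_avail = sum(availability) / len(availability)
avg_rel = sum(reliability) / len(reliability)
print(f"availability {avg_avail:.1f}%, reliability {avg_rel:.1f}%")
# -> availability 91.2%, reliability 92.9%
```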
19
20
21
22
23
Total cumulative data transfers over the last five months: 700 TB
[Plot: production and debug instance uploads and downloads]
24
Country-wise WLCG traffic to India; total volume in the last one year: 557 TB
(FTS + PhEDEx only; period 2013-12-01 00:00 UTC to 2014-12-15 00:00 UTC)

Volume per country (TB): USA 194, Swiss/Italy 156, Germany 68, UK 57, France 21, Spain 18, Russia 17, Netherlands 8, Others 5, Portugal 5, Switzerland 5, Belgium 3, Taiwan 2, South Korea 1, Austria 0.972, Hungary 0.824, Russian Federation 0.102, Brazil 0.025, Estonia 0.01, Finland 0.008, Ukraine 0.004, China 0.000829
25
Country-wise WLCG traffic from India (FTS + PhEDEx); total volume: 408 TB
(Period 2013-12-01 00:00 UTC to 2014-12-15 00:00 UTC)

Share per country: UK 41%, Spain 39%, Italy 4%, USA 4%, Taiwan 3%, Germany 3%, France 2%, South Korea 1%, Switzerland 1%; under 1% each: Austria, Belgium, Brazil, China, Estonia, Hungary, India, Others
26
Collaborating Indian institutes at the LHC (14 or more)
- Tata Institute of Fundamental Research (TIFR, Mumbai) - WLCG site
- Variable Energy Cyclotron Centre (VECC, Kolkata) - WLCG site
- Bhabha Atomic Research Centre (BARC, Mumbai)
- Delhi University
- Saha Institute of Nuclear Physics (SINP, Kolkata)
- Panjab University
- Indian Institute of Technology Bombay (IITB, Mumbai)
- Indian Institute of Technology Madras (IITM, Chennai)
- Raja Ramanna Centre for Advanced Technology (RRCAT, Indore)
- Indian Institute of Technology Bhubaneswar (IITBBS)
- Institute for Plasma Research (IPR, Ahmedabad)
- National Institute of Science Education and Research (NISER, Bhubaneswar)
- Visva-Bharati University (Santiniketan, WB)
- Indian Institute of Science Education and Research, Pune
(More than 75 active users from these institutes have accounts at T2_IN_TIFR and the TIFR Tier 3)
27
IndiaCMS - NKN - TIFR - LHCONE
- Currently NKN traffic to LHCONE and CERN goes via the TEIN4 link: NKN-TEIN4 (Mumbai) - Madrid - GEANT - LHCONE, on a 2.5 G link
- NKN is an integral part of the connectivity for the regional component of the Worldwide LHC Computing Grid (WLCG)
28
Traceroute analysis

traceroute to srm-cms.cern.ch (128.142.162.99), 30 hops max, 40 byte packets
 1  router.puhep.res.in (144.16.112.177)  2.862 ms  2.992 ms  3.149 ms
 2  202.141.110.229 (202.141.110.229)  0.353 ms  0.368 ms  0.372 ms
 3  10.144.20.5 (10.144.20.5)  38.901 ms  38.968 ms  38.976 ms
 4  10.255.238.189 (10.255.238.189)  38.659 ms  38.643 ms  38.625 ms
 5  mb-xe-01-v4.bb.tein3.net (202.179.249.41)  38.128 ms  38.254 ms  38.076 ms
 6  eu-mad-pr-v4.bb.tein3.net (202.179.249.118)  144.511 ms  144.504 ms  144.507 ms
 7  ae3.mx1.par.fr.geant.net (62.40.98.65)  170.136 ms  170.188 ms  170.033 ms
 8  switch-bckp-gw.mx1.par.fr.geant.net (62.40.124.82)  224.438 ms  190.136 ms  190.056 ms
 9  e513-e-rbrxl-2-te20.cern.ch (192.65.184.70)  177.140 ms  177.028 ms  177.135 ms
10  e513-e-rbrxl-1-ne0.cern.ch (192.65.184.37)  173.402 ms  163.651 ms  172.637 ms

traceroute to 144.16.111.24 (144.16.111.24), 30 hops max, 40 byte packets
 1  router.puhep.res.in (144.16.112.177)  0.685 ms  1.063 ms  1.262 ms
 2  202.141.110.229 (202.141.110.229)  0.357 ms  0.403 ms  0.375 ms
 3  * * *
 4  * * *
 5  202.141.153.30 (202.141.153.30)  31.708 ms  32.060 ms  31.628 ms
 6  202.141.153.29 (202.141.153.29)  38.695 ms  38.578 ms  38.657 ms
 7  ci-connect.indiacms.res.in (144.16.111.24)  38.329 ms  38.317 ms  38.291 ms
29
[root@cmst3ui2 ~]# traceroute -I 192.65.184.73
traceroute to 192.65.184.73 (192.65.184.73), 30 hops max, 60 byte packets
 1  172.16.11.252 (172.16.11.252)  2.353 ms  2.642 ms  2.883 ms
 2  172.16.0.254 (172.16.0.254)  0.580 ms  0.574 ms  0.560 ms
 3  vpn2.saha.ac.in (14.139.193.1)  0.945 ms  0.969 ms  0.968 ms   (NKN)
 4  10.118.248.93 (10.118.248.93)  0.951 ms  0.954 ms  0.954 ms   (NKN private core)
 5  * * *
 6  * * *
 7  10.255.221.34 (10.255.221.34)  31.804 ms  31.700 ms  31.694 ms   (NKN private core)
 8  115.249.209.6 (115.249.209.6)  37.302 ms  37.541 ms  37.541 ms   (RCOM - Andhra)
 9  * * *
10  * * *
11  62.216.147.73 (62.216.147.73)  46.588 ms  46.577 ms  46.556 ms   (UK)
12  xe-0-0-0.0.pjr03.ldn001.flagtel.com (85.95.26.238)  186.642 ms  174.002 ms  173.979 ms
13  xe-5-2-0.0.cji01.ldn004.flagtel.com (62.216.128.114)  187.455 ms  187.519 ms  187.693 ms
14  80.150.171.69 (80.150.171.69)  295.319 ms  293.396 ms  293.372 ms   (Germany)
15  217.239.43.29 (217.239.43.29)  305.694 ms  309.873 ms  306.677 ms   (Deutsche Telekom AG)
16  e513-e-rbrxl-1-ne1.cern.ch (192.65.184.73)  221.143 ms  230.249 ms  215.501 ms   (CERN)
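As a rough illustration of how such traces can be analysed, here is a minimal parser for output lines of the shape shown above. It is a sketch only: real traceroute output has more variants (unresolved hosts, per-probe paths) than this handles.

```python
import re

def parse_traceroute_line(line):
    """Extract (hop, host, median RTT in ms) from one traceroute output line.
    Returns None for lines it cannot parse, e.g. ' 5  * * *' timeouts."""
    m = re.match(r"\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)", line)
    if not m:
        return None
    hop, host = int(m.group(1)), m.group(2)
    rtts = sorted(float(x) for x in re.findall(r"([\d.]+)\s*ms", m.group(4)))
    return hop, host, rtts[len(rtts) // 2]

# Hop 6 of the first trace above: the Mumbai -> Madrid leg adds ~106 ms of RTT.
parse_traceroute_line(
    "6 eu-mad-pr-v4.bb.tein3.net (202.179.249.118) 144.511 ms 144.504 ms 144.507 ms")
# -> (6, 'eu-mad-pr-v4.bb.tein3.net', 144.507)
```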
30
- At present only TIFR traffic utilizes the dedicated link and follows LHCONE routing symmetry
- VECC Kolkata, with 6000 CPU cores and 240 TB of storage, is one of the major ALICE T2s, but at present its traffic goes via the TEIN4 link
- In the framework of AAA (any data, anytime, anywhere), it has become easier for the smaller T3 clusters to do HEP analysis on LHC data, as they do not have to worry about the data location. But this is only efficient if they have access to the LHCONE network
- All the WLCG member institutes of India are already operating (or coming up with) T3 computing clusters (e.g. SINP's 200-core cluster)
- To improve the networking experience of Indian WLCG institutes, LHC traffic needs to be segregated from general Internet traffic
- The Cisco Catalyst 9120, our main router (it belongs to NIC), needs to be upgraded for IPv6 features (we recently upgraded one SR SFP for 10 G)
- Enabling and testing jumbo frames (9000 bytes) on the NKN L3VPN in India?
- The LHCONE traffic and ecosystem have already adopted jumbo frames
- It is already proven that CPU overhead is reduced significantly and efficiency is improved
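A back-of-envelope calculation illustrates the jumbo-frame point: at a fixed line rate, the packet (and hence per-packet interrupt and processing) rate falls in proportion to the MTU.

```python
# At a given line rate, packet rate scales inversely with MTU, so a 9000-byte
# MTU needs ~6x fewer packets than the standard 1500 bytes. The 10 Gbit/s
# figure matches the TIFR uplink mentioned earlier; header overhead ignored.
line_rate_bps = 10 * 10**9   # 10 Gbit/s

def packets_per_second(mtu_bytes):
    return line_rate_bps / (mtu_bytes * 8)

standard = packets_per_second(1500)   # ~833,000 pkt/s
jumbo = packets_per_second(9000)      # ~139,000 pkt/s
print(f"{standard / jumbo:.0f}x fewer packets with jumbo frames")  # -> 6x
```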
31