Huge Data Transfer Experimentation over Lightpaths Corrie Kost, - PowerPoint PPT Presentation

Huge Data Transfer Experimentation over Lightpaths Corrie Kost, Steve McDonald TRIUMF Wade Hong Carleton University

Motivation • LHC expected to come on line in 2007 • data rates expected to exceed a petabyte a year • large Canadian HEP community involved in the ATLAS experiment • establishment of a Canadian Tier 1 at TRIUMF • replicate all/part of the experimental data • need to be able transfer “huge data” to our Tier 1

TRIUMF • Tri University Meson Facility • Canada’s Laboratory for Particle and Nuclear Physics • operated as a joint venture by UofA, UBC, Carleton U, SFU, and UVic • located on the UBC campus in Vancouver • five year funding from 2005 - 2009 announced in federal budget • planned as the Canadian ATLAS Tier 1

TRIUMF

Lightpaths • a significant design principle of CA*net 4 is the ability to provide dedicated point to point bandwidth over lightpaths under user control • similar philosophy of SURFnet provides the ability to establish an end to end lightpath from Canada to CERN • optical bypass isolates “huge data transfers” from other users of the R&E networks • lightpaths permit the extension of ethernet LANs to the wide area

Ethernet: local to global • the de facto LAN technology • original ethernet • shared media, half duplex, distance limited by protocol • modern ethernet • point to point, full duplex, switched, distance limited by the optical components • cost effective

Why native Ethernet Long Haul? • more than 90% of the Internet traffic originates from an Ethernet LAN • data traffic on the LAN increases due to new applications • Ethernet services with incremental bandwidth offer new business opportunities for carriers • why not native Ethernet? • scalability, reliability, service guarantees • all the above are research areas • native Ethernet long haul connections can be used today as a complement to the routed networks, not a replacement

Experimentation • experimenting with 10 GbE hardware for the past 3 years • engaged 10 GbE NIC and network vendors • mostly interested in disk to disk transfers with commodity hardware • tweaking performance of Linux-based disk servers • engaged hardware vendors to help build systems • testing data transfers over dedicated lightpaths • engineering solutions for the e2e lightpath last mile • especially for 10 GbE

2002 Activities • established the first end to end trans-atlantic lightpath between TRIUMF and CERN for iGrid 2002 • bonded dual GbEs transported across a 2.5 Gbps OC-48 • initial experimentation with 10GbE • alpha Intel 10GbE LR NICs, Extreme Black Diamond 6808 with 10GbE LRi blades • transfered data from ATLAS DC from TRIUMF to CERN using bbftp and tsunami

Live continent to continent • e2e lightpath up and running Sept 20 20:45 CET traceroute to cern-10g (192.168.2.2), 30 hops max, 38 byte packets 1 cern-10g (192.168.2.2) 161.780 ms 161.760 ms 161.754 ms

iGrid 2002 Topology

Exceeding a Gbps (Tsunami)

2003 Activities • Canarie funded directed research project, CA*net 4 IGT to continue with experimentation • Canadian HEP community and CERN • GbE lightpath experimentation between CERN and UofA for real-time remote farms • data transfers over a GbE lightpath between CERN and Carleton U for transferring 700GB of ATLAS FCAL test beam data • took 6.5 hrs versus 67 days

Current IGT Topology

2003 Activities • re-establishment of 10 GbE experiments • newer Intel 10 GbE NICs and Force 10 Networks E600 switches, IXIA network testers, servers from Intel and CERN OpenLab • established first native 10GbE end to end transatlantic lightpath between Carleton U and CERN • demonstrated at ITU Telecom World 2003

Demo during ITU Telecom World 2003 HP 10GE LAN PHY 10GE WAN PHY OC192c Itanium-2 Intel Itanium-2 HP Itanium-2 Intel Xeon Ixia Cisco Cisco Cisco 400T Cisco Cisco Force10 Force10 ONS 15454 Ixia ONS 15454 ONS 15454 ONS 15454 ONS 15454 E 600 E 600 400T Geneva Ottawa Toronto Chicago Amsterdam 10 GbE WAN PHY over an OC-192 circuit using lightpaths provided by SURFnet and CA*net 4 9.24 Gbps using 6 Gbps using 5.65 Gbps using traffic generators UDP on PCs TCP on PCs

Results on the transatlantic 10GbE Single stream UDP throughput Single stream TCP throughput Data rates limited by the PC, even for memory to memory tests UDP uses less resources than TCP on high-bandwidth delay product networks

2004-2005 Activities • arrival of the third CA*net 4 lambda in the summer of 2004, looked at establishing a 10 GbE lightpath from TRIUMF • Neterion (s2io) Xframe 10 GbE NICs, Foundry NetIron 40Gs, Foundry NetIron 1500, servers from Sun Microsystems, and custom built disk servers from Ciara Technologies. • distance problem between TRIUMF and the CA*net 4 OME 6500 in Vancouver • XENPAK 10 GbE WAN PHY at 1310nm

2004-2005 Activities • testing data transfers between TRIUMF and CERN, and TRIUMF and Carleton U over a 10 GbE lightpath • experimenting with robust data transfers • attempt to maximize disk i/o performance from Linux-based disk servers • experimenting with disk controllers and processors • ATLAS Service Challenges in 2005

2004-2005 Activities • exploring a more permanent 10 GbE lightpath to CERN and lightpaths to Canadian Tier 2 ATLAS sites from TRIUMF • CANARIE playing a lead role in helping to facilitate • still need to solve some last mile lightpath issues

Experimental Setup at TRIUMF MRV FD Storm 2 Sun 1 NI1500 Storm 1 Storm 1 Storm 1

Xeon-based Servers • Dual 3.2 GHz Xeons • 4GB memory • 4 3WARE 9500S-4LP (&8) • 16 SATA150 120GB drives • 40GB HITACHI 14R9200 drives • INTEL 10GBE PXLA8590LR

Some Xeon Server I/O Results • read a pair of 80 GB (xfs) files for 67 hours – 120TB – average 524 MB/sec (Software Raid0 of 8 Sata disks on each of pair hardware Raid0 RocketRaid 1820A controllers on Storm2) • 10GbE S2IO Nics – back-to-back 17 hrs – 10TB – average 180MB/se ( from Storm2 to Storm1 with software Raid0 of 4 disks on each of 3 3ware-9500S4 controllers in Raid0) • 10GbE lightpath: Storm2 to Itanium machine at CERN – 10,15,20,25 bbftp streams averaged 18, 24, 27, 29 MB/sec disk-to-disk. ( Only 1 disk at CERN – max write speed 48MB/sec) • continued Storm1 to Storm2 testing – many sustainability problems encountered and resolved. Details available on request. Don’t do test flights too close to ground echo 100000 > /proc/sys/vm/min_free_kbytes

Opteron-based Servers • Dual 2.4GHz Opterons • 4GB Memory • 1 WD800JB 80GB HD • 16 SATA 300GB HD (Seagate ST3300831AS) • 4 4 Port Infiniband-SATA • 2 RocketRaid 1820A • 10GbE NIC • 2 PCI-X at 133MHz (*) • 2 PCI-X at 100MHz (*) Note: 64bit * 133MHz = 8.4 Gb/s

Multilane Infini-band SATA

Server Specifications TYAN K8S S2882 SunFire V40z dd /dev/zero > /dev/null 60 GB/sec 32 GB/sec CPU Dual 2.5GHz Opterons Quad 2.5 GHz Opterons PCI-X (64 bit) 2@133 MHz (100 for two) 4@133 MHz full length 2@100MHz (66 for two) 1@133 MHz full length 1@100 Mhz half length 1@66 Mhz half length Memory 4 GB 8 GB Disks 16 300 GB SATA 2 x 73 GB 10 SCSI 320 3 x 147 GB 10K SCSI 320 I/O See slide “Optimal I/O Results” 3 x 147 GB as raid0 JBOD 160 to 123 MB/s write 176 to 130 MB/sec read

The Parameters • 5 types of controllers • number of controllers to use (1 to 4) • number of disks/controller (1 to 16) • RAID0, RAID5, RAID6, JBOD • dual or quad Opteron systems • 4-6 possible PCI-X slots (1 reserved for 10GigE) • linux kernels (2.6.9, 2.6.10, 2.6.11) • many tuning parameters (in addition to WAN) e.g. • blockdev –setra 8192 /dev/md0 • chunk-size in mdadm (1024) • /sbin/setpci –d 8086:1048 e6.b=2e • (modifies MMRBC field in PCI-X configuration space for vendor 8086 and device 1048 to increase transmit burst length on the bus • echo 100000 >/proc/sys/vm/min_free_kbytes • ifconfig eth3 txqueuelen 100000

The SATA Controllers 3Ware-9500S-4 3Ware-9500S-8 Areca 1160 SuperMicro DAC-SATA-MV8 Highpoint RocketRaid 1820A

Areca 1160 Details Extensive tests were done by tweakers.net on ARECA and 8 others www.tweakers.net/benchdb/search/product/104629 www.tweakers.net/reviews/557 PROS CONS Internal & External Web Access Flaky – external hangs require reboot, internal requires starting a new port Many options: display disk temps, SATA300 Trial and error to use them since few +NCQ, email alerts examples in documents. Supports filesystems >2TB, 16 disks, 64bit/ JBOD performance mostly = single disk 133MHz (24 disk / PCI-EXPRESS X8 available) 15 Disk RAID5 W/R 301/390 MB/s 2 RAID0 (7&8 disk) W/R 361/405 MB/s 15 Disk RAID6 W/R 237/328 MB/s RAID0 of 12 disks W/R 349/306 MB/s RAID6 very robust. Background rebuilds Background rebuilds 50-100 slower than has low impact on I/O performance. fast builds (at 20% priority).

Huge Data Transfer Experimentation over Lightpaths Corrie Kost, - PowerPoint PPT Presentation

Huge Data Transfer Experimentation over Lightpaths Corrie Kost, Steve McDonald TRIUMF Wade Hong Carleton University Motivation LHC expected to come on line in 2007 data rates expected to exceed a petabyte a year large Canadian HEP

Dynamic Lightpaths - in SURFnet and Beyond Bram Peeters, Gerben Van Malenstein, Hans Trom pert

Outline - SURF lightpaths - Use cases - Automated GOLE pilot - Call for collaboration on E2E

Radiative Transfer Radiative Transfer Radiative transfer is a branch of atmospheric physics. We

Dynamic Delegation of Experimentation Yingni Guo Northwestern University ngni Guo (NU)

Internet-scale Experimentation The challenges of large-scale networked system experimentation and

building a culture of experimentation at Spotify @bendressler - experimentation lead user

Energy-aware Traffic Allocation to Optical Lightpaths in Multilayer Core Networks Prof. P,

End-to-End Provisioning Workshop - Establishing Lightpaths - Scope and Objectives NRENs Local

Challenges in the last mile 1st E2E Workshop - Establishing Lightpaths Kurosh Bozorgebrahimi,

End-to-End Lightpaths ...in the Smallest University of the Netherlands Maurits van der Schee

Transfer United: Partnerships to Foster Transfer Student Success Tuesday, November 5 th

Keeping things cool is a Keeping things cool is a huge industry! huge industry! Insulation

Designing the Industrial Internet IxDA SF Design Doing August 8, 2013 Dane Petersen

OctoPrint 1.5.0 Gina Huge // @foosel VERRF2020 Who am I? Gina Huge aka foosel 37

CMS Data Transfer tests towards LHC data taking CMS Data Transfer tests towards LHC data taking D

FTP File Transfer Protocol Computer Center, CS, NCTU FTP FTP File Transfer Protocol

Optimalization of beam width and geometry parameters of VADIS cavity Miroslav Macko Comenius

Diversity at CERN CERNs Mission Push forward the frontiers of knowledge Develop new

CMS Programme India CERN LHC CMS India-CMS Kajari Mazumdar ( on behalf of

Cross section measurements of minor actinides a the n_TOF-Ph2 experiment at CERN Daniel Cano Ott

CMS Collabora*on Board Opening Plenary CMS Week

Mapping Hot Gas in the Universe using the Sunyaev-Zeldovich Effect Eiichiro Komatsu

Marek Domaracky CERN IT Vidyo@CERN CERN WebRTC Future 3 VIDYO@CERN: SCALE AND

The nuclear EMC effect in the deep-inelastic and the resonance region Sergey Kulagin Institute

Sambuz

Useful Links

Newsletter

Mail Us