 
Improving End2End Performance for the Columbia Supercomputer
CENIC `07: Making Waves • March 12-14, 2007 • La Jolla, CA • cenic07.cenic.org
Mark Foster, Computer Sciences Corp., NASA Ames Research Center
March 2007
This work is supported by the NASA Advanced Supercomputing Division under Task Order A61812D (ITOP Contract DTTS59-99-D-00437/TO #A61812D) with Advanced Management Technology Incorporated (AMTI).
end2end for Columbia
• overview
• Columbia system
• LAN
• WAN
• e2e efforts
  – what we observed
  – constraints, and tools used
  – impact of efforts
• sample applications
  – earth, astro, science, aero, spaceflight
overview
• scientists using large scale supercomputing resources to investigate problems: work is time critical
  – limited computational cycles allocated
  – results needed to feed into other projects
• 100's of GBs to multiple TB data sets now common and increasing
  – data transfer performance becomes crucial bottleneck
• many scientists from many locations/hosts: no simple solution
• bringing network engineers to the edge, we have been able to improve the transfer rates from a few Mbps to a few Gbps for some applications
• system utilization now often well above 90%
shared challenges
• Chris Thomas @ UCLA:
  – 10 Mbps end hosts, OC3 campus/group access
  – asymmetric (campus) path
  – firewall performance consideration
  – end users: not network engineers
• Russ Hobby on Cyber Infrastructure:
  – it is a system (complex, but not as complex as earth/ocean as John Delaney described)
  – composition of components that must work together (efficiently)
  – not all problems are purely technical
the Columbia supercomputer
• 8th fastest supercomputer in world: 62 Tflops peak
• supporting wide variety of projects
  – >160 projects; >900 accounts; ~150 simultaneous logins
  – users from across and outside NASA
  – 24x7 support
• effective architecture: easier application scaling for high-fidelity, shorter time-to-solution, higher throughput
  – 20 x 512p/1TB shared memory nodes
  – some applications scaling to 2048p and above
• fast build: order to full ops in 120 days; dedicated Oct. 2004
  – unique partnership with industry (SGI, Intel, Voltaire)
system summary
  Systems: SGI Altix 3700, 3700-BX2 and 4700
  Processors: 10,240 Intel Itanium 2 (single and dual core)
  Global Shared Memory: 20 Terabytes
  Front-End: SGI Altix 3700 (64 proc.)
  Online Storage: 1.1 Petabytes RAID
  Offline Storage: 6 Petabytes STK Silo
  Internode Comm: Infiniband
  Hi-Speed Data Transfer: 10 Gigabit Ethernet
  2048p subcluster: NUMAlink4 interconnect
Columbia configuration
(diagram; labels recovered from the figure)
• Front Ends (3): CFE1, CFE2, CFE3; 28p Altix 3700
• Hyperwall Access (HWvis): 16p Altix 3700
• Networking
  – 10GigE switches; 10GigE cards (1 per 512p)
  – InfiniBand switch (288 port); InfiniBand cards (6 per 512p)
• Compute nodes (single system image), 20 x 512p
  – Altix 3700 (A): 11 nodes
  – Altix 3700 BX2 (T): 8 nodes
  – Altix 4700 (M): 1 node
  – Altix 3700 BX2 2048p NUMAlink subcluster
  – Capacity System: 50 TF; Capability System: 13 TF
• Storage Area Network
  – Brocade FC switches, 2 x 128 port
• Online Storage (1,040 TB), 24 racks
  – Fibre Channel RAID and SATA RAID arrays (units labeled 20 TB, 35 TB, and 75 TB)
Columbia access LAN
(diagram; labels recovered from the figure)
• Columbia nodes: C1 ... Cn
• interconnect and aggregation: 6500 switches
• access and border peering: 6500, PE
• external peers: NISN, NREN
wide area network - NREN
10G waves at the core, dark fiber to end sites
(map; labels recovered from the figure)
• sites and peering labels: AIX, NGIX-E, NGIX-W, PacWave, ESNet, McLean, VA, GSFC, ARC, NLR, MAX/DRAGON, Sunnyvale, CA, MATP/ELITE, CENIC, Atlanta, GA, SLR, LRC, Los Angeles, CA, Norfolk, VA, MSFC, JPL, Huntsville, AL (in progress)
• legend: Ext Peering Points, Distributed Exch, NLR/Regional net, 10 GigE, 1 GigE
• National and Regional optical networks provide links over which 10 Gbps and 1 Gbps waves can be established.
• Distributed exchange points provide interconnect in metro and regional areas to other networks and research facilities.
end2end efforts
what we observed
  – long running but low data rates (Kbps, Mbps)
  – very slow bulk file moves reported
  – bad mix: untuned systems, small windows, small mtu, long rtt
(insert historical graph here)
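One rough way to see why small mtu and long rtt combine badly is the well-known Mathis et al. approximation for loss-limited TCP throughput, rate ≤ (MSS / RTT) · (C / √p). The sketch below is illustrative only: the 85 ms RTT, the loss rate, and the MSS values (1460 and 8960 bytes, i.e. 1500- and 9000-byte MTUs minus 40 bytes of IPv4/TCP headers) are assumptions, not measurements from this work.

```python
import math


def mathis_mbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-stream TCP throughput (Mbps).

    Uses the Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)),
    with C = sqrt(3/2). This is a model, not a measurement.
    """
    c = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate)) / 1e6


RTT_S = 0.085   # assumed round-trip time (85 ms)
LOSS = 1e-5     # hypothetical packet loss rate, chosen for illustration

standard = mathis_mbps(1460, RTT_S, LOSS)  # 1500-byte MTU (assumed)
jumbo = mathis_mbps(8960, RTT_S, LOSS)     # 9000-byte MTU (assumed)

print(f"1500-byte MTU: about {standard:.0f} Mbps per stream")
print(f"9000-byte MTU: about {jumbo:.0f} Mbps per stream")
```

Under this model the bound scales linearly with MSS and inversely with RTT, so a larger mtu or a shorter path raises the ceiling for every stream.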
end2end efforts
constraints, and tools used
  – facilities leveraging web100 could be really helpful, but…
  – local policies/procedures sometimes preclude helpful changes
    • system admin practices: “standardization” for lowest common denominator, “fear” of impact (mtu, buffer size increase)
    • IT security policies, firewalls: “just say no”
    • WAN performance issues: “we don’t observe a problem on our LAN”
  – path characterization: ndt, npad, nuttcp, iperf, ping, traceroute
    • solve obvious issues early (duplex mismatch, mtu limitation, poor route)
  – flow monitoring: netflow, flow-tools (Fullmer), FlowViewer (Loiacono)
  – bulk transfer: bbftp (IN2P3/Gilles Farrache), bbscp (NASA), hpnssh (PSC/Rapier), starting to look at others: VFER & UDT
initial investigations
• scp 2-5 Mbps (or worse): cpu limits, and tcp limits
  – can achieve much better results with HPN-SSH (enables tcp window scaling), and by using RC4 encryption (much more efficient on some processors; use “openssl speed” to assess the cpu’s performance)
  – even with these improvements, still need to use 8-12 concurrent streams to get maximum performance with small MTUs
• nuttcp shows udp performance near line rate in many cases, but tcp performance still lacking
  – examine tcp behavior (ndt, npad, tcptrace)
  – tcp buffer sizes main culprit in large RTT environment; small amount of loss can be hard to detect/resolve
  – mid-span (or nearby) test platforms helpful
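The role of window size in these low rates can be sketched with simple arithmetic: a TCP stream can have at most one window of data in flight per round trip, so throughput is bounded by window / RTT. The example below assumes an 85 ms RTT and the 65,535-byte limit that applies when window scaling is not in use; it is only an upper bound under those assumptions and ignores cpu cost, SSH channel buffering, and loss.

```python
def window_limited_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on one TCP stream: one full window per round trip (Mbps)."""
    return window_bytes * 8 / rtt_s / 1e6


RTT_S = 0.085             # assumed round-trip time (85 ms)
UNSCALED_WINDOW = 65535   # largest TCP window without window scaling

single = window_limited_mbps(UNSCALED_WINDOW, RTT_S)
print(f"1 stream, unscaled window: at most {single:.2f} Mbps")

# Concurrent streams each get their own window, so the bound adds up.
for streams in (8, 12):
    print(f"{streams} streams: at most {streams * single:.1f} Mbps")
```

This is one reason concurrent streams help when per-stream windows are small: each stream contributes its own window to the total amount of data in flight.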
recommend TCP adjustments
typical linux example for 85ms rtt:
  # Set maximum TCP window sizes to 100 megabytes
  net.core.rmem_max = 104857600
  net.core.wmem_max = 104857600
  # Set minimum, default, and maximum TCP buffer limits
  net.ipv4.tcp_rmem = 4096 524288 104857600
  net.ipv4.tcp_wmem = 4096 524288 104857600
  # Set maximum network input buffer queue length
  net.core.netdev_max_backlog = 30000
  # Disable caching of TCP congestion state (2.6 only)
  # (workaround a bug in some Linux stacks)
  net.ipv4.tcp_no_metrics_save = 1
  # Ignore ARP requests for local IP received on wrong interface
  net.ipv4.conf.all.arp_ignore = 1
ref: “Enabling High Performance Data Transfers” www.psc.edu/networking/projects/tcptune
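A quick check on the 100-megabyte maximum is the bandwidth-delay product (BDP): the buffer needed to keep a path full is roughly rate × RTT. The sketch below uses the 85 ms RTT from the example; the 1 Gbps and 10 Gbps link rates are assumptions chosen to match the 1 GigE and 10 GigE paths mentioned elsewhere in this talk.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the path."""
    return rate_bps * rtt_s / 8


RTT_S = 0.085                 # round-trip time from the example (85 ms)
CONFIGURED_MAX = 104857600    # maximum buffer from the sysctl settings

for rate_bps in (1e9, 10e9):  # assumed link rates: 1 GigE and 10 GigE
    needed = bdp_bytes(rate_bps, RTT_S)
    print(f"{rate_bps / 1e9:.0f} Gbps x 85 ms -> {needed / 1e6:.2f} MB in flight")

# Highest single-stream rate the configured maximum can sustain at this RTT.
max_rate_gbps = CONFIGURED_MAX * 8 / RTT_S / 1e9
print(f"{CONFIGURED_MAX} bytes supports up to about {max_rate_gbps:.2f} Gbps")
```

Under these assumptions the configured maximum sits just below the BDP of a 10 Gbps path at 85 ms, so it is sized for a single stream running close to 10 GigE line rate.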
recommend ssh changes
• at least OpenSSH 4.3p2, using OpenSSL 0.9.8b (May 2006)
• use faster ciphers than the default (RC4 leverages processor-specific coding)
• OpenSSH should be patched (HPN-SSH) to support large buffers and congestion window
www.psc.edu/networking/projects/hpn-ssh
firewall impacts
• prior to firewall upgrade: 199 - 644 Mbps
• after firewall upgrade: 792 - 980 Mbps
end host aggregate improvement
• host performance using multiple streams, with some tuning
  – 8 streams: 257 Mbps
• after more tuning, firewall upgrade
  – 4 streams: 4.7 Gbps
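To put the two aggregate rates in terms a user would notice, the sketch below converts them into transfer times. The 1 TB data set size (10^12 bytes) is an assumption for illustration, and the calculation ignores protocol overhead and disk limits.

```python
def transfer_hours(size_bytes: float, rate_bps: float) -> float:
    """Time to move size_bytes at a sustained rate, in hours."""
    return size_bytes * 8 / rate_bps / 3600


DATA_SET_BYTES = 1e12   # assumed data set size: 1 TB

before = transfer_hours(DATA_SET_BYTES, 257e6)   # 8 streams, some tuning
after = transfer_hours(DATA_SET_BYTES, 4.7e9)    # 4 streams, more tuning

print(f"at 257 Mbps: {before:.1f} hours")
print(f"at 4.7 Gbps: {after * 60:.0f} minutes")
```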