
SLIDE 1

March 12-14, 2007 • La Jolla, CA • cenic07.cenic.org

CENIC `07

MAKING WAVES

Improving End2End Performance for the Columbia Supercomputer

Mark Foster Computer Sciences Corp. NASA Ames Research Center

March 2007

This work is supported by the NASA Advanced Supercomputing Division under Task Order A61812D (ITOP Contract DTTS59-99-D-00437/TO #A61812D) with Advanced Management Technology Incorporated (AMTI).

SLIDE 2

end2end for Columbia

  • overview
  • Columbia system
  • LAN
  • WAN
  • e2e efforts
    – what we observed
    – constraints, and tools used
    – impact of efforts
  • sample applications
    – earth, astro, science, aero, spaceflight

SLIDE 3

overview

  • scientists using large-scale supercomputing resources to investigate problems: work is time critical
    – limited computational cycles allocated
    – results needed to feed into other projects
  • 100s of GBs to multiple-TB data sets now common and increasing
    – data transfer performance becomes a crucial bottleneck
  • many scientists from many locations/hosts: no simple solution
  • by bringing network engineers to the edge, we have been able to improve transfer rates from a few Mbps to a few Gbps for some applications
  • system utilization now often well above 90%
SLIDE 4

shared challenges

  • Chris Thomas @ UCLA:
    – 10 Mbps end hosts, OC3 campus/group access
    – asymmetric (campus) path
    – firewall performance considerations
    – end users: not network engineers
  • Russ Hobby on Cyber Infrastructure:
    – it is a system (complex, but not as complex as earth/ocean as John Delaney described)
    – composition of components that must work together (efficiently)
    – not all problems are purely technical

SLIDE 5

the Columbia supercomputer

Systems: SGI Altix 3700, 3700-BX2 and 4700
Processors: 10,240 Intel Itanium 2 (single and dual core)
Global Shared Memory: 20 Terabytes
Front-End: SGI Altix 3700 (64 proc.)
Online Storage: 1.1 Petabytes RAID
Offline Storage: 6 Petabytes STK Silo
Internode Comm: InfiniBand
Hi-Speed Data Transfer: 10 Gigabit Ethernet
2048p subcluster: NUMAlink4 interconnect

  • 8th fastest supercomputer in the world: 62 Tflops peak
  • supporting a wide variety of projects
    – >160 projects; >900 accounts; ~150 simultaneous logins
    – users from across and outside NASA
    – 24x7 support
  • effective architecture: easier application scaling for high fidelity, shorter time-to-solution, higher throughput
    – 20 x 512p/1TB shared-memory nodes
    – some applications scaling to 2048p and above
  • fast build: order to full ops in 120 days; dedicated Oct. 2004
    – unique partnership with industry (SGI, Intel, Voltaire)

SLIDE 6

Columbia configuration

[configuration diagram] Capability system: 13 TF; capacity system: 50 TF

  • Front ends (3): 28p Altix 3700 (CFE1-CFE3); hyperwall access (HWvis): 16p Altix 3700
  • Compute nodes (512p each): Altix 3700 (A), Altix 3700 BX2 (T), Altix 4700 (M); Altix 3700 BX2 2048p NUMAlink node (single system image)
  • Networking: 10GigE switches; 10GigE cards (1 per 512p); InfiniBand switch (288 port); InfiniBand cards (6 per 512p)
  • Storage area network: Brocade switches (2x128 port); FC switch (128p)
  • Online storage (1,040 TB, 24 racks): SATA RAID (35-75 TB units) and Fibre Channel RAID (20 TB units)

SLIDE 7

Columbia access LAN

[LAN diagram] Columbia nodes (C1…Cn) connect through 6500 interconnect and aggregation switches to a PE 6500 for access and border peering, with external peers NISN and NREN.

SLIDE 8

wide area network - NREN

10G waves at the core, dark fiber to end sites

[WAN map] External peering points, distributed exchanges, and NLR/regional nets provide 10 GigE and 1 GigE links among ARC, JPL, LRC, GSFC, and MSFC (Sunnyvale CA, Los Angeles CA, McLean VA, Norfolk VA, Huntsville AL, Atlanta GA), with peers including ESnet, PacWave, CENIC, NLR, NGIX-E, MATP/ELITE, SLR, and MAX/DRAGON.

  • National and regional optical networks provide links over which 10 Gbps and 1 Gbps waves can be established.
  • Distributed exchange points provide interconnects in metro and regional areas to other networks and research facilities: NGIX-W, AIX, GSFC (in progress).

SLIDE 9

end2end efforts

what we observed

– long-running but low data rates (Kbps, Mbps)
– very slow bulk file moves reported
– bad mix: untuned systems, small windows, small MTU, long RTT

(insert historical graph here)

SLIDE 10

end2end efforts

constraints, and tools used

– facilities leveraging web100 could be really helpful, but…
– local policies/procedures sometimes preclude helpful changes
  • system admin practices: “standardization” for the lowest common denominator, “fear” of impact (MTU, buffer size increases)
  • IT security policies, firewalls: “just say no”
  • WAN performance issues: “we don’t observe a problem on our LAN”
– path characterization: ndt, npad, nuttcp, iperf, ping, traceroute
  • solve obvious issues early (duplex mismatch, MTU limitation, poor route)
– flow monitoring: netflow, flow-tools (Fullmer), FlowViewer (Loiacono)
– bulk transfer: bbftp (IN2P3/Gilles Farrache), bbscp (NASA), hpn-ssh (PSC/Rapier); starting to look at others: VFER & UDT

SLIDE 11

initial investigations

  • scp 2-5 Mbps (or worse): cpu limits, and tcp limits
    – can achieve much better results with HPN-SSH (enables TCP window scaling), and by using RC4 encryption (much more efficient on some processors; use “openssl speed” to assess the cpu’s performance)
    – even with these improvements, still need 8-12 concurrent streams to get maximum performance with small MTUs
  • nuttcp shows UDP performance near line rate in many cases, but TCP performance still lacking
    – examine TCP behavior (ndt, npad, tcptrace)
    – TCP buffer sizes are the main culprit in large-RTT environments; small amounts of loss can be hard to detect/resolve
    – mid-span (or nearby) test platforms helpful
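The window arithmetic behind these observations can be sketched quickly. A minimal illustration (the 85 ms RTT matches the tuning example on the next slide; the rates are illustrative, not measured values from this deck):

```python
# TCP throughput on a long path is capped by window_size / RTT,
# and filling the pipe needs a buffer of at least bandwidth * RTT.

def bdp_bytes(rate_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return int(rate_bps * rtt_s / 8)

def max_rate_bps(window_bytes: int, rtt_s: float) -> float:
    """Throughput ceiling imposed by a fixed TCP window."""
    return window_bytes * 8 / rtt_s

rtt = 0.085  # 85 ms cross-country round trip

# A classic 64 KB window caps a single stream near 6 Mbps --
# the same ballpark as the slow scp transfers observed:
print(max_rate_bps(64 * 1024, rtt) / 1e6)  # ~6.2 Mbps

# Filling a 1 Gbps path at this RTT needs ~10.6 MB of window:
print(bdp_bytes(int(1e9), rtt) / 1e6)      # ~10.6 MB
```

This is also why 8-12 parallel streams help: each stream carries its own window, so the aggregate in-flight data approaches the path's bandwidth-delay product even when per-stream windows are small.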

SLIDE 12

recommended TCP adjustments

typical linux example for 85ms rtt:

# Set maximum TCP window sizes to 100 megabytes
net.core.rmem_max = 104857600
net.core.wmem_max = 104857600
# Set minimum, default, and maximum TCP buffer limits
net.ipv4.tcp_rmem = 4096 524288 104857600
net.ipv4.tcp_wmem = 4096 524288 104857600
# Set maximum network input buffer queue length
net.core.netdev_max_backlog = 30000
# Disable caching of TCP congestion state (2.6 only)
# (workaround for a bug in some Linux stacks)
net.ipv4.tcp_no_metrics_save = 1
# Ignore ARP requests for local IP received on wrong interface
net.ipv4.conf.all.arp_ignore = 1

ref: “Enabling High Performance Data Transfers” www.psc.edu/networking/projects/tcptune
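These sysctl limits only raise the ceiling; an application must still request (or autotune into) a large per-socket buffer. A minimal sketch using the standard SO_RCVBUF/SO_SNDBUF socket options (not from the slides; the kernel clamps requests to the rmem_max/wmem_max values set above):

```python
import socket

def make_tuned_socket(bufsize: int = 8 * 1024 * 1024) -> socket.socket:
    """Create a TCP socket that asks for large send/receive buffers."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Requests beyond net.core.rmem_max / wmem_max are silently clamped,
    # which is why the sysctl limits above matter.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    return s

s = make_tuned_socket()
# getsockopt reports what was actually granted (Linux roughly doubles
# the request to account for bookkeeping overhead, up to the clamp).
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
s.close()
```

Tools such as bbftp and HPN-SSH do the equivalent internally; untuned applications are stuck with the 512 KB default from tcp_rmem/tcp_wmem above.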

SLIDE 13

recommended ssh changes

  • use at least OpenSSH 4.3p2, with OpenSSL 0.9.8b (May 2006)
  • use faster ciphers than the default (RC4 leverages processor-specific coding)
  • patch OpenSSH with HPN-SSH to support large buffers and congestion windows: www.psc.edu/networking/projects/hpn-ssh

SLIDE 14

firewall impacts

Prior to firewall upgrade: 199 - 644 Mbps. After firewall upgrade: 792 - 980 Mbps.

SLIDE 15

end host aggregate improvement

host performance using multiple streams, with some tuning: 8 streams, 257 Mbps; after more tuning and a firewall upgrade: 4 streams, 4.7 Gbps

SLIDE 16

example application increase

[chart] throughput in Mbps:
  • standard file transfer application (SCP): 5.7
  • improved file transfer application (multi-stream bbFTP): 169
  • improved file transfer application & jumbo frames (multi-stream bbFTP + jumbo frames): 308

  • NASA Goddard’s 3-D Cloud-Resolving Model: 54x throughput performance gains
  • collaboration between NREN and the GSFC Scientific & Engineering Network (SEN) and High-End Computing Network (HECN) teams
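The 54x figure decomposes across the two improvements; a quick check against the chart values above:

```python
# Throughput figures from the chart (Mbps).
scp_baseline = 5.7      # standard scp
bbftp_streams = 169.0   # multi-stream bbFTP
bbftp_jumbo = 308.0     # multi-stream bbFTP + jumbo frames

print(round(bbftp_streams / scp_baseline, 1))  # ~29.6x from the better tool + streams
print(round(bbftp_jumbo / bbftp_streams, 1))   # ~1.8x more from jumbo frames
print(round(bbftp_jumbo / scp_baseline))       # ~54x overall
```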

SLIDE 17

factors driving traffic growth

  • increased Columbia usage
  • storage and file system upgrades on Columbia
  • aggressive campaign to work with users to improve performance to Columbia through better file transfer tools, end-system tuning, and user education
  • network bandwidth increases across the NREN wide area network, local area networks, and firewalls

SLIDE 18

impact of e2e efforts

  • trends show aggregate 5 TB/mo increased to more than 20 TB/mo
  • three of the previous five months exceeded 1 TB/day

efforts do not just result in increased bandwidth; improved network performance results in improved capability, increased fidelity, more efficient computing, & better productivity

SLIDE 19

ecco at JPL/MIT - visualization

High Temporal Resolution Visualization Provides New Insights for Ocean Researchers

  • NAS Visualization group completed a 110-hour computational run of the Massachusetts Institute of Technology’s general circulation model, MITgcm, simulating an entire year of ocean dynamics
  • these visualizations are allowing researchers at MIT and NASA’s Jet Propulsion Laboratory to investigate model dynamics with unprecedented temporal resolution

[image: salt concentration, parts per thousand, at 15 m depth]

SLIDE 20

dark matter at UCSC

  • Madau, Diemand, and Kuhlen at UC Santa Cruz simulate the evolution of a dark matter halo
  • projected dark matter density: square maps of the simulated Milky Way-size halo at 13.3 billion years ago, 460 million years after the Big Bang

SLIDE 21

combustion science at LBL

  • Marc Day and collaborators at LBL perform high-fidelity numerical simulations on Columbia
  • results aid in the development of clean, fuel-efficient combustion systems for transportation and stationary power generation

[image: partial period of combustion simulation, colored by the local fuel consumption rate]

SLIDE 22

national combustion code

  • combustor hardware is complex, and the turbulent reacting flow process is complicated (and still not well understood)
  • massively parallel processing via message passing interface speeds up the calculations to acceptable levels: approximately a wall-clock week

[images: partially resolved Navier-Stokes simulations of the GE LM6000 combustor; combustion instabilities in a LOX-methane rocket engine]

SLIDE 23

Ares I stage separation

  • the Ares I rocket launch system concept is similar to the Saturn rocket of the Apollo Program
  • a simulation of a high-altitude stage separation computes the flow of air around the vehicle and the resultant aerodynamic forces

[images: early stages of separation; flowfield around the vehicle post-separation]

SLIDE 24

Orion - crew vehicle

re-entry wake turbulence

SLIDE 25

Orion - crew vehicle

re-entry wake turbulence - IBWAN at SC06 (Henze)

Obsidian longbow IB over NREN via 10 GbE NLR FrameNet between NASA Ames and Tampa

SLIDE 26

NLCS awards - March 2007

4.75 million hours of supercomputing time under NASA’s National Leadership Computing System (NLCS) initiative: computationally intensive research projects of national interest

  • Transition in High-Speed Boundary Layers: Numerical Investigations Using DNS and LES: led by Hermann Fasel, University of Arizona, Tucson: high-fidelity simulations to understand how turbulence starts in high-speed airflow over air vehicles
  • Large Scale URANS/DES Ship Hydrodynamics Computations with CFDShip-Iowa: led by Fredrick Stern, University of Iowa: accelerate code development for viscous ship hydrodynamics simulation
  • Flame Dynamics and Emission Chemistry in High-Pressure Industrial Burners: led by Marcus Day, Lawrence Berkeley National Laboratory: simulate natural gas combustion in power-generation turbines to quantify the mechanisms that control the formation of pollutants
  • Multi-Scale Modeling and Computation of Convective Geophysical Turbulence: led by Keith Julien, University of Colorado, Boulder: new algorithms in large-scale simulations to study the role of global ocean thermohaline circulation (THC) in modulating the world’s climate

SLIDE 27

some tuning references

  • NREN TCP Performance Tuning Guide: www.nren.nasa.gov/tcp_tuning.html (also has links for bbftp, bbscp)
  • other useful guides:
    – WAN Tuning and Troubleshooting: www.internet2.edu/~shalunov/writing/tcp-perf.html
    – Enabling High Performance Data Transfers: www.psc.edu/networking/projects/tcptune
    – TCP Tuning Guide: www-didc.lbl.gov/TCP-tuning

SLIDE 28

thank you

Mark Foster Computer Sciences Corp mafoster@arc.nasa.gov