  1. LCSC 5th Annual Workshop on Linux Clusters for Super Computing, October 18-21, 2004, Linköping University, Sweden.
     Cluster Computing: You've Come A Long Way In A Short Time
     Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory

     Vibrant Field for High Performance Computers
     ♦ Cray X1
     ♦ SGI Altix
     ♦ IBM Regatta
     ♦ IBM Blue Gene/L
     ♦ IBM eServer
     ♦ Sun
     ♦ HP
     ♦ Bull NovaScale
     ♦ Fujitsu PrimePower
     ♦ Hitachi SR11000
     ♦ NEC SX-7
     ♦ Apple
     ♦ Coming soon:
       - Cray RedStorm
       - Cray BlackWidow
       - NEC SX-8

  2. The Top500 List: H. Meuer, H. Simon, E. Strohmaier, & JD
     - Listing of the 500 most powerful computers in the world
     - Yardstick: Rmax from LINPACK MPP (Ax = b, dense problem), TPP performance rate
     - Updated twice a year: at SC'xy in the States in November, and at the meeting in Heidelberg, Germany in June
     - All data available from www.top500.org

     Architecture/Systems Continuum (tightly coupled to loosely coupled)
     ♦ Custom processor with custom interconnect
       - Cray X1
       - NEC SX-7
       - IBM Regatta
       - IBM Blue Gene/L
     ♦ Commodity processor with custom interconnect
       - SGI Altix (Intel Itanium 2)
       - Cray Red Storm (AMD Opteron)
     ♦ Commodity processor with commodity interconnect
       - Clusters: Pentium, Itanium, Opteron, Alpha with GigE, Infiniband, Myrinet, Quadrics
       - NEC TX7, IBM eServer, Bull NovaScale 5160
     [Chart: Top500 share (0-100%) of Custom, Hybrid, and Commodity systems, Jun 1993 to Jun 2004]
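The yardstick is simple to state: solve a dense linear system and count floating-point operations per second. Below is a minimal, hypothetical sketch in C of that measurement using LAPACK's dgesv and the standard LINPACK operation count of 2/3 n^3 + 2 n^2 flops. The real Rmax comes from the tuned, distributed HPL benchmark, not from a toy like this (assumes a LAPACK installation; link with -llapack):

```c
/* Toy LINPACK-style measurement: solve Ax = b with LAPACK's dgesv
 * and report Gflop/s.  Illustrates the Top500 yardstick only; the
 * reported Rmax values come from the tuned HPL benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* LAPACK Fortran routine: LU-factor A and solve A x = b. */
extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void)
{
    const int n = 2000, nrhs = 1;
    double *a  = malloc((size_t)n * n * sizeof *a);
    double *b  = malloc((size_t)n * sizeof *b);
    int *ipiv  = malloc((size_t)n * sizeof *ipiv);
    int info;

    for (int i = 0; i < n * n; i++) a[i] = rand() / (double)RAND_MAX;
    for (int i = 0; i < n; i++)     b[i] = rand() / (double)RAND_MAX;

    clock_t t0 = clock();
    dgesv_(&n, &nrhs, a, &n, ipiv, b, &n, &info);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Standard LINPACK operation count: 2/3 n^3 + 2 n^2 flops. */
    double flops = 2.0 / 3.0 * n * n * n + 2.0 * (double)n * n;
    printf("n = %d, info = %d, %.2f s, %.2f Gflop/s\n",
           n, info, secs, flops / secs / 1e9);
    free(a); free(b); free(ipiv);
    return 0;
}
```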

  3. "It is really difficult to tell when an exponential is happening... by the time you get enough data points, it is too late." (Larry Smarr)

     Top500 Performance by Manufacturer, June 2004
     [Pie chart: IBM 51%, HP 19%, NEC 6%, Others 5%, SGI 3%, Dell 3%, Linux Networx 3%, California Digital Corp. 2%, Cray Inc. 2%, Self-made 2%, Fujitsu 2%, Hitachi 1%, Sun 1%, Intel 0%]

  4. The Golden Age of HPC Linux
     ♦ The adoption rate of Linux HPC is phenomenal!
       - Linux in the Top500 is (was) doubling every 12 months
       - Linux adoption is not driven by bottom feeders
       - Adoption is actually faster at the ultra-scale!
     ♦ Most supercomputers run Linux
     ♦ Adoption rate driven by several factors:
       - Linux is stable: often the default platform for CS research
       - Essentially no barrier to entry
       - Effort to learn the programming paradigm, libraries, development environment, and tools is preserved across many orders of magnitude
       - Stable, complete, portable middleware software stacks: MPICH, MPI-IO, PVFS, PBS, math libraries, etc.

     Commodity Processors
     ♦ Intel Pentium Xeon: 3.2 GHz, peak = 6.4 Gflop/s; Linpack 100 = 1.7 Gflop/s; Linpack 1000 = 3.1 Gflop/s
     ♦ AMD Opteron: 2.2 GHz, peak = 4.4 Gflop/s; Linpack 100 = 1.3 Gflop/s; Linpack 1000 = 3.1 Gflop/s
     ♦ Intel Itanium 2: 1.5 GHz, peak = 6 Gflop/s; Linpack 100 = 1.7 Gflop/s; Linpack 1000 = 5.4 Gflop/s
     ♦ HP Alpha EV68: 1.25 GHz, peak = 2.5 Gflop/s
     ♦ HP PA RISC
     ♦ Sun UltraSPARC IV
     ♦ MIPS R16000
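The peak figures on this slide are just clock rate times floating-point operations per cycle: the Xeon, Opteron, and Alpha retire 2 flops per cycle, while Itanium 2's two FMA units retire 4, which is how a 1.5 GHz Itanium 2 matches a 3.2 GHz Xeon. A small sketch of the arithmetic; the flops-per-cycle values are inferred from the slide's peak/clock ratios, not stated on it:

```c
/* Peak Gflop/s = clock rate (GHz) x floating-point ops per cycle. */
#include <stdio.h>

int main(void)
{
    struct { const char *cpu; double ghz; int flops_per_cycle; } p[] = {
        { "Intel Pentium Xeon", 3.2,  2 },  /* 2 flops/cycle via SSE2  */
        { "AMD Opteron",        2.2,  2 },
        { "Intel Itanium 2",    1.5,  4 },  /* 2 FMA units = 4 flops   */
        { "HP Alpha EV68",      1.25, 2 },
    };
    for (int i = 0; i < 4; i++)
        printf("%-20s %.2f GHz x %d = %.1f Gflop/s peak\n",
               p[i].cpu, p[i].ghz, p[i].flops_per_cycle,
               p[i].ghz * p[i].flops_per_cycle);
    return 0;
}
```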

  5. Commodity Interconnects
     ♦ Gig Ethernet (bus)
     ♦ Myrinet (Clos)
     ♦ Infiniband (fat tree)
     ♦ QsNet (fat tree)
     ♦ SCI (torus)

     Interconnect       Topology   Cost/NIC  Cost/Sw per node  Cost/Node  MPI Lat (us) / 1-way (MB/s) / Bi-Dir (MB/s)
     Gigabit Ethernet   Bus        $   50    $   50            $  100     30  / 100 / 150
     SCI                Torus      $1,600    $    0            $1,600     5   / 300 / 400
     QsNetII (R)        Fat Tree   $1,200    $1,700            $2,900     3   / 880 / 900
     QsNetII (E)        Fat Tree   $1,000    $  700            $1,700     3   / 880 / 900
     Myrinet (D card)   Clos       $  595    $  400            $  995     6.5 / 240 / 480
     Myrinet (E card)   Clos       $  995    $  400            $1,395     6   / 450 / 900
     IB 4x              Fat Tree   $1,000    $  400            $1,400     6   / 820 / 790

     How Big Is Big?
     ♦ Every 10X brings new challenges
       - 64 processors was once considered large; it hasn't been "large" for quite a while
       - 1024 processors is today's "medium" size
       - 2048-8096 processors is today's "large"; we're struggling even here
     ♦ 100K processor systems
       - are in construction
       - we have fundamental challenges...
       - ...and no integrated research program
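Latency and bandwidth numbers like those in the table are conventionally measured with an MPI ping-pong test: rank 0 sends a buffer to rank 1, which echoes it back; half the round-trip time gives the one-way latency for small messages and the bandwidth for large ones. A minimal sketch follows (run with at least two ranks; this illustrates the method, it is not the benchmark the table was produced with):

```c
/* Minimal MPI ping-pong: half the round-trip time gives the one-way
 * latency for small messages and the bandwidth for large ones. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int bytes = 1; bytes <= 1 << 20; bytes <<= 4) {
        char *buf = malloc(bytes);
        const int reps = 100;
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double one_way = (MPI_Wtime() - t0) / reps / 2.0;
        if (rank == 0)
            printf("%8d bytes: %6.2f us, %8.2f MB/s\n", bytes,
                   one_way * 1e6, bytes / one_way / 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}
```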

  6. On the Horizon
     ♦ 10K CPU SGI Columbia @ NASA
     ♦ 10K CPU Cray Red Storm @ Sandia
     ♦ 130K CPU IBM BG/L @ LLNL
     ♦ First 10,000 CPU Linux cluster makes the Top500

     IBM BlueGene/L packaging hierarchy
     ♦ Chip (2 processors): 2.8/5.6 Gflop/s, 4 MB cache
     ♦ Compute card (2 chips, 2x1x1): 4 processors, 5.6/11.2 Gflop/s, 1 GB DDR
     ♦ Node card (16 compute cards = 32 chips, 4x4x2): 64 processors, 90/180 Gflop/s, 16 GB DDR
     ♦ Rack (32 node boards, 8x8x16): 2,048 processors, 2.9/5.7 Tflop/s, 0.5 TB DDR
     ♦ System (64 racks, 64x32x32): 131,072 processors, 180/360 Tflop/s, 32 TB DDR
     ♦ "Fastest Computer": BG/L at 700 MHz, 16K processors (8 racks); Peak: 45.9 Tflop/s; Linpack: 36.0 Tflop/s (78% of peak)
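The packaging hierarchy multiplies out exactly as the slide states; a quick sketch of the arithmetic (the paired Gflop/s figures count one vs. both processors per chip computing):

```c
/* BlueGene/L packaging arithmetic, straight from the slide:
 * chip -> compute card -> node card -> rack -> full system. */
#include <stdio.h>

int main(void)
{
    const int procs_per_chip  = 2;
    const int chips_per_card  = 2;   /* compute card */
    const int cards_per_board = 16;  /* node card    */
    const int boards_per_rack = 32;
    const int racks           = 64;

    int chips = chips_per_card * cards_per_board * boards_per_rack * racks;
    printf("processors: %d\n", chips * procs_per_chip);   /* 131072 */

    /* 2.8 Gflop/s per chip with one processor computing, 5.6 with
     * both, giving the slide's 180/360 Tflop/s system peak. */
    printf("peak: %.1f / %.1f Tflop/s\n",
           chips * 2.8e-3, chips * 5.6e-3);   /* ~183.5 / ~367 */
    return 0;
}
```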

  7. BlueGene/L Interconnection Networks
     ♦ 3-Dimensional Torus
       - Interconnects all compute nodes (65,536)
       - Virtual cut-through hardware routing
       - 1.4 Gb/s on all 12 node links (2.1 GB/s per node)
       - 1 µs latency between nearest neighbors, 5 µs to the farthest
       - 4 µs latency for one hop with MPI, 10 µs to the farthest
       - Communications backbone for computations
       - 0.7/1.4 TB/s bisection bandwidth, 68 TB/s total bandwidth
     ♦ Global Tree
       - Interconnects all compute and I/O nodes (1024)
       - One-to-all broadcast functionality
       - Reduction operations functionality
       - 2.8 Gb/s of bandwidth per link
       - Latency of one-way tree traversal 2.5 µs
       - ~23 TB/s total binary tree bandwidth (64k machine)
     ♦ Ethernet
       - Incorporated into every node ASIC
       - Active in the I/O nodes (1:64)
       - All external communication (file I/O, control, user interaction, etc.)
     ♦ Low-Latency Global Barrier and Interrupt
       - Latency of round trip 1.3 µs
     ♦ Control Network

     OS for IBM's BG/L
     ♦ Service node: Linux SuSE SLES 8
     ♦ Front-end nodes: Linux SuSE SLES 9
     ♦ I/O nodes: an embedded Linux
     ♦ Compute nodes: home-brew OS
     ♦ Trend: extremely large systems run an "OS suite"
       - The functional-decomposition trend lends itself toward a customized, optimized point-solution OS
       - Hierarchical organization requires smart software to manage topology, call forwarding, and collective operations
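A useful way to see where the "farthest node" latency comes from: on a torus every dimension wraps around, so the hop distance along an axis is at most half the axis length, and the farthest node in the 64x32x32 torus is 32 + 16 + 16 = 64 hops away. A small sketch of that distance calculation:

```c
/* Hop count between two nodes of a 3D torus: each dimension wraps,
 * so per-axis distance is min(|dx|, dim - |dx|). */
#include <stdio.h>
#include <stdlib.h>

static int torus_axis(int a, int b, int dim)
{
    int d = abs(a - b);
    return d < dim - d ? d : dim - d;
}

static int torus_hops(const int a[3], const int b[3], const int dim[3])
{
    return torus_axis(a[0], b[0], dim[0])
         + torus_axis(a[1], b[1], dim[1])
         + torus_axis(a[2], b[2], dim[2]);
}

int main(void)
{
    int dim[3]    = { 64, 32, 32 };   /* BlueGene/L torus */
    int origin[3] = { 0, 0, 0 };
    int far[3]    = { 32, 16, 16 };   /* antipode of the origin */
    printf("farthest node: %d hops\n", torus_hops(origin, far, dim)); /* 64 */
    return 0;
}
```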

  8. Sandia National Lab's Red Storm
     • Red Storm is a supercomputer system leveraging over 10,000 AMD Opteron™ processors connected by an innovative high-speed, high-bandwidth 3D mesh interconnect designed by Cray.
     • Cray was awarded $93M to build the Red Storm system to support the Department of Energy's nuclear stockpile stewardship program for advanced 3D modeling and simulation.
     • Scientists at Sandia National Lab helped with the architectural design of the Red Storm supercomputer.

     Red Storm System Overview
     • 40 Tflop/s peak performance
     • 108 compute node cabinets, 16 service and I/O node cabinets, and 16 Red/Black switch cabinets
       - 10,368 compute processors: 2.0 GHz AMD Opteron™
       - 512 service and I/O processors (256P for red, 256P for black)
       - 10 TB DDR memory
     • 240 TB of disk storage (120 TB for red, 120 TB for black)
     • MPP system software
       - Linux + lightweight compute-node operating system
       - Managed and used as a single system
       - Easy-to-use, common programming environment
       - High-performance file system
       - Low-overhead RAS and message passing
     • Approximately 3,000 ft² including disk systems
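(As a quick consistency check on the peak figure, assuming the Opteron's usual 2 flops per clock: 10,368 processors x 2.0 GHz x 2 flops/cycle ≈ 41.5 Tflop/s, in line with the quoted 40 Tflop/s.)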
