HPCx: A New Resource for UK Computational Science
(HPCS 2003, Sherbrooke, 14th May 2003)


  1. HPC x : A New Resource for UK Computational Science Mike Ashworth, Ian J. Bush, Martyn F. Guest, Martin Plummer and Andrew G. Sunderland CLRC Daresbury Laboratory, UK m.f.guest@dl.ac.uk and Stephen Booth, David S. Henty, Lorna Smith and Kevin Stratford EPCC, University of Edinburgh, UK http://www.hpcx.ac.uk/

  2. Outline • HPCx Overview – HPCx Consortium – HPCx Technology: Phases 1, 2 and 3 (2002-2007) • Performance Overview of Strategic Applications (applications-driven, not hardware-driven): – Computational Materials – Molecular Simulation – Molecular Electronic Structure – Atomic and Molecular Physics – Computational Engineering – Environmental Science • Evaluation across a range of Current High-End Systems: – IBM SP/p690, SGI Origin 3800/R14k-500, HP/Compaq AlphaServer SC ES45/1000 and Cray T3E/1200E • Summary

  3. HPCx Project overview • A joint venture between the Edinburgh Parallel Computing Centre (EPCC) at the University of Edinburgh and the Daresbury Laboratory of the Central Laboratory of the Research Councils (CLRC) • Project funded at £53M (~$120M) by the UK Government • Established to operate and support the principal academic and research computing service for the UK • Its principal objective is to provide a Capability Computing service to run scientific applications that could not be run on any other available computing platform • Six-year project with defined performance requirements at year 0, year 2 and year 4 so as to match Moore's Law • IBM chosen as the technology partner, with the POWER4-based p690 platform and the "best available interconnect"

  4. Consortium partners • EPCC (University of Edinburgh) – established in 1991 as the University's interdisciplinary focus for high-performance computing and its commercial exploitation arm – has hosted specialised HPC services for the UK's QCD community since 1989; a 5 TFlop/s QCDOC system is due in 2003 in a project with Columbia University, IBM and Brookhaven National Laboratory – operated and supported UK national services on Cray T3D and T3E systems from 1994 until 2002 • CLRC (Daresbury Laboratory) – HPC service provider to the UK academic community for more than 25 years – research, development and support centre for leading-edge academic engineering and physical science simulation codes – distributed computing support centre for COTS processor and network technologies, evaluating scalability and performance – UK Grid support centre

  5. HPCx Technology Phase 1 (Dec. 2002): 3 TFlop/s Rmax Linpack – 40 Regatta-H SMP compute systems (1.28 TB memory in total) • 32 x 1.3 GHz processors, 32 GB memory; 4 x 8-way LPARs – 2 Regatta-H I/O systems • 16 x 1.3 GHz processors (Regatta-HPC), 4 GPFS LPARs • 2 HSM/backup LPARs, 18 TB EXP500 fibre-channel global filesystem – Switch interconnect • existing SP Switch2 with "Colony" PCI adapters in all LPARs (~20 µs latency, 350 MB/s bandwidth) • each compute node has two connections into the switch fabric (dual plane) • 160 x 8-way compute nodes in total – Ranked #9 in the TOP500 list (November 2002)
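A rough cross-check of these figures (not from the slides, and assuming the usual 4 flops/cycle per POWER4 processor from its two fused multiply-add units):

$$R_{\mathrm{peak}} = 40 \times 32 \times 1.3\ \mathrm{GHz} \times 4\ \mathrm{flop/cycle} \approx 6.7\ \mathrm{TFlop/s}, \qquad R_{\mathrm{max}}/R_{\mathrm{peak}} \approx 3.0/6.7 \approx 45\%$$

so the 3 TFlop/s Linpack figure corresponds to slightly under half of theoretical peak.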

  6. HPCx Technology Phases 2 & 3 Phase 2 (2004): 6 TFlop/s Rmax Linpack – >40 Regatta-H+ compute systems • 32 x 1.8 GHz processors, 32 GB memory, full SMP mode (no LPARs) – 3 Regatta-H I/O systems (double the capability of Phase 1) – "Federation" switch fabric • bandwidth quadrupled, ~5-10 µs latency, connects directly to the GX bus Phase 3 (2006): 12 TFlop/s Rmax Linpack – >40 Regatta-H+ compute systems • unchanged from Phase 2 – >40 additional Regatta-H+ compute systems • doubling the existing configuration – 4 Regatta I/O systems (double the capability of Phase 2) Open to alternative technology solutions (IPF, BlueGene/L, ...)
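As a side note (not on the slide), the phase targets line up with the "year 0, 2 and 4" Moore's-Law requirement of slide 3, i.e. a doubling of delivered Linpack performance every two years:

$$R(t) \approx 3\ \mathrm{TFlop/s} \times 2^{t/2} \;\Rightarrow\; R(0) = 3,\ R(2) = 6,\ R(4) = 12\ \mathrm{TFlop/s}$$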

  7. HPCx - Phase 1 Technology at Daresbury [Photographs of the installation at Daresbury, July 2002 and November 2002]

  8. IBM p-series 690 Turbo: Multi-Chip Module (MCM) Four POWER4 chips (8 processors) on an MCM, with two associated memory slots. [Diagram: each chip has a shared L2 cache, an L3 cache and controller, a memory controller and a distributed switch, linked by GX buses; 4 GX bus links provide external connections; the L3 cache is shared across all processors on the MCM.]

  9. Serial Benchmark Summary [Bar chart: performance relative to the SGI Origin 3800/R12k-400 for the GAMESS-UK, DLPOLY, Chemistry Kernels and MATRIX-97 benchmarks on the Intel Tiger (Madison 1.2 GHz), Cray T3E/1200E, SGI Origin 3800/R12k-400, HP/Compaq ES45 1 GHz and IBM SP/p690 1.3 GHz. Overall annotations: roughly 7x the Cray T3E and 3x the SGI Origin 3800.]

  10. SPEC CPU2000: SPECfp vs SPECfp_rate (32 CPUs) [Bar chart: SPECfp and SPECfp_rate values relative to the IBM 690 Turbo 1.3 GHz for the Compaq Alpha GS320/731 and GS320/1000, SGI Origin3800/R12k-400, R14k-500 and R14k-600, HP Superdome/PA8600-552 and PA8700-750, and the IBM 690 Turbo 1.3 GHz.]

  11. Interconnect Benchmark - EFF_BW [Bar chart: effective bandwidth (MBytes/sec) on 16 CPUs for the SGI Origin 3800/R14k-500; AlphaServer SC ES45/1000 with QsNet (1 and 4 CPUs per node); IBM SP/WH2-375; IBM SP/NH2-375 (8+8); IBM SP/Regatta-H (16x1 and 8+8); Cray T3E/1200E; AlphaEV67 clusters with QsNet (1 and 2 CPUs per node); P4/2000 Xeon and Itanium/800 clusters with Myrinet 2000 (1 and 2 CPUs per node); AMD K7/1000 MP with SCALI; and AMD K7/1200 and PIII/800 clusters with Fast Ethernet (LAM/MPICH). Results range from under 100 MBytes/sec for the Fast Ethernet clusters to well over 1000 MBytes/sec for the IBM SP/Regatta-H and Cray T3E/1200E.]
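To make the slide's metric concrete, the sketch below times a simple MPI ping-pong between two ranks and reports the sustained point-to-point bandwidth. It is a minimal illustration only, not the EFF_BW benchmark itself; the 8 MB message size and 50 repetitions are arbitrary choices, and the figures on the slide are for 16 CPUs exercising many links at once rather than a single pair.

```c
/*
 * Minimal MPI ping-pong bandwidth sketch (illustrative only; not the
 * EFF_BW benchmark referenced on the slide).  Rank 0 and rank 1 bounce
 * an 8 MB message back and forth and report the sustained bandwidth.
 * Run with at least 2 ranks, e.g.  mpirun -np 2 ./pingpong
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nbytes = 8 * 1024 * 1024;   /* message size: 8 MB            */
    const int reps   = 50;                /* repetitions for a stable time */
    char *buf;
    int rank, size;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "Needs at least 2 ranks\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    buf = malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        /* two messages of nbytes cross the link per repetition */
        double mbytes_per_sec = 2.0 * reps * nbytes / (t1 - t0) / 1.0e6;
        printf("Ping-pong bandwidth: %.1f MB/s\n", mbytes_per_sec);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```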

  12. Capability Benchmarking and Application Tuning (HPCx Terascale Applications Team) • Materials Science – CASTEP, AIMPRO & CRYSTAL • Molecular Simulation – DL-POLY & NAMD • Atomic & Molecular Physics – PFARM and H2MOL • Molecular Electronic Structure – GAMESS-UK & NWChem • Computational Engineering – PDNS3D • Environmental Science – POLCOMS

  13. Systems Used In Performance Analysis • IBM Systems – IBM SP/Regatta-H (1024 processors, 8-way LPARs): the HPCx system at DL – Regatta-H (32-way) and Regatta-HPC (16-way) at IBM Montpellier – SP/Regatta-H (8-way LPARs, 1.3 GHz) at ORNL • HP/Compaq AlphaServer SC – 4-way ES40 SMP nodes at 667 and 833 MHz (APAC) – TCS1 system at PSC: 750 4-way ES45 nodes, 3,000 EV68 1 GHz CPUs, with 4 GB memory per node – Quadrics "fat tree" interconnect (~5 µs latency, 250+ MB/sec bandwidth) • SGI Origin 3800 – SARA (1000 CPUs), NUMAlink, with R14k/500 and R12k/400 CPUs – CSAR (512 CPUs), NUMAlink, R12k/400 • Cray T3E/1200E – CSAR (788 CPUs)

  14. Materials Science AIMPRO (Ab Initio Modelling PROgram) - Patrick Briddon et al., Newcastle University - http://aimpro.ncl.ac.uk/ CRYSTAL - properties of crystalline systems using a periodic HF or DFT (Kohn-Sham) Hamiltonian, with various hybrid approximations - http://www.cse.clrc.ac.uk/cmg/CRYSTAL/ CASTEP (CAmbridge Serial Total Energy Package) - http://www.cse.clrc.ac.uk/cmg/NETWORKS/UKCP/

  15. The AIMPRO benchmark [Plot: performance (10000/time) vs. number of processors (up to 256) for the SGI Origin 3800/R12k-400 and IBM SP/p690, with annotated speedup factors of x1.6, x2.3 and x4.3.] Benchmark: 216 atoms, a C impurity in a Si lattice; 5180 basis functions; performance limited by the ScaLAPACK routine PDSYEVX.

  16. Scalability of Numerical Algorithms I. [Plots: time (sec) vs. number of processors (from 2 up to 512) on the SGI Origin 3800/R12k-400 for real symmetric eigenvalue problems, Fock matrices with N = 1152 and N = 3888, comparing PeIGS 2.1, PeIGS 3.0, PDSYEV (ScaLAPACK 1.5), PDSYEVD (ScaLAPACK 1.7) and BFG-Jacobi (DL), plus PeIGS 2.1 on the Cray T3E/1200.]

  17. Scalability of Numerical Algorithms II. [Plots: time (sec) vs. number of processors (16-512) for real symmetric eigenvalue problems with N = 3,888 and N = 9,000 on the IBM SP/p690 and SGI Origin 3800/R12k, comparing PeIGS 3.0, PDSYEV and PDSYEVD. Labelled times range from 739 sec down to 57 sec as the processor count increases.]
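Since slides 15-17 all hinge on the parallel symmetric eigensolvers (PDSYEVX, PDSYEV, PDSYEVD, PeIGS), the sketch below shows one way to drive the ScaLAPACK divide-and-conquer solver PDSYEVD from C. It is illustrative only, not code from the talk: the 2x2 process grid, N = 3888 and block size 64 are assumptions, matrix assembly, error checking and deallocation are omitted, and the hidden string-length arguments of the Fortran interface are ignored as is common practice.

```c
/*
 * Sketch: block-cyclic distribution of an N x N real symmetric matrix and
 * diagonalisation with ScaLAPACK's divide-and-conquer solver PDSYEVD.
 * Link against ScaLAPACK/BLACS; run with nprow*npcol = 4 MPI ranks,
 * e.g.  mpirun -np 4 ./pdsyevd_sketch
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* BLACS / ScaLAPACK prototypes (Fortran calling convention) */
extern void Cblacs_get(int icontxt, int what, int *val);
extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol,
                            int *myrow, int *mycol);
extern void Cblacs_gridexit(int icontxt);
extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb,
                      int *irsrc, int *icsrc, int *ictxt, int *lld, int *info);
extern void pdsyevd_(char *jobz, char *uplo, int *n, double *a, int *ia,
                     int *ja, int *desca, double *w, double *z, int *iz,
                     int *jz, int *descz, double *work, int *lwork,
                     int *iwork, int *liwork, int *info);

int main(int argc, char **argv)
{
    int n = 3888, nb = 64;            /* matrix order and block size      */
    int nprow = 2, npcol = 2;         /* 2 x 2 process grid (4 ranks)     */
    int izero = 0, ione = 1, info, ictxt;
    int myrow, mycol, mloc, nloc, lld;
    int desca[9], descz[9];

    MPI_Init(&argc, &argv);
    Cblacs_get(-1, 0, &ictxt);
    Cblacs_gridinit(&ictxt, "Row-major", nprow, npcol);
    Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);

    /* local dimensions of the block-cyclically distributed matrix */
    mloc = numroc_(&n, &nb, &myrow, &izero, &nprow);
    nloc = numroc_(&n, &nb, &mycol, &izero, &npcol);
    lld  = mloc > 1 ? mloc : 1;
    descinit_(desca, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);
    descinit_(descz, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);

    double *a = calloc((size_t)mloc * nloc, sizeof(double)); /* local A */
    double *z = calloc((size_t)mloc * nloc, sizeof(double)); /* eigenvectors */
    double *w = calloc(n, sizeof(double));                   /* eigenvalues  */
    /* ... fill the local part of the symmetric matrix a here ... */

    /* workspace query (lwork = liwork = -1), then the actual eigensolve */
    int lwork = -1, liwork = -1, itmp;
    double wtmp;
    pdsyevd_("V", "U", &n, a, &ione, &ione, desca, w,
             z, &ione, &ione, descz, &wtmp, &lwork, &itmp, &liwork, &info);
    lwork  = (int)wtmp;
    liwork = itmp;
    double *work  = malloc((size_t)lwork * sizeof(double));
    int    *iwork = malloc((size_t)liwork * sizeof(int));
    pdsyevd_("V", "U", &n, a, &ione, &ione, desca, w,
             z, &ione, &ione, descz, work, &lwork, iwork, &liwork, &info);

    if (myrow == 0 && mycol == 0)
        printf("pdsyevd info = %d, lowest eigenvalue = %g\n", info, w[0]);

    Cblacs_gridexit(ictxt);
    MPI_Finalize();
    return 0;
}
```

The block size nb and the shape of the process grid are the main tuning knobs for solvers like this; the curves on slides 16 and 17 compare how the different eigensolver algorithms scale once such a distribution is in place.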
