
SLIDE 1

Heterogeneous Multi-Computer System

A New Platform for Multi-Paradigm Scientific Simulation

Taisuke Boku, Hajime Susa, Masayuki Umemura, Akira Ukawa
Center for Computational Physics, University of Tsukuba
Junichiro Makino, Toshiyuki Fukushige
Department of Astronomy, University of Tokyo

SLIDE 2

Outline

  • Background
  • Concept and Design of HMCS prototype
  • Implementation of prototype
  • Performance evaluation
  • Computational physics results
  • Variations of HMCS
  • Conclusions
SLIDE 3

Background

  • Requirements for platforms for next-generation large-scale scientific simulation
    – More computational power
    – Large memory capacity, wide network bandwidth
    – High-speed, wide-bandwidth I/O
    – High-speed networking interface (to the outside)
    – …
  • Is it enough? How about the quality?
  • Multi-Scale or Multi-Paradigm Simulation
SLIDE 4

Multi-Scale Physics Simulation

  • Various levels of interaction
    – Newtonian dynamics, electromagnetic interaction, quantum dynamics, …
  • Microscopic and macroscopic interactions
  • Difference in computation order
    – O(N²): e.g. N-body
    – O(N log N): e.g. FFT
    – O(N): e.g. straight CFD
  • Combining these simulations realizes Multi-Scale or Multi-Paradigm Computational Physics (see the rough cost example below)
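As a rough cost example (not from the slides): with N = 2^17 ≈ 1.3 × 10^5 particles, an O(N²) step such as a direct gravity sum requires on the order of N² ≈ 1.7 × 10^10 pairwise interactions, while the O(N) parts touch each particle only a bounded number of times per step. Assuming comparable per-operation cost, the O(N²) term dominates by roughly five orders of magnitude, which is what motivates off-loading it to special-purpose hardware while a general-purpose machine handles the rest.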

SLIDE 5

HMCS – Heterogeneous Multi-Computer System

  • Combining Particle Simulation (ex: gravity interaction) and Continuum Simulation (ex: SPH) on a single platform
  • Combining a General Purpose Processor (flexibility) and a Special Purpose Processor (high speed)
  • Connecting a General Purpose MPP and a Special Purpose MPP via a high-throughput network
  • Exchanging particle data at every time-step

Prototype System: CP-PACS + GRAPE-6
(JSPS Research for the Future Project “Computational Science and Engineering”)

SLIDE 6

Block Diagram of HMCS

[Block diagram] Components:
  • MPP for Continuum Simulation: CP-PACS
  • Hybrid System Communication Cluster (Compaq Alpha)
  • MPP for Particle Simulation: GRAPE-6, attached via 32bit PCI × N
  • Parallel File Server (SGI Origin2000) with the Parallel I/O System PAVEMENT/PIO
  • Parallel Visualization Server (SGI Onyx2) with the Parallel Visualization System PAVEMENT/VIZ
  • 100base-TX switches connecting the systems

SLIDE 7

CP-PACS

  • Pseudo-vector processors with 300 Mflops of peak performance × 2048 ⇒ 614.4 Gflops
  • I/O nodes with the same performance × 128
  • Interconnection network: 3-D Hyper Crossbar (300 MB/s / link)
  • Platform for general-purpose scientific calculation
  • 100base-TX NIC on 16 IOUs for outside communication
  • Partitioning is available (any partition can access any IOU)
  • Manufactured by Hitachi Co.
  • In operation since April 1996 with 1024 PUs, since October 1996 with 2048 PUs

SLIDE 8

CP-PACS (Center for Computational Physics)

SLIDE 9

GRAPE-6

  • The 6th generation of the GRAPE (Gravity Pipe) Project
  • Gravity calculation for many particles at 31 Gflops/chip
  • 32 chips / board ⇒ 0.99 Tflops/board
  • The full system of 64 boards is under implementation ⇒ 63 Tflops
  • On each board, the data of all particles (j-particles) are set into SRAM memory, and each target particle (i-particle) is injected into the pipeline, which computes its acceleration (see the sketch below)
  • Gordon Bell Prize at SC01 (Denver)
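As a host-side illustration of the computation the board performs (plain C, not the GRAPE-6 library API; the function name and interface below are hypothetical), the pipeline evaluates the softened direct gravity sum of all j-particles acting on each i-particle:

    /* Conceptual sketch of the O(N^2) evaluation GRAPE-6 accelerates (G = 1).
     * On the hardware, j-particles sit in on-board SRAM and i-particles are
     * streamed through the pipelines; here everything runs on the host. */
    #include <math.h>

    void direct_gravity(int ni, int nj,
                        const double ri[][3], const double rj[][3],
                        const double mj[], double eps2,
                        double acc[][3], double pot[])
    {
        for (int i = 0; i < ni; i++) {
            double ax = 0.0, ay = 0.0, az = 0.0, p = 0.0;
            for (int j = 0; j < nj; j++) {
                double dx = rj[j][0] - ri[i][0];
                double dy = rj[j][1] - ri[i][1];
                double dz = rj[j][2] - ri[i][2];
                double r2   = dx*dx + dy*dy + dz*dz + eps2; /* softened distance^2 */
                double rinv = 1.0 / sqrt(r2);
                double mr3  = mj[j] * rinv * rinv * rinv;
                ax += mr3 * dx;
                ay += mr3 * dy;
                az += mr3 * dz;
                p  -= mj[j] * rinv;                         /* potential */
            }
            acc[i][0] = ax; acc[i][1] = ay; acc[i][2] = az;
            pot[i] = p;
        }
    }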
SLIDE 10

GRAPE-6 (University of Tokyo)

[Photos] 8-board × 4 system; a GRAPE-6 board (32 chips)

SLIDE 11

GRAPE-6 (cont’d)

[Photos: top view and bottom view] Daughter card module (4 chips / module)

SLIDE 12

Host Computer for GRAPE-6

  • GRAPE-6 is not a stand-alone system ⇒ a host computer is required
  • Alpha CPU based PC (Intel x86 and AMD Athlon are also available)
  • Connected to the GRAPE-6 board via a 32bit PCI interface card
  • A host computer can handle several GRAPE-6 boards
  • It is impossible to handle an enormous number of particles with a single host computer for complicated calculations

SLIDE 13

Hyades (Alpha CPU base Cluster)

  • Cluster with Alpha 21264A (600 MHz) × 16 nodes
  • Samsung UP1100 (single CPU) boards
  • 768 MB memory / node
  • Dual 100base-TX NICs
  • 8 nodes are equipped with a GRAPE-6 PCI card ⇒ cooperative work with 8 GRAPE-6 boards under MPI programming
  • One of the 100base-TX NICs is connected to CP-PACS via PIO (Parallel I/O System)
  • RedHat Linux 6.2 (kernel 2.2.16)
  • Operated as a data-exchanging and controlling system connecting CP-PACS and GRAPE-6

SLIDE 14

GRAPE-6 & Hyades

[Photo] Connection between GRAPE-6 and Hyades

SLIDE 15

PAVEMENT/PIO

  • Parallel I/O and Visualization Environment
  • Connecting multiple parallel processing platforms with a commodity-based parallel network
  • Automatic and dynamic load-balancing feature to exploit the spatial parallelism of applications
  • Utilizing multiple I/O processors of the MPP to avoid a communication bottleneck
  • Providing an easy-to-program API with various operation modes (user-oriented, static or dynamic load balancing)

SLIDE 16

MPP – DSM system example

[Diagram] On CP-PACS, the I/O processors run PIO servers and the calculation processors run user processes; on the SMP or cluster side, each node runs a PIO server and user processes (threads). The two systems are connected through a switch.

SLIDE 17

HMCS Prototype

[Diagram] Components:
  • Massively Parallel Processor: CP-PACS (2048 PUs, 128 IOUs)
  • GRAPE-6 & Hyades (16 nodes, 8 boards)
  • Parallel File Server: SGI Origin-2000 (8 processors)
  • Parallel Visualization Server: SGI Onyx2 (4 processors)
  • Switching HUB × 2, parallel 100Base-TX Ethernet, 8 links on each side

SLIDE 18

SPH (Smoothed Particle Hydrodynamics)

Representing the material as a collection of particles.

Density at particle i is estimated with a kernel function W:

ρ_i = Σ_j m_j W(|r_i − r_j|)
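A minimal sketch of this density estimate in plain C (hypothetical names; a real SPH code would sum only over neighbours inside the kernel support and use the code's actual kernel, which the slide does not specify):

    #include <math.h>

    /* Example kernel: a normalized Gaussian W(r; h), for illustration only;
     * the presentation does not state which kernel is used. */
    static double kernel_W(double r, double h)
    {
        double q = r / h;
        return exp(-q * q) / (pow(M_PI, 1.5) * h * h * h);
    }

    /* Density estimate: rho_i = sum_j m_j W(|r_i - r_j|) */
    void sph_density(int n, const double r[][3], const double m[],
                     double h, double rho[])
    {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                double dx = r[i][0] - r[j][0];
                double dy = r[i][1] - r[j][1];
                double dz = r[i][2] - r[j][2];
                sum += m[j] * kernel_W(sqrt(dx*dx + dy*dy + dz*dz), h);
            }
            rho[i] = sum;
        }
    }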

SLIDE 19

RT (Radiative Transfer) for SPH

[Figure] A light path from source S to target T, sampled at evaluation points E1 … E5 near SPH particles P1 … P5.

Accurate calculation of the optical depth along light paths is required. The method by Kessel-Deynet & Burkert (2000) is used:

τ_TS = σ Σ_i (1/2) (n_{E_i} + n_{E_{i+1}}) (s_{E_{i+1}} − s_{E_i})

where n_{E_i} is the density and s_{E_i} the path-length coordinate at evaluation point E_i.
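A minimal sketch of this discretized sum in plain C (hypothetical names): given the densities n and path-length coordinates s at the evaluation points along the ray, the optical depth is accumulated with the trapezoidal rule:

    /* tau_TS = sigma * sum_i 0.5 * (n[i] + n[i+1]) * (s[i+1] - s[i])
     * n[i]: density at evaluation point E_i
     * s[i]: path-length coordinate of E_i along the source-target ray */
    double optical_depth(int npoints, const double n[], const double s[],
                         double sigma)
    {
        double tau = 0.0;
        for (int i = 0; i < npoints - 1; i++)
            tau += 0.5 * (n[i] + n[i + 1]) * (s[i + 1] - s[i]);
        return sigma * tau;
    }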

SLIDE 20

SPH Algorithm with Self-Gravity Interaction

[Flow diagram] Per iteration: SPH (density) → radiation transfer → chemistry → temperature → pressure calculation on CP-PACS (O(N)); particle data exchanged with GRAPE-6 (comm. O(N)); gravity calculation on GRAPE-6 (O(N²)); Newtonian dynamics update; repeat.

SLIDE 21

g6cpplib – CP-PACS API

  • g6cpp_start(myid, nio, mode, error)
  • g6cpp_unit(n, t_unit, x_unit, eps2, error)
  • g6cpp_calc(mass, r, f_old, phi_old, error)
  • g6cpp_wait(acc, pot, error)
  • g6cpp_end(error)
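A rough sketch of how one gravity evaluation per time-step might be driven with this API from the CP-PACS side. Only the function names above come from the slide; the argument types, the use of C, and the overlap of local work between g6cpp_calc and g6cpp_wait are assumptions:

    /* Hypothetical prototypes for the g6cpplib calls named on the slide
     * (actual signatures are not shown in the presentation). */
    void g6cpp_calc(double mass[], double r[][3], double f_old[][3],
                    double phi_old[], int *error);
    void g6cpp_wait(double acc[][3], double pot[], int *error);

    /* Hypothetical per-step driver; see the flow on the previous slides. */
    void hmcs_step(int n, double mass[], double r[][3],
                   double f_old[][3], double phi_old[],
                   double acc[][3], double pot[])
    {
        int error;

        /* Ship particle data to the GRAPE-6 side (comm. O(N)). */
        g6cpp_calc(mass, r, f_old, phi_old, &error);

        /* Meanwhile, the O(N) work stays on CP-PACS: SPH density,
         * radiative transfer, chemistry, temperature, pressure. */

        /* Collect acceleration and potential computed on GRAPE-6 (O(N^2)). */
        g6cpp_wait(acc, pot, &error);

        /* Newtonian dynamics update with acc[] closes the iteration. */
    }

Presumably g6cpp_start and g6cpp_unit are called once at initialization and g6cpp_end at shutdown, though the slide does not spell this out.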
SLIDE 22

Performance (raw – G6 cluster)

  • GRAPE-6 cluster performance with dummy data (without real RT-SPH)

  • GRAPE-6 board × 4 with 128K particles

process                        time (sec)
particle data trans.           1.177
all-to-all data circulation    0.746
set-up data in SRAM            0.510
N-body comp.                   0.435
result return                  0.085

Processing time for 1 iteration = 3.24 sec (total)

SLIDE 23

Scalability with problem size

process                    n=15     n=16     n=17
data trans.               5.613   10.090   17.998
all-to-all circulation    0.309    0.476    0.681
set data to SRAM          0.231    0.362    0.628
calculation               0.064    0.169    0.504
TOTAL                     6.217   11.097   19.811

# of particles N = 2^n (#P = 512); times in sec.; RT-SPH calculation is included.

SLIDE 24

Scalability with # of PUs

process                   #P=512   #P=1024
data trans.               17.998    10.594
all-to-all circulation     0.681     0.639
set data to SRAM           0.628     0.609
calculation                0.504     0.503
TOTAL                     19.811    12.345

# of particles N = 2^17; times in sec.; RT-SPH calculation is included.

SLIDE 25

Example of Physics Results

(64K SPH particles + 64K dark matter particles)

SLIDE 26

Various implementation methods of HMCS

  • HMCS-L (Local)
    – Same as the current prototype
    – Simple, but the system is closed
  • HMCS-R (Remote)
    – Remote access to a GRAPE-6 server through a network (LAN or WAN = Grid)
    – Utilizing the GRAPE-6 cluster in a time-sharing manner as a Gravity Server
  • HMCS-E (Embedded)
    – Enhanced HMCS-L: each node of an MPP (or large-scale cluster) is equipped with a GRAPE chip
    – Combining the wide network bandwidth of the MPP (or cluster) with powerful node processing power

SLIDE 27

HMCS-R on Grid

[Diagram] General-purpose client computers access a GRAPE + host server over a high-speed network.

◎ Remote access to the GRAPE-6 server via the g6cpp API
◎ No persistency of particle data – suitable for Grid
◎ O(N²) of calculation with O(N) of data amount
SLIDE 28

HMCS-E (Embedded)

[Diagram] Each node integrates a general-purpose processor (G-P), a special-purpose processor (S-P), memory (M), and a NIC, attached to a high-speed network switch.

  • Local communication between general-purpose and special-purpose processors
  • Utilizing the wide bandwidth of the large-scale network
  • Ideal fusion of flexibility and high performance

SLIDE 29

Conclusions

  • HMCS – Platform for Multi-Scale Scientific Simulation
  • Combining a General Purpose MPP (CP-PACS) and a Special Purpose MPP (GRAPE-6) with a parallel network under the PAVEMENT/PIO middleware
  • SPH + Radiation Transfer with Gravity Interaction ⇒ detailed simulation of galaxy formation
  • A 128K-particle real simulation on the 1024-PU CP-PACS opens a new epoch of simulation

  • Next Step: HMCS-R and HMCS-E