


A Parallel Numerical Library for UPC

Jorge González-Domínguez¹*, María J. Martín¹, Guillermo L. Taboada¹, Juan Touriño¹, Ramón Doallo¹, Andrés Gómez²

¹ Computer Architecture Group, University of A Coruña (Spain)
  {jgonzalezd,mariam,taboada,juan,doallo}@udc.es

² Galicia Supercomputing Center (CESGA), Santiago de Compostela (Spain)
  agomez@cesga.es

15th International European Conference on Parallel and Distributed Computing (Euro-Par 2009), Delft University of Technology, Delft, The Netherlands

1/32


Outline

1. Introduction
     Unified Parallel C for High-Performance Computing
     Parallel Numerical Computing in UPC
2. Design of the library
     Private routines
     Shared routines
3. Implementation of the library
4. Experimental evaluation
5. Conclusions

2/32

Section 1: Introduction
  - Unified Parallel C for High-Performance Computing
  - Parallel Numerical Computing in UPC

3/32


UPC: a Suitable Alternative for HPC in the Multi-core Era

Programming models:
  - Traditionally: shared- and distributed-memory programming models
  - Challenge: hybrid memory architectures -> PGAS (Partitioned Global Address Space)

PGAS languages:
  - UPC -> C
  - Titanium -> Java
  - Co-Array Fortran -> Fortran

UPC compilers:
  - Berkeley UPC
  - GCC (Intrepid)
  - Michigan TU
  - HP, Cray and IBM UPC compilers

4/32



Important identifiers

  THREADS  -> total number of threads in execution
  MYTHREAD -> rank of the current thread

#include <stdio.h>
#include <upc.h>

int main() {
    printf("Thread %d of %d: Hello world\n", MYTHREAD, THREADS);
}

$ upcc -o helloworld helloworld.upc
$ upcrun -n 3 helloworld
Thread 0 of 3: Hello world
Thread 2 of 3: Hello world
Thread 1 of 3: Hello world

5/32



Shared array declaration

    shared [block_factor] A[size]

  - size: total number of elements
  - block_factor: number of consecutive elements with affinity to the same thread (the size of the chunks)
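As an illustration (a minimal sketch, not taken from the slides; the array size and block factor are arbitrary), a blocked shared array can be initialized with upc_forall so that each thread writes only the chunks it has affinity to:

#include <upc.h>

#define N 16
shared [4] int A[N];   /* block_factor = 4: chunks of 4 consecutive elements per thread */

int main() {
    /* The affinity expression &A[i] runs iteration i on the thread
       that owns A[i], so every write below is a local access. */
    upc_forall(int i = 0; i < N; i++; &A[i]) {
        A[i] = i;
    }
    upc_barrier;
    return 0;
}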

6/32


BLAS libraries
  - Basic Linear Algebra Subprograms
  - Specification of a set of numerical functions
  - Widely used by scientists and engineers
  - SparseBLAS and PBLAS (Parallel BLAS)

BLAS implementations
  - Generic and open source: GSL -> GNU
  - Optimized for specific architectures: MKL -> Intel, ACML -> AMD, CXML -> Compaq, MLIB -> HP

7/32



BLAS level | Tblasname | Action
-----------|-----------|--------------------------------------------------------
BLAS1      | Tcopy     | Copies a vector
BLAS1      | Tswap     | Swaps the elements of two vectors
BLAS1      | Tscal     | Scales a vector by a scalar
BLAS1      | Taxpy     | Updates a vector using another one: y = α·x + y
BLAS1      | Tdot      | Dot product
BLAS1      | Tnrm2     | Euclidean norm
BLAS1      | Tasum     | Sums the absolute values of the elements of a vector
BLAS1      | iTamax    | Finds the index with the maximum value
BLAS1      | iTamin    | Finds the index with the minimum value
BLAS2      | Tgemv     | Matrix-vector product
BLAS2      | Ttrsv     | Solves a triangular system of equations
BLAS2      | Tger      | Outer product
BLAS3      | Tgemm     | Matrix-matrix product
BLAS3      | Ttrsm     | Solves a block of triangular systems of equations

8/32


Numerical computing in UPC

  - No numerical libraries for PGAS languages
  - Alternatives for the programmers:
      - Develop the routines by themselves -> more effort, worse performance
      - Use different programming models with parallel numerical libraries:
          distributed memory -> MPI; shared memory -> OpenMP
  - Consequence: a barrier to the productivity of PGAS languages

9/32


Section 2: Design of the library
  - Private routines
  - Shared routines

10/32


Analysis of related works

Distributed memory approach (parallel -MPI- BLAS):
  - Message-passing paradigm; only private memory
  - New structures to represent distributed vectors or matrices
      -> difficult to understand and work with
      -> helper functions needed for their creation, storage of data and deletion

New approach: usage of UPC shared arrays

11/32



Two functions for each BLAS routine

Private functions:
  - Input and output data in private memory
  - Pointers from private to private
  - Data distribution internal to the function -> neither chosen nor known by the user

Shared functions:
  - Input and output data in shared memory
  - Pointers from private to shared
  - Data distribution chosen by the user through a parameter

12/32



Naming: upc_blas_[p]Tblasname

p value:
  _ (absent) -> shared version
  p          -> private version

T value:
  i -> integer
  l -> long
  f -> float
  d -> double

2 versions × 4 datatypes × 14 routines = 112 functions

13/32


Example: y = a·x + y (daxpy)

  private version -> upc_blas_pdaxpy
  shared version  -> upc_blas_daxpy

14/32


int upc_blas_pdaxpy(const int size, const double a,
                    const int thread_src, const double *x,
                    const int thread_dst, double *y);

Parameters:
  - size: length of the vectors
  - a: scale factor
  - x, y: private pointers to the positions of private memory where the vector elements are stored
  - thread_src: in [0, THREADS]; rank of the thread that holds the input x and y data in its private memory.
      If THREADS -> vectors replicated in all private spaces
  - thread_dst: in [0, THREADS]; rank of the thread in whose private memory the output will be written.
      If THREADS -> output replicated in all private spaces -> broadcast
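A minimal usage sketch (the header name upc_blas.h is an assumption, as the slides do not give it): thread 0 provides the input vectors and, with thread_dst = THREADS, every thread receives a private copy of the result:

#include <upc.h>
#include "upc_blas.h"   /* assumed header name */

#define N 1000

int main() {
    double x[N], y[N];          /* private arrays on every thread */
    if (MYTHREAD == 0) {        /* only thread_src needs valid input data */
        for (int i = 0; i < N; i++) { x[i] = (double)i; y[i] = 1.0; }
    }
    /* y = 2.0*x + y: input read from thread 0, output broadcast
       to the private memory of all threads (thread_dst = THREADS) */
    upc_blas_pdaxpy(N, 2.0, 0, x, THREADS, y);
    return 0;
}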

15/32



int upc_blas_daxpy(const int block_size, const int size,
                   const double a, shared const double *x,
                   shared double *y);

Parameters:
  - size: length of the vectors -> the same as in the private version
  - a: scale factor -> the same as in the private version
  - x, y: private pointers to the positions of shared memory where the vector elements are stored
  - block_size: meaning described on the following slides
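A corresponding sketch for the shared version (again assuming the upc_blas.h header name): both vectors are declared with the same block factor that is passed as block_size, so the work distribution follows the data distribution:

#include <upc.h>
#include "upc_blas.h"   /* assumed header name */

#define N  1000
#define BS 100          /* block_factor of the arrays = block_size argument */

shared [BS] double x[N];
shared [BS] double y[N];

int main() {
    /* x and y have the same layout, so each thread initializes
       only the elements it has affinity to */
    upc_forall(int i = 0; i < N; i++; &x[i]) { x[i] = (double)i; y[i] = 1.0; }
    upc_barrier;

    /* y = 2.0*x + y on the shared, block-distributed vectors */
    upc_blas_daxpy(BS, N, 2.0, x, y);
    return 0;
}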

16/32



Meaning of block_size for vectors
  - In the range [1, size]
  - Number of consecutive elements with affinity to the same thread
  - For performance, block_size should equal the block_factor of the shared arrays
  - Determines the distribution of the work

    shared [block_size] y[size]

17/32


Meaning of block_size for matrices distributed by rows
  - Additional parameter: dist_dimm = row_dist
  - In the range [1, rows]
  - Number of consecutive rows with affinity to the same thread
  - Determines the distribution of the work

    shared [block_size*cols] y[rows*cols]

18/32


Meaning of block_size for matrices distributed by columns
  - Additional parameter: dist_dimm = col_dist
  - In the range [1, cols]
  - Number of consecutive columns with affinity to the same thread
  - Determines the distribution of the work

    shared [block_size] y[rows*cols]

19/32

Section 3: Implementation of the library

20/32


General steps to achieve good efficiency

UPC optimization techniques:
  - Privatization of the accesses to shared memory

21/32



Private pointers to private memory
  - The standard C pointers
  - Stored in private memory
  - Able to access: private memory, and the part of the shared memory with affinity to the thread
  - Very fast to dereference

Private pointers to shared memory
  - Stored in private memory
  - Able to access any position in all the shared memory
  - Heavier than standard C pointers -> slower accesses
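For example (a minimal sketch, not from the slides; the block size is arbitrary), privatization casts a pointer-to-shared to a standard C pointer, which is legal when the referenced data has affinity to the calling thread and turns slow shared dereferences into fast private ones:

#include <upc.h>

#define BS 256
shared [BS] double v[BS * THREADS];   /* one block of BS elements per thread */

int main() {
    /* &v[MYTHREAD * BS] has affinity to this thread, so the cast
       to a private pointer is valid; the loop then uses fast
       standard C pointer arithmetic instead of shared accesses. */
    double *local = (double *)&v[MYTHREAD * BS];
    for (int i = 0; i < BS; i++)
        local[i] = 0.0;
    upc_barrier;
    return 0;
}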

22/32


General steps to achieve good efficiency
  - UPC optimization techniques:
      - Privatization of the accesses to shared memory
      - Aggregation of remote shared memory accesses (upc_memget, upc_memput, upc_memcpy) -> see the sketch after this list
      - Overlapping remote accesses with computation
  - Correct distribution of the workload and data among threads -> private case
  - Calls to the most efficient underlying numerical libraries
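As an example of the aggregation technique (a sketch under the same illustrative assumptions as before), one bulk upc_memget replaces many fine-grained remote reads:

#include <stdio.h>
#include <upc.h>

#define BS 256
shared [BS] double v[BS * THREADS];   /* one block of BS elements per thread */

int main() {
    double buf[BS];
    int next = (MYTHREAD + 1) % THREADS;

    /* Fetch the whole block owned by the next thread in a single
       bulk transfer instead of BS individual shared reads. */
    upc_memget(buf, &v[next * BS], BS * sizeof(double));

    double sum = 0.0;
    for (int i = 0; i < BS; i++)
        sum += buf[i];                /* compute on the private copy */
    printf("Thread %d: sum of remote block = %f\n", MYTHREAD, sum);
    return 0;
}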

23/32


Section 4: Experimental evaluation

24/32


Finis Terrae (CESGA)
  - 142 HP Integrity rx7640 nodes, each with:
      - 16 Montvale Itanium2 (IA64) cores at 1.6 GHz
      - 2 cells, each with 4 dual-core processors and 1 shared memory module
      - 128 GB RAM
  - Mellanox InfiniBand HCA (16 Gbps bandwidth)

SW configuration:
  - Berkeley UPC (BUPC) 2.6
  - Intel Math Kernel Library (MKL) 9.1
      - All sequential routines: BLAS1, BLAS2 and BLAS3
      - Other routines: SparseBLAS, LAPACK, ScaLAPACK...

25/32


Configuration of benchmarks
  - Hybrid memory configuration:
      - Locality exploitation of threads in the same node -> shared memory
      - Enhancing scalability -> distributed memory
  - 4 threads per node, 2 per cell
  - Private version: thread_src = THREADS, thread_dst = 0

26/32


[Figure: speedup and efficiency curves for the dot product (pddot), vector sizes 50M, 100M and 150M, on 2 to 128 threads]

27/32


[Figure: speedup and efficiency curves for the matrix-vector product (pdgemv), matrix sizes 10000, 20000 and 30000, on 2 to 128 threads]

28/32


[Figure: speedup and efficiency curves for the matrix-matrix product (pdgemm), matrix sizes 6000, 8000 and 10000, on 2 to 128 threads]

29/32

Section 5: Conclusions

30/32


Summary
  - First parallel numerical library developed for UPC -> novelty
  - Allows input and output data to be stored in private or shared memory -> flexibility
  - Uses the standard sequential BLAS functions -> portability
  - Scalability demonstrated by experimental tests -> efficiency

Future work
  - Develop a sparse counterpart of the library for UPC

31/32



Questions?

Contact: Jorge González-Domínguez, jgonzalezd@udc.es
Computer Architecture Group, Dept. of Electronics and Systems
University of A Coruña, Spain

32/32