On the behaviour of the MKL library in multicore shared-memory systems

Sections: Motivation, Systems, Using MKL

On the behaviour of the MKL library in multicore shared-memory systems

Domingo Giménez, Departamento de Informática y Sistemas, Universidad de Murcia
Alexey Lastovetsky, School of Computer Science and Informatics, University College Dublin

Jornadas de Paralelismo, Valencia, September 2010

Matrix multiplication on platforms composed of multicores

The goal:
- To identify the shape of the execution time of matrix multiplication on a multicore as a function of the problem size and the number of threads, in order to decide the number of threads that gives the lowest execution time.
- To use this information to develop two-level (OpenMP+BLAS) versions of the multiplication, and to select the number of threads in each level.
- To use this information to develop three-level (MPI+OpenMP+BLAS) versions, and to select the number of processes and the number of threads in each level.
- To use this information to develop heterogeneous/distributed three-level (MPI+OpenMP+BLAS) versions, and to select the number of processes and their distribution or the data partition and, in each processor, the number of threads in each level.
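The selection step for the two-level (OpenMP+BLAS) version can be sketched as follows. This is only an illustration, not the authors' code: the function names `two_level_configs` and `fastest_config`, the `measure` callback, and the restriction that the product of the two thread counts does not exceed the number of cores are all assumptions made for the example.

```python
from typing import Callable, Iterable, List, Tuple


def two_level_configs(cores: int) -> List[Tuple[int, int]]:
    """List (openmp_threads, blas_threads) pairs.

    Assumption for this sketch: only pairs whose product does not
    exceed `cores` are considered (no oversubscription).
    """
    return [(p, q)
            for p in range(1, cores + 1)
            for q in range(1, cores // p + 1)]


def fastest_config(configs: Iterable[Tuple[int, int]],
                   measure: Callable[[int, int], float]) -> Tuple[int, int]:
    """Return the pair with the lowest measured execution time.

    `measure(p, q)` is a hypothetical callback that runs the
    multiplication with p OpenMP threads and q BLAS threads and
    returns the elapsed time in seconds.
    """
    return min(configs, key=lambda c: measure(*c))
```

In practice `measure` would run and time the real multiplication for a given problem size; because the best pair may change with the problem size, the search would be repeated per size.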

Systems, basic components

name        architecture                  icc    MKL    cores
rosebud05   4 Itanium dual-core           11.1   10.2   8
rosebud09   1 AMD quad-core               11.1   10.2   4
hipatia8    2 Xeon E5462 quad-core        10.1   10.0   8
hipatia16   4 Xeon X7350 quad-core        10.1   10.0   16
arabi       2 Xeon L5450 quad-core        11.1   10.2   8
ben         HP Integrity Superdome        11.1   10.2   128
bertha      IBM 16 Xeon X7460 hexa-core   11.0   11.0   96

Systems

- Rosebud (Polytechnic Univ. of Valencia): cluster with 38 cores
  - 2 single-processor nodes, 2 dual-processor nodes, 2 nodes with 4 dual-core, 2 nodes with 2 dual-core, 2 nodes with 1 quad-core
- Hipatia (Polytechnic Univ. of Cartagena): cluster with 152 cores
  - 16 nodes with 2 quad-core, 2 nodes with 4 quad-core, 2 nodes with 2 dual-core
- Ben-Arabi (Supercomputing Centre of Murcia): shared-memory + cluster, 944 cores
  - Arabi: cluster of 102 nodes with 2 quad-core
  - Ben: HP Superdome, cc-NUMA with 128 cores
- Bertha (INRIA Bordeaux Ouest): shared-memory cc-NUMA, 96 cores
  - 4 nodes, each node with 4 processors, each processor hexa-core

Ben architecture

Hierarchical composition with crossbar interconnection. There are two basic components: the computers and two backplane crossbars. Each computer has 4 dual-core Itanium-2 processors and a controller that connects the CPUs with the local memory and with the crossbar switches. The maximum memory bandwidth within a computer is 17.1 GB/s, and through the crossbar switches it is 34.5 GB/s. Memory access is non-uniform, and the user does not control where threads are assigned.
