
SLIDE 1

Multicore Challenge Conference 2012 UWE, Bristol

Multi/Many Core Programming Strategies

Greg Michaelson School of Mathematical & Computer Sciences Heriot-Watt University


SLIDE 2

Overview

  • good old fashioned parallel computing based on lots of identical single CPUs is ver…


[Diagram: shared memory (PEs share one RAM via a network) vs distributed memory (each PE has its own RAM, connected by a network)]

SLIDE 3

Overview

  • Moore’s Law implications have changed

– speed of CPUs now stable at ~3.5 GHz
– performance increases from multi- & many-core CPUs


Intel 4004 – 1971

http://en.wikipedia.org/wiki/Intel_4004

Intel Core i7 – 2008

http://en.wikipedia.org/wiki/Intel_Core_i7

SLIDE 4

Overview

  • multi-processor architectures increasingly hierarchical & heterogeneous
  • message passing grids of clusters of:

– now: shared memory multi-core


Hector – Edinburgh Parallel Computing Centre

  • 464 compute blades with…
  • 4 compute nodes with…
  • 2 × 12-core processors
  • 44,544 cores

http://www.hector.ac.uk/abouthector/hectorbasics/

SLIDE 5

Overview

  • multi-processor architectures increasingly hierarchical & heterogeneous
  • message passing grids of clusters of:

– soon: message passing many-core arrays


SCC – Intel Research

http://techresearch.intel.com/ProjectDetails.aspx?Id=1

SLIDE 6

Overview

  • cores also have SIMD processors (MMX/SSE)
  • non-uniform memory

– differing degrees/levels of private & shared cache

  • old programming strategies break down

– one size no longer fits all

  • need for hybrid strategies


SLIDE 7

Overview

  • developing multi-processor software is still a black art

  • would like:

– low effort
– flexibility
– scalability
– future proof
– re-use


SLIDE 8

Overview

  • different approaches:

– require different effort
– offer different degrees of control over:

  • task division
  • communications
  • process placement


SLIDE 9

Methodological choices


[Decision tree: START]

SLIDE 10

Methodological choices


[Decision tree: START → automatic parallelisation]

SLIDE 11

Automatic Parallelisation

  • vector/array parallelisation
  • implicit

– e.g. SIMD in C with gcc

  • language directives

– Fortrans: Fortran 90; F; High Performance Fortran


SLIDE 12

Automatic Parallelisation


  • low effort

– no communications
– no/minimal task division

  • poor flexibility/scalability

– good for regular problems
– good on uniform architectures

SLIDE 13

Methodological choices


[Decision tree: START → automatic parallelisation | do it yourself]

SLIDE 14

Methodological choices


[Decision tree: START → automatic parallelisation | do it yourself → skeleton]

SLIDE 15

Algorithmic skeletons

  • capture common

patterns of data & control parallelism

– e.g. pipeline; farm; divide & conquer

  • skeleton libraries

for C/Java


[Diagrams: pipeline (stage 1 → stage 2 → … → stage N); process farm (a farmer distributing tasks to workers)]

SLIDE 16

Algorithmic skeletons



[Diagram: divide & conquer tree of a parent process over layers of parent/child processes]

SLIDE 17

Algorithmic skeletons

  • industrial frameworks
  • e.g. Google Map-Reduce

  • Apache Hadoop


Google Map-Reduce

http://labs.google.com/papers/mapreduce-osdi04-slides/index-auto-0008.html

SLIDE 18

Algorithmic skeletons

  • industrial frameworks
  • e.g. Microsoft Dryad


Microsoft Dryad

www.wikibench.eu/CloudCP2011/wp-content/.../Isaacs-keynote.ppsx

SLIDE 19

Algorithmic skeletons

  • can choose appropriate skeleton for problem class
  • medium effort to use skeleton library/industrial framework

– must fit problem to skeleton

  • high effort to develop own skeletons

– must make communication & task division explicit


SLIDE 20

Algorithmic skeletons

  • can hand tune for:

– problem
– irregularity
– scalability
– process placement

  • strong potential re-use of components


SLIDE 21

Methodological choices


[Decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation]

SLIDE 22

Methodological choices


[Decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system]

SLIDE 23

Operating system

  • independent programs

– realised as threads

  • communication via pipes/sockets
  • bolted together with shell scripts


SLIDE 24

Operating system

  • low effort
  • highly dependent on underlying operating system for:

– communication
– scheduling
– process placement

  • unpredictable performance


SLIDE 25

Methodological choices


[Decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system | explicit processes]

SLIDE 26

Methodological choices


[Decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system | explicit processes → library]

SLIDE 27

Library

  • shared memory

– OpenMP: platform & architecture independent
– Posix Threads: Unix/Linux specific, architecture independent
– Intel Threading Building Blocks: platform/architecture independent


SLIDE 28

Library

  • distributed memory

– MPI & PVM

  • specialised hardware

– SIMD on MMX/SSE
– CUDA & OpenCL for GPU arrays


SLIDE 29

Library

  • now common to use:

– MPI for inter-cluster
– OpenMP for intra-cluster

  • medium to high effort

– explicit communication & task division

  • can shape algorithm to architecture
  • best for irregular problem/architecture


SLIDE 30

Library

  • often end up re-inventing some standard algorithmic skeleton

  • good potential for reuse of:

– structure
– components


SLIDE 31

Methodological choices


[Decision tree: START → automatic parallelisation | do it yourself → skeleton | programmed parallelisation → operating system | explicit processes → library | hand crafted]

SLIDE 32

Hand crafted

  • very low level
  • shared memory

– critical regions via semaphores

  • distributed memory

– communication over RS232; USB


SLIDE 33

Hand crafted

  • very high effort
  • highly problem/architecture specific
  • best for embedded systems


SLIDE 34

Questions...

  • is my problem suitable for parallelisation?
  • how do I know how my problem scales?
  • if I parallelise my problem, how do I tell how much communication overhead will be incurred?
  • how do I assess the benefits of shared versus distributed memory?

28th June, 2011 KTN ICT Scalable Applications & Services 34

SLIDE 35

Questions...

  • can I do better with smarter solutions on my existing technology?
  • where can I get help with deciding how to proceed?
  • have other people already come up with solutions that might work for me?


SLIDE 36

Future

  • UK has major research strengths in multi-processor architectures, parallel languages/compilers, skeletons etc
  • groups don’t talk much to each other or to practitioners e.g. in eScience
  • need to build inclusive UK community
  • opportunities through

– EPSRC multi-core priority for ICT
– TSB ICT KTN for multi-core