
Lecture 1: Introduction

Mikael Rännar

mr@cs.umu.se

Jerry Eriksson

jerry@cs.umu.se

Course Material

Course material:

www.cs.umu.se/kurser/5DV011/VT12

assignments, schedule, hand-outs, etc

Content

  • Motivate and define parallel computations
  • Design of parallel algorithms
  • Overview of different classes of parallel systems
  • Overview of different programming concepts
  • Historic and current parallel systems
  • Applications demanding HPC
– Research within this area at the department

Goal

The goal of the course is to give basic knowledge about

  • parallel computer hardware architectures
  • design of parallel algorithms
  • parallel programming paradigms and languages
  • compiler techniques for automatic parallelization and vectorization
  • areas of application in parallel computing

This includes knowledge about central ideas and classification systems, machines with shared and distributed memory, data and functional parallelism, parallel programming languages, scheduling algorithms, dependence analysis, and different tools supporting the development of parallel programs.

slide-2
SLIDE 2

Course evaluation VT-11

  • Assignment 2 too difficult
  • Look for a new book

Scientific Computing: 1987 vs. the 2000s

  • 1987
– Minisupercomputers (1-20 Mflop/s): Alliant, Convex, DEC
– Parallel vector processors (PVP) (20-2000 Mflop/s)

  • 2002: PCs (lots of them)
– RISC workstations (500-4000 Mflop/s): DEC, HP, IBM, SGI, Sun
– RISC-based symmetric multiprocessors (10-400 Gflop/s): IBM, Sun, SGI
– Parallel vector processors (10-36000! Gflop/s): Fujitsu, Hitachi, NEC
– Highly parallel processors (1-10000 Gflop/s): HP, IBM, NEC, Fujitsu, Hitachi
– Earth Simulator: 5120 vector CPUs, 36 teraflop

  • 2004: IBM's Blue Gene Project (65k CPUs), 136 teraflop
  • 2005-2007: IBM's Blue Gene Project (128k CPUs; 208k in 2007), 480 teraflop
  • 2008: IBM's Roadrunner, Cell, 1.1 petaflop
  • 2009: Cray XT5 (224162 cores), 1.75 petaflop
  • 2010: Tianhe-1A, 2.57 petaflop, NVIDIA GPUs
  • 2011: Fujitsu K computer, SPARC64 (705024 cores), 10.5 petaflop

[Images: Blue Gene (LLNL), Roadrunner (LANL), Jaguar (Oak Ridge NL), K computer]

History at the department/HPC2N

  • 1986: IBM 3090VF600

– Shared memory, 6 processors with vector unit

  • 1987: Intel iPSC/2: 32-128 nodes

– Distributed-memory MIMD, hypercube with 64 nodes (i386 + 4 MB per node)
– 16 nodes with a vector board

  • 199X: Alliant FX2800

– Shared-memory MIMD machine, 17 i860 processors

  • 1996: IBM SP

– 64 Thin nodes, 2 High nodes with 4 processors each

  • 1997: SGI Onyx2

– 10 MIPS R10000

  • 1998: 2-way POWER3
  • 1999: Small Linux cluster
  • 2001: Better POWER3
  • 2002: Large Linux cluster, Seth (120 dual Athlon processors), Wolfkit SCI
  • 2003: SweGrid Linux cluster, Ingrid, 100 nodes with Pentium4
  • 2004: 384-CPU cluster (Opteron) Sarek, 1.7 Tflops peak, 79% HP-Linpack
  • 2008: Linux cluster Akka, 5376 cores, 10.7 TB RAM, 46 Teraflop HP-Linpack, ranked 39 on the Top 500 (June 2008)

  • 2012: Linux cluster Abisko, 15264 cores (318 nodes, each with 4 AMD 12-core Interlagos)


Scientific applications

(Research at the department)

  • BLAS/LAPACK

– BLAS-2: matrix-vector operations
– BLAS-3: matrix-matrix operations
– LAPACK

  • Linear algebra + eigenvalue problems

– ScaLAPACK

  • Nonlinear optimization

– Neural networks

  • Development environments

– CONLAB/CONLAB-compiler

  • Functional languages

The Demand for Speed!

  • Grand Challenge Problems
  • Simulations of different kind
  • Deep Blue
  • Data analyses
  • Cryptography

Example of applications

  • Global atmospheric circulation
  • Weather prediction

– Differential equations (over time)
– Discretization on a lattice (see the stencil sketch after this list)

  • Earthquakes
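As a minimal illustration of discretizing a differential equation on a lattice (a generic 1D heat-equation sketch in C, not the course's weather code), each lattice point is updated from its neighbours at every time step, so once boundary values are shared the points can be updated in parallel:

    #include <stdio.h>

    #define N 100                           /* lattice points */

    int main(void)
    {
        double u[N] = {0}, unew[N] = {0};
        u[N / 2] = 1.0;                     /* initial heat spike */
        double r = 0.25;                    /* dt/dx^2 diffusion factor, <= 0.5 for stability */

        for (int step = 0; step < 1000; step++) {
            /* Explicit finite-difference update: each point depends only on
             * its neighbours' old values, so the lattice can be split among
             * processors that exchange boundary points each step. */
            for (int i = 1; i < N - 1; i++)
                unew[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
            for (int i = 1; i < N - 1; i++)
                u[i] = unew[i];
        }
        printf("u[N/2] = %f\n", u[N / 2]);
        return 0;
    }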

Technical applications

  • VLSI-design

– Simulation: different gates on one level can be tested in parallel, as they act independently
– Placement: move blocks randomly to minimize an objective function, e.g. cable length
– Cable routing

  • Design

– Simulate flows around objects like cars, aeroplanes, boats
– Structural strength (hållfasthet) computations
– Heat distribution

More Applications

  • Simulate atom bombs (ASCI)
  • Scientific visualization

– Show large data sets graphically

  • Signal and Image Analysis
  • Reservoir modeling

– Oil in Norway for example

  • Remote analysis of, e.g., the Earth

– Satellite data: adaptation, analysis, cataloguing

  • Movies and commercials

– Star Wars etc

  • Searching on the Internet
  • etc, etc, etc, etc ....

Parallel computations!

A collection of processors that communicate and cooperate to solve a large problem fast.

[Figure: processors connected through a communication medium]

Motive & Goal

  • Manufacturing
– Physical laws limit the speed of processors
– Moore's law
– Price/performance: cheaper to take many cheap and relatively fast processors than to develop one super-fast processor
– Possible to use fewer kinds of circuits, but more of them
  • Use
– Decrease wall-clock time
– Solve bigger problems

Why we’re building parallel systems

Up to now, performance increases have been attributable to the increasing density of transistors. But there are inherent problems.

A little physics lesson

Smaller transistors = faster processors. Faster processors = increased power consumption. Increased power consumption = increased heat. Increased heat = unreliable processors.


Solution

Move away from single-core systems to multicore processors.

"core" = central processing unit (CPU)

Introducing parallelism!!!

Why we need to write parallel programs

Running multiple instances of a serial program often isn't very useful. Think of running multiple instances of your favorite game. What you really want is for it to run faster.

Approaches to the serial problem

Rewrite serial programs so that they're parallel.

Write translation programs that automatically convert serial programs into parallel programs. This is very difficult to do; success has been limited.

More problems

Some coding constructs can be recognized by an automatic program generator and converted to a parallel construct. However, it's likely that the result will be a very inefficient program.

Sometimes the best parallel solution is to step back and devise an entirely new algorithm.


Can all problems be solved in parallel?

  • Data dependency: can you put a brick anywhere, anytime? Yes/No
  • Dig a ditch: can it be parallelized? Yes/No
  • Dig a hole in the ground: can it be parallelized? Yes/No

Design of parallel programs

  • Data Partitioning

– distribute data on the different processors

  • Granularity

– size of the parallel parts

  • Load Balancing

– Make all processors have the same load

  • Synchronization

– Cooperate to produce the result

Parallel program design, example

Game-of-Life on a 2D grid (see W-A page 190)

  • Coarse-grained partitioning: max 4 processors, small amount of communication
  • Fine-grained partitioning: max 16 processors, a lot of communication

Communication time = α + βk, where α is the start-up latency and β the per-word cost for a k-word message (see the sketch below).
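As a rough illustration of granularity, here is a minimal serial sketch (in C) of one Game-of-Life step over one processor's block of the grid; the block bounds and the halo border are illustrative assumptions, not the book's code. In a parallel version each processor updates only its own block and exchanges the one-cell-wide border with its neighbours, so each border message costs roughly α + βk:

    /* One Game-of-Life step over the block [r0,r1) x [c0,c1) of an
     * (n+2) x (n+2) grid whose outer ring is a halo border filled by
     * communication with neighbouring processors. */
    void life_step_block(int n, unsigned char old[n + 2][n + 2],
                         unsigned char new_[n + 2][n + 2],
                         int r0, int r1, int c0, int c1)
    {
        for (int i = r0; i < r1; i++) {
            for (int j = c0; j < c1; j++) {
                int live = 0;               /* count the 8 neighbours */
                for (int di = -1; di <= 1; di++)
                    for (int dj = -1; dj <= 1; dj++)
                        if (di || dj)
                            live += old[i + di][j + dj];
                /* standard rules: survive with 2-3 neighbours, birth with 3 */
                new_[i][j] = (old[i][j] && (live == 2 || live == 3)) ||
                             (!old[i][j] && live == 3);
            }
        }
    }

With 4 large blocks each processor does much work per border exchanged (coarse-grained); with 16 small blocks the work per block shrinks while the number of border messages grows (fine-grained).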

Load Balancing

Goal: All processors should do the same amount of work

Look at the following example:

slide-8
SLIDE 8

Load Balancing

[Figure: an irregular region of grid points mapped onto four processors in three ways; the counts give points per processor.]

  • Row block mapping: 13, 22, 10, 3 points per processor
  • Column block mapping: 4, 13, 19, 12 points per processor
  • Block-cyclic mapping: 11, 12, 12, 14 points per processor (the most even; see the mapping sketch below)
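A minimal sketch (in C) of how such mappings are typically computed; the function names are illustrative, not from the course code. Block mapping gives each processor a contiguous chunk of rows, while cyclic mapping deals rows out round-robin, which tends to even out irregular work:

    #include <stdio.h>

    /* Owner of row i under a block mapping (assumes p <= n):
     * the first n%p processors get ceil(n/p) rows, the rest floor(n/p). */
    int block_owner(int i, int n, int p)
    {
        int big = n / p + 1, r = n % p;     /* r processors own 'big' rows */
        return (i < r * big) ? i / big : r + (i - r * big) / (n / p);
    }

    /* Owner of row i under a cyclic mapping: rows dealt round-robin. */
    int cyclic_owner(int i, int p) { return i % p; }

    int main(void)
    {
        int n = 8, p = 3;
        for (int i = 0; i < n; i++)
            printf("row %d: block -> P%d, cyclic -> P%d\n",
                   i, block_owner(i, n, p), cyclic_owner(i, p));
        return 0;
    }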

Flynn's Taxonomy

  • Flynn does not describe modernities like
– Pipelining (MISD?)
– Memory model
– Interconnection network

Number of instruction streams vs. number of data streams:

                          Single data           Multiple data
  Single instruction      SISD (von Neumann)    SIMD (vector, array)
  Multiple instructions   MISD (?)              MIMD (multiple micros)

Paradigms

A model of the world that is used to formulate a computer solution to a problem

Synchronous paradigms

Vector/Array

  • Each processor is allotted a very small operation
  • Pipeline parallelism
  • Good when operations can be broken down into fine-grained steps


Synchronous paradigms SIMD

  • Data parallel!
  • All processors do the same thing at the same time, or are idle

  • Phase 1:

– Data partitioning and distribution

  • Phase 2:

– Data parallel work

  • Good for large regular data structures

Asynchronous paradigms MIMD

  • The processors work independently of each other
  • Must be synchronized

– Message passing
– Mutual exclusion (locks); see the lock sketch after this list

  • Best for coarse-grained problems
  • Shared memory

– Virtually and physically shared
– UMA, NUMA, COMA, CC-NUMA

  • Distributed memory

– Highly parallel systems, NOWs (networks of workstations), COWs (clusters of workstations)
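A minimal sketch of mutual exclusion with locks in shared memory, using POSIX threads (an assumption; the course may use other primitives). Without the mutex, the two threads' increments of the shared counter could interleave and lose updates:

    /* compile with: cc -pthread lock.c */
    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;                        /* shared state */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);              /* enter critical section */
            counter++;                              /* safe: one thread at a time */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, NULL);
        pthread_create(&t2, NULL, work, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);         /* 2000000 with the lock */
        return 0;
    }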

Shared Memory Architectures

  • All processors have access to a global address space
– UMA, NUMA
  • Access to the shared memory can be by a bus or a switched network
  • The hardware does not scale well to massively parallel levels

[Figure: processors P ... P connected to a shared Memory through a memory bus/switching network]

Distributed Memory Architectures

  • Each node has its own local memory (no shared address space)
  • The processors communicate with each other over a network by using messages (see the message-passing sketch after this slide)
  • The network topology can be static or dynamic
  • The hardware scales well; programming is more difficult than with shared memory
  • Computations are much faster than communication

[Figure: nodes, each a processor p with local memory m, connected by a network.]

Topologies: mesh, ring, linear array, 2D-torus, 3D-mesh, 3D-torus, tree, fat tree, hypercube, star, Vulcan switch, cube-connected cycles, omega, crossbar, etc.
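A minimal sketch of message passing between two nodes, using MPI (an assumption here; MPI appears later among the programming approaches). Rank 0 sends an array to rank 1, which matches the α + βk communication-time model for a k-word message:

    /* run with at least 2 processes: mpirun -np 2 ./a.out */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double buf[1000];                           /* k = 1000 words */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            for (int i = 0; i < 1000; i++) buf[i] = i;
            MPI_Send(buf, 1000, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(buf, 1000, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received 1000 words, last = %.0f\n", buf[999]);
        }

        MPI_Finalize();
        return 0;
    }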


SPMD, Single Program Multiple Data

– Asynchronous data parallel processing
– Software equivalent to SIMD
– Execute the same program, but on different data, asynchronously (see the sketch below)
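A minimal SPMD sketch (MPI assumed, as above): every process runs this same program, and the rank decides which slice of the data each one works on:

    #include <mpi.h>
    #include <stdio.h>

    #define N 1000000

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Same program everywhere; the rank selects this process's slice. */
        int lo = (int)((long long)N * rank / size);
        int hi = (int)((long long)N * (rank + 1) / size);

        double local = 0.0;
        for (int i = lo; i < hi; i++)
            local += (double)i;                     /* work on own slice only */

        double total;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %.0f\n", total);

        MPI_Finalize();
        return 0;
    }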

Control vs Data parallelism

  • Control parallelism (instruction parallelism)

– use parallelism in the control structures of a program
– independent parts of a program execute in parallel

  • Data parallelism

– one processor per data element (block of data)
– each processor needs separate data memory
– millions of processors can be applied to large problems

A sketch contrasting the two kinds of parallelism follows below.
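A minimal sketch using OpenMP (an assumption; OpenMP appears below among the language extensions). The parallel for divides one operation across the data elements, while the sections run independent program parts concurrently:

    /* compile with: cc -fopenmp parallelism.c */
    #include <stdio.h>

    #define N 1000

    void task_a(void) { printf("task A\n"); }       /* independent parts */
    void task_b(void) { printf("task B\n"); }

    int main(void)
    {
        double a[N];

        /* Data parallelism: the same operation on different elements. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] = 2.0 * i;

        /* Control parallelism: independent parts run concurrently. */
        #pragma omp parallel sections
        {
            #pragma omp section
            task_a();
            #pragma omp section
            task_b();
        }

        printf("a[N-1] = %.1f\n", a[N - 1]);
        return 0;
    }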

Parallel Programming – Implicitly

  • Old Fortran, C, ...

– Lots of dependencies between different parts of the program
– The compiler must find all dependencies (see the example below)
– The compiler restructures the program to expose more parallelism
– Advantage: backwards compatible with existing programs

  • New languages and extensions give more parallelism

– Fortran 90
– HPF
– OpenMP
– MPI
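A minimal sketch (with hypothetical loops) of what dependence analysis must check. The first loop has no loop-carried dependence, so an auto-parallelizer can run its iterations concurrently; the second carries a dependence from iteration i-1 to i and cannot be parallelized as written (a prefix sum needs a different algorithm, echoing the earlier point about devising new algorithms):

    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        double a[N], s[N];
        for (int i = 0; i < N; i++) a[i] = i;
        s[0] = a[0];

        /* Independent iterations: safe to parallelize automatically. */
        for (int i = 0; i < N; i++)
            a[i] = a[i] * 2.0 + 1.0;

        /* Loop-carried dependence: s[i] needs s[i-1], so these
         * iterations must run in order. */
        for (int i = 1; i < N; i++)
            s[i] = s[i - 1] + a[i];

        printf("a[N-1] = %.1f, s[N-1] = %.1f\n", a[N - 1], s[N - 1]);
        return 0;
    }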
