
Course Material: www.cs.umu.se/kurser/5DV011/VT12


  1. Course Material

Course material: www.cs.umu.se/kurser/5DV011/VT12
Lecture 1: Introduction (assignments, schedule, hand-outs, etc.)
Mikael Rännar, mr@cs.umu.se
Jerry Eriksson, jerry@cs.umu.se

Content
• Motivate and define parallel computations
• Design of parallel algorithms
• Overview of different classes of parallel systems
• Overview of different programming concepts
• Historic and current parallel systems
• Applications demanding HPC
  – Research within this area at the department

Goal
The goal of the course is to give basic knowledge about
– parallel computer hardware architectures
– design of parallel algorithms
– parallel programming paradigms and languages
– compiler techniques for automatic parallelization and vectorization
– areas of application in parallel computing
This includes knowledge about central ideas and classification systems, machines with shared and distributed memory, data and functional parallelism, parallel programming languages, scheduling algorithms, analysis of dependencies, and different tools supporting the development of parallel programs.

  2. Scientific Computing, '87 vs 2K9

Course evaluation vt-11:
• Assignment 2 too difficult
• Look for a new book

• 1987
  – Minisupercomputers (1-20 Mflop/s): Alliant, Convex, DEC
  – Parallel vector processors (PVP) (20-2000 Mflop/s)
• 2002
  – PCs (lots of them)
  – RISC workstations (500-4000 Mflop/s): DEC, HP, IBM, SGI, Sun
  – RISC-based symmetric multiprocessors (10-400 Gflop/s): IBM, Sun, SGI
  – Parallel vector processors (10-36000! Gflop/s): Fujitsu, Hitachi, NEC
  – Highly parallel processors (1-10000 Gflop/s): HP, IBM, NEC, Fujitsu, Hitachi
  – Earth Simulator: 5120 vector CPUs, 36 teraflop
• 2004 – IBM's Blue Gene project (65k CPUs), 136 teraflop
• 2005/6/7 – IBM's Blue Gene project (128k CPUs; 208k in 2007), 480 teraflop
• 2008 – IBM's Roadrunner (Cell), 1.1 petaflop
• 2009 – Cray XT5 (224162 cores), 1.75 petaflop
• 2010 – Tianhe-1A, 2.57 petaflop, NVIDIA GPUs
• 2011 – Fujitsu K computer, SPARC64 (705024 cores), 10.5 petaflop
(Pictured: Blue Gene at LLNL, Roadrunner at LANL)

  3. History at the Department / HPC2N

(Pictured: Jaguar at Oak Ridge NL, K computer)

Scientific applications (research at the department):
• BLAS/LAPACK
  – BLAS-2, matrix-vector operations
  – BLAS-3, matrix-matrix operations
  – LAPACK
• Linear algebra + eigenvalue problems
  – ScaLAPACK
• Nonlinear optimization
  – Neural networks
• Development environments
  – CONLAB/CONLAB compiler
• Functional languages

Machine history at the department/HPC2N:
• 1986: IBM 3090VF600 – shared memory, 6 processors with vector unit
• 1987: Intel iPSC/2 (32-128 nodes) – distributed-memory MIMD, hypercube with 64 nodes (i386 + 4M per node), 16 nodes with a vector board
• 199X: Alliant FX2800 – shared-memory MIMD, 17 i860 processors
• 1996: IBM SP – 64 thin nodes, 2 high nodes with 4 processors each
• 1997: SGI Onyx2 – 10 MIPS R10000
• 1998: 2-way POWER3
• 1999: Small Linux cluster
• 2001: Better POWER3
• 2002: Large Linux cluster, Seth (120 dual Athlon processors), Wolfkit SCI
• 2003: SweGrid Linux cluster, Ingrid, 100 nodes with Pentium 4
• 2004: 384-CPU Opteron cluster, Sarek, 1.7 Tflops peak, 79% HP-Linpack
• 2008: Linux cluster Akka, 5376 cores, 10.7 TB RAM, 46 teraflop HP-Linpack, ranked 39 on the Top 500 (June 2008)
• 2012: Linux cluster Abisko, 15264 cores (318 nodes with 4 AMD 12-core Interlagos)

  4. The Demand for Speed!

• Grand Challenge Problems
• Weather prediction
• Deep Blue
• Data analyses
• Cryptography

Example of applications:
• Global atmospheric circulation
• Simulations of different kinds
  – Differential equations (over time)
  – Discretization on a lattice
• Earthquakes

Technical applications:
• Simulate atom bombs (ASCI)
• Scientific visualization
  – Show large data sets graphically
• Signal and image analysis
• Reservoir modeling
  – Oil in Norway, for example
• Remote analysis of e.g. the Earth
  – Satellite data: adaptation, analysis, catalogization
• Movies and commercials
  – Star Wars etc.
• Searching on the Internet
• etc, etc, etc ...

More applications:
• VLSI design
  – Simulation: different gates on one level can be tested in parallel, as they act independently
  – Placement: move blocks randomly to minimize an objective function, e.g. cable length
  – Cable drawing
• Design
  – Simulate flows around objects like cars, aeroplanes, boats
  – Strength (hållfasthet) computations
  – Heat distribution

  5. Parallel Computations: Motive & Goal

Parallel computation: a collection of processors that communicate and cooperate to solve a large problem fast.
(Figure: processors connected via communication media)

• Manufacturing
  – Physical laws limit the speed of the processors
  – Moore's law
  – Price/performance
    • Cheaper to take many cheap and relatively fast processors than to develop one super-fast processor
    • Possible to use fewer kinds of circuits but use more of them
• Use
  – Decrease wall-clock time
  – Solve bigger problems

Why we're building parallel systems:
• Up to now, performance increases have been attributable to increasing density of transistors.
• But there are inherent problems.

A little physics lesson:
• Smaller transistors = faster processors.
• Faster processors = increased power consumption.
• Increased power consumption = increased heat.
• Increased heat = unreliable processors.

  6. Why We Need to Write Parallel Programs

• Running multiple instances of a serial program often isn't very useful.
  – Think of running multiple instances of your favorite game.
  – What you really want is for it to run faster.

Solution:
• Move away from single-core systems to multicore processors ("core" = central processing unit, CPU).
• Introducing parallelism!

Approaches to the serial problem:
• Rewrite serial programs so that they're parallel.
• Write translation programs that automatically convert serial programs into parallel programs.
  – This is very difficult to do.
  – Success has been limited.

More problems:
• Some coding constructs can be recognized by an automatic program generator and converted to a parallel construct.
• However, it's likely that the result will be a very inefficient program.
• Sometimes the best parallel solution is to step back and devise an entirely new algorithm.
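The "rewrite the serial program so that it's parallel" approach can be sketched with Python's standard multiprocessing module. This is a minimal illustration, not from the course: the workload f and the problem size are made up, and real speedup depends on the work being CPU-bound enough to outweigh process startup cost.

```python
# Minimal sketch: a serial loop rewritten as a parallel one using
# Python's standard multiprocessing module. The function f and the
# problem size are made-up illustrations.
from multiprocessing import Pool

def f(x):
    """Some per-element, CPU-bound work."""
    return x * x

def serial_sum(n):
    """The original serial version: one process does everything."""
    return sum(f(x) for x in range(n))

def parallel_sum(n, workers=4):
    """The rewritten version: the iteration space is split over workers."""
    with Pool(workers) as pool:
        return sum(pool.map(f, range(n)))

if __name__ == "__main__":
    # Both versions compute the same result.
    print(serial_sum(1000) == parallel_sum(1000))  # prints True
```

Note that this only restates the slide's point in code: the parallel version required restructuring the program (isolating f, splitting the range); simply launching several copies of the serial program would not have helped.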

  7. Can All Problems Be Solved in Parallel?

• Dig a hole in the ground: can it be parallelized? No.
• Dig a ditch: can it be parallelized? Yes.
• Data dependency: can you put a brick anywhere, anytime? No.

Design of parallel programs:
• Data partitioning
  – Distribute data on the different processors
• Granularity
  – Size of the parallel parts
• Load balancing
  – Make all processors have the same load
• Synchronization
  – Cooperate to produce the result

Parallel program design, example: Game of Life on a 2D net (see W-A page 190)
• Max 4 processors: coarse-grained, small amount of communication
• Max 16 processors: fine-grained, a lot of communication
• Communication time = α + βk

Load balancing:
Goal: all processors should do the same amount of work. Look at the following example.
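The slide's linear communication model, time = α + βk (startup latency plus a per-item cost times message size k), can be turned into a small calculation for the Game-of-Life example. The α and β values below are made-up illustration numbers, and row-block partitioning with ghost-row exchange is one common decomposition, assumed here for concreteness:

```python
# Sketch of the slide's linear communication model, time = alpha + beta * k,
# applied to row-block partitioning of an n x n Game-of-Life grid.
# alpha (startup latency) and beta (per-cell cost) are made-up numbers.

def comm_time_per_step(n, alpha=1e-5, beta=1e-8):
    """Each process exchanges two ghost rows of n cells per time step."""
    return 2 * (alpha + beta * n)

def work_per_step(n, p):
    """Cells each of the p processes updates per time step."""
    return n * n / p

n = 1024
for p in (4, 16):
    # Computation-to-communication ratio: higher is better.
    print(p, work_per_step(n, p) / comm_time_per_step(n))
```

Because the per-process communication cost stays the same while the per-process work shrinks, the ratio drops as p grows, which is the coarse-grained vs fine-grained trade-off the slide describes.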

  8. Load Balancing

Three mappings of an 8x8 grid onto 4 processors (Nr = resulting load per processor):
• Row-block mapping: each processor gets a block of consecutive rows.
  Proc.: 0  1  2  3
  Nr:    13 22 10 3
• Column-block mapping: each processor gets a block of consecutive columns.
  Proc.: 0  1  2  3
  Nr:    4  13 19 12
• Block-cyclic mapping: small blocks dealt out cyclically over a 2x2 processor grid.
  Proc.: 0  1  2  3
  Nr:    11 12 12 14  (the most even)

Flynn's Taxonomy

                              Number of data streams
                              Single             Multiple
  Number of       Single      SISD               SIMD
  instruction                 (von Neumann)      (vector, array)
  streams         Multiple    MISD               MIMD
                              (?)                (multiple micros)

• Flynn does not describe modernities like:
  – Pipelining (MISD?)
  – Memory model
  – Interconnection network

Paradigms
A paradigm is a model of the world that is used to formulate a computer solution to a problem.

Synchronous paradigms:
• Vector/array
  – Each processor is allotted a very small operation
  – Good when operations can be broken down into fine-grained steps
• Pipeline parallelism
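The block and cyclic row mappings from the load-balancing slide can be sketched in a few lines. The per-row work values below are hypothetical (chosen so that all the work sits in the top rows), not the counts from the slide's figure:

```python
# Sketch of row-block vs cyclic row mapping of an 8-row grid onto
# 4 processors. The per-row work values are made up for illustration:
# all the work is concentrated in the top rows.

def block_mapping(n_rows, p):
    """Row-block: row r goes to processor r // (n_rows // p)."""
    chunk = n_rows // p
    return [r // chunk for r in range(n_rows)]

def cyclic_mapping(n_rows, p):
    """Cyclic: row r goes to processor r % p."""
    return [r % p for r in range(n_rows)]

def loads(mapping, work, p):
    """Total work ending up on each processor under a given mapping."""
    totals = [0] * p
    for row, proc in enumerate(mapping):
        totals[proc] += work[row]
    return totals

work = [4, 4, 4, 4, 0, 0, 0, 0]   # hypothetical uneven per-row work
p = 4
print(loads(block_mapping(8, p), work, p))   # [8, 8, 0, 0] -- unbalanced
print(loads(cyclic_mapping(8, p), work, p))  # [4, 4, 4, 4] -- balanced
```

This mirrors the slide's conclusion: when the work is unevenly distributed over the grid, dealing rows (or small blocks) out cyclically spreads the load far more evenly than contiguous blocks.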

