Parallel Programming and Heterogeneous Computing A2 - Parallel Hardware


  1. Parallel Programming and Heterogeneous Computing A2 - Parallel Hardware Max Plauth, Sven Köhler, Felix Eberhardt, Lukas Wenzel and Andreas Polze Operating Systems and Middleware Group

  2. Types of Parallel Hardware
     Task Level Parallelism: Multiple operations are executed in parallel.
     Data Level Parallelism: The same operation is applied in parallel to multiple units of data.
     [Figure: instruction streams (I) and data units (D) for each type of parallelism]
     ParProg 2020 A2 Parallel Hardware, Lukas Wenzel
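The two types on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the lecture material: the functions `total`, `largest` and `square` and the sample data are made up for the example, and Python threads show the program structure rather than a guaranteed speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def total(values):
    """One kind of operation: add up all values."""
    return sum(values)


def largest(values):
    """A different kind of operation: find the maximum."""
    return max(values)


def square(x):
    """The single operation that is applied to every unit of data."""
    return x * x


data = [1, 2, 3, 4]  # hypothetical sample data

with ThreadPoolExecutor(max_workers=4) as pool:
    # Task level parallelism: different operations are submitted side by side.
    futures = [pool.submit(total, data), pool.submit(largest, data)]
    task_results = [f.result() for f in futures]

    # Data level parallelism: the same operation is mapped over each data unit.
    data_results = list(pool.map(square, data))

print(task_results)
print(data_results)
```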

  3. Hardware Taxonomy [Flynn1966]
     Hardware is classified by its number of instruction streams and its number of data streams:
     SISD: Single Instruction stream, Single Data stream
     SIMD: Single Instruction stream, Multiple Data streams
     MISD: Multiple Instruction streams, Single Data stream
     MIMD: Multiple Instruction streams, Multiple Data streams

  4. Hardware Taxonomy [Flynn1966]
     Example instruction sequences for each class:
     SISD: LD A; LD B; ADD C A B; ST C; MUL A B 2; ST A
     SIMD: the same sequence, with each instruction applied to elements 0 to n of A, B and C
     MISD and MIMD: [Figure: several different instruction sequences running side by side]
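The SISD and SIMD columns of this slide can be mimicked in Python as a rough sketch. It assumes that `ADD C A B` means C = A + B and `MUL A B 2` means A = B * 2 (destination operand first), which the slide does not state. The function names are hypothetical, and each list comprehension merely stands in for one instruction that hardware would apply to all lanes at once.

```python
def sisd_program(a, b):
    """Single instruction stream, single data stream: scalar operands."""
    c = a + b  # ADD C A B (assumed: C = A + B)
    a = b * 2  # MUL A B 2 (assumed: A = B * 2)
    return a, c


def simd_program(a_lanes, b_lanes):
    """Single instruction stream, multiple data streams: one value per lane."""
    # ADD applied to lanes 0..n
    c_lanes = [a + b for a, b in zip(a_lanes, b_lanes)]
    # MUL applied to lanes 0..n
    a_lanes = [b * 2 for b in b_lanes]
    return a_lanes, c_lanes


scalar_result = sisd_program(1, 10)
vector_result = simd_program([1, 2, 3], [10, 20, 30])
print(scalar_result)
print(vector_result)
```

Lane 0 of the SIMD result matches the SISD result for the same inputs, which is the point of the slide: the instruction stream is identical, only the amount of data per instruction differs.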

  5. MISD Hardware
     Most exotic class of parallel hardware, not in mainstream use.
     ■ Redundant systems like safety-critical embedded controllers or high-reliability mainframes
     ■ Parallelism not for performance, but dependability
     Example: Triple Modular Redundant Architecture. The Input feeds Sub-Systems A, B and C, and a Voter combines their results into the Output.
     Not covered in this lecture.
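A minimal sketch of the voter in a triple modular redundant setup, assuming a simple majority vote over the three sub-system outputs. The names `vote`, `tmr`, `healthy` and `faulty` are invented for this illustration, and the sub-systems are called one after another here, whereas redundant hardware would run them in parallel.

```python
from collections import Counter


def vote(outputs):
    """Return the value produced by a strict majority of sub-systems."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among sub-system outputs")
    return value


def tmr(sub_systems, x):
    """Feed the same input to every sub-system and vote on the results."""
    return vote([sub_system(x) for sub_system in sub_systems])


def healthy(x):
    """Hypothetical correct sub-system."""
    return x + 1


def faulty(x):
    """Hypothetical sub-system with a fault."""
    return x + 100


masked = tmr([healthy, healthy, faulty], 1)
print(masked)
```

With one faulty sub-system the two healthy ones still outvote it, so the fault is masked at the output.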

  6. SIMD Hardware
     Popular class of parallel hardware for special purpose systems.
     ■ Vector processors. Early examples: ILLIAC IV, Cray-1, ...
     ■ Recently in widespread use: GPUs
     ■ Instruction Set Extensions (AltiVec, SSE, AVX, ...)
     [Figures: ILLIAC IV Control Unit, Cray-1, NVidia Pascal GPU Module]
     Covered in chapter C.

  7. MIMD Hardware
     Classic and most general class of parallel hardware.
     ■ Wide range of systems from Multicore CPUs to Supercomputers and Clusters
     ➢ Variety of architectures and characteristics requires further distinction
     [Figures: POWER9 Die with 24 Cores, Summit Supercomputer]

  8. MIMD Hardware Taxonomy
     MIMD is divided into SM-MIMD (Shared Memory) and DM-MIMD (Distributed Memory).
     SM-MIMD: Processing elements can directly access a common address space.
     DM-MIMD: Processing elements can access their private address spaces and exchange messages.
     [Figure: in SM-MIMD, tasks on several processing elements work on data in one Shared Memory; in DM-MIMD, each processing element keeps its data in Private Memory and sends messages over an Interconnect / Network]
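The difference between the two programming styles can be sketched as follows. This is a model only: Python threads always share one address space, so the second function merely imitates the discipline of private data plus explicit messages. All function names are hypothetical.

```python
import queue
import threading


def run_all(worker, chunks):
    """Start one thread per chunk and wait for all of them."""
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_sum(chunks):
    """SM-MIMD style: every worker updates one shared variable directly."""
    shared = {"total": 0}
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:  # coordinate access to the common address space
            shared["total"] += partial

    run_all(worker, chunks)
    return shared["total"]


def message_passing_sum(chunks):
    """DM-MIMD style: workers keep private data and send messages."""
    mailbox = queue.Queue()

    def worker(chunk):
        private_partial = sum(chunk)  # used only inside this worker
        mailbox.put(private_partial)  # explicit message to the collector

    run_all(worker, chunks)
    return sum(mailbox.get() for _ in chunks)


chunks = [[1, 2], [3, 4], [5]]  # hypothetical work split
print(shared_memory_sum(chunks), message_passing_sum(chunks))
```

The shared memory version needs a lock because all workers touch the same variable; the message passing version needs no lock but pays for every exchange with an explicit send and receive.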

  9. MIMD Hardware Taxonomy
     SM-MIMD, e.g. Multicore CPUs
     ■ Low interaction overhead due to high coupling between processing elements
     ~ Shared Memory Parallelism. Covered in chapter B.
     DM-MIMD, e.g. Clusters
     ■ Highly scalable due to low coupling between processing elements
     ~ Shared Nothing Parallelism. Covered in chapter D.
     Terminology: shared memory system vs. distributed memory system; SM-MIMD vs. DM-MIMD; Multiprocessor vs. Multicomputer; see [Tanenbaum1985], [Foster1995], [Pfister1998]

  10. SM-MIMD Hardware
      Processing elements can directly access a common address space.
      ■ Uniform memory access (UMA) system: Processing elements observe the same memory access characteristics over the entire memory.
      ➢ Simple to program against, but scalability issues
      ■ Non-uniform memory access (NUMA) system: Processing elements have different access characteristics for different memory regions.
      ➢ Scales well, but unaware programs can exhibit performance issues
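Why a NUMA-unaware program can run into performance issues can be seen in a toy cost model. The latencies below are hypothetical placeholders, not measurements of any real machine, and `average_access_time` is a name invented for this sketch.

```python
def average_access_time(local_ns, remote_ns, remote_fraction):
    """Weighted mean access time for a given share of remote accesses."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must lie between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


LOCAL_NS = 100.0   # hypothetical latency of memory in the own node
REMOTE_NS = 200.0  # hypothetical latency of memory in another node

# NUMA-aware program: keeps all of its data in local memory.
aware = average_access_time(LOCAL_NS, REMOTE_NS, 0.0)

# NUMA-unaware program: half of its accesses go to a remote node.
unaware = average_access_time(LOCAL_NS, REMOTE_NS, 0.5)

# UMA system: every region costs the same, so placement does not matter.
uma = average_access_time(LOCAL_NS, LOCAL_NS, 0.5)

print(aware, unaware, uma)
```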

  11. SM-MIMD Hardware
      SM-MIMD (Shared Memory) is further divided into UMA (Uniform Memory Access) and NUMA (Non-Uniform Memory Access).
      [Figure: arrangement of PEs and Memory for UMA, and of Nodes containing PEs and Memory for NUMA]

  12. DM-MIMD Hardware
      Processing elements can access their private address spaces and exchange messages.
      Cluster: Multiple independent machines connected through a network
      □ Compute cluster: Speedup
      □ Load Balancing cluster: Throughput
      □ High Availability cluster: Dependability
      All clusters are distributed systems, but only compute clusters are intended for parallel workloads.
      This lecture considers only compute clusters.

  13. DM-MIMD Hardware
      Simple way of scaling available compute resources: Just connect multiple machines in a network.
      Dominant architecture for High-End Systems, especially High-Performance Computing
      1995 Toy Story Render Farm: 117 nodes × 2 CPUs = 234 CPUs
      2001 Monsters Inc. Render Farm: 250 nodes × 14 CPUs = 3500 CPUs
      2019 Summit cluster (TOP500 #1 in 2019): 4608 nodes, 2 PB RAM, 10 MW power
        4608 nodes × 2 CPUs × 22 Cores = 202 752 Cores
        4608 nodes × 6 GPUs = 27 648 GPUs
      [Figures: Cluster of Desktop Computers, Cluster of RaspberryPI Singleboard Computers, Summit Cluster]
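The totals on this slide follow from multiplying the node count by the number of units per node. A short check of that arithmetic, using only the figures given on the slide (the helper name `total_units` is made up for this sketch):

```python
def total_units(nodes, *per_level):
    """Multiply the node count by each per-node or per-CPU factor."""
    total = nodes
    for factor in per_level:
        total *= factor
    return total


toy_story_cpus = total_units(117, 2)      # nodes x CPUs per node
monsters_inc_cpus = total_units(250, 14)  # nodes x CPUs per node
summit_cores = total_units(4608, 2, 22)   # nodes x CPUs x cores per CPU
summit_gpus = total_units(4608, 6)        # nodes x GPUs per node

print(toy_story_cpus, monsters_inc_cpus, summit_cores, summit_gpus)
```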

  14. Literature
      [Flynn1966] Flynn, Michael J. "Very High-Speed Computing Systems". Proceedings of the IEEE 54.12 (1966). IEEE
      [Tanenbaum1985] Tanenbaum, Andrew S. and Van Renesse, Robbert. "Distributed Operating Systems". ACM Computing Surveys 17.4 (1985). ACM
      [Foster1995] Foster, Ian. "Designing and Building Parallel Programs" (1995). Addison-Wesley
      [Pfister1998] Pfister, Gregory F. "In Search of Clusters". 2nd edition (1998). Prentice-Hall Inc

  15. And now for a break and a bowl of Sencha. *or beverage of your choice
