
Multi-Processors and GPU, Philipp Koehn, 2 May 2018


  1. Multi-Processors and GPU Philipp Koehn 2 May 2018 Philipp Koehn Computer Systems Fundamentals: Multi-Processors and GPU 2 May 2018

  2. Predicted CPU Clock Speed
      Clock speed 1971: 740 kHz, 2018: 45 GHz
      Source: Kurzweil, "The Singularity is Near" (2005)

  3. Actual CPU Clock Speed
      Clock speed 2018: 3 GHz

  4. Why?
      Intel estimate, around 2000: 400 kW by 2018?

  5. Moore’s Law
      Number of transistors per chip still growing exponentially

  6. What to do with the Transistors?
      • More parallelism → faster execution of instructions
      • More processors on a chip

  7. multi-processors

  8. Intel Core i7: Quad-Core

  9. Intel Xeon Phi: 72 cores (2017)

  10. Handling Multiple Processes
      • Kernel can keep multiple processes running
      • Each process is assigned to a core
        – each core has a local cache
        – all cores share a common cache, common memory
      • Synchronization between cores not trivial, e.g., cache coherence

  11. More Parallelism
      • Multiple processes not always the best way to parallelize
      • Often, within a process parallel execution would be helpful
      • Example: matrix multiplication
        – loops over different parts of the data
        – instructions highly independent → can be executed in parallel
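
The independence claimed for matrix multiplication can be made concrete with a small sketch. It assumes square matrices stored row-major in flat vectors; the names `Matrix`, `matmul_rows`, and `matmul_in_two_parts` are hypothetical and chosen only for illustration. Each call writes a disjoint set of output rows, which is what would allow the two calls to run on different cores.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Square n x n matrices stored row-major in flat vectors (an assumption of this sketch).
using Matrix = std::vector<float>;

// Compute rows [row_begin, row_end) of C = A * B.
// Each output row reads A and B but writes only its own row of C,
// so disjoint row ranges never conflict with each other.
void matmul_rows(const Matrix& a, const Matrix& b, Matrix& c,
                 std::size_t n, std::size_t row_begin, std::size_t row_end) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = row_begin; i < row_end; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Split the work into two independent halves. Here they run one after the
// other; because they share no output, they could run on two cores instead.
Matrix matmul_in_two_parts(const Matrix& a, const Matrix& b, std::size_t n) {
    Matrix c(n * n, 0.0f);
    matmul_rows(a, b, c, n, 0, n / 2);
    matmul_rows(a, b, c, n, n / 2, n);
    return c;
}
```

Calling `matmul_in_two_parts(a, b, n)` gives the same result as a single loop over all rows, because no row depends on any other row.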

  12. Multi-Threading
      • Parallel execution within process
      • No switching of process context (e.g., virtual address space)
      • Supported by various libraries
        – pthread in C and C++
        – std::thread in C++11
        – threading in Python
      • Programmer has to take care of conflicts
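
As a rough illustration of threads within one process and of the programmer's duty to avoid conflicts, the sketch below sums a vector with C++11 `std::thread`. The function name `parallel_sum` and the chunking scheme are assumptions made for this example, not something from the slides. All threads share the same address space, so the shared total is guarded by a `std::mutex`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Sum the values in data using num_threads threads of the same process.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    long long total = 0;     // shared by all threads
    std::mutex total_mutex;  // protects total

    auto worker = [&](std::size_t begin, std::size_t end) {
        long long local = 0;  // private to this thread, no locking needed
        for (std::size_t i = begin; i < end; ++i) {
            local += data[i];
        }
        // Only one thread at a time may update the shared total.
        std::lock_guard<std::mutex> lock(total_mutex);
        total += local;
    };

    // Give each thread one contiguous chunk (rounded up so all data is covered).
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads) {
        th.join();  // wait for every thread before reading total
    }
    return total;
}
```

Without the lock, two threads could read and write `total` at the same time and lose an update. Accumulating into a thread-private `local` first keeps the locked section short.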

  13. computer graphics

  14. Computer Graphics

  15. Computer Graphics

  16. tl;dr
      • Given
        – 3D models of objects
        – lighting, textures
        – ray tracing
      • Lots of vector and matrix operations
      • Color value for each pixel on the screen has to be computed

  17. High Demand
      • Computer games on regular PCs
      • Game consoles
        – Atari (1972-1996)
        – Nintendo/Wii (since 1977)
        – PlayStation (since 1994)
        – Xbox (since 2001)
      • 100s of millions sold

  18. history

  19. VGA Controller

  20. GPU

  21. Co-Processor
      • CPU handles the bulk of the complexity
      • GPU focuses on specific problems

  22. Graphics Pipeline
      Initially: dedicated hardware for core steps

  23. Unified GPU Architecture

  24. gpu

  25. Streaming Multiprocessor (SM)
      • Fetches instruction (I-Cache)
      • Has to apply it over a vector of data
      • Each vector element is processed in one thread (MT Issue)
      • Thread is handled by scalar processor (SP)
      • Special function units (SFU)

  26. Taxonomy
      • SISD (single instruction, single data)
        – uni-processors (6502, Intel until 1990s)
      • MIMD (multiple instruction, multiple data)
        – Intel Core i7
        – multiple cores on a chip
        – each core runs instructions that operate on its own data
      • SIMD (single instruction, multiple data)
        – Streaming Multi-Processors
        – multiple cores on a chip
        – same instruction executed on different data

  27. GPU Architecture

  28. Graphics Programming
      • Libraries that support all steps of graphics pipeline
      • Open standard: OpenGL
      • Microsoft: Direct3D
      • Libraries handle mapping to GPU hardware

  29. Direct3D Pipeline

  30. more uses for gpus

  31. Deep Learning

  32. Deep Learning
      • The latest machine learning hype
      • Computationally
        – lots of matrix multiplications
        – lots of vector operations
        – massive data sets
      • Just what GPUs are good at

  33. CUDA
      • Extension of C++ to support general GPU programming
      • Fairly low-level
        – identify parts of program to be handled by GPU
        – define function to be executed by a thread
        – define how many threads are used
      • Key concepts
        – kernel = function to be executed by a thread
        – thread block = set of threads to be executed in parallel
        – thread grid = set of thread blocks

  34. Example
      • Serial loop

          void example(int n, float alpha, float *x, float *y) {
            // y = alpha * x + y, one element per iteration
            for (int i = 0; i < n; i++)
              y[i] = alpha * x[i] + y[i];
          }
          example(n, 2.0, x, y);

      • Parallel with CUDA

          #define THREADS 256
          // __global__ marks a kernel: called from the CPU, executed on the GPU
          __global__ void cuda_example(int n, float alpha, float *x, float *y) {
            // each thread handles one element
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n)  // guard: the last block may extend past n
              y[i] = alpha * x[i] + y[i];
          }
          // round up so that nblocks * THREADS >= n
          int nblocks = (n + THREADS - 1) / THREADS;
          // x and y must point to GPU memory
          cuda_example<<< nblocks, THREADS >>>(n, 2.0, x, y);
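
The launch arithmetic in the CUDA example can be checked on a CPU. The following is a plain C++ emulation, not GPU code: two nested loops stand in for the blocks and the threads within a block, and the helper names `blocks_needed` and `emulated_saxpy` are hypothetical. It shows why the number of blocks is rounded up and why the `i < n` guard is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of blocks so that blocks * threads_per_block >= n (ceiling division).
int blocks_needed(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// CPU emulation of the kernel launch: one loop iteration per (block, thread) pair.
// On a GPU these iterations would run in parallel rather than in sequence.
void emulated_saxpy(int n, float alpha, const std::vector<float>& x,
                    std::vector<float>& y, int threads_per_block) {
    assert(x.size() >= static_cast<std::size_t>(n));
    assert(y.size() >= static_cast<std::size_t>(n));
    const int nblocks = blocks_needed(n, threads_per_block);
    for (int block_idx = 0; block_idx < nblocks; ++block_idx) {
        for (int thread_idx = 0; thread_idx < threads_per_block; ++thread_idx) {
            // Same index formula as blockIdx.x * blockDim.x + threadIdx.x.
            const int i = block_idx * threads_per_block + thread_idx;
            if (i < n) {  // the last block may contain threads past the end
                y[i] = alpha * x[i] + y[i];
            }
        }
    }
}
```

With `n = 1000` and 256 threads per block, `blocks_needed` returns 4, so 1024 emulated threads are created and the last 24 fail the guard and do nothing.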

  35. Memory Levels

  36. multiprocessor architecture

  37. Nvidia Titan V
      • 80 streaming multiprocessors, 5120 cores, 640 tensor cores
      • Clock speed 1455 MHz
      • Memory size 12 GB, bandwidth 650 GB/sec
      • Retail price $2999
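
The Titan V figures allow a back-of-the-envelope estimate of peak throughput. The sketch below rests on two assumptions that are not stated on the slide: each core completes one multiply-add per clock cycle, counted as 2 floating-point operations, and 1 GB means 10^9 bytes. The function names are hypothetical.

```cpp
#include <cassert>

// Peak single-precision throughput in TFLOP/s, assuming one multiply-add
// (counted as 2 floating-point operations) per core per clock cycle.
double peak_tflops(int cores, double clock_mhz) {
    assert(cores > 0 && clock_mhz > 0.0);
    return cores * 2.0 * clock_mhz * 1e6 / 1e12;
}

// Floating-point operations the chip could perform in the time it takes
// to load one 4-byte float from memory (assuming 1 GB = 1e9 bytes).
double flops_per_float_loaded(double tflops, double bandwidth_gb_per_s) {
    assert(tflops > 0.0 && bandwidth_gb_per_s > 0.0);
    const double floats_per_second = bandwidth_gb_per_s * 1e9 / 4.0;
    return tflops * 1e12 / floats_per_second;
}
```

Under these assumptions, `peak_tflops(5120, 1455)` comes to roughly 14.9 TFLOP/s, and `flops_per_float_loaded` with 650 GB/sec is roughly 90. A computation that does far fewer operations per value loaded would be limited by memory bandwidth, not by the cores.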

  38. Multithreaded Multiprocessor

  39. Single Instruction, Multiple Thread
      • Each scalar processor
        – executes same instruction
        – on different data
        – has own register file
      • Branch synchronization
        – if threads diverge on conditional branches → execute different paths separately
      • Shared memory
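
What "execute different paths separately" means can be emulated on a CPU. This is a simplified sketch and not a description of any particular hardware: a group of `kLanes` lanes (a hypothetical group size) evaluates the condition, then the "then" path runs with only the matching lanes active, followed by the "else" path with the remaining lanes. The function name `simt_branch` is made up for this example.

```cpp
#include <array>
#include <cassert>

constexpr int kLanes = 8;  // hypothetical group size, chosen small for illustration

// Emulates, for every lane:  if (x < 0) x = -x; else x = 2 * x;
// Returns the number of passes over the lanes that were needed (1 or 2).
int simt_branch(std::array<int, kLanes>& x) {
    // Step 1: every lane evaluates the condition on its own data.
    std::array<bool, kLanes> take_then{};
    int then_count = 0;
    for (int lane = 0; lane < kLanes; ++lane) {
        take_then[lane] = x[lane] < 0;
        if (take_then[lane]) ++then_count;
    }

    int passes = 0;
    if (then_count > 0) {  // pass for the "then" path: other lanes sit idle
        ++passes;
        for (int lane = 0; lane < kLanes; ++lane) {
            if (take_then[lane]) x[lane] = -x[lane];
        }
    }
    if (then_count < kLanes) {  // pass for the "else" path: remaining lanes
        ++passes;
        for (int lane = 0; lane < kLanes; ++lane) {
            if (!take_then[lane]) x[lane] = 2 * x[lane];
        }
    }
    return passes;
}
```

When all lanes agree on the condition, one pass is enough. When they diverge, both paths are executed one after the other, so a divergent branch costs the time of both paths.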

  40. instructions

  41. Basics
      • Design more similar to MIPS than x86
      • Various data types, each in different sizes
        – untyped bit arrays (8, 16, 32, 64 bits)
        – unsigned integers (8, 16, 32, 64 bits)
        – signed integers (8, 16, 32, 64 bits)
        – floating point (16, 32, 64 bits)

  42. Basic Instructions
      • Arithmetic instructions operate on registers
        – add d, a, b → d = a+b
        – mul d, a, b → d = a*b
        – mad d, a, b, c → d = a*b+c
        – mov d, a → d = a
      • Special functions handled by SFU processors
        – square root (sqrt)
        – sine (sin)
        – cosine (cos)
        – binary logarithm (lg2)

  43. Memory Access
      • Different memory spaces (global, shared, local, const)
      • Different data sizes (8, 16, 32, 64 bits)
      • Load (ld) and store (st)
      • Atomic memory read, write, add, min, max, and, ...
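
Atomic memory operations have a close CPU analogue in C++ `std::atomic`, which can serve as a rough illustration of what atomic add and atomic max guarantee. This is a sketch on CPU threads, not GPU code; `atomic_max`, `Stats`, and `count_and_max` are hypothetical names. The C++11 to C++20 standard library has no atomic max, so one is built here from compare-and-swap.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Atomic max built from compare-and-swap.
void atomic_max(std::atomic<int>& target, int value) {
    int current = target.load();
    // On failure, compare_exchange_weak reloads `current` with the latest
    // stored value, so the loop retries until the maximum is in place.
    while (current < value && !target.compare_exchange_weak(current, value)) {
    }
}

struct Stats {
    int count;
    int max;
};

// Several threads update two shared variables without any lock.
Stats count_and_max(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::atomic<int> count{0};
    std::atomic<int> max_seen{data.empty() ? 0 : data[0]};
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            // Thread t handles elements t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t i = t; i < data.size(); i += num_threads) {
                count.fetch_add(1);             // atomic add
                atomic_max(max_seen, data[i]);  // atomic max
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return {count.load(), max_seen.load()};
}
```

Each atomic operation reads, modifies, and writes the shared location as one indivisible step, so no update is lost even when many threads target the same address.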

  44. Control Flow
      • Branch (conditional on register value = 0)
      • Subroutine call: call, ret
      • Synchronization: bar.sync forces all threads to synchronize
      • Terminate thread: exit
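
The effect of a barrier such as bar.sync can be sketched with CPU threads. The `Barrier` class below is a minimal C++11 implementation written for this example (C++20 offers `std::barrier` as a ready-made alternative), and `neighbour_exchange` is a hypothetical two-phase computation: every thread writes its own slot, waits at the barrier, and only then reads a slot written by another thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier: wait() returns only after `count` threads have called it.
class Barrier {
public:
    explicit Barrier(int count) : threshold_(count), waiting_(0), generation_(0) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const int gen = generation_;
        if (++waiting_ == threshold_) {
            // Last thread to arrive: start a new generation and release the others.
            ++generation_;
            waiting_ = 0;
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int threshold_;
    int waiting_;
    int generation_;
};

// Phase 1: thread t writes stage[t]. Phase 2: thread t reads its neighbour's slot.
std::vector<int> neighbour_exchange(int num_threads) {
    assert(num_threads > 0);
    std::vector<int> stage(num_threads), result(num_threads);
    Barrier barrier(num_threads);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            stage[t] = t * t;
            barrier.wait();  // no thread reads until every thread has written
            result[t] = stage[(t + 1) % num_threads];
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return result;
}
```

Without the barrier, a fast thread could read its neighbour's slot before the neighbour has written it. The same pattern appears on a GPU when threads of a block exchange data through shared memory.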
