Multi-Processors and GPU
Philipp Koehn 2 May 2018
Philipp Koehn Computer Systems Fundamentals: Multi-Processors and GPU 2 May 2018
Predicted CPU Clock Speed
Clock speed 1971: 740 kHz; predicted for 2018: 45 GHz
Source: Kurzweil, "The Singularity is Near" (2005)
Clock speed 2018: 3 GHz
Intel estimate, around 2000: 400 kW by 2018?
Number of transistors per chip still growing exponentially
– each core has a local cache
– all cores share a common cache, common memory
e.g., cache coherence
matrix multiplication
– loops over different parts of the data
– instructions highly independent → can be executed in parallel
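The independence noted above can be sketched with C++11 threads: every output row of a matrix product depends only on the inputs, so rows can be split across threads without locking. This is a minimal sketch, not the lecture's code; the row-major flat layout and the names `Matrix`, `multiply_rows`, and `parallel_multiply` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Row-major n x n matrix stored in a flat vector (assumed layout).
using Matrix = std::vector<float>;

// Compute rows [row_begin, row_end) of C = A * B.
// Each output row reads only A and B, so different row ranges are independent.
void multiply_rows(const Matrix& a, const Matrix& b, Matrix& c,
                   std::size_t n, std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin; i < row_end; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

// Hypothetical helper: split the rows into contiguous chunks, one per thread.
Matrix parallel_multiply(const Matrix& a, const Matrix& b,
                         std::size_t n, std::size_t num_threads) {
    assert(num_threads > 0 && a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0f);
    std::vector<std::thread> workers;
    std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division
    for (std::size_t t = 0; t < num_threads; ++t) {
        std::size_t begin = t * chunk;
        std::size_t end = (begin + chunk < n) ? begin + chunk : n;
        if (begin >= end) break;  // more threads requested than rows available
        workers.emplace_back(multiply_rows, std::cref(a), std::cref(b),
                             std::ref(c), n, begin, end);
    }
    for (auto& w : workers) w.join();  // wait until every chunk is done
    return c;
}
```

For the 2 x 2 inputs `{1, 2, 3, 4}` and `{5, 6, 7, 8}`, `parallel_multiply(a, b, 2, 4)` returns `{19, 22, 43, 50}`. Threads write to disjoint rows of `c`, which is why no synchronization beyond `join` is needed.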
– pthread in C++
– thread in C++11
– thread in Python
– 3D models of objects
– lighting, textures
– ray tracing
– Atari (1972-1996)
– Nintendo/Wii (since 1977)
– PlayStation (since 1994)
– Xbox (since 2001)
Initially: dedicated hardware for core steps
(MT Issue)
– uni-processors (6502, Intel until 1990s)
– Intel Core i7
– multiple cores on a chip
– each core runs instructions that operate on their own data

– Streaming Multi-Processors
– multiple cores on a chip
– same instruction executed on different data
OpenGL
Direct3D
– lots of matrix multiplications
– lots of vector operations
– massive data sets
– identify parts of program to be handled by GPU
– define function to be executed by a thread
– define how many threads are used

– kernel = function to be executed by a thread
– thread block = set of threads to be executed in parallel
– thread grid = set of thread blocks
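// CPU version: one loop iterates over all n elements
void example(int n, float alpha, float *x, float *y) {
  for (int i = 0; i < n; i++)
    y[i] = alpha * x[i] + y[i];
}
example(n, 2.0, x, y);

// CUDA version: each thread handles one element
#define THREADS 256
__global__ void cuda_example(int n, float alpha, float *x, float *y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;  // global index of this thread
  if (i < n)  // the last block may contain surplus threads
    y[i] = alpha * x[i] + y[i];
}
int nblocks = (n + THREADS - 1) / THREADS;  // round up so all n elements are covered
cuda_example<<< nblocks, THREADS >>>(n, 2.0, x, y);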
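The index arithmetic of the CUDA version above can be checked without a GPU. The following is a sequential C++ emulation, offered only as a sketch: the `Dim` struct and the functions `saxpy_thread` and `launch_saxpy` are hypothetical stand-ins for CUDA's built-in index variables and launch syntax, and the loops run one after another where a GPU would run the threads in parallel.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's built-in blockIdx / blockDim / threadIdx.
struct Dim { int x; };

// Work done by one thread of one block: same index arithmetic as the kernel.
void saxpy_thread(Dim blockIdx, Dim blockDim, Dim threadIdx,
                  int n, float alpha, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n)                                      // guard for the last block
        y[i] = alpha * x[i] + y[i];
}

// Sequential emulation of a launch with nblocks blocks of `threads` threads.
// Returns the number of blocks in the emulated grid.
int launch_saxpy(int threads, int n, float alpha, const float* x, float* y) {
    assert(threads > 0 && n >= 0);
    int nblocks = (n + threads - 1) / threads;  // round up to cover all elements
    for (int b = 0; b < nblocks; ++b)           // every block in the grid
        for (int t = 0; t < threads; ++t)       // every thread in the block
            saxpy_thread(Dim{b}, Dim{threads}, Dim{t}, n, alpha, x, y);
    return nblocks;
}
```

With `n = 1000` and 256 threads per block, the grid has 4 blocks, so 1024 threads are created and the `if (i < n)` guard keeps the last 24 from writing past the end of the arrays.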
– executes same instruction
– on different data
– has own register file
– if threads diverge on conditional branches → execute different paths separately
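The cost of divergence can be illustrated with a small C++ emulation of a group of threads running in lockstep (a warp, in CUDA terminology). This is a sketch under assumptions: the 4-lane warp size, the example branch, and the function `run_divergent_branch` are invented for illustration, and real hardware handles the lane mask itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kWarpSize = 4;  // tiny warp, assumed for illustration

// Emulates the branch: if (x < 0) x = -x; else x = 2 * x;
// All lanes share one instruction stream, so each path taken by at least one
// lane is executed in its own pass, with the other lanes masked off.
// Returns the number of passes needed (1 if all lanes agree, 2 if they diverge).
int run_divergent_branch(std::array<int, kWarpSize>& x) {
    std::array<bool, kWarpSize> takes_if{};
    bool any_if = false, any_else = false;
    for (std::size_t lane = 0; lane < kWarpSize; ++lane) {
        takes_if[lane] = x[lane] < 0;  // evaluate the condition per lane
        any_if = any_if || takes_if[lane];
        any_else = any_else || !takes_if[lane];
    }
    int passes = 0;
    if (any_if) {  // pass over the "if" path: only lanes with a true condition
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
            if (takes_if[lane]) x[lane] = -x[lane];
        ++passes;
    }
    if (any_else) {  // pass over the "else" path: the remaining lanes
        for (std::size_t lane = 0; lane < kWarpSize; ++lane)
            if (!takes_if[lane]) x[lane] = 2 * x[lane];
        ++passes;
    }
    assert(passes == 1 || passes == 2);
    return passes;
}
```

For the input `{-1, 2, -3, 4}` the lanes disagree, so two passes are needed and the result is `{1, 4, 3, 8}`; for `{1, 2, 3, 4}` all lanes take the same path and a single pass suffices.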
– untyped bit arrays (8, 16, 32, 64 bits)
– unsigned integers (8, 16, 32, 64 bits)
– signed integers (8, 16, 32, 64 bits)
– floating point numbers (16, 32, 64 bits)
– add d, a, b → d = a+b
– mul d, a, b → d = a*b
– mad d, a, b, c → d = a*b+c
– mov d, a → d = a

– square root (sqrt)
– sine (sin)
– cosine (cos)
– binary logarithm (lg2)
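As a rough illustration of the arithmetic forms listed above, the destination-plus-operands pattern can be modeled as plain C++ functions. The names `add_f32`, `mul_f32`, `mad_f32`, and `saxpy_element` are hypothetical and only mirror the mnemonics; this is not an interface to the GPU instruction set, and rounding details of real hardware are ignored.

```cpp
#include <cassert>  // used by the checks in the usage note below

// d = a + b
float add_f32(float a, float b) { return a + b; }

// d = a * b
float mul_f32(float a, float b) { return a * b; }

// d = a * b + c (multiply-add in one operation)
float mad_f32(float a, float b, float c) { return a * b + c; }

// One element of y = alpha * x + y maps onto a single multiply-add.
float saxpy_element(float alpha, float x, float y) {
    return mad_f32(alpha, x, y);
}
```

A quick check such as `assert(saxpy_element(2.0f, 3.0f, 4.0f) == 10.0f);` confirms the mapping: the update `y[i] = alpha * x[i] + y[i]` needs one `mad` rather than a separate `mul` and `add`.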
call, ret
bar.sync forces all threads to synchronize
exit
(fast access, high bandwidth, lots of pins)
L2 cache associated with each DRAM chip
external DRAM (not on chip)
per streaming multiprocessor
in DRAM, but cached on chip
read-only, in DRAM