AMD’s Unified CPU & GPU Processor Concept
Advanced Seminar Computer Engineering Sven Nobis
Institute of Computer Engineering (ZITI) University of Heidelberg
February 5, 2014
AMD’s Unified CPU & GPU Processor Concept Sven Nobis Introduction Background
CPU vs. GPU OpenCL & CUDA
Related Work The way to HSA
Heterogeneous Unified Memory Access
HSA
Concepts System Components Development Tools
Conclusion / Outlook References
1 Introduction 2 Background
3 Related Work 4 The way to HSA
5 Heterogeneous System Architecture
6 Conclusion / Outlook
Three eras of processor performance [8, P. 5]

Single-Core Era
Chart: single-thread performance over time ("we are here", followed by "?")
Enabled by: Moore’s Law, voltage scaling
Constrained by: power, complexity
Programming: Assembly, C/C++, Java, …

Multi-Core Era
Chart: throughput performance over time (# of processors) ("we are here")
Enabled by: Moore’s Law, SMP architecture
Constrained by: power, parallel SW, scalability
Programming: pthreads, OpenMP / TBB, …

Heterogeneous Systems Era
Chart: modern application performance over time (data-parallel exploitation) ("we are here")
Enabled by: abundant data parallelism, power-efficient GPUs
Temporarily constrained by: programming models, communication overhead
Programming: Shader, CUDA, OpenCL, C++ and Java
Programmability barrier
Communication costs
AMD’s Unified CPU & GPU Processor Concept?
→ Heterogeneous System Architecture (HSA)
[3, P. 4]
CUDA: proprietary, only for NVIDIA GPUs
OpenCL: open standard (ATI, NVIDIA, Intel, ...), not only GPUs
Platform Model
[10]
Execution Model
[5, P. 11]
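As a rough illustration of the execution model, and not taken from the cited figure: in OpenCL a kernel runs over an index space (NDRange) of work-items that are grouped into work-groups. The sketch below assumes a one-dimensional range with zero global offset and shows how a work-item's global ID relates to its work-group ID and local ID. The names `WorkItemId`, `global_id` and `decompose` are hypothetical helpers for this sketch; inside a real kernel the corresponding values come from `get_global_id`, `get_group_id` and `get_local_id`.

```cpp
#include <cstddef>

// Hypothetical helper type: position of one work-item in a 1-D NDRange.
struct WorkItemId {
    std::size_t group_id;  // which work-group the item belongs to
    std::size_t local_id;  // index of the item inside its work-group
};

// Global ID of a work-item, assuming a zero global offset:
//   global_id = group_id * local_size + local_id
constexpr std::size_t global_id(WorkItemId id, std::size_t local_size) {
    return id.group_id * local_size + id.local_id;
}

// Inverse mapping: split a global ID into work-group ID and local ID.
constexpr WorkItemId decompose(std::size_t gid, std::size_t local_size) {
    return WorkItemId{gid / local_size, gid % local_size};
}
```

For example, with a work-group size of 64, the work-item with local ID 3 in work-group 2 has global ID 2 * 64 + 3 = 131.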
Unified Virtual Addressing (UVA) in CUDA 4
Unified Memory in CUDA 6
→ Developer's view of the memory
Implicit copy & pinning
Shared Virtual Memory
Llano
[3, P. 2] [7, P. 7]
Multiple virtual memory address spaces [2, P. 7]
Different/partitioned physical memory per compute unit
Diagram: CPU0 uses VIRTUAL MEMORY1 (VA1->PA1), GPU uses VIRTUAL MEMORY2 (VA2->PA1), both on top of PHYSICAL MEMORY

Common virtual memory for all HSA agents [2, P. 8]
Same physical memory
Same virtual memory for all compute units
Diagram: CPU0 and GPU share one VIRTUAL MEMORY (VA->PA for both) on top of PHYSICAL MEMORY
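The difference between the two diagrams can be sketched with a toy model. This is only an illustration under stated assumptions: a page table is modelled as a map from virtual page number to physical frame number, and `Agent`, `PageTable` and `pointer_is_portable` are hypothetical names, not part of any HSA API. With separate page tables, a virtual address that is valid on the CPU means nothing on the GPU, even if both agents can reach the same physical frame. With one shared page table, the same virtual address resolves to the same frame on both.

```cpp
#include <cstdint>
#include <map>
#include <memory>

// Toy page table: virtual page number -> physical frame number.
using PageTable = std::map<std::uint64_t, std::uint64_t>;

// Hypothetical model of a compute agent (CPU or GPU) with its page table.
struct Agent {
    std::shared_ptr<PageTable> page_table;

    // Looks up a virtual page; returns false if the page is not mapped.
    bool translate(std::uint64_t vpage, std::uint64_t& frame) const {
        auto it = page_table->find(vpage);
        if (it == page_table->end()) return false;
        frame = it->second;
        return true;
    }
};

// True if a virtual page handed from one agent to another refers to the
// same physical frame on both, i.e. a raw pointer can be passed as-is.
bool pointer_is_portable(const Agent& from, const Agent& to,
                         std::uint64_t vpage) {
    std::uint64_t a = 0, b = 0;
    return from.translate(vpage, a) && to.translate(vpage, b) && a == b;
}
```

In the first diagram the CPU maps VA1 and the GPU maps VA2 to the same frame, so `pointer_is_portable` is false for VA1. When both agents hold the same `page_table`, it is true, which is what lets pointer-based data structures be passed between agents without translation or copying.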
Shared page table support
Same large address space as the CPU
Page faulting
Coherent memory regions
Fully coherent shared memory model, like on today’s SMP CPU systems
Already mentioned with hUMA
Unified Programming Model
→ Treating the GPU as a remote processor
Programming languages like C++
Task-parallel and data-parallel APIs like C++ AMP

#include <iostream>
#include <amp.h>
using namespace concurrency;

int main() // "Hello World" in C++ AMP
{
    int v[11] = {'G', 'd', 'k', 'k', 'n', 31, 'v', 'n', 'q', 'k', 'c'};
    array_view<int> av(11, v);
    parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) {
        av[idx] += 1;
    });
    for (unsigned int i = 0; i < av.extent.size(); i++)
        std::cout << static_cast<char>(av(i));
}
[6]
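For readers without a C++ AMP toolchain, the computation in the sample can be followed with a CPU-only stand-in. This is a sketch in standard C++, not the C++ AMP API: `decode` is a hypothetical helper, and `std::for_each` takes the place of `parallel_for_each`, so nothing runs on an accelerator here. Each element is incremented by one, which turns the encoded array into readable text.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// CPU-only stand-in for the kernel in the sample: adds 1 to every
// element, then converts the values to characters.
std::string decode(std::vector<int> v) {
    std::for_each(v.begin(), v.end(), [](int& x) { x += 1; });

    std::string text;
    text.reserve(v.size());
    for (int x : v)
        text.push_back(static_cast<char>(x));
    return text;
}
```

Calling `decode` with the array from the sample yields "Hello world" ('G' + 1 is 'H', 31 + 1 is the space character, and so on).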
Queuing - Current
[5, P. 9]
Queuing - New!
[5, P. 9]
HSA Intermediate Language (HSAIL)
Bytecode
Designed for data-parallel programming
GPU independent
Translated to the hardware instruction set of the current device
Compilation Stack
Runtime Stack
System (Kernel) Software
Compilation Stack
[5, P. 15]
Runtime Stack
[5, P. 16]
OpenCL
BOLT Library
SIMPLE BOLT EXAMPLE

#include <bolt/sort.h>
#include <vector>
#include <algorithm>
#include <cstdlib>

int main()
{
    // generate random data (on host)
    std::vector<int> a(1000000);
    std::generate(a.begin(), a.end(), rand);

    // sort, run on best device
    bolt::sort(a.begin(), a.end());
}
[9, P. 5]
BOLT and C++ AMP
BOLT FOR C++ AMP : USER-SPECIFIED FUNCTOR

#include <bolt/transform.h>
#include <vector>

struct SaxpyFunctor
{
    float _a;
    SaxpyFunctor(float a) : _a(a) {}

    // callable on both the CPU and a C++ AMP accelerator
    float operator() (const float &xx, const float &yy) restrict(cpu,amp)
    {
        return _a * xx + yy;
    }
};

int main()
{
    SaxpyFunctor s(100);
    std::vector<float> x(1000000); // initialization not shown
    std::vector<float> y(1000000); // initialization not shown
    std::vector<float> z(1000000);

    bolt::transform(x.begin(), x.end(), y.begin(), z.begin(), s);
}
[9, P. 6]
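The functor-plus-transform pattern mirrors the C++ standard library, which makes it easy to check what the example computes. The following is a minimal sketch in standard C++ only, assuming `std::transform` as a CPU stand-in for `bolt::transform`. The `restrict(cpu,amp)` qualifier is a C++ AMP extension and is dropped here, and the `saxpy` wrapper is a hypothetical helper added for illustration.

```cpp
#include <algorithm>
#include <vector>

// Same shape as the functor in the slide, without the C++ AMP
// restrict(cpu,amp) qualifier, which standard C++ does not have.
struct SaxpyFunctor {
    float a;
    explicit SaxpyFunctor(float a_) : a(a_) {}

    float operator()(const float& x, const float& y) const {
        return a * x + y;
    }
};

// Computes z[i] = a * x[i] + y[i] on the CPU.
// Assumes y has at least as many elements as x.
std::vector<float> saxpy(float a,
                         const std::vector<float>& x,
                         const std::vector<float>& y) {
    std::vector<float> z(x.size());
    std::transform(x.begin(), x.end(), y.begin(), z.begin(),
                   SaxpyFunctor(a));
    return z;
}
```

With a = 100, x = {1, 2} and y = {3, 4}, the result is {103, 204}. The point of the Bolt version is that the same call shape can be dispatched to a GPU without changing the user's functor.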
Simplifies development
Opens up new possibilities
Missing hardware with hUMA
→ Outlook
Software components not ready
Kaveri APU is available [1]
Desktop APU
Support for hUMA and Queuing
Can connect both DDR3 and GDDR5 [11]
Berlin
ARM-based: Seattle
[11]