Ed Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen Hunt
MICROSOFT RESEARCH
Helios: Heterogeneous Multiprocessing with Satellite Kernels
Once upon a time
Hardware was homogeneous
[Diagram: single CPU with RAM]
[Diagram: SMP, two CPUs sharing RAM]
[Diagram: NUMA, two groups of CPUs, each with its own RAM]
[Diagram: the NUMA system joined by a GP-GPU and a programmable NIC, each with its own RAM]
Simplify app development, deployment, and tuning
Provide a single programming model for heterogeneous systems
Satellite kernels: Same OS abstraction everywhere
Remote message passing: Transparent IPC between kernels
Affinity: Easily express arbitrary placement policies to the OS
2-phase compilation: Run apps on arbitrary devices
Entire networking stack
Entire file system
Arbitrary applications
Eliminate resource contention with multiple kernels
Eliminate remote memory accesses
Satellite kernels
Remote message passing
Affinity
Encapsulating many ISAs
[Diagram: a kernel on the CPU running two apps, with a driver for an I/O device]
[Diagram: the same system with a programmable device; labels: JIT, Sched., Mem., IPC]
[Diagram: CPU and programmable device running FS and App]
Efficiently manage local resources
Apps developed for single system call interface
μkernel: Scheduler, memory manager, namespace manager
[Diagram, three builds: kernels on a programmable device and two NUMA domains, running App, FS, and TCP]
/fs
/dev/nic0
/dev/disk0
/services/TCP
/services/PNGEater
/services/kernels/ARMv5
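The namespace entries above suggest how IPC can stay transparent across kernels: a process connects to a path and does not need to know where the service lives. The following is a minimal sketch of that idea, not Helios code. The `Namespace`, `Endpoint`, and `Channel` classes and the kernel names `nic0` and `x86-0` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    path: str    # namespace path, e.g. "/services/TCP"
    kernel: str  # hypothetical name of the kernel hosting the service


@dataclass(frozen=True)
class Channel:
    endpoint: Endpoint
    transport: str  # "local" or "remote", chosen by the namespace, not the caller

    def send(self, msg: str) -> str:
        # Stand-in for real message delivery: report which path was taken.
        return f"{self.transport}:{self.endpoint.path}:{msg}"


class Namespace:
    """Single namespace shared by all kernels (assumption for this sketch)."""

    def __init__(self) -> None:
        self._entries: dict[str, Endpoint] = {}

    def register(self, path: str, kernel: str) -> None:
        self._entries[path] = Endpoint(path, kernel)

    def connect(self, path: str, caller_kernel: str) -> Channel:
        ep = self._entries[path]
        # The caller uses the same call either way; only the transport differs.
        transport = "local" if ep.kernel == caller_kernel else "remote"
        return Channel(ep, transport)


ns = Namespace()
ns.register("/services/TCP", "nic0")
ns.register("/fs", "x86-0")

tcp = ns.connect("/services/TCP", caller_kernel="x86-0")  # crosses kernels
fs = ns.connect("/fs", caller_kernel="x86-0")             # stays on one kernel
```

The caller's code is identical for `tcp` and `fs`; whether a message crosses a kernel boundary is decided at connect time.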
Affinity is expressed in application metadata and acts as a hint
Positive affinity represents emphasis on communication (zero-copy IPC)
Negative affinity represents a desire for non-interference
/services/kernels/vector-CPU platform affinity = +2
/services/kernels/x86 platform affinity = +1
[Diagram, two builds: two x86 NUMA domains, a GP-GPU, and a programmable NIC, with kernel scores +2, +1, and +1]
Ensure fast message passing between processes
/services/TCP communication affinity = +1
/services/PNGEater communication affinity = +2
/services/antivirus communication affinity = +3
[Diagram, three builds: TCP, PNG, and A/V placed across an x86 NUMA domain and the programmable NIC; kernel scores shown as +1, then +1 and +2, then +1 and +5]
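The scores in the final build (+1 and +5) are consistent with summing each communication affinity onto the kernel that hosts the named service. The sketch below works through that arithmetic. It assumes TCP runs on the programmable NIC while PNGEater and the antivirus service share one x86 NUMA domain, which is an inference from the diagram labels; the kernel names are hypothetical.

```python
from collections import defaultdict


def communication_scores(affinities, placement):
    """Sum each positive communication affinity onto the kernel hosting that service."""
    scores = defaultdict(int)
    for service, value in affinities.items():
        kernel = placement.get(service)
        if kernel is not None and value > 0:
            scores[kernel] += value
    return dict(scores)


# Affinity values taken from the slide metadata.
affinities = {
    "/services/TCP": 1,
    "/services/PNGEater": 2,
    "/services/antivirus": 3,
}

# Assumed placement of the running services (kernel names are hypothetical).
placement = {
    "/services/TCP": "programmable-nic",
    "/services/PNGEater": "x86-numa-0",
    "/services/antivirus": "x86-numa-0",
}

scores = communication_scores(affinities, placement)
best = max(scores, key=scores.get)  # kernel with the highest total affinity
```

Under these assumptions the new process would be steered toward the x86 NUMA domain, since 2 + 3 outweighs the single +1 on the NIC.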
Used as a means of avoiding resource contention
/services/kernels/x86 platform affinity = +100
/services/antivirus non-interference affinity = -1
[Diagram, three builds: two x86 NUMA domains, a GP-GPU, and a programmable NIC, with A/V running on an x86 NUMA domain]
/services/webserver non-interference affinity = -1
[Diagram, three builds: web server instances W1, W2, and W3 started one after another on a system with two x86 NUMA domains, a GP-GPU, and a programmable NIC]
First: Platform affinities
Second: Other positive affinities
Third: Negative affinities
Fourth: CPU utilization
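The four-step ordering can be read as successive tie-breaking: each step narrows the set of candidate kernels left by the previous one. The sketch below is written under that assumption and is not the Helios placement code. The dictionary layout, the helper names `keep_best`, `affinity_sum`, and `place`, and the kernel names are all hypothetical.

```python
def keep_best(candidates, score):
    """Keep only the candidates that share the highest score."""
    top = max(score(k) for k in candidates)
    return [k for k in candidates if score(k) == top]


def affinity_sum(kernel, affinities):
    """Total affinity toward the services this kernel currently hosts."""
    return sum(v for svc, v in affinities.items() if svc in kernel["services"])


def place(process, kernels):
    platform = process.get("platform", {})
    positive = process.get("positive", {})
    negative = process.get("negative", {})

    # First: platform affinities. Affinity is a hint, so fall back to all
    # kernels if none matches a requested platform.
    candidates = [k for k in kernels if k["platform"] in platform] or list(kernels)
    candidates = keep_best(candidates, lambda k: platform.get(k["platform"], 0))
    # Second: other positive affinities.
    candidates = keep_best(candidates, lambda k: affinity_sum(k, positive))
    # Third: negative affinities (values are negative, so the max is the least penalized).
    candidates = keep_best(candidates, lambda k: affinity_sum(k, negative))
    # Fourth: CPU utilization, lowest wins.
    return min(candidates, key=lambda k: k["cpu"])["name"]


kernels = [
    {"name": "x86-numa-0", "platform": "/services/kernels/x86",
     "services": {"/services/antivirus"}, "cpu": 0.2},
    {"name": "x86-numa-1", "platform": "/services/kernels/x86",
     "services": set(), "cpu": 0.6},
    {"name": "gp-gpu", "platform": "/services/kernels/vector-CPU",
     "services": set(), "cpu": 0.0},
]

# Metadata from the non-interference slide: stay on x86, away from antivirus.
avoid_av = {"platform": {"/services/kernels/x86": 100},
            "negative": {"/services/antivirus": -1}}
# Metadata from the platform-affinity slide: prefer the vector CPU over x86.
prefer_gpu = {"platform": {"/services/kernels/vector-CPU": 2,
                           "/services/kernels/x86": 1}}

choice_av = place(avoid_av, kernels)
choice_gpu = place(prefer_gpu, kernels)
```

In the first case the busier `x86-numa-1` is chosen over `x86-numa-0`, because the negative affinity is applied before CPU utilization is considered.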
All apps are first compiled to MSIL
At install time, apps are compiled down to the available ISAs
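The second phase can be pictured as a loop over the ISAs present in the machine, producing one native image per ISA from the same MSIL input. The sketch below only illustrates that shape. The `install` function and the `backends` table are hypothetical stand-ins, and the lambdas merely tag the input bytes where a real MSIL-to-native compiler would run.

```python
def install(app_msil, available_isas, backends):
    """Phase 2 sketch: translate one MSIL image into a native image per available ISA."""
    return {
        isa: backends[isa](app_msil)
        for isa in available_isas
        if isa in backends  # skip ISAs with no compiler support yet
    }


# Stand-in backends (assumption): tag the bytes instead of compiling them.
backends = {
    "x86": lambda il: b"x86:" + il,
    "ARMv5": lambda il: b"arm:" + il,
}

app_msil = b"IL"  # phase 1 output: the app compiled once to MSIL
images = install(app_msil, ["x86", "ARMv5", "vector-CPU"], backends)
```

Here `vector-CPU` is skipped because no backend exists for it, which mirrors the need for compiler support on each new platform.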
Added satellite kernels, remote message passing, and affinity
2.0 GHz ARM processor, GigE, 256 MB of DRAM
Satellite kernel identical to x86 (except for ARM asm bits)
Roughly 7x slower than comparable x86
2 GHz CPU, 1 GB RAM per domain
Satellite kernel on each NUMA domain
Balance device support with support for basic abstractions
GPUs headed in this direction (e.g., Intel Larrabee)
Need new compiler support for new platforms
Create satellite kernels out of commodity system
Access to more applications
Satellite kernels
Remote message passing
Affinity
Encapsulating many ISAs
[Diagram: benchmark configurations. NIC: an x86 kernel driving the NIC, versus x86 with an XScale satellite kernel on the NIC. NUMA: a single kernel spanning two x86 NUMA domains, versus one satellite kernel per x86 NUMA domain]
Name               LOC     LOC changed   LOM changed
Networking stack   9600                  1
FAT 32 FS          14200                 1
TCP test harness   300     5             1
Disk indexer       900                   1
Network driver     1700
Mail server        2700                  1
Web server         1850                  1
PNG size   x86 only (uploads/sec)   x86+XScale (uploads/sec)   Speedup   Reduction in context switches
28 KB      161                      171                        6%        54%
92 KB      55                       61                         12%       58%
150 KB     35                       38                         10%       65%
290 KB     19                       21                         10%       53%
[Chart: emails per second, axis 10 to 90; surviving series label: No Sat. Kernel]
[Chart: instructions per cycle (IPC), axis 0.1 to 0.8; surviving series label: No Sat. Kernel]
Multiple kernels – single system image
Focus on scale-out performance on large NUMA architectures
Custom run-time on programmable device
Simplify application development, deployment, tuning
Satellite kernels: Same OS abstraction everywhere
Remote message passing: Transparent IPC between kernels
Affinity: Easily express arbitrary placement policies to the OS
2-phase compilation: Run apps on arbitrary devices