NUMA Support for Charm++
Does memory affinity matter?
Christiane Pousa Ribeiro Maxime Martinasso Jean-François Méhaut
Outline
– Introduction
– Motivation
– NUMA Problem
– Support NUMA for Charm++
– First Results
– Conclusion and ...
[Figure: machine with eight NUMA nodes, Node#0 through Node#7]
– First-touch: a page is placed on the node of the first memory access
– System call to bind memory pages
– Numactl, user-level tool to bind memory and to ...
– Libnuma, an interface to place memory pages
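The placement mechanisms listed above differ mainly in which node each page ends up on. The following is a minimal sketch of that difference as a pure Python model. It does not call numactl or libnuma, and the function names `place_pages` and `local_fraction` are hypothetical helpers introduced only for illustration.

```python
def place_pages(num_pages, num_nodes, policy, toucher_node=None, bind_node=0):
    """Return the NUMA node holding each page under a simplified policy model.

    policy        -- "first-touch", "bind" or "interleave"
    toucher_node  -- for first-touch: function mapping a page index to the
                     node of the thread that accesses that page first
    bind_node     -- for bind: the single node all pages are bound to
    """
    if policy == "first-touch":
        if toucher_node is None:
            raise ValueError("first-touch needs to know who touches each page")
        return [toucher_node(page) for page in range(num_pages)]
    if policy == "bind":
        return [bind_node] * num_pages
    if policy == "interleave":
        # Pages are spread round-robin over the nodes.
        return [page % num_nodes for page in range(num_pages)]
    raise ValueError("unknown policy: " + policy)


def local_fraction(placement, accessor_node):
    """Fraction of pages that live on the node of the thread using them."""
    local = sum(1 for page, node in enumerate(placement)
                if accessor_node(page) == node)
    return local / len(placement)


if __name__ == "__main__":
    # Assumed toy setup: 8 pages, 4 nodes, worker on node page // 2 uses each page.
    worker = lambda page: page // 2
    master = lambda page: 0  # one thread on node 0 initializes everything

    for name, placement in [
        ("first-touch, master init", place_pages(8, 4, "first-touch", master)),
        ("first-touch, worker init", place_pages(8, 4, "first-touch", worker)),
        ("interleave", place_pages(8, 4, "interleave")),
    ]:
        print(name, placement, local_fraction(placement, worker))
```

In this model, first-touch only gives local accesses when the thread that initializes the data is the one that later uses it. That is one reason explicit binding or interleaving can be preferable when a single thread initializes shared data.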
[Figure: kNeighbor Application - charm++ multicore64, Different Memory Allocators. x-axis: Memory Allocators (ptmalloc tcmalloc NUMA ptmalloc + setcpu tcmalloc NUMA + setcpu); y-axis: average time (us)]
[Figure: kNeighbor Application - charm++ multicore64 (100 iterations). x-axis: Number of cores; y-axis: Average time - 3-kN iteration (us); series: bind, interleave]
[Figure: Molecular2D - charm++ multicore64. x-axis: Number of cores; y-axis: step time (ms/step); series: bind, interleave]
[Figure: Molecular 2D - charm++ multicore64, Different Memory Allocators. x-axis: Memory Allocators (ptmalloc tcmalloc NUMA ptmalloc + setcpu tcmalloc NUMA + setcpu); y-axis: Benchmark Time (ms)]
[Figure sequence: four nodes (Node#0 to Node#3), each with a CPU and a memory bank (MEM); the memory pages of the heap are placed on the memory banks of the nodes]
Virtual memory pages bound to physical memory banks
[Figure: Charm - Memory Affinity Kn Application. x-axis: Number of Cores (24, 48, 64); y-axis: Time (us); series: maffinity, interleave]
[Figure: Charm - Memory affinity Mol2d Application. x-axis: Number of Cores (24, 48, 64); y-axis: Time in ms; series: maffinity, interleave]
[Figure: four nodes (Node#0 to Node#3), each with a CPU and a memory bank (MEM)]
One heap per core of a node
[Figure: Memory Node#0 holds one heap for each of core0, core1, core2, core3]
– Thread running on node#0 calls malloc
– Memory is allocated from heap 'core0'
– Thread running on node#3 calls free for memory allocated by the first thread
– Memory is returned to heap 'core0'
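The malloc/free sequence above can be sketched as a toy model: each core owns a heap, and a freed block always goes back to the heap it came from, whichever thread frees it. This is only an illustration under assumed names (`PerCoreHeaps`, fixed-size blocks identified by integers). It is not the allocator shown in the slides or any Charm++ API.

```python
class PerCoreHeaps:
    """Toy model of 'one heap per core of a node' with owner-aware free."""

    def __init__(self, num_nodes, cores_per_node, blocks_per_heap):
        self.free_lists = {}    # (node, core) -> list of free block ids
        self.owner = {}         # block id -> (node, core) heap it belongs to
        self.allocated = set()  # block ids currently handed out
        next_id = 0
        for node in range(num_nodes):
            for core in range(cores_per_node):
                blocks = list(range(next_id, next_id + blocks_per_heap))
                self.free_lists[(node, core)] = blocks
                for block in blocks:
                    self.owner[block] = (node, core)
                next_id += blocks_per_heap

    def malloc(self, node, core):
        """Allocate from the heap of the core the calling thread runs on."""
        free = self.free_lists[(node, core)]
        if not free:
            raise MemoryError("heap of core %d on node %d is empty" % (core, node))
        block = free.pop()
        self.allocated.add(block)
        return block

    def free(self, block):
        """Return the block to its home heap, regardless of who calls free."""
        if block not in self.allocated:
            raise ValueError("block %d is not allocated" % block)
        self.allocated.remove(block)
        home = self.owner[block]
        self.free_lists[home].append(block)
        return home


if __name__ == "__main__":
    heaps = PerCoreHeaps(num_nodes=4, cores_per_node=4, blocks_per_heap=2)
    block = heaps.malloc(node=0, core=0)  # thread on node#0, core0 calls malloc
    home = heaps.free(block)              # a thread on node#3 calls free
    print("block returned to heap", home)
```

Returning memory to the owning heap keeps each heap's blocks on the memory bank of its node, so a later malloc on core0 still gets node-local memory even after remote frees.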
pousa@imag.fr