NUMA obliviousness through memory mapping
Mrunal Gawade Martin Kersten CWI, Amsterdam DaMoN 2015 (1st June 2015) Melbourne, Australia
Intel Xeon E5-4657L v2 @2.40GHz
[Figure: time (sec) versus the sockets on which memory is allocated]
numactl -N 0,1 -m “Varied between sockets 0-3” “Database server process”
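The experiment above pins execution to sockets 0 and 1 while varying the socket that memory is bound to. A minimal sketch of how such command lines could be generated is shown here. The `-N` and `-m` flags are the ones on the slide; the helper name `numactl_command` and the placeholder binary `db_server` are assumptions for illustration, not names from the talk.

```python
def numactl_command(cpu_nodes, mem_nodes, server_cmd):
    """Build a numactl argument list binding CPUs and memory to NUMA nodes."""
    cpu_list = ",".join(str(n) for n in cpu_nodes)
    mem_list = ",".join(str(n) for n in mem_nodes)
    # -N restricts execution to the listed nodes,
    # -m restricts memory allocation to the listed nodes.
    return ["numactl", "-N", cpu_list, "-m", mem_list] + list(server_cmd)


# Threads stay on sockets 0 and 1; the memory socket is varied from 0 to 3.
# "db_server" is a hypothetical stand-in for the database server process.
commands = [numactl_command([0, 1], [mem], ["db_server"]) for mem in range(4)]

for cmd in commands:
    print(" ".join(cmd))
```

Each generated list can be handed to `subprocess.run` on a machine where `numactl` is installed; the sketch itself only builds the arguments and runs nothing.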
NUMA oblivious (shared-everything) is relatively good
Effect of memory mapping on NUMA obliviousness
Distributed database system using multi-sockets (shared-nothing)
NUMA_Obliv (shared everything): default parallel plans in MonetDB. Only the “Lineitem” table is sliced.
NUMA_Shard (variation of NUMA_Obliv): shard-aware plans in MonetDB. The “Lineitem” and “Orders” tables are sharded in 4 pieces (on orderkey) and sliced.
NUMA_Distr (shared nothing): socket-aware plans in MonetDB. The “Lineitem” and “Orders” tables are sharded in 4 pieces (on orderkey) and sliced. Dimension tables are replicated.
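To illustrate what sharding both tables on orderkey into 4 pieces achieves, here is a small sketch. The slides do not say whether the shards are formed by range or by hash, so the modulo placement below is an assumption, and the helper names and the toy rows are hypothetical.

```python
NUM_SHARDS = 4  # one shard per socket, matching the 4-socket machine


def shard_of(orderkey, num_shards=NUM_SHARDS):
    # Assumption: modulo placement; the slides do not specify the scheme.
    return orderkey % num_shards


def shard_rows(rows, key_index, num_shards=NUM_SHARDS):
    """Split rows into num_shards lists according to their orderkey."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_of(row[key_index], num_shards)].append(row)
    return shards


# Toy data: (orderkey, status) and (orderkey, extended price).
orders = [(1, "O"), (2, "F"), (5, "O"), (8, "F")]
lineitem = [(1, 10.0), (1, 20.0), (2, 5.0), (5, 7.5), (8, 1.0)]

order_shards = shard_rows(orders, 0)
lineitem_shards = shard_rows(lineitem, 0)

# Because both tables use the same key, matching rows land in the same shard,
# so a join on orderkey can run shard by shard without crossing shards.
for i in range(NUM_SHARDS):
    print(i, order_shards[i], lineitem_shards[i])
```

The point of the sketch is the co-location property: any placement function works as long as both tables use the same one.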
Intel Xeon E5-4657L v2 @2.40GHz, 4 sockets, 12 cores per socket (total 96 threads with Hyper-threading)
Cache: L1 = 32KB, L2 = 256KB, shared L3 = 30MB
Memory: 1TB four-channel DDR3 (256 GB per socket)
O.S.: Fedora 20
Data set: TPC-H 100GB
Tools: numactl, Intel PCM, Linux Perf
MonetDB: open-source system with memory-mapped columnar storage
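As a rough illustration of memory-mapped columnar storage, the sketch below writes a column of 32-bit integers to a file and reads it back through a memory mapping, letting the operating system page the data in on access. It uses Python's standard `mmap` module as a stand-in; the file layout and variable names are assumptions and do not reflect MonetDB's actual storage format.

```python
import mmap
import os
import struct
import tempfile

values = [3, 1, 4, 1, 5, 9, 2, 6]

# Persist the column as a flat array of little-endian 32-bit integers.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack(f"<{len(values)}i", *values))
    path = f.name

# Map the file read-only; pages are loaded by the OS on first touch,
# not by an explicit read() call or a dedicated buffer manager.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count = len(mm) // 4
        column = struct.unpack_from(f"<{count}i", mm, 0)

os.remove(path)

total = sum(column)
print(total)
```

Which NUMA node ends up holding each mapped page is decided by the operating system's placement policy, which is the behaviour the measurements in this talk examine.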
[Figure: time (sec) per TPC-H query for NUMA_Obliv, NUMA_Shard, and NUMA_Distr]
NUMA_Shard is a variation of NUMA_Obliv with sharded & partitioned “orders” table.
Selection on the “lineitem” table. Easily parallelizable. NUMA effects analysis is easy (read-only query).
[Figure, three panels: local and remote memory accesses (in millions) versus number of threads]
Process and memory affinity = PMA
Buffer cache cleared = BCC (echo 3 | sudo /usr/bin/tee /proc/sys/vm/drop_caches)
Panels: PMA = yes, BCC = yes; PMA = no, BCC = yes; PMA = no, BCC = no
[Figure, three panels: time (milliseconds) versus number of threads]
PMA = yes, BCC = yes: most robust
PMA = no, BCC = yes: less robust
PMA = no, BCC = no: least robust
[Figure: proportion of mapped pages versus number of threads, for sockets 0, 1, 2, and 3]
/proc/<process id>/numa_maps
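The per-socket proportion of mapped pages can be derived from the `N<node>=<pages>` fields that appear on each line of a `numa_maps` file. The sketch below shows one way to do this. The sample text, its file paths, and its page counts are invented for illustration and are not measurements from the talk.

```python
import re
from collections import Counter

# Each numa_maps line may carry fields such as "N0=100", meaning
# 100 pages of that mapping reside on NUMA node 0.
NODE_FIELD = re.compile(r"\bN(\d+)=(\d+)\b")


def pages_per_node(numa_maps_text):
    """Sum the page counts per NUMA node over all mappings."""
    counts = Counter()
    for line in numa_maps_text.splitlines():
        for node, pages in NODE_FIELD.findall(line):
            counts[int(node)] += int(pages)
    return counts


def proportions(counts):
    """Convert page counts into the fraction of pages held by each node."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {node: pages / total for node, pages in sorted(counts.items())}


# Hypothetical sample in the numa_maps layout (made-up paths and numbers).
sample = """\
7f0000000000 default file=/data/col_a mapped=400 N0=100 N1=300
7f0000400000 default file=/data/col_b mapped=400 N0=100 N2=200 N3=100
"""

counts = pages_per_node(sample)
print(proportions(counts))
```

On a Linux machine the same functions could be applied to the contents of the real file for a running server process, read with the usual `open(...).read()`.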
[Figure: number of CPU migrations versus number of threads]
[Figure: time (milliseconds) for NUMA_Obliv and NUMA_Distr on modified TPC-H Q6]
              # Local access    # Remote access
NUMA_Obliv    69 million (M)    136 M
NUMA_Distr    196 M             9 M
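The counts above are easier to compare as a share of all memory accesses. The short calculation below uses only the four numbers from the table; the dictionary layout and the function name are illustrative choices.

```python
# Access counts in millions, taken from the table above.
accesses = {
    "NUMA_Obliv": {"local": 69, "remote": 136},
    "NUMA_Distr": {"local": 196, "remote": 9},
}


def remote_share(local, remote):
    """Fraction of all memory accesses that go to a remote socket."""
    return remote / (local + remote)


shares = {plan: remote_share(**c) for plan, c in accesses.items()}

for plan, share in shares.items():
    print(f"{plan}: {share:.1%} remote")
```

Both plans total 205 M accesses, so the remote share falls from roughly 66% under NUMA_Obliv to roughly 4% under NUMA_Distr.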
[Figure: time (sec) per TPC-H query for MonetDB NUMA_Shard, MonetDB NUMA_Distr, Vector_Def, and Vector_Distr]
Vectorwise has no NUMA awareness and also uses a dedicated buffer manager
[Figure: time (sec) per TPC-H query for MonetDB NUMA_Distr and Hyper]
Speed-up of Hyper over MonetDB NUMA_Distr plans (the red annotations in the figure): 2.5, 2, 1.15, 5.7, 2.3.
Hyper generates NUMA-aware, LLVM JIT-compiled, fused operator pipeline plans.
robustly.
with the state of the art database.