Analysis of Data Reuse in Task Parallel Runtimes
Miquel Pericàs⋆, Abdelhalim Amer⋆, Kenjiro Taura† and Satoshi Matsuoka⋆
⋆Tokyo Institute of Technology †The University of Tokyo
PMBS’13, Denver, November 18th 2013
Runtime Layer (Cilk, TBB, OpenMP, ...)
[Figure: Speed-Up vs. Number of Cores (1, 2, 4, 6, 12, 18, 24). Series: Runtime A, Runtime B, Runtime C, Runtime D, Linear]
1 Measured on Intel Xeon E7-4807 at 1.86 GHz
2 https://bitbucket.org/rioyokota/exafmm-dev
Work First: Task Queues with LIFO local task scheduling and FIFO work stealing
Help First: Task Queues with LIFO local task scheduling and FIFO work stealing
“shepherd” (e.g. NUMA node #2): LIFO global task scheduling and bulk FIFO work stealing
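The "LIFO local task scheduling, FIFO work stealing" policy named above can be sketched with a double-ended queue. This is a minimal single-threaded illustration, not the implementation of any of the runtimes discussed: the class `WorkerQueue` and its method names are hypothetical, and a real runtime would need synchronization between owner and thief.

```python
from collections import deque


class WorkerQueue:
    """Hypothetical per-worker task queue (illustrative names only)."""

    def __init__(self):
        self._tasks = deque()

    def push(self, task):
        # The owning worker adds newly created tasks at the tail.
        self._tasks.append(task)

    def pop_local(self):
        # LIFO local scheduling: the owner takes the newest task (tail).
        return self._tasks.pop() if self._tasks else None

    def steal(self):
        # FIFO work stealing: a thief takes the oldest task (head).
        return self._tasks.popleft() if self._tasks else None


q = WorkerQueue()
for t in ["t1", "t2", "t3"]:
    q.push(t)

local = q.pop_local()   # newest task, taken by the owner
stolen = q.steal()      # oldest task, taken by a thief
```

The owner typically benefits from running the most recently created task, whose data is likely still in its cache, while a thief takes work from the opposite end of the queue.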
[Figure, two panels: Speed-Up vs. Number of Cores (1, 2, 4, 6, 12, 18, 24). Series: MassiveThreads, Threading Building Blocks, QThread/Core, QThread/Socket, Linear]
[Figure, two panels: Non-Work Overheads (scale 1 to 2.2) vs. Number of Cores (1, 2, 4, 6, 12, 18, 24). Series: MassiveThreads, Threading Building Blocks, QThread/Core, QThread/Socket]
[Figure, two panels: Normalized Speed-Up x Overhead Product (scale 0.75 to 1.05) vs. Number of Cores (1, 2, 4, 6, 12, 18, 24). Series: MassiveThreads, Threading Building Blocks, QThread/Core, QThread/Socket]
1 https://perf.wiki.kernel.org
2 http://hpctoolkit.org/
3 http://www.bsc.es/computer-sciences/performance-tools/paraver
4 http://www.vampir.eu
5 http://tau.uoregon.edu
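[Figure: reuse distance example]
CORE #1 (private L1, L2) access trace: a b c d e a f g f
Reuse distances: 4, 1
CORE #2 (private L1, L2) access trace: e f g e h i j k i
Reuse distances: 2, 2
L3 (shared) access trace: a e f b c g e d e a h i f j k g f i
Histogram: percentage of accesses (up to 100 %) vs. reuse distance (dist: 1, 4, ∞)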
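The example traces above can be reproduced with a short sketch. It assumes that the reuse distance of an access is the number of distinct elements touched since the previous access to the same element, and that first-time accesses have infinite distance. The function names are illustrative, and the sketch counts elements rather than bytes.

```python
import math


def reuse_distances(trace):
    """Return one reuse distance per access (math.inf for first-time accesses)."""
    last_seen = {}
    distances = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            # Distinct elements accessed strictly between the two uses.
            between = set(trace[last_seen[addr] + 1:i])
            distances.append(len(between))
        else:
            distances.append(math.inf)
        last_seen[addr] = i
    return distances


def reuse_ratio(distances, limit):
    """Fraction of accesses whose reuse distance is at most `limit`."""
    return sum(d <= limit for d in distances) / len(distances)


core1 = reuse_distances("a b c d e a f g f".split())
core2 = reuse_distances("e f g e h i j k i".split())
shared = reuse_distances("a e f b c g e d e a h i f j k g f i".split())

finite1 = [d for d in core1 if d != math.inf]   # [4, 1]
finite2 = [d for d in core2 if d != math.inf]   # [2, 2]
```

Evaluating `reuse_ratio` at increasing limits yields a cumulative curve that reaches 100 % at infinite distance. Whether first-time accesses are counted this way in the plotted reuse ratios is an assumption of this sketch.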
[Figure: Kernel Access Trace for CORE #1 and CORE #2; accesses are numbered 1 to 12, with a "first time accesses" label]
[Figure, two panels, repeated on three consecutive slides: Reuse Ratio (%) vs. Reuse Distance (Bytes). First panel: reuse distances from 32 KB to 256 MB, then INF. Second panel: reuse distances from 256 B to 64 MB, then INF. Series: MassiveThreads, Threading Building Blocks, QThread/Core, QThread/Socket]
[Figure, two panels, zoomed: Reuse Ratio (%) vs. Reuse Distance (Bytes). First panel: 4 MB to 256 MB, then INF, ratio 80 to 100 %, annotated 99.1%. Second panel: 1 MB to 64 MB, then INF, ratio 88 to 100 %, annotated 97.5%. Series: MassiveThreads, Threading Building Blocks, QThread/Core, QThread/Socket]
[Figure, two panels, same axes and series: first panel annotated 96.2% and 97.4%; second panel annotated 95.6% and 93.5%]
[Figure: Reuse Ratio (%), 93 to 98 %, vs. Reuse Distance (Bytes), 16 MB to 256 MB. Series: MassiveThreads, Threading Building Blocks, QThread/Core, QThread/Socket]