Integrating Productivity-Oriented Programming Languages with High-Performance Data Structures
James Fairbanks, Rohit Varkey Thankachan, Eric Hein, Brian Swenson
Georgia Tech Research Institute
September 13, 2017
1 / 20
◮ Applications: Cybersecurity, Social Media, Fraud Detection...
(a) Big Graphs (b) HPC (c) Productivity
2 / 20
◮ Pure high-productivity language with simple data structures
◮ Low-level language core wrapped by a high-productivity language
Table 1: Libraries using the hybrid model
3 / 20
Figure 2: Computation access patterns in scientific computing and graph analysis. (a) z = exp(a + b²) (b) BFS from s
◮ Less regular computation
◮ Diverse user-defined functions beyond arithmetic
◮ Temporary allocations kill performance
4 / 20
5 / 20
◮ Since 2012 - pretty new!
◮ Multiple dispatch
◮ Dynamic type system
◮ JIT compiler
◮ Metaprogramming
◮ Single-machine and distributed parallelism
◮ Open source (MIT License)
6 / 20
◮ A complex data structure for graphs in C
◮ Parallel primitives for graph algorithms
7 / 20
◮ Two languages incur development complexity
◮ All algorithms in Julia
◮ Reuse only the complex STINGER data structure from C
◮ Parallel constructs in Julia, NOT low-level languages
8 / 20
◮ All algorithms in Julia
◮ Reuse only the complex STINGER data structure from C
◮ Parallel constructs in Julia, not low-level languages
◮ Productivity + Performance!
9 / 20
◮ Standard benchmark for large graphs
◮ BFS on an RMAT graph
◮ 2^scale vertices
◮ 2^scale × 16 edges
◮ Comparing BFS on graphs from scale 10 to 27 in C and in Julia
◮ A multithreaded version of the BFS with up to 64 threads was benchmarked
10 / 20
Figure 3: Graph500 Benchmark Results (Normalized to STINGER – C). Each panel plots normalized runtime against graph scale for Stinger and StingerGraphs.jl, at 1, 6, 12, 24, and 48 threads.
11 / 20
Table 3: Methods for synchronizing the C heap with Julia memory: lazy vs. eager
12 / 20
Table 4: Iterators (I) vs Gathering successors (G) – all times in ms
13 / 20
◮ MPI-style remote processes
◮ Cilk-style tasks: lightweight “green” threads
◮ OpenMP-style native multithreading support - @threads
14 / 20
◮ Atomic type on which atomic ops are dispatched
◮ Atomic{T} contains a reference to a Julia variable of type T
◮ Extra level of indirection for a vector of atomics
Figure 4: Julia provides easy access to LLVM/Clang intrinsics
15 / 20
Figure 5: Atomic data structures in Julia
16 / 20
Table 5: Atomics: Native (N) vs. Unsafe (U), times in ms
17 / 20
Table 6: Total time to run the Graph500 BFS benchmark for all graphs, scale 10–27, in minutes
18 / 20
Figure 6: Performance scaling with threads. Runtime in seconds of the scale-27 BFS for Stinger and StingerGraphs.jl at 1, 6, 12, 24, and 48 threads.
19 / 20
◮ Tight integration between high-productivity and high-performance languages is achievable
◮ Julia is ready for HPC graph workloads
◮ Julia parallelism can compete with OpenMP parallelism
◮ We can expand HPC in high-level languages beyond graph analysis
20 / 20