GAIL
The Graph Algorithm Iron Law
Scott Beamer, Krste Asanović, David Patterson
Berkeley
Electrical Engineering & Computer Sciences
gap.cs.berkeley.edu
UC Berkeley
Graph Applications
Social Network Analysis
Speech Recognition
Recommendations
Webpage Layout
Bioinformatics
CAD
Research Ongoing At All Levels
Algorithms
Implementation
Platform
Applications
GAIL
Need More Informative Metrics
Time
+ is often the most important metric
Traversed edges per second (TEPS)
+ a rate, so can compare different inputs
Time & TEPS only quantify which is fastest; they give no insight into why
Graph Performance Factors
For measurements: [Beamer, IISWC, 2015]
1. Algorithms - how much work to do
2. Cache utility - how much data to move
3. Memory bandwidth - how fast data moves
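Graph Algorithm Iron Law (GAIL)
\[
\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{memory requests}}{\text{edge}} \times \frac{\text{time}}{\text{memory request}}
\]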
edges: annotate code to count edges traversed
memory requests: use performance counters to obtain the total number of memory requests
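The measurement recipe above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: `bfs_counting_edges`, `gail`, and `GailBreakdown` are hypothetical names, the BFS is a plain top-down traversal, and the memory-request count is a made-up placeholder standing in for a hardware performance-counter reading.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class GailBreakdown:
    seconds: float               # time / kernel
    edges: int                   # edges / kernel
    requests_per_edge: float     # memory requests / edge
    seconds_per_request: float   # time / memory request


def bfs_counting_edges(adj, source):
    """Top-down BFS over an adjacency-list dict; returns (depth, edges_traversed)."""
    depth = {source: 0}
    frontier = deque([source])
    edges_traversed = 0
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            edges_traversed += 1          # the annotation: one edge examined
            if v not in depth:
                depth[v] = depth[u] + 1
                frontier.append(v)
    return depth, edges_traversed


def gail(seconds, edges, memory_requests):
    """Split a measured kernel time into the three GAIL factors."""
    return GailBreakdown(
        seconds=seconds,
        edges=edges,
        requests_per_edge=memory_requests / edges,
        seconds_per_request=seconds / memory_requests,
    )


# Tiny undirected 4-cycle stored as directed edge pairs (8 directed edges).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

start = time.perf_counter()
depth, edges = bfs_counting_edges(adj, 0)
elapsed = time.perf_counter() - start

memory_requests = 40  # hypothetical counter reading, not measured here
breakdown = gail(elapsed, edges, memory_requests)
print(breakdown)
```

By construction, multiplying `edges`, `requests_per_edge`, and `seconds_per_request` gives back the measured kernel time, so the three factors always account for the whole runtime.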
Term                     Metric                  Role
edges / kernel           algorithmic intensity   algorithm designer
memory requests / edge   cache utility           implementor
time / memory request    DRAM BW utilization     system designer
Comparing BFS Implementations
3 BFS Approaches
reduce communication
algorithmically does less
Kronecker SCALE=27, 32 threads, Ivy Bridge
Time doesn’t explain speedup
BFS Analyzed by GAIL
[Table: GAIL breakdown of the BFS approaches. Columns: time / kernel (seconds), edges / kernel (B edges), memory requests / edge, time / memory request (ns)]
BFS Strong Scaling Analyzed by GAIL
[Figure: two panels, Kronecker and USA Roads]
Delta-Stepping Analyzed by GAIL
Single-source shortest paths algorithm
∆ parameter sets a tradeoff between parallelism and work efficiency
∆ = 1: ~Dijkstra (favors work efficiency)
∆ = ∞: ~Bellman-Ford (favors parallelism)
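To make the ∆ tradeoff concrete, here is a minimal sequential sketch that counts edges traversed for different ∆ values. It is a simplified bucketed variant written for illustration (it relaxes all edges of a vertex together rather than separating light and heavy edges, and it is not the GAP reference code); `delta_stepping` and the example graph are hypothetical.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Simplified sequential delta-stepping.

    graph: {u: [(v, weight), ...]} with non-negative weights.
    Returns (dist, edges_traversed).
    """
    dist = {u: math.inf for u in graph}
    dist[source] = 0
    buckets = defaultdict(list)        # bucket index -> [(tentative dist, vertex)]
    buckets[0].append((0, source))
    edges_traversed = 0
    while buckets:
        i = min(buckets)               # lowest non-empty bucket
        frontier = buckets.pop(i)      # one "round"; parallel in a real implementation
        for d, u in frontier:
            if d > dist[u]:
                continue               # stale entry: u was improved after insertion
            for v, w in graph[u]:
                edges_traversed += 1   # annotation used for the edges / kernel term
                nd = d + w
                if nd < dist[v]:
                    dist[v] = nd
                    buckets[int(nd // delta)].append((nd, v))
    return dist, edges_traversed


# Vertex 1 is first reached by a long edge (0 -> 1) and later by a shorter path (0 -> 2 -> 1).
graph = {0: [(1, 5), (2, 1)], 1: [(3, 1)], 2: [(1, 1)], 3: []}

dist_small, edges_small = delta_stepping(graph, 0, 1)         # Dijkstra-like
dist_large, edges_large = delta_stepping(graph, 0, math.inf)  # Bellman-Ford-like
print(edges_small, edges_large)
```

On this graph both settings reach the same distances, but the ∆ = ∞ run traverses one more edge because vertex 1 is expanded before its final distance is known. In exchange, each round of the ∆ = ∞ run offers a larger frontier to process in parallel.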
[Figure: delta-stepping analyzed by GAIL; USA roads, 8 threads, Ivy Bridge]
GAP Benchmark Suite
Benchmark Specifications (technical report)
Portable, high-quality baseline code
gap.cs.berkeley.edu
Conclusion
GAIL concisely breaks down performance
\[
\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{memory requests}}{\text{edge}} \times \frac{\text{time}}{\text{memory request}}
\]
Acknowledgements
Research partially funded by DARPA Award Number HR0011-12-2-0016, the Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and ASPIRE Lab industrial sponsors and affiliates Intel, Google, Huawei, Nokia, NVIDIA, Oracle, and Samsung. Any recommendations in this paper are solely those of the authors and do not necessarily reflect the position or the policy of the sponsors.
Bonus Slides
What Do GAIL Results Represent?
GAIL results are for a particular execution
Focused on single-server shared-memory
GAIL requirements include measuring memory requests
Why Not Complexity Analysis?
Formal complexity analysis is helpful, but…
performance is topology-dependent, and real-world graphs are often difficult to model
it does not account for implementation optimizations
algorithms with a slower worst-case performance can be much faster in practice
What About Other Platforms?
GAIL is for single-server shared memory
For other platforms, replace memory requests with an equivalent bottleneck metric
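Iron Law Reapplied
For CPUs:
\[
\frac{\text{time}}{\text{program}} = \frac{\text{insts.}}{\text{program}} \times \frac{\text{cycles}}{\text{inst.}} \times \frac{\text{time}}{\text{cycle}}
\]
For Graph Algorithms:
\[
\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{memory requests}}{\text{edge}} \times \frac{\text{time}}{\text{memory request}}
\]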
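Graph Algorithm Iron Law (GAIL)
\[
\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{memory requests}}{\text{edge}} \times \frac{\text{time}}{\text{memory request}}
\]
\[
\frac{\text{memory requests}}{\text{edge}} \times \frac{\text{time}}{\text{memory request}} = \frac{\text{time}}{\text{edge}} = \frac{1}{\text{TEPS}}
\]
\[
\frac{\text{memory requests}}{\text{kernel}} = \frac{\text{data transferred}}{\text{cache line size}}
\]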
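The two side relations on this slide reduce to simple unit conversions. The sketch below assumes a 64-byte cache line (common on x86, but an assumption to check per platform); the helper names and the sample numbers are hypothetical and not taken from the talk.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def memory_requests(bytes_transferred, cache_line_bytes=CACHE_LINE_BYTES):
    """memory requests / kernel = data transferred / cache line size."""
    return bytes_transferred / cache_line_bytes


def seconds_per_edge(teps):
    """time / edge is the reciprocal of traversed edges per second."""
    return 1.0 / teps


# Made-up sample values, for illustration only.
requests = memory_requests(6400)       # 6400 bytes moved -> 100 cache-line requests
per_edge = seconds_per_edge(2e9)       # 2 GTEPS -> 0.5 ns per edge
print(requests, per_edge)
```

Converting bytes to requests this way is useful when a platform reports memory traffic in bytes rather than as a request count.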
Evaluation Setup
Graph                       # Vertices   # Edges    Degree   Diameter   Degree Dist.
Roads of USA                23.9M        58.3M      2.4      High       const
Web Crawl of .sk Domain     50.6M        1949.4M    38.5     Medium     power
Kronecker Synthetic Graph   128.0M       2048.0M    16.0     Low        power
Target Platform