GAIL: The Graph Algorithm Iron Law. Scott Beamer, Krste Asanović, David Patterson

SLIDE 1

GAIL

The Graph Algorithm Iron Law

Scott Beamer, Krste Asanović, David Patterson

Berkeley

Electrical Engineering & Computer Sciences

gap.cs.berkeley.edu

GAP

SLIDE 2

UC Berkeley

Graph Applications

  • Social Network Analysis
  • Speech Recognition
  • Recommendations
  • Webpage Layout
  • Bioinformatics
  • CAD

SLIDE 3

Research Ongoing At All Levels

[Diagram labels: Algorithms, Implementation, Platform, Applications, GAIL]

SLIDE 4

Need More Informative Metrics

Time

+ is often the most important
- requires other parameters to be matched

Traversed edges per second (TEPS)

+ a rate, so can compare different inputs
- confusion about what counts as a traversed edge (TE)

SLIDE 5

Need More Informative Metrics

Time & TEPS only quantify which is fastest, with no insight into why

SLIDE 6

Graph Performance Factors

1. Algorithms - how much work to do
2. Cache utility - how much data to move
3. Memory bandwidth - how fast data moves

For measurements: [Beamer, IISWC, 2015]

SLIDE 7

Graph Algorithm Iron Law (GAIL)

$$\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{mem. req.}}{\text{edge}} \times \frac{\text{time}}{\text{mem. req.}}$$

SLIDE 8

Graph Algorithm Iron Law (GAIL)

  • edges: annotate code to count edges traversed
  • mem. req.: use performance counters to obtain the total number of memory requests
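As a rough illustration of how those two measurements combine with runtime, the sketch below computes the three GAIL factors for a single kernel execution. It is a minimal Python sketch rather than code from the GAP suite: the names `GailFactors` and `gail_factors` and the sample numbers are hypothetical, and the counts are assumed to come from an annotated kernel and from hardware performance counters.

```python
from dataclasses import dataclass


@dataclass
class GailFactors:
    edges_per_kernel: float     # how much work the algorithm does
    mem_req_per_edge: float     # how much data moves per traversed edge
    seconds_per_mem_req: float  # how fast each memory request is served

    def time_per_kernel(self) -> float:
        # The product of the three factors recovers the kernel runtime.
        return (self.edges_per_kernel
                * self.mem_req_per_edge
                * self.seconds_per_mem_req)


def gail_factors(time_s: float, edges: float, mem_requests: float) -> GailFactors:
    """Split one kernel's runtime into the three GAIL factors."""
    if edges <= 0 or mem_requests <= 0:
        raise ValueError("edges and mem_requests must be positive")
    return GailFactors(
        edges_per_kernel=edges,
        mem_req_per_edge=mem_requests / edges,
        seconds_per_mem_req=time_s / mem_requests,
    )


# Hypothetical measurements for one kernel run (not taken from the talk).
factors = gail_factors(time_s=0.5, edges=2.0e9, mem_requests=1.0e9)
print(factors)
print("reconstructed time:", factors.time_per_kernel())
```

Because the intermediate units cancel, the product of the three factors should match the measured time, which makes a convenient sanity check on the counters.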

SLIDE 9

Graph Algorithm Iron Law (GAIL)

Term: edges / kernel
Metric: algorithmic intensity
Role: algorithm designer

SLIDE 10

Graph Algorithm Iron Law (GAIL)

Term: mem. req. / edge
Metric: cache utility
Role: implementor

SLIDE 11

Graph Algorithm Iron Law (GAIL)

Term: time / mem. req.
Metric: DRAM BW utilization
Role: system designer

SLIDE 12

Comparing BFS Implementations

3 BFS Approaches

  • Naive - classic top-down
  • Bitmap - uses bitmaps to reduce communication
  • Direction-optimizing - algorithmically does less work

Kronecker SCALE=27, 32 threads, Ivy Bridge

Time doesn’t explain speedup
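To make the "Naive" approach and the edge-counting annotation concrete, here is a minimal sequential Python sketch of a classic top-down BFS. It is not the GAP implementation. The function name `bfs_top_down` is hypothetical, and counting every examined adjacency-list entry as one traversed edge is an assumed definition, since the talk itself notes that what counts as a traversed edge is open to confusion.

```python
from collections import deque


def bfs_top_down(adj, source):
    """Classic top-down BFS that also counts traversed edges.

    adj maps each vertex to a list of its neighbors. Every adjacency-list
    entry examined counts as one traversed edge (an assumed definition).
    """
    depth = {source: 0}
    edges_traversed = 0
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            edges_traversed += 1      # annotation: count this edge
            if v not in depth:        # first visit fixes the BFS depth
                depth[v] = depth[u] + 1
                frontier.append(v)
    return depth, edges_traversed


# Tiny hypothetical graph; vertex 4 is unreachable from vertex 0.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: []}
depth, edges = bfs_top_down(graph, 0)
print(depth, edges)
```

The returned edge count is the numerator of the edges / kernel term, so an approach that does less algorithmic work would show up as a smaller count on the same input.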

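SLIDE 13

BFS Analyzed by GAIL

$$\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{mem. req.}}{\text{edge}} \times \frac{\text{time}}{\text{mem. req.}}$$

[Figure: plots of each term; axis units include seconds, B edges, and ns per mem. req.]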

SLIDE 14

BFS Strong Scaling Analyzed by GAIL

[Figure panels: Kronecker, USA Roads]

SLIDE 15

Delta-Stepping Analyzed by GAIL

Single-source shortest paths algorithm

The ∆ parameter sets a tradeoff between work efficiency and parallelism:

  • small ∆ (toward 1): ~Dijkstra, favors work efficiency
  • large ∆: ~Bellman-Ford, favors parallelism
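The effect of ∆ can be sketched with a small sequential Python version of delta-stepping. This is a simplified illustration under stated assumptions, not the implementation measured in the talk: it omits the light/heavy edge split of the original algorithm, runs on one thread, assumes non-negative weights, and uses hypothetical names (`delta_stepping`, `relaxations`). Vertices are grouped into buckets of width ∆ by tentative distance, and every relaxation attempt is counted as work.

```python
import math
from collections import defaultdict


def delta_stepping(adj, source, delta):
    """Simplified sequential delta-stepping.

    adj maps each vertex to a list of (neighbor, weight) pairs with
    non-negative weights. Returns (distances, relaxation attempts).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    dist = {v: math.inf for v in adj}
    dist[source] = 0
    buckets = defaultdict(list)
    buckets[0].append(source)
    relaxations = 0
    while buckets:
        i = min(buckets)                 # lowest non-empty bucket
        frontier = buckets.pop(i)
        while frontier:
            u = frontier.pop()
            if dist[u] < i * delta:
                continue                 # stale entry from an older, larger distance
            for v, w in adj[u]:
                relaxations += 1
                new_dist = dist[u] + w
                if new_dist < dist[v]:
                    dist[v] = new_dist
                    j = int(new_dist // delta)
                    if j == i:
                        frontier.append(v)    # stays in the current bucket
                    else:
                        buckets[j].append(v)  # deferred to a later bucket
    return dist, relaxations


# Tiny hypothetical weighted graph.
graph = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
for delta in (1, 100):
    print(delta, delta_stepping(graph, 0, delta))
```

With a small ∆ each bucket holds few vertices and little work is repeated. With a very large ∆ everything lands in one bucket, which exposes more vertices to process at once but can relax the same edge more than once.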

SLIDE 16

Delta-Stepping Analyzed by GAIL

USA roads, 8 threads, Ivy Bridge

SLIDE 17

GAP Benchmark Suite

Benchmark Specifications (technical report)

  • standardize input graphs and rules
  • allows other implementations to compare

Portable, high-quality baseline code

  • Only requirement is C++11 & OpenMP
  • Built-in testing to verify results

gap.cs.berkeley.edu

SLIDE 18

Conclusion

$$\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{mem. req.}}{\text{edge}} \times \frac{\text{time}}{\text{mem. req.}}$$

gap.cs.berkeley.edu

SLIDE 19

Conclusion

GAIL concisely breaks down performance

  • useful as a starting point for introspection
  • useful as a simple model to weigh tradeoffs

SLIDE 20

Acknowledgements

Research partially funded by DARPA Award Number HR0011-12-2-0016, the Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA, and ASPIRE Lab industrial sponsors and affiliates Intel, Google, Huawei, Nokia, NVIDIA, Oracle, and Samsung. Any opinions, findings, conclusions, or recommendations in this paper are solely those of the authors and do not necessarily reflect the position or the policy of the sponsors.

SLIDE 21

Bonus Slides

SLIDE 22

What Do GAIL Results Represent?

GAIL results are for a particular execution

  • fixed: input, platform, implementation
  • changing any of the above will change results

Focused on single-server shared-memory

GAIL requirements

  • measure: runtime, traversed edges, memory requests
  • algorithm has a notion of “traversing” an edge

SLIDE 23

Why Not Complexity Analysis?

Formal complexity analysis is helpful, but…

  • Many algorithms’ runtimes depend on the input graph topology, but real-world graphs are often difficult to model
  • Hides many platform improvements and implementation optimizations
  • Can be overly pessimistic: many algorithms with a slower worst case perform much faster in practice

SLIDE 24

What About Other Platforms?

GAIL is for single-server shared memory

For other platforms, replace memory requests with an equivalent bottleneck metric

  • Clusters: packets or bytes transmitted
  • Flash/HD: blocks read from storage
  • Cache-less (XMT): memory requests OK

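SLIDE 25

Iron Law Reapplied

For CPUs:

$$\frac{\text{time}}{\text{program}} = \frac{\text{insts.}}{\text{program}} \times \frac{\text{cycles}}{\text{inst.}} \times \frac{\text{time}}{\text{cycle}}$$

For Graph Algorithms:

$$\frac{\text{time}}{\text{kernel}} = \frac{\text{edges}}{\text{kernel}} \times \frac{\text{mem. req.}}{\text{edge}} \times \frac{\text{time}}{\text{mem. req.}}$$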

SLIDE 26

Graph Algorithm Iron Law (GAIL)

$$\frac{\text{time}}{\text{edge}} = \frac{1}{\text{TEPS}}$$

$$\frac{\text{mem. req.}}{\text{kernel}} = \frac{\text{data transferred}}{\text{cache line size}}$$
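These two identities can be checked with a few lines of Python. The sketch below is illustrative only: the function names and sample numbers are hypothetical, and the 64-byte cache line is an assumption that should be replaced with the value for the platform being measured.

```python
def teps(edges_traversed: float, time_s: float) -> float:
    """Traversed edges per second; its reciprocal is time per edge."""
    return edges_traversed / time_s


def mem_requests_from_bytes(bytes_transferred: float,
                            cache_line_bytes: int = 64) -> float:
    """Estimate memory requests from the data moved to and from DRAM.

    cache_line_bytes=64 is an assumed default, not a value from the talk.
    """
    return bytes_transferred / cache_line_bytes


# Hypothetical measurements for one kernel run.
edges, time_s, bytes_moved = 2.0e9, 0.5, 64.0e9

rate = teps(edges, time_s)
print("TEPS:", rate)
print("time per edge (s):", 1.0 / rate)
print("memory requests:", mem_requests_from_bytes(bytes_moved))
```

The second function is handy when a platform reports DRAM traffic in bytes rather than as a request count.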

SLIDE 27

Evaluation Setup

Graph                       # Vertices   # Edges    Degree   Diameter   Degree Dist.
Roads of USA                23.9M        58.3M      2.4      High       const
Web Crawl of .sk Domain     50.6M        1949.4M    38.5     Medium     power
Kronecker Synthetic Graph   128.0M       2048.0M    16.0     Low        power

Target Platform

  • Dual-socket Intel Ivy Bridge 3.3 GHz
  • Socket: 8 cores with 25MB L3 cache
  • DRAM: 128 GB DDR3-1600
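The Degree column appears to be the edge count divided by the vertex count. The short Python check below reproduces the listed values under that assumption. The helper name `average_degree` is hypothetical, and the counts are the rounded figures from the table.

```python
def average_degree(num_edges: float, num_vertices: float) -> float:
    """Average degree, assuming degree = edges / vertices."""
    return num_edges / num_vertices


# (vertices, edges, degree as listed), using the rounded table figures.
graphs = {
    "Roads of USA": (23.9e6, 58.3e6, 2.4),
    "Web Crawl of .sk Domain": (50.6e6, 1949.4e6, 38.5),
    "Kronecker Synthetic Graph": (128.0e6, 2048.0e6, 16.0),
}

for name, (vertices, edges, listed) in graphs.items():
    computed = round(average_degree(edges, vertices), 1)
    print(f"{name}: computed {computed}, listed {listed}")
```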