A High-Quality and Fast Maximal Independent Set Algorithm for GPUs


SLIDE 1

A High-Quality and Fast Maximal Independent Set Algorithm for GPUs

Martin Burtscher and Sindhu Devale

Department of Computer Science

SLIDE 2

Overview

Introduction
Serial and parallel algorithms
Our parallel algorithm
Optimizations
Results
Summary

A High-Quality and Fast MIS Algorithm 2

SLIDE 3

Maximal Independent Set

Maximal independent set (MIS)

Subset of vertices of undirected graph
Vertices in subset are independent (not adjacent)
Subset is maximal (every other vertex is adjacent to a vertex in the subset)
Not unique

Largest possible MIS

Maximum independent set
NP-hard to compute

[Figure: example graph with nine vertices labeled a to i]
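
The two defining properties can be checked mechanically. The following is a minimal sketch, assuming the graph is stored as a dictionary of neighbor sets; the function names and the 5-vertex path graph are illustrative and do not come from the slides.

```python
def is_independent(adj, subset):
    """True if no two vertices of `subset` are adjacent."""
    return all(adj[v].isdisjoint(subset) for v in subset)


def is_maximal_independent(adj, subset):
    """True if `subset` is independent and cannot be extended."""
    if not is_independent(adj, subset):
        return False
    # Maximal: every vertex outside the subset has a neighbor inside it.
    return all(adj[v] & subset for v in adj if v not in subset)


# Hypothetical path graph a-b-c-d-e (not the graph from the slide).
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

print(is_maximal_independent(path, {"a", "c", "e"}))  # True
print(is_maximal_independent(path, {"b", "d"}))       # True
print(is_maximal_independent(path, {"a", "e"}))       # False
```

Both {a, c, e} and {b, d} pass the check, which illustrates that an MIS is not unique and that different MISs can have different sizes.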

SLIDE 4

Importance of MIS

Building block of many parallel graph algorithms

Graph coloring
Maximal matching
2-satisfiability
Maximal set packing
Odd set cover problem
etc.

SLIDE 5

Importance of MIS (cont.)

Parallelization of complex computations

Supports arbitrary and dynamically changing conflicts

  1. Build graph (vertices = computations, edges = conflicts)
  2. Compute MIS
  3. Run computations in MIS in parallel (w/o locks or atomics)
  4. Repeat if necessary

E.g., Delaunay mesh refinement

Approach is only useful if MIS can be computed quickly in parallel and benefits from large sets
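
The four steps can be sketched as a scheduling loop. This is an illustration under stated assumptions, not the authors' code: each task is modeled as the set of items it touches, two tasks conflict when those sets overlap, and a serial greedy pass stands in for the parallel MIS computation. The names `conflict_graph`, `greedy_mis`, `schedule`, and the task table are hypothetical.

```python
def conflict_graph(tasks):
    """Step 1: vertices are tasks, edges join tasks that share an item."""
    names = list(tasks)
    adj = {t: set() for t in names}
    for i, s in enumerate(names):
        for t in names[i + 1:]:
            if tasks[s] & tasks[t]:
                adj[s].add(t)
                adj[t].add(s)
    return adj


def greedy_mis(adj):
    """Step 2: serial stand-in for the MIS computation."""
    mis = set()
    for v in adj:
        if adj[v].isdisjoint(mis):
            mis.add(v)
    return mis


def schedule(tasks):
    """Return rounds of tasks; tasks in the same round never conflict."""
    pending = dict(tasks)
    rounds = []
    while pending:
        batch = greedy_mis(conflict_graph(pending))
        rounds.append(sorted(batch))  # step 3: these could run in parallel
        for t in batch:               # step 4: repeat with the remaining tasks
            del pending[t]
    return rounds


# Hypothetical tasks and the items each one touches.
tasks = {"t0": {1, 2}, "t1": {2, 3}, "t2": {4}, "t3": {3, 4}}
rounds = schedule(tasks)
print(rounds)  # [['t0', 't2'], ['t1'], ['t3']]
```

A larger MIS means more tasks per round and fewer rounds, which is why the set size matters for this use.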

SLIDE 6

Highlights

ECL-MIS algorithm for massively-parallel devices

Fastest MIS runtimes on modern GPUs

Randomized permutation selection function

Largest set sizes among many MIS algorithms

New optimizations

Enhance performance and reduce memory footprint


SLIDE 7

Serial Algorithm

SLIDE 8

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

Start with empty set

Set = {}

SLIDE 9

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

a has no neighbor in set
Add vertex a

Set = {a}

SLIDE 10

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

b has neighbor in set
Discard vertex b

Set = {a}

SLIDE 11

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

c has neighbor in set
Discard vertex c

Set = {a}

SLIDE 12

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

d has neighbor in set
Discard vertex d

Set = {a}

SLIDE 13

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

e has no neighbor in set
Add vertex e

Set = {a, e}

SLIDE 14

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

f has neighbor in set
Discard vertex f

Set = {a, e}

SLIDE 15

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

g has neighbor in set
Discard vertex g

Set = {a, e}

SLIDE 16

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

h has no neighbor in set
Add vertex h

Set = {a, e, h}

SLIDE 17

Serial Algorithm

Repeating steps

Visit unvisited vertex
Add vertex to set if no graph neighbors in set

Example

i has neighbor in set
Discard vertex i

MIS = {a, e, h}
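
The serial walk-through can be sketched in a few lines. The edge list below is an assumption: the slides do not spell out the edges of the example graph, so `EDGES` is a hypothetical 9-vertex graph chosen so that visiting the vertices in the order a to i reproduces the sets shown on the slides.

```python
# Hypothetical edge list, chosen to be consistent with the example's outcome.
EDGES = ["ab", "ac", "ad", "de", "dg", "ef", "eg", "gh", "hi"]


def build_adj(edges):
    """Turn two-letter edge strings into a dictionary of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def serial_mis(adj, order):
    """Visit vertices in `order`; keep a vertex if no neighbor is in the set."""
    mis = set()
    for v in order:
        if adj[v].isdisjoint(mis):
            mis.add(v)
    return mis


adj = build_adj(EDGES)
forward = serial_mis(adj, "abcdefghi")
backward = serial_mis(adj, "ihgfedcba")
print(sorted(forward))   # ['a', 'e', 'h']
print(sorted(backward))  # ['b', 'c', 'f', 'g', 'i']
```

On this assumed graph the visiting order alone changes the set size from 3 to 5, which is the effect the priority-based algorithms try to exploit.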

SLIDE 18

Luby’s Random-Priority Parallel MIS Algorithm

SLIDE 19

Random-Priority Algorithm (Luby)

Repeating steps

Assign random priorities
Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {}

SLIDE 20

Random-Priority Algorithm (Luby)

Repeating steps

Assign random priorities
Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {}

[Figure: example graph with random priorities a=5, b=7, c=1, d=4, e=6, f=3, g=4, h=3, i=2]

SLIDE 21

Random-Priority Algorithm (Luby)

Repeating steps

Assign random priorities
Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {b, e}

SLIDE 22

Random-Priority Algorithm (Luby)

Repeating steps

Assign random priorities
Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {b, e}

SLIDE 23

Random-Priority Algorithm (Luby)

Repeating steps

Assign random priorities
Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {b, e}

[Figure: remaining vertices with new random priorities c=9, h=5, i=6]

SLIDE 24

Random-Priority Algorithm (Luby)

Repeating steps

Assign random priorities
Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {b, c, e, i}

SLIDE 25

Random-Priority Algorithm (Luby)

Repeating steps

Assign random priorities
Add vertices with highest local priority to set
Remove their neighbors from graph

MIS = {b, c, e, i}
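
A serial simulation of the random-priority rounds might look as follows. It is a sketch under assumptions: `EDGES` is a hypothetical edge list chosen to be consistent with the sets shown on the slides, and the `draw` callback is an illustrative device that lets the slide's priorities be replayed instead of drawing random ones.

```python
import random

# Hypothetical edge list, chosen to be consistent with the sets on the slides.
EDGES = ["ab", "ac", "ad", "de", "dg", "ef", "eg", "gh", "hi"]


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def luby_random_priority(adj, draw):
    """`draw(alive)` returns fresh priorities for the remaining vertices."""
    alive, mis = set(adj), set()
    while alive:
        prio = draw(alive)  # new priorities in every round
        # A vertex wins if it beats all of its remaining neighbors.
        winners = {v for v in alive
                   if all(prio[v] > prio[u] for u in adj[v] if u in alive)}
        mis |= winners
        for v in winners:
            alive -= adj[v]  # remove the winners' neighbors
        alive -= winners     # and the winners themselves
    return mis


adj = build_adj(EDGES)

# Replay the two rounds of priorities shown on the slides.
slide_rounds = iter([
    dict(a=5, b=7, c=1, d=4, e=6, f=3, g=4, h=3, i=2),
    dict(c=9, h=5, i=6),
])
replayed = luby_random_priority(adj, lambda alive: next(slide_rounds))
print(sorted(replayed))  # ['b', 'c', 'e', 'i']

# The same loop with actual random priorities.
rng = random.Random(1)
randomized = luby_random_priority(
    adj, lambda alive: {v: rng.random() for v in alive})
```

Each round on a GPU would evaluate all vertices at once; the set comprehension plays that role here.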

SLIDE 26

Random-Permutation Parallel MIS Algorithm

SLIDE 27

Random-Permutation Algorithm

Initialization

Assign random priorities

Repeating steps

Add vertices with highest local priority to set
Remove neighbors and their edges from graph

Set = {}

SLIDE 28

Random-Permutation Algorithm

Initialization

Assign random priorities

Repeating steps

Add vertices with highest local priority to set
Remove neighbors and their edges from graph

Set = {}

[Figure: example graph with priorities a=5, b=7, c=1, d=4, e=6, f=3, g=4, h=3, i=2]

SLIDE 29

Random-Permutation Algorithm

Initialization

Assign random priorities

Repeating steps

Add vertices with highest local priority to set
Remove neighbors and their edges from graph

Set = {b, e}

SLIDE 30

Random-Permutation Algorithm

Initialization

Assign random priorities

Repeating steps

Add vertices with highest local priority to set
Remove neighbors and their edges from graph

Set = {b, e}

SLIDE 31

Random-Permutation Algorithm

Initialization

Assign random priorities

Repeating steps

Add vertices with highest local priority to set
Remove neighbors and their edges from graph

Set = {b, c, e, h}

SLIDE 32

Random-Permutation Algorithm

Initialization

Assign random priorities

Repeating steps

Add vertices with highest local priority to set
Remove neighbors and their edges from graph

MIS = {b, c, e, h}
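
The only change from the random-priority variant is that priorities are fixed once, before the rounds start. The sketch below makes that explicit; the edge list is again a hypothetical one chosen to be consistent with the sets on the slides, and the function name is illustrative.

```python
import random

# Hypothetical edge list, chosen to be consistent with the sets on the slides.
EDGES = ["ab", "ac", "ad", "de", "dg", "ef", "eg", "gh", "hi"]


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def random_permutation_mis(adj, prio):
    """Rounds of local-maximum selection with priorities fixed up front."""
    alive, mis = set(adj), set()
    while alive:
        winners = {v for v in alive
                   if all(prio[v] > prio[u] for u in adj[v] if u in alive)}
        if not winners:
            # Cannot happen when all priorities are distinct.
            raise ValueError("tied priorities between neighbors block progress")
        mis |= winners
        for v in winners:
            alive -= adj[v]
        alive -= winners
    return mis


adj = build_adj(EDGES)

# Priorities as shown on the slide, assigned once.
slide_prio = dict(a=5, b=7, c=1, d=4, e=6, f=3, g=4, h=3, i=2)
replayed = random_permutation_mis(adj, slide_prio)
print(sorted(replayed))  # ['b', 'c', 'e', 'h']

# A true random permutation gives every vertex a distinct rank.
order = sorted(adj)
random.Random(0).shuffle(order)
perm_prio = {v: rank for rank, v in enumerate(order)}
from_perm = random_permutation_mis(adj, perm_prio)
```

Because nothing is redrawn between rounds, the result depends only on the initial priorities, not on how the rounds are scheduled.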

SLIDE 33

Luby’s Random-Selection Parallel MIS Algorithm

SLIDE 34

Random-Selection Algorithm (Luby)

Repeating steps

Mark vertices with probability 0.5/degree
Add marked vertices to set if no marked neighbors
Remove their neighbors from graph

Set = {}

SLIDE 35

Random-Selection Algorithm (Luby)

Repeating steps

Mark vertices with probability 0.5/degree
Add marked vertices to set if no marked neighbors
Remove their neighbors from graph

Set = {}

[Figure: example graph with vertices b, d, h, i marked and vertices a, c, e, f, g unmarked]

SLIDE 36

Random-Selection Algorithm (Luby)

Repeating steps

Mark vertices with probability 0.5/degree
Add marked vertices to set if no marked neighbors
Remove their neighbors from graph

Set = {b, d}

SLIDE 37

Random-Selection Algorithm (Luby)

Repeating steps

Mark vertices with probability 0.5/degree(v)
Add marked vertices to set if no marked neighbors
Remove their neighbors from graph

Set = {b, d}

SLIDE 38

Random-Selection Algorithm (Luby)

Repeating steps

Mark vertices with probability 0.5/degree
Add marked vertices to set if no marked neighbors
Remove their neighbors from graph

Set = {b, d}

[Figure: remaining vertices with c, f, i marked and h unmarked]

SLIDE 39

Random-Selection Algorithm (Luby)

Repeating steps

Mark vertices with probability 0.5/degree
Add marked vertices to set if no marked neighbors
Remove their neighbors from graph

Set = {b, c, d, f, i}

SLIDE 40

Random-Selection Algorithm (Luby)

Repeating steps

Mark vertices with probability 0.5/degree
Add marked vertices to set if no marked neighbors
Remove their neighbors from graph

MIS = {b, c, d, f, i}
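
The marking rounds can be sketched as below. Several things here are assumptions rather than details from the slides: the edge list is hypothetical (chosen to be consistent with the sets shown), the degree is taken over the remaining graph, and vertices whose remaining degree is 0 are always marked to avoid dividing by zero. The round logic is split out so the slide's marks can be replayed.

```python
import random

# Hypothetical edge list, chosen to be consistent with the sets on the slides.
EDGES = ["ab", "ac", "ad", "de", "dg", "ef", "eg", "gh", "hi"]


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def selection_round(adj, alive, marked):
    """One round: marked vertices without marked neighbors join the set."""
    winners = {v for v in marked if adj[v].isdisjoint(marked)}
    removed = set(winners)
    for v in winners:
        removed |= adj[v]
    return winners, alive - removed


def random_selection_mis(adj, rng):
    alive, mis = set(adj), set()
    while alive:
        marked = set()
        for v in sorted(alive):
            deg = len(adj[v] & alive)  # degree in the remaining graph
            # Assumption: isolated vertices are always marked.
            if deg == 0 or rng.random() < 0.5 / deg:
                marked.add(v)
        winners, alive = selection_round(adj, alive, marked)
        mis |= winners
    return mis


adj = build_adj(EDGES)

# Replay the marks shown on the slides.
first, rest = selection_round(adj, set(adj), {"b", "d", "h", "i"})
second, rest = selection_round(adj, rest, {"c", "f", "i"})
print(sorted(first), sorted(second), sorted(rest))
# ['b', 'd'] ['c', 'f', 'i'] []

randomized = random_selection_mis(adj, random.Random(7))
```

In the first replayed round h and i are both marked and adjacent, so neither joins the set; that is the conflict case the rule is designed to reject.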

SLIDE 41

ECL-MIS
Our Permutation-Selection Parallel MIS Algorithm

SLIDE 42

Our Permutation-Selection Algorithm

Initialization

Assign priorities ~ 1/deg
Randomize within level

Repeating steps

Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {}

SLIDE 43

Our Permutation-Selection Algorithm

Initialization

Assign priorities ~ 1/deg
Randomize within level

Repeating steps

Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {}

[Figure: example graph with priorities a=65, b=87, c=82, d=76, e=74, f=73, g=35, h=89, i=81]

SLIDE 44

Our Permutation-Selection Algorithm

Initialization

Assign priorities ~ 1/deg
Randomize within level

Repeating steps

Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {}

SLIDE 45

Our Permutation-Selection Algorithm

Initialization

Assign priorities ~ 1/deg
Randomize within level

Repeating steps

Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {b, c, d, h}

SLIDE 46

Our Permutation-Selection Algorithm

Initialization

Assign priorities ~ 1/deg
Randomize within level

Repeating steps

Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {b, c, d, h}

SLIDE 47

Our Permutation-Selection Algorithm

Initialization

Assign priorities ~ 1/deg
Randomize within level

Repeating steps

Add vertices with highest local priority to set
Remove their neighbors from graph

Set = {b, c, d, f, h}

SLIDE 48

Our Permutation-Selection Algorithm

Initialization

Assign priorities ~ 1/deg
Randomize within level

Repeating steps

Add vertices with highest local priority to set
Remove their neighbors from graph

MIS = {b, c, d, f, h}

SLIDE 49

ECL-MIS Features

Single initialization

Requires less work (faster)
Enables asynchronous implementation (faster)

Permutation-selection function

Boosts set size (higher-quality result)
Requires only a few bits (lower memory footprint)

Combined priority and status information

Reduces storage (lower memory footprint)
Minimizes memory accesses (faster)

SLIDE 50

Permutation-Selection Function

Requirements

Has to work for all graphs
Needs to be proportional to 1/degree
Do not know highest degree (but know average)

Our solution

priority(v) = avg_degree / (avg_degree + degree*(v))

Degree* includes random fraction (e.g., 3.xyz)
Scaled to small integer (e.g., a byte)
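
The formula can be tried out directly. The sketch below is an illustration, not the ECL-MIS source: the `levels` parameter, the rounding, and the clamp to a minimum of 1 are assumptions made to show the scaling to a small integer, and the degree table is hypothetical.

```python
import random


def priority(degree, avg_degree, rng, levels=255):
    """Integer priority in 1..levels; lower degrees get higher priorities."""
    degree_star = degree + rng.random()          # degree plus a random fraction
    p = avg_degree / (avg_degree + degree_star)  # real value in (0, 1]
    return max(1, int(p * levels))               # assumed scaling to an integer


# Hypothetical vertex degrees.
degrees = {"a": 3, "b": 1, "c": 1, "d": 3}
avg = sum(degrees.values()) / len(degrees)  # 2.0
rng = random.Random(42)
prios = {v: priority(d, avg, rng) for v, d in degrees.items()}

print(prios["b"] > prios["a"], prios["c"] > prios["d"])  # True True
```

When degree* equals the average degree, the unscaled value is exactly 0.5, so half of the priority range is reserved for vertices whose degree is below average. The random fraction only reorders vertices of equal integer degree, since any vertex with a smaller integer degree still gets a larger unscaled value.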

SLIDE 51

Permutation-Selection Function

[Figure: priority as a function of vertex degree]

Wide range at low degrees
Makes ties unlikely

Narrow range at high degrees
Ties are likely but unimportant

50% of range is below avg degree

SLIDE 52

Combining Information

Standard implementation (2 arrays)

1st array: vertex state (undecided, in set, out of set)
2nd array: vertex priority (random number)

Our implementation (1 array)

7 MSBs hold combined status and priority

Reserved highest value: in set (= higher than its neighbors)
Reserved lowest value: out of set (= removed from graph)
Remaining values = priority

LSB = decided/undecided (to boost performance)

[Figure: range of the 7-bit value, from "in" (highest) through the priorities to "out" (lowest)]

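
One way to picture the single-array encoding is the sketch below. It is an interpretation under assumptions, not the ECL-MIS source: the constants `IN` and `OUT`, the choice that LSB 1 means undecided, the serial sweep in `decide_all`, and the edge list (a hypothetical graph consistent with the sets shown earlier) are all illustrative.

```python
# Hypothetical edge list, chosen to be consistent with the sets on the slides.
EDGES = ["ab", "ac", "ad", "de", "dg", "ef", "eg", "gh", "hi"]

IN = 0xFE   # 7 MSBs all ones (reserved highest value), LSB 0 = decided
OUT = 0x00  # 7 MSBs all zeros (reserved lowest value), LSB 0 = decided


def build_adj(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def undecided(prio7):
    """Pack a 7-bit priority (1..126) with LSB 1 = undecided (assumed)."""
    if not 1 <= prio7 <= 126:
        raise ValueError("priority must leave the two reserved values free")
    return (prio7 << 1) | 1


def is_decided(byte):
    return (byte & 1) == 0


def decide_all(adj, prio7):
    status = {v: undecided(p) for v, p in prio7.items()}
    changed = True
    while changed:
        changed = False
        for v in sorted(adj):
            me = status[v]
            if is_decided(me):
                continue
            # One comparison covers state and priority: IN beats every
            # undecided value, and OUT loses to every undecided value.
            if all(status[u] < me for u in adj[v]):
                status[v] = IN
                for u in adj[v]:
                    status[u] = OUT
                changed = True
    return status


adj = build_adj(EDGES)
slide_prio = dict(a=65, b=87, c=82, d=76, e=74, f=73, g=35, h=89, i=81)
status = decide_all(adj, slide_prio)
in_set = {v for v, s in status.items() if s == IN}
print(sorted(in_set))  # ['b', 'c', 'd', 'f', 'h']
```

With this layout a vertex needs only one byte read per neighbor to learn both the neighbor's state and its priority, which is the storage and memory-access saving the slide describes.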
SLIDE 53

Results

SLIDE 54

Methodology

System

GPUs: Titan X and K40, nvcc 8.0
CPUs: 2 Xeon E5-2687W v3 (20 cores, 3.1GHz), gcc 5.3

GPU MIS codes

CUSP, ECL, IrGL, and Pannotia

CPU MIS codes

Ligra, Ligra+, and PBBS (Cilk and OpenMP, incremental and non-deterministic)
PBBS (serial)

SLIDE 55

Input Graphs

16 graphs

Real-world and synthetic
All made undirected

Sizes

66k – 24M vertices
387k – 524M edges
0 – 214k degrees

Vertex degree is given as min, max, and avg

Graph                 Vertices        Edges  Min deg  Max deg  Avg deg
2d-2e20.sym          1,048,576    4,190,208        2        4      4.0
amazon0601             403,394    4,886,816        1    2,752     12.1
as-skitter           1,696,415   22,190,596        1   35,455     13.1
citationCiteseer       268,495    2,313,294        1    1,318      8.6
cit-Patents          3,774,768   33,037,894        1      793      8.8
coPapersDBLP           540,486   30,491,458        1    3,299     56.4
delaunay_n24        16,777,216  100,663,202        3       26      6.0
in-2004              1,382,908   27,182,946        ?   21,869     19.7
internet               124,651      387,240        1      151      3.1
kron_g500-logn21     2,097,152  182,081,864        0  213,904     86.8
r4-2e23.sym          8,388,608   67,108,846        2       26      8.0
rmat16.sym              65,536      967,866        ?      569     14.8
rmat22.sym           4,194,304   65,660,814        ?    3,687     15.7
uk-2002             18,520,486  523,574,516        0  194,955     28.3
USA-road-d.NY          264,346      730,100        1        8      2.8
USA-road-d.USA      23,947,347   57,708,624        1        9      2.4

SLIDE 56

Titan X Performance (Edges/Second)

ECL-MIS is >3.9x faster on each tested graph
>12x faster than other codes on average
>100x faster than other codes on kron

SLIDE 57

K40 Performance (Edges/Second)

ECL-MIS is >3.8x faster on each tested graph
>9x faster than other codes on average
>70x faster than other codes on kron

SLIDE 58

Set Size (Deterministic)

ECL-MIS yields largest set on all but one graph
10% larger on average

SLIDE 59

Performance Optimizations

Using 16-bit values yields a 15% slowdown
Using 32-bit values yields a 33% slowdown
Not using randomization yields a 6% slowdown
Synchronous execution yields an 18% slowdown
Using 2 separate arrays yields an 18% slowdown
Visiting all neighbors yields a 59% slowdown

Combination of optimizations is key

SLIDE 60

Comparison to CPU Codes (Averages)

ECL-MIS is fastest on all but one tested graph
>2.9x faster than CPU codes on average

ECL-MIS yields largest set on 15 of 16 graphs
9% to 11% larger on average

SLIDE 61

Summary

ECL-MIS maximal independent set algorithm

Fastest GPU implementation (due to optimizations)
Produces largest sets (due to permutation selection)

Atomic-free CUDA implementation

http://cs.txstate.edu/~burtscher/research/ECL-MIS/

Acknowledgments

NSF grant 1406304
Nvidia donations