Sparse Matrix Multiplication and Triangle Listing in the Congested Clique Model - PowerPoint PPT Presentation



SLIDE 1

Sparse Matrix Multiplication and Triangle Listing in the Congested Clique Model

Keren Censor-Hillel, Dean Leitersdorf, Elia Turner (Technion), OPODIS 2018

This project received funding from the European Union’s Horizon 2020 Research and Innovation Program under grant agreement no. 755839

SLIDE 2

Overview

SLIDE 3

The Congested Clique

[Figure: input graph and overlay network]

  • n nodes in both graphs
  • Synchronous, O(log n) bits per message
  • All-to-all communication
  • Goal: minimize # communication rounds
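The round structure above can be sketched as a toy simulation (a hypothetical illustration of the round semantics only; `clique_round` and its message format are my own, not from the paper):

```python
# Toy sketch of one congested-clique round: every node may send one
# short message to every other node, and all messages arrive together.

def clique_round(outboxes):
    """outboxes[i][j] = message node i sends to node j this round.
    Returns inboxes, where inboxes[j][i] = what node j got from node i."""
    n = len(outboxes)
    inboxes = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                inboxes[j][i] = outboxes[i][j]
    return inboxes

# Example: each node broadcasts its own id; after a single round,
# every node knows every id.
n = 4
out = [[i for _ in range(n)] for i in range(n)]
inb = clique_round(out)
assert all(inb[j][i] == i for i in range(n) for j in range(n) if i != j)
```

An algorithm's cost in this model is simply the number of such rounds, which is why the slides focus on minimizing it.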

SLIDE 4

Sparse Algorithms

  • Sparse input graphs are common in practice
  • Goal: leverage the sparsity to reduce runtime
  • In the Congested Clique, a sparse input does not decrease the strength of the model

SLIDE 5

Sparse Algorithms - Our Results

  • New load balancing building blocks
  • New algorithms for sparse matrix multiplication and triangle listing
  • These imply sparse graph algorithms:
    ○ Triangle and 4-cycle counting
    ○ APSP

SLIDE 6

Sparse Matrix Multiplication (Sparse MM)

SLIDE 7

Previous Work on MM

  • Boolean MM [Drucker, Kuhn, Oshman, PODC 2014]
  • Ring MM [Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela, PODC 2015]
  • Semiring MM [Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela, PODC 2015]
  • Rectangular matrices and multiple concurrent instances of MM [Le Gall, DISC 2016]

ω = exponent of sequential MM < 2.372864


SLIDE 10

Matrix Multiplication (MM)

[Figure: input graph and input matrices]

  • Input: square matrices S, T. Output: P = S*T
  • Node i holds row i of each matrix
  • Example: S = T = adjacency matrix of the graph
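The problem statement can be illustrated with a tiny sequential example (a toy sketch of the input/output convention, not the distributed algorithm; the matrices are arbitrary):

```python
# Node i conceptually holds row i of S and row i of T; the goal is P = S*T.
S = [[1, 0, 2], [0, 0, 1], [3, 0, 0]]
T = [[0, 1, 0], [0, 0, 0], [1, 0, 1]]

def matmul(S, T):
    """Plain semiring (here: plus-times) matrix product."""
    n = len(S)
    return [[sum(S[i][k] * T[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = matmul(S, T)
assert P == [[2, 1, 2], [1, 0, 1], [0, 3, 0]]
```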

SLIDE 11

Sparse MM

  • Many beautiful works in the sequential and parallel settings, typically with different runtime measures
  • Our new algorithm: deterministic, with a dynamic communication pattern w.r.t. the sparsity structure

SLIDE 12

Sparse MM

[Figure: matrices S, T, P. Legend: Non-Zero / Zero / N/A]

SLIDE 13

Sparse MM

[Figure: matrices S, T, P. Legend: Non-Zero / Zero / N/A]

Implicit communication of zeros!

SLIDE 14

Sparse MM - Our Main Result

➔ P = S*T; nz(A) = number of non-zero elements in A

Let's see it!

SLIDE 15

Semiring MM

  • [Censor-Hillel, Kaski, Korhonen, Lenzen, Paz, Suomela, PODC 2015]
  • 3 parts:
    1. Distribute matrix entries
    2. Locally compute partial products
    3. Sum partial products
  • Our novelty: parts 1 and 3 are done in a sparsity-aware manner
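The three-part pattern can be sketched sequentially, with each simulated "node" owning one slice of the inner index k (an illustrative decomposition; `semiring_mm` and the slicing rule are my own, not the paper's distribution scheme):

```python
# Three-part semiring MM sketch: (1) distribute k-slices, (2) each slice
# produces a partial-product matrix locally, (3) sum the partials.

def semiring_mm(S, T, num_nodes):
    n = len(S)
    partials = []
    for t in range(num_nodes):                 # part 1: distribute entries
        ks = range(t * n // num_nodes, (t + 1) * n // num_nodes)
        part = [[sum(S[i][k] * T[k][j] for k in ks) for j in range(n)]
                for i in range(n)]             # part 2: local partial products
        partials.append(part)
    return [[sum(p[i][j] for p in partials)    # part 3: sum partial products
             for j in range(n)] for i in range(n)]

S = [[1, 2], [3, 4]]
T = [[5, 6], [7, 8]]
assert semiring_mm(S, T, 2) == [[19, 22], [43, 50]]
```

The sparsity-aware versions of parts 1 and 3 described on the following slides change *who* gets which entries and partials, not this overall shape.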

SLIDE 16

The Challenges

[Figure: matrices S, T, P]


SLIDE 20

Sparse MM: Two Challenges

  • [Lenzen, 2013] - runtime depends on the maximum # of messages at any node
  • Receiving challenge: every node receives roughly the same # of messages
  • Sending challenge: every node sends roughly the same # of messages

SLIDE 21

Step 1: (a, b)-split

Square MM → Several Instances of Rectangular MM

SLIDE 22

Step 1: (a, b)-split

[Figure: matrices S, T, P]


SLIDE 24

Step 1: (a, b)-split

[Figure: detailed example on matrices S, T, P]


SLIDE 28

Step 1: (a, b)-split

[Figure: detailed example with a = 3, b = 4 on matrices S, T, P]

Finally:

  • There are a*b rectangular MMs
  • Assign n/(ab) nodes to compute each
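One natural reading of the split, consistent with the a*b count above (an assumption on my part, matching the a = 3, b = 4 example): rows of S go into a bands, columns of T into b bands, and block (x, y) of P is one rectangular MM:

```python
import numpy as np

# (a, b)-split sketch: a*b independent rectangular products, where
# P[rows_x, cols_y] = S[rows_x, :] @ T[:, cols_y]. In the clique,
# n/(ab) nodes would be assigned to each block.
n, a, b = 12, 3, 4
rng = np.random.default_rng(0)
S = rng.integers(0, 2, (n, n))
T = rng.integers(0, 2, (n, n))

P = np.zeros((n, n), dtype=S.dtype)
for x in range(a):
    for y in range(b):
        rows = slice(x * n // a, (x + 1) * n // a)
        cols = slice(y * n // b, (y + 1) * n // b)
        P[rows, cols] = S[rows, :] @ T[:, cols]   # one rectangular MM

# The a*b blocks assemble exactly into the full square product.
assert (P == S @ T).all()
```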

SLIDE 29

Step 2: Receiving Challenge

  • Step 2.1: Make the rectangular MMs roughly similar (density-wise)
  • Step 2.2: Sparsity awareness within each rectangular MM

SLIDE 30

Step 2.1: Similar Rectangular MM

[Figure: matrices S, T, P with band widths a, b]

SLIDE 31

Step 2.1: Similar Rectangular MM

Observation:

  • It is OK to reorder S-rows and T-cols
  • Reorder so as to achieve similar rectangular MMs
  • O(1) rounds in the congested clique
  • Deterministic

[Figure: matrices S, T, P]


SLIDE 34

Step 2.2: Sparsity Aware Rectangular MM

[Figure: matrices S, T]

How do we split this between the n/(ab) nodes?

SLIDE 38

Step 2.2: Sparsity Aware Rectangular MM

Observation:

  • Swapping two S-cols and the two respective T-rows cancels out

[Figure: matrices S, T]
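The observation is easy to sanity-check numerically: applying the same permutation to the S-columns and the corresponding T-rows only relabels the inner index of the product, so P = S*T is unchanged:

```python
import numpy as np

# Permuting S-columns and the matching T-rows by one permutation perm
# leaves S @ T unchanged: each term S[i][k] * T[k][j] is merely relabeled.
rng = np.random.default_rng(1)
S = rng.integers(0, 3, (5, 5))
T = rng.integers(0, 3, (5, 5))
perm = rng.permutation(5)

assert (S[:, perm] @ T[perm, :] == S @ T).all()
```

This is what licenses the reordering in the three phases that follow.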

SLIDE 39

Step 2.2: Sparsity Aware Rectangular MM

  • Phase 1: Count non-zeros in S-cols, T-rows

[Figure: non-zero counts — 0 0 0 3 3 3 / 0 0 0 3 3 3]
SLIDE 40

Step 2.2: Sparsity Aware Rectangular MM

  • Phase 1: Count non-zeros in S-cols, T-rows
  • Phase 2: Sum counts

[Figure: counts 0 0 0 3 3 3 / 0 0 0 3 3 3; sums 0 0 0 6 6 6 0 0 0]

SLIDE 41

Step 2.2: Sparsity Aware Rectangular MM

  • Phase 1: Count non-zeros in S-cols, T-rows
  • Phase 2: Sum counts
  • Phase 3: Reorder

[Figure: sums 0 0 0 6 6 6 0 0 0 reordered to 6 0 0 6 0 0 6 0 0]

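The three phases can be sketched as follows; the concrete reordering rule used here (sort by combined count, then interleave heavy and light indices) is my own illustrative choice, not necessarily the paper's exact rule:

```python
import numpy as np

# Phases 1-3 sketch: count, sum, reorder the inner index of one
# rectangular MM so dense and sparse columns are spread evenly.
rng = np.random.default_rng(2)
S = rng.integers(0, 2, (6, 6)) * rng.integers(0, 2, (6, 6))  # sparse-ish
T = rng.integers(0, 2, (6, 6))

s_counts = (S != 0).sum(axis=0)        # phase 1: non-zeros per S-column
t_counts = (T != 0).sum(axis=1)        #          non-zeros per T-row
combined = s_counts + t_counts         # phase 2: sum the counts

# phase 3: reorder - heaviest indices alternate with lightest ones,
# so the nodes sharing this rectangular MM get balanced loads.
order = np.argsort(combined)[::-1]
interleaved = np.empty_like(order)
interleaved[0::2] = order[: (len(order) + 1) // 2]
interleaved[1::2] = order[(len(order) + 1) // 2:]

# Correctness is preserved (same permutation on S-cols and T-rows).
assert (S[:, interleaved] @ T[interleaved, :] == S @ T).all()
```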

SLIDE 43

Step 2: Receiving Challenge - SOLVED!

SLIDE 44

Step 2.2: Sparsity Aware Rectangular MM

Notice!

  • a*b different rectangular MMs
  • n/(ab) nodes in each
  • The inner reorderings are local knowledge of the n/(ab) nodes; making them global knowledge is too expensive!
  • This will be problematic soon
SLIDE 45

Step 3: Sending Challenge

  • Need every node to send roughly the same # of messages
  • Solution: balancing message duplication

SLIDE 46

Dense Nodes Will Be Slow

[Figure: matrix T. Legend: Non-Zero / Zero / N/A]

SLIDE 47

Message Duplication

  • S, T duplicated b, a times respectively

[Figure: matrices S, T, P with band widths a, b]


SLIDE 49

Step 3: Sending Challenge

  • Key point 1: Duplication is expensive!
  • Key point 2: Very easily load balanced - sparse nodes help dense nodes

SLIDE 50

Step 4: Knowledge Challenge

  • Problem: the inner reorderings are local knowledge
  • Senders do not know whom to send messages to
  • We show an O(1)-round solution:
    ○ Requires a specific redistribution of elements in the sending challenge - receivers know who needs to message them
    ○ Nodes request their messages

SLIDE 51

Summary

  • The total runtime for any choice of (a, b)
  • The optimal choice of (a, b)
  • The resulting overall runtime

SLIDE 52

Sparse MM - Our Main Result

➔ P = S*T; nz(A) = number of non-zero elements in A

SLIDE 53

Sparse Triangle Listing

SLIDE 54

Triangle Listing

  • Every triangle must be known to at least one node

SLIDE 55

Previous Work on Triangle Listing

  • [Dolev, Lenzen, Peled, DISC 2012]
  • [Izumi, Le Gall, PODC 2017; Pandurangan, Robinson, Scquizzato, SPAA 2018]
  • w.h.p. [Pandurangan, Robinson, Scquizzato, SPAA 2018]

Our result: deterministic

SLIDE 56

Our Result

  • Triangle = a path of length 1 (v→u) + a path of length 2 (u→v)
  • Notice: our runtime is faster than the time for squaring!
    ○ No need to compute all of A^2
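The "length-1 path plus length-2 path" view can be sketched sequentially (a toy illustration, not the clique algorithm): for each edge (v, u), the common neighbors of v and u close triangles, so only entries of A^2 at positions where A itself is non-zero are ever inspected:

```python
# Triangle = edge (v, u) plus a 2-path from u back to v.
# We never form the full square of the adjacency matrix: only the
# (v, u) entries with an actual edge are examined via set intersection.
def list_triangles(adj):
    """adj: dict mapping each vertex to its set of neighbors (undirected)."""
    triangles = set()
    for v in adj:
        for u in adj[v]:                       # length-1 path v -> u
            for w in adj[u] & adj[v]:          # length-2 path u -> w -> v
                triangles.add(frozenset((v, u, w)))
    return triangles

adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
assert list_triangles(adj) == {frozenset({0, 1, 2})}
```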

SLIDE 57

Conclusion

SLIDE 58

Conclusion

Our Work

  • New load balancing building blocks in the Congested Clique
  • New algorithms for Sparse MM and Triangle Listing

Open Questions

  • Can the complexity of Sparse MM be improved in the clique? Sparse ring MM?
  • Lower bounds for Sparse Triangle Listing?
  • Using these algorithms/techniques in other models