

SLIDE 1

Matrix-Vector Multiplication in Sub-Quadratic Time (Some Preprocessing Required)

Ryan Williams, Carnegie Mellon University

SLIDE 5

Introduction

Matrix-Vector Multiplication: a fundamental operation in scientific computing.

How fast can n × n matrix-vector multiplication be?

Θ(n^2) steps just to read the matrix!

Main Result: If we allow O(n^(2+ε)) preprocessing, then matrix-vector multiplication over any finite semiring can be done in O(n^2/(ε log n)^2) time.

SLIDE 9

Better Algorithms for Matrix Multiplication

Three of the major developments:

  • Arlazarov et al., a.k.a. “Four Russians” (1960s): O(n^3/log n) operations
    Uses table lookups. Good for hardware with short vector operations as primitives.

  • Strassen (1969): n^(log 7/log 2) = O(n^2.81) operations
    Asymptotically fast, but overhead in the big-O. Experiments in practice are inconclusive about Strassen vs. Four Russians for Boolean matrix multiplication (Bard, 2006).

  • Coppersmith and Winograd (1990): O(n^2.376) operations
    Not yet practical.
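The “table lookups” behind Four Russians can be made concrete with a short sketch (illustrative Python, not from the talk; matrix rows are encoded as Python ints used as bit vectors, and k would be chosen near log2 n so each table has about n entries):

```python
def four_russians_boolmm(A, B, n, k):
    """Boolean product C = A x B; A, B are lists of n ints (row bitmasks)."""
    groups = [(s, min(s + k, n)) for s in range(0, n, k)]
    # For each group of k rows of B, precompute the OR of every subset of them.
    tables = []
    for (s, e) in groups:
        t = [0] * (1 << (e - s))
        for p in range(1, 1 << (e - s)):
            low = p & (-p)                       # lowest set bit of the pattern
            t[p] = t[p ^ low] | B[s + low.bit_length() - 1]
        tables.append(t)
    C = []
    for i in range(n):
        row = 0
        for g, (s, e) in enumerate(groups):
            pattern = (A[i] >> s) & ((1 << (e - s)) - 1)   # bits s..e-1 of row i
            row |= tables[g][pattern]                      # one table lookup
        C.append(row)
    return C
```

With k ≈ log2 n there are n/k tables of 2^k ≈ n entries of n bits each, which is where the O(n^3/log n) bit-operation count comes from.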

SLIDE 12

Focus: Combinatorial Matrix Multiplication Algorithms

  • Also called non-algebraic; let’s call them non-subtractive
    E.g. Four Russians is combinatorial; Strassen isn’t.

More Non-Subtractive Boolean Matrix Multiplication Algorithms:

  • Atkinson and Santoro: O(n^3/log^(3/2) n) on a (log n)-word RAM
  • Rytter and Basch–Khanna–Motwani: O(n^3/log^2 n) on a RAM
  • Chan: Four Russians can be implemented in O(n^3/log^2 n) on a pointer machine

SLIDE 17

Main Result

The O(n^3/log^2 n) matrix multiplication algorithm can be “de-amortized”. More precisely, we can preprocess an n × n matrix A over a finite semiring in O(n^(2+ε)) time, such that vector multiplications with A can be done in O(n^2/(ε log n)^2) time.

This allows “non-subtractive” matrix multiplication to be done on-line, and can be implemented on a pointer machine.

This Talk: The Boolean case.

SLIDE 18

Preprocessing Phase: The Boolean Case

Partition the input matrix A into blocks of size ⌈ε log n⌉ × ⌈ε log n⌉:

[Figure: A drawn as an (n/(ε log n)) × (n/(ε log n)) grid of blocks, from A_{1,1} at the top-left to A_{n/(ε log n), n/(ε log n)} at the bottom-right; each block A_{i,j} is ε log n × ε log n.]
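As a concrete (hypothetical) illustration of this partition step, a Python sketch that cuts a 0/1 matrix into ⌈ε log n⌉-sized blocks, padding the boundary blocks with zeros:

```python
import math

def partition_blocks(A, n, eps):
    """Partition an n x n 0/1 matrix A (list of lists) into b x b blocks,
    b = ceil(eps * log2 n); boundary blocks are padded with zeros."""
    b = max(1, math.ceil(eps * math.log2(n)))
    m = math.ceil(n / b)  # number of block rows / block columns
    blocks = [[[[A[i*b + r][j*b + c] if i*b + r < n and j*b + c < n else 0
                 for c in range(b)] for r in range(b)]
                for j in range(m)] for i in range(m)]
    return blocks, b, m
```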

SLIDE 19

Preprocessing Phase: The Boolean Case

Build a graph G with parts P_1, ..., P_{n/(ε log n)} and Q_1, ..., Q_{n/(ε log n)}.

[Figure: the parts P_1, P_2, ..., P_{n/(ε log n)} drawn in a row above the parts Q_1, Q_2, ..., Q_{n/(ε log n)}, each part containing 2^(ε log n) vertices.]

Each part has 2^(ε log n) = n^ε vertices, one for each possible (ε log n)-bit vector.

SLIDE 21

Preprocessing Phase: The Boolean Case

Edges of G: Each vertex v in each Pi has exactly one edge into each Qj

2ε log n

Pi

2ε log n Qj v Aj,iv

Time to build the graph:

n ε log n · n ε log n · 2ε log n · (ε log n)2 = O(n2+ε)

number

  • f Qj

number

  • f Pi

number

  • f nodes

in Pi matrix-vector mult

  • f Aj,i and v

7-a
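A minimal Python sketch of this preprocessing, assuming A is a 0/1 list-of-lists; instead of an explicit graph, the edge “v in P_i → A_{j,i} · v in Q_j” is stored as a lookup table per block pair (all names are illustrative, not from the paper):

```python
import math

def preprocess(A, n, eps):
    """For every block pair (j, i), build a table mapping each of the 2^b
    possible b-bit chunks v to the b-bit Boolean product A_{j,i} * v."""
    b = max(1, math.ceil(eps * math.log2(n)))
    m = math.ceil(n / b)

    def block_rows(j, i):
        # Rows of block A_{j,i} as b-bit column masks, zero-padded at edges.
        return [sum(1 << c for c in range(b)
                    if j*b + r < n and i*b + c < n and A[j*b + r][i*b + c])
                for r in range(b)]

    tables = {}
    for j in range(m):
        for i in range(m):
            rows = block_rows(j, i)
            tab = [0] * (1 << b)
            for v in range(1 << b):          # every possible chunk value
                out = 0
                for r in range(b):
                    if rows[r] & v:          # row r of A_{j,i} hits a 1 of v
                        out |= 1 << r
                tab[v] = out
            tables[(j, i)] = tab
    return tables, b, m
```

The loop structure mirrors the count on the slide: m^2 block pairs × 2^b chunk values × work per small product (here O(b) thanks to word-level ANDs, versus the slide’s naive (ε log n)^2 bit count), giving O(n^(2+ε)) for b = ⌈ε log n⌉.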

SLIDE 23

How to Do Fast Vector Multiplications

Let v be a column vector. Want: A · v.

(1) Break up v into chunks of size ε log n: v = (v_1, v_2, ..., v_{n/(ε log n)})^T.

SLIDE 25

How to Do Fast Vector Multiplications

(2) For each i = 1, ..., n/(ε log n), look up v_i in P_i.

[Figure: the chunks v_1, v_2, ..., v_{n/(ε log n)} located as vertices of the parts P_1, P_2, ..., P_{n/(ε log n)}.]

Takes Õ(n) time.

SLIDE 30

How to Do Fast Vector Multiplications

(3) Look up the neighbors of vi, mark each neighbor found.

2ε log n

P2 . . . . . . . . . . . . P

n ε log n

P1

2ε log n 2ε log n 2ε log n 2ε log n 2ε log n

Q1 Q2 Q

n ε log n

v1 v2 vn/(ε log n) A1,

n ε log n · vn/(ε log n)

A2,

n ε log n · vn/(ε log n)

A

n ε log n, n ε log n · vn/(ε log n)

Takes O

  • n

ε log n

2

13

SLIDE 31

How to Do Fast Vector Multiplications

(4) For each Q_j, define v′_j as the OR of all marked vectors in Q_j.

[Figure: in each Q_j, the marked vectors are ORed together, producing v′_1, v′_2, ..., v′_{n/(ε log n)}.]

Takes Õ(n^(1+ε)) time.

SLIDE 36

How to Do Fast Vector Multiplications

(5) Output v′ := (v′_1, v′_2, ..., v′_{n/(ε log n)})^T.

Claim: v′ = A · v.

Proof: By definition, v′_j = ⋁_{i=1}^{n/(ε log n)} A_{j,i} · v_i. Writing A as the block matrix (A_{j,i}) and v as (v_1, ..., v_{n/(ε log n)})^T,

  A · v = ( ⋁_{i=1}^{n/(ε log n)} A_{1,i} · v_i, ..., ⋁_{i=1}^{n/(ε log n)} A_{n/(ε log n),i} · v_i )^T = v′.
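Putting steps (1)–(5) together, a self-contained Python sketch of the Boolean case (the table build at the top repeats the preprocessing phase so the example runs on its own; all names are illustrative, not from the paper):

```python
import math

def matvec_with_tables(A, v, n, eps):
    """Answer A * v (Boolean) via table lookups, as in steps (1)-(5)."""
    b = max(1, math.ceil(eps * math.log2(n)))
    m = math.ceil(n / b)
    # Preprocessing: tables[(j, i)][chunk] = A_{j,i} * chunk (b-bit masks).
    def block_rows(j, i):
        return [sum(1 << c for c in range(b)
                    if j*b + r < n and i*b + c < n and A[j*b + r][i*b + c])
                for r in range(b)]
    tables = {}
    for j in range(m):
        for i in range(m):
            rows = block_rows(j, i)
            tables[(j, i)] = [sum(1 << r for r in range(b) if rows[r] & chunk)
                              for chunk in range(1 << b)]
    # Step 1: break v into b-sized chunks (bit c of chunk i = entry i*b + c).
    chunks = [sum(1 << c for c in range(b) if i*b + c < n and v[i*b + c])
              for i in range(m)]
    # Steps 2-4: look up each chunk's "neighbor" in every Q_j, OR the marks.
    vprime = [0] * m
    for i, ch in enumerate(chunks):
        for j in range(m):
            vprime[j] |= tables[(j, i)][ch]
    # Step 5: concatenate the v'_j chunks into the output vector.
    return [1 if (vprime[k // b] >> (k % b)) & 1 else 0 for k in range(n)]
```

A quick sanity check against the direct O(n^2) product confirms the claim v′ = A · v on small instances.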

SLIDE 39

Some Applications

Can quickly compute the neighbors of arbitrary vertex subsets. Let A be the adjacency matrix of G = (V, E), and let v_S be the indicator vector for a set S ⊆ V.

Proposition: A · v_S is the indicator vector for N(S), the neighborhood of S.

Corollary: After O(n^(2+ε)) preprocessing, can determine the neighborhood of any vertex subset in O(n^2/(ε log n)^2) time. (One level of BFS in o(n^2) time.)
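The proposition is a one-liner in code (plain Python sketch; here the product is computed naively in O(n^2), the talk’s point being that after preprocessing it drops to O(n^2/(ε log n)^2)):

```python
def neighborhood(adj, S):
    """N(S) for a graph given by a 0/1 adjacency matrix `adj`,
    computed as the Boolean product adj * v_S."""
    n = len(adj)
    vS = [1 if u in S else 0 for u in range(n)]
    return {u for u in range(n)
            if any(adj[u][w] and vS[w] for w in range(n))}
```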

SLIDE 45

Graph Queries

Corollary: After O(n^(2+ε)) preprocessing, can determine if a given vertex subset is an independent set, a vertex cover, or a dominating set, all in O(n^2/(ε log n)^2) time.

Proof: Let S ⊆ V.

  S is dominating ⟺ S ∪ N(S) = V.
  S is independent ⟺ S ∩ N(S) = ∅.
  S is a vertex cover ⟺ V − S is independent.

Each can be quickly determined from knowing S and N(S).
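A sketch of the three checks (illustrative Python for a simple undirected graph; N(S) is computed naively here, standing in for the fast preprocessed product):

```python
def classify_subset(adj, S):
    """Return (dominating, independent, vertex_cover) for S, using only
    S and neighborhoods, as in the proof. `adj` is a 0/1 matrix."""
    n = len(adj)
    def nbrs(T):
        return {u for u in range(n) if any(adj[u][w] for w in T)}
    NS = nbrs(S)
    dominating = (S | NS) == set(range(n))      # S union N(S) = V
    independent = not (S & NS)                  # S disjoint from N(S)
    comp = set(range(n)) - S
    vertex_cover = not (comp & nbrs(comp))      # V - S is independent
    return dominating, independent, vertex_cover
```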

SLIDE 50

Triangle Detection

Problem: Triangle Detection
Given: Graph G and vertex i.
Question: Does i participate in a 3-cycle, a.k.a. triangle?

Worst Case: Can take Θ(n^2) time to check all pairs of neighbors of i.

Corollary: After O(n^(2+ε)) preprocessing on G, can solve triangle detection for arbitrary vertices in O(n^2/(ε log n)^2) time.

Proof: Given vertex i, let S be its set of neighbors (obtained in O(n) time). S is not independent ⟺ i participates in a triangle.
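The reduction in the proof is short (illustrative Python; `adj` is a 0/1 adjacency matrix, and the independence test stands in for the fast preprocessed query):

```python
def has_triangle_through(adj, i):
    """Does vertex i lie on a triangle? Take S = N(i); i is on a triangle
    iff S is not independent (two neighbors of i are adjacent)."""
    n = len(adj)
    S = [w for w in range(n) if adj[i][w]]
    # S not independent  <=>  some pair in S is joined by an edge
    return any(adj[u][w] for u in S for w in S if u != w)
```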

SLIDE 54

Conclusion

A preprocessing/multiplication algorithm for matrix-vector multiplication that builds on lookup-table techniques.

  • Is there a preprocessing/multiplication algorithm for sparse matrices? Can we do multiplication in e.g. O(m/poly(log n) + n) time, where m = number of nonzeroes?

  • Can the algebraic matrix multiplication algorithms (Strassen, etc.) be applied to this problem?

  • Can our ideas be extended to achieve non-subtractive Boolean matrix multiplication in o(n^3/log^2 n)?

SLIDE 55

Thank you!