Fast Output-Sensitive Matrix Multiplication, ESA 2015, Riko Jacob and Morten Stöckel


SLIDE 1

Fast Output-Sensitive Matrix Multiplication

ESA 2015

Riko Jacob¹, Morten Stöckel²

¹ IT University of Copenhagen, ² University of Copenhagen

March 8 2016

Jacob, Stöckel (ITU, DIKU)

SLIDE 2

(Fast) Sparse matrix multiplication
  Problem description

Fast and output-sensitive matrix mult
  High level
  The row-balanced case
  The general case

SLIDE 3

Overview

◮ Let $A$ and $C$ be $U \times U$ matrices over a field $\mathbb{F}$ with $N$ nonzero entries in total.

◮ The problem: Compute the matrix product $[AC]_{i,j} = \sum_k A_{i,k} C_{k,j}$, which has $Z$ nonzero entries.

◮ Well-known solution: $O(U^\omega)$ word-RAM operations.

◮ Our main result: a Monte Carlo algorithm using $\tilde{O}(U^2 (Z/U)^{\omega-2} + N)$ word-RAM operations.
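To make the parameters $N$ and $Z$ concrete, here is a minimal sketch of a plain sparse product (not the algorithm of the talk). The dictionary representation, the prime modulus `P` and the function name `sparse_product` are assumptions made for illustration only.

```python
from collections import defaultdict

P = 101  # hypothetical prime modulus; entries are elements of the field F_P


def sparse_product(a, c):
    """Multiply sparse matrices stored as {(row, col): value} over F_P."""
    # Index C by row so each nonzero A[i][k] only meets nonzeros C[k][j].
    c_rows = defaultdict(list)
    for (k, j), value in c.items():
        c_rows[k].append((j, value))

    out = defaultdict(int)
    for (i, k), a_value in a.items():
        for j, c_value in c_rows[k]:
            out[(i, j)] = (out[(i, j)] + a_value * c_value) % P
    # Entries that cancel to zero do not count towards Z.
    return {pos: v for pos, v in out.items() if v != 0}


a = {(0, 0): 1, (0, 1): 1, (2, 1): 3}
c = {(0, 2): 4, (1, 2): 97, (1, 0): 2}
ac = sparse_product(a, c)
n_input = len(a) + len(c)  # N: nonzero entries of A and C in total
z_output = len(ac)         # Z: nonzero entries of AC
```

In this example the entry at position (0, 2) cancels modulo `P`, so it is not counted in $Z$.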

SLIDE 4

Matrix multiplication, basics

$$
\underbrace{\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}}_{A:\ n \text{ rows},\ p \text{ columns}}
\times
\underbrace{\begin{pmatrix}
c_{11} & c_{12} & \cdots & c_{1q} \\
c_{21} & c_{22} & \cdots & c_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
c_{p1} & c_{p2} & \cdots & c_{pq}
\end{pmatrix}}_{C:\ p \text{ rows},\ q \text{ columns}}
=
\underbrace{\begin{pmatrix}
ac_{11} & ac_{12} & \cdots & ac_{1q} \\
ac_{21} & ac_{22} & \cdots & ac_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
ac_{n1} & ac_{n2} & \cdots & ac_{nq}
\end{pmatrix}}_{AC = A \times C:\ n \text{ rows},\ q \text{ columns}}
$$

SLIDE 5

Matrix multiplication, basics

Entry $ac_{22}$ of $AC = A \times C$ ($n$ rows, $q$ columns) combines row 2 of $A$ ($n$ rows, $p$ columns) with column 2 of $C$ ($p$ rows, $q$ columns):

$$
ac_{22} = a_{21} \times c_{12} + a_{22} \times c_{22} + \cdots + a_{2p} \times c_{p2}
$$

SLIDE 6

Motivation

Some applications:

◮ Computing determinants and inverses of matrices.

◮ Bioinformatics.

◮ Graphs: counting cycles, computing matchings.

SLIDE 7

Some intuition, fast matrix multiplication

[Figure: the $U \times U$ matrices $A$ and $C$ being multiplied.]

SLIDE 8

Some intuition, fast matrix mult 2

◮ Can be done in $O(U^\omega)$ operations due to Strassen, who showed $\omega \le \log_2 7$. Most recently $\omega < 2.3728639$ due to Le Gall.

◮ But what if the input and/or output is sparse?

SLIDE 10

Some intuition, The Dream

[Figure: the $U \times U$ matrices $A$ and $C$ being multiplied.]

SLIDE 11

◮ We could apply the fast matrix mult black box on the colored area to get $N^{\omega/2}$ operations. Unfortunately this is difficult/impossible, since:

  1. Even sparse input can mean dense output (maybe $(N + Z)^{\omega/2}$ possible?)

  2. Compressing like this breaks matrix structure.

◮ Main idea: Compress the input according to sparsity and structure of the output instead.

[Figure: the $U \times U$ matrices $A$ and $C$ being multiplied, with the colored area marked.]

SLIDE 12

Our results

◮ Let $A$ and $C$ be $U \times U$ matrices over a field $\mathbb{F}$ with $N$ nonzero input and $Z$ nonzero output entries. There exist Monte Carlo algorithms 1 and 2 such that:

  1. When we have $Z/U$ nonzero entries per row and per column, algorithm 1 uses $\tilde{O}(U Z^{(\omega-1)/2} + N)$ operations.

  2. When the input matrices have arbitrary balance, algorithm 2 uses $\tilde{O}(U^2 (Z/U)^{\omega-2} + N)$ operations.

SLIDE 13

Our results, overview

Method | word-RAM complexity | Notes
General dense | $O(U^\omega)$ |
Lingas | $\tilde{O}(U^2 Z^{\omega/2-1})$ | Requires boolean matrices.
Iwen-Spencer, Le Gall | $O(U^{2+\varepsilon})$ | Requires $O(n^{0.3})$ nonzeros per column.
Williams-Yu, Pagh | $\tilde{O}(U^2 + UZ)$ |
Van Gucht et al. | $\tilde{O}(N\sqrt{Z} + Z + N)$ |
This paper | $\tilde{O}(U Z^{(\omega-1)/2} + N)$ | Requires balanced rows and columns.
This paper | $\tilde{O}(U^2 (Z/U)^{\omega-2} + N)$ |

Method | I/O complexity | Notes
General dense | $\tilde{O}(U^\omega / (M^{\omega/2-1} B))$ |
Pagh-Stöckel | $\tilde{O}(N\sqrt{Z} / (B\sqrt{M}))$ | Elements from semirings.
This paper | $\tilde{O}(U Z^{(\omega-1)/2} / (M^{\omega/2-1} B) + Z/B + N/B)$ | Requires balanced rows and columns.
This paper | $\tilde{O}(U^2 (Z/U)^{\omega-2} / (M^{\omega/2-1} B) + U^2/B)$ |
SLIDE 14

Our results, overview

When $N = U^2$, we use fewer word-RAM operations for any $Z \gg U$ and $U > 1$.
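As a rough numerical illustration of this comparison, the sketch below evaluates the base-2 logarithm of three bounds from the word-RAM table for a dense input. Polylogarithmic factors and the additive $N$ term are ignored, the value of `OMEGA` is the bound on $\omega$ quoted earlier in the talk, and the function names are hypothetical.

```python
import math

OMEGA = 2.3728639  # upper bound on omega quoted earlier in the talk


def log2_this_paper(u, z, omega=OMEGA):
    """log2 of U^2 * (Z/U)^(omega - 2), polylog factors ignored."""
    return 2 * math.log2(u) + (omega - 2) * math.log2(z / u)


def log2_output_sensitive_prior(u, z):
    """log2 of U^2 + U*Z (the Williams-Yu, Pagh row of the table)."""
    return math.log2(u * u + u * z)


def log2_dense(u, omega=OMEGA):
    """log2 of U^omega (the general dense row of the table)."""
    return omega * math.log2(u)


u = 2 ** 10
comparison = [
    (z, log2_this_paper(u, z), log2_output_sensitive_prior(u, z), log2_dense(u))
    for z in (2 ** 12, 2 ** 15, 2 ** 20)
]
for z, ours, prior, dense in comparison:
    print(f"Z=2^{int(math.log2(z))}: ours={ours:.2f} prior={prior:.2f} dense={dense:.2f}")
```

For $Z = U^2$ the first bound coincides with the dense bound $U^\omega$, which is what one would expect from a fully dense output.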

SLIDE 15

Our results, overview

When $N = U^2$, we use fewer external memory operations, unless $M$ is larger than $Z$.

SLIDE 16

Fast and output-sensitive matrix mult High level

Overview

Our approach at a high level:

  1. Assume a bounded number of nonzero entries in output rows and solve this case efficiently.

  2. Show that any matrix can be divided into a small number of such subproblems.

SLIDE 17

Fast and output-sensitive matrix mult The row-balanced case

Row-balance intuition

Promise: Upper bound on the number of nonzero entries in a row of $AC$.

Goal: Use this to compress the input.

[Figure: $A \times C = AC$.]

SLIDE 18

Row-balance intuition

Promise: Upper bound on the number of nonzero entries in a row of $AC$.

Goal: Use this to compress the input.

Idea: Collapse columns ("make rows shorter").

[Figure: $A \times C = AC$.]

SLIDE 21

Row-balance intuition

The benefit: A smaller matrix product!

[Figure: $A \times C = AC$.]

SLIDE 22

High level compression

Compression algorithm.

◮ Assume at most $d/5$ nonzero entries per row of the output $AC$.

◮ Consider a random hash function $h_1 : [U] \to [d]$. For each $i, j$ do $C'_{i,h_1(j)} \mathrel{+}= C_{i,j}$.

◮ Compute $AC'$ using the fast matrix mult black box.
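The compression step can be sketched as follows. This is a minimal illustration over the integers, with a plain cubic product standing in for the fast matrix mult black box; the function names, the example matrices and the seed are assumptions, not part of the talk.

```python
import random


def matmul(a, b):
    """Plain cubic product; stands in for the fast matrix mult black box."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def compress_columns(c, h, d):
    """Build C' with d columns by adding C[i][j] into C'[i][h[j]]."""
    c_prime = [[0] * d for _ in c]
    for i, row in enumerate(c):
        for j, value in enumerate(row):
            c_prime[i][h[j]] += value
    return c_prime


rng = random.Random(0)
U, d = 8, 5
a = [[1 if (i + k) % 4 == 0 else 0 for k in range(U)] for i in range(U)]
c = [[(3 * i + j) % 7 if (i + 2 * j) % 5 == 0 else 0 for j in range(U)]
     for i in range(U)]
h = [rng.randrange(d) for _ in range(U)]  # random hash function h: [U] -> [d]
g = matmul(a, compress_columns(c, h, d))  # the smaller product A * C'
```

By linearity, `g[i][b]` is the sum of all entries `AC[i][j]` with `h[j] == b`, so an entry of $AC$ can be read off from `g` whenever no other nonzero entry of its row shares its bucket.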

SLIDE 23

A specific entry, analysis

Consider one nonzero entry in the output $AC$. This entry will be correct if $h_1$ maps no other nonzero entry to its column.

[Figure: $A \times C = AC$.]

SLIDE 24

A specific entry, analysis

◮ A specific nonzero entry will be correct if no other nonzero entries of the same row hash to it.

◮ At most $d/5$ nonzero entries in a row are mapped to $d$ random positions.

◮ The probability of mapping to a specific position is $1/d$.

◮ The probability that none of the $d/5$ entries maps to a specific position is at least $(1 - 1/d)^{d/5} \ge 3/4$.

◮ Bottom line: An entry is correct with probability at least $3/4$.
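The inequality $(1 - 1/d)^{d/5} \ge 3/4$ can be checked numerically. The sketch below does so for $d \ge 5$, an assumed range in which $d/5$ is at least one entry; the function name is hypothetical.

```python
def no_collision_probability(d):
    """(1 - 1/d)^(d/5): chance that d/5 uniform hashes into [d] all miss a fixed position."""
    return (1 - 1 / d) ** (d / 5)


checked = {d: no_collision_probability(d) for d in (5, 10, 100, 10_000)}
for d, probability in checked.items():
    print(d, probability)
```

The expression tends to $e^{-1/5}$ as $d$ grows, which stays above $3/4$.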

SLIDE 25

All entries, analysis

Getting all entries right with high probability:

◮ Repeat for $O(\log U)$ independently drawn random hash functions.

◮ We now have $O(\log U)$ compressed matrix products $G^l$.

◮ To query an entry $i, j$ of $AC$, perform a majority vote over all $G^l_{i,h_l(j)}$ for $l = 1, \ldots, O(\log U)$.

◮ By a Chernoff bound and a union bound we get an arbitrarily small polynomial error probability.
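The repetition and voting step might look as follows. This is a sketch over the integers under stated assumptions: the number of repetitions (31), the width `d`, the example matrices and all function names are chosen for illustration, and the vote takes the most frequent value among the repetitions.

```python
import random
from collections import Counter


def compressed_product(a, c, h, d):
    """Return G = A * C', where C'[k][h[j]] accumulates C[k][j]."""
    g = [[0] * d for _ in a]
    for i, a_row in enumerate(a):
        for k, a_value in enumerate(a_row):
            if a_value:
                for j, c_value in enumerate(c[k]):
                    g[i][h[j]] += a_value * c_value
    return g


def query(i, j, products, hashes):
    """Vote over all repetitions for entry (i, j) of AC."""
    votes = Counter(g[i][h[j]] for g, h in zip(products, hashes))
    return votes.most_common(1)[0][0]


rng = random.Random(1)
U, d, repetitions = 16, 20, 31
# A is a permutation matrix and C has at most 2 nonzeros per row,
# so every row of AC has at most 2 <= d/5 nonzero entries.
a = [[1 if k == (3 * i) % U else 0 for k in range(U)] for i in range(U)]
c = [[0] * U for _ in range(U)]
for k in range(U):
    c[k][(5 * k) % U] = k + 1
    c[k][(7 * k + 3) % U] = 2
hashes = [[rng.randrange(d) for _ in range(U)] for _ in range(repetitions)]
products = [compressed_product(a, c, h, d) for h in hashes]
```

A call such as `query(0, 0, products, hashes)` then returns the voted value for entry $(0, 0)$ of $AC$; it is correct unless most repetitions suffer a collision for that entry.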

SLIDE 26

The cost 1

Observe: The matrices are no longer square. We need fast matrix multiplication for rectangular matrices.

[Figure: $A \times C = AC$.]

SLIDE 27

The cost 2

Observe: The matrices are no longer square. We need fast matrix multiplication for rectangular matrices. Say they have sizes $U \times V$ and $V \times W$.

Fact

Let $\omega$ be the smallest constant such that an algorithm that multiplies two $n \times n$ matrices in time $O(n^\omega)$ is known. Let $\beta = \min\{U, V, W\}$. Fast matrix multiplication has running time $F_{\mathrm{RAM}}(U, V, W) = O(UVW \cdot \beta^{\omega-3})$ on a RAM.

Proof.

Assume wlog that $\beta$ divides $\alpha = UVW/\beta$. Since $\beta$ is the smallest dimension, we can divide the matrices into $\alpha/\beta^2$ submatrices of size $\beta \times \beta$, which can each be solved in $O(\beta^\omega)$ operations. In total this is $O((\alpha/\beta^2) \cdot \beta^\omega) = O(UVW \cdot \beta^{\omega-3})$ operations.
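The counting in the proof can be illustrated with a blocked product. The sketch below assumes, for simplicity, that $\beta$ divides all three dimensions; `naive_square` is a hypothetical placeholder for an $O(\beta^\omega)$ square multiplication routine, and the example sizes are arbitrary.

```python
def naive_square(x, y):
    """Placeholder for an O(beta^omega) routine multiplying two beta x beta blocks."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def block_multiply(a, b, beta, square_multiply):
    """Multiply a (U x V) by b (V x W) using only beta x beta block products.

    Assumes beta divides U, V and W.
    Returns the product and the number of block products performed.
    """
    u, v, w = len(a), len(b), len(b[0])
    result = [[0] * w for _ in range(u)]
    calls = 0
    for i0 in range(0, u, beta):
        for j0 in range(0, w, beta):
            for k0 in range(0, v, beta):
                a_block = [row[k0:k0 + beta] for row in a[i0:i0 + beta]]
                b_block = [row[j0:j0 + beta] for row in b[k0:k0 + beta]]
                prod = square_multiply(a_block, b_block)
                calls += 1
                for di in range(beta):
                    for dj in range(beta):
                        result[i0 + di][j0 + dj] += prod[di][dj]
    return result, calls


U, V, W = 4, 2, 6
beta = min(U, V, W)
a = [[i + k for k in range(V)] for i in range(U)]
b = [[k * j + 1 for j in range(W)] for k in range(V)]
product, calls = block_multiply(a, b, beta, naive_square)
```

The number of block products is $UVW/\beta^3$, so replacing `naive_square` by a routine with $O(\beta^\omega)$ operations gives the $O(UVW \cdot \beta^{\omega-3})$ total of the Fact.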

SLIDE 28

The cost 3

Say $A$ is $U' \times U$ and $C$ is $U \times U$. All $O(\log U)$ compressed matrices can be computed in time $F_{\mathrm{RAM}}(U', U, d)$, and a specific entry of $AC$ can be queried in $O(\log U)$ time.

Report all non-zero entries:

◮ A random linear form over at least one non-zero variable is non-zero with probability $1 - 1/|\mathbb{F}|$.

◮ Create tree-summaries of all rows simultaneously (these are matrices).

◮ A tree traversal finds the non-zero entries.
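For a single row, the tree-summary idea can be sketched as below. This is an illustration under assumptions: the row is given explicitly (in the talk the summaries of all rows are obtained as matrices), its length is a power of two, the field is $\mathbb{F}_P$ for a hypothetical prime `P`, and the function names are made up for this example.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; plays the role of the field size |F|


def build_summaries(row, rng):
    """Binary tree of random linear forms over F_P; len(row) must be a power of two."""
    n = len(row)
    coeffs = [rng.randrange(P) for _ in range(n)]
    tree = [0] * (2 * n)
    for j, value in enumerate(row):
        tree[n + j] = coeffs[j] * value % P
    # Each inner node stores the random linear form of its whole interval.
    for node in range(n - 1, 0, -1):
        tree[node] = (tree[2 * node] + tree[2 * node + 1]) % P
    return tree


def nonzero_positions(tree):
    """Traverse the tree, pruning subtrees whose summary is zero."""
    n = len(tree) // 2
    found, stack = [], [1]
    while stack:
        node = stack.pop()
        if tree[node] == 0:
            continue  # interval is all zero, except with probability about 1/P
        if node >= n:
            found.append(node - n)
        else:
            stack.extend((2 * node + 1, 2 * node))  # visit the left child first
    return found


row = [0, 7, 0, 0, 3, 0, 0, 1]
tree = build_summaries(row, random.Random(2))
positions = nonzero_positions(tree)
```

The traversal only descends into intervals with a non-zero summary, so its work is proportional to the number of non-zero entries times the tree depth rather than to the row length.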

SLIDE 30

Fast and output-sensitive matrix mult The general case

Partitioning the general case

◮ Status: We have a good algorithm when the number of nonzero entries in a row is bounded.

◮ Problem: They are not bounded.

◮ Solution: We partition the input into subproblems where they are bounded.

SLIDE 31

Partitioning using size oracle

Assume we have an oracle that for any row of $AC$ tells us the number of nonzero entries of that row.

[Figure: $A \times C = AC$.]

SLIDE 32

Partitioning using size oracle

Assume we have an oracle that for any row of $AC$ tells us the number of nonzero entries of that row. Use this to permute the rows of $AC$, which is equivalent to permuting the rows of $A$.

[Figure: $A \times C = AC$.]

SLIDE 34

Partitioning using size oracle

After permuting in a sorted manner, invoke the row-balanced case!

[Figure: $A \times C = AC$.]

SLIDE 36

Partitioning analysis

Good news: A suitable oracle exists [Pagh and S., ESA'14]. How we proceed, at a high level:

◮ Create groups of output rows using the oracle.

◮ Row $i$ with $z_i$ nonzero entries belongs to group $l$ if $U \cdot 2^{-l-1} \le z_i \le U \cdot 2^{-l}$.

◮ This gives at most $\log U$ groups. Each group will correspond to an invocation of the row-balanced method.

◮ Say group $l$ has $x_l$ rows. The overall running time is $\sum_l F_{\mathrm{RAM}}(U, x_l, 5U2^{-l})$.
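The grouping step can be sketched as follows, assuming the oracle's answers are already available as a list `row_counts` (a hypothetical name, as is `group_rows`). Ties at a boundary $z_i = U \cdot 2^{-l-1}$ are resolved here by placing the row in the larger group index, which is one possible choice.

```python
def group_rows(row_counts, u):
    """Map each group index l to the rows i with U*2^(-l-1) < z_i <= U*2^(-l)."""
    groups = {}
    for i, z in enumerate(row_counts):
        if z == 0:
            continue  # empty output rows need no work
        l = 0
        # Increase l while z_i still fits under the next, smaller threshold.
        while z * 2 ** (l + 1) <= u:
            l += 1
        groups.setdefault(l, []).append(i)
    return groups


U = 16
row_counts = [16, 9, 8, 3, 1, 0, 5, 2]  # z_i as reported by the size oracle
groups = group_rows(row_counts, U)
```

Each list `groups[l]` holds the $x_l$ rows handed to one invocation of the row-balanced method.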

SLIDE 37

Partitioning analysis, 2

It turns out we can bound the smallest dimension for a worst-case partition to be the "average density" $Z/U$.

◮ This gives in total $\tilde{O}(F_{\mathrm{RAM}}(U, Z/U, U)) = \tilde{O}(U^2 (Z/U)^{\omega-2})$ (by Fact 1).

◮ A better bound on the smallest dimension is not possible, since it would imply $o(U^\omega)$ for dense matrices.
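The substitution into the Fact can be checked numerically. The sketch below uses a hypothetical helper `f_ram` that evaluates $UVW \cdot \beta^{\omega-3}$ without constants, with example values of $U$, $Z$ and $\omega$ chosen for illustration.

```python
import math


def f_ram(u, v, w, omega):
    """U*V*W*beta^(omega-3) with beta = min(U, V, W), constants omitted."""
    beta = min(u, v, w)
    return u * v * w * beta ** (omega - 3)


omega = 2.3728639
U, Z = 2 ** 10, 2 ** 15
total = f_ram(U, Z / U, U, omega)       # smallest dimension is Z/U
claimed = U ** 2 * (Z / U) ** (omega - 2)
dense = f_ram(U, U, U, omega)           # Z = U^2 gives Z/U = U
```

With $Z/U$ as the smallest dimension the two expressions agree, and for $Z = U^2$ the bound becomes $U^\omega$, matching the dense case.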

SLIDE 38

Concluding words

◮ We saw two techniques: collapsing columns and permuting rows to achieve a speedup.

◮ Improves upon the current state of the art when the input is dense.

◮ External memory model: not mentioned here but comes almost for free.

◮ Two obvious open problems: Remove the Monte Carlo component and/or get better complexity ($O((N + Z)^{\omega})$?).
