SLIDE 1

Matrix Multiplication

Rasmus Pagh IT University of Copenhagen ITCS, January 10, 2012


SLIDE 3

Outline

  • Algorithm and analysis
  • Related work
  • Case study: Correlations
  • Open problems

SLIDE 4

Informal problem statement

  • Input: n-by-n matrices A and B, a parameter b.
  • Output: An approximation of AB that is good if AB is dominated by its b largest entries (“compressible”).

SLIDE 6

Basic algorithm

  • 1. Take hash functions s1, s2: [n] → {-1,1} and h1, h2: [n] → [b].
  • 2. Compute the polynomial

      Σ_i c_i x^i  =  Σ_{k=1..n} ( Σ_{i=1..n} A_{ik} s1(i) x^{h1(i)} ) · ( Σ_{j=1..n} B_{kj} s2(j) x^{h2(j)} )

  • 3. Extract the unbiased estimator

      (AB)_{ij} ≈ s1(i) s2(j) c_{h1(i)+h2(j)}

Observation: Each coefficient c_i is a sum of entries of AB with random signs.
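As a concrete sketch, the three steps can be written in plain Python (the `compressed_mm` helper name is made up for this illustration; fully random signs and hashes stand in for the pairwise/3-wise independent families the analysis assumes, and naive convolution stands in for FFT-based polynomial multiplication):

```python
import random

def compressed_mm(A, B, b, rng):
    """One run of the basic algorithm: returns an n-by-n estimate of AB."""
    n = len(A)
    # Step 1: random sign functions s1, s2: [n] -> {-1, +1}
    # and hash functions h1, h2: [n] -> [b].
    s1 = [rng.choice((-1, 1)) for _ in range(n)]
    s2 = [rng.choice((-1, 1)) for _ in range(n)]
    h1 = [rng.randrange(b) for _ in range(n)]
    h2 = [rng.randrange(b) for _ in range(n)]
    # Step 2: coefficients of the degree-(2b-2) polynomial.
    c = [0.0] * (2 * b - 1)
    for k in range(n):
        # Sketch column k of A and row k of B as degree-(b-1) polynomials.
        p = [0.0] * b
        q = [0.0] * b
        for i in range(n):
            p[h1[i]] += A[i][k] * s1[i]
            q[h2[i]] += B[k][i] * s2[i]
        # Multiply the two polynomials (naive convolution here).
        for d1 in range(b):
            if p[d1]:
                for d2 in range(b):
                    c[d1 + d2] += p[d1] * q[d2]
    # Step 3: unbiased estimator (AB)_ij ~ s1(i) s2(j) c_{h1(i)+h2(j)}.
    return [[s1[i] * s2[j] * c[h1[i] + h2[j]] for j in range(n)]
            for i in range(n)]
```

Averaging many independent runs converges to AB, reflecting the unbiasedness argued on the following slides.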

SLIDE 7

Why unbiased?

Lemma: If s1 and s2 are pairwise independent,

      E[s1(i1) s1(i2) s2(j1) s2(j2)] = 1 if i1 = i2 and j1 = j2, and 0 otherwise.

Using the lemma, the expected value of s1(i) s2(j) Σ_i c_i x^i is:

      E[ s1(i) s2(j) Σ_{k=1..n} ( Σ_{i'=1..n} A_{i'k} s1(i') x^{h1(i')} ) · ( Σ_{j'=1..n} B_{kj'} s2(j') x^{h2(j')} ) ]

        = Σ_{k=1..n} s1(i)² A_{ik} x^{h1(i)} s2(j)² B_{kj} x^{h2(j)}

        = (AB)_{ij} x^{h1(i)+h2(j)}
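The claim can be verified mechanically on toy inputs: with the hash values held fixed, averaging the estimator over all sign assignments yields exactly (AB)_ij (a brute-force enumeration; the matrices and hash values below are arbitrary choices for illustration):

```python
from itertools import product

n, b = 2, 2
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
h1, h2 = [0, 1], [1, 0]   # arbitrary fixed hash values in [b]

def estimate(s1, s2, i, j):
    """s1(i) s2(j) c_{h1(i)+h2(j)} for given sign vectors s1, s2."""
    d = h1[i] + h2[j]
    c = sum(A[i2][k] * s1[i2] * B[k][j2] * s2[j2]
            for k in range(n) for i2 in range(n) for j2 in range(n)
            if h1[i2] + h2[j2] == d)
    return s1[i] * s2[j] * c

def sign_average(i, j):
    """Exact expectation over all 2^n * 2^n sign assignments."""
    vals = [estimate(s1, s2, i, j)
            for s1 in product((-1, 1), repeat=n)
            for s2 in product((-1, 1), repeat=n)]
    return sum(vals) / len(vals)
```

Uniform enumeration of all sign vectors computes the expectation exactly, so the output equals AB with no sampling error.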

SLIDE 8

What is the variance?

  • Consider the “noise” in the estimator caused by (AB)_{i'j'}:

      X_{i'j'} = s1(i') s2(j') (AB)_{i'j'} if h1(i) + h2(j) = h1(i') + h2(j'), and 0 otherwise.

  • If h1, h2 are 3-wise independent, these random variables are uncorrelated, so:

      Var( Σ_{i',j'} X_{i'j'} ) = Σ_{i',j'} Var(X_{i'j'}) = Σ_{i',j'} E[X_{i'j'}²] ≤ Σ_{i',j'} (AB)_{i'j'}² / b = ||AB||²_F / b
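The bound can likewise be checked exactly on toy inputs by enumerating all sign vectors and all hash functions (fully independent hashes, which are in particular 3-wise independent; the small matrices are arbitrary choices):

```python
from itertools import product

n, b = 2, 2
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
AB = [[19, 22], [43, 50]]
frob_sq = sum(AB[i][j] ** 2 for i in range(n) for j in range(n))

def estimate(s1, s2, h1, h2, i, j):
    """s1(i) s2(j) c_{h1(i)+h2(j)} for given signs and hashes."""
    d = h1[i] + h2[j]
    c = sum(A[i2][k] * s1[i2] * B[k][j2] * s2[j2]
            for k in range(n) for i2 in range(n) for j2 in range(n)
            if h1[i2] + h2[j2] == d)
    return s1[i] * s2[j] * c

def variance(i, j):
    """Exact variance of the estimator over all signs and hash functions."""
    vals = [estimate(s1, s2, h1, h2, i, j)
            for s1 in product((-1, 1), repeat=n)
            for s2 in product((-1, 1), repeat=n)
            for h1 in product(range(b), repeat=n)
            for h2 in product(range(b), repeat=n)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)
```

For every entry, the exact variance comes out below ||AB||²_F / b, matching the slide's bound.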

SLIDE 9

Sparse outputs

  • Suppose AB has at most b/3 nonzero entries.
  • Then with probability 2/3 there is no noise in a given estimator.
  • Repeat O(log n) times and take the median estimate, to get the exact result whp.
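The median trick can be illustrated under a toy noise model (each run is exact with probability 2/3, arbitrary noise otherwise; the noise distribution below is made up purely for illustration):

```python
import random

def median_estimate(true_value, repetitions, rng):
    """Median of independent runs; each run is exact with probability 2/3."""
    estimates = []
    for _ in range(repetitions):
        if rng.random() < 2 / 3:
            estimates.append(true_value)                         # no collision: exact
        else:
            estimates.append(true_value + rng.uniform(-10, 10))  # noisy run
    estimates.sort()
    return estimates[len(estimates) // 2]
```

Whenever more than half of the runs are exact, the median equals the true value; with c log n repetitions a Chernoff bound makes the failure probability polynomially small.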

SLIDE 10

Time analysis

  • Construct 2n degree-b polynomials: O(n² + nb).
  • Multiply n pairs of degree-b polynomials using FFT: O(nb log b).
  • Extracting estimates: O(n²).

Total time: O(n² + nb log b).
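The FFT-based multiplication of two degree-b polynomials can be sketched with a textbook Cooley–Tukey routine (a standalone illustration; in practice a library FFT would be used):

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_multiply(p, q):
    """Multiply integer-coefficient polynomials in O(b log b) time."""
    m = len(p) + len(q) - 1
    size = 1
    while size < m:
        size *= 2
    fp = fft([complex(x) for x in p] + [0j] * (size - len(p)))
    fq = fft([complex(x) for x in q] + [0j] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # Divide by size to complete the inverse FFT; round to recover integers.
    return [round((x / size).real) for x in prod[:m]]
```

For example, (1 + 2x + 3x²)(4 + 5x) = 4 + 13x + 22x² + 15x³.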

SLIDE 13

Background

  • The polynomial computed is in fact a Count-Sketch [Charikar et al. ’04], an early compressed sensing method.
  • Polynomial multiplication combines the Count-Sketches of a column vector of A and a row vector of B into a Count-Sketch of their outer product.
  • Add up the outer product sketches to get a sketch for AB.
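The second bullet is an exact algebraic identity that is easy to check: convolving the Count-Sketch polynomials of u and v gives the Count-Sketch of the outer product u vᵀ under the combined hash (i, j) ↦ h1(i) + h2(j) and sign s1(i)·s2(j) (a toy check with arbitrary random inputs):

```python
import random

def sketch(vec, h, s, b):
    """Count-Sketch of a vector, viewed as polynomial coefficients."""
    c = [0.0] * b
    for i, v in enumerate(vec):
        c[h[i]] += s[i] * v
    return c

def convolve(p, q):
    """Naive polynomial multiplication."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pv in enumerate(p):
        for j, qv in enumerate(q):
            out[i + j] += pv * qv
    return out

rng = random.Random(42)
n, b = 5, 4
u = [rng.randint(-3, 3) for _ in range(n)]
v = [rng.randint(-3, 3) for _ in range(n)]
h1 = [rng.randrange(b) for _ in range(n)]
h2 = [rng.randrange(b) for _ in range(n)]
s1 = [rng.choice((-1, 1)) for _ in range(n)]
s2 = [rng.choice((-1, 1)) for _ in range(n)]

# Sketch of the outer product u v^T, hashing entry (i, j) to h1[i] + h2[j]
# with sign s1[i] * s2[j]:
direct = [0.0] * (2 * b - 1)
for i in range(n):
    for j in range(n):
        direct[h1[i] + h2[j]] += s1[i] * s2[j] * u[i] * v[j]

# The same sketch, obtained by convolving the two vector sketches:
via_conv = convolve(sketch(u, h1, s1, b), sketch(v, h2, s2, b))
```

The two computations agree exactly, by distributivity, for any choice of vectors, hashes, and signs.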

SLIDE 17

Some related results

  • Folklore: Computing AB with b nonzeros in time O(nb) if there are no cancellations.
  • Cohen and Lewis ’99: For nonnegative matrices, estimate AB with low relative error.
  • Iwen and Spencer ’09: Computing AB with ≤ b/n nonzeros in each column in time Õ(nb).
  • Drineas, Kannan, Mahoney ’06; Sarlós ’06: Computing AB with low total error in terms of ||A||F and ||B||F.

SLIDE 18

Case study: Correlations

Two rows of A are correlated. Which ones?

A = [matrix shown as an image in the original slides]

SLIDE 19

Sample covariance matrix

AAT = [matrix shown as an image in the original slides]

SLIDE 21

Sample covariance matrix

AAT ≈ [matrix shown as an image in the original slides]

estimated using compressed matrix multiplication

SLIDE 22

Sample covariance matrix

f(AAT) = [matrix shown as an image in the original slides]

Showing large values not explained by hash collisions; estimated using compressed matrix multiplication.
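A toy version of this case study (with made-up data; the exact product A Aᵀ stands in for the compressed estimate): plant two correlated rows and find them via the largest off-diagonal entry of A Aᵀ.

```python
import random

rng = random.Random(7)
d, rows = 400, 6

# Six random rows of length d; row 1 is planted as a noisy copy of row 0.
A = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(rows)]
A[1] = [x + 0.3 * rng.gauss(0, 1) for x in A[0]]

def inner(u, v):
    return sum(x * y for x, y in zip(u, v))

# The largest off-diagonal entry of A A^T reveals the correlated pair:
# the planted pair has inner product about d, the rest about sqrt(d).
best = max((abs(inner(A[i], A[j])), (i, j))
           for i in range(rows) for j in range(i + 1, rows))
```

In the compressed setting one would read the same large entries off the sketch of A Aᵀ instead of computing the full product.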

SLIDE 23

Some open problems

  • Can other problems with “sparse solutions” be solved efficiently using compressed sensing techniques?
  • Matrix inversion?
  • Linear systems with a sparse solution?
  • Sparse transitive closure of a graph?
  • Product of > 2 matrices?

SLIDE 25

Discussion: Combinatorial algorithms

  • Compressed MM can be considered “combinatorial”.
  • Another view: No large hidden constants (in contrast to “algebraic” approaches leading to ω < 2.3727).
  • It is interesting to consider what other subclasses of matrix products can be computed in time, say, n^(2+ε), using algorithms with these properties.

SLIDE 27

Hidden slide: Extra application

[Comic shown in the original slides]

http://xkcd.com/651/