Dimension Independent Matrix Square using MapReduce - PowerPoint PPT Presentation



SLIDE 1

Dimension Independent Matrix Square - Reza Zadeh
[Sidebar: Introduction (The Problem, Why Bother, MapReduce) · First Pass (Naive, Analysis) · DIMSUM (Algorithm, Shuffle Size, Correctness, Singular values, Similarities) · Experiments (Large, Small) · More Results]

Dimension Independent Matrix Square using MapReduce

Reza Bosagh Zadeh STOC 2013

SLIDE 2

Outline

1. Introduction: The Problem, Why Bother, MapReduce
2. First Pass: Naive, Analysis
3. DIMSUM: Algorithm, Shuffle Size, Correctness, Singular values, Similarities
4. Experiments: Large, Small
5. More Results

SLIDE 3

Computing AᵀA

Given an m × n matrix A with entries in [0, 1] and m ≫ n, compute AᵀA.

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$

A is tall and skinny; example values m = 10^12, n = 10^6. A has sparse rows: each row has at most L nonzeros. A is stored across thousands of machines and cannot be streamed through a single machine.
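The row-wise view of AᵀA that the later MapReduce passes exploit can be checked on a toy instance. This is a minimal numpy sketch; the sizes are illustrative and nothing here is distributed:

```python
import numpy as np

# Toy instance: a tall-and-skinny {0, 1} matrix (tiny; the talk's setting
# is m ~ 10^12 spread over thousands of machines).
rng = np.random.default_rng(0)
m, n = 1000, 5
A = (rng.random((m, n)) < 0.3).astype(float)  # entries in [0, 1]

# A^T A can be accumulated one row at a time as a sum of outer products:
# A^T A = sum_i r_i r_i^T, which is exactly what a row-wise pass exploits.
G = np.zeros((n, n))
for r in A:
    G += np.outer(r, r)

assert np.allclose(G, A.T @ A)
```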

SLIDE 4

Guarantees

Preserve the singular values of AᵀA with ε relative error, paying shuffle size O(n²/ε²) and reduce-key complexity O(n/ε²) - i.e., independent of m.

To preserve specific entries of AᵀA, we can reduce the shuffle size to O(n log(n)/s) and reduce-key complexity to O(log(n)/s), where s is the minimum similarity of the entries being estimated.

Similarity can be via Cosine, Dice, Overlap, or Jaccard.

SLIDE 5

Computing All Pairs of Cosine Similarities

We have to find dot products between all pairs of columns of A. We prove results for general matrices, but can do better for those entries with cos(i, j) ≥ s.

Cosine similarity, a widely used definition of "similarity" between two vectors:

$$\cos(i, j) = \frac{c_i^{\mathsf T} c_j}{\|c_i\|\,\|c_j\|}$$

where c_i is the i-th column of A.
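As a single-machine baseline (a small sketch, not part of the talk's algorithm), all-pairs column cosines follow directly from this formula:

```python
import numpy as np

def all_pairs_cosine(A):
    """Cosine similarity between every pair of columns of A:
    cos(i, j) = c_i^T c_j / (||c_i|| ||c_j||)."""
    norms = np.linalg.norm(A, axis=0)      # ||c_i|| for each column
    return (A.T @ A) / np.outer(norms, norms)

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
S = all_pairs_cosine(A)
assert np.allclose(np.diag(S), 1.0)        # each column fully similar to itself
```

This is exactly the computation that is intractable on one machine at m = 10^12, which motivates the MapReduce formulations that follow.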

SLIDE 6

Ubiquitous problem

SLIDE 7

MapReduce

With such large datasets (e.g. m = 10^12), we must use many machines. The biggest compute clusters use MapReduce, the tool of choice in such distributed systems. With so many machines (around 1000), CPU power is abundant, but communication is expensive.

Two-minute description of MapReduce...

SLIDE 8

MapReduce

SLIDE 9

MapReduce

SLIDE 10

MapReduce

Input gets dished out to the mappers roughly equally. Two performance measures:

1) Shuffle size: shuffling the data output by the mappers to the correct reducer is expensive.
2) Largest reduce-key: we can't send too much of the data to a single reducer.

First pass at implementing cos(i, j) in MapReduce...
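The two measures can be made concrete with a minimal single-process model of a MapReduce job. This is an illustrative sketch, not a real framework; `map_reduce` and the word-count example are my own:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal single-process model of a MapReduce job, instrumented with
    the two cost measures from the slide: total shuffle size and the
    largest number of values sent to any single reduce key."""
    shuffled = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):    # mappers emit (key, value) pairs
            shuffled[key].append(value)   # "shuffle": group values by key
    shuffle_size = sum(len(v) for v in shuffled.values())
    largest_reduce_key = max((len(v) for v in shuffled.values()), default=0)
    output = {key: reducer(key, values) for key, values in shuffled.items()}
    return output, shuffle_size, largest_reduce_key

# Word count, the classic example:
docs = ["a b a", "b c"]
out, sh, big = map_reduce(
    docs,
    mapper=lambda doc: [(w, 1) for w in doc.split()],
    reducer=lambda key, values: sum(values),
)
assert out == {"a": 2, "b": 2, "c": 1}
assert sh == 5 and big == 2
```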

SLIDE 11

Naive Implementation

1. Given row r_i, map with NaiveMapper (Algorithm 1)
2. Reduce using NaiveReducer (Algorithm 2)

Algorithm 1 NaiveMapper(r_i)
  for all pairs (a_ij, a_ik) in r_i do
    Emit ((c_j, c_k) → a_ij a_ik)
  end for

Algorithm 2 NaiveReducer((c_i, c_j), v_1, . . . , v_R)
  Output c_i^T c_j → Σ_{i=1}^{R} v_i
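A sketch of Algorithms 1 and 2 under this emit model, assuming each row arrives as a dense list of entries (names and the tiny example matrix are illustrative):

```python
from collections import defaultdict
from itertools import combinations

def naive_mapper(row):
    """Algorithm 1: for each pair of nonzeros (a_ij, a_ik) in a row,
    emit ((j, k) -> a_ij * a_ik)."""
    nz = [(j, v) for j, v in enumerate(row) if v != 0]
    for (j, aij), (k, aik) in combinations(nz, 2):
        yield (j, k), aij * aik

def naive_reducer(values):
    """Algorithm 2: c_j^T c_k is the sum of all values for key (j, k)."""
    return sum(values)

rows = [[1.0, 1.0, 0.0],
        [1.0, 0.0, 1.0],
        [0.0, 1.0, 1.0]]
shuffled = defaultdict(list)
for r in rows:
    for key, v in naive_mapper(r):
        shuffled[key].append(v)
dots = {key: naive_reducer(vals) for key, vals in shuffled.items()}
assert dots == {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
```

Every within-row pair is shuffled, which is where the O(mL²) cost analyzed next comes from.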

SLIDE 12

Analysis for First Pass

Very easy analysis:

1) Shuffle size: O(mL²)
2) Largest reduce-key: O(m)

Both depend on m, the larger dimension, and are intractable for m = 10^12, L = 100. We'll bring both down via clever sampling.

SLIDE 13

DIMSUM Algorithm

Algorithm 3 DIMSUMMapper(r_i)
  for all pairs (a_ij, a_ik) in r_i do
    With probability min(1, γ / (‖c_j‖ ‖c_k‖)), emit ((c_j, c_k) → a_ij a_ik)
  end for

Algorithm 4 DIMSUMReducer((c_i, c_j), v_1, . . . , v_R)
  if γ / (‖c_i‖ ‖c_j‖) > 1 then
    Output b_ij → (1 / (‖c_i‖ ‖c_j‖)) Σ_{i=1}^{R} v_i
  else
    Output b_ij → (1/γ) Σ_{i=1}^{R} v_i
  end if
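A single-machine sketch of Algorithms 3 and 4. In this run γ is deliberately chosen large enough that no pair is down-sampled, so the output matches the exact cosines; all names and the example matrix are illustrative:

```python
import math
import random
from collections import defaultdict

def dimsum_mapper(row, col_norms, gamma, rng):
    """Algorithm 3 (DIMSUMMapper): keep each within-row product
    a_ij * a_ik with probability min(1, gamma / (||c_j|| ||c_k||))."""
    nz = [(j, v) for j, v in enumerate(row) if v != 0]
    for a in range(len(nz)):
        for b in range(a + 1, len(nz)):
            (j, aij), (k, aik) = nz[a], nz[b]
            if rng.random() < min(1.0, gamma / (col_norms[j] * col_norms[k])):
                yield (j, k), aij * aik

def dimsum_reducer(key, values, col_norms, gamma):
    """Algorithm 4 (DIMSUMReducer): rescale so b_jk estimates cos(j, k)."""
    j, k = key
    denom = col_norms[j] * col_norms[k]
    if gamma / denom > 1:              # this pair was never down-sampled
        return sum(values) / denom
    return sum(values) / gamma         # undo the sampling probability

rows = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
col_norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) for j in range(3)]
rng = random.Random(0)
gamma = 100.0                          # large gamma => no sampling, exact answer
shuffled = defaultdict(list)
for r in rows:
    for key, v in dimsum_mapper(r, col_norms, gamma, rng):
        shuffled[key].append(v)
B = {k: dimsum_reducer(k, vals, col_norms, gamma) for k, vals in shuffled.items()}
assert all(abs(b - 0.5) < 1e-9 for b in B.values())  # every cosine here is 0.5
```

With a small γ the mapper would drop most products for high-norm column pairs, which is exactly what makes the shuffle size independent of m.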

SLIDE 14

Analysis for DIMSUM

Four things to prove:

1. Shuffle size: O(nLγ)
2. Largest reduce-key: O(γ)
3. The sampling scheme preserves similarities when γ = Ω(log(n)/s)
4. The sampling scheme preserves singular values when γ = Ω(n/ε²)

SLIDE 15

Analysis for DIMSUM

Some notation:

1. #(c_i, c_j) is the number of times columns i and j have a nonzero in the same dimension
2. #(c_i) is the number of nonzeros in the vector c_i
3. The theorem will be about {0, 1} matrices, but can be generalized

SLIDE 16

Shuffle size for DIMSUM

Theorem. For {0, 1} matrices, the expected shuffle size for DIMSUMMapper is O(nLγ).

Proof. The expected contribution from each pair of columns constitutes the shuffle size:

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} \sum_{k=1}^{\#(c_i, c_j)} \Pr[\text{DIMSUMSampleEmit}(c_i, c_j)] = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \#(c_i, c_j)\,\Pr[\text{DIMSUMSampleEmit}(c_i, c_j)]$$

SLIDE 17

Shuffle size for DIMSUM

Proof (continued).

$$\le \sum_{i=1}^{n} \sum_{j=i+1}^{n} \gamma\,\frac{\#(c_i, c_j)}{\sqrt{\#(c_i)}\,\sqrt{\#(c_j)}}$$
slide-18
SLIDE 18

Dimension Independent Matrix Square Reza Zadeh Introduction

The Problem Why Bother MapReduce

First Pass

Naive Analysis

DIMSUM

Algorithm Shuffle Size Correctness Singular values Similarities

Experiments

Large Small

More Results

Shuffle size for DIMSUM

Proof (continued).

$$\le \sum_{i=1}^{n} \sum_{j=i+1}^{n} \gamma\,\frac{\#(c_i, c_j)}{\sqrt{\#(c_i)}\,\sqrt{\#(c_j)}} \le \gamma \sum_{i=1}^{n} \sum_{j=i+1}^{n} \#(c_i, c_j)\left(\frac{1}{\#(c_i)} + \frac{1}{\#(c_j)}\right) \quad \text{(by AM-GM)}$$

SLIDE 19

Shuffle size for DIMSUM

Proof (continued).

$$\le \sum_{i=1}^{n} \sum_{j=i+1}^{n} \gamma\,\frac{\#(c_i, c_j)}{\sqrt{\#(c_i)}\,\sqrt{\#(c_j)}} \le \gamma \sum_{i=1}^{n} \sum_{j=i+1}^{n} \#(c_i, c_j)\left(\frac{1}{\#(c_i)} + \frac{1}{\#(c_j)}\right) \quad \text{(by AM-GM)}$$

$$\le \gamma \sum_{i=1}^{n} \frac{1}{\#(c_i)} \sum_{j=1}^{n} \#(c_i, c_j)$$

SLIDE 20

Shuffle size for DIMSUM

Proof (continued).

$$\le \sum_{i=1}^{n} \sum_{j=i+1}^{n} \gamma\,\frac{\#(c_i, c_j)}{\sqrt{\#(c_i)}\,\sqrt{\#(c_j)}} \le \gamma \sum_{i=1}^{n} \sum_{j=i+1}^{n} \#(c_i, c_j)\left(\frac{1}{\#(c_i)} + \frac{1}{\#(c_j)}\right) \quad \text{(by AM-GM)}$$

$$\le \gamma \sum_{i=1}^{n} \frac{1}{\#(c_i)} \sum_{j=1}^{n} \#(c_i, c_j) \le \gamma \sum_{i=1}^{n} \frac{1}{\#(c_i)}\, L\,\#(c_i) = \gamma L n$$

SLIDE 21

Shuffle size for DIMSUM

It is easy to see via Chernoff bounds that the above shuffle size is achieved with high probability. O(nLγ) has no dependence on the dimension m; this is the heart of DIMSUM. It happens because higher-magnitude columns are sampled with lower probability: min(1, γ / (‖c₁‖ ‖c₂‖)).

SLIDE 22

Shuffle size for DIMSUM

For matrices with real entries, we can still get a bound. Let H be the smallest nonzero entry in magnitude, after all entries of A have been scaled to be in [0, 1]. E.g., for {0, 1} matrices, H = 1. The shuffle size is then bounded by O(nLγ/H²).

SLIDE 23

Largest reduce key for DIMSUM

Each reduce key receives at most γ values (γ is the oversampling parameter), so we immediately get that reduce-key complexity is O(γ). This is also independent of the dimension m, again because high-magnitude columns are sampled with lower probability.

SLIDE 24

Correctness

Since higher-magnitude columns are sampled with lower probability, are we still guaranteed to obtain correct results w.h.p.? Yes, but only by setting γ correctly:

Preserve similarities when γ = Ω(log(n)/s)
Preserve singular values when γ = Ω(n/ε²)

SLIDE 25

Correctness

Theorem. Let A be an m × n tall and skinny (m > n) matrix. If γ = Ω(n/ε²) and D is the diagonal matrix with entries d_ii = ‖c_i‖, then the matrix B output by DIMSUM satisfies

$$\frac{\|DBD - A^{\mathsf T}A\|_2}{\|A^{\mathsf T}A\|_2} \le \epsilon$$

with probability at least 1/2. Relative error is guaranteed to be low with high probability.
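The theorem's statement can be sanity-checked with a seeded single-machine simulation of the entrywise sampling. The sizes, γ, and the error tolerance below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Seeded simulation of DIMSUM's entrywise sampling (not distributed).
rng = np.random.default_rng(42)
m, n = 20000, 8
A = (rng.random((m, n)) < 0.2).astype(float)    # tall, skinny, {0, 1}

norms = np.linalg.norm(A, axis=0)
gamma = 400.0                                   # oversampling parameter

B = np.zeros((n, n))
for j in range(n):
    for k in range(j, n):
        p = min(1.0, gamma / (norms[j] * norms[k]))
        contrib = A[:, j] * A[:, k]             # row-by-row products a_ij * a_ik
        kept = contrib * (rng.random(m) < p)    # each kept with probability p
        s = kept.sum()
        # DIMSUMReducer's rescaling: exact pairs vs. sampled pairs
        est = s / (norms[j] * norms[k]) if p == 1.0 else s / gamma
        B[j, k] = B[k, j] = est

D = np.diag(norms)
G = A.T @ A
rel_err = np.linalg.norm(D @ B @ D - G, 2) / np.linalg.norm(G, 2)
assert rel_err < 0.3   # DBD stays close to A^T A in spectral norm
```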

SLIDE 26

Proof

Uses Latala's theorem, bounding the 2nd and 4th central moments of the entries of B. We really need the extra power of the 4th moments.

SLIDE 27

Latala's Theorem

Theorem (Latala). Let X be a random matrix whose entries x_ij are independent centered random variables with finite fourth moment. Denoting by ‖X‖₂ the matrix spectral norm, we have

$$\mathbb{E}\,\|X\|_2 \le C\left[\max_i \Big(\sum_j \mathbb{E}\,x_{ij}^2\Big)^{1/2} + \max_j \Big(\sum_i \mathbb{E}\,x_{ij}^2\Big)^{1/2} + \Big(\sum_{i,j} \mathbb{E}\,x_{ij}^4\Big)^{1/4}\right].$$

SLIDE 28

Proof

Prove two things:

$$\mathbb{E}[(b_{ij} - \mathbb{E}\,b_{ij})^2] \le \frac{1}{\gamma} \quad \text{(easy)}$$

$$\mathbb{E}[(b_{ij} - \mathbb{E}\,b_{ij})^4] \le \frac{2}{\gamma^2} \quad \text{(not easy)}$$

Details in the paper.

SLIDE 29

Correctness

Theorem. For any two columns c_i and c_j having cos(c_i, c_j) ≥ s, let B be the output of DIMSUM with entries b_ij = (1/γ) Σ_{k=1}^{m} X_ijk, with X_ijk the indicator for the k-th coin in the call to DIMSUMMapper. Now if γ = Ω(α/s), then we have

$$\Pr\!\big[\|c_i\|\|c_j\|\,b_{ij} > (1+\delta)[A^{\mathsf T}A]_{ij}\big] \le \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\alpha}$$

and

$$\Pr\!\big[\|c_i\|\|c_j\|\,b_{ij} < (1-\delta)[A^{\mathsf T}A]_{ij}\big] < \exp(-\alpha\delta^2/2).$$

Relative error is guaranteed to be low with high probability.

SLIDE 30

Correctness Proof. In the paper at http://reza-zadeh.com Uses standard concentration inequality for sums of indicator random variables. Ends up requiring that the oversampling parameter γ be set to γ = log(n2)/s = 2 log(n)/s.

SLIDE 31

Experiments

Large-scale experiment: live at twitter.com. Smaller-scale experiment with points as words and dimensions as tweets: m = 200M, n = 1000, L = 10.

SLIDE 32

Experiments

[Figure: DISCO Cosine Similarity - average relative error vs. similarity threshold s]

Figure: Average error for all pairs with similarity threshold s. DIMSUM-estimated Cosine error decreases for more similar pairs.

SLIDE 33

Experiments

[Figure: DISCO Cosine shuffle size vs. accuracy tradeoff - DISCO Shuffle / Naive Shuffle and average relative error, plotted against log(p/ε)]

Figure: As γ = p/ε increases, shuffle size increases and error decreases. There is no thresholding for highly similar pairs here.
SLIDE 34

Other Similarity Measures

This all works for many other similarity measures.

Similarity | Definition                        | Shuffle Size      | Reduce-key size
Cosine     | #(x,y) / (√#(x) √#(y))            | O(nL log(n)/s)    | O(log(n)/s)
Jaccard    | #(x,y) / (#(x) + #(y) − #(x,y))   | O((n/s) log(n/s)) | O(log(n/s)/s)
Overlap    | #(x,y) / min(#(x), #(y))          | O(nL log(n)/s)    | O(log(n)/s)
Dice       | 2#(x,y) / (#(x) + #(y))           | O(nL log(n)/s)    | O(log(n)/s)

Table: All sizes are independent of m, the dimension. These are bounds for shuffle size without combining; combining can only bring these sizes down.
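With columns represented as sets of nonzero row indices (so #(x) = |x| and #(x, y) = |x ∩ y|), the table's definitions read directly as code. A small sketch; the function names are mine:

```python
def cosine(x, y):
    """cos = #(x,y) / (sqrt(#(x)) sqrt(#(y))) for {0,1} columns."""
    return len(x & y) / (len(x) ** 0.5 * len(y) ** 0.5)

def jaccard(x, y):
    """#(x,y) / (#(x) + #(y) - #(x,y))."""
    return len(x & y) / (len(x) + len(y) - len(x & y))

def overlap(x, y):
    """#(x,y) / min(#(x), #(y))."""
    return len(x & y) / min(len(x), len(y))

def dice(x, y):
    """2 #(x,y) / (#(x) + #(y))."""
    return 2 * len(x & y) / (len(x) + len(y))

x, y = {0, 1, 2, 3}, {2, 3, 4}          # |x|=4, |y|=3, |x & y|=2
assert abs(cosine(x, y) - 2 / (2 * 3 ** 0.5)) < 1e-12
assert abs(jaccard(x, y) - 2 / 5) < 1e-12
assert abs(overlap(x, y) - 2 / 3) < 1e-12
assert abs(dice(x, y) - 4 / 7) < 1e-12
```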

SLIDE 35

Locality Sensitive Hashing

MinHash, from the Locality-Sensitive Hashing family, can have its vanilla implementation greatly improved by DIMSUM. Theorems for shuffle size and correctness are in the paper.

SLIDE 36

Fin.

Consider DIMSUM if you ever need to compute AᵀA for a large sparse A. Many more experiments and results at reza-zadeh.com. Thanks!