Algorithms for Big Data (X), Chihao Zhang, Shanghai Jiao Tong University

Slide 1

Algorithms for Big Data (X)

Chihao Zhang

Shanghai Jiao Tong University

Nov. 22, 2019

Algorithms for Big Data (X) 1/10

Slide 2

Matrix Multiplication

Given two matrices A ∈ ℝ^{m×n} and B ∈ ℝ^{n×p}, we compute C = AB. For m = n = p, the naive algorithm costs O(n³) multiplications. Strassen's algorithm reduces the cost to O(n^{2.81}). The best algorithm known so far costs O(n^ω), where ω < 2.3728639. Today we will introduce a Monte-Carlo algorithm to approximate AB.


Slide 3

Review of Linear Algebra

Assume A = (a_1, …, a_n), where the a_i are the columns of A, and B = (b_1, …, b_n)^T, where the b_i^T are the rows of B. Then

AB = ∑_{i=1}^n a_i b_i^T,

where each a_i b_i^T is of rank 1.

The Frobenius norm of a matrix A = (a_{ij})_{1≤i≤m, 1≤j≤n} is

‖A‖_F ≜ √(∑_{i=1}^m ∑_{j=1}^n a_{ij}²).
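Both facts can be checked numerically; the sketch below (arbitrary random matrices, not from the slides) verifies the rank-1 decomposition of AB and the Frobenius norm definition with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))   # A in R^{m x n}: columns a_1, ..., a_n
B = rng.standard_normal((5, 3))   # B in R^{n x p}: rows b_1^T, ..., b_n^T

# AB written as a sum of n rank-1 matrices a_i b_i^T
rank_one_sum = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))
assert np.allclose(rank_one_sum, A @ B)

# Frobenius norm: square root of the sum of squared entries
assert np.isclose(np.sqrt((A ** 2).sum()), np.linalg.norm(A, 'fro'))
```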


Slide 4

The Algorithm

Note that AB = ∑_{i=1}^n a_i b_i^T.

The algorithm randomly picks an index i ∈ [n] independently c times (with replacement). Let J : [c] → [n] denote the sampled indices. Output

∑_{i=1}^c w(J(i)) · a_{J(i)} b_{J(i)}^T,

where w(J(i)) is a weight to be determined.
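The sampling step can be sketched as follows. The function name `approx_matmul` and the uniform distribution are illustrative, and the weight w(j) = (c p_j)^{-1} is the one the derivation below arrives at:

```python
import numpy as np

def approx_matmul(A, B, c, p, rng):
    """Sample c indices from [n] according to p (with replacement) and
    return the weighted sum of the corresponding rank-1 terms."""
    n = A.shape[1]
    J = rng.choice(n, size=c, p=p)                 # J : [c] -> [n]
    out = np.zeros((A.shape[0], B.shape[1]))
    for j in J:
        out += np.outer(A[:, j], B[j, :]) / (c * p[j])
    return out

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 8))
B = rng.standard_normal((8, 6))
p = np.full(8, 1 / 8)                              # uniform, for illustration
C_hat = approx_matmul(A, B, c=2000, p=p, rng=rng)
rel_err = np.linalg.norm(C_hat - A @ B, 'fro') / np.linalg.norm(A @ B, 'fro')
```

With c = 2000 samples the relative Frobenius error is typically a few percent here.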


Slide 5

We fix a distribution on [n]: p_i for i ∈ [n] satisfying ∑_{i∈[n]} p_i = 1.

Then each index j is picked c·p_j times in expectation, so we can set w(j) = (c p_j)^{-1}. It is convenient to formulate the algorithm using matrices. Define a random sampling matrix Π = (π_{ij}) ∈ ℝ^{n×c} such that

π_{ij} = (c p_i)^{-1/2} if i = J(j), and π_{ij} = 0 otherwise.

Then our algorithm outputs A′B′, where A′ = AΠ and B′ = Π^T B.
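The matrix formulation can be checked against the direct weighted sum; a minimal sketch, assuming arbitrary random inputs and the uniform distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 8, 500
A = rng.standard_normal((5, n))
B = rng.standard_normal((n, 4))
p = np.full(n, 1 / n)                          # any fixed distribution on [n]

J = rng.choice(n, size=c, p=p)                 # J : [c] -> [n]
Pi = np.zeros((n, c))                          # sampling matrix, n x c
Pi[J, np.arange(c)] = 1 / np.sqrt(c * p[J])    # pi_{ij} = (c p_i)^{-1/2} iff i = J(j)

A1 = A @ Pi                                    # A' = A Pi
B1 = Pi.T @ B                                  # B' = Pi^T B
# A'B' equals the weighted sum of the sampled rank-1 terms
direct = sum(np.outer(A[:, j], B[j, :]) / (c * p[j]) for j in J)
assert np.allclose(A1 @ B1, direct)
```

Each column of Π carries the (c p_i)^{-1/2} weight once on A′ and once on B′, so the product restores the full (c p_i)^{-1} weight.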


Slide 6

Analysis

We are going to choose (p_i)_{i∈[n]} so that A′B′ ≈ AB. Fix i, j. For any k ∈ [c], let

X_k = ( a_{J(k)} b_{J(k)}^T / (c p_{J(k)}) )_{ij}.

Then

E[X_k] = ∑_{ℓ=1}^n p_ℓ ( a_ℓ b_ℓ^T / (c p_ℓ) )_{ij} = (1/c) (AB)_{ij},

E[X_k²] = ∑_{ℓ=1}^n p_ℓ ( a_ℓ b_ℓ^T / (c p_ℓ) )_{ij}² = ∑_{ℓ=1}^n a_{ℓi}² b_{ℓj}² / (c² p_ℓ),

Var[X_k] = ∑_{ℓ=1}^n a_{ℓi}² b_{ℓj}² / (c² p_ℓ) − (1/c²) (AB)_{ij}².
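These moment computations can be verified numerically. The sketch below (arbitrary matrices and distribution, entry (i, j) chosen arbitrarily) evaluates E[X_k] and E[X_k²] exactly by summing over the n possible values of J(k):

```python
import numpy as np

rng = np.random.default_rng(3)
n, c = 6, 4
A = rng.standard_normal((3, n))
B = rng.standard_normal((n, 2))
p = rng.random(n)
p /= p.sum()                                   # an arbitrary distribution on [n]
i, j = 1, 0                                    # fix an entry (i, j)

def X(l):
    """Value of X_k when the sample J(k) equals l."""
    return (np.outer(A[:, l], B[l, :]) / (c * p[l]))[i, j]

EX = sum(p[l] * X(l) for l in range(n))
EX2 = sum(p[l] * X(l) ** 2 for l in range(n))
assert np.isclose(EX, (A @ B)[i, j] / c)       # E[X_k] = (1/c)(AB)_{ij}
# E[X_k^2] = sum_l a_{li}^2 b_{lj}^2 / (c^2 p_l)
assert np.isclose(EX2, sum(A[i, l] ** 2 * B[l, j] ** 2 / (c ** 2 * p[l])
                           for l in range(n)))
```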


Slide 7

Therefore,

E[(A′B′)_{ij}] = ∑_{k=1}^c E[X_k] = (AB)_{ij}.

We are going to study the concentration of this algorithm. We compute that

E[‖AB − A′B′‖_F²] = ∑_{i=1}^m ∑_{j=1}^p E[(AB − A′B′)_{ij}²] = ∑_{i=1}^m ∑_{j=1}^p Var[(A′B′)_{ij}] = (1/c) ( ∑_{ℓ=1}^n (1/p_ℓ) ‖a_ℓ‖² ‖b_ℓ‖² − ‖AB‖_F² ).

Slide 8

If we choose p_ℓ ∝ ‖a_ℓ‖‖b_ℓ‖, i.e., p_ℓ = ‖a_ℓ‖‖b_ℓ‖ / ∑_{k=1}^n ‖a_k‖‖b_k‖, then

E[‖AB − A′B′‖_F²] = (1/c) ( (∑_{ℓ=1}^n ‖a_ℓ‖‖b_ℓ‖)² − ‖AB‖_F² ) ≤ (1/c) (∑_{ℓ=1}^n ‖a_ℓ‖‖b_ℓ‖)² ≤ (1/c) ‖A‖_F² ‖B‖_F²,

where the last inequality is Cauchy-Schwarz.
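The optimal sampling probabilities and the chain of inequalities can be checked numerically (a sketch with arbitrary random matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((10, 8))
B = rng.standard_normal((8, 6))
c = 100

# p_l proportional to ||a_l|| ||b_l||
scores = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
p = scores / scores.sum()
assert np.isclose(p.sum(), 1.0)

# E||AB - A'B'||_F^2 = (1/c)((sum_l ||a_l|| ||b_l||)^2 - ||AB||_F^2)
expected_err2 = (scores.sum() ** 2 - np.linalg.norm(A @ B, 'fro') ** 2) / c
# Cauchy-Schwarz: (sum_l ||a_l|| ||b_l||)^2 <= ||A||_F^2 ||B||_F^2
bound = np.linalg.norm(A, 'fro') ** 2 * np.linalg.norm(B, 'fro') ** 2 / c
assert expected_err2 <= bound
```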


Slide 9

Therefore, by Chebyshev's inequality,

Pr[‖AB − A′B′‖_F > ε‖A‖_F‖B‖_F] = Pr[‖AB − A′B′‖_F² > ε²‖A‖_F²‖B‖_F²] ≤ 1/(cε²).

We can use a variant of the median trick to boost the algorithm: choosing c = O((1/ε²) log(1/δ)) achieves correctness with probability 1 − δ.

Slide 10

Graph Spectrum
