 
              Algorithms for Big Data (X) Chihao Zhang Shanghai Jiao Tong University Nov. 22, 2019 Algorithms for Big Data (X) 1/10
Algorithms for Big Data (X) Matrix Multiplication 2/10 Given two matrices A ∈ R m × n and B ∈ R n × p , we computes C = AB . For m = n = p , the naive algorithm costs O ( n 3 ) multiplication operations. The Strassen’s algorithm reduces the cost to O ( n 2.81 ) . The best algorithm so far costs O ( n ω ) where ω < 2.3728639 . Today we will introduce a Monte-Carlo algorithm to approximate AB .
Review of Linear Algebra . Algorithms for Big Data (X) . . 3/10  b T  1 � � Assume A = and B = a 1 , . . . , a n     . b T n Then AB = ∑ n i = 1 a i b T i , where each a i b T i is of rank 1 . The Frobenius norm of a matrix A = ( a ij ) 1 ≤ i ≤ m,1 ≤ j ≤ n is � m n � ∑ ∑ ∥ A ∥ F ≜ � a 2 ij . � i = 1 j = 1
The Algorithm Algorithms for Big Data (X) 4/10 Note that AB = ∑ n i = 1 a i b T i . The algorithm randomly pick indices i ∈ [ n ] independently c times (with replacement). Let J : [ c ] → [ n ] denote the indices. Output ∑ c i = 1 w ( J ( i )) · a J ( i ) b T J ( i ) , where w ( J ( i )) is some weight to be determined.
5/10 otherwise Algorithms for Big Data (X) . We fix a distribution on [ n ] ( p i for i ∈ [ n ] satisfying ∑ i ∈ [ n ] p i = 1 ). Therefore, the index j is picked c · p j times in expectation, so we can set w ( j ) = ( cp j ) − 1 . It is convenient to formulate the algorithm using matrices. Define a random sampling matrix Π = ( π ij ) ∈ R c × c such that { ( cp i ) − 1 if i = J ( j ) 2 π ij = 0 Then our algorithm outputs A ′ B ′ where A ′ = AΠ and B ′ = Π T B.
Analysis . Algorithms for Big Data (X) E 6/10 We are going to choose some ( p i ) i ∈ [ n ] so that A ′ B ′ ≈ AB . � � a J ( k ) b T J ( k ) Fix i, j for any k ∈ [ c ] , we let X k = cp J ( k ) ij n � a ℓ b T ∑ � = 1 ℓ E [ X k ] = c ( AB ) ij p ℓ cp ℓ ij ℓ = 1 � 2 n n a 2 ℓi b 2 � a ℓ b T ∑ ∑ � � ℓj X 2 ℓ = p ℓ = k c 2 p ℓ cp ℓ ij ℓ = 1 ℓ = 1 n a 2 ℓi b 2 ∑ − 1 ℓj c 2 ( AB ) 2 Var [ X k ] = ij . c 2 p ℓ ℓ = 1
Therefore, We are going to study the concentration of this algorithm. Algorithms for Big Data (X) Var E E E We compute that 7/10 c ∑ ( A ′ B ′ ) ij � � = E [ X k ] = ( AB ) ij . k = 1 n p ∑ ∑ � � � � ∥ AB − A ′ B ′ ∥ 2 ( AB − A ′ B ′ ) 2 = F ij i = 1 j = 1 n p ∑ ∑ � ( A ′ B ′ ) ij � = i = 1 j = 1 � n � = 1 ∑ 1 ∥ a ℓ ∥ 2 ∥ b ℓ ∥ 2 − ∥ AB ∥ 2 F c p ℓ ℓ = 1
8/10 E Algorithms for Big Data (X) If we choose p ℓ ∼ ∥ a ℓ ∥∥ b ℓ ∥ , then � n   � 2 ∑ = 1 � � ∥ AB − A ′ B ′ ∥ 2 − ∥ AB ∥ 2 ∥ a ℓ ∥∥ b ℓ ∥ F  F  c ℓ = 1 � n � 2 ∑ ≤ 1 ∥ a ℓ ∥∥ b ℓ ∥ c ℓ = 1 ≤ 1 c ∥ A ∥ 2 F ∥ B ∥ 2 F .
Therefore, by Chebyshev’s inequality, Pr Algorithms for Big Data (X) 9/10 1 � � � ∥ AB − A ′ B ′ ∥ F > ε ∥ A ∥ F ∥ B ∥ F � ∥ AB − A ′ B ′ ∥ 2 F > ε 2 ∥ A ∥ 2 F ∥ B ∥ 2 = Pr ≤ cε 2 . F We can use a variant of median trick to boost the algorithm. � 1 We can choose c = O ( 1 � ) to achieve 1 − δ probability of correctness. ε 2 log δ
Graph Spectrum Algorithms for Big Data (X) 10/10
Recommend
More recommend