Shared Memory Parallelization of MTTKRP for Dense Tensors
Koby Hayashi, Grey Ballard, Yujie Jiang, Michael Tobia hayakb13@,ballard@,jiany14@,tobiamj@wfu.edu
BLIS Retreat 2017, September 18th
Neuroimaging Application
Tensor: Time by Subjects by
Test: Rest → Activity → Recovery
Subjects: Control, MDD, SAD, COMO
[Figure: tensors of increasing order, with labels including 2-way, 3-way, 4-way, 5-way]
Canonical Polyadic Decomposition (CP): decomposes a tensor into a sum of rank-1 tensors
$\mathcal{X} \approx \sum_{c=0}^{C-1} u_c \circ v_c \circ w_c$, $\mathcal{X} \approx [\![ U, V, W ]\!]$
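As a minimal sketch of the sum-of-rank-1-tensors form above, the following plain-Python function rebuilds a 3-way tensor from three factor matrices. The function name `cp_reconstruct` and the small example matrices are illustrative assumptions, not taken from the slides.

```python
def cp_reconstruct(U, V, W):
    """Return X with X[i][j][k] = sum_c U[i][c] * V[j][c] * W[k][c].

    U, V, W are lists of rows and must share the same number of columns C.
    Each column index c contributes one rank-1 tensor u_c o v_c o w_c.
    """
    C = len(U[0])
    return [[[sum(U[i][c] * V[j][c] * W[k][c] for c in range(C))
              for k in range(len(W))]
             for j in range(len(V))]
            for i in range(len(U))]


# Hypothetical factor matrices with C = 2 columns each.
U = [[1, 0], [2, 1]]            # 2 x 2
V = [[1, 1], [0, 2], [3, 0]]    # 3 x 2
W = [[1, 2], [1, 0]]            # 2 x 2
X = cp_reconstruct(U, V, W)     # 2 x 3 x 2 tensor as nested lists
```

Here each entry of `X` is the sum over the two rank-1 components, so an exactly rank-2 tensor is recovered; with real data the factors only approximate the tensor.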
$C = A \ast B$, $C_{ij} = A_{ij} \cdot B_{ij}$
$(U_0^{T} U_0) \ast \cdots \ast (U_{n-1}^{T} U_{n-1}) \ast (U_{n+1}^{T} U_{n+1}) \ast \cdots \ast (U_{N-1}^{T} U_{N-1})$
$K = A \odot B$
$K(:, c) = A(:, c) \otimes B(:, c)$
$K(i_B + i_A I_B, :) = A(i_A, :) \ast B(i_B, :)$
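The row-wise definition of the Khatri-Rao product above can be sketched as follows. This is a plain-Python illustration under the assumption that rows are indexed from zero and that row `i_B + i_A * I_B` of the result is the elementwise product of row `i_A` of the first matrix and row `i_B` of the second; the function name `khatri_rao` and the example matrices are hypothetical.

```python
def khatri_rao(A, B):
    """Row-wise Khatri-Rao product of two matrices with C columns each.

    Row (i_B + i_A * I_B) of the result is the elementwise product of
    row i_A of A and row i_B of B, where I_B is the number of rows of B.
    """
    C = len(A[0])
    assert len(B[0]) == C, "A and B must have the same number of columns"
    I_B = len(B)
    K = [[0] * C for _ in range(len(A) * I_B)]
    for i_A, a_row in enumerate(A):
        for i_B, b_row in enumerate(B):
            K[i_B + i_A * I_B] = [a * b for a, b in zip(a_row, b_row)]
    return K


A = [[1, 2], [3, 4]]            # I_A = 2, C = 2
B = [[1, 0], [0, 1], [2, 2]]    # I_B = 3, C = 2
K = khatri_rao(A, B)
# K == [[1, 0], [0, 2], [2, 4], [3, 0], [0, 4], [6, 8]]
```

Reading `K` column by column gives the equivalent column-wise view: column 0 is `[1, 0, 2, 3, 0, 6]`, the Kronecker product of column 0 of `A` with column 0 of `B`.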
[Figure: $K$ has $I_A \cdot I_B$ rows and $C$ columns]
$M = X_{(n)} (U_{N-1} \odot \cdots \odot U_{n+1} \odot U_{n-1} \odot \cdots \odot U_0)$
$n = 0$: $\mathcal{X}(:, j, k)$; $n = 1$: $\mathcal{X}(i, :, k)$; $n = 2$: $\mathcal{X}(i, j, :)$
[Figure: mode-$n$ matricization $X_{(n)}$ with dimensions $I_n$ and $I_{\neq n}$, and a matricization over a contiguous range of modes]
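$M = X_{(n)} (U_0 \odot \cdots \odot U_{n-1} \odot U_{n+1} \odot \cdots \odot U_{N-1})$
Naïve algorithm
1. Permute $\mathcal{X}$ to $X_{(n)}$
2. Form $K = (U_0 \odot \cdots \odot U_{n-1} \odot U_{n+1} \odot \cdots \odot U_{N-1})$
3. Call DGEMM
1-Step and 2-Step MTTKRP
1. Avoid permuting $\mathcal{X}$
2. Efficiently form the KRP
   - 1-Step
   - 2-Step
3. Utilize BLAS
[Figure: $M$ ($I_n \times C$) $= X_{(n)}$ ($I_n \times I_{\neq n}$) times $K$ ($I_{\neq n} \times C$)]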
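The three steps of the naïve algorithm listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the tensor is assumed to be stored as a flat list in row-major order (last index fastest), the Khatri-Rao product uses the matching row ordering, and the helper `matmul` is a stand-in for the DGEMM call. All function names are hypothetical.

```python
from itertools import product


def khatri_rao(A, B):
    """Row-wise Khatri-Rao product: row i_B + i_A * I_B is A[i_A] * B[i_B]."""
    return [[a * b for a, b in zip(a_row, b_row)] for a_row in A for b_row in B]


def unfold(x, dims, n):
    """Step 1: explicitly permute a flat row-major tensor into X_(n)."""
    others = [d for m, d in enumerate(dims) if m != n]
    rows = []
    for i_n in range(dims[n]):
        row = []
        for rest in product(*(range(d) for d in others)):
            idx = list(rest)
            idx.insert(n, i_n)            # full multi-index with mode n restored
            flat = 0
            for i, d in zip(idx, dims):   # row-major linear index
                flat = flat * d + i
            row.append(x[flat])
        rows.append(row)
    return rows


def matmul(P, Q):
    """Plain-Python stand-in for DGEMM (assumption: no BLAS is called here)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def mttkrp_naive(x, dims, factors, n):
    Xn = unfold(x, dims, n)               # 1. permute the tensor to X_(n)
    K = None
    for m, U in enumerate(factors):       # 2. form the KRP of all factors but U_n
        if m != n:
            K = U if K is None else khatri_rao(K, U)
    return matmul(Xn, K)                  # 3. one matrix multiplication


# Hypothetical 2 x 3 x 2 example with C = 2 columns per factor matrix.
dims = [2, 3, 2]
x = list(range(12))
factors = [
    [[1, 2], [0, 1]],
    [[1, 0], [2, 1], [0, 3]],
    [[1, 1], [2, 0]],
]
M = mttkrp_naive(x, dims, factors, 1)     # 3 x 2 result for mode n = 1
```

The explicit copy made in `unfold` is the cost that the 1-Step and 2-Step variants aim to avoid.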
Consider $K = A \odot B \odot C$
[Figure: $A$, $B$, $C$ and their row counts; a row block of $K$ has the form $(A(0,:) \ast B(0,:)) \odot C$]
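One way to read the three-matrix Khatri-Rao product considered above is that the elementwise product of a row of the first matrix and a row of the second can be computed once and then reused against every row of the third. The sketch below illustrates that reuse in plain Python; the function name `khatri_rao3_reuse`, the matrix names, and the example data are assumptions made for illustration.

```python
def khatri_rao3_reuse(A, B, C):
    """Khatri-Rao product of three matrices, reusing partial row products.

    Row (i_C + i_B * I_C + i_A * I_B * I_C) of the result is
    A[i_A] * B[i_B] * C[i_C] elementwise.
    """
    K = []
    for a_row in A:
        for b_row in B:
            # Partial product computed once per (i_A, i_B) pair ...
            partial = [a * b for a, b in zip(a_row, b_row)]
            # ... and reused for every row of C.
            for c_row in C:
                K.append([p * c for p, c in zip(partial, c_row)])
    return K


A = [[1, 2], [3, 4]]            # 2 rows
B = [[1, 1], [2, 0]]            # 2 rows
C = [[1, 2], [0, 1], [3, 3]]    # 3 rows
K = khatri_rao3_reuse(A, B, C)  # 12 rows, 2 columns
```

Without the reuse, every output row needs two elementwise products; with it, each output row needs one, plus one extra product per pair of rows from the first two matrices.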
Avoid permuting tensor entries
Cast computation as matmul
Key observation: the nth-mode matricization of a tensor can be obtained by chunking the tensor into contiguous submatrices of equal size.
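The key observation above can be sketched as follows, assuming the tensor is stored as a flat list in row-major order (last index fastest). Under that assumption the tensor splits into equal-size contiguous blocks, each an $I_n$-row matrix, and the result accumulates one block-times-KRP-rows product per block with no permutation. This is an illustrative sketch rather than the authors' code; the names `mttkrp_blocked` and `krp_excluding` are hypothetical, and the inner loops stand in for BLAS calls.

```python
def khatri_rao(A, B):
    """Row-wise Khatri-Rao product: row i_B + i_A * I_B is A[i_A] * B[i_B]."""
    return [[a * b for a, b in zip(a_row, b_row)] for a_row in A for b_row in B]


def krp_excluding(factors, n):
    """KRP of all factor matrices except the n-th, in increasing mode order."""
    K = None
    for m, U in enumerate(factors):
        if m != n:
            K = U if K is None else khatri_rao(K, U)
    return K


def mttkrp_blocked(x, dims, factors, n):
    """MTTKRP on a flat row-major tensor without permuting its entries."""
    K = krp_excluding(factors, n)
    C = len(K[0])
    I_n = dims[n]
    I_L = 1
    for d in dims[:n]:          # product of the mode sizes left of n
        I_L *= d
    I_R = 1
    for d in dims[n + 1:]:      # product of the mode sizes right of n
        I_R *= d
    M = [[0] * C for _ in range(I_n)]
    for l in range(I_L):
        base = l * I_n * I_R    # start of the l-th contiguous I_n x I_R block
        for i in range(I_n):
            for r in range(I_R):
                v = x[base + i * I_R + r]
                k_row = K[l * I_R + r]   # KRP row matching block l, column r
                for c in range(C):
                    M[i][c] += v * k_row[c]
    return M


# Hypothetical 2 x 3 x 2 example with C = 2 columns per factor matrix.
dims = [2, 3, 2]
x = list(range(12))
factors = [
    [[1, 2], [0, 1]],
    [[1, 0], [2, 1], [0, 3]],
    [[1, 1], [2, 0]],
]
M = mttkrp_blocked(x, dims, factors, 1)
```

For the first mode there is a single block covering the whole tensor, and for the last mode each block is a single column, so the middle modes are where the block structure matters most.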
[Figure: tensor partitioned into blocks]
60 × 60 × 60 × 60 × 60
Two interesting networks
Tobia M., Hayashi K., Ballard G., Gotlib I. Dynamic Functional Connectivity and Individual Differences in Emotions During Social Stress - to appear in Human Brain Mapping