  1. structure tensors Lek-Heng Lim July 18, 2017

  2. acknowledgments
     • kind colleagues who nominated/supported me: ⋄ Shmuel Friedland ⋄ Sayan Mukherjee ⋄ Pierre Comon ⋄ Jiawang Nie ⋄ Ming Gu ⋄ Bernd Sturmfels ⋄ Jean Bernard Lasserre ⋄ Charles Van Loan
     • postdocs: Ke Ye, Yang Qi, Jose Rodriguez, Anne Shiu
     • students: Liwen Zhang, Ken Wong, Greg Naitzat, Kate Turner, many MS students as well
     • brilliant collaborators: too many to list
     • funding agencies: AFOSR, DARPA (special thanks to Fariba Fahroo), NSF
     • thank you all from the bottom of my heart

  3. motivation

  4. goal: find fastest algorithms
     • fast algorithms are rarely obvious algorithms
     • want fast algorithms for bilinear operations β : U × V → W, e.g.
       (A, x) ↦ Ax,   (A, B) ↦ AB,   (A, B) ↦ AB − BA
     • embed into an appropriate algebra A: factor β through A,
       U ⊗ V →(ι) A ⊗ A →(µ_A) A →(π) W,   i.e. β = π ∘ µ_A ∘ ι
     • systematic way to discover new algorithms via structure tensors µ_β and µ_A
     • fastest algorithms: rank of structure tensor
     • stablest algorithms: nuclear norm of structure tensor

  5. ubiquitous problems
     • linear equations, least squares, eigenvalue problems, etc.:
       Ax = b,   min ∥Ax − b∥,   Ax = λx,   x = exp(A)b
     • backbone of numerical computations
     • almost always: A ∈ C^{n×n} has structure
     • very often: A ∈ C^{n×n} is prohibitively high-dimensional
     • impossible to solve without exploiting structure

  6. structured matrices
     • sparse: "any matrix with enough zeros that it pays to take advantage of them" [Wilkinson, 1971]
     • classical: circulant, Toeplitz, Hankel

           T = [ t_0      t_{-1}   ...   t_{1-n}         H = [ h_0      h_1   ...  h_{n-1}
                 t_1      t_0      ...   ...                   h_1      h_2   ...  h_n
                 ...      ...      ...   t_{-1}                ...      ...   ...  ...
                 t_{n-1}  ...      t_1   t_0     ],            h_{n-1}  h_n   ...  h_{2n-2} ]

       i.e. T has constant diagonals, T_{ij} = t_{i-j}, and H has constant antidiagonals, H_{ij} = h_{i+j-2}
     • many more: banded, triangular, Toeplitz-plus-Hankel, f-circulant, symmetric, skew-symmetric, triangular Toeplitz, symmetric Toeplitz, etc.
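
A minimal NumPy sketch (not from the slides; `toeplitz_matvec` is a hypothetical helper name) of the classic way Toeplitz structure pays off: embed the n × n Toeplitz matrix in a 2n × 2n circulant, which the FFT diagonalizes, giving a matrix-vector product in O(n log n) instead of O(n²):

```python
import numpy as np

def toeplitz_matvec(c, r, x):
    """Multiply an n x n Toeplitz matrix T (first column c, first row r,
    with c[0] == r[0]) by x in O(n log n): embed T in a 2n x 2n circulant,
    which the FFT diagonalizes."""
    n = len(x)
    col = np.concatenate([c, [0], r[:0:-1]])   # first column of the circulant
    eig = np.fft.fft(col)                      # circulant eigenvalues
    y = np.fft.ifft(eig * np.fft.fft(np.concatenate([x, np.zeros(n)])))
    return y[:n].real

# check against the dense product
rng = np.random.default_rng(0)
n = 6
c = rng.standard_normal(n)                               # t_0, t_1, ..., t_{n-1}
r = np.concatenate([c[:1], rng.standard_normal(n - 1)])  # t_0, t_{-1}, ...
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
x = rng.standard_normal(n)
assert np.allclose(T @ x, toeplitz_matvec(c, r, x))
```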

  7. multilevel
     • 2-level: block-Toeplitz-Toeplitz-blocks (bttb):

           T = [ T_0      T_{-1}   ...   T_{1-n}
                 T_1      T_0      ...   ...
                 ...      ...      ...   T_{-1}
                 T_{n-1}  ...      T_1   T_0     ] ∈ C^{mn×mn}

       where the T_i ∈ C^{m×m} are Toeplitz matrices
     • 3-level: block-Toeplitz with bttb blocks
     • 4-level: block-bttb with bttb blocks
     • and so on
     • also multilevel versions of:
       • block-circulant-circulant-blocks (bccb)
       • block-Hankel-Hankel-blocks (bhhb)
       • block-Toeplitz-plus-Hankel-Toeplitz-plus-Hankel-blocks (bththb)
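
A minimal sketch of how a bttb matrix is assembled from a 2-D symbol; the indexing convention (row off + i − j of the symbol is the symbol of block (i, j)) and the helper names `toeplitz_from_symbol` and `bttb_from_symbol` are ours, not the slides':

```python
import numpy as np

def toeplitz_from_symbol(t):
    """n x n Toeplitz matrix with T[i, j] = t_{i-j}, where the symbol t
    has length 2n - 1 and t[(n - 1) + k] stores t_k for k = -(n-1), ..., n-1."""
    n = (len(t) + 1) // 2
    off = n - 1
    return np.array([[t[off + i - j] for j in range(n)] for i in range(n)])

def bttb_from_symbol(t2):
    """(mn) x (mn) bttb matrix from a 2-D symbol of shape (2n-1, 2m-1):
    block (i, j) is the m x m Toeplitz matrix with symbol t2[(n-1) + i - j]."""
    n = (t2.shape[0] + 1) // 2
    off = n - 1
    return np.block([[toeplitz_from_symbol(t2[off + i - j]) for j in range(n)]
                     for i in range(n)])

t2 = np.arange(5 * 7, dtype=float).reshape(5, 7)  # n = 3, m = 4
T = bttb_from_symbol(t2)
print(T.shape)   # (12, 12): 3 x 3 blocks, each a 4 x 4 Toeplitz matrix
```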

  8. Krylov subspace methods
     • easiest way to exploit structure in A
     • basic idea: by Cayley–Hamilton, α_0 I + α_1 A + ··· + α_d A^d = 0 for some d ≤ n, so
       A^{−1} = −(α_1/α_0) I − (α_2/α_0) A − ··· − (α_d/α_0) A^{d−1}
       and so x = A^{−1}b ∈ span{b, Ab, ..., A^{d−1}b}
     • one advantage: d can be much smaller than n, e.g. d = number of distinct eigenvalues of A if A is diagonalizable
     • another advantage: reduces to forming the matrix-vector product (A, x) ↦ Ax efficiently
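
A toy illustration of the idea (not a production method, no orthogonalization; `krylov_solve` is a hypothetical name): build the Krylov basis using only matrix-vector products and pick the best combination by least squares. With only 3 distinct eigenvalues, d = 3 suffices even though n = 50:

```python
import numpy as np

def krylov_solve(matvec, b, d):
    """Toy Krylov solver: look for x in span{b, Ab, ..., A^{d-1} b} using
    nothing but matrix-vector products, choosing coefficients c to
    minimize ||A K c - b||."""
    K = [b]
    for _ in range(d - 1):
        K.append(matvec(K[-1]))
    K = np.column_stack(K)
    AK = np.column_stack([matvec(k) for k in K.T])
    c, *_ = np.linalg.lstsq(AK, b, rcond=None)
    return K @ c

rng = np.random.default_rng(1)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
# symmetric A with only 3 distinct eigenvalues => Krylov dimension 3 suffices
A = Q @ np.diag(rng.choice([1.0, 2.0, 5.0], size=n)) @ Q.T
b = rng.standard_normal(n)
x = krylov_solve(lambda v: A @ v, b, d=3)
print(np.linalg.norm(A @ x - b))   # ~ 1e-12
```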

  9. fastest algorithms
     • bilinear complexity: counts only multiplications of variables; ignores additions, subtractions, scalar multiplications
     • Gauss's method:
       (a + bi)(c + di) = (ac − bd) + i(bc + ad)
                        = (ac − bd) + i[(a + b)(c + d) − ac − bd]
     • usual: 4 ×'s and 2 ±'s; Gauss: 3 ×'s and 5 ±'s
     • Strassen's algorithm (b_1, ..., b_4 listed down the columns of B):

       [ a_1  a_2 ] [ b_1  b_3 ]   [ a_1 b_1 + a_2 b_2                    β + γ + (a_1 + a_2 − a_3 − a_4) b_4 ]
       [ a_3  a_4 ] [ b_2  b_4 ] = [ α + γ + a_4 (b_2 + b_3 − b_1 − b_4)  α + β + γ                           ]

       where α = (a_3 − a_1)(b_3 − b_4),   β = (a_3 + a_4)(b_3 − b_1),
             γ = a_1 b_1 + (a_3 + a_4 − a_1)(b_1 + b_4 − b_3)
     • usual: 8 ×'s and 4 ±'s; Strassen: 7 ×'s and 15 ±'s
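
Both tricks are easy to check numerically. A minimal sketch (function names are ours) implementing Gauss's 3-multiplication complex product and the 7-multiplication 2 × 2 scheme exactly as displayed above:

```python
import numpy as np

def gauss_complex_mult(a, b, c, d):
    """(a + bi)(c + di) with 3 multiplications instead of 4."""
    ac, bd = a * c, b * d
    return ac - bd, (a + b) * (c + d) - ac - bd    # real and imaginary parts

def strassen_2x2(A, B):
    """The 7-multiplication 2 x 2 scheme from the slide; b_1, ..., b_4 are
    read down the columns of B, matching the slide's convention."""
    a1, a2, a3, a4 = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    b1, b2, b3, b4 = B[0, 0], B[1, 0], B[0, 1], B[1, 1]
    p = a1 * b1                                    # reused, so 7 x's in total
    alpha = (a3 - a1) * (b3 - b4)
    beta = (a3 + a4) * (b3 - b1)
    gamma = p + (a3 + a4 - a1) * (b1 + b4 - b3)
    return np.array([
        [p + a2 * b2, beta + gamma + (a1 + a2 - a3 - a4) * b4],
        [alpha + gamma + a4 * (b2 + b3 - b1 - b4), alpha + beta + gamma],
    ])

assert gauss_complex_mult(1.0, 2.0, 3.0, 4.0) == (-5.0, 10.0)   # (1+2i)(3+4i)
rng = np.random.default_rng(2)
A, B = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))
assert np.allclose(strassen_2x2(A, B), A @ B)
```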

  10. why minimize multiplications?
      • nowadays: latency of FMUL ≈ latency of FADD
      • but a multiplier requires many more gates than an adder (e.g. 18-bit: 2200 vs 125) → more wires/transistors → more energy
      • may want other measures of computational cost: e.g. energy consumption, number of gates, code space
      • may not use a general-purpose CPU: e.g. ASIC, DSP, FPGA, GPU, motion coprocessor, smart chip
      • block operations: for A, B, C, D ∈ R^{n×n},
        (A + iB)(C + iD) = (AC − BD) + i[(A + B)(C + D) − AC − BD]
        and matrix multiplication is vastly more expensive than matrix addition
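
A sketch of that block operation, sometimes called the 3M method: Gauss's trick lifted to blocks saves one of four real matrix multiplications, and at the block level the saving is genuine because the extra O(n²) additions are cheap next to the O(n³) multiplications:

```python
import numpy as np

def complex_matmul_3m(A, B, C, D):
    """(A + iB)(C + iD) with 3 real matrix multiplications instead of 4."""
    AC, BD = A @ C, B @ D
    return AC - BD, (A + B) @ (C + D) - AC - BD    # real and imaginary parts

rng = np.random.default_rng(3)
A, B, C, D = (rng.standard_normal((4, 4)) for _ in range(4))
re, im = complex_matmul_3m(A, B, C, D)
assert np.allclose(re + 1j * im, (A + 1j * B) @ (C + 1j * D))
```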

  11. structure tensors

  12. structure tensor
      • bilinear operator β : U × V → W:
        β(a_1 u_1 + a_2 u_2, v) = a_1 β(u_1, v) + a_2 β(u_2, v),
        β(u, a_1 v_1 + a_2 v_2) = a_1 β(u, v_1) + a_2 β(u, v_2)
      • there exists a unique 3-tensor µ_β ∈ U* ⊗ V* ⊗ W such that for any (u, v) ∈ U × V,
        β(u, v) = µ_β(u, v, ·) ∈ W
      • examples of β : U × V → W:
        (A, x) ↦ Ax,   (A, B) ↦ AB,   (A, B) ↦ AB − BA
      • call µ_β the structure tensor of the bilinear map β

  13. structure constants
      • if we give µ_β coordinates, i.e., choose bases on U, V, W, we get a hypermatrix (µ_ijk) ∈ C^{m×n×p}, where m = dim U, n = dim V, p = dim W, and
        β(u_i, v_j) = ∑_{k=1}^p µ_ijk w_k,   i = 1, ..., m,  j = 1, ..., n
      • a d-dimensional hypermatrix is a d-tensor in coordinates
      • call µ_ijk the structure constants of β
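
Structure constants are easy to compute numerically: evaluate β on every pair of basis vectors. A minimal sketch (the name `structure_constants` is ours), using the cross product on R³ as the bilinear map:

```python
import numpy as np

def structure_constants(beta, m, n, p):
    """Hypermatrix (mu_ijk) of a bilinear map beta : R^m x R^n -> R^p in the
    standard bases: mu[i, j, :] is beta evaluated on the (i, j) basis pair,
    since beta(e_i, f_j) = sum_k mu_ijk g_k."""
    mu = np.zeros((m, n, p))
    for i in range(m):
        for j in range(n):
            u, v = np.zeros(m), np.zeros(n)
            u[i], v[j] = 1.0, 1.0
            mu[i, j] = beta(u, v)
    return mu

# example: the cross product on R^3
mu = structure_constants(np.cross, 3, 3, 3)
# recovering beta from the hypermatrix: contract mu with u and v
u, v = np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 1.0])
assert np.allclose(np.einsum('ijk,i,j->k', mu, u, v), np.cross(u, v))
```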

  14. example: physics
      • g a Lie algebra with basis {e_i : i = 1, ..., n},
        [e_i, e_j] = ∑_{k=1}^n c_ijk e_k
      • (c_ijk) ∈ C^{n×n×n} structure constants measure self-interaction
      • structure tensor of g is
        µ_g = ∑_{i,j,k=1}^n c_ijk e*_i ⊗ e*_j ⊗ e_k ∈ g* ⊗ g* ⊗ g
      • take g = so_3, the real 3 × 3 skew-symmetric matrices, with basis

        e_1 = [ 0  0  0        e_2 = [  0  0  1        e_3 = [ 0 -1  0
                0  0 -1                 0  0  0                1  0  0
                0  1  0 ],             -1  0  0 ],             0  0  0 ]

      • structure tensor of so_3 is
        µ_so_3 = ∑_{i,j,k=1}^3 ε_ijk e*_i ⊗ e*_j ⊗ e_k,
        where ε_ijk = (i − j)(j − k)(k − i)/2 is the Levi-Civita symbol
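
A quick numerical check of the claim (a sketch, assuming the basis above): verify [e_i, e_j] = ∑_k ε_ijk e_k for all pairs, with ε_ijk computed from the slide's formula:

```python
import numpy as np

# basis of so(3) as above
E = np.zeros((3, 3, 3))
E[0] = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
E[1] = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
E[2] = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

def eps(i, j, k):
    """Levi-Civita symbol via the slide's formula (1-indexed)."""
    return (i - j) * (j - k) * (k - i) // 2

# check [e_i, e_j] = sum_k eps_ijk e_k for all i, j
for i in range(3):
    for j in range(3):
        bracket = E[i] @ E[j] - E[j] @ E[i]
        expansion = sum(eps(i + 1, j + 1, k + 1) * E[k] for k in range(3))
        assert np.allclose(bracket, expansion)
```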

  15. example: numerical computations
      • for A = (a_ij) ∈ C^{m×n}, B = (b_jk) ∈ C^{n×p},
        AB = ∑_{i,j,k=1}^{m,n,p} a_ik b_kj E_ij = ∑_{i,j,k=1}^{m,n,p} E*_ik(A) E*_kj(B) E_ij
        where E_ij = e_i e_j^T and E*_ij : C^{m×n} → C, A ↦ a_ij
      • let µ_{m,n,p} = ∑_{i,j,k=1}^{m,n,p} E*_ik ⊗ E*_kj ⊗ E_ij; write µ_n = µ_{n,n,n}
      • structure tensor of the matrix-matrix product [Strassen, 1973]:
        µ_{m,n,p} ∈ (C^{m×n})* ⊗ (C^{n×p})* ⊗ C^{m×p} ≅ C^{mn×np×pm}
      • later: rank gives minimal number of multiplications required to multiply two matrices
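
A minimal sketch (the helper name `matmul_tensor` is ours) that builds µ_{m,n,p} as an mn × np × mp hypermatrix in the standard vectorized bases and checks that contracting it with vec(A) and vec(B) recovers vec(AB):

```python
import numpy as np

def matmul_tensor(m, n, p):
    """mu_{m,n,p} as an (mn) x (np) x (mp) hypermatrix in the standard
    vectorized bases: the (vec E_ik, vec E_kj, vec E_ij) entry is 1."""
    mu = np.zeros((m * n, n * p, m * p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                mu[i * n + k, k * p + j, i * p + j] = 1.0
    return mu

m, n, p = 2, 3, 4
mu = matmul_tensor(m, n, p)
rng = np.random.default_rng(4)
A, B = rng.standard_normal((m, n)), rng.standard_normal((n, p))
# contracting mu with vec(A) and vec(B) recovers vec(AB)
vecAB = np.einsum('abc,a,b->c', mu, A.ravel(), B.ravel())
assert np.allclose(vecAB.reshape(m, p), A @ B)
```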

  16. example: computer science
      • Grothendieck inequality: for A ∈ R^{m×n}, there exists K_G > 0 such that
        max_{x_1,...,x_m, y_1,...,y_n ∈ S^{m+n−1}} ∑_{i=1}^m ∑_{j=1}^n a_ij ⟨x_i, y_j⟩
          ≤ K_G max_{ε_1,...,ε_m, δ_1,...,δ_n ∈ {−1,+1}} ∑_{i=1}^m ∑_{j=1}^n a_ij ε_i δ_j
      • remarkable: K_G independent of m and n [Grothendieck, 1953]
      • important: unique games conjecture and SDP relaxations of NP-hard problems
      • best known bounds: 1.676 ≤ K_G ≤ 1.782
      • Grothendieck's constant is the injective norm of the structure tensor of the matrix-matrix product [LHL, 2016]:
        ∥µ_{m,n,m+n}∥_{1,2,∞} := max_{A,X,Y ≠ 0} µ_{m,n,m+n}(A, X, Y) / (∥A∥_{∞,1} ∥X∥_{1,2} ∥Y∥_{2,∞})

  17. example: algebraic geometry
      • quantum potential of quantum cohomology [Kontsevich–Manin, 1994]:
        Φ(x, y, z) = (1/2)(xy² + x²z) + ∑_{d=1}^∞ N(d) z^{3d−1}/(3d − 1)! e^{dy}
        where N(d) is the number of rational curves of degree d on the plane passing through 3d − 1 points in general position
      • if Φ(x, y, z) = (1/2)(xy² + x²z) + φ(y, z), then φ satisfies
        φ_zzz = φ_yyz² − φ_yyy φ_yzz
      • can be transformed into Painlevé VI
      • equivalent to the third-order derivatives of Φ being the structure tensor of an associative algebra

  18. bilinear complexity = tensor rank
      • for A ∈ C^{m×n×p}, u ⊗ v ⊗ w := (u_i v_j w_k) ∈ C^{m×n×p},
        rank(A) = min{ r : A = ∑_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i }
      • number of multiplications given by rank(µ_n)
      • asymptotic growth:
        • usual: O(n³)
        • earliest: O(n^{log₂ 7}) [Strassen, 1969]
        • longest-standing: O(n^{2.375477}) [Coppersmith–Winograd, 1990]
        • recent: O(n^{2.3728642}) [Williams, 2011]
        • latest: O(n^{2.3728639}) [Le Gall, 2014]
      • exact: O(n^ω) where ω := inf{ α : rank(µ_n) = O(n^α) }
      • see [Bürgisser–Clausen–Shokrollahi, 1997]
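
rank(µ_2) ≤ 7 can be verified directly: Strassen's algorithm is precisely a list of 7 rank-one tensors summing to µ_2. The sketch below encodes one standard presentation of the classical Strassen factors (this particular choice is ours, not necessarily the slides'):

```python
import numpy as np

# mu_2: the 4 x 4 x 4 matrix multiplication tensor (row-major vec indexing)
mu2 = np.zeros((4, 4, 4))
for i in range(2):
    for j in range(2):
        for k in range(2):
            mu2[i * 2 + k, k * 2 + j, i * 2 + j] = 1.0

# Strassen's algorithm as 7 rank-one terms u ⊗ v ⊗ w: each row of U gives the
# linear combination of entries of A in one multiplication, the matching row
# of V that of B, and the row of W says where the product is accumulated in C
U = np.array([[1, 0, 0, 1], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1],
              [1, 1, 0, 0], [-1, 0, 1, 0], [0, 1, 0, -1]], dtype=float)
V = np.array([[1, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, -1], [-1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
W = np.array([[1, 0, 0, 1], [0, 0, 1, -1], [0, 1, 0, 1], [1, 0, 1, 0],
              [-1, 1, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]], dtype=float)

decomp = sum(np.einsum('a,b,c->abc', U[i], V[i], W[i]) for i in range(7))
assert np.allclose(decomp, mu2)   # exhibits rank(mu_2) <= 7
```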

  19. rank, decomposition, nuclear norm
      • tensor rank
        rank(µ_β) = min{ r : µ_β = ∑_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i }
        gives the least number of multiplications needed to compute β
      • tensor decomposition
        µ_β = ∑_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i
        gives an explicit algorithm for computing β
      • tensor nuclear norm [Friedland–LHL, 2016]
        ∥µ_β∥_* = inf{ ∑_{i=1}^r |λ_i| : µ_β = ∑_{i=1}^r λ_i u_i ⊗ v_i ⊗ w_i, r ∈ N }
        quantifies the optimal numerical stability of computing β
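
The "decomposition = algorithm" point made concrete: given factors (λ_i, u_i, v_i, w_i), evaluating β costs exactly r multiplications of (linear forms in) the variables. A minimal sketch (the function name is ours), using Gauss's rank-3 decomposition of the complex multiplication tensor:

```python
import numpy as np

def bilinear_from_decomposition(U, V, W, lam):
    """Turn mu_beta = sum_i lam_i u_i ⊗ v_i ⊗ w_i (rows of U, V, W) into an
    algorithm for beta: each term costs one multiplication of linear forms in
    the inputs, so evaluating beta takes exactly r such multiplications."""
    def beta(x, y):
        prods = (U @ x) * (V @ y)    # the r variable-variable multiplications
        return W.T @ (lam * prods)   # everything else is linear, hence cheap
    return beta

# Gauss's rank-3 decomposition of complex multiplication,
# beta((a, b), (c, d)) = (ac - bd, ad + bc)
U = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
V = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
W = np.array([[1, -1], [-1, -1], [0, 1]], dtype=float)
beta = bilinear_from_decomposition(U, V, W, np.ones(3))

z, w = np.array([2.0, 3.0]), np.array([-1.0, 4.0])
assert np.allclose(beta(z, w), [2*(-1) - 3*4, 2*4 + 3*(-1)])   # 3 x's, not 4
```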
