  1. Low Rank Approximation Lecture 6
  Daniel Kressner
  Chair for Numerical Algorithms and HPC
  Institute of Mathematics, EPFL
  daniel.kressner@epfl.ch

  2. The Kronecker product

  3. Vectorization
  The vectorization of an m × n matrix A is denoted by vec(A), where
  vec : R^{m×n} → R^{m·n}
  stacks the columns of a matrix into a long column vector.
  Example:
  A = [a11 a12; a21 a22; a31 a32]  ⇒  vec(A) = (a11, a21, a31, a12, a22, a32)^T
  Remarks:
  ◮ In MATLAB: A(:)
  ◮ This way of vectorizing corresponds to how matrices are laid out in memory in MATLAB. In other programming languages (e.g., C arrays), matrices are laid out rowwise.
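
  A minimal MATLAB sketch (the matrix values are arbitrary):

    % vec(A) via the colon operator: columns are stacked on top of each other.
    A = [1 4; 2 5; 3 6];   % 3 x 2 matrix
    v = A(:)               % returns [1; 2; 3; 4; 5; 6]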

  4. Kronecker product
  For an m × n matrix A and a k × ℓ matrix B, the Kronecker product is defined as
  B ⊗ A := [ b11·A  ···  b1ℓ·A
               ⋮            ⋮
             bk1·A  ···  bkℓ·A ]  ∈ R^{km×ℓn}.
  Most important properties (for our purposes):
  1. vec(AX) = (I ⊗ A) vec(X).
  2. vec(XA^T) = (A ⊗ I) vec(X).
  3. (B ⊗ A)(D ⊗ C) = (BD ⊗ AC) for A ∈ R^{m×n}, B ∈ R^{k×ℓ}, C ∈ R^{n×q}, D ∈ R^{ℓ×p}.
  4. I_m ⊗ I_n = I_{mn}.
  5. (A1 + A2) ⊗ B = A1 ⊗ B + A2 ⊗ B,  A ⊗ (B1 + B2) = A ⊗ B1 + A ⊗ B2.
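
  Property 1 is easy to verify numerically in MATLAB (a minimal sketch; the sizes are arbitrary):

    % Check vec(A*X) = (I_p kron A) vec(X).
    m = 3; n = 4; p = 5;
    A = randn(m, n); X = randn(n, p);
    lhs = reshape(A*X, [], 1);      % vec(A*X)
    rhs = kron(eye(p), A) * X(:);   % (I_p kron A) vec(X)
    norm(lhs - rhs)                 % ~ machine precision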

  5. First steps with tensors

  6. Vectors, matrices, and tensors
  [Figure: a vector, a matrix, and a third-order tensor]
  ◮ scalar = tensor of order 0
  ◮ (column) vector = tensor of order 1
  ◮ matrix = tensor of order 2
  ◮ tensor of order 3 = n1·n2·n3 numbers arranged in an n1 × n2 × n3 array

  7. Tensors of arbitrary order
  A d-th order tensor X of size n1 × n2 × ··· × nd is a d-dimensional array with entries
  X_{i1,i2,...,id},  iµ ∈ {1,...,nµ} for µ = 1,...,d.
  In the following, the entries of X are usually real (for simplicity) ⇒ X ∈ R^{n1×n2×···×nd}.
  Multi-index notation: I = {1,...,n1} × {1,...,n2} × ··· × {1,...,nd}. Then i ∈ I is a tuple of d indices: i = (i1, i2, ..., id). This allows us to write the entries of X as X_i for i ∈ I.

  8. Two important points
  1. A matrix A ∈ R^{m×n} has a natural interpretation as a linear operator in terms of matrix-vector multiplications:
     A : R^n → R^m,  A : x ↦ A·x.
     There is no such (unique and natural) interpretation for tensors! ⇒ It is fundamentally difficult to define a meaningful general notion of eigenvalues and singular values for tensors.
  2. The number of entries in a tensor grows exponentially with d ⇒ curse of dimensionality.
     Example: A tensor of order d = 30 with n1 = n2 = ··· = nd = 10 has 10^30 entries = 8 × 10^12 exabytes of storage!¹
     For d ≫ 1: one cannot afford to store the tensor explicitly (in terms of its entries).
  ¹ Global data storage a few years ago was calculated at 295 exabytes, see http://www.bbc.co.uk/news/technology-12419672.

  9. Basic calculus
  ◮ Addition of two equal-sized tensors X, Y:  Z = X + Y  ⇔  Z_i = X_i + Y_i  ∀ i ∈ I.
  ◮ Scalar multiplication with α ∈ R:  Z = αX  ⇔  Z_i = α·X_i  ∀ i ∈ I.
    ⇒ vector space structure.
  ◮ Inner product of two equal-sized tensors X, Y:
    ⟨X, Y⟩ := ∑_{i∈I} X_i·Y_i
    ⇒ induced norm
    ‖X‖ := ( ∑_{i∈I} X_i² )^{1/2}.
  For a 2nd order tensor (= matrix) this corresponds to the usual Euclidean geometry and the Frobenius norm.
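
  In MATLAB, both reduce to operations on the vectorized tensors (a minimal sketch; the sizes are arbitrary):

    % Inner product and induced norm via vectorization.
    X = randn(3,4,5); Y = randn(3,4,5);
    ip  = X(:)' * Y(:);   % <X, Y>: sum of elementwise products
    nrm = norm(X(:));     % ||X||: Frobenius norm of the vectorized tensor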

  10. Vectorization
  A tensor X of size n1 × n2 × ··· × nd has n1·n2···nd entries ⇒ many ways to stack the entries in a (loooong) column vector. One possible choice: The vectorization of X is denoted by vec(X), where
  vec : R^{n1×n2×···×nd} → R^{n1·n2···nd}
  stacks the entries of a tensor in reverse lexicographical order into a long column vector.
  Example: d = 3, n1 = 3, n2 = 2, n3 = 3.
  vec(X) = (x111, x211, x311, x121, x221, x321, x112, ..., x123, x223, x323)^T
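
  MATLAB's colon operator produces exactly this ordering (a small sketch to check it):

    % Entries 1..18 placed so that X(:) returns them in order.
    X = reshape(1:18, [3 2 3]);
    X(:)'      % 1 2 3 ... 18: the first index varies fastest
    X(2,1,1)   % = 2, consistent with position 2 of vec(X)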

  11. Matricization
  ◮ A matrix has two modes (column mode and row mode).
  ◮ A d-th order tensor X has d modes (µ = 1, µ = 2, ..., µ = d).
  Let us fix all but one mode, e.g., µ = 1: Then X(:, i2, i3, ..., id) (abuse of MATLAB notation) is a vector of length n1 for each choice of i2, ..., id. These vectors are called fibers.
  ⇒ View the tensor X as a bunch of column vectors.

  12. Matricization
  Stack the fibers into an n1 × (n2···nd) matrix:
  X ∈ R^{n1×n2×···×nd}  ⇒  X_(1) ∈ R^{n1×(n2·n3···nd)}
  For µ = 1,...,d, the µ-mode matricization of X is a matrix
  X_(µ) ∈ R^{nµ×(n1···nµ−1·nµ+1···nd)}
  with entries
  (X_(µ))_{iµ,(i1,...,iµ−1,iµ+1,...,id)} = X_i  ∀ i ∈ I.

  13. Matricization
  In MATLAB: a = rand(2,3,4,5);
  ◮ 1-mode matricization: reshape(a,2,3*4*5)
  ◮ 2-mode matricization: b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)
  ◮ 3-mode matricization: b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)
  ◮ 4-mode matricization: b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)
  For a matrix A ∈ R^{n1×n2}: A_(1) = A, A_(2) = A^T.
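
  The pattern above generalizes to a small helper function (a sketch of our own; the name matricize is not from the slides):

    function Xmu = matricize(X, mu)
    % mu-mode matricization: bring mode mu to the front,
    % then flatten the remaining modes into the columns.
    sz = size(X);
    d  = ndims(X);
    Xmu = reshape(permute(X, [mu, 1:mu-1, mu+1:d]), sz(mu), []);
    end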

  14. µ-mode matrix products
  Consider the 1-mode matricization X_(1) ∈ R^{n1×(n2···nd)}: It seems to make sense to multiply an m × n1 matrix A from the left:
  Y_(1) := A·X_(1) ∈ R^{m×(n2···nd)}.
  We can rearrange Y_(1) back into an m × n2 × ··· × nd tensor Y. This is called 1-mode matrix multiplication:
  Y = A ∘_1 X  ⇔  Y_(1) = A·X_(1)
  More formally (and more ugly):
  Y_{i1,i2,...,id} = ∑_{k=1}^{n1} a_{i1,k} X_{k,i2,...,id}.
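
  In MATLAB this is one reshape away (a minimal sketch; the sizes are arbitrary):

    % Y = A o_1 X: multiply the 1-mode matricization, then fold back.
    n = [3 4 5]; m = 6;
    X = randn(n); A = randn(m, n(1));
    Y = reshape(A * reshape(X, n(1), []), [m, n(2:3)]);
    size(Y)   % m x n2 x n3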

  15. µ-mode matrix products
  General definition of the µ-mode matrix product with A ∈ R^{m×nµ}:
  Y = A ∘_µ X  ⇔  Y_(µ) = A·X_(µ).
  More formally (and more ugly):
  Y_{i1,i2,...,id} = ∑_{k=1}^{nµ} a_{iµ,k} X_{i1,...,iµ−1,k,iµ+1,...,id}.
  For matrices:
  ◮ 1-mode multiplication = multiplication from the left: Y = A ∘_1 X = A·X.
  ◮ 2-mode multiplication = transposed multiplication from the right: Y = A ∘_2 X = X·A^T.
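
  A general µ-mode product in MATLAB, along the lines of the matricize sketch above (again a sketch of our own; the name modeprod is not from the slides):

    function Y = modeprod(A, X, mu)
    % Y = A o_mu X: apply A to the mu-mode matricization, then fold back.
    sz   = size(X);
    d    = ndims(X);
    perm = [mu, 1:mu-1, mu+1:d];
    Ymu  = A * reshape(permute(X, perm), sz(mu), []);   % A * X_(mu)
    sz(mu) = size(A, 1);                                % mode mu now has size m
    Y = ipermute(reshape(Ymu, sz(perm)), perm);         % fold back into a tensor
    end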

  16. µ-mode matrix products and vectorization
  By definition, vec(X) = vec(X_(1)). Consequently, also vec(A ∘_1 X) = vec(A·X_(1)).
  ⇒ Vectorized version of the 1-mode matrix product:
  vec(A ∘_1 X) = (I_{n2···nd} ⊗ A) vec(X) = (I_{nd} ⊗ ··· ⊗ I_{n2} ⊗ A) vec(X).
  Relation between the µ-mode matrix product and the matrix-vector product:
  vec(A ∘_µ X) = (I_{nd} ⊗ ··· ⊗ I_{nµ+1} ⊗ A ⊗ I_{nµ−1} ⊗ ··· ⊗ I_{n1}) vec(X)
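
  This identity is easy to test numerically (a sketch using the modeprod helper above; d = 3, µ = 2):

    % Check vec(A o_2 X) = (I_{n3} kron A kron I_{n1}) vec(X).
    n = [2 3 4]; m = 5;
    X = randn(n); A = randn(m, n(2));
    lhs = reshape(modeprod(A, X, 2), [], 1);
    rhs = kron(eye(n(3)), kron(A, eye(n(1)))) * X(:);
    norm(lhs - rhs)   % ~ machine precision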

  17. Summary
  ◮ A tensor X ∈ R^{n1×···×nd} is a d-dimensional array.
  ◮ There are various ways of reshaping the entries of a tensor X into a vector or matrix.
  ◮ µ-mode matrix multiplication can be expressed with Kronecker products.
  Further reading:
  ◮ T. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Rev. 51 (2009), no. 3, 455–500.
  Software:
  ◮ MATLAB (like most programming languages) offers basic functionality for working with d-dimensional arrays.
  ◮ MATLAB Tensor Toolbox: http://www.tensortoolbox.org/

  18. Applications of tensors

  19. Two classes of tensor problems
  Class 1: function-related tensors
  Consider a function u(ξ1,...,ξd) ∈ R in d variables ξ1,...,ξd. A tensor U ∈ R^{n1×···×nd} represents a discretization of u:
  ◮ U contains function values of u evaluated on a grid; or
  ◮ U contains the coefficients of a truncated expansion in tensorized basis functions:
    u(ξ1,...,ξd) ≈ ∑_{i∈I} U_i φ_{i1}(ξ1) φ_{i2}(ξ2) ··· φ_{id}(ξd).
  Typical setting:
  ◮ U is only given implicitly, e.g., as the solution of a discretized PDE;
  ◮ we seek approximations to U with very low storage and tolerable accuracy;
  ◮ d may become very large.

  20. Discretization of a function in d variables ξ1,...,ξd ∈ [0,1]
  ⇒ # function values grows exponentially with d

  21. Separability helps
  Ideal situation: the function f is separable:
  f(ξ1, ξ2, ..., ξd) = f1(ξ1)·f2(ξ2)···fd(ξd)
  ⇒ the discretized f is the Kronecker product of the discretized factors fj:
  O(n^d) memory → O(dn) memory.
  Of course: exact separability is rarely satisfied in practice.
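
  A minimal MATLAB sketch for d = 3 (the factor functions are arbitrary choices):

    % Separable f(x1,x2,x3) = sin(x1)*exp(x2)*cos(x3): store 3 vectors
    % of length n (O(dn)) instead of the full n^3 tensor (O(n^d)).
    n = 50; xi = linspace(0, 1, n)';
    f1 = sin(xi); f2 = exp(xi); f3 = cos(xi);      % O(dn) storage
    F = reshape(kron(f3, kron(f2, f1)), n, n, n);  % full tensor, only if needed
    F(2,3,4) - f1(2)*f2(3)*f3(4)                   % consistency check, ~ 0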

  22. Two classes of tensor problems
  Class 2: data-related tensors
  A tensor U ∈ R^{n1×···×nd} contains multi-dimensional data.
  Example 1: U_{2011,3,2} denotes the number of papers published in 2011 by author 3 in mathematical journal 2.
  Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor.
  Example 3: Hyperspectral images.
  Example 4: Deep learning: the coefficients in each layer of a deep NN are stored as tensors (TensorFlow); interpretation of RNNs as hierarchical tensor decompositions.
  Typical setting (except for Example 4):
  ◮ the entries of U are often given explicitly (at least partially);
  ◮ extraction of dominant features from U;
  ◮ usually moderate values of d.

  23. High-dimensional elliptic PDEs: 3D model problem
  ◮ Consider −Δu = f in Ω, u|∂Ω = 0, on the unit cube Ω = [0,1]^3.
  ◮ Discretize on a tensor grid. Uniform grid for simplicity:
    ξµ^(j) = j·h,  h = 1/(n+1),  for µ = 1, 2, 3.
  ◮ Approximate solution tensor U ∈ R^{n×n×n}:
    U_{i1,i2,i3} ≈ u(ξ1^(i1), ξ2^(i2), ξ3^(i3)).
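
  The discretized operator inherits Kronecker structure: the standard second-order finite-difference Laplacian on this grid is a Kronecker sum (a well-known construction, sketched here; it is not spelled out on the slide):

    % 3D finite-difference Laplacian as a Kronecker sum,
    % consistent with the vectorization vec(U) used earlier.
    n = 10; h = 1/(n+1);
    e = ones(n,1);
    T = spdiags([-e 2*e -e], -1:1, n, n) / h^2;   % 1D Laplacian, Dirichlet BC
    I = speye(n);
    A = kron(I, kron(I, T)) + kron(I, kron(T, I)) + kron(T, kron(I, I));
    % The discretized PDE then reads A * vec(U) = vec(F).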
