Low Rank Approximation Lecture 6
Daniel Kressner Chair for Numerical Algorithms and HPC Institute of Mathematics, EPFL daniel.kressner@epfl.ch
The vectorization of an m × n matrix A is denoted by vec(A), where vec : Rm×n → Rm·n stacks the columns of a matrix into a long column vector.
Example:
A = [ a11 a12
      a21 a22
      a31 a32 ]   ⇒   vec(A) = (a11, a21, a31, a12, a22, a32)T
Remarks:
◮ In MATLAB: A(:)
◮ This way of vectorizing corresponds to how matrices are laid out in memory in MATLAB. In other programming languages (e.g., C arrays) matrices are laid out rowwise.
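A minimal MATLAB sketch of the above (the numbers are only for illustration):
A = [1 4; 2 5; 3 6];    % 3 x 2 matrix
v = A(:);               % vec(A) = [1 2 3 4 5 6]', columnwise stacking
B = reshape(v, 3, 2);   % reshape recovers A from vec(A)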
For an m × n matrix A and a k × ℓ matrix B, the Kronecker product is defined as
B ⊗ A := [ b11·A  · · ·  b1ℓ·A
            ⋮              ⋮
           bk1·A  · · ·  bkℓ·A ] ∈ Rkm×ℓn.
Most important property (for our purposes): the mixed-product rule
(B ⊗ A)(D ⊗ C) = (BD) ⊗ (AC)
for A ∈ Rm×n, B ∈ Rk×ℓ, C ∈ Rn×q, D ∈ Rℓ×p.
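A quick MATLAB check of the mixed-product rule (sizes chosen arbitrarily); note that kron(B,A) realizes B ⊗ A as defined above:
m = 3; n = 4; k = 2; l = 5; q = 3; p = 2;
A = randn(m,n); B = randn(k,l); C = randn(n,q); D = randn(l,p);
err = norm(kron(B,A)*kron(D,C) - kron(B*D, A*C), 'fro')   % ~ 1e-14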
◮ scalar = tensor of order 0
◮ (column) vector = tensor of order 1
◮ matrix = tensor of order 2
◮ tensor of order 3 = n1·n2·n3 numbers arranged in an n1 × n2 × n3 array
A d-th order tensor X of size n1 × n2 × · · · × nd is a d-dimensional array with entries Xi1,i2,...,id, where iµ ∈ {1, . . . , nµ} for µ = 1, . . . , d. In the following, the entries of X are usually real (for simplicity): X ∈ Rn1×n2×···×nd.
Multi-index notation: I = {1, . . . , n1} × {1, . . . , n2} × · · · × {1, . . . , nd}. Then i ∈ I is a tuple of d indices: i = (i1, i2, . . . , id). This allows us to write the entries of X as Xi for i ∈ I.
A matrix A ∈ Rm×n can be identified with a linear operator A : Rn → Rm, x ↦ A · x. There is no such (unique and natural) interpretation for tensors! ⇒ fundamental difficulty to define a meaningful general notion of eigenvalues and singular values of tensors.
Curse of dimensionality. Example: A tensor of order d = 30 with n1 = n2 = · · · = nd = 10 has 10^30 entries ⇒ 8 × 10^12 exabytes of storage!¹ For d ≫ 1: Cannot afford to store the tensor explicitly (in terms of its entries).
¹Global data storage a few years ago was calculated at 295 exabytes, see http://www.bbc.co.uk/news/technology-12419672.
◮ Addition of two equal-sized tensors X, Y:
Z = X + Y ⇔ Zi = Xi + Yi ∀i ∈ I.
◮ Scalar multiplication with α ∈ R:
Z = αX ⇔ Zi = αXi ∀i ∈ I.
⇒ vector space structure.
◮ Inner product of two equal-sized tensors X, Y:
⟨X, Y⟩ := Σ_{i∈I} Xi Yi.
Induced norm ‖X‖ := ( Σ_{i∈I} Xi² )^{1/2}.
For a 2nd order tensor (= matrix) this corresponds to the usual Euclidean geometry and the Frobenius norm.
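In MATLAB, both quantities are conveniently computed via vectorization (a minimal sketch):
X = rand(3,4,5); Y = rand(3,4,5);
ip  = X(:).' * Y(:);    % inner product <X, Y>
nrm = norm(X(:));       % induced norm ||X||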
A tensor X of size n1 × n2 × · · · × nd has n1 · n2 · · · nd entries ⇒ many ways to stack the entries in a (loooong) column vector. One possible choice: The vectorization of X is denoted by vec(X), where vec : Rn1×n2×···×nd → Rn1·n2···nd stacks the entries of a tensor in reverse lexicographical order into a long column vector.
Example: d = 3, n1 = 3, n2 = 2, n3 = 3.
vec(X) = (x111, x211, x311, x121, x221, x321, x112, . . . , x123, x223, x323)T
◮ A matrix has two modes (column mode and row mode).
◮ A dth-order tensor X has d modes (µ = 1, µ = 2, . . ., µ = d).
Let us fix all but one mode, e.g., µ = 1: Then X(:, i2, i3, . . . , id) (abuse of MATLAB notation) is a vector of length n1 for each choice of i2, . . . , id. These vectors are called fibers. The tensor X can thus be viewed as a bunch of column vectors (its 1-mode fibers).
Stacking these vectors into an n1 × (n2 · · · nd) matrix turns X ∈ Rn1×n2×···×nd into X^(1) ∈ Rn1×(n2n3···nd). More generally, for µ = 1, . . . , d, the µ-mode matricization of X is a matrix X^(µ) ∈ Rnµ×(n1···nµ−1nµ+1···nd) with entries
X^(µ)_{iµ,(i1,...,iµ−1,iµ+1,...,id)} = Xi ∀i ∈ I.
In MATLAB: a = rand(2,3,4,5);
◮ 1-mode matricization:
reshape(a,2,3*4*5)
◮ 2-mode matricization:
b = permute(a,[2 1 3 4]); reshape(b,3,2*4*5)
◮ 3-mode matricization:
b = permute(a,[3 1 2 4]); reshape(b,4,2*3*5)
◮ 4-mode matricization:
b = permute(a,[4 1 2 3]); reshape(b,5,2*3*4)
For a matrix A ∈ Rn1×n2: A^(1) = A, A^(2) = AT.
Consider the 1-mode matricization X^(1) ∈ Rn1×(n2···nd): It makes sense to multiply an m × n1 matrix A from the left:
Y^(1) := A X^(1) ∈ Rm×(n2···nd).
We can rearrange Y^(1) back into an m × n2 × · · · × nd tensor Y. This is called the 1-mode matrix multiplication
Y = A ◦1 X ⇔ Y^(1) = A X^(1).
More formally (and more ugly): Yi1,i2,...,id = Σ_{k=1}^{n1} ai1,k Xk,i2,...,id.
General definition of the µ-mode matrix product with A ∈ Rm×nµ:
Y = A ◦µ X ⇔ Y^(µ) = A X^(µ).
More formally (and more ugly): Yi1,i2,...,id = Σ_{k=1}^{nµ} aiµ,k Xi1,...,iµ−1,k,iµ+1,...,id.
For matrices:
◮ 1-mode multiplication = multiplication from the left: Y = A ◦1 X = A X.
◮ 2-mode multiplication = transposed multiplication from the right: Y = A ◦2 X = X AT.
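A minimal MATLAB sketch of the µ-mode product via matricization (modeprod is a hypothetical helper name, not part of any toolbox):
function Y = modeprod(A, X, mu)
  sz = size(X);
  order = [mu, 1:mu-1, mu+1:numel(sz)];            % bring mode mu to the front
  Xmu = reshape(permute(X, order), sz(mu), []);    % mu-mode matricization X^(mu)
  sz(mu) = size(A, 1);
  Y = ipermute(reshape(A*Xmu, sz(order)), order);  % fold A*X^(mu) back into a tensor
end
% Example: X = rand(3,4,5); A = rand(6,4); size(modeprod(A,X,2)) is [3 6 5].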
By definition, vec(X) = vec(X^(1)). Consequently, also vec(A ◦1 X) = vec(A X^(1)). Vectorized version of the 1-mode matrix product:
vec(A ◦1 X) = (In2···nd ⊗ A) vec(X) = (Ind ⊗ · · · ⊗ In2 ⊗ A) vec(X).
Relation between µ-mode matrix product and matrix-vector product:
vec(A ◦µ X) = (Ind ⊗ · · · ⊗ Inµ+1 ⊗ A ⊗ Inµ−1 ⊗ · · · ⊗ In1) vec(X)
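A quick MATLAB check of this relation for µ = 1 and a third-order tensor (sizes are arbitrary):
n1 = 3; n2 = 4; n3 = 2; m = 5;
X = rand(n1,n2,n3); A = rand(m,n1);
Y = reshape(A*reshape(X, n1, []), [m n2 n3]);             % A o_1 X
err = norm(Y(:) - kron(eye(n3), kron(eye(n2), A))*X(:))   % ~ 1e-15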
◮ A tensor X ∈ Rn1×···×nd is a d-dimensional array.
◮ There are various ways of reshaping the entries of a tensor X into a vector or matrix.
◮ µ-mode matrix multiplication can be expressed with Kronecker products.
Further reading:
◮ T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.
Software:
◮ MATLAB (and all programming languages) offer basic functionality to work with d-dimensional arrays.
◮ MATLAB Tensor Toolbox: http://www.tensortoolbox.org/
Class 1: function-related tensors
Consider a function u(ξ1, . . . , ξd) ∈ R in d variables ξ1, . . . , ξd. A tensor U ∈ Rn1×···×nd represents a discretization of u:
◮ U contains function values of u evaluated on a grid; or
◮ U contains coefficients of a truncated expansion in tensorized basis functions:
u(ξ1, . . . , ξd) ≈ Σ_{i∈I} Ui φi1(ξ1) φi2(ξ2) · · · φid(ξd).
Typical setting:
◮ U only given implicitly, e.g., as the solution of a discretized PDE;
◮ seek approximations to U with very low storage and tolerable accuracy.
◮ d may become very large.
Discretization of a function in d variables ξ1, . . . , ξd ∈ [0, 1] ⇒ #function values grows exponentially with d.
Ideal situation: The function f is separable: f(ξ1, ξ2, . . . , ξd) = f1(ξ1) f2(ξ2) · · · fd(ξd). Then the discretized f is the Kronecker product of the discretized factors fj, and O(n^d) memory is reduced to O(dn) memory. Of course: Exact separability is rarely satisfied in practice.
Class 2: data-related tensors
Tensor U ∈ Rn1×···×nd contains multi-dimensional data.
Example 1: U2011,3,2 denotes the number of papers published in 2011 by author 3 in mathematical journal 2.
Example 2: A video of 1000 frames with resolution 640 × 480 can be viewed as a 640 × 480 × 1000 tensor.
Example 3: Hyperspectral images.
Example 4: Deep learning: coefficients in each layer of a deep NN stored as tensors (TensorFlow); interpretation of RNNs as hierarchical tensor decompositions.
Typical setting (except for Example 4):
◮ entries of U often given explicitly (at least partially).
◮ extraction of dominant features from U.
◮ usually moderate values for d.
◮ Consider
−∆u = f in Ω, u|∂Ω = 0, with Ω = [0, 1]³.
◮ Discretize on a tensor grid. Uniform grid for simplicity:
ξµ^(j) = j·h, h = 1/(n + 1), for µ = 1, 2, 3 and j = 1, . . . , n.
◮ Approximate solution tensor U ∈ Rn×n×n:
Ui1,i2,i3 ≈ u(ξ1^(i1), ξ2^(i2), ξ3^(i3)).
◮ Discretization of the 1D Laplace operator:
−∂xx ≈ tridiag(−1, 2, −1) = [  2 −1
                               −1  2  ⋱
                                   ⋱  ⋱ −1
                                      −1  2 ] =: A.
◮ Application in each coordinate direction:
−∂ξ1ξ1 u(ξ1, ξ2, ξ3) ≈ A ◦1 U, −∂ξ2ξ2 u(ξ1, ξ2, ξ3) ≈ A ◦2 U, −∂ξ3ξ3 u(ξ1, ξ2, ξ3) ≈ A ◦3 U.
◮ Hence,
−∆u ≈ A ◦1 U + A ◦2 U + A ◦3 U,
or, in vectorized form with u = vec(U),
−∆u ≈ (I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u.
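A minimal MATLAB sketch assembling this Kronecker-structured matrix and checking it against the µ-mode form (assumption: the 1/h² scaling is omitted, as in the matrix A above):
n = 8; e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);    % 1D Laplace stencil tridiag(-1,2,-1)
I = speye(n);
L = kron(I,kron(I,A)) + kron(I,kron(A,I)) + kron(A,kron(I,I));
U = rand(n,n,n);
Y1 = reshape(A*reshape(U, n, []), [n n n]);                                       % A o_1 U
Y2 = ipermute(reshape(A*reshape(permute(U,[2 1 3]), n, []), [n n n]), [2 1 3]);   % A o_2 U
Y3 = ipermute(reshape(A*reshape(permute(U,[3 1 2]), n, []), [n n n]), [3 1 2]);   % A o_3 U
err = norm(L*U(:) - (Y1(:) + Y2(:) + Y3(:)))   % ~ 1e-15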
Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0 for Ω = [0, 1]³ takes the form
(I ⊗ I ⊗ A + I ⊗ A ⊗ I + A ⊗ I ⊗ I) u = f.
Similar structure for finite element discretization with tensorized FEs:
V ⊗ W ⊗ Z = { Σ_{i,j,k} αijk vi(ξ1) wj(ξ2) zk(ξ3) : αijk ∈ R },
with V = {v1(ξ1), . . . , vn(ξ1)}, W = {w1(ξ2), . . . , wn(ξ2)}, Z = {z1(ξ3), . . . , zn(ξ3)}.
Galerkin discretization:
(KV ⊗ MW ⊗ MZ + MV ⊗ KW ⊗ MZ + MV ⊗ MW ⊗ KZ) u = f,
with 1D mass/stiffness matrices MV, MW, MZ, KV, KW, KZ.
Finite difference discretization of the model problem −∆u = f in Ω, u|∂Ω = 0 for Ω = [0, 1]^d takes the form
( Σ_{µ=1}^{d} I ⊗ · · · ⊗ I ⊗ A ⊗ I ⊗ · · · ⊗ I ) u = f,   with A in the µ-th position.
To obtain such Kronecker structure in general, one needs:
◮ a tensorized domain;
◮ a highly structured grid;
◮ coefficients that can be written/approximated as a sum of separable functions.
PDE eigenvalue problem
∆u(ξ) + V(ξ) u(ξ) = λ u(ξ) in Ω = [0, 1]^d, u(ξ) = 0 on ∂Ω.
Assumption: The potential can be represented as
V(ξ) = Σ_{j=1}^{s} Vj^(1)(ξ1) Vj^(2)(ξ2) · · · Vj^(d)(ξd).
⇒ finite difference discretization
A u = (AL + AV) u = λ u,
with
AL = Σ_{µ=1}^{d} I ⊗ · · · ⊗ I ⊗ AL ⊗ I ⊗ · · · ⊗ I   (1D Laplacian in the µ-th position),
AV = Σ_{j=1}^{s} A_{V,j}^(d) ⊗ · · · ⊗ A_{V,j}^(2) ⊗ A_{V,j}^(1).
Consider Ω = [−10, 2]^d and the potential ([Meyer et al. 1990; Raab et al. 2000; Faou et al. 2009])
V(ξ) = (1/2) Σ_{j=1}^{d} σj ξj² + Σ_{j=1}^{d−1} ( σ∗ ( ξj ξ_{j+1}² − (1/3) ξj³ ) + (σ∗²/16) ( ξj² + ξ_{j+1}² )² ),
with σj ≡ 1, σ∗ = 0.2. Discretization with n = 128 dof/dimension for d = 20 dimensions.
◮ Eigenvector has n^d ≈ 10^42 entries.
◮ Explicit storage of the eigenvector would require 10^25 exabytes!
Solved with accuracy 10^−12 in less than 1 hour on a laptop.
◮ spin-1/2 particles: proton, neutron, electron, and quark.
◮ two states: spin-up, spin-down
◮ quantum state of each spin represented by a vector in C² (spinor)
◮ quantum state of a system of d spins represented by a vector in C^{2^d}
◮ quantum mechanical operators expressed in terms of the Pauli matrices
Px = [0 1; 1 0], Py = [0 −i; i 0], Pz = [1 0; 0 −1]
◮ spin Hamiltonian: sum of Kronecker products of Pauli matrices and identities ⇒ each term describes a physical (inter)action of spins
◮ interaction of spins described by a graph
◮ Goal: Compute the ground state of the spin Hamiltonian.
Example: 1d chain of 5 spins with periodic boundary conditions
Hamiltonian describing pairwise interaction between nearest neighbors: H = Pz ⊗ Pz ⊗ I ⊗ I ⊗ I + I ⊗ Pz ⊗ Pz ⊗ I ⊗ I + I ⊗ I ⊗ Pz ⊗ Pz ⊗ I + I ⊗ I ⊗ I ⊗ Pz ⊗ Pz + Pz ⊗ I ⊗ I ⊗ I ⊗ Pz
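A minimal MATLAB sketch assembling this Hamiltonian via kron (the loop cyclically shifts the pattern (Pz, Pz, I, I, I) over the 5 sites):
Pz = sparse([1 0; 0 -1]); I2 = speye(2);
pattern = {Pz, Pz, I2, I2, I2};
H = sparse(2^5, 2^5);
for shift = 0:4                      % one term per pair of neighbours (periodic)
    P = circshift(pattern, shift);
    term = P{1};
    for k = 2:5, term = kron(term, P{k}); end
    H = H + term;
end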
◮ Ising (ZZ) model for a 1d chain of d spins with open boundary conditions:
H = Σ_{p=1}^{d−1} I ⊗ · · · ⊗ I ⊗ Pz ⊗ Pz ⊗ I ⊗ · · · ⊗ I   (Pz pair in positions p, p+1)
  + λ Σ_{p=1}^{d} I ⊗ · · · ⊗ I ⊗ Px ⊗ I ⊗ · · · ⊗ I   (Px in position p),
where λ = ratio between the strength of the magnetic field and the pairwise interactions.
◮ 1d Heisenberg (XY) model
◮ Current research: 2d models.
◮ More details in:
Huckle/Waldherr/Schulte-Herbrüggen: Computations in Quantum Tensor Networks.
Schollwöck: The density-matrix renormalization group in the age of matrix product states.
◮ 3 stochastic automata A1, A2, A3 having 3 states each.
◮ Vector x_t^(i) ∈ R³ describes the probabilities of states (1), (2), (3) in Ai at time t.
◮ No coupling between the automata ⇒ local transition x_t^(i) → x_{t+1}^(i) described by a Markov chain:
x_{t+1}^(i) = Ei x_t^(i),
with a stochastic matrix Ei.
◮ Stationary distribution of Ai = Perron vector of Ei (eigenvector for eigenvalue 1).
◮ 3 stochastic automata A1, A2, A3 having 3 states each.
◮ Coupling between the automata ⇒ local transition x_t^(i) → x_{t+1}^(i) is not described by a Markov chain.
◮ Need to consider all possible combinations of states in (A1, A2, A3): (1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 1), (1, 2, 2), . . . .
◮ Vector xt ∈ R^{3³} (or tensor X(t) ∈ R3×3×3) describes the probabilities of these combinations.
◮ Transition xt → xt+1 described by a Markov chain:
xt+1 = E xt, with a large stochastic matrix E.
◮ Oversimplified example:
E = I ⊗ I ⊗ E1 + I ⊗ E2 ⊗ I + E3 ⊗ I ⊗ I + I ⊗ E21 ⊗ E12 + E32 ⊗ E23 ⊗ I.
◮ Goal: Compute the stationary distribution = Perron vector of E.
◮ More details in:
Stewart: Introduction to the Numerical Solution of Markov Chains.
Buchholz: Product Form Approximations for Communicating Markov Processes.
◮ Emerged during the last five years in numerical analysis.
◮ Successfully applied to:
◮ parameter-dependent / multi-dimensional integrals;
◮ electronic structure calculations: Hartree-Fock / DFT;
◮ stochastic and parametric PDEs;
◮ high-dimensional Boltzmann / chemical master / Fokker-Planck / Schrödinger equations;
◮ micromagnetism;
◮ rational approximation problems;
◮ computational homogenization;
◮ computational finance;
◮ multivariate regression and machine learning;
◮ . . .
M. Bachmayr, R. Schneider, and A. Uschmajew. Tensor networks and hierarchical tensors for the solution of high-dimensional partial differential equations.
A. Cichocki. Era of big data processing: A new approach via tensor networks and tensor decompositions. arXiv:1403.2048, 2014.
L. Grasedyck, D. Kressner, and C. Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitt., 36(1):53–78, 2013.
W. Hackbusch. Tensor Spaces and Numerical Tensor Calculus. Springer, Heidelberg, 2012.
V. Khrulkov, A. Novikov, and I. Oseledets. Expressive power of recurrent neural networks, 2018. ICLR: Sixth International Conference on Learning Representations.
A. Nouy. Low-rank methods for high-dimensional approximation and model order reduction. In Model reduction and approximation, volume 15 of Comput. Sci. Eng., pages 171–226. SIAM, Philadelphia, PA, 2017.
N. D. Sidiropoulos, L. De Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos. Tensor decomposition for signal processing and machine learning. IEEE Trans. Signal Process., 65(13):3551–3582, 2017.
◮ Aim: Generalize the concept of low rank from matrices to tensors.
◮ One possibility is motivated by the rank-R matrix factorization
X = A BT = a1 b1T + a2 b2T + · · · + aR bRT,
whose vectorization reads vec(X) = b1 ⊗ a1 + b2 ⊗ a2 + · · · + bR ⊗ aR.
The Canonical Polyadic (CP) decomposition of a tensor X ∈ Rn1×n2×n3 is defined via
vec(X) = c1 ⊗ b1 ⊗ a1 + c2 ⊗ b2 ⊗ a2 + · · · + cR ⊗ bR ⊗ aR,
i.e.,
X = a1 ◦ b1 ◦ c1 + a2 ◦ b2 ◦ c2 + · · · + aR ◦ bR ◦ cR
for vectors aj ∈ Rn1, bj ∈ Rn2, cj ∈ Rn3. CP directly corresponds to semi-separable approximation.
Tensor rank of X = minimal possible R.
Illustration of CP decomposition X = a1 ◦ b1 ◦ c1 + a2 ◦ b2 ◦ c2 + · · · + aR ◦ bR ◦ cR (figure: sum of R rank-one tensors with vectors ar, br, cr).
More compact notation: X = ⟦A, B, C⟧, with
A = [a1, . . . , aR] ∈ Rn1×R, B = [b1, . . . , bR] ∈ Rn2×R, C = [c1, . . . , cR] ∈ Rn3×R.
Elementwise expression of the CP decomposition:
Xijk = Σ_{r=1}^{R} air bjr ckr, iµ = 1, . . . , nµ.
This shows that every matricization has rank at most R.
Question: What is a necessary and sufficient condition on the matricizations for a tensor X to have tensor rank 1?
For the 1-mode matricization: X^(1) = A Y, with the rth row of Y given by the Kronecker product of the rth columns of C and B.
Definition: For V ∈ Rm×k and W ∈ Rn×k, the Khatri-Rao product of V and W is the mn × k matrix
V ⊙ W = [ v1 ⊗ w1, · · · , vk ⊗ wk ].
We have X^(1) = A (C ⊙ B)T, X^(2) = B (C ⊙ A)T, X^(3) = C (B ⊙ A)T.
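A small MATLAB check of the 1-mode identity, building X from its CP factors (sizes arbitrary):
n1 = 4; n2 = 3; n3 = 5; R = 2;
A = rand(n1,R); B = rand(n2,R); C = rand(n3,R);
KR = zeros(n2*n3, R); X = zeros(n1,n2,n3);
for r = 1:R
    KR(:,r) = kron(C(:,r), B(:,r));                                  % column r of C (.) B
    X = X + reshape(kron(C(:,r), kron(B(:,r), A(:,r))), [n1 n2 n3]); % a_r o b_r o c_r
end
err = norm(reshape(X, n1, []) - A*KR.', 'fro')   % ~ 1e-16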
With Rµ := rank(X^(µ)), the tensor rank satisfies
max{R1, R2, R3} ≤ rank(X) ≤ min{R2R3, R1R3, R1R2}.
◮ Except for very special situations (e.g., n3 = 2) there are no tighter upper bounds known.
◮ A real 2 × 2 × 2 tensor has rank at most 3 [Kruskal’1989], has ranks 2 or 3 with positive probability, and ranks 0 or 1 with zero probability.
◮ A real 3 × 3 × 3 tensor has rank at most 5 [Kruskal’1989] and has rank 4 with probability 1 [ten Berge et al.’2004].
◮ Computing the tensor rank is an NP-hard problem.
See [Kolda/Bader’2009] for more.
The relation X^(1) = A (C ⊙ B)T can be written as
X^(1) = [ A·diag(C(1, :))·BT, · · · , A·diag(C(n3, :))·BT ],
i.e., a simultaneous diagonalization of all frontal slices of X:
X(:, :, k) = A · diag(C(k, :)) · BT, k = 1, . . . , n3.
For n3 = 1, significant degrees of freedom. For n3 > 1, the CP decomposition is often unique (up to permutations and scalings).
Definition: The k-rank of a matrix A, denoted k-rank(A), is the largest k such that any k columns of A are linearly independent.
Theorem (Kruskal’1977). The CP decomposition is unique (up to permutations and scalings) if
k-rank(A) + k-rank(B) + k-rank(C) ≥ 2R + 2.
Given a real matrix A, its (matrix) rank is the same over R and C. This is not true, in general, for tensor rank. Consider the 2 × 2 × 2 tensor X given by the 1-mode matricization
X^(1) = [ 1 0 0 1; 0 1 −1 0 ],
i.e., the frontal slices X(:, :, 1) = [1 0; 0 1] and X(:, :, 2) = [0 1; −1 0].
Over C, the tensor rank of X is 2, because one can write X = ⟦A, B, C⟧ with
A = (1/√2)·[1 1; −i i], B = (1/√2)·[1 1; i −i], C = [1 1; −i i].
Over R, the tensor rank of X is 3, because one can write X = ⟦A, B, C⟧ with
A = [1 0 1; 0 1 −1], B = [1 0 1; 0 1 1], C = [1 1 0; −1 1 1],
and the real rank cannot be smaller than 3, which can be shown using techniques by ten Berge [1991].
The matrix rank is a lower semi-continuous function. Meaning: A converging sequence of matrices of rank r always converges to a matrix of rank at most r.
Proof: Equivalently, one has to show that for any m × n matrix A of rank r there is an open neighbourhood U such that all matrices B ∈ U have rank r or larger. Use the fact that a matrix A has rank at least r if and only if it has an invertible r × r submatrix. For this submatrix there is a neighbourhood of A in which the submatrix stays invertible.
The matrix rank is a lower semi-continuous function. This is not true, in general, for tensor rank. Consider the 2 × 2 × 2 tensor
X = e1 ◦ e1 ◦ e2 + e1 ◦ e2 ◦ e1 + e2 ◦ e1 ◦ e1
with unit vectors e1, e2. Its rank can be shown to be 3. Consider the parametrized rank-2 tensor
Xα = α (e1 + α^{−1} e2) ◦ (e1 + α^{−1} e2) ◦ (e1 + α^{−1} e2) − α e1 ◦ e1 ◦ e1.
This tensor has rank 2 but, as Xα → X for α → ∞, it is arbitrarily close to a tensor of rank 3.
A tensor X is called degenerate if it is arbitrarily close to a tensor of lower tensor rank. See [Silva/Lim’2008] for more details on degenerate tensors.
The matrix rank is a lower semi-continuous function. This is not true, in general, for tensor rank. For tensors of order d ≥ 3:
◮ the tensor rank R is not lower semi-continuous ⇒ lack of closedness;
◮ successive rank-1 approximations fail.
(Picture taken from [Kolda/Bader’2009].)
To avoid degeneracies, one can use a smoothed version of the tensor rank. The border rank of a tensor X is defined as the smallest integer r such that for any ε > 0 there exists E with ‖E‖ ≤ ε such that X + E has tensor rank r.
Consider the 2 × 2 block matrix product
[ C11 C12; C21 C22 ] = [ A11 A12; A21 A22 ] · [ B11 B12; B21 B22 ]:
C11 = A11B11 + A12B21, C21 = A21B11 + A22B21,
C12 = A11B12 + A12B22, C22 = A21B12 + A22B22.
Requires 8 block matrix multiplications! Can be reduced via tensors + CP decomposition.
More general view of matrix multiplication algorithms:
◮ Map the multi-index into a single index:
(1, 1) → 1, (2, 1) → 2, (1, 2) → 3, (2, 2) → 4.
◮ Translate each operation into a rank-one tensor.
“Compute A11B11 and add to the (1, 1) entry of C” becomes: e(1,1) ◦ e(1,1) ◦ e(1,1) = e1 ◦ e1 ◦ e1.
“Compute A21B12 and add to the (2, 2) entry of C” becomes: e(2,1) ◦ e(1,2) ◦ e(2,2) = e2 ◦ e3 ◦ e4.
2 × 2 matrix multiplication tensor:
T2 = e(1,1) ◦ e(1,1) ◦ e(1,1) + e(1,2) ◦ e(2,1) ◦ e(1,1) + e(2,1) ◦ e(1,1) ◦ e(2,1) + e(2,2) ◦ e(2,1) ◦ e(2,1) + · · ·
Strassen discovered in 1969 that this tensor has rank less than 8 by providing an explicit CP decomposition with 7 terms:
T2 = (e(1,1) + e(2,2)) ◦ (e(1,1) + e(2,2)) ◦ (e(1,1) + e(2,2))
+ (e(2,1) + e(2,2)) ◦ e(1,1) ◦ (e(2,1) − e(2,2))
+ e(1,1) ◦ (e(1,2) − e(2,2)) ◦ (e(1,2) + e(2,2))
+ e(2,2) ◦ (e(2,1) − e(1,1)) ◦ (e(1,1) + e(2,1))
+ (e(1,1) + e(1,2)) ◦ e(2,2) ◦ (−e(1,1) + e(1,2))
+ (e(2,1) − e(1,1)) ◦ (e(1,1) + e(1,2)) ◦ e(2,2)
+ (e(1,2) − e(2,2)) ◦ (e(2,1) + e(2,2)) ◦ e(1,1)
First compute seven matrix products:
M1 := (A11 + A22)(B11 + B22)
M2 := (A21 + A22)B11
M3 := A11(B12 − B22)
M4 := A22(B21 − B11)
M5 := (A11 + A12)B22
M6 := (A21 − A11)(B11 + B12)
M7 := (A12 − A22)(B21 + B22)
Then compute the entries of C:
C11 = M1 + M4 − M5 + M7, C12 = M3 + M5,
C21 = M2 + M4, C22 = M1 − M2 + M3 + M6.
Reduces the complexity from O(n³) to O(n^{log2 7}) = O(n^{2.807...}).
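A minimal recursive MATLAB sketch of this scheme for n a power of two (strassen is a hypothetical function name; no cut-off to standard multiplication is used):
function C = strassen(A, B)
  n = size(A,1);
  if n == 1, C = A*B; return; end
  i1 = 1:n/2; i2 = n/2+1:n;
  A11 = A(i1,i1); A12 = A(i1,i2); A21 = A(i2,i1); A22 = A(i2,i2);
  B11 = B(i1,i1); B12 = B(i1,i2); B21 = B(i2,i1); B22 = B(i2,i2);
  M1 = strassen(A11+A22, B11+B22); M2 = strassen(A21+A22, B11);
  M3 = strassen(A11, B12-B22);     M4 = strassen(A22, B21-B11);
  M5 = strassen(A11+A12, B22);     M6 = strassen(A21-A11, B11+B12);
  M7 = strassen(A12-A22, B21+B22);
  C = [M1+M4-M5+M7, M3+M5; M2+M4, M1-M2+M3+M6];   % assemble the 2 x 2 block result
end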
◮ T2 has tensor and border rank 7 [Strassen, Winograd, ...].
◮ T3 has tensor rank between 19 and 23; border rank between 15 and 21.
◮ [Bläser’1999]: Tn has tensor rank at least (5/2)n² − 3n.
◮ State-of-the-art constructions of fast matrix multiplication are all tensor based.
◮ Record based on the so-called laser method for tensors: O(n^{2.3728639}) [Le Gall’2014]. See also https://simons.berkeley.edu/sites/default/files/docs/2438/slideslegall.pdf.
Fitting the CP decomposition to a tensor X requires the solution of the optimization problem
min_{A,B,C} ‖X − ⟦A, B, C⟧‖.
Difficulties:
◮ Scaling (and permutation) indeterminacy.
◮ The target function is convex in each factor but not jointly convex ⇒ potentially many local minima.
Idea: Normalize all but one factor and exploit the multilinearity of the format.
Let B, C be fixed and with unit norm columns. Optimize for A only:
min_A ‖X − ⟦A, B, C⟧‖.
This is a standard linear least-squares problem (in an unusual form). Recall that the 1-mode matricization of ⟦A, B, C⟧ is given by A(C ⊙ B)T:
min { ‖X^(1) − A(C ⊙ B)T‖_F : A ∈ Rn1×R }.
The solution via the normal equations is given by
A = X^(1)(C ⊙ B)((C ⊙ B)T(C ⊙ B))^{−1} = X^(1)(C ⊙ B)(CTC ∗ BTB)^{−1},
where ∗ denotes the elementwise (Hadamard) product.
Remarks:
◮ This assumes that C ⊙ B has full column rank. If not, replace (CTC ∗ BTB)^{−1} by the pseudo-inverse.
Input: X ∈ Rn1×n2×n3, starting factors B ∈ Rn2×R, C ∈ Rn3×R.
Output: CP approximation ⟦A, B, C⟧.
1: Normalize the columns of B.
2: while not converged do
3:   Normalize the columns of C.
4:   Set A ← X^(1)(C ⊙ B)(CTC ∗ BTB)^{−1}.
5:   Normalize the columns of A.
6:   Set B ← X^(2)(C ⊙ A)(CTC ∗ ATA)^{−1}.
7:   Normalize the columns of B.
8:   Set C ← X^(3)(B ⊙ A)(BTB ∗ ATA)^{−1}.
9: end while
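A minimal MATLAB sketch of this ALS iteration for a third-order tensor (cp_als and kr are hypothetical helper names; a fixed number of sweeps replaces the convergence test):
function [A, B, C] = cp_als(X, R, nsweeps)
  [n1, n2, n3] = size(X);
  X1 = reshape(X, n1, []);                    % X^(1)
  X2 = reshape(permute(X, [2 1 3]), n2, []);  % X^(2)
  X3 = reshape(permute(X, [3 1 2]), n3, []);  % X^(3)
  nrm = @(M) M ./ sqrt(sum(M.^2, 1));         % normalize columns
  B = nrm(randn(n2, R)); C = nrm(randn(n3, R));
  for it = 1:nsweeps
    A = X1 * kr(C, B) / ((C'*C) .* (B'*B));   A = nrm(A);
    B = X2 * kr(C, A) / ((C'*C) .* (A'*A));   B = nrm(B);
    C = X3 * kr(B, A) / ((B'*B) .* (A'*A));
  end
end
function K = kr(V, W)                         % Khatri-Rao product V (.) W
  K = zeros(size(V,1)*size(W,1), size(V,2));
  for r = 1:size(V,2), K(:,r) = kron(V(:,r), W(:,r)); end
end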
Question: How does the algorithm reduce for the special case R = 1, d = 2?
◮ An alternative rank concept for tensors is motivated by the factorization
A = U · Σ · V T, U ∈ Rn1×r, V ∈ Rn2×r, Σ ∈ Rr×r,
whose vectorization reads vec(A) = (V ⊗ U) vec(Σ).
Ignore the diagonal structure of Σ and call it C. The Tucker decomposition of a tensor X ∈ Rn1×n2×n3 is defined via
vec(X) = (W ⊗ V ⊗ U) vec(C),
with U ∈ Rn1×r1, V ∈ Rn2×r2, W ∈ Rn3×r3, and core tensor C ∈ Rr1×r2×r3. In terms of µ-mode matrix products:
X = U ◦1 V ◦2 W ◦3 C =: (U, V, W) ◦ C.
Illustration of Tucker decomposition X = (U, V, W) ◦ C (figure: core tensor C multiplied by the factor matrices U, V, W along the three modes).
Consider all three matricizations:
X^(1) = U · C^(1) · (W ⊗ V)T, X^(2) = V · C^(2) · (W ⊗ U)T, X^(3) = W · C^(3) · (V ⊗ U)T.
These are low-rank decompositions:
rank(X^(1)) ≤ r1, rank(X^(2)) ≤ r2, rank(X^(3)) ≤ r3.
The multilinear rank of a tensor X ∈ Rn1×n2×n3 is defined as the tuple (r1, r2, r3) with rµ = rank(X^(µ)).
Goal: Approximate a given tensor X by a Tucker decomposition with prescribed multilinear rank (r1, r2, r3).
1. Compute the SVDs X^(µ) = Uµ Σµ VµT for µ = 1, 2, 3.
2. Truncate: Uµ := Uµ(:, 1 : rµ) for µ = 1, 2, 3.
3. Form the core tensor C := U1T ◦1 U2T ◦2 U3T ◦3 X.
Truncated tensor produced by the HOSVD [De Lathauwer/De Moor/Vandewalle’2000]:
X̃ := (U1, U2, U3) ◦ C.
Remark: X̃ is an orthogonal projection of X:
X̃ = π1 ◦ π2 ◦ π3 X with πµ X := (Uµ UµT) ◦µ X.
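A minimal MATLAB sketch of this truncated HOSVD for a third-order tensor (hosvd_trunc is a hypothetical helper name):
function [C, U] = hosvd_trunc(X, r)           % r = [r1 r2 r3]
  d = 3; U = cell(1, d); C = X;
  for mu = 1:d                                % steps 1 and 2: truncated SVDs of X^(mu)
    order = [mu, 1:mu-1, mu+1:d];
    Xmu = reshape(permute(X, order), size(X, mu), []);
    [Umu, ~, ~] = svd(Xmu, 'econ');
    U{mu} = Umu(:, 1:r(mu));
  end
  for mu = 1:d                                % step 3: core C = U1' o_1 U2' o_2 U3' o_3 X
    order = [mu, 1:mu-1, mu+1:d];
    sz = size(C); sz(mu) = r(mu);
    Cmu = U{mu}' * reshape(permute(C, order), size(C, mu), []);
    C = ipermute(reshape(Cmu, sz(order)), order);
  end
end
% The truncated tensor is recovered by applying U{1}, U{2}, U{3} to C in modes 1, 2, 3.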
The tensor X̃ resulting from the HOSVD satisfies the quasi-optimality condition
‖X − X̃‖ ≤ √d ‖X − Xbest‖,
where Xbest is the best approximation of X with multilinear ranks (r1, . . . , rd).
Proof (for d = 3):
‖X − X̃‖² = ‖X − (π1 ◦ π2 ◦ π3)X‖²
= ‖X − π1X‖² + ‖π1X − (π1 ◦ π2)X‖² + ‖(π1 ◦ π2)X − (π1 ◦ π2 ◦ π3)X‖²
≤ ‖X − π1X‖² + ‖X − π2X‖² + ‖X − π3X‖².
Using ‖X − πµX‖ ≤ ‖X − Xbest‖ for µ = 1, 2, 3 leads to ‖X − X̃‖² ≤ 3 · ‖X − Xbest‖².
Another direct consequence of the proof: Let σk^(µ) denote the kth singular value of X^(µ). Then the approximation X̃ obtained from the HOSVD satisfies
‖X − X̃‖² ≤ Σ_{µ=1}^{3} Σ_{k=rµ+1}^{nµ} (σk^(µ))².
This also implies a lower bound for ‖X − Xbest‖ in terms of the singular values of the matricizations of X.
◮ The SVD can be replaced by any low-rank approximation technique discussed in this course. By the triangle inequality, the bound of the Corollary still holds with an extra term accounting for the inexact SVD.
Aim at finding the best approximation:
‖X − U1 ◦1 U2 ◦2 U3 ◦3 C‖ = min!
Given U1, U2, U3 with orthonormal columns (ONB), the core tensor C := U1T ◦1 U2T ◦2 U3T ◦3 X is the optimal choice. With this choice of core tensor, we have
‖X − U1 ◦1 U2 ◦2 U3 ◦3 C‖² = ‖X‖² − ‖U1T ◦1 U2T ◦2 U3T ◦3 X‖².
In turn, the minimization problem is equivalent to
‖U1T ◦1 U2T ◦2 U3T ◦3 X‖ = max! s.t. U1, U2, U3 ONB.
ALS for the minimization problem (in its equivalent maximization form): Let U2, U3 be fixed and maximize with respect to U1:
max_{U1TU1=I} ‖U1T ◦1 U2T ◦2 U3T ◦3 X‖ = max_{U1TU1=I} ‖U1T X^(1)(U3 ⊗ U2)‖_F.
The solution is obtained by setting U1 to the r1 dominant left singular vectors of X^(1)(U3 ⊗ U2).
Higher-order orthogonal iteration (HOOI):
1: while not converged do
2:   Set U1 to the r1 dominant left singular vectors of X^(1)(U3 ⊗ U2).
3:   Set U2 to the r2 dominant left singular vectors of X^(2)(U3 ⊗ U1).
4:   Set U3 to the r3 dominant left singular vectors of X^(3)(U2 ⊗ U1).
5: end while
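One sweep of HOOI in MATLAB might look as follows (a sketch: X is the tensor, r = [r1 r2 r3], and U2, U3 are current factors, e.g., from the HOSVD):
[n1, n2, n3] = size(X);
X1 = reshape(X, n1, []);
X2 = reshape(permute(X, [2 1 3]), n2, []);
X3 = reshape(permute(X, [3 1 2]), n3, []);
[U1, ~, ~] = svd(X1 * kron(U3, U2), 'econ'); U1 = U1(:, 1:r(1));
[U2, ~, ~] = svd(X2 * kron(U3, U1), 'econ'); U2 = U2(:, 1:r(2));
[U3, ~, ~] = svd(X3 * kron(U2, U1), 'econ'); U3 = U3(:, 1:r(3));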
◮ Convergence is often observed to be good, but there are examples where HOOI does not even converge to a stationary point.
◮ HOOI best initialized with factors from the (ST)HOSVD. ◮ Again, SVD can be replaced by other methods but little to no
analysis exists that would guide such a choice.
◮ See [Kolda/Bader’09] for more details.
For general tensors:
◮ the multilinear rank r is lower semi-continuous ⇒ closedness property.
◮ HOSVD – a simple and robust algorithm to obtain a quasi-optimal low-rank approximation.
◮ quasi-optimality is good enough for most applications in scientific computing.
◮ robust black-box algorithms/software available (e.g., Tensor Toolbox).
Storage of the core tensor ∼ r^d ⇒ curse of dimensionality.