Parallel Numerical Algorithms
Chapter 7 – Differential Equations
Section 7.5 – Tensor Analysis

Edgar Solomonik
Department of Computer Science
University of Illinois at Urbana-Champaign
CS 554 / CSE 512




Outline

1. Tensor Algebra
   - Tensors
   - Tensor Transposition
   - Tensor Contractions
2. Tensor Decompositions
   - CP Decomposition
   - Tucker Decomposition
   - Tensor Train Decomposition
3. Fast Algorithms
   - Strassen's Algorithm
   - Bilinear Algorithms


Tensors

A tensor $T \in \mathbb{R}^{n_1\times\cdots\times n_d}$ has
- order $d$ (i.e., $d$ modes/indices),
- dimensions $n_1$-by-$\cdots$-by-$n_d$,
- elements $t_{i_1\ldots i_d} = t_{\mathbf{i}}$ where $\mathbf{i} \in \times_{j=1}^{d}\{1,\ldots,n_j\}$.

Order-$d$ tensors represent $d$-dimensional arrays. Arrays with $d \geq 3$ dimensions are prevalent in scientific computing:
- regular grids, collections of matrices, multilinear operators,
- experimental data, visual/graphic data.

Tensor analysis is the expression and study of numerical methods using tensor representations.


Reshaping Tensors

When using tensors, it is often necessary to transition between high-order and low-order representations of the same object. Recall that for a matrix $A \in \mathbb{R}^{m\times n}$, its unfolding is given by
$$v = \operatorname{vec}(A) \;\Rightarrow\; v \in \mathbb{R}^{mn}, \quad v_{i+jm} = a_{ij}.$$
A tensor $T \in \mathbb{R}^{n_1\times\cdots\times n_d}$ can be fully unfolded the same way:
$$v = \operatorname{vec}(T) \;\Rightarrow\; v \in \mathbb{R}^{n_1\cdots n_d}, \quad v_{i_1 + i_2 n_1 + i_3 n_1 n_2 + \cdots} = t_{i_1 i_2 i_3 \ldots}.$$
Often we also want to fold tensors into higher-order ones. Generally, we can reshape (fold or unfold) any tensor:
$$U = \operatorname{reshape}_{n_1\times\cdots\times n_d}(V) \;\Rightarrow\; U \in \mathbb{R}^{n_1\times\cdots\times n_d}, \quad \operatorname{vec}(U) = \operatorname{vec}(V).$$
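As a quick illustration, here is a minimal NumPy sketch of these reshapes (NumPy is an assumption; the slides do not prescribe a library). The slides' unfolding $v_{i+jm} = a_{ij}$ is column-major, hence `order='F'`.

```python
import numpy as np

# vec of a matrix: v_{i+jm} = a_{ij} is column-major (Fortran) ordering.
A = np.arange(6).reshape(2, 3)
v = A.reshape(-1, order='F')            # v = vec(A)
assert v[1 + 2*2] == A[1, 2]            # i=1, j=2, m=2

# Fully unfold an order-3 tensor, then fold it to a different shape:
# vec is preserved, only the mode structure changes.
T = np.random.rand(2, 3, 4)
vT = T.reshape(-1, order='F')           # vec(T)
U = vT.reshape(4, 6, order='F')         # U = reshape_{4x6}(T)
assert np.allclose(U.reshape(-1, order='F'), vT)
```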


Tensor Transposition

For tensors of order $\geq 3$, there is more than one way to transpose modes. A tensor transposition is defined by a permutation $p$ containing the elements $\{1,\ldots,d\}$:
$$Y = X^{[p_1,\ldots,p_d]} \;\Rightarrow\; y_{i_{p_1},\ldots,i_{p_d}} = x_{i_1,\ldots,i_d}.$$
In this notation, the transposition of a matrix $A$ is $A^T = A^{[2,1]}$. Tensor transposition is a convenient primitive for manipulating multidimensional arrays and for mapping tensor computations to linear algebra. In tensor derivations, indices are often carried through explicitly to avoid transpositions.
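In NumPy this primitive is `np.transpose` (a sketch; the 0-based `axes` argument plays the role of $p$):

```python
import numpy as np

# Transposition Y = X^[3,1,2]: y_{i3,i1,i2} = x_{i1,i2,i3}.
X = np.random.rand(2, 3, 4)
Y = np.transpose(X, (2, 0, 1))     # 0-based form of p = [3,1,2]
assert Y.shape == (4, 2, 3)
assert Y[1, 0, 2] == X[0, 2, 1]
```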


Tensor Symmetry

We say a tensor is symmetric if for all $j,k \in \{1,\ldots,d\}$,
$$t_{i_1\ldots i_j\ldots i_k\ldots i_d} = t_{i_1\ldots i_k\ldots i_j\ldots i_d},$$
or equivalently $T = T^{[1,\ldots,k,\ldots,j,\ldots,d]}$ for any such interchange of modes $j$ and $k$. A tensor is antisymmetric (skew-symmetric) if for all $j,k \in \{1,\ldots,d\}$,
$$t_{i_1\ldots i_j\ldots i_k\ldots i_d} = -\,t_{i_1\ldots i_k\ldots i_j\ldots i_d}.$$
A tensor is partially symmetric if such index interchanges are restricted to subsets of $\{1,\ldots,d\}$, e.g. $t_{ijkl} = t_{jikl} = t_{jilk} = t_{ijlk}$.
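A small NumPy sketch of (partial) symmetry in the first two modes of an order-3 tensor:

```python
import numpy as np

# Symmetrize / antisymmetrize modes 1 and 2 of an order-3 tensor.
X = np.random.rand(3, 3, 4)
S = 0.5 * (X + X.transpose(1, 0, 2))
A = 0.5 * (X - X.transpose(1, 0, 2))
assert np.allclose(S, S.transpose(1, 0, 2))     # symmetric in modes 1,2
assert np.allclose(A, -A.transpose(1, 0, 2))    # antisymmetric in modes 1,2
```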


Tensor Products and Kronecker Products

Tensor products can be defined with respect to maps $f: V_f \to W_f$ and $g: V_g \to W_g$:
$$h = f \times g \;\Rightarrow\; h : (V_f \times V_g) \to (W_f \times W_g), \quad h(x,y) = f(x)g(y).$$
Tensors can be used to represent multilinear maps and have a corresponding definition of the tensor product:
$$T = X \times Y \;\Rightarrow\; t_{i_1,\ldots,i_m,j_1,\ldots,j_n} = x_{i_1,\ldots,i_m}\, y_{j_1,\ldots,j_n}.$$
The Kronecker product between two matrices $A \in \mathbb{R}^{m_1\times m_2}$ and $B \in \mathbb{R}^{n_1\times n_2}$,
$$C = A \otimes B \;\Rightarrow\; c_{i_2+i_1 n_1,\; j_2+j_1 n_2} = a_{i_1 j_1} b_{i_2 j_2},$$
corresponds to transposing and reshaping the tensor product:
$$A \otimes B = \operatorname{reshape}_{m_1 n_1 \times m_2 n_2}\big((A \times B)^{[3,1,4,2]}\big).$$
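A NumPy sketch of this identity (note NumPy's reshape is row-major, so the mode ordering below differs from the column-major $[3,1,4,2]$ above):

```python
import numpy as np

# Kronecker product as a transposed, reshaped tensor product.
m1, m2, n1, n2 = 2, 3, 4, 5
A = np.random.rand(m1, m2)
B = np.random.rand(n1, n2)

T = np.tensordot(A, B, axes=0)                     # modes (i1, j1, i2, j2)
C = T.transpose(0, 2, 1, 3).reshape(m1*n1, m2*n2)  # group (i1,i2) and (j1,j2)
assert np.allclose(C, np.kron(A, B))
```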


Tensor Partial Sum

$Y$ of order $d-r$ is a partial sum of $X$ of order $d$ if, for some $q$ containing $r$ elements of $\{1,\ldots,d\}$,
$$Y = \sum_{q}(X) \;\Rightarrow\; \bar{X} = X^{[q_1,\ldots,q_r,\ldots]}, \quad y_{i_1\ldots i_{d-r}} = \sum_{j_1}\cdots\sum_{j_r} \bar{x}_{j_1,\ldots,j_r,i_1,\ldots,i_{d-r}}.$$
Partial summations provide a powerful primitive operation when coupled with transposition and reshape.
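In NumPy a partial sum is a reduction over the selected modes, e.g. $q = [1,3]$ for an order-3 tensor:

```python
import numpy as np

# Partial sum over modes q = [1, 3]: y_i = sum_{j1, j2} x_{j1, i, j2}.
X = np.random.rand(2, 3, 4)
Y = X.sum(axis=(0, 2))       # 0-based axes for q = [1, 3]
assert Y.shape == (3,)
```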


Tensor Trace

$Y$ of order $d-2r$ is a trace of $X$ of order $d \geq 2r$ if, for some $p, q$ each containing a different set of $r$ elements of $\{1,\ldots,d\}$,
$$Y = \operatorname{trace}_{p,q}(X) \;\Rightarrow\; \bar{X} = X^{[p_1,\ldots,p_r,q_1,\ldots,q_r,\ldots]}, \quad y_{i_1\ldots i_{d-2r}} = \sum_{j_1}\cdots\sum_{j_r} \bar{x}_{j_1,\ldots,j_r,j_1,\ldots,j_r,i_1,\ldots,i_{d-2r}}.$$
The trace of a matrix $A$ in this notation is
$$\operatorname{trace}(A) = \operatorname{trace}_{[1],[2]}(A) = \sum_i a_{ii}.$$
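Traces map directly onto repeated indices in `np.einsum` (a sketch):

```python
import numpy as np

# trace_{[1],[3]} of an order-4 tensor: y_{ik} = sum_j x_{j,i,j,k}.
X = np.random.rand(3, 4, 3, 5)
Y = np.einsum('jijk->ik', X)
assert Y.shape == (4, 5)

# Matrix trace as the special case trace_{[1],[2]}.
A = np.random.rand(3, 3)
assert np.isclose(np.einsum('ii->', A), np.trace(A))
```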


Tensor Contraction

Tensor contraction is a transpose of a trace of a tensor product:
$$C = \big(\operatorname{trace}_{p,q}(A \times B)\big)^{[r]} \quad \text{for some } p, q, r.$$
Examples in linear algebra include vector inner and outer products, the matrix–vector product, and the matrix–matrix product. The contracted modes of $A$ appear in $p$ and those of $B$ in $q$, while the uncontracted modes appear in $r$. Matrix multiplication is given by $p = [2]$, $q = [3]$, $r = [1,4]$.
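A sketch of matrix multiplication written this way, with the trace and transpose expressed via einsum:

```python
import numpy as np

# C = (trace_{[2],[3]}(A x B))^{[1,4]}, i.e. ordinary matrix multiplication.
A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
T = np.tensordot(A, B, axes=0)       # tensor product, modes (i, k, k', j)
C = np.einsum('ikkj->ij', T)         # trace modes 2 and 3, keep modes 1 and 4
assert np.allclose(C, A @ B)
```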


Tensor Times Matrix

Tensor times matrix (TTM) is one of the most common tensor contractions involving tensors of order $\geq 3$. Given an order-3 tensor $T$ and a matrix $V$, TTM computes an order-3 tensor $W$; it generalizes naturally to higher-order $T$. TTM can contract any one of the three modes of $T$:
$$W = \big(\operatorname{trace}_{[3],[4]}(T \times V)\big)^{[1,2,5]}, \quad W = \big(\operatorname{trace}_{[2],[4]}(T \times V)\big)^{[1,3,5]}, \quad \text{or} \quad W = \big(\operatorname{trace}_{[1],[4]}(T \times V)\big)^{[2,3,5]}.$$
In the first case, we have
$$w_{ijk} = \sum_l t_{ijl} v_{lk}.$$
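A NumPy sketch of the first case, together with the equivalent unfold–multiply–fold formulation:

```python
import numpy as np

# TTM contracting the third mode of T: w_{ijk} = sum_l t_{ijl} v_{lk}.
T = np.random.rand(2, 3, 4)
V = np.random.rand(4, 5)
W = np.einsum('ijl,lk->ijk', T, V)

# Equivalently: unfold T, do a matrix multiply, fold the result back.
W2 = (T.reshape(6, 4) @ V).reshape(2, 3, 5)
assert np.allclose(W, W2)
```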


Tensor Contraction Diagrams

Consider the tensor contraction
$$w_{ijk} = \sum_{l,m} t_{ijlm} v_{mkl},$$
which we can also write in tensor notation as
$$W = \big(\operatorname{trace}_{[3,4],[7,5]}(T \times V)\big)^{[1,2,6]},$$
or represent diagrammatically as a tensor network diagram.
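The same contraction as a one-line einsum (a sketch):

```python
import numpy as np

# w_{ijk} = sum_{l,m} t_{ijlm} v_{mkl}
T = np.random.rand(2, 3, 4, 5)
V = np.random.rand(5, 6, 4)
W = np.einsum('ijlm,mkl->ijk', T, V)
assert W.shape == (2, 3, 6)
```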


CP decomposition

The SVD corresponds to a sum of outer products of the form
$$A = \sum_{k=1}^{r} \sigma_k\, u_k v_k^T,$$
so it is natural to seek to approximate a tensor as
$$T = \sum_{k=1}^{r} \sigma_k\, u^{(1)}_k \times \cdots \times u^{(d)}_k,$$
where each $u^{(i)}_k$ is normalized, yielding the canonical polyadic (CP) decomposition of $T$. (Unlike in the SVD, the vectors $u^{(i)}_k$ cannot in general be chosen mutually orthogonal.) The smallest such $r$ is referred to as the canonical rank of the tensor.
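A sketch of assembling an order-3 tensor from CP factors; the factor matrices `U1, U2, U3` (whose columns are the $u^{(i)}_k$) are illustrative names:

```python
import numpy as np

# Rank-r CP reconstruction: t_{ijk} = sum_r sigma_r U1[i,r] U2[j,r] U3[k,r].
n1, n2, n3, r = 4, 5, 6, 3
sigma = np.random.rand(r)
U1, U2, U3 = (np.random.rand(n, r) for n in (n1, n2, n3))
T = np.einsum('r,ir,jr,kr->ijk', sigma, U1, U2, U3)
assert T.shape == (n1, n2, n3)
```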


Computing the CP decomposition

- Computing the canonical rank is NP-hard.
- Approximation by CP decomposition is ill-posed.
- Regularization (imposing bounds on the norms of the factor matrices) makes the optimization problem well-posed.
- Alternating least squares (ALS) is commonly used for computation (see the sketch below):
  - it optimizes one factor matrix at a time,
  - solving a linear least squares problem for each matrix.
- Alternatives include coordinate and gradient descent methods, much like in numerical optimization for matrix completion.
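A minimal sketch of one ALS sweep for an order-3 CP model, using the Gram-matrix identity $Z^T Z = (U^T U) * (V^T V)$ (elementwise product) to avoid forming the Khatri–Rao product $Z$ explicitly; the function name and rank-$r$ setup are illustrative:

```python
import numpy as np

# One ALS sweep for rank-r CP of an order-3 tensor T: each factor update
# solves an r x r linear system arising from a least squares problem.
def cp_als_sweep(T, U1, U2, U3):
    M = np.einsum('ijk,jr,kr->ir', T, U2, U3)   # unfolded T times Khatri-Rao
    G = (U2.T @ U2) * (U3.T @ U3)               # Gram matrix of Khatri-Rao
    U1 = np.linalg.solve(G, M.T).T
    M = np.einsum('ijk,ir,kr->jr', T, U1, U3)
    G = (U1.T @ U1) * (U3.T @ U3)
    U2 = np.linalg.solve(G, M.T).T
    M = np.einsum('ijk,ir,jr->kr', T, U1, U2)
    G = (U1.T @ U1) * (U2.T @ U2)
    U3 = np.linalg.solve(G, M.T).T
    return U1, U2, U3

T = np.random.rand(4, 5, 6)
r = 3
U1, U2, U3 = (np.random.rand(n, r) for n in T.shape)
for _ in range(20):
    U1, U2, U3 = cp_als_sweep(T, U1, U2, U3)
```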


Tucker Decomposition

The Tucker decomposition introduces an order-$d$ core tensor into the CP decomposition:
$$t_{i_1\ldots i_d} = \sum_{k_1,\ldots,k_d} s_{k_1\ldots k_d}\, w^{(1)}_{i_1 k_1} \cdots w^{(d)}_{i_d k_d},$$
where the columns of each $W^{(i)}$ are orthonormal. Unlike the CP decomposition (which is given by a 'diagonal' core tensor $S$), each index appears in no more than two tensors. The Tucker decomposition is not low-order, since the order of $S$ matches that of $T$.
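A sketch of reconstructing $T$ from a Tucker core and factor matrices:

```python
import numpy as np

# Tucker reconstruction: t_{ijk} = sum_{abc} s_{abc} W1[i,a] W2[j,b] W3[k,c].
(n1, n2, n3), (r1, r2, r3) = (4, 5, 6), (2, 3, 3)
S = np.random.rand(r1, r2, r3)
W1, W2, W3 = (np.random.rand(n, r) for n, r in ((n1, r1), (n2, r2), (n3, r3)))
T = np.einsum('abc,ia,jb,kc->ijk', S, W1, W2, W3)
assert T.shape == (n1, n2, n3)
```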


Computing the Tucker Decomposition

The higher-order SVD (HOSVD) can refer to the Tucker decomposition or to the following basic method for computing it (sketched in code below):
- Compute the left singular vectors $W^{(i)}$ of all $d$ single-mode unfoldings of $T$,
$$T_{(i)} = \operatorname{reshape}_{n_i \times n_1\cdots n_{i-1} n_{i+1}\cdots n_d}\big(T^{[i,1,\ldots,i-1,i+1,\ldots,d]}\big).$$
- Compute the core tensor by projecting $T$ onto these singular vectors along each mode,
$$S = \big(\operatorname{trace}_{[1,\ldots,d],[d+2,d+4,\ldots,3d]}\big(T \times W^{(1)T} \times \cdots \times W^{(d)T}\big)\big)^{[d+1,d+3,\ldots,3d-1]}.$$
The HOSVD works well when the core tensor $S$ can be shown to have decaying entries.
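A minimal HOSVD sketch for an order-3 tensor (the function name and truncation ranks are illustrative):

```python
import numpy as np

# HOSVD: factor W(i) from the SVD of each single-mode unfolding, then
# project T onto the factors along every mode to form the core S.
def hosvd3(T, ranks):
    W = []
    for i in range(T.ndim):
        Ti = np.moveaxis(T, i, 0).reshape(T.shape[i], -1)  # mode-i unfolding
        U, _, _ = np.linalg.svd(Ti, full_matrices=False)
        W.append(U[:, :ranks[i]])
    S = np.einsum('ijk,ia,jb,kc->abc', T, W[0], W[1], W[2])
    return S, W

T = np.random.rand(4, 5, 6)
S, W = hosvd3(T, (2, 3, 3))
T_approx = np.einsum('abc,ia,jb,kc->ijk', S, W[0], W[1], W[2])
```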


Tensor Train Decomposition

The tensor train decomposition has a simple diagrammatic representation: the tensor train is a chain of contracted order-3 tensors, with the two tensors at the ends having order 2. Elements of $T$ are given by a matrix product chain:
$$t_{i_1\ldots i_d} = u^{(i_1)}\, U^{(i_2)} \cdots U^{(i_{d-1})}\, u^{(i_d)},$$
where $u^{(i_1)}$ is a row vector, $u^{(i_d)}$ a column vector, and each $U^{(i_j)}$ a matrix. The format has been used for decades in physics, where it is known as the matrix product states (MPS) representation.
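A sketch of evaluating a single element of a TT representation as a matrix product chain (core shapes follow a common TT convention; all names are illustrative):

```python
import numpy as np

# TT cores: first (n, r); middle (r, n, r); last (r, n).
n, r, d = 4, 3, 5
first = np.random.rand(n, r)
middle = [np.random.rand(r, n, r) for _ in range(d - 2)]
last = np.random.rand(r, n)

def tt_element(idx):
    v = first[idx[0], :]                 # row vector u^{(i_1)}
    for j, core in enumerate(middle):
        v = v @ core[:, idx[j + 1], :]   # multiply by matrix U^{(i_j)}
    return v @ last[:, idx[-1]]          # finish with column vector u^{(i_d)}

print(tt_element((0, 1, 2, 3, 0)))
```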


Tensor Train Properties

- The tensor train (TT) ranks are given by the dimensions of the auxiliary modes in the factorization.
- The TT rank is the maximum number of columns in any matrix $U^{(i_j)}$.
- The TT rank is a matrix rank, i.e., it corresponds to a low-rank decomposition of a matrix (given by contracting the two parts of the tensor train on either side of a split).
- Sums and products of tensor trains can be readily computed.


Quantized Tensor Train

- The quantized tensor train (QTT) corresponds to the application of TT to a tensor that is reshaped to be higher-order, e.g., so that each resulting mode dimension is constant (see the reshape sketch below).
- For some classes of matrices, analytic QTT decompositions are known:
  - Toeplitz matrices have constant TT rank;
  - the 3D Poisson operator has constant TT rank, and more generally so do FEM methods with simple mass and stiffness matrices.
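The quantization step itself is just a reshape, e.g. for a vector of length $2^k$:

```python
import numpy as np

# Quantize a length-2^k vector into an order-k tensor with all modes of
# dimension 2; TT applied to this reshaping gives the QTT format.
k = 10
x = np.random.rand(2**k)
X = x.reshape((2,) * k)
assert X.ndim == k
```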


Computing the Tensor Train Decomposition

- A product of a matrix with constant QTT rank and a vector of dimension $n$ has cost $\Theta(n \log n)$.
- Given a general vector, one can compute its TT decomposition by hierarchical SVD (sketched below); faster algorithms leveraging the low rank of the SVD are possible.
- An order-$d$ tensor with dimensions equal to $n$ and TT rank $r$ can be interpolated with cost $O(dnr^2)$.
- A cross approximation, i.e., a lower-rank approximation to an existing TT approximation, can be obtained efficiently; for an order-$d$ tensor with dimensions $n$ and TT rank $r$, the cost is $O(dnr^3)$.
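A minimal sketch of the hierarchical (TT-)SVD with a fixed truncation rank (the function name and the `rmax` truncation are illustrative, and row-major unfoldings are assumed):

```python
import numpy as np

# TT-SVD sweep: peel off one mode at a time with a truncated SVD,
# carrying the remainder s * V^T to the right.
def tt_svd(T, rmax):
    dims = T.shape
    cores, r_prev, M = [], 1, T
    for n in dims[:-1]:
        M = M.reshape(r_prev * n, -1)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        rank = min(rmax, len(s))
        cores.append(U[:, :rank].reshape(r_prev, n, rank))
        M = s[:rank, None] * Vt[:rank]
        r_prev = rank
    cores.append(M.reshape(r_prev, dims[-1], 1))
    return cores

cores = tt_svd(np.random.rand(4, 4, 4, 4), rmax=5)
```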


Strassen’s Algorithm

Strassen's algorithm:
$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \cdot \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$$
$$\begin{aligned}
M_1 &= (A_{11} + A_{22}) \cdot (B_{11} + B_{22}) & C_{11} &= M_1 + M_4 - M_5 + M_7 \\
M_2 &= (A_{21} + A_{22}) \cdot B_{11} & C_{21} &= M_2 + M_4 \\
M_3 &= A_{11} \cdot (B_{12} - B_{22}) & C_{12} &= M_3 + M_5 \\
M_4 &= A_{22} \cdot (B_{21} - B_{11}) & C_{22} &= M_1 - M_2 + M_3 + M_6 \\
M_5 &= (A_{11} + A_{12}) \cdot B_{22} \\
M_6 &= (A_{21} - A_{11}) \cdot (B_{11} + B_{12}) \\
M_7 &= (A_{12} - A_{22}) \cdot (B_{21} + B_{22})
\end{aligned}$$
Minimizing the number of products minimizes the number of recursive calls:
$$T(n) = 7\,T(n/2) + O(n^2) = O(7^{\log_2 n}) = O(n^{\log_2 7}).$$
For convolution, the DFT matrix reduces the naive $O(n^2)$ products to $O(n)$; both of these are bilinear algorithms.
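A direct recursive sketch (assumes $n$ is a power of two; the cutoff for falling back to classical multiplication is illustrative):

```python
import numpy as np

# Recursive Strassen multiplication of n x n matrices, n a power of 2.
def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                       # classical base case
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    return np.block([[M1 + M4 - M5 + M7, M3 + M5],
                     [M2 + M4,           M1 - M2 + M3 + M6]])

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
assert np.allclose(strassen(A, B), A @ B)
```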


Bilinear Algorithms

Definition (bilinear algorithm, V. Pan, 1984). A bilinear algorithm $\Lambda = (F^{(A)}, F^{(B)}, F^{(C)})$ computes
$$c = F^{(C)}\big[(F^{(A)T} a) \odot (F^{(B)T} b)\big],$$
where $a$ and $b$ are inputs and $\odot$ is the Hadamard (pointwise) product.
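As an illustration, Strassen's algorithm for $2\times 2$ matrices fits this form with $4\times 7$ factor matrices; the matrices below were read off the formulas on the previous slide, with vec taken column-major (a sketch, not notation from the slides):

```python
import numpy as np

# Rows of FA/FB/FC index vec(A)/vec(B)/vec(C) = [x11, x21, x12, x22];
# columns index the 7 products M1..M7 of Strassen's algorithm.
FA = np.array([[1, 0, 1, 0, 1, -1, 0],
               [0, 1, 0, 0, 0, 1, 0],
               [0, 0, 0, 0, 1, 0, 1],
               [1, 1, 0, 1, 0, 0, -1]])
FB = np.array([[1, 1, 0, -1, 0, 1, 0],
               [0, 0, 0, 1, 0, 0, 1],
               [0, 0, 1, 0, 0, 1, 0],
               [1, 0, -1, 0, 1, 0, 1]])
FC = np.array([[1, 0, 0, 1, -1, 0, 1],
               [0, 1, 0, 1, 0, 0, 0],
               [0, 0, 1, 0, 1, 0, 0],
               [1, -1, 1, 0, 0, 1, 0]])

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
a, b = A.reshape(-1, order='F'), B.reshape(-1, order='F')
c = FC @ ((FA.T @ a) * (FB.T @ b))      # c = F_C[(F_A^T a) . (F_B^T b)]
assert np.allclose(c.reshape(2, 2, order='F'), A @ B)
```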


Bilinear Algorithms as Tensor Factorizations

A bilinear algorithm corresponds to a CP tensor decomposition:
$$c_i = \sum_{r=1}^{R} f^{(C)}_{ir}\Big(\sum_j f^{(A)}_{jr} a_j\Big)\Big(\sum_k f^{(B)}_{kr} b_k\Big)
= \sum_j \sum_k \Big(\sum_{r=1}^{R} f^{(C)}_{ir} f^{(A)}_{jr} f^{(B)}_{kr}\Big) a_j b_k
= \sum_j \sum_k t_{ijk}\, a_j b_k,$$
where
$$t_{ijk} = \sum_{r=1}^{R} f^{(C)}_{ir} f^{(A)}_{jr} f^{(B)}_{kr}.$$
- For multiplication of $n \times n$ matrices, $T$ is $n^2 \times n^2 \times n^2$.
- The classical algorithm has rank $R = n^3$.
- Strassen's algorithm has rank $R \approx n^{\log_2 7}$.
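A sketch of the $2\times 2$ matrix multiplication tensor and the contraction $c_i = \sum_{j,k} t_{ijk} a_j b_k$; the Strassen factors $F^{(A)}, F^{(B)}, F^{(C)}$ from the previous sketch give an exact rank-7 CP factorization of this same tensor:

```python
import numpy as np

# The 2x2 matrix multiplication tensor (classical rank 8 = n^3), with
# vec(A), vec(B), vec(C) in column-major order.
T = np.zeros((4, 4, 4))
for i in range(2):
    for j in range(2):
        for k in range(2):
            T[i + 2*j, i + 2*k, k + 2*j] = 1   # c_{ij} += a_{ik} b_{kj}

A, B = np.random.rand(2, 2), np.random.rand(2, 2)
a, b = A.reshape(-1, order='F'), B.reshape(-1, order='F')
c = np.einsum('ijk,j,k->i', T, a, b)
assert np.allclose(c.reshape(2, 2, order='F'), A @ B)
# np.einsum('ir,jr,kr->ijk', FC, FA, FB) with the Strassen factors from the
# previous sketch reproduces this same T with rank 7.
```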


References

Golub, Gene H., and Charles F. Van Loan. Matrix Computations. Vol. 3. JHU Press, 2012.

Kolda, Tamara G., and Brett W. Bader. Tensor decompositions and applications. SIAM Review 51.3 (2009): 455–500.

Khoromskij, Boris N. Tensors-structured numerical methods in scientific computing: survey on recent advances. Chemometrics and Intelligent Laboratory Systems 110.1 (2012): 1–19.

Grasedyck, Lars, Daniel Kressner, and Christine Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen 36.1 (2013): 53–78.


Kazeev, Vladimir A., and Boris N. Khoromskij. Low-rank explicit QTT representation of the Laplace operator and its inverse. SIAM Journal on Matrix Analysis and Applications 33.3 (2012): 742–758.

Khoromskaia, Venera, Boris Khoromskij, and Reinhold Schneider. QTT representation of the Hartree and exchange operators in electronic structure calculations. Computational Methods in Applied Mathematics 11.3 (2011): 327–341.

Khoromskij, Boris N., and Ivan V. Oseledets. QTT approximation of elliptic solution operators in higher dimensions. Russian Journal of Numerical Analysis and Mathematical Modelling 26.3 (2011): 303–322.


De Silva, Vin, and Lek-Heng Lim. Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM Journal on Matrix Analysis and Applications 30.3 (2008): 1084–1127.

Pan, Victor. How can we speed up matrix multiplication? SIAM Review 26.3 (1984): 393–415.

Karlsson, Lars, Daniel Kressner, and André Uschmajew. Parallel algorithms for tensor completion in the CP format. Parallel Computing 57 (2016): 222–234.
