Communication avoiding low rank matrix approximation, a unified perspective on deterministic and randomized approaches

L. Grigori and collaborators
Alpines, Inria Paris and LJLL, Sorbonne University

Slides available at https://who.rocq.inria.fr/Laura.Grigori/Slides/ENLA20_Grigori.pdf

July 8, 2020

Plan

Motivation of our work
Unified perspective on low rank matrix approximation
  Generalized LU decomposition
Recent deterministic algorithms and bounds
  CA RRQR with 2D tournament pivoting
  CA LU with column/row tournament pivoting
Randomized generalized LU and bounds
Approximation of tensors
  Parallel HORRQR
Conclusions

Motivation of our work

The communication challenge

The cost of data movement dominates the cost of arithmetic, in both time and energy consumption.

Per-socket flop performance continues to increase, through more cores per socket and/or more flops per cycle:
  2008: Intel Nehalem, 3.2 GHz × 4 cores (51.2 GFlops/socket DP)
  2020: A64FX, 2.2 GHz × 48 cores (3.37 TFlops/socket DP)¹
  a 66x increase in 12 years

Interconnect latency: a few µs of MPI latency.

Our focus: increasing scalability by reducing/minimizing communication while controlling the loss of information in low rank matrix (and tensor) approximation.

¹ Fugaku supercomputer, https://www.top500.org/system/179807/

Unified perspective on low rank matrix approximation

Low rank matrix approximation

Problem: given A ∈ R^{m×n}, compute a rank-k approximation ZW^T, where Z ∈ R^{m×k} and W^T ∈ R^{k×n}.

The problem is ubiquitous in scientific computing and data analysis: column subset selection, linear dependency analysis, fast solvers for integral equations, H-matrices, principal component analysis, image processing, data in high dimensions, ...

Low rank matrix approximation

The best rank-k approximation A_{opt,k} = Û_k Σ_k V̂_k^T is the rank-k truncated SVD of A [Eckart and Young, 1936]. With σ_max(A) = σ_1(A) ≥ ... ≥ σ_min(A) = σ_{min(m,n)}(A),

  min_{rank(Ã_k) ≤ k} ||A − Ã_k||_2 = ||A − A_{opt,k}||_2 = σ_{k+1}(A)

  min_{rank(Ã_k) ≤ k} ||A − Ã_k||_F = ||A − A_{opt,k}||_F = sqrt( Σ_{j=k+1}^{n} σ_j^2(A) )

[Figure: an image of size 1190 × 1920, its rank-10 approximation by SVD, and its rank-50 approximation by SVD.]

Image source: https://pixabay.com/photos/billiards-ball-play-number-half-4345870/
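As a concrete check of these equalities, here is a minimal numpy sketch (an illustration, not from the slides) that forms the truncated SVD and verifies both error formulas:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 80)) @ rng.standard_normal((80, 60))
k = 10

# truncated SVD: the best rank-k approximation A_opt_k
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_opt_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: the 2-norm error is sigma_{k+1}(A); the Frobenius error is
# the root of the sum of the trailing squared singular values
assert np.isclose(np.linalg.norm(A - A_opt_k, 2), s[k])
assert np.isclose(np.linalg.norm(A - A_opt_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
```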

Low rank matrix approximation: trade-offs

Communication optimal: computing a rank-k approximation on P processors requires # messages = Ω(log_2 P).


Idea underlying many algorithms

Compute Ã_k = PA, where P = P_o or P = P_so is obtained as follows:

1. Construct a low dimensional subspace X = range(AV_1), V_1 ∈ R^{n×l}, that approximates well the range of A, e.g. ||A − P_o A||_2 ≤ γ σ_{k+1}(A) for some γ ≥ 1, where Q_1 is an orthonormal basis of AV_1 and

   P_o = AV_1 (AV_1)^+ = Q_1 Q_1^T,   or equivalently   P_o a_j := arg min_{x ∈ X} ||x − a_j||_2

2. Select a semi-inner product ⟨U_1 ·, U_1 ·⟩_2, with U_1 ∈ R^{l'×m}, l' ≥ l, and define

   P_so = AV_1 (U_1 A V_1)^+ U_1,   or equivalently   P_so a_j := arg min_{x ∈ X} ||U_1(x − a_j)||_2

with O. Balabanov, 2020
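A minimal numpy sketch of the two projections, assuming Gaussian sampling matrices V_1 and U_1 for illustration (the slides leave their construction open at this point):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k, l, lp = 200, 150, 10, 20, 40            # k <= l <= l'
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k matrix

V1 = rng.standard_normal((n, l))                 # sampling matrix
U1 = rng.standard_normal((lp, m))                # sketching matrix, l' x m

Q1, _ = np.linalg.qr(A @ V1)                     # orthonormal basis of range(A V1)
Po_A = Q1 @ (Q1.T @ A)                           # Po A = Q1 Q1^T A

Pso_A = (A @ V1) @ np.linalg.pinv(U1 @ A @ V1) @ (U1 @ A)   # Pso A

print(np.linalg.norm(A - Po_A, 2), np.linalg.norm(A - Pso_A, 2))
```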


Generalized LU decomposition

Unified perspective: generalized LU factorization

Given A ∈ R^{m×n}, U = [U_1; U_2] ∈ R^{m×m}, V = [V_1, V_2] ∈ R^{n×n}, with U, V invertible, U_1 ∈ R^{l'×m}, V_1 ∈ R^{n×l}, k ≤ l ≤ l':

  UAV = Ā = [ Ā_11  Ā_12 ] = [ I               ] [ Ā_11  Ā_12    ]
            [ Ā_21  Ā_22 ]   [ Ā_21 Ā_11^+   I ] [       S(Ā_11) ]

where Ā_11 ∈ R^{l'×l}, Ā_11^+ Ā_11 = I, and S(Ā_11) = Ā_22 − Ā_21 Ā_11^+ Ā_12.

Generalized LU computes the approximation

  Ã_glu = U^{−1} [ I; Ā_21 Ā_11^+ ] [ Ā_11  Ā_12 ] V^{−1}
        = [ U_1^+ (I − (U_1 A V_1)(U_1 A V_1)^+) + (A V_1)(U_1 A V_1)^+ ] [ U_1 A ]

with J. Demmel and A. Rusciano, 2019

Given U_1, A, V_1, and Q_1 an orthonormal basis of AV_1, with k ≤ l < l', the rank-k approximation is

  Ã_glu = [ U_1^+ (I − (U_1 A V_1)(U_1 A V_1)^+) + (A V_1)(U_1 A V_1)^+ ] [ U_1 A ]

This unifies many existing algorithms:

  If k ≤ l = l' and U_1 = Q_1^T, then Ã_glu = Q_1 Q_1^T A = P_o A.
  If k ≤ l = l', then Ã_glu = A V_1 (U_1 A V_1)^{−1} U_1 A = P_so A.

Approximation result: if k ≤ l < l',

  ||A − P_so A||_F^2 = ||A − Ã_glu||_F^2 + ||Ã_glu − P_so A||_F^2
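A minimal numpy sketch of Ã_glu and of the Pythagorean identity above, again assuming Gaussian U_1 and V_1 for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, l, lp = 200, 150, 10, 15, 30            # k <= l < l'
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
V1 = rng.standard_normal((n, l))
U1 = rng.standard_normal((lp, m))

B = U1 @ A @ V1                                  # the l' x l sketch U1 A V1
Bp = np.linalg.pinv(B)
A_glu = (np.linalg.pinv(U1) @ (np.eye(lp) - B @ Bp) + (A @ V1) @ Bp) @ (U1 @ A)
Pso_A = (A @ V1) @ Bp @ (U1 @ A)

lhs = np.linalg.norm(A - Pso_A, 'fro') ** 2
rhs = np.linalg.norm(A - A_glu, 'fro') ** 2 + np.linalg.norm(A_glu - Pso_A, 'fro') ** 2
print(lhs, rhs)                                  # the two sides agree up to rounding
```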


Desired properties of low rank matrix approximation

1. Ã_k is a (k, γ) low-rank approximation of A if ||A − Ã_k||_2 ≤ γ σ_{k+1}(A) for some γ ≥ 1.
   → Focus of both deterministic and randomized approaches

2. Ã_k is (k, γ) spectrum preserving if 1 ≤ σ_i(A) / σ_i(Ã_k) ≤ γ for all i = 1, ..., k and some γ ≥ 1.
   → Focus of deterministic approaches

3. Ã_k is a (k, γ) kernel approximation of A if 1 ≤ σ_j(A − Ã_k) / σ_{k+j}(A) ≤ γ for all j = 1, ..., min(m, n) − k and some γ ≥ 1.
   → Focus of deterministic approaches

Goal: γ is a low degree polynomial in k and the number of columns and/or rows of A for some of the most efficient algorithms. (A numerical check of these properties is sketched below.)
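A small helper (an illustration, not from the slides) that tests the three properties numerically for a candidate approximation Ak:

```python
import numpy as np

def check_properties(A, Ak, k, gamma):
    """Check the (k, gamma) low-rank, spectrum preserving and kernel
    approximation properties, up to floating point error."""
    sA = np.linalg.svd(A, compute_uv=False)
    sAk = np.linalg.svd(Ak, compute_uv=False)
    sR = np.linalg.svd(A - Ak, compute_uv=False)
    r = len(sA) - k                                 # min(m, n) - k
    low_rank = np.linalg.norm(A - Ak, 2) <= gamma * sA[k]
    spectrum = np.all(sA[:k] / sAk[:k] <= gamma)    # the ratio is >= 1 up to rounding
    kernel = np.all(sR[:r] / sA[k:] <= gamma)
    return low_rank, spectrum, kernel
```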


Recent deterministic algorithms and bounds

Strong rank revealing QR (RRQR) factorization

Given A ∈ R^{m×n}, consider the QR decomposition with column pivoting (QRCP), with R_11 ∈ R^{k×k} [Golub, 1965; Businger and Golub, 1965]:

  AV = QR = [ Q_1  Q_2 ] [ R_11  R_12 ]
                         [       R_22 ]

  Ã_qr = Q_1 [ R_11  R_12 ] V^{−1} = Q_1 Q_1^T A = P_o A

[Gu and Eisenstat, 1996] show that given k and f, there exists a permutation V ∈ R^{n×n} such that the factorization satisfies

  1 ≤ σ_i(A)/σ_i(R_11),  σ_j(R_22)/σ_{k+j}(A) ≤ γ(n, k),   γ(n, k) = sqrt(1 + f^2 k(n − k)),
  ||R_11^{−1} R_12||_max ≤ f

for 1 ≤ i ≤ k and 1 ≤ j ≤ min(m, n) − k, and σ_j(R_22) = σ_j(A − Ã_qr).

Cost: 4mnk flops (QRCP) plus O(mnk) flops and O(k log_2 P) messages.

→ Ã_qr with strong RRQR is a (k, γ(n, k)) spectrum preserving and kernel approximation of A.
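A minimal sketch using LAPACK's QR with column pivoting through SciPy, as a stand-in for strong RRQR (QRCP yields the same factorization shape, though without the Gu-Eisenstat f-bound guarantee):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 80)) @ rng.standard_normal((80, 60))
k = 10

Q, R, piv = qr(A, mode='economic', pivoting=True)   # A[:, piv] = Q R
Q1 = Q[:, :k]
A_qr = Q1 @ (Q1.T @ A)                              # Po A, a rank-k approximation

s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - A_qr, 2) / s[k])           # the gamma actually achieved
```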

Deterministic column selection: tournament pivoting

1D tournament pivoting (1Dc-TP). Partition A into 1D column blocks and select k columns from each block with strong RRQR:

  ( A_11  A_12  A_13  A_14 ),  with A_11 = Q_00 R_00 V_00^T, A_12 = Q_10 R_10 V_10^T, A_13 = Q_20 R_20 V_20^T, A_14 = Q_30 R_30 V_30^T
  → selected column sets I_00, I_10, I_20, I_30

Then use a reduction tree to select k columns from each set of 2k candidate columns:

  A(:, I_00 ∪ I_10) = Q_01 R_01 V_01^T → I_01,   A(:, I_20 ∪ I_30) = Q_11 R_11 V_11^T → I_11
  A(:, I_01 ∪ I_11) = Q_02 R_02 V_02^T → I_02

Return the selected columns A(:, I_02).

[Demmel, LG, Gu, Xiang'15]
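A serial sketch of 1D tournament pivoting (QRCP at every tree node as a weaker stand-in for strong RRQR; in the parallel algorithm each leaf lives on a different processor):

```python
import numpy as np
from scipy.linalg import qr

def select_k(A, cols, k):
    """Select k column indices from the candidate set `cols` by QRCP."""
    _, _, piv = qr(A[:, cols], mode='economic', pivoting=True)
    return [cols[p] for p in piv[:k]]

def tournament_pivoting_1d(A, k, nblocks):
    blocks = np.array_split(np.arange(A.shape[1]), nblocks)
    cand = [select_k(A, list(b), k) for b in blocks]       # leaves
    while len(cand) > 1:                                   # binary reduction tree
        cand = [select_k(A, cand[i] + cand[i + 1], k) if i + 1 < len(cand)
                else cand[i] for i in range(0, len(cand), 2)]
    return cand[0]

rng = np.random.default_rng(4)
A = rng.standard_normal((60, 64))
print(tournament_pivoting_1d(A, k=4, nblocks=4))           # 4 selected column indices
```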


Tournament pivoting for 1D row partitioning (1Dr-TP)

Partition A into row blocks and select k columns from each block with strong RRQR:

  A = [ A_11; A_21; A_31; A_41 ],  with A_11 = Q_00 R_00 V_00^{−1}, A_21 = Q_10 R_10 V_10^{−1}, A_31 = Q_20 R_20 V_20^{−1}, A_41 = Q_30 R_30 V_30^{−1}
  → selected column sets I_00, I_10, I_20, I_30

Apply 1D-TP on sets of 2k sub-columns:

  [ A_11; A_21 ](:, I_00 ∪ I_10) = Q_01 R_01 V_01^{−1} → I_01
  [ A_31; A_41 ](:, I_20 ∪ I_30) = Q_11 R_11 V_11^{−1} → I_11
  A(:, I_01 ∪ I_11) = Q_02 R_02 V_02^{−1} → I_02

Return the columns A(:, I_02).

with M. Beaupère, Inria


CA-RRQR: 2D tournament pivoting

A is distributed on a P_r × P_c processor grid, e.g.

  A = [ A_11  A_12  A_13  A_14 ]
      [ A_21  A_22  A_23  A_24 ]

Select k columns from each column block by 1Dr-TP:

  [ A_11; A_21 ], [ A_12; A_22 ], [ A_13; A_23 ], [ A_14; A_24 ]  →  I_00, I_10, I_20, I_30

Apply 1Dc-TP on the sets of k selected columns A(:, I_00), A(:, I_10), A(:, I_20), A(:, I_30), and return the columns selected by 1Dc-TP, A(:, I_02).

with M. Beaupère, Inria
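A rough serial sketch of the 2D scheme (per column block, a single QRCP stands in for the 1Dr-TP stage, and the final 1Dc-TP reduction is written as a flat tree for brevity):

```python
import numpy as np
from scipy.linalg import qr

def select_k(A, cols, k):
    _, _, piv = qr(A[:, cols], mode='economic', pivoting=True)
    return [cols[p] for p in piv[:k]]

def tournament_pivoting_2d(A, k, pc):
    col_blocks = np.array_split(np.arange(A.shape[1]), pc)
    # k candidates per column block (the parallel code runs 1Dr-TP over
    # the Pr row blocks here)
    cand = [select_k(A, list(b), k) for b in col_blocks]
    sel = cand[0]
    for c in cand[1:]:                     # 1Dc-TP reduction over the candidates
        sel = select_k(A, sel + c, k)
    return sel

rng = np.random.default_rng(5)
A = rng.standard_normal((60, 64))
print(tournament_pivoting_2d(A, k=4, pc=4))
```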


CA-RRQR: bounds for 2D tournament pivoting

Bounds when selecting k columns from A ∈ R^{m×n} distributed on P = P_r × P_c processors by 2D tournament pivoting:

  1 ≤ σ_i(A)/σ_i(R_11),  σ_j(R_22)/σ_{k+j}(A) ≤ γ_1(n, k),   γ_1(n, k) = sqrt(1 + F_{2D-TP}^2 (n − k)),
  ||(R_11^{−1} R_12)(:, l)||_2 ≤ F_{2D-TP}

for 1 ≤ i ≤ k, 1 ≤ j ≤ min(m, n) − k, 1 ≤ l ≤ n − k.

For 1Dr-TP with a binary tree of depth log_2 P_r followed by 1Dc-TP with a binary tree of depth log_2 P_c,

  F_{2D-TP} ≤ P k^{log_2 P + 1/2} f^{log_2 P_c + 1}

Cost: O(mnk/P) flops, (1 + log_2 P_r) log_2 P messages, O((mk/P_r) log_2 P_c) words.

→ Ã_qr with 2D TP is a (k, γ_1(n, k)) spectrum preserving and kernel approximation of A.


Numerical experiments

[Figure: the original image of size 1190 × 1920; the singular values of the SVD and RRQR approximations, and the error ratio 2D TP / SVD as a function of the approximation rank, for the billiard matrix; rank-10 and rank-50 approximations computed with 2D TP on 8 × 8 processors.]

Image source: https://pixabay.com/photos/billiards-ball-play-number-half-4345870/

LU_CRTP: LU with column/row tournament pivoting

Compute a rank-k approximation Ã_lu of A ∈ R^{m×n}, with k = l = l':

  Ã_lu = [ Ā_11; Ā_21 ] Ā_11^{−1} [ Ā_11  Ā_12 ] = A V_1 (U_1 A V_1)^{−1} U_1 A = P_so A    (1)

1. Select k columns by using TP; the bounds for the singular values are governed by γ_1(n, k):

   AV = Q [ R_11  R_12; R_22 ] = [ Q_1  Q_2 ] [ R_11  R_12; R_22 ]

2. Select k rows from Q_1 ∈ R^{m×k} by using TP,

   U_1 Q_1 = [ Q̄_11; Q̄_21 ],  hence  Ā_11 = Q̄_11 R_11,

   such that ||Q̄_21 Q̄_11^{−1}||_max is bounded and the bounds for the singular values are governed by γ_2(m, k):

   1/γ_2(m, k) ≤ σ_i(Q̄_11) ≤ 1.

with S. Cayrols, J. Demmel, 2018
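A minimal serial sketch of the two-step selection (QRCP standing in for tournament pivoting at both steps), forming the approximation of Eq. (1):

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(6)
A = rng.standard_normal((100, 80)) @ rng.standard_normal((80, 60))
k = 10

_, _, colp = qr(A, mode='economic', pivoting=True)     # 1. column selection
J = colp[:k]
Q1, _ = np.linalg.qr(A[:, J])                          # m x k orthonormal basis

_, _, rowp = qr(Q1.T, mode='economic', pivoting=True)  # 2. row selection on Q1
I = rowp[:k]

# A_lu = A(:, J) * A(I, J)^{-1} * A(I, :)
A_lu = A[:, J] @ np.linalg.solve(A[np.ix_(I, J)], A[I, :])
s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - A_lu, 2) / s[k])              # the gamma actually achieved
```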

Deterministic guarantees for rank-k approximation

For LU_CRTP with column/row selection by binary tree tournament pivoting:

  1 ≤ σ_i(A)/σ_i(Ā_11),  σ_j(S(Ā_11))/σ_{k+j}(A)
    ≤ sqrt(1 + F_TP^2 (n − k)) / σ_min(Q̄_11)
    ≤ sqrt(1 + F_TP^2 (n − k)) · sqrt(1 + F_TP^2 (m − k))
    = γ_1(n, k) γ_2(m, k)

for any 1 ≤ i ≤ k and 1 ≤ j ≤ min(m, n) − k, where U_1 Q_1 = [ Q̄_11; Q̄_21 ] and σ_j(A − Ã_lu) = σ_j(S(Ā_11)).

→ Ã_lu is a (k, γ_1(n, k) γ_2(m, k)) spectrum preserving and kernel approximation of A.

Performance results

Selection of 256 columns by tournament pivoting, on Edison, a Cray XC30 (NERSC), with 2 × 12-core Intel Ivy Bridge (2.4 GHz) per node. Tournament pivoting uses SPQR (T. Davis) + dGEQP3 (LAPACK); times in seconds.

Matrices (dimension, and dimension at the leaves on 32 processors):
  Parab_fem: 528825 × 528825, leaves 528825 × 16432
  Mac_econ:  206500 × 206500, leaves 206500 × 6453

             Time       Time at leaves, 32 procs   Time for P MPI processes
             (2k cols)  (SPQR + dGEQP3)            P=16   32    64     128   256   512  1024
  Parab_fem  0.26       0.26 + 1129                46.7   24.5  13.7   8.4   5.9   4.8  4.4
  Mac_econ   0.46       25.4 + 510                 132.7  86.3  111.4  59.6  27.2  −    −

Randomized generalized LU and bounds
Typical randomized SVD

1. Compute an approximate basis for the range of A ∈ R^{m×n}:
   Sample V_1 ∈ R^{n×l}, l = p + k, with independent mean-zero, unit-variance Gaussian entries.
   Compute Y = A V_1, Y ∈ R^{m×l}, expected to span the column space of A.
   Cost of multiplying A V_1: 2mnl flops.

2. With Q_1 an orthonormal basis of Y, approximate A as

   Ã_k = Q_1 Q_1^T A = P_o A

   Cost of multiplying Q_1^T A: 2mnl flops.

Source: Halko et al., Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, SIREV 2011.
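A minimal sketch of this two-step scheme (Gaussian V_1 with oversampling, no power iterations):

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    V1 = rng.standard_normal((n, k + p))      # Gaussian sampling matrix, l = k + p
    Y = A @ V1                                # 2mnl flops; spans range(A) w.h.p.
    Q1, _ = np.linalg.qr(Y)                   # orthonormal basis of Y
    B = Q1.T @ A                              # 2mnl flops; l x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q1 @ Ub[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(7)
A = rng.standard_normal((300, 200)) @ rng.standard_normal((200, 150))
U, s, Vt = randomized_svd(A, k=20)
print(np.linalg.norm(A - U @ np.diag(s) @ Vt, 2))
```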

Cost of randomized SVD for dense matrices

→ To achieve lower arithmetic complexity than deterministic approaches, the costs of multiplying A V_1 and Q_1^T A need to be reduced:

1. Take V_1 to be a fast Johnson-Lindenstrauss transform, e.g. a subsampled randomized Hadamard transform (SRHT); then A V_1 costs 2mn log_2(l + 1) flops.
   References: Ailon and Chazelle '06; Liberty, Rokhlin, Tygert and Woolfe '06; Sarlos '06.

2. Use a different projector than P_o, e.g. pick U_1 and compute

   Ã_k = P_so A = A V_1 (U_1 A V_1)^+ U_1 A

   Examples: randomized SVD via row extraction; Clarkson and Woodruff's approximation in input sparsity time.
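A minimal sketch of an SRHT sketching matrix U_1 = sqrt(n/l) S H D, assuming n is a power of two (the explicit Hadamard matrix is used for clarity; fast implementations apply H in O(n log n)):

```python
import numpy as np
from scipy.linalg import hadamard

def srht(n, l, seed=0):
    rng = np.random.default_rng(seed)
    D = rng.choice([-1.0, 1.0], size=n)             # random sign flips
    H = hadamard(n) / np.sqrt(n)                    # orthonormal Hadamard matrix
    rows = rng.choice(n, size=l, replace=False)     # uniform subsampling S
    return np.sqrt(n / l) * H[rows, :] * D          # l x n sketching matrix

U1 = srht(n=256, l=32)
Q1, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((256, 8)))
print(np.linalg.svd(U1 @ Q1, compute_uv=False))     # all close to 1
```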


Unified perspective: generalized LU factorization

Given U_1, A, V_1, and Q_1 an orthonormal basis of AV_1, with k ≤ l = l', the rank-k approximation is Ã_k = A V_1 (U_1 A V_1)^{−1} U_1 A = P_so A.

Deterministic algorithms (V_1 a column permutation and ...):

  QR with column selection (a.k.a. strong rank revealing QR):
    U_1 = Q_1^T, Ã_k = Q_1 Q_1^T A = P_o A, with ||R_11^{−1} R_12||_max bounded.

  LU with column/row selection (a.k.a. rank revealing LU):
    U_1 a row permutation s.t. U_1 Q_1 = [ Q̄_11; Q̄_21 ], with ||Q̄_21 Q̄_11^{−1}||_max bounded.

Randomized algorithms* (V_1 a random matrix and ...):

  Randomized QR (a.k.a. randomized SVD):
    U_1 = Q_1^T, Ã_k = Q_1 Q_1^T A = P_o A.

  Randomized LU with row selection (a.k.a. randomized SVD via row extraction):
    U_1 a row permutation s.t. U_1 Q_1 = [ Q̄_11; Q̄_21 ], with ||Q̄_21 Q̄_11^{−1}||_max bounded.

  Randomized LU approximation:
    U_1 a random matrix (with J. Demmel, A. Rusciano).

* For a review, see Halko et al., SIAM Review 2011.


Given U_1, A, V_1, and Q_1 an orthonormal basis of AV_1, with k ≤ l < l', the rank-k approximation is

  Ã_glu = U^{−1} [ I; Ā_21 Ā_11^+ ] [ Ā_11  Ā_12 ] V^{−1}
        = [ U_1^+ (I − (U_1 A V_1)(U_1 A V_1)^+) + (A V_1)(U_1 A V_1)^+ ] [ U_1 A ]

Approximation result: when k ≤ l < l', the approximation Ã_glu is more accurate than P_so A:

  ||A − P_so A||_F^2 = ||A − Ã_glu||_F^2 + ||Ã_glu − P_so A||_F^2

Deterministic guarantee: let AV = QR = [ Q_1  Q_2 ] [ R_11  R_12; R_22 ]; then

  σ_j(A − P_o A) = σ_j(R_22)
  σ_j^2(A − Ã_glu) ≤ σ_j^2(R_22) + ||(U_1 Q_1)^+ (U_1 Q_2)(R_22 − (R_22)_{opt,j−1})||_2^2


Oblivious subspace embedding

A (k, ε, δ) oblivious subspace embedding (OSE) from R^n to R^l is a distribution U_1 ~ D over l × n matrices that satisfies, with probability 1 − δ,

  1 − ε ≤ σ_min^2(U_1 Q_1) ≤ σ_max^2(U_1 Q_1) ≤ 1 + ε    (2)

for any given orthogonal n × k matrix Q_1. We assume l ≥ k and ε < 1/6.

U_1 ∈ R^{l×n} is (ε, δ, n) multiplication approximating if, for any A, B having n rows, it satisfies with probability 1 − δ

  ||A^T U_1^T U_1 B − A^T B||_F^2 ≤ ε ||A||_F^2 ||B||_F^2    (3)

Let U_1 ∈ R^{l×n} be a subsampled randomized Hadamard transform (SRHT) obtained by uniform sampling without replacement. With appropriate choices of ε, δ, l, U_1 satisfies the OSE property (2) (Lemma 4.1 from [Boutsidis and Gittens, 2013]) and the multiplication property (3).
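A small self-contained sketch checking (2) and (3) empirically for one draw of a Gaussian sketching matrix (Gaussian ensembles also satisfy both properties; an SRHT can be tested the same way):

```python
import numpy as np

rng = np.random.default_rng(8)
n, l, k = 256, 64, 8
U1 = rng.standard_normal((l, n)) / np.sqrt(l)       # l x n Gaussian sketch

Q1, _ = np.linalg.qr(rng.standard_normal((n, k)))
sv = np.linalg.svd(U1 @ Q1, compute_uv=False)
print("OSE (2):", sv.min() ** 2, sv.max() ** 2)     # should lie in [1 - eps, 1 + eps]

A = rng.standard_normal((n, 5))
B = rng.standard_normal((n, 7))
err = np.linalg.norm(A.T @ U1.T @ U1 @ B - A.T @ B, 'fro') ** 2
print("mult (3):", err / (np.linalg.norm(A, 'fro') ** 2 * np.linalg.norm(B, 'fro') ** 2))
```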


Probabilistic guarantees

Combine deterministic guarantees with sketching ensembles satisfying oblivious subspace embedding properties → better bounds.

Consider U_1 ∈ R^{l'×m} and V_1 ∈ R^{n×l} SRHT matrices, with l' > l. Computing P_o A costs O(mnl) flops; computing Ã_glu through generalized LU costs O(mn log_2 l') flops.

Let ρ be the rank of A, and let

  l  = O(1) ε^{−1} (√k + sqrt(8 log(n/δ)))^2 log(k/δ),   l  ≥ log(n/δ) log(ρ/δ),
  l' = O(1) ε^{−1} (√l + sqrt(8 log(m/δ)))^2 log(k/δ),   l' ≥ log(m/δ) log(ρ/δ).

With probability 1 − 5δ,

  σ_j^2(A − P_o A)  ≤ O(1) σ_{k+j}^2(A) + O(log(ρ/δ)/l) (σ_{k+j}^2(A) + ... + σ_n^2(A))
  σ_j^2(A − Ã_glu)  ≤ O(1) σ_{k+j}^2(A) + O(log(ρ/δ)/l) (σ_{k+j}^2(A) + ... + σ_n^2(A))

→ Randomized P_o A and Ã_glu are kernel approximations (upper bound) of A.


Growth factor in Gaussian elimination

  ρ(A) := max_k ||S_k||_max / ||A||_max,

where A ∈ R^{m×n} and S_k is the Schur complement obtained at iteration k.

Deterministic algorithms, k steps of LU:
  LU with partial pivoting: ρ(A) ≤ 2^k.
  CA LU with column/row selection by binary tree tournament pivoting:

    ||S_k(Ā_11)||_max ≤ min( (1 + F_TP √k) ||A||_max,  F_TP sqrt(1 + F_TP^2 (m − k)) σ_k(A) )

Randomized algorithms: for U, V Haar distributed matrices and a complete LU factorization,

  E[log(ρ(UAV))] = O(log n)
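A minimal sketch measuring ρ(A) during plain LU with partial pivoting, tracking the largest Schur complement entry at each step:

```python
import numpy as np

def growth_factor(A):
    S = A.astype(float, copy=True)
    a_max = np.abs(A).max()
    g = 1.0
    for k in range(min(S.shape) - 1):
        p = k + np.argmax(np.abs(S[k:, k]))          # partial pivoting
        S[[k, p]] = S[[p, k]]                        # swap rows k and p
        S[k + 1:, k] /= S[k, k]
        S[k + 1:, k + 1:] -= np.outer(S[k + 1:, k], S[k, k + 1:])  # Schur update
        g = max(g, np.abs(S[k + 1:, k + 1:]).max() / a_max)
    return g

rng = np.random.default_rng(9)
print(growth_factor(rng.standard_normal((200, 200))))   # far below the 2^k worst case
```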

Approximation of tensors

Approximation of tensors

Let A be a d-order tensor, A ∈ R^{n_1 × n_2 × ... × n_d}.

CANDECOMP/PARAFAC (CP) [Hitchcock '27] approximates A as a sum of k rank-1 tensors, where q_{1,i} ∘ q_{2,i} is the outer product of q_{1,i} and q_{2,i}:

  Ã = Σ_{i=1}^{k} q_{1,i} ∘ q_{2,i} ∘ ... ∘ q_{d,i}

The Tucker decomposition [Tucker, 1963] computes a rank-(k_1, ..., k_d) approximation, e.g. by using HOSVD and ALS:

  Ã = C ×_1 Q_1 ×_2 Q_2 ... ×_d Q_d = Σ_{s_1=1}^{k_1} Σ_{s_2=1}^{k_2} ... Σ_{s_d=1}^{k_d} C(s_1, ..., s_d) Q_1(:, s_1) ∘ ... ∘ Q_d(:, s_d)

where C ∈ R^{k_1 × k_2 × ... × k_d} and Q_i ∈ R^{n_i × k_i}, i = 1, ..., d.

Tensor trains or tensor networks are used in high dimensions.

For an overview, see Kolda and Bader, SIAM Review 2009.

HOSVD for computing a Tucker decomposition

HOSVD for computing a rank-(k_1, ..., k_d) approximation:

1. Input: tensor A ∈ R^{n_1 × ... × n_d}, ranks k_1, ..., k_d.
2. For every unfolding A_i along mode i = 1, ..., d, compute the k_i (approximated) leading left singular vectors of A_i, Q_i ∈ R^{n_i × k_i}. For example, on a 4 × 4 × 4 tensor with entries 1, ..., 64, RRQR applied to the 1-mode unfolding

   A_1 = [ 1 5  9 13 17 21 25 29 33 37 41 45 49 53 57 61 ]
         [ 2 6 10 14 18 22 26 30 34 38 42 46 50 54 58 62 ]
         [ 3 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63 ]
         [ 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 ]

   selects the columns [61; 62; 63; 64] and [1; 2; 3; 4].

3. C = A ×_1 Q_1^T ×_2 Q_2^T ... ×_d Q_d^T
4. Return: Ã = C ×_1 Q_1 ... ×_d Q_d = A ×_1 Q_1 Q_1^T ... ×_d Q_d Q_d^T

Error bound: if the Q_i are the leading left singular vectors of the unfoldings A_i, then

  ||A − Ã||_F ≤ √d ||A − A_best||_F,

where A_best is the best rank-(k_1, ..., k_d) approximation of A. (A sketch of HOSVD follows.)
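A minimal numpy sketch of HOSVD (the slides' HO-RRQR variant replaces the SVD of each unfolding by an RRQR-based column selection on its transpose):

```python
import numpy as np

def unfold(T, i):
    """i-mode unfolding: mode i becomes the rows."""
    return np.moveaxis(T, i, 0).reshape(T.shape[i], -1)

def fold(M, i, shape):
    full = [shape[i]] + [s for j, s in enumerate(shape) if j != i]
    return np.moveaxis(M.reshape(full), 0, i)

def mode_mult(T, M, i):
    """i-mode product T x_i M."""
    shape = list(T.shape)
    shape[i] = M.shape[0]
    return fold(M @ unfold(T, i), i, shape)

def hosvd(T, ranks):
    Qs = []
    for i, k in enumerate(ranks):                # leading left singular vectors
        U, _, _ = np.linalg.svd(unfold(T, i), full_matrices=False)
        Qs.append(U[:, :k])
    C = T
    for i, Q in enumerate(Qs):                   # core C = T x_1 Q1^T ... x_d Qd^T
        C = mode_mult(C, Q.T, i)
    return C, Qs

rng = np.random.default_rng(10)
T = rng.standard_normal((20, 20, 20))
C, Qs = hosvd(T, (5, 5, 5))
T_approx = C
for i, Q in enumerate(Qs):                       # reconstruct C x_1 Q1 ... x_d Qd
    T_approx = mode_mult(T_approx, Q, i)
print(np.linalg.norm(T - T_approx))
```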


Partitioning for parallel HO-RRQR

Consider a d-order tensor A ∈ R^{n × ... × n} (n = 4, d = 3 in the example, with entries 1, ..., 64).

Partition A into P^{1/d} × ... × P^{1/d} subtensors A_{i_1 ... i_d} ∈ R^{n/P^{1/d} × ... × n/P^{1/d}}, distributed on a P^{1/d} × ... × P^{1/d} processor tensor.

[Figure: the 4 × 4 × 4 tensor with entries 1, ..., 64 distributed over the 2 × 2 × 2 processor grid P1, ..., P8, each processor holding a 2 × 2 × 2 subtensor.]

Partitioned unfolding

Consider the 1-mode unfolding of the 2 × 2 × 2 processor tensor, followed on each processor by the 1-mode unfolding of its subtensor. This gives

  A_12 = [ 1 5 17 21  9 13 25 29 33 37 49 53 41 45 57 61 ]
         [ 2 6 18 22 10 14 26 30 34 38 50 54 42 46 58 62 ]
         [ 3 7 19 23 11 15 27 31 35 39 51 55 43 47 59 63 ]
         [ 4 8 20 24 12 16 28 32 36 40 52 56 44 48 60 64 ]

while the 1-mode unfolding of A is

  A_1 =  [ 1 5  9 13 17 21 25 29 33 37 41 45 49 53 57 61 ]
         [ 2 6 10 14 18 22 26 30 34 38 42 46 50 54 58 62 ]
         [ 3 7 11 15 19 23 27 31 35 39 43 47 51 55 59 63 ]
         [ 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 ]

For any i-mode unfolding, there is a permutation Π_i such that A_{i2} = A_i Π_i.

with M. Beaupère and D. Frenkiel
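A small sketch checking the claim A_{i2} = A_i Π_i on the 4 × 4 × 4 example (the column orders differ from the slide's layout, but the permutation is recovered explicitly):

```python
import numpy as np

n = 4
A = np.arange(1, n ** 3 + 1).reshape(n, n, n, order='F')  # A[i,j,k] = 1 + i + 4j + 16k

def unfold1(T):                                  # 1-mode unfolding
    return T.reshape(T.shape[0], -1)

A1 = unfold1(A)                                  # global unfolding, 4 x 16

# partitioned unfolding: per-processor unfoldings of the 2x2x2 subtensors,
# with the two processor "rows" (mode-1 ranges) stacked on top of each other
top = np.hstack([unfold1(A[0:2, j:j + 2, k:k + 2]) for k in (0, 2) for j in (0, 2)])
bot = np.hstack([unfold1(A[2:4, j:j + 2, k:k + 2]) for k in (0, 2) for j in (0, 2)])
A12 = np.vstack([top, bot])

# every column of A12 is a column of A1: recover the permutation Pi_1
perm = [int(np.where((A1.T == col).all(axis=1))[0][0]) for col in A12.T]
assert np.array_equal(A12, A1[:, perm])
```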


Parallel HO-RRQR

HO-RRQR for computing a rank-(k_1, ..., k_d) approximation:

1. Input: tensor A ∈ R^{n × ... × n} partitioned on a P^{1/d} × ... × P^{1/d} processor tensor, ranks k_1, ..., k_d.
2. For every partitioned unfolding A_{i2} along mode i = 1, ..., d, compute the factor matrix Q_i ∈ R^{n × k_i} using 2D tournament pivoting (2D TP) on A_{i2}^T; on the example above, 2D TP on A_12 selects the columns [61; 62; 63; 64] and [1; 2; 3; 4].
3. C = A ×_1 Q_1^T ×_2 Q_2^T ... ×_d Q_d^T
4. Return: Ã = C ×_1 Q_1 ... ×_d Q_d = A ×_1 Q_1 Q_1^T ... ×_d Q_d Q_d^T

with M. Beaupère and D. Frenkiel


Parallel HO-RRQR: cost and bounds

HO-RRQR for computing a rank-(k_1, ..., k_d) approximation:

1. Input: tensor A ∈ R^{n × ... × n} partitioned on a P^{1/d} × ... × P^{1/d} processor tensor, ranks k_1, ..., k_d.
2. For every partitioned unfolding A_{i2} ∈ R^{n × n^{d−1}}, i = 1, ..., d, compute the factor matrix Q_i ∈ R^{n × k_i} using 2D tournament pivoting on A_{i2}^T:

   # messages ≈ d log_2^2 P

   Conjecture: this can be decreased to log_2^2 P with a unique reduction tree used by 2D TP on the different unfoldings.

3. C = A ×_1 Q_1^T ×_2 Q_2^T ... ×_d Q_d^T
4. Return: Ã = C ×_1 Q_1 ... ×_d Q_d = A ×_1 Q_1 Q_1^T ... ×_d Q_d Q_d^T

Error bound: if the factor matrices Q_i are obtained from 2D TP on A_{i2}^T, then

  ||A − Ã||_F ≤ sqrt( 1 + max_i (F_{i,2D-TP}^2 (n − k_i)) ) √d ||A − A_best||_F,

where F_{i,2D-TP} ≤ P k_i^{log_2 P + 1/2} f^{(1−1/d) log_2 P + 1} and A_best is the best rank-(k_1, ..., k_d) approximation of A.

Parallel HO-RRQR: numerical experiments

Isosurface view of a 256 × 256 × 256 aneurism tensor:

[Figure: the original tensor; the 64 × 64 × 64 core tensor computed with 2D TP on 8 processors; the image reconstructed from the 64 × 64 × 64 core tensor.]

Image source: https://tc18.org/3D_images.html, an x-ray scan of the arteries of the right half of a human head with an aneurism.

Conclusions

Open questions for tensors

Many open questions; only a few are recalled here.

Communication bounds: few existing results.
  Symmetric tensor contractions [Solomonik et al., '18].
  Bound on the volume of communication for the matricized tensor times Khatri-Rao product [Ballard et al., '17].

Approximation algorithms:
  Algorithms such as DMRG are intrinsically sequential in the number of modes.
  Dynamically adapt the rank to a given error.
  Approximation of tensors of high rank that are of low rank in large parts, e.g. due to stationarity in the model the tensor describes.

Prospects for the future

Tensors in high dimensions: ERC Synergy project Extreme-scale Mathematically-based Computational Chemistry (EMC2), with E. Cancès, Y. Maday, and J.-P. Piquemal.

Collaborators: O. Balabanov, M. Beaupère, S. Cayrols, J. Demmel, D. Frenkiel, A. Rusciano.

Funding: this project has received funding from the European Commission under the Horizon 2020 research and innovation programme, grant agreement No. 810367; the H2020 NLAFET project; and ANR.

References

Results from the following papers:

1. Papers in preparation with O. Balabanov, M. Beaupère, D. Frenkiel on 2D tournament pivoting and parallel HOSVD.
2. J. Demmel, L. Grigori, A. Rusciano, An improved analysis and unified perspective on deterministic and randomized low rank matrix approximations, October 2019.
3. V. Ehrlacher, L. Grigori, D. Lombardi, H. Song, Adaptive hierarchical subtensor partitioning for tensor compression, SIAM J. Sci. Comput., 2020, in revision.
4. L. Grigori, S. Cayrols, and J. Demmel, Low rank approximation of a sparse matrix based on LU factorization with column and row tournament pivoting, SIAM J. Sci. Comput., 40(2):C181-C209, 2018.
5. J. Demmel, L. Grigori, M. Gu, and H. Xiang, Communication-avoiding rank-revealing QR factorization with column pivoting, SIAM J. Matrix Anal. Appl., 36(1):55-89, 2015.

References (1)

Boutsidis, C. and Gittens, A. (2013). Improved matrix algorithms via the subsampled randomized Hadamard transform. SIAM J. Matrix Anal. Appl., 34:1301-1340.

Businger, P. A. and Golub, G. H. (1965). Linear least squares solutions by Householder transformations. Numer. Math., 7:269-276.

Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1:211-218.

Golub, G. H. (1965). Numerical methods for solving linear least squares problems. Numer. Math., 7:206-216.

Gu, M. and Eisenstat, S. C. (1996). Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J. Sci. Comput., 17(4):848-869.

Hastad, J. (1990). Tensor rank is NP-complete. J. of Algorithms, 11:644-654.

Tucker, L. R. (1963). Implications of factor analysis of three-way matrices for measurement of change. In Harris, C. W., editor, Problems in Measuring Change, pages 122-137. University of Wisconsin Press.