Algebraic Methods for Tensor Data Neriman Tokcan (Broad Institute, MIT/Harvard) Harm Derksen (Dept. of Math., Northeastern University) Jonathan Gryak (BCIL lab, Univ. of Michigan) Kayvan Najarian (BCIL lab, Univ. of Michigan) August 25, 2020


SLIDE 1

Algebraic Methods for Tensor Data

Neriman Tokcan (Broad Institute, MIT/Harvard)
Harm Derksen (Dept. of Math., Northeastern University)
Jonathan Gryak (BCIL lab, Univ. of Michigan)
Kayvan Najarian (BCIL lab, Univ. of Michigan)

August 25, 2020

Algebraic Methods for Tensor Data

SLIDE 4

Tensors

$V_i = \mathbb{R}^{p_i}$
$V = V_1 \otimes V_2 \otimes \cdots \otimes V_d = \mathbb{R}^{p_1 \times p_2 \times \cdots \times p_d}$
$T \in V$ is a tensor
tensors appear in many data applications, e.g., chemometrics, psychometrics, imaging, algebraic complexity theory, signal processing, neural networks, ...
$T \cdot S$ is the inner product of tensors
$\|T\| = \|T\|_F = \sqrt{T \cdot T}$ is the Euclidean (Frobenius) norm
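
A minimal NumPy sketch of the inner product and the Frobenius norm defined on this slide. The shapes, the seed and the function names `inner` and `frobenius_norm` are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 5))   # an order-3 tensor in R^{3 x 4 x 5}
S = rng.standard_normal((3, 4, 5))

def inner(T, S):
    """Inner product T . S: sum of entrywise products."""
    return float(np.sum(T * S))

def frobenius_norm(T):
    """Euclidean (Frobenius) norm sqrt(T . T)."""
    return float(np.sqrt(inner(T, T)))

print(inner(T, S), frobenius_norm(T))
```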

SLIDE 7

Tensor Rank

Definition

rank(T) is the smallest integer r for which we have a decomposition

$$(\star)\qquad T = \sum_{j=1}^{r} v_{1j} \otimes v_{2j} \otimes \cdots \otimes v_{dj}$$

if r is minimal then $(\star)$ is called the CP decomposition
tensor rank is not continuous or even semi-continuous, i.e., $\{T \in V \mid \operatorname{rank}(T) \le r\}$ is not always a closed set
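
The decomposition $(\star)$ can be sketched in NumPy as a sum of r outer products. The helper name `cp_to_tensor` and the storage convention (column j of `factors[i]` holds $v_{ij}$) are assumptions made for this example only.

```python
import numpy as np

def cp_to_tensor(factors):
    """Sum of r rank-one terms; factors[i] has shape (p_i, r), column j is v_{ij}."""
    r = factors[0].shape[1]
    shape = tuple(F.shape[0] for F in factors)
    T = np.zeros(shape)
    for j in range(r):
        term = factors[0][:, j]
        for F in factors[1:]:
            term = np.multiply.outer(term, F[:, j])  # build v_1j ⊗ v_2j ⊗ ...
        T += term
    return T

rng = np.random.default_rng(0)
factors = [rng.standard_normal((p, 2)) for p in (3, 4, 5)]
T = cp_to_tensor(factors)   # a 3 x 4 x 5 tensor of rank at most 2
```

This only builds a tensor from given factors. Finding a minimal decomposition of a given tensor is the hard direction.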

SLIDE 9

Nuclear Norm

use convex relaxation of tensor rank:

Definition

the nuclear norm $\|T\|_\star$ of T is the minimal value of

$$(\#)\qquad \sum_{j=1}^{r} \|v_{1j} \otimes v_{2j} \otimes \cdots \otimes v_{dj}\|$$

where r is arbitrary and

$$(\star)\qquad T = \sum_{j=1}^{r} v_{1j} \otimes v_{2j} \otimes \cdots \otimes v_{dj}$$

(a minimum is attained)
if $\|T\|_\star$ is equal to $(\#)$, then $(\star)$ is called the nuclear decomposition or convex decomposition

SLIDE 11

Spectral Norm

Definition

the spectral norm of T is

$$\|T\|_\sigma = \max\{\, T \cdot (v_1 \otimes v_2 \otimes \cdots \otimes v_d) \mid \|v_1\| = \|v_2\| = \cdots = \|v_d\| = 1 \,\}$$

(a maximum is attained)
$\|\cdot\|_\sigma$ and $\|\cdot\|_\star$ are dual norms
$|T \cdot S| \le \|T\|_\star \|S\|_\sigma$

SLIDE 16

Matrices (d = 2)

$A \in \mathbb{R}^{p \times q}$ has a singular value decomposition $A = UDV^t$ with $U \in O_p$, $V \in O_q$, D diagonal with the nonzero diagonal entries $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$
r = rank(A) is the matrix rank and the tensor rank of A
$\|A\|_\star = \operatorname{Tr}(\sqrt{AA^t}) = \lambda_1 + \lambda_2 + \cdots + \lambda_r$
$\|A\|_\sigma = \lambda_1$ is the operator norm
the norms and the rank of A are easy to compute
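
For matrices all three quantities come out of one SVD. The sketch below assumes NumPy; the helper name `svd_quantities` and the rank tolerance are illustrative choices. It also checks the duality inequality $|A \cdot B| \le \|A\|_\star \|B\|_\sigma$ on random data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((4, 6))

def svd_quantities(A, tol=1e-10):
    """Return (rank, nuclear norm, spectral norm) from the singular values."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values, descending
    rank = int(np.sum(s > tol * s[0])) if s[0] > 0 else 0
    return rank, float(s.sum()), float(s[0])

rank_A, nuc_A, spec_A = svd_quantities(A)
_, _, spec_B = svd_quantities(B)

# Duality: |A . B| <= ||A||_* ||B||_sigma
print(abs(np.sum(A * B)) <= nuc_A * spec_B)
```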

SLIDE 19

negative results (d = 3)

Theorem (Håstad)

computing tensor rank is NP-complete

Theorem (Friedland–Lim)

computing the nuclear norm is NP-complete

Theorem (Hillar–Lim)

computing the spectral norm is NP-complete

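SLIDE 22

Approximating Spectral Norm

(tensors of order 3 for simplicity; d now denotes the exponent, and the prime marks the value before normalization)

$$\|T\|'_{\sigma,d} = \left( \int_{S^{p-1} \times S^{q-1} \times S^{r-1}} |T \cdot (x \otimes y \otimes z)|^{d} \right)^{1/d}$$

normalize:

$$\|T\|_{\sigma,d} = \frac{\|T\|'_{\sigma,d}}{\|e_1 \otimes e_1 \otimes e_1\|'_{\sigma,d}}$$

$$\|T\|_\sigma = \lim_{d \to \infty} \|T\|_{\sigma,d}$$

when d is even there is an algebraic method for computing $\|T\|_{\sigma,d}$!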
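
Before turning to the algebraic method, the normalized quantity can be estimated by plain Monte Carlo sampling. This is a sketch under assumptions, not the method of the talk. The function names, sample count and seed are illustrative, and the same samples are used for T and for $e_1 \otimes e_1 \otimes e_1$.

```python
import numpy as np

def random_unit_vectors(rng, n_samples, dim):
    """Rows are uniform on the unit sphere S^{dim-1} (normalized Gaussians)."""
    g = rng.standard_normal((n_samples, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def sigma_d_estimate(T, d, n_samples=20000, seed=0):
    """Monte Carlo estimate of the normalized sigma,d norm of an order-3 tensor."""
    p, q, r = T.shape
    rng = np.random.default_rng(seed)
    x = random_unit_vectors(rng, n_samples, p)
    y = random_unit_vectors(rng, n_samples, q)
    z = random_unit_vectors(rng, n_samples, r)
    vals = np.einsum('abc,sa,sb,sc->s', T, x, y, z)   # T . (x ⊗ y ⊗ z) per sample
    ref = x[:, 0] * y[:, 0] * z[:, 0]                 # same for e1 ⊗ e1 ⊗ e1
    num = np.mean(np.abs(vals) ** d) ** (1.0 / d)
    den = np.mean(np.abs(ref) ** d) ** (1.0 / d)
    return float(num / den)

T = np.random.default_rng(1).standard_normal((3, 4, 5))
print(sigma_d_estimate(T, 4))
```

The estimate is noisy, and the noise grows with d because large powers are dominated by a few samples. That is one reason an exact algebraic formula for even d is attractive.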

SLIDE 26

Invariant Tensors

$V = \mathbb{R}^n$, $V^{\otimes d} = \underbrace{V \otimes V \otimes \cdots \otimes V}_{d}$
$O_n$ acts on $V$ and $V^{\otimes d}$
there is an $O_n$-equivariant linear isomorphism between $V^{\otimes d}$ and the space of multilinear maps $V^d \to \mathbb{R}$
so there is a linear isomorphism between the space $(V^{\otimes d})^{O_n}$ of $O_n$-invariant tensors and the space of $O_n$-invariant multilinear maps $V^d \to \mathbb{R}$

SLIDE 29

Brauer Diagrams

a Brauer diagram is a perfect matching, for example D = the matching on the vertices 1, ..., 6 with edges {1,3}, {2,6}, {4,5}
to a diagram E on d vertices we can associate an $O_n$-invariant multilinear map $M_E : V^d \to \mathbb{R}$, for example

$$M_D(v_1, v_2, \ldots, v_6) = (v_1 \cdot v_3)(v_2 \cdot v_6)(v_4 \cdot v_5)$$

the invariant multilinear map $M_E$ corresponds to an invariant tensor $T_E$ using the linear isomorphism from before, for example

$$T_D = \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} e_i \otimes e_j \otimes e_i \otimes e_k \otimes e_k \otimes e_j$$
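
The example tensor $T_D$ can be built with `np.einsum` from identity matrices, one per edge. The sketch below, with the assumed value n = 3 and illustrative helper names `act` and `M_D`, checks that $T_D$ is fixed by a random orthogonal matrix and that pairing it with $v_1 \otimes \cdots \otimes v_6$ reproduces $M_D$.

```python
import numpy as np

n = 3
I = np.eye(n)
# Edges {1,3}, {2,6}, {4,5}: positions 1..6 carry the indices i, j, i, k, k, j.
T_D = np.einsum('ac,bf,de->abcdef', I, I, I)

def act(Q, T):
    """Apply the matrix Q to every tensor factor of T."""
    for axis in range(T.ndim):
        T = np.moveaxis(np.tensordot(Q, T, axes=(1, axis)), 0, axis)
    return T

def M_D(vs):
    v1, v2, v3, v4, v5, v6 = vs
    return (v1 @ v3) * (v2 @ v6) * (v4 @ v5)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # a random orthogonal matrix
vs = [rng.standard_normal(n) for _ in range(6)]
pairing = np.einsum('abcdef,a,b,c,d,e,f->', T_D, *vs)

print(np.allclose(act(Q, T_D), T_D), np.isclose(pairing, M_D(vs)))
```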

SLIDE 32

Brauer Diagrams

Theorem (First Fundamental Theorem of Invariant Theory for $O_n$)

the space $(V^{\otimes d})^{O_n}$ is spanned by all $T_D$ where D runs over all Brauer diagrams on d vertices
the tensors $T_D$ are independent if $n \ge d$
the number of Brauer diagrams on d vertices is $1 \cdot 3 \cdot 5 \cdots (d-1)$ (when d is even)
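
The count $1 \cdot 3 \cdot 5 \cdots (d-1)$ can be confirmed by enumerating perfect matchings directly. The generator below is an illustrative sketch (function names are assumptions): it pairs the first vertex with each possible partner and recurses on the rest.

```python
from math import prod

def brauer_diagrams(vertices):
    """Yield every perfect matching of `vertices` as a list of pairs."""
    vertices = list(vertices)
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for matching in brauer_diagrams(remaining):
            yield [(first, partner)] + matching

def double_factorial_count(d):
    """1 * 3 * 5 * ... * (d - 1) for even d."""
    return prod(range(1, d, 2))

for d in (2, 4, 6, 8):
    print(d, len(list(brauer_diagrams(range(1, d + 1)))), double_factorial_count(d))
```

For an odd number of vertices the generator yields nothing, matching the fact that no perfect matching exists.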

SLIDE 34

Partial Brauer Diagrams

to a partial Brauer diagram with d vertices and e edges we associate an $O(V)$-equivariant multilinear map $M_D : V^d \to V^{\otimes (d-2e)}$ and a linear map $L_D : V^{\otimes d} \to V^{\otimes (d-2e)}$
for example, if D = the partial diagram on the vertices 1, ..., 6 with edges {1,3}, {4,5} (vertices 2 and 6 unmatched) then

$$M_D(v_1, v_2, \ldots, v_6) = (v_1 \cdot v_3)(v_4 \cdot v_5)\, v_2 \otimes v_6 \in V^{\otimes 2}$$
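
For this example the linear map $L_D$ is a contraction of tensor factors, which `np.einsum` expresses through repeated indices. The sketch assumes n = 3 and illustrative function names, and checks that $L_D(v_1 \otimes \cdots \otimes v_6) = M_D(v_1, \ldots, v_6)$.

```python
import numpy as np

def L_D(T):
    """Contract factor 1 with 3 and factor 4 with 5; factors 2 and 6 survive."""
    return np.einsum('abacce->be', T)

def M_D(vs):
    v1, v2, v3, v4, v5, v6 = vs
    return (v1 @ v3) * (v4 @ v5) * np.outer(v2, v6)

rng = np.random.default_rng(0)
n = 3
vs = [rng.standard_normal(n) for _ in range(6)]
pure = np.einsum('a,b,c,d,e,f->abcdef', *vs)   # v1 ⊗ v2 ⊗ ... ⊗ v6

print(np.allclose(L_D(pure), M_D(vs)))
```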

SLIDE 36

Brauer Diagram Calculus

the inner product of two tensors $T_D$ and $T_E$ is $n^c$, where c is the number of cycles if we overlay the diagrams D and E
for example, if D = [diagram on the vertices 1, ..., 6] and E = [diagram on the vertices 1, ..., 6] then $T_D \cdot T_E = n^2$ because the overlay [diagram] has 2 cycles
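
The rule $T_D \cdot T_E = n^c$ can be checked numerically. In the sketch below D is the matching {1,3}, {2,6}, {4,5} from the earlier example, while E = {1,3}, {2,4}, {5,6} is a hypothetical second diagram chosen so that the overlay has 2 cycles. The function names and n = 3 are likewise assumptions.

```python
import numpy as np

def count_cycles(D, E):
    """Number of cycles in the overlay of two perfect matchings on the same vertices."""
    partner_D = {u: v for a, b in D for u, v in ((a, b), (b, a))}
    partner_E = {u: v for a, b in E for u, v in ((a, b), (b, a))}
    seen, cycles = set(), 0
    for start in partner_D:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:          # walk D-edge, then E-edge, until the cycle closes
            seen.add(v)
            w = partner_D[v]
            seen.add(w)
            v = partner_E[w]
    return cycles

def diagram_tensor(D, n):
    """T_D as an n^d array: Kronecker deltas along the edges of D (vertices 1..d)."""
    d = 2 * len(D)
    letters = 'abcdefghijkl'[:d]
    pairs = [letters[a - 1] + letters[b - 1] for a, b in D]
    return np.einsum(','.join(pairs) + '->' + letters, *([np.eye(n)] * len(D)))

n = 3
D = [(1, 3), (2, 6), (4, 5)]
E = [(1, 3), (2, 4), (5, 6)]   # hypothetical example diagram
c = count_cycles(D, E)
print(c, np.sum(diagram_tensor(D, n) * diagram_tensor(E, n)), n ** c)
```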

SLIDE 39

Integrating

let $S_d = \sum_D T_D$ where D runs over all Brauer diagrams on d vertices
if we integrate $v^{\otimes d} \in V^{\otimes d}$ over the unit sphere $S^{n-1} \subset V$ (where $S^{n-1}$ has measure 1) then we get a tensor that is invariant under $O_n$ and the symmetric group $\Sigma_d$, so

$$\int_{S^{n-1}} v^{\otimes d} = C\, S_d$$

for some constant C
for any Brauer diagram E we have $v^{\otimes d} \cdot T_E = (v \cdot v)^{d/2} = 1$ on the unit sphere, so from

$$1 = \int_{S^{n-1}} 1 = \int_{S^{n-1}} v^{\otimes d} \cdot T_E = C\, S_d \cdot T_E = C\, n(n+2) \cdots (n+d-2)$$

follows

$$C = \frac{1}{n(n+2) \cdots (n+d-2)}$$
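
As a worked instance, take d = 4. There are three Brauer diagrams, with edges {1,2},{3,4}, then {1,3},{2,4}, then {1,4},{2,3}, so the formula reads as follows in coordinates (a sketch that spells out the general statement above for this one case):

```latex
S_4 = \sum_{i,j=1}^{n} \left( e_i \otimes e_i \otimes e_j \otimes e_j
      + e_i \otimes e_j \otimes e_i \otimes e_j
      + e_i \otimes e_j \otimes e_j \otimes e_i \right),
\qquad
\int_{S^{n-1}} v_i v_j v_k v_l
  = \frac{\delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk}}{n(n+2)} .
```

Taking E with edges {1,2},{3,4}, the three overlays have 2, 1 and 1 cycles, so $S_4 \cdot T_E = n^2 + n + n = n(n+2)$. As a sanity check with n = 2 and i = j = k = l = 1, the right-hand side is 3/8, which is the average of $\cos^4\theta$ over the circle.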

SLIDE 43

Colored Brauer Diagrams

$R = \mathbb{R}^p$, $G = \mathbb{R}^q$, $B = \mathbb{R}^r$
$V = R \otimes G \otimes B$ is the space of order 3 tensors
$\psi : R^{\otimes d} \otimes G^{\otimes d} \otimes B^{\otimes d} \to V^{\otimes d}$ linear isomorphism
$H = O_p \times O_q \times O_r$ acts on $V$, $V^{\otimes d}$
isomorphism $\psi : (R^{\otimes d})^{O_p} \otimes (G^{\otimes d})^{O_q} \otimes (B^{\otimes d})^{O_r} \to (V^{\otimes d})^H$
$(V^{\otimes d})^H$ is spanned by all $\psi(T_{D_R} \otimes T_{D_G} \otimes T_{D_B})$ where $D_R$, $D_G$, $D_B$ are Brauer diagrams on d vertices
we draw the Brauer diagrams $D_R$ in red, $D_G$ in green, and $D_B$ in blue using the same d vertices and we get a colored Brauer diagram D, for example

[colored Brauer diagram]

SLIDE 45

Tensor Invariants

$V = R \otimes G \otimes B$, space of p × q × r tensors
$(V^{\otimes d})^H$ is spanned by all $T_D := \psi(T_{D_R} \otimes T_{D_G} \otimes T_{D_B})$ where D runs over all colored Brauer diagrams
for a colored Brauer diagram D, define a polynomial function $I_D : V \to \mathbb{R}$ by $I_D(U) = U^{\otimes d} \cdot T_D$

Theorem

the ring of H-invariant polynomials on the space of p × q × r tensors is spanned by all $I_D$, where D is a colored Brauer diagram
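
In coordinates, $I_D(U)$ is a full contraction of d copies of U: each red edge contracts the first index of two copies, each green edge the second, each blue edge the third. The sketch below builds the matching `np.einsum` subscript string. The function name, the representation of D as three lists of pairs, and the example diagram are assumptions for illustration.

```python
import numpy as np

def invariant(U, D_R, D_G, D_B):
    """I_D(U) = U^{⊗d} . T_D for the colored diagram D = (D_R, D_G, D_B).

    Each D_* is a perfect matching of the vertices 1..d; vertex k stands for
    the k-th copy of U, and an edge contracts those two copies along that mode.
    """
    d = 2 * len(D_R)
    pools = ('abcdefgh', 'ijklmnop', 'qrstuvwx')   # index letters per color
    subs = [''] * d
    for matching, pool in zip((D_R, D_G, D_B), pools):
        label = {}
        for letter, (u, v) in zip(pool, matching):
            label[u] = label[v] = letter
        for k in range(d):
            subs[k] += label[k + 1]
    return float(np.einsum(','.join(subs) + '->', *([U] * d)))

rng = np.random.default_rng(1)
U = rng.standard_normal((2, 3, 4))
m1, m2, m3 = [(1, 2), (3, 4)], [(1, 3), (2, 4)], [(1, 4), (2, 3)]
print(invariant(U, m1, m2, m3))    # a degree-4 invariant of U
```

When all three colors use the same matching the invariant reduces to a power of the Frobenius norm, for example `invariant(U, m1, m1, m1)` equals $\|U\|^4$.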

SLIDE 46

Computing the Integral

Up to a constant

$$\int_{S^{p-1} \times S^{q-1} \times S^{r-1}} (a \otimes b \otimes c)^{\otimes d}$$

is equal to $\psi(S_d \otimes S_d \otimes S_d)$, the sum of all $T_D$ where D is a colored Brauer diagram on d vertices

SLIDE 48

Computing the Norm

with d = 4,

$$\|U\|_{\sigma,4}^{4} = \int (U \cdot a \otimes b \otimes c)^{d} = \int U^{\otimes d} \cdot (a \otimes b \otimes c)^{\otimes d} = U^{\otimes d} \cdot \int (a \otimes b \otimes c)^{\otimes d}$$

is up to scalar equal to the sum of all $I_D$ where D is a colored Brauer diagram on d vertices
note that $I_D$ does not depend on the labeling of the vertices, so up to a scalar, $\|U\|_{\sigma,4}^{4}$ is equal to

3 [diagram] + 6 [diagram] + 6 [diagram] + 6 [diagram] + 6 [diagram]

(by abuse of notation, the diagram D represents $I_D$)

SLIDE 49

Tensor Norms

$\|U\|_{\sigma,4}^{4}$ = ( [diagram] + 2 [diagram] + 2 [diagram] + 2 [diagram] + 2 [diagram] ) / 9

a similar norm that in some ways is a better approximation of the spectral norm is given by

$\|U\|_{\#}^{4}$ = ( [diagram] + [diagram] + [diagram] + 2 [diagram] ) / 5
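
Since the diagrams themselves are not reproduced here, the sketch below computes $\|U\|_{\sigma,4}$ by brute force instead: it sums $I_D(U)$ over all 27 colored Brauer diagrams on 4 vertices (3 matchings per color) and divides by 27, so that a rank-one tensor of norm 1 gets the value 1. Grouping the 27 terms by relabeling of vertices gives the coefficients 3 + 6 + 6 + 6 + 6 = 27, which is the weighting 1, 2, 2, 2, 2 over 9 in the formula above. Function names are illustrative assumptions.

```python
import numpy as np
from itertools import product

# The three perfect matchings of the four copies 0, 1, 2, 3 of U.
MATCHINGS = ([(0, 1), (2, 3)], [(0, 2), (1, 3)], [(0, 3), (1, 2)])

def labels(matching, letters):
    """Index letter carried by each of the 4 copies of U for one color."""
    out = [''] * 4
    for letter, (u, v) in zip(letters, matching):
        out[u] = out[v] = letter
    return out

def sigma4_norm(U):
    """Normalized sigma,4 norm: fourth root of the mean of I_D over all 27 colored diagrams."""
    total = 0.0
    for D_R, D_G, D_B in product(MATCHINGS, repeat=3):
        r, g, b = labels(D_R, 'ab'), labels(D_G, 'cd'), labels(D_B, 'ef')
        subs = ','.join(r[k] + g[k] + b[k] for k in range(4))
        total += np.einsum(subs + '->', U, U, U, U)
    return float((total / 27.0) ** 0.25)

rng = np.random.default_rng(0)
U = rng.standard_normal((2, 3, 4))
print(sigma4_norm(U))
```

The norm $\|U\|_{\#}$ is not implemented because the slide text does not record which diagrams enter its formula.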

SLIDE 52

Tensor Amplification

if matrix A has singular values $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$ then $AA^tA$ has singular values $\lambda_1^3, \lambda_2^3, \ldots$
so the map $A \mapsto AA^tA$ enhances the low rank structure (largest singular values) and suppresses noise (small singular values)
$AA^tA$ is the gradient of $\tfrac{1}{4} \operatorname{Tr}((AA^t)^2)$
we can do something similar with tensors and define

$$\Phi_{\sigma,4} = \nabla \|T\|_{\sigma,4}^{4}, \qquad \Phi_{\#} = \nabla \|T\|_{\#}^{4}$$

these maps are $O_p \times O_q \times O_r$-equivariant and amplify the low rank structure of a tensor
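
The two matrix facts used as motivation are easy to verify numerically. The sketch below (random test matrix, seed and helper names are assumptions) checks that $AA^tA$ has the cubed singular values of A, and compares $AA^tA$ with a finite-difference derivative of $\tfrac{1}{4}\operatorname{Tr}((AA^t)^2)$ in a random direction.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))

def amplify(A):
    """The matrix amplification map A -> A A^t A."""
    return A @ A.T @ A

def objective(A):
    """(1/4) Tr((A A^t)^2), whose gradient is A A^t A."""
    return 0.25 * np.trace((A @ A.T) @ (A @ A.T))

s = np.linalg.svd(A, compute_uv=False)
s_amp = np.linalg.svd(amplify(A), compute_uv=False)
print(np.allclose(s_amp, s ** 3))

# Central finite difference of the objective in a random direction H.
H = rng.standard_normal(A.shape)
eps = 1e-6
fd = (objective(A + eps * H) - objective(A - eps * H)) / (2 * eps)
print(fd, np.sum(amplify(A) * H))   # the two numbers should agree closely
```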

SLIDE 53

Partial Colored Brauer Diagrams

in partial colored Brauer diagrams, we have

$\Phi_{\#}$ = 4/5 [diagram] + 4/5 [diagram] + 4/5 [diagram] + 8/5 [diagram]

SLIDE 54

Numerical Experiments

choose a random 30 × 30 × 30 tensor $T_n = T + E$
T is random of rank 1 with $\|T\| = 1$
E is a random tensor with $\|E\| = 10$
signal-to-noise ratio: −20 dB
experiment: use the Alternating Least Squares (ALS) algorithm for the best rank 1 approximation of $T_n$ (to recover T)
use ALS with a random initial guess
compare with ALS using an amplified initial guess
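
A minimal sketch of the baseline in this experiment: generating $T_n = T + E$ with the stated norms and running rank-1 ALS from a random initial guess. The Gaussian sampling, the stopping rule and all function names are assumptions. The amplified initial guess and the "Quick Rank 1" method from the results are not reproduced here.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def rank1_als(T, init, n_iter=50, tol=1e-10):
    """Alternating least squares for a rank-1 approximation of an order-3 tensor.

    Returns (scale, x, y, z) with unit vectors x, y, z; the approximation is
    scale * x ⊗ y ⊗ z.
    """
    x, y, z = (unit(v) for v in init)
    scale = 0.0
    for _ in range(n_iter):
        x = unit(np.einsum('abc,b,c->a', T, y, z))
        y = unit(np.einsum('abc,a,c->b', T, x, z))
        z_raw = np.einsum('abc,a,b->c', T, x, y)
        new_scale = np.linalg.norm(z_raw)
        z = z_raw / new_scale
        converged = abs(new_scale - scale) < tol
        scale = new_scale
        if converged:
            break
    return scale, x, y, z

def noisy_rank1(shape, noise_norm, rng):
    """T_n = T + E with T of rank 1, ||T|| = 1, and ||E|| = noise_norm."""
    a, b, c = (unit(rng.standard_normal(p)) for p in shape)
    T = np.einsum('a,b,c->abc', a, b, c)
    E = rng.standard_normal(shape)
    E *= noise_norm / np.linalg.norm(E)
    return T, T + E

rng = np.random.default_rng(0)
T, Tn = noisy_rank1((30, 30, 30), 10.0, rng)
init = [rng.standard_normal(30) for _ in range(3)]
scale, x, y, z = rank1_als(Tn, init)
approx = scale * np.einsum('a,b,c->abc', x, y, z)
print(np.sum(unit(approx.ravel()) * T.ravel()))   # cosine between the estimate and T
```

With $\|T\| = 1$ and $\|E\| = 10$ the ratio is $20 \log_{10}(1/10) = -20$ dB, as stated on the slide. The cosine printed at the end is only one possible quality measure, since the slides do not define "Fit".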

SLIDE 55

Results

Random (10 runs)                         Max Fit    Total # Iterations    Total Time
Average                                  0.7136     77.5080               0.0943
Standard Deviation                       0.2715     12.0254               0.0159

Quick Rank 1                             Fit        # Iterations          Time
Average                                  0.7848     2.94                  0.0177
Standard Deviation                       0.1618     1.2345                0.0025

$\Phi_{\sigma,4}$ and Quick Rank 1       Fit        # Iterations          Time
Average                                  0.8010     2                     0.0210
Standard Deviation                       0.1256                           0.0027

$\Phi_{\#}$ and Quick Rank 1             Fit        # Iterations          Time
Average                                  0.8178     2                     0.0205
Standard Deviation                       0.0515                           0.0025