An Introduction to Tensor-Based Independent Component Analysis - PowerPoint PPT Presentation

SLIDE 2
  • L. De Lathauwer

An Introduction to Tensor-Based Independent Component Analysis

Lieven De Lathauwer K.U.Leuven Belgium

Lieven.DeLathauwer@kuleuven-kortrijk.be

SLIDE 3

Overview

  • Problem definition
  • Higher-order statistics
  • Basic ICA equations
  • Specific prewhitening-based multilinear algorithms
  • Application
  • Higher-order-only schemes
  • Variants for coloured sources
  • Dimensionality reduction
  • Conclusions

SLIDE 4

Independent Component Analysis (ICA)

Model:

Y = M X + N
(P × 1) = (P × R)(R × 1) + (P × 1)

[Figure: three sources x1, x2, x3 mixed and their estimates x̂1, x̂2, x̂3]

SLIDE 5

Model:

Y = M X + N
(P × 1) = (P × R)(R × 1) + (P × 1)

Assumptions:

  • columns of M are linearly independent
  • components of X are statistically independent

Goal: Identification of M and/or reconstruction of X while observing only Y

SLIDE 6

Independent Component Analysis (ICA)

Disciplines: statistics, neural networks, information theory, linear and multilinear algebra, . . .

Indeterminacies:

  • ordering and scaling of the columns of M (Y = MX)

Uncorrelated vs independent:

X, Y are uncorrelated iff E{XY} = 0 (zero-mean case)
X, Y are independent iff pXY(x, y) = pX(x) pY(y)

Statistical independence implies:

  • the variables are uncorrelated
  • additional conditions on the HOS
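A quick numerical illustration of the gap between the two notions (a toy example of my own, not from the slides): take X standard normal and Y = X² − 1. They are uncorrelated by symmetry, yet Y is a deterministic function of X, and a higher-order cross-moment exposes the dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
y = x**2 - 1                      # zero-mean, but a deterministic function of x

# uncorrelated: E{XY} = E{X^3} - E{X} = 0
print(np.mean(x * y))             # close to 0
# not independent: E{X^2 Y} = E{X^4} - E{X^2} = 3 - 1 = 2, far from 0
print(np.mean(x**2 * y))          # close to 2
```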

SLIDE 7

Algebraic tools:

  Condition     Identification      Tool
  Xi uncorr.    column space of M   matrix EVD/SVD
  Xi indep.     M itself            tensor EVD/SVD

Web site: http://www.tsi.enst.fr/icacentral/index.html (mailing list, data sets, software)

SLIDE 8

Applications

  • Speech and audio
  • Image processing

feature extraction, image reconstruction, video

  • Telecommunications

OFDM, CDMA, . . .

  • Biomedical applications

functional Magnetic Resonance Imaging, electromyogram, electro-encephalogram, (fetal) electrocardiogram, mammography, pulse oximetry, (fetal) magnetocardiogram, . . .

  • Other applications

text classification, vibratory signals generated by termites (!), electron energy loss spectra, astrophysics, . . .

SLIDE 9

HOS definitions

Moments and cumulants of a random variable:

  Moments            Cumulants
  m1^X = E{X}        c1^X = E{X}                     "mean" (mX)
  m2^X = E{X²}       c2^X = E{(X − mX)²}             m2: correlation (RX); c2: "variance" (σX²)
  m3^X = E{X³}       c3^X = E{(X − mX)³}
  m4^X = E{X⁴}       c4^X = E{(X − mX)⁴} − 3 σX⁴
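The four univariate cumulants above are straightforward to estimate from samples; a minimal sketch (the helper name `cumulants` is mine, not from the slides), checked on Gaussian data, for which every cumulant beyond order 2 vanishes:

```python
import numpy as np

def cumulants(x):
    """First four cumulants c1..c4 of a sample, per the definitions above."""
    mX = x.mean()
    xc = x - mX
    c2 = np.mean(xc**2)                 # "variance" sigma_X^2
    c3 = np.mean(xc**3)
    c4 = np.mean(xc**4) - 3 * c2**2     # centered 4th moment minus 3 sigma^4
    return mX, c2, c3, c4

rng = np.random.default_rng(1)
c1, c2, c3, c4 = cumulants(rng.standard_normal(500_000))
# Gaussian: c1 ~ 0, c2 ~ 1, c3 ~ 0, c4 ~ 0
```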

SLIDE 10

Characteristic Functions

First characteristic function:

Φx(ω) := E{e^{jωx}} = ∫_{−∞}^{+∞} px(x) e^{jωx} dx

Generates moments:

Φx(ω) = Σ_{k=0}^{∞} mk^X (jω)^k / k!     (m0 = 1)

Second characteristic function:

Ψx(ω) := ln Φx(ω)

Generates cumulants:

Ψx(ω) = Σ_{k=1}^{∞} ck^X (jω)^k / k!
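Matching powers of (jω) in Ψx(ω) = ln Φx(ω) converts between the two sets of coefficients; a short sketch of the standard identities (not spelled out on the slides):

```latex
% expand \ln(1+u) with u = \sum_{k \ge 1} m_k (j\omega)^k / k! and collect powers:
c_1 = m_1 \\
c_2 = m_2 - m_1^2 \\
c_3 = m_3 - 3 m_1 m_2 + 2 m_1^3 \\
c_4 = m_4 - 4 m_1 m_3 - 3 m_2^2 + 12 m_1^2 m_2 - 6 m_1^4
% for zero-mean X this reduces to c_2 = m_2, c_3 = m_3, c_4 = m_4 - 3 m_2^2,
% consistent with the table on the previous slide
```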

SLIDE 11

Moments and cumulants of a set of random variables:

Moments:

(MX^(N))_{i1 i2 . . . iN} = Mom(x_{i1}, x_{i2}, . . . , x_{iN}) := E{x_{i1} x_{i2} . . . x_{iN}}

Cumulants:

(cx)_{i1} = Cum(x_{i1}) := E{x_{i1}}
(Cx)_{i1 i2} = Cum(x_{i1}, x_{i2}) := E{x_{i1} x_{i2}}
(Cx^(3))_{i1 i2 i3} = Cum(x_{i1}, x_{i2}, x_{i3}) := E{x_{i1} x_{i2} x_{i3}}
(Cx^(4))_{i1 i2 i3 i4} = Cum(x_{i1}, x_{i2}, x_{i3}, x_{i4})
    := E{x_{i1} x_{i2} x_{i3} x_{i4}} − E{x_{i1} x_{i2}} E{x_{i3} x_{i4}}
       − E{x_{i1} x_{i3}} E{x_{i2} x_{i4}} − E{x_{i1} x_{i4}} E{x_{i2} x_{i3}}

From order 2 on, the variables are centered: xi ← xi − E{xi}

SLIDE 12

Multivariate case: e.g. moments:

[Figure: the correlation matrix RX = E{X X^T} and the third-order moment tensor MX^(3) = E{X ∘ X ∘ X}, drawn as arrays]

SLIDE 13

⇒  Order 1:  mX := E{X}                    → vector
   Order 2:  RX := E{X X^T}                → matrix
   Order 3:  MX^(3) := E{X ∘ X ∘ X}        → 3rd-order tensor
   Order 4:  MX^(4) := E{X ∘ X ∘ X ∘ X}    → 4th-order tensor
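These moment arrays can be estimated by averaging outer products of the sample vectors; a small numpy sketch (the samples-in-columns layout is my own convention):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 3, 10_000
X = rng.standard_normal((N, T))      # N random variables, T samples in columns

mX = X.mean(axis=1)                                    # order 1: vector
RX = X @ X.T / T                                       # order 2: matrix E{X X^T}
M3 = np.einsum('it,jt,kt->ijk', X, X, X) / T           # order 3: E{X o X o X}
M4 = np.einsum('it,jt,kt,lt->ijkl', X, X, X, X) / T    # order 4

# each estimate is symmetric under any permutation of its indices
```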

SLIDE 14

HOS example

Gaussian distribution: px(x) = 1/(σ√(2π)) · exp(−x²/(2σ²))

  n   mn^x    cn^x
  1   0       0
  2   σ²      σ²
  3   0       0
  4   3σ⁴     0

Uniform distribution on [−a, +a]: px(x) = 1/(2a)

  n   mn^x    cn^x
  1   0       0
  2   a²/3    a²/3
  3   0       0
  4   a⁴/5    −2a⁴/15
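The uniform-distribution entries are easy to confirm by simulation (a sketch; the value a = 2 is an arbitrary choice of mine):

```python
import numpy as np

a = 2.0
rng = np.random.default_rng(3)
u = rng.uniform(-a, a, 1_000_000)

m2 = np.mean(u**2)        # -> a^2 / 3
m4 = np.mean(u**4)        # -> a^4 / 5
c4 = m4 - 3 * m2**2       # -> -2 a^4 / 15  (negative: flatter than Gaussian)
```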

SLIDE 15

ICA: basic equations

Model:

Y = MX

Second order:

C2^Y = E{Y Y^T} = M · C2^X · M^T = C2^X •1 M •2 M

uncorrelated sources: C2^X is diagonal → "diagonalization by congruence":

C2^Y = σ1² M1 M1^T + σ2² M2 M2^T + . . . + σR² MR MR^T

(Mr: r-th column of M)
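The congruence relation C2^Y = M · C2^X · M^T holds exactly even for sample covariances, since Y = MX is just a linear change of variables; a quick check with synthetic uniform sources (all names are mine):

```python
import numpy as np

rng = np.random.default_rng(4)
R, P, T = 3, 3, 100_000
X = rng.uniform(-1, 1, (R, T))      # independent, hence uncorrelated, sources
M = rng.standard_normal((P, R))
Y = M @ X

C2_X = X @ X.T / T                  # close to diagonal
C2_Y = Y @ Y.T / T

# the second-order model equation, exact for the sample covariances
print(np.allclose(C2_Y, M @ C2_X @ M.T))
```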

SLIDE 16

Higher order:

C4^Y = C4^X •1 M •2 M •3 M •4 M

independent sources: C4^X is diagonal → "CANDECOMP / PARAFAC":

C4^Y = λ1 M1 ∘ M1 ∘ M1 ∘ M1 + . . . + λR MR ∘ MR ∘ MR ∘ MR

SLIDE 17

Prewhitening-based computation

Model:

Y = MX

Second order (unit-variance sources, C2^X = I):

C2^Y = E{Y Y^T} = M · C2^X · M^T = M · M^T = (M · Q) · (M · Q)^T   for any orthogonal Q

"square root": EVD, Cholesky, . . .

Remark (PCA): with the SVD M = U · S · V^T:

C2^Y = (U S) · (U S)^T = U · S² · U^T

SLIDE 18

Prewhitening-based computation (2)

Matrix factorization:

M = T · Q

Second order:

C2^Y = C2^X •1 M •2 M = T · T^T

Observed r.v.: Y = M X
Whitened r.v.: Z = T⁻¹ Y = Q X

Higher order (ICA):

C4^Y = C4^X •1 M •2 M •3 M •4 M   ⇒   C4^Z = C4^X •1 Q •2 Q •3 Q •4 Q

"Multilinear symmetric EVD", i.e. CANDECOMP/PARAFAC with orthogonality and symmetry constraints. The source cumulant is theoretically diagonal, but an arbitrary (estimated) symmetric tensor cannot be diagonalized exactly

⇒ different solution strategies
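A minimal whitening sketch along these lines (factor names follow M = T · Q above; the EVD-based square root is one of the options listed, and the synthetic setup is my own):

```python
import numpy as np

rng = np.random.default_rng(5)
R, P, T_samp = 2, 4, 100_000
X = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), (R, T_samp))  # unit-variance sources
M = rng.standard_normal((P, R))
Y = M @ X

# "square root" T of C2^Y via the EVD, keeping the R dominant directions
C2_Y = Y @ Y.T / T_samp
w, U = np.linalg.eigh(C2_Y)
U, w = U[:, -R:], w[-R:]              # eigh sorts eigenvalues in ascending order
T_fac = U * np.sqrt(w)                # C2^Y ~= T_fac @ T_fac.T

Z = np.linalg.pinv(T_fac) @ Y         # whitened: C2^Z ~= I  (Z = Q X)
```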

SLIDE 19

PCA versus ICA

ICA = higher-order fine-tuning of PCA:

  PCA                    ICA
  2nd-order              higher-order
  matrix EVD             tensor EVD
  uncorrelated sources   independent sources
  column space of M      M itself
  always possible        depends on context

Computational cost: cumulant estimation and diagonalization.

SLIDE 20

Illustration

[Figure: two observed signal traces, 400 samples each]

Observations

SLIDE 21

[Figure: two signal traces, 400 samples each]

Sources estimated with PCA

[Figure: two signal traces, 400 samples each]

Sources estimated with ICA

SLIDE 22

Algorithm 1: maximal diagonality

[Figure: Jacobi iteration C(k+1) = C(k) •1 Q •2 Q •3 Q, concentrating energy on the tensor diagonal]

SLIDE 23
  • Maximize the energy on the diagonal by Jacobi-iteration
  • Determination of the optimal rotation angle:
  • order 3, real case: roots of a polynomial of degree 2
  • order 3, complex case: roots of a polynomial of degree 3
  • order 4, real case: roots of a polynomial of degree 4
  • order 4, complex case: . . .
  • [Comon ’94, De Lathauwer ’01]

SLIDE 24

Algorithm 2: maximal diagonality

[Figure: Jacobi iteration C(k+1) = C(k) •1 Q •2 Q •3 Q]

  • The (higher-order) trace is not rotation invariant
  • Maximize the sum of the diagonal entries by Jacobi-iteration
  • Determination of the optimal rotation angle:
  • order 4, real case: roots of a polynomial of degree 2
  • order 4, complex case: roots of a polynomial of degree 3
  • [Comon, Moreau ’97]

SLIDE 25

Algorithm 3: simultaneous EVD

[Figure: the whitened cumulant CZ decomposed as a sum of terms built from the columns Q1, Q2, . . . , QP of the rotation]

  • Maximize energy on the diagonals by Jacobi-iteration
  • Determination of optimal rotation angle:

real case: roots of a polynomial of degree 2; complex case: roots of a polynomial of degree 3

[Cardoso ’94 (JADE)]
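The algebraic fact behind a simultaneous EVD is that symmetric matrices sharing one orthogonal eigenvector matrix can be diagonalized together; in the exact (noise-free) case, diagonalizing a generic linear combination already suffices. A toy sketch with synthetic matrices (not actual cumulant slices; Jacobi-based joint diagonalization, as in JADE, handles the noisy case):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # shared orthogonal factor
A1 = Q @ np.diag(rng.uniform(1.0, 2.0, n)) @ Q.T
A2 = Q @ np.diag(rng.uniform(1.0, 2.0, n)) @ Q.T   # jointly diagonalizable pair

# diagonalize one generic combination; with distinct combined eigenvalues
# its eigenvectors diagonalize every matrix in the family
_, V = np.linalg.eigh(0.6 * A1 + 0.4 * A2)
D1 = V.T @ A1 @ V                                   # ~ diagonal
D2 = V.T @ A2 @ V                                   # ~ diagonal
```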

SLIDE 26

Application: fetal electrocardiogram extraction

Abdominal and thoracic recordings

[Figure: eight abdominal and thoracic recordings over 5 s]

SLIDE 27

ICA results for FECG extraction

Independent components:

[Figure: eight estimated independent components over 5 s]

SLIDE 28

A variant for coloured sources

Condition: sources mutually uncorrelated, but individually correlated in time.

Basic equations:

C2^Y(0) = E{Y(t) Y(t)^T} = M · C2^X(0) · M^T

C2^Y(0) = σ1² M1 M1^T + σ2² M2 M2^T + . . . + σR² MR MR^T

SLIDE 29

C2^Y(τ) = E{Y(t) Y(t + τ)^T} = M · C2^X(τ) · M^T     (C2^X(τ) diagonal)

Variants: nonstationary sources, time-frequency representations, Hessian of the second characteristic function, . . . [Belouchrani et al. ’97 (SOBI)], [De Lathauwer and Castaing ’08] (overcomplete)
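A compact SOBI-flavoured sketch of these equations (the AR(1) sources and the lag choice are my own toy setup): whiten with C2^Y(0), then eigendecompose one symmetrised lagged covariance of the whitened data.

```python
import numpy as np

rng = np.random.default_rng(7)
T, tau = 50_000, 1

def ar1(rho):
    """Unit-variance AR(1) signal: correlated ("coloured") in time."""
    e = rng.standard_normal(T)
    x = np.empty(T)
    x[0] = e[0]
    for t in range(1, T):
        x[t] = rho * x[t - 1] + e[t]
    return x / x.std()

X = np.vstack([ar1(0.9), ar1(0.2)])   # mutually uncorrelated, differently coloured
M = rng.standard_normal((2, 2))
Y = M @ X

# 1) whiten using the zero-lag covariance C2^Y(0)
w, U = np.linalg.eigh(Y @ Y.T / T)
Z = (U / np.sqrt(w)).T @ Y            # C2^Z(0) ~= I

# 2) EVD of one symmetrised lagged covariance C2^Z(tau)
C_tau = Z[:, :-tau] @ Z[:, tau:].T / (T - tau)
_, V = np.linalg.eigh((C_tau + C_tau.T) / 2)
S_hat = V.T @ Z                       # sources up to permutation and sign
```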

SLIDE 30

Large mixtures: more sensors than sources

Applications: EEG, MEG, NMR, hyper-spectral image processing, data analysis, . . .

Prewhitening-based algorithms:

Y = M X             (P × 1) = (P × R)(R × 1),  P ≫ R

M = U · S · V^T     (P × R) = (P × R)(R × R)(R × R)

Z = S⁻¹ · U^T Y = V^T X     (R × 1) = (R × R)(R × 1)
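A sketch of this reduction step (in practice U and S would be estimated from C2^Y, since M itself is unknown; here M is used directly to keep the example short):

```python
import numpy as np

rng = np.random.default_rng(8)
P, R, T = 50, 3, 10_000              # many more sensors than sources
X = rng.uniform(-1, 1, (R, T))
M = rng.standard_normal((P, R))
Y = M @ X

U, s, Vt = np.linalg.svd(M, full_matrices=False)   # economy-size SVD of M
Z = (U / s).T @ Y                                   # Z = S^-1 U^T Y, size R x T

# Z = V^T X: the problem shrinks from P to R dimensions,
# and the remaining mixture V^T is orthogonal
```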

SLIDE 31

Large mixtures: more sensors than sources (2)

Algorithms without prewhitening: best multilinear rank approximation

[Figure: Tucker decomposition of an (I1 × I2 × I3) tensor, A = S •1 U(1) •2 U(2) •3 U(3)]

Tucker decomposition: [Tucker ’64], [De Lathauwer ’00]

SLIDE 32

Large data sets: [Mahoney et al. ’06], [Tyrtyshnikov et al. ’06], [Oseledets et al. ’08]

Orthogonal iteration: [Kroonenberg ’83], [De Lathauwer ’00]

Optimization on manifolds:

  • Newton: [Eldén and Savas ’06], [Ishteva et al. ’08]
  • Quasi-Newton: [Savas and Lim ’08]
  • Trust region: [Ishteva et al. ’09]
  • Conjugate gradient: [Ishteva et al. ’09]

Krylov method: [Savas and Eldén ’08]

SLIDE 33

Conclusion

  • PCA: directions of extremal oriented energy;
    ICA: directions of statistically independent contributions
  • Independence is a stronger condition than uncorrelatedness → unique solution

  • Solution by means of multilinear algebra:
  • maximal diagonality
  • simultaneous EVD
  • CANDECOMP/PARAFAC with symmetry constraint
  • Broad application domain
  • Generalizations for convolutive mixtures
