
An Introduction to Tensor-Based Independent Component Analysis - PowerPoint PPT Presentation



  1. An Introduction to Tensor-Based Independent Component Analysis. Lieven De Lathauwer, K.U.Leuven, Belgium. Lieven.DeLathauwer@kuleuven-kortrijk.be

  2. Overview
     • Problem definition
     • Higher-order statistics
     • Basic ICA equations
     • Specific prewhitening-based multilinear algorithms
     • Application
     • Higher-order-only schemes
     • Variants for coloured sources
     • Dimensionality reduction
     • Conclusions

  3. Independent Component Analysis (ICA)
     Model: Y = M X + N, with Y (P × 1), M (P × R), X (R × 1), N (P × 1)
     [figure: source signals x_1, x_2, x_3 are mixed and then recovered as estimates x̂_1, x̂_2, x̂_3]

  4. Model: Y = M X + N, with Y (P × 1), M (P × R), X (R × 1), N (P × 1)
     Assumptions:
     • the columns of M are linearly independent
     • the components of X are statistically independent
     Goal: identification of M and/or reconstruction of X while observing only Y

  5. Independent Component Analysis (ICA)
     Disciplines: statistics, neural networks, information theory, linear and multilinear algebra, ...
     Indeterminacies: ordering and scaling of the columns (Y = M X)
     Uncorrelated vs. independent (zero-mean variables):
     • X, Y are uncorrelated iff E{XY} = 0
     • X, Y are independent iff p_XY(x, y) = p_X(x) p_Y(y)
     Statistical independence implies:
     • the variables are uncorrelated
     • additional conditions on the higher-order statistics (HOS)
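The gap between "uncorrelated" and "independent" on slide 5 can be checked numerically. A minimal sketch (the distribution and sample size are illustrative choices, not from the slides): X uniform and symmetric, Y = X², so Y is fully determined by X yet uncorrelated with it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)   # zero-mean, symmetric density
y = x**2                              # fully determined by x, hence dependent

corr = np.mean(x * y)                 # E{XY} = E{X^3} = 0: uncorrelated
lhs = np.mean(x**2 * y)               # E{X^2 Y} = E{X^4} = 1/5
rhs = np.mean(x**2) * np.mean(y)      # E{X^2} E{Y} = 1/9: does not factor
```

The fourth-order statistic E{X²Y} fails to factor into E{X²}E{Y}, which is exactly the kind of HOS condition that independence adds on top of uncorrelatedness.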

  6. Algebraic tools:
     Condition       Identification       Tool
     X_i uncorr.     column space of M    matrix EVD/SVD
     X_i indep.      M                    tensor EVD/SVD
     Web site: http://www.tsi.enst.fr/icacentral/index.html (mailing list, data sets, software)

  7. Applications
     • Speech and audio
     • Image processing: feature extraction, image reconstruction, video
     • Telecommunications: OFDM, CDMA, ...
     • Biomedical applications: functional magnetic resonance imaging, electromyogram, electroencephalogram, (fetal) electrocardiogram, mammography, pulse oximetry, (fetal) magnetocardiogram, ...
     • Other applications: text classification, vibratory signals generated by termites (!), electron energy loss spectra, astrophysics, ...

  8. HOS definitions
     Moments and cumulants of a random variable:
     Moments              Cumulants
     m_1^X = E{X}         c_1^X = E{X}                        "mean" (m^X)
     m_2^X = E{X²}        c_2^X = E{(X − m^X)²}               "variance" (σ_X²)
     m_3^X = E{X³}        c_3^X = E{(X − m^X)³}
     m_4^X = E{X⁴}        c_4^X = E{(X − m^X)⁴} − 3 σ_X⁴
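Slide 8's formulas translate directly into sample estimators. A sketch (sample size and seed are arbitrary): for a Gaussian sample, all cumulants above order 2 should come out near zero.

```python
import numpy as np

def cumulants_1to4(x):
    """First four univariate cumulants, estimated from a sample
    via the central-moment formulas on slide 8."""
    m = np.mean(x)
    xc = x - m
    c1 = m
    c2 = np.mean(xc**2)              # variance
    c3 = np.mean(xc**3)
    c4 = np.mean(xc**4) - 3 * c2**2  # 4th cumulant (unnormalised kurtosis)
    return c1, c2, c3, c4

rng = np.random.default_rng(1)
g = rng.standard_normal(200_000)     # zero-mean, unit-variance Gaussian
c1, c2, c3, c4 = cumulants_1to4(g)
```

That c3 and c4 vanish for a Gaussian is what makes higher-order ICA blind to Gaussian sources (see the table on slide 13).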

  9. Characteristic functions
     First characteristic function:
     Φ_x(ω) ≜ E{e^{jωx}} = ∫_{−∞}^{+∞} p_x(x) e^{jωx} dx
     It generates the moments:
     Φ_x(ω) = Σ_{k=0}^{∞} m_k^X (jω)^k / k!     (m_0 = 1)
     Second characteristic function:
     Ψ_x(ω) ≜ ln Φ_x(ω)
     It generates the cumulants:
     Ψ_x(ω) = Σ_{k=1}^{∞} c_k^X (jω)^k / k!

  10. Moments and cumulants of a set of random variables:
     Moments:
     (M_x^(N))_{i1 i2 ... iN} ≜ Mom(x_{i1}, x_{i2}, ..., x_{iN}) = E{x_{i1} x_{i2} ... x_{iN}}
     Cumulants:
     (c_x)_i ≜ Cum(x_i) = E{x_i}
     (C_x)_{i1 i2} ≜ Cum(x_{i1}, x_{i2}) = E{x_{i1} x_{i2}}
     (C_x^(3))_{i1 i2 i3} ≜ Cum(x_{i1}, x_{i2}, x_{i3}) = E{x_{i1} x_{i2} x_{i3}}
     (C_x^(4))_{i1 i2 i3 i4} ≜ Cum(x_{i1}, x_{i2}, x_{i3}, x_{i4}) = E{x_{i1} x_{i2} x_{i3} x_{i4}} − E{x_{i1} x_{i2}} E{x_{i3} x_{i4}} − E{x_{i1} x_{i3}} E{x_{i2} x_{i4}} − E{x_{i1} x_{i4}} E{x_{i2} x_{i3}}
     For order ≥ 2 the variables are centered first: x_i ← x_i − E{x_i}
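The fourth-order cumulant tensor on slide 10 can be estimated in a few lines. A sketch (dimensions and sample size are illustrative): for jointly Gaussian data every entry of C_x^(4) is zero, so the estimate should be small everywhere.

```python
import numpy as np

def cum4(X):
    """Fourth-order cumulant tensor of data X (n_samples, d),
    using slide 10's formula on centered variables."""
    Xc = X - X.mean(axis=0)
    n, d = Xc.shape
    M4 = np.einsum('ni,nj,nk,nl->ijkl', Xc, Xc, Xc, Xc) / n  # 4th moment tensor
    C2 = Xc.T @ Xc / n                                       # covariance
    return (M4
            - np.einsum('ij,kl->ijkl', C2, C2)
            - np.einsum('ik,jl->ijkl', C2, C2)
            - np.einsum('il,jk->ijkl', C2, C2))

rng = np.random.default_rng(2)
G = rng.standard_normal((100_000, 3))   # independent Gaussian components
C4 = cum4(G)
```

For independent non-Gaussian sources the same tensor is diagonal, with the marginal fourth cumulants on the diagonal; that diagonality is the algebraic core of the ICA equations on slide 15.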

  11. Multivariate case, e.g. moments:
     [figure: R_X = E{X ∘ X} visualised as a matrix; M_x^(3) = E{X ∘ X ∘ X} visualised as a third-order tensor]

  12. Moments of increasing order:
     • order 1: m^X ≜ E{X} → vector
     • order 2: R_X ≜ E{X X^T} → matrix
     • order 3: M_x^(3) ≜ E{X ∘ X ∘ X} → 3rd-order tensor
     • order 4: M_x^(4) ≜ E{X ∘ X ∘ X ∘ X} → 4th-order tensor

  13. HOS example
     Gaussian distribution: p_x(x) = 1/(√(2π) σ) exp(−x²/(2σ²))
       n    m_n^x    c_n^x
       1    0        0
       2    σ²       σ²
       3    0        0
       4    3σ⁴      0
     Uniform distribution: p_x(x) = 1/(2a), x ∈ [−a, +a]
       n    m_n^x    c_n^x
       1    0        0
       2    a²/3     a²/3
       3    0        0
       4    a⁴/5     −2a⁴/15
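The uniform row of slide 13's table is easy to verify by simulation. A sketch (a = 2 and the sample size are arbitrary): m_2 = a²/3, m_4 = a⁴/5, and c_4 = m_4 − 3 m_2² = −2a⁴/15.

```python
import numpy as np

a = 2.0
rng = np.random.default_rng(3)
u = rng.uniform(-a, a, 500_000)   # uniform on [-a, +a], zero mean

m2 = np.mean(u**2)                # expect a^2 / 3
m4 = np.mean(u**4)                # expect a^4 / 5
c4 = m4 - 3 * m2**2               # expect -2 a^4 / 15 (negative: sub-Gaussian)
```

The negative c_4 marks the uniform density as sub-Gaussian, which is why it shows up clearly in fourth-order ICA while a Gaussian source (c_4 = 0) would not.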

  14. ICA: basic equations
     Model: Y = M X
     Second order: C_2^Y = E{Y Y^T} = M · C_2^X · M^T = C_2^X •_1 M •_2 M
     Uncorrelated sources: C_2^X is diagonal → "diagonalization by congruence":
     C_2^Y = σ_1² M_1 M_1^T + σ_2² M_2 M_2^T + ... + σ_R² M_R M_R^T
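The rank-one expansion on slide 14 is a direct algebraic identity. A sketch (M and the source variances are made-up numbers): M diag(σ²) M^T equals the sum of R scaled outer products of the columns of M.

```python
import numpy as np

rng = np.random.default_rng(4)
P, R = 5, 3
M = rng.standard_normal((P, R))      # hypothetical mixing matrix
sig2 = np.array([1.0, 2.0, 0.5])     # source variances (diagonal C2^X)

# C2^Y = M diag(sig2) M^T ...
C2Y = M @ np.diag(sig2) @ M.T
# ... equals the sum of R rank-one terms sigma_r^2 M_r M_r^T.
C2Y_sum = sum(sig2[r] * np.outer(M[:, r], M[:, r]) for r in range(R))
```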

  15. Higher order: C_4^Y = C_4^X •_1 M •_2 M •_3 M •_4 M
     Independent sources: C_4^X is diagonal → "CANDECOMP/PARAFAC":
     C_4^Y = λ_1 M_1 ∘ M_1 ∘ M_1 ∘ M_1 + ... + λ_R M_R ∘ M_R ∘ M_R ∘ M_R
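Slide 15's fourth-order identity is the tensor analogue of the matrix expansion on slide 14. A sketch (M and the λ values are illustrative): the multilinear transform of a diagonal tensor equals a sum of symmetric rank-one terms.

```python
import numpy as np

rng = np.random.default_rng(5)
P, R = 4, 2
M = rng.standard_normal((P, R))   # hypothetical mixing matrix
lam = np.array([1.5, -0.8])       # source 4th cumulants (diagonal C4^X)

# C4^Y as the multilinear transform of the diagonal tensor ...
C4Y = np.einsum('r,ir,jr,kr,lr->ijkl', lam, M, M, M, M)
# ... equals the CP sum of R terms lambda_r * M_r o M_r o M_r o M_r.
C4Y_sum = sum(lam[r] * np.einsum('i,j,k,l->ijkl',
                                 M[:, r], M[:, r], M[:, r], M[:, r])
              for r in range(R))
```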

  16. Prewhitening-based computation
     Model: Y = M X
     Second order (unit-variance sources): C_2^Y = E{Y Y^T} = M · C_2^X · M^T = M · I · M^T = M · M^T = (M · Q) · (M · Q)^T for any orthogonal Q
     ⇒ "square root" of C_2^Y: EVD, Cholesky, ...
     Remark (PCA): with the SVD M = U · S · V^T:
     C_2^Y = (U S) · (U S)^T = U · S² · U^T
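A prewhitening step along the lines of slide 16 can be sketched with an EVD square root (the mixture, sources, and sizes here are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
R, n = 3, 200_000
# Unit-variance independent sources (uniform on [-sqrt(3), sqrt(3)]).
X = rng.uniform(-np.sqrt(3), np.sqrt(3), (R, n))
M = rng.standard_normal((R, R))     # hypothetical square mixing matrix
Y = M @ X

# Square root of the covariance via EVD: C2^Y = E diag(d) E^T, T = E diag(d)^{1/2}.
C2Y = Y @ Y.T / n
d, E = np.linalg.eigh(C2Y)
T = E @ np.diag(np.sqrt(d))

Z = np.linalg.solve(T, Y)           # whitened data: sample covariance = I
C2Z = Z @ Z.T / n
```

After this step only an orthogonal factor Q remains unknown, which is exactly the ambiguity the higher-order algorithms on the following slides resolve.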

  17. Prewhitening-based computation (2)
     Matrix factorization: M = T · Q
     Second order: C_2^Y = C_2^X •_1 M •_2 M = T · T^T
     Observed r.v.: Y = M X;  whitened r.v.: Z = T^{−1} Y = Q X
     Higher order (ICA): C_4^Y = C_4^X •_1 M •_2 M •_3 M •_4 M
     ⇒ C_4^Z = C_4^X •_1 Q •_2 Q •_3 Q •_4 Q
     "multilinear symmetric EVD", i.e. CANDECOMP/PARAFAC with orthogonality and symmetry constraints
     The source cumulant C_4^X is theoretically diagonal, but an arbitrary (estimated) symmetric tensor cannot be exactly diagonalized ⇒ different solution strategies

  18. PCA versus ICA
     ICA = higher-order fine-tuning of PCA:
     PCA                       ICA
     2nd-order                 higher-order
     matrix EVD                tensor EVD
     uncorrelated sources      independent sources
     column space of M         M itself
     always possible           depends on context
     Computational cost: cumulant estimation and diagonalization

  19. Illustration
     [figure: the two observed mixture signals, 400 samples each] Observations

  20. [figure: the two signals estimated with PCA, 400 samples each] Sources estimated with PCA
     [figure: the two signals estimated with ICA, 400 samples each] Sources estimated with ICA

  21. Algorithm 1: maximal diagonality
     [figure: one Jacobi step C^(k) → C^(k+1): rotations Q applied along each mode concentrate the off-diagonal entries (×) onto the diagonal]

  22. • Maximize the energy on the diagonal by Jacobi iteration
     • Determination of the optimal rotation angle:
       order 3, real:     roots of a polynomial of degree 2
       order 3, complex:  roots of a polynomial of degree 3
       order 4, real:     roots of a polynomial of degree 4
       order 4, complex:  -
     [Comon ’94, De Lathauwer ’01]

  23. Algorithm 2: maximal diagonality
     [figure: one Jacobi step C^(k) → C^(k+1)]
     • The trace is not rotation invariant
     • Maximize the sum of the diagonal entries by Jacobi iteration
     • Determination of the optimal rotation angle:
       order 4, real:     roots of a polynomial of degree 2
       order 4, complex:  roots of a polynomial of degree 3
     [Comon, Moreau ’97]

  24. Algorithm 3: simultaneous EVD
     [figure: the matrix slices of C^Z are simultaneously diagonalized by a single orthogonal Q]
     • Maximize the energy on the diagonals by Jacobi iteration
     • Determination of the optimal rotation angle:
       real:     roots of a polynomial of degree 2
       complex:  roots of a polynomial of degree 3
     [Cardoso ’94 (JADE)]
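The simultaneous-EVD idea of slide 24 can be sketched in the noiseless case (the matrices here are synthetic, and a single EVD stands in for JADE's Jacobi sweep, which handles the approximate case): matrices sharing one orthogonal eigenvector basis Q are all diagonalized once Q is recovered from any generic linear combination.

```python
import numpy as np

rng = np.random.default_rng(7)
R = 4
Q, _ = np.linalg.qr(rng.standard_normal((R, R)))   # shared orthogonal factor
# Exactly jointly diagonalizable "cumulant slices": Q D_i Q^T.
slices = [Q @ np.diag(rng.standard_normal(R)) @ Q.T for _ in range(3)]

# One EVD of a random combination recovers Q up to column sign/order;
# JADE's Jacobi iteration solves the same problem robustly when the
# slices are only approximately jointly diagonalizable.
combo = sum(rng.standard_normal() * S for S in slices)
_, Qhat = np.linalg.eigh(combo)

def offdiag(A):
    """Largest off-diagonal magnitude."""
    return np.abs(A - np.diag(np.diag(A))).max()

off = max(offdiag(Qhat.T @ S @ Qhat) for S in slices)
```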

  25. Application: fetal electrocardiogram extraction
     [figure: eight abdominal and thoracic recordings, 0–5 s]

  26. ICA results for FECG extraction
     [figure: the eight estimated independent components, 0–5 s]

  27. A variant for coloured sources
     Condition: sources mutually uncorrelated, but individually correlated in time
     Basic equations:
     C_2^Y(0) = E{Y(t) Y(t)^T} = M · C_2^X(0) · M^T
              = σ_1² M_1 M_1^T + σ_2² M_2 M_2^T + ... + σ_R² M_R M_R^T

  28. C_2^Y(τ) = E{Y(t) Y(t + τ)^T} = M · C_2^X(τ) · M^T
     Variants: nonstationary sources, time-frequency representations, Hessian of the second characteristic function, ...
     [Belouchrani et al. ’97 (SOBI)], [De Lathauwer and Castaing ’08] (overcomplete)
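A two-matrix special case of the SOBI idea on slides 27–28 can be sketched as follows (the AR(1) sources, mixing matrix, and lag are my own illustrative choices, not from the slides): whiten with C(0), then one EVD of the whitened lagged covariance recovers the remaining rotation.

```python
import numpy as np

rng = np.random.default_rng(8)
n, R = 50_000, 2
# Coloured sources: mutually uncorrelated AR(1) processes with
# different autocorrelation coefficients.
e = rng.standard_normal((R, n))
X = np.zeros((R, n))
ar = np.array([0.9, -0.5])
for t in range(1, n):
    X[:, t] = ar * X[:, t - 1] + e[:, t]
M = np.array([[1.0, 0.6], [0.4, 1.0]])   # hypothetical mixing matrix
Y = M @ X

def lagged_cov(Y, tau):
    """Symmetrised lagged covariance E{Y(t) Y(t+tau)^T}."""
    n = Y.shape[1]
    C = Y[:, : n - tau] @ Y[:, tau:].T / (n - tau)
    return (C + C.T) / 2

d, E = np.linalg.eigh(lagged_cov(Y, 0))
W = np.diag(d**-0.5) @ E.T               # whitening matrix
Ct = W @ lagged_cov(Y, 5) @ W.T          # whitened lag-5 covariance
_, V = np.linalg.eigh(Ct)
S_hat = V.T @ W @ Y                      # source estimates (up to sign/order)
```

This uses only second-order statistics, so it also separates Gaussian sources, provided their lagged autocorrelations differ; full SOBI jointly diagonalizes several lags for robustness.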

  29. Large mixtures: more sensors than sources
     Applications: EEG, MEG, NMR, hyperspectral image processing, data analysis, ...
     Prewhitening-based algorithms:
     Y = M X, with Y (P × 1), M (P × R), X (R × 1), P ≫ R
     Thin SVD: M = U · S · V^T, with U (P × R), S (R × R), V (R × R)
     Z = S^{−1} · U^T · Y = V^T · X, with Z (R × 1)
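The reduction on slide 29 can be sketched with the SVD of the data matrix (a stand-in for the SVD of the unknown M, valid here because the sources are white with unit variance; sizes and distributions are illustrative): the R dominant left singular vectors span the column space of M, and scaling by the inverse singular values whitens the reduced data.

```python
import numpy as np

rng = np.random.default_rng(9)
P, R, n = 50, 3, 20_000
# Unit-variance independent sources (uniform on [-sqrt(3), sqrt(3)]).
X = rng.uniform(-np.sqrt(3), np.sqrt(3), (R, n))
M = rng.standard_normal((P, R))          # tall mixing matrix, P >> R
Y = M @ X                                # P-channel observations

# Thin SVD of the noiseless data: only R singular values are nonzero.
U, s, _ = np.linalg.svd(Y, full_matrices=False)
Ur = U[:, :R]
sr = s[:R] / np.sqrt(n)                  # normalised singular values
Z = np.diag(1 / sr) @ Ur.T @ Y           # R x n reduced, whitened data
C2Z = Z @ Z.T / n
```

The R-dimensional Z then feeds any of the prewhitening-based ICA algorithms above, at a fraction of the P-dimensional cost.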

  30. Large mixtures: more sensors than sources (2)
     Algorithms without prewhitening: best multilinear rank approximation
     [figure: Tucker decomposition of an I_1 × I_2 × I_3 tensor A as a core tensor S multiplied along each mode by factor matrices U^(1), U^(2), U^(3)]
     Tucker decomposition: [Tucker ’64], [De Lathauwer ’00]
