SLIDE 1

The Joint Block Diagonalization (JBD) problem: a tensor framework.

Dimitri Nion & Lieven De Lathauwer

K.U. Leuven, Kortrijk campus, Belgium

E-mails: Dimitri.Nion@kuleuven-kortrijk.be, Lieven.DeLathauwer@kuleuven-kortrijk.be

TDA 2010, Monopoli, Italy, September 13-17, 2010

SLIDE 2

Joint-Block-Diagonalization (JBD): model

[Figure: each observed I x I matrix X_k factors as X_k = A D_k A^T, where A = [A_1, …, A_R] is I x N with blocks A_r of size I x L_r, and D_k = blockdiag(D_1k, …, D_Rk) with blocks D_rk of size L_r x L_r.]

X_k = A D_k A^T (+ N_k)  or  X_k = A D_k A^H (+ N_k),   k = 1, …, K.

JBD is a generalization of JD (Joint Diagonalization)/INDSCAL: for L_1 = … = L_R = 1, each D_k is diagonal and every slice reduces to

X_k = c_1(k) a_1 a_1^T + … + c_R(k) a_R a_R^T.
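To make the model concrete, here is a minimal NumPy sketch (all dimensions and the random data are illustrative, not taken from the talk) that builds block-diagonal matrices D_k and the corresponding noiseless slices X_k = A D_k A^T:

```python
import numpy as np

rng = np.random.default_rng(0)
I, K = 6, 30
L = [2, 2, 2]             # block sizes L_1, ..., L_R
N = sum(L)                # A = [A_1, ..., A_R] is I x N

A = rng.standard_normal((I, N))

# Each D_k = blockdiag(D_1k, ..., D_Rk); slice X_k = A D_k A^T (exact model)
D = np.zeros((K, N, N))
X = np.empty((K, I, I))
for k in range(K):
    ofs = 0
    for Lr in L:
        D[k, ofs:ofs + Lr, ofs:ofs + Lr] = rng.standard_normal((Lr, Lr))
        ofs += Lr
    X[k] = A @ D[k] @ A.T
```

By construction the off-block-diagonal part of every D_k is zero, which is exactly the structure JBD tries to recover.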

SLIDE 3

JBD : ambiguities

[Figure: X_k = A D_k A^T = (A Z)(Z^{-1} D_k Z^{-T})(A Z)^T, so the pair Ã = A Z, D̃_k = Z^{-1} D_k Z^{-T} reproduces the same set of slices {X_k}_{k=1}^K.]

Observation: if you choose Z arbitrarily, you lose the JBD structure (the matrices {D̃_k}_{k=1}^K are no longer block-diagonal). Question: what is the structure of Z such that the JBD model is still valid?

SLIDE 4

JBD : essential uniqueness

[Figure: Ã = [Ã_1, Ã_2, Ã_3] equals [A_1, A_3, A_2] with each block multiplied by a nonsingular factor Λ_r: the estimated blocks match the true blocks up to scaling factors and a permutation of the blocks.]

Solving a JBD problem = estimation of {Span(A_r)}_{r=1,…,R}, in an arbitrary order. The JBD of {X_k}_{k=1}^K is said essentially unique if the only remaining ambiguity is Ã = A Λ Π, i.e., Z = Λ Π, where

  • Λ is an arbitrary block-diagonal matrix,
  • Π is an arbitrary block-wise permutation matrix.

SLIDE 5

JBD: State of the art

JBD is becoming a popular signal-processing tool in applications such as:

  • Blind Source Separation (BSS) of convolutive mixtures in the time domain,
  • Independent Subspace Analysis.

Two approaches in the literature:

  • Approach 1: Unitary-JBD [Abed-Meraim and Belouchrani, 2004]

« A is a square unitary matrix » (A^T A = I)

  • Approach 2: Non-Unitary JBD
  • Approach 2.1: [H. Ghennioui, N. Thirion-Moreau, E. Moreau, 2008, 2010]

« A is tall and full column-rank. » → In practice, their approach only works if A is a square non-unitary matrix.

  • Approach 2.2: this talk

« A can be a tall, square or fat non-unitary matrix »
→ JBD is a particular instance of Block-Component-Decompositions
→ Computation by a gradient-based algorithm
→ In the square case, better performance than 2.1

SLIDE 6
Joint-Block-Diagonalization : state of the art (1)

  • Approach 1: Unitary-JBD [Abed-Meraim and Belouchrani, 2004]
  • A is a square unitary matrix (A^T A = I)

X_k = A D_k A^T (+ N_k)   →   A^T X_k A = D_k + (A^T N_k A)

Criterion:

max_A Σ_{k=1}^K || bdiag(A^T X_k A) ||_F^2   ⇔   min_A Σ_{k=1}^K || offbdiag(A^T X_k A) ||_F^2

SLIDE 7
Joint-Block-Diagonalization : state of the art (2)

  • Approach 2: Non-Unitary-JBD [H. Ghennioui, N. Thirion-Moreau, E. Moreau, 2008, 2010]
  • A is tall and full column-rank (let B = A^†, then BA = I)

X_k = A D_k A^T (+ N_k)   →   B X_k B^T = D_k + (B N_k B^T)

Criterion:

max_B Σ_{k=1}^K || bdiag(B X_k B^T) ||_F^2   ⇔   min_B Σ_{k=1}^K || offbdiag(B X_k B^T) ||_F^2

SLIDE 8

Joint-Block-Diagonalization : state of the art (3)

  • Approach 2: Non-Unitary-JBD [H. Ghennioui, N. Thirion-Moreau, E. Moreau, 2008, 2010]
  • Two gradient-descent based algorithms, JBD-OG and JBD-ORG, to solve

min_B Σ_{k=1}^K || offbdiag(B X_k B^T) ||_F^2,   (1)

where B X_k B^T = (B A) D_k (B A)^T + B N_k B^T.

  • Drawbacks of this approach:
  • B = 0 is a trivial minimizer.
  • The under-determined case (A fat, I < N) is not handled, since it is assumed that BA = I.
  • In fact, the over-determined case (A tall, I > N) is not successfully handled either: if B = [B_1^T, B_2^T]^T is a solution of an exact JBD problem, then a matrix B̃ obtained by replacing one sub-block of B with rows orthogonal to the column space of A still satisfies offbdiag(B̃ X_k B̃^T) = 0, so B̃ is also a solution of (1); but B̃ is not a solution of the JBD problem, because B̃ A is not full rank.

SLIDE 9

Joint-Block-Diagonalization : our contributions

  • Starting point: the gradient-descent based algorithms JBD-OG and JBD-ORG of [H. Ghennioui, N. Thirion-Moreau, E. Moreau, 2008, 2010] for Non-Unitary JBD can only handle the case where A is square (I = N).
  • Main motivation: propose a novel technique to solve non-unitary JBD problems that can handle the exactly-, over- and under-determined cases (i.e., A may be square, tall or fat).
  • Main contributions:
  • Formulate JBD as a tensor decomposition fitting problem;
  • Build a Conjugate Gradient (CG) algorithm to compute the tensor decomposition;
  • In the over-determined case, build a good starting point for any JBD algorithm;
  • Application: blind source separation via Independent Subspace Analysis (ISA).
SLIDE 10

JBD in tensor format : link to BCD

Stack the matrices X_1, …, X_K as the frontal slices of an I x I x K tensor X, and the blocks D_r1, …, D_rK as the frontal slices of an L_r x L_r x K tensor D_r. Then

X = Σ_{r=1}^R D_r ×_1 A_r ×_2 A_r
(or X = Σ_{r=1}^R D_r ×_1 A_r ×_2 A_r^* in case of hermitian symmetry).

→ JD is a particular case of CANDECOMP/PARAFAC.
→ JBD is a particular case of the BCD-(L_r, M_r, .) (« Block-Component-Decomposition in rank-(L_r, M_r, .) terms »).

SLIDE 11

JBD in tensor format : conditions for essential uniqueness

  • BCD-(L_r, M_r, .):  X = Σ_{r=1}^R D_r ×_1 A_r ×_2 B_r,  where A = [A_1, …, A_R] is I by N and B = [B_1, …, B_R] is J by Q.
  • Theorem [De Lathauwer, 2008]: suppose that rank(A) = N, rank(B) = Q, K > 2 and that the tensors {D_r}_{r=1,…,R} are generic; then the BCD-(L_r, M_r, .) of X is essentially unique (sufficient condition).
  • JBD:  X = Σ_{r=1}^R D_r ×_1 A_r ×_2 A_r,  where A = [A_1, …, A_R] is I by N.
  • The same theorem can be invoked (the proof still holds with A instead of B).
  • In summary, it means that JBD is generically unique if K > 2 and rank(A) = N.
  • This is only a sufficient condition: uniqueness still holds, but is harder to prove, in several cases where the condition is not satisfied.
  • For instance, uniqueness may still hold when rank(A) = I (A fat, I < N).
SLIDE 12

JBD in tensor format : cost function

min_{A, {D_k}} φ_LS,   with   φ_LS = Σ_{k=1}^K || X_k − A D_k A^T ||_F^2 = || X − (A ⊙ A) D ||_F^2,

where:
→ A ⊙ A = [A_1 ⊗ A_1, …, A_R ⊗ A_R] is the block-wise Khatri-Rao product (block-wise Kronecker product),
→ X and N are the I^2 x K matrix unfoldings of the I x I x K tensors X and N, respectively,
→ D is the (Σ_r L_r^2) x K matrix whose k-th column stacks vec(D_1k), …, vec(D_Rk).
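The equality between the slice-wise cost and the unfolded cost can be checked numerically. The sketch below (illustrative sizes, random data) relies on the identity vec(A_r D_rk A_r^T) = (A_r ⊗ A_r) vec(D_rk), using column-major vec:

```python
import numpy as np

rng = np.random.default_rng(1)
I, K, Lsz = 6, 5, [2, 2, 2]
offs = np.cumsum([0] + Lsz)
N = offs[-1]
A = rng.standard_normal((I, N))

# Block-diagonal D_k (stored block-wise) and noisy slices X_k
Dblocks = [[rng.standard_normal((Lr, Lr)) for Lr in Lsz] for _ in range(K)]
X = np.empty((K, I, I))
for k in range(K):
    X[k] = 0.1 * rng.standard_normal((I, I))        # noise term N_k
    for r in range(len(Lsz)):
        Ar = A[:, offs[r]:offs[r + 1]]
        X[k] += Ar @ Dblocks[k][r] @ Ar.T

# Slice-wise cost: sum_k ||X_k - A D_k A^T||_F^2
phi_slices = 0.0
for k in range(K):
    M = X[k].copy()
    for r in range(len(Lsz)):
        Ar = A[:, offs[r]:offs[r + 1]]
        M -= Ar @ Dblocks[k][r] @ Ar.T
    phi_slices += np.linalg.norm(M) ** 2

# Unfolded cost: ||X - (A kr A) D||_F^2 with the block-wise Khatri-Rao product
KR = np.hstack([np.kron(A[:, offs[r]:offs[r + 1]], A[:, offs[r]:offs[r + 1]])
                for r in range(len(Lsz))])           # I^2 x sum(L_r^2)
Xmat = np.stack([X[k].flatten(order="F") for k in range(K)], axis=1)
Dmat = np.stack([np.concatenate([Dblocks[k][r].flatten(order="F")
                                 for r in range(len(Lsz))])
                 for k in range(K)], axis=1)
phi_unfold = np.linalg.norm(Xmat - KR @ Dmat) ** 2
```

The two values agree to machine precision, which confirms the unfolding used by the cost function.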

SLIDE 13

JBD in tensor format : derivation of the gradient

Let N_k denote the residual of slice k and N its I^2 x K unfolding.

Real-valued data: X_k = A D_k A^T + N_k

∇_A φ_LS = −2 Σ_{k=1}^K ( N_k A D_k^T + N_k^T A D_k ),   ∇_D φ_LS = −2 (A ⊙ A)^T N.

Complex-valued data, standard symmetry: X_k = A D_k A^T + N_k

∇_{A*} φ_LS = −Σ_{k=1}^K ( N_k A^* D_k^H + N_k^T A^* D_k^* ),   ∇_{D*} φ_LS = −(A ⊙ A)^H N.

Complex-valued data, hermitian symmetry: X_k = A D_k A^H + N_k

∇_{A*} φ_LS = −Σ_{k=1}^K ( N_k A D_k^H + N_k^H A D_k ),   ∇_{D*} φ_LS = −(A^* ⊙ A)^H N.
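The real-valued gradient with respect to A can be validated against finite differences; in this sketch the D_k are left unstructured (the block-diagonal constraint does not change the formula for ∇_A) and all sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
I, N, K = 5, 4, 6
A = rng.standard_normal((I, N))
D = rng.standard_normal((K, N, N))
X = np.stack([A @ D[k] @ A.T for k in range(K)]) \
    + 0.1 * rng.standard_normal((K, I, I))

def phi(A):
    return sum(np.linalg.norm(X[k] - A @ D[k] @ A.T) ** 2 for k in range(K))

def grad(A):
    # dphi/dA = -2 sum_k (N_k A D_k^T + N_k^T A D_k), with N_k the residual
    G = np.zeros_like(A)
    for k in range(K):
        Nk = X[k] - A @ D[k] @ A.T
        G += -2.0 * (Nk @ A @ D[k].T + Nk.T @ A @ D[k])
    return G

# Central finite-difference check on a few random entries
G = grad(A)
eps = 1e-6
for _ in range(5):
    i, j = rng.integers(I), rng.integers(N)
    Ap = A.copy(); Ap[i, j] += eps
    Am = A.copy(); Am[i, j] -= eps
    fd = (phi(Ap) - phi(Am)) / (2 * eps)
    assert abs(fd - G[i, j]) < 1e-4 * max(1.0, abs(G[i, j]))
```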

SLIDE 14

Conjugate Gradient algorithm for JBD: JBD-CG

While (stop criterion not satisfied):
  • 1. Compute ∇_A φ_LS and ∇_D φ_LS from A and D.
  • 2. Update search directions:
  • 2.1. Compute β (e.g., with the stabilized Polak-Ribière formula).
  • 2.2. New search directions: d_A ← −∇_A φ_LS + β d_A  and  d_D ← −∇_D φ_LS + β d_D.
  • 3. Joint Exact Line Search: (μ_A, μ_D) = argmin_{μ_A, μ_D} φ_LS(A + μ_A d_A, D + μ_D d_D).
  • 4. Update unknowns: A ← A + μ_A d_A  and  D ← D + μ_D d_D.
End while
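The loop above can be sketched in a few lines of NumPy. This is not the authors' implementation: it uses a plain Polak-Ribière+ formula and a simple backtracking line search in place of the stabilized β and the joint exact line search of the talk, and all sizes and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
I, K, Lsz = 6, 10, [2, 2, 2]
offs = np.cumsum([0] + Lsz)
N = offs[-1]
mask = np.zeros((N, N), dtype=bool)          # block-diagonal support of the D_k
for r, Lr in enumerate(Lsz):
    mask[offs[r]:offs[r] + Lr, offs[r]:offs[r] + Lr] = True

A_true = rng.standard_normal((I, N))
D_true = np.stack([np.where(mask, rng.standard_normal((N, N)), 0.0)
                   for _ in range(K)])
X = np.stack([A_true @ D_true[k] @ A_true.T for k in range(K)])

def phi(A, D):
    return sum(np.linalg.norm(X[k] - A @ D[k] @ A.T) ** 2 for k in range(K))

def grads(A, D):
    GA, GD = np.zeros_like(A), np.zeros_like(D)
    for k in range(K):
        Nk = X[k] - A @ D[k] @ A.T
        GA += -2.0 * (Nk @ A @ D[k].T + Nk.T @ A @ D[k])
        GD[k] = np.where(mask, -2.0 * A.T @ Nk @ A, 0.0)  # keep D_k block-diagonal
    return GA, GD

A = A_true + 0.1 * rng.standard_normal((I, N))            # perturbed start
D = D_true.copy()
phi_start = phi(A, D)
GA, GD = grads(A, D)
dA, dD = -GA, -GD
for it in range(200):
    # backtracking line search; restart along steepest descent if it fails
    f0, mu = phi(A, D), 1e-2
    while mu > 1e-12 and phi(A + mu * dA, D + mu * dD) >= f0:
        mu *= 0.5
    if mu <= 1e-12:
        dA, dD = -GA, -GD
        continue
    A, D = A + mu * dA, D + mu * dD
    GA_new, GD_new = grads(A, D)
    beta = max(0.0, (np.sum(GA_new * (GA_new - GA))
                     + np.sum(GD_new * (GD_new - GD)))
                    / (np.sum(GA * GA) + np.sum(GD * GD)))  # Polak-Ribiere+
    dA, dD = -GA_new + beta * dA, -GD_new + beta * dD
    GA, GD = GA_new, GD_new
```

Starting close to the solution, the cost decreases by orders of magnitude, which is all this toy run is meant to show.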

SLIDE 15

Step 3 of JBD-CG : Joint Exact Line Search

→ Solve  min_{μ_A, μ_D} φ_LS(A + μ_A d_A, D + μ_D d_D), with

φ_LS(μ_A, μ_D) = || X − ((A + μ_A d_A) ⊙ (A + μ_A d_A)) (D + μ_D d_D) ||_F^2.   (1)

→ For a fixed μ_A, (1) is quadratic in μ_D, so the optimal μ_D is a ratio of two polynomials in μ_A:

μ_D = Q_1(μ_A) / Q_2(μ_A).   (2)

→ Substitute (2) in (1): φ_LS becomes a rational function of μ_A alone.   (3)

→ Minimize (3) w.r.t. μ_A (degree-7 polynomial rooting).
→ Given μ_A, μ_D is given by (2).

SLIDE 16

JBD in tensor format : a closed-form solution (when A is full column-rank)

  • Idea: when A is square or tall (full column-rank), the exact (noise-free) JBD problem can be solved by eigenvalue analysis. In the presence of noise, this solution can be used to initialize iterative algorithms.
  • Derivation in the square case (I = N). Take two matrices:

X_1 = A D_1 A^T,   X_2 = A D_2 A^T.

→ We have X_1 X_2^{-1} = A D_12 A^{-1}, where D_12 = D_1 D_2^{-1} is block-diagonal. So Span(A_r), of dimension L_r, is an invariant subspace of X_1 X_2^{-1}, r = 1, …, R.
→ EVD: X_1 X_2^{-1} = E Λ E^{-1}, where Λ is diagonal. We have A = E Π, where Π is a permutation matrix that groups the eigenvectors.
→ Find Π by checking the permuted block-diagonal structure of the matrices E^{-1} X_k E^{-T}, k = 1, …, K.
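The invariant-subspace property behind this closed form is easy to verify numerically. In the sketch below (random square case, illustrative sizes), every eigenvector of X_1 X_2^{-1} is checked to lie in one of the subspaces Span(A_r):

```python
import numpy as np

rng = np.random.default_rng(4)
Lsz = [2, 2, 2]
offs = np.cumsum([0] + Lsz)
N = offs[-1]
A = rng.standard_normal((N, N))            # square case I = N

def bdiag():
    D = np.zeros((N, N))
    for r, Lr in enumerate(Lsz):
        D[offs[r]:offs[r + 1], offs[r]:offs[r + 1]] = rng.standard_normal((Lr, Lr))
    return D

X1 = A @ bdiag() @ A.T
X2 = A @ bdiag() @ A.T

# EVD of X1 X2^{-1} = A (D1 D2^{-1}) A^{-1}
w, E = np.linalg.eig(X1 @ np.linalg.inv(X2))

# Distance from each eigenvector to each Span(A_r): the smallest should be ~0
min_dists = []
for j in range(N):
    v = E[:, j]
    dists = []
    for r in range(len(Lsz)):
        Ar = A[:, offs[r]:offs[r + 1]].astype(complex)
        proj = Ar @ np.linalg.lstsq(Ar, v, rcond=None)[0]
        dists.append(np.linalg.norm(v - proj))
    min_dists.append(min(dists))
```

Grouping the eigenvectors by the subspace they fall into recovers {Span(A_r)} up to the block-diagonal and block-permutation ambiguities.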

SLIDE 17

Performance index

  • We have: Â = A Λ Π + N, where
  • Λ is an unknown block-diagonal matrix,
  • Π is an unknown block-wise permutation matrix,
  • N is the estimation error.
  • Compute Λ and Π such that A Λ Π matches Â in the least-squares sense.
  • Compute the relative error:

ε_rel = || Â − A Λ Π ||_F / || A ||_F.
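A simple stand-in for this index (illustrative, not the authors' exact procedure): match each block of Â to its best block of A in the least-squares sense, greedily rather than via a joint fit over Λ and Π, and accumulate the residuals:

```python
import numpy as np

rng = np.random.default_rng(5)
Lsz = [2, 2]
offs = np.cumsum([0] + Lsz)
N = offs[-1]
A = rng.standard_normal((6, N))

# Simulated estimate: blocks rescaled (Lambda), swapped (Pi), plus small error
Lam = [rng.standard_normal((Lr, Lr)) for Lr in Lsz]
A_hat = np.hstack([A[:, 2:4] @ Lam[1], A[:, 0:2] @ Lam[0]])
A_hat += 1e-3 * rng.standard_normal(A_hat.shape)

def fit(Ar, Br):                        # residual of min_M ||Br - Ar M||_F
    M = np.linalg.lstsq(Ar, Br, rcond=None)[0]
    return np.linalg.norm(Br - Ar @ M)

resid = 0.0
for rb in range(len(Lsz)):              # each block of A_hat ...
    Br = A_hat[:, offs[rb]:offs[rb + 1]]
    # ... matched to the best-fitting block of A
    resid += min(fit(A[:, offs[r]:offs[r + 1]], Br)
                 for r in range(len(Lsz))) ** 2

eps_rel = np.sqrt(resid) / np.linalg.norm(A)
```

With a small simulated estimation error, eps_rel is small even though the blocks were rescaled and permuted, which is exactly the invariance the index is designed to have.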

SLIDE 18

Numerical experiments (1)

Exactly-determined case: I = N = 6, L1 = L2 = L3 = 2, R = 3, K = 30

Comparison of our JBD-CG technique with JBD-OG of [H. Ghennioui, N. Thirion-Moreau, E. Moreau, 2008, 2010]

SLIDE 19

Numerical experiments (2)

Under-determined case: I = 6, N = 8, L1 = L2 = L3 = L4 = 2, R = 4, K = 100

SLIDE 20

Application: Blind Subspace Separation

[Figure: each primary source s_r[p] is passed through L_r FIR filters h_r1, …, h_rL_r, yielding the secondary sources x_r1[p], …, x_rL_r[p]; the N secondary sources are stacked in x[p] and mixed as y[p] = A x[p].]

  • R primary sources (mutually uncorrelated).
  • Set of FIR filters.
  • N = Σ_{r=1}^R L_r secondary sources:
  • mutually correlated within the same subset,
  • mutually uncorrelated if not in the same subset.
  • Linear instantaneous mixing (M sensors), with observations y_1[p], …, y_M[p]. Mixing matrix: A is M by N.

SLIDE 21

Blind Subspace Separation via Second-Order-Statistics (SOS)

y[p] = A x[p]

  • Covariance matrix:

R_yy[τ] = E{ y[p + τ] y[p]^H } = A E{ x[p + τ] x[p]^H } A^H = A D[τ] A^H,

where D[τ] is block-diagonal because of the assumptions on the secondary sources.

  • Set of K covariance matrices:

R_yy[τ_1] = A D_1 A^H, …, R_yy[τ_K] = A D_K A^H   →   JBD problem!

  • Estimate A by the JBD-CG algorithm.
  • In the (over-)determined case (A square or tall), LS estimate of the secondary sources: x̂[p] = Â^† y[p].
  • Blind SIMO identification stage to recover the primary sources from the secondary ones; see, e.g., E. Moulines, P. Duhamel, J.-F. Cardoso, and S. Mayrargue, "Subspace Methods for the Blind Identification of Multichannel FIR Filters," IEEE Trans. Signal Proc., vol. 43, pp. 516–525, 1995.
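The block-diagonal structure of the lagged covariances can be seen on a toy simulation. Everything here is illustrative (white primary sources, random length-5 filters, two subsets of two secondary sources):

```python
import numpy as np

rng = np.random.default_rng(6)
P = 200_000
Lsz = [2, 2]                               # L_1 = L_2 = 2 filters per source

# R = 2 mutually uncorrelated (white) primary sources
s = rng.standard_normal((2, P))

# Secondary sources: each primary source through L_r random FIR filters
x = []
for r, Lr in enumerate(Lsz):
    for _ in range(Lr):
        h = rng.standard_normal(5)         # random length-5 FIR filter
        x.append(np.convolve(s[r], h, mode="same"))
x = np.array(x)                            # N = 4 secondary sources

def cov(x, tau):                           # empirical E{x[p+tau] x[p]^T}
    return x[:, tau:] @ x[:, :P - tau].T / (P - tau)

scale = np.linalg.norm(cov(x, 0)[:2, :2])
for tau in [0, 1, 2]:
    C = cov(x, tau)
    # cross-covariance between the two subsets is ~0: block-diagonal structure
    assert np.linalg.norm(C[:2, 2:]) < 0.05 * scale
```

Up to estimation noise, every lagged covariance is block-diagonal in the basis of the secondary sources, which is what turns the set {R_yy[τ_k]} into a JBD problem after mixing by A.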

SLIDE 22

Practical example: separation of speech signals

  • R = 3 primary sources.
  • Number of filters for each source: L1 = L2 = L3 = 4; each filter of length 50, generated randomly → N = 12 secondary sources.
  • M = 14 sensors (A is 14 by 12).
  • K = 21 covariance matrices.

[Figure: waveforms of the primary sources s1, s2, s3.]

SLIDE 23

Practical example: separation of speech signals

[Figure: waveforms of the M = 14 observed mixtures y1, y2, y3, …, y14.]

The JBD of the K covariance matrices of the observations yields Â and the block-diagonal matrices {D_k}_{k=1}^K; the secondary sources are then estimated as x̂[p] = Â^† y[p].

SLIDE 24

Practical example: separation of speech signals

[Figure: waveforms of the estimated primary sources ŝ1, ŝ2, ŝ3.]

→ Separation is successful.

SLIDE 25

Conclusion

  • JBD is a generalization of JD.
  • JD is a particular case of CANDECOMP/PARAFAC; JBD is a particular case of the Block-Component-Decomposition (BCD).
  • Uniqueness conditions for the BCD can be invoked.
  • In the exactly- and over-determined cases, we proposed an EVD-based technique useful for good initialization of JBD algorithms.
  • We proposed a JBD-CG algorithm that works in the exactly-, over- and under-determined cases.
  • Application: JBD-CG can accurately achieve blind subspace separation based on Second-Order Statistics.