SLIDE 1

11-755 Machine Learning for Signal Processing

Independent Component Analysis

Class 20, 8 Nov 2012. Instructor: Bhiksha Raj

SLIDE 2

A brief review of basic probability

 Uncorrelated: Two random variables X and Y are uncorrelated iff:

 The average value of the product of the variables equals the product of their individual averages

 E[XY] = E[X]E[Y]

 Setup: Each draw produces one instance of X and one instance of Y

 I.e., one instance of (X, Y)

 The average value of X is the same regardless of the value of Y

SLIDE 3

Uncorrelatedness

 Which of the above represent uncorrelated RVs?

SLIDE 4

A brief review of basic probability

 Independence: Two random variables X and Y are independent iff:

 Their joint probability equals the product of their individual probabilities

 P(X,Y) = P(X)P(Y)

 The average value of X is the same regardless of the value of Y

 E[X|Y] = E[X]

SLIDE 5

A brief review of basic probability

 Independence: Two random variables X and Y are independent iff:

 The average value of any function of X is the same regardless of the value of Y

 E[f(X)g(Y)] = E[f(X)] E[g(Y)] for all f(), g()

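A quick numerical check of the two definitions (a minimal numpy sketch of ours, not from the original slides): X uniform on [−1, 1] and Y = X² are uncorrelated, yet fail the independence test E[f(X)g(Y)] = E[f(X)]E[g(Y)] for f(x) = x², g(y) = y.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100_000)   # zero-mean, symmetric RV
y = x ** 2                            # a deterministic function of x

# Uncorrelated: E[XY] ~ E[X]E[Y]  (both sides ~ 0 here)
print(np.mean(x * y), np.mean(x) * np.mean(y))

# Not independent: E[X^2 * Y] = E[X^4] = 0.2, but E[X^2]E[Y] = (1/3)^2 ~ 0.111
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))
```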
SLIDE 6

Independence

 Which of the above represent independent RVs?

 Which represent uncorrelated RVs?

SLIDE 7

A brief review of basic probability

(Figure: an odd function f(x) overlaid on a symmetric PDF p(x).)

 The expected value of an odd function of an RV is 0 if:

 The RV is 0 mean

 The PDF of the RV is symmetric around 0

 E[f(X)] = 0 if f(X) is odd symmetric

SLIDE 8

A brief review of basic info. theory

 Entropy: The minimum average number of bits to transmit to convey a symbol:

H(X) = Σ_X P(X)[−log P(X)]

 Joint entropy: The minimum average number of bits to convey sets (pairs, here) of symbols:

H(X,Y) = Σ_{X,Y} P(X,Y)[−log P(X,Y)]

(Example streams: X ∈ {T(all), M(ed), S(hort), …}; Y ∈ {M, F, …})

SLIDE 9

A brief review of basic info. theory

(Example streams: X = T, M, S, …; Y = M, F, F, M, …)

H(X|Y) = Σ_Y P(Y) Σ_X P(X|Y)[−log P(X|Y)] = Σ_{X,Y} P(X,Y)[−log P(X|Y)]

 Conditional entropy: The minimum average number of bits to transmit to convey a symbol X, after symbol Y has already been conveyed

 Averaged over all values of X and Y

SLIDE 10

A brief review of basic info. theory

 Conditional entropy of X = H(X) if X is independent of Y:

H(X|Y) = Σ_Y P(Y) Σ_X P(X|Y)[−log P(X|Y)] = Σ_X P(X)[−log P(X)] = H(X)

 Joint entropy of X and Y is the sum of the entropies of X and Y if they are independent:

H(X,Y) = Σ_{X,Y} P(X,Y)[−log P(X,Y)] = Σ_{X,Y} P(X,Y)[−log P(X)P(Y)]
   = −Σ_{X,Y} P(X,Y) log P(X) − Σ_{X,Y} P(X,Y) log P(Y) = H(X) + H(Y)

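These identities are easy to verify numerically. A minimal sketch (ours, not the slides'; the distributions are made up, and the joint is built with an outer product so that X and Y are independent by construction):

```python
import numpy as np

def entropy(p):
    """H = sum of p * (-log2 p); zero-probability cells contribute nothing."""
    p = p[p > 0]
    return float(np.sum(p * -np.log2(p)))

px = np.array([0.5, 0.3, 0.2])      # P(X), e.g. over {T, M, S}
py = np.array([0.6, 0.4])           # P(Y), e.g. over {M, F}
pxy = np.outer(px, py)              # joint P(X,Y) under independence

H_X, H_Y, H_XY = entropy(px), entropy(py), entropy(pxy.ravel())
print(H_XY, H_X + H_Y)              # equal: H(X,Y) = H(X) + H(Y)
print(H_XY - H_Y, H_X)              # equal: H(X|Y) = H(X,Y) - H(Y) = H(X)
```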
SLIDE 11

Onward..

SLIDE 12

Projection: multiple notes

(Figure: spectrogram M and a set of note spectra W.)

 P = W(WᵀW)⁻¹Wᵀ

 Projected spectrogram = P·M

SLIDE 13

We’re actually computing a score

(Figure: M ≈ W·H, with H the unknown score.)

 M ~ WH

 H = pinv(W)M

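A small numpy sketch of this scoring step (ours; the sizes and random "notes" are stand-ins): when M is exactly WH and W has full column rank, pinv recovers H, and the reverse direction W = M pinv(H) recovers W.

```python
import numpy as np

rng = np.random.default_rng(0)
W = np.abs(rng.standard_normal((1024, 4)))       # 4 "note" spectra (freq x note)
H_true = np.abs(rng.standard_normal((4, 200)))   # their "transcription" (note x time)
M = W @ H_true                                   # the spectrogram

H = np.linalg.pinv(W) @ M          # the score: H = pinv(W) M
W_back = M @ np.linalg.pinv(H)     # the other way: W = M pinv(H)
print(np.allclose(H, H_true), np.allclose(W_back, W))   # True True
```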
SLIDE 14

How about the other way?

(Figure: given M and a transcription H, find the notes W; then U = WH.)

 M ~ WH

 W = M pinv(H)

 U = WH

SLIDE 15

So what are we doing here?

(Figure: unknown H, known W.)

 M ~ WH is an approximation

 Given W, estimate H to minimize the error:

Ĥ = argmin_H ||M − WH||²_F = argmin_H Σ_i Σ_j (M_ij − (WH)_ij)²

 Must ideally find the transcription of the given notes

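In numpy, this least-squares estimate is exactly what lstsq computes column by column; for a full-column-rank W it coincides with pinv(W)M. A sketch with made-up sizes and noise:

```python
import numpy as np

rng = np.random.default_rng(1)
W = np.abs(rng.standard_normal((1024, 4)))
H_true = np.abs(rng.standard_normal((4, 200)))
M = W @ H_true + 0.1 * rng.standard_normal((1024, 200))  # M ~ WH only approximately

H, *_ = np.linalg.lstsq(W, M, rcond=None)     # argmin_H ||M - WH||_F^2
print(np.allclose(H, np.linalg.pinv(W) @ M))  # True
```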
SLIDE 16

Going the other way..

(Figure: known H, unknown W.)

 M ~ WH is an approximation

 Given H, estimate W to minimize the error:

Ŵ = argmin_W ||M − WH||²_F = argmin_W Σ_i Σ_j (M_ij − (WH)_ij)²

 Must ideally find the notes corresponding to the transcription

SLIDE 17

When both parameters are unknown

approx(M) = WH, with H = ? and W = ?

 Must estimate both H and W to best approximate M

 Ideally, must learn both the notes and their transcription!

SLIDE 18

A least squares solution

Ŵ, Ĥ = argmin_{W,H} ||M − WH||²_F

 Unconstrained: for any W, H that minimizes the error, W′ = WA, H′ = A⁻¹H also minimizes the error, for any invertible A

 For our problem, let's consider the "truth"..

 When one note occurs, the other does not:

h_i h_jᵀ = 0 for all i ≠ j  (h_i is the i-th row of H)

 The rows of H are uncorrelated

SLIDE 19

A least squares solution

 Assume: HHᵀ = I

 Normalizing all rows of H to length 1

 Then pinv(H) = Hᵀ

 Projecting M onto H: W = M pinv(H) = MHᵀ, so WH = MHᵀH

 The problem Ŵ, Ĥ = argmin_{W,H} ||M − WH||²_F becomes:

Ĥ = argmin_H ||M − MHᵀH||²_F

 Constraint: Rank(H) = 4

SLIDE 20

Finding the notes

Ĥ = argmin_H ||M − MHᵀH||²_F

 Note HᵀH ≠ I

 Only HHᵀ = I

 Could also be rewritten as (with Correlation = MᵀM):

Ĥ = argmin_H trace((I − HᵀH)ᵀ MᵀM (I − HᵀH))
  = argmin_H trace(Correlation·(I − HᵀH))
  = argmax_H trace(H·Correlation·Hᵀ)

SLIDE 21

Finding the notes

 Constraint: every row of H has length 1:

Ĥ = argmax_H [trace(H·Correlation·Hᵀ) + trace(Λ(I − HHᵀ))]

 Differentiating and equating to 0:

Correlation·Hᵀ = Hᵀ·Λ

 Simply requiring the rows of H to be orthonormal gives us that H is the set of eigenvectors of the data in Mᵀ

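A compact numpy sketch of this eigenvector solution (ours; the random M and K = 4 are stand-ins). The rows of the optimal H are eigenvectors of MᵀM, i.e. right singular vectors of M:

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.abs(rng.standard_normal((1024, 200)))   # stand-in spectrogram
K = 4                                          # rank constraint: 4 "notes"

# Right singular vectors of M = eigenvectors of M^T M
_, _, Vt = np.linalg.svd(M, full_matrices=False)
H = Vt[:K]                     # K orthonormal rows: H H^T = I
W = M @ H.T                    # W = M pinv(H) = M H^T

print(np.linalg.norm(M - W @ H) / np.linalg.norm(M))  # best rank-K relative error
```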
SLIDE 22

Equivalences

Ĥ = argmax_H [trace(H·Correlation·Hᵀ) + trace(Λ(I − HHᵀ))]

 is identical to:

Ŵ, Ĥ = argmin_{W,H} ||M − WH||²_F + Σ_i λ_i ||h_i||² + Σ_{i≠j} λ_ij h_i h_jᵀ

 Minimize the least squares error with the constraint that the rows of H are length 1 and orthogonal to one another

SLIDE 23

So how does that work?

 There are 12 notes in the segment, hence we try to estimate 12 notes..

SLIDE 24

So how does that work?

 The first three "notes" and their contributions

 The spectrograms of the notes are statistically uncorrelated

SLIDE 25

Finding the notes

 Can find W instead of H:

Ŵ = argmin_W ||M − WWᵀM||²_F

 Solving the above, with the constraints that the columns of W are orthonormal, gives you the eigenvectors of the data in M:

Ŵ = argmax_W [trace(Wᵀ·Correlation·W) + trace(Λ(I − WᵀW))];   Correlation·W = W·Λ

SLIDE 26

So how does that work?

 There are 12 notes in the segment, hence we try to estimate 12 notes..

SLIDE 27

Our notes are not orthogonal

 Overlapping frequencies

 Notes occur concurrently

 The harmonica continues to resonate from the previous note

 More generally, simple orthogonality will not give us the desired solution

SLIDE 28

What else can we look for?

 Assume: The "transcription" of one note does not depend on what else is playing

 Or, in a multi-instrument piece, instruments are playing independently of one another

 Not strictly true, but still..

SLIDE 29

Formulating it with Independence

Ŵ, Ĥ = argmin_{W,H} ||M − WH||²_F   such that the rows of H are independent

 Impose statistical independence constraints on the decomposition

SLIDE 30

Changing problems for a bit

m₁(t) = w₁₁h₁(t) + w₁₂h₂(t)
m₂(t) = w₂₁h₁(t) + w₂₂h₂(t)

 Two people speak simultaneously, producing h₁(t) and h₂(t)

 Recorded by two microphones

 Each recorded signal is a mixture of both signals

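A tiny numpy sketch of this mixing model (ours; the source waveforms and mixing weights are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000)
h1 = np.sign(np.sin(2 * np.pi * 5 * t))   # stand-in for speaker 1
h2 = rng.uniform(-1.0, 1.0, t.size)       # stand-in for speaker 2
H = np.vstack([h1, h2])                   # sources (speakers x time)

W = np.array([[0.7, 0.3],                 # w11, w12
              [0.4, 0.6]])                # w21, w22
M = W @ H                                 # rows of M: the two microphone signals
```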
SLIDE 31

Imposing Statistical Constraints

M = WH, with W = [w₁₁ w₁₂; w₂₁ w₂₂]

 M = "mixed" signal: its rows are the signal at mic 1 and the signal at mic 2

 W = "notes"

 H = "transcription": its rows are the signal from speaker 1 and the signal from speaker 2

 Given only M, estimate H

 Ensure that the components of the vectors in the estimated H are statistically independent

 Multiple approaches..

SLIDE 32

Imposing Statistical Constraints

M = WH, with W = [w₁₁ w₁₂; w₂₁ w₂₂]

 Given only M, estimate H:

 H = W⁻¹M = AM

 Estimate A such that the components of AM are statistically independent

 A is the unmixing matrix

 Multiple approaches..

SLIDE 33

Statistical Independence

 M = WH; H = AM

 Emulating independence:

 Compute W (or A) and H such that H has statistical characteristics that are observed in statistically independent variables

 Enforcing independence:

 Compute W and H such that the components of H are independent

SLIDE 34

Emulating Independence

H  The rows of H are uncorrelated  The rows of H are uncorrelated

 E[hihj] = E[hi]E[hj]  hi and hj are the ith and jth components of any vector in H

j

 The fourth order moments are independent

 E[h h h h ] = E[h ]E[h ]E[h ]E[h ]  E[hihjhkhl] = E[hi]E[hj]E[hk]E[hl]  E[hi

2hjhk] = E[hi 2]E[hj]E[hk]

 E[hi

2hj 2] = E[hi 2]E[hj 2] j j

 Etc.

SLIDE 35

Zero Mean

 Usual to assume zero mean processes

 Otherwise, some of the math doesn't work well

 M = WH; H = AM

 If mean(M) = 0, then mean(H) = 0:

 E[H] = A·E[M] = A·0 = 0

 First step of ICA: set the mean of M to 0:

μ = (1/cols(M)) Σ_i m_i;   m_i ← m_i − μ

 m_i are the columns of M

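As a two-line numpy sketch (ours) of this centering step:

```python
import numpy as np

def center(M):
    """Subtract the mean column mu = (1/cols(M)) * sum_i m_i from every column."""
    return M - M.mean(axis=1, keepdims=True)
```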
SLIDE 36

Emulating Independence..

(Figure: H = AM with A = BC, so H = BCM.)

 Independence ⇒ uncorrelatedness

 Estimate a C such that CM is uncorrelated:

 X = CM

 E[x_i x_j] = E[x_i]E[x_j] = δ_ij [since M is now "centered"]

 XXᵀ = I

 In reality, we only want this to be a diagonal matrix, but we'll make it identity

SLIDE 37

Decorrelating

 X = CM; require XXᵀ = I

 Eigen decomposition: MMᵀ = USUᵀ

 Let C = S^(−1/2)Uᵀ

 Then CMMᵀCᵀ = S^(−1/2)Uᵀ·USUᵀ·U·S^(−1/2) = I

SLIDE 38

Decorrelating

 X = CM; XXᵀ = I

 Eigen decomposition: MMᵀ = ESEᵀ

 Let C = S^(−1/2)Eᵀ

 Then CMMᵀCᵀ = S^(−1/2)Eᵀ·ESEᵀ·E·S^(−1/2) = I

 X is called the whitened version of M

 The process of decorrelating M is called whitening

 C is the whitening matrix

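A numpy sketch of whitening (ours; assumes M is already centered and its autocorrelation has no zero eigenvalues):

```python
import numpy as np

def whiten(M):
    """Return X = C M with C = S^(-1/2) E^T, where M M^T / N = E diag(S) E^T."""
    R = M @ M.T / M.shape[1]       # sample autocorrelation
    S, E = np.linalg.eigh(R)       # eigen decomposition of the symmetric R
    C = np.diag(S ** -0.5) @ E.T   # whitening matrix
    return C @ M, C

# After whitening, X X^T / N is (numerically) the identity:
# the rows of X are uncorrelated with unit variance.
```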
SLIDE 39

Uncorrelated != Independent

 Whitening merely ensures that the resulting signals are uncorrelated, i.e. E[x_i x_j] = 0 if i ≠ j

 This does not ensure that higher order moments are also decoupled, e.g. it does not ensure that E[x_i²x_j²] = E[x_i²]E[x_j²]

 This is one of the signatures of independent RVs

 Let's explicitly decouple the fourth order moments

SLIDE 40

Decorrelating

 X = CM; XXᵀ = I; H = BX

 Will multiplying X by B re-correlate the components?

 Not if B is unitary: BBᵀ = BᵀB = I

 So we want to find a unitary matrix B

 Since the rows of H are uncorrelated

 Because they are independent

SLIDE 41

ICA: Freeing Fourth Moments

 We have E[x_i x_j] = 0 if i ≠ j

 Already been decorrelated

 A = BC; H = BCM; X = CM; H = BX

 The fourth moments of H have the form E[h_i h_j h_k h_l]

 If the rows of H were independent:

E[h_i h_j h_k h_l] = E[h_i]E[h_j]E[h_k]E[h_l]

 Solution: compute B such that the fourth moments of H = BX are decoupled

 While ensuring that B is unitary

SLIDE 42

ICA: Freeing Fourth Moments

 Create a matrix of fourth moment terms that would be diagonal were the rows of H independent, and diagonalize it

 A good candidate (good because it incorporates the energy in all rows of H):

D = [d₁₁ d₁₂ d₁₃ …; d₂₁ d₂₂ d₂₃ …; …]

 Where d_ij = E[Σ_k h_k² h_i h_j]

 i.e. D = E[hᵀh·h hᵀ], where the h are the columns of H

 Assuming h is real; else replace transposition with the Hermitian

SLIDE 43

ICA: The D matrix

D = [d₁₁ d₁₂ d₁₃ …; d₂₁ d₂₂ d₂₃ …; …]

d_ij = E[Σ_k h_k² h_i h_j] = (1/cols(H)) Σ_m (Σ_k h_km²) h_im h_jm

 Σ_k h_km² is the sum of squares of all components of column m; h_im and h_jm are its i-th and j-th components

 Average the term above across all columns of H

SLIDE 44

ICA: The D matrix

D = [d₁₁ d₁₂ d₁₃ …; d₂₁ d₂₂ d₂₃ …; …],   d_ij = E[Σ_k h_k² h_i h_j]

 If the h_i terms were independent:

 For i ≠ j:

E[Σ_k h_k² h_i h_j] = E[h_i³ h_j] + E[h_i h_j³] + Σ_{k≠i,j} E[h_k² h_i h_j]
   = E[h_i³]E[h_j] + E[h_i]E[h_j³] + Σ_{k≠i,j} E[h_k²]E[h_i]E[h_j]

 Centered: E[h_j] = 0, so E[Σ_k h_k² h_i h_j] = 0 for i ≠ j

 For i = j:

E[Σ_k h_k² h_i²] = E[h_i⁴] + Σ_{k≠i} E[h_k²]E[h_i²]

 Thus, if the h_i terms were independent, d_ij = 0 if i ≠ j

 i.e., if the h_i were independent, D would be a diagonal matrix

 Let us diagonalize D

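A numpy sketch of this check (ours; Laplace-distributed rows stand in for independent zero-mean sources):

```python
import numpy as np

def fourth_moment_matrix(H):
    """d_ij = (1/cols(H)) * sum_m (sum_k h_km^2) h_im h_jm, i.e. E[h^T h  h h^T]."""
    sq = np.sum(H ** 2, axis=0)        # sum of squares of all components, per column
    return (H * sq) @ H.T / H.shape[1]

rng = np.random.default_rng(0)
H = rng.laplace(size=(3, 100_000))     # independent zero-mean rows
print(np.round(fourth_moment_matrix(H), 2))   # approximately diagonal
```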
SLIDE 45

Diagonalizing D

 Compose a fourth order matrix from X

 Recall: X = CM, H = BX = BCM

 B is what we're trying to learn, to make H independent

 Compose D′ = E[xᵀx·x xᵀ]

 Diagonalize D′ via eigen decomposition:

D′ = UΛUᵀ

 B = Uᵀ

 That's it!!!!

SLIDE 46

B frees the fourth moment

D’ = UUT ; B = UT

 U is a unitary matrix, i.e. UTU = UUT = I (identity)  H = BX = UTX  h = UTx  The fourth moment matrix of H is

E[hT h h hT] = E[xTUUTx UT x xTU] = E[xTx UT x xTU] = UT E[xTx xxT]U U E[x x xx ]U = UT D’ U = UT U  U T U = 

47

 The fourth moment matrix of H = UTX is Diagonal!!

SLIDE 47

Overall Solution

 H = AM = BCM = BX

 A = BC = UᵀC
SLIDE 48

Independent Component Analysis

Goal: to derive a matrix A such that the rows of AM are independent.

Procedure:

1. "Center" M
2. Compute the autocorrelation matrix R_MM of M
3. Compute the whitening matrix C via eigen decomposition: R_MM = ESEᵀ, C = S^(−1/2)Eᵀ
4. Compute X = CM
5. Compute the fourth moment matrix D′ = E[xᵀx·x xᵀ]
6. Diagonalize D′ via eigen decomposition
7. D′ = UΛUᵀ
8. Compute A = UᵀC

The fourth moment matrix of H = AM is diagonal.

 Note that the autocorrelation matrix of H will also be diagonal
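The whole procedure fits in a few lines of numpy. A sketch (ours, under the slides' assumptions: real data, a nonsingular autocorrelation, and sources separable by this single fourth-moment matrix):

```python
import numpy as np

def ica_fourth_moment(M):
    """Steps 1-8 above: center, whiten, then diagonalize D' = E[x^T x  x x^T]."""
    M = M - M.mean(axis=1, keepdims=True)        # 1. center
    R = M @ M.T / M.shape[1]                     # 2. autocorrelation R_MM
    S, E = np.linalg.eigh(R)                     # 3. R_MM = E S E^T
    C = np.diag(S ** -0.5) @ E.T                 #    C = S^(-1/2) E^T
    X = C @ M                                    # 4. whiten
    sq = np.sum(X ** 2, axis=0)                  # 5. D' = E[(x^T x) x x^T]
    D = (X * sq) @ X.T / X.shape[1]
    _, U = np.linalg.eigh(D)                     # 6-7. D' = U L U^T
    return U.T @ C                               # 8. A = U^T C

# On the earlier two-microphone mixture, H = A M recovers the two speakers
# up to permutation and scale (only approximately; see the shortcomings
# discussed on the next slide).
```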

SLIDE 49

ICA by diagonalizing moment matrices

 The procedure just outlined, while fully functional, has shortcomings

 Only a subset of fourth order moments are considered

 There are many other ways of constructing fourth-order moment matrices that would ideally be diagonal

 Diagonalizing the particular fourth-order moment matrix we have chosen is not guaranteed to diagonalize every other fourth-order moment matrix

 JADE (Joint Approximate Diagonalization of Eigenmatrices), J.F. Cardoso

 Jointly diagonalizes several fourth-order moment matrices

 More effective than the procedure shown, but more computationally expensive
SLIDE 50

Enforcing Independence

 Specifically ensure that the components of H are independent

 H = AM

 Define and minimize a contrast function F(AM)

 Contrast function: a non-linear function that has a minimum value when the output components are independent

 Contrast functions are often only approximations too..
SLIDE 51

A note on pre-whitening

 The mixed signal is usually "prewhitened"

 Normalize variance along all directions

 Eliminate second-order dependence

 X = CM

 E[x_i x_j] = E[x_i]E[x_j] = δ_ij for centered signals

 Eigen decomposition MMᵀ = ESEᵀ; C = S^(−1/2)Eᵀ

 Can use only the first K columns of E, if only K independent sources are expected

 In a microphone array setup: only K < M sources
SLIDE 52

The contrast function

 Contrast function: a non-linear function that has a minimum value when the output components are independent

 An explicit contrast function is the mutual information:

I(H) = Σ_i H(h_i) − H(H)

 With constraint: H = BX

 X is "whitened" M
SLIDE 53

Linear Functions

 h = Bx

 Individual columns of the H and X matrices

 x is the mixed signal, B is the unmixing matrix

P_h(h) = |B|⁻¹ P_x(B⁻¹h)

H(h) = H(x) + log|B|, where H(x) = −∫ P(x) log P(x) dx
SLIDE 54

The contrast function

I(H) = Σ_i H(h_i) − H(H) = Σ_i H(h_i) − H(x) − log|B|

 Ignoring H(x) (a constant):

J(H) = Σ_i H(h_i) − log|B|

 Minimize the above to obtain B
SLIDE 55

An alternate approach

 Recall PCA: M = WH, the columns of W must be statistically independent

 Leads to: min_W ||M − WWᵀM||²

 Error minimization framework to estimate W

 Can we arrive at an error minimization framework for ICA?

 Define an "error" objective that represents independence
SLIDE 56

An alternate approach

 Definition of independence: if x and y are independent:

 E[f(x)g(y)] = E[f(x)]E[g(y)]

 Must hold for every f() and g()!!
SLIDE 57

An alternate approach

 Define g(H) = g(BX) (a component-wise function): [g(H)]_ij = g(h_ij)

 Similarly, define f(H) = f(BX)
SLIDE 58

An alternate approach

 P = g(H)·f(H)ᵀ = g(BX)·f(BX)ᵀ

P_ij = Σ_k g(h_ik) f(h_jk)

 This is a square matrix

 It must ideally be Q, where:

Q_ij = Σ_k Σ_l g(h_ik) f(h_jl) for i ≠ j;   Q_ii = Σ_k g(h_ik) f(h_ik)

 Error = ||P − Q||²_F
SLIDE 59

An alternate approach

 Ideal value for Q:

Q_ij = Σ_k Σ_l g(h_ik) f(h_jl) for i ≠ j;   Q_ii = Σ_k g(h_ik) f(h_ik)

 If g() and f() are odd symmetric functions:

 Σ_j g(h_ij) = 0 for all i

 Since Σ_j h_ij = 0 (H is centered)

 Q is a diagonal matrix!!!
SLIDE 60

An alternate approach

 Minimize the error:

P = g(BX)·f(BX)ᵀ;   Q = Diagonal

error = ||P − Q||²_F

 Leads to a trivial Widrow-Hoff type iterative rule:

E = Diag(g(BX)·f(BX)ᵀ) − g(BX)·f(BX)ᵀ

B ← B + η·E·B
SLIDE 61

Update Rules

 Multiple solutions under different assumptions for g() and f()

 H = BX;  B ← B + η·ΔB

 Jutten-Hérault: online update

 ΔB_ij = f(h_i)g(h_j); actually assumed a recursive neural network

 Bell-Sejnowski:

 ΔB = [Bᵀ]⁻¹ − g(H)Xᵀ

SLIDE 62

Update Rules

 Multiple solutions under different assumptions for g() and f()

 H = BX;  B ← B + η·ΔB

 Natural gradient: f() = the identity function

 ΔB = (I − g(H)Hᵀ)B

 Cichocki-Unbehauen:

 ΔB = (I − g(H)f(H)ᵀ)B
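A minimal numpy sketch of the natural-gradient iteration (ours; it assumes whitened input X, a tanh score g() suited to super-Gaussian sources, and made-up step size and iteration count):

```python
import numpy as np

def natural_gradient_ica(X, eta=0.01, iters=2000, seed=0):
    """Iterate dB = (I - g(H) H^T) B with g = tanh on whitened X."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    B = np.eye(d) + 0.01 * rng.standard_normal((d, d))  # near-identity start
    for _ in range(iters):
        H = B @ X
        dB = np.eye(d) - np.tanh(H) @ H.T / N   # E[.] estimated over columns
        B += eta * dB @ B
    return B
```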

SLIDE 63

What are g() and f()?

 Must be odd symmetric functions

 Multiple functions proposed, e.g.:

g(x) = x + tanh(x) if x is super-Gaussian;  g(x) = x − tanh(x) if x is sub-Gaussian

 For audio signals in general:

 ΔB = (I − HHᵀ − K·tanh(H)Hᵀ)B

 Or simply:

 ΔB = (I − K·tanh(H)Hᵀ)B
SLIDE 64

So how does it work?

 Example with an instantaneous mixture of two speakers

 Natural gradient update

 Works very well!
SLIDE 65

Another example!

(Figure panels: Input, Mix, Output.)
SLIDE 66

Another Example

 Three instruments..
SLIDE 67

The Notes

 Three instruments..

SLIDE 68

ICA for data exploration

 The "bases" in PCA represent the "building blocks"

 Ideally notes

 Very successfully used

 So can ICA be used to do the same?
SLIDE 69

ICA vs PCA bases

(Figure: ICA vs. PCA directions on non-Gaussian data.)

 Motivation for using ICA vs PCA:

 PCA will indicate orthogonal directions of maximal variance

 May not align with the data!

 ICA finds directions that are independent

 More likely to "align" with the data
SLIDE 70

Finding useful transforms with ICA

 Audio preprocessing example: take a lot of audio snippets, concatenate them in a big matrix, and do component analysis

 PCA results in the DCT bases

 ICA returns time/frequency-localized sinusoids, which is a better way to analyze sounds

 Ditto for images

 ICA returns localized edge filters
SLIDE 71

Example case: ICA-faces vs. Eigenfaces

(Figure panels: ICA-faces, Eigenfaces.)
SLIDE 72

ICA for Signal Enhancement

 Very commonly used to enhance EEG signals

 EEG signals are frequently corrupted by heartbeat and biorhythm signals

 ICA can be used to separate them out
SLIDE 73

So how does that work?

 There are 12 notes in the segment, hence we try to estimate 12 notes..
SLIDE 74

PCA solution

 There are 12 notes in the segment, hence we try to estimate 12 notes..
SLIDE 75

So how does this work: ICA solution

 Better.. but not much

 But what are the issues here?
SLIDE 76

ICA Issues

 No sense of order

 Unlike PCA

 Get K independent directions, but no notion of the "best" direction

 So the sources can come in any order: permutation invariance

 Does not have a sense of scaling

 Scaling a signal does not affect its independence

 Outputs are scaled versions of the desired signals in permuted order

 In the best case; in the worst case, the outputs are not the desired signals at all..
SLIDE 77

What else went wrong?

 Assumed the distribution of signals is symmetric around the mean

 Note energy here is not symmetric: negative values never happen

 Still, this didn't affect the three-instrument case..

 Notes are not independent

 Only one note plays at a time

 If one note plays, other notes are not playing
SLIDE 78

Continue in next class..

 NMF

 Factor analysis..