slide-1
SLIDE 1

Machine Learning for Signal Processing Independent Component Analysis

Class 8. 24 Sep 2015 Instructor: Bhiksha Raj

11755/18797 1

slide-2
SLIDE 2

Revisiting the Covariance Matrix

  • Assuming centered data
  • C = Σ_X X X^T = X1 X1^T + X2 X2^T + ….

  • Let us view C as a transform..

11755/18797 2
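The sum-of-outer-products form of the covariance can be checked directly. A minimal numpy sketch (the data and names are illustrative, assuming vectors are stored as columns):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 100))       # 100 two-dimensional vectors, one per column
X = X - X.mean(axis=1, keepdims=True)   # center the data

# Covariance as one matrix product ...
C = X @ X.T
# ... equals the sum of outer products X1 X1^T + X2 X2^T + ...
C_sum = sum(np.outer(X[:, i], X[:, i]) for i in range(X.shape[1]))

assert np.allclose(C, C_sum)
```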

slide-3
SLIDE 3

Covariance matrix as a transform

  • (X1 X1^T + X2 X2^T + …) V = X1 X1^T V + X2 X2^T V + …

  • Consider a 2-vector example

– In two dimensions for illustration

11755/18797 3

slide-4
SLIDE 4

Covariance Matrix as a transform

  • Data comprises only 2 vectors..
  • Major axis of component ellipses proportional to twice the

length of the corresponding vector

4

slide-5
SLIDE 5

Covariance Matrix as a transform

  • Data comprises only 2 vectors..
  • Major axis of component ellipses proportional to twice the

length of the corresponding vector

5

Adding

slide-6
SLIDE 6

Covariance Matrix as a transform

  • More vectors..
  • Major axis of component ellipses proportional to twice the

length of the corresponding vector

11755/18797 6

Adding

slide-7
SLIDE 7

Covariance Matrix as a transform

  • More vectors..
  • Major axis of component ellipses proportional to twice the

length of the corresponding vector

11755/18797 7

Adding

slide-8
SLIDE 8

Covariance Matrix as a transform

  • And still more vectors..
  • Major axis of component ellipses proportional to twice the

length of the corresponding vector

11755/18797 8

slide-9
SLIDE 9

Covariance Matrix as a transform

  • The covariance matrix captures the directions of

maximum variance

  • What does it tell us about trends?

11755/18797 9

slide-10
SLIDE 10

Data Trends: Axis aligned covariance

  • Axis aligned covariance
  • At any X value, the average Y value of vectors is 0

– X cannot predict Y

  • At any Y, the average X of vectors is 0

– Y cannot predict X

  • The X and Y components are uncorrelated

11755/18797 10

slide-11
SLIDE 11

Data Trends: Tilted covariance

  • Tilted covariance
  • The average Y value of vectors at any X varies with X

– X predicts Y

  • Average X varies with Y
  • The X and Y components are correlated

11755/18797 11

slide-12
SLIDE 12

Decorrelation

  • Shifting to using the major axes as the coordinate system

– L1 does not predict L2 and vice versa – In this coordinate system the data are uncorrelated

  • We have decorrelated the data by rotating the axes

11755/18797 12

[Figure: data plotted against the decorrelated axes L1 and L2]

slide-13
SLIDE 13

The statistical concept of correlatedness

  • Two variables X and Y are correlated if knowing X gives you an expected value of Y

  • X and Y are uncorrelated if knowing X tells you

nothing about the expected value of Y

– Although it could give you other information – How?

11755/18797 13

slide-14
SLIDE 14

Correlation vs. Causation

  • The consumption of burgers has gone up

steadily in the past decade

  • In the same period, the penguin population of

Antarctica has gone down

11755/18797 14

Correlation, not Causation (unless McDonalds has a top-secret Antarctica division)

slide-15
SLIDE 15

The concept of correlation

  • Two variables are correlated if knowing the

value of one gives you information about the expected value of the other

11755/18797 15

[Figure: burger consumption and penguin population plotted against time]

slide-16
SLIDE 16

A brief review of basic probability

  • Uncorrelated: Two random variables X and Y are

uncorrelated iff:

– The average value of the product of the variables equals the product of their individual averages

  • Setup: Each draw produces one instance of X and one

instance of Y

– I.e one instance of (X,Y)

  • E[XY] = E[X]E[Y]
  • The average value of Y is the same regardless of the value of X

11755/18797 16
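The E[XY] = E[X]E[Y] condition is easy to verify empirically. A small numpy sketch (sample sizes and thresholds are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Independent draws of (X, Y): uncorrelated, so E[XY] ~ E[X]E[Y]
x = rng.standard_normal(n)
y = rng.standard_normal(n)
assert abs(np.mean(x * y) - np.mean(x) * np.mean(y)) < 0.02

# A correlated pair: Y depends on X, so E[XY] != E[X]E[Y]
y2 = 0.8 * x + 0.2 * rng.standard_normal(n)
assert abs(np.mean(x * y2) - np.mean(x) * np.mean(y2)) > 0.5
```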

slide-17
SLIDE 17

Correlated Variables

  • Expected value of Y given X:

– Find average of Y values of all samples at (or close) to the given X – If this is a function of X, X and Y are correlated

11755/18797 17

[Figure: average penguin populations P1, P2 at burger-consumption values b1, b2]

slide-18
SLIDE 18

Uncorrelatedness

  • Knowing X does not tell you what the average

value of Y is

– And vice versa

11755/18797 18

[Figure: average income at burger-consumption values b1, b2]

slide-19
SLIDE 19

Uncorrelated Variables

  • The average value of Y is the same regardless of the value of X, and vice versa

11755/18797 19

[Figure: burger consumption vs. average income, showing Y as a function of X and X as a function of Y]

slide-20
SLIDE 20

Uncorrelatedness in Random Variables

  • Which of the above represent uncorrelated RVs?

11755/18797 20

slide-21
SLIDE 21

The notion of decorrelation

  • So how does one transform the correlated

variables (X,Y) to the uncorrelated (X’, Y’)

11755/18797 21

[X'; Y'] = M [X; Y] ,   M = ?

slide-22
SLIDE 22

What does “uncorrelated” mean

  • If Y is a matrix of vectors, YY^T = diagonal

11755/18797 22

Assuming 0 mean:

  • E[X’] = constant
  • E[Y’] = constant
  • E[Y’|X’] = constant

– All will be 0 for centered data

E[ [X'; Y'] [X' Y'] ] = [ E[X'^2]  E[X'Y'] ; E[X'Y']  E[Y'^2] ] = [ E[X'^2]  E[X']E[Y'] ; E[X']E[Y']  E[Y'^2] ] = diagonal matrix

slide-23
SLIDE 23

Decorrelation

  • Let X be the matrix of correlated data vectors
    – Each component of X informs us of the mean trend of the other components
  • Need a transform M such that the covariance of Y = MX is diagonal
    – YY^T is the covariance if Y is zero mean
    – YY^T = diagonal:  M XX^T M^T = diagonal,  M.Cov(X).M^T = diagonal

11755/18797 23

slide-24
SLIDE 24

Decorrelation

  • Easy solution:
    – Eigen decomposition of Cov(X): Cov(X) = E Λ E^T,  with E E^T = I
  • Let M = E^T
  • M Cov(X) M^T = E^T E Λ E^T E = Λ = diagonal
  • PCA: Y = MX
  • Diagonalizes the covariance matrix

– “Decorrelates” the data

11755/18797 24
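The eigendecomposition recipe above can be sketched in a few lines of numpy (variable names are illustrative; np.linalg.eigh returns Cov(X) = E diag(lam) E^T with eigenvalues in ascending order):

```python
import numpy as np

rng = np.random.default_rng(2)
# Correlated 2-D data: a linear mix of two independent components
A = np.array([[2.0, 1.0], [1.0, 1.0]])
X = A @ rng.standard_normal((2, 5000))
X = X - X.mean(axis=1, keepdims=True)

cov = X @ X.T / X.shape[1]
lam, E = np.linalg.eigh(cov)     # Cov(X) = E diag(lam) E^T, with E E^T = I
M = E.T                          # decorrelating transform
Y = M @ X                        # PCA projection

# Covariance of Y is diagonal: the data have been decorrelated
cov_Y = Y @ Y.T / Y.shape[1]
assert np.allclose(cov_Y - np.diag(np.diag(cov_Y)), 0, atol=1e-8)
assert np.allclose(np.diag(cov_Y), lam)
```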

slide-25
SLIDE 25

PCA

  • PCA: Y = MX = E^T X
  • Diagonalizes the covariance matrix

– “Decorrelates” the data

11755/18797 25

[Figure: X = w1 E1 + w2 E2, a data vector X expressed in eigenvector coordinates E1, E2 with weights w1, w2]

slide-26
SLIDE 26

Decorrelating the data

  • Are there other decorrelating axes?

11755/18797 26


slide-27
SLIDE 27

Decorrelating the data

  • Are there other decorrelating axes?

11755/18797 27


slide-28
SLIDE 28

Decorrelating the data

  • Are there other decorrelating axes?
  • What about if we don’t require them to be orthogonal?

11755/18797 28


slide-29
SLIDE 29

Decorrelating the data

  • Are there other decorrelating axes?
  • What about if we don’t require them to be orthogonal?
  • What is special about these axes?

11755/18797 29


slide-30
SLIDE 30

The statistical concept of Independence

  • Two variables X and Y are dependent if knowing X gives you any information about Y

  • X and Y are independent if knowing X tells you

nothing at all of Y

11755/18797 30

slide-31
SLIDE 31

A brief review of basic probability

  • Independence: Two random variables X and Y

are independent iff:

– Their joint probability equals the product of their individual probabilities

  • P(X,Y) = P(X)P(Y)
  • Independence implies uncorrelatedness

– The average value of X is the same regardless of the value of Y

  • E[X|Y] = E[X]

– But not the other way

11755/18797 31

slide-32
SLIDE 32

A brief review of basic probability

  • Independence: Two random variables X and

Y are independent iff:

  • The average value of any function of X is the

same regardless of the value of Y

– Or any function of Y

  • E[f(X)g(Y)] = E[f(X)] E[g(Y)] for all f(), g()

11755/18797 32
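A classic illustration of uncorrelated-but-dependent variables is Y = X² − 1 for zero-mean X. A numpy sketch of the E[f(X)g(Y)] test (the particular f and g are my own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Uncorrelated but NOT independent: Y = X^2 - 1 with X standard normal
x = rng.standard_normal(n)
y = x**2 - 1

# Uncorrelated: E[XY] = E[X^3] - E[X] = 0
assert abs(np.mean(x * y) - np.mean(x) * np.mean(y)) < 0.05

# But with f(x) = x^2 and g(y) = y, E[f(X)g(Y)] != E[f(X)]E[g(Y)],
# so the pair fails the independence test
f, g = x**2, y
assert abs(np.mean(f * g) - np.mean(f) * np.mean(g)) > 0.5
```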

slide-33
SLIDE 33

Independence

  • Which of the above represent independent RVs?
  • Which represent uncorrelated RVs?

11755/18797 33

slide-34
SLIDE 34

A brief review of basic probability

  • The expected value of an odd function of an

RV is 0 if

– The RV is 0 mean – The PDF of the RV is symmetric around 0

  • E[f(X)] = 0 if f(X) is odd symmetric

11755/18797 34

[Figure: an odd function y = f(x) and a PDF p(x) symmetric around 0]

slide-35
SLIDE 35

A brief review of basic info. theory

  • Entropy: The minimum average number of bits

to transmit to convey a symbol

  • Joint entropy: The minimum average number of

bits to convey sets (pairs here) of symbols

11755/18797 35

[Figure: symbol streams X = T(all), M(ed), S(hort), … and Y = M, F, F, M, …]

H(X) = −Σ_X P(X) log P(X)

H(X,Y) = −Σ_{X,Y} P(X,Y) log P(X,Y)

slide-36
SLIDE 36

A brief review of basic info. theory

  • Conditional Entropy: The minimum average

number of bits to transmit to convey a symbol X, after symbol Y has already been conveyed

– Averaged over all values of X and Y

11755/18797 36

[Figure: symbol streams X and Y]

H(X|Y) = Σ_Y P(Y) [ −Σ_X P(X|Y) log P(X|Y) ] = −Σ_{X,Y} P(X,Y) log P(X|Y)

slide-37
SLIDE 37

A brief review of basic info. theory

  • Conditional entropy of X = H(X) if X is

independent of Y

  • Joint entropy of X and Y is the sum of the

entropies of X and Y if they are independent

11755/18797 37

H(X|Y) = Σ_Y P(Y) [ −Σ_X P(X|Y) log P(X|Y) ] = −Σ_X P(X) log P(X) = H(X)

H(X,Y) = −Σ_{X,Y} P(X,Y) log P(X,Y) = −Σ_{X,Y} P(X,Y) log [P(X)P(Y)]
       = −Σ_{X,Y} P(X,Y) log P(X) − Σ_{X,Y} P(X,Y) log P(Y) = H(X) + H(Y)
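The additivity H(X,Y) = H(X) + H(Y) for independent variables can be verified numerically. A small sketch with made-up distributions:

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability table (any shape)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

px = np.array([0.5, 0.25, 0.25])   # P(X)
py = np.array([0.7, 0.3])          # P(Y)
pxy = np.outer(px, py)             # independence: P(X,Y) = P(X)P(Y)

# Joint entropy of independent variables is the sum of the entropies
assert np.isclose(entropy(pxy), entropy(px) + entropy(py))
```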

slide-38
SLIDE 38

Onward..

11755/18797 38

slide-39
SLIDE 39

Projection: multiple notes

11755/18797 39

  • P = W (W^T W)^{-1} W^T
  • Projected Spectrogram = P M

[Figure: spectrogram M and note bases W]

slide-40
SLIDE 40

We’re actually computing a score

11755/18797 40

  • M ≈ WH
  • H = pinv(W) M

[Figure: spectrogram M, note bases W, and unknown score H]
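The projection and pseudoinverse-score steps of the two slides above can be sketched in numpy (toy random W and H stand in for real spectrogram data; pinv here is np.linalg.pinv):

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy "notes" W and "score" H; the spectrogram is M = WH
W = np.abs(rng.standard_normal((64, 4)))
H_true = np.abs(rng.standard_normal((4, 50)))
M = W @ H_true

# Projection matrix onto the column space of W
P = W @ np.linalg.inv(W.T @ W) @ W.T
# The score: H = pinv(W) M
H = np.linalg.pinv(W) @ M

assert np.allclose(P @ M, M)    # M already lies in span(W)
assert np.allclose(H, H_true)   # exact recovery, since M = W H_true by construction
```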

slide-41
SLIDE 41

How about the other way?

11755/18797 41

  • M ≈ WH
  • W = M pinv(H)
  • U = WH

[Figure: spectrogram M, unknown bases W, known score H, reconstruction U]

slide-42
SLIDE 42

When both parameters are unknown

  • Must estimate both H and W to best

approximate M

  • Ideally, must learn both the notes and their

transcription!

W =? H = ? approx(M) = ?

11755/18797 42

slide-43
SLIDE 43

A least squares solution

  • Constraint: W is orthogonal

– WTW = I

  • The solution: W are the Eigen vectors of

MMT

– PCA!!

  • M ~ WH is an approximation
  • Also, the rows of H are decorrelated

– Trivial to prove that HHT is diagonal

W, H = argmin_{W,H} ||M − WH||_F^2 + Λ(W^T W − I)

43
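The orthogonal-W least-squares solution can be sketched as the top-K eigenvectors of M M^T (K and the random data are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((10, 1000))
M = M - M.mean(axis=1, keepdims=True)

K = 3
# W = top-K eigenvectors of M M^T (i.e. the PCA bases)
lam, E = np.linalg.eigh(M @ M.T)    # eigenvalues in ascending order
W = E[:, -K:]                       # keep the K largest
H = W.T @ M                         # least-squares H for orthogonal W

assert np.allclose(W.T @ W, np.eye(K), atol=1e-10)   # W^T W = I
# The rows of H are decorrelated: H H^T is diagonal
HHt = H @ H.T
assert np.allclose(HHt - np.diag(np.diag(HHt)), 0, atol=1e-6)
```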

slide-44
SLIDE 44

PCA

  • The columns of W are the bases we have

learned

– The linear “building blocks” that compose the music

  • They represent “learned” notes

W, H = argmin_{W,H} ||M − WH||_F^2 ,   M ≈ WH
11755/18797 44

slide-45
SLIDE 45

So how does that work?

  • There are 12 notes in the segment, hence we

try to estimate 12 notes..

11755/18797 45

slide-46
SLIDE 46

So how does that work?

  • There are 12 notes in the segment, hence we

try to estimate 12 notes..

  • Results are not good

11755/18797 46

slide-47
SLIDE 47

PCA through decorrelation of notes

  • Different constraint: Constraint H to be decorrelated

– HHT = D

  • This will result exactly in PCA too
  • Interpretation: what does decorrelation of H mean?

11755/18797 47

W, H = argmin_{W,H} ||M − WH||_F^2 + Λ(H H^T − D)

slide-48
SLIDE 48

What else can we look for?

  • Assume: The “transcription” of one note does

not depend on what else is playing

– Or, in a multi-instrument piece, instruments are playing independently of one another

  • Not strictly true, but still..

11755/18797 48

slide-49
SLIDE 49

Formulating it with Independence

  • Impose statistical independence constraints on the decomposition

11755/18797 49

W, H = argmin_{W,H} ||M − WH||_F^2   such that the rows of H are independent

slide-50
SLIDE 50

Changing problems for a bit

  • Two people speak simultaneously
  • Recorded by two microphones
  • Each recorded signal is a mixture of both signals

11755/18797 50

m1(t) = w11 h1(t) + w12 h2(t)
m2(t) = w21 h1(t) + w22 h2(t)

slide-51
SLIDE 51

A Separation Problem

  • M = WH

– M = “mixed” signal – W = “notes” – H = “transcription”

  • Separation challenge: Given only M estimate H
  • Identical to the problem of “finding notes”

11755/18797 51

[Figure: M = WH, with the signals at mics 1 and 2 (M) as w11, w12, w21, w22 mixtures (W) of the signals from speakers 1 and 2 (H)]

slide-52
SLIDE 52

A Separation Problem

  • Separation challenge: Given only M estimate H
  • Identical to the problem of “finding notes”

11755/18797 52


slide-53
SLIDE 53

Imposing Statistical Constraints

  • M = WH
  • Given only M estimate H
  • H = W-1M = AM
  • Only known constraint: The rows of H are

independent

  • Estimate A such that the components of AM are

statistically independent

– A is the unmixing matrix

11755/18797 53


slide-54
SLIDE 54

Statistical Independence

  • M = WH H = AM

11755/18797 54

Remember this form

slide-55
SLIDE 55

An ugly algebraic solution

  • We could decorrelate signals by algebraic manipulation

– We know uncorrelated signals have diagonal correlation matrix – So we transformed the signal so that it has a diagonal correlation matrix (HHT)

  • Can we do the same for independence

– Is there a linear transform that will enforce independence?

11755/18797 55

M = WH …….. H = AM

slide-56
SLIDE 56

Emulating Independence

  • The rows of H are uncorrelated

– E[hihj] = E[hi]E[hj] – hi and hj are the ith and jth components of any vector in H

  • If the rows are independent, the fourth order moments also factor

    – E[hi hj hk hl] = E[hi] E[hj] E[hk] E[hl]
    – E[hi^2 hj hk] = E[hi^2] E[hj] E[hk]
    – E[hi^2 hj^2] = E[hi^2] E[hj^2]
    – Etc. (for distinct i, j, k, l)

11755/18797 56

slide-57
SLIDE 57

Zero Mean

  • Usual to assume zero mean processes

– Otherwise, some of the math doesn’t work well

  • M = WH H = AM
  • If mean(M) = 0 => mean(H) = 0

– E[H] = A.E[M] = A0 = 0 – First step of ICA: Set the mean of M to 0 – mi are the columns of M

11755/18797 57

μ = (1 / cols(M)) Σ_i m_i ,    m_i ← m_i − μ

slide-58
SLIDE 58

Emulating Independence..

  • Independence ⇒ Uncorrelatedness
  • Estimate a C such that CM is decorrelated
  • A little more than PCA

11755/18797 58

H = AM = BCM ,   A = BC

slide-59
SLIDE 59

Decorrelating

  • Eigen decomposition: M M^T = E S E^T
  • C = S^{−1/2} E^T
  • X = CM
  • Not merely decorrelated but whitened
    – X X^T = C M M^T C^T = S^{−1/2} E^T E S E^T E S^{−1/2} = I
  • C is the whitening matrix

11755/18797 59

H = AM = BCM ,   A = BC
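The whitening step can be sketched as follows (here M M^T is normalized by the number of columns, a common variant; the mixing matrix and data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(6)
A = np.array([[3.0, 1.0], [1.0, 2.0]])        # some mixing matrix
M = A @ rng.laplace(size=(2, 20000))
M = M - M.mean(axis=1, keepdims=True)

# Eigendecompose the (normalized) correlation: M M^T / n = E S E^T
S, E = np.linalg.eigh(M @ M.T / M.shape[1])
C = np.diag(S ** -0.5) @ E.T                  # whitening matrix C = S^{-1/2} E^T
X = C @ M

# X is whitened, not merely decorrelated: X X^T / n = I
assert np.allclose(X @ X.T / X.shape[1], np.eye(2), atol=1e-8)
```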

slide-60
SLIDE 60

Uncorrelated != Independent

  • Whitening merely ensures that the resulting signals are

uncorrelated, i.e. E[xixj] = 0 if i != j

  • This does not ensure that higher order moments are also decoupled, e.g. it does not ensure that E[xi^2 xj^2] = E[xi^2] E[xj^2]
  • This is one of the signatures of independent RVs
  • Let’s explicitly decouple the fourth order moments

60 11755/18797

slide-61
SLIDE 61

Decorrelating

  • X = CM
  • XXT = I
  • Will multiplying X by B re-correlate the components?
  • Not if B is unitary

– BBT = BTB = I

  • HHT = BXXTBT = BBT = I
  • So we want to find a unitary matrix

– Since the rows of H are uncorrelated

  • Because they are independent

11755/18797 61

H H’

=

Diagonal + rank1 matrix H=AM H=BX A=BC H=BCM

slide-62
SLIDE 62

ICA: Freeing Fourth Moments

  • H = AM, A = BC, X = CM  ⇒  H = BX
  • The fourth moments of H have the form:

E[hi hj hk hl]

  • If the rows of H were independent

E[hi hj hk hl] = E[hi] E[hj] E[hk] E[hl]

  • Solution: Compute B such that the fourth moments of H = BX

are decoupled

– While ensuring that B is Unitary

62 11755/18797

slide-63
SLIDE 63

ICA: Freeing Fourth Moments

  • Create a matrix of fourth moment terms that would be

diagonal were the rows of H independent and diagonalize it

  • A good candidate:

D = [ d11 d12 d13 … ; d21 d22 d23 … ; … ]

    – Good because it incorporates the energy in all rows of H
    – Where d_ij = E[ Σ_k h_k^2 h_i h_j ]
    – i.e. D = E[h^T h  h h^T]
  • h are the columns of H
  • Assuming h is real; else replace transposition with the Hermitian

11755/18797 63

slide-64
SLIDE 64

ICA: The D matrix

  • Average the above term across all columns of H

D = [ d11 d12 d13 … ; d21 d22 d23 … ; … ]

d_ij = E[ Σ_k h_k^2 h_i h_j ] ≈ (1 / cols(H)) Σ_m ( Σ_k h_km^2 ) h_im h_jm

(Σ_k h_k^2 is the sum of squares of all components; h_im, h_jm are the ith and jth components of the mth column)

11755/18797 64

Energy-weighted correlation!!

slide-65
SLIDE 65

ICA: The D matrix

  • If the hi terms were independent:
    – For i ≠ j (centered: E[hj] = 0):
      d_ij = E[ Σ_k h_k^2 h_i h_j ] = Σ_{k≠i,j} E[h_k^2] E[h_i] E[h_j] + E[h_i^3] E[h_j] + E[h_i] E[h_j^3] = 0
    – For i = j:
      d_ii = E[ Σ_k h_k^2 h_i^2 ] = Σ_{k≠i} E[h_k^2] E[h_i^2] + E[h_i^4]
  • Thus, if the hi terms were independent, d_ij = 0 for i ≠ j
  • i.e., if the hi were independent, D would be a diagonal matrix
    – Let us diagonalize D

11755/18797 65

slide-66
SLIDE 66

Diagonalizing D

  • Compose a fourth order matrix from X

– Recall: X = CM, H = BX = BCM

  • B is what we’re trying to learn to make H independent
  • Note: if H = BX, then each h = Bx
  • The fourth moment matrix of H is
    D = E[h^T h  h h^T] = E[x^T B^T B x  B x x^T B^T] = E[x^T x  B x x^T B^T] = B E[x^T x  x x^T] B^T
    – using B^T B = I for the unitary B we will choose

11755/18797 66

slide-67
SLIDE 67

Diagonalizing D

  • Objective: Estimate B such that the fourth

moment of H = BX is diagonal

  • Compose D_x = E[x^T x  x x^T]
  • Diagonalize D_x via Eigen decomposition: D_x = U Λ U^T
  • B = U^T

– That’s it!!!!

11755/18797 67

slide-68
SLIDE 68

B frees the fourth moment

D_x = U Λ U^T ;   B = U^T

  • U is a unitary matrix, i.e. U^T U = U U^T = I (identity)
  • H = BX = U^T X
  • h = U^T x
  • The fourth moment matrix of H is
    E[h^T h  h h^T] = U^T E[x^T x  x x^T] U = U^T D_x U = U^T U Λ U^T U = Λ
  • The fourth moment matrix of H = U^T X is diagonal!!

68 11755/18797

slide-69
SLIDE 69

Overall Solution

  • H = AM = BCM

– C is the (transpose of the) matrix of Eigen vectors of MMT

  • X = CM
  • A = BC = U^T C
    – B is the (transpose of the) matrix of eigenvectors of X·diag(X^T X)·X^T

69 11755/18797
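The overall solution can be sketched end-to-end in numpy. The two synthetic sources (one Laplacian, one uniform, chosen so their kurtoses differ and the eigenvalues of D_x are well separated) are my own illustrative choice, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
# Two unit-variance independent sources with different kurtoses
s1 = rng.laplace(scale=1 / np.sqrt(2), size=n)     # super-Gaussian
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=n)  # sub-Gaussian
H_true = np.vstack([s1, s2])
W_mix = np.array([[1.0, 0.6], [0.4, 1.0]])
M = W_mix @ H_true
M = M - M.mean(axis=1, keepdims=True)

# Step 1: whiten. M M^T / n = E S E^T, C = S^{-1/2} E^T, X = CM
S, E = np.linalg.eigh(M @ M.T / n)
C = np.diag(S ** -0.5) @ E.T
X = C @ M

# Step 2: fourth-moment matrix D_x = E[(x^T x) x x^T]; B = U^T from D_x = U L U^T
w = np.sum(X**2, axis=0)            # x^T x for each column
Dx = (X * w) @ X.T / n
lam, U = np.linalg.eigh(Dx)
B = U.T
H = B @ X                           # estimated sources

# The fourth-moment matrix of H is diagonal by construction
wh = np.sum(H**2, axis=0)
Dh = (H * wh) @ H.T / n
assert np.max(np.abs(Dh - np.diag(np.diag(Dh)))) < 1e-6

# The overall transform B C W_mix is close to a (signed) permutation:
# each row is dominated by a single entry
G = B @ C @ W_mix
assert np.all(np.max(np.abs(G), axis=1) > 0.9)
```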

slide-70
SLIDE 70

ICA by diagonalizing moment matrices

  • The procedure just outlined, while fully functional, has

shortcomings

– Only a subset of fourth order moments are considered – There are many other ways of constructing fourth-order moment matrices that would ideally be diagonal

  • Diagonalizing the particular fourth-order moment matrix we have chosen

is not guaranteed to diagonalize every other fourth-order moment matrix

  • JADE (Joint Approximate Diagonalization of Eigenmatrices), J.-F. Cardoso

– Jointly diagonalizes several fourth-order moment matrices – More effective than the procedure shown, but computationally more expensive

71 11755/18797

slide-71
SLIDE 71

Enforcing Independence

  • Specifically ensure that the components of H are

independent

– H = AM

  • Contrast function: A non-linear function that has a

minimum value when the output components are independent

  • Define and minimize a contrast function F(AM)

  • Contrast functions are often only approximations too..

11755/18797 72

slide-72
SLIDE 72

A note on pre-whitening

  • The mixed signal is usually “prewhitened” for all ICA methods

– Normalize variance along all directions – Eliminate second-order dependence

  • Eigen decomposition MMT = ESET
  • C = S-1/2ET
  • Can use first K columns of E only if only K independent sources are

expected

– In microphone array setup – only K < M sources

  • X = CM
    – E[x_i x_j] = δ_ij for the centered signal

11755/18797 73

slide-73
SLIDE 73

The contrast function

  • Contrast function: A non-linear function that

has a minimum value when the output components are independent

  • An explicit contrast function
  • With constraint : H = BX

– X is “whitened” M

11755/18797 74

I(H) = Σ_i H(h_i) − H(H)

slide-74
SLIDE 74

Linear Functions

  • h = Bx, x = B^{-1} h
    – Individual columns of the H and X matrices
    – x is the mixed signal, B is the unmixing matrix

P_h(h) = |B|^{-1} P_x(B^{-1} h)

log P_h(h) = log P_x(x) − log |B|

H(h) = −∫ P_h(h) log P_h(h) dh = H(x) + log |B|

11755/18797 75

slide-75
SLIDE 75

The contrast function

  • I(H) = Σ_i H(h_i) − H(H) = Σ_i H(h_i) − H(x) − log |B|
  • Ignoring H(x) (constant):

J = Σ_i H(h_i) − log |B|

  • Minimize J to obtain B

11755/18797 76

slide-76
SLIDE 76

An alternate approach

  • Definition of Independence – if x and y are

independent:

– E[f(x)g(y)] = E[f(x)]E[g(y)] – Must hold for every f() and g()!!

11755/18797 78

slide-77
SLIDE 77

An alternate approach

  • Define g(H) = g(BX) (component-wise

function)

  • Define f(H) = f(BX)

11755/18797 79

g(H) = [ g(h11) g(h12) … ; g(h21) g(h22) … ; … ] ,   f(H) = [ f(h11) f(h12) … ; f(h21) f(h22) … ; … ]

slide-78
SLIDE 78

An alternate approach

  • P = g(H) f(H)^T = g(BX) f(BX)^T, a square matrix with P_ij = E[g(h_i) f(h_j)]
  • Must ideally equal Q, where
    Q_ij = E[g(h_i)] E[f(h_j)] for i ≠ j ,   Q_ii = E[g(h_i) f(h_i)]
  • Error = ||P − Q||_F^2

11755/18797 80

slide-79
SLIDE 79

An alternate approach

  • Ideal value for Q
  • If g() and f() are odd symmetric functions, E[g(h_i)] = 0 for all i
    – Since E[h_i] = 0 (H is centered)
    – Q is a diagonal matrix!!
    – Q_ij = E[g(h_i)] E[f(h_j)] = 0 for i ≠ j ,   Q_ii = E[g(h_i) f(h_i)]

11755/18797 81

slide-80
SLIDE 80

An alternate approach

  • Minimize the error
  • Leads to a trivial Widrow-Hoff-type iterative rule:

P = g(BX) f(BX)^T ,   Q = diagonal

error = ||P − Q||_F^2

E = g(BX) f(BX)^T − Diag ,   B ← B + η E B

11755/18797 82

slide-81
SLIDE 81

Update Rules

  • Multiple solutions under different assumptions for g() and f()
  • H = BX
  • B ← B + η ΔB
  • Jutten-Hérault: online update
    – ΔB_ij = f(h_i) g(h_j) (they actually assumed a recursive neural network)
  • Bell-Sejnowski
    – ΔB = ([B^T]^{-1} − g(H) X^T)

11755/18797 83

slide-82
SLIDE 82

Update Rules

  • Multiple solutions under different assumptions for g() and f()
  • H = BX
  • B ← B + η ΔB
  • Natural gradient (f() = identity function)
    – ΔB = (I − g(H) H^T) B
  • Cichocki-Unbehauen
    – ΔB = (I − g(H) f(H)^T) B

11755/18797 84

slide-83
SLIDE 83

What are g() and f()?

  • Must be odd symmetric functions
  • Multiple functions proposed
  • For audio signals in general
    – ΔB = (I − H H^T − K tanh(H) H^T) B
  • Or simply
    – ΔB = (I − K tanh(H) H^T) B

g(x) = x + tanh(x)  if x is super-Gaussian
g(x) = x − tanh(x)  if x is sub-Gaussian

11755/18797 85
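A sketch of the natural-gradient update with g = tanh on a synthetic super-Gaussian mixture (the step size, iteration count, mixing matrix, and sources are all arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20_000
# Super-Gaussian sources, mixed, centered, and whitened as in the earlier slides
S_true = rng.laplace(size=(2, n))
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S_true
X = X - X.mean(axis=1, keepdims=True)
s, E = np.linalg.eigh(X @ X.T / n)
X = np.diag(s ** -0.5) @ E.T @ X

B = np.eye(2)
eta = 0.01
g = np.tanh                        # nonlinearity suited to super-Gaussian sources
for _ in range(1000):
    H = B @ X
    dB = (np.eye(2) - g(H) @ H.T / n) @ B   # natural-gradient update, f() = identity
    B = B + eta * dB

H = B @ X
assert np.all(np.isfinite(B))
# The unmixed components stay (near-)uncorrelated
corr = H @ H.T / n
assert abs(corr[0, 1]) / np.sqrt(corr[0, 0] * corr[1, 1]) < 0.2
```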

slide-84
SLIDE 84

So how does it work?

  • Example with instantaneous mixture of two

speakers

  • Natural gradient update
  • Works very well!

86 11755/18797

slide-85
SLIDE 85

Another example!

Input Mix Output

11755/18797 87

slide-86
SLIDE 86

Another Example

  • Three instruments..

11755/18797 88

slide-87
SLIDE 87

The Notes

  • Three instruments..

11755/18797 89

slide-88
SLIDE 88

ICA for data exploration

  • The “bases” in PCA

represent the “building blocks”

– Ideally notes

  • Very successfully used
  • So can ICA be used to

do the same?

11755/18797 90

slide-89
SLIDE 89

ICA vs PCA bases

[Figure: non-Gaussian data with the ICA and PCA directions overlaid]

  • Motivation for using ICA vs. PCA
  • PCA will indicate orthogonal directions of maximal variance
    – These may not align with the data!
  • ICA finds directions that are independent
    – More likely to “align” with the data

11755/18797 91

slide-90
SLIDE 90

Finding useful transforms with ICA

  • Audio preprocessing

example

  • Take a lot of audio snippets

and concatenate them in a big matrix, do component analysis

  • PCA results in the DCT bases
  • ICA returns time/freq

localized sinusoids which is a better way to analyze sounds

  • Ditto for images
    – ICA returns localized edge filters

11755/18797 92

slide-91
SLIDE 91

Example case: ICA-faces vs. Eigenfaces

ICA-faces Eigenfaces

11755/18797 93

slide-92
SLIDE 92

ICA for Signal Enhancement

  • Very commonly used to enhance EEG signals
  • EEG signals are frequently corrupted by

heartbeats and biorhythm signals

  • ICA can be used to separate them out

11755/18797 94

slide-93
SLIDE 93

So how does that work?

  • There are 12 notes in the segment, hence we

try to estimate 12 notes..

11755/18797 95

slide-94
SLIDE 94

PCA solution

  • There are 12 notes in the segment, hence we

try to estimate 12 notes..

11755/18797 96

slide-95
SLIDE 95

So how does this work: ICA solution

  • Better..

– But not much

  • But the issues here?

11755/18797 97

slide-96
SLIDE 96

ICA Issues

  • No sense of order
    – Unlike PCA
  • Get K independent directions, but no notion of the “best” direction
    – So the sources can come in any order: permutation invariance
  • No sense of scaling
    – Scaling a signal does not affect its independence
  • Outputs are scaled versions of the desired signals, in permuted order
    – In the best case
    – In the worst case, the outputs are not the desired signals at all..

11755/18797 98

slide-97
SLIDE 97

What else went wrong?

  • Notes are not independent

– Only one note plays at a time – If one note plays, other notes are not playing

  • Will deal with these later in the course..

11755/18797 99