Independent Component Analysis

11-755 Machine Learning for Signal Processing, Class 20, 8 Nov 2012. Instructor: Bhiksha Raj. (PowerPoint PPT presentation)



  1. 11-755 Machine Learning for Signal Processing
     Independent Component Analysis, Class 20, 8 Nov 2012
     Instructor: Bhiksha Raj

  2. A brief review of basic probability
     - Uncorrelated: two random variables X and Y are uncorrelated iff the average value of their product equals the product of their individual averages: $E[XY] = E[X]E[Y]$.
     - Setup: each draw produces one instance of X and one instance of Y, i.e. one instance of the pair (X, Y).
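A quick numpy check of this definition (not from the slides; the data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Uncorrelated pair: two independent draws.
x = rng.normal(size=n)
y = rng.normal(size=n)
print(np.mean(x * y), np.mean(x) * np.mean(y))   # both ~0: E[XY] = E[X]E[Y]

# Correlated pair: y2 depends linearly on x.
y2 = 0.5 * x + rng.normal(size=n)
print(np.mean(x * y2), np.mean(x) * np.mean(y2)) # ~0.5 vs ~0: E[XY] != E[X]E[Y]
```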

  3. Uncorrelatedness
     [Figure: scatter plots of several joint (X, Y) distributions.] Which of the above represent uncorrelated RVs?

  4. A brief review of basic probability
     - Independence: two random variables X and Y are independent iff their joint probability equals the product of their individual probabilities: $P(X, Y) = P(X)P(Y)$.
     - The average value of X is the same regardless of the value of Y: $E[X|Y] = E[X]$.

  5. A brief review of basic probability
     - Independence: two random variables X and Y are independent iff the average value of any function of X is the same regardless of the value of Y:
       $E[f(X)g(Y)] = E[f(X)]\,E[g(Y)]$ for all $f(\cdot)$, $g(\cdot)$.
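Independence is strictly stronger than uncorrelatedness. A small synthetic sketch (not from the slides) of a pair that passes the correlation test but fails the $E[f(X)g(Y)]$ test:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2          # y is a deterministic function of x: dependent, yet uncorrelated

# Uncorrelated: E[XY] = E[X^3] = 0 for a symmetric, zero-mean X.
print(np.mean(x * y), np.mean(x) * np.mean(y))       # both ~0

# Not independent: choose f(x) = x^2, g(y) = y.
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y)) # E[X^4] ~ 3 vs E[X^2]E[X^2] ~ 1
```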

  6. Independence
     [Figure: scatter plots of several joint (X, Y) distributions.] Which of the above represent independent RVs? Which represent uncorrelated RVs?

  7. A brief review of basic probability
     [Figure: a PDF p(x) symmetric around 0 and an odd function f(x).]
     - The expected value of an odd function of an RV is 0 if the RV is zero mean and its PDF is symmetric around 0.
     - $E[f(X)] = 0$ if $f(\cdot)$ is odd-symmetric.
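A one-liner Monte Carlo check of this fact (synthetic data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)   # zero mean, PDF symmetric around 0
print(np.mean(x ** 3))           # odd function: expectation ~ 0
print(np.mean(np.sin(x)))        # another odd function: also ~ 0
```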

  8. A brief review of basic info. theory
     - Entropy: the minimum average number of bits to transmit to convey a symbol X, e.g. X ∈ {T(all), M(ed), S(hort)}:
       $H(X) = \sum_X P(X)\,[-\log P(X)]$
     - Joint entropy: the minimum average number of bits to convey sets (pairs, here) of symbols, e.g. X: T, M, S, ...; Y: M, F, F, M, ...:
       $H(X, Y) = \sum_{X,Y} P(X, Y)\,[-\log P(X, Y)]$
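A minimal numpy sketch of these two quantities (the example distribution is made up):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # 0 log 0 is taken as 0
    return -np.sum(p * np.log2(p))

p_x = [0.5, 0.25, 0.25]               # e.g. P(T), P(M), P(S)
print(entropy(p_x))                   # 1.5 bits

p_xy = np.outer(p_x, [0.5, 0.5])      # a joint distribution over (X, Y)
print(entropy(p_xy.ravel()))          # joint entropy H(X, Y) = 2.5 bits here
```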

  9. A brief review of basic info. theory
     [Figure: symbol streams X: T, M, S, ... and Y: M, F, F, M, ...]
     - $H(X|Y) = \sum_Y P(Y) \sum_X P(X|Y)\,[-\log P(X|Y)] = \sum_{X,Y} P(X, Y)\,[-\log P(X|Y)]$
     - Conditional entropy: the minimum average number of bits to transmit to convey a symbol X, after symbol Y has already been conveyed.
     - Averaged over all values of X and Y.
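Continuing the sketch above, conditional entropy computed from a joint table (illustrative only):

```python
import numpy as np

def cond_entropy(p_xy):
    """H(X|Y) in bits, from a joint distribution table p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_y = p_xy.sum(axis=0)                        # marginal P(Y)
    h = 0.0
    for x in range(p_xy.shape[0]):
        for y in range(p_xy.shape[1]):
            if p_xy[x, y] > 0:
                h -= p_xy[x, y] * np.log2(p_xy[x, y] / p_y[y])  # -log P(X|Y)
    return h

p_xy = np.array([[0.25, 0.25],
                 [0.25, 0.25]])                   # independent: H(X|Y) = H(X) = 1 bit
print(cond_entropy(p_xy))
```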

  10. A brief review of basic info. theory
      - Conditional entropy of X equals H(X) if X is independent of Y:
        $H(X|Y) = \sum_Y P(Y) \sum_X P(X|Y)\,[-\log P(X|Y)] = \sum_Y P(Y) \sum_X P(X)\,[-\log P(X)] = H(X)$
      - Joint entropy of X and Y is the sum of their entropies if they are independent:
        $H(X, Y) = \sum_{X,Y} P(X, Y)\,[-\log P(X, Y)] = \sum_{X,Y} P(X, Y)\,[-\log P(X)P(Y)]$
        $= -\sum_{X,Y} P(X, Y)\log P(X) - \sum_{X,Y} P(X, Y)\log P(Y) = H(X) + H(Y)$

  11. Onward..

  12. Projection: multiple notes
      [Figure: spectrogram M and a basis matrix W of note spectra.]
      - $P = W(W^TW)^{-1}W^T$
      - Projected spectrogram $= P\,M$
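A minimal numpy sketch of this projector (the matrix sizes and data are placeholders, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))   # spectrogram: 1024 freq bins x 500 frames (made-up sizes)
W = rng.random((1024, 4))     # spectra of 4 notes, one per column (made-up)

# Orthogonal projector onto the column space of W.
P = W @ np.linalg.inv(W.T @ W) @ W.T
projected = P @ M             # projected spectrogram
```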

  13. We’re actually computing a score
      [Figure: M = spectrogram, W = note spectra, H = ?]
      - $M \approx WH$
      - $H = \mathrm{pinv}(W)\,M$
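The same placeholder setup, one line further:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))   # spectrogram (placeholder data)
W = rng.random((1024, 4))     # known note spectra

H = np.linalg.pinv(W) @ M     # the score / transcription: 4 x 500
print(H.shape)
```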

  14. How about the other way?
      [Figure: M = spectrogram, H = known transcription, W = ?, U = reconstruction.]
      - $M \approx WH$
      - $W = M\,\mathrm{pinv}(H)$
      - $U = WH$

  15. So what are we doing here?
      - $M \approx WH$ is an approximation.
      - Given W, estimate H to minimize the error:
        $\hat{H} = \arg\min_H \|M - WH\|_F^2 = \arg\min_H \sum_{i,j}(M - WH)_{ij}^2$
      - Must ideally find the transcription of the given notes.
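This Frobenius-norm problem decouples over the columns of H, so an ordinary least-squares solver handles it; a sketch with the same placeholder matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))
W = rng.random((1024, 4))

# Least squares solves each column of H independently.
H, *_ = np.linalg.lstsq(W, M, rcond=None)

# For full-rank W this matches the pseudoinverse solution of the earlier slide.
assert np.allclose(H, np.linalg.pinv(W) @ M)
```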

  16. Going the other way..
      - $M \approx WH$ is an approximation.
      - Given H, estimate W to minimize the error:
        $\hat{W} = \arg\min_W \|M - WH\|_F^2 = \arg\min_W \sum_{i,j}(M - WH)_{ij}^2$
      - Must ideally find the notes corresponding to the transcription.

  17. When both parameters are unknown
      - $\mathrm{approx}(M) = ?$ Must estimate both W and H to best approximate M.
      - Ideally, must learn both the notes and their transcription!

  18. A least squares solution
      - $\hat{W}, \hat{H} = \arg\min_{W,H} \|M - WH\|_F^2$
      - Unconstrained: for any (W, H) that minimizes the error, $W' = WA$ and $H' = A^{-1}H$ also minimize it, for any invertible A.
      - For our problem, let's consider the "truth": when one note occurs, the other does not, i.e. $h_i h_j^T = 0$ for all $i \neq j$.
      - The rows of H are uncorrelated.
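A tiny numpy demonstration of this non-uniqueness (synthetic factors):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((1024, 4))
H = rng.random((4, 500))
M = W @ H                            # exact factorization: zero error

A = rng.random((4, 4))               # almost surely invertible
W2, H2 = W @ A, np.linalg.inv(A) @ H
print(np.linalg.norm(M - W2 @ H2))   # ~0: same reconstruction error
```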

  19. A least squares solution
      - Assume $HH^T = I$, normalizing all rows of H to length 1; then $\mathrm{pinv}(H) = H^T$.
      - Projecting M onto H: $W = M\,\mathrm{pinv}(H) = MH^T$, so $WH = MH^TH$.
      - $\hat{W}, \hat{H} = \arg\min_{W,H} \|M - WH\|_F^2$ becomes
        $\hat{H} = \arg\min_H \|M - MH^TH\|_F^2$, with the constraint $\mathrm{Rank}(H) = 4$.
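The $\mathrm{pinv}(H) = H^T$ step is easy to verify numerically for any H with orthonormal rows (synthetic example):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.random((500, 4)))    # 500 x 4 with orthonormal columns
H = Q.T                                      # 4 x 500 with orthonormal rows
print(np.allclose(H @ H.T, np.eye(4)))       # HH^T = I
print(np.allclose(np.linalg.pinv(H), H.T))   # pinv(H) = H^T
```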

  20. Finding the notes
      - $\hat{H} = \arg\min_H \|M - MH^TH\|_F^2$
      - Note $H^TH \neq I$; only $HH^T = I$.
      - Could also be rewritten as
        $\arg\min_H \mathrm{trace}\big((M - MH^TH)(M - MH^TH)^T\big)$
        $= \arg\min_H \mathrm{trace}\big(M(I - H^TH)M^T\big)$
        $= \arg\min_H \mathrm{trace}\big(\mathrm{Correlation}(M)\,(I - H^TH)\big)$
        $= \arg\max_H \mathrm{trace}\big(\mathrm{Correlation}(M)\,H^TH\big)$

  21. Finding the notes
      - Constraint: every row of H has length 1:
        $\hat{H} = \arg\max_H \big[\mathrm{trace}(\mathrm{Correlation}(M)\,H^TH) + \mathrm{trace}(\Lambda(I - HH^T))\big]$
      - Differentiating and equating to 0: $\mathrm{Correlation}(M)\,H^T = H^T\Lambda$
      - Simply requiring the rows of H to be orthonormal gives us that H is the set of eigenvectors of the data in $M^T$.
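A hedged numpy sketch of this eigenvector solution, continuing the placeholder spectrogram (sizes and the choice of 4 notes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))                # spectrogram (placeholder)

corr = M.T @ M                             # correlation of the frames: 500 x 500
eigvals, eigvecs = np.linalg.eigh(corr)    # symmetric matrix -> eigh

# Rows of H: the eigenvectors with the 4 largest eigenvalues (orthonormal rows).
H = eigvecs[:, np.argsort(eigvals)[::-1][:4]].T   # 4 x 500
W = M @ H.T                                # notes: W = M pinv(H) = M H^T
```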

  22. Equivalences
      - $\hat{H} = \arg\max_H \big[\mathrm{trace}(\mathrm{Correlation}(M)\,H^TH) + \mathrm{trace}(\Lambda(I - HH^T))\big]$
        is identical to
        $\hat{W}, \hat{H} = \arg\min_{W,H} \|M - WH\|_F^2 \;\; \text{s.t.} \;\; \|h_i\| = 1,\; h_i h_j^T = 0 \;\; \forall\, i \neq j$
      - Minimize the least squares error with the constraint that the rows of H have length 1 and are orthogonal to one another.

  23. So how does that work?
      - There are 12 notes in the segment, hence we try to estimate 12 notes..

  24. So how does that work?
      - The first three "notes" and their contributions.
      - The spectrograms of the notes are statistically uncorrelated.

  25. Finding the notes
      - Can find W instead of H:
        $\hat{W} = \arg\min_W \|M - WW^TM\|_F^2$
      - Solving the above, with the constraint that the columns of W are orthonormal, gives you the eigenvectors of the data in M:
        $\hat{W} = \arg\max_W \mathrm{trace}\big(W^T\,\mathrm{Correlation}(M)\,W\big)$
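The mirror-image sketch of the previous one, again on placeholder data (12 notes as in the example):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((1024, 500))

corr = M @ M.T                           # correlation of the spectra: 1024 x 1024
eigvals, eigvecs = np.linalg.eigh(corr)

W = eigvecs[:, np.argsort(eigvals)[::-1][:12]]   # 12 orthonormal note spectra
H = W.T @ M                              # scores: with W^T W = I, pinv(W) = W^T
```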

  26. So how does that work?
      - There are 12 notes in the segment, hence we try to estimate 12 notes..

  27. Our notes are not orthogonal
      - Overlapping frequencies.
      - Notes occur concurrently.
      - The harmonica continues to resonate to the previous note.
      - More generally, simple orthogonality will not give us the desired solution.

  28. What else can we look for?
      - Assume: the "transcription" of one note does not depend on what else is playing.
      - Or, in a multi-instrument piece, instruments are playing independently of one another.
      - Not strictly true, but still..

  29. Formulating it with Independence
      - $\hat{W}, \hat{H} = \arg\min_{W,H} \|M - WH\|_F^2 \quad (\text{s.t. the rows of } H \text{ are independent})$
      - Impose statistical independence constraints on the decomposition.
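The deck stops at the formulation; as a hedged sketch only, one standard off-the-shelf way to impose the independence constraint is scikit-learn's FastICA (my choice for illustration, not necessarily the algorithm the course develops; data is synthetic):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
M = rng.random((1024, 500))              # spectrogram (placeholder)

# Treat each frame as one observation of 1024 mixed measurements,
# and recover 12 statistically independent "note activation" rows of H.
ica = FastICA(n_components=12, random_state=0)
H = ica.fit_transform(M.T).T             # 12 x 500: independent rows
W = ica.mixing_                          # 1024 x 12: the corresponding note spectra
```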
