
Machine Learning for Signal Processing: Supervised Representations



  1. Machine Learning for Signal Processing: Supervised Representations. Class 19, 8 Nov 2016. Bhiksha Raj. Slides by Najim Dehak.

  2. Definitions: Variance and Covariance
     • Variance: S_XX = E(XX^T), estimated as S_XX = (1/N) XX^T
       – How "spread" the data are in the direction of X
       – Scalar version: σ_x² = E(x²)
     • Covariance: S_XY = E(XY^T), estimated as S_XY = (1/N) XY^T
       – How much X predicts Y
       – Scalar version: σ_xy = E(xy)
     (Figure: scatter plot illustrating σ_x, σ_y, and σ_xy > 0.)
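A minimal numpy sketch of these sample estimates (not from the slides; the data and variable names are illustrative, with observations stored column-wise as d × N matrices):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 1000
    X = rng.normal(size=(3, N))                          # d_x x N data, roughly zero-mean
    Y = 0.5 * X[:2, :] + 0.1 * rng.normal(size=(2, N))   # d_y x N, deliberately correlated with X

    S_XX = (X @ X.T) / N                                 # variance: covariance of X with itself
    S_XY = (X @ Y.T) / N                                 # cross-covariance: how much X predicts Y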

  3. Definition: Whitening Matrix
     • Z = Σ_XX^(-1/2) (X − X̄); if X is already centered, Z = Σ_XX^(-1/2) X
     • Whitening matrix: Σ_XX^(-1/2)
     • Transforms the variable to unit variance
     • Scalar version: σ_x^(-1)
     (Figure: the distribution P(X) before and P(Z) after whitening.)
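A hedged sketch of the whitening transform Z = Σ_XX^(-1/2)(X − X̄), computing the inverse square root via an eigendecomposition (one common choice; the slides do not say how Σ_XX^(-1/2) is formed, and the data here are synthetic):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(3, 1000)) * np.array([[3.0], [1.0], [0.2]])  # very unequal variances

    Xc = X - X.mean(axis=1, keepdims=True)          # center the data
    S_XX = (Xc @ Xc.T) / Xc.shape[1]
    evals, evecs = np.linalg.eigh(S_XX)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T    # Sigma_XX^(-1/2), the whitening matrix
    Z = W @ Xc
    print(np.round((Z @ Z.T) / Z.shape[1], 3))      # approximately the identity: unit variance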

  4. Definition: Correlation Coefficient
     • Matrix version: Σ_XX^(-1/2) Σ_XY Σ_YY^(-1/2)
     • Scalar version: ρ_xy = σ_xy / (σ_x σ_y)
       – Explains how Y varies with X, after normalizing out the innate variation of X and Y
     (Figure: scatter of x/σ_x against y/σ_y, illustrating ρ_xy > 0.)
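A small scalar illustration of ρ_xy = σ_xy / (σ_x σ_y); the numbers are made up for the example, not taken from the lecture:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(size=10000)
    y = 0.8 * x + 0.6 * rng.normal(size=10000)      # y varies with x plus independent noise

    rho = np.mean(x * y) / (np.std(x) * np.std(y))  # rho_xy = sigma_xy / (sigma_x * sigma_y)
    print(round(rho, 2))                            # close to 0.8 for this construction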

  5. MLSP
     • Application of machine learning techniques to the analysis of signals
     (Block diagram: sensor/channel → signal capture → feature extraction → modeling/regression, with external knowledge feeding in.)
     • Feature extraction:
       – Supervised (guided) representation

  6. Data-specific bases?
     • Issue: the bases we have considered so far are data-agnostic
       – Fourier / wavelet type bases for all data may not be optimal
     • Improvement I: the bases we saw next were data-specific
       – PCA, NMF, ICA, ...
       – The bases changed depending on the data
     • Improvement II: what if bases are both data-specific and task-specific?
       – The basis depends on both the data and a task

  7. Recall: Unsupervised Basis Learning
     • What is a good basis?
       – Energy compaction → Karhunen-Loève
       – Uncorrelated → PCA
       – Sparsity → sparse representation, compressed sensing, ...
       – Statistically independent → ICA
     • We create a narrative about how the data are created

  8. Supervised Basis Learning?
     • What is a good basis?
       – A basis that gives the best classification performance
       – A basis that maximizes shared information with another 'view'
     • We have some external information guiding our notion of the optimal basis
       – Can we learn a basis for a set of variables that will best predict some value(s)?

  9. Regression
     • Simplest case
       – Given a set of scalar data points, predict some value
       – Years are the independent variable
       – Temperature is the dependent variable

  10. Regression
     • Formulation of the problem
     • Let's solve!

  11. Regression
     • Expand out the Frobenius norm
     • Take the derivative
     • Set it to zero and solve

  12. Regression
     • This is basically just least squares again
     • Note that this looks a lot like the following
       – In the 1-d case where x predicts y this is just ... (see the worked sketch below)
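A worked sketch of the derivation these three slides outline, using the standard least-squares formulation (the equations on the original slides are images and are not reproduced here); X and Y are data matrices with observations as columns:

    \min_{A}\; E \;=\; \|\mathbf{Y} - A\mathbf{X}\|_F^2
              \;=\; \operatorname{trace}\!\big((\mathbf{Y}-A\mathbf{X})(\mathbf{Y}-A\mathbf{X})^{T}\big)

    \frac{\partial E}{\partial A} \;=\; -2\,(\mathbf{Y}-A\mathbf{X})\mathbf{X}^{T} \;=\; 0
    \;\;\Rightarrow\;\; A \;=\; \mathbf{Y}\mathbf{X}^{T}\big(\mathbf{X}\mathbf{X}^{T}\big)^{-1}

    \text{1-d case (x predicts y): } a \;=\; \frac{\sum_i x_i y_i}{\sum_i x_i^2} \;=\; \frac{\sigma_{xy}}{\sigma_x^2}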

  13. Multiple Regression
     • Robot archer example
       – Our robot fires defective arrows at a target
         • We don't know how wind might affect their movement, but we'd like to correct for it if possible
       – Predict the distance from the center of the target of a fired arrow
         • Measure wind speed in 3 directions: X_i = [1, w_x, w_y, w_z]^T

  14. Multiple Regression
     • Wind speed: X_i = [1, w_x, w_y, w_z]^T
     • Offset from center in 2 directions: Y_i = [o_x, o_y]^T
     • Model: Y_i = B X_i

  15. Multiple Regression
     • Answer
       – Here Y contains measurements of the distance of the arrow from the center
       – We are fitting a plane
       – The correlation is basically just the gradient
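A minimal numpy sketch of the archer regression, assuming the model Y_i = B X_i above; the wind and offset data here are synthetic, and B_true exists only to show that the least-squares fit recovers it:

    import numpy as np

    rng = np.random.default_rng(3)
    N = 200
    wind = rng.normal(size=(3, N))                      # wind speed in 3 directions
    X = np.vstack([np.ones((1, N)), wind])              # X_i = [1, w_x, w_y, w_z]^T
    B_true = rng.normal(size=(2, 4))                    # unknown plane we would like to recover
    Y = B_true @ X + 0.05 * rng.normal(size=(2, N))     # offset from center in 2 directions

    B_hat = Y @ X.T @ np.linalg.inv(X @ X.T)            # least-squares fit of Y ~ B X
    print(np.round(B_hat - B_true, 2))                  # near zero: coefficients recovered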

  16. Canonical Correlation Analysis
     • Further generalization (CCA)
       – Do all wind factors affect the position, or just some low-dimensional combination AX?
       – Do they affect both coordinates individually, or just some combination BY?
     (Figure: scatter plot of the observations.)

  17. Canonical Correlation Analysis
     • Let's call the arrow location vector Y and the wind vectors X
       – Let's find the projections of Y and X, respectively, that are most correlated
     (Figure: scatter plot with direction vectors w_x, w_y, w_z; the best X projection plane predicts the best Y projection.)

  18. Canonical Correlation Analysis
     • What do these vectors represent?
       – The direction of maximum correlation ignores the parts of the wind and location data that do not affect each other
         • Only information about the defective arrow remains!
     (Figure: same scatter plot; the best X projection plane predicts the best Y projection.)

  19. CCA Motivation and History
     • Proposed by Hotelling (1936)
     • Many real-world problems involve 2 'views' of data
     • Economics
       – Consumption of wheat is related to the price of potatoes, rice, and barley ... and wheat
       – Random vector of prices X
       – Random vector of consumption Y
     (Figure: X = prices, Y = consumption.)

  20. CCA Motivation and History
     • Magnus Borga and David Hardoon popularized CCA as a technique in signal processing and machine learning
     • Better for dimensionality reduction in many cases

  21. CCA Dimensionality Reduction
     • We keep only the correlated subspace
     • Is this always good?
       – If we have measured things we care about, then we have removed useless information

  22. CCA Dimensionality Reduction
     • In this case:
       – CCA found a basis component that preserved class distinctions while reducing dimensionality
       – Able to preserve class in both views

  23. Comparison to PCA
     • PCA fails to preserve class distinctions as well

  24. Failure of PCA
     • PCA is unsupervised
       – Captures the direction of greatest variance (energy)
       – No notion of the task, and hence no notion of what is good or bad information
       – The direction of greatest variance can sometimes be noise
       – OK for reconstruction of the signal
       – Catastrophic for preserving class information in some cases

  25. Benefits of CCA
     • Why did CCA work?
       – Soft supervision
         • External knowledge
       – The 2 views track each other in a direction that does not correspond to noise
       – Noise suppression (sometimes)
     • Preview
       – If one of the sets of signals consists of the true labels, CCA is equivalent to Linear Discriminant Analysis
       – Hard supervision

  26. Multiview Assumption
     • When does CCA work?
       – The correlated subspace must actually have an interesting signal
         • If the two views have correlated noise then we will learn a bad representation
       – Sometimes the correlated subspace can be noise
         • Correlated noise in both sets of views

  27. Multiview Assumption
     • Why not just concatenate both views?
       – It does not exploit the extra structure of the signal (more on this in 2 slides)
     • PCA on the joint data will decorrelate all variables
       – Not good for prediction
     • We want to decorrelate within X and within Y, but maximize the cross-correlation between X and Y
       – High dimensionality → overfitting
     (Figure: scatter plot with direction vectors w_x, w_y, w_z.)

  28. Multiview Assumption
     • We can sort of think of a model for how our data might be generated
     (Diagram: a Source generates View 1 and View 2.)
     • We want View 1 independent of View 2 conditioned on knowledge of the source
       – All correlation is due to the source

  29. Multiview Examples
     • Look at many stocks from different sectors of the economy
       – Conditioned on the fact that they are part of the same economy, they might be independent of one another
     • Multiple speakers saying the same sentence
       – The sentence generates signals from many speakers; each speaker might be independent of the others conditioned on the sentence
     (Diagram: a Source generates View 1 and View 2.)

  30. Multiview Examples
     http://mlg.postech.ac.kr/static/research/multiview_overview.png

  31. Matrix Representation
     • E = Σ_i ||X_i − Y_i||², with X = [X_1, X_2, ..., X_N] and Y = [Y_1, Y_2, ..., Y_N]
     • ||X_i||² = X_i^T X_i, so Σ_i ||X_i||² = trace(XX^T) = ||X||_F²
     • E = trace((X − Y)(X − Y)^T) = ||X − Y||_F²
     • Expressing the total error as a matrix operation
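A quick numerical check of these identities (not from the slides; random matrices used purely for verification):

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(3, 50))
    Y = rng.normal(size=(3, 50))

    per_column = np.sum((X - Y) ** 2)                    # sum_i ||X_i - Y_i||^2
    via_trace = np.trace((X - Y) @ (X - Y).T)            # trace((X-Y)(X-Y)^T)
    via_frob = np.linalg.norm(X - Y, 'fro') ** 2         # ||X - Y||_F^2
    print(np.allclose(per_column, via_trace), np.allclose(per_column, via_frob))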

  32. Recall: Objective Functions
     • Least squares
     • What is a good basis?
       – Energy compaction → Karhunen-Loève
       – Positive and sparse → NMF
       – Regression

  33. A Quick Review
     • Cross-covariance

  34. A Quick Review
     • The effect of a transform: Z = UX
       – C_XX = E[XX^T]
       – C_ZZ = E[ZZ^T] = U C_XX U^T

  35. Recall: Objective Functions
     • So far our objective needs no external data
       – No knowledge of the task
       – argmin_{Z ∈ R^(k×N)} ||X − UZ||_F²  s.t.  U ∈ R^(d×k), rank(U) = k
     • CCA requires an extra view
       – We force both views to look like each other
       – min_{U ∈ R^(dx×k), V ∈ R^(dy×k)} ||U^T X − V^T Y||_F²  s.t.  U^T C_XX U = I_k, V^T C_YY V = I_k

  36. Interpreting the CCA Objective
     • Minimize the reconstruction error between the projections of the two views of the data
     • Find the subspaces U, V onto which we project views X and Y such that their correlation is maximized
     • Find combinations of both views that best predict each other
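A minimal sketch of solving the slide-35 objective via the SVD of the whitened cross-covariance, which is one standard route (the lecture has not shown this derivation yet, and the two-view data below are synthetic, generated from a shared source as in the multiview model):

    import numpy as np

    def inv_sqrt(S):
        """Matrix inverse square root via eigendecomposition (assumes S is symmetric positive definite)."""
        evals, evecs = np.linalg.eigh(S)
        return evecs @ np.diag(evals ** -0.5) @ evecs.T

    rng = np.random.default_rng(5)
    N, k = 2000, 2
    source = rng.normal(size=(k, N))                                      # shared latent source
    X = rng.normal(size=(4, k)) @ source + 0.3 * rng.normal(size=(4, N))  # view 1
    Y = rng.normal(size=(3, k)) @ source + 0.3 * rng.normal(size=(3, N))  # view 2

    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    C_XX, C_YY, C_XY = X @ X.T / N, Y @ Y.T / N, X @ Y.T / N

    T = inv_sqrt(C_XX) @ C_XY @ inv_sqrt(C_YY)           # whitened cross-covariance
    A, rho, Bt = np.linalg.svd(T)                        # singular values = canonical correlations
    U = inv_sqrt(C_XX) @ A[:, :k]                        # projection for view X (U^T C_XX U = I)
    V = inv_sqrt(C_YY) @ Bt.T[:, :k]                     # projection for view Y (V^T C_YY V = I)
    print(np.round(rho[:k], 2))                          # correlations of the top components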
