Machine Learning for Signal Processing: Regression and Prediction - PowerPoint PPT Presentation
SLIDE 1

Machine Learning for Signal Processing: Regression and Prediction

Class 14. 17 Oct 2013. Instructor: Bhiksha Raj

SLIDE 2

Matrix Identities

  • The derivative of a scalar function w.r.t. a vector is a vector

For $\mathbf{x} = [x_1\; x_2\; \ldots\; x_D]$:

$$df(\mathbf{x}) = \frac{df}{dx_1}\,dx_1 + \frac{df}{dx_2}\,dx_2 + \cdots + \frac{df}{dx_D}\,dx_D$$

so the derivative collects the partials into the vector $\left[\frac{df}{dx_1}\; \frac{df}{dx_2}\; \cdots\; \frac{df}{dx_D}\right]$.
SLIDE 3

Matrix Identities

  • The derivative of a scalar function w.r.t. a vector is a vector
  • The derivative w.r.t. a matrix is a matrix

For a matrix argument $\mathbf{X} = [x_{ij}]$ of size $D \times D$:

$$df(\mathbf{X}) = \sum_{i=1}^{D}\sum_{j=1}^{D} \frac{df}{dx_{ij}}\,dx_{ij},
\qquad
\frac{df}{d\mathbf{X}} =
\begin{bmatrix}
\frac{df}{dx_{11}} & \cdots & \frac{df}{dx_{1D}}\\
\vdots & \ddots & \vdots\\
\frac{df}{dx_{D1}} & \cdots & \frac{df}{dx_{DD}}
\end{bmatrix}$$
SLIDE 4

Matrix Identities

  • The derivative of a vector function w.r.t. a vector is a matrix
    – Note the transposition of order

For $\mathbf{F}(\mathbf{x}) = [F_1\; F_2\; \ldots\; F_N]$ and $\mathbf{x} = [x_1\; x_2\; \ldots\; x_D]$:

$$[dF_1\; dF_2\; \cdots\; dF_N] = [dx_1\; dx_2\; \cdots\; dx_D]
\begin{bmatrix}
\frac{dF_1}{dx_1} & \frac{dF_2}{dx_1} & \cdots & \frac{dF_N}{dx_1}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{dF_1}{dx_D} & \frac{dF_2}{dx_D} & \cdots & \frac{dF_N}{dx_D}
\end{bmatrix}$$
SLIDE 5

Derivatives

  • In general: differentiating an M×N function by a U×V argument results in an M×N×U×V tensor derivative
    – E.g., an N×1 function differentiated by a U×V argument gives an N×U×V tensor (or U×V×N, depending on the ordering convention)
SLIDE 6

Matrix derivative identities

  • Some basic linear and quadratic identities

$$\frac{d(\mathbf{X}\mathbf{a})}{d\mathbf{a}} = \mathbf{X}$$
(X is a matrix, a is a vector. The solution may also be $\mathbf{X}^T$, depending on the layout convention.)

$$d(\mathbf{A}\mathbf{X}) = \mathbf{A}\,(d\mathbf{X}); \qquad d(\mathbf{X}\mathbf{A}) = (d\mathbf{X})\,\mathbf{A}$$
(A is a matrix.)

$$\frac{d(\mathbf{a}^T\mathbf{X}\mathbf{a})}{d\mathbf{X}} = \mathbf{a}\mathbf{a}^T$$

$$\frac{d\,\mathrm{trace}(\mathbf{X}\mathbf{A})}{d\mathbf{X}} = \mathbf{A}^T; \qquad
\frac{d\,\mathrm{trace}(\mathbf{X}\mathbf{A}\mathbf{X}^T)}{d\mathbf{X}} = \mathbf{X}\mathbf{A}^T + \mathbf{X}\mathbf{A}$$
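These identities are easy to sanity-check numerically. A minimal sketch (mine, not from the slides) verifying the quadratic identity $d(\mathbf{a}^T\mathbf{X}\mathbf{a})/d\mathbf{X} = \mathbf{a}\mathbf{a}^T$ by finite differences:

```python
import numpy as np

# Finite-difference check of the quadratic identity d(a^T X a)/dX = a a^T.
rng = np.random.default_rng(0)
D = 4
X = rng.standard_normal((D, D))
a = rng.standard_normal(D)

f = lambda M: a @ M @ a              # scalar function of the matrix argument
eps = 1e-6
num_grad = np.zeros((D, D))
for i in range(D):
    for j in range(D):
        dM = np.zeros((D, D))
        dM[i, j] = eps
        num_grad[i, j] = (f(X + dM) - f(X - dM)) / (2 * eps)

print(np.allclose(num_grad, np.outer(a, a), atol=1e-5))   # -> True
```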

SLIDE 7

A Common Problem

  • Can you spot the glitches?

SLIDE 8

How to fix this problem?

  • “Glitches” in audio
    – Must be detected – How?
    – Then what?
  • Glitches must be “fixed”
    – Delete the glitch
      • Results in a “hole”
    – Fill in the hole – How?
SLIDE 9

Interpolation..

  • “Extend” the curve on the left to “predict” the values in the “blank” region
    – Forward prediction
  • Extend the blue curve on the right leftwards to predict the blank region
    – Backward prediction
  • How?
    – Regression analysis..
SLIDE 10

Detecting the Glitch

  • Regression-based reconstruction can be done anywhere
  • The reconstructed value will not match the actual value
  • A large reconstruction error identifies glitches
SLIDE 11

What is a regression?

  • Analyzing the relationship between variables
  • Expressed in many forms
  • Wikipedia:
    – Linear regression, Simple regression, Ordinary least squares, Polynomial regression, General linear model, Generalized linear model, Discrete choice, Logistic regression, Multinomial logit, Mixed logit, Probit, Multinomial probit, ….
  • Generally a tool to predict variables
SLIDE 12

Regressions for prediction

  • y = f(x; Θ) + e
  • Different possibilities
    – y is a scalar
      • y is real
      • y is categorical (classification)
    – y is a vector
    – x is a vector
      • x is a set of real-valued variables
      • x is a set of categorical variables
      • x is a combination of the two
    – f(.) is a linear or affine function
    – f(.) is a non-linear function
    – f(.) is a time-series model
SLIDE 13

A linear regression

  • Assumption: the relationship between the variables is linear
    – A linear trend may be found relating x and y
    – y = dependent variable
    – x = explanatory variable
    – Given x, y can be predicted as an affine function of x
SLIDE 14

An imaginary regression..

  • http://pages.cs.wisc.edu/~kovar/hall.html
  • “Check this shit out (Fig. 1). That's bonafide, 100%-real data, my friends. I took it myself over the course of two weeks. And this was not a leisurely two weeks, either; I busted my ass day and night in order to provide you with nothing but the best data possible. Now, let's look a bit more closely at this data, remembering that it is absolutely first-rate. Do you see the exponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of my time. Banking on my hopes that whoever grades this will just look at the pictures, I drew an exponential through my noise. I believe the apparent legitimacy is enhanced by the fact that I used a complicated computer program to make the fit. I understand this is the same process by which the top quark was discovered.”
SLIDE 15

Linear Regressions

  • y = Ax + b + e
    – e = prediction error
  • Given a “training” set of {x, y} values, estimate A and b:
    – y1 = Ax1 + b + e1
    – y2 = Ax2 + b + e2
    – y3 = Ax3 + b + e3
    – …
  • If A and b are well estimated, the prediction error will be small
SLIDE 16

Linear Regression to a scalar

  • Rewrite
    – y1 = aᵀx1 + b + e1
    – y2 = aᵀx2 + b + e2
    – y3 = aᵀx3 + b + e3
  • Define:

$$\mathbf{y} = [y_1\; y_2\; y_3\; \ldots], \qquad
\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \ldots \\ 1 & 1 & 1 & \ldots \end{bmatrix}, \qquad
\mathbf{A} = \begin{bmatrix} \mathbf{a} \\ b \end{bmatrix}, \qquad
\mathbf{e} = [e_1\; e_2\; e_3\; \ldots]$$

  • Then:

$$\mathbf{y} = \mathbf{A}^T\mathbf{X} + \mathbf{e}$$
SLIDE 17

Learning the parameters

  • Given training data: several (x, y) pairs
  • Can define a “divergence”: D(y, ŷ)
    – Measures how much ŷ differs from y
    – Ideally, if the model is accurate, this should be small
  • Estimate A, b to minimize D(y, ŷ)

$$\hat{\mathbf{y}} = \mathbf{A}^T\mathbf{X} \;\;\text{(assuming no error)}, \qquad
\mathbf{y} = \mathbf{A}^T\mathbf{X} + \mathbf{e}$$
SLIDE 18

The prediction error as divergence

  • Define the divergence as the sum of squared errors in predicting y

$$\mathbf{y} = \mathbf{A}^T\mathbf{X} + \mathbf{e} \quad\Rightarrow\quad \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$$

$$D(\mathbf{y}, \hat{\mathbf{y}}) = E = e_1^2 + e_2^2 + e_3^2 + \cdots
= (y_1 - \mathbf{a}^T\mathbf{x}_1 - b)^2 + (y_2 - \mathbf{a}^T\mathbf{x}_2 - b)^2 + (y_3 - \mathbf{a}^T\mathbf{x}_3 - b)^2 + \cdots$$

$$E = \|\mathbf{y} - \mathbf{A}^T\mathbf{X}\|^2 = (\mathbf{y} - \mathbf{A}^T\mathbf{X})(\mathbf{y} - \mathbf{A}^T\mathbf{X})^T$$

Prediction error as divergence

  • y = aᵀx + e
    – e = prediction error
    – Find the “slope” a such that the total squared length of the error lines is minimized
SLIDE 20

Solving a linear regression

  • Minimize the squared error:

$$E = \|\mathbf{y} - \mathbf{A}^T\mathbf{X}\|^2
= (\mathbf{y} - \mathbf{A}^T\mathbf{X})(\mathbf{y} - \mathbf{A}^T\mathbf{X})^T
= \mathbf{y}\mathbf{y}^T - 2\,\mathbf{y}\mathbf{X}^T\mathbf{A} + \mathbf{A}^T\mathbf{X}\mathbf{X}^T\mathbf{A}$$

  • Differentiating w.r.t. A and equating to 0:

$$\frac{dE}{d\mathbf{A}} = 2\,\mathbf{X}\mathbf{X}^T\mathbf{A} - 2\,\mathbf{X}\mathbf{y}^T = 0
\quad\Rightarrow\quad
\mathbf{A} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{y}^T = \mathrm{pinv}(\mathbf{X}^T)\,\mathbf{y}^T$$
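A minimal numerical sketch of this closed-form solution (variable names are mine), using the slides' convention that training inputs are the columns of X, with a row of ones appended to absorb the offset b:

```python
import numpy as np

# Closed-form least squares: A = (X X^T)^{-1} X y^T.
rng = np.random.default_rng(1)
D, N = 3, 200
x = rng.standard_normal((D, N))
a_true, b_true = np.array([2.0, -1.0, 0.5]), 0.3
y = a_true @ x + b_true + 0.01 * rng.standard_normal(N)

X = np.vstack([x, np.ones((1, N))])     # (D+1) x N augmented data matrix
A = np.linalg.solve(X @ X.T, X @ y)     # solves (X X^T) A = X y^T
a_hat, b_hat = A[:-1], A[-1]
print(a_hat, b_hat)                     # close to a_true and b_true
```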

SLIDE 21

Regression in multiple dimensions

  • Also called multiple regression
  • Equivalent of saying: yᵢ = Aᵀxᵢ + b + eᵢ, where yᵢ is a vector:
    – yᵢ₁ = a₁ᵀxᵢ + b₁ + eᵢ₁
    – yᵢ₂ = a₂ᵀxᵢ + b₂ + eᵢ₂
    – yᵢ₃ = a₃ᵀxᵢ + b₃ + eᵢ₃
    (yᵢⱼ = jth component of vector yᵢ; aⱼ = jth column of A; bⱼ = jth component of b)
  • Fundamentally no different from N separate single regressions
    – But we can use the relationship between the ys to our benefit
SLIDE 22

Multiple Regression

  • Define (with the row of ones in X absorbing the offset b):

$$\mathbf{Y} = [\mathbf{y}_1\; \mathbf{y}_2\; \mathbf{y}_3\; \ldots], \qquad
\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \ldots \\ 1 & 1 & 1 & \ldots \end{bmatrix}, \qquad
\hat{\mathbf{A}} = \begin{bmatrix} \mathbf{A} \\ \mathbf{b}^T \end{bmatrix}, \qquad
\mathbf{E} = [\mathbf{e}_1\; \mathbf{e}_2\; \mathbf{e}_3\; \ldots]$$

$$\mathbf{Y} = \hat{\mathbf{A}}^T\mathbf{X} + \mathbf{E}$$

  • Minimize the total squared error:

$$DIV = \sum_i \|\mathbf{y}_i - \hat{\mathbf{A}}^T\mathbf{x}_i\|^2
= \mathrm{trace}\!\left((\mathbf{Y} - \hat{\mathbf{A}}^T\mathbf{X})(\mathbf{Y} - \hat{\mathbf{A}}^T\mathbf{X})^T\right)$$

  • Differentiating and equating to 0:

$$\frac{d\,DIV}{d\hat{\mathbf{A}}} = 2\,\mathbf{X}\mathbf{X}^T\hat{\mathbf{A}} - 2\,\mathbf{X}\mathbf{Y}^T = 0
\quad\Rightarrow\quad
\hat{\mathbf{A}} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}^T = \mathrm{pinv}(\mathbf{X}^T)\,\mathbf{Y}^T$$
SLIDE 23

A Different Perspective

  • y is a noisy reading of Aᵀx:

$$\mathbf{y} = \mathbf{A}^T\mathbf{x} + \mathbf{e}, \qquad \mathbf{e} \sim N(0, \sigma^2\mathbf{I})$$

  • Estimate A from the collected data:

$$\mathbf{Y} = [\mathbf{y}_1\; \mathbf{y}_2\; \ldots\; \mathbf{y}_N], \qquad
\mathbf{X} = [\mathbf{x}_1\; \mathbf{x}_2\; \ldots\; \mathbf{x}_N]$$
SLIDE 24

The Likelihood of the data

  • Probability of observing a specific y, given x, for a particular matrix A:

$$P(\mathbf{y}\,|\,\mathbf{x}; \mathbf{A}) = N(\mathbf{y};\, \mathbf{A}^T\mathbf{x},\, \sigma^2\mathbf{I})$$

  • Probability of the collection $\mathbf{Y} = [\mathbf{y}_1\; \ldots\; \mathbf{y}_N]$, $\mathbf{X} = [\mathbf{x}_1\; \ldots\; \mathbf{x}_N]$:

$$P(\mathbf{Y}\,|\,\mathbf{X}; \mathbf{A}) = \prod_i N(\mathbf{y}_i;\, \mathbf{A}^T\mathbf{x}_i,\, \sigma^2\mathbf{I})$$

  • Assuming IID for convenience (not necessary)
SLIDE 25

A Maximum Likelihood Estimate

  • The likelihood of the data:

$$P(\mathbf{Y}\,|\,\mathbf{X}) = \prod_i \frac{1}{(2\pi\sigma^2)^{D/2}}
\exp\!\left(-\frac{1}{2\sigma^2}\,\|\mathbf{y}_i - \mathbf{A}^T\mathbf{x}_i\|^2\right)$$

$$\log P(\mathbf{Y}\,|\,\mathbf{X}; \mathbf{A}) = C - \frac{1}{2\sigma^2}\sum_i \|\mathbf{y}_i - \mathbf{A}^T\mathbf{x}_i\|^2
= C - \frac{1}{2\sigma^2}\,\mathrm{trace}\!\left((\mathbf{Y} - \mathbf{A}^T\mathbf{X})(\mathbf{Y} - \mathbf{A}^T\mathbf{X})^T\right)$$

  • Maximizing the log probability is identical to minimizing the trace
    – Identical to the least-squares solution:

$$\mathbf{A} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}^T = \mathrm{pinv}(\mathbf{X}^T)\,\mathbf{Y}^T$$
SLIDE 26

Predicting an output

  • From a collection of training data, we have learned A
  • Given x for a new instance, but not y, what is y?
  • Simple solution:

$$\hat{\mathbf{y}} = \mathbf{A}^T\mathbf{x}$$
SLIDE 27

Applying it to our problem

  • Prediction by regression
  • Forward regression: $x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_K x_{t-K} + e_t$
  • Backward regression: $x_t = b_1 x_{t+1} + b_2 x_{t+2} + \cdots + b_K x_{t+K} + e_t$
SLIDE 28

Applying it to our problem

  • Forward prediction: collect each predicted sample $x_t$ into a vector $\mathbf{x}$, and the K preceding samples $[x_{t-1}\; x_{t-2}\; \ldots\; x_{t-K}]$ into the corresponding row of a matrix $\mathbf{X}$; then

$$\mathbf{x} = \mathbf{X}\mathbf{a} + \mathbf{e}, \qquad \mathbf{a} = \mathrm{pinv}(\mathbf{X})\,\mathbf{x}$$
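A small sketch of this least-squares predictor fit (function and variable names are mine; K is the model order):

```python
import numpy as np

# Fit a K-th order forward linear predictor x_t ~ sum_k a_k x_{t-k}
# by least squares: a = pinv(X) x in the slide's notation.
def fit_forward_predictor(x, K):
    T = len(x)
    # Each row holds the reversed lag window [x_{t-1}, ..., x_{t-K}].
    X = np.array([x[t - K:t][::-1] for t in range(K, T)])
    a, *_ = np.linalg.lstsq(X, x[K:], rcond=None)
    return a

# Example: an AR(2) signal is predicted almost perfectly.
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 1.8 * x[t - 1] - 0.9 * x[t - 2] + 0.01 * rng.standard_normal()
a = fit_forward_predictor(x[:-1], K=2)
print(a, a @ x[-3:-1][::-1], x[-1])     # coefficients, prediction, truth
```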

SLIDE 29

Applying it to our problem

  • Backward prediction: collect each predicted sample $x_t$ into a vector $\mathbf{x}$, and the K following samples $[x_{t+1}\; x_{t+2}\; \ldots\; x_{t+K}]$ into the corresponding row of a matrix $\mathbf{X}$; then

$$\mathbf{x} = \mathbf{X}\mathbf{b} + \mathbf{e}, \qquad \mathbf{b} = \mathrm{pinv}(\mathbf{X})\,\mathbf{x}$$
SLIDE 30

Finding the burst

  • At each time t:
    – Learn a “forward” predictor aₜ
    – Predict the next sample: $x_t^{est} = \sum_k a_{t,k}\, x_{t-k}$
    – Compute the forward error: $ferr_t = |x_t - x_t^{est}|^2$
    – Learn a “backward” predictor and compute the backward error $berr_t$
    – Compute the average prediction error over a window, and threshold it
SLIDE 31

Filling the hole

  • Learn a “forward” predictor at the left edge of the “hole”
    – For each missing sample, predict the next sample: $x_t^{est} = \sum_k a_{t,k}\, x_{t-k}$
    – Use estimated samples if real samples are not available
  • Learn a “backward” predictor at the right edge of the “hole”
    – For each missing sample, predict the previous sample: $x_t^{est} = \sum_k b_{t,k}\, x_{t+k}$
    – Use estimated samples if real samples are not available
  • Average the forward and backward predictions (see the sketch below)
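Putting the two predictors together, a sketch of this hole-filling recipe (the order K and the plain averaging are my choices; fit_forward_predictor is the helper from the earlier sketch):

```python
# Fill x[start:end] by running a forward pass from the left edge and a
# backward pass from the right edge, feeding estimates back in as we go,
# then averaging the two passes.
def fill_hole(x, start, end, K=16):
    a = fit_forward_predictor(x[:start], K)       # left-edge predictor
    b = fit_forward_predictor(x[end:][::-1], K)   # right edge, time-reversed
    fwd, bwd = x.copy(), x.copy()
    for t in range(start, end):                   # forward prediction
        fwd[t] = a @ fwd[t - K:t][::-1]
    for t in range(end - 1, start - 1, -1):       # backward prediction
        bwd[t] = b @ bwd[t + 1:t + K + 1]
    out = x.copy()
    out[start:end] = 0.5 * (fwd[start:end] + bwd[start:end])
    return out
```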

SLIDE 32

Reconstruction zoom in

[Figure: zoomed-in reconstruction – actual data, distorted signal, reconstruction area, interpolation result, recovered signal, next glitch]
SLIDE 33

Incrementally learning the regression

  • Can we learn A incrementally instead, as the data comes in?
    – The batch solution $\mathbf{A} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}^T$ requires knowledge of all (x, y) pairs
  • The Widrow-Hoff rule (scalar prediction version, learning rate η):

$$\hat{\mathbf{a}}_{t+1} = \hat{\mathbf{a}}_t + \eta\,\mathbf{x}_t\,(y_t - \hat{y}_t), \qquad \hat{y}_t = \hat{\mathbf{a}}_t^T\mathbf{x}_t$$

  • Note the structure: the update is driven by the scalar prediction error $(y_t - \hat{y}_t)$
    – Can also be done in batch mode!
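A minimal sketch of the rule (η is the learning rate; names are mine):

```python
import numpy as np

# Widrow-Hoff / LMS: nudge the weights after every (x, y) pair,
# driven by the scalar prediction error.
def lms(xs, ys, eta=0.01):
    a = np.zeros(xs.shape[1])        # rows of xs are the input vectors x_t
    for x_t, y_t in zip(xs, ys):
        y_hat = a @ x_t
        a = a + eta * x_t * (y_t - y_hat)
    return a
```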

SLIDE 34

Predicting a value

  • What are we doing exactly?

– For the explanation we are assuming no “b” (X is 0 mean) – Explanation generalizes easily even otherwise

17 Oct 2013 11755/18797 34

 

T T

XY XX A

  • 1

  x

XX YX x A y

1

ˆ

 

T T T

T

XX C 

x C x

2 1

ˆ

i T T

x X Y x C C YX y ˆ ˆ ˆ

2 1 2 1

 

 

 Let and

 Whitening x  N-0.5 C-0.5 is the whitening matrix for x

X C X

2 1

ˆ

slide-35
SLIDE 35

Predicting a value

  • What are we doing exactly?

$$\hat{\mathbf{y}} = \mathbf{Y}\hat{\mathbf{X}}^T\hat{\mathbf{x}}
= [\mathbf{y}_1\; \ldots\; \mathbf{y}_N]
\begin{bmatrix} \hat{\mathbf{x}}_1^T\hat{\mathbf{x}} \\ \vdots \\ \hat{\mathbf{x}}_N^T\hat{\mathbf{x}} \end{bmatrix}
= \sum_i \mathbf{y}_i\,(\hat{\mathbf{x}}_i^T\hat{\mathbf{x}})$$
SLIDE 36

Predicting a value

  • Given training instances (xᵢ, yᵢ) for i = 1..N, estimate y for a new test instance of x with unknown y:

$$\hat{\mathbf{y}} = \sum_i \mathbf{y}_i\,(\hat{\mathbf{x}}_i^T\hat{\mathbf{x}})$$

  • y is simply a weighted sum of the yᵢ instances from the training data
  • The weight of any yᵢ is simply the inner product between its corresponding xᵢ and the new x
    – With due whitening and scaling..
SLIDE 37

What are we doing: A different perspective

  • This assumes XXᵀ is invertible
  • What if it is not?
    – Dimensionality of X is greater than the number of observations?
    – Underdetermined
  • In this case XᵀX will generally be invertible; use the alternate form:

$$\mathbf{A} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{Y}^T
\quad\Rightarrow\quad
\hat{\mathbf{y}} = \mathbf{Y}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{x}$$
SLIDE 38

High-dimensional regression

  • XᵀX is the “Gram matrix”:

$$\mathbf{G} = \mathbf{X}^T\mathbf{X} =
\begin{bmatrix}
\mathbf{x}_1^T\mathbf{x}_1 & \mathbf{x}_1^T\mathbf{x}_2 & \cdots & \mathbf{x}_1^T\mathbf{x}_N\\
\mathbf{x}_2^T\mathbf{x}_1 & \mathbf{x}_2^T\mathbf{x}_2 & \cdots & \mathbf{x}_2^T\mathbf{x}_N\\
\vdots & \vdots & \ddots & \vdots\\
\mathbf{x}_N^T\mathbf{x}_1 & \mathbf{x}_N^T\mathbf{x}_2 & \cdots & \mathbf{x}_N^T\mathbf{x}_N
\end{bmatrix}$$

$$\hat{\mathbf{y}} = \mathbf{Y}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{x} = \mathbf{Y}\mathbf{G}^{-1}\mathbf{X}^T\mathbf{x}$$
SLIDE 39

High-dimensional regression

  • Normalize Y by the inverse of the Gram matrix: $\bar{\mathbf{Y}} = \mathbf{Y}\mathbf{G}^{-1}$

$$\hat{\mathbf{y}} = \mathbf{Y}\mathbf{G}^{-1}\mathbf{X}^T\mathbf{x}
= \bar{\mathbf{Y}}\mathbf{X}^T\mathbf{x}
= \sum_i \bar{\mathbf{y}}_i\,(\mathbf{x}_i^T\mathbf{x})$$

  • Working our way down..
SLIDE 40

Linear Regression in High-dimensional Spaces

  • Given training instances (xᵢ, yᵢ) for i = 1..N, estimate y for a new test instance of x with unknown y:

$$\hat{\mathbf{y}} = \sum_i \bar{\mathbf{y}}_i\,(\mathbf{x}_i^T\mathbf{x}), \qquad \bar{\mathbf{Y}} = \mathbf{Y}\mathbf{G}^{-1}$$

  • y is simply a weighted sum of the normalized yᵢ instances from the training data
    – The normalization is done via the Gram matrix
  • The weight of any yᵢ is simply the inner product between its corresponding xᵢ and the new x
SLIDE 41

Relationships are not always linear

  • How do we model these?
  • Multiple solutions

SLIDE 42

Non-linear regression

  • y = Aφ(x) + e

$$\varphi(\mathbf{x}) = \begin{bmatrix} \varphi_1(\mathbf{x}) \\ \varphi_2(\mathbf{x}) \\ \vdots \\ \varphi_K(\mathbf{x}) \end{bmatrix}, \qquad
\Phi(\mathbf{X}) = [\varphi(\mathbf{x}_1)\; \varphi(\mathbf{x}_2)\; \ldots\; \varphi(\mathbf{x}_N)]$$

  • Y = AΦ(X) + e
  • Replace X with Φ(X) in the earlier equations for the solution:

$$\mathbf{A} = \left(\Phi(\mathbf{X})\,\Phi(\mathbf{X})^T\right)^{-1}\Phi(\mathbf{X})\,\mathbf{Y}^T$$
SLIDE 43

Problem

  • Y = AΦ(X) + e; replace X with Φ(X) in the earlier equations for the solution:

$$\mathbf{A} = \left(\Phi(\mathbf{X})\,\Phi(\mathbf{X})^T\right)^{-1}\Phi(\mathbf{X})\,\mathbf{Y}^T$$

  • Φ(X) may be in a very high-dimensional space
  • The high-dimensional space (or the transform Φ(X)) may be unknown..
SLIDE 44

The regression is in high dimensions

  • Linear regression:

$$\hat{\mathbf{y}} = \sum_i \bar{\mathbf{y}}_i\,(\mathbf{x}_i^T\mathbf{x}), \qquad \bar{\mathbf{Y}} = \mathbf{Y}\mathbf{G}^{-1}$$

  • High-dimensional regression:

$$\mathbf{G} = \begin{bmatrix}
\varphi(\mathbf{x}_1)^T\varphi(\mathbf{x}_1) & \cdots & \varphi(\mathbf{x}_1)^T\varphi(\mathbf{x}_N)\\
\vdots & \ddots & \vdots\\
\varphi(\mathbf{x}_N)^T\varphi(\mathbf{x}_1) & \cdots & \varphi(\mathbf{x}_N)^T\varphi(\mathbf{x}_N)
\end{bmatrix}$$

$$\hat{\mathbf{y}} = \sum_i \bar{\mathbf{y}}_i\,\left(\varphi(\mathbf{x}_i)^T\varphi(\mathbf{x})\right), \qquad \bar{\mathbf{Y}} = \mathbf{Y}\mathbf{G}^{-1}$$
SLIDE 45

Doing it with Kernels

  • High-dimensional regression with kernels: replace every inner product by a kernel evaluation, $\varphi(\mathbf{x})^T\varphi(\mathbf{y}) = K(\mathbf{x}, \mathbf{y})$:

$$\mathbf{G} = \begin{bmatrix}
K(\mathbf{x}_1, \mathbf{x}_1) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N)\\
\vdots & \ddots & \vdots\\
K(\mathbf{x}_N, \mathbf{x}_1) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N)
\end{bmatrix}, \qquad \bar{\mathbf{Y}} = \mathbf{Y}\mathbf{G}^{-1}$$

$$\hat{\mathbf{y}} = \sum_i \bar{\mathbf{y}}_i\, K(\mathbf{x}_i, \mathbf{x})$$

  • Regression in Kernel Hilbert Space..
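A compact sketch of this Gram-matrix regression (the Gaussian kernel and the tiny diagonal loading, added only to keep G well conditioned, are my choices):

```python
import numpy as np

# Kernel regression in the slides' notation: ybar = G^{-1} y,
# yhat(x) = sum_i ybar_i K(x_i, x).
def k_gauss(A, B, h=0.3):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * h * h))

def fit(X, y):
    G = k_gauss(X, X)                        # Gram matrix G_ij = K(x_i, x_j)
    return np.linalg.solve(G + 1e-8 * np.eye(len(X)), y)

def predict(X, ybar, xq):
    return k_gauss(xq, X) @ ybar             # sum_i ybar_i K(x_i, xq)

X = np.linspace(-3, 3, 15)[:, None]          # 1-D training inputs
y = np.sin(X[:, 0])
ybar = fit(X, y)
print(predict(X, ybar, np.array([[1.0]])), np.sin(1.0))
```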

SLIDE 46

A different way of finding nonlinear relationships: Locally linear regression

  • Previous discussion: the regression parameters are optimized over the entire training set; minimize

$$E = \sum_{\text{all } i} \|\mathbf{y}_i - \mathbf{A}^T\mathbf{x}_i - \mathbf{b}\|^2$$

  • A single global regression is estimated and applied to all future x
  • Alternative: local regression
    – Learn a regression that is specific to x
SLIDE 47

Being non-committal: Local Regression

  • Estimate the regression to be applied to any x using training instances near x:

$$\hat{\mathbf{y}} = \sum_{j \in \text{neighborhood}(\mathbf{x})} d(\mathbf{x}, \mathbf{x}_j)\,\mathbf{y}_j + \mathbf{e},
\qquad
E = \sum_{i \in \text{neighborhood}(\mathbf{x})} \|\mathbf{y}_i - \mathbf{A}^T\mathbf{x}_i - \mathbf{b}\|^2$$

  • The resultant regression has the above form
    – Note: this regression is specific to x
  • A separate regression must be learned for every x
SLIDE 48

Local Regression

  • But what is d()?
    – For linear regression, d() is an inner product
  • More generic form: choose d() as a function of the distance between x and xⱼ
  • If d() falls off rapidly with the distance between x and xⱼ, the “neighborhood” requirement can be relaxed:

$$\hat{\mathbf{y}} = \sum_{j \in \text{neighborhood}(\mathbf{x})} d(\mathbf{x}, \mathbf{x}_j)\,\mathbf{y}_j + \mathbf{e}
\quad\longrightarrow\quad
\hat{\mathbf{y}} = \sum_{\text{all } j} d(\mathbf{x}, \mathbf{x}_j)\,\mathbf{y}_j + \mathbf{e}$$
SLIDE 49

Kernel Regression: d() = K()

  • Typical kernel functions: Gaussian, Laplacian, other density functions
    – Must fall off rapidly with increasing distance between x and xⱼ
  • The regression is local to every x: local regression
  • Actually a non-parametric MAP estimator of y
    – But first.. MAP estimators..

$$\hat{\mathbf{y}} = \frac{\sum_i K_h(\mathbf{x} - \mathbf{x}_i)\,\mathbf{y}_i}{\sum_i K_h(\mathbf{x} - \mathbf{x}_i)}$$
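A short sketch of this estimator with a Gaussian $K_h$ (the bandwidth h is my choice); the division by $\sum_i K_h$ is the normalization in the formula above:

```python
import numpy as np

# Kernel regression: a kernel-weighted average of the training y_i.
def kernel_estimate(X, y, xq, h=0.3):
    w = np.exp(-np.sum((X - xq) ** 2, axis=-1) / (2 * h * h))  # K_h(x - x_i)
    return (w @ y) / w.sum()

X = np.linspace(0, 6, 120)[:, None]
y = np.sin(X[:, 0])
print(kernel_estimate(X, y, np.array([2.0])), np.sin(2.0))
```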

SLIDE 50

MAP Estimators

  • MAP (Maximum A Posteriori): find a “best guess” for y (statistically), given known x:

$$y = \arg\max_Y P(Y\,|\,\mathbf{x})$$

  • ML (Maximum Likelihood): find that value of y for which the statistical best guess of x would have been the observed x:

$$y = \arg\max_Y P(\mathbf{x}\,|\,Y)$$

  • MAP is simpler to visualize
SLIDE 51

MAP estimation: Gaussian PDF

[Scatter figure: X vs Y]

  • Assume X and Y are jointly Gaussian
  • The parameters of the Gaussian are learned from training data
SLIDE 52

Learning the parameters of the Gaussian

$$\mathbf{z} = \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}, \qquad
\mu_z = \frac{1}{N}\sum_{i=1}^{N} \mathbf{z}_i, \qquad
\mathbf{C}_z = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{z}_i - \mu_z)(\mathbf{z}_i - \mu_z)^T
= \begin{bmatrix} \mathbf{C}_{XX} & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_{YY} \end{bmatrix}$$
SLIDE 53

Learning the parameters of the Gaussian

$$\mathbf{z} = \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}, \qquad
\mu_z = \frac{1}{N}\sum_{i=1}^{N} \mathbf{z}_i, \qquad
\mathbf{C}_z = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{z}_i - \mu_z)(\mathbf{z}_i - \mu_z)^T
= \begin{bmatrix} \mathbf{C}_{XX} & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_{YY} \end{bmatrix}$$

The blocks are computed directly from the data, e.g.:

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} \mathbf{x}_i, \qquad
\mathbf{C}_{XY} = \frac{1}{N}\sum_{i=1}^{N} (\mathbf{x}_i - \mu_x)(\mathbf{y}_i - \mu_y)^T$$
SLIDE 54

MAP estimation: Gaussian PDF

[Scatter figure: X vs Y]

  • Assume X and Y are jointly Gaussian
  • The parameters of the Gaussian are learned from training data
SLIDE 55

MAP Estimator for Gaussian RV

  • Assume X and Y are jointly Gaussian
  • The parameters of the Gaussian are learned from training data
  • Now we are given an X, but no Y. What is Y?

[Figure: level set of the Gaussian, with the given value X₀ marked]
SLIDE 56

MAP estimator for Gaussian RV

[Figure: the joint Gaussian, sliced at the observed value x₀]
SLIDE 57

MAP estimation: Gaussian PDF

[Scatter figure: X vs Y]
SLIDE 58

MAP estimation: The Gaussian at a particular value of X

[Figure: the conditional distribution of Y at X = x₀]
SLIDE 59

MAP estimation: The Gaussian at a particular value of X

[Figure: the conditional distribution of Y at X = x₀, with the most likely value marked]
SLIDE 60

MAP Estimation of a Gaussian RV

$$Y = \arg\max_y P(y\,|\,X = x_0) = \;?$$
SLIDE 61

MAP Estimation of a Gaussian RV

[Figure: the conditional distribution of Y at X = x₀]
SLIDE 62

MAP Estimation of a Gaussian RV

$$Y = \arg\max_y P(y\,|\,X = x_0)$$
SLIDE 63

So what is this value?

  • Clearly a line
  • Equation of the line:

$$\hat{y} = \mu_Y + C_{YX}\,C_{XX}^{-1}\,(x - \mu_X)$$

  • The scalar version is given; the vector version is identical:

$$\hat{\mathbf{y}} = \mu_Y + \mathbf{C}_{YX}\,\mathbf{C}_{XX}^{-1}\,(\mathbf{x} - \mu_X)$$

  • Derivation? Later in the program a bit
    – Note the similarity to regression
SLIDE 64

This is a multiple regression

  • This is the MAP estimate of y:

$$\hat{\mathbf{y}} = \mu_Y + \mathbf{C}_{YX}\,\mathbf{C}_{XX}^{-1}\,(\mathbf{x} - \mu_X), \qquad \mathbf{y} = \arg\max_Y P(Y\,|\,\mathbf{x})$$

  • What about the ML estimate of y?
    – argmax_Y P(x|Y)
  • Note: neither of these may be the regression line!
    – MAP estimation of y is the regression on Y for Gaussian RVs
    – But this is not the MAP estimation of the regression parameter
SLIDE 65

It's also a minimum mean-squared error estimate

  • General principle of MMSE estimation:
    – y is unknown, x is known
    – Must estimate y such that the expected squared error is minimized:

$$Err = E\!\left[\|\mathbf{y} - \hat{\mathbf{y}}\|^2 \,\middle|\, \mathbf{x}\right]$$

    – Minimize the above term
SLIDE 66

It's also a minimum mean-squared error estimate

  • Minimize the error:

$$Err = E\!\left[(\mathbf{y} - \hat{\mathbf{y}})^T(\mathbf{y} - \hat{\mathbf{y}}) \,\middle|\, \mathbf{x}\right]
= E[\mathbf{y}^T\mathbf{y}\,|\,\mathbf{x}] - 2\,\hat{\mathbf{y}}^T E[\mathbf{y}\,|\,\mathbf{x}] + \hat{\mathbf{y}}^T\hat{\mathbf{y}}$$

  • Differentiating and equating to 0:

$$\frac{d\,Err}{d\hat{\mathbf{y}}} = -2\,E[\mathbf{y}\,|\,\mathbf{x}] + 2\,\hat{\mathbf{y}} = 0
\quad\Rightarrow\quad
\hat{\mathbf{y}} = E[\mathbf{y}\,|\,\mathbf{x}]$$

The MMSE estimate is the mean of the distribution.
SLIDE 67

For the Gaussian: MAP = MMSE

  • The most likely value is also the MEAN value
  • Would be true of any symmetric distribution
SLIDE 68

MMSE estimates for mixture distributions

  • Let P(y|x) be a mixture density:

$$P(\mathbf{y}\,|\,\mathbf{x}) = \sum_k P(k)\,P(\mathbf{y}\,|\,k, \mathbf{x})$$

  • The MMSE estimate of y is given by

$$E[\mathbf{y}\,|\,\mathbf{x}] = \int \mathbf{y}\,P(\mathbf{y}\,|\,\mathbf{x})\,d\mathbf{y}
= \sum_k P(k)\int \mathbf{y}\,P(\mathbf{y}\,|\,k, \mathbf{x})\,d\mathbf{y}
= \sum_k P(k)\,E[\mathbf{y}\,|\,k, \mathbf{x}]$$

  • Just a weighted combination of the MMSE estimates from the component distributions
SLIDE 69

MMSE estimates from a Gaussian mixture

  • Let P(x, y) be a Gaussian mixture, with $\mathbf{z} = [\mathbf{x};\, \mathbf{y}]$:

$$P(\mathbf{x}, \mathbf{y}) = P(\mathbf{z}) = \sum_k P(k)\,N(\mathbf{z};\, \mu_k, \Sigma_k)$$

  • P(y|x) is also a Gaussian mixture:

$$P(\mathbf{y}\,|\,\mathbf{x}) = \frac{P(\mathbf{x}, \mathbf{y})}{P(\mathbf{x})}
= \sum_k \frac{P(k)\,P(\mathbf{x}\,|\,k)\,P(\mathbf{y}\,|\,\mathbf{x}, k)}{P(\mathbf{x})}
= \sum_k P(k\,|\,\mathbf{x})\,P(\mathbf{y}\,|\,\mathbf{x}, k)$$
SLIDE 70

MMSE estimates from a Gaussian mixture

  • Let P(y|x) be a Gaussian mixture: $P(\mathbf{y}\,|\,\mathbf{x}) = \sum_k P(k\,|\,\mathbf{x})\,P(\mathbf{y}\,|\,\mathbf{x}, k)$
  • Each component is a conditioned joint Gaussian:

$$P(\mathbf{x}, \mathbf{y}\,|\,k) = N\!\left(\begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix};
\begin{bmatrix}\mu_{x,k}\\ \mu_{y,k}\end{bmatrix},
\begin{bmatrix}\mathbf{C}_{xx,k} & \mathbf{C}_{xy,k}\\ \mathbf{C}_{yx,k} & \mathbf{C}_{yy,k}\end{bmatrix}\right)$$

$$P(\mathbf{y}\,|\,\mathbf{x}, k) = N\!\left(\mathbf{y};\;
\mu_{y,k} + \mathbf{C}_{yx,k}\mathbf{C}_{xx,k}^{-1}(\mathbf{x} - \mu_{x,k}),\; \Theta_k\right)$$

$$P(\mathbf{y}\,|\,\mathbf{x}) = \sum_k P(k\,|\,\mathbf{x})\,
N\!\left(\mathbf{y};\; \mu_{y,k} + \mathbf{C}_{yx,k}\mathbf{C}_{xx,k}^{-1}(\mathbf{x} - \mu_{x,k}),\; \Theta_k\right)$$
SLIDE 71

MMSE estimates from a Gaussian mixture

  • P(y|x) is a mixture Gaussian density, so E[y|x] is also a mixture:

$$E[\mathbf{y}\,|\,\mathbf{x}] = \sum_k P(k\,|\,\mathbf{x})\,E[\mathbf{y}\,|\,\mathbf{x}, k]$$

$$E[\mathbf{y}\,|\,\mathbf{x}] = \sum_k P(k\,|\,\mathbf{x})
\left(\mu_{y,k} + \mathbf{C}_{yx,k}\mathbf{C}_{xx,k}^{-1}(\mathbf{x} - \mu_{x,k})\right)$$
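A sketch of this estimator, assuming the joint GMM on z = [x; y] has already been learned (the parameter layout and names are mine; dx is the dimensionality of x):

```python
import numpy as np
from scipy.stats import multivariate_normal

# MMSE under a joint GMM: E[y|x] = sum_k P(k|x) E[y|x, k].
def gmm_mmse(x, weights, means, covs, dx):
    post = np.array([w * multivariate_normal.pdf(x, m[:dx], C[:dx, :dx])
                     for w, m, C in zip(weights, means, covs)])
    post /= post.sum()                                  # P(k | x)
    y_hat = 0.0
    for p, m, C in zip(post, means, covs):
        # Per component: E[y|x,k] = mu_y + C_yx C_xx^{-1} (x - mu_x)
        cond = m[dx:] + C[dx:, :dx] @ np.linalg.solve(C[:dx, :dx], x - m[:dx])
        y_hat = y_hat + p * cond
    return y_hat
```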

SLIDE 72

MMSE estimates from a Gaussian mixture

  • A mixture of estimates from the individual Gaussians
SLIDE 73

Voice Morphing

  • Align training recordings from both speakers
    – Cepstral vector sequence
  • Learn a GMM on the joint vectors
  • Given speech from one speaker, find the MMSE estimate of the other
  • Synthesize from the estimated cepstra
SLIDE 74

MMSE with GMM: Voice Transformation

  • Festvox GMM transformation suite (Toda)

[Audio demo grid: transformations among the voices awb, bdl, jmk, slt]
SLIDE 75

A problem with regressions

  • The ML fit is sensitive
    – The error is squared
    – Small variations in the data → large variations in the weights
    – Outliers affect it adversely
  • Unstable
    – If the dimension of X >= the number of instances, (XXᵀ) is not invertible

$$\mathbf{A} = (\mathbf{X}\mathbf{X}^T)^{-1}\mathbf{X}\mathbf{Y}^T$$
SLIDE 76

MAP estimation of weights

  • Model: y = aᵀX + e, with the weights drawn from a Gaussian prior:
    – P(a) = N(0, γ²I)
  • Maximum likelihood estimate:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}} \log P(\mathbf{y}\,|\,\mathbf{X}; \mathbf{a})$$

  • Maximum a posteriori estimate:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}} \log P(\mathbf{a}, \mathbf{y}\,|\,\mathbf{X})
= \arg\max_{\mathbf{a}} \log P(\mathbf{y}\,|\,\mathbf{a}, \mathbf{X}) + \log P(\mathbf{a})$$
SLIDE 77

MAP estimation of weights

  • Similar to the ML estimate, with an additional term:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}} \log P(\mathbf{y}\,|\,\mathbf{a}, \mathbf{X}) + \log P(\mathbf{a})$$

$$\log P(\mathbf{y}\,|\,\mathbf{a}, \mathbf{X}) = C - \frac{1}{2\sigma^2}\,(\mathbf{y} - \mathbf{a}^T\mathbf{X})(\mathbf{y} - \mathbf{a}^T\mathbf{X})^T$$

$$P(\mathbf{a}) = N(0, \gamma^2\mathbf{I}) \quad\Rightarrow\quad
\log P(\mathbf{a}) = C' - \log\gamma - 0.5\,\gamma^{-2}\,\|\mathbf{a}\|^2$$

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\; C'' - \frac{1}{2\sigma^2}\,(\mathbf{y} - \mathbf{a}^T\mathbf{X})(\mathbf{y} - \mathbf{a}^T\mathbf{X})^T - 0.5\,\gamma^{-2}\,\|\mathbf{a}\|^2$$
SLIDE 78

MAP estimate of weights

  • Setting the derivative to zero (λ absorbs the variance ratio):

$$\frac{dL}{d\mathbf{a}} = 2\,\mathbf{X}\mathbf{X}^T\mathbf{a} - 2\,\mathbf{X}\mathbf{y}^T + 2\lambda\,\mathbf{a} = 0
\quad\Rightarrow\quad
\mathbf{a} = (\mathbf{X}\mathbf{X}^T + \lambda\mathbf{I})^{-1}\mathbf{X}\mathbf{y}^T$$

  • Equivalent to diagonal loading of the correlation matrix
    – Improves the condition number of the correlation matrix
      • Can be inverted with greater stability
    – Will not affect the estimation from well-conditioned data
    – Also called Tikhonov regularization
  • Dual form: ridge regression
  • This is a MAP estimate of the weights
    – Not to be confused with the MAP estimate of Y
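A one-line sketch of the diagonally loaded solution (lam plays the role of λ):

```python
import numpy as np

# Ridge / Tikhonov: (X X^T + lam I) is better conditioned than X X^T,
# so the solve is stable even for nearly collinear data.
def ridge_fit(X, y, lam=0.1):
    D = X.shape[0]                    # columns of X are training inputs
    return np.linalg.solve(X @ X.T + lam * np.eye(D), X @ y)
```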

  

SLIDE 79

MAP estimate priors

  • Left: Gaussian prior on the weights
  • Right: Laplacian prior
SLIDE 80

MAP estimation of weights with a Laplacian prior

  • Assume the weights are drawn from a Laplacian:
    – P(a) = λ⁻¹ exp(−λ⁻¹ ‖a‖₁)
  • Maximum a posteriori estimate:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\; C - (\mathbf{y} - \mathbf{a}^T\mathbf{X})(\mathbf{y} - \mathbf{a}^T\mathbf{X})^T - \lambda^{-1}\,\|\mathbf{a}\|_1$$

  • No closed-form solution
    – A quadratic-programming solution is required
  • Non-trivial
SLIDE 81

MAP estimation of weights with a Laplacian prior

  • Assume the weights are drawn from a Laplacian:
    – P(a) = λ⁻¹ exp(−λ⁻¹ ‖a‖₁)
  • Maximum a posteriori estimate:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\; C - (\mathbf{y} - \mathbf{a}^T\mathbf{X})(\mathbf{y} - \mathbf{a}^T\mathbf{X})^T - \lambda^{-1}\,\|\mathbf{a}\|_1$$

  • Identical to L1-regularized least-squares estimation
SLIDE 82

L1-regularized LSE

  • No closed-form solution
    – Quadratic-programming solutions are required

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\; C - (\mathbf{y} - \mathbf{a}^T\mathbf{X})(\mathbf{y} - \mathbf{a}^T\mathbf{X})^T - \lambda^{-1}\,\|\mathbf{a}\|_1$$

  • Dual formulation:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\; C - (\mathbf{y} - \mathbf{a}^T\mathbf{X})(\mathbf{y} - \mathbf{a}^T\mathbf{X})^T
\quad\text{subject to}\quad \|\mathbf{a}\|_1 \le t$$

  • “LASSO” – Least absolute shrinkage and selection operator
SLIDE 83

LASSO Algorithms

  • Various convex optimization algorithms
  • LARS: Least angle regression
  • Pathwise coordinate descent..
  • Matlab code available from web
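For illustration, a minimal ISTA (iterative soft-thresholding) solver – a simple proximal-gradient alternative to the LARS and coordinate-descent methods named above:

```python
import numpy as np

# ISTA for min_a ||y - X a||^2 + lam * ||a||_1   (rows of X = samples).
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam=0.1, iters=1000):
    L = 2 * np.linalg.norm(X, 2) ** 2    # Lipschitz constant of the gradient
    a = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = 2 * X.T @ (X @ a - y)
        a = soft_threshold(a - grad / L, lam / L)
    return a
```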

SLIDE 84

Regularized least squares

  • Regularization results in the selection of a suboptimal (in the least-squares sense) solution
    – One of the loci outside the center
  • Tikhonov regularization selects the shortest solution
  • L1 regularization selects the sparsest solution

Image credit: Tibshirani
SLIDE 85

LASSO and Compressive Sensing

  • Given Y and X, estimate the sparse weight vector a:  Y = Xa
  • LASSO:
    – X = explanatory variable
    – Y = dependent variable
    – a = weights of regression
  • CS:
    – X = measurement matrix
    – Y = measurement
    – a = data
SLIDE 86

An interesting problem: Predicting War!

  • Economists measure a number of social indicators for countries weekly
    – Happiness index
    – Hunger index
    – Freedom index
    – Twitter records
    – …
  • Question: Will there be a revolution or war next week?
SLIDE 87

An interesting problem: Predicting War!

  • Issues:
    – Dissatisfaction builds up – not an instantaneous phenomenon
      • Usually
    – War / rebellion builds up much faster
      • Often in hours
  • Important to predict
    – Preparedness for security
    – Economic impact
SLIDE 88

Predicting War

  • Given:
    – A sequence of economic indicators for each week
    – A sequence of unrest markers for each week
      • At the end of each week we know if war happened or not that week
  • Predict the probability of unrest next week
    – This could be a new unrest or the persistence of a current one

[Figure: week-by-week lattice of war/stability states (W/S) for wk1–wk8, with observations O1–O8]
SLIDE 89

Predicting Time Series

  • Need time-series models
  • HMMs – later in the course
