
Machine Learning for Signal Processing: Regression and Prediction. Class 14, 25 Oct 2016 (PowerPoint presentation)



  1. Machine Learning for Signal Processing: Regression and Prediction. Class 14, 25 Oct 2016. Instructor: Bhiksha Raj

  2. A Common Problem
     • Can you spot the glitches?

  3. How to fix this problem?
     • “Glitches” in audio
       – Must be detected. How?
       – Then what?
     • Glitches must be “fixed”
       – Delete the glitch; this leaves a “hole”
       – Fill in the hole. How?

  4. Interpolation
     • “Extend” the curve on the left to “predict” the values in the “blank” region
       – Forward prediction
     • Extend the blue curve on the right leftwards to predict the blank region
       – Backward prediction
     • How?
       – Regression analysis

  5. Detecting the Glitch
     [Figure: reconstructed segments labeled “OK” and “NOT OK”]
     • Regression-based reconstruction can be done anywhere
     • The reconstructed value will not exactly match the actual value
     • A large reconstruction error identifies glitches

  6. What is a regression?
     • Analysis of the relationship between variables
     • Expressed in many forms. Wikipedia lists: linear regression, simple regression, ordinary least squares, polynomial regression, general linear model, generalized linear model, discrete choice, logistic regression, multinomial logit, mixed logit, probit, multinomial probit, …
     • Generally, a tool to predict variables

  7. Regressions for prediction
     • y = f(x; Θ) + e
     • Different possibilities:
       – y is a scalar
         • y is real
         • y is categorical (classification)
       – y is a vector
       – x is a vector
         • x is a set of real-valued variables
         • x is a set of categorical variables
         • x is a combination of the two
       – f(·) is a linear or affine function
       – f(·) is a non-linear function
       – f(·) is a time-series model

  8. A linear regression
     [Figure: scatter plot of y versus x with a linear trend]
     • Assumption: the relationship between the variables is linear
       – A linear trend may be found relating x and y
       – y = dependent variable
       – x = explanatory variable
       – Given x, y can be predicted as an affine function of x

  9. An imaginary regression
     • http://pages.cs.wisc.edu/~kovar/hall.html
     • “Check this shit out (Fig. 1). That's bonafide, 100%-real data, my friends. I took it myself over the course of two weeks. And this was not a leisurely two weeks, either; I busted my ass day and night in order to provide you with nothing but the best data possible. Now, let's look a bit more closely at this data, remembering that it is absolutely first-rate. Do you see the exponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of my time. Banking on my hopes that whoever grades this will just look at the pictures, I drew an exponential through my noise. I believe the apparent legitimacy is enhanced by the fact that I used a complicated computer program to make the fit. I understand this is the same process by which the top quark was discovered.”

  10. Linear Regressions
      • y = a^T x + b + e
        – e = prediction error
      • Given a “training” set of {x, y} pairs, estimate a and b:
        – y_1 = a^T x_1 + b + e_1
        – y_2 = a^T x_2 + b + e_2
        – y_3 = a^T x_3 + b + e_3
        – …
      • If a and b are well estimated, the prediction error will be small

  11. Linear Regression to a scalar
      y_1 = a^T x_1 + b + e_1
      y_2 = a^T x_2 + b + e_2
      y_3 = a^T x_3 + b + e_3
      • Define:
        y = [y_1 y_2 y_3 ...]
        e = [e_1 e_2 e_3 ...]
        X = [x_1 x_2 x_3 ...; 1 1 1 ...]   (each column is an x_i with a 1 appended)
        A = [a; b]   (the offset b appended to a)
      • Rewrite: y = A^T X + e
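To make the stacked form concrete, here is a minimal numpy sketch (not from the slides; the toy data, dimensions, and parameter values are all made up) that builds y and the augmented X with the appended row of ones, so the offset b folds into the single parameter vector the slide calls A:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200                            # dimension of x, number of samples
X_raw = rng.standard_normal((d, n))      # columns are the training inputs x_i
a_true = np.array([1.0, -2.0, 0.5])      # hypothetical "true" slope
b_true = 0.7                             # hypothetical "true" offset
y = a_true @ X_raw + b_true + 0.1 * rng.standard_normal(n)  # noisy targets

# Append a row of ones so that b is absorbed into A = [a; b]
X = np.vstack([X_raw, np.ones(n)])       # shape (d+1, n)
# The regression is now y = A^T X + e with a single (d+1)-vector A
```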

  12. Learning the parameters
      y = A^T X + e
      ŷ = A^T X   (assuming no error)
      • Given training data: several (x, y) pairs
      • Define a “divergence” D(y, ŷ)
        – Measures how much ŷ differs from y
        – Ideally, if the model is accurate, this should be small
      • Estimate a, b to minimize D(y, ŷ)

  13. The prediction error as divergence
      y_1 = a^T x_1 + b + e_1
      y_2 = a^T x_2 + b + e_2
      y_3 = a^T x_3 + b + e_3
      y = A^T X + e = ŷ + e
      • Define the divergence as the sum of the squared errors in predicting y:
        D(y, ŷ) = E = e_1^2 + e_2^2 + e_3^2 + ...
                = (y_1 - a^T x_1 - b)^2 + (y_2 - a^T x_2 - b)^2 + (y_3 - a^T x_3 - b)^2 + ...
                = ||y - A^T X||^2 = (y - A^T X)(y - A^T X)^T
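Continuing the toy setup from the sketch above, this divergence is one line of numpy; A here is the stacked [a; b] vector:

```python
import numpy as np

def divergence(y, A, X):
    """Sum of squared prediction errors, D(y, yhat) = ||y - A^T X||^2."""
    e = y - A @ X              # A is 1-D, so A @ X plays the role of A^T X
    return float(np.sum(e ** 2))
```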

  14. Prediction error as divergence
      • y = A^T x + e
        – e = prediction error
        – Find the “slope” a such that the total squared length of the error lines is minimized

  15. Solving a linear regression
      y = A^T X + e
      • Minimize the squared error:
        E = ||y - A^T X||^2
      • Solution:
        A^T = y pinv(X)
        A = (y pinv(X))^T
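In numpy the pseudoinverse solution is a single line; a sketch using the toy X and y built earlier:

```python
A_hat = y @ np.linalg.pinv(X)            # A^T = y pinv(X)
a_hat, b_hat = A_hat[:-1], A_hat[-1]     # split the slope from the offset
print(divergence(y, A_hat, X))           # small if the fit is good
```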

  16. More Explicitly
      • Minimize the squared error:
        E = ||y - A^T X||^2 = (y - A^T X)(y - A^T X)^T
          = yy^T - 2y X^T A + A^T X X^T A
      • Differentiating w.r.t. A and equating to 0:
        dE/dA = 2 A^T X X^T - 2 y X^T = 0
        A^T = y X^T (X X^T)^{-1} = y pinv(X)
        A = (X X^T)^{-1} X y^T
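The closed-form normal equations and the pseudoinverse route give the same answer; a quick numerical check on the same toy data (since A is a vector here, the 1-D arrays for A and A^T coincide):

```python
A_normal = np.linalg.solve(X @ X.T, X @ y)   # A = (X X^T)^{-1} X y^T
A_pinv = y @ np.linalg.pinv(X)               # A^T = y pinv(X)
assert np.allclose(A_normal, A_pinv)
```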

  17. Regression in multiple dimensions
      y_1 = A^T x_1 + b + e_1
      y_2 = A^T x_2 + b + e_2        (each y_i is a vector)
      y_3 = A^T x_3 + b + e_3
      Notation: y_ij = j-th component of vector y_i; a_i = i-th column of A; b_j = j-th component of b
      • Also called multiple regression
      • Equivalent to saying:
        y_i1 = a_1^T x_i + b_1 + e_i1
        y_i2 = a_2^T x_i + b_2 + e_i2
        y_i3 = a_3^T x_i + b_3 + e_i3
        i.e. y_i = A^T x_i + b + e_i
      • Fundamentally no different from N separate single regressions
        – But we can use the relationship between the y's to our benefit

  18. Multiple Regression
      • Define:
        Y = [y_1 y_2 y_3 ...]
        E = [e_1 e_2 e_3 ...]
        X = [x_1 x_2 x_3 ...; 1 1 1 ...]
        Â = [A; b^T]   (b appended to A, matching the 1s row of X)
        Y = Â^T X + E
      • Divergence: DIV = Σ_i ||y_i - Â^T x_i||^2
      • Minimizing:
        Â^T = Y pinv(X) = Y X^T (X X^T)^{-1}
        Â = (X X^T)^{-1} X Y^T
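Vector-valued y only changes the shapes; a self-contained sketch with made-up toy data (the dimensions and noise level are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 3, 2, 200                     # input dim, output dim, samples
X = np.vstack([rng.standard_normal((d, n)), np.ones(n)])  # inputs + bias row
A_true = rng.standard_normal((d + 1, m))                  # hypothetical params
Y = A_true.T @ X + 0.1 * rng.standard_normal((m, n))      # (m, n) targets

A_hat = np.linalg.solve(X @ X.T, X @ Y.T)  # A = (X X^T)^{-1} X Y^T, (d+1, m)
Y_hat = A_hat.T @ X                        # predictions for the training set
```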

  19. A Different Perspective
      • y is a noisy reading of A^T x:
        y = A^T x + e
      • The error e is Gaussian:
        e ~ N(0, σ² I)
      • Estimate A from:
        Y = [y_1 y_2 ... y_N],  X = [x_1 x_2 ... x_N]

  20. The Likelihood of the data
      y = A^T x + e,   e ~ N(0, σ² I)
      • Probability of observing a specific y, given x, for a particular matrix A:
        P(y | x; A) = N(y; A^T x, σ² I)
      • Probability of the collection Y = [y_1 y_2 ... y_N], X = [x_1 x_2 ... x_N]:
        P(Y | X; A) = Π_i N(y_i; A^T x_i, σ² I)
      • Assuming IID for convenience (not necessary)

  21. A Maximum Likelihood Estimate
      Y = [y_1 y_2 ... y_N],  X = [x_1 x_2 ... x_N],  y = A^T x + e,  e ~ N(0, σ² I)
      P(Y | X) = Π_i (2πσ²)^{-D/2} exp( -||y_i - A^T x_i||² / (2σ²) )
      log P(Y | X; A) = C - (1/(2σ²)) Σ_i ||y_i - A^T x_i||²
      • Maximizing the log probability is identical to minimizing the error
        – Identical to the least-squares solution:
          A^T = Y X^T (X X^T)^{-1} = Y pinv(X),   A = (X X^T)^{-1} X Y^T
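A sketch of this log-likelihood for the (D, N)-shaped Y of the previous block; since only the squared-error term depends on A, maximizing it over A returns the same A_hat as least squares:

```python
import numpy as np

def log_likelihood(Y, X, A, sigma2):
    """log P(Y | X; A) under IID Gaussian errors e ~ N(0, sigma2 * I)."""
    E = Y - A.T @ X                     # residuals, one column per sample
    D, N = E.shape                      # output dimension, sample count
    const = -0.5 * N * D * np.log(2 * np.pi * sigma2)
    return const - 0.5 * np.sum(E ** 2) / sigma2
```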

  22. Predicting an output
      • From a collection of training data, we have learned A
      • Given x for a new instance, but not y, what is y?
      • Simple solution: ŷ = A^T x

  23. Applying it to our problem
      • Prediction by regression
      • Forward regression:
        x_t = a_1 x_{t-1} + a_2 x_{t-2} + ... + a_K x_{t-K} + e_t
      • Backward regression:
        x_t = b_1 x_{t+1} + b_2 x_{t+2} + ... + b_K x_{t+K} + e_t

  24. Applying it to our problem
      • Forward prediction: stack the prediction equations for every sample with K predecessors
        x_t     = [x_{t-1} x_{t-2} ... x_{t-K}]   a + e_t
        x_{t-1} = [x_{t-2} x_{t-3} ... x_{t-K-1}] a + e_{t-1}
        ...
        x_{K+1} = [x_K x_{K-1} ... x_1]           a + e_{K+1}
        i.e.  x̄ = X a + ē
      • Solution: a = pinv(X) x̄
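A sketch of the forward predictor in numpy; x is a 1-D signal array, K is the model order, and np.linalg.lstsq plays the role of pinv(X):

```python
import numpy as np

def forward_predictor(x, K):
    """Least-squares fit of a_k so that x[t] ~ sum_k a_k * x[t-k]."""
    n = len(x)
    # Row for time t holds the K preceding samples [x[t-1], ..., x[t-K]]
    Xmat = np.column_stack([x[K - k : n - k] for k in range(1, K + 1)])
    target = x[K:]                       # the samples being predicted
    a, *_ = np.linalg.lstsq(Xmat, target, rcond=None)
    return a
```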

  25. Applying it to our problem
      • Backward prediction: stack the equations for every sample with K successors
        x_{t-K-1} = [x_{t-K} x_{t-K+1} ... x_{t-1}] b + e_{t-K-1}
        x_{t-K-2} = [x_{t-K-1} x_{t-K} ... x_{t-2}] b + e_{t-K-2}
        ...
        x_1       = [x_2 x_3 ... x_{K+1}]           b + e_1
        i.e.  x̄ = X b + ē
      • Solution: b = pinv(X) x̄
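The backward predictor is the mirror image, regressing each sample on the K samples that follow it:

```python
def backward_predictor(x, K):
    """Least-squares fit of b_k so that x[t] ~ sum_k b_k * x[t+k]."""
    n = len(x)
    # Row for time t holds the K following samples [x[t+1], ..., x[t+K]]
    Xmat = np.column_stack([x[k : n - K + k] for k in range(1, K + 1)])
    target = x[: n - K]
    b, *_ = np.linalg.lstsq(Xmat, target, rcond=None)
    return b
```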

  26. Finding the burst
      • At each time:
        – Learn a “forward” predictor: x_t^est = Σ_k a_{t,k} x_{t-k}
        – Predict the next sample and compute the forward error: ferr_t = |x_t - x_t^est|²
        – Learn a “backward” predictor and compute the backward error berr_t
        – Compute the average prediction error over a window and threshold it
        – If the error exceeds the threshold, identify a burst
      A sketch combining these steps follows below.
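Putting the pieces together, a sketch of the detector using the two predictors above. The slide does not fix the windowing or the threshold rule, so the window length, the median-based threshold, and the factor 5 below are all assumptions:

```python
def detect_glitches(x, K=8, win=32, factor=5.0):
    """Flag samples whose forward + backward prediction error is large."""
    n = len(x)
    a = forward_predictor(x, K)          # from the sketches above
    b = backward_predictor(x, K)
    ferr = np.zeros(n)
    berr = np.zeros(n)
    for t in range(K, n):
        ferr[t] = (x[t] - a @ x[t - K : t][::-1]) ** 2
    for t in range(n - K):
        berr[t] = (x[t] - b @ x[t + 1 : t + K + 1]) ** 2
    # Average the combined error over a sliding window
    smooth = np.convolve(ferr + berr, np.ones(win) / win, mode="same")
    # Assumed rule: flag anything far above the typical (median) error
    return np.flatnonzero(smooth > factor * np.median(smooth))
```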
