SLIDE 1

Regression

Given: Dataset D = {(xi, Yi) | i = 1, . . . , n} with n tuples

x: object description
Y: numerical target attribute ⇒ regression problem

Find a function f : dom(X1) × . . . × dom(Xk) → Y minimizing the error e(f(x1, . . . , xk), y) for all given data objects (x1, . . . , xk, y).

Remember

Instead of finding structure in a data set, we are now focusing on methods that find explanations for an unknown dependency within the data.
Supervised (because we know the desired outcome)
Descriptive (because we care about explanation)

Compendium slides for “Guide to Intelligent Data Analysis”, Springer 2011. © Michael R. Berthold, Christian Borgelt, Frank Höppner, Frank Klawonn and Iris Adä.

SLIDE 2

Regression line

Given: A data set for two continuous attributes x and y. It is assumed that there is an approximately linear dependency between x and y: y ≈ a + bx. Find a regression line (i.e. determine the parameters a and b) such that the line fits the data as well as possible.

Example

Trend estimation (e.g. oil price over time)
Epidemiology (e.g. cigarette smoking vs. lifespan)
Finance (e.g. return on investment vs. return on all risky assets)
Economics (e.g. consumption vs. available income)

SLIDE 3

Regression Line

What is a good fit? (Illustrated in the figure: measuring the deviation as the y-distance vs. the Euclidean distance to the line.)

SLIDE 4

Cost functions

Usually, the sum of squared errors in y-direction is chosen as the cost function (to be minimized). Other reasonable cost functions:
mean absolute distance in y-direction
mean Euclidean distance
maximum absolute distance in y-direction (or equivalently: the maximum squared distance in y-direction)
maximum Euclidean distance
. . .

SLIDE 5

Construction

Given data (xi, yi), i = 1, . . . , n, the least squares cost function is

F(a, b) = ∑_{i=1}^{n} ((a + b·xi) − yi)².

Goal

The y-values that are computed with the linear equation should (in total) deviate as little as possible from the measured values.

SLIDE 6

Finding the minimum

A necessary condition for a minimum of the cost function F(a, b) = ∑_{i=1}^{n} ((a + b·xi) − yi)² is that the partial derivatives of this function w.r.t. the parameters a and b vanish, that is

∂F/∂a = ∑_{i=1}^{n} 2(a + b·xi − yi) = 0   and   ∂F/∂b = ∑_{i=1}^{n} 2(a + b·xi − yi)·xi = 0.

As a consequence, we obtain the so-called normal equations

n·a + (∑_{i=1}^{n} xi)·b = ∑_{i=1}^{n} yi   and   (∑_{i=1}^{n} xi)·a + (∑_{i=1}^{n} xi²)·b = ∑_{i=1}^{n} xi·yi,

that is, a system of two equations with two unknowns a and b, which has a unique solution (if at least two different x-values exist).
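The normal equations can be solved directly. The following is a minimal sketch (Python/NumPy, not part of the original slides) that does exactly that, with variable names following the slide's notation:

```python
import numpy as np

def fit_regression_line(x, y):
    """Solve the normal equations for the line y ≈ a + b*x (least squares)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Coefficient matrix and right-hand side of the two normal equations
    A = np.array([[n,       x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])
    a, b = np.linalg.solve(A, rhs)  # unique solution if at least two distinct x-values exist
    return a, b

# Tiny usage example with made-up data
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
print(fit_regression_line(x, y))  # approximately (0.15, 1.94)
```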

SLIDE 7

Least squares and MLE

A regression line can be interpreted as a maximum likelihood estimator (MLE).
Assumption: The data generation process can be described well by the model y = a + bx + ξ, where ξ is a normally distributed random variable with mean 0 and (unknown) variance σ².
The parameters that minimize the sum of squared deviations (in y-direction) from the data points maximize the probability of the data given this model class.

SLIDE 8

Least squares and MLE

Therefore,

f(y | x) = 1/√(2πσ²) · exp(−(y − (a + bx))² / (2σ²)),

leading to the likelihood function

L((x1, y1), . . . , (xn, yn); a, b, σ²) = ∏_{i=1}^{n} f(yi | xi) = ∏_{i=1}^{n} 1/√(2πσ²) · exp(−(yi − (a + b·xi))² / (2σ²)).

SLIDE 9

Least squares and MLE

To simplify the computation of derivatives for finding the maximum, we take the logarithm:

ln L((x1, y1), . . . , (xn, yn); a, b, σ²) = ln ∏_{i=1}^{n} 1/√(2πσ²) · exp(−(yi − (a + b·xi))² / (2σ²))
= ∑_{i=1}^{n} ln 1/√(2πσ²) − 1/(2σ²) · ∑_{i=1}^{n} (yi − (a + b·xi))².

SLIDE 10

Least squares and MLE

ln L = ∑_{i=1}^{n} ln 1/√(2πσ²) − 1/(2σ²) · ∑_{i=1}^{n} (yi − (a + b·xi))²

From this expression it becomes clear, by computing the derivatives w.r.t. the parameters a and b, that maximizing the likelihood function is equivalent to minimizing

F(a, b) = ∑_{i=1}^{n} (yi − (a + b·xi))².

SLIDE 11

Regression polynomials

The least squares method can be extended to regression polynomials (e.g. x = time, y = distance traveled under constant acceleration)

y = p(x) = a0 + a1·x + . . . + am·x^m

with a given fixed degree m. We have to minimize the error function

F(a0, . . . , am) = ∑_{i=1}^{n} (p(xi) − yi)² = ∑_{i=1}^{n} ((a0 + a1·xi + . . . + am·xi^m) − yi)².

In analogy to the linear case, we form the partial derivatives of this function w.r.t. the parameters ak, 0 ≤ k ≤ m, and equate them to zero.
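As a sketch of this extension (Python/NumPy assumed, not from the slides), one can build the design matrix whose k-th column holds xi^k and solve the resulting least squares problem:

```python
import numpy as np

def fit_polynomial(x, y, m):
    """Least squares fit of a polynomial a0 + a1*x + ... + am*x^m of fixed degree m."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: column k contains x_i^k for k = 0, ..., m
    X = np.vander(x, N=m + 1, increasing=True)
    # Solves the normal equations X^T X a = X^T y in a numerically stable way
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs  # [a0, a1, ..., am]
```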

SLIDE 12

Multilinear regression

Given: A data set ((x1, y1), . . . , (xn, yn)) with input vectors xi and corresponding responses yi, 1 ≤ i ≤ n, for which we want to determine the linear regression function

y = f(x1, . . . , xm) = a0 + ∑_{k=1}^{m} ak·xk.

Example

Price of a house depending on its size (x1) and age (x2)
Ice cream consumption based on the temperature (x1), the price (x2) and the family income (x3)
Electricity consumption based on the number of flats with one (x1), two (x2), three (x3) and four or more persons (x4) living in them

SLIDE 13

Multilinear regression

F(a0, . . . , am) = ∑_{i=1}^{n} (f(xi) − yi)² = ∑_{i=1}^{n} ((a0 + a1·xi,1 + . . . + am·xi,m) − yi)²

In order to derive the normal equations, it is convenient to write the functional to minimize in matrix form

F(a) = (Xa − y)⊤(Xa − y),

where a = (a0, . . . , am)⊤, y = (y1, . . . , yn)⊤, and X is the n × (m + 1) matrix whose i-th row is (1, xi,1, . . . , xi,m).

SLIDE 14

Multilinear regression

Again a necessary condition for a minimum is that the partial derivatives of this function w.r.t. the coefficients ak, 0 ≤ k ≤ m, vanish. Using the differential operator ∇, we can write these conditions as

∇a F(a) = dF(a)/da = (∂F(a)/∂a0, ∂F(a)/∂a1, . . . , ∂F(a)/∂am) = 0,

where the differential operator behaves like a vector:

∇a = (∂/∂a0, ∂/∂a1, . . . , ∂/∂am).

SLIDE 15

Multilinear regression

To find the minimum of F(a) = (Xa − y)⊤(Xa − y), we apply the differential operator:

∇a F(a) = ∇a (Xa − y)⊤(Xa − y)
= (∇a(Xa − y))⊤ (Xa − y) + ((Xa − y)⊤ (∇a(Xa − y)))⊤
= (∇a(Xa − y))⊤ (Xa − y) + (∇a(Xa − y))⊤ (Xa − y)
= 2X⊤(Xa − y) = 2X⊤Xa − 2X⊤y,

from which we obtain the system of normal equations X⊤Xa = X⊤y.

SLIDE 16

Multilinear regression

X⊤Xa = X⊤y

The system is (uniquely) solvable iff X⊤X is invertible (nonsingular). In this case we have a = (X⊤X)⁻¹X⊤y = X⁺y.

Moore-Penrose pseudo-inverse

The expression (X⊤X)⁻¹X⊤ = X⁺ is also known as the (Moore-Penrose) pseudo-inverse of the matrix X. Pseudo-inverses generalize the notion of an inverse to non-square and singular matrices. They provide a least squares solution to a system of linear equations without a unique solution.
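A compact sketch (Python/NumPy assumed, not taken from the slides) of multilinear regression via the pseudo-inverse, i.e. a = X⁺y:

```python
import numpy as np

def fit_multilinear(X_raw, y):
    """Least squares fit of y = a0 + a1*x1 + ... + am*xm.

    X_raw: array of shape (n, m) with one input vector per row.
    """
    X_raw = np.asarray(X_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])  # prepend the constant column
    a = np.linalg.pinv(X) @ y  # pinv also copes with (near-)singular X^T X
    return a  # [a0, a1, ..., am]
```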

SLIDE 17

Nonlinear regression

Minimization of the error function based on partial derivatives w.r.t. the parameters does not work for the other examples of error functions: the absolute value and the maximum are not differentiable everywhere, and the Euclidean distance leads to a system of nonlinear equations for which no analytical solution is known.

SLIDE 18

Nonlinear regression

Example

For nonlinear dependencies, taking partial derivatives leads to nonlinear equations, e.g. for y = a·e^(bx) (radioactive decay, growth of bacteria, . . .):

E(a, b) = ∑_{i=1}^{n} (a·e^(b·xi) − yi)²

∂E/∂a = 2 · ∑_{i=1}^{n} (a·e^(b·xi) − yi) · e^(b·xi)

∂E/∂b = 2 · ∑_{i=1}^{n} (a·e^(b·xi) − yi) · a·xi·e^(b·xi)

Possible solutions

Iterative methods (e.g. gradient descent)
Transformation of the regression function

SLIDE 19

Transformation

More complex regression functions can be transformed to the problem of finding a regression line or regression polynomial.

Example

y = a·x^b can be transformed by taking the logarithm of the equation: ln y = ln a + b · ln x

Notice

The sum of squared errors is only minimized in the transformed space (coordinates x′ = ln x and y′ = ln y), but not necessarily also in the original space. Nevertheless, this approach often yields good results and can be used as a starting point for a subsequent gradient descent in the original space.
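A minimal sketch of this transformation trick for y = a·x^b (Python/NumPy assumed, not from the slides): fit a regression line in log-log space and map the intercept back.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y ≈ a * x**b by a regression line on (ln x, ln y); requires x > 0, y > 0."""
    lx, ly = np.log(x), np.log(y)
    b, ln_a = np.polyfit(lx, ly, deg=1)  # slope and intercept of the regression line
    return np.exp(ln_a), b               # back-transform the intercept

# The result can serve as the starting point of a gradient descent
# on the untransformed squared error, as suggested above.
```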

SLIDE 20

Logit-transformation

Example

For practical purposes, it is important that one can transform the logistic function

y = ymax / (1 + e^(a+bx)),

which describes limited growth processes and is also often used as the activation function of the neurons in artificial neural networks:

1/y = (1 + e^(a+bx)) / ymax   ⇔   (ymax − y)/y = e^(a+bx)   ⇔   ln((ymax − y)/y) = a + bx

We only need to transform the data points according to the left-hand side of the equation.
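A brief sketch (Python/NumPy assumed, not from the slides) of fitting the logistic curve via this transformation; ymax is assumed to be known:

```python
import numpy as np

def fit_logistic_by_logit(x, y, y_max):
    """Fit y ≈ y_max / (1 + exp(a + b*x)) via the logit transformation."""
    y = np.asarray(y, dtype=float)
    z = np.log((y_max - y) / y)     # transformed targets; requires 0 < y < y_max
    b, a = np.polyfit(x, z, deg=1)  # ordinary regression line z ≈ a + b*x
    return a, b
```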

SLIDE 21

Model vs. black box

When the principal functional dependency between the dependent variable Y and the predictor variables X1, . . . , Xk is known, an explicit parameterized (possibly nonlinear) regression function can be specified. If such a model is not known, one can still try to construct a suitable regression function.

SLIDE 22

Model vs. black box

When the functional dependency between the dependent variable Y and the predictor variables X1, . . . , Xk is not known, one can try a linear

y = a0 + a1·x1 + . . . + ak·xk,

quadratic

y = a0 + a1·x1 + . . . + ak·xk + ak+1·x1² + . . . + a2k·xk² + a2k+1·x1x2 + . . . + a2k+k(k−1)/2·xk−1xk,

or cubic approach.

SLIDE 23

Model vs. black box

The coefficients ai can be interpreted as weighting factors, at least when the predictor variables X1, . . . , Xk have been normalised. They also provide information about a positive or negative correlation of the predictor variables with the dependent variable Y. Usually, complex regression functions yield black box models, which might provide a good approximation of the data, but do not admit a useful interpretation (of the coefficients).

SLIDE 24

Generalisation

Considering a data set as a collection of examples describing the dependency between the predictor variables and the dependent variable, the regression function should “learn” this dependency from the data and generalize it in order to make correct predictions on new data. To achieve this, the regression function must be universal (flexible) enough to be able to learn the dependency. However, this does not mean that a more complex regression function with more parameters leads to better results than a simpler one.

SLIDE 25

Overfitting

Complex regression functions can lead to overfitting: The regression function “learns” a description of the data, not of the structure inherent in the data. Prediction can be worse than for a simpler regression function.

SLIDE 26

Overfitting

Complex regression functions can lead to overfitting:

Keep it simple

The regression function “learns” a description of the data, not of the structure inherent in the data. The prediction using a complex function can be worse than for a simpler regression function.

SLIDE 27

Robust regression

Rewrite the error function to be minimized in the form

F(a) = (Xa − y)⊤(Xa − y) = ∑_{i=1}^{n} ρ(ei) = ∑_{i=1}^{n} ρ(xi⊤a − yi)

with ei as the (signed) error of the regression function at the i-th point and, for the least squares method, ρ(e) = e². For other choices, ρ should satisfy at least

ρ(e) ≥ 0,   ρ(0) = 0,   ρ(e) = ρ(−e),   ρ(ei) ≥ ρ(ej) if |ei| ≥ |ej|.

M-estimators

Parameter estimation based on an objective function of the given form and an error measure satisfying the above mentioned criteria is called an M-estimator.

SLIDE 28

M-estimators

Computing the derivatives w.r.t. the parameters a0, . . . , am of

∑_{i=1}^{n} ρ(ei) = ∑_{i=1}^{n} ρ(xi⊤a − yi),

with ψ = ρ′ as the outer derivative, leads to the system of (m + 1) equations

∑_{i=1}^{n} ψ(xi⊤a − yi)·xi⊤ = 0.

Defining w(e) = ψ(e)/e and wi = w(ei) leads to

∑_{i=1}^{n} (ψ(xi⊤a − yi)/ei) · ei · xi⊤ = ∑_{i=1}^{n} wi · (xi⊤a − yi) · xi⊤ = 0.

The solution of this system of equations is the same as for a least squares problem with (non-fixed) weights, ∑_{i=1}^{n} wi·ei².

SLIDE 29

M-estimators

Problem: The weights wi depend on the errors ei, and the errors ei depend on the weights wi.
Solution strategy: Alternating optimization.

1. Choose an initial solution a(0), for instance the standard least squares solution, i.e. setting all weights to wi = 1.
2. In each iteration step t, calculate the residuals e(t−1) and the corresponding weights w(t−1) = w(e(t−1)) determined by the previous step.
3. Compute the solution a(t) of the weighted least squares problem min ∑_{i=1}^{n} wi·ei², which leads to a(t) = (X⊤W(t−1)X)⁻¹X⊤W(t−1)y, where W stands for the diagonal matrix with the weights wi on the diagonal (see the sketch below).
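A minimal sketch of this alternating scheme, also known as iteratively re-weighted least squares (Python/NumPy assumed, not part of the slides; the Huber weight function from the following slides is used as an example):

```python
import numpy as np

def huber_weight(e, k=1.5):
    """w(e) = psi(e)/e for the Huber estimator: 1 for small errors, k/|e| for large ones."""
    return np.where(np.abs(e) <= k, 1.0, k / np.maximum(np.abs(e), 1e-12))

def robust_regression(X_raw, y, k=1.5, n_iter=50):
    """M-estimation by alternating optimization (weighted least squares)."""
    X_raw = np.asarray(X_raw, dtype=float)   # shape (n, m)
    y = np.asarray(y, dtype=float)
    X = np.hstack([np.ones((len(y), 1)), X_raw])
    a = np.linalg.pinv(X) @ y                          # step 1: ordinary LS start (all w_i = 1)
    for _ in range(n_iter):
        e = X @ a - y                                  # step 2: residuals ...
        W = np.diag(huber_weight(e, k))                #         ... and their weights
        a = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # step 3: weighted least squares
    return a
```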

SLIDE 30

Robust regression

Method          ρ(e)
least squares   e²
Huber           ½·e²                          if |e| ≤ k
                k·|e| − ½·k²                  if |e| > k
Tukey           (k²/6)·(1 − (1 − (e/k)²)³)    if |e| ≤ k
                k²/6                          if |e| > k
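For illustration, a small sketch (Python/NumPy assumed, not from the slides) of Tukey's biweight ρ and the weight function w(e) = ψ(e)/e it induces; the Huber weights were already sketched above.

```python
import numpy as np

def rho_tukey(e, k=4.5):
    """Tukey's biweight error measure: bounded by k^2/6 for |e| > k."""
    return np.where(np.abs(e) <= k,
                    (k ** 2 / 6) * (1 - (1 - (e / k) ** 2) ** 3),
                    k ** 2 / 6)

def w_tukey(e, k=4.5):
    """Derived weights w(e) = psi(e)/e: extreme outliers get weight zero."""
    return np.where(np.abs(e) <= k, (1 - (e / k) ** 2) ** 2, 0.0)
```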

SLIDE 31

Least squares

[Plots: ρ(e) = e² and the constant weight ω(e) = 1]

The error measure ρ increases in a quadratic manner with increasing distance. ⇒ Extreme outliers have full influence.

SLIDE 32

Huber

[Plots: ρ1.5(e) and w1.5(e) for the Huber estimator]

ρ(e) = ½·e² if |e| ≤ k,   k·|e| − ½·k² if |e| > k
ω(e) = 1 if |e| ≤ k,   k/|e| if |e| > k

The error measure ρ switches from a quadratic increase for small errors to a linear increase for larger errors. ⇒ Only data points with a small error have full influence.

SLIDE 33

Tukey’s biweight

[Plots: ρ4.5(e) and w4.5(e) for Tukey's biweight]

ρ(e) = (k²/6)·(1 − (1 − (e/k)²)³) if |e| ≤ k,   k²/6 if |e| > k
ω(e) = (1 − (e/k)²)² if |e| ≤ k,   0 if |e| > k

For larger errors, the error measure ρ does not increase at all but remains constant. ⇒ Weights of extreme outliers drop to zero.

SLIDE 34

Least squares vs. robust regression

[Figure: least squares and robust regression fits of a data set (left); the regression weight of each data point, plotted by data point index (right)]

SLIDE 35

Regression & nominal attributes

If most of the predictor variables are numerical and the few nominal attributes have small domains, a regression function can be constructed for each possible combination of the values of the nominal attributes, given that the data set is sufficiently large and covers all combinations.

Example

Attribute    Type/Domain
Sex          F/M
Vegetarian   Yes/No
Age          numerical
Height       numerical
Weight       numerical

Task: Predict the weight based on the other attributes.
Possible solution: Construct four separate regression functions for (F,Yes), (F,No), (M,Yes), (M,No), using only age and height as predictor variables.

SLIDE 36

Regression & nominal attributes

Alternative approach: Encode the nominal attributes as numerical attributes.
Binary attributes can be encoded as 0/1 or −1/1.
For nominal attributes with more than two values, a 0/1 or −1/1 numerical attribute should be introduced for each possible value of the nominal attribute (see the sketch below).
Do not encode nominal attributes with more than two values in one numerical attribute, unless the nominal attribute is actually ordinal.
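A small sketch of such an encoding (Python/NumPy assumed; the attribute names come from the example slide, the data values are illustrative):

```python
import numpy as np

def one_hot(values, categories):
    """One 0/1 column per category, for nominal attributes with more than two values."""
    values = np.asarray(values)
    return np.column_stack([(values == c).astype(float) for c in categories])

# Illustrative data for the attributes from the example slide
sex        = np.array(["F", "M", "F"])
vegetarian = np.array(["Yes", "No", "Yes"])
age        = np.array([25.0, 40.0, 31.0])
height     = np.array([1.68, 1.82, 1.75])

X = np.column_stack([
    (sex == "F").astype(float),          # binary attribute -> one 0/1 column
    (vegetarian == "Yes").astype(float),
    age,
    height,
])
# A nominal attribute with more than two values would instead contribute
# one_hot(values, categories), i.e. one column per value.
```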

SLIDE 37

Classification as regression

A two-class classification problem (with classes 0 and 1) can be viewed as a regression problem. The regression function will usually not yield exact outputs 0 and 1, but the classification decision can be made by considering 0.5 as a cut-off value. Problem: The objective function aims at minimizing the function approximation error (for example, the mean squared error), but not the number of misclassifications.

SLIDE 38

Classification as regression

Example

1000 data objects, 500 belonging to class 0 and 500 to class 1.
Regression function f yields 0.1 for all data from class 0 and 0.9 for all data from class 1.
Regression function g always yields the exact and correct values 0 and 1, except for 9 data objects where it yields 1 instead of 0 and vice versa.

Regression function   Misclassifications   MSQE
f                     0                    0.01
g                     9                    0.009

From the viewpoint of regression, g is better than f; from the viewpoint of misclassifications, f should be preferred.

SLIDE 39

Logistic regression

Two-class problem: Y: class attribute, dom(Y) = {c1, c2}; X = (X1, . . . , Xm): m-dimensional random vector.

P(C = c1 | X = x) = p(x),   P(C = c2 | X = x) = 1 − p(x).

Given: A set of data points {x1, . . . , xn}, each of which belongs to one of the two classes c1 and c2.
Desired: A simple description of the probability function p(x) for the given dataset X.
Approach: Describe the probability p by the logistic function:

p(x) = 1 / (1 + e^(a0 + a⊤x)) = 1 / (1 + exp(a0 + ∑_{i=1}^{m} ai·xi))

SLIDE 40

Classification: Logistic regression

By applying the logit transformation we obtain

ln((1 − p(x)) / p(x)) = a0 + a⊤x = a0 + ∑_{i=1}^{m} ai·xi,

that is, a multilinear regression problem, which can be solved with the techniques introduced above.

But how do we determine the values p(x) that enter the above equation? For a small data space with sufficiently many realizations for every possible point, the class probability can be estimated by the relative frequencies of the classes. If this is not the case, we may rely on an approach known as kernel estimation.

SLIDE 41

Kernel estimation

Idea: Define an “influence function” (kernel), which describes how strongly a data point influences the probability estimate for neighboring points.

Gaussian kernel:

K(x, y) = 1 / (2πσ²)^(m/2) · exp(−(x − y)⊤(x − y) / (2σ²)),

where the variance σ² has to be chosen by the user.

SLIDE 42

Classification: Logistic regression

Kernel estimation applied to a two-class problem:

p̂(x) = ( ∑_{i=1}^{n} c(xi)·K(x, xi) ) / ( ∑_{i=1}^{n} K(x, xi) ),

where c(xi) = 1 if xi belongs to class c1, and c(xi) = 0 otherwise.
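Putting the pieces together, a sketch (Python/NumPy assumed, not from the slides): estimate p(x) at the data points with the Gaussian kernel, then solve the logit equation as a multilinear regression problem.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    d = x - y
    m = x.shape[-1]
    return (2 * np.pi * sigma ** 2) ** (-m / 2) * np.exp(-(d @ d) / (2 * sigma ** 2))

def logistic_regression_via_kernel(X, c, sigma=1.0):
    """X: (n, m) data points; c: 1.0 for class c1, 0.0 for class c2."""
    n = X.shape[0]
    # Kernel estimate of the class probability p(x_i) at every data point
    K = np.array([[gaussian_kernel(X[i], X[j], sigma) for j in range(n)]
                  for i in range(n)])
    p_hat = (K @ c) / K.sum(axis=1)
    p_hat = np.clip(p_hat, 1e-6, 1 - 1e-6)       # keep the logit finite
    z = np.log((1 - p_hat) / p_hat)              # logit transform as on the slides
    D = np.hstack([np.ones((n, 1)), X])          # design matrix with constant term
    a, *_ = np.linalg.lstsq(D, z, rcond=None)    # multilinear regression
    return a  # [a0, a1, ..., am]
```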

SLIDE 43

Summary

Pros:
Strong mathematical foundation
Simple to calculate and to understand (for a moderate number of dimensions)
High predictive accuracy

Cons:
Many dependencies are non-linear
A global model does not adapt to locally different data distributions ⇒ locally weighted regression
