Introduction to Big Data and Machine Learning
OLS matrix derivation
Dr. Mihail — August 26, 2019
Ordinary least squares: Matrix form

Let $X$ be $n \times k$, where each row ($n$ of them) is an observation of $k$ variables. We will assume models have a constant (bias), so the first column of $X$ will be 1's.
Let $y$ be an $n \times 1$ vector of observations on the dependent variable.
Let $\epsilon$ be an $n \times 1$ vector of disturbances or errors.
Let $\beta$ be a $k \times 1$ vector of unknown population parameters that we wish to estimate.

$$
\underbrace{\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}}_{n \times 1}
=
\underbrace{\begin{bmatrix}
1 & X_{11} & X_{21} & \cdots & X_{k1} \\
1 & X_{12} & X_{22} & \cdots & X_{k2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & X_{1n} & X_{2n} & \cdots & X_{kn}
\end{bmatrix}}_{n \times k}
\underbrace{\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix}}_{k \times 1}
+
\underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}}_{n \times 1}
\tag{1}
$$
Or more succinctly:

$$
y = X\beta + \epsilon \tag{2}
$$
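As a concrete illustration of the model in matrix form, here is a minimal numpy sketch. The data, dimensions, and parameter values below are hypothetical, chosen only to make the shapes visible; they are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3  # n observations, k parameters (including the bias)

# Design matrix: first column of 1's (the constant/bias term), remaining
# columns are the observed variables (simulated here for illustration).
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

beta = np.array([2.0, -1.0, 0.5])    # hypothetical "true" parameters, k x 1
eps = rng.normal(scale=0.1, size=n)  # disturbances, n x 1

y = X @ beta + eps  # the model y = X beta + epsilon in matrix form
```

Note how the shapes line up exactly as in equation (1): $X$ is $n \times k$, $\beta$ is $k \times 1$, and both $y$ and $\epsilon$ are $n \times 1$.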
We wish to estimate $\hat\beta$, the value of $\beta$ that minimizes the sum of the squared residuals $\sum e_i^2$.
The vector of residuals is given by $e = y - X\hat\beta$.
The sum of squared residuals is given by $e'e$ (not to be confused with $ee'$, the covariance of residuals):

$$
e'e = \underbrace{\begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}}_{1 \times n}
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}
= e_1^2 + e_2^2 + \cdots + e_n^2
\tag{3}
$$
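The distinction between $e'e$ (a scalar) and $ee'$ (an $n \times n$ matrix) can be checked directly in numpy; the residual vector below is an arbitrary example of my own, not from the slides.

```python
import numpy as np

e = np.array([1.0, -2.0, 3.0])  # a hypothetical residual vector, n = 3

inner = e @ e           # e'e: a scalar, the sum of squared residuals
outer = np.outer(e, e)  # ee': an n x n matrix -- a different object entirely

print(inner)        # 14.0, i.e. 1 + 4 + 9
print(outer.shape)  # (3, 3)
```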
Ordinary least squares: Sum of squares

$$
\begin{aligned}
e'e &= (y - X\hat\beta)'(y - X\hat\beta) \\
&= y'y - \hat\beta'X'y - y'X\hat\beta + \hat\beta'X'X\hat\beta \\
&= y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta
\end{aligned}
\tag{4}
$$

We used this identity: $y'X\hat\beta = (y'X\hat\beta)' = \hat\beta'X'y$, which holds because $y'X\hat\beta$ is a scalar, and a scalar equals its own transpose.
Ordinary least squares: Matrix differentiation review

$$
\frac{\partial\, a'b}{\partial b} = \frac{\partial\, b'a}{\partial b} = a \tag{5}
$$

where $a$ and $b$ are $K \times 1$ vectors.

$$
\frac{\partial\, b'Ab}{\partial b} = 2Ab \tag{6}
$$

where $A$ is any symmetric $K \times K$ matrix. Note that you can write the derivative as the column vector $2Ab$ or, equivalently, the row vector $2b'A$.
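Both differentiation identities can be verified numerically with a central-difference gradient. This is a sanity-check sketch with randomly generated vectors and matrices of my own choosing, not part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
a = rng.normal(size=K)
b = rng.normal(size=K)
M = rng.normal(size=(K, K))
A = M + M.T  # symmetrize, since identity (6) requires a symmetric A

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of scalar f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (f(x + step) - f(x - step)) / (2 * h)
    return g

# Identity (5): d(a'b)/db = a
assert np.allclose(num_grad(lambda v: a @ v, b), a, atol=1e-5)
# Identity (6): d(b'Ab)/db = 2Ab, for symmetric A
assert np.allclose(num_grad(lambda v: v @ A @ v, b), 2 * A @ b, atol=1e-5)
```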
Applying these rules:

$$
\frac{\partial\, 2\hat\beta'X'y}{\partial \hat\beta} = \frac{\partial\, 2\hat\beta'(X'y)}{\partial \hat\beta} = 2X'y \tag{7}
$$

and

$$
\frac{\partial\, \hat\beta'X'X\hat\beta}{\partial \hat\beta} = \frac{\partial\, \hat\beta'A\hat\beta}{\partial \hat\beta} = 2A\hat\beta = 2X'X\hat\beta \tag{8}
$$

since $A = X'X$ is a symmetric $K \times K$ matrix.
Ordinary least squares: Parameter estimation

The $\hat\beta$ that minimizes the sum of squared residuals is obtained by computing the derivative of $e'e$ with respect to $\hat\beta$:

$$
\frac{\partial\, e'e}{\partial \hat\beta} = -2X'y + 2X'X\hat\beta \tag{9}
$$

Setting the derivative equal to 0 and solving for $\hat\beta$:

$$
-2X'y + 2X'X\hat\beta = 0 \tag{10}
$$

$$
(X'X)\hat\beta = X'y \tag{11}
$$

$X'X$ is always square ($k \times k$) and symmetric. Both $X$ and $y$ are known from our data.
$$
(X'X)\hat\beta = X'y \tag{12}
$$

$X'X$ is always square ($k \times k$) and symmetric. Both $X$ and $y$ are known from our data, so (assuming $X'X$ is invertible, i.e., $X$ has full column rank) we can multiply both sides by the inverse $(X'X)^{-1}$, yielding:

$$
(X'X)^{-1}(X'X)\hat\beta = (X'X)^{-1}X'y \tag{13}
$$

$$
I\hat\beta = (X'X)^{-1}X'y \tag{14}
$$

or finally:

$$
\hat\beta = (X'X)^{-1}X'y \tag{15}
$$
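The closed-form estimator can be checked numerically. A sketch with simulated data (the sample size, noise scale, and parameter values are my own assumptions); note that in practice one solves the linear system rather than forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([2.0, -1.0, 0.5])  # hypothetical true parameters
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations (X'X) beta_hat = X'y: solving the system is numerically
# preferable to computing (X'X)^{-1} explicitly and multiplying.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# np.linalg.lstsq minimizes ||y - X beta||^2 directly and should agree.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

With the small noise used here, `beta_hat` lands close to `beta_true`, as the derivation predicts.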