 
Topic 5: Non-Linear Relationships and Non-Linear Least Squares

Non-linear Relationships

Many relationships between variables are non-linear. (Examples)
- OLS may not work (recall A.1): it may be biased and inconsistent.
- In other situations, we may still be able to use OLS, either by approximating the non-linear relationship or by appropriately transforming the population model.
The models we've worked with so far have been linear in the parameters.
- They've been of the form: $y = X\beta + \varepsilon$
- Many models based on economic theory are actually non-linear in the parameters.
- In general: $y = f(\theta; X) + \varepsilon$, where $f$ is non-linear.
- Note that the linear model is a special case, with $f(\theta; X) = X\theta$.
Transforming a non-linear population model

Cobb-Douglas production function:
$$Y = A K^{\beta_2} L^{\beta_3} \varepsilon$$
By taking logs, the Cobb-Douglas production function can be rewritten as:
$$\log Y = \beta_1 + \beta_2 \log K + \beta_3 \log L + \log(\varepsilon), \qquad \beta_1 = \log A$$
This model now satisfies A.1 (linear in the parameters); however, it is not advisable to estimate it by OLS in most cases.
- Silva and Tenreyro (2006)¹: if $\log(\varepsilon)$ is heteroskedastic (it likely is), $X$ and $\varepsilon$ are not independent!

¹ Silva and Tenreyro (2006). "The Log of Gravity." The Review of Economics and Statistics.
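A short derivation makes the point concrete. It assumes, purely for illustration, that $\varepsilon_i$ is log-normal given $X$, with $E[\varepsilon_i \mid X] = 1$ (so $A K^{\beta_2} L^{\beta_3}$ is the conditional mean of $Y$) and with a variance $\sigma^2(X_i)$ of $\log \varepsilon_i$ that depends on the regressors. These distributional assumptions are ours, not the paper's.

```latex
\begin{aligned}
\log \varepsilon_i \mid X &\sim N\!\left(\mu_i,\; \sigma^2(X_i)\right) \\
E[\varepsilon_i \mid X] = \exp\!\left(\mu_i + \tfrac{1}{2}\sigma^2(X_i)\right) = 1
  &\;\Rightarrow\; \mu_i = -\tfrac{1}{2}\sigma^2(X_i) \\
E[\log \varepsilon_i \mid X] &= -\tfrac{1}{2}\sigma^2(X_i)
\end{aligned}
```

Unless $\sigma^2(\cdot)$ is constant, the error in the logged equation has a conditional mean that varies with $K$ and $L$, so OLS on that equation can attribute part of this variation to $\beta_2$ and $\beta_3$.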
“It may be surprising that the pattern of heteroscedasticity … can affect the consistency of an estimator, rather than just its efficiency. The reason is that the nonlinear transformation … changes the properties of the error term in a nontrivial way.”

Approximations

Some mathematical properties may be exploited in order to approximate the function $f(\theta; X)$:
- Polynomials
- Logarithms
- Dummy variables
Polynomial Regression Model

One way to characterize the non-linear relationship between $y$ and $x$ is to say that the marginal effect of $x$ on $y$ depends on the value of $x$ itself.
- Just include powers of the regressors on the right-hand side.
  - This is not a violation of A.2.
  - e.g. $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \varepsilon$
- Take the derivative: $\partial y / \partial x = \beta_1 + 2\beta_2 x + 3\beta_3 x^2 + \cdots$, which varies with $x$.
- Choosing $\beta$ approximates the non-linear function $f$.
  - The validity of the approximation is based on a Taylor-series expansion.
- The appropriate order of the polynomial may be determined through a series of $t$-tests.
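The derivative step can be sketched in Python. The function name `marginal_effect` and the coefficient values are hypothetical, chosen only to show that the slope of a cubic changes with $x$.

```python
def marginal_effect(beta, x):
    """dy/dx for y = beta[0] + beta[1]*x + beta[2]*x**2 + ... (error term omitted).

    beta: polynomial coefficients, constant term first.
    Returns the sum over k >= 1 of k * beta[k] * x**(k - 1).
    """
    return sum(k * b * x ** (k - 1) for k, b in enumerate(beta) if k >= 1)


# Hypothetical cubic: y = 1 + 2x + 3x^2 + 4x^3
beta = [1.0, 2.0, 3.0, 4.0]
for x in (0.0, 1.0, 2.0):
    print(x, marginal_effect(beta, x))
```

With estimated coefficients in place of these values, the same function gives the estimated marginal effect at any value of $x$ of interest.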
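Logarithms

We can take the logarithm of the LHS and/or RHS variables.
- The $\beta$s have approximate percentage-change interpretations:
  - log-lin
  - lin-log
  - log-log

For example:
$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{female} + \cdots + \varepsilon$$
- Take the derivative w.r.t. educ: $\partial \log(\text{wage}) / \partial\,\text{educ} = \beta_1$.
- A one-unit change in educ leads to a multiplicative change of $\exp(\beta_1)$ in wage,
  - i.e., approximately a $100\beta_1\%$ change (the approximation is based on a Taylor-series expansion of $\exp(x)$ about zero: $\exp(\beta_1) \approx 1 + \beta_1$).
- Females make $100[\exp(\beta_2) - 1]\%$ more than males.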
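A quick numerical comparison shows how good the $100\beta$ approximation is. The function names and coefficient values are hypothetical; the exact formula is the $100[\exp(\beta) - 1]\%$ expression used for the female dummy.

```python
import math


def approx_pct_change(b):
    """First-order (Taylor) approximation to the percentage change: 100 * b."""
    return 100.0 * b


def exact_pct_change(b):
    """Exact percentage change implied by a log-lin coefficient: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


for b in (0.01, 0.05, 0.10, -0.30):
    print(f"b = {b:+.2f}: approx {approx_pct_change(b):+.2f}%, "
          f"exact {exact_pct_change(b):+.2f}%")
```

The gap between the two columns grows with $|b|$, which is why the exact form is preferred for large coefficients such as dummy-variable effects.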
Dummy variables – Splines

There may be a "break" in the model so that it is "piecewise" linear.
- Example: wage before and after age = 18.
- "Knots" and dummy variables. [pictures and notes]
- Nothing in the unrestricted estimators ensures that the two functions join at the knot.
  - Use RLS.
- Multiple knots can be introduced.
- The location of the knots can be arbitrary, leading to nonparametric kernel regression.
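The continuity restriction can be written out for the single-knot wage example. The dummy $D_i = 1(\text{age}_i > 18)$ and the coefficient labels are our own notation for illustration.

```latex
\begin{aligned}
\text{wage}_i &= \beta_0 + \beta_1\,\text{age}_i + \beta_2 D_i + \beta_3 D_i\,\text{age}_i + \varepsilon_i \\
\text{left piece at } 18: &\quad \beta_0 + 18\beta_1 \\
\text{right piece at } 18: &\quad (\beta_0 + \beta_2) + 18(\beta_1 + \beta_3) \\
\text{pieces join at the knot} &\iff \beta_2 + 18\beta_3 = 0 \\
\Rightarrow\; \text{wage}_i &= \beta_0 + \beta_1\,\text{age}_i + \beta_3 D_i(\text{age}_i - 18) + \varepsilon_i
\end{aligned}
```

Imposing the single linear restriction $\beta_2 + 18\beta_3 = 0$ by RLS is therefore equivalent to running OLS with the constructed regressor $D_i(\text{age}_i - 18)$; each additional knot adds one more term of this form.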
Non-linear population models

There are many situations where transforming or approximating the non-linear model is not desirable or possible, and the non-linear population model should be estimated directly.
- CES production function:
$$Y_i = \gamma\left[\delta K_i^{-\rho} + (1-\delta) L_i^{-\rho}\right]^{-v/\rho} \exp(\varepsilon_i)$$
or,
$$\ln(Y_i) = \ln(\gamma) - \left(\frac{v}{\rho}\right) \ln\left[\delta K_i^{-\rho} + (1-\delta) L_i^{-\rho}\right] + \varepsilon_i$$
- Linear Expenditure System (Stone, 1954), from Stone-Geary / Klein-Rubin utility:
$$\max_{q}\; U(q) = \sum_i \beta_i \ln(q_i - \gamma_i) \quad \text{s.t.} \quad \sum_i p_i q_i = M$$
  This yields the following system of demand equations:
$$p_i q_i = \gamma_i p_i + \beta_i\left(M - \sum_j \gamma_j p_j\right); \quad i = 1, 2, \ldots, n$$
  The $\beta_i$'s are the marginal budget shares, so we require that $0 < \beta_i < 1$; $i = 1, 2, \ldots, n$.
- Box-Cox transform (often applied to positive-valued variables)
- "Limited dependent variables":
  - $y$ must be positive (or negative)
  - $y$ is a dummy
  - $y$ is an integer
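Both models can be written as small Python functions, which makes the non-linearity in the parameters explicit. The function names and numerical values are hypothetical; the LES check assumes the usual normalization $\sum_i \beta_i = 1$, under which the demand system exhausts the budget $M$.

```python
import math


def ces_log_output(K, L, gamma, delta, rho, v):
    """Deterministic part of ln(Y) for the CES production function:
    ln(gamma) - (v / rho) * ln[delta * K**(-rho) + (1 - delta) * L**(-rho)].
    Even after taking logs, this is non-linear in (delta, rho, v)."""
    inner = delta * K ** (-rho) + (1.0 - delta) * L ** (-rho)
    return math.log(gamma) - (v / rho) * math.log(inner)


def les_expenditures(p, M, beta, gamma):
    """LES expenditures p_i * q_i = gamma_i * p_i + beta_i * (M - sum_j gamma_j * p_j)."""
    committed = sum(g * pj for g, pj in zip(gamma, p))  # cost of the committed quantities gamma
    supernumerary = M - committed                       # income left over to allocate
    return [g * pi + b * supernumerary for pi, b, g in zip(p, beta, gamma)]


# CES: scaling both inputs by t = 2 raises ln(Y) by v * ln(2).
base = ces_log_output(K=2.0, L=3.0, gamma=1.5, delta=0.4, rho=0.5, v=0.9)
scaled = ces_log_output(K=4.0, L=6.0, gamma=1.5, delta=0.4, rho=0.5, v=0.9)
print(scaled - base, 0.9 * math.log(2.0))

# LES: with the beta_i summing to 1, expenditures add up to M.
spend = les_expenditures(p=[1.0, 2.0], M=10.0, beta=[0.4, 0.6], gamma=[1.0, 1.0])
print(spend, sum(spend))
```

The CES check confirms that $v$ governs returns to scale; the LES check shows why the marginal budget shares are usually restricted to sum to one.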
In general, suppose we have a single non-linear equation:
$$y_i = f(x_{i1}, x_{i2}, \ldots, x_{ik}; \theta_1, \theta_2, \ldots, \theta_p) + \varepsilon_i$$
- We can still consider a "least squares" approach.
- The Non-Linear Least Squares estimator is the vector, $\hat\theta$, that minimizes the quantity:
$$S(X, \theta) = \sum_i \left[y_i - f_i(X, \theta)\right]^2$$
- Clearly the usual LS estimator is just a special case of this.
- To obtain the estimator, we differentiate $S$ with respect to each element of $\theta$, set up the $p$ first-order conditions, and solve.
- Difficulty: usually, the first-order conditions are themselves non-linear in the unknowns (the parameters).
  - This means there is (generally) no exact, closed-form solution.
  - We can't write down an explicit formula for the estimators of the parameters.
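The criterion $S(X, \theta)$ itself is easy to evaluate for any candidate $\theta$; only its minimization needs numerical methods. This sketch uses a hypothetical model $y = \theta_1 \exp(\theta_2 x)$ and noise-free made-up data, so the criterion is exactly zero at the generating parameter values.

```python
import math


def nls_objective(theta, y, X, f):
    """S(X, theta) = sum_i [y_i - f(x_i, theta)]^2 for a user-supplied model f."""
    return sum((yi - f(xi, theta)) ** 2 for yi, xi in zip(y, X))


def exp_model(x, theta):
    """Hypothetical model y = theta_1 * exp(theta_2 * x), non-linear in theta_2."""
    return theta[0] * math.exp(theta[1] * x)


X = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [exp_model(x, (2.0, 0.3)) for x in X]  # noise-free data generated from theta = (2, 0.3)

print(nls_objective((2.0, 0.3), y, X, exp_model))   # zero at the generating theta
print(nls_objective((1.8, 0.35), y, X, exp_model))  # positive elsewhere
```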
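Example
$$y_i = \theta_1 + \theta_2 x_{i2} + \theta_3 x_{i3} + (\theta_2 \theta_3) x_{i4} + \varepsilon_i$$
$$S = \sum_i \left[y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - (\theta_2 \theta_3) x_{i4}\right]^2$$
$$\frac{\partial S}{\partial \theta_1} = -2 \sum_i \left[y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - (\theta_2 \theta_3) x_{i4}\right]$$
$$\frac{\partial S}{\partial \theta_2} = -2 \sum_i \left[(\theta_3 x_{i4} + x_{i2})(y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - \theta_2 \theta_3 x_{i4})\right]$$
$$\frac{\partial S}{\partial \theta_3} = -2 \sum_i \left[(\theta_2 x_{i4} + x_{i3})(y_i - \theta_1 - \theta_2 x_{i2} - \theta_3 x_{i3} - \theta_2 \theta_3 x_{i4})\right]$$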
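Because first-order conditions like these are easy to get wrong by hand, a common sanity check is to compare the analytic derivatives with central finite differences. The data and the parameter values below are made-up numbers used only for this check.

```python
def residuals(theta, y, X):
    """e_i = y_i - theta1 - theta2*x_i2 - theta3*x_i3 - theta2*theta3*x_i4."""
    t1, t2, t3 = theta
    return [yi - t1 - t2 * x2 - t3 * x3 - t2 * t3 * x4
            for yi, (x2, x3, x4) in zip(y, X)]


def S(theta, y, X):
    """Sum of squared residuals."""
    return sum(e * e for e in residuals(theta, y, X))


def grad_S(theta, y, X):
    """Analytic derivatives dS/dtheta1, dS/dtheta2, dS/dtheta3."""
    _, t2, t3 = theta
    e = residuals(theta, y, X)
    d1 = -2.0 * sum(e)
    d2 = -2.0 * sum((t3 * x4 + x2) * ei for ei, (x2, _, x4) in zip(e, X))
    d3 = -2.0 * sum((t2 * x4 + x3) * ei for ei, (_, x3, x4) in zip(e, X))
    return [d1, d2, d3]


def numerical_grad(theta, y, X, h=1e-6):
    """Central finite-difference approximation to the gradient of S."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        grad.append((S(up, y, X) - S(down, y, X)) / (2.0 * h))
    return grad


# Made-up data: each row of X is (x_i2, x_i3, x_i4).
X = [(1.0, 0.5, 0.2), (0.3, 1.2, 0.7), (2.0, 0.1, 1.1), (0.8, 0.9, 0.4), (1.5, 1.7, 0.6)]
y = [3.0, 1.5, 4.2, 2.8, 5.1]
theta = [0.5, 1.2, -0.7]

print(grad_S(theta, y, X))
print(numerical_grad(theta, y, X))
```

Setting `grad_S` to zero gives three equations that are non-linear in $\theta_2$ and $\theta_3$, which is why no closed-form solution is available.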
Setting these 3 equations to zero, we can't solve analytically for the estimators of the three parameters.
- In situations such as this, we need to use a numerical algorithm to obtain a solution to the first-order conditions.
- There are many methods for doing this; one possibility is Newton's algorithm (the Newton-Raphson algorithm).

Methods of Descent
$$\tilde\theta = \theta_0 + s\, d(\theta_0)$$
$\theta_0$ = initial (vector) value
$s$ = step-length (positive scalar)
$d(\cdot)$ = direction vector
- Usually, $d(\cdot)$ depends on the gradient vector at $\theta_0$.
- It may also depend on the change in the gradient (the Hessian matrix) at $\theta_0$.
- Some specific algorithms in the "family" make the step-length a function of the Hessian.
- One very useful, specific member of the family of "descent methods" is the Newton-Raphson algorithm.

Suppose we want to minimize some function, $f(\theta)$. Approximate the function using a Taylor's series expansion about $\tilde\theta$, the vector value that minimizes $f(\theta)$:
$$f(\theta) \cong f(\tilde\theta) + (\theta - \tilde\theta)' \left(\frac{\partial f}{\partial \theta}\right)_{\tilde\theta} + \frac{1}{2!}(\theta - \tilde\theta)' \left[\frac{\partial^2 f}{\partial \theta\, \partial \theta'}\right]_{\tilde\theta} (\theta - \tilde\theta)$$
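Or:
$$f(\theta) \cong f(\tilde\theta) + (\theta - \tilde\theta)' g(\tilde\theta) + \frac{1}{2!}(\theta - \tilde\theta)' H(\tilde\theta)(\theta - \tilde\theta)$$
where $g(\cdot)$ is the gradient vector and $H(\cdot)$ is the Hessian matrix of $f$. So,
$$\frac{\partial f(\theta)}{\partial \theta} \cong 0 + g(\tilde\theta) + \frac{1}{2!}\, 2H(\tilde\theta)(\theta - \tilde\theta)$$
However, $g(\tilde\theta) = 0$, as $\tilde\theta$ locates a minimum. So,
$$(\theta - \tilde\theta) \cong H^{-1}(\tilde\theta) \left(\frac{\partial f(\theta)}{\partial \theta}\right);$$
or,
$$\tilde\theta \cong \theta - H^{-1}(\tilde\theta)\, g(\theta)$$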
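This suggests a numerical algorithm. Set $\theta = \theta_0$ to begin, and then iterate:
$$\begin{aligned}
\theta_1 &= \theta_0 - H^{-1}(\theta_1)\, g(\theta_0) \\
\theta_2 &= \theta_1 - H^{-1}(\theta_2)\, g(\theta_1) \\
&\;\;\vdots \\
\theta_{n+1} &= \theta_n - H^{-1}(\theta_{n+1})\, g(\theta_n)
\end{aligned}$$
or, approximately:
$$\theta_{n+1} = \theta_n - H^{-1}(\theta_n)\, g(\theta_n)$$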
Stop if
$$\left| \frac{\theta_{n+1}^{(i)} - \theta_n^{(i)}}{\theta_n^{(i)}} \right| < \varepsilon^{(i)}; \quad i = 1, 2, \ldots, p$$
Note:
1. $s = 1$.
2. $d(\theta_n) = -H^{-1}(\theta_n)\, g(\theta_n)$.
3. The algorithm fails if $H$ ever becomes singular at any iteration.
4. We achieve a minimum of $f(\cdot)$ if $H$ is positive definite.
5. The algorithm may locate only a local minimum.
6. The algorithm may oscillate.

The algorithm can be given a nice geometric interpretation (scalar $\theta$).
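A minimal scalar sketch of the iteration, with $s = 1$ and $d(\theta_n) = -H^{-1}(\theta_n)\, g(\theta_n)$. Two details are our own assumptions rather than part of the rule stated above: the change is measured relative to $\max(|\theta_n|, 1)$, so the rule still works when the solution is at or near zero, and the loop gives up after `max_iter` iterations as a guard against oscillation. The test function $f(\theta) = \exp(\theta) - 2\theta$, minimized at $\theta = \ln 2$, is a hypothetical example.

```python
import math


def newton_raphson(g, H, theta0, tol=1e-10, max_iter=100):
    """Minimize a scalar function f given its gradient g and Hessian H.

    Iterates theta_{n+1} = theta_n - g(theta_n) / H(theta_n)  (step-length s = 1).
    Stops when |theta_{n+1} - theta_n| < tol * max(|theta_n|, 1).
    """
    theta = theta0
    for _ in range(max_iter):
        h = H(theta)
        if h == 0.0:
            # Note 3: the step is undefined when H is singular.
            raise ZeroDivisionError(f"Hessian is singular at theta = {theta}")
        theta_new = theta - g(theta) / h
        if abs(theta_new - theta) < tol * max(abs(theta), 1.0):
            return theta_new
        theta = theta_new
    # Note 6: the iterates may oscillate instead of settling down.
    raise RuntimeError(f"no convergence after {max_iter} iterations")


def g(t):
    """Gradient of the hypothetical f(t) = exp(t) - 2t."""
    return math.exp(t) - 2.0


def H(t):
    """Hessian of f; positive everywhere, so the stationary point is a minimum (Note 4)."""
    return math.exp(t)


print(newton_raphson(g, H, theta0=0.0), math.log(2.0))
```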
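To find an extremum of $f(\cdot)$, solve
$$\frac{\partial f(\theta)}{\partial \theta} = g(\theta) = 0.$$
[Figure: $\theta$ axis with $\theta_0$, $\theta_1$ and $\theta_{\min}$ marked]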
[Figure: $\theta$ axis with $\theta_0$, $\theta_1$, $\theta_2$, $\theta_{\min}$ and $\theta_{\max}$ marked]
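The tangent to $g(\theta)$ at $\theta_0$ has slope $H(\theta_0)$ and crosses zero at $\theta_1$:
$$g(\theta_0) = H(\theta_0)(\theta_0 - \theta_1) \;\Rightarrow\; \theta_1 = \theta_0 - H^{-1}(\theta_0)\, g(\theta_0)$$
$$\theta_{n+1} = \theta_n - H^{-1}(\theta_n)\, g(\theta_n)$$
[Figure: $\theta$ axis with $\theta_0$, $\theta_1$ and $\theta_{\min}$ marked]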
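If $f(\theta)$ is quadratic in $\theta$, then the algorithm converges in one iteration:
- If the function is quadratic, then its gradient is linear, so the tangent to $g(\theta)$ at $\theta_0$ coincides with $g(\theta)$ itself and its zero, $\theta_1$, is exactly the stationary point ($\theta_{\min}$ in the figure).

[Figure: $\theta$ axis with $\theta_0$, $\theta_1$ and $\theta_{\min}$ marked]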
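The one-step property can be checked numerically with a two-parameter quadratic, $f(\theta) = \tfrac{1}{2}\theta' A \theta - b'\theta$. The matrix $A$ and vector $b$ are hypothetical values; the gradient $A\theta - b$ is linear and the Hessian is the constant matrix $A$, so a single Newton step from any starting value lands on the minimizer $A^{-1}b$.

```python
def solve_2x2(A, v):
    """Solve A z = v for a 2x2 matrix A by Cramer's rule."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    if det == 0.0:
        raise ZeroDivisionError("singular Hessian")
    return [(a22 * v[0] - a12 * v[1]) / det, (a11 * v[1] - a21 * v[0]) / det]


def gradient(A, b, theta):
    """g(theta) = A theta - b for f(theta) = 0.5 theta' A theta - b' theta."""
    return [A[0][0] * theta[0] + A[0][1] * theta[1] - b[0],
            A[1][0] * theta[0] + A[1][1] * theta[1] - b[1]]


def newton_step(A, b, theta):
    """One Newton-Raphson step: theta - H^{-1} g(theta), with H = A."""
    step = solve_2x2(A, gradient(A, b, theta))
    return [theta[0] - step[0], theta[1] - step[1]]


A = [[4.0, 1.0], [1.0, 3.0]]  # symmetric, positive definite Hessian
b = [1.0, 2.0]
for theta0 in ([0.0, 0.0], [10.0, -5.0], [-3.0, 7.0]):
    theta1 = newton_step(A, b, theta0)
    print(theta0, "->", theta1, "gradient:", gradient(A, b, theta1))
```

Every starting value reaches $A^{-1}b = (1/11,\; 7/11)'$ in one step, where the gradient is zero up to rounding.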
In general, different choices of $\theta_0$ may lead to different solutions, or no solution at all.

[Figure: $\theta$ axis with $\theta_0$, two local maxima ($\theta_{\max}$) and two local minima ($\theta_{\min}$) marked]
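The dependence on $\theta_0$ is easy to reproduce with the hypothetical quartic $f(\theta) = \theta^4/4 - \theta^2/2$, which has local minima at $\theta = \pm 1$ and a local maximum at $\theta = 0$. Newton-Raphson only solves $g(\theta) = 0$, so a start near zero converges to the maximum, where $H < 0$ (compare Notes 4 and 5), and a start where $H(\theta) = 3\theta^2 - 1$ is zero ($\theta = \pm 1/\sqrt{3}$) would make the step undefined (Note 3). The stopping rule uses the same assumed $\max(|\theta_n|, 1)$ scaling as a safeguard.

```python
def newton_quartic(theta, tol=1e-12, max_iter=200):
    """Newton-Raphson on f(theta) = theta**4/4 - theta**2/2.

    g(theta) = theta**3 - theta and H(theta) = 3*theta**2 - 1.
    Stationary points: theta = -1 and 1 (minima) and theta = 0 (a maximum).
    """
    for _ in range(max_iter):
        g = theta ** 3 - theta
        H = 3.0 * theta ** 2 - 1.0
        if H == 0.0:
            raise ZeroDivisionError(f"Hessian is singular at theta = {theta}")
        theta_new = theta - g / H
        if abs(theta_new - theta) < tol * max(abs(theta), 1.0):
            return theta_new
        theta = theta_new
    raise RuntimeError("no convergence")


for theta0 in (2.0, -2.0, 0.3):
    root = newton_quartic(theta0)
    curvature = 3.0 * root ** 2 - 1.0  # H at the point found
    kind = "minimum" if curvature > 0 else "maximum"
    print(f"theta0 = {theta0:+.1f} -> theta = {root:+.6f} ({kind})")
```

Checking the sign of $H$ at the point returned, as the loop does, is a cheap way to detect that the algorithm has found a maximum rather than a minimum.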