Topic 5: Non-Linear Relationships and Non-Linear Least Squares - PowerPoint PPT Presentation



SLIDE 1

Non-linear Relationships

Many relationships between variables are non-linear. (Examples) OLS may not work (recall A.1); it may be biased and inconsistent. In other situations we may still be able to use OLS, either by approximating the non-linear relationship or by appropriately transforming the population model.


SLIDE 2

• The models we’ve worked with so far have been linear in the parameters.
• They’ve been of the form: y = Xβ + ε
• Many models based on economic theory are actually non-linear in the parameters.
• In general:

y = f(θ; X) + ε

where f is non-linear.
• Note the linear model is a special case.

SLIDE 3

Transforming a non-linear population model

Cobb-Douglas production function:

Y = A·K^(β2)·L^(β3)·ε

By taking logs, the Cobb-Douglas production function can be rewritten as:

log Y = β1 + β2 log K + β3 log L + log(ε)

where β1 = log A. This model now satisfies A.1 (linear in the parameters); however, it is not advisable to estimate it by OLS in most cases. Santos Silva and Tenreyro (2006)¹: if log(ε) is heteroskedastic (it likely is), X and ε are not independent!

¹ Santos Silva, J.M.C. and Tenreyro, S. (2006). “The Log of Gravity.” The Review of Economics and Statistics.

SLIDE 4

“It may be surprising that the pattern of heteroscedasticity … can affect the consistency of an estimator, rather than just its efficiency. The reason is that the nonlinear transformation … changes the properties of the error term in a nontrivial way.”

Approximations

Some mathematical properties may be exploited in order to approximate the function f(θ; X):
• Polynomials
• Logarithms
• Dummy variables

SLIDE 5

Polynomial Regression Model

One way to characterize the non-linear relationship between y and x is to say that the marginal effect of x on y depends on the value of x itself.
• Just include powers of the regressors on the right-hand side
• Not a violation of A.2
• e.g. y = β0 + β1x + β2x² + β3x³ + ⋯ + ε
• Take the derivative: dy/dx = β1 + 2β2x + 3β3x² + ⋯
• Choosing β approximates the non-linear function f
• The validity of the approximation is based on a Taylor-series expansion
• The appropriate order of the polynomial may be determined through a series of t-tests
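A minimal sketch of the idea (not from the slides; the data-generating function, sample size, and noise level are hypothetical choices for illustration): a polynomial approximation is fitted by ordinary least squares on powers of the regressor, and the marginal effect is read off the derivative.

```python
import numpy as np

# Approximate an unknown non-linear relationship by OLS on powers of x.
# The data-generating process (y = exp(0.5*x) + noise) is hypothetical.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(-2, 2, n)
y = np.exp(0.5 * x) + rng.normal(scale=0.1, size=n)

# Cubic polynomial design matrix: [1, x, x^2, x^3]
X = np.column_stack([x**p for p in range(4)])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Marginal effect of x on y at a point x0, from the derivative of the polynomial
x0 = 1.0
marginal_effect = beta_hat[1] + 2 * beta_hat[2] * x0 + 3 * beta_hat[3] * x0**2
```

Here the estimated marginal effect at x0 = 1 should be close to the true derivative 0.5·exp(0.5x0), and t-tests on the highest-order coefficient can guide the choice of polynomial order.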
SLIDE 6

Logarithms

Can take the logarithm of the LHS and/or RHS variables.
• The βs have approximate percentage-change interpretations
• log-lin, lin-log, log-log

For example: log(wage) = β0 + β1 educ + β2 female + ⋯ + ε
• Take the derivative w.r.t. educ
• A one-unit change in educ leads to a multiplicative change of exp(β1) in wage
• approximately a 100β1% change (approximation based on the Taylor-series expansion of exp(x))
• females make 100[exp(β2) − 1]% more than males
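The exact versus approximate percentage-change interpretation can be made concrete with a quick computation (the coefficient value 0.08 is hypothetical):

```python
import math

# Log-lin model log(wage) = b0 + b1*educ + ...: for a hypothetical b1 = 0.08,
# one more unit of educ multiplies wage by exp(b1).
b1 = 0.08
exact_pct = 100 * (math.exp(b1) - 1)   # exact percentage change, about 8.33%
approx_pct = 100 * b1                  # first-order approximation exp(b1) ~ 1 + b1
```

The gap between the exact and approximate figures grows with the size of the coefficient, which is why the 100β% shorthand is safest for small coefficients.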

SLIDE 7

Dummy variables – Splines

There may be a “break” in the model so that it is “piecewise” linear.
• Example: wage before and after age = 18.
• “Knots” and dummy variables
• [pictures and notes]
• Nothing in the unrestricted estimators ensures the two functions join at the knot
• Use RLS (restricted least squares) to impose continuity
• Multiple knots can be introduced
• The location of the knots can be arbitrary, leading to nonparametric kernel regression
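A sketch of a single-knot linear spline for the wage/age example (all numbers are simulated; continuity at the knot is imposed by construction through a hinge regressor rather than by restricted least squares):

```python
import numpy as np

# Piecewise-linear ("linear spline") wage equation with one knot at age = 18.
# The simulated slopes (0.2 before the knot, 1.2 after) are hypothetical.
rng = np.random.default_rng(1)
age = rng.uniform(14, 30, 300)
wage = 5 + 0.2 * age + 1.0 * np.maximum(age - 18, 0) + rng.normal(scale=0.5, size=300)

# Regressors: intercept, age, and the hinge (age - 18)_+ ; the hinge lets the
# slope change at the knot while keeping the two segments joined there.
X = np.column_stack([np.ones_like(age), age, np.maximum(age - 18, 0)])
b, *_ = np.linalg.lstsq(X, wage, rcond=None)
# b[1] is the slope before age 18; b[1] + b[2] is the slope after.
```

Using a plain dummy for age > 18 instead of the hinge term would estimate two disconnected segments, which is the case where restrictions are needed to make them join.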

SLIDE 8

Non-linear population models

There are many situations where transformation/approximation of the non-linear model is not desirable/possible, and the non-linear population model should be estimated directly.

• CES production function:

Y_i = γ[δ K_i^(−ρ) + (1 − δ) L_i^(−ρ)]^(−v/ρ) exp(ε_i)

or, ln(Y_i) = ln(γ) − (v/ρ) ln[δ K_i^(−ρ) + (1 − δ) L_i^(−ρ)] + ε_i

• Linear Expenditure System (Stone, 1954):

Max U(q) = Σ_i β_i ln(q_i − γ_i)   (Stone-Geary / Klein-Rubin)

s.t. Σ_i p_i q_i = M

SLIDE 9

Yields the following system of demand equations:

p_i q_i = γ_i p_i + β_i (M − Σ_k γ_k p_k) ;  i = 1, 2, …, n

The β_i's are the marginal budget shares, so we require that 0 < β_i < 1 ; i = 1, 2, …, n.

• Box-Cox transform (often applied to positive-valued variables)
• “Limited dependent variables”
  • y must be positive (or negative)
  • y is a dummy
  • y is an integer
SLIDE 10

In general, suppose we have a single non-linear equation:

y_i = f(x_i1, x_i2, …, x_ik; θ_1, θ_2, …, θ_p) + ε_i

• We can still consider a “least squares” approach.
• The non-linear least squares (NLLS) estimator is the vector θ̂ that minimizes the quantity:

S(X, θ) = Σ_i [y_i − f_i(X, θ)]²

• Clearly the usual LS estimator is just a special case of this.
• To obtain the estimator, we differentiate S with respect to each element of θ, set up the p first-order conditions, and solve.
• Difficulty: usually the first-order conditions are themselves non-linear in the unknowns (the parameters).
• This means there is (generally) no exact, closed-form solution; we can't write down an explicit formula for the estimators of the parameters.

SLIDE 11

Example

y_i = θ1 + θ2 x_i2 + θ3 x_i3 + (θ2 θ3) x_i4 + ε_i

S = Σ_i [y_i − θ1 − θ2 x_i2 − θ3 x_i3 − (θ2 θ3) x_i4]²

∂S/∂θ1 = −2 Σ_i [y_i − θ1 − θ2 x_i2 − θ3 x_i3 − (θ2 θ3) x_i4]

∂S/∂θ2 = −2 Σ_i [(θ3 x_i4 + x_i2)(y_i − θ1 − θ2 x_i2 − θ3 x_i3 − θ2 θ3 x_i4)]

∂S/∂θ3 = −2 Σ_i [(θ2 x_i4 + x_i3)(y_i − θ1 − θ2 x_i2 − θ3 x_i3 − θ2 θ3 x_i4)]
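A numerical sketch of estimating this model (simulated data; the true parameter values, starting values, and the use of Gauss-Newton, a common variant of Newton's method for least-squares problems that drops second-derivative terms, are assumptions for illustration, not from the slides):

```python
import numpy as np

# Gauss-Newton estimation of y = t1 + t2*x2 + t3*x3 + (t2*t3)*x4 + e.
rng = np.random.default_rng(2)
n = 500
x2, x3, x4 = rng.normal(size=(3, n))
t_true = np.array([1.0, 0.5, -0.8])          # hypothetical true values
y = (t_true[0] + t_true[1] * x2 + t_true[2] * x3
     + t_true[1] * t_true[2] * x4 + rng.normal(scale=0.1, size=n))

t = np.array([0.0, 0.1, 0.1])                # starting values
for _ in range(50):
    resid = y - (t[0] + t[1]*x2 + t[2]*x3 + t[1]*t[2]*x4)
    J = np.column_stack([np.ones(n),         # d f / d t1
                         x2 + t[2] * x4,     # d f / d t2
                         x3 + t[1] * x4])    # d f / d t3
    step = np.linalg.solve(J.T @ J, J.T @ resid)
    t = t + step
    if np.max(np.abs(step)) < 1e-10:
        break
```

The Jacobian columns are exactly the bracketed terms in the first-order conditions above, and setting `step` to zero reproduces those conditions.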

SLIDE 12

Setting these 3 equations to zero, we can't solve analytically for the estimators of the three parameters.

• In situations such as this, we need to use a numerical algorithm to obtain a solution to the first-order conditions.
• There are lots of methods for doing this; one possibility is Newton's algorithm (the Newton-Raphson algorithm).

Methods of Descent

θ̃ = θ0 + s·d(θ0)

θ0 = initial (vector) value
s = step-length (positive scalar)
d(·) = direction vector

SLIDE 13

• Usually, d(·) depends on the gradient vector at θ0.
• It may also depend on the change in the gradient (the Hessian matrix) at θ0.
• Some specific algorithms in the “family” make the step-length a function of the Hessian.
• One very useful, specific member of the family of “descent methods” is the Newton-Raphson algorithm:

Suppose we want to minimize some function f(θ). Approximate the function using a Taylor-series expansion about θ̃, the vector value that minimizes f(θ):

f(θ) ≅ f(θ̃) + (θ − θ̃)′ (∂f/∂θ)|θ̃ + (1/2!) (θ − θ̃)′ [∂²f/∂θ∂θ′]|θ̃ (θ − θ̃)

SLIDE 14

Or:

f(θ) ≅ f(θ̃) + (θ − θ̃)′ g(θ̃) + (1/2!)(θ − θ̃)′ H(θ̃)(θ − θ̃)

where g(·) is the gradient vector and H(·) is the Hessian matrix. So,

∂f(θ)/∂θ ≅ 0 + g(θ̃) + H(θ̃)(θ − θ̃)

However, g(θ̃) = 0, as θ̃ locates a minimum.

So, (θ − θ̃) ≅ H⁻¹(θ̃) (∂f(θ)/∂θ) ;

or, θ̃ ≅ θ − H⁻¹(θ̃) g(θ)

SLIDE 15

This suggests a numerical algorithm: set θ = θ0 to begin, and then iterate:

θ1 = θ0 − H⁻¹(θ1) g(θ0)
θ2 = θ1 − H⁻¹(θ2) g(θ1)
⋮
θ_{n+1} = θ_n − H⁻¹(θ_{n+1}) g(θ_n)

or, approximately:

θ_{n+1} = θ_n − H⁻¹(θ_n) g(θ_n)

SLIDE 16

Stop if |(θ_{n+1}^(i) − θ_n^(i)) / θ_n^(i)| < ε^(i) ;  i = 1, 2, …, p

Note:
1. s = 1.
2. d(θ_n) = −H⁻¹(θ_n) g(θ_n).
3. The algorithm fails if H ever becomes singular at any iteration.
4. It achieves a minimum of f(·) if H is positive definite.
5. The algorithm may locate only a local minimum.
6. The algorithm may oscillate.

The algorithm can be given a nice geometric interpretation for scalar θ.
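A compact sketch of the iteration with the relative-change stopping rule above (the objective function, starting values, and tolerance are hypothetical choices for illustration):

```python
import numpy as np

# Newton-Raphson with the relative-change stopping rule, applied to a
# hypothetical two-parameter objective f(t) = (t1 - 1)^4 + (t2 + 2)^2,
# whose minimum is at (1, -2).
def grad(t):
    return np.array([4 * (t[0] - 1)**3, 2 * (t[1] + 2)])

def hess(t):
    return np.diag([12 * (t[0] - 1)**2, 2.0])

t = np.array([3.0, 1.0])                        # theta_0, the initial value
for _ in range(200):
    # one Newton step with s = 1: t - H^{-1}(t_n) g(t_n)
    t_new = t - np.linalg.solve(hess(t), grad(t))
    if np.all(np.abs((t_new - t) / t) < 1e-8):  # stop rule, per parameter
        t = t_new
        break
    t = t_new
```

Swapping in the gradient and Hessian of S(θ) from the NLLS problem gives the estimator directly; notes 3-6 above describe what can go wrong (singular H, local minima, oscillation).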

SLIDE 17

To find an extremum of f(·), solve

∂f(θ)/∂θ = g(θ) = 0.

[Figure: g(θ) plotted against θ, with θ0, θ1, and θmin marked.]

SLIDE 18

[Figure: g(θ) plotted against θ, with θ0, θ1, θ2, θmin, and θmax marked.]

SLIDE 19

[Figure: g(θ) against θ, with θ0, θ1, and θmin marked. The tangent to g at θ0 has slope g(θ0)/(θ0 − θ1) = H(θ0), so:]

θ1 = θ0 − H⁻¹(θ0) g(θ0)

and in general: θ_{n+1} = θ_n − H⁻¹(θ_n) g(θ_n)

SLIDE 20

If f(θ) is quadratic in θ, then the algorithm converges in one iteration. If the function is quadratic, then its gradient is linear:

[Figure: the linear gradient g(θ) plotted against θ, with θ0 and θmin marked; one Newton step from θ0 lands exactly at θmin.]

SLIDE 21

In general, different choices of θ0 may lead to different solutions, or no solution at all.

[Figure: a gradient g(θ) with multiple roots plotted against θ, with θ0 and two local minima (θmin) and two local maxima (θmax) marked.]

SLIDE 22

The Hessian is singular:

[Figure: g(θ) against θ, with θ0 at a point where g has zero slope, so the Newton step is undefined.]

SLIDE 23

The algorithm “cycles”:

[Figure: g(θ) against θ, with the iterates θ0 and θ1 jumping back and forth between the same two points.]

SLIDE 24

Example (where we actually know the answer)

f(θ) = 3θ⁴ − 4θ³ + 1 ;  locate the minimum.

Analytically:

g(θ) = 12θ³ − 12θ² = 12θ²(θ − 1)
H(θ) = 36θ² − 24θ = 12θ(3θ − 2)

Turning points at θ = 0, 1.

H(0) = 0 : saddlepoint
H(1) = 12 : minimum

Algorithm:

θ_{n+1} = θ_n − H⁻¹(θ_n) g(θ_n)

SLIDE 25

θ0 = 2 (say)

θ1 = 2 − (48/96) = 1.5

θ2 = 1.5 − (13.5/45) = 1.2

θ3 = 1.2 − (3.456/23.040) = 1.05

etc.

Try: θ0 = −2 ;  θ0 = 0.5
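The iterations above can be checked in Python (a straightforward transcription of the example's g, H, and update rule):

```python
# Check of the worked example: minimize f(t) = 3t^4 - 4t^3 + 1 by
# Newton-Raphson, t_{n+1} = t_n - g(t_n)/H(t_n).
def g(t):   # gradient: 12t^3 - 12t^2 = 12t^2(t - 1)
    return 12 * t**3 - 12 * t**2

def H(t):   # Hessian: 36t^2 - 24t = 12t(3t - 2)
    return 36 * t**2 - 24 * t

t = 2.0
path = [t]
for _ in range(25):
    t = t - g(t) / H(t)
    path.append(t)
# path begins 2, 1.5, 1.2, 1.05, ... and converges to the minimum at t = 1
```

Re-running with t = 0.5 shows one of the failure modes from the notes: the first step lands exactly on t = 0, where H(0) = 0 and the next step is undefined.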