


SLIDE 1

Announcements

Don’t forget about Problem Set 4
Midterm 2 is getting closer (Thursday, February 24)
Midterm 2 will cover all of the bivariate material:
    Chapters 5, 6, 7, 8
    Lectures 1-20-11 through 2-10-11
The old Midterm 2’s cover exactly the same material
Similar format to Midterm 1
The formula sheet you will get is posted on Smartsite

  • J. Parman (UC-Davis)

Analysis of Economic Data, Winter 2011 February 15, 2011 1 / 24

SLIDE 2

Multivariate Data

[Figure: NBA player data with three variables: points per game, assists per game, and annual salary (millions of $)]

SLIDE 3

Multivariate Data: Overview

We have seen how to analyze univariate data and bivariate data
Now it is time to move on to working with more than two variables
This is going to require a different set of techniques
Most of what we do in economics uses more than two variables, even if the question of interest is the relationship between x and y
Why? Because we’re never in a controlled environment: lots of things other than x and y are moving around

SLIDE 4

Multivariate Data: Overview

The general plan for studying multivariate data:
    Data description: graphical techniques
    Data description: regression
    Statistical inference: single slope (t-stats)
    Statistical inference: multiple slopes simultaneously (F-stats)

SLIDE 5

Graphing Multivariate Data

With three variables, you can do a three-way scatter plot (or a surface)
With three variables, you can also do a bubble chart (scatter plot with points of varying size)
With additional variables, you have to start getting creative (3-D surface with color, animation to show a time dimension, bubble plot with different colors, etc.)
An alternative is to produce a scatterplot for every pairing of variables (doesn’t really capture multivariate interactions)
To Excel for a bubble plot example (nba-data.xlsx)...

SLIDE 6

Graphing Multivariate Data

SLIDE 7

Graphing Multivariate Data

[Figure: bubble chart of ln(CO2 emissions per capita) against ln(consumption per capita)]

Size of data points is proportional to GDP per capita.

SLIDE 8

Graphing Multivariate Data

[Figure: scatter plots of city miles per gallon against engine displacement (liters), with points grouped by car class: Compact, Mid-size, Large]

SLIDE 9

Graphing Multivariate Data

[Figure] From "Natural Autoantibodies Reactive With Glycosaminoglycans in RA: Results", Gyorgy et al., Arthritis Research & Therapy. 2008;10(5)

SLIDE 10

Describing Multivariate Data with a Regression

Graphs aren’t going to get us too far with multivariate data
Instead, the most common approach is to use a multivariate regression
This approach assumes that we have one dependent variable of interest (y)
Now, we have several independent variables and need a little new notation

SLIDE 11

Multivariate Regression

We now have K random variables:

    $Y$: dependent variable, outcome, left-hand-side (LHS) variable
    $X_2, \dots, X_K$: covariates, explanatory variables, independent variables, right-hand-side (RHS) variables, regressors

With these K variables, we also have K unknown population parameters (K different β’s)

SLIDE 12

Multivariate Regression

Our model is now:

$$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_K X_K + \varepsilon$$

We want to estimate a ‘best-fit’ line:

$$\hat{y}_i = b_1 + b_2 x_{2i} + b_3 x_{3i} + \dots + b_K x_{Ki}$$

    $\hat{y}_i$: predicted value of $Y$ for individual $i$
    $x_{2i}, \dots, x_{Ki}$: values of $X_2, \dots, X_K$ for individual $i$
    $b_1$: intercept
    $b_k$: predicted $\Delta Y$ for a one-unit increase in $X_k$, holding all other X’s constant

SLIDE 13

Multivariate Regression

As an illustration, let’s think about a wage regression
Suppose we think wage (w) is a function of education (edu) and age (age), so we estimate the following best-fit line:

$$\hat{w}_i = b_1 + b_2\, edu_i + b_3\, age_i$$

$b_2$ is telling us $\Delta w / \Delta edu$ when age is held constant
$b_3$ is telling us $\Delta w / \Delta age$ when education is held constant

Note that these are not the same as the coefficients from doing two bivariate regressions (a bivariate regression doesn’t hold omitted variables constant)
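
To make the note above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the data are made up, wage is constructed exactly as w = 1 + 2 edu + 0.5 age with no noise, and the helper names are hypothetical. It uses the closed-form solution for a regression with two regressors and compares it with the slope from a bivariate regression of wage on education alone.

```python
def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Made-up data: older workers here tend to have less schooling,
# and wage is built exactly as w = 1 + 2*edu + 0.5*age (no noise).
edu = [10, 12, 12, 14, 16, 16, 18]
age = [50, 45, 30, 40, 28, 35, 25]
wage = [1 + 2 * e + 0.5 * a for e, a in zip(edu, age)]

# Bivariate regression of wage on edu alone: age is NOT held constant
b_edu_bivariate = cross(edu, wage) / cross(edu, edu)

# Regression of wage on edu and age together (closed form for two regressors)
s22, s33, s23 = cross(edu, edu), cross(age, age), cross(edu, age)
s2y, s3y = cross(edu, wage), cross(age, wage)
det = s22 * s33 - s23 ** 2
b_edu = (s2y * s33 - s3y * s23) / det   # slope on edu, age held constant
b_age = (s3y * s22 - s2y * s23) / det   # slope on age, edu held constant

print("bivariate slope on edu:   ", round(b_edu_bivariate, 4))
print("multivariate slope on edu:", round(b_edu, 4))
print("multivariate slope on age:", round(b_age, 4))
```

In this constructed example the multivariate slopes recover the values used to build the data, while the bivariate slope on education is pulled away from 2 because age moves together with education and is left out.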

SLIDE 14

Multivariate Regression

SLIDE 15

Multivariate Regression

So how do we get this best fit line?
Same way as before, minimize the distance of the $y_i$ values from the line
Recall that we did this by minimizing the average squared deviation of each $y_i$ from the line (the residual):

$$\min \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The difference now is that the minimization is done by choosing the values of K different coefficients

SLIDE 16

Multivariate Regression

$$\min_{b_1, \dots, b_K} \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$\min_{b_1, \dots, b_K} \frac{1}{n} \sum_{i=1}^{n} (y_i - b_1 - b_2 x_{2i} - \dots - b_K x_{Ki})^2$$

To minimize this, we would take a derivative with respect to each $b_k$ and set it equal to zero
This would give us K different equations to solve for K different unknowns
The solution gives us a way to calculate each $b_k$ as a function of our data
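
Written out, the K equations described above take the following form. This is a sketch of the standard derivation rather than something shown on the slide; the constant factor $-2/n$ that comes from differentiating the squared term has been divided out of each equation.

```latex
% Derivative with respect to the intercept b_1:
\sum_{i=1}^{n} \left( y_i - b_1 - b_2 x_{2i} - \dots - b_K x_{Ki} \right) = 0

% Derivative with respect to each slope b_k, for k = 2, \dots, K:
\sum_{i=1}^{n} x_{ki} \left( y_i - b_1 - b_2 x_{2i} - \dots - b_K x_{Ki} \right) = 0
```

Each equation is linear in $b_1, \dots, b_K$, which is why the system can be solved for the coefficients as functions of the data.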

SLIDE 17

Multivariate Regression

The coefficients aren’t hard to calculate if you know a little matrix algebra
We’ll just use Excel’s regression option to calculate them
In Excel, choose Regression from the Data Analysis menu
When you choose your x data, select all of the columns containing your independent variables (these columns need to be side by side)
The regression output will contain coefficients, standard errors, etc. for all of the variables
To Excel and the NBA data...
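
For readers curious about the matrix algebra mentioned above, the coefficients are commonly written as the solution of the linear system (X'X) b = X'y, where X holds a column of ones followed by the regressors. The plain-Python sketch below solves that system by elimination. The function name `ols` and the small data set are hypothetical (the y values are constructed from known coefficients so the result can be checked); the course itself uses Excel for this step.

```python
def ols(rows, y):
    """Least-squares coefficients [b1, b2, ..., bK] from the normal equations.

    rows: list of observations, each a list [x2, ..., xK] (no intercept column)
    y:    list of dependent-variable values, one per observation
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # add the intercept column
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: columns are points and rebounds per game;
# y is constructed exactly as -1 + 0.07*points + 0.06*rebounds.
rows = [[10, 4], [20, 5], [15, 9], [25, 3], [8, 7]]
y = [-1 + 0.07 * p + 0.06 * r for p, r in rows]

coefs = ols(rows, y)
print([round(b, 4) for b in coefs])
```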

SLIDE 18

Multivariate Regression: Interpreting the Results

SUMMARY OUTPUT: ln(salary in millions) as dependent variable

Regression Statistics
Multiple R            0.63293872
R Square              0.40061142
Adjusted R Square     0.39612162
Standard Error        0.68450419
Observations          270

             Coefficients   Standard Error   t Stat      P-value
Intercept    -0.9728428     0.087267003      -11.1479    5.98E-24
points       0.07408318     0.008560474      8.654098    4.76E-16
rebounds     0.06056555     0.017484676      3.463922    0.00062
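
As a reading aid for the output above, the following Python sketch reproduces the t Stat column (coefficient divided by standard error) and uses the coefficients to form a prediction. The coefficients and standard errors are copied from the output; the example player (20 points and 5 rebounds per game) and the function name are hypothetical. Because the dependent variable is a log, exponentiating the prediction is only a rough conversion back to millions of dollars.

```python
import math

# (coefficient, standard error) pairs copied from the summary output
output = {
    "Intercept": (-0.9728428, 0.087267003),
    "points": (0.07408318, 0.008560474),
    "rebounds": (0.06056555, 0.017484676),
}

# t Stat = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in output.items()}

def predict_ln_salary(points, rebounds):
    """Predicted ln(salary in millions) from the fitted line."""
    return (output["Intercept"][0]
            + output["points"][0] * points
            + output["rebounds"][0] * rebounds)

# Hypothetical player: 20 points and 5 rebounds per game
ln_salary = predict_ln_salary(20, 5)
salary_millions = math.exp(ln_salary)  # rough conversion out of logs

# Percent change in salary associated with one more point per game,
# holding rebounds constant (exact conversion of the log coefficient)
pct_per_point = (math.exp(output["points"][0]) - 1) * 100

for name, t in t_stats.items():
    print(name, round(t, 4))
print("predicted ln(salary):", round(ln_salary, 4))
print("percent per extra point:", round(pct_per_point, 2))
```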

SLIDE 19

Multivariate Regression: Goodness of Fit

We can use the same methods as before to measure how good the fit of the regression line is:
    The standard error of the regression
    The $R^2$
We also have another measure called the adjusted $R^2$
All of these measures are reported in Excel’s regression output
SLIDE 20

Multivariate Regression: Goodness of Fit

The standard error of the regression:

$$s_e = \sqrt{\frac{1}{n - K} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

This measures the typical deviation of each $y_i$ from its predicted value (the square root of the average squared deviation, with $n - K$ in place of $n$)
It will be smaller the better our fit is, but its magnitude depends on the units in which we measure y
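
A minimal Python sketch of this calculation follows, assuming the formula is the square root of the sum of squared residuals divided by n − K. The observed and predicted values are made up for illustration, and the function name is hypothetical. The second call rescales y (for example, dollars instead of thousands of dollars) to show how the measure depends on units.

```python
import math

def regression_standard_error(y, y_hat, k):
    """sqrt( sum of squared residuals / (n - K) ), K counting the intercept."""
    n = len(y)
    ess = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return math.sqrt(ess / (n - k))

# Made-up observed values and fitted values from a line with K = 2
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.5, 1.5, 3.0, 4.5, 4.5]

se = regression_standard_error(y, y_hat, k=2)

# Same data measured in units 1000 times smaller
se_rescaled = regression_standard_error(
    [1000 * v for v in y], [1000 * v for v in y_hat], k=2
)

print(se, se_rescaled)
```

The fit is identical in both calls; only the units of y changed, and the standard error scaled with them.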

SLIDE 21

Multivariate Regression: Goodness of Fit

The $R^2$:

$$R^2 = 1 - \frac{ESS}{TSS}$$

$$ESS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

$R^2$ will be between 0 and 1; the closer it is to 1, the better the fit
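
The definition above can be sketched in a few lines of Python. The data are made up for illustration and the function name is hypothetical; ESS is the sum of squared residuals and TSS is the sum of squared deviations of y from its mean.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - ESS/TSS."""
    y_bar = sum(y) / len(y)
    ess = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # error sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - ess / tss

# Made-up observed values and fitted values
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.5, 1.5, 3.0, 4.5, 4.5]

r2 = r_squared(y, y_hat)
print(r2)
```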

SLIDE 22

Multivariate Regression: Goodness of Fit

The problem with $R^2$ is that it will automatically increase (or at least stay the same) whenever we add more regressors
We would like a measure that takes into account the number of regressors we use
For example, we might prefer a line that gives us an $R^2$ of .8 with only three regressors to a line that gives us an $R^2$ of .81 but uses thirty regressors
The adjusted $R^2$ is a variation on $R^2$ that penalizes models that use many variables

SLIDE 23

Multivariate Regression: Goodness of Fit

The adjusted $R^2$:

$$\bar{R}^2 = 1 - \frac{n - 1}{n - K} \cdot \frac{ESS}{TSS}$$

$$ESS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The adjusted $R^2$ will be at most 1 and will be closer to 1 the better the fit is (unlike $R^2$, it can fall below 0 when the fit is very poor)
Adding a regressor will raise the adjusted $R^2$ if it lowers the error sum of squares enough to offset the penalty for increasing K

SLIDE 24

Multivariate Regression: Goodness of Fit

Dependent variable: ln(wage)

Independent variables:    rebounds     rebounds, points

Regression Statistics
Multiple R                0.482165     0.632939
R Square                  0.232483     0.400611
Adjusted R Square         0.229619     0.396122
Standard Error            0.773133     0.684504
Observations              270          270
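
The Adjusted R Square figures in the two outputs above can be checked from R Square and the number of observations. The Python sketch below uses the fact that ESS/TSS = 1 − R², and assumes K counts the intercept, so K = 2 for the regression on rebounds alone and K = 3 for the regression on rebounds and points. The function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (n - 1)/(n - K) * (1 - R^2), since ESS/TSS = 1 - R^2."""
    return 1 - (n - 1) / (n - k) * (1 - r_squared)

# R Square and Observations copied from the two outputs
adj_rebounds = adjusted_r_squared(0.232483, 270, 2)  # rebounds only
adj_both = adjusted_r_squared(0.400611, 270, 3)      # rebounds and points

print(round(adj_rebounds, 6), round(adj_both, 6))
```

Both results agree with the reported Adjusted R Square values up to rounding. Adding points raises the adjusted measure here because the drop in the error sum of squares far outweighs the penalty for the extra coefficient.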
