
Poli 5D Social Science Data Analytics

Regression in Stata

Shane Xinyang Xuan
ShaneXuan.com
February 10, 2017

ShaneXuan.com 1 / 10


Contact Information

Shane Xinyang Xuan
xxuan@ucsd.edu

The teaching staff is a team!
– Professor Roberts: M 1600-1800 (SSB 299)
– Jason Bigenho: Th 1000-1200 (Econ 116)
– Shane Xuan: M 1100-1150 (SSB 332), Th 1200-1250 (SSB 332)

Supplemental Materials
– UCLA STATA starter kit: http://www.ats.ucla.edu/stat/stata/sk/
– Princeton data analysis: http://dss.princeton.edu/training/


Road map

Some quick notes before we start today’s section:
– Make sure that you pass around the attendance sheet
– Open a .do file
– Import your data (“h1 fams data.xlsx”)
– I will be using my slides, and you will need to type the code in your .do file
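The import step can be sketched in a .do file as follows. This is a minimal sketch, not the course's official setup: it assumes the spreadsheet sits in Stata's current working directory, that its first row holds the variable names, and that the file name on disk matches the one quoted above (adjust the name if yours differs).

```stata
* Assumption: the file is in the current working directory (check with pwd)
pwd

* Assumption: the first row of the sheet holds variable names, hence firstrow
* clear drops whatever data is currently in memory before importing
import excel using "h1 fams data.xlsx", firstrow clear

* Quick look at what was imported
describe
summarize
```

Running `describe` right after the import is a cheap way to confirm that the variable names and types came through as expected before typing any regression commands.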


Regression: Examples!

Figure: Data points


Figure: Bad fit


Figure: Good fit


Model

– Population: $y_i = \beta_0 + \beta_1 x_i$
– Estimation: $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{e}_i$
– (You don’t need to memorize this) The regression coefficient is calculated by

$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
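To see the slope formula at work, the sketch below computes it by hand and compares the result with `regress`. It is only an illustration under assumptions: Stata's built-in `auto` data stands in for the course data, with `mpg` playing the role of x and `price` the role of y. The intercept line uses the standard identity $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, which is not on the slide.

```stata
* Assumption: built-in auto data as a stand-in (x = mpg, y = price)
sysuse auto, clear

* Sample means of x and y
quietly summarize mpg
scalar xbar = r(mean)
quietly summarize price
scalar ybar = r(mean)

* Per-observation terms of the numerator and denominator
generate double num = (mpg - xbar) * (price - ybar)
generate double den = (mpg - xbar)^2

* Sum each column
quietly summarize num
scalar sum_num = r(sum)
quietly summarize den
scalar sum_den = r(sum)

* Slope by hand; intercept from the means (identity not on the slide)
scalar b1 = sum_num / sum_den
scalar b0 = ybar - b1 * xbar
display "by hand: slope = " b1 ", intercept = " b0

* Same numbers from regress; assert stops the do-file if they differ
regress price mpg
assert reldif(b1, _b[mpg]) < 1e-6
assert reldif(b0, _b[_cons]) < 1e-6
```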


Interpretation of regression coefficient

Suppose we have the model $y = \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_0 + \hat{e}$

◮ A 1-unit change in $x_1$ is associated with a $\hat{\beta}_1$-unit change in $y$, all else equal.

◮ A 1-unit change in $x_2$ is associated with a $\hat{\beta}_2$-unit change in $y$, all else equal.
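The "all else equal" reading can be checked numerically. The sketch below is an illustration under assumptions, using Stata's built-in `auto` data as a stand-in (y = `price`, x1 = `mpg`, x2 = `weight`); the values 20, 21 and 3000 are arbitrary picks for the demonstration.

```stata
* Assumption: built-in auto data as a stand-in for y, x1, x2
sysuse auto, clear
regress price mpg weight

* Predicted price at mpg = 20 and at mpg = 21,
* holding weight at the same (arbitrary) value both times
scalar yhat20 = _b[_cons] + _b[mpg] * 20 + _b[weight] * 3000
scalar yhat21 = _b[_cons] + _b[mpg] * 21 + _b[weight] * 3000

* The gap between the two predictions is the coefficient on mpg
display "difference in predictions = " yhat21 - yhat20
display "coefficient on mpg        = " _b[mpg]
assert reldif(yhat21 - yhat20, _b[mpg]) < 1e-6
```

Changing 3000 to any other weight leaves the difference unchanged, which is what holding the other variable fixed means in a linear model.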


Application

◮ Suppose consumption (cons) is a function of family income (inc):

$\text{cons} = \beta_0 + \beta_1 \text{inc} + u$

where $u$ contains other factors affecting consumption. What change do you expect to see in cons with a two-unit increase in inc?

◮ With a two-unit increase in inc, holding $u$ fixed, the new value of cons is

$\beta_0 + \beta_1(\text{inc} + 2) + u = \beta_0 + (\beta_1 \text{inc} + 2\beta_1) + u = (\beta_0 + \beta_1 \text{inc} + u) + 2\beta_1$

Thus, we see a $2\beta_1$ increase in cons with a 2-unit increase in inc!
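In Stata, the estimated counterpart of $2\beta_1$ can be obtained after a regression with `lincom`, which also reports a standard error for it. This is a sketch under assumptions: the built-in `auto` data stands in for the consumption example, with `price` in the role of cons and `mpg` in the role of inc.

```stata
* Assumption: built-in auto data as a stand-in (cons = price, inc = mpg)
sysuse auto, clear
regress price mpg

* Estimated change in y for a two-unit increase in x: 2 * beta1-hat
lincom 2*mpg

* lincom leaves its point estimate in r(estimate); it should equal 2 * _b[mpg]
assert reldif(r(estimate), 2 * _b[mpg]) < 1e-6
display "2 * coefficient = " 2 * _b[mpg]
```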


Code

◮ Scatter plot: twoway (scatter povertyratio_mom age_mom, mlabsize(tiny) msize(tiny))

◮ Regression: regress povertyratio_mom age_mom

◮ Visualization: twoway (scatter povertyratio_mom age_mom, mlabsize(tiny) msize(tiny)) (lfit povertyratio_mom age_mom)


Residuals

◮ Fitted values
– Manually: gen fitted = -1.091357 + .1305531 * age_mom
– Stata command: predict fv

◮ Residuals
– Manually: gen resid = povertyratio_mom - fv
– Stata command: predict e, residual

Figure: Similar results for fitted values and residuals
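The manual and `predict` routes can be compared directly. The sketch below is an illustration under assumptions: it uses the built-in `auto` data rather than the course data, and it reads the coefficients from `_b[]` instead of retyping rounded numbers, so the two versions agree up to floating-point precision.

```stata
* Assumption: built-in auto data as a stand-in (y = price, x = mpg)
sysuse auto, clear
regress price mpg

* Fitted values: by hand from the stored coefficients, and via predict
generate double fitted = _b[_cons] + _b[mpg] * mpg
predict double fv

* Residuals: by hand (observed minus fitted), and via predict
generate double resid = price - fv
predict double e, residuals

* Both routes should match in every observation
assert reldif(fitted, fv) < 1e-6
assert reldif(resid, e) < 1e-6

* Eyeball the first few rows
list price fitted fv resid e in 1/5
```

Typing coefficients copied from the output table, as in the manual line above, gives values that match `predict` only to the number of digits shown, which is why the results are similar rather than identical.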


What else can you do using regressions?

◮ Suppose you run a regression of $y$ on $x_1$, and get residuals $\hat{e}$. You can then make a scatterplot of the residuals ($\hat{e}$) against a different variable ($x_2$) to see how much of the remaining variation can be explained by this variable:
– twoway scatter e x_2

◮ You can do a multiple regression
– regress y x_1 x_2 ...
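Put together, the two ideas above might look like this in a .do file. This is a sketch under assumptions, with the built-in `auto` data standing in for y, x_1 and x_2 (`price`, `mpg` and `weight`); the `correlate` line is an addition that puts a number on what the scatterplot shows.

```stata
* Assumption: built-in auto data as a stand-in (y = price, x_1 = mpg, x_2 = weight)
sysuse auto, clear

* Step 1: regress y on x_1 and keep the residuals
regress price mpg
predict e, residuals

* Step 2: plot the residuals against a variable left out of the model
twoway scatter e weight

* Optional: a single number summarizing the pattern in the plot
correlate e weight

* Step 3: add the second variable and run a multiple regression
regress price mpg weight
```

A visible pattern in the residual plot suggests the omitted variable carries information about y that x_1 alone does not capture.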
