Stata Bootcamp - STAMP

SLIDE 1

Motivation Stata Basics Datasets Summary Basics Regression Final Questions

Stata Bootcamp - STAMP

Denise Laroze

University of Essex

Department of Government

October 10, 2014

Denise Laroze University of Essex STAMP 1 of 30

SLIDE 2

Motivation

i) What is Stata?
ii) Why do we learn it?
iii) What are students expected to learn from STAMP?

Disclaimer

As with any programming environment, there are multiple ways of producing the same results. The options shown here are just one alternative; you may find that others suit you better. Please go out and find the one that works best for you.

SLIDE 3

Stata Basics

Open Stata and understand the layout

HELP!!

i) ‘Help’ icon in Stata
ii) UCLA Institute for Digital Research and Education
iii) Online tutorials (e.g. Princeton)
iv) Google?

‘log’ and ‘do’ files
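As a minimal sketch of how a log file works, the lines below open a log, run a couple of commands and close it again. The log file name is a hypothetical choice, and the `auto` dataset is the example data shipped with Stata, not one of the course files.

```stata
* Minimal sketch: record a session in a plain-text log file.
* "stamp_session.log" is a hypothetical file name.
capture log close                        // close any log left open earlier
log using "stamp_session.log", text replace

sysuse auto, clear                       // example dataset shipped with Stata
summarize price mpg                      // output is copied into the log

log close                                // stop recording
```

Saving these same lines in a file ending in .do and running it with `do filename.do` reproduces the session, which is the main reason to work from do-files rather than the command window.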

SLIDE 4

Creating and Manipulating Data

i) Generate data with certain characteristics using the gen command and a series of options (including rnormal(), rbinomial(), runiform(), if, =, !=). The structure of the command is: gen variablename = something
ii) Use existing data with the use and insheet commands
iii) Manipulate data with the recode, replace, rename, label define, and label values commands
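The commands listed above could be combined along the following lines. This is only a sketch: the variable names, the seed, the label names and the file names in the commented lines are all hypothetical, chosen for illustration.

```stata
* Sketch: generate and manipulate data (all names are illustrative).
clear
set seed 12345                           // assumed seed, for reproducibility
set obs 100                              // 100 empty observations

gen x    = rnormal(0, 1)                 // normal draw: mean 0, sd 1
gen coin = rbinomial(1, 0.5)             // 0/1 draw with probability 0.5
gen unif = runiform()                    // uniform draw on (0,1)

* "=" assigns a value; "==" and "!=" compare values inside an if qualifier
gen high = 0
replace high = 1 if x > 0 & coin != 0

recode coin (0 = 2)                      // turn the value 0 into 2
rename unif uniform_draw
label define coinlbl 1 "Heads" 2 "Tails"
label values coin coinlbl

* Loading existing data (hypothetical file names, so left commented out):
* use "mydata.dta", clear
* insheet using "mydata.csv", clear
```

In Stata 13 and later, `import delimited` supersedes `insheet`, although `insheet` continues to work.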

SLIDE 5

Tables and Graphs

As in any statistical package, Stata allows you to summarise your data by producing graphs and tables. These can be very simple or as sophisticated as you are willing to code. Open the ‘STAMP.do’ file.
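For a first impression of tables and graphs, a few one-line commands are enough. The sketch below uses the `auto` example dataset shipped with Stata rather than the contents of ‘STAMP.do’, which are not reproduced here.

```stata
* Sketch: simple tables and graphs with a built-in example dataset.
sysuse auto, clear

tabulate foreign                         // one-way frequency table
tabulate rep78 foreign                   // two-way cross-tabulation
summarize price mpg weight               // means, sd, min and max

histogram price                          // distribution of one variable
graph box mpg, over(foreign)             // box plot by group
```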

SLIDE 6

Exercise 1 - Generating and Using Data I

1. Generate a .do and a .log file
2. Generate a dataset with 1000 observations
3. Generate variable z1 from a normal distribution with mean=0 and sd=1. Create two other variables (z2 and z3) with different means and standard deviations. Then create a variable z4 that is equal to z2*z3. Create a histogram of each of the variables, summarize the data, and describe the differences and similarities between the variables you have just created.
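One possible way to approach steps 1 to 3 is sketched below. It is not the content of the course solution file; the seed, the log file name and the means and standard deviations chosen for z2 and z3 are assumptions.

```stata
* Sketch for Exercise 1, steps 1-3 (parameter choices are assumptions).
capture log close
log using "exercise1.log", text replace  // hypothetical log file name

clear
set seed 2014                            // assumed seed
set obs 1000

gen z1 = rnormal(0, 1)                   // mean 0, sd 1 (as in the exercise)
gen z2 = rnormal(5, 2)                   // assumed mean and sd
gen z3 = rnormal(-3, 0.5)                // assumed mean and sd
gen z4 = z2 * z3

* One histogram per variable, each kept in memory under its own name
foreach v of varlist z1-z4 {
    histogram `v', name(hist_`v', replace)
}

summarize z1 z2 z3 z4
```

The log is left open here because the exercise closes it only in its final step.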

SLIDE 7

Exercise 1 - Generating and Using Data II

4. Create a new variable “Fun” with 4 categories (0-3). Recode category 3 and replace it with the value 99. After you have finished, add labels to each of the categories. The first three should be activities that are fun for you and the last is a category for “Don’t know”.
5. Why is recoding the “Don’t know” category to 99 a problem? What should you use instead?
6. Create dummy variables out of the categorical variable and rename each of the new variables according to what it represents.
7. Save your data as a .dta file and as a .csv.
8. Close the ‘log’ file
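Steps 4 to 8 might look roughly like this. Again, this is a sketch rather than the course solution: the activity labels, variable names and file names are invented for illustration.

```stata
* Sketch for Exercise 1, steps 4-8 (labels and file names are illustrative).
clear
set seed 2014                            // assumed seed
set obs 1000

gen fun = floor(4 * runiform())          // integers 0, 1, 2, 3
recode fun (3 = 99)                      // category 3 becomes 99

label define funlbl 0 "Reading" 1 "Cycling" 2 "Cooking" 99 "Don't know"
label values fun funlbl
tabulate fun

* One dummy per category: fun_1 ... fun_4, ordered by category value
tabulate fun, generate(fun_)
rename fun_1 reading
rename fun_2 cycling
rename fun_3 cooking
rename fun_4 dontknow

save "exercise1.dta", replace
export delimited using "exercise1.csv", replace

capture log close                        // closes the log if one is open
```

For step 5, it may help to run `summarize fun` and look at the mean, and to read `help missing` on Stata's missing-value codes (`.` and `.a` to `.z`).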

SLIDE 8

Solutions Exercise 1

For the solutions to Exercise 1 look at the ‘solutionsEx1.do’ file

SLIDE 9

Datasets

Quantitative research can take many shapes and forms; the only requirement is that it uses numbers. However, to conduct empirical research you have to understand what kind of data you are working with. There are several different types, for example:

1. Cross-Section
2. Time-Series
3. Time-Series Cross-Sections
4. Panel
5. Panel (wide) or Survival

SLIDE 10

Cross-Sectional Data

SLIDE 11

Time-Series

SLIDE 12

Panel

SLIDE 13

Survival Data (or Panel (wide) for STATA)

SLIDE 14

Merging Data

For most research projects you will need to combine (merge) datasets. (Sometimes you may even code data yourselves.) You can do this by hand, for example by copying and pasting in an Excel file, but that becomes inefficient very quickly. Stata can do the merging for you, but only if:

i) You have a ‘merging’ or ‘identifying’ variable
ii) The files you are merging have the same shape
iii) The data are in the same format (e.g. a .dta file)
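The three conditions above can be illustrated with two tiny hand-typed datasets. Everything in this sketch is made up for illustration (country codes, years, values and variable names); it shows only the mechanics of a 1:1 merge on two identifying variables.

```stata
* Sketch: 1:1 merge of two small datasets on country and year.
* All values are invented. Run as a do-file, since tempfile uses a local macro.
clear
input str3 country int year double gdp
"CHL" 2010 100
"CHL" 2011 105
"GBR" 2010 200
"GBR" 2011 202
end
tempfile gdpdata
save `gdpdata'                           // stored in .dta format

clear
input str3 country int year double inflation
"CHL" 2010 1.4
"CHL" 2011 3.3
"GBR" 2010 3.3
"GBR" 2011 4.5
end

* Both files are long (one row per country-year) and share the key variables
merge 1:1 country year using `gdpdata'
tabulate _merge                          // 3 = matched in both files
list, sepby(country)
```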

SLIDE 15

Stata Example - Datasets and Merging

Let’s see the difference between the datasets in practice and test how the merging command works. Go on to the ‘STAMP.do’ file

SLIDE 16

Exercise 2 - Merging Real Datasets I

For this exercise you will have to merge two datasets available on my webpage http://deniselaroze.wordpress.com/ and a third dataset of your choice, either from the World Bank or the IMF.

1. To start, look at the ‘STAMP merge.dta’ dataset. What type of dataset is it? What observations are included? How many observations does it have?
2. Now insheet the ‘Inflation IMF.csv’ dataset. What shape does it have? Can you merge it with ‘STAMP merge.dta’? Why or why not? (Hint: to merge data, both files have to be in the same ‘shape’ and have to merge on the same criteria/variables, for example country id and year.)
3. Once you have reshaped the data, merge both files.
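Reshaping is the step that usually causes trouble, so here is a sketch on invented data. It assumes a wide file with one row per country and one inflation column per year; the actual layout and variable names of the course files may differ, and the merge line is left commented out for that reason.

```stata
* Sketch: reshape a wide file (one column per year) into long format.
* Data and variable names are invented for illustration.
clear
input str3 country double infl2010 double infl2011
"CHL" 1.4 3.3
"GBR" 3.3 4.5
end

list                                     // wide: one row per country

* "infl" is the common stub; the year suffix becomes the new variable "year"
reshape long infl, i(country) j(year)

list                                     // long: one row per country-year

* With matching key variables in the other file, the merge would then be:
* merge 1:1 country year using "yourmasterfile.dta"
```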

SLIDE 17

Exercise 2 - Merging Real Datasets II

4. Go to the IMF or the World Bank databank and get a variable that you like. Download it, reshape it and merge it with the other datasets.
5. Choose two variables and create a scatter plot with an lfit line.
6. Obtain correlation coefficients of the variables in your new dataset (pwcorr var1 var2 ..., sig).
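Steps 5 and 6 follow the pattern below. The sketch uses the `auto` example dataset shipped with Stata in place of the merged data, so the variable names are stand-ins.

```stata
* Sketch: scatter plot with a linear fit line, then pairwise correlations.
sysuse auto, clear

twoway (scatter price mpg) (lfit price mpg)

* sig prints the significance level under each correlation coefficient
pwcorr price mpg weight, sig
```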

SLIDE 18

Solutions Exercise 2

For the solutions to Exercise 2 look at the ‘solutionsEx2.do’ file

SLIDE 19

Troubleshooting

Once you’ve managed to get all the data into one dataset with the correct shape for your objectives, remember to check whether the process has worked correctly. You may encounter more problems than you originally thought.

Potential problems

You may have:
i) Used 1:m or m:1 instead of 1:1
ii) Merged a file that uses ‘,’ instead of ‘.’ as the decimal separator, so the variable is now a string
iii) Accidentally included data with letters or symbols (e.g. e−8), so, again, the variable is a string

There are many problems and even more solutions (as long as you check!!!)
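A few quick checks catch most of these problems. The sketch below uses invented data in which a numeric variable has been read as a string because of decimal commas and a stray text entry; the variable names are hypothetical.

```stata
* Sketch: diagnose and repair a numeric variable stored as a string.
* Data are invented for illustration.
clear
input str3 country int year str8 infl
"CHL" 2010 "1,4"
"CHL" 2011 "3,3"
"GBR" 2010 "3,3"
"GBR" 2011 "n/a"
end

describe infl                            // storage type str8 signals a problem
isid country year                        // errors if the keys are not unique

* Replace decimal commas with points, then convert to numeric.
* force sets entries that are still non-numeric ("n/a") to missing.
replace infl = subinstr(infl, ",", ".", .)
destring infl, replace force

summarize infl                           // now numeric, with 3 valid values
```

After a merge, `tabulate _merge` is the equivalent check for whether observations matched as intended.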

SLIDE 20

Summary of Basics

Up to this point you have learned how to:
i) Introduce data (from different formats) into Stata
ii) Reshape and create data
iii) Create some basic graphs and tables

SLIDE 21

Important Commands

Who can tell me what these commands do?
i) gen
ii) merge
iii) use
iv) tab
v) sum
vi) set obs
vii) replace
viii) if
ix) histogram
x) rnormal(a,b)

SLIDE 22

Linear Regression - (O)LS

A linear regression is a function of the type:

$y = \beta_0 + \beta_1 X + e$

The function has two components:

1. The systematic component: $\beta_0 + \beta_1 X$
2. The random component: $e$

SLIDE 23

(O)LS - to STATA I

To implement a function like this in Stata, let’s look at the following example:

$\text{income} = 1000 + 20 \cdot \text{capabilities} + 500 \cdot \text{EssexMasters} + \text{luck}$

It is composed of the following elements:

1. a constant $\beta_0 = 1000$
2. a constant $\beta_1 = 20$
3. a variable capabilities (our $X$); let’s assume it has mean = 10 and standard deviation = 5
4. an EssexMasters dummy variable (another $X$), with coefficient 500
5. a luck variable (our $e$), and
6. a predicted income variable (our $y$ variable)

SLIDE 24

(O)LS - to STATA II

So to recreate this Data Generating Process (DGP) using Stata, we have to generate the constants, independent variable(s) and random component(s), and use those to generate y. Obviously, the variables in an (O)LS regression can be of different types. They can be continuous, discrete, dummies and categorical (or any sub-classification you have heard of). They can also have non-linear shapes such as $X^n$ and $\log(X)$. Let’s look at how this works in Stata.
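The income example can be simulated in a few lines. The coefficients and the distribution of capabilities come from the slides; the seed, the number of observations, the share of EssexMasters graduates and the spread of luck are assumptions made for this sketch.

```stata
* Sketch: simulate the income DGP and recover it with a regression.
clear
set seed 2014                            // assumed seed
set obs 1000                             // assumed sample size

gen capabilities = rnormal(10, 5)        // mean 10, sd 5 (from the slides)
gen essexmasters = rbinomial(1, 0.5)     // dummy; 50% share is an assumption
gen luck         = rnormal(0, 100)       // random component; sd is an assumption

* y is built from the constants, the X variables and the random component
gen income = 1000 + 20*capabilities + 500*essexmasters + luck

* The estimates should lie near 1000, 20 and 500, up to sampling noise
regress income capabilities essexmasters
```

A non-linear term is added the same way, for example by generating a squared variable first and including it in both the `gen income` line and the regression.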

SLIDE 25

Exercise 3. I

This last exercise will provide you with a better understanding of the linear model. To complete it you will need to use most of the commands we have seen throughout this course. Let’s start with a simple model:

$y_1 = \beta_0 + \beta_1 x_1 + \epsilon_1$

where $x_1 \sim N(0, 1)$, $\epsilon_1 \sim N(0, 100)$, $\beta_0 = 3$, $\beta_1 = 6$.

1. Create the model with 10 observations, run the corresponding regression (reg y1 x1) and save the output (try outreg2).
2. Now increase the observations to 1000 and create the model again. Compare the results: do they produce the values you expected? How do the two models differ? What happens to the coefficient and sd of the independent variable?
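The comparison between a small and a large sample can be sketched as follows. Two simplifications are assumptions of this sketch, not part of the exercise: the small sample is taken as the first 10 of 1000 simulated observations instead of being generated separately, and N(0, 100) is read as a variance of 100, so the standard deviation passed to rnormal() is 10. Built-in `estimates` commands are used to show the results side by side; outreg2, which the exercise suggests, is a user-written command that has to be installed first.

```stata
* Sketch for Exercise 3.I: same model, 10 versus 1000 observations.
clear
set seed 2014                            // assumed seed
set obs 1000

gen x1 = rnormal(0, 1)
gen e1 = rnormal(0, 10)                  // assumes N(0,100) states the variance
gen y1 = 3 + 6*x1 + e1                   // beta0 = 3, beta1 = 6

regress y1 x1 in 1/10                    // first 10 observations only
estimates store n10

regress y1 x1                            // all 1000 observations
estimates store n1000

* Coefficients with standard errors and sample sizes, side by side
estimates table n10 n1000, se stats(N)
```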

SLIDE 26

Exercise 3. II

Now let’s move on to a multivariate model:

$y_1 = \beta_0 + \beta_1 x_1 + \beta_2 z_1 + \epsilon_1$

where all the variables are the same as before, $\beta_2 = 0.3$ and $x_1 \sim N(10, 1)$.

1. Create the model with 1000 observations, run the corresponding regression and interpret the results. Are they reasonable? Why or why not?
2. Now create new data (with the same characteristics), but increase $\beta_2$ to 1. What happens now?

SLIDE 27

Exercise 3. III

Now let’s move on to causality. When is a regression causal? Try to understand it from this exercise. Create a dataset that has the following structure:

$y_2 = \beta_0 + \beta_1 x_1 + \beta_2 z_1 + \beta_3 w_1 + \epsilon_1$

But this time z1 and w1 will be correlated. To do so, create $w_1 = 4 \cdot z_1 \cdot \text{random}$, where ‘random’ is some random variable $N(0, 1)$, and set $\beta_3 = 4$.

1. Run the regression $y_2 = \beta_0 + \beta_1 x_1 + \beta_2 z_1 + \epsilon_1$ and interpret the results.
2. Now run the regression $y_2 = \beta_0 + \beta_1 x_1 + \beta_2 z_1 + \beta_3 w_1 + \epsilon_1$.
3. Compare the results. Are the coefficients what you expected?
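A sketch of this setup is given below. It assumes that the parameter values from the earlier parts carry over (β0 = 3, β1 = 6, β2 = 0.3, x1 with mean 10), that z1 is standard normal, and that the error has standard deviation 10; none of these is stated on this slide. The pwcorr line is an extra check that is not part of the exercise: because ‘random’ has mean zero, the linear correlation between z1 and w1 may turn out to be close to zero even though w1 depends on z1.

```stata
* Sketch for Exercise 3.III: regression with and without w1.
* Parameter values other than beta3 = 4 are assumptions carried over.
clear
set seed 2014                            // assumed seed
set obs 1000

gen x1     = rnormal(10, 1)
gen z1     = rnormal(0, 1)               // assumed distribution for z1
gen random = rnormal(0, 1)
gen w1     = 4 * z1 * random             // w1 is built from z1
gen e1     = rnormal(0, 10)              // assumed error sd

gen y2 = 3 + 6*x1 + 0.3*z1 + 4*w1 + e1

pwcorr z1 w1, sig                        // how correlated are they in practice?

regress y2 x1 z1                         // w1 omitted
regress y2 x1 z1 w1                      // full model
```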

SLIDE 28

Exercise 3. IV

Now let’s move on to graphs

1. Create a scatter plot of y1 and x1 and add a linear fit line to the graph (try lfit).
2. Now create a variable called u that is equal to x1 + 1.5 if the observation is in the first half of the dataset and equal to x1 − 1.5 if it is in the second half. Using the gen number = _n and if commands will be useful.
3. Make a scatter plot with y1, x1 and u and the corresponding linear fit lines. Use different colours for the x1 and u data points so you can observe the differences.
4. Why do you think you had to do the regression exercises above? What were you expected to learn from this process?
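The graph steps could be approached as in the sketch below. The data are regenerated with the parameters of the first regression exercise so the block runs on its own; the seed, the error standard deviation and the colour choices are assumptions.

```stata
* Sketch for Exercise 3.IV: scatter plots with fit lines and a shifted x.
clear
set seed 2014                            // assumed seed
set obs 1000

gen x1 = rnormal(0, 1)
gen y1 = 3 + 6*x1 + rnormal(0, 10)       // error sd of 10 is an assumption

* Step 1: scatter plot with a linear fit line
twoway (scatter y1 x1) (lfit y1 x1)

* Step 2: _n is the observation number, _N the total number of observations
gen number = _n
gen u = x1 + 1.5 if number <= _N/2
replace u = x1 - 1.5 if number > _N/2

* Step 3: both sets of points and both fit lines, in different colours
twoway (scatter y1 x1, mcolor(blue)) (scatter y1 u, mcolor(red)) ///
       (lfit y1 x1, lcolor(blue)) (lfit y1 u, lcolor(red))
```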

SLIDE 29

Solutions Exercise 3.

For the solutions to Exercise 3 look at the ‘solutionsEx3.do’ file

SLIDE 30

Q & A

This is the end of the beginning... What you have learned in these two days will be useful throughout the year and (maybe) in your future career. If there are any questions, please use this time to ask. Thank you for your patience and effort. Have a great term!
