Poli 5D Social Science Data Analytics: More on Stata (Shane Xinyang Xuan)

slide-1
SLIDE 1

Poli 5D Social Science Data Analytics

More on Stata
Shane Xinyang Xuan
ShaneXuan.com
February 1, 2017

ShaneXuan.com 1 / 12

slide-2
SLIDE 2

Contact Information

Shane Xinyang Xuan
xxuan@ucsd.edu

The teaching staff is a team!
– Professor Roberts: M 1600-1800 (SSB 299)
– Jason Bigenho: Th 1000-1200 (Econ 116)
– Shane Xuan: M 1100-1150 (SSB 332), Th 1200-1250 (SSB 332)

Supplemental Materials
– UCLA Stata starter kit: http://www.ats.ucla.edu/stat/stata/sk/
– Princeton data analysis: http://dss.princeton.edu/training/

slide-3
SLIDE 3

Road map

Some quick notes before we start today’s section:
– Make sure that you pass around the attendance sheet
– Open a .do file
– Import your data ("h1_fams_data.xlsx")
– I will be using my slides, and you will need to type the code in your .do file

slide-4
SLIDE 4

Announcement

I have changed my office hours to
– Monday 11-11:50 am
– Thursday 12-12:50 pm
in order to accommodate as many students as possible.

slide-9
SLIDE 9

Data management

– You should have the data imported before the section starts:
    cd "/Users/Shane/Dropbox/Poli5D/psets/"
    import excel "h1_fams_data.xlsx", sheet("Families") firstrow clear
– We want to generate a new variable (age_dad2)
    generate age_dad2 = age_dad + 1
– We want to replace a value in variable race_mom
    replace race_mom = "Black" if race_mom == "Blck"
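The generate/replace pattern above can be tried without the course file. This is a minimal sketch, assuming Stata's bundled auto.dta (loaded with sysuse) as a stand-in; price2 and the replacement string "Concord" are made up for illustration.

```stata
* Load Stata's built-in example data (an assumption; not the course file)
sysuse auto, clear

* Create a new numeric variable from an existing one
generate price2 = price + 1

* Replace one value of a string variable (make is a string in auto.dta)
replace make = "Concord" if make == "AMC Concord"

* Inspect the first few rows to check both changes
list make price price2 in 1/3
```

Note that string comparisons in Stata are case-sensitive and exact, so "Blck" matches only that exact spelling.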

slide-17
SLIDE 17

Label your variables

– Create a mapping (mom_older_names)
    label define mom_older_names 1 "Yes" 0 "No"
– Associate the mapping with a variable
    label values mom_older mom_older_names
– Assign a variable label
    label variable mom_older "Whether mom is older"
– Tabulate your results
    tab mom_older
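The three labeling steps can be sketched end to end on built-in data. This assumes auto.dta as a stand-in; the indicator expensive, its cutoff of 6000, and the label name expensive_names are hypothetical and chosen only for illustration.

```stata
* Built-in example data (an assumption; not the course file)
sysuse auto, clear

* A 0/1 indicator to label; the if-qualifier keeps missing prices missing
generate expensive = price > 6000 if !missing(price)

* Step 1: define the value-label mapping
label define expensive_names 1 "Yes" 0 "No"

* Step 2: attach the mapping to the variable
label values expensive expensive_names

* Step 3: describe the variable itself
label variable expensive "Whether price exceeds 6,000"

* The table now shows Yes/No instead of 1/0
tabulate expensive
```

The value label changes only how the codes are displayed; the stored values remain 0 and 1.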

slide-23
SLIDE 23

Deal with missingness

– Generate a missingness indicator (1 if age_dad is missing, 0 otherwise)
    generate dadmiss = missing(age_dad)
– Tabulate your results
    tab dadmiss
– List the observations with a missing value
    list if dadmiss == 1
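The same missingness check can be run on built-in data. This is a sketch assuming auto.dta, where rep78 has some missing values; repmiss is a made-up variable name.

```stata
* Built-in example data (an assumption; not the course file)
sysuse auto, clear

* missing() returns 1 when the argument is missing and 0 otherwise
generate repmiss = missing(rep78)

* Count complete and missing observations
tabulate repmiss

* Show only the rows where rep78 is missing
list make rep78 if repmiss == 1
```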

slide-31
SLIDE 31

Create some “bins”

Scenario: We want to recode interval variables into ordinal variables.
– Recode the variable into bins
    recode age_dad (15/25=1) (26/35=2) (36/55=3), gen(age_dad3)
– Create a mapping
    label define agenames 1 "young" 2 "middle" 3 "older"
– Apply the mapping
    label values age_dad3 agenames
– Tabulate results with row percentages
    tab age_dad3 welfare, row
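The binning workflow can be sketched on built-in data. This assumes auto.dta, where mpg is an integer between 12 and 41; the cut points, mpg3, and mpgnames are hypothetical choices for illustration.

```stata
* Built-in example data (an assumption; not the course file)
sysuse auto, clear

* Bin mpg into three ordered groups; generate() leaves mpg untouched
recode mpg (12/19=1) (20/29=2) (30/41=3), generate(mpg3)

* Label the bins
label define mpgnames 1 "low" 2 "medium" 3 "high"
label values mpg3 mpgnames

* Cross-tabulate against origin, with row percentages
tabulate mpg3 foreign, row
```

Values that match none of the rules are copied to the new variable unchanged, so it is worth checking that the ranges cover the data. The keywords min and max can stand in for the endpoints, for example (min/19=1).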

slide-33
SLIDE 33

Visualization in Stata

– Histogram
    histogram age_mom
    histogram age_mom, frequency
    histogram age_mom, percent
– Scatterplot
    twoway (scatter age_mom age_dad, mlabel(idnum) mlabsize(tiny) msize(tiny))
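A sketch of the same two plots on built-in data, assuming auto.dta. The graph names hist_mpg and scatter_mpg are made up; naming each graph keeps the second plot from replacing the first in memory.

```stata
* Built-in example data (an assumption; not the course file)
sysuse auto, clear

* Histogram with counts on the y-axis (the default is density)
histogram mpg, frequency name(hist_mpg, replace)

* Scatterplot with each point labeled by the car's make
twoway (scatter mpg weight, mlabel(make) mlabsize(tiny) msize(tiny)), ///
    name(scatter_mpg, replace)
```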

slide-43
SLIDE 43

Visualization in Stata (2)

– Boxplot
    graph box age_mom
    graph box age_mom, scheme(s1manual)
– Barplot
  – Code race_mom into a numeric variable
      encode race_mom, generate(race_mom2)
  – Install catplot
      ssc install catplot
  – Plot
      catplot race_mom2
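The encode step can be inspected on built-in data. This sketch assumes auto.dta and first builds a string variable with decode to mimic race_mom; origin_str and origin_num are made-up names, and installing catplot needs an internet connection.

```stata
* Built-in example data (an assumption; not the course file)
sysuse auto, clear

* Build a string variable from the labeled variable foreign
decode foreign, generate(origin_str)

* encode assigns codes 1, 2, ... in alphabetical order of the strings
* and stores the strings as value labels
encode origin_str, generate(origin_num)

* Compare the labels with the underlying numeric codes
tabulate origin_num
tabulate origin_num, nolabel

* catplot is user-written; install it from SSC, then plot
ssc install catplot
catplot origin_num
```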

slide-48
SLIDE 48

Visualization in Stata (3)

Histogram across units

    histogram age_mom if race_mom == "Black"
    histogram age_mom if race_mom == "White"
    histogram age_mom, by(race_mom)
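The difference between the if-qualifier and by() can be sketched on built-in data, assuming auto.dta with foreign as the grouping variable; the graph names h_foreign and h_by are made up.

```stata
* Built-in example data (an assumption; not the course file)
sysuse auto, clear

* One group at a time: the if-qualifier restricts the sample
histogram mpg if foreign == 1, name(h_foreign, replace)

* All groups at once: by() draws one panel per group in a single graph
histogram mpg, by(foreign) name(h_by, replace)
```

With by(), the panels sit side by side, which makes the groups easier to compare than separate graphs.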

slide-49
SLIDE 49

Midterm Review

Please ask questions.
