
PS 405 Week 1 Section: Intro to R and Summary Statistics, D.J. Flynn



  1. PS 405 – Week 1 Section Intro to R and Summary Statistics D.J. Flynn January 14, 2014

  2. Today’s plan
     Preliminaries
     Intro to R
     Basic univariate and bivariate stats
     Plots

  3. Preliminaries
     ◮ Section: Tuesday, 5:00-6:00, Scott 212
     ◮ Office Hours: Thursday, 12:30-2:00, Scott 230
     ◮ Problem Sets:
        ◮ hard copies
        ◮ include code (annotated)
        ◮ neat tables (cleaned up in Word or LaTeX)
        ◮ grades: number correct (meaningless)
     ◮ Questions: substantive questions to office hours, please
     ◮ Website: my overheads/code will be posted at www.djflynn.org/teaching

  4. Caveats
     ◮ this presentation: intro to the basics
     ◮ a lot of helpful R guides out there (see Thomas Leeper’s: thomasleeper.com/Rcourse/Intro2R/Intro2R.pdf)
     ◮ 90% of R skills come from trial-and-error
     ◮ Google error messages
     ◮ pro tip: always know what you’re asking R to do (not just the code). Next quarter Jay will show you what’s going on behind the scenes.

  5. R looks like this...

  6. RStudio I highly recommend using a text editor, such as RStudio:

  7. About R
     ◮ Almost entirely command-based (no point-and-click)
     ◮ Core functionalities already loaded; if you need anything else, load a package (we’ll do this)
     ◮ Advantages: FREE, extremely flexible, great graphics, increasingly the norm
     ◮ Disadvantages: steep learning curve, tedious code, very sensitive, unhelpful error messages

  8. Practical tips¹
     ◮ R is extremely sensitive, including to case: x ≠ X, Data ≠ data
     ◮ scroll through code using up and down arrows
     ◮ putting a question mark before a command will bring up the relevant help file: ?summary
     ◮ use pound signs (#) to annotate code as you go along
     ◮ ALWAYS save your code in a separate file (RStudio makes this easy)
     ◮ when R asks if you want to save the workspace image, say yes!
     ¹ Most of these tips came from Salma Al-Shami’s slides from previous years (thanks, Salma!)
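A minimal sketch of the case-sensitivity and annotation tips above. The object names `score` and `Score` are made up for illustration.

```r
# Two objects whose names differ only by case are unrelated as far as R is concerned.
score <- c(10, 20, 30)   # hypothetical lowercase object
Score <- c(1, 2, 3)      # hypothetical capitalised object, stored separately

identical(score, Score)  # compares the two objects
mean(score)              # uses the lowercase object only
mean(Score)              # uses the capitalised object only

# ?summary               # uncomment in an interactive session to open the help file
```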

  9. Basic commands ◮ R works like a calculator:
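The screenshot for this slide did not survive, so here is a small sketch of what "works like a calculator" could look like. The object names are arbitrary and only used so the results can be reused.

```r
# Arithmetic typed at the prompt; results are stored so they can be printed together.
a  <- 2 + 3      # addition
b  <- 10 / 4     # division returns a decimal
c2 <- 2^3        # exponentiation
d  <- sqrt(16)   # built-in functions are called with parentheses
e  <- 7 %% 3     # remainder after integer division

print(c(a, b, c2, d, e))
```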

  10. Creating objects in R:
      ◮ constants:
           x<-5
           constant=1
      ◮ vectors:
           myvec<-c(1,2,3,4,5)
           myothervec<-c(6,7,8,9,10)
           colors<-c("blue","green","red","purple")
      ◮ matrices:
           mymatrix<-cbind(myvec,myothervec)
           my.other.matrix<-matrix(seq(1,100),10,10)
      ◮ data frames:
           mydataframe<-cbind.data.frame(myvec,myothervec)
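Once objects exist, it helps to check what kind of object each one is. The sketch below re-creates the slide's objects and inspects them with base R functions; `mydataframe2` is an extra name added here to show the more common `data.frame()` constructor.

```r
# Re-create the objects from the slide
myvec       <- c(1, 2, 3, 4, 5)
myothervec  <- c(6, 7, 8, 9, 10)
mymatrix    <- cbind(myvec, myothervec)
mydataframe <- cbind.data.frame(myvec, myothervec)

# Inspect them
class(myvec)         # type of the vector
length(myvec)        # number of elements
dim(mymatrix)        # rows, then columns
is.matrix(mymatrix)  # TRUE for a matrix
str(mydataframe)     # compact overview of a data frame

# data.frame() is the more commonly seen way to build a data frame
mydataframe2 <- data.frame(myvec, myothervec)
```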

  11. Looking at data
      ◮ you have to tell R where to find variables: dataset$variable
      ◮ or use attach() and detach(), but always know what dataset you’re referring to
      ◮ to look at an object, just type its name
      ◮ descriptives: mean median max min var sd range (note: R’s built-in mode() reports an object’s storage type, not its most frequent value; use table() to find the latter)
      ◮ distributions: table() summary() head()
      ◮ variables: names(dataset) dataset$variable dataset$variable[obs1:obs2]
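The commands above can be tried on the built-in `faithful` data. This is a sketch only; `stat_mode()` is a hypothetical helper written for this example (base R has no function for the statistical mode).

```r
x <- faithful$waiting   # built-in dataset, no package needed

mean(x); median(x)
sd(x); var(x)
range(x)                # returns both the minimum and the maximum
summary(x)
head(faithful)          # first six rows
x[1:5]                  # observations 1 through 5

mode(x)                 # storage type ("numeric"), not the most frequent value

# Hypothetical helper: most frequent value(s), found by tabulating
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(x)
```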

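  12. Practice looking at variables in the pre-loaded dataset faithful. It ships with R itself (in the datasets package), so no package is needed to access it:
         names(faithful)
      The car package is used later for recode(); install and load it like this:
         install.packages("car")
         library(car)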

  13. Loading packages
         install.packages("nameofpackage") #install once per computer
         library(nameofpackage) #load in every new R session
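One common pattern is to install a package only when it is missing and then load it. The helper name `ensure_package()` is hypothetical. It is demonstrated on `utils`, which is always present, so nothing is downloaded here; swap in "car" or "gmodels" as needed.

```r
# Hypothetical helper: install a package only if it is not already available, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)  # character.only lets us pass the name as a string
  invisible(TRUE)
}

ensure_package("utils")
```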

  14. Loading data in R
      ◮ code depends on the type of file you’re attempting to load: read.table read.dta read.csv read.spss, etc.
      ◮ two options: (1) tell R exactly where to find the dataset you want, or (2) set a working directory and then just tell it the file name
      ◮ I highly recommend the latter because typing long file paths can be a nightmare (e.g., typos, slashes, quotation marks)
      ◮ to load data from other statistical packages (e.g., Stata or SPSS files via read.dta or read.spss), load the foreign package; read.table and read.csv are built in and work without it
      ◮ MUCH easier in RStudio (and on Macs)

  15. Example using pilot.data.csv
      Option 1: Load from file path
         install.packages("foreign") #not needed for read.csv; only for read.dta, read.spss, etc.
         library(foreign)
         pilot<-read.csv("~/Documents/TAing/winter 2014/section/week1/pilot.data.csv")
         names(pilot)
      Option 2: Set wd, then call up file
         setwd("~/Documents/TAing/winter 2014/section/week1")
         install.packages("foreign") #not needed for read.csv; only for read.dta, read.spss, etc.
         library(foreign)
         pilot<-read.csv("pilot.data.csv")
         names(pilot)
      Option 3: Point-and-click open in RStudio
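The two options can be tried without the course files. The sketch below writes a tiny stand-in CSV to a temporary folder and reads it back both ways; the file name `demo.data.csv` and its contents are invented for illustration.

```r
# Create a small stand-in CSV (hypothetical data) in a temporary folder
demo_dir <- tempdir()
demo <- data.frame(id = 1:3, gmf = c(7, 4, 1))
write.csv(demo, file.path(demo_dir, "demo.data.csv"), row.names = FALSE)

# Option 1: give the full path
opt1 <- read.csv(file.path(demo_dir, "demo.data.csv"))

# Option 2: set the working directory, then use just the file name
old_wd <- setwd(demo_dir)   # setwd() returns the previous directory
opt2 <- read.csv("demo.data.csv")
setwd(old_wd)               # restore the original working directory

names(opt2)
identical(opt1, opt2)       # both routes load the same data
```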

  16. Types of variables and why we care
      ◮ nominal/categorical: can’t be ordered; distance not meaningful
      ◮ ordinal: can be ordered; distance may/may not be meaningful
      ◮ continuous: can be ordered; distance meaningful
      Model selection depends on type of DV.
      This class: continuous and quasi-continuous DVs
      Next class: categorical/limited DVs
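In R these three types usually map onto unordered factors, ordered factors, and numeric vectors. A minimal sketch with made-up values:

```r
# Hypothetical example values for each variable type
party    <- factor(c("Democrat", "Republican", "Independent", "Democrat"))  # nominal
interest <- factor(c("low", "high", "medium", "low"),
                   levels = c("low", "medium", "high"), ordered = TRUE)     # ordinal
age      <- c(34, 61, 47, 25)                                               # continuous

is.ordered(party)           # nominal: no ordering defined
interest[1] < interest[2]   # ordinal: comparisons respect the level order
mean(age)                   # continuous: arithmetic is meaningful
table(party)                # categorical variables are summarised by counts
```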

  17. Re-coding
      Raw data (especially secondary data, e.g., ANES) are often coded awkwardly, so we want to re-code:
         load("/Users/DJF/Documents/TAing/winter 2014/section/week1/nes2008.RData")
         practice<-nes08
         summary(practice$partyid) #notice how responses are non-numeric
      Here I code Dems as 1, Reps as 2, Inds as 3, and others as missing:
         library(car)
         practice$newpartyid<-recode(practice$partyid,"'1. Democrat'=1; '2. Republican'=2; '3. Independent'=3;else=NA")
      It’s always a good idea to compare the distributions before and after re-coding to make sure everything was done correctly:
         table(practice$partyid)
         table(practice$newpartyid)

  18. Another recoding example (this time changing already numeric responses):
         library(car)
         pilot$gmf.new<-recode(pilot$gmf,"7=1;6=2;5=3;4=4;3=5;2=6;1=7;else=NA")
         table(pilot$gmf)
         table(pilot$gmf.new)
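The same two recodes can be sketched in base R, without car's recode(), which is a different technique from the one on the slides. All data below are invented, and the label "4. Other party" is an assumption, not taken from the ANES file.

```r
# Reverse-code a 1-7 scale; anything outside 1-7 becomes missing
gmf <- c(1, 2, 7, 4, 9, NA, 6)               # hypothetical responses; 9 is out of range
gmf.new <- ifelse(gmf %in% 1:7, 8 - gmf, NA)

# Compare before and after, keeping missing values visible
table(gmf, gmf.new, useNA = "ifany")

# Recode text labels to numbers with a named lookup vector
partyid <- c("1. Democrat", "2. Republican", "3. Independent", "4. Other party")
lookup  <- c("1. Democrat" = 1, "2. Republican" = 2, "3. Independent" = 3)
newpartyid <- unname(lookup[partyid])        # labels missing from the lookup become NA
table(partyid, newpartyid, useNA = "ifany")
```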

  19. Sub-setting
      We often want to subset data based on values of one or more variables (e.g., look only at Democrats, or voters over 50, etc.):
         older<-subset(practice,V081104>=60)
      Does partyid vary by age?
         table(practice$partyid)
         table(older$partyid)
         library(gmodels) #CrossTable() comes from the gmodels package
         CrossTable(practice$age,practice$partyid)
      Subsetting on older GOP voters:
         olderGOP<-subset(older,newpartyid==2)
      We could now run analyses on our subsets...
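Since the NES file is not included here, the same subsetting steps can be sketched on the built-in `faithful` data. The cutoffs 80 and 4.5 are arbitrary choices for illustration.

```r
# Keep only eruptions that followed a long wait (cutoff chosen arbitrarily)
long_wait <- subset(faithful, waiting >= 80)

# Equivalent selection with bracket indexing
same <- faithful[faithful$waiting >= 80, ]

nrow(faithful)    # all observations
nrow(long_wait)   # observations that meet the condition

# Subset the subset, as with the older GOP voters
long_and_big <- subset(long_wait, eruptions > 4.5)
summary(long_and_big$eruptions)
```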

  20. Basic bivariate stats
      ◮ Correlation (numeric variables)
           duration<-faithful$eruptions
           waiting<-faithful$waiting
           cor(duration,waiting)
           cor.test(duration,waiting)
      ◮ Crosstabulation (categorical variables)
           install.packages("gmodels")
           library(gmodels)
           CrossTable(nes08$partyid,nes08$marriage)
           CrossTable(nes08$partyid,nes08$bibleview)
      ◮ down the road: regression models
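A sketch of both ideas using only built-in data and base R. The crosstab uses `table()` and `prop.table()` on `mtcars` as a stand-in for `CrossTable()` on the NES data, so it is an alternative approach, not the slide's own.

```r
# Correlation between two numeric variables
r    <- cor(faithful$eruptions, faithful$waiting)
test <- cor.test(faithful$eruptions, faithful$waiting)
r
test$p.value                   # p-value for the null hypothesis of zero correlation

# Crosstab of two categorical variables (cylinders by transmission type in mtcars)
tab <- table(cyl = mtcars$cyl, am = mtcars$am)
tab
prop.table(tab, margin = 1)    # row proportions: each row sums to 1
```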

  21. Sample plots
         hist(faithful$eruptions)
      [Figure: “Histogram of faithful$eruptions”; x-axis: faithful$eruptions, y-axis: Frequency]

  22. hist(faithful$eruptions,breaks=20,col="lightblue2",main="Histogram of 'eruptions' variable",xlab="x",ylab="freq(x)")
      [Figure: “Histogram of 'eruptions' variable”; x-axis: x, y-axis: freq(x)]

  23. eruptions<-faithful$eruptions #so eruptions can be called by name (waiting was created on slide 20)
      hist(eruptions,breaks=20,col="lightblue2",main="Histogram of 'eruptions' Variable",xlab="x",ylab="freq(x)",prob=TRUE)
      curve(dnorm(x,mean=mean(eruptions),sd=sd(eruptions)),add=TRUE)
      [Figure: “Histogram of 'eruptions' Variable” with normal curve overlaid; x-axis: x, y-axis: freq(x), on the density scale because prob=TRUE]

  24. my.density<-density(faithful$eruptions)
      plot(my.density)
      [Figure: “density.default(x = faithful$eruptions)”; x-axis: N = 272, Bandwidth = 0.3348; y-axis: Density]

  25. plot(my.density,col="seagreen3",main="PDF of 'eruptions' variable",xlab="x",ylab="Pr(X=x)",lty=6,lwd=4)

  26. plot(faithful$eruptions,faithful$waiting)
      [Figure: scatterplot; x-axis: faithful$eruptions, y-axis: faithful$waiting]

  27. plot(eruptions,waiting,main="Scatterplot of faithful Data",xlab="Eruptions",ylab="Waiting",pch=19)
      [Figure: “Scatterplot of faithful Data”; x-axis: Eruptions, y-axis: Waiting]

  28. plot(eruptions~waiting,main="Scatterplot with Regression Line",xlab="Waiting",ylab="Eruptions") #y~x puts waiting on the x-axis
      abline(lm(eruptions~waiting),col="blue",lwd=3)

  29. plot(eruptions,waiting,main="Scatterplot with Smoothed Regression Line",xlab="Eruptions",ylab="Waiting",pch=20)
      lines(lowess(eruptions,waiting),col="red",lwd=3)
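The plotting steps from the last few slides can be combined into one script that saves its output to a file. This is a sketch: the file name is arbitrary, and variables are referenced through `faithful$` so nothing needs to be attached first.

```r
# Write the plots to a PDF in a temporary folder (file name chosen for illustration)
out_file <- file.path(tempdir(), "faithful_plots.pdf")
pdf(out_file)

# Page 1: histogram on the density scale with a normal curve overlaid
hist(faithful$eruptions, breaks = 20, prob = TRUE,
     main = "Histogram of 'eruptions'", xlab = "Eruptions")
curve(dnorm(x, mean = mean(faithful$eruptions), sd = sd(faithful$eruptions)),
      add = TRUE)

# Page 2: scatterplot with a linear fit (blue) and a lowess smoother (red)
plot(faithful$eruptions, faithful$waiting,
     main = "Waiting vs. eruptions", xlab = "Eruptions", ylab = "Waiting", pch = 20)
fit <- lm(waiting ~ eruptions, data = faithful)   # waiting on the y-axis, matching the plot
abline(fit, col = "blue", lwd = 3)
lines(lowess(faithful$eruptions, faithful$waiting), col = "red", lwd = 3)

invisible(dev.off())    # close the device so the file is written
file.exists(out_file)
```

Fitting `waiting ~ eruptions` keeps the regression line consistent with the axes of the scatterplot it is drawn on.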
