
An introduction to WS 2018/2019
Rearranging and manipulating data
Dr. Sonja Grath
Dr. Eliza Argyridou
Special thanks to: Dr. Benedikt Holtmann for sharing slides for this lecture

What you should know after day 5

Rearranging and manipulating data
● Reshaping data
● Combining data sets
● Making new variables
● Subsetting data
● Summarizing data

We will work with two particular packages:
● tidyr
● dplyr

Remember: What do we have to do before we can work with a package in R? (2 things)

Reshaping data

We will use data on fish abundance.
● Download the file Fish_survey.csv from the course page. Set directory, for example:
    setwd("~/Desktop/Day_5")
● Import the sample data into a variable Fish_survey:
    Fish_survey <- read.csv("Fish_survey.csv", header = TRUE)
    head(Fish_survey)

Reshaping data

Do you remember what I told you on data frames?

IMPORTANT: All values of the same variable MUST go in the same column!

Example: Data of expression study
● 3 groups/treatments: Control, Tropics, Temperate
● 4 measurements per treatment

NOT a data frame!
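The rule that all values of one variable go in the same column can be sketched in base R. The numbers below are invented for illustration and are not the values from the expression study; the object names wide and long are assumptions as well.

```r
# Hypothetical expression values: 3 treatments x 4 measurements (invented numbers).
# One column per treatment: the values of ONE variable are spread over 3 columns.
wide <- data.frame(
  Control   = c(1.2, 1.5, 1.1, 1.4),
  Tropics   = c(2.3, 2.1, 2.6, 2.2),
  Temperate = c(1.8, 1.9, 1.7, 2.0)
)

# stack() collects all values in one column and the column names in a second one.
long <- stack(wide)
names(long) <- c("Expression", "Treatment")

# Now each variable has its own column and each measurement its own row.
str(long)
```

After stacking, a model formula such as Expression ~ Treatment can refer to each variable by a single column name, which is why the long layout is the one most R functions expect.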

Same data as data frame

Back to the fish data...

Reshaping data

    head(Fish_survey)

Note:
● 3 species (trout, perch, stickleback)
● The numbers are abundance values for the species at specific sites

To combine the three columns into one column that contains all species you can use the function gather() from the tidyr package:

    library(tidyr)
    Fish_survey_long <- gather(Fish_survey, Species, Abundance, 4:6)
    head(Fish_survey_long)
    tail(Fish_survey_long)

To convert the data back into a format with separate columns for each species, you can use the function spread() from the tidyr package:

    Fish_survey_wide <- spread(Fish_survey_long, Species, Abundance)

Combining data

We now want to combine the information given by three different data sets:
● Fish_survey.csv
● Water_data.csv
● GPS_data.csv

To combine the data sets we will use the package dplyr:

    library(dplyr)
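For readers without the course file, the gather() and spread() round trip can be tried on a small stand-in. This is only a sketch: the column names and all values below are assumptions made up for illustration, not the contents of Fish_survey.csv.

```r
library(tidyr)

# Hypothetical stand-in for Fish_survey.csv (invented column names and values).
Fish_survey <- data.frame(
  Site        = c("A", "A", "B", "B"),
  Month       = c("May", "June", "May", "June"),
  Transect    = c(1, 1, 2, 2),
  Trout       = c(3, 5, 0, 2),
  Perch       = c(1, 0, 4, 6),
  Stickleback = c(10, 12, 7, 9),
  stringsAsFactors = FALSE
)

# Wide -> long: columns 4 to 6 become the key column Species
# and the value column Abundance.
Fish_survey_long <- gather(Fish_survey, Species, Abundance, 4:6)
nrow(Fish_survey_long)   # 4 rows x 3 species columns = 12 rows

# Long -> wide: one column per species again.
Fish_survey_wide <- spread(Fish_survey_long, Species, Abundance)
dim(Fish_survey_wide)    # 4 rows, 6 columns
```

In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(); the older functions still work, so the lecture code runs unchanged.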

Combining data

We can join data sets by using the columns they share.

    Fish survey    Water characteristics    GPS
    Site           Site                     Site
    Month          Month                    Transect
    Transect       Water temp.              Latitude
    Species        O2-content               Longitude

Combining data

Functions to combine data sets in dplyr

    left_join(a, b, by = "x1")     Joins matching rows from b to a
    right_join(a, b, by = "x1")    Joins matching rows from a to b
    inner_join(a, b, by = "x1")    Returns all rows from a where there are matching values in b
    full_join(a, b, by = "x1")     Joins data and returns all rows and columns
    semi_join(a, b, by = "x1")     All rows in a that have a match in b, keeping just columns from a
    anti_join(a, b, by = "x1")     All rows in a that do not have a match in b

Combining data

1) Join water characteristics to fish abundance data using inner_join()

    Fish_and_Water <- inner_join(Fish_survey_long,
                                 Water_data,
                                 by = c("Site", "Month"))

    X1  X2        X1  X2  X3
    A   1         A   1   T
    B   1         B   1   F
    A   2         A   2   T
    B   2         B   2   F

2) Add GPS locations to new Fish_and_Water data set using inner_join()

    Fish_survey_combined <- inner_join(Fish_and_Water, GPS_location,
                                       by = c("Site", "Transect"))

Adding new variables

We will use data on bird behaviour.

    Bird_Behaviour <- read.csv("Bird_Behaviour.csv",
                               header = TRUE,
                               stringsAsFactors = FALSE)

    # Get an overview
    str(Bird_Behaviour)

We want to add the new variable (column) log_FID
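The difference between the join functions listed above is easiest to see on two tiny tables. The sketch below reuses the placeholder names a, b and x1 from the function overview; the table contents are invented for illustration.

```r
library(dplyr)

# Two hypothetical tables that share the key column x1.
# "C" occurs only in a, "D" occurs only in b.
a <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3), stringsAsFactors = FALSE)
b <- data.frame(x1 = c("A", "B", "D"), x3 = c(TRUE, FALSE, TRUE), stringsAsFactors = FALSE)

res_inner <- inner_join(a, b, by = "x1")  # keys present in both: A, B
res_left  <- left_join(a, b, by = "x1")   # all keys of a: A, B, C (x3 is NA for C)
res_right <- right_join(a, b, by = "x1")  # all keys of b: A, B, D (x2 is NA for D)
res_full  <- full_join(a, b, by = "x1")   # all keys of both: A, B, C, D
res_semi  <- semi_join(a, b, by = "x1")   # rows of a with a match in b, columns of a only
res_anti  <- anti_join(a, b, by = "x1")   # rows of a without a match in b: C

res_full
```

Because inner_join() drops rows without a partner, it is worth comparing nrow() before and after a join to notice sites or months that are missing from one of the data sets.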

Adding new variables

Three possibilities:

a) Using $
    Bird_Behaviour$log_FID <- log(Bird_Behaviour$FID)

b) Using the [ ]-operator
    Bird_Behaviour[ , "log_FID"] <- log(Bird_Behaviour$FID)

c) Using the function mutate() from dplyr package
    Bird_Behaviour <- mutate(Bird_Behaviour,
                             log_FID = log(FID))

Adding new variables

The outcome:

    head(Bird_Behaviour)

Adding new variables

We can split one column into two using the function separate() from the tidyr package:

    Bird_Behaviour <- separate(Bird_Behaviour, Species,
                               c("Genus", "Species"),
                               sep = "_", remove = TRUE)

    X1  X2         X1  X2.1  X2.2
    A   1_1        A   1     1
    B   1_2        B   1     2
    A   2_1        A   2     1
    B   2_2        B   2     2

Combining variables

We can combine two columns into one using the function unite() from the tidyr package:

    Bird_Behaviour <- unite(Bird_Behaviour, "Genus_Species",
                            c(Genus, Species),
                            sep = "_", remove = TRUE)

    X1  X2.1  X2.2         X1  X2
    A   1     1            A   1_1
    B   1     2            B   1_2
    A   2     1            A   2_1
    B   2     2            B   2_2

Subsetting data

You can subset your data with:
• The [ ]-operator
• The function subset()
• With functions from the dplyr package
    slice()
    filter()
    sample_frac()
    sample_n()
    select()

Subsetting data with the [ ]-operator

Examples:

    # selects the first 4 columns
    Bird_Behaviour[ , 1:4]

    # selects rows 2 and 3
    Bird_Behaviour[c(2,3), ]

    # selects the rows 1 to 3 and columns 1 to 4
    Bird_Behaviour[1:3, 1:4]

    # selects the rows 1 to 3 and 6, and the columns 1 to 4
    # and 8
    Bird_Behaviour[c(1:3, 6), c(1:4, 8)]
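The three column operations shown above (mutate(), separate(), unite()) can be chained on a small stand-in table. This is a sketch under assumptions: the species names and FID values are invented, and Bird_split is a hypothetical intermediate name used only to show the state after separate().

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for Bird_Behaviour.csv (invented values).
Bird_Behaviour <- data.frame(
  Ind     = 1:3,
  Species = c("Parus_major", "Turdus_merula", "Passer_domesticus"),
  FID     = c(5, 10, 20),
  stringsAsFactors = FALSE
)

# Add a new column computed from an existing one.
Bird_Behaviour <- mutate(Bird_Behaviour, log_FID = log(FID))

# Split "Genus_species" at the underscore into two columns.
Bird_split <- separate(Bird_Behaviour, Species,
                       c("Genus", "Species"),
                       sep = "_", remove = TRUE)
names(Bird_split)

# Paste the two columns back together into one.
Bird_Behaviour <- unite(Bird_split, "Genus_Species",
                        c(Genus, Species),
                        sep = "_", remove = TRUE)
names(Bird_Behaviour)
```

With remove = TRUE the input columns are dropped after the operation, so the round trip returns a table with the same number of columns as before, only with Species renamed to Genus_Species.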

Subsetting data with the [ ] and $-operators

Example:

    # selects all rows with males
    Bird_Behaviour[Bird_Behaviour$Sex == "male", ]

Subsetting data with subset()

    ?subset

    Argument    Description
    x           The object from which to extract subset
    subset      A logical expression that describes the set of rows to return
    select      An expression indicating which columns to return

Examples

    subset(Bird_Behaviour, FID < 10)
    # selects all rows with FID smaller than 10 m

    subset(Bird_Behaviour, FID < 10 & Sex == "male")
    # selects all rows for males with FID smaller than 10

    subset(Bird_Behaviour, FID > 10 | FID < 15,
           select = c(Ind, Sex, Year))
    # selects all rows that have a value of FID
    # greater than 10 or less than 15. We keep only
    # the Ind, Sex and Year columns.
    # Note: every non-missing FID meets this "or" condition;
    # use & to select values between 10 and 15.

Subsetting rows in dplyr

Subsetting by rows using slice() and filter()

Examples slice() and filter():

    Bird_Behaviour.slice <- slice(Bird_Behaviour, 3:5)
    # selects rows 3-5

    Bird_Behaviour.filter <- filter(Bird_Behaviour, FID < 5)
    # selects rows that meet certain criteria

Subsetting rows in dplyr

You can take a random sample of rows with sample_frac() and sample_n()

Examples sample_frac() and sample_n():

    Bird_Behaviour.50 <- sample_frac(Bird_Behaviour,
                                     size = 0.5,
                                     replace = FALSE)
    # takes randomly 50% of the rows

    Bird_Behaviour_50Rows <- sample_n(Bird_Behaviour, 50,
                                      replace = FALSE)
    # takes randomly 50 rows

Subsetting columns in dplyr

You can subset by columns with select()

Examples:

    Bird_Behaviour_col <- select(Bird_Behaviour,
                                 Ind,
                                 Sex,
                                 Fledglings)
    # selects the columns Ind, Sex, and Fledglings

    Bird_Behaviour_reduced <- select(Bird_Behaviour,
                                     -Disturbance)
    # excludes the variable Disturbance
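The three ways of subsetting rows give the same result, which can be checked on a small stand-in table. The values below are invented for illustration, and the result names (by_bracket, by_subset, by_filter, either, between, cols) are assumptions. The sketch also shows how | and & differ in a range condition.

```r
library(dplyr)

# Hypothetical stand-in for Bird_Behaviour.csv (invented values).
Bird_Behaviour <- data.frame(
  Ind  = 1:6,
  Sex  = c("male", "female", "male", "female", "male", "female"),
  FID  = c(4, 8, 12, 16, 20, 9),
  Year = c(2015, 2015, 2016, 2016, 2017, 2017),
  stringsAsFactors = FALSE
)

# Three equivalent ways to keep males with FID below 10.
by_bracket <- Bird_Behaviour[Bird_Behaviour$FID < 10 & Bird_Behaviour$Sex == "male", ]
by_subset  <- subset(Bird_Behaviour, FID < 10 & Sex == "male")
by_filter  <- filter(Bird_Behaviour, FID < 10, Sex == "male")  # comma means "and"

# "or" versus "and" for a range of values.
either  <- subset(Bird_Behaviour, FID > 10 | FID < 15)  # true for every row here
between <- subset(Bird_Behaviour, FID > 10 & FID < 15)  # only FID values between 10 and 15

# Keeping columns by name.
cols <- select(Bird_Behaviour, Ind, Sex)

nrow(either)
nrow(between)
```

subset() and filter() look up column names inside the data frame, so the Bird_Behaviour$ prefix needed with the [ ]-operator can be left out.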

Summarizing your data

You can summarize your data with dplyr

Example:
Get the overall mean for FID using summarize() and mean()

    summarize(Bird_Behaviour,
              mean.FID = mean(FID))

      mean.FID
    1 11.82639

Summarizing your data

We can add more measurements to our summary:

    summarize(Bird_Behaviour,
              mean.FID = mean(FID),   # mean
              min.FID = min(FID),     # minimum
              max.FID = max(FID),     # maximum
              med.FID = median(FID),  # median
              sd.FID = sd(FID),       # standard deviation
              var.FID = var(FID),     # variance
              n.FID = n())            # sample size

      mean.FID max.FID med.FID   sd.FID var.FID n.FID
    1 11.82639      30      10 8.082036 65.3193   144

How can we get summaries for each species?

Before you can calculate these summaries, you have to apply the group_by() function from the dplyr package:

    Bird_Behaviour_by_Species <- group_by(Bird_Behaviour,
                                          Genus_Species)

How can we get summaries for each species?

Now we can get summaries for each species:

    Summary_species <- summarize(Bird_Behaviour_by_Species,
                                 mean.FID = mean(FID),   # mean
                                 min.FID = min(FID),     # minimum
                                 max.FID = max(FID),     # maximum
                                 med.FID = median(FID),  # median
                                 sd.FID = sd(FID),       # standard deviation
                                 var.FID = var(FID),     # variance
                                 n.FID = n())            # sample size
    Summary_species

We can make a data frame out of a tibble with:

    as.data.frame(Summary_species)
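The effect of group_by() on summarize() can be checked by hand on a very small table. The sketch below uses invented species names and FID values, not the lecture data, and computes only three of the summary columns.

```r
library(dplyr)

# Hypothetical stand-in for Bird_Behaviour (invented values).
Bird_Behaviour <- data.frame(
  Genus_Species = c("Parus_major", "Parus_major",
                    "Turdus_merula", "Turdus_merula", "Turdus_merula"),
  FID = c(4, 6, 10, 12, 14),
  stringsAsFactors = FALSE
)

# Without grouping: one row for the whole table.
overall <- summarize(Bird_Behaviour, mean.FID = mean(FID), n.FID = n())

# With grouping: one row per species.
Bird_Behaviour_by_Species <- group_by(Bird_Behaviour, Genus_Species)
Summary_species <- summarize(Bird_Behaviour_by_Species,
                             mean.FID = mean(FID),  # mean per species
                             sd.FID   = sd(FID),    # standard deviation per species
                             n.FID    = n())        # sample size per species

# Convert the tibble to a plain data frame.
Summary_species <- as.data.frame(Summary_species)
Summary_species
```

group_by() does not change the data itself; it only records the grouping, which summarize() then uses to return one row per group instead of one row overall.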
