Rearranging and manipulating data

An introduction to R, WS 2019/2020
Rearranging and manipulating data

Dr. Noémie Becker
Dr. Eliza Argyridou

Special thanks to: Dr. Benedikt Holtmann and Dr. Sonja Grath for sharing slides for this lecture

    mydata <- read.table(file = "mydata.txt",
                         header = TRUE,
                         na.strings = "n")

What was the sign for missing data in mydata.txt? Answer: "n"
What is written in the first line of mydata.txt? Answer: column names
Is the command correct? Answer: YES!

What you should know after day 5

Rearranging and manipulating data
● Reshaping data
● Combining data sets
● Making new variables
● Subsetting data
● Summarizing data

We will work with two particular packages:
● tidyr
● dplyr

YOUR TURN
What do we have to do before we can work with a package in R? (2 things)

Reshaping data

We will use data on fish abundance.
● Download the file Fish_survey.csv from the course page.
● Set the working directory, for example:

    setwd("~/Desktop/Day_5")

● Import the sample data into a variable Fish_survey:

    Fish_survey <- read.csv("Fish_survey.csv",
                            header = TRUE)
    head(Fish_survey)

Note on the output of head(Fish_survey):
● 3 species (trout, perch, stickleback)
● The numbers are abundance values for the species at specific sites

To combine the three species columns into one column that contains all species, you can use the function gather() from the tidyr package:

    library(tidyr)
    Fish_survey_long <- gather(Fish_survey,
                               Species, Abundance,
                               4:6)
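The reshaping step can be tried without the course file. The sketch below is only an illustration: fish_demo is a hypothetical stand-in for Fish_survey.csv with made-up values, and it assumes (as on the slide) that the species counts sit in columns 4 to 6.

```r
library(tidyr)

# Hypothetical stand-in for Fish_survey.csv; all values are made up
fish_demo <- data.frame(
  Site        = c("A", "B"),
  Month       = c("May", "May"),
  Transect    = c(1, 1),
  Trout       = c(3, 0),
  Perch       = c(5, 2),
  Stickleback = c(1, 7)
)

# Columns 4:6 hold the species counts. gather() stacks them into a
# key column (Species) and a value column (Abundance).
fish_demo_long <- gather(fish_demo, Species, Abundance, 4:6)

# Each of the 2 original rows now appears once per species
fish_demo_long
```

Recent tidyr versions mark gather() as superseded by pivot_longer(); gather() still works, so the slide code can be used unchanged.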

Reshaping data

    Fish_survey_long <- gather(Fish_survey,
                               Species, Abundance,
                               4:6)
    head(Fish_survey_long)
    tail(Fish_survey_long)

To convert the data back into a format with separate columns for each species, you can use the function spread() from the tidyr package:

    Fish_survey_wide <- spread(Fish_survey_long,
                               Species, Abundance)

Combining data

We now want to combine the information given by three different data sets:
● Fish_survey.csv
● Water_data.csv
● GPS_data.csv

To combine the data sets we will use the package dplyr:

    library(dplyr)

We can join data sets by using the columns they share.

    Fish survey    Water characteristics    GPS
    Site           Site                     Site
    Month          Month                    Transect
    Transect       Water temp.              Latitude
    Species        O2-content               Longitude

YOUR TURN
Which function could we use here?

Functions to combine data sets in dplyr

    left_join(a, b, by = "x1")     Joins matching rows from b to a
    right_join(a, b, by = "x1")    Joins matching rows from a to b
    inner_join(a, b, by = "x1")    Returns all rows from a where there are matching values in b
    full_join(a, b, by = "x1")     Joins data and returns all rows and columns
    semi_join(a, b, by = "x1")     All rows in a that have a match in b, keeping just columns from a
    anti_join(a, b, by = "x1")     All rows in a that do not have a match in b
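To see how the six join functions differ, it can help to run them on two tiny tables. The sketch below uses hypothetical data frames a and b (not the course data) that share the key column x1, mirroring the notation in the overview.

```r
library(dplyr)

# Hypothetical toy tables sharing the key column x1
a <- data.frame(x1 = c("A", "B", "C"), x2 = c(1, 2, 3))
b <- data.frame(x1 = c("A", "B", "D"), x3 = c(TRUE, FALSE, TRUE))

res_left  <- left_join(a, b, by = "x1")   # all rows of a; x3 is NA for "C"
res_right <- right_join(a, b, by = "x1")  # all rows of b; x2 is NA for "D"
res_inner <- inner_join(a, b, by = "x1")  # only keys present in both: "A", "B"
res_full  <- full_join(a, b, by = "x1")   # every key from either table
res_semi  <- semi_join(a, b, by = "x1")   # rows "A", "B" of a, columns of a only
res_anti  <- anti_join(a, b, by = "x1")   # the row of a without a match: "C"

res_inner
res_full
```

inner_join() drops every row whose key is missing from the other table, so it is worth comparing row counts before and after a join.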

Combining data

1) Join water characteristics to fish abundance data using inner_join()

    Fish_and_Water <- inner_join(Fish_survey_long,
                                 Water_data,
                                 by = c("Site", "Month"))

2) Add GPS locations to the new Fish_and_Water data set using inner_join()

    Fish_survey_combined <- inner_join(Fish_and_Water,
                                       GPS_location,
                                       by = c("Site", "Transect"))

Adding new variables

We will use data on bird behaviour.

    Bird_Behaviour <- read.csv("Bird_Behaviour.csv",
                               header = TRUE,
                               stringsAsFactors = FALSE)

    # Get an overview
    str(Bird_Behaviour)

Schematic: a new column X3 is added to the existing columns X1 and X2.

    X1  X2          X1  X2  X3
    A   1           A   1   T
    B   1    -->    B   1   F
    A   2           A   2   T
    B   2           B   2   F

We want to add the new variable (column) log_FID.

Three possibilities:

a) Using $

    Bird_Behaviour$log_FID <- log(Bird_Behaviour$FID)

b) Using the [ ]-operator

    Bird_Behaviour[ , "log_FID"] <- log(Bird_Behaviour$FID)

c) Using the function mutate() from the dplyr package

    Bird_Behaviour <- mutate(Bird_Behaviour,
                             log_FID = log(FID))

The outcome:

    head(Bird_Behaviour)
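The three ways of adding a column give the same result. The sketch below checks this on a small hypothetical data frame; birds_demo and its values are made up and only borrow the column names Species, Sex and FID from the slides.

```r
library(dplyr)

# Hypothetical stand-in for Bird_Behaviour.csv; all values are made up
birds_demo <- data.frame(
  Species = c("Parus_major", "Passer_domesticus", "Turdus_merula"),
  Sex     = c("male", "female", "male"),
  FID     = c(10, 20, 40),
  stringsAsFactors = FALSE
)

# a) Using $
by_dollar <- birds_demo
by_dollar$log_FID <- log(by_dollar$FID)

# b) Using the [ ]-operator
by_bracket <- birds_demo
by_bracket[, "log_FID"] <- log(by_bracket$FID)

# c) Using mutate(), which returns a modified copy of the data frame
by_mutate <- mutate(birds_demo, log_FID = log(FID))

# Compare the new column across the three approaches
identical(by_dollar$log_FID, by_bracket$log_FID)
identical(by_dollar$log_FID, by_mutate$log_FID)
```

Note that mutate() does not change birds_demo itself; the result has to be assigned, as in the slide code.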

Adding new variables

We can split one column into two using the function separate() from the tidyr package:

    Bird_Behaviour <- separate(Bird_Behaviour,
                               Species,
                               c("Genus", "Species"),
                               sep = "_",
                               remove = TRUE)

    X1  X2           X1  X2.1  X2.2
    A   1_1          A   1     1
    B   1_2   -->    B   1     2
    A   2_1          A   2     1
    B   2_2          B   2     2

Combining variables

We can combine two columns into one using the function unite() from the tidyr package:

    Bird_Behaviour <- unite(Bird_Behaviour,
                            "Genus_Species",
                            c(Genus, Species),
                            sep = "_",
                            remove = TRUE)

    X1  X2.1  X2.2          X1  X2
    A   1     1             A   1_1
    B   1     2      -->    B   1_2
    A   2     1             A   2_1
    B   2     2             B   2_2

Subsetting data

You can subset your data with:
• The [ ]-operator
• The function subset()
• Functions from the dplyr package
    - slice()
    - filter()
    - sample_frac()
    - sample_n()
    - select()

Subsetting data with the [ ]-operator

Examples:

    # selects the first 4 columns
    Bird_Behaviour[ , 1:4]

    # selects rows 2 and 3
    Bird_Behaviour[c(2, 3), ]

    # selects rows 1 to 3 and columns 1 to 4
    Bird_Behaviour[1:3, 1:4]

    # selects rows 1 to 3 and 6, and columns 1 to 4 and 8
    Bird_Behaviour[c(1:3, 6), c(1:4, 8)]

Subsetting data with the [ ] and $-operators

Example:

    # selects all rows with males
    Bird_Behaviour[Bird_Behaviour$Sex == "male", ]
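The same subsets can also be written with subset() and with the dplyr functions listed above. The sketch below is one possible illustration on a hypothetical data frame; birds_demo and its values are made up.

```r
library(dplyr)

# Hypothetical stand-in for Bird_Behaviour.csv; all values are made up
birds_demo <- data.frame(
  Species = c("Parus_major", "Passer_domesticus", "Turdus_merula", "Parus_major"),
  Sex     = c("male", "female", "male", "female"),
  FID     = c(10, 20, 40, 15),
  stringsAsFactors = FALSE
)

# Three ways to select all rows with males
males_bracket <- birds_demo[birds_demo$Sex == "male", ]
males_subset  <- subset(birds_demo, Sex == "male")
males_filter  <- filter(birds_demo, Sex == "male")

# Further dplyr helpers
first_two   <- slice(birds_demo, 1:2)           # rows by position
two_cols    <- select(birds_demo, Species, FID) # columns by name
random_two  <- sample_n(birds_demo, 2)          # 2 randomly chosen rows
random_half <- sample_frac(birds_demo, 0.5)     # a random 50% of the rows

males_filter
two_cols
```

filter() and slice() work on rows, select() works on columns; sample_n() and sample_frac() return different rows on each run unless set.seed() is called first.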
