
Workshop 2.4: Data manipulation
Murray Logan, 10 Mar 2019

Section 1 Data manipulation
  • renaming: rename(, replace=), colnames()
  • subsetting: subset(, subset=, select=), select(, ...)
  • recoding values: revalue(, replace=), recode()
  • sorting: order(), arrange()
  • factors: factor(, lab=), factor(, levels=)


  1. Your turn • sort by Between and then Cond
     > head(data.1, 2)
       Between Plot Cond Time     Temp      LAT     LONG
     1      A1   P1    H    1 15.73546 17.25752 146.2397
     2      A1   P1    M    2 23.83643 14.07060 144.8877
     > arrange(data.1, Between, Cond)
        Between Plot Cond Time     Temp      LAT     LONG
     1       A1   P1    H    1 15.73546 17.25752 146.2397
     2       A1   P2    H    4 37.95281 18.41013 142.0585
     3       A1   P1    L    3 13.64371 20.74986 144.6884
     4       A1   P2    L    2 13.79532 20.38767 145.8359
     5       A1   P1    M    2 23.83643 14.07060 144.8877
     6       A1   P2    M    1 25.29508 18.46762 144.0437
     7       A2   P3    H    3 26.87429 20.14244 147.7174
     8       A2   P4    H    2 18.94612 20.06427 144.8924
     9       A2   P3    L    1 27.75781 20.33795 145.7753
     10      A2   P4    L    4 25.89843 14.52130 144.1700
     11      A2   P3    M    4 29.38325 19.68780 144.7944
     12      A2   P4    M    3 37.11781 18.64913 142.2459
     Cond is a factor, so it is sorted by its level order (H, L, M).
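For comparison, here is a minimal base R sketch of the same sort using order(), one of the base functions listed in Section 1. It rebuilds data.1 from the values printed on these slides. Storing Between, Plot and Cond as factors is an assumption, but it is consistent with the levels() output shown later.

```r
# Rebuild data.1 from the values printed on the slides (factor coding assumed)
data.1 <- data.frame(
  Between = factor(rep(c("A1", "A2"), each = 6)),
  Plot    = factor(rep(c("P1", "P2", "P3", "P4"), each = 3)),
  Cond    = factor(rep(c("H", "M", "L"), times = 4)),
  Time    = rep(1:4, times = 3),
  Temp    = c(15.73546, 23.83643, 13.64371, 37.95281, 25.29508, 13.79532,
              26.87429, 29.38325, 27.75781, 18.94612, 37.11781, 25.89843),
  LAT     = c(17.25752, 14.07060, 20.74986, 18.41013, 18.46762, 20.38767,
              20.14244, 19.68780, 20.33795, 20.06427, 18.64913, 14.52130),
  LONG    = c(146.2397, 144.8877, 144.6884, 142.0585, 144.0437, 145.8359,
              147.7174, 144.7944, 145.7753, 144.8924, 142.2459, 144.1700)
)

# order() returns the row indices that sort by Between, then by Cond;
# factors are ordered by their level order (H, L, M), ties keep input order
idx <- order(data.1$Between, data.1$Cond)
sorted <- data.1[idx, ]
print(idx)

# Base subsetting keeps the original row names (1, 4, 3, ...)
print(rownames(sorted))
```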

  2. Your turn • sort by Condition (Cond) and then by the ratio of Temp to LAT

  3. Your turn • sort by Condition and then the ratio of Temp to LAT
     > arrange(data.1, Cond, Temp/LAT)
        Between Plot Cond Time     Temp      LAT     LONG
     1       A1   P1    H    1 15.73546 17.25752 146.2397
     2       A2   P4    H    2 18.94612 20.06427 144.8924
     3       A2   P3    H    3 26.87429 20.14244 147.7174
     4       A1   P2    H    4 37.95281 18.41013 142.0585
     5       A1   P1    L    3 13.64371 20.74986 144.6884
     6       A1   P2    L    2 13.79532 20.38767 145.8359
     7       A2   P3    L    1 27.75781 20.33795 145.7753
     8       A2   P4    L    4 25.89843 14.52130 144.1700
     9       A1   P2    M    1 25.29508 18.46762 144.0437
     10      A2   P3    M    4 29.38325 19.68780 144.7944
     11      A1   P1    M    2 23.83643 14.07060 144.8877
     12      A2   P4    M    3 37.11781 18.64913 142.2459
     A sort key can be an expression: Temp/LAT is computed on the fly and is not added to the data.

  4. Section 3 Manipulating factors

  5. Manipulating factors
     • re-levelling
     • re-labelling
     • technically these operations are performed on single variables (vectors)
     > levels(data.1$Cond)
     [1] "H" "L" "M"

  6. Re-levelling (sorting) factors
     > data.3 <- data.1
     > levels(data.3$Cond)
     [1] "H" "L" "M"
     > data.3$Cond <- factor(data.3$Cond, levels=c("L","M","H"))
     > levels(data.3$Cond)
     [1] "L" "M" "H"
     > head(data.3)
       Between Plot Cond Time     Temp      LAT     LONG
     1      A1   P1    H    1 15.73546 17.25752 146.2397
     2      A1   P1    M    2 23.83643 14.07060 144.8877
     3      A1   P1    L    3 13.64371 20.74986 144.6884
     4      A1   P2    H    4 37.95281 18.41013 142.0585
     5      A1   P2    M    1 25.29508 18.46762 144.0437
     6      A1   P2    L    2 13.79532 20.38767 145.8359
     Only the order of the levels changes. The data values are untouched.

  7. Re-labelling factors
     > data.3 <- data.1
     > levels(data.3$Cond)
     [1] "H" "L" "M"
     > data.3$Cond <- factor(data.3$Cond, labels=c("High","Low","Medium"))
     > levels(data.3$Cond)
     [1] "High" "Low" "Medium"
     > head(data.3)
       Between Plot   Cond Time     Temp      LAT     LONG
     1      A1   P1   High    1 15.73546 17.25752 146.2397
     2      A1   P1 Medium    2 23.83643 14.07060 144.8877
     3      A1   P1    Low    3 13.64371 20.74986 144.6884
     4      A1   P2   High    4 37.95281 18.41013 142.0585
     5      A1   P2 Medium    1 25.29508 18.46762 144.0437
     6      A1   P2    Low    2 13.79532 20.38767 145.8359
     The labels are matched to the existing levels in their current order (H, L, M), so they must be supplied in that order.

  8. Re-levelling and re-labelling factors
     > data.3 <- data.1
     > levels(data.3$Cond)
     [1] "H" "L" "M"
     > data.3$Cond <- factor(data.3$Cond, levels=c('L','M','H'),
     +                       labels=c("Low","Medium","High"))
     > levels(data.3$Cond)
     [1] "Low" "Medium" "High"
     > head(data.3)
       Between Plot   Cond Time     Temp      LAT     LONG
     1      A1   P1   High    1 15.73546 17.25752 146.2397
     2      A1   P1 Medium    2 23.83643 14.07060 144.8877
     3      A1   P1    Low    3 13.64371 20.74986 144.6884
     4      A1   P2   High    4 37.95281 18.41013 142.0585
     5      A1   P2 Medium    1 25.29508 18.46762 144.0437
     6      A1   P2    Low    2 13.79532 20.38767 145.8359
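This base R sketch shows the trap that comes with labels=. Because labels are matched to the current level order, relabelling without levels= attaches the wrong names and raises no error. The vector cond is a made-up stand-in for data.1$Cond.

```r
# Made-up stand-in for data.1$Cond; levels sort alphabetically: H, L, M
cond <- factor(c("H", "M", "L", "H"))
levels(cond)

# Pitfall: labels follow the existing level order (H, L, M),
# so H silently becomes "Low", L becomes "Medium" and M becomes "High"
wrong <- factor(cond, labels = c("Low", "Medium", "High"))

# Safer: state the level order explicitly, then label it
right <- factor(cond, levels = c("L", "M", "H"),
                labels = c("Low", "Medium", "High"))

data.frame(cond, wrong, right)
```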

  9. Re-labelling factors
     > data.3 <- data.1 %>% mutate(Cond=recode(Cond, 'L'='Low', 'M'='Medium'))
     > levels(data.3$Cond)
     [1] "H" "Low" "Medium"
     > data.3
        Between Plot   Cond Time     Temp      LAT     LONG
     1       A1   P1      H    1 15.73546 17.25752 146.2397
     2       A1   P1 Medium    2 23.83643 14.07060 144.8877
     3       A1   P1    Low    3 13.64371 20.74986 144.6884
     4       A1   P2      H    4 37.95281 18.41013 142.0585
     5       A1   P2 Medium    1 25.29508 18.46762 144.0437
     6       A1   P2    Low    2 13.79532 20.38767 145.8359
     7       A2   P3      H    3 26.87429 20.14244 147.7174
     8       A2   P3 Medium    4 29.38325 19.68780 144.7944
     9       A2   P3    Low    1 27.75781 20.33795 145.7753
     10      A2   P4      H    2 18.94612 20.06427 144.8924
     11      A2   P4 Medium    3 37.11781 18.64913 142.2459
     12      A2   P4    Low    4 25.89843 14.52130 144.1700
     recode() relabels the matched values but keeps the original level order. The unmatched value (H) is left unchanged.

  10. Re-levelling & labelling
      > data.3 <- data.1 %>% mutate(Cond=recode_factor(Cond, 'L'='Low', 'M'='Medium'))
      > levels(data.3$Cond)
      [1] "Low" "Medium" "H"
      > data.3
         Between Plot   Cond Time     Temp      LAT     LONG
      1       A1   P1      H    1 15.73546 17.25752 146.2397
      2       A1   P1 Medium    2 23.83643 14.07060 144.8877
      3       A1   P1    Low    3 13.64371 20.74986 144.6884
      4       A1   P2      H    4 37.95281 18.41013 142.0585
      5       A1   P2 Medium    1 25.29508 18.46762 144.0437
      6       A1   P2    Low    2 13.79532 20.38767 145.8359
      7       A2   P3      H    3 26.87429 20.14244 147.7174
      8       A2   P3 Medium    4 29.38325 19.68780 144.7944
      9       A2   P3    Low    1 27.75781 20.33795 145.7753
      10      A2   P4      H    2 18.94612 20.06427 144.8924
      11      A2   P4 Medium    3 37.11781 18.64913 142.2459
      12      A2   P4    Low    4 25.89843 14.52130 144.1700
      recode_factor() also re-orders the levels to follow the order of the replacements. The unmatched level (H) goes last.

  11. Re-levelling & labelling
      You might also want to check out the forcats package.
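Here is a minimal sketch of the forcats suggestion using fct_relevel() and fct_recode(). It assumes the forcats package is installed, and cond is a made-up stand-in for data.1$Cond.

```r
library(forcats)

cond <- factor(c("H", "M", "L", "H"))

# fct_relevel(): move the named levels to the front, in the order given
cond_lmh <- fct_relevel(cond, "L", "M", "H")
levels(cond_lmh)   # "L" "M" "H"

# fct_recode(): new_name = "old_name"; the level order is kept
cond_lab <- fct_recode(cond_lmh, Low = "L", Medium = "M", High = "H")
levels(cond_lab)   # "Low" "Medium" "High"
as.character(cond_lab)
```

Naming the old value explicitly on each line avoids the order-dependence of factor(, labels=).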

  12. Section 4 Subset columns

  13. Selecting columns (select)
      > select(data.1, Between, Plot, Cond, Time, Temp)
         Between Plot Cond Time     Temp
      1       A1   P1    H    1 15.73546
      2       A1   P1    M    2 23.83643
      3       A1   P1    L    3 13.64371
      4       A1   P2    H    4 37.95281
      5       A1   P2    M    1 25.29508
      6       A1   P2    L    2 13.79532
      7       A2   P3    H    3 26.87429
      8       A2   P3    M    4 29.38325
      9       A2   P3    L    1 27.75781
      10      A2   P4    H    2 18.94612
      11      A2   P4    M    3 37.11781
      12      A2   P4    L    4 25.89843

  14. Selecting columns (select)
      > select(data.1, -LAT, -LONG)
         Between Plot Cond Time     Temp
      1       A1   P1    H    1 15.73546
      2       A1   P1    M    2 23.83643
      3       A1   P1    L    3 13.64371
      4       A1   P2    H    4 37.95281
      5       A1   P2    M    1 25.29508
      6       A1   P2    L    2 13.79532
      7       A2   P3    H    3 26.87429
      8       A2   P3    M    4 29.38325
      9       A2   P3    L    1 27.75781
      10      A2   P4    H    2 18.94612
      11      A2   P4    M    3 37.11781
      12      A2   P4    L    4 25.89843

  15. Selecting columns (select): helper functions
      • contains()
      • ends_with()
      • starts_with()
      • matches()
      • everything()
      • selections must evaluate to (column) indices

  16. Selecting columns (select): helper functions
      > select(data.1, contains('L'))
         Plot      LAT     LONG
      1    P1 17.25752 146.2397
      2    P1 14.07060 144.8877
      3    P1 20.74986 144.6884
      4    P2 18.41013 142.0585
      5    P2 18.46762 144.0437
      6    P2 20.38767 145.8359
      7    P3 20.14244 147.7174
      8    P3 19.68780 144.7944
      9    P3 20.33795 145.7753
      10   P4 20.06427 144.8924
      11   P4 18.64913 142.2459
      12   P4 14.52130 144.1700
      contains() ignores case by default, which is why Plot is selected along with LAT and LONG.
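contains(), starts_with(), ends_with() and matches() all take an ignore.case argument that defaults to TRUE. This short sketch uses a one-row data frame with data.1's column names and placeholder values.

```r
library(dplyr)

# One row of placeholder values; only the column names matter here
d <- data.frame(Between = "A1", Plot = "P1", Cond = "H", Time = 1L,
                Temp = 15.7, LAT = 17.3, LONG = 146.2)

names(select(d, contains("L")))                       # Plot, LAT, LONG
names(select(d, contains("L", ignore.case = FALSE)))  # LAT, LONG only
```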

  17. Selecting columns (select): helper functions
      > select(data.1, starts_with('L'))
              LAT     LONG
      1  17.25752 146.2397
      2  14.07060 144.8877
      3  20.74986 144.6884
      4  18.41013 142.0585
      5  18.46762 144.0437
      6  20.38767 145.8359
      7  20.14244 147.7174
      8  19.68780 144.7944
      9  20.33795 145.7753
      10 20.06427 144.8924
      11 18.64913 142.2459
      12 14.52130 144.1700

  18. Selecting columns (select): helper functions
      > select(data.1, ends_with('t'))
         Plot      LAT
      1    P1 17.25752
      2    P1 14.07060
      3    P1 20.74986
      4    P2 18.41013
      5    P2 18.46762
      6    P2 20.38767
      7    P3 20.14244
      8    P3 19.68780
      9    P3 20.33795
      10   P4 20.06427
      11   P4 18.64913
      12   P4 14.52130

  19. Selecting columns (select): helper functions
      > select(data.1, matches('^T[a-z]m.'))
         Time     Temp
      1     1 15.73546
      2     2 23.83643
      3     3 13.64371
      4     4 37.95281
      5     1 25.29508
      6     2 13.79532
      7     3 26.87429
      8     4 29.38325
      9     1 27.75781
      10    2 18.94612
      11    3 37.11781
      12    4 25.89843
      The regular expression ^T[a-z]m. matches names that start with T, followed by one lower-case letter, then m, then any single character.

  20. Regular expressions (regexp)
      https://www.rstudio.com/resources/cheatsheets/raw/master/regex.pdf

  21. Selecting columns (select): helper functions
      > select(data.1, Between:Temp)
         Between Plot Cond Time     Temp
      1       A1   P1    H    1 15.73546
      2       A1   P1    M    2 23.83643
      3       A1   P1    L    3 13.64371
      4       A1   P2    H    4 37.95281
      5       A1   P2    M    1 25.29508
      6       A1   P2    L    2 13.79532
      7       A2   P3    H    3 26.87429
      8       A2   P3    M    4 29.38325
      9       A2   P3    L    1 27.75781
      10      A2   P4    H    2 18.94612
      11      A2   P4    M    3 37.11781
      12      A2   P4    L    4 25.89843

  22. Your turn • select lat, long, and the cloud.. columns
      > head(nasa)
             lat   long month year cloudhigh cloudlow cloudmid ozone pressure surftemp temperature
      1 36.20000 -113.8     1 1995      26.0      7.5     34.5   304      835    272.7       272.1
      2 33.70435 -113.8     1 1995      20.0     11.5     32.5   304      940    279.5       282.2
      3 31.20870 -113.8     1 1995      16.0     16.5     26.0   298      960    284.7       285.2
      4 28.71304 -113.8     1 1995      13.0     20.5     14.5   276      990    289.3       290.7
      5 26.21739 -113.8     1 1995       7.5     26.0     10.5   274     1000    292.2       292.7
      6 23.72174 -113.8     1 1995       8.0     30.0      9.5   264     1000    294.1       293.6

  23. Your turn
      > head(select(nasa, lat, long, starts_with("cloud")))
             lat   long cloudhigh cloudlow cloudmid
      1 36.20000 -113.8      26.0      7.5     34.5
      2 33.70435 -113.8      20.0     11.5     32.5
      3 31.20870 -113.8      16.0     16.5     26.0
      4 28.71304 -113.8      13.0     20.5     14.5
      5 26.21739 -113.8       7.5     26.0     10.5
      6 23.72174 -113.8       8.0     30.0      9.5

  24. Your turn • select rep, time and only the species that DON'T contain pora
      > tikus[1:10, c(1:3,76:77)]
          Psammocora contigua Psammocora digitata Pocillopora damicornis time rep
      V1                    0                   0                     79   81   1
      V2                    0                   0                     51   81   2
      V3                    0                   0                     42   81   3
      V4                    0                   0                     15   81   4
      V5                    0                   0                      9   81   5
      V6                    0                   0                     72   81   6
      V7                    0                   0                      0   81   7
      V8                    0                   0                     16   81   8
      V9                    0                   0                      0   81   9
      V10                   0                   0                     16   81  10

  25. Your turn • select rep, time and only the species that DON'T contain pora
      > dplyr::select(tikus, -contains('pora'))
      > ## OR if we wanted to alter the order...
      > dplyr::select(tikus, rep, time, everything(), -contains('pora'))
      The first version leaves time and rep as the last two columns. The second moves rep and time to the front.

  26. Select awkward names
      Column names that contain spaces must be wrapped in backticks:
      > dplyr::select(tikus, `Pocillopora damicornis`)
          Pocillopora damicornis
      V1                      79
      V2                      51
      V3                      42
      V4                      15
      V5                       9
      V6                      72
      V7                       0
      V8                      16
      V9                       0
      V10                     16
      (the remaining rows, V11 to V60, are not reproduced here)
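This sketch shows how names with spaces survive into a data frame and how to rename them to something easier to type. tikus_small and the short name Pdam are hypothetical. The two counts are rows V1 and V2 from the slide above.

```r
library(dplyr)

# check.names = FALSE stops data.frame() from converting spaces to dots
tikus_small <- data.frame(`Pocillopora damicornis` = c(79, 51),
                          `Psammocora contigua` = c(0, 0),
                          check.names = FALSE)

# Backticks let select() and rename() refer to non-syntactic names
select(tikus_small, `Pocillopora damicornis`)
tikus_small <- rename(tikus_small, Pdam = `Pocillopora damicornis`)
names(tikus_small)   # "Pdam" "Psammocora contigua"
```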

  27. Re-naming columns (vectors)
      > rename(data.1, Condition=Cond, Temperature=Temp)
         Between Plot Condition Time Temperature      LAT     LONG
      1       A1   P1         H    1    15.73546 17.25752 146.2397
      2       A1   P1         M    2    23.83643 14.07060 144.8877
      3       A1   P1         L    3    13.64371 20.74986 144.6884
      4       A1   P2         H    4    37.95281 18.41013 142.0585
      5       A1   P2         M    1    25.29508 18.46762 144.0437
      6       A1   P2         L    2    13.79532 20.38767 145.8359
      7       A2   P3         H    3    26.87429 20.14244 147.7174
      8       A2   P3         M    4    29.38325 19.68780 144.7944
      9       A2   P3         L    1    27.75781 20.33795 145.7753
      10      A2   P4         H    2    18.94612 20.06427 144.8924
      11      A2   P4         M    3    37.11781 18.64913 142.2459
      12      A2   P4         L    4    25.89843 14.52130 144.1700
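Section 1 lists colnames() as the base R route to the same result. Here is a minimal sketch, where d is a one-row placeholder with data.1's column names.

```r
# One-row placeholder with data.1's column names
d <- data.frame(Between = "A1", Plot = "P1", Cond = "H", Time = 1L,
                Temp = 15.7, LAT = 17.3, LONG = 146.2)

# Look up each old name's position, then overwrite those entries
old <- c("Cond", "Temp")
new <- c("Condition", "Temperature")
colnames(d)[match(old, colnames(d))] <- new
colnames(d)
```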

  28. Section 5 Filtering

  29. Filtering
      > filter(data.1, Cond=='H')
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P2    H    4 37.95281 18.41013 142.0585
      3      A2   P3    H    3 26.87429 20.14244 147.7174
      4      A2   P4    H    2 18.94612 20.06427 144.8924
      > filter(data.1, Cond %in% c('H','M'))
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      3      A1   P2    H    4 37.95281 18.41013 142.0585
      4      A1   P2    M    1 25.29508 18.46762 144.0437
      5      A2   P3    H    3 26.87429 20.14244 147.7174
      6      A2   P3    M    4 29.38325 19.68780 144.7944
      7      A2   P4    H    2 18.94612 20.06427 144.8924
      8      A2   P4    M    3 37.11781 18.64913 142.2459

  30. Filtering
      > filter(data.1, Cond=='H' & Temp<25)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A2   P4    H    2 18.94612 20.06427 144.8924
      > filter(data.1, Cond=='H' | Temp<25)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      3      A1   P1    L    3 13.64371 20.74986 144.6884
      4      A1   P2    H    4 37.95281 18.41013 142.0585
      5      A1   P2    L    2 13.79532 20.38767 145.8359
      6      A2   P3    H    3 26.87429 20.14244 147.7174
      7      A2   P4    H    2 18.94612 20.06427 144.8924
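One behaviour to know when you combine conditions: if a condition evaluates to NA (for example, when Temp is missing), filter() drops that row. Base `[` indexing instead returns a row of NAs. Here is a sketch with a made-up three-row data frame.

```r
library(dplyr)

d <- data.frame(Cond = c("H", "H", "M"), Temp = c(15.7, NA, 23.8))

# TRUE & NA is NA: filter() keeps only rows where the condition is TRUE
filter(d, Cond == "H" & Temp < 25)     # 1 row

# Base `[` turns the NA in a logical index into a row of NAs
d[d$Cond == "H" & d$Temp < 25, ]       # 2 rows, the second all NA

# subset() drops NA rows, like filter()
subset(d, Cond == "H" & Temp < 25)     # 1 row
```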

  31. Your turn • keep only those rows with Temp less than 20 and either LAT greater than 20 or LONG less than 145

  32. Your turn • keep only those rows with Temp less than 20 and either LAT greater than 20 or LONG less than 145
      > filter(data.1, Temp<20 & (LAT>20 | LONG<145))
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    L    3 13.64371 20.74986 144.6884
      2      A1   P2    L    2 13.79532 20.38767 145.8359
      3      A2   P4    H    2 18.94612 20.06427 144.8924
      The parentheses matter: & is evaluated before |, so without them the condition would be read as (Temp<20 & LAT>20) | LONG<145.

  33. Your turn • filter to the largest ozone values for the second month of the last year
      > glimpse(nasa)
      Observations: 41,472
      Variables: 11
      $ lat         <dbl> 36.200000, 33.704348, 31.208696, 28.713043, 26.217391, 23.721739,...
      $ long        <dbl> -113.8000, -113.8000, -113.8000, -113.8000, -113.8000, -113.8000,...
      $ month       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
      $ year        <int> 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995, 1995,...
      $ cloudhigh   <dbl> 26.0, 20.0, 16.0, 13.0, 7.5, 8.0, 14.5, 19.5, 22.5, 21.0, 19.0, 1...
      $ cloudlow    <dbl> 7.5, 11.5, 16.5, 20.5, 26.0, 30.0, 29.5, 26.5, 27.5, 26.0, 28.5, ...
      $ cloudmid    <dbl> 34.5, 32.5, 26.0, 14.5, 10.5, 9.5, 11.0, 17.5, 18.5, 16.5, 12.5, ...
      $ ozone       <dbl> 304, 304, 298, 276, 274, 264, 258, 252, 250, 250, 248, 248, 250, ...
      $ pressure    <dbl> 835, 940, 960, 990, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 100...
      $ surftemp    <dbl> 272.7, 279.5, 284.7, 289.3, 292.2, 294.1, 295.0, 298.3, 300.1, 30...
      $ temperature <dbl> 272.1, 282.2, 285.2, 290.7, 292.7, 293.6, 294.6, 296.9, 297.8, 29...

  34. Your turn • filter to the largest ozone values for the second month of the last year
      > filter(nasa, year==max(year) & month==2) %>% arrange(-ozone) %>% head(5)
      > filter(nasa, year==max(year) & month==2) %>% arrange(-ozone) %>% slice(1:5)
      > ##OR
      > filter(nasa, year==max(year) & month==2) %>% top_n(5, ozone)
      arrange(-ozone) sorts in descending order, like arrange(desc(ozone)). top_n() does not sort its result, and it can return more than five rows if ozone values are tied.
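The three answers can disagree when there are ties. This sketch uses hypothetical ozone values. slice_max() is the dplyr 1.0.0 replacement for top_n() and is not used on the slides.

```r
library(dplyr)

# Hypothetical readings with a tie at 320
d <- data.frame(id = 1:5, ozone = c(300, 320, 310, 330, 320))

# Sorting then head() always returns exactly n rows
arrange(d, desc(ozone)) %>% head(2)     # ids 4, 2

# top_n() keeps ties (3 rows here) and keeps the original row order
top_n(d, 2, ozone)                      # ids 2, 4, 5

# slice_max() also keeps ties by default (3 rows), sorted by ozone
slice_max(d, ozone, n = 2)
```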

  35. Your turn • filter to all ozone values between 320 and 325 in the first month of the last year

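  36. Your turn • filter to all ozone values between 320 and 325 in the first month of the last year
      > filter(nasa, ozone>320 & ozone<325, month==first(month), year==last(year))
      > ##OR
      > filter(nasa, between(ozone,320,325), month==first(month), year==last(year))
      Comma-separated conditions are combined with &. The two versions are not quite identical: between() includes the end points (ozone >= 320 & ozone <= 325). Also, first(month) and last(year) rely on the rows being ordered by month and year, whereas min(month) and max(year) do not.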
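Here is a quick check, on a made-up ozone vector, that between() includes its end points while the strict inequalities exclude them.

```r
library(dplyr)

ozone <- c(320, 322.5, 325, 326)

# between(x, left, right) is x >= left & x <= right
between(ozone, 320, 325)        # TRUE TRUE TRUE FALSE

# Strict inequalities exclude the end points
ozone > 320 & ozone < 325       # FALSE TRUE FALSE FALSE
```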

  37. Slicing (filter by row number)
      > slice(data.1, 1:4)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      3      A1   P1    L    3 13.64371 20.74986 144.6884
      4      A1   P2    H    4 37.95281 18.41013 142.0585
      > slice(data.1, c(1:4,7))
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      3      A1   P1    L    3 13.64371 20.74986 144.6884
      4      A1   P2    H    4 37.95281 18.41013 142.0585
      5      A2   P3    H    3 26.87429 20.14244 147.7174

  38. Sampling
      > sample_n(data.1, 10, replace=TRUE)
      Returns 10 randomly drawn rows. Because replace=TRUE, a row can be drawn more than once. In the output on the slide, repeated draws of a row appear with row names such as 5, 5.1 and 5.2. The rows returned change from run to run unless the random seed is fixed with set.seed().

  39. Sampling
      > sample_frac(data.1, 0.5, replace=TRUE)
      Returns a random sample of half the rows (6 of the 12), again drawn with replacement.

  40. Effects of filtering
      > #examine the levels of the Cond factor
      > levels(data.1$Cond)
      [1] "H" "L" "M"

  41. Effects of filtering
      > #subset the dataset to just Plot P1
      > data.3 <- filter(data.1, Plot=='P1')
      > #examine subset data
      > data.3
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      3      A1   P1    L    3 13.64371 20.74986 144.6884
      > #examine the levels of each factor
      > levels(data.3$Cond)
      [1] "H" "L" "M"
      > levels(data.3$Plot)
      [1] "P1" "P2" "P3" "P4"
      > levels(data.3$Between)
      [1] "A1" "A2"
      Filtering removes rows but not factor levels. Plot still lists P2 to P4, and Between still lists A2, even though none of those rows remain.

  42. Effects of filtering: correcting all factors
      > #subset the dataset to just Plot P1
      > data.3 <- filter(data.1, Plot=='P1')
      > #drop the unused factor levels from all factors
      > data.3 <- droplevels(data.3)
      > #examine the levels of each factor
      > levels(data.3$Cond)
      [1] "H" "L" "M"
      > levels(data.3$Plot)
      [1] "P1"
      > levels(data.3$Between)
      [1] "A1"

  43. Effects of filtering: correcting a single factor
      > #subset the dataset to just Plot P1
      > data.3 <- filter(data.1, Plot=='P1')
      > #drop the unused factor levels from Plot
      > data.3$Plot <- factor(data.3$Plot)
      > #examine the levels of each factor
      > levels(data.3$Cond)
      [1] "H" "L" "M"
      > levels(data.3$Plot)
      [1] "P1"
      > levels(data.3$Between)
      [1] "A1" "A2"
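The same issue and fix apply to a lone factor vector, and droplevels() also has a method for factors. Here is a base R sketch with a made-up Plot vector.

```r
plot <- factor(c("P1", "P2", "P3", "P4"))
p1 <- plot[plot == "P1"]

levels(p1)                 # "P1" "P2" "P3" "P4": subsetting keeps unused levels
levels(droplevels(p1))     # "P1"
levels(factor(p1))         # "P1", re-creating the factor as on the slide
table(p1)                  # unused levels still appear, with zero counts
```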

  44. Section 6 Adding columns

  45. Mutate
      > mutate(data.1, LL=LAT+LONG)
         Between Plot Cond Time     Temp      LAT     LONG       LL
      1       A1   P1    H    1 15.73546 17.25752 146.2397 163.4972
      2       A1   P1    M    2 23.83643 14.07060 144.8877 158.9583
      3       A1   P1    L    3 13.64371 20.74986 144.6884 165.4383
      4       A1   P2    H    4 37.95281 18.41013 142.0585 160.4686
      5       A1   P2    M    1 25.29508 18.46762 144.0437 162.5113
      6       A1   P2    L    2 13.79532 20.38767 145.8359 166.2236
      7       A2   P3    H    3 26.87429 20.14244 147.7174 167.8598
      8       A2   P3    M    4 29.38325 19.68780 144.7944 164.4822
      9       A2   P3    L    1 27.75781 20.33795 145.7753 166.1133
      10      A2   P4    H    2 18.94612 20.06427 144.8924 164.9567
      11      A2   P4    M    3 37.11781 18.64913 142.2459 160.8950
      12      A2   P4    L    4 25.89843 14.52130 144.1700 158.6913

  46. Mutate
      > mutate(data.1, logTemp=log(Temp))
         Between Plot Cond Time     Temp      LAT     LONG  logTemp
      1       A1   P1    H    1 15.73546 17.25752 146.2397 2.755917
      2       A1   P1    M    2 23.83643 14.07060 144.8877 3.171215
      3       A1   P1    L    3 13.64371 20.74986 144.6884 2.613279
      4       A1   P2    H    4 37.95281 18.41013 142.0585 3.636343
      5       A1   P2    M    1 25.29508 18.46762 144.0437 3.230610
      6       A1   P2    L    2 13.79532 20.38767 145.8359 2.624329
      7       A2   P3    H    3 26.87429 20.14244 147.7174 3.291170
      8       A2   P3    M    4 29.38325 19.68780 144.7944 3.380425
      9       A2   P3    L    1 27.75781 20.33795 145.7753 3.323517
      10      A2   P4    H    2 18.94612 20.06427 144.8924 2.941599
      11      A2   P4    M    3 37.11781 18.64913 142.2459 3.614097
      12      A2   P4    L    4 25.89843 14.52130 144.1700 3.254182

  47. Mutate
      > mutate(data.1, MeanTemp=mean(Temp), cTemp=Temp-MeanTemp)
      > ## OR if just want the centered variable..
      > #mutate(data.1, cTemp=Temp-mean(Temp))
         Between Plot Cond Time     Temp      LAT     LONG MeanTemp       cTemp
      1       A1   P1    H    1 15.73546 17.25752 146.2397 24.68638  -8.9509150
      2       A1   P1    M    2 23.83643 14.07060 144.8877 24.68638  -0.8499436
      3       A1   P1    L    3 13.64371 20.74986 144.6884 24.68638 -11.0426630
      4       A1   P2    H    4 37.95281 18.41013 142.0585 24.68638  13.2664312
      5       A1   P2    M    1 25.29508 18.46762 144.0437 24.68638   0.6087009
      6       A1   P2    L    2 13.79532 20.38767 145.8359 24.68638 -10.8910607
      7       A2   P3    H    3 26.87429 20.14244 147.7174 24.68638   2.1879137
      8       A2   P3    M    4 29.38325 19.68780 144.7944 24.68638   4.6968702
      9       A2   P3    L    1 27.75781 20.33795 145.7753 24.68638   3.0714367
      10      A2   P4    H    2 18.94612 20.06427 144.8924 24.68638  -5.7402607
      11      A2   P4    M    3 37.11781 18.64913 142.2459 24.68638  12.4314348
      12      A2   P4    L    4 25.89843 14.52130 144.1700 24.68638   1.2120555
      Within one mutate() call, a newly created column (MeanTemp) can be used straight away to build the next one (cTemp).

  48. Mutate: window functions
      > mutate(data.1, leadTemp=lead(Temp), lagTemp=lag(Temp))
         Between Plot Cond Time     Temp      LAT     LONG leadTemp  lagTemp
      1       A1   P1    H    1 15.73546 17.25752 146.2397 23.83643       NA
      2       A1   P1    M    2 23.83643 14.07060 144.8877 13.64371 15.73546
      3       A1   P1    L    3 13.64371 20.74986 144.6884 37.95281 23.83643
      4       A1   P2    H    4 37.95281 18.41013 142.0585 25.29508 13.64371
      5       A1   P2    M    1 25.29508 18.46762 144.0437 13.79532 37.95281
      6       A1   P2    L    2 13.79532 20.38767 145.8359 26.87429 25.29508
      7       A2   P3    H    3 26.87429 20.14244 147.7174 29.38325 13.79532
      8       A2   P3    M    4 29.38325 19.68780 144.7944 27.75781 26.87429
      9       A2   P3    L    1 27.75781 20.33795 145.7753 18.94612 29.38325
      10      A2   P4    H    2 18.94612 20.06427 144.8924 37.11781 27.75781
      11      A2   P4    M    3 37.11781 18.64913 142.2459 25.89843 18.94612
      12      A2   P4    L    4 25.89843 14.52130 144.1700       NA 37.11781

  49. Mutate: window functions
      > mutate(data.1, rankTime=min_rank(Time), denseRankTime=dense_rank(Time))
         Between Plot Cond Time     Temp      LAT     LONG rankTime denseRankTime
      1       A1   P1    H    1 15.73546 17.25752 146.2397        1             1
      2       A1   P1    M    2 23.83643 14.07060 144.8877        4             2
      3       A1   P1    L    3 13.64371 20.74986 144.6884        7             3
      4       A1   P2    H    4 37.95281 18.41013 142.0585       10             4
      5       A1   P2    M    1 25.29508 18.46762 144.0437        1             1
      6       A1   P2    L    2 13.79532 20.38767 145.8359        4             2
      7       A2   P3    H    3 26.87429 20.14244 147.7174        7             3
      8       A2   P3    M    4 29.38325 19.68780 144.7944       10             4
      9       A2   P3    L    1 27.75781 20.33795 145.7753        1             1
      10      A2   P4    H    2 18.94612 20.06427 144.8924        4             2
      11      A2   P4    M    3 37.11781 18.64913 142.2459        7             3
      12      A2   P4    L    4 25.89843 14.52130 144.1700       10             4

  50. Mutate: window functions
      > mutate(data.1, rowTemp=row_number(Temp), rowTime=row_number(Time))
         Between Plot Cond Time     Temp      LAT     LONG rowTemp rowTime
      1       A1   P1    H    1 15.73546 17.25752 146.2397       3       1
      2       A1   P1    M    2 23.83643 14.07060 144.8877       5       4
      3       A1   P1    L    3 13.64371 20.74986 144.6884       1       7
      4       A1   P2    H    4 37.95281 18.41013 142.0585      12      10
      5       A1   P2    M    1 25.29508 18.46762 144.0437       6       2
      6       A1   P2    L    2 13.79532 20.38767 145.8359       2       5
      7       A2   P3    H    3 26.87429 20.14244 147.7174       8       8
      8       A2   P3    M    4 29.38325 19.68780 144.7944      10      11
      9       A2   P3    L    1 27.75781 20.33795 145.7753       9       3
      10      A2   P4    H    2 18.94612 20.06427 144.8924       4       6
      11      A2   P4    M    3 37.11781 18.64913 142.2459      11       9
      12      A2   P4    L    4 25.89843 14.52130 144.1700       7      12
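Time contains ties, and ties are where the three ranking functions differ. This sketch uses a small made-up vector.

```r
library(dplyr)

x <- c(10, 20, 20, 30)

row_number(x)   # 1 2 3 4: ties broken by position
min_rank(x)     # 1 2 2 4: ties share the lowest rank, the next rank is skipped
dense_rank(x)   # 1 2 2 3: ties share a rank, no gaps
```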

  51. Mutate: window functions
      > mutate(data.1, ntile(Temp,4))
         Between Plot Cond Time     Temp      LAT     LONG ntile(Temp, 4)
      1       A1   P1    H    1 15.73546 17.25752 146.2397              1
      2       A1   P1    M    2 23.83643 14.07060 144.8877              2
      3       A1   P1    L    3 13.64371 20.74986 144.6884              1
      4       A1   P2    H    4 37.95281 18.41013 142.0585              4
      5       A1   P2    M    1 25.29508 18.46762 144.0437              2
      6       A1   P2    L    2 13.79532 20.38767 145.8359              1
      7       A2   P3    H    3 26.87429 20.14244 147.7174              3
      8       A2   P3    M    4 29.38325 19.68780 144.7944              4
      9       A2   P3    L    1 27.75781 20.33795 145.7753              3
      10      A2   P4    H    2 18.94612 20.06427 144.8924              2
      11      A2   P4    M    3 37.11781 18.64913 142.2459              4
      12      A2   P4    L    4 25.89843 14.52130 144.1700              3
      An unnamed expression becomes the column name. Naming it, e.g. mutate(data.1, qTemp=ntile(Temp,4)), gives a name that is easier to refer to later.

  52. Mutate: window functions
      > mutate(data.1, between(Temp,20,30))
         Between Plot Cond Time     Temp      LAT     LONG between(Temp, 20, 30)
      1       A1   P1    H    1 15.73546 17.25752 146.2397                 FALSE
      2       A1   P1    M    2 23.83643 14.07060 144.8877                  TRUE
      3       A1   P1    L    3 13.64371 20.74986 144.6884                 FALSE
      4       A1   P2    H    4 37.95281 18.41013 142.0585                 FALSE
      5       A1   P2    M    1 25.29508 18.46762 144.0437                  TRUE
      6       A1   P2    L    2 13.79532 20.38767 145.8359                 FALSE
      7       A2   P3    H    3 26.87429 20.14244 147.7174                  TRUE
      8       A2   P3    M    4 29.38325 19.68780 144.7944                  TRUE
      9       A2   P3    L    1 27.75781 20.33795 145.7753                  TRUE
      10      A2   P4    H    2 18.94612 20.06427 144.8924                 FALSE
      11      A2   P4    M    3 37.11781 18.64913 142.2459                 FALSE
      12      A2   P4    L    4 25.89843 14.52130 144.1700                  TRUE

  53. Mutate: window functions
      > mutate(data.1, fTemp=ifelse(Temp<20, 'Low',
      +   ifelse(between(Temp,20,30), 'Medium', 'High')))
      > ## OR
      > mutate(data.1, fTemp=case_when(Temp<20 ~ 'Low',
      +   between(Temp,20,30) ~ 'Medium',
      +   Temp>30 ~ 'High'))
      Both commands return:
         Between Plot Cond Time     Temp      LAT     LONG  fTemp
      1       A1   P1    H    1 15.73546 17.25752 146.2397    Low
      2       A1   P1    M    2 23.83643 14.07060 144.8877 Medium
      3       A1   P1    L    3 13.64371 20.74986 144.6884    Low
      4       A1   P2    H    4 37.95281 18.41013 142.0585   High
      5       A1   P2    M    1 25.29508 18.46762 144.0437 Medium
      6       A1   P2    L    2 13.79532 20.38767 145.8359    Low
      7       A2   P3    H    3 26.87429 20.14244 147.7174 Medium
      8       A2   P3    M    4 29.38325 19.68780 144.7944 Medium
      9       A2   P3    L    1 27.75781 20.33795 145.7753 Medium
      10      A2   P4    H    2 18.94612 20.06427 144.8924    Low
      11      A2   P4    M    3 37.11781 18.64913 142.2459   High
      12      A2   P4    L    4 25.89843 14.52130 144.1700 Medium

  54. Mutate: window functions
      > mutate(data.1, fTemp=cut(Temp, breaks=c(0,20,30,100),
      +   labels=c('Low','Medium','High')))
         Between Plot Cond Time     Temp      LAT     LONG  fTemp
      1       A1   P1    H    1 15.73546 17.25752 146.2397    Low
      2       A1   P1    M    2 23.83643 14.07060 144.8877 Medium
      3       A1   P1    L    3 13.64371 20.74986 144.6884    Low
      4       A1   P2    H    4 37.95281 18.41013 142.0585   High
      5       A1   P2    M    1 25.29508 18.46762 144.0437 Medium
      6       A1   P2    L    2 13.79532 20.38767 145.8359    Low
      7       A2   P3    H    3 26.87429 20.14244 147.7174 Medium
      8       A2   P3    M    4 29.38325 19.68780 144.7944 Medium
      9       A2   P3    L    1 27.75781 20.33795 145.7753 Medium
      10      A2   P4    H    2 18.94612 20.06427 144.8924    Low
      11      A2   P4    M    3 37.11781 18.64913 142.2459   High
      12      A2   P4    L    4 25.89843 14.52130 144.1700 Medium
      ifelse() and case_when() return character vectors. cut() returns a factor whose levels follow the breaks (Low, Medium, High).
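cut() and the ifelse()/case_when() versions agree on these data, but they treat a value of exactly 20 differently. cut() intervals are closed on the right by default ((0,20], (20,30], (30,100]), whereas between(20, 20, 30) is TRUE. This sketch uses made-up boundary values. Passing right=FALSE to cut() would switch it to left-closed intervals.

```r
library(dplyr)

temp <- c(19.9, 20, 25, 30, 30.1)

# Right-closed intervals: 20 falls in (0,20], i.e. "Low"
f_cut <- cut(temp, breaks = c(0, 20, 30, 100),
             labels = c("Low", "Medium", "High"))

# between() is inclusive, so 20 is classified as "Medium"
f_cw <- case_when(temp < 20 ~ "Low",
                  between(temp, 20, 30) ~ "Medium",
                  temp > 30 ~ "High")

data.frame(temp, f_cut, f_cw)
class(f_cut)  # "factor"
class(f_cw)   # "character"
```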

  55. Section 7 Summarising (aggregating) data

  56. Summarise
      > summarise(data.1, MeanTemp=mean(Temp), VarTemp=var(Temp), N=n())
        MeanTemp  VarTemp  N
      1 24.68638 65.72792 12
      > SE <- function(x) sd(x)/sqrt(length(x))
      > summarise(data.1, MeanTemp=mean(Temp), VarTemp=var(Temp), SEM=SE(Temp))
        MeanTemp  VarTemp      SEM
      1 24.68638 65.72792 2.340369

  57. Summarise
      > summarise_all(data.1, .funs=funs(mean,var))
        Between_mean Plot_mean Cond_mean Time_mean Temp_mean LAT_mean LONG_mean
      1           NA        NA        NA       2.5  24.68638 18.56219  144.7791
        Between_var Plot_var  Cond_var Time_var Temp_var  LAT_var LONG_var
      1   0.2727273 1.363636 0.7272727 1.363636 65.72792 5.048825 2.512696
      > summarise_if(data.1, is.numeric, .funs=funs(mean,var))
        Time_mean Temp_mean LAT_mean LONG_mean Time_var Temp_var  LAT_var LONG_var
      1       2.5  24.68638 18.56219  144.7791 1.363636 65.72792 5.048825 2.512696
      > summarise_at(data.1, vars(Temp,LAT), .funs=funs(mean,var))
        Temp_mean LAT_mean Temp_var  LAT_var
      1  24.68638 18.56219 65.72792 5.048825
      summarise_all() also applies the functions to the factors. Their means are NA, and their "variances" are computed on the underlying integer codes, so neither result is meaningful. summarise_if() with is.numeric avoids this.
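In dplyr 1.0.0 and later, summarise_all(), summarise_if() and summarise_at() are superseded by across(), and funs() is deprecated. This sketch shows the across() equivalents, using the first four rows of data.1. Note that across() orders the output columns by variable (Temp_mean, Temp_var, ...) rather than by function.

```r
library(dplyr)

# First four rows of data.1 (values from the slides)
d <- data.frame(Cond = factor(c("H", "M", "L", "H")),
                Temp = c(15.73546, 23.83643, 13.64371, 37.95281),
                LAT  = c(17.25752, 14.07060, 20.74986, 18.41013))

# Like summarise_at(d, vars(Temp, LAT), .funs = funs(mean, var))
summarise(d, across(c(Temp, LAT), list(mean = mean, var = var)))

# Like summarise_if(d, is.numeric, ...): the factor Cond is skipped
res <- summarise(d, across(where(is.numeric), list(mean = mean, var = var)))
names(res)  # "Temp_mean" "Temp_var" "LAT_mean" "LAT_var"
```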

  58. Summarise
      > count(data.1, Cond)
      # A tibble: 3 x 2
        Cond      n
        <fct> <int>
      1 H         4
      2 L         4
      3 M         4
      > count(data.1, Cond, between(Temp,20,30))
      # A tibble: 6 x 3
        Cond  `between(Temp, 20, 30)`     n
        <fct> <lgl>                   <int>
      1 H     FALSE                       3
      2 H     TRUE                        1
      3 L     FALSE                       2
      4 L     TRUE                        2
      5 M     FALSE                       1
      6 M     TRUE                        3

  59. Section 8 Piping

  60. Piping
      > head(data.1, 6)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      3      A1   P1    L    3 13.64371 20.74986 144.6884
      4      A1   P2    H    4 37.95281 18.41013 142.0585
      5      A1   P2    M    1 25.29508 18.46762 144.0437
      6      A1   P2    L    2 13.79532 20.38767 145.8359
      > data.1 %>% filter(Cond=='H') %>%
      +   select(Cond, starts_with('t'))
        Cond Time     Temp
      1    H    1 15.73546
      2    H    4 37.95281
      3    H    3 26.87429
      4    H    2 18.94612

  61. Section 9 Grouping (=aggregating)

  62. Grouping
      > data.1 %>% group_by(Between, Plot) %>%
      +   summarise(Mean=mean(Temp))
      # A tibble: 4 x 3
      # Groups:   Between [?]
        Between Plot   Mean
        <fct>   <fct> <dbl>
      1 A1      P1     17.7
      2 A1      P2     25.7
      3 A2      P3     28.0
      4 A2      P4     27.3
      summarise() removes the last grouping variable (Plot), so the result is still grouped by Between.

  63. Grouping

      > head(data.1, 6)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      3      A1   P1    L    3 13.64371 20.74986 144.6884
      4      A1   P2    H    4 37.95281 18.41013 142.0585
      5      A1   P2    M    1 25.29508 18.46762 144.0437
      6      A1   P2    L    2 13.79532 20.38767 145.8359
      > data.1 %>% group_by(Between,Plot) %>%
      +     summarise(Mean=mean(Temp), Var=var(Temp), N=n(), First=first(Temp))
      # A tibble: 4 x 6
      # Groups:   Between [?]
        Between Plot   Mean    Var     N First
        <fct>   <fct> <dbl>  <dbl> <int> <dbl>
      1 A1      P1     17.7  29.0      3  15.7
      2 A1      P2     25.7 146.       3  38.0
      3 A2      P3     28.0   1.62     3  26.9
      4 A2      P4     27.3  84.1      3  18.9
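
The `# Groups: Between [?]` line appears because `summarise()` drops only the last grouping variable, so the result is still grouped by `Between`. Since dplyr 1.0 the `.groups` argument makes this choice explicit. The sketch below (hand-typed `data.1` values, an assumption) requests an ungrouped result.

```r
library(dplyr)

# Grouping columns and Temp of data.1, hand-typed from the slides (assumed values)
data.1 <- data.frame(
  Between = factor(rep(c("A1", "A2"), each = 6)),
  Plot    = factor(rep(c("P1", "P2", "P3", "P4"), each = 3)),
  Temp    = c(15.73546, 23.83643, 13.64371, 37.95281, 25.29508, 13.79532,
              26.87429, 29.38325, 27.75781, 18.94612, 37.11781, 25.89843)
)

# One row per Between x Plot combination; .groups = "drop" removes all grouping
res <- data.1 %>%
  group_by(Between, Plot) %>%
  summarise(Mean  = mean(Temp),
            Var   = var(Temp),
            N     = n(),
            First = first(Temp),
            .groups = "drop")

print(res)
```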

  64. Grouping: mutate vs summarise

      > data.1 %>% group_by(Between,Plot) %>%
      +     summarise(Mean=mean(Temp))
      # A tibble: 4 x 3
      # Groups:   Between [?]
        Between Plot   Mean
        <fct>   <fct> <dbl>
      1 A1      P1     17.7
      2 A1      P2     25.7
      3 A2      P3     28.0
      4 A2      P4     27.3
      > data.1 %>% group_by(Between,Plot) %>%
      +     mutate(Mean=mean(Temp))
      # A tibble: 12 x 8
      # Groups:   Between, Plot [4]
         Between Plot  Cond   Time  Temp   LAT  LONG  Mean
         <fct>   <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>
       1 A1      P1    H         1  15.7  17.3  146.  17.7
       2 A1      P1    M         2  23.8  14.1  145.  17.7
       3 A1      P1    L         3  13.6  20.7  145.  17.7
       4 A1      P2    H         4  38.0  18.4  142.  25.7
       5 A1      P2    M         1  25.3  18.5  144.  25.7
       6 A1      P2    L         2  13.8  20.4  146.  25.7
       7 A2      P3    H         3  26.9  20.1  148.  28.0
       8 A2      P3    M         4  29.4  19.7  145.  28.0
       9 A2      P3    L         1  27.8  20.3  146.  28.0
      10 A2      P4    H         2  18.9  20.1  145.  27.3
      11 A2      P4    M         3  37.1  18.6  142.  27.3
      12 A2      P4    L         4  25.9  14.5  144.  27.3
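
The difference between the two verbs comes down to row counts: `summarise()` returns one row per group, whereas `mutate()` keeps every row and repeats the group statistic on each. A small sketch with hand-typed `data.1` values (assumed):

```r
library(dplyr)

# Grouping columns and Temp of data.1, hand-typed from the slides (assumed values)
data.1 <- data.frame(
  Between = factor(rep(c("A1", "A2"), each = 6)),
  Plot    = factor(rep(c("P1", "P2", "P3", "P4"), each = 3)),
  Temp    = c(15.73546, 23.83643, 13.64371, 37.95281, 25.29508, 13.79532,
              26.87429, 29.38325, 27.75781, 18.94612, 37.11781, 25.89843)
)

# summarise(): one row per group
summ <- data.1 %>%
  group_by(Between, Plot) %>%
  summarise(Mean = mean(Temp), .groups = "drop")

# mutate(): all original rows kept, group mean repeated within each group
mut <- data.1 %>%
  group_by(Between, Plot) %>%
  mutate(Mean = mean(Temp)) %>%
  ungroup()

cat("summarise rows:", nrow(summ), "\n")
cat("mutate rows:   ", nrow(mut), "\n")
```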

  65. Grouping

      > head(data.1, 2)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      > data.1 %>% group_by(Between,Plot) %>%
      +     mutate(Mean=mean(Temp), cTemp=Temp-Mean)
      # A tibble: 12 x 9
      # Groups:   Between, Plot [4]
         Between Plot  Cond   Time  Temp   LAT  LONG  Mean  cTemp
         <fct>   <fct> <fct> <int> <dbl> <dbl> <dbl> <dbl>  <dbl>
       1 A1      P1    H         1  15.7  17.3  146.  17.7 -2.00
       2 A1      P1    M         2  23.8  14.1  145.  17.7  6.10
       3 A1      P1    L         3  13.6  20.7  145.  17.7 -4.09
       4 A1      P2    H         4  38.0  18.4  142.  25.7 12.3
       5 A1      P2    M         1  25.3  18.5  144.  25.7 -0.386
       6 A1      P2    L         2  13.8  20.4  146.  25.7 -11.9
       7 A2      P3    H         3  26.9  20.1  148.  28.0 -1.13
       8 A2      P3    M         4  29.4  19.7  145.  28.0  1.38
       9 A2      P3    L         1  27.8  20.3  146.  28.0 -0.247
      10 A2      P4    H         2  18.9  20.1  145.  27.3 -8.37
      11 A2      P4    M         3  37.1  18.6  142.  27.3  9.80
      12 A2      P4    L         4  25.9  14.5  144.  27.3 -1.42
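
Because `mutate()` evaluates its arguments in order, `cTemp` can refer to the `Mean` column created earlier in the same call. Centring within groups means each group's centred values should sum to zero (up to floating-point error), which the sketch below checks on hand-typed `data.1` values (an assumption).

```r
library(dplyr)

# Grouping columns and Temp of data.1, hand-typed from the slides (assumed values)
data.1 <- data.frame(
  Between = factor(rep(c("A1", "A2"), each = 6)),
  Plot    = factor(rep(c("P1", "P2", "P3", "P4"), each = 3)),
  Temp    = c(15.73546, 23.83643, 13.64371, 37.95281, 25.29508, 13.79532,
              26.87429, 29.38325, 27.75781, 18.94612, 37.11781, 25.89843)
)

centred <- data.1 %>%
  group_by(Between, Plot) %>%
  mutate(Mean  = mean(Temp),
         cTemp = Temp - Mean) %>%   # uses the Mean column created just above
  ungroup()

# Sum of the centred values within each plot (should be ~0)
group_sums <- tapply(centred$cTemp, centred$Plot, sum)
print(signif(group_sums, 3))
```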

  66. Grouping

      > head(data.1, 2)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      > data.1 %>% group_by(Between,Plot) %>%
      +     summarise_each(funs(mean))
      # A tibble: 4 x 7
      # Groups:   Between [?]
        Between Plot   Cond  Time  Temp   LAT  LONG
        <fct>   <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
      1 A1      P1       NA  2.00  17.7  17.4  145.
      2 A1      P2       NA  2.33  25.7  19.1  144.
      3 A2      P3       NA  2.67  28.0  20.1  146.
      4 A2      P4       NA  3.00  27.3  17.7  144.

  67. Grouping

      > head(data.1, 2)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      > data.1 %>% select(-Cond,-Time) %>% group_by(Between,Plot) %>%
      +     summarise_all(funs(mean))
      # A tibble: 4 x 5
      # Groups:   Between [?]
        Between Plot   Temp   LAT  LONG
        <fct>   <fct> <dbl> <dbl> <dbl>
      1 A1      P1     17.7  17.4  145.
      2 A1      P2     25.7  19.1  144.
      3 A2      P3     28.0  20.1  146.
      4 A2      P4     27.3  17.7  144.
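
`summarise_each()` is deprecated in favour of `summarise_all()`, and in dplyr 1.0 and later both are superseded by `across()`. Selecting columns with `where(is.numeric)` skips the factor `Cond` automatically, so neither the `NA` column nor the preliminary `select(-Cond, -Time)` is needed (`Time` is kept here because it is numeric). A sketch, assuming hand-typed `data.1` values:

```r
library(dplyr)

# data.1 hand-typed from the slides (assumed values and row order)
data.1 <- data.frame(
  Between = factor(rep(c("A1", "A2"), each = 6)),
  Plot    = factor(rep(c("P1", "P2", "P3", "P4"), each = 3)),
  Cond    = factor(rep(c("H", "M", "L"), times = 4)),
  Time    = rep(1:4, times = 3),
  Temp    = c(15.73546, 23.83643, 13.64371, 37.95281, 25.29508, 13.79532,
              26.87429, 29.38325, 27.75781, 18.94612, 37.11781, 25.89843),
  LAT     = c(17.25752, 14.07060, 20.74986, 18.41013, 18.46762, 20.38767,
              20.14244, 19.68780, 20.33795, 20.06427, 18.64913, 14.52130),
  LONG    = c(146.2397, 144.8877, 144.6884, 142.0585, 144.0437, 145.8359,
              147.7174, 144.7944, 145.7753, 144.8924, 142.2459, 144.1700)
)

# Grouping columns are excluded from across() automatically;
# where(is.numeric) then drops the factor Cond
means <- data.1 %>%
  group_by(Between, Plot) %>%
  summarise(across(where(is.numeric), mean), .groups = "drop")

print(means)
```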

  68. Grouping

      > head(data.1, 2)
        Between Plot Cond Time     Temp      LAT     LONG
      1      A1   P1    H    1 15.73546 17.25752 146.2397
      2      A1   P1    M    2 23.83643 14.07060 144.8877
      > data.1 %>% group_by(Between,Plot) %>%
      +     summarise_at(vars(Temp, LAT, LONG), funs(mean,SE))
      # A tibble: 4 x 8
      # Groups:   Between [?]
        Between Plot  Temp_mean LAT_mean LONG_mean Temp_SE LAT_SE LONG_SE
        <fct>   <fct>     <dbl>    <dbl>     <dbl>   <dbl>  <dbl>   <dbl>
      1 A1      P1         17.7     17.4      145.   3.11   1.93    0.487
      2 A1      P2         25.7     19.1      144.   6.98   0.650   1.09
      3 A2      P3         28.0     20.1      146.   0.735  0.193   0.859
      4 A2      P4         27.3     17.7      144.   5.29   1.66    0.790
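
The same `across()` pattern accepts user-defined functions such as `SE()`. The sketch below is a modern take on the `summarise_at()` call, assuming hand-typed `data.1` values. Its columns come out as `Temp_mean`, `Temp_SE`, `LAT_mean`, ... rather than all the means first.

```r
library(dplyr)

# Grouping and numeric columns of data.1, hand-typed from the slides (assumed values)
data.1 <- data.frame(
  Between = factor(rep(c("A1", "A2"), each = 6)),
  Plot    = factor(rep(c("P1", "P2", "P3", "P4"), each = 3)),
  Temp    = c(15.73546, 23.83643, 13.64371, 37.95281, 25.29508, 13.79532,
              26.87429, 29.38325, 27.75781, 18.94612, 37.11781, 25.89843),
  LAT     = c(17.25752, 14.07060, 20.74986, 18.41013, 18.46762, 20.38767,
              20.14244, 19.68780, 20.33795, 20.06427, 18.64913, 14.52130),
  LONG    = c(146.2397, 144.8877, 144.6884, 142.0585, 144.0437, 145.8359,
              147.7174, 144.7944, 145.7753, 144.8924, 142.2459, 144.1700)
)

# Standard error of the mean, as defined on the Summarise slide
SE <- function(x) sd(x) / sqrt(length(x))

# Mean and SE of each selected column within each Between x Plot group
stats <- data.1 %>%
  group_by(Between, Plot) %>%
  summarise(across(c(Temp, LAT, LONG), list(mean = mean, SE = SE)),
            .groups = "drop")

print(stats)
```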

  69. Your turn: calculate, for each year, the mean abundance of Pocillopora damicornis

      > tikus[1:10, c(1:3,76:77)]
          Psammocora contigua Psammocora digitata Pocillopora damicornis time rep
      V1                    0                   0                     79   81   1
      V2                    0                   0                     51   81   2
      V3                    0                   0                     42   81   3
      V4                    0                   0                     15   81   4
      V5                    0                   0                      9   81   5
      V6                    0                   0                     72   81   6
      V7                    0                   0                      0   81   7
      V8                    0                   0                     16   81   8
      V9                    0                   0                      0   81   9
      V10                   0                   0                     16   81  10

  70. Your turn: calculate, for each year, the mean abundance of Pocillopora damicornis

      > tikus %>% group_by(time) %>%
      +     summarise(MeanAbundance=mean(`Pocillopora damicornis`))
      # A tibble: 6 x 2
        time  MeanAbundance
        <fct>         <dbl>
      1 81            30.0
      2 83             4.00
      3 84             0.
      4 85             0.
      5 87             0.
      6 88             1.80
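
Column names that contain spaces, such as `Pocillopora damicornis`, must be wrapped in backticks inside dplyr verbs. The full `tikus` data frame is not reproduced here. As a hypothetical stand-in, the sketch types in only the ten 1981 values of that column shown on the preview slide (`check.names = FALSE` keeps the space in the name). Assuming, as the preview suggests, that these are all of the 1981 rows, their mean should match the 1981 value in the answer.

```r
library(dplyr)

# Hypothetical stand-in for tikus: the ten 1981 rows of one species column,
# typed in from the preview slide; check.names = FALSE keeps the space in the name
tikus_1981 <- data.frame(
  `Pocillopora damicornis` = c(79, 51, 42, 15, 9, 72, 0, 16, 0, 16),
  time = factor(rep("81", 10)),
  rep  = 1:10,
  check.names = FALSE
)

# Backticks let dplyr refer to a column whose name contains a space
res <- tikus_1981 %>%
  group_by(time) %>%
  summarise(MeanAbundance = mean(`Pocillopora damicornis`))

print(res)
```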

  71. Your turn: calculate, for each year, the number of samples as well as the mean and variance of ozone

      > nasa = as.data.frame(nasa)
      > head(nasa)
             lat   long month year cloudhigh cloudlow cloudmid ozone pressure surftemp temperature
      1 36.20000 -113.8     1 1995      26.0      7.5     34.5   304      835    272.7       272.1
      2 33.70435 -113.8     1 1995      20.0     11.5     32.5   304      940    279.5       282.2
      3 31.20870 -113.8     1 1995      16.0     16.5     26.0   298      960    284.7       285.2
      4 28.71304 -113.8     1 1995      13.0     20.5     14.5   276      990    289.3       290.7
      5 26.21739 -113.8     1 1995       7.5     26.0     10.5   274     1000    292.2       292.7
      6 23.72174 -113.8     1 1995       8.0     30.0      9.5   264     1000    294.1       293.6

  72. Your turn: calculate, for each year, the number of samples as well as the mean and variance of ozone

      > nasa %>% group_by(year) %>%
      +     summarise(N=n(), Mean=mean(ozone), Var=var(ozone))

      (Output: a 6 x 4 tibble with columns year, N, Mean and Var, one row per year from 1995 to 2000. Every year has N = 6912; mean ozone ranges from about 264 in 1995 to about 270 in 1999, and the variances lie roughly between 258 and 507.)

  73. Section 10 Reshaping data

  74. Reshaping data frames: wide to long (melt)

      Wide data (data.w):
         Between Plot Time.0 Time.1 Time.2
      R1      A1   P1      8     14     14
      R2      A1   P2     10     12     11
      R3      A2   P3      7     11      8
      R4      A2   P4     11      9      2

      > data.w %>% gather(Time,Count,Time.0:Time.2)
         Between Plot   Time Count
      1       A1   P1 Time.0     8
      2       A1   P2 Time.0    10
      3       A2   P3 Time.0     7
      4       A2   P4 Time.0    11
      5       A1   P1 Time.1    14
      6       A1   P2 Time.1    12
      7       A2   P3 Time.1    11
      8       A2   P4 Time.1     9
      9       A1   P1 Time.2    14
      10      A1   P2 Time.2    11
      11      A2   P3 Time.2     8
      12      A2   P4 Time.2     2
      > ## OR
      > data.w %>% gather(Time,Count,-Between,-Plot)
      (identical output)
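
In tidyr 1.0 and later, `gather()` is superseded by `pivot_longer()`. The sketch below rebuilds `data.w` from the values on the slide and lengthens it. Note that `pivot_longer()` orders the result row by row (all times for `P1`, then `P2`, ...), whereas `gather()` stacks column by column.

```r
library(tidyr)

# Wide data from the slide, hand-typed (assumed values)
data.w <- data.frame(
  Between = factor(c("A1", "A1", "A2", "A2")),
  Plot    = factor(c("P1", "P2", "P3", "P4")),
  Time.0  = c(8, 10, 7, 11),
  Time.1  = c(14, 12, 11, 9),
  Time.2  = c(14, 11, 8, 2),
  row.names = c("R1", "R2", "R3", "R4")
)

# Column names Time.0 to Time.2 go into "Time", their cell values into "Count"
long <- pivot_longer(data.w, cols = Time.0:Time.2,
                     names_to = "Time", values_to = "Count")

print(long)
```

Adding `names_prefix = "Time."` would strip the prefix, so that `Time` holds just `0`, `1` and `2`.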

  75. Reshaping data frames: long data

      Resp1 Resp2 Between Plot Subplot Within
          8    17      A1   P1      S1     B1
         10    18      A1   P1      S1     B2
          7    17      A1   P1      S2     B1
         11    21      A1   P1      S2     B2
         14    19      A2   P2      S3     B1
         12    13      A2   P2      S3     B2
         11    24      A2   P2      S4     B1
          9    18      A2   P2      S4     B2
         14    25      A3   P3      S5     B1
         11    18      A3   P3      S5     B2
          8    27      A3   P3      S6     B1
          2    22      A3   P3      S6     B2
          8    17      A1   P4      S7     B1
         10    22      A1   P4      S7     B2
          7    16      A1   P4      S8     B1
         12    13      A1   P4      S8     B2
         11    23      A2   P5      S9     B1
         12    19      A2   P5      S9     B2
         12    23      A2   P5     S10     B1
         10    21      A2   P5     S10     B2
          3    17      A3   P6     S11     B1
         11    16      A3   P6     S11     B2
         13    26      A3   P6     S12     B1
          7    28      A3   P6     S12     B2

  76. Reshaping data frames: widen (cast). Widen Resp1 for repeated measures (Within)

      > head(data,2)
        Resp1 Resp2 Between Plot Subplot Within
      1     8    17      A1   P1      S1     B1
      2    10    18      A1   P1      S1     B2
      > data %>% select(-Resp2) %>% spread(Within,Resp1)
         Between Plot Subplot B1 B2
      1       A1   P1      S1  8 10
      2       A1   P1      S2  7 11
      3       A1   P4      S7  8 10
      4       A1   P4      S8  7 12
      5       A2   P2      S3 14 12
      6       A2   P2      S4 11  9
      7       A2   P5     S10 12 10
      8       A2   P5      S9 11 12
      9       A3   P3      S5 14 11
      10      A3   P3      S6  8  2
      11      A3   P6     S11  3 11
      12      A3   P6     S12 13  7
      > #reshape2:::cast(data,Between+Plot+Subplot~Within,value="Resp1")
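
The `pivot_wider()` counterpart of `spread()` (tidyr 1.0 and later) is sketched below, with `data` typed in from the long-data slide (assumed values). Unlike `spread()`, `pivot_wider()` keeps rows in the order they first appear rather than sorting them, so the row order differs from the slide.

```r
library(dplyr)
library(tidyr)

# Long data from the slides, hand-typed (assumed values)
data <- data.frame(
  Resp1   = c(8, 10, 7, 11, 14, 12, 11, 9, 14, 11, 8, 2,
              8, 10, 7, 12, 11, 12, 12, 10, 3, 11, 13, 7),
  Resp2   = c(17, 18, 17, 21, 19, 13, 24, 18, 25, 18, 27, 22,
              17, 22, 16, 13, 23, 19, 23, 21, 17, 16, 26, 28),
  Between = factor(rep(rep(c("A1", "A2", "A3"), each = 4), times = 2)),
  Plot    = factor(rep(paste0("P", 1:6), each = 4)),
  Subplot = factor(rep(paste0("S", 1:12), each = 2)),
  Within  = factor(rep(c("B1", "B2"), times = 12))
)

# One row per subplot, one column per level of Within, filled with Resp1
wide <- data %>%
  select(-Resp2) %>%
  pivot_wider(names_from = Within, values_from = Resp1)

print(wide)
```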

  77. Reshaping data frames: widen Resp1 and Resp2 for repeated measures (Within)

      > head(data,2)
        Resp1 Resp2 Between Plot Subplot Within
      1     8    17      A1   P1      S1     B1
      2    10    18      A1   P1      S1     B2
      > data %>% gather(Resp,Count,Resp1:Resp2)
         Between Plot Subplot Within  Resp Count
      1       A1   P1      S1     B1 Resp1     8
      2       A1   P1      S1     B2 Resp1    10
      3       A1   P1      S2     B1 Resp1     7
      4       A1   P1      S2     B2 Resp1    11
      5       A2   P2      S3     B1 Resp1    14
      6       A2   P2      S3     B2 Resp1    12
      7       A2   P2      S4     B1 Resp1    11
      8       A2   P2      S4     B2 Resp1     9
      9       A3   P3      S5     B1 Resp1    14
      10      A3   P3      S5     B2 Resp1    11
      11      A3   P3      S6     B1 Resp1     8
      12      A3   P3      S6     B2 Resp1     2
      13      A1   P4      S7     B1 Resp1     8
      14      A1   P4      S7     B2 Resp1    10
      15      A1   P4      S8     B1 Resp1     7
      16      A1   P4      S8     B2 Resp1    12
      17      A2   P5      S9     B1 Resp1    11
      18      A2   P5      S9     B2 Resp1    12
      19      A2   P5     S10     B1 Resp1    12
      20      A2   P5     S10     B2 Resp1    10
      21      A3   P6     S11     B1 Resp1     3
      22      A3   P6     S11     B2 Resp1    11
      23      A3   P6     S12     B1 Resp1    13
      24      A3   P6     S12     B2 Resp1     7
      25      A1   P1      S1     B1 Resp2    17
      26      A1   P1      S1     B2 Resp2    18
      27      A1   P1      S2     B1 Resp2    17
      28      A1   P1      S2     B2 Resp2    21
      29      A2   P2      S3     B1 Resp2    19
      30      A2   P2      S3     B2 Resp2    13
      31      A2   P2      S4     B1 Resp2    24
      32      A2   P2      S4     B2 Resp2    18
      33      A3   P3      S5     B1 Resp2    25
      34      A3   P3      S5     B2 Resp2    18
      35      A3   P3      S6     B1 Resp2    27
      36      A3   P3      S6     B2 Resp2    22
      37      A1   P4      S7     B1 Resp2    17
      38      A1   P4      S7     B2 Resp2    22
      39      A1   P4      S8     B1 Resp2    16
      40      A1   P4      S8     B2 Resp2    13
      41      A2   P5      S9     B1 Resp2    23
      42      A2   P5      S9     B2 Resp2    19
      43      A2   P5     S10     B1 Resp2    23
      44      A2   P5     S10     B2 Resp2    21
      45      A3   P6     S11     B1 Resp2    17
      46      A3   P6     S11     B2 Resp2    16
      47      A3   P6     S12     B1 Resp2    26
      48      A3   P6     S12     B2 Resp2    28

  78. Reshaping data frames: widen Resp1 and Resp2 for repeated measures (Within)

      > head(data,2)
        Resp1 Resp2 Between Plot Subplot Within
      1     8    17      A1   P1      S1     B1
      2    10    18      A1   P1      S1     B2
      > data %>% gather(Resp,Count,Resp1:Resp2) %>% unite(WR,Within,Resp)
         Between Plot Subplot       WR Count
      1       A1   P1      S1 B1_Resp1     8
      2       A1   P1      S1 B2_Resp1    10
      3       A1   P1      S2 B1_Resp1     7
      4       A1   P1      S2 B2_Resp1    11
      5       A2   P2      S3 B1_Resp1    14
      6       A2   P2      S3 B2_Resp1    12
      7       A2   P2      S4 B1_Resp1    11
      8       A2   P2      S4 B2_Resp1     9
      9       A3   P3      S5 B1_Resp1    14
      10      A3   P3      S5 B2_Resp1    11
      11      A3   P3      S6 B1_Resp1     8
      12      A3   P3      S6 B2_Resp1     2
      13      A1   P4      S7 B1_Resp1     8
      14      A1   P4      S7 B2_Resp1    10
      15      A1   P4      S8 B1_Resp1     7
      16      A1   P4      S8 B2_Resp1    12
      17      A2   P5      S9 B1_Resp1    11
      18      A2   P5      S9 B2_Resp1    12
      19      A2   P5     S10 B1_Resp1    12
      20      A2   P5     S10 B2_Resp1    10
      21      A3   P6     S11 B1_Resp1     3
      22      A3   P6     S11 B2_Resp1    11
      23      A3   P6     S12 B1_Resp1    13
      24      A3   P6     S12 B2_Resp1     7
      25      A1   P1      S1 B1_Resp2    17
      26      A1   P1      S1 B2_Resp2    18
      27      A1   P1      S2 B1_Resp2    17
      28      A1   P1      S2 B2_Resp2    21
      29      A2   P2      S3 B1_Resp2    19
      30      A2   P2      S3 B2_Resp2    13
      31      A2   P2      S4 B1_Resp2    24
      32      A2   P2      S4 B2_Resp2    18
      33      A3   P3      S5 B1_Resp2    25
      34      A3   P3      S5 B2_Resp2    18
      35      A3   P3      S6 B1_Resp2    27
      36      A3   P3      S6 B2_Resp2    22
      37      A1   P4      S7 B1_Resp2    17
      38      A1   P4      S7 B2_Resp2    22
      39      A1   P4      S8 B1_Resp2    16
      40      A1   P4      S8 B2_Resp2    13
      41      A2   P5      S9 B1_Resp2    23
      42      A2   P5      S9 B2_Resp2    19
      43      A2   P5     S10 B1_Resp2    23
      44      A2   P5     S10 B2_Resp2    21
      45      A3   P6     S11 B1_Resp2    17
      46      A3   P6     S11 B2_Resp2    16
      47      A3   P6     S12 B1_Resp2    26
      48      A3   P6     S12 B2_Resp2    28

  79. Reshaping data frames: widen Resp1 and Resp2 for repeated measures (Within)

      > head(data,2)
        Resp1 Resp2 Between Plot Subplot Within
      1     8    17      A1   P1      S1     B1
      2    10    18      A1   P1      S1     B2
      > data %>% gather(Resp,Count,Resp1:Resp2) %>% unite(WR,Within,Resp) %>%
      +     spread(WR,Count)
         Between Plot Subplot B1_Resp1 B1_Resp2 B2_Resp1 B2_Resp2
      1       A1   P1      S1        8       17       10       18
      2       A1   P1      S2        7       17       11       21
      3       A1   P4      S7        8       17       10       22
      4       A1   P4      S8        7       16       12       13
      5       A2   P2      S3       14       19       12       13
      6       A2   P2      S4       11       24        9       18
      7       A2   P5     S10       12       23       10       21
      8       A2   P5      S9       11       23       12       19
      9       A3   P3      S5       14       25       11       18
      10      A3   P3      S6        8       27        2       22
      11      A3   P6     S11        3       17       11       16
      12      A3   P6     S12       13       26        7       28
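
The gather, unite, spread chain maps onto `pivot_longer()` followed by `pivot_wider()` with two `names_from` columns. `pivot_wider()` joins those two columns with `_` (its default `names_sep`), which takes the place of `unite()`. A sketch with the same hand-typed `data` (assumed values):

```r
library(dplyr)
library(tidyr)

# Long data from the slides, hand-typed (assumed values)
data <- data.frame(
  Resp1   = c(8, 10, 7, 11, 14, 12, 11, 9, 14, 11, 8, 2,
              8, 10, 7, 12, 11, 12, 12, 10, 3, 11, 13, 7),
  Resp2   = c(17, 18, 17, 21, 19, 13, 24, 18, 25, 18, 27, 22,
              17, 22, 16, 13, 23, 19, 23, 21, 17, 16, 26, 28),
  Between = factor(rep(rep(c("A1", "A2", "A3"), each = 4), times = 2)),
  Plot    = factor(rep(paste0("P", 1:6), each = 4)),
  Subplot = factor(rep(paste0("S", 1:12), each = 2)),
  Within  = factor(rep(c("B1", "B2"), times = 12))
)

wide2 <- data %>%
  # stack Resp1 and Resp2 into a key column (Resp) and a value column (Count)
  pivot_longer(c(Resp1, Resp2), names_to = "Resp", values_to = "Count") %>%
  # one column per Within x Resp combination, e.g. B1_Resp1
  pivot_wider(names_from = c(Within, Resp), values_from = Count)

print(wide2)
```

`pivot_wider(names_from = Within, values_from = c(Resp1, Resp2))` reaches a similar table in one step, although it names the columns value-first (`Resp1_B1`, ...).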
