
plyr: split-apply-combine for mortals, Sean Anderson (PowerPoint presentation)



  1. plyr: split-apply-combine for mortals
     sean anderson, sean_anderson@sfu.ca

  2. why?
     1. it’s everywhere
     2. less code, simple syntax
     3. it runs faster

  3. look familiar?
     > d
       year count
     1 2000    16
     2 2000     4
     3 2000    12
     4 2001    15
     5 2001     7
     6 2001    12
     7 2002    20
     ...
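The printout above is truncated. A minimal sketch that rebuilds `d` so the later examples can be run, taking all nine rows from the full listing in the `transform` output (three counts per year, 2000 to 2002):

```r
# Rebuild the example data frame: three observations for each year
d <- data.frame(
  year  = rep(2000:2002, each = 3),
  count = c(16, 4, 12, 15, 7, 12, 20, 15, 6)
)
print(d)
```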

  4. why apply over a for loop?
     - less code
     - subsetting is done for you
     - saving results is done for you
     - faster

  5. > d
       year count
     1 2000    16
     2 2000     4
     3 2000    12
     4 2001    15
     5 2001     7
     6 2001    12
     7 2002    20
     ...

  6.   year     mean
     1 2000 10.66667
     2 2001 11.33333
     3 2002 13.66667

  7. # split into one data frame per year
     d.split <- split(d, d$year)
     # preallocate a list to hold one result per year
     results <- vector("list", length = length(d.split))
     for(i in 1:length(d.split)) {
       temp <- d.split[[i]]
       temp.mean <- mean(temp$count)
       results[[i]] <- data.frame(
         year = unique(temp$year),
         mean = temp.mean)
     }
     # stack the per-year rows into one data frame
     do.call("rbind", results)

     inspired by Hadley Wickham: http://had.co.nz/plyr/

  8. apply(array, 1 or 2, func)
     sapply(vector, func)
     lapply(list, func)
     tapply(vector, index, func)
     aggregate(object, by, func)
     ...
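For comparison, two of the base functions listed above can produce the per-year means in one call each. A minimal sketch, assuming `d` holds the nine rows shown in the `transform` output (the data frame is rebuilt here so the snippet runs on its own):

```r
d <- data.frame(
  year  = rep(2000:2002, each = 3),
  count = c(16, 4, 12, 15, 7, 12, 20, 15, 6)
)

# tapply: group count by year and apply mean; returns a named 1-d array
means_vec <- tapply(d$count, d$year, mean)

# aggregate: the formula interface returns a data frame directly
means_df <- aggregate(count ~ year, data = d, FUN = mean)
names(means_df)[2] <- "mean"  # rename so the columns read year, mean

print(means_vec)
print(means_df)
```

Note that the two calls return different shapes (a named array versus a data frame), so the caller still has to know which function gives back what.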

  9. d.split <- split(d, d$year)
     result <- lapply(d.split, function(x) mean(x$count))
     result <- unlist(result)
     result <- data.frame(year = unique(d$year), mean = result)
     row.names(result) <- NULL

  10. enter plyr

  11. library(plyr)
      ddply(d, "year", summarize, mean = mean(count))

  12. d.split <- split(d, d$year)
      results <- vector("list", length = length(d.split))
      for(i in 1:length(d.split)) {
        temp <- d.split[[i]]
        temp.mean <- mean(temp$count)
        results[[i]] <- data.frame(
          year = unique(temp$year),
          mean = temp.mean)
      }
      do.call("rbind", results)

  13. ddply(): first letter = input type, second letter = output type

  14. d - data frame
      l - list
      a - array
      _ - discard (output only)
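To make the naming scheme concrete, here is a sketch of a few other members of the family applied to the same data. The summaries and the printed message are illustrative choices, not taken from the slides:

```r
library(plyr)

d <- data.frame(
  year  = rep(2000:2002, each = 3),
  count = c(16, 4, 12, 15, 7, 12, 20, 15, 6)
)

# d -> l: one list element (a mean) per year
by_year <- dlply(d, "year", function(x) mean(x$count))

# l -> d: turn that list back into a data frame
back <- ldply(by_year, function(m) data.frame(mean = m))

# d -> a: results as an array (here a 1-d array named by year)
arr <- daply(d, "year", function(x) mean(x$count))

# d -> _: run only for side effects (printing); the return value is discarded
d_ply(d, "year", function(x) cat(x$year[1], "has", nrow(x), "rows\n"))
```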

  15. ddply(data, "split", function)
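The third argument can be any function that takes one piece of the data frame. A sketch with a hypothetical helper `year_stats()` returning two summaries per year; ddply() adds the `year` column back to each result before combining them:

```r
library(plyr)

d <- data.frame(
  year  = rep(2000:2002, each = 3),
  count = c(16, 4, 12, 15, 7, 12, 20, 15, 6)
)

# Hypothetical helper: receives the rows for one year as a data frame
year_stats <- function(x) {
  data.frame(mean  = mean(x$count),
             range = diff(range(x$count)))  # max minus min
}

res <- ddply(d, "year", year_stats)
print(res)
```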

  16. ddply(d, "year", summarise, mean.count = mean(count))

  17.   year mean.count
      1 2000   10.66667
      2 2001   11.33333
      3 2002   13.66667
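summarise accepts several `name = expression` pairs in one call, each evaluated within a year's rows. A sketch computing three per-year summaries (the column names are arbitrary):

```r
library(plyr)

d <- data.frame(
  year  = rep(2000:2002, each = 3),
  count = c(16, 4, 12, 15, 7, 12, 20, 15, 6)
)

# One row per year, one column per summary expression
res <- ddply(d, "year", summarise,
             mean.count = mean(count),
             n          = length(count),
             max.count  = max(count))
print(res)
```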

  18. ddply(d, "year", transform, total.count = sum(count))

  19.   year count total.count
      1 2000    16          32
      2 2000     4          32
      3 2000    12          32
      4 2001    15          34
      5 2001     7          34
      6 2001    12          34
      7 2002    20          41
      8 2002    15          41
      9 2002     6          41
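Because transform keeps every row, it suits per-group quantities that belong on each observation. A sketch computing each count's share of its year's total (the `share` column name is an assumption for illustration):

```r
library(plyr)

d <- data.frame(
  year  = rep(2000:2002, each = 3),
  count = c(16, 4, 12, 15, 7, 12, 20, 15, 6)
)

# sum(count) is evaluated within each year, so shares sum to 1 per year
res <- ddply(d, "year", transform, share = count / sum(count))
print(res)
```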

  20. ddply(d, "year", function(x) { browser() })
      Browse[1]> x
        year count
      1 2000    16
      2 2000     4
      3 2000    12
      Browse[1]> Q
      >

  21. library(doMC)
      registerDoMC(2)  # 2 cores
      ddply(d, "year", f, .parallel = TRUE)  # f: the function applied to each piece

  22. # fail gracefully: return default instead of stopping when f throws an error
      failwith(default, f)
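A sketch of failwith() in use. The rule that a count above 18 is an error is made up purely to force a failure in one year; `strict_mean()` and `safe_mean()` are hypothetical names:

```r
library(plyr)

d <- data.frame(
  year  = rep(2000:2002, each = 3),
  count = c(16, 4, 12, 15, 7, 12, 20, 15, 6)
)

# Hypothetical summary that errors on some groups (here: 2002, which has a 20)
strict_mean <- function(x) {
  if (any(x$count > 18)) stop("count above 18 in year ", x$year[1])
  mean(x$count)
}

# Wrap it: an error now returns NA_real_ instead of aborting ddply();
# quiet = TRUE suppresses the printed error message
safe_mean <- failwith(NA_real_, strict_mean, quiet = TRUE)

res <- ddply(d, "year", function(x) data.frame(mean = safe_mean(x)))
print(res)
```

Calling strict_mean() directly inside ddply() would stop at 2002 and return nothing; the wrapped version still returns the two valid means.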

  23. remember
      1. it’s everywhere
      2. less code, simple syntax
      3. it runs faster (sometimes)
      use it.
