Reshape: Flexible data restructuring with R (Hadley Wickham)

SLIDE 1

Reshape

Flexible data restructuring with R

Hadley Wickham
Statistics, Iowa State University

SLIDE 2
Introduction

  • What is reshaping?
  • Why it’s not easy at the moment
  • Example
  • Details
  • Future work

SLIDE 3
Aggregating

  • Reduce big table to small table
  • (must lose information)
  • Each cell in the new table corresponds to multiple cells in the old table
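As a minimal illustration of aggregation, the sketch below uses base R's `aggregate()` rather than the reshape package; the data frame and its `subject` and `weight` columns are invented for the example. Several weights per subject collapse into one mean per subject, which is the loss of information described above.

```r
# Hypothetical data: three weight measurements per subject
df <- data.frame(
  subject = c("A", "A", "A", "B", "B", "B"),
  weight  = c(60, 62, 61, 80, 82, 84)
)

# Aggregate: many input cells collapse into one output cell per subject;
# the individual weights are lost and only their mean survives
agg <- aggregate(weight ~ subject, data = df, FUN = mean)
print(agg)
```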

SLIDE 4
Reshaping

  • Like aggregating, but each new cell corresponds to one old cell
  • Useful when investigating relationships between different aspects of your data (and especially when using lattice graphics)
  • Similar to transposing a matrix (but what happens to ragged data?)
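A small base-R sketch of reshaping without aggregation, assuming a made-up long data frame with hypothetical columns `subject`, `variable` and `value`. It uses `stats::reshape()` (the base function, not the reshape package) to spread one row per measurement into one column per variable, so every output cell comes from exactly one input cell.

```r
# Hypothetical long data: one row per (subject, variable) measurement
long <- data.frame(
  subject  = c("John", "John", "Mary", "Mary"),
  variable = c("height", "weight", "height", "weight"),
  value    = c(1.95, 100, 1.60, 55)
)

# Spread the values of `variable` into separate columns: one row per subject,
# one column per variable, no summarising needed
wide <- reshape(long, idvar = "subject", timevar = "variable",
                direction = "wide")
print(wide)
```

If a subject had no row for one of the variables (ragged data), `reshape()` would leave that wide cell as `NA` rather than failing.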

SLIDE 5

Which one do I use?

ftable, table, xtabs, tapply, by, merge, aggregate, match, split, reshape, mapply
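To make the "which one?" question concrete, this sketch computes the same grouped sums with three of the listed functions on an invented data frame (`sex`, `group` and `count` are hypothetical column names). The numbers agree; only the shape of the result differs: a matrix, a contingency table, and a long data frame.

```r
# Hypothetical data
df <- data.frame(
  sex   = c("F", "F", "M", "M", "F"),
  group = c("a", "b", "a", "b", "a"),
  count = c(1, 2, 3, 4, 5)
)

# tapply(): a matrix, with the grouping levels stored as dimnames
t1 <- tapply(df$count, list(df$sex, df$group), sum)

# xtabs(): the same sums, returned as a contingency table via a formula
t2 <- xtabs(count ~ sex + group, data = df)

# aggregate(): the same sums again, as a long data frame with one row per cell
t3 <- aggregate(count ~ sex + group, data = df, FUN = sum)

print(t1); print(t2); print(t3)
```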

SLIDE 6
Motivation

  • Many different tools in R
  • Tend to be rather specialised/limited
  • Can be difficult to figure out which one to use
  • Reshape handles many different needs within one framework

SLIDE 7

Example

SLIDE 8

Conceptual framework

  • Id vs. measured variables
      • random variables vs. their indices
      • categorical vs. continuous
  • Gives a more flexible data format
  • Deals with very ragged data
  • Missing values implicit
SLIDE 9

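| Subject | Age | Height | Weight |
|---------|-----|--------|--------|
| John    | 20  | 1.95   | 100    |
| John    | 21  | 1.96   | NA     |

| Subject | Age | Variable | Value |
|---------|-----|----------|-------|
| John    | 20  | Height   | 1.95  |
| John    | 20  | Weight   | 100   |
| John    | 21  | Height   | 1.96  |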
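The wide-to-long transformation in this example can be sketched in base R as follows. This is an illustrative re-implementation, not the reshape package's code: `Subject` and `Age` are treated as id variables, `Height` and `Weight` as measured variables, and rows whose value is `NA` are dropped so that missing values become implicit (John's missing weight at age 21 simply has no row).

```r
# The wide table from the example
wide <- data.frame(
  Subject = c("John", "John"),
  Age     = c(20, 21),
  Height  = c(1.95, 1.96),
  Weight  = c(100, NA)
)

id_vars      <- c("Subject", "Age")
measure_vars <- c("Height", "Weight")

# Stack each measured variable into (id variables, Variable, Value) rows
long <- do.call(rbind, lapply(measure_vars, function(v) {
  data.frame(wide[id_vars], Variable = v, Value = wide[[v]])
}))

# Missing values become implicit: rows whose value is NA are dropped
long <- long[!is.na(long$Value), ]

# Order by the id variables so the rows read like the long table
long <- long[order(long$Subject, long$Age), ]
rownames(long) <- NULL
print(long)
```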

SLIDE 10

Mechanics

  • Specifying the output format
  • Adding margins
  • Functions that return multiple values
  • Row and column names
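One reason functions that return multiple values need special handling can be seen in base R. In this sketch (invented data; not the reshape package's mechanism), `range()` returns a minimum and a maximum per group, and `aggregate()` stores the pair as a single matrix-valued column that has to be split apart by hand.

```r
# Hypothetical data
df <- data.frame(
  subject = c("A", "A", "B", "B"),
  weight  = c(60, 64, 80, 86)
)

# range() returns two numbers per group; aggregate() packs them into one
# matrix column named `weight` rather than two ordinary columns
agg <- aggregate(weight ~ subject, data = df, FUN = range)
str(agg)

# Flattening the matrix column into ordinary columns takes an extra step
flat <- data.frame(subject = agg$subject,
                   min = agg$weight[, 1],
                   max = agg$weight[, 2])
print(flat)
```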
SLIDE 11

Output specification

  • How do you specify what output you want?
  • I’ve used a formula-type interface
  • Formatting output
  • What are other alternatives?
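The slides do not show reshape's own formula syntax, so as a hedged analogue this sketch uses base R's formula-based `xtabs()` on an invented long data frame. The left-hand side names the values that fill the cells (`xtabs()` sums any values that land in the same cell); on the right-hand side the first variable becomes the rows and the second the columns, so reordering the terms changes the layout of the output.

```r
# Hypothetical long data
long <- data.frame(
  subject  = c("John", "John", "Mary", "Mary"),
  variable = c("height", "weight", "height", "weight"),
  value    = c(1.95, 100, 1.60, 55)
)

# Rows = subject, columns = variable
by_subject <- xtabs(value ~ subject + variable, data = long)

# Swapping the right-hand-side terms transposes the output
by_variable <- xtabs(value ~ variable + subject, data = long)

print(by_subject)
print(by_variable)
```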
SLIDE 12

Margins

  • Multiple levels of aggregation at the same time

  • Useful for summaries
  • (Pivot table inspired)
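Base R offers a pivot-table-style analogue in `addmargins()`. The sketch below (invented data; not reshape's own margins option) adds row, column and grand totals to a two-way table, so several levels of aggregation appear side by side in one output.

```r
# Hypothetical data
df <- data.frame(
  sex   = c("F", "F", "M", "M", "F"),
  group = c("a", "b", "a", "b", "a"),
  count = c(1, 2, 3, 4, 5)
)

# Cell-level sums: one cell per (sex, group) combination
tab <- xtabs(count ~ sex + group, data = df)

# addmargins() appends a "Sum" row and column: row totals, column totals
# and the grand total are added alongside the cell-level values
with_margins <- addmargins(tab)
print(with_margins)
```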
SLIDE 13

Row & column names

  • Explicit vs implicit
  • Most inbuilt functions store implicitly (frustrating when trying to plot!)
  • Reshape stores explicitly (but makes it easy to get rid of them)
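The implicit/explicit distinction can be seen in base R. In this sketch on invented data, `tapply()` returns a matrix whose row and column labels live only in its `dimnames`; calling `as.data.frame()` on the corresponding table turns those labels into ordinary columns, the long format that lattice-style plotting functions work with.

```r
# Hypothetical data
df <- data.frame(
  sex   = c("F", "F", "M", "M"),
  group = c("a", "b", "a", "b"),
  score = c(10, 20, 30, 40)
)

# Implicit: the grouping levels exist only as dimnames of the result matrix
implicit <- tapply(df$score, list(sex = df$sex, group = df$group), mean)
print(dimnames(implicit))

# Explicit: convert to a table, then to a data frame with one row per cell;
# the former dimnames become the columns `sex` and `group`
explicit <- as.data.frame(as.table(implicit), responseName = "score")
print(explicit)
```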

SLIDE 14

Efficiency

  • Size (limited by memory)
      • Multiple copies of data
      • 150,000 x 5
  • Speed
      • merge is slow
      • RDBMS?
SLIDE 15

Future

  • Aggregate function on data frame
  • Non-numeric data and summaries
  • Built-in graphical display of output
  • Larger data/database integration
SLIDE 16

http://had.co.nz/reshape