Data visualization with ggplot2



  1. Data visualization with ggplot2 R.W. Oldford

  8. Computational pipelines
     Suppose we have some function/module which takes some input, performs some actions on it (transformations, summarizing, adding information, etc.) and produces output.
     If we have several of these, we can connect them one to another in sequence to produce a “pipeline” of modules or steps in the processing of the original input.
     The connected components form a “pipeline” through which the original input “flows”, with some processing/transformation of the data occurring at each step.
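A minimal sketch of such a pipeline in R, assuming R 4.1 or later for the native pipe `|>`. The three step functions (`drop_missing`, `centre`, `summarise_sd`) are hypothetical names invented for this illustration and do not come from the slides.

```r
# Three hypothetical steps: each takes some input and produces output.
drop_missing <- function(x) x[!is.na(x)]                       # remove NA values
centre       <- function(x) x - mean(x)                        # subtract the mean
summarise_sd <- function(x) sqrt(sum(x^2) / (length(x) - 1))   # sd of centred values

input <- c(2, 4, NA, 6)

# Connected by nesting: the steps must be read from the inside out.
nested <- summarise_sd(centre(drop_missing(input)))

# Connected by the native pipe: the input "flows" left to right,
# and the output of each step becomes the input of the next.
piped <- input |> drop_missing() |> centre() |> summarise_sd()

identical(nested, piped)
```

Both forms compute the same thing; the piped form simply makes the order of the steps visible in the order they are written.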

  16. Computational pipelines
      A simple metaphor (viz. that of laying pipes end to end):
      - data passes through and is processed by a set of computational steps serially linked so that the output of one becomes the input of the next
      - the Unix operator | is called a “pipe”:
            ls -R Notes | grep ".pdf" | sort | more
      - a graphics rendering pipeline (from Kaufman, Fan and Petkov (2009), “Implementing the lattice Boltzmann model on commodity graphics hardware”, J. Stat. Mech.)
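The Unix example above can be loosely mirrored in R, again assuming R 4.1 or later for `|>`. This is only a sketch of the same idea, not an exact equivalent of the shell command: it creates a small throwaway `Notes` directory under `tempdir()` (the file names are made up for illustration) and omits the final paging step.

```r
# Build a small, hypothetical Notes directory so the sketch is self-contained.
notes <- file.path(tempdir(), "Notes")
dir.create(file.path(notes, "week1"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(notes, c("b.pdf", "a.txt", "week1/a.pdf"))))

# list files recursively | keep those ending in .pdf | sort
pdfs <- list.files(notes, recursive = TRUE) |>
  grep(pattern = "\\.pdf$", value = TRUE) |>
  sort()

pdfs
```

Here the pattern is anchored and the dot is escaped, so only names ending in `.pdf` are kept; the shell pattern `".pdf"` on the slide is looser.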

  17. Wilkinson’s Grammar of Graphics pipeline
      Lee Wilkinson’s monumental The Grammar of Graphics begins with a pipeline model for constructing statistical graphics.
      Each step in the pipeline transforms its input to produce output for the next step. The order of steps is essential, though not all need be there for every plot.
      Because the pipeline consists of separate components, the final graphic that is rendered can be simply and sometimes dramatically changed by making changes to a single component in the pipeline.

  18. ggplot2 – a grammar of graphics for R
      Inspired by Wilkinson’s “Grammar of Graphics”, Hadley Wickham (in his 2008 Iowa State PhD thesis: Practical tools for exploring data and models) developed a “layered grammar of graphics.” This is implemented as ggplot2 in R:
          library(ggplot2)
      Much like Wilkinson’s original grammar, ggplot2 uses a pipeline model for its graphics construction in that a plot is built in an ordered series of steps, where each step operates on the output of its immediate predecessor in the line.
      Departing from the grammar, ggplot2 slightly mixes metaphors in that each step in the pipeline can (typically) be thought of as adding a layer to all that preceded it.
      From the ggplot2 book:
      "The layered grammar of graphics (Wickham 2009) builds on Wilkinson’s grammar, focussing on the primacy of layers and adapting it for embedding within R. In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system. Facetting can be used to generate the same plot for different subsets of the dataset. It is the combination of these independent components that make up a graphic."
      Notationally, the components of the pipeline appear in sequence connected one to the next via an intervening + sign, thus emphasizing each as an addition of a layer (or of some further processing of the plot).
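As a rough illustration of the components named in the quotation, the sketch below strings them together with `+`. It assumes the ggplot2 package is installed and uses the built-in `mtcars` data as a stand-in; that data set and the choice of variables are assumptions made for this example, not part of the slides.

```r
library(ggplot2)

# mtcars ships with R and stands in here for any data frame.
p <- ggplot(data = mtcars) +                       # the data
  aes(x = wt, y = mpg, colour = factor(cyl)) +     # mapping to aesthetic attributes
  geom_point() +                                   # geometric objects (points)
  geom_smooth(method = "lm", se = FALSE) +         # a statistical transformation (fitted lines)
  facet_wrap(~ am)                                 # same plot for subsets of the data

# Changing a single component changes the rendered graphic:
# here only the coordinate system is replaced.
p_flipped <- p + coord_flip()

# print(p) or print(p_flipped) would render the plots.
```

Nothing is drawn until a plot object is printed, so each `+` can be read as extending a description of the plot rather than drawing on a canvas.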

  19. Data - South African heart disease
      Consider the SAheart data from the package ElemStatLearn. This is a sample from a retrospective study of heart disease in males from a high-risk region of the Western Cape, South Africa. There are 462 cases and 10 variates. The first few observations (cases) are shown below.

          sbp  tobacco   ldl  adiposity  famhist  typea  obesity  alcohol  age  chd
          160    12.00  5.73      23.11  Present     49    25.30    97.20   52    1
          144     0.01  4.41      28.61  Absent      55    28.87     2.06   63    1
          118     0.08  3.48      32.28  Present     52    29.14     3.81   46    0
          170     7.50  6.41      38.03  Present     51    31.99    24.26   58    1
          134    13.60  3.50      27.78  Present     60    25.99    57.34   49    1
          132     6.20  6.47      36.21  Present     62    30.77    14.14   45    0

      For example, sbp denotes “systolic blood pressure”, ldl “low density lipoprotein cholesterol”, famhist “family history of heart disease”, age “age at onset” (in years), and chd indicates whether the patient has coronary heart disease or not (a response). (See help(SAheart, package="ElemStatLearn") for details.)
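For readers who do not have ElemStatLearn installed, the six cases shown above can be typed in directly. This is only a sketch: the object name `SAheart6` is made up for this example, and the result holds just these six rows, not the full 462 cases.

```r
# The six cases listed on the slide, entered by hand.
# With ElemStatLearn installed, the full data should instead be available via
#   data(SAheart, package = "ElemStatLearn")
SAheart6 <- data.frame(
  sbp       = c(160, 144, 118, 170, 134, 132),
  tobacco   = c(12.00, 0.01, 0.08, 7.50, 13.60, 6.20),
  ldl       = c(5.73, 4.41, 3.48, 6.41, 3.50, 6.47),
  adiposity = c(23.11, 28.61, 32.28, 38.03, 27.78, 36.21),
  famhist   = factor(c("Present", "Absent", "Present",
                       "Present", "Present", "Present"),
                     levels = c("Absent", "Present")),
  typea     = c(49, 55, 52, 51, 60, 62),
  obesity   = c(25.30, 28.87, 29.14, 31.99, 25.99, 30.77),
  alcohol   = c(97.20, 2.06, 3.81, 24.26, 57.34, 14.14),
  age       = c(52, 63, 46, 58, 49, 45),
  chd       = c(1, 1, 0, 1, 1, 0)
)

str(SAheart6)   # 6 observations of 10 variables
```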

  20. Constructing a plot - the pipeline
      In the grammar of graphics, a plot processes each component in turn.
          ggplot(data = SAheart)
      First the data.

  21. Constructing a plot - pipeline
      In the grammar of graphics, a plot processes each component in turn.
          ggplot(data = SAheart) + aes(x = age, y = chd)
      [Plot: y axis chd, x axis age]
      Then the mapping of the data to plot “aesthetics”.

  22. Constructing a plot - pipeline
      In the grammar of graphics, a plot processes each component in turn.
          ggplot(data = SAheart) + aes(x = age, y = chd) + geom_point()
      [Plot: y axis chd, x axis age]
      Then the geometry.

  23. Constructing a plot - pipeline
      In the grammar of graphics, a plot processes each component in turn.
          ggplot(data = SAheart) + aes(x = age, y = chd) + geom_point() + geom_smooth()
      [Plot: y axis chd, x axis age]
      The pipeline can have several further steps.
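The four steps above can be followed one at a time by saving the intermediate plot objects. The sketch below assumes ggplot2 is installed and, to stay self-contained, uses only the `age` and `chd` values of the six cases listed earlier; the data frame name `heart` and the object names are made up for this example.

```r
library(ggplot2)

# age and chd for the six cases shown on the data slide
heart <- data.frame(age = c(52, 63, 46, 58, 49, 45),
                    chd = c(1, 1, 0, 1, 1, 0))

p_data   <- ggplot(data = heart)              # first the data
p_mapped <- p_data   + aes(x = age, y = chd)  # then the mapping to aesthetics
p_points <- p_mapped + geom_point()           # then the geometry
p_smooth <- p_points + geom_smooth()          # then a further step

# Number of layers held by the plot after each step
layer_counts <- sapply(list(p_data, p_mapped, p_points, p_smooth),
                       function(p) length(p$layers))
layer_counts

# print(p_smooth) would render the final plot; with only six cases
# the default smoother may warn or fit poorly.
```

The data and mapping steps add no layers of their own; only the `geom_` steps do, which is where the layer metaphor of ggplot2 enters.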

  24. Constructing a plot
      Alternatively, in the grammar of ggplot2, a plot is also a sum of component layers.
          ggplot(data = SAheart, mapping = aes(x = age, y = chd))
      [Plot: y axis chd, x axis age]
      The base display with mapping.

  25. Constructing a plot
      Alternatively, in the grammar of ggplot2, a plot is also a sum of component layers.
          ggplot(data = SAheart, mapping = aes(x = age, y = chd)) + geom_point()
      [Plot: y axis chd, x axis age]
      Here the + is adding layers.
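One way to see the "sum of layers" view is to treat a layer as an object in its own right. The sketch below assumes ggplot2 is installed and reuses the `age` and `chd` values of the six listed cases; the object names, and the use of `geom_rug()` as a second layer, are choices made for this illustration only.

```r
library(ggplot2)

heart <- data.frame(age = c(52, 63, 46, 58, 49, 45),
                    chd = c(1, 1, 0, 1, 1, 0))

# The base display with its mapping: no layers yet.
base <- ggplot(data = heart, mapping = aes(x = age, y = chd))

# A layer can be created on its own and added later.
points_layer <- geom_point()
with_points  <- base + points_layer

# Several layers can be added at once as a list.
with_both <- base + list(geom_point(), geom_rug())

c(length(base$layers), length(with_points$layers), length(with_both$layers))
```

Because the base plot is unchanged by the additions, the same `base` can serve as the starting point for several different displays.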
