
A Grammar of Graphics for Genomics: The ggbio Package



  1. A Grammar of Graphics for Genomics: The ggbio Package. Michael Lawrence, Genentech. August 29, 2012.

  2. Outline
     1. Motivation
     2. High-level Plots
     3. Grammar Components

  4. Data on the Genome
     • Comes in two flavors:
       • Annotations (genes, TF binding sites, ...)
       • Experimental measurements (sequence reads)
     • Both types are tied to genomic coordinates, providing a common axis that permits cross-dataset comparison and inference
     • Typically stored as a table, with the range as a fundamental variable type, plus metadata
     [Figure builds: tracks along a genomic axis from 120.928 Mb to 120.938 Mb, one with a y-axis from 0 to 60]
     Example table of ranges with metadata:
       seqnames  start      end        strand  exon_id  tx_id
       10        120927215  120928045  -       129230   14886,14887
       10        120928689  120928854  -       129229   14886,14887
       10        120931894  120931997  -       129228   14886,14887
       10        120933249  120933384  -       129227   14886,14887
       10        120933963  120934069  -       129226   14886
       10        120933963  120934104  -       119757   14887
       10        120936533  120936665  -       119756   14887
       10        120936552  120936665  -       129225   14886
       10        120938267  120938345  -       129224   14886,14887
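The idea of a table whose fundamental variable is a range can be sketched in a few lines. The following is a minimal illustration in Python, not the Bioconductor data structure the talk refers to: the class name `GenomicRange`, the helper names, and the assumption that coordinates are 1-based and inclusive are all choices made for this example. The rows are the ones shown in the slide's exon table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenomicRange:
    """One table row: a range plus metadata (hypothetical class for illustration)."""
    seqname: str
    start: int      # assumed 1-based, inclusive
    end: int        # assumed inclusive
    strand: str
    exon_id: int
    tx_ids: tuple   # transcripts that contain this exon

EXONS = [
    GenomicRange("10", 120927215, 120928045, "-", 129230, (14886, 14887)),
    GenomicRange("10", 120928689, 120928854, "-", 129229, (14886, 14887)),
    GenomicRange("10", 120931894, 120931997, "-", 129228, (14886, 14887)),
    GenomicRange("10", 120933249, 120933384, "-", 129227, (14886, 14887)),
    GenomicRange("10", 120933963, 120934069, "-", 129226, (14886,)),
    GenomicRange("10", 120933963, 120934104, "-", 119757, (14887,)),
    GenomicRange("10", 120936533, 120936665, "-", 119756, (14887,)),
    GenomicRange("10", 120936552, 120936665, "-", 129225, (14886,)),
    GenomicRange("10", 120938267, 120938345, "-", 129224, (14886, 14887)),
]

def width(r):
    """Number of bases covered by an inclusive range."""
    return r.end - r.start + 1

def overlapping(ranges, seqname, start, end):
    """Rows on `seqname` that share at least one base with [start, end]."""
    return [r for r in ranges
            if r.seqname == seqname and r.start <= end and r.end >= start]

hits = overlapping(EXONS, "10", 120934000, 120934100)
print([r.exon_id for r in hits])
```

Because every dataset carries the same kind of range column, the same overlap query works on annotations and on measurements alike, which is what makes the genomic axis a common basis for comparison.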

  9. Challenges: Big data, wide spaces
     • Need summaries that are efficiently computed, communicate more with less, and expose the most interesting aspects of the data
     • Need different ways of viewing the data, depending on the density and scale, from whole genome to single base pair
     [Figure builds: the region from 120.928 Mb to 120.938 Mb, first without and then with a y-axis from 0 to 60, followed by a zoomed-out view from 0 Mb to 100 Mb with a y-axis from 0 to 300000]
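One common example of an efficiently computed summary is read coverage (depth per base), which can then be reduced to a fixed number of bins for zoomed-out views. The sketch below is an illustration in Python under stated assumptions, not the algorithm used by ggbio: the function names `coverage` and `binned_max` are hypothetical, reads are given as inclusive `(start, end)` pairs, and the difference-array technique is one standard way to do this in time linear in the number of reads plus the region width.

```python
from itertools import accumulate

def coverage(reads, region_start, region_end):
    """Per-base depth over [region_start, region_end], both inclusive.

    Uses a difference array: +1 where a read starts, -1 just past its end,
    then a running sum turns the marks into depth.
    """
    n = region_end - region_start + 1
    diff = [0] * (n + 1)
    for start, end in reads:
        lo = max(start, region_start) - region_start
        hi = min(end, region_end) - region_start
        if lo > hi:
            continue  # read lies entirely outside the region
        diff[lo] += 1
        diff[hi + 1] -= 1
    return list(accumulate(diff[:-1]))

def binned_max(depth, n_bins):
    """Reduce a per-base vector to at most n_bins values (max depth per bin)."""
    size = -(-len(depth) // n_bins)  # ceiling division
    return [max(depth[i:i + size]) for i in range(0, len(depth), size)]

depth = coverage([(1, 4), (3, 6), (5, 5)], 1, 8)
print(depth)                 # per-base depth for an eight-base region
print(binned_max(depth, 4))  # same data summarized in four bins
```

Choosing the bin count from the plot width rather than the region width keeps the drawing cost roughly constant whether the view spans a single gene or a whole chromosome.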

  13. Existing Tools
      • UCSC, IGB, IGV, Circos, Gviz
      Limitations
      • Limited to one type of view (linear or circular)
      • Not tightly integrated with an analysis environment through standard, abstract data structures (except Gviz)
      • No low-level toolkit for prototyping new types of graphics

  20. Grammars of Graphics
      • A grammar of graphics is a language for expressing plots
      • Graphics are constructed by combining various types of primitives, like Lego bricks for graphics
      • The most prominent grammar was introduced in Wilkinson’s book The Grammar of Graphics
      • Wilkinson’s grammar was extended by Wickham in the ggplot2 package
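To make the "combination of primitives" idea concrete, here is a toy sketch in Python of a plot specification assembled with `+`. It is only an analogy for how a grammar composes independent parts. The classes `Plot`, `Layer`, and `Coord` and their fields are hypothetical names invented for this example; they are not the ggplot2 or ggbio API.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class Layer:
    """A geometric primitive plus the statistic applied before drawing."""
    geom: str
    stat: str = "identity"

@dataclass(frozen=True)
class Coord:
    """The coordinate system the layers are drawn in."""
    system: str

@dataclass(frozen=True)
class Plot:
    """A plot specification: data, aesthetic mapping, layers, coordinates."""
    data: str
    mapping: dict = field(default_factory=dict)
    layers: tuple = ()
    coord: Coord = Coord("linear")

    def __add__(self, part):
        # Each addition returns a new specification; the original is unchanged.
        if isinstance(part, Layer):
            return replace(self, layers=self.layers + (part,))
        if isinstance(part, Coord):
            return replace(self, coord=part)
        return NotImplemented

base = Plot("exons", {"x": "start", "xend": "end"})
linear = base + Layer("rect") + Layer("line", stat="coverage")
circular = linear + Coord("circular")
print(circular)
```

Because the coordinate system is just one more component, the same layers can be reused in a linear or a circular view, which is the kind of flexibility the single-view tools listed earlier lack.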

  24. The ggbio Package
      • An R/Bioconductor package that extends the Wilkinson/Wickham grammar for applications in genomics
      • Integrated with Bioconductor
      • Operates on standard, abstract genomic data structures
      • Leverages efficient range-based algorithms
      • Programming interface has two levels of abstraction:
        • autoplot: maps Bioconductor data structures to plots
        • grammar: mix and match to create custom plots
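The high-level entry point can be thought of as type-driven dispatch: the kind of data structure determines a sensible default plot. The sketch below imitates that idea in Python with `functools.singledispatch`. It is an analogy only. The classes `GeneModel` and `Alignments` are hypothetical stand-ins, and the defaults returned here are illustrative, not the mappings ggbio actually uses.

```python
from functools import singledispatch

class GeneModel:
    """Hypothetical stand-in for an annotation data structure."""

class Alignments:
    """Hypothetical stand-in for a read-alignment data structure."""

@singledispatch
def autoplot(data):
    """Fallback when no default plot is registered for this type."""
    raise TypeError(f"no default plot for {type(data).__name__}")

@autoplot.register
def _(data: GeneModel):
    # Illustrative default: draw the structure along the genome.
    return {"geom": "gene structure", "x": "genomic position"}

@autoplot.register
def _(data: Alignments):
    # Illustrative default: summarize the reads before drawing them.
    return {"geom": "coverage area", "x": "genomic position"}

print(autoplot(GeneModel()))
print(autoplot(Alignments()))
```

The lower, grammar level is what a user drops down to when the default chosen for a type is not the plot they need.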

  26. Basic Plots
      • Gene Structures
      • Read Alignments
      • Sequence
      • Multiple
      [Figure builds: tracks along a genomic axis from 120.928 Mb to 120.938 Mb, one with a y-axis from 0 to 60]
