SLIDE 1

A Grammar of Graphics for Genomics

The ggbio Package

Michael Lawrence

Genentech

August 29, 2012

Michael Lawrence (Genentech) A Grammar of Graphics for Genomics August 29, 2012 1 / 18

SLIDE 2

Outline

  1. Motivation
  2. High-level Plots
  3. Grammar Components

SLIDE 4

Data on the Genome

  • Comes in two flavors:
      • Annotations (genes, TF binding sites, ...)
      • Experimental measurements (sequence reads)
  • Both types are tied to genomic coordinates, providing a common axis that permits cross-dataset comparison and inference
  • Typically stored as a table, with the range as a fundamental variable type, plus metadata

seqnames  start      end        strand  exon_id  tx_id
10        120927215  120928045  -       129230   14886,14887
10        120928689  120928854  -       129229   14886,14887
10        120931894  120931997  -       129228   14886,14887
10        120933249  120933384  -       129227   14886,14887
10        120933963  120934069  -       129226   14886
10        120933963  120934104  -       119757   14887
10        120936533  120936665  -       119756   14887
10        120936552  120936665  -       129225   14886
10        120938267  120938345  -       129224   14886,14887
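
As a rough illustration of "a table, with the range as a fundamental variable type, plus metadata", the sketch below models three rows of the exon table in plain Python. The class name `GenomicRange`, its fields, and the `overlaps` helper are hypothetical and are not part of ggbio or Bioconductor; the strand value is an assumption, since the strand column is garbled in the extracted slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GenomicRange:
    """One row of a range table: coordinates plus free-form metadata (hypothetical type)."""
    seqname: str
    start: int          # 1-based, inclusive
    end: int            # inclusive
    strand: str
    metadata: dict = field(default_factory=dict)

    @property
    def width(self) -> int:
        # Number of positions covered, counting both endpoints.
        return self.end - self.start + 1

    def overlaps(self, other: "GenomicRange") -> bool:
        # Two ranges overlap if they share a sequence and at least one position.
        return (self.seqname == other.seqname
                and self.start <= other.end
                and other.start <= self.end)


# Three rows from the exon table; strand "-" is assumed.
exons = [
    GenomicRange("10", 120933963, 120934069, "-", {"exon_id": 129226, "tx_id": [14886]}),
    GenomicRange("10", 120933963, 120934104, "-", {"exon_id": 119757, "tx_id": [14887]}),
    GenomicRange("10", 120936533, 120936665, "-", {"exon_id": 119756, "tx_id": [14887]}),
]

widths = [e.width for e in exons]
```

Because the range is a typed column rather than two loose integers, operations such as width and overlap live on the range itself, and the metadata simply travels along with each row.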

SLIDE 9

Challenges

Big data, wide spaces

  • Need summaries that are efficiently computed, communicate more with less and expose the most interesting aspects of the data
  • Need different ways of viewing the data, depending on the density and scale, from whole genome to single basepair

SLIDE 13

Existing Tools

UCSC, IGB, IGV, Circos, GViz


Limitations

  • Limited to one type of view (linear or circular)
  • Not tightly integrated with an analysis environment through standard, abstract data structures (except GViz)
  • No low-level toolkit for prototyping new types of graphics

SLIDE 20

Grammars of Graphics

  • A grammar of graphics is a language for expressing plots
  • Graphics are constructed by combining various types of primitives, like Lego bricks for graphics
  • The most prominent grammar was introduced in Wilkinson’s book The Grammar of Graphics
  • Wilkinson’s grammar was extended by Wickham in the ggplot2 package

SLIDE 24

The ggbio Package

  • An R/Bioconductor package that extends the Wilkinson/Wickham grammar for applications in genomics
  • Integrated with Bioconductor
  • Operates on standard, abstract genomic data structures
  • Leverages efficient range-based algorithms
  • Programming interface has two levels of abstraction:
      • autoplot: maps Bioconductor data structures to plots
      • grammar: mix and match to create custom plots

SLIDE 26

Basic Plots

Gene Structures, Read Alignments, Sequence, Multiple

SLIDE 32

Overview Plots

Grand Linear, Karyogram, Circular

[Figure: karyogram of chromosomes 1 to 22 and X; legend "seqReg": Exon, Intron, Other]

[Figure: circular layout of chromosomes 1 to 22; legend "rearrangements": interchromosomal, intrachromosomal; legend "tumreads": 4 to 12]

SLIDE 36

Specialized Plots

Mismatch summary + VCF, Edge-linked Intervals

[Figure: mismatch summary; "Counts" (5 to 15) by read base (A, C, G, T) over positions 25235720 to 25235755, with the reference sequence below; legend: mismatch, snp, reference]

[Figure: edge-linked intervals; "Expression" (200 to 1000) by group (GM12878, K562), linked to transcripts uc002rau.2, uc010yjg.1, uc002rav.2, uc010yjh.1, uc002raw.2 over positions 10930000 to 10980000]

SLIDE 40

The Wilkinson/Wickham Grammar of Graphics

  Geom   The shape used for drawing the data
  Stat   Transforms the data before plotting
  Scale  Maps data to geom aesthetics, guides like legends and axes
  Coord  Maps from geom space to device space
  Facet  Small multiples of data subsets (trellis)
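
To make the division of labor between these components concrete, here is a toy pipeline in Python. It is not ggplot2 or ggbio code: every function name (`count_stat`, `linear_scale`, `bar_geom`, `flip_coord`) is invented for illustration, and the facet component is left out. The only point is that each component is a separate step that can be swapped independently.

```python
from collections import Counter


def count_stat(values):
    """Stat: transform raw observations into sorted (category, count) pairs."""
    return sorted(Counter(values).items())


def linear_scale(domain_max, range_max):
    """Scale: map a data value in [0, domain_max] to a length in [0, range_max]."""
    return lambda v: v / domain_max * range_max


def bar_geom(category_index, length):
    """Geom: a bar as a rectangle (x0, y0, x1, y1) in geom space."""
    return (category_index, 0.0, category_index + 1, length)


def flip_coord(rect):
    """Coord: map geom space to device space, here by swapping x and y."""
    x0, y0, x1, y1 = rect
    return (y0, x0, y1, x1)


# Hypothetical data: the strand of five ranges.
strands = ["+", "-", "-", "+", "-"]

counts = count_stat(strands)
scale = linear_scale(domain_max=3, range_max=100.0)
bars = [flip_coord(bar_geom(i, scale(n))) for i, (_, n) in enumerate(counts)]
```

Replacing `flip_coord` with an identity function turns the horizontal bars into vertical ones without touching the stat, scale, or geom, which is the kind of mix-and-match a grammar is meant to allow.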

SLIDE 46

A Grammar of Graphics for Genomics

Extensions are marked in red

[Figure: annotated example plot; x-axis 48.245 Mb to 48.270 Mb, y-axis levels 1 and 2, legend "strand": +, −. Annotations:]

  • statistical transformation: stepping
  • geometric object: alignment
  • geometric object: chevron
  • geometric object: rect
  • X scale: sequence
  • Y scale: discrete from stepping
  • color scale: discrete from strand

SLIDE 47

Components of the Genomic Grammar

  Geom:   alignment, chevron, arch, arrow, arrowrect
  Stat:   gene, reduce, stepping, coverage, mismatch, table
  Scale:  sequence, genome, fold-change, giemsa
  Coord:  truncate-gaps
  Layout: tracks, range-facet

[Figure: transcripts NM_006793 (GeneID:10935) and NM_014098 (GeneID:10935); x-axis 120.928 Mb to 120.938 Mb]

[Figure: ranges labeled "original" and "reduced"; x-axis 120.928 Mb to 120.938 Mb]
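
The "reduce" stat in the component list can be sketched as follows, assuming it means what reduction means for genomic ranges in general: overlapping and adjacent ranges are merged into the smallest set of ranges covering the same positions. This is a plain-Python illustration, not ggbio's implementation; `reduce_ranges` is a hypothetical helper, and the input is four exon ranges taken from the table earlier in the talk.

```python
def reduce_ranges(ranges):
    """Merge overlapping or adjacent (start, end) ranges; coordinates are inclusive integers."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


exons = [
    (120933963, 120934069),
    (120933963, 120934104),
    (120936533, 120936665),
    (120936552, 120936665),
]
reduced = reduce_ranges(exons)
```

After sorting, a single pass is enough, so the cost is dominated by the sort.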

[Figure: ranges arranged by "stepping"; x-axis 120.928 Mb to 120.938 Mb]
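
One way to picture the "stepping" stat is as an assignment of each range to a row so that ranges drawn on the same row never overlap. The sketch below uses a simple greedy rule (take the lowest row that is free); this rule is an assumption made for illustration and may differ from what ggbio actually does. `stepping_levels` is a hypothetical name.

```python
def stepping_levels(ranges):
    """Assign each (start, end) range the lowest row whose last range ends before it starts."""
    row_ends = []                       # last occupied end coordinate per row
    levels = [None] * len(ranges)
    order = sorted(range(len(ranges)), key=lambda i: ranges[i])
    for i in order:
        start, end = ranges[i]
        for row, last_end in enumerate(row_ends):
            if start > last_end:
                # This row is free at `start`: place the range here.
                row_ends[row] = end
                levels[i] = row + 1
                break
        else:
            # Every existing row is still occupied: open a new one.
            row_ends.append(end)
            levels[i] = len(row_ends)
    return levels


exons = [
    (120933963, 120934069),
    (120933963, 120934104),
    (120936533, 120936665),
    (120936552, 120936665),
]
levels = stepping_levels(exons)
```

The resulting level is a discrete variable, which is why the annotated example plot derives its Y scale from stepping.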

[Figure: "Coverage" (10 to 60) over positions 120928000 to 120938000]
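
Coverage is a typical example of a summary that is cheap to compute: the number of ranges overlapping each position. A minimal sketch, assuming inclusive integer coordinates and using a difference array so that each range costs constant time regardless of its width. The function name and the toy reads are hypothetical, and this is not how ggbio or Bioconductor implement coverage.

```python
def coverage(ranges, region_start, region_end):
    """Per-position depth over [region_start, region_end] for inclusive (start, end) ranges."""
    length = region_end - region_start + 1
    diff = [0] * (length + 1)
    for start, end in ranges:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo > hi:
            continue                    # range lies entirely outside the region
        diff[lo - region_start] += 1    # depth rises at the clipped start
        diff[hi - region_start + 1] -= 1  # and falls just past the clipped end
    depth, running = [], 0
    for delta in diff[:length]:
        running += delta
        depth.append(running)
    return depth


# Toy reads, not taken from the talk.
reads = [(1, 4), (3, 6), (5, 5), (9, 12)]
depth = coverage(reads, region_start=1, region_end=10)
```

The total cost is proportional to the number of ranges plus the length of the region, which matters when the region is a whole chromosome.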

[Figure: "Counts" (10 to 60) by base (A, C, G, T) over positions 120928000 to 120938000, and again with the x-axis labeled 120.928 Mb to 120.938 Mb]

[Figure: coordinates labeled "original" and "truncated"]
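
The "truncate-gaps" coord can be thought of as a coordinate transform that keeps regions with data at full width and shrinks the empty stretches between them, so that short exons are not lost between long introns. The sketch below fixes every gap to a constant width; that choice, the `gap_width` parameter, and raising an error for positions inside a gap are all assumptions made for illustration, and ggbio's own transform may behave differently.

```python
def truncate_gaps(ranges, gap_width=10):
    """Return a function mapping original positions to coordinates with gaps shrunk.

    Covered blocks keep their width; each gap between blocks is drawn gap_width wide.
    """
    # Merge overlapping ranges into covered blocks.
    blocks = []
    for start, end in sorted(ranges):
        if blocks and start <= blocks[-1][1]:
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))

    # Offset of each block in the truncated coordinate system.
    offsets, cursor = [], 0
    for start, end in blocks:
        offsets.append(cursor)
        cursor += (end - start) + gap_width

    def to_truncated(pos):
        for (start, end), offset in zip(blocks, offsets):
            if start <= pos <= end:
                return offset + (pos - start)
        raise ValueError(f"position {pos} falls in a gap")

    return to_truncated


# Toy ranges, not taken from the talk: two overlapping exons and a distant third.
exons = [(100, 200), (150, 250), (5000, 5100)]
to_truncated = truncate_gaps(exons, gap_width=10)
mapped = [to_truncated(p) for p in (100, 250, 5000, 5100)]
```

In this toy case the 4750-position gap between the two blocks is drawn 10 units wide, while distances inside each block are preserved.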

[Figure: "score" (500 to 2000) for "normal" and "tumor"; legend "novel": FALSE, TRUE]

[Figure: "Coverage" (0e+00 to 6e+05) for chromosomes 10 and 11]

[Figure: "Samples" by "Features", colored by "value" (−5000 to 5000)]

[Figure: chr10 overview above a "Counts" (10 to 60) track by base (A, C, G, T); x-axis 120.928 Mb to 120.938 Mb]


Summary

  • The ggbio package is a toolkit for plotting genomic data and annotations
  • Available as part of the Bioconductor project
  • Easy to use and flexible enough to handle the diverse use cases encountered in genomics
  • Useful plots are automatically generated from Bioconductor data structures using reasonable defaults
  • New types of plots can be constructed from grammar primitives specially designed for genomics

SLIDE 66

Acknowledgements

Tengfei Yin, Di Cook, Robert Gentleman
