SLIDE 1

GENOME VISUALIZATION WITH CIRCOS v20120508

Stazione Zoologica Anton Dohrn, Naples - Italy May 7–19, 2012

EMBO PRACTICAL COURSE: BIOINFORMATICS AND COMPARATIVE GENOME ANALYSES

INTRODUCTION TO CIRCOS

MARTIN KRZYWINSKI

Michael Smith Genome Sciences Center, BC Cancer Research Center, Vancouver, Canada

SLIDE 2

GENOME VISUALIZATION WITH CIRCOS · Session 1 · Introduction

Thomson, N.R., et al., Comparative genome analysis of Salmonella Enteritidis PT4 and Salmonella Gallinarum 287/91 provides insights into evolutionary and host adaptation pathways. Genome Res, 2008. 18(10): p. 1624-37.

AVOID LINEAR LAYOUT COMPARISONS

SLIDE 3

LITERATURE AND MEDIA

circos appearances

SLIDE 4

>100 citations, 5 book covers

CIRCOS IN THE LITERATURE

SLIDE 5

NYT Science, 4 May 2012

CIRCOS IN THE LITERATURE

SLIDE 6

http://www.circos.ca/images/scientific_literature/

VARIETY OF VISUALIZATIONS

SLIDE 7

Hillmer AM, Yao F, Inaki K, et al. 2011. Comprehensive long-span paired-end-tag mapping reveals characteristic patterns of structural variations in epithelial cancer genomes. Genome Research 21:665-675.

PRIMARY LITERATURE

SLIDE 8

Ledford H 2010 Big science: The cancer genome challenge. Nature 464:972-974.

REVIEW LITERATURE

SLIDE 9

AQ Magazine, April 2011 (Simon Fraser University)

POPULAR SCIENCE

SLIDE 10

Wired, April 2010

POPULAR CULTURE

SLIDE 11

The town of Cáceres, Spain, a UNESCO World Heritage Site, used Circos to illustrate the relationships between businesses as part of its urban planning strategy.

URBAN PLANNING

SLIDE 12

ADVERTISING

SLIDE 13

CIRCULAR LAYOUT + FLEXIBLE IMPLEMENTATION

what makes circos useful?

SLIDE 14

WHY IS CIRCOS USEFUL?

TIMELY + EFFECTIVE
Circos addresses the need to visualize differences in disease genomes and assess variation in genomic content across many samples. Dynamic rules provide a way to adjust the format of figure elements based on data values. SVG output is designed for publication-quality visualizations. Perceptual color palettes and high-quality fonts are built in.

COMPATIBLE
Driven entirely by plain-text configuration files. Data agnostic. Simple format for data input. Highly automatable. Fits naturally into any data pipeline. Extended longevity: performs only visualization, not analysis.

SIMPLE + DEEP
Large number of data tracks, which can be stacked and layered. Format of everything in the figure can be dynamically adjusted based on rules that react to data values. Utility tools assist with manipulating data files (e.g. binning links and ordering ideograms to optimize layout).
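The "simple format for data input" noted under COMPATIBLE can be illustrated with a short Python sketch of a pipeline step that writes a data track. It assumes the usual Circos convention of one whitespace-separated record per line (chromosome, start, end, value). The `Point` type, the function name, the chromosome names and the sample values are hypothetical and are not part of Circos.

```python
from typing import Iterable, List, NamedTuple


class Point(NamedTuple):
    """One data point for a 2D track (hypothetical helper type)."""
    chrom: str   # ideogram name; "chr1" is an illustrative choice
    start: int
    end: int
    value: float


def to_track_lines(points: Iterable[Point]) -> List[str]:
    """Format points as 'chr start end value' lines, one per point."""
    return [f"{p.chrom} {p.start} {p.end} {p.value:g}" for p in points]


if __name__ == "__main__":
    # Made-up values, printed in the plain-text form a track file would hold.
    demo = [
        Point("chr1", 0, 999_999, 0.25),
        Point("chr1", 1_000_000, 1_999_999, 0.5),
    ]
    print("\n".join(to_track_lines(demo)))
```

Because the output is ordinary text, the same step could sit at the end of any analysis script, which is the sense in which the format fits into a data pipeline.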

SLIDE 15

Moving your eye across the curved path is faster and more comfortable.

EYE PREFERS CURVES

SLIDE 16

Bin size ranges from 50 Mb (inside) to 1 Mb (outside). Image shows the density of genes across the human genome.

VARIABLE RESOLUTION
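The variable-resolution idea can be sketched as the same gene positions counted at several bin sizes, one density track per size. This is a minimal illustration, not the procedure used for the slide: the gene positions are made up, and the intermediate 10 Mb bin size is an assumption (the slide states only the 50 Mb and 1 Mb extremes).

```python
from collections import Counter
from typing import Dict, Iterable

MB = 1_000_000


def gene_density(gene_starts: Iterable[int], bin_size: int) -> Dict[int, int]:
    """Count genes per bin; keys are bin start coordinates."""
    counts = Counter((pos // bin_size) * bin_size for pos in gene_starts)
    return dict(sorted(counts.items()))


# Made-up gene start positions on a single chromosome.
genes = [400_000, 900_000, 1_200_000, 7_500_000, 52_000_000]

# One density track per resolution, coarsest first (innermost in the figure).
# The 10 Mb step is an assumption; only 50 Mb and 1 Mb come from the slide.
tracks = {size: gene_density(genes, size) for size in (50 * MB, 10 * MB, 1 * MB)}

if __name__ == "__main__":
    for size, density in tracks.items():
        print(f"{size // MB} Mb bins: {density}")
```

Coarse bins summarize the overall trend, while fine bins keep local detail, so stacking them gives both views along the same scale.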

SLIDE 17

A linear layout of the scale has four disadvantages: changing focus (regions in the center of the image receive more attention), broken adjacency (neighbouring points on a linear scale are separated), broken continuity (data tracks are difficult to follow from one edge of the figure to another), and non-uniform data emphasis (the center and edge of the axis are not perceived uniformly; the edge implies periphery, which may not apply).

ADJACENCY, CONTINUITY & FOCUS

SLIDE 18

(A) histogram, (B) ideograms, (C) histogram, (D) heat map, (E) links, (F) highlights, (G) grid, (H) ticks. The format of data in tracks A, C, D and E is adjusted by rules based on data values.

TYPICAL CIRCOS IMAGE

SLIDE 19

examples from literature

SLIDE 20

The most frequent complex rearrangements involving MLL and (A) AFF1/AF4. Localization of chromosomal breakpoints and UPN of individual patients are indicated. Colored lines indicate in-frame fusions (green), out-of-frame fusions (red), no partner gene present at the recombination site (blue). Meyer, C., E. Kowarz, et al. (2009). "New insights to the MLL recombinome of acute leukemias." Leukemia 23(8): 1490-1499. Figure by M Krzywinski.

EXAMPLE FROM LITERATURE

SLIDE 21

Various types of data tracks can be stacked. Five instances of a compound track each represent copy number information from a different sample. Two histograms, a line plot and a scatter plot are used to form a compound track. Using links and highlights, attention is drawn to the progression of scale increase within chr17:53-63Mb. This region is magnified at 5x and smaller subregions are further magnified to 40x. Krzywinski, M., J. Schein, et al. (2009). "Circos: an information aesthetic for comparative genomics." Genome Res 19(9): 1639-1645.

EXAMPLE FROM LITERATURE

SLIDE 22

Data sets that do not sample the genome uniformly (A) can be effectively shown by using a connector track (B) to show the remapping onto an index scale (C). In the figure, methylation values (A) for 7 tissues are summarized using stacked histograms (C), whose bins represent statistics for remapped methylation probe positions. Zimmer, C. (2008). Now: The Rest of the Genome. New York Times. Figure by M Krzywinski.

EXAMPLE FROM LITERATURE
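The remapping onto an index scale described for this figure can be sketched in Python. This is a minimal illustration under stated assumptions, not the method used to produce the figure: probes are ranked by genomic position, each rank becomes a slot on the index scale, and values are averaged over a fixed number of consecutive slots. The function names, the probe positions, the values and the choice of the mean as the bin statistic are all hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, List


def remap_to_index(positions: Iterable[int]) -> Dict[int, int]:
    """Map each distinct genomic position to its rank (the index scale)."""
    return {pos: i for i, pos in enumerate(sorted(set(positions)))}


def binned_means(values: Dict[int, float], index: Dict[int, int],
                 probes_per_bin: int) -> List[float]:
    """Average values over consecutive index slots, probes_per_bin at a time."""
    ordered = sorted(values, key=index.__getitem__)
    return [
        mean(values[p] for p in ordered[i:i + probes_per_bin])
        for i in range(0, len(ordered), probes_per_bin)
    ]


# Made-up probe positions (unevenly spaced) and methylation values.
methylation = {100: 0.25, 5_000: 0.75, 5_100: 0.5, 900_000: 1.0}
index = remap_to_index(methylation)

# Each (genomic position, index slot) pair is what a connector would join.
connectors = sorted(index.items())

if __name__ == "__main__":
    print(connectors)
    print(binned_means(methylation, index, probes_per_bin=2))
```

On the index scale every probe occupies the same width, so densely probed regions are no longer compressed and sparse regions no longer dominate the figure.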

SLIDE 23

The same data set is shown in all panels. (A) Each link represents one of a subset of 2,500 segmental duplications within the human genome. (B) Rules are used to change link color and thickness. (C) Rules are used to show only links to chrY. (D) In addition to the rules in (C), other rules add a second layer of links from chr8.

LINK GEOMETRY
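The effect of the rules in panels (B) and (C) can be imitated in a Python preprocessing step. In Circos itself rules are written in the configuration and evaluated at draw time; the sketch below only mirrors the logic, and it assumes the single-line link format (two coordinate triplets followed by comma-separated options). The `Link` type, the function name, the size threshold, and the colors and thickness values are hypothetical.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    """A link between two genomic intervals (hypothetical helper type)."""
    chr1: str
    start1: int
    end1: int
    chr2: str
    start2: int
    end2: int


def style_link(link: Link, keep_chrom: str = "chrY",
               big: int = 10_000) -> Optional[str]:
    """Return a formatted link line, or None if the link should be hidden."""
    # Panel (C) analogue: hide links that do not touch the chosen chromosome.
    if keep_chrom not in (link.chr1, link.chr2):
        return None
    # Panel (B) analogue: color and thickness depend on the larger interval.
    size = max(link.end1 - link.start1, link.end2 - link.start2)
    color, thickness = ("red", 4) if size >= big else ("grey", 1)
    return (f"{link.chr1} {link.start1} {link.end1} "
            f"{link.chr2} {link.start2} {link.end2} "
            f"color={color},thickness={thickness}")


if __name__ == "__main__":
    demo = [
        Link("chr1", 100, 200, "chr2", 300, 400),     # hidden: no chrY end
        Link("chrY", 0, 20_000, "chr8", 500, 600),    # shown, large
        Link("chr3", 50, 150, "chrY", 1_000, 1_100),  # shown, small
    ]
    for line in filter(None, map(style_link, demo)):
        print(line)
```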

SLIDE 24

Regions of similarity between human and dog genomes. (A) human genome. (B) human ideograms. (C) dog genome. (D) dog ideograms, coded by most similar human chromosome. (E,F) link bundles connect similar regions. (F1) rules are used to color bundles by size. (F2) bundles twist when similarity involves opposite strands. American Scientist, Sept-Oct 2007. Cover figure by M Krzywinski.

EXAMPLE FROM LITERATURE

SLIDE 25

RULES CAN CHANGE DATA GLYPH COLOR AND SIZE

The size and outline of each scatter plot glyph are influenced by the data value. The data value itself can be altered, as seen in the two outermost collapsed scatter plots, where the value for each point has been set to 0 to display the glyphs at the same radius.

SLIDE 26

TRACK DEFINITION WITH TEMPLATES

Each track is associated with several internal counters. The values of the counters differ from track to track and can be used to drive track generation from a single template. By referencing the template multiple times, new tracks can be created automatically, without having to change the template.
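The template idea can be illustrated outside Circos with a Python sketch in which a per-track counter is the only input that changes. This mirrors the concept rather than the Circos configuration syntax: the function name, the file naming pattern, and the default radii, width and gap are hypothetical. The `r` suffix follows the Circos convention for radii given relative to the ideogram radius.

```python
from typing import Dict


def track_from_template(counter: int, r_start: float = 0.95,
                        width: float = 0.08, gap: float = 0.02) -> Dict[str, str]:
    """Derive one track's parameters from its counter value."""
    # Each successive track is shifted inward by one width plus one gap.
    r1 = r_start - counter * (width + gap)
    r0 = r1 - width
    return {
        "file": f"sample.{counter}.txt",  # hypothetical file naming pattern
        "r0": f"{r0:.2f}r",               # inner radius
        "r1": f"{r1:.2f}r",               # outer radius
    }


if __name__ == "__main__":
    # Referencing the same template five times yields five stacked tracks.
    for n in range(5):
        print(track_from_template(n))
```

Adding a sixth track means calling the template once more; the template itself stays unchanged.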

SLIDE 27

Three tracks showing sequence data. Each label corresponds to a base, colored by the identity of the base. In the first track, each base label is changed to “X” using rules. In the second track, a wingding symbol font is used, and the label is changed to “n”, which corresponds to a square glyph in this font. In the third track, the label is changed to “l”, which is a circle.

GLYPH TRACKS

SLIDE 28

A single gene density data file is used to populate four tracks. Individual density data points are assigned to one of three categories: cancer genes (red), OMIM genes (orange), and all others (green). Rules are used to show specific categories in a track and to change the label from the category name (e.g. cancer) to an “l”, which is a circle in the wingding font.

BUBBLE DENSITY TRACK

SLIDE 29

CONTROL AND INTEGRATION

implementation

SLIDE 30

Central configuration file defines data track information and imports other configuration files that store parameters that change less frequently. Each data file can be used for multiple tracks. PNG image output is used for immediate viewing, web-based reporting or presentation. SVG output is ideal for high-res publication and post-processing individual elements.

ALL INPUT IS PLAIN TEXT

SLIDE 31

  • circos can adjust the visualization based on data values, using rules
  • rules are snippets of code associated with a track
  • circos is driven by plain text files and can be easily automated
  • circos does not have an interface
  • circos does not perform any analysis; several tools for this are included in tools/
  • circos is only a tool: beautiful visualizations – yes, ugly visualizations – yes

CIRCOS IN A NUTSHELL

SLIDE 32

SESSIONS 2, 3, 4, 5, 6

practical sessions

SLIDE 33

SESSION 2 - IDEOGRAM LAYOUT AND FORMATTING

SLIDE 34

SESSION 3 - DATA TRACKS

SLIDE 35

SESSION 4 - BUNDLES AND AUTOMATION

SLIDE 36

You will start with a template configuration file that creates the image on the left. You will make changes to the file to reformat and add image elements to create the image on the right.

SESSION 5 - CIRCOS CHALLENGE

SLIDE 37

Layout of ideograms from three genomes.

SESSION 6 - YEAST GENOME VISUALIZATION

SLIDE 38

Genome duplication in Zyro genome (left) and between two Zyro chromosomes. Intra- and inter-chromosomal duplications are visually separated.

SESSION 6 - YEAST GENOME VISUALIZATION

SLIDE 39

Conservation between large chromosomes in the three genomes. Dynamic rules are used to color the links.

SESSION 6 - YEAST GENOME VISUALIZATION
