drawing data: one genome, four samples
SESSION 2
MARTIN KRZYWINSKI
Genome Sciences Center, BC Cancer Agency, Vancouver, Canada
EMBO GLOBAL EXCHANGE LECTURE COURSE: HIGH-THROUGHPUT NEXT GENERATION SEQUENCING APPLIED TO INFECTIOUS DISEASES


SLIDE 1

GENOME VISUALIZATION WITH CIRCOS v20140922

Institut Pasteur de Tunis, Tunis, Tunisia, Sep 15–25, 2014

EMBO GLOBAL EXCHANGE LECTURE COURSE: HIGH-THROUGHPUT NEXT GENERATION SEQUENCING APPLIED TO INFECTIOUS DISEASES

drawing data: one genome, four samples

Genome Sciences Center BC Cancer Agency Vancouver, Canada

SESSION 2

MARTIN KRZYWINSKI

SLIDE 2

GENOME VISUALIZATION WITH CIRCOS · Session 2 · Drawing data—one genome, four samples

  • drawing and spacing ideograms
  • relative ideogram spacing
  • changing ideogram scale
  • ideogram selection
  • ideogram order
  • drawing ideogram regions
  • chromosome breaks
  • ordering ideogram regions
  • cytogenetic bands
  • drawing multiple genomes
  • ideogram progression and orientation
  • relative and absolute ticks

This is the image you will create during this session. It contains chromosomes 1 and 2 from the human and mouse genomes. Each chromosome occupies 1/4 of the figure.

SESSION FINAL IMAGE


SLIDE 3

LESSON 1

highlights

SLIDE 4

> cat ../../../data/lm.gene.txt

LmxM.07 425833 429915 LmxM.07.0880 type=gene,strand=+,ID=LmxM.07.0880,Name=LmxM.07.0880,description=protein+kinase%2C+putative,size=4083,web_id=LmxM.07.0880,locus_tag=LmxM.07.0880,size=4083,Alias=322488462,401415487,LmxM07.0880,LmxM.07.0880,LmxM07.0880.1,LmxM07.0880.1:pep
LmxM.07 441328 442218 LmxM.07.0900 type=gene,strand=+,ID=LmxM.07.0900,Name=LmxM.07.0900,description=serine%2Fthreonine+kinase%2C+putative%2Cprotein+kinase%2C+putative,size=891,web_id=LmxM.07.0900,locus_tag=LmxM.07.0900,size=891,Alias=322488464,401415491,LmxM07.0900,LmxM.07.0900,LmxM07.0900.1,LmxM07.0900.1:pep

GENE ANNOTATION INPUT FILE
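Each record in the annotation file has four whitespace-separated position fields followed by a comma-separated list of key=value options. The Python sketch below shows one way to read such a line outside of Circos. The function name `parse_record` is hypothetical, and the handling of bare tokens (such as the extra Alias entries) as continuations of the previous value is an assumption made for illustration, not a statement about how Circos parses the file.

```python
from urllib.parse import unquote_plus


def parse_record(line):
    """Split one data line into position fields and an options dict."""
    chrom, start, end, name, raw_options = line.split(None, 4)
    options = {}
    last_key = None
    for token in raw_options.split(","):
        if "=" in token:
            last_key, value = token.split("=", 1)
            options[last_key] = value
        elif last_key is not None:
            # Assumption: a bare token continues the previous value (e.g. Alias).
            options[last_key] += "," + token
    return {
        "chr": chrom,
        "start": int(start),
        "end": int(end),
        "name": name,
        "options": options,
    }


# Shortened version of the first record shown on the slide.
line = (
    "LmxM.07 425833 429915 LmxM.07.0880 "
    "type=gene,strand=+,description=protein+kinase%2C+putative,size=4083,"
    "Alias=322488462,401415487"
)
record = parse_record(line)
gene_length = record["end"] - record["start"] + 1
description = unquote_plus(record["options"]["description"])
print(record["name"], gene_length, description)
```

The computed length (end minus start plus one) agrees with the `size=4083` option in the record, and decoding the description turns `%2C` and `+` back into a comma and spaces.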

SLIDE 5

# 2/1/etc/circos.conf
chromosomes_units = 1000
chromosomes = -/00/
chromosomes_color = /./=white

<plots>
  <plot>
    type = highlight
    file = conf(datadir)/lm.gene.txt
    fill_color = black
    r1 = dims(ideogram,radius_outer)
    r0 = dims(ideogram,radius_inner)
    minsize = 5u
    stroke_thickness = undef
  </plot>
</plots>

GENES AS HIGHLIGHTS

SLIDE 6

# 2/1/etc/circos.conf
chromosomes_units = 1000
chromosomes = -/00/
chromosomes_color = /./=white

<plots>
  <plot>
    type = highlight
    file = conf(datadir)/lm.gene.txt
    fill_color = black_a4
    r1 = dims(ideogram,radius_outer)
    r0 = dims(ideogram,radius_inner)
    minsize = 5u
    stroke_thickness = undef
  </plot>
</plots>

TRANSPARENCY

SLIDE 7

# 2/1/etc/circos.conf
chromosomes_units = 1000
chromosomes = -/00/
chromosomes_color = /./=white

<plots>
  <plot>
    type = highlight
    file = conf(datadir)/lm.gene.txt
    fill_color = black_a4
    r1 = dims(ideogram,radius_outer)
    r0 = dims(ideogram,radius_inner)
    minsize = 5u
    stroke_thickness = undef
    <rules>
      <rule>
        condition = var(description) =~ /putative/
        fill_color = red
        z = 5
      </rule>
    </rules>
  </plot>
</plots>

RULES TO SHOW PUTATIVE PROTEINS
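To make the effect of the rule concrete, here is a small Python sketch that mimics it: every gene starts with the track's `fill_color`, and genes whose description matches `/putative/` are recoloured red and raised to `z = 5`. The function `highlight_style` and the second test record are hypothetical; the default `z` of 0 is an assumption.

```python
import re


def highlight_style(options):
    """Return the style a gene highlight would receive under the rule."""
    # Track-level settings from the <plot> block; z = 0 is an assumed default.
    style = {"fill_color": "black_a4", "z": 0}
    if re.search(r"putative", options.get("description", "")):
        # Mirrors the rule body: fill_color = red, z = 5.
        style.update(fill_color="red", z=5)
    return style


genes = {
    "LmxM.07.0880": {"description": "protein+kinase%2C+putative"},
    "example.gene": {"description": "kinase"},  # made-up record for contrast
}
styles = {name: highlight_style(options) for name, options in genes.items()}
for name, style in styles.items():
    print(name, style)
```

Because the red highlights get a higher `z`, they are drawn on top of the semi-transparent black ones where genes overlap.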

SLIDE 8

LESSON 2

heatmaps

SLIDE 9

# 2/2/etc/circos.conf
<plot>
  type = heatmap
  r1 = 0.80r
  r0 = 0.76r
  file = conf(datadir)/lm.exp.ah063.txt
  color = reds-8-seq
  scale_log_base = 0.5
  minsize = 25u
  <<include rules.heatmap.conf>>
</plot>

<plot>
  type = heatmap
  r1 = 0.75r
  r0 = 0.71r
  file = conf(datadir)/lm.exp.ah064.txt
  color = reds-8-seq
  scale_log_base = 0.5
  minsize = 25u
  <<include rules.heatmap.conf>>
</plot>
...

HEATMAPS, 4 SAMPLES

SLIDE 10

# etc/rules.heatmap.conf
<rules>
  <rule>
    use = yes
    condition = var(value) < 1000
    show = no
  </rule>
  <rule>
    condition = 1
    z = eval(var(value))
  </rule>
</rules>

HISTOGRAM

SLIDE 11

The heatmap blocks had many common parameters:

<plot>
  type = heatmap
  r1 = 0.80r
  r0 = 0.76r
  file = conf(datadir)/lm.exp.ah063.txt
  color = reds-8-seq
  scale_log_base = 0.5
  minsize = 25u
  <<include rules.heatmap.conf>>
</plot>

<plot>
  type = heatmap
  r1 = 0.75r
  r0 = 0.71r
  file = conf(datadir)/lm.exp.ah064.txt
  color = reds-8-seq
  scale_log_base = 0.5
  minsize = 25u
  <<include rules.heatmap.conf>>
</plot>

IMPORTING COMMON TRACK SETTINGS

# etc/heatmap.conf
type = heatmap
file = conf(datadir)/lm.exp.ah063.txt
color = reds-8-seq
scale_log_base = 0.5
minsize = 25u
<<include rules.heatmap.conf>>

# 2/2/etc/circos.conf
<plot>
  <<include heatmap.conf>>
  r1 = 0.80r
  r0 = 0.76r
</plot>

<plot>
  <<include heatmap.conf>>
  r1 = 0.75r
  r0 = 0.71r
</plot>
...
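The `<<include ...>>` directive behaves like textual substitution: the named file is pasted in at that point, and included files may include others. The Python sketch below illustrates the idea with in-memory strings. The `expand` function and the trimmed file contents are hypothetical; Circos itself reads the files from disk and searches several directories, which is not modelled here.

```python
import re

INCLUDE = re.compile(r"^\s*<<include\s+(\S+)>>\s*$")


def expand(text, files, depth=0):
    """Recursively replace <<include NAME>> lines with the lines of files[NAME]."""
    if depth > 10:
        raise RecursionError("include nesting too deep")
    lines = []
    for line in text.splitlines():
        match = INCLUDE.match(line)
        if match:
            lines.extend(expand(files[match.group(1)], files, depth + 1))
        else:
            lines.append(line.strip())
    return lines


# Trimmed stand-ins for the shared files; contents shortened for illustration.
files = {
    "heatmap.conf": "type = heatmap\ncolor = reds-8-seq\n<<include rules.heatmap.conf>>",
    "rules.heatmap.conf": "<rules>\n</rules>",
}
plot = "<plot>\n<<include heatmap.conf>>\nr1 = 0.80r\nr0 = 0.76r\n</plot>"
expanded = expand(plot, files)
print("\n".join(expanded))
```

After expansion each `<plot>` block carries the shared settings plus its own `r0` and `r1`, which is why only the radii need to be written per track.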


SLIDE 12

LESSON 3

floating histograms

SLIDE 13

LmxM.01 14846 16843 LmxM.01.0050 exp_ah063=64,exp_ah064=195,exp_ah065=154,exp_ah066=240,exp_min=64,exp_max=240,exp_avg=163.25,exp_range=176,exp_minah=ah063,exp_maxah=ah066
LmxM.01 27727 28314 LmxM.01.0110 exp_ah063=72,exp_ah064=117,exp_ah065=127,exp_ah066=212,exp_min=72,exp_max=212,exp_avg=132,exp_range=140,exp_minah=ah063,exp_maxah=ah066

The value of each data point is the gene name, e.g. LmxM.01.0050. We can change the value by using a rule:

<rule>
  condition = 1
  value = eval(var(exp_ah063))
</rule>

IMPORTING COMMON TRACK SETTINGS


SLIDE 14

# 2/3/etc/circos.conf
<plot>
  sample = ah063
  r1 = 0.78r
  r0 = 0.74r
  <<include heatmap.conf>>
</plot>
...

# etc/heatmap.conf
type = heatmap
color = reds-8-seq
scale_log_base = 0.5
minsize = 25u
<<include rules.heatmap.conf>>

# etc/rules.heatmap.conf
<rules>
  <rule>
    condition = 1
    value = eval(var(exp_conf(.,sample)))
    flow = continue
  </rule>
  <rule>
    condition = var(value) < 1000
    show = no
  </rule>
  <rule>
    condition = 1
    z = eval(var(value))
  </rule>
</rules>

ABSOLUTE IDEOGRAM SCALE
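The expression `var(exp_conf(.,sample))` is resolved in two steps: `conf(.,sample)` is replaced by the `sample` parameter of the enclosing `<plot>` block, and the resulting `var(exp_ah063)` is then looked up in the data point's options. The Python sketch below imitates the two substitutions with regular expressions. Both helper functions are hypothetical and only illustrate the order of substitution; they are not how Circos implements it.

```python
import re


def resolve_conf(expression, block):
    """Replace each conf(.,NAME) with the value of NAME in the enclosing block."""
    return re.sub(
        r"conf\(\.,\s*(\w+)\)", lambda m: str(block[m.group(1)]), expression
    )


def resolve_var(expression, point):
    """Replace each var(NAME) with the data point's option NAME."""
    return re.sub(r"var\((\w+)\)", lambda m: str(point[m.group(1)]), expression)


plot = {"sample": "ah063"}
# Options of the LmxM.01.0050 record shown earlier in the session.
point = {"exp_ah063": 64, "exp_ah064": 195, "exp_ah065": 154, "exp_ah066": 240}

step1 = resolve_conf("var(exp_conf(.,sample))", plot)
step2 = resolve_var(step1, point)
print(step1, "->", step2)
```

Changing `sample` in the `<plot>` block is therefore enough to point the same shared rule file at a different expression column.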

SLIDE 15

# 2/3/etc/circos.conf
<plot>
  type = histogram
  float = yes
  r0 = 0.80r
  r1 = 0.90r
  fill_color = black
  stroke_thickness = undef
  min = 0
  max = 300
  minsize = 20u
  <<include etc/axes.conf>>
  <rules>
    <rule>
      condition = 1
      value = eval(sqrt(var(exp_max)))
      valuebase = eval(sqrt(var(exp_min)))
      #flow = continue
    </rule>
    <rule>
      condition = var(exp_range) < 1000
      fill_color = grey
      z = -10
    </rule>
    <rule>
      condition = var(exp_range) > 10000
      fill_color = red
      z = 10
    </rule>
  </rules>
</plot>
</plots>

FLOATING HISTOGRAM
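In a floating histogram each bin is drawn from `valuebase` up to `value` instead of from zero. With the rule above, the bar for a gene spans the square roots of its minimum and maximum expression. A short Python check using the two records shown earlier makes the arithmetic explicit; `floating_bar` is a hypothetical helper name.

```python
import math

# exp_min and exp_max taken from the two data records shown in this session.
records = [
    {"name": "LmxM.01.0050", "exp_min": 64, "exp_max": 240},
    {"name": "LmxM.01.0110", "exp_min": 72, "exp_max": 212},
]


def floating_bar(record):
    """Return (valuebase, value) as set by the first rule."""
    return math.sqrt(record["exp_min"]), math.sqrt(record["exp_max"])


bars = {r["name"]: floating_bar(r) for r in records}
for name, (base, top) in bars.items():
    print(f"{name}: bar from {base:.2f} to {top:.2f}")
# LmxM.01.0050: bar from 8.00 to 15.49
# LmxM.01.0110: bar from 8.49 to 14.56
```

The square root compresses large expression values, so genes with very different ranges can share one axis.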

SLIDE 16

# 2/3/etc/circos.conf
<backgrounds>
  use = yes
  <background>
    y1 = 50
    color = vvlgrey
  </background>
  <background>
    y0 = 150
    color = vvlred
  </background>
</backgrounds>

FLOATING HISTOGRAM

SLIDE 17

# 2/3/etc/circos.conf
<rules>
  <rule>
    condition = 1
    value = eval(sqrt(var(exp_max)))
    valuebase = eval(sqrt(var(exp_min)))
    flow = continue
  </rule>
  <rule>
    condition = var(exp_range) < 1000
    fill_color = grey
    z = -10
  </rule>
  <rule>
    condition = var(exp_range) > 10000
    fill_color = red
    z = 10
  </rule>
</rules>

FLOATING HISTOGRAM

SLIDE 18

LESSON 4

scatter plot

SLIDE 19

LmxM.01 14846 16843 LmxM.01.0050 exp_ah063=64,exp_ah064=195,exp_ah065=154,exp_ah066=240,exp_min=64,exp_max=240,exp_avg=163.25,exp_range=176,exp_minah=ah063,exp_maxah=ah066

The value of each data point is the gene name, e.g. LmxM.01.0050. We can use the name of the sample in which expression was lowest to show or hide a point:

<rule>
  condition = var(exp_minah) ne "ah063"
  show = no
</rule>

As before, we can change the value by using a rule:

<rule>
  condition = 1
  value = eval(var(exp_ah063))
</rule>

IMPORTING COMMON TRACK SETTINGS


SLIDE 20

# 2/4/etc/circos.conf
glyph_size = 8
<plot>
  show = yes
  type = scatter
  r0 = 0.92r
  r1 = 0.98r
  sample = ah063
  <<include rules.scatter.conf>>
  <<include axes.scatter.conf>>
</plot>

# etc/rules.scatter.conf
<rules>
  <rule>
    use = no
    condition = var(exp_range) < 1000
    show = no
  </rule>
  <rule>
    condition = 1
    value = eval(var(exp_max))
    #flow = continue
  </rule>

SCATTER PLOT

SLIDE 21

<rules>
  <rule>
    use = yes
    condition = var(exp_range) < 1000
    show = no
  </rule>
  <rule>
    condition = 1
    value = eval(var(exp_max))
    flow = continue
  </rule>
  <rule>
    condition = var(exp_maxah) eq "conf(.,sample)"
    color = red
    #flow = goto fade if true
  </rule>
  <rule>
    condition = var(exp_minah) eq "conf(.,sample)"
    color = blue
    z = -5
    #flow = goto fade if true
  </rule>
</rules>

SCATTER PLOT

SLIDE 22

<rules>
  <rule>
    condition = var(exp_range) < 1000
    show = no
  </rule>
  <rule>
    condition = 1
    value = eval(var(exp_max))
    flow = continue
  </rule>
  <rule>
    condition = var(exp_maxah) eq "conf(.,sample)"
    color = red
    flow = goto fade if true
  </rule>
  <rule>
    condition = var(exp_minah) eq "conf(.,sample)"
    color = blue
    z = -5
    flow = goto fade if true
  </rule>
  <rule>
    condition = 1
    show = no
  </rule>
  <rule>
    tag = fade
    condition = var(exp_range) < 5000
    color = eval(sprintf("%s_a5",var(color)))
  </rule>
  <rule>
    condition = var(exp_range) < 10000
    color = eval(sprintf("%s_a3",var(color)))
  </rule>
</rules>

SCATTER PLOT
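The rule chain above relies on `flow` to decide what happens after a rule matches. The Python sketch below is a simplified model of that control flow, assuming that a matching rule stops the chain by default, that `flow = continue` moves to the next rule, and that `flow = goto TAG if true` jumps to the tagged rule when the condition holds. The engine (`apply_rules`, `make_rules`) and all four test points are hypothetical; it is meant as an aid to reading the chain, not as a description of the Circos implementation.

```python
def apply_rules(point, rules):
    """Walk the rule chain once and return the point's final attributes."""
    state = dict(point, show=True)
    i = 0
    while i < len(rules):
        rule = rules[i]
        if rule["condition"](state):
            rule["action"](state)
            flow = rule.get("flow", "stop")  # assumed default: stop on match
            if flow == "stop":
                break
            if flow.startswith("goto "):
                tag = flow.split()[1]
                i = next(j for j, r in enumerate(rules) if r.get("tag") == tag)
                continue
        i += 1
    return state


def make_rules(sample):
    """Python rendering of the seven rules on the slide for one sample."""
    return [
        {"condition": lambda p: p["exp_range"] < 1000,
         "action": lambda p: p.update(show=False)},
        {"condition": lambda p: True,
         "action": lambda p: p.update(value=p["exp_max"]),
         "flow": "continue"},
        {"condition": lambda p: p["exp_maxah"] == sample,
         "action": lambda p: p.update(color="red"),
         "flow": "goto fade if true"},
        {"condition": lambda p: p["exp_minah"] == sample,
         "action": lambda p: p.update(color="blue", z=-5),
         "flow": "goto fade if true"},
        {"condition": lambda p: True,
         "action": lambda p: p.update(show=False)},
        {"tag": "fade",
         "condition": lambda p: p["exp_range"] < 5000,
         "action": lambda p: p.update(color=p["color"] + "_a5")},
        {"condition": lambda p: p["exp_range"] < 10000,
         "action": lambda p: p.update(color=p["color"] + "_a3")},
    ]


rules = make_rules("ah063")
# Made-up points chosen to exercise each branch of the chain.
small = {"exp_max": 240, "exp_range": 176, "exp_minah": "ah063", "exp_maxah": "ah066"}
max_here = {"exp_max": 4000, "exp_range": 3000, "exp_minah": "ah064", "exp_maxah": "ah063"}
min_here = {"exp_max": 9000, "exp_range": 7000, "exp_minah": "ah063", "exp_maxah": "ah066"}
elsewhere = {"exp_max": 9000, "exp_range": 7000, "exp_minah": "ah064", "exp_maxah": "ah066"}

results = {
    "small": apply_rules(small, rules),
    "max_here": apply_rules(max_here, rules),
    "min_here": apply_rules(min_here, rules),
    "elsewhere": apply_rules(elsewhere, rules),
}
for name, state in results.items():
    print(name, state["show"], state.get("color"))
```

Under these assumptions, points whose extreme value falls in the current sample are kept and faded according to their range, and every other point reaches the catch-all `show = no` rule.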

SLIDE 23

glyph_size = 12

<plot>
  show = yes
  type = scatter
  r0 = 0.92r
  r1 = 0.92r
  sample = ah063
  <<include rules.scatter.conf>>
  <<include axes.scatter.conf>>
</plot>

<plot>
  show = yes
  type = scatter
  r0 = 0.94r
  r1 = 0.94r
  sample = ah064
  <<include rules.scatter.conf>>
  <<include axes.scatter.conf>>
</plot>

<plot>
  show = yes
  type = scatter
  r0 = 0.96r
  r1 = 0.96r
  sample = ah065
  <<include rules.scatter.conf>>
  <<include axes.scatter.conf>>
</plot>
...

SCATTER PLOT

SLIDE 24

LESSON 5

resampling data

SLIDE 25

> ls ~/circos/data
> cat lm.exp.txt | $CIRCOS/tools/resample/bin/resample -bin 25000 -count > gene.count.txt

LmxM.00 0 24999 3
LmxM.00 25000 49999 6
LmxM.00 50000 74999 5
LmxM.00 75000 99999 5
LmxM.00 100000 124999 3
...

IMPORTING COMMON TRACK SETTINGS
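The resampled file holds one line per 25 kb bin with the number of genes in it. The Python sketch below reproduces that kind of binning on a toy input. It assumes that each gene is assigned to the bin containing its start coordinate and it omits empty bins; the `resample` tool may handle spans and empty bins differently. The function `count_per_bin` and the third gene are hypothetical.

```python
from collections import Counter


def count_per_bin(records, bin_size=25000):
    """Count records per (chromosome, bin), keyed by start coordinate."""
    counts = Counter()
    for chrom, start, _end in records:
        counts[(chrom, start // bin_size)] += 1
    return [
        (chrom, index * bin_size, (index + 1) * bin_size - 1, n)
        for (chrom, index), n in sorted(counts.items())
    ]


genes = [
    ("LmxM.01", 14846, 16843),  # from the expression file shown earlier
    ("LmxM.01", 27727, 28314),  # from the expression file shown earlier
    ("LmxM.01", 30000, 31000),  # made-up gene for illustration
]
bins = count_per_bin(genes)
for row in bins:
    print(*row)
# LmxM.01 0 24999 1
# LmxM.01 25000 49999 2
```

The output has the same four columns (chromosome, start, end, value) as any other Circos data file, so it can be fed directly to a histogram track.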


SLIDE 26

# 2/5/etc/circos.conf
<plot>
  type = histogram
  file = conf(datadir)/gene.count.txt
  r1 = 1r+100p
  r0 = 1r
  fill_color = black
</plot>

# 2/5/etc/ideogram.position.conf
radius = 0.8r
thickness = 10p
fill = yes
fill_color = black
stroke_thickness = 1
stroke_color = black

# 2/5/etc/ticks.conf
show_grid = yes
<ticks>
  radius = dims(ideogram,radius_outer) +100p
  ...
  <tick>
    grid = yes
    grid_start = 1r
    grid_end = 1r+100p
    grid_thickness = 1p
    grid_color = grey
    ...
  </tick>
  ...

RESAMPLED HISTOGRAM

SLIDE 27

# 2/5/etc/circos.conf
<plots>
  show = yes
  ...

SCATTER PLOT

SLIDE 28

LESSON 6

scatter plot with min/max error bars

SLIDE 29

# 2/6/etc/circos.conf
exp_range_cutoff = 10000
<plot>
  hide* = no
  sample = ah066
  r1 = 0.99r
  r0 = 0.80r
  <<include minmax.conf>>
</plot>

# minmax.conf
hide = conf(quick)
type = histogram
float = yes
color = grey
<<include rules.minmax.conf>>

# rules.minmax.conf
<rules>
  <<include rules.cutoff.conf>>
  <rule>
    condition = 1
    value = eval(sqrt(var(exp_max)))
    valuebase = eval(sqrt(var(exp_min)))
    flow = continue
  </rule>
</rules>

# rules.cutoff.conf
<rule>
  condition = var(exp_range) < conf(exp_range_cutoff)
  show = no
</rule>

FLOATING HISTOGRAM

SLIDE 30

# 2/6/etc/circos.conf
exp_range_cutoff = 10000
<plot>
  hide* = no
  sample = ah066
  r1 = 0.99r
  r0 = 0.80r
  <<include minmax.scatter.conf>>
</plot>

# minmax.scatter.conf
hide = conf(quick)
type = scatter
glyph_size = 10
stroke_color = grey
stroke_thickness = 1
<<include rules.minmax.scatter.conf>>
<<include axes.conf>>

# rules.minmax.scatter.conf
<rules>
  <<include rules.cutoff.conf>>
  <rule>
    condition = 1
    value = eval(sqrt(var(exp_conf(.,sample))))
    flow = continue
  </rule>
  <rule>
    condition = 1
    color = eval(sprintf("rdbu-4-div-%d",remap_int((var(value)**2-var(exp_min))/var(exp_range),0,1,4,1)))
    z = eval(var(value))
  </rule>
</rules>

SCATTER PLOT ON TOP

SLIDE 31

# rules.minmax.scatter.conf
<rules>
  <<include rules.cutoff.conf>>
  <rule>
    condition = 1
    value = eval(sqrt(var(exp_conf(.,sample))))
    flow = continue
  </rule>
  <rule>
    condition = 1
    # remap_int(X,XMIN,XMAX,TARGETMIN,TARGETMAX)
    color = eval(sprintf("rdbu-4-div-%d",remap_int((var(value)**2-var(exp_min))/var(exp_range),0,1,4,1)))
    z = eval(var(value))
  </rule>
</rules>

COLOR MAPPING BASED ON VALUE
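The colour rule first undoes the square root (`var(value)**2`), expresses the sample's expression as a fraction of the gene's range, and then maps that fraction from 0..1 onto the colour indices 4..1 of the `rdbu-4-div` palette. The Python sketch below works through this for the LmxM.01.0050 record. It assumes `remap_int` is a linear interpolation followed by rounding to the nearest integer (half values rounded up); the Python functions are illustrative stand-ins, not the Circos code.

```python
import math


def remap_int(x, xmin, xmax, target_min, target_max):
    """Linearly map x from [xmin, xmax] to [target_min, target_max], then round."""
    fraction = (x - xmin) / (xmax - xmin)
    mapped = target_min + fraction * (target_max - target_min)
    return int(math.floor(mapped + 0.5))  # assumed rounding: half rounds up


def color_for(point, sample):
    """Mirror the two rules: value = sqrt(expression), colour from remap_int."""
    value = math.sqrt(point[f"exp_{sample}"])
    fraction = (value**2 - point["exp_min"]) / point["exp_range"]
    return "rdbu-4-div-%d" % remap_int(fraction, 0, 1, 4, 1)


# Options of the LmxM.01.0050 record shown earlier in the session.
point = {
    "exp_ah063": 64, "exp_ah064": 195, "exp_ah065": 154, "exp_ah066": 240,
    "exp_min": 64, "exp_max": 240, "exp_range": 176,
}
colors = {s: color_for(point, s) for s in ("ah063", "ah064", "ah065", "ah066")}
for sample, color in colors.items():
    print(sample, color)
```

Because the target range runs from 4 down to 1, the sample with the lowest expression gets index 4 and the sample with the highest expression gets index 1.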


SLIDE 32

# 2/6/etc/circos.conf
exp_range_cutoff = 2500

# you can set this on the command line
> circos -param exp_range_cutoff=2500

USING A DIFFERENT CUTOFF

SLIDE 33

# 2/6/etc/circos.conf
quick = no
exp_range_cutoff = 10000

FOUR SAMPLES

SLIDE 34