Complementing Computation with Visualization in Genomics

British Columbia Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, Canada. Complementing Computation with Visualization in Genomics. March 11, 2010, EBI Interfaces Interest Forum. Cydney Nielsen.


slide-1
SLIDE 1

British Columbia Cancer Agency Genome Sciences Centre

Vancouver . British Columbia . Canada

Complementing Computation with Visualization in Genomics

March 11, 2010, EBI Interfaces Interest Forum

Cydney Nielsen

slide-2
SLIDE 2

Discovery path

Biological Sample → Genomic Data → Scientific Insight

slide-3
SLIDE 3

Discovery path

Biological Sample → Genomic Data → Scientific Insight

slide-4
SLIDE 4

Components of Data Analysis

Genomic Data, Scientific Insight, Analysis, Automation, Human Judgment

slide-5
SLIDE 5

Outline

  • Genome Assembly Visualization

      – ABySS-Explorer

  • Complement to genome browsing

      – Using clustering and interactive data exploration

slide-6
SLIDE 6

Outline

  • Genome Assembly Visualization

      – ABySS-Explorer

  • Complement to genome browsing

      – Using clustering and interactive data exploration

slide-7
SLIDE 7

Genome Sequencing

cell population → extracted DNA → sheared DNA → sequencing reads

Shotgun approach

AGCGGATTGCATGACAGT GTACAGCCTGACAGAAGC CATGACAGTCCGAGTACA TTCAGAATGGTACAGCAG GCGCTACGATCAGATCAA

slide-8
SLIDE 8

ABySS – Assembly By Short Sequences

Sequencing read set (read length = 7 nt): GGACATC, GGACAGA
Corresponding de Bruijn graph (k = 5 nt): [graph figure]

Simpson et al. Genome Res 2009

slide-9
SLIDE 9

ABySS – Assembly By Short Sequences

Sequencing read set (read length = 7 nt): GGACATC, GGACAGA
Corresponding de Bruijn graph (k = 5 nt): [graph figure]
ABySS merges unambiguously connected vertices to form contigs

Simpson et al. Genome Res 2009
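
As a rough illustration of the two statements on this slide, the sketch below builds a k-mer graph from the two 7-nt reads and then collapses unambiguous chains into contigs. It is a minimal toy under stated assumptions, not ABySS's implementation: the function names are hypothetical, reverse complements and sequencing errors are ignored, and a cycle with no branch point would be skipped.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Vertices are k-mers; an edge joins consecutive k-mers of a read (overlap k-1)."""
    nodes, succ, pred = set(), defaultdict(set), defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(kmers)
        for a, b in zip(kmers, kmers[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return nodes, succ, pred

def merge_unambiguous(nodes, succ, pred):
    """Collapse chains where each step is the only way out of a vertex and the only way in."""
    def only_next(a):
        # a -> b is unambiguous when a has one successor and b has one predecessor
        if len(succ[a]) != 1:
            return None
        (b,) = succ[a]
        return b if len(pred[b]) == 1 else None

    def starts_chain(v):
        # v begins a contig unless it is the unambiguous continuation of a predecessor
        if len(pred[v]) != 1:
            return True
        (p,) = pred[v]
        return len(succ[p]) != 1

    contigs = []
    for start in sorted(nodes):
        if not starts_chain(start):
            continue
        seq, node, seen = start, start, {start}
        while (nxt := only_next(node)) is not None and nxt not in seen:
            seq += nxt[-1]          # each step adds one new base
            seen.add(nxt)
            node = nxt
        contigs.append(seq)
    return contigs

nodes, succ, pred = build_de_bruijn(["GGACATC", "GGACAGA"], k=5)
print(merge_unambiguous(nodes, succ, pred))
```

In this toy, the shared prefix GGACA has two possible successors, so it stays a separate contig and the two read-specific tails each form their own contig.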

slide-10
SLIDE 10

Assembly Ambiguities

True genome sequence

GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG

slide-11
SLIDE 11

Assembly Ambiguities

Assembled sequence

de Bruijn graph representation

True genome sequence

GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG
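
One way to see why this sequence is ambiguous to assemble is to list the k-mers that have more than one possible successor. The sketch below does that for the true genome sequence shown on the slide, using k = 5 as an assumed value (the slide does not state k for this example); the function name is hypothetical.

```python
from collections import defaultdict

GENOME = "GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG"

def successors(seq, k):
    """Map each k-mer to the set of k-mers that directly follow it in seq."""
    succ = defaultdict(set)
    for i in range(len(seq) - k):
        succ[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return succ

succ = successors(GENOME, k=5)
# Vertices with several successors are where an assembler cannot extend unambiguously
branching = {kmer: sorted(nxt) for kmer, nxt in succ.items() if len(nxt) > 1}
print(branching["AAAAA"])
```

Both poly-A runs collapse onto the same AAAAA vertex, which loops back to itself and also leads into either flanking sequence, so the graph alone cannot say how long each run is or which exit follows which entry.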

slide-12
SLIDE 12

Starting Point

Shaun Jackman

slide-13
SLIDE 13

Example of existing tools: Consed

slide-14
SLIDE 14

Example of existing tools: Consed

slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18

Properties of DNA

slide-19
SLIDE 19

Capture sequence strand

AAAAAT 1+ 2+

slide-20
SLIDE 20

Capture sequence strand

AAAAAT TTTTTA 1+ 2+ 1- 2-

slide-21
SLIDE 21

Capture sequence strand

AAAAAT 1+ 2+ TTTTTA

slide-22
SLIDE 22

Capture sequence strand

AAAAAT TTTTTA 1- 2-
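
The relationship between the two labelled strands can be sketched in a few lines. This assumes the slide draws the partner strand base-by-base beneath AAAAAT (giving TTTTTA when read left to right) and that the "+" and "-" suffixes name the two orientations of a contig; the helper names are hypothetical.

```python
# Watson-Crick pairing for unambiguous DNA bases
_PAIR = str.maketrans("ACGT", "TGCA")

def complement(seq):
    """Partner strand written base-by-base under seq (same left-to-right order)."""
    return seq.translate(_PAIR)

def reverse_complement(seq):
    """Partner strand read in its own 5'->3' direction."""
    return complement(seq)[::-1]

def orient(contig_id, seq, strand):
    """Return (label, sequence) for a contig on the requested strand."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    oriented = seq if strand == "+" else reverse_complement(seq)
    return f"{contig_id}{strand}", oriented

print(complement("AAAAAT"))        # as drawn under the forward strand
print(orient(1, "AAAAAT", "-"))    # the same strand read 5'->3'
```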

slide-23
SLIDE 23
slide-24
SLIDE 24
Capture sequence length

  • One oscillation = 100 nt

slide-25
SLIDE 25

Genome Sequencing

cell population → extracted DNA → sheared DNA → sequencing reads

(typically produce millions)

AGCGGATTGCATGACAGT GTACAGCCTGACAGAAGC CATGACAGTCCGAGTACA TTCAGAATGGTACAGCAG GCGCTACGATCAGATCAA

read, read, dsDNA fragment

(known size)

read pair information

slide-26
SLIDE 26

Capture read pair information

After building the initial single-end (SE) contigs from k-mer sequences, ABySS uses paired-end reads to resolve ambiguities.

slide-27
SLIDE 27

Capture read pair information

Paired-end read information is used to construct paired-end (PE) contigs

blue gradient = paired-end contig

orange = selected single-end contig

… 13+ 44- 46+ 4+ 79+ 70+ …
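
A paired-end contig like the one listed on this slide can be treated as an ordered list of single-end contig IDs, each with a strand. The sketch below parses such a path and shows what the same path looks like when read from the opposite strand. It assumes the second entry on the slide is contig 44 on the minus strand (the sign is garbled in the transcript), and the function names are hypothetical rather than part of ABySS or ABySS-Explorer.

```python
import re

# One path element: a numeric contig ID followed by its strand
_TOKEN = re.compile(r"(\d+)([+-])")

def parse_path(text):
    """Turn '13+ 44- ...' into [(13, '+'), (44, '-'), ...]; other characters are ignored."""
    return [(int(cid), strand) for cid, strand in _TOKEN.findall(text)]

def flip_path(path):
    """Same path seen from the other strand: reverse the order and flip every strand."""
    flip = {"+": "-", "-": "+"}
    return [(cid, flip[strand]) for cid, strand in reversed(path)]

def format_path(path):
    return " ".join(f"{cid}{strand}" for cid, strand in path)

path = parse_path("… 13+ 44- 46+ 4+ 79+ 70+ …")
print(format_path(flip_path(path)))
```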

slide-28
SLIDE 28

ABySS-Explorer

  • Visual representation of:
      • contig adjacency information
      • contig strand
      • contig length
      • paired-end relationships
      • paired-end contigs

  • Implemented using the Java Universal Network/Graph Framework (JUNG)

  • Applied the Kamada-Kawai layout algorithm (JUNG implementation)

  • Uses ABySS files as input (version 1.1.0 and higher)
slide-29
SLIDE 29

http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

slide-30
SLIDE 30

Part 1: Conclusions and Future Work

  • Graph encoding provides an integrated display of genome assemblies and associated meta-data

  • This representation is particularly powerful for revealing high-level genome assembly structure, not readily viewable in any other interactive tool

  • Future work includes:
      • support for other assembly algorithm outputs
      • flexible annotation display
      • integration with existing assembly editing tools
slide-31
SLIDE 31

Outline

  • Genome Assembly Visualization

      – ABySS-Explorer

  • Complement to genome browsing

      – Using clustering and interactive data exploration

slide-32
SLIDE 32

Genome Sequencing

cell population → extracted DNA → sheared DNA → sequencing reads

(typically produce millions)

AGCGGATTGCATGACAGT GTACAGCCTGACAGAAGC CATGACAGTCCGAGTACA TTCAGAATGGTACAGCAG GCGCTACGATCAGATCAA

slide-33
SLIDE 33

Genome Sequencing

cell population → extracted DNA → sheared DNA → sequencing reads

(typically produce millions)

AGCGGATTGCATGACAGT GTACAGCCTGACAGAAGC CATGACAGTCCGAGTACA TTCAGAATGGTACAGCAG GCGCTACGATCAGATCAA

slide-34
SLIDE 34

Genome Sequencing

cell population → extracted DNA → sheared DNA → sequencing reads

(typically produce millions)

AGCGGATTGCATGACAGT GTACAGCCTGACAGAAGC CATGACAGTCCGAGTACA TTCAGAATGGTACAGCAG GCGCTACGATCAGATCAA

Chromatin Immunoprecipitation and Sequencing (ChIP-Seq)

selection

GTACAGCCTGACAGAAGC TTCAGAATGGTACAGCAG

slide-35
SLIDE 35

Align sequences to the genome

Reference Genome: AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA

Aligned reads:
CCGAGTACAGCCTGACAGA
GCATGACAGTCCGAGTAC
AGCGGATTGCATGACAGT
AGCGGATTGCATGACAGT
AGCGGATTGCATGACAGT
TTGCATGACAGTCCGAGT

Plot axes: Genomic coordinate, Read coverage
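
The read coverage track on this slide can be reproduced with a small sketch: place each read on the reference and count how many reads overlap each coordinate. This assumes the slide shows six reads (each sequence appears doubled in the extracted text) and uses exact string matching as a stand-in for a real aligner; the function name is hypothetical.

```python
REFERENCE = "AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA"

READS = [
    "CCGAGTACAGCCTGACAGA",
    "GCATGACAGTCCGAGTAC",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "TTGCATGACAGTCCGAGT",
]

def coverage(reference, reads):
    """Per-base read depth, placing each read at its first exact match."""
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)   # toy "alignment": exact match only
        if start < 0:
            continue                   # unplaced reads contribute nothing
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth

depth = coverage(REFERENCE, READS)
for pos, (base, d) in enumerate(zip(REFERENCE, depth)):
    print(f"{pos:2d} {base} {'#' * d}")
```

The printed bars give a text version of the coverage plot, with genomic coordinate running down the page and depth shown as bar length.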

slide-36
SLIDE 36

Genome browser can reveal local patterns

H3K4me3, H3K36me3, H3K27me3, H3K9me3, H3K9Ac, MRE

slide-37
SLIDE 37

Difficult to get global overview

slide-38
SLIDE 38

Focus on regions of interest

1. For example, transcriptional start sites (TSS +/- 3000 nt)

2. Extract data matrices

H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE

3. Cluster matrices (k-means clustering with Euclidean distance)

Normalization for bin i, sample h:
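
Steps 1 to 3 on this slide can be sketched end to end on toy data: cut a window around each region of interest, summarise it into bins, and cluster the resulting rows with k-means under Euclidean distance. This is a minimal sketch under stated assumptions, not the tool's implementation. The function names, the bin size, and the evenly spaced seeding are hypothetical choices, only one sample is used, and the normalization step is left out because its formula is not present in the transcript.

```python
import math

def binned_window(signal, center, flank, bin_size):
    """Mean signal per bin over [center - flank, center + flank).

    Assumes the window lies inside `signal` and 2 * flank is a multiple of bin_size.
    """
    window = signal[center - flank:center + flank]
    return [sum(window[i:i + bin_size]) / bin_size
            for i in range(0, len(window), bin_size)]

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm with Euclidean distance and naive, evenly spaced seeding."""
    n = len(rows)
    centroids = [list(rows[i * n // k]) for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:
            break                      # assignments stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:                # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy per-base signal with one enriched stretch; real windows would be TSS +/- 3000 nt
signal = [0] * 100
for i in range(20, 30):
    signal[i] = 10

sites = [25, 70, 24, 75]               # hypothetical region centres
matrix = [binned_window(signal, s, flank=10, bin_size=5) for s in sites]
labels, centroids = kmeans(matrix, k=2)
print(labels)
```

With several samples, one plausible layout is to concatenate each sample's bins into a single row per region before clustering, so that regions with similar profiles across all marks end up in the same cluster.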

slide-39
SLIDE 39

Focus on regions of interest

1. For example, transcriptional start sites (TSS +/- 3000 nt)

2. Extract data matrices

H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE

3. Cluster matrices (k-means clustering with Euclidean distance)

Normalization for bin i, sample h:

slide-40
SLIDE 40

Focus on regions of interest

1. For example, transcriptional start sites (TSS +/- 3000 nt)

2. Extract data matrices

H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE

3. Cluster matrices (k-means clustering with Euclidean distance)

Normalization for bin i, sample h:

slide-41
SLIDE 41

Enable interactive exploration

4. Interactive cluster visualization (data from H1 cells)

Figure annotations: cluster size indicator (total n = 15,618); individual TSS; cluster (average values displayed); scroll bar to explore all cluster members

Data tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA

5. Link-out to UCSC genome browser

HOXC12 gene

slide-42
SLIDE 42

Enable interactive exploration

4. Interactive cluster visualization (data from H1 cells)

Figure annotations: cluster size indicator (total n = 15,618); individual TSS; cluster (average values displayed); scroll bar to explore all cluster members

Data tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA

slide-43
SLIDE 43

Enable interactive exploration

4. Interactive cluster visualization (data from H1 cells)

Figure annotations: cluster size indicator (total n = 15,618); individual TSS; cluster (average values displayed); scroll bar to explore all cluster members

Data tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA

5. Link-out to UCSC genome browser

HOXC12 gene

slide-44
SLIDE 44

Part 2: Conclusions and Future Work

  • Clustering reveals patterns that were not obvious using a genome browser.

  • Access to both global and detailed views is valuable

  • Future work includes:
      • search functionality (e.g. by region id)
      • integration with other clustering tools
      • richer analysis functionality (e.g. interactive clustering)

slide-45
SLIDE 45

Acknowledgements

ABySS-Explorer; Lymphoma Project Analyst; Shaun Jackman; İnanç Birol; Jason Chang; Karen Mungall; Lymphoma Genomics Team; Supervisor; Steven Jones; Primary Data Generation; NIH Epigenomics Roadmap; Joe Costello, UCSF; Peggy Farnham, UC Davis; Thea Tlsty, UCSF; Marco Marra; Martin Hirst; Yongjun Zhao; Nina Thiessen; Richard Varhol