British Columbia Cancer Agency Genome Sciences Centre
Vancouver . British Columbia . Canada
Complementing Computation with Visualization in Genomics
March 11, 2010, EBI Interfaces Interest Forum
Cydney Nielsen
Discovery path
Biological Sample, Genomic Data, Scientific Insight
Analysis: Automation, Human Judgment
cell population, extracted DNA, sheared DNA, sequencing reads
Shotgun approach
AGCGGATTGCATGACAGT
GTACAGCCTGACAGAAGC
CATGACAGTCCGAGTACA
TTCAGAATGGTACAGCAG
GCGCTACGATCAGATCAA
Sequencing read set (read length = 7 nt): GGACATC, GGACAGA
Corresponding de Bruijn graph (k = 5 nt): [figure]
ABySS merges unambiguously connected vertices to form contigs
Simpson et al. Genome Res 2009
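The two example reads and k = 5 are enough to sketch the idea in code. This is a minimal illustration, not ABySS's implementation: it assumes vertices are k-mers joined by an edge when they are consecutive in a read, and it ignores reverse complements, sequencing errors and cycles. The function names are made up for this sketch.

```python
from collections import defaultdict

def kmers(read, k):
    """Yield every k-mer of a read, left to right."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]

def build_de_bruijn(reads, k):
    """Vertices are k-mers; an edge joins k-mers that are consecutive in a read."""
    succ, pred, vertices = defaultdict(set), defaultdict(set), set()
    for read in reads:
        ks = list(kmers(read, k))
        vertices.update(ks)
        for a, b in zip(ks, ks[1:]):
            succ[a].add(b)
            pred[b].add(a)
    return vertices, succ, pred

def merge_unambiguous(vertices, succ, pred):
    """Merge chains of unambiguously connected vertices into contigs."""
    def unambiguous(a, b):
        # a has b as its only successor and b has a as its only predecessor
        return len(succ[a]) == 1 and len(pred[b]) == 1

    contigs, visited = [], set()
    for v in sorted(vertices):
        # Skip vertices that will be absorbed into their single predecessor.
        if len(pred[v]) == 1 and unambiguous(next(iter(pred[v])), v):
            continue
        contig, cur = v, v
        visited.add(v)
        while len(succ[cur]) == 1:
            nxt = next(iter(succ[cur]))
            if nxt in visited or not unambiguous(cur, nxt):
                break
            contig += nxt[-1]  # consecutive k-mers overlap by k - 1 bases
            visited.add(nxt)
            cur = nxt
        contigs.append(contig)
    return contigs

reads = ["GGACATC", "GGACAGA"]  # the read set shown on the slide
vertices, succ, pred = build_de_bruijn(reads, k=5)
contigs = merge_unambiguous(vertices, succ, pred)
print(contigs)
```

The shared prefix GGACA has two successors, so it stays a contig of its own and each branch becomes a separate contig.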
True genome sequence
GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG
Assembled sequence
de Bruijn graph representation
Shaun Jackman
[Figure: double-stranded sequence AAAAAT / TTTTTA with vertex labels 1+, 2+ and 1-, 2-]
sequencing reads (typically produce millions)
[Figure: one read from each end of a dsDNA fragment (known size), giving read pair information]
After building the initial single-end (SE) contigs from k-mer sequences, ABySS uses paired-end reads to resolve ambiguities.
Paired-end read information is used to construct paired-end (PE) contigs.
blue gradient = paired-end contig
… 13+ 44- 46+ 4+ 79+ 70+ …
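The path notation above (a contig ID followed by an orientation sign) is easy to handle in code. The sketch below only parses such a path and derives the same path as read from the opposite strand. The helper names and the short example path are hypothetical, and nothing here reflects how ABySS stores paths internally.

```python
def parse_path(text):
    """Parse '13+ 44-' into [(13, '+'), (44, '-')]."""
    path = []
    for token in text.split():
        contig_id, strand = int(token[:-1]), token[-1]
        if strand not in ("+", "-"):
            raise ValueError(f"bad orientation in {token!r}")
        path.append((contig_id, strand))
    return path

def reverse_path(path):
    """Same path seen from the other strand: reverse the order, flip each sign."""
    flip = {"+": "-", "-": "+"}
    return [(contig_id, flip[strand]) for contig_id, strand in reversed(path)]

def format_path(path):
    """Turn [(13, '+'), (44, '-')] back into '13+ 44-'."""
    return " ".join(f"{contig_id}{strand}" for contig_id, strand in path)

# Illustrative path in the slide's notation (not taken from real data).
path = parse_path("13+ 44- 46+ 4+")
print(format_path(reverse_path(path)))
```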
Framework (JUNG)
implementation)
GTACAGCCTGACAGAAGC
TTCAGAATGGTACAGCAG
selection
Reference Genome: AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA
Aligned reads: CCGAGTACAGCCTGACAGA, GCATGACAGTCCGAGTAC, AGCGGATTGCATGACAGT (x3), TTGCATGACAGTCCGAGT
[Figure: read coverage plotted against genomic coordinate]
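Read coverage for the example above can be sketched as a per-position count of overlapping reads. This is a simplified illustration: exact string matching stands in for a real aligner, which would tolerate mismatches and handle both strands, and the function name is an assumption.

```python
def read_coverage(reference, reads):
    """Count, for each reference position, how many reads overlap it."""
    coverage = [0] * len(reference)
    for read in reads:
        start = reference.find(read)  # exact match only (stand-in for alignment)
        if start == -1:
            continue  # read does not align; ignore it
        for pos in range(start, start + len(read)):
            coverage[pos] += 1
    return coverage

reference = "AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA"
reads = [
    "CCGAGTACAGCCTGACAGA",
    "GCATGACAGTCCGAGTAC",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "AGCGGATTGCATGACAGT",
    "TTGCATGACAGTCCGAGT",
]
coverage = read_coverage(reference, reads)
print(coverage)
```

Plotting `coverage` against its index gives the coverage-versus-coordinate view from the slide.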
H3K4me3, H3K36me3, H3K27me3, H3K9me3, H3K9Ac, MRE
1. For example, transcriptional start sites (TSS +/- 3000 nt)
2. Extract data matrices
H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE
3. Cluster matrices (k-means clustering with Euclidean distance)
Normalization for bin i, sample h: [formula not preserved in the extracted slide text]
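Step 3 can be sketched with a small k-means in plain Python. Assumptions: each row is one TSS, formed by concatenating its already-normalized bin values across samples. The toy matrix is invented for illustration, and the initial centroids are simply the first k rows so that the sketch is deterministic. The normalization itself is not reproduced here.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(rows):
    """Column-wise mean of a non-empty list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(rows, k, iterations=100):
    """Return (labels, centroids) after Lloyd-style k-means iterations."""
    centroids = [list(row) for row in rows[:k]]  # deterministic toy initialization
    labels = None
    for _ in range(iterations):
        # Assign each row to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids

# Toy matrix: 6 TSS rows, 3 bins for each of 2 hypothetical samples.
profiles = [
    [0.9, 1.0, 0.8, 0.1, 0.0, 0.1],
    [1.0, 0.9, 0.9, 0.0, 0.1, 0.0],
    [0.8, 0.9, 1.0, 0.1, 0.1, 0.0],
    [0.1, 0.0, 0.1, 0.9, 1.0, 0.8],
    [0.0, 0.1, 0.0, 1.0, 0.9, 0.9],
    [0.1, 0.1, 0.0, 0.8, 1.0, 1.0],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Each centroid is the average profile of its cluster, which is the kind of per-cluster summary an interactive view can display alongside the cluster size.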
4. Interactive cluster visualization (data from H1 cells)
Tracks: H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE, mRNA
Figure labels: cluster size indicator (total n = 15,618); individual TSS; cluster (average values displayed); scroll bar to explore all cluster members
HOXC12 gene
5. Link-out to UCSC genome browser
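A link-out like the one in step 5 could be built from a TSS position as sketched below. The `hgTracks` URL with `db` and `position` parameters is the UCSC browser's standard entry point. The function name, the `hg19` assembly and the coordinates are placeholders, not values from the talk.

```python
from urllib.parse import urlencode

def ucsc_link(chrom, tss, flank=3000, db="hg19"):
    """Build a UCSC genome browser URL for the window TSS +/- flank."""
    start = max(1, tss - flank)  # keep the window on the chromosome
    end = tss + flank
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{end}"})
    return f"https://genome.ucsc.edu/cgi-bin/hgTracks?{query}"

# Hypothetical TSS coordinate, used only to show the resulting URL shape.
link = ucsc_link("chr12", 1_000_000)
print(link)
```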
ABySS-Explorer, Lymphoma Project Analyst
Shaun Jackman, İnanç Birol, Jason Chang, Karen Mungall, Lymphoma Genomics Team
Supervisor
Steven Jones
Primary Data Generation, NIH Epigenomics Roadmap
Joe Costello (UCSF), Peggy Farnham (UC Davis), Thea Tlsty (UCSF), Marco Marra, Martin Hirst, Yongjun Zhao, Nina Thiessen, Richard Varhol