Complementing Computation with Visualization in Genomics

  1. British Columbia Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, Canada. Complementing Computation with Visualization in Genomics. March 11, 2010. EBI Interfaces Interest Forum. Cydney Nielsen.

  2. Discovery path: Biological Sample, Genomic Data, Scientific Insight

  4. Components of Data Analysis: Automation, Analysis, Genomic Data, Scientific Insight, Human Judgment

  5. Outline • Genome Assembly Visualization – ABySS-Explorer • Complement to genome browsing – Using clustering and interactive data exploration

  7. Genome Sequencing: cell population, extracted DNA, shotgun approach, sheared DNA, sequencing reads: AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG

  8. ABySS – Assembly By Short Sequences (Simpson et al., Genome Res 2009). Sequencing read set (read length = 7 nt): GGACATC, GGACAGA. Corresponding de Bruijn graph (k = 5 nt): GGACA, branching to TC and GA.
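
The read-to-graph step on this slide can be sketched in Python. This is a minimal illustration, assuming single-stranded k-mers (ABySS also pairs each k-mer with its reverse complement, which is omitted here): every k-mer of every read becomes a vertex, and an edge joins two k-mers when the last k-1 bases of one equal the first k-1 bases of the other. The function names are illustrative, not ABySS's own.

```python
def kmers_from_reads(reads, k):
    """Collect every k-mer (length-k substring) of every read."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def successors(kmer, kmers):
    """Vertices reachable by shifting one base: kmer[1:] + A/C/G/T, if present."""
    return {kmer[1:] + base for base in "ACGT" if kmer[1:] + base in kmers}


reads = ["GGACATC", "GGACAGA"]  # the slide's two 7-nt reads
graph_kmers = kmers_from_reads(reads, k=5)
for km in sorted(graph_kmers):
    print(km, "->", sorted(successors(km, graph_kmers)))
# The last line printed is: GGACA -> ['GACAG', 'GACAT']  (the branch)
```

Checking only the four possible one-base extensions of each vertex keeps the graph implicit: no edge list is stored, only the k-mer set.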

  9. ABySS – Assembly By Short Sequences (Simpson et al., Genome Res 2009). Sequencing read set (read length = 7 nt): GGACATC, GGACAGA. Corresponding de Bruijn graph (k = 5 nt): GGACA, branching to TC and GA. ABySS merges unambiguously connected vertices to form contigs.
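
Merging unambiguously connected vertices amounts to compacting non-branching paths into contigs (often called unitigs). The sketch below is single-stranded and illustrative rather than ABySS's code: it starts a contig at any k-mer whose predecessor is missing, multiple, or branching, then extends it while the next vertex has exactly one successor and one predecessor. Purely circular components with no branch point are not emitted by this simplified version.

```python
def unitigs(kmers):
    """Compact non-branching paths of a single-stranded k-mer graph into contigs."""

    def succ(km):
        return [km[1:] + b for b in "ACGT" if km[1:] + b in kmers]

    def pred(km):
        return [b + km[:-1] for b in "ACGT" if b + km[:-1] in kmers]

    def starts_contig(km):
        p = pred(km)
        # A contig starts unless there is exactly one predecessor that does not branch.
        return len(p) != 1 or len(succ(p[0])) != 1

    contigs = []
    for km in sorted(kmers):
        if not starts_contig(km):
            continue
        seq, cur = km, km
        while True:
            nxt = succ(cur)
            if len(nxt) != 1 or len(pred(nxt[0])) != 1:
                break  # branch or merge point: the contig ends here
            cur = nxt[0]
            seq += cur[-1]  # consecutive k-mers share k-1 bases
        contigs.append(seq)
    return contigs


reads = ["GGACATC", "GGACAGA"]
kmers = {r[i:i + 5] for r in reads for i in range(len(r) - 4)}
print(sorted(unitigs(kmers)))  # ['GACAGA', 'GACATC', 'GGACA']
```

The shared prefix GGACA becomes its own contig because it branches; each branch becomes a separate contig.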

  10. Assembly Ambiguities. True genome sequence: GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG

  11. Assembly Ambiguities. True genome sequence: GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG. Assembled sequence; de Bruijn graph representation.
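
Why this sequence is hard to assemble can be checked directly. A run of A's longer than k produces the k-mer AAAAA many times, and in the graph that vertex gets a self-loop plus two exits (into the GTAG... stretch and into the TTACG end), so the graph alone cannot say which copy of the repeat leads where. A small Python sketch on the slide's true genome sequence; k = 5 is an illustrative choice carried over from the earlier example.

```python
from collections import Counter, defaultdict

TRUE_GENOME = "GGATTGAAAAAAAAAAAAAAAAGTAGCACGAATATACATAGAAAAAAAAAAAAAAAAATTACG"


def kmer_counts(seq, k):
    """How many times each k-mer occurs in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def successor_map(seq, k):
    """For each k-mer, the set of k-mers that follow it somewhere in seq."""
    succ = defaultdict(set)
    for i in range(len(seq) - k):
        succ[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return succ


counts = kmer_counts(TRUE_GENOME, 5)
print("repeated 5-mers:", sorted(km for km, n in counts.items() if n > 1))
print("AAAAA ->", sorted(successor_map(TRUE_GENOME, 5)["AAAAA"]))
# AAAAA -> ['AAAAA', 'AAAAG', 'AAAAT']
```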

  12. Starting Point: Shaun Jackman

  13. Example of existing tools: Consed

  15. Properties of DNA

  16. Capture sequence strand: AAAAAT 2+ 1+

  17. Capture sequence strand: AAAAAT 2+ 1+ TTTTTA 2- 1-

  18. Capture sequence strand: AAAAAT 1+ 2+ TTTTTA

  19. Capture sequence strand: AAAAAT 1- 2- TTTTTA
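
The strand encoding rests on one property of double-stranded DNA: reading a path of contigs on the opposite strand reverses both the order and the orientation, so 1+ followed by 2+ spells the reverse complement of 2- followed by 1-. A minimal Python check, with hypothetical contig sequences and, for simplicity, plain concatenation that ignores any overlap between adjacent contigs.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement: the same DNA read 5'->3' on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def spell(path, contigs):
    """Sequence of a path like ["1+", "2-"]; '-' means use the reverse complement.

    Simplification: contigs are concatenated with no overlap.
    """
    parts = []
    for token in path:
        seq = contigs[token[:-1]]
        parts.append(seq if token[-1] == "+" else revcomp(seq))
    return "".join(parts)


contigs = {"1": "AAAAAT", "2": "GGC"}  # hypothetical sequences
forward = spell(["1+", "2+"], contigs)
reverse = spell(["2-", "1-"], contigs)
print(forward, reverse, revcomp(reverse) == forward)
# AAAAATGGC GCCATTTTT True
```

Because either reading describes the same molecule, a viewer only needs one drawing per contig, annotated with the strand it is read on.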

  20. Capture sequence length: one oscillation = 100 nt
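
The length encoding is simple arithmetic: a contig of L nt is drawn with L / 100 oscillations. One way to render that, assuming a sine wave laid along the edge between two vertex positions; the amplitude, the sampling density, and the function names are illustrative choices, not ABySS-Explorer's.

```python
import math


def oscillation_count(length_nt, nt_per_oscillation=100):
    """Number of wave periods used to draw a contig of the given length."""
    return length_nt / nt_per_oscillation


def wave_edge(length_nt, start, end, amplitude=3.0, samples_per_oscillation=16):
    """Points of a sine wave from start to end with one period per 100 nt.

    start and end are (x, y) tuples and must differ.
    """
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    span = math.hypot(dx, dy)
    px, py = -dy / span, dx / span  # unit vector perpendicular to the edge
    periods = oscillation_count(length_nt)
    n = max(2, math.ceil(periods * samples_per_oscillation) + 1)
    points = []
    for i in range(n):
        t = i / (n - 1)  # fraction of the way along the edge
        offset = amplitude * math.sin(2 * math.pi * periods * t)
        points.append((x0 + t * dx + offset * px, y0 + t * dy + offset * py))
    return points


pts = wave_edge(500, (0, 0), (10, 0))  # a 500-nt contig: 5 oscillations
print(len(pts), pts[0], pts[-1])
```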

  21. Genome Sequencing: cell population, extracted DNA, sheared DNA, dsDNA fragment (known size), read pair information (read, read), sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG

  22. Capture read pair information: After building the initial single-end (SE) contigs from k-mer sequences, ABySS uses paired-end reads to resolve ambiguities.
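
The reasoning behind using pairs is that the fragment size is known: if one read lies on contig A and its mate on contig B, the unsequenced stretch between the contigs is whatever remains of the fragment. A simplified Python sketch, assuming both contigs are already oriented so that A precedes B, 0-based offsets, and a single nominal fragment size. Taking the median over pairs is our own choice here; a real tool would also account for the spread of fragment sizes in the library.

```python
from statistics import median


def estimate_gap(pairs, len_a, fragment_size):
    """Estimate the unsequenced gap between contig A and contig B.

    Each pair is (start_a, end_b):
      start_a: 0-based start of the read on contig A
      end_b:   0-based exclusive end of its mate on contig B
    The fragment covers A from start_a to A's end, then the gap,
    then B from its start up to end_b, so
      fragment_size = (len_a - start_a) + gap + end_b.
    A negative result suggests the contigs overlap.
    """
    gaps = [fragment_size - (len_a - start_a) - end_b for start_a, end_b in pairs]
    return median(gaps)


# Hypothetical placements for a 500-nt contig A and a ~400-nt library.
print(estimate_gap([(300, 150), (320, 170), (280, 140)], len_a=500, fragment_size=400))
# 50
```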

  23. Capture read pair information: Paired-end read information is used to construct paired-end (PE) contigs: … 13+ 44- 46+ 4+ 79+ 70+ … (blue gradient = paired-end contig; orange = selected single-end contig)
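
A paired-end contig is then a chain of oriented single-end contigs, such as 4+ 79+ 70+. Turning such a chain into sequence means reverse-complementing the '-' members and joining neighbours on their shared k-1 bases. A Python sketch under those assumptions; the toy contig IDs and sequences are hypothetical, and gaps between contigs (which would need N padding) are not handled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def oriented(token, contigs):
    """'12+' -> sequence of contig 12; '12-' -> its reverse complement."""
    seq = contigs[token[:-1]]
    return seq if token[-1] == "+" else revcomp(seq)


def merge_path(path, contigs, k):
    """Join oriented contigs that overlap by exactly k-1 bases (k >= 2)."""
    overlap = k - 1
    merged = oriented(path[0], contigs)
    for token in path[1:]:
        nxt = oriented(token, contigs)
        if merged[-overlap:] != nxt[:overlap]:
            raise ValueError(f"{token} does not overlap the previous contig by {overlap} bases")
        merged += nxt[overlap:]  # keep the shared bases only once
    return merged


toy = {"1": "ACGTA", "2": "GCCTA", "3": "GCTT"}  # hypothetical, k = 3
print(merge_path(["1+", "2-", "3+"], toy, k=3))  # ACGTAGGCTT
```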

  24. ABySS-Explorer • Visual representation of: • contig adjacency information • contig strand • contig length • paired-end relationships • paired-end contigs • Implemented using the Java Universal Network/Graph Framework (JUNG) • Applied the Kamada-Kawai layout algorithm (JUNG implementation) • Uses ABySS files as input (version 1.1.0 and higher)
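
The idea behind the Kamada-Kawai layout: treat every pair of vertices as joined by a spring whose rest length is proportional to their graph (hop) distance d and whose stiffness is 1/d², then minimize the total spring energy. The Python sketch below minimizes that energy with plain gradient descent over all vertices at once. The original Kamada-Kawai algorithm instead moves one vertex at a time with Newton-Raphson steps, so treat this as an illustration of the objective rather than a port of JUNG's code. It assumes a connected, unweighted graph given as an adjacency dict.

```python
import math
import random
from collections import deque


def hop_distances(adj):
    """Breadth-first-search hop counts between all vertex pairs (graph must be connected)."""
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist


def spring_energy(pos, dist, unit=1.0):
    """Sum over vertex pairs of (|p_u - p_v| - unit*d)^2 / (2 d^2)."""
    nodes = list(pos)
    energy = 0.0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d = dist[u][v]
            stretch = math.dist(pos[u], pos[v]) - unit * d
            energy += 0.5 * stretch * stretch / (d * d)
    return energy


def kk_layout(adj, iterations=500, step=0.05, unit=1.0, seed=0):
    """Gradient descent on the spring energy from a seeded random start."""
    rng = random.Random(seed)
    dist = hop_distances(adj)
    nodes = list(adj)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        grad = {n: [0.0, 0.0] for n in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                d = dist[u][v]
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                r = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
                c = (r - unit * d) / (d * d * r)  # dE/dp_u = c * (p_u - p_v)
                grad[u][0] += c * dx
                grad[u][1] += c * dy
                grad[v][0] -= c * dx
                grad[v][1] -= c * dy
        for n in nodes:
            pos[n][0] -= step * grad[n][0]
            pos[n][1] -= step * grad[n][1]
    return pos, dist


chain = {"13": ["44"], "44": ["13", "46"], "46": ["44", "4"], "4": ["46"]}  # toy graph
positions, _ = kk_layout(chain)
print({n: (round(x, 2), round(y, 2)) for n, (x, y) in positions.items()})
```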

  25. http://www.bcgsc.ca/platform/bioinfo/software/abyss-explorer

  26. Part 1: Conclusions and Future Work • Graph encoding provides an integrated display of genome assemblies and associated metadata • This representation is particularly powerful for revealing high-level genome assembly structure, not readily viewable in any other interactive tool • Future work includes: • support for other assembly algorithm outputs • enable flexible annotation display • integrate with existing assembly editing tools

  27. Outline • Genome Assembly Visualization – ABySS-Explorer • Complement to genome browsing – Using clustering and interactive data exploration

  28. Genome Sequencing: cell population, extracted DNA, sheared DNA, sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG

  30. Genome Sequencing: Chromatin Immunoprecipitation and Sequencing (ChIP-Seq): cell population, extracted DNA, selection, sheared DNA, sequencing reads (typically produce millions): AGCGGATTGCATGACAGT, GCGCTACGATCAGATCAA, GTACAGCCTGACAGAAGC, GTACAGCCTGACAGAAGC, CATGACAGTCCGAGTACA, TTCAGAATGGTACAGCAG, TTCAGAATGGTACAGCAG

  31. Align sequences to the genome: reads such as AGCGGATTGCATGACAGT, TTGCATGACAGTCCGAGT, GCATGACAGTCCGAGTAC and CCGAGTACAGCCTGACAGA are aligned to the reference genome AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA; read coverage is plotted against genomic coordinate.
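
Coverage is a running count of how many aligned reads span each reference position. A Python sketch on the four distinct reads shown, using exact string matching as a stand-in for a real short-read aligner (an assumption that only works for error-free toy reads like these), with a difference array so each read is recorded in constant time.

```python
REFERENCE = "AGCGGATTGCATGACAGTCCGAGTACAGCCTGACAGA"
READS = ["AGCGGATTGCATGACAGT", "TTGCATGACAGTCCGAGT",
         "GCATGACAGTCCGAGTAC", "CCGAGTACAGCCTGACAGA"]


def read_coverage(reads, reference):
    """Per-position count of reads that align (exactly) over each reference base."""
    diff = [0] * (len(reference) + 1)
    for read in reads:
        start = reference.find(read)  # toy aligner: first exact match only
        if start < 0:
            continue  # unaligned read
        diff[start] += 1
        diff[start + len(read)] -= 1
    coverage, running = [], 0
    for delta in diff[:-1]:
        running += delta
        coverage.append(running)
    return coverage


print(read_coverage(READS, REFERENCE))
```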

  32. Genome browser can reveal local patterns: H3K4me3, H3K36me3, H3K27me3, H3K9me3, H3K9Ac, MRE

  33. Difficult to get a global overview

  34. Focus on regions of interest: 1. For example, transcriptional start sites (TSS ± 3000 nt), with tracks H3K4me3, H3K9Ac, H3K4me1, H3K36me3, MeDIP, MRE. 2. Extract data matrices, with a normalization for bin i, sample h. 3. Cluster matrices (k-means clustering with Euclidean distance).
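
Step 2 can be sketched as binning each track in a window around every TSS and concatenating the tracks into one row per TSS. The slide's normalization formula for bin i, sample h is not reproduced here, so this sketch uses raw mean coverage per bin. The 500-nt bin size, the strand flip for minus-strand genes (so upstream is always on the left), and the function names are assumptions for illustration.

```python
def tss_profile(coverage, tss, strand, flank=3000, bin_size=500):
    """Mean per-base coverage in consecutive bins over [tss - flank, tss + flank)."""
    profile = []
    for start in range(tss - flank, tss + flank, bin_size):
        # Clamp at 0 so a window near a chromosome start never wraps around.
        window = coverage[max(start, 0):max(start + bin_size, 0)]
        profile.append(sum(window) / bin_size)  # bases past either end count as 0
    if strand == "-":
        profile.reverse()  # orient so that upstream comes first
    return profile


def build_matrix(tracks, tss_list, flank=3000, bin_size=500):
    """One row per (tss, strand): every track's profile, concatenated in name order."""
    names = sorted(tracks)
    return [
        sum((tss_profile(tracks[name], tss, strand, flank, bin_size) for name in names), [])
        for tss, strand in tss_list
    ]


step = [0] * 5000 + [1] * 5000  # toy track: signal only downstream of position 5000
print(tss_profile(step, 5000, "+"))
print(tss_profile(step, 5000, "-"))
```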

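
Step 3, k-means with Euclidean distance, alternates two moves until the assignment stops changing: assign each row to its nearest centroid, then move each centroid to the mean of its rows. A plain-Python version of that standard (Lloyd's) iteration; the random initialization from k input rows and the iteration cap are illustrative choices, and the slides do not state which k was used.

```python
import math
import random


def kmeans(rows, k, max_iterations=100, seed=0):
    """Lloyd's k-means with Euclidean distance; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, k)]  # k distinct input rows
    labels = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(row, centroids[c]))
                      for row in rows]
        if new_labels == labels:
            break  # converged: no row changed cluster
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


toy_rows = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(toy_rows, k=2)
print(labels, centroids)
```

In practice the rows would be the per-TSS matrix rows from step 2, and because the result depends on the starting centroids, rerunning with several seeds is a common safeguard.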

  37. Enable interactive exploration. 4. Interactive cluster visualization (data from H1 cells): cluster size indicator (total n = 15,618); tracks H3K4me3, H3K9Ac, H3K4me1, H3K36me3, H3K27me3, H3K9me3, MeDIP, MRE and mRNA shown per cluster (average values displayed) and per individual TSS (e.g. the HOXC12 gene); scroll bar to explore all cluster members. 5. Link-out to UCSC genome browser.

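
The cluster view shows each cluster's size and average values, and step 5 links an individual TSS out to the UCSC browser. Both can be sketched in Python. The link uses the browser's hgTracks page with its db and position query parameters; the assembly name, chromosome, and coordinates in the usage line are placeholders, not the ones used in the talk, and the function names are hypothetical.

```python
from urllib.parse import urlencode


def cluster_summary(matrix, labels):
    """Map each cluster label to (member count, column-wise average row)."""
    groups = {}
    for row, label in zip(matrix, labels):
        groups.setdefault(label, []).append(row)
    return {
        label: (len(rows), [sum(col) / len(rows) for col in zip(*rows)])
        for label, rows in groups.items()
    }


def ucsc_link(db, chrom, tss, flank=3000):
    """UCSC Genome Browser URL showing tss +/- flank (start clamped to 1)."""
    start = max(tss - flank, 1)
    query = urlencode({"db": db, "position": f"{chrom}:{start}-{tss + flank}"})
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query


print(cluster_summary([[1, 2], [3, 4], [10, 10]], [0, 0, 1]))
print(ucsc_link("hg19", "chr1", 10000))  # placeholder assembly and coordinates
```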

  40. Part 2: Conclusions and Future Work • Clustering reveals patterns that were not obvious using a genome browser. • Access to both global and detailed views is valuable • Future work includes: • search functionality (e.g. by region id) • integration with other clustering tools • richer analysis functionality (e.g. interactive clustering)

  41. Acknowledgements NIH Epigenomics Roadmap ABySS-Explorer Joe Costello, UCSF Shaun Jackman Peggy Farnham, UC Davis İnanç Birol Thea Tlsty, UCSF Jason Chang Marco Marra Martin Hirst Lymphoma Project Analyst Yongjun Zhao Karen Mungall Nina Thiessen Richard Varhol Supervisor Primary Data Generation Steven Jones Lymphoma Genomics Team
