
R / Bioconductor for Sequence Analysis: Martin Morgan, June 20-23, 2011 (PowerPoint presentation)



  1. R / Bioconductor for Sequence Analysis. Martin Morgan (mtmorgan@fhcrc.org), June 20-23, 2011

  2. Bioconductor
      Goal: help biologists understand their data
      ◮ Expression and other microarrays
      Focus:
      ◮ Sequence analysis
      ◮ Imaging, flow cytometry, ...
      ◮ Based on the R programming language; themes: statistics, visualization, interoperability
      ◮ Reproducible: scripts, vignettes, packages
      ◮ Open source / open development
      ◮ Contributions from 'core' members and the (primarily academic) user community
      Status: more than 460 packages; very active web site and mailing list; annual conferences; courses; ...

  3. Using R / Bioconductor
      ◮ Programming language
          > library(GEOquery)
          > eset = getGEO('...')
      ◮ Scripts, vignettes, packages
      ◮ Appeal
      Flexibility. Leveraging resources, e.g., SQL, XML, third-party libraries (e.g., samtools). R statistical methods and visualization.

  4. Using R / Bioconductor
      ◮ Programming language
          > library(GEOquery)
          > eset = getGEO('...')
      ◮ Scripts, vignettes, packages
      ◮ Appeal
      (1) Reproducibility; (2) communication; (3) enabling.

  5. Using R / Bioconductor
      ◮ Programming language
          > library(GEOquery)
          > eset = getGEO('...')
      ◮ Scripts, vignettes, packages
      ◮ Appeal: statisticians, bioinformaticists, ... but not everyone!

  6. A Package Tour
      Bioconductor
      ◮ Expression and other microarrays
      ◮ Sequence analysis
      ◮ Annotation and archive resources
      ◮ Additional
      All of CRAN
      Expression and other microarrays: pre-processing; quality assessment; differential expression (e.g., limma); gene set enrichment. Many features for free, e.g., machine learning, visualization.

  7. A Package Tour (expression and other microarrays)
      ◮ Array CGH (e.g., DNAcopy)
      ◮ Methylation, epigenetics, miRNA
      ◮ Genotyping (e.g., snpStats)

  8. A Package Tour (sequence analysis)
      ◮ I/O, QA, manipulation
      ◮ RNAseq differential representation (e.g., DESeq)
      ◮ Gene set analysis (e.g., goseq)
      ◮ ChIPseq
      ◮ Metabiome

  9. A Package Tour (sequence analysis example)
      [Figure: 50 ovarian cancer, 13 benign / normal RNAseq samples]

  10. A Package Tour (sequence analysis example)
      [Figure: differential representation in SOC vs. control]

  11. A Package Tour (sequence analysis example)
      KEGG terms under-represented in SOC:
            Description   P value
          1 Spliceosome    0.0017
          3 Ribosome       0.0073
          5 Cell cycle     0.0123
            ...
      Investigate intron abundances.

  12. A Package Tour (annotation and archive resources)
      Curated, versioned (semi-annual):
      ◮ Chip
      ◮ Organism
      ◮ Pathway
      ◮ Homology
      ◮ miRNA
      Also: biomaRt, UCSC; GEO, ArrayExpress, SRA

  13. A Package Tour (annotation and archive resources). Examples:
      ◮ Identify human genes in the 'spliceosome', 'ribosome', and 'cell cycle' KEGG pathways.
      ◮ Discover and retrieve GEO expression arrays related to ovarian carcinomas.
      ◮ Remotely query 1000 Genomes BAM files for regions of interest, e.g., 'spliceosome' genes.
      ◮ Input TCGA ovarian cancer copy number and clinical data.

  14. A Package Tour (annotation and archive example)
      [Figure: 86 paired HMS HG-CGH-244A TCGA samples]

  15. A Package Tour (additional)
      ◮ Pathways and networks
      ◮ Flow cytometry
      ◮ High-throughput qPCR
      ◮ Image processing (e.g., EBImage)

  16. A Package Tour (all of CRAN)
      ◮ 3000+ packages
      ◮ Novel approaches, e.g., cghFLasso
      ◮ Advanced statistical analyses, e.g., Bayesian network models

  17. Common work flows
      Input / output
      ◮ FASTA, FASTQ: ShortRead
      ◮ SAM / BAM, tabix, indexed FASTA: Rsamtools
      ◮ Genome tracks and related formats: rtracklayer
      Pre-processing / manipulation / counting and measuring
      ◮ String manipulation, pattern matching: Biostrings
      ◮ Quality assessment: ShortRead
      ◮ Finding / counting overlaps: GenomicRanges
      Analysis domains
      ◮ RNAseq, e.g., DESeq, edgeR, goseq
      ◮ ChIPseq, e.g., ChIPpeakAnno
      Annotation / variants
      ◮ AnnotationDbi / org.*, GenomicFeatures, BSgenome, biomaRt

  18. Useful data structures
      Biostrings, BSgenome
      ◮ XString, XStringSet
      GenomicRanges
      ◮ GappedAlignments (CIGAR)
      ◮ GRanges / GRangesList (sequence, strand)
      IRanges
      ◮ IRanges / IRangesList / RangedData (ranges)
      ◮ Rle (run-length encoding)
      ◮ Views

  19. Effective computational biology software
      1. Extensive: data, annotation
      2. Statistical: volume, technology, experimental design
      3. Reproducible: long-term, multi-participant science
      4. Current: novel, technology-driven
      5. Accessible: affordable, transparent, usable

  20. Bioconductor
      Who
      ◮ FHCRC: Hervé Pagès, Marc Carlson, Nishant Gopalakrishnan, Valerie Obenchain, Dan Tenenbaum, Chao-Jen Wong
      ◮ Robert Gentleman (Genentech), Vince Carey (Harvard / Brigham & Women's), Rafael Irizarry (Johns Hopkins), Wolfgang Huber (EMBL, Heidelberg)
      ◮ A large number of contributors, world-wide
      Resources
      ◮ http://bioconductor.org: installation, packages, work flows, courses, events
      ◮ Mailing list: friendly, prompt help
      ◮ Conference: morning talks, afternoon workshops, evening social. 28-29 July, Seattle, WA; Developer Day July 27
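
Slide 6 lists differential expression (e.g., limma). As a rough, base-R-only sketch of the idea, the code below runs an ordinary Welch t-test per gene on a simulated log-expression matrix and adjusts the P values with Benjamini-Hochberg. This is a stand-in, not limma: limma fits linear models and moderates the per-gene variances with an empirical Bayes step, which this sketch does not do. The data, group labels, and the helper `row_t_test` are hypothetical.

```r
# Base-R sketch: per-gene two-sample Welch t-tests on a log-expression matrix.
# NOTE: limma moderates per-gene variances (empirical Bayes); this does not.
set.seed(123)
n_genes <- 100
group <- factor(rep(c("control", "treated"), each = 4))

# Hypothetical data: 100 genes x 8 samples of log2 expression around 8
expr <- matrix(rnorm(n_genes * 8, mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Hypothetical signal: shift the first 5 genes up by 2 log2 units in "treated"
expr[1:5, group == "treated"] <- expr[1:5, group == "treated"] + 2

# Compare the second factor level ("treated") against the first ("control")
row_t_test <- function(x, group) {
  res <- t.test(x[group == levels(group)[2]], x[group == levels(group)[1]])
  c(logFC = unname(res$estimate[1] - res$estimate[2]), p.value = res$p.value)
}

tab <- t(apply(expr, 1, row_t_test, group = group))   # genes x (logFC, p.value)
tab <- cbind(tab, adj.p = p.adjust(tab[, "p.value"], method = "BH"))
head(tab[order(tab[, "p.value"]), ], 5)               # top-ranked genes
```

With only four samples per group, ordinary t-tests have unstable variance estimates, which is the problem limma's moderation addresses.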
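
Slide 8 names DESeq for RNAseq differential representation. Before counts are compared across samples, DESeq-style analyses correct for sequencing depth with per-sample size factors. The sketch below implements the median-of-ratios estimator in base R; the function name `estimate_size_factors` and the toy count matrix are hypothetical, and DESeq provides its own `estimateSizeFactors()` for real analyses.

```r
# Median-of-ratios size factors: for each sample, the median over genes of
# (count / that gene's geometric mean across samples).
estimate_size_factors <- function(counts) {
  log_counts <- log(counts)                # log(0) is -Inf
  log_geo_means <- rowMeans(log_counts)    # -Inf if any sample has a zero count
  use <- is.finite(log_geo_means)          # skip genes with any zero count
  apply(log_counts[use, , drop = FALSE], 2,
        function(lc) exp(median(lc - log_geo_means[use])))
}

# Hypothetical counts: s2 is sequenced twice as deeply as s1, s3 1.5 times
counts <- matrix(c(10, 20, 30, 0,
                   20, 40, 60, 5,
                   15, 30, 45, 2),
                 ncol = 3,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

sf <- estimate_size_factors(counts)        # gene4 (zero in s1) is ignored
normalized <- sweep(counts, 2, sf, "/")    # divide each column by its factor
sf
```

Using the median rather than total counts keeps a few very highly expressed, truly changing genes from dominating the depth estimate.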
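
Slide 11 reports KEGG categories under-represented in SOC. The uncorrected form of such a calculation is a one-sided hypergeometric test on the number of differentially expressed (DE) genes falling in a category. goseq (named on slide 8) additionally corrects for gene-length bias using a weighted (Wallenius) hypergeometric distribution; this sketch omits that correction. The helper `under_rep_p` and all numbers below are hypothetical and are not the values on the slide.

```r
# One-sided hypergeometric test for under-representation of one category.
#   k          : DE genes observed in the category
#   m          : genes in the category (within the tested universe)
#   n_de       : total DE genes
#   n_universe : total genes tested
under_rep_p <- function(k, m, n_de, n_universe) {
  # P(X <= k) with X ~ Hypergeometric(m category genes,
  #                                   n_universe - m other genes, n_de draws)
  phyper(k, m, n_universe - m, n_de, lower.tail = TRUE)
}

# Hypothetical numbers: 5 DE genes expected in the category, only 2 observed
expected <- 100 * 500 / 10000
p_under <- under_rep_p(k = 2, m = 100, n_de = 500, n_universe = 10000)
p_under
```

For over-representation the same call would use `phyper(k - 1, ..., lower.tail = FALSE)`, i.e. P(X >= k).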
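
Slide 17 points to ShortRead for FASTQ input and quality assessment. To show what that involves, here is a minimal base-R sketch that parses four-line FASTQ records and decodes quality strings into a per-read mean quality and an N count. It assumes unwrapped records and Phred+33 ("Sanger") encoding; `parse_fastq`, `phred33`, and the two reads are hypothetical. For real files, ShortRead's `readFastq()` is the tool the slide refers to.

```r
# Minimal FASTQ parser for well-formed, unwrapped 4-line records.
parse_fastq <- function(lines) {
  stopifnot(length(lines) %% 4 == 0)
  idx <- seq(1, length(lines), by = 4)
  stopifnot(all(startsWith(lines[idx], "@")),
            all(startsWith(lines[idx + 2], "+")))
  data.frame(id = substring(lines[idx], 2),      # drop the leading "@"
             sread = lines[idx + 1],
             quality = lines[idx + 3],
             stringsAsFactors = FALSE)
}

# Phred+33 encoding: quality score = ASCII code - 33 ("I" -> 40, "#" -> 2)
phred33 <- function(q) utf8ToInt(q) - 33L

# Hypothetical reads
fq <- c("@read1", "ACGTACGT", "+", "IIIIIIII",
        "@read2", "ACGTNNNN", "+", "IIII####")

reads <- parse_fastq(fq)
reads$mean_quality <- vapply(reads$quality, function(q) mean(phred33(q)),
                             numeric(1), USE.NAMES = FALSE)
reads$n_count <- nchar(gsub("[^N]", "", reads$sread))   # number of N calls
reads
```

Older Illumina files used Phred+64, so the offset is an assumption to check before trusting any quality summary.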
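
Slide 18 lists IRanges-style ranges and Rle (run-length encoding), and slide 17 lists finding and counting overlaps. A base-R sketch of both ideas on a single chromosome follows: counting the reads that overlap each feature, then computing per-base coverage and compressing it with base R's `rle()`. The helpers and intervals are hypothetical; for real data IRanges / GenomicRanges provide `countOverlaps()` and `coverage()`, the latter returning an `Rle`.

```r
# Closed, 1-based integer intervals [start, end] (the IRanges convention).
# Hypothetical reads and features on one chromosome of length 15.
reads <- data.frame(start = c(1, 3, 3, 8), end = c(5, 6, 4, 10))
features <- data.frame(start = c(2, 7, 12), end = c(4, 9, 15))
chrom_length <- 15

# For each query interval, count subject intervals sharing at least one base.
count_overlaps <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    sum(subject$start <= query$end[i] & subject$end >= query$start[i])
  }, integer(1))
}

# Per-base coverage via a difference array, then run-length encode it.
coverage_rle <- function(ranges, len) {
  delta <- numeric(len + 1)
  for (i in seq_len(nrow(ranges))) {
    delta[ranges$start[i]] <- delta[ranges$start[i]] + 1    # interval opens
    delta[ranges$end[i] + 1] <- delta[ranges$end[i] + 1] - 1  # closes after end
  }
  rle(cumsum(delta)[seq_len(len)])
}

n_hits <- count_overlaps(features, reads)   # reads overlapping each feature
cvg <- coverage_rle(reads, chrom_length)
n_hits
cvg
```

Coverage along a chromosome is mostly long runs of equal values, which is why run-length encoding stores it compactly. The loop-based overlap count is quadratic; the Bioconductor implementations use interval trees or sorted searches instead.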
