The Bioconductor Project: Paula Andrea Martinez, PhD, Data Scientist



SLIDE 1

DataCamp Introduction to Bioconductor

The Bioconductor Project

INTRODUCTION TO BIOCONDUCTOR

Paula Andrea Martinez, PhD.

Data Scientist

SLIDE 2

Bioconductor

[1] Bioconductor (www.bioconductor.org)

SLIDE 3

What do we measure and why?

Structure: elements, regions, size, order, relationships
Function: expression, levels, regulation, phenotypes

SLIDE 4

How to install Bioconductor packages?

Bioconductor has its own repository and its own way to install packages, and each release is designed to work with a specific version of R. For this course, you'll be using Bioconductor version 3.6.

Bioconductor version 3.7 or earlier uses biocLite:

source("https://bioconductor.org/biocLite.R")
biocLite("packageName")

Bioconductor version 3.8 and later uses BiocManager:

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install()
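The version rule above can be sketched as a small helper. This is only an illustration: the function name `installer_for` is hypothetical and it installs nothing, it just reports which installer applies. It relies on base R's `package_version()`, which compares version components numerically, so "3.10" sorts after "3.8".

```r
# Hypothetical helper: report which installer matches a Bioconductor version.
# It installs nothing; it only encodes the 3.7 / 3.8 cut-off stated above.
installer_for <- function(bioc_version) {
  v <- package_version(bioc_version)
  if (v >= package_version("3.8")) "BiocManager" else "biocLite"
}

installer_for("3.6")   # the version used in this course
installer_for("3.10")  # compared as a version, not as the number 3.1
```

Comparing versions as plain numbers would place 3.10 before 3.8, which is why the sketch converts to a version object first.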

SLIDE 5

Bioconductor version and package version

BiocInstaller works for Bioconductor version 3.7 or earlier

# Check Bioconductor version (for versions <= 3.7)
BiocInstaller::biocVersion()
# or
biocVersion()

# Load a package
library(packageName)

# Check versions for reproducibility
sessionInfo()
# or
packageVersion("packageName")

# Check package updates (Bioconductor version <= 3.7)
BiocInstaller::biocValid()
# or
biocValid()
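The reproducibility checks can be tried without any Bioconductor package installed. The sketch below assumes only base R and uses packages that ship with every installation; the variable names are illustrative.

```r
# Base packages ship with every R installation, so this runs without Bioconductor.
pkgs <- c("methods", "stats", "utils")

# packageVersion() returns a version object; as.character() makes it printable.
versions <- vapply(pkgs, function(p) as.character(packageVersion(p)), character(1))
print(versions)

# Version objects compare component by component.
packageVersion("methods") >= "3.0.0"

# The R version string, which sessionInfo() also reports.
R.version.string
```

Saving this kind of output next to an analysis makes it easier to rebuild the same environment later.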

SLIDE 6

Let's practice!

INTRODUCTION TO BIOCONDUCTOR

SLIDE 7

The Role of S4 in Bioconductor

INTRODUCTION TO BIOCONDUCTOR

Paula Andrea Martinez, PhD.

Data Scientist

SLIDE 8

S3

Positive
  • CRAN, simple but powerful
  • Flexible and interactive
  • Uses a generic function
  • Functionality depends on the first argument
  • Example: plot() and methods(plot)
Negative
  • Bad at validating types and naming conventions (is a dot part of the name or a method separator?)
  • Inheritance works, but depends on the input
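A minimal sketch of these S3 points in base R follows. The generic `describe` and the class name "project" are made up for illustration.

```r
# A generic function: dispatch is decided by the class of the first argument.
describe <- function(x, ...) UseMethod("describe")

# Methods are named generic.class, which is where the dot convention comes from.
describe.default <- function(x, ...) "a generic object"
describe.project <- function(x, ...) paste("project:", x$name)

p <- structure(list(name = "yeast"), class = "project")
describe(p)    # dispatches to describe.project
describe(1:3)  # falls back to describe.default

# No validation: any object can be labelled "project", and the problem
# only surfaces when a method tries to use the missing field.
fake <- structure(42, class = "project")
result <- tryCatch(describe(fake), error = function(e) e)
inherits(result, "error")
```

The last lines show the weakness listed under Negative: nothing stops a number from carrying the class label, so the error appears late, inside the method.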

SLIDE 9

S4

Positive
  • Formal definition of classes
  • Bioconductor reusability
  • Has validation of types
  • Naming conventions
  • Example: mydescriptor <- new("GenomeDescription")
Negative
  • Complex structure compared to S3
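Type validation can be seen with a small class defined in base R's methods package. The class "Sample" and its slots are hypothetical and do not come from Bioconductor.

```r
library(methods)

# Hypothetical class for illustration; it is not a Bioconductor class.
setClass("Sample", slots = c(id = "character", reads = "numeric"))

ok <- new("Sample", id = "s1", reads = 1200)
ok@reads

# The slot types are checked when the object is created.
bad <- tryCatch(new("Sample", id = 1, reads = 1200), error = function(e) e)
inherits(bad, "error")

# setValidity() adds rules beyond the slot types.
setValidity("Sample", function(object) {
  if (length(object@reads) == 1 && object@reads < 0) {
    "reads must be non-negative"
  } else {
    TRUE
  }
})
neg <- tryCatch(new("Sample", id = "s2", reads = -5), error = function(e) e)
inherits(neg, "error")
```

Unlike the S3 case, the mistake is reported at construction time, before any method runs.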

SLIDE 10

Is it S4 or not?

Ask if an object is S4:

isS4(mydescriptor)
[1] TRUE

The str() output of an S4 object starts with "Formal class":

str(mydescriptor)
Formal class 'GenomeDescription' [package "GenomeInfoDb"] with 7 slots
...
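Both checks can be reproduced without GenomeInfoDb by defining a throwaway class. The class "Marker" below is an assumption made only to have an S4 object to inspect.

```r
library(methods)

# Hypothetical one-slot class, used only to have an S4 object to inspect.
setClass("Marker", slots = c(name = "character"))
m <- new("Marker", name = "chrM")

isS4(m)                 # an S4 instance
isS4(list(name = "x"))  # a plain list is not

# Capture the first line that str() prints for the S4 object.
first_line <- capture.output(str(m))[1]
startsWith(first_line, "Formal class")
```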

SLIDE 11

S4 class definition

A class describes a representation:
  • name
  • slots (methods/fields)
  • contains (inheritance definition)

Example

MyEpicProject <- setClass(
    # Define class name with UpperCamelCase
    "MyEpicProject",
    # Define slots, helpful for validation
    slots = c(ini = "Date",
              end = "Date",
              milestone = "character"),
    # Define inheritance
    contains = "MyProject")
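The example inherits from MyProject, which the slide does not define, so it cannot run on its own. A runnable sketch follows, under the assumption that MyProject is a simple class with a single `title` slot; that parent definition and the slot values are invented for illustration.

```r
library(methods)

# Assumed parent class: the slide does not show how MyProject is defined.
setClass("MyProject", slots = c(title = "character"))

MyEpicProject <- setClass(
  "MyEpicProject",
  slots = c(ini = "Date", end = "Date", milestone = "character"),
  contains = "MyProject"
)

# setClass() returns a generator function, so it can be called like a constructor.
proj <- MyEpicProject(
  title = "Yeast genome tour",
  ini = as.Date("2018-01-01"),
  end = as.Date("2018-06-30"),
  milestone = "first draft"
)

is(proj, "MyProject")  # inherits from the parent class
slotNames(proj)        # own slots plus the inherited one
proj@end - proj@ini    # slots are read with @
```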

SLIDE 12

S4 Accessors

Object summary

.S4methods(class = "GenomeDescription")
 [1] commonName      organism        provider        providerVersion
 [5] releaseDate     releaseName     seqinfo         seqnames
 [9] show            toString        bsgenomeName

showMethods(classes = "GenomeDescription", where = search())

show(mydescriptor)
| organism: ()
| provider:
| provider version:
| release date:
| release name:
| ---
| seqlengths:
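How an accessor and a show() method come to exist can be sketched with a toy class. Everything named here (`TinyGenome`, `organismName`) is hypothetical and is not part of GenomeInfoDb.

```r
library(methods)

# Hypothetical class standing in for a real annotation object.
setClass("TinyGenome", slots = c(organism = "character", provider = "character"))

# An accessor is a generic plus a method that returns one slot.
setGeneric("organismName", function(x) standardGeneric("organismName"))
setMethod("organismName", "TinyGenome", function(x) x@organism)

# A show() method controls the object summary printed at the console.
setMethod("show", "TinyGenome", function(object) {
  cat("| organism:", object@organism, "\n| provider:", object@provider, "\n")
})

g <- new("TinyGenome", organism = "Saccharomyces cerevisiae", provider = "UCSC")
organismName(g)  # preferred over reaching into g@organism directly
g                # auto-printing calls show()

# List the methods registered for the class.
showMethods(classes = "TinyGenome")
existsMethod("organismName", "TinyGenome")
```

Using accessors instead of `@` keeps code working if the class author later changes how the slots are stored.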

SLIDE 13

Let's practice!

INTRODUCTION TO BIOCONDUCTOR

SLIDE 14

Introducing biology of genomic datasets

INTRODUCTION TO BIOCONDUCTOR

Paula Andrea Martinez, PhD.

Data Scientist

SLIDE 15

SLIDE 16

SLIDE 17

Genome elements

Genetic information
  • DNA alphabet
  • A set of chromosomes (highly variable number)
  • Genes (carry heredity instructions), coding and non-coding
  • Proteins (responsible for specific functions)
  • DNA-to-RNA (transcription)
  • RNA-to-protein (translation)
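The two steps, transcription and translation, can be illustrated on a short made-up sequence in base R. This is a deliberately simplified sketch: it assumes the DNA string is the coding strand, and the codon table holds only the four codons the example needs, not the full genetic code.

```r
# Made-up coding-strand DNA, chosen so that it splits into four codons.
dna <- "ATGTTTGGCTAA"

# Transcription (DNA-to-RNA): on the coding strand, T is replaced by U.
rna <- chartr("T", "U", dna)

# Split the RNA into consecutive codons of three letters.
starts <- seq(1, nchar(rna) - 2, by = 3)
codons <- substring(rna, starts, starts + 2)

# Partial codon table (standard genetic code); "*" marks a stop codon.
codon_table <- c(AUG = "M", UUU = "F", GGC = "G", UAA = "*")

# Translation (RNA-to-protein): look up each codon.
protein <- paste(codon_table[codons], collapse = "")

rna
codons
protein
```

Bioconductor's Biostrings package offers dedicated classes and functions for these operations on real data; the base R version above only shows the idea.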

SLIDE 18

Yeast

  • A single-cell microorganism
  • The fungus that people love ❤
  • Used for fermentation (beer, bread, kefir, kombucha), bioremediation, etc.
  • Name: Saccharomyces cerevisiae or S. cerevisiae

SLIDE 19

Yeast genome

BSgenome annotation package

# Load the package and store data into yeast
library(BSgenome.Scerevisiae.UCSC.sacCer3)
yeast <- BSgenome.Scerevisiae.UCSC.sacCer3

# Interested in other genomes?
available.genomes()

Using accessors

# Chromosome number
length(yeast)
# Chromosome names
names(yeast)
# Sequence lengths
seqlengths(yeast)
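Since seqlengths() returns a named vector of lengths, its result can be summarised with ordinary R. The helper `summarise_lengths` below is hypothetical, and the toy lengths are made up rather than real yeast values. The final block runs only if the annotation package happens to be installed.

```r
# Hypothetical helper: summarise a named vector of sequence lengths,
# which is the shape of value that seqlengths() returns.
summarise_lengths <- function(lens) {
  list(
    n_sequences = length(lens),
    longest = names(lens)[which.max(lens)],
    total_bp = sum(lens)
  )
}

# Made-up lengths for illustration only; these are not real yeast values.
toy <- c(chrA = 100L, chrB = 250L, chrC = 50L)
summarise_lengths(toy)

# With the annotation package installed, the same helper applies to the real genome.
if (requireNamespace("BSgenome.Scerevisiae.UCSC.sacCer3", quietly = TRUE)) {
  yeast <- BSgenome.Scerevisiae.UCSC.sacCer3::BSgenome.Scerevisiae.UCSC.sacCer3
  print(summarise_lengths(GenomeInfoDb::seqlengths(yeast)))
}
```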

SLIDE 20

Get sequences

S4 method for BSgenome

# S4 method getSeq() requires a BSgenome object
getSeq(yeast)

# Select chromosome sequence by name, one or many
getSeq(yeast, "chrM")

# Select start, end and/or width
# end = 10 selects the first 10 base pairs of each chromosome
getSeq(yeast, end = 10)
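The selection logic can be mimicked on plain character strings to see what each call returns. This is an analogy in base R, not the BSgenome implementation, and the toy sequences are invented.

```r
# Toy sequences, made up for illustration; not real yeast DNA.
toy_genome <- c(
  chrA = "ACGTACGTACGTAC",
  chrB = "TTGACCAGTTGACA",
  chrM = "GGCATTACGGCATT"
)

# All sequences (analogous to getSeq(yeast)).
toy_genome

# One sequence by name (analogous to getSeq(yeast, "chrM")).
toy_genome["chrM"]

# First 10 letters of every sequence (analogous to getSeq(yeast, end = 10)).
first10 <- substr(toy_genome, 1, 10)
first10

# Start and width together fix the end: end = start + width - 1.
start <- 3
width <- 5
window <- substr(toy_genome, start, start + width - 1)
window
```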

SLIDE 21

Let's practice!

INTRODUCTION TO BIOCONDUCTOR