Statistical LeaRning
Katja Nowick, Bioinformatics group
Markus Kreuz, IMISE
What is R?
– Programming/scripting language
– Comprehensive statistical environment
– Strength: statistical data analysis + graphical display
Why use R?
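Finding binding sites for a transcription factor in the promoter of a gene, with only 8 lines of code (after loading the packages):

library(MotifDb)
library(seqLogo)
library(GenomicFeatures)
library(BSgenome.Scerevisiae.UCSC.sacCer3)        # provides the Scerevisiae genome object
library(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene)

query(MotifDb, "DAL80")                                 # list all motifs annotated for DAL80
pfm.dal80.jaspar = query(MotifDb, "DAL80")[[1]]         # take the first hit as position frequency matrix
seqLogo(pfm.dal80.jaspar)                               # draw the sequence logo
dal1 = "YIR027C"                                        # systematic name of the target gene
chromosomal.loc = transcriptsBy(TxDb.Scerevisiae.UCSC.sacCer3.sgdGene, by = "gene")[dal1]
promoter.dal1 = getPromoterSeq(chromosomal.loc, Scerevisiae, upstream=1000, downstream=0)
pcm.dal80.jaspar = round(100 * pfm.dal80.jaspar)        # convert frequencies to integer counts
matchPWM(pcm.dal80.jaspar, unlist(promoter.dal1)[[1]], "90%")   # report matches scoring at least 90%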
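From a directory of FastQ files to a full quality report, with 6 lines of code:

library(ShortRead)

files = list.files("fastq", full.names=TRUE)                  # all files in the "fastq" directory
names(files) = sub("\\.fastq$", "", basename(files))          # sample names without the extension
qas = lapply(seq_along(files), function(i, files) qa(readFastq(files[i]), names(files)[i]), files)
qa <- do.call(rbind, qas)                                     # combine the per-file quality assessments
save(qa, file=file.path("output", "qa.rda"))                  # the "output" directory must already exist
browseURL(report(qa))                                         # build the HTML report and open it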
From mapped and counted RNA-Seq data to differentially expressed genes:

library(DESeq)

# Read counts per gene for human (hg19) and chimpanzee (panTro3)
Hcounts=read.table("segemehl.hg19.readCount", header=TRUE, sep="\t", quote="", stringsAsFactors=FALSE, row.names="id")
Ccounts=read.table("segemehl.panTro3.readCount", header=TRUE, sep="\t", quote="", stringsAsFactors=FALSE, row.names="id")
counts=cbind(Hcounts, Ccounts)
colnames(counts)=c("H1", "H2", "H3", "C1", "C2", "C3")
groups=c(0,0,0,1,1,1)                                         # 0 = human, 1 = chimpanzee
counts_DESeq_obj=newCountDataSet(counts, groups)
counts_DESeq_obj=estimateSizeFactors(counts_DESeq_obj)        # normalize for sequencing depth
counts_DESeq_obj=estimateDispersions(counts_DESeq_obj)
DESeq_DEgenes=nbinomTest(counts_DESeq_obj, "0", "1")          # test group 0 against group 1
# Mean expression vs. log2 fold change; genes with padj < 0.05 in red
plot(DESeq_DEgenes$baseMean, DESeq_DEgenes$log2FoldChange, log="x", pch=20, cex=.1, col=ifelse(DESeq_DEgenes$padj < 0.05, "red", "black"))
# which() drops genes whose padj is NA instead of returning all-NA rows
signDESeq_DEgenes=DESeq_DEgenes[which(DESeq_DEgenes$padj<0.05),]
head(signDESeq_DEgenes[order(signDESeq_DEgenes$padj),])
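The `padj` column used above for filtering holds p-values adjusted for multiple testing. As a minimal sketch of what such an adjustment does, the following applies base R's `p.adjust` with the Benjamini-Hochberg method to a handful of p-values. The gene names and p-values are made up for illustration and do not come from the data set above.

```r
# Hypothetical raw p-values for six genes (illustrative numbers only)
pvals <- c(geneA = 0.001, geneB = 0.008, geneC = 0.020,
           geneD = 0.040, geneE = 0.300, geneF = 0.900)

# Benjamini-Hochberg adjustment controls the false discovery rate
padj <- p.adjust(pvals, method = "BH")

# Keep genes whose adjusted p-value is below 0.05;
# which() would also drop NA values if there were any
significant <- names(padj)[which(padj < 0.05)]

print(round(padj, 3))
print(significant)
```

In this toy example geneD has a raw p-value below 0.05 but no longer passes the threshold after adjustment, which is why the filtering is done on `padj` rather than on the raw p-value.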
http://www.r-project.org/
– http://www.math.ilstu.edu/dhkim/Rstuff/Rtutor.html
– http://www.statmethods.net/index.html
– http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_Bi
Dec 3rd: Introduction to R
Dec 10th: Statistics and Graphics
Dec 17th: A small programming project
Jan 14th: Analysis of gene expression data
Jan 21st: Clustering and Gene Ontology
Multiple exercises in between
a binding site for the transcription factor EGR1:
window from 5000 bp upstream to 2000 bp downstream of the transcription start site
macaque genome to answer this question
human genome