RNA-seq differential expression analysis bioconnector.org/workshops

Agenda
• Our data: source, pre-processing, structure
• Importing & exploring data
• Processing and analysis with DESeq2
  - Structuring the count data and metadata
  - Running the analysis
  - Extracting results
• Data visualization
• Alternative approaches

What this class is not
• This is not an introductory R class. Prerequisites:
  - Basic R skills: data frames, packages, importing data, saving results
  - Manipulating data with dplyr and %>%
  - Tidy data & advanced manipulation
  - Data visualization with ggplot2
• This is not a statistics course.
• This is not a comprehensive RNA-seq theory/practice course. Refer to the Conesa 2016 and Soneson 2015 references on the workshop website.
  - We only discuss a simple 2-group design (treated vs. control).
  - Not covered: complex designs, multifactorial experiments, interactions, batch effects, etc.
  - Not covered: transcriptome assembly & reference-free approaches
  - Upstream analysis...

What this class is not
• This class does not cover upstream pre-processing:
  - Sequence read QA/QC
  - Our quantitation path (kallisto/Salmon + tximport):
    - "Alignment-free" transcript abundance estimation
    - Gene-level abundance summarization
  - Alternative path 1 (STAR + featureCounts):
    - Spliced alignment to genome
    - Counting reads overlapping exons
  - Alternative path 2 (TopHat + Cufflinks; HISAT + StringTie):
    - Spliced alignment to genome
    - Transcriptome assembly
    - Transcript abundance estimation

Course website: bioconnector.org
• Data
• Setup instructions
• Lessons dropdown: RNA-seq: airway
• "?" dropdown: FAQs, resources, etc.

Our data: Background
• Himes et al. "RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells." PLoS ONE. 2014 Jun 13;9(6):e99625. PMID: 24926665.
• Glucocorticoids inhibit inflammatory processes and are used to treat asthma because of their anti-inflammatory effects on airway smooth muscle (ASM) cells.
• RNA-seq was used to profile gene expression changes in 4 ASM cell lines treated with dexamethasone (a synthetic glucocorticoid).
• Results: many differentially expressed genes. Focus on CRISPLD2:
  - Encodes a secreted protein involved in lung development
  - SNPs in CRISPLD2 were associated in previous GWAS with inhaled corticosteroid resistance and bronchodilator response in asthma patients.
  - CRISPLD2 upregulation was confirmed with qPCR, and increased protein expression with Western blotting.
• The authors analyzed the data with TopHat and Cufflinks. We're taking a different approach with DESeq2. See the recommended reading and resources page for more info.

Data pre-processing
• Analyzing RNA-seq data starts with sequencing reads.
• There are many different approaches; see the references on the class website.
• Our workflow (already done for you):
  - Reads downloaded from GEO (GSE52778)
  - Transcript abundance quantified with kallisto
  - Gene-level abundance summarized as length-scaled counts with tximport
• Our starting point is a count matrix: each cell indicates the number of reads originating from a particular gene (rows) for each sample (columns).

Data structure: counts + metadata

countData is the count matrix (number of reads coming from each gene for each sample):

gene   ctrl_1  ctrl_2  exp_1  exp_2
geneA      10      11     56     45
geneB       0       0    128     54
geneC      42      41     59     41
geneD     103     122      1     23
geneE      10      23     14     56
geneF       0       1      2      0
…          …       …      …      …

colData describes metadata about the columns of countData:

id      treatment  sex     ...
ctrl_1  control    male    ...
ctrl_2  control    female  ...
exp_1   treatment  male    ...
exp_2   treatment  female  ...

Sample names: ctrl_1, ctrl_2, exp_1, exp_2. The first column of colData must match the column names of countData, excluding countData's first (gene) column, in the same order.
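The matching rule for countData and colData is easy to get wrong and cheap to check before any analysis. A minimal base-R sketch, assuming the counts are a data frame whose first column holds gene IDs and the metadata has sample IDs in its first column; the toy values are copied from the tables above and `check_sample_order` is a hypothetical helper name, not part of DESeq2.

```r
# Toy countData: first column holds gene IDs, remaining columns are samples
counts <- data.frame(
  gene   = c("geneA", "geneB", "geneC", "geneD", "geneE", "geneF"),
  ctrl_1 = c(10, 0, 42, 103, 10, 0),
  ctrl_2 = c(11, 0, 41, 122, 23, 1),
  exp_1  = c(56, 128, 59, 1, 14, 2),
  exp_2  = c(45, 54, 41, 23, 56, 0)
)

# Toy colData: first column holds sample IDs, one row per sample
coldata <- data.frame(
  id        = c("ctrl_1", "ctrl_2", "exp_1", "exp_2"),
  treatment = c("control", "control", "treatment", "treatment"),
  sex       = c("male", "female", "male", "female")
)

# Hypothetical helper: TRUE only if the metadata rows list the samples in
# exactly the same order as the count columns (gene-ID column excluded)
check_sample_order <- function(counts, coldata) {
  identical(names(counts)[-1], as.character(coldata[[1]]))
}

check_sample_order(counts, coldata)                      # TRUE
check_sample_order(counts[, c(1, 3, 2, 4, 5)], coldata)  # FALSE: ctrl_1/ctrl_2 swapped
```

On the workshop data, the same check would be `check_sample_order(mycounts, metadata)` once both files are loaded.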

Counting is (relatively) easy:
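Once every read has been assigned to a gene, building the count matrix is just a cross-tabulation. A minimal base-R sketch, assuming a hypothetical table of read-to-gene assignments (one row per read; sample and gene names are made up):

```r
# Hypothetical read-to-gene assignments: one row per assigned read
reads <- data.frame(
  sample = c("ctrl_1", "ctrl_1", "ctrl_1", "exp_1", "exp_1", "exp_1", "exp_1"),
  gene   = c("geneA", "geneA", "geneB", "geneA", "geneB", "geneB", "geneB")
)

# Cross-tabulate: rows are genes, columns are samples, cells are read counts
count_matrix <- table(reads$gene, reads$sample)
count_matrix
```

The tabulation itself is trivial; deciding which gene (or transcript) each read came from is the hard part.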

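Problem: transcript length bias
• At the same expression level, a longer transcript produces more sequenced fragments than a shorter one, so raw read counts confound expression with transcript length.
Trapnell, Cole, et al. "Differential analysis of gene regulation at transcript resolution with RNA-seq." Nature Biotechnology 31.1 (2013): 46-53.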
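A simplified model makes the length effect concrete: if fragments are sampled in proportion to copies × length, two transcripts at the same copy number get different counts. This sketch assumes that idealized model and ignores effective length, fragment-size distribution and sequencing bias; the transcript names, lengths and copy numbers are hypothetical. Dividing by length and rescaling (the idea behind TPM) recovers equal abundances.

```r
# Two hypothetical transcripts present at the SAME number of copies
copies    <- c(tx_short = 100, tx_long = 100)
length_bp <- c(tx_short = 1000, tx_long = 4000)

# Idealized model: fragments sampled are proportional to copies x length,
# so the longer transcript yields more reads at identical expression
fragments <- copies * length_bp
counts <- round(fragments / sum(fragments) * 1e6)  # scale to 1e6 sequenced reads
counts

# Length normalization: reads per base, then rescale to sum to 1e6 (TPM-style)
rate <- counts / length_bp
tpm  <- rate / sum(rate) * 1e6
tpm
```

Here the long transcript gets four times the reads of the short one, while the length-normalized values are equal.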

Transcript quantification: kallisto
• We don't need a base-pair-to-base-pair alignment; we only need to know abundance.
• kallisto determines which transcripts are compatible with the reads (and their abundances).
Bray, N. L., Pimentel, H., Melsted, P., & Pachter, L. (2016). Near-optimal probabilistic RNA-seq quantification. Nature Biotechnology, 34(5), 525-527.
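The "compatible transcripts" idea can be illustrated with a toy k-mer index: record which transcripts contain each k-mer, then intersect those sets over a read's k-mers. This is only a sketch of the concept, not how kallisto is implemented (kallisto indexes a transcriptome de Bruijn graph and defaults to k = 31); the sequences, k = 5 and the helper names `kmers` and `compatible` are hypothetical.

```r
# Hypothetical transcriptome: two isoforms sharing their first 8 bases
transcripts <- c(
  tx1 = "ACGTACGGTTCA",
  tx2 = "ACGTACGGAAAC"
)
k <- 5

# All distinct k-mers (substrings of length k) in a sequence
kmers <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  unique(substring(s, starts, starts + k - 1))
}

# Index: for each k-mer, the set of transcripts that contain it
index <- list()
for (tx in names(transcripts)) {
  for (km in kmers(transcripts[[tx]], k)) {
    index[[km]] <- union(index[[km]], tx)
  }
}

# A read is compatible with the transcripts that contain ALL of its k-mers
compatible <- function(read, index, k) {
  sets <- lapply(kmers(read, k), function(km) index[[km]])
  if (any(vapply(sets, is.null, logical(1)))) return(character(0))
  Reduce(intersect, sets)
}

compatible("ACGTACGG", index, k)  # only shared sequence: both isoforms
compatible("ACGGTTCA", index, k)  # covers tx1-specific sequence: tx1 only
```

Reads that fall only in shared sequence stay ambiguous; resolving how to split them between isoforms is what kallisto's probabilistic abundance estimation does.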

Gene-level summarization: tximport
• Differential gene expression (compared with transcript-level):
  - More powerful
  - More accurate
  - More interpretable
• Gene-level summaries computed from transcript abundance estimates are more accurate than simple read counts.
[Figure: estimated abundance vs. true abundance]
Soneson, C., Love, M. I., & Robinson, M. D. (2015). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research.
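The "length-scaled counts" in our starting matrix can be sketched in base R. This is a simplified re-creation modeled on tximport's `countsFromAbundance = "lengthScaledTPM"` option as we understand it (gene TPM summed over transcripts, multiplied by the gene's average length across samples, then rescaled to each sample's library size); it is not tximport's code, and every transcript name, count, TPM and length below is hypothetical. In practice you would call `tximport::tximport()` on the quantifier output instead.

```r
# Hypothetical transcript-to-gene map: tx1 and tx2 are isoforms of geneA
tx2gene <- c(tx1 = "geneA", tx2 = "geneA", tx3 = "geneB")
samples <- c("s1", "s2")

# Hypothetical per-transcript estimates (rows = transcripts, cols = samples);
# only three transcripts shown, so TPMs do not sum to 1e6 here
tx_counts <- matrix(c(100, 200,
                      300, 100,
                       50,  80), nrow = 3, byrow = TRUE,
                    dimnames = list(names(tx2gene), samples))
tx_tpm <- matrix(c(4000, 8000,
                   6000, 2000,
                   2000, 3000), nrow = 3, byrow = TRUE,
                 dimnames = list(names(tx2gene), samples))
tx_len <- matrix(c( 500,  500,
                   1000, 1000,
                    800,  800), nrow = 3, byrow = TRUE,
                 dimnames = list(names(tx2gene), samples))

gene_of <- tx2gene[rownames(tx_tpm)]

# 1. Gene-level abundance: sum the TPMs of each gene's transcripts
gene_tpm <- rowsum(tx_tpm, gene_of)

# 2. Gene-level length: abundance-weighted average of its transcripts' lengths
gene_len <- rowsum(tx_tpm * tx_len, gene_of) / gene_tpm

# 3. Length-scaled abundance: TPM x the gene's average length across samples
raw <- gene_tpm * rowMeans(gene_len)

# 4. Rescale each sample so its total matches the original library size
lib_size <- colSums(tx_counts)
gene_counts <- sweep(raw, 2, lib_size / colSums(raw), `*`)
round(gene_counts, 1)
```

Because step 3 uses one length per gene for all samples, a shift in isoform usage between samples no longer masquerades as a change in gene expression.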

Getting Started
• Go to bioconnector.org. Click the data link on the top navbar. Download the following files and save them somewhere on your computer you can easily find (e.g., create a new folder on your desktop called airway and save them there, or move them to your project directory):
  - airway_scaledcounts.csv
  - airway_metadata.csv
  - annotables_grch38.csv
• Using project management: open your .Rproj file to start R running in the same folder as the data. File ➡ New file ➡ R script. Save this file as airway_analysis.R.
• Not using project management: open RStudio. File ➡ New file ➡ R script. Save this file as airway_analysis.R in the same folder as the data. Quit RStudio, then double-click the R script to open R running in that folder.
• Load the data:

library(tidyverse)
mycounts <- read_csv("airway_scaledcounts.csv")
metadata <- read_csv("airway_metadata.csv")
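A quick first look after importing catches problems early: how many genes there are, each sample's library size, and how many genes have zero reads everywhere. A minimal base-R sketch, assuming the count table's first column holds gene IDs (called `ensgene` here, which we assume matches the workshop file) and every other column is one sample; `summarize_counts` is a hypothetical helper and the toy table stands in for `mycounts`.

```r
# Hypothetical helper for a first look at a count table whose first column
# holds gene IDs and whose remaining columns hold one sample each
summarize_counts <- function(counts) {
  m <- as.matrix(counts[-1])             # drop the gene-ID column
  list(
    n_genes      = nrow(m),
    library_size = colSums(m),           # total reads per sample
    all_zero     = sum(rowSums(m) == 0)  # genes with no reads in any sample
  )
}

# Toy stand-in for mycounts (the real file has far more genes and samples)
toy <- data.frame(
  ensgene = c("g1", "g2", "g3"),
  s1      = c(10, 0, 5),
  s2      = c(20, 0, 0)
)
summarize_counts(toy)
```

The same call should work on the imported tibble, e.g. `summarize_counts(mycounts)`, since both `counts[-1]` and `as.matrix()` behave the same on tibbles and data frames.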
