SLIDE 1

RNA-seq: Analysis options

SLIDE 3

Differential Expression Analysis Workflow #1

✓ Genome? ✓ Transcriptome

  • Biological samples/Library preparation
  • Sequence reads (FASTQ)
  • Quality control: FastQC
  • Pseudocounts with Kallisto, Sailfish, Salmon (FASTQ + reference transcriptome index)
  • Count matrix generated using tximport
  • DGE with R: DESeq2, edgeR, limma:voom
  • DGE or isoform-level DE with R: Sleuth
  • Alignment to Genome: HISAT2, STAR (FASTQ + reference genome index; known GTF optional), giving multiple BAMs
  • Quality control: Qualimap
  • Quality control: MultiQC
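
The "count matrix generated using tximport" step can be pictured as summing transcript-level estimated counts to gene level, sample by sample. The sketch below is plain Python, not the R package, and it leaves out the transcript-length offsets that tximport can also compute. The transcript IDs, gene IDs, sample names, and numbers are made up for illustration.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimated counts to gene level for one sample."""
    gene_counts = defaultdict(float)
    for tx_id, count in tx_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue  # transcript absent from the mapping: skipped in this sketch
        gene_counts[gene_id] += count
    return dict(gene_counts)


def build_count_matrix(samples, tx2gene):
    """Return (gene IDs, sample names, rows) with one row per gene."""
    per_sample = {name: summarize_to_gene(tx, tx2gene) for name, tx in samples.items()}
    genes = sorted({g for counts in per_sample.values() for g in counts})
    names = list(per_sample)
    rows = [[per_sample[n].get(g, 0.0) for n in names] for g in genes]
    return genes, names, rows


# Hypothetical toy input: transcript-to-gene map and per-sample pseudocounts.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
samples = {
    "ctrl_1": {"tx1": 10.5, "tx2": 4.5, "tx3": 20.0},
    "treat_1": {"tx1": 2.0, "tx2": 1.0, "tx3": 55.0},
}

genes, names, matrix = build_count_matrix(samples, tx2gene)
for gene, row in zip(genes, matrix):
    print(gene, dict(zip(names, row)))
```

The resulting genes-by-samples table is the shape of input that the DGE tools listed above expect, although each R package has its own import path for it.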

SLIDE 6

Differential Expression Analysis Workflow #2

✓ Genome ✓ GTF annotation file (transcriptome)

  • Sequence reads (FASTQ)
  • Quality control: FastQC
  • Alignment to Genome: HISAT2, STAR (FASTQ + reference genome index; known GTF optional), giving multiple BAMs
  • Quality control: Qualimap
  • Quality control: MultiQC
  • Count reads associated with genes: htseq-count, featureCounts (multiple BAMs + known GTF)
  • Count matrix generated from BAM using featureCounts
  • DGE with R: DESeq2, edgeR, limma:voom

https://hbctraining.github.io/Intro-to-rnaseq-hpc-O2/
https://hbctraining.github.io/DGE_workshop/
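
To make "count reads associated with genes" concrete, here is a toy Python sketch of interval-overlap counting. It is not how htseq-count or featureCounts are implemented: those tools work on exons from the GTF and handle strandedness, paired-end reads, and multi-mapping. The gene coordinates and reads below are hypothetical, and the rule of setting aside reads that hit more than one gene is a simplifying assumption.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def count_reads_per_gene(reads, genes):
    """Assign each aligned read to the single gene it overlaps, if any.

    reads: list of (chrom, start, end)
    genes: dict of gene_id -> (chrom, start, end)
    """
    counts = {gene_id: 0 for gene_id in genes}
    skipped = {"no_feature": 0, "ambiguous": 0}
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            skipped["no_feature"] += 1
        else:
            skipped["ambiguous"] += 1  # read overlaps more than one gene
    return counts, skipped


# Hypothetical annotation and alignments.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),
    ("chr1", 300, 350),
    ("chr1", 460, 510),
    ("chr1", 600, 650),
    ("chr1", 950, 1000),
]

counts, skipped = count_reads_per_gene(reads, genes)
print(counts)
print(skipped)
```

Running this once per BAM and placing the per-sample counts side by side gives the genes-by-samples count matrix used for DGE.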

SLIDE 10

Alternative methods: transcriptome assembly

Reference-based assembly

  • Genome is known
  • Transcriptome not available or is not good enough
  • Cufflinks and Scripture are two reference-based transcriptome assemblers
  • Additional annotation of any newly-discovered genes or isoforms will need to be generated

SLIDE 14

Alternative methods: transcriptome assembly

De novo assembly

  • Genome is not known, or is of poor quality
  • Amount of data needed is greater than for a reference-based assembly
  • Oases, Trans-ABySS, and Trinity are well-regarded transcriptome assemblers, Trinity especially
  • Newly-discovered genes or isoforms will need to be annotated using homolog-based and other methodologies

SLIDE 15

Transcriptome Assembly

[Figure: reference-based assembly compared with de novo assembly]

Martin J.A. and Wang Z., Nat. Rev. Genet. (2011) 12:671–682

SLIDE 16

Differential Expression Analysis Workflow #3

  • Sequence reads
  • Quality control: FastQC
  • Alignment to Genome: HISAT2, STAR
  • Reference-based transcriptome assembly
  • Merge assemblies from all samples
  • Annotate the genes/transcripts
  • Pseudocounts with Kallisto, Sailfish, Salmon
  • Count matrix generated using tximport
  • DGE with R: DESeq2, edgeR, limma:voom
  • DGE or isoform-level DE with R: Sleuth

SLIDE 17

Differential Expression Analysis Workflow #4

  • Sequence reads
  • Quality control: FastQC
  • De novo assembly with Trinity
  • Annotate the genes/transcripts
  • Pseudocounts with Kallisto, Sailfish, Salmon
  • Count matrix generated using tximport
  • DGE with R: DESeq2, edgeR, limma:voom
  • DGE or isoform-level DE with R: Sleuth
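
The four workflows differ mainly in which references are available. One possible reading of the slides, written as a small Python helper, is sketched below. The function name and its two boolean parameters are hypothetical, and the mapping is an interpretation of the deck, not a rule stated in it.

```python
def applicable_workflows(has_good_genome, has_good_transcriptome):
    """Return the workflow numbers from this deck that fit the available references.

    has_good_genome: a reference genome of adequate quality exists
    has_good_transcriptome: a usable transcriptome / GTF annotation exists
    """
    if has_good_transcriptome:
        options = [1]  # pseudocounts against a transcriptome index
        if has_good_genome:
            options.append(2)  # genome alignment plus read counting with a GTF
        return options
    if has_good_genome:
        return [3]  # reference-based transcriptome assembly first
    return [4]  # de novo assembly first


LABELS = {
    1: "Workflow #1: pseudocounts with Kallisto, Sailfish, Salmon",
    2: "Workflow #2: alignment to genome, then htseq-count or featureCounts",
    3: "Workflow #3: reference-based transcriptome assembly",
    4: "Workflow #4: de novo assembly with Trinity",
}

for genome, transcriptome in [(True, True), (False, True), (True, False), (False, False)]:
    chosen = applicable_workflows(genome, transcriptome)
    print(f"genome={genome}, transcriptome={transcriptome}:")
    for number in chosen:
        print("   ", LABELS[number])
```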

SLIDE 18

These materials have been developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.