Characterizing transcriptomes using ngs data (T. Källman, BILS/Scilife Lab/Uppsala University)


SLIDE 1

Characterizing transcriptomes using ngs data

T. Källman

BILS/Scilife Lab/Uppsala University

Feb. 2015

20150212 1/33

SLIDE 2

Outline

1. The transcriptome
2. RNA sequence technologies
3. RNA-seq analysis
   - Mapping based approach
   - Tools for working with ngs alignments
   - Gene expression from RNA-seq
   - de-novo assembly

SLIDE 3

The transcriptome

The Central Dogma

[Figure: the central dogma. DNA → mRNA (transcription and mRNA processing) → protein (translation) → active protein (post-translational modification). Labels: promoter region, TATA, 5' untranslated region, ATG, exon, intron, stop codons (UGA, UAA, UAG), 5' cap, 3' poly-A tail, methionine.]

SLIDE 4

The transcriptome

A more complex view


SLIDE 5

The transcriptome

Transcriptomes vs genomes

- Dynamic: not the same across tissues and time points
- Smaller sequence space
- Less repetitive (though large gene families can be found)
- Fairly stable in size? (e.g. 2-4 fold variation among eukaryotes, whereas genome size can vary 1000-fold)
- Genes are often expressed as multiple different splice variants
- RNA is often transcribed from only one strand

SLIDE 6

RNA sequence technologies

NGS data


SLIDE 7

RNA sequence technologies

Machine output


SLIDE 8

RNA sequence technologies

Machine output


SLIDE 9

RNA sequence technologies

Sequence quality

Phred quality scores: Q = -10 × log10(P), where P is the probability that the base call is wrong (high Q = high probability that the base is correct).

A Phred quality score of 20 assigned to a base means that the base is called incorrectly 1 time out of 100.
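The relation between Q and P can be checked numerically. The following is a minimal Python sketch, not taken from the slides: the function names are hypothetical, and the quality-string decoder assumes the Phred+33 encoding (older Illumina files use an offset of 64 instead).

```python
import math


def phred_to_error_prob(q):
    """Error probability P for a Phred score Q, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score Q for an error probability P."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    offset=33 is an assumption (Phred+33 encoding); pass offset=64
    for files written with the older Phred+64 encoding.
    """
    return [ord(ch) - offset for ch in quality_string]


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: P(error) = {phred_to_error_prob(q):g}")
    print(decode_quality("I5!"))
```

Under these assumptions Q20 corresponds to an error probability of 0.01 and Q30 to 0.001, so each step of 10 in Q is a tenfold drop in the expected error rate.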

SLIDE 10

RNA sequence technologies

Paired-end (PE) sequencing

SLIDE 11

RNA sequence technologies

Paired-end reads

File format
- Two files are created, one per read end
- Reads appear in the same order in both files, and read names are identical except for the end suffix (/1 or /2 in the example below)
- Read naming conventions have changed over time, so read names depend on the software version

@61DFRAAXX100204:1:100:10494:3070/1
AAACAACAGGGCACATTGTCACTCTTGTATTTGAAAAACACTTTCCGGCCAT
+
ACCCCCCCCCCCCCCCCCCCCCCCCCCCCCBC?CCCCCCCCC@@CACCCCCA
@61DFRAAXX100204:1:100:10494:3070/2
ATCCAAGTTAAAACAGAGGCCTGTGACAGACTCTTGGCCCATCGTGTTGATA
+
_^_a^cccegcgghhgZc`ghhcêgggd^_[d]defcdfd^ZÔXWaQâd
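Because the two files must stay in the same order, it can be worth checking that they are in sync before analysis. The sketch below is one possible way to do that in Python; all function names are hypothetical, it assumes plain four-line FASTQ records, and it only handles the two naming styles mentioned in its comments.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or a trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual


def base_name(read_name):
    """Read name without the part that marks the end of the pair.

    Handles a trailing /1 or /2 (as in the example above) and drops
    anything after the first space (newer naming style). Other naming
    schemes are not covered by this sketch.
    """
    name = read_name.split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def paired_reads(handle1, handle2):
    """Yield (read1, read2) tuples; raise ValueError if the files are out of sync."""
    for r1, r2 in zip_longest(read_fastq(handle1), read_fastq(handle2)):
        if r1 is None or r2 is None:
            raise ValueError("files contain different numbers of reads")
        if base_name(r1[0]) != base_name(r2[0]):
            raise ValueError(f"read names differ: {r1[0]} vs {r2[0]}")
        yield r1, r2


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the two FASTQ files.
    fq1 = io.StringIO("@read1/1\nACGT\n+\nIIII\n")
    fq2 = io.StringIO("@read1/2\nTTGA\n+\nIIII\n")
    for first, second in paired_reads(fq1, fq2):
        print(first[0], second[0])
```

With real data the two `io.StringIO` objects would be replaced by opened files, for example `open("reads_1.fastq")` and `open("reads_2.fastq")` (hypothetical file names).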

SLIDE 12

RNA sequence technologies

Paired-end data

SLIDE 13

RNA sequence technologies

Stranded or not


SLIDE 14

RNA-seq analysis

Two main routes for analysis

Haas & Zody (2010), Nature Biotechnology 28, 421–423

SLIDE 15

RNA-seq analysis Mapping based approach

Aligning short reads from RNA to genomes

- A large number of programs are available: STAR, Tophat, Subread, etc.
- Important feature: they allow spliced mapping, i.e. a read may be split across an intron when aligned to the genome
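In SAM/BAM output a spliced alignment shows up as an `N` operation (skipped reference region) in the CIGAR string. As a rough illustration, the Python sketch below splits an alignment into the reference blocks it covers; the function name is hypothetical and the example coordinates are made up.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return the reference intervals (1-based, inclusive) covered by a read.

    pos is the 1-based leftmost mapping position (the SAM POS field).
    'N' operations (skipped reference, e.g. introns) split the
    alignment into separate blocks.
    """
    blocks = []
    start = pos  # start of the current block
    ref = pos    # next reference position to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "MD=X":
            ref += length  # consumes reference within the block
        elif op == "N":
            if ref > start:
                blocks.append((start, ref - 1))
            ref += length  # jump over the skipped region
            start = ref
        # I, S, H and P do not consume reference positions
    if ref > start:
        blocks.append((start, ref - 1))
    return blocks


if __name__ == "__main__":
    # A 52 bp read split over two exons separated by a 500 bp intron.
    print(aligned_blocks(1000, "30M500N22M"))  # [(1000, 1029), (1530, 1551)]
```

An aligner that cannot produce such split alignments would either fail to map this read or soft-clip one side of it, which is why spliced mapping matters for RNA-seq reads aligned to a genome.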

SLIDE 16

RNA-seq analysis Mapping based approach

Example workflow

- Tophat: aligns reads to the genome (allows for spliced read mapping)
- Cufflinks: extracts transcripts from the spliced read alignments
- Cuffmerge: merges the results from multiple Cufflinks runs

Trapnell et al. (2012), Nature Protocols 7, 562–578
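The three steps above can be sketched as command lines. The Python snippet below only builds the commands, it does not run them. The function name, sample names, file names and output directories are all hypothetical placeholders; the options used (`-p`, `-G`, `-o`, `-g`, `-s`) are the ones I believe the tools accept, so check them against the versions installed on your system.

```python
import shlex


def tophat_cufflinks_commands(samples, index, genome_fasta, annotation_gtf, threads=8):
    """Build command lines for a Tophat -> Cufflinks -> Cuffmerge run.

    samples maps a sample name to a (reads_1, reads_2) tuple of FASTQ files.
    Returns (commands, assemblies); the paths in assemblies should be
    written one per line to assemblies.txt before cuffmerge is run.
    All names here are hypothetical placeholders.
    """
    commands = []
    assemblies = []
    for name, (reads1, reads2) in samples.items():
        tophat_dir = f"{name}_thout"
        cufflinks_dir = f"{name}_clout"
        # Step 1: spliced alignment of the reads to the genome index.
        commands.append(["tophat", "-p", str(threads), "-G", annotation_gtf,
                         "-o", tophat_dir, index, reads1, reads2])
        # Step 2: transcripts from the spliced alignments of this sample.
        commands.append(["cufflinks", "-p", str(threads), "-o", cufflinks_dir,
                         f"{tophat_dir}/accepted_hits.bam"])
        assemblies.append(f"{cufflinks_dir}/transcripts.gtf")
    # Step 3: merge the per-sample assemblies listed in assemblies.txt.
    commands.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                     "-p", str(threads), "assemblies.txt"])
    return commands, assemblies


if __name__ == "__main__":
    cmds, gtfs = tophat_cufflinks_commands(
        {"sampleA": ("sampleA_1.fq", "sampleA_2.fq")},
        index="genome", genome_fasta="genome.fa", annotation_gtf="genes.gtf",
    )
    for cmd in cmds:
        print(shlex.join(cmd))
```

Keeping the commands as lists makes it straightforward to hand them to `subprocess.run` later, or to print them for inspection first as done here.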

SLIDE 17

RNA-seq analysis Mapping based approach

Tophat

1. Efficient and fast alignment of the reads to the genome using bowtie2

2. Create a database of putative splice junctions from the reads that mapped in step 1

3. Map the reads that did not map in step 1 again, this time using the splice junction information

SLIDE 18

RNA-seq analysis Mapping based approach

Cufflinks


SLIDE 19

RNA-seq analysis Tools for working with ngs alignments

Samtools

- A program for working with ngs alignment files (SAM, BAM, CRAM)
- Can be used to view data, calculate basic statistics, extract subsets of alignments and convert between file formats
- http://www.htslib.org
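Much of the subsetting done with samtools relies on the FLAG field of each alignment record, a bit field defined in the SAM format. To make that field less opaque, here is a small Python sketch that decodes it. The bit values follow the SAM specification, but the label strings and function names are my own, and the example record is made up.

```python
# Bit values from the SAM specification; the labels are informal names.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [label for bit, label in SAM_FLAGS.items() if flag & bit]


def parse_sam_line(line):
    """Parse the first six mandatory fields of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],
        "flag": int(fields[1]),
        "rname": fields[2],
        "pos": int(fields[3]),
        "mapq": int(fields[4]),
        "cigar": fields[5],
    }


if __name__ == "__main__":
    # A made-up alignment record with only the mandatory fields filled in.
    record = parse_sam_line("read1\t99\tchr1\t1000\t50\t30M500N22M\t=\t1700\t752\t*\t*")
    print(record["qname"], decode_flag(record["flag"]))
```

On the command line, `samtools view` has `-f` and `-F` options that keep or drop alignments based on these same bits; for instance `-F 4` is commonly used to exclude unmapped reads.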

SLIDE 20

RNA-seq analysis Tools for working with ngs alignments

Picard

- A set of Java command line tools with the same (or similar) functionality as samtools
- Note that although the two largely aim to do the same things, Picard and Samtools do not always generate compatible files
- http://broadinstitute.github.io/picard/

SLIDE 21

RNA-seq analysis Tools for working with ngs alignments

Samtools tview, a text-based alignment viewer

$ samtools tview alignment.bam target.fasta

SLIDE 22

RNA-seq analysis Tools for working with ngs alignments

IGV: Integrative Genomics Viewer


SLIDE 23

RNA-seq analysis Tools for working with ngs alignments