Analysis of Ashley Sawle based on slides by Bernard Pereira The - PowerPoint PPT Presentation


  1. Analysis of (Ashley Sawle, based on slides by Bernard Pereira)

  2. The many faces of RNA-seq – Techniques • mRNA-seq • Exome capture • Targeted • Small RNA (miRNA, piRNA, sncRNA) • Total RNA • Ribosome profiling • Single Cell RNA-Seq

  3. The many faces of RNA-seq – Applications • Discovery: transcripts, isoforms, splice junctions, fusion genes • Differential expression: gene level expression changes, relative isoform abundance, splicing patterns • Variant calling

  4. Microarray → RNA-seq. Guo et al. (2013) PLOS ONE; Wang et al. (2014) Nature Biotech.

  5. Library Preparation & Sequencing • QC: RIN (RNA integrity number) • Multiplexing. Sigurgeirsson, Emanuelsson & Lundeberg (2014) PLOS ONE, modified from Malone JH, Oliver B (2011) BMC Biol.

  6. Sources of Noise • Biological • Technical • Sampling • Process

  7. Sources of Noise – Sampling Bias • Subsampling from a pool of RNAs (Sample A, Sample B)

  8. Sources of Noise – Sampling Bias • Transcript length affects the number of RNA fragments present in the library from that gene (Transcript A, Transcript B)

  9. Sources of Noise - Process

  10. Sources of Noise - Process

  11. Sources of Noise – Process • Duplicates (PCR, optical) • Index swapping • Sequencing errors

  12. Raw Sequence QC - FastQC. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/

  13. Trimming • Quality-based trimming • Adapter contamination (insert: 50 bases)
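The two trimming steps on this slide can be sketched roughly as follows. This is a minimal illustration, not what any particular trimming tool does: the function names, the Phred+33 encoding, the quality threshold of 20 and the exact-match adapter search are all assumptions made for the example.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


def trim_low_quality_3prime(seq, qual, min_quality=20):
    """Drop bases from the 3' end while their quality is below min_quality."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter (hypothetical helper).

    Real trimmers also handle partial and mismatched adapter matches;
    this sketch only handles an exact, full-length match.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]
```

For example, `trim_low_quality_3prime("ACGTACGT", "IIIII###")` keeps the first five bases, because `#` decodes to a Phred score of 2 under this encoding.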

  14. Adapter contamination - FastQC

  15. Sequence to Sense. Conesa et al. (2016) Genome Biology

  16. De Novo assembly, e.g. Trinity. Haas, B.J. et al. (2013) Nature Protocols

  17. Analysis Overview • Mapping • Summarisation • Normalisation • DE analysis • Functional analysis

  18. Reference-based assembly • Genome mapping: can identify novel features; splice aware?; can be difficult to reconstruct isoform and gene structures • Transcriptome mapping: no repetitive reference; novel features?; how reliable is the transcriptome? Trapnell & Salzberg (2009) Nature Biotech

  19. A smart suit(e) for RNA-seq analysis. Trapnell, C. et al. (2012) Nature Protocols

  20. Spliced Alignment

  21. Spliced Alignment with TopHat/Bowtie. Kim, D. et al. (2012) Genome Biology

  22. Visualising Mapping Results – IGV

  23. Summarisation/Counting • Genome-based features: exon or gene boundaries?; isoform structures; gene multireads • Transcript-based features: transcript assembly; novel structures; isoform multireads. Oshlack, A. et al. (2010) Genome Biology

  24. Summarisation/Counting, e.g. HTSeq or Subread

  25. Summarisation/Counting. Mortazavi, A. et al. (2008) Nature Methods

  26. Counting

  27. Normalisation • Counting → estimate of relative counts for each gene • Does this accurately represent the original population? • Library size: sequencing depth varies between samples • Gene properties: GC content, length, sequence • Library composition: highly expressed genes overrepresented at cost of lowly expressed genes

  28. Normalisation - Scaling • Total count: normalise each sample by the total number of reads sequenced • Can also use another statistic similar to total count, e.g. median, upper quartile
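As a rough sketch of total-count scaling, the counts of one sample can be divided by that sample's total and multiplied by a constant. The function name and the choice of one million as the constant (giving counts per million) are assumptions for illustration; the slide itself only says to normalise by the total number of reads.

```python
def counts_per_million(sample_counts):
    """Scale one sample's gene counts by its library size (total count)."""
    total = sum(sample_counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [count * 1_000_000 / total for count in sample_counts]


# Two hypothetical samples with the same composition but different depth
shallow = [10, 30, 60]
deep = [100, 300, 600]
print(counts_per_million(shallow))
print(counts_per_million(deep))
```

After scaling, the two samples give the same values, which is the point of the normalisation: differences caused only by sequencing depth disappear.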

  29. Normalisation - TPM • RPK for gene A = reads for gene A ÷ (length of gene A ÷ 1000) • Scaling factor = sum of all RPKs ÷ 1,000,000 • TPM for gene A = RPK for gene A ÷ scaling factor
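The TPM calculation on this slide can be written out as a short sketch. The function name and the toy numbers are illustrative assumptions; gene lengths are taken to be in bases.

```python
def tpm(read_counts, gene_lengths):
    """Transcripts per million for one sample.

    read_counts[i] and gene_lengths[i] (in bases) refer to the same gene.
    """
    # Reads per kilobase: reads divided by gene length in kilobases
    rpk = [reads / (length / 1000) for reads, length in zip(read_counts, gene_lengths)]
    # Per-sample scaling factor: sum of all RPKs divided by one million
    scaling_factor = sum(rpk) / 1_000_000
    return [value / scaling_factor for value in rpk]


# Three hypothetical genes whose read counts are proportional to their lengths
values = tpm([10, 20, 30], [1000, 2000, 3000])
print(values)
```

In this toy case all three genes get the same TPM, because the longer genes have more reads only in proportion to their length. TPM values within a sample always sum to one million.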

  30. Normalisation – Geometric Scaling • Geometric scaling factor = median over genes 1..N of (RC of gene i ÷ GM of gene i) • RC = read counts (per sample); GM = geometric mean (all samples) • Assumes that most genes are not differentially expressed
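A minimal sketch of the geometric scaling factor follows: for each gene, take the geometric mean of its counts across all samples, divide each sample's count by it, and take the median of those ratios per sample. The function name and the decision to skip genes with a zero count in any sample are assumptions made so that the logarithms are defined.

```python
import math
import statistics


def geometric_scaling_factors(counts):
    """counts[g][s] is the read count of gene g in sample s.

    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    # Assumption: only genes with non-zero counts in every sample are used
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each gene across samples
    log_gm = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - gm for row, gm in zip(usable, log_gm)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors


# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1
counts = [[10, 20], [100, 200], [50, 100], [0, 5]]
print(geometric_scaling_factors(counts))
```

Because the median is used, a minority of strongly changed genes has little effect on the factor, which is where the assumption that most genes are not differentially expressed comes in.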

  31. Normalisation – Trimmed Mean of M-values (TMM) • Implemented in edgeR • Assumes most genes are not differentially expressed. Robinson, M.D. & Oshlack, A. (2010) Genome Biology

  32. Differential Expression • Comparing feature abundance under different conditions • Assumes linearity of signal • When feature = gene, well-established pre- and post-analysis strategies exist. Mortazavi, A. et al. (2008) Nature Methods

  33. Differential Expression • Simple difference in means (plot comparing conditions A and B) • Replication introduces variance

  34. Differential Expression - Modelling • Normal distribution → t-test
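To illustrate how replication turns a simple difference in means into a test statistic, here is a small sketch of a two-sample t statistic. The slide does not say which variant of the t-test is meant; the unequal-variance (Welch) form, the function name and the toy values are assumptions for this example.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Difference in means divided by its standard error (Welch form)."""
    var_a = statistics.variance(group_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error


# Same difference in means (3), different spread between replicates
tight = welch_t_statistic([1, 2, 3], [4, 5, 6])
noisy = welch_t_statistic([0, 2, 4], [3, 5, 7])
print(tight, noisy)
```

The difference in means is identical in both comparisons, but the noisier replicates give a smaller statistic, so the same shift is weaker evidence of a real change.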

  35. Differential Expression - Modelling • Use the Poisson distribution for count data • Just one parameter required – the mean
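The "one parameter" point can be checked numerically: for a Poisson distribution the variance equals the mean, so fixing the mean fixes the spread as well. The sketch below uses hypothetical helper names and truncates the sum at `k_max`, which is an approximation that is negligible for small means.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing count k under a Poisson distribution."""
    # Computed on the log scale to avoid overflow for large k
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def poisson_moments(mean, k_max=200):
    """Mean and variance obtained by summing the pmf over 0..k_max."""
    probs = [poisson_pmf(k, mean) for k in range(k_max + 1)]
    m = sum(k * p for k, p in enumerate(probs))
    v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
    return m, v


print(poisson_moments(5.0))
```

Both returned values come out equal to the mean passed in (up to floating-point error), which is why a single parameter is enough to describe the distribution.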

  36. Differential Expression - Modelling • Biology is never that simple • The negative binomial distribution represents an overdispersed Poisson distribution • It has two parameters: mean and (over)dispersion. Anders, S. & Huber, W. (2010) Genome Biology
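A short sketch of what overdispersion means in practice, assuming the commonly used parameterisation in which the negative binomial variance is the mean plus the dispersion times the squared mean. The function name and the example values are illustrative.

```python
def negative_binomial_variance(mean, dispersion):
    """Variance under the assumed parameterisation: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


# With zero dispersion the variance reduces to the Poisson case (variance = mean)
print(negative_binomial_variance(100, 0.0))
# With dispersion 0.1 the same mean carries a much larger variance
print(negative_binomial_variance(100, 0.1))
```

The first term behaves like Poisson noise, while the second grows with the square of the mean, so the extra variance matters most for highly expressed genes.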

  37. Differential Expression - Modelling • Estimating the dispersion parameter can be difficult with a small number of samples • edgeR: models the variance as the sum of technical and biological variance • 'Share' information from all genes to obtain a global estimate (shrinkage). Simon Anders

  38. Modelling – in fashion • DESeq uses a similar formulation of the variance term

  39. Towards Biological Meaning • Clustering. Hamy et al. (2016) PLOS ONE

  40. Towards Biological Meaning • Gene Set Enrichment Analysis

  41. Towards Biological Meaning • Network analysis. Hamy et al. (2016) PLOS ONE

  42. Replicates v Sequencing Depth. Liu et al. (2014) Bioinformatics

  43. Replicates v Sequencing Depth (panels: HIGH, MEDIUM, LOW). Liu et al. (2014) Bioinformatics

  44. Replicates v Sequencing Depth. Liu et al. (2014) Bioinformatics
