Statistical Methods for Bulk and Single-cell RNA Sequencing Data

Jingyi Jessica Li
Department of Statistics
University of California, Los Angeles
http://jsb.ucla.edu

The central dogma of molecular biology
2018 marks the 60th anniversary of the central dogma: DNA makes RNA makes proteins.
[Figure: Francis Crick speaking at the 1963 CSH Symposium] [Cobb, PLoS Biology, 2017]

The central dogma of molecular biology
The central dogma of molecular biology: DNA makes RNA makes proteins.
[Figure: DNA (AACGTCGT GCTG CCG AATCAA) → transcription → RNA (AACGUCGU GCUG CCG AAUCAA) → translation → protein]

The central dogma of molecular biology
In transcription, a particular segment of DNA (a combination of exons) is copied into RNA segments.
[Figure: a gene (DNA) with exons 1 to 4, AACGTCGT GCTG CCG AATCAA; transcription removes the introns and yields the RNA AACGUCGU GCUG CCG AAUCAA, which is translated into protein]

Understanding genome functions?
[Figure from Kundaje et al., Nature, 2015]

Alternative splicing
In alternative splicing, particular exons of a gene may be included in or excluded from a mature RNA isoform [Chow et al., Cell, 1977].
[Figure: the gene AACGTCGT GCTG CCG AATCAA undergoes alternative splicing into isoform A (AACGUCGU GCUG CCG AAUCAA, exon 2 included) and isoform B (AACGUCGU CCG AAUCAA, exon 2 excluded); multiple copies of each isoform are translated into protein A and protein B, respectively]

Diversity in RNA isoform structures
Abnormal splicing can lead to genetic diseases.
[Figure: the gene AACGTCGT GCTG CCG AATCAA under two conditions. Normal condition: normal splicing yields the RNA isoforms AACGUCGU GCUG CCG AAUCAA and AACGUCGU CCG AAUCAA, which are translated into proteins. Disease condition: abnormal splicing yields RNA isoforms that include AACGUCGUAAUCAA.]

Understanding genome functions
[Figure: genome projects shown include the Human Genome Project, the worm genome, the mouse genome, the ENCODE Pilot project, modENCODE, ENCODE, the 1000 Genomes Pilot, the 1000 Genomes project, GTEx, and the Epigenome Roadmap]

RNA sequencing (RNA-seq) technology
[Figure: full-length RNA isoforms (unknown) → RNA-seq experiments → RNA-seq data (observed); statistical inference goes from the observed data back to the unknown isoforms]

RNA sequencing (RNA-seq) experiment
[Figure: full-length RNA isoforms (1712 bp on average) → fragmentation → RNA fragments (< 600 bp) → processing and sequencing → RNA-seq reads (< 300 bp)]
RNA-seq reads ∝ isoform abundance × isoform length
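The proportionality on this slide can be illustrated with a small sketch. It is not taken from the talk: the function names, isoform labels, abundances, and lengths below are hypothetical, and the sketch assumes reads are sampled uniformly along each isoform with no technical bias.

```python
def expected_read_fractions(abundance, length):
    """Expected share of reads per isoform, assuming reads ∝ abundance × length."""
    weights = {iso: abundance[iso] * length[iso] for iso in abundance}
    total = sum(weights.values())
    return {iso: w / total for iso, w in weights.items()}


def abundance_from_reads(read_counts, length):
    """Invert the relation: divide counts by length, then renormalize to sum to 1."""
    per_base = {iso: read_counts[iso] / length[iso] for iso in read_counts}
    total = sum(per_base.values())
    return {iso: v / total for iso, v in per_base.items()}


# Hypothetical example: two equally abundant isoforms of different lengths.
abundance = {"isoform_A": 0.5, "isoform_B": 0.5}
length = {"isoform_A": 2000, "isoform_B": 1000}

fractions = expected_read_fractions(abundance, length)
recovered = abundance_from_reads({"isoform_A": 200, "isoform_B": 100}, length)
```

Under these assumptions the longer isoform receives more reads even though both isoforms are equally abundant, which is why read counts are divided by isoform length before abundances are compared.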

Mapping RNA-seq reads to the reference genome
[Figure: full-length RNA isoforms (1712 bp on average) → processing and sequencing → RNA-seq reads (< 300 bp) → mapping (alignment) → RNA-seq reads aligned to the genome AACGTCGTTG GCTGGT CCGGAGG AATCAAGAACTATAC, with read counts 2, 2, 1, 2 shown along the genome and summarized as a histogram of RNA-seq read counts]

Reference-based RNA-seq data analysis
1. Align RNA-seq reads to a reference genome
2. Analyze aligned reads at three levels
(a) gene-level: $g_i = n$
(b) exon-level: $\varphi_i = \frac{n_1}{n_1 + n_2}$
(c) transcript-level: ambiguous ($\alpha_1$, $\alpha_2$)
[Figure: DNA, mRNA, and RNA-seq reads, labeled with the read counts $n$, $n_1$, and $n_2$]
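The gene-level and exon-level summaries can be sketched as follows. The slide does not define its symbols, so this sketch assumes that the gene-level value is the total number of reads assigned to the gene, that n1 counts reads supporting inclusion of an exon, and that n2 counts reads supporting its exclusion. The function names and counts are hypothetical.

```python
def gene_level_count(reads_per_exon):
    """Total reads for a gene, assuming each read is assigned to exactly one exon."""
    return sum(reads_per_exon)


def exon_inclusion_ratio(n1, n2):
    """Ratio n1 / (n1 + n2); returns None when no informative reads exist.

    Assumption: n1 = reads supporting inclusion, n2 = reads supporting exclusion.
    """
    total = n1 + n2
    if total == 0:
        return None
    return n1 / total


# Hypothetical per-exon read counts for a 4-exon gene.
gene_count = gene_level_count([2, 2, 1, 2])
ratio = exon_inclusion_ratio(n1=3, n2=1)
no_data = exon_inclusion_ratio(n1=0, n2=0)
```

The transcript level has no such closed-form summary, because a read can be compatible with more than one isoform.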

Single-cell (sc) vs. bulk RNA-seq at the gene level
[Figure: from a tissue, scRNA-seq yields a genes × cells matrix, whereas bulk RNA-seq yields a single genes × tissue profile]

Bulk RNA-seq: transcript/isoform discovery & quantification

AIDE: annotation-assisted isoform discovery (isoform-level)

Isoform discovery: which isoforms are expressed?
• More than 90% of genes undergo alternative splicing in mammals [Hooper, Human Genomics, 2014].
• At least 35% of genetic diseases involve abnormal splicing [Manning et al., Nature Reviews Mol. Cell Biol., 2017].
[Figure: the gene AACGTCGT GCTG CCG AATCAA undergoes alternative splicing into isoform A (AACGUCGU GCUG CCG AAUCAA, exon 2 included) and isoform B (AACGUCGU CCG AAUCAA, exon 2 excluded)]

Isoform discovery: which isoforms are expressed?
[Figure: RNA-seq data and the genome (gene AACGTCGT GCTG CCG AATCAA) feed into statistical modeling, which asks: which isoforms are expressed? Candidate isoforms shown include AACGUCGU GCUG CCG AAUCAA, AACGUCGU GCUG, AACGUCGU CCG, AACGUCGU AAUCAA, AACGUCGU GCUG CCG, and GCUG CCG AAUCAA.]

Challenge 1: large number of candidate isoforms
Variable size (# of candidate isoforms) = $2^{\#\text{ of exons}} - 1$
[Figure: RNA-seq data, genome, and statistical modeling for the 4-exon gene AACGTCGT GCTG CCG AATCAA, with its candidate isoforms]
For this 4-exon gene, $2^4 - 1 = 15$ candidate isoforms
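The count above can be checked by enumeration. The sketch below assumes, as the formula implies, that every non-empty subset of exons kept in genomic order is a candidate isoform. The function name is hypothetical and the exon sequences are the toy sequences from the slide.

```python
from itertools import combinations


def candidate_isoforms(exons):
    """Enumerate every non-empty subset of exons, preserving genomic order."""
    candidates = []
    for k in range(1, len(exons) + 1):
        # combinations() keeps elements in their original (genomic) order.
        candidates.extend(combinations(exons, k))
    return candidates


exons = ["AACGUCGU", "GCUG", "CCG", "AAUCAA"]
candidates = candidate_isoforms(exons)
n_candidates = len(candidates)
```

Because the number of candidates doubles with each additional exon, explicit enumeration like this quickly becomes impractical for genes with many exons.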

Challenge 2: great information loss
• RNA-seq reads are very short compared with full-length isoforms.
• Most RNA-seq reads do not uniquely map to a single isoform.
• Technical biases are introduced into RNA-seq experiments.
[Figure: a read mapped to the gene could have come from isoform 1, 2, 3, or 4]
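The mapping ambiguity can be illustrated with a toy model. The exon composition of the four isoforms below is hypothetical (the slide does not give it), and the sketch simplifies by treating each read as falling entirely within one exon, ignoring reads that span exon junctions.

```python
# Hypothetical isoforms of a 4-exon gene, each described by the exons it contains.
isoforms = {
    "isoform 1": {1, 2, 3, 4},
    "isoform 2": {1, 3, 4},
    "isoform 3": {1, 2, 4},
    "isoform 4": {1, 4},
}


def compatible_isoforms(read_exon, isoforms):
    """Return the isoforms that contain the exon a read falls in."""
    return sorted(name for name, exons in isoforms.items() if read_exon in exons)


# For each exon, list the isoforms a read from that exon could have come from.
ambiguity = {exon: compatible_isoforms(exon, isoforms) for exon in (1, 2, 3, 4)}
uniquely_mapped_exons = [exon for exon, names in ambiguity.items() if len(names) == 1]
```

In this toy setting no exon is specific to a single isoform, so every read is compatible with at least two isoforms and its origin has to be inferred statistically.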

Existing isoform discovery methods
State-of-the-art methods for isoform discovery:
• SIIER [Jiang et al., Bioinformatics, 2009]
• Cufflinks [Trapnell et al., Nature Biotechnology, 2010]
• SLIDE [Li et al., Proc. Natl. Acad. Sci., 2011]
• StringTie [Pertea et al., Nature Biotechnology, 2015]
• ...
Limitations:
1. Low accuracy for genes with complex splicing structures.
2. Difficult to improve isoform-level performance [Kanitz et al., Genome Biology, 2015].
3. Usage of annotations results in false positives.

Usage of annotations results in false positives
Annotated isoforms are experimentally validated:
• Ensembl database: 203,903 isoforms [Zerbino et al., Nucleic Acids Research, 2017]
[Figure: gene 1 with its annotated isoforms 1 to 4; expressed isoforms in normal brain compared with annotated isoforms]
