SLIDE 1

Quantitatively deconvolving alternative RNA secondary structures

Regina Bohnert and Gunnar Rätsch

Friedrich Miescher Laboratory of the Max Planck Society

Tübingen, Germany

July 15, 2011

HiTSeq-SIG, ISMB/ECCB 2011

SLIDE 2

Introduction Motivation

RNA Secondary Structure and Its Role

◮ RNA-seq: transcript identification with quantitation

◮ But: information about structure is missing

◮ Important role in basic cellular processes

[Figure: RNA-seq overview: pre-mRNA (introns, exons) is spliced into mRNA; short reads and junction reads are aligned to the reference genome. Figure adapted from Wikipedia]

Regina Bohnert (FML, Tübingen) Secondary structure quantitation Jul 15, 2011 1 / 13

SLIDE 3

Introduction Motivation

RNA Secondary Structure and Its Role

◮ RNA-seq: transcript identification with quantitation

◮ But: information about structure is missing

◮ Important role in basic cellular processes

◮ Regulation of gene expression

Trp operon, figure taken from Wikipedia

SLIDE 4

Introduction Motivation

RNA Secondary Structure and Its Role

◮ RNA-seq: transcript identification with quantitation

◮ But: information about structure is missing

◮ Important role in basic cellular processes

◮ Regulation of gene expression, alternative splicing

◮ Mixtures of alternative structures may co-exist

TPP riboswitch, figure taken from Wachter (2010)

SLIDE 5

Introduction PARS Protocol

Profiling RNA Secondary Structures

[Figure: PARS library preparation (in vitro folding of mRNA; RNase V1 digestion or S1 nuclease digestion; random fragmentation; library preparation and deep sequencing) and the resulting V1 and S1 profiles (number of reads per nucleotide along the sequence AGGCAUGCACCUGGUAGCUAGUCUUUAAACC …)]

PARS (parallel analysis of RNA structure) protocol, figure taken from Kertesz et al. (2010)
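A V1 or S1 profile is a per-nucleotide count of enzyme cut sites. Assuming that the 5' start of each sequenced read marks the cut site (0-based transcript coordinates), a profile can be built by counting read starts per position, as in the minimal Python sketch below. The function name `cut_site_profile` and the toy read positions are hypothetical.

```python
import numpy as np

def cut_site_profile(read_starts, transcript_length):
    """Count reads whose 5' start maps to each transcript position.

    Assumption: the read start marks the nucleotide at the enzyme cut site.
    """
    starts = np.asarray(read_starts, dtype=int)
    # Ignore starts that fall outside the transcript.
    starts = starts[(starts >= 0) & (starts < transcript_length)]
    return np.bincount(starts, minlength=transcript_length)

# Hypothetical toy data: 0-based read start positions from each digestion.
v1_profile = cut_site_profile([2, 2, 3, 7, 7, 7], 10)
s1_profile = cut_site_profile([0, 5, 5, 9], 10)
print(v1_profile)  # [0 0 2 1 0 0 0 3 0 0]
print(s1_profile)  # [1 0 0 0 0 2 0 0 0 1]
```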

SLIDE 6

Structure Quantitation with sQuant Bias

Bias Correction

◮ Observation: read counts are not uniformly distributed along transcripts (Kertesz et al., 2010)

◮ Data simulation
  ◮ Simulated structures (Hofacker et al., 1994)
  ◮ Generated read counts for ds/ss experiments (PARS-like), modelling
    ◮ read starts at ds/ss positions
    ◮ enzyme errors
    ◮ fragmentation and size selection

[Figure: simulated read counts along the transcript position: without noise, after digestion, and after fragmentation + size selection]
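The first simulation stages can be sketched as follows: take a structure in dot-bracket notation (the format used by the Vienna RNA package of Hofacker et al., 1994), let the ds experiment produce read starts at paired positions and the ss experiment at unpaired ones, and add noise. This is a minimal sketch under loudly stated assumptions: the Poisson sampling, the `depth` and `error_rate` parameters (enzyme errors, i.e. cuts at the wrong kind of position) and the function names are hypothetical, and fragmentation and size selection are not modelled.

```python
import numpy as np

def pairing_mask(dot_bracket):
    """True where a position is paired ('(' or ')'), False where unpaired."""
    return np.array([c in "()" for c in dot_bracket])

def simulate_counts(dot_bracket, depth=20.0, error_rate=0.05, seed=0):
    """Simulate PARS-like ds and ss read-start counts for one structure.

    Hypothetical model: each position receives Poisson-distributed read
    starts. The ds enzyme cuts paired positions at rate `depth` and unpaired
    positions at rate `depth * error_rate` (enzyme errors); the ss enzyme
    does the opposite. Fragmentation and size selection are not modelled.
    """
    rng = np.random.default_rng(seed)
    paired = pairing_mask(dot_bracket)
    ds_rate = np.where(paired, depth, depth * error_rate)
    ss_rate = np.where(paired, depth * error_rate, depth)
    return rng.poisson(ds_rate), rng.poisson(ss_rate)

ds, ss = simulate_counts("((((....))))..")
print(ds)
print(ss)
```

Setting `error_rate=0` gives the noise-free case, in which the ds counts are zero at every unpaired position and the ss counts are zero at every paired one.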

SLIDE 7

Structure Quantitation with sQuant Bias

Bias Correction

◮ Observation: read counts are not uniformly distributed along transcripts (Kertesz et al., 2010)

◮ Data simulation
  ◮ Simulated structures (Hofacker et al., 1994)
  ◮ Generated read counts for ds/ss experiments (PARS-like)

◮ Bias correction: ridge regression
  ◮ Features: distances to proximal cutting sites
  ◮ Target: ratio of expected to observed read counts
  ◮ The model explains ≈ 75% of the variability

[Figure: simulated vs. predicted read counts along the transcript position]
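Ridge regression has the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. The sketch below fits it with NumPy. The slide does not give sQuant's exact feature encoding, so the features here (distance from each position to the nearest cut site on either side, capped at `max_dist`, plus an intercept), the toy target and all function names are assumptions. The share of variability explained is measured as R².

```python
import numpy as np

def distance_features(cut_sites, length, max_dist=50):
    """Per-position features: distance to the nearest cut site on each side.

    Hypothetical encoding: column 0 is the distance to the nearest cut strictly
    upstream (cut < p), column 1 the distance to the nearest cut at or
    downstream (cut >= p), both capped at max_dist; column 2 is an intercept.
    """
    cuts = np.sort(np.asarray(cut_sites, dtype=int))
    pos = np.arange(length)
    idx = np.searchsorted(cuts, pos, side="left")  # index of first cut >= p
    up = np.full(length, float(max_dist))
    down = np.full(length, float(max_dist))
    has_up = idx > 0
    up[has_up] = pos[has_up] - cuts[idx[has_up] - 1]
    has_down = idx < len(cuts)
    down[has_down] = cuts[idx[has_down]] - pos[has_down]
    return np.column_stack([np.minimum(up, max_dist),
                            np.minimum(down, max_dist),
                            np.ones(length)])

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y.

    For simplicity the intercept column is penalised like the other features.
    """
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def explained_variance(X, y, w):
    """R^2: fraction of the variance of y explained by the fitted model."""
    residual = y - X @ w
    return 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)

# Toy usage with a hypothetical target (e.g. an expected/observed count ratio).
rng = np.random.default_rng(0)
X = distance_features([10, 25, 40, 70], length=100)
y = X @ np.array([-0.01, 0.02, 1.0]) + rng.normal(scale=0.05, size=100)
w = fit_ridge(X, y, lam=0.1)
print(w, explained_variance(X, y, w))
```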

SLIDE 8

Structure Quantitation with sQuant Quantitation

Questions

◮ How can known transcripts with structural information be quantified?
  ◮ Single transcript with multiple possible structures
  ◮ Alternative transcripts with multiple possible structures

◮ How can structures be inferred from read data?

SLIDE 9

Structure Quantitation with sQuant Quantitation

sQuant – Basic Idea

[Figure, panels A, B, C: ds and ss read counts along the transcript position (5' → 3') for Structure A, for Structure B, and for a mixture of both structures with weights wA and wB]

SLIDE 10

Structure Quantitation with sQuant Quantitation

Recap from Transcript Quantitation

rQuant (Bohnert et al., 2009; Bohnert and Rätsch, 2010)

[Figure: read coverage along the genome position (5' → 3') for a short and a long transcript, and expected vs. observed coverage of the mixture of transcripts (panels M, A, B, C; weights wA, wB)]

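$$M_p = w_A A_p + w_B B_p \quad\Rightarrow\quad \min_{w_A, w_B} \sum_p \ell(M_p, C_p)$$

where $A_p$ and $B_p$ are the expected coverages of the two transcripts at position $p$, $M_p$ is the expected coverage of the mixture, $C_p$ is the observed coverage, and $\ell$ is a loss function.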
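The slide leaves the loss ℓ unspecified. Assuming squared loss, ℓ(M_p, C_p) = (M_p − C_p)², and non-negative weights, fitting w_A and w_B is a non-negative least-squares problem. The sketch below solves it with cyclic coordinate descent in NumPy; the function name, the non-negativity constraint and the toy coverage profiles are assumptions for illustration, not rQuant's actual implementation.

```python
import numpy as np

def fit_mixture_weights(profiles, observed, n_iter=500):
    """Non-negative least squares for mixture weights.

    Minimises sum_p (M_p - C_p)^2 with M_p = sum_k w_k * profiles[k, p] and
    w_k >= 0, using cyclic coordinate descent (exact minimisation along one
    weight at a time, then clipping at zero).
    """
    P = np.asarray(profiles, dtype=float)   # (n_components, n_positions)
    c = np.asarray(observed, dtype=float)   # observed coverage C_p
    w = np.zeros(P.shape[0])
    residual = c.copy()                     # C_p - M_p (M = 0 initially)
    sq_norms = (P ** 2).sum(axis=1)
    for _ in range(n_iter):
        for k in range(P.shape[0]):
            if sq_norms[k] == 0:
                continue  # an all-zero profile cannot explain any coverage
            new_wk = max(0.0, w[k] + P[k] @ residual / sq_norms[k])
            residual -= (new_wk - w[k]) * P[k]
            w[k] = new_wk
    return w

# Hypothetical toy coverage: a short transcript A nested in a long transcript B.
A = np.array([0, 1, 1, 1, 1, 0, 0, 0])
B = np.ones(8)
C = 3.0 * A + 2.0 * B                  # noise-free observed coverage
print(fit_mixture_weights([A, B], C))  # approximately [3, 2]
```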

SLIDE 11

Structure Quantitation with sQuant Quantitation

sQuant – Basic Idea

[Figure: ds and ss read counts along the transcript position (5' → 3') for Structure A, for Structure B, and for the mixture of structures with weights wA and wB (panels M, A, B, C)]

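$$M^{ss}_p = w_A A^{ss}_p + w_B B^{ss}_p, \qquad M^{ds}_p = w_A A^{ds}_p + w_B B^{ds}_p$$

$$\Rightarrow\quad \min_{w_A, w_B} \sum_p \Big[ \ell\big(M^{ds}_p, C^{ds}_p\big) + \ell\big(M^{ss}_p, C^{ss}_p\big) \Big]$$

Both experiments share the same weights: $A^{ds}_p$, $A^{ss}_p$ (and likewise $B^{ds}_p$, $B^{ss}_p$) are the expected ds and ss read counts of each structure at position $p$, and $C^{ds}_p$, $C^{ss}_p$ are the observed counts.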
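Because the ds and ss terms share the same weights, both experiments can be stacked into one long profile per structure and fitted jointly. The sketch below assumes ℓ is the Poisson negative log-likelihood (a natural choice for read counts, but not stated on the slide) and minimises it with the standard EM-style multiplicative update for non-negative weights, w_k ← w_k · Σ_p A_kp C_p / M_p / Σ_p A_kp. Function names and toy profiles are hypothetical.

```python
import numpy as np

def squant_weights(ds_profiles, ss_profiles, ds_counts, ss_counts, n_iter=2000):
    """Estimate structure weights from ds and ss read counts jointly.

    ds_profiles, ss_profiles: (n_structures, n_positions) expected counts
    ds_counts, ss_counts:     (n_positions,) observed counts C^ds_p, C^ss_p
    Assumes a Poisson loss; multiplicative updates keep every weight >= 0.
    """
    # Stack ds and ss so that one weight vector explains both experiments.
    A = np.hstack([ds_profiles, ss_profiles]).astype(float)
    C = np.concatenate([ds_counts, ss_counts]).astype(float)
    w = np.ones(A.shape[0])
    row_sums = A.sum(axis=1)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        M = w @ A  # expected counts M_p at every stacked position
        w *= (A @ (C / (M + eps))) / (row_sums + eps)
    return w

# Hypothetical expected profiles for two structures (ds high where paired).
ds_A, ss_A = [5, 5, 0, 0, 5, 5], [0, 0, 5, 5, 0, 0]
ds_B, ss_B = [0, 5, 5, 5, 5, 0], [5, 0, 0, 0, 0, 5]
w_true = np.array([0.7, 0.3])
ds_obs = w_true @ np.array([ds_A, ds_B], dtype=float)
ss_obs = w_true @ np.array([ss_A, ss_B], dtype=float)
print(squant_weights([ds_A, ds_B], [ss_A, ss_B], ds_obs, ss_obs))  # close to [0.7, 0.3]
```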

SLIDE 12

Structure Quantitation with sQuant Quantitation

Results

1. Mixture of structures
  ◮ 500 genes, one transcript per gene
  ◮ Complex mixture of 1-10 known structures
  ◮ Use sQuant to quantify structures
  ◮ Pearson's correlation with ground truth: 90.9%

2. Mixture of structures and transcript isoforms
  ◮ 500 genes with ≥ 2 alternative transcripts per gene
  ◮ Complex mixture of 1-10 structures
  ◮ Use sQuant to quantify structures
  ◮ Pearson's correlation with ground truth: 89.8%

3. Advantage of bias correction
  ◮ 1000 genes, one structure per transcript
  ◮ Pearson's correlation with ground truth: 82.6% w/o bias correction vs. 83.5% w/ bias correction

SLIDE 13

Structure Quantitation with sQuant Quantitation

Does structure help for transcript quantitation?

RNA-seq-based transcript quantitation is difficult when

◮ reads are short
◮ alternative events are far apart
◮ the exon combination that is present is hard to determine

SLIDE 14

Structure Quantitation with sQuant Quantitation

Does structure help for transcript quantitation?

RNA-seq-based transcript quantitation is difficult, but different exon combinations lead to different RNA structures!

Comparison: PARS-seq & sQuant vs. RNA-seq & rQuant

◮ 500 genes with ≥ 2 alternative transcripts
◮ Ideal situation: one known structure per transcript
◮ Compare quantitation of RNA transcripts/structures

sQuant: 85.1% vs. rQuant: 64.9%

SLIDE 15

Structure Quantitation with sQuant Structure Predictions

Questions

◮ How can known transcripts with structural information be quantified?

◮ How can structures be inferred from read data?

SLIDE 16

Structure Quantitation with sQuant Structure Predictions

De Novo Structure Predictions

Idea: use the most confident positions as constraints for paired and unpaired bases, scoring each position $p$ by (τ = pseudo count)

$$\mathrm{score}(p) = \log \frac{ds(p) + \tau}{ss(p) + \tau}$$
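The scoring step can be sketched in a few lines. The slide does not say how many positions are used or how the constraints are passed to the folding algorithm; the sketch below assumes that the k highest-scoring positions (if their score is positive) are forced to be paired and the k lowest-scoring ones (if negative) to be unpaired, written in the hard-constraint string notation of ViennaRNA's RNAfold (`|` paired, `x` unpaired, `.` unconstrained). The function names and the use of the natural logarithm are assumptions.

```python
import numpy as np

def pars_like_score(ds_counts, ss_counts, tau=1.0):
    """score(p) = log((ds(p) + tau) / (ss(p) + tau)); tau is the pseudo count."""
    ds = np.asarray(ds_counts, dtype=float)
    ss = np.asarray(ss_counts, dtype=float)
    return np.log((ds + tau) / (ss + tau))

def constraint_string(score, k):
    """Mark up to k confidently paired ('|') and unpaired ('x') positions.

    Hypothetical encoding in RNAfold hard-constraint notation; all other
    positions stay unconstrained ('.').
    """
    score = np.asarray(score, dtype=float)
    order = np.argsort(score)  # ascending: most unpaired-like first
    paired = np.array([p for p in order[::-1][:k] if score[p] > 0], dtype=int)
    unpaired = np.array([p for p in order[:k] if score[p] < 0], dtype=int)
    chars = np.full(len(score), ".")
    chars[paired] = "|"
    chars[unpaired] = "x"
    return "".join(chars)

s = pars_like_score([12, 10, 1, 0, 0, 9, 11], [0, 1, 8, 10, 7, 0, 1])
print(constraint_string(s, k=2))  # |..xx|.
```

Sweeping k from 0 upwards corresponds to the "number of included constraints" varied in the experiments on this slide.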

[Figure, three panels: single structure w/o bias; single structure with bias; two structures with bias. x-axis: number of included constraints; y-axes: Hamming distance to true structure and minimum free energy of predicted structure]

SLIDE 17

Structure Quantitation with sQuant Structure Predictions

Predicting Multiple Structures

Idea:

1. Predict the most abundant structure $s$ based on $M^{ds}_p$ and $M^{ss}_p$
2. Quantify structure $s$ with sQuant ⇒ $w_s$
3. Adjust the read counts based on the predicted profiles $A^{ds}_p$ and $A^{ss}_p$:

$$M^{ds}_{p,\mathrm{new}} = M^{ds}_p - w_s A^{ds}_p, \qquad M^{ss}_{p,\mathrm{new}} = M^{ss}_p - w_s A^{ss}_p$$

4. Go to 1 or stop
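The four steps on this slide amount to a greedy "peeling" loop. In the sketch below, de novo structure prediction is replaced by a hypothetical stand-in that picks, from a fixed list of candidate ds/ss profiles, the one best aligned with the current read counts; quantification uses a single-structure least-squares weight instead of full sQuant; and adjusted counts are clipped at zero. All three are simplifying assumptions for illustration, as is every function name.

```python
import numpy as np

def peel_structures(ds_counts, ss_counts, candidates,
                    max_structures=5, min_weight=1e-3):
    """Greedy multi-structure estimation (simplified sketch).

    candidates: list of (A_ds, A_ss) expected-count profiles, a hypothetical
    stand-in for de novo structure prediction.
    Returns a list of (candidate index, weight) pairs in the order found.
    """
    M = np.concatenate([ds_counts, ss_counts]).astype(float)
    profiles = [np.concatenate(c).astype(float) for c in candidates]
    found = []
    for _ in range(max_structures):
        # Step 1 (stand-in): candidate whose profile is best aligned with M.
        scores = [p @ M / (np.linalg.norm(p) + 1e-12) for p in profiles]
        s = int(np.argmax(scores))
        A = profiles[s]
        # Step 2 (simplified): non-negative least-squares weight for A alone.
        w_s = max(0.0, A @ M / (A @ A + 1e-12))
        # Step 4: stop once the new structure explains almost nothing.
        if w_s < min_weight:
            break
        found.append((s, w_s))
        # Step 3: subtract its predicted counts; clip at zero (assumption).
        M = np.clip(M - w_s * A, 0.0, None)
    return found

# Hypothetical candidates with non-overlapping profiles.
cands = [([4, 4, 0, 0], [0, 0, 4, 4]), ([0, 0, 4, 4], [4, 4, 0, 0])]
ds = 0.6 * np.array([4, 4, 0, 0]) + 0.4 * np.array([0, 0, 4, 4])
ss = 0.6 * np.array([0, 0, 4, 4]) + 0.4 * np.array([4, 4, 0, 0])
print(peel_structures(ds, ss, cands))  # approximately [(0, 0.6), (1, 0.4)]
```

Because each weight is fitted on its own, overlapping profiles can push too much weight onto the first structure found; re-quantifying all structures found so far jointly is one possible refinement.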

Preliminary results:

◮ 50 genes with 1-10 structures
◮ Read counts w/o & w/ biases
◮ Run sQuant.denovo to identify structures
◮ Measure distance to nearest prediction

[Figure: average distance to nearest structure vs. number of structures in mixture (1-10) for sQuant.struct w/o noise, sQuant.struct w/ noise, and the MFE baseline]
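The slides name the evaluation metric ("Hamming distance to true structure", "distance to nearest prediction") without defining it. Assuming a Hamming distance between dot-bracket strings, normalised by length, and averaging over the true structures of a mixture the distance to the closest predicted structure, the computation might look like this; the function names are hypothetical.

```python
def hamming(s1, s2):
    """Fraction of positions at which two equal-length dot-bracket strings differ."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)

def avg_distance_to_nearest(true_structs, predicted):
    """Mean over true structures of the distance to the closest prediction."""
    total = sum(min(hamming(t, p) for p in predicted) for t in true_structs)
    return total / len(true_structs)

truth = ["((((....))))", "............"]
preds = ["((((....))))", "(((......)))"]
print(avg_distance_to_nearest(truth, preds))  # 0.25
```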

SLIDE 18

Summary

Summary

◮ Prediction of single structures appears feasible
◮ Quantitation of multiple known structures works well
◮ PARS-seq may have an advantage over RNA-seq for transcript quantitation
◮ Enzyme digestion leads to predictable, non-uniform read coverage
◮ Predicting multiple mixed structures appears difficult
◮ So far based on simulations; more real data are needed
◮ More information: http://www.fml.mpg.de/raetsch/suppl/squant
◮ The slides can be found at http://www.fml.mpg.de/raetsch/lectures

Thank you for your attention!

SLIDE 19

References

Regina Bohnert and Gunnar Rätsch. rQuant.web: a tool for RNA-Seq-based transcript quantitation. Nucleic Acids Research, 38(Web Server issue):W348–51, Jul 2010. doi: 10.1093/nar/gkq448.

Regina Bohnert, Jonas Behr, and Gunnar Rätsch. Transcript quantification with RNA-Seq data. BMC Bioinformatics, 10(Suppl 13):P5, Oct 2009.

Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred Tacker, and Peter Schuster. Fast Folding and Comparison of RNA Secondary Structures. Monatshefte für Chemie / Chemical Monthly, 125:167–188, 1994.

Michael Kertesz, Yue Wan, Elad Mazor, John L Rinn, Robert C Nutter, Howard Y Chang, and Eran Segal. Genome-wide measurement of RNA secondary structure in yeast. Nature, 467(7311):103–7, Sep 2010. doi: 10.1038/nature09322.

Andreas Wachter. Riboswitch-mediated control of gene expression in eukaryotes. RNA Biology, 7(1):67–76, 2010.