Quantitatively deconvolving alternative RNA secondary structures
Regina Bohnert and Gunnar Rätsch
Friedrich Miescher Laboratory
of the Max Planck Society
Tübingen, Germany
July 15, 2011
HiTSeq-SIG, ISMB/ECCB 2011
Introduction Motivation
◮ RNA-seq: transcript
◮ But: information about
◮ Important role in basic
Figure adapted from Wikipedia
Regina Bohnert (FML, Tübingen) Secondary structure quantitation Jul 15, 2011 1 / 13
◮ Regulation of gene
Trp operon, figure taken from Wikipedia
◮ Mixtures of alternative
TPP riboswitch, figure taken from Wachter (2010)
Introduction PARS Protocol
[Figure: PARS workflow. After in vitro folding, mRNA is digested with RNase V1 or S1 nuclease, randomly fragmented, and prepared as a library for deep sequencing; the number of reads per position gives the V1 profile and the S1 profile.]
PARS (parallel analysis of RNA structure) protocol; figure taken from Kertesz et al. (2010)
Structure Quantitation with sQuant Bias
◮ Observation: no uniform
(Kertesz et al., 2010)
◮ Data simulation
◮ Simulated structures
(Hofacker et al., 1994)
◮ Generated read counts for
  ◮ Read starts at ds/ss
  ◮ Enzyme errors
  ◮ Fragmentation, size
[Figure: read counts along transcript position, shown without noise, after digestion, and after fragm. + size selection.]
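The simulation steps listed above can be sketched as follows. This is a minimal illustration, not the simulation used for the slides: the dot-bracket input, the function names, the error model, and the parameters `n_reads` and `error_rate` are all assumptions, and fragmentation and size selection are not modelled.

```python
import random

def paired_mask(dot_bracket):
    """True where a base is paired (ds), False where it is unpaired (ss)."""
    return [c in "()" for c in dot_bracket]

def simulate_read_starts(dot_bracket, n_reads=1000, error_rate=0.05, seed=0):
    """Simulate ds-specific and ss-specific read-start counts per position.

    Hypothetical error model: each cut lands on a random position of the
    enzyme's preferred type; with probability `error_rate` it lands on a
    random position of the other type instead.
    """
    rng = random.Random(seed)
    mask = paired_mask(dot_bracket)
    ds_pos = [i for i, m in enumerate(mask) if m]
    ss_pos = [i for i, m in enumerate(mask) if not m]
    counts = {"ds": [0] * len(mask), "ss": [0] * len(mask)}
    for enzyme, preferred, other in (("ds", ds_pos, ss_pos),
                                     ("ss", ss_pos, ds_pos)):
        for _ in range(n_reads):
            pool = other if rng.random() < error_rate else preferred
            if not pool:  # fall back if one position type is absent
                pool = preferred or other
            counts[enzyme][rng.choice(pool)] += 1
    return counts

# Toy hairpin; with error_rate=0.0 every cut respects the structure.
counts = simulate_read_starts("((((...))))..", n_reads=500, error_rate=0.0)
```

Raising `error_rate` moves a share of the ds-enzyme reads onto unpaired positions (and vice versa), which is one simple way to mimic enzyme errors.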
◮ Bias correction: Ridge regression
◮ Features: distances to
◮ Target: expected/observed
◮ Model explains ≈ 75% of
[Figure: simulated and predicted read counts along transcript position.]
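Ridge regression for the bias correction above can be sketched in closed form. The sketch assumes nothing about the actual sQuant feature set: the feature columns in the example, the regularisation strength `lam`, and the function names are hypothetical. Note that this simple version also penalises the constant (bias) column.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ridge_fit(X, y, lam=1.0):
    """Weights minimising ||Xw - y||^2 + lam * ||w||^2 (normal equations)."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve_linear(xtx, xty)

def ridge_predict(X, w):
    """Predicted target for each feature row."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]

# Hypothetical features per position: [constant, distance to a ds/ss boundary]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]  # toy target values
w = ridge_fit(X, y, lam=0.1)
pred = ridge_predict(X, w)
```

Larger values of `lam` shrink the weights towards zero, trading some fit on the training positions for stability.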
Structure Quantitation with sQuant Quantitation
◮ How can known transcripts with structural information be
  ◮ Single transcript with multiple possible structures
  ◮ Alternative transcripts with multiple possible structures
◮ How can structures be inferred from read data?
[Figure: read counts (ds and ss) along transcript position 5’ -> 3’ for Structure A and Structure B, and for the mixture of structures with weights wA and wB.]
(Bohnert and Rätsch, 2010)
[Figure: expected read coverage along genome position 5’ -> 3’ for a short transcript and a long transcript, and for the mixture of transcripts with weights wA and wB.]
Mixture of structures A and B with weights $w_A$ and $w_B$, at each transcript position $p$:

$$M^{ss}_p = w_A A^{ss}_p + w_B B^{ss}_p$$
$$M^{ds}_p = w_A A^{ds}_p + w_B B^{ds}_p$$

The weights are chosen so that the mixture profiles match the counts $C^{ds}_p$ and $C^{ss}_p$:

$$\min_{w_A, w_B} \sum_p \left[ \ell\left(M^{ds}_p, C^{ds}_p\right) + \ell\left(M^{ss}_p, C^{ss}_p\right) \right]$$
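As a worked instance of the mixture model above, the two weights can be estimated by non-negative least squares. This is a sketch under stated assumptions, not the sQuant optimiser: it assumes a squared loss, assumes that C holds the observed ds and ss counts, and stacks the ds and ss profiles into one vector per structure. The function names are hypothetical.

```python
def squared_error(A, B, C, w_a, w_b):
    """Sum of squared residuals of C against the weighted mixture."""
    return sum((c - w_a * a - w_b * b) ** 2 for a, b, c in zip(A, B, C))

def fit_two_weights(A, B, C):
    """Non-negative least-squares weights for C ~ w_a * A + w_b * B.

    A, B: expected profiles of the two structures (ds then ss, concatenated).
    C: observed counts in the same order (assumed meaning of C).
    """
    aa = sum(a * a for a in A)
    bb = sum(b * b for b in B)
    ab = sum(a * b for a, b in zip(A, B))
    ac = sum(a * c for a, c in zip(A, C))
    bc = sum(b * c for b, c in zip(B, C))
    det = aa * bb - ab * ab
    if abs(det) > 1e-12:
        # Unconstrained solution of the 2x2 normal equations.
        w_a = (ac * bb - bc * ab) / det
        w_b = (bc * aa - ac * ab) / det
        if w_a >= 0 and w_b >= 0:
            return w_a, w_b
    # Otherwise the optimum lies on a boundary where one weight is zero.
    candidates = [
        (max(ac / aa, 0.0) if aa > 0 else 0.0, 0.0),
        (0.0, max(bc / bb, 0.0) if bb > 0 else 0.0),
    ]
    return min(candidates, key=lambda w: squared_error(A, B, C, *w))

# Toy profiles: ds part followed by ss part.
A = [1, 1, 0, 0] + [0, 0, 1, 1]
B = [1, 0, 0, 1] + [0, 1, 1, 0]
C = [3 * a + 1 * b for a, b in zip(A, B)]  # noise-free mixture, wA=3, wB=1
w_a, w_b = fit_two_weights(A, B, C)
```

With more than two candidate structures the same idea applies, but the boundary search no longer works; a general non-negative least-squares or quadratic-programming solver would be needed.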
◮ 500 genes, one transcript per gene
◮ Complex mixture of 1-10 known structures
◮ Use sQuant to quantify structures
◮ Pearson’s correlation with ground truth: 90.9%

◮ 500 genes, with ≥ 2 alternative transcripts per gene
◮ Complex mixture of 1-10 structures
◮ Use sQuant to quantify structures
◮ Pearson’s correlation with ground truth: 89.8%

◮ 1000 genes, 1 structure per transcript
◮ Pearson’s correlation with ground truth
  ◮ w/o bias correction: 82.6%
  ◮ w/
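The evaluation metric quoted above, Pearson's correlation between estimated and true abundances, can be computed as in this short sketch. The function name and the toy weight vectors are illustrative only; they are not values from the slides.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Toy example: true vs. estimated structure weights.
true_w = [0.1, 0.4, 0.5, 0.9]
est_w = [0.2, 0.35, 0.55, 0.8]
r = pearson(true_w, est_w)
```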
◮ reads are short
◮ alternative events are far apart
◮ difficult to determine which exon combination is present
◮ 500 genes with ≥ 2 alternative transcripts
◮ Ideal situation: one known structure per transcript
◮ Compare quantitation of RNA transcripts/structures
Structure Quantitation with sQuant Structure Predictions
[Figure: Hamming distance to true structure and minimum free energy of predicted structure, each plotted against the number of included constraints.]
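The Hamming distance to the true structure used in the figure above can be illustrated with a small sketch. How the slides define the distance is not recoverable; this version assumes dot-bracket strings, compares only the paired/unpaired state at each position, and normalises by length. The function name is hypothetical.

```python
def hamming_distance(struct_a, struct_b):
    """Fraction of positions whose paired/unpaired state differs.

    Both arguments are dot-bracket strings of equal length, where '(' and
    ')' mark paired (ds) bases and '.' marks unpaired (ss) bases.
    """
    if len(struct_a) != len(struct_b):
        raise ValueError("structures must have the same length")
    if not struct_a:
        return 0.0
    diff = sum((a in "()") != (b in "()")
               for a, b in zip(struct_a, struct_b))
    return diff / len(struct_a)

# Two toy structures that differ at two of six positions.
d = hamming_distance("((..))", "(....)")
```

A distance that compares the actual pairing partners rather than only the paired state would be stricter; which variant is appropriate depends on the analysis.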
$M^{ds}_p$ & $M^{ss}_p$
$A^{ds}_p$ & $A^{ss}_p$
$$M^{ds}_{p,\mathrm{new}} = M^{ds}_p - w_s A^{ds}_p$$
$$M^{ss}_{p,\mathrm{new}} = M^{ss}_p - w_s A^{ss}_p$$
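The update equations above amount to subtracting the weighted profile of one structure from the current ds and ss profiles. A minimal sketch of that single step follows; the function name is hypothetical, and the surrounding loop (predicting a structure and estimating its weight `w_s`) is not shown because its details are not given here.

```python
def subtract_structure(M_ds, M_ss, A_ds, A_ss, w_s):
    """Apply M_new = M - w_s * A to the ds and ss profiles.

    M_ds, M_ss: current mixture profiles per transcript position.
    A_ds, A_ss: profiles of the structure being removed.
    w_s: weight assigned to that structure.
    """
    new_ds = [m - w_s * a for m, a in zip(M_ds, A_ds)]
    new_ss = [m - w_s * a for m, a in zip(M_ss, A_ss)]
    return new_ds, new_ss

# Toy profiles over four positions.
M_ds, M_ss = [4.0, 3.0, 0.0, 1.0], [0.0, 1.0, 4.0, 3.0]
A_ds, A_ss = [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]
new_ds, new_ss = subtract_structure(M_ds, M_ss, A_ds, A_ss, w_s=3.0)
```

The equations do not clip negative values, so neither does the sketch; with noisy counts the residual profiles can dip below zero.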
◮ 50 genes with 1-10 structures
◮ Read counts w/o & w/ biases
◮ Run sQuant.denovo to identify
◮ Measure distance to nearest
[Figure: average distance to nearest structure against the number of structures in the mixture, for sQuant.struct w/o noise, sQuant.struct w/ noise, and the MFE baseline.]
Summary
◮ Prediction of single structures appears feasible
◮ Quantitation of multiple known structures works well
◮ PARS-seq may have an advantage over RNA-seq for transcript
◮ Enzyme digestion leads to predictable, non-uniform read
◮ Predicting multiple mixed structures appears difficult
◮ So far based on simulations, need more real data . . .
◮ More information on
◮ The slides can be found on
Regina Bohnert and Gunnar Rätsch. … (Web Server issue):W348–51, Jul 2010. doi: 10.1093/nar/gkq448.
Regina Bohnert, Jonas Behr, and Gunnar Rätsch. … 13):P5, Oct 2009.
Ivo L. Hofacker, Walter Fontana, Peter F. Stadler, L. Sebastian Bonhoeffer, Manfred Tacker, and Peter Schuster. Fast Folding and Comparison of RNA Secondary Structures. Monatshefte für Chemie / Chemical Monthly, 125:167–188, 1994.
Michael Kertesz, Yue Wan, Elad Mazor, John L Rinn, Robert C Nutter, Howard Y Chang, and Eran Segal. Genome-wide measurement of RNA secondary structure in yeast. Nature, 467(7311):103–7, Sep 2010. doi: 10.1038/nature09322.
Andreas Wachter. Riboswitch-mediated control of gene expression in eukaryotes. RNA Biology, 7(1):67–76, 2010.