

1. The Ensemble of RNA Structures

Example: the best structures of the RNA sequence

GGGGGUAUAGCUCAGGGGUAGAGCAUUUGACUGCAGAUCAAGAGGUCCCUGGUUCAAAUCCAGGUGCCCCCU

with their free energies in kcal/mol:

(((((((..((((.......))))...........((((....))))(((((.......)))))))))))). -28.10
(((((((..((((.......))))....((((.(.......).))))(((((.......)))))))))))). -27.90
((((((((.((((.......))))(((((((((..((((....))))..)))).)))))....)))))))). -27.80
((((((((.((((.......))))(((((((((..((((....))))..))).))))))....)))))))). -27.80
(((((((..((((.......))))....((((...........))))(((((.......)))))))))))). -27.60
(((((((..((((.......))))....(((..(.......)..)))(((((.......)))))))))))). -27.50
((((((((.((((.......)))).((((((((..((((....))))..)))).)))).....)))))))). -27.20
((((((((.((((.......)))).((((((((..((((....))))..))).))))).....)))))))). -27.20
((((((((.((((.......))))...........((((....)))).((((.......)))))))))))). -27.20
((((((...((((.......))))...........((((....))))(((((.......))))).)))))). -27.20
(((((((...(((...(((...(((......)))..)))..)))...(((((.......)))))))))))). -27.10
((((((((.((((.......))))((((((((...((((....))))...))).)))))....)))))))). -27.00
((((((((.((((.......))))((((((((...((((....))))...)).))))))....)))))))). -27.00
((((((((.((((.......))))....((((.(.......).)))).((((.......)))))))))))). -27.00
(((((((..((((.......)))).((((((....).))))).....(((((.......)))))))))))). -27.00
(((((((..((((.......))))...........(((......)))(((((.......)))))))))))). -27.00
((((((...((((.......))))....((((.(.......).))))(((((.......))))).)))))). -27.00
((((((((.((((.......))))(((((((((..(((......)))..)))).)))))....)))))))). -26.70
((((((((.((((.......))))(((((((((..(((......)))..))).))))))....)))))))). -26.70
((((((((.((((.......))))....((((...........)))).((((.......)))))))))))). -26.70
(((((((..((((.......)))).(((((.......))))).....(((((.......)))))))))))). -26.70
((((((...((((.......))))....((((...........))))(((((.......))))).)))))). -26.70

The set of all non-crossing RNA structures of an RNA sequence S is called the (structure) ensemble P of S.

S.Will, 18.417, Fall 2011
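Dot-bracket strings like the ones above can be parsed with a simple stack, precisely because the ensemble contains only non-crossing structures. A minimal sketch (the function name is my own, not from the slides):

```python
def base_pairs(structure: str):
    """Parse a dot-bracket string into a set of base pairs (i, j), 0-indexed.

    A single stack suffices because dot-bracket notation can only express
    non-crossing (pseudoknot-free) structures.
    """
    stack, pairs = [], set()
    for j, c in enumerate(structure):
        if c == '(':
            stack.append(j)
        elif c == ')':
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif c != '.':
            raise ValueError(f"unexpected character {c!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs

# The best (mfe) structure from the list above: 20 base pairs in four helices.
mfe = "(((((((..((((.......))))...........((((....))))(((((.......))))))))))))."
print(len(base_pairs(mfe)))  # 20
```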

2. Is Minimal Free Energy Structure Prediction Useful?

• BIG PLUS: the loop-based energy model is quite realistic
• Still, the mfe structure may be “wrong”: Why?
• Lesson: be careful, be sceptical! (as always, but in particular when biology is involved)
• What would you improve?

3. Probability of a Structure

How probable is an RNA structure P for an RNA sequence S?

GOAL: define the probability Pr[P | S].

IDEA: Think of RNA folding as a dynamic system of structures (= states of the system). Given much time, a sequence S will form every possible structure P. For each structure there is a probability of observing it at a given time. This means: we are looking for a probability distribution!

Requirements: the probability depends on the energy (the lower the energy, the more probable). No additional assumptions!

4. Distribution of States in a System

Definition (Boltzmann distribution)
Let X = {X_1, ..., X_N} denote a system of states, where state X_i has energy E_i. The system is Boltzmann distributed with temperature T iff

    Pr[X_i] = exp(-β E_i) / Z,    where Z := Σ_i exp(-β E_i) and β = (k_B T)^(-1).

Remarks
• broadly used in physics to describe systems of all kinds
• the Boltzmann distribution is usually assumed for the thermodynamic equilibrium (i.e. after sufficiently much time)
• the transfer to RNA is easy to see: structures = states, with their energies
• why temperature?
  • very high temperature: all states equally probable
  • very low temperature: only the best states occur
• k_B ≈ 1.38 × 10^(-23) J/K is known as the Boltzmann constant; β is called the inverse temperature
• exp(-β E_i) is called the Boltzmann weight of X_i
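The temperature behaviour described in the remarks can be made concrete with a small sketch. Working per mole (energies in kcal/mol, gas constant R instead of k_B) matches the units used on the slides; the function name, the default temperature of 37 °C, and the min-energy shift for numerical stability are my additions:

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K), the per-mole analogue of k_B

def boltzmann(energies, T=310.15):
    """Boltzmann probabilities Pr[X_i] = exp(-beta*E_i)/Z for energies in
    kcal/mol at temperature T in kelvin (default: 37 C = 310.15 K)."""
    beta = 1.0 / (R * T)
    e0 = min(energies)  # shifting all energies by e0 leaves the probabilities unchanged
    weights = [math.exp(-beta * (e - e0)) for e in energies]  # Boltzmann weights
    Z = sum(weights)                                          # (shifted) partition function
    return [w / Z for w in weights]

# The three best energies from slide 1:
energies = [-28.10, -27.90, -27.80]

# At 37 C the mfe structure is the most probable, but far from certain:
print([round(p, 3) for p in boltzmann(energies)])

# Very high temperature: all states (nearly) equally probable.
print([round(p, 3) for p in boltzmann(energies, T=1e7)])

# Very low temperature: essentially only the best state occurs.
print([round(p, 3) for p in boltzmann(energies, T=10.0)])
```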

5. What Next?

We assume that the structure ensemble of an RNA sequence is Boltzmann distributed.
• What are the benefits? (More than just probabilities of structures ...)
• Why is it reasonable to assume the Boltzmann distribution? (Well, a physicist told me ...)
• How can the probabilities be calculated efficiently? (McCaskill’s algorithm)

6. Benefits of Assuming Boltzmann

Definition
Probability of a structure P for S:

    Pr[P | S] := exp(-β E(P)) / Z.

Allows a more profound weighting of structures in the ensemble. We need efficient computation of the partition function Z!

Even more interesting: probabilities of structural elements.

Definition
Probability of a base pair (i, j) for S:

    Pr[(i, j) | S] := Σ_{P ∋ (i,j)} Pr[P | S].

Again, we need Z (and some more). Base pair probabilities enable a new view of the structure ensemble (visually but also algorithmically!).

Remark: For RNA, we have “real” temperature, e.g. T = 37 °C, which determines β = (k_B T)^(-1). For calculations, pay attention to physical units!
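Both definitions can be computed directly for a toy ensemble that is small enough to enumerate. The sequence, structures, and energies below are made up for illustration; for real sequences the ensemble is exponentially large, which is exactly why McCaskill’s algorithm computes Z by dynamic programming instead:

```python
import math

RT = 0.6163  # kcal/mol at T = 37 C (310.15 K); beta = 1/RT

def pairs_of(structure):
    """Base pairs of a dot-bracket string (non-crossing, so a stack suffices)."""
    stack, pairs = [], set()
    for j, c in enumerate(structure):
        if c == '(':
            stack.append(j)
        elif c == ')':
            pairs.add((stack.pop(), j))
    return pairs

# Toy ensemble: hypothetical structures of a short sequence with made-up energies.
ensemble = {
    "((((....))))": -5.0,   # hypothetical mfe structure
    "((((...))).)": -3.2,   # hypothetical suboptimal structure
    "............":  0.0,   # open chain
}

Z = sum(math.exp(-E / RT) for E in ensemble.values())   # partition function

def pr_structure(P):
    """Pr[P | S] = exp(-beta E(P)) / Z."""
    return math.exp(-ensemble[P] / RT) / Z

def pr_pair(i, j):
    """Pr[(i, j) | S] = sum of Pr[P | S] over all structures P containing (i, j)."""
    return sum(pr_structure(P) for P in ensemble if (i, j) in pairs_of(P))

print(round(pr_structure("((((....))))"), 3))
print(round(pr_pair(0, 11), 3))   # outermost pair, present in both folded structures
```

Note how the outermost pair (0, 11) is more probable than any single structure containing it: base pair probabilities aggregate evidence across the whole ensemble.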

7. An Immediate Use of Base Pair Probabilities

MFE structure and base pair probability dot plot of a tRNA (sequence GGGGGUAUAGCUCAGGGGUAGAGCAUUUGACUGCAGAUCAAGAGGUCCCUGGUUCAAAUCCAGGUGCCCCCU).

[Figure: MFE secondary structure drawing and base pair probability dot plot (dot.ps), computed by “RNAfold -p”.]

8. Why Do We Assume Boltzmann?

We will give an argument from information theory. We will show: the Boltzmann distribution makes the fewest assumptions. Formally, the Boltzmann distribution is the distribution with the least information content, i.e. maximal (Shannon) entropy. As a consequence: without further information about our system, Boltzmann is our best choice.

[What could “further information” mean in a biological context?]

9. Shannon Entropy (by Example)

We toss a coin. For our coin, heads and tails show up with respective probabilities p and q (not necessarily fair). How uncertain are we about the result?

    H = p log_b(1/p) + q log_b(1/q)

• p = 0.5, q = 0.5 ⇒ H = 1 (maximal uncertainty)
• p = 1, q = 0 ⇒ H = 0 (no uncertainty)

[Figure: plot of p log2(1/p) + q log2(1/q) as a function of p, maximal at p = 0.5.]

This is Shannon entropy, a measure of uncertainty. In general, define the Shannon entropy of a probability distribution p = (p_1, ..., p_N) over N states X_1 ... X_N as

    H(p) := - Σ_{i=1}^{N} p_i log_b p_i.


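The coin-toss example can be checked numerically. A minimal sketch (the function name is my own; writing p·log(1/p) sidesteps the 0·log 0 case):

```python
import math

def shannon_entropy(ps, b=2):
    """Shannon entropy H(p) = sum_i p_i log_b(1/p_i), skipping zero-probability
    terms (by the usual convention 0 * log 0 = 0)."""
    return sum(p * math.log(1 / p, b) for p in ps if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin: H = 1 bit, maximal uncertainty
print(shannon_entropy([1.0, 0.0]))   # certain outcome: H = 0, no uncertainty
print(shannon_entropy([0.9, 0.1]))   # biased coin: somewhere in between
```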

12. Formalizing “Least Number of Assumptions”

Example: Assume we have N events. Without further assumptions, we will naturally assume the uniform distribution p_i = 1/N. This is the uniquely defined distribution maximizing the entropy H(p) = - Σ_i p_i log_b p_i. It is found by solving the following optimization problem:

maximize the function

    H(p) = - Σ_i p_i log_b p_i

under the side condition Σ_i p_i = 1.
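The claim that the uniform distribution maximizes entropy can be sanity-checked numerically (a quick random search, not a proof; the setup is my own):

```python
import math
import random

def shannon_entropy(ps):
    """H(p) in bits, skipping zero-probability terms."""
    return sum(p * math.log(1 / p, 2) for p in ps if p > 0)

N = 4
H_uniform = shannon_entropy([1.0 / N] * N)   # log2(N) = 2 bits for N = 4

# No randomly drawn distribution over N events beats the uniform one:
random.seed(0)
for _ in range(1000):
    xs = [random.random() for _ in range(N)]
    s = sum(xs)
    ps = [x / s for x in xs]                 # normalize to a distribution
    assert shannon_entropy(ps) <= H_uniform + 1e-12

print(H_uniform)  # 2.0
```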

13. Formalizing “Least Number of Assumptions”

Theorem: Given a system of states X_1 ... X_N and energies E_i for X_i. The Boltzmann distribution is the probability distribution p that maximizes the Shannon entropy

    H(p) = - Σ_{i=1}^{N} p_i log_b p_i

under the assumption of known average energy of the system

    ⟨E⟩ = Σ_{i=1}^{N} p_i E_i.

14. Proof

We show that the Boltzmann distribution is uniquely obtained by solving

maximize the function H(p) = - Σ_{i=1}^{N} p_i ln p_i

under the side conditions
• C_1(p) = Σ_i p_i - 1 = 0 and
• C_2(p) = Σ_i p_i E_i - ⟨E⟩ = 0

by using the method of Lagrange multipliers.

(Using ln instead of log_b is equivalent for the maximization, since the two differ only by a constant factor.)
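The Lagrange computation the proof refers to proceeds as follows (a standard derivation filling in the steps the slide only sets up):

```latex
% Lagrangian for maximizing H under the two side conditions C_1, C_2:
\mathcal{L}(p, \lambda_1, \lambda_2)
  = -\sum_{i=1}^{N} p_i \ln p_i
  + \lambda_1 \Bigl( \sum_i p_i - 1 \Bigr)
  + \lambda_2 \Bigl( \sum_i p_i E_i - \langle E \rangle \Bigr)

% Stationarity in each p_i:
\frac{\partial \mathcal{L}}{\partial p_i}
  = -\ln p_i - 1 + \lambda_1 + \lambda_2 E_i = 0
  \quad\Longrightarrow\quad
  p_i = e^{\lambda_1 - 1}\, e^{\lambda_2 E_i}

% The side condition C_1 fixes the normalization factor:
e^{\lambda_1 - 1} = \frac{1}{\sum_j e^{\lambda_2 E_j}}

% Writing \lambda_2 = -\beta (its value is fixed by C_2, i.e. by <E>) yields
p_i = \frac{e^{-\beta E_i}}{Z},
\qquad Z = \sum_j e^{-\beta E_j}
```

Since H is strictly concave and the side conditions are linear in p, this stationary point is the unique maximizer.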
