

1. The Ensemble of RNA Structures

Example: the best structures of the RNA sequence

GGGGGUAUAGCUCAGGGGUAGAGCAUUUGACUGCAGAUCAAGAGGUCCCUGGUUCAAAUCCAGGUGCCCCCU

with their free energies in kcal/mol:

(((((((..((((.......))))...........((((....))))(((((.......)))))))))))). -28.10
(((((((..((((.......))))....((((.(.......).))))(((((.......)))))))))))). -27.90
((((((((.((((.......))))(((((((((..((((....))))..)))).)))))....)))))))). -27.80
((((((((.((((.......))))(((((((((..((((....))))..))).))))))....)))))))). -27.80
(((((((..((((.......))))....((((...........))))(((((.......)))))))))))). -27.60
(((((((..((((.......))))....(((..(.......)..)))(((((.......)))))))))))). -27.50
((((((((.((((.......)))).((((((((..((((....))))..)))).)))).....)))))))). -27.20
((((((((.((((.......)))).((((((((..((((....))))..))).))))).....)))))))). -27.20
((((((((.((((.......))))...........((((....)))).((((.......)))))))))))). -27.20
((((((...((((.......))))...........((((....))))(((((.......))))).)))))). -27.20
(((((((...(((...(((...(((......)))..)))..)))...(((((.......)))))))))))). -27.10
((((((((.((((.......))))((((((((...((((....))))...))).)))))....)))))))). -27.00
((((((((.((((.......))))((((((((...((((....))))...)).))))))....)))))))). -27.00
((((((((.((((.......))))....((((.(.......).)))).((((.......)))))))))))). -27.00
(((((((..((((.......)))).((((((....).))))).....(((((.......)))))))))))). -27.00
(((((((..((((.......))))...........(((......)))(((((.......)))))))))))). -27.00
((((((...((((.......))))....((((.(.......).))))(((((.......))))).)))))). -27.00
((((((((.((((.......))))(((((((((..(((......)))..)))).)))))....)))))))). -26.70
((((((((.((((.......))))(((((((((..(((......)))..))).))))))....)))))))). -26.70
((((((((.((((.......))))....((((...........)))).((((.......)))))))))))). -26.70
(((((((..((((.......)))).(((((.......))))).....(((((.......)))))))))))). -26.70
((((((...((((.......))))....((((...........))))(((((.......))))).)))))). -26.70

The set of all non-crossing RNA structures of an RNA sequence S is called the (structure) ensemble P of S.

S.Will, 18.417, Fall 2011
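Dot-bracket strings like the ones above can be parsed with a simple stack, precisely because the ensemble contains only non-crossing structures. A minimal sketch (the function name is my own, not from the slides):

```python
def base_pairs(structure: str):
    """Parse a dot-bracket string into a set of base pairs (i, j), 0-indexed.

    A single stack suffices because dot-bracket notation can only express
    non-crossing (pseudoknot-free) structures.
    """
    stack, pairs = [], set()
    for j, c in enumerate(structure):
        if c == '(':
            stack.append(j)
        elif c == ')':
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif c != '.':
            raise ValueError(f"unexpected character {c!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs

# The best (mfe) structure from the list above: 20 base pairs in four helices.
mfe = "(((((((..((((.......))))...........((((....))))(((((.......))))))))))))."
print(len(base_pairs(mfe)))  # 20
```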

2. Is Minimal Free Energy Structure Prediction Useful?

• BIG PLUS: the loop-based energy model is quite realistic
• Still, the mfe structure may be “wrong”: Why?
• Lesson: be careful, be sceptical! (as always, but in particular when biology is involved)
• What would you improve?

3. Probability of a Structure

How probable is an RNA structure P for an RNA sequence S?

GOAL: define the probability Pr[P | S].

IDEA: Think of RNA folding as a dynamic system of structures (= states of the system). Given much time, a sequence S will form every possible structure P. For each structure there is a probability of observing it at a given time. This means: we are looking for a probability distribution!

Requirements: the probability depends on the energy (the lower the energy, the more probable). No additional assumptions!

4. Distribution of States in a System

Definition (Boltzmann distribution)
Let X = {X_1, ..., X_N} denote a system of states, where state X_i has energy E_i. The system is Boltzmann distributed with temperature T iff

    Pr[X_i] = exp(-β E_i) / Z,    where Z := Σ_i exp(-β E_i) and β = (k_B T)^(-1).

Remarks
• broadly used in physics to describe systems of all kinds
• the Boltzmann distribution is usually assumed for the thermodynamic equilibrium (i.e. after sufficiently much time)
• the transfer to RNA is easy to see: structures = states, with their energies
• why temperature?
  • very high temperature: all states equally probable
  • very low temperature: only the best states occur
• k_B ≈ 1.38 × 10^(-23) J/K is known as the Boltzmann constant; β is called the inverse temperature
• exp(-β E_i) is called the Boltzmann weight of X_i
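The temperature behaviour described in the remarks can be made concrete with a small sketch. Working per mole (energies in kcal/mol, gas constant R instead of k_B) matches the units used on the slides; the function name, the default temperature of 37 °C, and the min-energy shift for numerical stability are my additions:

```python
import math

R = 0.0019872  # gas constant in kcal/(mol*K), the per-mole analogue of k_B

def boltzmann(energies, T=310.15):
    """Boltzmann probabilities Pr[X_i] = exp(-beta*E_i)/Z for energies in
    kcal/mol at temperature T in kelvin (default: 37 C = 310.15 K)."""
    beta = 1.0 / (R * T)
    e0 = min(energies)  # shifting all energies by e0 leaves the probabilities unchanged
    weights = [math.exp(-beta * (e - e0)) for e in energies]  # Boltzmann weights
    Z = sum(weights)                                          # (shifted) partition function
    return [w / Z for w in weights]

# The three best energies from slide 1:
energies = [-28.10, -27.90, -27.80]

# At 37 C the mfe structure is the most probable, but far from certain:
print([round(p, 3) for p in boltzmann(energies)])

# Very high temperature: all states (nearly) equally probable.
print([round(p, 3) for p in boltzmann(energies, T=1e7)])

# Very low temperature: essentially only the best state occurs.
print([round(p, 3) for p in boltzmann(energies, T=10.0)])
```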

5. What Next?

We assume that the structure ensemble of an RNA sequence is Boltzmann distributed.
• What are the benefits? (More than just probabilities of structures ...)
• Why is it reasonable to assume the Boltzmann distribution? (Well, a physicist told me ...)
• How can the probabilities be calculated efficiently? (McCaskill’s algorithm)

6. Benefits of Assuming Boltzmann

Definition
Probability of a structure P for S:

    Pr[P | S] := exp(-β E(P)) / Z.

Allows a more profound weighting of structures in the ensemble. We need efficient computation of the partition function Z!

Even more interesting: probabilities of structural elements.

Definition
Probability of a base pair (i, j) for S:

    Pr[(i, j) | S] := Σ_{P ∋ (i,j)} Pr[P | S].

Again, we need Z (and some more). Base pair probabilities enable a new view of the structure ensemble (visually but also algorithmically!).

Remark: For RNA, we have “real” temperature, e.g. T = 37 °C, which determines β = (k_B T)^(-1). For calculations, pay attention to physical units!
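Both definitions can be computed directly for a toy ensemble that is small enough to enumerate. The sequence, structures, and energies below are made up for illustration; for real sequences the ensemble is exponentially large, which is exactly why McCaskill’s algorithm computes Z by dynamic programming instead:

```python
import math

RT = 0.6163  # kcal/mol at T = 37 C (310.15 K); beta = 1/RT

def pairs_of(structure):
    """Base pairs of a dot-bracket string (non-crossing, so a stack suffices)."""
    stack, pairs = [], set()
    for j, c in enumerate(structure):
        if c == '(':
            stack.append(j)
        elif c == ')':
            pairs.add((stack.pop(), j))
    return pairs

# Toy ensemble: hypothetical structures of a short sequence with made-up energies.
ensemble = {
    "((((....))))": -5.0,   # hypothetical mfe structure
    "((((...))).)": -3.2,   # hypothetical suboptimal structure
    "............":  0.0,   # open chain
}

Z = sum(math.exp(-E / RT) for E in ensemble.values())   # partition function

def pr_structure(P):
    """Pr[P | S] = exp(-beta E(P)) / Z."""
    return math.exp(-ensemble[P] / RT) / Z

def pr_pair(i, j):
    """Pr[(i, j) | S] = sum of Pr[P | S] over all structures P containing (i, j)."""
    return sum(pr_structure(P) for P in ensemble if (i, j) in pairs_of(P))

print(round(pr_structure("((((....))))"), 3))
print(round(pr_pair(0, 11), 3))   # outermost pair, present in both folded structures
```

Note how the outermost pair (0, 11) is more probable than any single structure containing it: base pair probabilities aggregate evidence across the whole ensemble.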

7. An Immediate Use of Base Pair Probabilities

MFE structure and base pair probability dot plot of a tRNA (sequence GGGGGUAUAGCUCAGGGGUAGAGCAUUUGACUGCAGAUCAAGAGGUCCCUGGUUCAAAUCCAGGUGCCCCCU).

[Figure: MFE secondary structure drawing and base pair probability dot plot (dot.ps), computed by “RNAfold -p”.]

8. Why Do We Assume Boltzmann?

We will give an argument from information theory. We will show: the Boltzmann distribution makes the fewest assumptions. Formally, the Boltzmann distribution is the distribution with the least information content, i.e. maximal (Shannon) entropy. As a consequence: without further information about our system, Boltzmann is our best choice.

[What could “further information” mean in a biological context?]

9. Shannon Entropy (by Example)

We toss a coin. For our coin, heads and tails show up with respective probabilities p and q (not necessarily fair). How uncertain are we about the result?

    H = p log_b(1/p) + q log_b(1/q)

• p = 0.5, q = 0.5 ⇒ H = 1 (maximal uncertainty)
• p = 1, q = 0 ⇒ H = 0 (no uncertainty)

[Figure: plot of p log2(1/p) + q log2(1/q) as a function of p, maximal at p = 0.5.]

This is Shannon entropy, a measure of uncertainty. In general, define the Shannon entropy of a probability distribution p = (p_1, ..., p_N) over N states X_1 ... X_N as

    H(p) := - Σ_{i=1}^{N} p_i log_b p_i.


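The coin-toss example can be checked numerically. A minimal sketch (the function name is my own; writing p·log(1/p) sidesteps the 0·log 0 case):

```python
import math

def shannon_entropy(ps, b=2):
    """Shannon entropy H(p) = sum_i p_i log_b(1/p_i), skipping zero-probability
    terms (by the usual convention 0 * log 0 = 0)."""
    return sum(p * math.log(1 / p, b) for p in ps if p > 0)

print(shannon_entropy([0.5, 0.5]))   # fair coin: H = 1 bit, maximal uncertainty
print(shannon_entropy([1.0, 0.0]))   # certain outcome: H = 0, no uncertainty
print(shannon_entropy([0.9, 0.1]))   # biased coin: somewhere in between
```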

12. Formalizing “Least Number of Assumptions”

Example: Assume we have N events. Without further assumptions, we will naturally assume the uniform distribution p_i = 1/N. This is the uniquely defined distribution maximizing the entropy H(p) = - Σ_i p_i log_b p_i. It is found by solving the following optimization problem:

maximize the function

    H(p) = - Σ_i p_i log_b p_i

under the side condition Σ_i p_i = 1.
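The claim that the uniform distribution maximizes entropy can be sanity-checked numerically (a quick random search, not a proof; the setup is my own):

```python
import math
import random

def shannon_entropy(ps):
    """H(p) in bits, skipping zero-probability terms."""
    return sum(p * math.log(1 / p, 2) for p in ps if p > 0)

N = 4
H_uniform = shannon_entropy([1.0 / N] * N)   # log2(N) = 2 bits for N = 4

# No randomly drawn distribution over N events beats the uniform one:
random.seed(0)
for _ in range(1000):
    xs = [random.random() for _ in range(N)]
    s = sum(xs)
    ps = [x / s for x in xs]                 # normalize to a distribution
    assert shannon_entropy(ps) <= H_uniform + 1e-12

print(H_uniform)  # 2.0
```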

13. Formalizing “Least Number of Assumptions”

Theorem: Given a system of states X_1 ... X_N and energies E_i for X_i. The Boltzmann distribution is the probability distribution p that maximizes the Shannon entropy

    H(p) = - Σ_{i=1}^{N} p_i log_b p_i

under the assumption of known average energy of the system

    ⟨E⟩ = Σ_{i=1}^{N} p_i E_i.

14. Proof

We show that the Boltzmann distribution is uniquely obtained by solving

maximize the function H(p) = - Σ_{i=1}^{N} p_i ln p_i

under the side conditions
• C_1(p) = Σ_i p_i - 1 = 0 and
• C_2(p) = Σ_i p_i E_i - ⟨E⟩ = 0

by using the method of Lagrange multipliers.

(Using ln instead of log_b is equivalent for the maximization, since the two differ only by a constant factor.)
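The Lagrange computation the proof refers to proceeds as follows (a standard derivation filling in the steps the slide only sets up):

```latex
% Lagrangian for maximizing H under the two side conditions C_1, C_2:
\mathcal{L}(p, \lambda_1, \lambda_2)
  = -\sum_{i=1}^{N} p_i \ln p_i
  + \lambda_1 \Bigl( \sum_i p_i - 1 \Bigr)
  + \lambda_2 \Bigl( \sum_i p_i E_i - \langle E \rangle \Bigr)

% Stationarity in each p_i:
\frac{\partial \mathcal{L}}{\partial p_i}
  = -\ln p_i - 1 + \lambda_1 + \lambda_2 E_i = 0
  \quad\Longrightarrow\quad
  p_i = e^{\lambda_1 - 1}\, e^{\lambda_2 E_i}

% The side condition C_1 fixes the normalization factor:
e^{\lambda_1 - 1} = \frac{1}{\sum_j e^{\lambda_2 E_j}}

% Writing \lambda_2 = -\beta (its value is fixed by C_2, i.e. by <E>) yields
p_i = \frac{e^{-\beta E_i}}{Z},
\qquad Z = \sum_j e^{-\beta E_j}
```

Since H is strictly concave and the side conditions are linear in p, this stationary point is the unique maximizer.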
