Combinatorial approaches to RNA folding
Part III: Stochastic algorithms via language theory

Matthew Macauley
Department of Mathematical Sciences, Clemson University
http://www.math.clemson.edu/~macaule/
Math 4500, Fall 2016

M. Macauley (Clemson) RNA folding via formal grammars Math 4500, Fall 2016

Overview

Main question: Given a raw sequence of RNA, can we predict how it will fold?

There are two main approaches to this problem:

1. Energy minimization. Calculate the “free energy” of a folded structure. The “most likely” structures tend to be those where free energy is minimized. The free energy is computed recursively using dynamic programming.
2. Formal language theory. Use a formal grammar to algorithmically generate secondary structures: production rules convert symbols into strings according to the language’s syntax. If we assign probabilities to the rules, then the “most likely” structure is the one that occurs with the highest probability.

In this lecture, we will study the formal language theory approach.
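One way to make “occurs with the highest probability” precise is sketched below. The notation is an assumption for illustration and is not taken from the slides: a derivation π of a sequence x applies production rules r_1, …, r_k, each rule r_i carries a probability p(r_i), and rule choices are treated as independent.

```latex
P(\pi) = \prod_{i=1}^{k} p(r_i),
\qquad
\hat{\pi} = \operatorname*{arg\,max}_{\pi \,:\, \pi \text{ derives } x} P(\pi)
```

For instance, a derivation that applies three rules with probabilities 0.5, 0.2, and 0.2 has probability 0.5 × 0.2 × 0.2 = 0.02. Under these assumptions, the predicted structure is the one encoded by the highest-probability derivation of x.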

Some history

In his 1871 book The Descent of Man, Charles Darwin wrote: “The formation of different languages and of distinct species, and the proofs that both have been developed through a gradual process, are curiously parallel.”

Decades later, scientists would discover a macromolecule called DNA that encoded genetic instructions for life in a mysterious language over the alphabet Σ = {a, c, g, t}.

Though this would eventually lead to the fields of molecular biology and linguistics becoming intertwined, major developments were needed in both fields before this could happen.

Noam Chomsky is considered to be the father of modern linguistics. In the 1950s, he helped popularize the universal grammar theory. Chomsky’s work led to a more rigorous mathematical treatment of formal languages, revolutionizing the field of linguistics.

Also in the 1950s, the structure of DNA, the newly discovered fundamental building block of life, was finally understood.

Some history (continued)

Formal languages involve an alphabet Σ and production rules that turn symbols into substrings to generate words.

The use of formal language theory to study molecular biology began in the 1980s. The earliest work involved using regular grammars to model biological sequences.

Assigning probabilities to the production rules of a regular grammar yields hidden Markov models (HMMs), and these have been widely used in sequence analysis.

The locations of bases in DNA and RNA strands are correlated: for example, two bases that pair in a folded RNA strand can be far apart in the sequence. Regular grammars cannot model such long-range dependencies. A larger class of grammars is needed to account for them: context-free grammars (CFGs).
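To illustrate why context-free rules capture these correlations, here is a minimal sketch of a stochastic context-free grammar. The grammar, its probabilities, and the names `RULES`, `derive`, and `stem_is_paired` are hypothetical choices for illustration, not taken from the slides. Each rule of the form S → x S x' emits a base and its complement at the same time, on opposite sides of the string, so the two ends stay matched however long the stem grows. A regular grammar, which emits symbols from one end only, cannot enforce this for unbounded lengths.

```python
import random

# Hypothetical toy grammar over the RNA alphabet {a, c, g, u}.
# Each rule for the nonterminal S is (probability, right-hand side);
# the probabilities sum to 1.
RULES = {
    "S": [
        (0.2, ["a", "S", "u"]),       # emit an a-u pair around S
        (0.2, ["u", "S", "a"]),       # emit a u-a pair around S
        (0.2, ["c", "S", "g"]),       # emit a c-g pair around S
        (0.2, ["g", "S", "c"]),       # emit a g-c pair around S
        (0.2, ["g", "a", "a", "a"]),  # stop: unpaired 4-base loop
    ]
}

COMPLEMENT = {"a": "u", "u": "a", "c": "g", "g": "c"}


def derive(symbol, rng):
    """Randomly expand `symbol`; return (string, probability of the derivation)."""
    if symbol not in RULES:
        return symbol, 1.0  # terminals are emitted as-is
    weights = [p for p, _ in RULES[symbol]]
    prob, rhs = rng.choices(RULES[symbol], weights=weights, k=1)[0]
    parts = []
    for s in rhs:
        text, p = derive(s, rng)
        parts.append(text)
        prob *= p  # a derivation's probability is the product of its rule probabilities
    return "".join(parts), prob


def stem_is_paired(seq, loop_len=4):
    """Check that the i-th base from the left pairs with the i-th from the right."""
    n_pairs = (len(seq) - loop_len) // 2
    return all(COMPLEMENT[seq[i]] == seq[-1 - i] for i in range(n_pairs))


if __name__ == "__main__":
    rng = random.Random(2016)
    for _ in range(5):
        seq, prob = derive("S", rng)
        print(seq, prob, stem_is_paired(seq))
```

Every string this grammar generates is a stem of complementary pairs closed by the loop `gaaa`, and a string with n pairs has probability 0.2^(n+1) under the assumed rule probabilities.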
