The Double Helix CSE 421: Intro to Algorithms Summer 2007 W. L. - PowerPoint PPT Presentation

CSE 421: Intro to Algorithms, Summer 2007, W. L. Ruzzo
Dynamic Programming, II: RNA Folding

The Double Helix
[Figure: the double helix. Credits: Los Alamos Science; http://www.rcsb.org/pdb/explore.do?structureId=1GAT]

The “Central Dogma” of Molecular Biology
DNA → RNA → Protein
[Figure labels: Protein; gene; DNA (chromosome); RNA (messenger); cell]

Non-coding RNA
• Messenger RNA - codes for proteins
• Non-coding RNA - all the rest
  – Before, say, the mid 1990’s, 1-2 dozen known (critically important, but narrow roles: e.g. ribosomal and transfer RNA, splicing, SRP)
• Since the mid 90’s, dramatic discoveries
  – Regulation, transport, stability/degradation
  – E.g. “microRNA”: hundreds in humans
  – E.g. “riboswitches”: thousands in bacteria

DNA structure: dull
…ACCGCTAGATG…
…TGGCGATCTAC…

RNA Structure: Rich
• RNA’s fold, and function
• Nature uses what works
[Figure: http://www.rcsb.org/pdb/explore.do?structureId=1EVV]

RNA Secondary Structure: Not everything, but important, easier than 3d

Why is structure important?
• For protein-coding, similarity in sequence is a powerful tool for finding related sequences
  – e.g. “hemoglobin” is easily recognized in all vertebrates
• For non-coding RNA, many different sequences have the same structure, and structure is most important for function.
  – So, using structure plus sequence, can find related sequences at much greater evolutionary distances

Q: What’s so hard?
[Figure: 6S RNA secondary structures; caption “6S mimics an open promoter”. Organisms shown: Chloroflexi (Chloroflexus aurantiacus); δ-Proteobacteria (Geobacter metallireducens, Geobacter sulphurreducens); Symbiobacterium thermophilum; E.coli. Legend: “Used by CMfinder”, “Found by scan”. Sources: Barrick et al. RNA 2005; Trotochaud et al. NSMB 2005; Willkomm et al. NAR 2005]
A: Structure often more important than sequence

“Central Dogma” = “Central Chicken & Egg”?
DNA → RNA → Protein
[Figure labels: Protein; gene; DNA (chromosome); RNA (messenger); cell]
Was there once an “RNA World”?

6.5 RNA Secondary Structure
Nussinov’s Algorithm

RNA Secondary Structure

RNA. String B = b_1 b_2 … b_n over alphabet { A, C, G, U }.

Secondary structure. RNA is usually single-stranded, and tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.

Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA
[Figure: folded form of the example sequence; complementary base pairs: A-U, C-G]

RNA Secondary Structure (somewhat oversimplified)

Secondary structure. A set of pairs S = { (b_i, b_j) } that satisfy:
• [Watson-Crick.]
  – S is a matching, i.e. each base pairs with at most one other, and
  – each pair in S is a Watson-Crick pair: A-U, U-A, C-G, or G-C.
• [No sharp turns.] The ends of each pair are separated by at least 4 intervening bases. If (b_i, b_j) ∈ S, then i < j - 4.
• [Non-crossing.] If (b_i, b_j) and (b_k, b_l) are two pairs in S, then we cannot have i < k < j < l. (Violation of this is called a pseudoknot.)

Free energy. Usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy. (Approximate by number of base pairs.)

Goal. Given an RNA molecule B = b_1 b_2 … b_n, find a secondary structure S that maximizes the number of base pairs.

RNA Secondary Structure: Examples

Examples.
[Figure: three candidate foldings with base pairs marked, labeled “ok”, “sharp turn” (≤ 4 intervening bases), and “crossing”]

RNA Secondary Structure: Subproblems

First attempt. OPT[j] = maximum number of base pairs in a secondary structure of the substring b_1 b_2 … b_j.
[Figure: match b_t and b_j, with positions 1, t, j marked]

Difficulty. Results in two sub-problems.
• Finding secondary structure in: b_1 b_2 … b_{t-1}. This is OPT(t-1).
• Finding secondary structure in: b_{t+1} b_{t+2} … b_{j-1}. This is not OPT of anything; need more sub-problems.
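The three conditions above (Watson-Crick matching, no sharp turns, non-crossing) can be checked mechanically. The following is a minimal sketch, not taken from the slides: the function name `is_valid_structure`, the 1-indexed pair representation, and the example sequence are assumptions chosen for illustration.

```python
# Hypothetical helper (not from the slides): checks the three conditions
# of the simplified secondary-structure definition.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def is_valid_structure(seq, pairs):
    """Return True if `pairs` is a valid secondary structure for `seq`.

    `pairs` holds 1-indexed (i, j) tuples with i < j, matching the
    b_1 ... b_n notation used in the definition.
    """
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(seq)):
            return False
        # Matching: each base pairs with at most one other base.
        if i in used or j in used:
            return False
        used.update((i, j))
        # Watson-Crick: A-U, U-A, C-G, or G-C only.
        if (seq[i - 1], seq[j - 1]) not in WATSON_CRICK:
            return False
        # No sharp turns: at least 4 intervening bases, i.e. i < j - 4.
        if not i < j - 4:
            return False
    # Non-crossing: no two pairs (i, j), (k, l) with i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True


if __name__ == "__main__":
    seq = "CUCCGGUUGCAAUGUC"  # illustrative sequence
    print(is_valid_structure(seq, [(1, 14), (2, 11), (4, 9)]))  # nested pairs
    print(is_valid_structure(seq, [(2, 11), (7, 12)]))          # crossing pairs
```

The pairwise non-crossing check is quadratic in the number of pairs, which is fine for a checker; the point is only to make each condition explicit.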

Dynamic Programming Over Intervals (R. Nussinov’s algorithm)

Notation. OPT[i, j] = maximum number of base pairs in a secondary structure of the substring b_i b_{i+1} … b_j.

• Case 1. If i ≥ j - 4.
  – OPT[i, j] = 0 by no-sharp turns condition.
• Case 2. Base b_j is not involved in a pair.
  – OPT[i, j] = OPT[i, j-1]
• Case 3. Base b_j pairs with b_t for some i ≤ t < j - 4.
  – non-crossing constraint decouples resulting sub-problems
  – OPT[i, j] = 1 + max_t { OPT[i, t-1] + OPT[t+1, j-1] }, taking the max over t such that i ≤ t < j-4 and b_t and b_j are Watson-Crick complements

Key point: Either the last base is unpaired (case 1, 2) or paired (case 3).

$$
\mathrm{OPT}[i,j] =
\begin{cases}
0 & \text{if } i \ge j-4 \\
\max\left\{ \mathrm{OPT}[i,j-1],\ 1 + \max_t \left( \mathrm{OPT}[i,t-1] + \mathrm{OPT}[t+1,j-1] \right) \right\} & \text{otherwise}
\end{cases}
$$

Bottom Up Dynamic Programming Over Intervals

Q. What order to solve the sub-problems?
A. Do shortest intervals first.

RNA(b_1,…,b_n) {
   for k = 5, 6, …, n-1
      for i = 1, 2, …, n-k
         j = i + k
         Compute OPT[i, j] using recurrence
   return OPT[1, n]
}

[Figure: triangular OPT table with rows i = 1..4 and columns j = 6..9, filled one diagonal (fixed k = j - i) at a time]

Running time. O(n^3).

Remark. Same core idea in CKY algorithm to parse context-free grammars.

Example, n = 16:
C U C C G G U U G C A A U G U C
( ( . ( . . . . ) . ) . . ) . .
[Figure: filled OPT table for this sequence; first row: 0 0 0 0 0 1 1 1 1 1 2 2 2 3 3 3]

E.g.: OPT[1,6] = 1:
CUCCGG
(....)

E.g.: OPT[6,16] = 2:
GUUGCAAUGUC
((....)...)

Example, n = 20:
G G G A A A A C C C A A A G G G G U U U
( ( ( . . . . ) ) ) ( ( ( . . . . ) ) )
[Figure: filled OPT table for this sequence; first row: 0 0 0 0 0 0 0 1 2 3 3 3 3 3 3 3 3 4 5 6]

Computing one cell: OPT[2,18] = ?
Case 1: 2 ≥ 18-4? no.
Case 2: b_18 unpaired? Always a possibility; then OPT[2,18] ≥ 3:
GGAAAACCCAAAGGGGU
((....))(....)...
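The recurrence and the shortest-intervals-first fill order can be sketched in Python as follows. This is an illustrative sketch rather than the course's reference code: the names `nussinov_table` and `traceback` are assumptions, the table is 0-indexed internally (so OPT[i, j] in the slides corresponds to `opt[i-1][j-1]`), and the traceback step that recovers one optimal set of pairs is an addition not described in the slides.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_table(seq):
    """Fill the OPT table bottom-up, shortest intervals first.

    opt[i][j] is the maximum number of base pairs in seq[i..j]
    (0-indexed, inclusive). Entries with j - i < 5 stay 0 (case 1).
    """
    n = len(seq)
    opt = [[0] * n for _ in range(n)]
    for k in range(5, n):              # interval length j - i
        for i in range(n - k):
            j = i + k
            best = opt[i][j - 1]       # case 2: seq[j] is unpaired
            for t in range(i, j - 4):  # case 3: seq[j] pairs with seq[t]
                if (seq[t], seq[j]) in WATSON_CRICK:
                    left = opt[i][t - 1] if t > i else 0  # empty prefix
                    best = max(best, 1 + left + opt[t + 1][j - 1])
            opt[i][j] = best
    return opt


def traceback(seq, opt):
    """Recover one optimal structure as sorted 1-indexed (t, j) pairs."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if j - i < 5 or opt[i][j] == 0:
            continue                   # nothing to pair in this interval
        if opt[i][j] == opt[i][j - 1]:
            stack.append((i, j - 1))   # seq[j] can be left unpaired
            continue
        for t in range(i, j - 4):
            if (seq[t], seq[j]) in WATSON_CRICK:
                left = opt[i][t - 1] if t > i else 0
                if opt[i][j] == 1 + left + opt[t + 1][j - 1]:
                    pairs.append((t + 1, j + 1))
                    stack.append((i, t - 1))
                    stack.append((t + 1, j - 1))
                    break
    return sorted(pairs)


if __name__ == "__main__":
    seq = "GGGAAAACCCAAAGGGGUUU"
    table = nussinov_table(seq)
    print(table[0][len(seq) - 1])      # OPT[1, n]
    print(traceback(seq, table))
```

The three nested loops (over k, i, and t) give the O(n^3) running time stated above; the table itself takes O(n^2) space.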
