Modeling and Searching for Non-Coding RNA
W.L. Ruzzo
http://www.cs.washington.edu/homes/ruzzo
http://www.cs.washington.edu/homes/ruzzo/courses/gs541/10sp
GENOME 541 Syllabus: protein and DNA sequence analysis to ...
Noncoding RNA Examples
RNA structure prediction
RNA “motif” models
Search
Motif discovery
Applications
[Figure: uracil and thymine (thymine carries a CH3 group); pairs with A]
... seemed in 1958. Solid arrows represent probable transfers, dotted arrows possible transfers. The absent arrows represent the impossible transfers postulated by the central dogma. They are the three possible arrows starting from protein.
rRNA - ribosomal RNA (~4 kinds, 120-5k nt)
tRNA - transfer RNA (~61 kinds, ~75 nt)
RNaseP - tRNA processing (~300 nt)
snRNA - small nuclear RNA (splicing: U1, etc., 60-300 nt)
a handful of others
SAM riboswitches (figure: Alberts et al., 3e)
SAM-I: Grundy & Henkin, Mol. Microbiol. 1998; Epshtein et al., PNAS 2003; Winkler et al., Nat. Struct. Biol. 2003
SAM-II: Corbino et al., Genome Biol. 2005
SAM-III: Fuchs et al., NSMB 2006
SAM-IV: Weinberg et al., RNA 2008
Meyer et al., BMC Genomics 2009
~20 ligands known; multiple nonhomologous solutions for some
dozens to hundreds of instances of each
TPP known in archaea & eukaryotes
control
In some bacteria, more riboregulators identified than protein TFs
all found since ~2003
Barrick et al., RNA 2005; Trotochaud et al., NSMB 2005; Willkomm et al., NAR 2005
[Figure labels: E. coli; Bacillus/Clostridium; Actinobacteria]
[Figure: average size (nucleotides, log scale from 1 to 1,000) of 23S rRNA, 16S rRNA, Group II intron, tmRNA, OLE, Group I intron, RNase P, AdoCbl riboswitch, glmS ribozyme, Lysine riboswitch, IMES-1, IMES-2, GOLLD, and HEARO. Legend: ribozyme; not ribozyme; unknown function; multistem junctions plus pseudoknots.]
Zasha Weinberg, Jonathan Perreault, Michelle M. Meyer & Ronald R. Breaker, Nature, Vol 462, 3 December 2009, doi:10.1038/nature08586
[Figure: HEARO. Panel a: consensus secondary structure with a pseudoknot, variable-length regions, 5′ and 3′ integration sites, and an ORF. Panel b: example HEARO insertions in Nostoc sp.]
[Figure: GOLLD, panel a: consensus secondary structure with pseudoknots, E-loops, and a 0–129 nt variable region that can contain a tRNA. Legend: nucleotide identity and nucleotide presence levels (50% to 97%); base pair annotations (covarying mutations, compatible mutations, no mutations observed); variable-length hairpin, loop, and region; zero-length connector; modular sub-structure. R: A or G; Y: C or U; nt: nucleotides.]
[Figure: GOLLD, panel b: GOLLD RNA, GOLLD phage genomic DNA, and bacterial cell density (fraction of maximum) over 2 to 22 hours, with mitomycin C versus no treatment.]
More abundant than 5S rRNA
From unknown marine organisms
In some species, we know identities of more riboregulators than protein regulators
(without proper alignments, etc.)
2006 Nobel Prize: Fire & Mello
Evofold
JS Pedersen, G Bejerano, A Siepel, K Rosenbloom, K Lindblad-Toh, ES Lander, J Kent, W Miller, D Haussler, "Identification and classification of conserved RNA secondary structures in the human genome." PLoS Comput. Biol., 2, #4 (2006) e33.
  48,479 candidates (~70% FDR?)

RNAz
S Washietl, IL Hofacker, M Lukasser, A Hüttenhofer, PF Stadler, "Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome." Nat. Biotechnol., 23, #11 (2005) 1383-90.
  30,000 structured RNA elements
  1,000 conserved across all vertebrates
  ~1/3 in introns of known genes, ~1/6 in UTRs
  ~1/2 located far from any known gene

FOLDALIGN
E Torarinsson, M Sawera, JH Havgaard, M Fredholm, J Gorodkin, "Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure." Genome Res., 16, #7 (2006) 885-9.
  1800 candidates from 36970 (of 100,000) pairs

CMfinder
Torarinsson, Yao, Wiklund, Bramsen, Hansen, Kjems, Tommerup, Ruzzo and Gorodkin, "Comparative genomics beyond sequence based alignments: RNA structures in the ENCODE regions." Genome Research, Feb 2008, 18(2):242-251. PMID: 18096747
  6500 candidates in ENCODE alone (better FDR, but still high)
[Figure: RNA secondary structure drawing, with a G-U pair labeled]
[Figure: tRNA structure drawings; labels: anticodon loop, a.a., 5’ and 3’ ends]
For two different pairs i-j and i′-j′ with i ≤ i′, either
j < i′, or
i < i′ < j′ < j
2nd pair follows 1st, or is nested within it; no “pseudoknots.”
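The "follows or nested" condition can be checked mechanically. The following is a minimal sketch, not code from the course: the function name `is_valid_structure` and the `min_loop=4` default (a "no sharp turns" spacing) are assumptions made for illustration.

```python
from itertools import combinations


def is_valid_structure(pairs, min_loop=4):
    """Return True if `pairs` is a pseudoknot-free secondary structure.

    pairs: iterable of (i, j) sequence positions with i < j.
    min_loop: assumed minimum spacing for the "no sharp turns" rule
              (i < j - min_loop); a parameter of this sketch only.
    """
    pairs = sorted(set(pairs))
    # Each position may take part in at most one pair.
    used = [p for pair in pairs for p in pair]
    if len(used) != len(set(used)):
        return False
    # No sharp turns: require i < j - min_loop for every pair.
    if any(not (i < j - min_loop) for i, j in pairs):
        return False
    # Sorting guarantees i <= i2 for every combination below.
    for (i, j), (i2, j2) in combinations(pairs, 2):
        follows = j < i2            # 2nd pair follows the 1st
        nested = i < i2 < j2 < j    # 2nd pair is nested within the 1st
        if not (follows or nested):
            return False            # the pairs cross: a pseudoknot
    return True
```

For example, `is_valid_structure([(1, 20), (2, 19), (25, 40)])` accepts a nested pair followed by a later pair, while `is_valid_structure([(1, 10), (5, 15)])` rejects the crossing.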
[Figure: example RNA strands illustrating a base pair, a sharp turn, and a crossing]
R Nussinov, AB Jacobson, "Fast algorithm for predicting the secondary structure of single-stranded RNA." PNAS 1980.
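[Figure: recursion cases on the interval i..j: j unpaired (subproblem i..j-1), or j paired with k (subproblems i..k-1 and k+1..j-1)]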
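A pair-maximizing dynamic program in the style of Nussinov and Jacobson can be sketched as follows. This is an illustrative sketch rather than the slides' exact formulation: the names `can_pair` and `nussinov`, the inclusion of G-U pairs, and the `min_loop=4` spacing are assumptions of this example.

```python
def can_pair(a, b):
    """Watson-Crick pairs plus the G-U wobble pair (an assumption here)."""
    return {a, b} in ({"A", "U"}, {"C", "G"}, {"G", "U"})


def nussinov(seq, min_loop=4):
    """Return (max number of pairs, sorted list of 0-based pairs)."""
    n = len(seq)
    # best[i][j] = max number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    # choice[i][j] = k if j pairs with k in the optimum, else None.
    choice = [[None] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best[i][j] = best[i][j - 1]           # case 1: j unpaired
            for k in range(i, j - min_loop):      # case 2: k pairs with j
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    total = left + 1 + best[k + 1][j - 1]
                    if total > best[i][j]:
                        best[i][j] = total
                        choice[i][j] = k
    # Trace back one optimal structure.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue                              # too short to hold a pair
        k = choice[i][j]
        if k is None:
            stack.append((i, j - 1))
        else:
            pairs.append((k, j))
            stack.append((i, k - 1))
            stack.append((k + 1, j - 1))
    return (best[0][n - 1] if n else 0), sorted(pairs)
```

The three nested loops give O(n^3) time and the table takes O(n^2) space. Calling `nussinov("GGGAAAACCC")` yields three nested G-C pairs around the A-rich loop.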
[Figure: table over positions 1 to 5, with steps labeled K = 2, 3, 4, 5]
[Figure: loop types: hairpin, stack, bulge/interior, multi-loop]
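The loop types can be told apart by counting the pairs directly enclosed by each closing pair. The sketch below assumes a valid nested (pseudoknot-free) structure as input; the function name `classify_loops` and the returned labels are choices made for this illustration.

```python
def classify_loops(pairs):
    """Map each pair (i, j) to the type of loop it closes.

    Assumes `pairs` is a nested (pseudoknot-free) set of (i, j), i < j.
    """
    pairs = sorted(pairs)
    kinds = {}
    for i, j in pairs:
        # All pairs strictly inside (i, j) ...
        inside = [(a, b) for a, b in pairs if i < a and b < j]
        # ... keeping only those not inside another enclosed pair.
        direct = [
            (a, b) for a, b in inside
            if not any(c < a and b < d for c, d in inside)
        ]
        if not direct:
            kinds[(i, j)] = "hairpin"
        elif len(direct) == 1:
            a, b = direct[0]
            unpaired = (a - i - 1) + (j - b - 1)
            kinds[(i, j)] = "stack" if unpaired == 0 else "bulge/interior"
        else:
            kinds[(i, j)] = "multi-loop"
    return kinds
```

Energy-based folding methods typically score each of these loop types with its own term, which is why the decomposition matters beyond simple pair counting.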