Current Trends: Non-coding RNAs
“Central Dogma” of molecular biology: DNA → RNA (mRNA) → Protein
Central Dogma of molecular biology (extended diagram): DNA → RNA (mRNA) → Protein → Cellular functions. Additional labels: reverse transcriptase, RNA virus replication, in vitro, ncRNA.
Non-coding RNAs: found in prokaryotes (small RNAs)
Gottesman, Trends in Genetics 21:399-404
RNAi RNAa…
Storz et al., Ann. Rev.
Red = bacterial Blue = eukaryotes
UAGCAUGUACGUAGCUAGCUACGAUUGUUAUUACUGUCGUGCUUUCACUUCUCGCAGGAGUCCUCGUAUGGUA
[Figure: RNA gene and messenger RNA, with base pairing drawn between complementary RNA strands]
UUCAUUAUGACCUUCGUU UAGCAUGUACGUAGCUAGCUACGAU
41kb EP
growth suppressors growth promoter growth promoter
Whooping cough, Meningitis, Botulism, Dental Cavities, Dysentery, The Black Plague, Syphilis, Tetanus, Scarlet Fever, Yaws, Pneumonia, Gonorrhea, Gastroenteritis, Typhoid Fever, Rocky Mountain Spotted Fever, Rheumatic Fever, Anthrax, Leprosy, Tuberculosis, Diphtheria, Cholera, Strep Throat, Food Poisoning, Lyme Disease, Peptic Ulcers
Genome: 5131416 bp total (chromosome 4969803 bp; pMR-1 161613 bp)
45.9% G-C content; 85.5% of genome is coding
Total genes: 5066
Protein-coding genes: 4938 (97.5%)
tRNA and rRNA genes: 128 (2.5%)
Protein-coding gene categories: genes assigned function, conserved hypothetical genes, hypothetical genes
Category counts on the slide: 1159 (27%), 864 (17.5%), 2915 (59%)
GTCAGTATAGTCGCATTATAGCCGATCTGAGTCAGTCAGTCGTAGTATCGTAGTCAGTCGTACGTAGTCAGTCGTATCAGTCGAGTCAGTCGA GCTAGTCGATCGTATCACTATCATCGTACGTAGTGCTAGTCAGTGTCATCGATGCGTACGTAGTCAGTTACGTAGCATCGTACGTAGTCATGC ATGCTAGCTAGCTAGCTAGCTAGCTACGCGATCGTGCGTATGCGTATATTATATGCGCTAGCAGTCGTAGTACGTAGTACTATGTATGCGTAC GTGATGCTAGTTGCGTACGATAGCGATACGATCAGTCGTATCGATCGTATGCATCGAGAGTCGTAGTAGCGATTAGCGCTAGTCATTATAGTC GTACTTAGGTCGCGGCGATTACGGATAGTCTGATCACGACGTATGAGCTGACGCGGCGATCAGGAAGACCCTCGCGGAGAACCTGAAAGCACG ACATTGCTCACATTGCTTCCAGTATTACTTAGCCAGCCGGGTGCTGGCTTTTTGTACGTACTGAGTCGGCATTATAGCGTATGCATACGGAGT ACGAGTCGTACGGACAGTCGTAGTCAGTCTGATCAGTCAGTCGTAGTCGTATGCAGTCGACGAGTCGTACGTATGCAGTCGATCGTTAGGACT CGTAAGTCGTATCATATCGGATTATAGCATGCTAGAGCTAGTCGTATAGTCTACGAGTTATACGTCTAGTGGCTAGTGTACGTCAGTCGTACG ATGCAGTTAGTAGTCTAGTATTACGATTAGTCGTGATCTGAGTAGTTACGTCGATGGTAGCCATTATACGTACTTAC
Nucleotide probabilities shown: A 0.23, C 0.27, G 0.27, T 0.23 versus A 0.28, C 0.22, G 0.22, T 0.28
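Nucleotide probabilities like these can be estimated by counting bases in a sequence. The following is a minimal sketch under stated assumptions: the function names `base_composition` and `gc_content` are hypothetical, and the input is a short toy sequence rather than data from the slides.

```python
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G and T in seq (other symbols are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {b: counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Return the combined fraction of G and C in seq."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Toy input (an assumption for illustration, not a real genomic region).
toy_sequence = "ATGCATGCTAGTCATC"
composition = base_composition(toy_sequence)
print(composition)
print(gc_content(toy_sequence))
```

Applied to long regions, the same counting would give tables comparable to the two shown above, where G+C is 0.54 in one and 0.44 in the other.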
GTCAGTATAGTCGCATTATAGCCGATCTGAGTCAGTCAGTCGTAGTATCGTAGTCAGTCGTACGTAGTCAGTCGTATCAGTCGAGTCAGTCGA GCTAGTCGATCGTATCACTATCATCGTACGTAGTGCTAGTCAGTGTCATCGATGCGTACGTAGTCAGTTACGTAGCATCGTACGTAGTCATGC ATGCTAGCTAGCTAGCTAGCTAGCTACGCGATCGTGCGTATGCGTATATTATATGCGCTAGCAGTCGTAGTACGTAGTACTATGTATGCGTAC GTGATGCTAGTTGCGTACGATAGCGATACGATCAGTCGTATCGATCGTATGCATCGAGAGTCGTAGTAGCGATTAGCGCTAGTCATTATAGTC GTACTTAGGTCGCGGCGATTACGGATAGTCTGATCACGACGTATGAGCTGACGCGGCGATCAGGAAGACCCTCGCGGAGAACCTGAAAGCACG ACATTGCTCACATTGCTTCCAGTATTACTTAGCCAGCCGGGTGCTGGCTTTTTGTACGTACTGAGTCGGCATTATAGCGTATGCATACGGAGT ACGAGTCGTACGGACAGTCGTAGTCAGTCTGATCAGTCAGTCGTAGTCGTATGCAGTCGACGAGTCGTACGTATGCAGTCGATCGTTAGGACT CGTAAGTCGTATCATATCGGATTATAGCATGCTAGAGCTAGTCGTATAGTCTACGAGTTATACGTCTAGTGGCTAGTGTACGTCAGTCGTACG ATGCAGTTAGTAGTCTAGTATTACGATTAGTCGTGATCTGAGTAGTTACGTCGATGGTAGCCATTATACGTACTTAC GATATCGGTACGTGTCTAGCATGAGTCTATCTATACTGTCGGCGTATCGTACGTATGCGTAATCGATCAGTGTCGTATCGAGTTACGATGCAT GAGTCGTACGTATCGTAGCATGCTAGCTACGATGCTAGCATGCTAGCATCGATGCATGCATGCTGACTAGATCGTACGTAGCTACGTAGTCGT AAGTCGTAGTCGTAGCTAGTTAGCGCGTATAGCGTACGTAGTACGTATCGATGCGTAGTCATTACGACTGATCGTAAGTCGAGCGATCAGCAA GACCCACGAGGAGAACCTGAAGCACGACATTGCTCAATTGCTTCCAGATTACGTAGCCAGGGCCGGGTGCTGGTTTTTCAGTCGTACGTAGCT AGTAGTCGTACTGAGCAGTCTAGCATCGTAGTCATGATTGCGTACGTATCGATCGAGTCGATGCATGTATATATGCCGCGTACTGACGTACGT AGTCTAGCTAGTCATGCTATATACGGCGCTAGTCGTAGTACGTCGTAGTCAGTGTCAGTATCGAGTCATGCATGTCGTACGTATGGCATGGCT AGTCATGGACTAGCTAGTAGCGTACGTAGTCATTATACGTACGTCGTATGATATATTAGCGCCGCGGTGTACTGCGTCGTGTCGTATACTACT GATCTGATCGTAGTACTGCTACGTAGTCGTAGCAGTCGATCGTATGCATGCGTAGTCGTAGTCTAGCTGATCTACGTAGTCGTAGTATGCGTA GTCTAGTCTATGCATTATATGCTATAGTCATGCTAGCATACGT
Genome (1) Genome (2) Shewanella amazonensis, Shewanella baltica, Shewanella denitrificans, Shewanella frigidimarina, Shewanella loihica, Vibrio cholerae, Yersinia pestis, Photorhabdus luminescens, Photobacterium profundum
Rivas and Eddy, BMC Bioinformatics 2:8
A A A A C T G G C T T G T G A A A A G C C G G C T T G A G T
Derive score based on: number of compensatory mutations; length of sequence; sequence structure
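To make the first criterion concrete, here is a toy sketch of counting compensatory mutations between two aligned RNA sequences. It is not the scoring scheme of the cited work: the function name `count_compensatory`, the list of paired columns, and the example hairpins are all assumptions made for illustration. A column pair counts as compensatory when both sequences can base-pair there (Watson-Crick or G-U wobble) and both positions differ between the sequences.

```python
# Base pairs accepted as structure-preserving: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_compensatory(seq1, seq2, paired_columns):
    """Count paired columns where both bases changed but pairing is preserved."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    count = 0
    for i, j in paired_columns:
        pair1 = (seq1[i], seq1[j])
        pair2 = (seq2[i], seq2[j])
        both_pair = pair1 in PAIRS and pair2 in PAIRS
        both_changed = seq1[i] != seq2[i] and seq1[j] != seq2[j]
        if both_pair and both_changed:
            count += 1
    return count


# Hypothetical hairpin: three stem pairs around a four-base loop.
stem = [(0, 9), (1, 8), (2, 7)]
reference = "GCAUUCGUGC"
compensated = "GGAUUCGUCC"   # C-G became G-C at columns (1, 8)
disrupted = "GGAUUCGUGC"     # only column 1 changed, so the pair is broken

print(count_compensatory(reference, compensated, stem))
print(count_compensatory(reference, disrupted, stem))
```

A real score would also weigh sequence length and the overall structure, as the slide lists; this sketch covers only the counting step.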
[Figure: frequency (probability) of score plotted against score. Series: documented RNAs; intergenic regions; EVD for documented RNAs; EVD for intergenic regions.]
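The EVD curves in this figure can be illustrated with a small fitting sketch. It assumes that "EVD" means an extreme value distribution of the Gumbel (type I) form and that it is fitted by the method of moments; neither choice is stated on the slide, and the function names and toy scores below are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) from sample mean and sd."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_pdf(x, mu, beta):
    """Probability density of the Gumbel distribution at x."""
    z = (x - mu) / beta
    return math.exp(-(z + math.exp(-z))) / beta


def gumbel_tail(x, mu, beta):
    """Probability of observing a score of at least x."""
    z = (x - mu) / beta
    return -math.expm1(-math.exp(-z))


# Toy scores (an assumption; not values read from the figure).
toy_scores = [4.2, 5.0, 5.5, 6.1, 6.8, 7.4, 9.0, 12.3]
mu, beta = fit_gumbel_moments(toy_scores)
print(mu, beta, gumbel_tail(15.0, mu, beta))
```

With one fit per group, the tail probability gives a rough sense of how unusual a candidate's score would be among intergenic-region scores.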
[Figure: correlation coefficient and # of IG probes. Series: correlation in non-operon IG regions; correlation in operon IG regions.]
1) Sequence Data: ATGCATGCTAGTCATC GATCGATC
2) Conserved Structure Data
3) DNA Microarray Data
Nucleotide probabilities shown: A 0.28, C 0.22, G 0.22, T 0.28 and A 0.23, C 0.27, G 0.27, T 0.23
general Markov model
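As a much-simplified illustration of scoring sequence data with nucleotide probabilities, the sketch below computes a log-odds score under two independent-position (zero-order) models. It is not the general Markov model itself: the names `MODEL_X`, `MODEL_Y`, and `log_odds` are hypothetical, and the slides do not say which probability table belongs to which class of sequence.

```python
import math

# The two nucleotide probability tables shown on the slides.
# Which table describes which kind of region is not stated, so the
# labels X and Y are placeholders.
MODEL_X = {"A": 0.23, "C": 0.27, "G": 0.27, "T": 0.23}
MODEL_Y = {"A": 0.28, "C": 0.22, "G": 0.22, "T": 0.28}


def log_odds(seq, model_x=MODEL_X, model_y=MODEL_Y):
    """Sum of per-base log2 ratios; positive values favour model_x.

    Raises KeyError for any symbol other than A, C, G or T.
    """
    return sum(math.log2(model_x[b] / model_y[b]) for b in seq.upper())


print(log_odds("GGCC"))   # G/C-rich window
print(log_odds("ATAT"))   # A/T-rich window
```

A fuller model would add state transitions and the structure and expression evidence; this sketch shows only how composition alone shifts a score.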
[Figure: sensitivity (% actual ncRNAs found) plotted against 1.0 - specificity (false positive rate), both from 0% to 100%. Curves labeled (1), (2), (3), (QRNA), (1,2), (1,3), (2,3), (1,2,3), where (1) = primary sequence, (2) = conserved structure, (3) = expression data.]
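Each point on such a curve comes from one score threshold. A minimal sketch of that calculation follows, assuming each candidate has a numeric score and a known label; the function name `roc_point` and the toy data are hypothetical.

```python
def roc_point(scores, labels, threshold):
    """Return (sensitivity, false positive rate) when calling score >= threshold.

    labels[i] is True when candidate i is a real ncRNA.
    """
    tp = fp = fn = tn = 0
    for score, is_ncrna in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_ncrna:
            tp += 1
        elif predicted and not is_ncrna:
            fp += 1
        elif is_ncrna:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)          # fraction of actual ncRNAs found
    false_positive_rate = fp / (fp + tn)  # equals 1.0 - specificity
    return sensitivity, false_positive_rate


# Toy candidates (an assumption; not data from the figure).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [True, True, False, True, False]
print(roc_point(toy_scores, toy_labels, threshold=0.35))
```

Sweeping the threshold from high to low traces out one curve; repeating this for each data source or combination would give the set of curves compared in the figure.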
Fur repressor is active when bound to Fe2+. Fe2+ limitation induces RyhB expression.
Shewanella ryhB ~100 nt
spot42 negatively regulates translation of galK but does not affect galE translation. spot42 expression increases the GalE:GalK ratio. Thus, glucose induces spot42 expression.
~110 nt