
CSE182-L16 Non-coding RNA



  1. CSE182-L16 Non-coding RNA

  2. Biol. Data analysis: Review • Assembly • Protein Sequence Analysis • Sequence Analysis • Gene Finding

  3. Much other analysis is possible • Assembly • Genomic Analysis/Pop. Genetics • Protein Sequence Analysis • Sequence Analysis • Gene Finding • ncRNA

  4. A static picture of the cell is insufficient • Each cell is continuously active: – Genes are being transcribed into RNA – RNA is translated into proteins – Proteins are post-translationally modified and transported – Proteins perform various cellular functions • Can we probe the cell dynamically? (Figure labels: Gene Regulation, Proteomic profiling, Transcript profiling)

  5. ncRNA gene finding • Gene is transcribed but not translated. • What are the clues to non-coding genes? – Look for signals selecting start of transcription and translation. Many non-coding genes (for example, tRNA genes) are transcribed by Pol III. – Non-coding genes have structure. Look for genomic sequences that fold into an RNA structure. • Structure: Given a sequence, what is the structure into which it can fold with minimum energy?

  6. tRNA structure

  7. RNA structure: Basics • Key: RNA is single-stranded. Think of a string over 4 letters: A, C, G, and U. • The complementary bases form pairs. • Base-pairing defines a secondary structure. The base-pairing is usually non-crossing.

  8. RNA structure: pseudoknots Sometimes, unpaired bases in loops form ‘crossing pairs’. These are pseudoknots

  9. RNA structure prediction • Any set of non-crossing base-pairs defines a secondary structure. • Abstract Question: – Given an RNA string, find a structure that maximizes the number of non-crossing base-pairs – Incorporate the true energetics of folding – Incorporate pseudoknots

  10. A combinatorial problem • Input: • A string over A,C,G,U • A pairs with U, C pairs with G • Output: • A subset of possible base-pairs of maximum size such that • No two base-pairs intersect • How can we compute this set efficiently?
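The constraints of this combinatorial problem can be stated as a small check. The following is a minimal sketch, not part of the slides: the function names `can_pair` and `is_valid_structure` are illustrative, positions are 1-based as on the slides, and only A-U and C-G pairs are allowed, as stated in the input description.

```python
def can_pair(a, b):
    """True if the two bases are complementary (A-U or C-G only)."""
    return {a, b} in ({"A", "U"}, {"C", "G"})


def is_valid_structure(seq, pairs):
    """Check that `pairs` (1-based (i, j) tuples with i < j) is a legal output:
    every pair is complementary, no base is used twice, no two pairs cross."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(seq)):
            return False
        if not can_pair(seq[i - 1], seq[j - 1]):
            return False
        if i in used or j in used:
            return False
        used.update((i, j))
    # Two pairs (i, j) and (k, l) with i < k cross when i < k < j < l.
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            (i, j), (k, l) = sorted([pairs[a], pairs[b]])
            if i < k < j < l:
                return False
    return True


print(is_valid_structure("ACGAUU", [(1, 6), (2, 3), (4, 5)]))
print(is_valid_structure("ACGAUU", [(1, 5), (4, 6)]))
```

The first call describes a nested structure; the second uses two complementary pairs that cross, which is the pseudoknot situation this formulation excludes.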

  11. RNA structure: Nussinov’s algorithm 1. Score B for every base-pair. No penalty for loops. No pseudoknots. 2. Let W(i,j) be the score of the best structure of the subsequence from i to j.

      for i = n down to 1 { for j = i+1 to n {

      $$W(i,j) = \max \begin{cases} B(r_i, r_j) + W(i+1, j-1), \\ W(i, j-1), \\ W(i+1, j), \\ \max_{i \le k < j} \left[ W(i,k) + W(k+1,j) \right] \end{cases}$$

      } }

  12. Obtaining RNA structure

      for i = n down to 1 { for j = i+1 to n {

      $$W(i,j) = \max \begin{cases} B(r_i, r_j) + W(i+1, j-1) & (1) \\ W(i, j-1) & (2) \\ W(i+1, j) & (3) \\ \max_{i \le k < j} \left[ W(i,k) + W(k+1,j) \right] & (4) \end{cases}$$

      if (1) { S(i,j) = / } else if (2) { S(i,j) = | } else if (3) { S(i,j) = - } else { S(i,j) = k }

      } }

  13. Obtaining RNA Structure

      Procedure print_RNA(i,j) {
      if (i >= j) { return; }
      if (S(i,j) = /) { print “(i,j)”; print_RNA(i+1,j-1); }
      else if (S(i,j) = -) { print_RNA(i+1,j); }
      else if (S(i,j) = |) { print_RNA(i,j-1); }
      else { k = S(i,j); print_RNA(i,k); print_RNA(k+1,j); }
      }
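The fill and traceback steps of Nussinov's algorithm can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the lecture's own code: B is taken to be 1 for A-U and C-G pairs and 0 otherwise, empty subsequences score 0, arrays are 0-based internally, and reported pairs are converted back to the 1-based positions used on the slides.

```python
def can_pair(a, b):
    """Assumed scoring: only A-U and C-G count as base-pairs."""
    return {a, b} in ({"A", "U"}, {"C", "G"})


def nussinov(seq):
    """Fill W and the traceback table S, then recover one optimal structure."""
    n = len(seq)
    if n == 0:
        return 0, []
    W = [[0] * n for _ in range(n)]      # W[i][j]: best score for seq[i..j]
    S = [[None] * n for _ in range(n)]   # S[i][j]: which case achieved W[i][j]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            best, mark = -1, None
            if can_pair(seq[i], seq[j]):             # case (1): pair i with j
                best, mark = 1 + W[i + 1][j - 1], "/"
            if W[i][j - 1] > best:                   # case (2): j unpaired
                best, mark = W[i][j - 1], "|"
            if W[i + 1][j] > best:                   # case (3): i unpaired
                best, mark = W[i + 1][j], "-"
            for k in range(i, j):                    # case (4): split at k
                if W[i][k] + W[k + 1][j] > best:
                    best, mark = W[i][k] + W[k + 1][j], k
            W[i][j], S[i][j] = best, mark
    pairs = []
    traceback(S, 0, n - 1, pairs)
    return W[0][n - 1], pairs


def traceback(S, i, j, pairs):
    """Mirror of print_RNA: follow S and collect 1-based base-pairs."""
    if i >= j:
        return
    mark = S[i][j]
    if mark == "/":
        pairs.append((i + 1, j + 1))
        traceback(S, i + 1, j - 1, pairs)
    elif mark == "-":
        traceback(S, i + 1, j, pairs)
    elif mark == "|":
        traceback(S, i, j - 1, pairs)
    else:
        traceback(S, i, mark, pairs)
        traceback(S, mark + 1, j, pairs)


score, pairs = nussinov("ACGAUU")
print(score, pairs)
```

For the sequence ACGAUU this sketch finds three base-pairs. The three nested loops make the running time cubic in the sequence length, and ties between cases are broken in the order (1) to (4), so other equally good structures may exist.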

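  14. RNA structure: example. Sequence A C G A U U (positions 1 to 6). Table of W(i,j), with i across and j down:

      | j \ i | 1 (A) | 2 (C) | 3 (G) | 4 (A) | 5 (U) |
      |-------|-------|-------|-------|-------|-------|
      | 2 (C) | 0     |       |       |       |       |
      | 3 (G) | 1     | 1     |       |       |       |
      | 4 (A) | 1     | 1     | 0     |       |       |
      | 5 (U) | 2     | 2     | 1     | 1     |       |
      | 6 (U) | 3     | 2     | 1     | 1     | 0     |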

  15. RNA Structure: Details

  16. Base-pairing & Loops • Single-stranded • Base-pairs arise from complementary nucleotides • Stack is when 2 base-pairs are contiguous • Loops arise when there are unpaired bases. They are characterized by the number of base-pairs that close them: • Hairpin: closed by 1 base-pair • Bulge/Interior loops (2 base-pairs) • Multiple internal loops (k base-pairs)

  17. Scoring Loops, multi-loops • Zuker-Turner Energy Rules http://www.bioinfo.rpi.edu/~zukerm/rna/energy/node2.html • Stacking Energies • Energy for Bulges and Interior Loops • Energy for Multi-loops
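The loop types above can be told apart by counting the base-pairs directly enclosed by each closing pair. The sketch below assumes the structure is given in dot-bracket notation (a common convention that the slides do not use), with 0-based positions; the function names are illustrative.

```python
def parse_dot_bracket(structure):
    """Map every paired position to its partner (0-based)."""
    partner, stack = {}, []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            open_pos = stack.pop()
            partner[open_pos], partner[pos] = pos, open_pos
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return partner


def classify_loops(structure):
    """Label each base-pair (i, j) by the loop it closes."""
    partner = parse_dot_bracket(structure)
    kinds = {}
    for i, j in partner.items():
        if i > j:
            continue
        # Collect base-pairs directly inside (i, j), skipping over each one.
        children, k = [], i + 1
        while k < j:
            if k in partner:
                children.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not children:
            kind = "hairpin"            # closed by 1 base-pair
        elif len(children) == 1:        # closed by 2 base-pairs
            k, l = children[0]
            left, right = k - i - 1, j - l - 1   # unpaired bases on each side
            if left == 0 and right == 0:
                kind = "stack"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "interior"
        else:
            kind = "multi-loop"         # closed by 3 or more base-pairs
        kinds[(i, j)] = kind
    return kinds


print(classify_loops("((.((...))..))"))
```

An energy-based method would assign a score to each of these labelled loops; the sketch only performs the classification.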

  18. Other tricks for obtaining structure • Alignment and Covariance

  19. RNA: unsolved problems • The structure problem is still unsolved. – De novo prediction does not work as well. – Co-variance models require prior alignment. • Many undiscovered non-coding genes – miRNA, and others have only just been discovered. – Very hard to detect signal for these genes – Random sequence folds into low energy structures.

  20. Other ncRNA: miRNA • ncRNA ~22 nt in length • Pairs to sites within the 3’ UTR, specifying translational repression. • Similar to siRNA (involved in RNAi) • Unlike siRNA, miRNA do not need perfect base complementarity • Until recently, no computational techniques to predict miRNA • Most predictions based on cloning small RNAs from size-fractionated samples

  21. Gene Regulation

  22. Gene expression • The expression of transcripts and protein in the cell is not static. It changes in response to signals. • The expression can be measured using microarrays. • What causes the change in expression?

  23. Transcriptional machinery • RNA polymerase II (Pol II) scans the genome, initiating transcription, and terminating it. • The same machinery is used for every gene, so while Pol II is required, it is not sufficient to confer specificity.

  24. TF binding • Other transcription factors interact with the core machinery and upstream DNA to provide specificity. • TFs bind to TF binding sites which are clustered in upstream enhancer and promoter elements. • The enhancer elements may be located many kb upstream of the core promoter. (Figure labels: Transcription factors, Upstream elements)

  25. TF binding sites • TF binding sites are a weak signal (about 10 bp with 5 bp conserved) • If two genes are co-regulated, they are likely to share binding sites • Discovery of binding site motifs is an important research problem. • Example sites: g1 TCAGGAG, g2 TGAGGAG, g3 TCAGGTG, g4 TGAGGTG, g5 TCAGGTG
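To make the "weak signal" point concrete, the five example sites on this slide (g1 to g5) can be summarized column by column. This is a minimal sketch using only the sites shown; the helper names are illustrative, and real motif discovery also has to locate the sites within longer upstream sequences, which this does not attempt.

```python
from collections import Counter

# The five example binding sites from the slide (genes g1..g5).
sites = ["TCAGGAG", "TGAGGAG", "TCAGGTG", "TGAGGTG", "TCAGGTG"]


def column_counts(sites):
    """Count the bases observed at each aligned position."""
    return [Counter(column) for column in zip(*sites)]


def consensus(sites):
    """Most frequent base at each position."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(sites))


def conserved_positions(sites):
    """0-based positions where every site has the same base."""
    return [p for p, c in enumerate(column_counts(sites)) if len(c) == 1]


print(consensus(sites))
print(conserved_positions(sites))
for position, counts in enumerate(column_counts(sites)):
    print(position, dict(counts))
```

In these sites, five of the seven positions are identical across all genes and two positions vary, which is why an exact string search for a single site would miss some of the others.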

  26. http://www.gene-regulation.com/pub/databases.html#transfac

  27. Discovering TF binding sites • Identification of these TF binding sites/switches is critical. • Requires identification of co-regulated genes (genes containing the same set of switches). • How do we find co-regulated genes?

  28. Idea1: Use orthologous genes from different species 1. The species are too close (e.g. humans and chimps). Binding & non-binding sites are both conserved. Alignment: ACGGCAGCTCGCCGCCGCGC / ACGGC-GGGCGCCGCCCCGC 2. The species are distant. Binding sites are conserved but not other sequence. Alignment: ACGGCAGCTCGCCGCCGC-C / AGTGC-GGGCGCCGCCTCAT 3. The species are very distant. Even binding sites are not conserved. The genes have alternative regulators. Alignment: ACGGC-GC-TCGCCGCCGCGC / AT-ACGAAGTAGCGG-ATGGT
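The three scenarios can be compared by counting identical columns in each alignment shown on the slide. The sketch below assumes a simple definition, identical non-gap columns divided by alignment length; other definitions of percent identity exist, and the function names are illustrative.

```python
def count_matches(a, b):
    """Number of aligned columns where both sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def percent_identity(a, b):
    """Matches as a percentage of the alignment length (gaps included)."""
    return 100.0 * count_matches(a, b) / len(a)


# The three alignments from the slide: close, distant, very distant species.
alignments = {
    "close": ("ACGGCAGCTCGCCGCCGCGC", "ACGGC-GGGCGCCGCCCCGC"),
    "distant": ("ACGGCAGCTCGCCGCCGC-C", "AGTGC-GGGCGCCGCCTCAT"),
    "very distant": ("ACGGC-GC-TCGCCGCCGCGC", "AT-ACGAAGTAGCGG-ATGGT"),
}

for label, (a, b) in alignments.items():
    print(label, count_matches(a, b), round(percent_identity(a, b), 1))
```

Identity drops as the species get further apart. In the intermediate case the surviving matches concentrate in one block, which is the pattern that makes a conserved binding site stand out.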

  29. Idea2: Measure expression of genes • Northern Blot: – Quantitative expression of a few genes

  30. Microarray • Expression level of all genes
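One common way to turn expression measurements into candidate co-regulated genes is to correlate expression profiles across conditions. The sketch below is an assumption about how that could be done, not a method given in the slides: the gene names and expression values are invented toy data, and the 0.9 threshold is arbitrary.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def co_expressed_pairs(profiles, threshold=0.9):
    """Gene pairs whose profiles correlate above `threshold`."""
    return [
        (g, h)
        for g, h in combinations(sorted(profiles), 2)
        if pearson(profiles[g], profiles[h]) > threshold
    ]


# Hypothetical toy data: expression of three genes under four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

print(co_expressed_pairs(profiles))
```

Genes that rise and fall together across conditions become candidates for sharing binding sites; correlated expression alone does not prove shared regulation.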

  31. Protein Expression using MS

  32. Pathways • Proteins interact to transduce signal, catalyze reactions, etc. • The interactions can be captured in a database. • Queries on this database are about looking for interesting sub-graphs in a large graph.
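A pathway database can be viewed as a graph with proteins as nodes and interactions as edges. As a minimal sketch of one simple sub-graph query, the code below finds a shortest chain of interactions between two proteins using breadth-first search. The protein names P1 to P7 and their interactions are hypothetical, and the interactions are assumed to be undirected.

```python
from collections import deque


def build_graph(interactions):
    """Adjacency lists for an undirected interaction graph."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns a list of proteins or None."""
    if source not in graph or target not in graph:
        return None
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical interactions, for illustration only.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"),
                ("P1", "P5"), ("P6", "P7")]
graph = build_graph(interactions)
print(shortest_path(graph, "P1", "P4"))
print(shortest_path(graph, "P1", "P7"))
```

More interesting queries, such as finding a given pathway pattern inside a larger network, are instances of sub-graph matching and are considerably harder than this path search.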

  33. Biological databases in NAR • http://www3.oup.co.uk/nar/database/c • 548 databases in various categories, e.g. GenBank, Rfam, SwissProt, PDB, KEGG, dbSNP/OMIM/SeattleSNPs, Stanford Microarray Database, SWISS-2DPAGE

  34. Summary • Biological databases cannot be understood without understanding the data, and the tools for querying and accessing these data. • While database technology (XML, relational and OO databases, text formats) is used to store this data, its use is (often) transparent to bioinformatics people. • In this course, we looked at various data-streams, and pointed to databases that store these data-streams. • Nucleic Acids Research brings out a database issue every January (2004: 548 databases).
