Topics Biological background Gene Regulation: Bioinformatic - PDF document

Gene Regulation: Bioinformatic aspects
Jaak Vilo
CS theory days, Koke, 4.2.04

Topics
• Biological background
• Computational methods/challenges
• Current projects

300+ cell types (+ the brain has ~10,000)
Image: David S. Goodsell, http://www.scripps.edu/pub/goodsell

ATCGCTGAATTCCAATGTG

Central dogma
DNA:     TTA AGC TCC GTA GCA
mRNA:    UUA AGC UCC GUA GCA
Protein: Leu Ser Ser Val Ala

Levels of DNA structure (figure: Level 0 to Level 6)
A eukaryotic genome can be thought of as six levels of DNA structure. The loops at Level 4 range from 0.5 kb to 100 kb in length. If these loops were stabilized, then the genes inside the loop would not be expressed.
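The central dogma example on the slide can be checked mechanically. The following is a minimal sketch, assuming the DNA shown is the coding strand and using the standard genetic code; the function names `transcribe` and `translate` are illustrative and not from the talk.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon into one-letter amino acid codes ('*' = stop)."""
    dna = mrna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


mrna = transcribe("TTAAGCTCCGTAGCA")
protein = translate(mrna)
print(mrna, protein)  # UUAAGCUCCGUAGCA LSSVA
```

The one-letter result LSSVA corresponds to Leu Ser Ser Val Ala on the slide.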

DNA determines function (?)
DNA → Protein → Structure → Function?
• DNA: 4 nucleotides (GenBank / EMBL Bank)
• Protein: 20+ amino acids, 3 nt → 1 AA (SwissProt/TrEMBL)
• Structure: PDB/Molecular Structure Database

A Simple Metabolic Pathway

A Simple Gene
Upstream/promoter | Downstream
DNA: ATCGAAAT
     TAGCTTTA
(Shoshanna Wodak, Jacques van Helden)

Regulation of gene expression (transcription)

Gene regulation
• Determines
  • the development (from embryo)
  • cell types
  • processes of the cell
  • response to the environment
  • …
• Regulation happens at different levels

Model of RNA Polymerase II Transcription Initiation Machinery
The machinery depicted here encompasses over 85 polypeptides in ten (sub)complexes: core RNA polymerase II (RNAPII) consists of 12 subunits; TFIIH, 9 subunits; TFIIE, 2 subunits; TFIIF, 3 subunits; TFIIB, 1 subunit; TFIID, 14 subunits; core SRB/mediator, more than 16 subunits; Swi/Snf complex, 11 subunits; Srb10 kinase complex, 4 subunits; and SAGA, 13 subunits.
F.C.P. Holstege, E.G. Jenning, J.J. Wyrick, Tong Ihn Lee, C.J. Hengartner, M.R. Green, T.R. Golub, E.S. Lander, and R.A. Young. Dissecting the Regulatory Circuitry of a Eukaryotic Genome. Cell 95: 717-728 (1998)

Regulation of splicing

Regulation by binding to DNA/RNA
• Protein binding can affect splicing
• 4^6 = 4096, 4^8 = 65,536 (~65,000)
(figure values: 80 %, 15 %, 5 %)

Tissue specific alternative splicing

Regulation of Alternative Splicing
• Which splice variants in which cells?
• Are there cell type specific splicing regulators and signals in DNA/RNA?
• Find genes that have an exon switched on specifically in tissue X
• Is there a common signal for all such exons or splicing events?

Data based on EST technology (Meelis Kull)
Gene    Variant  T1  T2  T3  T4  T5  T6  T7  T8  T9  T10  sum
Gene 1  V1       15   1   1   0   0   3   0   1   2    1    6
        V2        5   0   1   3   3   2   2   9   4    9   38
        V3        3   0   1   2   1   0   0   1   0    1    9
        V4        1   0   0   1   0   0   0   0   1    0    3
        V5        0   0   0   0   1   0   0   0   0    3    4
Gene 2  V1        8   1   3   4   1   1   2  11   3   12   46
        V2        3   0   3   0   0   0   2   7   0    4   19
        V3        0   0   0   0   1   1   1   0   0    1    4
        V4        2   0   0   0   0   0   1   2   1    2    8
        V5        0   0   0   0   0   0   0   1   0    1    2
        V6        0   0   1   0   0   0   1   0   0    2    4
Gene 3  V1       16   1   3   5   2   4   3  17   7   18   76
        V2        7   0   1   2   0   2   2   6   4    8   32
        V3        1   0   0   0   0   1   2   1   1    1    7

How to study the gene regulation with computational methods?
• What data is available?
• How to combine them meaningfully?
• Algorithms (is the analysis feasible)?
• Actual analysis
• Interpret the results

Core data (Static)
• DNA sequence(s)
• Genes
• Protein sequences
• Relation to other species
• Protein structure (???)
• Partial knowledge about function
  • how to capture this formally?
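The figures 4^6 and 4^8 on the binding slide are the numbers of distinct DNA words of length 6 and 8. A small sketch, assuming a plain four-letter alphabet without wildcards (the helper names are illustrative), shows how quickly the space of candidate binding sites grows with word length:

```python
from itertools import product


def motif_space_size(k: int, alphabet: str = "ACGT") -> int:
    """Number of distinct words of length k over the alphabet."""
    return len(alphabet) ** k


def all_kmers(k: int, alphabet: str = "ACGT"):
    """Lazily enumerate every word of length k."""
    return ("".join(letters) for letters in product(alphabet, repeat=k))


for k in (6, 8):
    print(k, motif_space_size(k))  # 6 4096, then 8 65536
```

Exhaustive enumeration is feasible at these lengths; allowing wildcard positions or longer words makes the space much larger.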

Phylogenetic tree
www.tolweb.org

There is not yet enough information:
• Alternative splicing
• Protein modifications
• RNA genes, short genes, …
• Gene regulation and networks
• DNA and RNA structure and their effects
• Protein structure, exact function, and role in biological processes
• Metabolic and signal transduction pathways
• Variation in the population
• Not to mention taking all of this into account at the level of the organism…

Expression data (dynamics)
• Low-throughput methods
• Expressed Sequence Tags (EST)
  • RNA sequences
• DNA microarrays for gene expression
  • Relative abundance of RNA in cell
• Genome-wide localization studies
  • binding of proteins to DNA
• Proteomics
  • Amount of proteins in cells

Study of sequence features
Promoters vs. background: is there something unique in the promoter regions?
a) promoters
b) random (other regions)

Phylogenetic footprinting
• Study the same gene in many species: human, ape, mouse, fish, chicken, …
• If preserved during evolution then must be important for something!!!

Similar function or role → same regulation?
• This may or may not be true
• How do we actually know that they are behaving similarly?
• Different regulation mechanisms may achieve the same effect

Upstream vs genomic random
Proteasome: GGTGGCAAA
Proteasome: -1:GGTGGCAAA
Proteasome: -2:GGTGGCAAA

Proteasome: -3:GGTGGCAAA

Proteasome movie
• Movies\proteasome.wmv

S. pombe GO+genome
(figure labels: ATG; W and C strands)
Cytosolic Ribosome: 187 vs. 4897 genes in total
• -1: ..[AG][AG][AG]CAGTCAC[AG]..  Homol-D  121 vs. 249  Probability < 1e-117
• -1: ..[AG]CCCTA[CA]CCT..  Homol-E  58 vs. 159

Dynamics?
• Which genes regulate others
• When and how genes are "switched on or off"?
• What is the global relationship between genes
• How to model the gene regulation?
  • Continuous stochastic processes responding to the external stimuli

Experimental data?
• What data can we start with?
• What is known or hypothesised so far?
• Can one test the new hypotheses in practice?
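Counts such as "121 vs. 249" come from asking how many upstream sequences in a gene group, and in the whole genome, contain a given pattern. The sketch below shows that counting step for the Homol-D pattern from the slide. The toy sequences and gene names are invented for illustration, only the given strand is searched, and the "-1" mismatch allowance shown on the slide is not modelled.

```python
import re

# Homol-D pattern as written on the slide; [AG] matches either purine.
HOMOL_D = re.compile(r"[AG][AG][AG]CAGTCAC[AG]")


def count_sequences_with_motif(pattern, sequences):
    """Count sequences that contain the pattern at least once."""
    return sum(1 for seq in sequences.values() if pattern.search(seq.upper()))


# Hypothetical upstream regions, not real S. pombe data.
toy_upstream = {
    "geneA": "TTTAGACAGTCACGTT",
    "geneB": "CCCCCCCCCCCC",
    "geneC": "gagcagtcacaTT",
}

hits = count_sequences_with_motif(HOMOL_D, toy_upstream)
print(f"{hits} of {len(toy_upstream)} sequences contain the pattern")
```

Running the same count on a functional group and on all genes gives the two numbers that a significance test would then compare.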

TIGR 32k Human Arrays

Analysis of biological samples with microarrays
culture 1, culture 2 → mRNA → cDNA → hybridise → LASER, scanning → DB
Eisen et al., PNAS 98
Spellman et al., Mol Biol Cell 98

From microarray images to gene expression data
• Raw data: array scans (images)
• Intermediate data: image quantifications (spots, spot/image quantitations)
• Final data: samples × genes (gene expression levels)

Hughes, T. R. et al.: "Functional Discovery via a Compendium of Expression Profiles", Cell 102 (2000), 109-126.

Tumor classification: 1) class prediction 2) class discovery
Golub et al., Science, Oct 15th 1999 (ALL vs. AML)
• 38 samples of acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL)
• 6817 genes
• classifier built based on the 50 best correlated genes
• tested on 34 new samples, 29 of them predicted accurately
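To make the class prediction step concrete, here is a simplified sketch in the spirit of the approach summarised above: rank genes by how well they separate the two classes, keep the best ones, and let each selected gene cast a weighted vote. The signal-to-noise score and the voting rule are stated here as assumptions about the method, the selection by absolute score is a simplification, and the expression values are toy numbers, not data from the study.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Class-separation score for one gene: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def train(samples, labels, n_genes):
    """Return the n_genes best-separating genes as (gene index, score, midpoint)."""
    group_a = [s for s, lab in zip(samples, labels) if lab == "ALL"]
    group_b = [s for s, lab in zip(samples, labels) if lab == "AML"]
    ranked = []
    for g in range(len(samples[0])):
        va = [s[g] for s in group_a]
        vb = [s[g] for s in group_b]
        score = signal_to_noise(va, vb)
        midpoint = (mean(va) + mean(vb)) / 2
        ranked.append((abs(score), g, score, midpoint))
    ranked.sort(reverse=True)
    return [(g, score, mid) for _, g, score, mid in ranked[:n_genes]]


def predict(model, sample):
    """Weighted vote: a positive total favours ALL, otherwise AML."""
    total = sum(score * (sample[g] - mid) for g, score, mid in model)
    return "ALL" if total > 0 else "AML"


# Toy expression matrix: 6 samples x 3 genes (hypothetical values).
samples = [
    [5.0, 1.0, 3.0], [6.0, 1.2, 2.0], [5.5, 0.8, 4.0],
    [1.0, 5.0, 3.5], [1.5, 6.0, 2.5], [0.5, 5.5, 3.0],
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

model = train(samples, labels, n_genes=2)
print(predict(model, [5.2, 1.1, 3.0]))  # ALL
print(predict(model, [0.8, 5.8, 3.0]))  # AML
```

On the slide's figures, 29 correct predictions out of 34 test samples corresponds to an accuracy of about 85 %.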

Cluster of co-expressed genes, pattern discovery in regulatory regions
• Genome Research 1998;
• Algorithm: Meelis Kull and J.V., ISMB (Intelligent Systems in Mol. Biol.) 2000

Gene expression data
• Snapshots in time to various stimuli, conditions, tissues, time
• Approximate information about the level of gene expression (RNA transcripts)
• Limited granularity of time
• Limited accuracy
• Data size is large => need fast methods

Pattern selection criteria
• Binomial distribution
• Background: ALL upstream sequences; pattern π occurs in 5 out of 25, p = 0.2
• Cluster: π occurs 3 times (in 6 sequences)
• P(3,6,0.2) is the probability of having ≥ 3 matches in 6 sequences
• P(π,3,6,0.2) = 0.0989
(Vilo et al., ISMB 2000)

The most improbable patterns from the best clusters
Pattern              Probability  Cluster size  Occurrences in cluster  Total nr of occurrences  K in K-means
AAAATTTT             2.59E-43     96            72                      830                       60
ACGCG                6.41E-39     96            75                      1088                      50
ACGCGT               5.23E-38     94            52                      387                       40
CCTCGACTAA           5.43E-38     27            18                      23                        220
GACGCG               7.89E-31     86            40                      284                       38
TTTCGAAACTTACAAAAAT  2.08E-29     26            14                      18                        450
TTCTTGTCAAAAAGC      2.08E-29     26            14                      18                        325
ACATACTATTGTTAAT     3.81E-28     22            13                      18                        280
GATGAGATG            5.60E-28     68            24                      83                        84
TGTTTATATTGATGGA     1.90E-27     24            13                      18                        220
GATGGATTTCTTGTCAAAA  5.04E-27     18            12                      18                        500
TATAAATAGAGC         1.51E-26     27            13                      18                        300
GATTTCTTGTCAAA       3.40E-26     20            12                      18                        700
GATGGATTTCTTG        3.40E-26     20            12                      18                        875
GGTGGCAA             4.18E-26     40            20                      96                        180
TTCTTGTCAAAAAGCA     5.10E-26     29            13                      18                        250
CGAAACTTACAAA        5.10E-26     29            13                      18                        290
GAAACTTACAAAAATAAA   7.92E-26     21            12                      18                        650
TTTGTTTATATTG        1.74E-25     22            12                      18                        600
ATCAACATACTATTGT     3.62E-25     23            12                      18                        375
ATCAACATACTATTGTTA   3.62E-25     23            12                      18                        625
GAACGCGCG            4.47E-25     20            11                      13                        260
GTTAATTTCGAAAC       7.23E-25     24            12                      18                        400
GGTGGCAAAA           3.37E-24     33            14                      31                        475
ATCTTTTGTTTATATTGA   7.19E-24     19            11                      18                        675
TTTGTTTATATTGATGGA   7.19E-24     19            11                      18                        475
GTGGCAAA             1.14E-23     28            18                      137                       725

Significance of the patterns
• The pattern probability vs. the average silhouette for the cluster
• The same for randomised clusters
(Vilo et al., ISMB 2000)
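The binomial criterion from the pattern selection slide can be reproduced in a few lines. This is a minimal sketch, assuming each cluster sequence matches the pattern independently with the background probability p; the function name `binomial_tail` is illustrative.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Probability of at least k matches among n sequences, each matching with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Background: pattern in 5 of 25 upstream sequences, so p = 0.2.
# Cluster: pattern in 3 of 6 sequences.
p_background = 5 / 25
print(round(binomial_tail(3, 6, p_background), 4))  # 0.0989
```

A smaller tail probability means the pattern is more over-represented in the cluster than the background rate would explain, which is how the rows of the pattern table are ranked.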
