
Vol. 23 ISMB/ECCB 2007, pages i577–i586
BIOINFORMATICS
doi:10.1093/bioinformatics/btm227

A graph-based approach to systematically reconstruct human transcriptional regulatory modules

Xifeng Yan¹, Michael R. Mehan²,†, Yu Huang², Michael S. Waterman², Philip S. Yu¹ and Xianghong Jasmine Zhou²,*

¹IBM T. J. Watson Research Center, Hawthorne NY and ²Program in Molecular and Computational Biology, University of Southern California, Los Angeles CA, USA

ABSTRACT

Motivation: A major challenge in studying gene regulation is to systematically reconstruct transcription regulatory modules, which are defined as sets of genes that are regulated by a common set of transcription factors. A commonly used approach for transcription module reconstruction is to derive coexpression clusters from a microarray dataset. However, such results often contain false positives because genes from many transcription modules may be simultaneously perturbed upon a given type of conditions. In this study, we propose and validate that genes, which form a coexpression cluster in multiple microarray datasets across diverse conditions, are more likely to form a transcription module. However, identifying genes coexpressed in a subset of many microarray datasets is not a trivial computational problem.

Results: We propose a graph-based data-mining approach to efficiently and systematically identify frequent coexpression clusters. Given m microarray datasets, we model each microarray dataset as a coexpression graph, and search for vertex sets which are frequently densely connected across ⌈θm⌉ datasets (0 ≤ θ ≤ 1). For this novel graph-mining problem, we designed two techniques to narrow down the search space: (1) partition the input graphs into (overlapping) groups sharing common properties; (2) summarize the vertex neighbor information from the partitioned datasets onto the 'Neighbor Association Summary Graph's for effective mining. We applied our method to 105 human microarray datasets, and identified a large number of potential transcription modules, activated under different subsets of conditions. Validation by ChIP-chip data demonstrated that the likelihood of a coexpression cluster being a transcription module increases significantly with its recurrence. Our method opens a new way to exploit the vast amount of existing microarray data accumulation for gene regulation study. Furthermore, the algorithm is applicable to other biological networks for approximate network module mining.

Availability: http://zhoulab.usc.edu/NeMo/

Contact: xjzhou@usc.edu

*To whom correspondence should be addressed.

†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.

© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 INTRODUCTION

Reverse-engineering transcriptional regulatory networks is one of the key challenges for computational biology (Conlon et al., 2003; Luscombe et al., 2004; Pilpel et al., 2001; Segal et al., 2003; Wang et al., 2005). Microarray technology, with its ability to simultaneously measure the expression of thousands of genes, has revolutionized the way of studying gene transcription. A commonly used analytical approach is to derive coexpression clusters, which are presumably likely to be controlled by the same transcription factors (Banerjee and Zhang, 2002; Liu et al., 2001; Roth et al., 1998; Zhou et al., 2003). However, this assumption is not always true, because (1) one type of experimental condition may simultaneously perturb multiple regulatory programs, such that genes from these different regulatory programs may show similar and indistinguishable expression patterns; (2) even if the regulation of those genes can be traced to the same transcription factors, they may be located in different positions of transcription cascades, and thus not share the same direct regulators and (3) experimental noise and outliers may lead to biased and erroneously high estimates of coexpression similarity.

The rapid accumulation of microarray data has offered new promises in addressing the above problems; however, the potential is so far not well recognized and vastly under-utilized. Intuitively, if a set of genes form a coexpression cluster in multiple datasets generated under different conditions, they are more likely to represent a transcription module than a single-occurrence cluster does (Zhou et al., 2005). Here, we define a transcription module to be a set of genes regulated by the same transcription factor(s). The challenge is how to efficiently identify such gene sets. Although a variety of approaches have been developed to cluster a microarray dataset (Eisen et al., 1998; Tamayo et al., 1999; Tavazoie and Church, 1998) they cannot be easily extended to identify gene sets coexpressed across a subset of given microarray datasets. The difficulty is that two factors must be simultaneously determined: first, which set of genes can recurrently form a cluster; second, in which subset of microarrays does this set of genes form clusters. It is even harder if (1) we consider that not all genes within a coexpression cluster will strictly exhibit high expression correlation due to measurement noise; and (2) both the number of genes and the number of datasets are large. Since a set of genes may form coexpression clusters only under a small subset of conditions due to the highly dynamic nature of
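
The recurrence criterion in the Results paragraph of the abstract (a vertex set that is densely connected in at least ⌈θm⌉ of the m coexpression graphs) can be illustrated for a single candidate gene set. The Python sketch below is not the NeMo mining algorithm, which must also search over candidate sets. It only checks the criterion, assuming that "densely connected" means edge density at or above a threshold. The function names, the `min_density` parameter and the toy graphs are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def edge_set(pairs):
    """Store undirected edges as frozensets so (u, v) and (v, u) match."""
    return {frozenset(p) for p in pairs}


def density(vertices, edges):
    """Fraction of all vertex pairs within `vertices` that are edges."""
    pairs = list(combinations(sorted(vertices), 2))
    if not pairs:
        return 0.0
    present = sum(1 for u, v in pairs if frozenset((u, v)) in edges)
    return present / len(pairs)


def is_frequent_dense(vertices, graphs, theta, min_density):
    """Check whether `vertices` is dense in at least ceil(theta * m) graphs.

    `min_density` is an assumed threshold, not a value from the paper.
    Returns (meets_criterion, support, required).
    """
    m = len(graphs)
    required = math.ceil(theta * m)
    support = sum(1 for g in graphs if density(vertices, g) >= min_density)
    return support >= required, support, required


# Toy data: four coexpression graphs over hypothetical genes a, b, c, d.
graphs = [
    edge_set([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]),
    edge_set([("a", "b"), ("b", "c")]),
    edge_set([("a", "b")]),
    edge_set([("a", "b"), ("a", "c"), ("b", "c")]),
]

result = is_frequent_dense({"a", "b", "c"}, graphs, theta=0.5, min_density=0.6)
print(result)
```

With θ = 0.5 and m = 4, the set must be dense in at least two graphs. Under the assumed density threshold of 0.6, the candidate {a, b, c} qualifies in the first, second and fourth toy graphs.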
