Vol. 18 Suppl. 1 2002, BIOINFORMATICS, Pages S249–S257

Of truth and pathways: chasing bits of information through myriads of articles

Michael Krauthammer 1, Pauline Kra 1,2, Ivan Iossifov 1,2, Shawn M. Gomez 2, George Hripcsak 1, Vasileios Hatzivassiloglou 4, Carol Friedman 1,3 and Andrey Rzhetsky 1,2

1 Department of Medical Informatics, Columbia University, New York, NY, 10032, USA
2 Columbia Genome Center, Columbia University, New York, NY, 10032, USA
3 Department of Computer Science, Queens College CUNY, Flushing, NY, 11367, USA
4 Department of Computer Science, Columbia University, New York, NY, 10027, USA

Received on January 24, 2002; revised and accepted on April 1, 2002

ABSTRACT
Knowledge on interactions between molecules in living cells is indispensable for theoretical analysis and practical applications in modern genomics and molecular biology. Building such networks relies on the assumption that the correct molecular interactions are known or can be identified by reading a few research articles. However, this assumption does not necessarily hold, as truth is rather an emerging property based on many potentially conflicting facts. This paper explores the processes of knowledge generation and publishing in the molecular biology literature using modelling and analysis of real molecular interaction data. The data analysed in this article were automatically extracted from 50 000 research articles in molecular biology using a computer system called GeneWays containing a natural language processing module. The paper indicates that truthfulness of statements is associated in the minds of scientists with the relative importance (connectedness) of substances under study, revealing a potential selection bias in the reporting of research results. Aiming at understanding the statistical properties of the life cycle of biological facts reported in research articles, we formulate a stochastic model describing generation and propagation of knowledge about molecular interactions through scientific publications. We hope that in the future such a model can be useful for automatically producing consensus views of molecular interaction data.
Contact: ar345@columbia.edu
Keywords: statistical modelling; scientometric analysis; molecular interaction data; natural language processing

INTRODUCTION
Molecular interaction data and corresponding knowledge bases are becoming increasingly important for both academic and commercial undertakings in modern biology (Jeong et al., 2001; Karp, 2000; Karp et al., 1998). As these resources are used more intensively, the updating of manually curated repositories becomes an important issue. Usually, experts determine which information should be included in the repositories, and some databases, such as DIP, invite outside researchers to help curate the growing amount of data (Xenarios et al., 2002). While expert consensus is certainly the de facto standard in determining true molecular interactions, it is becoming increasingly difficult to keep up with the avalanche of information flooding research journals.

Furthermore, there is some concern that biased reporting of research results in the literature may complicate the process of truth finding. Mrowka and colleagues (Mrowka et al., 2001) have recently described significant discrepancies of two-hybrid protein–protein interaction datasets, which were either indirectly compiled from single research publications or directly compiled from genomewide screens. Their data shows a potential selection bias in the literature-based dataset, which ‘may have been introduced by the failure to report interactions which cannot be understood from previous publications, or by failing to perform experiments for such pairs in the first case’.

Elucidating such biases, as well as other complicating factors such as contradicting research results, is the aim of this paper. Our motivation is the direct application of such insights to our system called GeneWays, which automatically collects molecular interaction data from the research literature using a natural language module called GENIES (Friedman et al., 2001). Our goal is to assist experts in building a consensus representation of the extracted molecular information by automating the consensus finding process when there are biased and/or conflicting research results.

© Oxford University Press 2002
