

slide-1
SLIDE 1

Identification of fever and vaccine-associated gene interaction networks using ontology-based literature mining

Arzucan Özgür

Bogazici University

Junguk Hur, Zuoshuang Xiang, and Yongqun Oliver He

University of Michigan

The VDOSME workshop, ICBO 2012

July 21, 2012

slide-2
SLIDE 2

Motivation

slide-3
SLIDE 3

Fever

Fever is a symptom of abnormal elevation of body temperature, usually as a result of a pathologic process.

Fever-associated genes include PGE2, PLA2, COX-2, PTGES, and many cytokines.

Many vaccines cause fever, but which fever-related genes vaccination perturbs, and how, is unclear.

Goal: Identify gene-gene and gene-vaccine interaction networks that are associated with fever processes using ontology-based literature mining.

slide-4
SLIDE 4

Workflow

slide-5
SLIDE 5

Fever-related literature-derived network

slide-6
SLIDE 6

Literature Corpus

Fever-related articles obtained from PubMed:

“Fever OR Hyperthermia OR Pyrexia OR Febrile OR Pyrexial” → 179,156 articles

Vaccine and fever-related articles:

including the terms “vaccine”, “vaccination”, and their variants (e.g., “vaccines”) → 6,224 articles

including 186 specific vaccine names from VO → 6,537 articles

Sentences of titles and abstracts obtained from:

the BioNLP database at the National Center for Integrative Biomedical Informatics (NCIBI; http://ncibi.org/)

slide-7
SLIDE 7

Vaccine Ontology Support - Motivating Example

  • “These results suggest that the BCG-CWS induces TNF-alpha secretion from DC via TLR2 and TLR4 and that the secreted TNF-alpha induces the maturation of DC per se.” [PMID: 11083809]

– The term “vaccine” or its variants does not occur in the abstract.

– Bacillus Calmette-Guérin (BCG) is a licensed tuberculosis vaccine to protect against infection of Mycobacterium tuberculosis.

We use the Vaccine Ontology for two purposes:

1) Obtain vaccine-related literature.

2) Identify specific vaccine-gene interactions.
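The motivating example can be illustrated with a minimal sketch. This is not VO-SciMiner; it only contrasts a keyword filter with a dictionary lookup. The two-entry term list is a hypothetical stand-in for the 186 VO terms, and the function names are illustrative assumptions.

```python
import re

# Matches "vaccine", "vaccines", "vaccination", "vaccinated", ...
VACCINE_KEYWORD = re.compile(r"\bvaccin\w*", re.IGNORECASE)

# Hypothetical stand-in for a VO-derived vaccine name dictionary.
VO_TERMS = ["BCG", "Bacillus Calmette-Guérin"]


def mentions_vaccine_keyword(text):
    """True if the text contains 'vaccine' or one of its variants."""
    return VACCINE_KEYWORD.search(text) is not None


def vo_terms_found(text, terms=VO_TERMS):
    """Return the dictionary terms that occur in the text as whole tokens."""
    found = []
    for term in terms:
        # Require non-word characters (or text boundaries) around the term.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, re.IGNORECASE):
            found.append(term)
    return found


sentence = (
    "These results suggest that the BCG-CWS induces TNF-alpha secretion "
    "from DC via TLR2 and TLR4 and that the secreted TNF-alpha induces "
    "the maturation of DC per se."
)

print(mentions_vaccine_keyword(sentence))  # keyword filter misses the sentence
print(vo_terms_found(sentence))            # dictionary lookup finds "BCG"
```

A keyword-only filter skips this sentence, while the dictionary lookup retrieves it and also names the specific vaccine, which is what the two purposes listed above require.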

slide-8
SLIDE 8

Vaccine Ontology (VO)

Ontology of the vaccine domain for vaccine data standardization, integration, and analysis.

http://www.violinet.org/vaccineontology/

Classifies a large number of existing vaccines (> 1,000 vaccines) in licensed use, on trial, or in research.

Follows the OBO Foundry principles.

Led by Yongqun “Oliver” He (co-author of this paper).

slide-9
SLIDE 9

VO Terms Obtained from http://www.ontobee.org/

slide-10
SLIDE 10

Gene and Vaccine Name Identification

Gene names tagged using SciMiner (Hur et al., Bioinformatics, 2009)

Dictionary and rule-based system

F-score: 76%

Genes reported as official human gene symbols, based on the HUGO Gene Nomenclature Committee database (http://www.genenames.org/).

VO-SciMiner used to identify vaccine names based on a set of 186 VO terms (Hur et al., BMC Immunology, 2011)

F-score: 95%

slide-11
SLIDE 11

Interaction Extraction

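Example sentence: “IL-2 and IL-15 induced the production of IL-17 and IFN-gamma in a dose dependent manner by PBMCs.”

[Figure: dependency parse tree of the example sentence; protein pairs are labeled “No interaction”, “No interaction”, and “Interaction”]

Path between proteins: good description of the semantic relation between them.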

Stanford Parser is used to generate the dependency parse trees (de Marneffe et al., 2006).
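As a rough sketch of how a path between two proteins can be read off a dependency tree, the following finds the shortest path with a breadth-first search. The edge list is written by hand as an assumption standing in for real parser output (the `conj_and` edge between IL-2 and IL-15 in particular is assumed), and the function names are illustrative.

```python
from collections import deque

# Hand-written (head, relation, dependent) edges for the example sentence.
# These are an assumption, not actual Stanford Parser output.
EDGES = [
    ("induced", "nsubj", "IL-2"),
    ("IL-2", "conj_and", "IL-15"),
    ("induced", "dobj", "production"),
    ("production", "prep_of", "IL-17"),
    ("IL-17", "conj_and", "IFN-gamma"),
]


def build_graph(edges):
    """Undirected adjacency list that remembers the relation on each edge."""
    graph = {}
    for head, rel, dep in edges:
        graph.setdefault(head, []).append((dep, rel))
        graph.setdefault(dep, []).append((head, rel))
    return graph


def dependency_path(graph, start, end):
    """Shortest path from start to end as alternating words and relations."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for neighbor, rel in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = (node, rel)
                queue.append(neighbor)
    if end not in parent:
        return None
    # Walk back from end to start, then reverse.
    path = [end]
    node = end
    while parent[node] is not None:
        prev, rel = parent[node]
        path.extend([rel, prev])
        node = prev
    path.reverse()
    return path


GRAPH = build_graph(EDGES)
print(" – ".join(dependency_path(GRAPH, "IL-2", "IL-17")))
print(" – ".join(dependency_path(GRAPH, "IL-17", "IFN-gamma")))
```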

slide-12
SLIDE 12

Path Edit Kernel

Minimum number of operations (insertion, deletion, or substitution of a single word) to transform the first string to the second.

Path1: IL2 – nsubj – induced – dobj – production – prep_of – IL-17

Path2: IL2 – nsubj – induced – dobj – production – prep_of – IL-17 – conj_and – IFN-gamma

Path3: IL-17 – conj_and – IFN-gamma

Edit distance (Path1 -> Path2) = 2 (2 insertions)

Edit distance (Path1 -> Path3) = 8 (6 deletions + 2 insertions)

Convert to Similarity Function:

$EditSim(p_i, p_j) = e^{-\gamma \, EditDist(p_i, p_j)}$

  • Integrated as a kernel function into the SVMlight package (Joachims, 1999).
  • 56% F-score on AIMED, 85% F-score on CB (Erkan et al., EMNLP, 2007; Ozgur et al., Journal of Biomedical Semantics, 2011).

slide-13
SLIDE 13

Gene-gene interaction networks

[Figure: gene-gene interaction networks built from three article sets]

Articles containing the term “vaccine” and its variants

Articles containing the term “vaccine” and its variants + terms in the Vaccine Ontology (VO)

Gene-gene interactions in all fever-related articles

slide-14
SLIDE 14

Generic fever-related network

slide-15
SLIDE 15

Vaccine/VO-associated fever-related network

slide-16
SLIDE 16

Centrality Analysis

slide-17
SLIDE 17

Degree Centrality

The number of nodes a given node is connected to

Measures the extent of influence a node has on the network

The more neighbors a node has, the more important it is

[Figure: example network with nodes x, y, and z] Degree centrality of x = 5; of y = 2

$k_i = \sum_{j=1}^{n} A_{ij}$
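The degree formula is a row sum of the adjacency matrix A. The sketch below applies it to a small hypothetical graph chosen so that x has degree 5 and y has degree 2, as in the slide; the actual figure is not recoverable, so the edge list is an assumption.

```python
# Hypothetical undirected graph: x has five neighbors, y has two.
EDGES = [("x", "a"), ("x", "b"), ("x", "c"), ("x", "d"), ("x", "y"), ("y", "z")]


def adjacency_matrix(edges):
    """Return (sorted node list, symmetric 0/1 adjacency matrix)."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return nodes, matrix


def degree_centrality(edges):
    """k_i = sum over j of A_ij, i.e. the row sum for node i."""
    nodes, matrix = adjacency_matrix(edges)
    return {node: sum(row) for node, row in zip(nodes, matrix)}


DEGREE = degree_centrality(EDGES)
print(DEGREE["x"], DEGREE["y"])  # 5 2
```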

slide-18
SLIDE 18

Eigenvector Centrality

Proportional to the sum of the centralities of the neighbors of a given node.

$x_i = \lambda^{-1} \sum_{j=1}^{n} A_{ij} x_j$

In matrix representation: λx = Ax

For non-negative centrality vector: λ is the largest eigenvalue of A and x is the corresponding eigenvector

Not all neighbors contribute equally to the centrality of a node

Defined as “prestige” in social networks

The prestige of a person depends not only on how many friends he has, but also on who (how prestigious) his friends are
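One common way to obtain the leading eigenvector is power iteration. The sketch below is an illustration under assumptions: the edge list is a hypothetical example graph, and the iteration multiplies by A + I rather than A, which keeps the same eigenvectors but avoids oscillation on bipartite graphs such as trees.

```python
# Hypothetical undirected example graph (a small tree centered on x).
EDGES = [("x", "a"), ("x", "b"), ("x", "c"), ("x", "d"), ("x", "y"), ("y", "z")]


def eigenvector_centrality(edges, iterations=200):
    """Return (lambda, centrality dict) via power iteration on A + I."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    x = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # One multiplication by (A + I), followed by normalization.
        nxt = {n: x[n] + sum(x[m] for m in neighbors[n]) for n in nodes}
        norm = sum(v * v for v in nxt.values()) ** 0.5
        x = {n: v / norm for n, v in nxt.items()}

    # Rayleigh quotient x^T A x estimates the largest eigenvalue of A
    # (x has unit length after normalization).
    ax = {n: sum(x[m] for m in neighbors[n]) for n in nodes}
    lam = sum(ax[n] * x[n] for n in nodes)
    return lam, x


LAMBDA, CENTRALITY = eigenvector_centrality(EDGES)
for node in sorted(CENTRALITY, key=CENTRALITY.get, reverse=True):
    print(node, round(CENTRALITY[node], 3))
```

In this graph, y ranks above the leaves a to d even though y has only two neighbors, because one of its neighbors is the highly central node x.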

slide-19
SLIDE 19

Closeness Centrality

Inverse sum of the geodesic distances from a given node to the other nodes in the network

The closer a node is to the other nodes, the more important it is.

Geodesic distance: length of the shortest path between node i and node j ($d_{ij}$)

[Figure: example network with nodes x and y]

$closeness_i = \left[ \sum_{j=1}^{n} d_{ij} \right]^{-1}$
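A minimal sketch of the closeness formula for an unweighted, connected graph, using breadth-first search for the geodesic distances. The edge list is a hypothetical example, not the network shown in the slide.

```python
from collections import deque

# Hypothetical undirected, connected example graph.
EDGES = [("x", "a"), ("x", "b"), ("x", "c"), ("x", "d"), ("x", "y"), ("y", "z")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def geodesic_distances(adj, source):
    """Shortest-path lengths from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(edges):
    """closeness_i = 1 / sum_j d_ij; assumes the graph is connected."""
    adj = build_adjacency(edges)
    result = {}
    for node in adj:
        dist = geodesic_distances(adj, node)
        result[node] = 1.0 / sum(d for other, d in dist.items() if other != node)
    return result


CLOSENESS = closeness_centrality(EDGES)
print(CLOSENESS["x"], CLOSENESS["y"], CLOSENESS["z"])
```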

slide-20
SLIDE 20

Betweenness Centrality

For a node i: sum over all pairs of nodes of the proportion of shortest paths passing through i

Control of a node over the information flow of the network

A node is important if it is on many geodesics

[Figure: example network with nodes x and y]

$g_{jk}(i)$: # of geodesics between j and k passing over i

$g_{jk}$: total # of geodesics between j and k

$Betweenness_i = \sum_{j<k} g_{jk}(i) / g_{jk}$

slide-21
SLIDE 21

Genes that rank high in both networks

  • 7 genes
  • well studied in both contexts

slide-22
SLIDE 22

Genes that rank high in generic fever network

  • 7 genes
  • not well studied in vaccine context

slide-23
SLIDE 23

Genes that rank high in vaccine/VO fever network

  • 7 genes
  • well studied in vaccine context

slide-24
SLIDE 24

Gene Set Enrichment Analysis

slide-25
SLIDE 25

Gene Set Enrichment Analysis

The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used.

997 significantly over-represented functional terms (GO or KEGG) in the fever network.

239 significantly over-represented functional terms (GO or KEGG) in the VO-associated fever network.

New scientific hypotheses can be generated (e.g., the role of the phosphorylation process in the vaccine-induced fever response).

slide-26
SLIDE 26

Top 10 most significantly enriched biological functions

Values are –log10(Benjamini-Hochberg corrected P-values)
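For readers unfamiliar with the plotted quantity, the sketch below shows how a –log10 Benjamini-Hochberg value can be computed from raw P-values. It is a generic illustration, not DAVID's implementation, and the P-values are invented for the example.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10(values):
    return [-math.log10(v) for v in values]


# Invented raw P-values for four hypothetical functional terms.
RAW_P = [0.03, 0.001, 0.5, 0.01]
ADJUSTED = benjamini_hochberg(RAW_P)
SCORES = neg_log10(ADJUSTED)

for p, adj, score in zip(RAW_P, ADJUSTED, SCORES):
    print(p, round(adj, 4), round(score, 3))
```

Larger values correspond to smaller corrected P-values, so the most significantly enriched terms have the longest bars in such a plot.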

slide-27
SLIDE 27

Gene-Vaccine Interaction Network

slide-28
SLIDE 28

Gene-Vaccine Interactions

SVM pipeline applied to extract gene-vaccine interactions.

1,716 articles containing 2,835 interactions identified.

– 32 articles also related to fever

– 52 sentences with 44 unique interactions identified.

– Specific vaccines: Brucella vaccine RB51, Shigella flexneri vaccine S602, Shigella sonnei strain WRSS1, and Shigella dysenteriae 1 strain WRSd1.

New scientific hypotheses can be generated.

slide-29
SLIDE 29

Fever-related gene-vaccine interaction network

[Figure legend] Green: vaccine; Red: genes; Blue: genes associated with vaccines

New hypothesis: e.g., TLR-vaccine interactions in inducing fever response.

slide-30
SLIDE 30

Conclusion

Identification of fever and fever–vaccine associated gene-gene and gene-vaccine networks

Improved mining performance by VO

Phosphorylation-focused regulation enriched in the fever vaccine-subnetwork suggests its crucial role

Identification of TLRs as potential key factors in vaccine-induced fever responses.

slide-31
SLIDE 31

Future work

Expansion of the networks by

– including more specific vaccines (via improved VO)

– using a sentence-level co-citation approach rather than an SVM-based approach

– integrating the Ontology of Adverse Events (OAE; http://www.oae-ontology.org)

Apply the ontology-based literature mining approach to different domains.

slide-32
SLIDE 32

Acknowledgments

University of Michigan: Junguk Hur, Yongqun He, Zuoshuang Xiang, Rebecca Racz, Dragomir Radev, Eva Feldman, Alex Ade, Brian Athey

Funding: NIH grant R01AI081062 & Marie Curie Career Integration Grant within the 7th European Community Framework Programme.

slide-33
SLIDE 33

Thank you!