Large Scale Knowledge Representation of Distributed Biomedical Information

SLIDE 1

TMRA 07

Large Scale Knowledge Representation of Distributed Biomedical Information

Volker Stümpflen, Thorsten Barnickel, Karamfilka Nenova

MIPS / Institute for Bioinformatics, GSF – National Research Center for Environment and Health

SLIDE 2

Understanding Complex Biological Systems

[Figure: from data to knowledge]

SLIDE 3

Systems Biology

SLIDE 4

Questions

  • Different knowledge domains?
  • Ontologies for semantic structuring?
  • Semantic structures from free text?
  • Knowledge representation from distributed resources?

SLIDE 5

Merging Knowledge from Different Domains

SLIDE 6

Semantic Structuring Demands for Ontologies

  • Life sciences have a long tradition in classification …
      • … various ontologies are available and in use
  • Ontologies (in the broadest sense):
      • Controlled vocabularies
      • Taxonomies
      • Frames
  • Examples of ontologies:
      • MeSH terms, Gene Ontology (GO), FunCat, …
      • Many more from e.g. Open Biomedical Ontologies (http://obofoundry.org/)

SLIDE 7

Example: Extending the Functional Context of Proteins
SLIDE 8

Semantic Structuring and Knowledge Representation

[Figure: system overview with several hundred distributed biomedical resources (> 1-2 PetaByte), a distributed access system, a Web Service, Topic Map generation, text mining, and a knowledge portal]

SLIDE 9

Knowledge in Free Text

[Figure: from free text to a Topic Map. Example text: "… of pathogen response genes that prevent disease progression. The expression of ERF1 can be activated rapidly by ethylene or jasmonate and can be activated synergistically by both hormones. In addition, both signalling …"]

SLIDE 10

REBIMET

  • Relation Extraction from Biomedical Texts

SLIDE 11

Entity Recognition

  • Identification of relevant biological entities:
      • Based on synonym lists created from terms in taxonomies, gene names, …
      • Realized with Apache Lucene
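The slide names the approach (synonym lists matched with Apache Lucene) but not the matching details. As a rough illustration only, the sketch below does dictionary-based recognition in plain Python instead of Lucene. Each synonym is normalized to lower-case tokens and mapped to an entity identifier, and a greedy longest-match scan then tags token spans in a sentence. The identifiers such as `GENE:ERF1` and the synonym entries are hypothetical placeholders, not entries from the real lists.

```python
import re

def tokenize(text):
    # Lower-case word tokens; hyphenated names such as "ERF-1" stay one token.
    return re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower())

def build_dictionary(synonyms):
    """Map each tokenized synonym (as a tuple) to its entity identifier."""
    dictionary = {}
    for entity_id, names in synonyms.items():
        for name in names:
            dictionary[tuple(tokenize(name))] = entity_id
    return dictionary

def recognize(sentence, dictionary):
    """Greedy longest match; returns (normalized matched text, entity id) pairs."""
    tokens = tokenize(sentence)
    max_len = max(len(key) for key in dictionary)
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in dictionary:
                hits.append((" ".join(key), dictionary[key]))
                i += n
                break
        else:
            i += 1  # no synonym starts at this token
    return hits

# Hypothetical synonym lists; real ones would be built from taxonomies and gene names.
SYNONYMS = {
    "GENE:ERF1": ["ERF1", "ethylene response factor 1"],
    "CHEM:ETHYLENE": ["ethylene"],
    "CHEM:JASMONATE": ["jasmonate", "jasmonic acid"],
}

dictionary = build_dictionary(SYNONYMS)
sentence = ("The expression of ERF1 can be activated rapidly by ethylene "
            "or jasmonate and can be activated synergistically by both hormones.")
print(recognize(sentence, dictionary))
```

Longest match is why "ethylene response factor 1" is tagged as the gene, not the hormone. An inverted index such as Lucene's replaces the linear dictionary scan for large synonym lists.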

SLIDE 12

Information Extraction with Semantic Role Labeling and Co-occurrence

  • 1. Semantic Role Labeling with the ASSERT tool (Pradhan S. et al., 2005):
      • 1.1 SPA structure for verb a)
      • 1.2 SPA structure for verb b)
  • 2. Information Extraction
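The two steps can be made concrete with a minimal sketch. It assumes the semantic role labeler has already produced one SPA structure per verb, modeled here as a predicate with PropBank-style labelled arguments (ARG0 for the agent, ARG1 for the entity acted upon). It also assumes entity recognition has already tagged the entities inside each argument. A relation is emitted when both ARG0 and ARG1 contain a recognized entity. As a weaker fallback, entities that merely co-occur in one sentence are paired. The data structure, the "srl"/"cooccurrence" labels and the example SPA content are illustrative assumptions, not ASSERT's actual output format.

```python
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class SPA:
    """One predicate with its labelled arguments (illustrative structure)."""
    predicate: str
    arguments: dict  # role label -> argument text
    # role label -> entity ids recognized inside that argument
    entities: dict = field(default_factory=dict)

def relations_from_spa(spa):
    """Pair every entity in ARG0 with every entity in ARG1 under the predicate."""
    return [(agent, spa.predicate, patient, "srl")
            for agent in spa.entities.get("ARG0", [])
            for patient in spa.entities.get("ARG1", [])]

def cooccurrence(sentence_entities):
    """Fallback: unordered pairs of distinct entities found in one sentence."""
    unique = sorted(set(sentence_entities))
    return [(a, "co-occurs with", b, "cooccurrence")
            for a, b in combinations(unique, 2)]

# Passive sentence: the "by" phrase is ARG0, the grammatical subject is ARG1.
spa = SPA(
    predicate="activate",
    arguments={"ARG1": "The expression of ERF1",
               "ARG0": "by ethylene or jasmonate",
               "ARGM-MNR": "rapidly"},
    entities={"ARG1": ["GENE:ERF1"],
              "ARG0": ["CHEM:ETHYLENE", "CHEM:JASMONATE"]},
)

for relation in relations_from_spa(spa):
    print(relation)
print(cooccurrence(["GENE:ERF1", "CHEM:ETHYLENE", "CHEM:JASMONATE"]))
```

The role labels give the relation a direction (hormone activates gene). Co-occurrence pairs carry no direction, so they are returned in sorted order and tagged separately.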
SLIDE 13

Simplified TM Representation

  • Generation of Topic Map fragments
  • Connection to evidence in text by reification
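Below is a minimal sketch of what one such fragment could look like. It uses plain Python data classes rather than a Topic Maps engine. An extracted relation becomes an association between two topics. The association is reified by a separate topic, and that reifier carries an occurrence holding the sentence the relation came from, so each statement can be traced back to its textual evidence. All identifiers under `http://example.org/`, the `evidence-sentence` occurrence type and the function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

BASE = "http://example.org/"  # hypothetical namespace for all identifiers

@dataclass
class Topic:
    subject_identifiers: set
    names: list
    occurrences: list = field(default_factory=list)  # (occurrence type, value)

@dataclass
class Association:
    association_type: str
    roles: dict                      # role type -> player topic
    reifier: Optional[Topic] = None  # topic standing for the association itself

def fragment_for_relation(agent, predicate, patient, sentence, sentence_id):
    """Build one TM fragment: two player topics plus a reified association."""
    agent_topic = Topic({BASE + "entity/" + agent}, [agent])
    patient_topic = Topic({BASE + "entity/" + patient}, [patient])
    # The reifier represents "this assertion"; its occurrence holds the evidence.
    reifier = Topic({BASE + "assertion/" + sentence_id},
                    [f"{agent} {predicate} {patient}"],
                    [("evidence-sentence", sentence)])
    association = Association(BASE + "relation/" + predicate,
                              {"agent": agent_topic, "patient": patient_topic},
                              reifier)
    return [agent_topic, patient_topic, reifier], [association]

topics, associations = fragment_for_relation(
    "ethylene", "activates", "ERF1",
    "The expression of ERF1 can be activated rapidly by ethylene or jasmonate "
    "and can be activated synergistically by both hormones.",
    "sentence-1",
)
print(associations[0].association_type)
print(associations[0].reifier.occurrences[0][1])
```

Reification keeps the association itself clean (just typed roles). Provenance such as the sentence, or a confidence score, hangs off the reifying topic instead.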

SLIDE 14

Screenshot: Portal

  • PSI-based merging of the text-mining model with the genome model
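Topic Maps merge two topics when they share a subject identifier (PSI). The sketch below applies that single rule to two small topic lists, one standing in for the text-mining model and one for the genome model. A union-find structure collapses chains of shared identifiers (A shares a PSI with B, B with C) into one topic. Only subject identifiers are considered here. The full merging rules of the Topic Maps Data Model also cover subject locators, item identifiers and reification, which this sketch leaves out. The PSIs and topic contents are invented for illustration.

```python
def merge_by_psi(*topic_lists):
    """Merge topics that share at least one subject identifier (PSI).

    Each topic is a dict with 'psis' (set of str) and 'names' (set of str).
    Returns a list of merged topics.
    """
    topics = [t for lst in topic_lists for t in lst]
    parent = list(range(len(topics)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner = {}  # PSI -> index of the first topic seen with it
    for i, topic in enumerate(topics):
        for psi in topic["psis"]:
            if psi in owner:
                parent[find(i)] = find(owner[psi])
            else:
                owner[psi] = i

    merged = {}
    for i, topic in enumerate(topics):
        target = merged.setdefault(find(i), {"psis": set(), "names": set()})
        target["psis"] |= topic["psis"]
        target["names"] |= topic["names"]
    return list(merged.values())

PSI = "http://example.org/psi/"  # hypothetical namespace
textmining = [{"psis": {PSI + "gene/ERF1"}, "names": {"ERF1"}},
              {"psis": {PSI + "chem/ethylene"}, "names": {"ethylene"}}]
genome = [{"psis": {PSI + "gene/ERF1", PSI + "protein/ERF1"},
           "names": {"ethylene response factor 1"}}]

for topic in merge_by_psi(textmining, genome):
    print(sorted(topic["names"]), sorted(topic["psis"]))
```

Because merging depends only on shared identifiers, neither model has to know about the other in advance. Both only need to use the same PSIs for the same subjects.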

SLIDE 15

Large Scale Integration and Knowledge Representation

[Figure: system overview with a distributed access system, a Web Service, Topic Map generation, and text mining]

SLIDE 16

GeKnow: Integration of PEDANT, SIMAP, NCBI data, NCBI PubMed

  • PEDANT 3: ~600 GB
      • contains 450 genomes, each stored in a single MySQL database
      • no possibility of simultaneous cross-genome comparison
  • SIMAP: ~540 GB compressed
      • contains over 7 million unique protein sequences
  • NCBI
      • Taxonomy information (some thousands)
  • Text mining from PubMed
      • 16 million abstracts, 65 million hits, 15 million sentences, 13 million SPA structures
  • Integration of these data on the fly
      • Semantic linking of PEDANT databases with SIMAP and NCBI Taxonomy
      • No redundant data

SLIDE 17

How to Generate the Topic Maps?

  • Problems with the generation of one large TM:
      • Very large data collections (storage problems)
      • Distributed data sources
      • Update problems
  • Generation of TM fragments
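One way to read "generation of TM fragments" is that the full map is never materialized. When a client asks about a topic, only that topic and the associations it takes part in are built from the underlying source. The sketch below uses a toy in-memory row list as the source and returns a one-hop fragment for a requested topic. In the real system the rows would come from the distributed resources, and the row format, function names and data here are hypothetical.

```python
from collections import defaultdict

# Hypothetical source data: (subject, association type, object) rows.
SOURCE_ROWS = [
    ("ERF1", "activated-by", "ethylene"),
    ("ERF1", "activated-by", "jasmonate"),
    ("ERF1", "encoded-in", "genome-A"),
    ("PR1", "encoded-in", "genome-A"),
]

def build_index(rows):
    """Stand-in for the database indexes a real source would provide."""
    index = defaultdict(list)
    for row in rows:
        subject, _, obj = row
        index[subject].append(row)
        index[obj].append(row)
    return index

def fragment(topic_id, index):
    """Return the requested topic plus its direct neighbours and associations."""
    associations = index.get(topic_id, [])  # .get does not insert into a defaultdict
    neighbours = {t for s, _, o in associations for t in (s, o)} - {topic_id}
    return {"topic": topic_id,
            "neighbours": sorted(neighbours),
            "associations": list(associations)}

index = build_index(SOURCE_ROWS)
print(fragment("ERF1", index))
```

Each fragment is small and always built from current source data. This sidesteps the storage and update problems listed above, at the cost of one source lookup per request.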

SLIDE 18

System Architecture (GeKnow)

  • Extension of our n-tier, J2EE-based, component- and service-oriented architecture (EJBs and Web Services)
  • Simply by adding some semantic components …
  • … and one semantic tier

SLIDE 19

Concept:

  • Independent semantic layer on top of arbitrary data sources

[Figure: semantic level with TM and semantic manager (merging, fragments); integration level with resource manager; Web Service; configuration]

SLIDE 20

Integration Tier

  • Resource:
      • Aware of the mapping between topic / association types and methods of the data source
  • Handler:
      • Proxy
      • Manages connections
      • Executes query methods
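The division of labour between Resource and Handler can be sketched as follows. An in-memory SQLite database stands in for a real data source such as one of the PEDANT MySQL databases. The `Resource` only knows which query answers which topic or association type. The `Handler` is a proxy that opens the connection lazily, reuses it, and executes the query. The class interfaces, table layout, type names and SQL are assumptions made for this example.

```python
import sqlite3

class Handler:
    """Proxy for one data source: manages the connection, executes queries."""

    def __init__(self, dsn):
        self._dsn = dsn
        self._conn = None

    def _connection(self):
        if self._conn is None:  # open lazily on first use, then reuse
            self._conn = sqlite3.connect(self._dsn)
        return self._conn

    def execute(self, sql, params=()):
        return self._connection().execute(sql, params).fetchall()

class Resource:
    """Knows which query method serves which topic / association type."""

    def __init__(self, handler, mapping):
        self.handler = handler
        self.mapping = mapping  # type name -> SQL text

    def query(self, type_name, *params):
        if type_name not in self.mapping:
            raise KeyError(f"resource has no method for type {type_name!r}")
        return self.handler.execute(self.mapping[type_name], params)

handler = Handler(":memory:")
handler.execute("CREATE TABLE protein (id TEXT, name TEXT, genome TEXT)")
handler.execute("INSERT INTO protein VALUES ('p1', 'ERF1', 'g1'), ('p2', 'PR1', 'g1')")

resource = Resource(handler, {
    "protein": "SELECT id, name FROM protein WHERE id = ?",
    "encoded-in": "SELECT id, genome FROM protein WHERE genome = ? ORDER BY id",
})
print(resource.query("protein", "p1"))
print(resource.query("encoded-in", "g1"))
```

Keeping the type-to-query mapping out of the handler means the same handler class can serve any SQL source. Only the mapping changes per resource.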

SLIDE 21

Syntax Tier – Topic Types

  • Converts resource-specific format into TM fragments
  • May access multiple resources (handled by Resource Manager)

SLIDE 22

Syntax Tier – Association Types

  • Converts resource-specific format into TM fragments
  • May access multiple resources (handled by Resource Manager)
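For both topic types and association types, the syntax tier performs a format conversion. In this minimal sketch, rows arrive from the resource layer as plain tuples and are written out as XTM 2.0-style XML with the standard library's ElementTree. The element names follow XTM 2.0 as we understand it, but the output has not been validated against the schema. The row layouts, type names and PSI namespace are hypothetical. The type topics (`#protein`, `#encoded-in`, …) are only referenced, on the assumption that they are defined elsewhere and merged in.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/"
PSI = "http://example.org/psi/"  # hypothetical PSI namespace

def _ref(parent, tag, target_id):
    """Append <tag><topicRef href="#target_id"/></tag> to parent."""
    wrapper = ET.SubElement(parent, tag)
    ET.SubElement(wrapper, "topicRef", href="#" + target_id)

def topic_from_row(fragment, row, topic_type):
    """Topic types: an (id, name) row becomes a <topic> element."""
    topic_id, name = row
    topic = ET.SubElement(fragment, "topic", id=topic_id)
    ET.SubElement(topic, "subjectIdentifier", href=PSI + topic_type + "/" + topic_id)
    _ref(topic, "instanceOf", topic_type)
    ET.SubElement(ET.SubElement(topic, "name"), "value").text = name
    return topic

def association_from_row(fragment, row, assoc_type, role_types):
    """Association types: a row of player ids becomes an <association>."""
    association = ET.SubElement(fragment, "association")
    _ref(association, "type", assoc_type)
    for role_type, player in zip(role_types, row):
        role = ET.SubElement(association, "role")
        _ref(role, "type", role_type)
        ET.SubElement(role, "topicRef", href="#" + player)
    return association

# xmlns is written as a plain attribute for brevity; tags stay unqualified.
fragment = ET.Element("topicMap", xmlns=XTM_NS, version="2.0")
topic_from_row(fragment, ("p1", "ERF1"), "protein")
association_from_row(fragment, ("p1", "g1"), "encoded-in", ("protein", "genome"))
print(ET.tostring(fragment, encoding="unicode"))
```

Each converter needs only its row layout and type names. This fits the slide's point that one converter may draw on several resources as long as the Resource Manager hands it rows.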

SLIDE 23

Semantic Tier

[Figure: Configuration]

  • Responsible for
      • fragment generation
      • merging
  • No programming required (only configuration)
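"No programming required" suggests that new topic types are declared rather than coded. A minimal sketch of that idea: a JSON configuration lists, for each topic type, which resource to ask and which columns hold the identifier and the name. One generic function then turns every configured type into fragment entries and merges entries that share an identifier. The configuration keys, resource names and stub resources are all hypothetical. A real deployment would plug in the actual resource managers.

```python
import json

CONFIG = json.loads("""
{
  "topic_types": {
    "protein":  {"resource": "pedant",   "id": 0, "name": 1},
    "organism": {"resource": "taxonomy", "id": 0, "name": 1}
  }
}
""")

# Stub resources standing in for real resource managers (hypothetical rows).
RESOURCES = {
    "pedant": lambda type_name: [("p1", "ERF1"), ("p2", "PR1")],
    "taxonomy": lambda type_name: [("t1", "Organism A")],
}

def generate_fragments(config, resources):
    """Generic generator: one loop serves every configured topic type."""
    fragments = {}
    for type_name, spec in config["topic_types"].items():
        for row in resources[spec["resource"]](type_name):
            topic_id = row[spec["id"]]
            # Merge by identifier: a topic seen twice keeps all types and names.
            topic = fragments.setdefault(topic_id, {"types": set(), "names": set()})
            topic["types"].add(type_name)
            topic["names"].add(row[spec["name"]])
    return fragments

for topic_id, topic in generate_fragments(CONFIG, RESOURCES).items():
    print(topic_id, sorted(topic["types"]), sorted(topic["names"]))
```

To add a type, add one more entry under `topic_types`. The generator code stays unchanged.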

SLIDE 24

Portal / Portlets (JSR-168)

SLIDE 25

Portal

  • Currently JSF-based
      • Caused several problems
  • Migration to more generic portlets (XSLT-based)

SLIDE 26

What's Left?

  • GeKnow is intended to become Open Source
  • Visualization?
  • Topic Maps:
      • Query language?
      • Constraint language?
      • OWL?
      • XTM fragment exchange?
  • Where are we? Just before the killer application
  • Where are Topic Maps in the life sciences?
      • (German) national level: Helmholtz Society-funded Systems Biology Initiative
      • The technology platform across Helmholtz centers will use Topic Maps

SLIDE 27

Conclusion

  • Aim: solving complex biomedical questions
      • Semantic knowledge representation
      • Text mining
      • Integration of heterogeneous distributed data on the fly (fits well with existing enterprise information systems)
      • Representation within a JSR-168 portal/portlet solution
  • Topic Maps are suited to represent even some 100 million topics / associations

SLIDE 28

Acknowledgements

  • Filka Nenova, Thorsten Barnickel, Richard Gregory, Matthias Oesterheld, Roland Arnold, Minh-Duc Truong, …
  • Thomas Rattei
  • Ulrich Güldener, Martin Münsterkötter
  • Andreas Ruepp and the Annotation Group
  • Funding: Impuls- und Vernetzungsfonds der Helmholtz-Gemeinschaft Deutscher Forschungszentren e.V. (Initiative and Networking Fund of the Helmholtz Association)