Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions


Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions. Tobias Kuhn (speaker), Loïc Royer, Norbert E. Fuchs, Michael Schroeder. DILS'06, Hinxton (UK), 21 July 2006. Cooperation of University of Zurich and TU Dresden.


SLIDE 1

Improving Text Mining with Controlled Natural Language: A Case Study for Protein Interactions

Tobias Kuhn (speaker), Loïc Royer, Norbert E. Fuchs, Michael Schroeder

DILS'06, Hinxton (UK), 21 July 2006

SLIDE 2

Tobias Kuhn, DILS'06, Hinxton (UK), 21 July 2006

Cooperation of

University of Zurich (Norbert E. Fuchs, Tobias Kuhn)

and

TU Dresden (Loïc Royer, Michael Schroeder)

SLIDE 3

Introduction

• Biomedical literature is growing at a tremendous pace
• PubMed contains 16 million articles and grows by over 600,000 articles per year
• Computational support is needed!

SLIDE 4

Today's Solution

NLP, manual annotation

SLIDE 5

Our Approach

• Let the researchers express their own results in a formal language
• Perfect processing of scientific results by computers
• This formal language has to be ...
  - easy to learn and understand
  - expressive enough to express even complicated scientific results

SLIDE 6

Knowledge Representation Languages

OWL with RDF/XML, Description Logics, first-order logic, ACE, UML

SLIDE 7

Attempto Controlled English (ACE)

• Formal language that looks like natural English
• Unambiguously translatable into first-order logic
• Restricted grammar
• Unlimited vocabulary
• www.ifi.unizh.ch/attempto

SLIDE 8

Formal Summaries

SLIDE 9

Formal Summaries

ACE text:

BubR1 interacts-with a trunk-domain of Beta2-Adaptin.

Logical representation (DRS):

[A, B, C, D]
named(A, BubR1)-1
object(A, atomic, named_entity, object, cardinality, count_unit, eq, 1)-1
named(B, Beta2-Adaptin)-1
object(B, atomic, named_entity, object, cardinality, count_unit, eq, 1)-1
object(C, atomic, trunk-domain, unspecified, cardinality, count_unit, eq, 1)-1
relation(C, trunk-domain, of, B)-1
predicate(D, unspecified, interact_with, A, C)-1
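To illustrate why such a representation is easy for a program to process, here is a minimal Python sketch that holds the DRS of this slide as plain data and reads the interaction back out. The encoding and the names `Condition`, `describe`, and `interactions` are assumptions made for this illustration; they are not part of the Attempto tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    functor: str   # "named", "object", "relation" or "predicate"
    args: tuple    # arguments as listed on the slide


# Hypothetical encoding of the DRS conditions shown on the slide.
REFERENTS = ["A", "B", "C", "D"]
CONDITIONS = [
    Condition("named", ("A", "BubR1")),
    Condition("object", ("A", "atomic", "named_entity", "object",
                         "cardinality", "count_unit", "eq", 1)),
    Condition("named", ("B", "Beta2-Adaptin")),
    Condition("object", ("B", "atomic", "named_entity", "object",
                         "cardinality", "count_unit", "eq", 1)),
    Condition("object", ("C", "atomic", "trunk-domain", "unspecified",
                         "cardinality", "count_unit", "eq", 1)),
    Condition("relation", ("C", "trunk-domain", "of", "B")),
    Condition("predicate", ("D", "unspecified", "interact_with", "A", "C")),
]


def describe(conditions, ref):
    """Return a readable label for a discourse referent."""
    # Prefer a proper name if the referent has one.
    for c in conditions:
        if c.functor == "named" and c.args[0] == ref:
            return c.args[1]
    # Otherwise spell out an "X of Y" relation.
    for c in conditions:
        if c.functor == "relation" and c.args[0] == ref:
            _, noun, prep, owner = c.args
            return f"{noun} {prep} {describe(conditions, owner)}"
    # Fall back to the noun of the object condition.
    for c in conditions:
        if c.functor == "object" and c.args[0] == ref:
            return c.args[2]
    return ref


def interactions(conditions):
    """List (subject, object) pairs of all interact_with predicates."""
    return [
        (describe(conditions, c.args[3]), describe(conditions, c.args[4]))
        for c in conditions
        if c.functor == "predicate" and c.args[2] == "interact_with"
    ]


print(interactions(CONDITIONS))
# [('BubR1', 'trunk-domain of Beta2-Adaptin')]
```

No parsing of free text is involved: the query is a plain lookup over the conditions.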

SLIDE 10

Ontology for Protein Interactions

SLIDE 11

Empirical Study

• “How suitable is ACE together with our ontology to express scientific results of protein interactions?”
• Manual translation of 273 facts about protein interactions
• These facts are subheadings of the “Results” sections of 89 articles (journals by Elsevier)

SLIDE 12

Empirical Study

Total: 154 matched perfectly, 57 matched partially, 62 unmatched

Non-perfect: not covered by the model, relations of relations, fuzzy, not understood (chart values as extracted: 21, 56, 11, 31)
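The chart figures can be checked with a little arithmetic. The sketch below assumes that the three counts 154, 57 and 62 belong to the three matching categories in the order they appear on the slide. It does not try to assign the four remaining values to individual reasons, because that pairing is not recoverable from the slide text.

```python
# Counts from the slide; the pairing with labels is assumed from their order.
counts = {
    "matched perfectly": 154,
    "matched partially": 57,
    "unmatched": 62,
}

total = sum(counts.values())
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
non_perfect = total - counts["matched perfectly"]

# The four values given for the reasons of non-perfect matching.
reason_values = [21, 56, 11, 31]

print(total)                              # 273
print(shares)                             # {'matched perfectly': 56.4, 'matched partially': 20.9, 'unmatched': 22.7}
print(non_perfect, sum(reason_values))    # 119 119
```

The total agrees with the 273 facts of the study, and the four reason values add up to the number of non-perfect matches.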

SLIDE 13

Authoring tool

• Helps writing ACE sentences
• Shows step by step the possible continuations of the sentence
• New words can be created on-the-fly
• Awareness of the underlying ontology
• The users do not need to know the details of the ACE syntax and of the underlying ontology
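
The idea of showing possible continuations can be sketched with a toy example. The two sentence patterns and the small lexicon below are assumptions made for illustration only. They are far simpler than the real ACE grammar and do not reflect how the prototype is implemented.

```python
# Hypothetical lexicon: word category -> known words.
LEXICON = {
    "PROTEIN": {"BubR1", "Beta2-Adaptin", "Act1", "TRAF6"},
    "VERB": {"interacts-with"},
    "DET": {"a"},
    "NOUN": {"trunk-domain"},
    "OF": {"of"},
    "END": {"."},
}

# Hypothetical sentence patterns, each a sequence of word categories.
PATTERNS = [
    ["PROTEIN", "VERB", "PROTEIN", "END"],
    ["PROTEIN", "VERB", "DET", "NOUN", "OF", "PROTEIN", "END"],
]


def continuations(tokens):
    """Return every word that may follow the tokens entered so far."""
    options = set()
    for pattern in PATTERNS:
        if len(tokens) >= len(pattern):
            continue  # this pattern is already exhausted
        # The tokens so far must fit the start of the pattern.
        if all(tok in LEXICON[cat] for tok, cat in zip(tokens, pattern)):
            options |= LEXICON[pattern[len(tokens)]]
    return options


def add_word(category, word):
    """Create a new word on the fly in an existing category."""
    LEXICON[category].add(word)


print(sorted(continuations(["BubR1"])))
# ['interacts-with']
print(sorted(continuations(["BubR1", "interacts-with", "a"])))
# ['trunk-domain']
```

Because the user only ever picks from the offered continuations, every finished sentence fits one of the patterns. Tying the categories to ontology classes would be one way to make such an editor aware of the ontology.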

SLIDE 14

Authoring tool: Prototype demo

http://gopubmed.biotec.tu-dresden.de/AceWiki/

SLIDE 15

Benefits of our Approach

• Consistency / redundancy checks
  - “Is there a paper that contradicts my results?”
  - “Is there a paper that comes to the same or similar results?”
• Answer extraction
  - “Which proteins interact with a certain domain of protein X?”
• Automatically updated knowledge bases
  - “Give me an overview of the relations of a protein X to other proteins!”
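Once results are available as formal facts, the questions above reduce to simple lookups. The sketch below assumes a hypothetical flat `Fact` record and a tiny knowledge base. The article identifiers, the protein `ProteinY`, and the negated fact are invented for illustration. A real system would reason over the logical representation rather than compare tuples.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    subject: str
    relation: str
    target: str
    domain: Optional[str] = None   # domain of the target protein, if stated
    negated: bool = False
    source: str = ""               # hypothetical article identifier


def same_claim(a, b):
    """True if two facts talk about the same relation, ignoring polarity."""
    return (a.subject, a.relation, a.target, a.domain) == \
           (b.subject, b.relation, b.target, b.domain)


def redundant_with(kb, fact):
    """Sources that state the same result."""
    return [f.source for f in kb
            if same_claim(f, fact) and f.negated == fact.negated]


def contradicted_by(kb, fact):
    """Sources that state the opposite result."""
    return [f.source for f in kb
            if same_claim(f, fact) and f.negated != fact.negated]


def interactors(kb, protein, domain):
    """Proteins that interact with the given domain of a protein."""
    return sorted({f.subject for f in kb
                   if f.relation == "interacts-with" and f.target == protein
                   and f.domain == domain and not f.negated})


# Invented knowledge base; only the first two facts echo sentences from the slides.
KB = [
    Fact("BubR1", "interacts-with", "Beta2-Adaptin", "trunk-domain", source="paper-1"),
    Fact("Act1", "interacts-with", "TRAF6", source="paper-2"),
    Fact("ProteinY", "interacts-with", "Beta2-Adaptin", "trunk-domain",
         negated=True, source="paper-3"),
]

my_result = Fact("ProteinY", "interacts-with", "Beta2-Adaptin", "trunk-domain")

print(interactors(KB, "Beta2-Adaptin", "trunk-domain"))   # ['BubR1']
print(contradicted_by(KB, my_result))                     # ['paper-3']
print(redundant_with(KB, Fact("Act1", "interacts-with", "TRAF6")))  # ['paper-2']
```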

SLIDE 16

Conclusions

• Formal summaries for scientific articles can make text mining easier and more powerful
• ACE combines the power of ontologies with the convenience of natural language
• Let the researchers formalize their own results!

SLIDE 17

Thank you for your attention! Questions & Discussion

SLIDE 18

Subheadings: Example

SLIDE 19

Degree of Matching: Examples

• Matched perfectly:
  - Interaction of Act1 with TRAF6
    → Act1 interacts-with TRAF6.
• Matched partially:
  - The mtFabD protein is part of the core of the FAS-II complex
    → MtFabD is a subunit of FAS-II.
• Unmatched:
  - Cav1 interacts differentially with distinct Dyn2 forms

SLIDE 20

Reasons for Non-perfect Matching: Examples

• Not covered by the model:
  - Daxx Potentiates Fas-Mediated Apoptosis
• Relations of relations:
  - Kal-GEF1 activation of Pak does not require GEF activity
• Fuzzy:
  - ANKRD1 contains potential CASQ2 binding sequences located in both its NT- and CT-regions
• Not understood:
  - hSrb7 does not interact with other nuclear receptors