MIMIR: Multi-paradigm Information Management Index and Repository

SLIDE 1

MIMIR: Multi-paradigm Information Management Index and Repository

Valentin Tablan, Niraj Aswani, Ian Roberts

University of Sheffield

SLIDE 2

2 University of Sheffield, NLP 2009 GATE Summer School, Sheffield

MIMIR

□ … is an IR engine that can search over:

○ Text
○ Semantic Annotations
○ Ontologies and Knowledge Bases

...represented as GATE documents

□ … is built on top of:

○ Ontotext ORDI
○ MG4J text indexing engine

SLIDE 3

Semantic Annotation

□ … is an annotation process where [parts of] the schema (annotation types, annotation features) are ontological objects.

□ … is different from:

○ Ontology learning
○ Ontology population (though it sometimes includes it)

SLIDE 4

Semantic Annotation

SLIDE 5

Under the Hood

[Architecture diagram with components: Document Collection, Ontology + Knowledge Base, Mentions Index, and multiple Token Indexes (Token Index, Token Index, Token Index, ...)]

SLIDE 6

A Mimir Configuration

□ Text fields

○ string (the document text, downcased)
○ root (morphological root of each word)
○ category (part-of-speech of each word)

□ Annotations

○ Measurement (indexed features: type, dimension)
○ Reference (indexed feature: type)
○ Section (indexed feature: type)
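
As a rough illustration of what such a configuration implies, here is a minimal Python sketch. It is not Mimir's real configuration format or API, which the slides do not show. The names CONFIG, build_token_indexes and indexed_features, and the sample tokens, are hypothetical. The sketch builds one small inverted index per text field and keeps only the annotation features that the configuration lists.

```python
from collections import defaultdict

# Hypothetical stand-in for the configuration on this slide;
# this is NOT the real Mimir configuration format.
CONFIG = {
    "text_fields": ["string", "root", "category"],
    "annotations": {
        "Measurement": ["type", "dimension"],
        "Reference": ["type"],
        "Section": ["type"],
    },
}


def build_token_indexes(tokens, config=CONFIG):
    """Build one inverted index (term -> token positions) per text field."""
    indexes = {field: defaultdict(list) for field in config["text_fields"]}
    for position, token in enumerate(tokens):
        for field in config["text_fields"]:
            indexes[field][token[field]].append(position)
    return indexes


def indexed_features(annotation_type, features, config=CONFIG):
    """Keep only the features configured as indexed for this annotation type."""
    wanted = config["annotations"].get(annotation_type, [])
    return {name: value for name, value in features.items() if name in wanted}


# Made-up sample tokens for the phrase "it was bright".
tokens = [
    {"string": "it", "root": "it", "category": "PRP"},
    {"string": "was", "root": "be", "category": "VBD"},
    {"string": "bright", "root": "bright", "category": "JJ"},
]
indexes = build_token_indexes(tokens)
```

With this toy data, looking up "be" in the root index finds the token "was", and a Reference annotation keeps only its type feature.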

SLIDE 7

Query Types (basic)

□ Text. Matches plain text.

Syntax: sequence of words
Example: device for measurement of light intensity

□ Annotation. Matches annotations.

Syntax: {Type feature1=value1 feature2=value2...}
Example: {Measurement type=scalarValue}

□ Sequence Query. Sequence of other queries.

Syntax: Query1 [n..m] Query2...
Example: up to {Measurement} [1..5] {Measurement}
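
The gap in a sequence query can be pictured with a small Python sketch. It assumes that every hit is a (start, end) pair of token offsets with an exclusive end, and that [n..m] means "between n and m tokens separate the two hits". Both are assumptions made for illustration, and the function name sequence is hypothetical. This is not how Mimir itself evaluates queries.

```python
def sequence(first_hits, second_hits, min_gap=0, max_gap=0):
    """Join hits of two sub-queries when the gap between them is in range.

    Each hit is a (start, end) pair of token offsets, end exclusive.
    The gap is the number of tokens between the end of the first hit
    and the start of the second hit.
    """
    results = []
    for start1, end1 in first_hits:
        for start2, end2 in second_hits:
            gap = start2 - end1
            if min_gap <= gap <= max_gap:
                # The combined hit spans both sub-hits and the gap.
                results.append((start1, end2))
    return sorted(results)


# Made-up hit positions for {Measurement}.
measurements = [(2, 4), (6, 8), (20, 22)]
pairs = sequence(measurements, measurements, min_gap=1, max_gap=5)
```

Here only the first two measurements are close enough (two tokens apart), so the single combined hit covers tokens 2 to 8.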

SLIDE 8

Query Types (inclusion)

□ IN Query. Hits of one query, only if contained in hits of another.

Syntax: Query1 IN Query2
Example: London IN {Reference}

□ OVER Query. Hits of a query, only if overlapping hits of another.

Syntax: Query1 OVER Query2
Example: {Reference} OVER London
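
A minimal Python sketch of the two inclusion operators, treating hits as (start, end) token offsets with an exclusive end. The function names hits_in and hits_over are hypothetical. IN is read here as full containment, and OVER follows the slide's wording, "overlapping", read as sharing at least one token. A stricter reading of OVER would require the first hit to contain the second entirely. Both readings are assumptions for illustration only.

```python
def hits_in(hits, container_hits):
    """Q1 IN Q2: keep hits of Q1 that lie entirely inside some hit of Q2."""
    return [
        (start, end)
        for start, end in hits
        if any(c_start <= start and end <= c_end for c_start, c_end in container_hits)
    ]


def hits_over(hits, other_hits):
    """Q1 OVER Q2: keep hits of Q1 that overlap some hit of Q2.

    Overlap is taken to mean sharing at least one token position.
    """
    return [
        (start, end)
        for start, end in hits
        if any(start < o_end and o_start < end for o_start, o_end in other_hits)
    ]


# Made-up positions: "London" occurs twice, one {Reference} spans tokens 2..6.
london = [(3, 4), (10, 11)]
reference = [(2, 6)]

london_in_reference = hits_in(london, reference)
reference_over_london = hits_over(reference, london)
```

London IN {Reference} returns the London hit at token 3, while {Reference} OVER London returns the Reference annotation itself.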

SLIDE 9

Query Types (advanced)

□ Named Index. Search different text indexes.

Syntax: indexName:term
Example: root:be [matches is, am, was, were, ...]

□ Kleene. Specified number of repeats.

Syntax: Query +n, Query +n..m
Example: {Measurement}[2], category:JJ[1..3]
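
The repeat operator can be sketched in Python as chaining hits that follow each other directly. The sketch assumes that hits are (start, end) token offsets with an exclusive end, that repeats must be adjacent with no gap, and that at least one repeat is required. The function name kleene is hypothetical, and none of this is taken from Mimir's implementation.

```python
def kleene(hits, min_repeats, max_repeats):
    """Return spans made of min_repeats..max_repeats directly adjacent hits.

    Two hits are adjacent when the second starts where the first ends.
    Assumes min_repeats >= 1.
    """
    ends_by_start = {}
    for start, end in hits:
        ends_by_start.setdefault(start, []).append(end)

    results = set()

    def extend(start, end, count):
        # Record the span once it contains enough repeats.
        if count >= min_repeats:
            results.add((start, end))
        # Try to append one more adjacent hit, up to the maximum.
        if count < max_repeats:
            for next_end in ends_by_start.get(end, []):
                extend(start, next_end, count + 1)

    for start, end in hits:
        extend(start, end, 1)
    return sorted(results)


# Made-up positions of adjectives (category:JJ): three in a row, one isolated.
adjectives = [(0, 1), (1, 2), (2, 3), (5, 6)]
runs = kleene(adjectives, 2, 3)
```

With these positions, runs of two or three adjectives are found at tokens 0 to 2, 0 to 3 and 1 to 3. The isolated adjective at token 5 does not qualify.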

SLIDE 10

MIMIR ancestry: ANNIC

SLIDE 11

MIMIR v. ANNIC: Index size

[Bar chart: Index Size (times raw input) for ANNIC and MIMIR v. 0.1, v. 0.2, v. 0.3, v. 0.4 and v. 1.0. Bar labels as extracted: 87.77, 103.04, 21.51, 6.9, 8.33, 0.82.]

SLIDE 12

MIMIR v. ANNIC: Features

                      ANNIC        Mimir
Annotation features   All (+)      Only configured (-)
Hit details           Full (+)     Text only (-)
JAPE Compatible       Yes (+)      Partial (-)
Scalability           Poor (-)     Very Good (+)
Index Size            Large (-)    ~ Input (+)
Search Speed          Fair (-)     Fast (+)

SLIDE 13

DEMO!