

SLIDE 1

Structured Document Retrieval

Benjamin Piwowarski

DCC

October 28, 2004

B. Piwowarski (DCC) · Structured Document Retrieval · October 28, 2004 · 1 / 55

SLIDE 2

General Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 3

Structured Document Retrieval / Motivations

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 4

Structured Document Retrieval / Motivations

Motivations for SDR

Fact

  ◮ Traditional IR is about finding documents relevant to a user’s information need, e.g. an entire book.
  ◮ SDR allows users to retrieve document components that are more focussed on their information needs (e.g. a chapter of a book instead of the entire book).
  ◮ The structure of documents is exploited to identify which document components to retrieve.

SLIDE 5

Structured Document Retrieval / Motivations

Aims of SDR

The aim of SDR is to return

  ◮ document components of varying granularity (e.g. a book, a chapter, a section, a paragraph, a table, a figure, etc.)
  ◮ relevant to the user’s information need with regard to both content and structure

Fact

  ◮ SDR involves the same tasks as the conceptual model for IR
  ◮ but with different inner functionality (e.g. indexing, query formulation, retrieval, result presentation, feedback, ...)

SLIDE 6

Structured Document Retrieval / Motivations

SDR Concepts

Like in IR

  ◮ Indexing of queries and documents into an adequate representation
  ◮ A score (RSV) between the query and document representations
  ◮ Feedback can be used to update both document and query representations

But

  ◮ Documents, and possibly queries, are structured: plain Vector Space Models are no longer adequate
  ◮ Feedback is (for now) not used

SLIDE 7

Structured Document Retrieval / Concepts

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 8

Structured Document Retrieval / Concepts

Queries for SDR I

Content-only (CO) queries

  ◮ Standard IR queries, but here we retrieve document components
  ◮ e.g. “Santiago metro”

Structure-only queries

  ◮ Usually not that useful from an IR perspective
  ◮ e.g. “Paragraph containing a diagram next to a table”

SLIDE 9

Structured Document Retrieval / Concepts

Queries for SDR II

Content-and-structure (CAS) queries

  ◮ Put constraints on which types of components are to be retrieved
  ◮ e.g. “Sections of an article in El Mercurio about congestion charges”
  ◮ e.g. “Articles that contain sections about congestion charges in Santiago, and that contain a picture of Joaquin Jose Lavin Infante”

SLIDE 10

Structured Document Retrieval / Concepts

Queries: examples I

CO query

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE inex_topic SYSTEM "topic.dtd">
  <inex_topic topic_id="162" query_type="CO" ct_no="1">
    <title>Text and Index Compression Algorithms</title>
    <description>Any type of coding algorithm for text and index
    compression</description>
    <narrative>We have developed an information retrieval system
    implementing compression techniques for indexing documents. We are
    interested in improving the compression rate of the system while
    preserving fast access and decoding of the data. A relevant
    document/component should introduce new algorithms or compare the
    performance of existing text-coding techniques for text and index
    compression. A document/component discussing the cost of text
    compression for text coding and decoding is highly relevant.
    Strategies for dictionary compression are not relevant.</narrative>
    <keywords>text compression, text coding, index compression
    algorithm</keywords>
  </inex_topic>
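The topic format above is plain XML, so its fields can be read with standard tooling. A minimal sketch using Python's `xml.etree.ElementTree` (element names follow the example above; the DOCTYPE is omitted since the DTD is not needed for well-formed parsing, and the keywords are kept on one line for brevity):

```python
# Reading the fields of an INEX topic; the content mirrors the CO topic
# above (abridged).
import xml.etree.ElementTree as ET

topic_xml = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<inex_topic topic_id="162" query_type="CO" ct_no="1">
  <title>Text and Index Compression Algorithms</title>
  <description>Any type of coding algorithm for text and index compression</description>
  <keywords>text compression, text coding, index compression algorithm</keywords>
</inex_topic>"""

topic = ET.fromstring(topic_xml)
keywords = [k.strip() for k in topic.findtext("keywords").split(",")]

print(topic.get("query_type"))  # CO
print(topic.findtext("title"))  # Text and Index Compression Algorithms
print(keywords)                 # ['text compression', 'text coding', 'index compression algorithm']
```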

SLIDE 11

Structured Document Retrieval / Concepts

Queries: examples II

CAS query

  <?xml version="1.0" encoding="ISO-8859-1"?>
  <!DOCTYPE inex_topic SYSTEM "topic.dtd">
  <inex_topic topic_id="128" query_type="CAS" ct_no="22">
    <title>//article[about(., intelligent transport systems)]
    //sec[about(., on-board route planning navigation system for
    automobiles)]</title>
    <description>Find discussions about on-board route planning or
    navigation systems which are in publications about intelligent
    transport systems for automobiles.</description>
    <narrative>I'm interested in information about on-board route
    planning or navigation systems for automobiles. Relevant elements
    discuss either a requirement analysis or a concrete implementation
    of such a system. Elements about navigation or route planning
    systems that cannot be accessed within the automobile will not be
    considered relevant. Systems for vehicles other than automobiles
    will also not be judged relevant.</narrative>
    <keywords>in-vehicle systems, vehicle intelligence, vehicle
    information systems, traffic information services, vehicle-mounted
    equipment</keywords>
  </inex_topic>

SLIDE 12

Structured Document Retrieval / Concepts

Documents

In general, any document can be considered structured according to one or more structure types:

  ◮ Linear order of words, sentences, paragraphs
  ◮ Hierarchy or logical structure of a book’s chapters and sections
  ◮ Links (hyperlinks), cross-references, citations
  ◮ Temporal and spatial relationships in multimedia documents

Fact

  ◮ We only consider the logical structure
  ◮ Documents are in XML (eXtensible Markup Language)
  ◮ Query languages:
    ◮ Keywords
    ◮ XPath-like (XPath, XQL, XQuery)
    ◮ Proximal nodes

SLIDE 13

Structured Document Retrieval / Concepts

Relevance

Definition

  ◮ Exhaustivity: describes the extent to which the document component discusses the query.
  ◮ Specificity: describes the extent to which the document component focuses on the query.

[Diagram: exhaustivity + specificity = relevant; a component lacking either dimension is irrelevant.]

SLIDE 14

Retrieval Systems / “Content Only” queries

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 15

Retrieval Systems / “Content Only” queries

Models

Score Propagation

  ◮ Extension of boolean models (p-norm)
  ◮ Extension of the VSM

Term Weight Propagation

  ◮ Term selection
  ◮ Aggregation → maximum, augmentation, LM, ...

“Moving” Corpus

  ◮ The elements are grouped into e-collections
  ◮ Statistics are computed on these e-collections

SLIDE 16

Retrieval Systems / “Content Only” queries

Augmentation

Principle

  ◮ Some nodes are elementary elements (answers)
  ◮ Aggregate the weights of children, beginning with elementary elements

[Figure: a chapter containing two sections; term weights in the sections (example 0.5, syntax 0.7, XPath 0.8) are propagated upward to the chapter, down-weighted by an augmentation factor of 0.3 (e.g. example 0.15, syntax 0.21).]
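The augmentation principle can be sketched in a few lines. This is a hedged illustration rather than the exact model from the slide: it assumes weights are aggregated by taking the maximum over a node's own weight and its children's weights, each child weight multiplied by an augmentation factor of 0.3 as in the figure.

```python
# Minimal sketch of augmentation-style weight propagation.
# Assumptions: max-aggregation, single augmentation factor (0.3).

AUGMENT = 0.3

def propagate(node):
    """node = {"weights": {term: w}, "children": [node, ...]}.
    Returns the aggregated term weights for this node."""
    agg = dict(node.get("weights", {}))
    for child in node.get("children", []):
        for term, w in propagate(child).items():
            agg[term] = max(agg.get(term, 0.0), AUGMENT * w)
    return agg

section1 = {"weights": {"example": 0.5, "syntax": 0.7}}
section2 = {"weights": {"XPath": 0.8}}
chapter = {"weights": {}, "children": [section1, section2]}

print({t: round(w, 2) for t, w in propagate(chapter).items()})
# {'example': 0.15, 'syntax': 0.21, 'XPath': 0.24}
```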

SLIDE 17

Retrieval Systems / “Content Only” queries

Language Models

  P(Q | θ_E) = ∏_{ω ∈ {q_1, ..., q_n}} P(ω | θ_E)

Estimating P(ω | θ_E)

  ◮ Mixture of element- and collection-specific estimates
  ◮ Then, a mixture of language models

[Figure: a document tree (document → title, abstract, body; body → section1, section2). Element language models are mixed upward: e.g. P(dog|sec1) = 0.7 and P(dog|sec2) = 0.4 combine with weights 0.5/0.5 into P(dog|body) = 0.55, which in turn contributes with weight 0.33 to P(dog|document) ≈ 0.18; similarly P(bird|title) = 1 gives P(bird|document) = 0.33, and P(cat|document) ≈ 0.15.]
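The mixing step in the figure can be sketched as follows. A hedged illustration: `mix` and `query_likelihood` are hypothetical helper names, and the Jelinek-Mercer λ = 0.8 is an assumed smoothing parameter, not a value from the talk.

```python
# Sketch of the mixture-of-language-models idea; weights and estimates
# are taken from the example tree in the figure.

def mix(models_and_weights):
    """Linear mixture of unigram language models (dicts term -> prob)."""
    mixed = {}
    for model, w in models_and_weights:
        for term, p in model.items():
            mixed[term] = mixed.get(term, 0.0) + w * p
    return mixed

sec1 = {"dog": 0.7, "cat": 0.3}
sec2 = {"cat": 0.6, "dog": 0.4}
title = {"bird": 1.0}

body = mix([(sec1, 0.5), (sec2, 0.5)])         # dog ≈ 0.55, cat ≈ 0.45
document = mix([(title, 0.33), (body, 0.33)])  # bird ≈ 0.33, dog ≈ 0.18

def query_likelihood(query_terms, model, collection, lam=0.8):
    """P(Q | theta_E): element estimate smoothed with a collection
    estimate (Jelinek-Mercer; lambda is an assumed value)."""
    p = 1.0
    for t in query_terms:
        p *= lam * model.get(t, 0.0) + (1 - lam) * collection.get(t, 0.0)
    return p
```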

SLIDE 18

Retrieval Systems / “Content Only” queries

Bayesian Networks: Structure

[Figure: a network mirroring the corpus structure: corpus → journal collections → journals / books (by year) → articles → front matter (fm), body (bdy), back matter (bm), ...]

Components

  ◮ Fixed structure = corpus structure
  ◮ Parameters
  ◮ Baseline models

SLIDE 19

Retrieval Systems / “Content Only” queries

Bayesian Networks: Local Inference

[Figure: local inference around a doxel X: the query, baseline models M1 and M2 for X and for its parent Y, the ancestors of Y, and the children of X.]

Variables

  ◮ Query: a vector of frequencies
  ◮ Baseline models: binary {relevant, not relevant}
  ◮ Element: {not relevant, too big, SDR-relevant}

SLIDE 20

Retrieval Systems / “Content Only” queries

Bayesian Networks: Learning

What?

  ◮ Parameters (⇒ conditional probability tables, CPTs)
  ◮ Adaptation to specific corpora/query types

How?

  ◮ A set of queries + associated assessments
  ◮ Algorithms:
    ◮ Expectation-Maximisation (EM)
    ◮ Cross-entropy with gradient ascent
    ◮ Order-based criteria

SLIDE 21

Retrieval Systems / “Content And Structure” queries

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 22

Retrieval Systems / “Content And Structure” queries

Models

Fragment Queries

  ◮ Query: a fragment of an XML document
  ◮ Search: match the two representations

XPath / Algebra based

  ◮ Query: an XPath-like expression
  ◮ Search:
    1. Transformation into an algebraic expression
    2. An event is associated with each element
    3. Score = probability of the event

SLIDE 23

Retrieval Systems / “Content And Structure” queries

Fragments: JuruXML

A modified VSM:

  RSV(q, d) = 1 / (|q| |d|) · Σ_{(t_i, c_i^q) ∈ q} Σ_{(t_i, c_i^d) ∈ d} ω_q(t_i, c_i^q) · ω_d(t_i, c_i^d) · cr(c_i^q, c_i^d)

noting c_• a path and t_• a term.

Example

  cr(c_1, c_2) = (1 + length(c_1)) / (1 + length(c_2)) if c_1 is a subsequence of c_2, and 0 otherwise.
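The context resemblance function lends itself to a direct implementation. A minimal sketch, with paths represented as lists of tag names (the tag names in the example are illustrative):

```python
# JuruXML-style context resemblance cr, following the definition above.

def is_subsequence(c1, c2):
    """True if path c1 (a tag sequence) is a subsequence of c2."""
    it = iter(c2)
    return all(tag in it for tag in c1)

def cr(c1, c2):
    """Context resemblance between a query path and a document path."""
    if is_subsequence(c1, c2):
        return (1 + len(c1)) / (1 + len(c2))
    return 0.0

print(cr(["article", "sec"], ["article", "bdy", "sec"]))  # 0.75
```

A query path that matches loosely (e.g. skipping the intermediate `bdy` element) still contributes, but with a score that decreases as the document path gets longer than the query path.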

SLIDE 24

Retrieval Systems / “Content And Structure” queries

Fragments: Language Models / Dynamic TF-IDF

Idea

  ◮ Take the structural conditions into account
  ◮ The term weight depends on the element types

  TF-IDF: the collection is defined by the elements sharing the same “path”
  LM: element-specific language models

SLIDE 25

Retrieval Systems / “Content And Structure” queries

Algebra: ELIXIR

ELIXIR

  ◮ An extension of WHIRL
  ◮ A path-based language similar to XQuery
  ◮ A vague predicate for text (~)

RSV(q, d) is built from the cosine similarities cos(v_j, v_k) between the text vectors bound by the vague predicates.

SLIDE 26

Retrieval Systems / “Content And Structure” queries

Algebra: XIRQL / S-BN

Extension of XPath

  ◮ Weighting and ranking
  ◮ Data types with vague predicates

Principle

  ◮ A query is transformed into an event for each retrievable element
  ◮ The probability of the event is the score of the element

  XIRQL: an event ∼ a term occurrence
  S-BN: uses a Bayesian network (event = relevance to a query composed of keywords)

SLIDE 27

Retrieval Systems / “Content And Structure” queries

Algebra: example

  //image[../p[about(., "cat pictures")]]
    ↓
  child(rel(cat pictures) ∩ label(p)) ∩ label(image) ∩ desc(d)
    ↓
  a is relevant ≡ a ∈ label(image) ∧ ⋁_{b ∈ pa(a)} (b ∈ rel(q_1) ∧ b ∈ label(p) ∧ b ∈ desc(d))

SLIDE 28

Evaluation

The Problem

GOAL: develop collections to evaluate systems

But, contrary to flat IR, elements are nested:

  ◮ A binary relevance scale is not enough ⇒ a new scale
  ◮ Element relevance values are interdependent ⇒ constraints on assessments
  ◮ Standard metrics are not adapted ⇒ new metrics

SLIDE 29

Evaluation

The INEX initiative

INitiative for the Evaluation of XML retrieval

  ◮ Since 2002
  ◮ System-centred evaluation of the effectiveness of XML retrieval approaches
  ◮ 30 to 40 institutions each year
  ◮ A collaborative effort (participants contribute to the development of the collection)
  ◮ A methodology similar to TREC’s is followed, but adapted to XML retrieval

SLIDE 30

Evaluation

The INEX collection

Fact

  ◮ Documents (~500 MB): 12,107 articles in XML format from the IEEE Computer Society
  ◮ Topics:
    2002: 30 CO and 30 CAS
    2003: 32 CO and 32 CAS
    2004: 33 CO and 24 CAS (for now)

SLIDE 31

Evaluation / Assessments

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 32

Evaluation / Assessments

The Interface

[Screenshot: the INEX assessment interface.]

SLIDE 33

Evaluation / Assessments

Passive Rules

Ensure exhaustivity (of the assessment pool)

  ◮ Assess the relevance of all elements (>8M in the IEEE collection)? THIS IS NOT POSSIBLE
  ◮ Hypothesis: highly specific elements are likely to be near a submitted element

Some rules

  ◮ When an element has been assessed as not relevant (E0S0), no element is added to the pool.
  ◮ When an element has been assessed as highly specific (E*S3), only its ancestors are added to the pool.

SLIDE 34

Evaluation / Assessments

Active Rules

Ensure consistency

  ◮ Elements within a document are not independent
  ◮ Help the user to assess
  ◮ Consistency of assessments

Some rules (for an element x with children y_i)

  ◮ Σ_i E_{y_i} ≥ E_x ≥ max_i (E_{y_i})
  ◮ max_i (S_{y_i}) ≥ S_x ≥ min_i (S_{y_i})
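A sketch of how these active rules could be checked mechanically. This is an illustration under the reconstruction above (the exhaustivity upper bound Σ_i E_{y_i} is read from a garbled slide and may differ from the original rule set; names and scale values are illustrative).

```python
# Consistency check for (E, S) assessments of a parent against its
# children, with E = exhaustivity and S = specificity on a 0-3 scale.

def consistent(parent, children):
    """parent = (E, S); children = list of (E, S) pairs.
    True if the parent's assessment respects the bounds implied
    by its children's assessments."""
    if not children:
        return True
    e_par, s_par = parent
    es = [e for e, _ in children]
    ss = [s for _, s in children]
    exhaustivity_ok = sum(es) >= e_par >= max(es)
    specificity_ok = max(ss) >= s_par >= min(ss)
    return exhaustivity_ok and specificity_ok

print(consistent((3, 2), [(2, 3), (1, 1)]))  # True
print(consistent((0, 0), [(3, 3)]))          # False: E0 below a child's E3
```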

SLIDE 35

Evaluation / Metrics

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 36

Evaluation / Metrics

XML-IR Metrics

The proposed metrics

  ◮ Recall-precision like: “quantised” precision/recall, “Norbert Gövert” (NG) precision/recall
  ◮ Recall generalisation: Expected Ratio of Relevant Elements
  ◮ Other: Tolerance To Irrelevance (T2I), Cumulated Gain

SLIDE 37

Evaluation / Metrics

Stereotypical runs

Idea: emphasise the differences and caveats of the metrics

  1. Perfect
  2. Parent
  3. Ancestors
  4. First Child
  5. Biggest Child

SLIDE 38

Evaluation / Metrics

Recall-precision

User Model

  ◮ The user consults every element in list order
  ◮ (S)he is “happy” with every kind of relevant information, even
    ◮ if (s)he has already seen the same content
    ◮ if (s)he has already seen it entirely or partly (nesting)

  P(Relevant | Retrieved, Wanted = r)

SLIDE 39

Evaluation / Metrics

Quantisation

  f_strict(e, s) = 1 if (e, s) = (3, 3), 0 otherwise

  f_gen(e, s) =
    1     if (e, s) = (3, 3)
    0.75  if (e, s) ∈ {(2, 3), (3, 2), (3, 1)}
    0.5   if (e, s) ∈ {(1, 3), (2, 2), (2, 1)}
    0.25  if (e, s) ∈ {(1, 2), (1, 1)}
    0     otherwise

(...) and 5 other ones in INEX 2004!
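The two quantisation functions translate directly into code, with values exactly as on the slide:

```python
# Strict and generalised quantisation of (exhaustivity, specificity)
# pairs on the INEX 0-3 scale into a single relevance value.

def f_strict(e, s):
    return 1.0 if (e, s) == (3, 3) else 0.0

_GEN = {
    (3, 3): 1.0,
    (2, 3): 0.75, (3, 2): 0.75, (3, 1): 0.75,
    (1, 3): 0.5, (2, 2): 0.5, (2, 1): 0.5,
    (1, 2): 0.25, (1, 1): 0.25,
}

def f_gen(e, s):
    return _GEN.get((e, s), 0.0)

print(f_strict(3, 3), f_gen(2, 3), f_gen(0, 0))  # 1.0 0.75 0.0
```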

SLIDE 40

Evaluation / Metrics

The “Recall Base”

[Figure: the recall base is built from the highly relevant / highly specific elements.]

SLIDE 41

Evaluation / Metrics

Recall-Precision limits

[Plot: generalised recall-precision (RP_g) curves for the stereotypical runs (ancestors, biggest child, document, first child, parent, perfect).]

SLIDE 42

Evaluation / Metrics

Recall-Precision NG

User Model: the classical model, plus...

  ◮ No more relevance credit for already retrieved elements

  recall = [ Σ_e rel(e) · (1 − size(seen part of e) / size(e)) ] / [ Σ_e rel(e) ]

  precision = [ Σ_e spe(e) · (1 − size(seen part of e) / size(e)) ] / [ Σ_e (1 − size(seen part of e) / size(e)) ]

Problems

  ◮ The measure is very unstable
  ◮ Theoretical foundations?
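The overlap discounting can be sketched as follows. A hedged illustration: elements are modelled as character-offset intervals, and, for simplicity, the recall denominator runs over the ranked list rather than over the full recall base (all names and data are illustrative).

```python
# Sketch of the NG (overlap-aware) recall and precision formulas above.
# Each element carries a text span (start, end), a relevance value
# rel(e) and a specificity value spe(e), both in [0, 1].

def _merge(intervals):
    """Merge possibly overlapping (start, end) intervals."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def ng_scores(ranked):
    seen = []  # disjoint intervals of text already consulted
    num_r = num_p = den_p = 0.0
    for e in ranked:
        start, end = e["span"]
        overlap = sum(max(0, min(end, b) - max(start, a)) for a, b in seen)
        fresh = 1 - overlap / (end - start)  # unseen fraction of e
        num_r += e["rel"] * fresh
        num_p += e["spe"] * fresh
        den_p += fresh
        seen = _merge(seen + [(start, end)])
    den_r = sum(e["rel"] for e in ranked)  # simplification: ranked list only
    recall = num_r / den_r if den_r else 0.0
    precision = num_p / den_p if den_p else 0.0
    return recall, precision

# A section retrieved after its enclosing article earns no new credit:
run = [{"span": (0, 100), "rel": 1.0, "spe": 0.5},
       {"span": (0, 40), "rel": 1.0, "spe": 1.0}]
print(ng_scores(run))  # (0.5, 0.5)
```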

SLIDE 43

Evaluation / Metrics

Recall-Precision NG

[Plot: NG recall-precision (RP_ng_o_g) curves for the stereotypical runs (ancestors, biggest child, document, first child, parent, perfect).]

SLIDE 44

Evaluation / Metrics

Generalised Recall

User Model

  ◮ R/P model
  ◮ Stochastic user behaviour
    ⇒ the user can navigate within the document
    ⇒ the user may or may not find an element relevant
  ◮ Relevant information = highly specific elements only

  GR(n) = E(number of relevant elements seen) / E(number of relevant elements)

Limitations

  ◮ An equivalent of precision is missing
  ◮ Some parameters have to be validated

SLIDE 45

Evaluation / Metrics

Generalised Recall Runs

[Plot: Expected Ratio of Relevant units (ERR) against rank for the stereotypical runs (ancestors, biggest child, document, first child, parent, perfect).]

SLIDE 46

Evaluation / Metrics

Tolerance to Irrelevance (T2I)

User Model

  ◮ R/P model
  ◮ The user reads sequentially and stops after a certain amount of irrelevant information

Limitations

  ◮ No implementation (yet)
  ◮ Some theoretical and practical problems remain to be solved
  ◮ Some parameters have to be validated

SLIDE 47

Conclusion / Summary

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 48

Conclusion / Summary

Models and methods

CO Search

  ◮ A well-defined task
  ◮ Various approaches using extensions of classical models

CAS Search

  ◮ The query language is still under development
  ◮ Vague interpretation of every type of condition

New tasks

  ◮ Natural language (NLP) queries
  ◮ Relevance feedback
  ◮ Heterogeneous collections
  ◮ Interactive retrieval

SLIDE 49

Conclusion / Summary

Evaluation

INEX

  ◮ XML documents: IEEE (+ others)
  ◮ 3 years of assessments:
    ◮ 95 CO topics
    ◮ 86 CAS topics

Metrics

  ◮ Precision/recall
  ◮ Precision/recall NG
  ◮ Generalised Recall
  ◮ Tolerance To Irrelevance
  ◮ Cumulated Gain

SLIDE 50

Conclusion / Bibliography

Outline

  • Structured Document Retrieval: Motivations · Concepts
  • Retrieval Systems: “Content Only” queries · “Content And Structure” queries
  • Evaluation: Assessments · Metrics
  • Conclusion: Summary · Bibliography

SLIDE 51

Conclusion / Bibliography

General references I

  ◮ http://inex.is.informatik.uni-duisburg.de:2004/
  ◮ SIGIR 2002 and 2004 workshops on XML retrieval.
  ◮ Special issue of JASIST on XML and Information Retrieval, Volume 56(2), 2002.
  ◮ Proceedings of the INEX Workshops (2002, 2003 and 2004).
  ◮ Robert Luk, H. V. Leong, Tharam Dillon, Alvin Chan, W. Bruce Croft, and James Allan. A survey in indexing and searching XML documents. JASIS, 53(6):415-437, March 2002.

SLIDE 52

Conclusion / Bibliography

Models I

  ◮ N. Fuhr and K. Großjohann. XIRQL: An XML query language based on information retrieval concepts. ACM Transactions on Information Systems (TOIS), 22(2):313-356, 2004.
  ◮ T. Chinenyanga and N. Kushmerick. An expressive and efficient language for XML information retrieval. JASIST (special issue on XML and Information Retrieval), 2001.
  ◮ Y. Mass, M. Mandelbrod, E. Amitay, Y. Maarek, and A. Soffer. JuruXML: an XML retrieval system at INEX'02. In INEX 2003 proceedings, pages 73-90.
  ◮ P. Ogilvie and J. Callan. Language models and structured document retrieval. In INEX 2003 proceedings.

SLIDE 53

Conclusion / Bibliography

Models II

  ◮ T. Grabs and H.-J. Schek. Flexible information retrieval from XML with PowerDB-XML. In INEX 2003 proceedings.
  ◮ Benjamin Piwowarski and Patrick Gallinari. A Bayesian network for XML information retrieval: Searching and learning with the INEX collection. Information Retrieval, December 2004.

SLIDE 54

Conclusion / Bibliography

Evaluation I

  ◮ Report of the INEX'03 metrics working group. In INEX'03 proceedings, pages 184-190.
  ◮ Benjamin Piwowarski and Mounia Lalmas. Providing consistent and exhaustive relevance assessments for XML retrieval evaluation. In Proceedings of the Thirteenth Conference on Information and Knowledge Management (CIKM 2004), Washington D.C., U.S.A., November 2004.
  ◮ G. Kazai, M. Lalmas, and Arjen P. de Vries. The Overlap Problem in Content-Oriented XML Retrieval. In SIGIR 2004.
  ◮ N. Gövert, G. Kazai, N. Fuhr, and M. Lalmas. Evaluating the effectiveness of content-oriented XML retrieval. Technical report, University of Dortmund, Computer Science, 2003.

SLIDE 55

Conclusion / Bibliography

Evaluation II

  ◮ A. P. de Vries, G. Kazai, and M. Lalmas. Tolerance to Irrelevance: A User-effort Oriented Evaluation of Retrieval Systems without Predefined Retrieval Unit.
  ◮ B. Piwowarski and P. Gallinari. Expected ratio of relevant units: A measure for structured information retrieval. In Proceedings of the Second INEX Workshop (Dagstuhl, Germany, Dec. 2003), N. Fuhr, M. Lalmas, and S. Malik, Eds.