

  1. Evaluation in Information Retrieval. Mandar Mitra, Indian Statistical Institute.

  2. Outline: 1. Preliminaries; 2. Metrics; 3. Forums; 4. Tasks (Task 1: Morpheme extraction; Task 2: RISOT; Task 3: SMS-based FAQ retrieval; Task 4: Microblog retrieval).

  3. Motivation: Which is better: Heap sort or Bubble sort?

  4. Motivation: Which is better: Heap sort or Bubble sort? Which is better: [image] or [image]?

  5. Motivation: IR is an empirical discipline.

  6. Motivation: IR is an empirical discipline. Intuition can be wrong! “Sophisticated” techniques need not be the best, e.g. rule-based stemming vs. statistical stemming.

  7. Motivation: IR is an empirical discipline. Intuition can be wrong! “Sophisticated” techniques need not be the best, e.g. rule-based stemming vs. statistical stemming. Proposed techniques need to be validated and compared to existing techniques.

  8. Cranfield method (Cleverdon et al., 1960s). Benchmark data: document collection; query / topic collection; relevance judgments, i.e. information about which document is relevant to which query.

  9. Cranfield method (Cleverdon et al., 1960s). Benchmark data: document collection (cf. a syllabus); query / topic collection (cf. a question paper); relevance judgments, i.e. information about which document is relevant to which query (cf. correct answers).

  10. Cranfield method (Cleverdon et al., 1960s). Benchmark data: document collection (cf. a syllabus); query / topic collection (cf. a question paper); relevance judgments, i.e. information about which document is relevant to which query (cf. correct answers). Assumptions: relevance of a document to a query is objectively discernible; all relevant documents contribute equally to the performance measures; relevance of a document is independent of the relevance of other documents.

  11. Outline: 1. Preliminaries; 2. Metrics; 3. Forums; 4. Tasks (Task 1: Morpheme extraction; Task 2: RISOT; Task 3: SMS-based FAQ retrieval; Task 4: Microblog retrieval).

  12. Evaluation metrics. Background: the user has an information need; the information need is converted into a query; documents are relevant or non-relevant; an ideal system retrieves all and only the relevant documents.

  13. Evaluation metrics. Background: the user has an information need; the information need is converted into a query; documents are relevant or non-relevant; an ideal system retrieves all and only the relevant documents. [Diagram: user with an information need queries the system, which searches the document collection.]

  14. Set-based metrics. Recall = #(relevant retrieved) / #(relevant) = #(true positives) / #(true positives + false negatives). Precision = #(relevant retrieved) / #(retrieved) = #(true positives) / #(true positives + false positives). F = 1 / (α/P + (1 − α)/R) = (β² + 1)PR / (β²P + R).
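
  A minimal Python sketch of these set-based metrics, assuming set-valued inputs of document IDs (the function name set_metrics and the example IDs are illustrative, not from the slides):

      def set_metrics(retrieved, relevant, beta=1.0):
          """Precision, recall and F-measure for unranked (set-based) retrieval."""
          retrieved, relevant = set(retrieved), set(relevant)
          tp = len(retrieved & relevant)  # relevant documents that were retrieved
          precision = tp / len(retrieved) if retrieved else 0.0
          recall = tp / len(relevant) if relevant else 0.0
          if precision == 0.0 and recall == 0.0:
              return precision, recall, 0.0
          f = (beta**2 + 1) * precision * recall / (beta**2 * precision + recall)
          return precision, recall, f

      # Example: 3 of the 4 retrieved documents are relevant; 5 relevant documents exist.
      print(set_metrics({"d1", "d2", "d3", "d7"}, {"d1", "d2", "d3", "d4", "d5"}))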

  15. Metrics for ranked results: (non-interpolated) average precision. Which of the two rankings below is better?

      Rank   Ranking A       Ranking B
      1      Non-relevant    Relevant
      2      Non-relevant    Relevant
      3      Non-relevant    Non-relevant
      4      Relevant        Non-relevant
      5      Relevant        Non-relevant

  16. Metrics for ranked results: (non-interpolated) average precision.

      Rank   Type            Recall   Precision
      1      Relevant        0.2      1.00
      2      Non-relevant
      3      Relevant        0.4      0.67
      4      Non-relevant
      5      Non-relevant
      6      Relevant        0.6      0.50

  17. Metrics for ranked results: (non-interpolated) average precision.

      Rank   Type            Recall   Precision
      1      Relevant        0.2      1.00
      2      Non-relevant
      3      Relevant        0.4      0.67
      4      Non-relevant
      5      Non-relevant
      6      Relevant        0.6      0.50
      ∞      Relevant        0.8      0.00
      ∞      Relevant        1.0      0.00

  18. Metrics for ranked results: (non-interpolated) average precision.

      Rank   Type            Recall   Precision
      1      Relevant        0.2      1.00
      2      Non-relevant
      3      Relevant        0.4      0.67
      4      Non-relevant
      5      Non-relevant
      6      Relevant        0.6      0.50
      ∞      Relevant        0.8      0.00
      ∞      Relevant        1.0      0.00

      AvgP = (1/5)(1 + 2/3 + 3/6) ≈ 0.43 (5 relevant docs in all)

  19. Metrics for ranked results: (non-interpolated) average precision.

      Rank   Type            Recall   Precision
      1      Relevant        0.2      1.00
      2      Non-relevant
      3      Relevant        0.4      0.67
      4      Non-relevant
      5      Non-relevant
      6      Relevant        0.6      0.50
      ∞      Relevant        0.8      0.00
      ∞      Relevant        1.0      0.00

      AvgP = (1/5)(1 + 2/3 + 3/6) ≈ 0.43 (5 relevant docs in all)

      In general, AvgP = (1/N_Rel) Σ_{d_i ∈ Rel} i / Rank(d_i), where d_i is the i-th relevant document retrieved.
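
  A short Python sketch of non-interpolated average precision for a single ranked list, assuming boolean relevance flags in rank order (the function name is illustrative):

      def average_precision(ranked_relevance, n_rel):
          """ranked_relevance: True where the document at that rank is relevant.
          n_rel: total number of relevant documents, including unretrieved ones."""
          hits, precision_sum = 0, 0.0
          for rank, is_relevant in enumerate(ranked_relevance, start=1):
              if is_relevant:
                  hits += 1
                  precision_sum += hits / rank  # precision at this relevant document's rank
          return precision_sum / n_rel if n_rel else 0.0

      # The slide's example: relevant documents at ranks 1, 3 and 6; 5 relevant docs in all.
      print(average_precision([True, False, True, False, False, True], n_rel=5))  # ~0.43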

  20. Metrics for ranked results: interpolated average precision at a given recall point. Recall points correspond to multiples of 1/N_Rel, and N_Rel differs from query to query. [Figure: precision vs. recall curves for Q1 (3 rel. docs) and Q2 (4 rel. docs), recall axis 0.0 to 1.0.] Interpolation is therefore required to compute averages across queries.

  21. Metrics for ranked results: interpolated average precision. P_int(r) = max_{r′ ≥ r} P(r′).

  22. Metrics for ranked results: interpolated average precision. P_int(r) = max_{r′ ≥ r} P(r′). 11-pt interpolated average precision:

      Rank   Type            Recall   Precision
      1      Relevant        0.2      1.00
      2      Non-relevant
      3      Relevant        0.4      0.67
      4      Non-relevant
      5      Non-relevant
      6      Relevant        0.6      0.50
      ∞      Relevant        0.8      0.00
      ∞      Relevant        1.0      0.00

  23. Metrics for ranked results: interpolated average precision. P_int(r) = max_{r′ ≥ r} P(r′). 11-pt interpolated average precision:

      Rank   Type            Recall   Precision
      1      Relevant        0.2      1.00
      2      Non-relevant
      3      Relevant        0.4      0.67
      4      Non-relevant
      5      Non-relevant
      6      Relevant        0.6      0.50
      ∞      Relevant        0.8      0.00
      ∞      Relevant        1.0      0.00

      R      Interp. P
      0.0    1.00
      0.1    1.00
      0.2    1.00
      0.3    0.67
      0.4    0.67
      0.5    0.50
      0.6    0.50
      0.7    0.00
      0.8    0.00
      0.9    0.00
      1.0    0.00
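
  A Python sketch of the 11-point interpolation rule above, taking the (recall, precision) pairs observed at each relevant document (the function name and the input layout are illustrative):

      def eleven_point_interpolated(recall_precision):
          """recall_precision: (recall, precision) pairs at each relevant document,
          e.g. [(0.2, 1.0), (0.4, 0.67), (0.6, 0.5), (0.8, 0.0), (1.0, 0.0)]."""
          points = []
          for k in range(11):
              r = k / 10
              # P_int(r) = max precision at any recall level r' >= r (0 if none).
              candidates = [p for rr, p in recall_precision if rr >= r]
              points.append((r, max(candidates) if candidates else 0.0))
          return points

      # Reproduces the table on the slide.
      print(eleven_point_interpolated([(0.2, 1.0), (0.4, 0.67), (0.6, 0.5), (0.8, 0.0), (1.0, 0.0)]))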

  24. Metrics for ranked results: 11-pt interpolated average precision. [Figure: interpolated precision plotted against recall, 0.0 to 1.0.]

  25. Metrics for sub-document retrieval. Let p_r be the document part retrieved at rank r, rsize(p_r) the amount of relevant text contained in p_r, size(p_r) the total number of characters in p_r, and T_rel the total amount of relevant text for the given topic. Then P[r] = Σ_{i=1}^{r} rsize(p_i) / Σ_{i=1}^{r} size(p_i), and R[r] = (1/T_rel) Σ_{i=1}^{r} rsize(p_i).
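
  A minimal Python sketch of these sub-document precision and recall curves, assuming each retrieved part is given as an (rsize, size) pair of character counts (the names and example numbers are illustrative):

      def passage_precision_recall(parts, t_rel):
          """parts: (rsize, size) pairs in rank order, where rsize is the number of relevant
          characters in the part and size its total length; t_rel: total relevant text."""
          curves, rel_sum, tot_sum = [], 0, 0
          for rsize, size in parts:
              rel_sum += rsize
              tot_sum += size
              p = rel_sum / tot_sum if tot_sum else 0.0  # P[r]
              r = rel_sum / t_rel if t_rel else 0.0      # R[r]
              curves.append((p, r))
          return curves

      # Example: three retrieved passages of 100, 200 and 50 characters.
      print(passage_precision_recall([(80, 100), (50, 200), (50, 50)], t_rel=400))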

  26. Metrics for ranked results. Precision at k (P@k): precision after k documents have been retrieved; easy to interpret, but not very stable / discriminatory, and it does not average well. R-precision: precision after N_Rel documents have been retrieved.
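
  A brief sketch of P@k and R-precision in Python, again over boolean relevance flags in rank order (the function names are illustrative):

      def precision_at_k(ranked_relevance, k):
          """Fraction of the top-k retrieved documents that are relevant."""
          return sum(ranked_relevance[:k]) / k if k else 0.0

      def r_precision(ranked_relevance, n_rel):
          """Precision after N_Rel documents have been retrieved."""
          return precision_at_k(ranked_relevance, n_rel)

      ranking = [True, False, True, False, False, True]
      print(precision_at_k(ranking, 5), r_precision(ranking, n_rel=5))  # 0.4 0.4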

  27. Cumulated Gain. Idea: highly relevant documents are more valuable than marginally relevant documents, and documents ranked low are less valuable.

  28. Cumulated Gain. Idea: highly relevant documents are more valuable than marginally relevant documents, and documents ranked low are less valuable. Gain ∈ {0, 1, 2, 3}, e.g. G = ⟨3, 2, 3, 0, 0, 1, 2, 2, 3, 0, ...⟩. CG[i] = Σ_{j=1}^{i} G[j].
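
  A tiny Python sketch of cumulated gain over the slide's example gain vector (itertools.accumulate simply forms the running sum):

      from itertools import accumulate

      # Graded gains from the slide: 3 = highly relevant, ..., 0 = non-relevant.
      G = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]

      # CG[i] = sum of G[1..i] (1-based on the slide, 0-based here).
      CG = list(accumulate(G))
      print(CG)  # [3, 5, 8, 8, 8, 9, 11, 13, 16, 16]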

  29. (n)DCG. DCG[i] = CG[i] if i < b, and DCG[i] = DCG[i − 1] + G[i] / log_b i if i ≥ b.

  30. (n)DCG. DCG[i] = CG[i] if i < b, and DCG[i] = DCG[i − 1] + G[i] / log_b i if i ≥ b. Ideal G = ⟨3, 3, ..., 3, 2, ..., 2, 1, ..., 1, 0, ...⟩, i.e. the gains sorted in decreasing order. nDCG[i] = DCG[i] / Ideal DCG[i].
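
  A Python sketch of DCG and nDCG with the log_b discount defined above; the choice b = 2 below is an assumed default (the slide leaves b unspecified), and the ideal vector is approximated by sorting the supplied gains:

      import math

      def dcg(gains, b=2):
          """DCG[i] = CG[i] for i < b, else DCG[i-1] + G[i] / log_b(i); ranks are 1-based."""
          total, out = 0.0, []
          for i, g in enumerate(gains, start=1):
              total += g if i < b else g / math.log(i, b)
              out.append(total)
          return out

      def ndcg(gains, b=2):
          """nDCG[i] = DCG[i] / IdealDCG[i]; here the ideal ranking simply sorts the
          supplied gains in decreasing order."""
          ideal = dcg(sorted(gains, reverse=True), b)
          return [a / i if i else 0.0 for a, i in zip(dcg(gains, b), ideal)]

      print(ndcg([3, 2, 3, 0, 0, 1, 2, 2, 3, 0]))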

  31. Mean Reciprocal Rank. Useful for known-item searches with a single target. Let r_i be the rank at which the “answer” for query i is retrieved; its reciprocal rank is 1/r_i. Mean reciprocal rank (MRR) = (1/n) Σ_{i=1}^{n} 1/r_i.
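
  A minimal MRR sketch in Python, taking the rank of the answer for each query (None if the answer was never retrieved; the names are illustrative):

      def mean_reciprocal_rank(answer_ranks):
          """answer_ranks: one 1-based rank per query; None means the answer was not
          retrieved and contributes a reciprocal rank of 0."""
          rr = [1.0 / r if r else 0.0 for r in answer_ranks]
          return sum(rr) / len(answer_ranks) if answer_ranks else 0.0

      # Example: the answers for four queries appear at ranks 1, 3, 2, and not at all.
      print(mean_reciprocal_rank([1, 3, 2, None]))  # (1 + 1/3 + 1/2 + 0) / 4 ≈ 0.46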

  32. Assumptions: All relevant documents contribute equally to the performance measures. Relevance of a document to a query is objectively discernible. Relevance of a document is independent of the relevance of other documents.

  33. Assumptions: All relevant documents contribute equally to the performance measures. Relevance of a document to a query is objectively discernible. Relevance of a document is independent of the relevance of other documents. All relevant documents in the collection are known.
