Search Quality Evaluation Tools and Techniques Alessandro Benedetti, - PowerPoint PPT Presentation

Search Quality Evaluation Tools and Techniques Alessandro Benedetti, Software Engineer Andrea Gazzarini, Software Engineer 2 nd October 2018

Who we are Alessandro Benedetti ▪ Search Consultant ▪ R&D Software Engineer ▪ Master in Computer Science ▪ Apache Lucene/Solr Enthusiast ▪ Semantic, NLP, Machine Learning Technologies passionate ▪ Beach Volleyball Player & Snowboarder

Who we are Andrea Gazzarini, “Gazza” ▪ Software Engineer (1999-) ▪ “Hermit” Software Engineer (2010-) ▪ Java & Information Retrieval Passionate ▪ Apache Qpid (past) Committer ▪ Husband & Father ▪ Bass Player

Sease Sea rch Se rvices ● Open Source Enthusiasts ● Apache Lucene/Solr experts Community Contributors ● ● Active Researchers ● Hot Trends : Learning To Rank, Document Similarity, Measuring Search Quality, Relevancy Tuning

Agenda ✓ Search Quality Evaluation Context overview ‣ Correctness ‣ Evaluation Measures ‣ ➢ Rated Ranking Evaluator (RRE) ➢ Future Works ➢ Q&A

Search Quality Evaluation Context Overview Search Quality Internal Factors External Factors Understandability Timeliness Reusability Modularity Maintainability Search engineering is the production of quality Efficiency Reusability Testability Extendibility search systems . Robustness Readability Maintainability Correctness Search quality (and in general software quality) is a …. huge topic which can be described using internal and external factors . In the end, only external factors matter , those that Focused on can be perceived by users and customers . But the Primarily focused on key for getting optimal levels of those external factors are the internal ones . One of the main differences between search and software quality (especially from a correctness perspective) is in the ok / ko judgment , which is, in general, more “deterministic” in case of software development.

Search Quality Evaluation : Correctness New system Existing system Correctness Here are the requirements We need to improve our search Ok system, users are complaining about junk in search results. Ok v0.1 Correctness is the ability of a system to perform its … exact task, as defined by its specification. Cool! v0.9 v1.1 V1.0 has been released Search domain is critical from this perspective because correctness depends on arbitrary user judgments. v1.2 For each internal (gray) and external (red) iteration a month later… v2.0 we need to find a way to measure the correctness . v1.3 We found a bug We have a change request. Evaluation measures for an information retrieval system are used to assert how well the search results … satisfied the user's query intent. v2.0 How can we know where our system is going between versions, in terms of correctness?

Search Quality Evaluation / Measures Evaluation Measures Evaluation Measures Online Measures Offline Measures Click-through rate Precision Recall F-Measure NDCG Evaluation measures for an information retrieval Session abandonment rate Mean Reciprocal Rank system try to formalise how well a search system Zero result rate satisfies its user information needs. Average Precision …. Session success rate …. Measures are generally split into two categories: online and offline measures . In this context we will focus on offline measures. We will talk about something that can help a search engineer during his ordinary day (i.e. in those phases We are mainly focused here previously called “ internal iterations ”) We will also see how the same tool can be used for a broader usage, like contributing in the continuous integration pipeline or even for delivering value to functional stakeholders .

Agenda ➢ Search Quality Evaluation ✓ Rated Ranking Evaluator (RRE) What is it? ‣ How does it work? ‣ Evaluation Process Input & Output ‣ Challenges ‣ ➢ Future Works ➢ Q&A

RRE : What is it? https://github.com/SeaseLtd/rated-ranking-evaluator RRE: What is it? • A set of search quality evaluation tools • A search quality evaluation framework • Multi ( search ) platform • Written in Java • It can be used also in non-Java projects • Licensed under Apache 2.0 • Open to contributions • Extremely dynamic!

RRE : At a glance 2 10 48950 2 10 Apache Lucene/Solr London ________________________________________________________________________________________________________________________________________________________________________________________________________________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ ____________ ____________ ____________ ____________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ People Modules Lines of Code ____________ Months Modules 2 10 67317 5 10 ________________________________________________________________________________________________________________________________________________________________________________________________________________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ ____________ ____________ ________________________________________________________________________________________________________________________________________________________________________________________________________________ ____________ ____________ ____________ People Modules Lines of Code Months Modules

RRE : Ecosystem RRE Ecosystem Archetypes Plugin Plugin The picture illustrates the main modules composing Reporting Plugin the RRE ecosystem. All modules with a dashed border are planned for a Plugin future release . Plugin RRE CLI has a double border because although the rre-cli module hasn’t been developed, you can run CORE RRE from a command line using RRE Maven Search archetype , which is part of the current release. Platform RequestHandler API As you can see, the current implementation includes two target search platforms: Apache Solr and RRE Server RRE CLI Elasticsearch . The Search Platform API module provide a search platform abstraction for plugging-in additional search systems. Plugin

RRE : Available metrics Precision Available Metrics Recall Precision at 1 (P@1) Precision at 2 (P@2) These are the RRE built-in metrics which can be used out of the box. Precision at 3 (P@3) The most part of them are computed at query level Precision at 10 (P@10) and then aggregated at upper levels. Average Precision (AP) However, compound metrics (e.g. MAP, or GMAP ) are not explicitly declared or defined, because the Reciprocal Rank computation doesn’t happen at query level. The result Mean Reciprocal Rank of the aggregation executed on the upper levels will automatically produce these metric. Mean Average Precision (MAP) For example, the Average Precision computed for Normalised Discounted Cumulative Gain Q1, Q2, Q3, Qn becomes the Mean Average Precision at Query Group or Topic levels. Compound Metric F-Measure

RRE : Domain Model (1/2) RRE Domain Model Evaluation Top level domain entity 1..* Corpus Test dataset / collection RRE Domain Model is organized into a composite / 1..* tree-like structure where the relationships between Topic Information need entities are always 1 to many . The top level entity is a placeholder representing an 1..* Query Group Query variants evaluation execution . Versioned metrics are computed at query level and 1..* Query Queries then reported, using an aggregation function , at upper levels. The benefit of having a composite structure is clear: v1.0 v1.1 v1.2 v1.n we can see a metric value at different levels (e.g. a F-MEASURE F-MEASURE F-MEASURE … F-MEASURE query, all queries belonging to a query group, all P@10 AP P@10 AP P@10 AP P@10 AP queries belonging to a topic or at corpus level) NDCG …. NDCG …. NDCG …. NDCG ….

Search Quality Evaluation Tools and Techniques Alessandro Benedetti, - PowerPoint PPT Presentation

Search Quality Evaluation Tools and Techniques Alessandro Benedetti, Software Engineer Andrea Gazzarini, Software Engineer 2 nd October 2018 Who we are Alessandro Benedetti Search Consultant R&D Software Engineer Master in

Search Engines Issues Avi Rappoport Search Tools Consulting Search Issues Enterprise Search

Informed search algorithms Outline Best-first search Greedy best-first search A *

Chapter 12. Evaluation Research Chapter 12. Evaluation Research evaluation research? evaluation

User Interface Evaluation Empirical evaluation Heuristic evaluation 1 CS 349 - UI evaluation

Tabu Search Search Tabu Page 1 Part I Part I Tabu Search Principles Search Principles Tabu

Uninformed Search 2 Informed Search Rest of blind search An informed search strategyone

Foundations of Artificial Intelligence 9. State-Space Search: Tree Search and Graph Search Malte

Elastic Search - Aditi Choksi (EW18455) Elastic Search Search engine Distributed

2 EBI Search 3 EBI Search 4 EBI

Balanced Search Trees Binary Search Trees Binary Search Tree Binary Search Tree A binary tree is

Search Algorithms 3 AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 1 3 Search Algorithms

Query DB structures Manipulation queries DB search Hits Memory search 2 Standardization of

Search 3 AI Slides (5e) c Lin Zuoquan@PKU 2003-2019 3 1 3 Search 3.1 Problem-solving

Informed Search strategies AIMA sections 3.5, 3.6 Summary Informed Search strategies

Search Overview Introduction to Search Blind Search Techniques Heuristic Search

RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW CASHEW NUT QUALITY RAW

A D R U PA L E R S G U I D E T O M A R K E T I N G @dgorton #Marketing4Drupalers D R

Making the Leap John Donham Raph Koster Game Developers Conference Online October 2010 All data

Measurement and Analysis of OSN Ad Auctions Yabing Liu Chloe Kliman-Silver Robert Bell

Performability at Yahoo Search Amr Awadallah and a bunch of other yahoos amr@yahoo-inc.com Now,

A glimpse to sponsored search auctions Maria Serna Fall 2016 AGT-MIRI Sponsored search Keyword

Sponsored Search Auctions G. Amanatidis Based on slides by A. Voudouris Single-item auctions

Business in the age of exponentialism Johan Thorbjrnsson Country Manager Norway, DoubleClick

Email Marketing Rocks Lets find out why Thank you to our supporters Webinar reminders You