Coping with Noisy Search Experiences
Pierre-Antoine Champin (LIRIS, Université de Lyon, France), Peter Briggs, Maurice Coyle, Barry Smyth (Clarity, University College Dublin, Ireland). 16 December 2009.
Structure of the Talk
1 Context (Recommender Systems, Context Aware Recommendation, Social Search)
2 Addressed Problem
3 Proposals and results (Clustering, Popularity weighting, Popularity-based kernel)
4 Perspectives
HeyStaks
HeyStaks is a social, context-aware recommender system for web search.
Recommender Systems
Recommender systems aim at presenting users with information that suits their particular preferences or needs.
General-purpose search engines provide results based on an objective measure of relevance w.r.t. the query → the same results for everyone.
Recommender systems for web search aim at personalising the results of search engines.
HeyStaks as a Recommender System
HeyStaks is an extension for Firefox. It aims at integrating into users' habits rather than forcing them to change. It recognizes result pages from popular search engines and alters them in order to:
promote links (move them up in the list),
insert new links,
based on the user's past search experiences. Past search experiences are acquired via:
implicit feedback: queries, result click-through
explicit feedback: tagging pages, voting, sharing
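The combination of implicit and explicit feedback can be sketched as a per-(stak, page) relevance score. This is a hypothetical illustration, not HeyStaks' actual scoring scheme; the action names and weights are assumptions chosen only to show the idea that explicit actions signal more relevance than a mere click.

```python
from collections import defaultdict

# Illustrative weights (assumptions, not HeyStaks' actual values):
# explicit actions count more than implicit click-through.
WEIGHTS = {"click": 1, "tag": 3, "vote": 3, "share": 5}

class ExperienceStore:
    """Accumulates implicit and explicit feedback per (stak, page)."""
    def __init__(self):
        self.scores = defaultdict(int)  # (stak, page) -> score

    def record(self, stak, page, action):
        self.scores[(stak, page)] += WEIGHTS[action]

    def top_pages(self, stak, n=3):
        """Pages of a stak, most relevant first."""
        pages = [(p, s) for (st, p), s in self.scores.items() if st == stak]
        return sorted(pages, key=lambda x: -x[1])[:n]

store = ExperienceStore()
store.record("research", "example.org/paper", "click")
store.record("research", "example.org/paper", "tag")
store.record("research", "example.org/other", "click")
print(store.top_pages("research"))
# -> [('example.org/paper', 4), ('example.org/other', 1)]
```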
Recommendations in HeyStaks
[Screenshot: HeyStaks recommendations inserted into a search result page]
Context Aware Recommendation
Search engines provide the same results for every user. Recommender systems provide personalised results... but they provide the same personalisation every time. Nobody is only one user: their needs depend on the context, especially when considering Web searches.
My searches are sometimes related to my research, my teaching, my leisure... → need for different recommendations in different contexts.
Search Staks
A search stak is a repository of search experiences all related to the same context. Users can create as many staks as they need, and manually select the active stak (their current context). The active stak is where search experiences are collected, and where they are tapped to provide recommendations.
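The manual selection mechanism above can be sketched as follows. This is a toy model of the workflow, not HeyStaks' implementation; the class and method names are hypothetical.

```python
class StakManager:
    """Minimal sketch of manual stak selection (hypothetical API)."""
    def __init__(self):
        self.staks = {}    # stak name -> list of filed experiences
        self.active = None

    def create(self, name):
        self.staks[name] = []

    def select(self, name):
        # The user manually selects the active stak (current context).
        self.active = name

    def file_experience(self, page):
        # Experiences are always filed in the active stak -- hence the
        # noise problem when the user forgets to switch staks.
        self.staks[self.active].append(page)

mgr = StakManager()
mgr.create("teaching")
mgr.create("leisure")
mgr.select("teaching")
mgr.file_experience("wikipedia.org/Algorithm")
mgr.select("leisure")  # the user switches context
mgr.file_experience("imdb.com/title")
print(mgr.staks)
# -> {'teaching': ['wikipedia.org/Algorithm'], 'leisure': ['imdb.com/title']}
```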
Social Search
Social search is the process of sharing search experiences between like-minded Web searchers. In HeyStaks, social search happens through shared staks: several users can contribute to, and receive recommendations from, the same stak. Staks can be:
private: only people I invite can join;
public: anyone can join the stak.
HeyStaks Portal
[Screenshot: the HeyStaks web portal]
The Problem of Stak Selection
Users fail to select the appropriate stak before starting a search. The recommendations they get are then less relevant, and their search experience is filed in the wrong stak. → HeyStaks ends up with noisy experience repositories, and provides less accurate recommendations (even when the correct stak is selected).
Implemented Workarounds
Fall back to the default stak when idle:
limits the input noise,
but potentially reduces context awareness.
Signal when other staks provide recommendations:
improves the relevance of recommendations despite a wrong active stak,
encourages users to select the right stak.
Other Possible Solutions
Automatically select the right stak at query time:
almost impossible if based on the query terms alone,
hazardous if based on the available recommendations,
technically complicated if based on external indicators (time-tracking tools, browsing history, ...).
Help stak owners maintain (or curate) their staks:
use classification techniques to recommend the correct stak for a page;
pages are easier to classify than queries.
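A curation helper of the kind described above could suggest, for a given page, the stak whose pages it most resembles. This toy sketch uses simple term overlap; it is an assumption for illustration, not the classifiers actually evaluated in this work.

```python
from collections import Counter

def recommend_stak(page_terms, staks):
    """Suggest the stak whose pages share the most terms with the page.
    Toy curation helper (assumption: plain term overlap, not the
    decision-tree / Naive Bayes classifiers used in the experiments)."""
    def overlap(stak_pages):
        vocab = Counter(t for terms in stak_pages for t in terms)
        return sum(vocab[t] for t in set(page_terms))
    return max(staks, key=lambda name: overlap(staks[name]))

# Hypothetical staks, each a list of pages represented by their terms.
staks = {
    "research": [["recommender", "system", "evaluation"],
                 ["classifier", "noise", "training"]],
    "leisure":  [["movie", "review"], ["recipe", "pasta"]],
}
print(recommend_stak(["noise", "classifier", "accuracy"], staks))
# -> research
```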
Training the Page Classifier
Most work on coping with noise in recommender systems assumes that a clean training set is available before noise is encountered. We need to find the kernel of each stak: the set (or a subset of) the pages actually relevant to that stak.
How can we find a reliable kernel? How can we evaluate its reliability?
Clustering
Idea: use clustering techniques to identify a candidate kernel.
Rationale: kernel pages must be somehow similar, while noisy pages will be heterogeneous.
Problem: huge variability depending on numerous parameters:
comparing terms or pages,
different similarity measures,
different clustering algorithms,
threshold values.
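The parameter sensitivity can be seen even in a minimal sketch: a greedy single-link clustering over Jaccard term similarity, where changing only the threshold changes the clusters found. This is an illustrative toy, not the clustering actually tried in this work.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def single_link_clusters(pages, threshold):
    """Greedy single-link clustering: a page joins the first cluster
    containing a page similar enough to it. Results hinge on the
    similarity measure and the threshold chosen."""
    clusters = []
    for name, terms in pages.items():
        for cluster in clusters:
            if any(jaccard(terms, pages[m]) >= threshold for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical pages represented by their terms.
pages = {
    "p1": ["noise", "classifier", "training"],
    "p2": ["classifier", "training", "accuracy"],
    "p3": ["pasta", "recipe"],
}
print(single_link_clusters(pages, 0.4))  # -> [['p1', 'p2'], ['p3']]
print(single_link_clusters(pages, 0.6))  # -> [['p1'], ['p2'], ['p3']]
```

Note how raising the threshold from 0.4 to 0.6 splits the candidate kernel into singletons: exactly the kind of variability the slide points out.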
Popularity Weighting
Idea: use a measure of the popularity of pages as a proxy for relevance, in order to provide a fuzzy kernel.
Rationale: kernel pages are repeatedly selected, while noisy pages are only accidentally selected.
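One plausible form of such a measure, sketched here as an assumption (the exact formula from the slides is not reproduced): count how often each page was selected within the stak and normalize by the stak's maximum count, yielding a value in [0, 1].

```python
def normalized_popularity(selection_counts):
    """Map each page's selection count to [0, 1] by dividing by the
    stak's maximum count. Illustrative normalization; not necessarily
    the exact popularity measure used in HeyStaks."""
    m = max(selection_counts.values())
    return {page: c / m for page, c in selection_counts.items()}

counts = {"a.org": 8, "b.org": 4, "c.org": 1}
print(normalized_popularity(counts))
# -> {'a.org': 1.0, 'b.org': 0.5, 'c.org': 0.125}
```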
Popularity Measure
[Formula: definition of the page popularity measure]
User Evaluation
Poll: for each of the 20 biggest shared staks,
the 15 most popular and 15 least popular pages
were presented in random order to the stak owner,
who was asked whether each page is relevant to the stak.
Poll Results
[Bar chart: number of documents (10 to 100) judged Relevant / I don't know / Irrelevant, by popularity (0.00 to 0.90)]
Experiment
Classifier: decision tree / Naive Bayes,
for each of the 20 biggest shared staks,
trained with every page, weighted by its normalized popularity,
10-fold cross validation.
Accuracy: each page contributes to the accuracy proportionally to its normalized popularity → it is more important for the classifier to recognize popular pages than unpopular ones.
Experimental Results
[Bar chart: weighted accuracy (0.1 to 0.8) of J48, Naive Bayes, and ZeroR, trained with popularity weights vs. boolean unweighted training]
Validity of the Measure
[Line chart: weighted accuracy vs. minimum normalized popularity (0.1 to 1.0) for J48, Naive Bayes, and ZeroR]
Kernel-Trained Classifier
[Bar chart: weighted accuracy for np > 0.6, whole-trained vs. kernel-trained, for J48, Naive Bayes, and ZeroR]
Further Work
Integrate the classifier into the stak maintenance page:
which user interface?
how can user feedback be fed back into the classifier?
Use the classifier to recommend a stak at query time:
first experiments are not satisfactory.
Perspectives
Beyond the specific application domain of this work...
Coping with noisy knowledge bases:
relevant for experience repositories,
relevant for large-scale (Web-based) systems.
Using knowledge for a different purpose:
what other purposes can a knowledge base serve?
what properties of a knowledge base make it easy or hard to reuse?
Experience is the name everyone gives to their mistakes —Oscar Wilde