

SLIDE 1

Coping with Noisy Search Experiences

Pierre-Antoine Champin, Peter Briggs, Maurice Coyle, Barry Smyth

LIRIS, Université de Lyon, France / Clarity, University College Dublin, Ireland

16 December 2009

SLIDE 2

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
4. Perspectives

SLIDE 3

Structure of the Talk

1. Context
   - Recommender Systems
   - Context Aware Recommendation
   - Social Search
2. Addressed Problem
3. Proposals and results
4. Perspectives

SLIDE 4

HeyStaks

HeyStaks is a social, context-aware recommender system for web searches.

SLIDE 5

Recommender Systems

Recommender systems aim at presenting users with information that suits their particular preferences or needs. General-purpose search engines provide results based on an objective measure of relevance w.r.t. the query → the same results for everyone. Recommender systems for web search aim at personalising the results of search engines.

SLIDE 7

HeyStaks as a Recommender System

HeyStaks is an extension for Firefox. It aims at integrating into users' habits rather than forcing them to change. It recognizes result pages from popular search engines, and alters them in order to:

- promote links (move them up in the list),
- insert new links,

based on the user's past search experiences. Past search experiences are acquired by:

- implicit feedback: queries, result click-through
- explicit feedback: tagging pages, voting, sharing

SLIDE 8

Recommendations in HeyStaks

SLIDE 10

Context Aware Recommendation

Search engines provide the same results for every user. Recommender systems provide personalised results... but they provide the same personalisation every time. Nobody is only one user: their needs depend on the context, especially when considering Web searches.

My searches are sometimes related to my research, my teaching, my leisure... → hence the need for different recommendations in different contexts.

SLIDE 11

Search Staks

A search stak is a repository of search experiences, all related to the same context. Users can create as many staks as they need, and manually select the active stak (the current context). The active stak is where search experiences are collected, and where they are tapped to provide recommendations.
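The stak mechanics described above can be sketched as a minimal data model. This is an illustrative sketch only: all class and field names are hypothetical, not HeyStaks' actual implementation.

```python
from collections import defaultdict

class Stak:
    """A repository of search experiences for one context (hypothetical sketch)."""
    def __init__(self, name, public=False):
        self.name = name
        self.public = public      # public staks: anyone can join; private: invite only
        # page URL -> accumulated evidence (implicit clicks, explicit tags/votes)
        self.experiences = defaultdict(lambda: {"clicks": 0, "tags": set(), "votes": 0})

    def record_click(self, url):            # implicit feedback
        self.experiences[url]["clicks"] += 1

    def record_tag(self, url, tag):         # explicit feedback
        self.experiences[url]["tags"].add(tag)

class HeyStaksUser:
    """Users create staks and manually select the active one (current context)."""
    def __init__(self):
        self.staks = {}
        self.active = None

    def create_stak(self, name, public=False):
        self.staks[name] = Stak(name, public)
        return self.staks[name]

    def select_stak(self, name):
        # All subsequent search experiences are filed in the active stak,
        # which is precisely why forgetting this step introduces noise.
        self.active = self.staks[name]
```

The manual `select_stak` step is the weak point the rest of the talk addresses: experiences recorded while the wrong stak is active end up in the wrong repository.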

SLIDE 12

Social Search

Social search is the process of sharing search experiences between like-minded Web searchers. In HeyStaks, social search happens through shared staks: several users can contribute to, and receive recommendations from, the same stak. Staks can be private (only people I invite can join) or public (anyone can join).

SLIDE 13

HeyStaks Portal

SLIDE 14

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
4. Perspectives

SLIDE 15

The Problem of Stak Selection

Users fail to select the appropriate stak before starting a search. The recommendations they get are then less relevant, and their search experiences are filed in the wrong stak. → HeyStaks ends up with noisy experience repositories, and provides less accurate recommendations (even when the correct stak is selected).

SLIDE 16

Implemented Workarounds

Fall back to the default stak when idle:
- limits the input noise
- potentially reduces context awareness

Signal when other staks provide recommendations:
- improves the relevance of recommendations, despite a wrong active stak
- encourages users to select the right stak

SLIDE 18

Other Possible Solutions

Automatically select the right stak at query time:
- almost impossible if based on the sole query terms
- hazardous if based on the available recommendations
- technically complicated if based on external indicators (time-tracking tools, browsing history...)

Help stak owners to maintain (or curate) their staks:
- use classification techniques
- recommend the correct stak for a page
- pages are easier to classify than queries
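The "recommend the correct stak for a page" idea can be illustrated with a toy classifier. This is a sketch only: it scores a page's terms against each stak with a smoothed multinomial naive-Bayes estimate, whereas the talk's actual experiments use Weka models; all names and the data layout are made up for illustration.

```python
from collections import Counter
import math

def recommend_stak(page_terms, staks):
    """Recommend the stak whose term distribution best matches the page.

    `staks` maps stak name -> list of term lists (one per page already filed).
    Scores use log-probabilities with add-one smoothing over the global vocabulary.
    """
    vocab = {t for pages in staks.values() for terms in pages for t in terms}
    best, best_score = None, -math.inf
    for name, pages in staks.items():
        counts = Counter(t for terms in pages for t in terms)
        total = sum(counts.values())
        # Sum of smoothed log-likelihoods of the page's terms under this stak
        score = sum(math.log((counts[t] + 1) / (total + len(vocab)))
                    for t in page_terms)
        if score > best_score:
            best, best_score = name, score
    return best
```

Classifying a page this way is plausible because, as the slide notes, pages carry far more text than the few terms of a query.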

SLIDE 20

Training the Page Classifier

Most work on coping with noise in recommender systems assumes that a clean training set is available before noise is encountered. We need to find the kernel of each stak: the set (or a subset of) the pages actually relevant to that stak.

How can we find a reliable kernel? How can we evaluate its reliability?

SLIDE 21

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
   - Clustering
   - Popularity weighting
   - Popularity-based kernel
4. Perspectives

SLIDE 22

Clustering

Idea: use clustering techniques to identify a candidate kernel.
Rationale: kernel pages must be somehow similar, while noisy pages will be heterogeneous.
Problem: huge variability depending on numerous parameters:
- comparing terms or pages
- different similarity measures
- different clustering algorithms
- threshold values
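As an illustration of the clustering idea (not the authors' actual method), the following sketch links pages whose term sets are Jaccard-similar above a threshold and returns the largest connected group as the candidate kernel. The similarity measure, the single-link algorithm and the threshold value are exactly the free parameters the slide warns about.

```python
def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def candidate_kernel(pages, threshold=0.3):
    """Single-link clustering of pages by term-set similarity (illustrative only).

    `pages` maps URL -> set of index terms.  Pages whose pairwise Jaccard
    similarity exceeds `threshold` are linked; the largest group is returned
    as the candidate kernel, on the rationale that kernel pages cluster
    together while noisy pages stay isolated.
    """
    urls = list(pages)
    parent = {u: u for u in urls}

    def find(u):                     # union-find with path compression
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for i, u in enumerate(urls):
        for v in urls[i + 1:]:
            if jaccard(pages[u], pages[v]) > threshold:
                parent[find(u)] = find(v)

    clusters = {}
    for u in urls:
        clusters.setdefault(find(u), []).append(u)
    return max(clusters.values(), key=len)
```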

SLIDE 23

Popularity Weighting

Idea: use a measure of the popularity of pages as a proxy for relevance, in order to provide a fuzzy kernel.
Rationale: kernel pages are repeatedly selected, while noisy pages will only be accidentally selected.
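One plausible way to realize this idea, assuming popularity is simply the number of times a page was selected within the stak (the actual measure was presented graphically on the following slides and may differ):

```python
def normalized_popularity(selections):
    """Fuzzy kernel membership from selection counts (assumed measure).

    `selections` maps page URL -> number of times the page was selected
    (clicked, tagged, voted...) within the stak.  Normalizing by the
    most-selected page gives each page a weight in (0, 1]: repeatedly
    selected kernel pages score high, accidentally selected noise scores low.
    """
    top = max(selections.values())
    return {url: n / top for url, n in selections.items()}
```

This yields a weight per page rather than a hard in/out decision, which is what "fuzzy kernel" suggests.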

SLIDE 24

Popularity Measure

[Four build slides presenting the popularity measure graphically; content not captured in the transcript]

SLIDE 28

User Evaluation

Poll: for each of the 20 biggest shared staks, the 15 most popular and the 15 least popular pages were presented in random order to the stak owner, who was asked whether each page is relevant to the stak.

SLIDE 29

Poll Results

[Bar chart: number of documents (10 to 100) per popularity bin (0.00 to 0.90), each bar split into Relevant / I don't know / Irrelevant]

SLIDE 30

Experiment

Classifier: decision tree / naive Bayes
- for each of the 20 biggest shared staks
- trained with every page, weighted by its normalized popularity
- 10-fold cross validation

Accuracy: each page contributes to the accuracy proportionally to its normalized popularity → it is more important for the classifier to recognize popular pages than unpopular ones.
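The popularity-weighted accuracy described above can be written down directly; `predictions`, `truth` and `weights` are hypothetical names for the evaluation data.

```python
def weighted_accuracy(predictions, truth, weights):
    """Accuracy where each page counts proportionally to its popularity.

    `predictions` and `truth` map page URL -> class label; `weights` maps
    page URL -> normalized popularity.  Misclassifying a popular
    (likely-kernel) page costs more than misclassifying an unpopular
    (likely-noise) page.
    """
    total = sum(weights.values())
    correct = sum(w for url, w in weights.items()
                  if predictions[url] == truth[url])
    return correct / total
```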

SLIDE 32

Experimental Results

[Bar chart: weighted accuracy (0.1 to 0.8) of J48, NaiveBayes and ZeroR, comparing popularity-weighted training with boolean unweighted training]

SLIDE 33

Validity of the Measure

[Plot: weighted accuracy (0.1 to 1) as a function of minimum normalized popularity (0.1 to 1), for J48, NaiveBayes and ZeroR]

SLIDE 34

Kernel-Trained Classifier

[Bar chart: weighted accuracy (0.1 to 0.9) for np > 0.6, comparing whole-trained and kernel-trained classifiers for J48, NaiveBayes and ZeroR]
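The kernel-training setup can be sketched as a simple filter on normalized popularity, with the np > 0.6 threshold taken from the accuracy-vs-threshold plot on the previous slide; the data layout here is assumed, not the authors'.

```python
def kernel_training_set(pages, popularity, threshold=0.6):
    """Keep only pages whose normalized popularity exceeds the threshold.

    `pages` maps URL -> feature representation; `popularity` maps
    URL -> normalized popularity.  Training a classifier on this
    high-popularity kernel, instead of the whole noisy stak, is the
    kernel-trained setting compared on this slide.
    """
    return {url: feats for url, feats in pages.items()
            if popularity[url] > threshold}
```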

SLIDE 35

Structure of the Talk

1. Context
2. Addressed Problem
3. Proposals and results
4. Perspectives

SLIDE 36

Further Work

Integrate the classifier into the stak maintenance page:
- Which user interface?
- How to integrate user feedback into the classifier?

Use the classifier to recommend a stak at query time:
- First experiments are not satisfying.

SLIDE 37

Perspectives

Beyond the specific application domain of this work...

Coping with noisy knowledge bases:
- relevant for experience repositories
- relevant for large-scale (Web-based) systems

Using knowledge for a different purpose:
- What other purposes can a knowledge base serve?
- What properties of the knowledge base make it easy or hard to reuse?

Experience is the name everyone gives to their mistakes —Oscar Wilde
