Flatlands 08 Machine Learning Deirdre Lungley dmlung@essex.ac.uk

SLIDE 1

Deirdre Lungley Motivation Research Plan FCA Machine Learning

Adaptively Modelling the Context of an Intranet Query

Flatlands ’08

Deirdre Lungley dmlung@essex.ac.uk

University of Essex

June 6, 2008

Deirdre Lungley (University of Essex) June 6, 2008 1 / 16

SLIDE 2

Deirdre Lungley Motivation Starting Point Users Like Help Query Table Research Focus Intranet Search Research Plan FCA Machine Learning

Figure: University of Essex Intranet Search

SLIDE 3

Motivation

Users do like some help!
    Kruschwitz and Al-Bakour, 2005
    White and Ruthven, 2006
    known-item search - query suggestions
    exploratory search - query destinations

Web examples
    Clusty - Vivisimo Ltd.
    CREDO - FUB, Italy

Intranet example
    Aquabrowser - Medialab Solutions, The Netherlands

Analysis of UoE intranet search modifications
    Dominated by single-term queries
    Many of these queries met by documents in top 5 results
    However, how about?
        Multi-context terms - sport, parking, printing
        Ambiguous terms - CES

SLIDE 4

Query Term(s)
 1. library
 2. accomodation
 3. exam timetable
 4. timetable
 5. courses
 6. accommodation
 7. fees
 8. moodle
 9. mba
10. graduation

Table: Prominent Modified Queries

SLIDE 5

Research Focus

All require a domain model
Non-trivial task
    Relying on appropriate document annotation
Our answer!
    Automatically adapt our domain model - let it learn from implicit user feedback (clickthrough data)
Current uses of clickthrough data
    Re-ranking of results
    Query refinement

SLIDE 6

Why Intranet Search?

Controlled environments
    Often imposed annotation standards
    Less spam, making inlinks and metadata more reliable
Relatively cohesive community of users
    Similar search needs aid the viability of harnessing user population feedback

SLIDE 7

Deirdre Lungley Motivation Research Plan Components System Architecture FCA Machine Learning

Components

Underlying Search Engine
    Lucene’s Nutch
Natural Language Processing
    QTag
    Collocations (Justeson and Katz)
        AN, NN, AAN, ANN, NAN, NNN, NPN
Context Model
    Formal Concept Analysis (FCA)
Machine Learning
    SVM-Light (Joachims)
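
The part-of-speech patterns listed for collocations (A = adjective, N = noun, P = preposition) can be applied as a simple filter over tagged text. The following is a minimal sketch, not the system's implementation: it assumes tokens have already been tagged and reduced to single-letter classes, and the function name `collocation_candidates`, the placeholder tag `X`, and the sample phrase are hypothetical. Mapping real QTag output to these classes is not shown.

```python
# Part-of-speech patterns from the slide (A = adjective, N = noun, P = preposition).
PATTERNS = {"AN", "NN", "AAN", "ANN", "NAN", "NNN", "NPN"}


def collocation_candidates(tagged_tokens):
    """Return two- and three-word phrases whose tag sequence matches a pattern.

    `tagged_tokens` is a list of (word, tag) pairs where each tag is assumed
    to be a single character: "A", "N", "P", or anything else (e.g. "X").
    """
    candidates = []
    for length in (2, 3):
        for start in range(len(tagged_tokens) - length + 1):
            window = tagged_tokens[start:start + length]
            signature = "".join(tag for _, tag in window)
            if signature in PATTERNS:
                candidates.append(" ".join(word for word, _ in window))
    return candidates


# Hypothetical tagged phrase, for illustration only.
example = [
    ("exam", "N"), ("timetable", "N"), ("for", "P"),
    ("postgraduate", "A"), ("students", "N"),
]
print(collocation_candidates(example))
```

A full Justeson and Katz style extractor would also apply a frequency cut-off to the candidates; that step is left out here.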

SLIDE 8

Figure: System Architecture
(Diagram labels. Components: NL Query; Underlying Search Engine; NL Processor; FCA Processor; Lattice/Document Representation; Document Indices; Crawler / Indexer; Machine Learning Module; Document Collection; Adaptive Element. Data flows: Predictions (URL : weighted terms); Logged Relevance Data; Lattice Exploration; URL : Terms; URL : Terms (Adapted).)

SLIDE 9

Deirdre Lungley Motivation Research Plan FCA Hasse Table Related Concept Lattice Automade Adapted Lattice Machine Learning

          horse  male  female  adult  young
horse       X
stallion    X     X              X
mare        X            X       X
foal        X                           X
filly       X            X              X
colt        X     X                     X

Figure: Classical Lattice Example - Hasse Table
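
To make the step from this table to a concept lattice concrete, the sketch below enumerates the formal concepts (extent, intent pairs) of the horse example by brute force. It is an illustration only: the cross positions in `CONTEXT` are an assumption read off the table (stallion = horse, male, adult; and so on), and the function names are hypothetical. Brute-force enumeration is fine for six objects but would not scale to a document collection.

```python
from itertools import combinations

# Formal context: object -> set of attributes (assumed reading of the table).
CONTEXT = {
    "horse": {"horse"},
    "stallion": {"horse", "male", "adult"},
    "mare": {"horse", "female", "adult"},
    "foal": {"horse", "young"},
    "filly": {"horse", "female", "young"},
    "colt": {"horse", "male", "young"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def objects_having(attributes):
    """Objects that carry every attribute in `attributes`."""
    return frozenset(o for o, attrs in CONTEXT.items() if attributes <= attrs)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    names = list(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((extent, intent))
    return concepts


for extent, intent in sorted(formal_concepts(), key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by inclusion of their extents gives the lattice; for example, the concept with intent {horse, adult} has extent {stallion, mare} and sits above the separate stallion and mare concepts.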

SLIDE 10

Figure: Classical Lattice Example - Concept Lattice
(Node labels in the diagram: horse, horse, foal, young, male, female, adult, colt, filly, stallion, mare)

SLIDE 11

Figure: Automade Screenshot

SLIDE 12

Figure: Example Adapted Lattice

SLIDE 13

Deirdre Lungley Motivation Research Plan FCA Machine Learning SVM-Light Clickthrough Data Adaptation Steps

SVM-Light

Machine learning tool developed by Thorsten Joachims
Particularly suitable for Information Retrieval - developed to surmount the problem of sparsity in document/term matrices
Default linear kernel
Lattice-based kernel to optimize the lattice structure?
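
The sparsity point shows up directly in SVM-Light's training file format: each example is one line holding a target value followed by `index:value` pairs for the nonzero features only, in ascending index order. The helper below is a small sketch of writing one document/term vector in that format; the function name `to_svmlight_line` and the term-to-index numbering are hypothetical, not part of SVM-Light itself.

```python
def to_svmlight_line(target, weights, comment=""):
    """Format one example as '<target> <index>:<value> ...' with ascending indices.

    `weights` maps positive integer feature indices (one per term) to values;
    zero-valued features are omitted, which is what keeps the file sparse.
    """
    features = " ".join(
        f"{index}:{value:g}"
        for index, value in sorted(weights.items())
        if value != 0
    )
    line = f"{target:+d} {features}".rstrip()
    return f"{line} # {comment}" if comment else line


# Hypothetical document vector: term 2 and term 7 are present, term 5 is absent.
print(to_svmlight_line(1, {7: 0.5, 2: 1.0, 5: 0.0}))
```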

SLIDE 14

Clickthrough Data

Questions have been raised regarding the accuracy of using clickthrough data as an indicator of relevance
Radlinski and Joachims promote relative rather than absolute relevance
    A document clicked on for a query is deemed more relevant to that particular query than documents above and below
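
Relative relevance can be turned into pairwise training preferences. The sketch below is one possible reading of "above and below", stated as an assumption: a clicked document is preferred over every unclicked document ranked above it and over the unclicked document immediately below it. The function name `preference_pairs` and the sample ranking are hypothetical.

```python
def preference_pairs(ranking, clicked):
    """Return (preferred, less_preferred) document pairs from one result list.

    Assumed rule: a clicked document beats each unclicked document ranked
    above it, plus the next document below it if that one was not clicked.
    """
    clicked = set(clicked)
    pairs = []
    for position, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for above in ranking[:position]:
            if above not in clicked:
                pairs.append((doc, above))
        below = position + 1
        if below < len(ranking) and ranking[below] not in clicked:
            pairs.append((doc, ranking[below]))
    return pairs


# Hypothetical result list where only the third result was clicked.
print(preference_pairs(["a", "b", "c", "d"], {"c"}))
```

No preference is derived between two clicked documents, or between two unclicked ones, which is what distinguishes this from assigning absolute relevance labels.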

SLIDE 15

Adaptation Steps

Record Log Data
    Log initial query term
    Log subsequent query terms either entered in the textbox or chosen by clicking on the lattice node
    Log clicked URL plus subsequent browser clicked URLs (possibly not within result list)
Adaptive Element. Before creation of query lattice apply SVM-Light Model. This should:
    Associate query terms positively with the clicked URLs and negatively with skipped URLs (i.e., increase/decrease document/term weight)
    Decrease weight of document terms not in query terms
    If query term does not exist in document terms, add with positive weight
    Apply threshold to delete terms within documents and entire documents where all terms deleted
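
The intended effect of the adaptive element on the URL : Terms representation can be sketched as follows. This is not the SVM-Light model: fixed increments stand in for learned weights, purely to illustrate the four listed effects. The function name `adapt`, the parameters `step`, `decay` and `threshold`, their values, and the choice to decay non-query terms only on clicked documents are all assumptions.

```python
def adapt(doc_terms, query_terms, clicked, skipped,
          step=0.2, decay=0.05, threshold=0.1):
    """Return an adapted copy of {url: {term: weight}} after one logged query.

    Fixed `step` and `decay` values stand in for model-derived updates.
    """
    adapted = {}
    for url, terms in doc_terms.items():
        weights = dict(terms)
        if url in clicked:
            # Decrease the weight of document terms that are not query terms.
            for term in weights:
                if term not in query_terms:
                    weights[term] -= decay
            # Strengthen query terms, adding any that the document lacks.
            for term in query_terms:
                weights[term] = weights.get(term, 0.0) + step
        elif url in skipped:
            # Weaken the association between query terms and skipped URLs.
            for term in query_terms:
                if term in weights:
                    weights[term] -= step
        # Drop terms below the threshold, and documents left with no terms.
        kept = {term: w for term, w in weights.items() if w >= threshold}
        if kept:
            adapted[url] = kept
    return adapted


# Hypothetical log entry: query "sport centre", u1 clicked, u2 skipped.
docs = {"u1": {"sport": 0.5, "fees": 0.12}, "u2": {"sport": 0.25}}
print(adapt(docs, ["sport", "centre"], clicked={"u1"}, skipped={"u2"}))
```

In this example `u1` gains the new term "centre" and loses the weak term "fees", while `u2` drops out entirely because its only term falls below the threshold.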

SLIDE 16

Thank You!
