1

Adaptive Information Filtering Using Bayesian Graphical Models

Yi Zhang, Baskin School of Engineering, University of California Santa Cruz, yiz@soe.ucsc.edu

2

Filtering May be Your Work

3

Filtering May be Your Work

Getting potential terrorist alert

4

Filtering May be Your Work

Tracking stock news

5

Filtering May be Your Work

Getting funding alerts

6

Even if You Do Not Work…

7

Search Engines Can Help, But … Not Enough!

  • Search engine focus: short-term information needs (ad hoc search)
    – Information source is relatively static
    – User pulls information from the system
  • The task: long-term information needs (adaptive filtering)
    – Information source is dynamic
    – User wants to be alerted as soon as the information is available
    – System pushes information to the user

8

A Filtering System that Monitors Document Stream(s)

[Diagram: documents from a document stream enter the filtering system, a binary classifier built from the user profile. The profile is initialized from a free-text query and refined by feedback learning on the delivered documents; non-delivered documents simply accumulate, and the user sees only the delivered ones.]
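The pipeline in the diagram can be sketched as a minimal loop (my own illustrative sketch, not the system described in the talk; the word-weight profile, threshold, and learning rate are all made-up assumptions):

```python
# Minimal adaptive-filtering loop: a hypothetical sketch, not the actual system.
# Profile = word-weight dict initialized from a free-text query; a document is
# delivered when its score clears the threshold; feedback nudges the weights.

def score(profile, doc_words):
    """Dot product between the profile and the document's bag of words."""
    return sum(profile.get(w, 0.0) for w in doc_words)

def filter_stream(query, stream, threshold=1.0, lr=0.5):
    profile = {w: 1.0 for w in query.split()}   # initialization from free text
    delivered = []
    for doc, feedback in stream:                # feedback: +1 relevant, -1 not
        words = doc.split()
        if score(profile, words) >= threshold:  # binary classifier decision
            delivered.append(doc)
            for w in words:                     # feedback learning on delivered docs
                profile[w] = profile.get(w, 0.0) + lr * feedback
    return delivered, profile

docs = [
    ("funding alert nsf", +1),
    ("stock news today", -1),
    ("nsf funding deadline", +1),
    ("weather report", -1),
]
delivered, profile = filter_stream("funding alert", docs)
print(delivered)
```

Feedback on delivered documents pushes the profile toward words that co-occur with relevant documents, which is why the third document is delivered even though it shares only one word with the original query.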

9

Related Areas

[Diagram: information filtering at the intersection of three related areas.
  Models: statistics, optimization, artificial intelligence, machine learning, natural language processing
  Systems: databases, computer systems, security, human-computer interaction, computer networks
  Applications: event tracking, bioinformatics, business applications, medical informatics, digital libraries, email filtering, …
Focus of this talk: information filtering.]

10

Common Approaches and Problems in IR

  • Many people work on filtering
    – More than 40 institutes
    – NIST TREC filtering track, SPAM track, TDT

  • Commonly used evaluation measure: utility
    – Example: T9U = 2*(relevant docs delivered) - (non-relevant docs delivered)
    – Delivering a document with relevance probability p has expected utility 2p - (1 - p), so deliver if P(relevant | document) >= 1/3 ≈ 0.33
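The 0.33 delivery rule falls straight out of the utility definition; a small sketch, assuming the T9U credits of +2 per relevant and -1 per non-relevant delivered document:

```python
# Why T9U = 2*(relevant) - (non-relevant) implies the 0.33 delivery rule:
# expected utility of delivering one document with P(relevant) = p is
#   E[U] = 2*p - 1*(1 - p) = 3*p - 1,  which is positive iff p > 1/3.

def expected_utility(p, credit=2.0, penalty=1.0):
    return credit * p - penalty * (1.0 - p)

def deliver(p):
    return expected_utility(p) > 0

print(deliver(0.4), deliver(0.3))  # True False
```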

  • Commonly used algorithms: relevance-based filtering
    – Relevance retrieval + threshold
    – Binary text classification: relevant vs. non-relevant

  • Challenges and opportunities
    – Very limited user supervision
    – User criteria beyond relevance
    – Complex user models can be learned over a long period of user interaction
    – Poor performance with existing algorithms

11

Our Approach: System with Desirable Characteristics

What can a person do? (desirable characteristics) => Our solution for a computer: Bayesian Graphical Models, a unified framework
  – Use heuristics => Bayesian prior
  – Ask good questions => Bayesian active learning
  – Use multiple forms of evidence => graphical models
  – Social learning => Bayesian hierarchical modeling

12

What are Bayesian Graphical Models (BGM)? Three Components

  • Bayesian axiom: maximize utility
  • Representation tools: graphical models
    – The graph summarizes conditional independence relationships between variables
    – Conditional probability distributions or potential functions encode the quantitative relationships between connected nodes
  • Inference algorithms
    – Estimate the unknown from the known
    – Methods to achieve the goal of utility maximization

[Example graph: v0 -> v1 with P(v1 | v0); v1 and v2 -> v3 with P(v3 | v1, v2).]
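As a toy illustration of the three components (my own example; the CPT numbers are arbitrary), the example graph can be encoded as conditional probability tables and queried by enumeration:

```python
# Toy discrete Bayesian graphical model matching the example graph:
# v0 -> v1, and (v1, v2) -> v3. The joint factorizes as
# P(v0, v1, v2, v3) = P(v0) P(v1|v0) P(v2) P(v3|v1, v2).
from itertools import product

P_v0 = {0: 0.6, 1: 0.4}
P_v1_given_v0 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # P(v1 | v0)
P_v2 = {0: 0.7, 1: 0.3}
P_v3_given = {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95}  # P(v3=1 | v1, v2)

def joint(v0, v1, v2, v3):
    p3 = P_v3_given[(v1, v2)]
    return (P_v0[v0] * P_v1_given_v0[v0][v1] * P_v2[v2]
            * (p3 if v3 == 1 else 1.0 - p3))

# Inference: estimate the unknown from the known, here P(v3 = 1) by enumeration.
p_v3 = sum(joint(v0, v1, v2, 1) for v0, v1, v2 in product((0, 1), repeat=3))
print(round(p_v3, 4))
```

The factorization in the first comment is exactly what the graph's missing edges assert: v2 is independent of v0 and v1, and v3 depends on v0 only through v1.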


13

Road Map

  • Introduction
  • How we use BGM for filtering
    – Using an expert's heuristics as a Bayesian prior (SIGIR 04)
    – Exploration and exploitation trade-off using Bayesian active learning (ICML 03)
    – Combining multiple forms of evidence using graphical models (HLT 05)
    – Collaborative adaptive user modeling with explicit & implicit feedback (CIKM 06)
  • Contributions and future work

14

Motivation: Using Heuristics as Bayesian Prior

P(y = yes | x, w)
  x: document;  y: relevant;  w: parameters

P(w | mean, variance)
  priors: mean and variance of w

15

When is it Expected to Work?

[Plot: performance vs. number of training examples for two approaches: the heuristic algorithm used to estimate the prior (Rocchio algorithm, low variance) and the learner (hypothesis: logistic regression, low bias). The prior is expected to help when training data is scarce.]

16

Method: Convert Decision Boundary to Prior Distribution

Document space (N) => logistic regression parameter space (N+1)

  • Step 1: Heuristic algorithm (Rocchio + threshold) => w_R
  • Step 2: Rescale w_R along its own direction to best fit the training data:
      w_m = α* w_R, where α* = argmax_α ∏_{i=1..T} p(y_i | x_i, α w_R)
    (equivalently, w_m = argmax_w ∏_{i=1..T} p(y_i | x_i, w) subject to cosine(w, w_R) = 1)
  • Step 3: Use w_m as the logistic regression prior mean:
      P(w) = N(w | w_m, v)
  • Step 4: Estimate the posterior distribution of the logistic parameter:
      P(w | D) = (1/Z(D)) ∏_t p(y_t | x_t, w) P(w),  yielding w*
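The four steps can be sketched in numpy under simplifying assumptions (my own toy version: a centroid-difference stand-in for Rocchio, a grid search for α, and plain gradient ascent for the MAP estimate; none of these implementation details are from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy training data: relevant docs centered at (+1, +1), non-relevant at (-1, -1).
X = np.vstack([rng.normal(1.0, 1.0, (20, 2)), rng.normal(-1.0, 1.0, (20, 2))])
X = np.hstack([X, np.ones((40, 1))])          # add bias: parameter space (N+1)
y = np.array([1] * 20 + [0] * 20)

def loglik(w):
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Step 1: heuristic decision boundary w_R (Rocchio-style centroid difference).
w_R = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

# Step 2: w_m = alpha* w_R, alpha* = argmax_alpha sum_i log p(y_i | x_i, alpha w_R).
alphas = np.linspace(0.01, 5.0, 500)
alpha_star = alphas[np.argmax([loglik(a * w_R) for a in alphas])]
w_m = alpha_star * w_R

# Step 3: Gaussian prior P(w) = N(w_m, v).  Step 4: MAP posterior estimate by
# gradient ascent on log p(D | w) - ||w - w_m||^2 / (2 v).
v, w = 1.0, w_m.copy()
for _ in range(500):
    w += 0.01 * (X.T @ (y - sigmoid(X @ w)) - (w - w_m) / v)
print(np.round(w, 2))
```

Rescaling in Step 2 keeps the heuristic's direction (its low-variance knowledge of which features matter) while letting the data choose how confident the probabilistic outputs should be.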

17

Results

[Bar chart: normalized utility (0.1-0.6) on TREC 11 adaptive filtering data for our Logistic_Rocchio vs. Logistic Regression, Rocchio, and Logistic_UnscaledRocchio.]

  • Best TREC official result: 0.475
  • Similar performance on TREC 9 data

[Bar chart: TDT 2004 results reported by NIST (0.1-0.8) for our team's LR_Rocchio vs. Team 1, Team 2, and Team 4.]

  • A slightly better result (0.7328) was reported by Team 1 at the TDT workshop

18

Road Map

  • Introduction
  • How we use BGM for filtering
    – Using an expert's heuristics as a Bayesian prior
    – Exploration and exploitation trade-off using Bayesian active learning
    – Combining multiple forms of evidence using graphical models
    – Collaborative adaptive user modeling with explicit & implicit feedback
  • Contributions and future work

19

Motivation: A “Bad” Document May Help Future Performance

  • The effects of delivering a document to the user:
    – A: Satisfy the user's information need immediately
    – B: Get the user's feedback, learn from it, and serve the user better in the future
  • Existing filtering systems don't consider effect B, or consider it only heuristically
  • Our solution: Bayesian active learning
    – Model the future utility of delivering a document explicitly while learning the threshold

A: exploitation;  B: exploration

U(d) = U_immediate(d) + N_future · U_future(d)

20

Exploitation: Estimate U_immediate

Using Bayesian inference, we have:

U_immediate(d_t | D_{t-1}) = ∫ U_immediate(d_t | θ) P(θ | D_{t-1}) dθ

U_immediate(d_t | θ) = Σ_y A_y P(y | d_t, θ)

  y: relevant or non-relevant
  A_y: credit/penalty defined by the utility function
  d_t: document arriving at the current time t
  D_{t-1}: existing training data set before d_t arrives
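With T9U-style credits (A_relevant = 2, A_non-relevant = -1) and a toy discrete posterior over three candidate models (all numbers made up for illustration), the two expectations above reduce to a short computation:

```python
# U_immediate(d_t | D_{t-1}) = sum_theta P(theta | D_{t-1}) * sum_y A_y P(y | d_t, theta)
# Toy discrete posterior over three candidate models; A_y from a T9U-style utility.
A = {"relevant": 2.0, "nonrelevant": -1.0}

posterior = {"theta1": 0.5, "theta2": 0.3, "theta3": 0.2}   # P(theta | D_{t-1})
p_rel = {"theta1": 0.6, "theta2": 0.4, "theta3": 0.1}       # P(relevant | d_t, theta)

def u_immediate(theta):
    p = p_rel[theta]
    return A["relevant"] * p + A["nonrelevant"] * (1.0 - p)

u = sum(posterior[t] * u_immediate(t) for t in posterior)
print(round(u, 3))
```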

21

Estimating Future Utility Using Bayesian Decision Theory

  • When the true model is θ, we incur some loss if we use model θ̂ instead:
      Loss(θ, θ̂) = U(θ, θ) - U(θ, θ̂)
  • The true model is unknown, but given training data set D we estimate the posterior distribution of the true model, and then the expected loss of using θ̂:
      Loss(θ̂) = E_{P(θ | D)} Loss(θ, θ̂)
  • Measure the quality of training data set D as the expected loss of using θ̂_D, the model estimated from D:
      Loss(D) = Loss(θ̂_D)
  • Measure the future utility as the expected reduction in loss:
      U_future(d_t | D_{t-1}) = Loss(D_{t-1}) - E_{P(y | d_t, D_{t-1})} Loss(D_{t-1} ∪ {(d_t, y)})

22

Method: The Whole Process on BGM

  • Step 1: Estimate the immediate utility:
      U_immediate(d_t | D_{t-1}) = ∫ U_immediate(d_t | θ) P(θ | D_{t-1}) dθ
  • Step 2: Estimate the future utility:
      U_future(d_t | D_{t-1}) = Loss(D_{t-1}) - Σ_y P(y | d_t, D_{t-1}) Loss(D_{t-1} ∪ {(d_t, y)})
  • Step 3: Deliver d_t if and only if
      U(d_t) = U_immediate(d_t | D_{t-1}) + N_future · U_future(d_t | D_{t-1}) > 0

  document: d_t;  relevance: y;  model: θ with posterior P(θ | D_{t-1})

23
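A toy instantiation of the three delivery steps (my own sketch: θ is a single relevance probability with a Beta posterior, and posterior variance stands in for Loss(D), a proxy chosen for illustration, not the loss used in the talk):

```python
# Toy version of Steps 1-3: theta = P(relevant) with posterior Beta(a, b);
# posterior variance serves as a stand-in for Loss(D) (an assumption made
# purely for illustration).

def beta_mean(a, b):
    return a / (a + b)

def beta_var(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

def utility(a, b, n_future=50.0, credit=2.0, penalty=1.0):
    p = beta_mean(a, b)
    u_immediate = credit * p - penalty * (1 - p)             # Step 1
    loss_now = beta_var(a, b)
    # Step 2: expected loss after observing the label y ~ P(y | d, D).
    loss_next = p * beta_var(a + 1, b) + (1 - p) * beta_var(a, b + 1)
    u_future = loss_now - loss_next
    return u_immediate + n_future * u_future                 # Step 3: deliver if > 0

a, b = 1.0, 3.0                  # weak evidence: P(relevant) = 0.25 < 1/3
print(utility(a, b, n_future=0.0) > 0, utility(a, b, n_future=50.0) > 0)
```

With N_future = 0 the borderline document (P(relevant) = 0.25 < 1/3) is not delivered; weighting the future utility makes the system deliver it anyway, trading a small expected immediate loss for the feedback that shrinks posterior uncertainty.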

Results

  • When exploration is worth doing, it is effective (thousands of relevant documents):

    TREC-10: Reuters Dataset    Bayesian Immediate   Bayesian Active
    utility                     3149                 3534
    docs/profile                3895                 4527
    normalized utility          0.445                0.448

  • When exploration is not worth doing, it didn't hurt (active learning didn't improve; only 51 out of 300,000 documents are relevant on average):

    TREC-9: OHSUMED Dataset     Bayesian Immediate   Bayesian Active
    utility                     11.54                11.32
    docs/profile                25                   31
    normalized utility          0.360                0.353

24

Road Map

  • Introduction
  • How we use BGM for filtering
    – Using an expert's heuristics as a Bayesian prior
    – Exploration and exploitation trade-off using Bayesian active learning
    – Combining multiple forms of evidence using graphical models (beyond relevance)
      » User study
      » Data analysis
    – Collaborative adaptive user modeling with explicit & implicit feedback
  • Contributions and future work

25

User Study System

26

User Study to Collect Evaluation Data

27

A Special Browser With Feedback Interface

28

User Topics

  • Terrorism | conflict | Iraq | civilian death
  • Elections
  • Google
  • Bush
  • Kill Bill
  • Women's health
  • Weird stories

29

Evidence Collected

    Evidence category          Examples                                                       Missing rate
    Content of each document   relevance score, readability score, document length            5%-18%
    News source information    in-link count                                                  0%
    Topic info                 topic familiarity                                              27%
    User actions               mouse usage, time on page, scroll bar usage, keyboard usage    0%
    Explicit user feedback     relevant, novel, authoritative, readable, "user like"          0.5%-1.2%

    61 different feature-missing combinations

30

Methods Tried for Handling Missing Data

  • Linear regression
    – LR_Mean: mean substitution
    – LR_Different: building a different model for each missing-data combination
  • GM_complete: graphical modelling approach
    – Complete Gaussian network
    – A single model handles the various missing-data scenarios
      » Equivalent to using many linear regression models (theoretically and empirically)
    – Computationally efficient
      » Closed-form solution for the conditional probability of "user like"
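The closed-form conditional that lets one Gaussian network cover all 61 missing-data combinations can be sketched with numpy (the joint covariance below is a made-up toy, not the fitted model from the study):

```python
import numpy as np

# One joint Gaussian over (f1, f2, f3, like); any missing-data combination is
# handled by conditioning on the observed subset:
#   E[like | x_o] = mu_like + S_lo @ inv(S_oo) @ (x_o - mu_o)
mu = np.array([0.0, 0.0, 0.0, 0.0])
S = np.array([[1.0, 0.3, 0.1, 0.6],
              [0.3, 1.0, 0.2, 0.4],
              [0.1, 0.2, 1.0, 0.2],
              [0.6, 0.4, 0.2, 1.0]])   # toy covariance; "like" is index 3

def predict_like(x, observed):
    o = list(observed)                  # indices of the features actually observed
    S_oo = S[np.ix_(o, o)]
    S_lo = S[3, o]
    return mu[3] + S_lo @ np.linalg.solve(S_oo, x[o] - mu[o])

x = np.array([1.0, -0.5, 0.2, np.nan])       # rating unknown
print(round(predict_like(x, [0, 1, 2]), 3))  # all features observed
print(round(predict_like(x, [0]), 3))        # only f1 observed -> 0.6 * 1.0
```

One covariance matrix thus plays the role of many per-combination regression models: dropping a feature just drops its row and column from the conditioning.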


31

Predict “User Like”: Evidence Ordered by User Effort Involved

Reasonable performance without user feedback on individual document

[Plot: correlation coefficient (0.05-0.5) vs. evidence added in order of user effort (RSS info, + readability score, + topic info, + relevance score, + actions), for LR_simple, LR_different, and GM_BNcomplete.]

32

Understand Domain Using Causal Inference

33

Explore Structures for Prediction Task

[Plot: correlation coefficient (0.1-0.5) vs. evidence (relevance score, + topic info, + readability score, + RSS info, + actions), comparing GM_complete and GM_causal.]

  • Other structures and functions may lead to better performance (needs further investigation)

34

Road Map

  • Introduction
  • How we use BGM for filtering
    – Using an expert's heuristics as a Bayesian prior
    – Exploration and exploitation trade-off using Bayesian active learning
    – Combining multiple forms of evidence using graphical models (beyond relevance)
    – Collaborative adaptive user modeling with explicit & implicit feedback
  • Contributions and future work

35

Social Learning Helps You

[Two panels: when you always learn by yourself vs. when you learn with others.]

36

Collaborative User Modeling Helps Filtering Systems

[Diagram: many satisfied users (☺) when user modeling is collaborative.]


37

Solution: Put a Prior Distribution over User Models.

  • A user model is a random sample from a distribution over the user model space

[Diagram: user i's model drawn from a shared distribution over the user model space.]

38

Hierarchical Gaussian Network

Θ = (μ, σ): profile ‘prior’
w: user profile (user model)
x: document
y: rating

w_u ~ N(μ, σ²)
y | x, w_u ~ N(w_u^T x, κ_u²)

  • Sharing information between users without violating privacy

39

Updating User Profile

Given training data at time t for user u:  D = {(x_i, y_i)}, i = 1 … N_u

Pr(w | D, Θ) ∝ Pr(D | w) Pr(w | Θ)

w_MAP = argmax_w [ log P(w | Θ) + Σ_i log P(y_i | x_i, w) ]
      = argmin_w [ (1/σ²) ||w - μ||² + (1/κ²) Σ_{i=1..N_u} (y_i - w^T x_i)² ]

40
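The argmin in the profile update is ridge regression pulled toward the shared prior mean μ instead of toward zero, and it has a closed form (a numpy sketch; σ², κ², μ, and the data are toy assumptions):

```python
import numpy as np

# w_MAP = argmin_w (1/sigma^2)||w - mu||^2 + (1/kappa^2) sum_i (y_i - w.x_i)^2
# Closed form: w = inv(X'X / kappa^2 + I / sigma^2) @ (X'y / kappa^2 + mu / sigma^2)
def map_profile(X, y, mu, sigma2=1.0, kappa2=1.0):
    d = len(mu)
    A = X.T @ X / kappa2 + np.eye(d) / sigma2
    b = X.T @ y / kappa2 + mu / sigma2
    return np.linalg.solve(A, b)

mu = np.array([1.0, -0.5])             # shared prior mean learned from other users
X0 = np.empty((0, 2))                  # a brand-new user: no rated documents yet
w_new = map_profile(X0, np.empty(0), mu)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))          # a well-established user with 200 ratings
y = X @ np.array([2.0, 0.5]) + 0.1 * rng.normal(size=200)
w_old = map_profile(X, y, mu)

print(np.round(w_new, 3), np.round(w_old, 1))
```

A brand-new user falls back exactly to the shared prior mean, while an established user's profile is dominated by their own ratings, which is the cold-start behavior the hierarchical model is after.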

Experimental Methodology

  • Goal: to see how an established system can accommodate a new user
  • For each user we evaluate four models:
    – “Prior”: hierarchical framework
    – “No Prior”: linear model trained only with this user’s data
    – “Generic”: linear model trained with all data
    – “Baseline”: rates every document with the average rating seen so far for this user
  • Each method adaptively updates the user model for every document the user rated

41

Performance Over Time

  • Zhang’s data
  • Input x: relevance score, readability score, familiarity with the topic, user actions, document length, host speed
  • Performance changes over time

42

Further Analysis

  • The hierarchical model effectively trades off between collaborative information and user-specific information
  • The Gaussian prior is simple and fast
    – User privacy is reasonably protected in this collaborative filtering scenario
    – Other priors?
  • Implicit feedback alone does not seem very useful
    – Although implicit feedback can be correlated with relevance for certain users, the correlation is usually very low
    – Better models?
  • The data is extremely noisy
    – Shifting user behavior
    – Shifting bias in user ratings
    – Be careful with very noisy information: it may hurt performance if not handled properly


43

Summary

  • A unified framework
    – A broader definition of utility based on several criteria
    – Probabilistic reasoning tools for utility optimization
  • Building filtering systems with desirable properties
    – A novel technique to integrate a domain expert’s heuristics into machine learning algorithms using a Bayesian prior
    – A new model-quality measure for the exploitation and exploration trade-off
    – Explored how to go beyond relevance and combine multiple forms of evidence (user study + methodology) using graphical models
    – A collaborative user modeling approach that addresses the cold-start problem without violating privacy

44

Other Research in My Group

  • Novelty and anomaly detection
  • Personalization
  • Text mining for minority languages
  • Integrating heterogeneous data resources for text mining
    – Blogs, news, web pages, email, personal
  • Petabyte-scale retrieval