SLIDE 1

WebKDD 2006 Workshop on Knowledge Discovery on the Web, Aug. 20, 2006, at KDD 2006, Philadelphia, PA, USA

Intel IT Research 1

Usage Aware Average-Clicks

Kalyan Beemanapalli – University of Minnesota
Ramya Rangarajan – University of Minnesota
Jaideep Srivastava – University of Minnesota
Presenter: Kalyan Beemanapalli

SLIDE 2

Outline

  • Introduction
  • Related Work
  • Background
  • Method
  • Experiments and Results
  • Key Contributions
  • Conclusions and Future Work
  • Questions

SLIDE 3

Related Work – Link Analysis

Applications
  • PageRank
  • HITS
  • Average-Clicks (Matsuo et al.)

Disadvantage
  • Static

SLIDE 4

Related work

Solution: Usage Data

Why Usage Aware Average-Clicks?

Average-Clicks
  • Fairly new algorithm
  • Proposes a new definition of distance between web pages
  • Measures distance in the user’s context

Ideas from
  • Usage Aware PageRank (Oztekin et al.)
  • Extensions to HITS (Miller et al.)

SLIDE 5

Average-Clicks

  • A measure of distance between web pages
  • Definition: an average click is one click among n links
  • The probability that a random surfer on page p clicks any one of its links is $\alpha / \mathrm{OutDegree}(p)$, where α = damping factor

SLIDE 6

Average Clicks

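Average-click length of the links on page p:

$$\text{Length}(p) = -\log_n \frac{\alpha}{\mathrm{OutDegree}(p)}$$

where α = damping factor and n = average number of links on a page.

Distance between pages p and q: the shortest path between the nodes representing the pages in the graph.

A path through a longer chain of links can be shorter than one through fewer links, because a link on a page with few out-links has a smaller average-click length than a link on a page with many out-links.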
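A minimal Python sketch of the average-click length, assuming illustrative values α = 0.85 and n = 10 (both hypothetical; the slides give no values), shows why a two-link path through pages with few out-links can be shorter than a single link from a page with many out-links:

```python
import math

def average_click_length(out_degree, alpha, n):
    """Length of one link on a page with `out_degree` links: -log_n(alpha / out_degree)."""
    return -math.log(alpha / out_degree, n)

def path_length(out_degrees, alpha, n):
    """Sum of link lengths along a path; out_degrees lists the source page of each hop."""
    return sum(average_click_length(d, alpha, n) for d in out_degrees)

ALPHA, N = 0.85, 10  # hypothetical damping factor and average links per page

two_hops = path_length([2, 2], ALPHA, N)  # two links, each from a page with 2 out-links
one_hop = path_length([100], ALPHA, N)    # one link from a page with 100 out-links
print(f"two hops: {two_hops:.3f}, one hop: {one_hop:.3f}")
```

Because the length grows with the logarithm of the source page's out-degree, hops taken from sparsely linked pages stay cheap even when there are more of them.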

SLIDE 7

Average Clicks - Example

SLIDE 8

Usage Aware Average-Clicks

[Figure: usage graph over pages P, Q, R, S, T]

  • Node weight: no. of occurrences of each page
  • Edge weight: no. of co-occurrences of pages

$$C(p,q) = \frac{\text{Number of co-occurrences of } p, q}{\text{Number of occurrences of } p} = \frac{\text{Weight of the edge from } p \text{ to } q}{\text{Weight assigned to node } p}$$
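A small Python sketch of how C(p, q) could be computed from sessionized logs. The session format (a list of page IDs per session) and the rule that p and q "co-occur" when both appear in the same session are assumptions for illustration; the slide does not say exactly how co-occurrences are counted:

```python
from collections import Counter
from itertools import permutations

def usage_weights(sessions):
    """Count page occurrences (node weights) and co-occurrences (edge weights).

    Assumption: each page is counted at most once per session, and p, q
    co-occur when both appear in the same session.
    """
    node = Counter()
    edge = Counter()
    for session in sessions:
        pages = set(session)                  # count each page once per session
        node.update(pages)
        edge.update(permutations(pages, 2))   # ordered pairs (p, q) with p != q
    return node, edge

def usage_coefficient(node, edge, p, q):
    """C(p, q) = weight of edge p -> q / weight of node p (0.0 if p never occurs)."""
    if node[p] == 0:
        return 0.0
    return edge[(p, q)] / node[p]

sessions = [["P", "Q", "R"], ["P", "Q"], ["P", "S"], ["Q", "T"]]
node, edge = usage_weights(sessions)
print(usage_coefficient(node, edge, "P", "Q"))
```

Normalizing by the occurrences of p keeps C(p, q) between 0 and 1 and makes it asymmetric: here C(R, P) is 1 because every session containing R also contains P, while C(P, R) is only 1/3.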

SLIDE 9

Usage Aware Average-Clicks

[Figure: link graph over pages P, Q, R, S, T]

$$D(i,j) = \begin{cases} 1/\mathrm{OutDegree}(\text{page } i) & \text{if there is a link to page } j \text{ on page } i \\ \infty & \text{otherwise} \end{cases}$$
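A sketch of building the link matrix D in Python, assuming the crawled link graph is available as a dictionary from each page to the pages it links to (a hypothetical input format). Counting the out-degree over distinct targets is also an assumption:

```python
import math

def link_matrix(links):
    """Build D(i, j) from a {page: [linked pages]} mapping.

    D(i, j) = 1 / OutDegree(i) if page i links to page j, else infinity.
    Assumption: OutDegree(i) counts distinct link targets.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    D = {i: {j: math.inf for j in pages} for i in pages}
    for i, targets in links.items():
        distinct = set(targets)
        for j in distinct:
            D[i][j] = 1.0 / len(distinct)
    return D

links = {"P": ["Q", "R"], "Q": ["R"], "R": []}
D = link_matrix(links)
print(D["P"])
```

A dense dictionary of dictionaries is used only for readability; for a real site graph most entries are infinite, which is why a sparse representation matters for the shortest-path step.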

SLIDE 10

Usage Aware Average-Clicks

We now have the usage coefficient C(p, q) and the link distance D(p, q):

$$C(p,q) = \frac{\text{Number of co-occurrences of } p, q}{\text{Number of occurrences of } p}$$

$$D(p,q) = \begin{cases} 1/\mathrm{OutDegree}(\text{page } p) & \text{if there is a link to page } q \text{ on page } p \\ \infty & \text{otherwise} \end{cases}$$

We combine the Link Matrix and the Usage Matrix to define a new distance between two pages as follows:

$$\mathrm{Distance}(p,q) = \bigl(1 - C(p,q)\bigr) \cdot \left(-\log_n \frac{\alpha}{\mathrm{OutDegree}(p)}\right)$$
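Putting the two matrices together, a minimal Python sketch of the usage-aware length of a single link (p, q). α and n are caller-supplied, and treating unlinked pairs as infinitely far is an assumption carried over from D(p, q):

```python
import math

def usage_aware_length(out_degree_p, c_pq, alpha, n, has_link=True):
    """Distance(p, q) = (1 - C(p, q)) * (-log_n(alpha / OutDegree(p))).

    Assumption: pairs with no direct link keep the infinite distance
    that D(p, q) assigns them.
    """
    if not has_link:
        return math.inf
    return (1.0 - c_pq) * -math.log(alpha / out_degree_p, n)

# Hypothetical values: alpha = 0.85, n = 10, page p has 4 out-links.
plain = usage_aware_length(4, 0.0, 0.85, 10)      # no usage evidence
frequent = usage_aware_length(4, 0.5, 0.85, 10)   # p and q co-occur in half of p's sessions
print(plain, frequent)
```

With C(p, q) = 0 the formula reduces to the plain average-click length; as C(p, q) approaches 1 the link shrinks toward zero length, so frequently co-visited pages are pulled closer together.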

SLIDE 11

Usage Aware Average-Clicks

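  • Shortest distance between pairs of nodes: computed with an all-pairs shortest path algorithm
  • All-pairs shortest path algorithm used: Floyd-Warshall’s algorithm
  • Implementation issue: poor scalability (Floyd-Warshall takes O(V³) time and, with a dense distance matrix, O(V²) memory for V pages)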
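For reference, a textbook Floyd-Warshall sketch in Python over a dense matrix of link lengths (math.inf where there is no link). This is a generic implementation, not the authors’ code; the triple loop makes the cubic running time explicit:

```python
import math

def floyd_warshall(dist):
    """All-pairs shortest paths on an n x n matrix of non-negative edge lengths.

    dist[i][j] is the length of edge i -> j (math.inf if absent).
    Returns a new matrix; O(n^3) time and O(n^2) memory.
    """
    n = len(dist)
    d = [row[:] for row in dist]          # copy so the input is not modified
    for i in range(n):
        d[i][i] = min(d[i][i], 0.0)       # every page is at distance 0 from itself
    for k in range(n):                    # allow paths that pass through node k
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            if dik == math.inf:           # no path i -> k, nothing to relax
                continue
            di = d[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return d

INF = math.inf
g = [[0, 1, 5],
     [INF, 0, 1],
     [INF, INF, 0]]
sp = floyd_warshall(g)
print(sp[0][2])
```

The direct link 0 → 2 (length 5) is replaced by the two-hop path through node 1 (length 2), the same effect the Average-Clicks example relies on.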

SLIDE 12

Solution: Data Structure for Floyd Warshall

[Figure: a vector holding the heads of linked lists, one list per page (pages 0, 1, 2 shown), each list holding the set of links for that page. Template for each node: Page ID, Avg Click Score, Usage Score, Usage Aware Avg Click Score]
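A rough Python sketch of the data structure described on this slide: a vector of list heads, one per page, where each node follows the template above. The field names, the input tuple format, and computing the usage-aware score as (1 − usage score) × avg click score are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

@dataclass
class LinkNode:
    """One entry in a page's link list (hypothetical field names)."""
    page_id: int                      # target page of the link
    avg_click_score: float            # Average-Clicks length of the link
    usage_score: float                # usage coefficient C(p, q)
    usage_aware_score: float          # (1 - usage_score) * avg_click_score
    next: Optional["LinkNode"] = None

def build_heads(num_pages: int, links) -> List[Optional[LinkNode]]:
    """links: iterable of (source, target, avg_click_score, usage_score)."""
    heads: List[Optional[LinkNode]] = [None] * num_pages  # vector of list heads
    for src, dst, avg_click, usage in links:
        # Prepend to the source page's list: O(1) per link.
        heads[src] = LinkNode(dst, avg_click, usage,
                              (1.0 - usage) * avg_click, heads[src])
    return heads

def iter_links(heads, page) -> Iterator[LinkNode]:
    """Walk the linked list of links for one page."""
    node = heads[page]
    while node is not None:
        yield node
        node = node.next

heads = build_heads(3, [(0, 1, 0.4, 0.5), (0, 2, 0.4, 0.0), (1, 2, 1.0, 0.25)])
print([n.page_id for n in iter_links(heads, 0)])
```

Storing only existing links keeps memory proportional to the number of links rather than to the square of the number of pages.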

SLIDE 13

Experimental Results

  • Experiments conducted on www.cs.umn.edu
  • Usage data collected in April 2006
  • Data set reduced to 100,000 sessions
  • Noise removed
  • Link graph built using our crawler

SLIDE 14

Example Distances

SLIDE 15

Evaluation Methodology

Domain Expert’s View
  • Questionnaires

User’s View
  • Questionnaires
  • Automated verification

Our Method
  • Predicting power

SLIDE 16

Evaluation Methodology

  • Incorporated into a recommender system
  • Idea: pages that are close to each other are more similar than pages that are farther apart
  • Performance compared with the ‘2, -1’ model
  • Tested on www.cs.umn.edu

SLIDE 17

The Recommender System Architecture

[Figure: recommender system architecture split into Offline and Online parts. Components: Website, Web Logs, Session Identification, Sessions, Usage Aware Average-Clicks Generation, Usage Aware Average-Clicks Hierarchy, Session Similarity, Session Alignment, Graph Partitioning, Session Clusters, Clickstream Trees, Recommendation System, Web Server, Web Client. Online flow: the Web Client sends a webpage request to the Web Server, which calls Get Recommendations on the Recommendation System and returns HTML + Recommendations.]

SLIDE 18

Evaluation Measures

Hit Ratio (HR): percentage of hits. If a recommended page is actually requested later in the session, we declare a hit.

Click Reduction (CR): for a test session $(p_1, p_2, \ldots, p_i, \ldots, p_j, \ldots, p_n)$, if $p_j$ is recommended at page $p_i$ and $p_j$ is subsequently accessed in the session, then the click reduction due to this recommendation is

$$\text{Click reduction} = \frac{j - i}{i}$$
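Both measures can be sketched for a single test session in Python. The input format and two interpretive choices are assumptions, not from the slides: HR is taken as hits divided by the number of recommendations made, and j is the first position after i at which the recommended page is requested:

```python
def evaluate_session(session, recommendations):
    """Hit ratio (%) and per-hit click reductions for one test session.

    session: list of pages (p_1, ..., p_n); positions below are 1-based.
    recommendations: dict {i: [pages recommended while the user is at p_i]}.
    Assumptions: HR = hits / total recommendations; for a hit, j is the
    first position after i where the recommended page is requested.
    """
    hits, total, reductions = 0, 0, []
    for i, recommended in recommendations.items():
        later = session[i:]                      # pages p_{i+1}, ..., p_n
        for page in recommended:
            total += 1
            if page in later:
                hits += 1
                j = i + 1 + later.index(page)    # 1-based position of p_j
                reductions.append((j - i) / i)   # click reduction (j - i) / i
    hit_ratio = 100.0 * hits / total if total else 0.0
    return hit_ratio, reductions

session = ["A", "B", "C", "D", "E"]
recs = {1: ["D", "X"], 2: ["C"]}
hr, cr = evaluate_session(session, recs)
print(hr, cr)
```

In this toy session, recommending D at p_1 saves three clicks relative to position 1, while recommending C at p_2 saves one click relative to position 2, giving reductions of 3.0 and 0.5.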

SLIDE 19

Experimental Set-up

  • 1000 training sessions
  • 3, 5 and 10 recommendations
  • 10, 15 and 20 clickstream clusters
  • Different testing sessions
  • Experiment repeated 5 times, each time with a different training set
  • Results compared against the ‘2, -1’ model
  • T-tests performed
  • Same procedure repeated for 3000 training sessions

SLIDE 20

Results

SLIDE 21

SLIDE 22

% Path Reduction

SLIDE 23

SLIDE 24

Conclusion

  • Incorporated usage data into the Average-Clicks algorithm
  • Proposed a distance model using usage data and the link graph
  • Used this method to calculate the similarity between pages in an intranet domain
  • Showed that combining the usage graph and the link graph provides better recommendations

SLIDE 25

Future Work

  • Validate the algorithm using various testing methods, such as domain expert testing and testing from the user’s perspective
  • Compare the algorithm against other usage-based link analysis algorithms
  • Compare the quality of recommendations with those obtained by using other kinds of domain information

SLIDE 26

Questions