Intel IT Research
1
WebKDD 2006 Workshop on Knowledge Discovery on the Web, Aug. 20, 2006, at KDD 2006, Philadelphia, PA, USA
Incorporating Concept Hierarchies Into Usage Mining Based Recommendations
Amit Bose - University of Minnesota
Kalyan Beemanapalli - University of Minnesota
Jaideep Srivastava - University of Minnesota
Sigal Sahar - Intel Corporation
Presenter:
Sequence Alignment
Example:
Optimal alignment of the
Protein Sequence Alignment is
A protein is a sequence of
One can think of a protein as a
The problem of pair-wise
Use BLOSUM62 (Henikoff and
BLOSUM62 Matrix
A user session is a sequence
Any two user sessions can be
Challenge is to design an
Several ways possible to
Using Concept hierarchy of
Using Link structure of the
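The slides treat a user session as a sequence of pages that can be aligned against another session. As a rough illustration only, the sketch below scores a global (Needleman-Wunsch style) alignment of two sessions using a page-similarity function. The slides do not say which alignment variant the authors use, so this is an assumption; the name align_score, the gap penalty, the toy similarity toy_sim, and the page names are all hypothetical.

```python
from typing import Callable, Sequence


def align_score(
    a: Sequence[str],
    b: Sequence[str],
    sim: Callable[[str, str], float],
    gap: float = -1.0,  # hypothetical gap penalty, not taken from the slides
) -> float:
    """Return the best global alignment score of sessions a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty session costs one gap per page.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sim(a[i - 1], b[j - 1]),  # match pages
                score[i - 1][j] + gap,  # gap in session b
                score[i][j - 1] + gap,  # gap in session a
            )
    return score[-1][-1]


def toy_sim(p: str, q: str) -> float:
    """Hypothetical similarity: reward identical pages, penalise others."""
    return 2.0 if p == q else -1.0


# Hypothetical sessions; page names are illustrative only.
s1 = ["home", "registration", "grading"]
s2 = ["home", "grading"]
print(align_score(s1, s2, toy_sim))
```

In a setting like the one the slides describe, toy_sim would be replaced by a similarity derived from domain knowledge, such as the concept hierarchy or the link structure of the site.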
Figure: Model for using Domain Knowledge. Components shown: Web Logs, Concept Hierarchy, Site Connectivity, Page Similarity Based on Concept Hierarchy, Page Similarity Based on Site Topology, Clusters of User Sessions, Online Phase of the Recommendation Engine.
Important ingredient in sequence alignment
Two kinds of Similarity measures:
Defining similarity: two issues
What is the basis of similarity
How to calculate strength of this similarity
Meaning of session alignment – find the best matching of user
We use Domain knowledge to define similarity between pages and
Root concept: Student Services
Child concepts: Returning after absence, Advising, Registration, How-to Guides, Career Services
Lower-level concepts: Credit Requirements, Pre-registration, Grading Options
Page: 13-creditpolicy.htm
. . .
Figure 2. Example concept hierarchy for a university student-services website
where p(n) is the probability assigned to node n
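The formula this line refers to did not survive on the slide. A minimal sketch, assuming the standard information-content definition I(n) = -log p(n), which is consistent with the root node having I = 0 when p = 1. The natural logarithm is an assumption (the slides do not state the base), and the function name information_content is hypothetical.

```python
import math


def information_content(p: float) -> float:
    """Information content of a node with probability p (assumed: -ln p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    # log(1/p) equals -log(p) and avoids returning -0.0 for p == 1.
    return math.log(1.0 / p)


# The root covers every page, so p = 1 and its information content is 0.
print(information_content(1.0))
# A rarer (more specific) concept carries more information.
print(information_content(0.5) < information_content(0.1))
```

Under this definition, nodes deeper in the hierarchy have smaller probabilities and therefore larger information content.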
where A is the common ancestor of pages belonging to concepts n1 and n2
Student Services (I = 0)
Returning after absence (I = …)
Advising (I = …)
Registration (I = 2.17891)
How-to Guides (I = 4.5362)
Career Services (I = …)
Credit Requirements (I = 4.9578)
Pre-registration (I = 5.29699)
Grading Options (I = …)
13-creditpolicy.htm
. . .
Figure 3. Annotated concept hierarchy for student-services example
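The similarity formula itself is not legible on the slide. As an illustration only, the sketch below assumes a Lin-style measure, Sim(n1, n2) = 2 I(A) / (I(n1) + I(n2)), which may differ from the authors' exact formula. It uses information-content values shown in Figure 3 and further assumes that Registration is the common ancestor of Credit Requirements and Pre-registration. The function name concept_similarity is hypothetical.

```python
def concept_similarity(i_n1: float, i_n2: float, i_ancestor: float) -> float:
    """Lin-style similarity from information contents (an assumed formula)."""
    denom = i_n1 + i_n2
    if denom == 0.0:
        # Both nodes are the root (I = 0); treat them as identical.
        return 1.0
    return 2.0 * i_ancestor / denom


# Information-content values as shown in Figure 3.
I_REGISTRATION = 2.17891        # assumed common ancestor A
I_CREDIT_REQUIREMENTS = 4.9578
I_PRE_REGISTRATION = 5.29699

sim = concept_similarity(
    I_CREDIT_REQUIREMENTS, I_PRE_REGISTRATION, I_REGISTRATION
)
print(round(sim, 4))
```

With this form, two concepts whose only common ancestor is the root (I = 0) get similarity 0, and a concept compared with itself gets similarity 1.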
where I_M and I_MAX are the median and maximum values of the information content over all concept nodes in the hierarchy
Figure 1. The Recommender System (Session Alignment Recommendation System), divided into Offline and Online phases. Components shown: Website, Web Logs, Session Identification, Sessions, Hierarchy Generation, Concept Hierarchy, Session Similarity, Graph Partitioning, Session Clusters, Clickstream Trees, Web Client, Webpage request, Web Server, Get Recommendations, Recommendations, HTML + Recommendations.
Predictive Ability (PA): Percentage of pages in the test sessions for which the model was able to make recommendations. This is a measure of how useful the model is.
Prediction Strength (PS): Average number of recommendations made for a page.
Hit Ratio (HR): Percentage of hits. If a recommended page is actually requested later in the session, we declare a hit. The hit ratio is thus a measure of how good the model is at making recommendations.
Click Reduction (CR): Average percentage click reduction. For a test session (p1, p2, …, pi, …, pj, …, pn), if pj is recommended at page pi, and pj is subsequently accessed in the session, then the click reduction due to this recommendation is (j - i)/i.
Average Recommendation Rank (AR): Average rank of a hit. If a recommendation is a hit, then the rank of the recommendation is the rank of that hit. The lower the rank of a hit, the better the quality of the recommendation.
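The five metrics above can be sketched for a single test session as follows. This is an illustrative reading, not the authors' evaluation code: the names evaluate_session and toy_recommend are hypothetical, only the best-ranked hit per page is counted, and HR is taken relative to pages that received recommendations. The slides do not spell out these aggregation details, so each of them is an assumption.

```python
from typing import Callable, Dict, List, Sequence


def evaluate_session(
    session: Sequence[str],
    recommend: Callable[[Sequence[str]], List[str]],
) -> Dict[str, float]:
    """Compute PA, PS, HR, CR and AR for one session (assumed aggregation)."""
    covered = 0        # pages for which recommendations were made
    total_recs = 0     # recommendations summed over covered pages
    ranks: List[int] = []
    reductions: List[float] = []
    for i in range(1, len(session) + 1):  # i is the 1-based position of pi
        recs = recommend(session[:i])
        if not recs:
            continue
        covered += 1
        total_recs += len(recs)
        later = list(session[i:])  # pages requested after pi
        for rank, page in enumerate(recs, start=1):
            if page in later:
                j = i + later.index(page) + 1  # 1-based position of pj
                ranks.append(rank)
                reductions.append((j - i) / i)  # click reduction as defined
                break  # assumption: count only the best-ranked hit
    hits = len(ranks)
    return {
        "PA": 100.0 * covered / len(session) if session else 0.0,
        "PS": total_recs / covered if covered else 0.0,
        "HR": 100.0 * hits / covered if covered else 0.0,
        "CR": 100.0 * sum(reductions) / hits if hits else 0.0,
        "AR": sum(ranks) / hits if hits else 0.0,
    }


def toy_recommend(prefix: Sequence[str]) -> List[str]:
    """Hypothetical recommender keyed on the most recent page."""
    table = {"a": ["x", "c"], "b": ["d"]}
    return table.get(prefix[-1], [])


metrics = evaluate_session(["a", "b", "c", "d"], toy_recommend)
print(metrics)
```

Averaging these per-session values over all test sessions is one way to obtain model-level figures; other aggregation choices are possible.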
Number of Recommendations Made = 10
Model   PA
RSM     93.42
SSM     97.50
CSM     97.27
PS      HR      AR      CR
9.82    45.22   6.23    39.89
9.81    42.17   6.28    27.38
9.80    54.08   6.38    30.68
Number of Recommendations Made = 5
Model   PA
RSM     93.42
CSM     97.27
SSM     97.50
PS      HR      AR      CR
4.96    42.56   3.41    38.87
4.96    31.23   3.59    27.56
4.96    35.13   3.12    33.14
(In both tables, the pairing of the PS, HR, AR and CR rows with RSM, SSM and CSM is not recoverable from the extracted slide; those rows are listed in extraction order.)