
Usage Aware Average-Clicks
Kalyan Beemanapalli – University of Minnesota
Ramya Rangarajan – University of Minnesota
Jaideep Srivastava – University of Minnesota
Presenter: Kalyan Beemanapalli
WebKDD 2006 Workshop on Knowledge Discovery on the Web, Aug. 20, 2006, at KDD 2006, Philadelphia, PA, USA
Intel IT Research

Outline
• Introduction
• Related Work
• Background
• Method
• Experiments and Results
• Key Contributions
• Conclusions and Future Work
• Questions

Related Work – Link Analysis
• Applications
  - PageRank
  - HITS
  - Average-Clicks (Matsuo et al.)
• Disadvantage
  - Static: based only on the hyperlink structure

Related Work
• Solution
  - Usage data
• Why Usage Aware Average-Clicks?
  - Average-Clicks is a fairly new algorithm
  - It proposes a new definition of the distance between web pages
  - It measures distance in the user's context
• Ideas from
  - Usage Aware PageRank (Oztekin et al.)
  - Extensions to HITS (Miller et al.)

Average-Clicks
• A measure of distance between web pages
• Definition: an average click is one click among n links
• The probability that a random surfer on page p clicks any one of its links is $\alpha / \mathrm{OutDegree}(p)$, where α = damping factor

Average-Clicks
• Average-click length of a link on page p: $\mathrm{length}(p) = -\log_n \frac{\alpha}{\mathrm{OutDegree}(p)}$, where α = damping factor and n = average number of links on a page
• Distance between pages p and q: the shortest path between the nodes representing the pages in the graph
• A path through a longer chain of links can therefore be shorter than one through fewer links, because links on pages with few out-links are short
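The link-length formula can be sketched directly. This is a minimal illustration, not the authors' code; the defaults α = 0.85 and n = 7 are assumptions chosen for the example, and the function names are hypothetical.

```python
import math

def average_click_length(out_degree: int, alpha: float = 0.85, n: float = 7.0) -> float:
    """Length of one link on page p: -log_n(alpha / OutDegree(p))."""
    if out_degree <= 0:
        raise ValueError("a page without out-links has no link length")
    return -math.log(alpha / out_degree, n)

def path_length(out_degrees: list[int], alpha: float = 0.85, n: float = 7.0) -> float:
    """Length of a click path: sum of the link lengths of the pages clicked from."""
    return sum(average_click_length(d, alpha, n) for d in out_degrees)

# Two clicks from pages with 2 links each vs. one click from a page with 20 links.
print(path_length([2, 2]), path_length([20]))
```

With these defaults the two-click path comes out shorter than the single click, which is the effect described in the slide's last bullet.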

Average-Clicks: Example

Usage Aware Average-Clicks: Usage Graph
[Figure: usage graph over pages P, Q, R, S, T; each node is labeled with the number of occurrences of the page, each edge with the number of co-occurrences of its two pages]
• $C(p, q) = \frac{\text{number of co-occurrences of } p, q}{\text{number of occurrences of } p}$
• Weight of the edge from p to q = C(p, q); the number of occurrences of p is the weight assigned to node p
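The usage weights C(p, q) can be computed from session logs roughly as follows. The slide does not say how a co-occurrence is counted; this sketch assumes it means both pages appear in the same session, counted once per session. The function name and the example sessions are hypothetical.

```python
from collections import Counter
from itertools import permutations

def usage_weights(sessions: list[list[str]]) -> dict[tuple[str, str], float]:
    """C(p, q) = (#sessions containing both p and q) / (#sessions containing p)."""
    occurrences = Counter()     # node weight: number of sessions containing each page
    co_occurrences = Counter()  # (p, q) -> number of sessions containing both pages
    for session in sessions:
        pages = set(session)    # assumption: each page counts once per session
        occurrences.update(pages)
        co_occurrences.update(permutations(pages, 2))  # all ordered pairs of distinct pages
    return {(p, q): c / occurrences[p] for (p, q), c in co_occurrences.items()}

C = usage_weights([["P", "Q", "R"], ["P", "Q"], ["P", "S"], ["Q", "T"]])
```

Because the denominator is the count for p alone, C is asymmetric: here C(R, P) = 1 while C(P, R) = 1/3.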

Usage Aware Average-Clicks: Link Graph
[Figure: link graph over pages P, Q, R, S, T]
• $D(i, j) = 1/\mathrm{OutDegree}(i)$ if page i contains a link to page j; $\infty$ otherwise
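A small sketch of the link matrix D defined above. The link structure used here is hypothetical, since the edges in the slide's figure are not recoverable, and the function name is illustrative.

```python
import math

def link_matrix(pages: list[str], links: dict[str, list[str]]) -> list[list[float]]:
    """D(i, j) = 1 / OutDegree(i) if page i links to page j, infinity otherwise."""
    index = {page: k for k, page in enumerate(pages)}
    D = [[math.inf] * len(pages) for _ in pages]
    for i, targets in links.items():
        targets = set(targets)  # count a repeated link only once
        for j in targets:
            D[index[i]][index[j]] = 1.0 / len(targets)
    return D

# Hypothetical links between the five pages named in the figure.
pages = ["P", "Q", "R", "S", "T"]
D = link_matrix(pages, {"P": ["Q", "R"], "Q": ["S"], "R": ["S", "T", "P"]})
```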

Usage Aware Average-Clicks
• We now have
  - $C(p, q) = \frac{\text{number of co-occurrences of } p, q}{\text{number of occurrences of } p}$
  - $D(p, q) = 1/\mathrm{OutDegree}(p)$ if page p contains a link to page q; $\infty$ otherwise
• We combine the link matrix and the usage matrix into a new definition of the distance between two pages:
  $\mathrm{Distance}(p, q) = \bigl(1 - C(p, q)\bigr) \cdot \left(-\log_n \frac{\alpha}{\mathrm{OutDegree}(p)}\right)$
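The combined distance can be sketched as an edge-length function over existing hyperlinks. This is an illustration under stated assumptions: α = 0.85 and n = 7 are example defaults, a linked pair never seen together in the usage data is given C(p, q) = 0, and unlinked pairs are left out (infinite distance, as in the link matrix). All names are hypothetical.

```python
import math

def usage_aware_distance(c_pq: float, out_degree_p: int,
                         alpha: float = 0.85, n: float = 7.0) -> float:
    """Distance(p, q) = (1 - C(p, q)) * (-log_n(alpha / OutDegree(p))) for a link p -> q."""
    return (1.0 - c_pq) * -math.log(alpha / out_degree_p, n)

def usage_aware_edges(links: dict[str, list[str]],
                      usage: dict[tuple[str, str], float],
                      alpha: float = 0.85, n: float = 7.0) -> dict[tuple[str, str], float]:
    """Edge lengths for every hyperlink; pairs with no link are absent (infinite)."""
    edges = {}
    for p, targets in links.items():
        targets = set(targets)
        for q in targets:
            c = usage.get((p, q), 0.0)  # assumption: never co-visited -> C(p, q) = 0
            edges[(p, q)] = usage_aware_distance(c, len(targets), alpha, n)
    return edges

edges = usage_aware_edges({"P": ["Q", "R"]}, {("P", "Q"): 0.5})
```

The factor (1 - C(p, q)) only shortens links: C = 1 makes a link cost nothing, while C = 0 falls back to the plain Average-Clicks length.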

Usage Aware Average-Clicks
• The shortest distance between every pair of nodes requires an all-pairs shortest path algorithm
• All-pairs shortest path algorithm used: Floyd-Warshall
• Implementation issue: poor scalability (Floyd-Warshall runs in O(N³) time and keeps an N × N distance matrix for N pages)
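For reference, the textbook Floyd-Warshall algorithm over a dense matrix looks like this; the triple loop over k, i, j is the O(N³) cost noted above. It is a generic sketch, not the authors' implementation: edges are assumed non-negative (true for the distances defined here) and `math.inf` marks a missing link.

```python
import math

def floyd_warshall(weights: list[list[float]]) -> list[list[float]]:
    """All-pairs shortest path lengths; weights[i][j] = math.inf means no edge i -> j."""
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input matrix is not modified
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0.0)
    for k in range(n):                  # allow paths through intermediate node k
        dk = dist[k]
        for i in range(n):
            dik = dist[i][k]
            if dik == math.inf:         # no path i -> k, nothing to improve
                continue
            di = dist[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return dist
```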

Solution: Data Structure for Floyd-Warshall
[Figure: a vector holding the heads of linked lists, one per page (the list at index 0 holds the set of links for page 0). Template for each node: Page ID, Avg Click Score, Usage Score, Usage Aware Avg Click Score]
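The layout in the figure (a vector of linked-list heads, with one node per out-link) can be sketched in Python. Python has no built-in singly linked list, so this uses a small dataclass; the field names mirror the node template but are hypothetical, and α = 0.85, n = 7 are assumed example values. The structure stores only links that exist, so edge storage grows with the number of hyperlinks rather than with N².

```python
import math
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class LinkNode:
    """One list node following the slide's template (field names are hypothetical)."""
    page_id: int                       # target page of the link
    avg_click_score: float             # plain Average-Clicks length of the link
    usage_score: float                 # C(p, q) from the usage graph
    usage_aware_score: float           # (1 - C(p, q)) * avg_click_score
    next: Optional["LinkNode"] = None  # next out-link of the same page

def build_heads(num_pages: int, links: dict[int, list[int]],
                usage: dict[tuple[int, int], float],
                alpha: float = 0.85, n: float = 7.0) -> list[Optional[LinkNode]]:
    """Vector holding one linked-list head per page; heads[p] lists the out-links of p."""
    heads: list[Optional[LinkNode]] = [None] * num_pages
    for p, targets in links.items():
        targets = sorted(set(targets))
        length = -math.log(alpha / len(targets), n)
        for q in reversed(targets):  # prepend, so each list ends up in ascending page order
            c = usage.get((p, q), 0.0)
            heads[p] = LinkNode(q, length, c, (1.0 - c) * length, heads[p])
    return heads

def out_links(heads: list[Optional[LinkNode]], p: int) -> Iterator[LinkNode]:
    """Walk the linked list of page p."""
    node = heads[p]
    while node is not None:
        yield node
        node = node.next
```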

Experimental Results
• Experiments conducted on www.cs.umn.edu
• Usage data collected in April 2006
• Data set reduced to 100,000 sessions
• Noise removed
• Link graph built using our crawler

Example Distances

Evaluation Methodology
• Domain expert's view
  - Questionnaires
• User's view
  - Questionnaires
• Automated verification
  - Our method
  - Predictive power

Evaluation Methodology
• Incorporated into a recommender system
• Idea: pages that are close to each other are more similar than pages that are farther apart
• Performance compared with the ‘2, -1’ model
• Tested on www.cs.umn.edu

The Recommender System Architecture
[Figure: Offline components: web logs, website, session identification, sessions, Usage Aware Average-Clicks, session alignment, session similarity graph generation, graph partitioning, session clusters, clickstream trees. Online components: a web client sends a webpage request to the web server; the recommendation system gets recommendations from the clickstream trees; HTML + recommendations are returned to the client.]

Evaluation Measures
• Hit Ratio (HR): percentage of hits. If a recommended page is actually requested later in the session, we declare a hit.
• Click Reduction (CR): for a test session (p1, p2, …, pi, …, pj, …, pn), if pj is recommended at page pi and pj is subsequently accessed in the session, then the click reduction due to this recommendation is
  $\text{Click reduction} = \frac{j - i}{i}$
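Both measures can be computed per test session as sketched below. The slide defines a hit and the per-recommendation click reduction but not how they are aggregated; this sketch assumes the hit ratio is hits divided by recommendations made, and takes j as the first later request of the recommended page. The `recommend` callback is a hypothetical stand-in for the clickstream-tree recommender.

```python
from typing import Callable, Sequence

def evaluate_session(session: Sequence[str],
                     recommend: Callable[[Sequence[str], int], list[str]],
                     k: int = 3) -> tuple[int, int, list[float]]:
    """Return (hits, recommendations made, click reductions) for one test session."""
    hits, made, reductions = 0, 0, []
    for i in range(1, len(session)):           # i = 1-based position of current page p_i
        later = list(session[i:])              # pages p_{i+1}, ..., p_n
        for page in recommend(session[:i], k)[:k]:
            made += 1
            if page in later:                  # requested later in the session: a hit
                hits += 1
                j = i + 1 + later.index(page)  # 1-based position of first later request
                reductions.append((j - i) / i)
    return hits, made, reductions

def hit_ratio(hits: int, made: int) -> float:
    """Hit ratio as a percentage of the recommendations made (assumed aggregation)."""
    return 100.0 * hits / made if made else 0.0
```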

Experimental Set-up
• 1000 training sessions
• 3, 5 and 10 recommendations
• 10, 15 and 20 clickstream clusters
• Different testing sessions
• Experiment repeated 5 times using different training sets
• Results compared against the ‘2, -1’ model
• T-tests performed
• Same procedure for 3000 training sessions

Results


% Path Reduction


Conclusion
• Incorporated usage data into the Average-Clicks algorithm
• Proposed a distance model using usage data and the link graph
• Used this method to calculate the similarity between pages in an intranet domain
• Showed that combining the usage graph and the link graph provides better recommendations

Future Work
• Validate the algorithm using other testing methods, such as
  - Domain expert testing
  - User's perspective
• Compare the algorithm against other usage-based link analysis algorithms
• Compare the quality of recommendations with those obtained using other kinds of domain information

Questions
