Visualization of Navigation Patterns on a Web Site



  1. Visualization of Navigation Patterns on a Web Site using Model Based Clustering
     by I. Cadez, D. Heckerman, C. Meek, P. Smyth, S. White
     Proceedings of KDD-2000
     Chris Williams, School of Informatics, University of Edinburgh

     Overview
     • Aim: cluster sequences of user navigation patterns, so as to understand users of a website (exploratory data analysis)
     • The data
     • The output
     • The model: mixtures of Markov models
     • Fitting the model
     • Application to msnbc.com
     • Summary

     The data
     • Server log files have been converted into a set of sequences, one sequence for each user session
     • Each sequence is an ordered list of discrete symbols
     • Each symbol represents one of several possible categories of web pages requested by the user
     • Example sequences
       – frontpage news travel travel
       – news news news news news weather
       – news health health business business business

     The output
     • WebCANVAS tool
     • Overview screen giving all sequences in each cluster (scrollable)
     • “Drill down” into a cluster by obtaining
       – marginal distribution for each cluster
       – distribution over first event
       – transition probabilities p(i, j)
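The three "drill down" quantities listed above can be illustrated with a small sketch. It assumes, purely for illustration, that the three example sessions from the slide form a single cluster, and it uses plain relative frequencies with no smoothing. The function names `summarize` and `normalize` are hypothetical and are not part of WebCANVAS.

```python
from collections import Counter, defaultdict

# The three example sessions from the slide, as lists of page categories.
# Treating them as one cluster is an assumption made for this sketch.
sessions = [
    "frontpage news travel travel".split(),
    "news news news news news weather".split(),
    "news health health business business business".split(),
]

def normalize(counts):
    """Turn a Counter of raw counts into a dict of relative frequencies."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}

def summarize(seqs):
    """Return (marginal, first-event distribution, transition probabilities)."""
    # Marginal: how often each category appears anywhere in the cluster.
    marginal = Counter(sym for seq in seqs for sym in seq)
    # First event: how often each category starts a session.
    first = Counter(seq[0] for seq in seqs)
    # Transitions: counts of (previous, current) pairs, normalized per row.
    pair_counts = defaultdict(Counter)
    for seq in seqs:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[prev][cur] += 1
    transitions = {prev: normalize(row) for prev, row in pair_counts.items()}
    return normalize(marginal), normalize(first), transitions

marginal, first, transitions = summarize(sessions)
```

Here `transitions[i][j]` is read as the estimated probability of moving from category `i` to category `j`, which is one plausible interpretation of p(i, j) on the slide.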

  2. The model
     • Mixture of Markov models:
       $p(x \mid \theta) = \sum_{k=1}^{K} \pi_k \, p(x \mid \theta_k)$
       $p(x \mid \theta_k) = p(x_1 \mid \theta^I_k) \prod_{i=2}^{L} p(x_i \mid x_{i-1}, \theta^T_k)$
     • $\theta^I_k$ is the probability of the initial symbol in the sequence (multinomial)
     • $\theta^T_k$ is the transition probability from $x_{i-1}$ to $x_i$; each row is a multinomial
     • Can also use a zeroth-order Markov model (unigram model): $p(x \mid \theta_k) = \prod_{i=1}^{L} p(x_i \mid \theta^U_k)$

     Fitting the model
     • Use EM (penalized maximum likelihood)
     • Initialize the $\pi$'s all equal
     • Initialize the $\theta$'s by fitting a single Markov model, then perturbing these parameters in each component
     • Do 20 restarts for each K, choose the model with highest posterior probability
     • Choose K using log likelihood of hold-out data

     A small problem, and a solution
     • Two or more clusters can be encoded by a single Markov model
     • Example: start at a then choose between b and c, or start at d then choose between e and f
     • This problem occurred frequently
     • Solved by allowing only one non-zero probability start state

     Application to msnbc.com
     • 100,023 training sequences, 98,687 validation sequences
     • Found that EM scaled linearly with N (number of sequences) and K
     • Best first-order model has 40 components
     • Chose constrained model with 100 components (of course the constrained model needs more components)
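The mixture model and the EM fitting steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: symbols are assumed to be integer-coded as 0..M-1, the penalty is stood in for by a simple additive smoothing constant `alpha` (the slides do not give the actual prior), the perturbation scheme is arbitrary, and the single-start-state constraint is not included. All function names are hypothetical.

```python
import math
import random

def normalize(row):
    """Scale a list of positive numbers so that it sums to one."""
    total = sum(row)
    return [v / total for v in row]

def weighted_markov(seqs, weights, M, alpha):
    """Smoothed initial and transition estimates from weighted sequences.
    alpha must be > 0 so that every probability stays non-zero."""
    init = [alpha] * M
    trans = [[alpha] * M for _ in range(M)]
    for seq, w in zip(seqs, weights):
        init[seq[0]] += w
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += w
    return normalize(init), [normalize(r) for r in trans]

def component_logs(seq, pi, inits, transs):
    """log(pi_k) + log p(x | theta_k) for every component k."""
    logs = []
    for k in range(len(pi)):
        ll = math.log(pi[k]) + math.log(inits[k][seq[0]])
        for prev, cur in zip(seq, seq[1:]):
            ll += math.log(transs[k][prev][cur])
        logs.append(ll)
    return logs

def logsumexp(vals):
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def mixture_loglik(seqs, pi, inits, transs):
    """Total log-likelihood, usable on hold-out data when choosing K."""
    return sum(logsumexp(component_logs(s, pi, inits, transs)) for s in seqs)

def fit_mixture(seqs, M, K, n_iter=50, alpha=0.1, seed=0):
    rng = random.Random(seed)
    # Initialization as on the slide: equal weights, and one global
    # Markov model whose parameters are perturbed for each component.
    base_init, base_trans = weighted_markov(seqs, [1.0] * len(seqs), M, alpha)

    def perturb(row):
        return normalize([p * rng.uniform(0.5, 1.5) for p in row])

    pi = [1.0 / K] * K
    inits = [perturb(base_init) for _ in range(K)]
    transs = [[perturb(r) for r in base_trans] for _ in range(K)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sequence.
        resp = []
        for seq in seqs:
            logs = component_logs(seq, pi, inits, transs)
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate each component from responsibility-weighted counts.
        for k in range(K):
            w = [r[k] for r in resp]
            pi[k] = (sum(w) + alpha) / (len(seqs) + K * alpha)
            inits[k], transs[k] = weighted_markov(seqs, w, M, alpha)
    return pi, inits, transs

# Toy data (hypothetical): two kinds of sessions over M = 4 page categories.
train = [[0, 1, 1, 1], [0, 1, 1]] * 10 + [[2, 3, 3, 3], [2, 3, 3]] * 10
pi, inits, transs = fit_mixture(train, M=4, K=2)
holdout_ll = mixture_loglik([[0, 1, 1], [2, 3, 3]], pi, inits, transs)
```

Restarts in the spirit of the slide could be obtained by calling `fit_mixture` with different `seed` values and keeping the best fit, and K could be compared across fits via `mixture_loglik` on held-out sequences.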

  3. Summary
     • Mixture of first-order Markov models
     • WebCANVAS tool to visualize the clustered data and models
     • Found that this clustering revealed numerous interesting insights
