
Discovering Interesting Patterns Through User's Interactive Feedback - PowerPoint PPT Presentation



  1. Title and Outline

Discovering Interesting Patterns Through User's Interactive Feedback
"Well begun is half done." - Aristotle
Authors: Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han
Presented by: Jeff Boisvert, April 11, 2007
This paper was presented at KDD '06

Outline
• Introduction and Background
• The Algorithm
• Examples
• Conclusions/Future Work
• Critique of Paper

Introduction and Background
• Motivation
– Discover 'interesting' patterns in data
– 'Interestingness' is subjective: it depends on the user
– There are often too many patterns to assess manually
• Setting
– Assume an available set of candidate patterns (frequent item sets, etc.)
– Have the user rank a subset of the candidate patterns
– Learn from the user's ranking
– Have the user rank more patterns, learn again, and so on
• SVM
– "I think we have been presented with this enough" (used later as a black box)
• Clustering
– k clusters: minimize the maximum distance of each pattern to the nearest sample in a cluster
• Distance measure
– Jaccard distance between two patterns (a sketch follows below):
  D(P1, P2) = 1 - |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|
  where T(P) is the set of transactions containing pattern P
• Ranking
– Linear: e.g. 2 < 3 (difference in ranking is 3 - 2 = 1)
– Log-linear: e.g. log(2) < log(3) (difference in ranking is 0.176)
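A minimal sketch of the Jaccard distance above, assuming each pattern is represented by the set of IDs of the transactions that support it (the function name is illustrative, not from the paper):

```python
def jaccard_distance(t1: set, t2: set) -> float:
    """Jaccard distance between two patterns, given their supporting
    transaction sets T(P1) and T(P2)."""
    if not t1 and not t2:
        return 0.0  # convention: two unsupported patterns are identical
    inter = len(t1 & t2)
    union = len(t1 | t2)
    return 1.0 - inter / union

# Example: two patterns whose transaction sets overlap in 2 of 6 transactions.
print(jaccard_distance({1, 2, 3, 4}, {3, 4, 5, 6}))  # 1 - 2/6 = 0.667
```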

  2. The Algorithm

"An algorithm must be seen to be believed." - Donald Knuth

• Overview (each iteration: cluster N patterns into k clusters → user ranks k patterns → refine model → re-rank all N patterns → N = aN)
1. Prune candidate patterns and micro-cluster
2. Cluster N patterns into k clusters
3. Present k patterns to the user for ranking
4. Refine the model with the new user rankings
5. Re-rank all N patterns with the new model
6. Reduce N = a*N
7. Go to step 2

• Areas to discuss
– (1) Preprocessing: pruning and micro-clustering (clustering: see introduction)
– (2) Selecting the k patterns to present to the user
– (3) Modeling the user's knowledge/ranking ***

The Algorithm (Preprocessing)
• Pruning
– Get representative patterns from the candidates
– Start with the maximal patterns and merge candidates into them (representative pattern = maximal)
– Discard the merged patterns; keep the micro-clusters (maximals)
• Micro-clustering
– Two patterns are merged if D(P1, P2) < epsilon
– D is the Jaccard distance
– Epsilon is provided by the user (e.g. 0.1)

The Algorithm (k patterns)
• Clustering patterns
– We really have N micro-clusters, but which k patterns should be presented to the user?
• Selecting patterns
– Criterion 1: the presented patterns should not be redundant. Redundant patterns (same composition/frequency) often rank close to each other
– Criterion 2: the selection should help refine the model of the user's knowledge of interesting patterns (not uninteresting patterns)
• Method [Gonzalez, 1985, Clustering to minimize the maximum intercluster distance] (sketched below; figure credit: Zaiane, COMPUT 695 notes)
– Randomly select the first pattern
– Second pattern: maximum distance from the first pattern
– Third pattern: maximum distance to the nearer of the first and second patterns
– ...
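A minimal sketch of the Gonzalez farthest-point selection described above, assuming a pairwise distance function such as the Jaccard sketch earlier (names are illustrative, not from the paper):

```python
import random

def select_k_patterns(patterns, k, dist):
    """Greedy farthest-point selection (Gonzalez, 1985): pick a random seed,
    then repeatedly pick the pattern farthest from everything chosen so far."""
    chosen = [random.choice(patterns)]
    # min_d[i] tracks pattern i's distance to its nearest chosen pattern
    min_d = [dist(p, chosen[0]) for p in patterns]
    while len(chosen) < k:
        idx = max(range(len(patterns)), key=lambda i: min_d[i])
        chosen.append(patterns[idx])
        for i, p in enumerate(patterns):
            min_d[i] = min(min_d[i], dist(p, patterns[idx]))
    return chosen
```

Each newly selected pattern is at least as far from the previous picks as any unchosen pattern, which is what keeps the k presented patterns non-redundant (Criterion 1).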

  3. The Algorithm (refine model 1)

• How to model the user's knowledge?
– So far the user has ranked only k of the N patterns...
• Interestingness
– The difference between a pattern's observed frequency f_o(P) and its expected frequency f_e(P)
– f_o(P) is observed from the input data
– f_e(P) is calculated from the model of the user's knowledge: f_e(P) = M(P, θ)
– If f_o(P) and f_e(P) differ, the pattern is interesting
• Ranking
– If the user ranks P_i as more interesting than P_j:
  R[f_o(P_i), f_e(P_i)] > R[f_o(P_j), f_e(P_j)]
– This is a constraint on the model optimization
– Log-linear model: R[f_o(P), f_e(P)] = log f_o(P) - log f_e(P)

The Algorithm (refine model 2) *** main contribution of the paper
• Log-Linear Model
– For a pattern P over a data set of s items, f_e(P) is given by
  log f_e(P) = u_0 + Σ_{j=1..s} u_j x_j,  where x_j = 1 if item j occurs in P
– Recall the user's ordering of patterns as a constraint:
  log f_o(P_1) - log f_e(P_1) > log f_o(P_2) - log f_e(P_2)
– Define a weight vector and a new representation of the constraint above:
  w = [c, u_0, u_1, ..., u_s],  v(P) = [log f_o(P), -1, -x_1, ..., -x_s]
  so that the constraint becomes w^T v(P_1) > w^T v(P_2)
– The user's ranking of k patterns yields k constraints

The Algorithm (re-rank all N patterns)
• Log-Linear Model (cont.)
– Feed the constraints w^T v(P_i) > w^T v(P_j) to an SVM, used as a black box, to learn w (see the sketch after this slide)
– All N patterns can now be ranked with the interestingness measure R[f_o(P), f_e(P)] = K[v(P), w]
• Biased belief model
– Not presented; identical formulation to the log-linear model, but a user belief probability is assigned to each transaction:
  w = [p_1, ..., p_m],  v(P) = (1/f_o(P)) [x_1(P), ..., x_m(P)]
  where m = number of transactions and x_k(P) = 1 if transaction k contains P

The Algorithm (reduce N)
• Reduce the number of patterns
– Discard some patterns: N = aN, where a is specified by the user
– This reduces the number of patterns presented to the user at the end
– Stop when the maximum number of iterations (also user-specified) is reached
END OF ALGORITHM ☺
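A minimal sketch of the refinement step under the formulation above. The paper treats the SVM as a black box, so the concrete solver here is an assumption: pairwise ranking constraints are reduced to difference vectors and fed to a linear SVM, a standard RankSVM-style reduction. All names (feature_vector, refine_model) are illustrative, and scikit-learn is assumed available:

```python
import numpy as np
from sklearn.svm import LinearSVC

def feature_vector(log_fo, item_flags):
    """v(P) = [log f_o(P), -1, -x_1, ..., -x_s], per the slide's formulation."""
    return np.concatenate(([log_fo, -1.0], -np.asarray(item_flags, float)))

def refine_model(ranked_vs):
    """ranked_vs: list of v(P) vectors, most interesting first.
    Each adjacent pair gives a constraint w.v(P_i) > w.v(P_j), encoded
    as a difference vector; mirrored copies supply the second class."""
    diffs, labels = [], []
    for vi, vj in zip(ranked_vs, ranked_vs[1:]):
        diffs.append(vi - vj); labels.append(+1)
        diffs.append(vj - vi); labels.append(-1)
    svm = LinearSVC(fit_intercept=False, C=1.0).fit(np.array(diffs), labels)
    return svm.coef_.ravel()  # the learned weight vector w

# Re-rank all N patterns by the learned interestingness score w.v(P):
# scores = all_vs @ w; order = np.argsort(-scores)
```

The mirrored pairs exist only so the classifier sees two classes; the separating direction it learns is the ranking weight vector w.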

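With the algorithm fully described, a sketch of the whole interactive loop may help tie the pieces together. It reuses the hypothetical helpers sketched earlier (select_k_patterns, refine_model); the ask_user callback and all parameter names are illustrative, not from the paper:

```python
import numpy as np

def interactive_loop(patterns, vectors, dist, ask_user, k=5, a=0.9, niter=3):
    """patterns: candidate patterns after pruning/micro-clustering;
    vectors: dict mapping pattern -> v(P); ask_user: callback that returns
    the k presented patterns reordered most-interesting first."""
    for _ in range(niter):
        shown = select_k_patterns(patterns, k, dist)        # steps 2-3
        ranking = ask_user(shown)                            # user feedback
        w = refine_model([vectors[p] for p in ranking])      # step 4
        scores = {p: float(np.dot(w, vectors[p])) for p in patterns}  # step 5
        patterns = sorted(patterns, key=scores.get, reverse=True)
        # step 6: N = aN, keeping at least k so the next round can present k
        patterns = patterns[:max(k, int(a * len(patterns)))]
    return patterns
```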
  4. The Algorithm (summary)

"Few things are harder to put up with than the annoyance of a good example." - Mark Twain

• Overview
1. Pre-process: prune / micro-cluster
2. Cluster N patterns into k clusters, present them to the user
3. Refine the model with the new user rankings, re-rank the patterns
4. Reduce N = a*N
5. Stop when the maximum number of iterations is reached

• Input parameters
– a = shrinking ratio
– k = number of user feedback patterns
– niter = number of iterations (controls the number of patterns in the output)
– epsilon = micro-clustering parameter
– Model type: log-linear vs. biased belief
– Ranking type: linear vs. log

Examples 1 and 2 (worked example; the slides' full transaction and distance tables do not survive extraction)
– Start from 35 transactions; micro-clustering yields 19 micro-clusters
– Pick patterns #1, #2, ..., #k by farthest-point selection; if k = 2, present 2 patterns to the user
– Refine the log-linear model with the user's ranking; with the new f_e, use the SVM to rank all 19 patterns (the slide shows Jaccard distances such as 0.333, 0.5, and 0.667)
– Reduce N: sort by rank and keep the top aN; with a = 0.9, keep the top 17 (19 × 0.9)

• Their results on item sets
– Use data to simulate a person's prior knowledge
– Partition the data into 2 subsets: a background set (the user's prior) and an observed set
– Accuracy is measured by the top-k overlap between the background ranking and the learned ranking (sketched below):
  Accuracy = |top_background(k) ∩ top_learned(k)| / k
– Data set: 49,046 transactions, 2,113 items, average transaction length of 74
– The first 1,000 transactions are the observed set
– 8,234 closed frequent item sets; micro-clustering reduces these to 769
– Compare the top-k ranked patterns
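A minimal sketch of the accuracy measure above, assuming two rankings over the same pattern set (the inputs are illustrative):

```python
def top_k_accuracy(background_rank, learned_rank, k):
    """Fraction of the background model's top-k patterns that the
    learned model also places in its top k."""
    top_bg = set(background_rank[:k])
    top_learned = set(learned_rank[:k])
    return len(top_bg & top_learned) / k

# Example: 3 of the top-5 patterns agree -> accuracy 0.6
print(top_k_accuracy(["a", "b", "c", "d", "e"],
                     ["b", "x", "a", "y", "e"], 5))
```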

  5. Example 3

• Their results on sequences
– 1,609 sentences
– 967 closed sequential patterns
– Full feedback: use k = 967

Example 4
• Their results compared to other algorithms
– Same data as Example 3 (1,609 sentences)
– They claim theirs is better than Selective Sampling [Yu, KDD '05] and Top-N [Shen and Zhai, SIGIR '05]

Conclusions

"I would never die for my beliefs because I might be wrong." - Bertrand Russell

• Conclusions
– Interactive with the user
– Tries to learn the user's knowledge
– Flexible (but flexible = many parameters)
– Does not work well with sparse data
• Proposed future work
– Study different models for sparse data
– Better feedback strategies to maximize learning
– Apply to other data types/sets
