
Promotion Analysis in Multi-Dimensional Space

Tianyi Wu (UIUC), Dong Xin (Microsoft Research), Qiaozhu Mei (University of Michigan), Jiawei Han (UIUC)


  1. Promotion Analysis in Multi-Dimensional Space. Tianyi Wu (UIUC), Dong Xin (Microsoft Research), Qiaozhu Mei (University of Michigan), Jiawei Han (UIUC)

  2. Outline
     - Introduction
     - Query execution algorithms
     - Spurious promotion
     - Experiment
     - Conclusion

  3. Outline
     - Introduction
     - Query execution algorithms
     - Spurious promotion
     - Experiment
     - Conclusion

  4. Promotion analysis: introduction
     - Formulate and study a useful function: promotion analysis through ranking
     - General goal: promote a given object by leveraging subspace ranking
     - Motivating example: a marketing manager of a book retailer
       - Basic fact: book sales rank 30th out of 100 other retailers. Not particularly interesting!
       - After promotion analysis, he discovered: ranked 1st in the {college students, science and technology} area
       - Further advertising and marketing decisions ("Let's promote our brand!")
     - Another example: person promotion

  5. Promotion query
     Observation:

     Global rank                                   Local rank
     May not be interesting                        Can be more interesting
     Compare to all other objects in all aspects   Compare objects in certain areas
     Full-space                                    Subspaces
     Single SQL query                              Many subspaces
     Low cost                                      High cost

     The Promotion Query Problem
     - Given: an object (e.g., product, person)
     - Goal: discover the most interesting subspaces where the object is highly ranked

  6. Subspace rank: why interesting
     - Discover merit and competitive strengths
       - E.g., a bestselling car model among hybrid cars
     - Enhance image
       - E.g., Fortune 500 company
     - Facilitate decision making
       - E.g., marketing plan that focuses on college students
     - Deliver specific information
       - E.g., "top-3 university in biomedical research" vs. "top-20 university"
     - Extensively practiced in marketing
       - Market segmentation
       - Customer targeting and product positioning

  7. Challenges
     - Current systems
       - Given a condition, find top-k objects
       - Sophisticated early termination and pruning algorithms
     - Promotion query: not well-supported
       - User: manual search and navigation ("It should be good at …", "Let me try some queries…")
       - Trial-and-error
     - Computationally expensive
       - The rank measure: holistic
       - A blow-up of subspaces

  8. Promotion analysis: multidimensional data model
     Fact table:

     Location  Time    Object  Score
     Lyon      July    T       0.5
     Chicago   July    T       0.8
     Chicago   August  S       1.0
     Chicago   July    S       1.0
     Lyon      August  V       0.3
     Chicago   August  V       0.6
     Chicago   July    V       0.7

     Location and Time are the subspace dimensions, Object is the object dimension, and Score is the score dimension.

  9. Subspaces
     Given a target object T, aggregate and compute the target object's rank in each subspace. {*} is the special case: full-space.

     Subspaces of T (computed over the fact table of the previous slide):

     Subspace          SUM(T)  Rank(T)
     {*}               1.3     3rd / 3
     {Lyon}            0.5     1st / 2
     {Chicago}         0.8     3rd / 3
     {July}            1.3     1st / 3
     {Lyon, July}      0.5     1st / 1
     {Chicago, July}   0.8     2nd / 3
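The aggregation on this slide can be reproduced with a short Python sketch. It is an illustration only, not the authors' implementation: the names `FACTS`, `rank_in_subspace`, and `subspaces_of` are hypothetical, SUM is assumed as the aggregate, and rank is taken as one plus the number of objects with a strictly larger sum. Summing the fact table rows gives SUM(T) = 0.8 for {Chicago}, which agrees with the rank of 3rd / 3 shown for that subspace.

```python
from collections import defaultdict
from itertools import combinations

# Fact table from the slides: (Location, Time, Object, Score).
FACTS = [
    ("Lyon", "July", "T", 0.5),
    ("Chicago", "July", "T", 0.8),
    ("Chicago", "August", "S", 1.0),
    ("Chicago", "July", "S", 1.0),
    ("Lyon", "August", "V", 0.3),
    ("Chicago", "August", "V", 0.6),
    ("Chicago", "July", "V", 0.7),
]
N_DIMS = 2  # Location (index 0) and Time (index 1) are the subspace dimensions


def rank_in_subspace(facts, condition, target):
    """Return (SUM(target), rank, object count) for one subspace.

    `condition` maps a dimension index to a required value; {} is {*}.
    Returns None when the target has no rows in the subspace.
    """
    totals = defaultdict(float)
    for row in facts:
        if all(row[d] == v for d, v in condition.items()):
            totals[row[N_DIMS]] += row[N_DIMS + 1]
    if target not in totals:
        return None
    t = totals[target]
    # Rank = 1 + number of objects whose aggregate is strictly larger.
    rank = 1 + sum(1 for s in totals.values() if s > t)
    return t, rank, len(totals)


def subspaces_of(facts, target):
    """Enumerate every subspace in which the target has at least one row."""
    conditions = set()
    for row in facts:
        if row[N_DIMS] != target:
            continue
        for r in range(N_DIMS + 1):
            for dims in combinations(range(N_DIMS), r):
                conditions.add(tuple((d, row[d]) for d in dims))
    return [dict(c) for c in sorted(conditions)]


for cond in subspaces_of(FACTS, "T"):
    total, rank, count = rank_in_subspace(FACTS, cond, "T")
    print(cond, round(total, 2), f"rank {rank} of {count}")
```

This brute-force version scans the whole table once per subspace, which is exactly the cost the later slides try to avoid.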

  10. Query model
      - Given a target object T, find the top subspaces which are promotive
      - "Promotiveness": a class of measures to quantify how well a subspace S can promote T
        - P(S, T) = f(Rank(S, T)) * g(Sig(S))
        - Higher rank ~ more promotive
        - More significant subspace (e.g., more objects) ~ more promotive
      - Example instantiations
        - Simple ranking: P(S, T) = Rank^{-1}(S, T)
        - Iceberg condition: P(S, T) = Rank^{-1}(S, T) * I(ObjCount(S) > MinSig)
        - Percentile ranking: P(S, T) = ObjCount(S) / Rank(S, T)
        - …
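The three example instantiations can be written down directly. The sketch below is a hedged illustration: the function names, the `SUBSPACES` dictionary, and `top_subspaces` are hypothetical, Sig(S) is taken to be ObjCount(S), and the (rank, object count) pairs are the ones from the Lyon/Chicago example.

```python
def simple_ranking(rank, obj_count):
    """P(S, T) = Rank^{-1}(S, T); the object count is ignored."""
    return 1.0 / rank


def iceberg(rank, obj_count, min_sig):
    """P(S, T) = Rank^{-1}(S, T) * I(ObjCount(S) > MinSig)."""
    return (1.0 / rank) if obj_count > min_sig else 0.0


def percentile_ranking(rank, obj_count):
    """P(S, T) = ObjCount(S) / Rank(S, T)."""
    return obj_count / rank


# (Rank(S, T), ObjCount(S)) for each subspace of T in the running example.
SUBSPACES = {
    "{*}": (3, 3),
    "{Lyon}": (1, 2),
    "{Chicago}": (3, 3),
    "{July}": (1, 3),
    "{Lyon, July}": (1, 1),
    "{Chicago, July}": (2, 3),
}


def top_subspaces(measure, r):
    """Return the names of the r subspaces with the largest measure value."""
    scored = sorted(SUBSPACES.items(), key=lambda kv: measure(*kv[1]), reverse=True)
    return [name for name, _ in scored[:r]]


print(top_subspaces(simple_ranking, 3))
print(top_subspaces(percentile_ranking, 3))
print(top_subspaces(lambda rank, count: iceberg(rank, count, 1), 3))
```

Under simple ranking the single-object subspace {Lyon, July} ties for first place; the iceberg and percentile variants discount it because it contains only one object.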

  11. Query model
      The Promotion Query Problem
      - Input: a target object T
      - Output: top-R subspaces with the largest P(S, T) scores /* assume simple ranking */

      - Given a target object T, find the top subspaces which are promotive
      - "Promotiveness": a class of measures to quantify how well a subspace S can promote T
        - P(S, T) = f(Rank(S, T)) * g(Sig(S))
        - Higher rank ~ more promotive
        - More significant subspace (e.g., more objects) ~ more promotive
      - Example instantiations
        - Simple ranking: P(S, T) = Rank^{-1}(S, T)
        - Iceberg condition: P(S, T) = Rank^{-1}(S, T) * I(ObjCount(S) > MinSig)
        - Percentile ranking: P(S, T) = ObjCount(S) / Rank(S, T)
        - …

  12. Outline
      - Introduction
      - Query execution algorithms
        - (1) PromoRank framework
          - (a) Subspace pruning
          - (b) Object pruning
        - (2) Promotion cubes
      - Spurious promotion
      - Experiment
      - Conclusion

  13. The PromoRank framework
      - Idea: use a recursive process to partition and aggregate the data to compute the target object's rank in each subspace
      - The bottom-up method [Beyer99]
      - Target object's subspace lattice:

        {*}
        {A} {B} {C} {D}
        {AB} {AC} {AD} {BC} {BD} {CD}
        {ABC} {ABD} {ACD} {BCD}
        {ABCD}

  14. PromoRank: recursive process
      - Compute T's rank in {*}
        - Method: create a hash table: HashTable[object] = AggregateScore
      - Partition the data based on A
        - Method: sorting
      - Compute T's rank in {A}
      - Recursively repeat…
      - Top-R promotive subspaces: priority queue
      - Order in which the lattice is visited:

        {*} 1
        {A} 2, {B} 10, {C} 14, {D} 16
        {AB} 3, {AC} 7, {AD} 9, {BC} 11, {BD} 13, {CD} 15
        {ABC} 4, {ABD} 6, {ACD} 8, {BCD} 12
        {ABCD} 5
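The recursive process can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: `promo_rank` and `ROWS` are hypothetical names, SUM is the aggregate, promotiveness is simple ranking (1 / rank), partitioning uses a dictionary where the slide mentions sorting, and none of the pruning from the following slides is applied. The only shortcut taken is that a partition is abandoned as soon as the target object has no rows in it.

```python
import heapq
from collections import defaultdict

# Rows are (dim_0, ..., dim_{n-1}, object, score); this is the fact table
# from the Lyon/Chicago example.
ROWS = [
    ("Lyon", "July", "T", 0.5),
    ("Chicago", "July", "T", 0.8),
    ("Chicago", "August", "S", 1.0),
    ("Chicago", "July", "S", 1.0),
    ("Lyon", "August", "V", 0.3),
    ("Chicago", "August", "V", 0.6),
    ("Chicago", "July", "V", 0.7),
]


def promo_rank(rows, n_dims, target, top_r):
    """Return the top_r (promotiveness, subspace) pairs for `target`."""
    heap = []    # min-heap acting as the top-R priority queue
    visited = 0  # visit counter, used only to break ties deterministically

    def visit(part, start, condition):
        nonlocal visited
        # HashTable[object] = AggregateScore for the current partition.
        totals = defaultdict(float)
        for row in part:
            totals[row[n_dims]] += row[n_dims + 1]
        if target not in totals:
            return  # not a subspace of the target; skip the whole subtree
        rank = 1 + sum(1 for s in totals.values() if s > totals[target])
        visited += 1
        heapq.heappush(heap, (1.0 / rank, -visited, condition))
        if len(heap) > top_r:
            heapq.heappop(heap)  # drop the currently least promotive subspace
        # Partition on each remaining dimension and recurse (bottom-up order).
        for d in range(start, n_dims):
            buckets = defaultdict(list)
            for row in part:
                buckets[row[d]].append(row)
            for value, bucket in buckets.items():
                visit(bucket, d + 1, condition + ((d, value),))

    visit(rows, 0, ())
    return [(score, dict(cond)) for score, _, cond in sorted(heap, reverse=True)]


for score, subspace in promo_rank(ROWS, 2, "T", 3):
    print(round(score, 3), subspace)
```

Subspaces are reported as `{dimension index: value}` dictionaries, with `{}` standing for {*}.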

  15. (1.1) Subspace pruning
      - Idea: reuse previous results
      - Goal: prune out unseen subspaces by bounding their promotiveness scores
        - Sig(S): bounded
        - Rank(S, T): bounded
      - Figure: the target object's subspace lattice, with {A} and {AB} highlighted

  16. Subspace pruning
      - Key: any unseen subspace with a low (i.e., poor) LBRank(T) can be pruned
      - Compute T's highest possible rank: LBRank
      - Use the monotonicity of the aggregate measure (e.g., SUM, MAX)
      - Example: given a seen (aggregated) subspace, how to prune an unseen one?
        - Seen subspace {AB} (visited 3rd): SUM(V) = 5.5, SUM(S) = 2.2, SUM(T) = 1.1, Rank(T) = 3rd / 3
        - Unseen subspace {B} (visited 10th): SUM(T) = 1.9
        - By monotonicity, V and S score at least 5.5 and 2.2 in {B}, so SUM(V) > SUM(T) and SUM(S) > SUM(T) there
        - Thus, LBRank(T) = |{V, S}| + 1 = 3rd
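The bound in this example can be checked with a small Python sketch. It assumes a SUM aggregate over non-negative scores, so that an object's total in the broader subspace {B} is at least its total in the seen subspace {AB}. The names `lb_rank` and `can_prune`, and the comparison against the R-th best rank found so far, are illustrative assumptions for the simple-ranking case.

```python
def lb_rank(target_score_unseen, seen_scores, target):
    """Best rank the target can reach in an unseen, broader subspace.

    `seen_scores` holds aggregate scores from a seen subspace contained in
    the unseen one. With a monotone aggregate, any object already above the
    target's score in the seen subspace stays above it in the unseen one.
    """
    beaters = [
        obj for obj, score in seen_scores.items()
        if obj != target and score > target_score_unseen
    ]
    return len(beaters) + 1


def can_prune(lb, rth_best_rank):
    """Under simple ranking, skip a subspace that cannot beat the current
    R-th best rank held in the priority queue."""
    return lb > rth_best_rank


# Seen subspace {AB} from the slide; T's score in unseen {B} is 1.9.
seen_ab = {"V": 5.5, "S": 2.2, "T": 1.1}
bound = lb_rank(1.9, seen_ab, "T")
print(bound, can_prune(bound, rth_best_rank=2))
```

T's own aggregate score in the unseen subspace is cheap to obtain because it only requires T's rows, which is what makes the bound usable before the subspace is fully aggregated.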

  17. (1.2) Object pruning
      - Idea: avoid computing objects which do not affect rank
      - Power-law distribution: objects at the long tail can be pruned
      - Goal: reduce the partitioning and aggregation cost
      - Example
        - Seen (aggregated) subspace {A}: SUM(S) = 6.5, SUM(T) = 2.2, SUM(U) = 1.5, SUM(W) = 1.0, SUM(Z) = 0.8
        - Unseen subtree of subspaces: {AB} with SUM(T) = 1.9, {AC} with SUM(T) = 1.2, {ABC} with SUM(T) = 1.1, so MinScore(T) = 1.1
        - SUM(W) < MinScore(T) and SUM(Z) < MinScore(T): W and Z can be pruned!
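The pruning test in this example amounts to one comparison per object. The sketch below again assumes SUM over non-negative scores, so an object's total can only shrink when moving from {A} into its more specific subspaces; `prunable_objects` is a hypothetical helper name.

```python
def prunable_objects(seen_scores, target_scores_in_subtree, target):
    """Objects that can never outrank the target anywhere in the subtree.

    `seen_scores` are the aggregate scores in the seen subspace, and
    `target_scores_in_subtree` maps each unseen subspace below it to the
    target's own aggregate score there.
    """
    min_score = min(target_scores_in_subtree.values())  # MinScore(T)
    return {
        obj for obj, score in seen_scores.items()
        if obj != target and score < min_score
    }


# Values from the slide: seen subspace {A} and the unseen subtree below it.
seen_a = {"S": 6.5, "T": 2.2, "U": 1.5, "W": 1.0, "Z": 0.8}
t_scores = {"AB": 1.9, "AC": 1.2, "ABC": 1.1}
print(sorted(prunable_objects(seen_a, t_scores, "T")))
```

Rows belonging to the pruned objects can be dropped before partitioning the subtree, which is where the saving in partitioning and aggregation cost comes from.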

  18. (2) Promotion cubes
      - Observation: (1) T tends to be highly ranked in a top subspace; (2) a top subspace is likely to contain many objects
      - Method: promotion cube
        - Offline materialization
      - Structure
        - For each subspace with Sig(S) > MinSig (parameter: MinSig)
        - Materialize a selected sample of top-k aggregate scores in each subspace (parameters: k and k')

  19. Promotion cell
      - For each "significant" subspace S (passing the MinSig threshold), create a "promotion cell" PCell(S)
      - Promotion cell: store aggregate scores; no object IDs
      - Parameters MinSig, k, and k': chosen to yield a space-time tradeoff; application dependent
        - Does not restrict query processing
      - Figure: the objects of subspace S sorted by score, and the resulting PCell(S) for k = 9, k' = 3

  20. Query execution using promotion cube
      - Step 1: Compute T's aggregate scores
      - Step 2: Compute LBRanks and UBRanks and do pruning
        - Using the promotion cube
      - Step 3: Call PromoRank
      - Figure (Step 1): the subspace lattice annotated with SUM(T), in the order shown on the slide:

        {*}: 3.0
        {A} {B} {C} {D}: 2.2, 2.2, 1.9, 1.6
        {AB} {AC} {AD} {BC} {BD} {CD}: 1.2, 1.9, 1.8, 1.9, 1.5, 0.9
        {ABC} {ABD} {ACD} {BCD}: 1.1, 0.9, 0.5, 0.3
        {ABCD}: 0.5

  21. Query execution using promotion cube
      - Step 1: Compute T's aggregate scores
      - Step 2: Compute LBRanks and UBRanks and do pruning
        - Using the promotion cube
      - Step 3: Call PromoRank
      - Figure (Step 2): the subspace lattice annotated with [LBRank, UBRank], in the order shown on the slide:

        {*}: [11, 19]
        {A} {B} {C} {D}: [51, 59], [20, 20], [21, 29], [31, 39]
        {AB} {AC} {AD} {BC} {BD} {CD}: [11, 19], [61, ∞), [31, 39], [11, 19], [21, 29], [31, 39]
        {ABC} {ABD} {ACD} {BCD}: [21, 29], [61, ∞), [11, 19], [50, 50]
        {ABCD}: [51, 59]
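Intervals such as [11, 19], [20, 20], and [61, ∞) are what one would get if a promotion cell kept every 10th score of the sorted list. The Python sketch below shows how rank bounds could be read off such a cell. The cell layout (the `step`-th, 2·`step`-th, ... largest scores), the assumption that aggregate scores are distinct, and the name `rank_bounds` are all assumptions made for illustration; the slides do not spell out the exact sampling scheme.

```python
import math


def rank_bounds(cell_scores, step, target_score):
    """Return (LBRank, UBRank) for the target from sampled scores.

    Assumed layout: cell_scores[i] is the ((i + 1) * step)-th largest
    aggregate score in the subspace, listed in descending order, and all
    aggregate scores in the subspace are distinct.
    """
    for i, sampled in enumerate(cell_scores):
        position = (i + 1) * step
        if target_score > sampled:
            # Strictly between the previous sample and this one.
            return (position - step + 1, position - 1)
        if target_score == sampled:
            # The target is exactly the object at this sampled position.
            return (position, position)
    # Below every stored score: only a lower bound on the rank is known.
    return (len(cell_scores) * step + 1, math.inf)


# Hypothetical cell holding the 10th, 20th, ..., 60th largest scores.
cell = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(rank_bounds(cell, 10, 5.5))
print(rank_bounds(cell, 10, 5.0))
print(rank_bounds(cell, 10, 0.5))
```

A subspace whose LBRank is already worse than the current R-th best rank can be skipped, and only the remaining subspaces need to be passed on to PromoRank in Step 3.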

  22. Outline
      - Introduction
      - Query execution algorithms
      - Spurious promotion
      - Experiment
      - Conclusion

  23. The spurious promotion problem
      - Spurious promotion: the target object is highly ranked in a subspace due to random perturbation; not meaningful
      - Example: Michael Jordan (NBA player)

        Rank  Subspace                     Assessment
        #1    {Year = 1995}                OK
        #1    {MonthOfBirth = February}    Spurious (due to random perturbation)
        #1    {Weather = Sunny}            Spurious (due to random perturbation)
