Outline Ranking and skyline Top-k algorithms Skyline algorithms - PowerPoint PPT Presentation

Data Mining
Top-K and Skyline
October 17, 2017

Outline
• Ranking and skyline
• Top-k algorithms
• Skyline algorithms
• Reconciling top-k and skyline

Ranking queries
Who is the best NBA player?
• According to points: Tracy McGrady, score 2003
• According to rebounds: Shaquille O'Neal, score 760
• According to points + rebounds: Tracy McGrady, score 2487
• ……

Name              Points  Rebounds  Assists  Steals  ……
Tracy McGrady     2003    484       448      135     ……
Kobe Bryant       1819    392       398      86      ……
Shaquille O'Neal  1669    760       200      36      ……
Yao Ming          1465    669       61       34      ……
Dwyane Wade       1854    397       520      121     ……
Steve Nash        1165    249       861      74      ……
……                ……      ……        ……       ……      ……

Ranking queries
Top-k Query
Given a dataset D of n objects, a scoring function F (according to which we rank the objects in D), and k, a top-k query returns the k objects with the best score (rank) in D.
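The definition can be tried out on the NBA rows from the previous slide. This is a minimal sketch: the helper name `top_k` and the dictionary layout are illustrative choices, and "best" is assumed to mean the highest score.

```python
# Rows copied from the NBA table on the "Ranking queries" slide.
players = {
    "Tracy McGrady":    {"points": 2003, "rebounds": 484, "assists": 448, "steals": 135},
    "Kobe Bryant":      {"points": 1819, "rebounds": 392, "assists": 398, "steals": 86},
    "Shaquille O'Neal": {"points": 1669, "rebounds": 760, "assists": 200, "steals": 36},
    "Yao Ming":         {"points": 1465, "rebounds": 669, "assists": 61,  "steals": 34},
    "Dwyane Wade":      {"points": 1854, "rebounds": 397, "assists": 520, "steals": 121},
    "Steve Nash":       {"points": 1165, "rebounds": 249, "assists": 861, "steals": 74},
}

def top_k(data, score, k):
    """Return the k (name, score) pairs with the highest score, best first."""
    scored = [(name, score(stats)) for name, stats in data.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# The answer depends entirely on the scoring function F supplied by the user.
by_points = top_k(players, lambda s: s["points"], 1)
by_rebounds = top_k(players, lambda s: s["rebounds"], 1)
by_both = top_k(players, lambda s: s["points"] + s["rebounds"], 2)

print(by_points)    # [('Tracy McGrady', 2003)]
print(by_rebounds)  # [("Shaquille O'Neal", 760)]
print(by_both)      # [('Tracy McGrady', 2487), ("Shaquille O'Neal", 2429)]
```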

Similarity queries
k-NN Query
Given a dataset D of n objects, a query point q, a distance function F, and k, a k-NN query returns the k objects with the smallest distance to q.
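A k-NN query can be sketched in a few lines. The points and the query q(41, 125) are taken from the patient similarity slide later in this deck; the Euclidean distance is an assumption made for illustration (the deck leaves F open), and the helper name `knn` is hypothetical.

```python
import heapq
import math

# (age, trestbps) pairs from the heart disease sample; q is the query point.
patients = {"p1": (40, 140), "p2": (39, 120), "p3": (45, 130), "p4": (37, 140)}
q = (41, 125)

def knn(data, query, k, dist=math.dist):
    """Return the names of the k objects closest to `query` under `dist`."""
    return heapq.nsmallest(k, data, key=lambda name: dist(data[name], query))

nearest = knn(patients, q, 2)
print(nearest)  # ['p2', 'p3']
```

With unscaled attributes the blood pressure axis dominates the distance, which illustrates why choosing a meaningful distance function is hard.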

Problems of top-k and k-NN
In a top-k or k-NN query, the ranking/distance function F as well as the number of answers k must be provided by the user.
In many cases it is difficult to define a meaningful ranking/distance function, especially when the attributes have different semantics (e.g., find the cheapest hotel closest to the beach).

Skyline: Hotel Example

hotel  distance  price
p1     4         400
p2     24        380
p3     14        340
p4     36        300
p5     26        280
p6     8         260
p7     40        200
p8     20        180
p9     34        140
p10    28        120
p11    16        60

[Figure: scatter plot of hotels p1 to p11; x-axis: distance to the destination, y-axis: price]
Skyline Computation: Challenges and Opportunities

Skyline: Hotel Example

hotel  distance  price  0.75*distance + 0.25*price/10
p1     4         400    13
p2     24        380    27.5
p3     14        340    19
p4     36        300    34.5
p5     26        280    26.5
p6     8         260    12.5
p7     40        200    35
p8     20        180    19.5
p9     34        140    29
p10    28        120    24
p11    16        60     13.5

[Figure: scatter plot of hotels p1 to p11; x-axis: distance to the destination, y-axis: price]
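The score column can be reproduced directly from the formula in the table header. In this sketch the function name `score` is illustrative, and a lower score is assumed to be better because both distance and price are to be minimized.

```python
# (distance, price) for each hotel, copied from the table.
hotels = {
    "p1": (4, 400), "p2": (24, 380), "p3": (14, 340), "p4": (36, 300),
    "p5": (26, 280), "p6": (8, 260), "p7": (40, 200), "p8": (20, 180),
    "p9": (34, 140), "p10": (28, 120), "p11": (16, 60),
}

def score(distance, price):
    """Weighted sum from the slide: 0.75*distance + 0.25*price/10."""
    return 0.75 * distance + 0.25 * price / 10

# Rank hotels from best (lowest score) to worst.
ranked = sorted(hotels, key=lambda h: score(*hotels[h]))
print(ranked[:3])           # ['p6', 'p1', 'p11']
print(score(*hotels["p6"])) # 12.5
```

Changing the weights changes the winner, which is exactly the difficulty with user-supplied ranking functions.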

Skyline: Hotel Example
[Figure: scatter plot of hotels p1 to p11 (same data as the hotel table above); x-axis: distance to the destination, y-axis: price]
Definition (Skyline). Given a dataset P of n points in d-dimensional space, let p and p' be two different points in P. p dominates p' if for all i, p[i] ≤ p'[i], and for at least one i, p[i] < p'[i]. The skyline points are those points that are not dominated by any other point in P.
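The definition translates almost word for word into code. This is a naive pairwise check written for clarity, not one of the skyline algorithms the deck goes on to cover; the function names are illustrative.

```python
# (distance, price) for each hotel; smaller is better in both dimensions.
hotels = {
    "p1": (4, 400), "p2": (24, 380), "p3": (14, 340), "p4": (36, 300),
    "p5": (26, 280), "p6": (8, 260), "p7": (40, 200), "p8": (20, 180),
    "p9": (34, 140), "p10": (28, 120), "p11": (16, 60),
}

def dominates(p, q):
    """p dominates q: p[i] <= q[i] for all i and p[i] < q[i] for at least one i."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def skyline(points):
    """Names of the points not dominated by any other point (O(n^2) comparisons)."""
    return [name for name, p in points.items()
            if not any(dominates(q, p)
                       for other, q in points.items() if other != name)]

print(skyline(hotels))  # ['p1', 'p6', 'p11']
```

For example, p3 (14, 340) is dominated by p6 (8, 260), and p8 (20, 180) is dominated by p11 (16, 60).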

Skyline Queries: Patient Similarity Search Example
Table: Sample of heart disease dataset. (a) Original data.

ID  age  trestbps
p1  40   140
p2  39   120
p3  45   130
p4  37   140

Query point: q(41, 125)
[Figure: scatter plot of p1 to p4 and q; x-axis: age, y-axis: trestbps]

Motivating Example: Skyline Queries
Table: Sample of heart disease dataset.

(a) Original data.       (b) Mapped data.
ID  age  trestbps        ID  age  trestbps
p1  40   140             t1  42   140
p2  39   120             t2  43   130
p3  45   130             t3  45   130
p4  37   140             t4  45   140

Query point: q(41, 125)
[Figure: scatter plot of p1 to p4, their mapped points t1 to t4, and q; x-axis: age, y-axis: trestbps]
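The mapped table is consistent with reflecting every point into the quadrant above and to the right of q, that is, t[i] = q[i] + |p[i] - q[i]|. The slide does not state the mapping explicitly, so treat this formula as an inference from the numbers; the function names below are illustrative. Under that assumption, the skyline of the mapped points gives the patients that are not dominated with respect to their per-attribute difference from q.

```python
patients = {"p1": (40, 140), "p2": (39, 120), "p3": (45, 130), "p4": (37, 140)}
q = (41, 125)

def map_to_query(p, query):
    """Assumed mapping: t[i] = q[i] + |p[i] - q[i]| (inferred from the table)."""
    return tuple(qi + abs(pi - qi) for pi, qi in zip(p, query))

def dominates(p, r):
    """p dominates r: no worse in every dimension, strictly better in one."""
    return (all(a <= b for a, b in zip(p, r))
            and any(a < b for a, b in zip(p, r)))

mapped = {name: map_to_query(p, q) for name, p in patients.items()}
print(mapped)
# {'p1': (42, 140), 'p2': (43, 130), 'p3': (45, 130), 'p4': (45, 140)}

# Skyline over the mapped points, reported with the original patient IDs.
similar = [name for name, t in mapped.items()
           if not any(dominates(u, t)
                      for other, u in mapped.items() if other != name)]
print(similar)  # ['p1', 'p2']
```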


Skyline
• Applications
  • Recommendation: recommend phones that are as cheap as possible, with as much memory as possible, and as light as possible
  • Aggregation/integration: rank results from multiple search engines by relevance score
  • Preprocessing for top-k: the skyline contains all candidates for top-1

Skyline for Top-1
[Figure: scatter plot of hotels p1 to p11 (same data as the hotel table above); x-axis: distance to the destination, y-axis: price]
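The link between skyline and top-1 can be checked empirically: for a weighted sum of the two attributes, the best hotel should always be a skyline point. The sketch below assumes linear scoring functions of the form w*distance + (1-w)*price/10 (the scaling is borrowed from the weighted-score slide) and an arbitrary grid of 21 weights; both are illustrative choices.

```python
hotels = {
    "p1": (4, 400), "p2": (24, 380), "p3": (14, 340), "p4": (36, 300),
    "p5": (26, 280), "p6": (8, 260), "p7": (40, 200), "p8": (20, 180),
    "p9": (34, 140), "p10": (28, 120), "p11": (16, 60),
}

def dominates(p, q):
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

# Skyline computed by naive pairwise comparison.
sky = {name for name, p in hotels.items()
       if not any(dominates(q, p) for o, q in hotels.items() if o != name)}

def best_hotel(w):
    """Top-1 under the score w*distance + (1-w)*price/10 (lower is better)."""
    return min(hotels, key=lambda h: w * hotels[h][0] + (1 - w) * hotels[h][1] / 10)

# Try weights 0.00, 0.05, ..., 1.00 and collect every top-1 answer.
winners = {best_hotel(i / 20) for i in range(21)}
print(sorted(winners))  # every winner is a skyline point
print(winners <= sky)   # True
```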

What about Top-K?
[Figure: scatter plot of hotels p1 to p11 (same data as the hotel table above); x-axis: distance to the destination, y-axis: price]

Skyline for Top-K
[Figure: scatter plot of hotels p1 to p11 (same data as the hotel table above) with the "Lowest Price" hotel marked; x-axis: distance to the destination, y-axis: price]
• Skyline: Pareto top-1 points
• Group skyline: Pareto top-k groups

Group skyline definition: Dominance
Definition (G-Skyline). We say group G dominates group G', denoted by G ≺_g G', if we can find two permutations of the k points for G and G', G = {p_u1, p_u2, ..., p_uk} and G' = {p'_v1, p'_v2, ..., p'_vk}, such that p_ui ⪯ p'_vi for all i (1 ≤ i ≤ k) and p_ui ≺ p'_vi for at least one i. The k-point G-Skyline consists of those groups with k points that are not g-dominated by any other group of the same size.
[Figure: scatter plot of hotels p1 to p11 (same data as the hotel table above); x-axis: distance to the destination, y-axis: price]
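Group dominance can be checked by brute force for small k. The sketch below fixes the order of G and tries every permutation of G', which covers all pairings between the two groups; it is exponential in k and meant only to make the definition concrete. The function names are illustrative.

```python
from itertools import combinations, permutations

hotels = {
    "p1": (4, 400), "p2": (24, 380), "p3": (14, 340), "p4": (36, 300),
    "p5": (26, 280), "p6": (8, 260), "p7": (40, 200), "p8": (20, 180),
    "p9": (34, 140), "p10": (28, 120), "p11": (16, 60),
}

def dominates_or_equal(p, q):
    """p is no worse than q in every dimension."""
    return all(a <= b for a, b in zip(p, q))

def dominates(p, q):
    """p is no worse than q everywhere and differs from q somewhere."""
    return dominates_or_equal(p, q) and p != q

def g_dominates(G, H):
    """True if some pairing of G with H satisfies the G-Skyline dominance test."""
    for perm in permutations(H):
        pairs = list(zip(G, perm))
        if (all(dominates_or_equal(p, q) for p, q in pairs)
                and any(dominates(p, q) for p, q in pairs)):
            return True
    return False

def g_skyline(points, k):
    """All k-point groups (tuples of names) not g-dominated by another group."""
    groups = list(combinations(points, k))
    coords = {g: [points[n] for n in g] for g in groups}
    return [g for g in groups
            if not any(g_dominates(coords[h], coords[g])
                       for h in groups if h != g)]

print(g_dominates([hotels["p6"], hotels["p11"]], [hotels["p3"], hotels["p8"]]))  # True
print(g_skyline(hotels, 2))
```

On the hotel data the 2-point G-Skyline contains six groups: the three pairs of skyline points plus {p3, p6}, {p8, p11}, and {p10, p11}. A G-Skyline group can therefore contain a non-skyline point, as long as everything that dominates it is also in the group.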

Hotel Example
[Figure: scatter plot of hotels p1 to p11 (same data as the hotel table above) with the "Lowest Price" hotel marked; x-axis: distance to the destination, y-axis: price]

Outline
• Ranking and skyline
• Top-k algorithms
• Skyline algorithms
• Reconciling top-k and skyline

Introduction – naïve methods
Top-k processing
• Apply the ranking function F to all objects
• Unsorted: linearly scan all objects (online)
• Sorted list: sort all objects (offline)
• Priority queue: build the queue (offline), remove the top k (online)
• Offline computation needs to know the scoring function!
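The online linear scan can be sketched with a bounded min-heap that keeps only the k best objects seen so far, so the scoring function may be supplied at query time. The bounded heap is one common way to implement the scan and is an illustrative choice here, as are the function name and the use of F = points + rebounds on the NBA rows.

```python
import heapq

# (name, points, rebounds) rows from the NBA table.
rows = [
    ("Tracy McGrady", 2003, 484), ("Kobe Bryant", 1819, 392),
    ("Shaquille O'Neal", 1669, 760), ("Yao Ming", 1465, 669),
    ("Dwyane Wade", 1854, 397), ("Steve Nash", 1165, 249),
]

def top_k_scan(objects, score, k):
    """One pass over unsorted objects; the heap root is the weakest of the current top k."""
    heap = []  # entries are (score, index, object); index breaks ties
    for idx, obj in enumerate(objects):
        item = (score(obj), idx, obj)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            heapq.heapreplace(heap, item)  # evict the current weakest
    return [obj for _, _, obj in sorted(heap, reverse=True)]

best = top_k_scan(rows, lambda r: r[1] + r[2], 3)
print([name for name, _, _ in best])
# ['Tracy McGrady', "Shaquille O'Neal", 'Dwyane Wade']
```

The scan costs O(n log k) per query. The offline variants (sorted list, prebuilt priority queue) answer faster online but only for the scoring function they were built with.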

Top-k Computation – FA algorithm
Fagin's Algorithm (FA)
R. Fagin, Amnon Lotem, Moni Naor. "Optimal Aggregation Algorithms for Middleware". J. Comput. Syst. Sci. 66(4), pp. 614-656, 2003.
The algorithm is based on two types of accesses:
• Sorted access on attribute a_i: retrieves the next object in the sorted list of a_i
• Random access on attribute a_i: gives the value of the i-th attribute for a specific object identifier

Top-k Computation
The database can be considered as an n × m score matrix, storing the score values of every object in every attribute.

a1      a2      a3      a4      a5
O3, 99  O1, 91  O1, 92  O3, 74  O3, 67
O1, 66  O3, 90  O3, 75  O1, 56  O4, 67
O0, 63  O0, 61  O4, 70  O0, 56  O1, 58
O2, 48  O4, 07  O2, 16  O2, 28  O2, 54
O4, 44  O2, 01  O0, 01  O4, 19  O0, 35

Note that, for each attribute, scores are sorted in descending order.

Top-k Computation – FA algorithm
Outline of FA
Step 1:
• Read attributes from every sorted list using sorted access.
• Stop when k objects have been seen in common from all lists.
Step 2:
• Use random access to find missing scores.
Step 3:
• Compute the scores of the seen objects.
• Return the k highest scored objects.
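The three steps can be sketched on the score matrix from the previous slide. Two assumptions are made here: the aggregation function is the sum of the five attribute scores (the deck does not fix F; FA requires F to be monotone), and random access is simulated with one dictionary per list. The function name `fagin_top_k` is illustrative.

```python
import heapq

# One sorted list per attribute, each in descending score order (from the slide).
lists = [
    [("O3", 99), ("O1", 66), ("O0", 63), ("O2", 48), ("O4", 44)],  # a1
    [("O1", 91), ("O3", 90), ("O0", 61), ("O4", 7),  ("O2", 1)],   # a2
    [("O1", 92), ("O3", 75), ("O4", 70), ("O2", 16), ("O0", 1)],   # a3
    [("O3", 74), ("O1", 56), ("O0", 56), ("O2", 28), ("O4", 19)],  # a4
    [("O3", 67), ("O4", 67), ("O1", 58), ("O2", 54), ("O0", 35)],  # a5
]

def fagin_top_k(lists, k, agg=sum):
    """Return (top-k list of (object, score), depth reached by sorted access)."""
    m = len(lists)
    lookup = [dict(lst) for lst in lists]  # simulated random access
    seen_in = {}                           # object -> set of lists it was seen in
    depth = 0
    # Step 1: sorted access in parallel until k objects were seen in all lists.
    while depth < len(lists[0]):
        for i, lst in enumerate(lists):
            obj, _ = lst[depth]
            seen_in.setdefault(obj, set()).add(i)
        depth += 1
        if sum(1 for s in seen_in.values() if len(s) == m) >= k:
            break
    # Steps 2 and 3: random access for all seen objects, then aggregate.
    scores = {obj: agg(lookup[i][obj] for i in range(m)) for obj in seen_in}
    return heapq.nlargest(k, scores.items(), key=lambda kv: kv[1]), depth

result, depth = fagin_top_k(lists, 2)
print(result)  # [('O3', 405), ('O1', 363)]
print(depth)   # 3
```

For k = 2, sorted access stops after three rows: O3 has been seen in all five lists after two rows, and O1 completes after the third. O2 is never seen, and its total cannot exceed that of the objects already seen in every list because the lists are sorted and the sum is monotone.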
