Geometric Top-k Processing: Updates since MDM'16 [Advanced Seminar]

Geometric Top-k Processing: Updates since MDM'16 [Advanced Seminar] Kyriakos Mouratidis Singapore Management University MDM 2019

Introduction • Top-k query: shortlists top options from a set of alternatives • E.g. tripadvisor.com – rate (and browse) hotels according to price, cleanliness, location, service, etc. • A user's criteria: price, cleanliness and service, with different weights • Weights could be captured by slide-bars

Introduction • Slide-bar locations → numerical weights • We call q = <0.8, 0.3, 0.5> the query vector – and its domain the query space or preference space • A linear function ranks hotels (i.e. options) – score = 0.8·price + 0.3·clean + 0.5·service – if option r is seen as a vector, score = dot product r·q • The top-k options are returned (e.g. the top-10) • Top-k processing is well-studied – E.g. [Fagin01, Tao07] for processing w/o & w/ index – Excellent survey [Ilyas08]
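The scoring and shortlisting described above can be sketched in a few lines of Python. The hotel names and attribute values are made up for illustration, and the sketch assumes every attribute is already scaled to [0, 1] with higher meaning better. It is a brute-force scan, not one of the indexed methods cited on the slide.

```python
import heapq


def score(option, q):
    """Dot product of an option's attribute vector with the query vector."""
    return sum(a * w for a, w in zip(option, q))


def top_k(options, q, k):
    """Return the names of the k highest-scoring options, best first."""
    return heapq.nlargest(k, options, key=lambda name: score(options[name], q))


# Hypothetical hotels with attributes (price, clean, service); higher is better.
hotels = {
    "A": (0.9, 0.4, 0.5),
    "B": (0.5, 0.9, 0.6),
    "C": (0.7, 0.7, 0.9),
    "D": (0.2, 0.3, 0.4),
}
q = (0.8, 0.3, 0.5)   # query vector from the slide
result = top_k(hotels, q, 2)
print(result)
```

For larger datasets one would replace the full scan with a threshold-style or index-based method; the scan is used here only because it makes the definition explicit.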

Top-k as sweeping the data space [Tsaparas03] • Assume all query weights are positive • …and each option attribute is in range [0,1] • Example for d = 2 (showing: data space) • Sweeping line normal to vector q • Sweeps from top-corner (1,1) towards origin • The order in which options are met ↔ their order in the ranking! – E.g. top-2 = {r1, r2} • At the current position: every option above (below) the line has a higher (lower) score than r2

Notes on dim/nality of query domain • The ranking depends only on the orientation of the sweeping line (or hyper-plane, in higher dim.) – query vector <0.8,0.3,0.5> has the same effect as <8,3,5> • ⇒ we can normalize q so that the sum of weights is 1 (without affecting the top-k semantics at all) – e.g. in 2-D we can rewrite the scoring function as S(r) = α·x1 + (1-α)·x2 • This reduces the dim/nality of the query domain by 1 – Geom. operations in the query domain become faster • We'll ignore this in the following for simplicity
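A quick way to see that only the direction of q matters: the sketch below ranks a small, made-up set of options under <0.8, 0.3, 0.5>, under <8, 3, 5>, and under the normalized vector whose weights sum to 1. The option values and helper names are assumptions for illustration.

```python
def normalize(q):
    """Scale a positive query vector so that its weights sum to 1."""
    total = sum(q)
    return tuple(w / total for w in q)


def ranking(options, q):
    """Names of all options in decreasing score order."""
    def score(name):
        return sum(a * w for a, w in zip(options[name], q))
    return sorted(options, key=score, reverse=True)


# Hypothetical options with three attributes each.
options = {
    "A": (0.9, 0.4, 0.5),
    "B": (0.5, 0.9, 0.6),
    "C": (0.7, 0.7, 0.9),
    "D": (0.2, 0.3, 0.4),
}

base = ranking(options, (0.8, 0.3, 0.5))
scaled = ranking(options, (8, 3, 5))
normalized = ranking(options, normalize((0.8, 0.3, 0.5)))
print(base, scaled, normalized)
```

All three calls return the same order, because multiplying q by a positive constant multiplies every score by that constant.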

Relationship to Convex Hull • Convex Hull: The smallest convex polytope that includes a set of points (options) • Fact: The top-1 option for any query vector is on the hull! – [Dantzig63]: LP text [Figure: options r1 to r15 in the data space (axes x1, x2) with their convex hull]

[Börzsönyi01, Papadias03]: Skyline • Dominance: option r1 dominates r2 iff it has higher values in all dimensions [ignore ties] • ⇒ S(r1) > S(r2) ∀q • Skyline: all opts. that aren't dominated • Includes top-1 ∀q • k-skyband: all opts. not dominated by k or more others • Includes top-k ∀q [Figure: three panels showing options r1 to r15 in the data space (axes x1, x2)]
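The two definitions translate directly into a quadratic-time filter. The sketch below is a naive illustration under the slide's convention (higher values are better, ties ignored), not the algorithms of the cited papers; the sample points are made up.

```python
def dominates(a, b):
    """True if a has a higher value than b in every dimension."""
    return all(x > y for x, y in zip(a, b))


def k_skyband(options, k):
    """Names of options dominated by fewer than k others (k = 1 is the skyline)."""
    return [
        name
        for name, vec in options.items()
        if sum(dominates(other, vec) for other in options.values()) < k
    ]


# Hypothetical 2-D options.
points = {
    "r1": (0.2, 0.9),
    "r2": (0.6, 0.7),
    "r3": (0.9, 0.3),
    "r4": (0.5, 0.5),   # dominated by r2 only
    "r5": (0.1, 0.2),   # dominated by all four others
}

skyline = k_skyband(points, 1)
skyband_2 = k_skyband(points, 2)
print(skyline, skyband_2)
```

Because a dominator outscores its dominee for every positive query vector, an option with k or more dominators can never enter a top-k result, which is why the k-skyband is a safe superset of all possible top-k answers.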

[Zhang14]: Global Immutable Region • Global Immutable Region (GIR) – The maximal region around query vector q where the top-k result remains the same • Order within result retained – i.e. S(r_1) > S(r_2), S(r_2) > S(r_3), …, S(r_{k-1}) > S(r_k) – k-1 conditions (O-conditions) • Non-results cannot overtake r_k – i.e. S(r_k) > S(r) for every non-result r – n-k conditions (NR-conditions) • Observation: each condition ↔ a half-space!
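Each condition S(a) > S(b) is the same as (a − b)·q > 0, so the GIR is described by n−1 normal vectors. The sketch below only builds those normals and tests whether a candidate query vector satisfies all of them; it does not compute the region's geometry, and the function names and data are illustrative assumptions.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def diff(a, b):
    return tuple(x - y for x, y in zip(a, b))


def gir_normals(result, non_results):
    """Normals n such that q' keeps the same ordered top-k iff n·q' > 0 for all n.

    result: top-k option vectors, best first; non_results: all other vectors.
    """
    normals = [diff(better, worse) for better, worse in zip(result, result[1:])]  # O-conditions
    normals += [diff(result[-1], r) for r in non_results]                          # NR-conditions
    return normals


def in_gir(normals, q):
    return all(dot(n, q) > 0 for n in normals)


# Hypothetical data: with q = (0.7, 0.3) the top-2 result is [a, b].
a, b, c, d = (0.9, 0.2), (0.6, 0.6), (0.2, 0.8), (0.1, 0.1)
normals = gir_normals([a, b], [c, d])
print(len(normals), in_gir(normals, (0.7, 0.3)), in_gir(normals, (0.5, 0.5)))
```

At (0.5, 0.5) option b overtakes a, so the first O-condition fails and the vector lies outside the GIR.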

[Zhang14]: Global Immutable Region • Each condition ↔ a half-space! • Intersect all half-spaces • Cost: O(n^(d/2)) • Problem: Too expensive • Idea: limit no. of NR-conditions! [Figure: half-spaces in the query space, e.g. h1-2]

[Zhang14]: Global Immutable Region • Answer: Every query vector in the shaded area (GIR) • Applications: – Result stability – E.g. the volume of the GIR (as a fraction of the query space) equals the probability that a uniformly random query vector returns the same result as q – Result caching – Weight readjustment

[Asudeh18]: Result stability • Given a total ranking of the dataset w.r.t. q • They use GIR volume as a measure of stability • Allowing q to move in a region R in pref. space • They report total rankings in decreasing stability order (i.e., decreasing GIR volume) • Their approach relies on sampling (i.e., is approximate) with a probabilistic accuracy analysis
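To make the sampling idea concrete, here is a minimal Monte Carlo sketch: it draws query vectors uniformly from an axis-aligned box R and reports the fraction that reproduce the ordered top-k result of q. This is only an illustration of estimating a region's relative volume by sampling. It is not the algorithm of [Asudeh18], and the box-shaped R, the data, and all names are assumptions.

```python
import random


def score(option, q):
    return sum(a * w for a, w in zip(option, q))


def top_k(options, q, k):
    """Ordered top-k result as a tuple of option names."""
    return tuple(sorted(options, key=lambda n: score(options[n], q), reverse=True)[:k])


def stability(options, q, k, region, samples=2000, seed=0):
    """Fraction of query vectors sampled from the box `region` with the same top-k as q."""
    rng = random.Random(seed)          # seeded for repeatability
    reference = top_k(options, q, k)
    hits = 0
    for _ in range(samples):
        q_s = tuple(rng.uniform(lo, hi) for lo, hi in region)
        hits += top_k(options, q_s, k) == reference
    return hits / samples


# Hypothetical options and region R = [0.1, 1] x [0.1, 1].
options = {"a": (0.9, 0.2), "b": (0.6, 0.6), "c": (0.2, 0.8), "d": (0.1, 0.1)}
estimate = stability(options, (0.7, 0.3), 2, [(0.1, 1.0), (0.1, 1.0)])
print(estimate)
```

The estimate converges to the share of R covered by the GIR of q as the number of samples grows.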

[Mouratidis15]: MaxRank • MaxRank query : given a focal option p , find: 1. The highest rank p may achieve under any possible user preference, and 2. All the regions in the preference space where that rank is attained

[Vlachou10 & 11]: Reverse top-k query • Bichromatic (main focus): Given a focal option p, a set of options, and a set of top-k queries, identify the queries that have p in their result – Algebraic bounds based on MBRs • Monochromatic: Given a focal option p and a set of options, find all regions in pref. space where p is in the top-k result – Solution only for 2-D

[Vlachou10 & 11]: Reverse top-k query • Monochromatic RTOP-k in 2-D • S(r) = α·x1 + (1-α)·x2 • Every intersection of the scoreline of p ↔ 1 reordering • Plane sweep algo. [Figure: scorelines S(r) of p and r1 to r5 plotted against α ∈ [0, 1], with ticks at 0, 0.2, 0.4, 0.6 and 1; order of p across the intervals: 3, 4, 3, 4]
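The 2-D case can be sketched as follows: each other option's scoreline crosses that of p at most once, the crossings cut [0, 1] into intervals, and the rank of p is constant inside each interval. The code below evaluates the rank at every interval midpoint rather than updating it incrementally as a true plane sweep would; the function names and data are assumptions for illustration.

```python
def score(r, alpha):
    """S(r) = alpha * x1 + (1 - alpha) * x2."""
    return alpha * r[0] + (1 - alpha) * r[1]


def reverse_top_k_2d(p, others, k):
    """Intervals (lo, hi, rank) of alpha in [0, 1] where p ranks within the top-k."""
    cuts = {0.0, 1.0}
    for r in others:
        dx, dy = r[0] - p[0], r[1] - p[1]
        if dx != dy:                       # parallel scorelines never cross
            alpha = dy / (dy - dx)         # solves S(r) = S(p)
            if 0.0 < alpha < 1.0:
                cuts.add(alpha)
    cuts = sorted(cuts)
    regions = []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        rank = 1 + sum(score(r, mid) > score(p, mid) for r in others)
        if rank <= k:
            regions.append((lo, hi, rank))
    return regions


# Hypothetical focal option and competitors.
p = (0.5, 0.5)
others = [(1.0, 0.0), (0.25, 1.0), (0.75, 0.75)]
regions = reverse_top_k_2d(p, others, 3)
print(regions)
```

Here p ranks 3rd for α below 0.5 and above 2/3, and 4th in between, so the answer for k = 3 consists of two disjoint intervals.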

[Tang17]: k-Shortlist Preference Regions • Monochromatic RTOP-k for d ≥ 2 • aka: k-Shortlist Preference Regions (kSPR): – All regions in preference space where a given focal option p belongs to the top-k result

[Tang17]: kSPR Example • Preference space • Order of p • kSPR result for k = 3: – The shaded wedges – Every query vector in the shaded area ranks p among the top-3 options [Figure: preference space with both axes ranging from 0 to 1, annotated with the order of p; the shaded wedges are the kSPR result]

[Tang17]: Fast pruning • Dominees – ignore • Dominators – simply increment k* • Incomparable – How to deal with them? [Figure: Data Space (axes x1, x2) with focal option p, options r1 to r8, and the Dominators and Dominees regions]
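The three-way split relative to the focal option p can be sketched as below. It follows the dominance definition used earlier in the deck (higher in all dimensions, ties ignored); the option names and values are made up, and the function is an illustration rather than the paper's implementation.

```python
def classify(p, options):
    """Split options into dominators of p, dominees of p, and incomparable ones."""
    dominators, dominees, incomparable = [], [], []
    for name, r in options.items():
        if all(x > y for x, y in zip(r, p)):
            dominators.append(name)      # outscores p for every positive q
        elif all(x < y for x, y in zip(r, p)):
            dominees.append(name)        # never outscores p, can be ignored
        else:
            incomparable.append(name)    # outcome depends on q
    return dominators, dominees, incomparable


# Hypothetical data.
p = (0.5, 0.5)
options = {
    "r1": (0.8, 0.9),
    "r2": (0.2, 0.3),
    "r3": (0.9, 0.1),
    "r4": (0.1, 0.7),
}
dominators, dominees, incomparable = classify(p, options)
print(dominators, dominees, incomparable)
```

Only the incomparable options need geometric treatment, since the contribution of the other two groups to the rank of p is the same for every query vector.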

[Tang17]: kSPR • Consider a single incomparable opt. r • The score of r is higher than that of p iff the query vector is inside a half-space – The inequality S(r) > S(p) maps into a half-space in query space [Figure: the half-space in Query Space]

[Tang17]: Fundamentals • Idea: map each incomp. option to a h/s • Set of h/s including cell = set of options scoring higher than p • Count in each cell = no. of options that score higher than p • kSPR result for k=4: cells with count ≤ 3 [Figure: Half-space Arrangement in query space (axes q1, q2) with half-spaces h1 to h7 and the count of each cell]
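Since S(r) > S(p) is equivalent to (r − p)·q > 0, each incomparable option contributes one half-space with normal r − p, and the count of the cell containing a query vector q is the number of half-spaces that contain q. The sketch below computes that count pointwise instead of building the arrangement; the `dominators` parameter and all names are illustrative assumptions.

```python
def halfspace_normal(r, p):
    """Normal n of the half-space n·q > 0 inside which r scores higher than p."""
    return tuple(x - y for x, y in zip(r, p))


def cell_count(normals, q):
    """Number of incomparable options scoring higher than p at query vector q."""
    return sum(sum(n_i * q_i for n_i, q_i in zip(n, q)) > 0 for n in normals)


def in_kspr(normals, q, k, dominators=0):
    """True if p is in the top-k at q, i.e. fewer than k options outscore it."""
    return cell_count(normals, q) + dominators < k


# Hypothetical focal option and two incomparable options.
p = (0.5, 0.5)
normals = [halfspace_normal(r, p) for r in [(1.0, 0.0), (0.0, 0.75)]]
print(cell_count(normals, (0.75, 0.25)), cell_count(normals, (0.4, 0.6)))
```

With k = 4 and no dominators this reproduces the slide's rule that the result consists of the cells with count ≤ 3.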

[Tang17]: Cell Tree • Insert h/s one by one into a binary tree to maintain the arrangement • Insertion of h1 (root split into 2 leaves) • Insertion of h2 (each leaf split into two) • Notation: h_i^− : S(r_i) < S(p); h_i^+ : S(r_i) > S(p) [Figure: the arrangement and the Cell Tree after inserting h1 and h2]

[Tang17]: Cell Tree (3 h/s, k = 2) • Assume 3 h/s as shown in the figure • Cell Tree looks like: [Figure: arrangement of three half-spaces h1, h2, h3 and the corresponding Cell Tree, whose nodes are labelled with sets of half-spaces]

[Tang17]: Cell Representation (implicit) • Cell computation takes O(n^(d/2)) • Implicit representation by defining halfspaces: {h1^−, h2^−, h3^−, h4^+, h5^−, h6^+} • …even better, just the bounding ones: {h2^−, h6^+} • Trouble: how to detect infeasible cells? [Figure: half-space arrangement over the unit square with one cell highlighted]

[Tang17]: Case Study • kSPR (k=3) on real NBA data for Dwight Howard [Figure: two panels, Season: 2015-16 and Season: 2014-15, each with one axis for rebounds and one for points]

Uncertain Preferences • Literature assumes q is given and exact, but… • …whether manually input or mined, it could only be taken as a mere indication • If only approximate prefs., instead of exact q , use a region R in pref. space to allow for inaccuracies • [Ciaccia&Martinenghi17]: identify all possible top-1 options (k = 1) • [Mouratidis&Tang18]: identify all possible top-k options (k ≥ 1)

[Mouratidis&Tang18]: Uncertain Top-k • Given: approx. preferences ↔ region R in pref. space • UTK1: report all options that may be among the top-k when q ∈ R • UTK2: report the specific top-k set for any q ∈ R

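UTK: Example [Figure: the Dataset, and the UTK output for k = 2 (in preference space): Region R, with ticks 0.05 and 0.45 on w1 and 0.05 and 0.25 on w2, partitioned into cells labelled with the top-2 sets {p1, p2}, {p1, p6}, {p2, p4} and {p1, p4}]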

r-dominance; r-skyband • Consider options r1 and r2 • If S(r1) > S(r2) ∀q in R, then r1 r-dominates r2 • r-skyband: options r-dominated by fewer than k others • Good filtering, but still a superset of the UTK options [Figure: two panels showing region R in preference space (axes w1, w2)]
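Because S(r1) − S(r2) = (r1 − r2)·q is linear in q, its minimum over a convex region is attained on the boundary; for an axis-aligned box it can be computed dimension by dimension. The sketch below assumes R is such a box over the full weight vector, which is a simplification chosen for illustration, and all names and values are hypothetical.

```python
def r_dominates(r1, r2, region):
    """True if S(r1) > S(r2) for every q in the axis-aligned box `region`.

    region: one (lo, hi) pair of weight bounds per dimension.
    """
    # Minimise the linear function (r1 - r2)·q over the box, one dimension at a time.
    worst = sum(
        min((a - b) * lo, (a - b) * hi)
        for a, b, (lo, hi) in zip(r1, r2, region)
    )
    return worst > 0


def r_skyband(options, k, region):
    """Names of options r-dominated by fewer than k others."""
    return [
        name
        for name, vec in options.items()
        if sum(r_dominates(other, vec, region) for other in options.values()) < k
    ]


# Hypothetical options and region R: w1 in [0.5, 1], w2 in [0.25, 0.5].
options = {"a": (1.0, 0.25), "b": (0.5, 0.5), "c": (0.25, 0.125)}
R = [(0.5, 1.0), (0.25, 0.5)]
print(r_skyband(options, 1, R), r_skyband(options, 2, R))
```

Note that a does not dominate b in the classical sense (b is better in the second attribute), yet it r-dominates b inside R; this is why the r-skyband filters more aggressively than the k-skyband.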

UTK1 – Refinement (RSA) • For every remaining candidate r, determine if there is a position in R where r is in the top-k • Progressively consider competitors and recursively partition R by focusing only on promising regions • Use r-dominance relationships to prioritize competitors during verification of r [Figure: region R in preference space (axes w1, w2) recursively partitioned, with a count in each part]
