Top-k Aggregation Using Intersections
Ravi Kumar (Yahoo! Research), Kunal Punera (Yahoo! Research), Torsten Suel (Yahoo! Research / Brooklyn Poly), Sergei Vassilvitskii (Yahoo! Research)
Top-k retrieval

Given a set of documents, and a query: “New York City”, find the k documents best matching the query. Assume a decomposable scoring function:

Score(“New York City”) = Score(“New”) + Score(“York”) + Score(“City”)
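The decomposable-scoring assumption can be sketched in a few lines of Python. The per-term scores below are the illustrative values from the running example, not the output of a real scoring function.

```python
# Decomposable scoring: the score of a document for a multi-term query
# is simply the sum of its per-term scores (illustrative values).
term_scores = {
    "new":  {9: 5.2, 5: 4.0, 7: 3.3, 3: 1.0, 10: 0.0},
    "york": {10: 4.1, 9: 3.1, 7: 1.0, 5: 0.5, 1: 0.2},
    "city": {10: 2.0, 3: 1.5, 7: 1.0, 9: 0.2, 5: 0.1},
}

def score(doc_id, query_terms):
    # Terms missing from a document contribute 0 to the sum.
    return sum(term_scores[t].get(doc_id, 0.0) for t in query_terms)

print(score(9, ["new", "york", "city"]))   # doc 9: 5.2 + 3.1 + 0.2
```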
Data structures behind top-k retrieval. Create posting lists of (Doc ID, Score) pairs, one per term. Query: New York City

New:  9:5.2   5:4.0   7:3.3   3:1.0   10:0.0
York: 10:4.1  9:3.1   7:1.0   5:0.5   1:0.2
City: 10:2.0  3:1.5   7:1.0   9:0.2   5:0.1
(Offline) Sort each list by decreasing score. Retrieval: start with the document with the highest score in any list, and look up its score in the other lists.

Top: doc 9, score 5.2 + 3.1 + 0.2 = 8.5
Continue with the next highest score.

Top: doc 9 (8.5). Candidate: doc 10, score 4.1 + 2.0 + 0.0 = 6.1
Continue with the next highest score.

Top: doc 9 (8.5). Candidate: doc 5, score 4.0 + 0.5 + 0.1 = 4.6
When can we stop? When the best possible score of any remaining document falls below the current top.

Top: doc 9 (8.5). Best possible remaining: 3.3 + 1.5 + 1.0 = 5.8 < 8.5, so we can stop.
Threshold Algorithm (TA) [Fagin et al.]
– Instance optimal (in number of accesses)
– Performs random accesses

No-Random-Access Algorithm (NRA)
– Similar to TA, but keeps a list of all results seen so far
– Also instance optimal
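A minimal Python sketch of TA under these assumptions (lists sorted by decreasing score, plus a random-access lookup per list); this illustrates the algorithm's structure and is not the authors' implementation.

```python
import heapq

def threshold_algorithm(lists, k):
    """TA sketch: round-robin sorted access plus random-access lookups.

    lists: one posting list per query term, each a list of (doc_id, score)
    pairs sorted by decreasing score. Returns top-k (score, doc_id) pairs.
    """
    lookup = [dict(lst) for lst in lists]   # random-access structures
    seen, top = set(), []                   # top: min-heap of (score, doc_id)
    depth = 0
    while depth < max(len(lst) for lst in lists):
        for lst in lists:
            if depth >= len(lst):
                continue
            doc, _ = lst[depth]
            if doc in seen:
                continue
            seen.add(doc)
            total = sum(d.get(doc, 0.0) for d in lookup)  # random accesses
            heapq.heappush(top, (total, doc))
            if len(top) > k:
                heapq.heappop(top)
        depth += 1
        # Threshold: best possible score of any document not yet seen.
        threshold = sum(lst[depth][1] if depth < len(lst) else 0.0
                        for lst in lists)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)

new  = [(9, 5.2), (5, 4.0), (7, 3.3), (3, 1.0), (10, 0.0)]
york = [(10, 4.1), (9, 3.1), (7, 1.0), (5, 0.5), (1, 0.2)]
city = [(10, 2.0), (3, 1.5), (7, 1.0), (9, 0.2), (5, 0.1)]
print(threshold_algorithm([new, york, city], k=1))  # doc 9 (5.2 + 3.1 + 0.2)
```

On the running example the walk stops after two rounds of sorted access, well before exhausting the lists.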
Certain words often occur as phrases. Word association:
– Sagrada ...
– Barack ...
– Latent Semantic ...

Pre-compute posting lists for intersections (note: this is not query-result caching). Tradeoffs:
– Space: extra space to store the intersection (though it is smaller than the original lists)
– Time: less time at retrieval
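Building such an intersection list offline can be sketched as follows; the (doc id, score) format and the summed-score convention follow the running example, but the actual index format is an assumption.

```python
# Sketch of building a precomputed intersection ("bigram") posting list:
# keep only documents present in both lists, store the summed score,
# and sort by decreasing score, just like an ordinary posting list.
def intersect_lists(a, b):
    da, db = dict(a), dict(b)
    common = set(da) & set(db)
    merged = [(doc, da[doc] + db[doc]) for doc in common]
    return sorted(merged, key=lambda p: -p[1])

new  = [(9, 5.2), (5, 4.0), (7, 3.3), (3, 1.0), (10, 0.0)]
york = [(10, 4.1), (9, 3.1), (7, 1.0), (5, 0.5), (1, 0.2)]
print(intersect_lists(new, york))  # docs 9, 5, 7, 10, in that order
```

Note the intersection has only 4 entries here, versus 5 in each original list: that shrinkage is exactly the space/time tradeoff of the slide.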
Query: New York City. All aggregations: 6 lists. [New] [York] [City] [New York] [New City] [York City]

New:  9:5.2   5:4.0   7:3.3   3:1.0   10:0.0
York: 10:4.1  9:3.1   7:1.0   5:0.5   1:0.2
City: 10:2.0  3:1.5   7:1.0   9:0.2   5:0.1
NY:   9:8.3   5:4.5   7:4.3   10:4.1
NC:   9:5.4   7:4.3   5:4.1   3:2.5   10:2.0
YC:   10:6.1  9:3.3   7:2.0   5:0.6
Retrieval proceeds as before. Top: doc 9, score 8.5.
Can we stop now?
Top: doc 9 (8.5). Bounds on any unseen element:
– N + Y + C = 4.0 + 4.1 + 2.0 = 10.1
– NY + C = 4.5 + 2.0 = 6.5
– NC + Y = 4.3 + 4.1 = 8.4
– YC + N = 6.1 + 4.0 = 10.1
– 1/2 (NY + YC + NC) = (4.5 + 6.1 + 4.3) / 2 = 7.45

Thus the best unseen element has score at most 6.5 < 8.5. So we are done!
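The stopping test above amounts to taking the minimum over all ways of covering the three terms with single-list and pair-list bounds. A short sketch using the frontier values from the example (the b values are read off the slides' current state, so treat them as illustrative):

```python
# Refined stopping bound for a 3-term query: each unseen document's score
# is bounded by every "cover" of {N, Y, C} built from single-list bounds
# b_i and pair-list bounds b_ij; the tightest bound is the minimum.
b  = {"N": 4.0, "Y": 4.1, "C": 2.0}        # next unseen score, single lists
bp = {"NY": 4.5, "NC": 4.3, "YC": 6.1}     # next unseen score, pair lists

covers = [
    b["N"] + b["Y"] + b["C"],              # N + Y + C
    bp["NY"] + b["C"],                     # NY + C
    bp["NC"] + b["Y"],                     # NC + Y
    bp["YC"] + b["N"],                     # YC + N
    (bp["NY"] + bp["NC"] + bp["YC"]) / 2,  # half of all three pairs
]
best_bound = min(covers)
print(best_bound)  # 6.5: below the current top score 8.5, so we can stop
```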
Can we write the bound on the next element in general?
– x_i : score of document x in list i
– b_i : bound on the score in list i (score of the next unseen document)
– b_ij : bound on the combination x_i + x_j

Simple LP for a bound on unseen elements:
    max   Σ_i x_i
    s.t.  x_i ≤ b_i
          x_i + x_j ≤ b_ij
          x_i ≥ 0

In theory: Easy! Just solve an LP every time. In reality: You’re kidding, right?
Need to solve the LP:
    max   Σ_i x_i
    s.t.  x_i ≤ b_i,   x_i + x_j ≤ b_ij,   x_i ≥ 0

This is the same as solving its dual:
    min   Σ_i b_i y_i + Σ_ij b_ij y_ij
    s.t.  y_i + Σ_j y_ij ≥ 1   for each i
          y_i, y_ij ≥ 0
Interpret the dual as a graph problem:
– Add one node for each b_i, with weight y_i (single lists)
– Add one edge for each b_ij, with weight y_ij (paired lists)

Goal: select a (fractional) subset of edges and vertices, so that each vertex has (in total) a weight of 1 selected.

[Figure: graph with node costs b_i and edge costs b_ij]
Goal: select a subset of edges and vertices so that each vertex has a weight of 1 selected. This looks like the classical edge cover problem, except that vertices themselves may also be selected.

We show how to solve this problem by computing a min-cost matching. Running time: O(nm), versus O(n!) for checking all combinations.
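On tiny instances the dual LP can be brute-forced as a sanity check. Restricting the y variables to {0, 1/2, 1} is an assumption motivated by the half-of-all-pairs bound in the example; the paper's actual method is the min-cost matching reduction, not this enumeration.

```python
from itertools import product

# Brute-force the dual covering LP on the 3-term example, assuming an
# optimal solution is half-integral so y values lie in {0, 0.5, 1}.
singles = {"N": 4.0, "Y": 4.1, "C": 2.0}                     # b_i
pairs = {("N", "Y"): 4.5, ("N", "C"): 4.3, ("Y", "C"): 6.1}  # b_ij

terms = list(singles)
vals = (0.0, 0.5, 1.0)
best = float("inf")
for ys in product(vals, repeat=len(terms)):
    for ye in product(vals, repeat=len(pairs)):
        y = dict(zip(terms, ys))
        ypair = dict(zip(pairs, ye))
        # Feasibility: every term covered with total weight >= 1.
        if all(y[t] + sum(w for e, w in ypair.items() if t in e) >= 1
               for t in terms):
            cost = (sum(singles[t] * y[t] for t in terms) +
                    sum(pairs[e] * ypair[e] for e in pairs))
            best = min(best, cost)
print(best)  # 6.5: matches the tightest combination bound, NY + C
```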
– Introduction to TA
– Solving the ‘upper bound’ problem
– Empirical Results
– Conclusion
Datasets:
– TREC (25M pages), 100k queries
– Yahoo! (16M pages), 10k queries
(a random subset of queries in each)

Metrics:
– Number of random and sequential accesses
– Index size

Which bigrams to select?
– In a query-oblivious manner
– Greedily, based on the size of the intersection versus the size of the original lists
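One plausible reading of the greedy rule is to rank candidate bigrams by how small the intersection is relative to the shorter of the two original lists (a small ratio means a large saving per stored entry). The rule below is an illustrative sketch, not necessarily the exact criterion used in the experiments.

```python
# Hypothetical query-oblivious selection order for bigram intersection
# lists: smaller |A ∩ B| / min(|A|, |B|) ratios are preferred (selected
# first), since they promise the biggest pruning benefit per entry stored.
def selection_order(postings, candidates):
    def ratio(pair):
        a, b = pair
        inter = len(set(postings[a]) & set(postings[b]))
        return inter / min(len(postings[a]), len(postings[b]))
    return sorted(candidates, key=ratio)

postings = {
    "new":  {9, 5, 7, 3, 10},
    "york": {10, 9, 7, 5, 1},
    "city": {10, 3, 7, 9, 5},
}
order = selection_order(
    postings, [("new", "york"), ("new", "city"), ("york", "city")])
print(order)  # ("new", "city") comes last: its intersection is not smaller
```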
– Baseline: traverse the full lists
– INT: use intersection lists, but still no early termination
– ET: use early termination, but without intersection lists
– ET + INT: use both early termination and intersection lists

Total index growth: 25%
[Chart: number of sequential accesses per algorithm; y-axis up to 60,000 accesses]

Immediate benefit, but diminishing returns as extra intersections are added.
[Chart: number of sequential accesses vs. index size increase; y-axis up to 18,000 accesses]
We prove that in the worst case we must examine all of the lists to find the top-k results. But is this just a theoretical result? What if we use simpler heuristics that focus only on the intersection lists?
– For 89% of the queries: ...
– For the remaining 11% of the queries: ...

So the worst case does occur in practice.
– We give a formal analysis of how to use pre-aggregated posting lists; solving an LP directly is unreasonable, so we reduce the problem to min-cost matching
– We show empirically that a simple selection rule for intersections gives performance improvements

Many questions remain:
– Extending the results to tri-grams (solving hyperedge cover)
– Better ways of selecting intersections
– ...