

Ranking Distributed Probabilistic Data

Jeffrey Jestes, Feifei Li, Ke Yi

Introduction

Ranking queries are important tools used to return only the most significant results.
Ranking queries are arguably one of the most important tools for distributed applications.
Not surprisingly, many distributed applications, such as sensor networks with fuzzy measurements, are also inherently uncertain in nature.
Such applications may be best represented with probabilistic data.
Even though distributed probabilistic data is relatively common, there has been no prior research on how to rank distributed probabilistic data.

Attribute-Level Model of Uncertainty (with a scoring attribute)

tuples | score
t1 | X1 = {(v1,1, p1,1), (v1,2, p1,2), . . . , (v1,b1, p1,b1)}
t2 | X2 = {(v2,1, p2,1), . . . , (v2,b2, p2,b2)}
. . . | . . .
tN | XN = {(vN,1, pN,1), . . . , (vN,bN, pN,bN)}

Example Attribute-Level Uncertain Database

tuples | score
t1 | {(120, 0.8), (62, 0.2)}
t2 | {(103, 0.7), (70, 0.3)}
t3 | {(98, 1)}

world W | Pr[W]
{t1 = 120, t2 = 103, t3 = 98} | 0.8 × 0.7 × 1 = 0.56
{t1 = 120, t3 = 98, t2 = 70} | 0.8 × 0.3 × 1 = 0.24
{t2 = 103, t3 = 98, t1 = 62} | 0.2 × 0.7 × 1 = 0.14
{t3 = 98, t2 = 70, t1 = 62} | 0.2 × 0.3 × 1 = 0.06
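The possible-worlds semantics above can be made concrete with a short sketch. The snippet below is a minimal illustration of ours (not the paper's code), assuming each tuple's pdf is simply a list of (value, probability) pairs; it enumerates every possible world and its probability, reproducing the four worlds in the table.

```python
from itertools import product

def possible_worlds(tuples):
    """Enumerate (world, probability) pairs for attribute-level uncertain tuples.

    tuples: dict mapping a tuple id to its pdf, a list of (value, probability) pairs.
    Only practical for tiny examples: the number of worlds is the product of the pdf sizes.
    """
    ids = list(tuples)
    for combo in product(*(tuples[tid] for tid in ids)):
        world = {tid: v for tid, (v, _p) in zip(ids, combo)}
        prob = 1.0
        for _v, p in combo:
            prob *= p
        yield world, prob

db = {"t1": [(120, 0.8), (62, 0.2)],
      "t2": [(103, 0.7), (70, 0.3)],
      "t3": [(98, 1.0)]}
for world, prob in possible_worlds(db):
    print(world, round(prob, 2))   # probabilities 0.56, 0.24, 0.14, 0.06 as in the table
```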

Ranking (top-k) queries (with scores)

Very useful queries: rank by importance, rank by similarity, rank by relevance, k-nearest neighbors
U-topk: [Soliman, Ilyas, Chang, 07], [Yi, Li, Srivastava, Kollios, 08]
U-kRanks: [Soliman, Ilyas, Chang, 07], [Lian, Chen, 08], [Yi, Li, Srivastava, Kollios, 08]
PT-k: [Hua, Pei, Zhang, Lin, 08]
Global-topk: [Zhang, Chomicki, 08]
Expected ranks: [Cormode, Li, Yi, 09]

Ranking Query Properties

The slide's table compares the ranking definitions (U-topk, U-kRanks, PT-k, Global-topk, Expected Ranks) against five properties: Exact-k, Containment, Unique-Rank, Value-Invariant, and Stability. U-topk, U-kRanks, PT-k, and Global-topk each fail, or only weakly satisfy, at least one of these properties.

[Cormode, Li, Yi, 09] has proven that the Expected Ranks definition satisfies all of the above properties while no other definition does.

Expected Ranks

We can see a tuple t's rank distribution as a discrete distribution consisting of pairs (rank_W(t), Pr[W]) over all possible worlds W, where rank_W(t) is the rank of t in W.
The expectation of a distribution is an important statistical property and provides us important information about a tuple's rank distribution.
Formally, the expected rank of a tuple t_i, r(t_i), may be defined as

r(t_i) = \sum_{W \in \mathcal{W}} \Pr[W] \times \mathrm{rank}_W(t_i)    (1)

where
\mathrm{rank}_W(t_i) = |\{t_j \in W : w_{t_j} > w_{t_i}\}|
w_{t_i} = score attribute value of t_i in W
\mathcal{W} = the set of all possible worlds W

Expected Ranks Example

tuples | score
t1 | {(120, 0.8), (62, 0.2)}
t2 | {(103, 0.7), (70, 0.3)}
t3 | {(98, 1)}

world W | Pr[W]
{t1 = 120, t2 = 103, t3 = 98} | 0.8 × 0.7 × 1 = 0.56
{t1 = 120, t3 = 98, t2 = 70} | 0.8 × 0.3 × 1 = 0.24
{t2 = 103, t3 = 98, t1 = 62} | 0.2 × 0.7 × 1 = 0.14
{t3 = 98, t2 = 70, t1 = 62} | 0.2 × 0.3 × 1 = 0.06

tuple | r(tuple)
t1 | 0.56 × 0 + 0.24 × 0 + 0.14 × 2 + 0.06 × 2 = 0.4
t2 | 0.56 × 1 + 0.24 × 2 + 0.14 × 0 + 0.06 × 1 = 1.1
t3 | 0.56 × 2 + 0.24 × 1 + 0.14 × 1 + 0.06 × 0 = 1.5
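Definition (1) can be checked on this example by brute force over the possible worlds. The sketch below is illustrative only; it reuses the possible_worlds helper and the db example defined earlier and assumes higher scores rank better. It reproduces the expected ranks 0.4, 1.1, and 1.5.

```python
def expected_ranks_bruteforce(tuples):
    """Expected rank of every tuple via explicit possible-world enumeration (Eq. 1)."""
    ranks = {tid: 0.0 for tid in tuples}
    for world, prob in possible_worlds(tuples):
        for tid, score in world.items():
            # rank_W(t): number of tuples in this world with a strictly higher score
            ranks[tid] += prob * sum(1 for other in world.values() if other > score)
    return ranks

print(expected_ranks_bruteforce(db))   # {'t1': 0.4, 't2': 1.1, 't3': 1.5} (up to float rounding)
```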

Expected Ranks

It has been shown that r(t_i) may be written as

r(t_i) = \sum_{l=1}^{b_i} p_{i,l} \left( q(v_{i,l}) - \Pr[X_i > v_{i,l}] \right)    (2)

where
b_i = number of choices in the pdf of t_i
p_{i,l} = probability of choice l in tuple t_i
q(v_{i,l}) = \sum_j \Pr[X_j > v_{i,l}]
X_i = pdf of tuple t_i
\Pr[X_i > v_{i,l}] = contribution of t_i itself to q(v_{i,l})

q(v_{i,l}) is the sum of the probabilities that a tuple will outrank a tuple with score v_{i,l}.
X_i may contain value-probability pairs (v, p) with v > v_{i,l}; since the existence of t_i = v_{i,l} precludes t_i = v, we must subtract the corresponding p's from q(v_{i,l}).
Efficient algorithms exist to compute the expected ranks in O(N log N) time for a database of N tuples.

Computing Expected Ranks by q(v)'s

tuples | score
t1 | {(120, 0.8), (62, 0.2)}
t2 | {(103, 0.7), (70, 0.3)}
t3 | {(98, 1)}

Over the sorted score values 120, 103, 98, 70, 62 the q(v) curve takes the values 0, 0.8, 1.5, 2.5, 2.8, reaching the total probability mass 3.0 below the smallest value.

r(t1) = 0.8 × 0 + 0.2 × (2.8 − 0.8) = 0.4
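As a concrete illustration of the O(N log N) approach, here is a compact single-machine sketch of ours (not the authors' implementation), reusing the (value, probability) pdf representation from the earlier snippets. One sort plus one sweep yields q(v) for every distinct value; a second pass applies Eq. (2). On the running example it returns 0.4, 1.1, 1.5.

```python
def expected_ranks(tuples):
    """Expected ranks via Eq. (2): one sort + sweep for q(v), then one pass per tuple.

    tuples: list of pdfs; each pdf is a list of (value, probability) pairs.
    The sort dominates, so with constant-size pdfs this runs in O(N log N).
    """
    # All (value, probability) choices, largest value first.
    choices = sorted(((v, p) for pdf in tuples for v, p in pdf), reverse=True)
    q = {}              # q(v) = sum_j Pr[X_j > v]
    mass_above = 0.0    # probability mass of choices with value strictly greater than v
    idx, n = 0, len(choices)
    while idx < n:
        v = choices[idx][0]
        q[v] = mass_above
        while idx < n and choices[idx][0] == v:   # consume all choices tied at v
            mass_above += choices[idx][1]
            idx += 1
    ranks = []
    for pdf in tuples:
        r = 0.0
        for v, p in pdf:
            own_above = sum(p2 for v2, p2 in pdf if v2 > v)   # Pr[X_i > v], subtracted per Eq. (2)
            r += p * (q[v] - own_above)
        ranks.append(r)
    return ranks

print(expected_ranks([[(120, 0.8), (62, 0.2)], [(103, 0.7), (70, 0.3)], [(98, 1.0)]]))
# [0.4..., 1.1..., 1.5]
```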

Distributed Probabilistic Data Model

site 1: tuples t1,1, t1,2, . . . with score pdfs X1,1, X1,2, . . .
. . .
site m: tuples tm,1, tm,2, . . . with score pdfs Xm,1, Xm,2, . . .

We can think of the union of the individual databases Di at each site si as a conceptual database D, containing tuples t1, t2, . . . , tN.

Ranking Queries for Distributed Probabilistic Data

We introduce two frameworks for ranking queries over distributed probabilistic data:
Sorted Access on Expected Scores
Sorted Access on Local Ranks

Sorted Access on Local Ranks Framework

site 1: t1,1, t1,2, . . . , t1,n1    site 2: t2,1, t2,2, . . . , t2,n2    . . .    site m: tm,1, tm,2, . . . , tm,nm

Every site calculates the local ranks of its tuples and stores tuples in ascending order of local ranks.
The server accesses tuples in ascending order of local ranks and combines the local ranks to get the global ranks.

Local and Global Ranks

The local rank of a tuple ti,j at its own site si, with database Di, is

r(t_{i,j}, D_i) = \sum_{l=1}^{b_{i,j}} p_{i,j,l} \left( q_i(v_{i,j,l}) - \Pr[X_{i,j} > v_{i,j,l}] \right)    (3)

The local rank of a tuple ti,j at a site sy with database Dy, for y ≠ i, is

r(t_{i,j}, D_y) = \sum_{l=1}^{b_{i,j}} p_{i,j,l} \, q_y(v_{i,j,l})    (4)

The global rank of a tuple ti,j is

r(t_{i,j}) = \sum_{y=1}^{m} r(t_{i,j}, D_y)    (5)
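To make Eqs. (3)-(5) concrete, here is a small sketch of ours with illustrative names; pdfs are again lists of (value, probability) pairs. A tuple's local rank at its home site subtracts its own contribution from q_i, its local rank at a remote site does not, and the global rank is the sum over all sites.

```python
def q_site(site_pdfs, v):
    """q_i(v) = sum over the site's tuples of Pr[X_j > v]."""
    return sum(sum(p for val, p in pdf if val > v) for pdf in site_pdfs)

def local_rank(pdf, site_pdfs, at_home):
    """Eq. (3) when the tuple lives at this site (at_home=True), Eq. (4) otherwise."""
    r = 0.0
    for v, p in pdf:
        own = sum(p2 for v2, p2 in pdf if v2 > v) if at_home else 0.0
        r += p * (q_site(site_pdfs, v) - own)
    return r

def global_rank(pdf, home_site, sites):
    """Eq. (5): sum of the tuple's local ranks over all m sites."""
    return sum(local_rank(pdf, site, at_home=(y == home_site))
               for y, site in enumerate(sites))

# With t1 on site 0 and t2, t3 on site 1, t1's global rank matches its centralized
# expected rank of 0.4 from the earlier example.
sites = [[[(120, 0.8), (62, 0.2)]], [[(103, 0.7), (70, 0.3)], [(98, 1.0)]]]
print(global_rank(sites[0][0], 0, sites))   # 0.4
```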

Sorted Access on Local Ranks Initialization

site 1 (tuple, lrank): (t1,1, 1.2), (t1,2, 5.9), . . . , (t1,n1, 34.2)
site 2 (tuple, lrank): (t2,1, 2.3), (t2,2, 3.4), . . . , (t2,n2, 29.1)
site 3 (tuple, lrank): (t3,1, 0.8), (t3,2, 4.1), . . . , (t3,n3, 40.4)

Rep. Queue (tuple, lrank): (t3,1, 0.8), (t1,1, 1.2), (t2,1, 2.3)

Each site sends the head of its locally sorted list, and the server keeps these heads in the representative queue ordered by local rank.

Sorted Access on Local Ranks: a Round

Rep. Queue (tuple, lrank): (t2,2, 3.4), (t3,2, 4.1), (t1,2, 5.9)
top-2 Queue (tuple, grank): (t2,1, 5.4), (t1,1, 7.9)

In each round the server pops the tuple with the smallest local rank from the Rep. Queue (here t2,2 with lrank 3.4), and the owning site refills the queue with its next tuple (t2,3 with lrank 4.8).
The popped tuple's pdf X2,2 is sent to the other sites, which return its local ranks there (1.5 and 0.7 in the example).
Summing the local ranks gives the global rank of t2,2: 3.4 + 1.5 + 0.7 = 5.6, so t2,2 enters the top-2 Queue and evicts t1,1 (grank 7.9).
We can safely terminate whenever the largest grank from the top-k Queue is ≤ the smallest lrank from the Rep. Queue.
We denote this algorithm A-LR. A runnable sketch of this round loop follows below.
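The following single-process simulation of the A-LR round loop is a sketch under our own assumptions (each site is a plain list of (tuple_id, pdf) pairs already sorted by home local rank; all "communication" is just function calls), not the paper's implementation.

```python
import heapq

def lrank(pdf, site, exclude=None):
    # Local rank of `pdf` against one site's tuples: Eq. (3) if exclude is its own id, Eq. (4) otherwise.
    r = 0.0
    for v, p in pdf:
        qv = sum(p2 for tid, other in site if tid != exclude for v2, p2 in other if v2 > v)
        r += p * qv
    return r

def a_lr(sites, k):
    """Server-side A-LR loop. sites: list of [(tuple_id, pdf), ...] lists, each
    pre-sorted by home local rank (ascending). Returns the global top-k as (grank, id)."""
    cursors = [1] * len(sites)
    rep = []                                   # representative queue keyed by home lrank
    for s, site in enumerate(sites):
        tid, pdf = site[0]
        heapq.heappush(rep, (lrank(pdf, site, exclude=tid), s, tid, pdf))
    topk = []                                  # heap of (-grank, tuple_id), holds the k best
    while rep:
        # Termination: every unseen tuple's global rank is at least the smallest lrank here.
        if len(topk) == k and -topk[0][0] <= rep[0][0]:
            break
        lr, s, tid, pdf = heapq.heappop(rep)
        # The other sites report their local ranks for this tuple; the sum is its global rank.
        grank = lr + sum(lrank(pdf, other) for y, other in enumerate(sites) if y != s)
        heapq.heappush(topk, (-grank, tid))
        if len(topk) > k:
            heapq.heappop(topk)                # drop the current worst of the k+1
        if cursors[s] < len(sites[s]):         # the owning site sends its next tuple
            ntid, npdf = sites[s][cursors[s]]
            cursors[s] += 1
            heapq.heappush(rep, (lrank(npdf, sites[s], exclude=ntid), s, ntid, npdf))
    return sorted((-neg, tid) for neg, tid in topk)
```

With the representative queue keyed by descending expected score and the termination test replaced by the r+_λ ≤ r−_λ check discussed next, the same loop structure underlies the expected-scores framework.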

Sorted Access on Expected Scores Framework

site 1: t1,1, t1,2, . . . , t1,n1    site 2: t2,1, t2,2, . . . , t2,n2    . . .    site m: tm,1, tm,2, . . . , tm,nm

Every site calculates the local ranks and the expected scores of its tuples and stores the tuples in descending order of expected scores.
Tuples are accessed by descending order of expected scores and the server calculates global ranks.

Sorted Access on Expected Scores Initialization

site 1 (tuple, E[X]): (t1,1, 489), (t1,2, 421), . . . , (t1,n1, 5)
site 2 (tuple, E[X]): (t2,1, 476), (t2,2, 464), . . . , (t2,n2, 11)
site 3 (tuple, E[X]): (t3,1, 500), (t3,2, 432), . . . , (t3,n3, 1)

Rep. Queue (tuple, E[X]): (t3,1, 500), (t1,1, 489), (t2,1, 476)

Sorted Access on Expected Scores: a Round

Rep. Queue (tuple, E[X]): (t2,2, 464), (t3,2, 432), (t1,2, 421)
top-2 Queue (tuple, grank): (t2,1, 5.4), (t1,1, 7.9)

In each round the server pops the tuple with the largest expected score from the Rep. Queue (here t2,2 with E[X] = 464), and the owning site refills the queue with its next tuple (t2,3 with E[X] = 429).
The popped tuple's pdf X2,2 is sent to the sites, which report its local ranks (3.4, 1.5, and 0.7 in the example); summing them gives its global rank 5.6, so t2,2 enters the top-2 Queue, evicting t1,1.
Now the only question is when may we safely terminate and be certain we have the global top-k.

Sorted Access on Expected Scores: Termination

Rep. Queue (tuple, E[X]): (t3,2, 432), (t2,3, 429), (t1,2, 421)
top-2 Queue (tuple, grank): (t2,1, 5.4), (t2,2, 5.6)

The largest grank in the top-k Queue is clearly an upper bound r+_λ for the global rank any seen tuple t with pdf X may have and still be in the top-k at round λ.
The head of the Rep. Queue, with expectation τ, is an upper bound on the expectation of any unseen tuple t, i.e. E[X] ≤ τ.
How can we derive a lower bound r−_λ on the global rank of any unseen tuple t, such that when r+_λ ≤ r−_λ it is safe to terminate at round λ?

Sorted Access on Expected Scores: a Lower Bound?

We introduce two methods to find a lower bound r−_λ for any unseen tuple t at round λ:
Markov Inequality
Linear Programming

Markov Inequality Lower Bound

We know that the pdf of any unseen t must satisfy E[X] ≤ τ.
We can use the Markov Inequality to lower bound the rank of t at any site si with database Di as

r(t, D_i) = \sum_{j=1}^{n_i} \Pr[X_j > X]
          = n_i - \sum_{j=1}^{n_i} \Pr[X \ge X_j]
          \ge n_i - \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{ij}} p_{i,j,\ell} \frac{E[X]}{v_{i,j,\ell}}    (Markov Ineq.)
          \ge n_i - \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{ij}} p_{i,j,\ell} \frac{\tau}{v_{i,j,\ell}} = r^-(t, D_i).    (6)

Now the global rank r(t) must satisfy

r(t) \ge \sum_{i=1}^{m} r^-(t, D_i) = r^-_\lambda    (7)

Loose!
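A direct transcription of Eqs. (6)-(7) as code (our sketch; sites are again lists of (value, probability) pdfs, and all score values are assumed positive, as Markov's inequality requires):

```python
def markov_lower_bound(sites, tau):
    """r_lambda^- from Eqs. (6)-(7): a lower bound on the global rank of any
    unseen tuple whose expected score is at most tau."""
    bound = 0.0
    for site in sites:
        n_i = len(site)
        # Each site only needs n_i and sum_j sum_l p / v, which do not depend on tau.
        bound += n_i - tau * sum(p / v for pdf in site for v, p in pdf)
    return bound
```

Because the per-site quantity Σ p/v does not depend on τ, each site can ship it once and the server can re-evaluate this bound at every round, as the backup slide on the Markov bound notes.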

Linear Programming Lower Bound

Any unseen tuple t must have E[X] ≤ τ.
We've seen how to derive a lower bound r−_λ on the global rank of any unseen tuple t using Markov's Inequality.
We want to find as tight an r−_λ as possible by finding the smallest possible r−(t, Di) at each site.
We can use Linear Programming to derive the r−(t, Di) at each site and so obtain a tight r−_λ.

Linear Programming

The idea is to construct the best possible X for an unseen tuple t at each site si, i.e. the X that obtains the smallest possible local rank at si.
X could take on arbitrary vℓ's as its possible score values, some of which do not exist in the value universe Ui at a site si.
We can show this issue is irrelevant after studying the semantics of the r(t, Di)'s and the q(v)'s.

Linear Programming: a Note on q(v)'s

Recall that r(t_{i,j}, D_y) = \sum_{\ell=1}^{b_{i,j}} p_{i,j,\ell} \, q_y(v_{i,j,\ell}), and q(v) is essentially a staircase curve.
X may take a value vℓ not in Ui, with (say) v2 as its nearest left neighbor in Ui.
Even if X takes such a value vℓ, we can decrease vℓ until we hit v2 in Ui, and E[X] ≤ τ clearly still holds, as we are only decreasing the value of one of the choices in X.
Also note that during this transformation q(vℓ) = q(v2), so the local rank of t remains the same.

Linear Programming Formulation

Now we can assume X draws its values from Ui.
Then we can define a linear program with the constraints

0 \le p_\ell \le 1, \quad \ell = 1, \ldots, \gamma = |U_i|
p_1 + \cdots + p_\gamma = 1
p_1 v_1 + \cdots + p_\gamma v_\gamma \le \tau

and minimize the local rank, which is

r(X, D_i) = \sum_{\ell=1}^{\gamma} p_\ell \, q_i(v_\ell)

Each site can conduct this linear program at each round in order for the server to derive a tight r−_λ.
This algorithm is denoted A-LP.
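For illustration, the per-site LP can be set up with an off-the-shelf solver. The sketch below uses scipy.optimize.linprog (our choice of solver and variable names; the paper does not prescribe one): it returns the smallest local rank any unseen tuple with E[X] ≤ τ could attain at a site, given the site's value universe and its q_i values.

```python
import numpy as np
from scipy.optimize import linprog

def site_lp_lower_bound(values, q_values, tau):
    """Smallest local rank an unseen tuple with E[X] <= tau could achieve at one site.

    values:   the site's value universe U_i (v_1, ..., v_gamma)
    q_values: q_i(v) for each v in U_i
    tau:      expected-score threshold from the representative queue
    """
    gamma = len(values)
    c = np.asarray(q_values, dtype=float)                    # objective: sum_l p_l * q_i(v_l)
    A_ub = np.asarray(values, dtype=float).reshape(1, -1)    # E[X] = sum_l p_l v_l <= tau
    b_ub = np.array([tau])
    A_eq = np.ones((1, gamma))                               # probabilities sum to one
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * gamma, method="highs")
    # If the LP is infeasible (every value exceeds tau), fall back to the trivial bound 0.
    return res.fun if res.success else 0.0
```

Summing these per-site optima over all m sites gives the tight r−_λ used in the termination test.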

Eliminating LPs at distributed sites

In A-LP, each site solves its LP and reports r−(t, Di) to the server in every round.
The LPs could instead be moved to the server by having every site ship its q1(v), q2(v), . . . , qm(v) curve, but sending the exact curves is communication expensive.
Each site therefore sends a small approximate curve q∗1(v), q∗2(v), . . . , q∗m(v), and the server runs the LPs over these approximations.

q∗(v)'s: the approximate q(v)'s

Formally, the problem is to find the optimal approximation q∗(v) to a q(v) which obtains the smallest approximation error given a fixed budget η.
We also must ensure that by using these q∗(v)'s we still arrive at the actual global top-k at the server.

q∗(v)'s: the approximate q(v)'s

In the figure, a q∗(v) takes two points α′ and α′′ which are not right upper corner points in the original q(v); the shaded (blue) region between the two curves is the approximation error.
We can minimize the error between the q∗(v) curve and the q(v) curve by sampling only the right upper corner points.
The new, smaller error after selecting α′ as α3 and α′′ as α4 is again shown by the shaded region.

q∗(v)'s: the approximate q(v)'s

In order to find the optimal q∗(v) for a q(v) we can formulate a dynamic program

A(i, j) = \min\left( \min_{x \in [i-1, j-1]} \{ A(i-1, x) - \delta^{j}_{q^*(i-1,x)} \}, \; \min_{x \in [i, j-1]} \{ A(i, x) \} \right)    (8)

A(i, j) is the optimal approximation error from selecting i points from the first j points of q(v).
This algorithm is denoted A-ALP.

Updating the q∗(v)'s at the server.... for free

During some round, when the server retrieves a tuple t with pdf X from a site si to update the representative queue, the server may see a (vℓ, pℓ) pair in X such that vℓ was not an originally sampled upper right corner point from q(v); such a pair can be folded into q∗(v) at no extra communication cost (the update steps are detailed in the backup slides after The End).

Other Issues.... Latency

Currently we check the termination condition at the end of every round.
An intuitive idea is that we may reduce latency by checking the termination condition only after every β rounds.
This will reduce the computational burden at the server for the A-ALP algorithm, and reduce the computational burden at the sites for the A-LP algorithm.
The tradeoff is that we could potentially miss the optimal termination point, but not by more than β tuples.

Experimental Setup

Experiments were conducted on an Intel Xeon 5130 CPU @ 2GHz with 4GB memory.

We utilized three real data sets:
Movie data set from the Mystiq project containing 56,000 records
Temperature data set collected from 54 sensors at the Intel Research Berkeley lab containing 64,000 records
Chlorine data set from the EPANET project containing 67,000 records

We utilized one synthetic data set:
Synthetic Gaussian, where each record's score attribute draws its values from a Gaussian distribution with standard deviation σ in [1, 1000] and mean in [5σ, 100000]

Experimental Setup

The default experimental parameters are summarized below.

N: number of tuples (default 56,000)
|X|: choices in a tuple's pdf (default 5)
m: number of sites (default 10)
k: number of tuples to rank (default 100)
η: |q∗(v)| (default 1% × |q(v)|)

In addition, communication costs are determined as follows.

v (value): 4 bytes
p (probability): 4 bytes
X (pdf): |X| × 8 bytes
t (tuple): (|X| × 8) + 4 bytes
r(t, Di) (local rank of t in Di): 4 bytes
E[X] (expectation of X): 4 bytes
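The cost model above maps directly to a couple of small helpers (ours, for illustration; the byte sizes follow the table):

```python
def pdf_bytes(pdf):
    # each (value, probability) choice costs 4 + 4 bytes
    return len(pdf) * 8

def tuple_bytes(pdf):
    # a tuple costs its pdf plus 4 additional bytes, per the cost table
    return pdf_bytes(pdf) + 4
```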

Experimental Results (figure captions)

Communication Cost as k Varies (Chlorine Data Set)
Number of Rounds as k Varies (Chlorine Data Set)
Effect of N on Communication Cost (Chlorine Data Set)
Effect of N on Number of Rounds (Chlorine Data Set)
Effect of m on Communication Cost (Chlorine Data Set)
Effect of m on Number of Rounds (Chlorine Data Set)
Effect of η on Communication Cost (A-ALP Algorithm)
Effect of η on Number of Rounds (A-ALP Algorithm)
Effect of β on Communication Cost (Chlorine Data Set)
Effect of β on Number of Rounds (Chlorine Data Set)

Conclusions

We introduced computation- and communication-efficient algorithms to rank distributed probabilistic data using expected ranks.
For future work we would like to study ranking distributed probabilistic data under other ranking definitions, such as U-kRanks and the Parameterized Ranking Function (PRF); both PRF and U-kRanks rely upon a tuple's rank distribution, and we believe A-LP and A-ALP could be extended to support these two definitions.
In addition to ranking queries, we would like to study other popular queries, such as skyline or nearest neighbor queries, over distributed probabilistic data.

The End

THANK YOU

Q and A

The entire source code is available from a link at http:/ww2.cs.fsu.edu/˜jestes

Markov Inequality Lower Bound

r^-(t, D_i) = n_i - \tau \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{ij}} \frac{p_{i,j,\ell}}{v_{i,j,\ell}}    (9)

We have these invariants for each site si.
We only have to send these invariants one time to the server, and then the server can check the termination condition r+_λ ≤ r−_λ at the end of each round λ.
The Markov Inequality only gives us a loose r−_λ, and this leads directly to our next r−_λ derivation.

Updating the q∗(v)'s at the server.... for free

During some round the server may see a (vℓ, pℓ) pair from an X such that vℓ was not an originally sampled upper right corner point of q(v).
We create a new α upper right corner point for vℓ, taking the value of its nearest right neighboring point.
The new α point is raised by pℓ.
Any α points to the left of vℓ which were not included in the original sampled q∗(v) are also incremented by pℓ.
We stop when we hit the first α point included in the original q∗(v).

Additional Experimental Results (figure captions)

Communication Cost as k Varies (Synthetic Gaussian, Movie, and Temperature Data Sets)
Number of Rounds as k Varies (Synthetic Gaussian, Movie, and Temperature Data Sets)
Effect of |X| on Communication Cost (Synthetic Gaussian Data Set)
Effect of |X| on Number of Rounds (Synthetic Gaussian Data Set)
Effect of ρ on Communication Cost (Chlorine Data Set)
Effect of ρ on Number of Rounds (Chlorine Data Set)