

Ranking Distributed Probabilistic Data

Jeffrey Jestes, Feifei Li, Ke Yi

Introduction

Ranking queries are important tools used to return only the most significant results.
Ranking queries are arguably one of the most important tools for distributed applications.
Not surprisingly, many distributed applications, such as sensor networks with fuzzy measurements, are also inherently uncertain in nature.
Such applications may be best represented with probabilistic data.
Even though distributed probabilistic data is relatively common, there has been no prior research on how to rank distributed probabilistic data.

Attribute-Level Model of Uncertainty (with a scoring attribute)

tuples | score
t1 | X1 = {(v1,1, p1,1), (v1,2, p1,2), . . . , (v1,b1, p1,b1)}
t2 | X2 = {(v2,1, p2,1), . . . , (v2,b2, p2,b2)}
. . . | . . .
tN | XN = {(vN,1, pN,1), . . . , (vN,bN, pN,bN)}

Example Attribute-Level Uncertain Database

tuples | score
t1 | {(120, 0.8), (62, 0.2)}
t2 | {(103, 0.7), (70, 0.3)}
t3 | {(98, 1)}

world W | Pr[W]
{t1 = 120, t2 = 103, t3 = 98} | 0.8 × 0.7 × 1 = 0.56
{t1 = 120, t3 = 98, t2 = 70} | 0.8 × 0.3 × 1 = 0.24
{t2 = 103, t3 = 98, t1 = 62} | 0.2 × 0.7 × 1 = 0.14
{t3 = 98, t2 = 70, t1 = 62} | 0.2 × 0.3 × 1 = 0.06
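The possible-worlds semantics above can be made concrete with a short sketch. The snippet below is a minimal illustration of ours (not the paper's code), assuming each tuple's pdf is simply a list of (value, probability) pairs; it enumerates every possible world and its probability, reproducing the four worlds in the table.

```python
from itertools import product

def possible_worlds(tuples):
    """Enumerate (world, probability) pairs for attribute-level uncertain tuples.

    tuples: dict mapping a tuple id to its pdf, a list of (value, probability) pairs.
    Only practical for tiny examples: the number of worlds is the product of the pdf sizes.
    """
    ids = list(tuples)
    for combo in product(*(tuples[tid] for tid in ids)):
        world = {tid: v for tid, (v, _p) in zip(ids, combo)}
        prob = 1.0
        for _v, p in combo:
            prob *= p
        yield world, prob

db = {"t1": [(120, 0.8), (62, 0.2)],
      "t2": [(103, 0.7), (70, 0.3)],
      "t3": [(98, 1.0)]}
for world, prob in possible_worlds(db):
    print(world, round(prob, 2))   # probabilities 0.56, 0.24, 0.14, 0.06 as in the table
```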

Ranking (top-k) queries (with scores)

Very useful queries: rank by importance, rank by similarity, rank by relevance, k-nearest neighbors
U-topk: [Soliman, Ilyas, Chang, 07], [Yi, Li, Srivastava, Kollios, 08]
U-kRanks: [Soliman, Ilyas, Chang, 07], [Lian, Chen, 08], [Yi, Li, Srivastava, Kollios, 08]
PT-k: [Hua, Pei, Zhang, Lin, 08]
Global-topk: [Zhang, Chomicki, 08]
Expected ranks: [Cormode, Li, Yi, 09]

Ranking Query Properties

The slide's table compares the ranking definitions (U-topk, U-kRanks, PT-k, Global-topk, Expected Ranks) against five properties: Exact-k, Containment, Unique-Rank, Value-Invariant, and Stability. U-topk, U-kRanks, PT-k, and Global-topk each fail, or only weakly satisfy, at least one of these properties.

[Cormode, Li, Yi, 09] has proven that the Expected Ranks definition satisfies all of the above properties while no other definition does.

Expected Ranks

We can see a tuple t's rank distribution as a discrete distribution consisting of pairs (rank_W(t), Pr[W]) over all possible worlds W, where rank_W(t) is the rank of t in W.
The expectation of a distribution is an important statistical property and provides us important information about a tuple's rank distribution.
Formally, the expected rank of a tuple t_i, r(t_i), may be defined as

r(t_i) = \sum_{W \in \mathcal{W}} \Pr[W] \times \mathrm{rank}_W(t_i)    (1)

where
\mathrm{rank}_W(t_i) = |\{t_j \in W : w_{t_j} > w_{t_i}\}|
w_{t_i} = score attribute value of t_i in W
\mathcal{W} = the set of all possible worlds W

Expected Ranks Example

tuples | score
t1 | {(120, 0.8), (62, 0.2)}
t2 | {(103, 0.7), (70, 0.3)}
t3 | {(98, 1)}

world W | Pr[W]
{t1 = 120, t2 = 103, t3 = 98} | 0.8 × 0.7 × 1 = 0.56
{t1 = 120, t3 = 98, t2 = 70} | 0.8 × 0.3 × 1 = 0.24
{t2 = 103, t3 = 98, t1 = 62} | 0.2 × 0.7 × 1 = 0.14
{t3 = 98, t2 = 70, t1 = 62} | 0.2 × 0.3 × 1 = 0.06

tuple | r(tuple)
t1 | 0.56 × 0 + 0.24 × 0 + 0.14 × 2 + 0.06 × 2 = 0.4
t2 | 0.56 × 1 + 0.24 × 2 + 0.14 × 0 + 0.06 × 1 = 1.1
t3 | 0.56 × 2 + 0.24 × 1 + 0.14 × 1 + 0.06 × 0 = 1.5
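Definition (1) can be checked on this example by brute force over the possible worlds. The sketch below is illustrative only; it reuses the possible_worlds helper and the db example defined earlier and assumes higher scores rank better. It reproduces the expected ranks 0.4, 1.1, and 1.5.

```python
def expected_ranks_bruteforce(tuples):
    """Expected rank of every tuple via explicit possible-world enumeration (Eq. 1)."""
    ranks = {tid: 0.0 for tid in tuples}
    for world, prob in possible_worlds(tuples):
        for tid, score in world.items():
            # rank_W(t): number of tuples in this world with a strictly higher score
            ranks[tid] += prob * sum(1 for other in world.values() if other > score)
    return ranks

print(expected_ranks_bruteforce(db))   # {'t1': 0.4, 't2': 1.1, 't3': 1.5} (up to float rounding)
```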

Expected Ranks

It has been shown that r(t_i) may be written as

r(t_i) = \sum_{l=1}^{b_i} p_{i,l} \left( q(v_{i,l}) - \Pr[X_i > v_{i,l}] \right)    (2)

where
b_i = number of choices in the pdf of t_i
p_{i,l} = probability of choice l in tuple t_i
q(v_{i,l}) = \sum_j \Pr[X_j > v_{i,l}]
X_i = pdf of tuple t_i
\Pr[X_i > v_{i,l}] = contribution of t_i itself to q(v_{i,l})

q(v_{i,l}) is the sum of the probabilities that a tuple will outrank a tuple with score v_{i,l}.
X_i may contain value-probability pairs (v, p) with v > v_{i,l}; since the existence of t_i = v_{i,l} precludes t_i = v, we must subtract the corresponding p's from q(v_{i,l}).
Efficient algorithms exist to compute the expected ranks in O(N log N) time for a database of N tuples.

Computing Expected Ranks by q(v)'s

tuples | score
t1 | {(120, 0.8), (62, 0.2)}
t2 | {(103, 0.7), (70, 0.3)}
t3 | {(98, 1)}

Over the sorted score values 120, 103, 98, 70, 62 the q(v) curve takes the values 0, 0.8, 1.5, 2.5, 2.8, reaching the total probability mass 3.0 below the smallest value.

r(t1) = 0.8 × 0 + 0.2 × (2.8 − 0.8) = 0.4
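As a concrete illustration of the O(N log N) approach, here is a compact single-machine sketch of ours (not the authors' implementation), reusing the (value, probability) pdf representation from the earlier snippets. One sort plus one sweep yields q(v) for every distinct value; a second pass applies Eq. (2). On the running example it returns 0.4, 1.1, 1.5.

```python
def expected_ranks(tuples):
    """Expected ranks via Eq. (2): one sort + sweep for q(v), then one pass per tuple.

    tuples: list of pdfs; each pdf is a list of (value, probability) pairs.
    The sort dominates, so with constant-size pdfs this runs in O(N log N).
    """
    # All (value, probability) choices, largest value first.
    choices = sorted(((v, p) for pdf in tuples for v, p in pdf), reverse=True)
    q = {}              # q(v) = sum_j Pr[X_j > v]
    mass_above = 0.0    # probability mass of choices with value strictly greater than v
    idx, n = 0, len(choices)
    while idx < n:
        v = choices[idx][0]
        q[v] = mass_above
        while idx < n and choices[idx][0] == v:   # consume all choices tied at v
            mass_above += choices[idx][1]
            idx += 1
    ranks = []
    for pdf in tuples:
        r = 0.0
        for v, p in pdf:
            own_above = sum(p2 for v2, p2 in pdf if v2 > v)   # Pr[X_i > v], subtracted per Eq. (2)
            r += p * (q[v] - own_above)
        ranks.append(r)
    return ranks

print(expected_ranks([[(120, 0.8), (62, 0.2)], [(103, 0.7), (70, 0.3)], [(98, 1.0)]]))
# [0.4..., 1.1..., 1.5]
```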

Distributed Probabilistic Data Model

site 1: tuples t1,1, t1,2, . . . with score pdfs X1,1, X1,2, . . .
. . .
site m: tuples tm,1, tm,2, . . . with score pdfs Xm,1, Xm,2, . . .

We can think of the union of the individual databases Di at each site si as a conceptual database D, containing tuples t1, t2, . . . , tN.

Ranking Queries for Distributed Probabilistic Data

We introduce two frameworks for ranking queries over distributed probabilistic data:
Sorted Access on Expected Scores
Sorted Access on Local Ranks

Sorted Access on Local Ranks Framework

site 1: t1,1, t1,2, . . . , t1,n1    site 2: t2,1, t2,2, . . . , t2,n2    . . .    site m: tm,1, tm,2, . . . , tm,nm

Every site calculates the local ranks of its tuples and stores tuples in ascending order of local ranks.
The server accesses tuples in ascending order of local ranks and combines the local ranks to get the global ranks.

Local and Global Ranks

The local rank of a tuple ti,j at its own site si, with database Di, is

r(t_{i,j}, D_i) = \sum_{l=1}^{b_{i,j}} p_{i,j,l} \left( q_i(v_{i,j,l}) - \Pr[X_{i,j} > v_{i,j,l}] \right)    (3)

The local rank of a tuple ti,j at a site sy with database Dy, for y ≠ i, is

r(t_{i,j}, D_y) = \sum_{l=1}^{b_{i,j}} p_{i,j,l} \, q_y(v_{i,j,l})    (4)

The global rank of a tuple ti,j is

r(t_{i,j}) = \sum_{y=1}^{m} r(t_{i,j}, D_y)    (5)
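To make Eqs. (3)-(5) concrete, here is a small sketch of ours with illustrative names; pdfs are again lists of (value, probability) pairs. A tuple's local rank at its home site subtracts its own contribution from q_i, its local rank at a remote site does not, and the global rank is the sum over all sites.

```python
def q_site(site_pdfs, v):
    """q_i(v) = sum over the site's tuples of Pr[X_j > v]."""
    return sum(sum(p for val, p in pdf if val > v) for pdf in site_pdfs)

def local_rank(pdf, site_pdfs, at_home):
    """Eq. (3) when the tuple lives at this site (at_home=True), Eq. (4) otherwise."""
    r = 0.0
    for v, p in pdf:
        own = sum(p2 for v2, p2 in pdf if v2 > v) if at_home else 0.0
        r += p * (q_site(site_pdfs, v) - own)
    return r

def global_rank(pdf, home_site, sites):
    """Eq. (5): sum of the tuple's local ranks over all m sites."""
    return sum(local_rank(pdf, site, at_home=(y == home_site))
               for y, site in enumerate(sites))

# With t1 on site 0 and t2, t3 on site 1, t1's global rank matches its centralized
# expected rank of 0.4 from the earlier example.
sites = [[[(120, 0.8), (62, 0.2)]], [[(103, 0.7), (70, 0.3)], [(98, 1.0)]]]
print(global_rank(sites[0][0], 0, sites))   # 0.4
```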

Sorted Access on Local Ranks Initialization

site 1 (tuple, lrank): (t1,1, 1.2), (t1,2, 5.9), . . . , (t1,n1, 34.2)
site 2 (tuple, lrank): (t2,1, 2.3), (t2,2, 3.4), . . . , (t2,n2, 29.1)
site 3 (tuple, lrank): (t3,1, 0.8), (t3,2, 4.1), . . . , (t3,n3, 40.4)

Rep. Queue (tuple, lrank): (t3,1, 0.8), (t1,1, 1.2), (t2,1, 2.3)

Each site sends the head of its locally sorted list, and the server keeps these heads in the representative queue ordered by local rank.

Sorted Access on Local Ranks: a Round

Rep. Queue (tuple, lrank): (t2,2, 3.4), (t3,2, 4.1), (t1,2, 5.9)
top-2 Queue (tuple, grank): (t2,1, 5.4), (t1,1, 7.9)

In each round the server pops the tuple with the smallest local rank from the Rep. Queue (here t2,2 with lrank 3.4), and the owning site refills the queue with its next tuple (t2,3 with lrank 4.8).
The popped tuple's pdf X2,2 is sent to the other sites, which return its local ranks there (1.5 and 0.7 in the example).
Summing the local ranks gives the global rank of t2,2: 3.4 + 1.5 + 0.7 = 5.6, so t2,2 enters the top-2 Queue and evicts t1,1 (grank 7.9).
We can safely terminate whenever the largest grank from the top-k Queue is ≤ the smallest lrank from the Rep. Queue.
We denote this algorithm A-LR. A runnable sketch of this round loop follows below.
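The following single-process simulation of the A-LR round loop is a sketch under our own assumptions (each site is a plain list of (tuple_id, pdf) pairs already sorted by home local rank; all "communication" is just function calls), not the paper's implementation.

```python
import heapq

def lrank(pdf, site, exclude=None):
    # Local rank of `pdf` against one site's tuples: Eq. (3) if exclude is its own id, Eq. (4) otherwise.
    r = 0.0
    for v, p in pdf:
        qv = sum(p2 for tid, other in site if tid != exclude for v2, p2 in other if v2 > v)
        r += p * qv
    return r

def a_lr(sites, k):
    """Server-side A-LR loop. sites: list of [(tuple_id, pdf), ...] lists, each
    pre-sorted by home local rank (ascending). Returns the global top-k as (grank, id)."""
    cursors = [1] * len(sites)
    rep = []                                   # representative queue keyed by home lrank
    for s, site in enumerate(sites):
        tid, pdf = site[0]
        heapq.heappush(rep, (lrank(pdf, site, exclude=tid), s, tid, pdf))
    topk = []                                  # heap of (-grank, tuple_id), holds the k best
    while rep:
        # Termination: every unseen tuple's global rank is at least the smallest lrank here.
        if len(topk) == k and -topk[0][0] <= rep[0][0]:
            break
        lr, s, tid, pdf = heapq.heappop(rep)
        # The other sites report their local ranks for this tuple; the sum is its global rank.
        grank = lr + sum(lrank(pdf, other) for y, other in enumerate(sites) if y != s)
        heapq.heappush(topk, (-grank, tid))
        if len(topk) > k:
            heapq.heappop(topk)                # drop the current worst of the k+1
        if cursors[s] < len(sites[s]):         # the owning site sends its next tuple
            ntid, npdf = sites[s][cursors[s]]
            cursors[s] += 1
            heapq.heappush(rep, (lrank(npdf, sites[s], exclude=ntid), s, ntid, npdf))
    return sorted((-neg, tid) for neg, tid in topk)
```

With the representative queue keyed by descending expected score and the termination test replaced by the r+_λ ≤ r−_λ check discussed next, the same loop structure underlies the expected-scores framework.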

Sorted Access on Expected Scores Framework

site 1: t1,1, t1,2, . . . , t1,n1    site 2: t2,1, t2,2, . . . , t2,n2    . . .    site m: tm,1, tm,2, . . . , tm,nm

Every site calculates the local ranks and the expected scores of its tuples and stores the tuples in descending order of expected scores.
Tuples are accessed by descending order of expected scores and the server calculates global ranks.

Sorted Access on Expected Scores Initialization

site 1 (tuple, E[X]): (t1,1, 489), (t1,2, 421), . . . , (t1,n1, 5)
site 2 (tuple, E[X]): (t2,1, 476), (t2,2, 464), . . . , (t2,n2, 11)
site 3 (tuple, E[X]): (t3,1, 500), (t3,2, 432), . . . , (t3,n3, 1)

Rep. Queue (tuple, E[X]): (t3,1, 500), (t1,1, 489), (t2,1, 476)

Sorted Access on Expected Scores: a Round

Rep. Queue (tuple, E[X]): (t2,2, 464), (t3,2, 432), (t1,2, 421)
top-2 Queue (tuple, grank): (t2,1, 5.4), (t1,1, 7.9)

In each round the server pops the tuple with the largest expected score from the Rep. Queue (here t2,2 with E[X] = 464), and the owning site refills the queue with its next tuple (t2,3 with E[X] = 429).
The popped tuple's pdf X2,2 is sent to the sites, which report its local ranks (3.4, 1.5, and 0.7 in the example); summing them gives its global rank 5.6, so t2,2 enters the top-2 Queue, evicting t1,1.
Now the only question is when may we safely terminate and be certain we have the global top-k.

Sorted Access on Expected Scores: Termination

Rep. Queue (tuple, E[X]): (t3,2, 432), (t2,3, 429), (t1,2, 421)
top-2 Queue (tuple, grank): (t2,1, 5.4), (t2,2, 5.6)

The largest grank in the top-k Queue is clearly an upper bound r+_λ for the global rank any seen tuple t with pdf X may have and still be in the top-k at round λ.
The head of the Rep. Queue, with expectation τ, is an upper bound on the expectation of any unseen tuple t, i.e. E[X] ≤ τ.
How can we derive a lower bound r−_λ on the global rank of any unseen tuple t, such that when r+_λ ≤ r−_λ it is safe to terminate at round λ?

Sorted Access on Expected Scores: a Lower Bound?

We introduce two methods to find a lower bound r−_λ for any unseen tuple t at round λ:
Markov Inequality
Linear Programming

Markov Inequality Lower Bound

We know that the pdf of any unseen t must satisfy E[X] ≤ τ.
We can use the Markov Inequality to lower bound the rank of t at any site si with database Di as

r(t, D_i) = \sum_{j=1}^{n_i} \Pr[X_j > X]
          = n_i - \sum_{j=1}^{n_i} \Pr[X \ge X_j]
          \ge n_i - \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{ij}} p_{i,j,\ell} \frac{E[X]}{v_{i,j,\ell}}    (Markov Ineq.)
          \ge n_i - \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{ij}} p_{i,j,\ell} \frac{\tau}{v_{i,j,\ell}} = r^-(t, D_i).    (6)

Now the global rank r(t) must satisfy

r(t) \ge \sum_{i=1}^{m} r^-(t, D_i) = r^-_\lambda    (7)

Loose!
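A direct transcription of Eqs. (6)-(7) as code (our sketch; sites are again lists of (value, probability) pdfs, and all score values are assumed positive, as Markov's inequality requires):

```python
def markov_lower_bound(sites, tau):
    """r_lambda^- from Eqs. (6)-(7): a lower bound on the global rank of any
    unseen tuple whose expected score is at most tau."""
    bound = 0.0
    for site in sites:
        n_i = len(site)
        # Each site only needs n_i and sum_j sum_l p / v, which do not depend on tau.
        bound += n_i - tau * sum(p / v for pdf in site for v, p in pdf)
    return bound
```

Because the per-site quantity Σ p/v does not depend on τ, each site can ship it once and the server can re-evaluate this bound at every round, as the backup slide on the Markov bound notes.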

Linear Programming Lower Bound

Any unseen tuple t must have E[X] ≤ τ.
We've seen how to derive a lower bound r−_λ on the global rank of any unseen tuple t using Markov's Inequality.
We want to find as tight an r−_λ as possible by finding the smallest possible r−(t, Di) at each site.
We can use Linear Programming to derive the r−(t, Di) at each site and so obtain a tight r−_λ.

Linear Programming

The idea is to construct the best possible X for an unseen tuple t at each site si, i.e. the X that obtains the smallest possible local rank at si.
X could take on arbitrary vℓ's as its possible score values, some of which do not exist in the value universe Ui at a site si.
We can show this issue is irrelevant after studying the semantics of the r(t, Di)'s and the q(v)'s.

Linear Programming: a Note on q(v)'s

Recall that r(t_{i,j}, D_y) = \sum_{\ell=1}^{b_{i,j}} p_{i,j,\ell} \, q_y(v_{i,j,\ell}), and q(v) is essentially a staircase curve.
X may take a value vℓ not in Ui, with (say) v2 as its nearest left neighbor in Ui.
Even if X takes such a value vℓ, we can decrease vℓ until we hit v2 in Ui, and E[X] ≤ τ clearly still holds, as we are only decreasing the value of one of the choices in X.
Also note that during this transformation q(vℓ) = q(v2), so the local rank of t remains the same.

Linear Programming Formulation

Now we can assume X draws its values from Ui.
Then we can define a linear program with the constraints

0 \le p_\ell \le 1, \quad \ell = 1, \ldots, \gamma = |U_i|
p_1 + \cdots + p_\gamma = 1
p_1 v_1 + \cdots + p_\gamma v_\gamma \le \tau

and minimize the local rank, which is

r(X, D_i) = \sum_{\ell=1}^{\gamma} p_\ell \, q_i(v_\ell)

Each site can conduct this linear program at each round in order for the server to derive a tight r−_λ.
This algorithm is denoted A-LP.
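For illustration, the per-site LP can be set up with an off-the-shelf solver. The sketch below uses scipy.optimize.linprog (our choice of solver and variable names; the paper does not prescribe one): it returns the smallest local rank any unseen tuple with E[X] ≤ τ could attain at a site, given the site's value universe and its q_i values.

```python
import numpy as np
from scipy.optimize import linprog

def site_lp_lower_bound(values, q_values, tau):
    """Smallest local rank an unseen tuple with E[X] <= tau could achieve at one site.

    values:   the site's value universe U_i (v_1, ..., v_gamma)
    q_values: q_i(v) for each v in U_i
    tau:      expected-score threshold from the representative queue
    """
    gamma = len(values)
    c = np.asarray(q_values, dtype=float)                    # objective: sum_l p_l * q_i(v_l)
    A_ub = np.asarray(values, dtype=float).reshape(1, -1)    # E[X] = sum_l p_l v_l <= tau
    b_ub = np.array([tau])
    A_eq = np.ones((1, gamma))                               # probabilities sum to one
    b_eq = np.array([1.0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * gamma, method="highs")
    # If the LP is infeasible (every value exceeds tau), fall back to the trivial bound 0.
    return res.fun if res.success else 0.0
```

Summing these per-site optima over all m sites gives the tight r−_λ used in the termination test.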

Eliminating LPs at distributed sites

In A-LP, each site solves its LP and reports r−(t, Di) to the server in every round.
The LPs could instead be moved to the server by having every site ship its q1(v), q2(v), . . . , qm(v) curve, but sending the exact curves is communication expensive.
Each site therefore sends a small approximate curve q∗1(v), q∗2(v), . . . , q∗m(v), and the server runs the LPs over these approximations.

q∗(v)'s: the approximate q(v)'s

Formally, the problem is to find the optimal approximation q∗(v) to a q(v) which obtains the smallest approximation error given a fixed budget η.
We also must ensure that by using these q∗(v)'s we still arrive at the actual global top-k at the server.

q∗(v)'s: the approximate q(v)'s

In the figure, a q∗(v) takes two points α′ and α′′ which are not right upper corner points in the original q(v); the shaded (blue) region between the two curves is the approximation error.
We can minimize the error between the q∗(v) curve and the q(v) curve by sampling only the right upper corner points.
The new, smaller error after selecting α′ as α3 and α′′ as α4 is again shown by the shaded region.

q∗(v)'s: the approximate q(v)'s

In order to find the optimal q∗(v) for a q(v) we can formulate a dynamic program

A(i, j) = \min\left( \min_{x \in [i-1, j-1]} \{ A(i-1, x) - \delta^{j}_{q^*(i-1,x)} \}, \; \min_{x \in [i, j-1]} \{ A(i, x) \} \right)    (8)

A(i, j) is the optimal approximation error from selecting i points from the first j points of q(v).
This algorithm is denoted A-ALP.

Updating the q∗(v)'s at the server.... for free

During some round, when the server retrieves a tuple t with pdf X from a site si to update the representative queue, the server may see a (vℓ, pℓ) pair in X such that vℓ was not an originally sampled upper right corner point from q(v); such a pair can be folded into q∗(v) at no extra communication cost (the update steps are detailed in the backup slides after The End).

Other Issues.... Latency

Currently we check the termination condition at the end of every round.
An intuitive idea is that we may reduce latency by checking the termination condition only after every β rounds.
This will reduce the computational burden at the server for the A-ALP algorithm, and reduce the computational burden at the sites for the A-LP algorithm.
The tradeoff is that we could potentially miss the optimal termination point, but not by more than β tuples.

Experimental Setup

Experiments were conducted on an Intel Xeon 5130 CPU @ 2GHz with 4GB memory.

We utilized three real data sets:
Movie data set from the Mystiq project containing 56,000 records
Temperature data set collected from 54 sensors at the Intel Research Berkeley lab containing 64,000 records
Chlorine data set from the EPANET project containing 67,000 records

We utilized one synthetic data set:
Synthetic Gaussian, where each record's score attribute draws its values from a Gaussian distribution with standard deviation σ in [1, 1000] and mean in [5σ, 100000]

Experimental Setup

The default experimental parameters are summarized below.

N: number of tuples (default 56,000)
|X|: choices in a tuple's pdf (default 5)
m: number of sites (default 10)
k: number of tuples to rank (default 100)
η: |q∗(v)| (default 1% × |q(v)|)

In addition, communication costs are determined as follows.

v (value): 4 bytes
p (probability): 4 bytes
X (pdf): |X| × 8 bytes
t (tuple): (|X| × 8) + 4 bytes
r(t, Di) (local rank of t in Di): 4 bytes
E[X] (expectation of X): 4 bytes
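The cost model above maps directly to a couple of small helpers (ours, for illustration; the byte sizes follow the table):

```python
def pdf_bytes(pdf):
    # each (value, probability) choice costs 4 + 4 bytes
    return len(pdf) * 8

def tuple_bytes(pdf):
    # a tuple costs its pdf plus 4 additional bytes, per the cost table
    return pdf_bytes(pdf) + 4
```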

Experimental Results (figure captions)

Communication Cost as k Varies (Chlorine Data Set)
Number of Rounds as k Varies (Chlorine Data Set)
Effect of N on Communication Cost (Chlorine Data Set)
Effect of N on Number of Rounds (Chlorine Data Set)
Effect of m on Communication Cost (Chlorine Data Set)
Effect of m on Number of Rounds (Chlorine Data Set)
Effect of η on Communication Cost (A-ALP Algorithm)
Effect of η on Number of Rounds (A-ALP Algorithm)
Effect of β on Communication Cost (Chlorine Data Set)
Effect of β on Number of Rounds (Chlorine Data Set)

Conclusions

We introduced computation- and communication-efficient algorithms to rank distributed probabilistic data using expected ranks.
For future work we would like to study ranking distributed probabilistic data under other ranking definitions, such as U-kRanks and the Parameterized Ranking Function (PRF); both PRF and U-kRanks rely upon a tuple's rank distribution, and we believe A-LP and A-ALP could be extended to support these two definitions.
In addition to ranking queries, we would like to study other popular queries, such as skyline or nearest neighbor queries, over distributed probabilistic data.

The End

THANK YOU

Q and A

The entire source code is available from a link at http:/ww2.cs.fsu.edu/˜jestes

Markov Inequality Lower Bound

r^-(t, D_i) = n_i - \tau \sum_{j=1}^{n_i} \sum_{\ell=1}^{b_{ij}} \frac{p_{i,j,\ell}}{v_{i,j,\ell}}    (9)

We have these invariants for each site si.
We only have to send these invariants one time to the server, and then the server can check the termination condition r+_λ ≤ r−_λ at the end of each round λ.
The Markov Inequality only gives us a loose r−_λ, and this leads directly to our next r−_λ derivation.

Updating the q∗(v)'s at the server.... for free

During some round the server may see a (vℓ, pℓ) pair from an X such that vℓ was not an originally sampled upper right corner point of q(v).
We create a new α upper right corner point for vℓ, taking the value of its nearest right neighboring point.
The new α point is raised by pℓ.
Any α points to the left of vℓ which were not included in the original sampled q∗(v) are also incremented by pℓ.
We stop when we hit the first α point included in the original q∗(v).

Additional Experimental Results (figure captions)

Communication Cost as k Varies (Synthetic Gaussian, Movie, and Temperature Data Sets)
Number of Rounds as k Varies (Synthetic Gaussian, Movie, and Temperature Data Sets)
Effect of |X| on Communication Cost (Synthetic Gaussian Data Set)
Effect of |X| on Number of Rounds (Synthetic Gaussian Data Set)
Effect of ρ on Communication Cost (Chlorine Data Set)
Effect of ρ on Number of Rounds (Chlorine Data Set)