Discovering relative importance of skyline attributes, D. Mindolin & J. Chomicki


SLIDE 1

p-skylines | Properties of p-skyline relations | p-skyline relation elicitation | Related & future work

Discovering relative importance of skyline attributes

D. Mindolin & J. Chomicki

Department of Computer Science and Engineering, University at Buffalo, SUNY

VLDB 2009

SLIDE 2

Main contributions

  1. Generalizing skylines to p-skylines to capture relative attribute importance
  2. Discovering p-skylines on the basis of user feedback: algorithms and complexity

SLIDE 5

Skylines [Börzsönyi et al., ICDE’01]

Skyline preferences

◮ Atomic preferences ($H$): total orders over attribute values
◮ Skyline preference relation ($sky_H$): $t_1$ is preferred to $t_2$ if
    ◮ $t_1$ is equal to or better than $t_2$ in every attribute, and
    ◮ $t_1$ is strictly better than $t_2$ in at least one attribute
◮ Skyline: the set $w_{sky_H}(O)$ of best tuples (according to $sky_H$) in a set of tuples $O$

Example

[Figure: example skyline of points in the X-Y plane]

Skyline properties

◮ Simple, unique way of composing atomic preferences
◮ Equal attribute importance
◮ Skyline of exponential size

SLIDE 7

p-skylines

p-skyline relation $\succ$

◮ Induced by an atomic preference relation $>_A \in H$:
    $\succ = \{(t, t') \mid t.A >_A t'.A\}$
◮ Pareto accumulation ("$\succ_1$ equally important as $\succ_2$"):
    $\succ = \succ_1 \otimes \succ_2$
◮ Prioritized accumulation ("$\succ_1$ more important than $\succ_2$"):
    $\succ = \succ_1 \mathbin{\&} \succ_2$

Each atomic preference must be used exactly once in $\succ$

SLIDE 9

Pareto accumulation [Kießling’02]

Definitions

◮ $Var(\succ)$: the set of attributes used in the definition of $\succ$
◮ $E_S = \{(t, t') \mid \forall A \in S:\ t.A = t'.A\}$: pairs of tuples equal in every attribute in $S$

Pareto accumulation: $\succ_1$ as important as $\succ_2$

$\succ_1 \otimes \succ_2 = (\succ_1 \cap E_{Var(\succ_2)}) \cup (\succ_2 \cap E_{Var(\succ_1)}) \cup (\succ_1 \cap \succ_2)$

[Figure: $\succ_X \otimes \succ_Y$ illustrated in the X-Y plane]

SLIDE 11

Prioritized accumulation [Kießling’02]

Definitions

◮ $Var(\succ)$: the set of attributes used in the definition of $\succ$
◮ $E_S = \{(t, t') \mid \forall A \in S:\ t.A = t'.A\}$: pairs of tuples equal in every attribute in $S$

Prioritized accumulation: $\succ_1$ more important than $\succ_2$

$\succ_1 \mathbin{\&} \succ_2 = \succ_1 \cup (\succ_2 \cap E_{Var(\succ_1)})$

[Figure: $\succ_X \mathbin{\&} \succ_Y$ illustrated in the X-Y plane]
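
The two accumulation operators can be sketched as combinators over preference predicates. This is only an illustration under stated assumptions: the `Pref` class, the helper names, and the dictionary encoding of tuples are hypothetical, and the prioritized operator is read as falling back to the second preference only when the tuples are equal on every attribute of the first, that is, on $Var(\succ_1)$.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

Tup = Dict[str, int]  # assumed tuple encoding: attribute name -> value

@dataclass(frozen=True)
class Pref:
    attrs: FrozenSet[str]                 # Var of the preference
    better: Callable[[Tup, Tup], bool]    # better(t, u): t is preferred to u

def atomic(attr, gt):
    """Preference induced by an atomic order gt on a single attribute."""
    return Pref(frozenset([attr]), lambda t, u: gt(t[attr], u[attr]))

def equal_on(attrs, t, u):
    """(t, u) is in E_S for S = attrs."""
    return all(t[a] == u[a] for a in attrs)

def pareto(p1, p2):
    """Pareto accumulation: p1 and p2 are equally important."""
    def better(t, u):
        return ((p1.better(t, u) and equal_on(p2.attrs, t, u))
                or (p2.better(t, u) and equal_on(p1.attrs, t, u))
                or (p1.better(t, u) and p2.better(t, u)))
    return Pref(p1.attrs | p2.attrs, better)

def prioritized(p1, p2):
    """Prioritized accumulation: p1 is more important than p2."""
    def better(t, u):
        return p1.better(t, u) or (equal_on(p1.attrs, t, u) and p2.better(t, u))
    return Pref(p1.attrs | p2.attrs, better)

# Assumed example: larger values are better in both X and Y.
larger = lambda a, b: a > b
px, py = atomic("X", larger), atomic("Y", larger)
t, u = {"X": 2, "Y": 1}, {"X": 1, "Y": 3}
print(pareto(px, py).better(t, u), prioritized(px, py).better(t, u))
```

The tuples `t` and `u` are incomparable under Pareto accumulation, but `t` wins once X is declared more important than Y.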

SLIDE 12

p-skyline properties

◮ Many different ways of composing atomic preferences (different combinations of $\otimes$ and $\&$)
◮ Reduction in query result size w.r.t. skylines
◮ Differences in attribute importance

SLIDE 15

Representing attribute importance with p-graphs

The p-graph $\Gamma_\succ$ represents the attribute importance induced by a p-skyline relation $\succ$

◮ Nodes: the attributes $Var(\succ)$
◮ Edges: from more important to less important attributes

$\succ' = \succ_A \otimes \succ_B \otimes \succ_C$

[Figure: p-graph of $\succ'$ over nodes A, B, C]

$\succ'' = \succ_A \mathbin{\&} (\succ_B \otimes \succ_C)$

[Figure: p-graph of $\succ''$ over nodes A, B, C]
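
One way to derive a p-graph from a p-skyline formula is sketched below. It assumes that Pareto accumulation contributes no edges between its operands and that prioritized accumulation adds an edge from every attribute of the more important operand to every attribute of the less important one. The nested-tuple encoding of formulas and the function name `p_graph` are hypothetical.

```python
def p_graph(formula):
    """Return (attributes, edges) for a formula given as nested tuples.

    A formula is either an attribute name (str) or a tuple
    (op, child_1, ..., child_n) with op in {"&", "⊗"}; for "&" the
    children are listed from most to least important.
    """
    if isinstance(formula, str):
        return {formula}, set()
    op, *children = formula
    if op not in ("&", "⊗"):
        raise ValueError(f"unknown operator: {op!r}")
    attrs, edges = set(), set()
    for child in children:
        c_attrs, c_edges = p_graph(child)
        if op == "&":
            # every attribute seen so far is more important than this child
            edges |= {(a, b) for a in attrs for b in c_attrs}
        attrs |= c_attrs
        edges |= c_edges
    return attrs, edges

equal_importance = p_graph(("⊗", "A", "B", "C"))
a_first = p_graph(("&", "A", ("⊗", "B", "C")))
print(equal_importance[1], a_first[1])
```

Under these assumptions the first formula on the slide yields a graph without edges, and the second yields edges from A to B and from A to C.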

SLIDE 16

Properties of p-graphs

Necessary and sufficient conditions for p-graphs

$\Gamma$ is a p-graph of a p-skyline relation iff $\Gamma$

◮ is a strict partial order (SPO), and
◮ satisfies the Envelope property

Envelope

$\forall A, B, C, D \in \mathcal{A}$, all different:
$(A, B) \in \Gamma \wedge (C, D) \in \Gamma \wedge (C, B) \in \Gamma \Rightarrow (C, A) \in \Gamma \vee (A, D) \in \Gamma \vee (D, B) \in \Gamma$

[Figure: nodes A, B, C, D illustrating the Envelope property]
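
The two conditions can be checked by brute force on small attribute sets. The sketch below is an illustration only: it represents a graph as a set of directed edges, and the function names are assumptions.

```python
from itertools import permutations

def is_spo(edges):
    """Strict partial order: irreflexive and transitive (asymmetry then follows)."""
    irreflexive = all(a != b for a, b in edges)
    transitive = all((a, d) in edges
                     for a, b in edges for c, d in edges if b == c)
    return irreflexive and transitive

def satisfies_envelope(attrs, edges):
    """Check the Envelope implication for all 4-tuples of distinct attributes."""
    for a, b, c, d in permutations(attrs, 4):
        if (a, b) in edges and (c, d) in edges and (c, b) in edges:
            if not ((c, a) in edges or (a, d) in edges or (d, b) in edges):
                return False
    return True

def is_p_graph(attrs, edges):
    return is_spo(edges) and satisfies_envelope(attrs, edges)

attrs = {"A", "B", "C", "D"}
bad = {("A", "B"), ("C", "D"), ("C", "B")}   # premise holds, conclusion fails
good = bad | {("C", "A")}                     # adding (C, A) repairs it
print(is_p_graph(attrs, bad), is_p_graph(attrs, good))
```

The enumeration is quartic in the number of attributes, which is acceptable for a check on the small attribute sets considered here.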

SLIDE 18

Dominance testing using p-graphs

Is $o$ preferred to $o'$ by $\succ$?

$o \succ o'$ iff

◮ $o \neq o'$, and
◮ for every attribute $B$ in which $o$ is worse than $o'$, there is a parent $A$ of $B$ (in $\Gamma_\succ$) in which $o$ is better than $o'$

Example

$\succ = \succ_A \mathbin{\&} (\succ_B \otimes \succ_C)$

[Figure: p-graph of $\succ$ over nodes A, B, C]

A: b better than w
B: b better than w
C: b better than w

Then
$(b, w, b) \succ (w, b, b)$
$(b, w, b) \not\succ (b, b, w)$ (the first tuple is worse in B, and B's only parent A has the same value in both tuples)
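
The dominance test can be sketched directly from the rule, reading its first condition as "the tuples differ". The function name, the dictionary encoding of tuples, and the assumption that the edge set is already transitively closed (so that parents cover all ancestors) are choices made for this illustration.

```python
def dominates(o, o2, edges, better):
    """p-graph dominance test: is o preferred to o2?

    edges  : set of (parent, child) pairs of the p-graph
    better : dict mapping each attribute to a function pref(u, v) that is
             True when value u is strictly preferred to value v
    """
    better_in = {a for a, pref in better.items() if pref(o[a], o2[a])}
    worse_in = {a for a, pref in better.items() if pref(o2[a], o[a])}
    if not better_in and not worse_in:
        return False                      # the tuples are equal
    # every attribute where o loses needs a parent where o wins
    return all(any((a, b) in edges for a in better_in) for b in worse_in)

# The example from the slide: b is better than w in every attribute.
b_over_w = lambda u, v: u == "b" and v == "w"
better = {"A": b_over_w, "B": b_over_w, "C": b_over_w}
edges = {("A", "B"), ("A", "C")}          # assumed p-graph of A & (B ⊗ C)

def tup(a, b, c):
    return {"A": a, "B": b, "C": c}

print(dominates(tup("b", "w", "b"), tup("w", "b", "b"), edges, better))
print(dominates(tup("b", "w", "b"), tup("b", "b", "w"), edges, better))
```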

SLIDE 20

Containment of p-skyline relations

Using p-graphs for checking containment

$\succ \subset \succ' \Leftrightarrow E(\Gamma_\succ) \subset E(\Gamma_{\succ'})$

Containment hierarchy

[Figure: containment hierarchy of the p-graphs over attributes A, B, C]

Minimal extensions of $\succ$

◮ Correspond to the immediate children of $\Gamma_\succ$ in the hierarchy
◮ Obtained using rewriting rules applied to syntax trees of p-skyline formulas

SLIDE 22

Minimal extension rewriting rules

Rules to compute minimal extensions of a p-skyline relation

◮ Applied to syntax trees of p-skyline formulas
◮ Every minimal extension is computed by a single rule application in PTIME
◮ The full set consists of four rule templates
◮ All minimal extensions of a p-skyline relation can be computed in PTIME

Rule1 template

Original tree part: a $\otimes$ node with children $C_1, \dots, C_k$, where $C_i$ is an $\&$ node with children $N_1, \dots, N_m$

$C_1 \otimes \dots \otimes C_{i-1} \otimes (N_1 \mathbin{\&} \dots \mathbin{\&} N_m) \otimes C_{i+1} \otimes \dots \otimes C_k$

Transformed tree part: $C_{i+1}$ is moved inside the $\&$ node, below $N_1$

$C_1 \otimes \dots \otimes C_{i-1} \otimes C_{i+2} \otimes \dots \otimes C_k \otimes (N_1 \mathbin{\&} (C_{i+1} \otimes (N_2 \mathbin{\&} \dots \mathbin{\&} N_m)))$

SLIDE 23

Minimal extension rewriting rules

Example (Rule1)

Before: $A \otimes (B \mathbin{\&} C \mathbin{\&} D) \otimes E$, with syntax tree $\otimes(A,\ \&(B, C, D),\ E)$

After: $A \otimes (B \mathbin{\&} (E \otimes (C \mathbin{\&} D)))$, with syntax tree $\otimes(A,\ \&(B,\ \otimes(E,\ \&(C, D))))$

SLIDE 27

Discovery of p-skyline relations from user feedback

Problem

Given a set $\mathcal{A}$ of relevant attributes and a set $H$ of atomic preferences over $\mathcal{A}$, discover the relative importance of attributes (in the form of a p-skyline relation $\succ$), based on user feedback.

User feedback

[Figure: the set of tuples $O$, containing the superior examples $G$ and the inferior examples $W$]

◮ Superior examples $G$: tuples in $O$ which the user confidently likes
◮ Inferior examples $W$: tuples in $O$ which the user confidently dislikes

$\succ$ favors $G$ / disfavors $W$ in $O$

  1. The tuples in $G$ are among the best tuples in $O$ according to $\succ$
  2. The tuples in $W$ are not among the best tuples in $O$ according to $\succ$
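
The favors/disfavors conditions can be checked with a simple "best tuples" operator. The sketch below is illustrative only: `winnow`, `favors_and_disfavors`, and the two sample preference relations (a plain skyline and "A more important than B", both with larger values better) are assumptions.

```python
def winnow(tuples, dominates):
    """Best tuples: those not dominated by any tuple in the set."""
    return [t for t in tuples if not any(dominates(u, t) for u in tuples)]

def favors_and_disfavors(dominates, G, W, O):
    """True if every tuple of G, and no tuple of W, is among the best tuples of O."""
    best = winnow(O, dominates)
    return all(g in best for g in G) and all(w not in best for w in W)

def sky(t, u):
    """Skyline over (A, B): at least as good everywhere and not equal."""
    return t != u and all(x >= y for x, y in zip(t, u))

def a_then_b(t, u):
    """A more important than B: lexicographic comparison of (A, B)."""
    return t > u

O = [(2, 1), (1, 2), (1, 1)]
G = [(2, 1)]   # assumed superior example
W = [(1, 2)]   # assumed inferior example
print(favors_and_disfavors(sky, G, W, O), favors_and_disfavors(a_then_b, G, W, O))
```

The skyline keeps both `(2, 1)` and `(1, 2)`, so it cannot disfavor `W`; making A more important than B removes `(1, 2)` from the best tuples.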
SLIDE 28

Complexity of p-skyline relation discovery

Problem                                                               | Arbitrary $W$ | $W = \emptyset$
Checking existence of $\succ$ favoring $G$ and disfavoring $W$ in $O$ | NP-complete   | PTIME
Computing maximal $\succ$ favoring $G$ and disfavoring $W$ in $O$     | FNP-complete  | PTIME

SLIDE 30

Computing maximal $\succ$ favoring $G$ in $O$ ($W = \emptyset$)

Approach

  1. Construct a system $N$ of negative constraints from $G$ and $O$
  2. Apply minimal extension rules to find a maximal $\succ$ satisfying $N$
  3. Various optimizations are possible

Algorithm complexity: $O(|O| \cdot |G| \cdot |\mathcal{A}|^3)$

SLIDE 33

Negative constraints

$\succ$ favors $G$ in $O$: what does it mean?

  1. $\succ$ favors $G$ in $O$: for every $o \in O$, $o' \in G$: $o \not\succ o'$
  2. $o \not\succ o'$: use the dominance testing rule
     ◮ $L_\tau$: attributes in which $o$ is better than $o'$
     ◮ $R_\tau$: attributes in which $o$ is worse than $o'$
     ◮ negative constraint $\tau = \langle L_\tau, R_\tau \rangle$: some attribute in $R_\tau$ is not a child in $\Gamma_\succ$ of (i.e., not less important than) any attribute in $L_\tau$
  3. $o \not\succ o'$ iff the corresponding $\tau$ is satisfied

Example

id   | A | B | C
$o$  | b | w | w
$o'$ | w | b | b

$o \not\succ o'$ is represented by $\tau = \langle \{A\}, \{B, C\} \rangle$

Example 2

◮ $\succ_A \mathbin{\&} (\succ_B \otimes \succ_C)$ does not satisfy $\tau$
◮ $(\succ_A \mathbin{\&} \succ_B) \otimes \succ_C$ satisfies $\tau$

[Figures: p-graphs of the two relations over nodes A, B, C]
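
Constructing and checking negative constraints can be sketched as follows, reading the requirement as "no tuple of O dominates a tuple of G". The function names, the dictionary encoding of tuples, and the choice to skip pairs of tuples that agree on every attribute are assumptions made for this illustration.

```python
def negative_constraint(o, g, better):
    """tau = (L, R): attributes where o is better / worse than the superior example g."""
    L = frozenset(a for a, pref in better.items() if pref(o[a], g[a]))
    R = frozenset(a for a, pref in better.items() if pref(g[a], o[a]))
    return L, R

def satisfies(edges, tau):
    """Some attribute in R is not a child (in the p-graph) of any attribute in L."""
    L, R = tau
    return any(all((x, y) not in edges for x in L) for y in R)

def build_constraints(G, O, better):
    """One constraint per pair (o, g); identical tuples impose no constraint."""
    N = set()
    for g in G:
        for o in O:
            tau = negative_constraint(o, g, better)
            if tau[0] or tau[1]:
                N.add(tau)
    return N

# The example from the slide: b is better than w in every attribute.
b_over_w = lambda u, v: u == "b" and v == "w"
better = {"A": b_over_w, "B": b_over_w, "C": b_over_w}
o = {"A": "b", "B": "w", "C": "w"}
g = {"A": "w", "B": "b", "C": "b"}
tau = negative_constraint(o, g, better)

a_over_b_and_c = {("A", "B"), ("A", "C")}   # assumed p-graph of A & (B ⊗ C)
a_over_b_only = {("A", "B")}                # assumed p-graph of (A & B) ⊗ C
print(satisfies(a_over_b_and_c, tau), satisfies(a_over_b_only, tau))
```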

SLIDE 35

Using $N$ in the algorithm

Rule application strategy

◮ Three out of the four rule templates are used to compute $\succ_{sky} \subset \succ_1 \subset \dots \subset \succ_k$
◮ Each $\succ_i$ satisfies $N$
◮ Each $\succ_i$ is a minimal extension of $\succ_{i-1}$ ($\succ_{sky} = \succ_0$)
◮ No minimal extension of $\succ_k$ satisfies $N$
◮ Each rule only adds edges to $\Gamma_{\succ_i}$ that go between a set of attributes and a distinguished attribute $E$ (to $E$ or from $E$)

Bottleneck: checking satisfaction of $N$ by $\succ$

◮ Efficiently check every $\tau \in N$ against $\succ$
◮ Reduce the size of $N$

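SLIDE 38

Efficiently checking satisfaction of $N$

Minimization of $N$

◮ $N$ is minimal w.r.t. $\succ$ iff every $\tau \in N$ is minimal w.r.t. $\succ$
◮ $\tau$ is minimal w.r.t. $\succ$ iff $\neg\exists X \in L_\tau, Y \in R_\tau : (X, Y) \in \Gamma_\succ$

Minimization of $\tau$

◮ $(\succ_A \mathbin{\&} \succ_B) \otimes \succ_C$ satisfies $\tau = \langle \{A\}, \{B, C\} \rangle$
◮ Minimal $\tau = \langle \{A\}, \{C\} \rangle$

[Figure: p-graph of $(\succ_A \mathbin{\&} \succ_B) \otimes \succ_C$ over nodes A, B, C]

Checking minimal $\tau$

◮ $\tau$ is minimal w.r.t. $\succ_i$
◮ $\succ_{i+1}$ is a minimal extension of $\succ_i$
◮ $\Gamma_{\succ_{i+1}} - \Gamma_{\succ_i} = \{(X, E) \mid X \in P_E\} \cup \{(E, Y) \mid Y \in C_E\}$

Then $\succ_{i+1}$ violates $\tau$ iff

◮ $R_\tau = \{E\} \wedge P_E \cap L_\tau \neq \emptyset$, or
◮ $R_\tau \subseteq C_E \wedge E \in L_\tau$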
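
Minimization and the incremental violation test can be sketched as below. The first violation condition is read here as "some new parent of E lies in the left-hand side", since only then does the sole remaining attribute E of the right-hand side gain a parent there. The function names and the set-based encoding are assumptions.

```python
def minimize(tau, edges):
    """Drop from R every attribute that already has a parent in L."""
    L, R = tau
    return L, frozenset(y for y in R if all((x, y) not in edges for x in L))

def violates_after_extension(tau, E, parents, children):
    """Violation test for a minimal tau after one extension step.

    The step adds exactly the edges (X, E) for X in parents
    and (E, Y) for Y in children.
    """
    L, R = tau
    return (R == {E} and bool(parents & L)) or (R <= children and E in L)

# Example from the slide: tau = <{A}, {B, C}> under the graph with edge A -> B.
tau = (frozenset({"A"}), frozenset({"B", "C"}))
minimal = minimize(tau, {("A", "B")})
print(minimal)

# Adding A -> C would make C a child of A, so the constraint breaks.
print(violates_after_extension(minimal, "C", parents={"A"}, children=set()))
# Adding C -> B does not touch the pair (A, C), so the constraint survives.
print(violates_after_extension(minimal, "C", parents=set(), children={"B"}))
```

Because the test only inspects the newly added edges around `E`, it avoids re-scanning the whole p-graph for every constraint after each extension step.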

SLIDE 41

Methods to reduce $|N|$

Motivation

$N$ consists of $|G| \cdot (|O| - 1)$ constraints, which are checked against $\succ_i$ in every iteration

Apply skyline to $O$

◮ Need only to compare $G$ with the skyline of $O$ instead of $O$
◮ $N(G, O)$ is equivalent to $N(G, w_{sky_H}(O))$

Apply skyline to $N$

◮ Represent each $\tau \in N$ as a bitmap with one bit per attribute for $L_\tau$ and one bit per attribute for $R_\tau$, e.g., for $L_\tau = \{A\}$, $R_\tau = \{C\}$
◮ $N(G, O)$ is equivalent to the bitmap skyline of $N(G, O)$

[Table: bitmap of the example $\tau$, with columns A, B, C under $L_\tau$ and A, B, C under $R_\tau$]
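
One possible reading of "apply skyline to N" is sketched below. The encoding is an assumption, not taken from the slides: the L bits are 1 for attributes in the left-hand side, and the R bits are 1 for attributes outside the right-hand side. With this choice, a bitmap that is at least as large in every position has a larger left-hand side and a smaller right-hand side, so its constraint implies the other one, which can then be dropped.

```python
def to_bitmap(tau, attrs):
    """Assumed encoding: L bits mark membership, R bits mark non-membership."""
    L, R = tau
    return tuple([int(a in L) for a in attrs] + [int(a not in R) for a in attrs])

def bitmap_dominates(p, q):
    """p is at least as large as q in every position and differs somewhere."""
    return p != q and all(x >= y for x, y in zip(p, q))

def bitmap_skyline(constraints, attrs):
    """Keep only constraints whose bitmap is not dominated by another bitmap."""
    maps = {to_bitmap(t, attrs): t for t in constraints}
    return {t for b, t in maps.items()
            if not any(bitmap_dominates(c, b) for c in maps)}

attrs = ["A", "B", "C"]
t1 = (frozenset({"A"}), frozenset({"C"}))
t2 = (frozenset({"A"}), frozenset({"B", "C"}))   # implied by t1
t3 = (frozenset({"B"}), frozenset({"A"}))        # incomparable with t1
reduced = bitmap_skyline({t1, t2, t3}, attrs)
print(to_bitmap(t1, attrs), reduced == {t1, t3})
```

Any p-graph in which C has no parent in {A} also has some attribute of {B, C} without a parent in {A}, which is why `t2` is redundant once `t1` is kept.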

SLIDE 42

Experiments: Accuracy

Setup

◮ $O$: NHL player stats of about 10k tuples
◮ $|\mathcal{A}| \in \{9, 12\}$
◮ $\succ_{fav}$ generated randomly
◮ $G$ drawn from $w_{\succ_{fav}}(O)$

Accuracy measures

◮ Precision $= \dfrac{|w_\succ(O) \cap w_{\succ_{fav}}(O)|}{|w_\succ(O)|}$
◮ Recall $= \dfrac{|w_\succ(O) \cap w_{\succ_{fav}}(O)|}{|w_{\succ_{fav}}(O)|}$

Results

[Figure: precision and recall vs. number of superior examples; series Precision9, Recall9, Precision12, Recall12]

Conclusions

◮ Precision is consistently high
◮ Recall is low for small $G$ (due to the maximality of $\succ$) but grows quickly with the number of superior examples
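
The two accuracy measures compare the best tuples under the discovered relation with the best tuples under the hidden relation that generated the examples. A minimal sketch, assuming both result sets are available as collections of hashable tuple identifiers (the function name and sample values are hypothetical):

```python
def precision_recall(found, expected):
    """found: best tuples under the discovered relation;
    expected: best tuples under the hidden favorite relation."""
    found, expected = set(found), set(expected)
    common = found & expected
    precision = len(common) / len(found) if found else 0.0
    recall = len(common) / len(expected) if expected else 0.0
    return precision, recall

# Assumed example: a small, fully correct result that misses half of the
# expected tuples, mirroring high precision with low recall.
precision, recall = precision_recall(found=[1, 2], expected=[1, 2, 3, 4])
print(precision, recall)
```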

SLIDE 43

Experiments: Performance

Setup

◮ Three datasets (anticorrelated, uniform, correlated) of 50k tuples
◮ $|\mathcal{A}| \in \{10, 15, 20\}$

Results

[Figure: running time (ms, log scale) vs. number of superior examples for the anticorrelated, uniform, and correlated datasets]
[Figure: running time (ms, log scale) vs. $|\mathcal{A}|$ for the same three datasets]

Conclusions

The algorithm is scalable w.r.t. the number of superior examples and $|\mathcal{A}|$

SLIDE 44

Related work

  1. [Börzsönyi et al., ICDE’01]
     ◮ Skylines
  2. [Kießling et al., VLDB’02]
     ◮ Pareto and prioritized accumulation
  3. [Holland et al., PKDD’03]
     ◮ Mining p-skyline-like preferences (atomic preferences, operators)
     ◮ Web server logs used as input
     ◮ Heuristics used
  4. [Jiang et al., KDD’08]
     ◮ Mining atomic preference relations using superior/inferior examples (skyline semantics)
     ◮ Intractable problems, heuristics used
  5. [Lee et al., DEXA’08]
     ◮ Mining of (skyline + equivalence) preference relations
     ◮ Answers to simple comparison questions used as feedback

SLIDE 45

Future work

◮ Attribute importance relationships between sets of attributes
◮ Selecting "good" superior examples
◮ Other scenarios of discovery (various forms of feedback, various result criteria)
◮ p-skylines: expressiveness, algorithms