Query-Based Data Pricing
Dan Suciu – U. of Washington Joint with M. Balazinska, B. Howe, P. Koutris, Daniel Li, Chao Li, G. Miklau, P. Upadhyaya
1 EPFL, 2013
Data Has Value
And it is increasingly being sold/bought on the Web
Big data
Different price by business type
Cheaper just for Washington
What should a “good” price function look like?
Shape    Color    Picture
Swan     White
Swan     Yellow
. . .
Dragon   Yellow
Car      Yellow
. . .
Fish     White
. . .
Picture credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
Price list                      Price
V1 = σShape=‘Swan’(S)           $2
V2 = σShape=‘Dragon’(S)         $2
V3 = σShape=‘Car’(S)            $2
V4 = σShape=‘Fish’(S)           $2
W1 = σColor=‘White’(S)          $3
W2 = σColor=‘Yellow’(S)         $3
W3 = σColor=‘Red’(S)            $3
Price(σColor)=$3 Price(σShape)=$2
Get all Dragons for $2
Get all Red Origami for $3
Find the price of the entire db
$1? $4? $8? $20?
V1, V2, V3, V4 determine Q (the entire db): price(Q) ≤ $8
W1, W2, W3 determine Q (the entire db): price(Q) ≤ $9
Answer: $8. To ensure arbitrage-freeness, we can charge at most $8 for the entire database.
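The bound above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming (as the slide does) that the four shape views together determine the whole relation and that the three color views do too; the function and variable names are illustrative and do not come from the talk.

```python
# Explicit price list from the slide: each selection view has a fixed price.
shape_views = {"Swan": 2, "Dragon": 2, "Car": 2, "Fish": 2}   # V1..V4
color_views = {"White": 3, "Yellow": 3, "Red": 3}             # W1..W3


def cover_cost(views):
    """Cost of buying every view in one family (assumed to determine Q)."""
    return sum(views.values())


def arbitrage_free_upper_bound(*families):
    """Cheapest way to rebuild Q from any single determining family."""
    return min(cover_cost(family) for family in families)


bound = arbitrage_free_upper_bound(shape_views, color_views)
print(bound)
```

Any price for the entire database above this bound would let a buyer get the same data more cheaply by purchasing the shape views one by one.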
R(Shape, Instructions)
Shape    Instructions
Swan     Fold,fold,fold…
Dragon   Cut,cut,cut,…

S(Shape, Color, Picture)
Shape    Color    Picture
Swan     White
Swan     Yellow
. . .
Dragon   Yellow
Car      Yellow
. . .
Fish     White
. . .

T(Color, PaperSpecs)
Color    PaperSpecs
White    15g/100
Black    20g/100
Pictures credits: http://www.toysperiod.com/blog/uncategorized/the-modern-art-and-science-of-origami/
Find the price of the full join: Q = R ⋈ S ⋈ T
S: Price(σColor)=$3, Price(σShape)=$2
R: Price(σShape)=$99
T: Price(σColor)=$55
Shape    Instructions      Color    Picture    PaperSpecs
Swan     Fold,fold,fold…   White               15g/100
Not obvious! E.g. there are no Yellow Cars in the join.
What to pay for? σShape=‘Car’(R) or σColor=‘Yellow’(T)
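To see why the Yellow Car row is a problem, the join can be evaluated on the rows that are visible on the slide. This is only a sketch: the elided rows (". . .") are left out, the Picture column is dropped, and the helper name full_join is illustrative.

```python
# Only the rows visible on the slide; elided rows are not guessed.
R = [("Swan", "Fold,fold,fold…"), ("Dragon", "Cut,cut,cut,…")]   # (Shape, Instructions)
S = [("Swan", "White"), ("Swan", "Yellow"), ("Dragon", "Yellow"),
     ("Car", "Yellow"), ("Fish", "White")]                       # (Shape, Color)
T = [("White", "15g/100"), ("Black", "20g/100")]                 # (Color, PaperSpecs)


def full_join(r, s, t):
    """Natural join R ⋈ S ⋈ T on Shape and Color (nested-loop sketch)."""
    return [(shape, instructions, color, specs)
            for shape, instructions in r
            for s_shape, color in s if s_shape == shape
            for t_color, specs in t if t_color == color]


Q = full_join(R, S, T)
print(Q)
```

On these rows the Yellow Car tuple of S contributes nothing to the output, because Car does not appear in R and Yellow does not appear in T. Either missing selection alone explains its absence, which is why it is unclear which one the buyer should pay for.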
too expensive!
UID    User    Rating (0..5)
1      Alice   3               $10
2      Bob                     $10
3      Carol   1               $10
4      Dan                     $10
…      …       …
1000   Zoran   2               $10
– Raw data = no perturbation = high price
– Differentially private = high perturbation = low price
– Tolerates error ±300
– Equivalently: variance v = 5000*
* Probability(|ĉ – c| ≥ 3√2 σ) < 1/18 = 0.056 (Chebyshev), where σ = √v = 50√2
** ε = √2 · sensitivity(q)/σ = 5√2 / (50√2) = 0.1
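The numbers in the two footnotes can be reproduced directly. The sketch below assumes sensitivity(q) = 5, which is the value the second footnote plugs in (a rating lies in 0..5), and uses the textbook form of Chebyshev's inequality; variable names are illustrative.

```python
import math

v = 5000.0           # variance the buyer is willing to tolerate
sensitivity = 5.0    # assumed: one rating in 0..5 moves the sum by at most 5

sigma = math.sqrt(v)                 # standard deviation, equals 50 * sqrt(2)
k = 3 * math.sqrt(2)                 # number of standard deviations
error = k * sigma                    # tolerated absolute error
tail = 1 / k ** 2                    # Chebyshev: P(|c_hat - c| >= k * sigma) <= 1 / k^2
epsilon = math.sqrt(2) * sensitivity / sigma

print(error, tail, epsilon)
```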
the price
– Zero error, error ±300, error ±30
– Variance = 0, variance = 5000, variance = 50
– If price > $100 → arbitrage! Buy 100 queries with variance 5000 and take the average (variance 5000/100 = 50). Cost = 100 × $1.
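The averaging attack is simple arithmetic. The sketch below assumes the 100 purchased answers carry independent noise, so the variance of their mean is the single-answer variance divided by 100; the $1 unit price is the one implied by "Cost = 100 × $1", and the function names are illustrative.

```python
def averaged_variance(v, n):
    """Variance of the mean of n independent answers, each with variance v."""
    return v / n


def averaging_cost(unit_price, n):
    """Total cost of buying the same noisy query n times."""
    return unit_price * n


n = 100
combined_variance = averaged_variance(5000, n)   # noisy answers with variance 5000
cost = averaging_cost(1, n)                      # each bought for $1


def has_arbitrage(price_of_low_variance_query):
    """True if averaging cheap answers undercuts the listed price."""
    return price_of_low_variance_query > cost


print(combined_variance, cost, has_arbitrage(120))
```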
arbitrage-free.
2 in the database
[Figure: query time in sec. Series: ILP construction time (100), Total time (100), ILP construction time (1000), Total time (1000)]
How much should we pay Carol?
Query c = x1 + x2 + … + x1000
Variance v = 50
Differential Privacy
A randomized mechanism ĉ is called ε-differentially private if, for all D, D’ that differ in one item, and any set S:
P[ĉ(D) ∈ S] ≤ exp(ε) × P[ĉ(D’) ∈ S]
ĉ(D) = c(D) + Lap(Δc/ε) is ε-differentially private
Variance v = 2(Δc/ε)²
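A minimal sketch of the Laplace mechanism and of the variance formula, using only the Python standard library. The Laplace sample is drawn as the difference of two exponential variables with the same mean, a standard identity; the function names and the seed are illustrative and not part of the talk.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_sum(values, sensitivity, epsilon, rng):
    """Answer c = sum(values), perturbed with Lap(sensitivity / epsilon) noise."""
    return sum(values) + laplace_noise(sensitivity / epsilon, rng)


def variance(sensitivity, epsilon):
    """v = 2 * (sensitivity / epsilon)^2, the variance of the added noise."""
    return 2 * (sensitivity / epsilon) ** 2


def epsilon_for_variance(sensitivity, v):
    """Invert the variance formula: epsilon = sensitivity * sqrt(2 / v)."""
    return sensitivity * (2 / v) ** 0.5


rng = random.Random(0)                 # fixed seed, only for repeatability
ratings = [3, 1, 2]                    # hypothetical ratings in 0..5
print(noisy_sum(ratings, 5, 0.1, rng))
print(variance(5, 0.1), epsilon_for_variance(5, 50))
```

With sensitivity 5, a variance of 5000 corresponds to ε = 0.1 and a variance of 50 corresponds to ε = 1: a more accurate answer leaks more about each individual.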
Carol gets no money!
Fix variance v:
ε(v) = sup_S log(P[ĉ(D) ∈ S] / P[ĉ(D’) ∈ S])
Data Pricing
W(ε) = Carol’s valuation function
Carol’s compensation W depends on ε which depends on v
[Figure: W(ε) – Option A and W(ε) – Option B plotted against ε; $5 and $10 marked]
Incentivizing Carol to reveal her valuation W(ε) is difficult! [Ghosh’11,Gkatzelis’12,Riederer’12] We use an idea from [Aperjis&Huberman’11]:
Risk-averse users count on the fact that most queries will have low privacy leak
Risk-neutral users want full compensation at the risk of never being paid
– Google Maps vs. iOS Maps
– Facebook’s users