Low-Cost Learning via Active Data Procurement
EC 2015. Jacob Abernethy, Yiling Chen, Chien-Ju Ho, Bo Waggoner
General problem: buy data in order to learn a hypothesis (predictor) h.
Learners LLC: We Buy Data!
Example: each person has medical data, and we want to learn to predict a disease.
[Figure: data points z1, z2 drawn from a distribution, with costs c1, c2; output hypothesis h]
Example arrivals (data, price): (65, $0.22), (30, $0.41), (65, $0.88).
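The example can be turned into a tiny posted-price round. This is a minimal sketch under our own assumptions: the dollar amounts are read as the agents' costs, an agent sells if and only if the posted price is at least her cost, and every seller is paid the posted price. The function name `posted_price_round` is hypothetical.

```python
def posted_price_round(arrivals, price):
    """Offer one take-it-or-leave-it price to each arriving agent.

    arrivals holds (data, cost) pairs. ASSUMPTION: the slide's dollar amounts
    are the agents' costs, and an agent sells iff price >= cost.
    Every seller is paid the posted price, not her cost.
    """
    bought = [z for z, cost in arrivals if cost <= price]
    spent = price * len(bought)
    return bought, spent


arrivals = [(65, 0.22), (30, 0.41), (65, 0.88)]
bought, spent = posted_price_round(arrivals, price=0.50)
print(bought, spent)  # [65, 30] 1.0
```

At a price of $0.50 the second 65 is never bought, so the purchased sample over-represents cheap points: the mean of the bought data is 47.5 rather than about 53.3 for all three.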
[Figure: agent t with cost ct and data point zt; data drawn from a distribution; hypothesis h]
[Figure: hypothesis h and the VC-dimension]
[Figure: hypothesis h and the quantity γ, with a formula involving 1/T]
(Assume: γ is approximately known in advance.)
Related work: this work; Meir, Procaccia, Rosenschein 2012; Cummings, Ligett, Roth, Wu, Ziani 2015; Dekel, Fisher, Procaccia 2008; Ghosh, Ligett, Roth, Schoenebeck 2014; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015.
[Figure: the mechanism. Agent t arrives with cost ct and data point zt; the learning algorithm Alg, holding the current hypothesis ht, receives either de-biased data or a null data point.]
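The loop in the figure (current hypothesis, a purchase decision, then either de-biased data or a null data point) can be sketched as importance-weighted online gradient descent. Everything specific here is our assumption for illustration, not the paper's mechanism: a scalar linear model with squared loss, a purchase probability q_t = min(1, |grad| / (K·√c_t)) that is allowed to depend on the point's gradient, payment of exactly the agent's cost, and a hard stop when the budget cannot cover a cost. The name `procure_and_learn` is hypothetical.

```python
import math
import random


def procure_and_learn(stream, K, budget, eta, rng):
    """Importance-weighted online gradient descent with random purchases.

    Hypothetical sketch. stream holds (x, y, cost) triples; the hypothesis h is
    a scalar weight with squared loss (h * x - y) ** 2 / 2.
    ASSUMPTIONS (ours): buy with probability q_t = min(1, |grad| / (K * sqrt(cost))),
    pay exactly `cost`, and skip any point whose cost exceeds the remaining budget.
    """
    h, spent = 0.0, 0.0
    for x, y, cost in stream:
        grad = (h * x - y) * x  # gradient of the loss at the current hypothesis h_t
        if cost == 0:
            q = 1.0  # free data is always taken
        else:
            q = min(1.0, abs(grad) / (K * math.sqrt(cost)))
        if q > 0 and spent + cost <= budget and rng.random() < q:
            spent += cost
            # De-biased step: ignoring the budget cap, its expectation is grad.
            h -= eta * grad / q
        # Otherwise Alg receives a "null data point" and h is unchanged.
    return h, spent


stream = [(1.0, 2.0, 1.0)] * 10  # every point says y = 2x and costs 1
h, spent = procure_and_learn(stream, K=1e-9, budget=3.0, eta=0.5, rng=random.Random(0))
print(h, spent)  # 1.75 3.0
```

With a tiny K every q_t equals 1, so the first three points are bought (h goes 0, 1, 1.5, 1.75) and the budget then blocks further purchases. Dividing by q_t is what keeps rarely bought points from being under-weighted.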
Lower and upper bounds, with γ appearing in the bounds (an analogue of the VC-dimension).
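A heuristic calculation shows how an average like γ can enter a budgeted regret bound. It is our own sketch, not the paper's proof, and the γ written here is an illustrative definition. It assumes online gradient descent with step size η over a domain of diameter D, an importance-weighted gradient ∇_t/q_t when point t is bought (zero otherwise), purchase probabilities q_t that never hit the cap of 1, and the budget B spent exactly in expectation.

```latex
\begin{align*}
\mathbb{E}[\mathrm{Regret}]
  &\le \frac{D^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T} \frac{\|\nabla_t\|^2}{q_t},
  \qquad \nabla_t = \nabla \mathrm{loss}(h_t, z_t),
  \qquad \sum_{t=1}^{T} q_t c_t = B, \\
\Bigl(\sum_{t=1}^{T} \|\nabla_t\| \sqrt{c_t}\Bigr)^2
  &\le \Bigl(\sum_{t=1}^{T} \frac{\|\nabla_t\|^2}{q_t}\Bigr)
       \Bigl(\sum_{t=1}^{T} q_t c_t\Bigr)
  \quad \text{(Cauchy--Schwarz)}, \\
\text{equality holds for } q_t &= \frac{\|\nabla_t\|}{K \sqrt{c_t}},
  \qquad K = \frac{T\gamma}{B},
  \qquad \gamma = \frac{1}{T}\sum_{t=1}^{T} \|\nabla_t\| \sqrt{c_t}, \\
\text{and with } \eta = \frac{D\sqrt{B}}{T\gamma}:\quad
\mathbb{E}[\mathrm{Regret}]
  &\le D \sqrt{\frac{(T\gamma)^2}{B}} = \frac{D\,T\gamma}{\sqrt{B}}.
\end{align*}
```

Under these assumptions γ is small when expensive points tend to have small gradients and large when costs and informative data are badly correlated.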
Thanks!
Naive 1: post a price of 1, obtain B points, and run a learner on them. Naive 2: post lower prices and obtain biased data; then what? Roth-Schoenebeck (EC 2012): draw prices from a distribution, obtain biased data, and de-bias it.
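The "draw prices from a distribution, then de-bias" idea can be illustrated with inverse-probability weighting on a mean-estimation task. This is a minimal sketch, not Roth and Schoenebeck's mechanism: the uniform price distribution on [0, max_price], the toy task, and the function names are our assumptions.

```python
import random


def purchase_prob(cost, max_price):
    """Chance that a Uniform[0, max_price] price is at least `cost`."""
    return max(0.0, 1.0 - cost / max_price)


def naive_mean(agents, rng, max_price=1.0):
    """Average of the bought points, ignoring how they were selected (biased)."""
    bought = [z for z, c in agents if rng.uniform(0.0, max_price) >= c]
    return sum(bought) / len(bought)


def debiased_mean(agents, rng, max_price=1.0):
    """Inverse-probability-weighted estimate of the population mean of z.

    Each bought point is weighted by 1 / Pr[bought]. Assumes every cost is
    strictly below max_price, so that probability is positive.
    """
    total = 0.0
    for z, c in agents:
        if rng.uniform(0.0, max_price) >= c:
            total += z / purchase_prob(c, max_price)
    return total / len(agents)


# Costs correlate with data: z = 1 costs 0.8, z = 0 costs 0.2; the true mean is 0.5.
agents = [(1, 0.8) if i % 2 else (0, 0.2) for i in range(100_000)]
print(naive_mean(agents, random.Random(1)))     # close to 0.2
print(debiased_mean(agents, random.Random(0)))  # close to 0.5
```

Cheap points are bought four times as often as expensive ones, so the naive average is pulled toward the cheap data; weighting each bought point by the inverse of its purchase probability removes that bias at the cost of extra variance from rarely bought points.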
Related work (annotated by model): this work; Meir, Procaccia, Rosenschein 2012; Cummings, Ligett, Roth, Wu, Ziani 2015; Dekel, Fisher, Procaccia 2008; Ghosh, Ligett, Roth, Schoenebeck 2014; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015. Models: agents can fabricate data (as in peer prediction); principal-agent style, where the data depends on effort; agents cannot fabricate data and have costs.
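Experiments: test on half of the data; the learning algorithm Alg is gradient descent.
Naive: pay 1 per data point until the budget is exhausted, then run Alg.
Baseline: run Alg on all data points (no budget).
Large γ: bad correlations between cost and data. Small γ: cost independent of data.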
(Assumes approximate knowledge of K; in practice, K can be estimated online.)
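One way to estimate K online is a plug-in estimate from the rounds seen so far. This sketch rests on our own assumption that purchases use q_t = ‖∇loss(ht, zt)‖ / (K·√ct), in which case spending the budget B in expectation over T rounds requires K = T·γ/B with γ = (1/T)·Σ ‖∇loss(ht, zt)‖·√ct; the running average below stands in for the unknown γ. The function name `estimate_K` is hypothetical.

```python
import math


def estimate_K(T, budget, grad_norms, costs):
    """Plug-in estimate of the normalizer K from the rounds observed so far.

    ASSUMPTION (ours): q_t = ||grad_t|| / (K * sqrt(c_t)), so the expected
    spend sum_t q_t * c_t equals B exactly when K = T * gamma / B, where
    gamma = (1/T) * sum_t ||grad_t|| * sqrt(c_t).
    """
    t = len(grad_norms)
    gamma_hat = sum(g * math.sqrt(c) for g, c in zip(grad_norms, costs)) / t
    return T * gamma_hat / budget


print(estimate_K(T=100, budget=50.0, grad_norms=[1.0, 1.0], costs=[1.0, 4.0]))  # 3.0
```

In a real procurement setting the gradient at an unbought point may not be observable, so a practical estimator would need to account for that; the sketch assumes every round's gradient norm and cost are visible.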
“…cost” variant of the no-regret setting
[Formula involving T, K, and the gradient norm ‖∇loss(ht, zt)‖]