Low-Cost Learning via Active Data Procurement
September 2015 Jacob Abernethy Yiling Chen Chien-Ju Ho Bo Waggoner
1
2
Coming soon to a society near you
data-holders (ex: medical data)
data-needers (ex: pharmaceutical co.)
3
[Diagram: a data source supplies data points z1, z2; the learner outputs a hypothesis h]
4
[Diagram: hypothesis h]
5
[Diagram: a data source supplies data points z1, z2 held at costs c1, c2; the learner outputs a hypothesis h]
6
Studying ACTN-3 mutation and endurance running
[Diagram labels: have mutation; runners]
7
Paying $10 for data (to study HIV)
HIV-negative: yes, yes, no, yes, yes
HIV-positive: no, no, yes
8
9
entropies, gradients, loss functions, divergences
auctions, budgets, value distributions, reserve prices
10
11
12
13
Meir, Procaccia, Rosenschein 2012
Cummings, Ligett, Roth, Wu, Ziani 2015
Dekel, Fisher, Procaccia 2008
Ghosh, Ligett, Roth, Schoenebeck 2014
Horel, Ioannidis, Muthukrishnan 2014
Roth, Schoenebeck 2012
Ligett, Roth 2012
Cai, Daskalakis, Papadimitriou 2015
can fabricate data (like in peer-prediction)
principal-agent style, data depends on effort
agents cannot fabricate data, have costs
this work
14
Conducting Truthful Surveys, Cheaply
15
[Diagram: a data source supplies i.i.d. data points held at costs c1, c2; the learner outputs a hypothesis h]
16
17
18
Cai, Daskalakis, Papadimitriou 2015
19
20
21
this work Abernethy, Frongillo, W. NIPS 2015
22
Propose model of online learning with purchased data: T arriving data points and budget B.
Convert any “FTRL” algorithm into a mechanism.
Show regret on order of T / √B and lower bounds of same order.
23
24
25
26
Regret = $\sum_{t=1}^{T} \ell(h_t, z_t) - \sum_{t=1}^{T} \ell(h^*, z_t)$, where $h^*$ minimizes the second sum
27
Assume: the loss function is convex and Lipschitz, the hypothesis space is a Hilbert space, etc.
28
29
Example (multiplicative weights): $h_t(j) \propto h_{t-1}(j)\,\exp[-\eta\,\nabla_j \ell(h_{t-1}, z_t)]$
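As a rough illustration of the update above, the following is a minimal sketch of one multiplicative-weights (exponentiated-gradient) step over a finite set of coordinates j. It assumes the loss-minimizing sign convention, so weights shrink where the gradient is large. The function name, the step size `eta`, and the example gradient are hypothetical and not taken from the slides.

```python
import numpy as np

def multiplicative_weights_step(h_prev, grad, eta):
    """One step: h_t(j) proportional to h_{t-1}(j) * exp(-eta * grad[j])."""
    h_prev = np.asarray(h_prev, dtype=float)
    grad = np.asarray(grad, dtype=float)
    # Subtracting the minimum gradient rescales every weight by a common
    # factor, which cancels on normalization and avoids overflow in exp.
    weights = h_prev * np.exp(-eta * (grad - grad.min()))
    return weights / weights.sum()

# Hypothetical example: three coordinates, uniform starting point.
h0 = np.full(3, 1.0 / 3.0)
gradient = np.array([1.0, 0.0, 0.5])  # assumed gradient of the loss at h0
h1 = multiplicative_weights_step(h0, gradient, eta=0.5)
```

After the step, `h1` is still a probability vector, and the coordinate with the largest gradient carries the least weight.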
30
[Equation: regret bound in terms of $\Delta_t^2$], where $\Delta_t = \lVert \nabla \ell(h_t, z_t) \rVert$.
31
32
33
[Diagram: cost ct, data point zt]
34
data:  (32,12)  (20,18)  (32,12)
price: $0.22    $0.41    $0.88
[Diagram: cost ct, data point zt]
35
36
where h* minimizes sum
[Diagram: cost ct, data point zt]
37
38
[Diagram: cost ct, data point zt]
data:  (32,12)  (20,18)  (32,12)
price: $1       $0       $0
39
40
data:        (32,12)  (20,18)  (32,12)
Pr[price=1]: 0.3      0.06     0.41
[Equation residue: … ² / qt) ]
41
See also: Importance-Weighted Active Learning, Beygelzimer et al, ICML 2009.
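The importance-weighting idea referenced here can be sketched as follows, under assumptions not stated on the slide: each point is obtained with some probability q (for example, by posting price 1 with probability q to an agent whose cost is at most 1), and a purchased point's gradient is scaled by 1/q so that the estimate stays unbiased. The function name and the numbers are hypothetical.

```python
import numpy as np

def importance_weighted_gradient(grad, q, rng):
    """Return (estimate, purchased): grad / q with probability q, else zeros."""
    grad = np.asarray(grad, dtype=float)
    purchased = rng.random() < q
    if purchased:
        estimate = grad / q  # reweight so the expectation equals grad
    else:
        estimate = np.zeros_like(grad)  # no data bought, no update
    return estimate, purchased

# Hypothetical check that the estimate is unbiased on average.
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0])
samples = [importance_weighted_gradient(true_grad, 0.3, rng)[0]
           for _ in range(20000)]
mean_estimate = np.mean(samples, axis=0)
```

The expected squared norm of the estimate is the squared norm of the true gradient divided by q, which is one way to read the cost of using a small purchase probability.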
42
43
(Predict a repeated coin toss whose bias is either 1+1/√B or 1-1/√B )
44
data, cost:   (32,12), c=0.3   (20,18), c=0.8
Pr[purchase]: 0.12             0.08
45
[Equation residue: … ² / qt) and … / (K √ct), where K is a normalizing constant.]
[Diagram: cost ct, data point zt]
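One way to see why a purchase probability of the form … / (K √ct) is natural: if the regret bound grows with the sum of Δt² / qt and the expected spend is the sum of ct · qt, held to at most B, then a Lagrangian argument gives qt proportional to Δt / √ct. The sketch below is an offline illustration under those assumptions. The numerator Δt, the cap at 1, the choice of K from the whole sequence, and all names are assumptions rather than details taken from the slides.

```python
import numpy as np

def purchase_probabilities(delta, cost, budget):
    """Assumed rule: q_t = min(1, delta_t / (K * sqrt(c_t)))."""
    delta = np.asarray(delta, dtype=float)
    cost = np.asarray(cost, dtype=float)
    # K is chosen so that, before capping at 1, sum(cost * q) equals budget.
    K = np.sum(delta * np.sqrt(cost)) / budget
    q = np.ones_like(delta)  # free points (cost 0) are always taken
    paid = cost > 0
    if K > 0:
        q[paid] = np.minimum(1.0, delta[paid] / (K * np.sqrt(cost[paid])))
    else:
        q[paid] = 0.0  # every paid point has zero gradient norm
    return q, K

# Hypothetical gradient norms and costs for three arriving points.
deltas = np.array([1.0, 1.0, 0.5])
costs = np.array([0.25, 1.0, 0.25])
budget = 0.5
q, K = purchase_probabilities(deltas, costs, budget)
expected_spend = float(np.sum(costs * q))
```

Capping can only lower a probability, so the expected spend never exceeds the budget. An online mechanism does not see the whole sequence in advance, so it would need some other way to set K.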
46
47
(Same bad instance, but with “useless” free data points sprinkled in.)
48
49
50
[Diagram: cost ct, data point zt]
[Equation residue: … / (K √ct).]
51
[Diagram: cost ct, data point zt]
[Equation residue: two expressions of the form … / (K √ct).]
52
53
54
55
Propose model of online learning with purchased data: T arriving data points and budget B.
Convert any “FTRL” algorithm into a mechanism.
Show regret on order of T / √B and lower bounds of same order.
56
VC-dim
[Diagram: a data source supplies data points z1, z2; the learner outputs a hypothesis h]
57
[Diagram: a data source supplies data points z1, z2 held at costs c1, c2; the learner outputs a hypothesis h]
58
59
Proof: known “online-to-batch conversion”: regret R ⇒ risk R/T
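The conversion can be sketched in one chain of inequalities, assuming i.i.d. data, a loss that is convex in h, an online algorithm whose expected regret is at most R, and the averaged hypothesis as the output. The symbols $L$, $\bar h$, and the fixed comparator $h^*$ (for example, a risk minimizer) are notation introduced here for illustration.

```latex
% L(h) = E_z[\ell(h, z)] is the risk; \bar h = (1/T) \sum_t h_t.
\begin{align*}
\mathbb{E}\big[L(\bar h)\big]
  &\le \frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\big[L(h_t)\big]
     && \text{(Jensen, convexity of } \ell\text{)} \\
  &= \frac{1}{T}\,\mathbb{E}\Big[\sum_{t=1}^{T}\ell(h_t, z_t)\Big]
     && (h_t \text{ is independent of } z_t) \\
  &\le \frac{1}{T}\,\mathbb{E}\Big[\sum_{t=1}^{T}\ell(h^*, z_t)\Big] + \frac{R}{T}
     && \text{(regret at most } R\text{)} \\
  &= L(h^*) + \frac{R}{T}.
\end{align*}
```

With regret on the order of T / √B, this reading gives excess risk on the order of 1 / √B.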
60
61
62
63
64
Thanks!
65
66
67
test on half
Descent
Naive: pay 1 until budget is exhausted, then run alg
Baseline: run alg on all data points (no budget)
Large γ: bad correlations
Small γ: independent cost/data
68