Low-Cost Learning via Active Data Procurement
October 2015. Jacob Abernethy, Yiling Chen, Chien-Ju Ho, Bo Waggoner.
Coming soon to a society near you: data-needers (ex: a pharmaceutical co.) and data-holders (ex: people with medical data).
Classical learning: a data source supplies examples z1, z2, ..., and the learner outputs a hypothesis h.
This setting: each data point zi is held by an agent with a cost ci for providing it; the learner must purchase data before outputting h.
Example: studying the ACTN-3 mutation and endurance running (the populations of interest: those who have the mutation, and runners).
Example: paying a flat $10 for data (to study HIV).

  HIV-negative:  yes  yes  no  yes  yes
  HIV-positive:  no   no   yes
Two toolkits meet: entropies, gradients, loss functions, divergences (from learning); auctions, budgets, value distributions, reserve prices (from mechanism design).
Related work: Cummings, Ligett, Roth, Wu, Ziani 2015; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015 (principal-agent style, where data depends on effort). This work: agents cannot fabricate data, but have costs.
Waggoner, Frongillo, Abernethy NIPS 2015: prediction-market style mechanism
Conducting Truthful Surveys, Cheaply
Model: (cost, data) pairs (ct, zt) are drawn i.i.d. from the data source; the learner purchases data and outputs h.
Results:
- Propose a model of online learning with purchased data: T arriving data points and a budget B.
- Convert any "FTRL" algorithm into a mechanism.
- Show regret on the order of T/√B, and lower bounds of the same order.
Regret = Σt ℓ(ht, zt) − Σt ℓ(h*, zt), where h* minimizes the sum Σt ℓ(h, zt) over fixed hypotheses h.
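As a concrete check of this definition, a minimal sketch with hypothetical losses (all numbers are illustrative only):

```python
import numpy as np

# Hypothetical per-round losses: rows = rounds t, columns = fixed hypotheses h.
losses = np.array([
    [0.9, 0.1],
    [0.8, 0.3],
    [0.2, 0.7],
])
# Losses actually incurred by the learner's chosen hypotheses h_t.
learner_losses = np.array([0.9, 0.3, 0.2])

# h* is the single best fixed hypothesis in hindsight.
best_fixed = losses.sum(axis=0).min()
regret = learner_losses.sum() - best_fixed
```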
Assume: the loss function is convex and Lipschitz, the hypothesis space is a Hilbert space, etc.
Multiplicative-weights (exponentiated gradient) update: ht(j) ∝ ht−1(j) · exp[ −η ∇ℓ(ht−1, zt)(j) ].
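A minimal sketch of this update, assuming the hypothesis ht is a probability vector on the simplex (the gradient values are hypothetical):

```python
import numpy as np

def mw_update(h_prev, grad, eta=0.1):
    """One exponentiated-gradient (multiplicative-weights) step on the simplex."""
    w = h_prev * np.exp(-eta * grad)
    return w / w.sum()

h = np.array([0.5, 0.5])
g = np.array([1.0, 0.0])   # hypothetical gradient of the loss at (h, z_t)
h = mw_update(h, g)
# coordinate 0 was penalized, so its weight decreases relative to coordinate 1
```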
The FTRL regret bound has the schematic form Regret ≤ const/η + η Σt Δt, where Δt = ǁ ∇ℓ(ht, zt) ǁ².
Each round t, an agent arrives holding a data point zt with cost ct.
First idea: post a price for each arriving point, depending on its data:

  data:   (32,12)  (20,18)  (32,12)
  price:  $0.22    $0.41    $0.88
Instead, post randomized take-it-or-leave-it prices of $1 or $0:

  data:   (32,12)  (20,18)  (32,12)
  price:  $1       $0       $0
Post price $1 with probability qt (else $0), where qt depends on the data:

  data:         (32,12)  (20,18)  (32,12)
  Pr[price=1]:  0.3      0.06     0.41

If the point is purchased, feed the algorithm the importance-weighted gradient ∇ℓ(ht, zt)/qt; the regret bound then holds with Δt replaced by E[ ǁ ∇ℓ(ht, zt) ǁ² / qt ].
See also: Importance-Weighted Active Learning, Beygelzimer et al, ICML 2009.
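The importance-weighting trick can be sketched as follows; the gradient and qt values are hypothetical, and the point is that the estimator is unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)

def iw_gradient(grad, q, rng):
    """Buy the point with probability q; if bought, reweight the gradient by 1/q.
    The estimator is unbiased: its expectation equals grad."""
    if rng.random() < q:
        return grad / q
    return np.zeros_like(grad)

grad = np.array([2.0, -1.0])   # hypothetical gradient
q = 0.25                       # hypothetical purchase probability
est = np.mean([iw_gradient(grad, q, rng) for _ in range(200_000)], axis=0)
# est is close to grad on average, at the price of higher variance (factor 1/q)
```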
(Lower bound instance: predict a repeated coin toss whose bias is either (1 + 1/√B)/2 or (1 − 1/√B)/2.)
The purchase probability now also depends on the cost:

  data, cost:    (32,12), c=0.3   (20,18), c=0.8
  Pr[purchase]:  0.12             0.08
Mechanism: buy the arriving point (ct, zt) with probability scaling as 1/√ct, i.e. qt = … / (K √ct), with K a normalizing constant.
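A rough sketch of the 1/√ct scaling, with a hypothetical numerator of 1 (the slide's exact numerator is elided above); K trades off purchase frequency against spend:

```python
import numpy as np

def purchase_prob(c, K):
    """Probability of buying a point with cost c: proportional to 1/sqrt(c),
    capped at 1. The numerator (here 1.0) is a hypothetical placeholder."""
    return min(1.0, 1.0 / (K * np.sqrt(c)))

costs = [0.3, 0.8]
K = 3.0  # in the mechanism, K would be tuned so expected spend fits the budget
probs = [purchase_prob(c, K) for c in costs]
# cheaper points are bought more often than expensive ones
```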
(Same bad instance, but with “useless” free data points sprinkled in.)
Summary:
- A model of online learning with purchased data: T arriving data points and a budget B.
- Any "FTRL" algorithm converts into a mechanism.
- Regret on the order of T/√B, with lower bounds of the same order.
Extension to the batch (statistical) setting, e.g. hypothesis classes of bounded VC-dimension: (ct, zt) pairs are drawn i.i.d. from the data source, the learner purchases data and outputs h.
Proof: the known "online-to-batch conversion": regret R ⇒ risk R/T.
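A small sketch of the conversion, assuming a convex loss (here a hypothetical squared distance), showing Jensen's inequality at work:

```python
import numpy as np

# Online-to-batch conversion: run the online learner on T i.i.d. samples and
# output the average of its iterates h_1, ..., h_T. For a convex loss, Jensen's
# inequality bounds risk(h_bar) by the average risk of the h_t's, which regret
# analysis bounds by (best risk) + R/T. All numbers here are hypothetical.
iterates = np.array([[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]])  # h_1, ..., h_T
h_bar = iterates.mean(axis=0)

def risk(h, z=np.array([0.5, 0.5])):
    # a convex loss in h: squared distance to a hypothetical target z
    return float(np.sum((h - z) ** 2))

avg_risk = np.mean([risk(h) for h in iterates])
# Jensen: the averaged hypothesis is at least as good as the average iterate
```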
Thanks!
Appendix: experiments. Train on half of the data, test on the other half. Compared: the descent-based mechanism; Naive (pay 1 per point until the budget is exhausted, then run the algorithm); Baseline (run the algorithm on all data points, no budget). Large γ: bad correlations between costs and data; small γ: independent costs and data.
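A toy sketch of the budget mechanics of Naive versus a 1/√c purchase rule (the cost stream, numerator, and constants are all hypothetical; this illustrates spending only, not learning performance):

```python
import numpy as np

rng = np.random.default_rng(1)
costs = rng.uniform(0.1, 1.0, size=1000)  # hypothetical cost stream
B = 50.0                                  # budget

# Naive: pay $1 for every point until the budget runs out.
naive_bought = int(B // 1.0)  # buys only the first ~B points, ignoring costs

# Cost-aware: buy with probability proportional to 1/sqrt(c), stop at budget.
spent, smart_bought = 0.0, 0
for c in costs:
    q = min(1.0, 0.5 / np.sqrt(c))
    if rng.random() < q and spent + c <= B:
        spent += c
        smart_bought += 1
# the cost-aware rule stretches the same budget over many more data points
```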