SLIDE 1

Low-Cost Learning via Active Data Procurement

October 2015 Jacob Abernethy Yiling Chen Chien-Ju Ho Bo Waggoner

SLIDE 2

Coming soon to a society near you

[Diagram: data-holders (ex: medical data) and data-needers (ex: pharmaceutical co.)]

SLIDE 3

Classic ML problem

[Diagram: a data source sends points z1, z2, … to the learning alg, which outputs hypothesis h to the data-needer]

Goal: use a small amount of data, output a “good” h.

SLIDE 4

Example learning task: classification

  • Data: (point, label) pairs, where each label is one of two classes
  • Hypothesis: a hyperplane h separating the two types

SLIDE 5

Twist: data is now held by individuals

[Diagram: data-holders with costs c1, c2, … send data z1, z2, … through a mechanism, which outputs hypothesis h to the data-needer]

“Cost of revealing data” (formal model later…)

Goal: spend a small budget, output a “good” h.

SLIDE 6

Why is this difficult?

  • 1. (Relatively) few data points are useful

Example: studying the ACTN-3 mutation and endurance running. [Diagram: “have mutation” and “runners” overlap in only a small set of useful data points]

SLIDE 7

Why is this difficult?

  • 2. Utility of data may be correlated with cost (causing bias)

Example: paying $10 per data point to study HIV. The HIV-negative individuals mostly accept (yes, yes, no, yes, yes) while the HIV-positive individuals mostly decline (no, no, yes), so the purchased sample is biased.

SLIDE 8

Why is this difficult?

  • 2. Utility of data may be correlated with cost (causing bias)

Example: paying $10 per data point to study HIV. The HIV-negative individuals mostly accept (yes, yes, no, yes, yes) while the HIV-positive individuals mostly decline (no, no, yes), so the purchased sample is biased.

Machine Learning roadblock: how to deal with biases?

SLIDE 9

Why is this difficult?

  • 3. Utility (ML) and cost (econ) live in different worlds

learning alg: entropies, gradients, loss functions, divergences

mechanism: auctions, budgets, value distributions, reserve prices

SLIDE 10

Why is this difficult?

  • 3. Utility (ML) and cost (econ) live in different worlds

learning alg: entropies, gradients, loss functions, divergences

mechanism: auctions, budgets, value distributions, reserve prices

Econ roadblock: how to assign value to data?

SLIDE 11

Broad research challenge:

  • 1. How to assign value (prices) to pieces of data?
  • 2. How to design mechanisms for procuring and learning from data?
  • 3. Develop a theory of budget-constrained learning: what is (im)possible to learn given budget B and the parameters of the problem?

SLIDE 12

Outline

  • 1. Overview of literature, our contributions
  • 2. Online learning model/results
  • 3. “Statistical learning” result, conclusion

SLIDE 13

Related work

How are agents strategic?

[Table of related work along this axis: Cummings, Ligett, Roth, Wu, Ziani 2015; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015. One end is principal-agent style, where data depends on effort; the other, including this work, assumes agents cannot fabricate data but have costs.]

SLIDE 14

Related work

[Table of related work as before: Cummings, Ligett, Roth, Wu, Ziani 2015; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015; principal-agent style (data depends on effort) vs. agents cannot fabricate data but have costs (this work).]

Second axis, type of goal: minimize variance (or a related goal) vs. risk/regret bounds.

SLIDE 15

Related work

[Table of related work as before: Cummings, Ligett, Roth, Wu, Ziani 2015; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015; principal-agent style (data depends on effort) vs. agents cannot fabricate data but have costs (this work). Second axis, type of goal: minimize variance (or a related goal) vs. risk/regret bounds.]

Waggoner, Frongillo, Abernethy NIPS 2015: prediction-market style mechanism

SLIDE 16

Conducting Truthful Surveys, Cheaply

  • Each datapoint is a number; the task is to estimate the mean
  • Approach: offer each agent a price drawn i.i.d.
  • Goal: minimize the estimate’s variance

e.g. Roth-Schoenebeck, EC 2012

[Diagram: i.i.d. data source; agents with costs c1, c2, … sell to a mechanism that outputs estimate h]

SLIDE 17

What we wanted to do differently

  • 1. Prove ML-style risk or regret bounds

Why: an ML-style approach lets us understand the error rate as a function of the budget and the characteristics of the problem.

  • 2. Interface with existing ML algorithms.

Why: understand how value derives from learning alg. Toward black-box use of learners in mechanisms.

  • 3. Online data arrival

Why: active-learning approach, simpler model

SLIDE 18

Overview of our contributions

  • Propose a model of online learning with purchased data: T arriving data points and budget B.
  • Convert any “FTRL” algorithm into a mechanism.
  • Show regret on the order of T / √B, and lower bounds of the same order.

SLIDE 19

  • Extend the model to the case where data is drawn i.i.d. (“statistical learning”).
  • Extend the result to a “risk” bound on the order of 1 / √B.


Overview of our contributions

  • Propose a model of online learning with purchased data: T arriving data points and budget B.
  • Convert any “FTRL” algorithm into a mechanism.
  • Show regret on the order of T / √B, and lower bounds of the same order.

SLIDE 20

Outline

  • 1. Overview of literature, our contributions
  • 2. Online learning model/results
  • 3. “Statistical learning” result, conclusion

SLIDE 21

Online learning with purchased data

  • a. Review of online learning
  • b. Our model: adding $$
  • c. Deriving our mechanism and results

SLIDE 22

Standard online learning model

For t = 1, …, T:

  • algorithm posts a hypothesis ht
  • data point zt arrives
  • algorithm sees zt and updates to ht+1

Loss = ∑t ℓ(ht, zt)
Regret = Loss − ∑t ℓ(h*, zt), where h* minimizes ∑t ℓ(h, zt).
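The protocol is easy to state in code. A minimal sketch (mine, not from the talk) using a one-dimensional squared loss, for which the best fixed hypothesis h* in hindsight is just the mean of the data:

```python
import numpy as np

def online_learning(data, update, h0=0.0):
    """Run the standard protocol: post h_t, see z_t, suffer loss, update.

    Uses loss(h, z) = (h - z)^2, so h* (the best fixed hypothesis
    in hindsight) is the mean of the data.
    """
    h, total_loss = h0, 0.0
    for z_t in data:
        total_loss += (h - z_t) ** 2   # loss on the posted hypothesis
        h = update(h, z_t)             # update to h_{t+1} after seeing z_t
    h_star = float(np.mean(data))
    best_loss = float(sum((h_star - z) ** 2 for z in data))
    return total_loss, total_loss - best_loss   # (Loss, Regret)

def make_ftl_update():
    # follow-the-leader: play the minimizer of past losses (the running mean)
    seen = []
    def update(h, z):
        seen.append(z)
        return float(np.mean(seen))
    return update

rng = np.random.default_rng(0)
data = rng.normal(0.5, 1.0, size=1000)
loss, regret = online_learning(data, make_ftl_update())
```

Here the regret stays small relative to the total loss because the data are i.i.d.; the adversarial guarantees in the talk require the regularization discussed on the next slides.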

SLIDE 23

Follow-the-Regularized-Leader (FTRL)

Assume: the loss function is convex and Lipschitz, the hypothesis space is a Hilbert space, etc.

Algorithm: ht = argminh [ ∑s<t ℓ(h, zs) + R(h)/η ]

SLIDE 24

Follow-the-Regularized-Leader (FTRL)

Assume: the loss function is convex and Lipschitz, the hypothesis space is a Hilbert space, etc.

Algorithm: ht = argminh [ ∑s<t ℓ(h, zs) + R(h)/η ]

Example 1 (Euclidean norm): R(h) = ‖h‖²
⇒ ht+1 = ht − η ∇ℓ(ht, zt)   (online gradient descent)
SLIDE 25

Follow-the-Regularized-Leader (FTRL)

Assume: the loss function is convex and Lipschitz, the hypothesis space is a Hilbert space, etc.

Algorithm: ht = argminh [ ∑s<t ℓ(h, zs) + R(h)/η ]

Example 1 (Euclidean norm): R(h) = ‖h‖²
⇒ ht+1 = ht − η ∇ℓ(ht, zt)   (online gradient descent)

Example 2 (negative entropy): R(h) = ∑j h(j) ln h(j)
⇒ ht+1(j) ∝ ht(j) exp[ −η ∇ℓ(ht, zt)(j) ]   (multiplicative weights)
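Both instantiations are a few lines each. A sketch of the two special cases (my code, with gradients supplied as callables for gradient descent, and the conventional minus sign in the multiplicative-weights exponent):

```python
import numpy as np

def ogd(gradients, eta, d):
    """Example 1: R(h) = ||h||^2 gives h_{t+1} = h_t - eta * grad."""
    h = np.zeros(d)
    for grad in gradients:            # grad is a callable of the current h
        h = h - eta * grad(h)
    return h

def multiplicative_weights(loss_vectors, eta):
    """Example 2: negative-entropy R gives
    h_{t+1}(j) proportional to h_t(j) * exp(-eta * loss_t(j))."""
    d = len(loss_vectors[0])
    h = np.full(d, 1.0 / d)
    for loss_t in loss_vectors:
        h = h * np.exp(-eta * np.asarray(loss_t))
        h = h / h.sum()               # renormalize onto the simplex
    return h

# OGD on l(h, z) = 0.5*(h - z)^2 with every z = 1: h converges to 1.
h_ogd = ogd([lambda h: h - 1.0] * 50, eta=0.5, d=1)

# MW with expert 0 always suffering zero loss: mass concentrates on it.
h_mw = multiplicative_weights([[0.0, 1.0]] * 50, eta=0.1)
```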

SLIDE 26

Regret Bound for FTRL

Fact: the regret of FTRL is bounded by O( 1/η + η ∑t Δt² ), where Δt = ‖ ∇ℓ(ht, zt) ‖.

SLIDE 27

Regret Bound for FTRL

Fact: the regret of FTRL is bounded by O( 1/η + η ∑t Δt² ), where Δt = ‖ ∇ℓ(ht, zt) ‖.

We know Δt ≤ 1 by assumption, so we can choose η = 1/√T and get Regret ≤ O(√T). “No regret”: the average regret → 0.
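The step from the bound to the choice of η can be made explicit; a short (standard) derivation, with the constants hidden by the O-notation:

```latex
\text{Regret} \;\le\; \frac{1}{\eta} + \eta \sum_{t} \Delta_t^2
             \;\le\; \frac{1}{\eta} + \eta T \qquad (\Delta_t \le 1).
% The right-hand side is minimized where the two terms balance:
% 1/\eta = \eta T, i.e. \eta = 1/\sqrt{T}, giving Regret <= 2\sqrt{T}.
```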

SLIDE 28

Online learning with purchased data

  • a. Review of online learning
  • b. Our model: adding $$
  • c. Deriving our mechanism and results

SLIDE 29

First: model of strategic data-holder

Model of agent:

  • holds data zt and cost ct
  • cost is a threshold price:
    ○ agent agrees to sell its data iff price ≥ ct
    ○ interpretations: privacy, transaction cost, …
  • Assume: all costs ≤ 1

SLIDE 30

Model of agent-mechanism interaction

  • Mechanism posts a menu of prices offered:

    data:  (32,12)  (20,18)  (32,12)
    price: $0.22    $0.41    $0.88

  • agent t arrives
  • If ct ≤ price(zt), the agent accepts:
    ○ agent reveals (zt, ct)
    ○ mechanism pays the agent price(zt)
  • Otherwise, the agent rejects:
    ○ mechanism learns only that the agent rejected, and pays nothing
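The accept/reject rule is mechanical. A toy sketch of one round of this interaction (names are mine); note that the mechanism pays the posted price, not the agent's cost:

```python
def interact(price_of, z_t, c_t):
    """One round: the agent accepts iff the posted price for its data
    point is at least its threshold cost c_t."""
    price = price_of(z_t)
    if c_t <= price:
        # agent reveals (z_t, c_t); mechanism pays the posted price
        return {"accepted": True, "data": z_t, "cost": c_t, "paid": price}
    # mechanism only learns that the agent rejected, and pays nothing
    return {"accepted": False, "paid": 0.0}

# a toy menu assigning a price to each possible data point
menu = {(32, 12): 0.22, (20, 18): 0.41}
price_of = lambda z: menu.get(z, 0.0)

r1 = interact(price_of, (32, 12), c_t=0.10)  # 0.10 <= 0.22, accepts
r2 = interact(price_of, (32, 12), c_t=0.50)  # 0.50 >  0.22, rejects
```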

SLIDE 31

Recall: standard online learning model

For t = 1, …, T:

  • algorithm posts a hypothesis ht
  • data point zt arrives
  • algorithm sees zt and updates to ht+1
SLIDE 32

Our model: online learning with $$

For t = 1, …, T:

  • mechanism posts a hypothesis ht

and a menu of prices

  • data point zt arrives with cost ct
  • If ct ≤ menu price of zt: mech pays price, learns zt
  • else: mech pays nothing

Loss = ∑t ℓ(ht, zt)
Regret = Loss − ∑t ℓ(h*, zt), where h* minimizes ∑t ℓ(h, zt).

SLIDE 33

Online learning with purchased data

  • a. Review of online learning
  • b. Our model: adding $$
  • c. Deriving our mechanism and results

SLIDE 34

Start easy

Suppose all costs are 1. ⇒ We need only determine which data points to sample.

    data:  (32,12)  (20,18)  (32,12)
    price: $1       $0       $0

SLIDE 35

Start easy

Suppose all costs are 1. ⇒ We need only determine which data points to sample. Examples:

  • B = T/2
  • B = √T
  • B = log(T)

SLIDE 36

Key idea #1: randomly sample

We can purchase each data point zt with probability qt(zt). The menu is now randomly chosen:

    data:         (32,12)  (20,18)  (32,12)
    Pr[price=1]:  0.3      0.06     0.41

This modifies the FTRL regret bound to O( 1/η + η E[ ∑t Δt² / qt ] ).

SLIDE 37

Key idea #1: randomly sample

We can purchase each data point zt with probability qt(zt). The menu is now randomly chosen:

    data:         (32,12)  (20,18)  (32,12)
    Pr[price=1]:  0.3      0.06     0.41

Lemma (importance-weighted regret bound): For any qt’s, the regret of (modified) FTRL is O( 1/η + η E[ ∑t Δt² / qt ] ).

See also: Importance-Weighted Active Learning, Beygelzimer et al., ICML 2009.
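The reason the lemma goes through is that dividing by qt makes the sampled gradient unbiased. A quick empirical sketch of that fact (my code, not from the paper):

```python
import numpy as np

def sampled_weighted_grads(grads, probs, rng):
    """Buy point t with probability q_t; if bought, pass grad_t / q_t
    to the learner, else pass 0. E[output_t] = grad_t (unbiased)."""
    out = np.zeros_like(grads)
    for t, (g, q) in enumerate(zip(grads, probs)):
        if rng.random() < q:
            out[t] = g / q          # importance weight 1 / q_t
    return out

rng = np.random.default_rng(1)
grads = np.array([1.0, -2.0, 0.5, 3.0])
probs = np.array([0.5, 0.8, 0.3, 1.0])

# average over many independent runs: should recover grads
est = np.mean([sampled_weighted_grads(grads, probs, rng)
               for _ in range(20000)], axis=0)
```

The variance of each re-weighted term scales like 1/qt, which is exactly why the Δt²/qt terms appear in the lemma.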

SLIDE 38

Result for easy case

Lemma (importance-weighted regret bound): For any qt’s, the regret of (modified) FTRL is O( 1/η + η E[ ∑t Δt² / qt ] ).

Corollary: Setting all qt = B/T and choosing η = √B / T yields regret ≤ O( T / √B ).

“No data, no regret”: the average amount of data → 0 while the average regret → 0.

SLIDE 39

Result for easy case

Lemma (importance-weighted regret bound): For any qt’s, the regret of (modified) FTRL is O( 1/η + η E[ ∑t Δt² / qt ] ).

Corollary: Setting all qt = B/T and choosing η = √B / T yields regret ≤ O( T / √B ).

Theorem: This is tight.

(Lower-bound instance: predict a repeated coin toss whose bias is (1 ± 1/√B)/2.)

SLIDE 40

Now a bit harder….

Costs can be arbitrary, but agents are nonstrategic: they will accept a payment of exactly ct. At each time step, randomly choose which (data, cost) pairs to purchase. Question: how to set the purchase probabilities qt?

    data, cost:   (32,12), c=0.3    (20,18), c=0.8
    Pr[purchase]: 0.12              0.08

SLIDE 41

Key idea #2: sample proportional to...

Imagine we knew the arrivals in advance. Optimization problem:

    minimize  ∑t Δt² / qt
    s.t.      ∑t qt ct ≤ B,   qt ≤ 1.

Solution: qt = Δt / (K √ct)   (K a normalizing constant).

SLIDE 42

Key idea #2: sample proportional to...

Imagine we knew the arrivals in advance. Optimization problem:

    minimize  ∑t Δt² / qt
    s.t.      ∑t qt ct ≤ B,   qt ≤ 1.

Solution: qt = Δt / (K √ct)   (K a normalizing constant).

The point: we only need advance knowledge of K to implement the “optimal” sampling strategy! Turns out K = γT / B, where γ ∈ [0,1] (discussed later).
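Computing these probabilities is a one-liner once γ is known. A sketch (helper names are mine) that also checks the budget is spent exactly in expectation when no qt is clipped at 1:

```python
import numpy as np

def sampling_probs(deltas, costs, B):
    """q_t = Delta_t / (K sqrt(c_t)), with K = gamma * T / B and
    gamma = (1/T) sum_t Delta_t sqrt(c_t), so that the expected
    spend sum_t q_t c_t equals B (before clipping q_t at 1)."""
    deltas = np.asarray(deltas, float)
    costs = np.asarray(costs, float)
    T = len(deltas)
    gamma = float(np.mean(deltas * np.sqrt(costs)))
    K = gamma * T / B
    q = deltas / (K * np.sqrt(costs))
    return np.minimum(q, 1.0), gamma, K

deltas = [1.0, 0.5, 0.2, 1.0]
costs = [0.25, 1.0, 0.04, 1.0]
q, gamma, K = sampling_probs(deltas, costs, B=1.0)
expected_spend = float(np.sum(q * costs))
```

Plugging the formula into the budget constraint gives ∑t qt ct = (1/K) ∑t Δt √ct = γT/K, which equals B exactly when K = γT/B; the toy numbers above confirm this.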

SLIDE 43

Result for this “at-cost” setting

Theorem: Given a rough advance estimate of γ, can achieve regret ≤ γT / √B.

Theorem: This is tight (in a reasonable sense). (Same bad instance, but with “useless” free data points sprinkled in.)

Implication: γ captures the “difficulty of the problem”.

SLIDE 44

γ = (1/T) ∑t Δt √ct  =  the average of √(difficulty · cost).

Discussion

SLIDE 45

γ = (1/T) ∑t Δt √ct  =  the average of √(difficulty · cost).

  • Low average cost ⇒ low regret
  • Low average difficulty ⇒ low regret
  • Good correlations ⇒ low regret

Discussion

Example simplified corollary: Given a rough advance estimate of the average cost μ, regret ≤ √μ · T / √B.

SLIDE 46

Finally, the “full” problem.

Now agents are strategic and we must post prices. Recall: we had the sampling probability qt = Δt / (K √ct).

But: we don’t know ct.

SLIDE 47

Finally, the “full” problem.

Now agents are strategic and we must post prices. Recall: we had the sampling probability qt = Δt / (K √ct).

But: we don’t know ct.

Key idea #3: randomly draw the price from the distribution such that Pr[ price ≥ c ] = Δt / (K √c).

⇒ we achieve the “right” purchase probability for every possible ct simultaneously!
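One concrete way to realize such a price distribution is inverse-survival sampling: if U is uniform on (0, 1], then P = (Δt/(KU))² satisfies Pr[P ≥ c] = min(1, Δt/(K√c)) for every c, and capping P at 1 changes nothing since all costs are ≤ 1. A sketch with an empirical check (my code, under those assumptions):

```python
import numpy as np

def draw_price(delta, K, rng):
    """Random posted price P with Pr[P >= c] = min(1, delta/(K*sqrt(c)))
    for every c in (0, 1], simultaneously."""
    u = 1.0 - rng.random()            # uniform on (0, 1], avoids u = 0
    return min(1.0, (delta / (K * u)) ** 2)

rng = np.random.default_rng(0)
delta, K, c = 0.5, 2.0, 0.25
target = delta / (K * np.sqrt(c))     # 0.5 / (2 * 0.5) = 0.5
accept_rate = np.mean([draw_price(delta, K, rng) >= c
                       for _ in range(20000)])
```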

SLIDE 48

Description of final mechanism

Input: an estimate of γ. At each time t:

  • post hypothesis ht ← FTRL
  • for each possible data point zt, compute Δt = ‖ ∇ℓ(ht, zt) ‖ and post a random price drawn from the corresponding distribution
  • If the arriving agent accepts, send the “re-weighted” zt → FTRL
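Putting the pieces together on a toy task: the sketch below (my simplification, not the paper's implementation) estimates a mean with ℓ(h, z) = ½(h − z)², so Δt = |h − z|, using the posted-price distribution from key idea #3, importance-weighted gradient steps, and a hard stop when the budget runs out.

```python
import numpy as np

def run_mechanism(stream, B, T, gamma_hat, eta, rng):
    """Toy version of the full mechanism for 1-D mean estimation.

    loss(h, z) = 0.5*(h - z)^2, so grad = h - z and Delta_t = |h - z|.
    """
    K = gamma_hat * T / B
    h, spent = 0.0, 0.0
    for z, c in stream:
        if spent >= B:
            break                     # budget exhausted
        delta = abs(h - z)            # "difficulty" of this point
        if delta == 0.0:
            continue
        u = 1.0 - rng.random()        # uniform on (0, 1]
        price = min(1.0, (delta / (K * u)) ** 2)
        if c <= price:                # agent accepts the posted price
            spent += price
            q = min(1.0, delta / (K * np.sqrt(c)))
            h -= eta * (h - z) / q    # importance-weighted OGD step
    return h, spent

rng = np.random.default_rng(42)
T, B = 2000, 200.0
stream = [(rng.normal(1.0, 0.1), rng.uniform(0.01, 0.5)) for _ in range(T)]
h, spent = run_mechanism(stream, B, T, gamma_hat=0.3, eta=0.05, rng=rng)
```

With these (arbitrary) parameters the hypothesis drifts toward the true mean of 1 while the expected spend stays within the budget; the γ estimate `gamma_hat` is a rough guess, as the mechanism requires.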

SLIDE 49

Main result for online learning setting

Theorem: Given a rough advance estimate of γ, can achieve regret ≤ √γ · T / √B.

Theorem (recall): No mechanism for the easier, “at-cost” setting can beat regret ≤ γT / √B.

Note: we lost a √ factor compared to the easier setting, due to paying our posted price rather than the agent’s cost (the “cost of strategic behavior”).

SLIDE 50

Outline

  • 1. Overview of literature, our contributions
  • 2. Online learning model/results
  • 3. “Statistical learning” result, conclusion

SLIDE 51

  • Extend the model to the case where data is drawn i.i.d. (“statistical learning”).
  • Extend the result to a “risk” bound on the order of 1 / √B.


Recalling contributions

  • Propose a model of online learning with purchased data: T arriving data points and budget B.
  • Convert any “FTRL” algorithm into a mechanism.
  • Show regret on the order of T / √B, and lower bounds of the same order.

SLIDE 52

Classic statistical learning model

For classification:

    E loss( h ) ≤ E loss( h* ) + O( √( VC-dim / T ) )

[Diagram: i.i.d. data source sends z1, z2, … to the learning alg, which outputs hypothesis h]

SLIDE 53

Our statistical learning model

[Diagram: agents holding i.i.d. data z1, z2, … with costs c1, c2, … interact with the mechanism, which has budget B and outputs hypothesis h]

costs (still) may be adversarially chosen

SLIDE 54

Our statistical learning model

[Diagram as before: i.i.d. data, adversarial costs, budget B]

Theorem: Given a rough advance estimate of γ, can achieve

    E loss( h ) ≤ E loss( h* ) + O( √γ / √B )
SLIDE 55

Our statistical learning model

[Diagram as before: i.i.d. data, adversarial costs, budget B]

Theorem: Given a rough advance estimate of γ, can achieve

    E loss( h ) ≤ E loss( h* ) + O( √γ / √B )

Proof: the known “online-to-batch conversion”: regret R ⇒ risk R/T.
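The conversion itself is simple to state: run the online mechanism for T rounds and output the average of the posted hypotheses; for convex losses, Jensen's inequality turns regret R into excess risk at most R/T. A sketch:

```python
import numpy as np

def online_to_batch(posted_hypotheses):
    """Average the T hypotheses posted by the online mechanism.

    For a convex loss, E loss(h_bar) <= (1/T) sum_t E loss(h_t),
    so online regret R translates into excess risk at most R / T.
    """
    return np.mean(np.asarray(posted_hypotheses, float), axis=0)

h_bar = online_to_batch([0.0, 0.5, 1.0, 1.0])
```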

SLIDE 56

Summary

Model:

  • online arrival of agents
  • post prices to procure data
  • adversarial costs and data (online learning setting)
  • adversarial costs, i.i.d. data (statistical learning setting)

SLIDE 57

Summary

Results:

  • upper/lower bounds on regret (online learning setting)
  • upper bound on risk (statistical learning setting)

SLIDE 58

Summary

Big picture:

  • design mechanisms to interface with existing learning algs
  • prove ML-style bounds: risk and regret
  • toward a “theory of the learnable… on a budget”
SLIDE 59

Future work

  • Improve the bounds (!)
  • Propose a “universal quantity” to replace γ in the bounds (an analogue of VC-dimension?)
  • Explore further models for purchasing data
SLIDE 60

Future work

  • Improve bounds (!)
  • Propose “universal quantity” to replace

γ in bounds (analogue of VC-dimension?)

  • Explore models for purchasing data

Thanks!

SLIDE 61

Additional slides


SLIDE 62

Simulation results

MNIST dataset: handwritten digit classification. Brighter green = higher cost. Toy problem: classify (1 or 4) vs (9 or 8).

SLIDE 63

Simulation results

  • T = 8503
  • train on half, test on half
  • Alg: Online Gradient Descent

Naive: pay 1 until the budget is exhausted, then run the alg.
Baseline: run the alg on all data points (no budget).
Large γ: bad correlations. Small γ: independent cost/data.

SLIDE 64

Pricing distribution