SLIDE 1

Low-Cost Learning via Active Data Procurement

September 2015. Jacob Abernethy, Yiling Chen, Chien-Ju Ho, Bo Waggoner

SLIDE 2

Coming soon to a society near you

(Diagram: data-holders, ex: medical data, on one side; data-needers, ex: pharmaceutical co., on the other.)

SLIDE 3

Classic ML problem

(Diagram: data source → data z1, z2, … → learning alg → hypothesis h → data-needer.)

Goal: use a small amount of data, output a “good” h.

SLIDE 4

Example learning task: classification

  • Data: (point, label) pairs, where the label is one of two types
  • Hypothesis h: a hyperplane separating the two types

SLIDE 5

Twist: data is now held by individuals

(Diagram: data-holders, each with data zt and cost ct → mechanism → hypothesis h → data-needer.)

“Cost of revealing data” (formal model later…)

Goal: spend a small budget, output a “good” h.

SLIDE 6

Why is this difficult?

  • 1. (Relatively) few data are useful

Example: studying the ACTN-3 mutation and endurance running. (Venn diagram: “have mutation” vs. “runners”; only the overlap is useful.)

SLIDE 7

Why is this difficult?

  • 2. Utility may be correlated with cost (causing bias)

Paying $10 for data (to study HIV). Would sell?
  HIV-negative: yes, yes, no, yes, yes
  HIV-positive: no, no, yes

SLIDE 8

Why is this difficult?

  • 2. Utility may be correlated with cost (causing bias)

(Same HIV example as the previous slide.)

Machine Learning roadblock: how to deal with biases?

SLIDE 9

Why is this difficult?

  • 3. Utility (ML) and cost (econ) live in different worlds

  learning alg: entropies, gradients, loss functions, divergences
  mechanism: auctions, budgets, value distributions, reserve prices

SLIDE 10

Why is this difficult?

  • 3. Utility (ML) and cost (econ) live in different worlds

(Same contrast as the previous slide.)

Econ roadblock: how to assign value to data?

SLIDE 11

Broad research challenge:

  • 1. How to assign value (prices) to pieces of data?
  • 2. How to design mechanisms for procuring and learning from data?
  • 3. Develop a theory of budget-constrained learning: what is (im)possible to learn given budget B and the parameters of the problem?

SLIDE 12

Outline

  • 1. Overview of literature, our contributions
  • 2. Online learning model/results
  • 3. “Statistical learning” result, conclusion

SLIDE 13

Related work

(Chart arranging prior work by model: how are agents strategic?)
  • can fabricate data (like in peer-prediction)
  • principal-agent style, data depends on effort
  • agents cannot fabricate data, have costs ← this work

Papers placed on the chart: Meir, Procaccia, Rosenschein 2012; Cummings, Ligett, Roth, Wu, Ziani 2015; Dekel, Fisher, Procaccia 2008; Ghosh, Ligett, Roth, Schoenebeck 2014; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015.

SLIDE 14

Related work

(Same chart; second axis, type of goal:)
  • minimize variance, or a related goal
  • risk/regret bounds ← this work

SLIDE 15

Conducting Truthful Surveys, Cheaply
e.g. Roth-Schoenebeck, EC 2012

  • Each datapoint is a number. Task is to estimate the mean
  • Approach: offer each agent a price drawn i.i.d.
  • Idea: obtains cheap but biased data; can de-bias it
  • Result: derives the price distribution that minimizes the variance of the estimate

(Diagram: agents with i.i.d. data and costs c1, c2, … → mechanism → estimate h.)
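To make the de-bias step concrete, here is a minimal sketch (not from the talk) of the random-price, inverse-propensity estimator; the agent model and the price/probability inputs are illustrative assumptions.

```python
import random

def survey_estimate(agents, prices, probs):
    """Unbiased mean estimate from data that is sold only when the
    posted price covers the holder's cost (a minimal sketch).

    agents: list of (value, cost); an agent sells iff price >= cost.
    prices, probs: the posted-price distribution (choosing it to
    minimize variance is the Roth-Schoenebeck result; here it's input).
    """
    total = 0.0
    for value, cost in agents:
        price = random.choices(prices, weights=probs)[0]
        if price >= cost:
            # q = Pr[posted price >= this agent's cost]; the cost is
            # revealed on acceptance, so q is computable here.
            q = sum(pr for p, pr in zip(prices, probs) if p >= cost)
            # Inverse-propensity weighting: E[value * 1{sold} / q] = value,
            # which removes the bias from cost-correlated participation.
            total += value / q
    return total / len(agents)
```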

SLIDE 16

What we wanted to do differently

  • 1. Prove ML-style risk or regret bounds, rather than “minimize the variance” type goals.
    Why: understand the error rate as a function of the budget and problem characteristics (as in ML)

SLIDE 17

What we wanted to do differently

  • 1. Prove ML-style risk or regret bounds (as above).
  • 2. Interface with existing ML algorithms.
    Why: understand how value derives from the learning alg. Toward black-box use of learners in mechanisms.

SLIDE 18

Related work

(Same chart; third axis, type of learning problem:)
  • population average; classification; regression
  • “general” learning problems ← this work

SLIDE 19

What we wanted to do differently

  • 1. Prove ML-style risk or regret bounds (as above).
  • 2. Interface with existing ML algorithms (as above).
  • 3. Online data arrival, rather than a “batch” setting.
    Why: allows an “active learning” approach; nice model

SLIDE 20

Related work

(Same chart; fourth axis, data arrival:)
  • “batch”
  • online, active ← this work

SLIDE 21

Related work

(Same chart; “this work” here also includes Abernethy, Frongillo, W., NIPS 2015.)

SLIDE 22

Overview of our contributions

  • Propose a model of online learning with purchased data: T arriving data points and budget B.
  • Convert any “FTRL” algorithm into a mechanism.
  • Show regret on the order of T / √B, and lower bounds of the same order.

SLIDE 23

Overview of our contributions

  • Propose a model of online learning with purchased data (as above).
  • Extend the model to the case where data is drawn i.i.d. (“statistical learning”).
  • Extend the result to a “risk” bound on the order of 1 / √B.

SLIDE 24

Outline

  • 1. Overview of literature, our contributions
  • 2. Online learning model/results
  • 3. “Statistical learning” result, conclusion

SLIDE 25

Online learning with purchased data

  • a. Review of online learning
  • b. Our model: adding $$
  • c. Deriving our mechanism and results

SLIDE 26

Standard online learning model

For t = 1, …, T:
  • algorithm posts a hypothesis ht
  • data point zt arrives
  • algorithm sees zt and updates to ht+1

Loss = ∑t ℓ(ht, zt)
Regret = Loss − ∑t ℓ(h*, zt), where h* minimizes the sum

SLIDE 27

Follow-the-Regularized-Leader (FTRL)

Assume: the loss function is convex and Lipschitz, the hypothesis space is a Hilbert space, etc.

Algorithm: ht = argmin_h ∑s<t ℓ(h, zs) + R(h)/η

SLIDE 28

Follow-the-Regularized-Leader (FTRL)

Assume: the loss function is convex and Lipschitz, the hypothesis space is a Hilbert space, etc.

Algorithm: ht = argmin_h ∑s<t ℓ(h, zs) + R(h)/η

Example 1 (Euclidean norm): R(h) = ǁhǁ₂²
  ⇒ ht+1 = ht − η ∇ℓ(ht, zt)   (online gradient descent)
SLIDE 29

Follow-the-Regularized-Leader (FTRL)

Example 1 (Euclidean norm): R(h) = ǁhǁ₂²
  ⇒ ht+1 = ht − η ∇ℓ(ht, zt)   (online gradient descent)

Example 2 (negative entropy): R(h) = ∑j h(j) ln h(j)
  ⇒ ht+1(j) ∝ ht(j) exp[ −η ∇ℓ(ht, zt)(j) ]   (multiplicative weights)
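To ground Example 1, a minimal sketch of the online-gradient-descent instance on a toy stream; the squared loss and the (x, y) data format are illustrative assumptions, not the talk's setting:

```python
import numpy as np

def ogd(stream, dim, eta):
    """FTRL with R(h) = ||h||_2^2 run as online gradient descent.

    stream: iterable of (x, y) pairs; here the loss is the squared
    error l(h, (x, y)) = (h.x - y)^2, an illustrative choice.
    """
    h = np.zeros(dim)                 # h_1: the first posted hypothesis
    loss = 0.0
    for x, y in stream:
        err = h @ x - y               # suffer loss on the posted h_t
        loss += err ** 2
        h = h - eta * (2 * err * x)   # h_{t+1} = h_t - eta * grad
    return h, loss
```

With gradients bounded by 1, eta = 1/√T recovers the O(√T) regret shown on the next slides.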

SLIDE 30

Regret Bound for FTRL

Fact: the regret of FTRL is bounded by O( 1/η + η ∑t Δt² ), where Δt = ǁ ∇ℓ(ht, zt) ǁ.

SLIDE 31

Regret Bound for FTRL

Fact: the regret of FTRL is bounded by O( 1/η + η ∑t Δt² ), where Δt = ǁ ∇ℓ(ht, zt) ǁ.

We know Δt ≤ 1 by assumption, so we can choose η = 1/√T and get Regret ≤ O(√T). “No regret”: average regret → 0.
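The choice of η is the usual one-line balancing step, reconstructed here for completeness:

```latex
\min_{\eta > 0}\; \frac{1}{\eta} + \eta \sum_t \Delta_t^2
\;\le\; \min_{\eta > 0}\; \frac{1}{\eta} + \eta T
\;=\; 2\sqrt{T}
\quad \text{at } \eta = \tfrac{1}{\sqrt{T}},
```

using Δt ≤ 1; the two terms are balanced when 1/η = ηT.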

SLIDE 32

Online learning with purchased data

  • a. Review of online learning
  • b. Our model: adding $$
  • c. Deriving our mechanism and results

SLIDE 33

Model of strategic data-holder

Model of agent:
  • holds data zt and cost ct
  • cost is a threshold price
    ○ agent agrees to sell data iff price ≥ ct
    ○ interpretations: privacy, transaction cost, …
  • Assume: all costs ≤ 1

SLIDE 34

Model of agent-mechanism interaction

  • Mechanism posts a menu of prices offered, e.g.:
      data:  (32,12)  (20,18)  (32,12)
      price: $0.22    $0.41    $0.88
  • agent t arrives
  • If ct ≤ price(zt), agent accepts:
    ○ agent reveals (zt, ct)
    ○ mechanism pays the agent price(zt)
  • Otherwise, agent rejects:
    ○ mechanism learns that the agent rejected, pays nothing

SLIDE 35

Recall: standard online learning model

For t = 1, …, T:
  • algorithm posts a hypothesis ht
  • data point zt arrives
  • algorithm sees zt and updates to ht+1
SLIDE 36

Our model: online learning with $$

For t = 1, …, T:
  • mechanism posts a hypothesis ht and a menu of prices
  • data point zt arrives with cost ct
  • If ct ≤ the menu price of zt: mechanism pays the price, learns zt
  • else: mechanism pays nothing

Loss = ∑t ℓ(ht, zt)
Regret = Loss − ∑t ℓ(h*, zt), where h* minimizes the sum
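As a sketch, one round of this protocol might look like the following; the menu representation (a price function over data points) and the budget check are illustrative assumptions:

```python
def run_round(h, menu_price, update, agent, budget):
    """One round of online learning with purchased data (a sketch).

    menu_price(h, z): posted price for data point z given hypothesis h.
    update(h, z): the learning algorithm's update on purchased data.
    agent: (z, c) pair; the agent accepts iff the posted price covers c.
    """
    z, c = agent
    price = menu_price(h, z)
    if c <= price and price <= budget:
        budget -= price       # pay the posted price, not the cost
        h = update(h, z)      # only purchased data reaches the learner
    # On rejection the mechanism observes nothing but the rejection.
    return h, budget
```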

SLIDE 37

Online learning with purchased data

  • a. Review of online learning
  • b. Our model: adding $$
  • c. Deriving our mechanism and results

SLIDE 38

Start easy

Suppose all costs are 1. ⇒ Determine which data points to sample.

  data:  (32,12)  (20,18)  (32,12)
  price: $1       $0       $0

SLIDE 39

Start easy

Suppose all costs are 1. ⇒ Determine which data points to sample. Examples:
  • B = T/2
  • B = √T
  • B = log(T)

SLIDE 40

Key idea #1: randomly sample

Can purchase each data point zt with probability qt(zt). The menu is now randomly chosen:

  data:         (32,12)  (20,18)  (32,12)
  Pr[price=1]:  0.3      0.06     0.41

The regret bound becomes O( 1/η + η E[ ∑t Δt² / qt ] ).

SLIDE 41

Key idea #1: randomly sample

Can purchase each data point zt with probability qt(zt); the menu is randomly chosen (as above).

Lemma (importance-weighted regret bound): For any qt’s, the regret of (modified) FTRL is O( 1/η + η E[ ∑t Δt² / qt ] ).

See also: Importance-Weighted Active Learning, Beygelzimer et al., ICML 2009.
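The “modified” FTRL is the importance-weighted one: purchased gradients are scaled by 1/qt so their expectation matches the full-information gradient. A minimal sketch, with the sampling rule q left abstract:

```python
import random

def iw_round(h, z, q_of, grad, eta):
    """One importance-weighted online-gradient-descent round (a sketch).

    q_of(h, z): probability of purchasing data point z this round.
    grad(h, z): gradient of the loss at the posted hypothesis h.
    E[(grad/q) * 1{purchased}] = grad, so the update is unbiased.
    """
    q = q_of(h, z)
    if random.random() < q:
        h = h - eta * grad(h, z) / q   # reweighted update on purchase
    return h                            # otherwise: no update
```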

SLIDE 42

Result for easy case

Lemma (importance-weighted regret bound): For any qt’s, the regret of (modified) FTRL is O( 1/η + η E[ ∑t Δt² / qt ] ).

Corollary: Setting all qt = B/T and choosing η = √B / T yields regret ≤ T / √B.

“No data, no regret”: the average amount of data → 0 and the average regret → 0.

SLIDE 43

Result for easy case

Corollary (recall): setting all qt = B/T and η = √B / T yields regret ≤ T / √B.

Theorem: This is tight.
(Predict a repeated coin toss whose bias is either 1/2 + 1/√B or 1/2 − 1/√B.)

SLIDE 44

Now a bit harder…

Costs can be arbitrary, but agents are nonstrategic: they will accept a payment of exactly ct. At each time step, randomly choose which (data, cost) pairs to purchase.

Question: how to set the probabilities of purchase qt?

  data, cost:    (32,12), c=0.3   (20,18), c=0.8
  Pr[purchase]:  0.12             0.08

SLIDE 45

Key idea #2: sample proportional to…

Imagine we knew the arrivals in advance. Optimization problem:

  minimize ∑t Δt² / qt
  s.t. ∑t qt ct ≤ B,  qt ≤ 1

Solution: qt = Δt / (K √ct)   (K a normalizing constant).

SLIDE 46

Key idea #2: sample proportional to…

(Same optimization as above; solution qt = Δt / (K √ct).)

The point: we only need advance knowledge of K to implement the “optimal” sampling strategy!
Turns out: K = γT / B, where γ ∈ [0,1] (discussed later).
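The solution is a short Lagrangian calculation, reconstructed here (ignoring the cap qt ≤ 1 for the sketch):

```latex
\mathcal{L} = \sum_t \frac{\Delta_t^2}{q_t} + \lambda\Big(\sum_t q_t c_t - B\Big),
\qquad
0 = \frac{\partial \mathcal{L}}{\partial q_t} = -\frac{\Delta_t^2}{q_t^2} + \lambda c_t
\;\Rightarrow\;
q_t = \frac{\Delta_t}{K\sqrt{c_t}} \quad (K = \sqrt{\lambda}).
```

Making the budget constraint tight gives K = (∑t Δt √ct)/B = γT/B with γ = (1/T) ∑t Δt √ct, the quantity defined on slide 48.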

SLIDE 47

Result for this “at-cost” setting

Theorem: Given a rough advance estimate of γ, can achieve regret ≤ γ T / √B.

Theorem: This is tight (in a reasonable sense).
(Same bad instance, but with “useless” free data points sprinkled in.)

Implication: γ is capturing the “difficulty of the problem”.

SLIDE 48

Discussion

γ = (1/T) ∑t Δt √ct = average sqrt(difficulty × cost).

SLIDE 49

Discussion

γ = (1/T) ∑t Δt √ct = average sqrt(difficulty × cost).

  • Low avg cost ⇒ low regret
  • Low avg difficulty ⇒ low regret
  • Good correlations ⇒ low regret

Example simplified corollary: Given a rough advance estimate of the avg cost μ, regret ≤ √μ T / √B.
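The corollary follows because γ never exceeds √μ; a short reconstruction of that step:

```latex
\gamma = \frac{1}{T}\sum_t \Delta_t \sqrt{c_t}
\;\le\; \frac{1}{T}\sum_t \sqrt{c_t}
\;\le\; \sqrt{\frac{1}{T}\sum_t c_t}
= \sqrt{\mu},
```

using Δt ≤ 1 and then Jensen's inequality (concavity of the square root), so the at-cost bound γT/√B is at most √μ · T/√B.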

SLIDE 50

Finally, the “full” problem

Now agents are strategic and we must post prices.

Recall: we had sampling probability qt = Δt / (K √ct). But: we don’t know ct.

SLIDE 51

Finally, the “full” problem

Now agents are strategic and we must post prices. Recall: the sampling probability was qt = Δt / (K √ct), but we don’t know ct.

Key idea #3: randomly draw the price from the distribution such that Pr[ price ≥ c ] = Δt / (K √c).

⇒ achieve the “right” probability for every ct simultaneously!
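One way to realize such a price distribution is inverse-CDF sampling; a minimal sketch (the closed form is forced by the target tail probability, and the cap at 1 uses the assumption that all costs are ≤ 1):

```python
import random

def sample_price(delta, K):
    """Draw a posted price P with Pr[P >= c] = min(1, delta / (K sqrt(c)))
    simultaneously for every cost c in (0, 1]."""
    u = 1.0 - random.random()          # uniform on (0, 1]
    price = (delta / (K * u)) ** 2     # inverse of u = delta / (K sqrt(p))
    return min(1.0, price)             # costs are <= 1, so capping is safe
```

Check: P ≥ c iff (Δ/(Ku))² ≥ c iff u ≤ Δ/(K√c), which happens with probability min(1, Δ/(K√c)).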

SLIDE 52

Description of final mechanism

Input: estimate of γ. At each time t:
  • post hypothesis ht ← FTRL
  • for each data point zt, compute Δt = ǁ ∇ℓ(ht, zt) ǁ and post a random price from the distribution above
  • If the arriving agent accepts, send the “re-weighted” zt → FTRL
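Putting the three key ideas together, a compact end-to-end sketch; the OGD instance of FTRL, the gradient oracle, and the budget check are illustrative assumptions:

```python
import random
import numpy as np

def mechanism(agents, grad, B, T, gamma_est, eta, dim):
    """Online learning with purchased data: a sketch of the mechanism.

    agents: sequence of (z, c) pairs with cost c <= 1.
    grad(h, z): gradient of the loss at hypothesis h.
    gamma_est: rough advance estimate of gamma = (1/T) sum_t Delta_t sqrt(c_t).
    """
    K = gamma_est * T / B              # normalizer from key idea #2
    h = np.zeros(dim)                  # posted hypothesis (FTRL as OGD)
    spent = 0.0
    for z, c in agents:
        delta = np.linalg.norm(grad(h, z))
        # Key idea #3: posted price with Pr[price >= c] = delta / (K sqrt(c)).
        u = 1.0 - random.random()
        price = min(1.0, (delta / (K * u)) ** 2)
        if c <= price and spent + price <= B:
            spent += price
            # Probability this cost was covered, for importance weighting.
            q = min(1.0, delta / (K * np.sqrt(c))) if c > 0 else 1.0
            # Key idea #1: re-weighted update keeps the gradient unbiased.
            h = h - eta * grad(h, z) / q
    return h
```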

SLIDE 53

Main result for online learning setting

Theorem: Given a rough advance estimate of γ, can achieve regret ≤ √γ T / √B.

Theorem (recall): No mechanism for the easier, “at-cost” setting can beat regret γ T / √B.

Note: we lost a √ factor compared to the easier setting, due to paying our posted price rather than the agent’s cost (the “cost of strategic behavior”).

SLIDE 54

Outline

  • 1. Overview of literature, our contributions
  • 2. Online learning model/results
  • 3. “Statistical learning” result, conclusion

SLIDE 55

Recalling contributions

  • Propose a model of online learning with purchased data: T arriving data points and budget B. Convert any “FTRL” algorithm into a mechanism. Show regret on the order of T / √B and lower bounds of the same order.
  • Extend the model to the case where data is drawn i.i.d. (“statistical learning”). Extend the result to a “risk” bound on the order of 1 / √B.

SLIDE 56

Classic statistical learning model

(Diagram: data source → i.i.d. data z1, z2, … → learning alg → hypothesis h.)

For classification:
  E loss( h ) ≤ E loss( h* ) + O( √( VC-dim / T ) )

SLIDE 57

Our statistical learning model

(Diagram: data-holders with i.i.d. data z1, z2, … and costs c1, c2, … → mechanism with budget B → hypothesis h.)

Costs (still) may be adversarially chosen.

SLIDE 58

Our statistical learning model

Theorem: Given a rough advance estimate of γ, can achieve
  E loss( h ) ≤ E loss( h* ) + O( √( γ / B ) )
SLIDE 59

Our statistical learning model

Theorem: Given a rough advance estimate of γ, can achieve
  E loss( h ) ≤ E loss( h* ) + O( √( γ / B ) )

Proof: known “online-to-batch conversion”: regret R ⇒ risk R/T.
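For completeness, the standard conversion step (for convex loss, average the posted hypotheses):

```latex
\bar h = \frac{1}{T}\sum_{t=1}^{T} h_t,
\qquad
\mathbb{E}\,\mathrm{loss}(\bar h)
\;\le\; \frac{1}{T}\sum_{t=1}^{T} \mathbb{E}\,\mathrm{loss}(h_t)
\;\le\; \mathbb{E}\,\mathrm{loss}(h^*) + \frac{R}{T},
```

so the online regret √γ · T/√B becomes a risk gap of √(γ/B).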

SLIDE 60

Summary

Model:
  • online arrival of agents
  • post prices to procure data
  • adversarial costs and data (online learning setting)
  • adversarial costs, i.i.d. data (statistical learning setting)

SLIDE 61

Summary

Results:
  • upper/lower bounds on regret (online learning setting)
  • upper bound on risk (statistical learning setting)

SLIDE 62

Summary

Big picture:
  • design mechanisms to interface with existing learning algs
  • prove ML-style bounds: risk and regret
  • toward a “theory of the learnable… on a budget”
SLIDE 63

Future work

  • Improve bounds (!)
  • Propose a “universal quantity” to replace γ in the bounds (analogue of VC-dimension?)
  • Explore models for purchasing data
SLIDE 64

Future work

(Same list as the previous slide.)

Thanks!

SLIDE 65

Additional slides

SLIDE 66

Simulation results

MNIST dataset: handwritten digit classification. Toy problem: classify (1 or 4) vs (9 or 8). Brighter green = higher cost.

SLIDE 67

Simulation results

  • T = 8503
  • train on half, test on half
  • Alg: Online Gradient Descent
  • Naive: pay 1 until the budget is exhausted, then run the alg
  • Baseline: run the alg on all data points (no budget)
  • Large γ: bad correlations; small γ: independent cost/data

SLIDE 68

Pricing distribution