Low-Cost Learning via Active Data Procurement (EC 2015) - PowerPoint PPT Presentation

SLIDE 1

Low-Cost Learning via Active Data Procurement

EC 2015 · Jacob Abernethy, Yiling Chen, Chien-Ju Ho, Bo Waggoner

SLIDE 2

General problem: buy data for learning

Learners LLC ("We Buy Data!") buys data and outputs a hypothesis (predictor) h

SLIDE 3

General problem: buy data for learning

Example: each person has medical data … learn to predict disease

SLIDE 4

Example task: classification

  • Data point: pair (x, label) where the label is one of two classes
  • Hypothesis: hyperplane separating the two types
  • Loss: 0 if h(x) = correct label, 1 if incorrect label
  • Goal: pick h with low expected loss on a new data point
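The hyperplane hypothesis and 0/1 loss above can be sketched in a few lines of Python; the hyperplane parameters and data points here are hypothetical, purely for illustration.

```python
import numpy as np

def predict(w, b, x):
    """Hyperplane hypothesis h: the sign of w·x + b is the predicted label."""
    return 1 if np.dot(w, x) + b >= 0 else -1

def zero_one_loss(w, b, x, label):
    """0 if h(x) equals the correct label, 1 if incorrect."""
    return 0 if predict(w, b, x) == label else 1

# Hypothetical 2-D example: the hyperplane x1 + x2 = 0 separates the two classes.
w, b = np.array([1.0, 1.0]), 0.0
print(zero_one_loss(w, b, np.array([2.0, 1.0]), 1))    # correct side -> loss 0
print(zero_one_loss(w, b, np.array([-2.0, -1.0]), 1))  # wrong side -> loss 1
```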

SLIDE 5

General Goal: learn a good hypothesis by purchasing data from the crowd

SLIDE 6

This paper:

  • 1. price data actively based on value
  • 2. machine-learning style bounds
  • 3. transform learning algs to mechanisms (learning alg → mechanism)


SLIDE 8

How to assess the value/price of data?

SLIDE 9

Use the learner’s current hypothesis!


SLIDE 11

Our model

Agents’ data points z1, z2, … are drawn i.i.d. from a distribution; the mechanism outputs a hypothesis h. Each agent t also has a cost ct of revealing their data, which:

  • lies in [0,1]
  • is worst-case, arbitrarily correlated with the data
  • arrives online
SLIDE 12

Agent-mechanism interaction

At each time t = 1, …, T:

  • 1. mechanism posts menu (e.g. data: 65, 30, 65 at prices $0.22, $0.41, $0.88)
SLIDE 13

Agent-mechanism interaction

At each time t = 1, …, T:

  • 1. mechanism posts menu (e.g. data: 65, 30, 65 at prices $0.22, $0.41, $0.88)
  • 2. agent arrives with (zt, ct):
    ○ accepts: mechanism learns (zt, ct) and pays price(zt)
    ○ rejects: mechanism sees the rejection and pays nothing
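The interaction protocol above can be sketched as a simple loop; `post_menu` and the agent sequence are placeholders for illustration, not part of the paper's formal model.

```python
def interact(post_menu, agents):
    """Run the agent-mechanism protocol: at each time t the mechanism posts a
    price menu; the arriving agent with (data z_t, cost c_t) accepts iff the
    offered price for z_t covers the cost of revealing the data."""
    purchased, spent = [], 0.0
    for t, (z, c) in enumerate(agents):
        menu = post_menu(t)          # menu maps a data point to an offered price
        price = menu(z)
        if price >= c:               # accept: mechanism learns (z, c), pays price(z)
            purchased.append((z, c))
            spent += price
        # reject: mechanism sees only the rejection and pays nothing
    return purchased, spent

# Hypothetical run: a flat $0.50 menu against three agents.
agents = [(65, 0.22), (30, 0.41), (65, 0.88)]
print(interact(lambda t: (lambda z: 0.50), agents))
```

The third agent's cost exceeds the offered price, so only the first two data points are purchased.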
SLIDE 14

This paper:

  • 1. price data actively based on value
  • 2. machine-learning style bounds
  • 3. transform learning algs to mechanisms (learning alg → mechanism)

SLIDE 15

What is the “classic” learning problem?

Data points z1, z2, … are drawn i.i.d. from a distribution; the learning alg outputs a hypothesis h.

SLIDE 16

Classic ML bounds

E[loss(h)] ≤ E[loss(h*)] + O(√(VC-dim / T))

where h is the alg’s hypothesis, h* is the optimal hypothesis, T is the # of data points, and VC-dim is a measure of problem difficulty.
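A quick numeric reading of the O(√(VC-dim / T)) term (the constant is chosen arbitrarily here, standing in for the one hidden by the big-O):

```python
import math

def excess_risk_bound(vc_dim, T, C=1.0):
    """The O(sqrt(VC-dim / T)) excess-risk term: harder classes (larger VC
    dimension) need more data points T for the same guarantee. C stands in
    for the unspecified constant hidden by the big-O."""
    return C * math.sqrt(vc_dim / T)

print(excess_risk_bound(10, 1000))  # 0.1
print(excess_risk_bound(10, 4000))  # 0.05: quadrupling the data halves the bound
```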

SLIDE 17

Main result

For a variety of learning problems:

E[loss(h)] ≤ E[loss(h*)] + O(√(γ / B))

where h is our hypothesis, h* is the optimal hypothesis, B is the budget constraint, and γ ∈ [0,1] is a measure of “problem difficulty”.

(Assume: γ is approximately known in advance)

SLIDE 18

Main result

For a variety of learning problems:

E[loss(h)] ≤ E[loss(h*)] + O(√(γ / B))

where h is our hypothesis, h* is the optimal hypothesis, B is the budget constraint, and γ ∈ [0,1] is a measure of “problem difficulty”.

(Assume: γ is approximately known in advance)

γ ≈ average cost × difficulty: “if the problem is cheap or easy or has good correlations, we do well”

SLIDE 19

Related work in purchasing data (by type of goal and model):

  • this work
  • Meir, Procaccia, Rosenschein 2012
  • Cummings, Ligett, Roth, Wu, Ziani 2015
  • Dekel, Fischer, Procaccia 2008
  • Ghosh, Ligett, Roth, Schoenebeck 2014
  • Horel, Ioannidis, Muthukrishnan 2014
  • Roth, Schoenebeck 2012
  • Ligett, Roth 2012
  • Cai, Daskalakis, Papadimitriou 2015

SLIDE 20

Key features/ideas of this paper:

  • 1. price data actively based on value
  • 2. machine-learning style bounds
  • 3. transform learning algs to mechanisms (learning alg → mechanism)

SLIDE 21

Learning algorithms: FTRL

  • Follow-The-Regularized-Leader (FTRL): Multiplicative Weights, Online Gradient Descent, …
  • FTRL algs do “no regret” learning:
    ○ output a hypothesis at each time
    ○ want low total loss
  • we interface with FTRL as a black box … but analysis relies on “opening the box”
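As a concrete FTRL instance, here is a minimal Multiplicative Weights sketch over a finite set of experts; the step size and loss sequence are illustrative choices, not values from the talk.

```python
import math

def multiplicative_weights(loss_sequence, eta=0.5):
    """Multiplicative Weights, an instance of Follow-The-Regularized-Leader
    with an entropy regularizer: output a hypothesis (a distribution over
    experts) at each time, then exponentially downweight experts by their loss."""
    n = len(loss_sequence[0])
    w = [1.0] * n
    hypotheses = []
    for losses in loss_sequence:
        total = sum(w)
        hypotheses.append([wi / total for wi in w])  # hypothesis at time t
        w = [wi * math.exp(-eta * l) for wi, l in zip(w, losses)]
    return hypotheses

# Expert 0 always suffers zero loss, so mass shifts toward it over time.
hyps = multiplicative_weights([[0.0, 1.0]] * 5)
print(hyps[0], hyps[-1])
```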

SLIDE 22

Our mechanism

At each time t = 1, …, T:

  • 1. post menu: price(z) ~ distribution(ht, z), where ht is the Alg’s current hypothesis

SLIDE 23

Our mechanism

At each time t = 1, …, T:

  • 1. post menu: price(z) ~ distribution(ht, z), where ht is the Alg’s current hypothesis
  • 2. agent arrives with (zt, ct):
    ○ accepts: de-biased data is fed to the Alg
    ○ rejects: the null data point is fed to the Alg
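One standard way to realize the “de-biased data” step is inverse-probability weighting; this sketch illustrates that idea under stated assumptions, not the paper's exact construction.

```python
def debiased_point(z, accept_prob):
    """Scale a purchased data point by 1 / Pr[accept] so that, averaged over
    the randomness of the posted price, the data fed to the learning alg is an
    unbiased estimate of the true point; rejections contribute the null point 0."""
    return z / accept_prob

# If z = 2.0 is purchased with probability 0.5, the fed point is 4.0 on
# acceptance and 0 on rejection: expectation 0.5 * 4.0 + 0.5 * 0 = 2.0.
print(debiased_point(2.0, 0.5))
```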

SLIDE 24

Analysis idea: use the no-regret setting!

  • Propose regret minimization with purchased data
  • Prove upper and lower bounds on regret
  • low regret ⇒ good prediction on new data (main result)
SLIDE 25

Summary

Problem: learn a good hypothesis by buying data from arriving agents.

For a variety of learning problems: E[loss(h)] ≤ E[loss(h*)] + O(√(γ / B))

SLIDE 26

Key ideas

  • 1. price data actively based on value
  • 2. machine-learning style bounds
  • 3. transform learning algs to mechanisms (learning alg → mechanism)

SLIDE 27

Future work

  • Improve bounds (no-regret: gap between lower and upper bounds)
  • Propose a “universal quantity” to replace γ in bounds (analogue of VC-dimension)
  • Variants of the model, better batch mechanisms
  • Explore black-box use of learning algs in mechanisms
SLIDE 28

Thanks!

SLIDE 29

Additional slides

SLIDE 30

What would you do before this work?

Naive 1: post a price of 1, obtain B points, run a learner on them.
Naive 2: post lower prices, obtain biased data, do what??
Roth-Schoenebeck (EC 2012): draw prices from a distribution, obtain biased data, de-bias it.

  • Batch setting (offer each data point the same price distribution)
  • Each agent has a number; the task is to estimate the mean
  • Derives a price distribution to minimize the variance of the estimate
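The Roth-Schoenebeck idea can be sketched for mean estimation with uniform random prices; the agent population and the Uniform(0,1) price distribution here are hypothetical choices for illustration.

```python
import random

def debiased_mean_estimate(agents, rng):
    """Batch setting: post an independent Uniform(0,1) price to each agent, who
    accepts iff price >= cost. Weighting each purchased value v by
    1 / Pr[accept] = 1 / (1 - cost) de-biases the estimate of the mean."""
    total = 0.0
    for value, cost in agents:
        price = rng.random()             # price ~ Uniform(0, 1)
        if price >= cost:                # agent accepts the offer
            total += value / (1.0 - cost)
    return total / len(agents)

# Hypothetical population: everyone holds value 1.0 at cost 0.5.
rng = random.Random(0)
print(debiased_mean_estimate([(1.0, 0.5)] * 10000, rng))  # close to the true mean 1.0
```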

SLIDE 31

Related work (by type of goal and model)

Goals: ML-style risk bounds; minimize variance; or a related goal.
Models: agents can fabricate data (as in peer-prediction); principal-agent style, data depends on effort; agents cannot fabricate data but have costs (this work).

Papers: this work; Meir, Procaccia, Rosenschein 2012; Cummings, Ligett, Roth, Wu, Ziani 2015; Dekel, Fischer, Procaccia 2008; Ghosh, Ligett, Roth, Schoenebeck 2014; Horel, Ioannidis, Muthukrishnan 2014; Roth, Schoenebeck 2012; Ligett, Roth 2012; Cai, Daskalakis, Papadimitriou 2015.

SLIDE 32

Simulation results

MNIST dataset: handwritten digit classification. Toy problem: classify (1 or 4) vs (9 or 8). Brighter green = higher cost.

SLIDE 33

Simulation results

  • T = 8503; train on half, test on half
  • Alg: Online Gradient Descent
  • Naive: pay 1 until the budget is exhausted, then run the alg
  • Baseline: run the alg on all data points (no budget)
  • Large γ: bad correlations; small γ: independent cost/data

SLIDE 34

“value” and pricing distribution?

  • Value of data = size of the gradient of the loss, ǁ∇loss(ht, zt)ǁ (“how much you learn from the loss”)
  • Pricing distribution: Pr[price ≥ x] = min{1, ǁ∇loss(ht, zt)ǁ / (K·x)}
  • K = normalization constant proportional to γ = (1/T) ∑t ǁ∇loss(ht, zt)ǁ ct
    (assume approximate knowledge of K … in practice, can estimate it online)
  • Distribution is derived by optimizing the regret bound of the mechanism for an “at-cost” variant of the no-regret setting
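Sampling from a survival function of the form Pr[price ≥ x] = min{1, g/(K·x)} can be done by inverting it; the cap at a maximum price of 1 is an assumption for illustration, as are the numeric parameters.

```python
import random

def sample_price(grad_norm, K, rng, cap=1.0):
    """Inverse-transform sample from Pr[price >= x] = min(1, grad_norm / (K*x)):
    solve grad_norm / (K*x) = u for x with u ~ Uniform(0,1). Data with a larger
    loss gradient (more to be learned from it) draws higher prices."""
    u = rng.random()
    if u <= 0.0:
        return cap
    return min(grad_norm / (K * u), cap)

rng = random.Random(1)
samples = [sample_price(0.1, 1.0, rng) for _ in range(10000)]
print(sum(s >= 0.5 for s in samples) / len(samples))  # roughly Pr[price >= 0.5] = 0.2
```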

SLIDE 35

Pricing distribution