Jian Li
Institute for Interdisciplinary Information Sciences Tsinghua University
Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing
ICML 2014
joint work with Yuan Zhou (CMU) and Xi Chen (Berkeley)
The Stochastic Multi-armed Bandit
• Set of n arms
• Each arm is associated with an unknown reward distribution supported on [0, 1] with mean θ_i
• Each time, sample an arm and receive a reward independently drawn from its reward distribution
Top-K Arm Identification Problem
• You can take N samples
• Goal: (approximately) identify the best K arms (the arms with the largest means)
• Use as few samples as possible (i.e., minimize N)
• Wide applications: Industrial Engineering (Koenig & Law, 85), Evolutionary Computing (Schmidt, 06), Simulation Optimization (Chen, Fu, Shi 08)
• Motivating application: Crowdsourcing
  – Workers are noisy
  – How do we identify reliable workers and exclude unreliable ones?
  – Test workers with golden tasks (i.e., tasks with known answers)
  – Each test costs money. How do we identify the best K workers with the minimum amount of testing?

Top-K Arm Identification ↔ Crowdsourcing
• Worker = Bernoulli arm with mean θ_i (θ_i: the i-th worker's reliability)
• Test with a golden task = obtain a binary-valued sample (correct/wrong)
[Figure: workers depicted as coins with reliabilities 0.95, 0.99, 0.5]
• Sorted means θ_1 ≥ θ_2 ≥ ⋯ ≥ θ_n
• Goal: find a set T of K arms that minimizes the aggregate regret
  r_T = (1/K) ( Σ_{i=1}^{K} θ_i − Σ_{i∈T} θ_i )
• Given any ε, δ, the algorithm outputs a set T of K arms such that r_T ≤ ε, with probability at least 1 − δ (PAC learning)
• For K = 1, i.e., find i such that θ_1 − θ_i ≤ ε w.p. 1 − δ:
  – [Even-Dar, Mannor and Mansour, 06]
  – [Mannor, Tsitsiklis, 04]
• This talk: general K
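To make the objective concrete, here is a minimal Python sketch of the aggregate-regret metric defined above (the function name and array conventions are illustrative, not from the talk):

```python
import numpy as np

def aggregate_regret(theta, T, K):
    """r_T = (1/K) * (sum of the K largest means - sum of the means in T)."""
    top_k_sum = np.sort(theta)[::-1][:K].sum()  # sum of the K largest means
    return (top_k_sum - theta[list(T)].sum()) / K

# Two near-optimal coins: picking the "wrong" one incurs only tiny regret.
theta = np.array([1.0, 0.99999, 0.5])
print(aggregate_regret(theta, T=[1], K=1))  # ~1e-5, well below eps = 0.01
```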
• Assume Bernoulli distributions from now on
• Think of a collection of biased coins
• Try to (approximately) find the K coins with the largest bias (towards heads)

[Figure: five coins with biases 0.5, 0.55, 0.6, 0.45, 0.8]
• Misidentification probability (Bubeck et al., 13): Pr(T ≠ {1, 2, …, K})
• Consider the case (K = 1): two coins with means 1 and 0.99999
• Distinguishing these two coins with high confidence requires approximately 10^5 samples (the number of samples depends on the gap θ_1 − θ_2)
• Using aggregate regret (say with ε = 0.01), we may choose either of them
• Explore-K (Kalyanakrishnan et al., 12, 13)
• Select a set T of K arms such that ∀i ∈ T, θ_i > θ_K − ε w.h.p. (θ_K: the K-th largest mean)
• Example: θ_1 ≥ ⋯ ≥ θ_{K−1} ≫ θ_K and θ_{i+K} > θ_K − ε for i = 1, …, K
• The set T = {K+1, K+2, …, 2K} satisfies this requirement, yet it misses all of the much better arms θ_1, …, θ_{K−1}, so its aggregate regret can be large
Uniform Sampling
• Sample each coin M times
• Pick the K coins with the largest empirical means (empirical mean = #heads / M)
• How large does M need to be (in order to achieve ε-regret)?
• M = O( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = O(log n), so the total number of samples is O(n log n)
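A minimal Python sketch of this baseline (the Bernoulli-arm simulator and function name are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_sampling(theta, K, M):
    """Sample every coin M times; return the K coins with the largest empirical means."""
    heads = rng.binomial(M, theta)          # number of heads for each coin
    empirical = heads / M                   # empirical mean of each coin
    return np.argsort(empirical)[::-1][:K]  # indices of the K largest empirical means

theta = rng.uniform(0, 1, size=1000)
T = uniform_sampling(theta, K=10, M=200)    # total number of samples: n * M
```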
Uniform Sampling
• With M = O(log n), we can get an estimate θ'_i of each θ_i such that |θ_i − θ'_i| ≤ ε with very high probability (say 1 − 1/n²)
• This can be proved easily using the Chernoff bound (a concentration bound)
• What if we use M = O(1) (let us say M = 10)?
• E.g., consider the following example (K = 1): 0.9, 0.5, 0.5, …………., 0.5 (a million coins with mean 0.5)
• Consider a coin with mean 0.5: Pr[all samples from this coin are heads] = (1/2)^10
• With constant probability, there are more than 500 coins whose samples are all heads, so the single good coin is lost among them
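A quick simulation (hypothetical script, not from the talk) showing why M = 10 fails on this example:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 1_000_000, 10
theta = np.full(n, 0.5)
theta[0] = 0.9  # the single good coin

heads = rng.binomial(M, theta)
print((heads == M).sum())  # ~ n / 2**10, i.e. close to a thousand coins look perfect
```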
• In fact, we can show a matching lower bound:
  M = Θ( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = Θ(log n)
• One observation: if K = Θ(n), then M = O(1)
• Consider the following example: 0.9, 0.5, 0.5, …………., 0.5 (a million coins with mean 0.5)
• Uniform sampling spends too many samples on bad coins
• We should spend more samples on good coins
• However, we do not know in advance which coins are good and which are bad……
• Sample each coin M = O(1) times
• If the empirical mean of a coin is large, we DO NOT know whether it is good or bad
• But if the empirical mean of a coin is very small, we DO know it is bad (with high probability)
The OptMAI Algorithm
• Input: n (no. of arms), K (top-K arms), Q (total no. of samples / budget)
• Initialization: active set of arms S_0 = {1, 2, …, n}; set of top arms T_0 = ∅; iteration index r = 0; parameter β ∈ (0.75, 1)
• While |T_r| < K and the remaining budget Q_r > 0 do:
  – If |S_r| > 4K, then S_{r+1} = Quartile-Elimination(S_r, β^r (1 − β) Q)
    (eliminate the quarter of the arms with the lowest empirical means)
  – Else (|S_r| ≤ 4K): identify the best K arms among the at most 4K remaining arms using uniform sampling
  – r = r + 1
• Output: the set T of K selected arms
Quartile-Elimination
• Idea: uniformly sample each arm in the active set S and discard the worst quarter of arms (those with the lowest empirical means)
• Input: S (active arms), Q (budget)
• Sample each arm i ∈ S for Q/|S| times and let q̂_i be its empirical mean
• Find the lower quartile q of the empirical means: |{i : q̂_i < q}| = |S|/4
• Output: S′ = S \ {i : q̂_i < q}
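Combining the two slides above, here is a compact Python sketch of the procedure (a simplified reading of the pseudocode; the budget bookkeeping, rounding, and tie-breaking are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def pull(theta, i, m):
    """Empirical mean of m Bernoulli(theta[i]) pulls."""
    return rng.binomial(m, theta[i]) / m

def quartile_elimination(theta, S, budget):
    """Sample each active arm budget/|S| times; drop the quarter with the lowest empirical means."""
    m = max(1, int(budget / len(S)))
    means = np.array([pull(theta, i, m) for i in S])
    keep = np.argsort(means)[::-1][: (3 * len(S)) // 4]  # keep the top three quarters
    return [S[j] for j in keep]

def opt_mai(theta, K, Q, beta=0.8):
    S, r, spent = list(range(len(theta))), 0, 0.0
    while len(S) > 4 * K:
        round_budget = beta**r * (1 - beta) * Q  # geometrically shrinking per-round budget
        S = quartile_elimination(theta, S, round_budget)
        spent += round_budget
        r += 1
    # Final phase: uniform sampling over the <= 4K surviving arms.
    m = max(1, int((Q - spent) / len(S)))
    means = np.array([pull(theta, i, m) for i in S])
    return [S[j] for j in np.argsort(means)[::-1][:K]]

theta = rng.uniform(0, 1, size=1000)
T = opt_mai(theta, K=10, Q=20 * 1000)
```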
• Sample complexity Q: the algorithm outputs K arms such that
  r_T = (1/K) ( Σ_{i=1}^{K} θ_i − Σ_{i∈T} θ_i ) ≤ ε, w.p. 1 − δ
• K ≤ n/2: Q = O( (n/ε²) (1 + ln(1/δ) / K) ) (this is linear!)
• K ≥ n/2: Q = O( ((n−K)/K) (n/ε²) ( (n−K)/K + ln(1/δ)/K ) ) (which can be sublinear!)
  – Reduce to the K ≤ n/2 case: apply our algorithm to identify the worst n − K arms
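To see the linear vs. sublinear behavior concretely, a small helper (illustrative only; constants are suppressed in the O(·) bounds, so this evaluates just the leading expressions stated above):

```python
import math

def sample_bound(n, K, eps, delta):
    """Leading term of the stated sample-complexity bound (constants dropped)."""
    if K <= n / 2:
        return (n / eps**2) * (1 + math.log(1 / delta) / K)
    m = n - K  # number of "worst" arms in the reduction
    return (m / K) * (n / eps**2) * (m / K + math.log(1 / delta) / K)

# K = n/2 vs. K close to n: the required budget drops well below linear in n.
print(sample_bound(10**6, 5 * 10**5, 0.1, 0.01))
print(sample_bound(10**6, 999_000, 0.1, 0.01))
```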
Better bound if K is larger!
• K ≤ n/2: Q = O( (n/ε²) (1 + ln(1/δ) / K) )
  – K = 1: Q = O( (n/ε²) ln(1/δ) ) [Even-Dar et al., 06]
  – For larger K, the sample complexity is smaller: identifying K arms is simpler!
  – Why? Example: θ_1 = 1/2 + 2ε, θ_2 = θ_3 = ⋯ = θ_n = 1/2. For K = 1 we must find arm 1, but any set of K arms has regret at most 2ε/K, so for K ≥ 2 any set is fine.
• Naïve uniform sampling: Q = Ω(n log n), a log n factor worse
• K ≤ n/2: there is an underlying instance {θ_i} such that for any randomized algorithm that identifies a set T with r_T ≤ ε w.p. at least 1 − δ,
  E[Q] = Ω( (n/ε²) (1 + ln(1/δ) / K) )
• K > n/2:
  E[Q] = Ω( ((n−K)/K) (n/ε²) ( (n−K)/K + ln(1/δ)/K ) )
Our algorithm is optimal for every value of n, K, ε, δ!
• First lower bound: K ≤ n/2, Q ≥ Ω(n/ε²)
  – Reduction to distinguishing two Bernoulli arms with means 1/2 and 1/2 + ε with probability > 0.51, which requires at least Ω(1/ε²) samples [Chernoff, 72] (anti-concentration)
• Second lower bound: K ≤ n/2, Q ≥ Ω( (n/ε²) ln(1/δ) / K )
  – Proved by a standard technique in statistical decision theory
Simulated Experiments
• Algorithms: OptMAI (β = 0.8, β = 0.9); SAR (Bubeck et al., 13); LUCB (Kalyanakrishnan et al., 12); Uniform (naïve uniform sampling)
• Total budget: Q = 20n, Q = 50n, Q = 100n
• Top-K arms: K = 10, 20, …, 500
• Report the average result over 100 independent runs
• Underlying distributions: (1) θ_i ~ Uniform[0, 1]; (2) θ_i = 0.6 for i = 1, …, K and θ_i = 0.5 for i = K + 1, …, n
• Metric: regret r_T

[Figure: regret vs. K for each algorithm under the two distributions (1) θ_i ~ Uniform[0, 1] and (2) θ_i ∈ {0.6, 0.5}]
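A hedged sketch of how one could reproduce setting (1), reusing the opt_mai and aggregate_regret sketches above (SAR and LUCB are not reimplemented here):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

for K in (10, 100, 500):
    regrets = []
    for _ in range(100):                       # 100 independent runs
        theta = rng.uniform(0, 1, size=n)      # distribution (1)
        T = opt_mai(theta, K=K, Q=20 * n)      # budget Q = 20n
        regrets.append(aggregate_regret(theta, T, K))
    print(K, np.mean(regrets))
```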
• RTE data for textual entailment (Snow et al., 08)
• 800 binary labeling tasks with true labels; 164 workers
• Empirical distribution of the number of tasks assigned to a worker (β = 0.9, K = 10, Q = 20n):
  – SAR queries some arm Ω((Q/n) log n) times: a worker receives up to 143 tasks
  – OptMAI queries each arm only O(Q/n) times: a worker receives at most 48 tasks
• Crowdsourcing: it is impossible to assign too many tasks to a single worker
• Precision = |T ∩ {1, …, K}| / K: the fraction of arms in T that belong to the top K arms
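For completeness, the precision metric as a small helper (hypothetical code matching the definition above; arms are 0-indexed here, so the true top-K arms are {0, …, K−1}):

```python
def precision(T, K):
    """Fraction of the selected arms that are among the true top-K arms."""
    return len(set(T) & set(range(K))) / K
```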
• Top-K arm identification
• Application in crowdsourcing
• (Worst-case) optimal upper and lower bounds
• Further direction: some instances are "easier", e.g., 0.9, 0.1, 0.1, 0.1, ……. Can we get better upper bounds for such instances?