Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing (PowerPoint PPT Presentation)



SLIDE 1

Jian Li

Institute for Interdisciplinary Information Sciences Tsinghua University

Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing

ICML 2014

joint work with Yuan Zhou (CMU) and Xi Chen (Berkeley)

SLIDE 2

The Stochastic Multi-armed Bandit

• Stochastic multi-armed bandit:
  • A set of n arms
  • Each arm is associated with an unknown reward distribution supported on [0,1] with mean θ_j
  • At each step, sample an arm and receive a reward drawn independently from its reward distribution

SLIDE 3

The Stochastic Multi-armed Bandit

• Top-K arm identification problem
  • You can take N samples
  • A sample: choose an arm, play it once, and observe the reward
  • Goal: (approximately) identify the best K arms (the arms with the largest means)
  • Use as few samples as possible (i.e., minimize N)
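As a minimal sketch of this sampling model (hypothetical Python; Bernoulli 0/1 rewards, as in the coin-flip simplification later in the talk):

```python
import random

class BernoulliArm:
    """An arm with an unknown mean theta; each sample is an independent 0/1 reward."""
    def __init__(self, theta):
        self.theta = theta   # unknown to the learning algorithm
        self.pulls = 0       # number of samples taken from this arm so far

    def sample(self):
        """One sample: play the arm once and observe the reward."""
        self.pulls += 1
        return 1 if random.random() < self.theta else 0
```

An algorithm's cost is then simply the total `pulls` across all arms, i.e., the N the talk seeks to minimize.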

SLIDE 4

Motivating Applications

• Wide applications:
  • Industrial engineering (Koenig & Law, 85), evolutionary computing (Schmidt, 06), simulation optimization (Chen, Fu, Shi 08)
• Motivating application: crowdsourcing

SLIDE 5

Motivating Applications

• Workers are noisy
• How can we identify reliable workers and exclude unreliable ones?
• Test workers with golden tasks (i.e., tasks with known answers)
• Each test costs money. How do we identify the best K workers with the minimum amount of money?

Top-K arm identification view: each worker is a Bernoulli arm with mean θ_j (θ_j: the j-th worker's reliability); testing a worker with a golden task yields a binary-valued sample (correct/wrong)

(example worker reliabilities from the slide figure: 0.95, 0.99, 0.5)

SLIDE 6

Evaluation Metric

• Sorted means θ_1 ≥ θ_2 ≥ ⋯ ≥ θ_n
• Goal: find a set U of K arms to minimize the aggregate regret
• Given any ε, δ, the algorithm outputs a set U of K arms such that L_U ≤ ε with probability at least 1 − δ (PAC learning)
• For K = 1, i.e., find j : θ_1 − θ_j ≤ ε w.p. 1 − δ
  • [Even-Dar, Mannor and Mansour, 06]
  • [Mannor, Tsitsiklis, 04]
• This talk: general K, with the aggregate regret

  L_U = (1/K) ( Σ_{j=1}^{K} θ_j − Σ_{j∈U} θ_j )

SLIDE 7

Simplification

• Assume Bernoulli distributions from now on
• Think of a collection of biased coins
• Try to (approximately) find the K coins with the largest bias (towards heads)

(coin biases in the slide figure: 0.5, 0.55, 0.6, 0.45, 0.8)

SLIDE 8

Why aggregate regret?

• Misidentification probability (Bubeck et al., 13): Pr(U ≠ {1, 2, …, K})
• Consider the case (K = 1): two coins with means 1 and 0.99999
• Distinguishing these two coins with high confidence requires approximately 10^5 samples (the number of samples depends on the gap θ_1 − θ_2)
• Using regret instead (say with ε = 0.01), we may choose either of them

SLIDE 9

Why aggregate regret?

• Explore-K (Kalyanakrishnan et al., 12, 13)
  • Select a set U of K arms such that ∀j ∈ U, θ_j > θ_K − ε w.h.p. (θ_K: the K-th largest mean)
• Example: θ_1 ≥ ⋯ ≥ θ_{K−1} ≫ θ_K and θ_{j+K} > θ_K − ε for j = 1, …, K
  • The set U = {K + 1, K + 2, …, 2K} satisfies the requirement, even though it misses all of the K − 1 much better arms (so its aggregate regret can be large)

SLIDE 10

Naïve Solution

Uniform Sampling
• Sample each coin M times (empirical mean: #heads / M)
• Pick the K coins with the largest empirical means
• How large does M need to be (in order to achieve ε-regret)?

  M = O( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = O(log n) samples per coin,

so the total number of samples is N = n · M = O(n log n).
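The naïve strategy is easy to state in code. A sketch (hypothetical Python; Bernoulli coins, fixed seed for reproducibility):

```python
import random

def uniform_sampling(thetas, k, m, seed=0):
    """Sample every coin m times and return the k coins with the largest empirical means."""
    rng = random.Random(seed)
    emp = []
    for theta in thetas:
        heads = sum(rng.random() < theta for _ in range(m))
        emp.append(heads / m)
    # indices of the k largest empirical means
    return sorted(range(len(thetas)), key=lambda j: emp[j], reverse=True)[:k]
```

With a large enough m (the M = O(log n) bound above), the empirical ranking matches the true ranking with high probability.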

SLIDE 11

Naïve Solution

Uniform Sampling

• With M = O(log n), we can get an estimate θ'_j of θ_j such that |θ_j − θ'_j| ≤ ε with very high probability (say 1 − 1/n²)
  • This can be proved easily using the Chernoff bound (a concentration bound).
• What if we use M = O(1) (say M = 10)?
  • E.g., consider the following example (K = 1):
    0.9, 0.5, 0.5, …………………., 0.5 (a million coins with mean 0.5)
  • Consider a coin with mean 0.5: Pr[all samples from this coin are heads] = (1/2)^10
  • With constant probability, there are more than 500 coins whose samples are all heads
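The arithmetic behind that last bullet can be checked directly (hypothetical Python; the expected count follows from linearity of expectation):

```python
def expected_all_heads(n_coins=1_000_000, mean=0.5, m=10):
    """Expected number of mean-`mean` coins whose m samples all come up heads."""
    return n_coins * mean ** m
```

With a million fair coins and M = 10, the expectation is 10^6 / 1024 ≈ 977, so well over 500 all-heads coins appear with constant probability, and the 0.9 coin cannot be told apart from them.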

SLIDE 12

Uniform Sampling

• In fact, we can show a matching lower bound:

  M = Θ( (1/ε²) ( log(n/K) + (1/K) log(1/δ) ) ) = Θ(log n) samples per coin.

• One observation: if K = Θ(n), then M = O(1).

SLIDE 13

Can we do better??

• Consider the following example:
  0.9, 0.5, 0.5, …………………., 0.5 (a million coins with mean 0.5)
• Uniform sampling spends too many samples on bad coins; we should spend more samples on good coins
• However, we do not know in advance which coins are good and which are bad……
• Sample each coin M = O(1) times:
  • If the empirical mean of a coin is large, we DO NOT know whether it is good or bad
  • But if the empirical mean of a coin is very small, we DO know it is bad (with high probability)

SLIDE 14

Optimal Multiple Arm Identification (OptMAI)

• Input: n (number of arms), K (number of top arms), Q (total number of samples / budget)
• Initialization: active set of arms S_0 = {1, 2, …, n}; set of top arms U_0 = ∅; iteration index s = 0; parameter γ ∈ (0.75, 1)
• While |U_s| < K and |S_s| > 0 do
  • If |S_s| > 4K then
    • S_{s+1} = Quartile-Elimination(S_s, γ^s (1 − γ) Q)   [eliminate the quarter of arms with the lowest empirical means]
  • Else (|S_s| ≤ 4K)
    • Identify the best K arms among the at most 4K remaining arms, using uniform sampling
  • s = s + 1
• Output: the set U_s of the K selected arms

SLIDE 15

Quartile-Elimination

• Idea: uniformly sample each arm in the active set S and discard the worst quarter of arms (those with the lowest empirical means)

• Input: S (active arms), Q (budget)
• Sample each arm j ∈ S for Q/|S| times and let θ̂_j be its empirical mean
• Find the lower quartile r of the empirical means: |{j : θ̂_j < r}| = |S|/4
• Output: S′ = S \ {j : θ̂_j < r}

SLIDE 16

Sample Complexity

• Sample complexity Q: the algorithm outputs K arms such that

  L_U = (1/K) ( Σ_{j=1}^{K} θ_j − Σ_{j∈U} θ_j ) ≤ ε, w.p. 1 − δ.

• K ≤ n/2: Q = O( (n/ε²) (1 + ln(1/δ) / K) )   (this is linear!)
• K ≥ n/2: Q = O( ((n−K)/K) · (n/ε²) · ( (n−K)/K + ln(1/δ)/K ) )   (which can be sublinear!)
  • Apply our algorithm to identify the worst n − K arms.

SLIDE 17

Sample Complexity

• Sample complexity Q: the algorithm outputs K arms such that

  L_U = (1/K) ( Σ_{j=1}^{K} θ_j − Σ_{j∈U} θ_j ) ≤ ε, w.p. 1 − δ.

• K ≤ n/2: Q = O( (n/ε²) (1 + ln(1/δ) / K) )   (this is linear!)
• K ≥ n/2: Q = O( ((n−K)/K) · (n/ε²) · ( (n−K)/K + ln(1/δ)/K ) )   (which can be sublinear!)
  • Reduce to the K ≤ n/2 case by identifying the worst n − K arms.

Better bound if K is larger!

SLIDE 18

Sample Complexity

• K ≤ n/2: Q = O( (n/ε²) (1 + ln(1/δ) / K) )
  • K = 1: Q = O( (n/ε²) ln(1/δ) )   [Even-Dar et al., 06]
• For larger K, the sample complexity is smaller: identifying K arms is simpler!
• Why? Example: θ_1 = 1/2 + 2ε, θ_2 = θ_3 = ⋯ = θ_n = 1/2.
  • Identifying the first arm (K = 1) is hard: we cannot pick a wrong arm.
  • Since L_U ≤ 2ε/K, for K ≥ 2 any set of K arms is fine.
• Naïve uniform sampling: Q = Ω(n log n), a log n factor worse

SLIDE 19

Matching Lower Bounds

• K ≤ n/2: there is an underlying instance {θ_j} such that for any randomized algorithm that identifies a set U with L_U ≤ ε w.p. at least 1 − δ,

  E[Q] = Ω( (n/ε²) (1 + ln(1/δ) / K) )

• K > n/2: E[Q] = Ω( ((n−K)/K) · (n/ε²) · ( (n−K)/K + ln(1/δ)/K ) )

Our algorithm is optimal for every value of n, K, ε, δ!

SLIDE 20

Matching Lower Bounds

• First lower bound: K ≤ n/2, Q ≥ Ω(n/ε²)
  • Reduction to distinguishing two Bernoulli arms with means 1/2 and 1/2 + ε with probability > 0.51, which requires at least Ω(1/ε²) samples [Chernoff, 72] (anti-concentration)
• Second lower bound: K ≤ n/2, Q ≥ Ω( (n/ε²) ln(1/δ) / K )
  • Proved via a standard technique in statistical decision theory

SLIDE 21

Experiments

Algorithms compared:
• OptMAI (γ = 0.8, γ = 0.9)
• SAR (Bubeck et al., 13)
• LUCB (Kalyanakrishnan et al., 12)
• Uniform (naïve uniform sampling)

Simulated experiments:
• Number of arms: n = 1000
• Total budget: Q = 20n, Q = 50n, Q = 100n
• Top-K arms: K = 10, 20, …, 500
• Report the average result over 100 independent runs
• Underlying distributions: (1) θ_j ~ Uniform(0, 1); (2) θ_j = 0.6 for j = 1, …, K and θ_j = 0.5 for j = K + 1, …, n
• Metric: regret L_U

SLIDE 22

Simulated Experiment

 πœ„π‘—~π‘‰π‘œπ‘—π‘”π‘π‘ π‘› 0,1

SLIDE 23

Simulated Data

 πœ„π‘— = 0.6 for 𝑗 = 1, … , 𝐿, πœ„π‘— = 0.5 for 𝑗 = 𝐿 + 1, … , π‘œ

SLIDE 24

Real Data

• RTE data for textual entailment (Snow et al., 08)
• 800 binary labeling tasks with true labels
• 164 workers

SLIDE 25

Real Data

• Empirical distribution of the number of tasks assigned to a worker (γ = 0.9, K = 10, Q = 20n)
• SAR queries an arm Ω(Q / log n) times: a worker receives at most 143 tasks
• OptMAI queries an arm O(Q / n^{Ω(1)}) times: a worker receives at most 48 tasks
• Crowdsourcing: it is impossible to assign too many tasks to a single worker

SLIDE 26

Real Data

• Precision = |U ∩ {1, …, K}| / K : the fraction of the arms in U that belong to the true top K arms
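The precision metric is a one-liner (hypothetical Python; arms numbered 1…n so the true top-K set is {1, …, K}):

```python
def precision(selected, k):
    """|U intersect {1,...,K}| / K: fraction of selected arms that are truly top-K."""
    return len(set(selected) & set(range(1, k + 1))) / k
```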

SLIDE 27

Conclusion

• Top-K arm identification
• Application to crowdsourcing
• (Worst-case) optimal upper and lower bounds
• Future direction: some instances are "easier", e.g., 0.9, 0.1, 0.1, 0.1, ……. Can we get better upper bounds for these instances?

SLIDE 28

Thanks.

lapordge@gmail.com