Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits
Chao Tao, Qin Zhang (IUB); Yuan Zhou (UIUC)
FOCS 2019, Nov. 10, 2019

Collaborative Learning
One of the most important tasks in …
– Time: network bandwidth/latency, protocol handshaking
– Energy: e.g., robots exploring in the deep sea and on Mars
[Figure: agents P1, P2, …, Pk, interleaved with communication (comm.) steps.]

At any point, each agent either makes its next pull requests, requests a comm. step and enters the wait mode, or terminates and outputs the answer.
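As a concrete (toy) rendering of this round-based model, here is a minimal Python simulation; the uniform pulling strategy and all parameter values are my own illustration, not the talk's algorithm:

```python
import random

def simulate_round_model(means, K, R, pulls_per_round, seed=0):
    """Toy simulation of the collaborative model: in each round every
    agent submits a batch of pull requests, outcomes come back as
    Bernoulli samples, and a communication step then shares all counts."""
    rng = random.Random(seed)
    n = len(means)
    # Shared statistics, synchronized at each communication step.
    pulls = [0] * n
    wins = [0] * n
    for _ in range(R):
        # Each agent pulls arms (here: uniformly at random) within the round.
        for _agent in range(K):
            for _ in range(pulls_per_round):
                i = rng.randrange(n)
                pulls[i] += 1
                wins[i] += rng.random() < means[i]
        # comm. step: agents exchange counts (implicit here: state is shared).
    # Terminate and output the empirically best arm.
    est = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
    return max(range(n), key=lambda i: est[i])

best = simulate_round_model([0.5, 0.45, 0.3], K=4, R=5, pulls_per_round=500)
```

With a generous budget the empirically best arm is the true best arm with high probability; the point of the model is that pulls happen in batches between communication steps.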
Total running time over the R rounds: Σ_{r∈[R]} t_r, where t_r is the time spent in round r.
– T_A(I, δ): expected time needed for A to succeed on I with probability at least (1 − δ).
– T(best cen): the time of the best centralized algorithm O on instance I, i.e., for all δ ∈ (0, 1/3], T_O(I, δ) ≤ T.
– Speedup of a collaborative algorithm A: T(best cen) / T(A).
– Our upper bound slowly degrades (in log) as T grows.
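As a trivial worked example of the speedup definition (the time values below are invented, purely to show the arithmetic):

```python
def speedup(t_best_centralized: float, t_collaborative: float) -> float:
    """Speedup of a collaborative algorithm A on an instance:
    beta_A = T(best centralized) / T(A)."""
    return t_best_centralized / t_collaborative

# If the best centralized algorithm needs 10^6 time units and the
# K-agent collaborative algorithm needs 10^4, the speedup is 100.
beta = speedup(1e6, 1e4)  # → 100.0
```

A speedup close to K means the K agents parallelize almost perfectly.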
[Table: round-speedup tradeoffs. Fixed-time: Ω(ln K / ln ln K) rounds at speedup K/ln^{O(1)} K, versus ln K rounds at speedup Ω̃(K). Fixed-confidence: Ω(ln(1/∆_min) / (ln ln K + ln ln(1/∆_min))) rounds at speedup K/ln^{O(1)} K, versus ln(1/∆_min) rounds at speedup Ω̃(K).]

– A generalization of the round-elimination technique.
– A new technique for instance-dependent round complexity.
– Almost tight round-speedup tradeoffs for fixed-time. (Today's focus: the LB.)
– Almost tight round-speedup tradeoffs for fixed-confidence.
– A separation for the two problems.

[21]: Hillel et al., NIPS 2013; ∆_min = mean of best arm − mean of 2nd best arm.
– Non-adaptive algos: all arm pulls should be determined at the beginning of each round.
– Translated into our collaborative learning setting.

Hard instance: one arm with mean 1/2, and (n − 1) arms with mean 1/2 − ∆_min (the relevant time scale here is ∆_min^{−2}/K).
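A sketch of this hard instance as Bernoulli arms (the parameter values are illustrative, and placing the best arm at a random index is my own convenience):

```python
import random

def hard_instance(n: int, delta_min: float):
    """One arm with mean 1/2 and (n - 1) arms with mean 1/2 - delta_min.
    Returns the list of means and the (random) index of the best arm."""
    means = [0.5 - delta_min] * n
    best = random.randrange(n)
    means[best] = 0.5
    return means, best

means, best = hard_instance(8, delta_min=0.1)
```

Any algorithm must essentially distinguish the single mean-1/2 arm from (n − 1) arms that are only ∆_min worse, which is what drives the lower bound.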
Instance-dependent time: (Σ_{i=2}^{n} 1/∆_i²)/K, i.e., the centralized instance-dependent complexity Σ_{i=2}^{n} 1/∆_i² divided by the number of agents K.
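The instance-dependent quantity above can be computed directly (a small helper; the arm gaps below are made up):

```python
def collaborative_complexity(gaps, K):
    """(sum_{i=2}^{n} 1/Delta_i^2) / K: the centralized instance-dependent
    complexity divided by the number of agents K."""
    # gaps: Delta_i for the non-best arms i = 2, ..., n (all > 0).
    return sum(1.0 / (d * d) for d in gaps) / K

# Example: three suboptimal arms with gaps 0.1, 0.2, 0.5 and K = 10 agents.
budget = collaborative_complexity([0.1, 0.2, 0.5], K=10)  # = (100 + 25 + 4) / 10
```

Small gaps dominate the sum, which is why ∆_min controls the worst case.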
Let α ∈ [1, n^{0.2}] be a parameter, B = γ = α log^{10} n, L = log n/(log log n + log α), ρ = log³ n.

Define D_j to be the class of distributions µ with support {B^{−1}, …, B^{−(j−1)}, B^{−j}, …, B^{−L}}, such that if X ∼ µ, then Pr[X = B^{−ℓ}] follows the pyramid profile below, where λ_j is a normalization factor.

[Table: Pr[X = B^{−ℓ}] (on a log_B scale) for the cases ℓ < j, ℓ = j, ℓ = j + 1, …, ℓ = L − 1, ℓ = L.]

Arms are i.i.d. with mean 1/2 − X. Try to embed the pyramid distribution into each arm.
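To make the "pyramid" shape concrete, here is a sampler for an illustrative geometrically decaying version; the exact weights in the paper's table differ, so this only shows the shape, not the real D_j:

```python
import random

def sample_pyramid(B: float, j: int, L: int, rng=random):
    """Sample X from an illustrative 'pyramid-shaped' distribution on
    {B^-j, ..., B^-L}: Pr[X = B^-l] decays geometrically in l and is
    normalized to sum to 1 (the paper's exact weights differ; this
    only illustrates the shape)."""
    weights = [B ** -(l - j) for l in range(j, L + 1)]  # geometric decay in l
    lam = sum(weights)                                   # normalization factor
    r, acc = rng.random() * lam, 0.0
    for l, w in zip(range(j, L + 1), weights):
        acc += w
        if r <= acc:
            return B ** -l
    return B ** -L

# An arm's mean would then be 1/2 - X, with X sampled as above.
```

Most of the mass sits at the largest gap B^{−j}, with a thin tail of much harder (smaller-gap) levels, which is what the round-elimination argument exploits.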
Set a = (1/2 − B^{−(j+1)}) γB^{2j} − √(10γ ln n) · B^j and b = γB^{2j}/2 + B^{j+0.6}.

Consider an arm with mean 1/2 − X, X ∼ µ ∈ D_j, for some j ∈ [L − 1]. We pull the arm γB^{2j} times. Let Θ = (Θ_1, Θ_2, …, Θ_{γB^{2j}}) be the pull outcomes, and let |Θ| = Σ_{i∈[γB^{2j}]} Θ_i. If |Θ| ∈ [a, b], then publish the arm.

[Figure: positions of a, b, and E[|Θ|] when X = B^{−ℓ} for ℓ > j.]
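The publish test is easy to implement exactly as stated; the sketch below simulates the γB^{2j} pulls and checks |Θ| ∈ [a, b] (the parameter values in the usage comment are small illustrative choices, far below the paper's γ = α log^{10} n):

```python
import math, random

def publish_test(mean: float, j: int, gamma: float, B: float, n: int, rng=random):
    """Pull an arm with the given mean gamma * B^(2j) times and apply the
    publish rule: publish iff |Theta| (the number of 1-outcomes) lies in
    [a, b], with a and b as defined above."""
    pulls = int(gamma * B ** (2 * j))
    a = (0.5 - B ** -(j + 1)) * pulls - math.sqrt(10 * gamma * math.log(n)) * B ** j
    b = pulls / 2 + B ** (j + 0.6)
    theta = sum(rng.random() < mean for _ in range(pulls))  # |Theta|
    return a <= theta <= b

# With e.g. B = 4, j = 1, gamma = 1000, n = 1000, an arm at level
# l = j + 1 concentrates inside [a, b], while an arm at level l = j
# concentrates below a.
```

The interval is chosen so that the outcome statistic cleanly separates the level-j arms from the higher (smaller-gap) levels with high probability.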
Round reduction. For any j ≤ L/2 − 1:
∃ r-round (K/α)-speedup non-adaptive algorithm with error prob. δ on (D_j)^{n_j} (n_j ∈ I_j)
⇒ ∃ (r − 1)-round (K/α)-speedup non-adaptive algorithm with error prob. δ + 1/L on (D_{j+1})^{n_{j+1}} (n_{j+1} ∈ I_{j+1}),
where I_j is a (1 ± O(1/L))-factor interval around n · B^{−2j−1}.

Base case: any 0-round algorithm must have error 0.99 on any distribution in (D_{L/2})^{n_{L/2}} (∀ n_{L/2} ∈ I_{L/2}).
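Unrolling the round-reduction step down to the base case gives the round lower bound; schematically (a sketch, with the error increments and the base-case constant as stated above):

```latex
\exists\, r\text{-round algorithm with error } \delta
\;\Rightarrow\; \exists\, 0\text{-round algorithm with error } \delta + \tfrac{r}{L}
\quad\text{(apply round reduction } r \text{ times, } r \le L/2 - 1\text{)}.
```

```latex
\text{Base case: } \delta + \tfrac{r}{L} \ge 0.99
\;\Rightarrow\; \delta \ge 0.99 - \tfrac{r}{L},
\quad\text{so error } \le 1/3 \text{ forces } r = \Omega(L) = \Omega\!\Big(\tfrac{\log n}{\log\log n + \log \alpha}\Big).
```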
Let S be the set of arms which will be pulled more than γB^{2j} times (note: we are considering non-adaptive algos).

Algorithm augmentation (for the j-th round): for each arm z ∈ S, consider its first γB^{2j} pulls. Let Θ_z = (Θ_{z,1}, …, Θ_{z,γB^{2j}}) be the γB^{2j} pull outcomes. If |Θ_z| ∈ [a, b], we publish the arm. If we publish an arm with mean 1/2 − B^{−L}, then we return "error" (this event is absorbed into the +1/L error term in the induction).
⇒ (by the key property of D_j) the resulting posterior distribution on the unpublished arms lies in (D_{j+1})^{n_{j+1}} (n_{j+1} ∈ I_{j+1}).
Proved by a coupling-like argument: compare the behavior of the r-round algorithm with that of the constructed (r − 1)-round algorithm.
– Randomly partition the n arms to the K agents.
– Round 1: each agent runs a centralized algo for T/2 time.
– Each of the remaining R − 1 rounds (using ((R − 1)/R) · T/2 time in total):
  – Each agent spends T/(2R) time uniformly on the candidate arms.
  – Eliminate arms whose empirical means are smaller than (top empirical mean − ε(K, R, T, #candidates)).
– With a suitable choice of ε(K, R, T, #candidates), the algo succeeds w.pr. 0.99.
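The elimination rounds above can be sketched as follows; the concrete elimination radius `eps` is a simple Hoeffding-style choice of my own, not the talk's exact ε(K, R, T, #candidates), and the round-1 centralized warm-up is omitted:

```python
import math, random

def collaborative_eliminate(means, K, R, T, seed=0):
    """Sketch of the R-round elimination scheme: in each of R - 1 rounds,
    all K agents together spend K * T / (2R) pulls uniformly on the
    surviving candidates, then arms whose empirical means fall far below
    the empirical leader are eliminated."""
    rng = random.Random(seed)
    candidates = list(range(len(means)))
    for _ in range(R - 1):
        # Pulls per candidate arm, summed over all K agents this round.
        per_arm = max(1, int(K * T / (2 * R) / len(candidates)))
        est = {}
        for i in candidates:
            wins = sum(rng.random() < means[i] for _ in range(per_arm))
            est[i] = wins / per_arm
        top = max(est.values())
        # Confidence radius (illustrative assumption, not the talk's eps).
        eps = math.sqrt(math.log(4 * R * len(means)) / (2 * per_arm))
        candidates = [i for i in candidates if est[i] >= top - 2 * eps]
    return candidates

# With enough budget, only the best arm survives:
survivors = collaborative_eliminate([0.9, 0.5, 0.4, 0.3], K=10, R=4, T=2000)
```

Each round halves the "resolution" of gaps that can be ruled out, which is where the dependence of the required T on R comes from.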
As the time budget (as a function of R) grows, the error diminishes, approaching the centralized algo. In the hard case, a random time budget in {T/2, T/200} is chosen.
Hard instances I(∆): the best arm has mean (1/2 + ∆); I(∆♭) denotes the analogous instance with gap ∆♭.
E(α, T): A uses at least α rounds and at most T time before the end of the α-th round.
E*(α, T): A uses at least (α + 1) rounds and at most T time before the end of the α-th round.

Pr_{I(∆)}[E*(α, ∆^{−2}/(Kq))] ≥ Pr_{I(∆)}[E(α, ∆^{−2}/(Kq))] − δ(K, q)

Pr_{I(∆/ζ)}[E(α + 1, ∆^{−2}/(Kq) + ∆^{−2}/β)] ≥ Pr_{I(∆)}[E*(α, ∆^{−2}/(Kq))] − δ′(K, q, β)
Distribution exchange: Pr_{D′⊗X}[A] ≤ γ · exp(…).
Combining the two inequalities:

Pr_{I(∆)}[E*(α, ∆^{−2}/(Kq))] ≥ Pr_{I(∆)}[E(α, ∆^{−2}/(Kq))] − δ(K, q)

Pr_{I(∆/ζ)}[E(α + 1, ∆^{−2}/(Kq) + ∆^{−2}/β)] ≥ Pr_{I(∆)}[E*(α, ∆^{−2}/(Kq))] − δ′(K, q, β)

with β chosen so that ∆^{−2}/(Kq) + ∆^{−2}/β = (∆/ζ)^{−2}/(Kq).
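Schematically, the two inequalities chain: each application advances α by 1 while shrinking the gap by ζ, and the constraint on β keeps the time argument in the canonical form ∆^{−2}/(Kq), so the error terms telescope (a sketch):

```latex
\Pr_{I(\Delta/\zeta^{r})}\!\Big[E\big(\alpha + r,\ (\Delta/\zeta^{r})^{-2}/(Kq)\big)\Big]
\;\ge\;
\Pr_{I(\Delta)}\!\Big[E\big(\alpha,\ \Delta^{-2}/(Kq)\big)\Big]
\;-\; r\,\big(\delta(K, q) + \delta'(K, q, \beta)\big).
```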