Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

Nov 11, 2022


SLIDE 1

Exploiting Structure of Uncertainty for Efficient Matroid Semi-Bandits

Pierre Perrault (INRIA Lille — CMLA, ENS PS) Vianney Perchet (CMLA, ENS PS — Criteo AI Lab) Michal Valko (INRIA Lille)

Perrault et al. Exploiting Structure of Uncertainty 1 / 6

SLIDE 2

Semi-bandits confidence regions

Box region (not very accurate): $\max_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$

Ellipsoidal region (accurate): $\sum_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$
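
To make the difference between the two regions concrete, here is a minimal Python sketch that tests whether a candidate mean vector lies in each region. The function names, the toy numbers, and the exact radius `log(t)` (without constants) are assumptions made for illustration; they are not taken from the slides.

```python
import math

def in_max_region(delta, mu_hat, counts, t):
    """Max-type (box) region: the worst coordinate must satisfy the bound."""
    worst = max((d - m) ** 2 * n for d, m, n in zip(delta, mu_hat, counts))
    return worst <= math.log(t)

def in_sum_region(delta, mu_hat, counts, t):
    """Sum-type (ellipsoidal) region: the weighted squared deviations are summed."""
    total = sum((d - m) ** 2 * n for d, m, n in zip(delta, mu_hat, counts))
    return total <= math.log(t)

# Hypothetical toy data: two arms, each pulled 10 times, at round t = 100.
mu_hat = [0.5, 0.5]
counts = [10, 10]
t = 100

# A candidate that deviates on both coordinates at once.
delta = [1.1, 1.1]
inside_max = in_max_region(delta, mu_hat, counts, t)
inside_sum = in_sum_region(delta, mu_hat, counts, t)
print(inside_max, inside_sum)
```

In this toy case the candidate sits in a corner of the box: each coordinate passes the bound on its own, but the summed deviation does not, so the sum-type region excludes it. Any point in the sum-type region is also in the max-type region, since each term of the sum is non-negative.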


SLIDE 3

Efficiency

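Algorithms use the OFU principle:

$$A_t \in \arg\max_{A \in \mathcal{A},\ \mu \in \mathcal{C}_t} e_A^T \mu = \arg\max_{A \in \mathcal{A}} \Big( \underbrace{e_A^T \mu_{t-1}}_{L(A)} + \underbrace{\max_{\mu \in \mathcal{C}_t - \mu_{t-1}} e_A^T \mu}_{F(A)} \Big).$$

Theorem (Perrault et al.): $F$ is linear for the first region below and submodular for the second.

$\max_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$: not very accurate, efficient ($F$ linear).

$\sum_i (\delta_i - \mu_{i,t-1})^2 N_{i,t-1} \le \log(t)$: accurate, inefficient ($F$ submodular).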
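
The linear versus submodular distinction can be checked numerically. The sketch below assumes the two regions exactly as written above (radius `log(t)`, no constants), in which case the inner maximization has a closed form: a sum of per-arm radii for the max-type region, and, by Cauchy-Schwarz, a square root of a sum for the sum-type region. The function names and toy counts are hypothetical.

```python
import math

def bonus_max_region(A, counts, t):
    """F(A) for the max-type region: a sum of per-arm radii, hence linear in A."""
    return sum(math.sqrt(math.log(t) / counts[i]) for i in A)

def bonus_sum_region(A, counts, t):
    """F(A) for the sum-type region: sqrt(log(t) * sum_{i in A} 1/N_i)."""
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in A))

# Hypothetical pull counts for three arms at round t = 100.
counts = [1, 4, 9]
t = 100

# Marginal gain of adding arm 1 to a small set and to a larger set.
gain_small = bonus_sum_region({1}, counts, t) - bonus_sum_region(set(), counts, t)
gain_large = bonus_sum_region({0, 1}, counts, t) - bonus_sum_region({0}, counts, t)
print(gain_small, gain_large)
```

Under these assumptions the sum-type bonus shows diminishing returns (adding an arm to a larger set helps less), which is the submodular behaviour the theorem refers to, and it never exceeds the max-type bonus on the same set.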


SLIDE 6

NEW: Approximation for matroid

Assume non-negative rewards. $\mathcal{A}$ is the family of independent sets.

GREEDY: $\frac{L(S) + F(S)}{1 - 1/e} \ge L(O) + F(O)$, $\forall O \in \mathcal{A}$. This gives linear regret. We expect a constant close to 1 when $F$ is small.

$L(S_1) \ge L(O)$, $\forall O \in \mathcal{A}$. $\frac{F(S_2)}{1 - 1/e} \ge F(O)$, $\forall O \in \mathcal{A}$.

Theorem (Perrault et al.): GREEDY for maximizing $L + F$ gives $S$ such that $L(S) + 2F(S) \ge L(O) + F(O)$, $\forall O \in \mathcal{A}$.
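
A minimal sketch of a greedy procedure of this kind is given below. It is an illustration under stated assumptions, not the authors' implementation: the matroid is a uniform matroid (sets of at most `m` arms), accessed through a hypothetical independence oracle `is_independent`, and `F` is the closed-form sum-type bonus `sqrt(log(t) * sum 1/N_i)`. All names and numbers are invented for the example.

```python
import math

# Hypothetical toy instance: four arms, non-negative empirical means.
mu_hat = [0.9, 0.8, 0.3, 0.2]
counts = [50, 40, 2, 1]
t = 100
m = 2  # rank of the assumed uniform matroid

def L(A):
    """Linear part: sum of empirical means over the chosen arms."""
    return sum(mu_hat[i] for i in A)

def F(A):
    """Assumed submodular bonus for the sum-type confidence region."""
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in A))

def objective(A):
    return L(A) + F(A)

def is_independent(A):
    """Independence oracle of a uniform matroid: at most m elements."""
    return len(A) <= m

def greedy(ground, is_independent, objective):
    """Add the feasible element with the best objective until none is feasible."""
    S = set()
    while True:
        candidates = [i for i in ground
                      if i not in S and is_independent(S | {i})]
        if not candidates:
            return S
        best = max(candidates, key=lambda i: objective(S | {i}))
        S.add(best)

S = greedy(range(len(mu_hat)), is_independent, objective)
print(sorted(S), L(S) + 2 * F(S))
```

Because rewards are non-negative and this `F` is monotone, every marginal gain is non-negative, so the loop runs until it reaches a maximal independent set. On an instance this small, the inequality $L(S) + 2F(S) \ge L(O) + F(O)$ can be checked by enumerating every independent set $O$.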


SLIDE 7

When reward can be negative...

LOCAL SEARCH based algorithm.

Theorem (Perrault et al.): $L(S) + 2(1 + \varepsilon) F(S) \ge L(O) + F(O)$, $\forall O \in \mathcal{A}$.

Time complexity per round: $O\left(m^2 n \log(mt) / \varepsilon\right)$.

Start from the greedy solution $S_{\text{init}} \in \arg\max_{A} L(A)$. Then, repeatedly try three basic operations in order to improve the current solution. Improvements must be greater than $\frac{\varepsilon}{m} F(S)$.
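
The following sketch shows one way such a local search could be organized. The slide does not name the three basic operations; the code assumes they are delete, add, and swap, which is a common choice but an assumption here. The uniform matroid, the closed-form bonus `F`, the function names, and the toy numbers are likewise hypothetical.

```python
import math

# Hypothetical toy instance with some negative empirical means.
mu_hat = [0.5, -0.2, 0.1, -0.4]
counts = [30, 1, 20, 2]
t = 100
m = 2  # rank of the assumed uniform matroid

def L(A):
    return sum(mu_hat[i] for i in A)

def F(A):
    # Assumed bonus for the sum-type confidence region.
    return math.sqrt(math.log(t) * sum(1.0 / counts[i] for i in A))

def is_independent(A):
    return len(A) <= m

def local_search(ground, is_independent, L, F, m, eps):
    # Greedy start on the linear part only: scan arms by decreasing mean,
    # keep those with positive mean that preserve independence.
    S = set()
    for i in sorted(ground, key=lambda i: L({i}), reverse=True):
        if L({i}) > 0 and is_independent(S | {i}):
            S.add(i)

    def objective(A):
        return L(A) + F(A)

    improved = True
    while improved:
        improved = False
        outside = [j for j in ground if j not in S]
        # Assumed operations: delete one arm, add one arm, swap one arm.
        neighbours = [S - {i} for i in S]
        neighbours += [S | {j} for j in outside]
        neighbours += [(S - {i}) | {j} for i in S for j in outside]
        threshold = eps / m * F(S)
        for T in neighbours:
            if is_independent(T) and objective(T) > objective(S) + threshold:
                S, improved = T, True
                break
    return S

S = local_search(range(len(mu_hat)), is_independent, L, F, m, eps=0.1)
print(sorted(S), L(S) + F(S))
```

Requiring each accepted move to gain more than `eps / m * F(S)` is what bounds the number of iterations: the objective grows by a fixed relative amount of the bonus at every step, so small, endlessly repeated improvements are ruled out.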


SLIDE 8

Thank you! Poster: Pacific Ballroom #53

Extension to budgeted bandit, where we want to minimize $\frac{L_1 - F_1}{L_2 + F_2}$.

Solution uses NEW concept: Approximation Lagrangian $\mathcal{L}_\kappa(\lambda, S) = L_1(S) - \kappa F_1(S) - \lambda \left( L_2(S) + \kappa F_2(S) \right)$.

Experiments

[Two plots of $R_T$ ($\times 10^2$) against $T$ (from $10^1$ to $10^5$), each comparing CUCB and ESCB.]