SLIDE 1
Announcements

➢ HW 1 deadline is postponed to next Tuesday before class, i.e., 3:30 pm

SLIDE 2

CS6501: Topics in Learning and Game Theory (Fall 2019)

Swap Regret and Convergence to CE

Instructor: Haifeng Xu

SLIDE 3

Outline

➢ (External) Regret vs Swap Regret
➢ Convergence to Correlated Equilibrium
➢ Converting Regret Bounds to Swap Regret Bounds

SLIDE 4

Recap: Online Learning

At each time step $t = 1, \cdots, T$, the following occurs in order:

1. Learner picks a distribution $p^t$ over actions $[n]$
2. Adversary picks cost vector $c^t \in [0,1]^n$
3. Action $i^t \sim p^t$ is chosen and learner incurs cost $c^t(i^t)$
4. Learner observes $c^t$ (for use in future time steps)
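In code, one standard learner implementing this protocol is multiplicative weights (Hedge). Below is a minimal sketch, assuming NumPy; the `cost_at(t)` oracle is an illustrative stand-in for the adversary, not something from the slides.

```python
import numpy as np

def hedge(T, n, cost_at, eta=0.1):
    """Run the online learning protocol with the Hedge (MWU) learner.

    cost_at(t) must return the round-t cost vector c^t in [0,1]^n;
    it stands in for the adversary. Returns the played distributions.
    """
    rng = np.random.default_rng(0)
    weights = np.ones(n)
    history = []
    for t in range(T):
        p = weights / weights.sum()       # 1. learner picks distribution p^t
        c = cost_at(t)                    # 2. adversary picks cost vector c^t
        action = rng.choice(n, p=p)       # 3. action i^t ~ p^t incurs cost c[action]
        weights *= np.exp(-eta * c)       # 4. learner observes all of c^t and updates
        history.append(p)
    return history
```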

SLIDE 5

Recap: (External) Regret

➢ External regret:
$$R_T = \mathbb{E}_{i^t \sim p^t}\Big[\sum_{t\in[T]} c^t(i^t)\Big] - \min_{j\in[n]} \sum_{t\in[T]} c^t(j)$$
➢ The benchmark $\min_{j\in[n]} \sum_t c^t(j)$ is the learner's total cost had he known $c^1, \cdots, c^T$ and were allowed to take the best single action across all rounds
➢ Describes how much the learner regrets, had he known the cost vectors $c^1, \cdots, c^T$ in hindsight
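Given a played history, this definition translates directly into code. A small sketch, assuming NumPy; the array names `P` and `C` are illustrative:

```python
import numpy as np

def external_regret(P, C):
    """External regret R_T of a played history.

    P: (T, n) array whose row t is the distribution p^t.
    C: (T, n) array whose row t is the cost vector c^t.
    """
    learner_cost = (P * C).sum()         # E[ sum_t c^t(i^t) ] with i^t ~ p^t
    best_fixed = C.sum(axis=0).min()     # min_j sum_t c^t(j): best single action
    return learner_cost - best_fixed
```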

SLIDE 6

Recap: (External) Regret

➢ A closer look at external regret:
$$R_T = \mathbb{E}_{i^t \sim p^t}\Big[\sum_{t\in[T]} c^t(i^t)\Big] - \min_{j\in[n]} \sum_{t\in[T]} c^t(j)$$
$$= \sum_{t\in[T]} \sum_{i\in[n]} c^t(i)\, p^t(i) - \min_{j\in[n]} \sum_{t\in[T]} c^t(j)$$
$$= \max_{j\in[n]} \Big[ \sum_{t\in[T]} \sum_{i\in[n]} c^t(i)\, p^t(i) - \sum_{t\in[T]} c^t(j) \Big]$$
$$= \max_{j\in[n]} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(j)]\, p^t(i)$$
(a many-to-one action swap: every action $i$ is compared against the same $j$)

SLIDE 7

Recap: (External) Regret

➢ A closer look at external regret (rewritten as on the previous slide):
$$R_T = \max_{j\in[n]} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(j)]\, p^t(i)$$
➢ In external regret, the learner is allowed to swap to a single action $j$, and can choose the best $j$ in hindsight

SLIDE 8

Swap Regret

➢ A closer look at external regret:
$$R_T = \max_{j\in[n]} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(j)]\, p^t(i)$$
➢ Swap regret allows a many-to-many action swap $s: [n] \to [n]$, replacing the fixed comparator $c^t(j)$ with $c^t(s(i))$
  • E.g., $s(1) = 2$, $s(2) = 1$, $s(3) = 4$, $s(4) = 4$
➢ Formally,
$$\mathrm{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(s(i))]\, p^t(i)$$
where the max is over all possible swap functions $s$
➢ There are $n^n$ swap functions, since each action $i$ has $n$ choices to swap to
➢ Quiz: how many many-to-one swaps are there?

SLIDE 9

Some Facts about Swap Regret

Recall swap regret:
$$\mathrm{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(s(i))]\, p^t(i)$$

Fact 1. For any algorithm: $\mathrm{swR}_T \ge R_T$

Fact 2. For any algorithm execution $p^1, \cdots, p^T$, the optimal swap function $s^*$ satisfies, for any $i$,
$$s^*(i) = \arg\max_{j\in[n]} \sum_{t\in[T]} [c^t(i) - c^t(j)]\, p^t(i)$$

Proof: $s(i)$ only affects the term $\sum_{t\in[T]} [c^t(i) - c^t(s(i))]\, p^t(i)$, so it should be picked to maximize this term.
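Fact 2 also gives a direct way to compute swap regret from a history: choose the best swap target independently for each action. A sketch under the same `(T, n)` array convention as the external-regret snippet above:

```python
import numpy as np

def swap_regret(P, C):
    """Swap regret of a history, via Fact 2's per-action decomposition.

    P: (T, n) array, row t is the distribution p^t.
    C: (T, n) array, row t is the cost vector c^t.
    """
    # G[i, j] = sum_t [c^t(i) - c^t(j)] * p^t(i): the gain from swapping i -> j.
    own = (P * C).sum(axis=0)        # own[i]      = sum_t c^t(i) p^t(i)
    cross = P.T @ C                  # cross[i, j] = sum_t p^t(i) c^t(j)
    G = own[:, None] - cross
    # Fact 2: the optimal swap s*(i) maximizes G[i, :] independently for each i.
    return G.max(axis=1).sum()
```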

SLIDE 10

Some Facts about Swap Regret

Remarks:

➢ By Fact 2, the optimal swap can be decided "independently" for each action $i$

SLIDE 11

Some Facts about Swap Regret

Remarks:

➢ The benchmark of swap regret depends on the algorithm execution $p^1, \cdots, p^T$, but the benchmark of external regret does not
➢ This raises a subtle issue: an algorithm minimizing swap regret does not necessarily minimize the total loss
  • An algorithm may intentionally play some actions less often, so that the benchmark does not have many opportunities to swap

SLIDE 12

Some Facts about Swap Regret

➢ The variant that swaps only a single action (pick the worst $i$),
$$\max_{i\in[n]} \max_{j\in[n]} \sum_{t\in[T]} [c^t(i) - c^t(j)]\, p^t(i),$$
is also called the internal regret

SLIDE 13

Outline

➢ (External) Regret vs Swap Regret
➢ Convergence to Correlated Equilibrium
➢ Converting Regret Bounds to Swap Regret Bounds

SLIDE 14

Recap: Normal-Form Games and CE

➢ $n$ players, denoted by the set $[n] = \{1, \cdots, n\}$
➢ Player $i$ takes action $a_i \in A_i$
➢ Player utility depends on the outcome of the game, i.e., an action profile $a = (a_1, \cdots, a_n)$
  • Player $i$ receives payoff $u_i(a)$ for any outcome $a \in \Pi_{i=1}^{n} A_i$
➢ A correlated equilibrium (CE) is an action recommendation policy:

A recommendation policy $\pi$ is a correlated equilibrium if
$$\sum_{a_{-i}} u_i(a_i, a_{-i}) \cdot \pi(a_i, a_{-i}) \ \ge\ \sum_{a_{-i}} u_i(a_i', a_{-i}) \cdot \pi(a_i, a_{-i}), \quad \forall a_i' \in A_i, \ \forall i \in [n].$$

➢ That is, for any recommended action $a_i$, player $i$ does not want to "swap" to another $a_i'$
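For two players, the CE condition is a finite set of linear inequalities and can be checked by brute force. A minimal sketch, assuming NumPy; the function name and the matrix layout are illustrative assumptions:

```python
import numpy as np

def is_correlated_eq(pi, U, tol=1e-9):
    """Check the CE condition for a 2-player game.

    pi: (m, k) joint distribution over action profiles (a_1, a_2).
    U:  [U1, U2], each of shape (m, k); U[i][a1, a2] is player i+1's payoff.
    """
    m, k = pi.shape
    # Player 1: for each recommended a1, no deviation a1' should be profitable.
    for a1 in range(m):
        base = pi[a1, :] @ U[0][a1, :]        # sum_{a2} u_1(a1, a2) pi(a1, a2)
        for dev in range(m):
            if pi[a1, :] @ U[0][dev, :] > base + tol:
                return False
    # Player 2 is symmetric, with the roles of rows and columns exchanged.
    for a2 in range(k):
        base = pi[:, a2] @ U[1][:, a2]        # sum_{a1} u_2(a1, a2) pi(a1, a2)
        for dev in range(k):
            if pi[:, a2] @ U[1][:, dev] > base + tol:
                return False
    return True
```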

SLIDE 15

Repeated Games with No-Swap-Regret Players

➢ The game is played repeatedly for $T$ rounds
➢ Each player uses an online learning algorithm to select a mixed strategy at each round $t$
➢ From any player $i$'s perspective, the following occurs in order at round $t$:
  • Player $i$ picks a mixed strategy $x_i^t \in \Delta_{|A_i|}$ over actions in $A_i$
  • Every other player $j \ne i$ picks a mixed strategy $x_j^t \in \Delta_{|A_j|}$
  • Player $i$ receives expected utility $u_i(x_i^t, x_{-i}^t) = \mathbb{E}_{a \sim (x_i^t,\, x_{-i}^t)}\, u_i(a)$
  • Player $i$ learns $x_{-i}^t$ (for future use)

SLIDE 16

From No Swap Regret to Correlated Equilibrium

Theorem. If all players use no-swap-regret learning algorithms, with strategy sequence $\{x_i^t\}_{t\in[T]}$ for player $i$, then the following recommendation policy $\pi_T$ converges to a CE:
$$\pi_T(a) = \frac{1}{T} \sum_t \Pi_{i\in[n]}\, x_i^t(a_i), \quad \forall a \in A.$$

Remarks:
➢ In the mixed strategy profile $(x_1^t, x_2^t, \cdots, x_n^t)$, the probability of outcome $a$ is $\Pi_{i\in[n]}\, x_i^t(a_i)$
➢ $\pi_T(a)$ is simply the average of $\Pi_{i\in[n]}\, x_i^t(a_i)$ over the $T$ rounds
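For two players, $\pi_T$ is just the average of the outer products of the per-round mixed strategies. A sketch that pairs with the CE checker above (array names illustrative):

```python
import numpy as np

def recommendation_policy(X1, X2):
    """pi_T(a) = (1/T) sum_t x_1^t(a_1) * x_2^t(a_2), for two players.

    X1: (T, m) array, row t is player 1's mixed strategy x_1^t.
    X2: (T, k) array, row t is player 2's mixed strategy x_2^t.
    Returns the (m, k) joint distribution over action profiles.
    """
    T = X1.shape[0]
    return sum(np.outer(X1[t], X2[t]) for t in range(T)) / T
```

By the theorem, feeding its output to `is_correlated_eq` should succeed up to a tolerance that shrinks as the players' swap regrets, divided by $T$, vanish.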

SLIDE 17

From No Swap Regret to Correlated Equilibrium

Proof:
➢ Derive player $i$'s expected utility from $\pi_T$:
$$\sum_{a\in A} \Big[ \frac{1}{T} \sum_t \Pi_{i\in[n]}\, x_i^t(a_i) \Big] \cdot u_i(a) = \frac{1}{T} \sum_t \sum_{a\in A} \Pi_{i\in[n]}\, x_i^t(a_i) \cdot u_i(a)$$

SLIDE 18

From No Swap Regret to Correlated Equilibrium

Proof (continued):
➢ Player $i$'s expected utility from $\pi_T$:
$$\sum_{a\in A} \Big[ \frac{1}{T} \sum_t \Pi_{i\in[n]}\, x_i^t(a_i) \Big] \cdot u_i(a) = \frac{1}{T} \sum_t \sum_{a\in A} \Pi_{i\in[n]}\, x_i^t(a_i) \cdot u_i(a) = \frac{1}{T} \sum_t u_i(x_i^t, x_{-i}^t)$$

SLIDE 19

From No Swap Regret to Correlated Equilibrium

Proof (continued):
➢ Continuing, player $i$'s expected utility from $\pi_T$ equals
$$\frac{1}{T} \sum_t u_i(x_i^t, x_{-i}^t) = \frac{1}{T} \sum_{a_i \in A_i} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i)$$
➢ Player $i$'s expected utility conditioned on being recommended $a_i$ is
$$\frac{1}{T} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i) \quad \text{(normalization factor omitted)}$$

SLIDE 20

From No Swap Regret to Correlated Equilibrium

Proof (continued):
➢ The CE condition requires, for all players $i$ and all $a_i \in A_i$:
$$\frac{1}{T} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i) \ \ge\ \frac{1}{T} \sum_{t=1}^{T} u_i(s(a_i), x_{-i}^t) \cdot x_i^t(a_i), \quad \forall s(a_i) \in A_i$$

SLIDE 21

From No Swap Regret to Correlated Equilibrium

Proof (continued):
➢ Let $s^*$ be the optimal swap function in player $i$'s swap regret:
$$\mathrm{swR}_T^i = \max_s \sum_{t=1}^{T} \sum_{a_i \in A_i} [u_i(s(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i)$$
$$= \sum_{a_i} \sum_{t=1}^{T} [u_i(s^*(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i)$$
$$\ge \sum_{t=1}^{T} [u_i(s^*(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i), \quad \forall a_i$$
(the last step drops the other terms of the sum over $a_i$, each of which is nonnegative since $s^*$ could swap that action to itself)

SLIDE 22

From No Swap Regret to Correlated Equilibrium

Proof (continued):
➢ Recall the CE condition from Slide 20 and the bound just derived for the optimal swap function $s^*$:
$$\mathrm{swR}_T^i \ \ge\ \sum_{t=1}^{T} [u_i(s^*(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i), \quad \forall a_i$$

SLIDE 23

From No Swap Regret to Correlated Equilibrium

Proof (continued):
➢ From Fact 2 before, the optimal swap function $s^*$ satisfies
$$s^*(a_i) = \arg\max_{s(a_i) \in A_i} \sum_{t=1}^{T} [u_i(s(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i)$$
➢ This implies
$$\mathrm{swR}_T^i \ \ge\ \sum_{t=1}^{T} [u_i(s(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i), \quad \forall a_i \text{ and } \forall s(a_i)$$
➢ The theorem follows by dividing both sides by $T$ (and letting $T \to \infty$): since $\mathrm{swR}_T^i$ is sublinear in $T$, the right-hand side vanishes, which is exactly the CE condition. ∎

SLIDE 24

Outline

➢ (External) Regret vs Swap Regret
➢ Convergence to Correlated Equilibrium
➢ Converting Regret Bounds to Swap Regret Bounds

SLIDE 25

Good External Regret ≠ Good Swap Regret

➢ An algorithm with small swap regret also has small external regret
➢ The reverse is not true: an algorithm with small external regret does not necessarily have small swap regret
  • Examples are not difficult to construct

Do there exist online learning algorithms with sublinear swap regret?

SLIDE 26

Theorem. Any online algorithm $A$ with external regret $R$ can be converted to another online algorithm $H$ with swap regret $nR$. ($n$ = number of actions)

➢ $H$ utilizes $A$ but is different and more complicated
➢ Hence there exists a no-swap-regret online learning algorithm
  • Since there exists an online algorithm with $O(\sqrt{T \ln n})$ external regret, the conversion gives swap regret $O(n \sqrt{T \ln n}) = o(T)$

SLIDE 27

Proof Overview:
➢ The idea starts from the following observation. Let $s^*$ be the optimal swap function; then
$$\mathrm{swR}_T = \max_s \sum_{t\in[T]} \sum_{i\in[n]} [c^t(i) - c^t(s(i))]\, p^t(i) = \sum_{i\in[n]} \sum_{t\in[T]} [c^t(i) - c^t(s^*(i))]\, p^t(i)$$

SLIDE 28

Proof Overview:
➢ Two observations about the per-action term $\sum_{t\in[T]} [c^t(i) - c^t(s^*(i))]\, p^t(i)$ (the regret from action $i$'s swap):

1. The term "looks like" an external regret term
  • It swaps to a single action, but $\sum_{t\in[T]} c^t(i)\, p^t(i)$ does not look quite right yet
2. If this term is at most $R$ for every $i$, then we are done

SLIDE 29

Proof Step 1: constructing $H$

➢ Make $n$ copies of algorithm $A$, denoted $A_1, \cdots, A_n$
  • Intuitively, $A_i$ takes care of the regret from action $i$'s swap
➢ Construction of $H$:
  • At round $t$, $H$ picks action $i$ with probability $p^t(i)$ (to be designed)
  • Let $q_i^t \in \Delta_n$ be the randomized action of $A_i$ generated at round $t$
  • Choose $p^t(i) \in [0,1]$ to satisfy the following:
$$\sum_i p^t(i) = 1 \ \ (p^t \text{ is a distribution}); \qquad \sum_i p^t(i)\, q_i^t(j) = p^t(j), \ \forall j \in [n] \ \ (p^t \text{ is stationary})$$
➢ That is, the following two ways for $H$ to select actions are equivalent:
  • 1. Select action $i$ with probability $p^t(i)$
  • 2. Select algorithm $A_i$ with probability $p^t(i)$, then use $A_i$ to pick an action
SLIDE 30

Proof Step 1: constructing $H$ (continued)

➢ After observing cost vector $c^t$, allocate $p^t(i) \cdot c^t$ as the "simulated cost" to algorithm $A_i$ for its future use (a sketch of computing the stationary $p^t$ follows below)
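The two constraints say exactly that $p^t$ is a stationary distribution of the row-stochastic matrix $Q^t$ whose $i$-th row is $q_i^t$; such a $p^t$ always exists (a Markov-chain fixed point). A sketch that finds it by power iteration, assuming the rows have full support, as Hedge-style copies do; for a periodic chain one would average the iterates or solve the eigensystem `np.linalg.eig(Q.T)` instead:

```python
import numpy as np

def stationary_distribution(Q, iters=10_000, tol=1e-12):
    """Find p with p = p @ Q for a row-stochastic matrix Q (row i is q_i^t)."""
    n = Q.shape[0]
    p = np.full(n, 1.0 / n)       # start from the uniform distribution
    for _ in range(iters):
        nxt = p @ Q               # one chain step: nxt(j) = sum_i p(i) Q[i, j]
        if np.abs(nxt - p).sum() < tol:
            break
        p = nxt
    return p / p.sum()            # renormalize against floating-point drift
```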

SLIDE 31

Proof Step 2: deriving the regret bound

➢ $A_i$ has external regret $R$, so
$$\sum_{t\in[T]} \sum_j q_i^t(j)\, [p^t(i)\, c^t(j) - p^t(i)\, c^t(j')] \le R, \quad \forall j' \in [n] \qquad (1)$$
➢ Swap regret of $H$:
$$\mathrm{swR}_T = \max_s \sum_{t\in[T]} \sum_{j\in[n]} p^t(j)\, [c^t(j) - c^t(s(j))]$$
➢ We need to somehow relate $\mathrm{swR}_T$ to the $q_i^t$'s, because Inequality (1) is the only bound we have
➢ By our construction: $\sum_i p^t(i)\, q_i^t(j) = p^t(j), \ \forall j \in [n]$

SLIDE 32

Proof Step 2: deriving the regret bound (continued)

➢ Using stationarity $\sum_i p^t(i)\, q_i^t(j) = p^t(j)$ on the first term, and $\sum_j q_i^t(j) = 1$ on the second:
$$\mathrm{swR}_T = \max_s \sum_{t\in[T]} \sum_{j\in[n]} p^t(j)\, [c^t(j) - c^t(s(j))] = \max_s \sum_{t\in[T]} \sum_{j\in[n]} \sum_i p^t(i)\, q_i^t(j)\, [c^t(j) - c^t(s(i))]$$

SLIDE 33

Proof Step 2: deriving the regret bound (continued)

➢ Regrouping the sum by copy $i$:
$$\mathrm{swR}_T = \max_s \sum_i \Big( \sum_{t\in[T]} \sum_{j\in[n]} p^t(i)\, q_i^t(j)\, [c^t(j) - c^t(s(i))] \Big)$$

SLIDE 34

Proof Step 2: deriving the regret bound (continued)

➢ Applying Inequality (1) with $j' = s(i)$ to the $i$-th inner sum, each of the $n$ copies contributes at most $R$:
$$\mathrm{swR}_T = \max_s \sum_i \Big( \sum_{t\in[T]} \sum_{j\in[n]} p^t(i)\, q_i^t(j)\, [c^t(j) - c^t(s(i))] \Big) \le n \cdot R \qquad \blacksquare$$
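Putting Steps 1 and 2 together, here is a compact sketch of the whole reduction (this is the Blum-Mansour construction that the theorem describes), using Hedge copies as the base algorithm $A$ and the `stationary_distribution` helper sketched at Slide 30; all names are illustrative.

```python
import numpy as np

def no_swap_regret_play(T, n, cost_at, eta=0.1):
    """Blum-Mansour-style reduction: n Hedge copies -> swap regret <= n * R.

    cost_at(t) returns the round-t cost vector c^t in [0,1]^n. Copy A_i
    keeps the weight row W[i]; H plays the stationary distribution p^t of
    the row-stochastic matrix Q^t and feeds copy A_i the simulated cost
    p^t(i) * c^t, exactly as in the construction of Step 1.
    """
    W = np.ones((n, n))                        # W[i] = Hedge weights of copy A_i
    P, C = [], []
    for t in range(T):
        Q = W / W.sum(axis=1, keepdims=True)   # row i of Q is q_i^t
        p = stationary_distribution(Q)         # p^t satisfies p^t = p^t Q^t
        c = cost_at(t)
        W *= np.exp(-eta * np.outer(p, c))     # Hedge update of A_i on p^t(i) * c^t
        P.append(p); C.append(c)
    return np.array(P), np.array(C)
```

Evaluating the returned history with the `swap_regret` sketch from Slide 9 should show sublinear growth, consistent with the $n \cdot R$ bound.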

SLIDE 35

Thank You

Haifeng Xu
University of Virginia
hx4ad@virginia.edu