1
Announcements
• HW 1 deadline is postponed to next Tuesday before class, i.e., 3:30 pm
CS6501: Topics in Learning and Game Theory (Fall 2019)
Swap Regret
Instructor: Haifeng Xu
3
Outline
• (External) Regret vs Swap Regret
• Convergence to Correlated Equilibrium
• Converting Regret Bounds to Swap Regret Bounds
4
At each time step $t = 1, \cdots, T$, the following occurs in order:
1. Learner picks a distribution $p_t$ over actions $[n]$
2. Adversary picks cost vector $c_t \in [0,1]^n$
3. Action $i_t \sim p_t$ is chosen and the learner incurs cost $c_t(i_t)$
4. Learner observes $c_t$ (for use in future time steps)
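To make the protocol concrete, here is a minimal runnable Python sketch of one run; the uniform learner, the random adversary, and all constants are illustrative placeholders, not part of the setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 1000                       # n actions, T rounds (illustrative values)

total_cost = 0.0
for t in range(T):
    p_t = np.full(n, 1.0 / n)        # 1. learner picks a distribution over [n]
                                     #    (uniform here; a real learner adapts)
    c_t = rng.random(n)              # 2. adversary picks a cost vector in [0,1]^n
    i_t = rng.choice(n, p=p_t)       # 3. action i_t ~ p_t is drawn; learner
    total_cost += c_t[i_t]           #    incurs cost c_t(i_t)
    # 4. learner observes the full vector c_t for use in future rounds

print(f"realized cost over {T} rounds: {total_cost:.1f}")
```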
5
• External regret:
$$R_T = \mathbb{E}_{i_t \sim p_t}\Big[ \sum_{t\in[T]} c_t(i_t) \Big] - \min_{k\in[n]} \sum_{t\in[T]} c_t(k)$$
• Benchmark: $\min_{k\in[n]} \sum_t c_t(k)$ is the learner's total cost had he known $c_1, \cdots, c_T$ and been allowed to take the best single action across all rounds
• Describes how much the learner regrets, had he known the cost vectors $c_1, \cdots, c_T$ in hindsight
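The definition translates directly into code. A small sketch, assuming the run is recorded in hypothetical arrays P (row t is $p_t$) and C (row t is $c_t$), and using the expected cost $\sum_t \langle p_t, c_t \rangle$ in place of the sampled cost:

```python
import numpy as np

def external_regret(P: np.ndarray, C: np.ndarray) -> float:
    """R_T = sum_t <p_t, c_t> - min_k sum_t c_t(k).

    P: (T, n) array, row t is the learner's distribution p_t.
    C: (T, n) array, row t is the adversary's cost vector c_t.
    """
    learner_cost = float(np.sum(P * C))      # sum_t sum_i c_t(i) p_t(i)
    best_fixed = float(C.sum(axis=0).min())  # best single action in hindsight
    return learner_cost - best_fixed
```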
6-7
• A closer look at external regret:
$$R_T = \mathbb{E}_{i_t \sim p_t}\Big[ \sum_{t\in[T]} c_t(i_t) \Big] - \min_{k\in[n]} \sum_{t\in[T]} c_t(k)$$
$$= \sum_{t\in[T]} \sum_{i\in[n]} c_t(i)\, p_t(i) - \min_{k\in[n]} \sum_{t\in[T]} c_t(k)$$
$$= \max_{k\in[n]} \Big[ \sum_{t\in[T]} \sum_{i\in[n]} c_t(i)\, p_t(i) - \sum_{t\in[T]} c_t(k) \Big]$$
$$= \max_{k\in[n]} \sum_{t\in[T]} \sum_{i\in[n]} [c_t(i) - c_t(k)]\, p_t(i)$$
• This is a many-to-one action swap: in external regret, the learner is allowed to swap every action to a single action $k$, and can choose the best $k$ in hindsight
8
• A closer look at external regret:
$$R_T = \max_{k\in[n]} \sum_{t\in[T]} \sum_{i\in[n]} [c_t(i) - c_t(k)]\, p_t(i)$$
• Swap regret allows many-to-many action swaps. Formally,
$$\mathrm{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{i\in[n]} [c_t(i) - c_t(s(i))]\, p_t(i)$$
where the max is over all possible swap functions $s: [n] \to [n]$, and $c_t(s(i))$ is the cost of the action that $i$ swaps to
• There are $n^n$ swap functions: each action $i$ has $n$ choices to swap to
• Quiz: how many many-to-one swaps?
9
Recall swap regret:
$$\mathrm{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{i\in[n]} [c_t(i) - c_t(s(i))]\, p_t(i)$$

Fact 1. For any algorithm: $\mathrm{swR}_T \ge R_T$.

Fact 2. For any algorithm execution $p_1, \cdots, p_T$, the optimal swap function $s^*$ satisfies, for any $i$,
$$s^*(i) = \arg\max_{k\in[n]} \sum_{t\in[T]} [c_t(i) - c_t(k)]\, p_t(i)$$

Proof:
• Fact 1 holds because restricting the max to constant swap functions ($s(i) = k$ for all $i$) recovers exactly the external regret
• Fact 2 holds because $s(i)$ only affects the term $\sum_{t\in[T]} [c_t(i) - c_t(s(i))]\, p_t(i)$, so it should be picked to maximize this term
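Fact 2 is also computationally useful: it lets one evaluate swap regret without enumerating all $n^n$ swap functions, by picking the best swap target of each action independently. A sketch under the same hypothetical P/C conventions as before:

```python
import numpy as np

def swap_regret(P: np.ndarray, C: np.ndarray) -> float:
    """swR_T via Fact 2: choose the best swap target s*(i) per action i.

    P, C: (T, n) arrays of distributions p_t and cost vectors c_t.
    G[i, k] = sum_t [c_t(i) - c_t(k)] * p_t(i) is the gain of swapping i -> k.
    """
    A = np.sum(C * P, axis=0)        # A[i] = sum_t c_t(i) p_t(i)
    B = P.T @ C                      # B[i, k] = sum_t p_t(i) c_t(k)
    G = A[:, None] - B               # pairwise swap gains; G[i, i] = 0
    # Fact 2: s*(i) = argmax_k G[i, k]; each row max is >= 0
    # Fact 1 check: external regret = max_k G[:, k].sum() <= G.max(axis=1).sum()
    return float(G.max(axis=1).sum())
```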
10-12
Remarks (on Facts 1 and 2 above):
• The optimal swap can be decided "independently" for each action $i$
• The benchmark of swap regret depends on the algorithm execution $p_1, \cdots, p_T$, but the benchmark of external regret does not
• This raises a subtle issue: an algorithm minimizing swap regret does not necessarily minimize the total loss, e.g., when its execution leaves few opportunities to swap
• The quantity $\max_{i\in[n]} \max_{k\in[n]} \sum_{t\in[T]} [c_t(i) - c_t(k)]\, p_t(i)$, which picks the single worst action $i$, is also called the internal regret
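The internal regret of the last remark is the largest single entry of the same pairwise-gain matrix; a one-line variant of the previous sketch:

```python
import numpy as np

def internal_regret(P: np.ndarray, C: np.ndarray) -> float:
    """Internal regret: the single worst pair (i, k), i.e., the largest swap gain."""
    G = np.sum(C * P, axis=0)[:, None] - P.T @ C   # same G as in swap_regret
    return float(G.max())
```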
13
Outline
• (External) Regret vs Swap Regret
• Convergence to Correlated Equilibrium
• Converting Regret Bounds to Swap Regret Bounds
14
• $n$ players, denoted by set $[n] = \{1, \cdots, n\}$
• Player $i$ takes action $a_i \in A_i$
• Player utility depends on the outcome of the game, i.e., an action profile $a = (a_1, \cdots, a_n) \in A = \prod_{i\in[n]} A_i$
• Correlated equilibrium (CE) is an action recommendation policy

A recommendation policy $\pi$ is a correlated equilibrium if
$$\sum_{a_{-i}} u_i(a_i, a_{-i}) \cdot \pi(a_i, a_{-i}) \;\ge\; \sum_{a_{-i}} u_i(a_i', a_{-i}) \cdot \pi(a_i, a_{-i}), \quad \forall\, a_i' \in A_i,\ \forall\, i \in [n].$$

• That is, for any recommended action $a_i$, player $i$ does not want to "swap" to another action $a_i'$
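Since the CE condition is a finite list of linear inequalities in $\pi$, it can be checked mechanically. A sketch for two players; the function name and the game-of-Chicken example are illustrative, not from the slides:

```python
import numpy as np

def is_correlated_eq(pi: np.ndarray, U: list[np.ndarray], tol: float = 1e-9) -> bool:
    """Check the CE condition for a 2-player game.

    pi:   (m1, m2) joint distribution over action profiles (a_1, a_2).
    U[i]: (m1, m2) utility array of player i, indexed by (a_1, a_2).
    """
    m1, m2 = pi.shape
    # Player 1: for each recommended a1 and deviation a1p,
    # sum_{a2} u1(a1, a2) pi(a1, a2) >= sum_{a2} u1(a1p, a2) pi(a1, a2)
    for a1 in range(m1):
        for a1p in range(m1):
            if np.dot(U[0][a1p] - U[0][a1], pi[a1]) > tol:
                return False
    # Player 2: symmetric, conditioning on the recommended column a2
    for a2 in range(m2):
        for a2p in range(m2):
            if np.dot(U[1][:, a2p] - U[1][:, a2], pi[:, a2]) > tol:
                return False
    return True

# Example: in Chicken with payoffs (D,D)=(0,0), (D,C)=(7,2), (C,D)=(2,7),
# (C,C)=(6,6), the uniform distribution over (D,C), (C,D), (C,C) is a classic CE.
U1 = np.array([[0, 7], [2, 6]])       # row player's utility, actions 0=Dare, 1=Chicken
U2 = U1.T                             # symmetric game
pi = np.array([[0, 1 / 3], [1 / 3, 1 / 3]])
print(is_correlated_eq(pi, [U1, U2]))  # True
```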
15
Repeated Games with No-Swap-Regret Players
ØThe game is played repeatedly for 𝑈 rounds ØEach player uses an online learning algorithm to select a mixed
strategy at each round 𝑢
ØFor any player 𝑗’s perspective, the following occurs in order at 𝑢
( ∈ Δ|`X| over actions in 𝐵>
( ∈ Δ|`b|
(, 𝑦Y> (
= 𝔽V∼(cX
?,cWX ? ) 𝑣>(𝑏)
( (for future use)
16
Theorem. Suppose each player $i$ uses a no-swap-regret algorithm, generating mixed strategy sequence $\{x_i^t\}_{t\in[T]}$ for $i$. The following recommendation policy $\pi_T$ converges to a CE:
$$\pi_T(a) = \frac{1}{T} \sum_t \prod_{i\in[n]} x_i^t(a_i), \quad \forall\, a \in A.$$

Remarks:
• In the mixed strategy profile $(x_1^t, x_2^t, \cdots, x_n^t)$, the probability of action profile $a$ is $\prod_{i\in[n]} x_i^t(a_i)$
• $\pi_T(a)$ is simply the average of $\prod_{i\in[n]} x_i^t(a_i)$ over the $T$ rounds
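Computing $\pi_T$ from the played strategy sequences is a direct transcription of the theorem's formula; a sketch for two players, with hypothetical arrays X1 and X2 holding the per-round mixed strategies:

```python
import numpy as np

def recommendation_policy(X1: np.ndarray, X2: np.ndarray) -> np.ndarray:
    """pi_T(a) = (1/T) sum_t prod_i x_i^t(a_i), specialized to two players.

    X1: (T, m1) array, row t is player 1's mixed strategy x_1^t.
    X2: (T, m2) array, row t is player 2's mixed strategy x_2^t.
    Returns the (m1, m2) joint distribution pi_T over action profiles.
    """
    # the outer product of x_1^t and x_2^t is round t's profile distribution;
    # einsum sums these outer products over t, and dividing by T averages them
    return np.einsum('ti,tj->ij', X1, X2) / X1.shape[0]
```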
17-19
Proof:
• Derive player $i$'s expected utility from $\pi_T$:
$$\sum_{a\in A} \Big[ \frac{1}{T} \sum_t \prod_{j\in[n]} x_j^t(a_j) \Big] \cdot u_i(a) = \frac{1}{T} \sum_t \sum_{a\in A} \prod_{j\in[n]} x_j^t(a_j) \cdot u_i(a)$$
$$= \frac{1}{T} \sum_t u_i(x_i^t, x_{-i}^t)$$
$$= \frac{1}{T} \sum_{a_i \in A_i} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i)$$
• Player $i$'s expected utility conditioned on being recommended $a_i$ is
$$\frac{1}{T} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i) \qquad \text{(normalization factor omitted)}$$
20-23
Proof (continued):
• The CE condition requires, for all players $i$ and all $a_i \in A_i$,
$$\frac{1}{T} \sum_{t=1}^{T} u_i(a_i, x_{-i}^t) \cdot x_i^t(a_i) \;\ge\; \frac{1}{T} \sum_{t=1}^{T} u_i(s(a_i), x_{-i}^t) \cdot x_i^t(a_i), \quad \forall\, s(a_i) \in A_i$$
• Let $s^*$ be the optimal swap function in player $i$'s swap regret (written here in utility form):
$$\mathrm{swR}_T^i = \max_{s} \sum_{t=1}^{T} \sum_{a_i \in A_i} [u_i(s(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i) = \sum_{a_i} \sum_{t=1}^{T} [u_i(s^*(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i)$$
• From Fact 2 before, the optimal swap function $s^*$ satisfies
$$s^*(a_i) = \arg\max_{s(a_i) \in A_i} \sum_{t=1}^{T} [u_i(s(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i)$$
so every term of the outer sum over $a_i$ is nonnegative (taking $s(a_i) = a_i$ gives $0$), and dropping all but one term is valid
• This implies
$$\mathrm{swR}_T^i \;\ge\; \sum_{t=1}^{T} [u_i(s(a_i), x_{-i}^t) - u_i(a_i, x_{-i}^t)] \cdot x_i^t(a_i), \quad \forall\, a_i \text{ and } s(a_i)$$
• The theorem follows by dividing both sides by $T$: since $\mathrm{swR}_T^i$ is sublinear, the CE condition holds up to error $\mathrm{swR}_T^i / T \to 0$ as $T \to \infty$
24
Outline
• (External) Regret vs Swap Regret
• Convergence to Correlated Equilibrium
• Converting Regret Bounds to Swap Regret Bounds
25
Good External Regret ≠ Good Swap Regret
ØAn algorithm with small swap regret also has small external regret ØThe reverse is not true – an algorithm with small external regret
does not necessarily have small swap regret
Do there exist online learning algorithms with sublinear regret?
26
Theorem. Any online learning algorithm $A$ with external regret $R$ can be converted to another online algorithm $H$ with swap regret $nR$, where $n$ = number of actions.

• $H$ utilizes $A$ but is different and more complicated
• In particular, there exist no-swap-regret online learning algorithms
27-28
Proof Overview:
• The idea starts from the following observation. Let $s^*$ be the optimal swap function; then
$$\mathrm{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{i\in[n]} [c_t(i) - c_t(s(i))]\, p_t(i) = \sum_{i\in[n]} \underbrace{\sum_{t\in[T]} [c_t(i) - c_t(s^*(i))]\, p_t(i)}_{\text{regret from action } i\text{'s swap}}$$
Two observations:
1. The inner term, the regret from action $i$'s swap, "looks like" an external regret term
2. If the inner term is at most $R$ for every $i$, then we are done
29-30
Proof Step 1: constructing $H$
• Make $n$ copies of algorithm $A$ as $A_1, \cdots, A_n$
• Construction of $H$ at each round $t$:
1. Let $q_t^i \in \Delta_n$ be the randomized action of copy $A_i$ generated at round $t$
2. $H$ picks the distribution $p_t$ satisfying
$$\sum_i p_t(i) = 1 \ \ (p_t \text{ is a distribution}), \qquad \sum_i p_t(i)\, q_t^i(k) = p_t(k), \ \forall\, k \in [n] \ \ (p_t \text{ is stationary})$$
3. After observing $c_t$, $H$ feeds the "scaled cost" $p_t(i) \cdot c_t$ to copy $A_i$ for its future use
• That is, the following two ways for $H$ to select actions are equivalent: draw the action directly from $p_t$, or first draw a copy $i \sim p_t$ and then draw the action from $q_t^i$
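A runnable sketch of the whole construction, assuming multiplicative weights as the base algorithm $A$ (any no-external-regret algorithm would do) and solving the stationarity condition $p_t = p_t Q_t$ as a left-eigenvector problem; all names here are illustrative:

```python
import numpy as np

class MW:
    """Multiplicative weights: a standard no-external-regret base learner."""
    def __init__(self, n: int, eta: float):
        self.w = np.ones(n)
        self.eta = eta

    def distribution(self) -> np.ndarray:
        return self.w / self.w.sum()

    def update(self, cost: np.ndarray) -> None:
        self.w *= np.exp(-self.eta * cost)  # costs assumed to lie in [0, 1]

def stationary(Q: np.ndarray) -> np.ndarray:
    """Solve p = p Q for row-stochastic Q: left eigenvector for eigenvalue 1."""
    vals, vecs = np.linalg.eig(Q.T)
    p = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
    return p / p.sum()

def master_round(copies: list, c_t: np.ndarray) -> np.ndarray:
    """One round of the master H built from the n copies A_1, ..., A_n."""
    Q = np.vstack([A_i.distribution() for A_i in copies])  # row i is q_t^i
    p_t = stationary(Q)            # p_t(k) = sum_i p_t(i) q_t^i(k), for all k
    for i, A_i in enumerate(copies):
        A_i.update(p_t[i] * c_t)   # feed the scaled cost p_t(i) * c_t to copy A_i
    return p_t

# Usage: n actions, T rounds of random costs; swap_regret(P, C) from the
# earlier sketch should then grow sublinearly in T (the theorem gives n * R)
rng = np.random.default_rng(1)
n, T = 3, 500
copies = [MW(n, eta=np.sqrt(np.log(n) / T)) for _ in range(n)]
C = rng.random((T, n))
P = np.vstack([master_round(copies, C[t]) for t in range(T)])
```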
31-34
Proof Step 2: deriving the regret bound
• Each copy $A_i$ has external regret at most $R$ with respect to the scaled costs $p_t(i) \cdot c_t$ it observes, so
$$\sum_{t\in[T]} \sum_{k} q_t^i(k) \cdot [\, p_t(i)\, c_t(k) - p_t(i)\, c_t(k') \,] \;\le\; R, \quad \forall\, k' \in [n] \qquad (1)$$
• Swap regret of $H$:
$$\mathrm{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{k\in[n]} p_t(k) \cdot [c_t(k) - c_t(s(k))]$$
• We need to somehow relate $\mathrm{swR}_T$ to the $q_t^i$'s, because Inequality (1) is the only bound we have. By our construction, $p_t(k) = \sum_i p_t(i)\, q_t^i(k)$ for all $k \in [n]$; substituting this into the first term, relabeling the benchmark index $k$ as $i$ in the second term, and re-inserting $\sum_k q_t^i(k) = 1$ gives
$$\mathrm{swR}_T = \max_{s} \sum_{t\in[T]} \sum_{k\in[n]} \sum_{i} p_t(i)\, q_t^i(k) \cdot [c_t(k) - c_t(s(i))] = \max_{s} \sum_{i} \Big( \sum_{t\in[T]} \sum_{k\in[n]} p_t(i)\, q_t^i(k) \cdot [c_t(k) - c_t(s(i))] \Big)$$
• Applying Inequality (1) with $k' = s(i)$ bounds each copy's inner sum by $R$, so summing over the $n$ copies yields
$$\mathrm{swR}_T \;\le\; n \cdot R$$
Haifeng Xu
University of Virginia
hx4ad@virginia.edu