Online linear optimization and adaptive routing – Baruch Awerbuch, Robert Kleinberg (PowerPoint presentation transcript)


SLIDE 1

Online linear optimization and adaptive routing

Baruch Awerbuch, Robert Kleinberg

SLIDE 2

Motivation

  • Overlay network routing – Send a packet from source to target using the route with minimum delay
  • The total route delay is revealed
  • Graph example

[Figure: example graph with edge delays on the s–r routes]

SLIDE 3

Using previous algorithms

  • We can use EXP3. Each route is an arm. Since we have n! routes, our regret will be

$O(\sqrt{K\,G_{\max}\ln K}) \;\to\; O(\sqrt{n!\,G_{\max}\ln n!})$

  • We have seen online shortest paths with full information:

$E[\mathrm{cost}] \le (1+\epsilon)\,\mathrm{mincost}_T + O(mn\log n/\epsilon)$
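To see why treating every route as a separate arm is hopeless, a quick back-of-the-envelope comparison (an illustrative sketch; `exp3_bound` is a hypothetical helper, not from the talk):

```python
import math

def exp3_bound(num_arms, horizon):
    """EXP3-style regret bound O(sqrt(K * G_max * ln K)), with G_max <= T."""
    return math.sqrt(num_arms * horizon * math.log(num_arms))

# Treating each route as an arm: K = n! explodes even for a tiny graph.
n, T = 10, 10_000
routes = math.factorial(n)      # number of candidate routes as arms
naive = exp3_bound(routes, T)   # bound with K = n! arms
single = exp3_bound(n, T)       # for scale: a bound with only n arms
```

Even at n = 10 the arm count is in the millions, which is what motivates exploiting the graph structure instead.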

SLIDE 4

Problem definition

  • G = (V, E) – Directed graph
  • For each j = 1, …, T the adaptive adversary selects a cost $c_j: E \to [0,1]$ for each edge
  • The algorithm selects a path of length ≤ H
  • It receives only the cost of the entire path
  • Goal: minimize the difference between the algorithm's expected total cost and the cost of the best single path from source to target

SLIDE 5

Regret

$O\big(H^2 (mH\log\Delta\,\log(mHT))^{1/3}\,T^{2/3}\big)$

SLIDE 6

Pre-processing

  • We will transform the graph G into a leveled directed acyclic graph $\tilde G=(\tilde V,\tilde E)$
  • Start by calculating G × {0, 1, …, H}:

– Vertex set V × {0, 1, …, H}
– An edge $e_i$ from (u, i−1) to (v, i) for every e = (u, v) in E

  • The graph $\tilde G$ is obtained by:

– Deleting paths that do not reach r
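The leveling step can be sketched as follows (an illustrative implementation assuming the graph is given as an edge list with target r; not the authors' code):

```python
def leveled_dag(edges, H, r):
    """Lift G to G x {0,...,H}: an edge ((u, i-1), (v, i)) for every
    (u, v) in E, then keep only the part from which a copy of r is reachable."""
    lifted = [((u, i - 1), (v, i)) for (u, v) in edges for i in range(1, H + 1)]
    good = {(r, i) for i in range(H + 1)}  # every level's copy of the target
    changed = True
    while changed:                         # backward reachability toward r
        changed = False
        for a, b in lifted:
            if b in good and a not in good:
                good.add(a)
                changed = True
    # keep only edges both of whose endpoints can still reach r
    return [(a, b) for a, b in lifted if a in good and b in good]
```

The pruning loop is a plain fixed-point computation; any backward BFS from the copies of r would do the same job.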

SLIDE 7

Main idea

  • We can traverse the graph by querying BEX for probabilities on the outgoing edges until we reach r
  • To do so we need to feed BEX with information on all experts
  • We will run in phases; at each phase we will estimate the cost for all experts, and at the end of each phase we will update BEX
  • We will feed BEX with the total path cost
SLIDE 8

Sampling experts

  • We can sample the experts according to the distribution BEX returns (based on the costs of previous phases)
  • The problem – we might ignore some edges that could be better in later phases
  • We will add some exploration steps in each phase

SLIDE 9

Exploration

  • Will occur with probability δ
  • Choose an edge e=(u,v) uniformly at random
  • Construct a path by joining prefix(u), e and suffix(v)
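The exploration/exploitation choice can be sketched like this (a minimal illustration; `prefix` and `suffix` are hypothetical sampler callables standing in for the procedures on the following slides):

```python
import random

def choose_path(delta, s, edges, prefix, suffix, rng=random):
    """With probability delta, explore: pick an edge (u, v) uniformly at
    random and splice prefix(u) + [(u, v)] + suffix(v).
    Otherwise exploit by sampling a whole s-to-r path from suffix(s)."""
    if rng.random() < delta:
        u, v = rng.choice(edges)
        return prefix(u) + [(u, v)] + suffix(v)
    return suffix(s)
```

Passing the samplers in keeps the sketch self-contained; in the algorithm they are the prefix/suffix distributions defined next.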

SLIDE 10

Suffix

  • Suffix(v) will return the distribution on v – r paths
  • Implementation – Choose an edge by BEX probabilities, traverse the edge, and repeat until r is reached
  • Why can't it be random?

[Figure: example graph whose edge costs show why choosing edges uniformly at random fails]
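The suffix implementation above can be sketched as a short random walk (an illustration, assuming BEX's per-vertex distributions are given as lists of (next vertex, probability) pairs):

```python
import random

def sample_suffix(v, r, bex_prob, rng=random):
    """Sample a v-to-r path edge by edge: at each vertex, draw the outgoing
    edge from the distribution BEX maintains for that vertex."""
    path, u = [], v
    while u != r:
        nxts, weights = zip(*bex_prob[u])
        nxt = rng.choices(nxts, weights=weights)[0]
        path.append((u, nxt))
        u = nxt
    return path
```

Drawing from BEX's learned distribution (rather than uniformly) is what lets the walk avoid edges it has learned are expensive, which is the point of the slide's "why can't it be random?" example.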

SLIDE 11

Prefix

  • Prefix(v) will return the distribution on s – v paths
  • Let suffix(u | v) be the distribution on u – v paths obtained by sampling from suffix(u) conditioned on the event that the path passes through v

SLIDE 12

Prefix

  • With probability $(1-\delta)\Pr(v\in\mathrm{suffix}(s))/P_\phi(v)$, sample from suffix(s | v)
  • For every e = (q, u) from $\tilde E$, with probability $(\delta/\tilde m)\Pr(v\in\mathrm{suffix}(u))/P_\phi(v)$, sample from suffix(u | v), prepend e, and then prepend a sample from prefix(q)
  • Here $P_\phi(v)$ is the probability that v is contained in the suffix of a path in phase $\phi$

SLIDE 13

Updating costs

  • Phase length $\tau=\lceil 2mH\log(mHT)/\delta\rceil$
  • At each phase we will sum the costs for an edge only if the edge wasn't part of the path chosen by prefix
  • The reason is that we cannot control the probability distribution those prefix edges came from

SLIDE 14

Updating costs

  • At the end of each phase $\phi=1,\dots,\lceil T/\tau\rceil$ (rounds $j=\tau(\phi-1)+1,\dots,\tau\phi$):

$\forall e\in\tilde E:\quad \mu_\phi(e)\leftarrow E\Big[\sum_{j\in\tau_\phi}\chi_j(e)\Big], \qquad \tilde c_\phi(e)\leftarrow\Big(\sum_{j\in\tau_\phi}\chi_j(e)\,c_j(\pi_j)\Big)\Big/\mu_\phi(e)$
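The end-of-phase estimate can be written out concretely (an illustrative sketch; the indicators χ and the normalizers μ are represented as plain dicts):

```python
def phase_estimates(chi, path_costs, mu):
    """Compute c~_phi(e) = (sum over rounds j of chi_j(e) * c_j(pi_j)) / mu_phi(e).
    chi[j] maps edges to {0, 1}, path_costs[j] is the observed total path cost
    c_j(pi_j), and mu[e] is the expected number of times e is counted."""
    est = {}
    for e, m in mu.items():
        total = sum(chi_j.get(e, 0) * c for chi_j, c in zip(chi, path_costs))
        est[e] = total / m
    return est
```

Dividing the accumulated whole-path costs by μ is what makes the per-edge estimate (approximately) unbiased, which is the quantity fed back to BEX.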

SLIDE 15

Algorithm analysis

  • Let

$C^-(v)=\sum_{j=1}^{T}E[c_j(\mathrm{prefix}(v))]$

$C^+(v)=\sum_{j=1}^{T}E[c_j(\mathrm{suffix}(v))]$

$OPT(v)=\min_{\text{paths }\pi:\,v\to r}\;\sum_{j=1}^{T}c_j(\pi)$

SLIDE 16

Algorithm analysis

  • We know that for BEX

$\sum_{j=1}^{t}\sum_{i=1}^{K}p_j(i)\,c_j(i)\le\sum_{j=1}^{t}c_j(k)+O(\epsilon t+\log K/\epsilon)\,M$

  • Let $p_\phi$ be the probability distribution supplied by BEX(v) during phase $\phi$:

$\sum_{\phi=1}^{t}\sum_{e\in\Delta(v)}p_\phi(e)\,\tilde c_\phi(e)\le\sum_{\phi=1}^{t}\tilde c_\phi(e_0)+O(\epsilon Ht+H\log\Delta/\epsilon)$

SLIDE 17

Algorithm analysis

  • We used the fact that the cost of a phase, M, is smaller than 3H with high probability. By a Chernoff bound:

$\tau=2mH\log(mHT)/\delta \;\Rightarrow\; \mu_\phi>\frac{\delta\tau}{mH}=2\log(mHT)$

$\Pr\Big(\sum_{j\in\tau_\phi}\chi_j\ge 3\cdot 2\log(mHT)\Big)\le e^{-(2/3)\cdot 2\log(mHT)}\le\frac{1}{mHT}$

SLIDE 18

Algorithm analysis

  • Now, by applying a union bound over all phases, we get that this low-probability event contributes at most HT / (mHT) < 1, so we will ignore it

SLIDE 19

Algorithm analysis

  • Expanding $\tilde c_\phi$ (Eq. 12):

$\sum_{\phi=1}^{t}\sum_{e\in\Delta(v)}\sum_{j\in\tau_\phi}p_\phi(e)\,\chi_j(e)\,c_j(\pi_j)/\mu_\phi(e)\;\le\;\sum_{\phi=1}^{t}\sum_{j\in\tau_\phi}\chi_j(e_0)\,c_j(\pi_j)/\mu_\phi(e_0)+O(\epsilon Ht+H\log\Delta/\epsilon)$

SLIDE 20

Algorithm analysis

  • Claim 3.2. For every path $\pi: s\to v$,

$\Pr(\pi\subset\pi_j\mid\chi_j(e)=1)=\Pr(\mathrm{prefix}(v)=\pi)$

SLIDE 21

Algorithm analysis

  • Proof of claim 3.2
  • Note that $\chi_j(e)=1 \Rightarrow e\in\pi_j^0 \vee e\in\pi_j^+$, so there are two cases:

$\Pr(\pi\subseteq\pi_j\mid e\in\pi_j^0)=\Pr(\mathrm{prefix}(v)=\pi)$

$\Pr(\pi\subseteq\pi_j\mid e\in\pi_j^+)=\Pr(\mathrm{prefix}(v)=\pi)$

  • The first case holds by definition; let's prove the second one

SLIDE 22

Algorithm analysis

  • e is sampled independently of the path preceding v, so

$\Pr(\pi\subseteq\pi_j\mid e\in\pi_j^+)=\Pr(\pi\subseteq\pi_j\mid v\in\pi_j^+)$

$\Pr(v\in\pi_j^+)\,\Pr(\pi\subseteq\pi_j\mid v\in\pi_j^+)=\Pr(\pi\subseteq\pi_j\;\wedge\;v\in\pi_j^+)$

$=(1-\delta)\Pr(v\in\mathrm{suffix}(s))\,\Pr(\pi=\mathrm{suffix}(s\mid v))\;+\;\sum_{e=(q,u)\in\tilde E}\frac{\delta}{\tilde m}\Pr(v\in\mathrm{suffix}(u))\,\Pr(\pi=\mathrm{prefix}(q)\cup\{e\}\cup\mathrm{suffix}(u\mid v))$

$=\Pr(v\in\pi_j^+)\,\Pr(\pi=\mathrm{prefix}(v))$

SLIDE 23

Algorithm analysis

  • Claim 3.3. If e = (v, w) then

$E[\chi_j(e)\,c_j(\pi_j)]=(\mu(e)/\tau)\,(A_j(v)+B_j(w)+c_j(e))$

where $A_j(v)=E[c_j(\mathrm{prefix}(v))]$ and $B_j(w)=E[c_j(\mathrm{suffix}(w))]$

  • It follows from claim 3.2 that the portion of the path preceding e is distributed as prefix(v)

SLIDE 24

Algorithm analysis

  • Taking the expectation of Eq. 12, the left side will become

$\sum_{\phi=1}^{t}\sum_{e\in\Delta(v)}\sum_{j\in\tau_\phi}p_\phi(e)\,(A_j(v)+B_j(w)+c_j(e))/\tau\;=\;\frac{1}{\tau}\sum_{j=1}^{T}\sum_{e\in\Delta(v)}p_\phi(e)\,(A_j(v)+B_j(w)+c_j(e))$

  • The right side will become

$\frac{1}{\tau}\sum_{j=1}^{T}\big(A_j(v)+B_j(w_0)+c_j(e_0)\big)$

SLIDE 25

Algorithm analysis

  • After removing $A_j(v)$ from both sides, notice that

$\sum_{e\in\Delta(v)}p_\phi(e)\,(B_j(w)+c_j(e))=E[c_j(\mathrm{suffix}(v))]$

  • So the left side will become

$\frac{1}{\tau}\sum_{j=1}^{T}E[c_j(\mathrm{suffix}(v))]=C^+(v)/\tau$

SLIDE 26

Algorithm analysis

  • The right side will become

$\frac{1}{\tau}\sum_{j=1}^{T}c_j(e_0)+C^+(w_0)/\tau+O(\epsilon Ht+H\log\Delta/\epsilon)$

  • Thus we have derived the local performance guarantee (Eq. 13)

$C^+(v)\le C^+(w_0)+\sum_{j=1}^{T}c_j(e_0)+O(\epsilon HT+\tau H\log\Delta/\epsilon)$

SLIDE 27

Global performance guarantee

  • Claim 3.4

$C^+(v)\le OPT(v)+O(\epsilon HT+\tau H\log\Delta/\epsilon)\,h(v)$

  • To prove it we can use the following observation

$OPT(v)=\min_{e_0=(v,w_0)}\Big\{\sum_{j=1}^{T}c_j(e_0)+OPT(w_0)\Big\}$

SLIDE 28

Global performance guarantee

  • Proof – By induction on h(v), using the local performance guarantee
  • Let us denote $F=O(\epsilon HT+\tau H\log\Delta/\epsilon)$
  • Now rewrite Eq. 13 and the claim:

$C^+(v)\le C^+(w_0)+\sum_{j=1}^{T}c_j(e_0)+F$

$C^+(v)\le OPT(v)+F\,h(v)$

SLIDE 29

Global performance guarantee

  • h(v) = 1

It's true by the local performance guarantee: since $C^+(r)=0$ and $OPT(r)=0$, Eq. 13 with $w_0=r$ gives

$C^+(v)\le\sum_{j=1}^{T}c_j(e_0)+F\quad\forall e_0=(v,r)$

$C^+(v)\le OPT(v)+F=\sum_{j=1}^{T}c_j(e_0)+OPT(r)+F\quad\forall e_0=(v,r)$

SLIDE 30

Global performance guarantee

  • h(v) = k + 1

$C^+(v)\le C^+(v_k)+\sum_{j=1}^{T}c_j(e_{k+1})+F\;\le\;\sum_{j=1}^{T}c_j(e_{k+1})+OPT(v_k)+kF+F\;=\;OPT(v_{k+1})+(k+1)F$

SLIDE 31

Regret

  • Theorem 3.5. The algorithm suffers regret

$O\big(H^2(mH\log\Delta\,\log(mHT))^{1/3}\,T^{2/3}\big)$

  • The exploration steps contribute $\delta TH$
  • The exploitation contributes $C^+(s)-OPT(s)$
  • Also $\tau=2mH\log(mHT)/\delta$
  • Substituting into claim 3.4 we get total exploitation cost

$C^+(s)-OPT(s)=O\Big(\epsilon T+\frac{2mH\log\Delta\,\log(mHT)}{\epsilon\delta}\Big)H^2$

SLIDE 32

Regret

  • We can assign

$\epsilon=\delta=\big(2mH\log\Delta\,\log(mHT)\big)^{1/3}\,T^{-1/3}$

and we will get the desired regret:

$\mathrm{Regret}\le O\Big(\delta T+\epsilon T+\frac{2mH\log\Delta\,\log(mHT)}{\epsilon\delta}\Big)H^2=O\big(H^2(mH\log\Delta\,\log(mHT))^{1/3}\,T^{2/3}\big)$
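As a sanity check on this parameter choice, with ε = δ set this way the exploration term δT, the εT term, and the 2mH log Δ log(mHT)/(εδ) term are all equal, which is exactly the balancing that produces the T^(2/3) rate (a numeric sketch with arbitrary parameter values):

```python
import math

def tuned_epsilon(m, H, Delta, T):
    """epsilon = delta = (2 m H log(Delta) log(mHT))^(1/3) * T^(-1/3)."""
    A = 2 * m * H * math.log(Delta) * math.log(m * H * T)
    return A ** (1 / 3) * T ** (-1 / 3), A

m, H, Delta, T = 50, 10, 8, 10**6
eps, A = tuned_epsilon(m, H, Delta, T)
explore = eps * T          # the delta*T term (delta = eps)
exploit = A / (eps * eps)  # the 2mH log(Delta) log(mHT) / (eps*delta) term
# both come out to A^(1/3) * T^(2/3)
```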