SLIDE 1

Linear Bandits: Rich decision sets Sham M. Kakade

Machine Learning for Big Data CSE547/STAT548 University of Washington

  • S. M. Kakade (UW)

Optimization for Big data 1 / 9

SLIDE 2

Bandits in practice: two major issues

The decision space is very large.

  • Drug cocktails
  • Ad design

We often have “side information” when making a decision:

  • history of a user


SLIDE 3

More real motivations...


SLIDE 4

Linear bandits

An additive effects model. Suppose each round we take a decision x ∈ D ⊂ R^d.

  • x encodes a path in a graph
  • x is a feature vector of properties of an ad
  • x indicates which drugs are being taken

Upon taking decision x, we get reward r, with expectation: E[r|x] = µ⊤x

  • Only d unknown parameters (and “effectively” 2^d actions)

We desire an algorithm A (mapping histories to decisions) which has low regret: µ⊤x∗ − E[µ⊤x_t | A] ≤ ?? (where x∗ is the best decision)
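As a concrete sketch of this interaction model (the dimensions, names, and noise scale are illustrative assumptions, not from the slides): draw a hidden µ, fix a finite decision set D, and simulate the reward whose mean is E[r|x] = µ⊤x.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5
mu = rng.normal(size=d)           # unknown parameter vector (hidden from the learner)
D = rng.normal(size=(100, d))     # decision set: 100 feature vectors in R^d

def pull(x):
    """Take decision x; observe a noisy reward with mean mu^T x."""
    return mu @ x + rng.normal(scale=0.1)

# The best decision in hindsight, used to measure regret.
x_star = D[np.argmax(D @ mu)]
r = pull(D[0])                    # one round of interaction
```

Note that the learner only ever sees the pairs (x, r); µ and x∗ are used solely to score regret.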


SLIDE 5

Example: Shortest paths...


SLIDE 6

Algorithm Idea

Again, let’s think of optimism in the face of uncertainty. We have observed rewards r_1, …, r_{t−1} and taken decisions x_1, …, x_{t−1}. Questions:

  • What is an estimate of E[r|x], and what is our uncertainty?
  • What is an estimate of µ, and what is our uncertainty?


SLIDE 7

Regression!

Define:

A_t := Σ_{τ<t} x_τ x_τ⊤ + λI,    b_t := Σ_{τ<t} x_τ r_τ

Our estimate of µ: µ̂_t = A_t^{−1} b_t

Confidence of our estimate: ‖µ − µ̂_t‖²_{A_t} ≤ O(d log t)
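These two formulas are just ridge regression. A minimal sketch (the function name and the sanity check are mine, not from the slides):

```python
import numpy as np

def ridge_estimate(xs, rs, lam=1.0):
    """Return (mu_hat, A) with A = X^T X + lam*I and mu_hat = A^{-1} X^T r."""
    d = xs.shape[1]
    A = xs.T @ xs + lam * np.eye(d)   # A_t = sum_{tau<t} x x^T + lam*I
    b = xs.T @ rs                     # b_t = sum_{tau<t} x r
    return np.linalg.solve(A, b), A

# Sanity check: with noiseless rewards and tiny regularization,
# the estimate recovers mu almost exactly.
rng = np.random.default_rng(1)
mu = rng.normal(size=4)
X = rng.normal(size=(50, 4))
mu_hat, A = ridge_estimate(X, X @ mu, lam=1e-8)
```

In an online implementation one would update A_t and b_t incrementally each round rather than refitting from scratch.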


SLIDE 8

LinUCB

Again, optimism in the face of uncertainty. Define:

B_t := {ν : ‖ν − µ̂_t‖²_{A_t} ≤ O(d log t)}

(LinUCB) take action:

x_t = argmax_{x∈D} max_{ν∈B_t} ν⊤x

then update A_t, b_t, B_t, and µ̂_t. Equivalently, take action:

x_t = argmax_{x∈D} µ̂_t⊤x + √(d log t) · √(x⊤ A_t^{−1} x)
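The closed-form selection rule can be sketched as follows. Here `beta` stands in for the O(d log t) confidence width; treating it as a free parameter (and the function name) is my assumption.

```python
import numpy as np

def linucb_action(D, mu_hat, A, beta):
    """Pick x maximizing mu_hat^T x + sqrt(beta) * ||x||_{A^{-1}}."""
    A_inv = np.linalg.inv(A)
    # Quadratic form x^T A^{-1} x for every row of D at once.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', D, A_inv, D))
    return D[np.argmax(D @ mu_hat + np.sqrt(beta) * bonus)]

rng = np.random.default_rng(2)
D = rng.normal(size=(20, 3))
mu_hat = rng.normal(size=3)
A = np.eye(3)
x_t = linucb_action(D, mu_hat, A, beta=1.0)       # optimistic choice
x_greedy = linucb_action(D, mu_hat, A, beta=0.0)  # beta = 0 recovers pure greedy
```

Setting beta = 0 switches off exploration, which shows how the bonus term is the only thing separating LinUCB from plain greedy selection.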


SLIDE 9

LinUCB: Geometry


SLIDE 10

LinUCB: Confidence intervals


SLIDE 11

LinUCB

Regret bound: Σ_{t≤T} (µ⊤x∗ − E[µ⊤x_t | A]) ≤ Õ(d√T) (this is the best possible, up to log factors). Compare to O(√(KT)) for the K-armed case.

Independent of the number of actions; the K-armed case is a special case.

Thompson sampling: this is a good algorithm in practice.


SLIDE 12

Proof Idea...

Stats: need to show that B_t is a valid confidence region. Geometric lemma: the regret is upper bounded by the log of the ratio (volume of posterior covariance) / (volume of prior covariance). Then bound the worst-case log volume change.
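The volume argument is often packaged as the elliptical potential lemma; a standard form (my assumption about which version the slide intends, stated under the usual condition ‖x_t‖²_{A_t^{−1}} ≤ 1) is:

```latex
% Elliptical potential lemma (sketch): the summed squared exploration
% bonuses are controlled by a log-det (log-volume) ratio.
\sum_{t=1}^{T} \|x_t\|_{A_t^{-1}}^{2}
  \;\le\; 2 \log \frac{\det A_{T+1}}{\det(\lambda I)}
  \;=\; O\!\left(d \log T\right),
\qquad A_t = \lambda I + \sum_{\tau < t} x_\tau x_\tau^{\top}.
```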


SLIDE 13

Dealing with context...


SLIDE 14

Dealing with context...


SLIDE 15

Acknowledgements

http://gdrro.lip6.fr/sites/default/files/JourneeCOSdec2015-Kaufman.pdf
https://sites.google.com/site/banditstutorial/
http://www.yisongyue.com/courses/cs159/lectures/LinUCB.pdf
