SLIDE 1

A Contextual-Bandit Approach to Personalized News Article Recommendation

Lihong Li, Wei Chu, John Langford, Robert E. Schapire. Presenter: Qingyun Wu

SLIDE 2

News Recommendation Cycle

SLIDE 3

A K-armed Bandit Formulation

  • A gambler must decide which of K non-identical slot machines (called arms) to play in a sequence of trials, in order to maximize total reward.

How do we pull arms to maximize reward? The analogy to news recommendation:

News website <—> gambler
Candidate news articles <—> arms
User click <—> reward

SLIDE 4

A K-armed Bandit Formulation

Setting

  • Set of K choices (arms)
  • Each choice $a$ is associated with an unknown probability distribution $p_a$ supported on $[0, 1]$
  • We play the game for $T$ rounds
  • In each round $t$: (1) we pick an arm $j$; (2) we observe a random sample $X_t$ drawn from $p_j$

Our goal: maximize the total reward $\sum_{t=1}^{T} X_t$.
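This setting maps directly to a simulation loop. Below is a minimal sketch in Python, assuming Bernoulli reward distributions and a placeholder uniformly random policy (the values of K, T, and means are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5                         # number of arms (illustrative)
T = 10_000                    # number of rounds (illustrative)
means = rng.uniform(size=K)   # hidden means of the K reward distributions

total_reward = 0
for t in range(T):
    j = rng.integers(K)                # placeholder policy: pick an arm at random
    x_t = rng.binomial(1, means[j])    # observe a sample X_t drawn from p_j
    total_reward += x_t

print(f"Total reward over {T} rounds: {total_reward}")
```

Everything that follows is about replacing the placeholder policy with a smarter rule for choosing $j$.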

SLIDE 5

Ideal Solution

Pick $\arg\max_a \mu_a$, where $\mu_a$ is the mean of the reward distribution $p_a$.

But we DO NOT know the means.

SLIDE 6

Feasible Solution

Every time we pull an arm, we learn a bit more about its reward distribution.

SLIDE 7

Exploitation vs. Exploration

Exploitation: pull the arm for which we currently have the highest estimate of the mean reward.

Exploration: pull an arm we have never (or rarely) pulled before.

Two extreme strategies (sketched in code below):

  • Greedy Strategy: always take the arm with the highest average reward so far (too confident).
  • Random Strategy: randomly choose an arm (too unconfident).
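As a rough sketch (the function names and the per-arm statistics counts / reward_sums are my own bookkeeping, not from the slides), the two extremes look like this:

```python
import numpy as np

def greedy_arm(counts, reward_sums):
    """Exploitation extreme: take the arm with the highest average reward.

    Unpulled arms get +inf so that each arm is tried at least once."""
    avg = np.where(counts > 0, reward_sums / np.maximum(counts, 1), np.inf)
    return int(np.argmax(avg))

def random_arm(counts, rng):
    """Exploration extreme: choose an arm uniformly at random."""
    return int(rng.integers(len(counts)))
```

The greedy strategy never revisits an arm whose early samples happened to be poor; the random strategy never concentrates on arms that are clearly better.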

SLIDE 8

How to make the trade-off between exploration and exploitation?

Don’t just look at the mean (that’s the expected reward), but also at the confidence!

SLIDE 9

UCB (Upper Confidence Bound) Algorithm

A confidence interval is a range of values within which we are sure the mean lies with a certain probability. The UCB idea: pick

$$\arg\max_a \left( \hat{\mu}_a + \alpha \cdot \text{width of the confidence interval for arm } a \right)$$

i.e., the estimated mean plus an exploration bonus that grows with our uncertainty about the arm.

UCB1: pick

$$\arg\max_a \left( \hat{\mu}_a + \sqrt{\frac{2 \ln t}{n_a}} \right)$$

where $n_a$ is the number of times arm $a$ has been pulled so far and $t$ is the current round.

Reference: Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. "Finite-time Analysis of the Multiarmed Bandit Problem." http://homes.di.unimi.it/~cesabian/Pubblicazioni/ml-02.pdf
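A minimal sketch of the UCB1 rule, using the same illustrative per-arm bookkeeping (counts / reward_sums) as above:

```python
import numpy as np

def ucb1_arm(counts, reward_sums, t):
    """UCB1: argmax_a ( mu_hat_a + sqrt(2 ln t / n_a) ).

    counts[a]      = n_a, number of times arm a has been pulled
    reward_sums[a] = total reward observed from arm a
    t              = current round (>= 1)
    """
    if np.any(counts == 0):
        return int(np.argmin(counts))             # initialization: pull each arm once
    mu_hat = reward_sums / counts                 # estimated mean reward per arm
    bonus = np.sqrt(2.0 * np.log(t) / counts)     # confidence width
    return int(np.argmax(mu_hat + bonus))
```

The bonus term shrinks as $n_a$ grows, so heavily sampled arms are scored almost purely by their estimated mean.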

SLIDE 10

Make use of Contextual Information

User features: demographic information, geographic features, behavioral categories.
Article features: URL categories, topic categories.

Assumption about the reward: the expected reward of an arm $a$ is linear in its $d$-dimensional feature vector $x_{t,a}$, with some unknown coefficient vector $\theta_a^*$; namely, for all $t$,

$$E[r_{t,a} \mid x_{t,a}] = x_{t,a}^T \theta_a^*$$
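A tiny worked example, with made-up numbers: take $d = 3$, features $x_{t,a} = (1,\ 0.5,\ 0.2)^T$, and coefficients $\theta_a^* = (0.1,\ 0.4,\ 0.8)^T$. Then

$$E[r_{t,a} \mid x_{t,a}] = 1 \cdot 0.1 + 0.5 \cdot 0.4 + 0.2 \cdot 0.8 = 0.46,$$

so this user-article pair has an expected click probability of 0.46 under the model.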

SLIDE 11

UCB Algorithm with Linear Payoffs (LinUCB)

Assumption:

$$E[r_{t,a} \mid x_{t,a}] = x_{t,a}^T \theta_a^*$$

Parameter estimation via ridge regression, where $D_a$ is the matrix of contexts observed for arm $a$ and $c_a$ the corresponding rewards (clicks):

$$\hat{\theta}_a = (D_a^T D_a + I_d)^{-1} D_a^T c_a$$

Bound on the deviation of the estimate (for a suitable exploration parameter $\alpha$, with high probability):

$$\left| x_{t,a}^T \hat{\theta}_a - E[r_{t,a} \mid x_{t,a}] \right| \le \alpha \sqrt{x_{t,a}^T (D_a^T D_a + I_d)^{-1} x_{t,a}}$$

This is the bound we need!

Pick

$$\arg\max_a \left( x_{t,a}^T \hat{\theta}_a + \alpha \sqrt{x_{t,a}^T (D_a^T D_a + I_d)^{-1} x_{t,a}} \right)$$
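A minimal sketch of the selection and update steps, assuming the per-arm (disjoint-model) form of LinUCB; the function names and the A/b bookkeeping (with $A_a = D_a^T D_a + I_d$ initialized to $I_d$ and $b_a = D_a^T c_a$ initialized to zeros) are illustrative:

```python
import numpy as np

def linucb_choose(x, A, b, alpha):
    """Pick an arm by the LinUCB rule above.

    x     : (K, d) array; row a is the feature vector x_{t,a}
    A     : length-K list of (d, d) matrices, A_a = D_a^T D_a + I_d
    b     : length-K list of (d,) vectors,  b_a = D_a^T c_a
    alpha : exploration parameter
    """
    scores = []
    for a in range(len(A)):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]                       # ridge-regression estimate
        mean = x[a] @ theta_hat                        # predicted reward
        width = alpha * np.sqrt(x[a] @ A_inv @ x[a])   # confidence width
        scores.append(mean + width)
    return int(np.argmax(scores))

def linucb_update(a, x_a, r, A, b):
    """Fold the observed reward r for the chosen arm a into its statistics."""
    A[a] += np.outer(x_a, x_a)
    b[a] += r * x_a
```

Maintaining $A_a$ and $b_a$ incrementally avoids refitting the ridge regression from scratch at every round.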

SLIDE 12

Performance Evaluation

SLIDE 13

Summary

  • Model news recommendation as a K-armed bandit problem
  • Use a UCB-type algorithm
  • Take contextual information into consideration (LinUCB)

SLIDE 14

Q&A