A Preference-Based Bandit Framework for Personalized Recommendation




SLIDE 1

A Preference-Based Bandit Framework for Personalized Recommendation

Maryam Tavakol and Ulf Brefeld

Paderborn, Nov 8, 2016

SLIDE 2

Introduction

  • Personalized Recommendation
  • Preference Learning
  • Multi-armed bandits

SLIDE 3

Recommendation

SLIDE 4

Recommendation

SLIDE 5

Preference Model

  • Item i: {Shirt, Blue, Women, Cheap}
  • Item k: {Polo shirt, White, Women, Expensive}

Item i ≻ Item k:

{Shirt-Polo shirt, Blue-White, Women-Women, Cheap-Expensive}

Difference vector: $z_{ik} := z_i - z_k$
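The difference encoding can be illustrated with a minimal sketch. It assumes each attribute value is one-hot encoded over a small vocabulary; the vocabulary `VOCAB` and the helpers `encode` and `difference` are hypothetical names chosen for this example, since the slides do not specify an encoding.

```python
# Hypothetical attribute vocabulary; the slides do not specify an encoding.
VOCAB = ["Shirt", "Polo shirt", "Blue", "White", "Women", "Cheap", "Expensive"]


def encode(attributes):
    """One-hot encode a set of attribute values over VOCAB."""
    return [1 if value in attributes else 0 for value in VOCAB]


def difference(z_i, z_k):
    """Pairwise feature vector z_ik := z_i - z_k."""
    return [a - b for a, b in zip(z_i, z_k)]


z_i = encode({"Shirt", "Blue", "Women", "Cheap"})
z_k = encode({"Polo shirt", "White", "Women", "Expensive"})
z_ik = difference(z_i, z_k)
print(z_ik)
```

Under this encoding, an attribute shared by both items (here "Women") cancels to 0, so only the attributes in which the items differ contribute to the pairwise vector.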

SLIDE 6

Payoff Model

  • Personalized model + average component

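[Figure: one model per user (User 1, User 2, …, User m) plus an average component over all users (User 1 + User 2 + … + User m)]

$E[r_{t,ik} \mid u_t = u_j] = \beta_t^\top z_{ik} + \theta^\top z_{ik}$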
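As a rough sketch of the payoff model, the expected reward is the sum of a personalized linear term and a shared linear term on the same difference vector. The function names and all numeric weights below are made up for illustration and are not taken from the talk.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def expected_payoff(beta_user, theta, z_ik):
    """Personalized term plus average term: beta^T z_ik + theta^T z_ik."""
    return dot(beta_user, z_ik) + dot(theta, z_ik)


# Illustrative numbers only; these weights are assumptions for the example.
theta = [0.5, 0.0, 0.2]        # average component shared by all users
beta_user = [0.1, -0.3, 0.0]   # personalized component of one user
z_ik = [1, -1, 0]              # difference vector of items i and k

payoff = expected_payoff(beta_user, theta, z_ik)
z_ki = [-x for x in z_ik]
reverse = expected_payoff(beta_user, theta, z_ki)
```

Because the model is linear in $z_{ik}$ and $z_{ki} = -z_{ik}$, swapping the two items flips the sign of the expected payoff.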

SLIDE 7

Personalized Recommendation with Qualitative Bandit

  • For t = 1, …, T:
      1. The world generates some context
      2. The learner chooses an action
      3. The world reacts with a reward
  • Choose the arm with the highest mean reward + confidence interval (general case of LinUCB)
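The interaction protocol can be sketched as a short simulation. This is a plain LinUCB-style learner with a single shared weight vector, not the personalized pairwise model of the talk; the simulated world, `true_w`, the exploration weight `c`, and all helper names are assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


class LinUCB:
    """Ridge-regression UCB learner with one shared weight vector."""

    def __init__(self, dim, lam=1.0, c=1.0):
        self.c = c
        # Inverse of A = lam * I + sum of x x^T, maintained incrementally.
        self.a_inv = [[(1.0 / lam) if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * x

    def score(self, x):
        w = matvec(self.a_inv, self.b)  # current ridge estimate
        width = math.sqrt(max(dot(x, matvec(self.a_inv, x)), 0.0))
        return dot(w, x) + self.c * width  # mean + confidence width

    def choose(self, arms):
        return max(range(len(arms)), key=lambda a: self.score(arms[a]))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of the inverse matrix.
        ax = matvec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def run(T=500, seed=0):
    rng = random.Random(seed)
    true_w = [1.0, -0.5, 0.25]  # hidden weights of the simulated world
    learner = LinUCB(dim=3)
    total = 0.0
    for _ in range(T):
        # 1. The world generates some context (here: 4 random arm vectors).
        arms = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
        # 2. The learner chooses an action.
        a = learner.choose(arms)
        # 3. The world reacts with a reward.
        reward = dot(true_w, arms[a]) + rng.gauss(0, 0.1)
        learner.update(arms[a], reward)
        total += reward
    return learner, total / T
```

Calling `run()` returns the trained learner and its average reward. Maintaining the inverse with a rank-one update is one possible design choice that avoids re-solving a linear system in every round.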

SLIDE 8

Unified Optimization

  • Solving the objective function in dual space
  • With arbitrary loss function
  • Using Fenchel-Legendre conjugate
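For reference, a worked instance of the Fenchel-Legendre conjugate. The definition is standard; the particular squared loss $\ell(x) = \frac{1}{2}(x - r)^2$, including its factor $\frac{1}{2}$, is an assumption, since the slides do not show the exact scaling of the loss.

```latex
\ell^*(a) = \sup_{x} \big( a\,x - \ell(x) \big)

% Assumed loss: \ell(x) = \tfrac{1}{2}(x - r)^2.
% Setting the derivative a - (x - r) to zero gives the maximizer x = r + a, hence
\ell^*(a) = a\,(r + a) - \tfrac{1}{2}a^2 = \tfrac{1}{2}a^2 + r\,a
```

The conjugate of a quadratic loss is again quadratic in the dual variable, with a quadratic term and a term linear in $r$.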

SLIDE 9

Squared Loss

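  • The problem reduces to standard quadratic optimization
  • Model parameters ($\theta$, $\beta_j$) are obtained from $\alpha$

$\max_{\alpha} \; -\frac{1}{2C}\,\alpha^\top \alpha + r^\top \alpha - \frac{1}{2}\,\alpha^\top \Big[ ZZ^\top + \frac{1}{\mu} \Big( \sum_j \phi_j \otimes \phi_j^\top \Big) ZZ^\top \Big] \alpha$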
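A short derivation sketch of why this is a standard quadratic problem. It writes $K$ for the bracketed matrix and assumes that $K$ is symmetric positive semidefinite, that the quadratic term in $K$ enters with a negative sign (so the objective is concave), and that $\alpha$ is otherwise unconstrained; none of these assumptions is stated explicitly on the slide.

```latex
% Objective with K denoting the bracketed matrix:
D(\alpha) = -\frac{1}{2C}\,\alpha^\top \alpha + r^\top \alpha - \frac{1}{2}\,\alpha^\top K \alpha

% Stationarity condition:
\nabla D(\alpha) = -\frac{1}{C}\,\alpha + r - K\alpha = 0

% Closed-form maximizer under the stated assumptions:
\alpha = \Big( K + \frac{1}{C}\, I \Big)^{-1} r
```

Under these assumptions the dual solution has the same form as a kernel ridge regression solution, with $1/C$ acting as the regularization strength.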

SLIDE 10

Squared Loss

  • In the contextual bandit framework:
  • Mean: $\beta_t^\top z_{ik} + \theta^\top z_{ik}$
  • Confidence bound: $c \sqrt{z_{ik}^\top (Z^\top Z + \lambda I)^{-1} z_{ik}}$
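The confidence term can be computed directly from its formula, as in the following sketch. It assumes the rows of `Z` are the difference vectors observed so far; the helper names, the small Gaussian-elimination solver, and the toy data are illustrative and not from the talk.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def confidence_width(Z, z, lam=1.0, c=1.0):
    """c * sqrt(z^T (Z^T Z + lam I)^{-1} z) for the rows of Z seen so far."""
    d = len(z)
    gram = [[sum(row[i] * row[j] for row in Z) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    y = solve(gram, z)  # y = (Z^T Z + lam I)^{-1} z
    return c * math.sqrt(sum(zi * yi for zi, yi in zip(z, y)))


Z = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]  # three observations along axis 0
seen = confidence_width(Z, [1.0, 0.0])     # well-explored direction
unseen = confidence_width(Z, [0.0, 1.0])   # unexplored direction
```

In this toy data the width is smaller for the direction that has already been observed, so adding it to the mean favors pairs whose attribute differences have not been explored yet.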

SLIDE 11

Algorithm

SLIDE 12

Summary

  • Personalized recommendation
  • Pairwise learning in bandit framework
  • Optimization in dual space
  • Learning algorithm for squared loss

SLIDE 13

Questions?

Thanks for your attention

Email: tavakol@leuphana.de