  1. Showing Relevant Ads via Context Multi-Armed Bandits
  Dávid Pál
  December 17, 2008, A&C Seminar
  joint work with Tyler Lu and Martin Pál

  2. The Problem
  • we're running a popular website
  • users visit our website
  • we want to show each user an ad that is relevant to him/her
  • relevant = one the user is likely to click on
  • for each user there is some side information (search query, geographic location, cookies, etc.)

  3. Multi-Armed Bandits
  • pulling an arm = showing an ad
  • reward = a click on the ad

  4. Previous Work: Context-Free Multi-Armed Bandits
  • historical papers by Robbins in the early 1950s
  • stochastic version: Lai & Robbins 1985, Auer et al. 2002
  • non-stochastic version: Auer et al. 1995
  • Lipschitz version: R. Kleinberg 2005, Auer et al. 2007, R. Kleinberg et al. 2008

  5. Overview
  • our model with context and a Lipschitz condition
  • regret and no-regret learning
  • statement of our results: upper and lower bounds on the regret
  • our algorithm
  • idea of the analysis of the algorithm

  6. Lipschitz Context Multi-Armed Bandits
  • information x about the user (the context)
  • suppose we show ad y
  • with probability µ(x, y) the user clicks on the ad
  • assume µ : X × Y → [0, 1] is Lipschitz:
    |µ(x, y) − µ(x′, y′)| ≤ L_X(x, x′) + L_Y(y, y′)
    where L_X and L_Y are metrics on X and Y
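  To make the model concrete, here is a minimal sketch in Python; all names and the particular µ are my illustration, not from the talk. Contexts and ads both live in [0, 1] with the absolute-value metric, and a smooth µ is Lipschitz by construction.

  ```python
  import math

  # Hypothetical instance: X = Y = [0, 1], L_X(x, x') = |x - x'|, L_Y(y, y') = |y - y'|.
  def mu(x, y):
      """Click probability; values in [0.25, 0.75] and partial derivatives bounded
      by 0.75, so mu is Lipschitz with constant < 1 in each argument."""
      return 0.5 + 0.25 * math.cos(3 * (x - y))

  def lipschitz_ok(x, xp, y, yp):
      """Check |mu(x, y) - mu(x', y')| <= L_X(x, x') + L_Y(y, y') for one pair."""
      return abs(mu(x, y) - mu(xp, yp)) <= abs(x - xp) + abs(y - yp)
  ```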

  7. The Game
  • the adversary chooses µ : X × Y → [0, 1] and a sequence x_1, x_2, ..., x_T
  • the algorithm chooses y_1, y_2, ..., y_T online: in round t = 1, 2, ..., T it has access to
    x_1, x_2, ..., x_{t−1}, y_1, y_2, ..., y_{t−1}, and the observed clicks µ̂_1, µ̂_2, ..., µ̂_{t−1} ∈ {0, 1}
  • the adversary reveals x_t
  • based on this the algorithm outputs y_t
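  A sketch of one run of this protocol, assuming Bernoulli clicks; the function name and the `algorithm.select` interface are hypothetical:

  ```python
  import random

  def play_game(mu, contexts, algorithm):
      """Run the game: the adversary's contexts x_1..x_T are fixed up front;
      in round t the algorithm sees x_t plus the full history, outputs y_t,
      and only then observes the 0/1 click (the reward mu_hat_t)."""
      history = []                                   # (x_t, y_t, click) triples
      for x_t in contexts:
          y_t = algorithm.select(x_t, history)
          click = int(random.random() < mu(x_t, y_t))  # mu_hat_t in {0, 1}
          history.append((x_t, y_t, click))
      return history
  ```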

  8. Regret
  • the optimal strategy: in round t = 1, 2, ..., T show y*_t = argmax_{y ∈ Y} µ(x_t, y)
  • the algorithm instead shows y_1, y_2, ..., y_T
  • the regret is the difference between the expected payoffs:
    Regret(T) = Σ_{t=1}^T µ(x_t, y*_t) − E[ Σ_{t=1}^T µ(x_t, y_t) ]
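  Since we (unlike the algorithm) know µ, the realized regret of a run can be computed directly; in this sketch a finite candidate set `ads` stands in for the ad space Y, which is my simplification:

  ```python
  def regret(mu, history, ads):
      """Regret(T): payoff of the pointwise-optimal policy minus the algorithm's,
      summed over the (x_t, y_t, click) triples recorded by play_game."""
      opt = sum(max(mu(x, y) for y in ads) for (x, _, _) in history)
      got = sum(mu(x, y) for (x, y, _) in history)
      return opt - got
  ```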

  9. No-Regret Learning
  • the per-round regret vanishes: lim_{T→∞} Regret(T) / T = 0
  • how fast is the convergence? a typical result: Regret(T) = O(T^γ) where 0 < γ < 1

  10. Our Results (Oversimplifying and lying somewhat.)
  Theorem. If X has "dimension" a and Y has "dimension" b, then
  • there exists an algorithm with Regret(T) = Õ(T^{(a+b+1)/(a+b+2)})
  • for any algorithm, Regret(T) = Ω(T^{(a+b+1)/(a+b+2)})

  11. Covering Dimension
  • let (Z, L_Z) be a metric space
  • cover the space with ε-balls
  • how many balls do we need? roughly (1/ε)^d
  • define d to be the (covering) dimension
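  A greedy ε-net illustrates the definition: the number of centers it returns is, up to constants, the covering number, which scales like (1/ε)^d. This is a sketch with a finite point cloud standing in for Z:

  ```python
  def greedy_eps_cover(points, dist, eps):
      """Greedy eps-cover: keep a point as a new center only if it is farther
      than eps from every center chosen so far; afterwards every point lies
      within eps of some center."""
      centers = []
      for p in points:
          if all(dist(p, c) > eps for c in centers):
              centers.append(p)
      return centers

  # e.g. for Z = [0, 1] with the absolute-value metric, len(centers) ~ 1/eps:
  # greedy_eps_cover([i / 1000 for i in range(1001)], lambda u, v: abs(u - v), 0.1)
  ```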

  12. Optimal Algorithm
  • suppose that T is known to the algorithm
  • X, Y have dimensions a, b respectively
  • discretize X and Y with ε = T^{−1/(a+b+2)}:
    • X_0 is a set of centers of ε-balls covering X
    • Y_0 is a set of centers of ε-balls covering Y
  • round each x_t to the nearest element of X_0
  • display only ads from Y_0
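  For X = Y = [0, 1] the discretization is a one-liner, with the grid spacing dictated by ε = T^{−1/(a+b+2)}; a sketch under that assumption (function names mine):

  ```python
  def build_grids(T, a, b):
      """Choose the resolution from the horizon and cover [0, 1] with a uniform
      grid: centers spaced eps apart, so every point of [0, 1] lies within
      eps/2 of a center, which is in particular an eps-cover."""
      eps = T ** (-1.0 / (a + b + 2))
      grid = [eps / 2 + i * eps for i in range(int(1 / eps) + 1)]
      return eps, grid, grid            # the grids X_0 and Y_0

  def round_to_grid(x, X0):
      """Round the incoming context x_t to the nearest center x_0 in X_0."""
      return min(X0, key=lambda c: abs(c - x))
  ```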

  13. Optimal Algorithm, continued
  • for each x_0 ∈ X_0 and y_0 ∈ Y_0 maintain:
    • the number of times y_0 was displayed for x_0: n(x_0, y_0)
    • the corresponding number of clicks: m(x_0, y_0)
    • an estimate of the click-through rate: µ̂(x_0, y_0) = m(x_0, y_0) / n(x_0, y_0)
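  The per-cell statistics are just two counters; a minimal sketch (the class name is mine):

  ```python
  from collections import defaultdict

  class CellStats:
      """For each (x_0, y_0) pair: n = number of displays, m = number of
      clicks, and mu_hat = m / n as the empirical click-through rate."""
      def __init__(self):
          self.n = defaultdict(int)
          self.m = defaultdict(int)

      def update(self, x0, y0, click):
          self.n[(x0, y0)] += 1
          self.m[(x0, y0)] += click     # click is 0 or 1

      def mu_hat(self, x0, y0):
          shown = self.n[(x0, y0)]
          return self.m[(x0, y0)] / shown if shown else 0.0
  ```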

  14. Optimal Algorithm, continued
  [figure: an incoming context x_t inside the ε-ball around its nearest center x_0]
  • when x_t arrives, "round" it to the nearest x_0 ∈ X_0
  • show the ad y_0 ∈ Y_0 that maximizes µ̂(x_0, y_0) + √( log T / (1 + n(x_0, y_0)) )
    (the exploration vs. exploitation trade-off)
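  The selection rule maximizes the empirical CTR plus the exploration bonus √(log T / (1 + n)); a sketch using the CellStats class above:

  ```python
  import math

  def select_ad(stats, x0, Y0, T):
      """Show the y_0 maximizing mu_hat(x_0, y_0) + sqrt(log T / (1 + n(x_0, y_0))).
      Rarely-shown ads get a large bonus (exploration); well-estimated good
      ads win on mu_hat alone (exploitation)."""
      def index(y0):
          bonus = math.sqrt(math.log(T) / (1 + stats.n[(x0, y0)]))
          return stats.mu_hat(x0, y0) + bonus
      return max(Y0, key=index)
  ```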

  15. Idea of Analysis
  • let R_t(x_0, y_0) = √( log T / (1 + n(x_0, y_0)) ) and I_t(x_0, y_0) = µ̂(x_0, y_0) + R_t(x_0, y_0)
  • by the Chernoff-Hoeffding bound, with high probability
    I_t(x_0, y_0) ∈ [ µ(x_0, y_0) − ε, µ(x_0, y_0) + 2 R_t(x_0, y_0) + ε ]
    for all x_0 ∈ X_0, y_0 ∈ Y_0 and all t = 1, 2, ..., T simultaneously
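  The "with high probability" claim is a standard Hoeffding-plus-union-bound step; roughly, and with constants that are my reconstruction rather than the talk's:

  ```latex
  % Hoeffding: for a fixed cell (x_0, y_0) with n = n(x_0, y_0) samples,
  \Pr\big[\, |\hat{\mu}(x_0, y_0) - \mathbb{E}\,\hat{\mu}(x_0, y_0)| \ge R_t(x_0, y_0) \,\big]
  \;\le\; 2 \exp\!\left(-\frac{2 n \log T}{1 + n}\right) \;\approx\; 2\, T^{-2},
  % so a union bound over all |X_0| x |Y_0| cells and all T rounds keeps the
  % total failure probability small. The extra +/- epsilon terms absorb the
  % discretization error: by the Lipschitz condition, rounding x_t to x_0
  % changes mu by at most epsilon.
  ```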

  16. Idea of Analysis
  [figure: fix x_0 ∈ X_0; the function µ(x_0, ·) over Y_0, with values µ(x_0, y_1), µ(x_0, y_2), µ(x_0, y_3), µ(x_0, y_4) at the ads y_1, ..., y_4]

  17. Idea of Analysis
  [figure: the confidence intervals, from µ(x_0, ·) − ε up to µ(x_0, ·) + 2 R_t(x_0, ·) + ε, drawn around each value of µ(x_0, ·)]

  18. Idea of Analysis
  • the algorithm displays the ad maximizing I_t(x_0, ·)
  • w.h.p. each index I_t(x_0, y_0) lies in its confidence interval
  [figure: the indices I_t(x_0, ·) plotted inside the intervals of the previous slide]

  19. Idea of Analysis
  Regret(T) = Σ_{t=1}^T µ(x_t, y*_t) − E[ Σ_{t=1}^T µ(x_t, y_t) ]
  [figure: for a fixed x_0, the optimal ad y* and a suboptimal ad y]
  • each display of the suboptimal ad y contributes µ(x_0, y*) − µ(x_0, y) to the regret

  20. Idea of Analysis
  • once µ(x_0, y) + 2 R_t(x_0, y) + ε < µ(x_0, y*) − ε, the algorithm stops displaying the suboptimal ad y
  [figure: the upper confidence bound µ(x_0, y) + 2 R_t(x_0, y) + ε dropping below µ(x_0, y*) − ε]

  21. Idea of Analysis
  • R_t(x_0, y) = √( log T / (1 + n_t(x_0, y)) ), where n_t(x_0, y) is the display count up to round t
  • the confidence interval for y shrinks as n_t(x_0, y) increases
  • thus we can upper bound n_t(x_0, y) in terms of the gap µ(x_0, y*) − µ(x_0, y)
  • the rest is just a long calculation
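  Spelling out this counting step under the interval from slide 15 (my reconstruction; the constants are not necessarily the talk's): while the suboptimal ad y is still being shown, its upper confidence bound must reach µ(x_0, y*) − ε, which caps n_t(x_0, y).

  ```latex
  % While y is still displayed: mu(x_0,y) + 2 R_t(x_0,y) + eps >= mu(x_0,y*) - eps.
  % Writing \Delta = \mu(x_0, y^*) - \mu(x_0, y), and assuming \Delta > 2\epsilon:
  R_t(x_0, y) = \sqrt{\frac{\log T}{1 + n_t(x_0, y)}} \;\ge\; \frac{\Delta - 2\epsilon}{2}
  \quad\Longrightarrow\quad
  n_t(x_0, y) \;\le\; \frac{4 \log T}{(\Delta - 2\epsilon)^2}.
  ```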

  22. Conclusion
  • a formulation of Context Multi-Armed Bandits
  • roughly matching upper and lower bounds: T^{(a+b+1)/(a+b+2)}
  • paper available at www.cs.uwaterloo.ca/~dpal/papers/
  • possible future work: non-stochastic clicks
  Thanks!
