
Bandits in Auctions (& more)

Vianney Perchet, joint work with P. Rigollet (MIT) and J. Weed (MIT). CEMRACS 2017, July 20, 2017. CMLA, ENS Paris-Saclay & Criteo Research.


  1. Bandits in Auctions (& more). Vianney Perchet, joint work with P. Rigollet (MIT) and J. Weed (MIT). CEMRACS 2017, July 20, 2017. CMLA, ENS Paris-Saclay & Criteo Research

  2. Motivations & Objectives

  3. Classical Examples of Bandits Problems
     – Size of data: n patients with some probability of getting cured
     – Choose one of two treatments to prescribe
     – Patients cured or dead
     1) Inference: Find the best treatment between the red and blue
     2) Cumul: Save as many patients as possible

  4. Classical Examples of Bandits Problems
     – Size of data: n banners with some probability of click
     – Choose one of two ads to display
     – Banner clicked or ignored
     1) Inference: Find the best ad between the red and blue
     2) Cumul: Get as many clicks as possible

  5. Example of Repeated Auctions
     Ad slot sold by lemonde.fr. 2nd-price auctions
     • Several (marketing) companies place bids
     • Highest bid wins (...), say criteo, pays to lemonde 2nd bid (...)
     • criteo chooses ad of a client, Microsoft or Cdiscount or Booking
     • criteo gets paid by the client if the user clicks on the ad
     Main Problem: Repeated auctions with unknown private valuation
     Learn valuations, find which ad to display & good strategies


  8. Example of Repeated Auctions: some companies whose cookies can be controlled

  9. Back to Classical Examples of Bandits Problems
     – Size of data: n mails with some probability of spam
     – Choose one of two actions: spam or ham
     – Mail correctly or incorrectly classified
     1) Inference: Find the best between the red and blue
     2) Cumul: Minimize the number of errors


  11. Back to Classical Examples of Bandits Problems
      – Size of data: n patients with some probability of getting cured
      – Choose one of two
      – Patients cured or dead
      1) Inference: Find the best treatment between the red and blue
      2) Cumul: Save as many patients as possible

  12. Two-Armed Bandit
      – Patients arrive and are treated sequentially.
      – Save as many as possible.


  25. A bit of theory

  26. Stochastic Multi-Armed Bandit

  27. K-Armed Stochastic Bandit Problems
      – K actions $i \in \{1, \dots, K\}$, outcome $X^i_t \in \mathbb{R}$ (sub-)Gaussian, bounded; $X^i_1, X^i_2, \dots$ i.i.d. $\sim \mathcal{N}(\mu^i, 1)$
      – Non-Anticipative Policy: $\pi_t\left(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\right) \in \{1, \dots, K\}$
      – Goal: Maximize expected reward $\sum_{t=1}^{T} E X^{\pi_t}_t = \sum_{t=1}^{T} \mu^{\pi_t}$
      – Performance: Cumulative Regret
        $R_T = \max_{i \in \{1, \dots, K\}} \sum_{t=1}^{T} \mu^i - \sum_{t=1}^{T} \mu^{\pi_t} = \Delta_i \sum_{t=1}^{T} 1\{\pi_t = i \neq \star\}$
        with $\Delta_i = \mu^\star - \mu^i$, the "gap" or cost of error $i$.

  28. Most Famous algorithm [Auer, Cesa-Bianchi, Fischer, '02]
      • UCB - "Upper Confidence Bound"
        $\pi_{t+1} = \arg\max_i \left\{ \bar{X}^i_t + \sqrt{\frac{2 \log(t)}{T_i(t)}} \right\}$,
        where $T_i(t) = \sum_{s=1}^{t} 1\{\pi_s = i\}$ and $\bar{X}^i_t = \frac{1}{T_i(t)} \sum_{s : \pi_s = i} X^i_s$.
      Regret: $E R_T \lesssim \sum_k \frac{\log(T)}{\Delta_k}$
      Worst-Case: $E R_T \lesssim \sup_\Delta K \left( \frac{\log(T)}{\Delta} \wedge T\Delta \right) \eqsim \sqrt{KT \log(T)}$
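The index rule on this slide can be tried out numerically. The following is a minimal sketch, not the authors' code: it assumes Gaussian rewards with unit variance, pulls each arm once before applying the index (so that $T_i(t) > 0$), and the names `ucb`, `means`, `horizon` and `seed` are illustrative choices.

```python
import math
import random


def ucb(means, horizon, seed=0):
    """Run UCB on unit-variance Gaussian arms; return (pull counts, pseudo-regret)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # T_i(t): number of pulls of arm i so far
    sums = [0.0] * k      # running sum of the rewards observed on arm i
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # assumption: initial round-robin so every T_i(t) > 0
        else:
            # empirical mean plus confidence bonus, using the t - 1 rounds played so far
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t - 1) / counts[i]),
            )
        reward = rng.gauss(means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # gap Delta_i of the arm that was played
    return counts, regret
```

Calling `ucb([0.0, 1.0], 2000)` returns how often each arm was pulled and the accumulated gaps; the suboptimal arm should receive only a small, roughly logarithmic, share of the pulls.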

  29. Ideas of proof
      $\pi_{t+1} = \arg\max_i \left\{ \bar{X}^i_t + \sqrt{\frac{2 \log(t)}{T_i(t)}} \right\}$
      • 2-lines proof:
        $\pi_{t+1} = i \neq \star \iff \bar{X}^\star_t + \sqrt{\frac{2 \log(t)}{T_\star(t)}} \le \bar{X}^i_t + \sqrt{\frac{2 \log(t)}{T_i(t)}}$
        "$\Longrightarrow$" $\Delta_i \le \sqrt{\frac{2 \log(t)}{T_i(t)}} \Longrightarrow T_i(t) \lesssim \frac{\log(t)}{\Delta_i^2}$
      • Number of mistakes grows as $\frac{\log(t)}{\Delta_i^2}$; each mistake costs $\Delta_i$.
        Regret at stage $T \lesssim \sum_i \frac{\log(T)}{\Delta_i^2} \times \Delta_i \eqsim \sum_i \frac{\log(T)}{\Delta_i}$
      • "$\Longrightarrow$" actually happens with overwhelming probability
      • "optimal": no algo always has a regret smaller than $\sum_i \frac{\log(T)}{\Delta_i}$

  30. Other Algos
      • Other algo, MOSS [Audibert, Bubeck], variants of UCB
        $R_T \lesssim K \frac{\log(T \Delta_{\min} / K)}{\Delta_{\min}}$, worst case $R_T \le \sqrt{TK}$
      • ETC [Perchet, Rigollet]: pull in round-robin then eliminate
        $R_T \lesssim \sum_k \frac{\log(T \Delta_k)}{\Delta_k}$, worst case $R_T \le \sqrt{T \log(K) K}$
      • Infinite number of actions $x \in [0,1]^d$ with $\Delta(x)$ 1-Lipschitz. Discretize + UCB gives
        $R_T \lesssim T\varepsilon + \sqrt{\frac{T}{\varepsilon}} \le T^{2/3}$
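The $T^{2/3}$ rate in the last bullet comes from balancing the two terms of the bound. A short worked version, assuming $d = 1$ so that a grid of mesh $\varepsilon$ has about $K = 1/\varepsilon$ arms and the worst-case bound $\sqrt{TK}$ applies:

```latex
% Discretization error: each round costs at most \varepsilon (1-Lipschitz gap), so T\varepsilon in total.
% Learning error on the grid: \sqrt{TK} with K = 1/\varepsilon arms, i.e. \sqrt{T/\varepsilon}.
\[
  T\varepsilon = \sqrt{\frac{T}{\varepsilon}}
  \iff \varepsilon^{3/2} = T^{-1/2}
  \iff \varepsilon = T^{-1/3},
\]
\[
  R_T \lesssim T \cdot T^{-1/3} + \sqrt{T \cdot T^{1/3}} = 2\,T^{2/3}.
\]
```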

  31. Adversarial Multi-Armed Bandit

  32. K-Armed Adversarial Bandit Problems
      • K actions $i \in [K] = \{1, \dots, K\}$, outcome $X^i_t \in \mathbb{R}$ bounded in $[0,1]$. No assumption on $X^i_1, X^i_2, \dots$
      • Non-Anticipative Policy: $\pi_t\left(X^{\pi_1}_1, X^{\pi_2}_2, \dots, X^{\pi_{t-1}}_{t-1}\right) \in [K]$
      • Performance: Cumulative Regret
        $R_T = \max_{i \in [K]} \sum_{t=1}^{T} X^i_t - \sum_{t=1}^{T} X^{\pi_t}_t$
      • Convex optimization of $p \mapsto E_p \sum_{t=1}^{T} X^i_t$, from $\Delta([K])$ to $[0,1]$

  33. EXP-algo
      • Main insight: $\pi_t \sim p_t \in \Delta([K])$, more weights on best actions
        $p^i_t = \frac{e^{\eta \sum_{s=1}^{t-1} X^i_s}}{\sum_{j \in [K]} e^{\eta \sum_{s=1}^{t-1} X^j_s}}$, $\eta$ is a parameter
      • Only $X^{\pi_t}_t$ is observed, not $X_t$. Estimate $X_t$ by
        $\hat{X}^i_t = 1 - \frac{1 - X^i_t}{p^i_t} 1\{\pi_t = i\}$ and run EXP on $\hat{X}_t$
      • $E \hat{X}^i_t = 1 - \left[ (1 - p^i_t) \cdot 0 + p^i_t \frac{1 - X^i_t}{p^i_t} \right] = X^i_t$, unbiased estimator
      • $E \sum_{i \in [K]} p^i_t \left( \hat{X}^i_t \right)^2 \le 1 + \sum_{i \in [K]} p^i_t \frac{(1 - X^i_t)^2}{p^i_t} \le K + 1$, bounded variance
      • Using this estimate we obtain that
        $E R_T \le \frac{\log(K)}{\eta} + \eta (K + 1) T \le 3 \sqrt{\log(K) K T}$
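The exponential-weights policy with the estimator $\hat{X}^i_t$ from this slide can be sketched in a few lines. This is an illustrative sketch under stated assumptions: rewards are given as a full table `rewards[t][i]` in $[0, 1]$ (only the played entry is used for learning), and the names `exp_weights` and `exp_bandit` are hypothetical.

```python
import math
import random


def exp_weights(cumulative, eta):
    """Probabilities proportional to exp(eta * cumulative), shifted for numerical stability."""
    top = max(cumulative)
    weights = [math.exp(eta * (c - top)) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def exp_bandit(rewards, eta, seed=0):
    """rewards[t][i] in [0, 1]; returns the total reward collected by the policy."""
    rng = random.Random(seed)
    k = len(rewards[0])
    cumulative = [0.0] * k    # sum over past rounds of the estimates hat X^i_s
    collected = 0.0
    for x in rewards:
        p = exp_weights(cumulative, eta)
        arm = rng.choices(range(k), weights=p)[0]   # pi_t drawn from p_t
        collected += x[arm]
        for i in range(k):
            # hat X^i_t = 1 - (1 - X^i_t) / p^i_t * 1{pi_t = i}; only x[arm] is used
            if i == arm:
                cumulative[i] += 1.0 - (1.0 - x[arm]) / p[arm]
            else:
                cumulative[i] += 1.0
    return collected
```

A natural choice for the parameter, obtained by balancing the two terms of the bound on the slide, is `eta = math.sqrt(math.log(K) / ((K + 1) * T))`.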

  34. Bandits & Repeated Auctions

  35. Back to Repeated Auctions
      Ad slot sold by lemonde.fr. 2nd-price auctions
      • Several (marketing) companies place bids
      • Highest bid wins (...), say criteo, pays to lemonde 2nd bid (...)
      • criteo chooses ad of a client, Microsoft or Cdiscount or Booking
      • criteo gets paid by the client if the user clicks on the ad
      Main Problem: Repeated auctions with unknown private valuation
      Learn valuations, find which ad to display & good strategies

  36. 2nd price Auctions
      • A good is sold in a second-price auction.
      • Each buyer, with valuation $v(i)$, places a bid $b(i)$
      • The highest bidder wins and pays the second highest bid $b^\sharp = \max_{i \neq \arg\max b} b(i)$ (ties broken arbitrarily)
      • Utility of bidder: $(v(i) - b^\sharp) 1\{b(i) \ge b^\sharp\}$
      Truthful auctions: the optimal strategy is to bid one's own valuation, $b(i) = v(i)$
      • if $b(i) > v(i)$: might only pay too much
      • if $b(i) < v(i)$: might lose the auction
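The truthfulness argument on this slide can be checked on small numbers. The sketch below is illustrative only: the function names are hypothetical, and ties are broken in favour of the lowest index, which is one arbitrary choice among many.

```python
def second_price_outcome(bids):
    """Return (winner index, price), where the price is the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # sorted() is stable, so equal bids keep their original order (lowest index wins ties)
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, runner_up = order[0], order[1]
    return winner, bids[runner_up]


def utility(valuation, bidder, bids):
    """Utility of `bidder`: valuation minus price paid if it wins, zero otherwise."""
    winner, price = second_price_outcome(bids)
    return valuation - price if winner == bidder else 0.0
```

With valuation 0.6 for bidder 0: against a competing bid of 0.8, overbidding 0.9 wins but yields a negative utility, while bidding 0.6 yields zero; against a competing bid of 0.5, underbidding 0.4 loses an auction that a truthful bid would have won at a profit.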

  37. Reserve price
      • Utility of highest value: $v^\star - b^\sharp$
      • Utility of seller (value $v_0$): $b^\sharp - v_0$, can be negative!
      Reserve price: a threshold $c$; if $b^\star \ge c$, the price is $\max\{b^\sharp, c\}$, otherwise the good is not sold
      • Still truthful: $c$ is a bid
      • Optimal reserve price $c^*$ maximizes $E\left[ \left( \max\{v^\sharp, c\} - v_0 \right) 1\{v^\star \ge c\} \right]$
      • Depends on the (actually unknown) distributions of values.
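Since the optimal reserve price depends on the value distributions, one simple way to illustrate the objective is to replace the expectation by an average over observed pairs (highest value, second-highest value). This is a sketch under that assumption, with a finite list of candidate reserves; it is not the learning procedure of the talk, and all names are hypothetical.

```python
def seller_utility(top, second, reserve, seller_value=0.0):
    """(max{second, reserve} - v_0) if the highest value clears the reserve, else 0."""
    if top >= reserve:
        return max(second, reserve) - seller_value
    return 0.0


def best_reserve(samples, candidates, seller_value=0.0):
    """Candidate reserve with the largest average seller utility over the samples."""
    def average(reserve):
        total = sum(seller_utility(a, b, reserve, seller_value) for a, b in samples)
        return total / len(samples)
    return max(candidates, key=average)
```

For the two samples `[(1.0, 0.2), (0.8, 0.1)]` and candidates `[0.0, 0.5, 0.8, 1.0]`, a reserve of 0.8 sells in both rounds at price 0.8, whereas no reserve yields an average of only 0.15 and a reserve of 1.0 loses the second sale.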

  38. Main model
      • Learning optimal reserve price [Cesa-Bianchi, Gentile, Mansour]
      From the point of view of a bidder?
      • At round $t = 1, \dots, T$: bidder bids $b_t \in [0,1]$; if $b_t > m_t$ (maximum of other bids & reserve price), wins the good and observes value $v_t \in [0,1]$
      • Total utility: $\sum_{t=1}^{T} (v_t - m_t) 1\{b_t > m_t\}$
      • Total regret:
        $\max_{b \in [0,1]} \sum_{t=1}^{T} (v_t - m_t) 1\{b > m_t\} - \sum_{t=1}^{T} (v_t - m_t) 1\{b_t > m_t\}$
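The utility and regret of this model can be computed in hindsight once all values $v_t$ and thresholds $m_t$ are known. The sketch below makes two assumptions that are not in the slides: the maximum over $b \in [0,1]$ is approximated on a finite grid, and the full sequences are available (the bidder itself only observes $v_t$ on rounds it wins). The function names are hypothetical.

```python
def total_utility(bids, values, thresholds):
    """Sum of (v_t - m_t) over the rounds where the bid strictly exceeds m_t."""
    return sum(v - m for b, v, m in zip(bids, values, thresholds) if b > m)


def regret(bids, values, thresholds, grid):
    """Best fixed bid on `grid` in hindsight, minus the utility actually obtained."""
    rounds = len(values)
    best_fixed = max(total_utility([b] * rounds, values, thresholds) for b in grid)
    return best_fixed - total_utility(bids, values, thresholds)
```

For example, with values `[0.9, 0.2, 0.7]`, thresholds `[0.5, 0.6, 0.3]` and bids `[0.4, 0.7, 0.2]`, the bidder wins only the second round, at a loss of 0.4, while a fixed bid of 0.5 would have earned 0.4, so the regret on the grid `[0.0, 0.25, 0.5, 0.75, 1.0]` is 0.8.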
