Decentralized Exploration in Multi-Armed Bandits

Decentralized Exploration in Multi-Armed Bandits. Raphaël Féraud, Réda Alami, Romain Laroche (raphael.feraud@orange.com, reda.alami@orange.com, Romain.Laroche@microsoft.com). June 2019.


  1. Decentralized Exploration in Multi-Armed Bandits Raphaël Féraud, Réda Alami, Romain Laroche raphael.feraud@orange.com, reda.alami@orange.com, Romain.Laroche@microsoft.com June 2019 R. Féraud, R. Alami, R. Laroche 1 / 19

  2. Context and Motivation: Outline. (1) Context and Motivation; (2) Decentralized Exploration Problem; (3) Decentralized Elimination Algorithm; (4) Experiments; (5) Conclusion.

  3. Context and Motivation: Sequential A/B testing use cases. Most digital applications perform sequential A/B testing to optimize their audience. For instance, the Orange web portal performs marketing optimization to promote services: if I want to promote Orange TV, which banner is best? Should I push Game of Thrones or Sports?

  4. Context and Motivation: (Centralized) Exploration Problem.
     Definition 1 ($\epsilon$-optimal arm): An arm $k \in \mathcal{K}$ is said to be $\epsilon$-optimal if $\mu_k \geq \mu_{k^*} - \epsilon$, where $k^* = \arg\max_{k \in \mathcal{K}} \mu_k$, $\epsilon \in (0, 1]$, and $\mu_k$ is the mean reward of arm $k$.
     Centralized approach: the click stream of users is gathered and processed by a Best Arm Identification algorithm to choose, with high probability, an $\epsilon$-optimal arm.
     Do we really need to gather billions of logs containing users' private information to handle sequential A/B testing use cases?

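Definition 1 can be illustrated with a short Python sketch. It assumes the mean rewards are known, which is never the case in a real bandit problem where they must be estimated from samples; the function name `epsilon_optimal_arms` and the example values are hypothetical and not taken from the slides.

```python
def epsilon_optimal_arms(means, eps):
    """Return the indices k such that means[k] >= max(means) - eps.

    `means` stands in for the true mean rewards mu_k (an assumption made
    for illustration: in practice they are unknown and must be estimated).
    """
    if not means:
        raise ValueError("at least one arm is required")
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    best = max(means)  # mean reward of the best arm k*
    return {k for k, mu in enumerate(means) if mu >= best - eps}


# Three arms: arm 1 is the best, arm 2 is within eps = 0.1 of it.
print(epsilon_optimal_arms([0.10, 0.50, 0.45], eps=0.1))  # {1, 2}
```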

  7. Decentralized Exploration Problem: Outline. (1) Context and Motivation; (2) Decentralized Exploration Problem; (3) Decentralized Elimination Algorithm; (4) Experiments; (5) Conclusion.

  8. Decentralized Exploration Problem: Problem setting.
     Definition 2 (message): A message is a random variable that is sent by player $n$ to other players.
     When the event "player $n$ is active" occurs: (1) player $n$ reads the messages received from other players; (2) player $n$ chooses an arm to play; (3) the reward of the played arm is revealed to player $n$; (4) player $n$ may update its set of arms and/or send a message to the other players.
     Goal: design an algorithm that samples effectively to find an $\epsilon$-optimal arm for each player, while ensuring privacy and minimizing the number of messages.

  9. Decentralized Exploration Problem: Privacy guarantee. We define the privacy level as the information about the preferred arms of a player that an adversary could infer by intercepting the messages of this player.
     Definition 3 ($(\epsilon, \eta)$-private): The decentralized algorithm $\mathcal{A}$ is $(\epsilon, \eta)$-private for finding an $\epsilon$-approximation of the best arm if, for any player $n$, $\nexists\, \eta_1$, $0 < \eta_1 < \eta < 1$, such that an adversary that knows $\mathcal{M}_n$, the set of messages of player $n$, and the algorithm $\mathcal{A}$ can infer which arm is an $\epsilon$-approximation of the best arm for player $n$ with a probability of at least $1 - \eta_1$:
     $$\forall n \in \mathcal{N}, \quad P\big(\forall l_n \in \{1, \dots, L\},\ \mathcal{K}_n(l_n) \subset \mathcal{K}_\epsilon \mid \mathcal{M}_n, \mathcal{A}\big) \geq 1 - \eta_1,$$
     where $\mathcal{K}_\epsilon$ is the set of $\epsilon$-optimal arms, $\mathcal{K}_n$ is the set of arms of player $n$, $l_n$ is the number of times $\mathcal{K}_n$ has been updated, and $L \leq K$.
     $1 - \eta$ is the confidence level associated with the decision of the adversary: the higher $\eta$, the higher the privacy protection.

  10. Decentralized Elimination Algorithm: Outline. (1) Context and Motivation; (2) Decentralized Exploration Problem; (3) Decentralized Elimination Algorithm; (4) Experiments; (5) Conclusion.

  11. Decentralized Elimination Algorithm: Decentralized Elimination, the principle. An Arm Selection Subroutine is run on each player. The players exchange the indexes of the arms that they eliminate with a high probability of failure $\eta$. The high probability of failure ensures the privacy of messages. When enough players vote for the elimination of an arm, it is eliminated for all players.
     Why does it work? When $M \leq N$ players independently eliminate an arm, each with a probability of failure $\eta$, the probability of failure of the group of $M$ voting players is $\delta = \eta^M$.

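As a quick arithmetic check of the relation $\delta = \eta^M$, the sketch below computes the smallest number of votes $M$ for which $\eta^M \leq \delta$. The helper name `votes_needed` and the example values are illustrative assumptions, not taken from the slides.

```python
import math


def votes_needed(eta: float, delta: float) -> int:
    """Smallest integer M >= 1 such that eta ** M <= delta (hypothetical helper)."""
    if not (0.0 < eta < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eta and delta must lie strictly between 0 and 1")
    m = max(1, math.ceil(math.log(delta) / math.log(eta)))
    # Correct possible floating-point rounding in either direction.
    while eta ** m > delta:
        m += 1
    while m > 1 and eta ** (m - 1) <= delta:
        m -= 1
    return m


m = votes_needed(eta=0.9, delta=0.05)
print(m, 0.9 ** m)
```

With a per-player failure probability of 0.9, which reveals very little about any single player, 29 independent votes are enough to bring the group failure probability below 0.05.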

  13. Decentralized Elimination Algorithm: Decentralized Elimination, a generic algorithm.
     Definition 4 (Arm Selection Subroutine): An ArmSelection subroutine takes as parameters an approximation factor $\epsilon$, a failure probability $\eta$, and a set of remaining arms $\mathcal{K}_n(l_n)$, where $l_n$ is the number of times $\mathcal{K}_n$ has been updated. It samples a remaining arm in $\mathcal{K}_n(l_n)$ and returns the set of eliminated arms $\mathcal{K}_n(l_n)$. An ArmSelection subroutine satisfies Properties 1 and 2.
     Property 1 (remaining $\epsilon$-optimal arm): $\forall l_n \in \{1, \dots, L\}$, $\mathcal{K}_n(l_n) \subseteq \mathcal{K}_n(l_n - 1)$, and
     $$P\big(\{\mathcal{K}_n(l_n) \cap \mathcal{K}_\epsilon = \emptyset\} \mid \mathcal{H}_{t_n},\ \mathcal{K}_n(l_n - 1) \cap \mathcal{K}_\epsilon \neq \emptyset\big) \leq \eta \times f(l_n),$$
     where $0 \leq f(l_n) \leq 1$, $\sum_{l_n} f(l_n) = 1$, and $\mathcal{H}_{t_n}$ is the interaction history.
     Property 2 (finite sample complexity): $\exists t_n \geq 1$, $\forall \eta \in (0, 1)$, $\forall \epsilon \in (0, 1]$,
     $$P\big(\{\mathcal{K}_n(L) \subseteq \mathcal{K}_\epsilon\} \mid \mathcal{H}_{t_n}\big) \geq 1 - \eta.$$

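To make the voting scheme concrete, here is a minimal simulation sketch in Python. It is not the authors' implementation: it assumes Bernoulli rewards, identical arm means for all players, every player being active in every round and playing each of its candidate arms once, and a successive-elimination rule with a Hoeffding-style confidence radius standing in for the ArmSelection subroutine. The names `radius` and `decentralized_elimination` and all parameter values are hypothetical.

```python
import math
import random


def radius(pulls, n_arms, eta):
    """Hoeffding-style confidence radius (an assumed, textbook form)."""
    return math.sqrt(math.log(4 * n_arms * pulls * pulls / eta) / (2 * pulls))


def decentralized_elimination(means, n_players, eps, eta, delta,
                              max_rounds=10_000, seed=0):
    """Simulate vote-based elimination; returns (remaining arms, messages sent)."""
    rng = random.Random(seed)
    n_arms = len(means)
    # Number of votes M chosen so that eta**M <= delta.
    votes_needed = max(1, math.ceil(math.log(delta) / math.log(eta)))
    if votes_needed > n_players:
        raise ValueError("too few players to reach the target delta")

    remaining = set(range(n_arms))                 # arms kept by the group
    local = [set(range(n_arms)) for _ in range(n_players)]  # arms kept per player
    sums = [[0.0] * n_arms for _ in range(n_players)]
    pulls = [[0] * n_arms for _ in range(n_players)]
    votes = [set() for _ in range(n_arms)]         # players who voted against arm k
    messages = 0

    for _ in range(max_rounds):
        if len(remaining) == 1:
            break
        for n in range(n_players):
            candidates = sorted(local[n] & remaining)
            if not candidates:
                continue  # all arms this player kept were voted out by the group
            # The player plays each candidate arm once (Bernoulli reward).
            for k in candidates:
                sums[n][k] += 1.0 if rng.random() < means[k] else 0.0
                pulls[n][k] += 1
            mean = {k: sums[n][k] / pulls[n][k] for k in candidates}
            best = max(candidates, key=mean.get)
            for k in candidates:
                gap = mean[best] - mean[k]
                if k != best and gap + eps >= 2 * radius(pulls[n][k], n_arms, eta):
                    local[n].discard(k)            # local, low-confidence elimination
                    votes[k].add(n)
                    messages += 1                  # one message: the index of arm k
                    if len(votes[k]) >= votes_needed and len(remaining) > 1:
                        remaining.discard(k)       # enough votes: eliminated for all
    return remaining, messages


arms, sent = decentralized_elimination(
    means=[0.2, 0.5, 0.8], n_players=10, eps=0.1, eta=0.7, delta=0.05)
print(arms, sent)
```

Each player sends at most one message per arm it eliminates, and the message carries only an arm index, so in this sketch the number of messages is bounded by the number of players times the number of arms minus one.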
