Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging (PowerPoint PPT Presentation)



1. Stay With Me: Lifetime Maximization Through Heteroscedastic Linear Bandits With Reneging
Ping-Chun Hsieh¹, Xi Liu¹, Anirban Bhattacharya², and P. R. Kumar¹
¹Department of ECE, Texas A&M University; ²Department of Statistics, Texas A&M University
ICML 2019 Poster @ Pacific Ballroom #124


2-3. Lifetime Maximization: Continuing The Play
• A finite game is played for the purpose of winning.
• An infinite game is for the purpose of continuing the play.
• Lifetime maximization.

4. Why Lifetime Maximization?
Examples: medical treatments, portfolio selection, cloud services.
Salient features of these applications:
1. Each participant has a satisfaction level.
2. A participant drops if the outcomes are not satisfactory.
3. The outcomes depend heavily on the contextual information of the participant.

5. Model: Linear Bandits With Reneging
1. {x_{t,a}}_{a∈A} are pairwise participant-action contexts (observed by the platform when participant t arrives).
2. The outcome r_{t,a} is conditionally independent given the context and has mean θ_*^⊤ x_{t,a}.
3. Participant t keeps interacting with the platform as long as r_{t,a} ≥ β_t; otherwise, the participant drops.


6-7. Heteroscedastic Outcomes
• Heteroscedasticity: outcome variations can be wildly different across different participants and actions.
• Example: two actions, 1 (red) and 2 (blue); participant satisfaction level = β.
• Heteroscedasticity is widely studied in econometrics and is usually captured through regression on the variance.

8. Model: Heteroscedastic Bandits With Reneging
1. {x_{t,a}}_{a∈A} are pairwise participant-action contexts (observed by the platform when participant t arrives).
2. The outcome r_{t,a} is conditionally independent given the context and satisfies r_{t,a} ~ N(θ_*^⊤ x_{t,a}, f(φ_*^⊤ x_{t,a})).
3. Participant t keeps interacting with the platform if r_{t,a} ≥ β_t; otherwise, the participant drops.
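To make the model concrete, here is a minimal Python sketch of this environment (not part of the original slides). The class name RenegingEnv, the link choice f(z) = exp(z), and the distributions of the contexts and of β_t are illustrative assumptions; the slides only require f to map φ_*^⊤ x_{t,a} to the outcome variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    # Assumed link function: maps phi_*^T x to the outcome variance (must be positive).
    return np.exp(z)

class RenegingEnv:
    """Synthetic heteroscedastic environment with reneging (illustrative only)."""

    def __init__(self, theta_star, phi_star, n_actions=5, dim=4):
        self.theta_star, self.phi_star = theta_star, phi_star
        self.n_actions, self.dim = n_actions, dim

    def new_participant(self):
        # Participant-action contexts x_{t,a} and satisfaction level beta_t
        # (both drawn from assumed distributions).
        contexts = rng.normal(size=(self.n_actions, self.dim))
        beta = rng.normal(loc=-1.0)
        return contexts, beta

    def outcome(self, x):
        # r_{t,a} ~ N(theta_*^T x, f(phi_*^T x))
        mean = self.theta_star @ x
        var = f(self.phi_star @ x)
        return rng.normal(mean, np.sqrt(var))

    def lifetime(self, x, beta, max_rounds=1000):
        # The participant keeps interacting while r_{t,a} >= beta_t.
        for rounds in range(1, max_rounds + 1):
            if self.outcome(x) < beta:
                return rounds
        return max_rounds
```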


9-10. Oracle Policy and Regret
• The oracle policy π_oracle already knows θ_* and φ_*.
• For each participant t, π_oracle keeps choosing the action that minimizes the reneging probability P{r_{t,a} < β_t | x_{t,a}}.
• Hence, π_oracle is a fixed policy.
• For T participants, define Regret_π(T) = (total expected lifetime under π_oracle) − (total expected lifetime under π).
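Under the Gaussian outcome model, the reneging probability has a closed form, P{r_{t,a} < β_t | x_{t,a}} = Φ((β_t − θ_*^⊤ x_{t,a}) / √f(φ_*^⊤ x_{t,a})), so the oracle simply picks the action that minimizes it. A minimal sketch follows; the function name and the default f = exp are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def oracle_action(contexts, beta, theta_star, phi_star, f=np.exp):
    # Reneging probability under the Gaussian model:
    #   P{r_{t,a} < beta_t | x_{t,a}} = Phi((beta_t - theta_*^T x) / sqrt(f(phi_*^T x)))
    means = contexts @ theta_star
    stds = np.sqrt(f(contexts @ phi_star))
    reneging_prob = norm.cdf((beta - means) / stds)
    # The oracle keeps applying the action with the smallest reneging probability.
    return int(np.argmin(reneging_prob))
```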




11-14. Proposed Algorithm: HR-UCB
• When participant t arrives, obtain estimators θ̂, φ̂ with confidence intervals C_θ, C_φ based on past observations.
• For each action a, construct a UCB index as

  Q_t^HR(x_{t,a}) = [Φ((β_t − θ̂^⊤ x_{t,a}) / √f(φ̂^⊤ x_{t,a}))]^{-1} + Δ(C_θ, C_φ, x_{t,a})    (1)

  where the first term is the estimated expected lifetime and Δ(C_θ, C_φ, x_{t,a}) is the confidence interval for the lifetime.
• Apply the action argmax_a Q_t^HR(x_{t,a}).

Main technical challenges:
1. Design estimators θ̂, φ̂ under heteroscedasticity.
2. Derive the confidence intervals C_θ, C_φ for θ̂, φ̂.
3. Convert C_θ, C_φ into a confidence interval for the lifetime.
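A sketch of the index in (1) is below (not from the slides). It plugs in a confidence width of the form (k1·C_θ + k2·C_φ)·||x||_{V⁻¹}, anticipating the theorem on the last slide; the constants k1 and k2, the default link f = exp, and the numerical guard are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def hr_ucb_index(x, beta, theta_hat, phi_hat, V_inv, C_theta, C_phi,
                 f=np.exp, k1=1.0, k2=1.0):
    # Estimated expected lifetime: [Phi((beta_t - theta_hat^T x) / sqrt(f(phi_hat^T x)))]^{-1}.
    p_renege = norm.cdf((beta - theta_hat @ x) / np.sqrt(f(phi_hat @ x)))
    est_lifetime = 1.0 / max(p_renege, 1e-12)  # numerical guard (assumption)
    # Confidence width of the form (k1*C_theta + k2*C_phi) * ||x||_{V^{-1}};
    # k1, k2 are illustrative constants here.
    width = (k1 * C_theta + k2 * C_phi) * np.sqrt(x @ V_inv @ x)
    return est_lifetime + width

def hr_ucb_action(contexts, beta, theta_hat, phi_hat, V_inv, C_theta, C_phi):
    # Apply the action argmax_a Q_t^HR(x_{t,a}).
    scores = [hr_ucb_index(x, beta, theta_hat, phi_hat, V_inv, C_theta, C_phi)
              for x in contexts]
    return int(np.argmax(scores))
```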


15-16. Estimators of θ_* and φ_* (Challenge 1)
• Generalized least squares estimator (Wooldridge, 2015): with any n outcome observations,

  θ̂_n = (X_n^⊤ X_n + λI)^{-1} X_n^⊤ r,
  φ̂_n = (X_n^⊤ X_n + λI)^{-1} X_n^⊤ f^{-1}(ε̂ ∘ ε̂),

  where
  • X_n is the matrix of the n applied contexts,
  • r is the vector of the n observed outcomes,
  • ε̂(x_{t,a}) = r_{t,a} − θ̂_n^⊤ x_{t,a} is the estimated residual with respect to θ̂_n.
• Nice property (Abbasi-Yadkori et al., 2011): let V_n = X_n^⊤ X_n + λI. For any δ > 0, with probability at least 1 − δ, for all n ∈ ℕ,

  ||θ̂_n − θ_*||_{V_n} ≤ C_θ(δ, n) = O(√(log(1/δ) + log n)).
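A numpy sketch of these two ridge-regularized regressions (not from the slides); the inverse link f⁻¹ = log (matching the illustrative f = exp used earlier) and the small clipping constant are assumptions.

```python
import numpy as np

def estimate_theta_phi(X, r, lam=1.0, f_inv=np.log):
    """Regularized GLS-style estimators of theta_* and phi_* (illustrative sketch)."""
    d = X.shape[1]
    V = X.T @ X + lam * np.eye(d)
    # theta_hat = (X^T X + lam I)^{-1} X^T r
    theta_hat = np.linalg.solve(V, X.T @ r)
    # Residuals eps_hat = r - X theta_hat; regress f^{-1}(eps_hat ∘ eps_hat) on X:
    # phi_hat = (X^T X + lam I)^{-1} X^T f^{-1}(eps_hat ∘ eps_hat)
    eps = r - X @ theta_hat
    squared = np.maximum(eps * eps, 1e-12)  # keep f_inv = log well-defined (assumption)
    phi_hat = np.linalg.solve(V, X.T @ f_inv(squared))
    return theta_hat, phi_hat, V
```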



17-19. Main Technical Contributions (Challenges 2 & 3)

Theorem. For any δ > 0, with probability at least 1 − 2δ, we have

  ||φ̂_n − φ_*||_{V_n} ≤ C_φ(δ, n) = O(√(log(1/δ) + log n)), ∀ n ∈ ℕ.    (2)

• The proof is more involved since φ̂_n depends on the residual ε̂.

Theorem. Δ(C_θ(n, δ), C_φ(n, δ), x) := (k_1 C_θ(n, δ) + k_2 C_φ(n, δ)) · ||x||_{V_n^{-1}} is a confidence interval with respect to the lifetime, where k_1 and k_2 are constants independent of the past history and of x.

Theorem. Under the HR-UCB policy, Regret(T) = O(√(T (log T)^3)).
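To illustrate how the pieces fit together, here is a small simulation driver (not from the slides) that runs HR-UCB against the oracle on synthetic participants and reports the empirical lifetime gap, a noisy stand-in for the regret above. It assumes the RenegingEnv, oracle_action, estimate_theta_phi, and hr_ucb_action sketches given earlier, and uses an illustrative constant c in place of the exact confidence radii C_θ, C_φ.

```python
import numpy as np

def interact(env, x, beta, max_rounds=200):
    # Apply action x to one participant until an outcome drops below beta.
    outcomes = []
    while len(outcomes) < max_rounds:
        r = env.outcome(x)
        outcomes.append(r)
        if r < beta:
            break
    return outcomes

def run(T=500, dim=4, n_actions=5, delta=0.05, c=0.5, seed=1):
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)
    phi_star = rng.normal(scale=0.3, size=dim)
    env = RenegingEnv(theta_star, phi_star, n_actions=n_actions, dim=dim)

    X_hist, r_hist, gap = [], [], 0
    for t in range(1, T + 1):
        contexts, beta = env.new_participant()
        if len(r_hist) < 2 * dim:
            a = int(rng.integers(n_actions))  # short random warm-up
        else:
            X, r = np.array(X_hist), np.array(r_hist)
            theta_hat, phi_hat, V = estimate_theta_phi(X, r)
            # Illustrative confidence radii of order sqrt(log(1/delta) + log n).
            radius = c * np.sqrt(np.log(1.0 / delta) + np.log(len(r_hist)))
            a = hr_ucb_action(contexts, beta, theta_hat, phi_hat,
                              np.linalg.inv(V), radius, radius)
        outcomes = interact(env, contexts[a], beta)
        X_hist += [contexts[a]] * len(outcomes)
        r_hist += outcomes
        a_star = oracle_action(contexts, beta, theta_star, phi_star)
        gap += len(interact(env, contexts[a_star], beta)) - len(outcomes)
    return gap

if __name__ == "__main__":
    print("Empirical lifetime gap vs. oracle:", run())
```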
