

1. Fast gradient descent for drifting least squares regression: Non-asymptotic bounds and application to bandits. Prashanth L A†, joint work with Nathaniel Korda♯ and Rémi Munos†. († INRIA Lille, Team SequeL; ♯ MLRG, Oxford University.) November 26, 2014.

2. Complacs News Recommendation Platform. NOAM database: 17 million articles from 2010. Task: find the best among 2000 news feeds. Reward: relevancy score of the article. Feature dimension: approximately 80,000. (In collaboration with Nello Cristianini and Tom Welfare at the University of Bristol.)


3. More on relevancy score. Problem: find the best news feed for crime stories. Sample scores: "Five dead in Finnish mall shooting" (score: 1.93); "Holidays provide more opportunities to drink" (score: −0.48); "Russia raises price of vodka" (score: 2.67); "Why Obama Care Must Be Defeated" (score: 0.43); "University closure due to weather" (score: −1.06).


4. A linear bandit algorithm. The loop: choose arm $x_n := \arg\max_{x \in D} \mathrm{UCB}(x)$, observe reward $y_n$ with $\mathbb{E}[y_n \mid x_n] = x_n^{\mathsf{T}} \theta^*$, then re-estimate the UCBs. Regression is used to compute $\mathrm{UCB}(x) := x^{\mathsf{T}} \hat{\theta}_n + \alpha \sqrt{x^{\mathsf{T}} A_n^{-1} x}$.
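A minimal sketch of this loop in Python, under stated assumptions: `ucb` is a placeholder for the index defined on the following slides, and `reward_fn` stands in for the environment; neither name comes from the talk.

```python
import numpy as np

def bandit_loop(D, reward_fn, ucb, T):
    """Generic linear-bandit loop: pick the arm maximizing the UCB index,
    observe a noisy reward, and feed the sample back to the estimator."""
    history = []                               # past (x_n, y_n) pairs
    for n in range(T):
        scores = [ucb(x, history) for x in D]  # estimate UCB(x) for each arm
        x_n = D[int(np.argmax(scores))]        # x_n := argmax_{x in D} UCB(x)
        y_n = reward_fn(x_n)                   # observe y_n, E[y_n | x_n] = x_n^T theta*
        history.append((x_n, y_n))
    return history
```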


5. UCB values. $\mathrm{UCB}(x) = \hat{\mu}(x) + \alpha\,\hat{\sigma}(x)$, where $\hat{\mu}(x)$ is the mean-reward estimate and $\alpha\,\hat{\sigma}(x)$ is the confidence width. (Analogy: at each round, select a beer tap; optimize the quality of the $n$ beers selected.)
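For the simpler K-armed setting of the tap analogy, a UCB1-style index is one concrete instance of "mean estimate plus confidence width"; this sketch, including the particular width formula, is an illustration rather than the index used in the talk.

```python
import numpy as np

def ucb1_indices(counts, sums, n):
    """UCB1-style indices for K arms: empirical mean (mu_hat) plus a
    confidence width that shrinks as an arm is pulled more often."""
    pulls = np.maximum(counts, 1)                      # avoid division by zero
    means = sums / pulls                               # mu_hat per arm
    widths = np.sqrt(2.0 * np.log(max(n, 2)) / pulls)  # confidence width
    return means + widths                              # UCB = mu_hat + width
```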


6. UCB values. Linearity $\Rightarrow$ no need to estimate the mean reward of every arm; estimating $\theta^*$ is enough. Regression: $\hat{\theta}_n = A_n^{-1} b_n$, and $\mathrm{UCB}(x) = \hat{\mu}(x) + \alpha\,\hat{\sigma}(x)$, where $\hat{\sigma}(x) = \sqrt{x^{\mathsf{T}} A_n^{-1} x}$ is the Mahalanobis distance of $x$ under $A_n$. (Analogy: optimize the beer you drink before you get drunk.)
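The index on this slide follows directly from the regression statistics $A_n = \sum_{i \le n} x_i x_i^{\mathsf{T}}$ and $b_n = \sum_{i \le n} y_i x_i$; a minimal sketch of the textbook construction, assuming $A$ has been kept invertible (e.g., via ridge regularization):

```python
import numpy as np

def linear_ucb(x, A, b, alpha):
    """UCB(x) = x^T theta_hat + alpha * sqrt(x^T A^{-1} x), where
    theta_hat = A^{-1} b is the least-squares estimate and the second
    term is the Mahalanobis-distance confidence width."""
    A_inv = np.linalg.inv(A)          # assumes A invertible (e.g., ridge term added)
    theta_hat = A_inv @ b             # theta_hat_n = A_n^{-1} b_n
    mean = x @ theta_hat              # mu_hat(x)
    width = np.sqrt(x @ A_inv @ x)    # sigma_hat(x)
    return mean + alpha * width
```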


7. Performance measure. Best arm: $x^* = \arg\max_{x \in D} x^{\mathsf{T}} \theta^*$. Regret: $R_T = \sum_{i=1}^{T} (x^* - x_i)^{\mathsf{T}} \theta^*$. Goal: ensure $R_T$ grows sub-linearly with $T$. Linear bandit algorithms ensure sub-linear regret!
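Since the regret definition uses the true $\theta^*$, an evaluator that knows it can compute $R_T$ as a plain sum of per-round gaps; a minimal sketch, assuming the sequence of chosen arms is recorded:

```python
import numpy as np

def cumulative_regret(theta_star, D, chosen):
    """R_T = sum_{i=1}^T (x* - x_i)^T theta*, with x* the best arm in D."""
    best = max(x @ theta_star for x in D)              # x*^T theta*
    return sum(best - x @ theta_star for x in chosen)
```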


8. Complexity of least squares regression. [Figure: a typical ML algorithm using regression: choose $x_n$ → observe $y_n$ → estimate $\hat{\theta}_n$.] Regression complexity: $O(d^2)$ per update using the Sherman-Morrison lemma, or $O(d^{2.807})$ using Strassen's algorithm, or $O(d^{2.375})$ using the Coppersmith-Winograd algorithm. Problem: the Complacs news feed platform has high-dimensional features ($d \approx 10^5$), so solving OLS is computationally costly.
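The $O(d^2)$ figure comes from maintaining $A_n^{-1}$ incrementally rather than re-inverting each round: the Sherman-Morrison identity updates the inverse after a rank-one change with one matrix-vector product and one outer product. A sketch:

```python
import numpy as np

def sherman_morrison_update(A_inv, x):
    """(A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
    for symmetric A. Cost: O(d^2), versus O(d^2.4)-O(d^3) for re-inversion."""
    Ax = A_inv @ x                                    # O(d^2) matrix-vector product
    return A_inv - np.outer(Ax, Ax) / (1.0 + x @ Ax)
```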


9. Fast GD for regression. [Diagram: pick $i_n$ uniformly in $\{1, \ldots, n\}$ (random sampling), then a GD update of $\theta_n$ to $\theta_{n+1}$ using $(x_{i_n}, y_{i_n})$.] Solution: use fast (online) gradient descent (GD). It is efficient, with a complexity of only $O(d)$ per iteration (well known), and high-probability bounds with explicit constants can be derived (not fully known previously).
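A minimal sketch of the update in the diagram, under stated assumptions: the step size `gamma` is a placeholder, since the talk's non-asymptotic bounds rely on specific step-size choices not reproduced here.

```python
import numpy as np

def fast_gd_step(theta, xs, ys, gamma, rng):
    """One O(d) online-GD step for least squares: draw i_n uniformly from
    the samples seen so far, then move theta along the negative gradient
    of the squared error on that single sample."""
    i = rng.integers(len(xs))                # i_n uniform over past samples
    x_i, y_i = xs[i], ys[i]
    return theta + gamma * (y_i - x_i @ theta) * x_i

# Usage: rng = np.random.default_rng(0); each step costs one inner product
# and one scaled vector addition, i.e. O(d), independent of n.
```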
