SLIDE 1

One-Pass Ranking Models for 
 Low-Latency Product Recommendations

Martin Saveski

@msaveski MIT (Amazon Berlin)

SLIDE 2

One-Pass Ranking Models for 
 Low-Latency Product Recommendations

Antonino Freno, Rodolphe Jenatton, Cédric Archambeau

Amazon Machine Learning Team, Berlin

SLIDE 3

Product Recommendations

SLIDES 4–10 (progressive build)

Product Recommendations

Constraints

  • 1. Large # of examples, large # of features
  • 2. Drifting distribution
  • 3. Real-time ranking (< a few ms)

  ⇒ Small memory footprint, fast training time, low prediction latency

SLIDES 11–13 (progressive build)

Our approach

Product Recommendations

  • Stochastic optimization, one-pass learning
  • Sparse models

  ⇒ Small memory footprint, fast training time, low prediction latency

SLIDE 14

Learning Ranking Functions

SLIDES 15–16 (progressive build)

Learning Ranking Functions

Three broad families of models

  • 1. Pointwise (Logistic regression)
  • 2. Pairwise (RankSVM)
  • 3. Listwise (ListNet)

Loss functions

  • Evaluation functions (NDCG)
  • Surrogate functions
SLIDES 17–22 (progressive build)

Loss Function

LambdaRank (Burges et al., 2006)

Setup: Products 1–4 with feature vectors X = (x1, x2, x3, x4) and ground-truth ranks r = (1, 1, 2, 3), where a lower rank means a better product.

For each pair of products (i, j) with ri ≤ rj:

  • Importance of sorting i and j correctly: ∆Mij = M(r) − M(ri/j), the change in the ranking metric M when products i and j swap positions
  • Difference in scores: ∆Sij = max{0, wᵀxj − wᵀxi}

Loss: L(X; w) = Σ over pairs with ri ≤ rj of ∆Mij · ∆Sij
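To make the loss concrete, here is a minimal numpy sketch (my illustration, not the authors' code) that evaluates L(X; w) for one ranked list by brute-force pair enumeration. It plugs in DCG with linear gains for the metric M and takes graded relevance labels (higher = better) instead of the slide's rank numbers; both choices are assumptions.

import numpy as np

def dcg(rel):
    # DCG of relevance labels given in display order (linear gains).
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))
    return float(np.dot(rel, discounts))

def pairwise_rank_loss(X, rel, w, metric=dcg):
    # L(X; w) = sum over pairs (i better than j) of dM_ij * dS_ij,
    # with items assumed listed in ground-truth order (best first).
    scores = X @ w
    base = metric(rel)
    loss = 0.0
    for i in range(len(rel)):
        for j in range(i + 1, len(rel)):
            if rel[i] <= rel[j]:
                continue  # skip ties; only pairs where i is strictly better
            swapped = rel.copy()
            swapped[i], swapped[j] = swapped[j], swapped[i]
            dm = abs(base - metric(swapped))      # importance of the pair
            ds = max(0.0, scores[j] - scores[i])  # hinge on the score gap
            loss += dm * ds
    return loss

# e.g. pairwise_rank_loss(np.eye(3), np.array([2.0, 1.0, 0.0]),
#                         np.array([0.0, 1.0, 0.0])) > 0,
# because the second-best item outscores the best one.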

SLIDES 23–26 (progressive build)

ElasticRank

Introducing Sparsity

Adding ℓ1 and ℓ2 penalties:

  L∗(X, w) = L(X, w) + λ1·||w||₁ + (λ2/2)·||w||₂²

Both λ1 and λ2 control model complexity

  • λ1 trades off sparsity and performance
  • λ2 adds strong convexity & improves convergence
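Continuing the sketch above (same caveats), the penalized objective simply adds the two terms to the pairwise loss; λ1 and λ2 are tuning knobs, not values reported in the talk:

def elastic_penalty(w, lam1, lam2):
    # lam1 * ||w||_1  +  (lam2 / 2) * ||w||_2^2
    return lam1 * float(np.abs(w).sum()) + 0.5 * lam2 * float(np.dot(w, w))

def elastic_rank_loss(X, rel, w, lam1, lam2):
    # Regularized objective L*(X, w) from the slide.
    return pairwise_rank_loss(X, rel, w) + elastic_penalty(w, lam1, lam2)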

SLIDES 27–30 (progressive build)

Optimization Algorithms

Extensions of Stochastic Gradient Descent

FOBOS: Forward-Backward Splitting (Duchi & Singer, 2009)

  • 1. Gradient step
  • 2. Proximal step involving the regularization

RDA: Regularized Dual Averaging (Xiao, 2010)

  • Keeps a running average of all past gradients
  • Solves a proximal step using the average

pSGD: Pruned Stochastic Gradient Descent

  • Prunes every k gradient steps: if |wi| < θ, set wi = 0
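The three update rules can be sketched as follows for the elastic-net penalty (my simplification; the cited papers use more general step-size schedules and proximal terms):

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1: shrink each coordinate toward zero.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fobos_step(w, g, eta, lam1, lam2):
    # Forward (gradient) step, then backward (proximal) step on the penalty.
    v = w - eta * g
    return soft_threshold(v, eta * lam1) / (1.0 + eta * lam2)

def rda_step(g_avg, t, gamma, lam1, lam2):
    # Closed-form minimizer of  <g_avg, w> + lam1*||w||_1
    #   + (lam2/2 + gamma/(2*sqrt(t))) * ||w||_2^2,   for t = 1, 2, ...
    return -soft_threshold(g_avg, lam1) / (lam2 + gamma / np.sqrt(t))

def psgd_step(w, g, eta, t, k, theta):
    # Plain SGD step; every k steps, prune small weights to exactly zero.
    w = w - eta * g
    if t % k == 0:
        w = np.where(np.abs(w) < theta, 0.0, w)
    return w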

SLIDE 31

Hyper-parameter Optimization

  • Turn-key inference
  • Automatic adjustment of hyper-parameters
  • Bayesian approach (Snoek, Larochelle, Adams; 2012)
  • Gaussian Processes
  • Thompson Sampling
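As a rough illustration of the GP + Thompson sampling loop (my sketch using scikit-learn, not the talk's implementation; validation_recall is a hypothetical function that trains a model with the given λ1 and returns its validation Recall@1):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

candidates = np.linspace(-6, 0, 50).reshape(-1, 1)  # grid over log10(lambda_1)
observed_x, observed_y = [], []

for step in range(20):
    if observed_x:
        gp = GaussianProcessRegressor().fit(np.array(observed_x),
                                            np.array(observed_y))
        # Thompson sampling: draw one function from the GP posterior
        # and act greedily with respect to that single draw.
        draw = gp.sample_y(candidates, n_samples=1, random_state=step).ravel()
        idx = int(np.argmax(draw))
    else:
        idx = np.random.randint(len(candidates))  # no data yet: pick at random
    log_lam = candidates[idx, 0]
    observed_x.append([log_lam])
    observed_y.append(validation_recall(10.0 ** log_lam))  # hypothetical
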
SLIDE 32

LETOR Experiments

ElasticRank is comparable with state-of-the-art models

[Bar chart: NDCG@5 (0.1–0.6) on OHSUMED, TD2003, and TD2004 for Logistic Regression, RankSVM, ListNet, and ElasticRank]

SLIDE 33

Amazon.com Experiments

Experimental Setup

  • # examples: N ≈ millions
  • # features: thousands (millions of dimensions)
  • Purchase logs from a contiguous time interval, split chronologically: training 1/11, validation 1/11, testing 9/11

SLIDE 34

Experimental Results

ElasticRank performs best

[Bar chart: Recall@1 for Logistic Regression, RankSVM, ElasticRank (pSGD), ElasticRank (FOBOS), and ElasticRank (RDA)]

SLIDE 35

Sparsity vs. Performance

RDA achieves the best trade-off

[Plot: Recall@1 (0.26–0.305) vs. number of weights (1–1024, log scale) for pSGD, FOBOS, and RDA]

SLIDE 36

Prediction Time

  Number of weights:      4       29      1804
  Prediction time:     6.2 μs   8.7 μs   10.9 μs
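Scoring a candidate with a sparse linear model touches only the nonzero weights, which is why latency tracks the weight count rather than the full feature dimensionality; a minimal sketch (my illustration):

def score(active_weights, features):
    # active_weights: {feature index: nonzero weight} of the sparse model.
    # features: {feature index: value} for one candidate product.
    # Cost is proportional to the number of nonzero weights, not to the
    # multi-million-dimensional feature space.
    return sum(w * features.get(i, 0.0) for i, w in active_weights.items())

# e.g. score({7: 0.3, 42: -1.2}, {7: 1.0, 9: 5.0}) == 0.3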

SLIDE 37

Contributions

How to learn ranking functions that are:

  • Trained in a single pass
  • Small in memory footprint
  • Sparse

WITHOUT sacrificing performance

SLIDE 38

References

  • C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems (NIPS), 2006.
  • J. C. Duchi and Y. Singer. Efficient online and batch learning using forward backward splitting. Journal of Machine Learning Research (JMLR), 2009.
  • L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research (JMLR), 2010.
  • J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (NIPS), 2012.

SLIDE 39

One-Pass Ranking Models for 
 Low-Latency Product Recommendations

Martin Saveski

@msaveski MIT (Amazon Berlin)