Ranking article comments using reinforcement learning - Lester Solbakken - PowerPoint PPT Presentation



SLIDE 1

Ranking article comments using reinforcement learning

Lester Solbakken | October 28th 2019 vespa.ai

SLIDE 2

SLIDE 3

Encourages meaningful discussion?

SLIDE 4

SLIDE 5

Vespa at

Hundreds of Vespa applications (Flickr, Tumblr, TechCrunch, Huffington Post, Aol, Gemini, Engadget, Yahoo News Sports Finance Mail etc.):

  • serving over a billion users,
  • hundreds of thousands of queries per second,
  • billions of content items.

  • Personalized article recommendations
  • Personalized real-time native ads selection
  • Searching 20+ billion images
  • Select comments using neural nets and reinforcement learning

SLIDE 6

Vespa team

Around 30 developers in Trondheim, Norway

1998: Fast Search & Transfer (alltheweb.com)
2004: Overture / Yahoo
2017: Oath, Vespa Open Source
2019: Verizon Media Group

SLIDE 7

Baseline - existing solution

Comments found on many Yahoo properties such as Yahoo Finance, Yahoo News, and Yahoo Sports

  • ~1 billion comments stored
  • ~12,000 queries per second
  • 2x that for updates

Some articles have > 100,000 comments!

https://blog.vespa.ai/post/182759620076/serving-article-comments-using-reinforcement

SLIDE 8

Potential features

Wilson score*: probability of comment being overwhelmingly liked by all users

(*) Zhang et al. 2011. How to Count Thumb-Ups and Thumb-Downs: User-Rating Based Ranking of Items from an Axiomatic Perspective.

Community: how users interacted with the comment
Comment: relevance to topic, moderation (Conversation AI, https://conversationai.github.io)
Author: reputation
User: preferences
Other: time
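The Wilson score above can be sketched as the lower confidence bound on a comment's thumbs-up fraction; a minimal Python version using the standard Wilson interval (an illustration, not the paper's exact axiomatic formulation):

```python
import math

def wilson_lower_bound(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the thumbs-up fraction.

    More votes tighten the interval, so 90/100 likes outranks 9/10 likes.
    """
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)
```

For example, wilson_lower_bound(90, 10) ≈ 0.83 while wilson_lower_bound(9, 1) ≈ 0.60, even though both have a 90% like rate.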

SLIDE 9

Previous ranking algorithm

Community, comment, author, user, and other features → hardcoded weighting → final score

SLIDES 10-13 (progressive build of the question-and-answer table completed on SLIDE 14)

SLIDE 14

Scoring: How should features be combined intelligently? → Neural network over comment features
Ranking: How can we overcome position bias? → Exploration with sampling
Learning: How do we learn directly from user behavior? → Reinforcement learning with dwell time rewards

SLIDE 15

Reinforcement learning in general

RL is a general-purpose framework for artificial intelligence

  • RL is for an agent with the capacity to act
  • Each action influences the agent’s future state
  • Success is measured by a scalar reward signal
  • Select actions to maximise future reward

SLIDE 16

Contextual bandits

Multi-armed bandits with context. The reward r is conditioned on the chosen action, so feedback is partial. Canonical example: ad serving.

Policy: features x → score v = f(x) → action a

(Source: Microsoft Research)
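As a concrete illustration of such a policy (my sketch, not the production system), an ε-greedy rule over the scores v = f(x): exploit the top-scoring action most of the time, explore uniformly otherwise:

```python
import random

def epsilon_greedy(scores, epsilon=0.1, rng=random):
    """Pick an action index: explore uniformly with probability epsilon, else argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(scores))
    return max(range(len(scores)), key=lambda i: scores[i])
```

With epsilon=0 this is pure exploitation; raising epsilon trades immediate reward for the feedback needed to learn about other actions.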

SLIDE 17

Contextual bandits in ranking

Sometimes called contextual semibandits*. The policy chooses a ranking, not a single action. Importance weighted sampling is used to construct unbiased estimates for rewards.

Policy: features x → score v = f(x) → ranking

(*) Krishnamurthy, Agarwal, Dudík 2016. Contextual Semibandits via Supervised Learning Oracles.

SLIDES 18-20 (progressive build of the scoring pipeline completed on SLIDE 21)

SLIDE 21

Scoring

Comment → features (community, comment, author, user, other) → model → positive score

SLIDES 22-23 (progressive build of the ranking pipeline completed on SLIDE 24)

SLIDE 24

Ranking

Comments → scores → sampling → ranking
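The sampling step can be sketched as drawing a full ranking from a Plackett-Luce distribution over the scores via the Gumbel trick, together with the log-probability that would be logged as p (my illustration, not the production code):

```python
import math
import random

def sample_ranking(scores, temperature=1.0, rng=random):
    """Sample a ranking: argsort of score/T plus Gumbel noise.

    Low temperature approaches the greedy sort; high temperature explores more.
    """
    keyed = []
    for i, s in enumerate(scores):
        gumbel = -math.log(-math.log(rng.random()))
        keyed.append((s / temperature + gumbel, i))
    return [i for _, i in sorted(keyed, reverse=True)]

def ranking_log_prob(scores, ranking, temperature=1.0):
    """Plackett-Luce log-probability of a ranking (what gets logged as p)."""
    logp = 0.0
    remaining = list(ranking)
    while remaining:
        logits = [scores[i] / temperature for i in remaining]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        logp += logits[0] - log_z
        remaining.pop(0)
    return logp
```

Sampling rather than sorting deterministically is what lets lower-scored comments occasionally appear on top, which is how position bias is overcome.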

SLIDES 25-27 (repeat of SLIDE 24)

SLIDES 28-31 (progressive build of the learning loop completed on SLIDE 32)

SLIDE 32

Learning

Model → rankings → reward → gradient ascent in the direction of expected reward (can use any reward)
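The gradient-ascent step can be illustrated with a minimal REINFORCE update (a toy sketch, not the production trainer): a softmax policy over linear scores, nudged along the reward-weighted gradient of the log-probability:

```python
import math
import random

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(w, items, reward_fn, lr=0.5, rng=random):
    """One REINFORCE update: sample an item via softmax(w.x), ascend r * grad log pi."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in items]
    probs = softmax(scores)
    u, action, acc = rng.random(), len(items) - 1, 0.0
    for i, p in enumerate(probs):
        acc += p
        if u <= acc:
            action = i
            break
    r = reward_fn(action)
    # Gradient of log softmax at the chosen action: x_a - E_p[x]
    grad = [items[action][k] - sum(probs[i] * items[i][k] for i in range(len(items)))
            for k in range(len(w))]
    return [wk + lr * r * gk for wk, gk in zip(w, grad)]
```

With reward 1 for one item and 0 for the rest, repeated steps drive the policy toward always ranking that item first; any scalar reward (e.g. dwell time) slots in the same way.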

SLIDE 33

Cold start: pre-train neural network to emulate previous ranking

  • Gradient ascent with Kendall’s tau coefficient as reward

Off-policy evaluation: interactions are logged as (x, a, r, p), where p is the policy’s probability of choosing a given x.

  • Inverse-Propensity Scoring* estimates the average reward of one policy from data collected by another policy

Bootstrapping and testing

(*) Peter C. Austin. 2011. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies.
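The IPS estimator over the (x, a, r, p) logs can be sketched as follows (a minimal illustration; `target_prob` is a hypothetical callback returning the evaluated policy's probability of the logged action):

```python
def ips_estimate(logs, target_prob):
    """Inverse-Propensity Scoring: unbiased off-policy estimate of average reward.

    logs: iterable of (x, a, r, p) tuples, where p is the logging policy's
    probability of choosing action a given context x.
    """
    logs = list(logs)
    total = 0.0
    for x, a, r, p in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action than the logger was.
        total += r * target_prob(x, a) / p
    return total / len(logs)
```

Because the logging policy sampled with known probability p, dividing by p corrects for the mismatch between the two policies, so a new model can be evaluated before it ever serves live traffic.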

SLIDES 34-36 (progressive build of the solution diagram completed on SLIDE 37)

SLIDE 37

Elements of a solution

Comments → scoring model → ranking. Reward instrumentation logs (r) and the ranker logs (x, a, p) into a distributed DB; machine learning consumes (x, a, r, p) to update the scoring model.

SLIDE 38

Implementation

Presentation → comment processing → Vespa (create, update; feed rankings). Votes (r) and logged (x, a, p) are collected in Hadoop, where TensorFlow trains the scoring model.

SLIDE 39

Vespa

A platform for low latency computations over large, evolving data sets:

  • Search and filter over structured and unstructured data
  • Query time organization and aggregation of matching data
  • Real-time writes
  • Advanced relevance scoring with tensors as first class citizens*
  • Scalable and fast
  • Elastic and fault tolerant
  • Pluggable
  • Easy to operate

Typical use cases: text search, personalization, recommendation, targeting, real-time data display

(*) https://github.com/jobergum/dense-vector-ranking-performance

SLIDE 40

Vespa as comment serving system

Scalable and fast:

  • About 1 billion comments / ~12,000 queries per second
  • Read latency 7 ms for 10k comments, including model evaluation
  • Write latency ~1 ms

  • Direct deployment of ML scoring models
  • Advanced computation framework for complex features
  • Custom logic for implementing sampling and logging
  • Hosted for simpler architecture*

(*) https://vespa.ai/cloud

SLIDE 41

Scalable low latency execution

An application package (configuration, components, ML models) is deployed via the admin & config cluster. A container node receives the query and scatter-gathers over content nodes, which shard the data (core sharding) and evaluate the models locally.

How to bound latency:
1) Parallelization
2) Prepared data structures (indexes etc.)
3) Move execution to data nodes

SLIDE 42

Deploying ML models to Vespa

placeholder → matmul(weights) → add(bias) → relu, expressed as a Vespa tensor function:

map(
    join(
        reduce(
            join(placeholder, weights, f(x,y)(x * y)),
            sum, d1
        ),
        bias, f(x,y)(x + y)
    ),
    f(x)(max(0,x))
)

1. Model in application package
2. Download model from external source during (re-)deployment
3. Feed model weights as tensors
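The tensor expression on this slide computes relu(x·W + b); a plain-Python rendering of the same dense layer (my sketch of the shape of the computation, not Vespa's actual evaluator):

```python
def dense_relu(x, W, b):
    """relu(x.W + b): join(x, W, multiply) -> reduce(sum, d1) -> add bias -> relu.

    x: length-d1 input vector; W: d1 x d2 weight matrix; b: length-d2 bias.
    """
    d2 = len(b)
    out = []
    for j in range(d2):
        # join + reduce over the shared dimension d1, then add bias
        s = sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
        # map with f(x)(max(0,x)) is the relu
        out.append(max(0.0, s))
    return out
```

The join/reduce pair is the matmul, the second join adds the bias, and the outer map applies the activation.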

SLIDE 43

Deployment strategy

A traffic splitter routes users between production and an experimental bucket for A/B testing; the scoring model is frozen for the duration of the test.

SLIDE 44

Results and ongoing work

~25% increase in time spent. Experimenting with:

  • more features and larger neural networks
  • personalized comment ranking
  • more sophisticated rewards

SLIDE 45

Generalizing the implementation

Presentation → content processing → Vespa (feed rankings), with external content flowing in; (r) and (x, a, p) are logged to a distributed DB for machine learning.

Applications: search, news recommendation, product recommendation, ad selection, Q&A, +++

SLIDE 46

Thanks to

Verizon Media Engineering: Sreekanth Ramakrishnan, Aaron Nagao, Zhi Qu, Xue Wu
Verizon Media Science: Akshay Soni, Kapil Thadani

SLIDE 47

Thank you!

https://vespa.ai/cloud vespa.ai