Deep Learning-based Search and Recommendation systems using TensorFlow (PowerPoint PPT Presentation)


SLIDE 1

SLIDE 2

Deep Learning-based Search and Recommendation systems using TensorFlow

Strata Conference – San Jose ( 2018 )

MARCH 06, 2018

Abhishek Kumar
Dr. Vijay Agneeswaran

SAPIENTRAZORFISH
SLIDE 3

Session Logistics

1. Access the work environment using the following link [ ]
2. Code and presentation available at : [ http://bit.ly/strata-dl-ca-2018 ]
3. Connecting to the speakers [ please send an introductory note with your LinkedIn invite ]
  1. Abhishek Kumar ( http://bit.ly/kumarabhishek, @meabhishekkumar )
  2. Dr. Vijay Agneeswaran ( http://bit.ly/vijaysa, @a_vijaysrinivas )
4. Don't forget to tweet #stratadata

SLIDE 4

About the Speaker

ABHISHEK KUMAR

Senior Data Scientist, SapientRazorfish
Masters from University of California, Berkeley
Pluralsight Author
  • Doing Data Science with Python
  • R Programming Fundamentals
  • Machine Learning with ENCOG
  • Currently authoring : "Deploying Machine Learning Models with Tensorflow Serving"

SLIDE 5

About the Speaker

DR. VIJAY AGNEESWARAN

Senior Director and Head of Data Science, SapientRazorfish
MS (Research) & PhD, IIT Madras
Post-doctoral research fellowship, LSIR Labs
Professional member : ACM, IEEE (Senior)
4 full US patents and multiple publications (including IEEE journals)
Regular speaker @ O'Reilly Strata conference

SLIDE 6

Audience Profiling

  • 1. Machine Learning?
  • 2. Deep Learning?
  • 3. Search and Recommendation Systems?
  • 4. Tensorflow?

SLIDE 7

Session Agenda

4 Levels of Learning :

FOUNDATION [ 30 MINS ]
1. High-level overview of the problem space [ Search, Recommendations, Learning to Rank ]
2. Deep Learning Primer
3. Why Tensorflow for Deep Learning ?

SEARCH [ 1 HR ]
1. Embedding
2. Demo : Embedding in TF
3. Image Search using ConvNet
4. Demo : Image Search in TF

RECOMMENDATION AND LTR [ 1 HR ]
1. Embedding in RecSys
2. Demo : Build DL-based RecSys with Explicit Feedback using TF
3. Demo : Build hybrid RecSys using TF
4. Learning to Rank
5. Demo : Build DL-based RecSys with Implicit Feedback using TF

PRODUCTION [ 30 MINS ]
1. TF in Production : Training & Inference
2. Tensorflow Serving
3. RecSys Architecture

BREAK

SLIDE 8

By the end of this session…

  • 1. You will have a basic foundation in Deep Learning.
  • 2. You will have a good understanding of Recommendation, Search, and Ranking systems.
  • 3. You will be able to translate the concepts and build DL models using Tensorflow :
    1. Deep learning-based image retrieval system
    2. Deep learning-based hybrid RecSys on explicit feedback
    3. Deep learning-based RecSys and Learning to Rank model on implicit feedback
  • 4. You will have a high-level idea of how to take a lab-scale solution to a production-ready system.

SLIDE 9

Quick Chat with Your Neighbor

  • 1. Introduce yourself to your neighbor
  • 2. Ask what they are looking to learn from the tutorial

SLIDE 10

Problem Space : Search and Recommendation

SLIDE 11

*Adapted from Forrester Research "Age of the Customer" graphic

Manufacturing Economy ( 1900 )
Mass manufacturing makes industrial powerhouses successful. "A customer can have a car painted any color he wants as long as it's black."

Distribution Economy ( 1960 )
Global connections and transportation systems make distribution key. "Strategy is…globalization, taking your products around the world; be the low-cost producer."

Information Economy ( 1990 )
Connected PCs and supply chains mean those that control information flow dominate. "The great challenge… is to make productive the tremendous new resource, the knowledge worker."

Connected Economy ( 2008+ )
iPhone and Facebook launch in 2007/8, heralding a new era in transparency, empowerment, and experimentation. "The customer is the center of your universe."

We now live in the connected age

SLIDE 12

The Connected Consumer is in charge

  • Empowered and demand transparency
  • Demand personalization and real-time relevancy
  • Value experiences over most things
  • Embrace and seek new companies to engage with

SLIDE 13

Research proves that consumer experience does matter

  • 86% of customers said that personalization has had some impact on their purchasing decision
  • 75% of shopping time is spent on product discovery & research online, by 50% of customers
  • 95% of data within organizations remains untapped
  • 81% of customers demand improved response time

Source : http://www.nextopia.com/wp-content/uploads/2015/01/personalization-ecommerce-infographic.png https://blog.hubspot.com/blog/tabid/6307/bid/23996/Half-of-Shoppers-Spend-75-of-Time-Conducting-Online-Research-Data.aspx http://possible.mindtree.com/rs/574-LHH-431/images/Mindtree%20Shopper%20Survey%20Report.pdf http://www.getelastic.com/using-big-data-for-big-personalization-infographic/

SLIDE 14

Problem Space : Search

[ Diagram : Search term → Search Engine → Filtered Results → Re-Ranked Results ( indexing, similarity calculation, recency, impression discounting ) ; personalized search adds user + interactions ]

Search Engine Challenges

  • How to represent text, images, audio?
    • TF-IDFs?
    • Metadata for binary?
  • Search in other languages?
  • Search quality
    • Well-ranked results
  • By providing better search results, Netflix estimates that it is avoiding canceled subscriptions that would reduce its revenue by $1B annually. [ Link ]
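One of the representation options listed above, TF-IDF, is simple enough to sketch in plain Python (a toy scorer over tokenized documents; real engines compute this over an inverted index):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a small corpus of tokenized documents."""
    n = len(docs)
    # document frequency: in how many docs each term appears
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return weights

docs = [["black", "sneakers"], ["white", "sneakers"], ["black", "holes"]]
w = tf_idf(docs)
# "sneakers" appears in 2 of the 3 docs, so it is down-weighted vs "white"
```

Terms shared by many documents get a low inverse-document-frequency and therefore contribute less to the similarity score.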
SLIDE 15

Problem Space : Recommendation

[ Diagram : User + other users' interactions → Recommendation Engine → Recommended Results → Re-Ranked Results ( results diversity, recency, impression discounting, item interactions ) ]

Challenges

  • How to represent users and items?
  • How to build hybrid systems with both interactions ( collaborative ) and user/item metadata?
  • How to use dynamic user behaviors?
  • How to use implicit ( view, share ) feedback?

SLIDE 16

Why Deep Learning for Search and Recommender Systems?

  • Direct content feature extraction instead of metadata
    • Text, Image, Audio
  • Better representation of users and items for RecSys
  • Hybrid algorithms and heterogeneous data can be used
  • Better suited to model dynamic behavioral patterns and complex feature interactions

SLIDE 17

Deep Learning Primer

SLIDE 18

What is Deep Learning ?

Class of machine learning algorithms
  • that uses a hierarchy of non-linear processing layers and complex model structures
  • layers learn different representations of the data
  • higher-level features are constructed from lower-level abstract features
  • Trendy name for "Neural Networks with deep layers"

[ Diagram : Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning ; enabled by Big Data and enabling tech ]

SLIDE 19

Simple Neural Network With 2 Layers

INPUT LAYER → OUTPUT LAYER
Limitation : can learn only linear relationships

SLIDE 20

Simple Neural Network with At Least One Hidden Layer

INPUT LAYER → HIDDEN LAYER → OUTPUT LAYER
Universal Approximator

SLIDE 21

Neural Network Training : Backpropagation

INPUT LAYER → HIDDEN LAYER → OUTPUT LAYER
Steps : output calculation → error calculation → error propagation and weight updates ( using gradients )
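The three steps above (output calculation, error calculation, error propagation with gradient-based weight updates) can be sketched end-to-end in NumPy on the classic XOR task; the layer sizes, learning rate, and iteration count are illustrative choices, not tuned values:

```python
import numpy as np

# A 1-hidden-layer network trained with backpropagation on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.1
for _ in range(10000):
    # 1. output calculation (forward pass)
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # 2. error calculation (cross-entropy gradient at the output)
    dp = p - y
    # 3. error propagation and weight updates ( using gradients )
    dW2, db2 = h.T @ dp, dp.sum(0)
    dh = (dp @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```

After training, the predictions `p` separate the XOR classes, which the 2-layer network on the previous slide cannot do.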

SLIDE 22

What Changed Now ?

  • More Data
    • More complex models need more data to avoid overfitting
    • Deep learning models have higher VC dimension
  • Computing Power
    • Computing power has increased significantly
    • Specialized hardware such as GPUs and TPUs
  • Research Breakthroughs
    • Hinton's work on layerwise training led to a new paradigm for training deep networks
    • Non-saturating activation functions ( variations of ReLU )
    • Dropout helped to achieve regularization easily
    • Adaptive learning rates helped to avoid problems of local minima and led to better convergence

SLIDE 23

Popular Neural Network Architectures : Deep Feedforward

INPUT LAYER → MULTIPLE HIDDEN LAYERS → OUTPUT LAYER

SLIDE 24

Popular Neural Network Architectures : Convolutional Neural Network ( ConvNet )

Image Credit : https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2

SLIDE 25

Convolutional Neural Network ( ConvNet ) : Components

CONVOLUTION | NON-LINEARITY | POOLING

Convolution : a mathematical operation on two sets of information
Input → Filter / Kernel → Feature Map

Image Credit : https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2

SLIDE 26

Convolutional Neural Network ( ConvNet ) : Components

CONVOLUTION | NON-LINEARITY | POOLING

Non-linearity : a non-linear operation on the input
ReLU ( Rectified Linear Unit ) : f(x) = max(0, x)

  • Captures interactions
    • E.g. Input : 3 * x1 + 4 * x2, Output : f(Input)
  • Introduces non-linearity
    • Slope is not constant ( zero for negative values, 1 for positive )
  • Reduces the chances of vanishing gradients
    • Average derivative rarely becomes 0 ( some data points have a positive derivative )

Image Credit : https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
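The ReLU behavior described above is a one-liner; a NumPy sketch of the function and its slope:

```python
import numpy as np

# ReLU and its slope: zero for negatives, identity (slope 1) for positives.
def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
relu(x)       # [0, 0, 0, 1.5, 3]
relu_grad(x)  # [0, 0, 0, 1, 1]
```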

SLIDE 27

Convolutional Neural Network ( ConvNet ) : Components

CONVOLUTION | NON-LINEARITY | POOLING

Pooling : downsample the feature map to reduce dimensionality ( e.g. Max Pool )

Image Credit : https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2
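Max pooling itself needs no framework; a NumPy sketch of 2x2 pooling with stride 2 over a 4x4 feature map:

```python
import numpy as np

# 2x2 max pooling with stride 2: keep the maximum of each 2x2 block.
def max_pool_2x2(fm):
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [8, 2, 7, 1],
               [0, 1, 3, 9]])
max_pool_2x2(fm)  # [[6, 5], [8, 9]]
```

The 4x4 map becomes 2x2, which is exactly the dimensionality reduction the slide describes.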

SLIDE 28

Popular Neural Network Architectures : LSTM ( Long Short Term Memory )

  • Special kind of Recurrent Neural

Network ( RNN )

  • Can learn long-term dependencies

( as default behavior )

  • Use gates
  • Forget gate
  • Input gate
  • Output gate

Source : http://colah.github.io/posts/2015-08-Understanding-LSTMs/
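For reference, the three gates listed above can be written out compactly (the standard LSTM formulation, following the linked post; the W and b are learned weights and biases, and ⊙ is elementwise multiplication):

```latex
f_t = \sigma(W_f [h_{t-1}, x_t] + b_f) \quad \text{(forget gate)}
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i) \quad \text{(input gate)}
\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o) \quad \text{(output gate)}
h_t = o_t \odot \tanh(C_t)
```

The additive cell-state update for C_t is what lets gradients flow across many time steps, giving the long-term dependency learning mentioned above.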

SLIDE 29

Popular Neural Network Architectures : RBM (Restricted Boltzmann Machine)

  • Shallow network
  • Can be used for unsupervised learning
  • Reconstructs its input
  • Belongs to the auto-encoder family
  • Useful in collaborative filtering, dimensionality reduction

[ Diagram : Visible Layer ( Input Layer ) ↔ Hidden Layer ]

SLIDE 30

Popular Neural Network Architectures : Deep Belief Networks

  • Boltzmann Machine is a specific energy model with a linear energy function.
  • A deep neural network composed of multiple layers of latent variables ( hidden units or feature detectors )
  • Can be viewed as a stack of RBMs
  • Hinton, along with his student, proposed that these networks can be trained greedily, one layer at a time

SLIDE 31

Popular Neural Network Architectures : Auto-Encoders


  • The aim of an auto-encoder network is to learn a compressed representation for a set of data
  • Unsupervised learning algorithm that applies backpropagation, setting the target values equal to the inputs ( identity function )
  • A denoising auto-encoder addresses the identity function by randomly corrupting the input, which the auto-encoder must then reconstruct or denoise
  • Best applied when there is structure in the data
  • Applications : dimensionality reduction, feature selection

SLIDE 32

Why Tensorflow for Deep Learning?

SLIDE 33

Why Tensorflow for Deep Learning ? Versatility

  • Auto-differentiation & first-rate optimizers
  • Mathematical functions
  • Assemble neural networks
  • Extensive built-in support

SLIDE 34

Why Tensorflow for Deep Learning ?

Align cognitive model to programming model

SLIDE 35

Why Tensorflow for Deep Learning ?

Use Tensorboard to visualize and debug deep learning networks : network graph, embedding visualization, cost trends

SLIDE 36

Why Tensorflow for Deep Learning ?

Tensorflow Playground : http://playground.tensorflow.org/

SLIDE 37

Deep Learning in Search

SLIDE 38

Representation : A Key Aspect

[ Diagram : Search term → Search Engine → Filtered Results → Re-Ranked Results ; personalized search adds user + interactions ]

Search or Query | Collection
Word            | Set of Documents
Phrase          | Set of Documents
Sentence        | Set of Documents
Image           | Set of Images
Audio           | Set of Audio

Both query and collection are mapped to a Representation ; matching is done by Similarity.

SLIDE 39

Representation : A Key Challenge

Word / sentence queries against document collections ; image queries against image collections. Treat "Representation" as the problem of "Embedding" : encode objects ( text, images ) into a continuous space ( a set of numeric values ).

SLIDE 40

Search Problem In Context of Embedding

1. Create embeddings for objects in a continuous space ( e.g. Shoes, Sneakers, "How to tie laces ?", Black holes, "How time changes in black hole ?" )
2. Put similar objects together, based on their embeddings
3. Given a query embedding, find its neighbors quickly

SLIDE 41

Word Embedding : One-Hot Encoding

Example : "sneakers" → [ shoe, sneakers, tree, book, black, … ] = [ 0, 1, 0, 0, 0, … ]
Number of columns = number of unique words in the vocabulary

Issues :
  • Sparse ( all values are zero except one )
  • Large embedding dimension ( equal to vocab size )
  • Semantic meaning not captured
    • "shoe" is at the same distance from "sneakers" as from "tree"
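The distance issue called out above can be verified directly; a NumPy sketch with a toy vocabulary:

```python
import numpy as np

# One-hot vectors for a toy vocabulary: every pair of distinct words is
# equally far apart, so "shoe" is as distant from "sneakers" as from "tree".
vocab = ["shoe", "sneakers", "tree", "book", "black"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

d = np.linalg.norm  # Euclidean distance of the difference vector
d(one_hot["shoe"] - one_hot["sneakers"])  # sqrt(2)
d(one_hot["shoe"] - one_hot["tree"])      # sqrt(2) as well
```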

SLIDE 42

Word Embedding : Prediction-Based Encoding

Example : "sneakers" → [ 0.322 0.122 0.231 0.111 0.222 … 0.445 ] ( embedding size : d )

Benefits :
  • Dense representation
  • Smaller embedding dimension ( equal to embedding size : d )
  • Semantic meaning captured
    • "shoe" is at a smaller distance than "tree"

Word2Vec Model ( Google, 2013 )
  • CBOW : use surrounding words to predict the target word
  • Skip-Gram : use the target word to predict surrounding words

GloVe Model ( Stanford, 2014 )
  • Uses a word-word co-occurrence matrix and nearest neighbors to create embeddings
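The "smaller distance" claim above is usually measured with cosine similarity; a NumPy sketch with toy 4-dimensional vectors (not real Word2Vec or GloVe weights):

```python
import numpy as np

# Cosine similarity over dense embeddings. The vectors are made-up toy
# values chosen so that "shoe" and "sneakers" point in similar directions.
emb = {
    "shoe":     np.array([0.9, 0.1, 0.0, 0.3]),
    "sneakers": np.array([0.8, 0.2, 0.1, 0.4]),
    "tree":     np.array([0.0, 0.9, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cosine(emb["shoe"], emb["sneakers"])  # high similarity
cosine(emb["shoe"], emb["tree"])      # low similarity
```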

SLIDE 43

Demo : Short Introduction to Embedding

Goal :
  • Embedding in Tensorflow
  • Word embedding using a GloVe pre-trained model
SLIDE 44

Sentence Embedding

"I like white sneakers."

  • A sentence embedding can be considered a word-embedding matrix where each word is represented as a column of size d
  • Leverage pre-trained word embeddings to learn the sentence embedding
  • Weights are further tuned during the training process

[ Diagram : columns "I", "Like", "White", "Sneakers" ( x1 … x4 ), each of embedding size d ]
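The word-embedding-matrix view above can be sketched with NumPy; random vectors stand in for the pre-trained embeddings:

```python
import numpy as np

# A sentence as a word-embedding matrix: one column of size d per word.
# Toy random vectors stand in for pre-trained (e.g. GloVe) embeddings.
d = 5
rng = np.random.default_rng(42)
vocab = {w: rng.normal(size=d) for w in ["i", "like", "white", "sneakers"]}

sentence = "I like white sneakers".lower().split()
matrix = np.stack([vocab[w] for w in sentence], axis=1)
matrix.shape  # (5, 4): embedding size d by sentence length
```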

SLIDE 45

Sentence Embedding Using Convolution Neural Network

Paper : "Convolutional Neural Networks for Sentence Classification" by Yoon Kim [ Link ]

SLIDE 46

Image Embedding : Option 1 : Flattened Arrays

Each image is a set of pixels ; flatten the pixel matrix into an array. A 100 px by 100 px image becomes a 10,000-dimensional array.

Issues :
  • Very large embedding dimension
  • Search in a very large embedding space will be very expensive
  • Spatial features ( edges, contours, textures ) are not captured : poor search results
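Option 1 is a single reshape; a NumPy sketch showing why the dimension explodes:

```python
import numpy as np

# Flatten a 100x100 grayscale image into a 10000-dimensional array.
# Which pixels were neighbors is lost, so spatial structure disappears.
image = np.zeros((100, 100))
flat = image.flatten()
flat.shape  # (10000,)
```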

SLIDE 47

Image Embedding : Option 2 : Pre-Trained Deep Learning Models

Benefits :
  • Smaller embedding dimension
  • Spatial features ( edges, contours, textures ) are captured in intermediate layers
  • Enhanced search experience

Networks : LeNet, AlexNet, VGG, GoogLeNet / Inception, ResNet

Image Source : https://www.saagie.com/blog/object-detection-part1

SLIDE 48

Demo : Image Search using an AlexNet Pre-Trained Model

Goal :
  • Use an AlexNet pre-trained model to create image embeddings
  • Image search
SLIDE 49

Deep Learning in Recommendation System

SLIDE 50

RecSys 101 : What is RecSys?

"Serve the relevant items to users in an automated fashion to optimize short- and long-term business objectives"

RELEVANT ( WHAT ) : 1. Novelty  2. Serendipity  3. Diversity
AUTOMATED ( HOW ) : 1. No manual intervention  2. Scale up
BUSINESS OBJECTIVES ( WHY ) :
  1. Short-term : high clicks, revenue, positive explicit ratings
  2. Long-term : increased engagement, increase in social actions, increase in subscriptions

SLIDE 51

RecSys 101 : Internals

What : item filtering. Who : understanding user profile, intent. Context : interactions.
Score Items : P ( click ), P ( share ) using Machine Learning
Rank Items : sort by business objective

Content Based

SLIDE 52

RecSys 101 : Content Based Recommendation

Recommends an item to a user based upon a description of the item and a profile of the user’s interests

Representing items using features :

Feature columns : Drama, Arty, Comedy, Action, …, Commercial, with example scores ( 0.7, 0.2, 0.8 )

User Profile

Creating a user profile that describes the types of items the user likes/dislikes

SLIDE 53

RecSys 101 : Content Based Recommendation

  • More than 100 million monthly active users
  • Over 30 million songs

SLIDE 54

RecSys 101 : Content Based Recommendation

Pros
  • No need for other users' data
  • Easy to understand the reason behind a recommendation
  • Capable of recommending new and unknown items

Cons
  • Can only be effective in limited circumstances
  • No suitable suggestions if the content doesn't have enough information
  • Depends entirely on previously selected items, and therefore cannot make predictions about users' future interests

SLIDE 55

RecSys 101 : Internals

What : item filtering. Who : understanding user profile, intent. Context : interactions.
Score Items : P ( click ), P ( share ) using Machine Learning
Rank Items : sort by business objective

Collaborative

SLIDE 56

RecSys 101 : Collaborative Filtering

Unlike Content based filtering , Collaborative Filtering doesn’t require any product description at all

Which movie should I watch ? Which Restaurant to go ? Which book to read next ?

I should ask my “FRIENDS” !

SLIDE 57

RecSys 101 : Collaborative Filtering : Interactions / Feedback

[ Diagram : User ↔ Item interactions ]

Explicit : Ratings
Implicit : Purchased, Add to cart, Viewed, Shared

SLIDE 58

RecSys 101 : Collaborative Filtering : Interactions / Feedback

Explicit :
  • Very few users leave ratings, so very little explicit data
  • Ratings are biased
  • Often not easy for users to express likeness in terms of ratings or scores

Implicit :
  • Easy to track & store via web log data
  • Lots of implicit data generated for each user ; the more the data, the better the recommendations
  • Noisy
  • Difficult to infer negative feedback

SLIDE 59

RecSys 101 : Collaborative Filtering

Nearest Neighbor :
  • User-based : find users similar to me and recommend what they liked
  • Item-based : find items similar to those I have previously liked

Latent Factor :
  • Factor-based techniques ( Matrix Factorization, Factorization Machines )
  • Scalability, predictive accuracy
  • Can model real-life situations ( e.g. biases, additional input sources, temporal dynamics )
  • $1 Million Netflix Challenge

SLIDE 60

RecSys 101 : Collaborative Filtering : Latent Factor

Take the users and their feedback for different items, and identify hidden factors that influence the user feedback. The idea is to factorize, or decompose, the user-item matrix into two matrices :
  • Users are mapped onto hidden factors
  • Items are mapped onto hidden factors
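The decomposition above can be sketched with plain gradient descent in NumPy; the factor count, learning rate, regularization, and iteration count are illustrative choices, not tuned values:

```python
import numpy as np

# Minimal matrix factorization fit only on observed ratings (0 = unobserved).
rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0
k = 2                              # number of hidden factors
U = rng.normal(size=(4, k))        # users mapped onto hidden factors
V = rng.normal(size=(4, k))        # items mapped onto hidden factors

for _ in range(5000):
    E = mask * (R - U @ V.T)       # error on observed entries only
    U += 0.01 * (E @ V - 0.02 * U)       # gradient step with L2 penalty
    V += 0.01 * (E.T @ U - 0.02 * V)

pred = U @ V.T                     # also fills in the unobserved cells
```

The product U @ V.T reconstructs the observed ratings closely, and its values in the masked-out cells serve as predictions.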
SLIDE 61

RecSys 101 : Collaborative Filtering

Pros
  • Content information not required for either users or items
  • Personalized recommendations using other users' experience
  • No domain expertise required

Cons
  • Cannot produce recommendations if there is no interaction data available ( Cold Start Problem )
  • Often demonstrates poor accuracy when there is little data about users' ratings ( Sparsity )
  • Popular items get more feedback ( Popularity Bias )

SLIDE 62

RecSys 101 : Internals

What : item filtering. Who : understanding user profile, intent. Context : interactions.
Score Items : P ( click ), P ( share ) using Machine Learning
Rank Items : sort by business objective

Hybrid

SLIDE 63

RecSys 101 : Hybrid Recommendation Engine

Pros
  • Solves the cold-start issue by leveraging both content and collaboration
  • Use of implicit feedback reduces the sparsity issues to a large extent
  • Can include higher-order feature interactions as well

Cons
  • Difficult to implement

SLIDE 64

Representation : A Key Aspect

[ Diagram : User + other users' interactions → Recommendation Engine → Recommended Results → Re-Ranked Results ( item interactions ) ]

Representation : User ID, Item ID, User / Item Metadata

SLIDE 65

Matrix Factorization

[ Diagram : user-item matrix ] How to better represent users and items ? What about item and user metadata ?

SLIDE 66

Matrix Factorization

Credit : Olivier Grisel

SLIDE 67

Demo : User and Item Embedding in Matrix Factorization using Tensorflow

Goal :
  • Use embeddings in RecSys
  • How to add metadata for building a hybrid RecSys
SLIDE 68

Matrix Factorization : Deep Neural Networks

Credit : Olivier Grisel

SLIDE 69

Matrix Factorization : Deep Neural Networks with Metadata

Credit : Olivier Grisel

SLIDE 70

Learning to Rank

SLIDE 71

Problem Space : Re-Ranking

[ Diagram : Search term → Search Engine → Filtered Results → Re-Ranked Results ( indexing, similarity calculation, recency, impression discounting ) ; personalized search adds user + interactions ]

Search + Re-Ranking Using Machine Learning = Learning to Rank

SLIDE 72

Why Learning to Rank is Required ?

Classical Information Retrieval :
  • Inputs : Query, Document ; TF, IDF, |D|, P(t|D)
  • Output : Relevance Score (q, d)
  • Manual function : VSM Cosine, BM25, LM Dirichlet
  • Not many factors to tune

Learning to Rank :
  • To improve search-result quality, many features need to be considered :
    • Query word in anchor text
    • Number of images
    • Number of outlinks
    • PageRank
  • Classical IR doesn't work well, as many factors and their combinations have to be tuned
  • Machine learning-based ranking systems learn a function to automatically rank results

SLIDE 73

Learning to Rank Formulation

Given a query qi, documents Di = { di1, di2, …, din } with relevance labels yi1, yi2, …, yin in [ 0, 1 ], learn a ranking function H ( w, func(qi, Di) ) → optimal ranking Ri. Documents are then re-ranked based on the predicted relevance of the query + document representation.

SLIDE 74

Learning to Rank Techniques : Point-Wise Approach

Training data is a set of triples ( query, document, relevance ) :
( q1, d11, y11 ), ( q1, d12, y12 ), ( q2, d21, y21 ), ( q2, d22, y22 ), …

Input : ( query, document ) pair representation. Output : class.
Treat the problem as binary classification, and use the predicted probability as the relevance score for H ( w, func(qi, Di) ) → optimal ranking Ri.
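The point-wise reduction above can be sketched as logistic regression over (query, document) feature vectors; the features and labels here are synthetic, not from a real relevance dataset:

```python
import numpy as np

# Point-wise Learning to Rank as binary classification: each (query, document)
# pair becomes a feature vector x with a relevance label y, and the predicted
# probability is used as the relevance score.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                  # (query, doc) pair features
w_true = np.array([1.5, -2.0, 0.5, 0.0])       # synthetic ground truth
y = (X @ w_true + 0.1 * rng.normal(size=100) > 0).astype(float)

w = np.zeros(4)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))             # P(relevant | features)
    w -= 0.1 * X.T @ (p - y) / len(y)          # logistic-loss gradient step

scores = 1 / (1 + np.exp(-(X @ w)))            # rank documents by this score
```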

SLIDE 75

Learning to Rank Techniques : Pair-Wise Approach

Training data is a set of triples ( query, positive document, negative document ) :
( q1, di (Pos), dj (Neg) ), ( q2, di (Pos), dj (Neg) ), …

Learn H such that H ( w, func(qi, Di(pos)) ) >= H ( w, func(qi, Di(neg)) ) + margin.

There is also a "List-Wise Approach" that ranks all documents in one go, but it is complicated to implement.
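The margin condition above corresponds to a pairwise hinge loss; a NumPy sketch with toy scores:

```python
import numpy as np

# Pairwise hinge loss: the positive document should outscore the negative
# one by at least a margin, mirroring
#   H(w, func(q, d_pos)) >= H(w, func(q, d_neg)) + margin.
def pairwise_hinge(pos_scores, neg_scores, margin=1.0):
    return np.maximum(0.0, margin - (pos_scores - neg_scores)).mean()

pos = np.array([2.5, 0.8, 1.2])   # toy scores for positive documents
neg = np.array([0.5, 0.9, 1.1])   # toy scores for negative documents
pairwise_hinge(pos, neg)          # only pairs with pos - neg < margin contribute
```

The first pair already satisfies the margin and contributes zero loss; the other two are penalized, which is exactly the signal a pair-wise ranker trains on.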

SLIDE 76

Learning to Rank in Recommendation Systems

The search formulation carries over directly : replace ( query qi, documents Di, relevance yi ) with ( user, items, implicit feedback in [ 0, 1 ] ) and learn the same ranking function H ( w, func(qi, Di) ) → optimal ranking Ri.

SLIDE 77

Triplet Loss with Implicit Feedback

Credit : Olivier Grisel

SLIDE 78

Deep Triplet Network with Implicit Feedback

Credit : Olivier Grisel

SLIDE 79

Demo : Using Triplet Loss for Implicit Feedback using Tensorflow

Goal :
  • Use implicit feedback
  • Use a triplet network
SLIDE 80

Popular Alternatives

Deep FM | Wide & Deep ( Google )

Paper : DeepFM: A Factorization-Machine based Neural Network for CTR Prediction ( Link )

SLIDE 81

Production

SLIDE 82

How Recommendation Systems Work

Recommendation System ( runs in subseconds ) :
server receives request → run recommendation engine → respond with personalized recommendations → user interacts ( click / buy / rate ) → save interactions → pass to the model → train model → updated model

SLIDE 83

Machine Intelligence Architecture

[ Architecture diagram, flattened into its main components : ]

1. Data Ingestion and Processing Pipeline : Web Trackers, Mobile Trackers, Adobe/Google Analytics, 3rd-party services ( like Comscore ), and APIs feed Kafka and Sqoop into the Hadoop Distributed File System, Spark Streaming, and Deep Storage ( S3 / Google Cloud Storage ), via real-time and batch pipelines. Enhanced domain data : product categorization, spam reduction, attribute extraction.

2. Model & Testing Framework : Deep Learning, Content & Attribute Based, and Optimization models ; Model Deployment Engine ( Spark, Google Cloud ML or Azure ML ) ; A/B tests, offline evaluation, real-time adjustment with bandit algorithms, model ensembles, metrics, and tuning.

3. Machine Learning Server ( TensorFlow Serving, Azure ML, Spark Velox ) : API, cached default recommendations ( for too-long load times ), scheduled retraining, session handling, application ( recommender decoupled ).

SLIDE 84

Tensorflow in Production

Training
  • Distributed training
    • Distributed Tensorflow
    • Leverage Kubernetes to auto-scale the training process
    • Tensorflow + Spark to get the best of both worlds
    • GCP Cloud ML for serverless processing
  • Hyperparameter tuning
    • Kubernetes for parallel execution of hyperparameter tuning
    • Leverage Bayesian optimization ( Scikit-Optimize )
  • Specialized hardware
    • Leverage GPUs, TPUs for faster training
    • Leverage the Estimator and Experiment APIs

Inference
  • Model object optimization
    • Graph Transform Tool ( GTT ) to remove unreachable nodes, fuse adjacent operators, and round weights for compression
    • Ahead-of-Time ( AOT ) compiler built on XLA ( Accelerated Linear Algebra ) for a minimal Tensorflow runtime
  • Low-latency inference
    • Tensorflow Serving to serve TF models
SLIDE 85

Tensorflow Serving

SLIDE 86

Tensorflow Serving

  • Low-latency inference
  • Model versioning and rollback
  • Custom version policies for A/B and bandit tests
  • Uses highly efficient gRPC and Protocol Buffers

SLIDE 87

Thanks

06-MARCH-2018

Next Steps
  • Provide feedback on the tutorial
  • Download & review the tutorial material ( concepts, demos )
  • Share progress, issues, and use-cases on Twitter : @a_vijaysrinivas, @meabhishekkumar