ONLINE LEARNING OF WEBSITE EMBEDDINGS for Accurate Prediction of User Behavior Even when Data are Scarce
Amelia White, Director of Data Science Research, Nov 13, 2019


slide-1
SLIDE 1

Amelia White, Director of Data Science Research Nov 13, 2019

ONLINE LEARNING OF WEBSITE EMBEDDINGS

for Accurate Prediction of User Behavior

Even when Data are Scarce

slide-2
SLIDE 2

Expanding Digital Survey Data

Diagram labels: SMALL SURVEY PANEL; CUSTOM MODEL; DSTILLERY DEVICE UNIVERSE (~200MM devices)

slide-3
SLIDE 3

Data Used for Modeling

Diagram labels: SMALL SURVEY PANEL; CUSTOM MODEL; DSTILLERY DEVICE UNIVERSE (~200MM devices; 1.5B web site visits daily)
Example site visits: chocicecream.com 4/11/19; nytimes.com 4/11/19; buzzfeed.com 4/11/19; vanillaicecream.com 4/11/19

slide-4
SLIDE 4

Figure labels: Millions of Users; 10 Million URLs

slide-5
SLIDE 5

10 Million URLs

Thousands of Users

Need for a Reduced Dimensional Feature Space

slide-6
SLIDE 6

REDUCED DIMENSIONAL FEATURE SPACE

slide-7
SLIDE 7

Taking Ideas from Natural Language Processing

  • Similar data
      ○ Sentences of words
      ○ Sequences of web sites visited
  • High dimensional categorical features
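
The analogy above can be made concrete with a minimal sketch: each device's browsing history is treated as a "sentence" whose "words" are web sites, and the most frequently visited sites receive integer ids. The function names, the toy histories, and the frequency-based cutoff are assumptions for illustration, not details taken from the slides.

```python
from collections import Counter

def build_dictionary(histories, max_size):
    """Map the max_size most frequently visited sites to ids 0..max_size-1."""
    counts = Counter(site for history in histories for site in history)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {site: idx for idx, (site, _) in enumerate(ranked[:max_size])}

def encode(history, dictionary):
    """Replace each site with its id, dropping sites outside the dictionary."""
    return [dictionary[site] for site in history if site in dictionary]

# Toy browsing histories: one "sentence" of sites per device (made-up data).
histories = [
    ["nytimes.com", "buzzfeed.com", "nytimes.com"],
    ["chocicecream.com", "vanillaicecream.com", "nytimes.com"],
]
dictionary = build_dictionary(histories, max_size=3)
encoded = [encode(h, dictionary) for h in histories]
print(dictionary)
print(encoded)
```

Sites that fall outside the dictionary are simply dropped here; that is one possible choice, and it is the limitation that hashing-based approaches try to avoid.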
slide-8
SLIDE 8


slide-9
SLIDE 9

10 Million URLs

Thousands of Users

Need for a Reduced Dimensional Feature Space

Thousands of Users 128 Dimensional Embedding Space

slide-10
SLIDE 10

Website Embeddings V1: word2vec

Diagram labels:
  • Dictionary(www.short-hairstyles.co) = i, i = 0,...,K-1, K = 50,000
  • B = Embedding matrix (Kx128), with row Bi
  • Output Layer: Fully Connected Edges, P(ContextURL | targetURL)
  • Example URLs: www.hairstyle.com, www.short-hairstyles.co, www.pophaircuts.com
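
A rough sketch of the skip-gram idea on this slide: the target URL's id selects a row of the embedding matrix B, and a fully connected output layer turns that row into P(ContextURL | targetURL) over all K URLs. The output weight matrix `W`, the window size, and the toy dimensions are assumptions; production word2vec implementations usually replace the full softmax with negative sampling or a hierarchical softmax.

```python
import math
import random

K, DIM = 5, 4  # toy sizes; the slide uses K = 50,000 and 128 dimensions

random.seed(0)
# B: K x DIM embedding matrix (one row per URL in the dictionary).
B = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(K)]
# W: K x DIM output-layer weights (hypothetical name, not on the slide).
W = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(K)]

def context_probabilities(target_id):
    """Softmax over all K URLs: P(context URL | target URL)."""
    b_i = B[target_id]
    scores = [sum(w * b for w, b in zip(row, b_i)) for row in W]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def skipgram_pairs(encoded_history, window):
    """Yield (target, context) id pairs at most `window` positions apart."""
    for pos, target in enumerate(encoded_history):
        lo = max(0, pos - window)
        hi = min(len(encoded_history), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos != pos:
                yield target, encoded_history[ctx_pos]

probs = context_probabilities(target_id=2)
pairs = list(skipgram_pairs([0, 1, 2], window=1))
print(probs)
print(pairs)
```

Training would adjust B and W so that observed (target, context) pairs receive high probability; only the forward pass and the pair generation are shown here.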

slide-11
SLIDE 11
Training Word2vec

  • Trained word2vec with the browsing history of all devices seen in a 2 week time period:
      ○ Browsing history of 430,648,822 devices
      ○ Sequence of 15,077,897,800 site visits
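
For a sense of scale, the two training-set figures on this slide imply the average length of a device's browsing "sentence". The short calculation below uses only the numbers on the slide; treating the window as exactly 14 days is an assumption.

```python
devices = 430_648_822
site_visits = 15_077_897_800
days = 14  # the 2 week training window, assumed to be exactly 14 days

visits_per_device = site_visits / devices
visits_per_device_per_day = visits_per_device / days
print(f"{visits_per_device:.1f} visits per device over the window")
print(f"{visits_per_device_per_day:.1f} visits per device per day")
```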

slide-12
SLIDE 12

Visualizing Embeddings

Website                        Cluster #
www.boardingarea.com           512
www.thepointsguy.com           512
www.taxifarefinder.com         512
www.theflightdeal.com          512
www.uberestimate.com           512
www.sleepinginairports.net     512
www.frugaltravelguy.com        512
www.airchina.us                512
www.cathaypacific.com          512
www.travelskills.com           512
www.travelsort.com             512
www.skyteam.com                512
www.seatmaestro.com            512
www.flyertalk.com              512
www.expertflyer.com            512
www.singaporeair.com           512
www.estimatefares.com          512

slide-13
SLIDE 13

BEYOND WORD2VEC:

  • Embedding millions of URLs, with a manageable number of parameters
  • Online learning of embeddings

slide-14
SLIDE 14

EMBEDDING MORE URLS WITH FEWER PARAMETERS

slide-15
SLIDE 15

Hash Embeddings

slide-16
SLIDE 16

Website Embeddings V2: Hash embeddings

Diagram labels:
  • Dictionary(‘www.kohls.com’) = m, m = 0,...,N, N = 10M
  • H1(m) = i, H2(m) = j, i,j = 0,...,K
  • B = Embedding matrix (Kx128), with rows Bi and Bj
  • P = Importance parameters (Nx2), with row Pm
  • Layers: Hash Embedding, Convolution layer, Output Layer
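
A minimal sketch of the lookup this diagram describes, under stated assumptions: the dictionary id m is hashed twice to pick two shared rows of B, and the two rows are combined using the importance parameters in row m of P. Combining by weighted sum follows the usual hash-embedding formulation and is an assumption here; the slide also labels a convolution layer, which this sketch does not model. The hash functions `h1` and `h2` and the toy sizes are hypothetical.

```python
import random

N, K, DIM = 1000, 50, 4  # toy sizes; the slide uses N = 10M and 128 dimensions

random.seed(0)
B = [[random.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(K)]  # K x DIM
P = [[random.gauss(0.0, 0.1) for _ in range(2)] for _ in range(N)]    # N x 2

def h1(m):
    """Hypothetical first hash: dictionary id m -> row index of B."""
    return (m * 2654435761 + 1) % 4294967296 % K

def h2(m):
    """Hypothetical second hash, chosen to differ from h1."""
    return (m * 40503 + 12345) % 65521 % K

def hash_embedding(m):
    """Combine two shared rows of B, weighted by the importance row P[m]."""
    i, j = h1(m), h2(m)
    p_i, p_j = P[m]
    return [p_i * b_i + p_j * b_j for b_i, b_j in zip(B[i], B[j])]

vector = hash_embedding(m=123)
print(vector)
```

Because many ids share each row of B, collisions are expected; the per-id importance parameters are what let two colliding URLs still end up with different vectors.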

slide-17
SLIDE 17

Hash Embedding Requires Fewer Parameters

Figure axis: Number of Parameters
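
The parameter saving can be checked with simple arithmetic. The sketch below compares a full embedding table (one 128-dimensional row per URL) with a hash embedding (a shared Kx128 matrix B plus Nx2 importance parameters P). N = 10M comes from the slides; the values of K tried here are hypothetical, so the printed counts are illustrative and are not the figures behind the slide's chart.

```python
DIM = 128
N = 10_000_000  # URLs to embed (from the slides)

def full_table_params(n_urls, dim):
    """One dim-sized row per URL."""
    return n_urls * dim

def hash_embedding_params(n_urls, k_rows, dim, n_hashes=2):
    """Shared matrix B (k_rows x dim) plus importance parameters P (n_urls x n_hashes)."""
    return k_rows * dim + n_urls * n_hashes

full = full_table_params(N, DIM)
for k_rows in (10_000, 50_000, 100_000):  # hypothetical choices of K
    hashed = hash_embedding_params(N, k_rows, DIM)
    print(f"K={k_rows:>7,}: {hashed:,} vs {full:,} parameters "
          f"({full / hashed:.1f}x fewer)")
```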

slide-18
SLIDE 18

Measuring Embedding Quality for Parameter Selection

Source: https://platform.ai/blog/page/11/the-silhouette-loss-function-metric-learning-with-a-cluster-validity-index/, JIM BREMNER, APRIL 09, 2019

  • Selected a ‘ground truth’ clustering, made from a known high quality embedding
  • Used the silhouette score to measure how well test embeddings converged to the ground truth clustering as the network trained
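
As a sketch of the measurement described above: the silhouette score of point i is s(i) = (b - a) / max(a, b), where a is the mean distance from i to the other points in its own cluster and b is the smallest mean distance from i to any other cluster. To score a test embedding, `points` would be the test embedding vectors and `labels` the ground-truth cluster assignments. Euclidean distance and the toy data are assumptions; the slides do not state which distance was used.

```python
import math

def silhouette_scores(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point i.

    Assumes at least two distinct cluster labels are present.
    """
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        # Accumulate distances from point i to every other point, per cluster.
        sums, counts = {}, {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i == j:
                continue
            sums[other] = sums.get(other, 0.0) + math.dist(p, q)
            counts[other] = counts.get(other, 0) + 1
        if label not in counts:  # singleton cluster: s(i) is defined as 0
            scores.append(0.0)
            continue
        a = sums[label] / counts[label]
        b = min(sums[c] / counts[c] for c in sums if c != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]  # made-up embeddings
good = silhouette_scores(points, [0, 0, 1, 1])  # labels match the geometry
bad = silhouette_scores(points, [0, 1, 0, 1])   # labels cut across it
print(sum(good) / len(good), sum(bad) / len(bad))
```

A mean score near 1 indicates that points sharing a ground-truth label sit close together in the test embedding; scores near or below 0 indicate that they do not.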

slide-19
SLIDE 19

Good Performance with 100x Fewer Parameters

Figure axes: s(i) (silhouette score); Number of Parameters

slide-20
SLIDE 20

ONLINE LEARNING OF EMBEDDINGS

slide-21
SLIDE 21

Website Embeddings V3: Online Learning of Hash Embeddings

Diagram labels:
  • H0(‘www.kohls.com’) = m, m = 0,...,N
  • H1(m) = i, H2(m) = j, i,j = 0,...,K
  • B = Embedding matrix (Kx128), with rows Bi and Bj
  • P = Importance parameters (Nx2), with row Pm
  • Layers: Hash Embedding, Convolution layer, Output Layer
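
The change from V2 to V3 is that a hash function H0 replaces the dictionary lookup, so a URL string maps straight to a row m of P without a prebuilt vocabulary. A minimal sketch of that idea follows, assuming salted BLAKE2b digests as the hash family; the actual hash functions are not given on the slides, and the values of N and K here are illustrative.

```python
import hashlib

def stable_hash(text, salt, buckets):
    """Deterministic hash of a string into range(buckets), stable across runs."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big") % buckets

N = 10_000_000  # rows of P; N = 10M appears on the V2 slide
K = 50_000      # rows of B; hypothetical, K is not given on this slide

def h0(url):
    """URL string -> row index m of the importance parameters P."""
    return stable_hash(url, b"h0", N)

def h1(m):
    """Row index m -> first row index i of the embedding matrix B."""
    return stable_hash(str(m), b"h1", K)

def h2(m):
    """Row index m -> second row index j of the embedding matrix B."""
    return stable_hash(str(m), b"h2", K)

m = h0("www.kohls.com")
i, j = h1(m), h2(m)
print(m, i, j)
```

With this scheme a URL that first appears mid-stream already has a well-defined m, i, and j, which is presumably what makes training on a continuous stream of visits practical.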

slide-22
SLIDE 22

Online Learning Optimizes Faster than Batch Learning

Figure labels: W2V (batch) Embeddings; Hash (online) Embeddings; s(i); Higher quality embeddings

slide-23
SLIDE 23

Training the Online Embeddings


slide-24
SLIDE 24

Distance in Embedding Space
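
One common way to measure distance in an embedding space is cosine similarity between site vectors. The slides do not state which distance was used, so the choice of cosine similarity, the function names, and the made-up 3-dimensional vectors below are all assumptions for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_sites(query, embeddings, top_n):
    """Rank all other sites by similarity to `query`, most similar first."""
    others = [
        (site, cosine_similarity(embeddings[query], vec))
        for site, vec in embeddings.items()
        if site != query
    ]
    return sorted(others, key=lambda pair: -pair[1])[:top_n]

embeddings = {  # made-up vectors, not real embedding values
    "www.flyertalk.com": [0.9, 0.1, 0.0],
    "www.thepointsguy.com": [0.8, 0.2, 0.1],
    "www.kohls.com": [0.0, 0.1, 0.9],
}
neighbors = nearest_sites("www.flyertalk.com", embeddings, top_n=1)
print(neighbors)
```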

slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27
slide-28
SLIDE 28

MODELING USERS IN EMBEDDING SPACE

slide-29
SLIDE 29

10 Million URLs

Thousands of Users

Need for a Reduced Dimensional Feature Space

Thousands of Users 128 Dimensional Embedding Space

slide-30
SLIDE 30

From URL Embeddings to Models
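
The slides do not say how a user is placed in the embedding space or which model is trained on top. As one plausible sketch, a user can be represented by the average of the embeddings of the sites they visited (mean pooling), and a logistic regression can be fitted on those user vectors. Both choices, along with every name and value below, are assumptions.

```python
import math

def user_vector(visited_sites, embeddings, dim):
    """Average the embeddings of the sites a user visited (mean pooling)."""
    vectors = [embeddings[s] for s in visited_sites if s in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def predict(x, weights, bias):
    """Logistic model: probability that the label is 1."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, epochs=500, lr=0.5):
    """Plain batch gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            error = predict(x, weights, bias) - y
            for d, xi in enumerate(x):
                grad_w[d] += error * xi
            grad_b += error
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias

embeddings = {  # made-up 2-dimensional site embeddings
    "chocicecream.com": [1.0, 0.0],
    "vanillaicecream.com": [0.9, 0.1],
    "nytimes.com": [0.0, 1.0],
    "buzzfeed.com": [0.1, 0.9],
}
histories = [
    ["chocicecream.com", "vanillaicecream.com"],
    ["nytimes.com", "buzzfeed.com"],
    ["chocicecream.com"],
    ["buzzfeed.com"],
]
labels = [1, 0, 1, 0]  # made-up survey outcomes
features = [user_vector(h, embeddings, dim=2) for h in histories]
weights, bias = train_logistic_regression(features, labels)
ice_cream_score = predict(features[0], weights, bias)
news_score = predict(features[1], weights, bias)
print(ice_cream_score, news_score)
```

The point of the sketch is the shape of the data: each user becomes one dense vector of fixed length (128 in the slides) rather than a sparse row over 10 million URLs, which leaves far fewer parameters to fit from a few thousand labeled users.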

slide-31
SLIDE 31


slide-32
SLIDE 32

Embedding Features Outperform Sparse Web Features For Small Data Sets

Figure: % Gain in AUC Comparing Embedding Features to Sparse Web Features (axis labels: ~1M training examples; ~1000 training examples)

slide-33
SLIDE 33


slide-34
SLIDE 34


slide-35
SLIDE 35

MODELING SURVEY DATA

Diagram labels: SMALL SURVEY PANEL; CUSTOM MODEL; DSTILLERY DEVICE UNIVERSE (~200MM devices)

slide-36
SLIDE 36

Case Study: Predicting Ad Influence for Ice Cream Brand

  • The Problem:
      ○ A survey company models which people are likely to be influenced by an advertisement for an ice cream brand
      ○ 5.5K survey respondents
      ○ 500 high scoring respondents
  • Our Goal:
      ○ Predict the high scoring respondents
      ○ Produce an audience of devices that are predicted to be influenceable by an ad for the ice cream brand

slide-37
SLIDE 37

Case Study: Predicting Ad Influence for Ice Cream Brand

  • Test AUC on predicting high scoring respondents:
      ○ Raw web behavior (sparse web features): 64.1
      ○ Summarized web behavior (clusters of web sites): 63.5
      ○ Cookie Embeddings (website embeddings): 75.8
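
For readers less familiar with the metric, AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it on made-up scores, then works out the relative gain implied by the test AUC values on this slide. Treating "gain" as a relative percentage of the baseline AUC is an assumption about how the comparison is defined.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def percent_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline

# Made-up scores and labels, only to exercise the AUC function.
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])

# Test AUC values from the slide.
gain_vs_raw = percent_gain(75.8, 64.1)
gain_vs_summarized = percent_gain(75.8, 63.5)
print(toy_auc)
print(f"{gain_vs_raw:.2f}% over raw web behavior")
print(f"{gain_vs_summarized:.2f}% over summarized web behavior")
```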

slide-38
SLIDE 38

THANK YOU

Presented by Amelia White, awhite@dstillery.com
Contributors: Christopher Jenness, Melinda Han Williams
MLE team: Wickus Martin, Roger Cost, Justin Moynihan, Patrick McCarthy