Helixa Audience Projection of Target Consumers over Multiple - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Helixa

Audience Projection of Target Consumers over Multiple Domains: a NER and Bayesian approach

Gianmario Spacagna Chief Scientist @ Helixa O’Reilly AI Conference London, 16th October 2019

slide-2
SLIDE 2

About Me

  • 7+ years of experience in Data Science and Machine Learning
  • Currently leading a team of ML Scientists and ML Engineers
  • Background in Telematics and Software Engineering of Distributed Systems
  • Ongoing MBA Student
  • Co-author of Python Deep Learning
  • Contributor to the Professional Data Science Manifesto
  • Blogger at Data Science Vademecum
  • Founder of the Data Science Milan community (1.4k members)
  • Stockholm, London, Milan

Gianmario Spacagna, Chief Scientist, Helixa
gspacagna@helixa.ai

slide-3
SLIDE 3

Helixa is a market research platform that uses AI to integrate disparate data sources into an enriched view of the consumers who matter to your business.

DEMOGRAPHICS: HHI < 40K, Female, 18 - 24

INFLUENCERS: ODESZA, Cardi B, Shane Dawson, James Charles

INTERESTS: Listen to Podcasts, Kylie Cosmetics Fan, Starbucks, Chipotle

PSYCHOGRAPHICS: Fast Food Fans, Fashion Enthusiasts, Entertainment Junkies

slide-4
SLIDE 4

In the next 40 minutes... OUR GOAL: Discuss some of the current challenges of traditional market research and propose a novel solution based on Named Entity Recognition (NER) and Bayesian Inference.

slide-5
SLIDE 5

Challenges in Market Research

slide-6
SLIDE 6

What is Market Research?

  • Applied Social Science
  • Gain Insights for Strategic Decisions
  • Information about individuals and organizations
  • Statistical Inference

slide-7
SLIDE 7

Why Does Market Research Matter?

  • Brand Perceptions
  • Consumer Preferences and Behaviors
  • Buyer Personas
  • Market Segmentation
  • Identify Opportunities
  • Market Trends

slide-8
SLIDE 8

Approaches to Market Research

Qualitative:

  • Opinions and individual experiences
  • In-depth interviews
  • Smaller sample

Quantitative:

  • Numbers and data
  • Statistics
  • Larger sample

slide-9
SLIDE 9

Quantitative Market Research is conducted with Surveys

Define → Design → Distribute → Collect → Analyze

slide-10
SLIDE 10

Limitations of Surveys

  • Expensive
  • Invasive
  • Response Bias
  • Predefined questions
  • Narrow coverage

slide-11
SLIDE 11

Market Research using “Implicit Consumer Feedback”

Survey cycle (Define → Design → Distribute → Collect → Analyze) vs. implicit consumer feedback, e.g. Social Listening

slide-12
SLIDE 12

Twitter Interactions

Inferring Interests from Twitter Interactions

slide-13
SLIDE 13

Advantages of Implicit Consumer Feedback Approaches

  • Flexible costs
  • Wide view
  • Opportunities for Big Data and AI
  • Mass coverage
  • Spontaneous

slide-14
SLIDE 14

What about other information?

[Figure: Twitter Interactions connected by question marks to Amazon Purchases and to Beer Consumption Brand]

slide-15
SLIDE 15

The Universe of Consumers Datasets

  • Social Media
  • Financial and Properties
  • Behaviors
  • First Party (CSM)
  • Consumer Research Surveys

slide-16
SLIDE 16

Individual Consumer Datasets are Far From Being Exhaustive

SCATTERED, PARTIAL, SKEWED

[Figure: population grid by gender (MALE, FEMALE) and age band (18-30, 31-43, 44-56, 57-70)]

slide-17
SLIDE 17

The Holy Grail of Market Research

ALL IN ONE, COMPLETE, REPRESENTATIVE

[Figure: population grid by gender (MALE, FEMALE) and age band (18-30, 31-43, 44-56, 57-70)]

slide-18
SLIDE 18

What is the baseline algorithm for “completing” datasets?

slide-19
SLIDE 19

Look-alike Fusion

slide-20
SLIDE 20

What is look-alike fusion?

Left: Social Network Panel Right: Consumptions Survey Panel

slide-21
SLIDE 21

Assignment Optimization Problem

Well-known solutions:

  • Hungarian method
  • Simplex
  • Auction algorithm
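
As a minimal sketch of the assignment problem behind look-alike fusion, the snippet below pairs each left-panel user with a distinct right-panel user so that the total dissimilarity is minimal. The cost matrix and the function name `best_assignment` are illustrative assumptions. It uses brute-force enumeration, which is only feasible for tiny inputs, and not the Hungarian method, simplex, or auction algorithm.

```python
from itertools import permutations


def best_assignment(cost):
    """Return (total_cost, pairing) minimising sum(cost[i][pairing[i]]).

    Brute force over all permutations: O(n!), suitable only for tiny demos.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(best_perm)


# Hypothetical dissimilarities: rows are left users, columns are right users.
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
total, pairing = best_assignment(cost)
print(total, pairing)
```

Here `pairing[i]` is the right user matched to left user `i`. A production system would replace the enumeration with a polynomial-time solver.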

slide-22
SLIDE 22

Datasets Fusion

[Figure: user-by-entity matrix after fusion. Rows: left users, right users. Columns: left-only entities, right-only entities. Legend: Target Audience.]

slide-23
SLIDE 23

Look-alike Fusions Require a Central Main Panel

slide-24
SLIDE 24

Look-alike Fusions Don’t Scale Well

  • Differences in feature space
  • Craftsmanship required at each change of data
  • Universal objective function to optimize

slide-25
SLIDE 25

Is there a more scalable way to “fuse” datasets?

slide-26
SLIDE 26

The Audience Projection

slide-27
SLIDE 27

Audience Projection defined as “User Binary Classification”

Source: Social Network Panel (target audience: 1.6M / 70M social accounts, each labeled TRUE or FALSE)

PROJECTION

Destination: Consumptions Survey Panel (26M / 200M U.S. consumers, each labeled TRUE or FALSE)

Insights on the projected audience:

  • Ben & Jerry’s: bought in last 6 months? Affinity: 1.80x
  • Venmo: paid in last 30 days? Affinity: 1.6x
  • Angry Orchard: drunk in last 6 months? Affinity: 1.50x

slide-28
SLIDE 28

Solution = Named Entity Recognition (NER) + Bayesian Model

[Diagram: Social Pages (Source: Social Network Panel) and Consumption Questions (Destination: Consumptions Survey Panel) each pass through NER and are then joined by Entity Linking (NEL). The Bayesian Model takes the Target Audience and outputs the Projected Users Probabilities.]

slide-29
SLIDE 29

Entities Represent a Universal Feature Space

  • Social Pages → NER
  • Consumption Questions → NER
  • Listed Products → NER

slide-30
SLIDE 30

Named Entity Recognition (NER) in each Domain

Domains: Social Pages, Consumption Questions, Listed Products

Example texts:

  • The Coca-Cola Company is a total beverage company, offering over 500 brands in more than 200 countries and territories.
  • Adidas Originals Men's Relaxed Strapback Cap
  • Coca-Cola KWC-4 6-Can Personal Mini 12V DC Car and 110V AC Cooler, Red

slide-31
SLIDE 31

NLP Libraries with NER capability

Polyglot

Deep Pavlov

slide-32
SLIDE 32

Why spaCy for Production?

  • Fast
  • Accurate
  • Industry-grade maturity

slide-33
SLIDE 33

Example of NER Usage

slide-34
SLIDE 34

Same Entity May Exist with Different Spellings

  • Social networks: interacted with “Coca-Cola Company”
  • Survey question: “Have you consumed Coca-Cola last week?”

slide-35
SLIDE 35

Linking and Normalizing Entities via Wikipedia

  • en.wikipedia.org/wiki/Coca-Cola
  • en.wikipedia.org/wiki/The_Coca-Cola_Company

Entity Relationship

slide-36
SLIDE 36

Normalized Entities Mean a Common Feature Space
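
A toy sketch of the normalization step: mentions recognized in different domains are mapped to one canonical entity so that both panels share the same feature. The alias table, the `PARENT` roll-up from company to brand, and the function name `normalize` are all hypothetical. The talk's actual linking relies on Wikipedia pages and entity relationships whose details are not given in the slides.

```python
# Hypothetical alias table: lower-cased surface form -> canonical entity ID.
# The IDs reuse the Wikipedia page titles shown on the previous slide.
ALIASES = {
    "coca-cola": "Coca-Cola",
    "coca-cola company": "The_Coca-Cola_Company",
    "the coca-cola company": "The_Coca-Cola_Company",
}

# Hypothetical entity relationship: roll the company up to its main brand.
PARENT = {"The_Coca-Cola_Company": "Coca-Cola"}


def normalize(mention):
    """Map a recognized mention to one canonical entity, or None if unknown."""
    entity = ALIASES.get(mention.strip().lower())
    return PARENT.get(entity, entity)


social = normalize("Coca-Cola Company")  # mention from a social page
survey = normalize("Coca-Cola")          # mention from a survey question
print(social, survey)
```

Once both mentions resolve to the same ID, that entity becomes a column shared by the source and destination panels.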

slide-37
SLIDE 37

Stacked Heterogeneous Feature Space

[Figure: stacked user-by-entity matrix. Rows: source users, destination users. Columns: source-only entities, common entities, destination-only entities. X marks known values and ? marks unknown ones; the unknown blocks are the latent interests. Legend: Target Audience.]

slide-38
SLIDE 38

Common Entities translate Source to Destination

Source: Social Network Panel
Destination: Consumptions Survey Panel

[Diagram: the Target Audience's Share of Interests on the Common Entities feeds the Bayesian Model, which projects it onto the destination users (?)]

Source Target Size: 1.6M / 70M = 2.3%

slide-39
SLIDE 39

“Share of Interests” Encodes the DNA of the Target Audience

  • Global share of interests on each common entity: 100%
  • Target audience share of interests (the target audience slice) on three common entities: 50%, 17%, 50%
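
A minimal sketch of how such shares could be computed, assuming (this is one reading of the slide, not a definition given in the talk) that the share of interests of an entity is the fraction of its interacting users who belong to the target audience. User IDs, entity names, and the function name are made up for illustration.

```python
def share_of_interests(fans_by_entity, target_users):
    """Fraction of each entity's fans that belong to the target audience."""
    shares = {}
    for entity, fans in fans_by_entity.items():
        shares[entity] = len(fans & target_users) / len(fans) if fans else 0.0
    return shares


# Hypothetical source panel: sets of user IDs interacting with each common entity.
target_users = {"u1", "u2", "u3"}
fans_by_entity = {
    "entity_a": {"u1", "u2", "u4", "u5"},
    "entity_b": {"u3", "u6", "u7", "u8", "u9", "u10"},
    "entity_c": {"u1", "u11"},
}
shares = share_of_interests(fans_by_entity, target_users)
print(shares)
```

The toy data is chosen so the three shares come out at roughly 50%, 17%, and 50%, mirroring the slide.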

slide-40
SLIDE 40

Bayesian Model

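Posterior probability of a user u belonging to the projected target T, given the share of interests S observed on the common entities:

$$P(u \in T \mid S) = \frac{P(S \mid u \in T)\, P(u \in T)}{P(S)}$$

  • Likelihood: P(S | u ∈ T)
  • Prior: P(u ∈ T), the source target size = 2.3%
  • Evidence: P(S)

slide-41
SLIDE 41

Evidence Decomposition

$$P(S) = P(S \mid u \in T)\, P(u \in T) + P(S \mid u \notin T)\, P(u \notin T)$$

slide-42
SLIDE 42

Marginal Positive Likelihood

For a single common entity with share of interests s, the likelihood is approximated by a Binomial distribution with p = 17%, where n is the number of panel users and x the number of them selected (see Appendix B):

$$P(s \mid u \in T) \approx \mathrm{Binomial}(x;\, n,\, p), \quad p = 17\%$$

slide-43
SLIDE 43

Joint Likelihood under Naive Assumption

For three common entities with shares of interests 50%, 17% and 50%:

$$P(s_1, s_2, s_3 \mid u \in T) = P(s_1 \mid u \in T) \cdot P(s_2 \mid u \in T) \cdot P(s_3 \mid u \in T)$$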
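
The pieces above (prior, naive joint likelihood, evidence decomposition) can be combined as in the following sketch. The per-entity likelihood values are hypothetical placeholders; only the 2.3% prior comes from the slides. The computation is done in log space, a common choice to avoid underflow when many entities are multiplied.

```python
import math


def posterior(prior, pos_likelihoods, neg_likelihoods):
    """P(user in target | evidence) via Bayes' rule.

    pos_likelihoods[i] and neg_likelihoods[i] are the per-entity likelihoods
    given the user is / is not in the target; they are multiplied under the
    naive independence assumption.
    """
    log_pos = math.log(prior) + sum(math.log(v) for v in pos_likelihoods)
    log_neg = math.log(1.0 - prior) + sum(math.log(v) for v in neg_likelihoods)
    # Normalise by the evidence (sum of both branches) in a stable way.
    m = max(log_pos, log_neg)
    num = math.exp(log_pos - m)
    return num / (num + math.exp(log_neg - m))


prior = 0.023  # source target size from the slides: 1.6M / 70M
# Hypothetical likelihoods for one user over three common entities.
p_user = posterior(prior, [0.30, 0.20, 0.25], [0.10, 0.20, 0.05])
print(round(p_user, 4))
```

When the positive and negative likelihoods are equal, the evidence carries no information and the posterior falls back to the prior.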

slide-44
SLIDE 44

Predicted Probabilities Provide Insights on the Projected Users

PROJECTION: the Target Audience is turned into projected users' probabilities, P(u ∈ T | S) for each destination user u, which give insights on the destination variables.

Destination variables and affinity:

  • TeenNick: 8.9x
  • Robot Chicken: 7.27x
  • Bob’s Burgers: 2.36x
  • Ben & Jerry’s: 1.80x
  • Venmo: 1.62x
  • Angry Orchard: 1.55x
  • Nintendo DSi XL: 1.47x
  • Video Games: 1.45x
  • Audio or Video Chat: 1.23x
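
The slides do not define how affinity is computed. The sketch below assumes a common lift-style definition: the probability-weighted rate of a destination variable among projected users, divided by its rate in the whole panel. The function name and the toy data are illustrative only.

```python
def affinity(probabilities, answers):
    """Lift of a destination variable among projected users (assumed definition).

    probabilities[i]: predicted probability that user i is in the target.
    answers[i]: 1 if user i answered yes to the destination variable, else 0.
    """
    weighted_rate = sum(p * a for p, a in zip(probabilities, answers)) / sum(probabilities)
    base_rate = sum(answers) / len(answers)
    return weighted_rate / base_rate


# Hypothetical destination panel of four users.
probs = [0.9, 0.1, 0.5, 0.1]
bought = [1, 0, 1, 0]
lift = affinity(probs, bought)
print(f"{lift:.2f}x")
```

An affinity above 1x means the projected audience over-indexes on that variable relative to the general population.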

slide-45
SLIDE 45

Audience Projection In a Nutshell

[Diagram: the Target Audience selected in the Social Panel is projected through the Common Entities and the Bayesian Model onto the Consumptions Survey Panel, yielding affinities of 1.80x, 1.55x and 1.62x]

slide-46
SLIDE 46

Cool! How do you know this is accurate?

slide-47
SLIDE 47

Evaluation Techniques

slide-48
SLIDE 48

Binary Classifier Evaluation

[Diagram: the Bayesian Model outputs Projected Users Probabilities, which are compared with a Ground Truth. Which evaluation techniques?]
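
One way to score predicted probabilities against a binary ground truth is the area under the ROC curve. The slides do not say which metric was used, so the choice of AUC here is an assumption, and the scores and labels are made up.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in the number of users, which is
    fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical projected probabilities and ground-truth membership.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```

A threshold-free metric like this suits the projection setting, where the output is a probability per user and not a hard label.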

slide-49
SLIDE 49

Validate via Common Entities

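[Figure: user-by-entity matrix restricted to the common entities, for source users and destination users. The Target Audience and the Projected Audience are defined by the same OR query; the exact query replica in the destination serves as Ground Truth.]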

slide-50
SLIDE 50

Validate via Self Reconstruction Within the Same Domain

[Figure: user-by-entity matrix of source users and destination users over source-only, common, and destination-only entities. The Target Audience is reconstructed within the same domain and compared with the Ground Truth.]

slide-51
SLIDE 51

Validate via Double-step Reconstruction

[Diagram: two consecutive PROJECTION steps; the resulting predicted probabilities are compared with the Ground Truth]

slide-52
SLIDE 52

Repeat Test Cases Stratifying by Category

slide-53
SLIDE 53

Demographics Skewness

PROJECTION

slide-54
SLIDE 54

Golden Benchmarks Comparison on Aggregated Insights

slide-55
SLIDE 55

Opportunities

slide-56
SLIDE 56

Many Linked Views of the Same Global Population

Audience Projection

slide-57
SLIDE 57

Multiple Perspectives Reinforce Reliability

Target Audience viewed from three panels:

  • Social Panel: interacted with Game Informer social page. Affinity: 2.17x
  • Survey: “Have you read any Game Informer issue?” Affinity: 1.73x
  • Purchases: Game Informer Single Issue Magazine purchased online. Affinity: 2.51x

slide-58
SLIDE 58

Generalize Audience Projection as a Domain Adaptation Problem

slide-59
SLIDE 59

Final Remarks

slide-60
SLIDE 60

Many Datasets but Only Partial Views

slide-61
SLIDE 61

Look-alike fusions don’t scale well

slide-62
SLIDE 62

Audience Projection adapts to any “entity domain”

Bayesian Model

slide-63
SLIDE 63

Accuracy and Biases can be quantified

slide-64
SLIDE 64

Strategists now have a complete view of their Target Audience

slide-65
SLIDE 65

Gianmario Spacagna Chief Scientist at Helixa.ai gspacagna@helixa.ai @gm_spacagna

slide-66
SLIDE 66

Appendix A: The spaCy NER Model

slide-67
SLIDE 67

Natural Language Processing (NLP) Pipeline

"Mark Watney visited Mars"

slide-68
SLIDE 68

The spaCy NER Model Overview

EMBED ENCODE ATTEND PREDICT

slide-69
SLIDE 69

Embedding Words

Features:

token           lower           prefix  suffix  shape
Apple           apple           app     ple     Wwwww
U.K.            uk              uk      uk      W.W.
Fahrenheit 451  fahrenheit 451  fah     451     Wwwwwwwwww ddd

Each word (token) is represented by concatenating the embeddings of all 4 features, which helps generalize to unknown words.
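
The four features can be reproduced with a few lines of Python. This is a sketch that follows the conventions visible in the slide's table (punctuation dropped in the lower-cased form, 3-character prefix and suffix, `W`/`w`/`d` shape symbols); spaCy's own attribute definitions may differ, and `word_features` is a hypothetical helper, not a spaCy function.

```python
def word_features(token):
    """Return the lower, prefix, suffix and shape features of a token."""
    # Lower-case and drop punctuation, as in the slide's "U.K." -> "uk" row.
    norm = "".join(ch for ch in token.lower() if ch.isalnum() or ch.isspace())
    # Shape: upper-case -> W, lower-case -> w, digit -> d, anything else kept.
    shape = "".join(
        "W" if ch.isupper() else "w" if ch.islower() else "d" if ch.isdigit() else ch
        for ch in token
    )
    return {"lower": norm, "prefix": norm[:3], "suffix": norm[-3:], "shape": shape}


for tok in ["Apple", "U.K.", "Fahrenheit 451"]:
    print(tok, word_features(tok))
```

Because an unseen word still has a prefix, suffix and shape in common with known words, its concatenated embedding is not purely random.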

slide-70
SLIDE 70

Efficiently Embedding Words

Hash embedding reduces the dimensionality and makes it possible to handle large vocabularies.
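
A toy illustration of the hashing trick behind hash embeddings: every key is hashed with several seeds into a small shared table and the selected rows are summed, so no per-word vocabulary entry is needed. Table size, vector width, number of hashes and the use of SHA-256 are arbitrary choices for this sketch and do not describe spaCy's implementation.

```python
import hashlib
import random

NUM_ROWS = 1000   # table rows, far fewer than a real vocabulary
DIM = 4           # embedding width
NUM_HASHES = 3    # independent hash functions per key

_rng = random.Random(0)
# In a real model these rows would be learned; here they are random.
TABLE = [[_rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_ROWS)]


def bucket(key, seed):
    """Deterministically map (key, seed) to a table row index."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_ROWS


def hash_embed(key):
    """Sum the rows selected by each hash function."""
    vec = [0.0] * DIM
    for seed in range(NUM_HASHES):
        row = TABLE[bucket(key, seed)]
        vec = [v + r for v, r in zip(vec, row)]
    return vec


print(hash_embed("apple"))
```

Two words only receive the same vector if they collide under every hash function, which is why a small table can serve a very large vocabulary.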

slide-71
SLIDE 71

Encoding Sequences of Words

Residual Convolutional Neural Networks encode context-independent word vectors into a context-sensitive sentence matrix.

[Figure: the raw tri-gram chunk “Mark Watney visited”, taken from “Mark Watney visited Mars”, is encoded into an enriched tri-gram matrix]

slide-72
SLIDE 72

Crafting the Attention Vector

The attention vector of the trigram includes information on the encountered entities.

[Figure for “Mark Watney visited Mars”. Panels: Attention vector, Tri-gram matrix, Enriched tri-gram vector]

slide-73
SLIDE 73

Predicting the Recognized Entities

Available actions: SHIFT, OUT, REDUCE (Entity Tagging)

State: Stack, Buffer, Segment

Example for “Mark Watney visited Mars”:

  1. SHIFT
  2. SHIFT
  3. REDUCE (PER)
  4. OUT
  5. SHIFT
  6. REDUCE (LOC)

Recognized entities: Mark Watney (PER), Mars (LOC)

[Figure: the tri-gram matrix and the attention vector produce the enriched tri-gram vector; the attention is updated after each action]
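
The action sequence above can be replayed with a small stack-and-buffer simulator. This is a simplified reading of the slide's three actions, written for illustration; it is not spaCy's transition system, and the `"REDUCE:LABEL"` string encoding is an assumption of this sketch.

```python
def apply_actions(tokens, actions):
    """Replay SHIFT / OUT / REDUCE actions and return the tagged entities."""
    buffer = list(tokens)   # tokens still to be read
    stack = []              # tokens of the entity being built
    entities = []
    for action in actions:
        name, _, label = action.partition(":")
        if name == "SHIFT":      # move the next token onto the stack
            stack.append(buffer.pop(0))
        elif name == "OUT":      # skip a token that is not part of an entity
            buffer.pop(0)
        elif name == "REDUCE":   # close the stacked tokens as one entity
            entities.append((" ".join(stack), label))
            stack.clear()
        else:
            raise ValueError(f"unknown action: {action}")
    return entities


tokens = ["Mark", "Watney", "visited", "Mars"]
actions = ["SHIFT", "SHIFT", "REDUCE:PER", "OUT", "SHIFT", "REDUCE:LOC"]
entities = apply_actions(tokens, actions)
print(entities)
```

In the real model the next action is predicted from the enriched vector at each step; here the sequence is supplied by hand.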

slide-74
SLIDE 74

Official Explanation of spaCy NER Model

https://www.youtube.com/watch?v=sqDHBH9IjRU

slide-75
SLIDE 75

Appendix B: The Bayesian Model

slide-76
SLIDE 76

Projecting the Share of Interests on Common Entities

[Figure: the Target Audience's Share of Interests (50%, 17%, 50%) is projected onto the common entities of the Global Audience (the average American). Sizes shown: 60M, 200M, 40M, and one unknown (?). Labels: evidence, prior.]

slide-77
SLIDE 77

Evidence Statistics on Share of Interests

  • N = 180M users in U.S. population
  • sampling rate = 1 : 10k
  • n = 18k users in sample panel
  • p = 17% of market penetration
  • x = 3k expected projected users

[Figure labels: SIZE: 200M, SIZE: 40M; statistics: evidence]

slide-78
SLIDE 78

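Binomial Positive Likelihood

$$P(s \mid u \in T) = \mathrm{Binomial}(x;\, n,\, p), \quad p = 17\%$$

where s is the share of interests on the entity, u the user and T the target.

Probability of selecting 3000 / 18000 McDonald’s panel users given that the user IS part of the target:

  • n = 17999, x = 2999: log P = -5.56323
  • n = 18000, x = 3000: log P = -5.54342

The first value is smaller than the second.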

slide-79
SLIDE 79

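Binomial Negative Likelihood

$$P(s \mid u \notin T) = \mathrm{Binomial}(x;\, n,\, p), \quad p = 17\%$$

where s is the share of interests on the entity, u the user and T the target.

Probability of selecting 3000 / 18000 McDonald’s panel users given that the user IS NOT part of the target:

  • n = 17999, x = 3000: log P = -5.53942
  • n = 18000, x = 3000: log P = -5.54342

The first value is greater than the second.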
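
The log-probabilities in Appendix B can be checked with a binomial log-PMF written with the standard library. The sketch assumes natural logarithms and p = 0.17 exactly; the variable names are illustrative. Note that the value -5.53942 is obtained with n = 17999 and x = 3000.

```python
import math


def binom_logpmf(x, n, p):
    """Natural log of the Binomial(n, p) probability of exactly x successes."""
    log_comb = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return log_comb + x * math.log(p) + (n - x) * math.log(1.0 - p)


p = 0.17
n = 180_000_000 // 10_000      # U.S. population times the 1 : 10k sampling rate
expected_x = p * n             # expected projected users, about 3k

full_panel = binom_logpmf(3000, n, p)            # n = 18000, x = 3000
minus_selected = binom_logpmf(2999, n - 1, p)    # user removed from n and x
minus_unselected = binom_logpmf(3000, n - 1, p)  # user removed from n only

print(n, round(expected_x))
print(full_panel, minus_selected, minus_unselected)
```

Removing one user shifts the log-probability by a closed-form amount: log(0.17 · 18000 / 3000) = log(1.02) when the user is among the selected, and log(0.83 · 18000 / 15000) = log(0.996) when not, which accounts for the "smaller than" and "greater than" comparisons.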