

slide-1
SLIDE 1

Quick Growth through ML Model A/B Testing

Introduce eBay Experimentation Platform for the Paid Search Ads

  • Sleven Liu, Martin Zhang, Yi Liu
slide-2
SLIDE 2

Agenda

  • Why Growth hacking and A/B testing?
  • Search Ads: The most important marketing channel
  • Challenges and Solution for A/B testing
  • Machine Learning Models Integration

Hadoop Summit 2

slide-3
SLIDE 3

Quick Growth in the eBay Paid Marketing through A/B Testing & ML Model

  • 50+ Models/Year
  • 5+ Years
  • 60+ Experiments/Year

slide-4
SLIDE 4

Growth Hacking


Data, A/B test, Marketing

“Growth hackers are a hybrid of marketer and coder, one who… answers with A/B tests, landing pages, viral factor, email deliverability, and Open Graph. On top of this, they layer the discipline of direct marketing, with its emphasis on quantitative measurement, scenario modeling via spreadsheets, and a lot of database queries.”

  • Andrew Chen, 《Growth Hacker is the new VP Marketing》

slide-5
SLIDE 5

A/B Testing

  • Key Elements

– Statistical hypothesis
– Sampling

  • Benefits

– Customer vs. expertise
– Early launch and adoption in marketing
– Continuous delivery and integration
– Based on data and statistics

  • Limitations

– Statistical power
– Imbalance
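
The statistical hypothesis behind an A/B test can be illustrated with a two-proportion z-test on conversion rates. This is a minimal sketch, not the method used on the eBay platform (the slides do not name a specific test); the function name `two_proportion_z_test` and the counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Null hypothesis: control (A) and test (B) share the same rate.
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts only, not eBay data.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely under the null hypothesis; it says nothing about whether the two groups were sampled comparably, which is the imbalance limitation listed above.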


slide-6
SLIDE 6

Growth Hacking Channels

  • “Poor distribution, not product, is the number one cause of failure” – Peter Thiel, 《Zero to One》

Channels: UGC / SEO, Ads, Affiliate Net, Email, Viral Marketing

slide-7
SLIDE 7

Google Text Ads

  • Google Ads, CPC
  • Content

– Headline
– Display URL
– Description

  • SRP + Search Network
  • Exact vs. Broad match
  • Campaign Structure


slide-8
SLIDE 8

Google Product Listing Ads / Shopping Campaign

  • More info (price/picture), more qualified traffic
  • Catch more eyeballs
  • Product/Brand match
  • Higher barrier, less competition
  • Backend structure


slide-9
SLIDE 9

Challenges of A/B testing in the Paid Search Ads

Sampling

  • No control over users or visits
  • Accurate user targeting
  • Skewed data & low coverage

Test Setup

  • “Black box” of the third-party ads platform
  • Limited test objects

Tracking

  • External data loop

slide-10
SLIDE 10

A/B Testing Solution Example in the Text Ads

Sampling

  • Based on the keywords
  • Stratified sampling to resolve skewed data

Test Setup

  • Campaign structure management
  • Test object: bidding models

Tracking

  • Internal + external tracking
  • Data loop for the model
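
Stratified sampling on keywords can be sketched as follows: keywords are bucketed by historical click volume, and the control/test split is made inside each bucket so that both groups contain head and tail keywords in the same proportion. This is only an illustration under stated assumptions; the strata names, the click thresholds, and the helper names `click_stratum` and `stratified_split` are hypothetical and do not come from the slides.

```python
import random
from collections import defaultdict


def click_stratum(clicks):
    """Bucket a keyword by click volume (thresholds are illustrative assumptions)."""
    if clicks >= 1000:
        return "head"
    if clicks >= 10:
        return "torso"
    return "tail"


def stratified_split(keyword_clicks, test_fraction=0.5, seed=42):
    """Assign each keyword to "test" or "control" within its stratum.

    keyword_clicks: dict mapping keyword -> historical click count.
    Returns a dict mapping keyword -> "test" or "control".
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for keyword, clicks in keyword_clicks.items():
        strata[click_stratum(clicks)].append(keyword)

    assignment = {}
    for name in sorted(strata):
        keywords = sorted(strata[name])  # sorted so the split is reproducible
        rng.shuffle(keywords)
        cut = round(len(keywords) * test_fraction)
        for i, keyword in enumerate(keywords):
            assignment[keyword] = "test" if i < cut else "control"
    return assignment


# Illustrative data: a few hot keywords and many cold ones.
clicks = {f"hot{i}": 5000 for i in range(4)}
clicks.update({f"mid{i}": 50 for i in range(10)})
clicks.update({f"cold{i}": 1 for i in range(100)})
groups = stratified_split(clicks)
```

With plain random sampling over such skewed data, the handful of hot keywords could land mostly in one group and dominate the measured difference; splitting within strata avoids that.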


slide-11
SLIDE 11

Why is sampling important for A/B testing?

  • Choose the right sample size

– Does a large sample always speed up the A/B test, or does it put the business at real risk?

  • Choose the right method

– Why not just use random sampling?

  • An unrepresentative sample might hurt the business after rollout

– Does the model work for all ads, or only for the sampled ads?

  • A trustworthy sample makes the A/B result trustworthy

– Does the difference in the A/B result really come from the model, or from a difference in sampling?
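
The sample-size question can be made concrete with the standard normal-approximation formula for comparing two proportions. This is a rough sketch, assuming a two-sided test and equal group sizes; the function name `sample_size_per_group` and the default `alpha` and `power` values are conventional choices, not figures from the slides.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(base_rate, min_lift, alpha=0.05, power=0.8):
    """Approximate units needed per group to detect an absolute lift in a rate.

    base_rate: conversion rate of the control group.
    min_lift:  smallest absolute increase worth detecting.
    """
    p1 = base_rate
    p2 = base_rate + min_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / min_lift ** 2)


# Illustrative rates only: detect a lift from 2.0% to 2.5%.
n = sample_size_per_group(base_rate=0.02, min_lift=0.005)
print(n)
```

Halving the detectable lift roughly quadruples the required sample, which is why the test exposure has to be weighed against business risk.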


slide-12
SLIDE 12

Sampling Challenge – Huge volume of data

  • Billion-level ads
  • New ads sourcing: is the process scalable as more ads are added to marketing?
  • Ads history tracking: how does the process deal with the historical data?

slide-13
SLIDE 13

Sampling challenge – Skewed Data & Low Coverage

  • Top click queries
  • Long tail queries
  • Low conversion rate: Impression -> Click -> Transaction
  • Deal with ads that have no impressions on the partner

[Figure: Ad Count vs. Click Distribution (hot -> cold); series: ad count, total_ad]

[Figure: Ads Count by stage: Ads, Impression, Click, Valued Click]

slide-14
SLIDE 14

Sampling Solution - Method


slide-15
SLIDE 15

Sampling Solution - Tech

  • HBase + HDFS

– Active ads stored in HBase
– Ads history stored in HDFS

  • Spark

– Pre-aggregation of huge data volumes
– Optimized joins of huge data sets with ads history, user behavior…
– Data stored as Parquet to improve Spark job efficiency

slide-16
SLIDE 16

Machine Learning Model Integration

  • Where is the data?
  • What is a model?
  • How to manage the model lifecycle?

slide-17
SLIDE 17

Challenge for data

  • Data extraction
  • Data processing
  • Data gathering

  • Original Solution

– Regular ETL data pipeline to build factors for each model
– Move gathered factors to the model running environment based on the scenario

  • Bottleneck

– Some effort is duplicated among different models
– A factor is not reusable because it is built to meet one model’s requirements
– More effort to maintain factors, since they can come from different sources and are built for a specific model

slide-18
SLIDE 18

New Solution - Factor System

  • Factor: the model input
  • Heterogeneous data sources
  • Syntax + Semantic layer
  • Computed on Hadoop
  • Factor life-cycle


slide-19
SLIDE 19

What factor system provides

  • Register Service

– Factor code integration and deployment
– External factor registration

  • Download Service

– Online model input
– Offline data exploration and model development

  • Scheduling Service

– Schedule factor code in the factor system according to the latency of each data source

  • Dashboard

– Factor status monitor: shows the running status of factor code
– Factor meta definitions: help data scientists understand the factors when building models
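
The register and download services can be pictured with a toy in-memory registry. This is a hypothetical sketch to illustrate the idea of reusable, self-describing factors; the class names `Factor` and `FactorRegistry`, their methods, and the `ctr` factor are all assumptions, and the real system runs on Hadoop rather than in process memory.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Factor:
    name: str
    description: str                  # meta definition shown to data scientists
    source: str                       # label only, e.g. "hdfs" or "hbase"
    compute: Callable[[dict], float]  # factor code applied to one record


class FactorRegistry:
    """Toy stand-in for the register and download services."""

    def __init__(self):
        self._factors: Dict[str, Factor] = {}

    def register(self, factor: Factor) -> None:
        if factor.name in self._factors:
            raise ValueError(f"factor already registered: {factor.name}")
        self._factors[factor.name] = factor

    def describe(self, name: str) -> str:
        return self._factors[name].description

    def download(self, names: List[str], records: List[dict]) -> List[dict]:
        """Compute the requested factors for every record, one row per record."""
        return [
            {name: self._factors[name].compute(record) for name in names}
            for record in records
        ]


registry = FactorRegistry()
registry.register(Factor(
    name="ctr",
    description="Clicks divided by impressions (0.0 when there are no impressions)",
    source="hdfs",
    compute=lambda r: r["clicks"] / r["impressions"] if r["impressions"] else 0.0,
))
rows = registry.download(["ctr"], [{"clicks": 5, "impressions": 100}])
```

Because a factor is registered once with its own description, a second model can request `ctr` by name instead of rebuilding it in a separate ETL pipeline.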


slide-20
SLIDE 20

Capacity of Factor System

  • PB-level source data volume
  • 10+ TB daily increment
  • 1000+ permanent factors, with historical data backed up on HDFS
  • Use Cases

– Batch models: serve all the machine learning models for Paid IM marketing
– Ad hoc: support offline data exploration for data scientists and data developers
– NRT/Real-time (future): build a factor cache for NRT or real-time model use cases

slide-21
SLIDE 21

What model requires

[Diagram labels: Data Stream 1, Data Stream 2, Model Logic, Model result]

  • The model can access the data it needs based on the logical design
  • The model can be executed in the expected environment, using the right tech for each use case
  • The model result can be delivered for real business needs

slide-22
SLIDE 22

What is a model – Model Engine

  • Onboard data from the factor system to the model engine
  • Execute models using different tech solutions to meet real scenarios
  • Land results in different systems to integrate with the Ads publisher

slide-23
SLIDE 23

What more the model engine offers data scientists

  • Sampled data for model training

– Data scientists can get pre-sampled, representative ads to train/test the models

  • Real production factors access

– Avoids duplicated effort when developing new models with existing factors

  • Self Service

– Integration: a staging environment similar to production for model execution, to avoid integration issues after model deployment
– Model deployment
– Online debugging: all model results/logs are kept in the system so data scientists can debug during A/B testing

  • Dashboard

– Model status monitor

slide-24
SLIDE 24

Model Lifecycle (Batch)


slide-25
SLIDE 25

Model Lifecycle (NRT)


slide-26
SLIDE 26

Anything Else for model?

  • Is the model result reliable?

– “SafeNet”
– Collect the historical behavior of the model
– Detect any significant difference
– Block the result from being sent to the publisher

  • How to track?

– Ads Monitor & Alert
– Expose online model results to scientists/analysts
– Dashboard
– Hourly & daily reports
– Alerts delivered to the model owner & business owner
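
The “SafeNet” steps (collect history, detect a significant difference, block the send) could look roughly like the check below. This is a sketch under assumptions: the slides do not say how “significant difference” is measured, so a simple distance from the historical mean in standard deviations is used here, and the names `is_result_safe`, `publish_if_safe`, and `max_sigma` are hypothetical.

```python
from statistics import mean, stdev


def is_result_safe(history, new_value, max_sigma=3.0):
    """Check a summary of the new model output against its own history.

    history:   past values of one summary metric, e.g. the daily average bid.
    new_value: the same metric for the result about to be published.
    """
    if len(history) < 2:
        return False  # assumption: block when there is too little history
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return new_value == mu
    return abs(new_value - mu) <= max_sigma * sigma


def publish_if_safe(history, new_value, send):
    """Call send(new_value) only when the check passes; return whether it was sent."""
    if is_result_safe(history, new_value):
        send(new_value)
        return True
    return False


# Illustrative values only: the average bid has hovered around 1.00.
past_avg_bids = [1.00, 1.05, 0.95, 1.02, 0.98]
sent = []
publish_if_safe(past_avg_bids, 1.03, sent.append)  # close to history
publish_if_safe(past_avg_bids, 2.00, sent.append)  # far from history
```

A blocked result would then feed the alerting path listed above, so the model owner can decide whether the jump is a bug or a genuine change.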
slide-27
SLIDE 27

Summary

  • A/B Testing

– HBase, HDFS, MySQL, Oracle, Mongo
– Java, Scala, SQL

  • Machine learning model

– HDFS, Kafka, Cassandra
– Hive, Spark, Spark Streaming
– Java, Scala, R, Python

  • Dashboard

– InfluxDB
– Grafana
