
Quick Growth through ML Model A/B Testing: Introduce eBay Experimentation Platform for the Paid Search Ads



  1. Quick Growth through ML Model A/B Testing: Introduce eBay Experimentation Platform for the Paid Search Ads
     Sleven Liu, Martin Zhang, Yi Liu

  2. Agenda
     • Why Growth hacking and A/B testing?
     • Search Ads: The most important marketing channel
     • Challenges and Solution for A/B testing
     • Machine Learning Models Integration
     Hadoop Summit

  3. Quick Growth in the eBay Paid Marketing through A/B Testing & ML Model
     • 60+ Experiments/Year
     • 5+ Years
     • 50+ Models/Year

  4. Growth Hacking
     “Growth hackers are a hybrid of marketer and coder, one who … answers with A/B tests, landing pages, viral factor, email deliverability, and Open Graph. On top of this, they layer the discipline of direct marketing, with its emphasis on quantitative A/B test measurement, scenario modeling via spreadsheets, and a lot of database queries.”
     Source: 《Growth Hacker is the new VP Marketing》, Andrew Chen
     [Diagram labels: Marketing, Data]

  5. A/B Testing
     • Key Elements
       – Statistical hypothesis
       – Sampling
     • Benefits
       – Customer vs. expertise
       – Early launch and adoption in marketing
       – Continuous delivery and integration
       – Decisions based on data and statistics
     • Limitations
       – Statistical power
       – Imbalanced groups
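
As an illustration of the "statistical hypothesis" element, the sketch below runs a two-sided two-proportion z-test on conversion counts from a control group (A) and a treatment group (B). The function name and the sample counts are assumptions for illustration only; the slides do not say which test the platform uses.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 200 of 10,000 clicks convert in A, 260 of 10,000 in B.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but only if the two groups were sampled comparably, which is the concern the later sampling slides address.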

  6. Growth Hacking Channels
     “Poor distribution, not product, is the number one cause of failure” (Peter Thiel, 《Zero to One》)
     • Viral Marketing
     • Affiliate
     • Email
     • Net Ads
     • UGC / SEO

  7. Google Text Ads
     • Google Ads, CPC
     • Content
       – Headline
       – Display URL
       – Description
     • SRP + Search Network
     • Exact vs. Broad match
     • Campaign Structure

  8. Google Product Listing Ads / Shopping Campaign
     • More info (price/picture), more qualified traffic
     • Catch more eyeballs
     • Product/Brand match
     • Higher barrier, less competition
     • Backend structure

  9. Challenges of A/B Testing in Paid Search Ads
     • Sampling
       – No control over users/visits
       – Accurate user targeting
       – Skewed data & low coverage
     • Test Setup
       – “Black box” on the third-party ads platform
       – Limitations on test objects
     • Tracking
       – External data loop

  10. A/B Testing Solution Example in Text Ads
      • Sampling
        – Based on keywords
        – Stratified sampling to resolve skewed data
      • Test Setup
        – Campaign structure management
        – Test object: bidding models
      • Tracking
        – Internal + external tracking
        – Data loop for the model
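
The keyword-based stratified sampling idea can be sketched as follows: keywords are bucketed into strata by historical click volume, and each stratum is split evenly between groups A and B, so that a few hot keywords cannot all land in one group. This is a minimal sketch under assumptions: the bucket edges, the function name `stratified_split`, and the synthetic click counts are all hypothetical and not taken from the slides.

```python
import random
from collections import Counter, defaultdict


def stratified_split(keywords, clicks, bucket_edges, seed=0):
    """Assign each keyword to 'A' or 'B', balancing groups within each click stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for kw in keywords:
        # Stratum index = number of bucket edges the keyword's click count reaches.
        stratum = sum(clicks[kw] >= edge for edge in bucket_edges)
        strata[stratum].append(kw)

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        # Alternate A/B inside the stratum so each group gets about half of it.
        for i, kw in enumerate(members):
            assignment[kw] = "A" if i % 2 == 0 else "B"
    return assignment


# Synthetic, heavily skewed click counts (hypothetical data).
clicks = {f"kw{i}": 10_000 // (i + 1) for i in range(100)}
assignment = stratified_split(list(clicks), clicks, bucket_edges=[10, 100, 1000])
print(Counter(assignment.values()))
```

Plain random sampling over the same keywords could, by chance, put most of the top-click keywords in one group, which would make the A/B difference reflect the sample rather than the bidding model.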

  11. Why is sampling important for A/B testing?
      • Choose the right sample size
        – Does a large sample always speed up the A/B test, or does it put the business at real risk?
      • Choose the right method
        – Why not just use random sampling?
      • An unrepresentative sample might hurt the business after rollout
        – Does the model work for all ads, or only for the sampled ads?
      • A trustworthy sample makes the A/B result trustworthy
        – Does the difference in the A/B test result really come from the model, or from the sampling difference?
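
To make the sample-size question concrete, here is a minimal sketch of the standard normal-approximation formula for comparing two conversion rates. The baseline rate, the lift, and the default significance and power levels are assumptions chosen for illustration; the slides do not state which values the team used.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate observations needed per group to detect a relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical scenario: 2% baseline conversion rate, 10% relative lift.
n = sample_size_per_group(baseline_rate=0.02, relative_lift=0.10)
print(f"observations needed per group: {n}")
```

The required size grows quickly as the detectable lift shrinks, which is the trade-off behind the first question above: a larger test group reaches significance sooner but exposes more traffic to an unproven model.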

  12. Sampling Challenge: Huge Volume of Data
      • Billions of ads
      • New ads sourcing: is the process scalable as more ads are added to marketing?
      • Ads history tracking: how does the process deal with historical data?

  13. Sampling Challenge: Skewed Data & Low Coverage
      [Figure: Click Distribution (hot -> cold); x-axis: ad count (0 to 25,000,000), y-axis: percentage (0% to 100%); labels: ADS, IMPRESSION, CLICK, VALUED CLICK; legend: ad count, total_ad]
      • Low conversion rate
        – Impression -> Click -> Transaction
      • Top click queries
      • Long tail queries
      • Deal with ads with no impression on partner

  14. Sampling Solution - Method

  15. Sampling Solution - Tech
      • HBase + HDFS
        – Active ads stored in HBase
        – Ads history stored in HDFS
      • Spark
        – Pre-aggregation of huge data volumes
        – Optimized joins of huge data with ads history, user behavior…
        – Data stored as Parquet to improve Spark job efficiency

  16. Machine Learning Model Integration
      • Where is the data?
      • What is a model?
      • How to manage the model lifecycle?

  17. Challenges for Data
      • Data extraction
      • Data processing
      • Data gathering
      • Original solution
        – Regular ETL data pipeline to build factors for each model
        – Move gathered factors to the model running environment, depending on the scenario
      • Bottlenecks
        – Some effort is duplicated across different models
        – Factors are not reusable because each is built to meet a specific model's requirements
        – Factors take more effort to maintain because they can come from different sources and are built for a specific model

  18. New Solution - Factor System
      • Factor: the model input
      • Heterogeneous data sources
      • Syntax + semantic layer
      • Computed on Hadoop
      • Factor life-cycle

  19. What the Factor System Provides
      • Register Service
        – Factor code integration and deployment
        – External factor registration
      • Download Service
        – Online model input
        – Offline data exploration and model development
      • Scheduling Service
        – Schedules the factor code in the factor system according to the latency of each source's data
      • Dashboard
        – Factor status monitor: helps users understand the running status of the factor code
        – Factor meta definition: helps data scientists better understand the factors when building models
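
The register, download, and scheduling services can be pictured with a toy in-memory registry. Everything here is hypothetical: the class names, fields, and the rule of ordering factors by source latency are assumptions made for illustration, not the actual eBay factor system API.

```python
from dataclasses import dataclass, field


@dataclass
class Factor:
    name: str           # unique factor name
    source: str         # where the raw data comes from
    description: str    # meta definition shown to data scientists
    latency_hours: int  # how late the source data usually arrives


@dataclass
class FactorRegistry:
    factors: dict = field(default_factory=dict)

    def register(self, factor):
        """Register service: add a factor once, rejecting duplicate names."""
        if factor.name in self.factors:
            raise ValueError(f"factor already registered: {factor.name}")
        self.factors[factor.name] = factor

    def download(self, names):
        """Download service: return the requested factor definitions."""
        return [self.factors[name] for name in names]

    def schedule(self):
        """Scheduling service: order factor jobs by source data latency."""
        return sorted(self.factors.values(), key=lambda f: f.latency_hours)


registry = FactorRegistry()
registry.register(Factor("clicks_7d", "click logs", "clicks in the last 7 days", 2))
registry.register(Factor("txn_30d", "transaction tables", "transactions in the last 30 days", 24))
print([f.name for f in registry.schedule()])
```

Because factors are registered once and looked up by name, several models can share the same factor instead of each rebuilding it in its own ETL pipeline.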

  20. Capacity of the Factor System
      • PB-level source data volume
      • 10+ TB daily increment
      • 1000+ permanent factors, with historical data backed up on HDFS
      • Use cases
        – Batch models: serve all the machine learning models for Paid IM marketing
        – Ad hoc: support offline data exploration for data scientists and data developers
        – NRT/real-time (future): build a factor cache for NRT or real-time model use cases

  21. What a Model Requires
      • The model can access the wanted data based on the logical design
      • The model can be executed in the expected environment, using the right tech to meet different use cases
      • The model result can be delivered for real business needs
      [Diagram labels: Data Stream 1, Data Stream 2, Model Logic, Model result]

  22. What Is a Model: Model Engine
      • Onboards data from the factor system to the model engine
      • Executes models using different tech solutions to meet real scenarios
      • Lands results in different systems to integrate with the ads publisher

  23. What Else the Model Engine Offers Data Scientists
      • Sampled data for model training
        – Data scientists can get pre-sampled, representative ads to train/test models
      • Access to real production factors
        – Avoids duplicated effort when data scientists develop new models with existing factors
      • Self service
        – Integration: a staging environment similar to production for model execution, to avoid integration issues after model deployment
        – Model deployment
        – Online debugging: all model results/logs are kept in the system so data scientists can debug during A/B testing
      • Dashboard
        – Model status monitor

  24. Model Lifecycle (Batch)

  25. Model Lifecycle (NRT)

  26. Anything Else for the Model?
      • Is the model result reliable?
        – “SafeNet”
          • Collect the historical behavior of the model
          • Detect any significant difference
          • Block the result from being sent to the publisher
      • How to track?
        – Ads Monitor & Alert
          • Expose online model results to scientists/analysts
          • Dashboard
          • Hourly & daily reports
          • Alerts delivered to the model owner & business owner
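
The three “SafeNet” steps (collect history, detect a significant difference, block the result) can be sketched with a simple z-score guard on a summary statistic of the model output, such as the daily mean bid. This is only a sketch under assumptions: the function name, the threshold of three standard deviations, and the sample history are hypothetical, and the slides do not describe the actual detection rule.

```python
from statistics import mean, stdev


def safenet_allows(historical_daily_means, new_daily_mean, max_z=3.0):
    """Return True if the new model result may be sent to the publisher."""
    hist_mean = mean(historical_daily_means)
    hist_std = stdev(historical_daily_means)
    if hist_std == 0:
        # No historical variation: allow only an exact match.
        return new_daily_mean == hist_mean
    # Distance of today's output from the historical behavior, in standard deviations.
    z = abs(new_daily_mean - hist_mean) / hist_std
    return z <= max_z


# Hypothetical history of the model's daily mean bid.
history = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97]
for candidate in (1.01, 1.50):
    action = "send to publisher" if safenet_allows(history, candidate) else "block and alert"
    print(f"daily mean {candidate}: {action}")
```

A blocked result would then feed the alerting path listed under “How to track?”, so the model owner can inspect it before anything reaches the ads publisher.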

  27. Summary
      • A/B Testing
        – HBase, HDFS, MySQL, Oracle, MongoDB
        – Java, Scala, SQL
      • Machine learning model
        – HDFS, Kafka, Cassandra
        – Hive, Spark, Spark Streaming
        – Java, Scala, R, Python
      • Dashboard
        – InfluxDB
        – Grafana

