Applying Design To Solve Scaling Problems and Evolve an Architecture

Introducing the Bidder-as-a-Service
Applying Design To Solve Scaling Problems and Evolve an Architecture
DataEngConf, NYC, Oct. 30, 2017
Mark Weiss, Senior Software Engineer
mark@beeswax.com
@marksweiss

What is Beeswax?

We Built a Better Bidder
About Beeswax
● Beeswax is a 3-year-old ad tech startup based in NYC
● Founded by three ex-Googlers, CEO has deep roots in ad tech
● 40 employees in NYC and London
Why we are Different
● Customers get the benefits of a custom bidder stack, without the development and operating cost and risk
● Give customers access to all of their data
● Provide APIs for customers to customize bidding strategy, API-driven
● SaaS model and pricing, customers pay to use the platform

RTB: Real Time Bidding (AKA "Please Let Us Do This")
[Diagram: Publisher, Ad Exchange, Auction, Beeswax Bidder; timing label: < 200 ms]
● Step 1: Send ad request & userid
● Step 2: Broadcast bid request
● Step 3: Submit bid & ad markup
● Step 4: Show ad to user
Beeswax Bidder
● Scale: 1M QPS
● Latency_99: 20 ms
● Target campaigns
● Target user profiles
● Optimize for ROI
● Customize

What is the Beeswax Data Platform?

Beeswax Data Platform
[Diagram labels: Bid Data; Impression, Click and other Event Data; Event Ingestion; Customer Raw Event Log Data; S3; Event Processing (Join, Normalize, Aggregate); Customer Normalized Data; Redshift; Customer Reports]

Beeswax Data Platform: Event Stream
[Platform diagram with a callout on the event stream]
● Python Web App
● Input: HTTP/JSON
● Output: Protobuf
● Kinesis

Beeswax Data Platform: Event Processing
[Platform diagram with a callout]
● Custom Java KCL App
● Input: Protobuf
● Output: CSV
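The ingestion stages above take structured events and emit flat CSV rows. A minimal sketch of that kind of normalization, using only the Python standard library: the field names (`event_type`, `event_id`, `timestamp`) are hypothetical, the input is JSON rather than Protobuf, and Kinesis is left out entirely. The slides describe a custom Java KCL app for this step, so this is an illustration of the idea, not the Beeswax implementation.

```python
import csv
import io
import json

# Hypothetical column order for the normalized output; not from the slides.
FIELDS = ["event_type", "event_id", "timestamp"]


def event_to_csv_row(raw_json: str) -> str:
    """Parse one JSON event and return it as a single CSV line."""
    event = json.loads(raw_json)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Missing fields become empty strings so every row has the same width.
    writer.writerow([event.get(field, "") for field in FIELDS])
    return buf.getvalue()


row = event_to_csv_row(
    '{"event_type": "click", "event_id": "e1", "timestamp": 1509364800}'
)
```

Fixing the column order in one place keeps every row aligned with the target table, whichever fields a given event happens to carry.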

Beeswax Data Platform: Event Processing
[Platform diagram with a callout]
● AWS Data Pipeline
● AWS Redshift/SQL
● Custom Python libs
● Python Activities

What Was the State of the System?

Event Join and Aggregation ("Everything Looks Good …")
[Diagram labels: Bids; Honeycomb; Joining; Fact Table: Impressions and Impression Details; Aggregation; Clicks, Conversions; Other Impression Data (added in the second build of the slide)]

Pipeline Problems: Monolithic and Inflexible
[Diagram: two pipelines, each with Step 1, Step 2, Step 3, and a single Target Table]

We were a lucky startup with a bunch of "good problems to have"

System Goals for Architectural Evolution
● Support separate pipelines writing to the same target tables
● Support any pipeline depending on the data from any other
● Centralize job-level state management and job control
● Continue to use the existing platform technologies … for now

From Goals to Principles to Patterns to Design

Goals to Principles: Remove Contention
● Goal: Multiple asynchronous pipelines with no write contention
  Principle: Jobs always write to new versioned instances of target tables
● Goal: Multiple pipelines land data in same master fact table
  Principle: One job per master target table reads from multiple sources and writes into the target table sequentially
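One way to picture "new versioned instances of target tables" is a naming scheme where each job run gets its own table. The sketch below is an assumption for illustration only: the `_v<UTC timestamp>` suffix and both helper functions are hypothetical, and the slides do not say how Beeswax names its versioned tables.

```python
from datetime import datetime, timezone
from typing import List, Optional


def versioned_table_name(base: str, completed_at: datetime) -> str:
    """Build a per-run table name such as 'staging_a_v20171030T120000'."""
    stamp = completed_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"{base}_v{stamp}"


def latest_version(table_names: List[str], base: str) -> Optional[str]:
    """Return the newest versioned table for `base`, or None if there is none.

    The timestamp suffix is fixed-width, so the lexicographic maximum
    is also the most recent version.
    """
    prefix = f"{base}_v"
    candidates = [name for name in table_names if name.startswith(prefix)]
    return max(candidates) if candidates else None


run_1 = versioned_table_name(
    "staging_a", datetime(2017, 10, 30, 12, 0, 0, tzinfo=timezone.utc)
)
run_2 = versioned_table_name(
    "staging_a", datetime(2017, 10, 30, 13, 0, 0, tzinfo=timezone.utc)
)
newest = latest_version([run_1, run_2, "staging_b_v20171030T110000"], "staging_a")
```

Because no two runs share a table name, concurrent jobs never write to the same table, which is the point of the first principle.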

Principles to Patterns: Remove Contention
[Diagram]
● Input Data Set A (two inputs shown) → Data Pipeline Job A → Staging Table A
● Input Data Set B (two inputs shown) → Data Pipeline Job B → Staging Table B
● Staging Tables A and B → Gather Data Pipeline Job → Target Fact Table
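The scatter/gather shape in this diagram can be sketched in a few lines. Here an in-memory SQLite database stands in for Redshift, and the table and column names (`staging_a_v1`, `fact_events`, `event_id`, `clicks`) are hypothetical. Each scatter job writes only to its own staging table, and a single gather job copies the staging tables into the target one after another.

```python
import sqlite3


def scatter(conn, staging_table, rows):
    """One pipeline job: write its output to its own staging table."""
    conn.execute(f"CREATE TABLE {staging_table} (event_id TEXT, clicks INTEGER)")
    conn.executemany(f"INSERT INTO {staging_table} VALUES (?, ?)", rows)
    conn.commit()


def gather(conn, target_table, staging_tables):
    """The single writer for the target table: load staging tables in sequence."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {target_table} (event_id TEXT, clicks INTEGER)"
    )
    for staging_table in staging_tables:
        conn.execute(
            f"INSERT INTO {target_table} SELECT event_id, clicks FROM {staging_table}"
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
scatter(conn, "staging_a_v1", [("e1", 1), ("e2", 0)])
scatter(conn, "staging_b_v1", [("e3", 2)])
gather(conn, "fact_events", ["staging_a_v1", "staging_b_v1"])
total = conn.execute("SELECT COUNT(*) FROM fact_events").fetchone()[0]
```

Only `gather` ever touches `fact_events`, so the scatter jobs can run in any order or in parallel without contending for the target table.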

Goals to Principles: Job Composition and Job State
● Goal: Any job can depend on any other job
  Principle: Jobs record completion of uniquely identifiable, timestamped data sets into one source of truth for all jobs
● Goal: Jobs always consume the most recent source data available
  Principle: Jobs can query one source of truth to discover the most recent data sets available upon which they depend
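A single source of truth for job state can be as small as one table with two operations: record a completed data set, and look up the most recent one of a given type. The sketch below uses an in-memory SQLite database as a stand-in for a relational store; the `job_state` schema and both function names are assumptions for illustration, not the schema from the talk.

```python
import sqlite3

# Hypothetical schema: one row per completed, timestamped data set.
SCHEMA = """
CREATE TABLE job_state (
    data_set_type TEXT NOT NULL,
    completed_at  TEXT NOT NULL,  -- ISO-8601 UTC, so text order is time order
    table_name    TEXT NOT NULL,
    PRIMARY KEY (data_set_type, completed_at)
)
"""


def record_completion(conn, data_set_type, completed_at, table_name):
    """Called by a job once its output data set is fully written."""
    conn.execute(
        "INSERT INTO job_state VALUES (?, ?, ?)",
        (data_set_type, completed_at, table_name),
    )
    conn.commit()


def most_recent(conn, data_set_type):
    """Return the table holding the newest data set of this type, or None."""
    row = conn.execute(
        "SELECT table_name FROM job_state "
        "WHERE data_set_type = ? ORDER BY completed_at DESC LIMIT 1",
        (data_set_type,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_completion(conn, "A", "2017-10-30T12:00:00Z", "staging_a_v1")
record_completion(conn, "A", "2017-10-30T13:00:00Z", "staging_a_v2")
```

A downstream job calls `most_recent(conn, "A")` instead of knowing which upstream job produced the data, which is what lets any job depend on any other.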

Principles to Patterns: Job Composition and Job State
[Diagram, built up over three slides]
● Build 1: Scatter Job writes Staging Table A (Data Set Type A, Version 1, Time 1). Global Job State holds: Data Set Time 1 Type A
● Build 2: Scatter Job writes Staging Table A (Data Set Type A, Version 2, Time 2). Global Job State holds: Data Set Time 1 Type A; Data Set Time 2 Type A
● Build 3: Gather Data Job consumes most recent data. Garbage Collection Job DROPs less recent data
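The garbage-collection step can be sketched as a pure function over the job-state records: keep the newest version of each data set type and plan a DROP for the rest. The record layout `(data_set_type, timestamp, table_name)`, the `keep` parameter, and the table names are hypothetical.

```python
def plan_garbage_collection(registry, keep=1):
    """Return the table names that are safe to drop.

    `registry` is a list of (data_set_type, timestamp, table_name) tuples.
    For each type, the `keep` newest versions are retained.
    """
    by_type = {}
    for data_set_type, timestamp, table_name in registry:
        by_type.setdefault(data_set_type, []).append((timestamp, table_name))

    to_drop = []
    for versions in by_type.values():
        versions.sort(reverse=True)  # newest first
        to_drop.extend(name for _, name in versions[keep:])
    return sorted(to_drop)


registry = [
    ("A", 1, "staging_a_v1"),
    ("A", 2, "staging_a_v2"),
    ("B", 1, "staging_b_v1"),
]
drops = plan_garbage_collection(registry)
statements = [f"DROP TABLE IF EXISTS {name}" for name in drops]
```

Separating the plan from the execution makes the policy easy to test before any table is actually dropped.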

Patterns to Design: Job Composition and Job State
[Diagram: Scatter Job A, Scatter Job B, Global Job State, Gather Data Pipeline Job]
● Scatter Job A: (Data Set Type A, timestamp 1, processing_window)
● Global Job State: (Data Set Type A, timestamp 1, processing_window), (Data Set Type B, timestamp 1, processing_window)
● A later data set is also shown: (Data Set Type A, timestamp 2, processing_window)
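Given records shaped like the tuples on this slide, a gather job's input selection might look like the sketch below: for each data set type it depends on, pick the record with the latest timestamp, and fail loudly if a dependency has never been produced. The `DataSetRecord` type, the `resolve_inputs` function, and the string form of `processing_window` are assumptions; the slides do not define the window's format.

```python
from typing import Dict, Iterable, NamedTuple


class DataSetRecord(NamedTuple):
    data_set_type: str
    timestamp: int
    processing_window: str  # hypothetical format, e.g. "2017-10-30T12/PT1H"


def resolve_inputs(
    records: Iterable[DataSetRecord], required_types: Iterable[str]
) -> Dict[str, DataSetRecord]:
    """Pick the most recent record for every required data set type."""
    required = set(required_types)
    latest: Dict[str, DataSetRecord] = {}
    for record in records:
        if record.data_set_type not in required:
            continue
        current = latest.get(record.data_set_type)
        if current is None or record.timestamp > current.timestamp:
            latest[record.data_set_type] = record

    missing = required - set(latest)
    if missing:
        raise LookupError(f"no data set recorded for: {sorted(missing)}")
    return latest


records = [
    DataSetRecord("A", 1, "window-1"),
    DataSetRecord("B", 1, "window-1"),
    DataSetRecord("A", 2, "window-2"),
]
inputs = resolve_inputs(records, ["A", "B"])
```

Here the gather job would read version 2 of type A and version 1 of type B, without knowing which scatter jobs wrote them.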

Implementing the Design with What we Have on Hand

Implementing the Design
[Diagram labels: Data Pipeline Jobs; Python API; Global Job State Tables; RDS (MySQL); Redshift DDL]
● AWS Data Pipeline
● Python
● Redshift SQL

Conclusions
● You can evolve data architecture without adopting new technology
● Carefully chosen invariants define a design that can solve present problems and support future flexibility
● Invariants are system Goals
● Identifying Goals suggests Principles
● Patterns embody Principles
● Design applies Patterns

Questions?
Mark Weiss, Senior Software Engineer
mark@beeswax.com
@marksweiss
● We have a great team!
● We have lots of fun problems to solve!
● We have LaCroix and Kind Bars!
● We're hiring! https://www.beeswax.com/careers/
