Applying Design To Solve Scaling Problems and Evolve an Architecture

SLIDE 1

Introducing the Bidder-as-a-Service

Applying Design To Solve Scaling Problems and Evolve an Architecture

DataEngConf NYC, Oct. 30, 2017

Mark Weiss, Senior Software Engineer, mark@beeswax.com, @marksweiss

SLIDE 2

What is Beeswax?

SLIDE 3

We Built a Better Bidder

About Beeswax

  • Beeswax is a 3-year-old ad tech startup based in NYC
  • Founded by three ex-Googlers; the CEO has deep roots in ad tech
  • 40 employees in NYC and London

Why We Are Different

  • Customers get the benefits of a custom bidder stack without the development and operating cost and risk
  • Customers get access to all of their data
  • API-driven: customers use APIs to customize their bidding strategy
  • SaaS model and pricing: customers pay to use the platform
SLIDE 4

RTB: Real Time Bidding (AKA "Please Let Us Do This")

Participants: Publisher, Ad Exchange, Beeswax Bidder

Scale: 1M QPS; Latency_99 (99th-percentile latency): 20 ms

  • Target campaigns
  • Target user profiles
  • Optimize for ROI
  • Customize

< 200 ms

Step 1: Send ad request & user ID
Step 2: Broadcast bid request
Step 3: Submit bid & ad markup
Step 4: Show ad to user

Auction (run by the Ad Exchange)
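The slide quotes a 99th-percentile latency target of 20 ms at 1M QPS. As a hedged illustration of what that figure means, here is a minimal nearest-rank p99 computation over a batch of response-time samples. The `P99_BUDGET_MS` name and the sample values are hypothetical; this is not Beeswax's monitoring code.

```python
import math
from typing import Sequence

P99_BUDGET_MS = 20.0  # hypothetical name for the slide's Latency_99 target


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples <= it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_p99_budget(samples_ms: Sequence[float], budget_ms: float = P99_BUDGET_MS) -> bool:
    """True if the p99 of the samples is within the latency budget."""
    return percentile_nearest_rank(samples_ms, 99) <= budget_ms


# 100 samples: 98 fast responses and 2 slow ones.
samples = [5.0] * 98 + [30.0, 45.0]
print(percentile_nearest_rank(samples, 99))  # 30.0
print(meets_p99_budget(samples))             # False
```

Two slow responses out of 100 are enough to push p99 over budget, which is why tail latency, not the mean, is the number quoted.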

SLIDE 5

What is the Beeswax Data Platform?

SLIDE 6

Beeswax Data Platform

Event Ingestion: Impression, Click and Other Event Data

Event Processing: Customer Raw Event Data; S3; Redshift; Bid Data; Join, Normalize, Aggregate; Customer Normalized Log Data; Customer Reports

SLIDE 7

Beeswax Data Platform: Event Stream

Event Ingestion: Impression, Click and Other Event Data

Event Processing: Customer Raw Event Data; S3; Redshift; Bid Data; Join, Normalize, Aggregate; Customer Normalized Log Data; Customer Reports

Event stream: Python Web App (Input: HTTP/JSON; Output: Protobuf) → Kinesis
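The event-stream slide describes a Python web app that accepts HTTP/JSON events and emits Protobuf records to Kinesis. A minimal sketch of that ingestion step, under stated assumptions: the event fields (`event_type`, `auction_id`) and allowed types are hypothetical, compact JSON bytes stand in for Protobuf (which would need generated message classes), and the function only builds the keyword arguments for boto3's `Kinesis.Client.put_record` instead of calling AWS.

```python
import json
from typing import Any, Dict

REQUIRED_FIELDS = ("event_type", "auction_id")  # hypothetical schema
ALLOWED_TYPES = {"impression", "click", "conversion"}  # hypothetical


def build_kinesis_record(raw_body: bytes, stream_name: str) -> Dict[str, Any]:
    """Validate one HTTP/JSON event and build put_record keyword arguments."""
    event = json.loads(raw_body)
    missing = [f for f in REQUIRED_FIELDS if f not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if event["event_type"] not in ALLOWED_TYPES:
        raise ValueError(f"unknown event_type: {event['event_type']}")
    # Stand-in serialization; the system in the talk emits Protobuf here.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "StreamName": stream_name,
        "Data": payload,
        # Assumption: keying by auction_id sends all events of one auction to the same shard.
        "PartitionKey": str(event["auction_id"]),
    }


record = build_kinesis_record(b'{"event_type": "click", "auction_id": "a-123"}', "events")
# With AWS credentials configured, this could be sent with:
#   boto3.client("kinesis").put_record(**record)
```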

SLIDE 8

Beeswax Data Platform: Event Processing

Event Ingestion: Impression, Click and Other Event Data

Event Processing: Customer Raw Event Data; S3; Redshift; Bid Data; Join, Normalize, Aggregate; Customer Normalized Log Data; Customer Reports

Event processing: Custom Java KCL (Kinesis Client Library) App (Input: Protobuf; Output: CSV)
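The event-processing slide lists a custom Java KCL app that reads Protobuf and writes CSV. The app itself is Java; as a language-neutral sketch of just the output side, this Python function renders already-decoded event records as headerless CSV in a fixed column order, the kind of file that could later be loaded into Redshift with `COPY` and its `CSV` option. The column names and sample records are hypothetical.

```python
import csv
import io
from typing import Iterable, Mapping

# Hypothetical column order for the normalized CSV output.
COLUMNS = ("auction_id", "event_type", "event_time")


def records_to_csv(records: Iterable[Mapping[str, object]]) -> str:
    """Render decoded event records as CSV text with a fixed column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n")
    for rec in records:
        # Fields not in COLUMNS are dropped; missing fields are written as "".
        writer.writerow(rec)
    return buf.getvalue()


decoded = [
    {"auction_id": "a-1", "event_type": "impression", "event_time": "2017-10-30T12:00:00Z"},
    {"auction_id": "a-1", "event_type": "click", "event_time": "2017-10-30T12:00:05Z", "debug": "x"},
]
print(records_to_csv(decoded), end="")
```

No header row is written, so a loader does not need to skip one; a fixed column order keeps every file compatible with one target table definition.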

SLIDE 9

Beeswax Data Platform: Event Processing

Event Ingestion: Impression, Click and Other Event Data

Event Processing: Customer Raw Event Data; S3; Redshift; Bid Data; Join, Normalize, Aggregate; Customer Normalized Log Data; Customer Reports

Technologies:

  • AWS Data Pipeline
  • AWS Redshift/SQL
  • Custom Python libs
  • Python Activities
SLIDE 10

What Was the State of the System?

SLIDE 11

Event Join and Aggregation ("Everything Looks Good …")

Inputs: Bids; Impressions; Clicks, Conversions

Honeycomb Joining and Aggregation

Fact Table: Impression Details

SLIDE 12

Event Join and Aggregation ("Everything Looks Good …")

Inputs: Bids; Impressions; Clicks, Conversions

Honeycomb Joining and Aggregation

Fact Table: Impression Details

Other Impression Data
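These slides show bids, impressions, and clicks/conversions flowing through one joining-and-aggregation step into an impression-details fact table. A minimal in-memory sketch of that join, assuming (hypothetically) that every event carries an `auction_id` key and that each impression has at most one winning bid; the pipeline in the talk runs this kind of step with Redshift SQL and Python activities, not this code.

```python
from collections import defaultdict
from typing import Dict, List


def build_impression_details(
    bids: List[dict], impressions: List[dict], clicks: List[dict]
) -> List[dict]:
    """Join bids and click counts onto impressions by auction_id (hypothetical key)."""
    bid_by_auction = {b["auction_id"]: b for b in bids}
    clicks_by_auction: Dict[str, int] = defaultdict(int)
    for c in clicks:
        clicks_by_auction[c["auction_id"]] += 1

    rows = []
    # Impressions drive the fact table: bids that never won an impression are skipped.
    for imp in impressions:
        aid = imp["auction_id"]
        bid = bid_by_auction.get(aid, {})
        rows.append({
            "auction_id": aid,
            "bid_price": bid.get("bid_price"),  # None if the bid record is missing
            "clicks": clicks_by_auction.get(aid, 0),
        })
    return rows


bids = [{"auction_id": "a1", "bid_price": 2.5}, {"auction_id": "a2", "bid_price": 1.0}]
impressions = [{"auction_id": "a1"}]
clicks = [{"auction_id": "a1"}, {"auction_id": "a1"}]
print(build_impression_details(bids, impressions, clicks))
```

Every new data source ("Other Impression Data") adds another input to this single step, which is how one monolithic join becomes hard to extend.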

SLIDE 13

Pipeline Problems: Monolithic and Inflexible

Diagram: two pipelines, each running Step 1 → Step 2 → Step 3, both writing to the same Target Table

SLIDE 14

We were a lucky startup with a bunch of "good problems to have"

SLIDE 15

System Goals for Architectural Evolution

  • Support separate pipelines writing to the same target tables
  • Support any pipeline depending on the data from any other
  • Centralize job-level state management and job control
SLIDE 16

System Goals for Architectural Evolution

  • Support separate pipelines writing to the same target tables
  • Support any pipeline depending on the data from any other
  • Centralize job-level state management and job control
  • Continue to use the existing platform technologies … for now
SLIDE 17

From Goals to Principles to Patterns to Design

SLIDE 18

Goals to Principles: Remove Contention

Goal: Multiple asynchronous pipelines with no write contention
Principle: Jobs always write to new versioned instances of target tables

Goal: Multiple pipelines land data in the same master fact table
Principle: One job per master target table reads from multiple sources and writes into the target table sequentially

SLIDE 19

Principles to Patterns: Remove Contention

Input Data Set A → Data Pipeline Job A → Staging Table A

Input Data Set B → Data Pipeline Job B → Staging Table B

Staging Tables A and B → Gather Data Pipeline Job → Target Fact Table
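The scatter/gather pattern can be sketched as SQL generation: each scatter job writes a fresh, versioned staging table, so no two jobs ever write the same table, and a single gather job per target fact table loads the staging tables into the target one statement at a time. The table names, the `_v{timestamp}` suffix, and the `CREATE TABLE ... AS` / `INSERT ... SELECT` forms are assumptions for illustration, not the talk's actual DDL; `SELECT *` also assumes each staging table has the target's column layout.

```python
from datetime import datetime, timezone
from typing import List


def versioned_table_name(base: str, run_time: datetime) -> str:
    """Name a new staging-table instance for one scatter-job run."""
    return f"{base}_v{run_time.strftime('%Y%m%d%H%M%S')}"


def scatter_sql(base: str, run_time: datetime, select_sql: str) -> str:
    """Each scatter run creates its own table, so concurrent runs never contend."""
    table = versioned_table_name(base, run_time)
    return f"CREATE TABLE {table} AS {select_sql};"


def gather_sql(target: str, staging_tables: List[str]) -> List[str]:
    """One gather job per target table loads each staging table in sequence."""
    return [f"INSERT INTO {target} SELECT * FROM {t};" for t in staging_tables]


t = datetime(2017, 10, 30, 12, 0, tzinfo=timezone.utc)
a = versioned_table_name("staging_a", t)
b = versioned_table_name("staging_b", t)
for stmt in gather_sql("impression_details", [a, b]):
    print(stmt)
```

Because only the gather job ever writes the target table, running its statements in order is enough to avoid write contention on the master fact table.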

SLIDE 20

Goals to Principles: Job Composition and Job State

Goal: Any job can depend on any other job
Principle: Jobs record completion of uniquely identifiable, timestamped data sets into one source of truth for all jobs

Goal: Jobs always consume the most recent source data available
Principle: Jobs can query the one source of truth to discover the most recent data sets available upon which they depend
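Taken together, these two principles describe an append-only log of completed data sets plus a "latest per type" query. A minimal in-memory sketch; the `DataSet` and `JobStateLog` classes, their fields, and the method names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataSet:
    data_set_type: str   # e.g. "A"
    timestamp: datetime  # uniquely identifies this data set within its type
    table_name: str      # the versioned table holding the data


class JobStateLog:
    """Single source of truth: jobs record completions and query for the latest."""

    def __init__(self) -> None:
        self._records: List[DataSet] = []

    def record_completion(self, ds: DataSet) -> None:
        # Append-only: a completion is never overwritten, only superseded.
        self._records.append(ds)

    def latest(self, data_set_type: str) -> Optional[DataSet]:
        candidates = [r for r in self._records if r.data_set_type == data_set_type]
        return max(candidates, key=lambda r: r.timestamp, default=None)

    def latest_for(self, dependencies: List[str]) -> Dict[str, Optional[DataSet]]:
        """What a downstream job would consume for each of its dependencies."""
        return {dep: self.latest(dep) for dep in dependencies}


log = JobStateLog()
log.record_completion(DataSet("A", datetime(2017, 10, 30, 12), "staging_a_v1"))
log.record_completion(DataSet("A", datetime(2017, 10, 30, 13), "staging_a_v2"))
print(log.latest_for(["A", "B"]))
```

Any job can declare any data set type as a dependency, so composition needs no direct coupling between jobs, only a shared log.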

SLIDE 21

Principles to Patterns: Job Composition and Job State

Scatter Job (Data Set Type A, Time 1) → Staging Table A Version 1

Global Job State: Data Set Type A, Time 1

SLIDE 22

Principles to Patterns: Job Composition and Job State

Scatter Job (Data Set Type A, Time 1) → Staging Table A Version 1

Scatter Job (Data Set Type A, Time 2) → Staging Table A Version 2

Global Job State: Data Set Type A, Time 1; Data Set Type A, Time 2

SLIDE 23

Principles to Patterns: Job Composition and Job State

Scatter Job (Data Set Type A, Time 1) → Staging Table A Version 1

Scatter Job (Data Set Type A, Time 2) → Staging Table A Version 2

Global Job State: Data Set Type A, Time 1; Data Set Type A, Time 2

Garbage Collection Job: DROPs less recent data (Staging Table A Version 1)

Gather Data Job: consumes the most recent data (Staging Table A Version 2)
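The garbage-collection job can be sketched as: for each data set type, keep the most recent version (the one a gather job will consume) and emit `DROP TABLE` statements for the older ones. Keeping exactly one version per type by default is an assumption; a real job might keep more so it never drops a table a gather job is still reading. The `(type, timestamp, table)` tuple shape and table names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Version = Tuple[str, datetime, str]  # (data set type, completion time, table name)


def gc_statements(versions: Iterable[Version], keep: int = 1) -> List[str]:
    """Return DROP TABLE statements for all but the `keep` newest versions per type."""
    by_type: Dict[str, List[Version]] = defaultdict(list)
    for v in versions:
        by_type[v[0]].append(v)
    drops = []
    for ds_type in sorted(by_type):
        newest_first = sorted(by_type[ds_type], key=lambda v: v[1], reverse=True)
        for _, _, table in newest_first[keep:]:
            drops.append(f"DROP TABLE IF EXISTS {table};")
    return drops


versions = [
    ("A", datetime(2017, 10, 30, 12), "staging_a_v1"),
    ("A", datetime(2017, 10, 30, 13), "staging_a_v2"),
    ("B", datetime(2017, 10, 30, 12), "staging_b_v1"),
]
print(gc_statements(versions))
```

Since scatter jobs only ever create new tables, dropping superseded ones is the only cleanup needed, and it never touches the version gather jobs read.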

SLIDE 24

Patterns to Design: Job Composition and Job State

Gather Data Pipeline Job

Scatter Job A and Scatter Job B record their data sets in the Global Job State, which the Gather Data Pipeline Job reads. Each record is a tuple such as (Data Set Type A, timestamp 1, processing_window) or (Data Set Type B, timestamp 2, processing_window).
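The `(Data Set Type, timestamp, processing_window)` tuples suggest how a gather job could decide whether it can run: for the window it is processing, it needs a completed record of every data set type it depends on, and it takes the most recent record of each. A minimal sketch under that reading; the `JobStateRecord` type, the string form of `processing_window`, and the readiness rule are all assumptions.

```python
from typing import Dict, List, NamedTuple, Optional


class JobStateRecord(NamedTuple):
    data_set_type: str
    timestamp: int           # e.g. epoch seconds of completion
    processing_window: str   # hypothetical label, e.g. "2017-10-30T12"


def inputs_for_window(
    records: List[JobStateRecord], required_types: List[str], window: str
) -> Optional[Dict[str, JobStateRecord]]:
    """Return the newest record per required type for `window`, or None if any is missing."""
    chosen: Dict[str, JobStateRecord] = {}
    for rec in records:
        if rec.processing_window != window or rec.data_set_type not in required_types:
            continue
        current = chosen.get(rec.data_set_type)
        if current is None or rec.timestamp > current.timestamp:
            chosen[rec.data_set_type] = rec
    if set(chosen) != set(required_types):
        return None  # some upstream scatter job has not finished this window yet
    return chosen


state = [
    JobStateRecord("A", 1, "w1"),
    JobStateRecord("B", 2, "w1"),
    JobStateRecord("A", 3, "w1"),
]
print(inputs_for_window(state, ["A", "B"], "w1"))
```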

SLIDE 25

Implementing the Design with What we Have on Hand

SLIDE 26

Implementing the Design

Data Pipeline Jobs:

  • AWS Data Pipeline
  • Python
  • Redshift SQL

Tables: Redshift DDL

Global Job State: RDS (MySQL), with a Python API for the Data Pipeline Jobs
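The implementation slide places the global job state in RDS (MySQL), accessed by the Data Pipeline jobs through a Python API. A minimal sketch of such a store, using the standard-library `sqlite3` module as a stand-in for MySQL so it runs anywhere; the `job_state` table, its columns, and both function names are hypothetical. (A MySQL driver would typically use `%s` placeholders instead of sqlite's `?`.)

```python
import sqlite3
from typing import Optional, Tuple

DDL = """
CREATE TABLE IF NOT EXISTS job_state (
    data_set_type     TEXT    NOT NULL,
    data_set_ts       INTEGER NOT NULL,  -- epoch seconds
    processing_window TEXT    NOT NULL,
    table_name        TEXT    NOT NULL,
    PRIMARY KEY (data_set_type, data_set_ts)
)
"""


def record_completion(conn: sqlite3.Connection, ds_type: str, ts: int, window: str, table: str) -> None:
    """Called by a scatter job after its versioned table is fully written."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO job_state VALUES (?, ?, ?, ?)", (ds_type, ts, window, table)
        )


def latest_table(conn: sqlite3.Connection, ds_type: str) -> Optional[Tuple[int, str]]:
    """Called by a gather job to discover the most recent data set of a type."""
    return conn.execute(
        "SELECT data_set_ts, table_name FROM job_state "
        "WHERE data_set_type = ? ORDER BY data_set_ts DESC LIMIT 1",
        (ds_type,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
record_completion(conn, "A", 1509364800, "2017-10-30T12", "staging_a_v1")
record_completion(conn, "A", 1509368400, "2017-10-30T13", "staging_a_v2")
print(latest_table(conn, "A"))  # (1509368400, 'staging_a_v2')
```

The primary key on `(data_set_type, data_set_ts)` enforces the "uniquely identifiable, timestamped" property: recording the same data set twice fails instead of silently duplicating it.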

SLIDE 27

Conclusions

  • You can evolve a data architecture without adopting new technology
  • Carefully chosen invariants define a design that solves present problems and supports future flexibility
  • Invariants are system Goals
  • Identifying Goals suggests Principles
  • Patterns embody Principles
  • Design applies Patterns
SLIDE 28

Introducing the Bidder-as-a-Service

Questions?

Mark Weiss Senior Software Engineer mark@beeswax.com @marksweiss

We have a great team! We have lots of fun problems to solve! We have LaCroix and Kind Bars! We're hiring! https://www.beeswax.com/careers/