
Applying Design To Solve Scaling Problems and Evolve an Architecture

Introducing the Bidder-as-a-Service: Applying Design To Solve Scaling Problems and Evolve an Architecture. DataEngConf, NYC, Oct. 30, 2017. Mark Weiss, Senior Software Engineer, mark@beeswax.com, @marksweiss


  1. Introducing the Bidder-as-a-Service: Applying Design To Solve Scaling Problems and Evolve an Architecture
     DataEngConf, NYC, Oct. 30, 2017
     Mark Weiss, Senior Software Engineer, mark@beeswax.com, @marksweiss

  2. What is Beeswax?

  3. We Built a Better Bidder
     About Beeswax
     ● Beeswax is a 3-year-old ad tech startup based in NYC
     ● Founded by three ex-Googlers, CEO has deep roots in ad tech
     ● 40 employees in NYC and London
     Why we are Different
     ● Customers get the benefits of a custom bidder stack, without the development and operating cost and risk
     ● Give customers access to all of their data
     ● Provide APIs for customers to customize bidding strategy, API-driven
     ● SaaS model and pricing, customers pay to use the platform

  4. RTB: Real Time Bidding (AKA "Please Let Us Do This")
     [Diagram labels: Publisher; Ad Exchange; Auction; Beeswax Bidder]
     Step 1: Send ad request & userid
     Step 2: Broadcast bid request
     Step 3: Submit bid & ad markup
     Step 4: Show ad to user
     Beeswax Bidder: Scale: 1M QPS; Latency_99: 20 ms; < 200 ms
     - Target campaigns
     - Target user profiles
     - Optimize for ROI
     - Customize

  5. What is the Beeswax Data Platform?

  6. Beeswax Data Platform
     [Diagram labels: Bid Data; Impression, Click and other Event Data; Event Ingestion; S3; Event Processing (Join, Normalize, Aggregate); Redshift; Customer Raw Event Data; Customer Normalized Log Data; Customer Reports]

  7. Beeswax Data Platform: Event Stream
     [Same diagram as slide 6, with added labels: Python Web App (Input: HTTP/JSON, Output: Protobuf); Kinesis]

  8. Beeswax Data Platform: Event Processing
     [Same diagram as slide 6, with added label: Custom Java KCL App (Input: Protobuf, Output: CSV)]

  9. Beeswax Data Platform: Event Processing
     [Same diagram as slide 6, with added labels:]
     - AWS Data Pipeline
     - AWS Redshift/SQL
     - Custom Python libs
     - Python Activities

  10. What Was the State of the System?

  11. Event Join and Aggregation ("Everything Looks Good …")
      [Diagram labels: Bids; Impressions; Clicks, Conversions; Joining; Honeycomb Fact Table: Impression Details; Aggregation]

  12. Event Join and Aggregation ("Everything Looks Good …")
      [Diagram labels: Bids; Impressions; Clicks, Conversions; Other Impression Data; Joining; Honeycomb Fact Table: Impression Details; Aggregation]

  13. Pipeline Problems: Monolithic and Inflexible
      [Diagram labels: two rows of Step 1, Step 2, Step 3, and one Target Table]

  14. We were a lucky startup with a bunch of "good problems to have"

  15. System Goals for Architectural Evolution
      ● Support separate pipelines writing to the same target tables
      ● Support any pipeline depending on the data from any other
      ● Centralize job-level state management and job control

  16. System Goals for Architectural Evolution
      ● Support separate pipelines writing to the same target tables
      ● Support any pipeline depending on the data from any other
      ● Centralize job-level state management and job control
      ● Continue to use the existing platform technologies … for now

  17. From Goals to Principles to Patterns to Design

  18. Goals to Principles: Remove Contention
      Goal: Multiple asynchronous pipelines with no write contention
      Principle: Jobs always write to new versioned instances of target tables
      Goal: Multiple pipelines land data in same master fact table
      Principle: One job per master target table reads from multiple sources and writes into the target table sequentially

  19. Principles to Patterns: Remove Contention
      [Diagram labels: Input Data Set A; Data Pipeline Job A; Staging Table A; Input Data Set B; Data Pipeline Job B; Staging Table B; Gather Data Pipeline Job; Target Fact Table]
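
The two principles on slides 18 and 19 can be sketched in a few lines. This is only an illustration, not the code from the talk: it uses an in-memory SQLite database as a stand-in for Redshift, and the table names, column names, and the `scatter_job` / `gather_job` functions are all hypothetical. Each scatter job writes to its own new, versioned staging table, so no two jobs ever write to the same table; a single gather job is the only writer to the target fact table and copies the staging tables in one after another.

```python
import sqlite3


def scatter_job(conn, data_set_type, rows, version):
    """Write rows into a brand-new, versioned staging table; return its name."""
    # Hypothetical naming scheme: one table per (data set type, version).
    table = f"staging_{data_set_type}_v{version}"
    conn.execute(f"CREATE TABLE {table} (event_id TEXT, value INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return table


def gather_job(conn, staging_tables, target="fact_table"):
    """Sole writer for the target table: load each staging table sequentially."""
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {target} "
        "(event_id TEXT, value INTEGER, source TEXT)"
    )
    for table in staging_tables:
        conn.execute(
            f"INSERT INTO {target} SELECT event_id, value, ? FROM {table}",
            (table,),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for Redshift in this sketch
table_a = scatter_job(conn, "a", [("e1", 1), ("e2", 2)], version=1)
table_b = scatter_job(conn, "b", [("e3", 3)], version=1)
gather_job(conn, [table_a, table_b])
fact_count = conn.execute("SELECT COUNT(*) FROM fact_table").fetchone()[0]
```

Because the scatter jobs never share a table, they can run asynchronously; contention is confined to the one gather job, which serializes its own writes.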

  20. Goals to Principles: Job Composition and Job State
      Goal: Any job can depend on any other job
      Principle: Jobs record completion of uniquely identifiable, timestamped data sets into one source of truth for all jobs
      Goal: Jobs always consume the most recent source data available
      Principle: Jobs can query one source of truth to discover the most recent data sets available upon which they depend

  21. Principles to Patterns: Job Composition and Job State
      [Diagram labels: Scatter Job; Staging Table A (Data Set Type A, Version 1, Time 1); Global Job State (Data Set Time 1 Type 1)]

  22. Principles to Patterns: Job Composition and Job State
      [Diagram labels: Scatter Job; Staging Table A (Data Set Type A, Version 1, Time 1); Scatter Job; Staging Table A (Data Set Type A, Version 2, Time 2); Global Job State (Data Set Time 1 Type A; Data Set Time 2 Type A)]

  23. Principles to Patterns: Job Composition and Job State
      [Diagram labels: Scatter Job; Staging Table A (Data Set Type A, Version 1, Time 1); Scatter Job; Staging Table A (Data Set Type A, Version 2, Time 2); Global Job State (Data Set Time 1 Type A; Data Set Time 2 Type A)]
      Gather Data Job: consumes most recent data
      Garbage Collection Job: DROPs less recent data
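
The pattern built up across slides 21 to 23 can be sketched as a small in-memory registry. This is an illustration under stated assumptions, not the implementation from the talk: the `DataSet` and `GlobalJobState` classes and their method names are hypothetical. Scatter jobs record each completed, timestamped data set; the gather job asks for the most recent one of a given type; the garbage collection job asks for everything older, which it could then DROP.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataSet:
    data_set_type: str  # e.g. "A"
    timestamp: int      # completion time of the scatter job
    table: str          # versioned staging table holding the data


@dataclass
class GlobalJobState:
    """Single source of truth for completed data sets (in-memory sketch)."""
    completed: list = field(default_factory=list)

    def record(self, data_set):
        """Called by a scatter job when its staging table is complete."""
        self.completed.append(data_set)

    def most_recent(self, data_set_type):
        """Called by a gather job; returns None if nothing is available yet."""
        candidates = [d for d in self.completed if d.data_set_type == data_set_type]
        return max(candidates, key=lambda d: d.timestamp, default=None)

    def stale(self, data_set_type):
        """Called by a garbage collection job; everything but the most recent."""
        latest = self.most_recent(data_set_type)
        return [
            d for d in self.completed
            if d.data_set_type == data_set_type and d != latest
        ]


state = GlobalJobState()
state.record(DataSet("A", timestamp=1, table="staging_a_v1"))
state.record(DataSet("A", timestamp=2, table="staging_a_v2"))

latest = state.most_recent("A")                  # the Time 2 data set
to_drop = [d.table for d in state.stale("A")]    # tables a GC job could DROP
```

Neither the gather job nor the garbage collection job needs to know which scatter job produced a data set; both depend only on the shared state.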

  24. Patterns to Design: Job Composition and Job State
      [Diagram labels: Scatter Job A; Scatter Job B; Global Job State; Gather Data Pipeline Job]
      Data set tuples shown in the diagram:
      (Data Set Type A, timestamp 1, processing_window)
      (Data Set Type A, timestamp 1, processing_window), (Data Set Type B, timestamp 1, processing_window)
      (Data Set Type A, timestamp 2, processing_window)

  25. Implementing the Design with What we Have on Hand

  26. Implementing the Design
      [Diagram labels: RDS (MySQL): Global Job State Tables; Python API; Data Pipeline Jobs; Redshift DDL]
      ● AWS Data Pipeline
      ● Python
      ● Redshift SQL
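
A relational job state table of the kind slide 26 places in RDS (MySQL) might look like the sketch below. Everything here is an assumption made for illustration: the schema, the column names, and the `resolve_inputs` helper are hypothetical, SQLite stands in for MySQL, and `processing_window` is stored as an opaque string because the slides do not define its format. Each row is one (data set type, timestamp, processing_window) tuple from slide 24; a gather job looks up the most recent row for every data set type it depends on and proceeds only if all of them exist.

```python
import sqlite3

# Hypothetical schema: one row per completed data set.
SCHEMA = """
CREATE TABLE job_state (
    data_set_type     TEXT    NOT NULL,
    completed_at      INTEGER NOT NULL,
    processing_window TEXT    NOT NULL,
    PRIMARY KEY (data_set_type, completed_at)
)
"""

# Most recent row for one data set type.
LATEST = """
SELECT completed_at, processing_window
FROM job_state
WHERE data_set_type = ?
ORDER BY completed_at DESC
LIMIT 1
"""


def record_completion(conn, data_set_type, completed_at, processing_window):
    """Called by a scatter job after its data set is fully written."""
    conn.execute(
        "INSERT INTO job_state VALUES (?, ?, ?)",
        (data_set_type, completed_at, processing_window),
    )
    conn.commit()


def resolve_inputs(conn, required_types):
    """Return {type: (completed_at, processing_window)}, or None if any is missing."""
    inputs = {}
    for data_set_type in required_types:
        row = conn.execute(LATEST, (data_set_type,)).fetchone()
        if row is None:
            return None  # a dependency has not landed yet; the job should wait
        inputs[data_set_type] = row
    return inputs


conn = sqlite3.connect(":memory:")  # stand-in for RDS (MySQL) in this sketch
conn.execute(SCHEMA)
record_completion(conn, "A", 1, "window-1")
record_completion(conn, "A", 2, "window-2")
record_completion(conn, "B", 1, "window-1")

ready = resolve_inputs(conn, ["A", "B"])
not_ready = resolve_inputs(conn, ["A", "C"])
```

Keeping this state in one small transactional database, separate from the warehouse, is what lets any job depend on any other without the jobs knowing about each other.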

  27. Conclusions
      ● You can evolve data architecture without adopting new technology
      ● Carefully chosen invariants define a design that solves present problems and supports future flexibility
      ● Invariants are system Goals
      ● Identifying goals suggests Principles
      ● Patterns embody Principles
      ● Design applies Patterns

  28. Introducing the Bidder-as-a-Service: Questions?
      Mark Weiss, Senior Software Engineer, mark@beeswax.com, @marksweiss
      We have a great team!
      We have lots of fun problems to solve!
      We have LaCroix and Kind Bars!
      We're hiring! https://www.beeswax.com/careers/
