
The Continuing Story of Analytics at Optimizely: Batch, Streaming and Lambda Systems
Mike Borsuk (mike.borsuk@optimizely.com)


  1. The Continuing Story of Analytics at Optimizely: Batch, Streaming and Lambda Systems
     Mike Borsuk, mike.borsuk@optimizely.com

  2. About Optimizely
     Experiment Everywhere
       o Experimentation, Personalization, Recommendations
       o Web, Mobile, OTT, Full stack
     Data challenges
       o Billions of events per day received
       o Real-time results

  3. Overview
       o Background & Motivation
       o Real Time Stream Processing
       o What is Lambda Architecture and how/why we are implementing it

  4. Optimizely X Personalization

  5. Personalization data scale
       o 4.14B raw events received daily
       o Grouped into 10M distinct visitor sessions daily (stream processing w/Samza)
       o Calculating and serving back millions of time series data points

  6. Personalization data challenges
       o From a single A/B test per experiment to multiple targeted tests in a campaign
       o Longer running data collection / analysis
       o Need for session based metrics
       o Data schema designed for single A/B tests

  7. Personalization data scale
       o Mean response time (HBase) goes from milliseconds to nearly 30s

  8. Realtime Stream Processing
     Persist raw events
       o S3 buckets grouped by 24h UTC
     Fan out events into processing queues
       o Kafka topics for event types
     Session aggregation w/Samza: groups clickstream events into sessions
       o Per-visitor basis
       o Split on 30 minutes inactivity
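The session rule on this slide (per visitor, split on 30 minutes of inactivity) can be illustrated with a small in-memory sketch. This is not Samza code and not Optimizely's implementation: the `(visitor_id, epoch_seconds)` event shape, the `sessionize` name, and the choice to split only when the gap is strictly greater than 30 minutes are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Inactivity threshold taken from the slide: 30 minutes.
SESSION_GAP_SECONDS = 30 * 60

# Hypothetical event shape: (visitor_id, epoch_seconds).
Event = Tuple[str, int]


def sessionize(
    events: Iterable[Event], gap: int = SESSION_GAP_SECONDS
) -> Dict[str, List[List[int]]]:
    """Group event timestamps into sessions, per visitor.

    A new session starts when the gap since the visitor's previous
    event is strictly greater than `gap` seconds (an assumption; the
    slide does not say whether the boundary is inclusive).
    """
    by_visitor: Dict[str, List[int]] = defaultdict(list)
    for visitor_id, ts in events:
        by_visitor[visitor_id].append(ts)

    sessions: Dict[str, List[List[int]]] = {}
    for visitor_id, stamps in by_visitor.items():
        stamps.sort()  # events may arrive out of order
        visitor_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - visitor_sessions[-1][-1] > gap:
                visitor_sessions.append([ts])  # inactivity: open a new session
            else:
                visitor_sessions[-1].append(ts)
        sessions[visitor_id] = visitor_sessions
    return sessions


example = sessionize([("a", 0), ("a", 600), ("a", 2401), ("b", 100)])
```

A streaming job would keep the last-seen timestamp per visitor as state instead of sorting a full list, but the split condition is the same.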

  9. Stream Processing Architecture

  10. Lambda Architecture
       o Batch Layer
       o Serving Layer
       o Speed Layer

  11. Lambda Architecture

  12. Our Implementation of LA
       o Match schema to query patterns
       o Make time-series data “combinable” or at the same base granularity
       o Write data into HBase for locality at query time, “de-normalization”
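One way to read “combinable” time series is that every series is stored as counts keyed by bucket start at a single base granularity, so that any two series can be added and any coarser view can be derived by summing. The sketch below assumes an hourly base bucket and plain event counts; the names `to_series`, `combine` and `rollup` are hypothetical, and the actual schema used in the talk is not shown in the slides.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed base granularity: one hour. The slides do not state the real value.
BASE_SECONDS = 3600


def bucket_start(ts: int, width: int = BASE_SECONDS) -> int:
    """Return the start of the bucket of the given width containing ts."""
    return ts - ts % width


def to_series(timestamps: Iterable[int]) -> Dict[int, int]:
    """Count events per base-granularity bucket."""
    series: Counter = Counter()
    for ts in timestamps:
        series[bucket_start(ts)] += 1
    return dict(series)


def combine(*series: Dict[int, int]) -> Dict[int, int]:
    """Add series point by point; valid because all share the same buckets."""
    total: Counter = Counter()
    for s in series:
        total.update(s)  # Counter.update adds counts rather than replacing
    return dict(total)


def rollup(series: Dict[int, int], width: int) -> Dict[int, int]:
    """Re-aggregate a base-granularity series into coarser buckets."""
    if width % BASE_SECONDS != 0:
        raise ValueError("width must be a multiple of the base granularity")
    coarse: Counter = Counter()
    for start, count in series.items():
        coarse[bucket_start(start, width)] += count
    return dict(coarse)


a = to_series([0, 10, 3600])
b = to_series([3700, 7200])
combined = combine(a, b)
two_hourly = rollup(combined, 2 * BASE_SECONDS)
```

Counts combine by simple addition; metrics such as distinct visitors would need a mergeable representation instead, which this sketch does not cover.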

  13. Our Implementation of LA
       o Immutable raw-event “source of truth”
       o Pre-computation batch jobs matching our real-time
       o Time range optimized real-time queries
       o Serving layer to merge batch + real-time
       o Done for performance, not accuracy

  14. Adding Lambda Layers
      [diagram: Speed]

  15. Adding Lambda Layers
      [diagram labels: Speed Layer; Pre-computed Time Series; Realtime Computation; Batch Layer; Serving Layer; Composite Time Series Result; query time range]
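The diagram labels suggest a serving layer that answers a query over a time range by stitching a pre-computed (batch) series to a real-time series. A minimal sketch of that merge follows, assuming both series are dictionaries keyed by bucket start and that a single `batch_horizon` timestamp marks where trusted batch output ends; these names and the half-open `[start, end)` range are assumptions, not details from the talk.

```python
from typing import Dict


def composite_series(
    batch: Dict[int, int],
    realtime: Dict[int, int],
    batch_horizon: int,
    start: int,
    end: int,
) -> Dict[int, int]:
    """Build one series for the query range [start, end).

    Buckets before `batch_horizon` come from the pre-computed batch
    series; buckets at or after it come from the real-time series.
    """
    result: Dict[int, int] = {}
    batch_end = min(end, batch_horizon)
    for ts, value in batch.items():
        if start <= ts < batch_end:
            result[ts] = value
    realtime_start = max(start, batch_horizon)
    for ts, value in realtime.items():
        if realtime_start <= ts < end:
            result[ts] = value
    return dict(sorted(result.items()))


batch = {0: 5, 3600: 7, 7200: 9}   # 7200 lies past the horizon and is ignored
realtime = {7200: 3, 10800: 4}
merged = composite_series(batch, realtime, batch_horizon=7200, start=3600, end=14400)
```

Because the two sources never contribute the same bucket, the real-time computation only has to cover the part of the query range after the horizon, which is where the latency saving would come from under these assumptions.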

  16. Benefits we are seeing
       o Solving our query latency issues

  17. Benefits we are seeing
       o Flexibility
       o System Fault Tolerance
       o Human Fault Tolerance

  18. Drawbacks we are seeing
       o Complexity in serving layer
       o Batch job management
       o Operational Burdens

  19. References
       o Big Data, book by Nathan Marz and James Warren
       o Optimizely engineering blog: https://medium.com/engineers-optimizely
       o Samza specific: Optimizely presentation at LinkedIn streaming meetup (https://youtu.be/p7hjrKyfQkc)
