realtime data processing at facebook

Realtime Data Processing at Facebook, Abhay Venkatesh - PowerPoint PPT Presentation



  1. Realtime Data Processing at Facebook (Abhay Venkatesh)

  2. Why Streaming at Facebook?
     • Actionable reports, e.g. Chorus: what is trending right now?
     • Realtime monitoring, e.g. dashboard queries
     • Hybrid realtime-batch pipelines, e.g. pre-emptive queries over data warehouse

  3. Workload Assumptions
     • Latency of seconds, not milliseconds, which means
     • a persistent message bus called Scribe can be used for data transfer
     • which makes it easier to enable:
       • Fault tolerance
       • Scalability
       • Multiple options for correctness

  4. System Architecture

  5. The Streaming Triad
     • Puma
     • Swift
     • Stylus

  6. Puma
     • For apps written in a SQL-like language
     • Quick to write (< 1 hour)
     • But run over long periods (months to years)
     • Two purposes:
       • Pre-computed query results for simple aggregation queries
       • Filtering and processing of Scribe streams
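The two purposes listed for Puma (filtering a stream, and pre-computing results for simple aggregation queries) can be illustrated with a minimal Python sketch. This is not Puma's SQL-like language. The event fields (`ts`, `type`, `topic`), the 5-minute window, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 300  # hypothetical 5-minute aggregation window


def precompute_counts(events, window_seconds=WINDOW_SECONDS):
    """Filter a stream and pre-aggregate counts per (window, topic)."""
    counts = defaultdict(Counter)
    for event in events:
        # Filtering step: drop events the query does not care about.
        if event.get("type") != "post":
            continue
        # Aggregation step: bucket by window start time, count per topic.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[window_start][event["topic"]] += 1
    return counts


def top_k(counts, window_start, k=2):
    """Answer a query from the pre-computed results, without rescanning events."""
    return counts[window_start].most_common(k)


# Hypothetical events standing in for a Scribe stream.
events = [
    {"ts": 10, "type": "post", "topic": "soccer"},
    {"ts": 20, "type": "like", "topic": "soccer"},
    {"ts": 40, "type": "post", "topic": "soccer"},
    {"ts": 90, "type": "post", "topic": "music"},
    {"ts": 310, "type": "post", "topic": "music"},
]
counts = precompute_counts(events)
```

Because the counts are maintained as events arrive, a query such as `top_k(counts, 0)` only reads the small pre-computed table, which is the point of pre-computing results for queries that run for months.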

  7. A Puma App

  8. Swift
     • Very basic API
     • Can read() from a Scribe stream
     • Checkpoints every
       • N strings, or
       • B bytes
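The checkpointing rule on the Swift slide (checkpoint every N strings or B bytes) can be sketched as follows. This is an illustrative Python model, not Swift's actual API: the class `CheckpointingReader`, its parameters, and the in-memory list standing in for a Scribe stream are all assumptions.

```python
class CheckpointingReader:
    """Hypothetical reader that checkpoints every n_strings or n_bytes."""

    def __init__(self, messages, n_strings, n_bytes, start=0):
        self.messages = messages    # in-memory stand-in for a Scribe stream
        self.n_strings = n_strings
        self.n_bytes = n_bytes
        self.position = start       # index of the next message to read
        self.checkpoint = start     # last saved position
        self._strings = 0           # strings read since the last checkpoint
        self._bytes = 0             # bytes read since the last checkpoint

    def read(self):
        """Return the next message, or None at the end of the stream."""
        if self.position >= len(self.messages):
            return None
        message = self.messages[self.position]
        self.position += 1
        self._strings += 1
        self._bytes += len(message.encode("utf-8"))
        # Checkpoint as soon as either threshold is reached.
        if self._strings >= self.n_strings or self._bytes >= self.n_bytes:
            self.checkpoint = self.position
            self._strings = 0
            self._bytes = 0
        return message


stream = ["a", "bb", "ccc", "dddd", "e"]
reader = CheckpointingReader(stream, n_strings=2, n_bytes=1024)
for _ in range(3):
    reader.read()

# Simulated restart: a new reader resumes from the last checkpoint,
# so the third message is read a second time.
resumed = CheckpointingReader(
    stream, n_strings=2, n_bytes=1024, start=reader.checkpoint
)
```

In this sketch a restart replays everything read since the last checkpoint, so messages are delivered at least once. Smaller N or B means less replay after a crash at the cost of more frequent checkpoints.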

  9. Stylus
     • Low-level stream processing in C++
     (Diagram: Scribe Stream -> Stylus Processor(s) -> Scribe Stream or Data Store)

  10. Sample Application

  11. Design Decisions
      • Language Paradigm
      • Data Transfer
      • Processing Semantics
      • State-saving mechanism
      • Reprocessing

  12. Design Decisions
      • Language Paradigm
      • Data Transfer
      • Processing Semantics
      • State-saving mechanism
      • Reprocessing

  13. Processing Semantics
      • At least once, at most once, or exactly once
      • State semantics (inputs)
      • Output semantics
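One common way to see the difference between at-least-once and at-most-once is the order in which a processor emits output and saves its checkpoint. The Python sketch below simulates a single crash between those two steps. The function name, the `semantics` strings, and the `crash_after` parameter are hypothetical and not taken from the presentation.

```python
def process(events, semantics, crash_after=None):
    """Simulate a processor that crashes once and restarts from its checkpoint.

    semantics: "at_least_once" emits output before saving the checkpoint;
               "at_most_once" saves the checkpoint before emitting output.
    crash_after: index of the event whose handling is interrupted by a
                 crash between the two steps (None means no crash).
    """
    output = []
    checkpoint = 0
    crashed = False
    i = 0
    while i < len(events):
        crash_now = i == crash_after and not crashed
        if semantics == "at_least_once":
            output.append(events[i])
            if crash_now:
                crashed = True
                i = checkpoint  # restart: the event is processed again
                continue
            checkpoint = i + 1
        else:
            checkpoint = i + 1
            if crash_now:
                crashed = True
                i = checkpoint  # restart: the event is skipped
                continue
            output.append(events[i])
        i += 1
    return output


events = ["a", "b", "c"]
duplicated = process(events, "at_least_once", crash_after=1)
dropped = process(events, "at_most_once", crash_after=1)
```

Here `duplicated` contains `"b"` twice and `dropped` omits it. Exactly-once behaviour needs the output and the checkpoint to be updated atomically, for example in one transaction, or needs outputs that are safe to apply twice.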

  14. State-Saving Mechanisms

  15. Reprocessing Data
      • Data warehousing with Hive
      • Stream processing in batch environment
      • Puma -> Hive
      • Stylus -> stateless, stateful, and monoid
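The mention of monoid processors can be made concrete with a small Python sketch. A monoid here means state with an identity value and an associative merge, so partial aggregates computed over separate chunks (as in a batch job) combine to the same result as a single streaming pass. The count-map state and the names `IDENTITY`, `merge`, and `aggregate` are assumptions for illustration, not Stylus's C++ interface.

```python
from functools import reduce

IDENTITY = {}  # identity element: merging with it changes nothing


def merge(left, right):
    """Associative merge of two partial count maps (inputs are not mutated)."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = merged.get(key, 0) + value
    return merged


def aggregate(events):
    """Fold a sequence of events into a single count map."""
    return reduce(merge, ({event: 1} for event in events), IDENTITY)


events = ["a", "b", "a", "c", "a"]

# Streaming style: one pass over all events in order.
streaming = aggregate(events)

# Batch style: aggregate chunks independently, then merge the partials.
batch = merge(aggregate(events[:2]), aggregate(events[2:]))
```

Since `merge` is associative, `streaming` and `batch` are equal, which is why the same logic can be rerun over old data in a batch environment without changing the result.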

  16. Closing Thoughts
      • “Move Fast”
      • Ease of debugging
      • Ease of deployment
      • Ease of monitoring and operation

  17. Comparison with Naiad
      Naiad:
      • Milliseconds, not seconds
      • Robust solutions to micro-stragglers
      • Expense availability in event of failure
      • Naiad consumes inputs from message queue, and writes to key-value store
      Facebook Realtime Systems:
      • Seconds, not milliseconds
      • Does not handle micro-stragglers
      • Persistent message bus ensures no loss
      • Flexible, and easy to use, deploy, debug
