Real Time Data Analytics @ Uber. Ankur Bansal, November 14, 2016


  1. Real Time Data Analytics @ Uber. Ankur Bansal, November 14, 2016

  2. About Me
     ● Sr. Software Engineer, Streaming Team @ Uber
       ○ Streaming team supports the platform for real-time data analytics: Kafka, Samza, Flink, Pinot, and plenty more
       ○ Focused on scaling Kafka at Uber's pace
     ● Staff Software Engineer @ eBay
       ○ Built and scaled eBay's cloud using OpenStack
     ● Apache Kylin: Committer, Emeritus PMC

  3. Agenda
     ● Real-time use cases
     ● Kafka infrastructure deep dive
     ● Our own development:
       ○ REST Proxy & clients
       ○ Local Agent
       ○ uReplicator (MirrorMaker)
       ○ Chaperone (auditing)
     ● Operations/tooling

  4. Important Use Cases

  5. Real-time Price Surging
     [Diagram: rider eyeballs and open car information → Kafka → stream processing → surge multipliers]

  6. Real-time Machine Learning - UberEats ETD

  7. ● Fraud detection
     ● Share my ETA
     ● And many more ...

  8. Apache Kafka is Uber’s Lifeline

  9. Kafka ecosystem @ Uber
     [Diagram: data producers → Kafka → data consumers]
     ● Data producers: rider app, driver app, API/services, dispatch (GPS logs), mapping & logistics
     ● Real-time pipeline consumers: mobile app debugging; real-time, fast analytics; alerts, dashboards
     ● Batch pipeline consumers: applications, data science, ad-hoc exploration, analytics reporting

  10. Kafka cluster stats
     ● 100s of billions of messages/day
     ● 100s of TB/day
     ● Multiple data centers

  11. Kafka Infrastructure Deep Dive

  12. Requirements
     ● Scale to 100s of billions/day → 1 trillion/day
     ● High throughput (scale: 100s of TB → PB)
     ● Low latency for most use cases (<5 ms)
     ● Reliability: 99.99% (#msgs available / #msgs produced)
     ● Multi-language support
     ● Tens of thousands of simultaneous clients
     ● Reliable data replication across DCs
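
For a sense of scale, the daily targets on the requirements slide can be converted into average per-second rates. The following is a back-of-the-envelope sketch, assuming traffic is spread evenly over the day (real traffic is burstier) and taking 1 PB as 10^15 bytes; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average per-second rate for a daily total, assuming uniform traffic."""
    return per_day / SECONDS_PER_DAY


def max_unavailable(produced: float, reliability: float) -> float:
    """Messages that may go missing while still meeting the reliability target,
    where reliability = #msgs available / #msgs produced."""
    return produced * (1.0 - reliability)


if __name__ == "__main__":
    trillion = 1e12
    print(f"1 trillion msgs/day ~ {per_second(trillion) / 1e6:.1f}M msgs/s")
    print(f"1 PB/day            ~ {per_second(1e15) / 1e9:.1f} GB/s")
    print(f"99.99% of 1 trillion allows {max_unavailable(trillion, 0.9999):,.0f} missing msgs/day")
```

At the 1 trillion/day target this works out to roughly 11.6 million messages per second on average, and a 99.99% target still tolerates on the order of 100 million missing messages per day at that volume.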

  13. Kafka Pipeline
     [Diagram: in DataCenter-I and DataCenter-II, Applications [ProxyClient] → Kafka REST Proxy → Regional Kafka, with a Local Agent and a Secondary Kafka alongside; uReplicator copies regional data into Aggregate Kafka (DataCenter-III)]

  14. Kafka Pipeline: Data Flow
     [Diagram: Application Process (ProxyClient) → Kafka Proxy Server → Regional Kafka → uReplicator → Aggregate Kafka, with the hops numbered 1 to 8]

  15. Kafka Clusters [pipeline diagram from slide 13, repeated]

  16. Kafka Clusters
     ● Use-case-based clusters
       ○ Data (async, reliable)
       ○ Logging (high throughput)
       ○ Time sensitive (low latency, e.g. surge, push notifications)
       ○ High-value data (at-least-once, sync, e.g. payments)
     ● Secondary cluster as fallback
     ● Aggregate clusters for all data topics
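
One way to picture use-case-based clusters is as a lookup from topic to a cluster profile. The sketch below is illustrative only: the cluster names, the topic registry, and the `sync=False` setting for the logging and time-sensitive clusters are assumptions, since the slide gives only the four categories and their headline properties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterProfile:
    cluster: str  # hypothetical cluster name
    sync: bool    # True: the caller waits for the broker ack
    goal: str     # property the cluster is tuned for, per the slide


# The four categories come from the slide; names and sync flags are assumed.
PROFILES = {
    "data": ClusterProfile("kafka-data", sync=False, goal="reliable"),
    "logging": ClusterProfile("kafka-logging", sync=False, goal="high throughput"),
    "time_sensitive": ClusterProfile("kafka-lowlatency", sync=False, goal="low latency"),
    "high_value": ClusterProfile("kafka-highvalue", sync=True, goal="at-least-once"),
}

# Hypothetical topic registry mapping topics to use cases.
TOPIC_USE_CASE = {
    "surge-multipliers": "time_sensitive",
    "payments": "high_value",
    "service-logs": "logging",
}


def profile_for(topic: str) -> ClusterProfile:
    """Topics without an explicit entry fall back to the general data cluster."""
    return PROFILES[TOPIC_USE_CASE.get(topic, "data")]
```

For example, `profile_for("payments")` selects the synchronous, at-least-once profile, while an unregistered topic lands on the general data cluster.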

  17. Kafka Clusters [requirements list from slide 12, repeated]

  18. Kafka Rest Proxy [pipeline diagram from slide 13, repeated]

  19. Why Kafka Rest Proxy?
     ● Simplified client API
     ● Multi-language support (Java, Node.js, Python, Golang)
     ● Decouple client from Kafka broker
       ○ Thin clients = operational ease
       ○ Fewer connections to Kafka brokers
       ○ Future Kafka upgrades
     ● Enhanced reliability
       ○ Primary & secondary Kafka clusters

  20. Kafka Rest Proxy: Internals

  21. Kafka Rest Proxy: Internals

  22. Kafka Rest Proxy: Internals
     ● Based on Confluent's open-sourced REST Proxy
     ● Performance enhancements
       ○ Simple HTTP servlets on Jetty instead of Jersey
       ○ Optimized for binary payloads
       ○ Performance increase from 7K* to 45-50K QPS/box
     ● Caching of topic metadata
     ● Reliability improvements*
       ○ Support for fallback cluster
       ○ Support for multiple producers (SLA-based segregation)
     ● Plan to contribute back to community
     *Based on benchmarking & analysis done in June 2015
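
Two of the ideas on this slide, the fallback cluster and topic-metadata caching, can be sketched in a few lines. This is not the proxy's actual code (which is Java and is not shown in the deck); the class names, the TTL-based cache policy, and the use of `ConnectionError` as the failure signal are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

Send = Callable[[str, bytes], None]  # raises ConnectionError on failure


class FallbackProducer:
    """Try the primary cluster first; on failure, write to the secondary."""

    def __init__(self, primary: Send, secondary: Send):
        self.primary = primary
        self.secondary = secondary

    def produce(self, topic: str, payload: bytes) -> str:
        try:
            self.primary(topic, payload)
            return "primary"
        except ConnectionError:
            self.secondary(topic, payload)
            return "secondary"


class TopicMetadataCache:
    """Cache metadata lookups so each request does not hit the cluster."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, topic: str) -> dict:
        now = self.clock()
        hit = self._entries.get(topic)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh: no round trip
        value = self.fetch(topic)
        self._entries[topic] = (now, value)
        return value
```

The injectable `clock` and `fetch` arguments keep the sketch testable without a running cluster.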

  23. Rest Proxy: performance (1 box)
     [Chart: end-to-end latency (ms) against message rate (K/second) at a single node]

  24. Kafka Clusters + Rest Proxy [requirements list from slide 12, repeated]

  25. Kafka Clients [pipeline diagram from slide 13, repeated]

  26. Client Libraries
     ● Support for multiple clusters
     ● High throughput
       ○ Non-blocking, async, batching
       ○ <1 ms produce latency for clients
       ○ Handles throttling/backoff signals from REST Proxy
     ● Topic discovery
       ○ Discovers the Kafka cluster a topic belongs to
       ○ Able to multiplex to different Kafka clusters
     ● Integration with Local Agent for critical data
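
The "non-blocking, async, batching" and "throttling/backoff" bullets can be illustrated with a small buffer-and-flush client. This is a sketch under assumptions, not Uber's client library: the class name, the use of HTTP status 429 as the throttle signal, and the doubling backoff policy are all hypothetical.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


class BatchingClient:
    def __init__(self, post_batch: Callable[[List[bytes]], int],
                 batch_size: int = 100,
                 base_backoff: float = 0.1, max_backoff: float = 5.0):
        self._post = post_batch  # sends one batch, returns an HTTP-like status
        self._queue: Deque[bytes] = deque()
        self.batch_size = batch_size
        self.base_backoff = base_backoff
        self.max_backoff = max_backoff
        self.backoff = 0.0  # seconds the flusher should wait before retrying

    def produce(self, payload: bytes) -> None:
        """Enqueue only; no network I/O, so the caller returns immediately."""
        self._queue.append(payload)

    def flush_once(self) -> int:
        """Send one batch; return the number of messages delivered."""
        batch = list(islice(self._queue, self.batch_size))
        if not batch:
            return 0
        status = self._post(batch)
        if status == 429:  # assumed throttle signal: keep the batch, back off
            self.backoff = min(self.max_backoff,
                               max(self.base_backoff, self.backoff * 2))
            return 0
        for _ in batch:
            self._queue.popleft()
        self.backoff = 0.0
        return len(batch)
```

In a real client `flush_once` would run on a background thread that sleeps for `backoff` seconds between attempts; it is kept synchronous here so the behaviour is easy to follow.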

  27. Client Libraries: What if there is a network glitch or outage? [figure missing in source]

  28. Client Libraries [figure missing in source]

  29. Kafka Clusters + Rest Proxy + Clients [requirements list from slide 12, repeated]

  30. Local Agent [pipeline diagram from slide 13, repeated]

  31. Local Agent
     ● Local spooling in case of downstream outage/backpressure
     ● Backfills at a controlled rate to avoid hammering infrastructure recovering from an outage
     ● Implementation:
       ○ Reuses code from rest-proxy and Kafka's log module
       ○ Appends all topics to the same file for high throughput
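
The spool-then-backfill behaviour described above can be sketched as an append-only file with a read offset. This is a simplified illustration, not the Local Agent's implementation (which, per the slide, reuses Kafka's log module): the JSON-lines record format, the class name, and the per-call record cap as the rate control are assumptions.

```python
import json
from typing import Callable


class LocalSpool:
    def __init__(self, path: str):
        self.path = path
        self.read_offset = 0  # byte offset of the next record to backfill
        open(path, "ab").close()  # create the spool file if it is missing

    def append(self, topic: str, payload: str) -> None:
        """All topics share one file, so every write is a sequential append."""
        record = json.dumps({"topic": topic, "payload": payload}) + "\n"
        with open(self.path, "ab") as f:
            f.write(record.encode("utf-8"))

    def backfill(self, send: Callable[[str, str], None], max_records: int) -> int:
        """Replay at most max_records per call so a recovering downstream
        is not flooded. The offset advances only after send() succeeds."""
        sent = 0
        with open(self.path, "rb") as f:
            f.seek(self.read_offset)
            while sent < max_records:
                line = f.readline()
                if not line:
                    break
                rec = json.loads(line)
                send(rec["topic"], rec["payload"])
                self.read_offset = f.tell()
                sent += 1
        return sent
```

Calling `backfill` on a timer with a small `max_records` gives a crude controlled rate; if `send` raises, the same record is retried on the next call because the offset has not moved.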

  32. Local Agent Architecture [figure missing in source]

  33. Local Agent in Action [figure missing in source]

  34. Kafka Clusters + Rest Proxy + Clients + Local Agent [requirements list from slide 12, repeated]

  35. uReplicator [pipeline diagram from slide 13, repeated]

  36. Multi-DC data flow
     [Diagram labels: app box (Dispatch, Mobile API), HTTP calls, Mirror Maker, Kafka8, Aggregation Cluster, traffic from DC1, DC2, and DC3]

  37. MirrorMaker: existing problems
     ● New topic added
     ● New partitions added
     ● MirrorMaker bounced
     ● New MirrorMaker added
     [Screenshot omitted in source]

  38. uReplicator: In-house solution
     [Diagram: a Helix MM Controller, coordinating through ZooKeeper, manages MM worker1, MM worker2, and MM worker3; each worker runs a Helix agent and threads 1..N that handle topic-partitions]

  39. uReplicator [architecture diagram from slide 38, repeated]
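
The events listed on slide 37 are the kind that cause a consumer-group rebalance in stock MirrorMaker, whereas the diagram on slide 38 shows a central controller handing topic-partitions to workers. The toy controller below illustrates why central assignment can keep such events local; it is a sketch of the idea only, not uReplicator's Helix-based code, and the least-loaded placement rule is an assumption.

```python
from typing import Dict, List, Set


class Controller:
    def __init__(self, workers: List[str]):
        self.assignment: Dict[str, Set[str]] = {w: set() for w in workers}

    def add_partition(self, tp: str) -> str:
        """Place a new topic-partition on the least-loaded worker.
        No existing assignment moves."""
        worker = min(self.assignment,
                     key=lambda w: (len(self.assignment[w]), w))
        self.assignment[worker].add(tp)
        return worker

    def add_worker(self, worker: str) -> None:
        """A new worker starts empty; current assignments are untouched."""
        self.assignment.setdefault(worker, set())

    def remove_worker(self, worker: str) -> List[str]:
        """Reassign only the partitions the departed worker owned.
        Assumes at least one other worker remains."""
        orphaned = sorted(self.assignment.pop(worker, set()))
        for tp in orphaned:
            self.add_partition(tp)
        return orphaned
```

Adding a topic, adding partitions, or adding a worker touches at most the partitions involved; only a worker leaving forces any reassignment, and then only of that worker's partitions.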

  40. Kafka Clusters + Rest Proxy + Clients + Local Agent [requirements list from slide 12, repeated]

  41. uReplicator
     ● Running in production for 1+ year
     ● Open sourced: https://github.com/uber/uReplicator
     ● Blog: https://eng.uber.com/ureplicator/

  42. Chaperone - E2E Auditing

  43. Chaperone Architecture

  44. Chaperone: Track counts [screenshot omitted in source]

  45. Chaperone: Track latency [screenshot omitted in source]
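
The deck does not show how Chaperone tracks counts, so the following is only one plausible shape for end-to-end count auditing: bucket messages by event time at each stage, then compare the buckets between stages. The 10-minute window, the function names, and the comparison rule are assumptions, not Chaperone's documented behaviour.

```python
from collections import Counter
from typing import Dict, Iterable

WINDOW_SECONDS = 600  # assumed bucket size


def window_counts(event_timestamps: Iterable[float]) -> Counter:
    """Count messages per time window, keyed by the window's start time."""
    counts: Counter = Counter()
    for ts in event_timestamps:
        counts[int(ts // WINDOW_SECONDS) * WINDOW_SECONDS] += 1
    return counts


def missing_per_window(upstream: Counter, downstream: Counter) -> Dict[int, int]:
    """Windows where the downstream stage saw a different count than upstream.
    Positive values mean loss; negative values mean duplicates."""
    return {
        window: n - downstream.get(window, 0)
        for window, n in upstream.items()
        if n != downstream.get(window, 0)
    }
```

Because the buckets use the message's own event time rather than arrival time, the same message lands in the same window at every stage, which is what makes the per-stage counts comparable.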

  46. Chaperone
     ● Running in production for 1+ year
     ● Planning to open source in ~2 weeks

  47. At-least Once Kafka

  48. Why do we need it?
     [Diagram: Application Process (ProxyClient) → Kafka Proxy Server → Regional Kafka → uReplicator → Aggregate Kafka, with the hops numbered 1 to 8]
     ● Most of the infrastructure is tuned for high throughput
       ○ Batching at each stage
       ○ Ack before produce (ack'ed != committed)
     ● Single node failure in any stage leads to data loss
     ● Need a reliable pipeline for high-value data, e.g. payments
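
The "ack'ed != committed" point can be made concrete with a toy pipeline stage. In the sketch below, a list stands in for the next hop's durable log and `crash()` models a single node failure; the class and its methods are illustrative and do not correspond to any component's real API.

```python
from typing import List


class Stage:
    def __init__(self, downstream: List[str], ack_after_commit: bool):
        self.downstream = downstream  # stands in for the next hop's durable log
        self.ack_after_commit = ack_after_commit
        self.buffer: List[str] = []   # in-memory batch awaiting a flush

    def receive(self, msg: str) -> bool:
        """Return True once the sender has been acked."""
        if self.ack_after_commit:
            self.downstream.append(msg)  # commit first, then ack
            return True
        self.buffer.append(msg)          # ack now, forward later in a batch
        return True

    def flush(self) -> None:
        self.downstream.extend(self.buffer)
        self.buffer.clear()

    def crash(self) -> None:
        self.buffer.clear()              # the unflushed batch is gone
```

With `ack_after_commit=False`, a message that was acked but not yet flushed disappears if the stage crashes, and the sender never learns of it. Acking only after the downstream write removes that window, at the cost of waiting on every message.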
