Real Time Data Analytics @ Uber
Ankur Bansal November 14, 2016
About Me
Sr. Software Engineer, Streaming Team @ Uber
○ The Streaming team supports the platform for real-time data analytics: Kafka, Samza, Flink, Pinot, and plenty more
○ Focused on scaling Kafka at Uber’s pace
○ Build & scale eBay’s cloud using OpenStack
○ REST Proxy & Clients
○ Local Agent
○ uReplicator (MirrorMaker)
○ Chaperone (Auditing)
Stream Processing
[Figure: surge multipliers example. Labels: Rider eyeballs, Open car information, Kafka.]
And many more ...
[Figure: data flow overview.
Data producers: Rider app, Driver app, API / Services, Dispatch (GPS logs), Mapping & Logistics.
Data consumers: Real-time, fast analytics; Applications; Data science; Analytics; Reporting; Ad-hoc exploration; Alerts; Dashboards; Debugging.
Paths: real-time pipeline and batch pipeline.]
Mobile App
Messages/day bytes/day
[Figure: Kafka pipeline across DataCenter-I, DataCenter-II, and DataCenter-III. Components: Applications [ProxyClient], Kafka REST Proxy, Local Agent, Regional Kafka, Secondary Kafka, uReplicator, Aggregate Kafka.]
[Figure: message path with numbered steps 1-8. Components: Application Process, ProxyClient, Kafka Proxy Server, Regional Kafka, uReplicator, Aggregate Kafka.]
○ Data (async, reliable)
○ Logging (high throughput)
○ Time sensitive (low latency, e.g. surge, push notifications)
○ High-value data (at-least-once, sync, e.g. payments)
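One way to picture the use-case classes above is as a set of producer profiles. The sketch below is an assumption, not Uber's configuration: the profile names and all values are illustrative guesses, and `producer_config` is a hypothetical helper. Only the keys (`acks`, `linger.ms`, `retries`, `bootstrap.servers`) are standard Kafka producer settings.

```python
# Hypothetical producer profiles, one per use-case class from the list above.
# Keys are standard Kafka producer configs; values are illustrative guesses.
PROFILES = {
    "general":        {"acks": "1",   "linger.ms": 50,  "retries": 5},
    "logging":        {"acks": "0",   "linger.ms": 200, "retries": 0},
    "time_sensitive": {"acks": "1",   "linger.ms": 0,   "retries": 1},
    "high_value":     {"acks": "all", "linger.ms": 0,   "retries": 10},
}


def producer_config(use_case: str, bootstrap_servers: str) -> dict:
    """Build a producer config dict for the given use-case class."""
    if use_case not in PROFILES:
        raise ValueError(f"unknown use case: {use_case}")
    config = {"bootstrap.servers": bootstrap_servers}
    config.update(PROFILES[use_case])
    return config
```

The point of the split is that one cluster-wide setting cannot serve all four classes: waiting for all replicas suits payments but would hurt latency-sensitive traffic.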
○ Thin clients = operational ease
○ Fewer connections to Kafka brokers
○ Easier future Kafka upgrades
○ Primary & Secondary Kafka Clusters
○ Simple HTTP servlets on Jetty instead of Jersey
○ Optimized for binary payloads
○ Performance increase from 7K* to 45-50K QPS/box
○ Support for fallback cluster
○ Support for multiple producers (SLA-based segregation)
* Based on benchmarking & analysis done in June 2015
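The fallback-cluster bullet above can be sketched as an ordered list of senders, primary first. This is a minimal illustration under stated assumptions, not the REST Proxy's actual code: `ClusterUnavailable` and `produce_with_fallback` are hypothetical names, and each sender stands in for a producer bound to one Kafka cluster.

```python
from typing import Callable, List


class ClusterUnavailable(Exception):
    """Raised by a sender when its cluster cannot accept the message."""


def produce_with_fallback(message: bytes,
                          senders: List[Callable[[bytes], None]]) -> int:
    """Try each sender in order (primary first).

    Returns the index of the sender that accepted the message.
    """
    for index, send in enumerate(senders):
        try:
            send(message)
            return index
        except ClusterUnavailable:
            continue  # this cluster is down, try the next one
    raise ClusterUnavailable("no cluster accepted the message")
```

A caller would pass `[primary_send, secondary_send]`; a return value of 1 means the message landed on the secondary cluster.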
[Figure: end-to-end latency (ms) against message rate (K/second) at a single node]
○ Non-blocking, async, batching
○ <1ms produce latency for clients
○ Handles throttling/back-off signals from the REST Proxy
○ Discovers the Kafka cluster a topic belongs to
○ Able to multiplex to different Kafka clusters
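The client behaviour listed above (non-blocking produce, batching, and multiplexing by topic) can be sketched as follows. Everything here is an assumption for illustration: `BatchingClient`, the `topic_to_cluster` mapping, and the `send_batch` callback are hypothetical, and throttling/back-off handling is left out.

```python
import queue
import threading
from collections import defaultdict


class BatchingClient:
    """Hypothetical client: produce() only enqueues; a thread sends batches."""

    def __init__(self, topic_to_cluster, send_batch, max_batch=100):
        self._topic_to_cluster = topic_to_cluster  # topic -> cluster name
        self._send_batch = send_batch  # callable(cluster, [(topic, payload)])
        self._max_batch = max_batch
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def produce(self, topic, payload):
        """Return immediately; the worker thread sends the message later."""
        self._queue.put((topic, payload))

    def close(self):
        """Flush the remaining messages and stop the worker."""
        self._queue.put(None)
        self._worker.join()

    def _run(self):
        done = False
        while not done:
            item = self._queue.get()  # block until the first message arrives
            batch = []
            while True:
                if item is None:  # close() was called
                    done = True
                    break
                batch.append(item)
                if len(batch) >= self._max_batch:
                    break
                try:
                    item = self._queue.get_nowait()
                except queue.Empty:
                    break
            # Group the batch by destination cluster (multiplexing).
            by_cluster = defaultdict(list)
            for topic, payload in batch:
                cluster = self._topic_to_cluster[topic]
                by_cluster[cluster].append((topic, payload))
            for cluster, messages in by_cluster.items():
                self._send_batch(cluster, messages)
```

Because `produce()` only touches an in-memory queue, the calling thread never waits on the network; the cost is that an enqueued message is not yet durable.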
What if there is network glitch /
infrastructure recovering from outage
○ Reuses code from rest-proxy and Kafka’s log module
○ Appends all topics to the same file for high throughput
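Appending every topic to one file keeps writes sequential. The sketch below shows one possible record layout for such a spool file; it is an assumption for illustration, not the Local Agent's on-disk format (which, per the slide, reuses Kafka's log module). `append_record` and `read_records` are hypothetical names.

```python
import struct

# Record header: 2-byte topic length, 4-byte payload length (big-endian).
HEADER = ">HI"


def append_record(spool, topic: str, payload: bytes) -> None:
    """Append one record: header, then topic bytes, then payload bytes."""
    name = topic.encode("utf-8")
    spool.write(struct.pack(HEADER, len(name), len(payload)))
    spool.write(name)
    spool.write(payload)


def read_records(spool):
    """Yield (topic, payload) pairs in the order they were appended."""
    header_size = struct.calcsize(HEADER)
    while True:
        header = spool.read(header_size)
        if len(header) < header_size:
            return  # end of file, or a partially written header
        name_len, payload_len = struct.unpack(HEADER, header)
        topic = spool.read(name_len).decode("utf-8")
        payload = spool.read(payload_len)
        yield topic, payload
```

Both functions take any binary file-like object, so the spool can be a file opened in append mode for writing and reopened in `"rb"` mode for replay once the downstream cluster is healthy again.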
[Figure: MirrorMaker setup. Labels: Traffic from DC1, Traffic from DC2, Traffic from DC3; App box; Dispatch; Mobile API; HTTP calls; Mirror Maker; Kafka8 Aggregation Cluster.]
[Figure: uReplicator architecture. Labels: ZooKeeper; Helix MM Controller; MM worker1, MM worker2, MM worker3, each with a Helix Agent, Thread 1 ... Thread N, and Topic-partition.]
○ Batching at each stage
○ Ack before produce (ack’ed != committed)
○ min.insync.replicas=2, can only tolerate one node failure
○ unclean.leader.election.enable=false, need to wait until the old leader comes back
○ Partition failover
○ Replication throttling, to reduce the impact of node bootstrap
○ Prevent catching-up nodes from becoming ISR
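The "one node failure" figure in the min.insync.replicas bullet follows from simple arithmetic, sketched below. The replication factor of 3 is an assumption (the slides do not state it), as is a producer using `acks=all`; `tolerable_broker_failures` is a hypothetical helper name.

```python
def tolerable_broker_failures(replication_factor: int,
                              min_insync_replicas: int) -> int:
    """Replicas that can fail while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    # Writes are accepted while at least min.insync.replicas replicas are
    # in sync, so the slack is the difference between the two settings.
    return replication_factor - min_insync_replicas
```

With the assumed replication factor of 3 and `min.insync.replicas=2`, the function returns 1: a second failure leaves only one in-sync replica and the partition stops accepting `acks=all` writes.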
imbalance and inter-broker dependency.
Rebalance Plan.
incremental, can be stopped and resumed.
Automated in the future.
○ Multiple clusters per DC
○ Use-case-based tuning
○ Batch everywhere, async produce
○ Replace Jersey with Jetty
○ Chaperone
○ Toolkit
[Figure sequence: broker logs during a leader failure]
Frame 1: Broker 1: 100 101 102 103 | Broker 2: 100 101 | Broker 3: 100 101 (labels: Leader, Committed, Producer, Acked)
Frame 2: Broker 1: 100 101 102 103 | Broker 2: 100 101 | Broker 3: 100 101 (labels: Leader, Committed, Producer, Failed, Acked)
Frame 3: Broker 1: 100 101 102 103 | Broker 2: 100 101 | Broker 3: 100 101 (labels: Leader, Committed, Producer)
Frame 4: Broker 1: 100 101 102 103 | Broker 2: 100 101 104 105 106 | Broker 3: 100 101 104 105 (labels: Leader, Committed, Producer, Old HW)
Frame 5: Broker 1: 100 101 102 103 | Broker 2: 100 101 104 105 106 | Broker 3: 100 101 104 105 (labels: Leader, Committed, Producer, Old HW, X, X)
Frame 6: Broker 1: 100 101 104 105 106 | Broker 2: 100 101 104 105 106 | Broker 3: 100 101 105 106 (labels: Leader, Committed, Producer) data loss!!
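The broker frames above can be replayed with a small simulation. It is a simplified model resting on assumptions the frames do not spell out: Broker 1 is taken to be the original leader, the old high watermark (HW) sits at message 101, Broker 2 becomes the new leader, and message IDs double as offsets. `fail_over` is a hypothetical helper, not Kafka code.

```python
def fail_over(logs, high_watermark, old_leader, new_leader, new_messages):
    """Simulate a leader change; return (logs_after, lost_messages)."""
    after = {broker: list(messages) for broker, messages in logs.items()}
    # The new leader holds only the committed prefix and appends new messages.
    after[new_leader].extend(new_messages)
    # The old leader rejoins as a follower: it drops everything above the
    # old high watermark and then copies the new leader's log.
    lost = [m for m in after[old_leader] if m > high_watermark]
    after[old_leader] = list(after[new_leader])
    return after, lost


logs = {1: [100, 101, 102, 103], 2: [100, 101], 3: [100, 101]}
after, lost = fail_over(logs, high_watermark=101, old_leader=1,
                        new_leader=2, new_messages=[104, 105, 106])
print(after[1], lost)
```

Under these assumptions Broker 1 ends with 100, 101, 104, 105, 106, and messages 102 and 103 are gone. If the producer had already received an ack for them, as the "Acked" label suggests, that is the data loss the last frame points at.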
* Supported in Kafka 0.8+
[Figure sequence: brokers, partitions, and replicas, coordinated through ZooKeeper]
Frame 1: Broker 1 | Broker 2 | Broker 3
Frame 2: Broker 1: Partition 0 | Broker 2: Partition 1 | Broker 3: Partition 2
Frame 3: Broker 1: Partition 0, Partition 2 | Broker 2: Partition 1, Partition 0 | Broker 3: Partition 2, Partition 1
Frame 4: same layout as Frame 3, with every partition copy holding messages 1 2 3