SLIDE 1

Real Time Data Analytics @ Uber

Ankur Bansal November 14, 2016

SLIDE 2

About Me

  • Sr. Software Engineer, Streaming Team @ Uber

○ Streaming team supports the platform for real-time data analytics: Kafka, Samza, Flink, Pinot, and plenty more
○ Focused on scaling Kafka at Uber’s pace

  • Staff Software Engineer @ eBay

○ Built & scaled eBay’s cloud using OpenStack

  • Apache Kylin: Committer, Emeritus PMC
SLIDE 3

Agenda

  • Real-time Use Cases
  • Kafka Infrastructure Deep Dive
  • Our Own Development:

○ Rest Proxy & Clients
○ Local Agent
○ uReplicator (MirrorMaker)
○ Chaperone (Auditing)

  • Operations/Tooling
SLIDE 4

Important Use Cases

SLIDE 5

Stream Processing

Real-time Price Surging

[Diagram: rider eyeballs and open-car information flow through Kafka to compute surge multipliers]

SLIDE 6

Real-time Machine Learning - UberEats ETD (estimated time of delivery)

SLIDE 7

SLIDE 8
  • Fraud detection
  • Share my ETA

And many more ...

SLIDE 9

Apache Kafka is Uber’s Lifeline

SLIDE 10

Kafka ecosystem @ Uber

[Diagram: data producers (Rider app, Driver app, Mobile app, API/services, Dispatch GPS logs, Mapping & Logistics) publish into Kafka; a real-time pipeline feeds applications, ad-hoc exploration, alerts, dashboards, and debugging, while a batch pipeline feeds data science, analytics, and reporting]

SLIDE 11

Kafka cluster stats

  • 100s of billions of messages/day
  • 100s of TB/day
  • Multiple data centers

SLIDE 12

Kafka Infrastructure Deep Dive

SLIDE 13

Requirements

  • Scale to 100s of billions/day → 1 trillion/day
  • High Throughput (scale: 100s of TB → PB)
  • Low Latency for most use cases (<5 ms)
  • Reliability: 99.99% (#msgs available / #msgs produced)
  • Multi-Language Support
  • Tens of thousands of simultaneous clients
  • Reliable data replication across DCs
SLIDE 14

Kafka Pipeline

[Diagram: in each data center, Applications (ProxyClient) → Kafka REST Proxy → Regional Kafka, with a Local Agent and a Secondary Kafka as fallback; uReplicator copies Regional Kafka data from DataCenter-I, II, and III into an Aggregate Kafka cluster]

SLIDE 15

Kafka Pipeline: Data Flow

[Diagram: numbered produce path: Application Process → ProxyClient → Kafka REST Proxy server → Regional Kafka → uReplicator → Aggregate Kafka]

SLIDE 16

Kafka Clusters

[Diagram: the same cross-DC Kafka pipeline as Slide 14]

SLIDE 17

Kafka Clusters

  • Use case based clusters

○ Data (async, reliable)
○ Logging (high throughput)
○ Time Sensitive (low latency, e.g. surge, push notifications)
○ High Value Data (at-least once, sync, e.g. payments)

  • Secondary cluster as fallback
  • Aggregate clusters for all data topics.
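
The slides name the cluster classes but not their settings. Purely as a hedged illustration (not Uber's actual configuration), here is a minimal kafka-python sketch of how producer tuning might differ per class; the broker addresses and values are made up:

```python
# Hypothetical sketch of use-case-based producer tuning (illustrative values only).
from kafka import KafkaProducer

# Logging cluster: favor throughput with large, compressed batches and leader-only acks.
logging_producer = KafkaProducer(
    bootstrap_servers="logging-kafka:9092",
    acks=1,                      # leader ack only
    linger_ms=100,               # wait to build bigger batches
    batch_size=1 << 20,          # 1 MB batches
    compression_type="snappy",
)

# High-value data cluster (e.g. payments): favor durability with at-least-once, sync-style acks.
payments_producer = KafkaProducer(
    bootstrap_servers="payments-kafka:9092",
    acks="all",                  # wait for in-sync replicas
    retries=5,
    max_in_flight_requests_per_connection=1,  # preserve ordering across retries
)

# Time-sensitive cluster (e.g. surge, push notifications): favor latency, no lingering.
surge_producer = KafkaProducer(
    bootstrap_servers="surge-kafka:9092",
    acks=1,
    linger_ms=0,
)
```
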
SLIDE 18

Kafka Clusters

  • Scale to 100s of billions/day → 1 trillion/day
  • High Throughput (scale: 100s of TB → PB)
  • Low Latency for most use cases (<5 ms)
  • Reliability: 99.99% (#msgs available / #msgs produced)
  • Multi-Language Support
  • Tens of thousands of simultaneous clients
  • Reliable data replication across DCs
SLIDE 19

Kafka Rest Proxy

[Diagram: the same cross-DC Kafka pipeline as Slide 14]

SLIDE 20

Why Kafka Rest Proxy ?

  • Simplified Client API
  • Multi-language support (Java, Node.js, Python, Go)
  • Decouple clients from Kafka brokers

○ Thin clients = operational ease
○ Fewer connections to Kafka brokers
○ Easier future Kafka upgrades

  • Enhanced Reliability

○ Primary & Secondary Kafka Clusters
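
Because the proxy is based on Confluent's open-sourced REST Proxy (Slide 23), a client in any language can publish with a plain HTTP POST. Assuming a Confluent-style v1 binary produce endpoint (the host and topic below are hypothetical):

```python
# Hypothetical example of producing through a Confluent-style Kafka REST proxy.
import base64
import requests

payload = {
    "records": [
        {"value": base64.b64encode(b"hello from any language").decode("ascii")}
    ]
}
resp = requests.post(
    "http://kafka-rest-proxy:8082/topics/demo_topic",
    json=payload,
    headers={"Content-Type": "application/vnd.kafka.binary.v1+json"},
)
resp.raise_for_status()
print(resp.json())   # partition/offset info assigned by the proxy
```
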

SLIDE 21

Kafka Rest Proxy: Internals

SLIDE 22

Kafka Rest Proxy: Internals

SLIDE 23

Kafka Rest Proxy: Internals

  • Based on Confluent’s open-sourced Rest Proxy
  • Performance enhancements

○ Simple HTTP servlets on Jetty instead of Jersey
○ Optimized for binary payloads
○ Performance increase from 7K* to 45-50K QPS/box

  • Caching of topic metadata
  • Reliability improvements*

○ Support for a fallback cluster
○ Support for multiple producers (SLA-based segregation)

  • Plan to contribute back to the community

* Based on benchmarking & analysis done in June 2015
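
The fallback behavior is not spelled out in the deck; as a rough sketch only, assuming the proxy keeps one producer per cluster and spills to the secondary cluster when the primary produce fails (cluster addresses are hypothetical):

```python
# Rough sketch of the "fallback cluster" idea: try the primary producer, and on
# failure publish to the secondary cluster instead. Not the actual implementation.
from kafka import KafkaProducer
from kafka.errors import KafkaError

primary = KafkaProducer(bootstrap_servers="regional-kafka:9092", acks=1)
secondary = KafkaProducer(bootstrap_servers="secondary-kafka:9092", acks=1)

def produce_with_fallback(topic: str, value: bytes, timeout_s: float = 0.5) -> None:
    try:
        # Block briefly so a broker problem surfaces as an exception here.
        primary.send(topic, value).get(timeout=timeout_s)
    except KafkaError:
        # Primary cluster unhealthy: spill to the secondary cluster.
        secondary.send(topic, value).get(timeout=timeout_s)
```
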

SLIDE 24

Rest Proxy: performance (1 box)

[Charts: message rate (K/second) at a single node, and end-to-end latency (ms)]

SLIDE 25

Kafka Clusters + Rest Proxy

  • Scale to 100s of billions/day → 1 trillion/day
  • High Throughput (scale: 100s of TB → PB)
  • Low Latency for most use cases (<5 ms)
  • Reliability: 99.99% (#msgs available / #msgs produced)
  • Multi-Language Support
  • Tens of thousands of simultaneous clients
  • Reliable data replication across DCs
SLIDE 26

Kafka Clients

[Diagram: the same cross-DC Kafka pipeline as Slide 14]

SLIDE 27

Client Libraries

  • Support for multiple clusters
  • High Throughput

○ Non-blocking, async, batching
○ <1 ms produce latency for clients
○ Handles throttling/backoff signals from the Rest Proxy

  • Topic Discovery

○ Discovers which Kafka cluster a topic belongs to
○ Able to multiplex to different Kafka clusters

  • Integration with the Local Agent for critical data (see the sketch below)
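
As a conceptual sketch only (not Uber's actual client library), the bullets above roughly translate to a client that buffers produce calls in memory, flushes batches from a background thread, and routes each topic to the REST proxy fronting its discovered cluster; every name and endpoint below is hypothetical:

```python
# Conceptual sketch of a non-blocking, batching proxy client with topic discovery.
import base64
import threading
import time
from collections import defaultdict

import requests

PROXY_URLS = {                                   # cluster -> REST proxy endpoint (made up)
    "high-value": "http://payments-rest-proxy:8082",
    "logging": "http://logging-rest-proxy:8082",
    "default": "http://kafka-rest-proxy:8082",
}
TOPIC_TO_CLUSTER = {"payments": "high-value", "gps_logs": "logging"}  # discovery cache

class ProxyClient:
    def __init__(self, flush_interval_s: float = 0.05):
        self._buffers = defaultdict(list)        # cluster -> [(topic, value), ...]
        self._lock = threading.Lock()
        flusher = threading.Thread(target=self._flush_loop, args=(flush_interval_s,), daemon=True)
        flusher.start()

    def produce(self, topic: str, value: bytes) -> None:
        """Non-blocking produce: just buffer, well under 1 ms on the caller's thread."""
        cluster = TOPIC_TO_CLUSTER.get(topic, "default")
        with self._lock:
            self._buffers[cluster].append((topic, value))

    def _flush_loop(self, interval_s: float) -> None:
        while True:
            time.sleep(interval_s)
            with self._lock:
                batches, self._buffers = self._buffers, defaultdict(list)
            for cluster, records in batches.items():
                by_topic = defaultdict(list)
                for topic, value in records:
                    by_topic[topic].append({"value": base64.b64encode(value).decode()})
                for topic, recs in by_topic.items():   # one batched request per topic
                    requests.post(
                        f"{PROXY_URLS[cluster]}/topics/{topic}",
                        json={"records": recs},
                        headers={"Content-Type": "application/vnd.kafka.binary.v1+json"},
                        timeout=1.0,
                    )
```
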
SLIDE 28

Client Libraries

Add Figure

What if there is a network glitch / outage?
SLIDE 29

Client Libraries

Add Figure

SLIDE 30

Kafka Clusters + Rest Proxy + Clients

  • Scale to 100s of billions/day → 1 trillion/day
  • High Throughput (scale: 100s of TB → PB)
  • Low Latency for most use cases (<5 ms)
  • Reliability: 99.99% (#msgs available / #msgs produced)
  • Multi-Language Support
  • Tens of thousands of simultaneous clients
  • Reliable data replication across DCs
SLIDE 31

Local Agent

[Diagram: the same cross-DC Kafka pipeline as Slide 14]

SLIDE 32

Local Agent

  • Local spooling in case of downstream outage/backpressure
  • Backfills at a controlled rate to avoid hammering infrastructure that is recovering from an outage
  • Implementation (sketched below):

○ Reuses code from the Rest Proxy and Kafka’s log module
○ Appends all topics to the same file for high throughput
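
A conceptual sketch of the spool-then-backfill behavior described above, not the actual implementation (which reuses Kafka's log module); the paths and endpoint are hypothetical:

```python
# Conceptual Local Agent sketch: when the REST proxy is unreachable, append records
# to a single local spool file; later, replay them at a capped rate.
import json
import time

import requests

SPOOL_PATH = "/var/spool/kafka-agent/records.log"     # hypothetical path
PROXY_URL = "http://kafka-rest-proxy:8082"            # hypothetical endpoint

def spool(topic: str, value_b64: str) -> None:
    # All topics go to the same append-only file for sequential-write throughput.
    with open(SPOOL_PATH, "a") as f:
        f.write(json.dumps({"topic": topic, "value": value_b64}) + "\n")

def backfill(max_records_per_sec: int = 500) -> None:
    # Replay spooled records at a controlled rate so a recovering cluster
    # is not hammered by the accumulated backlog.
    with open(SPOOL_PATH) as f:
        for line in f:
            rec = json.loads(line)
            requests.post(
                f"{PROXY_URL}/topics/{rec['topic']}",
                json={"records": [{"value": rec["value"]}]},
                headers={"Content-Type": "application/vnd.kafka.binary.v1+json"},
                timeout=1.0,
            )
            time.sleep(1.0 / max_records_per_sec)
```
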

SLIDE 33

Local Agent Architecture

Add Figure

SLIDE 34

Local Agent in Action

Add Figure

SLIDE 35

Kafka Clusters + Rest Proxy + Clients + Local Agent

  • Scale to 100s of billions/day → 1 trillion/day
  • High Throughput (scale: 100s of TB → PB)
  • Low Latency for most use cases (<5 ms)
  • Reliability: 99.99% (#msgs available / #msgs produced)
  • Multi-Language Support
  • Tens of thousands of simultaneous clients
  • Reliable data replication across DCs
SLIDE 36

uReplicator

[Diagram: the same cross-DC Kafka pipeline as Slide 14]

SLIDE 37

Multi-DC data flow

[Diagram: app boxes (Dispatch, Mobile API) in DC1, DC2, and DC3 send traffic via HTTP calls through Mirror Maker into the Kafka8 aggregation cluster]

SLIDE 38


MirrorMaker: existing problems

Any of these events triggers a full consumer rebalance, stalling the entire mirroring pipeline:

  • New topic added
  • New partitions added
  • MirrorMaker bounced
  • New MirrorMaker instance added
SLIDE 39

uReplicator: In-house solution

[Diagram: a Helix-based MirrorMaker controller, backed by ZooKeeper, assigns topic-partitions to MM workers (worker1, worker2, worker3); each worker runs a Helix agent plus worker threads 1..N]
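
The core idea is that the controller owns an explicit topic-partition-to-worker mapping, so adding a topic or a worker moves only the partitions that must move instead of triggering a full rebalance. The sketch below is illustrative only and is not uReplicator's actual assignment logic; all names are hypothetical:

```python
# Illustrative sketch of controller-owned, sticky partition assignment.
def reassign(current: dict[str, str], partitions: list[str], workers: list[str]) -> dict[str, str]:
    """current maps 'topic-partition' -> worker; returns an updated, sticky mapping."""
    alive = set(workers)
    # Keep every existing assignment whose partition still exists and whose worker is alive.
    assignment = {tp: w for tp, w in current.items() if tp in partitions and w in alive}
    load = {w: 0 for w in workers}
    for w in assignment.values():
        load[w] += 1
    # Only new or orphaned partitions get (re)assigned, to the least-loaded worker.
    for tp in partitions:
        if tp not in assignment:
            target = min(load, key=load.get)
            assignment[tp] = target
            load[target] += 1
    return assignment

# Example: adding worker "mm4" leaves existing assignments untouched;
# only new or orphaned partitions land on it.
```
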

SLIDE 40

uReplicator

[Diagram: the same uReplicator topology as Slide 39]

SLIDE 41

Kafka Clusters + Rest Proxy + Clients + Local Agent + uReplicator

  • Scale to 100s of billions/day → 1 trillion/day
  • High Throughput (scale: 100s of TB → PB)
  • Low Latency for most use cases (<5 ms)
  • Reliability: 99.99% (#msgs available / #msgs produced)
  • Multi-Language Support
  • Tens of thousands of simultaneous clients
  • Reliable data replication across DCs
SLIDE 42

uReplicator

  • Running in production for 1+ year
  • Open sourced: https://github.com/uber/uReplicator
  • Blog: https://eng.uber.com/ureplicator/
SLIDE 43

Chaperone - E2E Auditing

SLIDE 44

Chaperone Architecture

SLIDE 45


Chaperone: Track counts

SLIDE 46


Chaperone: Track Latency
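
The dashboards themselves are not reproduced here. Purely as a hypothetical illustration of what end-to-end auditing has to track (not Chaperone's actual implementation): per-topic counts and latency per time bucket at each tier, so counts can be compared across tiers to detect loss or lag:

```python
# Hypothetical illustration of end-to-end auditing: bucket messages by event time,
# record per-topic counts and latency at each tier, and compare counts across tiers.
import time
from collections import defaultdict

BUCKET_S = 600  # 10-minute buckets (illustrative)

class Auditor:
    def __init__(self, tier: str):
        self.tier = tier
        self.counts = defaultdict(int)         # (topic, bucket) -> message count
        self.latency_sum = defaultdict(float)  # (topic, bucket) -> total seconds of lag

    def record(self, topic: str, event_ts: float) -> None:
        bucket = int(event_ts // BUCKET_S) * BUCKET_S
        self.counts[(topic, bucket)] += 1
        self.latency_sum[(topic, bucket)] += time.time() - event_ts

def missing(upstream: "Auditor", downstream: "Auditor", topic: str, bucket: int) -> int:
    """Messages seen upstream but not (yet) downstream for a given bucket."""
    return upstream.counts[(topic, bucket)] - downstream.counts[(topic, bucket)]
```
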

SLIDE 47

Chaperone

  • Running in production for 1+ year
  • Planning to open source in ~2 weeks
SLIDE 48

At-least Once Kafka

SLIDE 49

Why do we need it?

[Diagram: the same produce path as Slide 15]

  • Most of the infrastructure is tuned for high throughput

○ Batching at each stage
○ Ack before produce (ack’ed != committed)

  • A single-node failure in any stage can lead to data loss
  • Need a reliable pipeline for high-value data, e.g. payments
SLIDE 50

How did we achieve it?

  • Brokers:

○ min.insync.replicas=2: can tolerate only one node failure
○ unclean.leader.election.enable=false: must wait until the old leader comes back

  • Rest Proxy:

○ Partition failover

  • Improved Operations:

○ Replication throttling, to reduce the impact of node bootstrap
○ Prevent catching-up nodes from joining the ISR
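
The broker knobs above live in server.properties; the matching producer side is a single setting change. A minimal kafka-python sketch (the broker address and topic are made up):

```python
# Broker side (server.properties), per this slide:
#   min.insync.replicas=2
#   unclean.leader.election.enable=false
#
# Producer side: at-least-once produce for the high-value (e.g. payments) path.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="regional-kafka:9092",   # hypothetical address
    acks="all",                                # wait for all in-sync replicas
    retries=10,                                # retry transient failures
    max_in_flight_requests_per_connection=1,   # keep ordering across retries
)

# Synchronous produce: surface failures to the caller instead of silently dropping data.
future = producer.send("payments", b"charge:rider-123:usd:7.50")
record_metadata = future.get(timeout=10)       # raises on failure, so the caller can retry
```
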

SLIDE 51

Operations/Tooling

SLIDE 52

Partition Rebalancing

Add Figure

SLIDE 53

Partition Rebalancing

  • Calculates partition imbalance and inter-broker dependency
  • Generates & executes a rebalance plan (see the sketch below)
  • Rebalance plans are incremental; they can be stopped and resumed
  • Currently on-demand; automated in the future
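
The tool itself is not shown in the deck. As a hedged sketch only, a rebalance plan can be expressed in the JSON format accepted by Kafka's stock kafka-reassign-partitions.sh and executed step by step; the topics and broker IDs below are made up, and a real tool would compute replica placements from measured imbalance:

```python
# Hedged sketch: emit a partition-reassignment plan in the JSON format that
# Kafka's stock kafka-reassign-partitions.sh accepts.
import json

def build_plan(moves: list[dict]) -> str:
    """moves: [{'topic': ..., 'partition': ..., 'replicas': [broker ids]}, ...]"""
    return json.dumps({"version": 1, "partitions": moves}, indent=2)

plan = build_plan([
    {"topic": "gps_logs", "partition": 7, "replicas": [4, 5]},   # e.g. move load off broker 1
    {"topic": "payments", "partition": 2, "replicas": [2, 6]},
])
print(plan)

# Then, one small step at a time (incremental plans can be stopped and resumed):
#   kafka-reassign-partitions.sh --zookeeper zk:2181 \
#       --reassignment-json-file plan.json --execute
```
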

SLIDE 54

XFS vs EXT4

Add Figure

SLIDE 55

Summary: Scale

  • Kafka Brokers:

○ Multiple clusters per DC
○ Use-case-based tuning

  • Rest Proxy to reduce connections and enable better batching
  • Rest Proxy & Clients:

○ Batch everywhere, async produce
○ Replace Jersey with Jetty

  • XFS
SLIDE 56

Summary: Reliability

  • Local Agent
  • Secondary Clusters
  • Multi Producer support in Rest Proxy
  • uReplicator
  • Auditing via Chaperone
SLIDE 57

Future Work

  • Open source contribution

○ Chaperone
○ Toolkit

  • Data Lineage
  • Active Active Kafka
  • Chargeback
  • Exactly once mirroring via uReplicator
SLIDE 58

Questions?

ankur@uber.com

SLIDE 59

Extra Slides

SLIDE 60

Kafka Durability (acks=1)

[Diagram: Broker 1 (leader) holds offsets 100-103; Brokers 2 and 3 hold only 100-101; the committed offset is 101, but with acks=1 the producer has already been acked for 102-103]

SLIDE 61

Kafka Durability (acks=1)

[Diagram: the leader (Broker 1) fails before offsets 102-103 are replicated; the producer had already received acks for them]

SLIDE 62

Kafka Durability (acks=1)

[Diagram: leadership moves to a follower that holds only offsets 100-101; the acked-but-uncommitted messages 102-103 exist only on the failed broker]

SLIDE 63

Kafka Durability (acks=1)

[Diagram: the new leader (Broker 2) accepts fresh messages 104-106 and Broker 3 replicates them; Broker 1 still holds the unreplicated 102-103 beyond the old high watermark]

SLIDE 64

Kafka Durability (acks=1)

[Diagram: when Broker 1 comes back, it truncates its log to the old high watermark, discarding offsets 102-103]

SLIDE 65

Kafka Durability (acks=1)

[Diagram: Broker 1 re-syncs offsets 104-106 from the new leader; the previously acked messages 102-103 are gone: data loss!]
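
The sequence above shows how acks=1 can lose messages that the producer was already acked for when the leader fails. On the producer side, the difference is a single setting (kafka-python shown; the broker must also enforce min.insync.replicas as on Slide 50; the address is made up):

```python
# acks=1 vs acks="all": the one-line difference behind the failure sequence above.
from kafka import KafkaProducer

fast_but_lossy = KafkaProducer(
    bootstrap_servers="kafka:9092",
    acks=1,            # leader-only ack: an acked message can vanish if the leader dies
)

durable = KafkaProducer(
    bootstrap_servers="kafka:9092",
    acks="all",        # ack only after all in-sync replicas have the message
)
```
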

SLIDE 66

Distributed Messaging system

  • High throughput
  • Low latency
  • Scalable
  • Centralized
  • Real-time

* Supported in Kafka 0.8+
SLIDE 67

What is Kafka?

  • Distributed
  • Partitioned
  • Replicated
  • Commit Log

[Diagram: three Kafka brokers coordinated by ZooKeeper]

SLIDE 68

What is Kafka?

  • Distributed
  • Partitioned
  • Replicated
  • Commit Log

[Diagram: the topic is split into partitions: Partition 0 on Broker 1, Partition 1 on Broker 2, Partition 2 on Broker 3, coordinated by ZooKeeper]

SLIDE 69

What is Kafka?

  • Distributed
  • Partitioned
  • Replicated
  • Commit Log

[Diagram: each partition is replicated onto a second broker: Partition 0 on Brokers 1 and 2, Partition 1 on Brokers 2 and 3, Partition 2 on Brokers 1 and 3, coordinated by ZooKeeper]

SLIDE 70

What is Kafka?

  • Distributed
  • Partitioned
  • Replicated
  • Commit Log

[Diagram: each partition replica is an ordered commit log of messages (offsets 1, 2, 3) spread across Brokers 1-3, coordinated by ZooKeeper]
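
To tie these diagrams together, a hedged example of creating a topic like the one shown (3 partitions, each replicated onto 2 brokers) using kafka-python's admin client; the broker address and topic name are made up:

```python
# Create a topic with 3 partitions and replication factor 2, as in the diagrams above.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="kafka:9092")   # hypothetical address
admin.create_topics([
    NewTopic(name="demo_topic", num_partitions=3, replication_factor=2)
])
```
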

SLIDE 71

Kafka Concepts