The Architecture of Uber's Realtime System March 25, 2015 Amos - - PowerPoint PPT Presentation

SLIDE 1

The Architecture of Uber's Realtime System

Amos Barreto @amos_barreto Danny Yuan @g9yuayon March 25, 2015

SLIDE 2

What is Uber

SLIDE 3

Uber is a transportation platform

SLIDE 4

Uber connects riders to drivers

SLIDE 5

Transportation at your fingertips

SLIDE 6
SLIDE 7
SLIDE 8

What is Realtime?

SLIDE 9

It’s the brain of Uber’s logistics platform

SLIDE 10

[Diagram: a driver and riders]

It assigns drivers to riders

SLIDE 11

It balances driver & rider satisfaction

SLIDE 12

Sounds pretty simple, right?

SLIDE 13

Realtime Analytics

Not Really

SLIDE 14

Realtime Analytics

Not Really

SLIDE 15

Realtime Analytics

Not Really

SLIDE 16

Realtime Analytics

Not Really

SLIDE 17

We didn’t start with this

SLIDE 18

Instead…

SLIDE 19

The Beginning

  • PHP application
  • Transactions appended to flat files
  • Half of the code in Spanish
  • Lifespan: 6-9 months
SLIDE 20

But we had to evolve

SLIDE 21

So we built a distributed state machine

SLIDE 22

We tried to follow good practices

SLIDE 23

A simulator is very helpful

SLIDE 24

Extensive instrumentation

SLIDE 25

Asynchronous State Machines

SLIDE 26

Scaling Horizontally

[Diagram: a single worker (W)]

SLIDE 27

Scaling Horizontally

[Diagram: five workers (W)]

SLIDE 28

Stateless Workers

[Diagram: five workers (W)]

SLIDE 29

Stateless Workers

[Diagram: five workers (W) behind HAProxy]

SLIDE 30

Shared States

[Diagram: five workers (W) behind HAProxy]

SLIDE 31

Shared States

[Diagram: five workers (W) behind HAProxy, with Redis (via Twemproxy)]

SLIDE 32

Everything can go down

[Diagram: five workers (W) behind HAProxy, with Redis (via Twemproxy)]

SLIDE 33

Everything can go down

[Diagram: five workers (W) behind HAProxy, with Redis (via Twemproxy) and Riak]

SLIDE 34

There’s a tougher problem to solve

SLIDE 35

Dispatch needs coordination

  • Trip states happen in partial order
  • Rider and driver states may need to be synchronized

SLIDE 36

Coordination is hard

“Minimizing coordination, or blocking communication between concurrently executing operations, is key to maximizing scalability, availability, and high performance in database systems” (Coordination Avoidance in Database Systems)

SLIDE 37

Coordination is hard

  • Row lock? Distributed lock? Consensus protocols?
  • Insight: what we really need is ordered execution
  • Solution: assign requests with the same user-defined key to the same stateless worker node
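
The solution above can be illustrated with a minimal sketch, assuming a hypothetical `workerFor` helper that hashes the user-defined key (here a trip ID) and takes it modulo the number of workers. Because the mapping depends only on the key, every request for the same trip reaches the same worker, which can then apply those requests in order without a distributed lock.

```javascript
const crypto = require('crypto');

// Hash a user-defined key (e.g. a trip ID) to a 32-bit unsigned integer.
function hashKey(key) {
  return crypto.createHash('md5').update(String(key)).digest().readUInt32BE(0);
}

// Hypothetical router: the worker depends only on the key, so all requests
// for one key go to one worker and can be executed there in order.
function workerFor(key, workers) {
  return workers[hashKey(key) % workers.length];
}

const workers = ['w1', 'w2', 'w3', 'w4', 'w5'];
console.log(workerFor('trip-42', workers) === workerFor('trip-42', workers)); // true
```

One drawback of plain modulo hashing: adding a sixth worker changes `hashKey(key) % workers.length` for most keys, so most in-flight trips would switch workers at the same moment.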

SLIDE 38

What about load balancing?

[Diagram: five workers (W)]

SLIDE 39

What about load balancing?

[Diagram: five workers (W) behind HAProxy]

SLIDE 40

What about load balancing?

[Diagram: five workers (W) behind HAProxy]

SLIDE 41

What about load balancing?

[Diagram: five workers (W) behind HAProxy]

SLIDE 42

What about load balancing?

Consistent Hashing to the rescue

SLIDE 43

What about load balancing?

Consistent Hashing to the rescue
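
Consistent hashing fixes the remapping problem of plain modulo hashing: servers and keys are hashed onto the same ring, and a key belongs to the first server point clockwise from it. A minimal sketch follows; `HashRing` and its methods are hypothetical names, and MD5 plus 100 virtual points per server are arbitrary choices for illustration.

```javascript
const crypto = require('crypto');

// Hash a string to a 32-bit unsigned position on the ring.
function hash32(s) {
  return crypto.createHash('md5').update(s).digest().readUInt32BE(0);
}

class HashRing {
  constructor(replicas = 100) {
    this.replicas = replicas; // virtual points per server, to even out load
    this.points = [];         // sorted array of { pos, server }
  }

  addServer(server) {
    for (let i = 0; i < this.replicas; i++) {
      this.points.push({ pos: hash32(`${server}#${i}`), server });
    }
    this.points.sort((a, b) => a.pos - b.pos);
  }

  removeServer(server) {
    this.points = this.points.filter((p) => p.server !== server);
  }

  // The owner of a key is the first point at or after the key's hash.
  lookup(key) {
    if (this.points.length === 0) return undefined;
    const h = hash32(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) { // binary search for the first pos >= h
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].pos < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].server; // wrap around
  }
}

const ring = new HashRing();
['w1', 'w2', 'w3', 'w4', 'w5'].forEach((w) => ring.addServer(w));
const before = new Map();
for (let i = 0; i < 1000; i++) before.set(`trip-${i}`, ring.lookup(`trip-${i}`));
ring.addServer('w6');
let moved = 0;
for (const [key, owner] of before) if (ring.lookup(key) !== owner) moved++;
console.log(moved); // only keys now owned by w6 have moved
```

When a sixth worker joins, only the keys that land on its new points change owners; every other trip stays on its current worker.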

SLIDE 44

Membership changes

  • A worker in a cluster can crash
  • A worker can join a cluster
  • We need a fast and reliable failure detector and membership updates

SLIDE 45

Key Insights

  • Separate failure detection from membership updates
  • Do not rely on a single peer for failure detection
  • Spread membership changes via gossip-like protocols

SLIDE 46

SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
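
The key insights above, as realized in SWIM, can be sketched as one probe round. This is a simplified, synchronous illustration rather than Ringpop's code: `probe`, `enqueueUpdate`, and the `network.canReach` stub (standing in for a ping/ack round trip with a timeout) are hypothetical names.

```javascript
// Simplified, synchronous sketch of one SWIM probe.
function probe(self, target, members, network, k = 3) {
  // 1. Direct probe: ping the target.
  if (network.canReach(self, target)) return 'alive';

  // 2. Indirect probe (ping-req): ask up to k other members to ping the
  //    target, so one bad link to the target cannot cause a false positive.
  //    SWIM picks these helpers at random; this sketch takes the first k.
  const helpers = members.filter((m) => m !== self && m !== target).slice(0, k);
  for (const helper of helpers) {
    if (network.canReach(self, helper) && network.canReach(helper, target)) {
      return 'alive';
    }
  }

  // 3. No ack at all: mark the target suspect instead of declaring it dead.
  return 'suspect';
}

// Membership changes are disseminated separately from detection: queued
// here, then piggybacked on later protocol messages (infection-style).
function enqueueUpdate(updates, member, status) {
  updates.push({ member, status });
}

// Example: the A-B link is broken, but B itself is healthy.
const brokenLinks = new Set(['A-B', 'B-A']);
const down = new Set();
const network = {
  canReach: (a, b) => !down.has(a) && !down.has(b) && !brokenLinks.has(`${a}-${b}`),
};
console.log(probe('A', 'B', ['A', 'B', 'C', 'D'], network)); // alive (confirmed via C)
down.add('B');
console.log(probe('A', 'B', ['A', 'B', 'C', 'D'], network)); // suspect
```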

SLIDE 47

Ringpop

  • Open source
  • Hash ring abstraction
  • Implements a variation of SWIM
  • Flap damping
  • Use checksums to verify correctness of ring state

  • Proxying capabilities
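
Checksum comparison can be sketched as follows, assuming member records with hypothetical `address`, `status`, and `incarnation` fields and SHA-1 as the hash (the slides do not say which hash Ringpop uses). Two nodes exchange checksums; a mismatch signals that their views of the ring have diverged and may need a fuller membership exchange.

```javascript
const crypto = require('crypto');

// Checksum of one node's view of the membership. Members are sorted first,
// so two nodes with the same view get the same checksum regardless of the
// order in which they learned about the members.
function ringChecksum(members) {
  const canonical = members
    .map((m) => `${m.address} ${m.status} ${m.incarnation}`)
    .sort()
    .join(';');
  return crypto.createHash('sha1').update(canonical).digest('hex');
}

const viewA = [
  { address: '10.0.0.1:3000', status: 'alive', incarnation: 1 },
  { address: '10.0.0.2:3000', status: 'alive', incarnation: 4 },
];
const viewB = [viewA[1], viewA[0]];                           // same members, other order
const viewC = [viewA[0], { ...viewA[1], status: 'suspect' }]; // diverged view
console.log(ringChecksum(viewA) === ringChecksum(viewB)); // true
console.log(ringChecksum(viewA) === ringChecksum(viewC)); // false
```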
SLIDE 48

Ringpop

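var RingPop = require('ringpop');

var ringpop = new RingPop({
    app: 'myapp',
    hostPort: 'myhost:30000'
});

// Join the cluster through a list of seed hosts
ringpop.bootstrap(['myhost:30001', 'myhost2:30000']);

ringpop.on('ready', function() {
    // do something
});

// Find which node owns this key on the hash ring
var node = ringpop.lookup('[unique-request-id]');

if (node === ringpop.whoami()) {
    // this node owns the key: process request
} else {
    // another node owns the key: forward request
}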

SLIDE 49

Ringpop Serial

  • Simple ringpop wrapper
  • Requests are queued by key
  • Processed serially, one at a time
  • Emulates transactions
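
The queueing behavior can be sketched with promise chains. This is a hypothetical `KeyedSerialQueue`, not Ringpop Serial's actual API: each key keeps a pointer to the tail of its queue, and a new task is chained after it, so tasks for the same key run one at a time while different keys proceed independently.

```javascript
// Hypothetical per-key serial queue: tasks sharing a key run one at a time,
// in arrival order; tasks for different keys run independently.
class KeyedSerialQueue {
  constructor() {
    this.tails = new Map(); // key -> never-rejecting promise for the last task
  }

  enqueue(key, task) {
    const prev = this.tails.get(key) || Promise.resolve();
    // Start the task only after every earlier task for this key has settled.
    const run = prev.then(() => task());
    // Keep a never-rejecting tail so one failed task does not block the key.
    const tail = run.catch(() => {});
    this.tails.set(key, tail);
    // Forget the key once nothing newer has been queued behind this task.
    tail.then(() => {
      if (this.tails.get(key) === tail) this.tails.delete(key);
    });
    return run; // callers can await the task's own result or error
  }
}

const queue = new KeyedSerialQueue();
const log = [];
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
queue.enqueue('trip-1', async () => { await delay(20); log.push('trip-1 first'); });
queue.enqueue('trip-1', async () => { log.push('trip-1 second'); });
queue.enqueue('trip-2', async () => { log.push('trip-2'); });
setTimeout(() => console.log(log), 100);
// logs trip-2, then trip-1 first, then trip-1 second
```

The second `trip-1` task waits for the slow first one, while `trip-2` is not held up by either.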
SLIDE 50
SLIDE 51
SLIDE 52
SLIDE 53
SLIDE 54
SLIDE 55
SLIDE 56

Transactions

SLIDE 57

Transactions

  • Conflicts are possible during membership changes

SLIDE 58

Transactions

  • Conflicts are possible during membership changes
  • Need smart application-level conflict resolution

SLIDE 59

Global Geospatial Index

SLIDE 60

Global Geospatial Index

  • High volume of location updates
  • Mild, but expensive, query volume
  • Large search space (the world)
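
One common way to handle a high volume of location updates over a world-sized search space is to bucket positions into cells, so an update touches at most two cells and a query scans only nearby ones. The sketch below assumes a plain lat/lng grid with hypothetical names (`GeoIndex`, `cellOf`); the slides do not say how Uber defines or distributes its cells, and production systems often use hierarchical schemes such as geohash or S2 cells instead.

```javascript
// Hypothetical grid index: the world is divided into fixed-size lat/lng
// cells and each cell holds the drivers currently inside it. The 0.01-degree
// cell size is an assumption; cells narrow toward the poles, and wraparound
// at the antimeridian is ignored for brevity.
const CELL_DEG = 0.01;

function cellOf(lat, lng) {
  return `${Math.floor(lat / CELL_DEG)}:${Math.floor(lng / CELL_DEG)}`;
}

class GeoIndex {
  constructor() {
    this.cells = new Map();      // cell id -> Set of driver ids
    this.driverCell = new Map(); // driver id -> current cell id
  }

  // Update path: a move within the same cell is a no-op.
  update(driverId, lat, lng) {
    const next = cellOf(lat, lng);
    const prev = this.driverCell.get(driverId);
    if (prev === next) return;
    if (prev !== undefined) this.cells.get(prev).delete(driverId);
    if (!this.cells.has(next)) this.cells.set(next, new Set());
    this.cells.get(next).add(driverId);
    this.driverCell.set(driverId, next);
  }

  // Query path: scan only the cells within `radius` cells of the point.
  nearby(lat, lng, radius = 1) {
    const row = Math.floor(lat / CELL_DEG);
    const col = Math.floor(lng / CELL_DEG);
    const found = [];
    for (let r = row - radius; r <= row + radius; r++) {
      for (let c = col - radius; c <= col + radius; c++) {
        const drivers = this.cells.get(`${r}:${c}`);
        if (drivers) found.push(...drivers);
      }
    }
    return found;
  }
}

const index = new GeoIndex();
index.update('d1', 37.7749, -122.4194); // San Francisco
index.update('d2', 37.7755, -122.4180); // a few blocks away
index.update('d3', 40.7128, -74.0060);  // New York
console.log(index.nearby(37.775, -122.419)); // d1 and d2, not d3
```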
SLIDE 61
SLIDE 62
SLIDE 63
SLIDE 64
SLIDE 65
SLIDE 66

What about crash recovery?

SLIDE 67

Sevnup

  • Open-sourced Node.js module
  • Ringpop extension
  • Key ownership hand-off
  • Customizable recovery & release
  • Pluggable persistence layer
SLIDE 68

[Diagram: keys (key432, key654, key988) map to virtual nodes 1, 2, 3, 4, 5, … (1024 in total); Ringpop assigns the virtual nodes to application cluster members A, B, C, D]
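
The hand-off in this diagram can be sketched as follows, assuming each key hashes to one of 1024 virtual nodes and a ring lookup assigns each virtual node to a cluster member. `vnodeOf` and `ownershipDiff` are hypothetical helpers, not sevnup's API. When membership changes, a node compares ownership before and after: for vnodes it gained, it would recover their keys from the persistence layer; for vnodes it lost, it would release them.

```javascript
const crypto = require('crypto');

const VNODE_COUNT = 1024; // matches the diagram's 1024 virtual nodes

// Keys map to a fixed virtual node; only vnode ownership moves between hosts.
function vnodeOf(key) {
  return crypto.createHash('md5').update(key).digest().readUInt32BE(0) % VNODE_COUNT;
}

// Given ring lookups before and after a membership change, compute which
// vnodes this host gains (recover their keys) and loses (release them).
function ownershipDiff(self, lookupBefore, lookupAfter) {
  const gained = [];
  const lost = [];
  for (let v = 0; v < VNODE_COUNT; v++) {
    const before = lookupBefore(String(v)) === self;
    const after = lookupAfter(String(v)) === self;
    if (!before && after) gained.push(v);
    if (before && !after) lost.push(v);
  }
  return { gained, lost };
}

// Example: A owned every vnode; B joins and takes the odd-numbered ones.
const before = () => 'A';
const after = (v) => (Number(v) % 2 === 0 ? 'A' : 'B');
const forA = ownershipDiff('A', before, after);
const forB = ownershipDiff('B', before, after);
console.log(forA.lost.length, forB.gained.length); // 512 512
```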

SLIDE 69

[Diagram: keys (key432, key654, key988) map to virtual nodes 1, 2, 3, 4, 5, … (1024 in total); Ringpop assigns the virtual nodes to application cluster members A, B, C, D]

SLIDE 70

Reliable Timers

  • Node.js offers in-memory timers
  • Use sevnup to make them reliable
  • Riak as persistence layer
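
The idea can be sketched with a hypothetical `ReliableTimers` class: each timer's absolute deadline is written to a store before the in-memory `setTimeout` is armed, and a node that takes over a key re-arms its timer from the store. A `Map` stands in for Riak, and the hook that calls `recover` (sevnup's key hand-off, in the talk) is assumed rather than shown.

```javascript
// Hypothetical reliable timers: each deadline is persisted before the
// in-memory timer is armed, so whichever node owns the key after a crash
// can re-arm it. A Map stands in for the persistence layer (Riak in the talk).
class ReliableTimers {
  constructor(store, onFire) {
    this.store = store;       // assumed durable: key -> absolute deadline (ms)
    this.onFire = onFire;     // callback(key)
    this.handles = new Map(); // key -> in-memory timeout handle
  }

  set(key, delayMs) {
    const fireAt = Date.now() + delayMs;
    this.store.set(key, fireAt); // persist first
    this.arm(key, fireAt);
  }

  arm(key, fireAt) {
    const wait = Math.max(0, fireAt - Date.now());
    const handle = setTimeout(() => {
      this.handles.delete(key);
      this.onFire(key);       // fire first...
      this.store.delete(key); // ...then forget: at-least-once, never lost
    }, wait);
    this.handles.set(key, handle);
  }

  // Called when this node takes over keys, e.g. after another node crashed.
  recover(keys) {
    for (const key of keys) {
      const fireAt = this.store.get(key);
      if (fireAt !== undefined && !this.handles.has(key)) this.arm(key, fireAt);
    }
  }
}

// Example: node A arms a timer, then "crashes" before it fires; node B
// takes over the key and re-arms the timer from the shared store.
const store = new Map();
const fired = [];
const nodeA = new ReliableTimers(store, (key) => fired.push(`A:${key}`));
nodeA.set('trip-1', 30);
clearTimeout(nodeA.handles.get('trip-1')); // simulate A's crash
const nodeB = new ReliableTimers(store, (key) => fired.push(`B:${key}`));
nodeB.recover(['trip-1']);
setTimeout(() => console.log(fired), 100); // [ 'B:trip-1' ]
```

Firing before deleting the stored deadline means a crash between the two steps can cause a duplicate fire but never a lost one, so the callback should be idempotent.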
SLIDE 71

Data Center Failure

  • How do we replicate trip data?
  • Constant updates
  • Write-heavy
  • Temporal, and minimal loss expected

SLIDE 72

Realtime Trip Replication (RTTR)

Key insight: each driver application already has the trip data

SLIDE 73

RTTR

  • A key-value store on the phone
  • A timeseries store for partner GPS points on the phone
  • Piggyback on existing communication protocols

  • All data encrypted
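
A minimal sketch of the phone-side replication, assuming AES-256-GCM from Node's `crypto` module and a key held only by the data centers (the slides say only that all data is encrypted). `sealTrip`, `openTrip`, and `phoneStore` are hypothetical names. The server seals a trip snapshot and piggybacks it on a response; the phone keeps the opaque blob; after a data center failover, the phone sends it back and the surviving data center restores the trip.

```javascript
const crypto = require('crypto');

// Assumed: a key shared by the data centers and never sent to the phone.
const key = crypto.randomBytes(32);

// Server side: encrypt and authenticate a trip snapshot as one opaque blob
// laid out as iv (12 bytes) | auth tag (16 bytes) | ciphertext.
function sealTrip(trip) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const body = Buffer.concat([cipher.update(JSON.stringify(trip), 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString('base64');
}

// Server side, after failover: verify and decrypt a blob sent by the phone.
function openTrip(sealed) {
  const buf = Buffer.from(sealed, 'base64');
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28));
  const body = Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]);
  return JSON.parse(body.toString('utf8'));
}

// Phone side: a plain key-value store of opaque blobs.
const phoneStore = new Map();

const trip = { tripId: 't1', state: 'driver_arrived' };
phoneStore.set(trip.tripId, sealTrip(trip));        // piggybacked on a response
console.log(openTrip(phoneStore.get('t1')).state); // driver_arrived
```

Because GCM authenticates the blob, a phone cannot alter the replicated trip without `openTrip` throwing.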
SLIDE 74

Realtime Analytics

Data in Realtime

SLIDE 75

Ops need realtime analytics

SLIDE 76

Ops need realtime analytics

SLIDE 77

Ops need realtime analytics

SLIDE 78

Ops need realtime analytics

SLIDE 79

Dispatch needs data for decisions

Realtime Analytics

SLIDE 80

Dispatch needs data for decisions

Realtime Analytics

SLIDE 81

Dispatch needs data for decisions

Realtime Analytics

SLIDE 82

Dispatch needs data for decisions

SLIDE 83

Applications need real-time data

SLIDE 84

Applications need real-time data

  • Notification
SLIDE 85

Applications need real-time data

  • Notification
  • Marketing
SLIDE 86

Applications need real-time data

  • Notification
  • Marketing
  • Fraud detection
SLIDE 87

Dispatch can’t do everything

SLIDE 88

Empower

But we use data to empower people

SLIDE 89

An event-based data platform

state: driver_arrived
from_state: driver_accepted
timestamp: 13244323342
latitude: 12.23
longitude: 30.00

SLIDE 90

An event-based data platform

SLIDE 91

An event-based data platform

  • Reliable replication of states
SLIDE 92

An event-based data platform

  • Reliable replication of states
  • Canonical state representation
SLIDE 93

An event-based data platform

  • Reliable replication of states
  • Canonical state representation
  • Domain specific APIs
SLIDE 94

Reliable replication of states

SLIDE 95

Reliable replication of states

SLIDE 96

Reliable replication of states

SLIDE 97

Canonical representation of states

SLIDE 98

Canonical representation of states

  • Consistency matters

SLIDE 99

Canonical representation of states

  • Consistency matters
  • Normalize your events if possible, e.g., no PII

SLIDE 100

Canonical representation of states

  • Consistency matters
  • Normalize your events if possible, e.g., no PII
  • More generally: keep apps robust by minimizing assumptions

SLIDE 101

Canonical representation of states

  • Consistency matters
  • Normalize your events if possible, e.g., no PII
  • More generally: keep apps robust by minimizing assumptions
  • Introduce context to correlate events, e.g., trip ID, root service
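
These guidelines can be sketched as a normalizing step applied before an event is published. `canonicalize` and the `PII_FIELDS` list are hypothetical; the core field names follow the example event earlier in the deck, and the context fields stand for the trip ID and root service mentioned above.

```javascript
// Hypothetical list of PII fields that must never reach the shared stream.
const PII_FIELDS = new Set(['driverName', 'riderName', 'phoneNumber', 'email']);

// Map a raw service event to one canonical shape: typed core fields,
// no PII, and a context block that lets consumers correlate events.
function canonicalize(raw, context) {
  const event = {
    state: raw.state,
    from_state: raw.from_state,
    timestamp: Number(raw.timestamp),
    latitude: Number(raw.latitude),
    longitude: Number(raw.longitude),
    context: { tripId: context.tripId, rootService: context.rootService },
  };
  // Carry over other fields (e.g. IDs), skipping PII and canonical names.
  for (const [name, value] of Object.entries(raw)) {
    if (!(name in event) && !PII_FIELDS.has(name)) event[name] = value;
  }
  return event;
}

const raw = {
  state: 'driver_arrived', from_state: 'driver_accepted',
  timestamp: '13244323342', latitude: '12.23', longitude: '30.00',
  driverId: 'd1', driverName: 'Jane Doe',
};
const event = canonicalize(raw, { tripId: 't1', rootService: 'dispatch' });
console.log('driverName' in event, event.context.tripId); // false t1
```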

SLIDE 102

We need an event stream processor

SLIDE 103

We need an event stream processor

  • Static join
SLIDE 104

We need an event stream processor

  • Static join
  • State tracking
SLIDE 105

We need an event stream processor

  • Static join
  • State tracking
SLIDE 106

We need an event stream processor

  • Static join
  • State tracking
SLIDE 107

We need an event stream processor

  • Static join
  • State tracking
  • Aggregation
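
The three operations can be sketched in a single in-process loop. This is illustrative only: `processStream` and the `cityOfDriver` lookup table are hypothetical, and in the talk this work runs on a stream processor (Apache Storm) rather than in one function.

```javascript
// Static table joined against the stream (hypothetical data).
const cityOfDriver = new Map([['d1', 'SF'], ['d2', 'SF'], ['d3', 'NYC']]);

function processStream(events) {
  const lastState = new Map(); // state tracking: driverId -> last state
  const counts = new Map();    // aggregation: "city|transition" -> count
  for (const e of events) {
    // Static join: enrich the event with a dimension from a lookup table.
    const city = cityOfDriver.get(e.driverId) || 'unknown';
    // State tracking: remember the previous state to derive the transition.
    const prev = lastState.get(e.driverId) || 'none';
    lastState.set(e.driverId, e.state);
    // Aggregation: count transitions per city.
    const key = `${city}|${prev}->${e.state}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = processStream([
  { driverId: 'd1', state: 'dispatched' },
  { driverId: 'd1', state: 'accepted' },
  { driverId: 'd2', state: 'dispatched' },
  { driverId: 'd2', state: 'accepted' },
  { driverId: 'd3', state: 'dispatched' },
]);
console.log(counts.get('SF|dispatched->accepted')); // 2
```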
SLIDE 108

Tip: Emit events for edges instead of nodes

  • Option 1: two events
    • driver_dispatched
    • driver_rejected
SLIDE 109

Tip: Emit events for edges instead of nodes

  • Option 1: two events
    • driver_dispatched
    • driver_rejected
  • Option 2: single event
    • driver_rejected
    • parent: driver_dispatched
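
The difference between the two options shows up in the consumers. In this hypothetical sketch, counting rejections that follow a dispatch forces an option 1 consumer to keep per-trip state, whereas with option 2 it becomes a stateless filter over single events. Field names other than `driver_dispatched`, `driver_rejected`, and `parent` are assumptions.

```javascript
// Option 1: one event per state (graph node). Consumers must remember the
// previous state of each trip to know which transition happened.
const nodeEvents = [
  { type: 'driver_dispatched', tripId: 't1' },
  { type: 'driver_rejected', tripId: 't1' },
];

function rejectionsFromNodes(events) {
  const last = new Map(); // tripId -> previous event type
  let count = 0;
  for (const e of events) {
    if (e.type === 'driver_rejected' && last.get(e.tripId) === 'driver_dispatched') count++;
    last.set(e.tripId, e.type);
  }
  return count;
}

// Option 2: one event per transition (graph edge) that names its parent.
const edgeEvents = [
  { type: 'driver_rejected', parent: 'driver_dispatched', tripId: 't1' },
];

function rejectionsFromEdges(events) {
  return events.filter((e) => e.type === 'driver_rejected' && e.parent === 'driver_dispatched').length;
}

console.log(rejectionsFromNodes(nodeEvents), rejectionsFromEdges(edgeEvents)); // 1 1
```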
SLIDE 110

Apache Storm for Event Processing

SLIDE 111

Results go back to Kafka

SLIDE 112

Results go back to Kafka

SLIDE 113

Results are also indexed

SLIDE 114

Results are also indexed

SLIDE 115

Lessons Learned

SLIDE 116

Lessons Learned

  • Know your data
SLIDE 117

Lessons Learned

  • Know your data
  • Really small graph
SLIDE 118

Lessons Learned

  • Know your data
  • Really small graph
  • Lots of them over time
SLIDE 119

Lessons Learned

  • Know your data
  • Really small graph
  • Lots of them over time
  • No need to have a graph database
SLIDE 120

Lessons Learned

SLIDE 121

Lessons Learned

  • Boolean query is a must-have
SLIDE 122

Lessons Learned

  • Boolean query is a must-have
  • Pre-aggregation is nice, but keep all the dimensions for GROUP BY queries

SLIDE 123

Lessons Learned

  • Boolean query is a must-have
  • Pre-aggregation is nice, but keep all the dimensions for GROUP BY queries

  • Build a query service first
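
The first two lessons can be illustrated together with a minimal sketch over hypothetical in-memory rows: because each row keeps all of its dimensions, a boolean filter and a GROUP BY can be combined freely at query time, whereas totals pre-aggregated only by city could never be re-split by product. `and`, `or`, `not`, `eq`, and `groupCount` are hypothetical helpers.

```javascript
// Hypothetical raw rows that keep every dimension.
const rows = [
  { city: 'SF', product: 'uberX', hour: 9, accepted: true },
  { city: 'SF', product: 'uberX', hour: 9, accepted: false },
  { city: 'SF', product: 'black', hour: 10, accepted: true },
  { city: 'NYC', product: 'uberX', hour: 9, accepted: true },
];

// Boolean query: combine arbitrary predicates with AND / OR / NOT.
const and = (...ps) => (r) => ps.every((p) => p(r));
const or = (...ps) => (r) => ps.some((p) => p(r));
const not = (p) => (r) => !p(r);
const eq = (field, value) => (r) => r[field] === value;

// GROUP BY over any retained dimension, counting matching rows.
function groupCount(data, where, dim) {
  const out = {};
  for (const r of data.filter(where)) out[r[dim]] = (out[r[dim]] || 0) + 1;
  return out;
}

// Accepted requests outside NYC, broken down by product.
console.log(groupCount(rows, and(eq('accepted', true), not(eq('city', 'NYC'))), 'product'));
// { uberX: 1, black: 1 }
```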
SLIDE 124
SLIDE 125
SLIDE 126
SLIDE 127
SLIDE 128
SLIDE 129

Domain-Specific Query Service

SLIDE 130

Optimize for growth

SLIDE 131

Optimize for growth

  • Uber has been growing fast
SLIDE 132

Optimize for growth

  • Uber has been growing fast
  • We strive for fast iterations
SLIDE 133

Separation of concerns

SLIDE 134

Separation of concerns

  • Application teams care about business logic
SLIDE 135

Separation of concerns

  • Application teams care about business logic
  • Someone has to worry about optimization, caching, indexing, scaling, and so on

SLIDE 136
SLIDE 137

/driverAcceptanceRate?geo_dist(10,[37,22])&time_range(2015-02-04,2015-03-06)&aggregate(timeseries(7d))&eq(msg.driverId,1)
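
The query above reads as a path naming the metric, followed by `&`-separated function-call filters. A hypothetical parser for that shape might look like this; the grammar is inferred from this single example, not from a published specification of Uber's query service.

```javascript
// Split `s` on `sep`, ignoring separators nested inside () or [].
function splitTopLevel(s, sep) {
  const parts = [];
  let depth = 0;
  let start = 0;
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (ch === '(' || ch === '[') depth++;
    else if (ch === ')' || ch === ']') depth--;
    else if (ch === sep && depth === 0) {
      parts.push(s.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(s.slice(start));
  return parts.filter((p) => p.length > 0);
}

// Turn "/metric?fn(args)&fn(args)" into { metric, terms: [{ fn, args }] }.
function parseQuery(url) {
  const [path, query = ''] = url.split('?');
  const terms = splitTopLevel(query, '&').map((term) => {
    const m = term.match(/^(\w+)\((.*)\)$/);
    if (!m) throw new Error(`bad term: ${term}`);
    return { fn: m[1], args: splitTopLevel(m[2], ',') };
  });
  return { metric: path.replace(/^\//, ''), terms };
}

const q = parseQuery('/driverAcceptanceRate?geo_dist(10,[37,22])&time_range(2015-02-04,2015-03-06)&aggregate(timeseries(7d))&eq(msg.driverId,1)');
console.log(q.metric, q.terms.map((t) => t.fn).join(','));
// driverAcceptanceRate geo_dist,time_range,aggregate,eq
```

A query service built this way lets application teams state what they want, while the service owners decide how to index, cache, and execute each term.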

SLIDE 138

[Chart: axis labeled “Time in seconds”]

SLIDE 139
SLIDE 140
SLIDE 141
SLIDE 142
SLIDE 143
SLIDE 144

Questions?