SLIDE 1 The Architecture of Uber's Realtime System
Amos Barreto @amos_barreto Danny Yuan @g9yuayon March 25, 2015
SLIDE 2
What is Uber
SLIDE 3
Uber is a transportation platform
SLIDE 4
Uber connects riders to drivers
SLIDE 5
Transportation at your fingertips
SLIDE 6
SLIDE 7
SLIDE 8
What is Realtime?
SLIDE 9
It’s the brain of Uber’s logistics platform
SLIDE 10
Driver Riders
It assigns drivers to riders
SLIDE 11
It balances driver & rider satisfaction
SLIDE 12
Sounds pretty simple, right?
SLIDE 13 Realtime Analytics
Not Really
SLIDE 14 Realtime Analytics
Not Really
SLIDE 15 Realtime Analytics
Not Really
SLIDE 16 Realtime Analytics
Not Really
SLIDE 17
We didn’t start with this
SLIDE 18
Instead…
SLIDE 19 The Beginning
- PHP application
- Transactions appended to flat files
- Half of the code in Español
- Lifespan: 6-9 months
SLIDE 20
But we had to evolve
SLIDE 21
So we built a distributed state machine
SLIDE 22
We tried to follow good practices
SLIDE 23
Simulator is very helpful
SLIDE 24
Extensive instrumentation
SLIDE 25
Asynchronous State Machines
SLIDE 26 Scaling Horizontally
W
SLIDE 27 Scaling Horizontally
W W W W W
SLIDE 28 Stateless Workers
W W W W W
SLIDE 29 Stateless Workers
W W W W W
HAProxy
SLIDE 30 Shared States
W W W W W
HAProxy
SLIDE 31 Shared States
W W W W W
HAProxy Redis with Twemproxy
SLIDE 32 Everything can go down
W W W W W
HAProxy Redis with Twemproxy
SLIDE 33 Everything can go down
W W W W W
HAProxy Redis with Twemproxy Riak
SLIDE 34
There’s a tougher problem to solve
SLIDE 35
- Trip states happen in partial order
- Rider and driver states may need
to be synchronized
Dispatch needs coordination
SLIDE 36 Coordination is hard
“Minimizing coordination, or blocking communication between concurrently executing operations, is key to maximizing scalability, availability, and high performance in database systems” Coordination Avoidance in Database Systems
SLIDE 37
- Row lock? Distributed lock?
Consensus protocols?
- Insight: What we really need is
ordered execution
- Solution: Assign requests with the
same user-defined key to the same stateless worker node
Coordination is hard
SLIDE 38 What about load balance?
W W W W W
SLIDE 39 What about load balance?
W W W W W
HAProxy
SLIDE 40 What about load balance?
W W W W W
HAProxy
SLIDE 41 What about load balance?
W W W W W
HAProxy
SLIDE 42
What about load balance?
Consistent Hashing to the rescue
SLIDE 43
What about load balance?
Consistent Hashing to the rescue
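A minimal sketch of consistent hashing, assuming an MD5-based hash and 100 ring points per worker. The `HashRing` class, its method names, and the worker addresses are hypothetical and are not Ringpop's API. It illustrates why a membership change only moves the keys owned by the worker that left.

```javascript
const crypto = require('crypto');

// Hash a string to an unsigned 32-bit integer (first 4 bytes of its MD5 digest).
function hash32(value) {
  return crypto.createHash('md5').update(value).digest().readUInt32BE(0);
}

// Hypothetical ring, for illustration only.
class HashRing {
  constructor(replicas = 100) {
    this.replicas = replicas; // ring points per worker (assumed value)
    this.points = [];         // sorted array of { hash, node }
  }

  addNode(node) {
    for (let i = 0; i < this.replicas; i++) {
      this.points.push({ hash: hash32(node + '#' + i), node });
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  removeNode(node) {
    this.points = this.points.filter((p) => p.node !== node);
  }

  // Owner of a key: the first ring point at or after the key's hash,
  // wrapping around to the first point on the ring.
  lookup(key) {
    if (this.points.length === 0) return null;
    const h = hash32(key);
    let lo = 0;
    let hi = this.points.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.points[mid].hash < h) lo = mid + 1;
      else hi = mid;
    }
    return this.points[lo % this.points.length].node;
  }
}
```

With this layout, requests carrying the same key always resolve to the same worker while membership is stable, and removing a worker leaves every other worker's keys where they were.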
SLIDE 44 Membership changes
- A worker in a cluster can crash
- A worker can join a cluster
- We need fast and reliable failure
detection and membership updates
SLIDE 45 Key Insights
- Separate failure detection from
membership updates
- Do not rely on a single peer for
failure detection
- Membership changes via gossip-like
protocols
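A rough sketch of these insights, assuming a simulated network. `probe`, `mergeUpdate`, and the `canReach` callback are hypothetical names; the merge rule is a simplified version of the precedence rules used by SWIM-style protocols, not Ringpop's implementation. Failure detection (probing) and membership dissemination (merging gossiped updates) are kept as separate functions.

```javascript
// Failure detection: ping directly, then ask up to k other members to ping
// on our behalf, so a single bad link does not mark a healthy peer as failed.
// `canReach(from, to)` stands in for a real network ping (assumption).
function probe(self, target, members, k, canReach) {
  if (canReach(self, target)) return 'alive';
  // A real protocol picks helpers at random; the first k keep this deterministic.
  const helpers = members.filter((m) => m !== self && m !== target).slice(0, k);
  const confirmed = helpers.some((h) => canReach(self, h) && canReach(h, target));
  return confirmed ? 'alive' : 'suspect';
}

const RANK = { alive: 0, suspect: 1, faulty: 2 };

// Membership update: merge one gossiped update into the local view.
// A higher incarnation number wins; at equal incarnation the more
// pessimistic status wins; a faulty entry is never overwritten here.
function mergeUpdate(view, update) {
  const current = view.get(update.node);
  const wins =
    !current ||
    (current.status !== 'faulty' &&
      (update.status === 'faulty' ||
        update.incarnation > current.incarnation ||
        (update.incarnation === current.incarnation &&
          RANK[update.status] > RANK[current.status])));
  if (wins) {
    view.set(update.node, { status: update.status, incarnation: update.incarnation });
  }
  return wins; // true means the update is news and is worth gossiping further
}
```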
SLIDE 46
SWIM: Scalable Weakly-consistent Infection-style Process Group Membership Protocol
SLIDE 47 Ringpop
- Open source
- Hash ring abstraction
- Implements a variation of SWIM
- Flap damping
- Use checksums to verify correctness of
ring state
SLIDE 48 Ringpop
var ringpop = new RingPop({
    app: 'myapp',
    hostPort: 'myhost:30000'
});
ringpop.bootstrap(['myhost:30001', 'myhost2:30000']);
ringpop.on('ready', function() {
    // do something
});
var node = ringpop.lookup('[unique-request-id]');
if (node === ringpop.whoami()) {
    // process request
} else {
    // forward request
}
SLIDE 49 Ringpop Serial
- Simple ringpop wrapper
- Requests are queued by key
- Processed serially, one at a time
- Emulates transactions
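A minimal sketch of per-key serial processing, assuming callback-style tasks. `KeyedSerialQueue` is a hypothetical class and not the Ringpop Serial API. Tasks with the same key run one at a time in arrival order, while tasks for different keys do not wait on each other.

```javascript
// Hypothetical per-key queue, for illustration only.
class KeyedSerialQueue {
  constructor() {
    this.queues = new Map(); // key -> pending tasks; the head is the running task
  }

  // `task` is function(done) and must call done() exactly once when finished.
  enqueue(key, task) {
    const queue = this.queues.get(key);
    if (queue) {
      queue.push(task); // a task for this key is already running, so wait in line
      return;
    }
    this.queues.set(key, [task]);
    this._runNext(key);
  }

  _runNext(key) {
    const queue = this.queues.get(key);
    if (queue.length === 0) {
      this.queues.delete(key); // queue drained, forget the key
      return;
    }
    const task = queue[0];
    task(() => {
      queue.shift();
      this._runNext(key);
    });
  }
}
```

Because each key has a single running task at a time on its owning worker, a read-modify-write sequence for one trip cannot interleave with another for the same trip, which is the sense in which this emulates transactions.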
SLIDE 50
SLIDE 51
SLIDE 52
SLIDE 53
SLIDE 54
SLIDE 55
SLIDE 56
Transactions
SLIDE 57 Transactions
- Conflicts are possible during
membership changes
SLIDE 58 Transactions
- Conflicts are possible during
membership changes
- Need smart application level conflict
resolution
SLIDE 59
Global Geospatial Index
SLIDE 60 Global Geospatial Index
- High volume of location updates
- Mild, but expensive, query volume
- Large search space (the world)
SLIDE 61
SLIDE 62
SLIDE 63
SLIDE 64
SLIDE 65
SLIDE 66
What about crash recovery?
SLIDE 67 Sevnup
- Open sourced node.js module
- Ringpop extension
- Key ownership hand-off
- Customizable recovery & release
- Pluggable persistence layer
SLIDE 68 A B C D 1 2 3 4 5 … Virtual Nodes (1024) Ringpop Application Cluster key432 key988 Keys key654
SLIDE 69 A B C D 1 2 3 4 5 … Virtual Nodes (1024) Ringpop Application Cluster key432 key988 Keys key654
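One way to read the diagram, sketched under assumptions: keys hash into a fixed set of 1024 virtual nodes, each virtual node has one owning worker, and a worker that gains a virtual node recovers the keys recorded under it. All names below (`vnodeFor`, `assignVnodes`, `handOff`, `VnodeStore`, `keysToRecover`) are hypothetical and are not the sevnup API. The round-robin assignment is a stand-in for asking the hash ring who owns a virtual node.

```javascript
const crypto = require('crypto');

const VNODE_COUNT = 1024; // number of virtual nodes shown on the slide

// Map a key to its virtual node (MD5-based hash is an assumption).
function vnodeFor(key) {
  const h = crypto.createHash('md5').update(key).digest().readUInt32BE(0);
  return h % VNODE_COUNT;
}

// Stand-in ownership table: index is the vnode, value is the owning worker.
function assignVnodes(workers) {
  return Array.from({ length: VNODE_COUNT }, (_, v) => workers[v % workers.length]);
}

// After a crash, only the crashed worker's vnodes change owner.
function handOff(owners, crashed, survivors) {
  return owners.map((owner, v) =>
    owner === crashed ? survivors[v % survivors.length] : owner);
}

// Stand-in for the persistence layer: remembers which keys live in each vnode.
class VnodeStore {
  constructor() {
    this.keysByVnode = new Map();
  }
  add(key) {
    const v = vnodeFor(key);
    if (!this.keysByVnode.has(v)) this.keysByVnode.set(v, new Set());
    this.keysByVnode.get(v).add(key);
  }
  keysIn(vnode) {
    return [...(this.keysByVnode.get(vnode) || [])];
  }
}

// Keys a worker must recover: everything stored under vnodes it newly owns.
function keysToRecover(worker, oldOwners, newOwners, store) {
  const recovered = [];
  for (let v = 0; v < VNODE_COUNT; v++) {
    if (newOwners[v] === worker && oldOwners[v] !== worker) {
      recovered.push(...store.keysIn(v));
    }
  }
  return recovered;
}
```

Tracking ownership per virtual node rather than per key keeps the bookkeeping bounded: a worker scans at most 1024 entries to find what it inherited, however many keys exist.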
SLIDE 70 Reliable Timers
- Node.js offers in-memory timers
- Use sevnup to make them reliable
- Riak as persistence layer
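A sketch of the idea, assuming timers are written to a durable store before they are acknowledged. `DurableTimers` is a hypothetical class, a `Map` stands in for Riak, and the current time is passed in explicitly so the logic can be exercised without waiting.

```javascript
// Hypothetical durable timer table, for illustration only.
class DurableTimers {
  constructor(store) {
    // timerId -> { fireAt, payload }; assumed to survive a process crash
    this.store = store;
  }

  // Persist the timer so a replacement worker can still fire it.
  schedule(id, fireAt, payload) {
    this.store.set(id, { fireAt, payload });
  }

  cancel(id) {
    this.store.delete(id);
  }

  // Fire every timer due at `now`. Intended to be called periodically and
  // once right after a worker takes over ownership from a crashed peer.
  fireDue(now, onFire) {
    for (const [id, timer] of [...this.store]) {
      if (timer.fireAt <= now) {
        onFire(id, timer.payload);
        this.store.delete(id); // deleted only after the callback ran: at-least-once
      }
    }
  }
}
```

In a Node.js process, `fireDue(Date.now(), handler)` could be driven by `setInterval`, so the in-memory timer only triggers the check while the persisted record decides what fires.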
SLIDE 71 Data Center Failure
- How do we replicate trip data?
- Constant updates
- Write heavy
- Temporal, and minimal loss expected
SLIDE 72
Key insight: each driver application has trip data already
Realtime Trip Replication (RTTR)
SLIDE 73 RTTR
- A key-value store on the phone
- A timeseries store for partner GPS
points on the phone
- Communication protocols
SLIDE 74 Realtime Analytics
Data in Realtime
SLIDE 75
Ops need realtime analytics
SLIDE 76
Ops need realtime analytics
SLIDE 77
Ops need realtime analytics
SLIDE 78
Ops need realtime analytics
SLIDE 79 Dispatch needs data for decisions
Realtime Analytics
SLIDE 80 Dispatch needs data for decisions
Realtime Analytics
SLIDE 81 Dispatch needs data for decisions
Realtime Analytics
SLIDE 82
Dispatch needs data for decisions
SLIDE 83
Applications need real-time data
SLIDE 84 Applications need real-time data
SLIDE 85 Applications need real-time data
SLIDE 86 Applications need real-time data
- Notification
- Marketing
- Fraud detection
SLIDE 87
Dispatch can’t do everything
SLIDE 88
Empower
But we use data to empower people
SLIDE 89 An event-based data platform
state: driver_arrived
from_state: driver_accepted
timestamp: 13244323342
latitude: 12.23
longitude: 30.00
SLIDE 90
An event-based data platform
SLIDE 91 An event-based data platform
- Reliable replication of states
SLIDE 92 An event-based data platform
- Reliable replication of states
- Canonical state representation
SLIDE 93 An event-based data platform
- Reliable replication of states
- Canonical state representation
- Domain specific APIs
SLIDE 94
Reliable replication of states
SLIDE 95
Reliable replication of states
SLIDE 96
Reliable replication of states
SLIDE 97
Canonical representation of states
SLIDE 98
Canonical representation of states
SLIDE 99
- Consistency matters
- Normalize your events if possible. E.g.,
no PII
Canonical representation of states
SLIDE 100
- Consistency matters
- Normalize your events if possible. E.g.,
no PII
- More generally: keep apps robust by
minimizing assumptions
Canonical representation of states
SLIDE 101
- Consistency matters
- Normalize your events if possible. E.g.,
no PII
- More generally: keep apps robust by
minimizing assumptions
- Introduce context to correlate events.
E.g., trip ID, root service
Canonical representation of states
SLIDE 102
We need an event stream processor
SLIDE 103 We need an event stream processor
SLIDE 104 Canonical representation of states
- Static join
- State tracking
SLIDE 105 Canonical representation of states
- Static join
- State tracking
SLIDE 106 Canonical representation of states
- Static join
- State tracking
SLIDE 107 Canonical representation of states
- Static join
- State tracking
- Aggregation
SLIDE 108 Tip: Emit events for edges instead of nodes
- Option 1: two events
- driver_dispatched
- driver_rejected
SLIDE 109 Tip: Emit events for edges instead of nodes
- Option 1: two events
- driver_dispatched
- driver_rejected
- Option 2: single event
- driver_rejected
- parent: driver_dispatched
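A small sketch of option 2, assuming each event carries the state it came from in a `parent` field as on the slide. The helper names and the `tripId` and `timestamp` fields are hypothetical. Because every event describes a full edge, a consumer can count transitions without tracking per-trip state or joining two events.

```javascript
// Build an edge event: the new state plus the state it transitioned from.
function makeEdgeEvent(tripId, fromState, toState, timestamp) {
  return { tripId, state: toState, parent: fromState, timestamp };
}

// Count transitions per edge in a single stateless pass over the stream.
function countTransitions(events) {
  const counts = new Map();
  for (const e of events) {
    const edge = e.parent + '->' + e.state;
    counts.set(edge, (counts.get(edge) || 0) + 1);
  }
  return counts;
}
```

With option 1 the same count would need the processor to remember the last state seen for each trip and to cope with events arriving out of order.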
SLIDE 110
Apache Storm for Event Processing
SLIDE 111
Results go back to Kafka
SLIDE 112
Results go back to Kafka
SLIDE 113
Results are also indexed
SLIDE 114
Results are also indexed
SLIDE 115
Lessons Learned
SLIDE 116 Lessons Learned
SLIDE 117 Lessons Learned
- Know your data
- Really small graph
SLIDE 118 Lessons Learned
- Know your data
- Really small graph
- Lots of them over time
SLIDE 119 Lessons Learned
- Know your data
- Really small graph
- Lots of them over time
- No need to have a graph database
SLIDE 120
Lessons Learned
SLIDE 121 Lessons Learned
- Boolean query is a must have
SLIDE 122 Lessons Learned
- Boolean query is a must have
- Pre-aggregation is nice, but keep all
the dimensions for GROUP BY queries
SLIDE 123 Lessons Learned
- Boolean query is a must have
- Pre-aggregation is nice, but keep all
the dimensions for GROUP BY queries
- Build a query service first
SLIDE 124
SLIDE 125
SLIDE 126
SLIDE 127
SLIDE 128
SLIDE 129
Domain Specific Query Service
SLIDE 130
Optimize for growth
SLIDE 131 Optimize for growth
- Uber has been growing fast
SLIDE 132 Optimize for growth
- Uber has been growing fast
- We strive for fast iterations
SLIDE 133
Separation of concerns
SLIDE 134 Separation of concerns
- Application teams care about business logic
SLIDE 135 Separation of concerns
- Application teams care about business logic
- Someone has to worry about optimization,
caching, indexing, scaling, etc.
SLIDE 136
SLIDE 137
/driverAcceptanceRate?
geo_dist(10, [37, 22])&
time_range(2015-02-04,2015-03-06)&
aggregate(timeseries(7d))&
eq(msg.driverId,1)
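To illustrate how a query of this shape could be handled on the service side, here is a hedged sketch that splits it into a metric name and a list of clauses. `parseQuery` and the output field names are hypothetical; the deck does not describe the real service's parser, and clause arguments are left as raw strings rather than interpreted.

```javascript
// Split '/metric?fn(args)&fn(args)...' into { metric, clauses }.
function parseQuery(url) {
  const [path, query = ''] = url.split('?');
  const clauses = query
    .split('&')
    .filter(Boolean)
    .map((clause) => {
      // A clause is a function name followed by a parenthesized argument string.
      const match = /^(\w+)\((.*)\)$/.exec(clause.trim());
      if (!match) throw new Error('Malformed clause: ' + clause);
      return { fn: match[1], args: match[2] };
    });
  return { metric: path.replace(/^\//, ''), clauses };
}
```

Keeping the query language this narrow lets the service decide on caching, indexing, and storage layout without application teams having to change their requests.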
SLIDE 138 Time in seconds
SLIDE 139
SLIDE 140
SLIDE 141
SLIDE 142
SLIDE 143
SLIDE 144
Questions?