Scaling Uber with Node.js, Amos Barreto (@amos_barreto)


SLIDE 1

Scaling Uber with Node.js

Amos Barreto @amos_barreto

SLIDE 2
SLIDE 3

REQUEST

Tap to select location

RIDE

Sit back and relax, tell your driver your destination

RATE

Help us maintain a quality service by rating your experience

Uber is everyone’s Private driver.

SLIDE 4

YOUR DRIVERS

SLIDE 5

Your Drivers

UBER QUALIFIED

Uber only partners with drivers who have a keen eye for customer service and a passion for the trade.

RIDER RATED

Tell us what you think. Your feedback helps us work with drivers to constantly improve the Uber experience.

LICENSED & INSURED

From insurance to background checks, every driver meets or beats local regulations.

SLIDE 6

LOGISTICS

SLIDE 7

#OMGUBERICECREAM

SLIDE 8

#OMGUBERCHOPPER

UberChopper

SLIDE 9

#UBERVALENTINES

SLIDE 10

#ICANHASUBERKITTENS

SLIDE 11

Trip State Machine (Simplified)

Request -> Dispatch -> Accept -> Arrive -> Begin -> End

SLIDE 12

Trip State Machine (Extended)

Request -> Dispatch (1) -> Expire / Reject -> Dispatch (2) -> Accept -> Arrive -> Begin -> End
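
One way to read the extended diagram is as a transition table: a dispatch that expires or is rejected leads to a second dispatch, and only an accepted dispatch continues to Arrive, Begin, and End. The sketch below encodes that reading in plain JavaScript. The state names come from the slide; the table layout and the function names `nextState` and `runTrip` are illustrative assumptions, not Uber's code.

```javascript
// Allowed transitions, as one reading of the extended diagram.
// State names are from the slide; everything else is hypothetical.
const TRANSITIONS = {
  Request: ['Dispatch'],
  Dispatch: ['Accept', 'Expire', 'Reject'],
  Expire: ['Dispatch'], // offer expired: dispatch again
  Reject: ['Dispatch'], // driver declined: dispatch again
  Accept: ['Arrive'],
  Arrive: ['Begin'],
  Begin: ['End'],
  End: [],
};

// Return the target state if the move is legal, otherwise throw.
function nextState(current, target) {
  const allowed = TRANSITIONS[current];
  if (!allowed) {
    throw new Error(`Unknown state: ${current}`);
  }
  if (!allowed.includes(target)) {
    throw new Error(`Illegal transition: ${current} -> ${target}`);
  }
  return target;
}

// Replay a list of target states starting from Request.
function runTrip(targets) {
  return targets.reduce((state, target) => nextState(state, target), 'Request');
}
```

For example, `runTrip(['Dispatch', 'Reject', 'Dispatch', 'Accept', 'Arrive', 'Begin', 'End'])` walks a trip whose first dispatch was rejected, while `nextState('Request', 'Begin')` throws because a trip cannot begin before it is dispatched and accepted.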

SLIDE 13

OUR STORY

SLIDE 14

Version 1

PHP Cron

  • PHP dispatch
  • Outsourced to remote contractors in Midwest
  • Half the code in Spanish
  • Flat file
  • Lifetime: 6-9 months

SLIDE 15

SLIDE 16

“I read an article on HackerNews about a new framework called Node.js”


  • Jason Roberts

SLIDE 17

Tradeoffs

  • Learning curve
  • Scalability
  • Performance
  • Library ecosystem
  • Database drivers
  • Documentation
  • Monitoring
  • Production operations

SLIDE 18

Version 2

Node.js

  • Lifetime: 9 months
  • Developed in house
  • Node.js application
  • Prototyped on 0.2
  • Launched in production with 0.4
  • MongoDB datastore

SLIDE 19

“I really don’t see dispatch changing much in the next three years”

SLIDE 20

Expect the unexpected

SLIDE 21

Version 3

  • Mongo did not scale with volume of GPS logs (global write lock)
  • Swapped Mongo for Redis and flat files

[Diagram labels: SF, NYC, SEA, CHI, CN, CN, CN, CN]

SLIDE 22

Decoupling storage of different types of data
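
As a rough illustration of keeping high-volume GPS logs out of the main datastore, the sketch below appends each GPS fix as one line of JSON to a per-day flat file. The deck only says GPS logs moved to flat files; the file naming, the record fields (`driverId`, `lat`, `lng`, `ts`), and the function names are assumptions made for this example.

```javascript
const fs = require('fs');
const path = require('path');

// Append one GPS fix per line (newline-delimited JSON).
// File naming and record fields are illustrative assumptions.
function appendGpsFix(dir, fix) {
  const day = new Date(fix.ts).toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const file = path.join(dir, `gps-${day}.log`);
  // appendFileSync keeps the sketch short; a long-running service would
  // more likely use a write stream so logging does not block the event loop.
  fs.appendFileSync(file, JSON.stringify(fix) + '\n');
  return file;
}

// Read a log file back into an array of fixes.
function readGpsFixes(file) {
  return fs
    .readFileSync(file, 'utf8')
    .split('\n')
    .filter(Boolean) // drop the empty string after the final newline
    .map((line) => JSON.parse(line));
}
```

Append-only files like this take write pressure off the database, at the cost of having to parse the files for any later querying.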

SLIDE 23

Version 3 (continued)

  • Node.js Mongo client failed to recognize replica set topology changes

[Diagram labels: SF, NYC, SEA, CHI, CN, CN, CN, CN]

SLIDE 24

Be wary of immature client libraries

SLIDE 25

Commits to client modules over time

SLIDE 26

Version 3 (continued)

[Diagram labels: SF, NYC, SEA, CHI, BOS, PAR]

SLIDE 27

Focus on driving business value

SLIDE 28
SLIDE 29
SLIDE 30

Capacity planning, forecasting, and load testing are your friends

SLIDE 31

Measure everything

SLIDE 32

Version 4

  • Nickname: The Grid
  • Multi-process dispatch
  • Peer assignment
  • Redis is now considered the source of truth
  • Use Lua interpreter for atomic operations
  • Fan out to all city peers to find nearby cars

[Diagram labels: SF, NYC, SEA, CHI, CN, CN, CN, SF, SF, NYC, CHI, CHI]
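
The fan-out step can be pictured as querying every peer of a city in parallel and merging the answers. The sketch below models each peer as an async function that returns cars with a distance. That shape, the `distanceKm` field, and the choice to ignore peers that fail are all assumptions for illustration; the deck does not describe the actual protocol.

```javascript
// Each peer is modeled as an async function:
//   (location) => [{ id, distanceKm }, ...]
// In a real system this would be a network call; the shape is hypothetical.
async function findNearbyCars(peers, location, limit) {
  // Query all peers in parallel; a failed peer does not fail the whole lookup.
  const settled = await Promise.allSettled(peers.map((peer) => peer(location)));
  const perPeer = settled
    .filter((result) => result.status === 'fulfilled')
    .map((result) => result.value);
  return mergeNearest(perPeer, limit);
}

// Flatten the per-peer lists and keep the `limit` closest cars.
function mergeNearest(perPeer, limit) {
  return perPeer
    .flat()
    .sort((a, b) => a.distanceKm - b.distanceKm)
    .slice(0, limit);
}
```

Because every lookup touches every peer, the cost of one query grows with the number of peers in the city.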

SLIDE 33

-- clientHash, driverHash, clientAssignmentHash, countKey, clientToken and
-- driverPeerId are not defined on the slide; presumably they are set from
-- KEYS/ARGV earlier in the script.
local clientStatus = redis.call('hget', clientHash, 'status')
local driverStatus = redis.call('hget', driverHash, 'status')
-- Dispatch only if the client is waiting and the driver is free.
if clientStatus == 'WaitingForPickup' and driverStatus == 'Open' then
  local clientPeerId = redis.call('hget', clientAssignmentHash, clientToken)
  redis.call('hset', driverHash, 'status', 'DispatchPending')
  redis.call('hset', clientAssignmentHash, clientToken, driverPeerId)
  -- Move the client's assignment count from its old peer to the driver's peer.
  if clientPeerId then
    redis.call('zincrby', countKey, -1, clientPeerId)
  end
  redis.call('zincrby', countKey, 1, driverPeerId)
  return redis.status_reply('SUCCESS')
else
  return redis.error_reply('ERROR - clientStatus: '..tostring(clientStatus)..', driverStatus: '..tostring(driverStatus))
end

SLIDE 34
SLIDE 35
SLIDE 36

[Slides 34-36 repeat the script from slide 33.]

SLIDE 37

Version 4 (continued)

[Diagram labels: SF1, SF2, SF3; SEA1, SEA2, SEA3, SEA4; NY1, NY2, NY3, NY4; CHI1, CHI2, CHI3; BOS1, BOS2; PAR1]

SLIDE 38
SLIDE 39

Version 5

[Diagram labels: SF, repeated 9 times]

SLIDE 40

Version 5

[Chart labels: max # of loc queries, # of nodes]

SLIDE 41

Version 5

[Diagram labels: SF, NYC, SEA, CHI, CN, CN, CN, SF, SF, NYC, CHI, CHI, ncar, ncar, ncar, ncar]

SLIDE 42

Break out services as needed

SLIDE 43

Understand v8 to optimize Node.js applications

SLIDE 44
SLIDE 45
SLIDE 46

[Diagram labels: SF1, SF2, SF3; SEA1, SEA2, SEA3, SEA4; NY1, NY2, NY3, NY4; CHI1, CHI2, CHI3; BOS1, BOS2; PAR1]

SLIDE 47

Don’t take vacation ;)

SLIDE 48

Don’t live in Chicago!

SLIDE 49

  • Stateless applications…
  • No single points of failure…
  • Replicated data stores…
  • Dynamic application topology…

SLIDE 50

Version 6

[Diagram labels: SF1, SF2, SF3; SEA1, SEA2, SEA3, SEA4; NY1, NY2, NY3, NY4; CHI1, CHI2, CHI3; BOS1, BOS2; PAR1; Grid Manager, Grid Manager, Grid Manager]

SLIDE 51

Version 7

haproxy

SLIDE 52

Do the obvious

SLIDE 53
SLIDE 54

Pros

  • every application is horizontally scalable
  • flexible, partially dynamic topology
  • failure recovery is manual in the worst case
  • supports primary business case very well
  • conservative estimate: 1-2 years of runway

SLIDE 55

Never be satisfied

SLIDE 56

Cons

  • what happens when a city outscales the capacity of a single Redis instance?
  • who wants to wake up in the middle of the night for server crashes?
  • what about future business use cases?

SLIDE 57

#WORLDCLASS

SLIDE 58

World Class

  • city agnostic dispatch application
  • “stateless” applications
  • scale to 100x current load
  • flexible data model

SLIDE 59

Every now and then it’s okay to bend the rules

SLIDE 60

Realtime Analytics

SLIDE 61

Realtime Analytics

SLIDE 62

So why did we stick with Node.js?

  • JavaScript is easy to learn
  • Simple interface with thorough documentation
  • Lends itself to fast prototyping
  • Asynchronous, nimble
  • Avoid concurrency challenges
  • Increasingly mature module ecosystem

SLIDE 63

How to win with Node.js?

  • measure everything - particularly response times and event loop lag

  • learn to take heap dumps to debug memory issues

  • strace, perf, flame graphs are necessary tools for improving performance
  • small, reusable components to reduce duplication
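
Event loop lag, the first item above, can be approximated by scheduling a repeating timer and recording how late each tick fires. The sketch below is a minimal version of that idea. The function names and the callback-based reporting are assumptions, and a real setup would forward the samples to whatever metrics system is in use.

```javascript
// Lag is how much later than scheduled a timer actually fired.
function lagMs(expectedAt, firedAt) {
  return Math.max(0, firedAt - expectedAt);
}

// Sample event loop lag every `intervalMs` and report it via `onSample`.
// Returns a function that stops the sampler.
function startLagSampler(intervalMs, onSample) {
  let expectedAt = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    onSample(lagMs(expectedAt, now));
    expectedAt = now + intervalMs;
  }, intervalMs);
  timer.unref(); // monitoring alone should not keep the process alive
  return () => clearInterval(timer);
}
```

A call such as `startLagSampler(1000, (ms) => console.log('lag', ms))` prints one sample per second; sustained high values suggest that synchronous work is blocking the loop. Recent Node.js releases also ship `perf_hooks.monitorEventLoopDelay()`, which collects the same kind of measurement as a histogram.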

SLIDE 64

The Human Factor

SLIDE 65

Thank you. Questions?