Taking Storage for a Ride
René W. Schmidt, Storage Platform
NOVEMBER 4, 2016
About me
Uber for almost 3 years, working on scaling out our storage infrastructure across the planet.
VMware for 10+ years. Part of the team that released VirtualCenter 1.0, and many vSphere releases since then.
Sun Microsystems for 4 years. Part of the team that shipped the Java HotSpot Virtual Machine 1.0 and Java Web Start 1.0.
Backend
Marketplace: all ongoing trips
Operational Data: billing, payouts, user accounts, trip histories, fraud, etc.
Data Warehouse
We were there
Latency, Scalability, Availability, Development Agility
As of early 2014
The Schemaless Storage System
Operational Infrastructure
Service Layer: Trips Service, Billing Service, Ratings Service, …
Developer Abstraction: datastores (Trips, Ratings, Receipts, …)
Self-Service, Self-Healing Storage System: Schemaless instances
Datacenter: regions with zones US-West, US-East, CN-East, CN-West
More than 80% of Uber's operational data is in Schemaless.
From a single datastore (trips) to 300+ datastores.
From 48 MySQL hosts to many thousands of MySQL instances.
(make developers efficient and happy)
(qps, capacity, $/GB, trust in operation)
(4 9’s, zero-downtime operations, hide failures)
Trips Datastore
1000+ services (most are stateless). Each service can request its own storage, with redundancy.
Tracking of real-life interactions
UUID | BASE | ROUTE | FARE | RATING
12AB | { json dict } | { json dict } | { json dict } |
F4CD | { json dict } | { json dict } | { json dict …, ts: 0 }, { json dict …, ts: 2 }, … | { json dict }
A007 | { json dict } | { json dict } | { json dict } |

put_cell(uuid, column_key, ts, data)
get_cell(uuid, column_key, ts)
get_cell_latest(uuid, column_key)

Simple, proven, schemaless data model. Append-only: each cell can only be written once.
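To make the three calls concrete, here is a minimal in-memory sketch. It assumes `ts` is an integer version and `data` is a dict; the class `AppendOnlyCellStore` is hypothetical and only illustrates the write-once rule, not how Schemaless actually stores cells.

```python
from typing import Dict, Optional, Tuple


class AppendOnlyCellStore:
    """Hypothetical in-memory model of put_cell / get_cell / get_cell_latest."""

    def __init__(self) -> None:
        # (uuid, column_key, ts) -> data; each key may be written only once
        self._cells: Dict[Tuple[str, str, int], dict] = {}

    def put_cell(self, uuid: str, column_key: str, ts: int, data: dict) -> None:
        key = (uuid, column_key, ts)
        if key in self._cells:
            # Append-only: an existing cell is never overwritten; write a new ts instead.
            raise ValueError(f"cell {key} was already written")
        self._cells[key] = dict(data)

    def get_cell(self, uuid: str, column_key: str, ts: int) -> Optional[dict]:
        return self._cells.get((uuid, column_key, ts))

    def get_cell_latest(self, uuid: str, column_key: str) -> Optional[dict]:
        # Linear scan for clarity; a real store would use an ordered index on (row, col, ts).
        versions = [ts for (u, c, ts) in self._cells if u == uuid and c == column_key]
        if not versions:
            return None
        return self._cells[(uuid, column_key, max(versions))]
```

For example, writing FARE for F4CD at ts 0 and again at ts 2 makes `get_cell_latest("F4CD", "FARE")` return the ts 2 version, while a second write to ts 2 raises `ValueError`.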
Distribution Layer
Logical Model -> Sharding Function (% 4096) -> Fixed set of 4096 Shards -> Expandable set of MySQL Clusters
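A sketch of the two-level mapping, assuming CRC32 as the hash (the slide only shows `% 4096`) and a round-robin initial placement of shards on clusters; `shard_for`, `initial_shard_map` and `cluster_for` are hypothetical names.

```python
import zlib
from typing import Dict, List

NUM_SHARDS = 4096  # fixed, per the "% 4096" sharding function


def shard_for(row_key: str) -> int:
    # Assumption: CRC32 as the hash function; any stable hash would do.
    return zlib.crc32(row_key.encode("utf-8")) % NUM_SHARDS


def initial_shard_map(clusters: List[str]) -> Dict[int, str]:
    # Spread the fixed shards round-robin over the current MySQL clusters.
    return {shard: clusters[shard % len(clusters)] for shard in range(NUM_SHARDS)}


def cluster_for(row_key: str, shard_map: Dict[int, str]) -> str:
    # Row -> shard is fixed forever; shard -> cluster is a lookup that can change.
    return shard_map[shard_for(row_key)]
```

The design point is that a row's shard never changes; only the shard-to-cluster map does, so growing the MySQL fleet means moving whole shards rather than rehashing rows.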
Index
Secondary indexes are:
- Defined on columns
- Scalable: partitioned across shards
- Fast to query: a query only needs to read a single shard
- Can be added / removed dynamically

Index Definition:
INDEX: name: CLIENT_INDEX column: BASE fields:

Cells written to the BASE column (cells and index cells are spread across Shard 0, Shard 1, Shard 2, Shard 3):
put_cell(100, 'BASE', { client_id: 10, driver_id: 437, fare: 10 })
put_cell(121, 'BASE', { client_id: 10, driver_id: 217, fare: 15 })
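One way the single-shard query property can hold is to place each index cell by the value of the indexed field rather than by the row UUID. The sketch below assumes that, and indexes only `client_id` from the BASE column; which fields CLIENT_INDEX actually lists is not recoverable from the slide, and `ShardedIndex` is hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_SHARDS = 4096


def shard_for(key: str) -> int:
    # Assumption: same stable hash as for rows, applied to the indexed value.
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


class ShardedIndex:
    """Hypothetical CLIENT_INDEX-style index on the BASE column."""

    def __init__(self, field_name: str) -> None:
        self.field_name = field_name
        # shard -> list of (indexed value, row uuid) index cells
        self.shards: Dict[int, List[Tuple[str, str]]] = defaultdict(list)

    def on_put_cell(self, uuid: str, data: dict) -> None:
        value = str(data[self.field_name])
        # Index cells are placed by the indexed value, so all entries
        # for one client_id end up on the same shard.
        self.shards[shard_for(value)].append((value, uuid))

    def query(self, value) -> List[str]:
        value = str(value)
        entries = self.shards.get(shard_for(value), [])  # reads exactly one shard
        return [uuid for v, uuid in entries if v == value]
```

With the two put_cell calls above (rows 100 and 121, both with client_id 10), `query(10)` reads one shard and returns both row UUIDs, even though the BASE cells themselves may live on different shards.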
Each shard (Shard 0, Shard 1, Shard 2, Shard 3) is internally organized as an ordered log (an append-only datastore), with a B-tree index for (row, col, ts) lookups and efficient scanning for changes over time.
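A sketch of a single shard under these assumptions: a Python list is the append-only log, and a sorted list maintained with `bisect` stands in for the B-tree over (row, col, ts). `ShardLog` and `scan_since` are hypothetical names.

```python
import bisect
from typing import List, Optional, Tuple

Entry = Tuple[str, str, int, dict]  # (row, col, ts, data)


class ShardLog:
    """Hypothetical shard: ordered append-only log plus a sorted (row, col, ts) index."""

    def __init__(self) -> None:
        self.log: List[Entry] = []                      # append-only, in write order
        self.index: List[Tuple[str, str, int, int]] = []  # sorted (row, col, ts, offset)

    def append(self, row: str, col: str, ts: int, data: dict) -> int:
        offset = len(self.log)
        self.log.append((row, col, ts, data))
        bisect.insort(self.index, (row, col, ts, offset))  # B-tree stand-in
        return offset

    def lookup(self, row: str, col: str, ts: int) -> Optional[dict]:
        # Offsets are >= 0, so -1 sorts before every real entry for this key.
        i = bisect.bisect_left(self.index, (row, col, ts, -1))
        if i < len(self.index) and self.index[i][:3] == (row, col, ts):
            return self.log[self.index[i][3]][3]
        return None

    def scan_since(self, offset: int) -> List[Tuple[int, Entry]]:
        # Changes over time are simply the log suffix after a consumer's last offset.
        return list(enumerate(self.log[offset:], start=offset))
```

A consumer that remembers the last offset it processed can call `scan_since(offset)` to see only new writes, which is the kind of scan that write-triggered processing relies on.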
(BASE, ROUTE) -> FARE
(BASE, FARE) -> CLIENT_BILLING
(BASE, FARE) -> DRIVER_PAYOUT
All backend processing is triggered by data being written:
- Functional programming paradigm
- Robust in case of failures
- Eliminates out-of-band message queues
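The three dependencies can be sketched as pure functions that fire once all their input columns exist for a row. The function bodies and field names (`base_fare`, `per_mile`, `miles`) are invented for illustration, and a real implementation would consume the shard's change log rather than an in-memory dict.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical trigger functions mirroring the slide's dependencies.
def compute_fare(base: dict, route: dict) -> dict:
    return {"amount": base["base_fare"] + route["miles"] * base["per_mile"]}


def bill_client(base: dict, fare: dict) -> dict:
    return {"client_id": base["client_id"], "charge": fare["amount"]}


def pay_driver(base: dict, fare: dict) -> dict:
    return {"driver_id": base["driver_id"], "payout": fare["amount"]}


TRIGGERS: List[Tuple[Tuple[str, ...], str, Callable[..., dict]]] = [
    (("BASE", "ROUTE"), "FARE", compute_fare),
    (("BASE", "FARE"), "CLIENT_BILLING", bill_client),
    (("BASE", "FARE"), "DRIVER_PAYOUT", pay_driver),
]


def run_triggers(row: Dict[str, dict]) -> Dict[str, dict]:
    """Fire every trigger whose inputs exist and whose output is missing, until nothing changes."""
    row = dict(row)
    changed = True
    while changed:
        changed = False
        for inputs, output, fn in TRIGGERS:
            if output not in row and all(col in row for col in inputs):
                # Output depends only on input cells, so re-running after a crash is safe.
                row[output] = fn(*(row[col] for col in inputs))
                changed = True
    return row
```

Because each output is computed only from input cells that are never overwritten, re-running a trigger after a crash yields the same result, which is one reason this style can stand in for out-of-band message queues.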
Each MySQL cluster has a master and slaves; the Distribution Layer sends put_cell to the master.

Slave Failure: replace the slave.

Master Failure: Hinted Handoff. Two options for writes while the master is down:
1) Fail the write (fine for batch)
2) Buffer the write and retry later
A slave is promoted to master.
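A sketch of the two write options during a master failure. `Cluster`, `Writer` and `MasterDown` are hypothetical; the buffered writes play the role of hints that are replayed once a slave has been promoted.

```python
from collections import deque
from typing import Deque, List, Tuple

Cell = Tuple[str, str, int, dict]  # (uuid, column_key, ts, data)


class MasterDown(Exception):
    pass


class Cluster:
    """Hypothetical stand-in for one MySQL cluster's current master."""

    def __init__(self) -> None:
        self.master_up = True
        self.rows: List[Cell] = []

    def write(self, cell: Cell) -> None:
        if not self.master_up:
            raise MasterDown()
        self.rows.append(cell)


class Writer:
    def __init__(self, cluster: Cluster, buffer_on_failure: bool) -> None:
        self.cluster = cluster
        # False: option 1 (fail the write); True: option 2 (buffer and retry)
        self.buffer_on_failure = buffer_on_failure
        self.hints: Deque[Cell] = deque()

    def put_cell(self, uuid: str, column_key: str, ts: int, data: dict) -> bool:
        cell = (uuid, column_key, ts, data)
        try:
            self.cluster.write(cell)
            return True
        except MasterDown:
            if not self.buffer_on_failure:
                raise  # option 1: surface the failure; batch callers can retry
            self.hints.append(cell)  # option 2: keep a hint for later
            return False

    def replay_hints(self) -> int:
        """Retry buffered writes, e.g. after a slave was promoted to master."""
        sent = 0
        while self.hints:
            self.cluster.write(self.hints[0])  # write first, so a failure keeps the hint
            self.hints.popleft()
            sent += 1
        return sent
```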
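Database Cluster Splits
1. One cluster holds shards 1, 2, 3, 4.
2. A second cluster holds a full copy, so both clusters hold shards 1, 2, 3, 4.
3. After the split, one cluster serves shards 1 and 3, and the other serves shards 2 and 4.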
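The split can be expressed as an update to the shard map, assuming the new cluster already holds a full copy of the old one (for instance because it was added as a slave). `split_cluster` is a hypothetical helper; the alternating choice matches the 1, 3 / 2, 4 layout of the split.

```python
from typing import Dict


def split_cluster(shard_map: Dict[int, str], old: str, new: str) -> Dict[int, str]:
    """Return a new shard map in which every second shard of `old` moves to `new`.

    Assumes `new` already holds a full copy of `old`, so the switch is a
    metadata change; the now-unused copies can be dropped afterwards.
    """
    owned = sorted(shard for shard, cluster in shard_map.items() if cluster == old)
    moved = set(owned[1::2])  # with shards 1-4: keep 1 and 3, move 2 and 4
    return {shard: (new if shard in moved else cluster) for shard, cluster in shard_map.items()}
```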
How fast can we add a MySQL slave? How fast can a slave DB be promoted to a master DB?
The network is not infinitely fast
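A back-of-the-envelope check on how fast a slave can be added: copying a dataset takes at least its size divided by the usable bandwidth. The function and the figures in the usage note are hypothetical.

```python
def copy_time_hours(dataset_gb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Lower bound on the time to stream a dataset to a new slave.

    dataset_gb uses decimal gigabytes (1 GB = 8e9 bits); efficiency < 1.0
    models protocol overhead or sharing the link with other traffic.
    """
    bits = dataset_gb * 8e9
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / 3600
```

For a hypothetical 1000 GB dataset, `copy_time_hours(1000, 1)` is about 2.2 hours on a 1 Gbit/s link and `copy_time_hours(1000, 10)` about 0.22 hours at 10 Gbit/s, before any overhead.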
Operational Infrastructure
Drift, Drift, and Drift
“Pets” World View | “Cattle” World View
The desired state is in your head | The desired state is codified
Driven by you | Driven by software
Making changes is cool | Making changes is a non-event
Performing operations directly on hosts | Changing the model
Runbooks | Autonomous
Brittle | Robust
Operation-oriented | Goal-state-oriented
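Goal-state system: Operator Input defines the Goal State; the system evaluates the Actual State against it, and the difference is the Current Drift, which the system closes by taking an Action.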
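A minimal reconcile loop in the goal-state style, assuming state is a flat dict of key to desired value and that an injected `act` callback performs the actual change; `drift` and `reconcile` are hypothetical names.

```python
from typing import Callable, Dict

State = Dict[str, str]


def drift(goal: State, actual: State) -> State:
    """Keys whose actual value differs from the goal, mapped to the goal value."""
    return {key: value for key, value in goal.items() if actual.get(key) != value}


def reconcile(goal: State, actual: State,
              act: Callable[[str, str, State], None], max_rounds: int = 10) -> State:
    """Evaluate drift, act on it, and re-evaluate; returns whatever drift remains."""
    for _ in range(max_rounds):
        current = drift(goal, actual)
        if not current:
            break
        for key, wanted in current.items():
            act(key, wanted, actual)  # an action may fix drift only partially
    return drift(goal, actual)
```

The operator only edits `goal`; the loop, not a runbook, decides which actions to take, which is the "changing the model" side of the table above.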
Docker Host
The Opsless-Agent pulls the goal state and updates the actual state:

DB | Goal Role | Goal Sync_From | Actual Role | Read-only | Issues
foo-db1 | master | | | no | none
bar-db2 | slave | bar-db1 | slave | yes | none
baz-db10 | slave | baz-db9 | slave | yes | none
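How an agent might fill in one row of the table: compare the pulled goal with what it observes on the local database. `Goal`, `Actual` and `agent_step` are hypothetical, and only the `Wrong role` issue from the slides is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    role: str
    sync_from: Optional[str] = None


@dataclass
class Actual:
    role: str
    read_only: bool


def issues(goal: Goal, actual: Actual) -> str:
    """Issues column: report a role mismatch between goal and observed state."""
    return "Wrong role" if goal.role != actual.role else "none"


def agent_step(db: str, goal: Goal, local: Actual) -> dict:
    """One pass of a hypothetical agent: pulled goal + observed DB -> actual-state row."""
    return {
        "db": db,
        "goal_role": goal.role,
        "sync_from": goal.sync_from or "",
        "actual_role": local.role,
        "read_only": "yes" if local.read_only else "no",
        "issues": issues(goal, local),
    }
```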
Docker Host 1 and Docker Host 2, each running an Opsless-Agent:

DB | Host | Goal Role | Goal Sync_From | Actual Role | Read-only | Issues
foo-db1 | Host A | master | | | no | none
foo-db2 | Host B | slave | foo-db1 | slave | yes | none

Host 1 agent: Goalstate: { foo-db1: { role: master } }
Host 2 agent: Goalstate: { foo-db2: { role: slave, … } }
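The goal state changes (foo-db1 to idle, foo-db2 to master); both DBs now report Wrong role, while the agents still hold their old goal states:

DB | Host | Goal Role | Goal Sync_From | Actual Role | Read-only | Issues
foo-db1 | Host A | idle | | | no | Wrong role
foo-db2 | Host B | master | | | yes | Wrong role

Host 1 agent: Goalstate: { foo-db1: { role: master } }
Host 2 agent: Goalstate: { foo-db2: { role: slave, … } }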
The agent on Host 2 pulls its new goal state (role: master):

DB | Host | Goal Role | Goal Sync_From | Actual Role | Read-only | Issues
foo-db1 | Host A | idle | | | no | Wrong role
foo-db2 | Host B | master | | | yes | Wrong role

Host 1 agent: Goalstate: { foo-db1: { role: master } }
Host 2 agent: Goalstate: { foo-db2: { role: master, … } }

Before promoting, it waits for the replication lag to go to zero and for the master to be read-only.
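The wait condition can be sketched as a guard before promotion. `ReplicaStatus`, `can_promote` and `wait_and_promote` are hypothetical; `poll` and `promote` stand in for whatever actually queries MySQL and flips the role.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReplicaStatus:
    replication_lag_s: float    # how far the slave is behind the old master
    old_master_read_only: bool  # True once the old master rejects writes


def can_promote(status: ReplicaStatus) -> bool:
    # Promoting earlier could drop writes not yet replicated,
    # or leave two writable masters at the same time.
    return status.old_master_read_only and status.replication_lag_s == 0


def wait_and_promote(poll: Callable[[], ReplicaStatus], promote: Callable[[], None],
                     max_polls: int = 100, interval_s: float = 0.0) -> bool:
    for _ in range(max_polls):
        if can_promote(poll()):
            promote()
            return True
        time.sleep(interval_s)  # back off between status checks
    return False
```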
Both agents hold the new goal state; foo-db1 is read-only, foo-db2 accepts writes, and no issues remain:

DB | Host | Goal Role | Goal Sync_From | Actual Role | Read-only | Issues
foo-db1 | Host A | idle | | | yes | none
foo-db2 | Host B | master | | | no | none

Host 1 agent: Goalstate: { foo-db1: { role: idle } }
Host 2 agent: Goalstate: { foo-db2: { role: master, … } }
Raft Consensus Protocol
Operational Infrastructure