Scaling Dropbox
Preslav Le, November 7th, 2016
Figure: block.dropbox.com with Zone (west), Zone (east) and Zone (central).
Fear of the unknown
MEMORY LEAK
SYNCHRONIZATION EVENT
Success story
TODAY’S TALK
PRESLAV LE
Infrastructure Performance, Traffic team
FILE, SYNC & SHARE
500 MILLION USERS
2012
Figure: 2012 architecture: clients, load balancers (LB), nginx, metaserver, blockserver, notification server, async processing, MySQL databases (DB), Memcached and S3, split between AWS and Dropbox’s datacenters.
BLOCK DATA IN S3
Figure: the same architecture, with AWS highlighted.
METADATA IN MYSQL
Figure: the same architecture, with Dropbox’s datacenters highlighted.
1. FETCH METADATA
Figure: highlighted path through clients, LB, metaserver, Memcached and DB.
2. DOWNLOAD BLOCKS
Figure: highlighted path through clients, LB, blockserver and S3.
3. WAIT FOR NOTIFICATIONS
Figure: highlighted: clients, notification server and metaserver.
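The three steps above can be sketched as a client loop. This is a minimal illustration under stated assumptions, not Dropbox's client: `Metaserver`, `fetch_metadata` and `sync_loop` are hypothetical, a plain dict of blocks stands in for blockserver, and a `queue.Queue` stands in for the notification server.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class Metaserver:
    """Hypothetical stand-in: file path -> ordered list of block hashes."""
    files: dict = field(default_factory=dict)
    version: int = 0

    def fetch_metadata(self, since_version):
        # Step 1: return (version, file list), or (version, None) if unchanged.
        if since_version == self.version:
            return self.version, None
        return self.version, dict(self.files)


def sync_loop(meta, blocks, notifications, local_files, rounds, wait_s=0.01):
    """Run `rounds` iterations of fetch metadata -> download blocks -> wait."""
    version = -1  # the client has never synced
    for _ in range(rounds):
        version, files = meta.fetch_metadata(version)
        if files is not None:
            for path, hashes in files.items():
                # Step 2: download each block (here a dict lookup) and reassemble the file.
                local_files[path] = b"".join(blocks[h] for h in hashes)
        try:
            # Step 3: wait for the (hypothetical) notification server to report a change.
            notifications.get(timeout=wait_s)
        except queue.Empty:
            pass  # nothing announced; fetch metadata again on the next round
    return version
```

The notification step is what lets clients avoid polling metaserver constantly: they only go back to step 1 when told something changed (or when the wait times out).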
PYTHON EVERYWHERE
Figure: metaserver in Dropbox’s datacenters split into clusters: meta-client, meta-web, meta-api and meta-mobile.
CLUSTER ISOLATION
Agenda: Scaling Databases, Scaling as Organization, Scaling Software, Managing Complexity
SCALING DATABASES
Figure: metaserver and Memcached in front of a single MySQL master with two replicas, then shard0, shard1 … shardN, each a master with two replicas.
HORIZONTAL SCALING
Figure: metaservers in front of shard0, shard1 … shardN, each a master with two replicas.
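One common way to spread rows over horizontally scaled shards is to hash the key into a fixed number of logical shards and map contiguous ranges of those onto physical master/replica sets. This sketch assumes sharding by user ID and uses hypothetical names (`NUM_LOGICAL_SHARDS`, `logical_shard`, `physical_shard`); the slides do not say which scheme Dropbox used.

```python
import hashlib

NUM_LOGICAL_SHARDS = 1024  # hypothetical; fixed so a key never changes logical shard


def logical_shard(user_id: int) -> int:
    """Hash the key so consecutive IDs spread evenly over the logical shards."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


def physical_shard(user_id: int, num_physical: int) -> int:
    """Map a contiguous range of logical shards onto each physical shard (0..num_physical-1)."""
    return logical_shard(user_id) * num_physical // NUM_LOGICAL_SHARDS
```

Because each physical shard owns a contiguous range of logical shards, doubling the number of physical shards splits every shard in two: `physical_shard(uid, 4) // 2 == physical_shard(uid, 2)` for any `uid`.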
CONNECTIONS
Figure: the same metaservers and shards, showing the connections between them.
SQL PROXY
Figure: metaservers reaching the shards through a tier of SQL Proxy instances.
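The connection problem is multiplicative. As a rough model (an assumption, not a description of Dropbox's setup), if every worker process on every metaserver host holds its own connection to each MySQL host, each database sees hosts × workers connections; a SQL proxy tier replaces that with proxies × pool size. All fleet sizes below are hypothetical.

```python
def direct_connections_per_db_host(metaserver_hosts: int, workers_per_host: int) -> int:
    # Hypothetical model: each worker process keeps one connection to every DB host.
    return metaserver_hosts * workers_per_host


def proxied_connections_per_db_host(proxies: int, pool_per_db_host: int) -> int:
    # With a SQL proxy tier, only the proxies talk to MySQL, each through a bounded pool.
    return proxies * pool_per_db_host


# Hypothetical fleet sizes, chosen only to show the scale of the difference.
direct = direct_connections_per_db_host(metaserver_hosts=1000, workers_per_host=50)
proxied = proxied_connections_per_db_host(proxies=20, pool_per_db_host=100)
print(direct, proxied)  # 50000 2000
```

In this model, adding metaserver hosts grows `direct` linearly, while `proxied` stays fixed until the proxy tier itself is resized.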
Scaling as Organization
GLOBAL DATABASE
AVAILABILITY ISSUES
PLAYBOOK
1. Check for ongoing deployments or newly enabled features
2. Check for recently started background jobs
3. DBA oncall, please help!
Dropbox grew from 100 to 500 employees
SCALABLE METADATA STORE DESIGNED FOR MULTI-TENANCY
2013 — Present
SHARDING AND CACHING BEHIND THE SCENES
ENTITIES AND ASSOCIATIONS
FIRST GO SERVICE
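The idea of a store that exposes entities and associations while keeping sharding and caching behind the scenes can be illustrated with a toy in-memory class. The names (`EntityStore`, `put_entity`, `add_association`, and so on), the `id % num_shards` placement and the invalidate-on-write cache are hypothetical choices for this sketch, not the API of the store described on the slides.

```python
from collections import defaultdict


class EntityStore:
    """Toy store: callers use entities and associations; shards and cache stay hidden.

    Hypothetical sketch: placement is id % num_shards and the cache is a plain dict.
    """

    def __init__(self, num_shards: int):
        self._entities = [dict() for _ in range(num_shards)]
        self._assocs = [defaultdict(list) for _ in range(num_shards)]
        self._cache = {}

    def _shard(self, entity_id: int) -> int:
        return entity_id % len(self._entities)

    def put_entity(self, entity_id: int, attrs: dict) -> None:
        self._entities[self._shard(entity_id)][entity_id] = dict(attrs)
        self._cache.pop(entity_id, None)  # invalidate so the next read reloads from the shard

    def get_entity(self, entity_id: int):
        if entity_id not in self._cache:
            value = self._entities[self._shard(entity_id)].get(entity_id)
            if value is None:
                return None
            self._cache[entity_id] = value
        return dict(self._cache[entity_id])  # copy so callers cannot mutate the cache

    def add_association(self, src: int, assoc_type: str, dst: int) -> None:
        # Stored on the source entity's shard, so listing src's edges touches one shard.
        self._assocs[self._shard(src)][(src, assoc_type)].append(dst)

    def list_associations(self, src: int, assoc_type: str) -> list:
        return list(self._assocs[self._shard(src)].get((src, assoc_type), []))
```

Product code only ever calls the four public methods, so the shard count and caching policy can change without touching callers.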
Scaling Software
PERFECT STORM
SHARDING
PHOTO ALBUMS
TEAM ADMIN CONSOLE
REQUEST FANOUT
Figure: a single request fanning out.
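Request fanout is easy to see when one request needs data for many users, as a team admin console listing all members might. This sketch groups the needed user IDs by shard, giving one sub-request per shard touched; `plan_fanout` and the `user_id % num_shards` placement are hypothetical.

```python
from collections import defaultdict


def plan_fanout(user_ids, num_shards):
    """Group the users a request needs by shard: one sub-request per shard touched."""
    by_shard = defaultdict(list)
    for uid in user_ids:
        by_shard[uid % num_shards].append(uid)  # hypothetical placement: user_id mod N
    return dict(by_shard)
```

The request is only as fast as its slowest sub-request, and a single unavailable shard affects the whole request, so the number of shards a request touches matters as much as the total amount of data.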
GLOBAL ID
Colocation ID (8 bytes) | Counter (8 bytes)
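The global ID layout (an 8-byte colocation ID followed by an 8-byte counter) can be sketched with `struct`. Big-endian byte order and choosing the shard from the colocation ID alone (`colocation_id % num_shards`) are assumptions for illustration; the point is that IDs sharing a colocation ID land on the same shard.

```python
import struct


def make_global_id(colocation_id: int, counter: int) -> bytes:
    # 16 bytes: 8-byte colocation ID, then 8-byte counter (big-endian is an assumption).
    return struct.pack(">QQ", colocation_id, counter)


def parse_global_id(gid: bytes) -> tuple:
    # Inverse of make_global_id: returns (colocation_id, counter).
    return struct.unpack(">QQ", gid)


def shard_of(gid: bytes, num_shards: int) -> int:
    # Hypothetical placement: only the colocation ID chooses the shard.
    colocation_id, _counter = parse_global_id(gid)
    return colocation_id % num_shards


album_id = 12345  # hypothetical colocation ID shared by every item in one album
ids = [make_global_id(album_id, counter) for counter in range(10)]
print(len(ids[0]), {shard_of(g, 64) for g in ids})  # 16 {57}
```

All ten IDs resolve to one shard, so reading the whole group is a single-shard query instead of a fanout.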
Lack of colocation also hurts performance
NEW SERVICE: FILE JOURNAL
Figure: metaservers calling the new File Journal service, which sits in front of shard0, shard1 … shardN.
SHARD FAILURE
Figure: the same diagram; shard1 master fails.
SHARDING (PART II)
LONG TIMEOUTS
Figure: the same diagram, with shard1 master highlighted.
RUN OUT OF WORKERS
Figure: the same diagram, with shard1 master and the File Journal instances highlighted.
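Why long timeouts lead to running out of workers can be estimated with Little's law, under the simplifying assumption of a steady arrival rate. If requests that hit the failed shard arrive at rate λ and each one holds a worker for the full timeout T, then on average λT workers are blocked, so a pool of W workers is exhausted once λT reaches W:

```latex
\[
  \bar{N}_{\text{blocked}} = \lambda \, T,
  \qquad
  \text{workers exhausted when } \lambda \, T \ge W
  \iff \lambda \ge \frac{W}{T}.
\]
```

With hypothetical numbers W = 1,000 workers and T = 10 s, about 100 requests per second to the failed shard are enough to block every worker, including those that would have served healthy shards; a 1 s timeout raises that threshold to 1,000 requests per second.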
CASCADING FAILURE
Figure: the same diagram, with shard1 master, the File Journal instances and all metaservers highlighted.
Limit resources dedicated to processing a single shard
SHARD ISOLATION
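Limiting the resources dedicated to a single shard can be as simple as a per-shard cap on in-flight requests that fails fast once reached. `ShardLimiter`, `ShardBusy` and `max_per_shard` are hypothetical; a real service would also have to choose the cap and decide what error to return to callers.

```python
import threading
from collections import defaultdict


class ShardBusy(Exception):
    """Raised when a shard already has max_per_shard requests in flight."""


class ShardLimiter:
    """Cap concurrent requests per shard so one slow shard cannot absorb every worker."""

    def __init__(self, max_per_shard: int):
        self._max = max_per_shard
        self._in_flight = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, shard: int) -> bool:
        with self._lock:
            if self._in_flight[shard] >= self._max:
                return False  # fail fast instead of queueing behind a sick shard
            self._in_flight[shard] += 1
            return True

    def release(self, shard: int) -> None:
        with self._lock:
            self._in_flight[shard] -= 1

    def call(self, shard: int, fn):
        """Run fn() under the shard's limit, always releasing the slot afterwards."""
        if not self.try_acquire(shard):
            raise ShardBusy(f"shard {shard} is over its concurrency limit")
        try:
            return fn()
        finally:
            self.release(shard)
```

When shard1 hangs, only `max_per_shard` workers can be stuck on it; requests for other shards still find free workers.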
Managing Complexity
500PB+ user block data
3+ geographic regions
500+ million users
MAGIC POCKET BLOCK STORAGE SYSTEM
Figure: put and get requests to Zone (west), Zone (east) and Zone (central).
simple
complicated!
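The put/get interface in the Magic Pocket figure is the simple part. As a minimal sketch, assuming blocks are keyed by the SHA-256 of their contents (an assumption here), the interface reduces to two calls; the multi-zone machinery the slides call complicated would sit behind them. `BlockStore` is hypothetical and keeps everything in one dict.

```python
import hashlib


class BlockStore:
    """Hypothetical put/get interface: blocks keyed by the SHA-256 of their contents."""

    def __init__(self):
        self._blocks = {}  # stands in for all the storage behind the interface

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data  # idempotent: identical contents map to the same key
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]
```

Keeping the interface this narrow is what lets the storage side change its internals without touching callers.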
PYTHON, GO & RUST
2016
Figure: 2016 architecture: meta-client, meta-web, meta-api and meta-mobile clusters; File Journal, Search, Auth service, Block Routing, Edgestore, Presence & Notifications, Cape, blockserver, Magic Pocket, Blockservice, Riviera and Thumbnail service.
HOW TO PREVENT CASCADING FAILURE?
Figure: the same 2016 architecture, with Search highlighted.
BANDAID: PER-ROUTE ISOLATION
QUEUE PRIORITIZATION
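Queue prioritization can be sketched as a priority queue in front of the workers: requests for more important routes are served first, and requests with equal priority stay in arrival order. The route names and priority numbers are hypothetical, and this is not Bandaid's implementation.

```python
import heapq
import itertools


class PriorityRequestQueue:
    """Serve higher-priority routes first; FIFO within the same priority."""

    def __init__(self, route_priority: dict, default_priority: int = 10):
        self._route_priority = route_priority  # lower number = served first (hypothetical)
        self._default = default_priority
        self._heap = []
        # Monotonic tie-breaker: keeps FIFO order and means requests are never compared.
        self._seq = itertools.count()

    def push(self, route: str, request) -> None:
        prio = self._route_priority.get(route, self._default)
        heapq.heappush(self._heap, (prio, next(self._seq), route, request))

    def pop(self):
        _prio, _seq, route, request = heapq.heappop(self._heap)
        return route, request

    def __len__(self) -> int:
        return len(self._heap)


q = PriorityRequestQueue({"/sync": 0, "/thumbnail": 5})
for route, req in [("/thumbnail", "t1"), ("/sync", "s1"), ("/other", "o1"), ("/sync", "s2")]:
    q.push(route, req)
```

Under overload, low-priority routes wait (or can be shed) while critical routes keep being served.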
Partition & Isolate (data or services)
cluster isolation: Metaserver
data model isolation: Edgestore
shard isolation: File Journal
region isolation: Magic Pocket
route isolation: Bandaid
Isolation