SLIDE 1
Distributed Systems
Maciej Łopatka
Facebook Inbox Search
Authors: Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik
Facebook code dump
Community
Transfer to Apache Software Foundation
An Apache top
SLIDE 2
SLIDE 3
BigTable data model
An Amazon Dynamo-like infrastructure
SLIDE 4
Distributed multidimensional map indexed by a key
Four or five dimensions
Key
Value
Timestamp
Data
SLIDE 5
Keyspace → Column Family
Column Family → Column Family Row
Column Family Row → Columns
Column → Data value
SLIDE 6
Keyspace → Super Column Family
Super Column Family → Super Column Family Row
Super Column Family Row → Columns Row
Column Row → Columns
Column → Data value
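The nesting on the two data model slides can be pictured as maps inside maps. The following is a minimal sketch, not the real storage engine: the helper names `insert` and `get` are hypothetical, and resolving conflicts by keeping the newest timestamp is an assumption of this sketch. A super column family would add one more dictionary level between the row and its columns.

```python
import time

# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {}

def insert(keyspace, column_family, row_key, column, value, timestamp=None):
    """Store a column value; the write with the newest timestamp is kept."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace.setdefault(column_family, {}).setdefault(row_key, {})
    current = row.get(column)
    if current is None or ts >= current[1]:
        row[column] = (value, ts)

def get(keyspace, column_family, row_key, column):
    """Return only the data value of a column, without its timestamp."""
    return keyspace[column_family][row_key][column][0]

insert(keyspace, "Users", "user42", "name", "Alice", timestamp=1)
insert(keyspace, "Users", "user42", "name", "Alicia", timestamp=2)
insert(keyspace, "Users", "user42", "name", "stale", timestamp=0)  # ignored, older
print(get(keyspace, "Users", "user42", "name"))
```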
SLIDE 7
Replication
Log file
Bootstrapping
- Partitioning
- Consistent Hashing
- Periodic Data Compaction
Gossip
Anti-Entropy data sync (uses Merkle tree)
Write and Read Quorum
W + R > N
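The quorum condition W + R > N guarantees that every read set shares at least one replica with every write set, so a read always reaches a replica that saw the latest write. As a small illustration (the function name `quorums_overlap` is hypothetical), the sketch below checks this by brute force for a small cluster:

```python
from itertools import combinations

def quorums_overlap(n, w, r):
    """True if every set of w replicas intersects every set of r replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

# Compare the brute-force check with the rule W + R > N for N = 3.
n = 3
for w in range(1, n + 1):
    for r in range(1, n + 1):
        print(f"N={n} W={w} R={r} overlap={quorums_overlap(n, w, r)} rule={w + r > n}")
```

Lowering W or R below the threshold trades this guarantee for lower latency, which is what "tunable consistency" refers to.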
SLIDE 8
SLIDE 9
RandomPartitioner
OrderPreservingPartitioner
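The two partitioners differ in how a row key becomes a token on the consistent hashing ring. The sketch below is a simplified illustration, not Cassandra's code: the `Ring` class and both token functions are hypothetical names, the random variant hashes the key with MD5, and the order-preserving variant uses the key itself as the token. Each key is assigned to the first node whose token is greater than or equal to the key's token, wrapping around at the end of the ring.

```python
import bisect
import hashlib

def random_token(key):
    """MD5-based token: spreads keys roughly uniformly, loses key order."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

def order_preserving_token(key):
    """The key is its own token: keeps key order, allows range scans."""
    return key

class Ring:
    def __init__(self, token_fn):
        self.token_fn = token_fn
        self.tokens = []   # node tokens, kept sorted
        self.nodes = {}    # node token -> node name

    def add_node(self, name, token):
        bisect.insort(self.tokens, token)
        self.nodes[token] = name

    def owner(self, key):
        """First node clockwise from the key's token, wrapping around."""
        token = self.token_fn(key)
        i = bisect.bisect_left(self.tokens, token)
        return self.nodes[self.tokens[i % len(self.tokens)]]

ordered = Ring(order_preserving_token)
for name, token in [("A", "g"), ("B", "n"), ("C", "z")]:
    ordered.add_node(name, token)
print([ordered.owner(k) for k in ("apple", "kiwi", "pear", "zebra")])

hashed = Ring(random_token)
for i, name in enumerate("ABC"):
    hashed.add_node(name, i * 2**128 // 3)
print(hashed.owner("user42"))
```

With the order-preserving ring, adjacent keys land on the same node, which helps range queries but can create hot spots; hashing trades that locality for a more even load.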
SLIDE 10
SLIDE 11
Terabytes of data
Replaced MySQL
Detecting failures in 15 seconds
ZooKeeper used to locate nodes
Replaced by HBase
SLIDE 12
50+TB of data on a 150 node cluster, east and west coast data centers
Term search
UserId -> Word -> MessageId Columns
Interaction search
UserId -> Recipient UserId -> MessageId Columns
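The two index layouts above can be sketched as nested maps keyed first by user, then by word or by the other participant, with message ids as the innermost entries. This is an illustrative sketch only: the function names are hypothetical, the whitespace tokenizer is a placeholder, and indexing each message under both sender and recipient is an assumption of this sketch.

```python
from collections import defaultdict

# term_index[user_id][word] -> set of message ids
term_index = defaultdict(lambda: defaultdict(set))
# interaction_index[user_id][other_user_id] -> set of message ids
interaction_index = defaultdict(lambda: defaultdict(set))

def index_message(message_id, sender_id, recipient_id, text):
    """Add one message to both indexes, for both participants."""
    pairs = ((sender_id, recipient_id), (recipient_id, sender_id))
    for user_id, other_id in pairs:
        for word in text.lower().split():
            term_index[user_id][word].add(message_id)
        interaction_index[user_id][other_id].add(message_id)

def term_search(user_id, word):
    """Messages in this user's inbox that contain the word."""
    return sorted(term_index[user_id][word.lower()])

def interaction_search(user_id, other_id):
    """Messages exchanged between this user and another user."""
    return sorted(interaction_index[user_id][other_id])

index_message("m1", 1, 2, "Lunch tomorrow")
index_message("m2", 1, 3, "lunch today")
print(term_search(1, "lunch"))
print(interaction_search(1, 3))
```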
Latency Stat    Search Interactions    Term Search
Min             7.69ms                 7.78ms
Median          15.69ms                18.27ms
Max             26.13ms                44.41ms
Tab. Read performance
SLIDE 13
SLIDE 14
SLIDE 15
SLIDE 16
SLIDE 17
SLIDE 18
Workload A, update heavy (50 percent reads and 50 percent updates): (a) read operations, (b) update operations. Six server-class machines (dual 64-bit quad-core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6-disk RAID-10 array, and gigabit Ethernet)
SLIDE 19
Workload B, read heavy (95 percent reads and 5 percent updates): (a) read operations, (b) update operations. Six server-class machines (dual 64-bit quad-core 2.5 GHz Intel Xeon CPUs, 8 GB of RAM, 6-disk RAID-10 array, and gigabit Ethernet)
SLIDE 20
Designed to run on cheap commodity hardware
Handle high write throughput while not sacrificing read efficiency
Decentralized
Elasticity
Fault-tolerant
Tunable consistency
SLIDE 21