Reducing Replication Bandwidth for Distributed Document Databases
Lianghong Xu (1), Andy Pavlo (1), Sudipta Sengupta (2), Jin Li (2), Greg Ganger (1)
(1) Carnegie Mellon University, (2) Microsoft Research
#1 – You can sleep with grad students but not undergrads.
#2 – Keep a bottle of water in your office in case a student breaks down crying.
#3 – Kids love MongoDB, but they want to go work for Google.
Reducing Replication Bandwidth for Distributed Document Databases
In ACM Symposium on Cloud Computing,
More Info: http://cmudb.io/doc-dbs
Replication Bandwidth
[Figure: the primary database ships operation logs over the WAN to a secondary and to MMS.]
Goal: Reduce bandwidth for WAN geo-replication.
Why Deduplication?
Why not just compress?
– Oplog batches are small and do not have enough overlap.
Why not just diff?
– Need application guidance to identify the source.
Deduplication finds and removes redundancies.
Traditional Dedup
[Figure: incoming data and deduped data, divided into numbered chunks at chunk boundaries, with modified and duplicate regions marked.]
Send deduped data to replicas.
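A minimal sketch of the traditional dedup step on this slide, for illustration only. It assumes fixed-size chunks and SHA-1 chunk hashes for brevity (chunk-based dedup normally uses content-defined boundaries, such as Rabin chunking); `dedup`, `rebuild`, and `CHUNK_SIZE` are hypothetical names. The primary replaces any chunk it has already sent with a hash reference, and the replica resolves references from its own chunk store.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny for illustration; the evaluation uses 64B to 4KB


def chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (a simplification of content-defined chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def dedup(data: bytes, sent: dict[str, bytes]) -> list[tuple[str, object]]:
    """Primary side: duplicate chunks become ("ref", hash), new chunks ("raw", bytes)."""
    out: list[tuple[str, object]] = []
    for c in chunks(data):
        h = hashlib.sha1(c).hexdigest()
        if h in sent:
            out.append(("ref", h))  # duplicate region: send only the chunk hash
        else:
            sent[h] = c
            out.append(("raw", c))  # modified region: send the bytes themselves
    return out


def rebuild(encoded: list[tuple[str, object]], store: dict[str, bytes]) -> bytes:
    """Replica side: resolve references from the local chunk store, remember new chunks."""
    parts = []
    for kind, value in encoded:
        if kind == "ref":
            parts.append(store[value])
        else:
            store[hashlib.sha1(value).hexdigest()] = value
            parts.append(value)
    return b"".join(parts)


primary, replica = {}, {}
v1 = b"AAAABBBBCCCCDDDD"
rebuild(dedup(v1, primary), replica)
v2 = b"AAAABBBBXXXXDDDD"  # one modified chunk
e2 = dedup(v2, primary)
print([kind for kind, _ in e2])  # ['ref', 'ref', 'raw', 'ref']
assert rebuild(e2, replica) == v2
```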
Traditional Dedup
[Figure: incoming data and deduped data, with modified regions, duplicate regions, and chunk boundaries marked.]
Must send the entire document.
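The failure mode can be reproduced with a small experiment, again assuming fixed-size chunks; the test document and the helper `duplicate_fraction` are hypothetical. A one-byte edit every 64 bytes leaves no duplicate 64-byte chunks, so the whole document is resent, while 16-byte chunks still match the untouched three quarters.

```python
import hashlib


def chunk_list(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def duplicate_fraction(old: bytes, new: bytes, size: int) -> float:
    """Fraction of new's fixed-size chunks that already appear somewhere in old."""
    old_hashes = {hashlib.sha1(c).hexdigest() for c in chunk_list(old, size)}
    new_chunks = chunk_list(new, size)
    dups = sum(hashlib.sha1(c).hexdigest() in old_hashes for c in new_chunks)
    return dups / len(new_chunks)


old = bytes((i * 7) % 256 for i in range(256))  # hypothetical 256-byte document
edited = bytearray(old)
for pos in range(0, len(edited), 64):  # one small edit every 64 bytes
    edited[pos] ^= 0xFF
new = bytes(edited)

print(duplicate_fraction(old, new, 64))  # 0.0  -> every chunk changed, resend everything
print(duplicate_fraction(old, new, 16))  # 0.75 -> only the edited chunks are resent
```

Smaller chunks recover some duplicates but mean more chunks to track, which is the trade-off the chunk-size experiments explore.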
Similarity Dedup
[Figure: incoming data and deduped data, with modified regions, duplicate regions, and chunk boundaries marked; each modified region is delta-encoded against a similar document.]
Only send the delta encoding.
Compress vs. Dedup
[Chart: compression vs. dedup on the 20GB sampled Wikipedia dataset, MongoDB v2.7, 4MB oplog batches.]
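sDedup: Similarity Dedup
[Figure: sDedup architecture with a client, a primary node, and a secondary node. On the primary, client insertions and updates go to the database and its oplog; the delta compressor encodes unsynchronized oplog entries against similar source documents to produce a deduplicated oplog. On the secondary, the oplog syncer receives the deduplicated oplog, the delta decompressor rebuilds the entries from their source documents, and the re-constructed oplog is replayed into the database. A source document cache keeps recently used source documents.]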
Encoding Steps
Identify Similar Documents
[Figure: the target document is split by Rabin chunking; consistent sampling of the chunk hashes (32, 17, 25, 41, 12) keeps a small similarity sketch (41, 32); each sketch feature is looked up in the Feature Index Table to find candidate documents.]
Similarity Score: Doc #1 = 1, Doc #2 = 2, Doc #3 = 2
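The three steps can be sketched as follows. This is an illustration under stated assumptions, not sDedup's code: a polynomial rolling hash (Rabin-Karp style) stands in for true Rabin fingerprints, chunk features are truncated SHA-1 digests, and consistent sampling keeps the K largest features, which matches the slide's example where 41 and 32 are kept from 32, 17, 25, 41, 12. `WINDOW`, `MASK`, `K`, and `FeatureIndex` are hypothetical, and the similarity score is taken to be the number of sketch features a candidate shares with the target.

```python
import hashlib
import heapq
from collections import defaultdict

WINDOW = 16          # rolling-hash window in bytes (hypothetical)
MASK = 0x3F          # cut when the low 6 bits are zero: ~64B average chunks (hypothetical)
BASE, MOD = 257, (1 << 61) - 1
K = 4                # sketch size (hypothetical)


def rabin_style_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking: cut wherever the hash of the last WINDOW bytes hits MASK."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + b) % MOD                    # add the newest byte
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def sketch(doc: bytes, k: int = K) -> set[int]:
    """Consistent sampling: keep the k largest chunk features as the similarity sketch."""
    features = {int.from_bytes(hashlib.sha1(c).digest()[:8], "big")
                for c in rabin_style_chunks(doc)}
    return set(heapq.nlargest(k, features))


class FeatureIndex:
    """Feature index table: sketch feature -> ids of documents containing it."""

    def __init__(self) -> None:
        self.table: dict[int, set] = defaultdict(set)

    def add(self, doc_id, doc_sketch: set[int]) -> None:
        for f in doc_sketch:
            self.table[f].add(doc_id)

    def candidates(self, target_sketch: set[int]) -> dict:
        """Similarity score of each candidate = number of shared sketch features."""
        scores: dict = defaultdict(int)
        for f in target_sketch:
            for doc_id in self.table.get(f, ()):
                scores[doc_id] += 1
        return dict(scores)
```

For example, with hypothetical toy feature sets, an index holding {41, 12}, {32, 41}, and {41, 32, 38} for Doc #1, #2, and #3 scores the target sketch {41, 32} as 1, 2, and 2, the same scores as on the slide.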
Selecting the Best Match
Source Document Cache
Initial Ranking (by similarity score): two candidates score 2, one candidate scores 1.
Is the doc in the source document cache? If yes, reward 3x.
Final Ranking: the cached candidates with scores 2 and 1 rise to 6 and 3; the uncached candidate stays at 2.
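A sketch of the cache-aware ranking, assuming "reward 3x" means multiplying a cached candidate's similarity score by 3, which reproduces the slide's final scores of 6, 3, and 2. `rank_candidates` and the tie-break on document id are hypothetical, and the cache contents below (Doc #1 and Doc #3) are one assignment consistent with the slide's numbers.

```python
CACHE_REWARD = 3  # "If yes, reward 3x"


def rank_candidates(scores: dict[str, int], cached: set[str]) -> list[tuple[str, int]]:
    """Return (doc_id, final_score) pairs, best first.

    Cached source documents get their score multiplied by CACHE_REWARD, presumably
    so the encoder can avoid fetching the source document from the database.
    """
    final = {doc: s * (CACHE_REWARD if doc in cached else 1) for doc, s in scores.items()}
    # Highest score first; ties broken by document id so the order is deterministic.
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))


initial = {"Doc #1": 1, "Doc #2": 2, "Doc #3": 2}
print(rank_candidates(initial, cached={"Doc #1", "Doc #3"}))
# [('Doc #3', 6), ('Doc #1', 3), ('Doc #2', 2)]
```

Without the reward, Doc #2 and Doc #3 would tie at the top; the reward lets a slightly weaker but cached match win.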
Delta Compression
– Based on the xDelta algorithm
– Improved speed with minimal loss of compression
Encoding:
– Descriptors about duplicate/unique regions + unique bytes
Decoding:
– Use source doc + encoded output
– Concatenate byte regions in order
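The encode/decode contract above can be sketched with a greedy copy/insert encoder. This is not xDelta itself but a simplified stand-in, assuming a hash index over every 4-byte seed of the source and greedy forward extension of each match; `SEED`, `delta_encode`, and `delta_decode` are hypothetical names. Encoding emits descriptors for duplicate regions (`copy`) and unique regions (`insert`, carrying the unique bytes); decoding uses the source document and concatenates the regions in order.

```python
SEED = 4  # hypothetical minimum match length in bytes


def delta_encode(source: bytes, target: bytes) -> list[tuple]:
    """Greedy copy/insert encoding of target against source.

    ("copy", offset, length) describes a duplicate region found in source;
    ("insert", data) carries unique bytes that must be sent as-is.
    """
    index: dict[bytes, int] = {}
    for off in range(len(source) - SEED + 1):
        index.setdefault(source[off:off + SEED], off)  # keep first occurrence of each seed

    ops: list[tuple] = []
    literal = bytearray()
    i = 0
    while i < len(target):
        off = index.get(target[i:i + SEED])
        if off is None:  # no seed match: this byte is unique
            literal.append(target[i])
            i += 1
            continue
        length = SEED
        while (off + length < len(source) and i + length < len(target)
               and source[off + length] == target[i + length]):
            length += 1  # extend the match forward as far as it goes
        if literal:
            ops.append(("insert", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", off, length))
        i += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def delta_decode(source: bytes, ops: list[tuple]) -> bytes:
    """Rebuild target by concatenating copied and inserted regions in order."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:
            out += op[1]
    return bytes(out)


src = b"The quick brown fox jumps over the lazy dog."
tgt = b"The quick red fox jumps over the lazy cat."
ops = delta_encode(src, tgt)
print(ops)
# [('copy', 0, 10), ('insert', b'red'), ('copy', 15, 25), ('insert', b'cat.')]
assert delta_decode(src, ops) == tgt
```

Only the seven unique bytes travel as data; the rest of the 42-byte target is described by two copy descriptors.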
Evaluation
– 1 primary, 1 secondary node, 1 client
– Node Config: 4 cores, 8GB RAM, 100GB HDD storage
– Wikipedia dump (20GB out of ~12TB)
– Stack Exchange data dump (10GB out of ~100GB)
Compression: Wikipedia
Compression ratio by chunk size (20GB sampled Wikipedia dataset):
Chunk Size   4KB   1KB    256B   64B
sDedup       9.9   26.3   38.4   38.9
trad-dedup   2.3   4.6    9.1    15.2
Memory: Wikipedia
Memory (MB) by chunk size (20GB sampled Wikipedia dataset):
Chunk Size   4KB    1KB     256B    64B
sDedup       34.1   47.9    57.3    61.0
trad-dedup   80.2   133.0   272.5   780.5
Compression: StackExchange
Compression ratio by chunk size (10GB sampled StackExchange dataset):
Chunk Size   4KB   1KB   256B   64B
sDedup       1.0   1.2   1.3    1.8
trad-dedup   1.0   1.0   1.1    1.2
Throughput Overhead
Dedup + Sharding
Compression ratio by number of shards (20GB sampled Wikipedia dataset):
# of Shards         1      3      5      9
Compression Ratio   38.4   38.2   38.1   37.9
Failure Recovery
[Chart: failure recovery on the 20GB sampled Wikipedia dataset, with the failure point marked.]
Conclusion
sDedup: similarity-based deduplication for replicated document databases.
– Much greater data reduction than traditional dedup
– Up to 38x compression ratio for Wikipedia
– Resource-efficient design for inline deduplication with negligible performance overhead
What’s Next?
storage manager.
WiredTiger vs. sDedup
Compression Ratio
Snappy                 1.6x
zLib                   3.0x
sDedup (no compress)   38.4x
sDedup + Snappy        60.8x
sDedup + zLib          114.5x
20GB sampled Wikipedia dataset.
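As a rough worked example of what these ratios mean on the wire, assuming compression ratio = original bytes / transmitted bytes applied to the whole 20GB sample (`transferred_gb` is a hypothetical helper). Multiplying the standalone ratios (38.4 × 1.6 ≈ 61.4 and 38.4 × 3.0 ≈ 115.2) lands close to the measured 60.8x and 114.5x, which suggests the general-purpose compressor works about as well on sDedup's output as on raw data; that reading is an observation from these numbers, not a claim made in the talk.

```python
DATASET_GB = 20  # 20GB sampled Wikipedia dataset

RATIOS = {
    "Snappy": 1.6,
    "zLib": 3.0,
    "sDedup (no compress)": 38.4,
    "sDedup + Snappy": 60.8,
    "sDedup + zLib": 114.5,
}


def transferred_gb(dataset_gb: float, ratio: float) -> float:
    """Data left to replicate, assuming ratio = original size / transmitted size."""
    return dataset_gb / ratio


for name, ratio in RATIOS.items():
    print(f"{name:<22}{transferred_gb(DATASET_GB, ratio):6.2f} GB")

# Naive estimate if the compressor's ratio simply multiplied sDedup's (measured: 60.8, 114.5).
print(round(38.4 * 1.6, 1), round(38.4 * 3.0, 1))
```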
@andy_pavlo