Reducing Replication Bandwidth for Distributed Document Databases - PowerPoint PPT Presentation



SLIDE 1

Reducing Replication Bandwidth for Distributed Document Databases

Lianghong Xu¹, Andy Pavlo¹, Sudipta Sengupta², Jin Li², Greg Ganger¹
¹Carnegie Mellon University, ²Microsoft Research

SLIDE 2

Document-oriented Databases

{
  "_id" : "55ca4cf7bad4f75b8eb5c25c",
  "pageId" : "46780",
  "revId" : "41173",
  "timestamp" : "2002-03-30T20:06:22",
  "sha1" : "6i81h1zt22u1w4sfxoofyzmxd",
  "text" : "The Peer and the Peri is a comic [[Gilbert and Sullivan]] [[operetta]] in two acts… just as predicting, …The fairy Queen, however, appears to … all live happily ever after."
}
{
  "_id" : "55ca4cf7bad4f75b8eb5c25d",
  "pageId" : "46780",
  "revId" : "128520",
  "timestamp" : "2002-03-30T20:11:12",
  "sha1" : "q08x58kbjmyljj4bow3e903uz",
  "text" : "The Peer and the Peri is a comic [[Gilbert and Sullivan]] [[operetta]] in two acts… just as predicted, …The fairy Queen, on the other hand, is ''not'' happy, and appears to … all live happily ever after."
}

Update: Reading a recent doc and writing back a similar one

SLIDE 3

Replication Bandwidth

[Diagram: a primary database ships operation logs over the WAN to secondary replicas]

SLIDE 4

Replication Bandwidth



Goal: Reduce WAN bandwidth for geo-replication

SLIDE 5

Why Deduplication?

  • Why not just compress?

– Oplog batches are small and have too little overlap

  • Why not just use diff?

– Needs application guidance to identify the source document

  • Dedup finds and removes redundancies

– In the entire data corpus

SLIDE 6

Traditional Dedup: Ideal

[Diagram: the incoming byte stream is cut into chunks 1–5 at content-defined boundaries; chunks 1, 2, 4, 5 already exist in the deduped data, so only the modified chunk 3 is new]

Send dedup’ed data to replicas

SLIDE 7

Traditional Dedup: Reality

[Diagram: a small modification shifts the chunk boundaries, so most chunks of the incoming data no longer match the deduped data; only chunk 4 is found to be a duplicate]

SLIDE 8

Traditional Dedup: Reality


Send almost the entire document.

SLIDE 9

Similarity Dedup (sDedup)

[Diagram: sDedup finds a similar document in the corpus and delta-compresses the incoming document against it]

Only send delta encoding.
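The delta-encoding idea can be sketched with Python's difflib as a stand-in for the actual delta encoder: equal spans become cheap copy references into the source document, and only the changed text is carried literally.

```python
import difflib

def delta_encode(source: str, target: str) -> list:
    """Encode target as ('copy', offset, length) / ('insert', literal)
    operations against a similar source document."""
    ops = []
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # reference into source
        elif j2 > j1:                           # 'replace' or 'insert'
            ops.append(("insert", target[j1:j2]))
    return ops

def delta_decode(source: str, ops: list) -> str:
    """Rebuild the target from the source plus the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            parts.append(source[off:off + length])
        else:
            parts.append(op[1])
    return "".join(parts)
```

Applied to two revisions like the Wikipedia example earlier in the deck, the literal bytes in the delta are roughly the changed phrase, so the replica receives a small fraction of the document instead of the whole thing.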

SLIDE 10

Compress vs. Dedup

20GB sampled Wikipedia dataset / MongoDB v2.7 / 4MB oplog batches

SLIDE 11

sDedup Integration

[Diagram: on the primary node, client insertions & updates feed the database and the oplog; the sDedup encoder, backed by a source document cache, turns unsynchronized oplog entries into dedup'ed oplog entries that are shipped across the WAN; on the secondary node, the sDedup decoder reconstructs the oplog entries from its own source documents, and the oplog syncer replays them into the database]

SLIDE 12

sDedup Encoding Steps

  • Identify Similar Documents
  • Select the Best Match
  • Delta Compression

SLIDE 13

Identify Similar Documents

[Diagram: the target document is split with Rabin chunking; consistent sampling of the chunk hashes yields a similarity sketch (features 32, 17, 25, 41, 12); each feature is looked up in the feature index table, which maps features to the candidate documents that contain them]

Similarity scores (number of matching features): Doc #1: 1, Doc #2: 2, Doc #3: 2
SLIDE 14

Select the Best Match

Initial ranking (by similarity score):

  Rank  Candidate  Score
  1     Doc #2     2
  1     Doc #3     2
  2     Doc #1     1

Final ranking (+2 reward for candidates in the source document cache):

  Rank  Candidate  Cached?  Score
  1     Doc #2     Yes      4
  2     Doc #1     Yes      3
  3     Doc #3     No       2

Is doc cached? If yes, reward +2
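The re-ranking rule can be sketched as follows; the +2 cache reward comes from the slide, while the function name and the scores in the example are illustrative.

```python
CACHE_REWARD = 2  # bonus from the slide: cached docs need no database fetch

def select_best_match(scores: dict, cache: set) -> str:
    """Rank candidates by similarity score plus a fixed reward for
    documents already in the source document cache; return the winner."""
    def final_score(doc):
        return scores[doc] + (CACHE_REWARD if doc in cache else 0)
    return max(scores, key=final_score)
```

A cached candidate with a slightly lower similarity score can therefore beat an uncached one, trading a little compression for a much cheaper source lookup.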

SLIDE 15

Evaluation

  • MongoDB setup (v2.7)

– 1 primary, 1 secondary node, 1 client
– Node config: 4 cores, 8GB RAM, 100GB HDD storage

  • Datasets:

– Wikipedia dump (20GB out of ~12TB)
– Additional datasets evaluated in the paper

SLIDE 16

Compression

Compression ratio vs. chunk size (20GB sampled Wikipedia dataset):

  Chunk Size   sDedup   trad-dedup
  4KB          9.9      2.3
  1KB          26.3     4.6
  256B         38.4     9.1
  64B          38.9     15.2

SLIDE 17

Memory

Memory footprint vs. chunk size (20GB sampled Wikipedia dataset):

  Chunk Size   sDedup (MB)   trad-dedup (MB)
  4KB          34.1          80.2
  1KB          47.9          133.0
  256B         57.3          272.5
  64B          61.0          780.5

SLIDE 18

Other Results (See Paper)

  • Negligible client performance overhead
  • Failure recovery is quick and easy
  • Sharding does not hurt compression rate
  • More datasets

– Microsoft Exchange, Stack Exchange

SLIDE 19

Conclusion & Future Work

  • sDedup: Similarity-based deduplication for replicated document databases

– Much greater data reduction than traditional dedup
– Up to 38x compression ratio for Wikipedia
– Resource-efficient design with negligible overhead

  • Future work

– More diverse datasets
– Dedup for local database storage
– Different similarity search schemes (e.g., super-fingerprints)
