BCStore: Bandwidth-Efficient In-memory KV-Store with Batch Coding


  1. BCStore: Bandwidth-Efficient In-memory KV-Store with Batch Coding. Shenglong Li, Quanlu Zhang, Zhi Yang and Yafei Dai, Peking University

  2. Outline • Introduction and Motivation • Our Design • System and Implementation • Evaluation

  3. Outline • Introduction and Motivation • Our Design • System and Implementation • Evaluation

  4. In-memory KV-Store • A crucial building block for many systems – data cache (e.g., Memcached and Redis at Facebook, Twitter) – in-memory database • Availability is important for in-memory KV-Stores – Facebook reports that it takes 2.5-3 hours to recover 120 GB of an in-memory database from disk to memory • Data redundancy in distributed memory is essential for fast failover

  5. Two redundancy schemes • Replication is a classical way to provide data availability – e.g., Repcached, Redis. [Diagram: a client's write request reaches the data node and is forwarded to two backup nodes; replication incurs high update bandwidth cost and high memory cost]

  6. Two redundancy schemes • Erasure coding is a space-efficient redundancy scheme • Growing CPU speed enables fast data recovery – encoding/decoding rates can reach 40 Gb/s on a single core [1]. [Diagram: a client's write spans three data nodes and two parity nodes; erasure coding incurs high update bandwidth cost but low memory cost] [1] Efficient and Available In-memory KV-Store with Hybrid Erasure Coding and Replication, FAST '16
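
To make the coding model concrete, here is a minimal sketch of a stripe codec. It substitutes a single XOR parity for the Reed-Solomon codes BCStore actually uses (so it tolerates one loss rather than m); the function names are illustrative:

```python
# Minimal sketch of a (k, m) coding stripe, using a single XOR parity
# (m = 1) for brevity; BCStore itself uses Reed-Solomon codes such as
# RS(3, 2), which tolerate m failures instead of one.
def encode_stripe(data_blocks: list[bytes]) -> bytes:
    """XOR k equal-length data blocks into one parity block."""
    parity = bytearray(len(data_blocks[0]))
    for block in data_blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover_block(surviving_blocks: list[bytes]) -> bytes:
    """A lost block is the XOR of the surviving blocks (data + parity)."""
    return encode_stripe(surviving_blocks)

# k = 3 equal-length data blocks
stripe = [b"obj1###", b"obj2###", b"obj3###"]
parity = encode_stripe(stripe)
assert recover_block([stripe[0], stripe[1], parity]) == stripe[2]
```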

  7. In-place Update • A traditional mechanism for encoding small objects: each write (e.g., Update(obj4→obj4')) overwrites the object in place and sends Delta(obj4, obj4') to every parity node. [Diagram: updates obj4→obj4', obj8→obj8', obj3→obj3' on three data nodes each patch the parity blocks on both parity nodes] • The bandwidth cost is the same as 3-replication • Our goal: both memory efficiency and bandwidth efficiency
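
For intuition, a sketch of why in-place update is bandwidth-hungry, assuming the XOR parity from the sketch above (Reed-Solomon parities are patched the same way, up to a Galois-field coefficient); ParityNode and apply_delta are hypothetical names:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def in_place_update(old_value: bytes, new_value: bytes, parity_nodes) -> None:
    """Sketch: a linear code lets each parity be patched from the object's
    delta alone, without re-reading the rest of the stripe:
        new_parity = old_parity XOR (old_value XOR new_value)."""
    delta = xor_bytes(old_value, new_value)
    for node in parity_nodes:
        node.apply_delta(delta)  # one delta transfer per parity node

# With m parity nodes, every small write thus ships 1 + m blocks over the
# network -- the same traffic as (m + 1)-way replication, which is exactly
# the bandwidth cost the slide calls out.
```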

  8. Outline • Introduction and Motivation • Our Design • System and Implementation • Evaluation

  9. Our Design • Aggregate write requests and encode the objects into a new coding stripe. [Diagram: updated values obj4', obj8', obj3' are gathered on a batch node, batch-coded with new parity blocks, and appended; the old blocks obj4, obj8, obj3 become invalid]
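
A hedged sketch of this batching idea, reusing encode_stripe from the earlier example; BatchEncoder and its fields are illustrative, not BCStore's actual interface:

```python
class BatchEncoder:
    """Sketch of batch coding: buffer incoming writes and, once k values
    have accumulated, encode them into a brand-new stripe that is appended,
    leaving old versions in place as invalid blocks for later GC."""
    def __init__(self, k: int):
        self.k = k
        self.pending: list[bytes] = []
        self.stripes: list[tuple[list[bytes], bytes]] = []  # (data blocks, parity)

    def put(self, value: bytes) -> None:
        self.pending.append(value)
        if len(self.pending) == self.k:
            width = max(len(v) for v in self.pending)
            data = [v.ljust(width, b"\0") for v in self.pending]  # pad to equal length
            self.stripes.append((data, encode_stripe(data)))  # from the sketch above
            self.pending.clear()

# Each batch of k writes ships k data blocks + m parity blocks, amortizing
# the parity traffic over the batch instead of paying m deltas per write.
```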

  10. Latency Analysis • Batch coding induces extra request waiting time • Formalize the waiting time as W = f(T, k), where T is the request throughput and k is the number of data nodes, subject to a latency bound ε. [Plot: waiting time W against throughput T for k = 3]
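
As one simple way to instantiate f (our own toy model, not necessarily the paper's): assume requests arrive uniformly at rate T. A stripe seals after k buffered writes, so the i-th buffered request waits for the remaining k − i arrivals:

$$ W_i = \frac{k - i}{T}, \qquad \bar{W} = \frac{1}{k}\sum_{i=1}^{k}\frac{k - i}{T} = \frac{k - 1}{2T} \le \varepsilon. $$

Under this toy model the extra wait shrinks as throughput T grows, matching the slide's plot of W against T for k = 3; at low throughput the latency bound ε would force a partial stripe to be sealed early.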

  11. Garbage Collection • Recycle updated or deleted blocks and release the extra parity blocks • Move-based garbage collection: move the remaining valid blocks out of old stripes, then release the whole stripes. [Diagram: valid blocks move from original stripes into batched stripes] • Naive moving incurs much bandwidth cost for updating parity blocks

  12. Garbage Collection • How to reduce the GC bandwidth cost? – Intuition: GC the stripes with the most invalid blocks • Greedy block moving. [Diagram: picking the two most-invalidated stripes, two block moves release two coding stripes]
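
A sketch of the greedy policy; for brevity it collects every stripe in order, whereas a real policy would stop at a space or bandwidth budget, and the data layout here is invented for illustration:

```python
def greedy_gc(stripes: dict, batch_encoder: "BatchEncoder") -> int:
    """Sketch of greedy block moving: reclaim the stripes with the most
    invalid blocks first, so each block move releases space fastest.
    `stripes` maps stripe_id -> list of (value, is_valid) pairs."""
    # Fewest valid blocks == most invalid blocks (stripes are equal width).
    order = sorted(stripes, key=lambda sid: sum(ok for _, ok in stripes[sid]))
    moved = 0
    for sid in order:
        valid = [v for v, ok in stripes[sid] if ok]
        for v in valid:
            batch_encoder.put(v)   # re-batch survivors into fresh stripes
        moved += len(valid)
        del stripes[sid]           # whole stripe (data + parity) is released
    return moved
```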

  13. Garbage Collection • How to further reduce block moves? – Intuition: concentrate updates on a few stripes • Popularity-based data arrangement. [Diagram: with hot and cold objects placed in separate stripes, only one block move releases two coding stripes]

  14. Bandwidth Analysis • Theorem: GC bandwidth + coding bandwidth ≤ in-place update bandwidth • The detailed proof can be found in our paper

  15. Outline • Introduction and Motivation • Our Design • System and Implementation • Evaluation

  16. System Architecture. [Diagram: clients send requests to a batch process that performs preprocessing, batch coding, garbage collection, and metadata management in front of a storage group of data processes and parity processes]

  17. Handle Write Requests. [Diagram: clients issue set(k1, v1), set(k2, v2), set(k3, v3); the batch process batch-codes v1, v2, v3 into stripe b1 with parities P1, P2, distributes the blocks across data processes 1-3 and parity processes 1-2, and updates its hash table and stripe index]
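
A hedged sketch of this write path; BatchKVStore and all of its fields are illustrative names rather than BCStore's actual API, and parity shipping is elided to a comment:

```python
class BatchKVStore:
    """Sketch of the batch-process write path: buffer incoming sets,
    seal a stripe every k values, and record key -> stripe id."""
    def __init__(self, k: int):
        self.k = k
        self.hash_table: dict[str, int] = {}     # key -> stripe id
        self.stripe_index: dict[int, list] = {}  # stripe id -> [(key, value)]
        self.pending: list = []
        self.next_sid = 0

    def set(self, key: str, value: bytes) -> None:
        self.pending.append((key, value))
        if len(self.pending) == self.k:          # e.g., k = 3: v1, v2, v3
            sid = self.next_sid                  # stripe b1 in the diagram
            self.next_sid += 1
            self.stripe_index[sid] = list(self.pending)
            # parity = encode_stripe(...)  # P1, P2 shipped to parity processes
            for k_, _ in self.pending:
                self.hash_table[k_] = sid        # an update just re-points the key
            self.pending.clear()
```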

  18. Handle Read Requests. [Diagram: a client's get(k1) hits the hash table, which maps keys to stripe ids (k1→b1, k2→b1, k3→b1); the stripe index then locates stripe b1, and the value is fetched from the owning data process]
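
Continuing the hypothetical BatchKVStore sketch, the read path is the two-step lookup the diagram shows:

```python
# Read path for the BatchKVStore sketch above, attached as a method:
def get(self, key: str) -> bytes:
    sid = self.hash_table[key]            # hash table: key -> stripe id (k1 -> b1)
    for k_, v in self.stripe_index[sid]:  # stripe index: stripe id -> data blocks
        if k_ == key:
            return v
    raise KeyError(key)

BatchKVStore.get = get

store = BatchKVStore(k=3)
for key, val in [("k1", b"v1"), ("k2", b"v2"), ("k3", b"v3")]:
    store.set(key, val)
assert store.get("k1") == b"v1"
```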

  19. Recovery • Recover the requested data first: 1. get the blocks of the stripe, identified by stripe id, from any k surviving storage processes; 2. decode to recover the lost blocks. [Diagram: while serving get(k1) during a failure, the batch process fetches k blocks (values and parities) and feeds them to the decoder]
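
A sketch of the degraded-read step, again under the single-XOR-parity simplification of the earlier codec sketch:

```python
def degraded_read(surviving_blocks: list[bytes]) -> bytes:
    """Sketch of degraded read: fetch the surviving blocks of the stripe
    and XOR them to rebuild the lost one. Reed-Solomon decoding
    generalizes this to any k of the stripe's k + m blocks."""
    return recover_block(surviving_blocks)  # from the earlier coding sketch

# Stripe b1 = {v1, v2, v3, parity}; suppose the process holding v3 failed.
v1, v2, v3 = b"aaaa", b"bbbb", b"cccc"
parity = encode_stripe([v1, v2, v3])
assert degraded_read([v1, v2, parity]) == v3  # get(k3) served despite the failure
```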

  20. Outline • Introduction and Motivation • Our Design • System and Implementation • Evaluation

  21. Evaluation • Cluster configuration – 10 machines running SUSE Linux 11 with 12 AMD Opteron 4180 CPUs each – 1 Gb/s Ethernet • Targets of comparison – in-place update EC (Cocytus [1]) – replication (Rep) • Workload – YCSB with different key distributions – 50%:50% read/write ratio. [1] Efficient and Available In-memory KV-Store with Hybrid Erasure Coding and Replication, FAST '16

  22. Bandwidth Cost • BCStore saves up to 51% bandwidth cost. [Figure: bandwidth cost for different coding schemes]

  23. Throughput • Up to 2.4x improvement. [Figure: throughput for different coding schemes]

  24. Memory • BCStore saves up to 41% memory cost. [Figure: memory consumption for different redundancy schemes]

  25. Latency. [Figures: read latency and write latency]

  26. Conclusion • Efficiency and availability are two crucial features for in-memory KV-Stores • We build BCStore, an in-memory KV-Store that applies erasure coding for data availability • We design a batch coding mechanism to achieve high bandwidth efficiency for write workloads • We propose a heuristic garbage collection algorithm to improve memory efficiency

  27. Thanks! Q&A

  28. Severity of Bandwidth Cost • Write requests are prevalent in large-scale web services – peak load can easily exhaust network bandwidth and degrade service performance • The monetary cost of bandwidth is several times higher – especially under the commonly used peak-load pricing model – and bandwidth amplification worsens as m (the number of parity servers) grows • The bandwidth budget is usually limited in a workload-sharing cluster • Our goal: high memory efficiency and bandwidth efficiency

  29. Our Design • Batch write requests and append a new coding stripe. [Diagram: obj4', obj8', obj3' are batch-coded into a new stripe with fresh parity blocks and appended after the existing stripes]

  30. Challenges • Recycling the memory space of data blocks that are deleted or updated – data blocks and parity blocks are appended to storage, so updated blocks cannot be deleted directly • Encoding variable-sized data efficiently – variable-sized data cannot be appended directly into previously allocated storage space

  31. Garbage Collection • Popularity-based data arrangement. [Diagram: objects are sorted by popularity so that hot objects and batched cold objects end up in separate stripes across the data and parity nodes]
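
A minimal sketch of the arrangement step; the hit-counter input and stripes-of-keys output are assumptions made for illustration:

```python
def arrange_by_popularity(keys: list[str], hits: dict[str, int], k: int):
    """Sketch of popularity-based arrangement: sort keys by access count
    so hot objects share stripes and cold objects share stripes; future
    updates then concentrate invalidations in a few hot stripes."""
    ranked = sorted(keys, key=lambda key: hits.get(key, 0), reverse=True)
    return [ranked[i:i + k] for i in range(0, len(ranked), k)]  # stripes of width k
```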

  32. Encoding Variable-size Data • Virtual coding stripes (vcs) – each virtual coding stripe has a large fixed-length space and is aligned in the virtual address space. [Diagram: vcs1-vcs3 in virtual space map onto the physical space of the data and parity nodes]
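
A sketch of how a vcs might be managed; VCS_SIZE, VirtualStripe, and the slot layout are all illustrative assumptions:

```python
VCS_SIZE = 1 << 20  # fixed per-stripe virtual space; the size is illustrative

class VirtualStripe:
    """Sketch of a virtual coding stripe (vcs): variable-sized values are
    packed into a fixed-length, aligned region of virtual address space,
    so stripe boundaries stay uniform even though object sizes differ."""
    def __init__(self, stripe_id: int):
        self.base = stripe_id * VCS_SIZE                 # aligned virtual base
        self.used = 0
        self.slots: dict[str, tuple[int, int]] = {}      # key -> (vaddr, length)

    def append(self, key: str, value: bytes) -> bool:
        if self.used + len(value) > VCS_SIZE:
            return False                                 # full: open the next vcs
        self.slots[key] = (self.base + self.used, len(value))
        self.used += len(value)
        return True
```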

  33. Bandwidth Cost. [Figure: bandwidth cost for the moderately skewed Zipfian workload, RS(3, 2)]

  34. Throughput. [Figure: throughput for the moderately skewed Zipfian workload]

  35. Throughput. [Figure: recovery throughput]

  36. In-place Update • A traditional mechanism for coding small objects. [Diagram: obj1-obj9 spread across three data nodes, with parity blocks P on two parity nodes]

  37. Garbage Collection • How to further reduce block moves? – Intuition: concentrate updates on a few stripes • Popularity-based data arrangement. [Diagram: hot and cold objects separated across original and batched stripes before GC]

  38. Bandwidth Analysis • Theorem: GC bandwidth + coding bandwidth ≤ in-place update bandwidth. [Diagram: the worst case of GC bandwidth, moving every valid block from original stripes into batched stripes]

  39. Bandwidth Cost. [Figure: bandwidth cost at different throughputs, RS(5, 4)]

  40. Recovery. [Diagram: the batch process is replicated; after failover the standby batch process 1. gets the latest batch id from the storage processes, 2. updates the latest stable batch id and reconstructs the metadata (M), and 3. resumes serving client requests]
