Optimizing Hash-based Distributed Storage Using Client Choices


1. Optimizing Hash-based Distributed Storage Using Client Choices
Peilun Li and Wei Xu
Institute for Interdisciplinary Information Sciences, Tsinghua University

2. Data Placement Design #1
• Centralized management: GFS, HDFS, …
• A central name server resolves both mappings: data name → server name and server name → server IP.
[Diagram: the client asks the name server, then reads/writes data on the data servers]

3. Data Placement Design #2
• Hash-based distributed management: Ceph, Dynamo, FDS, …
• A hash function on the client maps data name → server name; a monitor server provides the server name → server IP map.
[Diagram: the client hashes the data name, looks up the server IP from the monitor's map, and accesses the data server directly]
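
To make the hash-based design concrete, here is a minimal C++ sketch of the client-side mapping, assuming a monitor-provided server map. The ServerInfo struct and PlaceByHash function are illustrative only; real systems use placement functions such as Ceph's CRUSH or Dynamo's consistent hashing, which stay stable when servers join or leave.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch: the monitor hands out a server map (server name -> IP);
// the client hashes the data name to pick a server, so no central name server
// sits on the data path.
struct ServerInfo {
    std::string name;
    std::string ip;
};

std::string PlaceByHash(const std::string& data_name,
                        const std::vector<ServerInfo>& server_map) {
    // Plain modulo keeps the sketch short; it is not stable under server-map
    // changes the way CRUSH or consistent hashing is.
    std::size_t h = std::hash<std::string>{}(data_name);
    return server_map[h % server_map.size()].ip;
}
```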

4. Pros and Cons of Different Designs
• Centralized management. Pros: global performance optimization. Cons: the centralized name server can become a bottleneck.
• Hash-based distributed management. Pros: avoids the centralized-server bottleneck. Cons: fixed placement makes it hard to do optimization, and some optimizations are vulnerable to changes in the lower-level storage architecture.

5. Motivation
• We want to use server information to improve system performance in hash-based distributed management.
  • Static information: network structure, failure domain, …
  • Dynamic information: latency, memory utilization, …
• We want a flexible system so that new optimizations for specific applications can be added easily.
  • We do not want to redesign the whole placement algorithm or hash function.

6. Solution: Multiple Hash Functions
[Diagram: the client applies hash functions 1, 2, and 3 to the data name, yielding candidate servers 1, 2, and 3; a policy picks one of them (here server 2), and the server name → server IP map resolves its address for the data access]

7. Solution: Multiple Hash Functions
• We can use multiple hash functions to provide multiple choices, and choose the best one with a fixed policy.
• Different servers provide different performance.
• Each performance requirement, or even a specific application, can have its own optimization policy.
• Easy to implement as an independent module (see the sketch below).
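
A minimal sketch of the multi-hash idea, assuming the multiple hash functions are derived by salting a single hash with the choice index; all names here are illustrative rather than the paper's actual code. Each hash function yields one candidate server, and a pluggable policy picks among the candidates.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-server record; a policy can also consult dynamic metrics.
struct ServerInfo {
    std::string name;
    std::string ip;
    double free_space_gb;
};

// k hash functions, obtained by salting one hash with the choice index i.
std::vector<const ServerInfo*> Candidates(const std::string& data_name,
                                          const std::vector<ServerInfo>& servers,
                                          std::size_t k) {
    std::vector<const ServerInfo*> out;
    for (std::size_t i = 0; i < k; ++i) {
        std::size_t h =
            std::hash<std::string>{}(data_name + "#" + std::to_string(i));
        out.push_back(&servers[h % servers.size()]);
    }
    return out;
}

// A policy maps the candidate set to one chosen server.
using Policy =
    std::function<const ServerInfo*(const std::vector<const ServerInfo*>&)>;
```

Because the candidates are a deterministic function of the data name, a reader can recompute the same candidate set and probe it to find the data, so no central directory is needed.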

8. How does Write Work Now?
[Diagram: the client sends a Write-Query to each candidate server; each server replies "no data" along with its performance metrics; the client picks a server, writes the data there, and records the choice in its choice cache]

9. How does Read Work Now?
[Diagram: the client sends a Read-Query to the candidate servers; the server that has the data replies, and the client reads from it and records the choice in its choice cache]

10. Simple Server
• Gather server performance metrics.
  • CPU/memory/disk utilization, average read/write latency, unflushed journal size, …
• Answer client probing.
  • Check whether the requested data exists on this server.
  • Piggyback server metrics with the probing result (see the sketch below).
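
A self-contained sketch of what the server-side probe handler could look like; the real server module is about 140 lines of C++ inside Ceph, and all names below are hypothetical.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical metrics snapshot piggybacked on every probe reply.
struct ServerMetrics {
    double cpu_util = 0;
    double mem_free_gb = 0;
    double disk_free_gb = 0;
    double avg_write_latency_ms = 0;
    double unflushed_journal_mb = 0;
};

struct ProbeReply {
    bool has_data;          // does this server already store the object?
    ServerMetrics metrics;  // piggybacked, so no extra round trip is needed
};

class ProbeServer {
public:
    // Answer a client probe: check whether the object is stored locally and
    // attach the latest metrics snapshot to the reply.
    ProbeReply HandleProbe(const std::string& object_name) const {
        return ProbeReply{local_objects_.count(object_name) > 0, metrics_};
    }

    // In a real server these would be refreshed from OS and OSD statistics;
    // plain setters keep the sketch self-contained.
    void UpdateMetrics(const ServerMetrics& m) { metrics_ = m; }
    void AddLocalObject(const std::string& name) { local_objects_.insert(name); }

private:
    std::unordered_set<std::string> local_objects_;
    ServerMetrics metrics_;
};
```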

11. Clever Client
• Provide multiple choices.
• Probe the candidate servers before the first access.
• Make a choice if it needs to write new data.
• Cache the choice after the first access (see the sketch below).
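
A self-contained sketch of that client flow, combining probing, the policy choice, and the choice cache; the types and the stubbed probe RPC are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Metrics { double mem_free_gb; double disk_free_gb; };
struct Probe   { bool has_data; Metrics metrics; };

struct ChoosingClient {
    std::size_t num_servers;
    std::size_t num_choices;  // fixed to 2 in the evaluation
    // Probe RPC to one server (stubbed here); returns existence + metrics.
    std::function<Probe(std::size_t server, const std::string& name)> probe;
    // Policy: pick one server index from the probed candidates.
    std::function<std::size_t(
        const std::vector<std::pair<std::size_t, Probe>>&)> policy;
    std::map<std::string, std::size_t> choice_cache;  // object name -> server

    std::size_t Locate(const std::string& name) {
        auto it = choice_cache.find(name);
        if (it != choice_cache.end()) return it->second;  // no probe after first access

        std::vector<std::pair<std::size_t, Probe>> candidates;
        for (std::size_t i = 0; i < num_choices; ++i) {
            std::size_t s = std::hash<std::string>{}(
                                name + "#" + std::to_string(i)) % num_servers;
            Probe p = probe(s, name);
            if (p.has_data) {               // read path: the data already exists
                choice_cache[name] = s;
                return s;
            }
            candidates.emplace_back(s, p);
        }
        std::size_t chosen = policy(candidates);  // write path: pick the best server
        choice_cache[name] = chosen;
        return chosen;
    }
};
```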

12. Making the Best Choice
• A policy takes server information as input and outputs the best choice.
• Example policies: space, local, memory, cpu, latency, and journal (evaluated in the following slides; a few are sketched below).
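
A minimal sketch of what the space, local, and memory policies could look like, assuming each candidate carries the piggybacked metrics plus static placement information; the struct fields and function names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Illustrative per-candidate view seen by a policy.
struct Candidate {
    double free_disk_gb;      // from the piggybacked server metrics
    double free_mem_gb;       // from the piggybacked server metrics
    int    network_distance;  // static: 0 = same host, 1 = same rack, 2 = cross rack
};

// "space": pick the server with the most free disk space.
std::size_t PolicySpace(const std::vector<Candidate>& c) {
    return std::max_element(c.begin(), c.end(),
               [](const Candidate& a, const Candidate& b) {
                   return a.free_disk_gb < b.free_disk_gb;
               }) - c.begin();
}

// "local": pick the closest server to save cross-rack bandwidth.
std::size_t PolicyLocal(const std::vector<Candidate>& c) {
    return std::min_element(c.begin(), c.end(),
               [](const Candidate& a, const Candidate& b) {
                   return a.network_distance < b.network_distance;
               }) - c.begin();
}

// "memory": pick the server with the most free memory (bigger page cache).
std::size_t PolicyMemory(const std::vector<Candidate>& c) {
    return std::max_element(c.begin(), c.end(),
               [](const Candidate& a, const Candidate& b) {
                   return a.free_mem_gb < b.free_mem_gb;
               }) - c.begin();
}
```

Each policy returns an index into the probed candidate list, matching the policy hook in the client sketch above.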

13. Implementation
• We implement it on top of Ceph.
  • About 140 lines of C++ code for the server module.
  • Easy to implement on other systems.
• Only the block device interface is supported now.
  • It ensures that only one client is accessing the block device data.

14. Evaluation Setup
• Testbed cluster:
  • 3 machines.
  • 15 × 4 TB hard drives.
  • 2 × 12-core 2.1 GHz Xeon CPUs.
  • 128 GB memory.
  • 10 Gb NIC.
  • Workloads are generated with the librbd engine of FIO; 8 images are read/written concurrently with a 4 MB block size on the same machine.
• Production cluster:
  • 44 machines.
  • 4 × 4 TB hard drives and a 256 GB SSD.
  • 2 × 10 Gb NICs.
  • Workloads are generated with the webserver module of FileBench.
• The number of choices is fixed to 2.

15. Policy space Saves Disk Space
• space chooses the server with the most free space to store data.
• A hash-based storage system is full as soon as one disk is full.
[Chart: disk capacity utilization when the system becomes full; baseline 73% vs. space 96%]

16. Policy local Reduces Network Bottleneck
• local chooses the closest server to store data.
• It saves cross-rack network bandwidth.
[Charts: throughput (MB/s) over time for baseline vs. local on the testbed and on the production cluster; on the production cluster, throughput rises from 7963.1 MB/s to 12947.2 MB/s with local]

17. Policy memory Improves Read Throughput
• memory chooses the server with the most free memory.
• Coexists with other running programs.
• More free memory means a larger file system buffer cache, and thus better read performance.
[Chart: read throughput (MB/s) over time, baseline vs. memory]

18. Inefficient Policies
• Policies cpu, latency, and journal do not work well.
[Charts: throughput over time for baseline vs. cpu, baseline vs. latency, and baseline vs. journal; the curves stay close to the baseline]

19. Why are They Inefficient?
• The Ceph server is not CPU-intensive under this hardware configuration.
• Queue-based transient metrics, e.g. the unflushed journal size, change too fast, so we cannot get a consistent measurement.
• However, applying these ineffective policies still gives performance similar to the baseline!

20. Summary of Different Policies
• General improvement:
  • local: 1545 MB/s → 1900 MB/s (+23.0%)
  • memory: 778 MB/s → 1403 MB/s (+80.3%)
  • space: 73% → 96% disk capacity utilization (+31.5%)
  • cpu: 1545 MB/s → 1513 MB/s (-1.9%)
  • latency: 402 MB/s → 396 MB/s (-1.5%)
  • journal: 402 MB/s → 396 MB/s (-1.5%)

21. Probing Overhead
• The most significant overhead is server probing.
[Charts: latency CDFs for 4 MB sequential writes and 4 KB random writes, comparing "2 choices" against "no probing"]

22. Discussion about Probing Overhead
• Probing adds 2.7 ms of average latency because of the extra round trip.
• Latency increases by 2.7% for large sequential writes and 6.9% for small random writes.
• Probing is only done on a client's first access.
• The overhead is therefore amortized over all subsequent accesses to an object.

23. Future Work
• Develop more advanced choice policies based on multiple metrics.
• Provide an application-level API, so the application itself can make the choices.
• Explore different ways to collaboratively cache the choice information, in order to reduce the number of probes.

24. Conclusion
• Hash-based design in distributed systems can be flexible as well.
• Best-effort statistical optimization can be both simple and efficient.
• Without significant queueing effects, the power of two choices may not work well in a real computer system.

25. Thank You
• We are hiring: faculty members and postdocs in any CS field.
• Contact: weixu@tsinghua.edu.cn
