SLIDE 1

Optimizing Hash-based Distributed Storage Using Client Choices

Peilun Li and Wei Xu, Institute for Interdisciplinary Information Sciences, Tsinghua University

SLIDE 2

Data Placement Design #1

  • Centralized management: GFS, HDFS, …

[Diagram: the client first asks the name server for the mappings data name → server name and server name → server IP, then transfers data directly with the data servers.]

SLIDE 3

Data Placement Design #2

  • Hash-based distributed management: Ceph, Dynamo, FDS, …

[Diagram: the client feeds the data name into a hash function to obtain the server name; the monitor server only provides the server name → server IP mapping. Data is then transferred directly with the data servers.]
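To make the hash-based path concrete, here is a minimal C++ sketch of the idea (identifiers are ours; Ceph's real placement algorithm is CRUSH, not a plain modulo hash):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Minimal sketch of hash-based placement: the target server is derived
// from the data name alone, so no central name server sits on the data path.
std::string pick_server(const std::string& data_name,
                        const std::vector<std::string>& servers) {
    std::size_t h = std::hash<std::string>{}(data_name);
    return servers[h % servers.size()];  // every client computes the same answer
}
```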

SLIDE 4

Pros and Cons of Different Designs

Centralized management
  • Pros: global performance optimization.
  • Cons: the centralized name server can become a bottleneck.

Hash-based distributed management
  • Pros: avoids the centralized-server bottleneck.
  • Cons: fixed placement makes optimization hard, and some optimizations are vulnerable to changes in the lower-level storage architecture.

SLIDE 5

Motivation

  • We want to use server information to improve system performance in hash-based distributed management.
  • Static information: network structure, failure domain, …
  • Dynamic information: latency, memory utilization, …
  • We want a flexible system so that new optimizations for specific applications can be added easily.
  • We do not want to redesign the whole placement algorithm or hash function.

SLIDE 6

Solution: Multiple Hash Functions

[Diagram: the client applies hash functions 1, 2, and 3 to the data name, yielding candidate Servers 1, 2, and 3; a policy selects Server 2 and the data is stored there. The monitor still provides the server name → server IP mapping.]

SLIDE 7

Solution: Multiple Hash Functions

  • We can use multiple hash functions to provide multiple choices, and choose the best one with a fixed policy.
  • Different servers provide different performance.
  • A performance requirement or even a specific application can have its own optimization policy.
  • Easy to implement as an independent module (see the sketch below).
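A minimal sketch of how the multiple choices can be derived, assuming that salting the data name with an index is an acceptable stand-in for k independent hash functions (all identifiers are ours):

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Sketch of the multi-hash candidate set: hashing the data name salted with
// the index i simulates k independent hash functions, so every client
// derives the same k candidate servers for a given data name.
std::vector<std::string> candidates(const std::string& data_name,
                                    const std::vector<std::string>& servers,
                                    int k) {
    std::vector<std::string> out;
    for (int i = 0; i < k; ++i) {
        std::size_t h = std::hash<std::string>{}(
            data_name + '#' + std::to_string(i));
        out.push_back(servers[h % servers.size()]);
    }
    return out;
}
```

Because the derivation is deterministic, any client can recompute the same candidate set without coordination.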

SLIDE 8

How Does a Write Work Now?

[Diagram: on a write, the client's multi-hash module yields candidate Servers 1–3; the client sends each a write-query, every server replies "no data" along with its performance metrics, the client picks one by policy, records it in the choice cache, and writes the data to the chosen server.]

SLIDE 9

How Does a Read Work Now?

[Diagram: on a read, the client sends a read-query to the candidate servers from the multi-hash module; the server that replies "has data" is recorded in the choice cache, and the client reads the data from it.]
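Both flows hinge on probing once and caching the result. A hedged client-side sketch, where `probe_has_data` is a hypothetical stand-in for the read-query RPC:

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Client-side sketch (identifiers are ours): the choice cache means the
// probe round trip is paid only on the first access to an object; later
// reads and writes go straight to the cached server.
class ChoiceCache {
public:
    std::optional<std::string> lookup(const std::string& data_name) const {
        auto it = cache_.find(data_name);
        if (it == cache_.end()) return std::nullopt;
        return it->second;
    }
    void remember(const std::string& data_name, const std::string& server) {
        cache_[data_name] = server;
    }
private:
    std::unordered_map<std::string, std::string> cache_;  // data name → server
};

// Locate the server holding an object for a read. An empty result means the
// object does not exist yet, so the caller falls back to the write path
// (pick a candidate by policy).
template <typename ProbeFn>
std::optional<std::string> locate(ChoiceCache& cache,
                                  const std::string& data_name,
                                  const std::vector<std::string>& servers,
                                  ProbeFn probe_has_data) {
    if (auto cached = cache.lookup(data_name)) return cached;
    for (const auto& s : servers) {
        if (probe_has_data(s, data_name)) {  // one extra round trip, paid once
            cache.remember(data_name, s);
            return s;
        }
    }
    return std::nullopt;
}
```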

SLIDE 10

Simple Server

  • Gather server performance metrics.
  • CPU/memory/disk utilization, average read/write latency, unflushed journal size, …
  • Answer client probing.
  • Check whether the requested data exists on this server.
  • Piggyback server metrics with the probing results (see the sketch below).
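A sketch of what a probe reply might carry, using field names of our own choosing for the metrics listed above:

```cpp
#include <cstdint>

// Sketch of a probe reply (field names are ours, not the actual wire
// format): the server answers the existence check and piggybacks its
// current metrics in the same message, so making a choice costs no
// additional round trips beyond the probe itself.
struct ProbeReply {
    bool     has_data;                 // does this server hold the object?
    float    cpu_util;                 // 0.0 .. 1.0
    uint64_t free_memory_bytes;
    uint64_t free_disk_bytes;
    uint32_t avg_read_latency_us;
    uint32_t avg_write_latency_us;
    uint64_t unflushed_journal_bytes;  // queue-based transient metric
};
```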

SLIDE 11

Clever Client

  • Provide multiple choices.
  • Probe the server choices before the first access.
  • Make a choice when new data needs to be written.
  • Cache the choice after the first access.

SLIDE 12

Making the Best Choice

  • A policy takes server information as input and outputs the best choice.
  • Example policies: space, local, memory, cpu, latency, and journal, evaluated below; a minimal policy interface is sketched after this list.
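A minimal sketch of such a policy interface (the `Metrics` struct and policy functions are illustrative, not the paper's API):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A policy maps the metrics probed from the candidate servers to the index
// of the chosen server.
struct Metrics {
    uint64_t free_disk_bytes;
    uint64_t free_memory_bytes;
};

using Policy = std::size_t (*)(const std::vector<Metrics>&);

// "space": pick the candidate with the most free disk space.
std::size_t space_policy(const std::vector<Metrics>& m) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < m.size(); ++i)
        if (m[i].free_disk_bytes > m[best].free_disk_bytes) best = i;
    return best;
}

// "memory": pick the candidate with the most free memory
// (more free memory → larger file system buffer cache → better reads).
std::size_t memory_policy(const std::vector<Metrics>& m) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < m.size(); ++i)
        if (m[i].free_memory_bytes > m[best].free_memory_bytes) best = i;
    return best;
}
```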

SLIDE 13

Implementation

  • We implemented it on top of Ceph.
  • About 140 lines of C++ code for the server module.
  • Easy to implement on other systems.
  • Only the block device interface is supported for now.
  • This ensures that only one client is accessing the block device data.

SLIDE 14

Evaluation Setup

  • Testbed cluster:
  • 3 machines.
  • 15 × 4 TB hard drives.
  • 2 × 12-core 2.1 GHz Xeon CPUs.
  • 128 GB memory.
  • 10 Gb NIC.
  • Workloads are generated with the librbd engine of FIO: 8 images are read/written concurrently with a 4 MB block size on the same machine.
  • Production cluster:
  • 44 machines.
  • 4 × 4 TB hard drives and a 256 GB SSD.
  • 2 × 10 Gb NICs.
  • Workloads are generated with the webserver module of FileBench.
  • The number of choices is fixed at 2.

SLIDE 15

Policy space Saves Disk Space

  • space chooses the server with the most free space to store data.
  • A hash-based storage system is effectively full as soon as a single disk is full.

[Chart (Evaluation of space): disk capacity utilization — baseline 73%, space 96%.]

SLIDE 16

Policy local Reduces Network Bottleneck

  • local chooses the closest server to store data.
  • Can save cross-rack network bandwidth.

[Charts (Evaluation of local): on the testbed, local sustains higher throughput (MB/s) over time than baseline; on the production cluster, aggregate throughput rises from 7963.1 MB/s (baseline) to 12947.2 MB/s (local).]

SLIDE 17

Policy memory Improves Read Throughput

  • memory chooses the server with the most free memory.
  • The storage servers coexist with other running programs.
  • More free memory ⇒ larger file system buffer cache ⇒ better read performance.

[Chart (Evaluation of memory): read throughput (MB/s) over time — memory stays well above baseline.]

SLIDE 18

Inefficient Policies

  • Policies cpu, latency, and journal do not work well.

[Charts: throughput (MB/s) over time for the cpu, latency, and journal policies vs. baseline — none shows a clear improvement.]

SLIDE 19

Why Are They Inefficient?

  • The Ceph server is not CPU-intensive under this hardware configuration.
  • Queue-based transient metrics, e.g. unflushed journal size, change too fast, so we cannot get a consistent measurement.
  • However, even these ineffective policies still provide performance similar to the baseline!

SLIDE 20

Summary of Different Policies

  • General improvement:

Policy   Performance Change       Improvement
local    1545 MB/s → 1900 MB/s    +23.0%
memory   778 MB/s → 1403 MB/s     +80.3%
space    73% → 96%                +31.5%
cpu      1545 MB/s → 1513 MB/s    −1.9%
latency  402 MB/s → 396 MB/s      −1.5%
journal  402 MB/s → 396 MB/s      −1.5%

SLIDE 21

Probing Overhead

  • The most significant overhead is server probing.

[Charts: write latency CDFs (percentile vs. latency in ms), 2 choices vs. no probing — left: 4 MB sequential write, right: 4 KB random write.]

SLIDE 22

Discussion about Probing Overhead

  • Probing adds 2.7 ms of average latency because of the extra round trip.
  • Latency is increased by 2.7% for large sequential writes and 6.9% for small random writes.
  • Probing is only done on the first access from a client.
  • The overhead is amortized over all subsequent accesses of an object.

SLIDE 23

Future Work

  • Develop more advanced choice policies based on multiple metrics.
  • Provide an application-level API, so the application itself can make the choices.
  • Explore different ways to collaboratively cache the choice information, in order to reduce the amount of probing.

SLIDE 24

Conclusion

  • Hash-based design in distributed systems can be flexible as well.
  • Best-effort statistical optimization can be both simple and efficient.
  • Without significant queueing effects, the power of two choices may not work well in a real system.

SLIDE 25

Thank You

We are hiring: faculty members and postdocs in any CS field. Contact: weixu@tsinghua.edu.cn
