Optimizing Hash-based Distributed Storage Using Client Choices
Peilun Li and Wei Xu Institute for Interdisciplinary Information Sciences, Tsinghua University
Data Placement Design #1: Centralized management (GFS, HDFS)
[Diagram: a name server holds the data name → server name and server name → server IP mappings; the data is stored on one of three data servers.]
[Diagram: hash-based distributed management. The client and the monitor server both hold the server name → server IP mapping. The client applies a hash function to the data name to get a server name, then sends the data to that data server.]
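The hash-based lookup in this design can be sketched as follows. This is a minimal illustration, not the placement function of any particular system: the server names, the IP addresses, the use of SHA-256, and the modulo step are all assumptions made for the example.

```python
import hashlib

# Hypothetical server name -> server IP map, as a monitor server might publish it.
SERVER_IPS = {
    "server-1": "10.0.0.1",
    "server-2": "10.0.0.2",
    "server-3": "10.0.0.3",
}


def place(data_name: str, server_names: list[str]) -> str:
    """Map a data name to a server name with a deterministic hash."""
    digest = hashlib.sha256(data_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(server_names)
    return server_names[index]


def locate(data_name: str) -> tuple[str, str]:
    """Return (server name, server IP) without contacting any name server."""
    names = sorted(SERVER_IPS)  # fixed order so every client agrees
    name = place(data_name, names)
    return name, SERVER_IPS[name]
```

Every client that holds the same map computes the same location, which is why no central name server sits on the data path. It is also why placement is fixed: the hash alone decides where the data goes.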
Centralized management
Pros: Global performance
Cons: Centralized name server can become a bottleneck.
Hash-based distributed management
Pros: Avoids the centralized server bottleneck.
Cons: Fixed placement makes it hard to do
Cons: Some optimizations are vulnerable to changes in lower-level storage architectures.
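[Diagram: multi-hash placement. The client holds the server name → server IP mapping and applies Hash Function 1, Hash Function 2, and Hash Function 3 to get the candidates Server 1, Server 2, and Server 3. A policy picks one of them (Server 2 in the figure), and the data goes to that server.]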
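A minimal sketch of the multi-hash idea in the figure: several hash functions yield several candidate servers, and a policy picks one. The salted SHA-256 hashes, the deduplication of colliding candidates, and the "lowest load" policy are assumptions for illustration only.

```python
import hashlib


def candidates(data_name, server_names, num_hashes=3):
    """Return the distinct candidate servers produced by several salted hashes."""
    result = []
    for i in range(num_hashes):
        digest = hashlib.sha256(f"{i}:{data_name}".encode("utf-8")).digest()
        name = server_names[int.from_bytes(digest[:8], "big") % len(server_names)]
        if name not in result:  # two hash functions may land on the same server
            result.append(name)
    return result


def choose(cands, load):
    """Hypothetical policy: pick the candidate with the lowest reported load."""
    return min(cands, key=lambda name: load[name])
```

For example, `choose(candidates("block-42", names), load)` returns one server out of at most three candidates, so the client has a real choice while placement stays computable from the data name alone.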
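[Diagram: write path. The client gets its candidates from multi-hash or from the choice cache and sends a Write-Query to Server 1, Server 2, and Server 3. The servers reply "No data & Performance", and the client then writes the data to the server it chooses.]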
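The write path in the figure can be sketched as below. The in-memory `Server` class, the use of free bytes as the performance metric, and the "most free space wins" policy are all assumptions; the slides only show that a write-query returns whether the server has the data together with performance information.

```python
class Server:
    """In-memory stand-in for a data server (hypothetical)."""

    def __init__(self, name, free_bytes):
        self.name = name
        self.free_bytes = free_bytes
        self.store = {}

    def write_query(self, data_name):
        """Reply whether the data is already here, plus a performance metric."""
        return data_name in self.store, self.free_bytes

    def write(self, data_name, payload):
        old = self.store.get(data_name, b"")
        self.store[data_name] = payload
        self.free_bytes -= len(payload) - len(old)


def client_write(data_name, payload, cands, choice_cache):
    """Write to the current holder if there is one, else to the policy's pick."""
    target = choice_cache.get(data_name)
    if target is None:
        replies = [(s, *s.write_query(data_name)) for s in cands]
        holders = [s for s, has_data, _ in replies if has_data]
        if holders:
            target = holders[0]  # keep existing data where it already lives
        else:
            # Assumed policy: the candidate with the most free bytes wins.
            target = max(replies, key=lambda r: r[2])[0]
        choice_cache[data_name] = target
    target.write(data_name, payload)
    return target
```

The choice cache lets repeated writes to the same data skip the query round trip.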
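[Diagram: read path. The client gets its candidates from multi-hash or from the choice cache and sends a Read-Query to Server 1, Server 2, and Server 3. The server that replies "Has data" serves the read.]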
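A matching sketch of the read path: the client checks its cached choice first and otherwise asks each candidate whether it has the data. The `stores` dictionary stands in for the servers and the network round trip, and the sequential probing is a simplification assumed here; the slides do not say whether queries are sent one by one or in parallel.

```python
def client_read(data_name, cands, stores, choice_cache):
    """Return (server name, payload), or None if no candidate holds the data.

    `stores` maps server name -> {data name -> payload} and plays the role
    of the servers answering a read-query (hypothetical stand-in).
    """
    cached = choice_cache.get(data_name)
    # Try the cached choice first, then the remaining candidates.
    order = ([cached] if cached in cands else []) + [c for c in cands if c != cached]
    for name in order:
        if data_name in stores[name]:  # read-query answered "has data"
            choice_cache[data_name] = name
            return name, stores[name][data_name]
    choice_cache.pop(data_name, None)  # drop a stale entry, if any
    return None
```

Because the candidate set is small and fixed by the hash functions, a cache miss costs at most one query per candidate.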
journal size, …
with 4MB block size concurrently on the same machine.
Evaluation of space
[Bar chart: disk capacity utilization, baseline 73% vs. space 96%.]
Evaluation of local on testbed
[Chart: throughput (MB/s), baseline vs. local.]
Evaluation of local on production cluster
[Bar chart: throughput (MB/s), baseline 7963.1 vs. local 12947.2.]
Evaluation of memory
[Chart: throughput (MB/s), baseline vs. memory.]
[Three charts: throughput (MB/s), baseline vs. cpu, baseline vs. latency, and baseline vs. journal.]
Policy    Performance Change        Improvement
local     1545 MB/s → 1900 MB/s     23.0%
memory    778 MB/s → 1403 MB/s      80.3%
space     73% → 96%                 31.5%
cpu       1545 MB/s → 1513 MB/s
latency   402 MB/s → 396 MB/s
journal   402 MB/s → 396 MB/s
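The Improvement figures are consistent with a plain relative change, (after − before) / before. The sketch below applies that formula to every row. The percentages it prints for cpu, latency, and journal are derived here from the listed throughputs and are not stated in the slides.

```python
def improvement(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs copied from the Performance Change column.
rows = {
    "local": (1545, 1900),
    "memory": (778, 1403),
    "space": (73, 96),
    "cpu": (1545, 1513),
    "latency": (402, 396),
    "journal": (402, 396),
}

for policy, (before, after) in rows.items():
    print(f"{policy:8s} {improvement(before, after):+.1f}%")
```

The first three rows reproduce the 23.0%, 80.3%, and 31.5% shown above; the last three come out slightly negative.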
[Two charts: latency (ms) by percentile, 2 choices vs. no probing. Panels: 4MB sequential write and 4KB random write.]