Hurricane – Master semester project, IC School Operating Systems Laboratory – PowerPoint PPT Presentation



SLIDE 1

Hurricane

Master semester project – IC School Operating Systems Laboratory
Author: Diego Antognini
Supervisors:

  • Prof. Willy Zwaenepoel
  • Laurent Bindschaedler

SLIDE 2

Outline

  • Motivation
  • Hurricane
  • Experiments
  • Future work
  • Conclusion


SLIDE 3

Motivation

SLIDE 4

Original goal of the project

  • Implement Chaos on top of HDFS!
  • How? Replace the Chaos storage engine with HDFS
  • Why?
    • Industry is interested in systems running on Hadoop
    • Easy cluster handling
    • Distributed file system
    • Fault tolerance (but at what price?)


SLIDE 5

Chaos

  • Scale-out graph processing from secondary storage
  • Maximizes sequential access
  • Stripes data across secondary storage devices in a cluster
  • Limited only by:
    • the aggregate bandwidth
    • the capacity of all storage devices in the entire cluster


SLIDE 6

Hadoop Distributed File System

[Diagram: HDFS architecture – Client, Namenode, Datanodes]

SLIDE 7

Experiment: DFSIO

  • Measure aggregate bandwidth on a cluster when writing & reading 100 GB of data split into X files:
  • Use the DFSIO benchmark
  • Each task operates on a distinct block
  • Measure disk I/O

# Files    Size per file
1          100 GB
2          50 GB
…          …
4096       25 MB

SLIDE 8

Clusters

           DCO
OS         Ubuntu 14.04.1
# Cores    16
Memory     128 GB
Storage    HDD: 140 MB/s, SSD: 243 MB/s
Network    10 Gbit/s

SLIDE 9

Results: DFSIO – DCO cluster

[Plot: disk I/O while writing 100 GB of data, 8 nodes, no replication, DCO cluster. X-axis: number of files (1–4096); Y-axis: aggregate bandwidth (MB/s). Series: Read, Write, and Baseline (dd, hdparm) read/write.]

SLIDE 10

Observations: DFSIO

  • Somewhat lackluster performance
  • Hard to tune!

HDFS doesn’t fit the requirements

SLIDE 11

Our solution

  • Create a standalone distributed storage system based on the Chaos storage engine
  • Give it an HDFS-like RPC interface

Actual project!

SLIDE 12

Hurricane

SLIDE 13

Hurricane

  • Scalable decentralized storage system based on Chaos
  • Balance I/O load randomly across available disks (see the sketch below)
  • Saturate available storage bandwidth
  • Target rack-scale deployment
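
The random balancing bullet above is easiest to see with a small sketch. This is not Hurricane's actual code: the `ServerStats` struct, the `queue_depth` field, and `pick_server()` are illustrative assumptions, loosely following the power-of-two-choices idea cited in reference 6.

```cpp
// Illustrative sketch only (not Hurricane's code): choose a storage server for
// the next block by sampling two servers at random and picking the one with
// the shorter request queue ("power of two choices"). Randomized placement
// like this balances I/O load without central coordination.
#include <cstddef>
#include <random>
#include <vector>

struct ServerStats {
    int id;                    // server identifier
    std::size_t queue_depth;   // outstanding requests on this server
};

int pick_server(const std::vector<ServerStats>& servers, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, servers.size() - 1);
    const ServerStats& a = servers[dist(rng)];
    const ServerStats& b = servers[dist(rng)];
    // Send the write to the less loaded of the two random candidates.
    return (a.queue_depth <= b.queue_depth) ? a.id : b.id;
}
```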


SLIDE 14

Real-life scenario

  • Chaos using Hurricane


SLIDE 15

Real-life scenario

  • Measuring emotions of countries during Euro 2016
  • And much more!

[Diagram: data from Switzerland, Belgium, and Romania flows in; per-country emotions (Emotions Switzerland, Emotions Belgium, Emotions Romania) come out.]

SLIDE 16

Locality does not matter!

  • Remote storage bandwidth = local storage bandwidth
  • Clients can read/write to any storage device
  • Storage is slower than the network
  • The network is not a bottleneck!
  • Realistic for most clusters at rack scale or even beyond

SLIDE 17

Maximizing I/O bandwidth

  • Clients pull data records from servers
  • Clients batch requests to prevent idle servers (prefetching); see the sketch below

[Diagram: clients pulling batched records from servers]
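
To make the two bullets above concrete, here is a minimal sketch of a client-side pull loop with batched prefetching. It is not Hurricane's real client: `request_batch()`, `wait_for_batch()`, `process()`, and the window size are placeholder assumptions standing in for the RPC layer.

```cpp
// Illustrative sketch only: the client keeps a window of batched requests in
// flight so that servers always have queued work (prefetching) and never sit
// idle between records.
#include <cstddef>
#include <deque>
#include <vector>

struct Batch { int server; std::size_t id; };              // one outstanding request

Batch request_batch(int server, std::size_t id);           // stand-in: fire off a batched read
std::vector<char> wait_for_batch(const Batch& b);          // stand-in: block until it completes
void process(const std::vector<char>& records);            // stand-in: consume the records

void pull_loop(const std::vector<int>& servers, std::size_t total_batches) {
    constexpr std::size_t kWindow = 8;     // requests kept in flight per client
    std::deque<Batch> inflight;
    std::size_t next = 0;

    while (next < total_batches || !inflight.empty()) {
        // Top up the window: issue requests round-robin across servers so
        // every server has work queued before we block on any result.
        while (inflight.size() < kWindow && next < total_batches) {
            int server = servers[next % servers.size()];
            inflight.push_back(request_batch(server, next));
            ++next;
        }
        // Consume the oldest batch while the newer requests keep the
        // servers busy in the background.
        process(wait_for_batch(inflight.front()));
        inflight.pop_front();
    }
}
```

The window size trades memory for overlap: a larger window hides more server latency but keeps more data in flight.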

SLIDE 18

Features

  • Global file handling (global_*)
    • create, exists, delete, fill, drain, rewind, etc.
  • Local file handling (local_*)
    • create, exists, delete, fill, drain, rewind, etc.
  • Add storage nodes dynamically
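
As a rough illustration of the feature list above, the HDFS-like RPC surface could be imagined as an interface along these lines. Everything beyond the operation names on the slide (argument lists, return types, the `HurricaneClient` name itself) is an assumption for illustration, not the project's actual API.

```cpp
// Hypothetical sketch of an HDFS-like RPC surface with the global_*/local_*
// split from the slide; signatures are guesses, not Hurricane's real interface.
#include <cstddef>
#include <string>

class HurricaneClient {
public:
    virtual ~HurricaneClient() = default;

    // Global files: one logical file striped across all storage nodes.
    virtual bool global_create(const std::string& name) = 0;
    virtual bool global_exists(const std::string& name) = 0;
    virtual bool global_delete(const std::string& name) = 0;
    virtual std::size_t global_fill(const std::string& name,
                                    const void* data, std::size_t len) = 0;  // append records
    virtual std::size_t global_drain(const std::string& name,
                                     void* buf, std::size_t len) = 0;        // pull records
    virtual void global_rewind(const std::string& name) = 0;

    // Local files: the same operations, bound to a single storage node.
    virtual bool local_create(const std::string& name) = 0;
    virtual bool local_exists(const std::string& name) = 0;
    virtual bool local_delete(const std::string& name) = 0;
    virtual std::size_t local_fill(const std::string& name,
                                   const void* data, std::size_t len) = 0;
    virtual std::size_t local_drain(const std::string& name,
                                    void* buf, std::size_t len) = 0;
    virtual void local_rewind(const std::string& name) = 0;
};
```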

SLIDE 19


How does it work? – Writing files

[Diagram: clients C1–C3 write file f to servers S1 and S2]

SLIDE 20

How does it work? – Reading files

[Diagram: clients C1–C3 read file f from servers S1 and S2]

SLIDE 21

How does it work? – Join

[Diagram: server S3 joins the cluster; files f and g end up spread across servers S1–S3]

SLIDE 22

Experiments

SLIDE 23

Clusters

           LABOS             DCO                            TREX
OS         Ubuntu 14.04.1    Ubuntu 14.04.1                 Ubuntu 14.04.1
# Cores    32                16                             32
Memory     32 GB             128 GB                         128 GB
Storage    HDD: 474 MB/s     HDD: 140 MB/s, SSD: 243 MB/s   HDD: 414 MB/s, SSD: 464 MB/s
Network    1 Gbit/s          10 Gbit/s                      40 Gbit/s

SLIDE 24

List of experiments

  • Weak scaling
  • Scalability with 1 client
  • Strong scaling
  • Case studies:
    • Unbounded buffer
    • Compression


SLIDE 25

Weak scaling

  • Each node writes/reads 16 GB of data
  • Increasing number of nodes
  • N servers, N clients
  • Measure average bandwidth
  • Compare Chaos storage engine, Hurricane, DFSIO


SLIDE 26

16 GB per node – 40 Gbit/s network

[Plots: TREX SSD write and read, average bandwidth (MB/s) vs. number of machines (1–16). Series: Chaos storage, Hurricane, DFSIO, and baseline (dd, hdparm).]

SLIDE 27

16 GB per node – 10 Gbit/s network

[Plots: DCO SSD write and read, average bandwidth (MB/s) vs. number of machines (1–8). Series: Chaos storage, Hurricane, DFSIO, and baseline (dd, hdparm).]

SLIDE 28

16 GB per node – 1 Gbit/s network

[Plots: LABOS read and write, average bandwidth (MB/s) vs. number of machines (1–8). Series: Chaos storage, Hurricane, DFSIO, and baseline (dd, hdparm).]

SLIDE 29

Weak scaling - Summary

  • Hurricane performs similarly to the Chaos storage engine
  • Scalable
  • Outperforms HDFS by roughly 1.5×
  • Maximizes I/O bandwidth


SLIDE 30

16 GB per node - 64 nodes

[Plots: DCO SSD read and write, average bandwidth (MB/s) vs. number of machines (1–64). Series: Chaos storage, Hurricane, and baseline (dd, hdparm).]

STILL SCALABLE & GOOD I/O BANDWIDTH

SLIDE 31

Scalability with 1 Client

  • Client writes/reads 16 GB of data per server node
  • Increasing number of server nodes
  • N servers, 1 client
  • Measure aggregate bandwidth
  • Only Hurricane is used


SLIDE 32

40 Gbit/s network

[Plots: TREX SSD read and write, aggregate bandwidth (MB/s) vs. number of machines (1–16). Annotations: unknown network problem, baseline, actual bandwidth of the network.]

SLIDE 33

10 Gbit/s network

[Plots: DCO SSD read and write, aggregate bandwidth (MB/s) vs. number of machines (1–8), with the baseline marked.]

Also scales with only 1 client

Uses the I/O bandwidth of all the server nodes

SLIDE 34

Strong scaling

  • Read/write 128 GB of data in total
  • Increasing number of nodes
  • N servers, N clients
  • Measure aggregate bandwidth
  • Compare Chaos storage engine, Hurricane, DFSIO


SLIDE 35

40 Gbit/s network

[Plots: TREX SSD read and write, aggregate bandwidth (MB/s) vs. number of machines (1–16). Series: Chaos storage, Hurricane, DFSIO, and baseline.]

SLIDE 36

1 Gbit/s network

[Plots: LABOS read and write, aggregate bandwidth (MB/s) vs. number of machines (1–8). Series: Chaos storage, Hurricane, DFSIO, and baseline.]

SLIDE 37

Strong scaling - Summary

  • Hurricane performs similarly to the Chaos storage engine
  • Scalable
  • Outperforms HDFS by roughly 1.5×
  • Maximizes I/O bandwidth


SLIDE 38

Case study - Unbounded buffer

  • Each node writes/reads a certain amount of data
  • Use Hurricane to amortize the mismatch between producers and consumers (see the sketch below)
  • Show that it can accommodate temporary spikes seamlessly
  • 16 machines on T-REX -> 16 servers & clients
  • Measure average file size
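
A hedged sketch of the producer/consumer pattern this case study measures, reusing the illustrative `HurricaneClient` interface from SLIDE 18 (so, again, not the real API): the producer keeps filling a global file even when it runs ahead of the consumer, and the backlog spills to the cluster's disks instead of blocking.

```cpp
// Illustrative sketch only: a producer that outruns its consumer keeps filling
// a global file, and the excess spreads over the cluster's disks instead of
// blocking. "hurricane_client.h" and HurricaneClient refer to the hypothetical
// interface sketched under SLIDE 18, not the project's real API.
#include <cstddef>
#include <vector>

#include "hurricane_client.h"   // hypothetical interface from the SLIDE 18 sketch

void produce(HurricaneClient& hc, std::size_t total_records) {
    std::vector<char> record(4096, 'x');   // dummy 4 KB record
    hc.global_create("spike_buffer");
    for (std::size_t i = 0; i < total_records; ++i) {
        // May run far ahead of the consumer; the backlog spills to disk.
        hc.global_fill("spike_buffer", record.data(), record.size());
    }
}

void consume(HurricaneClient& hc) {
    std::vector<char> buf(1 << 20);        // drain in 1 MB chunks
    hc.global_rewind("spike_buffer");
    while (hc.global_drain("spike_buffer", buf.data(), buf.size()) > 0) {
        // Process records at the consumer's own (slower) pace.
    }
}
```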


SLIDE 39

1 TB per node - ~2.5x SSD capacity

[Plot: average file size (GB) over time, TREX SSD, Hurricane, with the maximum SSD capacity marked.]
SLIDE 40

8 TB per node – ~20x SSD capacity

[Plot: average file size (GB) over time, TREX SSD, Hurricane, with the maximum SSD capacity marked.]
SLIDE 41

Case study - Summary

  • We can write much more than the cluster can handle
  • Still full I/O bandwidth!
  • Effectively amortize write-read imbalance
  • No degradation of I/O bandwidth
  • Hurricane can buy you time to react to a write deluge


SLIDE 42

Case study - Compression

  • Each node writes/reads 16 GB of data
  • Compress (LZ4) data at disk rate (see the sketch below)
  • 16 machines on T-REX -> 16 servers & clients
  • Compare three cases:
    • No compression
    • Zeroed data
    • Data amenable to delta encoding
  • Measure average bandwidth
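
A minimal sketch of the compression path used in this case study, assuming data is compressed with the standard LZ4 C API before it reaches the storage layer; `write_to_storage()` is a placeholder for Hurricane's actual write path, not a real function of the project.

```cpp
// Illustrative sketch: compress a buffer with LZ4 before handing it to the
// storage layer, falling back to the raw bytes if the data does not shrink.
#include <lz4.h>

#include <cstddef>
#include <vector>

// Placeholder stand-in for the actual write path.
void write_to_storage(const char* data, int len);

void compress_and_write(const std::vector<char>& input) {
    const int src_size = static_cast<int>(input.size());
    std::vector<char> compressed(LZ4_compressBound(src_size));

    int out_size = LZ4_compress_default(input.data(), compressed.data(),
                                        src_size, static_cast<int>(compressed.size()));
    if (out_size > 0 && out_size < src_size) {
        // Compressible data (e.g. zeroed or delta-encodable buffers):
        // fewer bytes hit the disk, so effective bandwidth goes up.
        write_to_storage(compressed.data(), out_size);
    } else {
        // Incompressible data: store it as-is to avoid paying for expansion.
        write_to_storage(input.data(), src_size);
    }
}
```

With highly compressible input (the zeroed-buffer case in the next slide), far fewer bytes reach the SSD, which is why the measured effective bandwidth can exceed the raw device speed.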


SLIDE 43

16 GB of input

[Bar charts: average bandwidth (MB/s) on 16 TREX machines (SSD), read and write, for no compression, zeroed buffer, and delta-encodable data, with the baseline marked.]

If the data is amenable to compression, there are both speed and storage gains!

Type of data      Input    Output   Read speed   Write speed
No compression    16 GB    16 GB    443 MB/s     455 MB/s
Zeroed buffer     16 GB    65 MB    1260 MB/s    565 MB/s
Delta-encodable   16 GB    7.2 GB   964 MB/s     455 MB/s

SLIDE 44

Future work

SLIDE 45

Future work

  • Fault tolerance
  • Implement Chaos on Hurricane
  • Integrate Hurricane into Hadoop or Spark
  • Further experiments


SLIDE 46

Conclusion

SLIDE 47

Conclusion

  • Hurricane is a scalable, decentralized storage system
  • HDFS-like RPC interface (flexible)
  • Outperforms HDFS
  • Maximizes I/O bandwidth


SLIDE 48

THANK YOU

QUESTIONS?

SLIDE 49

References

1. Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel: Chaos: Scale-out Graph Processing from Secondary Storage. SOSP 2015.
2. Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel: X-Stream: Edge-centric Graph Processing using Streaming Partitions. SOSP 2013.
3. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler: The Hadoop Distributed File System. MSST 2010.
4. Mark Slee, Aditya Agarwal, and Marc Kwiatkowski: Thrift: Scalable Cross-Language Services Implementation. Facebook white paper, 2007.
5. Edmund B. Nightingale, Jeremy Elson, Jinliang Fan, Owen Hofmann, Jon Howell, and Yutaka Suzue: Flat Datacenter Storage. OSDI 2012.
6. Michael Mitzenmacher: The Power of Two Choices in Randomized Load Balancing. IEEE Transactions on Parallel and Distributed Systems, 2001.
