SLIDE 1

On Utilization of Contributory Storage in Desktop Grids

Chreston Miller, Ali R. Butt, and Patrick Butler
Department of Computer Science, Virginia Tech

SLIDE 2

Contributory Storage: Cheap Storage using Shared Resources

  • Distributed setup with many participants
  • Nodes contribute storage space for sharing
  • Create a uniform global storage space
  • Typically supports decentralized store/lookup
  • Many systems build upon this idea:
    – PAST, CFS, OceanStore, Kosha, LOCKSS, …

SLIDE 3

Goal: Use of Contributory Storage in Scientific Computing

  • Advantages:
    – Provides economical storage with large capacity
    – Supports parallel access to distributed resources
  • Challenges:
    – Limited individual file sizes
    – Unreliable and transient participants

 Simple replication or file splitting alone is unlikely to work


Need for techniques to use shared storage in scientific computing

SLIDE 4

Our Contribution: PeerStripe Reliable Shared Storage

  • Utilizes storage contributed by peer nodes
  • Adapts data striping to support large files
  • Employs error coding for fault tolerance
  • Leverages multicast for efficient replication
  • Supports easy integration with applications

SLIDE 5

Outline

  • Preamble
  • End to our Means
  • Evaluation Study
  • Conclusion

SLIDE 6

Outline

  • Preamble
    – Problem
    – Motivation
    – Our Contributions
    – Core Technologies
  • End to our Means
  • Evaluation Study
  • Conclusion

SLIDE 7

Core Technologies: Structured Peer-to-Peer Networks

  • Implement the Distributed Hash Table (DHT) abstraction
  • Facilitate decentralized operation
  • Provide self-organization of participants
  • Systems based on these networks provide:
    – Mobility and location transparency
    – Load-balancing
  • We use the FreePastry substrate from Rice University and Microsoft Research (a minimal sketch of the DHT abstraction follows)
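To make the DHT abstraction concrete, here is a minimal, self-contained sketch in Python. It is an illustration only: FreePastry itself is a Java substrate, and the class and method names here are hypothetical. Keys are hashed onto a ring of node IDs, and the node whose ID follows the key's hash stores the value:

```python
import bisect
import hashlib

class TinyDHT:
    """Minimal consistent-hashing sketch of the DHT abstraction
    (illustration only; not the FreePastry API)."""

    def __init__(self, node_ids):
        # Place nodes on a circular 160-bit ID space.
        self.ring = sorted(self._hash(n) for n in node_ids)
        self.store = {nid: {} for nid in self.ring}

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(str(key).encode()).hexdigest(), 16)

    def _owner(self, key):
        # The first node clockwise from the key's hash owns it.
        h = self._hash(key)
        i = bisect.bisect_left(self.ring, h) % len(self.ring)
        return self.ring[i]

    def put(self, key, value):
        self.store[self._owner(key)][key] = value

    def get(self, key):
        return self.store[self._owner(key)].get(key)

dht = TinyDHT(["nodeA", "nodeB", "nodeC"])
dht.put("chunk-1", b"...data...")
assert dht.get("chunk-1") == b"...data..."
```

Self-organization in a real DHT comes from maintaining this key-to-node mapping as participants join and leave; the sketch fixes the membership for brevity.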

SLIDE 8

Core Technologies: Increasing Data Availability

  • Erasure codes
    – Provide redundancy against failures
    – Incur less space overhead than replication (a worked example follows)
    – Advanced codes can withstand multiple failures
  • Multicast communication protocol
    – Supports simultaneous messaging to many nodes
    – Can be leveraged for efficient replication
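As a rough worked example of the space trade-off, using the encoding numbers reported later on Slide 24: tolerating a single failure with plain replication requires one full extra copy of the data, i.e., 100% space overhead. The XOR code stores a 4 MB file in 6 MB (50% overhead), and the online code in only 4.12 MB (about 3% overhead) while protecting against multiple failures.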

SLIDE 9

Outline

  • Preamble
  • End to our Means
    – Software Architecture
    – Splitting a file
    – Redundancy with multicast
    – Error coding
    – Interfacing with applications
  • Evaluation Study
  • Conclusion

SLIDE 10

PeerStripe Software Tasks

  1. Storing large files
     – Split the file into chunks of different sizes
     – Use DHTs to store the chunks
  2. Error coding chunks
     – Use the online code to provide redundancy
  3. Chunk replication
     – Replicate commonly used chunks
  4. Interfacing with applications
     – Provide APIs for applications to use

SLIDE 11

Part 1: Splitting Files into Chunks

[Diagram: the Splitter queries nodes for available capacity and divides the data file into x chunks of n blocks each; the Encoder turns each chunk's n blocks into m error-coded blocks, producing x*m encoded blocks that are stored on the nodes.]

SLIDE 12

Part 2: Error Coding Chunks

  • Each chunk is separately error coded (a sketch of these steps follows):
    1. The chunk is split into n equal-size blocks
    2. The blocks are error coded into m encoded blocks
    3. The encoded blocks are inserted into the DHT
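A minimal sketch of these three steps in Python. The names are hypothetical, and a single XOR parity block stands in for the online code, so here m = n + 1; `dht` is assumed to expose the put interface from the Slide 7 sketch:

```python
def split_into_blocks(chunk: bytes, n: int) -> list[bytes]:
    """Step 1: split a chunk into n equal-size blocks (zero-padded)."""
    size = -(-len(chunk) // n)                      # ceiling division
    padded = chunk.ljust(n * size, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(n)]

def xor_parity(blocks: list[bytes]) -> bytes:
    """Step 2 (simplified): one XOR parity block over the n data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def store_chunk(dht, chunk_id: str, chunk: bytes, n: int = 4) -> None:
    """Step 3: insert all m = n + 1 encoded blocks into the DHT."""
    blocks = split_into_blocks(chunk, n)
    for j, block in enumerate(blocks + [xor_parity(blocks)]):
        dht.put(f"{chunk_id}:block:{j}", block)
```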

SLIDE 13

Investigation of Error Codes

  • Error codes tested and used:
    – XOR code: protects against single failures
    – Online code: protects against multiple failures
  • Trade-off: good redundancy with small space overhead, but recovery may consume resources (see the recovery sketch below)

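Why the XOR code protects against exactly one failure is easy to see in code: XOR-ing the parity block with all surviving data blocks reproduces the missing one. A sketch continuing the simplified encoder from Slide 12:

```python
def recover_block(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block: missing = parity XOR survivors."""
    missing = bytearray(parity)
    for block in surviving:
        for i, byte in enumerate(block):
            missing[i] ^= byte
    return bytes(missing)

blocks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(blocks)            # encoder from the Slide 12 sketch
assert recover_block([blocks[0], blocks[2]], parity) == blocks[1]
```

Losing two blocks defeats this scheme, which is why the online code, at slightly more space and encoding time, is used when multiple failures must be tolerated.
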
SLIDE 14

Part 3: Multicast-based Replication

  • Leverage multicast for efficient and fast data dissemination to multiple destinations
  • Faster recovery at the cost of space
  • Challenge: creation of a multicast tree from source to replica destinations

SLIDE 15

Creating a Multicast Tree

  • Use a greedy approach:
    – Start from the source S
    – Using the locality-aware DHT, select random nodes close to S as the first tier
    – Repeat the selection at each tier until the replica location R is reached
  • Employ standard multicast protocols, e.g., Bullet, to push data from S to R (see the sketch below)
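A sketch of the greedy tier-by-tier construction in Python; `nearby(node, k)` stands in for the locality-aware DHT lookup and is an assumed helper, not an API from the paper:

```python
def build_multicast_tree(source, replicas, nearby, fanout=4):
    """Greedily grow the tree one tier at a time, starting at the source,
    until every replica destination has been attached."""
    tree = {source: []}                 # parent -> list of children
    tier = [source]
    pending = set(replicas)
    while pending:
        next_tier = []
        for parent in tier:
            for child in nearby(parent, fanout):
                if child in tree:
                    continue            # already placed in an earlier tier
                tree[parent].append(child)
                tree[child] = []
                pending.discard(child)
                next_tier.append(child)
        if not next_tier:               # no new nodes found: attach the rest
            for r in pending:
                tree[source].append(r)
                tree[r] = []
            break
        tier = next_tier
    return tree
```

Data is then pushed down this tree with a standard multicast protocol such as Bullet.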

SLIDE 16

Part 4: Interfacing with Applications

  • Modify applications to use direct calls to the PeerStripe API
    – Works well for new applications
  • Link applications with an interposing library to redirect I/O
    – Transparent integration with existing applications (a sketch follows)
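The interposition idea in miniature, in Python. A real interposing library would typically wrap the C library's I/O calls (e.g., via LD_PRELOAD), so this is only an analogy; `store_chunk` and `dht` come from the earlier hypothetical sketches, and the `/peerstripe/` prefix is invented for illustration:

```python
import builtins
import io

_real_open = builtins.open

class DHTWriteBuffer(io.BytesIO):
    """Buffers writes in memory, then pushes the data to the store on close."""
    def __init__(self, dht, path):
        super().__init__()
        self._dht, self._path = dht, path

    def close(self):
        store_chunk(self._dht, self._path, self.getvalue())  # Slide 12 sketch
        super().close()

def interposed_open(path, mode="r", *args, **kwargs):
    """Redirect writes under a designated prefix to contributory storage;
    every other call falls through to the real open()."""
    if str(path).startswith("/peerstripe/") and mode == "wb":
        return DHTWriteBuffer(dht, str(path))
    return _real_open(path, mode, *args, **kwargs)

builtins.open = interposed_open   # existing application code is unchanged
```

New applications can instead call the storage API (here, `store_chunk` and the DHT put/get interface) directly.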

SLIDE 17

Outline

  • Preamble
  • End to our Means
  • Evaluation Study
    – Simulation
    – Real world: PlanetLab and Condor
  • Conclusion

SLIDE 18

Evaluation: Overview

  1. Simulation study:
     – Successful file stores
     – Number and size of chunks created
     – System utilization (in terms of storage capacity)
     – File availability with error coding
     – Error code performance
     – Effects of participant churn
  2. Design verification on PlanetLab
  3. Integration with the Condor desktop grid

SLIDE 19

Simulation Study Setup

  • 10,000-node directly connected network
  • Node capacities assigned with mean 45 GB and variance 10 GB
  • File system trace of 1.2M files totaling 278.7 TB
  • Compared with the PAST and CFS storage systems

SLIDE 20

Number of Successful File Stores

  • 7.0x improvement over PAST
  • 2.9x improvement over CFS

SLIDE 21

Number and Size of Chunks

  • CFS: 61.25 chunks on average, with stdev of 13.8
    – Fixed chunk size of 4 MB
  • PeerStripe: 3.72 chunks on average, with stdev of 3.1
    – Average chunk size of 81.28 MB, with stdev of 19.9 MB

 Fewer chunks in PeerStripe allow:
  • Fewer expensive p2p lookups
  • Performance similar to PAST

SLIDE 22

Overall System Capacity Utilization

  • PeerStripe: 20.19% better than PAST
  • PeerStripe: 7.18% better than CFS
  • PeerStripe can utilize the available storage capacity more efficiently, even at higher utilization

SLIDE 23

Error Coding: File Availability

  • XOR code: 23% fewer failures
  • Online code: 32% fewer failures
  • The online code provides excellent fault tolerance against node failures

SLIDE 24

Error Coding Performance

  • Compared XOR (1:1) and the online code against a NULL code
  • XOR: a factor of 3.3 faster than the online code
  • Online code: slower than XOR, but
    – Decoding can start as soon as a block becomes available and can be overlapped with retrieval of other blocks
  • The efficiency of the online code overshadows its overhead

Erasure code   Encoded size (MB)   Size overhead   Encoding time   Time overhead
Null           4                   0%              11              0%
XOR            6                   50%             79              618%
Online         4.12                3%              264             2300%

SLIDE 25

Effects of Participant Churn

  • Failed up to 20% of total nodes
  • 29.3 GB of data was regenerated per node failure
    – Total of 58,625.8 GB regenerated
  • 142.2 GB of data was lost, which is small compared to the 278.7 TB of total data
  • The data recreated per failure is small: 0.01% of the total

Nodes failed      Data lost        Data regenerated
(% of total)      total (GB)       total (GB)    average (GB)   stdev (GB)
10%               (not reported)   28,044.35     28.04          79.85
20%               142.18           58,625.78     29.31          80.02

SLIDE 26

Verification on PlanetLab

  • 40 distributed sites
  • Number of failed stores reduced by:
    – 330% w.r.t. PAST
    – 105% w.r.t. CFS
  • Storage utilization: CFS 52%, PAST 47%, PeerStripe 63%
  • Online codes provided 98.6% availability through four node failures

SLIDE 27

Interfacing with Condor

  • Utilized a 32-node Condor pool
  • CFS and PeerStripe both worked for smaller files
  • DHT lookups introduced an overhead, but PeerStripe requires few of them
  • The overhead for PeerStripe is small

SLIDE 28

Outline

  • Preamble
  • End to our Means
  • Evaluation Study
  • Conclusion

SLIDE 29

Conclusion

  • P2P-based storage can be extended with erasure coding and striping to provide robust, scalable, and reliable distributed storage for scientific computing
  • PeerStripe achieves better utilization of the collective capacity of nodes, with good performance
  • Error coding is effective in providing fault tolerance and data availability
  • Multicast can be used for replica maintenance
  • Use of an interposing library allows easy integration with new and existing applications

SLIDE 30

Questions?

  • chmille3@cs.vt.edu
  • butta@cs.vt.edu
  • http://research.cs.vt.edu/dssl/
