Towards A Parallel and Restartable Data Transfer Mechanism in iRODS



slide-1
SLIDE 1

Towards A Parallel and Restartable Data Transfer Mechanism in iRODS

Zoey Greer Jason Coposky Terrell Russell Hao Xu June 5, 2018

slide-2
SLIDE 2

Introduction

The current iRODS implementation supports only limited parallel transfer and restart capability. We introduce a design that extends current iRODS to support multiple tasks related to parallel transfer and restart in a unified, general solution. We want to

◮ extend rather than completely rewrite the current iCAT.
◮ treat put, get, and replication symmetrically.
◮ build the API up from microservices.
◮ support parallel transfer.
◮ support distributed storage of data.
◮ support partial replicas.
◮ support automatic restart.
◮ support partial synchronization.
◮ support distributed storage of the iCAT efficiently.

slide-3
SLIDE 3

The Design: Current

Entities: data object, replica, resource. Relationship cardinalities, in diagram order: 1-n, 1-1.

Figure: Entity-Relationship Diagram

slide-4
SLIDE 4

The Design: Parallel and Restart

Entities: data object, replica, block, resource. Relationship cardinalities, in diagram order: 1-n, 1-n, m-n, n-1.

Figure: Entity-Relationship Diagram

slide-5
SLIDE 5

Block Level

◮ Block level

                    put    get
client to server    y      y
client to client    n      n
server to server    y      y/n

◮ Data Object level: put-get-replicate

slide-6
SLIDE 6

Data Types

type Error
type Range -- = (Int, Bitmap)
type Block
type Data_object -- = (Path, Timestamp)
type Replica -- = (Data_object, Host, Replica_num)
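As a rough illustration, the types above could be modeled in Python as follows. The field names, the `contiguous` and `blocks` helpers, and the reading of `Range = (Int, Bitmap)` as "first block index plus a bitmap of present blocks" are assumptions made for this sketch; the slides do not spell them out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class DataObject:
    """Data_object = (Path, Timestamp), following the slide's comment."""
    path: str
    timestamp: datetime


@dataclass(frozen=True)
class Replica:
    """Replica = (Data_object, Host, Replica_num)."""
    data_object: DataObject
    host: str
    replica_num: int


@dataclass(frozen=True)
class Range:
    """Range = (Int, Bitmap).

    Assumption: `start` is the index of the first block covered, and bit i of
    `bitmap` is set when block `start + i` belongs to the range.
    """
    start: int
    bitmap: int

    @classmethod
    def contiguous(cls, first: int, end: int) -> "Range":
        """Build a range covering blocks first, ..., end - 1 (half-open)."""
        return cls(first, (1 << (end - first)) - 1)

    def blocks(self) -> List[int]:
        """Return the block indices contained in this range, in order."""
        return [self.start + i
                for i in range(self.bitmap.bit_length())
                if (self.bitmap >> i) & 1]
```

A bitmap lets a single Range describe a replica with holes, for example `Range(0, 0b101)` for blocks 0 and 2.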

slide-7
SLIDE 7

block_put

Push a block to a resource using block_put. In the following, we use a default block size of 4 MB.

block_put : (Replica, Range, [Block]) -> ()

This can be used in various operations.
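A minimal sketch of what block_put might look like against an in-memory store. The `Store` dictionary, the string replica identifier, and the simplification of Range to a single starting block index are all assumptions for illustration, not the iRODS API.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4 * 1024 * 1024  # default block size from the slide: 4 MB

# Hypothetical stand-in for a resource: (replica id, block index) -> bytes.
Store = Dict[Tuple[str, int], bytes]


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a byte string into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_put(store: Store, replica: str, first: int, blocks: List[bytes]) -> None:
    """Write `blocks` into `replica` at block indices first, first + 1, ..."""
    for offset, block in enumerate(blocks):
        store[(replica, first + offset)] = block
```

Because each block is addressed by its own index, writing the same block twice is harmless, which is what makes a restarted transfer safe in this sketch.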

slide-8
SLIDE 8

data_object

The put operation is initiated by the client through the data_object operation.

data_object : Data_object -> [(Replica, Range)]

This request can be sent to any server.

slide-9
SLIDE 9

replica

For each resource, the client starts putting blocks into replicas using the replica operation.

replica : (Replica, Range) -> Range

The returned range is the range of blocks within the input range that already exist on the resource. Based on the returned range, the client sends the remaining blocks to the resource.
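The restart logic implied by the replica operation comes down to a set difference. The sketch below models a Range as a plain set of block indices, which is an assumption for brevity; `blocks_to_send` is a hypothetical helper name.

```python
from typing import Set


def replica(existing_on_resource: Set[int], requested: Set[int]) -> Set[int]:
    """Return the blocks of `requested` that the resource already holds."""
    return existing_on_resource & requested


def blocks_to_send(requested: Set[int], already_present: Set[int]) -> Set[int]:
    """Blocks the client still has to push: requested minus already present."""
    return requested - already_present
```

If blocks 0-63 already exist and the client asks about 0-127, only 64-127 remain to be sent.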

slide-10
SLIDE 10

block_get

Pull a block from a resource using block_get.

block_get : (Replica, Range) -> [Block]

slide-11
SLIDE 11

put

Sequence diagram, participants: client, server1, server2.
data_object returns [(server2, 0-128)]
replica returns 0-64
block_put(64-128)
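The put sequence (data_object, then replica, then block_put) can be simulated end to end. The `Server` class, the placement rule inside `data_object`, and the half-open reading of ranges such as 0-128 are assumptions for this sketch.

```python
from typing import Dict, List, Set, Tuple


class Server:
    """Hypothetical in-memory server holding the blocks of one replica."""

    def __init__(self, name: str, blocks: Dict[int, bytes]) -> None:
        self.name = name
        self.blocks = dict(blocks)

    def replica(self, first: int, end: int) -> Set[int]:
        """Report which blocks in [first, end) are already stored."""
        return {i for i in range(first, end) if i in self.blocks}

    def block_put(self, blocks: Dict[int, bytes]) -> None:
        """Store the given blocks, keyed by block index."""
        self.blocks.update(blocks)


def data_object(servers: List[Server], total_blocks: int) -> List[Tuple[Server, int, int]]:
    """Catalog lookup: which server should hold which block range.

    Assumption: the whole object is placed on the last server in the list.
    """
    return [(servers[-1], 0, total_blocks)]


def put(servers: List[Server], source: Dict[int, bytes]) -> List[int]:
    """Run data_object, replica, block_put; return the block indices sent."""
    sent: List[int] = []
    for server, first, end in data_object(servers, len(source)):
        present = server.replica(first, end)
        missing = {i: source[i] for i in range(first, end) if i not in present}
        server.block_put(missing)
        sent.extend(missing)
    return sent
```

With server2 already holding blocks 0-63 of a 128-block object, `put` transfers only blocks 64-127, and a second call transfers nothing.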

slide-12
SLIDE 12

get

Sequence diagram, participants: client, server1, server2.
data_object returns [(server2, 0-128)]
replica returns 0-128
block_get(64-128)
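One way to read the get sequence is as a restarted download: the replica reports blocks 0-128 as available, and the client pulls only the blocks it does not yet hold. That reading, the dictionary-based stores, and the function name `restartable_get` are assumptions for this sketch.

```python
from typing import Dict, Set


def restartable_get(remote: Dict[int, bytes], local: Dict[int, bytes],
                    first: int, end: int) -> Set[int]:
    """Fetch into `local` the blocks of [first, end) it lacks; return their indices."""
    # replica: which requested blocks does the remote side have?
    available = {i for i in range(first, end) if i in remote}
    # Skip blocks that an earlier, interrupted transfer already delivered.
    wanted = available - set(local)
    # block_get: pull only the missing blocks.
    local.update({i: remote[i] for i in wanted})
    return wanted
```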

slide-13
SLIDE 13

replicate

Sequence diagram, participants: client, server1, server2, server3.
data_object returns [(server2, 0-128)]
replica returns 0-128
replica returns 0-64
replicate
block_put

slide-14
SLIDE 14

Storing incomplete replica

metadata blocks

Figure: Incomplete replica

The metadata contains the Replica and the Range of available blocks.
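A small sketch of such a metadata record, assuming a string replica identifier and a sorted list of block indices in place of the Range type. The class name, its methods, and the JSON layout are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicaMetadata:
    """Metadata of a possibly incomplete replica."""
    replica: str                                         # hypothetical replica identifier
    available: List[int] = field(default_factory=list)   # indices of blocks present

    def record(self, block_index: int) -> None:
        """Note that a block has arrived; recording it twice has no effect."""
        if block_index not in self.available:
            self.available.append(block_index)
            self.available.sort()

    def is_complete(self, total_blocks: int) -> bool:
        """True once blocks 0, ..., total_blocks - 1 are all present."""
        return self.available == list(range(total_blocks))

    def to_json(self) -> str:
        """Serialize the record, for example to store it next to the blocks."""
        return json.dumps({"replica": self.replica, "available": self.available})
```

Keeping the available range in the metadata is what allows a transfer to resume without rescanning the stored blocks.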

slide-15
SLIDE 15

Parallel put

metadata 1 replica 1 metadata 2 replica 2 blocks

Figure: Multi-part put
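A multi-part put could be sketched with a thread pool, each worker pushing one part to its own partial replica. The round-robin split and the dictionary-based replicas are assumptions for this sketch; the slides do not say how blocks are assigned to replicas.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def put_part(target: Dict[int, bytes], part: Dict[int, bytes]) -> int:
    """Push one part to one partial replica; return the number of blocks written."""
    target.update(part)
    return len(part)


def parallel_put(source: Dict[int, bytes], replicas: List[Dict[int, bytes]]) -> int:
    """Split blocks round-robin across the replicas and push the parts concurrently."""
    parts: List[Dict[int, bytes]] = [{} for _ in replicas]
    for index in sorted(source):
        parts[index % len(replicas)][index] = source[index]
    # Each worker writes to a different replica, so no locking is needed here.
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        return sum(pool.map(put_part, replicas, parts))
```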

slide-16
SLIDE 16

Parallel get

metadata 1 replica 1 metadata 2 replica 2 blocks metadata 3

Figure: Multi-part get
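A multi-part get can be sketched the same way: plan which partial replica serves each block, fetch the parts concurrently, and reassemble them in block order. The "first replica that has the block" rule and the dictionary-based replicas are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def get_part(replica: Dict[int, bytes], wanted: List[int]) -> Dict[int, bytes]:
    """Pull the requested blocks from one partial replica."""
    return {i: replica[i] for i in wanted}


def parallel_get(replicas: List[Dict[int, bytes]], total_blocks: int) -> bytes:
    """Fetch every block from the first replica that has it, then join in order."""
    plan: List[List[int]] = [[] for _ in replicas]
    for index in range(total_blocks):
        owner = next((n for n, r in enumerate(replicas) if index in r), None)
        if owner is None:
            raise LookupError(f"block {index} is not available on any replica")
        plan[owner].append(index)
    with ThreadPoolExecutor(max_workers=max(1, len(replicas))) as pool:
        parts = list(pool.map(get_part, replicas, plan))
    merged: Dict[int, bytes] = {}
    for part in parts:
        merged.update(part)
    return b"".join(merged[i] for i in range(total_blocks))
```

No single replica needs to be complete, as long as the partial replicas together cover every block.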