R.A.I.D.F.S Randomized Aggregation Independent Distributed File System - PowerPoint Presentation


SLIDE 1

Randomized Aggregation Independent Distributed File System

R.A.I.D.F.S

P2P Distributed File System with an API for Map-Reduce Integration

Sven Reber, Jérémy Gotteland, David Froelicher, Alban Marguet, Pascal Cudré, Valérian Pittet

SLIDE 2

Context

  • clusters are hard to configure and expensive to maintain
  • everyone has a computer
  • lots of unused storage and computational resources on end-user machines
  • network connections are improving
SLIDE 3

Goals

A peer-to-peer DFS that is:

  • designed to support Map-Reduce operations
    ○ chunking by line blocks
    ○ text files
  • resilient
  • easy to configure (dynamic configuration)
    ○ simply connect to the network and run your jobs
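Chunking by line blocks keeps every line intact inside a single chunk, so a map task can presumably process each chunk on its own. A minimal sketch, assuming a fixed, hypothetical `lines_per_chunk` parameter; the slides do not state the actual block size:

```python
from typing import List


def chunk_by_lines(text: str, lines_per_chunk: int) -> List[str]:
    """Split a text file into chunks made of whole lines.

    `lines_per_chunk` is a hypothetical parameter; the real block size
    used by R.A.I.D.F.S is not given in the slides.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")
    # keepends=True preserves the newline characters, so joining the
    # chunks back together reproduces the original text exactly.
    lines = text.splitlines(keepends=True)
    return [
        "".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


print(chunk_by_lines("a\nb\nc\n", 2))  # ['a\nb\n', 'c\n']
```

Because no line is ever split across two chunks, a record never has to be reassembled from neighboring chunks before mapping.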

SLIDE 4

Architecture

SLIDE 5

DFS - Stabilization

A GlobalChunkField (GCF) value <= 3 (an arbitrary threshold) is an unstable state: the chunk has too few replicas

SLIDE 6

DFS - Stabilization

The peer looks at its neighbors' chunkfields
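The aggregation step can be pictured as counting, for each chunk index, how many known peers (the peer itself plus its neighbors) hold it. A sketch, assuming each chunkfield is modeled as a set of chunk indices and reusing the slides' threshold of 3, at or below which a chunk is unstable; the function names are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Set

UNSTABLE_THRESHOLD = 3  # slides: a GCF value <= 3 is unstable (arbitrary)


def global_chunk_field(
    num_chunks: int, own: Set[int], neighbors: Iterable[Set[int]]
) -> Dict[int, int]:
    """Count how many known peers (self + neighbors) hold each chunk index."""
    counts: Counter = Counter()
    for chunkfield in [own, *neighbors]:
        counts.update(chunkfield)  # +1 for every chunk this peer holds
    return {chunk: counts[chunk] for chunk in range(num_chunks)}


def unstable_chunks(gcf: Dict[int, int],
                    threshold: int = UNSTABLE_THRESHOLD) -> Set[int]:
    """Chunks whose replica count is at or below the threshold."""
    return {chunk for chunk, count in gcf.items() if count <= threshold}


gcf = global_chunk_field(3, {0, 1}, [{0}, {0, 2}, {0, 1}])
print(gcf)                   # {0: 4, 1: 2, 2: 1}
print(unstable_chunks(gcf))  # {1, 2}
```

In this sketch each peer only sees its own neighborhood, so the counts are a local view of replication rather than an exact network-wide total.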

SLIDE 7

DFS - Stabilization

The peer randomly fetches one of the insufficiently replicated chunks
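Picking uniformly at random among under-replicated chunks presumably keeps neighbors that run the same procedure concurrently from all fetching the same chunk. A sketch, assuming the GCF is a dict from chunk index to replica count; `pick_chunk_to_replicate` and its parameters are hypothetical:

```python
import random
from typing import Dict, Optional, Set

UNSTABLE_THRESHOLD = 3  # slides: a GCF value <= 3 is unstable (arbitrary)


def pick_chunk_to_replicate(
    gcf: Dict[int, int],
    held: Set[int],
    threshold: int = UNSTABLE_THRESHOLD,
    rng: Optional[random.Random] = None,
) -> Optional[int]:
    """Return a random under-replicated chunk this peer does not hold yet,
    or None if there is nothing to replicate."""
    rng = rng or random.Random()
    # Sorting only makes the choice reproducible when `rng` is seeded.
    candidates = sorted(
        chunk for chunk, count in gcf.items()
        if count <= threshold and chunk not in held
    )
    return rng.choice(candidates) if candidates else None


print(pick_chunk_to_replicate({0: 5, 1: 2, 2: 1}, held={2}))  # 1
```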

SLIDE 8

DFS - Stabilization

The peer does not download a chunk if it finds enough replicas
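Presumably this check exists because another peer may have replicated the chunk since the GCF was last computed. A sketch, assuming the peer can query a fresh replica count just before downloading; `count_replicas` and `download` are hypothetical callbacks standing in for network calls:

```python
from typing import Callable, Optional

UNSTABLE_THRESHOLD = 3  # slides: a GCF value <= 3 is unstable (arbitrary)


def maybe_download(
    chunk: int,
    count_replicas: Callable[[int], int],
    download: Callable[[int], bytes],
    threshold: int = UNSTABLE_THRESHOLD,
) -> Optional[bytes]:
    """Download `chunk` only if it is still under-replicated."""
    if count_replicas(chunk) > threshold:
        return None  # enough replicas already exist: skip the download
    return download(chunk)


print(maybe_download(7, lambda c: 4, lambda c: b"payload"))  # None
print(maybe_download(7, lambda c: 2, lambda c: b"payload"))  # b'payload'
```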

SLIDE 9

DFS - Stabilization

A file is “stable” when there are enough replicas
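Reading “enough replicas” per chunk, a file is stable once every one of its chunks is above the instability threshold. A sketch under that assumption, with a hypothetical `file_is_stable` helper:

```python
from typing import Dict

UNSTABLE_THRESHOLD = 3  # slides: a GCF value <= 3 is unstable (arbitrary)


def file_is_stable(gcf: Dict[int, int], num_chunks: int,
                   threshold: int = UNSTABLE_THRESHOLD) -> bool:
    """True when every chunk 0..num_chunks-1 has more than `threshold` replicas.

    Chunks missing from the GCF count as having zero replicas.
    """
    return all(gcf.get(chunk, 0) > threshold for chunk in range(num_chunks))


print(file_is_stable({0: 4, 1: 5}, num_chunks=2))  # True
print(file_is_stable({0: 4, 1: 3}, num_chunks=2))  # False
```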

SLIDE 10

DFS - put

A new file is added with the “put” command

SLIDE 11

DFS - put

The peer publishes an index update; neighbors discover it by checking every 20 s
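One way to picture the discovery step is a polling loop that merges neighbors' index entries into the local index every 20 s. A sketch, assuming the index maps a file name to its number of chunks; `merge_new_entries`, `poll` and the `fetch_neighbor_indexes` callback are hypothetical:

```python
import time
from typing import Callable, Dict, Iterable

POLL_INTERVAL_S = 20.0  # slides: neighbors discover index updates every 20 s

# Hypothetical index format: file name -> number of chunks.
Index = Dict[str, int]


def merge_new_entries(local: Index, remote: Iterable[Index]) -> Index:
    """Add files unknown to `local`; return only the newly discovered ones."""
    new: Index = {}
    for index in remote:
        for name, num_chunks in index.items():
            if name not in local and name not in new:
                new[name] = num_chunks
    local.update(new)
    return new


def poll(
    local: Index,
    fetch_neighbor_indexes: Callable[[], Iterable[Index]],
    rounds: int,
    interval_s: float = POLL_INTERVAL_S,
) -> None:
    """Run `rounds` discovery rounds, sleeping `interval_s` between them."""
    for i in range(rounds):
        merge_new_entries(local, fetch_neighbor_indexes())
        if i < rounds - 1:
            time.sleep(interval_s)


local = {"a.txt": 2}
print(merge_new_entries(local, [{"a.txt": 2, "b.txt": 3}]))  # {'b.txt': 3}
```

Each newly discovered file would then enter the stabilization process described above.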

SLIDE 12

DFS - put

Neighbors try to stabilize the file (same process as before)

SLIDE 13

DFS - put

Neighbors randomly fetch missing chunks to complete their GCF

SLIDE 14

DFS - other commands

Available commands:

  • ls
  • put
  • get
  • rm
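To make the semantics of the four commands concrete, here is a single-process toy model. It is an assumption-laden sketch, not the R.A.I.D.F.S API: files live as line-block chunks in a local dict with no networking or replication, and `ToyDFS` and its chunk size are hypothetical:

```python
from typing import Dict, List


class ToyDFS:
    """Local stand-in for ls/put/get/rm; no peers, no replication."""

    def __init__(self, lines_per_chunk: int = 2) -> None:
        self.lines_per_chunk = lines_per_chunk  # hypothetical block size
        self.files: Dict[str, List[str]] = {}   # file name -> chunks

    def ls(self) -> List[str]:
        return sorted(self.files)

    def put(self, name: str, text: str) -> None:
        # Chunk by whole lines, as the slides' goals describe.
        lines = text.splitlines(keepends=True)
        n = self.lines_per_chunk
        self.files[name] = ["".join(lines[i:i + n])
                            for i in range(0, len(lines), n)]

    def get(self, name: str) -> str:
        return "".join(self.files[name])  # raises KeyError if unknown

    def rm(self, name: str) -> None:
        self.files.pop(name, None)  # removing an unknown file is a no-op


fs = ToyDFS()
fs.put("b.txt", "1\n2\n3\n")
fs.put("a.txt", "x\n")
print(fs.ls())  # ['a.txt', 'b.txt']
```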
SLIDE 15

Map operation

  • A peer starts a job
  • MapFiles(jobid, Resource, Initiator, MapFunction)
    ○ Each chunk is mapped to its result files (which can be created in advance), i.e. one folder for each mapped chunk
    ○ One key chunk for each key discovered in the original chunk
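The per-chunk map step can be sketched as running a map function over each line of a chunk and grouping the emitted values by key, one group per key chunk. Assumptions: the map function is applied line by line and emits `(key, value)` string pairs, and the `<jobid>/map/<chunk>/<key>` folder layout is hypothetical; the slides specify neither:

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: one input line -> zero or more (key, value) pairs.
MapFunction = Callable[[str], Iterable[Tuple[str, str]]]


def map_chunk(chunk: str, map_fn: MapFunction) -> Dict[str, List[str]]:
    """Map one text chunk; each dict entry is the content of one key chunk."""
    key_chunks: Dict[str, List[str]] = defaultdict(list)
    for line in chunk.splitlines():
        for key, value in map_fn(line):
            key_chunks[key].append(value)
    return dict(key_chunks)


def key_chunk_path(job_id: str, chunk_id: int, key: str) -> str:
    """Hypothetical location: one folder per mapped chunk, one file per key."""
    return f"{job_id}/map/{chunk_id}/{key}"


def word_count_map(line: str) -> Iterable[Tuple[str, str]]:
    for word in line.split():
        yield word, "1"


out = map_chunk("a b\nb\n", word_count_map)
print(out)                              # {'a': ['1'], 'b': ['1', '1']}
print(key_chunk_path("job42", 0, "a"))  # job42/map/0/a
```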

SLIDE 16

MapFile

SLIDE 17

Reduce operation

  • Keys are discovered during the map phase
  • Keys are sent to the initiator
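On the initiator's side, the key reports can be merged into one key list, from which the ReduceFile is presumably laid out. A sketch, assuming each mapper reports the set of keys it discovered in its chunk; both function names are hypothetical:

```python
from typing import Dict, Iterable, List, Set


def keys_discovered(key_chunks: Dict[str, List[str]]) -> Set[str]:
    """Mapper side: the keys found while mapping one chunk."""
    return set(key_chunks)


def collect_keys(reports: Iterable[Set[str]]) -> List[str]:
    """Initiator side: union of the keys reported by all mappers.

    Sorting is an assumption, used only to give a deterministic order.
    """
    all_keys: Set[str] = set()
    for keys in reports:
        all_keys |= keys
    return sorted(all_keys)


reports = [keys_discovered({"b": ["1"], "a": ["1"]}), {"c", "a"}]
print(collect_keys(reports))  # ['a', 'b', 'c']
```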

SLIDE 18

ReduceFile

  • The initiator prepares the ReduceFile on the DFS

SLIDE 19

ReduceFile

  • A peer that wants to create a ReduceFile chunk downloads the needed keyChunks
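Building a ReduceFile chunk then amounts to collecting, from every mapped chunk's folder, the key chunks for the keys this ReduceFile chunk covers and applying the reduce function. A sketch, assuming the downloaded key chunks are already available as nested dicts (`map_folders`, standing in for the network fetch) and that the output is one `key<TAB>result` line per key; `build_reduce_chunk` and `sum_reduce` are hypothetical:

```python
from typing import Callable, Dict, List

# Hypothetical signature: (key, all values for that key) -> reduced value.
ReduceFunction = Callable[[str, List[str]], str]


def build_reduce_chunk(
    keys: List[str],
    map_folders: Dict[int, Dict[str, List[str]]],
    reduce_fn: ReduceFunction,
) -> str:
    """Gather every key chunk for `keys` across all mapped chunks, then reduce.

    `map_folders[chunk_id][key]` stands in for a downloaded keyChunk.
    """
    lines = []
    for key in keys:
        values: List[str] = []
        for chunk_id in sorted(map_folders):
            # A mapped chunk may not have produced this key at all.
            values.extend(map_folders[chunk_id].get(key, []))
        lines.append(f"{key}\t{reduce_fn(key, values)}\n")
    return "".join(lines)


def sum_reduce(key: str, values: List[str]) -> str:
    return str(sum(int(v) for v in values))


folders = {0: {"a": ["1"], "b": ["1", "1"]}, 1: {"b": ["1"]}}
print(repr(build_reduce_chunk(["a", "b"], folders, sum_reduce)))  # 'a\t1\nb\t3\n'
```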

SLIDE 20

ReduceFile

  • The initiator knows that a reduce is finished when the ReduceFile is stable on the DFS

slide-21
SLIDE 21

What’s Next

  • Large Scale & Stress Tests of DFS
  • Implement the Map and Reduce files
  • Include multi-master management (results from the MRp2p paper)