Rapid Replication of Multi-Petabyte File Systems - Justin Sybrandt



slide-1
SLIDE 1

Rapid Replication of Multi-Petabyte File Systems

Justin Sybrandt Jason Hick

(NSF award number 1157075)

slide-2
SLIDE 2

NERSC

  • Stores PB of scientific data.
  • Needed to replicate whole file systems.
slide-3
SLIDE 3

The Problem

  • Need to freshen a stale copy.

      ○ File system backups.
      ○ Disaster recovery.
      ○ Moving locations.

slide-4
SLIDE 4

Distsync

  • Quickly determines the changes between two file systems.
  • Follows the Master-Slave Paradigm.
  • Similar to Shift, but streamlined for large synchronizations.

[Diagram: an Out-Of-Date GPFS and an Up-To-Date GPFS, with three nodes.]

slide-5
SLIDE 5

Job Generation

  • Policy scans produce lists of all files.
  • Generator creates job files in linear time.
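
The slides do not say how the generator stays linear. One plausible approach, sketched here under the assumption that both scan lists arrive sorted by path, is a single merge-style pass over the two lists. The `Entry` record, the `diff_scans` function, and the action names `copy`, `update`, and `delete` are hypothetical and not taken from DistSync.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Entry(NamedTuple):
    """One record from a scan (hypothetical layout, not the real scan format)."""
    path: str
    mtime: int
    size: int


def diff_scans(source: Iterable[Entry],
               target: Iterable[Entry]) -> Iterator[Tuple[str, str]]:
    """Yield (action, path) pairs by walking two path-sorted scans once.

    Each entry is visited exactly once, so the cost is O(len(source) + len(target)).
    """
    src_iter, dst_iter = iter(source), iter(target)
    src = next(src_iter, None)
    dst = next(dst_iter, None)
    while src is not None or dst is not None:
        if dst is None or (src is not None and src.path < dst.path):
            # Present only in the up-to-date file system.
            yield ("copy", src.path)
            src = next(src_iter, None)
        elif src is None or dst.path < src.path:
            # Present only in the stale copy.
            yield ("delete", dst.path)
            dst = next(dst_iter, None)
        else:
            # Same path on both sides: compare metadata.
            if (src.mtime, src.size) != (dst.mtime, dst.size):
                yield ("update", src.path)
            src = next(src_iter, None)
            dst = next(dst_iter, None)
```

Comparing modification time and size is only one possible change test; a real tool might compare other attributes.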

slide-6
SLIDE 6

Job File

  • Contains a list of file paths.
  • Limited in size (when possible).
  • Type implies action.
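
As a rough illustration of these three properties, the sketch below groups changes into typed jobs and closes a job once it reaches a path limit. The `Job` class, `build_jobs`, and the `max_paths` limit are assumptions made for the example; the slides do not describe the actual job file format or how size is measured.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Job:
    """A hypothetical in-memory job file: one type, many paths."""
    job_type: str      # the type implies the action, e.g. "copy" or "delete"
    paths: List[str]


def build_jobs(changes: Iterable[Tuple[str, str]], max_paths: int = 2) -> List[Job]:
    """Pack (action, path) pairs into jobs of at most max_paths paths each."""
    open_jobs: Dict[str, Job] = {}
    finished: List[Job] = []
    for action, path in changes:
        job = open_jobs.setdefault(action, Job(action, []))
        job.paths.append(path)
        if len(job.paths) >= max_paths:
            # The job is full: close it so the next path opens a new one.
            finished.append(open_jobs.pop(action))
    # Flush jobs that never reached the limit.
    finished.extend(open_jobs.values())
    return finished
```

Each change is handled once, so this packing step is also linear in the number of changes.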
slide-7
SLIDE 7

Job Scheduling

  • Manager ensures that jobs are completed in the right order.
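
The slides do not state what the right order is. A minimal sketch, assuming the order depends only on job type (for example, directories must exist before files are copied into them), could group jobs into phases that run one after another. `PHASE_ORDER`, the type names, and `schedule` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed ordering of job types; not taken from the presentation.
PHASE_ORDER = ["mkdir", "copy", "delete"]


def schedule(jobs: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group (job_type, job_name) pairs into phases listed in PHASE_ORDER.

    Jobs inside one phase are independent and may run in parallel;
    a phase should only start once the previous phase has finished.
    """
    by_type: Dict[str, List[str]] = defaultdict(list)
    for job_type, name in jobs:
        if job_type not in PHASE_ORDER:
            raise ValueError(f"unknown job type: {job_type}")
        by_type[job_type].append(name)
    return [(t, by_type[t]) for t in PHASE_ORDER if t in by_type]
```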

slide-8
SLIDE 8

Job Completion

  • Workers start processing jobs in parallel.
  • Utilizes system commands.
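
A minimal sketch of parallel workers that delegate to a system command might look like the following. The slides do not name the commands used, so the choice of `cp -p` is an assumption, as are the `copy_job` and `run_jobs` helpers and the use of a thread pool on a single machine in place of separate worker nodes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def copy_job(paths: List[str], src_root: str, dst_root: str) -> int:
    """Copy each relative path with the system cp command; return the count."""
    for rel in paths:
        dest = Path(dst_root) / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        # cp -p preserves mode and timestamps; check=True raises on failure.
        subprocess.run(["cp", "-p", str(Path(src_root) / rel), str(dest)],
                       check=True)
    return len(paths)


def run_jobs(jobs: List[List[str]], src_root: str, dst_root: str,
             workers: int = 4) -> int:
    """Process copy jobs in parallel and return the total number of files copied."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_job, paths, src_root, dst_root)
                   for paths in jobs]
        # result() re-raises any worker exception in the caller.
        return sum(f.result() for f in futures)
```

Threads are sufficient here because each worker spends its time waiting on an external process rather than running Python code.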

slide-9
SLIDE 9

Micro Benchmark

slide-10
SLIDE 10

Conclusions

DistSync

  • Processes file system scans.
  • Creates job files.
  • Maximises file system bandwidth.
  • Frequent syncs lead to faster syncs.

slide-11
SLIDE 11

Thank You!

Questions?