SLIDE 1

Functional Assessment of Erasure Coded Storage Archive

Blair Crossman Taylor Sanchez Josh Sackos

LA-UR-13-25967

Computer Systems, Cluster, and Networking Summer Institute

slide-2
SLIDE 2

Presentation Overview

  • Introduction
  • Caringo Testing
  • Scality Testing
  • Conclusions

SLIDE 3

Storage Media

  • Tape
    • Priced for capacity, not bandwidth
  • Solid-state drives
    • Priced for bandwidth, not capacity
  • Hard disk
    • Bandwidth scales with more drives

SLIDE 4

Object Storage: Flexible Containers

  • Files are stored in data containers
  • Metadata lives outside the file system
    • Key-value pairs (see the sketch below)
  • File system scales with machines
  • METADATA EXPLOSIONS!!
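
Below is a minimal Python sketch of the idea on this slide: an object is a blob of data plus free-form key-value metadata kept outside any file-system inode. The class and field names are ours for illustration; this is not Caringo's or Scality's API.

```python
# Toy illustration (not either product's API): an object is a blob of bytes
# plus a free-form key-value metadata dictionary that lives alongside the
# data rather than inside a file-system inode.
from dataclasses import dataclass, field

@dataclass
class StoredObject:
    key: str                                       # object identifier in the container
    data: bytes                                    # file contents
    metadata: dict = field(default_factory=dict)   # arbitrary key-value pairs

archive = {}  # stand-in for a data container

obj = StoredObject(
    key="sim/run42/output.dat",
    data=b"...",
    metadata={"project": "climate", "size_bytes": 3, "created": "2013-07-01"},
)
archive[obj.key] = obj

# Metadata can be queried without touching the file data itself.
matches = [o.key for o in archive.values() if o.metadata.get("project") == "climate"]
print(matches)
```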

3

slide-5
SLIDE 5

What is the problem?

  • RAID, replication, and tape systems were not designed for exascale computing and storage
  • Hard disk capacity continues to grow
  • A solution to multiple hard disk failures is needed

SLIDE 6

Erasure Coding: Reduce, Rebuild, Recalculate

Reduce! Rebuild! Recalculate!
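
The systems under test use n=3 data plus k=3 parity fragments (slide 8). The snippet below is a deliberately tiny single-parity illustration of the reduce/rebuild/recalculate idea in Python; real erasure codes such as Reed-Solomon tolerate more losses, and this is not either vendor's algorithm.

```python
# Single-parity sketch of the erasure-coding idea (effectively n = 3, k = 1).
# Real archives use stronger codes (e.g. Reed-Solomon) that survive k > 1
# losses, but the rebuild principle is the same: recompute a missing fragment
# from the survivors instead of keeping full replicas.

def split(data: bytes, n: int) -> list:
    """Split data into n equal-size fragments (zero-padded)."""
    size = -(-len(data) // n)                      # ceiling division
    data = data.ljust(n * size, b"\0")
    return [data[i * size:(i + 1) * size] for i in range(n)]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

fragments = split(b"exascale archive test block", 3)    # "reduce" into fragments
parity = fragments[0]
for frag in fragments[1:]:
    parity = xor(parity, frag)                          # "recalculate" the parity

lost = fragments[1]                                     # simulate a failed disk
rebuilt = xor(xor(fragments[0], fragments[2]), parity)  # "rebuild" from survivors
assert rebuilt == lost
print("rebuilt fragment matches the original")
```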

SLIDE 7

Project Description

  • An erasure coded object storage file system is a potential replacement for LANL’s tape archive system
  • Installed and configured two prototype archives
    • Scality
    • Caringo
  • Verified the functionality of both systems

SLIDE 8

Functionality, Not Performance

Caringo

  • SuperMicro admin node
  • 1GigE interconnect
  • 10 IBM System x3755 nodes
    • 4 x 1TB HDD each
  • Erasure coding: n=3, k=3

Scality

  • SuperMicro admin node
  • 1GigE interconnect
  • 6 HP ProLiant DL160 G6 nodes
    • 4 x 1TB HDD each
  • Erasure coding: n=3, k=3
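
For orientation, the n=3, k=3 setting on both test beds works out as follows (generic erasure-coding arithmetic, not a vendor-specific claim):

```python
# Generic erasure-coding arithmetic for the n = 3, k = 3 test configuration.
n, k = 3, 3
fragments = n + k            # 6 fragments spread across nodes/disks
overhead = fragments / n     # raw storage used per byte of user data
survivable_losses = k        # any k fragments can be lost and rebuilt

print(f"{fragments} fragments, {overhead:.1f}x raw storage, survives {survivable_losses} losses")
# -> 6 fragments, 2.0x raw storage, survives 3 losses
```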

SLIDE 9

Project Testing Requirements

  • Data
    • Ingest : Retrieval : Balance : Rebuild
  • Metadata
    • Accessibility : Customization : Query
  • POSIX Gateway
    • Read : Write : Delete : Performance overhead

SLIDE 10

How We Broke Data

  • Pulled out HDDs (for Scality, killed the storage daemon instead)
  • Turned off nodes
  • Uploaded files, downloaded files
  • Used md5sum to compare originals to downloaded copies (sketch below)
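
A sketch of that integrity check in Python (the file paths are placeholders for whatever files were ingested during testing):

```python
# Hash the original file and the copy retrieved from the archive, then compare.
import hashlib

def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

original = md5sum("testdata/original.bin")
retrieved = md5sum("testdata/downloaded.bin")
print("OK" if original == retrieved else "CORRUPTED")
```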

SLIDE 11

Caringo: The automated storage system

  • Warewulf/Perceus-like diskless (RAM) boot
  • Reconfigurable, but requires a reboot
  • Provisioned via DHCP PXE boot
  • Little flexibility or customizability
  • http://www.caringo.com

SLIDE 12

No Node Specialization

  • Nodes "bid" for tasks
  • Lowest latency wins
  • Distributes the work
  • Each node performs all tasks
  • Administrator : Compute : Storage
  • Automated Power management
  • Set a sleep timer
  • Set an interval to check disks
  • Limited Administration Options
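
A toy sketch of the bidding idea as we understood it: every node offers a bid for an incoming request and the lowest latency wins. This is only a conceptual illustration, not Caringo's actual protocol; the node names and latencies are made up.

```python
# Conceptual sketch only: each node "bids" on a task and the lowest estimated
# latency wins, spreading work across the cluster. Not Caringo's real protocol.
import random

def bid(node: str) -> tuple:
    # Stand-in for a real latency estimate (load, disk queue depth, ...).
    return (random.uniform(1.0, 10.0), node)

nodes = [f"node{i:02d}" for i in range(10)]
latency, winner = min(bid(n) for n in nodes)   # lowest latency wins
print(f"{winner} wins the task at {latency:.2f} ms")
```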

SLIDE 13

Caringo Rebuilds Data As It Is Written

  • Balances data as it is written
    • Primary access node
    • Secondary access node
  • Automated
    • New HDD/node: auto-balanced
    • New drives are formatted automatically
  • Rebuilds constantly
    • If any node goes down, a rebuild starts immediately
  • Volumes can go "stale"
    • 14-day limit on unused volumes

SLIDE 14

What’s a POSIX Gateway?

  • Content File Server
  • Fully POSIX-compliant object access
  • Performs system administration tasks
  • Parallel writes
  • Was not available for testing

SLIDE 15

“Elastic” Metadata

  • Accessible
  • Query by key values
    • By file size, date, etc.
  • Indexing requires a dedicated “Elasticsearch” machine (example query below)
    • Can be the bottleneck in the system
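
An illustrative query written in Elasticsearch's query-DSL style, since the indexer is built on "Elastic Search". The field names and values are hypothetical, and this is not the exact request format either product exposes.

```python
# Illustrative Elasticsearch-style query body; field names and values are
# hypothetical and not the exact format either product exposes.
import json

query = {
    "query": {
        "bool": {
            "must": [
                {"term": {"project": "climate"}},             # key-value match
                {"range": {"size_bytes": {"gte": 10**9}}},    # files >= 1 GB
                {"range": {"created": {"gte": "2013-01-01"}}},
            ]
        }
    }
}
print(json.dumps(query, indent=2))
```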

SLIDE 16

Minimum Node Requirements

  • Needs a full n + k nodes to (see the check below):
    • rebuild
    • write
    • balance
  • Does not need a full n + k to:
    • read
    • query metadata
    • perform administration
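
Restating the rule above as a tiny check. The helper names are ours, not Caringo's, and the read threshold assumes the usual erasure-coding property that any n of the n + k fragments suffice.

```python
# Our restatement of the slide's rule, not Caringo's API. Writes, rebuilds,
# and balancing need the full n + k nodes; reads only need enough surviving
# fragments to reconstruct the data (any n of the n + k).
def can_write(nodes_up: int, n: int = 3, k: int = 3) -> bool:
    return nodes_up >= n + k

def can_read(nodes_up: int, n: int = 3, k: int = 3) -> bool:
    return nodes_up >= n

print(can_write(5), can_read(5))   # False True: only 5 of the 6 nodes up
```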

SLIDE 17

Scality: Static Disk Install

  • Requires disk install
  • Static IP addresses
  • Optimizations require deeper knowledge
  • http://www.scality.com

SLIDE 18

Virtual Ring Resilience

  • Survives failures until fewer virtual nodes are available than the n+k erasure configuration requires
  • Data is stored to the ‘ring’ via a distributed hash table (sketch below)
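
A minimal consistent-hashing sketch of the ring idea: keys and virtual nodes hash onto the same circular space, and each key lands on the first virtual node clockwise from its hash. This is a generic illustration only, not Scality's actual placement code.

```python
# Generic consistent-hashing sketch of a storage "ring"; not Scality's code.
import bisect
import hashlib

def ring_position(name: str) -> int:
    # Hash a name onto the ring's circular key space.
    return int(hashlib.md5(name.encode()).hexdigest(), 16)

virtual_nodes = sorted((ring_position(f"vnode-{i}"), f"vnode-{i}") for i in range(12))
positions = [pos for pos, _ in virtual_nodes]

def place(key: str) -> str:
    # First virtual node clockwise from the key's position (wrapping around).
    idx = bisect.bisect(positions, ring_position(key)) % len(virtual_nodes)
    return virtual_nodes[idx][1]

for key in ("objects/a.dat", "objects/b.dat", "objects/c.dat"):
    print(key, "->", place(key))
```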

SLIDE 19

Manual Rebuilds, But Flexible

  • Rebuilds on fewer than the required nodes
    • Lacks full protection
    • Populates data back to an additional node
  • New node/HDD: must be added manually
  • Data is balanced during:
    • Writing
    • Rebuilding

SLIDE 20

Indexer Sold Separately

  • Query all erasure coding metadata per server
  • Per-item metadata
    • User definable
  • Did not test Scality’s ‘Mesa’ indexing service
    • Extra software

SLIDE 21

FUSE gives 50% overhead, but is scalable

SLIDE 22

On the right path

  • Scality
    • Static installation, flexible erasure coding
    • Helpful
    • Separate indexer
    • 500MB file limit ('Unlimited' update coming)
  • Caringo
    • Variable installation, strict erasure coding
    • Good documentation
    • Indexer included
    • 4TB file limit (addressing bits limit)

SLIDE 23

Very Viable

  • Some early limitations
  • Changes are needed in both products
  • Scality seems more ready to make those changes

SLIDE 24

Questions?

SLIDE 25

Acknowledgements

Special thanks to:

Dane Gardner - NMC Instructor
Matthew Broomfield - NMC Teaching Assistant
HB Chen - HPC-5, Mentor
Jeff Inman - HPC-1, Mentor
Carolyn Connor - HPC-5, Deputy Director ISTI
Andree Jacobson - Computer & Information Systems Manager, NMC
Josephine Olivas - Program Administrator, ISTI

Los Alamos National Labs, New Mexico Consortium, and ISTI
