SLIDE 1

Parallel File Systems

John White

Lawrence Berkeley National Lab

SLIDE 2

Topics

  • Defining a File System
  • Our Specific Case for File Systems
  • Parallel File Systems
  • A Survey of Current Parallel File Systems

  • Implementation
SLIDE 3

What is a File System?

  • Simply, a method for ensuring:

▪ A Unified Access Method to Data
▪ Organization (in a technical sense…)
▪ Data Integrity
▪ Efficient Use of Hardware
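One reason the "unified access method" matters: the same POSIX calls work whether the file lives on a local disk, an NFS mount, or a parallel file system. A minimal sketch (the path here is hypothetical):

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    /* The same POSIX interface applies regardless of the file system
     * underneath: local ext4, NFS, or a parallel file system. */
    const char *path = "/tmp/fs_demo.txt";   /* hypothetical path */
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char msg[] = "unified access to data\n";
    if (write(fd, msg, strlen(msg)) < 0) perror("write");

    /* fsync asks the file system to make the data durable -- part of
     * the "data integrity" job listed above. */
    if (fsync(fd) < 0) perror("fsync");
    close(fd);
    return 0;
}
```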

SLIDE 4

The HPC Application (our application)

  • Large Node Count
  • High-IOPS Code (many small file operations)
  • High-Throughput Code (large files, fast)
  • You Can Never Provide Too Much Capacity
SLIDE 5

What’s the Problem With Tradition?

  • NFS/CIFS/AFP/NAS is slow

▪ Single point of contact for both data and metadata
▪ Protocol overhead
▪ File-based locking
▪ We want parallelism from the application to disk

  • We Need a Single Namespace
  • We Need Truly Massive Aggregate Throughput (stop thinking MB/s)

  • Bottlenecks are Inherent to Architecture
  • Most Importantly:
SLIDE 6
  • Researchers Just Don’t Care

▪ They want their data available everywhere
▪ They hate transferring data (this bears repeating)
▪ Their code wants the data several cycles ago
▪ If they have to learn new IO APIs, they commonly won't use it, period
▪ An increasing number aren't aware their code is inefficient

SLIDE 7

Performance in Aggregate: A Specific Case

  • File System Capable of 5GB/s Peak Performance
  • Researcher running an analysis of past stock ticker data

▪ 10 independent processes per node, 10+ nodes, sometimes 1000+ processes
▪ Was running into “performance issues”

  • In reality, the code was hitting 90% of peak performance

▪ 100s of processes choking each other (at 5GB/s aggregate, 1000 processes see only ~5MB/s apiece)
▪ Efficiency is key

SLIDE 8

Parallel File Systems

  • A File System That Provides

▪ Access to Massive Amounts of Data at Large Client Counts
▪ Simultaneous Client Access at Sub-File Levels
▪ Striping at Sub-File Levels
▪ Massive Scalability
▪ A Method to Aggregate Large Numbers of Disks

SLIDE 9

Popular Parallel File Systems

  • Lustre

▪ Development acquired by Intel (via its purchase of Whamcloud)
▪ Support offerings from Intel, Whamcloud, and numerous other vendors
▪ Object based
▪ Growing feature list

∼ Information Lifecycle Management
∼ “Wide Area” mounting support
∼ Data replication and metadata clustering planned

▪ Open source

∼ Large and growing install base, vibrant community
∼ “Open” compatibility
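Striping on Lustre is normally controlled with the lfs utility, but it is also exposed programmatically. A hedged sketch, assuming the Lustre user-space API (llapi_file_create from <lustre/lustreapi.h>, linked with -llustreapi); the path and layout values are illustrative:

```c
#include <stdio.h>
/* Lustre user-space API; available on Lustre clients (assumption:
 * linked with -llustreapi). */
#include <lustre/lustreapi.h>

int main(void) {
    /* Create an empty file striped across 4 OSTs with a 1 MiB stripe size.
     * Arguments: path, stripe_size, stripe_offset (-1 = let the MDS pick
     * the starting OST), stripe_count, stripe_pattern (0 = default). */
    int rc = llapi_file_create("/lustre/scratch/demo.dat",  /* illustrative */
                               1 << 20, -1, 4, 0);
    if (rc) {
        fprintf(stderr, "llapi_file_create failed: %d\n", rc);
        return 1;
    }
    /* The file can now be opened and written with ordinary POSIX calls;
     * Lustre spreads the data round-robin over the 4 objects. */
    return 0;
}
```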

SLIDE 10

Popular Parallel File Systems

  • GPFS

▪ IBM; born around 1993 as the Tiger Shark multimedia file system
▪ Support direct from the vendor
▪ AIX, Linux, some Windows
▪ Ethernet and InfiniBand support
▪ Wide area support
▪ ILM
▪ Distributed metadata and locking
▪ Mature storage pool support
▪ Replication

SLIDE 11

Licensing Landscape

  • GPFS (A Story of a Huge Feature Set at a Huge Cost)

▪ Binary only
▪ IBM licensing

∼ Per core
∼ Site-wide

  • Lustre

▪ Open source
▪ Paid licensing available, tied to support offerings

SLIDE 12

Striping Files
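The striping diagram on this slide did not survive extraction. As a stand-in, a minimal sketch of the round-robin arithmetic parallel file systems typically use; the stripe size and stripe count here are illustrative, not any system's actual settings:

```c
#include <stdio.h>

/* Round-robin striping: with stripe size S over N storage targets,
 * the byte at file offset `off` lives in stripe (off / S), which is
 * stored on target (off / S) % N. */
int main(void) {
    const long long stripe_size = 1 << 20;  /* 1 MiB, illustrative */
    const int stripe_count = 4;             /* 4 targets, illustrative */

    long long offsets[] = {0, 1500000, 4194304, 9000000};
    for (int i = 0; i < 4; i++) {
        long long off = offsets[i];
        long long stripe_index = off / stripe_size;       /* which stripe  */
        int target = (int)(stripe_index % stripe_count);  /* which target  */
        printf("file offset %10lld -> stripe %3lld on target %d\n",
               off, stripe_index, target);
    }
    return 0;
}
```

Because consecutive stripes land on different servers, a single large file can be read or written by many clients at full aggregate speed, which is the point of the sub-file access described on the previous slide.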

SLIDE 13

SAN – All nodes have access to storage fabric, all LUNs

SLIDE 14

Direct Connect – A separate storage cluster hosts and exports via common fabric

SLIDE 15

Berkeley Research Computing

  • Current Savio Scratch File System

▪ Lustre 2.5
▪ 210TB of DDN 9900

∼ ~10GB/s ideal throughput

▪ Accessible on all nodes

  • Future

▪ Lustre 2.5 or GPFS 4.1
▪ ~1PB+ capacity
▪ ~20GB/s throughput
▪ Vendor yet to be determined

SLIDE 16

Berkeley Research Computing

  • Access Methods

▪ Available on every node

∼ POSIX
∼ MPI-IO (see the sketch after this list)

▪ Data Transfer

∼ Globus Online

  • Ideal for large transfers
  • Restartable
  • Tuned for large networks and long distance
  • Easy to use graphical interface online

∼ SCP/SFTP

  • Well known
  • Suitable for quick and dirty transfers
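A minimal MPI-IO sketch of the access pattern these file systems are built for: every rank writes its own block of one shared file at a rank-derived offset. The file path is hypothetical; compile with mpicc and launch with mpirun:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Every rank opens the same file in the single shared namespace. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/scratch/demo.out",  /* hypothetical */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes a fixed-size record at its own offset; the
     * collective write lets the MPI-IO layer coalesce requests before
     * they hit the storage servers. */
    char buf[64] = {0};
    snprintf(buf, sizeof buf, "rank %06d\n", rank);
    MPI_Offset offset = (MPI_Offset)rank * (MPI_Offset)sizeof buf;
    MPI_File_write_at_all(fh, offset, buf, (int)sizeof buf, MPI_CHAR,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```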
SLIDE 17

Current Technological Landscape

  • Tiered Storage (Storage Pools)

▪ When you have multiple storage needs within a single namespace

∼ SSD/FC for jobs and metadata (Tier 0)
∼ SATA for capacity (Tier 1)
∼ Tape for long-term/archival (Tier 2)

  • ILM

▪ Basically, perform actions on data per a rule set (see the toy sketch below)

∼ Migration to tape
∼ Fast Tier 0 storage use case
∼ Purge policies

  • Replication

▪ Dangers of metadata operations
▪ Long-term storage
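Real ILM engines (e.g. GPFS's policy language) have their own rule syntax; purely as a toy illustration of what a purge policy does, here is a C sketch that walks a directory tree and deletes files not accessed in 90 days. The path and threshold are illustrative, and a production policy would typically migrate data to tape rather than delete it outright:

```c
#define _XOPEN_SOURCE 500   /* for nftw() */
#include <ftw.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define PURGE_AGE_DAYS 90   /* illustrative threshold */

static time_t cutoff;

/* Callback invoked by nftw() for every entry in the tree. */
static int purge_cb(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf) {
    (void)ftwbuf;
    if (typeflag == FTW_F && sb->st_atime < cutoff) {
        printf("purging %s (last access %ld)\n", path, (long)sb->st_atime);
        unlink(path);
    }
    return 0;  /* keep walking */
}

int main(void) {
    cutoff = time(NULL) - (time_t)PURGE_AGE_DAYS * 24 * 3600;
    /* Walk the scratch tree; 16 = max fds held open by the traversal. */
    return nftw("/scratch", purge_cb, 16, FTW_PHYS);  /* illustrative path */
}
```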

SLIDE 20

Further Information

Berkeley Research Computing: http://research-it.berkeley.edu/brc
HPCS at LBNL: http://scs.lbl.gov/
Email: jwhite@lbl.gov