The Direct Access File System (DAFS)

Matt DeBergalis, Peter Corbett, Steve Kleiman,

Arthur Lent, Dave Noveck, Tom Talpey, Mark Wittle

Network Appliance, Inc.

Usenix FAST ’03
Presented by Tom Talpey (tmt@netapp.com)


Outline

• DAFS
• DAT / RDMA
• DAFS API
• Benchmark results


DAFS – Direct Access File System

• File access protocol, based on NFSv4 and RDMA, designed specifically for high-performance data center file sharing (local sharing)
• Low latency, high throughput, and low overhead
• Semantics for a clustered file sharing environment


DAFS Design Points

• Designed for high performance
  – Minimize client-side overhead
  – Base protocol: remote DMA, flow control
  – Operations: batch I/O, cache hints, chaining
• Direct application access to transport resources
  – Transfers file data directly to application buffers
  – Bypasses operating system overhead
• File semantics
  – Improved semantics to enable local file sharing
  – Superset of CIFS, NFSv3, NFSv4 (and local file systems!)
  – Consistent high-speed locking
  – Graceful client and server failover, cluster fencing

http://www.dafscollaborative.org


DAFS Protocol

• Session-based
• Strong authentication
• Optimized message format
• Multiple data transfer models
• Batch I/O
• Cache hints
• Chaining


DAFS Protocol – Enhanced Semantics

• Rich locking
• Cluster fencing
• Shared key reservations
• Exactly-once failure semantics
• Append mode, create-unlinked, delete-on-last-close (usage sketched below)
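To make the enhanced semantics concrete, the sketch below shows how a clustered client might combine them. The dafs_* names and signatures here are hypothetical stand-ins invented for illustration; the real DAFS specification defines these operations at the protocol level and the DAFS API exposes them under its own names.

```c
/*
 * Hedged sketch: combining shared-key open, delete-on-last-close, and
 * append mode.  All dafs_* names below are illustrative stand-ins,
 * NOT the actual DAFS API.
 */
#include <stddef.h>
#include <stdint.h>

typedef struct dafs_handle dafs_handle_t;            /* hypothetical */

/* Hypothetical prototypes, for illustration only. */
dafs_handle_t *dafs_open_with_key(const char *path, uint64_t fencing_key);
int dafs_set_delete_on_last_close(dafs_handle_t *fh);
int dafs_append_write(dafs_handle_t *fh, const void *buf, size_t len);
int dafs_close(dafs_handle_t *fh);

int log_scratch_record(const char *path, uint64_t my_key,
                       const void *rec, size_t len)
{
    /* Open with a shared key: a failed-over peer can fence this client
     * by revoking the key, supporting exactly-once failure semantics. */
    dafs_handle_t *fh = dafs_open_with_key(path, my_key);
    if (fh == NULL)
        return -1;

    /* Delete-on-last-close: the file vanishes once the last holder
     * closes it, even if this client crashes first. */
    dafs_set_delete_on_last_close(fh);

    /* Append mode: the server chooses the offset, so concurrent
     * writers never overwrite each other. */
    int rc = dafs_append_write(fh, rec, len);

    dafs_close(fh);
    return rc;
}
```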


DAT – Direct Access Transport

• Common requirements and an abstraction of services for RDMA (Remote Direct Memory Access)
• Portable, high-performance transport underpinning for DAFS and applications
• Defines communications endpoints, transfer semantics, memory description, signalling, etc.
• Transfer models (see the sketch after this list):
  – Send (like traditional network flow)
  – RDMA Write (write directly to advertised peer memory)
  – RDMA Read (read from advertised peer memory)
• Transport independent
  – 1 Gb/s VI/IP, 10 Gb/s InfiniBand, future RDMA over IP

http://www.datcollaborative.org
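As a rough illustration of the three transfer models, the C sketch below shows who decides where the data lands in each case. The dat_* names and types are assumptions made for this sketch, not the interfaces defined by the DAT specification.

```c
/*
 * Hedged sketch of the three DAT transfer models.  Names and types are
 * illustrative only; the real DAT specification defines its own
 * endpoint, memory-registration, and posting interfaces.
 */
#include <stddef.h>
#include <stdint.h>

typedef struct dat_ep  dat_ep_t;      /* communications endpoint (hypothetical) */
typedef struct dat_mr  dat_mr_t;      /* registered local memory (hypothetical) */
typedef uint64_t       dat_cookie_t;  /* token for memory the peer advertised   */

/* Illustrative prototypes only. */
int dat_post_send(dat_ep_t *ep, const void *msg, size_t len);
int dat_post_rdma_write(dat_ep_t *ep, dat_mr_t *local,
                        dat_cookie_t remote, size_t len);
int dat_post_rdma_read(dat_ep_t *ep, dat_mr_t *local,
                       dat_cookie_t remote, size_t len);

void transfer_model_examples(dat_ep_t *ep, dat_mr_t *local_buf,
                             dat_cookie_t peer_buf, size_t len)
{
    /* Send: like a traditional network message; the peer must have a
     * receive descriptor posted, and the peer decides where data lands. */
    dat_post_send(ep, "hello", 5);

    /* RDMA Write: push local_buf directly into memory the peer has
     * advertised (peer_buf), with no peer CPU involvement in placement. */
    dat_post_rdma_write(ep, local_buf, peer_buf, len);

    /* RDMA Read: pull the peer's advertised memory into local_buf,
     * again without involving the peer's CPU. */
    dat_post_rdma_read(ep, local_buf, peer_buf, len);
}
```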


DAFS Inline Read

[Diagram: inline read message flow. (1) The client posts a READ_INLINE request from a send descriptor; (2) the server receives it; (3) the server's REPLY carries the file data back into the client's receive descriptor, from which it reaches the application buffer.]


DAFS Direct Read

[Diagram: direct read message flow. (1) The client sends a READ_DIRECT request advertising a registered application buffer; (2) the server pushes the file data directly into that buffer with an RDMA Write; (3) the server sends a REPLY to complete the operation.]
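The C sketch below contrasts the client side of the two read models shown above. The dafs_* calls are hypothetical stand-ins for the operations named in the diagrams, not the published DAFS API.

```c
/*
 * Hedged sketch of inline vs. direct reads from the client side.
 * All dafs_* names are illustrative, NOT the actual DAFS API.
 */
#include <stddef.h>
#include <sys/types.h>   /* ssize_t, off_t */

typedef struct dafs_handle dafs_handle_t;   /* hypothetical */
typedef struct dafs_mr     dafs_mr_t;       /* registered memory (hypothetical) */

ssize_t    dafs_read_inline(dafs_handle_t *fh, void *buf, size_t len, off_t off);
dafs_mr_t *dafs_mem_register(void *buf, size_t len);
ssize_t    dafs_read_direct(dafs_handle_t *fh, dafs_mr_t *mr, size_t len, off_t off);

void read_both_ways(dafs_handle_t *fh, char *app_buf, size_t len)
{
    /* Inline read: the data rides in the REPLY message and arrives
     * through a receive descriptor; suited to small transfers. */
    dafs_read_inline(fh, app_buf, len, 0);

    /* Direct read: register the application buffer, advertise it in
     * READ_DIRECT, and the server RDMA-Writes the data straight into
     * it; suited to large transfers, with no intermediate copy. */
    dafs_mr_t *mr = dafs_mem_register(app_buf, len);
    dafs_read_direct(fh, mr, len, 0);
}
```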


DAFS Inline Write

[Diagram: inline write message flow. (1) The client posts a WRITE_INLINE request carrying the file data from the application buffer in the request itself; (2) the server receives it into its buffer; (3) the server sends a REPLY.]


DAFS Direct Write

[Diagram: direct write message flow. (1) The client sends a WRITE_DIRECT request advertising a registered application buffer; (2) the server pulls the data directly from that buffer with an RDMA Read; (3) the server sends a REPLY once the transfer completes.]
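A corresponding sketch of the direct write path, again using illustrative dafs_* names rather than the real API, highlights that the client only advertises its buffer and the server pulls the data.

```c
/*
 * Hedged sketch of a direct write from the client's perspective.
 * The dafs_* names are illustrative, NOT the actual DAFS API.
 */
#include <stddef.h>
#include <sys/types.h>

typedef struct dafs_handle dafs_handle_t;   /* hypothetical */
typedef struct dafs_mr     dafs_mr_t;       /* registered memory (hypothetical) */

dafs_mr_t *dafs_mem_register(void *buf, size_t len);
ssize_t    dafs_write_direct(dafs_handle_t *fh, dafs_mr_t *mr, size_t len, off_t off);

void write_direct_example(dafs_handle_t *fh, char *app_buf, size_t len)
{
    /* WRITE_DIRECT only advertises the registered buffer; the server
     * then pulls the data with an RDMA Read and sends the REPLY when
     * the transfer is done.  The buffer must stay registered and
     * unmodified until that reply arrives. */
    dafs_mr_t *mr = dafs_mem_register(app_buf, len);
    dafs_write_direct(fh, mr, len, /*offset=*/0);
}
```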


DAFS-enabled Applications

Raw Device Adapter (disk I/O syscalls)
Stack: Application (unchanged) → Buffers → Device Driver → DAFS Library → DAT Provider Library → NIC Driver → RDMA NIC
• Kernel-level plug-in
• Looks like raw disk
• App uses standard disk I/O calls
• Very limited access to DAFS features
• Performance similar to direct-attached disk

Kernel File System (file I/O syscalls)
Stack: Application (unchanged) → Buffers → File System → DAFS Library → DAT Provider Library → NIC Driver → RDMA NIC
• Kernel-level plug-in
• Peer to local FS
• App uses standard file I/O semantics
• Limited access to DAFS features
• Performance similar to local FS

User Library (DAFS API calls)
Stack: Application (modified) → Buffers → DAFS Library → DAT Provider Library (user space) → NIC Driver (OS kernel) → RDMA NIC (hardware)
• User-level library
• Best performance
• Full application access to DAFS semantics
• Paper focuses on this style


DAFS API

• File based: exports DAFS semantics
• Designed for highest application performance
• Lowest client CPU requirements of any I/O system
• Rich semantics that meet or exceed local file system capabilities
• Portable and consistent interface and semantics across platforms
  – No need for different mount options, caching policies, client-side SCSI commands, etc.
  – The DAFS API is completely specified in an open standard document, not in OS-specific documentation
• Operating system avoidance


The DAFS API

Why a new API?

• Backward compatibility with POSIX is fruitless
  – File descriptor sharing, signals, fork()/exec()
• Performance
  – RDMA (memory registration), completion groups
• New semantics
  – Batch I/O, cache hints, named attributes, open with key, delete on last close
• Portability
  – OS independence and semantic consistency

Key DAFS API Features

• Asynchronous
  – High-performance interfaces support native asynchronous file I/O
  – Many I/Os can be issued and awaited concurrently
• Memory registration
  – Efficiently prewires application data buffers, permitting RDMA (direct data placement)
• Extended semantics
  – Batch I/O, delete on last close, open with key, cluster fencing, locking primitives
• Flexible completion model (see the sketch after this list)
  – Completion groups segregate related I/O
  – Applications can wait on specific requests, any of a set, or any number of a set
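A minimal sketch of this style follows, assuming hypothetical dafs_* names for the features above (registration, asynchronous reads, completion groups); the real DAFS API defines its own names and signatures.

```c
/*
 * Hedged sketch of asynchronous I/O with memory registration and a
 * completion group.  All dafs_* names are illustrative stand-ins,
 * NOT the actual DAFS API.
 */
#include <stddef.h>
#include <sys/types.h>

typedef struct dafs_handle dafs_handle_t;  /* hypothetical                 */
typedef struct dafs_mr     dafs_mr_t;      /* registered memory            */
typedef struct dafs_cg     dafs_cg_t;      /* completion group             */
typedef struct dafs_iocb   dafs_iocb_t;    /* per-request control block    */

dafs_mr_t *dafs_mem_register(void *buf, size_t len);
dafs_cg_t *dafs_cg_create(void);
int        dafs_async_read(dafs_handle_t *fh, dafs_mr_t *mr, size_t len,
                           off_t off, dafs_cg_t *cg, dafs_iocb_t *iocb);
/* Wait until at least `min` requests in the group have completed. */
int        dafs_cg_wait(dafs_cg_t *cg, int min, dafs_iocb_t **done, int ndone);

enum { NREQ = 8, BLK = 65536 };

void pipelined_reads(dafs_handle_t *fh, char bufs[NREQ][BLK])
{
    dafs_cg_t   *cg = dafs_cg_create();
    dafs_iocb_t  iocb[NREQ];
    dafs_mr_t   *mr[NREQ];

    /* Register (prewire) each buffer once so the server can RDMA into it. */
    for (int i = 0; i < NREQ; i++)
        mr[i] = dafs_mem_register(bufs[i], BLK);

    /* Issue many reads concurrently; none of these calls block. */
    for (int i = 0; i < NREQ; i++)
        dafs_async_read(fh, mr[i], BLK, (off_t)i * BLK, cg, &iocb[i]);

    /* Flexible completion: wait for any one, a specific one, or (here)
     * every request in this completion group. */
    dafs_iocb_t *done[NREQ];
    dafs_cg_wait(cg, NREQ, done, NREQ);
}
```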


Key DAFS API Features

• Batch I/O (sketched after this list)
  – Essentially free I/O: amortizes the cost of I/O issue over many requests
  – Asynchronous notification of any number of completions
  – Scatter/gather file regions and memory regions independently
  – Support for high-latency operations
• Cache hints
• Security and authentication
  – Credentials for multiple users
  – Varying levels of client authentication: none, default, plaintext password, HOSTKEY, Kerberos V, GSS-API
• Abstraction
  – Server discovery, transient failure and recovery, failover, multipathing
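The sketch below shows the shape of a batched read in which file regions and memory regions are listed independently. The structures and the dafs_batch_read call are assumptions made for illustration, not the actual DAFS API.

```c
/*
 * Hedged sketch of batch I/O: one request carrying several
 * file-region / memory-region pairs.  Names are illustrative only.
 */
#include <stddef.h>
#include <sys/types.h>

typedef struct dafs_handle dafs_handle_t;   /* hypothetical                    */
typedef struct dafs_cg     dafs_cg_t;       /* completion group (hypothetical) */

struct dafs_file_region { off_t  offset; size_t length; };  /* where in the file */
struct dafs_mem_region  { void  *addr;   size_t length; };  /* where in memory   */

/* One batched read: file regions and memory regions are listed
 * independently, so their boundaries need not line up. */
int dafs_batch_read(dafs_handle_t *fh,
                    const struct dafs_file_region *fr, int nfr,
                    const struct dafs_mem_region  *mr, int nmr,
                    dafs_cg_t *cg);

void batch_example(dafs_handle_t *fh, dafs_cg_t *cg, char *a, char *b)
{
    /* Two discontiguous file regions ... */
    struct dafs_file_region fr[2] = {
        { .offset = 0,       .length = 8192 },
        { .offset = 1 << 20, .length = 8192 },
    };
    /* ... scattered into two memory regions of different sizes. */
    struct dafs_mem_region mr[2] = {
        { .addr = a, .length = 4096  },
        { .addr = b, .length = 12288 },
    };

    /* A single request issues all of this; completions are reported
     * asynchronously through the completion group, amortizing the
     * per-I/O issue cost across the whole batch. */
    dafs_batch_read(fh, fr, 2, mr, 2, cg);
}
```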


Benchmarks

• Microbenchmarks to measure throughput and cost per operation of DAFS versus traditional network I/O
• Application benchmark to demonstrate the value of modifying an application to use the DAFS API

Benchmark Configuration

• User-space DAFS library, VI provider
• NetApp F840 server, fully cached workload
• Server adapters (GbE):
  – Intel PRO/1000
  – Emulex GN9000 VI/TCP
• Protocols compared: NFSv3/UDP, DAFS
• Sun 280R client
• Client adapters:
  – Sun “Gem 2.0”
  – Emulex GN9000 VI/TCP
• Point-to-point connections


Microbenchmarks

• Measures read performance
• NFS kernel versus DAFS user
• Asynchronous and synchronous
• Throughput versus blocksize
• Throughput versus CPU time
• DAFS advantages are evident:
  – Increased throughput
  – Constant overhead per operation


Microbenchmark Results


Application (GNU gzip)

• Demonstrates benefit of user I/O parallelism
• Read, compress, write a 550 MB file
• gzip modified to use the DAFS API
  – Memory preregistration, asynchronous read and write (see the sketch below)
• 16 KB blocksize
• 1 CPU, 1 process: DAFS advantage
• 2 CPUs, 2 processes: DAFS 2x speedup
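A hedged sketch of the kind of double-buffered pipeline such a modification enables, reusing the illustrative dafs_* stand-ins from the earlier sketches; the paper's actual gzip changes may be structured differently.

```c
/*
 * Hedged sketch: preregistered buffers plus asynchronous reads and
 * writes so that compressing block N overlaps reading block N+1 and
 * writing block N-1.  All dafs_* names are illustrative stand-ins.
 */
#include <stddef.h>
#include <sys/types.h>

typedef struct dafs_handle dafs_handle_t;
typedef struct dafs_mr     dafs_mr_t;
typedef struct dafs_cg     dafs_cg_t;
typedef struct dafs_iocb   dafs_iocb_t;

dafs_mr_t *dafs_mem_register(void *buf, size_t len);
dafs_cg_t *dafs_cg_create(void);
int dafs_async_read(dafs_handle_t *, dafs_mr_t *, size_t, off_t,
                    dafs_cg_t *, dafs_iocb_t *);
int dafs_async_write(dafs_handle_t *, dafs_mr_t *, size_t, off_t,
                     dafs_cg_t *, dafs_iocb_t *);
int dafs_io_wait(dafs_iocb_t *iocb);             /* wait for one request */

size_t compress_block(const char *in, size_t inlen, char *out);  /* gzip core */

enum { BLK = 16384 };   /* 16 KB blocksize, as in the benchmark */

void gzip_pipeline(dafs_handle_t *in, dafs_handle_t *out, size_t nblocks)
{
    static char inbuf[2][BLK], outbuf[2][BLK];   /* assumes compressible data */
    dafs_mr_t *imr[2], *omr[2];
    dafs_iocb_t rd[2], wr[2];
    dafs_cg_t *cg = dafs_cg_create();
    off_t out_off = 0;
    int wr_pending[2] = { 0, 0 };

    for (int i = 0; i < 2; i++) {                /* register once, up front */
        imr[i] = dafs_mem_register(inbuf[i], BLK);
        omr[i] = dafs_mem_register(outbuf[i], BLK);
    }

    dafs_async_read(in, imr[0], BLK, 0, cg, &rd[0]);        /* prime block 0 */

    for (size_t n = 0; n < nblocks; n++) {
        int cur = (int)(n & 1), nxt = (int)((n + 1) & 1);

        if (n + 1 < nblocks)                     /* overlap: start next read */
            dafs_async_read(in, imr[nxt], BLK, (off_t)(n + 1) * BLK, cg, &rd[nxt]);

        dafs_io_wait(&rd[cur]);                  /* block n is now in place  */
        if (wr_pending[cur])                     /* is outbuf[cur] free yet? */
            dafs_io_wait(&wr[cur]);

        size_t clen = compress_block(inbuf[cur], BLK, outbuf[cur]);
        dafs_async_write(out, omr[cur], clen, out_off, cg, &wr[cur]);
        wr_pending[cur] = 1;
        out_off += (off_t)clen;
    }

    for (int i = 0; i < 2; i++)                  /* drain outstanding writes */
        if (wr_pending[i])
            dafs_io_wait(&wr[i]);
}
```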


GNU gzip Runtimes


Conclusion

• The DAFS protocol enables high-performance local file sharing
• The DAFS API leverages the benefits of user-space I/O
• The combination yields significant performance gains for I/O-intensive applications