
Performance of RDMA-Capable Storage Protocols on Wide-Area Network

Weikuan Yu, Nageswara S.V. Rao, Pete Wyckoff*, Jeffrey S. Vetter


  1. Performance of RDMA-Capable Storage Protocols on Wide-Area Network
     Weikuan Yu, Nageswara S.V. Rao, Pete Wyckoff*, Jeffrey S. Vetter
     *Ohio Supercomputer Center
     Managed by UT-Battelle for the Department of Energy

  2. InfiniBand Clusters around the World
     SGI (US), CEA (France), Ranger (US), Tsubame (Japan), Dawning (China), EKA (India)
     PDSW'08, Austin, TX

  3. The Problem of Computing Islands
     • Islands of InfiniBand (IB) clusters
       – More IB clusters are deployed
       – Some already connected, e.g. through TeraGrid
         • But only via TCP/IP protocols
     • Data transfer across these islands
       – Need ever-greater data movement capabilities
       – GridFTP, BBCP or other special storage configuration
       – TCP performance over long distance can be low
         • With 10GigE on UltraScience Net (no tuning)
           – 9.2 Gbps at 0.2 mile
           – 8.2 Gbps at 1400 miles
           – 2.3-2.5 Gbps at 6600+ miles
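One common way to reason about the untuned TCP numbers on slide 3 is the bandwidth-delay product (BDP): a sender whose window is smaller than the BDP cannot keep the link full. The slide does not give this analysis, so the following is only a minimal sketch under stated assumptions: the quoted distances are treated as one-way path lengths, light in fiber is assumed to travel at roughly 2/3 of c, and equipment delays are ignored. The names `rtt_seconds` and `bdp_bytes` are illustrative.

```python
# Rough round-trip time and bandwidth-delay product for the distances on slide 3.
# Assumptions (not from the slides): distances are one-way path lengths,
# propagation speed in fiber is about 2e8 m/s, and device delays are ignored.

MILES_TO_METERS = 1609.344
FIBER_SPEED_M_PER_S = 2.0e8  # assumed, roughly 2/3 of the speed of light


def rtt_seconds(one_way_miles: float) -> float:
    """Propagation-only round-trip time for a path of the given one-way length."""
    return 2.0 * one_way_miles * MILES_TO_METERS / FIBER_SPEED_M_PER_S


def bdp_bytes(link_gbps: float, one_way_miles: float) -> float:
    """Bytes that must be in flight to keep a link of link_gbps full."""
    return link_gbps * 1e9 / 8.0 * rtt_seconds(one_way_miles)


for miles in (0.2, 1400, 6600):
    rtt_ms = rtt_seconds(miles) * 1e3
    bdp_mib = bdp_bytes(10.0, miles) / 2**20
    print(f"{miles:>7} miles: RTT ~ {rtt_ms:8.3f} ms, BDP at 10 Gbps ~ {bdp_mib:8.2f} MiB")
```

Under these assumptions the 6600-mile path has an RTT on the order of 100 ms, which falls inside the 10^4 to 10^5 µsec latency range quoted later in the deck.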

  4. RDMA (IB) in Clusters and Local Area Networks
     • Sub-microsecond latency
     • Superb bandwidth (32 Gbps with IB QDR)
     • Heavily used for clustering
     • Getting popular in storage environments
       – NFS over RDMA (NFSoRDMA)
       – SCSI RDMA Protocol (SRP)
       – iSCSI over RDMA (iSER)
     [Figure: software stack, top to bottom: Applications; MPI and NFS/iSER/SRP; Verbs; InfiniBand HCA. Latency: 1 µsec]
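The 32 Gbps figure on slide 4 can be reproduced from the usual InfiniBand link arithmetic: a 4x link has four lanes, QDR signals at 10 Gbps per lane, and 8b/10b encoding carries 8 data bits in every 10 signaled bits. The slide does not show this derivation; the sketch below, with the illustrative helper `ib_data_rate_gbps`, applies the same arithmetic to the 4x DDR HCAs listed on slide 7.

```python
# Usable data rate of an InfiniBand link with 8b/10b encoding.
# The function name and defaults are illustrative, not from the slides.

def ib_data_rate_gbps(lanes: int, lane_signal_gbps: float,
                      encoding_efficiency: float = 8 / 10) -> float:
    """Data rate = lanes x per-lane signaling rate x encoding efficiency."""
    return lanes * lane_signal_gbps * encoding_efficiency


qdr_4x = ib_data_rate_gbps(lanes=4, lane_signal_gbps=10.0)  # QDR: 10 Gbps per lane
ddr_4x = ib_data_rate_gbps(lanes=4, lane_signal_gbps=5.0)   # DDR: 5 Gbps per lane
print(f"4x QDR data rate: {qdr_4x:.0f} Gbps")
print(f"4x DDR data rate: {ddr_4x:.0f} Gbps")
```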

  5. Sample Performance of RDMA-based Storage
     • RDMA enables good iSCSI bandwidth within LAN
     • Nearly doubled the performance for iSCSI

  6. Feasibility of RDMA (IB) on WAN
     • Long-range extensions for InfiniBand available
       – Network Equipment Technologies (NET): NX5010
       – Obsidian Research: Longbow
     • Long latency (10^4 ~ 10^5 µsec)
     • High bandwidth yet feasible
       – Good distance scalability and tolerance to interfering traffic
       – Good network throughput and MPI-level performance
     • Can RDMA provide a good transport protocol for storage on WAN?
     [Figure: software stack, top to bottom: Applications; MPI and NFS/iSER/SRP; Verbs; InfiniBand HCA. Latency: 10^4 ~ 10^5 µsec]

  7. Experimental Environment
     • Hardware
       – Long-range IB extension devices from NET (Network Equipment Technologies, Inc)
       – Mellanox PCI-Express 4x DDR HCAs (InfiniHost-III and Connect-X)
     • Software Packages
       – OFED-1.3 from openfabrics.org
       – Linux-2.6.25 with NFSoRDMA and iSER support
     • Performance of RDMA-based Storage Protocols on WAN
       – NFS over RDMA
       – iSCSI over RDMA

  8. UltraScience Net at ORNL
     • Experimental WAN Network
       – Oak Ridge, Atlanta, Chicago, Seattle, and Sunnyvale
       – OC192 backbone connections
       – 4300 miles one way, 8600 miles loop-back

  9. RDMA-based Transport
     • Request and reply become pure control messages, and have to travel long distance on WAN
     • Use of RDMA read (round-trip operations) for clients to write data
     • Possible additional control messages for NFSoRDMA for long arguments
     • Further fragmentation due to the use of page-based operations
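To see why the round trips and page-based fragmentation on slide 9 matter on a WAN, one can count one-way network traversals for a single client write. This is a deliberately pessimistic sketch, not the NFSoRDMA or iSER implementation: it assumes the server pulls the data with one RDMA read per page, that those reads are issued one after another with no pipelining, and that transmission time is negligible next to propagation delay. The function names, the 4 KB page size default, and the 100 ms RTT are illustrative.

```python
import math

# Latency-only model of one client write served through server-issued RDMA reads.
# Assumptions (not from the slides): reads are fully serialized, one page per read,
# and transmission time is ignored.

def write_one_way_trips(n_rdma_reads: int) -> int:
    """Request (1) + each RDMA read's request and data response (2) + reply (1)."""
    return 1 + 2 * n_rdma_reads + 1


def write_latency_seconds(block_bytes: int, rtt_s: float, page_bytes: int = 4096) -> float:
    """Propagation-only latency of writing one block under the assumptions above."""
    n_reads = math.ceil(block_bytes / page_bytes)
    return write_one_way_trips(n_reads) * rtt_s / 2.0


rtt = 0.1  # hypothetical 100 ms round trip
for block in (4096, 32768, 1 << 20):
    print(f"{block:>8} B block: ~{write_latency_seconds(block, rtt):7.2f} s per write")
```

In this model a single-page write already costs two full round trips, and every extra page adds another, which is one way to read the slide's concern about fragmentation. Real implementations can overlap reads, so the numbers are an upper bound for illustration only.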

  10. RDMA on WAN
     • RDMA has good network-level performance within short-distance WAN
     • High bandwidth at long distance is only possible for large messages
     • Low RDMA-read performance for page-based messages (4KB), even at 0.2 mile when using InfiniHost-III HCAs
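The message-size effect on slide 10 can be illustrated with a simple model in which each message costs one round trip plus its transmission time, and a fixed number of messages may be outstanding at once. This is a sketch under assumptions rather than a reproduction of the measurements: the 10 Gbps link rate, the 100 ms RTT, and the name `throughput_bps` are all hypothetical.

```python
# Simplified throughput model: each message takes (RTT + size / link rate),
# and `outstanding` messages may be in flight concurrently.
# All parameter values below are illustrative, not measured.

def throughput_bps(msg_bytes: int, rtt_s: float, link_bps: float,
                   outstanding: int = 1) -> float:
    """Achievable throughput in bits per second, capped at the link rate."""
    bits = msg_bytes * 8
    per_message_s = rtt_s + bits / link_bps
    return min(link_bps, outstanding * bits / per_message_s)


LINK_BPS = 10e9  # hypothetical 10 Gbps path
RTT_S = 0.1      # hypothetical 100 ms round trip

for size in (4096, 64 * 1024, 1 << 20, 1 << 30):
    mbps = throughput_bps(size, RTT_S, LINK_BPS) / 1e6
    print(f"{size:>11} B messages: ~{mbps:10.2f} Mbps")
```

With one message outstanding, 4 KB messages stay far below the link rate at this RTT, while very large messages approach it. Raising `outstanding` has the same effect as raising the message size, which is why both larger messages and deeper pipelining are natural optimizations to consider.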

  11. NFS over RDMA
     • NFS over RDMA achieves good bandwidth within short distance
     • But significant optimizations are needed for long distance

  12. NFS - Large block size
     • NFS over IPoIB-CM benefits from large block size
     • NFS over RDMA needs to support large block size for better fit on long-distance WAN

  13. NFS over RDMA - using Connect-X
     • Better RDMA read in Connect-X improves the performance of file write for NFS over RDMA
     • Performance at long distance is yet to be determined

  14. iSCSI over RDMA (iSER)
     • RDMA enables high-performance iSCSI within short distance
     • RDMA has good promise over long distance as shown with large messages
