SLIDE 1

R&D Activities on Storage in CERN-IT’s FIO group

  • Helge Meinhard / CERN-IT

HEPiX Fall 2009, LBNL, 27 October 2009

SLIDE 2

Outline

Follow-up of two presentations at the Umeå meeting:

  • iSCSI technology (Andras Horvath)
  • Lustre evaluation project (Arne Wiebalck)

SLIDE 3

iSCSI - Motivation

  • Three approaches

– Possible replacement for rather expensive setups with Fibre Channel SANs (used e.g. for physics databases with Oracle RAC, and for backup infrastructure) or proprietary high-end NAS appliances

  • Potential cost-saving

– Possible replacement for bulk disk servers (Castor)

  • Potential gain in availability, reliability and flexibility

– Possible use for applications for which small disk servers have been used in the past

  • Potential gain in flexibility, cost-saving
  • Focus is on functionality, robustness and large-scale deployment rather than ultimate performance

SLIDE 4

iSCSI terminology

  • iSCSI is a set of protocols for block-level access to storage

– Similar to FC
– Unlike NAS (e.g. NFS)

  • “Target”: storage unit listening to block-level requests

– Appliances available on the market
– Do-it-yourself: put software stack on storage node, e.g. our storage-in-a-box nodes

  • “Initiator”: unit sending block-level requests (e.g. read, write) to the target (see the sketch below)

– Most modern operating systems feature an iSCSI initiator stack: Linux RH4, RH5; Windows
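To make the initiator side concrete, here is a minimal sketch (an illustration, not from the original slides) of discovering and logging in to a target with the standard Open-iSCSI iscsiadm tool; the portal address and target IQN are made-up placeholders.

    #!/usr/bin/env python
    # Sketch (assumption, not from the slides): log an Open-iSCSI initiator
    # in to an iSCSI target using the standard iscsiadm CLI.
    # The portal address and target IQN below are made-up placeholders.
    import subprocess

    PORTAL = "192.0.2.10:3260"                      # hypothetical target portal
    TARGET = "iqn.2009-10.ch.cern:storage.example"  # hypothetical target IQN

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # SendTargets discovery: ask the portal which targets it offers
    run(["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])

    # Log in to the target; a new block device (e.g. /dev/sdX) then appears
    run(["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])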

SLIDE 5

Hardware used

  • Initiators: a number of different servers, including

– Dell M610 blades
– Storage-in-a-box servers
– All running SLC5

  • Targets:

– Dell Equallogic PS5000E (12 drives, 2 controllers with 3 GigE each)
– Dell Equallogic PS6500E (48 drives, 2 controllers with 4 GigE each)
– Infortrend A12E-G2121 (12 drives, 1 controller with 2 GigE)
– Storage-in-a-box: various models with multiple GigE or 10GigE interfaces, running Linux

  • Network (if required): private, HP ProCurve 3500 and 6600

SLIDE 6

Target stacks under Linux

  • RedHat Enterprise 5 comes with tgtd

– Single-threaded
– Does not scale well

  • Tests with IET

– Multi-threaded
– No performance limitation in our tests
– Required a newer kernel to work out of the box (Fedora and Ubuntu Server worked for us)

  • In the context of the collaboration between CERN and Caspur, work is going on to understand the steps needed for backporting IET to RHEL 5 (a minimal target definition is sketched below)
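For illustration only (not from the slides), an IET target is typically described by a short entry in /etc/ietd.conf; the sketch below appends one such entry, with the IQN and device path as made-up placeholders.

    #!/usr/bin/env python
    # Sketch (assumption): append a minimal iSCSI Enterprise Target (IET)
    # definition to /etc/ietd.conf, exporting one block device as LUN 0.
    # The IQN and device path are made-up placeholders.

    IQN = "iqn.2009-10.ch.cern:storage.disk1"   # hypothetical target name
    DEVICE = "/dev/sdb"                          # hypothetical exported disk

    entry = "Target %s\n        Lun 0 Path=%s,Type=blockio\n" % (IQN, DEVICE)

    with open("/etc/ietd.conf", "a") as conf:
        conf.write(entry)

    # Restarting the IET daemon afterwards makes the LUN visible to initiators.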

SLIDE 7

Performance comparison

  • 8k random I/O test with the Oracle tool Orion

SLIDE 8

Performance measurement

  • 1 server, 3 storage-in-a-box servers as targets

– Each target exporting 14 JBOD disks over 10GigE

SLIDE 9

Almost production status…

  • Two storage-in-a-box servers with hardware RAID 5, running SLC5 and tgtd on GigE

– Initiator provides multipathing and software RAID 1 (see the sketch below)
– Used for some grid services
– No issues

  • Two Infortrend boxes (JBOD configuration)

– Again, the initiator provides multipathing and software RAID 1
– Used as backend storage for the Lustre MDT (see next part)

  • Tools for setup, configuration and monitoring in place
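As a hedged illustration of the multipathing plus software RAID 1 layering mentioned above (not from the slides; device and mount point names are placeholders), the initiator side could be assembled roughly like this:

    #!/usr/bin/env python
    # Sketch (assumption): on the initiator, mirror two iSCSI-backed
    # multipath devices with Linux software RAID 1 using mdadm.
    # Device and mount point names are made-up placeholders.
    import subprocess

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # Show the multipath topology built by device-mapper-multipath
    run(["multipath", "-ll"])

    # Mirror one LUN from each target so that a whole target box can fail
    run(["mdadm", "--create", "/dev/md0", "--level=1", "--raid-devices=2",
         "/dev/mapper/target_a_lun0", "/dev/mapper/target_b_lun0"])

    # Put a filesystem on the mirror and mount it for the service
    run(["mkfs.ext3", "/dev/md0"])
    run(["mount", "/dev/md0", "/srv/service"])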

SLIDE 10

Being worked on

  • Large deployment of Equallogic ‘Sumos’ (48 drives of 1 TB each, dual controllers, 4 GigE/controller): 24 systems, 48 front-end nodes

  • Experience encouraging, but there are issues

– Controllers don’t support DHCP, manual configuration required
– Buggy firmware
– Problems with batteries on controllers
– Support not fully integrated into Dell structures yet

– Remarkable stability

  • We have failed all network and server components that can fail; the boxes kept running

– Remarkable performance

SLIDE 11

Equallogic performance

  • 16 servers, 8 sumos, 1 GigE per server, iozone

SLIDE 12

Appliances vs. home-made

  • Appliances

– Stable
– Performant
– Highly functional (Equallogic: snapshots, relocation without server involvement, automatic load balancing, …)

  • Home-made with storage-in-a-box servers

– Inexpensive
– Complete control over configuration
– Can run other things than the target software stack
– Can select function at software install time (iSCSI target vs. classical disk server with rfiod or xrootd)

SLIDE 13

Ideas (partly started testing)

  • Two storage-in-a-box servers as a highly redundant setup

– Running target and initiator stacks at the same time
– Mounting half the disks locally, half on the other machine
– A heartbeat detects failures and (e.g. by resetting an IP alias) moves functionality to one or the other box (see the sketch below)
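A hedged sketch (illustration only, not from the slides) of such a heartbeat: each box periodically checks its partner and, if the partner stops responding, takes over a floating service IP alias. Addresses and the interface name are made-up placeholders; a real deployment would rather use a dedicated heartbeat/cluster package.

    #!/usr/bin/env python
    # Sketch (assumption): trivial heartbeat between the two boxes.  If the
    # partner stops answering pings, take over the shared service IP alias.
    # Addresses and interface name are made-up placeholders.
    import subprocess
    import time

    PARTNER_IP = "192.0.2.11"        # hypothetical address of the other box
    SERVICE_ALIAS = "192.0.2.50/24"  # hypothetical floating service address
    INTERFACE = "eth0"

    def partner_alive():
        # One ping with a short timeout is the whole "heartbeat" here
        return subprocess.call(["ping", "-c", "1", "-W", "2", PARTNER_IP]) == 0

    while True:
        if not partner_alive():
            # Bring the service alias up locally; clients reconnect to it here
            subprocess.call(["ip", "addr", "add", SERVICE_ALIAS, "dev", INTERFACE])
            break
        time.sleep(5)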

  • Several storage-in-a-box servers as targets

– Exporting disks either as JBOD or as RAID
– Front-end server creates software RAID (e.g. RAID 6) over volumes from all storage-in-a-box servers (see the sketch below)
– Any one (or two with SW RAID 6) storage-in-a-box server can fail entirely; the data remain available
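As a hedged illustration of this second idea (not from the slides; device names are placeholders), the front-end server could assemble a software RAID 6 array over one iSCSI-backed volume from each storage-in-a-box target:

    #!/usr/bin/env python
    # Sketch (assumption): build a software RAID 6 array over one iSCSI-backed
    # volume from each of several target servers, so that one or two entire
    # target boxes can fail without losing data.
    # Device names are made-up placeholders for the iSCSI block devices.
    import subprocess

    ISCSI_VOLUMES = [          # one LUN from each storage-in-a-box target
        "/dev/mapper/box01_lun0",
        "/dev/mapper/box02_lun0",
        "/dev/mapper/box03_lun0",
        "/dev/mapper/box04_lun0",
        "/dev/mapper/box05_lun0",
        "/dev/mapper/box06_lun0",
    ]

    cmd = ["mdadm", "--create", "/dev/md10", "--level=6",
           "--raid-devices=%d" % len(ISCSI_VOLUMES)] + ISCSI_VOLUMES
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)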

SLIDE 14

Lustre Evaluation Project

  • Tasks and goals

– Evaluate Lustre as a candidate for storage consolidation

  • Home directories
  • Project space
  • Analysis space
  • HSM

– Reduce service catalogue

  • Increase overlap between service teams
  • Integrate with CERN fabric management tools

SLIDE 15

Areas of Interest (1/2)

  • Installation

– Quattorized installation of Lustre instances
– Client RPMs for SLC5

  • Backup

– LVM-based snapshots for metadata (see the sketch after this list)
– Tested with TSM, set up for PPS instance
– Changelogs feature of v2.0 not yet usable

  • Strong Authentication

– v2.0: early adaptation, full Kerberos Q1/2011
– Tested & used by other sites (not by us yet)

  • Fault-tolerance

– Lustre comes with built-in failover
– PPS MDS iSCSI setup
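As a hedged illustration of the LVM-snapshot approach to metadata backup mentioned above (not from the slides; volume group, logical volume, mount point and backup invocation are placeholders/assumptions), the MDT volume could be snapshotted and handed to the backup client roughly like this:

    #!/usr/bin/env python
    # Sketch (assumption): take an LVM snapshot of the volume backing the
    # Lustre MDT, mount it read-only for backup, then drop the snapshot.
    # Volume group, LV and mount point names are made-up placeholders.
    import subprocess

    def run(cmd):
        print("+ " + " ".join(cmd))
        subprocess.check_call(cmd)

    # Point-in-time snapshot of the MDT logical volume
    run(["lvcreate", "--snapshot", "--size", "10G",
         "--name", "mdt_snap", "/dev/vg_mdt/mdt"])

    # Mount the snapshot read-only (assuming an ldiskfs-backed MDT)
    run(["mount", "-t", "ldiskfs", "-o", "ro",
         "/dev/vg_mdt/mdt_snap", "/mnt/mdt_snap"])

    # Hand the mounted snapshot to the TSM command-line client
    run(["dsmc", "incremental", "/mnt/mdt_snap/"])

    # Clean up: unmount and remove the snapshot
    run(["umount", "/mnt/mdt_snap"])
    run(["lvremove", "-f", "/dev/vg_mdt/mdt_snap"])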

SLIDE 16

FT: MDS PPS Setup

[Diagram: Dell Equallogic iSCSI arrays (16x 500 GB SATA) and a Dell PowerEdge M600 blade server (16 GB) on a private iSCSI network, connecting MDS/MDT, OSS and client nodes]

Fully redundant against component failure

– iSCSI for shared storage
– Linux device mapper + md for mirroring
– Quattorized
– Needs testing

SLIDE 17

Areas of Interest (2/2)

  • Special performance & optimization

– Small files: “numbers dropped from slides”
– Postmark benchmark (not done yet)

  • HSM interface

– Active development, driven by CEA
– Access to Lustre HSM code (to be tested with TSM/CASTOR)

  • Life Cycle Management (LCM) & Tools

– Support for day-to-day operations?
– Limited support for setup, monitoring and management

SLIDE 18

Findings and Thoughts

  • No strong authentication as of now

– Foreseen for Q1/2011

  • Strong client/server coupling

– Recovery

  • Very powerful users

– Striping, Pools

  • Missing support for life cycle management

– No user-transparent data migration
– Lustre/kernel upgrades difficult

  • Moving targets on the roadmap

– V2.0 not yet stable enough for testing

SLIDE 19

Summary

  • Some desirable features not there (yet)

– Wish list communicated to SUN
– SUN interested in evaluation

  • Some more tests to be done

– Kerberos, Small files, HSM

  • Documentation