A virtualized approach to Mass Storage System



SLIDE 1

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

STEINBUCH CENTRE FOR COMPUTING

www.kit.edu

A virtualized approach to Mass Storage System

Dorin Lobontu, Jos van Wezel and Martin Beitzinger

SLIDE 2

Steinbuch Centre for Computing 2 Dorin Lobontu

Presentation Overview

01.06.2011

  • GridKa Storage Overview
  • TSM as Management System for MSS
  • Tape Library Virtualization with ERMM
  • Tape Reports
SLIDE 3

GridKa Storage Overview

  • optimized tape access
  • keep file copies for performance improvement
  • write on tape by TSM

NameSpace Operations, Access Controls, Storage Management, Pool Management

LHC-Centers

  • temporary storage
  • analysis
  • monte-carlo-simulation

GridFTP

  • an extension of the standard FTP for Grid applications
  • authentication over GSI (Grid Security Infrastructure)
  • encryption by SSL
  • partial file transfer
  • automatic TCP optimization
  • parallel and striped transfers

dCache

dCache is a storage management system:

  • manages a large amount of data
  • stores data on distributed media (disk, tape)
  • hierarchical storage management
  • has automatic load balancing

GridKa Storage System

  • 75 fileservers in 3 dCache installations
  • 3 tape libraries - 10 PB tape capacity
  • 8 PB disk capacity

Diagram labels: MSS, 350 MB/s; Read-Pools, Write-Pools, Stage-Pools; 1 GB/s, 1 GB/s

SLIDE 4

MSS Requirements

Components of the Mass Storage System (diagram): Library manager; three dCache nodes and two xrootd nodes, each with TSS+STA+DM; LSDF Clients; dCache Arch. Manager, Xrootd Arch. Manager, LSDF Arch. Manager

  • MSS has to have a scalable architecture
  • MSS has to decouple tape resources and applications
  • MSS has to share the same resources among different applications
  • MSS has to provide security mechanisms to grant or deny applications access to its resources

SLIDE 5

Presentation Overview

  • GridKa Storage Overview
  • TSM as Management System for MSS
  • Tape Library Virtualization with ERMM
  • Tape Reports
SLIDE 6

TSM as Library Manager

Diagram components: TSM Server & Library Manager; libraries IBM TS3500, Grau ITL-XL, STK SL-8500; four dCachePool nodes, each with tss and StorageAgent

  • on the TSM server, one path must be defined for every agent and every tape drive (65 agents x 26 drives = 1690 paths)
  • these paths must be maintained manually
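
The path count quoted above follows from pairing every storage agent with every tape drive. A minimal sketch of that arithmetic is given here; the function name tsm_path_count is hypothetical and is not part of TSM.

```python
def tsm_path_count(agents: int, drives: int) -> int:
    """Number of paths when every agent is paired with every drive."""
    return agents * drives


agents, drives = 65, 26  # figures taken from the slide
total = tsm_path_count(agents, drives)
print(total)  # 1690

# Adding a single drive means one new path per agent.
extra = tsm_path_count(agents, drives + 1) - total
print(extra)  # 65
```

The second figure illustrates why manual maintenance scales poorly: every drive added or replaced touches one path definition per agent.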
SLIDE 7

Distributing Data over all Libraries

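Diagram components: dCache, TSS, TSM; libraries Grau ITL-XL, IBM TS3500, STK SL-8500

dCache storage classes: StorageClass1, StorageClass2, StorageClassN, StorageClassX

TSS mapping:
StorageClass1 <-> TSM MGMTC1
StorageClass2 <-> TSM MGMTC2
StorageClassN <-> TSM MGMTCN

TSM management classes and storage pools: MGMTC1 / STGPOOL1, MGMTC2 / STGPOOL2, MGMTCN / STGPOOLN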

  • data is statically distributed by TSS (Tape Staging Server) over the libraries
  • drive load balancing is not possible
  • a library crash interrupts the processes assigned to this library
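
The static distribution described above can be pictured as a fixed lookup table. The sketch below is illustrative only: the pairing of each storage pool with a particular device class, and of each device class with a library, is an assumption made for the example, and the function names are hypothetical.

```python
# Assumed static routing: storage class -> (management class, storage pool, device class).
# The slide names these objects but does not state which pool uses which device class.
STATIC_ROUTES = {
    "StorageClass1": ("MGMTC1", "STGPOOL1", "DevC-Grau"),
    "StorageClass2": ("MGMTC2", "STGPOOL2", "DevC-IBM"),
    "StorageClassN": ("MGMTCN", "STGPOOLN", "DevC-STK"),
}

# Assumed device class -> library mapping, inferred from the names only.
DEVICE_CLASS_LIBRARY = {
    "DevC-Grau": "Grau ITL-XL",
    "DevC-IBM": "IBM TS3500",
    "DevC-STK": "STK SL-8500",
}


def library_for(storage_class: str) -> str:
    """Return the one library a storage class is bound to."""
    _mgmt_class, _pool, device_class = STATIC_ROUTES[storage_class]
    return DEVICE_CLASS_LIBRARY[device_class]


def affected_storage_classes(failed_library: str) -> list[str]:
    """Storage classes whose processes stop when a library fails."""
    return [sc for sc in STATIC_ROUTES if library_for(sc) == failed_library]


print(library_for("StorageClass1"))            # Grau ITL-XL
print(affected_storage_classes("IBM TS3500"))  # ['StorageClass2']
```

Because each storage class resolves to exactly one library, there is no alternative target when that library is busy or down, which is the limitation the bullets describe.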

Device classes: DevC-Grau, DevC-IBM, DevC-STK

SLIDE 8

Presentation Overview

  • GridKa Storage Overview
  • TSM as Management System for MSS
  • Tape Libraries Virtualization with ERMM
  • Tape Reports

SLIDE 9

ERMM as Library Manager

Diagram components: ERMM, TSM; libraries IBM TS3500, Grau ITL-XL, STK SL-8500; four dCache-Pool nodes, each with tss, StorageAgent and ERMM-Client

ERMM:
  • takes over the entire management of the libraries
  • coordinates access to drives and tapes
  • logs all activities in its own DB2 database
  • provides a single point of control for tape resources
SLIDE 10

Distributing Data over all Libraries

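Diagram components: dCache, TSS, TSM, ERMM

dCache storage classes: StorageClass1, StorageClass2, StorageClassX, StorageClassN

TSS mapping:
StorageClass1 <-> TSM MGMTC1
StorageClass2 <-> TSM MGMTC2
StorageClassN <-> TSM MGMTCN

TSM management classes and storage pools: MGMTC1 / STGPOOL1, MGMTC2 / STGPOOL2, MGMTCN / STGPOOLN

TSM device class: DevC-LTO

ERMM: drives group, tapes group; libraries STK, GRAU, IBM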

  • TSM has only one external library
  • TSM defines only one path for every storage agent to the external library
  • ERMM dynamically maintains all paths from the storage agents to all drives
  • ERMM spreads the data over all physical libraries
  • ERMM performs dynamic drive load balancing
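
The slides do not say which policy ERMM uses to balance drives. As a rough illustration of what dynamic balancing across one shared drive group could look like, the sketch below picks a free drive from the library with the fewest busy drives. The Drive class, the pick_drive function and the drive names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Drive:
    name: str
    library: str
    busy: bool = False


def pick_drive(drives: Iterable[Drive],
               offline_libraries: frozenset = frozenset()) -> Optional[Drive]:
    """Return a free drive from the least busy online library, or None."""
    drives = list(drives)
    busy_per_library: dict[str, int] = {}
    for d in drives:
        if d.library in offline_libraries:
            continue
        busy_per_library.setdefault(d.library, 0)
        if d.busy:
            busy_per_library[d.library] += 1
    candidates = [d for d in drives
                  if not d.busy and d.library not in offline_libraries]
    if not candidates:
        return None
    # Prefer the library with the fewest busy drives; break ties by name.
    return min(candidates, key=lambda d: (busy_per_library[d.library], d.name))


pool = [
    Drive("ibm-1", "IBM", busy=True),
    Drive("ibm-2", "IBM"),
    Drive("stk-1", "STK"),
    Drive("grau-1", "GRAU", busy=True),
]
print(pick_drive(pool).name)                                 # stk-1
print(pick_drive(pool, offline_libraries={"STK"}).name)      # ibm-2
```

The second call shows the contrast with static distribution: when one library is unavailable, requests can still be served by a drive in another library.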
SLIDE 11

Presentation Overview

  • GridKa Storage Overview
  • TSM as Management System for MSS
  • Tape Libraries Virtualization with ERMM
  • Tape Reports

SLIDE 12

Collecting Statistics Data

Diagram labels: sense data request, ERMM event, sense data; ERMM, pipe, collector; TSM, dCache

  • archive DB
  • one external library
  • no drive
  • no scratch
  • library manager
  • all drives
  • all cartridges
  • temporary DCA

Mass Storage System

MySQL DB

  • drive information
  • library information
  • cartridge information
  • drive cartridge access record for every operation
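
The deck lists what the statistics database holds but not its schema. The sketch below shows one possible layout, using SQLite from the Python standard library as a stand-in for MySQL. All table and column names are assumptions; the per-mount fields in the dca table (mount times, bytes, soft and hard errors) mirror the kinds of values a drive cartridge access record would need for reporting.

```python
import sqlite3

# Hypothetical schema; SQLite stands in for the MySQL DB named on the slide.
SCHEMA = """
CREATE TABLE library (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE drive (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    library_id INTEGER NOT NULL REFERENCES library(id)
);
CREATE TABLE cartridge (
    id         INTEGER PRIMARY KEY,
    label      TEXT NOT NULL UNIQUE,
    library_id INTEGER NOT NULL REFERENCES library(id)
);
-- One drive cartridge access (DCA) record per mount operation.
CREATE TABLE dca (
    id            INTEGER PRIMARY KEY,
    drive_id      INTEGER NOT NULL REFERENCES drive(id),
    cartridge_id  INTEGER NOT NULL REFERENCES cartridge(id),
    mount_time    TEXT NOT NULL,
    unmount_time  TEXT,
    bytes_read    INTEGER NOT NULL DEFAULT 0,
    bytes_written INTEGER NOT NULL DEFAULT 0,
    soft_errors   INTEGER NOT NULL DEFAULT 0,
    hard_errors   INTEGER NOT NULL DEFAULT 0
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)  # ['cartridge', 'dca', 'drive', 'library']
```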
SLIDE 13

Generate Tape Reports

Statistics generator (Perl program)

Statistics:

  • throughput reports per drive, cartridge, library and time unit
  • number of mounts per drive, cartridge, library and time unit
  • number of concurrent drives in use per library and time unit
  • error reports per drive, cartridge, library and time unit

Plot generator: graphics

Complete history of Drive Cartridge Access:

  • amount of data written/read per mount
  • mount and unmount time
  • number of soft/hard errors per mount
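
The statistics generator in the talk is a Perl program whose code is not shown. The following Python sketch only illustrates how mount counts and throughput per drive could be derived from per-mount access records. The Access record layout and the report_per_drive function are hypothetical, and using the mounted time as the transfer time is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Access:
    drive: str
    cartridge: str
    library: str
    mounted: datetime
    unmounted: datetime
    bytes_moved: int  # bytes read plus bytes written during the mount


def report_per_drive(records):
    """Return {drive: (number of mounts, mean throughput in MB/s)}."""
    mounts = defaultdict(int)
    volume = defaultdict(int)
    seconds = defaultdict(float)
    for r in records:
        mounts[r.drive] += 1
        volume[r.drive] += r.bytes_moved
        # Assumption: the whole mounted interval counts as transfer time.
        seconds[r.drive] += (r.unmounted - r.mounted).total_seconds()
    return {
        d: (mounts[d], volume[d] / 1e6 / seconds[d] if seconds[d] else 0.0)
        for d in mounts
    }


records = [
    Access("d1", "A00001", "IBM",
           datetime(2011, 6, 1, 10, 0, 0), datetime(2011, 6, 1, 10, 1, 40),
           8_000_000_000),
    Access("d1", "A00002", "IBM",
           datetime(2011, 6, 1, 11, 0, 0), datetime(2011, 6, 1, 11, 1, 40),
           4_000_000_000),
]
report = report_per_drive(records)
print(report)  # {'d1': (2, 60.0)}
```

Grouping by cartridge, library or time unit would follow the same pattern with a different key.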

MySQL DB, Web

SLIDE 14

Activity Reports

DriveActivity LibraryActivity VolumeInfo Home

SLIDE 15

Activity Reports

DriveActivity LibraryActivity VolumeInfo Home

SLIDE 16

Activity Reports

DriveActivity LibraryActivity VolumeInfo Home

SLIDE 17

Error Reports - per Library per month

iwr_grau1_lto3 (16 drives), iwr_grau1_lto4 (8 drives)

SLIDE 18

Tape Errors

  • Since November 2009, about 100 cartridges removed due to increasing correctable errors (~25 LTO3 from a total of ~5000, ~75 LTO4 from a total of ~5000)
  • 4 drives (from ~64) replaced due to bad performance and increasing error rate
  • Lost 4 cartridges with internal label destroyed (TSM: ANR8355E Error reading label for volume …)
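
To put the counts above in proportion, the short calculation below converts them into percentages. The inputs are the approximate figures from the slide, so the results are approximate as well; the helper name percent is hypothetical.

```python
def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, in percent, rounded to two decimals."""
    return round(100 * part / total, 2)


# Approximate figures from the slide (all marked with "~" there).
lto3_removed = percent(25, 5000)
lto4_removed = percent(75, 5000)
drives_replaced = percent(4, 64)

print(lto3_removed)     # 0.5
print(lto4_removed)     # 1.5
print(drives_replaced)  # 6.25
```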
