Research Project 2: Concept Storage Area Network Health Status Monitor



SLIDE 1

Concept

Storage Area Network Health Status Monitor

Adriaan van der Zee, Yanick de Jong
Research Project 2

Amsterdam, 1 July 2009


SLIDE 2

Content

- The organisation
- The project
- Storage infrastructure, physical and logical
- Problem conditions and indicators
- Health status levels
- Instant and historical status reports
- Conclusions
- Future work
- Questions

SLIDE 3

The organisation

- KLM IS delivers ICT services to KLM’s business processes
  - Electronic booking, online check-in, …
  - Primarily database and web applications
- Different platforms (UNIX, Linux, Windows) are managed by their own departments
- A central Fibre Channel Storage Area Network (SAN) with connected storage systems is managed by the SAN department

SLIDE 4

The project

- Each department monitors its own systems to support its own daily operations
- As a result, the SAN department does not see storage-related problems experienced by hosts
- A better understanding of the storage infrastructure’s health is desired

SLIDE 5

Problem definition

How can an alarm system be created that monitors the long term as well as immediate health of a Fibre Channel fabric?

- What indicators are relevant for the health of the Fibre Channel fabric, and where can they be found?
- What are the important interrelations between such indicators, and how can they be quantified?
- What kind of health status levels can be defined, and by which indicators and thresholds are they reached?

SLIDE 6

Storage infrastructure (physical)

SLIDE 7

Storage infrastructure (logical, 1)

- One or more hosts can share one or more HBAs (Host Bus Adapters), and each HBA can have one or more host ports connected to a switch port. Such a connection is a host link.
- One or more hosts share one or more LUNs.
- A fabric consists of one or more interconnected switches and also includes all connected host ports and storage ports.
- A switch has one or more switch blades, each of which contains one or more switch ports.
- An ISL (Inter-Switch Link) is a link that connects a switch port to a switch port on another switch; both switches are therefore by definition in the same fabric.
- A storage subsystem contains one or more LUNs, which can be made available via one or more storage ports that are connected to a switch port. Such a connection is a storage link.
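The definitions above form a small relational model. The Python sketch below encodes them as dataclasses and enforces the two ISL constraints from the slide. All class and field names are hypothetical, and tagging each switch port with its fabric is a simplifying assumption (strictly, fabric membership follows from how switches are interconnected by ISLs).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SwitchPort:
    """A port on a switch blade; the fabric tag is a simplifying assumption."""
    fabric: str
    switch: str
    blade: int
    port: int


@dataclass
class HostLink:
    """A host port on an HBA connected to a switch port."""
    hba: str
    host_port: str
    switch_port: SwitchPort


@dataclass
class StorageLink:
    """A storage port on a storage subsystem connected to a switch port."""
    subsystem: str
    storage_port: str
    switch_port: SwitchPort


@dataclass
class ISL:
    """Inter-Switch Link between ports on two different switches."""
    a: SwitchPort
    b: SwitchPort

    def __post_init__(self) -> None:
        # Enforce the constraints stated in the definition of an ISL.
        if self.a.switch == self.b.switch:
            raise ValueError("an ISL must connect two different switches")
        if self.a.fabric != self.b.fabric:
            raise ValueError("both ends of an ISL must be in the same fabric")


@dataclass
class Host:
    """A host; HBAs and LUNs are referenced by name because they can be shared."""
    name: str
    hbas: set = field(default_factory=set)
    luns: set = field(default_factory=set)
```

For example, `ISL(SwitchPort("fabric0", "sw1", 1, 0), SwitchPort("fabric1", "sw2", 1, 0))` raises `ValueError`, because the two ports lie in different fabrics.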

SLIDE 8

Storage infrastructure (logical, 2)

SLIDE 9

Problem conditions

- Hardware failure
- Capacity shortage
- Reduced redundancy of load-balanced components poses an extra risk
  - Can be caused by hardware failure

SLIDE 10

Problem indicators

- DCB error
- Path failure
- Mirror out of sync
- Frame discard
- Over-utilisation
- Hardware failure
- Port latency

SLIDE 11

Relating problem indicators (1)

- An established problem can be related to other components
  - A failed storage port on the fabric can be related to a number of affected hosts
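Relating a failed storage port to the affected hosts amounts to a join over the component relations. The sketch below assumes a hypothetical inventory in which each storage port maps to the LUNs it makes available and each host maps to the LUNs it uses. It deliberately ignores zoning, LUN masking and host-side multipathing. The second function separates hosts that lose every path to a LUN from hosts that only lose redundancy, which matches the "reduced redundancy" problem condition.

```python
# Hypothetical inventory; port, LUN and host names are illustrative only.
PORT_LUNS = {
    "stor1:p0": {"lun1", "lun2"},
    "stor1:p1": {"lun2", "lun3"},
}
HOST_LUNS = {
    "hostA": {"lun1"},
    "hostB": {"lun2"},
    "hostC": {"lun3"},
}


def affected_hosts(failed_port, port_luns, host_luns):
    """Hosts using at least one LUN made available via the failed port."""
    luns = port_luns.get(failed_port, set())
    return {host for host, used in host_luns.items() if used & luns}


def hosts_without_path(failed_port, port_luns, host_luns):
    """Hosts using a LUN that no remaining storage port still exposes."""
    remaining = set().union(
        *(luns for port, luns in port_luns.items() if port != failed_port)
    )
    return {host for host, used in host_luns.items() if used - remaining}
```

With this inventory, a failure of `stor1:p0` affects `hostA` and `hostB`, but only `hostA` loses its last path; `hostB` still reaches `lun2` via `stor1:p1` and merely runs with reduced redundancy.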

SLIDE 12

Relating problem indicators (2)

- From some problem indicators, more specific relations can be found
  - A DCB error points to a storage port
  - A relation between DCB errors and frame discards on that storage port can then be confirmed or denied
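The slides do not say how such a relation is confirmed or denied. One simple option, shown here as a hypothetical sketch, is a time-window co-occurrence rule: the relation counts as confirmed for a storage port if at least one frame discard on that port follows a DCB error on the same port within a fixed window. The function name and the 5-minute default window are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def relation_confirmed(dcb_errors, frame_discards, window=timedelta(minutes=5)):
    """True if a frame discard follows a DCB error within `window`.

    Both arguments are lists of datetimes observed on the same storage port.
    The window-based rule is an assumption, not a method from the slides.
    """
    return any(
        timedelta(0) <= discard - error <= window
        for error in dcb_errors
        for discard in frame_discards
    )


t0 = datetime(2009, 7, 1, 12, 0)
print(relation_confirmed([t0], [t0 + timedelta(minutes=2)]))
```

A discard before the error, or one outside the window, does not count; a real implementation would also need a minimum number of co-occurrences to avoid confirming a relation on a single coincidence.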

SLIDE 13

Health Status Levels (1)

- No problems
- Problems with no impact
- Limited impact
- Severe impact

Per fabric, as well as in total
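The slides order the levels by impact but do not state how several simultaneous problems in one fabric are reduced to a single level. A natural assumption, sketched below, is that a fabric is only as healthy as its worst active problem. The `Level` encoding and the `fabric_status` function are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """Health status levels, ordered by impact (hypothetical encoding)."""
    NO_PROBLEMS = 0
    NO_IMPACT = 1
    LIMITED_IMPACT = 2
    SEVERE_IMPACT = 3


def fabric_status(problem_levels):
    """Per-fabric status: the worst level among active problems (assumed rule)."""
    return max(problem_levels, default=Level.NO_PROBLEMS)


print(fabric_status([Level.NO_IMPACT, Level.LIMITED_IMPACT]).name)
```

A fabric without any active problem falls back to `NO_PROBLEMS` through the `default` argument of `max`.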

SLIDE 14

Health Status Levels (2)

Total status value per combination of fabric statuses (rows: Fabric 0, columns: Fabric 1):

Fabric 0 \ Fabric 1   No problems   No impact   Limited impact   Severe impact
No problems                     1           2                4               8
No impact                       2           4                8              16
Limited impact                  4           8               16              32
Severe impact                   8          16               32              64
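Every entry in the table equals the product of two per-fabric weights, 1, 2, 4 and 8 for the four levels (read off the first row and column), so the total can be computed rather than looked up. The sketch below uses that observation; the weight dictionary and function name are hypothetical, and the product rule is inferred from the table, not stated on the slide.

```python
# Per-fabric weights read off the first row and column of the table.
WEIGHTS = {
    "no problems": 1,
    "no impact": 2,
    "limited impact": 4,
    "severe impact": 8,
}


def combined_status(fabric0, fabric1):
    """Total status value: the product of the two per-fabric weights."""
    return WEIGHTS[fabric0] * WEIGHTS[fabric1]


# Rebuild the full 4x4 table to check it against the slide.
levels = list(WEIGHTS)
table = [[combined_status(row, col) for col in levels] for row in levels]
for row in table:
    print(row)
```

Because each weight is a power of two, the product equals 2^(i+j) for level indices i and j. One consequence worth noting: the total depends only on the sum of the indices, so "severe impact" in one fabric with "no problems" in the other scores the same (8) as "no impact" combined with "limited impact".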

SLIDE 15

Instant Health Status

SLIDE 16

Average Health Status
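The conclusions mention that an average health status and the amount of activity are presented over longer time periods, but the slides do not define either measure. The sketch below assumes the total status value is sampled periodically; it takes the arithmetic mean as the average and counts status changes between consecutive samples as "activity". Both definitions, and both function names, are hypothetical.

```python
from statistics import mean


def average_status(samples):
    """Arithmetic mean of total status values sampled over a period."""
    return mean(samples) if samples else None


def activity(samples):
    """Number of status changes between consecutive samples (assumed metric)."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


samples = [1, 1, 2, 2, 8, 1]
print(average_status(samples), activity(samples))
```

Since the status values grow exponentially, a geometric mean (or averaging the level indices instead of the values) would weight a short severe episode less heavily; which behaviour is desirable is a design choice the slides leave open.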

SLIDE 17

Conclusions

- A relational model of the components relevant to the storage infrastructure has been developed
- Hardware failures, as well as (increased risks of) capacity shortages, are indicators that affect the health status of the storage infrastructure
- Health status levels are determined by their impact, and the separate fabric statuses are combined into a total status
- Over longer time periods, an average health status and the amount of activity are presented

SLIDE 18

What's next?

- Implementation
- Evaluation
- Extra indicators and relations to enhance the system

SLIDE 19

Questions