Walking toward moving goalposts: agile management for evolving systems

SLIDE 1

Walking toward moving goalposts: agile management for evolving systems

Richard Golding, Theodore Wong IBM Almaden Research Center

16 June 2006

SLIDE 2

Main points

  • Bolt-on management considered harmful
  • Proponents of building self-management into system
  • Bolt-on = management separate from and external to system under management
  • An architecture pattern for building distributed systems: layering and federation
  • Investigating simplest possible specifications

SLIDE 3

Research direction

  • Self-management without a central management authority
  • In a storage system as an example
  • Can we make a system administratorless?
  • Almost all storage systems use centralized management
  • Metadata server
  • Exception is peer-to-peer... but most of those are limited in function (e.g. read-only)

SLIDE 4

Why is this decentralization worthwhile?

  • Some environments require it
  • Cooperating organizations: no single business authority
  • On-demand provisioning from competing service bureaux
  • Possible route to aligning vendor economic incentives
  • Systems do get smarter over time
  • Currently: system vendors have incentive for incompatible differentiation
  • Can a higher-level standardized interface help?

SLIDE 5

Architecture: layering

  • For any given problem:
  • Delegate to lower level?
  • Use global view of higher level?

[Figure labels: higher layer, lower layer]

SLIDE 6

Simplest possible specification

  • Desire for human understandability
  • How well will it go if we start from minimum possible?
  • Existing storage management started from high-fidelity
  • Is directionally accurate sufficient? 90% solution? Iterative tuning?
  • Can we mask local complexity?
  • Makes global decision algorithms easier
  • Smart local resource management

SLIDE 7

K2 distributed storage system

  • Vehicle for research, not a product
  • No central administration; federate when global view needed
  • Delegate function to as low a level as possible
  • Provide support to higher-level application management

[Figure labels: resource pool, AP, node]

SLIDE 8

Resource pools: external view

  • A virtual collection of storage
  • One per user or application
  • Each pool is independent
  • Specified by:
  • Capacity, Performance, Reliability
  • Reserve and limit
  • Initially: capacity = bytes; performance = IO/s; reliability = MTTDL
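
The pool specification above can be written down as a small data structure. This is a minimal sketch, not K2's actual interface: the names `Bounds` and `PoolSpec`, the field names, and the reading of "reserve" as a guaranteed minimum and "limit" as a cap on capacity and performance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Reserve and limit for one resource dimension (assumed semantics:
    reserve = guaranteed minimum, limit = maximum the pool may use)."""
    reserve: float
    limit: float

    def __post_init__(self):
        if not 0 <= self.reserve <= self.limit:
            raise ValueError("need 0 <= reserve <= limit")


@dataclass(frozen=True)
class PoolSpec:
    """Hypothetical external view of one resource pool."""
    owner: str                # one pool per user or application
    capacity_bytes: Bounds    # capacity = bytes
    performance_iops: Bounds  # performance = IO/s
    mttdl_hours: float        # reliability = MTTDL (mean time to data loss)


# Example values are invented for illustration only.
spec = PoolSpec(
    owner="app-db",
    capacity_bytes=Bounds(reserve=1e12, limit=2e12),
    performance_iops=Bounds(reserve=500, limit=2000),
    mttdl_hours=1e6,
)
```

Keeping the specification this small matches the "simplest possible specification" goal: three dimensions, two numbers for each of the first two.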

SLIDE 9

Implementing pools

  • Virtual pool backed by physical allocation pools
  • Pools contain objects for storing user data
  • Decision algorithm: how much to put where
  • Storage server enforces resource allocation

[Figure: a user application uses a resource pool, which is backed by allocation pools, which are placed on storage servers. The resource pool carries the requirements (capacity, performance, reliability) and holds virtual objects (e.g. files); allocation pools hold physical objects. Labels between layers: requirements + usage, resource usage.]

SLIDE 10

Resource allocation decisions: normal

  • Normal case: online decision for one pool
  • Creating or modifying a pool’s requirements
  • Load balancing
  • Use constrained multidimensional bin packing
  • Constraints derived from reliability requirements

[Figure: a pool resource requirement P is matched against the available resources of Servers 1 to 4 under the constraints "at least 2 servers, at most 3 servers"; the example shown splits P into 0.5P + 0.5P.]
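
The single-pool decision can be sketched as a small multidimensional placement routine. This is a simple heuristic under stated assumptions, not the algorithm used in K2: the function `place_pool`, the equal split across servers, the preference for the fewest servers, and the example numbers are all hypothetical.

```python
def place_pool(requirement, free, min_servers, max_servers):
    """Split `requirement` (dict: dimension -> amount) equally over k servers,
    min_servers <= k <= max_servers, so that every chosen server has enough
    free resources in every dimension. `free` maps server -> dimension -> amount.
    Returns {server: share} or None if no feasible placement is found."""
    for k in range(min_servers, max_servers + 1):
        share = {dim: amount / k for dim, amount in requirement.items()}
        fits = [s for s in sorted(free)
                if all(free[s][dim] >= share[dim] for dim in share)]
        if len(fits) >= k:
            def headroom(server):
                # Smallest fraction of any dimension left free after placement.
                return min((free[server][dim] - share[dim]) / free[server][dim]
                           if free[server][dim] > 0 else 0.0
                           for dim in share)
            # Prefer servers left with the most headroom, to balance load.
            fits.sort(key=headroom, reverse=True)
            return {server: dict(share) for server in fits[:k]}
    return None


# Invented example: four servers, two resource dimensions.
free = {
    "s1": {"capacity": 60, "iops": 150},
    "s2": {"capacity": 40, "iops": 300},
    "s3": {"capacity": 80, "iops": 100},
    "s4": {"capacity": 30, "iops": 50},
}
placement = place_pool({"capacity": 100, "iops": 200}, free,
                       min_servers=2, max_servers=3)
```

Here the "at least 2, at most 3 servers" bound stands in for a constraint derived from the reliability requirement; a real derivation from MTTDL is outside this sketch.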

SLIDE 11

Resource allocation decisions: failure

  • Multi-pool assignment required
  • Backtracking search for feasible solution (better is possible)

[Figure: pool 1 to pool 4 (150 each), with pieces labeled 50 and 75, four boxes labeled 200, and a region labeled spare.]
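
A backtracking search for a feasible multi-pool reassignment might look like the sketch below. It is an illustration only: the function `reassign`, the rule that a pool may not be given a second piece on a server it already uses, and the example numbers are assumptions, and the sketch returns the first feasible answer rather than a good one.

```python
def reassign(chunks, spare, hosted):
    """chunks: list of (pool, amount) pieces lost in a failure.
    spare: dict server -> free amount on surviving servers.
    hosted: dict pool -> set of servers already holding a piece of that pool.
    Returns a list of (pool, server) assignments, or None if infeasible."""
    spare = dict(spare)                                  # work on copies so the
    hosted = {p: set(s) for p, s in hosted.items()}      # caller's data is untouched

    def search(i):
        if i == len(chunks):
            return []                                    # every piece is placed
        pool, amount = chunks[i]
        for server in sorted(spare):
            if spare[server] >= amount and server not in hosted[pool]:
                spare[server] -= amount                  # tentatively place
                hosted[pool].add(server)
                rest = search(i + 1)
                if rest is not None:
                    return [(pool, server)] + rest
                spare[server] += amount                  # undo and try the next server
                hosted[pool].discard(server)
        return None                                      # dead end: backtrack

    return search(0)


# Invented example in which the first choice for p1 must be undone.
plan = reassign(
    chunks=[("p1", 50), ("p2", 75)],
    spare={"a": 80, "b": 50},
    hosted={"p1": set(), "p2": set()},
)
```

Placing p1 on server a first leaves no room for p2, so the search backs up and moves p1 to server b.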

SLIDE 12

Resource allocation decisions: failure

  • Multi-pool assignment required
  • Backtracking search for feasible solution (better is possible)

[Figure: the same layout with two pieces relabeled "75 to 50" and additional pieces (75, 75, 50, 50, 50) shown in the spare region.]

SLIDE 13

Making decisions

  • Each resource pool is an independent group
  • APs elect a manager; manager watches over pool
  • Manager is disposable
  • Manager runs decision algorithm
  • All information in allocation pools

[Figure: election among allocation pools and two candidate managers. 1. allocation pools vote; 2. one candidate wins the election and the other loses; 3. acquire; 4. heartbeat.]
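
The vote, acquire, and heartbeat steps can be sketched as follows. This is a toy model, not K2's protocol: the class `AllocationPool`, the function `elect`, the majority rule, the lease timeout, and the "vote for the lowest candidate ID" rule are all assumptions chosen to keep the example short.

```python
from collections import Counter


class AllocationPool:
    """Toy allocation pool (AP) that remembers its manager and last heartbeat."""

    def __init__(self, name, lease_seconds=5.0):
        self.name = name
        self.lease_seconds = lease_seconds
        self.manager = None
        self.last_heartbeat = None

    def has_live_manager(self, now):
        return (self.manager is not None
                and now - self.last_heartbeat <= self.lease_seconds)

    def vote(self, candidates, now):
        # Step 1: keep a live manager; otherwise vote (hypothetical rule: lowest ID).
        if self.has_live_manager(now):
            return self.manager
        return min(candidates)

    def acquire(self, manager, now):
        # Step 3: the winner takes over this AP.
        self.manager, self.last_heartbeat = manager, now

    def heartbeat(self, manager, now):
        # Step 4: only the current manager can renew its lease.
        if manager != self.manager:
            return False
        self.last_heartbeat = now
        return True


def elect(aps, candidates, now):
    """Step 2: a candidate wins only with a strict majority of AP votes."""
    votes = Counter(ap.vote(candidates, now) for ap in aps)
    winner, count = votes.most_common(1)[0]
    if count * 2 <= len(aps):
        return None
    for ap in aps:
        ap.acquire(winner, now)
    return winner


aps = [AllocationPool(f"ap{i}") for i in range(3)]
first = elect(aps, ["m2", "m1"], now=0.0)
```

Because the sketch keeps all state in the allocation pools, a manager whose heartbeats stop can simply be replaced by a new election, which is one way to read "manager is disposable".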

SLIDE 14

Local resource management

  • Goal: isolation between pools
  • Capacity: just accounting
  • Performance: requires scheduler
  • Tradeoff: performance vs. efficiency
  • Provides reserve and limit, plus fair sharing
  • Working to add cache, network

[Figure: sessions feed per-pool queues, each labeled L, R, F, which feed a disk queue and then the disk.]
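
One way to combine reserve, limit, and fair sharing is a two-phase rate division, sketched below. This is not the K2 disk scheduler: the function `allocate_iops`, the static division of a fixed IO/s budget, and the example numbers are assumptions for illustration.

```python
def allocate_iops(capacity, pools):
    """Divide `capacity` IO/s among pools.
    pools: dict name -> (reserve, limit, demand).
    Phase 1 grants each pool its reserve (or its demand, if smaller).
    Phase 2 shares what is left equally, never exceeding limit or demand."""
    grant = {p: min(reserve, demand)
             for p, (reserve, limit, demand) in pools.items()}
    left = capacity - sum(grant.values())
    if left < 0:
        raise ValueError("reserves exceed device capacity")

    ceiling = {p: min(limit, demand)
               for p, (reserve, limit, demand) in pools.items()}
    active = {p for p in pools if grant[p] < ceiling[p]}
    while left > 1e-9 and active:
        share = left / len(active)          # equal split of the remainder
        for p in sorted(active):
            extra = min(share, ceiling[p] - grant[p])
            grant[p] += extra
            left -= extra
        # Pools that reached their ceiling drop out; the rest share again.
        active = {p for p in active if ceiling[p] - grant[p] > 1e-9}
    return grant


# Invented example: (reserve, limit, demand) in IO/s on a 1000 IO/s device.
rates = allocate_iops(1000, {
    "a": (100, 300, 500),
    "b": (200, 1000, 1000),
    "c": (100, 1000, 150),
})
```

In this example pool a is capped by its limit, pool c by its own demand, and pool b absorbs the remaining capacity, so no pool can starve another below its reserve.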

SLIDE 15

Contacts and information

  • Richard Golding, rgolding@us.ibm.com, http://soe.ucsc.edu/~golding
  • Theodore Wong, theowong@us.ibm.com, http://www.tmwong.org
