IT As Service. Meichun Hsu, HP Labs China, 2006/5/8




SLIDE 1

2006/5/8 1

IT As Service

Meichun Hsu (许玫君)

HP Labs China

SLIDE 2

Outline of Talk

  • Technology and Economic Trends
  • Selected Research at HP Labs
  • Concluding Remarks

SLIDE 3

IT as Service

Technology and Economic Trends

SLIDE 4

Economic Trend of IT

SLIDE 5

Next Generation IT Center: Consolidation and Virtualization

[Figure: application stacks (silos) on duplicated infrastructure, versus application stacks on consolidated infrastructure, where a virtualization layer of virtual resources de-couples resources from consumption]

Without Virtualization:

  • Duplicated, fragmented management
  • Duplicated, under-utilized resources
  • Vertically-integrated teams
  • Fragmented expertise

With Virtualization:

  • Consolidated management
  • Consolidated, highly-utilized resources
  • Specialized, nimble teams
  • Centers of expertise

SLIDE 6

An Example IT Consolidation Case

  • 66% reduction in data centers utilized
  • Reduction from 4 custom and 6 payroll PeopleSoft instances to 1 single global instance of PeopleSoft 8
  • 63% server reduction

− From 41 to 10 dedicated servers
− From 0 to 5 utility servers

  • Cycle time improvement: some applications have gone from development to production in 6 weeks
  • Almost a 4x reduction in application-infrastructure-specific deployment costs
  • 60% reduction in storage (disk space)

[Chart: Benefits, before vs. now; storage 7 TB before, 3 TB now]
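As a quick consistency check on the figures above, the server and storage percentages can be recomputed. This is a minimal sketch that assumes the 63% server figure counts the 5 new utility servers in the "after" total, and reads the chart's 7 TB and 3 TB as storage before and now; under that reading storage drops by about 57%, which the slide rounds to 60%.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Servers: 41 dedicated before; 10 dedicated + 5 utility after (assumed to count).
servers_before = 41
servers_after = 10 + 5

# Storage, read from the before/now chart: 7 TB before, 3 TB now.
storage_before_tb = 7
storage_after_tb = 3

server_cut = pct_reduction(servers_before, servers_after)
storage_cut = pct_reduction(storage_before_tb, storage_after_tb)

print(f"server reduction:  {server_cut:.1f}%")   # 63.4%
print(f"storage reduction: {storage_cut:.1f}%")  # 57.1%
```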

SLIDE 7

The economics of managing a data center will fundamentally change; software is the primary enabler.

A Case Study

Before: Existing Network (100 servers)
− 20 infrastructure servers, 80 capacity servers
− 12 computer racks required
− High maintenance requirements (2.5 FTE)
− Up-front hardware & installation: $700k
− Hosting ($1k/month/rack): $144k
− H/W support contracts: + $100k
− Maintenance FTE: + $200k
− Annual cost = $444k

New Consolidation Solution (100 virtual servers)
− 4 ProLiant DL580 servers manage 100 VMs
− 2 MSA1000 SANs host storage
− Low maintenance requirements (1.0 FTE)
− Up-front hardware & installation: $165k
− Hosting ($1k/month/rack): $12k
− H/W support contracts: + $5k
− Maintenance FTE: + $100k
− Annual cost = $117k

Typical IT ratios today (source: IDC): one person per 20 servers; one person per 2 TB of storage.

Typical IT ratios in the Next Generation Data Center (NGDC) (source: HP analysis): one person per 200 servers; one person per 200 TB of storage (fewer people, more capacity).

Net result: almost 75% annual cost savings, 60% labor savings, and a 94% reduction in physical hardware (servers/storage).
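The annual-cost arithmetic of the case study can be reproduced in a few lines. This is a sketch that assumes the new solution occupies a single rack (which is what $12k of hosting at $1k/month/rack implies) and counts physical hardware as 100 servers before versus 4 servers plus 2 SAN arrays after; up-front costs are left out, as in the slide's annual totals, and the `Site` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Illustrative cost model; money figures are in $k per year unless noted."""
    racks: int
    hosting_per_rack_month: float  # $k per rack per month
    hw_support: float              # H/W support contracts
    maintenance_staff: float       # maintenance FTE cost
    fte: float                     # maintenance head count
    physical_boxes: int            # physical servers + storage arrays

    def annual_cost(self) -> float:
        # Up-front hardware & installation is excluded, matching the slide.
        hosting = self.racks * self.hosting_per_rack_month * 12
        return hosting + self.hw_support + self.maintenance_staff


def pct_saving(before: float, after: float) -> float:
    return 100.0 * (before - after) / before


before = Site(racks=12, hosting_per_rack_month=1.0, hw_support=100,
              maintenance_staff=200, fte=2.5, physical_boxes=100)
after = Site(racks=1, hosting_per_rack_month=1.0, hw_support=5,
             maintenance_staff=100, fte=1.0, physical_boxes=4 + 2)

print(before.annual_cost(), after.annual_cost())                        # 444.0 117.0
print(round(pct_saving(before.annual_cost(), after.annual_cost()), 1))  # 73.6
print(pct_saving(before.fte, after.fte))                                 # 60.0
print(pct_saving(before.physical_boxes, after.physical_boxes))           # 94.0
```

The 73.6% cost saving is what the slide summarizes as "almost 75%".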

SLIDE 8

The next-generation IT center enables & requires running IT as Utility Services

  • Provided by an infrastructure invisible to the end user

  • Standard interface and properties
  • Expected to be always available
  • Low cost to plug in to
  • Payments aligned with usage
  • Delivered as a service

Consolidation + Virtualization = Running IT as Utility Services

SLIDE 9

IT as Utility Service - Selected Research

  • Capacity Planning for Server Consolidation
  • Data Center Automation
  • Market-based Resource Allocation
  • Technology for Support Services
  • Grid Computing

SLIDE 10

Capacity Planning for Server Consolidation

Alex Zhang et al.

SLIDE 11

Server Consolidation Problem

  • Problem description:

− Given a set of old servers and their associated workload traces (e.g. CPU utilization time series), how can we “pack” them into a minimum set of new servers?

[Figure: servers before consolidation; servers after consolidation, each running a VMM]

Output: the number of new servers required and the assignment of old servers to them

SLIDE 12

Fitting Jigsaw Puzzles with Probabilistic Goals

[Figure: CPU and memory traces over intervals t = 1, 2, …, T for Servers 1 to 5, packed under a probabilistic capacity limit as Servers 1+3+4 = New Server 1 and Servers 2+5 = New Server 2]

  • Probabilistic goals

− E.g., 5-minute CPU utilization < 50%, to be satisfied with probability 0.995

  • Issues

− Can workloads be proportionally scaled?
− Are workloads additive?
− Are the metrics independent?
− What is the VMM overhead?
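A probabilistic goal such as "5-minute CPU utilization < 50% with probability 0.995" can be checked directly against historical traces. The sketch below assumes workloads are additive (one of the open issues listed above) and that each trace is already scaled to a percentage of the new server's capacity; `meets_probabilistic_goal` and the sample traces are hypothetical.

```python
from typing import Sequence


def meets_probabilistic_goal(traces: Sequence[Sequence[float]],
                             limit: float, prob: float) -> bool:
    """True if the summed utilization stays below `limit` in at least a
    fraction `prob` of the time intervals.

    Assumes workloads are additive and every trace is already expressed as a
    percentage of the new server's capacity (hypothetical simplifications).
    """
    combined = [sum(values) for values in zip(*traces)]
    satisfied = sum(1 for u in combined if u < limit)
    return satisfied / len(combined) >= prob


# Two hypothetical 5-minute CPU traces of 1,000 intervals each.
a = [20.0] * 1000
b = [25.0] * 1000
b[:4] = [40.0] * 4  # 4 intervals where a + b = 60%, above the 50% limit

print(meets_probabilistic_goal([a, b], limit=50.0, prob=0.995))  # True (996/1000)
```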

SLIDE 13

A High-Dimensional Bin-Packing Algorithm for Server Consolidation

The problem input data is

  • 1. The CPU utilization traces (objects) w(i, a, t), for server i = 1, 2, …, n, performance metric a = 1, 2, …, K, and time interval t = 1, 2,…, T;
  • 2. The bin capacities C(a) (same over all time intervals, but potentially different across performance metrics a = 1, 2, …, K ); and
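  • 3. The probability α(a) (such as 0.99) for satisfying the bin capacity (potentially different across performance metrics a = 1, 2, …, K).

The model uses binary assignment variables x(i, j) (old server i is placed on new server j, for j = 1, 2, …, m), usage variables y(j) for the new servers (bins), binary overflow indicators v(j, a, t), and a sufficiently large constant M(j, a, t):

$$\text{Minimize } Z = \sum_{j=1}^{m} y(j) \tag{1}$$

Subject to:

$$x(i,j) \le y(j), \quad i = 1, \dots, n;\ j = 1, \dots, m \tag{2}$$

$$\sum_{j=1}^{m} x(i,j) = 1, \quad i = 1, \dots, n \tag{3}$$

$$\sum_{i=1}^{n} w(i,a,t)\, x(i,j) - M(j,a,t)\, v(j,a,t) \le C(a), \quad j = 1, \dots, m;\ a = 1, \dots, K;\ t = 1, \dots, T \tag{4p}$$

$$x(i,j) \in \{0, 1\} \ \text{(binary variable)} \tag{5}$$

$$0 \le y(j) \le 1 \ \text{(continuous variable)} \tag{6}$$

$$\sum_{t=1}^{T} v(j,a,t) \le T\,[1 - \alpha(a)], \quad j = 1, \dots, m;\ a = 1, \dots, K \tag{7p}$$

$$v(j,a,t) \le y(j), \quad j = 1, \dots, m;\ a = 1, \dots, K;\ t = 1, \dots, T \tag{8p}$$

$$v(j,a,t) \in \{0, 1\} \ \text{(binary variable)} \tag{9p}$$

Constraint (4p) lets new server j exceed capacity C(a) on metric a in interval t only when v(j, a, t) = 1, and (7p) limits such intervals to a fraction 1 − α(a) of the T intervals; this is how the probabilistic goal enters the model.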

  • Optimization model:

− Given: a set of old servers to be packed, and a bin capacity (on multiple metrics) with probabilistic goals
− High-dimensional bin-packing formulation

  • Achieved an implementation of complexity O(n² m T)

− n is the number of old servers
− m is the number of metrics (K in the formulation above)
− T is the number of time periods (m·T is the dimensionality)
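The slide does not describe how its O(n² m T) algorithm works. As a point of comparison only, here is a plain first-fit-decreasing heuristic that enforces the same probabilistic capacity rule as (4p) and (7p): a new server may exceed C(a) on metric a in at most T[1 − α(a)] intervals. This is a swapped-in textbook heuristic, not the HP Labs method; the function names and sample data are hypothetical, and it assumes additive workloads. The toy data happen to reproduce the packing drawn on the previous slide.

```python
from typing import Dict, List, Sequence

# trace[a][t]: demand of one old server on metric a in interval t
Trace = Sequence[Sequence[float]]


def fits(load: List[List[float]], w: Trace,
         cap: Sequence[float], alpha: Sequence[float]) -> bool:
    """Probabilistic capacity check in the spirit of (4p)/(7p): after adding
    `w`, each metric a may exceed cap[a] in at most T * (1 - alpha[a]) intervals."""
    T = len(w[0])
    for a in range(len(cap)):
        over = sum(1 for t in range(T) if load[a][t] + w[a][t] > cap[a])
        if over > T * (1 - alpha[a]):
            return False
    return True


def first_fit_decreasing(w: Dict[str, Trace], cap: Sequence[float],
                         alpha: Sequence[float]) -> List[List[str]]:
    """Greedy packing of old servers into new servers (bins)."""
    K = len(cap)
    T = len(next(iter(w.values()))[0])
    # Largest peak demand on metric 0 (e.g. CPU) first; sort is stable for ties.
    order = sorted(w, key=lambda i: max(w[i][0]), reverse=True)
    bins: List[List[str]] = []
    loads: List[List[List[float]]] = []
    for i in order:
        for j, load in enumerate(loads):
            if fits(load, w[i], cap, alpha):
                bins[j].append(i)
                for a in range(K):
                    for t in range(T):
                        load[a][t] += w[i][a][t]
                break
        else:
            # No existing bin fits: open a new server for this workload.
            bins.append([i])
            loads.append([list(w[i][a]) for a in range(K)])
    return bins


# Hypothetical CPU and memory traces (percent of a new server), T = 4.
traces = {
    "s1": [[60, 60, 60, 60], [20, 20, 20, 20]],
    "s2": [[50, 50, 50, 50], [50, 50, 50, 50]],
    "s3": [[30, 30, 30, 30], [20, 20, 20, 20]],
    "s4": [[10, 10, 10, 50], [20, 20, 20, 20]],
    "s5": [[40, 40, 40, 40], [40, 40, 40, 40]],
}
print(first_fit_decreasing(traces, cap=[100, 100], alpha=[0.75, 0.75]))
# [['s1', 's4', 's3'], ['s2', 's5']]
```

With alpha = 0.75 and T = 4, each new server may overflow in one interval; tightening alpha to 1.0 forbids any overflow and forces a third server.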

SLIDE 14

Data Center Automation

SLIDE 15

[Figure: hierarchy of abstraction. Resource pools (physical in this example), mechanisms and workflows, and resource pools (virtual in this example) support a server utility, a storage utility, and an application utility; each utility provides virtual resource pools, virtual system services, capacity planning, and mapping, exposed through a services catalog (virtual resources) and connected by a data fabric and a datacenter service bus]

The future data center is one that offers an intelligent infrastructure that provides appropriate resources and is highly automated.

SLIDE 16

Model-Based Management Automation: Quartermaster Software Architecture

Quartermaster Core

  • Model and instance repository

Quartermaster Model Manager(s)

  • Model creation and management
  • Model translation

Quartermaster Tools

  • System composition
  • Capacity management
  • Resource Allocation
  • Reservation/Scheduling

SLIDE 17

Thermal Cooling: Dynamic Data Center Thermal Management

  • Power density is becoming critical; it affects reliability and cost
  • Use modeling and measurement to understand the thermal characteristics of data centers

− Saving 25% today

  • Exploit this for dynamic resource allocation and proper provisioning

− Today, 40 1U boxes in a rack in a data center create a 10 kW rack
− Cooling is done by intuition

  • Smart cooled data center: shift workload to cool areas, or cool where the workload is
  • In a 15 MW data center, we can save $1 million/year
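Two points on this slide lend themselves to a quick sketch: the per-box power implied by a 10 kW rack of 40 1U boxes (250 W each on average), and the idea of shifting workload to cool areas. The placement policy below is a toy, assuming a fixed, hypothetical temperature rise per job; real thermal behavior depends on airflow, which is exactly what the modeling and measurement work addresses.

```python
from typing import List

RACK_POWER_W = 10_000  # from the slide: a 10 kW rack
BOXES_PER_RACK = 40    # from the slide: 40 1U boxes per rack
watts_per_box = RACK_POWER_W / BOXES_PER_RACK
print(watts_per_box)   # 250.0


def place_jobs(inlet_temps_c: List[float], n_jobs: int,
               heat_per_job_c: float) -> List[int]:
    """Toy 'shift workload to cool areas' policy: send each job to the rack
    whose inlet temperature is currently lowest. `heat_per_job_c` is a
    hypothetical constant temperature rise per job."""
    temps = list(inlet_temps_c)
    placement = []
    for _ in range(n_jobs):
        coolest = min(range(len(temps)), key=temps.__getitem__)
        placement.append(coolest)
        temps[coolest] += heat_per_job_c
    return placement


print(place_jobs([25.0, 22.0, 30.0], n_jobs=3, heat_per_job_c=2.0))  # [1, 1, 0]
```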

SLIDE 18

Tycoon

A Market-Based Resource Allocation System

Kevin Lai, Lars Rasmusson, Li Zhang, Eytan Adar, and Bernardo Huberman

SLIDE 19

Resource Pricing and Allocation in Computing Utility

  • Question: How should host resources be allocated to each client?
  • Tycoon: a market-based resource allocation system
  • You get what you pay for
  • Price varies depending on supply and demand
  • Based on a continuous-bid mechanism

[Figure: clients (e.g., Linux VMs) share a physical host, each getting a slice of the physical resources; the host is managed by a resource broker/VMM (e.g., embedded in Xen)]

SLIDE 20

Market-based Resource Allocation – Basic Idea

  • Host allocates resources to maximize the credits ($) received
  • Clients have a limited budget of credits ($)
  • Clients spend these credits to bid for “weight” in resource allocation

  • 1. Host communicates the total bids received during the last time period
  • 2. Client agent calculates bids to maximize its own utility
  • 3. Submit bids to the host
  • 4. Allocate resources based on the bids received; a client gets resources in proportion to its bid ($)
  • 5. Adjust bids based on actual performance, if necessary
  • 6. Pay only for what is used

[Figure: Host 0 with a resource broker; client agents bidding on behalf of the client A and client B applications]
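The allocation rule in step 4 is a proportional share: a client's slice is its bid divided by the total of all bids. A minimal sketch follows; the `spot_price` helper is an assumption about how "price varies with supply and demand" could be expressed (total bids per unit of capacity), not Tycoon's actual API, and the capacity units are hypothetical.

```python
from typing import Dict


def allocate(bids: Dict[str, float], capacity: float) -> Dict[str, float]:
    """Proportional share: each client receives capacity * bid / total bids."""
    total = sum(bids.values())
    if total == 0:
        return {client: 0.0 for client in bids}
    return {client: capacity * bid / total for client, bid in bids.items()}


def spot_price(bids: Dict[str, float], capacity: float) -> float:
    """Hypothetical price per unit of capacity: rises with total bids (demand)
    and falls as capacity (supply) grows."""
    return sum(bids.values()) / capacity


bids = {"client A": 10.0, "client B": 30.0}   # hypothetical bids in $
print(allocate(bids, capacity=3000.0))         # {'client A': 750.0, 'client B': 2250.0}
print(spot_price(bids, capacity=3000.0))
```

Client B bids three times as much as client A and so receives three times the share, which is the "you get what you pay for" property.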

SLIDE 21

Example Result – Effect of Raising Bids

  • Progress of a job rendering a scene on a cluster
  • Frames are distributed to different hosts in the cluster
  • The user changes the bid by changing the bidding interval on all hosts at 185 s
  • Host begins reallocating in < 1 s

Initial bid was $10 over 30,000 seconds, resulting in an allocation of a small percentage of a server. Updated bid to $10 over 300 seconds, resulting in an increase in throughput.
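Because a bid is an amount spread over an interval, shortening the interval raises the spending rate: $10 over 300 s is 100 times the rate of $10 over 30,000 s. The sketch below shows that arithmetic and, under a hypothetical level of competing demand on the host (`OTHERS_RATE`), how the larger rate translates into a larger proportional share.

```python
import math


def bid_rate(amount: float, interval_s: float) -> float:
    """Spending rate in $/s when `amount` is spread over `interval_s` seconds."""
    return amount / interval_s


initial = bid_rate(10.0, 30_000)  # $10 over 30,000 s (from the slide)
updated = bid_rate(10.0, 300)     # $10 over 300 s (from the slide)
print(math.isclose(updated / initial, 100.0))  # True: 100x the spending rate

OTHERS_RATE = 0.01  # hypothetical combined bid rate of competing clients, $/s


def share(rate: float) -> float:
    """Proportional share of the host at a given bid rate."""
    return rate / (rate + OTHERS_RATE)


print(f"{share(initial):.1%} -> {share(updated):.1%}")  # roughly 3.2% -> 76.9%
```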

SLIDE 22

Tycoon Pulse – Current Deployment

Deployed on ~40 x86 nodes running Xen, distributed across America, Europe, and Asia

http://tycoon.hpl.hp.com/pulse/

SLIDE 23

Mining a Large Knowledge Base for Issue Resolution

George Forman, Jaap Suermondt et al.

SLIDE 24

See some data to appreciate the problem.

It is very hard to determine the top 10 issues by browsing, as is done today. Just try: ~20,000 calls about iPAQs.

SLIDE 25

Clustering reveals major themes.

  • Clustering reveals major themes
  • Define the categories you care about
  • For each category: search and assign some training examples

SLIDE 26

Ongoing: Automated

Enable automated reporting, trending, and anomaly detection on an ongoing basis

[Figure: pipeline. Initial call data → Explore → Train (only once); new call data → Classify → Quantify → monthly results]
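The slides do not say which classifier is used. To illustrate the Train → Classify → Quantify flow only, here is a toy nearest-centroid classifier over bag-of-words counts; the category names, example calls, and all function names are hypothetical, and the Explore (clustering) step is omitted.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str) -> Counter:
    """Bag-of-words counts for one call note."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(examples: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Train (once): one summed word-count centroid per category."""
    centroids: Dict[str, Counter] = {}
    for category, texts in examples.items():
        centroid: Counter = Counter()
        for text in texts:
            centroid.update(vectorize(text))
        centroids[category] = centroid
    return centroids


def classify(centroids: Dict[str, Counter], text: str) -> str:
    """Assign the category whose centroid is most similar to the call."""
    v = vectorize(text)
    return max(centroids, key=lambda category: cosine(centroids[category], v))


def quantify(centroids: Dict[str, Counter], new_calls: List[str]) -> Counter:
    """Classify + Quantify (monthly): count new calls per category."""
    return Counter(classify(centroids, call) for call in new_calls)


examples = {
    "battery": ["battery drains fast", "battery will not charge"],
    "synchronizing": ["cannot sync with pc", "activesync fails to sync"],
}
model = train(examples)
monthly = quantify(model, ["battery dead after one hour",
                           "sync error with outlook",
                           "battery not charging"])
print(monthly)  # Counter({'battery': 2, 'synchronizing': 1})
```

Only `quantify` needs to run each month; `train` runs once on the labeled examples, matching the "only once" part of the pipeline.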

SLIDE 27

Trend Analysis (with no human effort each month!)

[Chart: weekly call volume by category for the weeks of 6/20 through 7/25; categories: synchronizing, damage, battery, hung, screen issues, network, misc, ordering question; annotations: US Independence Day holiday, iPAQ damage]
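The slides do not specify how anomalies are detected. One simple possibility, shown here as an assumption, is to flag a week whose count for a category exceeds the mean plus k standard deviations of the preceding weeks; `flag_spikes` and the weekly counts below are invented for illustration.

```python
from statistics import mean, pstdev
from typing import List


def flag_spikes(weekly_counts: List[int], window: int = 3,
                k: float = 2.0) -> List[int]:
    """Indices of weeks whose count exceeds mean + k * std of the previous
    `window` weeks (std floored at 1 call to avoid flagging tiny wobbles)."""
    flagged = []
    for i in range(window, len(weekly_counts)):
        history = weekly_counts[i - window:i]
        threshold = mean(history) + k * max(pstdev(history), 1.0)
        if weekly_counts[i] > threshold:
            flagged.append(i)
    return flagged


# Hypothetical weekly call counts for one category, e.g. "damage".
damage = [40, 42, 38, 41, 90, 44]
print(flag_spikes(damage))  # [4]
```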

SLIDE 28

Grid Computing

SLIDE 29

Grid in the infrastructure

  • The Next Generation Data Center allows for the “service-ification” and virtualization of hardware and software resources
  • Grid enables a virtualized, distributed, federated “data center” that is flexible to changing demands
  • Grid allows for the secure sharing of these resources, in the form of services, among members of virtual organizations

SLIDE 30

HP’s Open Source Implementations of Grid Service Standards

  • WSRF, WSN, and WSDM in Apache

− An open source implementation of the Web Services Resource Framework, Web Services Notification, and Web Services Distributed Management (Management Using Web Services) family of specifications
− Known as Apache WSRF, Pubscribe, and MUSE
− (Just) now reached version 1.0!
− For more information on these projects:

  • http://www.apache.org/dist/ws/wsrf/1.0
  • http://www.apache.org/dist/ws/pubscribe/1.0
  • http://www.apache.org/dist/ws/muse/1.0

SLIDE 31

TPM-enabled Grid Security

  • Trusted Platform Module (TPM): open-standard-based cryptographic hardware

− Provides secure storage of cryptographic keys

  • Integrating TPM with the Grid Security Infrastructure (GSI)

− Enhanced trust between Grid resources and users
− TPM on the server enables remote attestation
− TPM on the client protects the keys used by the user proxy

[Figure: a user proxy and several resource proxies, each with its process, on hosts equipped with TPMs]

We are working with the China Grid team to integrate TPM into GSI.
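Remote attestation on the server rests on the TPM's platform configuration registers (PCRs). In TPM 1.2, the measuring software hashes each component and the TPM extends a PCR as PCR_new = SHA-1(PCR_old || digest); a verifier replays the reported measurement log and compares the result with the PCR value in the TPM's signed quote. The sketch below models only that replay check (the quote signature check is omitted), with hypothetical component names; it is not the GSI integration described on this slide.

```python
import hashlib
from typing import Iterable, List, Set

PCR_SIZE = 20  # TPM 1.2 PCRs hold a 20-byte SHA-1 value and start at all zeros


def extend(pcr: bytes, component: bytes) -> bytes:
    """Measure `component` (SHA-1) and extend: PCR_new = SHA1(PCR_old || digest)."""
    digest = hashlib.sha1(component).digest()
    return hashlib.sha1(pcr + digest).digest()


def replay(log: Iterable[bytes]) -> bytes:
    """Recompute the PCR value a measurement log should produce."""
    pcr = bytes(PCR_SIZE)
    for component in log:
        pcr = extend(pcr, component)
    return pcr


def verify(quoted_pcr: bytes, log: List[bytes], known_good: Set[bytes]) -> bool:
    """Accept only known-good components whose log replays to the quoted PCR.
    A real verifier must first check the TPM signature on the quote (omitted)."""
    return all(c in known_good for c in log) and replay(log) == quoted_pcr


# Hypothetical software stack on a Grid resource.
log = [b"bootloader-v1", b"kernel-v2", b"grid-middleware-v3"]
quoted = replay(log)  # stands in for the value the server's TPM reports
good = set(log)
print(verify(quoted, log, good))  # True
```

Because each extend hashes the previous value, the final PCR depends on both the components and their order, so a swapped or substituted component cannot reproduce the quoted value.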

SLIDE 32

Adaptive Monitoring Solution

  • We are building an adaptive monitoring service that will interact with existing monitoring solutions
  • The China Grid collaboration concerns coupling China Grid monitoring to the monitoring service

− Focus: data access models and the model parser

SLIDE 33

Concluding Remarks

SLIDE 34

Summary

  • Information technologies are increasingly linked to service enabling in general
  • The IT function within a business is becoming a utility service, fueling research in virtualization, consolidation, resource allocation and pricing, secure sharing, monitoring, and scalable support technologies
  • HP Labs is pursuing research strategies to advance IT’s role in service enabling

SLIDE 35

HP Labs worldwide

http://www.hpl.hp

  • Extensive collaborations with universities and research institutes worldwide
  • Research laboratories

− U.S. (Palo Alto)
− U.K. (Bristol)
− Israel
− Japan
− India
− China

SLIDE 36

HP Labs web site

http://www.hpl.hp.com
