Autonomic Grid Computing: Concepts, Infrastructure and Applications

The Applied Software Systems Laboratory, ECE/CAIP, Rutgers University – http://www.caip.rutgers.edu/TASSL (Ack: NSF, DoE, NIH)

Outline

  • Pervasive Grid Environments - Unprecedented Opportunities
  • Pervasive Grid Environments - Unprecedented Challenges
  • Autonomic Grid Computing
  • Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
  • An Illustrative Application
  • Concluding Remarks

Grid Computing – The Hype!

Grid Computing

By M. Mitchell Waldrop, May 2002 – Hook enough computers together and what do you get? A new kind of utility that offers supercomputer processing on tap. Is Internet history about to repeat itself?

Defining Grid Computing …

  • “… a concept, a network, a work in progress, part hype and part reality, and it’s increasingly capturing the attention of the computing community …” A. Applewhite, IEEE DS-Online
  • “… grids are networks for computation – they are thinking, number-crunching entities. Like a decentralized nervous system, grids consist of high-end computers, servers, workstations, storage systems, and databases that work in tandem across private and public networks …” O. Malik, Red Herring
  • “… a kind of hyper network that links computers and data storage owned by different groups so that they can share computing power …” USA Today
  • “‘The Matrix’ crossed with ‘Minority Report’ …” D. Metcalfe (quoted in Newsweek)
  • “… use clusters of personal computers, servers or other machines. They link together to tackle complex calculations. In part, grid computing lets companies harness their unused computing power, or processing cycles, to create a type of supercomputer …” J. Bonasia, Investor’s Business Daily
  • “… grid computing links far-flung computers, databases, and scientific instruments over the public internet or a virtual private network and promises IT power on demand … All a user has to do is submit a calculation to a network of computers linked by grid-computing middleware. The middleware polls a directory of available machines to see which have the capacity to handle the request fastest …” A. Ricadela, Information Week


The Grid Vision

  • Imagine a world

– in which computational power (resources, services, data, etc.) is as readily available as electrical power
– in which computational services make this power available to users with differing levels of expertise in diverse areas
– in which these services can interact to perform specified tasks efficiently and securely with a minimum of human intervention

  • on-demand, ubiquitous access to computing, data, and services
  • new capabilities constructed dynamically and transparently from

distributed services

– a large part of this vision was originally proposed by Fernando Corbató (The Multics Project, 1965, www.multicians.org)

Grid Idea By A Simple Analogy

Some power stations dispersed everywhere produce the electrical power. The produced power is distributed over a power network. One consumer wants to access that power. He/she comes to an agreement with the electrical company, which provides a new socket into which the user can plug. Now the user is able to access the power grid.

  • The user:

– Does not need to know anything about what lies beyond the socket
– Can draw all the power he/she wants, according to the agreement

  • The power company:

– Can modify production technologies at any moment
– Manages the power network as it wants
– Defines the terms and conditions of the agreement

  • Ack. F. Scibilia

In the same way . . .

Some computing farms produce the computing power. That computing power is made available over the Internet. One user wants to access intensive computational power. He/she comes to an agreement with some organization that offers grid services. The organization provides grid facilities allowing the user to access its grid resources, along with the proper tools. Now the user accesses grid facilities as a grid user.

  • The user:

– Does not need to know what lies beyond his/her user interface
– Can access a massive amount of computational power through a simple terminal

  • The organization:

– Can extend grid facilities at any moment
– Manages the architecture of the grid
– Defines policies and rules for accessing grid resources

  • Ack. F. Scibilia

What about Grid Computing

The Grid Computing paradigm is an emerging way of thinking about distributed environments: a global-scale infrastructure to:

  • Share data
  • Distribute computation
  • Coordinate work
  • Access remote instrumentation

  • Ack. F. Scibilia

Key Enablers of Grid Computing - Exponentials

  • Network vs. computer performance

– Computer speed doubles every 18 months
– Storage density doubles every 12 months
– Network speed doubles every 9 months
– Difference = an order of magnitude per 5 years

(Source: Scientific American, Jan-2001)

  • 1986 to 2000

– Computers: x 500
– Networks: x 340,000

  • 2001 to 2010

– Computers: x 60
– Networks: x 4,000

“When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder)

Ack: I. Foster
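The “order of magnitude per 5 years” claim above follows directly from the stated doubling times; a quick arithmetic sketch (no assumptions beyond the slide’s numbers):

```python
# Growth arithmetic behind the slide's doubling times: a quantity that
# doubles every d months grows by a factor of 2**(m/d) over m months.
def growth(months: float, doubling_months: float) -> float:
    return 2 ** (months / doubling_months)

five_years = 60  # months
computers = growth(five_years, 18)   # ~10x
networks = growth(five_years, 9)     # ~102x
gap = networks / computers           # ~10x, i.e. one order of magnitude
print(f"computers x{computers:.0f}, networks x{networks:.0f}, gap x{gap:.1f}")
# prints: computers x10, networks x102, gap x10.1
```

So over any 5-year window, networks gain roughly a factor of ten on computers, which is exactly the “difference = order of magnitude per 5 years” bullet.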

Why Computing Grids now?

  • Because the amount of computational power needed by many applications is getting very huge

– Thousands of CPUs working at the same time on the same task

  • Because the amount of data requires massive and complex distributed storage systems

– From hundreds of Gigabytes to Petabytes (10^15) produced by the same application

  • To ease the cooperation of people and resources belonging to different organizations

– People of several organizations working together to achieve a common goal

  • To access particular instrumentation that is not easily reachable in any other way

– Because it cannot be moved or replicated, or its cost is too expensive

  • Because it is the next step in the evolution of distributed computation

– To create a marketplace of computational power and storage over the Internet

  • Ack. F. Scibilia

Who is interested in Grids?

  • The research community, to carry out important results from experiments that involve many people and massive amounts of resources
  • Enterprises, which can have huge computation without the need to extend their current IT infrastructure
  • Businesses, which can provide computational power and data storage under contract or for rental

  • Ack. F. Scibilia

Properties of Grids

  • Transparency

– The complexity of the Grid architecture is hidden from the final user
– The user must be able to use a Grid as if it were a single virtual supercomputer
– Resources must be accessible regardless of their location

  • Openness

– Each subcomponent of the Grid is accessible independently of the other components

  • Heterogeneity

– Grids are composed of many different resources

  • Scalability

– Resources can be added to and removed from the Grid dynamically

  • Fault Tolerance

– Grids must be able to work even if a component fails or a system crashes

  • Concurrency

– Different processes on different nodes must be able to work at the same time

  • Ack. F. Scibilia

Challenging Issues in Grids (i)

  • Security

– Authentication and authorization of users
– Confidentiality and non-repudiation

  • Information Services

– To discover and monitor Grid resources
– To check the health status of resources
– As a basis for decision-making processes

  • File Management

– Creation, modification and deletion of files
– Replication of files to improve access performance
– Ability to access files without the need to move them locally to the code

  • Administration

– Systems to administer Grid resources while respecting local administration policies

  • Ack. F. Scibilia

Challenging Issues in Grids (ii)

  • Resource Brokering

– To schedule tasks across different resources
– To make optimal or suboptimal decisions
– To reserve (in the future) resources and network bandwidth

  • Naming services

– To name resources in an unambiguous way within the Grid scope

  • Friendly User Interfaces

– Because most Grid users have nothing to do with computer science (physicians, chemists, …)
– Graphical User Interfaces (GUIs)
– Grid Portals (very similar to classical Web Portals)
– Command Line Interfaces (CLIs) for experts

  • Ack. F. Scibilia
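The resource-brokering idea above – poll a directory of available machines and pick the one expected to handle the request fastest – can be sketched as a toy scheduler. All names and numbers here are hypothetical, not from any real broker:

```python
from dataclasses import dataclass

# Toy resource broker: poll a directory of machines and pick the one with
# the smallest estimated completion time (queue wait + compute time).
@dataclass
class Machine:
    name: str
    flops: float          # available compute rate (floating-point ops/sec)
    queue_seconds: float  # time until the machine is free

def pick_fastest(machines, work_flop):
    """Return the machine with the smallest estimated completion time."""
    available = [m for m in machines if m.flops > 0]
    return min(available, key=lambda m: m.queue_seconds + work_flop / m.flops)

directory = [
    Machine("cluster-a", flops=2e9, queue_seconds=30.0),
    Machine("desktop-b", flops=5e8, queue_seconds=0.0),
]
best = pick_fastest(directory, work_flop=1e10)
print(best.name)  # desktop-b: 0 + 20 s beats 30 + 5 s
```

Note the trade-off the estimate captures: for a small job the idle desktop wins, while a large enough job justifies waiting for the faster cluster.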

The Grid

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

  1. Enable integration of distributed resources
  2. Using general-purpose protocols & infrastructure
  3. To achieve better-than-best-effort service

Ack: I. Foster

Virtual Organizations (VOs)

  • A Virtual Organization is a collection of people and resources that work in a coordinated way to achieve a common goal
  • To use Grid facilities, any user MUST subscribe to a Virtual Organization as a member
  • Each person or resource can be a member of more than one VO at the same time
  • Each VO can contain people or resources belonging to different administration domains

  • Ack. F. Scibilia

Virtual Laboratory

  • A new way of cooperating in experiments
  • A platform that allows scientists to work together in the same “Virtual” Laboratory
  • Strictly correlated to Grids and Virtual Organizations

(Figure: a Grid infrastructure connecting people, devices, instruments, computing resources and data)

  • Ack. F. Scibilia

Profound Technical Challenges

How do we, in dynamic, scalable, multi-institutional, computationally & data- rich settings:

  • Negotiate & manage trust
  • Access & integrate data
  • Construct & reuse workflows
  • Plan complex computations
  • Detect & recover from failures
  • Capture & share knowledge
  • Support collaborative work
  • Define primitive protocols
  • Build reusable software
  • Package & deliver software
  • Deploy & operate services
  • Operate infrastructure
  • Represent & enforce policies
  • Achieve end-to-end QoX
  • Move data rapidly & reliably
  • Upgrade infrastructure
  • Perform troubleshooting
  • Etc., etc., etc.

Ack: I. Foster


Globus Alliance

  • The Globus Alliance

– A community of people and organizations involved in the design and development of Grid technologies
– University of Illinois, Argonne National Laboratory, University of Edinburgh, EPCC, etc.

  • The Globus Toolkit (GT)

– It is a de facto standard
– It is a bag of services
– At its fourth release (GT4)
– Now adopts Web Services interfaces

  • The Global Grid Forum

– A forum of grid researchers
– Works to define standards and protocols for grid technologies
– Divided into Working Groups (WGs)
– http://www.ggf.org

  • Ack. F. Scibilia

Globus Services

(Figure: the GT4 component map – WS and non-WS components grouped into Security, Data Management, Execution Management, Information Services, and Common Runtime. Examples include GridFTP, Reliable File Transfer, Replica Location, Data Replication, OGSA-DAI, Grid Resource Allocation & Management (WS and pre-WS), Monitoring & Discovery (MDS2), Index, Trigger, WebMDS, Delegation, Credential Management, pre-WS Authentication/Authorization, Community Scheduler Framework, Grid Telecontrol Protocol, Workspace Management, Java/C/Python WS Core, common libraries and eXtensible IO (XIO). Components are marked as Core (public interfaces frozen between incremental releases; best-effort support), Contribution/Tech Preview (public interfaces may change between incremental releases), or Deprecated (not supported; will be dropped in a future release).)

  • Ack. F. Scibilia

Hourglass Reference Model

  • Fabric layer

– Manages resources locally

  • Connectivity

– Network communications (IP, DNS, etc.)
– Security: authentication, authorization, certification
– Single Sign-On

  • Resource

– Allocation, reservation and monitoring of resources
– Data access and transport
– Gathering of information on resources

  • Collective

– View of services as collections
– Discovery and allocation
– Replica and catalogue of data
– Management of workflow

  • Application

– User applications
– Tools and interfaces

  • Ack. F. Scibilia

Application Areas (i)

  • Physical Science Applications

– GriPhyN, http://www.griphyn.org/
– Particle Physics DataGrid (PPDG), http://grid.fnal.gov/ppdg/
– GridPP, http://www.gridpp.ac.uk/
– AstroGrid, http://www.astrogrid.org/

  • Life Science Applications

– Protein Data Bank (PDB), http://www.rcsb.org/pdb/Welcome.do
– Biomedical Informatics Research Network (BIRN), http://www.nbirn.net/
– Telemicroscopy, http://ncmir.ucsd.edu/
– myGrid, http://www.mygrid.org.uk/

  • Ack. F. Scibilia

Application Areas (ii)

  • Engineering Oriented Applications

– NASA Information Power Grid (IPG), http://www.ipg.nasa.gov/
– Grid Enabled Optimization and Design Search for Engineering (GEODISE), http://www.geodise.org/

  • Commercial Applications

– Butterfly Grid, http://www.butterfly.net/ – Everquest, http://www.everquest.com/

  • E-Utility

– ClimatePrediction experiment, http://www.climateprediction.net/

  • Ack. F. Scibilia

An Example: SETI@Home

  • The SETI@Home project

– Searches for Extra-Terrestrial Intelligence (SETI)
– Collects samples of microwaves coming from the Universe through a telescope
– Schedules tasks spread over Grid nodes to analyse these samples
– Uses desktop computers as Grid nodes
– Working nodes are dynamically added to and removed from the grid
– The owner of the desktop machine decides how to contribute to the project by offering its computational power

  • To contribute to the project

– http://setiathome.berkeley.edu/
– Download and install the client
– Your machine will work as a Grid node when it is idle (in place of your screensaver)

  • Ack. F. Scibilia

A Global Grid Community

Ack: I. Foster

The Grid

“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”

The original Grid concept has moved on!

Ack: I. Foster


Emerging Pervasive Information/Computation Infrastructures

  • Explosive growth in computation, communication, information and integration technologies

– Computing, communication, data is ubiquitous

  • Pervasive ad hoc “anytime-anywhere” access environments

– Ubiquitous access to information
– Peers capable of producing/consuming/processing information at different levels and granularities
– Embedded devices in clothes, phones, cars, mile-markers, traffic lights, lamp posts, medical instruments …

  • “On demand” computational/storage resources, services

Pervasive Computational Ecosystems and Dynamic Information Driven Applications

(Figure: a dynamic information-driven application scenario. Resources – computers, storage, instruments – are discovered, negotiated and co-allocated on-the-fly, and components are deployed. Experts query and configure resources, and interact and collaborate using ubiquitous and pervasive portals (laptop, PDA, computer). Components are dynamically composed; “WebServices” are discovered & invoked; components write into the archive. Experts monitor, interact with, interrogate and steer models – Model A, Model B – exploring “what if” scenarios, and the application notifies experts of interesting events. Experts mine the archive and match real-time data with history; real-time data is assimilated/injected from sensors, instruments, experiments, data archives and non-traditional data sources, with automated mining & matching.)


Pervasive Grid Environments - Unprecedented Opportunities

  • Pervasive Grid Environments

– Seamless, secure, on-demand access to, and aggregation of, geographically distributed computing, communication and information resources

  • Computers, networks, data archives, instruments, observatories, experiments, sensors/actuators, ambient information, etc.

– Context, content, capability, capacity awareness
– Ubiquity and mobility

  • Knowledge-based, information/data-driven, context/content-aware, computationally intensive, pervasive applications

– Symbiotically and opportunistically combine services/computations, real-time information, experiments and observations to manage, control, predict, adapt, optimize, …

  • Crisis management, monitoring and prediction of natural phenomena, monitoring and management of engineering systems, optimization of business processes

  • A new paradigm?

– seamless access

  • resources, services, data, information, expertise, …

– seamless aggregation
– seamless (opportunistic) interactions/couplings

Information-driven Management of Subsurface Geosystems: The Instrumented Oil Field (with UT-CSM, UT-IG, OSU, UMD, ANL)

Detect and track changes in data during production. Invert data for reservoir properties. Detect and track reservoir changes. Assimilate data & reservoir properties into the evolving reservoir model. Use simulation and optimization to guide future production.

(Figure: the data-driven / model-driven loop)


Vision: Diverse Geosystems – Similar Solutions

(Figure: landfills, oilfields, underground pollution and undersea reservoirs, all managed through the same loop of models, simulation, data and control)

Management of the Ruby Gulch Waste Repository (with UT-CSM, INL, OU)

  • Ruby Gulch Waste Repository / Gilt Edge Mine, South Dakota

– ~20 million cubic yards of waste rock
– AMD (acid mine drainage) impacting drinking water supplies

  • Monitoring System

– Multi-electrode resistivity system (523)
– Flowmeter at bottom of dump
– Weather station
– Manually sampled chemical/air ports in wells
– Temperature & moisture sensors in four wells
– Approx. 40K measurements/day: one data point every 2.4 seconds from any 4 electrodes

“Towards Dynamic Data-Driven Management of the Ruby Gulch Waste Repository,” M. Parashar, et al, DDDAS Workshop, ICCS 2006, Reading, UK, LNCS, Springer Verlag, Vol. 3993, pp. 384 – 392, May 2006.

Data-Driven Forest Fire Simulation (U of AZ)

  • Predict the behavior and spread of wildfires (intensity, propagation speed and direction, modes of spread)

– based on both dynamic and static environmental and vegetation conditions
– factors include fuel characteristics and configurations, chemical reactions, balances between different modes of heat transfer, topography, and fire/atmosphere interactions

“Self-Optimizing of Large Scale Wild Fire Simulations,” J. Yang*, H. Chen*, S. Hariri and M. Parashar, Proceedings of the 5th International Conference on Computational Science (ICCS 2005), Atlanta, GA, USA, Springer-Verlag, May 2005.

Many Application Areas ….

  • Hazard prevention, mitigation and response

– Earthquakes, hurricanes, tornados, wild fires, floods, landslides, tsunamis, terrorist attacks

  • Critical infrastructure systems

– Condition monitoring and prediction of future capability

  • Transportation of humans and goods

– Safe, speedy, and cost effective transportation networks and vehicles (air, ground, space)

  • Energy and environment

– Safe and efficient power grids, safe and efficient operation of regional collections of buildings

  • Health

– Reliable and cost effective health care systems with improved outcomes

  • Enterprise-wide decision making

– Coordination of dynamic distributed decisions for supply chains under uncertainty

  • Next generation communication systems

– Reliable wireless networks for homes and businesses

  • … … … …
  • Report of the Workshop on Dynamic Data Driven Applications Systems, F. Darema et al., March 2006, www.dddas.org

Source: M. Rotea, NSF


Outline

  • Pervasive Grid Environments - Unprecedented Opportunities
  • Pervasive Grid Environments - Unprecedented Challenges

– System, Information, Application Uncertainty

  • Autonomic Grid Computing
  • Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments
  • An Illustrative Application
  • Concluding Remarks

Pervasive Grid Applications – Unprecedented Challenges: Uncertainty

  • System Uncertainty

– Very large scales
– Ad hoc structures/behaviors

  • p2p, hierarchical, etc., architectures

– Dynamic

  • entities join, leave, move, change behavior

– Heterogeneous

  • capability, connectivity, reliability, guarantees, QoS

– Lack of guarantees

  • components, communication

  • Information Uncertainty

– Availability, resolution, quality of information
– Device capability, operation, calibration
– Trust in data, data models
– Semantics

  • Application Uncertainty

– Dynamic behaviors

  • space-time adaptivity

– Lack of common/complete knowledge (LOCK)

  • number, type, location, availability, connectivity, protocols, semantics, etc.

– Dynamic and complex couplings

  • multi-physics, multi-model, multi-resolution, …

– Dynamic and complex (ad hoc, opportunistic) interactions
– Software/systems engineering issues

  • Emergent rather than by design
Pervasive Grid Computing – Research Issues, Opportunities

  • Programming systems/models for data integration and runtime self-management

– components and compositions capable of adapting behavior, interactions and information
– correctness, consistency, performance, quality-of-service constraints

  • Content-based asynchronous and decentralized discovery and access services

– semantics, metadata definition, indexing, querying, notification

  • Data management mechanisms for data acquisition and transport with real-time, space and data quality constraints

– high data volumes/rates, heterogeneous data qualities and sources
– in-network aggregation, integration, assimilation, caching

  • Runtime execution services that guarantee correct, reliable execution with predictable and controllable response time

– data assimilation, injection, adaptation

  • Security, trust, access control, data provenance, audit trails, accounting

Outline

  • Pervasive Grid Environments - Unprecedented Opportunities
  • Pervasive Grid Environments - Unprecedented Challenges
  • Autonomic Grid Computing
  • Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

  • An Illustrative Application
  • Concluding Remarks
Integrating Biology and Information Technology: The Autonomic Computing Metaphor

  • Current programming paradigms, methods and management tools are inadequate to handle the scale, complexity, dynamism and heterogeneity of emerging systems
  • Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability and lack of guarantees

– self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!!!

  • The goal of autonomic computing is to build self-managing systems that address these challenges using high-level guidance

“Autonomic Computing: An Overview,” M. Parashar, and S. Hariri, Hot Topics, Lecture Notes in Computer Science, Springer Verlag, Vol. 3566, pp. 247-259, 2005.

Adaptive Biological Systems

  • The body’s internal mechanisms continuously work together to maintain essential variables within the physiological limits that define the viability zone
  • Two important observations:

– The goal of the adaptive behavior is directly linked with survivability
– If the external or internal environment pushes the system outside its physiological equilibrium state, the system will always work towards coming back to the original equilibrium state


Ashby’s Ultrastable System

(Figure: Ashby’s ultrastable system – the environment is coupled to a reacting part R through motor and sensor channels; essential variables trigger step mechanisms that set R’s input parameter S)
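Ashby’s ultrastable loop can be sketched in a few lines: whenever the essential variable leaves the viability zone, a step mechanism jumps the reacting part’s parameter to a new setting until viability is restored. This toy model is purely illustrative (the functions and constants are assumptions, not Ashby’s equations):

```python
import random

# Toy ultrastable system: the reacting part R maps a disturbance through
# parameter s; the step mechanism retries random parameter settings until
# the essential variable is back inside the viability zone.
random.seed(1)

VIABLE = (-1.0, 1.0)  # viability zone for the essential variable

def essential(disturbance, s):
    """Reacting part R: the essential variable under parameter s."""
    return disturbance * s

def ultrastable(disturbance, s, max_steps=100):
    steps = 0
    while not (VIABLE[0] <= essential(disturbance, s) <= VIABLE[1]):
        s = random.uniform(-1, 1)  # step mechanism: try a new parameter
        steps += 1
        if steps >= max_steps:
            break
    return s, steps

s, steps = ultrastable(disturbance=5.0, s=2.0)
print(f"settled at s={s:.3f} after {steps} random steps")
```

The point of the model is the second observation above: the system does not know the “right” parameter in advance; it simply keeps changing itself until the essential variable is back inside the viability zone.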

Autonomic Computing Characteristics (IBM)



Autonomic Computing Architecture

  • Autonomic elements (components/services)

– Responsible for policy-driven self-management of individual components

  • Relationships among autonomic elements

– Based on agreements established/maintained by autonomic elements
– Governed by policies
– Give rise to resiliency, robustness, and self-management of the system

Autonomic Computing – Conceptual Architecture (IBM)


Autonomic Elements: Structure

  • Fundamental atom of the architecture (Ack. IBM)

– One or more managed elements

  • Database, storage system, server, software app, etc.

– Plus one autonomic manager

  • Responsible for:

– Providing its service
– Managing its own behavior in accordance with policies
– Interacting with other autonomic elements

(Figure: an autonomic element – the autonomic manager runs a Monitor / Analyze / Plan / Execute loop over shared Knowledge, coupled to the managed element through sensors and effectors)
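The autonomic manager’s Monitor-Analyze-Plan-Execute loop can be made concrete with a toy sketch (a hypothetical thermostat-style manager; all names and thresholds are illustrative, not IBM’s API):

```python
# Minimal sketch of the Monitor -> Analyze -> Plan -> Execute loop of an
# autonomic manager, here scaling a server pool against its load.
class AutonomicManager:
    def __init__(self, managed, policy):
        self.managed = managed  # the managed element (sensors/effectors)
        self.policy = policy    # knowledge: desired load range (lo, hi)

    def monitor(self):
        return self.managed["load"]  # sensor reading

    def analyze(self, load):
        lo, hi = self.policy
        if load > hi:
            return "overloaded"
        if load < lo:
            return "underused"
        return "ok"

    def plan(self, symptom):
        # choose a corrective action for the diagnosed symptom
        return {"overloaded": +1, "underused": -1, "ok": 0}[symptom]

    def execute(self, delta):
        self.managed["servers"] += delta  # effector action

    def step(self):
        self.execute(self.plan(self.analyze(self.monitor())))

server_pool = {"load": 0.95, "servers": 4}
mgr = AutonomicManager(server_pool, policy=(0.3, 0.8))
mgr.step()
print(server_pool["servers"])  # 5: the manager added capacity
```

Policies enter only through the `policy` knowledge: the element manages its own behavior, and higher-level guidance just sets the range.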

Autonomic Elements: Interactions

  • Relationships (Ack. IBM)

– Dynamic, ephemeral, opportunistic
– Defined by rules and constraints
– Formed by agreement

  • May be negotiated

– Full spectrum

  • Peer-to-peer
  • Hierarchical

– Subject to policies


Autonomic Systems: Composition of Autonomic Elements

  • Ack. IBM

(Figure: an autonomic system composed of interacting autonomic elements – workload managers, monitors, sentinels, aggregators, negotiators, brokers, provisioners, registries, arbiters, planners, an event correlator and a reputation authority – managing servers, databases, networks and storage)

Autonomic Computing: Research Issues and Challenges

Scale, Complexity, Dynamism, Heterogeneity, Unreliability, Uncertainty

  • Defining autonomic elements

– Programming paradigms and development models/frameworks

  • Autonomic element definition and construction
  • Rule definition, representation, and enforcement

  • Constructing autonomic systems/applications

– Composition, coordination, interactions models and infrastructures

  • Dynamic (rule-based) configuration, execution and optimization
  • Dynamic (opportunistic) interactions, coordination, negotiation

  • Execution and management

– Runtime and middleware services

  • Discovery, Coordination, Messaging, Security, Management, …

– Security, protection
– Fault tolerance, reliability, availability, …

  • Policies, Learning, AI, …

Outline

  • Pervasive Grid Environments - Unprecedented Opportunities
  • Pervasive Grid Environments - Unprecedented Challenges, Opportunities
  • Autonomic Grid Computing
  • Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

  • An Illustrative Application
  • Concluding Remarks

Programming Pervasive Grid Systems

  • Programming System

– programming model, languages/abstractions (syntax + semantics)
– entities, operations, rules of composition, models of coordination/communication

  • Abstract machine, execution context and assumptions

– infrastructure, middleware and runtime

  • Hide v/s expose uncertainty?

– virtual homogeneity, ease of programming
– robustness
– cross-layer adaptation and optimization
– the inverted stack …

(Figure: the layered stack – Applications run on a Programming System (Programming Model + Abstract Machine), which runs on the Grid Middleware Infrastructure; Virtualization creates and manages VOs and VMs, and Abstraction supports abstract behaviors/assumptions over Virtual Machines built from Virtual Organizations spanning multiple Organizations)

“Conceptual and Implementation Models for the Grid,” M. Parashar and J.C. Browne, Proceedings of the IEEE, Special Issue on Grid Computing, IEEE Press, Vol. 93, No. 3, March 2005.

Autonomic Grid Computing – A Holistic Approach

  • Computing has evolved and matured to provide specialized solutions that satisfy relatively narrow and well-defined requirements in isolation

– performance, security, dependability, reliability, availability, throughput, pervasive/amorphous, automation, reasoning, etc.

  • In the case of pervasive Grid applications/environments, requirements, objectives and execution contexts are dynamic and not known a priori

– requirements, objectives and the choice of specific solutions (algorithms, behaviors, interactions, etc.) depend on runtime state, context, and content
– applications should be aware of changing requirements and execution contexts and respond to these changes at runtime

  • Autonomic Grid computing - systems/applications that self-manage

– use appropriate solutions based on current state/context/content and on specified policies
– address uncertainty at multiple levels
– asynchronous algorithms, decoupled interactions/coordination, content-based substrates

Project AutoMate: Enabling Autonomic Applications

(Figure: the AutoMate stack – Autonomic Grid Applications (autonomic components, dynamic composition, opportunistic interactions, collaborative monitoring/control) over the Accord programming framework; the Rudder coordination middleware (decentralized coordination engine, agent framework, decentralized reactive tuple space) alongside the Sesame and DAIS protection services; semantic middleware services for content-based discovery and associative messaging, built on the Meteor/Squid content-based middleware with ontology/taxonomy support; and a content overlay with a content-based routing engine and self-organizing overlay)

  • Conceptual models and implementation architectures

– programming systems based on popular programming models

  • object, component and service based prototypes

– content-based coordination and messaging middleware – amorphous and emergent overlays

  • http://automate.rutgers.edu
SLIDE 27: Project AutoMate: Core Components

  • Accord – A Programming System for Autonomic Grid Applications
  • Squid – Decentralized Information Discovery and Content-based Routing
  • Meteor – Content-based Interactions/Messaging Middleware
  • Rudder/Comet – Decentralized Coordination Middleware
  • ACE – Autonomic Composition Engine
  • SESAME – Context-Aware Access Management
  • DAIS – Cooperative Protection against Network Attacks
  • More information/Papers – http://automate.rutgers.edu

“AutoMate: Enabling Autonomic Grid Applications,” M. Parashar et al, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Special Issue on Autonomic Computing, Kluwer Academic Publishers. Vol. 9, No. 2, pp. 161 – 174, 2006.

Accord: Rule-Based Programming System

  • Accord is a programming system which supports the

development of autonomic applications.

– Enables definition of autonomic components with programmable behaviors and interactions. – Enables runtime composition and autonomic management of these components using dynamically defined rules.

  • Dynamic specification of adaptation behaviors using rules.
  • Enforcement of adaptation behaviors by invoking sensors and actuators.
  • Runtime conflict detection and resolution.
  • Three prototypes: object-based, component-based (CCA), and service-based (Web Services)

“Accord: A Programming Framework for Autonomic Applications,” H. Liu* and M. Parashar, IEEE Transactions on Systems, Man and Cybernetics, Special Issue on Engineering Autonomic Systems, IEEE Press, Vol. 36, No 3, pp. 341 – 352, 2006.
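The rule-based model above can be sketched in a few lines. This is an illustrative reconstruction, not the actual Accord API: `Rule`, `ElementManager`, and the sensor/actuator callables are hypothetical names, and conflict resolution is reduced to a simple priority ordering in which at most one rule drives each actuator per cycle.

```python
# Minimal sketch of a rule-managed autonomic element in the spirit of
# Accord. All names are illustrative, not the real Accord API.

class Rule:
    def __init__(self, name, condition, action, priority=0):
        # condition: state-dict -> bool; action: (actuator_name, argument)
        self.name, self.condition = name, condition
        self.action, self.priority = action, priority

class ElementManager:
    """Evaluates rules against sensor readings and fires actuators."""
    def __init__(self):
        self.sensors = {}    # name -> zero-arg callable returning a value
        self.actuators = {}  # name -> one-arg callable applying an adaptation
        self.rules = []

    def step(self):
        # Sense: snapshot internal/contextual state.
        state = {name: read() for name, read in self.sensors.items()}
        fired, used = [], set()
        # Conflict resolution sketch: highest priority wins per actuator.
        for rule in sorted(self.rules, key=lambda r: -r.priority):
            if rule.condition(state):
                actuator, arg = rule.action
                if actuator not in used:
                    self.actuators[actuator](arg)
                    used.add(actuator)
                    fired.append(rule.name)
        return fired

# Usage: a component that switches algorithms when load is high.
mgr = ElementManager()
load = {"value": 0.9}
mode = {"value": "accurate"}
mgr.sensors["load"] = lambda: load["value"]
mgr.actuators["set_mode"] = lambda m: mode.update(value=m)
mgr.rules.append(Rule("degrade", lambda s: s["load"] > 0.8,
                      ("set_mode", "fast"), priority=1))
mgr.rules.append(Rule("restore", lambda s: s["load"] <= 0.8,
                      ("set_mode", "accurate")))
print(mgr.step(), mode["value"])  # the "degrade" rule fires; mode -> "fast"
```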

SLIDE 28: Autonomic Element/Behaviors in Accord

[Figure: an autonomic element wraps a computational element with an element manager and exposes functional, control, and operational ports; a composition manager derives interaction rules and behavior rules from the application workflow, application strategies, and application requirements; the element manager evaluates the rules against internal and contextual state, generating events and invoking actuators and interfaces]

LLC-Based Self-Management within Accord

[Figure: a self-managing element – the computational element's internal and contextual state feed the element manager, which consults an LLC controller (system model plus optimization function) and applies its advice to the computational element]

  • Element/Service Managers are augmented with LLC Controllers

– monitor the state/execution context of elements – enforce adaptation actions determined by the controller – augment human-defined rules

SLIDE 29: The Self-managing Shock Simulation: Self-optimizing via Component Adaptation

SLIDE 30: The Self-managing Shock Simulation: Self-healing via Component Replacement

Decentralized (Decoupled/Asynchronous) Content-based Middleware Services

Pervasive Grid Environment Self-Managing Overlay (SquidTON)

SLIDE 31: SquidTON: Reliable & Fault-Tolerant Overlay

  • Pervasive Grid systems are dynamic, with nodes joining, leaving and

failing relatively often

  • => data loss and temporarily inconsistent overlay structure
  • => the system cannot offer guarantees

– Build redundancy into the overlay network – Replicate the data

  • SquidTON = Squid Two-tier Overlay Network

– Consecutive nodes form unstructured groups, and at the same time are connected by a global structured overlay (e.g. Chord) – Data is replicated in the group

[Figure: SquidTON overlay – groups of consecutive nodes on a ring of identifiers (e.g., 10, 17, 22, …, 249), each group labeled with a group identifier; groups are linked by the global structured overlay, and data is replicated within each group]
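The two-tier idea can be illustrated as follows. This is only a sketch of the concept, not SquidTON's implementation: `SquidTONSketch`, the fixed group size, and the SHA-1 identifier ring are assumptions made here for brevity.

```python
# Sketch of SquidTON's two tiers: nodes on a Chord-like identifier ring
# are partitioned into groups of consecutive nodes, and each data item
# is replicated on every member of the group that owns its key.
import bisect
import hashlib

RING = 2 ** 16  # identifier-space size (illustrative)

def ring_id(name):
    """Hash a node name or data key onto the identifier ring."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

class SquidTONSketch:
    def __init__(self, names, group_size=3):
        ids = sorted(ring_id(n) for n in names)
        # Tier 1: consecutive nodes form unstructured groups.
        self.groups = [ids[i:i + group_size]
                       for i in range(0, len(ids), group_size)]
        self.starts = [g[0] for g in self.groups]

    def group_for(self, key):
        """Tier 2: route on the structured overlay to the owning group."""
        kid = ring_id(key)
        # Wrap around the ring when the key precedes the first group.
        i = (bisect.bisect_right(self.starts, kid) - 1) % len(self.groups)
        return self.groups[i]

    def store(self, key):
        """Replicate the item on all members of the owning group."""
        return {member: key for member in self.group_for(key)}

overlay = SquidTONSketch([f"node{i}" for i in range(12)])
replicas = overlay.store("sensor-42")
# The item survives the failure of any group_size - 1 group members.
print(len(replicas))
```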

Content Descriptors and Information Space

  • Data element = a piece of information that is indexed and discovered

– Data, documents, resources, services, metadata, messages, events, etc.

  • Each data element has a set of keywords associated with it that describe its content => data elements form a keyword space

[Figure: 2D keyword space for a P2P file sharing system – a document is a point at (keyword1, keyword2), e.g. (computer, network); a complex query such as (comp*, *) defines a region of the space]

[Figure: 3D keyword space for resource sharing, using the attributes storage space (MB), base bandwidth (Mbps), and cost – a computational resource is a point, e.g. (30, 100, 9); a complex query such as (10, 20-25, *) defines a region of the space]

SLIDE 32: Content Indexing: Hilbert SFC

  • f: Nᵈ → N, recursive generation

[Figure: first two refinement levels of the 2D Hilbert curve, with cells labeled 00-11 and 0000-1111 in curve order]

  • Properties:

– Digital causality – Locality preserving – Clustering (cluster: a group of cells connected by a segment of the curve)
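For d = 2 this mapping can be computed with the standard bitwise Hilbert-curve algorithm. The sketch below (plain Python, grid side n a power of two) also checks the locality-preserving property: consecutive curve positions are always adjacent cells.

```python
# Standard iterative Hilbert-curve index <-> coordinate mapping for d = 2.

def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so sub-curves connect correctly."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Map cell (x, y) on an n x n grid to its distance d along the curve."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d

def d2xy(n, d):
    """Inverse mapping: curve distance d back to cell (x, y)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x, y = x + s * rx, y + s * ry
        t //= 4
        s *= 2
    return x, y

# Locality: consecutive curve positions are adjacent grid cells.
n = 8
pts = [d2xy(n, d) for d in range(n * n)]
assert all(abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
           for a, b in zip(pts, pts[1:]))
assert all(xy2d(n, x, y) == d for d, (x, y) in enumerate(pts))  # round trip
```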

Content Indexing, Routing & Querying

[Figure: content profiles such as (2,1), (4-7,0-3), and (*,4) are mapped to clusters on the SFC; matching messages are routed along the curve to the nodes that store those clusters]

Query processing: generate the clusters associated with the content profile, route to the nodes that store those clusters, and send the results to the requesting node.

  • Demonstrated analytically and experimentally that

– for large systems, queries with p% coverage will query about p% of the nodes, independent of the data distribution – the system scales with the number of nodes and the amount of data – the optimization significantly reduces the number of clusters generated and messages sent, while slightly increasing the number of nodes queried – only a small number of “intermediary” nodes are involved

  • Note:

– More than one cluster is typically stored at a node – Not all clusters generated for a query exist in the network – SFC cluster generation is recursive, i.e., a prefix tree (trie)

  • Optimization: embed the tree into the overlay and prune nodes during construction
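The cluster idea can be illustrated with a deliberately simplified sketch. Here a Morton (Z-order) index stands in for the Hilbert SFC, and clusters are simply the contiguous runs of curve indices covered by a rectangular query region; Squid itself generates clusters recursively over a Hilbert-curve trie, so the function names and the brute-force enumeration below are illustrative only.

```python
# Map a rectangular query region to contiguous curve segments ("clusters"),
# using a Morton (Z-order) curve as a stand-in for the Hilbert SFC.

def morton2(x, y, bits=4):
    """Interleave the bits of x and y to get the Z-order index."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i) | ((y >> i) & 1) << (2 * i + 1)
    return z

def clusters_for_query(x_range, y_range, bits=4):
    """Enumerate matching cells and group them into contiguous index runs."""
    idx = sorted(morton2(x, y, bits)
                 for x in range(*x_range) for y in range(*y_range))
    runs, start = [], idx[0]
    for a, b in zip(idx, idx[1:]):
        if b != a + 1:              # gap on the curve: close this cluster
            runs.append((start, a))
            start = b
    runs.append((start, idx[-1]))
    return runs

# A query like (comp*, *) covers a strip of the keyword space; each
# (first, last) run is one cluster routed to the overlay nodes storing it.
print(clusters_for_query((4, 8), (0, 16)))
```

Fewer, longer runs mean fewer routing messages, which is exactly what the locality-preserving Hilbert mapping improves over the Z-order curve used here.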

SLIDE 33: Squid Content Routing/Discovery Engine: Experimental Evaluation

  • System size: 10³ to 10⁶ nodes
  • Data:

– Uniformly distributed, synthetically generated data – 4×10⁵ CiteSeer data entries

  • Load-balanced system
  • Experiments:

– Number of clusters generated for a query – Number of nodes queried

  • All results are plotted on a logarithmic scale

Squid Content Routing/Discovery Engine: Optimization

  • Number of clusters generated for queries with coverage 1%, 0.1%, and 0.01%, with and without optimization
  • The results are normalized against the clusters that the query defines on the curve (i.e., without optimization)

[Figure: normalized number of clusters vs. system size (10³ to 10⁵) for 1%, 0.1%, and 0.01% queries, for 3D uniformly distributed data and 3D CiteSeer data]

SLIDE 34: Squid Content Routing/Discovery Engine – Nodes Queried

  • Percentage of nodes queried for queries with coverage 1%, 0.1%, and 0.01%, with and without optimization

[Figure: % of nodes queried vs. system size (10³ to 10⁶) for 1%, 0.1%, and 0.01% queries, with and without optimization, for 3D uniformly distributed data and 3D CiteSeer data]

Project Meteor: Associative Rendezvous

  • Content-based decoupled interaction with programmable reactive behaviors

– Messages: (header, action, data); the header carries a profile, credentials, message context, and TTL (time to live)

  • Symmetric post primitive: does not differentiate between interest/data

– Associative selection: match between interest and data – Reactive behavior: execute the action field (store, retrieve, notify, delete) upon matching

  • Decentralized in-network aggregation

– Tries for back-propagating and aggregating matching data items

  • Supports the WS-Notification standard

Profile = list of (attribute, value) pairs
Example: <(sensor_type, temperature), (latitude, 10), (longitude, 20)>

[Figure: client C1 posts (<p1, p2>, store, data); client C2 posts (<p1, *>, notify_data(C2)); the profiles match at the rendezvous point, which triggers notify(C2)]
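The symmetric post/match behavior can be sketched as follows. `RendezvousNode`, `matches`, and the callback-based notify are hypothetical names standing in for Meteor's actual interfaces; the sketch only shows the "store" and "notify" actions.

```python
# Sketch of Meteor-style Associative Rendezvous: both data and interest
# are posted with a profile; wildcard matching triggers reactive actions.

def matches(interest, data_profile):
    """Associative selection: '*' in the interest matches any value."""
    d = dict(data_profile)
    return all(k in d and (v == "*" or d[k] == v) for k, v in interest)

class RendezvousNode:
    def __init__(self):
        self.stored = []       # (profile, data) posts with action "store"
        self.interests = []    # (profile, callback) posts with "notify"

    def post(self, profile, action, payload=None):
        if action == "store":
            self.stored.append((profile, payload))
            # Reactive behavior: fire any waiting interests that match.
            for interest, notify in self.interests:
                if matches(interest, profile):
                    notify(payload)
        elif action == "notify":
            self.interests.append((profile, payload))

node = RendezvousNode()
hits = []
node.post([("sensor_type", "temperature"), ("latitude", "*")],
          "notify", hits.append)                          # interest post
node.post([("sensor_type", "temperature"), ("latitude", "10"),
           ("longitude", "20")], "store", "reading=72F")  # data post
print(hits)  # the stored profile matches the interest, so C2 is notified
```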

SLIDE 35: Heterogeneity Management

  • Heterogeneity management and adaptations at AR nodes using reactive behaviors

– Policy-based adaptations based on capabilities, preferences, and resources

Comet Coordination Space

  • A virtual global shared space constructed from a semantic multi-dimensional information space, which is deterministically mapped onto the system peer nodes
  • The space is associatively accessible by all system peer nodes; access is independent of the physical locations of tuples or hosts

– Tuple distribution

  • A tuple/template (XML) is associated with k keywords
  • The Squid content-based routing engine is used for exact and approximate tuple distribution and retrieval

– Transient spaces

  • Enable applications to explicitly exploit context locality

“COMET: A Scalable Coordination Space in Decentralized Distributed Environments,” Z. Li* and M. Parashar, Proceedings of the 2nd International Workshop on Hot Topics in Peer-to-Peer Systems (HOT-P2P 2005), San Diego, CA, USA, IEEE Computer Society Press, pp. 104 – 111, July 2005.

SLIDE 36: Supporting the Rudder Agent Framework

  • Agent communication

– by associatively reading, writing, and extracting tuples

  • Agent coordination protocols

– Decentralized election protocol

  • Based on wait-free consensus protocols
  • Resilient to node/link failures

– Discovery protocol

  • Registry implemented using XML tuples
  • Elements registered using Out
  • Elements unregistered using In
  • Elements discovered using the Rd/RdAll operations

– Interaction protocol

  • Contract-Net protocol
  • Two-agent bargaining protocol
  • Workflow engine

“Enabling Dynamic Composition and Coordination of Autonomic Applications using the Rudder Agent Framework,” Z. Li* and M. Parashar, The Knowledge Engineering Review, Cambridge University Press (also SAACS 2005).
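The Out/In/Rd operations used by these protocols behave like a classic associative tuple space. A minimal sketch follows, with illustrative names (not the Comet API) and `None` as the wildcard field in templates:

```python
# Minimal associative tuple space supporting the Out/In/Rd operations
# used by Rudder agents for registration and discovery.
import threading

class TupleSpaceSketch:
    def __init__(self):
        self.tuples = []
        self.lock = threading.Lock()

    def _match(self, template, tup):
        """A template matches a tuple field-wise; None matches anything."""
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def out(self, tup):
        """Write a tuple (e.g., register an element)."""
        with self.lock:
            self.tuples.append(tup)

    def rd_all(self, template):
        """Read all matching tuples without removing them (discovery)."""
        with self.lock:
            return [t for t in self.tuples if self._match(template, t)]

    def in_(self, template):
        """Extract one matching tuple (e.g., unregister an element)."""
        with self.lock:
            for t in self.tuples:
                if self._match(template, t):
                    self.tuples.remove(t)
                    return t
        return None

space = TupleSpaceSketch()
space.out(("service", "ionization-sim", "nodeA"))
space.out(("service", "visualizer", "nodeB"))
found = space.rd_all(("service", None, None))      # discover all services
gone = space.in_(("service", "visualizer", None))  # unregister one
print(len(found), gone[1])
```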

Implementation/Deployment Overview

  • The current implementation builds on JXTA

– SquidTON, Squid, Comet, and Meteor layers are implemented as event-driven JXTA services

  • Deployments include

– Campus Grid @ Rutgers – Orbit wireless testbed (400 nodes) – PlanetLab wide-area testbed (at least one node selected from each continent)

SLIDE 37: Outline

  • Pervasive Grid Environments - Unprecedented Opportunities
  • Pervasive Grid Environments - Unprecedented Challenges, Opportunities

  • Autonomic Grid Computing
  • Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

  • An Illustrative Application
  • Concluding Remarks

The Instrumented Oil Field of the Future (UT-CSM, UT-IG, RU, OSU, UMD, ANL)

  • Production of oil and gas can take advantage of installed sensors that will monitor the reservoir’s state as fluids are extracted
  • Knowledge of the reservoir’s state during production can result in better engineering decisions

– economical evaluation; physical characteristics (bypassed oil, high-pressure zones); production techniques for safe operating conditions in complex and difficult areas

  • The closed loop: detect and track changes in data during production; invert data for reservoir properties; detect and track reservoir changes; assimilate data & reservoir properties into the evolving reservoir model; use simulation and optimization to guide future production and the future data-acquisition strategy

“Application of Grid-Enabled Technologies for Solving Optimization Problems in Data-Driven Reservoir Studies,” M. Parashar, H. Klie, U. Catalyurek, T. Kurc, V. Matossian, J. Saltz and M. Wheeler, The International Journal of Grid Computing: Theory, Methods and Applications (FGCS), Elsevier Science Publishers, Vol. 21, Issue 1, pp. 19 – 26, 2005.

SLIDE 38: Effective Oil Reservoir Management: Well Placement/Configuration

  • Why is it important

– Better utilization/cost-effectiveness of existing reservoirs – Minimizing adverse effects to the environment

[Figure: better management leaves less bypassed oil; bad management leaves much bypassed oil]

An Autonomic Well Placement/Configuration Workflow

[Figure: autonomic workflow – a DISCOVER client generates guesses via an optimization service (SPSA, VFSA, or exhaustive search); if a guess is not in the MySQL database, the IPARS factory starts a parallel IPARS instance with the guess as parameter; the instance connects to DISCOVER and notifies clients, which interact with IPARS; if the guess is in the database, the response is sent to the clients and a new guess is obtained from the optimizer. The workflow runs on the AutoMate programming system/Grid middleware, drawing on history/archived data, sensor/context data, and external data such as oil prices and weather]

SLIDE 39: Data-Flow for Autonomic Well Placement/Configuration

[Figure: data flow – seismic simulations produce seismic datasets; data conversion and data manipulation tools link them with reservoir datasets and sensor data in a distributed execution; filters (RD, SUM, AVG, DIFF) run as transparent copies (one per node) across nodes, and the results feed visualization portals]

Autonomic Oil Well Placement/Configuration

[Figure: permeability field and pressure contours for 3 wells (2D profile); contours of NEval(y,z,500); exhaustive search requires NY×NZ (450) evaluations, with the minimum marked; the VFSA solution “walk” finds it after 20 (81) evaluations]

SLIDE 40: Autonomic Oil Well Placement/Configuration (VFSA)

“An Autonomic Reservoir Framework for the Stochastic Optimization of Well Placement,” V. Matossian, M. Parashar, W. Bangerth, H. Klie, M.F. Wheeler, Cluster Computing: The Journal of Networks, Software Tools, and Applications, Kluwer Academic Publishers, Vol. 8, No. 4, pp. 255 – 269, 2005.
“Autonomic Oil Reservoir Optimization on the Grid,” V. Matossian, V. Bhat, M. Parashar, M. Peszynska, M. Sen, P. Stoffa and M. F. Wheeler, Concurrency and Computation: Practice and Experience, John Wiley and Sons, Volume 17, Issue 1, pp. 1 – 26, 2005.

Wide Area Data Streaming in the Fusion Simulation Project (FSP)

[Figure: GTC runs on teraflop/petaflop supercomputers; an end-to-end system with monitoring routines streams data over a 40 Gbps backbone for data archiving, data replication, large-scale data analysis, post-processing, user monitoring, and visualization]

  • Wide Area Data Streaming Requirements

– Enable high-throughput, low latency data transfer to support near real-time access to the data – Minimize related overhead on the executing simulation – Adapt to network conditions to maintain desired QoS – Handle network failures while eliminating data loss.


SLIDE 41: Autonomic Data Streaming for Coupled Fusion Simulation Workflows

[Figure: coupled workflow – a Simulation Service (SS) at NERSC, CA streams data through the Autonomic Data Streaming Service (ADSS), composed of a Buffer Manager Service (BMS) and a Data Transfer Service (DTS), over Data Grid middleware and a Logistical Networking backbone to a Data Analysis Service (DAS) at PPPL, NJ and Data Storage Services (DSS) at ORNL, TN]

  • Grid-based coupled fusion simulation workflow

– Runs for days between ORNL (TN), NERSC (CA), PPPL (NJ) and Rutgers (NJ) – Generates multi-terabytes of data – Data is coupled, analyzed, and visualized at runtime

ADSS Implementation

[Figure: the LLC controller combines data-rate prediction, buffer-size prediction, and an optimization over the LLC model; the element/service manager maintains contextual state, service state, a rule base, and bandwidth/buffer-size/data-rate measurements with an optimization function; the DTS, BMS, and SS interact with the local hard disk and with the Grid middleware/Logistical Networking backbone through rule-based adaptation]

SLIDE 42: Design of the ADSS Controller

  • Dynamics of the data streaming model are captured using a queuing model
  • Key operating parameters

– State variables

  • qᵢ(k): current average queue size at nᵢ

– Environment variables

  • λᵢ(k): data generation rate into the queue at nᵢ
  • B(k): effective bandwidth of the network link

– Control and decision variables

  • μᵢ(k): data-transfer rate over the network
  • ωᵢ(k): data-transfer rate to the local hard disk

  • LLC controller problem

– The controller aims to maintain each node's (nᵢ) queue around a desired value q*

[Figure: LLC model for the ADSS controller – m simulation processors feed data blocks at rates λ₁(k), λ₂(k), …, λₙ(k) into queues managed by n data-transfer processors, which drain them at rates μᵢ(k) over the network and ωᵢ(k) to the local hard disk]

q̂ᵢ(k+1) = qᵢ(k) + (λ̂ᵢ(k) − μᵢ(k) − ωᵢ(k)) · T

λ̂ᵢ(k) = φ(λᵢ(k−1), k)

where λ̂ᵢ(k) is the predicted data generation rate over interval k and T is the controller interval.

System dynamics at each data streaming processor ni
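A limited-lookahead controller over these dynamics can be sketched as follows. The queue update follows the model above; the prediction function φ (a moving average here), the cost weights, and all parameter values are illustrative assumptions, not the ADSS configuration.

```python
# Sketch of a limited-lookahead (LLC) controller for the data-streaming
# queue: predict the generation rate, enumerate candidate (mu, omega)
# settings over a short horizon, and apply the first action of the best
# sequence. All parameter values are illustrative.
import itertools

T = 1.0          # controller interval (illustrative units)
Q_STAR = 50.0    # desired queue size (data blocks)

def predict_lambda(history):
    """phi: predict the next-interval generation rate (moving average)."""
    recent = history[-3:]
    return sum(recent) / len(recent)

def llc_step(q, lam_history, mu_options, omega_options, horizon=2):
    """Pick (mu, omega) minimizing predicted deviation of q from q*."""
    lam_hat = predict_lambda(lam_history)
    candidates = [(m, w) for m in mu_options for w in omega_options]
    best, best_cost = None, float("inf")
    for seq in itertools.product(candidates, repeat=horizon):
        qk, cost = q, 0.0
        for mu, omega in seq:
            # Queue dynamics: q(k+1) = q(k) + (lambda - mu - omega) * T
            qk = max(0.0, qk + (lam_hat - mu - omega) * T)
            cost += (qk - Q_STAR) ** 2 + 0.1 * omega  # penalize disk I/O
        if cost < best_cost:
            best, best_cost = seq[0], cost  # apply only the first action
    return best

# Rising generation rate: the controller raises the transfer rates.
action = llc_step(q=80.0, lam_history=[40, 60, 80],
                  mu_options=[20, 40, 60], omega_options=[0, 20])
print(action)  # -> (60, 20): max network rate plus disk offload
```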

Adaptive Data Transfer

[Figure: data transferred by the DTS (MB) to the WAN and LAN, and bandwidth (Mb/sec), per controller interval (2-24); congestion occurs between intervals 9 and 19]

  • No congestion in intervals 1-9

– Data transferred over WAN

  • Congested at intervals 9-19

– Controller recognizes this congestion and advises the “Element Manager” which in turn adapts DTS to transfer data to local storage (LAN).

  • Adaptation continues until the network is no longer congested

– Data sent to the local storage by the DTS falls to zero at the 19th controller interval.


SLIDE 43: Adaptive Buffer Management

[Figure: bandwidth (Mb/sec) and blocks grouped by the BMS (1 block = 4 MB) per controller interval; uniform buffer management before congestion, aggregate buffer management during congestion]

  • Uniform buffer management is used when data generation is constant
  • Aggregate buffer management is triggered when congestion increases


Self Optimizing Behavior: Buffer Management Service (BMS)

[Figure: number of blocks sent (10 MB/block) vs. simulation time (sec) under uniform buffer management (50 Mbps) and aggregate buffer management (60-400 Mbps)]

  • Uniform buffer management: the buffer divides data blocks into fixed sizes (50 Mbps).
  • Aggregate buffer management: aggregates blocks of data across iterations (60-400 Mbps).
  • Priority buffer management: orders blocks based on the nature of the data (2D and 3D data items).
  • BMS adaptations are based on:

– Data generation rates – Network connectivity – Nature of the data being transmitted

Sample rule: IF DataGenerationRate <= 50 AND DataType = 2D THEN Aggregate Buffer Management ELSE Uniform Buffer Management
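The sample rule can be written as a small strategy selector. The thresholds and strategy names follow the slide; `package_blocks` and its block size are illustrative additions showing what each strategy does to the outgoing data.

```python
# Sketch of BMS strategy selection driven by the slide's sample rule.

def choose_buffer_management(data_generation_rate_mbps, data_type):
    """Select the BMS strategy from generation rate and data type."""
    if data_generation_rate_mbps <= 50 and data_type == "2D":
        return "aggregate"   # group blocks across iterations
    return "uniform"         # fixed-size blocks

def package_blocks(items, strategy, block_size=4):
    """Uniform: fixed-size blocks; aggregate: one block per interval."""
    if strategy == "aggregate":
        return [items]
    return [items[i:i + block_size] for i in range(0, len(items), block_size)]

assert choose_buffer_management(40, "2D") == "aggregate"
assert choose_buffer_management(120, "3D") == "uniform"
print(package_blocks(list(range(10)), "uniform"))  # three fixed-size blocks
```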

SLIDE 44: Adaptation of the Workflow

  • Create multiple instances of the Autonomic Data Streaming Service (ADSS) when

– the effective network transfer rate dips below the threshold (in our case around 100 Mbps) – the maximum number of instances is kept within a certain limit

[Figure: % network throughput and number of ADSS instances (1-5) vs. data generation rate (20-160 Mbps); additional instances (ADSS-0, ADSS-1, ADSS-2), each with its own data transfer buffer, are created as the generation rate grows]

% network throughput is the difference between the maximum and current network transfer rates

Overhead of Self-Optimization in the BMS

[Figure: % overhead on the simulation vs. data generation rate (20-160 Mbps), with and without autonomic management]

  • Overhead = Abs(time required with and without autonomic data streaming)
  • Observations:

– Autonomic data streaming is slightly costly at the start – Without autonomic management, overhead grows to about 10% – With autonomic management, overhead is limited to 5%

  • Goals: minimize the overhead on the executing simulation; maximize the data streamed from the simulation

SLIDE 45: Self-Healing Behavior

[Figure: the ADSS at the simulation site streams data through the DTS/BMS over the Grid middleware and Logistical Networking backbone to the DAS/DSS at PPPL, NJ, with an alternate DSS at ORNL, TN]

Sample rule for self-healing: IF bufferUsage > 90% THEN DSS at ORNL ELSE DSS at PPPL

[Figure: % buffer occupancy and data sent to the local DSS at ORNL (MB) vs. simulation time (sec); each time the buffer fills, the local storage service is triggered]

  • ADSS continuously observes the buffer occupancy to decide when to consider a switch.
  • Avoid writing to disk at the simulation end.
  • Temporarily switch to a Data Storage Service that is faster, such as the one at ORNL.

Heuristic (Rule-Based) vs. Control-Based Adaptations in ADSS

[Figure: % buffer vacancy vs. time (100-1000 sec) and mean % buffer vacancy, using heuristically based rules vs. control-based self-management]

  • % buffer vacancy refers to the empty space in the buffer

– Higher buffer vacancy leads to reduced overheads and data loss.

  • The pure reactive scheme was based on heuristics; the element manager was not aware of the current and future data generation rate and the network bandwidth.

– Average buffer vacancy in this case was around 16%

  • When the model-based controller was used in conjunction with the “Element Manager” for rule-based adaptations, average buffer vacancy was around 75%.

SLIDE 46: Overhead of the Self-Managing Framework

  • Overheads of the framework primarily due to two factors.

– Activities of the controller during a controller interval. – Overhead of data streaming on the simulation.

  • Overhead due to controller activities

– Controller interval of 80 seconds

  • Average controller decision time is 2.5% at the start, reduced to 0.15% by the local search methods used.

  • Network measurement cost was 18.8 sec (23.5%).
  • Operating cost of the BMS and DTS was 0.2 sec (0.25%) and 18.8 sec (23.5%).
  • Rule execution for triggering adaptations required less than 0.01 sec.
  • Overhead of data streaming

– (T’s - Ts)/Ts where T’s and Ts denote the simulation execution time with and without data streaming respectively. – %Overhead of data streaming on the GTC simulation was less than 9% for 16-64 processors. – %Overhead reduced to about 5% for 128-256 processors.

Outline

  • Pervasive Grid Environments - Unprecedented Opportunities
  • Pervasive Grid Environments - Unprecedented Challenges, Opportunities

  • Autonomic Grid Computing
  • Project AutoMate @ TASSL, Rutgers University – Enabling Autonomic Applications in Pervasive Grid Environments

  • An Illustrative Application
  • Concluding Remarks
SLIDE 47: Conclusion

  • Pervasive Grid Environments

– Unprecedented opportunity

  • can enable a new generation of knowledge-based, data- and information-driven, context-aware, computationally intensive, pervasive applications

– Unprecedented research challenges

  • scale, complexity, heterogeneity, dynamism, reliability, uncertainty, …
  • applications, algorithms, measurements, data/information, software
  • Autonomic Grid Computing

– Addressing the complexity of pervasive Grid environments

  • Project AutoMate: Autonomic Computational Science on the Grid

– Semantic + Autonomics – Accord, Rudder/Comet, Meteor, Squid, Topos, …