ISGC, Academia Sinica, 28 July 2004



SLIDE 1

July 2004 Iosif Legrand

ISGC, Academia Sinica

28 July 2004

Iosif Legrand, California Institute of Technology

SLIDE 2

Monitoring Services

  • An essential part of managing a global Data Grid is a monitoring system that is able to monitor and track the many site facilities, networks, and tasks in progress, in real time.
  • System information for nodes and clusters
  • Network information: WAN and LAN
  • Application monitoring
  • The monitoring information gathered is also essential for developing the higher-level services and Grid components that provide decision support and, eventually, some degree of automated decision-making, to help maintain and optimize the workflow through the Grid.
  • The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems which are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks and decisions in large-scale distributed applications.

SLIDE 3

MonALISA: A Dynamic, Distributed Service Architecture

  • The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based systems.
  • The agent-based dynamic systems are able to carry out a wide range of monitoring tasks in the LHC Data Grid (and other Grids).

[Figure: Code Mobility Paradigm. A client finds a service through the Lookup Service and dynamically loads the service's proxy, which may use any protocol well suited to the application:
  • Remote Services: proxy == RMI stub
  • Mobile Agents: proxy == entire service
  • "Smart Proxies": the proxy adjusts to the client]

An agent-based architecture makes it possible to invest the system with increasing degrees of intelligence, to reduce complexity, and to make global systems (Grids and networks) manageable in real time.

SLIDE 4

The Key MonALISA Features for a Reliable and Scalable Monitoring and Management System

[Figure: a MonALISA Service registers with the Lookup Service under its service ID (the Lease Protocol) and publishes its proxy (mobile code, packaged as a JAR) on a web server; a client discovers the service and loads the proxy code.]

  • MonALISA is able to dynamically register and discover all the other services.
  • It is based on a multi-threaded services engine for global scalability.
  • The services are self-describing and provide loadable proxies.
  • Automatic & secure code update.
  • Dynamic configuration for services. Secure Admin interface.
  • Active filter agents to process the data and provide dedicated / customized information to other services or clients.
  • Mobile Agents for decision support and global optimization.
  • Dynamic proxies and WSDL & WAP pages for services.

Fully Distributed System with no Single Point of Failure
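Registrations with the Lookup Service are leased: a service has to keep renewing its lease, so the entry of a service that stops renewing disappears by itself. The sketch below is a hypothetical in-memory registry assuming a fixed lease length and an injectable clock; it illustrates the lease idea only and does not reproduce any real lookup-service API.

```python
import time
from dataclasses import dataclass


@dataclass
class Registration:
    service_id: str
    proxy_url: str      # hypothetical: where clients would fetch the proxy JAR
    expires_at: float


class LookupRegistry:
    """Minimal lease-based registry (illustrative, not a real lookup API)."""

    def __init__(self, lease_seconds: float, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._entries: dict[str, Registration] = {}

    def register(self, service_id: str, proxy_url: str) -> float:
        """Register (or re-register) a service; return its lease expiry time."""
        expires = self.clock() + self.lease_seconds
        self._entries[service_id] = Registration(service_id, proxy_url, expires)
        return expires

    def renew(self, service_id: str) -> bool:
        """Extend a live lease; an expired lease cannot be renewed."""
        self._expire()
        entry = self._entries.get(service_id)
        if entry is None:
            return False
        entry.expires_at = self.clock() + self.lease_seconds
        return True

    def discover(self) -> list[str]:
        """Return the IDs of all services whose lease is still valid."""
        self._expire()
        return sorted(self._entries)

    def _expire(self) -> None:
        now = self.clock()
        for sid in [s for s, e in self._entries.items() if e.expires_at <= now]:
            del self._entries[sid]


now = [0.0]                                  # fake clock for the example
reg = LookupRegistry(lease_seconds=30, clock=lambda: now[0])
reg.register("farm-A", "http://example.org/a.jar")
reg.register("farm-B", "http://example.org/b.jar")
now[0] = 20
reg.renew("farm-A")                          # only farm-A keeps renewing
now[0] = 40
print(reg.discover())                        # ['farm-A']
```

Because discovery only ever returns live leases, no central component has to detect failed services explicitly.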

SLIDE 5

Monitoring: Data Collection

[Figure: the Farm Monitor, with a WEB Server, Configuration Control, and a Dynamic Thread Pool that runs dynamically loaded modules or agents:
  • PULL: SNMP get & walk, rsh | ssh remote scripts, end-to-end measurements
  • PUSH: Trap Agent (ucd-snmp), perl, Trap Listener (SNMP traps)
  • Other tools (Ganglia, MRT…)]
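The PULL side of the Farm Monitor runs many collection modules in parallel from a dynamic thread pool. A minimal sketch of one collection cycle follows, assuming each module is a plain callable that returns a dictionary of parameter values; the module interface and names are hypothetical, and Python's `ThreadPoolExecutor` stands in for MonALISA's own pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict

# Hypothetical module interface: a callable returning {parameter: value}.
Module = Callable[[], Dict[str, float]]


def collect_once(modules: Dict[str, Module],
                 max_workers: int = 8) -> Dict[str, Dict[str, float]]:
    """Run every monitoring module concurrently (PULL mode) and gather results.

    The executor starts worker threads on demand, up to max_workers.
    A module that raises is recorded with an empty result, so one
    unreachable node does not abort the whole collection cycle.
    """
    results: Dict[str, Dict[str, float]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in modules.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception:
                results[name] = {}
    return results


if __name__ == "__main__":
    def cpu_load() -> Dict[str, float]:
        return {"load1": 0.7}

    def snmp_router() -> Dict[str, float]:
        raise RuntimeError("SNMP timeout")  # simulated unreachable device

    print(collect_once({"cpu": cpu_load, "router": snmp_router}))
```

Isolating each module's failure keeps a slow or dead device from stalling the data collected for the rest of the farm.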

SLIDE 6

Service Monitor UNIT & Data Handling

[Figure: a MonALISA service unit contains the Farm Monitor, a Data Cache Service & DB, Configuration Control (SSL), Predicates & Agents, and user-defined loadable modules to write / send data to stores such as McKoi DB, MySQL, MDS, UDP and other tools. It registers with and is discovered through Lookup Services, and serves Java clients (other services) and web clients through a WEB Service (WSDL, SOAP).]

SLIDE 7

Registration / Discovery / Remote Notification

[Figure: MonALISA Services register with Lookup Services; clients (or other services) discover them there and receive data, passed through Filters & Agents, via a Services Proxy Multiplexer.]
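Remote notification means that clients do not poll: they subscribe once, with a predicate describing the data they want, and the service pushes every matching result to them. The sketch below is a hypothetical, single-process stand-in for that subscription path; the `Result` fields and the class names are assumptions, not MonALISA's actual data model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Result:
    farm: str
    node: str
    param: str
    value: float


Predicate = Callable[[Result], bool]
Listener = Callable[[Result], None]


class NotificationHub:
    """Hypothetical push-side hub: each client subscribes with a predicate
    and is notified only of results that match it."""

    def __init__(self) -> None:
        self._subs: List[Tuple[Predicate, Listener]] = []

    def subscribe(self, predicate: Predicate, listener: Listener) -> None:
        self._subs.append((predicate, listener))

    def publish(self, result: Result) -> int:
        """Deliver a new result to every matching subscriber; return the count."""
        delivered = 0
        for predicate, listener in self._subs:
            if predicate(result):
                listener(result)
                delivered += 1
        return delivered


hub = NotificationHub()
received: List[Result] = []
hub.subscribe(lambda r: r.param == "load1" and r.value > 2.0, received.append)
hub.publish(Result("cern", "n1", "load1", 3.5))
hub.publish(Result("cern", "n2", "load1", 0.4))
print(len(received))  # 1
```

Evaluating the predicate on the service side means only the data a client asked for crosses the network.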

SLIDE 8

Secure, Automatic Update Mechanism for Services, Clients & Embedded Applications

[Figure: an Admin Client holding the private key signs the distribution and places it on a web server, then sends an update signal over SSL to the MonALISA Services found through the Lookup Services; each service, with its own key store, downloads the signed update.]

  • All running services are updated using the discovery mechanism.
  • At startup, each service checks a set of web servers for available updates.
  • Clients use the Web Start mechanism.

EMBEDDED APPLICATIONS: monitor, control execution, update
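On the receiving side, an update should be applied only if it is newer than the running version and its signature verifies. The slide uses a private key and key stores, i.e. public-key signing; Python's standard library has no asymmetric signatures, so this sketch swaps in HMAC-SHA256 with a shared key. That is a named substitution with weaker trust properties (anyone holding the key can also create valid tags), and the function names and version tuples are hypothetical.

```python
import hashlib
import hmac

# ASSUMPTION: HMAC-SHA256 with a shared key stands in for the private-key
# signature on the slide; it proves integrity, but anyone holding the key
# could also produce valid tags.


def tag_distribution(package: bytes, key: bytes) -> str:
    """Admin side: compute the tag published next to the package."""
    return hmac.new(key, package, hashlib.sha256).hexdigest()


def should_apply_update(current: tuple, offered: tuple, package: bytes,
                        tag: str, key: bytes) -> bool:
    """Service side: accept only a newer version whose tag verifies."""
    if offered <= current:          # versions compared as tuples, e.g. (1, 3)
        return False
    expected = tag_distribution(package, key)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


key = b"hypothetical-shared-key"
pkg = b"hypothetical service update"
tag = tag_distribution(pkg, key)
print(should_apply_update((1, 2), (1, 3), pkg, tag, key))          # True
print(should_apply_update((1, 2), (1, 3), pkg + b"!", tag, key))   # False
print(should_apply_update((1, 3), (1, 3), pkg, tag, key))          # False
```

A production system would verify a public-key signature (for example, signed JAR files) so that services never hold the signing key.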

SLIDE 9

Pseudo-Clients & Dedicated Repositories

[Figure: a Pseudo Client discovers MonALISA Services (each with a MySQL or IDB store) through the Lookup Services and receives data from their Filter Agents; it keeps the data in a dedicated MySQL repository and publishes it through TOMCAT JSP/servlets as WEB and WAP pages.]

SLIDE 10

Data Collection and Interfacing with Other Tools

MonALISA is interfaced with many monitoring tools and is capable of collecting information from different applications:

Computing Nodes / Farms (system information, network traffic…)
  • SNMP, Ganglia, dedicated scripts

Routers, Switches
  • SNMP, MRTG, WS, and very soon NetFlow

End-to-End Network performance
  • IPERF, Pipes, Abing, ABping …

Batch Queuing Systems
  • LSF, PBS, Condor, NQS, Grid Job Manager

Applications
  • ORCA, GridFTP, TMDB, Proof, VRVS, Apache, RRD …

SLIDE 11

MonALISA DEMO

SLIDE 12

Monitoring And Controlling Optical Switches

[Figure: MonALISA agents attached to each Optical Switch control the switches and set up Optical Paths; distributed agent proxies provide agent-to-agent communication; external applications are monitored in real time, and the resulting connections are shown on a Connection Map.]

SLIDE 13

Communities using MonALISA

  • Grid3: ~20 sites in the US and 1 in Korea
  • CMS-US sites
  • CMS-DC04: we collected ~50 million monitoring records from the 6 T1 centers
  • CDF, D0, STAR, ALICE, SAR
  • ABILENE backbone, GLORIAD, RoEduNET backbone, INTERNET2 PIPES
  • VRVS System

It has been used for demonstrations at SC2003, Telecom 2003 and WSIS 2003.

SLIDE 14

SUMMARY

  • MonALISA is able to dynamically discover all the "Service Units" used by a community and, through the remote event notification mechanism, keeps an up-to-date state for the entire system.
  • Automatic & secure code update (services, embedded applications and clients).
  • Dynamic configuration for services. Secure Admin interface.
  • Access to aggregate farm values and all the details for each node.
  • Selected real-time / historical data for any subscribed listeners.
  • Active filter agents to process the data and provide dedicated / customized information to other services or clients.
  • Mobile Agents for decision support and global optimization.
  • Dynamic proxies and WSDL & WAP pages for services.
  • Dedicated pseudo-clients for repository, WAP access or decision-making units.
  • It proved to be a stable and reliable distributed service system.

It is currently running at ~150 sites.

http://monalisa.caltech.edu

SLIDE 15

Global Client for HEP Grid Sites: SC03 Grid3 setup

[Screenshot: DataTAG @ CALTECH]

SLIDE 16

Grid03: Monitoring Farms, Jobs, Network traffic

SLIDE 17

Real-time Data for Large Systems

“lxshare” cluster at CERN, ~600 nodes

SLIDE 18

Monitoring CMS applications: Summary Plots for Probability Density for Several Parameters

SLIDE 19

Historical Plots for large events generated by the CMS-ORCA production framework

SLIDE 20

Mobile Agents and Filters

  • Simple “Global Load” filter agent
  • Maximum Flow Data Replication Path Agent: deployed to each RC (regional center), it evaluates the best path for real-time data replication

[Figure panels: From FNAL to all; From CERN to all]
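The slide does not say which metric the replication path agent optimizes. As an assumption, this sketch treats the "best path" from a source such as FNAL to every other center as the widest path: the route whose narrowest link has the largest available bandwidth. This is the maximum-bottleneck path computed with a modified Dijkstra, not a full max-flow algorithm, and the link bandwidths in the example are made up.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbor, bandwidth)]


def widest_paths(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Modified Dijkstra: for every reachable node, find the path from
    `source` whose narrowest link has the largest bandwidth.

    Returns (bottleneck bandwidth per node, predecessor per node)."""
    best: Dict[str, float] = {source: float("inf")}
    parent: Dict[str, str] = {}
    heap = [(-float("inf"), source)]        # max-heap via negated bandwidth
    done = set()
    while heap:
        neg_bw, node = heapq.heappop(heap)
        if node in done:
            continue                        # stale heap entry
        done.add(node)
        for nbr, cap in graph.get(node, []):
            bw = min(-neg_bw, cap)          # bottleneck of the extended path
            if bw > best.get(nbr, 0.0):
                best[nbr] = bw
                parent[nbr] = node
                heapq.heappush(heap, (-bw, nbr))
    return best, parent


# Hypothetical link bandwidths in Gb/s (undirected: both directions listed).
links: Graph = {
    "FNAL": [("CERN", 2.5), ("Caltech", 10.0)],
    "Caltech": [("FNAL", 10.0), ("CERN", 5.0)],
    "CERN": [("FNAL", 2.5), ("Caltech", 5.0)],
}
bw, parent = widest_paths(links, "FNAL")
print(bw["CERN"], parent["CERN"])  # 5.0 Caltech
```

Following `parent` back from each node gives the replication tree rooted at the source; here data for CERN is routed via Caltech because the detour's narrowest link (5 Gb/s) beats the direct 2.5 Gb/s link.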

SLIDE 21

Monitoring ABILENE backbone Network

  • Test for a Land Speed Record
  • ~7 Gb/s in a single TCP stream from Geneva to Caltech

SLIDE 22

MonALISA repositories

SLIDE 23

The VRVS Architecture http://www.vrvs.org

[Figure: map of VRVS reflectors, labelled vrvs us, vrvs eu, pub, funet, starlight, sinica, kek, cornell, triumf, caltech, usf, usp, inet 2 and vrvs 5.]

Reflectors are hosts that interconnect users through permanent IP tunnels. The active IP tunnels must be selected so that no cycle is formed, i.e. they must form a tree.

The selection is made according to the assumed performance of the network links.

SLIDE 24

Minimum Spanning Tree Algorithms

Finding a tree T that contains all the vertices of a graph G (a spanning tree) and has the least total weight over all such trees (a minimum spanning tree, MST).

Input: a weighted connected graph G = (V, E) with n vertices and m edges
Output: a minimum spanning tree T, where the weight of a tree is the sum of its edge weights:

$$w(T) = \sum_{(u,v) \in T} w((u,v))$$
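One standard way to compute the MST is Kruskal's algorithm: scan the links in order of increasing weight and keep a link only if it joins two parts of the graph that are not yet connected. The union-find structure that performs this check is also exactly the "no cycle" rule for the active VRVS tunnels. The slides do not say which MST algorithm is used; the reflector names below come from the VRVS map, but the link weights (think of them as hypothetical measured costs) are made up.

```python
from typing import List, Tuple

Edge = Tuple[float, str, str]  # (weight, u, v)


def kruskal_mst(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm with union-find: take edges by increasing weight
    and keep an edge only if it joins two different components, so no cycle
    can ever form."""
    parent = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree: List[Edge] = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # different components: safe to join
            parent[ru] = rv
            tree.append((w, u, v))
            if len(tree) == len(nodes) - 1:
                break                      # spanning tree complete
    return tree


nodes = ["caltech", "starlight", "sinica", "kek"]
edges = [
    (50.0, "caltech", "starlight"),
    (150.0, "caltech", "sinica"),
    (140.0, "starlight", "sinica"),
    (40.0, "sinica", "kek"),
    (160.0, "starlight", "kek"),
]
tree = kruskal_mst(nodes, edges)
print(sum(w for w, _, _ in tree))  # 230.0
```

After sorting, each edge costs a near-constant union-find check, so the O(m log m) sort dominates the running time.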

SLIDE 25

Monitoring VRVS Reflectors

SLIDE 26

Monitoring VRVS Reflectors: Dynamic MST
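The slide does not explain how the Dynamic MST is kept current as link measurements change. One standard incremental rule, shown here as an assumption, handles a link that appears or becomes cheaper: add it to the tree, find the cycle it closes, and drop the heaviest edge on that cycle if that edge is heavier than the new one (the MST cycle property). A tree link that becomes more expensive or fails needs a different step, such as recomputing, which this sketch does not cover. The tree and weights are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[float, str, str]  # (weight, u, v)


def _tree_path(tree: List[Edge], src: str, dst: str) -> Optional[List[Edge]]:
    """Return the edges on the unique tree path src -> dst, or None."""
    adj: Dict[str, List[Edge]] = defaultdict(list)
    for e in tree:
        _, a, b = e
        adj[a].append(e)
        adj[b].append(e)
    stack = [(src, None, [])]
    while stack:
        node, prev, path = stack.pop()
        if node == dst:
            return path
        for e in adj[node]:
            _, a, b = e
            nxt = b if a == node else a
            if nxt != prev:                # a tree has no cycles, so this suffices
                stack.append((nxt, node, path + [e]))
    return None


def insert_link(tree: List[Edge], new: Edge) -> List[Edge]:
    """Update an MST when a new (or cheaper) link appears."""
    w, u, v = new
    if u == v:
        return tree
    path = _tree_path(tree, u, v)
    if path is None:                       # u and v not connected yet
        return tree + [new]
    heaviest = max(path)                   # tuples compare by weight first
    if heaviest[0] <= w:                   # new link does not improve the tree
        return tree
    updated = list(tree)
    updated.remove(heaviest)               # break the cycle at its heaviest edge
    updated.append(new)
    return updated


tree = [(40.0, "sinica", "kek"), (50.0, "caltech", "starlight"),
        (140.0, "starlight", "sinica")]
updated = insert_link(tree, (100.0, "caltech", "sinica"))
print(sum(w for w, _, _ in updated))  # 190.0
```

Each update only walks one tree path, which is much cheaper than recomputing the whole tree every time a measurement improves.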

SLIDE 27

Network Topology Service

[Figure: the IP Information service (IPid Service), backed by whois, NetGeo, Ping and Ally, exchanges data with MonALISA services and clients through findIP, findAS, addTrace, getIPids and identIPs calls.]

Example topology with end nodes a, b, c, d, e and intermediate hops 1 to 5:

Trace from “d” to “c”: d {DelayD, ASd, NetD, DescrD}, 1 {Delay1, AS1, Net1, Descr1}, 2 {Delay2, AS2, Net2, Descr2}, c {DelayC, ASc, NetC, DescrC}

Trace from “c” to “d”: c {DelayC, ASc, NetC, DescrC}, 3 {Delay3, AS3, Net3, Descr3}, 4 {Delay4, AS4, Net4, Descr4}, 5 {Delay5, AS5, Net5, Descr5}, d {DelayD, ASd, NetD, DescrD}

MonALISA service
  • Performs traces
  • Gets relevant info for each hop
  • Sends traces to the interested clients
  • Sends new IPs to the IPid Service

MonALISA client
  • Discovers all services
  • Gets trace data
  • Resolves IP aliases
  • Displays selected data
  • Performs algorithms

IP aliases: 1~5, 2~3
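Alias resolution matters when the two directions of a trace are merged: forward and reverse paths often report different interface IPs of the same router. Reading `1~5, 2~3` as "hops 1 and 5 are the same router, as are 2 and 3", the sketch below maps each alias to a canonical name before turning hop lists into links. The alias table is given as input here (on the slide it would come from the IPid service); the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_topology(traces: Iterable[List[str]],
                   aliases: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Merge hop lists into a set of undirected links, mapping each IP alias
    to a canonical router name first so both directions share nodes."""
    def canon(hop: str) -> str:
        return aliases.get(hop, hop)

    links: Set[Tuple[str, str]] = set()
    for trace in traces:
        hops = [canon(h) for h in trace]
        for a, b in zip(hops, hops[1:]):
            if a != b:                          # skip self-links after merging
                links.add((min(a, b), max(a, b)))
    return links


traces = [["d", "1", "2", "c"],                 # trace from d to c
          ["c", "3", "4", "5", "d"]]            # trace from c to d
aliases = {"5": "1", "3": "2"}                  # 1~5, 2~3
print(len(build_topology(traces, {})))          # 7
print(len(build_topology(traces, aliases)))     # 5
```

Without alias resolution the two traces look like seven independent links; after merging, the forward and reverse paths share routers 1 and 2, and only five distinct links remain.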

SLIDE 28

Monitoring the WAN Topology, the Latency and Routes

[Figure: topology views at the ROUTERS, NETWORKS and AS levels]