An Intelligent Rule-Oriented Data Management System, Wayne Schroeder



SLIDE 1

SAN DIEGO SUPERCOMPUTER CENTER

An Intelligent Rule-Oriented Data Management System

DataGrid

Wayne Schroeder San Diego Supercomputer Center, University of California San Diego

SLIDE 2

Talk Outline

  • Background
  • Brief Overview of the SDSC SRB
  • Current Projects/Usage
  • Activities/Plans
  • Rule-Oriented Data Management System
  • iRODS Requirements/Planning
  • Architecture
  • Infrastructure Development
  • Collaborations/Plans
SLIDE 3

Data Grid

Using a Data Grid – in Abstract

Ask for data

  • User asks for data from the data grid

Data delivered

  • The data is found and returned
  • Where & how details are hidden
SLIDE 4

Using a Data Grid - Details

[Diagram: Storage Resource Broker (SRB) server; Storage Resource Broker Metadata Catalog; DB]

  • User asks for data
  • Data request goes to SRB Server
  • Server looks up data in catalog
  • Catalog tells which SRB server has data
  • 1st server asks 2nd for data
  • The data is found and returned
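
The request flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SRB code: the `Catalog` and `Server` classes, their method names, and the example path are hypothetical stand-ins for the metadata catalog and the SRB servers.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for the metadata catalog: logical name -> server name."""
    locations: dict = field(default_factory=dict)

    def locate(self, logical_name):
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical stand-in for an SRB server that may hold some data locally."""
    name: str
    catalog: Catalog
    peers: dict = field(default_factory=dict)    # server name -> Server
    storage: dict = field(default_factory=dict)  # logical name -> bytes

    def read_local(self, logical_name):
        return self.storage[logical_name]

    def get(self, logical_name):
        # The server looks up the data in the catalog.
        owner = self.catalog.locate(logical_name)
        # If this server holds the data, it answers directly.
        if owner == self.name:
            return self.read_local(logical_name)
        # Otherwise the first server asks the second server for the data.
        return self.peers[owner].read_local(logical_name)


catalog = Catalog({"/home/demo/file.dat": "server_b"})
server_a = Server("server_a", catalog)
server_b = Server("server_b", catalog, storage={"/home/demo/file.dat": b"payload"})
server_a.peers["server_b"] = server_b

# The user only talks to server_a and never sees where the data lives.
data = server_a.get("/home/demo/file.dat")
```

The caller passes only a logical name; which server holds the bytes is resolved behind the `get` call, which is the sense in which the where and how details are hidden.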
SLIDE 5

Using a Data Grid - Details

[Diagram: SRB MCAT and its DB, with five SRB servers]

  • Data Grid has arbitrary number of servers
  • Complexity is hidden from users
SLIDE 6

Storage Resource Broker: A Data Grid Solution

  • Collaborative client-server system that federates distributed heterogeneous resources using uniform interfaces and metadata
  • Provides a simple tool to integrate data and metadata handling – attribute-based access
  • Blends browsing and searching
  • Developed at SDSC
  • Operational for 7+ years
  • Under continual development since 1997
  • Customer-driven
SLIDE 7

Some SRB Features

The SRB is an integrated solution which includes:

  • a logical namespace,
  • interfaces to a wide variety of storage systems,
  • high performance data movement (including parallel I/O),
  • fault-tolerance and fail-over,
  • WAN-aware performance enhancements (bulk operations),
  • storage-system-aware performance enhancements ('containers' to aggregate files),
  • metadata ingestion and queries (a MetaData Catalog (MCAT)),
  • user accounts, groups, access control, audit trails, GUI administration tool
  • data management features, replication
  • user tools (including a Windows GUI tool (inQ), a set of SRB Unix commands, and Web (mySRB)), and APIs (including C, C++, Java, and Python).

SRB Scales Well (many millions of files, terabytes)

Supports Multiple Administrative Domains / MCATs (srbZones)

And includes SDSC Matrix: SRB-based data grid workflow management system to create, access and manage workflow process pipelines.

SLIDE 8

Recent SRB Release: 3.4.1, April 28, 2006

  • Any valid ASCII characters are now acceptable in SRB filenames, except a string of two quotes in a row

  • Data integrity and vault management
  • Quota System
  • SRB Web Perl Portal
  • SRB account management via grid-mapfile
  • Real time data management
  • New driver for NCAR MSS
  • Completely reworked web site/documentation system (MediaWiki)
  • Other new features
  • Critical bug patches for 3.4.0 included
  • Other bugzilla fixes (about 35)
  • MCAT Patch
SLIDE 9

Recent SRB Releases

  • 3.4.1 April 28, 2006
  • 3.4 October 31, 2005
  • 3.3.1 April 6, 2005
  • 3.3 February 18, 2005
  • 3.2.1 August 13, 2004
  • 3.2 July 2, 2004
  • 3.1 April 19, 2004
  • 3.0.1 December 19, 2003
  • 3.0 October 1, 2003
  • 2.1.2 August 12, 2003
  • 2.1.1 July 14, 2003
  • 2.1 June 3, 2003
  • 2.0.2 May 1, 2003
  • 2.0.1 March 14, 2003
  • 2.0 February 18, 2003
SLIDE 10

SRB Projects

  • Astronomy
      • National Virtual Observatory
  • Data Grids
      • UK e-Science CCLRC
      • Teragrid
  • Digital Libraries and Archives
      • National Archives and Records Administration
      • National Science Digital Library
      • Persistent Archive Testbed
  • Ecological, Environmental, Oceanographic
      • ROADnet
      • Southern California Earthquake Center
      • SIO Digital Libraries
  • Molecular Sciences
      • Synchrotron Data Repository
      • Alliance for Cellular Signaling
  • Neuro Sciences
      • Biomedical Information Research Network
  • Physics and Chemistry
      • BaBar
  • Many others

Over 650 Terabytes in 106 million files

SLIDE 11

SRB Scalability

  • Over 2 Petabytes World-wide
  • Major SRB instances in the UK, Australia, Taiwan, US
  • United Kingdom - UK e-Science
  • Australia - APAC
  • Taiwan - Academia Sinica, NCHC
  • Europe - IN2P3, Italy, Norway
  • United States
  • 660 Terabytes at SDSC
  • 100 Million files
  • SAM QFS, HPSS, Unix file system, SRB Bricks
SLIDE 12

SDSC Hosted SRB Data

SLIDE 13

Case Study: SRB in BIRN

[Diagram: BIRN architecture. Labels: BIRN Toolkit; Mediator; Viewing/Visualization; Queries/Results; Applications; Data Management; File System; MCAT; HPSS; Data Model; Data Access; Data Grid; Computational Grid; Collaboration; NMI; Grid Management; Globus; GridPort; Scheduler; Distributed Resources; Database; SRB]

SLIDE 14

Federated SRB Operation

[Diagram: a read application in Boston, two SRB servers with SRB agents, the MCAT, and R1 and R2 at San Diego and Durham; arrows numbered 1 to 6 (5/6)]

Logical Name Or Attribute Condition

  • 1. Logical-to-Physical mapping
  • 2. Identification of Replicas
  • 3. Access & Audit Control

Other labels: Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access
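
The three catalog steps named on this slide (logical-to-physical mapping, identification of replicas, access and audit control) can be sketched as one lookup function. Everything below is a hypothetical illustration: the class names, the `preferred_site` parameter, the paths, and the replica selection policy are assumptions, not the MCAT schema or the SRB API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str           # e.g. "San Diego" or "Durham"
    physical_path: str  # location on that site's storage resource (made up)


@dataclass
class CatalogEntry:
    replicas: list  # Replica objects for one logical name
    readers: set    # users allowed to read it


@dataclass
class MetadataCatalog:
    entries: dict = field(default_factory=dict)    # logical name -> CatalogEntry
    audit_log: list = field(default_factory=list)  # (user, logical name, allowed)

    def resolve(self, user, logical_name, preferred_site=None):
        # 1. Logical-to-physical mapping: find the entry for the logical name.
        entry = self.entries[logical_name]
        # 2. Identification of replicas: every physical copy is a candidate.
        replicas = list(entry.replicas)
        # 3. Access and audit control: record the attempt, then enforce it.
        allowed = user in entry.readers
        self.audit_log.append((user, logical_name, allowed))
        if not allowed:
            raise PermissionError(f"{user} may not read {logical_name}")
        # Assumed policy: prefer a replica at the requested site, else the first.
        for replica in replicas:
            if replica.site == preferred_site:
                return replica
        return replicas[0]


LOGICAL_NAME = "/zone/project/scan.img"
mcat = MetadataCatalog()
mcat.entries[LOGICAL_NAME] = CatalogEntry(
    replicas=[Replica("San Diego", "/vault1/a1"), Replica("Durham", "/vault7/b2")],
    readers={"alice"},
)
chosen = mcat.resolve("alice", LOGICAL_NAME, preferred_site="Durham")
```

Denied requests are still written to the audit log before the error is raised, so the log reflects attempts as well as successful reads.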

SLIDE 15

SDSC Storage Resource Broker & Meta-data Catalog

[Diagram: SRB at the center, connected to the following groups]

Archives: HPSS, ADSM, UniTree, DMF

Databases: DB2, Oracle, Sybase

File Systems: Unix, NT, Mac OSX

Application: C, C++, Linux I/O, Unix Shell, Java, NT Browsers, Web, Prolog Predicate

MCAT: Dublin Core, Resource, User, User Defined, Application Meta-data

Other labels: Remote Proxies; DataCutter; Third-party copy; HRM

SLIDE 16

iRODS - the Next Generation of Data Grid Technology
SLIDE 17

Moving Forward, a Two-Prong Plan

Maintain and Adapt SRB to New Usages: SRB has reached a Stable Plateau

  • Bug Fixes
  • Some New Features
  • Merge Features Developed by others
  • Continue Testing
  • Improve Documentation
  • Continue Application Support
  • Existing and new Projects
  • Continue Answering User Queries

Chart New Areas

  • Federation Research - ZoneSRB
  • Collaborative Data Grids
  • Real-time Data Grids
      • Virtual Object Ring Buffer
      • Sensors and Video Streams
      • Collaborating Observatories
  • SRB Workflows - New UI for Admins and users
      • Kepler actors, Matrix, etc
  • iRODS - Adaptive Middleware Architecture

[Diagram: federated zones. MCAT1 with Server1.1 and Server1.2; MCAT2 with Server2.1 and Server2.2; MCAT3 with Server3.1]

SLIDE 18

Continuing SRB Support

  • 10 FTEs SRB
  • 5 FTEs iRODS
  • iRODS Developers Support SRB
SLIDE 19

Next generation Data Architecture

  • SRB is quite complex – with too many functions and operations
  • The intelligence is hard-coded
  • extensions/modifications require extreme care
  • But, the modules are fairly robust and reusable
  • AIM: Can we make SRB more flexible?
  • Easy to customize at a finer level
  • Example: Higher authentication for a particular collection
  • Example: Stricter authorization for a collection
  • Example: Treating a particular resource differently
  • Currently needs code changes
  • Solution: Use rule-based architecture to provide flexibility
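
To make the contrast concrete, here is a minimal sketch of per-collection authentication expressed as data instead of code. It is an assumption-laden illustration, not how SRB or iRODS stores policy: the rule table, the collection paths, and the authentication level names are all hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical policy table: (collection pattern, required authentication level).
# The first matching rule wins, so specific collections go before the catch-all.
AUTH_RULES = [
    ("/zone/medical/*", "certificate"),
    ("/zone/public/*", "none"),
    ("*", "password"),
]


def hard_coded_required_auth(collection_path):
    """The hard-coded style: one answer for every collection.

    Changing it for a single collection means editing and redeploying code.
    """
    return "password"


def required_auth(collection_path, rules=AUTH_RULES):
    """The rule-based style: the answer comes from an editable rule table."""
    for pattern, level in rules:
        if fnmatchcase(collection_path, pattern):
            return level
    raise LookupError(f"no rule matches {collection_path}")


strict = required_auth("/zone/medical/scans/a.img")
default = required_auth("/zone/other/readme.txt")
```

Requiring stronger authentication for one more collection then means adding a row to the table, while the function itself stays unchanged.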
SLIDE 20

iRODS

  • A New Paradigm in Middleware Development

  • Flexible Collection management
  • Can be customized at user/collection-levels, …
  • Language for Collection management
  • As in stored procedures, triggers (RDB)
  • Administrative ease
  • A lot of potential beyond SRB
  • adaptive middleware architectures
  • This will be a fully Open Source effort
SLIDE 21

Rule-Oriented Data Systems Framework

[Diagram: framework components. Labels: Client Interface; Admin Interface; Resources; Metadata Modifier Module; Config Modifier Module; Rule Modifier Module; Consistency Check Module (three instances); Confs; Rule Base; Meta Data Base; Rule Engine; Current State; Rule Invoker; Service Manager; Micro Service Modules (Resource-based Services); Micro Service Modules (Metadata-based Services)]

SLIDE 22

Rule-oriented Data System (Phase I Operational Model)

[Diagram: a client-side operation such as srbObjCreate is handled server-side in four stages: Rule Checking, Establish State, Data Movement, CleanUp. Annotations:]

  • Condition checking, rule firing
  • Backend processing, micro-services
  • Set up state and interact with RCAT – updates and modifications to persistent state
  • Clean up state and interact with RCAT – updates and modifications to persistent state
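
One possible reading of the four stages is a small pipeline, sketched below. This is an assumption, not the iRODS server code: a plain dict stands in for the RCAT, the rule format and the `store` micro-service are made up, and only the operation name `srbObjCreate` comes from the slide.

```python
def store(request, catalog):
    """Hypothetical micro-service: record the object's size in the catalog."""
    entry = catalog[request["name"]]
    entry["scratch"] = "temporary buffer"  # transient state used while moving data
    entry["size"] = len(request["data"])


def run_operation(operation, request, rules, catalog):
    """Run one client operation through the four stages and return a stage trace."""
    trace = []

    # 1. Rule checking: evaluate conditions and pick the rule that fires.
    fired = [r for r in rules if r["op"] == operation and r["condition"](request)]
    trace.append("rule_checking")
    if not fired:
        raise LookupError(f"no rule fires for {operation}")
    rule = fired[0]

    # 2. Establish state: create the persistent record before any data moves.
    catalog[request["name"]] = {"status": "in_progress"}
    trace.append("establish_state")

    try:
        # 3. Data movement: the rule's micro-services do the backend processing.
        for micro_service in rule["micro_services"]:
            micro_service(request, catalog)
        trace.append("data_movement")
        catalog[request["name"]]["status"] = "complete"
    finally:
        # 4. Cleanup: drop transient state whether or not the movement succeeded.
        catalog[request["name"]].pop("scratch", None)
        trace.append("cleanup")
    return trace


rules = [{"op": "srbObjCreate", "condition": lambda req: True, "micro_services": [store]}]
rcat = {}
trace = run_operation("srbObjCreate", {"name": "obj1", "data": b"hello"}, rules, rcat)
```

Putting cleanup in a `finally` block is a design choice of this sketch: it keeps the catalog free of transient state even when a micro-service raises.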

SLIDE 23

Rules and Constraints

  • Rule-based
  • Lower-level Functions are composed of micro-services
  • Higher-level Functions are composed of rules of lower-level micro-services
  • Rules are interpreted using a rule engine
  • Customizability
  • Problems with rule composition
  • Integrity checks to make sure rules do not break higher-level functionalities
  • Declarative programming
  • Rules define semantics
  • Operational programming
  • Rule invocation provides procedural interpretation
  • Rules can be used as “checks and balances” to make sure that collections are self-consistent
  • Example: Rule makes two copies of each file
  • Constraint checking: can be used to see if the collection is consistent with this rule
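
The two-copies example can be read both ways: as a procedure applied on ingest and as a constraint checked afterwards. The sketch below shows both under simple assumptions; the collection is modeled as a dict from file name to a list of resource names, and all names are hypothetical.

```python
def ingest_with_two_copies(collection, name, primary, backup):
    """Procedural reading of the rule: every ingested file gets two copies."""
    collection[name] = [primary, backup]


def files_violating_rule(collection, required=2):
    """Declarative reading: list files with fewer than `required` distinct copies."""
    return sorted(
        name
        for name, resources in collection.items()
        if len(set(resources)) < required
    )


collection = {}
ingest_with_two_copies(collection, "a.dat", "disk1", "tape1")
# A file registered before the rule existed has only one copy.
collection["legacy.dat"] = ["disk1"]

violations = files_violating_rule(collection)
```

Here `violations` holds only `legacy.dat`, the file that bypassed the rule; a repair step could replicate each listed file and rerun the check.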

SLIDE 24

Rule Scalability and Decidability

Distinct Sets of Rules Applied in Different Ways

  • Atomic
  • Deferred (state flags)
  • Compound
  • Applied Using Micro-services

Granularity

  • User Input to Influence Rule Expression
  • Administration Enforcement
  • Collection Consistency Management

Rule Properties

  • Metadata Managing Execution (granularity, periodicity)
  • Metadata Defining Result of Rule Execution
SLIDE 25

Sample Rules

ingestInCollection(S) :- /* store & backup */
    chkCond1(S), ingest(S), register(S),
    findBackUpRsrc(S.Coll, R), replicate(S,R).
ingestInCollection(S) :- /* store & check */
    chkCond2(S), computeClntChkSum(S,C1), ingest(S), register(S),
    computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S).
ingestInCollection(S) :- /* store, chk, backup & chk */
    chkCond3(S), computeClntChkSum(S,C1), ingest(S), register(S),
    computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S),
    findBackUpRsrc(S.Coll, R), replicate(S,R),
    computeSerChkSum(S,C3), checkAndRegisterChkSum(C2,C3,S).
ingestInCollection(S) :- /* store, check, backup & extract metadata */
    chkCond4(S), computeClntChkSum(S,C1), ingest(S), register(S),
    computeSerChkSum(S,C2), checkAndRegisterChkSum(C1,C2,S),
    findBackUpRsrc(S.Coll, R),
    [replicate(S,R) || extractRegisterMetadata(S)].
ingestInCollection(S) :- /* just store */
    ingest(S), register(S).

chkCond1(S) :- user(S) == ‘adil@cclrc’.
chkCond1(S) :- coll(S) like ‘*/scec.sdsc/img/*’.
chkCond2(S) :- user(S) == ‘*@nara’.
chkCond3(S) :- user(S) == ‘@salk’.
chkCond4(S) :- user(S) == ‘@birn’, datatype(S) == ‘DICOM’.

  • [OprList] implies delay for later or send to a CronJobManager
  • Opr||Opr implies do them in parallel
  • Opr, Opr implies do them serially
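
The rule selection on this slide can be approximated in Python as "try the rules in order and take the first whose condition holds". The sketch below covers only three of the five rules and makes several assumptions: the request is a dict, `‘@birn’` is read as a user-name suffix, the plan is returned as data instead of being executed, and the backtracking a Prolog-style engine would perform when a later step fails is not modeled.

```python
from fnmatch import fnmatchcase


def chk_cond1(s):
    # Two chkCond1 clauses on the slide: either one is enough.
    return s["user"] == "adil@cclrc" or fnmatchcase(s["coll"], "*/scec.sdsc/img/*")


def chk_cond4(s):
    # Assumption: '@birn' is treated as a suffix of the user name.
    return s["user"].endswith("@birn") and s["datatype"] == "DICOM"


# Each rule: (label, condition, plan). Plain strings run serially; the tuple
# ("delayed", ("parallel", [...])) mirrors [Opr || Opr] on the slide.
RULES = [
    ("store & backup", chk_cond1,
     ["ingest", "register", "findBackUpRsrc", "replicate"]),
    ("store, check, backup & extract metadata", chk_cond4,
     ["computeClntChkSum", "ingest", "register", "computeSerChkSum",
      "checkAndRegisterChkSum", "findBackUpRsrc",
      ("delayed", ("parallel", ["replicate", "extractRegisterMetadata"]))]),
    ("just store", lambda s: True, ["ingest", "register"]),
]


def select_rule(s):
    """Return (label, plan) for the first rule whose condition holds."""
    for label, condition, plan in RULES:
        if condition(s):
            return label, plan
    raise LookupError("no rule matched")  # unreachable while the catch-all exists


label, plan = select_rule(
    {"user": "lee@birn", "coll": "/home/birn/mri", "datatype": "DICOM"}
)
```

Because the catch-all "just store" rule comes last, every request gets some plan, and rule order decides which policy wins when several conditions hold.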

SLIDE 26

New DataGrid Technology

  • Next Generation SRB -- iRODS: Intelligent Rule-Oriented Data Systems
  • Customizable and Flexible – User Configurable
  • Administratively Simpler – Admin Configurable
  • Build upon the experience of SRB Data Grid
  • Transition from SRB to iRODS
  • Client-level similarity
  • Meta Catalog transition
  • Current NSF Funding
  • Information Technology Research
  • 2 years
  • ~ 2 FTEs
  • Simple prototype in a year
  • Started September 2004
  • Rule-based architecture
  • Follow-on funding
  • NARA
  • NSF
SLIDE 27

iRODS Collaborations

  • SRB/iRODS Developers
  • Arcot Rajasekar
  • Michael Wan
  • Wayne Schroeder
  • Other SRB Team Members
  • Collaborative Development
  • UK e-Science
  • University of Queensland
  • University of Maryland
  • Others