 
An Intelligent Rule-Oriented Data Management System
Wayne Schroeder
San Diego Supercomputer Center, University of California San Diego
DataGrid
SAN DIEGO SUPERCOMPUTER CENTER

Talk Outline
• Background
• Brief Overview of the SDSC SRB
• Current Projects/Usage
• Activities/Plans
• Rule-Oriented Data Management System
• iRODS Requirements/Planning
• Architecture
• Infrastructure Development
• Collaborations/Plans

Using a Data Grid – in Abstract
[Figure: a user and the data grid. Labels: Ask for data, Data delivered]
• User asks for data from the data grid
• The data is found and returned
• Where & how details are hidden

Using a Data Grid - Details
[Figure: two Storage Resource Broker servers, the Metadata Catalog, and its DB]
• User asks for data
• Data request goes to SRB Server
• Server looks up data in catalog
• Catalog tells which SRB server has data
• 1st server asks 2nd for data
• The data is found and returned

Using a Data Grid - Details
[Figure: six SRB servers, the MCAT, and its DB]
• Data Grid has arbitrary number of servers
• Complexity is hidden from users

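
The request flow on the two "Details" slides can be sketched roughly as below. This is a toy in-memory model, not SRB code: the `Catalog` and `Server` classes, their methods, and the server and file names are all hypothetical, chosen only to show a catalog lookup followed by one server asking another for the data.

```python
class Catalog:
    """Hypothetical metadata catalog: logical name -> name of the server holding the data."""

    def __init__(self):
        self._owner = {}

    def register(self, logical_name, server_name):
        self._owner[logical_name] = server_name

    def locate(self, logical_name):
        return self._owner[logical_name]  # raises KeyError for unknown names


class Server:
    """Hypothetical broker server with its own local storage."""

    def __init__(self, name, catalog, grid):
        self.name = name
        self.catalog = catalog
        self.grid = grid      # shared dict: server name -> Server
        self.storage = {}     # logical name -> bytes held locally
        grid[name] = self

    def put(self, logical_name, data):
        self.storage[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def get(self, logical_name):
        # The server looks the data up in the catalog ...
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            return self.storage[logical_name]
        # ... and, if another server holds it, asks that server on the user's behalf.
        return self.grid[owner].storage[logical_name]


catalog, grid = Catalog(), {}
san_diego = Server("san_diego", catalog, grid)
durham = Server("durham", catalog, grid)
durham.put("/home/demo/file.dat", b"payload")

# The user only talks to one server; where the data lives stays hidden.
data = san_diego.get("/home/demo/file.dat")
```

Adding more servers only means adding entries to `grid`; the caller's `get` call does not change, which is the sense in which the complexity stays hidden from users.
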
Storage Resource Broker
A Data Grid Solution
• Collaborative client-server system that federates distributed heterogeneous resources using uniform interfaces and metadata
• Provides a simple tool to integrate data and metadata handling – attribute-based access
• Blends browsing and searching
• Developed at SDSC
  - Operational for 7+ years;
  - Under continual development since 1997;
  - Customer-driven

Some SRB Features
The SRB is an integrated solution which includes:
• a logical namespace,
• interfaces to a wide variety of storage systems,
• high performance data movement (including parallel I/O),
• fault-tolerance and fail-over,
• WAN-aware performance enhancements (bulk operations),
• storage-system-aware performance enhancements ('containers' to aggregate files),
• metadata ingestion and queries (a MetaData Catalog (MCAT)),
• user accounts, groups, access control, audit trails, GUI administration tool
• data management features, replication
• user tools (including a Windows GUI tool (inQ), a set of SRB Unix commands, and Web (mySRB)), and APIs (including C, C++, Java, and Python).
SRB Scales Well (many millions of files, terabytes)
Supports Multiple Administrative Domains / MCATs (srbZones)
And includes SDSC Matrix: SRB-based data grid workflow management system to create, access and manage workflow process pipelines.

Recent SRB Release, April 28
• Any valid ASCII characters are now acceptable in SRB filenames, except a string of two quotes in a row
• Data integrity and vault management
• Quota System
• SRB Web Perl Portal
• SRB account management via grid-mapfile
• Real time data management
• New driver for NCAR MSS
• Completely reworked web site/documentation system (MediaWiki)
• Other new features
• Critical bug patches for 3.4.0 included
• Other bugzilla fixes (about 35)
• MCAT Patch

Recent SRB Releases
• 3.4.1 April 28, 2006
• 3.4 October 31, 2005
• 3.3.1 April 6, 2005
• 3.3 February 18, 2005
• 3.2.1 August 13, 2004
• 3.2 July 2, 2004
• 3.1 April 19, 2004
• 3.0.1 December 19, 2003
• 3.0 October 1, 2003
• 2.1.2 August 12, 2003
• 2.1.1 July 14, 2003
• 2.1 June 3, 2003
• 2.0.2 May 1, 2003
• 2.0.1 March 14, 2003
• 2.0 February 18, 2003

SRB Projects
• Astronomy
  - National Virtual Observatory
• Data Grids
  - UK e-Science CCLRC
  - Teragrid
• Digital Libraries and Archives
  - National Archives and Records Administration
  - National Science Digital Library
  - Persistent Archive Testbed
• Ecological, Environmental, Oceanographic
  - ROADnet
  - Southern California Earthquake Center
  - SIO Digital Libraries
• Molecular Sciences
  - Synchrotron Data Repository
  - Alliance for Cellular Signaling
• Neuro Sciences
  - Biomedical Information Research Network
• Physics and Chemistry
  - BaBar
• Many others
Over 650 Terabytes in 106 million files

SRB Scalability
• Over 2 Petabytes World-wide
• Major SRB instances in the UK, Australia, Taiwan, US
• United Kingdom - UK e-Science
• Australia - APAC
• Taiwan - Academia Sinica, NCHC
• Europe - IN2P3, Italy, Norway
• United States
• 660 Terabytes at SDSC
• 100 Million files
• SAM QFS, HPSS, Unix file system, SRB Bricks

SDSC Hosted SRB Data
[Figure or table; content not recovered]

Case Study: SRB in BIRN
[Figure: BIRN Toolkit layer diagram. Labels: Collaboration, Applications, Queries/Results, Viewing/Visualization, Data Management, Grid Management, Computational Grid, Mediator, Data Model, GridPort, Database, Data Grid, Scheduler, Data Access, Globus, SRB, MCAT, NMI, File System, HPSS, Distributed Resources]

Federated SRB Operation
[Figure: peer-to-peer brokering between SRB servers, with arrows numbered 1 to 6. Labels: Read Application in Boston, Logical Name Or Attribute Condition, Parallel Data Access, SRB server, SRB agent, Durham, San Diego, Server(s) Spawning, MCAT, R1, R2, Data Access]
1. Logical-to-Physical mapping
2. Identification of Replicas
3. Access & Audit Control

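
The three catalog-side steps named on this slide can be illustrated with a small sketch. It is an assumption-laden toy, not the MCAT interface: `resolve`, `AccessDenied`, the replica table, the access-control dictionary, and the audit log are hypothetical names and plain Python structures.

```python
class AccessDenied(Exception):
    """Raised when a user has no permission on a logical name (hypothetical)."""


def resolve(logical_name, user, replica_table, acl, audit_log):
    """Map a logical name to its replicas, enforcing access and recording an audit entry."""
    # 1. Logical-to-physical mapping: look up the physical copies of the logical name.
    physical_copies = replica_table[logical_name]

    # 2. Identification of replicas: list them in a deterministic order.
    replicas = sorted(physical_copies)

    # 3. Access and audit control: every attempt is logged, allowed or not.
    allowed = user in acl.get(logical_name, set())
    audit_log.append((user, logical_name, "granted" if allowed else "denied"))
    if not allowed:
        raise AccessDenied(f"{user} may not read {logical_name}")
    return replicas


replica_table = {"/zone/a.dat": [("R2", "/vault2/a.dat"), ("R1", "/vault1/a.dat")]}
acl = {"/zone/a.dat": {"alice"}}
audit_log = []

replicas = resolve("/zone/a.dat", "alice", replica_table, acl, audit_log)
```

A real broker would then pick one replica (or several, for parallel access) and hand the transfer to the server that holds it; that choice is left out here.
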
SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB architecture diagram. Labels: Application; C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Prolog Predicate; Third-party copy; Web; Resource, User, User Defined; SRB; MCAT; Remote Proxies; DataCutter; Dublin Core; Application Meta-data; Databases (DB2, Oracle, Sybase); Archives (HPSS, ADSM, UniTree, DMF); File Systems (Unix, NT, Mac OSX); HRM]

iRODS - the Next Generation of Data Grid Technology

Moving Forward, a Two-Prong Plan
Maintain and Adapt SRB to New Usages: SRB has reached a Stable Plateau
• Bug Fixes
• Some New Features
• Merge Features Developed by others
• Continue Testing
• Improve Documentation
• Continue Application Support
• Existing and new Projects
• Continue Answering User Queries
Chart New Areas
• Federation Research - ZoneSRB
• Collaborative Data Grids
• Real-time Data Grids
• Virtual Object Ring Buffer
• Sensors and Video Streams
• Collaborating Observatories
• SRB Workflows - New UI for Admins and users
• Kepler actors, Matrix, etc
• iRODS - Adaptive Middleware Architecture
[Figure: federated zones. Labels: MCAT1, Server1.1, Server1.2, MCAT2, Server2.1, Server2.2, MCAT3, Server3.1]

Continuing SRB Support
• 10 FTEs SRB
• 5 FTEs iRODS
• iRODS Developers Support SRB

Next generation Data Architecture
• SRB is quite complex – with too many functions and operations
• The intelligence is hard-coded
• Extensions/modifications require extreme care
• But, the modules are fairly robust and reusable
• AIM: Can we make SRB more flexible?
• Easy to customize at finer level
• Example: Higher authentication for a particular collection
• Example: Can we use stricter authorization for a collection
• Example: Can we treat a particular resource differently
• Currently needs code changes
• Solution: Use rule-based architecture to provide flexibility

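
To make the contrast with hard-coded logic concrete, here is a minimal sketch of policy expressed as data rather than code. It does not reflect the iRODS rule language: the `Rule` class, `evaluate`, the request fields, and the collection and resource names are all hypothetical. The point is only that per-collection or per-resource policy becomes an entry in a rule list instead of a change to server code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # when does the rule apply?
    check: Callable[[dict], bool]      # what must hold when it applies?


def evaluate(rules, request):
    """Return the names of all rules that apply to the request and are violated."""
    return [r.name for r in rules if r.condition(request) and not r.check(request)]


# Hypothetical site policy: editing this list changes behavior without code changes.
RULES = [
    Rule(
        "strong-auth-for-secure-collection",
        condition=lambda req: req["collection"].startswith("/demo/secure"),
        check=lambda req: req["auth"] == "certificate",
    ),
    Rule(
        "archive-resource-is-read-only",
        condition=lambda req: req["resource"] == "archive",
        check=lambda req: req["op"] == "read",
    ),
]

request = {"collection": "/demo/secure/run1", "auth": "password",
           "resource": "disk", "op": "read"}
violations = evaluate(RULES, request)
```

An empty `violations` list means the request may proceed; a stricter collection or a specially treated resource is just one more `Rule` entry.
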
iRODS
• A New Paradigm in Middleware Development
• Flexible Collection management
• Can be customized at user/collection-levels, …
• Language for Collection management
• As in stored procedures, triggers (RDB)
• Administrative ease
• Lot of potential beyond SRB
• adaptive middleware architectures
• This will be a fully Open Source effort

Rule-Oriented Data Systems Framework
[Figure: framework block diagram. Labels: Client Interface, Admin Interface, Rule Invoker, Service Manager, Resources, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, Consistency Check Module (three instances), Rule Engine, Rule Base, Current State, Confs, Meta Data Base, Resource-based Services, Metadata-based Services, Micro Service Modules]

Rule-oriented Data System (Phase I Operational Model)
Client Operation such as srbObjCreate
[Figure labels: Client-side, Server-side]
• Rule Checking: Condition checking, rule firing
• Establish State: Setup state and interact with RCAT – updates and modifications to persistent state
• Backend Processing: Data Movement (Micro Services)
• CleanUp: Cleanup state and interact with RCAT – updates and modifications to persistent state

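
The four server-side phases can be sketched as a small pipeline. This is only an illustration under stated assumptions: `run_operation`, the rule table layout, the micro-service names, and the dictionary standing in for the RCAT are hypothetical and are not taken from the iRODS code base.

```python
def run_operation(op_name, args, rules, micro_services, catalog):
    """Run one client operation through the four phases and return a trace of steps."""
    trace = []

    # Phase 1, rule checking: test each condition and collect the micro-services to fire.
    chain = [ms for condition, ms in rules.get(op_name, []) if condition(args)]
    trace.append("rule_checking")

    # Phase 2, establish state: record the pending operation in the catalog stand-in.
    entry = catalog.setdefault(args["path"], {})
    entry["status"] = "in_progress"
    trace.append("establish_state")

    status = "failed"
    try:
        # Phase 3, backend processing: run the selected micro-services in order.
        for ms_name in chain:
            micro_services[ms_name](args)
            trace.append("micro_service:" + ms_name)
        status = "complete"
    finally:
        # Phase 4, cleanup: always write the final state back, even after an error.
        entry["status"] = status
        trace.append("cleanup")
    return trace


stored, replicated = [], []
micro_services = {
    "store": lambda a: stored.append(a["path"]),
    "replicate": lambda a: replicated.append(a["path"]),
}
rules = {
    "obj_create": [
        (lambda a: True, "store"),
        (lambda a: a["size"] > 100, "replicate"),  # only large objects are replicated
    ]
}
catalog = {}
trace = run_operation("obj_create", {"path": "/demo/x.dat", "size": 10},
                      rules, micro_services, catalog)
```

The `try`/`finally` is a design choice of this sketch: it guarantees that the cleanup phase updates persistent state whether or not a micro-service fails.
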