Grid-enabled for Digital Archives: The Development and Applications of SRM-SRB Interface for DataGrid Services
Wei-Long UENG
Academia Sinica Grid Computing, Taiwan
Outline
- Introduction to the National Digital Archive Program Data Grid Services
- Deployment of Data Grid as the Information Infrastructure
- Interoperation
– SRM-SRB interface implementation and application
- Summary
Goals of NDAP
- Preserve national cultural collections
- Popularize fine cultural holdings
- Revitalize cultural heritage and cultural development
- Invigorate cultural, content and value-added industries
- Enhance research, education and learning
- Promote knowledge and information sharing
- Improve literacy, creativity and quality of life
- Embrace international communities and collaboration
Cultural, Academic, Socio-economic & Educational (CASE) Values
Digital Archives in Taiwan
- Three Levels
– Archive Level
- high resolution, for preservation purposes
- accessible on a case-by-case basis
– Open-Market Level
- medium resolution, for value-added, commercial purposes
- accessible for a fee by membership or by licensing agreement
– Public Information Access Level
- low resolution, for educational purposes
- accessible to the public free of charge
Information Infrastructure
- Digital archives/libraries are widely recognized as a crucial component of a global information infrastructure
- Pursue research and development efforts on many aspects of digital archive/library technologies:
– (1) the establishment of standardized information reference guidelines for digital content creation, storage, and processing
– (2) the development of common and application-specific information processing infrastructure and tools
Requirements
- Long-Term Preservation and Data Creation
– preserving ability to read (physically) and understand (logically)
- Full Spectrum and Precise Metadata in Collection, Object and Management Level
- Workflow Support: Digital Information Life-Cycle
– Create --> Content Analysis & Annotation --> IPR Protection --> Repurposing --> Multi-modal/Integrative Search --> Archive
- Data Exploration across Institutional and Disciplinary Domains
- Petabyte Scale Storage Management with Performance
A New Information Infrastructure is Required!
Why Grid is needed in multiple projects
- To conduct R&D and integration tasks that help digitize and network the collections and resources of different institutes or multiple projects.
- A Data Grid service is necessary to provide virtualized middleware services for sharing data across distributed, heterogeneous data resources in different administrative and security domains.
- Digital Archives in Taiwan demand reliable storage systems for persistent digital objects, well-organized information structures for effective content management, efficient and accurate information retrieval mechanisms, and flexible services for varied user needs.
- Grid technology offers a viable solution for long-term preservation and processing of diverse, heterogeneous, petabyte-scale digital archives.
- Data Grid aims to set up a computational and data-intensive grid of resources for data analysis. It requires coordinated resource sharing and collaborative processing and analysis of huge amounts of data produced and stored by many institutions.
Why Grid infrastructure for NDAP in general is required
Workflow of Digital Archives
Middleware Deployment
- The SRB system in Taiwan is used for the long-term preservation of the digital content produced by the digital archives projects.
- The system was deployed by the Academia Sinica Grid Computing Centre (ASGC) in early 2004.
- It consists of 8 sites at different institutes.
System Architecture
Interoperation
- Given the nature of the Grid, the most effective way to share data resources is to integrate the data sources and data grids.
- The SRM interface for SRB makes the popular SRB Data Grid System interoperable with the EGEE infrastructure.
- Support the standard SRM services for SRB
Why SRM
- SRM is a single interface for accessing different backend storage systems from different middleware.
- It makes it easy to develop applications that adapt to different backend storage systems.
- It provides space and file management on the storage system.
- SRM is a web service interface, and the implementation usually depends on the backend storage technology.
- Grid middleware needs to access files with a uniform interface
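The idea of one interface in front of several storage backends can be sketched in Python with an abstract base class. Everything here is illustrative: `StorageBackend`, `SrbLikeBackend`, `LocalDiskBackend`, and their method names are hypothetical and are not SRM operations or part of any real client library. The point is only that client code is written once against the interface.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical uniform interface; each backend implements it differently."""

    @abstractmethod
    def list_dir(self, path: str) -> list[str]:
        """Return the sorted entry names directly under `path`."""

    @abstractmethod
    def transfer_url(self, path: str) -> str:
        """Return a URL from which the file at `path` can be transferred."""


class SrbLikeBackend(StorageBackend):
    """Toy stand-in for a GridFTP-fronted store (host name is an assumption)."""

    def __init__(self, gridftp_host: str, tree: dict[str, list[str]]):
        self.gridftp_host = gridftp_host
        self.tree = tree

    def list_dir(self, path: str) -> list[str]:
        return sorted(self.tree.get(path, []))

    def transfer_url(self, path: str) -> str:
        return f"gsiftp://{self.gridftp_host}{path}"


class LocalDiskBackend(StorageBackend):
    """Toy stand-in for a plain local file system."""

    def __init__(self, tree: dict[str, list[str]]):
        self.tree = tree

    def list_dir(self, path: str) -> list[str]:
        return sorted(self.tree.get(path, []))

    def transfer_url(self, path: str) -> str:
        return f"file://{path}"


def first_entry_url(backend: StorageBackend, directory: str) -> str:
    """Client code that depends only on the interface, not on the backend."""
    name = backend.list_dir(directory)[0]
    return backend.transfer_url(f"{directory}/{name}")
```

The same `first_entry_url` call works unchanged with either backend, which is the property the bullets above attribute to SRM.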
Concept
Architecture Overview
[Figure: architecture diagram. Labels: Core (SRB+DSI); Auxiliary (file catalog); GridFTP/management API; SRM API; file transfer (GridFTP); Web Service; data server management; users/applications]
- User Interface
- SRM interface
– Hostname: fct01.grid.sinica.edu.tw
– Endpoint: httpg://fct01.grid.sinica.edu.tw:8443/axis/services/srm
- SRB server (SRB-DSI installed)
– Hostname: t-ap20.grid.sinica.edu.tw
- AMGA server
– Hostname: t-ap51.grid.sinica.edu.tw
- Other diagram labels: SURL, TURL, GridFTP/management commands, host information, "Return some information"
List the contents of an SRB directory
[sary357@fct01 isgc2008]$ sh SrmLs.sh
Please input the SURL you'd like to list the content:srm://fct01.grid.sinica.edu.tw:8443/axis/services/srm?/AS/home/sary357.ASGC/sary3571/
*************** SrmLs **********************
Status code: SRM_SUCCESS
Explanation: null
==========================================================
===== The individual status and result =========
The status code:SRM_SUCCESS
The explanation:null
File name:/AS/home/sary357.ASGC/sary3571
The Size:0
The File Type:DIRECTORY
The owner of this file/directory:sary357
The owner permission of this file/directory:RW
************** The sub dir and files *****************
File name:/AS/home/sary357.ASGC/sary3571/steps2.html1
CheckSumType:
The Size:9
The File Type:FILE
The owner of this file/directory:sary357
The owner permission of this file/directory:RW
File name:/AS/home/sary357.ASGC/sary3571/20080409
The Size:0
The File Type:DIRECTORY
The owner of this file/directory:sary357
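The SURL typed in this session has the shape srm://host:port/endpoint?/path, and the SrmGet session in this deck uses the variant ?SFN=/path. As a minimal sketch, assuming those two shapes are the only ones that matter, a SURL can be split into its parts with Python's standard `urllib.parse`. The `Surl` type and `parse_surl` function are hypothetical helpers for illustration, not part of the SRM-SRB implementation.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Surl(NamedTuple):
    host: str
    port: Optional[int]
    endpoint: str   # web service path, e.g. /axis/services/srm
    path: str       # logical file or directory path behind the "?"


def parse_surl(surl: str) -> Surl:
    """Split a SURL of the two shapes seen in the transcripts into its parts."""
    parts = urlsplit(surl)
    if parts.scheme != "srm" or parts.hostname is None:
        raise ValueError(f"not an srm:// URL: {surl!r}")
    # The part after "?" is either "/path" or "SFN=/path" (assumed shapes).
    query = parts.query
    path = query[len("SFN="):] if query.startswith("SFN=") else query
    if not path.startswith("/"):
        raise ValueError(f"no absolute path after '?': {surl!r}")
    return Surl(parts.hostname, parts.port, parts.path, path)
```

For the directory SURL used above, this yields host fct01.grid.sinica.edu.tw, port 8443, endpoint /axis/services/srm, and path /AS/home/sary357.ASGC/sary3571/.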
Putting a file
[sary357@fct01 isgc2008]$ sh SrmPut.sh
Please input the local file name you'd like to put:/tmp/testFile1.txt
Please input the SURL you'd like to put:srm://fct01.grid.sinica.edu.tw:8443/axis/services/srm?/AS/home/sary357.ASGC/sary3571/testFile1.txt
***************SrmPrepareToPut***********************
Token:1207660686127
Status code: SRM_SUCCESS
Explanation: null
===============================================================
URI: srm://fct01.grid.sinica.edu.tw:8443/axis/services/srm?/AS/home/sary357.ASGC/sary3571/testFile1.txt | The status code: SRM_SPACE_AVAILABLE | TURL: gsiftp://t-ap20.grid.sinica.edu.tw:2811/AS/home/sary357.ASGC/sary3571/testFile1.txt
**********Try to upload the data******************
Upload TURL: gsiftp://t-ap20.grid.sinica.edu.tw:2811/AS/home/sary357.ASGC/sary3571/testFile1.txt start....
Upload TURL: gsiftp://t-ap20.grid.sinica.edu.tw:2811/AS/home/sary357.ASGC/sary3571/testFile1.txt end......
*************** SrmPutDone *********************
Status code: SRM_SUCCESS
Explanation: null
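The session above follows three steps: SrmPrepareToPut returns a token and a TURL, the data is uploaded to that TURL, and SrmPutDone closes the request. The ordering can be sketched in Python against an in-memory stand-in. `StubSrm` and `put_file` are hypothetical names invented for this sketch; the stub only mimics the status codes visible in the transcript and performs no network calls, and building the TURL by prefixing the path is an assumption based on the output shown.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StubSrm:
    """In-memory stand-in for an SRM endpoint; not a real SRM client."""
    gridftp_base: str                             # e.g. "gsiftp://host:2811"
    pending: dict = field(default_factory=dict)   # request token -> path
    stored: set = field(default_factory=set)      # paths whose put completed
    next_token: int = 1

    def prepare_to_put(self, path: str):
        """Reserve a slot and hand back (token, status, TURL)."""
        token = str(self.next_token)
        self.next_token += 1
        self.pending[token] = path
        return token, "SRM_SPACE_AVAILABLE", self.gridftp_base + path

    def put_done(self, token: str) -> str:
        """Mark the upload for `token` as finished; unknown tokens raise KeyError."""
        path = self.pending.pop(token)
        self.stored.add(path)
        return "SRM_SUCCESS"


def put_file(srm: StubSrm, path: str, upload: Callable[[str], None]) -> str:
    """Run the three steps in order: prepare, upload to the TURL, confirm."""
    token, status, turl = srm.prepare_to_put(path)
    if status != "SRM_SPACE_AVAILABLE":
        raise RuntimeError(f"cannot upload, status {status}")
    upload(turl)                  # the caller supplies the actual transfer
    return srm.put_done(token)
```

Passing the transfer in as a callable keeps the sketch independent of any particular GridFTP client; in the real session that step is the upload to the gsiftp:// TURL.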
Getting a file
[sary357@fct01 isgc2008]$ sh SrmGet.sh
Please input the SURL you'd like to get:srm://fct01.grid.sinica.edu.tw:8443/axis/services/srm?SFN=/AS/home/sary357.ASGC/111.jpg
Please input the local file name you'd like to store:/tmp/111.jpg
*************** SrmPrepareToGet **********************
The status:
Status code: SRM_SUCCESS
Explanation: null
========================================================
The individual result:
The SURL: srm://fct01.grid.sinica.edu.tw:8443/axis/services/srm?SFN=/AS/home/sary357.ASGC/111.jpg
the status code:SRM_FILE_PINNED
the explanation of this SURL:null
The TURL:gsiftp://t-ap20.grid.sinica.edu.tw:2811/AS/home/sary357.ASGC/111.jpg
************* Got the TURL ****************************
************* Download file ***************************
download from TURL: gsiftp://t-ap20.grid.sinica.edu.tw:2811/AS/home/sary357.ASGC/111.jpg start....
download from TURL: gsiftp://t-ap20.grid.sinica.edu.tw:2811/AS/home/sary357.ASGC/111.jpg end......
And the file name after downloading is /tmp/111.jpg.
*************** SrmReleaseFiles *******************
Status code: SRM_SUCCESS
Explanation: null
==========================================================
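In the session above the file is pinned by SrmPrepareToGet, downloaded from the returned TURL, and then unpinned by SrmReleaseFiles. One way to make sure the release step runs even when the download fails is a context manager. This is a sketch under assumptions: `StubSrm` is an in-memory stand-in and `pinned_turl` is a hypothetical helper, neither taken from the SRM-SRB implementation.

```python
from contextlib import contextmanager


class StubSrm:
    """In-memory stand-in for an SRM endpoint; not a real SRM client."""

    def __init__(self, gridftp_base: str):
        self.gridftp_base = gridftp_base   # e.g. "gsiftp://host:2811"
        self.pinned = set()                # paths currently pinned

    def prepare_to_get(self, path: str):
        """Pin the file and return (status, TURL)."""
        self.pinned.add(path)
        return "SRM_FILE_PINNED", self.gridftp_base + path

    def release_files(self, path: str) -> str:
        """Unpin the file."""
        self.pinned.discard(path)
        return "SRM_SUCCESS"


@contextmanager
def pinned_turl(srm: StubSrm, path: str):
    """Yield a TURL for `path` and release the pin on exit, success or failure."""
    status, turl = srm.prepare_to_get(path)
    if status != "SRM_FILE_PINNED":
        raise RuntimeError(f"file not available, status {status}")
    try:
        yield turl
    finally:
        srm.release_files(path)
```

A caller would write `with pinned_turl(srm, path) as turl:` and perform the download inside the block; the pin is released when the block exits.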
Summary
- Need more interaction and sharing of LTP (long-term preservation) and data curation experience within and between academia and industry.
- Provides a production-quality infrastructure for distributed access to centrally based data and its replicas.
- Make SRB an archival system of the gLite-based e-infrastructure.
- Support lifetime policies for files (volatile, durable, permanent) in SRB
- Impose the same VO and security control to SRB as