DILIGENT
Digital libraries powered by the Grid
Bhaskar Mehta, FhG IPSI, Germany
Work Package Leader, Content and Metadata Management
Bhaskar.Mehta@ipsi.fraunhofer.de
International Symposium on Grid Computing, Taipei, 3rd May 2006 2
Overview
- Introduction to DILIGENT
- Grid: Opportunity and Challenge
- Challenges in Information Management
- Data Management in DILIGENT
- Open Issues and Next Steps
Introduction to DILIGENT
- DILIGENT: A Digital Library Infrastructure on Grid-Enabled Technology
- Duration: 3 years
- Commencement Date: September 2004
- Effort: 1024 p/m
- Cost: 9.8 M Euro
- European Union funding: 6.3 M Euro
Partners
- Consiglio Nazionale delle Ricerche – ISTI (Italy, Scientific Co-ordinator)
- European Research Consortium for Informatics and Mathematics (France, Adm Coordinator)
- University of Athens (Greece)
- Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. – IPSI (Germany)
- University for Health Informatics and Technology Tyrol (Austria) / ETH Zürich / UNI Basel
- University of Strathclyde (United Kingdom)
- Engineering Ingegneria Informatica SpA (Italy)
- Fast Search & Transfer ASA (Norway)
- 4D SOFT Software Development Ltd. (Hungary)
- European Organization for Nuclear Research (Switzerland)
- European Space Agency – ESA (Italy)
- Scuola Normale Superiore (Italy)
- RAI Radio Televisione Italiana (Italy)
DILIGENT Objectives
To create an advanced test-bed that will allow members of dynamic virtual e-Science organizations to access shared knowledge and to collaborate in a secure, coordinated, dynamic and cost-effective way.
Expected Outcome
- A Digital library infrastructure which is Grid based
- A testbed based on this infrastructure
- Two implemented scenarios: Earth Science and Cultural Heritage
Motivation for NGDLs: Digital library challenges
- Cost & Time
  - Construction and management of a DL require high investments and specialized personnel
  - Years are spent in designing and setting up a DL
  - Shared Infrastructure with Authoring Capabilities
- New functionality is computationally expensive and evolving
  - Multimedia indexing, clustering: e.g. LSI, pLSA
  - Multimedia querying: e.g. image retrieval by feature vectors
  - Multimedia processing: e.g. satellite images, partial encryption for video
  - Service-based Digital Libraries, with process management/distribution support
- Heterogeneity & Distribution
  - DLs (and underlying components) use different models, APIs, data formats, etc.
  - DLs are distributed/replicated
  - Basing DLs on standards
  - Providing support for federated/distributed search, data brokering
Grid as an Opportunity... and a Challenge
[Figure: Digital Library and Grid, with labels Challenge, Potential, Grid, DL, OS]
Some Methodological Challenges
- Service Oriented, Distributed Architecture
  - Requires open systems for indexing, searching, feature extraction, metadata management
- Distributed Search
  - Query Optimization
  - Semantic Data Integration
- On Demand Service Activation
  - Satellite Images Extraction
- Virtual Organizations
  - Content Security
  - Resource Security
Some Technological Challenges
- Grid technology is file centric; DLs are collection centric
- Metadata management with the Grid: based on key-value pairs
  - Lack of support for structured data (e.g. XML)
  - Retrieval support is limited
- Availability and replication support: file based
- Real time processing vs batch processing
  - DL users require instantaneous response (à la Google)
  - Grid processes usually can't provide real time response
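The gap between key-value metadata and structured data such as XML can be illustrated with a small sketch. This is not DILIGENT or gLite code: the record, the element names, and both helper functions are made up for illustration, and the flattening rule (one value per leaf tag name) is only one assumed way a key-value store might hold such a record.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record with a repeated, nested element.
RECORD = """
<record>
  <title>Etna eruption, satellite view</title>
  <creator><name>A. Rossi</name><role>photographer</role></creator>
  <creator><name>B. Weber</name><role>annotator</role></creator>
</record>
"""


def flatten_to_key_value(xml_text):
    """Flatten an XML record into a flat dict keyed by leaf tag name."""
    flat = {}
    for element in ET.fromstring(xml_text).iter():
        # Only leaf elements carry values; the nesting itself is discarded.
        if len(element) == 0 and element.text and element.text.strip():
            # A repeated tag overwrites the earlier value for the same key.
            flat[element.tag] = element.text.strip()
    return flat


def creators_from_xml(xml_text):
    """Read the same record while keeping each creator's name and role together."""
    root = ET.fromstring(xml_text)
    return [(c.findtext("name"), c.findtext("role")) for c in root.findall("creator")]


flat = flatten_to_key_value(RECORD)
structured = creators_from_xml(RECORD)
print(flat)
print(structured)
```

In the flat view only one creator survives and the link between a name and its role is implicit, while the structured view keeps both creators intact.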
DILIGENT Architecture
Data Management in DILIGENT (1)
- Common functionality for Content and Metadata management
- Effort duplication
  - Storage, Replication
  - Change Notification, Association Consistency
- gLite functionality
  - Separate pipelines for Content and Metadata
  - Incomplete functionality (e.g. replication)
  - Insufficient for DILIGENT
  - FileSystem vs Data Model
  - Flat records vs XML
[Figure: three layer diagrams. Labels: (1) gLite, CM, MM; (2) gLite, CM, MM, Common Layer; (3) gLite, CM, MM, Common Layer, XMLDB Emulation]
Data Management in DILIGENT (2)
Identifying 3 basic layers
- Base Layer: gLite functionality (SE, Catalog, FTS)
- Storage Layer: replication, change notification, transactional support
- Service Layer: service-specific functionality, API/WS view
[Figure: layer diagram. Labels: Content Management, Metadata Management, Storage Management Layer, Base Layer]
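The three-layer split can be sketched roughly as follows. This is a toy model under stated assumptions, not the DILIGENT design or the gLite API: every class and method name is hypothetical, the base layer is an in-memory stand-in for grid storage, replication simply copies to the first N sites, and transactional support is left out.

```python
class BaseLayer:
    """Hypothetical in-memory stand-in for grid file storage (not the gLite API)."""

    def __init__(self, site_names):
        self.sites = {name: {} for name in site_names}

    def put(self, site, key, data):
        self.sites[site][key] = data

    def get(self, site, key):
        return self.sites[site][key]


class StorageLayer:
    """Adds replication and change notification on top of the base layer."""

    def __init__(self, base, replicas=2):
        self.base = base
        self.replicas = replicas
        self.listeners = []
        self.locations = {}

    def subscribe(self, callback):
        self.listeners.append(callback)

    def store(self, key, data):
        # Assumed policy: replicate to the first `replicas` sites.
        targets = list(self.base.sites)[: self.replicas]
        for site in targets:
            self.base.put(site, key, data)
        self.locations[key] = targets
        for notify in self.listeners:
            notify(key)

    def load(self, key):
        # Read from the first recorded replica.
        return self.base.get(self.locations[key][0], key)


class ContentManager:
    """Service layer: content-specific view over the shared storage layer."""

    def __init__(self, storage):
        self.storage = storage

    def add_document(self, doc_id, payload):
        self.storage.store("content/" + doc_id, payload)

    def get_document(self, doc_id):
        return self.storage.load("content/" + doc_id)


class MetadataManager:
    """Service layer: metadata-specific view over the same storage layer."""

    def __init__(self, storage):
        self.storage = storage

    def set_metadata(self, doc_id, xml_text):
        self.storage.store("metadata/" + doc_id, xml_text)

    def get_metadata(self, doc_id):
        return self.storage.load("metadata/" + doc_id)


base = BaseLayer(["site-a", "site-b", "site-c"])
storage = StorageLayer(base, replicas=2)
changes = []
storage.subscribe(changes.append)

content = ContentManager(storage)
metadata = MetadataManager(storage)
content.add_document("img42", b"raw image bytes")
metadata.set_metadata("img42", "<record><title>Etna</title></record>")
```

Both managers reuse the same replication and notification code, which is the effort-saving argument for a common layer between the services and the grid middleware.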
Data management in DILIGENT (3)
[Figure: component diagram. Labels: API / WSDL, Content Manager, Content Security, Metadata Management, Metadata Catalog, Metadata Broker, Query Processor, Annotation Manager, Storage Layer, Base Layer]
Current Status and Future Steps
- Detailed design has been completed
- APIs under implementation
- 1st experimental prototype based on OpenDLib
- Extensive testing and deployment of gLite 1.1 -> 3.0
- Next Steps
  - Integrate finished components
  - Deploy DILIGENT on the Grid infrastructure
  - Develop prototypes based on DILIGENT
  - Testing & User Feedback
Types of Involvement for Observers
- Information about project activities (www.diligentproject.org)
- Involvement in workshops
- Possible involvement in validation
- Feedback for DILIGENT development
- Candidates for adoption of DILIGENT infrastructure
Contact us
Co-operation with other projects/communities is welcome
www.diligentproject.org
Contact people:
- Donatella Castelli, Pasquale Pagano, ISTI-CNR
  donatella.castelli/pasquale.pagano@isti.cnr.it
- Jessica Michael, ERCIM
  jessica.michel@ercim.org
- Bhaskar Mehta, Fraunhofer IPSI
  bhaskar.mehta@ipsi.fraunhofer.de
Questions?
Research today
- Research is carried out by groups of individuals, belonging to different institutions, that dynamically aggregate to carry out projects together
- By sharing their resources these individuals create better conditions for their research
- Digital libraries that maintain the produced knowledge and make it accessible worldwide are becoming key instruments for scientific collaboration in many research areas
Complementary User Scenarios
- Earth Science Domain:
  - Well-established tradition in exploiting new technologies
  - Wide variety of content types (maps, satellite images, measurements, text)
  - Very large, dynamic data sets
  - Support for community events, report generation, disaster management
- Cultural Heritage Domain:
  - IT exploitation still in its infancy
  - Multidisciplinary collaborative research
  - Image-based retrieval/semantic analysis of images
  - Support for research and teaching
DILIGENT Goals & Approach
- Service Oriented Digital Library Infrastructure Based on EGEE
  - High computing and storage capabilities for handling a wide variety of information objects