Efficient and Scalable Climate Metadata Management with the GRelC - - PowerPoint PPT Presentation


SLIDE 1

Efficient and Scalable Climate Metadata Management with the GRelC DAIS

  • G. Aloisio, S. Fiore

CMCC Scientific Computing and Operations Division University of Salento, Lecce

SLIDE 2

Context : countdown of the Intergovernmental Panel on Climate Change (IPCC) report

  • End of 2009 - Autumn 2010 : Climate simulations
  • End of 2010 - ? : Data Distribution
  • End of 2010 - Early 2012 : Scientific publications
  • Early 2013 : Report publication IPCC AR5 (Assessment Report #5) (Boucher and Pham, 2002)

SLIDE 3

Scenario, issues and needs

  • Huge amount of data (PBs) produced at an international level
  • Need to share data among several centres
  • Data integration and sharing: FP7 keywords
  • Need to move towards open, distributed and service-based environments
  • A lot of issues:
      • Data distribution
      • Data format heterogeneity
      • Metadata management
      • Metadata schema
      • Security and local policies
      • Transparent access to the system
      • Scalable approach
      • ….
SLIDE 4

Euro-Mediterranean Centre for Climate Change

The Euro-Mediterranean Centre for Climate Change (CMCC) is a national initiative of scientific research in the field of climate change

  • Research Divisions (SCO, ANS, CIP, ISC, IAFENT, FDD)
  • Partners (INGV, UNILE, CIRA, etc.)
  • Associated Centres (SPACI, etc.)

Partners and Associate Centers: INGV, UNILE, SPACI, UNISS, CIRA, UNITUS, IAMB, SANNIO, CVR, FEEM, CRMPA

SLIDE 5

CMCC: An Integrated and Ubiquitous Environment for Climate Change

  • Data Management: Metadata/data services
  • CMCC Environment: acts as an incubator for the proposed technologies
  • Interdisciplinary: Climate and Computing Scientists
  • Key Points: Transparency and Interoperability
  • Expertise and Know-how: Computing Scientists (Unile), SPACI support
  • Middleware: gLite, Globus, etc.
  • Metadata Mng: Grid Metadata Handling System (GMHS)

SLIDE 6

CMCC Data & Metadata: issues & requirements

Data & Metadata Management

  • Data distribution, access, management, delivery, etc.
  • Metadata management, access, integration
  • Metadata search & discovery facilities
  • Metadata access & browsing
  • Pervasive and Ubiquitous access (Data Portal)
  • Metadata Agreement: design and schema implementation

Main (non functional) requirements

  • Scalability
  • Transparency
  • Efficiency
  • Interoperability
  • Security
  • Loosely coupled system
  • Easy to access system
SLIDE 7

CMCC Metadata Services

– Grid based solutions

– Centralized Solution

  • Initially deployed
  • Based on GRelC DAS
  • Centralized Metadata Mng

– Distributed Solution

  • Work in progress
  • GRelC DAIS
  • P2P Solution
  • Metadata distribution

CMCC Data Distribution Centre

– CMCC Data Grid Portal

– Data oriented functionalities

– Search and discovery

  • Dataset browsing
  • Editing functionalities

Data Management @ CMCC - Phase1

CMCC Metadata Agreement

– Standard Analysis

  • ISO19115 / ISO19139
  • Dublin Core Metadata
  • Other standards and schema currently used

– Schema definition

  • Design and schema implementation
  • CMCC Working Group
  • Interdisciplinary Group
  • Climate and Computer scientists
  • Schema describes
  • Models
  • Algorithms
  • Datasets

Alias METADATA
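
As an illustration of the standards listed above, a flat Dublin Core-style record for a dataset could be built as follows. This is only a sketch: the `metadata` wrapper element, the `build_record` helper and all field values are hypothetical and do not reflect the actual CMCC schema, which the slides do not show. Only the Dublin Core element namespace is standard.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a flat XML record with one Dublin Core element per dict entry."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Illustrative values only; not taken from any real CMCC dataset.
record = build_record({
    "title": "Example climate simulation output",
    "creator": "Example modelling group",
    "type": "Dataset",
    "format": "NetCDF",
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A richer schema (models, algorithms, datasets) would need nested elements; ISO19115 / ISO19139 records are considerably more structured than this flat form.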

SLIDE 8

Metadata Management Stack

[Figure: layered metadata management stack. Labels recovered from the diagram:]

  • Layers: Application Layer, High Level Services, Low Level Services, Low level APIs, Physical Layer
  • Application Layer: CMCC Graphical User Interface, Data Grid Portal, Command Line Interface (search, discovery, publishing, browsing, display)
  • Interface: Interoperable WS-I Interface, SOAP over GSI (httpg protocol)
  • Services: metadata extraction, access, browsing, query, aggregation, validation, display; search, discovery, delivery; basic access services; translation libraries; automatic ingestion libraries
  • Physical Layer: Metadata Catalog (XML Doc/DB), Metadata Schema

SLIDE 9

Metadata Management: Stack

SLIDE 10

GRelC Project (starting date 2001)

Grid Relational Catalog (GRelC) is a project which aims at designing and developing a set of efficient, secure and transparent Data Grid Services

[Figure: Grid access to DB and XML data sources]

SLIDE 11

Grid Metadata Handling System: Data Integration Layer

Grid Service Catalog Data Grid Portal

SLIDE 12

GRelC DAIS

Grid Metadata Handling System: architecture in the small

SLIDE 13

GRelC Data Access Data Sources (DB)

  • gandalf.unile.it: Linux x86
  • sara.unile.it: Mac OS X
  • sigma2.unile.it: Linux IA64
  • gridsurfer.unile.it: FreeBSD
  • galileo.hpcc.unical.it: Linux IA64
  • sepac00.projects.cscs.ch: Linux x86
  • spacina.na.infn.it: Linux IA64

National & International Testbeds

Lecce (Italy), Beijing (China)

SLIDE 14

Test Performance

SLIDE 15

GRelC & EGEE RESPECT PROGRAM

SLIDE 16

CMCC Metadata Grid Service

  • A Metadata Grid Service Infrastructure

– GRelC Project based solution

  • Moving from GRelC DAS to GRelC DAIS
  • Data Access and Integration capabilities
  • Scalable approach to distributed database management
  • P2P and Grid Protocols/Services
  • CMCC customization

– GRelC DAIS

  • Deployment is ongoing
  • 4 Sites within the preliminary phase
  • Lecce, Bologna, Capua, Sassari
  • Distributed solution
  • Data Grid Portal available for metadata access
  • Two step search & discovery process based on different data models
  • SOA based approach with full security support through GSI
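
The two-step search & discovery process mentioned above can be sketched as follows, assuming (as a later slide suggests) that a relational index answers the first step and full XML documents answer the second. Everything here is hypothetical: the table and column names, the XML layout and the `discover` / `describe` helpers are illustrative stand-ins, not the GRelC DAIS interface, and security (GSI) is left out entirely.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Step-1 store: a small relational index (hypothetical table and columns).
index = sqlite3.connect(":memory:")
index.execute(
    "CREATE TABLE dataset_index (id TEXT PRIMARY KEY, model TEXT, variable TEXT)"
)
index.executemany(
    "INSERT INTO dataset_index VALUES (?, ?, ?)",
    [
        ("ds1", "modelA", "temperature"),
        ("ds2", "modelB", "temperature"),
        ("ds3", "modelA", "precipitation"),
    ],
)

# Step-2 store: full metadata documents kept as XML (hypothetical layout).
xml_store = {
    "ds1": "<dataset><title>Run 1</title><resolution>1.0</resolution></dataset>",
    "ds2": "<dataset><title>Run 2</title><resolution>0.5</resolution></dataset>",
    "ds3": "<dataset><title>Run 3</title><resolution>1.0</resolution></dataset>",
}


def discover(variable):
    """Step 1: fast lookup on the relational index, returning dataset ids."""
    rows = index.execute(
        "SELECT id FROM dataset_index WHERE variable = ? ORDER BY id", (variable,)
    )
    return [row[0] for row in rows]


def describe(dataset_id):
    """Step 2: load the full XML description of one dataset as a dict."""
    doc = ET.fromstring(xml_store[dataset_id])
    return {child.tag: child.text for child in doc}


hits = discover("temperature")
details = [describe(dataset_id) for dataset_id in hits]
print(hits, details)
```

The point of the split is that the first step touches only a few indexed columns, while the richer XML document is parsed only for the datasets that matched.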
SLIDE 17

CMCC on iSGTW

Key issues:

  • GRelC DAIS 3.0
  • CMCC GMHS
  • RESPECT Program
  • CMCC Deployment

… See at: http://www.isgtw.org/?pid=1001234

SLIDE 18

Scenario, issues and needs

  • Huge amount of data produced at an international level
  • Need to share data among several centres
  • Data integration and sharing: FP7 keywords
  • Need to move towards open, distributed and service-based environments
  • A lot of issues:
      • Data distribution
      • Data format heterogeneity
      • Metadata management
      • Metadata schema
      • Security and local policies
      • Transparent access to the system
      • Scalable approach
      • ….
SLIDE 19

A new research effort: Climate-G

The main goal of Climate-G is to create a unified environment for climate change that brings together, in a single context, large amounts of data geographically spread among several centres, rich metadata descriptions, efficient data access services, advanced data analysis and visualization tools, etc., exploiting and joining knowledge and skills in the fields of climate change and computational science.

SLIDE 20

Climate-G partners

[Figure: partner logos, including Università del Salento]

SLIDE 21

Climate-G: Involved People

Principal Investigators

  • Giovanni Aloisio - Euro-Mediterranean Centre for Climate Change (CMCC) and University of Salento, Italy
  • Sandro Fiore - Euro-Mediterranean Centre for Climate Change (CMCC) and University of Salento, Italy
  • Sébastien Denvil - Institut Pierre-Simon Laplace (IPSL), France
  • Monique Petitdidier - Institut Pierre-Simon Laplace (IPSL), France

Involved people

Giovanni Aloisio (1,6), Sandro Fiore (1,6), Sébastien Denvil (2), Monique Petitdidier (2), Peter Fox (3), Horst Schwichtenberg (4), Jon Blower (5), Roberto Barbera (7), David Weissenbach (8), André Gemuend (4)

Institutions

  • 1. Euro-Mediterranean Centre for Climate Change (CMCC), Italy
  • 2. Institut Pierre-Simon Laplace (IPSL), France
  • 3. High Altitude Observatory (HAO) at NCAR, USA
  • 4. Fraunhofer-SCAI, Germany
  • 5. University of Reading, UK
  • 6. University of Salento, Italy
  • 7. University of Catania, Italy
  • 8. Institut de Physique du Globe de Paris, France
SLIDE 22

Climate-G: Metadata (XML and RDB)

SLIDE 23

Metadata Distribution and virtualization

For each site:

  • Relational DB (index)
  • XML DB (entire schema)

Virtualization/Integration layer: GRelC DAIS

Virtualization makes it possible to conceal:

  • Data distribution
  • Number of sites, RDBMS and XML back-ends
  • P2P Topology
  • Data Integration aspects
  • Technological details
  • …
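
The idea of a virtualization layer that conceals distribution can be sketched as a facade that fans one query out to every site and merges the answers. This is a toy model under stated assumptions: `make_site`, `VirtualCatalog`, the `idx` table and the sample rows are all hypothetical, in-memory SQLite stands in for each site's relational index, and nothing here reflects the actual GRelC DAIS protocol or its P2P topology.

```python
import sqlite3


def make_site(rows):
    """Create one site's relational index (in-memory stand-in for a real RDBMS)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE idx (id TEXT, variable TEXT)")
    db.executemany("INSERT INTO idx VALUES (?, ?)", rows)
    return db


class VirtualCatalog:
    """Facade: callers see one catalog, not the number or location of sites."""

    def __init__(self, sites):
        self._sites = list(sites)

    def search(self, variable):
        """Query every site's index and return the merged, sorted dataset ids."""
        found = set()
        for db in self._sites:
            query = "SELECT id FROM idx WHERE variable = ?"
            for (dataset_id,) in db.execute(query, (variable,)):
                found.add(dataset_id)
        return sorted(found)


catalog = VirtualCatalog([
    make_site([("a1", "temperature"), ("a2", "wind")]),
    make_site([("b1", "temperature")]),
])

result = catalog.search("temperature")
print(result)
```

Adding or removing a site changes only the list passed to `VirtualCatalog`; code that calls `search` is unaffected, which is the property the slide describes.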

SLIDE 24

Thanks to all these efforts, we published an article in the EGEE Newsletter.

Title: Climate Modelling and EGEE

Link: http://eu-egee.org/newsletter/automn08/Autumn08_draft.html#news5

GRelC DAIS deployment in Climate-G

SLIDE 25

Climate-G Data Distribution Centre

  • Main Functionalities
      • Search & Discovery
      • Data access & viz
      • Metadata browsing
      • Users and roles mng
      • ….
  • Features
      • Filters and listeners
      • Design Pattern approach
      • Easy to use interfaces
      • Platform independent
      • Secured by design
      • No additional software is required
      • It entirely replaces the Command Line Interface

Developed by CMCC ADM Team

SLIDE 26

Climate-G DDC: Snapshots

SLIDE 27

Complete OPeNDAP Support

SLIDE 28

Data Visualization (IDV support)

SLIDE 29

For more information…

  • P.I.s: G. Aloisio, S. Fiore, S. Denvil, M. Petitdidier
  • Climate-G URL: http://grelc.unile.it:8080/ClimateG-DDC
  • Newsletter: climateg-news@sara.unisalento.it (to join, send an email to climateg-info@cmcc.it)
  • Questions/Information: climateg-info@cmcc.it
  • To request a new Grid Certificate: climateg-ca@cmcc.it

SLIDE 30

Conclusions

  • The CMCC Metadata handling system provides a scalable, secure and interoperable data grid framework
  • It is GRelC based: the main service is GRelC DAIS
  • GRelC DAIS provides Grid support for a wide range of data resources (relational and XML) and is currently being tested on several grid environments
  • GRelC DAIS provides distributed metadata management, P2P and grid based
  • The CMCC Data Grid Portal eases metadata management via a Web interface
  • GRelC middleware is currently included in the EGEE RESPECT Program
  • An international testbed is now exploiting GRelC DAIS to share datasets

Climate-G @ Demo session in Catania, 4th EGEE UF/OGF

SLIDE 31

Acknowledgments

  • Giovanni Aloisio (CMCC)
  • Roberto Barbera (Univ. Catania)
  • Jon Blower (Univ. Reading)
  • Sébastien Denvil (IPSL)
  • Sandro Fiore (CMCC)
  • Peter Fox (HAO, NCAR)
  • André Gemuend (Fraunhofer-SCAI)
  • Monique Petitdidier (IPSL)
  • Horst Schwichtenberg (Fraunhofer-SCAI)
  • David Weissenbach (IPGP)

Many thanks to all of the people involved in the testbed