The Climate-G testbed: issues, requirements and results S. Fiore, - - PowerPoint PPT Presentation




SLIDE 1

The Climate-G testbed: issues, requirements and results

  • S. Fiore, Ph.D.

SPACI and University of Salento, Italy sandro.fiore@unisalento.it On behalf of Climate-G Team

EGI Technical Forum - Sept 16, 2010

SLIDE 2

EGITF 2010 2

Outline

  • Introduction
  • Issues and requirements

– data, metadata
– scientific gateways

  • The Climate-G testbed

– user requirements
– architecture, infrastructure
– the Climate-G Portal

  • Snapshots

– future work

SLIDE 3

Climate change data deluge

  • Huge amount of data produced across several countries

leads to:

– Need to share data among centers at an international level
– Need to move towards open, distributed and transparent environments
– Need for easy access to data through scientific data gateways
– Need to carry out post-processing activities as well as analysis
– Need to move towards domain-specific metadata schemas
– Need to exploit large infrastructures to implement world-wide “production-level” environments for climate change scientists
– More emphasis on publishing data into “global” contexts

SLIDE 4

Requirements, needs, issues and challenges (I)

  • Management of data and metadata
  • Data is distributed among several centers
  • Easy join (in terms of startup costs) for new sites
  • Metadata management needs to be distributed too
  • Local autonomy needs to be preserved
  • Data formats: mainly NetCDF, but also CSV, GRIB, etc.
  • Metadata schema
  • Fine-grained and coarse-grained data management
  • Coarse-grained (e.g. climate datasets)
  • Fine-grained (e.g. impacts data)
  • Main functionalities to address first use cases
  • Data search & discovery
  • Data access (e.g. download functionality)
  • Data subsetting (e.g. slicing and dicing of data)
  • Data visualization through different tools
  • Metadata management
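
To make "slicing and dicing" concrete, the following is a minimal Python sketch of bounding-box subsetting on a toy latitude/longitude grid. The `Grid` type, the `subset` function and the sample values are all hypothetical and are not part of the Climate-G software; real subsetting would normally be delegated to a data service.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Grid:
    """A toy 2-D field on a regular latitude/longitude grid (hypothetical type)."""
    lats: List[float]          # ascending latitudes, degrees north
    lons: List[float]          # ascending longitudes, degrees east
    values: List[List[float]]  # values[i][j] belongs to (lats[i], lons[j])


def subset(grid: Grid, lat_min: float, lat_max: float,
           lon_min: float, lon_max: float) -> Grid:
    """Return the part of the grid that falls inside the bounding box."""
    rows = [i for i, lat in enumerate(grid.lats) if lat_min <= lat <= lat_max]
    cols = [j for j, lon in enumerate(grid.lons) if lon_min <= lon <= lon_max]
    return Grid(
        lats=[grid.lats[i] for i in rows],
        lons=[grid.lons[j] for j in cols],
        values=[[grid.values[i][j] for j in cols] for i in rows],
    )


# Made-up sample data: 3 latitudes x 4 longitudes.
field = Grid(
    lats=[30.0, 40.0, 50.0],
    lons=[0.0, 10.0, 20.0, 30.0],
    values=[[1.0, 2.0, 3.0, 4.0],
            [5.0, 6.0, 7.0, 8.0],
            [9.0, 10.0, 11.0, 12.0]],
)
box = subset(field, lat_min=35.0, lat_max=55.0, lon_min=5.0, lon_max=25.0)
print(box.lats, box.lons, box.values)
```

The same idea extends to a time axis or a vertical level: each extra dimension adds one more index selection.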

SLIDE 5

Requirements, needs, issues and challenges (II)

  • Security
  • Secure access to data (at data service level)
  • Secure access to metadata (at metadata service level)
  • Secure access to the portal (security at portal level)
  • Different roles (admin, data provider, metadata contributor, etc.) must be defined at several levels to set different privileges

  • Uniform security approach
  • Access to the distributed environment via “Scientific Gateways”
  • Data Distribution Centre to manage data, metadata, tools, services, users, etc.
  • Integration of services and tools widely deployed, tested and adopted by the community
  • Pervasive, easy to access, easy to extend, ubiquitous, web based
  • Portal centric infrastructure & data centric portal
  • The data must be placed in the middle of the scene
  • Multiple options must be available to manage, display, download, analyze the data as needed.
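
The role-based privileges mentioned above can be illustrated with a small Python sketch. The role names follow the slide; the `Action` values and the privilege sets assigned to each role are assumptions made for illustration only, since the slides do not list them.

```python
from enum import Enum, auto


class Action(Enum):
    """Hypothetical operations a portal user might attempt."""
    READ_METADATA = auto()
    EDIT_METADATA = auto()
    PUBLISH_DATA = auto()
    MANAGE_USERS = auto()


# Assumed privilege sets; the real assignments are not given in the slides.
ROLE_PRIVILEGES = {
    "admin": {Action.READ_METADATA, Action.EDIT_METADATA,
              Action.PUBLISH_DATA, Action.MANAGE_USERS},
    "data_provider": {Action.READ_METADATA, Action.PUBLISH_DATA},
    "metadata_contributor": {Action.READ_METADATA, Action.EDIT_METADATA},
}


def is_allowed(role: str, action: Action) -> bool:
    """Unknown roles get no privileges (deny by default)."""
    return action in ROLE_PRIVILEGES.get(role, set())


print(is_allowed("admin", Action.MANAGE_USERS))
print(is_allowed("data_provider", Action.EDIT_METADATA))
```

Keeping one table of this kind and consulting it at the portal, metadata and data service levels is one way to approach the uniform security model the slide asks for.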

SLIDE 6

Search & Discovery: Metadata Management

  • “Context” description: from data to information
  • Exhaustive schemas are needed

– Domain based schema, community driven vocabulary

  • Examples come from ES Curator and Metafor
  • Provenance metadata (identifying, tracing and recording the history of data) are still challenging today

  • Metadata Tools and Services
  • Metadata services to manage project/experiment/dataset descriptions

  • Main approaches

– DBMS based
– Grid based
– OGC based

  • Defining common interfaces is an ongoing process

– OGC, OGF and other standardization bodies

  • Standardization activity is still needed
  • Metadata tools (automatic extraction, ingestion, validation, etc.) are needed because of the large volume of metadata
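
As an illustration of automatic validation, here is a minimal Python sketch that checks a metadata record before ingestion. The list of required fields and the record layout are hypothetical; a real deployment would validate against a community-driven schema rather than this ad hoc list.

```python
from typing import Dict, List

# Hypothetical mandatory fields; a real schema would come from the community.
REQUIRED_FIELDS = ("title", "institution", "variable", "time_start", "time_end")


def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if not record.get(name, "").strip()]
    start, end = record.get("time_start", ""), record.get("time_end", "")
    # ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings.
    if start and end and start > end:
        problems.append("time_start is later than time_end")
    return problems


# Made-up records for illustration.
good = {"title": "Run A", "institution": "Example Centre", "variable": "tas",
        "time_start": "2001-01-01", "time_end": "2001-12-31"}
bad = {"title": "Run B", "institution": " ", "variable": "tas",
       "time_start": "2005-01-01", "time_end": "2001-01-01"}
print(validate(good))
print(validate(bad))
```

Returning a list of problems instead of raising on the first one lets an ingestion tool report every issue in a record at once.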

SLIDE 7

Scientific Gateways (I)

  • From Data Portals to Scientific Data Gateways
  • From simple web data access applications to rich integrated environments

  • Besides data, users can find:
  • Rich metadata descriptions, data visualization tools, a wide variety of services, etc.
  • Data centric approach
  • Looking at the same data from different perspectives
  • Looking at the same data in complementary ways
  • Different grain-level approach for the data services

– From coarse (file access/download) to fine (variable aggregation)

  • Union: join data from different datasets
  • Tiling: join data along existing dimension
  • ….
  • Different metadata support approach

– Domain based, community driven, widely adopted

  • Metafor (Europe)
  • ES Curator (US)
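
The two aggregation styles named above, union and tiling, can be sketched in Python as follows. This assumes a deliberately simplified dataset model (a dictionary mapping each coordinate or variable name to its values along a single time axis); the function names and sample values are illustrative and do not reflect how any particular data server implements aggregation.

```python
from typing import Dict, List

Dataset = Dict[str, List[float]]  # coordinate/variable name -> values along time


def union(a: Dataset, b: Dataset) -> Dataset:
    """Join variables from two datasets that share the same time axis."""
    if a["time"] != b["time"]:
        raise ValueError("union needs identical time coordinates")
    return {**a, **b}


def tile(a: Dataset, b: Dataset) -> Dataset:
    """Join two datasets along the existing time dimension."""
    if a.keys() != b.keys():
        raise ValueError("tiling needs the same variables in both datasets")
    return {name: a[name] + b[name] for name in a}


# Made-up sample datasets.
temperature = {"time": [0.0, 1.0], "tas": [280.0, 281.0]}
precipitation = {"time": [0.0, 1.0], "pr": [1.5, 0.0]}
later_temperature = {"time": [2.0, 3.0], "tas": [282.0, 283.0]}

print(union(temperature, precipitation))
print(tile(temperature, later_temperature))
```

Union widens a dataset with more variables, while tiling lengthens it along a dimension that already exists.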

SLIDE 8

Scientific Gateways (II)

  • From Data Portals to Scientific Data Gateways
  • Towards Web2.0 approach

– Usability and sharing as key concepts in Web2.0
– From personal websites to blogging
– From publishing to participation
– From content management systems to wikis
– Mashup, Widgets and Tagging are some important features of Web2.0
– Web2.0 - a good reference available (Tim O’Reilly)

  • http://oreilly.com/web2/archive/what-is-web-20.html
  • Stronger integration of scientific, collaborative and social aspects

– Social networking capabilities are poorly exploited today but…

  • They can increase the level of discussion, feedback, data exploitation, scientific results and dissemination among different groups, scientific teams, etc.

SLIDE 9

CMCC Workshop - June 10-12, 2009 - Ugento

Scientific Gateways (III)

SLIDE 10

A real use case: the Climate-G testbed

The main goal of Climate-G is to create an open and unified environment for climate change, enabling geographical and cross-institutional data discovery, access, analysis, visualization and sharing. This effort has been conceived as a proof of concept for the technologies involved (in particular the GRelC service) and was supported during the EGEE project by the Earth Science Cluster Community. It acts as a virtual laboratory involving partners in both Europe and the US.

SLIDE 11

The Climate-G partnership

SLIDE 12

The central role of the User Community

Key assumption:

  • “The user community must be an active part in the whole process (requirements, tools to be integrated into the system, semantics of metadata, feedback and validation, list of priorities, meetings, etc.)”

  • Several partners of the Climate-G testbed work in the Earth Sciences and Environmental domains

  • Most of the users come from the target community (about 80%)
  • The activity has been disseminated at Geosciences conferences: EGU09, EGU2010, ESA2009, AGU2010 (tentative), etc.

  • Attract new users
  • Identify new needs and requirements
  • Define new use cases
  • Improve the existing software
  • ….

SLIDE 13

Data and Metadata distribution

SLIDE 14

Grid Metadata Service: GRelC (EGEE RESPECT)

SLIDE 15

Portal-centric view of the infrastructure

SLIDE 16

Climate-G Portal

  • Main Functionalities
  • Search & Discovery
  • Data access & visualization
  • Metadata management
  • User and role management
  • List of experiments
  • List of entries/datasets
  • Features
  • Easy to use interfaces
  • Platform independent
  • Secured by design
  • No additional software is required
  • It entirely replaces the Command Line Interface
  • JSP/Servlets based, AJAX (dynamic web pages)
  • Fast adoption of components in mashups, like Google Maps

SLIDE 17

Datasets

IPSL/CNRS, Fraunhofer-SCAI, University of Cantabria, Euro-Med Centre for Climate Change

How are data organized?

Projects Experiments Variable 1:1

SLIDE 18

Climate-G Portal: Snapshots

SLIDE 19

Climate-G: domain based services/tools

Climate-G includes domain-based services & tools into the infrastructure

  • User community requirement: domain-based services part of the infrastructure
  • They perform domain-specific tasks and are well known, tested and widely adopted
  • Legacy systems already available and accessible

Some examples:

  • OPeNDAP (OPeNDAP Consortium)
  • Provides access to climate data sources
  • Widely adopted in the Climate community
  • nc Web Map Service (Univ. of Reading)
  • HTTP interface for requesting geo-registered map images from geospatial databases
  • Integrated Data Viewer (UNIDATA,UCAR) and Godiva2 (Univ. of Reading)
  • Data visualization tools widely adopted by the Climate community
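
The HTTP interface of a Web Map Service can be illustrated by building a GetMap request in Python. The parameter names below are the standard WMS 1.1.1 GetMap keys; the endpoint URL and the layer name are placeholders and do not point at an actual Climate-G service.

```python
from urllib.parse import urlencode


def getmap_url(endpoint, layer, bbox, width=512, height=256):
    """Build a WMS 1.1.1 GetMap URL for one layer.

    bbox is (min_lon, min_lat, max_lon, max_lat) in EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks for the server's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
url = getmap_url("http://example.org/ncWMS/wms", "dataset/tas", (-10, 30, 40, 70))
print(url)
```

Opening such a URL in a browser would return a rendered map image for the requested area, which is what makes this kind of service easy to embed in portal pages and mashups.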

SLIDE 20

Data Access - Complete OPeNDAP Support
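
For readers unfamiliar with OPeNDAP, remote subsetting is expressed in the request URL through a constraint expression. The Python sketch below builds a DAP2-style request in which each dimension gets a `[start:stride:stop]` range with an inclusive stop index. The dataset URL and variable name are placeholders, and the helper functions are illustrative rather than part of any OPeNDAP client library.

```python
def hyperslab(start: int, stop: int, stride: int = 1) -> str:
    """Format one DAP2 index range; the stop index is inclusive."""
    return f"[{start}:{stride}:{stop}]"


def opendap_ascii_url(dataset_url: str, variable: str, ranges) -> str:
    """Build a URL asking the server for an ASCII dump of a sub-array.

    ranges holds one (start, stop) or (start, stop, stride) tuple per dimension.
    """
    constraint = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}.ascii?{constraint}"


# Placeholder dataset and variable: 12 time steps and an 11 x 11 spatial window.
request = opendap_ascii_url(
    "http://example.org/opendap/tas.nc",
    "tas",
    [(0, 11), (10, 20), (30, 40)],
)
print(request)
```

Only the requested sub-array travels over the network, which is why this style of access suits large climate datasets. Some HTTP clients may require the square brackets to be percent-encoded.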

SLIDE 21

Data Visualization (IDV support)

SLIDE 22

Godiva2 Integration

Two-dimensional Data visualization tool Google Earth

SLIDE 23

Climate-G and EGEE-EGI

  • In April 2009, Climate-G was recognized as a new VO by the EGEE Resource Allocation Group (climate-g.vo.eu-egee.org)

  • First VO devoted to climate change community!
  • Several Climate-G presentations in the Geoscience community (EGU09, EGU2010, ESA Workshop, etc.)

  • About 80 users have joined the VO since April
  • Most of them come from the climate context and are using a grid infrastructure for the first time -> new users

  • Interesting level of feedback from our users in terms of:
  • suggestions to improve the portal
  • new data sources and new tools to be included into the portal
  • application-level requirements (=> good for EGEE computational infrastructure)
  • Several EGEE sites have been configured to support the “Climate-G VO” (Fraunhofer SCAI, SPACI-LECCE, IPSL/CNRS IPGP, UniCantabria)

  • More than 300 CPUs are now available for preliminary tests
  • Seed Resources will be exploited by the Climate-G testbed/users
  • Thanks to the EGEE NA4 VO Support Group for their support
  • The whole Climate-G EGEE infrastructure (data and computational) must be accessible through the Climate-G Portal, our scientific gateway

  • Climate-G invited demo in the first year review of EGEE-III (June 2009)

SLIDE 24

Next steps

  • Several issues and needs must be addressed to improve the system
  • Some EGI partners involved in the testbed will provide further support to this community

  • Harvesting functionalities will be provided as part of the metadata support offered by the GRelC software

  • A monitoring system will be included into the portal to check the service availability
  • The computational part will be integrated into the whole architecture to support analysis, pre- and post-processing

  • Improved web-based interfaces and stronger adoption of mashup, tagging and Web2.0 features

  • Dissemination will also continue at the “Geosciences” conferences (an abstract was submitted in August to AGU2010, San Francisco, December 13-17, 2010)

SLIDE 25

For more information…

Climate-G URL: http://grelc.unile.it:8080/ClimateG-DDC
If you wish to join: climateg-info@cmcc.it
To issue a test certificate, please contact our Certification Authority: climateg-ca@cmcc.it
For any information please contact me at: sandro.fiore@unisalento.it

SLIDE 26

Acknowledgments

Giovanni Aloisio (CMCC)
Sandro Fiore (CMCC)
Monique Petitdidier (CNRS/IPSL)
Horst Schwichtenberg (Fraunhofer-SCAI)
Sébastien Denvil (IPSL)
Peter Fox (RPI, NCAR)
Jon Blower (Univ. Reading)
Antonio Cofino (Univ. of Cantabria)

Many thanks to everyone involved in the Climate-G testbed

SLIDE 27

Conclusions

  • Climate-G is a distributed environment (conceived in the context of the EGEE project) that provides support to the ES/Env communities

  • It acts as a Virtual Laboratory promoting collaborations among the involved partners (2 MoA)

  • Since April 2009, a new EGEE VO for the Climate-G testbed has been created and seed resources have been made available to support the start-up. New tests must be carried out!

  • The GRelC DAIS is exploited as a grid-based distributed metadata management solution (including harvesting)

  • Computational services need to be integrated into the portal for analysis and pre/post-processing activities

  • New map-enabled search pages will be developed to support distributed search & discovery

  • Several visualization tools have been integrated (e.g. IDV, Godiva2) and new ones will be added to the Climate-G Portal