EGI-EUDAT joint access to data and computing services: an executive report

SLIDE 1

www.eudat.eu

EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065

EGI-EUDAT joint access to data and computing services: an executive report

DI4R - Brussels

Michaela BARTH caela@kth.se
Ute KARSTENS ute.karstens@nateko.lu.se
Matthew VILJOEN matthew.viljoen@egi.eu
Peter GILLE petergil@kth.se
Maggie HELLSTRÖM margareta.hellstrom@nateko.lu.se
Xavier PIVAN xavier.pivan@cerfacs.fr
Christian PAGÉ christian.page@cerfacs.fr

SLIDE 2

WP7 Task 7.2: Joint Access to Data, HTC and Cloud Computing Resources

  • The EGI-EUDAT collaboration started in March 2015 and officially continues until the end of EUDAT (February 2018).
  • Aiming at a production cross-infrastructure service
    • Provide end users with seamless access to an integrated infrastructure offering both EGI and EUDAT services
    • Pair data and high-throughput computing resources together
  • Concrete community pilots
    • EPOS
    • ICOS
    • ENES
  • Harmonization on all levels (technical, operational, policies)

SLIDE 3

Benefits of EGI-EUDAT interoperability

  • Federated services through EGI paired together with EUDAT’s set of research data services:
    • Computation on EGI Federated Cloud and HTC
    • EUDAT services for transfer, syncing, sharing, staging and preservation of data

SLIDE 4

Initial user community pilots

  • EGI and EUDAT selected a set of relevant user communities
  • The identified user communities were prominent European research infrastructures in the fields of Earth Science (EPOS and ICOS), Bioinformatics (BBMRI and ELIXIR) and Space Physics (EISCAT-3D).
  • Integration activity has been driven by the end users from the start!
  • Process of gathering requirements from user communities, together with their prioritization of those requirements
  • Definition of a universal use case

SLIDE 5

Definition of the universal use case

  • Demo at EGI Community Forum 2015


SLIDE 6

User community pilots

  • EPOS:
    • Fostering worldwide interoperability in Earth Sciences and providing services to a broad community of users
  • ICOS:
    • Creating a web-based service (“Footprint tool”) at the ICOS Carbon Portal, providing on-demand computing facilities
  • ENES:
    • Performing on-demand climate data analytics for climate research and climate change impact communities
  • Planned joint open call to expand the pilot activity to further early adopters:
    • Not many new user communities were expected to participate
    • Remaining user communities very productive in providing feedback
    • Instead: integrate user communities that had already used both EGI and EUDAT services, but not yet in combination.

SLIDE 7

European Plate Observing System (EPOS)

  • EPOS is the integrated solid Earth Sciences research infrastructure
  • Aims to be an effective, coordinated European-scale monitoring facility for solid Earth dynamics
  • Aims to establish a long-term plan to facilitate the integrated use of data, models and facilities from existing and new distributed research infrastructures (RIs) for solid Earth science

SLIDE 8

EPOS data workflow plan

SLIDE 9

EPOS use case status

  • First stage of the use case:
    • Defining the best strategy for access to Federated Cloud partners and identifying secure and efficient data transfer protocols towards the iRODS system
  • Now in the second stage:
    • Integration with data storage services (EUDAT) and cloud computing resources (EGI)

SLIDE 10

SLIDE 11

ICOS Carbon Portal

Footprint tool calculations for atmospheric sites using Lagrangian atmospheric transport model STILT

[Figure. Diagram labels: Atmospheric observations, Prior fluxes, Emissions, Meteorological driver fields, Station Footprints, GHG concentrations, STILT Lagrangian transport model. Data volumes shown: ≈ 1 GB, ≈ 0.5-1 TB, ≈ 2-3 TB, ≈ 1-2 TB per year. Compute cost shown: ≈ 670 CPUs per footprint => 1700 CPUh per station per year.]
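The two compute figures quoted for the footprint calculations can be related with a quick back-of-envelope check. The sketch below rests on two assumptions that are not stated on the slide: that "670 CPUs" is meant as roughly 670 CPU-seconds per footprint, and that one footprint is computed per hour of the year for each station. The function and constant names are illustrative only.

```python
# Back-of-envelope check of the compute cost quoted on the slide.
# Assumptions (not stated in the source): "670 CPUs" is read as
# 670 CPU-seconds per footprint, and one footprint is computed per hour.

CPU_SECONDS_PER_FOOTPRINT = 670      # value from the slide, unit assumed
FOOTPRINTS_PER_YEAR = 365 * 24       # assumed: hourly footprints


def cpu_hours_per_station_year(cpu_seconds: float, footprints: int) -> float:
    """Total CPU time per station and year, converted to CPU-hours."""
    return cpu_seconds * footprints / 3600.0


if __name__ == "__main__":
    total = cpu_hours_per_station_year(
        CPU_SECONDS_PER_FOOTPRINT, FOOTPRINTS_PER_YEAR
    )
    print(f"{total:.0f} CPUh per station per year")
```

Under these assumptions the result is about 1630 CPUh per station per year, which is of the same order as the 1700 CPUh quoted on the slide. A different footprint frequency or unit reading would change the outcome accordingly.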

SLIDE 12

SLIDE 13

ICOS Carbon Portal use case status

  • Virtual machines with attached block storage instantiated in the EGI Federated Cloud
  • Docker container for computations with local VM storage
  • Data transfer between VM and B2STAGE instance at PDC/KTH Stockholm
  • Storing of ICOS data tested on the B2SAFE system at KTH
  • Robot certificates installed to allow for further automation of the workflow
  • OneData software solution being tested

Next steps:

  • Replicating ICOS data in B2SAFE and accessing it via the B2STAGE service
  • Access to common storage for several VMs (via the EGI DataHub)
  • Load balancing to distribute computations/user requests to several VMs

Beyond current EGI-EUDAT collaboration:

  • ICOS competence centre within EOSC-hub
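The load-balancing item listed under the next steps for the ICOS use case can be illustrated with a minimal round-robin sketch. This is not the mechanism used or planned by the pilot, which the slides do not describe; the station identifiers, VM names and the function `assign_round_robin` are hypothetical and serve only to show the idea of spreading footprint requests evenly over several VMs.

```python
from itertools import cycle
from typing import Dict, Iterable, List


def assign_round_robin(requests: Iterable[str], vms: List[str]) -> Dict[str, List[str]]:
    """Distribute requests over the given VMs in round-robin order."""
    if not vms:
        raise ValueError("at least one VM is required")
    assignment: Dict[str, List[str]] = {vm: [] for vm in vms}
    # cycle() repeats the VM list endlessly; zip() stops when requests run out.
    for request, vm in zip(requests, cycle(vms)):
        assignment[vm].append(request)
    return assignment


if __name__ == "__main__":
    # Hypothetical station identifiers and VM names, for illustration only.
    stations = ["station-A", "station-B", "station-C", "station-D", "station-E"]
    plan = assign_round_robin(stations, ["vm-1", "vm-2"])
    for vm, jobs in plan.items():
        print(vm, jobs)
```

Round-robin ignores how long each computation takes. A production setup would more likely track the load per VM, but that depends on details the slides do not give.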

SLIDE 14

Infrastructure for the European Network for Earth System Modelling (IS-ENES)

  • Spawned from work on EGI-EUDAT interoperability in WP7/WP8 and the ICOS Carbon Portal use case developed therein.
  • Goal: enabling computation on data stored in the Earth System Grid Federation (ESGF) infrastructure.
  • Calculations will be performed using the EUDAT General Execution Framework Workflow API (GEF) combined with EUDAT B2 services and EGI FedCloud
  • Results to be sent back to the climate4impact.eu platform

SLIDE 15

IS-ENES: Current situation in the Climate Research Community

  ◆ Substantial increase in the federated climate data archive volume
  ◆ Download locally then analyze: not a sustainable workflow!

SLIDE 16

SLIDE 17

ENES use case status

  • Simplified view of the steps of the current adapted use case:

1: Researcher finds data (e.g. via B2FIND) and provides a PID/URL.
2: Researcher prepares the configuration of the analysis that will be applied to the selected data using the GEF.
3: The GEF backend launches an EGI FedCloud VM and deploys a GEF Docker container. Calculations are executed based on input parameters. Output is stored in an EGI Volume.
4: Results are sent back to B2DROP for the researcher to download, or another GEF run is executed for further calculations or to generate a figure. (The resulting figure could be put into B2DROP.)

  • So far: automation of all steps is completed. Data currently comes from the ENES/ESGF data nodes, and will eventually come from B2SHARE. The prototype uses the dockerized jOCCI API for automatic instantiation of VMs.
  • Next steps:
    • All AAI aspects need to be revisited, as we use 3 infrastructures.
    • Better integration with EUDAT B2 services; the implementation is at a prototype stage with generic aspects still needing to be developed.
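The four steps of the ENES use case can be sketched as a small pipeline. The code below is a stand-in only: it does not call the GEF, B2FIND, B2DROP or the EGI FedCloud, and every class, function, PID and path in it is hypothetical. It merely records, in order, what a real backend would do at each step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalysisJob:
    """State carried through the four steps; all fields are illustrative."""
    dataset_pid: str                  # step 1: PID/URL provided by the researcher
    parameters: Dict[str, str]        # step 2: analysis configuration
    log: List[str] = field(default_factory=list)
    output_location: str = ""


def launch_and_compute(job: AnalysisJob) -> AnalysisJob:
    # Step 3 placeholder: a real backend would start a cloud VM and run the
    # containerised analysis; here we only record what would happen.
    job.log.append(f"launch VM and run analysis on {job.dataset_pid}")
    job.output_location = "volume://results/output.nc"  # hypothetical path
    return job


def deliver_results(job: AnalysisJob) -> AnalysisJob:
    # Step 4 placeholder: a real backend would upload the output for download.
    job.log.append(f"send {job.output_location} to researcher storage")
    return job


def run_workflow(dataset_pid: str, parameters: Dict[str, str]) -> AnalysisJob:
    """Run the four steps in order and return the final job state."""
    job = AnalysisJob(dataset_pid=dataset_pid, parameters=parameters)
    job.log.append(f"selected dataset {dataset_pid}")                  # step 1
    job.log.append(f"configured analysis with {sorted(parameters)}")   # step 2
    return deliver_results(launch_and_compute(job))


if __name__ == "__main__":
    result = run_workflow("hdl:00000/example", {"variable": "tas"})
    print("\n".join(result.log))
```

Keeping the job state in one object makes it straightforward to chain a second analysis on the output of the first, as step 4 allows.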

SLIDE 18

Prototype overview: Deploying GEF execution on EGI FedCloud

SLIDE 19

B2STAGE/B2SAFE architecture example

[Figure. Diagram labels: B2STAGE ingestion, B2SAFE replica rule, iRODS, third-party transfer.]

SLIDE 20

Use Case Pilot Challenges Encountered

  • Scaling up (esp. AAI interoperation)
  • Managing co-existing support systems and channels
  • User-friendly documentation often missing or lacking
  • Steep learning curve for the user communities
  • Substantial time and trust investment
  • 3rd party dependencies and technical problems
    • Globus Toolkit GridFTP → B2STAGE HTTPS API
    • Support of metadata handling for B2SAFE (GraphDB)
    • Large numbers of small files, to be used as input for further model runs, were a problem within EGI OneData prior to 17.06.0-beta2
  • Freedom of automation as an unforeseen requirement

SLIDE 21

Pilot concept interpretation mismatch

  • EGI and EUDAT developers eagerly reacting to feedback
  • Pilots are not free beta users for non-production-ready, undocumented new features
  • Other parties using T7.2 to find the right contact within the user community for their own agendas
  • Personal contacts still highly appreciated

SLIDE 22

Outcomes

  • General:
    • Feedback on data-handling support within the EGI DataHub
    • Testing EGI Federated Cloud with automatic submission
    • Data transfer tests between the VMs and B2STAGE instances, using both OneData and EGI DataHub to access a common storage for several VMs
    • Evaluating the new B2STAGE HTTP API
  • AAI interoperability:
    • Transparent access: see the EGI and EUDAT services as offered by a single infrastructure once authenticated
    • Access all (web + non-web) EGI and EUDAT services with the same credentials
    • Access delegation from one service to another
    • Data privacy considerations and policy harmonisation
    • AAI overview document created for understanding each other’s AAI layers, agreements on e.g. RCauth as common link
  • Establishing and revising common roadmap

SLIDE 23

Finalization and Deliverables

  • Finalize implementation of use cases with result evaluation
  • Description of work and dataflow for the use cases
  • Improved documentation
  • Input to AARC via questionnaires
  • EGI OneData demo video http://go.egi.eu/datahub
  • EGI Engage Deliverable D4.9 “Open Data Platform: Demonstrator, Experience Report and Use Cases”, covering part of the EGI work done within the ICOS use case
  • EUDAT2020 final report D7.4 (aim: end of Dec 2017)

SLIDE 24

Thank you!
