Report from VLDATA F2F meeting in London, G. Ganis, 16 June 2014



SLIDE 1

Report from VLDATA F2F meeting in London

  • G. Ganis, 16 June 2014
SLIDE 2

Reminder

  • End of May

○ R. Graciani / UB (DIRAC project leader) proposes to include a plan for some CernVM-FS developments of possible interest for us in a proposal (VLDATA) for the H2020 call EINFRA-1 {4,5}

  • 5 June

○ R. Graciani presents the project to us in a special SFT meeting; we decide to evaluate our possible contribution

  • 6-11 June

○ We have some internal discussions and prepare a draft document; we decide that I will attend the London F2F meeting

SLIDE 3

VLDATA Objectives

  • VLDATA = Collaboration among Technology providers, integrating existing technologies, to simplify the connection between Users and Resource providers.

  • VLDATA = open & generic platform supporting efficient and cost-effective solutions for large-scale distributed data processing, curation, analysis and publication.

  • User Community driven co-design, validated by end-users and supporting a new generation of data scientists.

  • Sustainability, increasing the user base by promoting VLDATA among other relevant communities.

June 12th, 2014 VLDATA

SLIDE 4

VLDATA

SLIDE 5

Make IT simple

  • Simplicity: VLDATA provides an abstraction of the different Resources, which are all made accessible to the end user via the same interfaces.

  • Transparency: Users are allowed to specify their Workflows/Pipelines with different levels of abstraction. The platform takes care of the necessary Resource Allocation to fulfill the required specifications.

  • Extendibility and flexibility: VLDATA provides an API that allows users to extend the provided functionality by developing new or customized components.

  • Reliability: Quality standards and extensive validation in several scientific domains to ensure the readiness-to-use and robustness of VLDATA based solutions.

  • Scalability: Modular implementation allowing horizontal (amount of connected Resources or Users) and vertical (amount of processed Units) scaling to adapt VLDATA to the needs of each particular community or Research Infrastructure project.

  • Smart and intelligent: building on collected experience and monitoring data, algorithms can look for optimized scheduling/searching strategies, including automated decision making based on usage traces and expectations.

  • Cost-effective: Building on existing well-established solutions and incrementally extending and developing them to address new challenges with an evolving, validated common solution, avoiding unnecessary duplicated efforts.

SLIDE 6

VLDATA or Open DISDATA

  • Main components

○ DIRAC

■ LHCb platform for distributed computing
■ Widely used in the EGI community
■ Expects to improve scalability and the capability to support multi-cores efficiently

○ SCI-BUS

■ SCIentific gateway Based User Support
■ Flexible solution for complex data processing workflows

SLIDE 7

EINFRA-1 {4,5}

(4) Large scale virtualization of data/compute centre resources to achieve on-demand compute capacities, improve flexibility for data analysis and avoid unnecessary costly large data transfers.

  • Budget: 15 EUR million in 2014

(5) Development and adoption of a standards-based computing platform (with open software stack) that can be deployed on different hardware and e-infrastructures (such as clouds providing infrastructure-as-a-service (IaaS), HPC, grid infrastructure …) to abstract application development and execution from available (possibly remote) computing systems. This platform should be capable of federating multiple commercial and/or public cloud resources or services and deliver Platform-as-a-Service (PaaS) adapted to the scientific community with a short learning curve. Adequate coordination and interoperability with existing e-infrastructures (including GÉANT, EGI, PRACE and others) is recommended.

  • Budget: 40 EUR million in 2015
  • Deadline: 2 September 2014, 17h CET
SLIDE 8

Work Packages (WP)

  • Development

○ WP1: Requirements Analysis & Design (Cardiff Univ.)
○ WP2: Framework and basic modules (Univ. Barcelona)
○ WP3: Compute Management (MTA SZTAKI)
○ WP4: Data Management (CYFRONET)
○ WP5: Quality Assessment (Univ Auton Barcelona)

  • Validation

○ WP6 (No leading institution appointed yet; INAF possible candidate)

  • Exploitation

○ WP7: Communication (IDGC)
○ WP8: Exploitation (Emergence Tech)
○ WP9: Internationalization (Univ Amsterdam)

SLIDE 9

WP2: Frameworks & Basic modules

1. Improve accessibility to secure interfaces, including new client libraries supporting standard Authentication mechanisms.
2. Optimize DB backend access including schema redesign, more efficient access patterns, intensive usage of caching and bulk operations and transparent replication.
3. Transparent access, in user space, to application software and data.
4. Enhance Task – Resource matching capabilities including extended semantics without losing in efficiency (> 1M Matches / day).
5. Improve overall resource status awareness including automated active and flexible end-to-end functional probes down to the execution node level.
6. Scale current WMS and File Catalog solution vertically and horizontally.
7. Produce High Availability solution, including live migration between geographically separated instances.
8. Fulfill quality and security standards defined by WP6.
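
For a rough sense of scale, the matching target quoted in item 4 (> 1M Matches / day) can be converted into a sustained rate. This is a minimal sketch, assuming a uniform load over 24 hours; that assumption and the function name are illustrative only and are not part of the proposal.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds


def sustained_rate(matches_per_day: float) -> float:
    """Average matches per second for a daily total, assuming uniform load."""
    return matches_per_day / SECONDS_PER_DAY


rate = sustained_rate(1_000_000)
print(f"{rate:.1f} matches/s")  # about 11.6 matches/s
```

Real workloads are bursty, so the peak rate the matcher must sustain would be higher than this average.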

SLIDE 10

CernVM-FS developments

  • 1. Unified access to all repositories

a. Exploit bootstrap repository cvmfs-config.cern.ch

  • 2. Easier access in user space

a. Improve efficiency of Parrot approach
b. Provide tools to set up efficient caching

  • 3. Distribution of proprietary or licensed software

a. Granular ACL and/or encryption of repositories

SLIDE 11

CernVM-FS work plan

1. Get familiar with CernVM-FS, Parrot and all the relevant packages (M3)
2. Design and prototype the required additions/modifications (M6)
3. Provide a proper, clean, production quality implementation, inclusive of all relevant tests (M12)
4. Provide relevant documentation and dissemination material (M14)
5. Follow closely early adoption, analyse feedback, implement consolidation changes (M24)

Request for 24 person months (Junior fellow)

SLIDE 12

CernVM developments

  • Multi-OS support for CernVM

○ Bootloader technology decouples OS from kernel
■ OS-on-demand; proven with SLC4 and SLC5
○ Needs tools and procedures to automate and simplify the addition of a new Linux flavour

SLIDE 13

CernVM work plan

1. Get familiar with the uCernVM technology (M3)
2. Design and prototype the required additions/modifications (M6)
3. Provide relevant documentation and dissemination material (M11)
4. Follow closely early adoption, analyse feedback, implement consolidation changes (M24)

Request for 24 person months (Junior fellow)

SLIDE 14

SFT commitment / request

  • Host the person(s)
  • Provide relevant training
  • Code subject to the same ownership and license terms as current CernVM software

SLIDE 15

LHCb online interest (N. Neufeld)

  • Study options for optimal fabric usage and management

○ Evaluate light-virtualization solutions (e.g. Linux containers, Docker)
○ Assess performance and operational issues, such as opportunistic use of facilities having a different main purpose (e.g. the LHCb online farm), ease of maintenance, turn-around times, data handling

  • 1 junior fellow (24 person months) fully funded by EC

○ Hosted by LHCb Online admin team (2 senior, 2 junior members, 1 or 2 engineers/physicists)
○ No additional financial needs (e.g. for hardware)

  • To be coordinated with Manchester/LHCb (Andrew McNab)
SLIDE 16

The London meeting: intentions

  • Objectives of the meeting

○ Define work plan
■ Work Packages + objectives & deliverables & milestones
○ Define contractors
○ First iteration on the budget
○ Define editors
○ Define contributors to the proposal text and calendar

Location: University of Westminster (Cavendish campus)

SLIDE 17

The London meeting: how it went

  • 17 attendees + ~10 remote

○ For 5 people it was the first direct interaction with the proposal

  • Morning

○ Presentation Tour-de-table (30’)
○ R. Graciani introduction (~the same given at SFT meeting)
○ Expression of interest for the various WPs

■ 1 slide each with detailed explanations and some discussion

  • Afternoon

○ Objectives definition for each WP

■ A lot of discussion, which took all the remaining time

  • Before leaving (17h15): draft schedule for writing and next virtual meetings

SLIDE 18

The London meeting: impressions

  • Programme too dense for a 7h meeting (including lunch)
  • Several parties still have to decide about participation
  • Some WPs still lack a well-defined list of objectives
  • The meeting was useful to identify weak points, but is the time left enough (10 weeks including vacations)?

○ This meeting should probably have taken place earlier

  • Indicated budget (10 MEur) large compared to EINFRA-1 {4,5}

○ Perhaps lack of required ‘political’ work
○ Competing with other big proposals (EUDAT, HelixNebula, …)

SLIDE 19

The London meeting: outcome

  • General consensus (during and after the meeting) that a global vision and a detailed list of concrete developments and their expected impact need to be clearly stated in the proposal

  • Editors need to be appointed

○ Essentially done after the meeting

  • Almost daily cross-checks from now until the draft completion deadline (July 11th)

SLIDE 20

What all this means for us

  • We are in a technical niche of WP2
  • We took the opportunity to write down in some detail what we need to do for some developments

  • We are ready to give our contribution
  • Coordination with PH-LBC
SLIDE 21

Writing Calendar

Meetings/Dates | Type | Date
Decide on Communities proposed by P. Kacsuk | Mail all | May 30th
Reorganization of Development Area | Virtual WPL(*) | June 2 - 6
First WP review: put in common contributions, all WP must provide a first draft, list of activities | Virtual WPLs | Jun 6th 14:00
WP integration: ensure consistency, third parties, budget | F2F WPLs | June 11 - 12
First draft of Sections: (1) Excellence, (2) Impact, (3) Implementation | Virtual Editors | June 27th 14:00
Complete proposal: full review | F2F Editors | July 9 - 10
Proofread: submit | Mail | July 11th
Proofread: received, full review, merging | Virtual Editors | July 25 - 30
External: submit | Mail | July 30th
External: received, full review, merging | Virtual Editors | August 18 - 29