The Helmholtz Association Project Large Scale Data Management and Analysis (LSDMA)


SLIDE 1

The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA)

Kilian Schwarz, GSI; Christopher Jung, KIT

SLIDE 2

2 05.10.2012 Christopher Jung SCC, KIT

Overview

  • Motivation
  • Data Life Cycle
  • LSDMA’s dual approach
  • Facts and Numbers
  • Initial Communities
  • LSDMA, FAIR and ALICE

SLIDE 3

Why is Scientific Big Data important?

Honestly, I do not need to explain this to you.

SLIDE 4

Examples of Scientific Big Data outside HEP

Examples of sciences with Big Data:

  • Systems Biology: ~10 TB per day in high-throughput microscopy (zebrafish embryos)
  • Climate simulation: 10-100 PB per year
  • Brain research: 1 PB per year for brain mapping
  • Photon Science: XFEL 10 PB/year
  • and many other sciences which do not know their needs yet
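
The rates above are quoted in mixed units (TB per day and PB per year). A minimal sketch of the conversion, assuming decimal units (1 PB = 1000 TB) and data taking on all 365 days of the year; neither assumption is stated on the slide, and the function names are purely illustrative:

```python
# Rough unit conversion for the data rates quoted on the slide.
# Assumptions (not from the slides): decimal units (1 PB = 1000 TB)
# and continuous data taking on all 365 days of the year.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def tb_per_day_to_pb_per_year(tb_per_day: float) -> float:
    """Convert a daily volume in TB to a yearly volume in PB."""
    return tb_per_day * DAYS_PER_YEAR / TB_PER_PB


def pb_per_year_to_tb_per_day(pb_per_year: float) -> float:
    """Convert a yearly volume in PB to an average daily volume in TB."""
    return pb_per_year * TB_PER_PB / DAYS_PER_YEAR


# Yearly volumes in PB, taken from the slide (systems biology is converted).
yearly_rates_pb = {
    "Systems biology (microscopy)": tb_per_day_to_pb_per_year(10),
    "Climate simulation (lower bound)": 10,
    "Climate simulation (upper bound)": 100,
    "Brain mapping": 1,
    "Photon science (XFEL)": 10,
}

for name, pb in yearly_rates_pb.items():
    tb_day = pb_per_year_to_tb_per_day(pb)
    print(f"{name}: {pb:.2f} PB/year, about {tb_day:.1f} TB/day")
```

Under these assumptions, ~10 TB per day in microscopy corresponds to a few PB per year, the same order of magnitude as the other examples.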

SLIDE 5

Challenges of Big Data

  • Scientific data is often not reproducible (or reproducible only at high cost)
  • Current analysis methods scale poorly
  • Existing big data knowledge in the respective fields
  • Each discipline has its specific needs
  • Multidisciplinary research
  • Metadata
  • Authentication and authorization (single sign-on)
  • Data privacy (incl. removal of private data)
  • “Good scientific practice”
  • Cost estimation for long-term archival (at different service levels)
  • Data preservation
  • Open Access

SLIDE 6

Data Life Cycle

Inspiration for LSDMA: support the whole data life cycle!

SLIDE 7

Dual approach: community-specific and generic

Data Life Cycle Labs (DLCLs)

  • Joint R&D with the scientific user communities

– Optimization of the data life cycle
– Community-specific data analysis tools and services

Data Services Integration Team (DSIT)

  • Generic R&D

– Interface between federated data infrastructures and DLCLs/communities
– Integration of data services into the scientific working process

SLIDE 8

Facts and numbers

  • Initial project period: 1.1.2012-31.12.2016
  • Funded by the Helmholtz Association (13 MEUR over 5 years)
  • To become part of the sustainable program-oriented funding of the Helmholtz Association in 2015
  • Partners: 4 Helmholtz research centers, 6 universities and the German climate research center
  • Leading project partner: KIT

SLIDE 9

Initial communities

  • Energy

– Smart grids, battery research, fusion research

  • Earth and Environment

– Climate model, environmental satellite data

  • Health

– Virtual human brain map

  • Key Technologies

– Synchrotron radiation, nanoscopy, systems biology, electron-microscopy imaging techniques

  • Structure of Matter

– Photon Science: PETRA III, XFEL
– FAIR@GSI (14 experiments with big and small communities)

SLIDE 10

LHC Computing – Prototype for FAIR

  • FAIR profits from the computing experience gained within an already running experiment
  • ALICE can test new developments in FAIR
  • New FAIR developments are on the way, and to some extent they already flow back to ALICE
  • FAIR will play an increasing role (funding, network architecture, software development and more ...)

SLIDE 11

Goals for GSI/FAIR in LSDMA

  • parallel and distributed computing
– triggerless “online” system
  • porting of needed algorithms to GPU
– Grid/Cloud infrastructure
  • enable the submission of compute jobs to Clouds
– create interfaces to existing environments (AliEn, ...)
  • data archives
– long-term data archives
  • including concepts for xrootd and gStore
– metadata catalog and data analysis

To be developed within LSDMA (DLCL: structure of matter) in collaboration with LSDMA – DSIT, the FAIR community, and ALICE (wherever synergy can be found)

  • Metropolitan Area Systems
– include the distributed FAIR T0/T1 centre into a global Grid/Cloud infrastructure
– Federated Identity Management
  • Global Federations
– Global File System
– Optimization of Data Storage
  • hot versus cold data
  • corrupt and incomplete data sets
  • parallel storage
  • 3rd party copy

Additional synergies via DSIT

SLIDE 12

Next Steps at GSI

  • Advertise LSDMA positions (2 for the FAIR DLCL): do you know any candidates?
– GSI DSIT has already started to hire people
  • Discussion with FAIR experiments and ALICE
  • Set-up of e-science infrastructures, first for PANDA and CBM, based on the experience with ALICE (AliEn/xrootd/...)
  • Include smaller FAIR experiments
  • Continue to develop the existing e-science infrastructure, also in close collaboration with DSIT and ALICE

SLIDE 13

Summary and Outlook

  • There are many challenges in Scientific Big Data
  • LSDMA is a sustainable Helmholtz Association project, supporting the whole data life cycle, using a community-specific and a generic approach
  • FAIR is an important initial community in the research field ‘structure of matter’; several developments planned -> synergies with ALICE
  • GSI has two open job positions for LSDMA