
The Helmholtz Association Project Large Scale Data Management and Analysis (LSDMA)



  1. The Helmholtz Association Project „Large Scale Data Management and Analysis“ (LSDMA) Kilian Schwarz, GSI; Christopher Jung, KIT

  2. Overview
     • Motivation
     • Data Life Cycle
     • LSDMA’s dual approach
     • Facts and Numbers
     • Initial Communities
     • LSDMA, FAIR and ALICE
     05.10.2012, Christopher Jung, SCC, KIT

  3. Why is Scientific Big Data important?
     Honestly, I do not need to explain this to you.

  4. Examples of Scientific Big Data in non-HEP fields
     Examples of sciences with Big Data:
     • Systems Biology: ~10 TB per day in high-throughput microscopy (zebrafish embryos)
     • Climate simulation: 10-100 PB per year
     • Brain research: 1 PB per year for brain mapping
     • Photon Science: XFEL, 10 PB per year
     • and many other sciences which do not know their needs yet
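The figures on this slide are quoted in mixed units (TB per day, PB per year). The following minimal sketch puts them on a common scale. It assumes decimal units (1 PB = 1000 TB) and continuous data taking on all 365 days of a year; neither assumption is stated on the slide, and the function name is illustrative only.

```python
# Convert the data volumes quoted on the slide to a common unit (PB per year).
# Assumptions (not from the slide): 1 PB = 1000 TB, data taking on 365 days per year.
TB_PER_PB = 1000
DAYS_PER_YEAR = 365


def tb_per_day_to_pb_per_year(tb_per_day: float) -> float:
    """Scale a daily volume in TB to a yearly volume in PB."""
    return tb_per_day * DAYS_PER_YEAR / TB_PER_PB


# Values taken from the slide; only the systems biology figure needs converting.
volumes_pb_per_year = {
    "Systems biology (high-throughput microscopy)": tb_per_day_to_pb_per_year(10),
    "Climate simulation (lower bound)": 10.0,
    "Climate simulation (upper bound)": 100.0,
    "Brain mapping": 1.0,
    "Photon science (XFEL)": 10.0,
}

# Print the communities ordered by yearly volume, smallest first.
for name, pb in sorted(volumes_pb_per_year.items(), key=lambda item: item[1]):
    print(f"{name}: {pb:.2f} PB/year")
```

Under these assumptions, ~10 TB per day corresponds to roughly 3.65 PB per year, which places high-throughput microscopy between brain mapping and XFEL.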

  5. Challenges of Big Data
     • Non-reproducibility of scientific data (or reproducibility only at high cost)
     • Current analysis methods scale poorly
     • Existing big data knowledge in the respective fields
     • Each discipline has its specific needs
     • Multidisciplinary research
     • Metadata
     • Authentication and authorization (single sign-on)
     • Data privacy (incl. removal of private data)
     • “Good scientific practice”
     • Cost estimation for long-term archival (at different service levels)
     • Data preservation
     • Open Access
     • …

  6. Data Life Cycle
     Inspiration for LSDMA: support the whole data life cycle!

  7. Dual approach: community-specific and generic
     Data Life Cycle Labs
     • Joint R&D with the scientific user communities
       – Optimization of the data life cycle
       – Community-specific data analysis tools and services
     Data Services Integration Team
     • Generic R&D
       – Interface between federated data infrastructures and DLCLs/communities
       – Integration of data services into the scientific working process

  8. Facts and numbers
     • Initial project period: 1.1.2012-31.12.2016
     • Funded by the Helmholtz Association (13 MEUR for 5 years)
     • To become a part of the sustainable program-oriented funding of the Helmholtz Association in 2015
     • Partners: 4 Helmholtz research centers, 6 universities and the German climate research center
     • Leading project partner: KIT

  9. Initial communities
     • Energy
       – Smart grids, battery research, fusion research
     • Earth and Environment
       – Climate model, environmental satellite data
     • Health
       – Virtual human brain map
     • Key Technologies
       – Synchrotron radiation, nanoscopy, systems biology, electron-microscopical imaging techniques
     • Structure of Matter
       – Photon Science: Petra 3, XFEL
       – FAIR@GSI (14 experiments with big and small communities)

  10. LHC Computing – Prototype for FAIR
     • FAIR profits from computing experience within an already running experiment
     • ALICE can test new developments in FAIR
     • New FAIR developments are on the way, and to some extent they already feed back into ALICE
     • FAIR will play an increasing role (funding, network architecture, software development and more ...)

  11. Goals for GSI/FAIR in LSDMA
     To be developed within LSDMA (DLCL: structure of matter) in collaboration with LSDMA-DSIT, the FAIR community, and ALICE (wherever synergy can be found)
     • Parallel and distributed computing
       – Triggerless “online” system
         • porting of needed algorithms to GPU
       – Grid/Cloud infrastructure
         • enable the possibility to submit compute jobs to Clouds
       – Create interfaces to existing environments (AliEn, ...)
     • Data archives
       – Long-term data archives
         • including concepts for xrootd and gStore
       – Metadata catalog and data analysis
     • Metropolitan Area Systems
       – Include the distributed FAIR T0/T1 centre into a global Grid/Cloud infrastructure
       – Federated Identity Management
     • Global Federations
       – Global File System
       – Optimization of Data Storage
         • hot versus cold data
         • corrupt and incomplete data sets
         • parallel storage
         • 3rd party copy
     Additional synergies via DSIT
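The slide lists “hot versus cold data” as one aspect of storage optimization without saying how the distinction would be made. As an illustration only, the sketch below splits files by the age of their last access. The 90-day threshold and the names `is_cold` and `split_hot_cold` are hypothetical and do not come from LSDMA, xrootd or gStore; the sketch also assumes the file system records access times, which is not the case on volumes mounted with options such as `noatime`.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400
COLD_AFTER_DAYS = 90  # hypothetical threshold, not taken from the slides


def is_cold(last_access: float, now: float,
            cold_after_days: float = COLD_AFTER_DAYS) -> bool:
    """Return True if the last access is older than the threshold."""
    return (now - last_access) > cold_after_days * SECONDS_PER_DAY


def split_hot_cold(root, now=None):
    """Walk a directory tree and sort regular files into hot and cold lists.

    Relies on st_atime, so the result is only meaningful where the
    file system actually updates access times.
    """
    now = time.time() if now is None else now
    hot, cold = [], []
    for path in Path(root).rglob("*"):
        if path.is_file():
            target = cold if is_cold(path.stat().st_atime, now) else hot
            target.append(path)
    return hot, cold
```

A call such as `hot, cold = split_hot_cold("/some/data/dir")` would give candidate lists for fast storage and for archival respectively; a production system would more likely derive access statistics from the storage system's own logs or metadata catalog than from file-system timestamps.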

  12. Next Steps at GSI
     • Advertise LSDMA positions (2 for FAIR DLCL)
       – Do you know candidates?
       – GSI DSIT already started to hire people
     • Discussion with FAIR experiments and ALICE
     • Set-up of e-science infrastructures, first for PANDA and CBM, based on the experiences with ALICE (AliEn/xrootd/...)
     • Include smaller FAIR experiments
     • Continue to develop existing e-science infrastructure, also in close collaboration with DSIT and ALICE

  13. Summary and Outlook
     • There are many challenges in Scientific Big Data
     • LSDMA is a sustainable Helmholtz Association project, supporting the whole data life cycle, using a community-specific and a generic approach
     • FAIR is an important initial community in the research field ‘structure of matter’; several developments are planned, with synergies with ALICE
     • GSI has two open job positions for LSDMA
