INSTITUT MAX VON LAUE - PAUL LANGEVIN
Data Management: report & news.
PaNDaaS WG 2nd meeting @ESRF
12th of Dec 2016 Jean-François Perrin (ILL)
Some Results: Dec 2012 – Dec 2016
Co-funded by the European Union: PaNData-Europe (Grant Agreement No 261537), PaNData-ODI (Grant Agreement No RI-283556)
available for users: search, access, annotate, archive, identify, publish, …
Based on the PaNData framework
associated on-line catalogue.
associated metadata and the analysis data is restricted to the experimental team for a period of 3 years. During the next 2 years, data are available on request. Thereafter, they become publicly accessible.
identifier (CC-BY licence).
https://www.ill.eu/DataPolicy
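The access timeline stated in the policy can be sketched as a small function. This is a minimal illustration, assuming the 3-year period is counted from the end of the experiment (the slide does not say where the clock starts). The names `access_level` and `add_years` and the status labels are hypothetical and are not part of the ILL portal.

```python
from datetime import date

RESTRICTED_YEARS = 3   # access limited to the experimental team
ON_REQUEST_YEARS = 2   # following period: data available on request


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def access_level(experiment_end: date, today: date) -> str:
    """Return the access status of a dataset on a given day.

    Assumption: both periods are counted from the end of the experiment.
    """
    restricted_until = add_years(experiment_end, RESTRICTED_YEARS)
    on_request_until = add_years(restricted_until, ON_REQUEST_YEARS)
    if today < restricted_until:
        return "restricted to experimental team"
    if today < on_request_until:
        return "available on request"
    return "public"
```

For example, under these assumptions `access_level(date(2013, 5, 1), date(2016, 12, 12))` falls in the "available on request" window.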
– User management of data access authorization.
– Users can decide to publish (open access) their data before the end of the embargo period.
– Linked to DOIs.
– Linked to experimental logs.
– Linked to user annotation tool.
– Linked with proposal system.
– Download of data.
– Full text search
landing page, …
Index all available information: Proposal, experimental report, data file annotation, publications, …
embargo
(at least through the portal)
concerning 376 unique datasets
Collaboration with DataCite/INIST
Linking data and people through ORCID/ResearcherID
Linking data with publications
“What are DOIs? What are you talking about?”
education. Need to fill the gap between what we hear in RDA-like meetings and the daily reality of the scientists. Still need to convince scientists that a change is happening regarding experimental data.
citations …
– CrossRef Cited-by linking (currently only for article publishers, not data publishers?), OpenAIRE.
– This is a business for the publishers.
– As of Dec 2016, we have collected fewer than 50 peer-reviewed articles referencing a data DOI.
– How many are we missing?
Need free access to this information for building metrics.
Not easily findable through most search services (WoS, Scopus, …). Only findable through Google Scholar.
Not findable at all
Should be the DOI of the article, instead of the one
This is by nature a long process, but given the level of investment needed, we need to convince people, and we urgently need evidence of success.
in scientific articles, through DOIs, is recently improving.
publishers http://www.elsevier.com/?a=57755
get a DOI for experiment XYZ?”
Figure: % of ILL users' publications citing the data sets through DOIs, per year (2012 to 2016).
Scientists' name disambiguation:
Cite as
diffraction dataset to determine the weld fusion zone shape on residual stress in submerged arc welding [Data set]. Zenodo. http://doi.org/10.5281/zenodo.165765
Instead of
WITHERS Philip J.; ISHIGAMI Atsushi; PIRLING Thilo; ROY Matthew and WALSH Joanna. (2014). The effect
heat input welding of steel. Institut Laue-Langevin (ILL) doi:10.5291/ILL-DATA.1-02-145
Licence ?
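The two citations above write their DOIs in different forms (a resolver URL and a `doi:` prefix), which makes it harder to match publications against datasets. A small sketch of how such strings could be normalised before comparison follows. The helper names `normalise_doi` and `doi_url` are hypothetical; lower-casing relies on DOI names being case-insensitive.

```python
import re

# Decorations commonly found in front of a DOI name.
_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalise_doi(text: str) -> str:
    """Strip the resolver URL or 'doi:' prefix and lower-case the DOI name."""
    return _PREFIX.sub("", text.strip()).lower()


def doi_url(text: str) -> str:
    """Build a resolver URL for a DOI written in any of the accepted forms."""
    return "https://doi.org/" + normalise_doi(text)


print(normalise_doi("http://doi.org/10.5281/zenodo.165765"))
print(doi_url("doi:10.5291/ILL-DATA.1-02-145"))
```

With both forms reduced to the bare DOI name, a citation of the Zenodo copy and a citation of the ILL dataset can at least be told apart reliably, which is the point the slide makes.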
Figure: Volume of experimental data / cycle (TB), 2000 to 2017. Series: Raw (TB), Processed (TB), Forecast (TB).
Evaluation of new detectors, leading to permanent instruments starting from Dec 2016.
Moving to list mode (vs histogram mode).
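To illustrate why moving to list mode increases data volume, the sketch below contrasts the two storage styles: list mode keeps one record per detected event, while histogram mode keeps only counts per bin. The event layout (pixel id, arrival time in microseconds) and the values are illustrative assumptions, not the ILL file format.

```python
from collections import Counter

# Hypothetical list-mode data: one (pixel_id, time_us) record per detected event.
events = [(3, 120), (3, 450), (7, 455), (3, 900), (1, 1310), (7, 1500)]


def to_histogram(event_list, n_pixels):
    """Collapse list-mode events into counts per pixel (timing is discarded)."""
    counts = Counter(pixel for pixel, _ in event_list)
    return [counts.get(p, 0) for p in range(n_pixels)]


histogram = to_histogram(events, n_pixels=8)
print(histogram)  # [0, 1, 0, 3, 0, 0, 0, 2]
```

The histogram size is fixed by the number of bins, whereas the list-mode size grows with the number of events, which is why list mode drives the storage forecast up while preserving information that can be re-binned later.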
– ILL archive capacity & performance
– Users’ storage becoming almost impossible
– Today, how to carry 40 TB to 10 different labs?
– Why carry them?
– Almost impossible in most users’ labs with such data sets.
– 32 direct (h-index 4) peer-reviewed articles published
– 2 PhD theses
– 10+ international conferences
– …
Example of the EXILL campaign
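To put the 40 TB question in numbers, here is a back-of-the-envelope sketch of network transfer time. The link speeds and the 70 % efficiency factor are assumptions chosen for illustration, not measured ILL figures, and `transfer_days` is a hypothetical helper.

```python
def transfer_days(volume_tb: float, link_gbit_s: float, efficiency: float = 0.7) -> float:
    """Days needed to move volume_tb (decimal terabytes) over one link.

    efficiency is the assumed fraction of the nominal link speed actually achieved.
    """
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbit_s * 1e9 * efficiency)
    return seconds / 86_400


for speed in (0.1, 1.0, 10.0):  # assumed link speeds in Gbit/s
    print(f"{speed:>5} Gbit/s: {transfer_days(40, speed):.1f} days per lab")
```

Under these assumptions a single copy takes roughly 5 days at 1 Gbit/s, and the transfer has to be repeated for each of the 10 labs, before any question of local storage or processing capacity arises.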
(data, software, IT capacity and expertise) remotely using standard tools (ideally only web browser).
1) The user connects remotely using their web browser and credentials (Federated IM).
2) They then select one of the experiments they have performed from the list.
3) They are then connected to a service where the necessary analysis applications have been installed and configured for direct access to the experimental data.
4) If necessary, they can receive help and support from a facility expert during the analysis.
5) Analysis data are published.
As of Dec 2016
– LAMP, Mantid, Matlab through a private cloud + remote desktop
– OpenAIRE/DataCite: help us to communicate, collect metrics of data usage
– GEANT: Global AAI? Hybrid-Cloud?
– EGI/EUDAT: ???
solve.
– DaaS (volume and ease), analysis preservation, metrics
Contact: data@ill.eu
Portal: https://data.ill.eu
Policy: https://www.ill.eu/DataPolicy
PaNData Collaboration: http://pan-data.eu