Investigation as a member of research discourse
Vasily Bunakov
Science and Technology Facilities Council, United Kingdom
Digital Libraries: Advanced Methods and Technologies, Digital Collections. Dubna, Russia, October 16, 2014

STFC
Funds and operates large scale instruments for the UK and visitor researchers in:
- physics, astronomy
- chemistry, materials
- biology, medicine

Scientific Computing
Develops and operates computing infrastructure:
- High Performance Computing
- Petabyte data store
- CERN LHC Tier 1 hub
Also conducts applied research and does software development.

STFC Facilities Support
- Diamond Light Source
- ISIS neutron and muon source
- Central Laser Facility
Big Facilities for Small Science
Facilities science in Europe
PaNdata Europe, 2010 – 2011. Preparation: common policies and standards. http://pan-data.eu/pandata/?q=PaNdataEurope
PaNdata ODI, 2011 – 2014. Implementation: delivering new infrastructure. http://pan-data.eu/pandata/?q=ODIWP
Computing support throughout the scientific lifecycle
STFC Scientific Computing supports each stage in the work of researchers from background research through conducting simulations and experiments, to analysing and archiving data.
Figure labels: JISCMail, Grid Computing, e-pubs, Grid Computing.
Facilities Research Lifecycle
- Proposal: scientist submits application for beamtime
- Approval: facility committee approves application
- Scheduling: facility registers, trains, and schedules scientist's visit
- Experiment: scientist visits, facility runs experiment
- Data storage: raw data filtered and stored
- Record Publication: subsequent publication registered with facility
- Data analysis: tools for processing made available
ICAT data catalogue software: http://code.google.com/p/icatproject/
A corresponding intellectual entity: Investigation
DOIs for experimental data
www.DataCite.org Much cheaper DOIs than directly from DOI Foundation
LCDP 2013
Our DOI landing pages are in fact for Investigations (series of Experiments)
ICAT data catalogue called from DOI landing page
https://data.isis.stfc.ac.uk/TOPCATWeb.jsp#view///&&tab=Search//&Model=INVESTIGATION&ServerName=ISIS&InvestigationId=24071239
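The landing page URL carries the catalogue lookup in its fragment (the part after `#`). As a minimal sketch, assuming the fragment can be read as `&`-separated `key=value` pairs, the Investigation identifier could be pulled out with the Python standard library. The function name `investigation_params` is hypothetical and not part of ICAT or TopCAT.

```python
from urllib.parse import urlsplit, parse_qs


def investigation_params(landing_url: str) -> dict:
    """Return the key=value pairs carried in the URL fragment."""
    fragment = urlsplit(landing_url).fragment
    # parse_qs silently skips pieces without '=' (such as 'view///')
    # and empty pieces (from '&&') when strict parsing is off.
    pairs = parse_qs(fragment)
    return {key: values[0] for key, values in pairs.items()}


url = ("https://data.isis.stfc.ac.uk/TOPCATWeb.jsp#view///&&tab=Search//"
       "&Model=INVESTIGATION&ServerName=ISIS&InvestigationId=24071239")
params = investigation_params(url)
print(params["Model"], params["ServerName"], params["InvestigationId"])
# prints: INVESTIGATION ISIS 24071239
```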
What can cite what
Citations: “from” column, “to” row (V = yes, X = no)

               Publication  Investigation  Dataset                             Software
Publication    V            V              V                                   V
Investigation  V            V              V                                   V
Dataset        X            V              V (derived or aggregated datasets)  V (simulation)
Software       V            X              V (testing)                         V (software libraries, service calls)
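The citation table can be held as a small lookup structure. The sketch below assumes that the row entity is the citing one and the column entity the cited one; that reading is an assumption, chosen because the Publication and Investigation rows are the ones filled entirely with V. The names `ENTITIES`, `MATRIX`, `cell` and `full_rows` are illustrative only.

```python
ENTITIES = ["Publication", "Investigation", "Dataset", "Software"]

# One list per table row, in the column order of ENTITIES.
# True stands for "V", False for "X".
MATRIX = {
    "Publication":   [True, True, True, True],
    "Investigation": [True, True, True, True],
    "Dataset":       [False, True, True, True],
    "Software":      [True, False, True, True],
}


def cell(row: str, column: str) -> bool:
    """Look up one cell of the table by row and column label."""
    return MATRIX[row][ENTITIES.index(column)]


def full_rows() -> list:
    """Rows whose every cell is V (assumed: entities that can cite all others)."""
    return [row for row in ENTITIES if all(MATRIX[row])]


print(full_rows())
# prints: ['Publication', 'Investigation']
```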
Publication and Investigation similarity
No  Feature / aspect                                                        Publication  Investigation
1   Is an intellectual entity                                               V            V
2   Is a subject of peer review                                             V            V (via proposal approval)
3   Can cite all significant intellectual entities of a research discourse  V            V
4   Citation chains (steps of discourse) observed                           V            V
5   Universal identifiers “mints” available                                 V            V

This gives Investigation a potential for a “full membership” in the research discourse along with Publication. Datasets and software are likely to remain “associated members” because of weaker features 2, 3 and a de-facto weaker feature 5.
Publications and investigations network
Bibliographic reference in ICAT data catalogue (a few thousand records) vs. ePubs publications repository reference:

- ICAT: Pratt et al, Phys. Rev. Lett. 96, 247203 (2006)
  ePubs: Phys Rev Lett 96 247203 (2006)
- ICAT: Lancaster et al, Phys. Rev B73, 020410(R) (2005)
  ePubs: Phys Rev B 73 020410 (2006)
- ICAT: Blundell and Pratt, J. Phys.: Condens. Matter 16, R771 (2004)
  ePubs: J Phys Condens Matter 16 R771-R828 (2004)
- ICAT: M.T.F.Telling and S.H.Kilcoyne, Electron transfer in dextran, J. Phys.: Condens. Matter 19 No 2 (17 January 2007)
  ePubs: J Phys Condens Matter 19 2 026221 (2007)
- ICAT: J Tomkinson and M.T.F Telling, Ammonium ions in alkali metal halide crystals: Tunnelling and spin relaxation, PCCP 2006 8 38 4434
  ePubs: Phys Chem Chem Phys 8 4434-4440 (2006)
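The reference pairs above differ in punctuation, spacing ("B73" vs "B 73"), author prefixes and sometimes even the year. One possible way to score such pairs, shown here only as a sketch and not as the matching method actually used, is token containment: the share of tokens of the shorter ePubs reference that also occur in the free-text ICAT reference. The function names are hypothetical.

```python
import re


def tokens(reference: str) -> set:
    """Lowercase the text and split it into runs of letters and runs of digits."""
    # Letters and digits are separated, so "B73" and "B 73" give the same tokens.
    return set(re.findall(r"[a-z]+|\d+", reference.lower()))


def containment(short_ref: str, long_ref: str) -> float:
    """Fraction of the short reference's tokens that appear in the long one."""
    short = tokens(short_ref)
    if not short:
        return 0.0
    return len(short & tokens(long_ref)) / len(short)


pratt = containment("Phys Rev Lett 96 247203 (2006)",
                    "Pratt et al, Phys. Rev. Lett. 96, 247203 (2006)")
lancaster = containment("Phys Rev B 73 020410 (2006)",
                        "Lancaster et al, Phys. Rev B73, 020410(R) (2005)")
print(pratt, lancaster)
```

The first pair scores 1.0. The second scores below 1.0 only because the years differ, so a threshold under 1.0 would be needed to accept it. Abbreviated journal names such as "PCCP" would need a separate abbreviation table, which this sketch does not attempt.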
How to link Publications and Investigations?
Beyond bibliographic records matching
A few thousand publications already mapped to the publications repository at the previous stage could be used for tuning and testing machine learning techniques.
Publications repository records and their DOI landing pages:
- Instruments / departments tags
- Authors
- Publication title
- Authors’ organizations
- Publication date
- Abstract
- Keywords, e.g. PACS indices

Data catalogue records (investigation descriptions):
- Instruments to investigations mapping
- Researchers’ names
- Investigation title
- Researchers’ organizations
- Investigation period
- Investigation description
- ICAT keywords
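The two field lists line up pairwise, which suggests turning each (publication, investigation) pair into a small feature vector that a learning method could be trained on. The sketch below is an illustration under stated assumptions: the record classes, field names, feature choices and sample values are all hypothetical and do not come from ePubs or ICAT.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Publication:
    title: str
    authors: list
    organizations: list
    published: date


@dataclass
class Investigation:
    title: str
    researchers: list
    organizations: list
    end: date  # last day of the investigation period


def jaccard(a, b) -> float:
    """Size of the intersection over size of the union of two collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def lowered(items) -> list:
    return [item.lower() for item in items]


def pair_features(pub: Publication, inv: Investigation) -> dict:
    """Similarity features for one candidate publication-investigation link."""
    return {
        "name_overlap": jaccard(lowered(pub.authors), lowered(inv.researchers)),
        "org_overlap": jaccard(lowered(pub.organizations), lowered(inv.organizations)),
        "title_overlap": jaccard(pub.title.lower().split(), inv.title.lower().split()),
        # Positive when the publication appeared after the investigation ended.
        "years_after": (pub.published - inv.end).days / 365.25,
    }


# Made-up records, for illustration only.
pub = Publication("Spin relaxation in a model magnet",
                  ["A. Example", "B. Sample"], ["STFC"], date(2006, 6, 1))
inv = Investigation("Muon study of spin relaxation",
                    ["a. example"], ["STFC"], date(2005, 6, 1))
features = pair_features(pub, inv)
print(features)
```

Pairs already confirmed by bibliographic matching could serve as positive training examples for whichever classifier is chosen.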
What represents facilities research?
- Yesterday: publications
- Today: publications and data (in fact, Investigations)
- Tomorrow: “facility-centric” Linked Open Data cloud

Figure labels: Publications catalogue; Experiment descriptions catalogue; Publications; Research Awards; Experiments; Selected ontologies; Selected external Linked Data.
Linked Data technology stack

Components shown in the diagram:
- Facilities user community
- Bespoke / customized software applications
- Fuseki Web application
- Bespoke (Jena) Web application
- SPARQL (can be from a remote client)
- Linked Data API
- Command line tools (ARQ, loaders, optimizers, …)
- Triple store (Jena TDB)
- Semantically enriched data
- Harvesters & Mappers
- RDF extractors and loaders
- Data cleansers and mappers with vocabularies, ontologies, geolocation services and other Linked Data sources
- Data converters
- OAI-PMH Linked Data wrappers
- Database Linked Data wrappers
- OAI-PMH sources
- Databases
- Other triple stores, SPARQL endpoints and Linked Data APIs

Legend: prospective components; implemented or evaluated components.
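The stack notes that SPARQL can be issued from a remote client. As a rough sketch, a remote client could build a SPARQL Protocol GET request against a Fuseki endpoint as below. The endpoint URL (including the dataset name `facility`) is hypothetical, and the use of the CiTO property `cito:cites` to link Investigations to Publications is an assumption about how the data might be modelled, not something the slides state. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a local Fuseki server with a dataset named "facility".
ENDPOINT = "http://localhost:3030/facility/sparql"

# Assumed modelling: an Investigation cites a Publication via cito:cites.
QUERY = """
PREFIX cito: <http://purl.org/spar/cito/>
SELECT ?investigation ?publication
WHERE { ?investigation cito:cites ?publication . }
LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


request_url = sparql_get_url(ENDPOINT, QUERY)
print(request_url)
```

The resulting URL could then be fetched with any HTTP client, asking for a results format such as `application/sparql-results+json` in the `Accept` header.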
Information entities circulating in your research domain
- What are they? (beyond publications)
- Do they have a clear identity?
- Do they circulate in your organization only or universally across organizations?
- Can they be linked with publications and other information entities?
- Can they be linked with the world-wide data cloud?
Scientific Computing Department