Documenting Heritage Science: A CIDOC CRM-based System - PowerPoint PPT Presentation





SLIDE 1

Documenting Heritage Science: A CIDOC CRM-based System for Modelling Scientific Data

Lisa Castelli | INFN, Italy Achille Felicetti | PIN, University of Florence, Italy

SLIDE 2

Apologies

I’m NOT A DIGITAL WOMAN… For any questions about the technical details of the model, please contact my colleague Achille Felicetti (achille.felicetti@gmail.com).

SLIDE 3

Apologies

I’m NOT A DIGITAL WOMAN… WHY AM I HERE? INFN-CHNet in ARIADNE plus

SLIDE 4

INFN-CHNet

INFN-CHNet: born to coordinate the cultural heritage activities of INFN facilities, it is opening to partners with different competencies (restoration centres, archaeology/chemistry departments in universities, …)

SLIDE 5

INFN-CHNet

Fixed Labs

Medium-large scale facilities (IBA, 14C, ...), TL dating, X-ray imaging, X-ray imaging, Mass Spectrometry

SLIDE 6

INFN-CHNet

Fixed Labs

Medium-large scale facilities (IBA, 14C, ...), TL dating, X-ray imaging, X-ray imaging

Mobile Labs

XRD, XRF, Mass Spectrometry, Thermography

SLIDE 7

INFN-CHNet

Fixed Labs

Medium-large scale facilities (IBA, 14C, ...), TL dating, X-ray imaging, X-ray imaging

Mobile Labs

XRD, XRF, Mass Spectrometry

Digital Labs

Thermography, Data storage and fruition, Web tools for data fruition

SLIDE 8

CHNet-DIGILAB: an ideal world

Diagram labels: CHNet database @CNAF; user access control; Services: browse/query interfaces, fruition tools, analysis tools; data + metadata; WEB; non-expert user: fruition; expert user: fruition & analysis
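
The diagram separates what non-expert and expert users can reach behind the access-control layer. The Python sketch below encodes that split as a permission table; the role names, service names and the `can_use` helper are hypothetical and only mirror the diagram labels, not the actual CHNet implementation.

```python
from enum import Enum


class Role(Enum):
    """User profiles from the DIGILAB diagram (hypothetical encoding)."""
    NON_EXPERT = "non-expert"
    EXPERT = "expert"


# Hypothetical permission table read off the diagram labels:
# non-expert users get fruition, expert users get fruition and analysis.
ALLOWED_SERVICES = {
    Role.NON_EXPERT: frozenset({"browse", "fruition"}),
    Role.EXPERT: frozenset({"browse", "fruition", "analysis"}),
}


def can_use(role: Role, service: str) -> bool:
    """Return True if users with this role may access the named service."""
    return service in ALLOWED_SERVICES.get(role, frozenset())


if __name__ == "__main__":
    print(can_use(Role.NON_EXPERT, "analysis"))  # False
    print(can_use(Role.EXPERT, "analysis"))  # True
```

Keeping the permissions in one table means a new service or user profile only changes data, not the checking logic.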

SLIDE 9

ARIADNE plus

  • Realization of a complex Digital Infrastructure
  • 40 partners from all over Europe!
  • Archaeology + historical sources, linguistic data, catalographic data, scientific data from archaeometric analyses
  • Innovative Services and Cloud Environment
  • Data annotation (image and text)
  • More advanced NLP and text mining (machine learning approach)
  • Definition of unambiguous spatial and temporal entities for archaeology
  • Geographic system: interoperability of national geographical systems and GIS, integrated into a single transnational system (ARIADNE GeoServer)
  • Cloud environment: shared resources, efficient management of services
  • Implementation of visualization and analysis services, extended to scientific data, high-resolution images and online 3D visualization

SLIDE 10

Goals of DIGILAB (CHNet and more…)

  • Interoperability between different Heritage Science communities
  • Data from analyses, conservation and restoration activities
  • Cross-disciplinary information integration
  • Interoperability with Humanities (History, Archaeology, Art History, […])
  • System for discovering, accessing and reusing integrated data and services
  • A solid metadata model for data and service description is required
  • Design of the integrated system
SLIDE 11

Existing schemas and metadata models

  • International metadata standards for scientific research
  • CERIF (EU recommended format), OBOE (scientific observation and measurement), NeXus (neutron and X-ray), AVM (astronomical images), CIF (crystallography), […]
  • Formats in use by national labs/institutions (Italy)
  • Institute for the Conservation and Valorisation of Cultural Heritage (ICVBC-CNR): XRD data in XRDML format, CIF taxonomy of crystallographic terms
  • National Institute for Nuclear Physics (INFN) CHNet: C14 binary data + data in Excel tables, XRF data in RAW format
  • Istituto Superiore per la Conservazione ed il Restauro (ISCR): multispectral information in ICCD-based format
  • Opificio delle Pietre Dure (OPD): photographs, radiographs, 3D models in various formats
  • National Institute of Optics (CNR-INO): multispectral and OCT data in RAW and TIFF formats
  • Institute of Molecular Science and Technologies (ISTM-CNR): MOVIDA software (multi-technique diagnostics, information in relational database) -> MOLAB
  • Formats used by European labs/institutions: investigation planned for the next period
SLIDE 12

Existing schemas and metadata models

  • International metadata standards for scientific research
  • CERIF (EU recommended format), OBOE (scientific observation and measurement), NeXus (neutron and X-ray), AVM (astronomical images), CIF (crystallography), […] -> too general/too specific
  • Formats in use by national labs/institutions (Italy)
  • Institute for the Conservation and Valorisation of Cultural Heritage (ICVBC-CNR): XRD data in XRDML format, CIF taxonomy of crystallographic terms
  • National Institute for Nuclear Physics (INFN) CHNet: C14 binary data + data in Excel tables, XRF data in RAW format
  • Istituto Superiore per la Conservazione ed il Restauro (ISCR): multispectral information in ICCD-based format
  • Opificio delle Pietre Dure (OPD): photographs, radiographs, 3D models in various formats
  • National Institute of Optics (CNR-INO): multispectral and OCT data in RAW and TIFF formats
  • Institute of Molecular Science and Technologies (ISTM-CNR): MOVIDA software (multi-technique diagnostics, information in relational database) -> MOLAB
  • Formats used by European labs/institutions: investigation planned for the next period
SLIDE 13

Existing schemas and metadata models

  • International metadata standards for scientific research
  • CERIF (EU recommended format), OBOE (scientific observation and measurement), NeXus (neutron and X-ray), AVM (astronomical images), CIF (crystallography), […] -> too general/too specific
  • Formats in use by national labs/institutions (Italy)
  • Institute for the Conservation and Valorisation of Cultural Heritage (ICVBC-CNR): XRD data in XRDML format, CIF taxonomy of crystallographic terms
  • National Institute for Nuclear Physics (INFN) CHNet: C14 binary data + data in Excel tables, XRF data in RAW format
  • Istituto Superiore per la Conservazione ed il Restauro (ISCR): multispectral information in ICCD-based format
  • Opificio delle Pietre Dure (OPD): photographs, radiographs, 3D models in various formats
  • National Institute of Optics (CNR-INO): multispectral and OCT data in RAW and TIFF formats
  • Institute of Molecular Science and Technologies (ISTM-CNR): MOVIDA software (multi-technique diagnostics, information in relational database) -> MOLAB
  • Formats used by European labs/institutions: investigation planned for the next period

A chaos!!!

SLIDE 14

“Going semantics”: the CIDOC CRM family

  • CIDOC CRM (ISO 21127:2006) - http://cidoc-crm.org
  • High-level conceptual model for standardisation, integration and interoperability of Cultural Heritage information
  • Core model + domain-specific extensions
  • CRMsci - http://cidoc-crm.org/crmsci
  • Scientific observation information
  • CRMdig - http://cidoc-crm.org/crmdig
  • Provenance metadata for digital objects
  • CRMpe
  • PARTHENOS Entities Model
  • Datasets, services and curators
  • Versioning, accessibility, licensing …
  • PARTHENOS Registry
SLIDE 15

Basis of the CIDOC CRM model

Diagram labels: Campaign, Measurement, Analysis; Devices (D8), Actor (E39), Places (E53)/Time (E52), Man-made object (E22), Datasets (PE18), Activity (E7), Legal Bodies (E40)

SLIDE 16

Basis of the CIDOC CRM model

Diagram labels: Campaign, Measurement, Analysis (relation label: "belong to"); Devices (D8), Actor (E39), Places (E53)/Time (E52), Man-made object (E22), Datasets (PE18), Activity (E7), Legal Bodies (E40)
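
The slides only name the building blocks. As an illustration, they could be held in Python dataclasses like the ones below, assuming that Measurement and Analysis point to the activity they "belong to" (ultimately a Campaign) and that each linked entity carries the CIDOC CRM family class it maps to. Class, field and function names are hypothetical, not taken from the model's schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A linked entity, tagged with the CIDOC CRM family class it maps to."""
    label: str
    crm_class: str  # e.g. "E39 Actor", "D8 Digital Device", "PE18 Dataset"


@dataclass
class Activity:
    """Shared shape for Campaign, Measurement and Analysis (E7 Activity)."""
    label: str
    kind: str  # "Campaign", "Measurement" or "Analysis"
    actors: list[Entity] = field(default_factory=list)    # E39 Actor / E40 Legal Body
    devices: list[Entity] = field(default_factory=list)   # D8 Digital Device
    objects: list[Entity] = field(default_factory=list)   # E22 Man-Made Object
    datasets: list[Entity] = field(default_factory=list)  # PE18 Dataset
    place: Optional[Entity] = None        # E53 Place
    time_span: Optional[str] = None       # E52 Time-Span, as an ISO 8601 string
    belongs_to: Optional[Activity] = None  # the hypothetical "belong to" link


def campaign_of(activity: Activity) -> Optional[Activity]:
    """Follow 'belong to' links upwards until a Campaign is reached."""
    current: Optional[Activity] = activity
    while current is not None and current.kind != "Campaign":
        current = current.belongs_to
    return current


if __name__ == "__main__":
    lab = Entity("INFN-CHNet", "E40 Legal Body")
    campaign = Activity("Campaign 1", "Campaign", actors=[lab])
    measurement = Activity("XRF scan", "Measurement", belongs_to=campaign)
    analysis = Activity("Spectrum fit", "Analysis", belongs_to=campaign)
    for item in (measurement, analysis):
        print(item.label, "->", campaign_of(item).label)
```

Using one `Activity` shape for all three levels keeps the "belong to" chain uniform, so a lookup like `campaign_of` works whatever the nesting depth.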

SLIDE 17

Our model

Diagram labels: Campaign, Measurement, Analysis; Devices, Actor, Places/Time, Man-made object, Datasets, Legal Bodies; output: XML file of metadata
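
The model's output is an XML metadata file. Since the actual schema is not reproduced in these slides, the following Python sketch uses invented placeholder tags (`campaign`, `measurement`, `actor`, `device`, `place`, `date`, `dataset`) purely to show how such a file can be generated with the standard library's `xml.etree.ElementTree`; the sample values are made up.

```python
import xml.etree.ElementTree as ET


def build_metadata(campaign: str, measurement: str, actor: str,
                   device: str, place: str, date: str, dataset_uri: str) -> str:
    """Serialise one measurement description as XML.

    Tag and attribute names are hypothetical placeholders,
    not the tags of the actual model schema.
    """
    root = ET.Element("campaign", name=campaign)
    meas = ET.SubElement(root, "measurement", name=measurement)
    ET.SubElement(meas, "actor").text = actor
    ET.SubElement(meas, "device").text = device
    ET.SubElement(meas, "place").text = place
    ET.SubElement(meas, "date").text = date         # ISO 8601 date string
    ET.SubElement(meas, "dataset", uri=dataset_uri)  # link to the raw data
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_metadata("Campaign 1", "XRF scan", "INFN-CHNet",
                         "XRF scanner", "Florence", "2019-05-14",
                         "https://example.org/datasets/42"))
```

A script of this kind is the sort of thing that could sit behind the "semi-automatic" metadata creation mentioned for the model, filling the fields from device output instead of manual input.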

SLIDE 18

Our model for scientific dataset description

  • Reuses the logic and components of existing CIDOC models
  • Modular approach
  • “Complexity hiding” paradigm in content generation
  • User-friendly interfaces for rapid manual and semi-automatic metadata creation
  • Scripts for automatic generation of information (e.g. from digital devices) or from existing metadata, in a transparent way

  • Semantic network for advanced queries (CIDOC CRM modelling principles)
  • Properties and relationships to link entities in a meaningful way
  • Platform independent intermediate meta-format (XML)
  • Dynamic mapping and re-encoding in existing standards (CSV, SQL, RDF, JSON …)
  • Mapping to CIDOC CRM for HS + CH interoperability implementation
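
Because the XML is only an intermediate format, re-encoding it into other standards is mostly a matter of flattening and serialising. A minimal Python sketch for two of the targets (JSON and CSV), using a small XML sample with invented placeholder tags; the flattening rule (one row per `measurement`, one column per child tag) is an assumption for illustration.

```python
from __future__ import annotations

import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical intermediate XML with placeholder tag names.
SAMPLE_XML = """
<campaign name="Campaign 1">
  <measurement name="XRF scan 1"><device>XRF scanner</device><date>2019-05-14</date></measurement>
  <measurement name="XRF scan 2"><device>XRF scanner</device><date>2019-05-15</date></measurement>
</campaign>
"""


def measurements(xml_text: str) -> list[dict[str, str]]:
    """Flatten each <measurement> into a plain dict (the format-neutral step)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for m in root.iter("measurement"):
        row = {"campaign": root.get("name", ""), "measurement": m.get("name", "")}
        for child in m:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


def to_json(rows: list[dict[str, str]]) -> str:
    return json.dumps(rows, indent=2)


def to_csv(rows: list[dict[str, str]]) -> str:
    buf = io.StringIO()
    fieldnames = sorted({key for row in rows for key in row})  # stable column order
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = measurements(SAMPLE_XML)
    print(to_json(rows))
    print(to_csv(rows))
```

Each target format only needs its own small serialiser on top of the shared flattened rows, which is what keeps the XML platform independent.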
SLIDE 19

Entities: standardising the content

  • Standard notations, Gazetteers, PIDs, Permalinks, URIs, URLs
  • Actors and Artefacts: VIAF/ORCID, DBpedia URIs, authority files, controlled lists of institution/person names
  • Time Spans and Periods: ISO 8601 encoding, xsd:date/xsd:dateTime formats, PeriodO URIs (period names and time spans)
  • Places: WGS84 (spatial coordinates), GeoNames URIs (place identification), Pleiades/Pelagios URIs/URLs (historical places)
  • Persistent Identifiers: unambiguous, permanent identification of entities

  • Thesauri, vocabularies, controlled lists of types
  • Widely used in Cultural Heritage
  • AAT (Getty’s Art and Architecture Thesaurus), Nomisma.org (numismatics), PICO (Italian ICCD thesaurus), […]

  • Examples in science: CIF taxonomy of crystallographic terms
  • Existing ones: to be identified | New ones: to be created

Examples: Artefacts: http://dbpedia.org/resource/Mona_Lisa | People: https://viaf.org/258811 (Franco Niccolucci)
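
Several of these standards can be checked mechanically before ingestion. A small Python sketch with hypothetical helper names: it checks a calendar date in the YYYY-MM-DD lexical form of xsd:date (without a timezone), a WGS84 latitude/longitude pair in decimal degrees, and whether an identifier is an absolute http(s) URI, as DBpedia, VIAF or GeoNames URIs are. It checks syntax only, not whether a URI resolves or a place is correct.

```python
import re
from datetime import date
from urllib.parse import urlparse

_DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso8601_date(value: str) -> bool:
    """True for an existing calendar date written as YYYY-MM-DD."""
    if not _DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:  # well-formed but impossible, e.g. 2019-02-30
        return False
    return True


def is_wgs84(lat: float, lon: float) -> bool:
    """True if lat/lon fall within WGS84 ranges, in decimal degrees."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def is_http_uri(value: str) -> bool:
    """True for absolute http(s) URIs such as DBpedia, VIAF or GeoNames identifiers."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    print(is_iso8601_date("2019-05-14"), is_iso8601_date("14/05/2019"))  # True False
    print(is_wgs84(43.77, 11.25), is_wgs84(95.0, 11.25))  # True False
    print(is_http_uri("http://dbpedia.org/resource/Mona_Lisa"))  # True
```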

SLIDE 20

Implementing the integrated scenario

  • Manual / assisted / automatic metadata creation in agnostic XML format
  • Scientists and IT experts prepare the descriptions of their datasets
  • Content processing, enrichment, mapping, conversion to CIDOC CRM (RDF)
  • Integrated semantic graph: Cultural Heritage + Heritage Science information ready to be ingested and queried within any Triple Store

  • Fully compliant with FAIR Principles - Ready for DIGILAB

Diagram: Datasets -> XML -> Enrichment -> Mapping -> Conversion -> CIDOC CRM semantic graph (RDF, N3) -> Semantic queries
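
The conversion step can be pictured as mapping each flattened record onto CIDOC CRM properties and emitting triples. The Python sketch below writes N-Triples, a line-based RDF syntax that Turtle and N3 parsers also accept. `P14_carried_out_by`, `P7_took_place_at` and `P4_has_time-span` are real CIDOC CRM properties, but typing every activity as `E7_Activity`, the record keys and the function name are simplifying assumptions; the model's own mapping, specified in its documentation, may use more specific classes.

```python
from __future__ import annotations

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def iri(value: str) -> str:
    """Wrap an absolute IRI for N-Triples."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping the characters N-Triples requires."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def activity_to_ntriples(record: dict[str, str]) -> list[str]:
    """Map one flattened record onto CIDOC CRM triples.

    Expected (hypothetical) keys: uri, label, actor_uri, place_uri, timespan_uri.
    """
    s = iri(record["uri"])
    triples = [
        (s, iri(RDF_TYPE), iri(CRM + "E7_Activity")),
        (s, iri(RDFS_LABEL), literal(record["label"])),
        (s, iri(CRM + "P14_carried_out_by"), iri(record["actor_uri"])),
        (s, iri(CRM + "P7_took_place_at"), iri(record["place_uri"])),
        (s, iri(CRM + "P4_has_time-span"), iri(record["timespan_uri"])),
    ]
    return [f"{subj} {pred} {obj} ." for subj, pred, obj in triples]


if __name__ == "__main__":
    record = {
        "uri": "https://example.org/measurement/1",
        "label": "XRF scan 1",
        "actor_uri": "https://example.org/actor/infn-chnet",
        "place_uri": "https://example.org/place/florence",
        "timespan_uri": "https://example.org/timespan/2019-05-14",
    }
    print("\n".join(activity_to_ntriples(record)))
```

The resulting lines can be loaded as-is into a triple store; in practice an RDF library would handle serialisation, but the plain-string version makes the mapping itself easy to see.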

SLIDE 21

Real-case applications and tests

  • X-Ray Radiography
  • X-Ray Tomography
  • X-Ray Fluorescence Imaging
  • Raman Spectroscopy
  • Radiocarbon Dating

What do we need to search?!?!? Which processes do we need to describe?!?!?

SLIDE 22

Example: Radiocarbon dating

Database

SLIDE 23

Example: Radiocarbon dating

dataset Database

SLIDE 24

Example: Radiocarbon dating

dataset metadata Database

SLIDE 25

Example: Radiocarbon dating

Metadata file: event SamplePreparation added to the model, with three types: chemical pretreatment, combustion, graphitization. The metadata file contains all the information we need to retrieve, e.g. the sample name after graphitization.
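
The SamplePreparation event can be sketched as follows in Python. The three types come from the slide; the field names, the `sample_after` lookup and the sample codes in the usage example are hypothetical, and assume each event records the sample name it receives and the one it produces.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PreparationType(Enum):
    """The three SamplePreparation types listed for radiocarbon dating."""
    CHEMICAL_PRETREATMENT = "chemical pretreatment"
    COMBUSTION = "combustion"
    GRAPHITIZATION = "graphitization"


@dataclass
class SamplePreparation:
    """One preparation event: its type, the input sample and the output sample."""
    kind: PreparationType
    input_sample: str
    output_sample: str


def sample_after(events: list[SamplePreparation], step: PreparationType) -> Optional[str]:
    """Return the sample name produced by the given preparation step, if recorded."""
    for event in events:
        if event.kind is step:
            return event.output_sample
    return None


if __name__ == "__main__":
    history = [
        SamplePreparation(PreparationType.CHEMICAL_PRETREATMENT, "S1", "S1-P"),
        SamplePreparation(PreparationType.COMBUSTION, "S1-P", "S1-C"),
        SamplePreparation(PreparationType.GRAPHITIZATION, "S1-C", "S1-G"),
    ]
    print(sample_after(history, PreparationType.GRAPHITIZATION))  # S1-G
```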

SLIDE 26

Example: Radiocarbon dating

Metadata file: event SamplePreparation added to the model, with three types: chemical pretreatment, combustion, graphitization. All other information can be stored in the dataset (laboratory logbook).

SLIDE 27

Example: Radiocarbon dating

Metadata file: event SamplePreparation added to the model, with three types: chemical pretreatment, combustion, graphitization. We need to do the same for the results, e.g. for searching all the samples with a final date in the range …
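
Once the results are described in the metadata rather than only in a report, the search mentioned above becomes a simple filter. A minimal Python sketch, assuming a hypothetical result record with a calibrated date range in calendar years (negative values for BCE) and reading "in the range" as the whole calibrated interval lying inside the query bounds; an overlap test would be an equally valid choice.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatingResult:
    """Hypothetical result record: calibrated range in calendar years (negative = BCE)."""
    sample: str
    date_from: int
    date_to: int


def samples_in_range(results: list[DatingResult], start: int, end: int) -> list[str]:
    """Names of samples whose whole calibrated range lies inside [start, end]."""
    return [r.sample for r in results if start <= r.date_from and r.date_to <= end]


if __name__ == "__main__":
    results = [
        DatingResult("S1", 1200, 1260),
        DatingResult("S2", 1250, 1320),
        DatingResult("S3", -200, -100),
    ]
    print(samples_in_range(results, 1200, 1300))  # ['S1']
```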

SLIDE 28

Example: Radiocarbon dating

Metadata file: event SamplePreparation added to the model, with three types: chemical pretreatment, combustion, graphitization. At the moment, this information is hidden in the report file.

SLIDE 29

Conclusions and future work

  • Model first release (v0.9b): XML Schema, descriptive document, application scenarios, use cases and examples
  • Mapping to CIDOC CRM already specified within the documentation
  • Minimal effort required for extending and adapting the model to analytical scenarios in other disciplines

Future plans

  • Testing the model with existing scientific datasets: to be continued
  • Describing analysis results: modelling of measurement types, units and values
  • Definition and preparation of thesauri and vocabularies
  • Standardisation of Research Protocols -> reusable chains of planned activities involving specific settings, tools and procedures

  • User interfaces and scripts for fast information generation
  • Towards a definition of an international standard for Heritage Science
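
A research protocol in this sense could be represented as an ordered list of steps that is instantiated for each sample. The Python sketch below is only one possible shape, with hypothetical class names; the usage example reuses the three radiocarbon preparation types from the example slides, with invented tool labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One planned activity: the procedure, the tool and its settings."""
    procedure: str
    tool: str
    settings: dict[str, str] = field(default_factory=dict)


@dataclass
class Protocol:
    """A named, reusable chain of steps that can be applied to any sample."""
    name: str
    steps: list[Step]

    def instantiate(self, sample: str) -> list[str]:
        """Expand the protocol into an ordered, human-readable plan for one sample."""
        plan = []
        for i, step in enumerate(self.steps, start=1):
            line = f"{i}. {step.procedure} on {sample} using {step.tool}"
            if step.settings:
                params = ", ".join(f"{k}={v}" for k, v in sorted(step.settings.items()))
                line += f" ({params})"
            plan.append(line)
        return plan


if __name__ == "__main__":
    radiocarbon = Protocol("radiocarbon sample preparation", [
        Step("chemical pretreatment", "pretreatment bench"),
        Step("combustion", "combustion line"),
        Step("graphitization", "graphitization line"),
    ])
    for line in radiocarbon.instantiate("S1"):
        print(line)
```

Because the chain is data, the same protocol can be stored once, referenced from many measurements and described in the metadata like any other entity.
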
SLIDE 30

Thank you for your attention!

Lisa Castelli | castelli@fi.infn.it Achille Felicetti | achille.felicetti@gmail.com