Documenting Heritage Science: A CIDOC CRM-based System for Modelling Scientific Data
Lisa Castelli | INFN, Italy
Achille Felicetti | PIN, University of Florence, Italy
Apologies
I’m NOT A DIGITAL WOMAN… For any question about the technical details of the model please contact my colleague Achille Felicetti achille.felicetti@gmail.com
WHY AM I HERE? INFN-CHNet in ARIADNE plus
INFN-CHNet
INFN-CHNet: born to coordinate the cultural heritage activities of INFN facilities, it is opening to partners with different competencies (restoration centres, archaeology/chemistry departments in universities, …)
INFN-CHNet
Fixed Labs
- Medium-large scale facilities (IBA, 14C, ...)
- TL dating
- X-ray imaging
Mobile Labs
- XRD
- XRF
- Mass Spectrometry
- Thermography
Digital Labs
- Data storage and fruition
- Web tools for data fruition
CHNet-DIGILAB: an ideal world
[Diagram: the CHNet database @CNAF holds data + metadata behind user access control. Services (browse/query interfaces, fruition tools, analysis tools) are offered over the web. Non-expert users: fruition. Expert users: fruition & analysis.]
ARIADNE plus
- Realization of a complex Digital Infrastructure
- 40 partners from all over Europe!
- Archaeology + historical sources, linguistic data, cataloguing data,
scientific data from archaeometric analyses
- Innovative Services and Cloud Environment
- Data annotation (image and text)
- More advanced NLP and text mining (machine learning approach)
- Definition of unambiguous spatial and temporal entities for archaeology
- Geographic system: interoperability of national geographical systems and GIS, to be
integrated into a single transnational system (ARIADNE GeoServer)
- Cloud environment: shared resources, efficient management of services
- Implementation of services for visualization and analysis, including scientific
data, high-resolution images and online 3D visualization
Goals of DIGILAB (CHNet and more…)
- Interoperability between different Heritage Science communities
- Data from analyses, conservation and restoration activities
- Cross-disciplinary information integration
- Interoperability with Humanities
(History, Archaeology, Art History, […])
- System for discovering, accessing, reusing integrated data and
services
- Solid metadata model for data and service description required
- Design of the integrated system
Existing schemas and metadata models
- International metadata standards for scientific research
- CERIF (EU recommended format), OBOE (scientific observation and measurement), NeXus
(neutron and x-ray), AVM (astronomical images), CIF (crystallography), […]
-> Too general/too specific
- Formats in use by national labs/institutions (Italy)
- Institute for the Conservation and Valorisation of Cultural Heritage (ICVBC-CNR): XRD data
in XRDML format, CIF taxonomy of crystallographic terms
- National Institute for Nuclear Physics (INFN) CHNet: C14 binary data + data in Excel tables,
XRF data in RAW format
- Istituto Superiore per la Conservazione ed il Restauro (ISCR): multispectral information in
ICCD-based format
- Opificio delle Pietre Dure (OPD): photographs, radiographs, 3D models in various formats
- National Institute of Optics (CNR-INO): multispectral and OCT data in RAW and TIFF
formats
- Institute of Molecular Science and Technologies (ISTM-CNR): MOVIDA software
(multi‐technique diagnostics, information in relational database) -> MOLAB
- Formats used by European labs/institutions: investigation planned for next period
Result: chaos!!!
“Going semantics”: the CIDOC CRM family
- CIDOC CRM (ISO 21127:2006) - http://cidoc-crm.org
- High-level conceptual model for standardisation, integration and interoperability
of Cultural Heritage information
- Core model + domain-specific extensions
- CRMsci - http://cidoc-crm.org/crmsci
- Scientific observation information
- CRMdig - http://cidoc-crm.org/crmdig
- Provenance metadata for digital objects
- CRMpe
- PARTHENOS Entities Model
- Datasets, services and curators
- Versioning, accessibility, licensing …
- PARTHENOS Registry
Basis of CIDOC CRM model
[Diagram: the activities Campaign, Measurement and Analysis, with the relation "Belong to", shown together with the entities Devices (D8), Actor (E39), Legal Bodies (E40), Places (E53) / Time (E52), Man-made object (E22), Datasets (PE18) and Activity (E7)]
Our model
[Diagram: Campaign, Measurement and Analysis shown together with Devices, Actor, Legal Bodies, Places / Time, Man-made object and Datasets; output: XML file of metadata]
Our model for scientific datasets description
- Reuses the logic and components of existing CIDOC models
- Modular approach
- “Complexity hiding” paradigm in content generation
- User-friendly interfaces for rapid manual and semi-automatic metadata creation
- Scripts for transparent, automatic generation of information (e.g. from digital
devices or existing metadata)
- Semantic network for advanced queries (CIDOC CRM modelling principles)
- Properties and relationships to link entities in a meaningful way
- Platform independent intermediate meta-format (XML)
- Dynamic mapping and re-encoding in existing standards (CSV, SQL, RDF, JSON …)
- Mapping to CIDOC CRM for HS + CH interoperability implementation
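The "platform-independent intermediate meta-format" idea can be sketched as follows: one small XML record is parsed once and then re-encoded as JSON and CSV. This is only a minimal illustration. The element and attribute names (`dataset`, `campaign`, `measurement`, `device`, `date`) and all values are hypothetical and are not taken from the actual schema of the model.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical metadata record: names and values are illustrative
# assumptions, not the real schema of the model.
XML_RECORD = """
<dataset id="ds-001">
  <campaign>Example campaign</campaign>
  <measurement type="XRF">
    <device>Example XRF scanner</device>
    <date>2019-05-14</date>
  </measurement>
</dataset>
"""


def parse_record(xml_text: str) -> dict:
    """Flatten the XML record into a plain dictionary."""
    root = ET.fromstring(xml_text)
    measurement = root.find("measurement")
    return {
        "dataset_id": root.get("id"),
        "campaign": root.findtext("campaign"),
        "technique": measurement.get("type"),
        "device": measurement.findtext("device"),
        "date": measurement.findtext("date"),
    }


def to_json(record: dict) -> str:
    """Re-encode the record as a JSON object."""
    return json.dumps(record, indent=2)


def to_csv(record: dict) -> str:
    """Re-encode the record as a header line plus one CSV row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(record), lineterminator="\n")
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


record = parse_record(XML_RECORD)
print(to_json(record))
print(to_csv(record))
```

Because every target encoding is produced from the same parsed dictionary, adding another output format only needs one more small function.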
Entities: standardising the content
- Standard notations, Gazetteers, PIDs, Permalinks, URIs, URLs
- Actors and Artefacts: VIAF/ORCID, DBPedia URIs, authority files,
controlled lists of institution/person names
- Time Spans and Periods: ISO 8601 encoding,
xsd:date/xsd:dateTime formats, PeriodO URIs (period names and time spans)
- Places: WGS84 (spatial coordinates), Geonames URIs (places
identification), Pleiades/Pelagios URIs/URLs (historical places)
- Persistent Identifiers: unambiguous, permanent entities
identification
- Thesauri, vocabularies, controlled lists of types
- Widely used in Cultural Heritage
- AAT (Getty’s Art and Architecture Thesaurus), Nomisma.org
(numismatics), PICO (Italian ICCD thesaurus), […]
- Examples in science: CIF taxonomy of crystallographic terms
- Existing ones: to be identified | New ones: to be created
Artefacts: http://dbpedia.org/resource/Mona_Lisa
People: https://viaf.org/258811 (Franco Niccolucci)
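A few of the standard notations listed above can be checked mechanically before a record is accepted. The sketch below is an assumption about how such checks might look, not part of the presented system: the function names are hypothetical, and the URI check only tests the syntax, not whether the identifier actually resolves.

```python
from datetime import date
from urllib.parse import urlparse


def is_iso8601_date(text: str) -> bool:
    """True if the text parses as an ISO 8601 calendar date, e.g. 2019-05-14."""
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def is_wgs84(lat: float, lon: float) -> bool:
    """True if the pair is a valid WGS84 latitude/longitude in decimal degrees."""
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


def is_http_uri(text: str) -> bool:
    """True if the text is syntactically an http(s) URI with a host part."""
    parts = urlparse(text)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_iso8601_date("2019-05-14"))
print(is_wgs84(43.77, 11.26))
print(is_http_uri("http://dbpedia.org/resource/Mona_Lisa"))
```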
Implementing the integrated scenario
- Manual / assisted / automatic metadata creation in agnostic XML format
- Scientists and IT experts prepare the descriptions of their datasets
- Content processing, enrichment, mapping, conversion to CIDOC CRM (RDF)
- Integrated semantic graph: Cultural Heritage + Heritage Science information ready to be ingested
and queried within any Triple Store
- Fully compliant with FAIR Principles - Ready for DIGILAB
[Diagram: Datasets -> XML -> Enrichment, Mapping, Conversion -> CIDOC CRM (RDF, N3) -> Semantic Graph -> Semantic Queries]
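The mapping and conversion step can be sketched as a function that turns one flat metadata record into RDF statements. This is a minimal sketch under stated assumptions: the `http://example.org/digilab/` namespace, the record keys and the choice of properties are illustrative only, and the actual mapping is the one specified in the model documentation. The output is written as N-Triples lines, which can be loaded into a triple store.

```python
# CIDOC CRM namespace as used in the published RDFS encodings.
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical base namespace for the instances created here.
BASE = "http://example.org/digilab/"

# Hypothetical record, e.g. the result of parsing the intermediate XML.
RECORD = {
    "id": "m-001",
    "actor_uri": "http://example.org/actor/lab-1",
    "place_uri": "http://example.org/place/lab-room",
    "object_uri": "http://example.org/object/artefact-42",
}


def map_measurement(record: dict) -> list[str]:
    """Map one record to N-Triples lines describing an E7 Activity."""
    activity = BASE + "activity/" + record["id"]
    triples = [
        (activity, RDF_TYPE, CRM + "E7_Activity"),
        (activity, CRM + "P14_carried_out_by", record["actor_uri"]),
        (activity, CRM + "P7_took_place_at", record["place_uri"]),
        (activity, CRM + "P16_used_specific_object", record["object_uri"]),
    ]
    # Every term is an IRI here, so each one is wrapped in angle brackets.
    return [f"<{s}> <{p}> <{o}> ." for s, p, o in triples]


for line in map_measurement(RECORD):
    print(line)
```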
Real-case applications and tests
- X-Ray Radiography
- X-Ray Tomography
- X-Ray Fluorescence Imaging
- Raman Spectroscopy
- Radiocarbon Dating
…
What do we need to search? Which processes do we need to describe?
Example: Radiocarbon dating
[Diagram: dataset -> metadata -> Database]
Metadata file
- Event SamplePreparation added to the model, with three types:
- Chemical pretreatment
- Combustion
- Graphitization
- The metadata file holds all the information we need to retrieve. Example: the sample name after graphitization
- All other information can be stored in the dataset (laboratory logbook)
- We need to do the same for the results: searching all the samples with final date in the range …
- At the moment this information is hidden in the Report file
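The two searches mentioned for radiocarbon dating (the sample name after a given preparation step, and all samples whose final date falls in a range) can be sketched with a small data structure. Only the event name SamplePreparation and its three types come from the slides. The field names, the sample names and the calendar years below are made up for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class PreparationType(Enum):
    CHEMICAL_PRETREATMENT = "Chemical pretreatment"
    COMBUSTION = "Combustion"
    GRAPHITIZATION = "Graphitization"


@dataclass
class SamplePreparation:
    kind: PreparationType
    sample_name: str  # name of the sample after this step (assumed field)


@dataclass
class Sample:
    sample_id: str
    date_from: int  # start of the final date interval, calendar year
    date_to: int    # end of the final date interval, calendar year
    preparations: list = field(default_factory=list)


def name_after(sample: Sample, kind: PreparationType) -> Optional[str]:
    """Return the sample name recorded after the given preparation step."""
    for step in sample.preparations:
        if step.kind is kind:
            return step.sample_name
    return None


def samples_in_range(samples: list, start: int, end: int) -> list:
    """Return the samples whose date interval overlaps [start, end]."""
    return [s for s in samples if s.date_from <= end and s.date_to >= start]


# Made-up example data.
samples = [
    Sample("S1", 1220, 1280, [
        SamplePreparation(PreparationType.CHEMICAL_PRETREATMENT, "S1-P"),
        SamplePreparation(PreparationType.COMBUSTION, "S1-C"),
        SamplePreparation(PreparationType.GRAPHITIZATION, "S1-G"),
    ]),
    Sample("S2", 1400, 1450, [
        SamplePreparation(PreparationType.GRAPHITIZATION, "S2-G"),
    ]),
]

print(name_after(samples[0], PreparationType.GRAPHITIZATION))
print([s.sample_id for s in samples_in_range(samples, 1200, 1300)])
```

Keeping the result interval in the metadata, rather than only in the report file, is what makes the second query possible without opening each report.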
Conclusions and future work
- Model first release (v0.9b): XML Schema, descriptive document, application
scenarios, use cases and examples
- Mapping to CIDOC CRM already specified within the documentation
- Minimal effort required for extending and adapting the model to analytical
scenarios in other disciplines
Future plans
- Testing the model with existing scientific datasets: to be continued
- Describing analysis results: modelling of measurement types, units and
values
- Thesauri and vocabularies definition and preparation
- Standardisation of Research Protocols -> reusable chains of planned
activities involving specific settings, tools and procedures
- User interfaces and scripts for fast information generation
- Towards a definition of an international standard for Heritage Science