The FAIR data scientist, Dr Rebecca Lange, Curtin Institute for Computation



SLIDE 1

CC BY-SA 4.0

The FAIR data scientist

Dr Rebecca Lange Curtin Institute for Computation

WAGUL Research Forum - 23 July 2019

SLIDE 2

What do Astronomy, Art Conservation and Smart Cities have in common?

SLIDE 3

One data scientist

SLIDE 4

I have loved astronomy ever since I can remember. 💜⭐ But I also like to build things and study old things. 🔭🎩 And how will all the technological advances change how we live? 🤗🏚

SLIDE 5

  • Lived and studied in 3 countries
  • Visited 11 countries for work
  • Collaborators across the globe

SLIDE 6

My Journey

SLIDE 7

Art Conservation

Imaging & Sensing for Archaeology, Art History & Conservation [1]

SLIDE 8

Astronomy

Galaxy And Mass Assembly survey [2]

SLIDE 9

Data Science

Curtin Institute for Computation [3]

RAC Pulse of Perth [5]

RENeW Nexus [6]

Multi-modal analysis - Shiny Web App [4]

SLIDE 10

How do you share your data?

SLIDE 11

How do you share your data? FAIR-ly.

SLIDE 12

Findable Accessible Interoperable Reusable

FORCE11 [7]

To be Findable:
  • F1. (meta)data are assigned a globally unique and eternally persistent identifier.
  • F2. data are described with rich metadata.
  • F3. (meta)data are registered or indexed in a searchable resource.
  • F4. metadata specify the data identifier.

To be Accessible:
  • A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
      ○ A1.1. the protocol is open, free, and universally implementable.
      ○ A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
  • A2. metadata are accessible, even when the data are no longer available.

To be Interoperable:
  • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
  • I2. (meta)data use vocabularies that follow FAIR principles.
  • I3. (meta)data include qualified references to other (meta)data.

To be Re-usable:
  • R1. (meta)data have a plurality of accurate and relevant attributes.
      ○ R1.1. (meta)data are released with a clear and accessible data usage license.
      ○ R1.2. (meta)data are associated with their provenance.
      ○ R1.3. (meta)data meet domain-relevant community standards.

SLIDE 13

The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. [8]

SLIDE 14

I.1 - exchange of data

We worked with a software developer to write the operational software for our instrument. After lengthy discussions we agreed to save the data as a simple CSV file.

JSON [11] (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. ➡ Easy to automate with an API.
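As a rough illustration of the exchange formats named on this slide, the sketch below reads instrument readings from CSV text and re-emits them as JSON using only the Python standard library. The column names (`wavelength_nm`, `reflectance`) and the sample values are hypothetical, chosen for illustration; they are not taken from the instrument mentioned in the talk.

```python
import csv
import io
import json

# Hypothetical instrument output; column names and values are illustrative only.
CSV_TEXT = """wavelength_nm,reflectance
450,0.12
550,0.34
650,0.56
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, converting every field to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def records_to_json(records):
    """Serialise the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


records = csv_to_records(CSV_TEXT)
json_text = records_to_json(records)
print(json_text)
```

Both formats carry the same rows here; the JSON form keeps the column name next to every value, which is one reason it tends to be convenient when data are passed between services.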

SLIDE 15

I.1 & I.2 - exchange & vocabularies

FITS [12] is used for the transport, analysis, and archival storage of scientific data sets (open standard, 1981)

  • Multi-dimensional arrays: 1D, images, 3D+ cubes
  • Tables containing rows and columns of data
  • Header keywords provide descriptive information about the content
      ○ Agreed standard for e.g. telescope images
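To make the idea of header keywords concrete, here is a simplified sketch of how a FITS header stores metadata as fixed 80-character "cards" (keyword in columns 1-8, `= ` in columns 9-10, then the value and an optional `/ comment`). It handles only basic logical, integer, float, and short string values, and the telescope and object values are hypothetical. Real work would normally go through an established FITS library such as `astropy.io.fits` rather than hand-written parsing.

```python
CARD_LENGTH = 80  # each FITS header card is a fixed 80-character record


def make_card(keyword, value, comment=""):
    """Build a simplified card: keyword in columns 1-8, '= ' in columns 9-10."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        field = ("T" if value else "F").rjust(20)
    elif isinstance(value, str):
        # Strings are quoted and padded to at least 8 characters.
        field = ("'" + value.ljust(8) + "'").ljust(20)
    else:
        field = repr(value).rjust(20)
    card = f"{keyword.upper():<8}= {field}"
    if comment:
        card += f" / {comment}"
    return card.ljust(CARD_LENGTH)[:CARD_LENGTH]


def parse_card(card):
    """Split a simplified card into (keyword, value, comment)."""
    keyword = card[:8].strip()
    rest = card[10:]
    if rest.lstrip().startswith("'"):
        # String value: take the text between the first pair of quotes.
        start = rest.index("'")
        end = rest.index("'", start + 1)
        value = rest[start + 1:end].rstrip()
        comment = rest[end + 1:].partition("/")[2].strip()
        return keyword, value, comment
    value_text, _, comment = rest.partition("/")
    value_text = value_text.strip()
    if value_text in ("T", "F"):
        value = value_text == "T"
    else:
        try:
            value = int(value_text)
        except ValueError:
            value = float(value_text)
    return keyword, value, comment.strip()


# Illustrative header; the TELESCOP and OBJECT values are made up.
cards = [
    make_card("SIMPLE", True, "conforms to the FITS standard"),
    make_card("BITPIX", 16, "bits per data value"),
    make_card("NAXIS", 2, "number of data axes"),
    make_card("TELESCOP", "EXAMPLE-1", "hypothetical telescope name"),
    make_card("OBJECT", "M31", "hypothetical target"),
]

header = {keyword: value for keyword, value, _ in map(parse_card, cards)}
print(header)
```

Because the keywords and their layout are agreed in advance, any tool that understands the standard can read the description of an image without knowing who produced it.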

SLIDE 16

I.3 - linked data

The Centre de Données astronomiques de Strasbourg (CDS [13]) provides a service that links various data sources, making discovery easy.

SLIDE 17

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. [8]

SLIDE 18

R1 & R1.2 - usability and history of data

For each of our value-add catalogues for GAMA [2] we followed strict guidelines on how to collate the data and metadata, making sure the catalogues and tables had explanations and descriptions detailed enough for any new user to jump right in, e.g.:

  • Who, when, what?
  • Scope and limitations of data
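One way to picture the "who, when, what" checklist above is a small description record that travels with each catalogue. The sketch below is an assumption for illustration only: the field names, the example values, and the `missing_fields` helper are hypothetical and do not reproduce the actual GAMA guidelines.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CatalogueDescription:
    """Hypothetical description record kept alongside a catalogue."""
    title: str = ""
    created_by: str = ""      # who
    created_on: str = ""      # when (ISO 8601 date)
    description: str = ""     # what
    derived_from: list = field(default_factory=list)  # provenance: input tables
    scope: str = ""
    limitations: str = ""


def missing_fields(record):
    """Return the names of fields that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


# Illustrative values only; none of these describe a real catalogue.
record = CatalogueDescription(
    title="Example value-add catalogue",
    created_by="A. Researcher",
    created_on="2019-07-23",
    description="Derived quantities for an example sample.",
    derived_from=["ExampleInputTable v01"],
    scope="Objects in the example sample only.",
)

print(missing_fields(record))
print(json.dumps(asdict(record), indent=2))
```

A check like this can run before a catalogue is released, so that an empty "limitations" entry is caught by the team rather than by the first new user.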

SLIDE 19

R1.1 - license

Most telescope data is proprietary for a short period of time before it is made public, e.g. Hubble Space Telescope data is made public after 1 year. [14] Many large surveys work on value-add catalogues which are made public after a period of time or after a journal publication.

SLIDE 20

R1.3 - community standards

“The Virtual Observatory (VO) is the vision that astronomical datasets and other resources should work as a seamless whole. Many projects and data centres worldwide are working towards this goal. The International Virtual Observatory Alliance (IVOA) is an organisation that debates and agrees the technical standards that are needed to make the VO possible. It also acts as a focus for VO aspirations, a framework for discussing and sharing VO ideas and technology, and body for promoting and publicising the VO.” [15]

SLIDE 21

References and Further Reading

Projects mentioned

[1] Imaging & Sensing for Archaeology, Art History & Conservation (ISAAC) https://www.ntu.ac.uk/research/groups-and-centres/groups/imaging-sensing-for-archaeology-art-history-and-conservation
[2] Galaxy And Mass Assembly survey http://www.gama-survey.org/
[3] Curtin Institute for Computation https://computation.curtin.edu.au/
[4] Multi-modal analysis Shiny app https://shiny.computation.org.au/MMAv0.2/
[5] RAC Pulse of Perth https://imovecrc.com/news-articles/personal-public-mobility/data-visualisation-perth-public-transport/
[6] RENeW Nexus https://mysay.fremantle.wa.gov.au/renew-nexus

FAIR principles

[7] https://www.force11.org/group/fairgroup/fairprinciples
[8] https://www.go-fair.org/fair-principles/
[9] https://www.ands.org.au/working-with-data/fairdata
[10] https://ardc.edu.au/resources/working-with-data/fair-data/

FAIR examples

[11] JSON https://json.org/
[12] FITS https://fits.gsfc.nasa.gov/
[13] Centre de Données astronomiques de Strasbourg (CDS) http://cds.u-strasbg.fr/
[14] Barbara A. Mikulski Archive for Space Telescopes (MAST) http://archive.stsci.edu/
[15] International Virtual Observatory Alliance http://www.ivoa.net/

The Magnifying glass, Tap, Gears set, Recycle sign, Storage, Infinity, Discussion, Shield, and Man User icons made by Freepik from www.flaticon.com are licensed under CC BY 3.0. All other icons made by ARDC. The entire FAIR resources graphic is licensed under a Creative Commons Attribution 4.0 International License.

Thank you.