Formatting records for the NBN Atlas: An introduction to Darwin Core

SLIDE 1

SOPHIA RATCLIFFE, NBN Trust Technical & Data Partner Support Officer
REUBEN ROBERTS, NBN Systems Developer

Formatting records for the NBN Atlas: An introduction to Darwin Core

Sharing UK wildlife data

NBN Conference 2018 Knowledge Transfer Session

SLIDE 2

Session Aims

— What is Darwin Core (DwC)?
— How is DwC used in the NBN Atlas?
— Can we (NBN) use DwC better?
— What can we contribute to DwC?
— (Improvements to NBN Atlas pages)

SLIDE 3

What is DwC?

Darwin Core is the data standard for publishing and integrating biodiversity information
Library of terms aimed at providing common naming conventions and data structure
Primarily based on taxa and their occurrence

Adapted from: http://rs.tdwg.org/dwc/ and Wieczorek et al. (2012) Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS ONE 7(1): e29715

SLIDE 4

Taxonomic Databases Working Group

[Timeline figure. Years: 1985, 1998, 2009, 2018. Milestones: 1st DwC protocol, DwC ratified. Number of DwC terms: 20, 169, 255.]

SLIDE 5

Who (else) uses DwC?

SLIDE 6

DwC reference guide

http://rs.tdwg.org/dwc/terms/

SLIDE 7

DwC classes and terms

Record-level terms: Institution, collection, nature of record, licence, rightsholder
Occurrence: Occurrence ID, recorder, individual count, quantity (and quantity type), sex, life stage, behaviour, status (presence/absence)
Organism: Organism scope (colony, nest, clump), organism remarks
Event: Date, sampling protocol and methods, field notes
Location: Latitude and longitude coordinates, geodetic datum, location name and remarks
Identification: Verification status, identifier
Taxon: Taxon ID (UKSI taxon version key), scientific name, vernacular name

SLIDE 8

DwC term example

http://rs.tdwg.org/dwc/terms/

SLIDE 9

DwC example

What does it mean in terms of the data?

HBRG Insects Dataset

SLIDE 10

DwC Extensions

— Simple multimedia
— Literature references
— Minimum Information about any (x) Sequence (MIxS)

SLIDE 11

Who manages DwC?

Darwin Core Maintenance Group

https://www.tdwg.org/standards/dwc/maintenance/

— Issues submitted to a GitHub site: https://github.com/tdwg/dwc/issues
— 30-day public review
— Review by TDWG's Technical Architecture Group

SLIDE 12

DwC Archives

Sharing data

http://tools.gbif.org/dwca-assistant/

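[Diagram: a Darwin Core Archive is a ZIP file. It holds text (TXT) data files for the core and its extensions, a meta.xml file that describes those files, and an eml.xml file with the dataset metadata.]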
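
The archive layout on this slide can be sketched in a few lines of Python. This is only an illustration built with the standard library: the record, the file name occurrence.txt and the helper build_archive are made up for the example, and the eml.xml content is a placeholder rather than valid EML. The meta.xml follows the Darwin Core text guide, where each field index points at a column of the data file.

```python
import io
import zipfile

# Hypothetical example record; column order must match the indexes in meta.xml.
OCCURRENCE_TXT = (
    "occurrenceID\tbasisOfRecord\tscientificName\teventDate\n"
    "example-0001\tHumanObservation\tBombus terrestris\t1998-03-28\n"
)

# meta.xml describes the core data file: delimiter, header row and term per column.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
  </core>
</archive>
"""

# Placeholder only: a real eml.xml holds the dataset title, contacts and licence.
EML_XML = "<!-- dataset metadata (EML) goes here -->\n"


def build_archive() -> bytes:
    """Return the bytes of a minimal Darwin Core Archive (core file only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", OCCURRENCE_TXT)
        archive.writestr("meta.xml", META_XML)
        archive.writestr("eml.xml", EML_XML)
    return buffer.getvalue()
```

An archive with extensions would add one more text file per extension and a matching extension element in meta.xml.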

SLIDE 13

DwC on the NBN Atlas

— Taxon information (species dictionary): updated every 6-12 months
— Occurrence records: monthly processing run (1st weekend of each month)

SLIDE 14

Species dictionary

— Taxon identifier
— Scientific names
— Vernacular names
— Rank
— Status (accepted/synonym)
— Establishment means (native/non-native)
— Establishment status
— Realm (terrestrial, marine, freshwater)

Darwin Core TAXON
UK Species Inventory Access DB (NHM, London)

SLIDE 15

Occurrence records

Accepted formats

— DwCA (iRecord, RBGE)
— NBN Atlas formatted spreadsheets
— NBN Exchange format (Recorder 6, Marine Recorder)
— Unformatted spreadsheets

SLIDE 16

Occurrence records terms

— Core
— Desirable
— Non-DwC
— Other

SLIDE 17

Core terms

— occurrenceID
— basisOfRecord
— license
— rightsHolder
— institutionCode
— occurrenceStatus (present / absent)
— identificationVerificationStatus

SLIDE 18

basisOfRecord

SLIDE 19

identificationVerificationStatus

— Accepted
— Accepted - considered correct
— Accepted - correct
— Unconfirmed
— Unconfirmed - plausible
— Unconfirmed - not reviewed

SLIDE 20

Core terms cont.

— taxonID or scientificName or vernacularName
— eventDate
— gridReference / decimalLatitude & decimalLongitude
— geodeticDatum
— coordinateUncertaintyInMeters
— locality
— recordedBy
— identifiedBy

SLIDE 21

eventDate

— eventDate (YYYY-MM-DD) (ISO 8601)
  • 1998-03-28
  • 1998-03-28/05-31
  • 1998-03
  • 1998-03/05
  • 1998
  • 1998/2002
— day, month, year (single fields)
  • preferred method for single day events and partial dates (?)
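
The six date shapes above can each be read as a first and last possible day. The sketch below is one way to do that in Python; parse_event_date is a hypothetical helper, it handles only the date-only forms shown on the slide (no times), and it assumes a shortened end date inherits the missing leading parts from the start date.

```python
import calendar
from datetime import date


def _parts(text):
    """Split 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into a list of integers."""
    return [int(piece) for piece in text.split("-")]


def _first_day(parts):
    """Earliest date covered by a year, year-month or full date."""
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)


def _last_day(parts):
    """Latest date covered by a year, year-month or full date."""
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 12
    day = parts[2] if len(parts) > 2 else calendar.monthrange(year, month)[1]
    return date(year, month, day)


def parse_event_date(value):
    """Return (start, end) dates for a single date or a 'start/end' range."""
    start_text, _, end_text = value.strip().partition("/")
    start = _parts(start_text)
    end = _parts(end_text) if end_text else list(start)
    if len(end) < len(start):
        # '05-31' after '1998-03-28' means 1998-05-31: borrow the year.
        end = start[: len(start) - len(end)] + end
    return _first_day(start), _last_day(end)
```

For example, parse_event_date("1998-03/05") gives 1 March 1998 to 31 May 1998. Anything that does not fit these shapes raises ValueError, which suits free text such as seasons.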

SLIDE 22

eventDate cont.

— verbatimEventDate
  • “spring 1998”
— datePrecision (non-DwC)
— endDate (non-DwC)
  • endDate day, month, year

SLIDE 23

Core terms cont.

— taxonID or scientificName or vernacularName
— eventDate
— gridReference / decimalLatitude & decimalLongitude
— geodeticDatum (default WGS84)
— coordinateUncertaintyInMeters
— locality
— recordedBy
— identifiedBy
— datasetName
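
As a rough illustration, the core terms from these slides can be turned into a pre-upload check. The sketch treats every core term as mandatory, which is an assumption; geodeticDatum is left out because the slide gives it a default. The helper names are hypothetical, and rightsHolder is spelled as in the Darwin Core term list.

```python
# Core terms that stand on their own (alternatives are handled separately below).
SIMPLE_CORE_TERMS = [
    "occurrenceID", "basisOfRecord", "license", "rightsHolder",
    "institutionCode", "occurrenceStatus", "identificationVerificationStatus",
    "eventDate", "coordinateUncertaintyInMeters", "locality",
    "recordedBy", "identifiedBy", "datasetName",
]
TAXON_TERMS = ("taxonID", "scientificName", "vernacularName")


def has_value(record, term):
    """True if the term is present and not blank (0 counts as a value)."""
    value = record.get(term)
    return value is not None and str(value).strip() != ""


def missing_core_terms(record):
    """List the core terms, or groups of alternatives, that a record lacks."""
    missing = [term for term in SIMPLE_CORE_TERMS if not has_value(record, term)]
    if not any(has_value(record, term) for term in TAXON_TERMS):
        missing.append("taxonID or scientificName or vernacularName")
    has_grid = has_value(record, "gridReference")
    has_lat_lon = (has_value(record, "decimalLatitude")
                   and has_value(record, "decimalLongitude"))
    if not (has_grid or has_lat_lon):
        missing.append("gridReference or decimalLatitude & decimalLongitude")
    return missing
```

An empty list means the record carries every core term in this sketch; it says nothing about whether the values themselves are valid.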

SLIDE 24

non-DwC terms

— verifier
— organismStatus (alive/dead)

SLIDE 25

Desirable terms

— individualCount
— organismQuantity
— organismQuantityType
— organismScope
— sex
— lifeStage

SLIDE 26

individualCount

Examples: “1 Adult”, “Frequent”, “1 Male”, “#NAME?”, “0.25”, “2 Adult Male; 1 Juvenile Female”, “Many”

— 3% of records have an individual count (~5m)
— 29,000 different values
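
Darwin Core defines individualCount as a whole number, so values like the examples above need tidying before they are useful. The following is a minimal sketch, with a hypothetical helper split_individual_count, that pulls out a leading whole number and hands back any trailing words so they can be moved to sex or lifeStage by hand. Anything more complicated is returned unchanged for review.

```python
import re

# A whole number, optionally followed by words (letters and spaces only).
_COUNT_PATTERN = re.compile(r"([0-9]+)(?:\s+([A-Za-z ]+))?")


def split_individual_count(raw):
    """Return (count, remainder); count is None when no clean number is found."""
    text = raw.strip()
    match = _COUNT_PATTERN.fullmatch(text)
    if match is None:
        return None, text
    return int(match.group(1)), (match.group(2) or "").strip()


if __name__ == "__main__":
    examples = ["1 Adult", "Frequent", "1 Male", "#NAME?", "0.25",
                "2 Adult Male; 1 Juvenile Female", "Many"]
    for value in examples:
        print(repr(value), "->", split_individual_count(value))
```

Only “1 Adult” and “1 Male” from the slide's examples yield a count here; the rest are left as text because they are not unambiguous whole numbers.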

SLIDE 27
organismQuantity

SLIDE 28
organismQuantity

Examples: “Many”, “Several”, “sev.”, “Present”
“Occasional” or “O” (organismQuantityType: DAFOR)
“50” (organismQuantityType: % cover)

— 540,000 records with organismQuantity
— 2,000 different values
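
When organismQuantityType is DAFOR, the quantity should be one of five abundance classes (Dominant, Abundant, Frequent, Occasional, Rare), supplied either in full or as a single letter. A short sketch of normalising the two spellings follows; normalise_dafor is a hypothetical helper and the choice of the full word as the output form is an assumption.

```python
# DAFOR abundance scale: single-letter code -> full class name.
DAFOR = {
    "d": "Dominant",
    "a": "Abundant",
    "f": "Frequent",
    "o": "Occasional",
    "r": "Rare",
}
_FULL_NAMES = {name.lower(): name for name in DAFOR.values()}


def normalise_dafor(raw):
    """Return the full DAFOR class name, or None if the value is not on the scale."""
    text = raw.strip().lower()
    if text in DAFOR:
        return DAFOR[text]
    return _FULL_NAMES.get(text)
```

Values such as “Many” or “sev.” return None, which flags them as not belonging to the DAFOR scale.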

SLIDE 29

Desirable terms

— individualCount
— organismQuantity
— organismQuantityType
— organismScope
— sex
— lifeStage

SLIDE 30
organismScope

SLIDE 31
organismScope

[Chart of organismScope values. Labels: Breeding pair droppings Female heard Male nest pair Pair shell territories. Shares shown: 32.4%, 13%, 9.1%, 9.1%.]

Other examples: sett, spraint, tracks, nest, burrow, eggs
5,227 records with organismScope

SLIDE 32

lifeStage

[Chart of lifeStage values. Labels: ad adult calves gall larva larvae male not recorded pre preadult. Shares shown: 20%, 65.4%, 6%.]

Other examples: immature, nymph, young, dead, chick
473,631 records with lifeStage

SLIDE 33

Comment (remarks) fields

— occurrenceRemarks
— organismRemarks
— eventRemarks
— locationRemarks
— identificationRemarks

SLIDE 34

Other terms

Event

— eventID
— samplingProtocol
— sampleSizeValue
— sampleSizeUnit
— samplingEffort

SLIDE 35

Other terms cont.

Record-level

— bibliographicCitation
— references
— informationWithheld
— dataGeneralizations
— dynamicProperties

SLIDE 36

dynamicProperties

A list of additional measurements, facts, characteristics, or assertions about the record. Meant to provide a mechanism for structured content.

SLIDE 37

dynamicProperties

National Dormouse Database (NDD)

NDMPsite: Yes
RecordType: Live specimen
RecordTypeReliability: Good

A list of additional measurements, facts, characteristics, or assertions about the record. Meant to provide a mechanism for structured content.
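
The Darwin Core term definition recommends a key:value encoding such as JSON for dynamicProperties. Assuming JSON is the chosen encoding (the slide does not say which encoding the NDD uses), the dormouse example could be packed into a single field like this; to_dynamic_properties is a hypothetical helper.

```python
import json


def to_dynamic_properties(properties):
    """Encode extra key:value facts as one JSON string for dynamicProperties."""
    return json.dumps(properties, sort_keys=True)


def from_dynamic_properties(text):
    """Decode a dynamicProperties JSON string back into a dictionary."""
    return json.loads(text)


# Values taken from the National Dormouse Database example on the slide.
ndd_properties = {
    "NDMPsite": "Yes",
    "RecordType": "Live specimen",
    "RecordTypeReliability": "Good",
}

if __name__ == "__main__":
    print(to_dynamic_properties(ndd_properties))
```

Keeping the keys consistent across a dataset is what makes the field usable as structured content rather than free text.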

SLIDE 38

Data processing

SLIDE 39

Data processing

1. Processing
2. Sampling
3. Indexing

SEARCH INDEX (raw and processed values)

SLIDE 40
1. Processing

— Name matching routine
— OSGR <> Latitude/longitude coordinates
— Dates
— Species list membership
— Sensitive species
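
To give a feel for the grid reference step, here is a sketch of the first half of the conversion: turning a British National Grid reference into an easting and northing in metres. It is not the NBN Atlas routine. The helper name is hypothetical, tetrad (DINTY) and Irish grid references are not handled, the letters are not checked against the area the grid actually covers, and the further datum transformation needed to reach WGS84 latitude and longitude is left out.

```python
def parse_osgb_grid_ref(grid_ref):
    """Return (easting, northing, precision) in metres for a grid reference.

    The easting and northing are those of the south-west corner of the square.
    """
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not all("A" <= c <= "Z" and c != "I" for c in letters):
        raise ValueError(f"bad grid letters in {grid_ref!r}")
    if (digits and not digits.isdigit()) or len(digits) % 2 or len(digits) > 10:
        raise ValueError(f"bad grid digits in {grid_ref!r}")

    # Letter positions in a 25-letter alphabet (the grid skips the letter I).
    first, second = (ord(c) - ord("A") for c in letters)
    if first > 7:
        first -= 1
    if second > 7:
        second -= 1

    # Index of the 100 km square, counted from the grid's false origin.
    east_100km = ((first - 2) % 5) * 5 + (second % 5)
    north_100km = (19 - (first // 5) * 5) - (second // 5)

    half = len(digits) // 2
    precision = 10 ** (5 - half)  # e.g. 4 digits in total -> 1 km square
    east_offset = int(digits[:half]) * precision if half else 0
    north_offset = int(digits[half:]) * precision if half else 0
    return (east_100km * 100_000 + east_offset,
            north_100km * 100_000 + north_offset,
            precision)
```

The precision value is a natural starting point for coordinateUncertaintyInMeters, since a 1 km grid square cannot locate a record more finely than that.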

SLIDE 41

Sensitive species

SLIDE 42
1. Processing cont.

— Data quality checks
  • recordHasIssues
  • recordIssues

SLIDE 43

Data quality checks

SLIDE 44
2. Sampling

— Boundaries
— Habitats
— Environmental layers

SLIDE 45
2. Sampling

SLIDE 46
3. Indexing

— SOLR
— Occurrence record fields:
  • https://records-ws.nbnatlas.org/index/fields
— Only possible to search / filter / facet indexed fields
— Can add fields to the index (e.g. lifeStage)

SLIDE 47

Worked examples

— Recorder 6 dataset (Highlands Biological Records Centre)
— CEDaR Northern Ireland Seal Survey

SLIDE 48

Help

— NBN Atlas documentation site
  • https://docs.nbnatlas.org/share-species-occurrence-records-with-the-nbn-atlas/
— Darwin Core quick reference guide
  • https://dwc.tdwg.org/terms/
— Darwin Core Archive Assistant (GBIF)
  • http://tools.gbif.org/dwca-assistant/
— Darwin Core Archive Validator (GBIF)
  • https://tools.gbif.org/dwca-validator/

SLIDE 49

Can we use DwC better?

Controlled vocabularies:

— lifeStage
— sex
— organismScope

SLIDE 50

What can we contribute back?

— organismStatus
— verifier

SLIDE 51

Improvements

Improvements to the presentation of records in the NBN Atlas:

1. Occurrence records page
2. Data resource metadata page
3. Advanced records search

SLIDE 52

Improvements

— https://github.com/nbnuk/nbnatlas-issues
