Biodiversity and Ecosystem Informatics: Research, Technology - - PowerPoint PPT Presentation




slide-1
SLIDE 1

02/12/2002 VLDB 2002

Biodiversity and Ecosystem Informatics:

Research, Technology Transfer, or Application Development?

Jessie Kennedy http://www.soc.napier.ac.uk/jessie

slide-2
SLIDE 2

An ecological question…

➤ What is the effect of change in (ozone) on the distribution of (Bellis perennis) in (temperate grasslands) in (Europe)?

➤ Answer requires (amongst other problems) integration of many different databases

[Figure labels: Climate Data; CHM; Sequence Data (GenBank, RNA, protein, etc.); Ecosystems Data; Geospatial Data; Ecological Data]

Content area responsibilities of GBIF

Catalog of Names of Known Organisms; Search Engines; Biological Specimen Data; Access/Interoperability

Courtesy of Global Biodiversity Information Facility - http://www.gbif.org

BIG assumption……….

slide-3
SLIDE 3

Biological taxonomy

➤ How do we catalogue all of the species of plants on Earth?

➤ Plant Taxonomy (classification)

NHM London

➤ Data -> Real specimens

➤Stored in herbaria, museums…

➤Recorded in notebooks, books, journals

➤Masses of information inaccessible

Conversion to electronic media isn’t really a DB problem - is it?

slide-4
SLIDE 4

Specimen - Description

➤ Taxonomic characters

➤ annual, leaves hairy, lanceolate, mostly white flowers…..

Fruit of Torilis japonica

➤No agreed terminology

➤plant structure

➤Attributes

➤Values

➤Ontology problem

➤DB research?

➤DB support

➤definitions

➤exemplars
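One way to read the structure / attribute / value breakdown above is as a controlled vocabulary against which specimen descriptions are checked. The following is a minimal sketch under that assumption: `VOCABULARY`, `Character`, and `validate` are hypothetical names, and the vocabulary entries are illustrative only, not an agreed terminology.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary: structure -> attribute -> allowed values.
VOCABULARY = {
    "leaf": {"shape": {"lanceolate", "ovate"}, "surface": {"hairy", "glabrous"}},
    "flower": {"colour": {"white", "yellow"}},
}


@dataclass(frozen=True)
class Character:
    structure: str
    attribute: str
    value: str


def validate(character: Character) -> bool:
    """Return True if the character uses only terms defined in VOCABULARY."""
    allowed = VOCABULARY.get(character.structure, {}).get(character.attribute, set())
    return character.value in allowed


description = [
    Character("leaf", "surface", "hairy"),
    Character("leaf", "shape", "lanceolate"),
    Character("flower", "colour", "mostly white"),  # free text, not in the vocabulary
]

# Characters that cannot be interpreted without an agreed definition.
unrecognised = [c for c in description if not validate(c)]
```

A free-text value such as "mostly white" fails validation, which is the kind of case where definitions and exemplars stored alongside the data would help.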

slide-5
SLIDE 5

[Figure: classification tree with node labels Globba, Cerantera, Globba, G pendula, G albiflora, G siamensis]

Classifying and naming plants

➤Specimens classified into taxa then named by set of rules

[Figure: rank labels genus, section, species; classification tree with node labels Globba, Ceratanthera, Marantella, G pendula, G calophylla, G siamensis]

➤Revisions are common

➤Taxa or specimens may appear in many classifications simultaneously

What about this specimen – what species is it?

Are these two taxa (concepts) the same?

There is over 250 years of legacy data contributing to this problem...

Retrieve info on G albiflora?
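The questions above turn on the fact that a name alone does not identify a taxon: the same name can be attached to different sets of specimens in different classifications. A minimal sketch, assuming a taxon concept is modelled as a name plus its source classification and the specimens it includes. The classification labels and specimen identifiers are invented for illustration and are not real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    name: str
    classification: str   # hypothetical label for the source classification
    specimens: frozenset  # hypothetical specimen identifiers


def same_concept(a: TaxonConcept, b: TaxonConcept) -> bool:
    """Treat two concepts as the same only if they include the same specimens."""
    return a.specimens == b.specimens


def concepts_for(name: str, concepts: list) -> list:
    """A query by name returns every concept that carries that name."""
    return [c for c in concepts if c.name == name]


concepts = [
    TaxonConcept("G pendula", "classification A", frozenset({"s1", "s2"})),
    TaxonConcept("G albiflora", "classification A", frozenset({"s3"})),
    # In this invented revision, specimen s3 is moved into G pendula.
    TaxonConcept("G pendula", "classification B", frozenset({"s1", "s2", "s3"})),
]

hits = concepts_for("G pendula", concepts)
```

Here a query for "G pendula" returns two concepts that share a name but not a circumscription, so data recorded against one cannot safely be assumed to apply to the other.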

slide-6
SLIDE 6

Multiple Overlapping Classifications

slide-7
SLIDE 7

What are the DB issues?

➤ DBMSs don’t provide sufficient semantic mechanisms to support required application functionality…….

➤ Orthogonality of classification and data

➤Objects are not designed to be classified

➤ Support for trees/graphs

➤Multiple overlapping classifications -> a directed acyclic graph

➤Nodes (taxa or specimens) are complex objects

➤Levels (ranks) contain information

➤Each classification is independent from all the others

➤ Support for traceability

➤Rationale for classification is important

➤ Support domain specific rules

➤data derivation

➤constraints
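The requirements above can be sketched as a small in-memory structure: each classification is an independent set of parent-child edges over shared nodes, every placement records its rationale, and a domain rule is checked on insertion. This is an illustration under assumed names (`RANKS`, `Classification`, `place`, `parents_across`) with made-up placements, not a description of any existing DBMS feature.

```python
from dataclasses import dataclass, field

RANKS = ["genus", "section", "species", "specimen"]  # assumed order, highest first


@dataclass
class Classification:
    author: str
    parent: dict = field(default_factory=dict)     # child -> parent in this classification
    rank: dict = field(default_factory=dict)       # node -> rank
    rationale: dict = field(default_factory=dict)  # child -> reason for the placement

    def add_node(self, node, rank):
        self.rank[node] = rank

    def place(self, child, parent, why):
        # Domain rule: a child must sit at a lower rank than its parent.
        if RANKS.index(self.rank[child]) <= RANKS.index(self.rank[parent]):
            raise ValueError(f"{child} cannot be placed under {parent}")
        self.parent[child] = parent
        self.rationale[child] = why  # traceability: keep the reason with the edge

    def lineage(self, node):
        """Walk from a node up to the root of this classification."""
        path = [node]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path


def parents_across(node, classifications):
    """Across classifications a node may have several parents, giving a DAG."""
    return {c.author: c.parent[node] for c in classifications if node in c.parent}


a, b = Classification("A"), Classification("B")
for c in (a, b):
    c.add_node("Globba", "genus")
    c.add_node("G pendula", "species")
    c.add_node("specimen-1", "specimen")
    c.place("G pendula", "Globba", "hypothetical rationale")
a.add_node("G albiflora", "species")
a.place("G albiflora", "Globba", "hypothetical rationale")
a.place("specimen-1", "G albiflora", "hypothetical rationale: flower colour")
b.place("specimen-1", "G pendula", "hypothetical rationale: species merged in revision")
```

Each `Classification` is a tree on its own; the graph only becomes a DAG when several classifications are viewed together, as `parents_across` does.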

slide-8
SLIDE 8

What have I learned….

➤ Need to understand the domain problems to provide good solutions.

➤ Accurately representing data from observations or experiments is vital if the data is to be of use in future

➤ Database technology plays an important role in ensuring this

➤ Data modelling research and incorporation into DB systems

➤Lots of semantic models – but still not in DBs in usable/efficient form

➤ Query languages suitable for end-users with complex queries

➤ Data visualisation tools

➤query as fast as I could by brushing over a visualisation

➤ Ontology problem

➤how can we get the meaning into the DB in a manageable way?

➤ Data Provenance

➤annotations, workflow

➤ Core DB

➤all these extra semantics mean even better performance is needed.

slide-9
SLIDE 9

SEEK

Science Environment for Ecological Knowledge (SEEK)

[Architecture figure. Layers: Analysis and Modeling Layer; Semantic Mediation Layer; Information Access Layer. Components: Algorithm Execution System; Semantic Mediation Engine; Mediated Data View; Integrated Analysis View; Output; Domain Map; Species Analyst; SRB; Metacat; UDDI. Element labels: S1, S2 … Sn; M1, M2 … Mn; CM1, CM2 … CMn; PM1 … PMn; A1 … An.]

This incorporates the taxonomy problem

5-year NSF-funded project

slide-10
SLIDE 10

Questions

➤ Is there original DB research in biodiversity / ecological informatics?

➤ Yes - Many of the same general problems but with challenging difficulties

➤ Do they need off-the-shelf applications?

➤ Yes - but they’re not there in any usable form

➤ Organizational infrastructure for supporting data re-use?

➤ Yes - vital they re-use concepts accurately

➤ Training for ecologists in using systems?

➤ Yes - but not inappropriate ones that waste their time...

➤ More ecologists doing domain research?

➤ Yes - but with tools to help them do the job more efficiently and accurately…..