Identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life


  1. Identification of Environment Ontology terms in text and the annotation of the Encyclopedia of Life
     http://www.environments-eol.blogspot.com/
     Evangelos Pafilis 1*, Sune Frankild 2, Lucia Fanini 1, Sarah Faulwetter 1, Christina Pavloudi 1, Julia Schnetzer 3, Aikaterini Vasileiadou 1, Umer Ijaz 4, Christos Arvanitidis 1, Robert Stevenson 5, Lars Juhl Jensen 2
     1 HCMR, Crete, GR; 2 CPR-NNF, Copenhagen, DK; 3 MPI-MM, Bremen, DE; 4 Uni of Glasgow, UK; 5 Uni of Massachusetts, Boston, US
     *: pafilis@hcmr.gr, http://epafilis.info
     TDWG – 27th Oct 2014 – Jönköping

  2. Species – Environments

  3. http://eol.org/ Parr CS, et al. (2014) The Encyclopedia of Life v2: Providing Global Access to Knowledge About Life on Earth. Biodiversity Data Journal 2: e1079

  4. Encyclopedia of Life (EOL) http://www.eol.org
     • one-stop shop for biodiversity knowledge
     • over 3 million taxa
     • http://eol.org/info/discover_what

  6. Species Descriptions (e.g. “Biology Description”, “Ecology”, “Habitat”)

  7. Species Descriptions (e.g. “Biology Description”, “Ecology”, “Habitat”)
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Biology
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Conservation
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Description
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Dispersal
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Distribution
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Ecology
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Habitat
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#LifeCycle
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Migration
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#Reproduction
     http://rs.tdwg.org/ontology/voc/SPMInfoItems#TrophicStrategy
     http://www.eol.org/voc/table_of_contents#Wikipedia
     More info: http://eol.org/info/98 (“EOL Subject Types”)

  8. Information in Free Text http://eol.org/data_objects/31415353

  10. Information in Free Text http://eol.org/data_objects/31415353
      Terms identified in the text:
      ID: ENVO:00000192, Name: mudflat
      ID: ENVO:00000020, Name: lake

  11. Literature Mining

  12. processing text to extract facts of interest

  13. terrestrial, aquatic, marine, lagoon, coral reef, sediment, freshwater, soil

  14. ENVIRONMENTS

  15. ENVIRONMENTS http://environments.hcmr.gr http://environments-eol.blogspot.gr/
      ● Dictionary based, open source
      ● Environment Ontology (EnvO)-based
      ● Fast performance
      ● 4000 PubMed abstracts / second *
      ● Based on the SPECIES name recognition tagger (Pafilis et al., PLOS ONE)
      ● E600 gold standard: corpus of EOL Species pages
      ● Recognition accuracy, mention level:
        - F1: 82.0%
        - 87.1% of the TPs: exact ID among predicted ones
      Pafilis E et al. (2013) The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text. PLoS ONE 8(6): e65390
      *: based on a single-thread run on an Intel 2.27 GHz, 24 GB RAM machine, processing a set of 536,052 abstracts
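
For readers unfamiliar with the mention-level F1 figure on slide 15, the following is a minimal sketch of how precision, recall, and F1 are usually derived from true positive, false positive, and false negative counts. The function name and the counts are hypothetical and are not taken from the E600 evaluation; the slide reports only the resulting F1.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) computed from mention-level counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, chosen only to exercise the function.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```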

  16. EnvO: source of environment descriptor names and synonyms
      http://environmentontology.org, ~1600 terms (June 2013)
      Branches shown: biome, environmental feature, environmental material, environmental condition, habitat
      Based on slides by Dr. Pier Luigi Buttigieg, AWI, Bremerhaven, Germany

  17. ENVIRONMENTS – Improving Accuracy
      ● Increasing matches in text
        - Orthographic variation supported, e.g. freshwater, fresh water, and fresh-water
        - Case-insensitive matching
        - Synonym generation to reflect the way environment descriptive terms are mentioned in text (both generic and EnvO specific):
          Action: add a variant in which non-informative words have been removed. Examples: epipelagic zone → epipelagic; estuarine biome → estuarine
          Action: plural form addition. Example: sediment → sediments
          Action: adjective form addition. Example: lagoon → lagoonal
      ● Preventing overmatching (i.e. avoiding increased FP)
        - “Stopword list” (e.g. spring, well, range)
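
The ideas on slide 17 can be sketched in a few lines of Python. This is an illustrative toy, not the ENVIRONMENTS tagger's implementation: the dictionary entries, the `EX:` identifiers, and the function names are hypothetical, the plural rule is deliberately naive, and adjective forms (lagoon → lagoonal) are left out because they need a lookup table rather than a rule.

```python
import re

NON_INFORMATIVE = {"zone", "biome"}      # words dropped to form shorter variants
STOPWORDS = {"spring", "well", "range"}  # ambiguous names that are never matched

# Hypothetical dictionary: name -> placeholder ID (not real EnvO identifiers).
DICTIONARY = {
    "fresh water": "EX:1",
    "lake": "EX:2",
    "epipelagic zone": "EX:3",
    "well": "EX:4",
}


def variants(name: str) -> set[str]:
    """Generate simple surface variants of one dictionary name."""
    words = name.lower().split()
    full = " ".join(words)
    forms = {full}
    # Variant with non-informative words removed (epipelagic zone -> epipelagic).
    trimmed = [w for w in words if w not in NON_INFORMATIVE]
    if trimmed:
        forms.add(" ".join(trimmed))
    # Naive plural of the full name (sediment -> sediments).
    if not full.endswith("s"):
        forms.add(full + "s")
    # Only exact stopword forms are blocked in this sketch.
    return {f for f in forms if f not in STOPWORDS}


def to_pattern(form: str) -> re.Pattern:
    """Case-insensitive pattern allowing space, hyphen, or nothing between words."""
    body = r"[\s-]?".join(re.escape(w) for w in form.split())
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def tag(text: str, dictionary: dict[str, str]) -> list[tuple[int, int, str, str]]:
    """Return (start, end, matched text, ID) tuples; end offsets are inclusive."""
    hits = []
    for name, term_id in dictionary.items():
        for form in variants(name):
            for m in to_pattern(form).finditer(text):
                hits.append((m.start(), m.end() - 1, m.group(0), term_id))
    return sorted(set(hits))


TEXT = "Found in Fresh-water lakes and the epipelagic zone; feeds well."
for hit in tag(TEXT, DICTIONARY):
    print(hit)
```

The sketch reports overlapping matches (for instance both the long and the shortened form of a name) and makes no attempt at longest-match resolution.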

  18. Scope
      ● EnvO parts not included: species, tissues, foods
      Limitations – Known Issues
      ● negation not supported
      ● conflicts with anatomy terms (e.g. mouth, blowhole)

  19. ENVIRONMENTS – Sample Output
      File Name                        Start coord  End coord  Match text  EnvO ID
      eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000192
      eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00002297
      eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000043
      eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000000
      eol_documents_ascii_nonHTML.txt  346289845    346289853  mud flats   ENVO:00000012
      eol_documents_ascii_nonHTML.txt  346289871    346289873  mud         ENVO:01000001
      eol_documents_ascii_nonHTML.txt  346289871    346289873  mud         ENVO:00010483
      eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000180
      eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000191
      eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00002297
      eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000176
      eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000000
      eol_documents_ascii_nonHTML.txt  346289905    346289910  mounds      ENVO:00000477
      Tags corresponding to the “Habitat” text data object http://eol.org/data_objects/31415353 of EOL taxon Phoenicopterus ruber (Greater Flamingo): http://eol.org/pages/913221
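
As a reading aid for the sample output on slide 19, here is a small sketch that groups rows by matched span. It assumes the tagger output is tab-separated with the five columns in the order file name, start, end, match text, EnvO ID; the slide does not state the delimiter, so that is an assumption, as is the function name. The three rows are copied from the slide.

```python
import csv
import io
from collections import defaultdict

# Three rows from the slide, assumed here to be tab-separated.
SAMPLE = (
    "eol_documents_ascii_nonHTML.txt\t346289845\t346289853\tmud flats\tENVO:00000192\n"
    "eol_documents_ascii_nonHTML.txt\t346289845\t346289853\tmud flats\tENVO:00002297\n"
    "eol_documents_ascii_nonHTML.txt\t346289871\t346289873\tmud\tENVO:01000001\n"
)


def group_tags(tsv_text: str) -> dict[tuple[str, int, int, str], list[str]]:
    """Map each (file, start, end, match text) span to the EnvO IDs tagged on it."""
    grouped = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for file_name, start, end, text, envo_id in reader:
        grouped[(file_name, int(start), int(end), text)].append(envo_id)
    return dict(grouped)


for (file_name, start, end, text), ids in group_tags(SAMPLE).items():
    # The coordinates on the slide are inclusive: end - start + 1 == len(text).
    print(text, start, end, end - start + 1 == len(text), ids)
```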

  20. ENVIRONMENTS – Sample Output (same table and data object as slide 19), with the added note: Traversing all IS_A, PART_OF Relationships in EnvO
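
The traversal of IS_A and PART_OF relationships mentioned on slide 20 can be illustrated with a breadth-first walk over parent links. This is a sketch under stated assumptions: the toy graph, its `EX:` identifiers, and the function name are hypothetical and do not reproduce the real EnvO hierarchy or the tagger's code.

```python
from collections import deque

# Hypothetical toy ontology: child -> list of (relation, parent) edges.
PARENTS = {
    "EX:mudflat": [("is_a", "EX:flat")],
    "EX:flat": [("is_a", "EX:landform")],
    "EX:landform": [("is_a", "EX:feature")],
    "EX:lake_shore": [("part_of", "EX:lake"), ("is_a", "EX:feature")],
}


def ancestors(term, parents=PARENTS, relations=("is_a", "part_of")):
    """Collect every term reachable from `term` via the given relation types."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for relation, parent in parents.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# A match on the toy "mudflat" term would also be tagged with these ancestors.
print(sorted(ancestors("EX:mudflat")))
```

Restricting `relations` to `("is_a",)` would follow only subclass links, which is one way to see how much the PART_OF edges contribute to the expanded tag set.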

  21. Monthly Updates
      In collaboration with: Dr. Jennifer Hammock, Mr. Patrick Leary, Dr. Katja Schulz, Dr. Cyndy Parr
