abbreviation detection for biomedical articles

Abbreviation detection for biomedical articles by Sonja Kenari



  1. Abbreviation detection for biomedical articles by Sonja Kenari

  2. Agenda: Introduction, Background, Implementation, Results, Further Improvements

  3. Introduction: Full project description. COVID-19 Open Research Dataset Challenge (CORD-19): What do we know about vaccines and therapeutics? Components shown: abbreviation detection, dictionary tagger, relationship extraction, NER.

  4. Introduction: Abbreviation Detection. spaCy: Python library for NLP. Abbreviation detection makes it easier to: find articles of interest faster; keep up with the number of new abbreviations.

  5. Background: Abbreviation Detection. Pre-trained models by spaCy. scispaCy: AbbreviationDetector. Detects abbreviations (short form) and definitions (long form). Accuracy?

  6. Implementation: Generate Pubannotations data subset [json]. Items shown: pubannotation [json]; metadata file [csv]; 100 out of 60,000 articles.

  7. Implementation: Generating files of abbreviations.
     Web scraping: metadata file [csv] -> url -> HTML parser (BeautifulSoup) -> csv files.
     scispaCy: data subset [json] -> full texts -> AbbreviationDetector -> csv files.
     Output file format: csv files (abbreviation, abbreviations).

  8. Implementation: Evaluation. Compare the 2: detected abbreviations with spaCy [csv] and detected abbreviations with web scraping [csv].
     (Number of unique short forms detected by spaCy) / (Number of short forms detected by web scraping) = (%)
     (Number of unique long forms detected by spaCy) / (Number of long forms detected by web scraping) = (%)

  9. Result: Abbreviation lists in 20 out of 100. Short forms hit rate: highest 87.5%, lowest 25%. Long forms hit rate: highest 52.6%, lowest 0%. Notable faults: spaCy weak on long form; text from json files not updated after url articles; faults in denotation extraction.

  10. Further Improvements. spaCy: improve the results. Optimize programs: make more time efficient. Pubannotations: update data. Extract from web scraper instead of full text extraction.

  11. Thank you for listening! Questions? Sonja Kenari, nat14sta@student.lu.se, 2020-05-29
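
The hit rate from the evaluation slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's program. It assumes that a "hit" is a form detected by spaCy that also appears, as an exact string match after trimming whitespace, in the web-scraped list for the same article; the slide itself only gives a ratio of counts. The function name `hit_rate` and the sample abbreviations are hypothetical.

```python
def hit_rate(detected, reference):
    """Percentage of reference forms that also appear among the detected forms.

    detected:  forms found by the detector (e.g. spaCy output), duplicates allowed
    reference: forms found by web scraping for the same article
    Assumption: a hit is an exact string match after trimming whitespace.
    """
    detected_set = {form.strip() for form in detected if form.strip()}
    reference_set = {form.strip() for form in reference if form.strip()}
    if not reference_set:
        return None  # no reference abbreviations, so the rate is undefined
    hits = detected_set & reference_set
    return 100.0 * len(hits) / len(reference_set)


# Hypothetical example data, not taken from the presentation.
spacy_short = ["ACE2", "RBD", "ACE2", "PCR"]
scraped_short = ["ACE2", "RBD", "MERS", "SARS"]
print(hit_rate(spacy_short, scraped_short))
```

The same function applies to long forms by passing the two lists of definitions instead. Returning `None` for an article without a scraped abbreviation list keeps such articles out of the highest and lowest figures rather than counting them as 0%.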
