Abbreviation detection for biomedical articles by Sonja Kenari




SLIDE 1

by Sonja Kenari

Abbreviation detection for biomedical articles

SLIDE 2

Introduction Background Implementation Results Further Improvements

Agenda

SLIDE 3

Abbreviation detection Dictionary tagger NER Relationship extraction

Introduction

Full project description

COVID-19 Open Research Dataset Challenge (CORD-19):

What do we know about vaccines and therapeutics?


SLIDE 4

Introduction

Abbreviation Detection

?

spaCy Python library for NLP

Makes it easier to:
Find articles of interest faster
Keep up with the number of new abbreviations

Abbreviation detection


SLIDE 5

Background

Abbreviation Detection

scispaCy:

AbbreviationDetector

Pre-trained models by spaCy
Detect: abbreviations & definitions
Accuracy?

long form short form
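A minimal sketch of how the AbbreviationDetector might be called to collect short form / long form pairs. The model name `en_core_sci_sm` and the helper names `unique_pairs` and `detect_abbreviations` are assumptions for illustration; the slides do not say which model or code was used.

```python
def unique_pairs(pairs):
    """Deduplicate (short form, long form) pairs, keeping first-seen order."""
    seen = set()
    result = []
    for short_form, long_form in pairs:
        key = (short_form, long_form)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


def detect_abbreviations(text, model_name="en_core_sci_sm"):
    """Run scispaCy's AbbreviationDetector on one text (model name is an assumption)."""
    # Imported here so unique_pairs can be used without scispaCy installed.
    import spacy
    from scispacy.abbreviation import AbbreviationDetector  # noqa: F401 (registers the pipe)

    nlp = spacy.load(model_name)
    nlp.add_pipe("abbreviation_detector")
    doc = nlp(text)
    # Each item in doc._.abbreviations is the short-form span;
    # the matching definition is available as ._.long_form.
    return unique_pairs(
        (abrv.text, abrv._.long_form.text) for abrv in doc._.abbreviations
    )
```

Calling `detect_abbreviations(full_text)` on an article's full text would then return one pair per distinct abbreviation, which is the shape needed for a per-article csv file.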


SLIDE 6

data subset [json]
metadata file [csv]
pubannotation [json]
100 out of 60,000 articles

Implementation

Generate Pubannotations
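As a rough illustration of what a generated pubannotation could look like, the sketch below builds the `text` plus `denotations` structure, with character offsets in `span` and a label in `obj`. The labels `LF` and `SF`, the helper names, and the example sentence are hypothetical and not taken from the project.

```python
import json


def make_denotation(text, surface, label, index):
    """Locate the first occurrence of surface in text and describe it as a denotation."""
    begin = text.find(surface)
    if begin < 0:
        raise ValueError(f"{surface!r} not found in text")
    return {
        "id": f"T{index}",
        "span": {"begin": begin, "end": begin + len(surface)},
        "obj": label,
    }


def to_pubannotation(text, labelled_surfaces):
    """Build a PubAnnotation-style dict from (surface string, label) pairs."""
    denotations = [
        make_denotation(text, surface, label, i)
        for i, (surface, label) in enumerate(labelled_surfaces, start=1)
    ]
    return {"text": text, "denotations": denotations}


# Hypothetical example sentence and labels (LF = long form, SF = short form).
example_text = "Severe acute respiratory syndrome (SARS) is a viral respiratory disease."
example = to_pubannotation(
    example_text,
    [("Severe acute respiratory syndrome", "LF"), ("SARS", "SF")],
)
example_json = json.dumps(example, indent=2)
```

Using `text.find` only picks up the first occurrence of each surface string; a real pipeline would more likely take the offsets directly from the detector's spans.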


SLIDE 7

metadata file [csv] url HTML parser

BeautifulSoup

abbreviation, abbreviations, Abbreviation, Abbreviations

csv files web scraping

Implementation

Generating files of abbreviations

data subset [json]

full texts AbbreviationDetector csv files scispaCy

Output file format
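The web scraping branch could be sketched roughly as follows, covering only the steps after the HTML has been fetched and parsed: matching the section heading against the four keywords on the slide, splitting the list into pairs, and writing csv. The `short: long; short: long` list layout, the column names, and the function names are assumptions; real articles format their abbreviation lists in many ways, and the BeautifulSoup step itself is left out.

```python
import csv
import io

# The four heading variants listed on the slide.
HEADINGS = {"abbreviation", "abbreviations", "Abbreviation", "Abbreviations"}


def is_abbreviation_heading(heading):
    """True if a section heading is one of the abbreviation-list keywords."""
    return heading.strip().rstrip(":") in HEADINGS


def parse_abbreviation_list(section_text):
    """Split 'SF: long form; SF: long form' text into (short, long) pairs.

    The separator layout is an assumption, not something the slides specify.
    """
    pairs = []
    for entry in section_text.split(";"):
        short_form, sep, long_form = entry.partition(":")
        short_form = short_form.strip()
        long_form = long_form.strip().rstrip(".")
        if sep and short_form and long_form:
            pairs.append((short_form, long_form))
    return pairs


def pairs_to_csv(pairs):
    """Serialize the pairs as csv text with a header row (column names assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["short_form", "long_form"])
    writer.writerows(pairs)
    return buffer.getvalue()
```

Writing both branches to the same two-column layout keeps the later comparison a plain lookup on column values.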


SLIDE 8

detected abbreviations with spaCy [csv]
detected abbreviations with web scraping [csv]

Compare the 2:

short forms hit rate (%) = Number of unique short forms detected by spaCy / Number of short forms detected by web scraping

long forms hit rate (%) = Number of unique long forms detected by spaCy / Number of long forms detected by web scraping

Implementation

Evaluation
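One possible reading of the comparison, as a sketch: treat the web-scraped list as the reference and count a spaCy detection as a hit only when the same string also appears in that list. The exact-match rule and the function name `hit_rate` are assumptions; the slides do not state how forms were matched.

```python
def hit_rate(spacy_forms, scraped_forms):
    """Percentage of web-scraped forms that spaCy also detected (exact string match)."""
    reference = set(scraped_forms)
    if not reference:
        raise ValueError("no web-scraped forms to compare against")
    # Duplicates in the spaCy output collapse, so only unique forms count.
    hits = set(spacy_forms) & reference
    return 100.0 * len(hits) / len(reference)


# Hypothetical inputs: two of the four reference short forms are found.
example_rate = hit_rate(
    ["ACE2", "ARDS", "ACE2", "WHO"],
    ["ACE2", "ARDS", "ICU", "RNA"],
)
```

The same function applies to long forms, although exact matching is strict there: small differences in casing or hyphenation would count as misses.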


SLIDE 9

Highest: 87.5% Lowest: 25%

short forms hit rate

Highest: 52.6% Lowest: 0%

long forms hit rate

spaCy weak on long form
text in the json files not updated to match the articles at the urls
faults in denotation extraction

notable faults

20 out of 100 Abbreviation lists in

Result


SLIDE 10

web scraper

Update data

Further Improvements

spaCy

Improve the results

Extract from Pubannotations

Instead of full text extraction

Optimize programs

Make more time-efficient


SLIDE 11

Thank you for listening!

Questions...?

Sonja Kenari nat14sta@student.lu.se 2020-05-29