A SIMPLE ALGORITHM FOR IDENTIFYING ABBREVIATION DEFINITIONS IN BIOMEDICAL TEXT
ARIEL S. SCHWARTZ
Computer Science Division, University of California, Berkeley, Berkeley, CA 94720
sariel@cs.berkeley.edu

MARTI A. HEARST
SIMS, University of California, Berkeley, Berkeley, CA 94720
hearst@sims.berkeley.edu

Abstract
The volume of biomedical text is growing at a fast rate, creating challenges for humans and computer systems alike. One of these challenges arises from the frequent use of novel abbreviations in these texts, thus requiring that biomedical lexical ontologies be continually updated. In this paper we show that the problem of identifying abbreviations’ definitions can be solved with a much simpler algorithm than that proposed by other research efforts. The algorithm achieves 96% precision and 82% recall on a standard test collection, which is at least as good as existing approaches. It also achieves 95% precision and 82% recall on another, larger test set. A notable advantage of the algorithm is that, unlike other approaches, it does not require any training data.

1 Introduction

There has been an increased interest recently in techniques to automatically extract information from biomedical text, and particularly from MEDLINE abstracts [3,4,7,15]. The size and growth rate of biomedical literature creates new challenges for researchers who need to keep up to date. One specific issue is the high rate at which new abbreviations are introduced in biomedical texts. Existing databases, ontologies, and dictionaries must be continually updated with new abbreviations and their definitions. In an attempt to help resolve the problem, new techniques have been introduced to automatically extract abbreviations and their definitions from MEDLINE abstracts.

In this paper we propose a new, simple, fast algorithm for extraction of abbreviations from biomedical text. The scope of the task addressed here is the same as the one described in Pustejovsky et al. [14]: identify <“short form”, “long form”> pairs where there exists a mapping (of any kind) from characters in the short form to characters in the long form.[a]
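
To make the task definition concrete, the following minimal sketch tests whether a candidate pair admits one possible character mapping. It is an illustration only and is not the algorithm proposed in this paper: the function name `has_character_mapping` is hypothetical, and the in-order, case-insensitive matching rule is an assumption chosen for simplicity, since the task definition allows a mapping of any kind.

```python
def has_character_mapping(short_form: str, long_form: str) -> bool:
    """Check one simple kind of mapping (an assumption, not the paper's rule):
    every letter or digit of short_form appears in long_form, in the same
    order, ignoring case."""
    chars = [c for c in short_form.lower() if c.isalnum()]
    if not chars:
        # A short form with no letters or digits cannot be mapped.
        return False
    remaining = iter(long_form.lower())
    # `c in remaining` advances the iterator past the match, so each
    # character must be found after the previous one.
    return all(c in remaining for c in chars)


if __name__ == "__main__":
    candidates = [
        ("HSF", "heat shock transcription factor"),
        ("TTF-1", "thyroid transcription factor 1"),
        ("ATP", "heat shock factor"),
    ]
    for short_form, long_form in candidates:
        print(short_form, "|", long_form, "|",
              has_character_mapping(short_form, long_form))
```

Under these assumptions the first two candidate pairs are accepted and the third is rejected, because "heat shock factor" contains no "p". A check this permissive accepts many spurious pairs, so it describes the scope of the task rather than a usable extraction rule.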
[a] Throughout the paper we use the terms “short form” and “long form” interchangeably with