SLIDE 8 Information Extraction in the Medical Domain
Models trained on newswire data do not adapt well to the medical domain: the vocabulary, sentence structure, and document structure all differ. More importantly, the medical domain offers a chance to do better than the
general newswire domain.
Background Knowledge: Narrow domain, with many manually curated knowledge base (KB)
resources that can help with concept identification and disambiguation.
UMLS: A large biomedical KB, with semantic types and relationships between
concepts.
MeSH: A large thesaurus of medical vocabulary.
SNOMED CT: A comprehensive clinical terminology.
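One simple way curated resources can help identification is dictionary lookup. The sketch below is only an illustration under stated assumptions: TOY_LEXICON is a hand-written stand-in and is not loaded from UMLS, MeSH, or SNOMED CT, and the function name tag_with_lexicon is hypothetical. It tags token spans with a semantic type using greedy longest-match lookup.

```python
# Toy lexicon standing in for a curated KB. The entries are hand-written
# for illustration and are NOT loaded from UMLS, MeSH, or SNOMED CT.
TOY_LEXICON = {
    ("myocardial", "infarction"): "Disease or Syndrome",
    ("chest", "pain"): "Sign or Symptom",
    ("aspirin",): "Pharmacologic Substance",
}
MAX_LEN = max(len(key) for key in TOY_LEXICON)


def tag_with_lexicon(tokens):
    """Greedy longest-match lookup.

    Returns a list of (start, end, semantic_type) spans, with `end` exclusive.
    """
    lowered = [t.lower() for t in tokens]
    spans = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_LEN, len(lowered) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in TOY_LEXICON:
                spans.append((i, i + n, TOY_LEXICON[key]))
                i += n
                break
        else:
            # No lexicon entry starts at this token; move on.
            i += 1
    return spans


tokens = "Patient with chest pain given aspirin".split()
spans = tag_with_lexicon(tokens)
```

A real system would likely use such matches as features for a learned tagger rather than as final decisions, since dictionary hits alone do not resolve ambiguity.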
Structure: Medical text has more structure that can be exploited.
Discourse structure: Concepts in the section “Principal Diagnosis” are more likely
to be “medical problems”.
EHRs (electronic health records) have some internal structure: doctors, one patient, family members.
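The discourse-structure idea can be sketched as a section feature attached to every line of a note. This is a minimal sketch, not a description of any particular system: the header pattern (a short line ending in a colon) and the names HEADER, lines_with_section, and section_feature are assumptions made for illustration, and real clinical notes vary widely in how sections are marked.

```python
import re

# Assumed header convention: a line such as "Principal Diagnosis:" that
# starts with a capital letter and ends with a colon.
HEADER = re.compile(r"^([A-Z][A-Za-z ]+):$")


def lines_with_section(note):
    """Pair each non-empty content line with the section it appears in."""
    section = "NONE"  # used for text before the first header
    pairs = []
    for raw in note.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            section = match.group(1).strip().upper()
            continue
        pairs.append((section, line))
    return pairs


def section_feature(section):
    """Indicator feature a classifier could use alongside lexical features."""
    return {"section=" + section: 1.0}


note = "Principal Diagnosis:\nPneumonia\n\nMedications:\nAmoxicillin"
pairs = lines_with_section(note)
features = [section_feature(section) for section, _ in pairs]
```

With a feature like this, a classifier can learn that concepts under "PRINCIPAL DIAGNOSIS" are more likely to be medical problems, instead of that preference being hard-coded.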