The interplay of form and meaning in complex medical terms: Evidence from clinical Dutch

slide-1
SLIDE 1

The interplay of form and meaning in complex medical terms: Evidence from clinical Dutch

Leonie Grön, Ann Bertels & Kris Heylen

LAW-MWE-CxG-2018, 26 August 2018, Santa Fe

slide-2
SLIDE 2

Specialized terminologies are dominated by Multi-Word Expressions

(cf. e.g. Daille, 1994; De Hertog & Heylen, 2012)

slide-3
SLIDE 3


slide-4
SLIDE 4


slide-5
SLIDE 5


slide-6
SLIDE 6
slide-7
SLIDE 7

[Figure: concept diagram with the relation label property_of]

slide-8
SLIDE 8
slide-9
SLIDE 9

Finding attributes: site (abdomen, head), severity (mild, moderate)

Procedure attributes: site (abdomen, head), device used (catheter)

slide-10
SLIDE 10
Obesity + abdomen:

  • abdominale obesitas ‘abdominal obesity’
  • abdomen obees ‘abdomen obese’
  • obesitas ter hoogte van abdomen ‘obesity at the abdomen’
  • obesitas abdomen ‘obesity abdomen’
  • obesitas abdominaal ‘obesity abdominal’

slide-11
SLIDE 11

BUT:

  • there is a mutual attraction between syntactic and semantic structures
  • specialized information is entrenched in linguistic structures
  • grammatical features can indicate conceptual relations

(cf. Schulze & Römer, 2008; Faber & León-Araúz, 2016; ten Hacken, 2015)

slide-12
SLIDE 12

Is there a patterning of the surface form and the conceptual features of complex medical terms?

slide-13
SLIDE 13

SNOMED ID and term:

  • 249533007 abdomen obees
  • 249533007 abdominale obesitas
  • 249533007 abd obesitas

Corpus of EHRs: 14,999 consultations, 500 patients

Annotation of medical terms with SNOMED codes:

  • 274,082 entities, 15,025 unique terms, 7,687 concepts
  • annotation of 4,426 consultations (171 patients)
  • validation of term-concept associations
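The corpus statistics above (entities, unique terms, concepts) all follow from a flat list of term-concept annotations. The following is a minimal sketch, assuming each annotation is a (surface term, SNOMED concept ID) pair and that terms are compared case-insensitively. The `Annotation` type and the two functions are hypothetical names and are not taken from the authors' pipeline. The sample reuses the slide's three variants of 249533007, plus one repeated mention.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One annotated mention: its surface form and the SNOMED CT concept ID (hypothetical type)."""
    term: str
    concept_id: str


def corpus_stats(annotations: list[Annotation]) -> dict[str, int]:
    """Count annotated mentions, distinct surface terms and distinct concepts."""
    return {
        "entities": len(annotations),  # every mention counts, repeats included
        "unique_terms": len({a.term.lower() for a in annotations}),  # case-insensitive (assumption)
        "concepts": len({a.concept_id for a in annotations}),
    }


def variants_per_concept(annotations: list[Annotation]) -> dict[str, set[str]]:
    """Group the distinct surface variants by concept ID."""
    variants: dict[str, set[str]] = {}
    for a in annotations:
        variants.setdefault(a.concept_id, set()).add(a.term.lower())
    return variants


sample = [
    Annotation("abdomen obees", "249533007"),
    Annotation("abdominale obesitas", "249533007"),
    Annotation("abd obesitas", "249533007"),
    Annotation("abdominale obesitas", "249533007"),  # toy repeat of an existing term
]
print(corpus_stats(sample))  # {'entities': 4, 'unique_terms': 3, 'concepts': 1}
```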

slide-14
SLIDE 14

Retrieval of MWEs

  • findings: SNOMED term obesity; Dutch variants obesitas, adipositas; lexical stems obes, adipo; Σ 59,731
  • procedures: SNOMED term diagnostic ultrasonography; Dutch variants echografie, sonografie; lexical stems echo, sono; Σ 63,559
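Stem-based retrieval can be sketched as a search for tokens that contain one of the lexical stems. This is a sketch under stated assumptions, not the authors' procedure. The `STEMS` table and the `retrieve` function are hypothetical. The tokenizer (lowercase, split on non-word characters) is assumed. Matching is by substring rather than prefix, so that compounds such as nierecho are found. The drawback is that a plain substring match on obes misses the spelling obees and can also over-match, so retrieved candidates would still need filtering. The note text is an invented toy example.

```python
import re

# Hypothetical stem inventory per semantic type, taken from the slide's examples.
STEMS: dict[str, tuple[str, ...]] = {
    "finding": ("obes", "adipo"),
    "procedure": ("echo", "sono"),
}


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (an assumed, very simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def retrieve(text: str, semantic_type: str) -> list[str]:
    """Return tokens containing any stem for the given semantic type (substring match)."""
    stems = STEMS[semantic_type]
    return [tok for tok in tokenize(text) if any(stem in tok for stem in stems)]


note = "Nierecho gepland. Abdominale obesitas, adipositas."  # invented toy note
print(retrieve(note, "finding"))    # ['obesitas', 'adipositas']
print(retrieve(note, "procedure"))  # ['nierecho']
```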

slide-15
SLIDE 15

Included types of MWEs

Positions relative to the head noun: -3 -2 -1 head noun +1 +2 +3

  • ochtend ‘morning’ + hypo ‘hypoglycemia’
  • matinale ‘matinal’ hypo ‘hypoglycemia’
  • hypo ‘hypoglycemia’ met ‘with’ forse ‘strong’ convulsie ‘seizure’

slide-16
SLIDE 16

Included types of MWEs: compounds

  • ochtend ‘morning’ + hypo ‘hypoglycemia’

slide-17
SLIDE 17

Included types of MWEs: pre-modified noun phrases

  • matinale ‘matinal’ hypo ‘hypoglycemia’

slide-18
SLIDE 18

Included types of MWEs: post-modified noun phrases

  • hypo ‘hypoglycemia’ met ‘with’ forse ‘strong’ convulsie ‘seizure’
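The three phrase types can be told apart by where the modifiers sit relative to the head noun. The following is a minimal sketch, assuming a window of three tokens on either side of the head (as the slide's +1/+2/+3 positions suggest) and these hypothetical rules:

  • a single orthographic word is a compound
  • any token after the head makes the phrase post-modified
  • otherwise the phrase is pre-modified

The functions `window` and `phrase_type` are illustrative names. The compound is written as one word, ochtendhypo, as a Dutch compound would be; that spelling is an assumption.

```python
def window(tokens: list[str], head: int, size: int = 3) -> list[str]:
    """Tokens from position -size to +size around the head noun (clipped at the edges)."""
    return tokens[max(0, head - size): head + size + 1]


def phrase_type(mwe: list[str], head: int) -> str:
    """Classify an MWE by the position of its modifiers relative to the head noun.

    Assumed rules: one orthographic word -> compound; any token after the
    head -> post-modified NP; otherwise -> pre-modified NP.
    """
    if len(mwe) == 1:
        return "compound"
    if head < len(mwe) - 1:
        return "post-modified NP"
    return "pre-modified NP"


examples = [
    (["ochtendhypo"], 0),
    (["matinale", "hypo"], 1),
    (["hypo", "met", "forse", "convulsie"], 0),
]
for mwe, head in examples:
    print(" ".join(mwe), "->", phrase_type(mwe, head))
# ochtendhypo -> compound
# matinale hypo -> pre-modified NP
# hypo met forse convulsie -> post-modified NP
```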

slide-19
SLIDE 19

Annotation of MWEs at two levels:

  • formal: Penn Treebank tagset for biomedical text
  • conceptual: SNOMED semantic classes & attributes

(de Castilho et al., 2016; Warner et al., 2012; SNOMED International, 2018)

slide-20
SLIDE 20

diabetische retinopathie ‘diabetic retinopathy’

  • formal: JJ NN
  • conceptual: CAUSE FINDING

slide-21
SLIDE 21

injectie insuline ‘injection [of] insulin’

  • formal: NN NN
  • conceptual: PROCEDURE SUBSTANCE
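The two annotation levels in these examples align token by token, so one MWE can be stored as three parallel sequences. The following is a minimal sketch; the `AnnotatedMWE` class and its `pattern` method are hypothetical, not part of any annotation tool named here. The pair (PoS sequence, concept sequence) is used as the grammatico-semantic pattern of an expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedMWE:
    """An MWE annotated at two levels, one label per token at each level (hypothetical)."""
    tokens: tuple[str, ...]
    pos_tags: tuple[str, ...]   # formal level, e.g. Penn Treebank tags
    concepts: tuple[str, ...]   # conceptual level, e.g. SNOMED classes/attributes

    def __post_init__(self) -> None:
        # Both annotation layers must align one-to-one with the tokens.
        if not (len(self.tokens) == len(self.pos_tags) == len(self.concepts)):
            raise ValueError("tokens, pos_tags and concepts must have equal length")

    def pattern(self) -> tuple[tuple[str, ...], tuple[str, ...]]:
        """The grammatico-semantic pattern: (PoS sequence, concept sequence)."""
        return self.pos_tags, self.concepts


retinopathy = AnnotatedMWE(("diabetische", "retinopathie"), ("JJ", "NN"), ("CAUSE", "FINDING"))
injection = AnnotatedMWE(("injectie", "insuline"), ("NN", "NN"), ("PROCEDURE", "SUBSTANCE"))
print(retinopathy.pattern())  # (('JJ', 'NN'), ('CAUSE', 'FINDING'))
```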

slide-22
SLIDE 22

Analysis at phrase level: Does the semantic type of the headword influence the proportion of phrase types and the degree of lexicalization?

slide-23
SLIDE 23

[Figure: Distribution of phrase types (compounds, pre-modified NPs, post-modified NPs) for findings and procedures, in %]

slide-24
SLIDE 24

Average number of unique expressions per concept across phrase types (compounds, pre-modified NPs, post-modified NPs), for findings and procedures

Chart values as extracted: 2.57, 3.69, 3.38, 1.33, 3.63, 2.83
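The measure on this slide is an average: for each combination of semantic type and phrase type, count the distinct expressions per concept, then take the mean over concepts. The following is a minimal sketch, assuming records of (semantic type, phrase type, concept ID, expression) and case-insensitive comparison. The function name and the concept IDs C1 to C3 are hypothetical toy values, not the study's data.

```python
from collections import defaultdict
from statistics import mean

Record = tuple[str, str, str, str]  # (semantic type, phrase type, concept ID, expression)


def avg_variants_per_concept(records: list[Record]) -> dict[tuple[str, str], float]:
    """Mean number of distinct expressions per concept, per (semantic type, phrase type)."""
    variants: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for sem_type, ptype, concept, expr in records:
        variants[(sem_type, ptype, concept)].add(expr.lower())  # case-insensitive (assumption)
    counts: dict[tuple[str, str], list[int]] = defaultdict(list)
    for (sem_type, ptype, _concept), exprs in variants.items():
        counts[(sem_type, ptype)].append(len(exprs))
    return {group: mean(n) for group, n in counts.items()}


records = [  # toy data with hypothetical concept IDs
    ("finding", "pre-modified NP", "C1", "abdominale obesitas"),
    ("finding", "pre-modified NP", "C1", "abd obesitas"),
    ("finding", "pre-modified NP", "C2", "morbiede obesitas"),
    ("procedure", "compound", "C3", "nierecho"),
    ("procedure", "compound", "C3", "Nierecho"),  # same expression after lowercasing
]
print(avg_variants_per_concept(records))
```

Under this reading, a lower average signals a more fixed, lexicalized way of naming a concept.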

slide-25
SLIDE 25

Analysis at token level: Are concept combinations patterned with grammatical structures?

slide-26
SLIDE 26

Associate expressions with overlapping tag sequences with grammatico-semantic patterns:

  • rx thorax ‘x-ray [of the] chest’, CT schedel ‘CT [of the] skull’: (NN, NN) + (PROCEDURE, SITE)
  • abdominale injectie ‘abdominal injection’: (JJ, NN) + (SITE, PROCEDURE)
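Grouping expressions by their grammatico-semantic pattern amounts to keying them on the pair (PoS sequence, concept sequence). This is a minimal sketch in which "overlapping" tag sequences are simplified to identical ones, which is an assumption. The function `group_by_pattern` is a hypothetical name, and the input triples are the slide's examples.

```python
from collections import defaultdict

Pattern = tuple[tuple[str, ...], tuple[str, ...]]  # (PoS sequence, concept sequence)


def group_by_pattern(annotated: list[tuple[str, list[str], list[str]]]) -> dict[Pattern, list[str]]:
    """Collect expressions that share both their PoS sequence and their concept sequence."""
    patterns: dict[Pattern, list[str]] = defaultdict(list)
    for expr, pos, concepts in annotated:
        patterns[(tuple(pos), tuple(concepts))].append(expr)
    return dict(patterns)


data = [
    ("rx thorax", ["NN", "NN"], ["PROCEDURE", "SITE"]),
    ("CT schedel", ["NN", "NN"], ["PROCEDURE", "SITE"]),
    ("abdominale injectie", ["JJ", "NN"], ["SITE", "PROCEDURE"]),
]
for pattern, exprs in group_by_pattern(data).items():
    print(pattern, exprs)
```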

slide-27
SLIDE 27

How dominant a construction is in expressing a combination of concepts:

$$\text{relative frequency} = \frac{\text{frequency of the grammatico-semantic pattern}}{\text{absolute frequency of the concept combination}}$$
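The ratio above can be computed with two counters: one over grammatico-semantic patterns and one over concept combinations. The following is a minimal sketch, assuming a concept combination is order-independent (sorted concept labels) while a pattern keeps both tag order and PoS tags. The helper names are hypothetical, and the four occurrences are toy data built from examples in these slides. `top_patterns` then picks the dominant construction per combination, in the spirit of the "top patterns" tables.

```python
from collections import Counter

Pattern = tuple[tuple[str, ...], tuple[str, ...]]  # (PoS sequence, concept sequence)


def combination(concepts: tuple[str, ...]) -> tuple[str, ...]:
    """Order-independent key for a concept combination (an assumption)."""
    return tuple(sorted(concepts))


def relative_frequencies(occurrences: list[Pattern]) -> dict[Pattern, float]:
    """freq(grammatico-semantic pattern) / absolute freq(its concept combination)."""
    pattern_freq = Counter(occurrences)
    combo_freq = Counter(combination(concepts) for _pos, concepts in occurrences)
    return {p: n / combo_freq[combination(p[1])] for p, n in pattern_freq.items()}


def top_patterns(rel: dict[Pattern, float]) -> dict[tuple[str, ...], tuple[Pattern, float]]:
    """For each concept combination, the pattern with the highest relative frequency."""
    best: dict[tuple[str, ...], tuple[Pattern, float]] = {}
    for pattern, value in rel.items():
        key = combination(pattern[1])
        if key not in best or value > best[key][1]:
            best[key] = (pattern, value)
    return best


occ = [  # toy occurrences, one per mention
    (("NN", "NN"), ("PROCEDURE", "SITE")),  # rx thorax
    (("NN", "NN"), ("PROCEDURE", "SITE")),  # echo nier
    (("NN", "NN"), ("SITE", "PROCEDURE")),  # nierecho, compound split into its parts
    (("JJ", "NN"), ("SITE", "PROCEDURE")),  # abdominale injectie
]
rel = relative_frequencies(occ)
print(rel[(("NN", "NN"), ("PROCEDURE", "SITE"))])  # 0.5
```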

slide-28
SLIDE 28

Top patterns for findings

  • combined with CAUSE: PoS sequence (JJ, NN), e.g. alimentaire obesitas ‘alimentary obesity’, relative frequency 0.90
  • combined with COURSE: PoS sequence (RB, NN), e.g. vaak hypo ‘frequently hypoglycemia’, relative frequency 0.35
  • combined with SEVERITY: PoS sequence (JJ, NN), e.g. morbiede obesitas ‘morbid obesity’, relative frequency 0.83

slide-29
SLIDE 29


slide-30
SLIDE 30

Top patterns for procedures

  • combined with COMPONENT: PoS sequence (NNS, NN), e.g. lipidenmeting ‘measurement of lipids’, relative frequency 0.44
  • combined with COMPONENT and PROPERTY: PoS sequence (JJ, NNS, NN), e.g. gunstig lipidenprofiel ‘good lipid profile’, relative frequency 0.71
  • combined with COMPONENT and TIME: PoS sequence (NN, NN, NNS), e.g. glycemiedagprofielen ‘glycemic day profiles’, relative frequency 0.72

slide-31
SLIDE 31


slide-32
SLIDE 32

The conceptual composition of medical MWEs is associated with their formal structure:

  • findings: pre-modified NPs
  • procedures: nominal compounds

slide-33
SLIDE 33

One reason: lexical gaps

  • SEVERITY: adjective extreem ‘extreme’; noun *extremiteit ‘extremity’
  • SUBSTANCE: no adjective; noun insuline ‘insulin’

Finding (adj + noun): extreme obesitas ‘extreme obesity’

Procedure (noun + noun): insulineinjectie ‘insulin injection’

slide-34
SLIDE 34

BUT: the tendency is robust across concept combinations!

  • SITE: adjective renaal ‘renal’; noun nier ‘kidney’

Finding (adj + noun): renale insufficiëntie ‘renal insufficiency’

Procedure (noun + noun): nierecho ‘kidney echography’, echo nier ‘echography [of the] kidney’

slide-35
SLIDE 35

Structural reductions are associated with fixed concept combinations:

  • COMPONENT (procedure): full PP meting van de lipiden ‘measurement of lipids’; reduced PP meting lipiden ‘measurement lipids’
  • SITE (procedure): full PP rx van de thorax ‘x-ray of the thorax’; reduced PP rx thorax ‘x-ray thorax’
  • SITE (finding): full PP lipodistrofie ter hoogte van het abdomen ‘lipodystrophy at the abdomen’; reduced PP lipodistrofie abdomen ‘lipodystrophy abdomen’
  • CAUSE (finding): full PP nefropathie ten gevolge van diabetes ‘nephropathy due to diabetes’; reduced PP *nefropathie diabetes ‘nephropathy diabetes’
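Whether a short form is a structural reduction of a full prepositional phrase can be checked by deleting the prepositional material and comparing what remains. The following is a minimal sketch, assuming a hypothetical inventory of (complex) prepositions taken only from the examples above. The function names are illustrative. The check is purely formal: nefropathie ten gevolge van diabetes reduces formally to nefropathie diabetes, which the slide marks as unattested, so attestation still has to be verified in the corpus.

```python
import re

# Hypothetical inventory of (complex) prepositions, taken from the slide's examples.
PREPOSITIONS = ["ter hoogte van het", "ten gevolge van", "van de"]


def reduce_pp(expression: str) -> str:
    """Delete prepositional material and normalize whitespace."""
    reduced = expression.lower()
    for prep in sorted(PREPOSITIONS, key=len, reverse=True):  # longest match first
        reduced = re.sub(rf"\b{re.escape(prep)}\b", " ", reduced)
    return " ".join(reduced.split())


def is_reduction(full: str, reduced: str) -> bool:
    """True if `reduced` equals `full` with its prepositional material removed."""
    return reduce_pp(full) == " ".join(reduced.lower().split())


print(reduce_pp("lipodistrofie ter hoogte van het abdomen"))  # lipodistrofie abdomen
print(is_reduction("rx van de thorax", "rx thorax"))           # True
```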

slide-36
SLIDE 36

Complex medical terms link fixed concept constellations to habitual formal constructions; these constructions take on communicative value in themselves.

slide-37
SLIDE 37

Benefit for clinical NLP:

  • identification and segmentation of MWEs
  • semantic classification and relation extraction

slide-38
SLIDE 38

Thank you for your attention!

Questions? Suggestions?

leonie.gron@kuleuven.be


slide-39
SLIDE 39

References

de Castilho, R. E., Mujdricza-Maydt, E., Yimam, S. M., Hartmann, S., Gurevych, I., Frank, A., & Biemann, C. (2016). A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures. In Proceedings of the LT4DH workshop at COLING 2016 (pp. 76–84). Osaka.

Daille, B. (1994). Study and Implementation of Combined Techniques for Automatic Extraction of Terminology. In The Balancing Act: Combining Symbolic and Statistical Approaches to Language. Workshop at the 32nd Annual Meeting of the Association for Computational Linguistics (pp. 29–36). Stroudsburg: Association for Computational Linguistics.

De Hertog, D., & Heylen, K. (2012). The Prevalence of Multiword Term Candidates in a Legal Corpus. In G. Aguado de Cea (Ed.), Proceedings of the 10th Terminology and Knowledge Engineering Conference (TKE2012): New Frontiers in the Constructive Symbiosis of Terminology and Knowledge Engineering (pp. 283–290). Madrid: Universidad Politécnica de Madrid.

slide-40
SLIDE 40

References

Faber, P., & León-Araúz, P. (2016). Specialized Knowledge Representation and the Parameterization of Context. Frontiers in Psychology, 7(February). http://doi.org/10.3389/fpsyg.2016.00196

Warner, C., Lanfranchi, A., O’Gorman, T., Howard, A., Gould, K., & Regan, M. (2012). Bracketing Biomedical Text: An Addendum to Penn Treebank II Guidelines. Retrieved May 14, 2018, from https://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf

Schulze, R., & Römer, U. (2008). Introduction. Patterns, Meaningful Units and Specialized Discourses. International Journal of Corpus Linguistics, 13(3), 265–270. http://doi.org/10.1075/ijcl.13.3.01sch

SNOMED CT Editorial Guide. (2018). Retrieved May 14, 2018, from https://confluence.ihtsdotools.org/display/DOCEG/SNOMED+CT+Editorial+Guide

slide-41
SLIDE 41

References

Icons from the Noun Project created by Ben Davis Cengis SARI Drishya Ken Murray Melvin https://thenounproject.com/