Effective Feature Representation for Clinical Text Concept Extraction

SLIDE 1

Effective Feature Representation for Clinical Text Concept Extraction

Yifeng Tao1,2, Bruno Godefroy1, Guillaume Genthial1, Christopher Potts1,3,*

1Roam Analytics 2Carnegie Mellon University 3Stanford University

Yifeng Tao et al. NAACL Clinical NLP 2019

SLIDE 2

Background: Healthcare Text Datasets

Crucial information about healthcare is recorded only in free-form text

Commercial: Drug-Disease Relations (built from FDA drug labels via crowdsourcing, expectation maximization, and expert annotation)
Clinical: Diagnosis Detection, Prescription Reasons
Scientific: Chemical-Disease Relations
Social: Penn Adverse Drug Reactions

[Figures from: 1. Lamjed Ben Jabeur et al. Uprising microblogs: A Bayesian network retrieval model for tweet search. 2012, 2. https://www.sjm.com.br/utilidades/pubmed-busca, 3. http://anakin.uta.cloud/uncategorized/the-need-for-drug-donations, 4. https://www.autismawareness.com.au/news-events/aupdate/is-there-an-over-diagnosis-of-autism]

SLIDE 3

Background: Healthcare Text Datasets

Clinical text datasets are scarce and expensive
Privacy considerations
Domain specialists

[Bar chart: number of texts per dataset (axis ticks from 2000 to 10000) for Diagnosis Detection, Prescription Reasons, Penn Adverse Drug Reactions, Chemical-Disease Relations, and Drug-Disease Relations]

SLIDE 4

Task: Clinical Text Annotation

Diagnosis Detection
Example: Asymptomatic bacteriuria , could be neurogenic bladder disorder .
Labels shown: POSITIVE, CONCERN

Prescription Reasons
Example: I will go ahead and place him on Clarinex for his seasonal allergic rhinitis .
Labels shown: REASON, PRESCRIBED

Penn Adverse Drug Reactions (ADR)
Example: #TwoThingsThatDontMixWell venlafaxine and alcohol - you’ll cry and throw chairs at your mom’s BBQ .
Labels shown: ADR, ADR

Chemical–Disease Relations (CDR)
Example: Ocular and auditory toxicity in hemodialyzed patients receiving desferrioxamine .
Labels shown: DISEASE, DRUG

Drug–Disease Relations
Example: Indicated for the management of active rheumatoid arthritis and should not be used for rheumatoid arthritis in pregnant women .
Labels shown: TREATS, CONTRA

SLIDE 5

Previous Models

HB-CRF
Clinical text
Sparse hand-built features

LSTM-CRF
General text
Distributed word embeddings

[Diagram, HB-CRF: sparse features for each token feed a CRF layer. Example: "Stop Soma for cost" labeled OTHER DISCONTINUED REASON REASON]

[Diagram, LSTM-CRF: a word embedding for each token feeds an LSTM, then a CRF layer. Same example: "Stop Soma for cost" labeled OTHER DISCONTINUED REASON REASON]

SLIDE 6

Model: ELMo-LSTM-CRF-HB

Dense ELMo word embeddings + Sparse hand-built features

[Diagram: for each token of "Stop Soma for cost", ELMo embeddings pass through an LSTM to give dense features; these are combined with the token's sparse hand-built features and fed to a CRF layer, which assigns the OTHER, DISCONTINUED, and REASON labels]
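A minimal sketch of the idea on this slide, under loud assumptions: the dense vectors stand in for ELMo-LSTM outputs, the two sparse features ("drug lexicon match", "cost term") are hypothetical, and all weights and transition scores are invented. It only illustrates how concatenated dense and sparse features can feed a linear-chain CRF decoder; it is not the authors' implementation.

```python
# Toy sketch: dimensions, weights, and feature names are invented for illustration.
LABELS = ["OTHER", "DISCONTINUED", "REASON"]

def combine_features(dense, active_sparse, n_sparse):
    """Concatenate a dense vector with a binary sparse feature vector."""
    sparse = [0.0] * n_sparse
    for idx in active_sparse:
        sparse[idx] = 1.0
    return list(dense) + sparse

def emission_scores(features, weights):
    """One linear score per label: dot product of that label's weights with the features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def viterbi(emissions, transitions):
    """Return the highest-scoring label index sequence for a linear-chain CRF."""
    n_labels = len(transitions)
    best = list(emissions[0])
    backptrs = []
    for scores in emissions[1:]:
        new_best, ptrs = [], []
        for j in range(n_labels):
            prev = max(range(n_labels), key=lambda i: best[i] + transitions[i][j])
            ptrs.append(prev)
            new_best.append(best[prev] + transitions[prev][j] + scores[j])
        best = new_best
        backptrs.append(ptrs)
    path = [max(range(n_labels), key=lambda j: best[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]

tokens = ["Stop", "Soma", "for", "cost"]
# Stand-ins for per-token ELMo-LSTM outputs (2 dimensions here).
dense = [[0.1, 0.2], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
# Hypothetical sparse features: 0 = drug lexicon match, 1 = cost term.
active_sparse = [[], [0], [], [1]]
# Columns: dense0, dense1, sparse0, sparse1. Rows follow LABELS.
weights = [
    [0.5, 0.5, -1.0, -1.0],
    [1.0, -1.0, 2.0, 0.0],
    [-1.0, 1.0, 0.0, 2.0],
]
# transitions[i][j]: score for moving from label i to label j.
transitions = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.5]]

emissions = [
    emission_scores(combine_features(d, s, 2), weights)
    for d, s in zip(dense, active_sparse)
]
predicted = [LABELS[i] for i in viterbi(emissions, transitions)]
print(list(zip(tokens, predicted)))
```

In a trained model the weights and transitions would be learned jointly; here they are hand-picked so the toy decoder reproduces the labeling shown for "Stop Soma for cost".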

SLIDE 7

Performance: Per-token Macro-F1 Scores

*: p<0.05, **: p<0.01, ***: p<0.001

[Bar chart: F1 score (axis 40 to 90) on Diagnosis Detection, Prescription Reasons, Penn Adverse Drug Reactions, Chemical-Disease Relations, and Drug-Disease Relations for four models: rand-LSTM-CRF, HB-CRF, ELMo-LSTM-CRF, ELMo-LSTM-CRF-HB. Significance markers as extracted, left to right: ***, ***, *, ***, **]

Hyperparameters tuned through cross-validation
Each experiment repeated five times
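For readers unfamiliar with the metric in the slide title, here is a small sketch of per-token macro-F1: F1 is computed per label over all tokens, then averaged with equal weight per label. Whether the authors average over exactly this label set (for example, whether OTHER is included) is an assumption; the function name and toy data are illustrative.

```python
def per_token_macro_f1(gold, pred):
    """Unweighted mean of per-label F1, computed over flat token label lists."""
    labels = sorted(set(gold) | set(pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy example: one token of "Stop Soma for cost" is mislabeled.
gold = ["OTHER", "DISCONTINUED", "REASON", "REASON"]
pred = ["OTHER", "DISCONTINUED", "OTHER", "REASON"]
score = per_token_macro_f1(gold, pred)
print(round(score, 4))
```

Because every label counts equally, a rare label such as DISCONTINUED moves the score as much as the very frequent OTHER label does.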

SLIDE 8

The Role of Text Length

LSTM: handles short texts well
HB-CRF: robust on long texts

SLIDE 9

CRF Potential Scores

[Diagram: the ELMo-LSTM-CRF-HB architecture on the example "Stop Soma for cost", with dense LSTM features and sparse hand-built features both feeding the CRF layer]

LSTM features always more important
HB features make substantial contribution
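One way to read this comparison, sketched under an assumption the slides do not confirm: if a label's potential score is a linear function of the concatenated dense and sparse features, it splits exactly into a dense (LSTM) part and a sparse (HB) part that can be compared. The weights and feature values below are invented.

```python
def split_potential(weight_row, dense, sparse):
    """Split a linear label score into dense and sparse contributions.

    weight_row holds the label's weights for the dense features first,
    followed by its weights for the sparse features.
    """
    n_dense = len(dense)
    dense_part = sum(w * x for w, x in zip(weight_row[:n_dense], dense))
    sparse_part = sum(w * x for w, x in zip(weight_row[n_dense:], sparse))
    return dense_part, sparse_part

# Invented numbers for a single token and a single label.
weight_row = [2.0, -1.0, 0.5, 0.0]
dense = [0.9, 0.1]     # stand-in for LSTM output
sparse = [1.0, 0.0]    # stand-in for hand-built binary features
dense_part, sparse_part = split_potential(weight_row, dense, sparse)
total = dense_part + sparse_part
print(dense_part, sparse_part, total)
```

In this toy case the dense part dominates while the sparse part still shifts the score, which mirrors the two statements above only by construction.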

SLIDE 10

Major Improvements in Minor Categories

[Figure: four panels plotting F1 score (%) and Improvement (%) per Label/Category (Support)]

Diagnosis Detection: OTHER (74888), POSITIVE (24489), RULED-OUT (2797), CONCERN (2780)
Prescription Reasons: OTHER (83618), REASON (9114), PRESCRIBED (5967), DISCONTINUED (2754)
Drug-Disease Relations: OTHER (10634), TREATS (3671), UNRELATED (1145), PREVENTS (320)
Chemical-Disease Relations: OTHER (104530), DISEASE (6887), CHEMICAL (6270)

SLIDE 11

Conclusion

A unified feature representation for clinical text sequence labeling
Sparse, ontology-driven features
Dense LSTM features
Best performance on five distinct healthcare datasets
Takes advantage of both feature types
Makes maximal use of small, expensive, domain-specific healthcare texts
A new labeled clinical dataset
Identifies the treatment relations between drugs and diseases
Extensive analysis to identify what information our model makes use of, and why its performance is consistently improved

SLIDE 12

Acknowledgement

Roam Analytics
Christopher Potts
Bruno Godefroy
Guillaume Genthial
Kevin Reschke
NLP Group

SLIDE 13

Penn Adverse Drug Reactions (ADR) Results

The Role of Text Length
Major Improvements in Minor Categories

[Figure: Penn Adverse Drug Reactions, F1 score (%) and Improvement (%) per label (support): OTHER (5023), ADR (283), INDICATION (29)]

SLIDE 14

Example of Hand-built Features

SLIDE 15

Procedure for Building Drug-Disease Relations Dataset

[Diagram: FDA Drug Labels, Crowdsourcing, Expectation-Maximization loop (Init to End on convergence), Expert annotation, Drug-Disease Relations Dataset]
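The slide names an expectation-maximization loop over crowdsourced labels but gives no model details. The sketch below is therefore an assumption throughout: a simplified binary aggregator with one accuracy parameter per worker, in the spirit of Dawid-Skene style label aggregation. The function name, clipping bounds, and toy votes are all illustrative, not the authors' procedure.

```python
def em_aggregate(votes, n_iter=50, tol=1e-6):
    """Aggregate binary crowd votes with a one-accuracy-per-worker EM.

    votes: list of (item, worker, label) tuples with label in {0, 1}.
    Returns a dict mapping each item to P(true label = 1).
    """
    items = sorted({item for item, _, _ in votes})
    workers = sorted({worker for _, worker, _ in votes})

    # Init: posterior is the share of positive votes (a soft majority vote).
    post = {}
    for item in items:
        labels = [label for i, _, label in votes if i == item]
        post[item] = sum(labels) / len(labels)

    for _ in range(n_iter):
        # M-step: worker accuracy is the expected share of correct votes.
        acc = {}
        for worker in workers:
            agree, total = 0.0, 0
            for i, w, label in votes:
                if w == worker:
                    agree += post[i] if label == 1 else 1.0 - post[i]
                    total += 1
            acc[worker] = min(max(agree / total, 0.01), 0.99)  # avoid 0 and 1
        prior = min(max(sum(post.values()) / len(post), 0.01), 0.99)

        # E-step: recompute each item's posterior from the worker accuracies.
        new_post = {}
        for item in items:
            p1, p0 = prior, 1.0 - prior
            for i, w, label in votes:
                if i == item:
                    p1 *= acc[w] if label == 1 else 1.0 - acc[w]
                    p0 *= acc[w] if label == 0 else 1.0 - acc[w]
            new_post[item] = p1 / (p1 + p0)

        delta = max(abs(new_post[i] - post[i]) for i in items)
        post = new_post
        if delta < tol:  # End on convergence
            break
    return post

# Toy votes: workers "a" and "b" agree on every item; "c" disagrees on items 0 and 3.
votes = [
    (0, "a", 1), (0, "b", 1), (0, "c", 0),
    (1, "a", 0), (1, "b", 0), (1, "c", 0),
    (2, "a", 1), (2, "b", 1), (2, "c", 1),
    (3, "a", 0), (3, "b", 0), (3, "c", 1),
]
posteriors = em_aggregate(votes)
print({item: round(p, 3) for item, p in posteriors.items()})
```

Items whose posterior stays near 0.5 after convergence would be natural candidates to route to the expert annotation step, though the slides do not say how that hand-off was decided.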

SLIDE 16

Statistics of Datasets

SLIDE 17

Hyperparameters of Experiments
