Effective Feature Representation for Clinical Text Concept Extraction



SLIDE 1

Effective Feature Representation for Clinical Text Concept Extraction

Yifeng Tao¹,², Bruno Godefroy¹, Guillaume Genthial¹, Christopher Potts¹,³,*

¹ Roam Analytics, ² Carnegie Mellon University, ³ Stanford University

Yifeng Tao et al., NAACL Clinical NLP 2019

SLIDE 2

Background: Healthcare Text Datasets

  • Crucial healthcare information is recorded only in free-form text

  • Commercial: Drug-Disease Relations
  • Clinical: Diagnosis Detection, Prescription Reasons
  • Scientific: Chemical-Disease Relations
  • Social: Penn Adverse Drug Reactions

[Figure: FDA drug labels → crowdsourcing → expectation-maximization (initialize, iterate until convergence) → expert annotation → Drug-Disease Relations dataset]

[Figures from: 1. Lamjed Ben Jabeur et al. Uprising microblogs: A Bayesian network retrieval model for tweet search. 2012, 2. https://www.sjm.com.br/utilidades/pubmed-busca, 3. http://anakin.uta.cloud/uncategorized/the-need-for-drug-donations, 4. https://www.autismawareness.com.au/news-events/aupdate/is-there-an-over-diagnosis-of-autism]

SLIDE 3

Background: Healthcare Text Datasets

  • Clinical text datasets are scarce and expensive to build
  • Privacy considerations limit data sharing
  • Annotation requires domain specialists

[Figure: bar chart of the number of texts per dataset (axis ticks from 2000 to 10000): Diagnosis Detection, Prescription Reasons, Penn Adverse Drug Reactions, Chemical-Disease Relations, Drug-Disease Relations]

SLIDE 4

Task: Clinical Text Annotation

  • Diagnosis Detection: "Asymptomatic bacteriuria , could be neurogenic bladder disorder ." Labels shown: POSITIVE, CONCERN
  • Prescription Reasons: "I will go ahead and place him on Clarinex for his seasonal allergic rhinitis ." Labels shown: REASON, PRESCRIBED
  • Penn Adverse Drug Reactions (ADR): "#TwoThingsThatDontMixWell venlafaxine and alcohol - you’ll cry and throw chairs at your mom’s BBQ ." Labels shown: ADR, ADR
  • Chemical–Disease Relations (CDR): "Ocular and auditory toxicity in hemodialyzed patients receiving desferrioxamine ." Labels shown: DISEASE, DRUG
  • Drug–Disease Relations: "Indicated for the management of active rheumatoid arthritis and should not be used for rheumatoid arthritis in pregnant women ." Labels shown: TREATS, CONTRA

SLIDE 5
Previous Models

  • LSTM-CRF (general text): distributed word embeddings
  • HB-CRF (clinical text): sparse hand-built features

[Figure: both models label the example "Stop Soma for cost" as OTHER DISCONTINUED REASON REASON. HB-CRF: sparse features for each token feed a CRF. LSTM-CRF: a word embedding for each token feeds an LSTM, whose output feeds a CRF.]

SLIDE 6

Model: ELMo-LSTM-CRF-HB

  • Dense ELMo word embeddings + Sparse hand-built features

[Figure: ELMo-LSTM-CRF-HB applied to the example "Stop Soma for cost" (labels OTHER DISCONTINUED REASON REASON, as on slide 5). For each token, ELMo embeddings and an LSTM yield dense features, which the CRF layer uses together with the token's sparse hand-built features.]
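The combination of dense and sparse features can be illustrated with a small, dependency-free sketch. Everything numeric here is made up for illustration: the two-dimensional "dense" vectors stand in for LSTM outputs over ELMo embeddings, the two sparse indicators are hypothetical hand-built features, and the weights and transition scores are set by hand rather than learned. Concatenating the two feature vectors before a linear scoring layer is an assumption about how the combination works, not a statement of the authors' exact implementation.

```python
from typing import Dict, List, Sequence

LABELS = ["OTHER", "DISCONTINUED", "REASON"]


def combine_features(dense: Sequence[float], sparse: Dict[int, float],
                     sparse_dim: int) -> List[float]:
    """Concatenate a dense vector with a sparse vector expanded to sparse_dim."""
    expanded = [0.0] * sparse_dim
    for index, value in sparse.items():
        expanded[index] = value
    return list(dense) + expanded


def emission_scores(features: Sequence[float],
                    weights: Sequence[Sequence[float]]) -> List[float]:
    """One linear score per label: dot product of that label's weights and the features."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def viterbi(emissions: List[List[float]],
            transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label path under emission plus transition scores."""
    n_labels = len(transitions)
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for scores in emissions[1:]:
        step, pointers = [], []
        for cur in range(n_labels):
            prev = max(range(n_labels), key=lambda p: best[p] + transitions[p][cur])
            pointers.append(prev)
            step.append(best[prev] + transitions[prev][cur] + scores[cur])
        best = step
        backpointers.append(pointers)
    path = [max(range(n_labels), key=lambda i: best[i])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


tokens = ["Stop", "Soma", "for", "cost"]
# Toy dense features (stand-ins for LSTM outputs); values are invented.
dense = [[-0.5, 0.0], [0.0, 0.2], [0.9, 0.1], [0.8, 0.3]]
# Hypothetical sparse features: index 0 = token in a drug lexicon,
# index 1 = previous token is a stop/discontinue verb.
sparse = [{}, {0: 1.0, 1: 1.0}, {}, {}]
# Hand-set weights, one row per label, over [dense_0, dense_1, sparse_0, sparse_1].
weights = [
    [-1.0, 0.0, 0.0, 0.0],   # OTHER
    [0.0, 0.0, 1.0, 1.0],    # DISCONTINUED
    [1.0, 0.0, -1.0, 0.0],   # REASON
]
# Small bonus for staying on the same label.
transitions = [[0.1 if p == c else 0.0 for c in range(3)] for p in range(3)]

emissions = [
    emission_scores(combine_features(d, s, sparse_dim=2), weights)
    for d, s in zip(dense, sparse)
]
decoded = [LABELS[i] for i in viterbi(emissions, transitions)]
print(list(zip(tokens, decoded)))
```

With these hand-set numbers, the sparse lexicon feature is what pushes "Soma" toward DISCONTINUED, while the dense features decide the remaining tokens.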

SLIDE 7

Performance: Per-token Macro-F1 Scores

[Figure: bar chart of F1 score (axis ticks from 40 to 90) on Diagnosis Detection, Prescription Reasons, Penn Adverse Drug Reactions, Chemical-Disease Relations, and Drug-Disease Relations for four models: rand-LSTM-CRF, HB-CRF, ELMo-LSTM-CRF, ELMo-LSTM-CRF-HB. Significance markers on the chart, in extraction order: ***, ***, *, ***, **, where *: p<0.05, **: p<0.01, ***: p<0.001.]

  • Hyperparameters tuned through cross-validation
  • Each experiment repeated five times
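As a reminder of what a per-token macro-F1 measures, here is a minimal sketch: F1 is computed separately for each label over individual tokens, and the per-label scores are then averaged with equal weight. Whether the reported scores include the OTHER label in the average is not stated on the slide, so including it here is an assumption, and the gold and predicted sequences are invented.

```python
from typing import Dict, Sequence


def per_label_f1(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Token-level F1 for every label that appears in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores)


# Invented example: one REASON token is mislabeled as OTHER.
gold = ["OTHER", "DISCONTINUED", "REASON", "REASON"]
pred = ["OTHER", "DISCONTINUED", "OTHER", "REASON"]
label_scores = per_label_f1(gold, pred)
score = macro_f1(gold, pred)
print(label_scores, round(score, 4))
```

Because every label counts equally, a rare label with few tokens moves the macro average as much as the dominant OTHER label does.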
SLIDE 8

The Role of Text Length

  • LSTM: handles short texts well
  • HB-CRF: robust on long texts

SLIDE 9

CRF Potential Scores

  • LSTM features are always more important
  • HB features make a substantial contribution

[Figure: the ELMo-LSTM-CRF-HB architecture applied to "Stop Soma for cost", showing the ELMo, LSTM, dense-feature, and sparse-feature components]

SLIDE 10

Major Improvements in Minor Categories

[Figure: four panels, each plotting F1 score (%) and improvement (%) per label; label support in parentheses]

  • Diagnosis Detection: OTHER (74888), POSITIVE (24489), RULED-OUT (2797), CONCERN (2780)
  • Prescription Reasons: OTHER (83618), REASON (9114), PRESCRIBED (5967), DISCONTINUED (2754)
  • Drug-Disease Relations: OTHER (10634), TREATS (3671), UNRELATED (1145), PREVENTS (320)
  • Chemical-Disease Relations: OTHER (104530), DISEASE (6887), CHEMICAL (6270)

SLIDE 11

Conclusion

  • A unified feature representation for clinical text sequence labeling
  • Sparse, ontology-driven features
  • Dense LSTM features
  • Best performance on five distinct healthcare datasets
  • Takes advantage of both feature types
  • Makes maximal use of small, expensive, domain-specific healthcare texts
  • A new labeled clinical dataset
  • Identifies the treatment relations between drugs and diseases
  • Extensive analysis to identify what information our model makes use of, and why its performance is consistently improved

SLIDE 12

Acknowledgement

  • Roam Analytics
  • Christopher Potts
  • Bruno Godefroy
  • Guillaume Genthial
  • Kevin Reschke
  • NLP Group

SLIDE 13

Penn Adverse Drug Reactions (ADR) Results

  • The Role of Text Length
  • Major Improvements in Minor Categories

[Figure: Penn Adverse Drug Reactions panel plotting F1 score (%) and improvement (%) per label; label support in parentheses: OTHER (5023), ADR (283), INDICATION (29)]

SLIDE 14

Example of Hand-built Features
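
The slide's own example did not survive extraction, so the following is only a hypothetical sketch of what sparse, hand-built token features can look like. The feature names, the tiny `DRUG_LEXICON` set (a stand-in for an ontology lookup, filled with drug names that appear in the examples on slide 4 and slide 5), and the choice of features are all assumptions made for illustration, not the authors' feature set.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for an ontology or drug-lexicon lookup.
DRUG_LEXICON: Set[str] = {"soma", "clarinex", "venlafaxine"}


def token_features(tokens: List[str], i: int) -> Dict[str, float]:
    """Return a sparse feature dict (feature name -> value) for tokens[i]."""
    word = tokens[i]
    lower = word.lower()
    feats = {
        f"word={lower}": 1.0,          # word identity
        f"suffix3={lower[-3:]}": 1.0,  # last three characters
    }
    if word[:1].isupper():
        feats["is_capitalized"] = 1.0
    if lower in DRUG_LEXICON:
        feats["in_drug_lexicon"] = 1.0
    if i > 0:
        feats[f"prev_word={tokens[i - 1].lower()}"] = 1.0
    else:
        feats["is_first_token"] = 1.0
    return feats


tokens = ["Stop", "Soma", "for", "cost"]
features = [token_features(tokens, i) for i in range(len(tokens))]
for token, feats in zip(tokens, features):
    print(token, sorted(feats))
```

Each token activates only a handful of named features, which is what makes the representation sparse compared with a dense embedding vector.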

SLIDE 15

Procedure for Building Drug-Disease Relations Dataset

[Figure: FDA drug labels → crowdsourcing → expectation-maximization (initialize, iterate until convergence) → expert annotation → Drug-Disease Relations dataset]
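
The slide names expectation-maximization as the step between crowdsourcing and expert annotation but gives no model details. One common way to use EM on crowd labels is sketched below: a simplified "one-coin" model in which each worker has a single accuracy parameter and labels are binary. The model choice, the uniform prior, the clamping constants, and the toy votes are all assumptions for illustration and may differ from the procedure actually used for the dataset.

```python
from typing import Dict, List, Tuple

Vote = Tuple[str, str, int]  # (item id, worker id, binary label 0 or 1)


def em_aggregate(votes: List[Vote], max_iters: int = 50,
                 tol: float = 1e-6) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Estimate P(true label = 1) per item and an accuracy per worker."""
    items = sorted({item for item, _, _ in votes})
    workers = sorted({worker for _, worker, _ in votes})
    # Init: posterior is the fraction of positive votes for each item.
    posterior = {}
    for item in items:
        labels = [label for i, _, label in votes if i == item]
        posterior[item] = sum(labels) / len(labels)
    accuracy = {worker: 0.5 for worker in workers}
    for _ in range(max_iters):
        # M-step: accuracy is the expected share of a worker's votes that are correct.
        for worker in workers:
            agree, total = 0.0, 0
            for i, w, label in votes:
                if w == worker:
                    agree += posterior[i] if label == 1 else 1.0 - posterior[i]
                    total += 1
            accuracy[worker] = min(max(agree / total, 0.01), 0.99)  # avoid 0 and 1
        # E-step: recompute item posteriors under a uniform prior.
        updated = {}
        for item in items:
            like1, like0 = 1.0, 1.0
            for i, w, label in votes:
                if i == item:
                    acc = accuracy[w]
                    like1 *= acc if label == 1 else 1.0 - acc
                    like0 *= acc if label == 0 else 1.0 - acc
            updated[item] = like1 / (like1 + like0)
        change = max(abs(updated[i] - posterior[i]) for i in items)
        posterior = updated
        if change < tol:  # stop at convergence
            break
    return posterior, accuracy


# Toy votes: workers "a" and "b" always agree; "c" disagrees on s2 and s3.
votes = [
    ("s1", "a", 1), ("s1", "b", 1), ("s1", "c", 1),
    ("s2", "a", 1), ("s2", "b", 1), ("s2", "c", 0),
    ("s3", "a", 0), ("s3", "b", 0), ("s3", "c", 1),
    ("s4", "a", 0), ("s4", "b", 0), ("s4", "c", 0),
]
posterior, accuracy = em_aggregate(votes)
print(posterior, accuracy)
```

In a pipeline like the one in the figure, items whose posterior stays close to 0.5 would be natural candidates to pass on to expert annotators.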

SLIDE 16

Statistics of Datasets

SLIDE 17

Hyperparameters of Experiments
