Health Search
From Consumers to Clinicians
Slides available at
https://ielab.io/russir2018-health-search-tutorial/
Guido Zuccon
Queensland University of Technology
@guidozuc
Knowledge based query expansion Corpus/Data Driven
Multi-evidence Co-
Latent methods & Word2vec Subsumption Concept relationships Inference
Combine documents that refer to the same case [Zhu&Carterette, 2012; Limsopatham et al., 2013b]
Different, diverse corpora used for query expansion [Zhu&Carterette, 2012 b; Zhu et al., 2014]
Measure the usefulness of different collections [Limsopatham et al., 2015]
…
[Zhu&Carterette, 2012]
for one case (e.g. with health records, where case=patient)
[Figure: processing of reports and visits. Stages: indexing, retrieving, reports ranking, merging I, II, III, visits ranking I, II, III. Labels: RbM, MbR, VRM, baseline/MRF/MRM models, ICD, NEG.]
Fused into new ranking
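One way to picture the step from report-level scores to a visit-level ranking is the sketch below. It is only an illustration: the max-score aggregation, the function name `merge_reports_into_visits`, and the toy scores are assumptions made here, not the models of [Zhu&Carterette, 2012].

```python
from collections import defaultdict


def merge_reports_into_visits(report_scores, report_to_visit, combine=max):
    """Merge report-level retrieval scores into a visit-level ranking.

    `combine` is an assumed aggregation (max by default); other choices
    such as sum are equally plausible.
    """
    per_visit = defaultdict(list)
    for report_id, score in report_scores.items():
        per_visit[report_to_visit[report_id]].append(score)
    visit_scores = {visit: combine(scores) for visit, scores in per_visit.items()}
    # Highest-scoring visit first.
    return sorted(visit_scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (hypothetical): four reports belonging to three visits.
report_scores = {"r1": 2.5, "r2": 0.7, "r3": 1.9, "r4": 1.2}
report_to_visit = {"r1": "visitA", "r2": "visitA", "r3": "visitB", "r4": "visitC"}
ranking = merge_reports_into_visits(report_scores, report_to_visit)
```

Swapping `combine` changes how much a visit with many weakly matching reports is favoured over a visit with one strongly matching report.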
[Limsopatham et al., 2013b]
use, depending on query
medical concepts in query
different collections to derive query expansions
(166K), ClueWeb09B (44M), TREC Medical Records (100K)
expansion
benefits from auxiliary collections
curation
[Zhu et al., 2014]
expansion evidence
(e.g. MEDLINE abstracts) to generic (e.g. blogs and webpages)
extent to generate query expansion terms
[Limsopatham et al., 2015]
Martinez et al., 2014]
kordan&Kotov, 2016]
et al., 2015, b; Nguyen et al., 2017]
document
expansion
and not taxonomic (e.g., disease has associated anatomic site).
sources (KBs, PRF) should have different weights
2005]
features of KB graphs, and statistics of concepts in collection
CDS 2015
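[Figure: translation from the terms of document d to the term "cancer"]
Terms and document probabilities: cancer, p(cancer|d); headache, p(headache|d); carcinoma, p(carcinoma|d); chemotherapy, p(chemotherapy|d); seizures, p(seizures|d)
Translation probabilities: p(cancer|headache), p(cancer|carcinoma), p(cancer|seizures), p(cancer|chemotherapy)
$$p_t(w|d) = \sum_{u \in d} p_t(w|u)\, p(u|d)$$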
p(cancer|cancer): self-translation probability
use Word Embeddings for computing this
[Zuccon et al., 2015, b]
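The translation language model above sums, over the terms u of a document, the translation probability of w given u weighted by p(u|d). A minimal sketch follows, assuming that the translation probability is the cosine similarity between embeddings normalised over the vocabulary, and that p(u|d) is a maximum-likelihood estimate. The estimator, the function names and the two-dimensional toy vectors are illustrative assumptions and should not be read as the exact method of [Zuccon et al., 2015, b].

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def translation_prob(w, u, embeddings):
    """p_t(w|u): cosine(u, w) normalised over the vocabulary (assumed estimator).

    Assumes all cosines are non-negative, which holds for the toy vectors below.
    """
    total = sum(cosine(embeddings[u], embeddings[v]) for v in embeddings)
    return cosine(embeddings[u], embeddings[w]) / total


def translation_lm(w, doc_terms, embeddings):
    """p_t(w|d) = sum over u in d of p_t(w|u) * p(u|d)."""
    n = len(doc_terms)
    return sum(
        translation_prob(w, u, embeddings) * doc_terms.count(u) / n
        for u in set(doc_terms)
    )


# Hypothetical 2-d embeddings; real embeddings would come from word2vec or similar.
embeddings = {
    "cancer": [0.9, 0.1],
    "carcinoma": [0.8, 0.2],
    "chemotherapy": [0.7, 0.3],
    "headache": [0.2, 0.8],
    "seizures": [0.1, 0.9],
}
doc = ["headache", "carcinoma", "chemotherapy", "seizures"]
p_cancer = translation_lm("cancer", doc, embeddings)
```

Note that the document does not contain "cancer", yet `p_cancer` is non-zero because related terms such as "carcinoma" translate into it.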
constrained by relations in KB (UMLS)
embeddings
word concepts
approach (akin to doc2vec) to create embedding that captures latent relations from concepts and terms in text.
top documents
concepts from top documents to produce expansions
[Nguyen et al., 2017]: optimises document representation for medical content
[Soldaini&Goharian, 2017]: compares 5 LTR in CHS context:
AdaRank, ListNet
UMLS (26), latent semantic analysis (2), word embeddings (4).
“denies fever” “no fracture” “mother had breast cancer”
NegEx/ConText [Harkema et al., 2009]: Algorithm for extracting negated content
separately [Limsopatham et al., 2012]
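To make the idea of trigger-based negation and family-history detection concrete, here is a deliberately simplified sketch in the spirit of NegEx/ConText. The trigger lists, the fixed three-token scope and the function name `annotate` are assumptions for illustration; the real algorithms use much larger trigger sets and scope-termination rules.

```python
import re

# Tiny, assumed trigger lists; NegEx/ConText use far larger curated lists.
NEGATION_TRIGGERS = ("denies", "no", "without")
FAMILY_TRIGGERS = ("mother", "father", "sister", "brother")


def annotate(text, window=3):
    """Label each token as affirmed, negated, family_history or trigger.

    A trigger affects the `window` tokens that follow it (assumed fixed scope).
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    labels = ["affirmed"] * len(tokens)
    for i, token in enumerate(tokens):
        if token in NEGATION_TRIGGERS:
            context = "negated"
        elif token in FAMILY_TRIGGERS:
            context = "family_history"
        else:
            continue
        labels[i] = "trigger"
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            labels[j] = context
    return list(zip(tokens, labels))


examples = ["denies fever", "no fracture", "mother had breast cancer"]
annotated = {text: dict(annotate(text)) for text in examples}
```

An index could then store negated or family-history terms in a separate field so that they do not match queries about the patient's own positive findings.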
P: Patient/Problem (e.g., males aged 20-50)
I: Intervention (e.g., weight loss drug)
C: Comparison (e.g., controlled exercise regime)
O: Outcome (e.g., weight loss)
for retrieval
PICO annotations
RobotReviewer [Marshall et al., 2015]: Algorithm for extracting PICO elements from free-text
documents that clinicians would understand
understandable and relevant
readability measures and medical lexical aspects
Enter your search terms at http://chs.ielab.webfactional.com/
Symptom Group | Crowdsourced Circumlocutory Queries
alopecia | baldness in multiple spots, circular bald spots, loss of hair
angular cheilitis | broken lips, dry cracked lips, lip sores, sores around mouth
edema | fluid in leg, puffy sore calf, swollen legs
exophthalmos | bulging eye, eye balls coming out, swollen eye, swollen eye balls
hematoma | hand turned dark blue, neck hematoma, large purple bruise on arm
jaundice | yellow eyes, eye illness, white part of the eye turned green
psoriasis | red dry skin, dry irritated skin on scalp, silvery-white scalp + inner ear
urticaria | hives all over body, skin rash on chest, extreme red rash
[Stanton et al., 2014]
[Figure: Performance (P@5 and P@10, scale 0.00 to 1.00) by system (Bing, Google); panels: Any relevant, Only highly relevant]
exophthalmos: “eye balls coming out” “swollen eye”
[Zuccon et al., 2015]
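The figure reports precision at ranks 5 and 10 under two relevance thresholds. As a reminder of how such numbers are computed, here is a small sketch; the graded scale (0 = not relevant, 1 = relevant, 2 = highly relevant) and the toy label list are assumptions made for the example.

```python
def precision_at_k(grades, k, min_grade=1):
    """Fraction of the top-k results whose grade is at least `min_grade`.

    Assumed grading: 0 = not relevant, 1 = relevant, 2 = highly relevant.
    """
    top = grades[:k]
    return sum(1 for g in top if g >= min_grade) / k


# Hypothetical relevance grades of the first ten results for one query.
grades = [2, 0, 1, 0, 0, 1, 0, 0, 0, 2]

p5_any = precision_at_k(grades, 5)                  # "Any relevant"
p5_high = precision_at_k(grades, 5, min_grade=2)    # "Only highly relevant"
p10_any = precision_at_k(grades, 10)
p10_high = precision_at_k(grades, 10, min_grade=2)
```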
[Zeng et al., 2006]: recommends queries based on UMLS and query log (CHS task)
[Soldaini et al., 2015]: compares the effectiveness of 7 query reformulation techniques (CDS task)
with no mapping to any UMLS concepts
associated Wikipedia page P being health-related over being not-health-related. Retain only query terms with ratio ≥ 2.
SVMrank to select query terms.
found, UMLS sem-types found, HT ratio, and MeSH found.
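The health-term-ratio filter described above keeps a query term only when its associated Wikipedia page is at least twice as likely to be health-related as not. A minimal sketch, assuming the per-term probabilities are already available from some classifier; the probabilities, the function names and the decision to drop terms without a page are all hypothetical.

```python
def health_term_ratio(p_health):
    """Odds of the associated page being health-related vs. not health-related."""
    if p_health >= 1.0:
        return float("inf")
    return p_health / (1.0 - p_health)


def reduce_query(terms, page_health_prob, threshold=2.0):
    """Retain only the query terms whose ratio is >= threshold.

    Terms without an associated page are dropped (an assumption of this sketch).
    """
    return [
        t for t in terms
        if t in page_health_prob and health_term_ratio(page_health_prob[t]) >= threshold
    ]


# Hypothetical classifier outputs: P(page for term is health-related).
page_health_prob = {"my": 0.1, "knee": 0.8, "arthritis": 0.95, "hurts": 0.5}
reduced = reduce_query(["my", "knee", "arthritis", "hurts", "lately"], page_health_prob)
```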
[Soldaini et al., 2015]: compares the effectiveness of 7 query reformulation techniques (CDS task)
preferred terms UMLS query concepts to expand original query
10 initial results, rank and add top 20 terms not in original query.
expansion terms filtered health term ratio
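The pseudo relevance feedback step above (take the top 10 initial results, rank candidate terms, add the top 20 not already in the query) can be sketched as follows. The ranking criterion used here, raw term frequency in the feedback documents, is an assumption; the slide does not say how terms are ranked, and `prf_expand` is a hypothetical name.

```python
from collections import Counter


def prf_expand(query_terms, ranked_docs, n_docs=10, n_terms=20):
    """Expand a query with frequent terms from the top-ranked documents.

    ranked_docs: list of tokenised documents, best first.
    Terms are ranked by frequency in the feedback set (assumed criterion).
    """
    feedback = Counter()
    for doc in ranked_docs[:n_docs]:
        feedback.update(doc)
    original = set(query_terms)
    candidates = [(t, c) for t, c in feedback.items() if t not in original]
    # Most frequent first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    return list(query_terms) + [t for t, _ in candidates[:n_terms]]


# Toy initial ranking (hypothetical).
ranked_docs = [
    ["chest", "pain", "angina"],
    ["angina", "ischemia", "pain"],
    ["angina", "ischemia", "ecg"],
]
expanded = prf_expand(["chest", "pain"], ranked_docs, n_docs=10, n_terms=2)
```

A health-term-ratio filter could then be applied to the added terms before the expanded query is issued.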
[Soldaini et al., 2017]: considers short clinical notes as queries (CDS task)
Relevance Ratio (WRR) of candidate terms: importance
features over multiple collections, syntactical and semantical features
similarly
[Soldaini et al., 2016]: add the most appropriate expert expression to queries submitted by users
MedSyn, and DBpedia
health-related Wikipedia pages, using logistic regression classifier
(behavioural KB best)
clicked HON-certified websites
records, CDS task) using generic & domain-specific methods
terms retained)
queries significantly more effective
from narrative
[Soldaini et al., 2017 b]: use convolutional neural networks (CNN) to reduce queries (CDS task)
document:
documents
higher than relevant document
[Scells&Zuccon, 2018]: through a chain of transformations, generates better (Boolean) queries (for systematic review compilation)
to rank
[Figure: query q; candidates c1τ1, c1τ2, c1τ3, c5τ1, c5τ2, c5τ3, c7τ1, c7τ2, c7τ3; q̂: a rewritten query]
ascertain how difficult queries are — estimates query variability and specificity
MeSH
(V=variation) in systematic reviews compilation
$$\text{MeSH-QD}(Q, T) = \sum_{t \in Q} \overbrace{\frac{df(t)}{\sum_{t' \in V(t)} df(t')}}^{\text{term variability}} \cdot \ln\left(1 + \frac{N}{df(t)}\right) \cdot \overbrace{\frac{depth(t)}{length(t)}}^{\text{term generality}}$$
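Read as a sum over query terms of three factors (a document-frequency share over the term's variations V(t), a log term in N/df(t), and depth(t)/length(t)), the MeSH-QD score can be computed as in the sketch below. This is one reading of the slide formula; the dictionaries passed in, and the way depth and length are obtained, are placeholders assumed for the example.

```python
import math


def mesh_qd(query_terms, df, variants, depth, length, n_docs):
    """Sum over query terms of variability * log factor * generality.

    df:       document frequency per term
    variants: V(t), the list of variations of each term (assumed to include t)
    depth, length: per-term values taken as given (hypothetical inputs)
    n_docs:   N, the collection size
    """
    score = 0.0
    for t in query_terms:
        variability = df[t] / sum(df[v] for v in variants[t])
        log_factor = math.log(1 + n_docs / df[t])
        generality = depth[t] / length[t]
        score += variability * log_factor * generality
    return score


# Toy statistics (hypothetical).
df = {"neoplasms": 10, "tumours": 10}
variants = {"neoplasms": ["neoplasms", "tumours"]}
depth = {"neoplasms": 2}
length = {"neoplasms": 4}
score = mesh_qd(["neoplasms"], df, variants, depth, length, n_docs=90)
```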
standard query types [Ely et al., 2000]
i. searching for diagnoses given a list of symptoms;
ii. searching for relevant tests given a patient’s situation;
iii. searching for effective treatments given a particular condition.
concepts essential for the information need of a medical search task” [Limsopatham et al., 2013]
[Koopman et al., 2017 b]
[Figure: system architecture. Indexing: Medical articles, Task extraction (Diagnoses, Tests, Treatments), Annotated medical articles, Task-oriented indexing of articles, Field-based inverted file index. Retrieval: Clinician searcher, User Interface, Significant concept estimation, Task-oriented retrieval.]
quality is influenced by medical expertise
clinicians
(but retrieval models optimised for short queries)
keywords most likely to appear in relevant documents
rather than diagnoses (but influenced by task: searching for clinical trials)
beyond
(and beyond)
documents most demanding
and secondary diseases of diabetes and or hypertension”
receive endoscopy”
[Palotti et al., 2016 c] + [Tamine&Chouquet, 2017] + [Koopman&Zuccon, 2014]:
relevance
truth doesn’t exist” -> “variability of system rankings with respect to the level
measures as proxy) or computed as a function of understandability label
relevance
$$uRBP = (1 - \beta) \sum_{k=1}^{K} \beta^{k-1} P(T|k)\, P(U|k)$$
[Figure: RBP plotted against uRBP; annotations: systems equivalent with RBP, different with uRBP; systems equivalent with uRBP, different with RBP]
[Zuccon, 2016]
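The understandability-biased measure above discounts each rank by a power of β and multiplies two per-rank probabilities. A short sketch, reading P(T|k) as the probability of topical relevance and P(U|k) as the probability of understandability at rank k (an interpretation based on the surrounding bullets); the function names, the value of β and the toy ranking are assumptions.

```python
def urbp(p_topical, p_understandable, beta=0.8):
    """uRBP = (1 - beta) * sum_k beta^(k-1) * P(T|k) * P(U|k), k = 1..K."""
    return (1 - beta) * sum(
        beta ** i * pt * pu  # i = k - 1 because enumerate starts at 0
        for i, (pt, pu) in enumerate(zip(p_topical, p_understandable))
    )


def rbp(p_topical, beta=0.8):
    """Plain RBP: the special case where every document is fully understandable."""
    return urbp(p_topical, [1.0] * len(p_topical), beta)


# Toy ranking of three documents (hypothetical probabilities).
p_topical = [1, 0, 1]
p_understandable = [1, 1, 0.5]
score_urbp = urbp(p_topical, p_understandable, beta=0.5)
score_rbp = rbp(p_topical, beta=0.5)
```

Two systems with the same RBP can therefore differ under uRBP when one of them ranks relevant but hard-to-understand documents highly.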
retrieved documents w.r.t. their ideal rank (by relevance or credibility).
(interpolation, harmonic mean)
Task | Dataset
Matching patient to clinical trials
2016]
Consumer Health Search
et al., 2016]
Evidence-based Medicine & Clinical Decision Support (CDS)
Compilation of systematic reviews
[Kanoulas et al., 2017]
Image Retrieval | ImageCLEF [Muller et al., 2010]
Identifying concepts from free-text
hoc, passage retrieval, entity-based QA, text annotation/categorisation
Preprocessing & Indexing:
removal)
filtering)
filtering
Query Expansion:
interactive methods for expansion terms
UMLS, Entrez Gene, MeSH, HUGO, MetaMap etc.
Document retrieval:
Jelinek-Mercer smoothing, KL-divergence
ensemble of standard algorithms
[Hersh&Bhupatiraju, 2003; Hersh, 2005; Hersh et al., 2006]
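Among the document retrieval techniques listed above, Jelinek-Mercer smoothing is easy to show in a few lines. The sketch below scores a document by query log-likelihood, interpolating the document model with the collection model. Here λ weights the collection model, which is one common convention and an assumption of this example, as are the function name and the toy collection.

```python
import math
from collections import Counter


def jm_query_log_likelihood(query, doc, collection, lam=0.5):
    """log p(q|d) with p(t|d) = (1 - lam) * tf(t,d)/|d| + lam * cf(t)/|C|."""
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc)
        p_coll = coll_tf[term] / len(collection)
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


# Toy collection of two tokenised documents (hypothetical).
d1 = ["brca1", "gene", "mutation"]
d2 = ["gene", "expression", "protein"]
collection = d1 + d2
query = ["brca1", "gene"]
s1 = jm_query_log_likelihood(query, d1, collection)
s2 = jm_query_log_likelihood(query, d2, collection)
```

Thanks to smoothing, `d2` still receives a finite score even though it does not contain "brca1".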
Results are affected by 4 main factors:
forms
look-up
Specific to passage retrieval:
paragraphs and a sentence, using these algorithms)
up of one or more reports
number of relevant documents [Voorhees&Hersh, 2012; Voorhees, 2013]
Samuel J. Smith 1234567-8 4/5/2006
HISTORY OF PRESENT ILLNESS: Mr. Smith is a 63-year-old gentleman with coronary artery disease, hypertension, hypercholesterolemia, COPD and tobacco abuse. He reports doing
having more trouble with his sinuses. I had started him on Flonase back in December. He says this has not really helped. Over the past couple weeks he has had significant congestion and thick discharge. No fevers or headaches but does have diffuse upper right-sided teeth pain. He denies any chest pains, palpitations, PND, orthopnea, edema or syncope. His breathing is doing fine. No cough. He continues to smoke about half-a-pack per day. He plans on trying the patches again.
CURRENT MEDICATIONS: Updated on CIS. They include aspirin, atenolol, Lipitor, Advair, Spiriva, albuterol and will add Singulair today.
ALLERGIES: Sulfa caused a rash.
SOCIAL HISTORY: Smokes as above.
REVIEW OF SYSTEMS: CONSTITUTIONAL: Weight stable. GI: No abdominal pain or change in bowel habits.
PHYSICAL EXAMINATION: VITAL SIGNS: Weight is 217 lbs, blood pressure 131/61, pulse 63. HEENT: TMs clear bilaterally, mild maxillary sinus tenderness on the right, nasal mucosa boggy with moderate discharge, teeth in good repair with no erythema or swelling. LUNGS: Clear, even with forced expiration.
Topics
136: Children with dental caries
137: Patients with inflammatory disorders receiving TNF-inhibitor treatment
152: Patients with Diabetes exhibiting good Hemoglobin A1c Control (<8.0%)
160: Adults under age 60 undergoing alcohol withdrawal
(in 2017 evolved into the Precision Medicine Track)
~733K articles in 2014&2015, 1.5M in 2016
[Simpson et al., 2014; Roberts et al., 2015]
Topic | Type | Description
1 | Diagnosis | A 58-year-old African-American woman presents to the ER with episodic pressing/burning anterior chest pain that began two days earlier for the first time in her life. The pain started while she was walking, radiates to the back, and is accompanied by nausea, diaphoresis and mild dyspnea, but is not increased
hypertension and obesity. She denies smoking, diabetes, hypercholesterolemia, or a family history of heart
changes.
11 | Test | A 40-year-old woman with no past medical history presents to the ER with excruciating pain in her right arm that had started 1 hour prior to her admission. She denies trauma. On examination she is pale and in moderate discomfort, as well as tachypneic and tachycardic. Her body temperature is normal and her blood pressure is 80/60. Her right arm has no discoloration or movement limitation.
21 | Treatment | A 21-year-old female is evaluated for progressive arthralgias and malaise. On examination she is found to have alopecia, a rash mainly distributed on the bridge of her nose and her cheeks, a delicate non-palpable purpura on her calves, and swelling and tenderness of her wrists and ankles. Her lab shows normocytic anemia, thrombocytopenia, a 4/4 positive ANA and anti-dsDNA. Her urine is positive for protein and RBC casts.
trials
most effective treatments
techniques ineffective for patient) [Roberts et al., 2017]
reliable&unreliable health websites
personalisation, query variations, multilingual queries
[Zuccon et al., 2016]
discharge summary (aims to simulate layperson wanting to know more about term)
topic description derived from Reddit AskADoctor
(6x50=300)
headaches relieved by blood donation
headaches caused by too much blood or "high blood pressure"
high iron headache
headache that only goes away with blood loss
blood donation headache reduction
what causes strong headaches at base of skull, stops with blood donation
differently
(abstract level) of conducting Diagnostic Test Accuracy systematic reviews
possible,
at which to stop in the result list
[Kanoulas et al., 2017]
Topic: CD009551
Title: Polymerase chain reaction blood tests for the diagnosis of invasive aspergillosis in immunocompromised people
Query:
exp Aspergillosis/
exp Pulmonary Aspergillosis/
exp Aspergillus/
(aspergillosis or aspergillus or aspergilloma or "A.fumigatus" or "A. flavus" or "A. clavatus" or "A. terreus" or "A. niger").ti,ab.
exp Nucleic Acid Amplification Techniques/
pcr.ti,ab.
"polymerase chain reaction*".ti,ab.
5 and 9
exp Animals/ not Humans/
10 not 11
Pmid’s:
25815649
26065322
...
Annotations: Title of the Systematic Review; Boolean query in Ovid format; Articles retrieved by the Boolean query
(Chinese, English, Japanese)
made in the query
retrieval, image annotation, modality detection, caption prediction, etc
expected number of trials
(1) retrieval for screening; (2) screening prioritisation; (3) stopping point
(TREC Medical Records [Edinger et al., 2012])
representations and lexical mismatches
contained a non-relevant reference to the topic terms
they used a synonym for a topic term
terminology between conditions or procedures (hearing loss vs hearing aid)
(TREC CDS [Roberts et al., 2016], analysing 2014 results)
key importance: can easily become a red herring
important, but best systems did not use them
If negation extraction, soft-matching strategy best
Treatment, and Test (fundamental mismatch between irrelevant articles and clinically important attributes)
machine learning classifiers
[Karimi et al., 2018] provides platform to facilitate experimentation and hypothesis testing
detection/removal, LTR
Desc and Sum, but not Note
can outperform all these
for large scale evaluation
require personalisation, context understanding, better user understanding
https://ielab.io/russir2018-health-search-tutorial/
search-tutorial
Perspective”
evaluation)
for start in 2019
Avg temp in Kazan 4 degrees C
http://ielab.io/
SIGIR 2018 tutorial developed together with Dr Bevan Koopman (AEHRC, CSIRO)
Scells and Jimmy have provided comments and support when developing parts of this tutorial
for the Student Volunteers for their friendly help and assistance